TIOBE Programming Community Index Definition

Since there are many questions about the way the TIOBE index is assembled, a special page is devoted to its definition.

Ratings

The ratings are calculated by counting hits of the most popular search engines. The search query that is used is

+"<language> programming"

The query is executed on the regular Google, Google Blogs, MSN, Yahoo!, and YouTube web searches, covering the last 12 months. The web site Alexa.com has been used to determine the most popular search engines.

The number of hits determines the rating of a language. The counted hits are normalized per search engine over the first 50 languages; in other words, the first 50 languages together have a score of 100%. Let's define "hits50(SE)" as the sum of the number of hits for the first 50 languages for search engine SE, and "hits(PL,SE)" as the number of hits for programming language PL for search engine SE.

Possible false positives for a query are already filtered out in the definition of "hits(PL,SE)". This is done by using a manually determined confidence factor per query. For example, a query such as "Basic programming" also returns pages that contain "Improve your basic programming skills in Java". The first 100 pages per search engine are checked for possible false positives, and the result is used to define the confidence factor. If this factor is 90%, then only 90% of the hits are used for "hits(PL,SE)". An overview of the confidence factors can be found in the groupings table below.

The ratings are calculated with the following formula:

(hits(PL,SE1)/hits50(SE1) + ... + hits(PL,SEn)/hits50(SEn))/n

where n is the number of search engines used. The search engines are not weighted equally, though: YouTube counts for only 7% and each of the other search engines for 23%.

Status

Besides the rating of programming languages, there is also a status indicated in the TIOBE chart. Programming languages that have status "A" are considered to be mainstream languages.
Status "A-" and "A--" indicate that a programming language is between status "A" and "B". If a programming language has a rating higher than 0.7% (yes, this number is arguable, but we had to fix it somewhere) for at least 3 months, it is awarded status "A". In the first two of those months it receives status "A--" and "A-" respectively. The opposite holds for languages that go from status "A" to status "B". So if a language had status "A" 2 months ago, a rating of 0.607% last month, and a rating of 0.687% now, it will have status "A--".

From a supportability point of view, it is strongly advised to stick to mainstream languages for industrial, mission-critical software systems. This is for three reasons:
Groupings and Exceptions

Programming languages that are very similar are grouped together. Currently the maximum of the hits of the individual languages is taken into account when calculating the ratings of groupings. In the future we will do a better job and take the union (from mathematical set theory) of all the hits.

There is a lot of discussion about what languages should be grouped together. It is very hard to have a definition that can be applied to all situations, so we just made a choice we thought reasonable. If you disagree, please notify us. Keep in mind that you shouldn't submit grouping/degrouping proposals just to get a higher rating ("take C and C++ together") or ungroup languages for tracking a minor variant ("decouple Mono from C#.NET"). The following table contains the definition of all groupings and exceptions.
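As a rough illustration of the grouping rule described above, the sketch below contrasts the current approach (the maximum of the members' hit counts) with the planned one (the union of the matching pages). The function names, the language names "Foo" and "Foo.NET", and the page identifiers are hypothetical and not taken from the TIOBE index; the union variant also assumes that the matching pages themselves are available, not just their counts.

```python
def grouped_hits(hits_by_language, members):
    """Current rule: a grouping scores the maximum hit count of its members."""
    return max(hits_by_language[name] for name in members)


def union_hits(pages_by_language, members):
    """Planned rule: count each distinct matching page once across all members."""
    return len(set().union(*(pages_by_language[name] for name in members)))


# Hypothetical matching pages for one search engine (not real data).
pages = {
    "Foo": {"page1", "page2", "page3"},
    "Foo.NET": {"page3", "page4"},
}
hits = {name: len(found) for name, found in pages.items()}

members = ["Foo", "Foo.NET"]
print(grouped_hits(hits, members))   # maximum of 3 and 2
print(union_hits(pages, members))    # distinct pages across both members
```

The maximum never exceeds the union, so under these assumptions the current rule can undercount a grouping whose members match mostly different pages.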
Artifacts or ideas on improving the calculation of the TIOBE index will be received with gratitude (tpci@tiobe.com).
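The rating formula and the status rules described in the Ratings and Status sections can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TIOBE's actual implementation. All function names and the sample hit counts are hypothetical. The weights are the ones quoted in the text (23% for each regular engine, 7% for YouTube); since they add up to 99%, the sketch divides by the total weight of the engines actually supplied, which reduces to the plain average in the formula when all weights are equal. The status rule is read here as a four-step ladder (B, A--, A-, A) that moves one step per month, which matches the worked example in the text but is otherwise an interpretation.

```python
# Engine weights as quoted in the text (assumption: they are applied as a weighted average).
ENGINE_WEIGHTS = {
    "Google": 0.23, "Google Blogs": 0.23, "MSN": 0.23, "Yahoo!": 0.23, "YouTube": 0.07,
}
STATUSES = ["B", "A--", "A-", "A"]   # ladder from non-mainstream to mainstream
THRESHOLD = 0.7                      # rating in percent; must be strictly exceeded


def effective_hits(raw_hits, confidence):
    """hits(PL,SE): raw hit count scaled by the manual confidence factor (e.g. 0.9)."""
    return raw_hits * confidence


def rating(language, hits, weights=ENGINE_WEIGHTS):
    """Weighted average over engines of hits(PL,SE) / hits50(SE), in percent.

    `hits` maps engine name -> {language name -> hits(PL,SE)}, where each inner
    dict is assumed to hold exactly the first 50 languages for that engine.
    """
    total_weight = sum(weights[engine] for engine in hits)
    score = 0.0
    for engine, per_language in hits.items():
        hits50 = sum(per_language.values())
        score += weights[engine] * per_language[language] / hits50
    return 100.0 * score / total_weight


def next_status(current, rating_percent):
    """Move one step up the ladder if the rating exceeds the threshold, else one step down."""
    level = STATUSES.index(current)
    if rating_percent > THRESHOLD:
        level = min(level + 1, len(STATUSES) - 1)
    else:
        level = max(level - 1, 0)
    return STATUSES[level]


# Hypothetical data: two engines and two languages standing in for the first 50.
sample_hits = {
    "Google": {"Foo": 30, "Bar": 70},
    "YouTube": {"Foo": 50, "Bar": 50},
}
print(rating("Foo", sample_hits), rating("Bar", sample_hits))

# Worked example from the text: status "A" two months ago, then 0.607% and 0.687%.
status = "A"
for monthly_rating in (0.607, 0.687):
    status = next_status(status, monthly_rating)
print(status)   # "A--", matching the example in the Status section
```

Because every engine's hits are normalized before weighting, the ratings of all languages passed in add up to 100% regardless of how many hits each engine reports in absolute terms.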