TIOBE Programming Community Index Definition

Since there are many questions about the way the TIOBE index is assembled, a special page is devoted to its definition.

Ratings

The ratings are calculated by counting hits on the most popular search engines. The search query used is:

+"<language> programming"

The search query is run against regular Google, Google Blogs, MSN, Yahoo!, and YouTube web search, restricted to the last 12 months. The web site Alexa.com was used to determine the most popular search engines.

The number of hits determines the rating of a language. The counted hits are normalized for each search engine over the first 50 languages; in other words, the first 50 languages together have a score of 100%. Let's define "hits50(SE)" as the sum of the number of hits for the first 50 languages for search engine SE, and "hits(PL,SE)" as the number of hits for programming language PL for search engine SE.

Possible false positives for a query are already filtered out in the definition of "hits(PL,SE)". This is done by using a manually determined confidence factor per query. For example, a query such as "Basic programming" also returns pages that contain "Improve your basic programming skills in Java". The first 100 pages per search engine are checked for possible false positives, and the result is used to define the confidence factor. If this factor is 90%, then only 90% of the hits are used for "hits(PL,SE)". An overview of the confidence factors can be found in the groupings table below.
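The two definitions above can be illustrated with a minimal Python sketch. The function names, the sample counts, and the reading of "first 50 languages" as the 50 languages with the most hits are assumptions made for illustration; they are not taken from the TIOBE implementation.

```python
def adjusted_hits(raw_hits: int, confidence: float = 1.0) -> float:
    """hits(PL,SE): raw hit count scaled by the manually determined confidence factor."""
    return raw_hits * confidence


def normalized_shares(hits_by_language: dict, top_n: int = 50) -> dict:
    """Share of each of the top_n languages for one search engine.

    hits50(SE) is the sum over the top_n languages, so the shares sum to 1 (100%).
    Assumption: "first 50" means the 50 languages with the most hits.
    """
    top = sorted(hits_by_language.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    hits50 = sum(h for _, h in top)
    return {lang: h / hits50 for lang, h in top}


# Made-up counts for a single search engine, for illustration only.
hits = {
    "Java": adjusted_hits(4000),
    "C": adjusted_hits(3000),
    "D": adjusted_hits(1000, confidence=0.90),  # 90% confidence factor
}
shares = normalized_shares(hits)
```

With a confidence factor of 90%, a raw count of 1000 contributes 900 hits before normalization.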

The ratings are calculated with the following formula:

(hits(PL,SE1)/hits50(SE1) + ... + hits(PL,SEn)/hits50(SEn))/n

where n is the number of search engines used. YouTube counts for only 7%; each of the other search engines counts for 23%.
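As a rough illustration, the formula can be sketched in Python as follows. The function name, the data layout, and the sample numbers are hypothetical. With no weights given, the sketch reproduces the plain average as written; the optional weights argument is only one possible way to express the per-engine percentages mentioned above, since the text does not spell out how the two are combined.

```python
def rating(language, hits, weights=None):
    """Rating of one language across several search engines.

    hits: {engine: {language: hits(PL,SE) for the first 50 languages}}
    weights: optional {engine: weight}; defaults to 1/n per engine,
             which matches the formula as written.
    """
    engines = list(hits)
    if weights is None:
        weights = {se: 1.0 / len(engines) for se in engines}
    total = 0.0
    for se in engines:
        hits50 = sum(hits[se].values())  # hits50(SE)
        total += weights[se] * hits[se].get(language, 0.0) / hits50
    return total


# Made-up data: two engines, two languages.
sample = {
    "SE1": {"X": 50.0, "Y": 50.0},
    "SE2": {"X": 30.0, "Y": 70.0},
}
equal = rating("X", sample)
weighted = rating("X", sample, weights={"SE1": 0.75, "SE2": 0.25})
```

Multiplying the result by 100 gives the rating as a percentage.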

Status

Besides the rating of programming languages, the TIOBE chart also indicates a status. Programming languages that have status "A" are considered to be mainstream languages. Status "A-" and "A--" indicate that a programming language is between status "A" and "B". If a programming language has a rating higher than 0.7% (yes, this number is arguable, but we had to fix it somewhere) for at least 3 months, it is awarded status "A". In the first two months, the programming language receives status "A--" and "A-" respectively. The opposite holds for languages that go from status "A" to status "B". So if a language had status "A" 2 months ago, a rating of "0.607%" last month, and a rating of "0.687%" now, it will have status "A--".
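One way to read these rules is as a four-step ladder that a language climbs or descends by one step per month. The Python sketch below assumes exactly that. The names are hypothetical, and the behavior when a language reverses direction partway (for example, dropping below the threshold while at "A--" on the way up) is an assumption, since the text only describes the two straight paths.

```python
LADDER = ["B", "A--", "A-", "A"]  # lowest to highest status
THRESHOLD = 0.7  # percent; a rating must be strictly higher to count


def next_status(current: str, rating: float) -> str:
    """Move one step up if the monthly rating exceeds the threshold, else one step down."""
    level = LADDER.index(current)
    if rating > THRESHOLD:
        level = min(level + 1, len(LADDER) - 1)
    else:
        level = max(level - 1, 0)
    return LADDER[level]


def status_after(start: str, ratings: list) -> str:
    """Apply a sequence of monthly ratings (oldest first) to a starting status."""
    status = start
    for r in ratings:
        status = next_status(status, r)
    return status


# The example from the text: status "A", then 0.607% and 0.687%.
example = status_after("A", [0.607, 0.687])  # "A--"
```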

From a supportability point of view, it is strongly advised to stick to mainstream languages for industrial, mission-critical software systems. This is for three reasons:

  • The pool of skilled engineers is much smaller for non-mainstream languages
  • Tool vendors do not write and maintain tools for non-mainstream languages
  • In general, fewer libraries are available for non-mainstream languages

It is important to note that this is only one of many criteria to consider before deciding to adopt a language. Other criteria are: suitability for the application domain, reliability of compilers, expressive power, performance, and scalability. Hence, Ada can still be used for mission-critical systems, although one should consider alternatives. This is what you also see in daily practice: Ada is hardly used for new mission-critical systems anymore. The other way around is also true: everybody will agree that it is not wise to program missile software in JavaScript.

Groupings and Exceptions

Programming languages that are very similar are grouped together. Currently the maximum of the hits of the individual languages is taken into account when calculating the ratings of groupings. In the future we will do a better job and take the union (from mathematical set theory) of all the hits.
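The difference between the current approach and the planned one can be sketched in Python. The function names and sample data are made up for illustration. The union variant assumes that the set of matching pages is available for each variant, which plain hit counts do not provide.

```python
def group_hits_max(hits_by_variant: dict) -> int:
    """Current approach: the grouping gets the maximum hit count of its members."""
    return max(hits_by_variant.values())


def group_hits_union(pages_by_variant: dict) -> int:
    """Planned approach: count each matching page once across all members."""
    return len(set().union(*pages_by_variant.values()))


# Made-up hit counts and page sets for one grouping.
counts = {"Tcl/Tk": 60, "Tcl": 120, "Tk": 80}
pages = {
    "Tcl": {"page1", "page2", "page3"},
    "Tk": {"page3", "page4"},
}
by_max = group_hits_max(counts)     # 120
by_union = group_hits_union(pages)  # 4: page3 is counted only once
```

The maximum can undercount a grouping whose members appear on mostly different pages, while simply summing the counts would count shared pages twice; the union avoids both problems.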

There is a lot of discussion about which languages should be grouped together. It is very hard to find a definition that can be applied to all situations, so we just made a choice we thought reasonable. If you disagree, please notify us. Keep in mind that you shouldn't submit grouping/degrouping proposals just to get a higher rating ("take C and C++ together") or to track a minor variant ("decouple Mono from C#.NET").

The following table contains the definition of all groupings and exceptions.

Name          Confidence  Exception/Grouping
ABC                       Exception: "tv", "channel"
ActionScript              Grouping: ActionScript, AS1, AS2, AS3
ATLAS                     Grouping: ATLAS, C/ATLAS
Awk                       Grouping: awk, gawk, mawk, nawk
Bourne shell              Grouping: Bash, Bourne shell, sh, Almquist shell, ash, dash, ksh, zsh
C shell                   Grouping: csh, tcsh, C shell
C#                        Grouping: C#, C-Sharp, C Sharp, CSharp, CSharp.NET, C#.NET, C# 1.0, C# 2.0, C# 3.0
CFML                      Grouping: CFML, CFScript
CL (OS/400)               Exception: Lisp
Caml                      Grouping: Caml, OCaml, F#
D             90%         Exception: "3-D Programming"
Delphi/Kylix              Grouping: Delphi, Kylix, Object Pascal, Free Pascal, Chrome, Oxygene
DBL                       Grouping: DBL, DIBOL, Synergy/DE
Focus                     Exception: "linux"
Groovy                    Grouping: Groovy, GPATH, GSQL
IDL                       Exception: "corba"
JavaScript                Grouping: JavaScript, JScript, ECMAScript
Lisp/Scheme               Exception: "tv", "channel"; Grouping: Lisp, Scheme, Allegro CL, Elisp, Guile
ML                        Grouping: ML, SML
Objective-C               Grouping: Objective-C, objc, Obj-C
Perl                      Grouping: Perl, Pugs, PGE, rakudo
PL/I                      Grouping: PL/1, PL/I
PowerBuilder              Grouping: PowerBuilder, PowerScript
Python                    Grouping: Python, Jython, IronPython, pypy
R                         Addition: "statistical"
Ruby                      Grouping: Ruby, JRuby, MetaRuby, Rubinius, YARV, Ruby.NET, IronRuby
Smalltalk                 Grouping: Smalltalk, Squeak
T-SQL                     Grouping: T-SQL, Transact-SQL
Tcl/Tk                    Grouping: Tcl/Tk, Tcl, Tk
Visual Basic  85%         Grouping: Basic, VB.NET, Visual Basic.NET, Visual Basic .NET, Visual Basic 2005, VB 2005, Visual Basic 2003, VB 2003, Visual Basic 2002, VB 2002, VB, VB9, Visual Basic 9.0, Visual Basic 2008, VB6
xBase/FoxPro              Grouping: FoxPro, Fox Pro, VFP, VFP6, FoxPro 6, VFP8, FoxPro 8, VFP9, FoxPro 9, dBase, dBaseIII, dBaseIV, dBaseV, Clipper, Flagship, QuickSilver, Recital, xBase++, xHarbour, Harbour, HMG
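To make the table entries concrete, here is a hedged Python sketch of how they might translate into search queries. Only the base query form +"<language> programming" comes from the definition above. Expressing an exception as a "-term" exclusion, expressing an addition as a "+term" requirement, and issuing one query per grouping member are all assumptions, and the function name is hypothetical.

```python
def build_queries(variants, exceptions=(), additions=()):
    """Build one search query string per grouping member.

    Assumptions (not stated in the definition): exceptions become
    "-term" exclusions and additions become "+term" requirements.
    """
    queries = []
    for name in variants:
        parts = ['+"{} programming"'.format(name)]
        parts += ["+{}".format(term) for term in additions]
        parts += ["-{}".format(term) for term in exceptions]
        queries.append(" ".join(parts))
    return queries


abc_queries = build_queries(["ABC"], exceptions=["tv", "channel"])
r_queries = build_queries(["R"], additions=["statistical"])
tcl_queries = build_queries(["Tcl/Tk", "Tcl", "Tk"])
```

Under these assumptions, the ABC entry yields the single query +"ABC programming" -tv -channel, and the Tcl/Tk grouping yields three queries whose hit counts would then be combined as described for groupings.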

Artifacts or ideas on improving the calculation of the TIOBE index will be received with gratitude (tpci@tiobe.com).