TIOBE Programming Community Index Definition

Since there are many questions about the way the TIOBE index is assembled, a special page is devoted to its definition.

Programming Language

Before discussing how the ratings are calculated, it must be clear what counts as a programming language for the TIOBE index. Two criteria must both hold:

  • The language should have its own entry on Wikipedia, and that entry should clearly state that it concerns a programming language. This is the reason why ColdFusion, (Ruby on) Rails, Excel, Cocoa, ASP and AJAX are not considered programming languages for the index.
  • The programming language should be Turing complete. As a consequence, HTML and XML are not considered programming languages. This also holds for the data query language SQL. SQL is not a programming language because it is, for instance, impossible to write an infinite loop in it. On the other hand, the SQL extensions PL/SQL and Transact-SQL are programming languages.

The following languages are tracked by the TIOBE index:

  • (Visual) Basic
  • (Visual) FoxPro
  • 4th Dimension/4D
  • ABAP
  • ABC
  • ActionScript
  • Ada
  • Agilent VEE
  • Algol
  • Alice
  • Angelscript
  • Apex
  • APL
  • AppleScript
  • Arc
  • AspectJ
  • Assembly
  • ATLAS
  • AutoIt
  • Automator
  • Avenue
  • Awk
  • Bash
  • bc
  • BCPL
  • BETA
  • BlitzMax
  • Boo
  • Bourne Shell
  • C
  • C Shell
  • C#
  • C++
  • C++/CLI
  • C-Omega
  • Caml
  • CFML
  • cg
  • Ch
  • CHILL
  • CIL
  • CL (OS/400)
  • Clarion
  • Clean
  • Clipper
  • Clojure
  • CLU
  • COBOL
  • Cobra
  • COMAL
  • cT
  • Curl
  • D
  • DCL
  • Delphi/Object Pascal
  • DiBOL
  • Dylan
  • E
  • EGL
  • Eiffel
  • Erlang
  • Etoys
  • Euphoria
  • EXEC
  • F#
  • Factor
  • Falcon
  • Fantom
  • Felix
  • Forth
  • Fortran
  • Fortress
  • Gambas
  • Go
  • Gosu
  • Groovy
  • Haskell
  • haXe
  • Heron
  • HPL
  • HyperTalk
  • Icon
  • IDL
  • Inform
  • Informix-4GL
  • INTERCAL
  • Io
  • Ioke
  • J
  • J#
  • JADE
  • Java
  • Java FX Script
  • JavaScript
  • JScript
  • JScript.NET
  • Korn Shell
  • LabVIEW
  • LabWindows/CVI
  • Ladder Logic
  • Lasso
  • Limbo
  • Lingo
  • Lisp
  • Logo
  • LotusScript
  • LPC
  • Lua
  • Lustre
  • M4
  • MAD
  • Magic
  • Magik
  • Malbolge
  • MANTIS
  • Maple
  • Mathematica
  • MATLAB
  • Max/MSP
  • MAXScript
  • MEL
  • Mercury
  • Miva
  • ML
  • Monkey
  • Modula-2
  • Modula-3
  • MOO
  • Moto
  • MS-DOS Batch
  • MUMPS
  • NATURAL
  • Nemerle
  • NQC
  • NSIS
  • NXT-G
  • Oberon
  • Object Rexx
  • Objective-C
  • OCaml
  • Occam
  • OpenCL
  • OpenEdge ABL
  • OPL
  • Oz
  • Paradox
  • Pascal
  • Perl
  • PHP
  • Pike
  • PILOT
  • PL/I
  • PL/SQL
  • Pliant
  • PostScript
  • POV-Ray
  • PowerBasic
  • PowerScript
  • PowerShell
  • Processing
  • Prolog
  • Python
  • Q
  • R
  • REALBasic
  • REBOL
  • Revolution
  • REXX
  • RPG (OS/400)
  • Ruby
  • Rust
  • S
  • S-PLUS
  • SAS
  • Sather
  • Scala
  • Scheme
  • Scratch
  • sed
  • Seed7
  • SIGNAL
  • Simula
  • Simulink
  • Slate
  • Smalltalk
  • Smarty
  • SPARK
  • SPSS
  • SQR
  • Squeak
  • Squirrel
  • Standard ML
  • Suneido
  • SuperCollider
  • TACL
  • Tcl
  • TeX
  • thinBasic
  • TOM
  • Transact-SQL
  • Vala/Genie
  • VBScript
  • Verilog
  • VHDL
  • Visual Basic .NET
  • Whitespace
  • X10
  • xBase
  • XBase++
  • Xen
  • XPL
  • XSLT
  • yacc
  • Yorick
  • Z shell

Ratings

The ratings are calculated by counting hits on the most popular search engines. The search query used is:

+"<language> programming"

This search query is executed for the top 9 websites of Alexa that meet the following conditions:

  • The entry page of the site contains a search facility
  • The result of querying the site contains an indication of the number of page hits

Based on these criteria, the following search engines are currently used:

  • Google: 30%
  • Blogger: 30%
  • Wikipedia: 15%
  • YouTube: 9%
  • Baidu: 6%
  • Yahoo!: 3%
  • Bing: 3%
  • Amazon: 3%

The number of hits determines the rating of a language. The counted hits are normalized for each search engine over the first 50 languages. In other words, the first 50 languages together have a score of 100%. Let's define "hits50(SE)" as the sum of the number of hits for the first 50 languages for search engine SE and "hits(PL,SE)" as the number of hits for programming language PL for search engine SE.

Possible false positives for a query are already filtered out in the definition of "hits(PL,SE)". This is done by using a manually determined confidence factor per query. For example, a query such as "Basic programming" also returns pages that contain "Improve your basic programming skills in Java". The first 100 pages per search engine are checked for possible false positives, and the result is used to define the confidence factor. If this factor is 90%, then only 90% of the hits are used for "hits(PL,SE)". An overview of the confidence factors can be found in the groupings table below.

The ratings are calculated with the following formula:

(hits(PL,SE1)/hits50(SE1) + ... + hits(PL,SEn)/hits50(SEn))/n

where n is the number of search engines used.
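
The formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the nested-dictionary layout and the sample hit counts are hypothetical, and the set of "first 50 languages" is passed in explicitly rather than derived.

```python
from typing import Dict, Iterable


def adjusted_hits(raw_hits: float, confidence: float = 1.0) -> float:
    """hits(PL,SE): raw hit count scaled by the confidence factor (1.0 = 100%)."""
    return raw_hits * confidence


def ratings(hits: Dict[str, Dict[str, float]],
            top_languages: Iterable[str]) -> Dict[str, float]:
    """Compute the rating (in percent) of every language.

    hits[SE][PL] holds the adjusted hits of language PL on search engine SE.
    top_languages is the set of "first 50" languages used for normalization.
    """
    top = list(top_languages)
    n = len(hits)  # number of search engines
    languages = {pl for per_engine in hits.values() for pl in per_engine}
    totals = {pl: 0.0 for pl in languages}
    for per_engine in hits.values():
        # hits50(SE): summed hits of the normalization set on this engine
        hits50 = sum(per_engine.get(pl, 0.0) for pl in top)
        for pl in languages:
            totals[pl] += per_engine.get(pl, 0.0) / hits50
    return {pl: 100.0 * total / n for pl, total in totals.items()}


# Hypothetical sample data: two engines, normalization set of two languages.
sample = {
    "SE1": {"X": 600, "Y": 400, "Z": 100},
    "SE2": {"X": 100, "Y": 300, "Z": 40},
}
result = ratings(sample, top_languages=["X", "Y"])
```

With this sample, the two languages in the normalization set together rate 100%, as the text requires, and a language outside the set ("Z") receives a rating on the same scale.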

Status

Besides the rating of programming languages, the TIOBE chart also indicates a status. Programming languages that have status "A" are considered to be mainstream languages. Status "A-" and "A--" indicate that a programming language is between status "A" and "B". If a programming language has a rating higher than 0.7% (yes, this number is arguable, but we had to fix it somewhere) for at least 3 months, it is awarded status "A". In the first two months the programming language receives status "A--" and "A-" respectively. The opposite holds for languages that go from status "A" to status "B". So if a language had status "A" 2 months ago, a rating of "0.607%" last month and a rating of "0.687%" now, it will have status "A--".
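
One way to read these status rules is as a four-step ladder that a language climbs or descends by one step per month. The sketch below follows that reading; it is an assumption, since the text does not say what happens when a language alternates above and below the threshold, and the names used are hypothetical.

```python
from typing import Iterable

# Ladder from lowest to highest status, as read from the text above.
STATUSES = ["B", "A--", "A-", "A"]
THRESHOLD = 0.7  # rating in percent


def next_status(current: str, rating: float) -> str:
    """Move one step up if the monthly rating exceeds the threshold, else one step down."""
    level = STATUSES.index(current)
    if rating > THRESHOLD:
        level = min(level + 1, len(STATUSES) - 1)
    else:
        level = max(level - 1, 0)
    return STATUSES[level]


def status_after(start: str, monthly_ratings: Iterable[float]) -> str:
    """Apply next_status for each monthly rating, oldest first."""
    status = start
    for rating in monthly_ratings:
        status = next_status(status, rating)
    return status


# The example from the text: status "A" two months ago, then 0.607% and 0.687%.
print(status_after("A", [0.607, 0.687]))  # prints A--
```

Under the same reading, a language starting at "B" that stays above 0.7% passes through "A--" and "A-" and reaches "A" in the third month.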

From a supportability point of view, it is strongly advised to stick to mainstream languages for industrial, mission-critical software systems. This is for three reasons:

  • The pool of skilled engineers is much smaller for non-mainstream languages
  • Tool vendors do not write and maintain tools for non-mainstream languages
  • In general fewer libraries are available for non-mainstream languages

It is important to note that this is only one of many criteria to be used before taking a decision to adopt a language. Other criteria are: suitability for the application domain, reliability of compilers, expressive power, performance, and scalability. Hence, Ada can still be used for mission-critical systems although one should consider alternatives. This is what you also see in daily practice: Ada is hardly used for new mission-critical systems anymore. The other way around is also true. Everybody will agree that it is not wise to program missile software in JavaScript.

Groupings and Exceptions

Programming languages that are very similar are grouped together. Currently the maximum of the hits of the individual languages is taken into account when calculating the ratings of groupings. In the future we will do a better job and take the union (from mathematical set theory) of all the hits.

The definition of what languages are grouped has been formalized according to the following rules:

  • If a language has its own Wikipedia entry, it will not be grouped with another language.
  • If a language A automatically redirects to another Wikipedia entry B, A will be grouped together with B.
  • If a language A has no separate Wikipedia entry but is mentioned as part of another Wikipedia entry B, A will be grouped together with B.
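
The grouping rules and the "maximum of the hits" calculation can be sketched as follows. The WikiInfo record, the function names and the hit counts are hypothetical; how Wikipedia is actually consulted is not described in the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class WikiInfo:
    """Hypothetical summary of how Wikipedia treats a language name."""
    has_own_entry: bool = False
    redirects_to: Optional[str] = None  # entry B that name A redirects to
    mentioned_in: Optional[str] = None  # entry B that covers A as a part


def group_for(name: str, info: WikiInfo) -> Optional[str]:
    """Return the group a name belongs to, following the three rules in order."""
    if info.has_own_entry:
        return name  # rule 1: stands on its own
    if info.redirects_to is not None:
        return info.redirects_to  # rule 2: grouped with the redirect target
    if info.mentioned_in is not None:
        return info.mentioned_in  # rule 3: grouped with the covering entry
    return None  # no Wikipedia presence, so not tracked


def group_hits(hits: Dict[str, int], members: Iterable[str]) -> int:
    """Current approach from the text: the maximum of the members' hits."""
    return max(hits.get(member, 0) for member in members)


# Illustrative data only; the counts are made up.
print(group_for("zsh", WikiInfo(redirects_to="Z shell")))             # prints Z shell
print(group_hits({"Z shell": 1200, "zsh": 3400}, ["Z shell", "zsh"]))  # prints 3400
```

Taking the maximum never counts a page twice, but it ignores pages that mention only one of the less popular names, which is the gap the planned union would close.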

In order to filter out false positives, two mechanisms are used. First, a confidence is defined for each language. By default the confidence is 100%, but for some difficult search queries such as "Basic Programming", the confidence will be lower. Second, exceptions or mandatory additions are sometimes used to weed out false positives.
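
As a rough illustration, the function below builds a query string from a language name, its mandatory additions and its exceptions. The operator syntax is an assumption modelled on the +"<language> programming" query shown earlier: "+term" for an addition and "-term" for an exception. The text does not state how these terms are actually passed to each search engine, and the function names are hypothetical.

```python
from typing import Iterable


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are matched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(name: str,
                additions: Iterable[str] = (),
                exceptions: Iterable[str] = ()) -> str:
    """Assemble a query; the +/- operators are an assumed syntax, not a documented one."""
    parts = [f'+"{name} programming"']
    parts += [f"+{quote(term)}" for term in additions]   # terms that must appear
    parts += [f"-{quote(term)}" for term in exceptions]  # terms that must not appear
    return " ".join(parts)


print(build_query("Go", additions=["Google"]))          # prints +"Go programming" +Google
print(build_query("ABC", exceptions=["tv", "channel"]))  # prints +"ABC programming" -tv -channel
```

The confidence factor is applied afterwards, to the hit count the query returns, so it does not appear in the query itself.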

The following table contains the definition of all groupings, confidences and exceptions.

Name                 | Confidence | Exception/Grouping
ABC                  |            | Exception: tv, channel
ActionScript         |            | Grouping: ActionScript, AS1, AS2, AS3
Alice                | 90%        |
ATLAS                |            | Grouping: ATLAS, C/ATLAS
Awk                  |            | Grouping: awk, gawk, mawk, nawk
BETA                 | 70%        |
BlitzMax             |            | Grouping: BlitzMax, BlitzBasic, Blitz Basic
Bourne shell         |            | Grouping: Bourne shell, sh
C shell              |            | Grouping: csh, C shell
C#                   |            | Grouping: C#, C-Sharp, C Sharp, CSharp, CSharp.NET, C#.NET
cg                   |            | Exception: computer game
CH                   |            | Addition: ChScite
CL (OS/400)          |            | Exception: Lisp; Grouping: CL, CLLE
Cobra                |            | Exception: interface
D                    | 90%        | Exception: 3-D Programming, DTrace
Delphi/Object Pascal |            | Grouping: Delphi, Delphi.NET, Object Pascal
DiBOL                |            | Grouping: DBL, DIBOL, Synergy/DE
F#                   |            | Grouping: F#, F-Sharp, F Sharp, FSharp
Go                   |            | Addition: Google
Groovy               |            | Grouping: Groovy, GPATH, GSQL, Groovy++
Icon                 | 90%        |
IDL                  |            | Exception: corba, interface
Lisp                 |            | Grouping: Lisp, Elisp
Logo                 | 96%        |
MAD                  | 50%        |
Objective-C          |            | Grouping: Objective-C, objc, Obj-C
OCaml                |            | Grouping: Objective Caml, OCaml
OpenEdge ABL         |            | Grouping: Progress, Progress 4GL, ABL, Advanced Business Language, OpenEdge
PL/I                 |            | Grouping: PL/1, PL/I
Processing           |            | Addition: Sketchbook
R                    |            | Addition: statistical
RPG                  | 80%        | Exception: role; Grouping: RPG, ILERPG, RPGIV, RPGIII, RPGLE, RPG400, RPGII, RPG4
S                    |            | Addition: statistical
S-PLUS               |            | Addition: statistical
Scheme               |            | Exception: tv, channel
Standard ML          |            | Grouping: Standard ML, SML
T-SQL                |            | Grouping: T-SQL, Transact-SQL, TSQL
Tcl/Tk               |            | Grouping: Tcl/Tk, Tcl
Tom                  | 50%        |
(Visual) Basic       | 85%        | Grouping: Basic, VB
Visual Basic .NET    |            | Grouping: Visual Basic .NET, Visual Basic.NET, VB.NET
(Visual) FoxPro      |            | Grouping: FoxPro, Fox Pro, VFP
Z shell              |            | Grouping: Z shell, zsh

Reports of artifacts in the TIOBE index, or ideas for improving its calculation, will be received with gratitude (tpci@tiobe.com).