TIOBE Programming Community Index Definition
Since there are many questions about the way the TIOBE index is assembled, a special page is devoted
to its definition.
Programming Language
Before discussing how the ratings are calculated, we first need to clarify
what counts as a programming language for the TIOBE index. There are two criteria,
and both must hold:
- The language should have its own entry on Wikipedia, and that entry should clearly state that it concerns a programming language. This is why (Ruby on) Rails, Excel, Android, Boost, Cocoa, ASP and AJAX are not considered programming languages for the index.
- The programming language should be Turing complete. As a consequence, HTML and XML are not considered programming languages. The same holds for the data query language SQL: it is not a programming language because, for instance, it is impossible to write an infinite loop in it. On the other hand, the SQL extensions PL/SQL and Transact-SQL are programming languages.
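The two criteria amount to a simple conjunction. The sketch below is a minimal illustration, assuming the facts about each candidate (its Wikipedia entry, how that entry describes it, and whether it is Turing complete) have already been determined by hand; `Candidate` and `counts_as_language` are hypothetical names, not part of any TIOBE tooling.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of what is known about a candidate language."""
    name: str
    has_wikipedia_entry: bool              # criterion 1: its own Wikipedia entry...
    entry_says_programming_language: bool  # ...that clearly calls it a programming language
    turing_complete: bool                  # criterion 2


def counts_as_language(c: Candidate) -> bool:
    """Both criteria must hold for a language to be tracked."""
    return (c.has_wikipedia_entry
            and c.entry_says_programming_language
            and c.turing_complete)


# Illustrative inputs: the text excludes SQL on criterion 2 and accepts PL/SQL;
# the criterion-1 flags are assumed to hold for both.
sql = Candidate("SQL", True, True, False)
plsql = Candidate("PL/SQL", True, True, True)
print(counts_as_language(sql), counts_as_language(plsql))  # False True
```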
The following languages are tracked by the TIOBE index:
- (Visual) Basic
- (Visual) FoxPro
- 4th Dimension/4D
- ABAP
- ABC
- ActionScript
- Ada
- Agilent VEE
- Algol
- Alice
- Angelscript
- Apex
- APL
- AppleScript
- Arc
- AspectJ
- Assembly
- ATLAS
- AutoIt
- AutoLISP
- Automator
- Avenue
- Awk
- Bash
- bc
- BCPL
- BETA
- BlitzMax
- Boo
- Bourne Shell
- C
- C Shell
- C#
- C++
- C++/CLI
- C-Omega
- Caml
- CFML
- cg
- Ch
- CHILL
- CIL
- CL (OS/400)
- Clarion
- Clean
- Clipper
- Clojure
- CLU
- COBOL
- Cobra
- CoffeeScript
- COMAL
- Common Lisp
- cT
- Curl
- D
- Dart
- DCL
- Delphi/Object Pascal
- DiBOL
- Dylan
- E
- ECMAScript
- EGL
- Eiffel
- Emacs Lisp
- Erlang
- Etoys
- Euphoria
- EXEC
- F#
- Factor
- Falcon
- Fantom
- Felix
- Forth
- Fortran
- Fortress
- Gambas
- GNU Octave
- Go
- Gosu
- Groovy
- Haskell
- Haxe
- Heron
- HPL
- HyperTalk
- Icon
- IDL
- Inform
- Informix-4GL
- INTERCAL
- Io
- Ioke
- J
- J#
- JADE
- Java
- JavaFX Script
- JavaScript
- JScript
- JScript.NET
- Korn Shell
- LabVIEW
- Ladder Logic
- Lasso
- Limbo
- Lingo
- Lisp
- Logo
- LotusScript
- LPC
- Lua
- Lustre
- M4
- MAD
- Magic
- Magik
- Malbolge
- MANTIS
- Maple
- Mathematica
- MATLAB
- Max/MSP
- MAXScript
- MEL
- Mercury
- Miva
- ML
- Monkey
- Modula-2
- Modula-3
- MOO
- Moto
- MS-DOS Batch
- MUMPS
- NATURAL
- Nemerle
- NQC
- NSIS
- NXT-G
- Oberon
- Object Rexx
- Objective-C
- OCaml
- Occam
- OpenCL
- OpenEdge ABL
- OPL
- Oz
- Paradox
- Pascal
- Perl
- PHP
- Pike
- PILOT
- PL/I
- PL/SQL
- Pliant
- PostScript
- POV-Ray
- PowerBasic
- PowerScript
- PowerShell
- Processing
- Prolog
- Pure Data
- Python
- Q
- R
- Racket
- REALBasic
- REBOL
- Revolution
- REXX
- RPG (OS/400)
- Ruby
- Rust
- S
- S-PLUS
- SAS
- Sather
- Scala
- Scheme
- Scratch
- sed
- Seed7
- SIGNAL
- Simula
- Simulink
- Slate
- Smalltalk
- Smarty
- SPARK
- SPSS
- SQR
- Squeak
- Squirrel
- Standard ML
- Suneido
- SuperCollider
- TACL
- Tcl
- TeX
- thinBasic
- TOM
- Transact-SQL
- TypeScript
- Vala/Genie
- VBScript
- Verilog
- VHDL
- Visual Basic .NET
- WebDNA
- Whitespace
- X10
- xBase
- XBase++
- Xen
- XPL
- XSLT
- yacc
- Yorick
- Z shell
Ratings
The ratings are calculated by counting hits on the most popular search engines. The search query
used is
+"<language> programming"
This search query is executed on the top 9 websites of Alexa that meet the following conditions:
- The entry page of the site contains a search facility
- The result of querying the site contains an indication of the number of page hits
Based on these criteria, the following search engines are currently used:
- Google: 30%
- Blogger: 30%
- Wikipedia: 15%
- YouTube: 9%
- Baidu: 6%
- Yahoo!: 3%
- Bing: 3%
- Amazon: 3%
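The site-selection rule can be sketched as a filter over a popularity ranking. This is a minimal illustration, assuming each site is described by a hypothetical `Site` record holding its rank and the two properties listed above; `select_search_engines` is a hypothetical helper and the site names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of a site in a popularity ranking."""
    name: str
    rank: int                       # 1 = most popular
    has_search_on_entry_page: bool  # condition 1
    reports_hit_count: bool         # condition 2


def select_search_engines(sites: list[Site], k: int = 9) -> list[str]:
    """Return the names of the k highest-ranked sites meeting both conditions."""
    eligible = [s for s in sorted(sites, key=lambda s: s.rank)
                if s.has_search_on_entry_page and s.reports_hit_count]
    return [s.name for s in eligible[:k]]


sites = [
    Site("site-a", 1, True, True),
    Site("site-b", 2, True, False),  # no hit count shown: excluded
    Site("site-c", 3, True, True),
]
print(select_search_engines(sites, k=2))  # ['site-a', 'site-c']
```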
The number of hits determines the ratings of a language. The counted hits are normalized for each search
engine for the first 50 languages. In other words, the first 50 languages together have a score of 100%.
Let's define "hits50(SE)" as the sum of the number of hits for the first 50 languages for search engine
SE, and "hits(PL,SE)" as the number of hits for programming language PL for search engine SE. Possible false positives for a query are already filtered out in the definition of "hits(PL,SE)". This is done by using a manually determined confidence factor per query. For example, a query such as "Basic programming" also returns pages that contain "Improve your basic programming skills in Java". The first 100 pages per search engine are checked for false positives, and the result is used to define the confidence factor. If this factor is 90%, only 90% of the hits are used for "hits(PL,SE)". An overview of the confidence factors can be found in the groupings table below.
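The effect of the confidence factor can be illustrated with a short sketch. It assumes the factor is simply the fraction of manually checked result pages that are genuine hits; the text only says that the first 100 pages are checked and used to define the factor, so this formula, the function names and the hit count are illustrative assumptions.

```python
def estimate_confidence(checked_pages: list[bool]) -> float:
    """Assumed estimate: fraction of manually checked pages that are genuine hits.

    One verdict per inspected page (True = really about the language).
    """
    if not checked_pages:
        return 1.0  # the default confidence is 100%
    return sum(checked_pages) / len(checked_pages)


def adjusted_hits(raw_hits: int, confidence: float) -> float:
    """hits(PL, SE): the raw hit count scaled by the confidence factor."""
    return raw_hits * confidence


verdicts = [True] * 90 + [False] * 10  # 10 of the first 100 pages are false positives
c = estimate_confidence(verdicts)      # 0.9
print(adjusted_hits(1_000_000, c))     # 900000.0
```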
The ratings are calculated with the following formula:
$$\mathrm{rating}(PL) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathrm{hits}(PL, SE_i)}{\mathrm{hits50}(SE_i)}$$
where n is the number of search engines used.
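The formula can be sketched in Python as follows, assuming the input already holds the confidence-adjusted hits(PL,SE) of the first 50 languages for each engine. The engine names, language names and numbers are hypothetical, and the example uses three languages instead of 50 to stay short. Like the formula, the sketch gives every engine equal weight.

```python
def ratings(hits: dict[str, dict[str, float]]) -> dict[str, float]:
    """Ratings in percent from hits[engine][language] = hits(PL, SE)."""
    n = len(hits)  # number of search engines
    # hits50(SE): total hits of the first 50 languages on each engine
    hits50 = {se: sum(per_lang.values()) for se, per_lang in hits.items()}
    languages = {pl for per_lang in hits.values() for pl in per_lang}
    # Normalize per engine, then average over the n engines
    return {
        pl: 100.0 * sum(hits[se].get(pl, 0.0) / hits50[se] for se in hits) / n
        for pl in languages
    }


example_hits = {
    "engine-1": {"A": 60, "B": 30, "C": 10},
    "engine-2": {"A": 200, "B": 200, "C": 600},
}
r = ratings(example_hits)
print({pl: round(v, 2) for pl, v in sorted(r.items())})
# {'A': 40.0, 'B': 25.0, 'C': 35.0}
```

Because each engine's counts are divided by hits50(SE), the ratings of the languages in the input always add up to 100%, matching the statement that the first 50 languages together have a score of 100%.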
Status
Besides the rating of programming languages, there is also a status indicated in the TIOBE chart.
Programming languages that have status "A" are considered to be mainstream languages. Status "A-" and
"A--" indicate that a programming language is between status "A" and "B". If a programming language
has a rating higher than 0.7% (yes, this number is arguable, but we had to fix it
somewhere) for at least 3 months, it is awarded status "A". During the first two months, the programming
language receives status "A--" and "A-", respectively. The opposite holds for languages that go from
status "A" to status "B". So if a language had status "A" 2 months ago, a rating of 0.607% last month
and a rating of 0.687% now, it will have status "A--".
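One way to encode these rules is to count how many of the last three monthly ratings exceed the threshold. This is an assumed reading of the text: it reproduces the upward sequence ("A--", "A-", "A"), the mirrored downward sequence and the worked example, but the text does not say how non-monotone sequences (for example above, below, above) are handled. In the example, the rating two months ago (0.8%) is a made-up value consistent with the language having status "A" at that time.

```python
THRESHOLD = 0.7  # percent


def status(last_three_ratings: list[float]) -> str:
    """Status from the ratings (in %) of the last three months, oldest first.

    Assumed rule: count the months strictly above the threshold,
    3 -> "A", 2 -> "A-", 1 -> "A--", 0 -> "B".
    """
    if len(last_three_ratings) != 3:
        raise ValueError("expected exactly three monthly ratings")
    months_above = sum(r > THRESHOLD for r in last_three_ratings)
    return {3: "A", 2: "A-", 1: "A--", 0: "B"}[months_above]


# Worked example from the text: status "A" two months ago,
# then 0.607% last month and 0.687% now.
print(status([0.8, 0.607, 0.687]))  # A--
```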
From a supportability point of view, it is strongly advised to stick to mainstream languages
for industrial, mission-critical software systems. This is for three reasons:
- The pool of skilled engineers is much smaller for non-mainstream languages
- Tool vendors do not write and maintain tools for non-mainstream languages
- In general, fewer libraries are available for non-mainstream languages
It is important to note that this is only one of many criteria to consider before deciding to
adopt a language. Other criteria are suitability for the application domain, reliability of compilers,
expressive power, performance, and scalability. Hence, Ada can still be used for mission-critical
systems, although one should consider alternatives. This is also what you see in daily practice: Ada is
hardly used for new mission-critical systems anymore. The reverse is also true: being mainstream is not
enough. Everybody will agree that it is not wise to program missile software in JavaScript.
Groupings and Exceptions
Programming languages that are very similar are grouped together. Currently, the rating of a grouping
is calculated from the maximum of the hits of its individual languages. In the future we intend to do
a better job and take the union (in the set-theoretic sense) of all the hits.
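The difference between the current and the planned method can be shown with sets of result pages. The sketch is illustrative: the page identifiers are hypothetical, and it assumes a search engine could return the set of matching pages rather than only a hit count. The aliases are those of the Objective-C grouping in the table below.

```python
def grouped_hits_max(hits_per_alias: dict[str, int]) -> int:
    """Current method: the grouping gets the largest hit count of its members."""
    return max(hits_per_alias.values())


def grouped_hits_union(pages_per_alias: dict[str, set[str]]) -> int:
    """Planned method: count each distinct page once across all members."""
    return len(set().union(*pages_per_alias.values()))


pages = {
    "Objective-C": {"p1", "p2", "p3"},
    "objc": {"p2", "p4"},
    "Obj-C": {"p3"},
}
print(grouped_hits_max({k: len(v) for k, v in pages.items()}))  # 3
print(grouped_hits_union(pages))                                # 4
```

The maximum misses pages found only through a less common alias (p4 here), while simply adding the counts would count pages that match several aliases more than once; the union counts each page exactly once.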
The definition of which languages are grouped has been formalized according to the following
rules:
- If a language has its own Wikipedia entry, it will not be grouped with another language.
- If a language A automatically redirects to another Wikipedia entry B, A will be grouped together with B.
- If a language A has no separate Wikipedia entry but is mentioned as part of another Wikipedia entry B, A will be grouped together with B.
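The three rules can be sketched as a lookup. The sketch assumes the Wikipedia facts have been collected by hand into three hypothetical inputs (languages with their own entry, redirects, and mentions), and it uses the placeholder names LangA, LangB and LangC rather than real Wikipedia data. Treating a language covered by none of the rules as standing alone is also an assumption.

```python
def group_for(language: str,
              own_entries: set[str],
              redirects: dict[str, str],
              mentioned_in: dict[str, str]) -> str:
    """Return the name of the grouping a language ends up in."""
    if language in own_entries:
        return language                # rule 1: never grouped with another language
    if language in redirects:
        return redirects[language]     # rule 2: grouped with the redirect target
    if language in mentioned_in:
        return mentioned_in[language]  # rule 3: grouped with the entry that mentions it
    return language                    # assumption: otherwise it stands alone


own = {"LangB"}
redirects = {"LangA": "LangB"}
mentioned_in = {"LangC": "LangB"}
for name in ("LangA", "LangB", "LangC"):
    print(name, "->", group_for(name, own, redirects, mentioned_in))
```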
In order to filter out false positives, two mechanisms are used. First, a confidence
is defined for each language. By default the confidence is 100%, but for some
difficult search queries such as "Basic programming", the confidence is lower.
Second, exceptions or mandatory additions are sometimes used to weed out false
positives.
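A sketch of how a query might be assembled for one language. The base query +"<language> programming" comes from the definition above; rendering exceptions as excluded terms (`-"term"`) and additions as required terms (`+"term"`) is an assumption about how they are applied, not something the text specifies, and `build_query` is a hypothetical helper. The examples use the ABC and R entries from the table below.

```python
def build_query(language: str,
                exceptions: tuple[str, ...] = (),
                additions: tuple[str, ...] = ()) -> str:
    """Assemble a search query: base phrase, required additions, excluded exceptions."""
    parts = [f'+"{language} programming"']
    parts += [f'+"{term}"' for term in additions]   # assumed: additions must appear
    parts += [f'-"{term}"' for term in exceptions]  # assumed: exceptions must not appear
    return " ".join(parts)


print(build_query("ABC", exceptions=("tv", "channel")))
# +"ABC programming" -"tv" -"channel"
print(build_query("R", additions=("statistical",)))
# +"R programming" +"statistical"
```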
The following table contains the definitions of all groupings, confidences, exceptions
and additions. An empty confidence cell means the default confidence of 100%.
| Name | Confidence | Exception/Addition/Grouping |
|---|---|---|
| ABC | | Exception: tv, channel |
| ActionScript | | Grouping: ActionScript, AS1, AS2, AS3 |
| Alice | 90% | |
| ATLAS | | Grouping: ATLAS, C/ATLAS |
| Awk | | Grouping: awk, gawk, mawk, nawk |
| BETA | 70% | |
| BlitzMax | | Grouping: BlitzMax, BlitzBasic, Blitz Basic |
| Bourne shell | | Grouping: Bourne shell, sh |
| C | | Exception: Objective-C |
| C shell | 90% | Grouping: csh, C shell |
| C# | | Grouping: C#, C-Sharp, C Sharp, CSharp, CSharp.NET, C#.NET |
| CFML | | Grouping: CFML, ColdFusion |
| cg | 80% | Exception: computer game |
| CH | | Addition: ChScite |
| CL (OS/400) | | Exception: Lisp; Grouping: CL, CLLE |
| Cobra | | Exception: interface |
| D | 90% | Exception: 3-D Programming, DTrace |
| Delphi/Object Pascal | | Grouping: Delphi, Delphi.NET, Object Pascal |
| DiBOL | | Grouping: DBL, DIBOL, Synergy/DE |
| Emacs Lisp | | Grouping: Emacs Lisp, Elips |
| F# | | Grouping: F#, F-Sharp, F Sharp, FSharp |
| Go | | Grouping: Go (Addition: Google), golang |
| Groovy | | Grouping: Groovy, GPATH, GSQL, Groovy++ |
| Icon | 90% | |
| IDL | | Exception: corba, interface |
| JavaScript | | Grouping: JavaScript, JS, SSJS |
| Lisp | | Grouping: Lisp, Elisp |
| Logo | 96% | Exception: tv |
| MAD | 50% | |
| Objective-C | | Grouping: Objective-C, objc, Obj-C |
| OCaml | | Grouping: Objective Caml, OCaml |
| OpenEdge ABL | | Grouping: Progress, Progress 4GL, ABL, Advanced Business Language, OpenEdge |
| PILOT | 50% | |
| PL/I | | Grouping: PL/1, PL/I |
| PostScript | | Grouping: PostScript, PS |
| Processing | | Addition: Sketchbook |
| Pure Data | | Grouping: Pure Data, PD |
| R | | Addition: statistical |
| Revolution | | Grouping: LiveCode, Revolution |
| RPG | 80% | Exception: role; Grouping: RPG, ILERPG, RPGIV, RPGIII, RPGLE, RPG400, RPGII, RPG4 |
| S | | Addition: statistical |
| S-PLUS | | Addition: statistical |
| Scheme | | Exception: tv, channel |
| Standard ML | | Grouping: Standard ML, SML |
| T-SQL | | Grouping: T-SQL, Transact-SQL, TSQL |
| Tcl/Tk | | Grouping: Tcl/Tk, Tcl |
| Tom | 50% | |
| (Visual) Basic | 85% | Grouping: Basic, VB |
| Visual Basic .NET | | Grouping: Visual Basic .NET, Visual Basic.NET, VB.NET |
| (Visual) FoxPro | | Grouping: FoxPro, Fox Pro, VFP |
| Z shell | | Grouping: Z shell, zsh |
Comments or ideas on improving the calculation of the TIOBE index are gratefully received (tpci@tiobe.com).