TIOBE Programming Community Index Definition

Because there are many questions about how the TIOBE index is assembled, this page is devoted to its definition. In essence, the calculation comes down to counting hits for the search query

+"<language> programming"

The following sections explain which search engines qualify, which programming languages qualify, and exactly how the ratings are calculated.

Search Engines

There are 25 search engines used to calculate the TIOBE index. They are the 25 highest-ranked websites on Alexa that meet the following conditions:

  • The entry page of the site contains a search facility
  • The result of querying the site contains an indication of the number of page hits
  • The results should be available in HTML with clear tags
  • Search engines in languages with special characters should encode their results properly
  • The search engine should return at least 1 hit for at least 1 query
  • The results of querying the site shouldn't contain too many outliers
  • Porn sites are excluded

Based on these criteria, the following search engines qualify:

  • Google.com: 7.69%
  • Youtube.com: 7.38%
  • Baidu.com: 7.08%
  • Yahoo.com: 6.77%
  • Wikipedia.org: 6.46%
  • Amazon.com: 6.15%
  • Qq.com: 5.85%
  • Google.co.in: 5.54%
  • Bing.com: 5.23%
  • Google.co.jp: 4.92%
  • Msn.com: 4.62%
  • Google.de: 4.31%
  • Hao123.com: 4.00%
  • Ebay.com: 3.69%
  • Google.co.uk: 3.38%
  • Amazon.co.jp: 3.08%
  • Google.com.br: 2.77%
  • Google.fr: 2.46%
  • Google.it: 2.15%
  • Google.es: 1.85%
  • Google.com.mx: 1.54%
  • Google.ca: 1.23%
  • Amazon.de: 0.92%
  • Google.pl: 0.62%
  • Google.co.id: 0.31%
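
This page does not explain the percentages in the list above. One observation, offered as an assumption rather than a documented rule, is that they match a linear rank weighting: the site at rank r out of 25 gets (26 - r)/325, where 325 = 1 + 2 + ... + 25. A minimal Python sketch that reproduces the listed values (the function name `rank_weights` is hypothetical):

```python
def rank_weights(n: int) -> list[float]:
    """Linear rank weights in percent: rank 1 gets n/total, rank n gets 1/total.

    Assumption: this scheme is inferred from the listed percentages; the
    page itself does not say how they were derived.
    """
    total = n * (n + 1) // 2  # 1 + 2 + ... + n
    return [round(100 * (n - rank + 1) / total, 2) for rank in range(1, n + 1)]


weights = rank_weights(25)
# First, middle and last entries, to compare against Google.com,
# Hao123.com and Google.co.id in the list above.
print(weights[0], weights[12], weights[-1])
```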

The following search engines didn't qualify for the indicated reason:

  • 163.com: NO_SEARCH_FIELD
  • 360.cn: SITE_TIMED_OUT
  • About.com: SOURCES_NOT_PARSABLE
  • Adf.ly: NO_SEARCH_FIELD
  • Adnetworkperformance.com: NO_SEARCH_FIELD
  • Adobe.com: NOT_ALLOWED_TO_ACCESS
  • Akamaihd.net: NO_WEBSITE
  • Alibaba.com: NO_RESULTS_AT_ALL
  • Aliexpress.com: NO_SEARCH_FIELD
  • Alipay.com: NO_SEARCH_FIELD
  • Apple.com: NO_RESULTS_AT_ALL
  • Ask.com: NO_COUNTERS
  • Bbc.co.uk: NO_COUNTERS
  • Blogger.com: NO_SEARCH_FIELD
  • Blogspot.com: NO_SEARCH_FIELD
  • Bongacams.com: PORN_SITE
  • Booking.com: NO_SEARCH_FIELD
  • Chinadaily.com.cn: SOURCES_NOT_PARSABLE
  • Cnn.com: SOURCES_NOT_PARSABLE
  • Cntv.cn: SOURCES_NOT_PARSABLE
  • Craigslist.org: NO_SEARCH_FIELD
  • Diply.com: NO_SEARCH_FIELD
  • Dropbox.com: NO_SEARCH_FIELD
  • Ebay.co.uk: SOURCES_NOT_PARSABLE
  • Ebay.de: SOURCES_NOT_PARSABLE
  • Espn.go.com: ENCODING_PROBLEM
  • Facebook.com: NO_SEARCH_FIELD
  • Fc2.com: NO_SEARCH_FIELD
  • Flickr.com: NO_COUNTERS
  • Flipkart.com: SOURCES_NOT_PARSABLE
  • Github.com: NO_RESULTS_AT_ALL
  • Gmw.cn: NO_SEARCH_FIELD
  • Go.com: NO_SEARCH_FIELD
  • Godaddy.com: NO_SEARCH_FIELD
  • Google.co.kr: SOURCES_NOT_PARSABLE
  • Google.com.hk: SOURCES_NOT_PARSABLE
  • Google.com.tr: SOURCES_NOT_PARSABLE
  • Google.com.tw: SOURCES_NOT_PARSABLE
  • Google.ru: SOURCES_NOT_PARSABLE
  • Googleusercontent.com: NO_WEBSITE
  • Huffingtonpost.com: NOT_ALLOWED_TO_ACCESS
  • Imdb.com: NO_COUNTERS
  • Imgur.com: NO_COUNTERS
  • Indiatimes.com: SOURCES_NOT_PARSABLE
  • Instagram.com: NO_SEARCH_FIELD
  • Jd.com: NO_RESULTS_AT_ALL
  • Kat.cr: SOURCES_NOT_PARSABLE
  • Linkedin.com: SOURCES_NOT_PARSABLE
  • Live.com: NO_SEARCH_FIELD
  • Mail.ru: NO_COUNTERS
  • Microsoft.com: NO_COUNTERS
  • Naver.com: SOURCES_NOT_PARSABLE
  • Netflix.com: NO_SEARCH_FIELD
  • Nicovideo.jp: NO_RESULTS_AT_ALL
  • Office.com: NO_SEARCH_FIELD
  • Ok.ru: NO_SEARCH_FIELD
  • Onclickads.net: NO_SEARCH_FIELD
  • Outbrain.com: NO_SEARCH_FIELD
  • Paypal.com: NO_SEARCH_FIELD
  • Pinterest.com: NO_SEARCH_FIELD
  • Pixnet.net: SOURCES_NOT_PARSABLE
  • Popads.net: NO_SEARCH_FIELD
  • Pornhub.com: PORN_SITE
  • Rakuten.co.jp: SOURCES_NOT_PARSABLE
  • Reddit.com: NO_COUNTERS
  • Sina.com.cn: NO_COUNTERS
  • Sohu.com: NO_SEARCH_FIELD
  • Soso.com: ENCODING_PROBLEM
  • Stackoverflow.com: SOURCES_NOT_PARSABLE
  • T.co: NO_SEARCH_FIELD
  • Taobao.com: SOURCES_NOT_PARSABLE
  • Terraclicks.com: NO_SEARCH_FIELD
  • Tianya.cn: ENCODING_PROBLEM
  • Tmall.com: ENCODING_PROBLEM
  • Tumblr.com: NO_COUNTERS
  • Twitter.com: NO_SEARCH_FIELD
  • Vk.com: NO_SEARCH_FIELD
  • Walmart.com: NO_RESULTS_AT_ALL
  • Weibo.com: NO_SEARCH_FIELD
  • Whatsapp.com: NO_SEARCH_FIELD
  • Wordpress.com: NO_SEARCH_FIELD
  • Wordpress.org: SOURCES_NOT_PARSABLE
  • Xhamster.com: PORN_SITE
  • Xinhuanet.com: NO_SEARCH_FIELD
  • Xvideos.com: PORN_SITE
  • Yahoo.co.jp: SOURCES_NOT_PARSABLE
  • Yandex.ru: NO_COUNTERS
  • Youku.com: NO_COUNTERS
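
The reason codes above can be read as the outcome of checking the qualifying conditions one by one. The sketch below is an illustration only: the `SiteProbe` fields, the order of the checks, and the mapping from each condition to a code are assumptions. Codes such as SITE_TIMED_OUT, NOT_ALLOWED_TO_ACCESS and NO_WEBSITE, which describe access failures rather than the listed conditions, are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteProbe:
    """Hypothetical record of what was observed when probing one site."""
    is_porn_site: bool = False
    has_search_field: bool = True
    shows_hit_count: bool = True
    html_parsable: bool = True
    encoding_ok: bool = True
    returned_any_hit: bool = True


def disqualification_reason(probe: SiteProbe) -> Optional[str]:
    """Return the first failed condition as a reason code, or None if the site qualifies."""
    checks = [
        (probe.is_porn_site, "PORN_SITE"),
        (not probe.has_search_field, "NO_SEARCH_FIELD"),
        (not probe.shows_hit_count, "NO_COUNTERS"),
        (not probe.html_parsable, "SOURCES_NOT_PARSABLE"),
        (not probe.encoding_ok, "ENCODING_PROBLEM"),
        (not probe.returned_any_hit, "NO_RESULTS_AT_ALL"),
    ]
    for failed, code in checks:
        if failed:
            return code
    return None


print(disqualification_reason(SiteProbe()))
print(disqualification_reason(SiteProbe(has_search_field=False)))
```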

Programming Languages

This section clarifies what counts as a programming language for the TIOBE index. All 3 of the following requirements must hold:

  • The language should have its own entry on Wikipedia, and Wikipedia should clearly state that it is a programming language. This is why (Ruby on) Rails, Excel, Android, Boost, Cocoa, ASP and AJAX are not considered programming languages for the index.
  • The programming language should be Turing complete. As a consequence, HTML and XML are not considered programming languages. The same holds for the data query language SQL: it is not a programming language because it is, for instance, impossible to write an infinite loop in it. On the other hand, the SQL extensions PL/SQL and Transact-SQL are programming languages.
  • The programming language should have at least 5,000 hits for +"<language> programming" on Google.
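
The three requirements combine with a logical AND. A minimal sketch, assuming the Wikipedia and Turing-completeness judgments are made by hand and passed in as booleans (the `Candidate` type, its field names and the example values are hypothetical):

```python
from dataclasses import dataclass

MIN_GOOGLE_HITS = 5_000  # threshold stated in the third requirement


@dataclass
class Candidate:
    name: str
    wikipedia_says_language: bool  # own entry that calls it a programming language
    turing_complete: bool
    google_hits: int  # hits for +"<name> programming" on Google


def qualifies(c: Candidate) -> bool:
    """All three requirements must hold."""
    return (
        c.wikipedia_says_language
        and c.turing_complete
        and c.google_hits >= MIN_GOOGLE_HITS
    )


# Made-up candidates and hit counts, for illustration only.
print(qualifies(Candidate("LangA", True, True, 120_000)))
print(qualifies(Candidate("LangB", True, False, 900_000)))
print(qualifies(Candidate("LangC", True, True, 4_999)))
```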

Programming languages that are very similar are grouped together. Currently, the rating of a grouping is based on the maximum of the hit counts of its individual languages. In the future we will do a better job and take the union (in the set-theoretic sense) of all the hits.
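
To see how the maximum and the union differ, consider a grouping whose members match partly overlapping sets of pages. The page identifiers below are invented for illustration; search engines generally report only counts, so this is a conceptual sketch rather than a description of how the index is computed.

```python
# Hypothetical sets of matching page IDs for two terms in one grouping.
pages = {
    "Delphi": {1, 2, 3, 4, 5},
    "Object Pascal": {4, 5, 6, 7},
}

max_hits = max(len(ids) for ids in pages.values())  # current approach
union_hits = len(set().union(*pages.values()))      # planned approach
sum_hits = sum(len(ids) for ids in pages.values())  # would double-count pages 4 and 5

print(max_hits, union_hits, sum_hits)  # 5 7 9
```

The maximum undercounts whenever the members match different pages, while a plain sum overcounts pages that match more than one member; the union sits between the two.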

Which languages are grouped together is determined by the following rules:

  • If a language has its own Wikipedia entry it will not be grouped with another language.
  • If a language A automatically redirects to another Wikipedia entry B, A will be grouped together with B.
  • If a language A has no separate Wikipedia entry but is mentioned as part of another Wikipedia entry B, A will be grouped together with B.
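
These rules can be sketched as a small lookup. The `WikiStatus` structure is hypothetical and stands in for a manual check of Wikipedia; it is not a real API, and the status values in the usage lines are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WikiStatus:
    """Hypothetical result of looking a term up on Wikipedia by hand."""
    has_own_entry: bool = False
    redirects_to: Optional[str] = None  # entry B that the term redirects to
    mentioned_in: Optional[str] = None  # entry B that mentions the term


def group_of(term: str, status: WikiStatus) -> Optional[str]:
    """Return the grouping a term belongs to, applying the three rules in order."""
    if status.has_own_entry:
        return term                 # rule 1: not grouped with another language
    if status.redirects_to is not None:
        return status.redirects_to  # rule 2: grouped with the redirect target
    if status.mentioned_in is not None:
        return status.mentioned_in  # rule 3: grouped with the mentioning entry
    return None                     # case not covered by the rules


print(group_of("Golang", WikiStatus(redirects_to="Go")))
print(group_of("Java", WikiStatus(has_own_entry=True)))
```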

Two mechanisms are used to filter out false positives. First, a confidence is defined for each language. By default the confidence is 100%, but for some difficult search queries such as "Basic programming" it is lower. Second, exceptions (such as -tv) or mandatory additions (such as +Google) are sometimes used to weed out false positives.
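
A sketch of how one search term might be turned into a query. The way modifiers are appended is an assumption; this page only shows the base query and the modifier terms themselves, and `build_query` is a hypothetical helper.

```python
def build_query(term: str, modifiers: tuple[str, ...] = ()) -> str:
    """Build the search query for one term.

    Assumption: exceptions such as -tv and mandatory additions such as
    +Google are appended to the base query, separated by spaces.
    """
    base = f'+"{term} programming"'
    return " ".join([base, *modifiers])


print(build_query("Java"))
print(build_query("C", ('-"Objective-C"',)))
print(build_query("Go", ("+Google",)))
```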

The following list contains all tracked programming languages, including their groupings, confidences and exceptions.

  • (Visual) FoxPro: FoxPro, Fox Pro, VFP
  • 4th Dimension/4D: 4D, 4th Dimension
  • ABAP
  • ABC: ABC (exceptions: -tv -channel)
  • ActionScript: ActionScript, AS1, AS2, AS3
  • Ada
  • Agilent VEE
  • Algol
  • Alice: Alice (confidence: 90%)
  • Angelscript
  • Apex
  • APL
  • Applescript
  • Arc
  • AspectJ
  • Assembly language: Assembly, Assembly language
  • ATLAS
  • AutoIt
  • AutoLISP
  • Automator
  • Avenue
  • Awk: Awk, Mawk, Gawk, Nawk
  • Bash
  • Basic: Basic (confidence: 0%)
  • BBC BASIC
  • bc
  • BCPL
  • BETA: BETA (confidence: 10%)
  • BlitzMax: BlitzMax, BlitzBasic, Blitz Basic
  • Boo
  • Bourne shell: Bourne shell, sh
  • C shell: Csh, C shell (confidence: 90%)
  • C#: C#, C-Sharp, C Sharp, CSharp, CSharp.NET, C#.NET
  • C++
  • C++/CLI
  • C-Omega
  • C: C (exceptions: -"Objective-C")
  • Caml
  • Ceylon
  • CFML: CFML, ColdFusion
  • cg: cg (confidence: 80%, exceptions: -"computer game" -"computer graphics")
  • Ch: Ch (exceptions: +ChScite)
  • CHILL
  • CIL
  • CL (OS/400): CL (exceptions: -Lisp), CLLE
  • Clarion
  • Clean: Clean (confidence: 43%)
  • Clipper
  • Clojure
  • CLU
  • COBOL
  • Cobra
  • CoffeeScript
  • COMAL
  • Common Lisp
  • cT
  • Curl
  • D: D (confidence: 90%, exceptions: -"3-D programming" -"DTrace"), dlang
  • Dart
  • DCL
  • Delphi/Object Pascal: Delphi, Delphi.NET, DwScript, Object Pascal, Pascal (confidence: 95%)
  • DiBOL: DBL, Synergy/DE, DIBOL
  • Dylan
  • E: E (exceptions: +specman)
  • ECMAScript
  • EGL
  • Eiffel
  • Elixir
  • Elm
  • Emacs Lisp: Emacs Lisp, Elisp
  • Erlang
  • Etoys
  • Euphoria
  • EXEC
  • F#: F#, F-Sharp, FSharp, F Sharp
  • Factor
  • Falcon
  • Fantom
  • Felix: Felix (confidence: 86%)
  • Forth
  • Fortran
  • Fortress
  • Gambas
  • GNU Octave
  • Go: Go (exceptions: +Google), Golang
  • Gosu
  • Groovy: Groovy, GPATH, GSQL, Groovy++
  • Hack
  • Haskell
  • Haxe
  • Heron
  • HPL
  • HyperTalk
  • Icon: Icon (confidence: 90%)
  • IDL: IDL (exceptions: -corba -interface)
  • Inform
  • Informix-4GL
  • INTERCAL
  • Io
  • Ioke
  • J#
  • J: J (confidence: 50%)
  • JADE
  • Java
  • JavaFX Script
  • JavaScript: JavaScript, JS, SSJS
  • JScript
  • JScript.NET
  • Julia
  • Korn shell: Korn shell, ksh
  • Kotlin
  • LabVIEW
  • Ladder Logic
  • Lasso
  • Limbo
  • Lingo
  • Lisp
  • LiveCode: Revolution, LiveCode
  • Logo: Logo (confidence: 90%, exceptions: -tv)
  • LotusScript
  • LPC
  • Lua
  • Lustre
  • M4
  • MAD: MAD (confidence: 50%)
  • Magic: Magic (confidence: 50%)
  • Magik
  • Malbolge
  • MANTIS
  • Maple
  • Mathematica: Mathematica, Wolfram
  • MATLAB
  • Max/MSP
  • MAXScript
  • MDX
  • MEL
  • Mercury
  • Miva
  • ML
  • Modula-2
  • Modula-3
  • Monkey
  • MOO
  • Moto
  • MQL4: MQL4, MQL5
  • MS-DOS batch
  • MUMPS
  • NATURAL
  • Nemerle
  • NQC
  • NSIS
  • NXT-G
  • Oberon
  • Object Rexx
  • Objective-C: Objective-C, objc, obj-c
  • OCaml: Objective Caml, OCaml
  • Occam
  • OpenCL
  • OpenEdge ABL: Progress, Progress 4GL, ABL, Advanced Business Language, OpenEdge
  • OPL
  • Oxygene
  • Oz
  • Paradox
  • Pascal: Pascal (confidence: 5%)
  • Perl
  • PHP
  • Pike
  • PILOT: PILOT (confidence: 50%, exceptions: -"Palm Pilot programming")
  • PL/I: PL/1, PL/I
  • PL/SQL
  • Pliant
  • PostScript: PostScript, PS
  • POV-Ray
  • PowerBasic
  • PowerScript
  • PowerShell
  • Processing: Processing (exceptions: +"sketchbook")
  • Programming Without Coding Technology: Programming Without Coding Technology, PWCT
  • Prolog
  • Pure Data: Pure Data, PD
  • PureBasic
  • Python
  • Q
  • R: R (confidence: 90%, exceptions: +"statistical")
  • Racket
  • REBOL
  • REXX
  • RPG (OS/400): RPG (confidence: 80%, exceptions: -role), RPGLE, ILERPG, RPGIV, RPGIII, RPG400, RPGII, RPG4
  • Ruby
  • Rust
  • S-PLUS: S-PLUS (exceptions: +statistical)
  • S: S (exceptions: +statistical)
  • SAS
  • Sather
  • Scala
  • Scheme: Scheme (exceptions: -tv -channel)
  • Scratch
  • sed
  • Seed7
  • SIGNAL: SIGNAL (confidence: 10%)
  • Simula
  • Simulink
  • Slate: Slate (confidence: 57%)
  • Smalltalk
  • Smarty
  • SPARK
  • SPSS
  • SQR
  • Squeak
  • Squirrel
  • Standard ML: Standard ML, SML
  • Stata
  • Suneido
  • SuperCollider: SuperCollider (confidence: 80%)
  • Swift
  • TACL
  • Tcl: Tcl/Tk, Tcl
  • Tex
  • thinBasic
  • TOM: TOM (confidence: 50%)
  • Transact-SQL: T-SQL, Transact-SQL, TSQL
  • TypeScript
  • Vala/Genie: Vala, Genie
  • VBScript
  • Verilog
  • VHDL
  • Visual Basic .NET: Visual Basic .NET, VB.NET, Visual Basic.NET, Visual Basic (confidence: 50%), VB (confidence: 50%)
  • Visual Basic: Visual Basic (confidence: 50%), VB (confidence: 50%), VBA, VB6
  • WebDNA
  • Whitespace
  • X10
  • xBase
  • XBase++
  • Xen
  • Xojo: REALbasic, Xojo
  • XPL
  • XQuery
  • XSLT
  • Xtend
  • yacc
  • Yorick
  • Z shell: Z shell, zsh

Ratings

The ratings are calculated by counting hits of the most popular search engines. The search query that is used is

+"<language> programming"

The number of hits determines the rating of a language. The counted hits are normalized per search engine over all languages in the list, so that all languages together have a score of 100%. Let "hits(SE)" be the sum of the number of hits for all languages for search engine SE, and "hits(PL,SE)" the number of hits for programming language PL for search engine SE.

Possible false positives for a query are already filtered out in the definition of "hits(PL,SE)", using a manually determined confidence factor per query. For example, a query such as "Basic programming" also returns pages that contain "Improve your basic programming skills in Java". The first 100 pages per search engine are checked for possible false positives, and the result is used to define the confidence factor. If this factor is 90%, then only 90% of the hits are used for "hits(PL,SE)".
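
As a worked example of the confidence factor: if 10 of the first 100 checked pages turn out to be false positives, the confidence is 90% and only 90% of the raw hit count is kept. The sketch below assumes the confidence is simply the share of checked pages that are genuine; the function names and the raw hit count are hypothetical.

```python
def confidence_from_sample(checked: int, false_positives: int) -> float:
    """Share of manually checked pages that are genuine matches (assumed definition)."""
    if checked <= 0:
        raise ValueError("at least one page must be checked")
    if not 0 <= false_positives <= checked:
        raise ValueError("false_positives must be between 0 and checked")
    return (checked - false_positives) / checked


def adjusted_hits(raw_hits: int, confidence: float) -> float:
    """hits(PL,SE): the raw hit count scaled by the confidence factor."""
    return raw_hits * confidence


conf = confidence_from_sample(checked=100, false_positives=10)
print(conf, adjusted_hits(2_000_000, conf))
```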

The ratings are calculated with the following formula:

(hits(PL,SE1)/hits(SE1) + ... + hits(PL,SEn)/hits(SEn))/n

where n is the number of search engines used.
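
The formula can be sketched in Python as follows. This follows the formula as written, an unweighted mean over the search engines of each language's share of hits. The engine names and hit counts are made up, and the input is assumed to hold values of hits(PL,SE) that are already confidence-adjusted.

```python
def ratings(hits: dict[str, dict[str, float]]) -> dict[str, float]:
    """Compute ratings as fractions that sum to 1.

    hits[SE][PL] is hits(PL,SE); a language missing for an engine counts as 0.
    """
    totals = {se: sum(per_lang.values()) for se, per_lang in hits.items()}  # hits(SE)
    if not totals or any(total <= 0 for total in totals.values()):
        raise ValueError("every search engine needs a positive total hit count")

    n = len(hits)
    languages = sorted({pl for per_lang in hits.values() for pl in per_lang})
    return {
        pl: sum(hits[se].get(pl, 0.0) / totals[se] for se in hits) / n
        for pl in languages
    }


# Made-up hit counts for two engines and three languages.
example = {
    "SE1": {"Java": 600, "C": 300, "Python": 100},
    "SE2": {"Java": 200, "C": 200, "Python": 100},
}
result = ratings(example)
for pl, rating in result.items():
    print(f"{pl}: {100 * rating:.2f}%")
```

Because each engine's shares sum to 1 before averaging, the ratings of all languages together sum to 100%, as stated above.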

Artifacts or ideas on improving the calculation of the TIOBE index will be received with gratitude (tpci@tiobe.com).