Languages
We identify each blog's language statistically: our algorithm decides what language a blog is in by looking at its text content, not at any language attributes in the markup. Weblogs with fewer than 500 bytes of text content are not included in this list.
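To give a feel for content-based identification, here is a minimal sketch using character trigram profiles compared by cosine similarity. This is not necessarily the algorithm we run: the trigram technique, the tiny training sentences, and the names `TRAINING`, `trigrams`, `cosine`, and `identify` are all assumptions made for illustration. The 500-byte threshold comes from the text above; reusing the `Too_short` label from the table for texts under that threshold is also an assumption.

```python
from collections import Counter
import math

MIN_BYTES = 500  # threshold stated in the text above

# Hypothetical, deliberately tiny training samples (ASCII only for simplicity).
# A real system would build its profiles from large corpora per language.
TRAINING = {
    "English": "the quick brown fox jumps over the lazy dog and this is "
               "what the people of the town were saying that day",
    "French": "le renard brun rapide saute par dessus le chien paresseux "
              "et voici ce que les gens de la ville disaient ce jour",
    "Spanish": "el rapido zorro marron salta sobre el perro perezoso "
               "y esto es lo que la gente del pueblo decia aquel dia",
}


def trigrams(text):
    """Count overlapping character trigrams after normalising whitespace and case."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


PROFILES = {lang: trigrams(sample) for lang, sample in TRAINING.items()}


def identify(text):
    """Return the best-matching language, or "Too_short" below MIN_BYTES.

    Only the text content is inspected; markup attributes are never consulted.
    """
    if len(text.encode("utf-8")) < MIN_BYTES:
        return "Too_short"  # assumed label, borrowed from the table
    profile = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))
```

Calling `identify` on a short string such as `"hello"` returns `"Too_short"`, while a sufficiently long passage is assigned whichever of the sample languages its trigram profile most resembles.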
How reliable is our algorithm? We're currently doing statistical sampling to find out, and will post results in this table when they are ready.
| Type | Count | |
|---|---|---|
| English | 1958443 | |
| Too_short | 284749 | |
| Catalan | 123320 | |
| French | 83950 | |
| Spanish | 80509 | |
| Portuguese | 71561 | |
| German | 35870 | |
| Italian | 26659 | |
| Chinese-big5 | 25123 | |
| Farsi | 19730 | |
| Chinese-gb2312 | 19324 | |
| Japanese | 18576 | |
| Dutch | 13133 | |
| Danish | 9870 | |
| Indonesian | 8831 | |
| Malay | 6658 | |
| Japanese-euc_jp | 5413 | |
| Swedish | 5267 | |
| Czech | 5089 | |
| Icelandic | 3776 | |
| Tagalog | 3608 | |
| Finnish | 3326 | |
| Turkish | 2817 | |
| Esperanto | 2803 | |
| Slovak-ascii | 2592 | |
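The reliability check mentioned above could be approached roughly as follows: draw a random sample of blogs classified under one label, have a person verify each, and report the share that were correct along with a confidence interval. This is only a sketch under assumptions; the helper names `sample_blogs` and `wilson_interval`, the choice of the Wilson score interval, and the sample size are ours for illustration, and the counts in the usage note are hypothetical, not measured results.

```python
import math
import random


def sample_blogs(blog_ids, k, seed=0):
    """Draw k distinct blog ids uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(blog_ids, k)


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

For example, if a reviewer checked 100 sampled blogs and found 95 correctly labelled (hypothetical numbers), `wilson_interval(95, 100)` would give the range to report next to that language's count.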
