Preliminary break-down of Mokum posts per language: 10% English; 49% Russian; 8% Italian; 5% Turkish; 4% Farsi; 20% others and unrecognized. Update: disregard this, here are better numbers, w/o archive posts: 6% English, 38% Russian, 17% Italian, 11% Turkish, 10% Farsi.
20% [DATA EXPUNGED]. ‎· Дошутитель из Бомбея
I have a feeling the unrecognised type could be slang used by any of the language groups... But it's a very interesting break-down ‎· Brixie
what did you use for language detection? ‎· былин
CLD, afaik (Compact Language Detector from the Chromium project, same as in this post: ‎· в сгущающейся тьме
@brixie also there are some Ukrainians here (and Ukrainian is apparently badly guessed by CLD), and even some short Russian posts are sometimes guessed as Bulgarian, Serbian or another Cyrillic-script language... ‎· в сгущающейся тьме
@brixie: I'll look at it closer, but it seems that it's just short texts with unusual words, such as "andrreas: Лиссабон -- вечер": the first word is a nickname, the second is a word of foreign origin ("Lisbon"), and only the third is a proper Russian word. So, overall it detects it as Bulgarian. o_O Fortunately, for the task at hand we are not really interested in such posts. ‎· псы в рапиде
I am particularly curious about the "unrecognized" type. ‎· maitani
2.2.4 :003 > CLD.detect_language("GIAO") => {:name=>"ENGLISH", :code=>"en", :reliable=>true} ‎· псы в рапиде
5% is really low.... how many of us are there, in number of people, dear admin? ‎· zedka
@bylin: chrome language detector, as a Ruby gem. ‎· псы в рапиде
@squadette: he means the share of Turkish users is very low, and he is asking how many Turkish users there are. ‎· ustavecirak
8%?!? Italian posts only 8%?!? Very sad day... ‎· lui
@lui: most probably that's because of archives. Update: YES, please see updated numbers. ‎· псы в рапиде
@ustavecirak the guy didn't answer you....... ‎· zedka
@zedka: and neither did I answer you. ‎· ustavecirak
The guy was downright rude to you @ustavecirak ‎· zedka
Ok @squadette, now it's better!!! ‎· lui
"How many Turkish users are there?" is what we asked..... ‎· zedka
giao belongs to us italians! ‎· inconsolabilmente Lucretia
Would it be possible to see the breakdown for more recent posts, say, written within the last week? ‎· скулптрица
Impossible. ... ‎· zedka
@elenius: yes, I'll add that to the stats page. ‎· псы в рапиде
Imay oncernedcay atthay erethay is otnay ufficientsay epresentationray orfay igpay atinlay. ‎· Spidra Webster
@spidra: {:name=>"TAGALOG", :code=>"tl", :reliable=>false} ‎· псы в рапиде
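For anyone curious how a per-language breakdown like the one at the top could be tallied from detector results, here is a rough sketch. It assumes each result is a hash shaped like the ones pasted in this thread (`:name`, `:code`, `:reliable`). The method name `language_breakdown`, the "unrecognized" bucket and the sample data are made up for illustration; this is not the actual code behind the stats.

```ruby
# Sketch only: tally language shares from detection results shaped like
# the hashes shown in this thread ({:name, :code, :reliable}).
# `language_breakdown` and the "unrecognized" bucket are hypothetical names.
def language_breakdown(results)
  return {} if results.empty?

  counts = Hash.new(0)
  results.each do |r|
    # Unreliable guesses (short posts, nicknames, loanwords) share one bucket.
    key = r[:reliable] ? r[:code] : "unrecognized"
    counts[key] += 1
  end

  total = results.size.to_f
  # Convert raw counts to whole-number percentages.
  counts.map { |lang, n| [lang, (100.0 * n / total).round] }.to_h
end

# Hand-written sample standing in for real detector output (an assumption).
sample = [
  { name: "RUSSIAN", code: "ru", reliable: true },
  { name: "RUSSIAN", code: "ru", reliable: true },
  { name: "ENGLISH", code: "en", reliable: true },
  { name: "TAGALOG", code: "tl", reliable: false }
]

breakdown = language_breakdown(sample)
breakdown.each { |lang, pct| puts "#{lang}: #{pct}%" }
```

Putting unreliable guesses into a single bucket is one possible reading of the "others and unrecognized" share; a real tally might instead drop very short posts before counting.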