Languages of Melbourne detected from Twitter


Inspired by the great maps from CASA showing the distribution of Twitter languages for London and New York, I decided to take on a self-development project to do the same for Melbourne and, in the process, learn more about Python, SQL and web development.

At the time of writing, the application shows geolocated tweets collected around Melbourne from November 2012 to March 2013. It lets you switch between a map of languages (the top 5 languages) and a map of profanity. Tweets are still being collected, and the map caches will continue to be updated with new data every fortnight or so.

There are already some interesting stories in the map, such as people tweeting from the airport runways; the lack of Greek on Twitter despite it being the 2nd most spoken language at home (Census 2011); a big mix of languages at universities, the airport and in parks; how much people tweet on trams and trains; the lack of Chinese languages; tweeters out at sea (only confident location matches are recorded); and a number of residential houses that like to swear a lot.

A total of 58 languages have been detected so far (the map only shows the 5 with the most tweets against them). For now, Malay leads the top 5 languages on Twitter (disregarding English), and the outer eastern suburbs are marginally ahead in the contest for which area swears the most.

Feel free to use the comments section to ask questions or offer observations and hypotheses about patterns in the data. It will be interesting to watch the data evolve over the next few months and see whether the patterns change at all. If anyone would like to know more about my learning experiences with this project, I intend to give a quick talk about it at the next Esri Australia developer meetup in Melbourne. I will put my slides up as a separate blog post when that happens.

Click to open an interactive map.

Click to open an interactive map.

 

With web maps getting all the limelight these days, I thought I would put together a static map. Click for a HQ version.

32 thoughts on “Languages of Melbourne detected from Twitter”

    1. Simon J Post author

      Thanks. I'm making use of Google's Compact Language Detector (CLD) library to detect the languages via a Python script. I first had to strip out any hashtags and URLs to avoid confusing it. Feel free to head down to our developer meetup to find out more.

      Esri Dev Meet III

      Thursday, May 23, 2013, 6:00 PM
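
The clean-up step mentioned in the reply above (stripping hashtags and URLs before language detection) could look roughly like the following. This is a minimal sketch under assumptions, not the script actually used for the map: the function name `clean_tweet` and both regular expressions are illustrative, and the language detection call itself is left out.

```python
import re

# Illustrative patterns (assumptions, not taken from the original script).
URL_RE = re.compile(r"https?://\S+")   # http/https links up to the next space
HASHTAG_RE = re.compile(r"#\w+")       # a '#' followed by word characters


def clean_tweet(text):
    """Remove URLs and hashtags, then collapse the leftover whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())


cleaned = clean_tweet("Selamat pagi #melbourne http://t.co/abc123")
print(cleaned)
```

The cleaned string is what would then be handed to whichever language detector is in use.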

      Reply
  1. Bin

    Thank you for your work.

    First, the first interactive map doesn’t work on my end when I click it. It just shows a black webpage with a gear logo on the top right corner and five languages listed on the bottom left corner in grey colour.

    Second, as I see it, many Greek speakers in Melbourne are older people who migrated here after the war (the younger generation mainly uses English), and they are not familiar with texting, let alone posting on Twitter.

    Third, Twitter has been blocked in China since 2009 and many Chinese people do not even know the website. China has its own Twitter-like service called Weibo (literally "micro blog"), http://www.weibo.com , which claims to have 300 million users. There is therefore little point in overseas Chinese posting on Twitter in Chinese. If they want to share their thoughts with their Chinese peers, they will choose Weibo or another popular mobile app, WeChat.

    BTW, compared with Weibo, Twitter does not make it easy to share pictures, videos and long content, listen to streaming radio or even play online games, so it is not appealing to Chinese users.

    Reply
    1. Simon J Post author

      Hi Steve, your link refers to the 2nd most spoken language in Australia as a whole. For Melbourne, 2nd = Greek.

      Reply
  2. Linux Circle

    No, it's not Malay. It is Bahasa Indonesia, the Indonesian language, which is the sister language of Malay. Your detector can't differentiate between the two. Indonesia is number 2 in the world on Twitter in terms of both number of users and number of tweets.

    Reply
    1. Gwyn

      Yeah that was my thought when I saw “Malay”, I thought, where’s Indonesian?? My guess is it’s a combination of the two because there are a fair few Malaysians in Melbourne as well, but we’re (Indonesians) definitely heavy tweeters. And tumblrs, and facebookers

      Reply
  3. Simon J Post author

    As part of my learning experience, it looks like I'm going to have to learn a bit more about dealing with unexpected load spikes. I'm just ironing out a few issues with our web server being a bit of a bottleneck. It should not take long to get it back up and running. In the meantime, the static map can be found here:

    I'll do a follow-up blog post on how I should have made use of Amazon Auto Scaling to accommodate this…

    Reply
  4. Emi

    As a Malay person, I find it interesting that our language is in the top 5. 🙂 I thought Mandarin would be higher, given the larger Chinese population.

    Reply
  5. Mat

    Emi, as has been noted earlier, there are other microblogging sites that are much more popular for Chinese speakers such as Sina Weibo (新浪微博). Of course one of the motivating factors here is that Twitter is blocked in China.

    Earlier commenters have also pointed out that it’s very difficult to differentiate between Malay and Indonesian. I performed a quick test using some Indonesian text and sure enough, Google’s API claimed it was Malaysian. So we should understand Malay/Indonesian as a combined group in this data.
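
    Treating Malay and Indonesian as one group before ranking can be sketched as follows. The ISO 639-1 codes `ms` and `id` are real, but the function name, the merged label `ms/id` and the sample list are hypothetical placeholders for illustration, not the map's data.

```python
from collections import Counter

# Map the two hard-to-separate codes onto one combined label (assumed label).
MERGED_GROUPS = {"ms": "ms/id", "id": "ms/id"}


def top_languages(codes, n=5, exclude=("en",)):
    """Count detected language codes, merging Malay/Indonesian, and rank them."""
    counts = Counter()
    for code in codes:
        if code in exclude:
            continue  # the post disregards English when ranking
        counts[MERGED_GROUPS.get(code, code)] += 1
    return counts.most_common(n)


# Made-up detector output, one code per tweet (NOT real data).
sample = ["en", "ms", "id", "ja", "id", "ar", "en", "ms"]
top = top_languages(sample)
print(top)
```

    With a grouping like this, the Malay versus Indonesian split no longer affects the ranking.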

    Reply
  6. Simon J Post author

    The data in the map shows tweets detected from November to March. Below is the ranking from November to today:

    1. Malay
    2. Arabic
    3. Japanese
    4. Indonesian
    5. Spanish
    6. Tagalog
    7. Thai
    8. Turkish
    9. Italian
    10. French
    11. Korean
    12. Chinese

    I'll try to get a new cut of the data online and update the map at the weekend to show this. The race for both 1st and 5th place is quite tight.

    Reply
  7. Mat

    Pretty sure you’ve just got Indonesian in a firm first place and the whole Malay/Indonesian split is an artefact of Google’s language recognition.

    Reply
  8. Cath

    Fascinating! Surely some of the users of certain migrant groups (possibly Greek) would be older now, and maybe not that into technology? They are into the 3rd gen now and their young people have undergone language shift to English. Just an idea?

    Reply
    1. Stella Lambrou

      I think this is the prime reason for not tweeting in Greek. On the other hand, you can see a lot of Greek on Facebook, where both young and old Greeks use a mix of Greek and English.
      Convenience is the 2nd reason I can think of. Καλό Πάσχα 🙂

      Reply
  9. Kevin Barry

    Simon, as a newcomer to GIS and ESRI, can you tell me how/where you collected the data? I am an old f–t, so can you use layman's terms please? I am at the bottom of the steep learning curve, working for a council near Canberra.

    Reply
    1. Simon J Post author

      Welcome to the world of GIS, it’s definitely an awesome industry to be in right now.
      I aim to do a follow-up blog post on the ‘how’, but in a nutshell you can listen to tweets via the Twitter Streaming API, filter them down to the ones that have a location behind them, and then persist these into a database.
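
      As a rough illustration of the filter-and-persist part of that pipeline, the sketch below keeps only tweets that carry a point location and writes them to a database. It makes several assumptions: the tweets arrive as already-parsed dictionaries shaped like the Streaming API's tweet JSON (a `coordinates` field holding a GeoJSON point as `[longitude, latitude]`), SQLite stands in for whatever database was really used, and the function names, table layout and sample tweets are hypothetical. Connecting to the stream itself is left out.

```python
import sqlite3


def has_point(tweet):
    """True when the tweet dict carries an exact point location."""
    coords = tweet.get("coordinates")
    return bool(coords) and coords.get("type") == "Point"


def store_geolocated(tweets, conn):
    """Persist geolocated tweets; return how many geolocated tweets were seen."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets "
        "(id TEXT PRIMARY KEY, text TEXT, lon REAL, lat REAL)"
    )
    seen = 0
    for tweet in tweets:
        if not has_point(tweet):
            continue  # skip tweets without a location behind them
        lon, lat = tweet["coordinates"]["coordinates"]  # GeoJSON order: lon, lat
        conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
            (tweet["id_str"], tweet["text"], lon, lat),
        )
        seen += 1
    conn.commit()
    return seen


# Made-up sample tweets standing in for the live stream (NOT real data).
sample_stream = [
    {"id_str": "1", "text": "Hello from the tram",
     "coordinates": {"type": "Point", "coordinates": [144.9631, -37.8136]}},
    {"id_str": "2", "text": "No location here", "coordinates": None},
]
conn = sqlite3.connect(":memory:")
stored_count = store_geolocated(sample_stream, conn)
print(stored_count)
```

      Swapping the in-memory connection for a file path, or for another database driver, would keep the same overall shape.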

      Reply
      1. Kevin Barry

        You are right, that is awesome. I will google the Twitter Streaming API and see what I can learn. You say you have access to all the ESRI tools; I bought the working-from-home licence and have been using it to learn as much as possible since I went on the ESRI workbook courses. I can't wait to see your next blog. Sorry, but this old f__t is getting excited.

    1. Simon J Post author

      I have not been actively monitoring it since I wrote this post. But that is a nice idea: collect the data again, then compare where languages have shifted and where they have stayed the same.

      Reply
