That’s So Gay: Tracking Twitter posts containing “gay”

I’m posting some documentation of a quick, one-day experiment in mapping the locations of specific tweets. I’ve been playing with the Twitter Streaming API in anticipation of a new project. My interest in this technology happened to coincide with President Obama’s recent endorsement of same-sex marriage (gobama!). As a result, I thought it might be interesting to filter the live Twitter “stream” for only those tweets containing the word “gay”. This introduced some fun and interesting challenges and taught me a few things about working with Twitter. I wasn’t really sure what I was looking for at the start. I wanted to begin by simply observing the frequency and sheer magnitude of the tweets.

Some tech nonsense: I used the Twitter4j library to work with Twitter from Processing. My starting goal was simply to see if I could filter the tweets and grab the location posted by each Twitter user. Then I used a Yahoo service called PlaceFinder to disambiguate location names into lat/lon coordinates and map them. Because I did this in the morning/afternoon, most tweets were coming from North America. PlaceFinder is pretty forgiving, so some of these may not be terribly accurate. For example, a user might put “Earth, dawg!!” as their location, and it would disambiguate this to Earth, Texas….
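To make the "map them" step concrete, here is a minimal sketch of turning a lat/lon pair into pixel coordinates. It is written in Python rather than Processing for brevity, and it assumes a plain equirectangular (flat lat/lon grid) projection; the post does not say which projection the original sketch used, and the function name `latlon_to_xy` is made up for illustration.

```python
def latlon_to_xy(lat, lon, width, height):
    """Map a latitude/longitude pair onto a width x height canvas.

    Assumes an equirectangular projection: longitude -180..180 spans the
    canvas left to right, latitude 90..-90 spans it top to bottom.
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y


# The point at 0 degrees latitude, 0 degrees longitude lands in the
# center of the canvas.
center = latlon_to_xy(0.0, 0.0, 360, 180)
```

With this layout, west is always to the left of the canvas, which matters later when watching activity drift across time zones.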

For the next step, I wanted to see if I could glean the tone of the text. A little internet sleuthing pointed me towards an area of computer science research called “Sentiment Analysis”, a form of natural-language processing that does exactly this: given a body of text, an algorithm assigns a “score” for how negative or positive the text is. (I love this kind of thing, by the way, see here). I used a service called AlchemyAPI to do this. I colorized the tweets from red to green based on each tweet’s score, from negative to positive, respectively. I saw a lot of green, but ultimately this method didn’t work very well. I didn’t get the green-on-the-coasts-red-in-the-center result I expected. The problem may have been technical: somebody might feel positively about same-sex marriage and write “God, Obama.. Finally. How about instead of talking about it, you do something about it, like make it legal…”. The tone of this sentence is negative even though the message is positive. This is sort of confusing, and I’m not sure how to work around it.
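The red-to-green coloring can be sketched as a simple linear blend. This is not the original Processing code; it assumes the sentiment score falls in the range -1 (most negative) to +1 (most positive), which is an assumption about the service's output, and `sentiment_to_rgb` is a hypothetical helper name.

```python
def sentiment_to_rgb(score):
    """Blend from pure red (score -1) to pure green (score +1).

    The -1..+1 range is an assumption; scores outside it are clamped.
    """
    s = max(-1.0, min(1.0, score))
    t = (s + 1.0) / 2.0            # rescale to 0..1
    red = round(255 * (1.0 - t))
    green = round(255 * t)
    return (red, green, 0)


most_negative = sentiment_to_rgb(-1.0)  # (255, 0, 0)
most_positive = sentiment_to_rgb(1.0)   # (0, 255, 0)
```

Note that a mapping like this only colors the tone of the wording. A sarcastic or exasperated tweet that supports the announcement would still come out red, which is exactly the problem described above.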


Ultimately I chose to make a time-lapse video of live tweets appearing over several hours. My hope was that if I recorded for long enough, you would be able to see the time-zone change taking place as active internet users shifted westward (to the left on the map). I output an image once every second to make the video below, but I only got two hours’ worth of data. Even though Yahoo PlaceFinder doesn’t have a formal rate limit (a cap on the number of requests you can make per day), I guess I made so many that I got mysterious “HTTP Response Code 999” errors from Yahoo. Looks like they denied me because I was using it too much! They probably thought I was ANONYMOUS, trying to shut down Yahoo.
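In hindsight, two habits might have reduced the number of geocoding requests: caching locations that have already been looked up (many users share the same location string) and backing off when the service starts refusing requests. The sketch below is an assumption-heavy illustration, not what the original sketch did. `CachedGeocoder`, `RateLimited`, and the `lookup` callable are all hypothetical names; `lookup` stands in for whatever function actually calls the geocoding service and is expected to raise `RateLimited` on a refusal such as the 999 response.

```python
import time


class RateLimited(Exception):
    """Raised by the (hypothetical) lookup function when a request is refused."""


class CachedGeocoder:
    def __init__(self, lookup, max_retries=3, base_delay=1.0, sleep=time.sleep):
        self.lookup = lookup          # callable: location string -> (lat, lon)
        self.max_retries = max_retries
        self.base_delay = base_delay  # seconds before the first retry
        self.sleep = sleep            # injectable so the wait can be faked
        self.cache = {}

    def geocode(self, location):
        # Normalize so "Earth, TX" and " earth, tx " share one cache entry.
        key = location.strip().lower()
        if key in self.cache:
            return self.cache[key]
        for attempt in range(self.max_retries + 1):
            try:
                result = self.lookup(location.strip())
                break
            except RateLimited:
                if attempt == self.max_retries:
                    raise
                # Exponential backoff: 1x, 2x, 4x ... the base delay.
                self.sleep(self.base_delay * 2 ** attempt)
        self.cache[key] = result
        return result
```

Whether this would have been enough to avoid the 999 errors is a guess, since the actual threshold is not documented in this post.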

I admit, it’s not much to look at. One can barely make out the United States and Western Europe based on the tweets… In the end it’s weak as a data visualization because it doesn’t reveal anything new: Yes, people are using the word “gay” in their tweets. A LOT. To be impactful or insightful, though, such a study would have to be done the day after Obama’s announcement (as it was here) and also the day before or long after, as a control to see if use of the word truly shot up as a result of the news-du-jour. It was a fun experiment though, and I learned a lot. I’m going to use the Streaming API for an upcoming project idea I have, but it will be more sculptural than screen-based.
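The control comparison described above could look something like this: count matching tweets per day, then divide the count on the day of interest by the average over the control days. This is only a sketch of the idea. No control data was collected in this experiment, and the function names are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(timestamps):
    """Count tweets per calendar day from a list of datetime objects."""
    return Counter(ts.date() for ts in timestamps)


def spike_ratio(counts, event_day, control_days):
    """Tweets on event_day divided by the average over control_days.

    Returns None when the control days contain no tweets at all.
    """
    baseline = sum(counts[day] for day in control_days) / len(control_days)
    if baseline == 0:
        return None
    return counts[event_day] / baseline
```

A ratio well above 1 would suggest the word really was used more than usual on that day; a ratio near 1 would suggest the volume seen here is just normal Twitter.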

