Hack/Reduce Montréal 2011
Hack/Reduce on Saturday, March 26, 2011, was a real success. Just look at some of the tweets! Here are some pictures from Hack/Reduce participants: Hack/Reduce on Picasa and Steven’s pics on Flickr.
The day started with coffee and a presentation by the team at Hopper about MapReduce and about using the infrastructure and data mappers we had created. We quickly ran through a tutorial and the examples available on GitHub. Most of the people present hadn’t used MapReduce before, so the overview was quite thorough and took about an hour and a half. We also gave a rundown of how to access and run jobs on the Amazon Elastic MapReduce cluster we were using.

Some teams had already started hacking during the presentation. Those who didn’t have a team yet got the chance to pitch an idea quickly or just talk about what they might want to build, and about 15 people gave a short pitch. Several teams formed this way, as people got a sense of what the others were interested in and connected with developers who shared their interests.
After some more coffee, the hacking seriously got underway. The mentors from Hopper were very busy in the beginning, but pretty quickly people got the hang of it and could start concentrating more on getting the coding done.
The coding continued furiously and I think everyone had a great time. We also had visitors come and go, mostly other developers who were interested in seeing what was going on or just came to meet their friends.
Pretty quickly jobs started popping up in our AWS management console. We had planned to run 100 Amazon Extra Large Instances. Each Extra Large instance has:
- 15 GB memory
- 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
- 1,690 GB instance storage
- 64-bit platform
- I/O performance: high
(Source: http://aws.amazon.com/ec2/instance-types/)
Eventually we had to bump this up, and we ended up with 150 Extra Large instances running.
Results
We had the first presentation around 6:30 pm, although most teams still wanted to keep going, and the last presentations were held around 8 pm. Along with the presentations we had more pizza, and Sylvain Carle from Needium also brought in some beer. Thanks, Sylvain! Some teams couldn’t show results because their bixi jobs were slow to finish. Even though not every team had something to show, all of them gave a short presentation about what they were trying to do and what problems they ran into. Some of the presentations also turned into discussions about how to solve different problems efficiently and how to develop for MapReduce.
Some highlights from the presentations
Flight Day – Jean-Francois Im

Jean-Francois Im analyzed our dataset of prices for 500 000 return flights. It contained about 5.5 million data points, since the prices of the same 500 000 flights had been queried at different times.
Jean-Francois looked at which weekdays are the cheapest for return flights. According to his analysis, flights that depart on Fridays and Saturdays are the cheapest, which the Hopper team also confirmed from its experience in the travel industry.
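A weekday comparison like this fits in a single MapReduce pass. The sketch below is a minimal, in-memory illustration, not Jean-Francois’s actual code: it assumes each observation is a hypothetical `(departure_date, price)` pair with an ISO date string, maps it to `(weekday, price)`, groups the pairs by weekday the way the shuffle phase would, and reduces each group to an average price.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def map_observation(record):
    """Map step: turn one hypothetical (departure_date, price) record into (weekday, price)."""
    departure, price = record
    yield WEEKDAYS[date.fromisoformat(departure).weekday()], price


def reduce_average(weekday, prices):
    """Reduce step: average every price observed for one departure weekday."""
    return weekday, sum(prices) / len(prices)


def average_price_by_weekday(records):
    # Shuffle: group mapper output by key, as the MapReduce framework would.
    groups = defaultdict(list)
    for record in records:
        for weekday, price in map_observation(record):
            groups[weekday].append(price)
    return dict(reduce_average(day, prices) for day, prices in groups.items())


def cheapest_weekday(records):
    averages = average_price_by_weekday(records)
    return min(averages, key=averages.get)
```

Because every query is averaged on its own, flights that were queried more often carry more weight; averaging per flight first would take a second pass.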
Lazy bixi activity – Alexandre Abreu, Raimi Rufai
The Lazy bixi activity team mashed up bixi data with the Google Elevation API to find where the laziest bixi users in Montreal were. Unfortunately, the team couldn’t finish the task because they got blacklisted by the Elevation API for sending too many requests. The consensus was that getting blacklisted at a big data hackathon is still a feat and a respectable effort.

Spiking words in a specific year – Ben Kirwin and Michael Mulley
By analyzing words from the Google n-grams (Books dataset), Ben Kirwin and Michael Mulley were able to pick out words that were particularly popular in specific years compared to other years. Viagra was one of these words in 2005. A comment from the team: "we spent part of the day swearing on Java before we discovered Hadoop Streaming with python"
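A spike detector of this kind fits the Hadoop Streaming model the team mentions, where the mapper and reducer are plain scripts that read lines from stdin and write lines to stdout. The sketch below is one possible version, not the team’s code. It assumes 1-gram lines laid out as tab-separated `word, year, match_count, ...`, and it uses a simple, hypothetical spike score: a word’s count in its peak year divided by its mean count over its other years, kept when the ratio reaches `min_ratio`. A more careful analysis would first normalize each count by the total number of words published that year.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: keep word, year and match count from each 1-gram line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        word, year, count = fields[0], fields[1], fields[2]
        yield f"{word}\t{year}\t{count}"


def reducer(sorted_lines, min_ratio=5.0):
    """Reduce step: for each word, compare its peak year with its other years."""
    rows = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for word, group in groupby(rows, key=lambda fields: fields[0]):
        counts = {}
        for _, year, count in group:
            counts[int(year)] = counts.get(int(year), 0) + int(count)
        if len(counts) < 2:
            continue  # need at least one other year to compare against
        peak_year = max(counts, key=counts.get)
        others = [c for y, c in counts.items() if y != peak_year]
        baseline = sum(others) / len(others)
        ratio = counts[peak_year] / baseline if baseline else float("inf")
        if ratio >= min_ratio:
            yield f"{word}\t{peak_year}\t{ratio:.1f}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hadoop Streaming entry point: run as `spikes.py map` or `spikes.py reduce`.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

With Hadoop Streaming, the same file (hypothetically named `spikes.py`) could serve as both mapper and reducer. By default, streaming treats the text before the first tab as the key and sorts on it, so every line for a given word reaches the same reducer as one contiguous run, which is what `groupby` relies on.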

Twitter Influence – David Cryans, Maxim Martineau, Mathieu Ouin
The Twitter influence team calculated the most influential tweeters based on the number of followers of their followers. The top results were:
- @barackobama
- @aplusk (Ashton Kutcher)
- @cnnbrk (CNN breaking news)
- @nansen (Nansen Malin)
- @bradhoward
The complete results have been published: http://nexalogy.com/demo/hackreduce/
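The post doesn’t describe exactly how the team computed the ranking. One reading of "followers of followers" is to score each user by summing the follower counts of everyone who follows them. The in-memory sketch below takes that reading and assumes a hypothetical edge list of `(follower, followee)` pairs; on Hadoop it would be two jobs, one counting followers per user and one joining those counts back onto the edges, keyed by the follower.

```python
from collections import Counter, defaultdict


def follower_counts(edges):
    """Job 1: count followers per user. Each edge is (follower, followee)."""
    return Counter(followee for _, followee in edges)


def followers_followers(edges):
    """Job 2: for each user, sum the follower counts of everyone who follows them."""
    counts = follower_counts(edges)
    score = defaultdict(int)
    for follower, followee in edges:
        # Join on the follower: the followee gains the follower's own audience size.
        score[followee] += counts[follower]
    return dict(score)


def top_influencers(edges, n):
    score = followers_followers(edges)
    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(score, key=lambda user: (-score[user], user))[:n]
```

A user who follows several of your followers is counted once per path here; counting distinct second-degree followers instead would need an extra deduplication step.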

In total, there were 11 presentations. Some of the other presentations were:
- Technical Trading Analysis: verifying the value of a trigger event for deciding when to buy a stock.
- Inverted Index: building a Lucene-like inverted index of all of Wikipedia.
- PeopleRank: a PageRank implementation that ranks people, using Twitter followers as "links".
- Busiest Bixi stations: calculating the times when specific bixi stations in Montreal have the most activity.
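PeopleRank applies PageRank to the follower graph. The sketch below is the textbook power iteration, not the team’s implementation: it assumes a hypothetical list of `(follower, followee)` pairs, treats each follow as a link from follower to followee, uses the commonly cited damping factor of 0.85, and spreads the rank of users who follow nobody evenly across all users. On Hadoop, each iteration would be one MapReduce job: the map step sends each user’s rank, split evenly, along their out-links, and the reduce step sums what each user receives.

```python
def people_rank(edges, damping=0.85, iterations=20):
    """PageRank over a follower graph; each edge is (follower, followee)."""
    nodes = sorted({user for edge in edges for user in edge})
    if not nodes:
        return {}
    out_links = {user: [] for user in nodes}
    for follower, followee in edges:
        out_links[follower].append(followee)

    n = len(nodes)
    rank = {user: 1.0 / n for user in nodes}
    for _ in range(iterations):
        # Map: each user splits their current rank evenly among the users they follow.
        received = {user: 0.0 for user in nodes}
        dangling = 0.0  # rank held by users who follow nobody
        for user in nodes:
            targets = out_links[user]
            if targets:
                share = rank[user] / len(targets)
                for target in targets:
                    received[target] += share
            else:
                dangling += rank[user]
        # Reduce: sum what each user received, add the evenly spread dangling rank,
        # and apply the damping factor (random-jump probability 1 - damping).
        rank = {
            user: (1 - damping) / n + damping * (received[user] + dangling / n)
            for user in nodes
        }
    return rank
```

Twenty iterations is an arbitrary default; in practice one would iterate until the ranks stop changing by more than a small tolerance.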
To read more about Hack/Reduce, check out these blog posts by participants:
http://pascaldimassimo.com/2011/03/28/fun-at-hackreduce/#comment-13
http://dataholic.ca/place/2011/03/27/mapreduce-hackathon-hopper-travel/
http://blog.simonmathieu.com/post/4135400114/montreals-hackreduce-got-it-right
We would like to thank all the participants again. It was a great Saturday and we really enjoyed it! We also want to thank our sponsors: Hopper, Google and Needium.
Stay tuned for more Hack/Reduce and information about the exclusive bixi dataset!