We had an excellent Hack/Reduce event again this weekend in Ottawa. Our knowledge has expanded once more: among other things, we now know how desperate daters can get and at what time people do their online shopping.
Getting all the tech to work and keeping the clusters humming took a lot of coffee and pizza. I would be afraid to calculate how much.
Most importantly, we saw a lot of great Hack/Reductions by the participants. Actually, first and foremost it was another great weekend for the Hack/Reduce team – the best way to spend a Saturday, for real! Thanks!
Here’s a short description of some of the presentations we saw:
The Hackify team worked on the Shopify dataset, which contains information about US Shopify orders. The team used Ruby with the Hadoop streaming API.
First, the team analyzed where people shop the most. Naturally, the answer was California, the most populous state.
Second, the team wanted to find when people had shopped the most, which turned out to be August 19th between noon and 6 pm ($185,891 in orders).
The team concluded that using Ruby with the streaming API makes it easy to do map/reduce – and that Hadoop is cool!
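The streaming approach the teams describe boils down to two small scripts that read stdin and write tab-separated key/value pairs; Hadoop handles the sorting in between. Here is a minimal sketch in Python of counting orders per state – the field layout (`order_id<TAB>state<TAB>amount`) is an assumption for illustration, not the actual Shopify schema:

```python
import sys
from itertools import groupby

def mapper(lines):
    # Each input line is assumed to hold "order_id<TAB>state<TAB>amount";
    # emit the state as key with a count of 1.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            yield fields[1], 1

def reducer(pairs):
    # Hadoop streaming sorts mapper output by key before the reduce
    # phase, so equal keys arrive together and can simply be grouped.
    for state, group in groupby(pairs, key=lambda kv: kv[0]):
        yield state, sum(count for _, count in group)

if __name__ == "__main__" and len(sys.argv) > 1:
    # As a mapper:  python orders.py map < input.tsv
    # As a reducer: python orders.py reduce < sorted_map_output.tsv
    if sys.argv[1] == "map":
        for key, value in mapper(sys.stdin):
            print(f"{key}\t{value}")
    else:
        pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for key, total in reducer((k, int(v)) for k, v in pairs):
            print(f"{key}\t{total}")
```

With Hadoop streaming, the same two scripts are passed as `-mapper` and `-reducer` on the job command line; locally you can simulate the whole pipeline with a shell sort between the two stages.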
Petro Verkhogliad (@vpetro)
Petro first worked with the Hackify team but then switched to Python, which he was more familiar with. He also analyzed the Shopify shopping data.
Petro found that people shop the most at 9 am and 9 pm. The most popular shopping day is Friday, with Saturday a close second.
Geographically most shopping is done by Californians ordering from California.
Rod Dunne, Marc Lepage, Mohamed Mansour (@mohamedmansour), Alexis Brunet
Team-RIM was voted the winner by the participants. They worked on the Google n-grams data and the Amazon review data. First, the team calculated the number of alliterative 2-grams by letter per year.
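The alliteration count is a natural fit for map/reduce: each n-gram row can be checked independently, then tallied per (letter, year). A small sketch of that check in Python – the row layout (`ngram<TAB>year<TAB>match_count<TAB>volume_count`) follows the published Google n-grams format, but treat it as an assumption here:

```python
from collections import defaultdict

def is_alliterative(ngram):
    # A 2-gram is alliterative when both words start with the same
    # letter (compared case-insensitively).
    words = ngram.split()
    return (len(words) == 2
            and words[0][:1].isalpha()
            and words[0][:1].lower() == words[1][:1].lower())

def count_alliterative(lines):
    # Input lines follow the assumed Google n-grams layout:
    # "ngram<TAB>year<TAB>match_count<TAB>volume_count".
    # Tally match counts per (starting letter, year).
    counts = defaultdict(int)
    for line in lines:
        ngram, year, match_count = line.rstrip("\n").split("\t")[:3]
        if is_alliterative(ngram):
            counts[(ngram[0].lower(), int(year))] += int(match_count)
    return dict(counts)
```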
For the Amazon review data, team-RIM calculated the average rating given on specific dates, computed over all of the years in the dataset. You can see that most products are given a rating of almost 4. You can also notice that ratings drop off right after Christmas, i.e., people give worse reviews once the holiday is over.
The Amazon dataset also includes data on how useful reviews are: reviews can be voted up or down on Amazon by users. Team-RIM analyzed this data and concluded that reviews that give a higher rating to a product are considered more useful.
The team also analyzed the usefulness of reviews based on review length. From the picture we can see that reviews of 50–60 characters are considered most useful. As a bonus, the team calculated that products received worse reviews as time went by.
Pascal – Analysis of the most desperate daters
Pascal from Hopper wanted to find the most desperate daters. He analyzed the number of profile views per user. According to his analysis, some users check so many profiles that, at 30 seconds per view, it amounts to full-time work (~17k views per month, or roughly 60 profile views every working hour…).
Pascal also found the most visited profiles.
Lastly, Pascal analyzed how users use the Mate1 site. The results were quite surprising: 23% of users only ever view one other profile, and 65% never check more than 10.
JF (@jeanfrancoisim) from Hopper also analyzed the dating dataset.
First, JF mapped the birth year of users to their hotness by gender. JF calculated hotness with the following formula: (msgs received + msgs sent) × ((msgs received + 10) / (msgs sent + 10)).
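JF's formula can be written directly as a small function; the +10 terms keep the ratio from blowing up for profiles with very little activity:

```python
def hotness(msgs_received, msgs_sent):
    # Total message volume, weighted by the (damped) ratio of
    # received to sent messages, per JF's formula.
    return (msgs_received + msgs_sent) * ((msgs_received + 10) / (msgs_sent + 10))
```

So a profile that receives far more messages than it sends scores much higher than one with the same total volume split evenly.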
The red dots represent women and the blue dots men. We can clearly see that the female users are, on the whole, younger. I’m afraid to draw any other conclusions from the results…
Next, JF mapped hotness and height by gender. You could easily see that men are taller than women, but it is unclear whether height directly influences hotness.
As a last analysis, JF mapped income to hotness. Most people answered “rather not say” to this question, which is why most data points are in the second column. It’s also unclear whether income influences hotness.
The XKCD team wanted to test an internet “truism” from an xkcd comic: “Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at ‘Philosophy’.”
The team didn’t really end up with a result, since it turned out to be difficult to include only the actual article content and not all of the other links on the page. They started their work in Clojure but later switched to Java.
Team Dating 8
Martin Samson (@pgdown) , Edgar Acosta, David Germain
Team 8 explored the dating dataset with the goal of measuring popularity/hotness and correlating it to other factors.
The team started by analyzing some basic measures for profiles.
They created their own popularity formula.
The team made some nice graphs mapping times listed vs. messages received, times listed vs. profile views, and popularity vs. number of times viewed.
The team used streaming with Python and fought with it for a while. Lesson: Do not use the same file name for the mapper and the reducer scripts.
Team weather and crime
Richard Desmarais, Chris Camden, Philippe Savoie, Ryan McLeod, Eric Ax, Manuel Belmadani (@pragmatwit)
The weather and crime team created a web interface that could answer queries such as “What is the maximum temperature in January 2007?”. When a query is launched, it runs through the dataset and gives you the answer. The team used Python streaming.
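The core of such a query is a single scan keeping a running maximum – essentially the reduce step of the team's streaming job. A minimal sketch in Python, assuming (hypothetically – the team's actual schema wasn't shown) one `date<TAB>temperature` reading per line:

```python
def max_temperature(lines, year_month):
    # Each line is assumed to hold "YYYY-MM-DD<TAB>temperature".
    # Scan the dataset and keep the maximum reading for the
    # requested month prefix, e.g. "2007-01".
    best = None
    for line in lines:
        date, temp = line.rstrip("\n").split("\t")[:2]
        if date.startswith(year_month):
            value = float(temp)
            if best is None or value > best:
                best = value
    return best
```

In a real Hadoop streaming job the mapper would emit `(month, temperature)` pairs and the reducer would apply this same running-max logic per key, so the web interface only has to look up a precomputed answer.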
Team UOttawaNLP + others
Team UOttawa analyzed the sentiment of the Amazon reviews. As expected, there were clear differences in sentiment between the good and the bad reviews.
In the end, we let the participants vote for the best Hack/Reduction. Team-RIM with their n-grams and Amazon analyses took the win… Congrats!
Thanks to everyone for coming, the Hack/Reduce team had a great time and it was amazing to meet you all. We hope you keep hacking and we hope we’ll see you next time!