Hack/Reduce Toronto Presentations

All videos from the presentations: Hack/Reduce Vimeo

Hi everyone, this is coming a bit late, since Hack/Reduce Toronto was over a week ago, but we were too busy setting up for Boston to write about the presentations we saw in Toronto. Anyway, we had an amazing time in Toronto. Thanks to all of the participants who worked hard the whole day! Seeing people working hard, learning and enjoying Hack/Reduce is what makes it all worthwhile for us!

The day started off with coffee, a short introduction to using the cluster and pitches by the participants. The Hopper team then gave a short tutorial on Hadoop and Map/Reduce.

People got to work really quickly and the amount of noise in the beginning when the teams were discussing their projects was mind-blowing. A lot of buzz. Soon after 1 o’clock all the teams hunkered down to start coding and there was an eerie silence again…

In the end, the teams really got a lot done. We saw some really amazing presentations. I’ll give some short descriptions of the pitches here. I’ve put up all of the presentations we had on our Vimeo channel.

There are some interesting things to learn from the videos, mostly about the technologies used and tested, so I suggest you check them out!

Of the 21 teams, 10 ended up presenting:

Team 1

Check out the presentation on Vimeo

Bartek Ciszkowski (@bartek), Ash Christopher (@ashchristopher)

Bartek and Ash analyzed search queries made over the course of one day. They grouped the queries into four categories: travel, sex, nerd and cooking. They then analyzed how the popularity of each category varied during the day.

Here are two pictures of the results, grouped by 1 minute and 30 minutes:

1 minute:

30 minutes:

The source code is on GitHub at https://github.com/ashchristopher/HackReduceToronto.
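
For readers who want a feel for the time bucketing described above, here is a minimal sketch in plain Python. It is not the team's code: the keyword lists, the `(timestamp, query)` record shape and all function names are assumptions made for illustration. It rounds each query's timestamp down to a bucket of a given width (1 or 30 minutes, as in the pictures) and counts queries per bucket and category.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keyword lists; the team's real category definitions are not given.
CATEGORY_KEYWORDS = {
    "travel": {"flight", "hotel", "vacation"},
    "cooking": {"recipe", "bake", "oven"},
}


def categorize(query):
    """Return every category whose keyword set overlaps the query's words."""
    words = set(query.lower().split())
    return [name for name, keywords in CATEGORY_KEYWORDS.items() if words & keywords]


def bucket_start(timestamp, minutes):
    """Round a timestamp down to the start of its bucket of `minutes` minutes."""
    minute_of_day = timestamp.hour * 60 + timestamp.minute
    floored = minute_of_day - minute_of_day % minutes
    return timestamp.replace(
        hour=floored // 60, minute=floored % 60, second=0, microsecond=0
    )


def count_by_bucket(records, minutes):
    """Count (bucket start, category) pairs over (timestamp, query) records."""
    counts = Counter()
    for timestamp, query in records:
        for category in categorize(query):
            counts[(bucket_start(timestamp, minutes), category)] += 1
    return counts
```

Calling `count_by_bucket(records, 30)` and `count_by_bucket(records, 1)` on the same records gives the two granularities; the wider bucket smooths out the minute-to-minute noise.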

Team 2

Check out the presentation on Vimeo

Joel Crocker (@joelcrocker), Johan Harjono (@jharjono), Joey Robert (@joeyrobert), Ian Stevens (@istevens)

Joel, Johan, Joey and Ian used a 10,000-song subset of the Million Song Dataset. They used the Disco distributed computing framework with Python.

They analyzed:

  • The most romantic year by looking for the word love in song titles.
  • The variation of words in song titles (Only 100 words are used in song titles)
  • Average song tempo per year
  • Song lengths per year
  • Saddest tones (Turns out D is really sad)
  • Recording locations.

The source code can be found on GitHub: Github/joyerobert/hackreduce
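
To give an idea of how the per-year analyses in the list above fit the map/reduce model, here is a rough sketch in plain Python. It does not use Disco and is not the team's code; the record fields (`title`, `year`, `tempo`) and the function names are assumptions. The mapper emits one value per song keyed by year, and the reducer turns a year's values into a count of titles containing "love" and an average tempo.

```python
from collections import defaultdict


def map_song(song):
    """Emit (year, (has_love, tempo)) for one song record.

    "love" is matched as a substring, so "Glove" would also count.
    """
    has_love = 1 if "love" in song["title"].lower() else 0
    yield song["year"], (has_love, song["tempo"])


def reduce_year(values):
    """Combine one year's values into (love title count, average tempo)."""
    values = list(values)
    love_titles = sum(has_love for has_love, _ in values)
    average_tempo = sum(tempo for _, tempo in values) / len(values)
    return love_titles, average_tempo


def run(songs):
    """Group mapper output by year, then reduce each group.

    The grouping dict is a local stand-in for the framework's shuffle step.
    """
    grouped = defaultdict(list)
    for song in songs:
        for year, value in map_song(song):
            grouped[year].append(value)
    return {year: reduce_year(values) for year, values in grouped.items()}
```

The "most romantic year" would then be the key with the highest love count in the returned dict.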

Team 3

Check out the presentation on Vimeo

Gar Liu (@lonelydatum), Nathan Rambarran (@wibblz), Khurram Virani (@viranik)

Gar, Nathan and Khurram first wanted to figure out if oil prices affected flight prices. They took oil company stock prices and the average price of all the flights in the dataset. However, the flight dataset was limited and the results were ambiguous, so the team changed direction. Next, they set out to find the most volatile stocks in the NYSE data and created a scoring algorithm to rank them.

Team 3 used Mandy, a library that makes it easy to use Hadoop with Ruby. They also tried out Wukong and don’t recommend it. Mandy worked very well though.
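
The post does not say how Team 3's volatility score was defined, so the following is only one plausible sketch, written in Python rather than the team's Ruby. It assumes a score equal to the standard deviation of day-over-day returns on closing prices; the function names and the input shape are made up for illustration.

```python
from statistics import pstdev


def daily_returns(closes):
    """Fractional change between consecutive closing prices."""
    return [
        (today - yesterday) / yesterday
        for yesterday, today in zip(closes, closes[1:])
    ]


def volatility_score(closes):
    """Assumed score: population standard deviation of daily returns."""
    returns = daily_returns(closes)
    return pstdev(returns) if returns else 0.0


def rank_by_volatility(prices_by_symbol):
    """Return symbols ordered from most to least volatile."""
    scores = {
        symbol: volatility_score(closes)
        for symbol, closes in prices_by_symbol.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Using returns instead of raw prices keeps a $500 stock from looking more volatile than a $5 stock just because its absolute moves are bigger.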

Team 4

Check out the presentation on Vimeo

Seak Pek Chhan, Nick Ursa (@nickursa), Athir Nvaimi, Gabe Sawhney

Pek, Nick, Athir and Gabe took a month and a half of Toronto Bixi data and wanted to see whether Bixi usage is affected by the weather. The answer is yes. Pek also took the Perl code and turned it into Python for fun.

Team 5

Check out the presentation on Vimeo

Stefan Arentz (@satefan), Olivier Yiptong (@sayhello), David Chang, Mike Pettypiece (@mtpettyp)

Stefan, Olivier, David and Mike had no prior experience with Hadoop. They used Python with mrjob. They analyzed DNS data for various things:

  • Average number of nameservers (it’s 2.25, max is 6)
  • Number of domains with a specific number of characters (11 is the most popular)
  • Domains with the largest number of existing permutations of the same name (mostly used by spammers; for example, every permutation of Yahoo and Youtube exists)

The team noted that configuring the number of mappers and reducers is very important for speeding up the jobs.
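
The first two DNS statistics above are simple aggregations. Here is a small plain-Python sketch of them, without mrjob and not taken from the team's code. The `(domain, nameservers)` record shape is an assumption, and so is the choice to measure the length of the first label only (the part before the first dot), since the post does not say how characters were counted.

```python
from collections import Counter


def nameserver_stats(records):
    """Return (average, maximum) number of nameservers per domain."""
    counts = [len(nameservers) for _, nameservers in records]
    return sum(counts) / len(counts), max(counts)


def length_histogram(records):
    """Count domains by the length of their first label (an assumption)."""
    return Counter(len(domain.split(".")[0]) for domain, _ in records)
```

`length_histogram(records).most_common(1)` gives the most popular length together with its count.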

Team 6

Check out the presentation on Vimeo

Jordan Christiansen (Kobo, @thebigjc)

Jordan analyzed the correlations of every single stock pair on the NYSE. The data started at 0.5 GB and expanded to 250 GB once the pairs and prices had been created. A linear regression was then run on the dataset, ending up with 4M pairs. Some interesting correlations were found, and Jordan ended up with a huge list of correlated stocks.

You can find the code on GitHub: github.com/thebigjc/hackreduce
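
The jump in data size makes sense once you see that n stocks produce n(n-1)/2 pairs, each carrying both price series. The sketch below is not Jordan's code: it swaps the linear regression mentioned above for a plain Pearson correlation coefficient (closely related for a single predictor, but not the same computation), and the function names, input shape and threshold are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(prices_by_symbol, threshold):
    """Return (symbol_a, symbol_b, r) for every pair with |r| >= threshold."""
    result = []
    for (a, xs), (b, ys) in combinations(sorted(prices_by_symbol.items()), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            result.append((a, b, r))
    return result
```

On a cluster, the pair generation would be the expensive, data-multiplying step; the correlation itself is cheap per pair.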

Team 7

Check out the presentation on Vimeo

Cleaver Barnes (@cleaverbarnes), Max Brodie (@maxwellbrodie), Shanly Suepaul, Matt MacLean

Cleaver, Max, Shanly and Matt ran their last job while the pitches were already under way.

They analyzed the “connectedness” of various tech communities based on the Twitter social graph. They chose a couple of influencers per community and counted a person as part of a community if they followed any of that community’s influencers. For example, John Resig was a community leader in jQuery. You can check out the results in the video.
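
The membership rule described above is easy to express in code. Here is a minimal Python sketch; it is not the team's implementation, the input shapes and names are assumptions, and since the post does not define how "connectedness" was measured, the Jaccard overlap used here is only a stand-in.

```python
def community_members(following, influencers_by_community):
    """Assign each user to every community whose influencers they follow.

    `following` maps a user to the set of accounts that user follows.
    """
    members = {community: set() for community in influencers_by_community}
    for user, followed in following.items():
        for community, influencers in influencers_by_community.items():
            if followed & influencers:
                members[community].add(user)
    return members


def overlap(members, a, b):
    """Jaccard overlap of two communities (an assumed connectedness measure)."""
    union = members[a] | members[b]
    return len(members[a] & members[b]) / len(union) if union else 0.0
```

A user who follows influencers from two communities lands in both, which is what makes the overlap between communities non-zero.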

Team 8

Check out the presentation on Vimeo

Yong Liang

Yong worked on finding the cheapest flight combinations. He found the cheapest chained flights from Seattle. The project was limited by the dataset, which only contained flights from Seattle.
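
The post does not describe how Yong computed the chains. One common way to find the cheapest chained route is Dijkstra's shortest-path algorithm with fares as edge weights, sketched below in Python. The fare table shape and function name are assumptions, and the sketch ignores departure times and layovers, which a real itinerary search would have to respect.

```python
import heapq


def cheapest_routes(fares, origin):
    """Cheapest total fare from `origin` to every reachable airport.

    `fares` maps (source, destination) to a non-negative price.
    """
    graph = {}
    for (src, dst), price in fares.items():
        graph.setdefault(src, []).append((dst, price))

    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        cost, airport = heapq.heappop(queue)
        if cost > best.get(airport, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, price in graph.get(airport, []):
            new_cost = cost + price
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return best
```

With a dataset that only has departures from one city, every chain stops after the first leg, which is the limitation mentioned above.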

Team 9

Check out the presentation on Vimeo

Christophe Biocca, Akash Vaswani, Jake Nielsen, Drew Gross

Team 9 ended up working on parsing Wikipedia and came to the conclusion that it’s painful.

In the end they just calculated which article has the most outbound links, but it was uncertain whether it actually worked correctly. The result was some error correction page. For more details, check the video.
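
To show why this is harder than it sounds, here is a deliberately naive Python sketch of the outbound-link count. It is not the team's code: the regular expression only handles the basic `[[Target]]`, `[[Target|label]]` and `[[Target#Section]]` forms, and it ignores templates, nested markup, redirects and namespace prefixes such as `File:`. The function names and the title-to-wikitext input are assumptions.

```python
import re

# Capture the link target up to the first "|", "#" or bracket.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)")


def outbound_links(wikitext):
    """Return the set of distinct link targets found in an article's wikitext."""
    return {target.strip() for target in WIKILINK.findall(wikitext) if target.strip()}


def most_outbound(articles):
    """Return the title whose wikitext links to the most distinct targets.

    `articles` maps an article title to its raw wikitext.
    """
    return max(articles, key=lambda title: len(outbound_links(articles[title])))
```

Any of the ignored cases can skew the count, which may be part of why the team was not sure their result was right.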

Team 10

Check out the presentation on Vimeo

Jamie Wong (@jlfwong), Snady Wu, Wien Leung, Maverick Lee, Christopher Wu, Christopher Cooper

Team number 10 had members that worked on a couple of different projects:

Jamie Wong analyzed what people born in specific years were notable for, based on their birth year and what they had become famous for.

Snady Wu and Christopher Cooper worked on inbound links to articles but were halted by the Wikipedia parsing issues.

Christopher Wu worked on figuring out how far in advance you should buy your flight in order to get the cheapest price.


Thanks a lot for the event, everyone. We also want to thank the sponsors: Hopper, Amazon, Kobo, Mantella Venture Partners, Chango, Attachments.me and Startupnorth.

