Ottawa


Shopping and Dating Hacks – Ottawa

We had an excellent Hack/Reduce event again this weekend in Ottawa. Our knowledge has been expanded again: we now know how desperate daters can get and at what time people do their online shopping, among other things.

To get all the tech working and keep the clusters humming, a lot of coffee and pizza was consumed. I would be afraid to calculate how much.

Most importantly, we saw a lot of great Hack/Reductions by the participants. Actually, first and foremost it was a great weekend for the Hack/Reduce team again – the best way to spend a Saturday – for real! Thanks!

We want to thank all the participants who made it out and the sponsors of course: Hopper, Infoglutton and Shopify.

Here’s a short description of some of the presentations we saw:

Hackify

Jean-Claude Batista (@jcbatista), Taswar Bhati (@taswarbhatti), Joel Sachs, Andrew Clunis (@orospakr), (Petro Verkhogliad)

The Hackify team worked on the Shopify dataset, which contains information about US Shopify orders. The team used Ruby and the Hadoop streaming API.

First, the team analyzed where people shop the most. Naturally, the answer was California (the most populous state).
Second, the team wanted to find the date when people had shopped the most, which was August 19th between noon and 6pm ($185,891).

The team concluded that using Ruby with the streaming API makes it easy to do map/reduce and that Hadoop is cool!
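
For readers who have not seen the streaming model: a mapper reads lines and emits key/value pairs, the framework sorts them by key, and a reducer aggregates the values for each key. The sketch below shows that flow for "total order value per state". It is written in Python rather than the team's Ruby, it simulates the sort step locally instead of running on Hadoop, and the tab-separated "state, amount" line layout is an assumption, not the real Shopify dataset format.

```python
from itertools import groupby


def map_orders(lines):
    """Mapper: turn each 'state<TAB>amount' line into a (state, amount) pair."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines
        try:
            yield fields[0], float(fields[1])
        except ValueError:
            continue  # skip lines whose amount is not a number


def reduce_totals(sorted_pairs):
    """Reducer: sum the amounts per state. Input must be sorted by state."""
    for state, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield state, sum(amount for _, amount in group)


# Made-up sample lines, only to show the flow map -> sort -> reduce.
sample_lines = ["CA\t10.50", "NY\t5.00", "CA\t2.25", "not an order"]
for state, total in reduce_totals(sorted(map_orders(sample_lines))):
    print(f"{state}\t{total:.2f}")
```

On a real cluster the mapper and the reducer would be two separate scripts reading from standard input, and Hadoop would do the sorting between them.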

Petro

Petro Verkhogliad (@vpetro)

Petro first worked with the Hackify team but then turned to Python, which he was more familiar with. Petro also analyzed the Shopify shopping data.

Petro found that people shop the most at 9am and 9pm. The most popular shopping day is Friday, with Saturday a close second.

Geographically, most shopping is done by Californians ordering from California.

Team-RIM

Rod Dunne, Marc Lepage, Mohamed Mansour (@mohamedmansour), Alexis Brunet

Code: https://github.com/mlepage/HackReduce

Team-RIM was voted the winner by the participants. They worked on the Google n-grams data and the Amazon review data. For the n-grams, the team calculated the number of alliterative 2-grams by letter per year.
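
As a rough illustration of what "alliterative 2-grams by letter per year" means, here is a small sketch. It is not Team-RIM's code (that is in the repository linked above). The record layout (2-gram, year, match count) and the function names are assumptions made for this example.

```python
from collections import Counter


def alliterative_letter(bigram):
    """Return the shared first letter (lowercased) if both words of the
    2-gram start with the same letter, otherwise None."""
    words = bigram.split()
    if len(words) != 2:
        return None
    first, second = words[0][0].lower(), words[1][0].lower()
    if first.isalpha() and first == second:
        return first
    return None


def count_alliterations(records):
    """Sum match counts of alliterative 2-grams, keyed by (year, letter)."""
    counts = Counter()
    for bigram, year, match_count in records:
        letter = alliterative_letter(bigram)
        if letter is not None:
            counts[(year, letter)] += match_count
    return counts


# Made-up records: (2-gram, year, match count).
sample = [
    ("big bad", 1990, 3),
    ("Peter Piper", 1990, 2),
    ("big dog", 1990, 7),
    ("busy bee", 1991, 1),
]
for (year, letter), total in sorted(count_alliterations(sample).items()):
    print(year, letter, total)
```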

For the Amazon review data, Team-RIM calculated the average rating given on each calendar date, averaged over all of the years in the dataset. The average rating is close to 4 on almost every date. The ratings drop off right after Christmas, i.e. people give worse reviews right after Christmas.

The Amazon dataset also includes data on how useful reviews are. The reviews can be voted up or down on Amazon by users. Team-RIM analyzed this data and came to the conclusion that reviews that give a higher rating to a product are considered more useful.
The team also analyzed the usefulness of reviews based on the review length. Their chart showed that 50-60 character reviews are considered most useful. As a bonus, the team calculated that products received worse reviews as time went by.

Pascal – Analysis of the most desperate daters

Pascal from Hopper wanted to find the most desperate daters. He analyzed the number of profile views per user. According to his analysis, some users check so many profiles that, at 30 seconds per profile, it amounts to full-time work (~17k views per month, which is something like 60 profile views every working hour).
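
To get a feel for the scale, here is the back-of-the-envelope arithmetic, taking the two figures from the post at face value (17,000 views a month and 30 seconds per view). The variable names are just for this example.

```python
views_per_month = 17000   # "~17k views per month"
seconds_per_view = 30     # the 30-second figure from the analysis

# Total time spent viewing profiles, converted from seconds to hours.
hours_per_month = views_per_month * seconds_per_view / 3600
print(round(hours_per_month, 1))  # 141.7
```

That is roughly 140 hours of profile viewing per month for a single user.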

Pascal also found the most visited profiles.

Lastly, Pascal analyzed how users use the Mate1 site. The results were quite surprising: 23% of users only ever view one other profile, and 65% never check more than 10 profiles.

JF

JF (@jeanfrancoisim) from Hopper also analyzed the dating dataset.

First, JF mapped the birth year of users to their hotness by gender. JF calculated hotness with the following formula: (msgs received+msgs sent) * ((msgs received+10)/(msgs sent+10)).
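
Written out as code, the formula looks like this. The function and argument names are ours. Our reading, which is an assumption and not something JF stated, is that the first factor rewards overall activity, the second rewards receiving more than you send, and the +10 keeps the ratio from exploding (or dividing by zero) for users with very few messages.

```python
def hotness(msgs_received, msgs_sent):
    """Hotness as defined in the post:
    (received + sent) * ((received + 10) / (sent + 10))."""
    activity = msgs_received + msgs_sent
    ratio = (msgs_received + 10) / (msgs_sent + 10)
    return activity * ratio


# Same total activity, opposite received/sent balance (made-up numbers).
print(hotness(30, 10))  # 80.0
print(hotness(10, 30))  # 20.0
print(hotness(0, 0))    # 0.0
```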

The red dots represent women and the blue dots men. We can clearly see that the women are, all in all, younger. I’m afraid to draw any other conclusions from the results…

Next JF mapped hotness and height by gender. You could easily see that men are taller than women but it is unclear if height directly influences hotness.

As a last analysis, JF mapped income to hotness. Most people answered “rather not say” to this question, which is why most data points are in the second column. It’s also unclear if income influences hotness.

Team XKCD

Steven Noble (@snoble), Chris Saunders (@chris_saunders), David Underwood (@davefp)

The XKCD team wanted to test the internet “truism” from an xkcd comic: “Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at “Philosophy”.”

The team didn’t really end up with a result for this, since it turned out to be difficult to only include the actual article content and not all of the other links on the page. They started their work in Clojure but later switched to Java.
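
Once the first link of every article has been extracted (the hard part, as the team found), checking the claim is just a matter of following a chain. A minimal sketch, assuming the extraction result is available as a dictionary from article title to first-link title. The function name and the toy data are made up, and this is Python, not the team's Clojure or Java.

```python
def follow_first_links(first_link, start, target="Philosophy", max_hops=100):
    """Follow first links from `start`. Returns (path, reached_target).

    Stops at the target, at a dead end, when a loop is detected,
    or after max_hops hops."""
    path = [start]
    seen = {start}
    current = start
    while current != target and len(path) <= max_hops:
        nxt = first_link.get(current)
        if nxt is None or nxt in seen:
            return path, False  # dead end or loop
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path, current == target


# Toy data, not real Wikipedia links.
toy_links = {
    "Ottawa": "Canada",
    "Canada": "Country",
    "Country": "Philosophy",
    "A": "B",
    "B": "A",
}
print(follow_first_links(toy_links, "Ottawa"))
print(follow_first_links(toy_links, "A"))
```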

Team Dating 8

Martin Samson (@pgdown), Edgar Acosta, David Germain

Team 8 explored the dating dataset with the goal of measuring popularity/hotness and correlating it to other factors.
The team started by analyzing some basic measures for profiles.

They created their own popularity formula.

The team made some nice graphs mapping times listed vs. messages received, times listed vs. profile views, and popularity vs. number of times viewed.

The team used streaming with Python and fought with it for a while. Lesson: Do not use the same file name for the mapper and the reducer scripts.


Team weather and crime

Richard Desmarais, Chris Camden, Philippe Savoie, Ryan McLeod, Eric Ax, Manuel Belmadani (@pragmatwit)

The weather and crime team created a web interface that could answer queries such as “What is the maximum temperature in January 2007?”. When a search query is launched, it runs through the dataset and gives you the answer. The team used Python streaming.
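
A query like that maps naturally onto map/reduce: the map step keys every reading by its month, and the reduce step keeps the maximum per key. The sketch below runs locally on a list instead of on Hadoop. The record layout (an ISO date string and a temperature) and the function names are assumptions, not the team's actual code or the dataset's real format.

```python
def map_by_month(records):
    """Map step: key each (date, temperature) reading by 'YYYY-MM'."""
    for date, temperature in records:
        yield date[:7], temperature


def reduce_max(pairs):
    """Reduce step: keep the highest temperature seen for each key."""
    best = {}
    for key, temperature in pairs:
        if key not in best or temperature > best[key]:
            best[key] = temperature
    return best


def max_temperature(records, year, month):
    """Answer 'what is the maximum temperature in <month> <year>?'.
    Returns None if there are no readings for that month."""
    return reduce_max(map_by_month(records)).get(f"{year:04d}-{month:02d}")


# Made-up readings: (date, temperature in degrees Celsius).
readings = [("2007-01-03", -5.0), ("2007-01-15", 2.5), ("2007-02-01", 4.0)]
print(max_temperature(readings, 2007, 1))
```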

Team UOttawaNLP + others

Chris Fournier (@cfournie), Oana Frunza, Alistair Kennedy, Russell Luo, Dominic Plouffe (@dplouffe)

Team UOttawa analyzed the sentiment of the Amazon reviews. As expected, there were clear differences in sentiment between the good and the bad reviews.

Winner

In the end, we let the participants vote for the best Hack/Reduction. Team-RIM with their n-grams and Amazon analyses took the win… Congrats!

Thanks to everyone for coming. The Hack/Reduce team had a great time and it was amazing to meet you all. We hope you keep hacking and we hope we’ll see you next time!

Ottawa

Hack/Reduce 4 Ottawa, Saturday 13th of August 2011, 10am-8pm at the Language Technologies Research Center.

We’ll spin up clusters with hundreds of nodes for free use.
You can use our datasets or send us your own and hack on anything you want.
We provide the clusters, Hadoop/MapReduce experts, and food and drinks.

Intense coding for 7 hours and presentations at the end of the day. What could be a better way to spend a Saturday?

Just bring your laptop and an idea of what you want to solve.

Contents:

  1. What we offer at the event
  2. How the event works and schedule
  3. How to prepare
  4. Technologies
  5. Examples of what can be built
  6. Datasets
  7. Venue
  8. From the Blog

What we offer at the event:

  • Free access to large clusters (usually a total of 500 nodes, depending on what’s needed.)
  • Hands-on support for popular big data technologies, mentors, Hadoop experts
  • Food and drinks

How the event works and schedule

Basically you show up, pitch your idea if you have one and gather or find a team to work with. We’ll have a short presentation about the infrastructure and how to run your Hadoop jobs (if you will be using Hadoop). Then you just start coding with your team. Food and drinks are served throughout the day. At the end of the day everyone presents their results and/or shares what they did and what they learnt.

Schedule:

10-10.15 Coffee and Introduction
10.15-10.45 Participants get to pitch their ideas and find or gather a team to work with
10.45-11.30 Intro to the infrastructure, tutorial and setup (optional)
11.30-18.30 Development time
19-20 Presentations by the teams, beers
20 Gathering stuff and closing

How to prepare

You don’t have to do any special preparations for Hack/Reduce. However, one of the things we often hear from participants is that they should have RTFM. Here’s how you can prepare:

  • Check out the datasets and the sample code on github and try out some simple stuff
  • If you want us to upload another dataset send a link to us
  • Gather some friends to your team. (You can also find teams at the event.)
  • Figure out what you want to build and prepare to pitch your idea to the other participants; it’s your chance to get a team at the event. You can also join another team at the event.

At the other events we’ve had 10-20 people stand up and give a short 30 second pitch about what they wanted to build or what dataset they were interested in. After the pitches anyone could go talk to the people that had pitched to join their team. We also had several teams that had been pre-formed. We also had some people that worked alone. Anything is possible.

You will need to bring your own laptop to the event; everything else will be provided. We have had bad experiences getting Windows environments to work with Hadoop, so we suggest you bring a Linux or OS X machine.

Technologies

We’ll provide about 10 Amazon clusters that we can scale up according to need (usually up to 500 nodes in total). The cluster nodes are regular EC2 m1.xlarge instances running Ubuntu 10.04. We’ll have Hadoop installed and instructions ready for how to run your Hadoop jobs. If you want to use other technologies, you’ll have to contact us well in advance (2 weeks) or install them yourself on the day of the event. You’ll have full access to the clusters.

Examples of what can be built

You can read examples of what has been built on the pages of the Montreal, Toronto and Boston events. Of course, we encourage you to figure out your own great ideas using any datasets you can find (you’ll have to contact us to make sure we can make the datasets available).

You only have about 7 hours of effective coding time, so you have to take that into account. It’s a good idea to gather a team beforehand and figure out what you want to build before coming to the event.

Datasets

We can make any datasets that you want available for the event. However, you’ll have to contact us before the event so that we have time to upload the dataset (because it can take a long time).

Check out the existing datasets on github.

We can also upload any dataset that you provide a link for. You can also look through the public datasets on Amazon. If you want us to make any of these available for Hack/Reduce you need to contact us. We also have a post about other possible datasets. You can also look through Infochimps for interesting datasets.

Venue

The event will be held at the Language Technologies Research Center:

283 Alexandre-Taché Blvd.
Gatineau, Quebec J8X 3X7
Canada

Saturday, August 13, 2011 from 10:00 AM – 8:00 PM (ET)

