Enough Machine Learning to Make Hacker News Readable Again

12:15 PM - 12:45 PM on August 17, 2014, Room 704

Ned Jackson Lovely

Audience level:: intermediate
Watch:: http://youtu.be/xI2jidR4RsI

Description

Online communities inevitably change, and we tend to remember them with a fondness that probably doesn't reflect how they really were. We'll explore how we can take a set of articles from an online community and winnow out the stuff we feel is unworthy. We'll explore some of the machine learning tools that are just a "pip install" away, such as scikit-learn and nltk.

Abstract

Machine learning can be an intimidating topic, but it should exhilarate rather than intimidate: sure, this stuff is science, but it's science you can do on your laptop.

We'll discuss how machine learning is actually an accessible topic. There is an interesting set of tools you can play with; there might still be some rocket science involved, but we can treat a fair amount of the truly fancy math as a black box. We'll explore gathering a training set, turning blobs of text into usable data, training models, and the magic that is scikit-learn.
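To make those steps concrete, here is a minimal sketch of the whole loop: a hand-labeled training set, text turned into word counts, and a model trained on them. It is not the code from the talk. The headlines and the "read"/"skip" labels are invented for illustration, and the helper names (tokenize, train, classify) are hypothetical. It uses only the standard library and a hand-rolled naive Bayes classifier so it runs without installing anything; scikit-learn ships ready-made, better-tested versions of each step.

```python
import math
import re
from collections import Counter

# Hypothetical hand-labeled training set: (headline, label).
# These titles are invented for illustration, not real Hacker News data.
TRAINING_SET = [
    ("Profiling Python code with cProfile", "read"),
    ("A gentle introduction to text classification in Python", "read"),
    ("Writing a tiny interpreter in Python", "read"),
    ("How our startup raised a seed round", "skip"),
    ("Ten growth hacks every startup founder needs", "skip"),
    ("Why this startup pivoted after its seed round", "skip"),
]


def tokenize(text):
    """Turn a blob of text into a list of lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Count words per label (a bag-of-words model) plus documents per label."""
    word_counts = {}          # label -> Counter of token frequencies
    label_counts = Counter()  # label -> number of training documents
    vocabulary = set()
    for text, label in examples:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        # Start from the log prior: how common this label is overall.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids log(0) for words
            # that appear under one label but not the other.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_SET)
print(classify(model, "Debugging Python with a profiler"))
print(classify(model, "Startup growth after the seed round"))
```

With six training headlines this is a toy, but the shape is the same at any size: label more articles, retrain, and check the predictions against articles the model has not seen.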

We'll use this amazing amount of processing power, data, and science to separate Hacker News articles into buckets of stuff I want to read and stuff I want to pretend doesn't exist. Attendees should learn enough to start experimenting with machine learning themselves.