Python for Curious People who Like Natural Language a Lot

04:15 PM - 05:00 PM on August 16, 2014, Room 705

Jackie Cohen

Audience level: novice
Watch: http://youtu.be/aJUi-PDjb6E

Description

This talk will be an introductory discussion of why Python is an awesome programming language for analyzing, and playing with, natural language. (It’s not the only one, but it’s great, especially for people just diving into using a programming language for language play or research.) Rather than going into the details of algorithms, I’m going to give some simple, easy-to-build-upon examples of how Python and open source Python packages can be used to quickly dive into some really awesome aspects of research and investigation in linguistics. I will then bring those examples together to explain, at a high level, why I believe Python is an excellent bridge between linguists interested in programming, beginning programmers interested in linguistics, and any curious people who like figuring stuff out about languages, all along a spectrum of formality.

Abstract

A lot of nifty natural language processing applications and scripts can be written in many languages, but Python’s wealth of tools and its many good starting points for learning make it particularly good for doing simple research about natural language and for figuring out cool stuff about how people speak and write. It is also a good tool for thinking critically about languages (both formal and natural!).

I plan to address at least a subset of the following questions: How do those ‘what famous author do you most write like’ tools really work? For people who self-identify as technical but have no linguistics background, what’s a good way to think about the question ‘what is linguistics research’? What sort of stuff should you be thinking about if you know lots of stuff about some natural language or natural language systems and want to write a program to help you learn more? How much math do you need to know to figure out some cool things from a big text corpus, or from a few of them (answer: not much)? What is a corpus anyway, and how can I find one that’ll be useful to me?

I will show a few brief implementations of: character set counting; classification of a couple of different types (phonetic -- that’s sounds, semantic -- that’s meaning); a Markov chain; some machine translation

and I will explain what the results of those small pieces of code mean and why they matter. Overall, I will use these examples to show how, with a bit of curiosity, a little bit of experience with Python, comfort reading some technical documentation, and a moderately fast internet connection, you can come to novel conclusions about patterns of speech, word games, pronoun use, or any of a ton of other aspects of natural language -- pretty useful, since natural language touches pretty much everything.
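
To give a flavor of how small such pieces of code can be, here is a minimal sketch of two of the items listed above: character counting and a word-level Markov chain. It is not the code from the talk. The function names (count_characters, build_chain, generate) and the sample sentence are made up for illustration, and it uses only the Python standard library.

```python
import random
from collections import Counter, defaultdict


def count_characters(text):
    """Count how often each letter appears, ignoring case and non-letters."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def build_chain(words):
    """Map each word to the list of words that follow it in the text."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=None):
    """Walk the chain from `start`, picking each next word at random."""
    rng = random.Random(seed)
    output = [start]
    while len(output) < length:
        followers = chain.get(output[-1])
        if not followers:
            # Dead end: this word was never followed by anything.
            break
        output.append(rng.choice(followers))
    return " ".join(output)


# Hypothetical toy "corpus"; a real one would be far larger.
sample = "the cat sat on the mat and the cat ate the fish"

counts = count_characters(sample)
chain = build_chain(sample.split())

print(counts.most_common(3))
print(generate(chain, "the", 8, seed=1))
```

Because repeated followers are stored as repeated list entries, a word that follows "the" more often in the text is proportionally more likely to be chosen next. That is the whole trick behind a simple Markov text generator, and it needs no more math than counting.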