Using Python with an Ordered Key-Value Store

04:15 PM - 04:45 PM on August 16, 2014, Room 702

Stephen Pimentel

Audience level: intermediate
Watch: http://youtu.be/q0xF3X6cr90

Description

This talk will present an approach to querying data that directly uses some of Python’s most powerful language features, rather than a separate query language like SQL. We’ll show how to persist data in an ordered key-value store and use generators, itertools, and comprehensions to formulate queries. Examples will be given using FoundationDB, which implements an ordered key-value store on distributed clusters.
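An ordered key-value store keeps its keys sorted, so a range read returns a contiguous run of entries in key order, and a generator can stream those entries one at a time instead of loading them all. The sketch below is a hypothetical in-memory stand-in: the `OrderedStore` class and its `get_range`/`get_prefix` methods are illustrative names, not FoundationDB's API, and the tuple keys of the form `(sender, date, message_id)` are an assumption. FoundationDB itself stores byte-string keys; its tuple layer encodes tuples so that byte order matches tuple order, which is what makes keys like these usable there.

```python
import bisect
from typing import Any, Iterator, Tuple


class OrderedStore:
    """Hypothetical in-memory stand-in for an ordered key-value store.

    Keys are tuples kept in sorted order, so a range read returns a
    contiguous run of keys. This is NOT the FoundationDB API.
    """

    def __init__(self) -> None:
        self._keys: list = []  # sorted list of keys
        self._data: dict = {}  # key -> value, dictionary-style access

    def __setitem__(self, key: tuple, value: Any) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)  # keep keys sorted on insert
        self._data[key] = value

    def __getitem__(self, key: tuple) -> Any:
        return self._data[key]

    def get_range(self, begin: tuple, end: tuple) -> Iterator[Tuple[tuple, Any]]:
        """Lazily yield (key, value) pairs with begin <= key < end."""
        i = bisect.bisect_left(self._keys, begin)
        while i < len(self._keys) and self._keys[i] < end:
            key = self._keys[i]
            yield key, self._data[key]
            i += 1

    def get_prefix(self, prefix: tuple) -> Iterator[Tuple[tuple, Any]]:
        """Lazily yield pairs whose key starts with `prefix`.

        Keys sharing a prefix are contiguous in sorted order, so the scan
        stops at the first key that does not match.
        """
        i = bisect.bisect_left(self._keys, prefix)
        n = len(prefix)
        while i < len(self._keys) and self._keys[i][:n] == prefix:
            key = self._keys[i]
            yield key, self._data[key]
            i += 1


# Usage with invented sample data (keys inserted out of order on purpose).
store = OrderedStore()
store[("bob", "2001-05-02", 3)] = "Lunch?"
store[("alice", "2001-05-03", 2)] = "Re: Q2 forecast"
store[("alice", "2001-05-01", 1)] = "Q2 forecast"

# Iterates alice's messages in date order, regardless of insertion order.
for key, subject in store.get_prefix(("alice",)):
    print(key, subject)
```

Because all keys beginning with `("alice",)` sit next to each other, `get_prefix` touches only those entries; putting the field you most often query by first in the key is what makes such scans cheap.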

Abstract

When developers need to store and query data, they often think first of SQL, whether using a lightweight tool like SQLite or a full-fledged ORM like SQLAlchemy. For many data storage tasks, though, SQL is more machinery than we really need.

This talk presents an alternative approach that directly employs some of Python’s most powerful language features. Using a distributed key-value store, we can make our data persistent with an interface similar to a Python dictionary. Python then gives us a number of tools “out of the box” that we can use to form queries:

generators for memory-efficient data retrieval;

itertools to filter and group data;

comprehensions to assemble the query results.
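The three tools listed above can be combined in a single query. A minimal sketch, assuming emails keyed by hypothetical `(sender, date, message_id)` tuples and a small invented sample (not the actual Enron data), with a sorted Python list and `bisect` standing in for the store's range read: a generator scans a key range lazily, a generator expression filters by date, `itertools.groupby` groups consecutive rows by sender, and a dict comprehension assembles the counts.

```python
import bisect
import itertools
from typing import Dict, Iterator, Tuple

# Invented sample rows (not real Enron data). Keys are hypothetical
# (sender, date, message_id) tuples; sorting the list mimics the key
# order an ordered key-value store maintains.
EMAILS = sorted([
    (("carol", "2001-05-04", 5), "Re: Meeting notes"),
    (("alice", "2001-05-01", 1), "Q2 forecast"),
    (("bob", "2001-05-02", 3), "Lunch?"),
    (("carol", "2001-04-30", 4), "Meeting notes"),
    (("alice", "2001-05-03", 2), "Re: Q2 forecast"),
])
KEYS = [key for key, _ in EMAILS]


def scan(begin: tuple, end: tuple) -> Iterator[Tuple[tuple, str]]:
    """Generator: lazily yield (key, subject) pairs with begin <= key < end."""
    i = bisect.bisect_left(KEYS, begin)
    while i < len(EMAILS) and EMAILS[i][0] < end:
        yield EMAILS[i]
        i += 1


def messages_per_sender(begin: tuple, end: tuple, since: str) -> Dict[str, int]:
    """Count messages per sender in a key range, dated on or after `since`."""
    rows = scan(begin, end)
    # Generator expression: filter lazily on the date component of the key.
    recent = (row for row in rows if row[0][1] >= since)
    # groupby merges consecutive rows with the same sender; key order
    # guarantees each sender's rows are adjacent.
    groups = itertools.groupby(recent, key=lambda row: row[0][0])
    # Dict comprehension assembles the result; sum() consumes each group.
    return {sender: sum(1 for _ in group) for sender, group in groups}


print(messages_per_sender(("a",), ("z",), since="2001-05-01"))
# {'alice': 2, 'bob': 1, 'carol': 1}
```

`groupby` only merges adjacent rows, so it relies on the scan returning keys already ordered by sender. With an ordered store that ordering comes for free, and no separate sort is needed before grouping.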

Taken together, these features give us a powerful query capability, and most of it is plain Python. We'll walk through a number of example queries using the Enron email dataset.

By the end of this talk, you'll have an understanding of some sophisticated but accessible Python features that are immediately useful for manipulating data.