This talk will explore legal issues surrounding the storage and use of data.
Software architecture can be an intimidating topic, and a cause of pre-launch angst for anyone who has tried building something from scratch.
We’ll go through the last three years of technical evolution at Tictail as an example of where and when to introduce more complexity, along with some lessons learnt from bad calls over the years.
Brian Faherty in Core Python
*args and **kwargs are all over my code, but what do they really mean?
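As a minimal sketch of the question this abstract asks (the function names `describe` and `area` are made up for illustration): in a definition, `*args` collects extra positional arguments into a tuple and `**kwargs` collects extra keyword arguments into a dict; at a call site the same symbols unpack a sequence or a mapping.

```python
def describe(*args, **kwargs):
    """Return the extra positional and keyword arguments as received."""
    # args is a tuple of positional arguments,
    # kwargs is a dict of keyword arguments.
    return args, kwargs


def area(width, height, scale=1):
    """Ordinary function used to show unpacking at the call site."""
    return width * height * scale


received = describe(1, 2, colour="red")

# At a call site, * unpacks a sequence and ** unpacks a mapping.
dims = (3, 4)
options = {"scale": 2}
result = area(*dims, **options)  # same as area(3, 4, scale=2)
```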
Bugra Akyildiz in Web Development
Asynchronous programming can be loosely defined as letting a program do other tasks while it waits for an I/O operation to complete. This is an important advantage because I/O is slow, and the CPU can work on other tasks while I/O is in progress. It not only removes the wait for I/O to complete, but also lets tasks that are not I/O-bound run efficiently.
In Python, an I/O operation normally blocks the program. To remove this disadvantage and provide asynchronous programming capabilities, the new asyncio (asynchronous I/O) module was introduced to the standard library in Python 3.4.
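A small sketch of the idea, not taken from the talk: two simulated I/O waits run concurrently, so the total time is close to the longest wait rather than the sum. It assumes a newer interpreter than the 3.4 release mentioned above, since it uses `async`/`await` syntax (Python 3.5) and `asyncio.run` (Python 3.7). The coroutine `fake_io` is a hypothetical stand-in for a real network or disk call.

```python
import asyncio
import time


async def fake_io(name, delay):
    # asyncio.sleep stands in for a slow network or disk operation.
    # While this coroutine waits, the event loop runs other coroutines.
    await asyncio.sleep(delay)
    return name


async def main():
    # gather schedules both coroutines and returns results in argument order.
    return await asyncio.gather(fake_io("a", 0.2), fake_io("b", 0.2))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, "finished in about", round(elapsed, 2), "seconds")
```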
Python + Multicorn allows us to define simple data backends for Postgres via its Foreign Data Wrapper interface. This opens up a slew of interesting applications for applying SQL to arbitrary data backends.
Biomedical science is increasingly becoming a quantitative discipline. State-of-the-art technologies can now provide a detailed molecular-level snapshot of an individual, revolutionizing our understanding of disease and fundamental biology. Mass spectrometry is one such enabling technology that profiles (identifies and quantifies) proteins and other biomolecules present in samples like blood, saliva, and tissue. Making sense of the complex data from a mass spectrometer entails application of sophisticated informatics algorithms. At other times, such research relies heavily on exploratory analyses and visualizations to generate new hypotheses for further investigation. This talk will focus on our efforts to build a scalable framework in Python for large-scale mining of mass-spectrometry datasets. We exploit modern “Big Data” technologies in conjunction with Python’s mature data analytics libraries to harness these data in novel ways. In particular, MongoDB, a document-oriented database, will be discussed in the context of our informatics applications.
Building flexible tools to store sums and report on CSV data (or, collections.Counter: Where have you been all my life?!)
If you're new to Python, you might find that you're using Python as if it were C. This talk will demonstrate how to take advantage of Python's special data structures to build tools for analyzing and creating nicely formatted reports from CSV data. ("CSV" stands for "Comma Separated Values", although the term describes flat files where the fields in each row are delimited by commas, tabs, pipe characters, or whatever.)
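A minimal sketch of the kind of tool described, assuming made-up sample data and column names (`region`, `amount`): `collections.Counter` keeps running sums per key without any "is this key present yet?" checks, and `csv.DictReader` handles the parsing.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; a real script would open a file instead.
raw = io.StringIO(
    "region,product,amount\n"
    "east,widget,10\n"
    "west,widget,5\n"
    "east,gadget,7\n"
)

totals = Counter()
for row in csv.DictReader(raw):
    # A Counter returns 0 for missing keys, so += works on first sight.
    totals[row["region"]] += int(row["amount"])

# most_common() yields (key, total) pairs, largest total first.
report = [
    "{:<8}{:>8}".format(region, amount)
    for region, amount in totals.most_common()
]
print("\n".join(report))
```

For tab- or pipe-delimited files, `csv.DictReader` accepts a `delimiter` argument, for example `delimiter="\t"`.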
Allison Kaptur in Core Python
Byterun is a Python interpreter written in Python, built with Ned Batchelder. It's architected to mirror the structure of CPython (and be more readable, too)! Learn how the interpreter is constructed, how ignorant the Python compiler is, and how you use a 1,500 line switch statement every day.
Whether you're looking to make your web app run faster or scale better, one great way to achieve both is to simply do less work. How? By using caches, the data hidey-holes which generations of engineers have thoughtfully left at key junctures in computing infrastructure from your CPU to the backbone of the internet.
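One cache that sits close to application code is in-process memoization. The sketch below is an illustration rather than material from the talk: it uses the standard library's `functools.lru_cache`, and `slow_square` is a hypothetical stand-in for an expensive computation or lookup.

```python
from functools import lru_cache

calls = 0  # counts how often the real work actually runs


@lru_cache(maxsize=128)
def slow_square(n):
    """Stand-in for an expensive computation or remote lookup."""
    global calls
    calls += 1
    return n * n


first = slow_square(12)   # computed, then stored in the cache
second = slow_square(12)  # served from the cache, function body not run
info = slow_square.cache_info()
print(first, second, calls, info)
```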
An unsupervised machine learning algorithm to exploit the underlying data structure in historical stock market returns shows promising classification results with implications for macroeconomic analysis and for creating financial indices.
Jared Lander in Data Analysis
The lasso is one of the most significant machine learning algorithms from the past 15 years. Conceived by Hastie, Tibshirani and Friedman from Stanford, the lasso performs dimension reduction and variable selection making it well suited for the high dimensionality of today's datasets. In this talk we will go over some of the math behind the lasso and discuss some recent advancements in performing inference on lasso-fitted models.
A solution to keep your code base clean in a SaaS environment that is predominantly driven by API web services, which has occasion for minor customization of core web services, based on the needs of your clients.
Marianne Bellotti in Data Analysis
With data based decision making and arguments becoming more and more popular, the opportunities to misconstrue seemingly irrefutable hard facts become more tempting. But should you use your knowledge for truth and justice, or take advantage of the general ignorance about data science to serve your own agenda? Marianne Bellotti talks about the most common ways people attempt to lie using data, how to spot vulnerabilities in a model, and how to protect yourself from criticism.
"Functions are first-class objects in Python." This talk will discuss what we mean by first-class objects and how we can use this language feature to build more efficient, less redundant code with decorators without fear.
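A short sketch of what "first-class" makes possible, using hypothetical names (`count_calls`, `greet`): a function can be passed in, wrapped, and returned like any other object, which is all a decorator does.

```python
import functools


def count_calls(func):
    """Decorator: takes a function, returns a wrapped function."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0  # functions are objects, so they can carry attributes
    return wrapper


@count_calls  # equivalent to: greet = count_calls(greet)
def greet(name):
    return "Hello, " + name


shout = str.upper  # a function bound to another name, like any value
message = shout(greet("PyGotham"))
print(message, greet.calls)
```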
Ned Jackson Lovely
It's inevitable that online communities will change, and that we'll remember the community with a fondness that likely doesn't accurately reflect the former reality. We'll explore how we can take a set of articles from an online community and winnow out the stuff we feel is unworthy. We'll explore some of the machine learning tools that are just a "pip install" away, such as scikit-learn and nltk.
One of the biggest challenges of building distributed systems is dealing with failure. In this talk we'll cover a number of approaches and tools to help you build systems that deal with failure as gracefully as possible.
Tools are often a big influence on success or failure teaching Python in person. The dreaded install and compatibility problems can leave many students convinced that Python isn't for them. I’ll describe my zero-install teaching stack: Python's turtle module for beginners, IPython notebooks for academics, and cloud-based IDE Nitrous for intermediate and advanced Web development.
Follow through three example game projects to learn the most helpful elements of the PyGame library.
Matt's obsession with Python's rich comparison methods started about 6 years ago with a gnarly bug and some bad assumptions. A few long nights and a lot of reading later, the docs corrected my assumption:
There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false.
Mind == blown and mind != blown. In Getting Rich with Comparison Methods we'll start by learning how to avoid common and costly bugs in your rich comparison methods, learn a bit about which methods are executed in comparisons and why, and then go beyond symmetrical comparisons exploring some of the ways we can take full advantage of asymmetrical comparison methods.
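The following sketch illustrates both halves of the abstract under stated assumptions; the classes `AlwaysBoth` and `Meters` are invented for the example. Note that the quoted sentence comes from the Python 2 documentation; in Python 3, `__ne__` defaults to inverting `__eq__`, but the two remain separate methods that a class can still define inconsistently.

```python
class AlwaysBoth:
    """Deliberately inconsistent: == and != are independent methods."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return True


class Meters:
    """Compares equal to other Meters and to plain numbers."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, Meters):
            return self.value == other.value
        if isinstance(other, (int, float)):
            return self.value == other
        # Returning NotImplemented lets Python try the other operand.
        return NotImplemented


mind = AlwaysBoth()
both_true = (mind == "blown") and (mind != "blown")

# int.__eq__ does not know Meters and returns NotImplemented,
# so Python falls back to the reflected call Meters.__eq__(m, 5).
reflected = (5 == Meters(5))

# Neither side knows the other type: the result falls back to identity.
unrelated = (Meters(5) == "five")
```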
Adrian Heilbut in Data Analysis
This tutorial will provide a fast-paced, practical overview of analyzing network graph data using Python, drawing on one case study from computational biology (protein and genetic interaction networks) and one from finance (correlation networks). We will compare and contrast the major libraries available for analyzing graphs with Python (igraph, networkx, and graph-tool) as well as tools for graph visualization. Each section will consist of a 35-minute talk/lecture, followed by a 25-minute guided laboratory exercise (presented as IPython notebooks) to demonstrate and apply the concepts.
Changing demographics in New York State's 13th Congressional District led to a close election that was decided by fewer than 1,100 votes two years ago. The contest between the long-standing incumbent Representative Charles B. Rangel and State Senator Adriano D. Espaillat will be replayed this June. In this talk I will use Python, NumPy, and pandas to analyze campaign contribution data for each candidate provided by the US Federal Election Commission and the United States Census Bureau.
The talk will explain how to group campaign contributors by categories such as State and Occupation, and will show how to plot aggregated statistics as a bar plot and on a map. It will also explore whether one candidate received many small contributions or a small number of big contributions. The talk will explore whether these donation distributions had a bigger impact on the election results than endorsements from well-known politicians. Finally, it will explore the effect of an increase in the Latino population and gentrification.
Python doesn't know how to play chess. But there are chess engines that do. Pystockfish is a small package that integrates the Stockfish chess engine with Python. Its contribution is allowing a Python user to interact with the UCI-based software through an easy-to-use class. The project highlights the multiple ways a novice can contribute to the Python community, with simplicity being a goal in itself.
Meg Winston Ray
A programmer and educator will talk about effective collaboration for secondary computer science education using Python. Meg Winston Ray is a teacher at Bronx Compass High School. Errol King is the creator of Beta the Game. They worked together to teach programming concepts to 9th grade students over the past school year.
After many trials and tribulations, a few months ago I finally got my 10-year-old brother hooked on programming with Python. I posted a brief snippet on reddit's /r/learnpython about it and the post went over very well, with many people asking me for more in-depth instructions and a sort of curriculum they could use. I have created that material and would like to expand upon it with this talk.
Human trafficking is still a huge issue in the world today. The industry commands 32 billion dollars in annual revenue (source: Polaris Project). As a professor and researcher at NYU I am developing tools to fight this horrible problem. Come learn about the work I've done so far and where my research is going. I will be showing you how to recreate some of the tools I've made and used in Python. I will be covering web scraping, text processing, and image processing.
I will be taking folks through a number of web scraping tools that I use to find instances of human trafficking. We will then go through classification algorithms I use to determine genuine instances of trafficking. Finally I will discuss image processing and facial recognition.
Brandon Rhodes in Core Python
While Tolkien had friends who could devise ingenious ways to critique his work without sounding critical, he had others whose remarks were merciless and direct — to the point that Tolkien simply stopped sharing new chapters as he wrote The Lord of the Rings. As programmers we share many of the struggles of writers and artists, and we often react just as badly to critique of our code. From Tolkien’s experience we will draw lessons about how to make critique generous instead of damaging, and actionable instead of personal.
The multi-paradigm flexibility of Python can bite developers new to Python or object-orientation, since self-restraint and design know-how is needed to keep code style paradigm-consistent. Learn about OO principles like SOLID and Tell-Don't-Ask and how they apply in Python for more uniform, testable, and working OO code.
Slides now online here: http://slinkp.com/sisyphus_pygotham_2014/
All code has a design, whether deliberate or not. If you don't think about it, the odds of creating well-designed code are nil. How and why does code get worse as it ages?
This talk describes a particularly common anti-pattern, overuse of inheritance, explains how it gets to be so common, and details how to improve an existing design.
This talk will provide the backstory of Salt, one of the biggest Python-based projects ever, with a focus on how to successfully create an enthusiastic community in support of an active and friendly project.
Everyone loves making their programs faster. Unfortunately, it's easy to waste your time trying to speed up the wrong things. If you want to improve your code's performance, you need to learn to use a profiler. In this presentation you'll learn how to identify the slow parts in your code so you can get the most bang for your buck when performance tuning.
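As a minimal sketch of the workflow, assuming the standard library's `cProfile` and `pstats` (the talk may use other tools) and two made-up functions: profile a run, then sort by cumulative time to see where the time actually goes.

```python
import cProfile
import io
import pstats


def slow_part(n):
    """Deliberately heavy loop, so it dominates the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part():
    return sum(range(100))


def work():
    fast_part()
    return slow_part(200000)


profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Collect the report in a string instead of printing straight to stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries only
report = buffer.getvalue()
print(report)
```

The same profiler can be run over a whole script from the command line with `python -m cProfile -s cumulative script.py`.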
Mike Bayer in Web Development
Bitly provides functionality in our API and clients by implementing microservices, a trendy new word that describes a service-oriented architecture built using some common-sense guidelines. We use the Tornado web framework to implement these services, along with a liberal sprinkling of other languages and tools. I will illustrate our system architecture, our patterns and best practices for building a service, and show how to create a scalable application using these techniques.
Antonio Cesar Vargas in Core Python
This talk will be about my experience creating a Python framework for building a phylogenetic (lineage) tree of Android malware and for making predictions of generative malware using machine learning within the MEDS framework. Most of this work is from my master's thesis. The goal is to release this framework as an open source project for the benefit of the cybersecurity and digital forensics communities. Furthermore, the intention of this talk is to bring awareness to the Python community of how Python is being used to fight malware and make the Internet a safer place.
NumPy arrays combine the speed of C with the convenience of Python. NumPy is the fundamental package for scientific and statistical computing in Python. MongoDB’s scale, speed, and flexibility make it ideal for storing large amounts of data. However, the official MongoDB driver is not optimized for loading MongoDB documents into NumPy arrays. Enter “Monary”, which allows you to easily examine and manipulate data using NumPy arrays. We will explore how Monary can accelerate your scientific analysis while providing you with the scale and flexibility of MongoDB and the ease of Python.
Julie Steele in Data Analysis
The power of Python for visualization lies in its many specialized libraries. But whether you're using Matplotlib, Vispy, Bokeh, or Vincent, these are the principles of design and interaction that you'll want to keep in mind.
Andy Fundinger in Data Analysis
In this lecture we will show an applied case with IPython Notebook, as a quant explores a typical financial problem with the help of various Python libraries and software engineering. The specific case is a small-scale market risk platform using historical simulation to calculate value at risk.
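A rough sketch of historical-simulation value at risk, not the speaker's platform: past returns are replayed against the current portfolio value, and the loss at a chosen empirical quantile is reported. The return series and portfolio value are invented for illustration, and the quantile convention used here (the smallest loss that at least the given fraction of observations do not exceed) is one of several in use.

```python
import math

# Hypothetical daily returns for a single-asset portfolio (illustrative only).
returns = [0.012, -0.008, 0.004, -0.021, 0.009,
           -0.015, 0.003, 0.017, -0.002, -0.030]
portfolio_value = 1000000.0


def historical_var(daily_returns, value, confidence):
    """Loss at the empirical `confidence` quantile of simulated losses."""
    # A negative return is a positive loss; sort from best day to worst.
    losses = sorted(-r * value for r in daily_returns)
    # Index of the smallest loss that covers `confidence` of the sample.
    index = max(0, math.ceil(confidence * len(losses)) - 1)
    return losses[index]


var_90 = historical_var(returns, portfolio_value, 0.90)
worst = historical_var(returns, portfolio_value, 1.0)
print(round(var_90, 2), round(worst, 2))
```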
Hannah Aizenmann in Data Analysis
The Python visualization landscape has a couple of really great libraries for doing data visualization, but most everyone defaults to always using the same library for all their pictures. This talk will give an overview of the philosophies underpinning matplotlib, chaco, bokeh, vispy, vincent, and d3py and discuss what sort of applications each library is best suited for.
Anna Smith in Data Analysis
In this talk I will provide tips on working with the glorious world of batch/ETL systems and explore open source projects to help maintain these data pipelines. Also cats.
The talk will cover that, but also give some real-life performance examples of where PyParallel shines in comparison to the existing options (e.g. against asyncio, Twisted, tornado etc). This will typically be web-server based stuff, as, well you know, that's actually something that works properly in PyParallel :-)
Dwight J. Browne
Integration of IPython Notebook and Julia
An overview of why and how one might use Docker for Python applications
Python as uniting programming language across computer graphics packages and 3D automated manufacturing
If you have heard about 3D printing and are curious about it, then after this talk you will know how to make things using computer-controlled tools, which software tools to use, and where Python fits in.
Python Begets Python: BattleSchool Provisioning, via Ansible, can Self-Document and Configure a Mac to get Productive faster and thus Produce Working Software sooner.
Anne Moroney in Core Python
A main goal of DevOps is to automate everything that's not pure development. Businesses need and want to help their developers produce working software faster, so DevOps is now key. Yet DevOps isn't just for startups and big companies any more. It can be for you! Come to this talk to learn how to get started in the quest to be ever more controlling of your Macintosh OSX system. A common use case is setting up a new Green field machine. This talk will walk you through how to take a machine from wiped to running with your preferred apps and tweaks. As background and context, we will discuss why BattleSchool matters and where it sits in the field. Brown field machine provisioning and other operating systems, e.g. Linux and Windows, will also be considered. Mac is the middle ground today - neither as automatable as Linux nor as unsupported by Ansible and others as Windows. Matt Wright's Docker talk ( http://pygotham.org/talks/16 ) should fit right after this material.
This talk will be an introductory discussion of why Python is an awesome programming language for analyzing, and playing with, natural language. (It’s not the only one, but especially for people just diving in to using a programming language for language play or research, it’s great.) Rather than going into details of algorithms, I'm going to give some simple, easy-to-build-upon examples of how Python and open source Python packages can be used to quickly dive into some really awesome aspects of research/investigation in linguistics, and bring them together to explain, at a high level, why I believe Python is an excellent bridge between linguists interested in programming, beginning programmers interested in linguistics, and any curious people who like figuring stuff out about languages all along a spectrum of formality.
Python is used extensively in the video game industry at many levels. This talk is specifically about how Python based analytics can be integrated operationally into an organization at two levels: in ad hoc investigations and in production analytic services. This is about best practices in documented, reproducible and archived data investigations as well as production development with Python. A case study of an analytic service for cheating detection in Call of Duty Ghosts will make this real and provide some fun. This talk is rated PG-13 for some awesome video game violence.
Amy Hanlon in Core Python
Some behavior of the Python interpreter is really weird. Have you ever wondered why you tend to get bugs when you have mutable default arguments? Or why you shouldn't use the is operator to determine equality of integers? Or why tuples are greater than strings but strings are greater than lists? Let's investigate these odd behaviors, and more, to learn what's really going on. You'll leave this talk with some practical knowledge of how to avoid common bugs and some fun Python trivia.
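The first of these surprises can be sketched in a few lines (function names are invented for the example): a default argument is evaluated once, when the function is defined, so a mutable default is shared between calls.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, at definition time,
    # so every call that omits `bucket` shares the same list.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


bad_first = list(append_bad(1))    # copy, to record the state after call 1
bad_second = list(append_bad(2))   # the earlier item is still in there
good_first = append_good(1)
good_second = append_good(2)
print(bad_first, bad_second, good_first, good_second)
```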
Building service oriented web apps is a great way to separate concerns, parallelize development, and scale high traffic apps. One of the downsides of service oriented web applications, however, is managing the high overhead of communication between services, and handling the additional complexities that come along with service driven development.
In this presentation, Randall Degges, Stormpath Developer Evangelist and OpenCNAM co-founder, will share all of the best practices he learned while building OpenCNAM, supporting billions of API requests.
Daniel Kronovet in Core Python
A walkthrough of the process of creating a good development workflow in Python (with a focus on IPython). This includes organizing the files in your project (and setting up relative imports through packages and modules), writing tests with Unittest, and debugging with pdb. This tutorial will also include a section on the extra features of IPython.
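A small sketch of the testing step, with a hypothetical function `slugify` as the code under test. It runs the suite through a loader and runner so it also works inside an interactive session; in a test file you would more commonly end with `unittest.main()`.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lower-case, hyphen-separated."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello PyGotham 2014"), "hello-pygotham-2014")

    def test_empty_title(self):
        self.assertEqual(slugify(""), "")


# Load and run the tests explicitly instead of calling unittest.main(),
# which would exit the interpreter when it finishes.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)

# To step through a failure, insert `import pdb; pdb.set_trace()` at the
# line of interest, or run the script with `python -m pdb script.py`.
```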
6.8 billion people in the world use SMS every day and yet text apps still are stuck with all caps KEYWORD interfaces for programmatic interaction. This live coding presentation explores the practical application of natural language processing to make SMS more human (and more forgiving for those of us with big thumbs) with a few Python tools.
Co-presented with Juliet Hougland.
Pandas is a fast and expressive library for data analysis that doesn’t naturally scale to more data than can fit in memory. PySpark is the Python API for Apache Spark that is designed to scale to huge amounts of data but lacks the natural expressiveness of Pandas. We will introduce Sparkling Pandas, a new library that brings together the best features of Pandas and PySpark: expressiveness, speed, and scalability.
Speed without drag: making your code faster when there's no time to waste.
Implementing get_one_or_create() for SQLAlchemy.
Aaron Hall in Data Analysis
During this 30 minute talk and tutorial, we'll work through analysis of data with linear models and diagnostics to get an optimal model.
James Powell in Core Python
Software programming is a young discipline, placed somewhere between the rigorous world of mathematics and the pragmatic world of engineering. As evidence of its immaturity as a discipline, consider how frequently the same problems arise in the practice of writing a programme, and how these problems are unaccompanied by widely-disseminated conceptualisations or a commonly agreed-upon pathology or even a well-defined guiding philosophy and epistemology. In other words, we keep running into the same problems, and we often lack even a basis for discussing them (much less avoiding them)!
TSAR (the TimeSeries AggregatoR): how to count tens of billions of daily events in real time using open source technologies
Anirudh Todi in Data Analysis
Twitter depends heavily on real-time event aggregation. Classic timeseries applications include site traffic, service health, and user engagement monitoring; these are increasingly complemented by a range of products and features that surface aggregated timeseries data directly to end users. Services that power such features need to be resilient enough to ensure a consistent user experience, flexible enough to accommodate a rapidly changing product roadmap, and able to scale to tens of billions of events per day.
Experience has shown that truly robust real-time aggregation services are hard to build; scaling and evolving them gracefully is even harder; and, moreover, many timeseries applications call for essentially the same architecture, with slight variations in the data model. Solving this broad class of problems at Twitter has been a multiyear effort. In previous talks we have introduced Summingbird, a high-level abstraction library for generalized distributed computation, which provides an elegant descriptive framework for complex aggregation problems. In this talk, I will describe how we built a flexible, reusable, end-to-end service architecture on top of Summingbird, called TSAR (the TimeSeries AggregatoR).
TSAR uses Python to provide a service toolkit that integrates with essential services that provide data processing, data warehousing, query capability, observability, and alerting, automatically configuring and orchestrating its components.
This talk will present an approach to querying data that directly uses some of Python’s most powerful language features, rather than a separate query language like SQL. We’ll show how to persist data in an ordered key-value store and use generators, itertools, and comprehensions to formulate queries. Examples will be given using FoundationDB, which implements an ordered key-value store on distributed clusters.
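The sketch below illustrates the general idea only. It does not use the FoundationDB API: a sorted in-memory list stands in for the ordered key-value store, and the keys, values, and the helper `get_range` are all hypothetical. A range read becomes a generator, and queries become comprehensions over it.

```python
import bisect
from itertools import groupby, islice

# In-memory stand-in for an ordered key-value store (hypothetical data).
# Keys are tuples and are kept in sorted order, as an ordered store would.
store = sorted([
    (("user", "alice", "age"), 31),
    (("user", "alice", "city"), "NYC"),
    (("user", "bob", "age"), 25),
    (("user", "bob", "city"), "Boston"),
    (("zone", "east"), 1),
])
keys = [key for key, _ in store]


def get_range(prefix):
    """Lazily yield (key, value) pairs whose key starts with `prefix`."""
    start = bisect.bisect_left(keys, prefix)  # first key >= prefix
    for key, value in islice(store, start, None):
        if key[:len(prefix)] != prefix:
            break  # keys are sorted, so the matching range has ended
        yield key, value


# Roughly "SELECT name FROM user WHERE age > 30" as a comprehension.
over_30 = [
    key[1]
    for key, value in get_range(("user",))
    if key[2] == "age" and value > 30
]

# One range scan grouped into per-user records; groupby relies on key order.
records = {
    name: {key[2]: value for key, value in rows}
    for name, rows in groupby(get_range(("user",)), key=lambda kv: kv[0][1])
}
print(over_30, records)
```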
Dean Silfen in Core Python
An introduction to the video processing libraries available in Python.
J. Randall Hunt
The weather is everywhere and always. That makes for a lot of data. This talk will walk you through how you can use MongoDB to store and analyze worldwide weather data from the entire 20th century in a graphical application. You'll learn how to ask and answer questions about capacity planning and scaling for both real-time and ad-hoc operations when dealing with huge datasets.
Learn to read unstructured data from web pages and put them into a useful format
Alfred Lee in Core Python
If you're changing careers into programming, or especially data science, don't be intimidated by the stars and the experts. You may know more than you think you know.
A peek under the hood of all your favorite scientific Python packages.
This tutorial will cover how you would build a messaging server that knows nothing about the messages it is sending or receiving. We will quickly go over some choices of cipher suite and the recommended encryption libraries for Python developers. The server will be extended to also perform identity verification from both the user's and the server's perspective. Finally, we will discuss possible extensions that could make the server more secure and functional.
- This is a light-hearted talk about cryptography and secret sharing; there will be memes.