Failing With Grace

09:45 AM - 10:30 AM on August 17, 2014, Room 701

Sean O'Connor

Audience level:
intermediate
Watch:
http://youtu.be/UYllUxqjVBo

Description

One of the biggest challenges of building distributed systems is dealing with failure. In this talk we'll cover a number of approaches and tools to help you build systems that deal with failure as gracefully as possible.

Abstract

Some of the specific topics to be covered include:
  • Async Queues - NSQ and the beauty of pub/sub messaging.
  • Timeouts - Don't get stuck.
  • Smart Retries - Not giving up while not making things worse.
  • Immutable Data - Everything is easier when nothing changes.
  • Monitoring - It's broken, you just don't know it yet.
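
As an illustration of the "Smart Retries" topic above, here is a minimal sketch of retrying with capped exponential backoff and random jitter, so that a struggling service is not hit by synchronized retry bursts. This is not taken from the talk; the function name retry_with_backoff and all of its parameters are hypothetical and chosen only for this example.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Hypothetical helper for illustration: `sleep` and `rng` are injectable
    so the waiting behaviour can be controlled or replaced by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Out of attempts: give up and surface the last error.
            if attempt == max_attempts:
                raise
            # Exponential backoff: the ceiling doubles each attempt,
            # but never exceeds max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter: wait a random fraction of the ceiling so that many
            # clients failing at once do not all retry at the same moment.
            sleep(rng() * ceiling)
```

A caller would wrap any failure-prone call, for example retry_with_backoff(lambda: fetch_profile(user_id)), where fetch_profile stands in for whatever remote call might fail. In practice one would usually catch only errors known to be transient rather than every Exception, and pair the retries with a per-call timeout so a single attempt cannot hang indefinitely.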