Building Streaming REST APIs with Ruby

Twitter popularized the term "firehose API" to mean a realtime stream of data sent over a persistent connection. But even if you're not running a realtime service, streaming APIs are great for pushing data from the backend to clients. They reduce resource usage because the server can decide when it's a good time to send an incremental chunk of data, and they can improve the responsiveness of your user experience. The same HTTP API can also be reused to power multiple apps. For example, you could write your web frontend with a JavaScript framework like [Backbone.js](http://documentcloud.github.com/backbone/), then reuse the same API to power a native iOS application. Follow the jump to read about how streaming APIs work and how you can write one with [Rack::Stream](https://github.com/intridea/rack-stream).

TL;DR

Rack::Stream is rack middleware that lets you write streaming API endpoints that understand HTTP, WebSockets, and EventSource. It comes with a DSL and can be used alongside other rackable web frameworks such as Sinatra and Grape.

What's Streaming HTTP?

Normally, when an HTTP request is made, the server closes the connection when it's done processing the request. For streaming HTTP, also known as Comet, the main difference is that the server doesn't close the connection and can continue sending data to the client at a later time.

(Diagram: normal HTTP — the server closes the connection after responding.)

(Diagram: streaming HTTP — the connection stays open and chunks keep arriving.)

To prevent the connection from closing, rack-stream uses Thin's 'async.callback' to defer closing the connection until either the server decides to close it or the client disconnects.
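To see what that deferral looks like without rack-stream, here's a minimal sketch of Thin's async API. The `env['async.callback']` key and the `-1` status are Thin conventions; invoking the callback immediately is a simplification — in a real app you'd call it later, e.g. from an EventMachine timer:

```ruby
class AsyncApp
  # Returning status -1 tells Thin "don't respond yet"; the real
  # response is delivered whenever we invoke env['async.callback'].
  def call(env)
    callback = env['async.callback']

    # In a real app this would run later, e.g. inside an
    # EM.add_timer block; we call it immediately for brevity.
    callback.call([200, {'Content-Type' => 'text/plain'}, ['deferred']])

    [-1, {}, []] # keep the connection open
  end
end
```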

Rack::Stream

Rack::Stream is rack middleware that lets you write streaming HTTP endpoints that can understand multiple protocols. Multiple protocols means that you can write an API endpoint that works with curl, and that same endpoint will also work with WebSockets in the browser. The simplest streaming API you can make is:

# config.ru
# run with `thin start -p 9292`
require 'rack-stream'

class App
  def call(env)
    [200, {'Content-Type' => 'text/plain'}, ["Hello", " ", "World"]]
  end
end

use Rack::Stream
run App

If you run this basic rack app, you can then use curl to stream its response:

> curl -i -N http://localhost:9292/

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: close
Server: thin 1.3.1 codename Triple Espresso

Hello World

This isn't very exciting, but you'll notice that the Transfer-Encoding of the response is set to chunked. By default, rack-stream takes any downstream application's response body and streams it over in chunks. You can read more about [chunked transfer encoding on Wikipedia](https://en.wikipedia.org/wiki/Chunked_transfer_encoding).
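The encoding itself is simple: each chunk is prefixed with its byte length in hexadecimal, and a zero-length chunk terminates the body. A rough sketch of what the server writes on the wire:

```ruby
# Encode one chunk: hex byte length, CRLF, the data, CRLF.
def encode_chunk(data)
  format("%x\r\n%s\r\n", data.bytesize, data)
end

# A full chunked body ends with a zero-length chunk.
def encode_chunked_body(*chunks)
  chunks.map { |c| encode_chunk(c) }.join + "0\r\n\r\n"
end

encode_chunked_body("Hello", " ", "World")
# => "5\r\nHello\r\n1\r\n \r\n5\r\nWorld\r\n0\r\n\r\n"
```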

Let's spice it up a bit and build an actual firehose. This next application will keep sending data to the client until the client disconnects:

require 'rack-stream'

class Firehose
  include Rack::Stream::DSL

  def call(env)
    EM.add_periodic_timer(0.1) {
      chunk "\nChunky Monkey"
    }
    [200, {'Content-Type' => 'text/plain'}, ['Hello']]
  end
end

use Rack::Stream
run Firehose

The first thing to notice is that the Firehose rack endpoint includes Rack::Stream::DSL. This module provides convenience methods for accessing env['rack.stream'], which is injected into env whenever you use Rack::Stream. When a request comes in, the #call method schedules a timer that fires every 0.1 seconds and uses the #chunk method to stream data. If you run curl, you'll see:

> curl -i -N http://localhost:9292/

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Connection: close
Server: thin 1.3.1 codename Triple Espresso

Hello
Chunky Monkey
Chunky Monkey
Chunky Monkey
# ... more monkeys

rack-stream also lets you register callbacks to manipulate response chunks and to control when data is sent. Here's a more advanced example with callbacks added:

require 'rack-stream'

class Firehose
  include Rack::Stream::DSL

  def call(env)
    after_open do
      chunk "\nChunky Monkey"
      close  # start closing the connection
    end

    before_chunk do |chunks|
      chunks.map(&:upcase)  # manipulate chunks
    end

    before_close do
      chunk "\nGoodbye!"  # send something before we close
    end

    [200, {'Content-Type' => 'text/plain'}, ['Hello']]
  end
end

use Rack::Stream
run Firehose

If you ran curl now, you would see:

> curl -i -N http://localhost:9292/

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Connection: close
Server: thin 1.3.1 codename Triple Espresso

HELLO
CHUNKY MONKEY
GOODBYE!

For details about the callbacks, see the project page.
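To make the chunk filtering concrete, here's a toy model — not rack-stream's actual internals, just an illustration — of how before_chunk blocks compose: each registered block receives the array of pending chunks and returns a transformed array, and the blocks run in registration order.

```ruby
# Toy model of chunk filtering: every registered before_chunk block
# is applied, in order, to each batch of chunks before it is sent.
class ChunkPipeline
  def initialize
    @filters = []
    @sent = []
  end

  def before_chunk(&blk)
    @filters << blk
  end

  def chunk(*chunks)
    @filters.each { |f| chunks = f.call(chunks) }
    @sent.concat(chunks)
  end

  attr_reader :sent
end

pipeline = ChunkPipeline.new
pipeline.before_chunk { |chunks| chunks.map(&:upcase) }
pipeline.chunk("Hello", "Chunky Monkey")
pipeline.sent # => ["HELLO", "CHUNKY MONKEY"]
```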

Up until this point, I've only used curl to demonstrate hitting the rack endpoint, but one of the big benefits of rack-stream is that it automatically recognizes WebSocket and EventSource requests and streams through those as well. For example, you could write an HTML file that accesses the same endpoint:

<html>
<body>
  <script type='text/javascript'>
    var socket       = new WebSocket('ws://localhost:9292/');
    socket.onopen    = function()  {alert("socket opened")};
    socket.onmessage = function(m) {alert(m.data)};
    socket.onclose   = function()  {alert("socket closed")};
  </script>
</body>
</html>

Whether you access the endpoint with curl, AJAX, or WebSockets, your backend API logic doesn't have to change.

For the last example, I'll show a basic chat application using Grape and Rails. The full runnable source is included in the examples/rails directory.

require 'grape'
require 'rack/stream'
require 'redis'
require 'redis/connection/synchrony'

class API < Grape::API
  default_format :txt

  helpers do
    include Rack::Stream::DSL

    def redis
      @redis ||= Redis.new
    end

    def build_message(text)
      redis.rpush 'messages', text
      redis.ltrim 'messages', 0, 50
      redis.publish 'messages', text
      text
    end
  end

  resources :messages do
    get do
      after_open do
        # subscribe after_open b/c this runs until the connection is closed
        redis.subscribe 'messages' do |on|
          on.message do |channel, msg|
            chunk msg
          end
        end
      end

      status 200
      header 'Content-Type', 'application/json'
      chunk *redis.lrange('messages', 0, 50)
      ""
    end

    post do
      status 201
      build_message(params[:text])
    end
  end
end

This example uses Redis pub/sub to push out messages created by #post. Thanks to em-synchrony, requests are not blocked while no messages are being sent. It's important to do the Redis subscribe after the connection has been opened; otherwise, the initial response won't be sent.
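To post a message from Ruby instead of curl, a client could look like the sketch below. The host and port are hypothetical (they assume the Grape app is running locally on 9292); the request is only built here, not sent:

```ruby
require 'net/http'
require 'uri'

# Hypothetical location of the chat API defined above.
uri = URI('http://localhost:9292/messages')

post = Net::HTTP::Post.new(uri)
post.set_form_data('text' => 'hi there')

# To actually send it (requires the server to be running):
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(post) }
```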

What about socket.io?

socket.io is great because it provides many transport fallbacks for maximum compatibility across browsers, but its pub/sub interface is too low-level to capture common app semantics. The application developer doesn't get nice REST features like HTTP verbs, resource URIs, parameter and response encoding, or request headers.

The goal of rack-stream is to provide clean REST-like semantics while you're developing, but allow you to swap out different transport protocols. Currently, it supports normal HTTP, WebSockets, and EventSource, but the goal is to support more protocols over time and to allow custom ones. With this architecture, socket.io could become just another protocol handler plugged into rack-stream. If you wanted to use Pusher as a transport, that could also be written as a handler for rack-stream.

Summary

rack-stream aims to be a thin abstraction that lets Ruby developers write streaming APIs with their preferred frameworks. I plan to broaden support and test against common use cases and popular frameworks like Sinatra and Rails. If you have any questions or comments, feel free to submit an issue or leave a comment below!