Queue your async updates

Asynchronous XHR updates from a JavaScript client app may not be processed in the order they are sent.

I think this is obvious and not something that would surprise many developers. Nonetheless it bears mentioning because it is a principle that becomes increasingly important as interfaces become more dynamic.

Motivation

While our basic tools for making asynchronous updates haven’t changed in a while, persistence interfaces have. Save buttons are being replaced with unobtrusive “Saving…/All changes saved” notices. Submit buttons are disappearing from forms. Forms themselves are being replaced with more “tactile” controls that seamlessly persist updates upon interaction.

The result is that users generate many more update requests, and these requests often come in quick succession. The more updates there are, the better the chance that something other than the final request is processed last, overwriting a more recent update.
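
To make the failure mode concrete, here is a minimal sketch (the endpoint and payloads are hypothetical): two autosaves fired in quick succession, with nothing guaranteeing they are processed in order.

// two rapid autosave requests for the same resource
$.ajax({ url: '/notes/42', type: 'PUT', data: { body: 'draft v1' } });
$.ajax({ url: '/notes/42', type: 'PUT', data: { body: 'draft v2' } });
// if the server happens to process the first request last,
// 'draft v1' overwrites 'draft v2' and the user's latest edit is lost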

Why we don’t think about this

This has always been an issue with asynchronous updates. It is only because out-of-order updates are so difficult to trigger with type-then-click-save interaction modes that we seldom, if ever, had to think about them. Further, developing applications on a fast local connection against (often) a single server process makes it practically impossible to trigger out-of-order updates.

However, not thinking about it doesn’t make the problem go away. This is going to happen in your production apps. You might as well plan for it instead of waiting for inconsistent data and frustrated users.

What to do

Debounce

For some types of interactions (e.g. text areas that listen for changes), debouncing updates helps considerably but never eliminates the problem. In these cases it is still worth debouncing to avoid sending an unreasonable number of update requests. For this purpose I am partial to Underscore's debounce function:

_.debounce
// Underscore.js 1.4.3
// http://underscorejs.org
// (c) 2009-2012 Jeremy Ashkenas, DocumentCloud Inc.
// Underscore may be freely distributed under the MIT license.
//
// note: very slightly modified to not use _ namespace
var debounce = function(func, wait, immediate) {
  var timeout, result;
  return function() {
    var context = this, args = arguments;
    var later = function() {
      timeout = null;
      if (!immediate) result = func.apply(context, args);
    };
    var callNow = immediate && !timeout;
    clearTimeout(timeout);
    timeout = setTimeout(later, wait);
    if (callNow) result = func.apply(context, args);
    return result;
  };
};
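
For example, a sketch of wiring this to a textarea (the selector and endpoint are hypothetical): the update fires only after the user pauses typing for half a second.

var saveNote = debounce(function() {
  $.ajax({ url: '/notes/42', type: 'PUT', data: { body: $('#note').val() } });
}, 500);

// at most one request per pause in typing, instead of one per keystroke
$('#note').on('keyup', saveNote);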

Queue your async updates

The only reliable solution is forcing updates to run sequentially1. While every application is different, a general solution that covers many common interaction models is simply queueing updates. I use a short script that wraps jQuery.ajax to queue requests while others are in progress. It also wraps completion callbacks to initiate queued requests. It manages one or more request queues that are scoped using a string identifier2.

AjaxQ
// AjaxQ jQuery Plugin
// Copyright (c) 2012 Foliotek Inc.
// MIT License
// https://github.com/Foliotek/ajaxq
//
// note: modified for formatting, removed comments and helper functions
(function($) {
  var queues = {};

  $.ajaxq = function(qname, opts) {
    if (typeof opts === "undefined") {
      opts = qname;
      qname = opts.url;
    }

    var deferred = $.Deferred(),
        promise = deferred.promise();

    promise.success = promise.done;
    promise.error = promise.fail;
    promise.complete = promise.always;

    var clonedOptions = $.extend(true, {}, opts);

    enqueue(function() {
      var jqXHR = $.ajax.apply(window, [clonedOptions]).always(dequeue);

      jqXHR.done(function() { deferred.resolve.apply(this, arguments); });
      jqXHR.fail(function() { deferred.reject.apply(this, arguments); });
    });

    return promise;

    function enqueue(cb) {
      if (!queues[qname]) {
        queues[qname] = [];
        cb();
      } else {
        queues[qname].push(cb);
      }
    }

    function dequeue() {
      var nextCallback = queues[qname].shift();
      if (nextCallback) {
        nextCallback();
      } else {
        delete queues[qname];
      }
    }
  };
})(jQuery);
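
A usage sketch (endpoints hypothetical): with the URL-scoped default, updates to the same resource queue behind one another, while updates to other resources proceed in parallel.

// drop-in replacement for $.ajax; queued by URL by default
$.ajaxq({ url: '/notes/42', type: 'PUT', data: { body: 'first edit' } });
$.ajaxq({ url: '/notes/42', type: 'PUT', data: { body: 'second edit' } }); // waits for the first

// or scope requests explicitly with the optional queue name parameter
$.ajaxq('notes', { url: '/notes/42', type: 'PUT', data: { body: 'third edit' } });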

Summary

Updates are not guaranteed to be processed in the order they are fired. This is something everyone needs to consider when building client interfaces. The above queueing strategy is a simple way to address this problem. There are of course more sophisticated solutions available but simple resource-scoped request queues are suitable for a large class of apps.


  1. You can of course set XHR requests to be synchronous, but this is almost always a terrible idea. There is no sense in blocking the entire application when we need only avoid making concurrent updates to a single resource.

  2. The original script required an additional parameter for the queue name. This is a little verbose and prevents the wrapper from being used as a drop-in replacement for jQuery.ajax. My (extremely minor) fork modifies it to default to the request URI. This improves the interface and gives good granularity for apps with granular resource URIs. If you are working on an app that has very coarse resource URIs (e.g. every update is a POST to /update.php) then you might want to use the optional second queue name parameter or else modify the script to unpack scoping information from the data property being passed.

No-downtime deploys with Unicorn

As mentioned in my last post, a surprising number of Unicorn users do not take advantage of one of its best features: no-downtime deploys. Why? There are some common pitfalls that can make them difficult to set up. I want to help address these issues so that more people can make use of this amazing feature.

What are no-downtime deploys?

There does not have to be a service interruption whenever you:

  • deploy a new version of your app
  • rollback to a previous version of your app
  • upgrade the version of Unicorn you are using
  • upgrade the version of Ruby you are using

And when I say “does not have to” I mean “should not”.

No-downtime deploys need not be an out-of-band upgrade process requiring you to log into production servers and monkey around with live config files using your favourite text editor. Even for a deploy that changes application code, updates Unicorn, and also upgrades to the latest Ruby, I just run $ cap deploy as I would with any other deploy.

Why isn’t everyone using this?

I suspect many people read about no-downtime deploys, try them out, and then give up as soon as something goes wrong. Something often does go wrong, at least initially, because this feature does not (and cannot) work out of the box in all environments.

Successfully running no-downtime deploys depends on a number of factors including (but not limited to):

  • how Ruby is installed
  • whether you are using a sandboxing system (e.g. RVM or rbenv)
  • how you are using Bundler
  • how you automate deploys (e.g. with Capistrano, Vlad, Puppet or Chef)
  • what the resource utilization on your production servers looks like (e.g. if you are pegging CPUs or close to the memory limit)

So unfortunately, just using the unicorn.rb config file from Github’s Unicorn blog post will likely fail. You will need to learn a bit about how Unicorn works and then configure it to work with your setup.

TL;DR

This is a fairly long post and there is a fair chance you will get bored reading it.

If you don’t want to figure things out yourself but don’t want to read about all of the ways that things can fail, just skip to the What I do section and adapt my code to your environment.

If you find my explanation verbose and/or confusing you can read SIGNALS along with Tips for using Unicorn with Sandbox installation tools and figure out something that works for you and your production environment.

If you just want a high-level overview of the Unicorn upgrade process read SIGNALS, especially the “Procedure to replace a running unicorn executable” section.

Unicorn’s reexec API

Unicorn facilitates no-downtime deploys by means of a simple reexec API1:

  1. send the original Unicorn master process the USR2 signal
  2. original master forks itself and then execs in the original working directory, with the exact same command and arguments that were used to create the original master process2
  3. at this point there will be two master processes; if the new master process has spawned workers, both original workers and new workers will be responding to requests
  4. shut down the original workers3
  5. shut down the original master
  6. new workers and master remain running and are responding to requests
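
In shell terms, the procedure looks roughly like this (the pid file path is hypothetical; adjust it to your setup):

OLD_PID=$(cat /path/to/app/shared/pids/unicorn.pid)
kill -USR2 $OLD_PID    # steps 1-2: fork and exec a new master
sleep 30               # step 3: give the new master and workers time to boot
kill -WINCH $OLD_PID   # step 4: gracefully stop the original workers
kill -QUIT $OLD_PID    # step 5: shut down the original master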

Now, this is how a successful reexec goes. If there was a problem with the new code, you can of course shut down the new master and workers instead, leaving the original master and workers to continue serving requests.

The simplest thing that will work

Let’s start off with a clean slate. We will assume the following setup:

  • system-wide install of Ruby and rubygems somewhere general like /usr/bin/ruby
  • Ruby binary has no version-related suffix (e.g. /usr/bin/ruby1.8)
  • Ruby binary isn’t a symlink to a binary with a version-related suffix
  • no Bundler, RVM, rbenv, Capistrano, Vlad, etc.
  • application code is a simple checkout or even just a bunch of files in a fixed directory
  • deploy consists of somehow updating code in place (not making a new directory and changing a symlink), and manually installing more recent versions of gems using the rubygems CLI directly

This is the simplest configuration imaginable. So simple in fact that I would be surprised if many people actually run something like this in production. However, it is useful as a starting point for our purposes.

If this is what your environment looks like, then the upgrade process described above will likely work without issue. However, almost any change you make will break it. I’m going to walk through some examples of environment modifications and the corresponding configuration changes needed to handle them.

Changing the source location on deploy

This is most commonly encountered in setups where a new directory containing app code is created on each deploy. For example, Capistrano’s default configuration creates a new /path/to/app/releases/123456789 directory on each deploy and then updates the /path/to/app/current symlink to point to it.

Unicorn remembers the exact path you originally deployed from and subsequently attempts to cd to it before each reexec. If you are deploying from a new directory each time you will have to tell Unicorn:

config/unicorn.rb
app_root = "/path/to/app/current"
working_directory app_root

If you don’t do this, your first few deploys will appear to work (although you will be repeatedly restarting the original version). You will be made painfully aware that something is wrong once Capistrano prunes the oldest release dir and your next deploy fails because Unicorn can no longer cd to it.

You can alternatively do something like:

config/unicorn.rb
app_root = File.expand_path(File.join(File.dirname(__FILE__), '..'))
working_directory app_root

instead of hard-coding the path so that you can run your setup in your development or staging environment as well.

Changing the source location on deploy AND using Bundler

There is another issue that results from source location changes. As of Bundler 1.0.3 the executable template fully resolves all symlinks when determining the Gemfile path. So when using Capistrano this will be something like /path/to/app/releases/123456789/Gemfile when it should be /path/to/app/current/Gemfile. This means Bundler will always try to load the gem environment from your original deploy instead of the current one. The solution is adding the following to your config:

config/unicorn.rb
before_exec do |server|
  ENV["BUNDLE_GEMFILE"] = "#{app_root}/Gemfile"
end

Changing the location of the Unicorn executable

Installing the Unicorn gem puts a unicorn executable somewhere on your $PATH. Depending on your setup, this could end up in a lot of different places from a system-wide gems installation to something local to your app managed by RVM and/or Bundler. Now, because Unicorn remembers the exact path to this executable, you want it to remember something generic, not something specific.

For example, if it is something like /var/lib/gems/1.8/bin/unicorn or /home/deployuser/.rvm/gems/ruby-1.9.2-pXYZ/bin/unicorn then you are tied to a specific version of Ruby, and if you try to do a reexec deploy that upgrades the version of Ruby you are using, you will end up trying to run the old version.

The low-tech way of solving this is symlinking the real executable to /usr/local/bin/unicorn (or somewhere on your path), and then adding:

config/unicorn.rb
Unicorn::HttpServer::START_CTX[0] = "/usr/local/bin/unicorn"

to your unicorn.rb config file. Then, when you want to change the version of Ruby you are deploying to, you point that symlink somewhere else.
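
For example (paths hypothetical):

# repoint the generic path at the unicorn that belongs to the new Ruby
ln -sfn /home/deployuser/.rvm/gems/ruby-1.9.3-p194/bin/unicorn /usr/local/bin/unicorn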

I don’t like this method for a few reasons. First, it requires that you remember to do something out of band when you want to deploy to a new Ruby version. Second, it couples your Unicorn executable to a particular version of Ruby. If your deploy to a new version of Ruby fails badly for some reason, you must revert the symlink before you can restart the old version and recover.

Copying the executable contents and creating a file in your app at bin/unicorn is slightly better. You can then do:

config/unicorn.rb
Unicorn::HttpServer::START_CTX[0] = "#{app_root}/bin/unicorn"

and remember to update this file when you want your app to run on a different version of Ruby. However, this isn’t foolproof: upgrades still might not work due to ENV pollution.

ENV Pollution / using sandboxing tools like RVM or rbenv

Even if your unicorn executable can be updated before each deploy, you are liable to run into trouble because your new master process inherits the environment that the original master was created in.

Additionally, things like $PATH, $GEM_PATH, $GEM_HOME, $RUBY_VERSION and $RUBYOPT are all going to be what they were when you first started the original master process. That means when the reexec looks up which Ruby to run and which gem versions to load, you are going to get the original ones.

In principle, this can be fixed by adding additional entries to the before_exec block as above. If you are using RVM or rbenv, there are probably a number of other ENV variables you have to set as well. For me, figuring out all of the ENV variables RVM or rbenv is setting and resetting them on deploy is too much effort and too error prone.
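
For illustration, the shape of that fix would be something like the following sketch (the paths are hypothetical, and the exact variables depend on your sandboxing tool):

config/unicorn.rb
before_exec do |server|
  ENV["BUNDLE_GEMFILE"] = "#{app_root}/Gemfile"
  # hypothetical paths: point PATH and the gem environment at the new Ruby
  ENV["PATH"]     = "/path/to/new/ruby/bin:#{ENV['PATH']}"
  ENV["GEM_HOME"] = "/path/to/new/gems"
  ENV["GEM_PATH"] = "/path/to/new/gems"
  # clear options the original environment may have set
  ENV.delete("RUBYOPT")
end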

What I do

I create a special executable that bootstraps the environment it needs and then execs the unicorn executable in that environment. This executable gets checked into my application source so that I do not have to modify anything on my production server before a deploy (or rollback). It has the additional benefit of allowing the application to select the specific version of Ruby it needs to run. Here is what I use with RVM4:

bin/unicorn
#!/bin/bash

source $HOME/.rvm/scripts/rvm
rvm use 1.9.3-p194 &> /dev/null
exec bundle exec unicorn "$@"

This file gets committed to [app root]/bin/unicorn. It is also necessary to add its path to the unicorn.rb config file because the exec at the end of the executable wrapper causes Unicorn to remember the fully qualified path to the actual Unicorn executable5.

config/unicorn.rb
Unicorn::HttpServer::START_CTX[0] = "#{app_root}/bin/unicorn"

Troubleshooting

While you are setting up and testing your no-downtime Unicorn setup, you will want to be sure that everything you expect to change is actually changing. Two useful tools for this are adding logging statements into the config and reloading it, and using lsof -p.

You can, for example, modify your currently loaded config file to print Unicorn::HttpServer::START_CTX[0] or ENV to STDERR, and then reload it by sending the HUP signal to the Unicorn master process. This will cause your logging info to be written (by default) to log/unicorn.stderr.log. Having a look into ENV is especially useful: there are often variables set that you aren’t aware of, or that you expected to be cleared or updated but were not. If you did not set up your deploy properly and still want to do a no-downtime restart, you can employ this config-reload strategy to modify both ENV and internal Unicorn variables.
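
For example (pid and config paths hypothetical):

# append a debugging statement to the loaded config, then reload it
echo 'STDERR.puts Unicorn::HttpServer::START_CTX.inspect' >> config/unicorn.rb
kill -HUP $(cat /path/to/app/shared/pids/unicorn.pid)
# output lands in log/unicorn.stderr.log; remember to remove the line afterwards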

lsof -p [unicorn master PID] will show you which version of Ruby and Unicorn you are running, the current working directory of the master process, and additionally all of the shared libraries linked to your running process.
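
For example (pid path hypothetical):

# cwd shows the working directory; txt entries show the running binaries and libraries
lsof -p $(cat /path/to/app/shared/pids/unicorn.pid) | egrep 'cwd|txt'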

Upgrade strategies

There are two main update strategies I have used:

Start new workers all at once

Start new workers all at once
# adapted from http://codelevy.com/2010/02/09/getting-started-with-unicorn
before_fork do |server, worker|
  old_pid = app_root + '/log/unicorn.pid.oldbin'
  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

As soon as the first new worker is spun up, it will send the old master QUIT, which causes it to wait for all old workers to finish processing their in-progress requests and then shut down once everything is complete. While this is happening, the newly spawned workers can respond to incoming requests (the old workers cannot accept new ones). So while the old requests are winding down, you will actually have both the old and new versions of the application responding to requests. This works well if you have some memory headroom (ideally you should be able to run two full sets of workers). However, if that’s not the case, a more incremental approach might be more appropriate.

Replace workers one at a time

Replace workers one at a time
# adapted from http://unicorn.bogomips.org/examples/unicorn.conf.rb
before_fork do |server, worker|
  old_pid = "#{server.config[:pid]}.oldbin"
  if old_pid != server.pid
    begin
      sig = (worker.nr + 1) >= server.worker_processes ? :QUIT : :TTOU
      Process.kill(sig, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
    end
  end

  # Throttle the master from forking too quickly by sleeping.
  sleep 1
end

Unicorn supports signals for incrementally increasing and decreasing the number of workers handling requests (TTIN and TTOU). The config above uses them to have each newly spawned worker ask the old master to decrement its worker count by one; once the last new worker has been spawned, it asks the old master to shut down.

Summary

I hope this has helped to clarify some of the details involved in setting up a robust no-downtime Unicorn configuration. It is certainly not as simple as a lot of sources suggest. However, I think it is well worth the effort for the stability and flexibility you end up with. I like to deploy my apps frequently and as such do not want to worry about causing significant slowdowns or unnecessary downtime.


  1. this is actually the same reexec API Nginx provides (for the most part)

  2. exec is a *nix system call that replaces the current process image with a new program.

  3. see Unicorn worker upgrade strategies

  4. I don’t have equivalent code for rbenv. I spent some time trying to find a way to reset changes rbenv makes to the original process environment but was unsuccessful.

  5. that is, the unicorn in exec bundle exec unicorn "$@" will resolve to something like /path/to/app/shared/bundle/ruby/1.9.1/bin/unicorn if you are using Capistrano and Bundler with the default options.

Use Unicorn

Unicorn is my favourite application server for Ruby web apps.

I have been running Unicorn in production since October 2009 and it has been faster, more reliable, and more predictable than any of the other options I’ve used.

Unicorn has been around for a while and hasn’t changed a whole lot, so it might seem like this post comes almost three years (unfashionably) late.1 However, there are a few reasons I felt it was relevant:

  • I still hear people say they are “thinking of checking out Unicorn”.

  • I frequently come across client projects and projects friends are working on which use Passenger, Thin, or something else for no reason in particular.2

  • I know people who actually use Unicorn but do not make use of great features like no-downtime restarts and seamless upgrades.

I want to give these people another nudge towards considering Unicorn. I wouldn’t bother if Unicorn were merely faster or easier to set up. Rather, it offers features that really give it an edge for building reliable web services that are easy to maintain. Unicorn is an absolute pleasure to run in production.

Why Unicorn is worth considering

This isn’t the first time someone has urged you to try Unicorn. You know it’s out there and is probably as great as everyone says it is, but you have been using your current deployment stack for a while and it works well enough for you. Why bother investing in something new?

Avoiding downtime between deploys

Every time (most) Heroku apps cutover to a new release there is a lag. I hate this. Most of the time the cutover is rather brief, but for some larger apps it can be quite lengthy. Passenger apparently has a facility to avoid this3 but I have never been able to get it to work reliably and have seen people resort to some pretty terrific hacks for large, slow-loading apps.

If this has ever bothered you, Unicorn offers a simple and reliable solution: start a brand new copy of your app alongside the old one and seamlessly switch over to it only once it is up and running.

Worker management

You are running N application server processes, how are they managed? You probably use HAProxy or Nginx to load balance between the workers, but what about lifecycle management? Do you monitor your application processes independently? How do you kill and restart workers that are misbehaving? When you are pushing new code, how are old workers shut down? Are they allowed to finish processing their current requests?4

Unicorn has two features which really make it shine for worker management. First, it uses a master-worker process setup that is extremely reliable. Without going into detail, the master process maintains a heartbeat connection with each worker allowing workers to be very quickly killed and restarted when things go awry. Second, Unicorn exposes a signal handling system that provides very fine-grained control over worker management, especially around upgrades.

Maintainability

Even if you don’t have a long lag during deploys, how long does it take to upgrade your server?5 How about Ruby? What happens if a server or Ruby upgrade doesn’t go as planned? Is it quick and safe to switch back to your previous setup?

Due to the way that Unicorn’s upgrade process works, you can upgrade your application code, Unicorn itself, and even Ruby during a deploy. What’s more, if you have things setup properly, a deploy that swaps out all of those components does not have to be any different than a deploy that only upgrades code. I literally run $ cap deploy regardless of what is getting upgraded during a deploy.6

I’m sold! Where do I start?

If you want to give Unicorn a spin, Github’s post is probably the best place to start. Unicorn’s website is http://unicorn.bogomips.org and has a lot of great information, especially on configuring Unicorn in non-standard setups (I recommend reading most of the ALLCAPS files in the sidebar).

Disclaimer

Depending on your production setup, Unicorn probably won’t do all of the wonderful things mentioned here right out of the box. You will have to configure it properly and there are a few key aspects of Unicorn upgrades that can be tricky. I have tripped on these myself and know many others who have also. I even know people who run Unicorn without using its upgrade facility because they fought with it for a while but could not get it to work properly. However, if you are prepared to invest some time, it is not all that bad and will pay huge dividends once you figure it out. I am actually in the process of writing a separate post focussing on the Unicorn upgrade process that will be published shortly.

  1. Github made a great blog post about Unicorn in late 2009 that caused a lot of people (myself included) to check out Unicorn. The post contains a number of reasons why Github switched and also a lot more technical detail on what Unicorn is about and how to get going. If you have not yet read it, your time is probably better spent there than here.

  2. Now, there are certainly good reasons to use servers other than Unicorn. However, the projects I am referring to were not making use of features unavailable in Unicorn while at the same time suffering from problems that Unicorn nicely solves.

  3. passenger_pre_start directive

  4. Again, Passenger provides facilities for worker management. However, I have experienced a lot of issues with Passenger worker processes running out of control (cf. Debugging frozen applications in the Passenger docs), and new release cutovers that were less than smooth and seamless.

  5. If you are using Passenger with Nginx, upgrading Passenger requires you to recompile Nginx. It is probably safe to assume that most Passenger installs get upgraded as often as new servers are provisioned.

  6. I feel stupid even saying this, but don’t try doing this for the first time on a production server. It will most likely fail in a horrific fashion. First read up on Unicorn’s site and mailing list and be sure to test your upgrade process in an environment that is as close as possible to what you are running in production.