Setting up Resque

I set up Resque for our startup over the last few days, to put some log analysis stuff in the background. It was harder than I thought it should be, largely due to a lack of documentation for compatibility with common tools like Capistrano and daemon kickoff tools. Here’s how I did it.

Step 1: Install Redis.

This part’s easy on OS X:

brew install redis

I already had Redis running for some other tasks on the server, so I just used the existing install for Resque. I’ll get to that in a minute.

Step 2: Add Resque to your Gemfile.

You are using Bundler, right? If not, you should be. I also decided to use resque-retry, a plugin for Resque that transparently retries failed jobs, with retry strategies like exponential backoff.

gem "resque"
gem "resque-retry"

Step 3: Drop in the Resque initializer.

For us, this was really simple, since I already had a Redis initializer, and the documentation for initializing Resque is straightforward: just assign Resque.redis either a config hash or an existing Redis object. Ours looks like this.

require 'resque'
Resque.redis = REDIS
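
For context, REDIS comes from that existing Redis initializer. A minimal sketch of what such an initializer might look like – the file name and yaml layout here are assumptions, so adapt them to your own setup:

```ruby
# config/initializers/redis.rb – sketch; paths and yaml keys are assumptions
require 'redis'
require 'yaml'

config = YAML.load_file(Rails.root.join('config', 'redis.yml'))[Rails.env]
REDIS = Redis.new(:host => config['host'], :port => config['port'])
```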

I also implemented the built-in support for Hoptoad, which necessitates an extra require and a small configuration block, but it was no big deal.
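
For the curious, the Hoptoad hookup looks roughly like this – the failure backend classes are from Resque's own docs of the era, and the API key is obviously a placeholder:

```ruby
# In the Resque initializer: report failed jobs to Hoptoad while
# keeping the default Redis failure backend for the web UI (sketch).
require 'resque/failure/hoptoad'

Resque::Failure::Hoptoad.configure do |config|
  config.api_key = 'your-hoptoad-api-key' # placeholder
  config.secure  = true
end

Resque::Failure::Multiple.classes = [Resque::Failure::Redis, Resque::Failure::Hoptoad]
Resque::Failure.backend = Resque::Failure::Multiple
```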

Step 4: Add Resque rake tasks.

I just dropped resque.rake into lib/tasks, and this is all there is to it:

require 'resque/tasks'
require 'resque_scheduler/tasks'

This will give you all the rake tasks you need to run Resque workers and the scheduler.
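
In development you can then kick things off by hand. The task names here are the ones resque and resque-scheduler ship; the queue name is whatever your workers declare:

```shell
# One worker, polling every queue
QUEUE='*' bundle exec rake resque:work

# The scheduler that resque-retry relies on for delayed retries
bundle exec rake resque:scheduler
```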

Step 5: Write a Resque worker.

This part wasn’t too bad, either:

require 'resque-retry'

class ResqueTestWorker
  extend Resque::Plugins::ExponentialBackoff
  @queue = :resque_test_queue

  def self.perform(*args)
    MONGO['test_queue'].insert(args)
  end
end

All we’re doing in ours (since this is a proof-of-concept experiment) is taking some arguments to a function and throwing them into a Mongo collection. Later, I can run some queries on the Mongo collection and make sure it’s what I expect.
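
To queue a job, you call Resque.enqueue with the worker class and its arguments; Resque stores the class name and args as JSON in Redis, and a worker later calls Klass.perform(*args). That contract can be sketched without Redis at all – FakeQueue below is hypothetical, purely for illustration:

```ruby
# Minimal sketch of Resque's dispatch contract: store the class name
# and args, then later call Klass.perform(*args). Real Resque keeps
# the JSON payloads in Redis lists instead of an in-memory hash.
require 'json'

class FakeQueue
  def initialize
    @jobs = Hash.new { |h, k| h[k] = [] }
  end

  # Mirrors Resque.enqueue: the target queue comes from the worker's @queue ivar.
  def enqueue(klass, *args)
    queue = klass.instance_variable_get(:@queue)
    @jobs[queue] << JSON.dump('class' => klass.name, 'args' => args)
  end

  # Mirrors a worker loop: pop each job and invoke Klass.perform(*args).
  def work_off(queue)
    while (raw = @jobs[queue].shift)
      job = JSON.parse(raw)
      Object.const_get(job['class']).perform(*job['args'])
    end
  end
end

RESULTS = []

class ResqueTestWorker
  @queue = :resque_test_queue

  def self.perform(*args)
    RESULTS << args # stand-in for MONGO['test_queue'].insert(args)
  end
end

fake = FakeQueue.new
fake.enqueue(ResqueTestWorker, 'log line', 42)
fake.work_off(:resque_test_queue)
```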

Step 6: Configure backend server.

Resque ships with a nice little Sinatra app to keep track of workers, jobs, and failures, and resque-retry adds some views of its own. As far as I know, this server uses the Redis defaults when launched, and that’s not going to work in our staging and production environments. So I had to do a little configuration: I put together a rackup file to run the Resque backend on a simple thin server. Here’s what it looks like:

#!/usr/bin/env ruby
require 'logger'
require 'yaml'
require 'resque/server'
require 'resque-retry'
require 'resque-retry/server'

use Rack::ShowExceptions

redis_config = YAML.load_file(File.expand_path(File.dirname(__FILE__) + '/redis.yml'))[ENV['RESQUE_THIN_ENV']]
rc = {}; redis_config.each{|k,v| rc[k.to_sym] = v } # symbolize keys from yaml
Resque.redis = Redis.new(rc)

# Protect the Resque backend with HTTP basic auth (swap in your own password, or read it from ENV).
AUTH_PASSWORD = "example_password"
if AUTH_PASSWORD
  Resque::Server.use Rack::Auth::Basic do |username, password|
    password == AUTH_PASSWORD
  end
end

run Resque::Server.new

As you can see, it loads our existing redis.yml without needing access to the Rails initializer for Redis. I also include all the bells and whistles from resque-retry, and run the Rack app at the bottom.
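
For reference, the redis.yml it reads is keyed by environment name (RESQUE_THIN_ENV selects the block). Something like this sketch – the hosts and ports are placeholders, and the keys must match the options Redis.new accepts:

```yaml
staging:
  host: localhost
  port: 6379
production:
  host: redis.internal.example.com
  port: 6379
```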

Step 7: Process management for deploy servers

Here was one I didn’t expect – there’s very little process management in Resque! It includes Rake tasks to start a single worker or multiple workers, but I didn’t find them useful outside the development environment. When deployed with Capistrano, these tasks didn’t survive the single-command non-login shell that Cap uses for everything; even with nohup and &, the rake process was killed as soon as Capistrano logged out.

I found this post from Thomas Mango on how to start and restart workers with God, but we’re not using God right now (that’s a project for another day). If you use Monit, there’s also the example Monit script, but since our Monit scripts are not currently managed via Chef or git, I wasn’t comfortable messing with those just yet.

I have some legacy workers from DelayedJob that use Daemon-Spawn, a pretty simple library for running things in the background. I also glanced at Daemon-Controller, but since I already had Daemon-Spawn examples in the codebase, I decided it wasn’t worth switching right now.

Daemon-Spawn handles PID files and logging, and can also start multiple instances of the same process, so it does everything I needed right now. I needed two daemons: one running the Resque scheduler (which handles retrying failed jobs) and one running a couple of workers. Since we’re just implementing a proof of concept, I let each worker handle all available queues, but you could conceivably create a daemon for each queue you want to handle, and specify how many workers each queue should get.

Anyway, here’s what my scheduler daemon looks like:

#!/usr/bin/env ruby

ENV['RAILS_ENV'] ||= 'development'
require File.expand_path('../../config/boot',  __FILE__)
require Rails.root.join("config", "environment")

class ResqueSchedulerDaemon < DaemonSpawn::Base
  def start(args)
    Resque::Scheduler.verbose = true
    Resque::Scheduler.run
  end

  def stop
    Resque::Scheduler.shutdown
  end
end

ResqueSchedulerDaemon.spawn!({
  :log_file => Rails.root.join('log', 'resque_scheduler.log').to_s,
  :pid_file => Rails.root.join('tmp', 'pids', 'resque_scheduler.pid').to_s,
  :sync_log => true,
  :working_dir => Rails.root.to_s,
  :singleton => true
})

And here’s the worker daemon:

#!/usr/bin/env ruby

ENV['RAILS_ENV'] ||= 'development'
require File.expand_path('../../config/boot',  __FILE__)
require Rails.root.join("config", "environment")

class ResqueWorkerDaemon < DaemonSpawn::Base
  def start(args)
    @worker = Resque::Worker.new('*') # Specify which queues this worker will process
    @worker.verbose = 1 # Logging - can also set vverbose for 'very verbose'
    @worker.work
  end

  def stop
    @worker.try(:shutdown)
  end
end

ResqueWorkerDaemon.spawn!({
  :processes => 5,
  :log_file => Rails.root.join('log', 'resque_worker.log').to_s,
  :pid_file => Rails.root.join('tmp', 'pids', 'resque_worker.pid').to_s,
  :sync_log => true,
  :working_dir => Rails.root.to_s,
  :singleton => true
})

At first I tried a bunch of command-line fu using system, but that was wrong-headed. Looking into the code for the Rake tasks that resque and resque-scheduler provide, starting these background processes is blindingly simple.

Step 8: Add daemons and server to Capistrano

With those daemons crafted, it’s just a matter of adding them to deploy.rb (or another file that Capistrano loads, as per your preference). I added a resque namespace in Capistrano and hooked each of its tasks in with after ‘deploy:’ calls, so I had the following setup.

after "deploy:stop",        "resque:stop"
after "deploy:start",       "resque:start"
after "deploy:restart",     "resque:restart"

And then the associated tasks:

namespace :resque do
  def rails_env
    fetch(:rails_env, false) ? "RAILS_ENV=#{fetch(:rails_env)}" : ''
  end

  desc "Start resque scheduler, workers"
  task :start, :roles => :app, :only => { :jobs => true } do
    run "cd #{current_path};#{rails_env} daemons/resque_scheduler start"
    run "cd #{current_path};#{rails_env} daemons/resque_worker start"
    run "cd #{current_path};RESQUE_THIN_ENV=#{stage} bundle exec thin -d -P /tmp/thin.pid -p 9292 -R config/resque_config.ru start; true"
  end

  desc "Stop resque workers"
  task :stop, :roles => :app, :only => { :jobs => true } do
    run "cd #{current_path};#{rails_env} daemons/resque_scheduler stop"
    run "cd #{current_path};#{rails_env} daemons/resque_worker stop"
    run "cd #{current_path};RESQUE_THIN_ENV=#{stage} bundle exec thin -d -P /tmp/thin.pid -p 9292 -R config/resque_config.ru stop; true"
  end

  desc "Restart resque workers"
  task :restart, :roles => :app, :only => { :jobs => true } do
    run "cd #{current_path};#{rails_env} daemons/resque_scheduler restart"
    run "cd #{current_path};#{rails_env} daemons/resque_worker restart"
    [:stop, :start].each { |cmd| run "cd #{current_path};RESQUE_THIN_ENV=#{stage} bundle exec thin -d -P /tmp/thin.pid -p 9292 -R config/resque_config.ru #{cmd}; true"}
  end
end

Since thin manages its own daemon, you have to be a bit more explicit about what to do with it. Specifying a PID file outside the Rails code path is necessary so that you’re not left guessing whether thin’s PID file lives in the current or previous release. After all, there should be only one.

Also, I have a web server running on port 80 already, and since the Resque backend is only for developers to tinker with and sits behind HTTP basic auth, I figured it’s fine to let it reside on another port. You can always proxy it with Nginx or point a subdomain at it.

I had to call stop / start on thin, since the restart action was timing out waiting for the PID file to get deleted. I think this is an issue with resque-web disconnecting from Redis only after shutdown is complete, and erasing the PID file then. Probably not a bug in thin itself.

Finally, if the pretty backend for Resque fails to come up, it’s not necessary to roll back the deployment – that can be noted and dealt with separately. So, instead of letting thin’s exit status fail the deploy, the trailing `; true` makes each of those commands succeed in any case.

Step 9: Deploy!

You may have to call cap production resque:start before you can do your normal deploy cycle – our deploy path calls the restart actions by default, and these actions got quite irritated if there weren’t any processes running to begin with. Just calling resque:start from my local machine fixed that.

Good luck! If you have any questions, improvements, or trouble, please post in the comments.


Took a long time to get all that together. Discovering the myriad ways NOT to do all this was quite an exploratory process. And there are still lots of improvements to make – managing thin with Capistrano is hacky at best. Also, I really should get God or Monit in on the action, since I want to seriously ensure that workers are running all the time. But this does get our Resque proof of concept out, and I can compare execution times to our existing synchronous methods.