New Relic RPM announced that they’ve started to support Sinatra and Rack a few days ago (article). It’s also come up with a better background job monitoring. Good news. So I tried monitoring jobs being handled on Resque which we’ve started using recently for managing background jobs.
Unfortunately RPM 2.10.5 couldn’t monitor jobs of Resque. Resque forks to process a job. RPM agent instance can’t chase the things happend in forked processes which terminates in a short time.
However I succeeded to monitor Resque jobs with RPM with a little hack. I want to share it here, though, it is just a hack. The information might be obsolete soon. So I explain my process I took to find it too. In that way, I hope that it gives you a help when this hack is no longer applicable.
Step 1. Follow the instruction of RPM
Firstly I followed RPM’s instruction (API document) and implemented a sample job like this.
class HelloWorldJob
include NewRelic::Agent::Instrumentation::ControllerInstrumentation
@queue = :low
def self.perform
HelloWorldJob.new.hello_world
end
def hello_world
RAILS_DEFAULT_LOGGER.info "#{Time.now}: Hello World!"
RAILS_DEFAULT_LOGGER.flush
sleep(3)
end
add_transaction_tracer :hello_world, :name=>'hello_world'
end
But even if the job is processed on Resque, it is not shown as a transaction in management screen of RPM.
Step 2. Google it
Secondly I googled it. I felt that researching myself is a waste of time if someone has found the way already(This idea is always wrong though). But Google didn’t give me the answer this time. I thank Google for that because it gives me an oppotunity to understand how both Resque and Newrelic RPM work deeply.
Step 3. Set up environment to debug
For Resque: start resque worker with very vobose mode.
e.g.
VVERBOSE=true QUEUE=* rake environment resque:work
For Newrelic: set log level with debug in newrelic.yml
log_level: debug
Then I tested the job with script/console. You can run the job like this:
% script/console > Resque.enqueue HelloWorldJob => Enqueues a job and it will be executed on Resque worker > HelloWorldJob.process => Executes a job directly without queueing
You should be able to confirm that the job executed directly on console is monitored on RPM and shown in its management screen. But the one proccessed by Resque never comes to the screen.
Step 4. Try to know what happens in Resque
I proceeded the investigation with the following cycle.
1. Run job on Resque and script/console
2. Compare the log of Resque to the one of script/console
3. Make an assumption and put some debug codes in Resque or Newrelic RPM (Thanks god those are opensource!)
4. Restart Resque worker and script/console
5. Back to 1
Sorry but there is no magic and shortcut. But after I spent about 15 minutes or so with the cycle, I found(and remembered at the same time) that Resque uses forking for processing a job and I was convinced that it would cause the issue. So I changed the code on Reque not to fork the process as an experment.
in lib/resque/worker.rb
def fork
+ return false
Then ran the job via Resque. Yay, the transaction information was shown in RPM.
Step 5. Then a little hack
Of course, you can’t turn off the forking of Resque. That is something that makes Resque awesome. So I reveted the change I made on worker.rb and focused on finding a way to monitor forked process with RPM.
What I tried at first is to call NewRelic::Agent.shutdown which sends the all queued data after the job is done. RPM sends the log every 60 seconds but the job finishes about in 3 seconds. So that the job has to make sure to send the data before ended.
class HelloWorldJob
include NewRelic::Agent::Instrumentation::ControllerInstrumentation
@queue = :low
def self.perform
HelloWorldJob.new.hello_world
NewRelic::Agent.shutdown
end
def hello_world
RAILS_DEFAULT_LOGGER.info "#{Time.now}: Hello World!"
RAILS_DEFAULT_LOGGER.flush
sleep(3)
end
add_transaction_tracer :hello_world, :name=>'hello_world'
end
On console/script, it worked in a way I expected, though, it didn’t on Resque. Dammn it. Next I looked into RPM’s codebase with the cycle I showed on Step 4.
Then I found that RPM resets the connection information if the current process id is different from the one which RPM got when it started. Then, on that situation, it tries to start worker thread which reports the information to RPM server every 60 seconds. But it takes seconds to launch and the job finishes before the thread finishes communicating server.
Anyway you don’t want to create a thread for RPM everytime Resque job is created. It just wastes your machines resource and makes a job process slow in terms of background job. Alternatively it would be nice if the forked process use the parent connection information simply. I monkey-patched onto RPM to behave in that way. This patch shouldn’t affect monitoring for other applications.
in lib/new_relic/agent/agent.rb
def ensure_worker_thread_started
return unless control.agent_enabled? && control.monitor_mode? && !@invalid_license
- if !running?
+ if !running? && !control['resque']
Then changed the codebase of the job like this.
class HelloWorldJob
include NewRelic::Agent::Instrumentation::ControllerInstrumentation
@queue = :low
def self.perform
NewRelic::Control.instance['resque'] = true
HelloWorldJob.new.hello_world
NewRelic::Agent.shutdown
end
def hello_world
RAILS_DEFAULT_LOGGER.info "#{Time.now}: Hello World!"
RAILS_DEFAULT_LOGGER.flush
sleep(3)
end
add_transaction_tracer :hello_world, :name=>'HelloWorldJob#hello_world'
end
Wheeee, now the job is shown and tracked as a transaction on RPM. You can track the exceptions occurred in Resque jobs as well as slow transactions with RPM.
Step 6. Refactor
To monitor all jobs in same way, I refactor HelloWorldJob with its super class. I hope that RPM is going to support forked background process like Resque officially in a nice way and the hack is no longer needed soon. So it doesn’t make sence to write this ugly hack on every single job. Instead of that, I put all newrelic related code onto a super class.
So here is a final codebase.
class BaseJob
include NewRelic::Agent::Instrumentation::ControllerInstrumentation
# This is called from Resque.
# To be wrapped with the common operation, please describe job in the instance's perform method.
def self.perform(*args)
NewRelic::Control.instance['resque'] = true
self.new.perform_wrapper(*args)
ensure
NewRelic::Agent.shutdown
end
# Sends a hint to our monitoring system, newrelic rpm
def add_custom_parameters(key, value)
NewRelic::Agent.add_custom_parameters(key => value.to_s)
end
# Calls :perform with monitoring on newrelic rpm
def perform_wrapper(*args)
# :task seems available only for charged plan. When you test this with a free plan, you have to get rid of it.
# perform_action_with_newrelic_trace(:name=>'perform') do
perform_action_with_newrelic_trace(:name=>'perform', :category => :task) do
begin
perform(*args)
ensure
logger.flush if logger && logger.respond_to?(:flush)
end
end
end
# Override this method instead of the class method.
def perform(*args)
end
def logger
RAILS_DEFAULT_LOGGER
end
end
# Test purpose job. It output the information to log.
class HelloWorldJob < BaseJob
@queue = :low
def perform(who='World')
# This sends the information to NewRelic RPM
add_custom_parameters :whom, who
logger "#{Time.now}: Hello #{who}!"
sleep(3)
end
end
Epilogue
I have to say lastly that reading Resque’s codebase was absolutely fun. It is simple and neat. You might feel that you can write the codebase yourself easily. But there are the good design decisions behind that: which are using only Redis for queue backend and forking for processing a job. Cool.


recent comments