Not All Jobs are Created Equal
In the real world, jobs fail all the time and, depending on which job it is, a failure could be an emergency or something we'll take a look at when there is time (never). The problem with Resque is that it treats all failures the same. Every failure ends up on a single, sometimes enormous, queue, whether the job is of type EndOfTheWorldIfThisDoesntRunJob or ThisProbablyShouldRunButWhoReallyCaresJob. Let's take a look at a typical resque-web console:
Ok, deep breath... we have some failures here. Let's take a look at what these are. Just click on the failed link:
Looking at the first few pages of failures, we make an educated guess that it's just the ThisProbablyShouldRunButWhoReallyCaresJob. We could hope for the best and assume there's nothing important on that failure queue, but how can we be sure that an EndOfTheWorldIfThisDoesntRunJob isn't sandwiched in there somewhere? OK, so what options do we have?
- Manually page through the failure results, 20 at a time.
- Write a custom script that counts the number of failures by job type.
- Pretend this problem doesn't exist and go back to what we were doing before.
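For a sense of what the custom-script option involves, here is a minimal sketch rather than a finished tool. It assumes each failure is a hash shaped like the entries Resque keeps on its failed list, with the job class under `payload` and `class`. The helper name `count_failures_by_class` and the sample data are made up for illustration.

```ruby
# Tally failed jobs by job class, most frequent first.
# `failures` is any enumerable of hashes shaped like Resque failure entries:
#   { "payload" => { "class" => "SomeJob", "args" => [...] }, "error" => "..." }
def count_failures_by_class(failures)
  counts = Hash.new(0)
  failures.each do |failure|
    payload = failure["payload"] || {}
    counts[payload["class"] || "(unknown)"] += 1
  end
  # Highest count first; ties broken alphabetically by class name.
  counts.sort_by { |klass, count| [-count, klass] }.to_h
end

# Hypothetical sample data standing in for a real failure queue.
sample_failures = [
  { "payload" => { "class" => "ThisProbablyShouldRunButWhoReallyCaresJob", "args" => [1] } },
  { "payload" => { "class" => "ThisProbablyShouldRunButWhoReallyCaresJob", "args" => [2] } },
  { "payload" => { "class" => "EndOfTheWorldIfThisDoesntRunJob", "args" => [] } },
  { "payload" => { "class" => "ThisProbablyShouldRunButWhoReallyCaresJob", "args" => [3] } }
]

count_failures_by_class(sample_failures).each do |klass, count|
  puts format("%-45s %d", klass, count)
end

# Against a live Resque (assumption: the resque gem is loaded and connected
# to Redis), the failures could be fetched along these lines:
#   failures = [Resque::Failure.all(0, Resque::Failure.count)].flatten
#   count_failures_by_class(failures)
```

It works, but it is one more script to write and maintain, and it only counts; it does nothing to help retry or clear the jobs it finds.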
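None of these is very appealing. Fortunately, there is a better option: the resque-cleaner plugin. Once installed, resque-cleaner adds a "Cleaner" link to the header of resque-web: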
When we follow the "Cleaner" link, we can see all of our failed jobs broken down by type. Not a moment too soon, either: it looks like there's an EndOfTheWorldIfThisDoesntRunJob sitting on the failure queue.
resque-cleaner brings plenty of other cool features to the table, and its authors do a great job of explaining them in the project's GitHub README.
Let's Make it Legal
Resque is a great tool for job queueing, but after using it for a while, it's apparent that something is missing. The problem is that when things go wrong, there is only one failure queue. Failed jobs that are not important can hide the failures that must be dealt with immediately. When we add resque-cleaner to the picture, this problem goes away. I cannot picture anyone not wanting resque-cleaner out of the box with Resque. Let's just merge this functionality into Resque so everyone knows when the EndOfTheWorldIfThisDoesntRunJob has failed on their project.
Totally agree, for a project with a non-trivial set of jobs this feature is a lifesaver.
I am glad to hear that you love it :)
Thanks!