I am a nerd. This and my kids are what keep me up at night. This post is about how I manage to get to sleep without over-medicating.
Airbrake
This probably isn't that enlightening, but when an error happens in your app, you should probably know about it. Airbrake takes care of this for you in a smart way.
Still Alive
There are plenty of services out there that ping your site but what does that prove? Is your site doing what it's supposed to be doing? Still Alive is like Cucumber for your live app. Here's what we have running every minute:
Still not satisfied? This is what we have running every 15 minutes:
Notice that this script actually shares stuff, which sends emails. But are we sure those emails are actually being delivered?
Email Monitoring Job
Every 15 minutes we run a resque job that checks our test email account to see if it's received an email in the last 30 minutes.
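The heart of a job like this is a freshness check. Here is a minimal sketch of that check only, not our actual job: the names `NoRecentEmailError` and `assert_recent_email!` are made up for illustration, and how the timestamps are fetched from the test inbox (IMAP, a mail provider API, and so on) is deliberately left out.

```ruby
require 'time'

# Hypothetical error class; the name is an assumption for this sketch.
class NoRecentEmailError < StandardError; end

# Raises unless at least one timestamp in `received_at` is newer than
# `window_seconds` before `now`. Defaults to the 30-minute window
# described above.
def assert_recent_email!(received_at, now: Time.now, window_seconds: 30 * 60)
  cutoff = now - window_seconds
  return true if received_at.any? { |t| t >= cutoff }

  raise NoRecentEmailError,
        "no test email received since #{cutoff.utc.iso8601}"
end
```

Inside a Resque job's `perform` method you would call this with the arrival times of the messages in the test inbox and let any exception propagate.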
If no recent email is found, it raises an error. That error gets picked up by Airbrake, since we have this within our Resque initializer:
How did those emails get there, you ask? Remember, our Still Alive script sends an email every 15 minutes during its test run.
I'm getting alerts, what's going on?
I'm not going to pretend that 99% of you reading this post don't know about New Relic, but it is our go-to tool for diagnosing any kind of performance-related issue. It also pings your website every minute, which is a nice addition to Still Alive. After all, if two alerting systems are reporting errors, that's a strong sign there really is an issue.
You have invalid ActiveRecord objects lurking in your database
Yes, I am trying to scare you into reading this, but let's face it: you probably do. There are a bunch of ways it can happen.
- That cowboy sitting next to you saved a bunch of objects with validation turned off
- Code changes - new validations that haven't always been there
- Faulty logic in your code that lets invalid stuff through
So you may be asking: is this a big deal? The answer in many cases is yes. Suppose you add some new code that validates the presence of a new field on your user object, but you forget to update all of your existing user profiles. If any page within your app updates a user object, your customers may be in for a 500 page.
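To make that failure mode concrete, here is a small plain-Ruby simulation. It is a stand-in for a Rails model, not real ActiveRecord code: the `phone` field, the `User` struct, and the local `RecordInvalid` class are all assumptions made up for the example. In a real app, a bang method such as `save!` raises `ActiveRecord::RecordInvalid`, and if nothing rescues it the request typically ends in a 500.

```ruby
# Plain-Ruby stand-in for an ActiveRecord model (illustration only).
class RecordInvalid < StandardError; end

User = Struct.new(:name, :phone) do
  # The newly added rule: phone must be present.
  def valid?
    !phone.nil? && !phone.strip.empty?
  end

  # Mimics a bang-save: raise when validation fails.
  def save!
    raise RecordInvalid, "Phone can't be blank" unless valid?
    true
  end
end

legacy_user = User.new("Pat", nil) # created before the rule existed
legacy_user.name = "Patricia"      # this edit has nothing to do with phone

begin
  legacy_user.save!
rescue RecordInvalid => e
  puts "update failed: #{e.message}"
end
```

The point is that the user only changed their name, yet the save fails because of a field they never touched.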
This is where active_sanity comes in: it checks all of the objects in your database for validity. For each object that is invalid, it inserts an entry into an InvalidRecord table. Again, we have a Resque job that runs active_sanity once a night:
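The alerting half of such a job can be sketched in a few lines. This is an illustration under assumptions, not our actual job: `InvalidRecordsFound` and `assert_no_invalid_records!` are hypothetical names, and the count and admin URL are passed in by the caller so the check stays framework-free. In the app, the count would come from the InvalidRecord table after the sanity check has run.

```ruby
# Hypothetical error class; the name is an assumption for this sketch.
class InvalidRecordsFound < StandardError; end

# Raises when the sanity check left any rows behind.
# `invalid_count` is the number of rows in the invalid-records table;
# `admin_url` is wherever those rows can be inspected.
def assert_no_invalid_records!(invalid_count, admin_url)
  return true if invalid_count.zero?

  raise InvalidRecordsFound,
        "#{invalid_count} invalid record(s) found, see #{admin_url}"
end

# Example call with placeholder values:
#   assert_no_invalid_records!(3, "https://example.com/admin/invalid_records")
```

Putting the admin link in the exception message means it shows up in the alert itself.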
You're probably noticing a pattern here. If there are any invalid records, an error is raised, which fires off an Airbrake alert. This alert has a link to the rails_admin page, which shows a list of all invalid records.
Conclusion
It's important for your sanity and general well-being to be the first one to know that your app isn't functioning properly. At View the Space, we are constantly trying to think of new ways to improve our monitoring. Let us know what you do: drop us a comment.