Tech Corner: Auto recovery with MongoDB replica sets

Earlier this year, we rolled out MongoDB for a “page views” feature. This project was in response to an itch we had, and MongoDB scratched that itch quite well. Still, at the time, we were not 100% committed to MongoDB. There were no firm plans to use it again, and we were running one lonely server with no backups and little monitoring.

Fast forward a few months, when the “recommendations” project, which had been kicking around the office for over a year, landed at my doorstep. I had known about MongoDB for quite some time and had even attended the MongoSF conference in April. MongoDB seemed like it would be a lovely solution for this project’s requirements: fast updates, blazingly fast queries, flexibility and room for growth.

Our first step was building out our Eventbrite recommendation system. Based on your past event attendance, we build a social graph for you, highlighting friends you’ve attended an event with. With that graph in place, we can now suggest events based on what your friends are attending. All of this was built on top of MongoDB. It worked very well and gave us quite a bit of confidence in the system.

During the development of our recommendation engine, we knew that eventually we’d be integrating other social networks to further enhance recommendations. The next step in this project was integrating Facebook connections—and we can now show you the events your Facebook friends are attending. Thanks to the schema-free nature of MongoDB, it was fairly easy to move our data around a bit to get the right structure to keep track of Facebook connections, and also have the room to add more social networks as needed.
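To make the schema-flexibility point concrete, here is a hypothetical sketch of what a connections document might look like after that restructuring. The collection name, field names, and values are illustrative assumptions, not our actual schema:

```javascript
// Hypothetical document in a "connections" collection.
// Grouping friend lists under a "networks" subdocument means adding
// another social network later is just a new key, not a migration.
{
  user_id: 12345,
  networks: {
    // friends derived from shared Eventbrite event attendance
    eventbrite: [ 67890, 24680 ],
    // Facebook friend ids, added when that integration shipped
    facebook: [ "fb:1122", "fb:3344" ]
    // room to add more networks here as needed
  }
}
```

Because MongoDB doesn’t enforce a schema, old documents without the `networks` key can coexist with new ones while a background job reshapes them.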

Now that we were tracking more than just page view counts, we needed to treat our lonely MongoDB instance as a first-class citizen. After some experimenting with replica sets, it was pretty clear this was the way to go, giving us data redundancy, automatic failover and automatic recovery. Late one Sunday evening, we added one more MongoDB host along with an arbiter. We made this change on our production system with about 30 seconds of downtime on our original MongoDB host.
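For the curious, the conversion looks roughly like the following mongo shell session. This is a sketch, not our exact runbook; the set name `rs0` and the hostnames are placeholders. Each `mongod` is first restarted with `--replSet rs0`, then from the original host:

```javascript
// On the original (soon-to-be-primary) host's mongo shell:
rs.initiate()                            // start a one-member replica set
rs.add("mongo2.example.com:27017")       // add the new data-bearing secondary
rs.addArb("arbiter.example.com:27017")   // arbiter votes in elections but holds no data
rs.status()                              // confirm both members plus arbiter are healthy
```

The arbiter matters because a two-member set can’t elect a new primary on its own; the arbiter’s vote breaks the tie when one data host disappears.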

The payoff came just last week, when one of our Nagios alarms went off, alerting us that our master MongoDB host was down. I saw this message from our Ops team when I checked my email in the morning:

> “I opened a ticket with AWS. Our master mongo failed over to the secondary so we are good.”

Sweet! The entire AWS host wedged and we lost everything on that instance. The other member of our replica set automatically took over as master and continued to serve requests and accept connections without a hiccup. Our site ticked along, and recommendations continued to be served up to users. Over the course of that day, we brought a new MongoDB host online; without a hitch, it connected to the replica set, synced all of the live data from the new master and jumped back into the replica pool as a secondary.
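Watching that resync happen takes one command. A hypothetical check from any member’s shell (hostnames assumed): as the replacement host performs its initial sync, its state moves from STARTUP2/RECOVERING to SECONDARY in the `rs.status()` output.

```javascript
// Print each replica set member and its current replication state.
rs.status().members.forEach(function (m) {
  print(m.name + " : " + m.stateStr);
});
// While the new host is syncing you'd see something like:
//   mongo1.example.com:27017 : PRIMARY
//   mongo3.example.com:27017 : RECOVERING
//   arbiter.example.com:27017 : ARBITER
```

Once the new member reports SECONDARY, the set is back to full redundancy with no manual data copying.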

This was a huge win. Needless to say, we’re even bigger fans of Mongo after this. If you’d like to learn more about MongoDB or how we’re using it at Eventbrite, check out these events:

http://mongosv.eventbrite.com/

http://www.meetup.com/San-Francisco-MongoDB-User-Group/calendar/13899860/