Getting more reliability out of RabbitMQ
The Scenario
When we had our first beta user give us a heavy load, we ran into some issues. In this instance, we were using RabbitMQ to communicate with our text messaging service. In our testing, we never had any problem with missed text messages, but we started hearing reports of text messages never getting delivered or received.
Now, I’ve seen blog posts saying that you can crank a million messages a second out of RabbitMQ. I know you need to be a pro to do stuff like that, but it never occurred to me that our level of usage thus far (a couple hundred messages spread out over a minute or two) would ever cause a problem. But, after not finding any issues in the surrounding logic, I decided to test RabbitMQ.
The setup
The consumer:
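A minimal sketch of what a consumer like this might look like, assuming the amqplib promise API. It is not the original listing: the function name `startConsumer`, the URL and the queue name are placeholders, and the amqplib module is passed in as `amqp` so the sketch can also be tried against a stub.

```javascript
// Sketch only, not the original listing. `amqp` is the amqplib module
// (promise API); the URL and queue name below are placeholders.
async function startConsumer(amqp, url, queue, onMessage) {
  const conn = await amqp.connect(url);
  const ch = await conn.createChannel();
  await ch.assertQueue(queue, { durable: true });

  let received = 0;
  await ch.consume(queue, (msg) => {
    if (msg === null) return; // the broker cancelled this consumer
    received += 1;
    onMessage(msg.content.toString(), received);
    ch.ack(msg); // acknowledge so the broker can drop the message
  });
  return conn;
}

// Usage (assumes a local broker and `npm install amqplib`):
// startConsumer(require('amqplib'), 'amqp://localhost', 'sms',
//   (text, n) => console.log(`#${n} received: ${text}`));
```

Counting messages as they arrive makes it easy to compare what the consumer saw with what the producer tried to send.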
The producer:
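A rough sketch of a producer in this style, again assuming the amqplib promise API. It is not the original listing: `sendOnce`, `sendBurst`, the URL and the queue name are hypothetical. The point it illustrates is that every message gets its own connection and channel.

```javascript
// Sketch only, not the original listing. `amqp` is the amqplib module
// (promise API); the URL and queue name below are placeholders.
async function sendOnce(amqp, url, queue, text) {
  // A brand-new connection and channel for every single message.
  const conn = await amqp.connect(url);
  try {
    const ch = await conn.createChannel();
    await ch.assertQueue(queue, { durable: true });
    ch.sendToQueue(queue, Buffer.from(text));
    await ch.close(); // close the channel before dropping the connection
  } finally {
    await conn.close();
  }
}

// Start `count` sends in rapid succession, without waiting for one to
// finish before starting the next, and report how many of them failed.
async function sendBurst(amqp, url, queue, count) {
  const sends = [];
  for (let i = 0; i < count; i++) {
    sends.push(sendOnce(amqp, url, queue, `message ${i}`));
  }
  const results = await Promise.allSettled(sends);
  return results.filter((r) => r.status === 'rejected').length;
}

// Usage (assumes a local broker and `npm install amqplib`):
// sendBurst(require('amqplib'), 'amqp://localhost', 'sms', 100)
//   .then((failed) => console.log(`${failed} sends failed`));
```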
Notice the last few lines. We’re sending 100 messages in rapid succession. Let’s see what our producer shows.
All received. Very good. What happens if we up the load a bit? Say… let’s try 300 messages by upping the loop boundary.
We have 5 messages that didn’t go through! And since only 295 were sent, we see that the problem is on the sending side.
What’s happening?
Each time I attempt to send, I’m opening up a connection. That leaves many opportunities for connection failures.
What if I just use one connection?
With a little reorganization we can keep the connection and channel open.
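One way the reorganization could look, as a sketch under the same assumptions (amqplib promise API, placeholder URL and queue name). The helper `openProducer` is hypothetical and is not the original listing. It connects once and hands back a `send` function that reuses the same channel.

```javascript
// Sketch only, not the original listing. `amqp` is the amqplib module
// (promise API); the URL and queue name below are placeholders.
async function openProducer(amqp, url, queue) {
  // Connect and open the channel once, up front.
  const conn = await amqp.connect(url);
  const ch = await conn.createChannel();
  await ch.assertQueue(queue, { durable: true });

  return {
    // Every send reuses the channel that is already open.
    send: (text) => ch.sendToQueue(queue, Buffer.from(text)),
    // The caller is now responsible for shutting things down.
    close: async () => {
      await ch.close();
      await conn.close();
    },
  };
}

// Usage (inside an async function, with a local broker running):
// const producer = await openProducer(require('amqplib'), 'amqp://localhost', 'sms');
// for (let i = 0; i < 300; i++) producer.send(`message ${i}`);
// await producer.close();
```

The trade-off is visible in the returned object: someone has to hold on to it and call `close` at the right time.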
All the messages made it! But I don’t like this pattern. My messaging is sporadic. Do I want to deal with the overhead of managing the connection? Maybe? Probably? But not now.
What if I just resend on a failed connection?
Simple, I like it! But we need to be able to see if our message was successfully sent off. To do that, we use a ConfirmChannel. It basically means that instead of createChannel, we use createConfirmChannel. The server will then acknowledge our message when we issue the publish command. If it doesn’t, we can schedule a resend. There’s some extra code in there to space out the retries and potentially give up, but so far everything works well.
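A sketch of the resend idea, assuming the amqplib promise API, where `publish` on a ConfirmChannel takes a callback that fires once the broker has acknowledged or rejected the message. It is not the original listing: `sendConfirmed`, `sendWithRetry`, the retry count and the wait time are all hypothetical choices.

```javascript
// Sketch only, not the original listing. `amqp` is the amqplib module
// (promise API); names, retry count and wait time are placeholders.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// One attempt: resolves only after the broker has confirmed the message.
async function sendConfirmed(amqp, url, queue, text) {
  const conn = await amqp.connect(url);
  try {
    const ch = await conn.createConfirmChannel();
    await ch.assertQueue(queue, { durable: true });
    await new Promise((resolve, reject) => {
      // '' is the default exchange, which routes by queue name.
      ch.publish('', queue, Buffer.from(text), {}, (err) =>
        (err ? reject(err) : resolve()));
    });
    await ch.close();
  } finally {
    await conn.close();
  }
}

// Resend on failure, waiting a little longer each time, then give up.
async function sendWithRetry(amqp, url, queue, text, maxAttempts = 5, waitMs = 500) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await sendConfirmed(amqp, url, queue, text);
      return attempt; // how many attempts it took
    } catch (err) {
      if (attempt === maxAttempts) throw err; // out of retries: give up
      await delay(waitMs * attempt); // leave some space before resending
    }
  }
}

// Usage (assumes a local broker and `npm install amqplib`):
// sendWithRetry(require('amqplib'), 'amqp://localhost', 'sms', 'hello')
//   .then((n) => console.log(`confirmed after ${n} attempt(s)`));
```

Because a failed connection and an unconfirmed publish both end up in the same `catch`, one retry loop covers both cases.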
Here’s the proof.
Conclusion
I’ve barely scratched the surface of how to make RabbitMQ rock solid, but since I implemented these changes, we have had no more complaints about messages not getting through.
I would like to learn more about the cost of opening so many connections at once. It may be worth it to better manage my messages as our load increases.
Another next step is that, instead of giving up, we can store the message in a database for re-sending at a later time.