Getting more reliability out of RabbitMQ

June 20 2016

The Scenario

When our first beta user hit us with a heavy load, we ran into some issues. We were using RabbitMQ to communicate with our text messaging service. In our testing, we never had any problem with missed text messages, but we started hearing reports of text messages never getting delivered or received.

Now, I’ve seen blog posts saying that you can crank a million messages a second out of RabbitMQ. I know you need to be a pro to do stuff like that, but it never occurred to me that our level of usage thus far (a couple hundred spread out over a minute or two) would ever cause a problem. But, after not finding any issues in the surrounding logic, I decided to test RabbitMQ.

The setup

The consumer:

var amqp = require('amqplib');
var count = 0;
amqp.connect('amqp://localhost').then(function(conn) {
  process.once('SIGINT', function() { conn.close(); });
  return conn.createChannel().then(function(channel) {
    var exchange = 'triggers';
    var routingKey = 'foo.target';
    // Return each promise so failures propagate to the handler at the end of the chain.
    return channel.assertExchange(exchange, 'topic').then(function() {
      return channel.assertQueue('', {exclusive: true}).then(function(qok) {
        var queue = qok.queue;
        console.log('consuming from ' + queue);
        return channel.bindQueue(queue, exchange, routingKey).then(function() {
          return channel.consume(queue, function(message) {
            count++;
            console.log(" [" + count + "] Received '%s'", message.content.toString());
            channel.ack(message);
          }, {noAck: false}).then(function() {
            console.log(' [*] Waiting for messages. To exit press CTRL+C');
          });
        });
      });
    });
  });
}).then(null, console.warn);

The producer:

var amqp = require('amqplib');
var send = function(message, key) {
  message = JSON.stringify(message);
  return amqp.connect('amqp://localhost').then(function(conn) {
      console.log('connected to rabbit');
      return conn.createChannel().then(function(channel) {
        var exchange = 'triggers';
        return channel.assertExchange(exchange, 'topic').then(function() {
          channel.publish(exchange, key, Buffer.from(message));
          console.log(" [x] Sent %s:'%s'", key, message);
          return channel.close().then(function() {
            return conn.close();
          });
        });
      });
  });
};
for (var i = 0; i < 100; i++) {
  send('A message', 'foo.target');
}

Notice the last few lines. We’re sending 100 messages in rapid succession. Let’s see what our producer shows.

All received

All received. Very good. What happens if we up the load a bit? Let’s try 300 messages by raising the loop bound.

5 dropped!

We have 5 messages that didn’t go through! And since only 295 were sent, we see that the problem is on the sending side.

What’s happening?

Each time I attempt to send, I’m opening up a connection. That leaves many opportunities for connection failures.
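One way to check that hunch is to count how many sends actually fail instead of letting rejected promises vanish silently. Here is a minimal sketch, assuming the failures show up as rejected promises. `tallySends` and `sendFn` are names made up for illustration; they are not part of amqplib.

```javascript
// Sketch: run `total` sends and count how many promises resolve or reject.
// `tallySends` and `sendFn` are hypothetical names; `sendFn` is any function
// that returns a promise for one send attempt.
function tallySends(sendFn, total) {
  var tally = { sent: 0, failed: 0, errors: [] };
  var attempts = [];
  for (var i = 0; i < total; i++) {
    attempts.push(
      // Wrap the call so a synchronous throw is counted like a rejection.
      new Promise(function(resolve) { resolve(sendFn()); })
        .then(function() { tally.sent++; })
        .catch(function(error) {
          tally.failed++;
          tally.errors.push(error);
        })
    );
  }
  // Every attempt settles here because failures were recorded above.
  return Promise.all(attempts).then(function() { return tally; });
}
```

With the producer above, this could be called as `tallySends(function() { return send('A message', 'foo.target'); }, 300)` and the resulting `errors` array inspected, provided `send` returns a promise that rejects when the connection or channel cannot be opened.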

What if I just use one connection?

With a little reorganization we can keep the connection and channel open.

var amqp = require('amqplib');
var count = 0;
var preSend = function() {
  // Return the promise chain directly so a failed connect rejects instead of hanging forever.
  return amqp.connect('amqp://localhost').then(function(conn) {
    console.log('connected to rabbit');
    return conn.createChannel().then(function(channel) {
      var exchange = 'triggers';
      return channel.assertExchange(exchange, 'topic').then(function() {
        return {
          channel: channel,
          conn: conn
        };
      });
    });
  });
};
var send = function(channel, message, key) {
  message = JSON.stringify(message);
  count++;
  var exchange = 'triggers';
  channel.publish(exchange, key, Buffer.from(message));
  console.log(" [" + count + "] Sent %s:'%s'", key, message);
};
preSend().then(function(cc) {
  for (var i = 0; i < 300; i++) {
    send(cc.channel, 'hi there', 'foo.target');
  }
  cc.channel.close().then(function() {
    cc.conn.close().then(function() {
      console.log('TX complete');
    });
  });
});

All the messages made it! But I don’t like this pattern. My messaging is sporadic. Do I want to deal with the overhead of managing the connection? Maybe? Probably? But not now.
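For sporadic traffic, one middle ground is to open the channel lazily on the first send and reuse it afterwards. The sketch below is only an illustration of that idea, not a drop-in implementation. `createChannelCache` and `openChannel` are hypothetical names, and `openChannel` stands for any function that returns a promise for a ready channel.

```javascript
// Minimal sketch of a lazily opened, shared channel.
// `createChannelCache` and `openChannel` are hypothetical names, not part of amqplib.
// `openChannel` must return a promise that resolves once the channel is ready.
function createChannelCache(openChannel) {
  var pending = null; // promise for the shared channel, or null if none is cached

  return {
    // Return the cached promise, opening the channel on first use.
    get: function() {
      if (!pending) {
        var attempt = new Promise(function(resolve) {
          resolve(openChannel());
        });
        pending = attempt;
        // Forget a failed attempt so the next call tries again.
        attempt.catch(function() {
          if (pending === attempt) {
            pending = null;
          }
        });
      }
      return pending;
    },
    // Drop the cached channel, for example when the connection closes.
    reset: function() {
      pending = null;
    }
  };
}
```

Assuming the `preSend` function above, the cache could be built with `createChannelCache(function() { return preSend().then(function(cc) { return cc.channel; }); })`, and `reset` could be called from a handler for the connection's `close` event. Callers still need to handle a rejected promise from `get`.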

What if I just resend on a failed connection?

Simple, I like it! But we need to be able to see if our message was successfully sent off. To do that, we use a ConfirmChannel.

It basically means that instead of createChannel, we use createConfirmChannel. The server will then acknowledge our message when we issue the publish command. If it doesn’t, we can schedule a resend. There’s some extra code in there to wait a little between attempts and to give up after too many, but so far everything works well.


var amqp = require('amqplib');
var count = 0;
var maxSendAttempts = 5;
var resend = function(message, key, attempt) {
  var timeout = 2000;
  setTimeout(function() {
    console.log('reconnecting');
    send(message, key, attempt);
  }, timeout);
};
var send = function(message, key, attempt) {
  attempt = attempt || 0;
  attempt++;
  message = JSON.stringify(message);
  amqp.connect('amqp://localhost').then(function(conn) {
    console.log('connected to rabbit');
    conn.createConfirmChannel().then(function(channel) {
      var exchange = 'triggers';
      channel.assertExchange(exchange, 'topic').then(function() {
        channel.publish(exchange, key, Buffer.from(message), {}, function(error, ok) {
          if (error) {
            console.log('Message was not confirmed');
            console.log(error);
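            if (attempt < maxSendAttempts) {
              // Close this connection before retrying so failed attempts do not pile up open connections.
              conn.close().catch(function() {});
              resend(JSON.parse(message), key, attempt);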
            } else {
              console.log('Message failed ' + maxSendAttempts + ' times. Giving up.');
              console.log(JSON.parse(message));
              channel.close().then(function() {
                console.log('closing connection');
                conn.close();
              });
            }
          } else {
            count++;
            console.log(" [" + count + "] Sent %s:'%s'", key, message);
            channel.close().then(function() {
              console.log('closing connection');
              conn.close();
            });
          }
        });
      });
    });
  }).catch(function(error) {
    if (attempt < maxSendAttempts) {
      resend(JSON.parse(message), key, attempt);
    } else {
      console.log('Message failed ' + maxSendAttempts + ' times. Giving up.');
      console.log(JSON.parse(message));
    }
  });
};
// send a bunch of stuff
for (var i = 0; i < 600; i++) {
  send('A message', 'foo.target');
}

Here’s the proof.

All Received Again
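The resend code above waits a fixed 2000 ms before every retry, so a burst of failed sends all reconnect at the same moment. A common alternative, which is not what the code above does, is exponential backoff with jitter. A minimal sketch follows; `backoffDelay`, `baseMs`, `maxMs`, and `random` are names made up for illustration.

```javascript
// Sketch of exponential backoff with "full jitter".
// `backoffDelay`, `baseMs`, `maxMs`, and `random` are illustrative names.
// attempt: 1 for the first retry, 2 for the second, and so on.
// random: function returning a number in [0, 1); defaults to Math.random.
function backoffDelay(attempt, baseMs, maxMs, random) {
  random = random || Math.random;
  // Double the ceiling on every attempt, but never exceed maxMs.
  var ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
  // Pick a random delay below the ceiling so retries spread out.
  return Math.floor(random() * ceiling);
}
```

In `resend`, the fixed `timeout` could then become something like `backoffDelay(attempt, 500, 8000)`. The 500 and 8000 values are placeholders, not tuned numbers.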

Conclusion

I’ve barely scratched the surface of how to make RabbitMQ rock solid, but since I implemented these changes, we have had no more complaints about messages not getting through.

I would like to learn more about the cost of opening so many connections at once. It may be worth it to better manage my messages as our load increases.

Another next step is that, instead of giving up, we can store the message in a database for re-sending at a later time.
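As a rough sketch of that last idea, the give-up branch could hand the message to a store instead of only logging it. The example below uses an in-memory array as a stand-in for a database, so nothing survives a restart. `createFailedStore`, `save`, `size`, and `drain` are hypothetical names, and a real version would need durable storage.

```javascript
// Sketch of a holding area for messages that exhausted their retries.
// An in-memory array stands in for a database table; all names are hypothetical.
function createFailedStore() {
  var pending = [];

  return {
    // Record a message that could not be sent, along with its routing key.
    save: function(message, key) {
      pending.push({ message: message, key: key, failedAt: Date.now() });
    },
    // Number of messages waiting to be re-sent.
    size: function() {
      return pending.length;
    },
    // Hand every stored message to `sendFn(message, key)` and empty the store.
    // Returns how many messages were handed over.
    drain: function(sendFn) {
      var batch = pending;
      pending = [];
      batch.forEach(function(entry) {
        sendFn(entry.message, entry.key);
      });
      return batch.length;
    }
  };
}
```

The give-up branches could call `store.save(JSON.parse(message), key)`, and a timer could later call `store.drain(send)` to try again once the broker is reachable.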