Message Queuing in the Cloud

Charlie Koster
6 min read · Sep 5, 2020


A message queue is a service that allows producers to place serialized messages onto a highly reliable queue from which one or more consumers can read those items. Message queues are an essential aspect of decoupled, event-driven architectures and in this post I cover a few of the potential use cases and tradeoffs.

Message Queue Use Cases

Non-blocking UX: Word Counts for Medium Drafts

When writing a post you may notice the drafts list page will show a word count for each post. Upon closer inspection you may also notice it often takes several minutes for the word count to display accurately. This eventually consistent user experience could very well be implemented using a message queue.

A draft word count that doesn’t update in real time

The word count (476 words so far) isn’t updated in real time. That lag is a tradeoff, and it comes with two benefits:

  1. Autosaving is much faster when words aren’t counted as part of the operation. This has the convenient side-effect of minimizing blocking actions such as an alert preventing you from navigating away while the draft is automatically saving.
  2. Autosaving is more robust because parsing rich text in particular can be error prone. A failure to parse rich text should not prevent a draft from saving.

One way to build a more robust autosave with lower latency is for the autosave operation to persist the draft immediately (blue arrows) while enqueuing the ID of the draft on a message queue to be processed later (orange arrow) by an autoscaled Function or Virtual Machine.

An eventually consistent word count feature for Medium draft stories

Note that the above diagram implements the claim check pattern: the message carries only the draft ID, and the consumer looks up the draft text itself. This avoids message length limits (AWS / Azure), which are easy to hit when queue messages include draft text with extensive rich text.
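As a minimal sketch of that flow, assuming an in-memory dict and Python's `queue.Queue` as stand-ins for the draft database and the cloud message queue (the names `drafts`, `autosave`, and `process_next_message` are hypothetical, and real rich text would need proper parsing rather than a whitespace split):

```python
import queue
import re

# Hypothetical in-memory stand-ins for the draft database and the message queue.
drafts = {}  # draft_id -> {"text": str, "word_count": int or None}
word_count_queue = queue.Queue()


def autosave(draft_id: str, text: str) -> None:
    """Persist the draft right away, then enqueue only its ID (the claim check)."""
    previous = drafts.get(draft_id, {})
    drafts[draft_id] = {"text": text, "word_count": previous.get("word_count")}
    word_count_queue.put({"draft_id": draft_id})


def process_next_message() -> None:
    """Consumer: redeem the claim check by loading the draft, then count words."""
    message = word_count_queue.get()
    draft = drafts[message["draft_id"]]
    draft["word_count"] = len(re.findall(r"\S+", draft["text"]))


autosave("draft-1", "Message queues decouple producers from consumers")
print(drafts["draft-1"]["word_count"])  # None: the count has not caught up yet
process_next_message()
print(drafts["draft-1"]["word_count"])  # 6
```

The autosave never touches the word counting code, so a parsing failure in the consumer cannot prevent a draft from saving.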

Performance tradeoff: A potential improvement could be to use a conditional claim check pattern by conditionally including the entire draft in the message if the draft size is small enough. This will decrease database load for smaller drafts at the expense of causing the persisted word count to lag behind even more by counting words of a potentially outdated draft.
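A conditional claim check might look something like the sketch below. The 16 KiB cutoff, the JSON message shape, and the `build_message` / `resolve_text` helpers are all assumptions for illustration; the real cutoff depends on the queue service's limits.

```python
import json

# Assumed cutoff for inlining; real limits depend on the queue service.
INLINE_LIMIT_BYTES = 16 * 1024


def build_message(draft_id: str, text: str) -> str:
    """Inline small drafts; fall back to a claim check (ID only) for large ones."""
    inline = json.dumps({"draft_id": draft_id, "text": text})
    if len(inline.encode("utf-8")) <= INLINE_LIMIT_BYTES:
        return inline
    return json.dumps({"draft_id": draft_id})


def resolve_text(message: str, load_draft) -> str:
    """Use the inlined text when present; otherwise read the draft from storage."""
    body = json.loads(message)
    if "text" in body:
        return body["text"]  # may be older than the latest saved draft
    return load_draft(body["draft_id"])


store = {"big": "word " * 10_000}  # hypothetical draft storage
small = build_message("small", "a short draft")
large = build_message("big", store["big"])
print("text" in json.loads(small))  # True: small draft travels in the message
print("text" in json.loads(large))  # False: large draft is looked up by ID
```

Only the large draft costs a database read on the consumer side, which is where the reduced load comes from.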

Batch and Throttle Requests: Usage Analytics

Sites tracking concurrent usage using third-party APIs may have to work around rate limits imposed by those external APIs. Rate limiting becomes an increasing problem as the number of concurrent users increases.

Rate limiting enforced by a third-party API

The above diagram depicts thousands of users concurrently generating usage data. When the third-party API’s invocations-per-second threshold is exceeded, some of those invocations may be dropped.

A simple way to address this problem is to use a message queue to buffer usage data, effectively throttling API calls, and process usage in batches.

Message queues allow requests to be batched and throttled

The combination of processing usage data in batches while throttling third-party API invocations through tunable autoscaling parameters allows one to stay within third-party API rate limits even with high volume traffic.
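To make the idea concrete, here is a rough single-consumer sketch. It swaps the autoscaling parameters for two plain knobs, `batch_size` and `max_calls_per_second`, and `send_batch` is a hypothetical stand-in for the third-party API call. The `sleep` parameter is injectable only so the example runs instantly.

```python
import queue
import time
from typing import Callable, List


def drain_in_batches(
    usage_queue: "queue.Queue[dict]",
    send_batch: Callable[[List[dict]], None],
    batch_size: int = 10,
    max_calls_per_second: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send queued usage events in batches, pausing between API calls.

    Returns the number of API calls made.
    """
    min_interval = 1.0 / max_calls_per_second
    calls = 0
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(usage_queue.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return calls
        send_batch(batch)
        calls += 1
        sleep(min_interval)  # crude throttle: at most one call per interval


events = queue.Queue()
for user_id in range(25):
    events.put({"user_id": user_id, "event": "page_view"})

sent = []  # collects the batches the hypothetical API would receive
calls = drain_in_batches(events, sent.append, batch_size=10, sleep=lambda s: None)
print(calls, [len(b) for b in sent])  # 3 [10, 10, 5]
```

Twenty-five events become three API calls, and the pause between calls keeps the call rate under whatever limit the third party imposes.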

An additional benefit that comes for free: if the third-party service temporarily goes offline, the in-flight messages are not lost. When Usage Service fails to push messages to the third-party API, those messages become visible again on the queue to be processed at a later time.
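The "visible again" behavior can be illustrated with a toy queue. This is not any cloud provider's API: `VisibilityQueue` is a hypothetical in-memory class, and the clock is passed in explicitly so the example is deterministic.

```python
import itertools


class VisibilityQueue:
    """Toy queue: a received message is hidden until its timeout expires."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message_id -> [body, visible_at]
        self._ids = itertools.count(1)

    def send(self, body: str) -> int:
        message_id = next(self._ids)
        self._messages[message_id] = [body, 0.0]
        return message_id

    def receive(self, now: float):
        """Return (id, body) of a visible message and hide it, or None."""
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, entry[0]
        return None

    def delete(self, message_id: int) -> None:
        """Acknowledge success; only now is the message really gone."""
        del self._messages[message_id]


q = VisibilityQueue(visibility_timeout=30.0)
q.send("usage-batch-1")

message_id, body = q.receive(now=0.0)  # third-party API is down, so no delete
print(q.receive(now=10.0))             # None: still hidden from other consumers
print(q.receive(now=30.0))             # (1, 'usage-batch-1'): visible again
q.delete(message_id)                   # the retry succeeded
print(q.receive(now=100.0))            # None: nothing left on the queue
```

The consumer that failed never has to put the message back. Not deleting it is enough.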

Priority Queuing: Serve Paying Customers First

In a previous post I described how a message queue can be used to decouple cloud compute and storage services for a video stabilization service.

Architectural decoupling for a video stabilization service

Hypothetically, this service could serve both paying and free trial members. In that scenario the SQS message queue above would be a single queue handling requests from both paying and non-paying users. Understandably, getting no better service performance than free trial members could be frustrating for users with a paid membership.

With very little architectural change, the above SQS message queue can be split into two, creating a priority queuing architecture.

Priority queuing which favors paying members

A free trial queue would service free trial members and have limited compute resources serving their needs. A separate members-only queue would service paying members and dedicate substantially more resources to their needs, even borrowing compute resources from free trial members depending on how many messages are in each queue.
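One possible allocation rule, sketched under the assumption of a fixed worker pool (the `allocate_workers` function and the `free_cap` parameter are hypothetical, and in practice this would be expressed through autoscaling rules driven by queue depth):

```python
def allocate_workers(total_workers: int, members_depth: int, free_depth: int,
                     free_cap: int = 2) -> dict:
    """Split a fixed worker pool between the members and free trial queues.

    Paying members are served first. Free trial work gets at most `free_cap`
    workers, and only from whatever the members queue leaves unused.
    """
    members_workers = min(members_depth, total_workers)
    free_workers = min(free_depth, free_cap, total_workers - members_workers)
    return {"members": members_workers, "free_trial": free_workers}


# Light paid traffic: free trial members get their capped share.
print(allocate_workers(10, members_depth=3, free_depth=50))
# {'members': 3, 'free_trial': 2}

# Heavy paid traffic: members borrow the free trial workers too.
print(allocate_workers(10, members_depth=40, free_depth=50))
# {'members': 10, 'free_trial': 0}
```

This simple rule can starve the free trial queue entirely under sustained paid load. Reserving a minimum of one free trial worker would be a reasonable variation.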

Message Queue Considerations

Message Ordering and Duplicates

Producers enqueuing messages on a queue may perform that operation as part of a more complicated action, one that is not guaranteed to complete successfully. When a producer retries the action, it may have already enqueued a message on the first attempt, so the retry places a duplicate message on the queue.

Relatedly, some message queue implementations do not enforce strict message ordering. Messages may not be delivered to consumers in the same order in which producers sent them.

One way to help mitigate these caveats is to make message processing idempotent. Operations like counting the words of a draft included inline in the message are naturally idempotent: processing a duplicate message results in the same persisted word count. On the other hand, sometimes you’ll find yourself at the mercy of third-party APIs, which may require additional techniques to achieve idempotency.
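One such additional technique is to track which messages have already been handled. The sketch below assumes every message carries an ID that stays the same across producer retries (for example, derived from the action being performed). `processed_ids`, `api_calls`, and `handle` are hypothetical names, and the in-memory set stands in for durable storage.

```python
processed_ids = set()  # in production this would live in durable storage
api_calls = []         # stand-in for a third-party API that is not idempotent


def handle(message: dict) -> bool:
    """Process a message at most once per ID. Returns True if work was done."""
    if message["id"] in processed_ids:
        return False  # duplicate delivery: skip the side effect
    api_calls.append(message["payload"])  # hypothetical non-idempotent call
    processed_ids.add(message["id"])
    return True


for msg in [{"id": "m1", "payload": "signup"},
            {"id": "m1", "payload": "signup"},  # duplicate from a producer retry
            {"id": "m2", "payload": "upgrade"}]:
    handle(msg)

print(api_calls)  # ['signup', 'upgrade']
```

Note that a crash between the API call and recording the ID would still allow a repeat, so this narrows the window for duplicates rather than closing it.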

UX for Eventual Consistency

In another post I describe the anti-pattern of happy path engineering. From a user experience perspective eventual consistency is yet another less-than-happy path to be accounted for. For example, how should the tradeoff of counting words in near real time vs crafting a robust, minimally blocking autosave experience be considered?

This is yet another reason to involve Engineering early in feature ideation and design. It wouldn’t be reasonable for those outside of Engineering to be expected to master the ins and outs of cloud development and anticipate where eventual consistency will be employed. However, when everyone collaborates in unison these scenarios can be anticipated and accounted for earlier in the product development lifecycle.

Poison Messages

Popular cloud message queue implementations allow for peek or soft deletion policies where a failure to process a message will cause the message to remain on the queue. That works fine when the failure is a transient issue with the compute instance reading from the queue. Those may self-resolve. It’s a different outcome when the problem is with the message itself.

A queue message that continually causes consumer failures and is repeatedly placed back on the queue, only for other consumers to fail on it as well, is called a poison message.

Poison messages can be dealt with by observing message read counts and sending messages deemed poisonous to a dead letter queue, where they can be processed separately without blocking consumers.
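A minimal in-memory sketch of that read-count check follows. The threshold of 3, the `consume` function, and the `receive_count` field are assumptions for illustration; managed queue services generally let you configure this behavior rather than hand-roll it.

```python
from collections import deque

MAX_RECEIVES = 3  # assumed threshold; queue services usually make this configurable


def consume(main_queue: deque, dead_letters: list, handler) -> None:
    """Process messages, moving any that keep failing to a dead letter list."""
    while main_queue:
        message = main_queue.popleft()
        message["receive_count"] = message.get("receive_count", 0) + 1
        try:
            handler(message["body"])
        except Exception:
            if message["receive_count"] >= MAX_RECEIVES:
                dead_letters.append(message)  # deemed poisonous: park it
            else:
                main_queue.append(message)    # back on the queue for a retry


def count_words(body) -> int:
    return len(body.split())  # raises AttributeError when body is not a string


main_queue = deque([{"body": "a valid draft"}, {"body": None}])
dead_letters = []
consume(main_queue, dead_letters, count_words)
print(len(main_queue), dead_letters)
# 0 [{'body': None, 'receive_count': 3}]
```

The malformed message is retried twice, then parked, so the main queue drains instead of cycling on it forever.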

Increased Complexity

Message queues add complexity, even if incrementally: CI, traceability, dead letter queues, cognitive load for developers. Each consideration may be minor, but they add up.

This is a tradeoff that should be considered carefully. Is a real time word count feature worth the additional point of failure and a longer autosave operation? Does dropping a subset of usage data still provide enough signal for future business decisions? Should paying members have to wait as long as free trial members for video stabilization? Maybe!

Note that a message queue is not an all or nothing solution. It is completely possible to maintain a monolith while at the same time identifying a new service as needing to be backed by a message queue at the expense of incrementally added complexity. Message queuing is a tool and sometimes it’s the right tool for the job.


