Async Does Not Mean Scalable

We keep hearing that async/await and message queues make our applications faster, more responsive, more scalable. Teams rewrite synchronous endpoints as async ones and feel better about the decision. And when that's not enough, someone adds a message queue, and suddenly the system feels fixed.

But what if it wasn't broken in the way you thought?

The Reflex

Every system that starts struggling under load goes through the same ritual. Traffic climbs, latency spikes, and someone in the room says: "let's make it async." The team nods. It sounds right. And for a while, it works — or at least, the dashboard stops screaming.

Then the queue backs up. Or the thread pool saturates. Or the database starts timing out. And the team discovers that async didn't fix the problem. It just changed where the problem lived.

This isn't a criticism of async/await or message queues. They're genuinely useful, message queues especially. But there's a persistent confusion in how teams reach for them, as a scaling tool rather than a decoupling tool, and it leads to systems that feel fast right up until they don't.

Why async feels like a scaling solution

Imagine a restaurant where the kitchen is slow. Orders pile up, customers wait, tables sit occupied for too long.

Someone has an idea: hire a host to take orders at the door and give customers a buzzer. Now people don't stand in line at the counter — they sit down, browse their phones, and the kitchen calls them when food is ready. The entrance is clear. The restaurant feels more efficient.

But the kitchen is still slow. The same number of meals gets cooked per hour. The queue just moved from the door to the buzzers, and now it's harder to see.

This is exactly what happens when you reach for async/await or a message queue as a scaling solution.

What async/await actually does

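When you await an HTTP request or a database call in .NET and the operation hasn't finished yet, the current thread is released back to the thread pool while the I/O is in flight. When the operation completes, the continuation typically resumes on a thread-pool thread, not necessarily the one that started the call.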

public async Task<Order> GetOrderAsync(int id)
{
    // Thread is released here while the DB query runs
    return await _db.Orders.FindAsync(id);
}

That's genuinely useful. In ASP.NET Core, a server that handles thousands of concurrent I/O-bound requests on a small thread pool, rather than tying up one thread per waiting request, is a better server. It's the right answer to thread pool exhaustion.

But notice what async doesn't do: it doesn't make the database query faster. It doesn't reduce the number of queries. If 1,000 requests come in simultaneously, the database still gets 1,000 queries — they just don't each hold a thread while waiting.

The thread is free. The work is not.
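A small console sketch makes the point concrete (top-level C# on .NET 6 or later assumed). `QueryDatabaseAsync` is a hypothetical stand-in that simulates a 20 ms query with `Task.Delay`, which waits without holding a thread; it is not a real data-access API.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int queriesExecuted = 0;

// Hypothetical stand-in for a database call. Task.Delay simulates 20 ms of I/O
// without holding a thread, much like an awaited query.
async Task<int> QueryDatabaseAsync(int id)
{
    Interlocked.Increment(ref queriesExecuted); // every request still reaches the "database"
    await Task.Delay(20);
    return id;
}

// 1,000 concurrent requests, all awaiting the fake database.
int[] results = await Task.WhenAll(
    Enumerable.Range(1, 1_000).Select(id => QueryDatabaseAsync(id)));

Console.WriteLine($"Requests completed: {results.Length}");
Console.WriteLine($"Queries the database had to serve: {queriesExecuted}");
```

Nothing here blocks a thread, yet the counter still ends at 1,000: a real database would have to serve every one of those queries with whatever capacity it actually has.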

What adding a queue does

The queue is a debt counter, not a buffer

Adding a message queue feels like a more decisive fix. The API returns instantly. Traffic spikes are absorbed. The team ships it and calls it done.

But every message still needs a consumer. The consumer still hits the database, calls the downstream service, runs the business logic. What changed is where the pressure accumulates.

Think of it like a debt. If your producer sends 1,000 messages per second and your consumer handles 200, you're accumulating 800 messages of debt every second. After ten minutes: 480,000 unprocessed messages. The producer side looks fine. The queue is quietly growing.

When will you notice? When redelivery storms start hitting your consumer. When messages expire. When the lag is hours, not seconds.

xychart-beta
    title "Queue debt over time (producer: 1000/s, consumer: 200/s)"
    x-axis ["0m", "2m", "4m", "6m", "8m", "10m"]
    y-axis "Unprocessed messages" 0 --> 500000
    bar [0, 96000, 192000, 288000, 384000, 480000]
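The debt arithmetic is easy to check in code. A minimal sketch, assuming constant rates and an empty queue at time zero (real traffic is burstier); `BacklogAfter` is a hypothetical helper. Its output matches the bars in the chart: 96,000 messages at two minutes, 480,000 at ten.

```csharp
using System;

// Hypothetical helper: backlog after `seconds`, assuming constant rates
// and an empty queue at time zero.
static long BacklogAfter(long seconds, long producedPerSecond, long consumedPerSecond) =>
    Math.Max(0L, (producedPerSecond - consumedPerSecond) * seconds);

// Rates from the example: producer 1,000 msg/s, consumer 200 msg/s.
for (int minute = 0; minute <= 10; minute += 2)
{
    long backlog = BacklogAfter(minute * 60L, 1_000, 200);
    Console.WriteLine($"{minute,2}m: {backlog,9:N0} unprocessed messages");
}
```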

The bottlenecks async doesn't touch

In my experience, most throughput and scalability problems aren't thread problems at all. They're resource contention problems — and no amount of concurrency management fixes a contention problem:

  • Database connection pool — finite by design. Callers queue for a connection whether the calling code is async or not.
  • Row-level locks — concurrent writers serialize. An await doesn't skip the queue.
  • External rate limits — the third-party API that allows 100 requests per second doesn't care that your code is elegant.
  • Shared mutable state — still needs coordination. The concurrency model doesn't change because you added async.
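The connection-pool bullet can be simulated the same way. This sketch assumes a pool of 10 connections modeled with a `SemaphoreSlim` (a simplification of what a real driver's pool does), 100 concurrent requests, and a simulated 20 ms query; `HandleRequestAsync` is hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Assumption: a pool of 10 connections, modeled with a SemaphoreSlim.
const int PoolSize = 10;
var pool = new SemaphoreSlim(PoolSize, PoolSize);
object gate = new();
int inUse = 0, peakInUse = 0;

// Hypothetical request handler: acquire a connection, run a 20 ms query, release.
async Task HandleRequestAsync()
{
    await pool.WaitAsync(); // no thread is blocked here, but the caller still queues
    lock (gate) { inUse++; peakInUse = Math.Max(peakInUse, inUse); }
    try
    {
        await Task.Delay(20); // simulated query while holding the connection
    }
    finally
    {
        lock (gate) { inUse--; }
        pool.Release();
    }
}

var stopwatch = Stopwatch.StartNew();
await Task.WhenAll(Enumerable.Range(0, 100).Select(_ => HandleRequestAsync()));
stopwatch.Stop();

// 100 requests / 10 connections = 10 waves of ~20 ms each.
Console.WriteLine($"Peak connections in use: {peakInUse}");
Console.WriteLine($"Elapsed: {stopwatch.ElapsedMilliseconds} ms");
```

Waiting with `WaitAsync` frees the thread, but not the queue: peak usage never exceeds the pool size, and the run takes roughly 100 ÷ 10 × 20 ms ≈ 200 ms regardless of how the caller is written.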

Making the calling code async moves the problem one layer down. It doesn't remove it — it just makes it harder to see because you've separated the producer from the consumer.

When to actually reach for async

None of this is a reason to avoid async/await. It's a reason to use it for the right problem: genuine thread pool pressure in high-concurrency, I/O-bound workloads.

Message queues have their place too — decoupling services with different uptime requirements, absorbing traffic spikes without dropping requests, building retry logic for unreliable operations. These are real problems worth solving.

Just don't confuse those benefits with throughput. Decoupling and throughput are different properties. A well-decoupled system can still be slow. A fast system doesn't require a queue.

What actually improves throughput

When throughput is the real problem, the levers are different:

  • More consumer instances — horizontal scaling works when consumers are stateless and independent. If processing one message takes 50ms and you need to handle 500 messages/second, you need at least 25 concurrent consumers. Spin up more; the math is simple.
  • Partitioning — route work by a meaningful key (customer ID, tenant ID) so consumers operate on non-overlapping subsets without competing for the same rows. This is where real parallelism comes from.
graph LR
    Q[Queue] -->|customer A–M| C1[Consumer 1]
    Q -->|customer N–Z| C2[Consumer 2]
    C1 --> DB1[(DB shard 1)]
    C2 --> DB2[(DB shard 2)]
  • Keeping handlers lean — shared mutable state limits how many you can run in parallel. A handler that locks a shared cache or calls a single-instance service will serialize regardless of how many instances you run.
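The sizing rule in the first bullet is Little's law (work in flight = arrival rate × time per message), and the routing in the second is a pure function of the key. A sketch of both, assuming whole-millisecond timings and customer IDs that start with a letter; `RequiredConsumers` and `PartitionFor` are hypothetical helpers.

```csharp
using System;
using System.Linq;

// Little's law: items in flight = arrival rate × time per item.
// Integer ceiling so 500 msg/s × 50 ms comes out at exactly 25.
static int RequiredConsumers(int messagesPerSecond, int millisecondsPerMessage) =>
    (messagesPerSecond * millisecondsPerMessage + 999) / 1000;

// Hypothetical range-based routing mirroring the diagram:
// customers A–M go to partition 0, N–Z to partition 1.
static int PartitionFor(string customerId) =>
    char.ToUpperInvariant(customerId[0]) <= 'M' ? 0 : 1;

Console.WriteLine($"Consumers needed: {RequiredConsumers(500, 50)}"); // Consumers needed: 25

string[] customerIds = { "alice", "bob", "mallory", "nina", "oscar", "zoe" };
foreach (var partition in customerIds.GroupBy(id => PartitionFor(id)))
{
    // Each consumer owns a disjoint set of customers, so they never compete for the same rows.
    Console.WriteLine($"Consumer {partition.Key + 1}: {string.Join(", ", partition)}");
}
```

In practice a broker usually assigns partitions for you, typically by hashing the key rather than by letter range. If you hash keys yourself in .NET, avoid `string.GetHashCode()`: it is randomized per process, so the same customer could land on a different partition after a restart.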

And measure the right thing. Consumer lag, not queue depth. Queue depth is a snapshot — it tells you how much work is waiting right now. Consumer lag tells you how fast you're falling behind. One is a count. The other is a velocity. You need the velocity to plan capacity.
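Turning two snapshots into a velocity is one subtraction and one division. A minimal sketch, assuming backlog samples are already being collected from whatever metric your broker exposes; the sample values, the two-minute interval and the 1,000 msg/s of added capacity are illustrative assumptions, not measurements.

```csharp
using System;

// Two hypothetical backlog snapshots taken two minutes apart
// (the 2m and 4m values from the chart above).
long backlogEarlier = 96_000;
long backlogLater = 192_000;
TimeSpan interval = TimeSpan.FromMinutes(2.0);

// Velocity: messages per second by which the backlog grows (positive) or shrinks (negative).
double growthPerSecond = (backlogLater - backlogEarlier) / interval.TotalSeconds;
Console.WriteLine($"Backlog growth: {growthPerSecond:F0} msg/s");

// Capacity planning needs the velocity. Assumption: scaling out adds 1,000 msg/s of throughput.
double addedCapacityPerSecond = 1_000;
double netDrainPerSecond = addedCapacityPerSecond - growthPerSecond;
if (netDrainPerSecond > 0)
{
    TimeSpan timeToDrain = TimeSpan.FromSeconds(backlogLater / netDrainPerSecond);
    Console.WriteLine($"Estimated time to drain: {timeToDrain.TotalMinutes:F1} min");
}
else
{
    Console.WriteLine("Still falling behind: the added capacity is not enough.");
}
```

A single depth reading of 192,000 says nothing about whether adding capacity will help; the 800 msg/s growth rate tells you that anything less than 800 msg/s of extra throughput never catches up.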

async/await, scalability, and what to do instead

async/await is a thread management tool. Message queues are a decoupling tool. Neither is a throughput tool on its own.

If your system is slow, profile before you refactor. Measure before you add infrastructure. The database index you're not adding, the N+1 query you haven't noticed, the downstream service with no timeout — these are more likely your problem than the absence of await.
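The N+1 query is a good example of a problem `await` can't touch. In this sketch every call is properly async, yet the loop makes one round trip per order. The in-memory "database" and the `GetOrdersAsync`, `GetCustomerAsync` and `GetCustomersAsync` methods are hypothetical stand-ins; with EF Core, the batched version would typically be an `Include` or a single query filtered with `Contains`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory "database": 100 orders, each for a different customer.
// Every method is properly async; the only difference is how often we call it.
int roundTrips = 0;
var customerNames = Enumerable.Range(1, 100).ToDictionary(id => id, id => $"Customer {id}");
int[] orderCustomerIds = Enumerable.Range(1, 100).ToArray();

async Task<int[]> GetOrdersAsync()
{
    roundTrips++; await Task.Yield(); return orderCustomerIds;
}

async Task<string> GetCustomerAsync(int id)
{
    roundTrips++; await Task.Yield(); return customerNames[id];
}

async Task<Dictionary<int, string>> GetCustomersAsync(IEnumerable<int> ids)
{
    roundTrips++; await Task.Yield();
    return ids.Distinct().ToDictionary(id => id, id => customerNames[id]);
}

// N+1: one query for the orders, then one more query per order.
roundTrips = 0;
foreach (int customerId in await GetOrdersAsync())
    _ = await GetCustomerAsync(customerId);
int nPlusOne = roundTrips;

// Batched: one query for the orders, one for every customer they reference.
roundTrips = 0;
var customersById = await GetCustomersAsync(await GetOrdersAsync());
int batched = roundTrips;

Console.WriteLine($"N+1 round trips: {nPlusOne}, batched: {batched}"); // N+1 round trips: 101, batched: 2
```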

Async shifts the work. Scale requires designing where the work goes.


If this resonates with you, share it with someone who just added a queue to a slow service and called it a day. I'd love to hear your thoughts — find me on LinkedIn and drop a comment.
