How Architecture Patterns Actually Differ When You Have to Run Them

Microservices, event-driven, CQRS, sharding. Every pattern looks elegant in a diagram. The difference between them only becomes real when you ask who gets paged at 3 a.m. and what the failure mode looks like. I am mapping each pattern to the operational trade-off it introduces, because that is the part the diagrams leave out.

March 24, 2026 · 14 min read

What does a monolith actually give you before you break it apart?

A monolith is a single deployable unit. All the code runs in the same process, shares the same memory, and talks to the same database. Function calls are in-process. There is no network between components. Data is consistent because there is one database and one schema.

The benefits are real. Deployments are simple: one artifact, one deployment pipeline, one place to look when something breaks. Local development is fast. Debugging a request means reading a single log stream. Transactions work without any special coordination because the database handles them.

The pattern I keep coming back to after reading through a number of post-mortems is that teams abandon monoliths before the monolith has actually become the problem. A monolith running on well-configured hardware can serve hundreds of millions of requests per day. The problems that drive people away from monoliths are usually organizational, not technical: too many engineers sharing a codebase, deployment coupling slowing down independent teams, a single component consuming all the memory and blocking everything else. When those problems are real, breaking apart the monolith makes sense. When they are not, the complexity cost of distribution is paid without the benefit.


What does the client-server model assume, and where does that assumption break?

The client-server model structures a system around a clear boundary: clients make requests, servers handle them, a database persists the state. The API is the contract across that boundary.

This model works for most systems most of the time. The reason it is worth naming explicitly is that its assumptions break in specific ways.

The server is stateless by assumption. State lives in the database. If the server holds state in memory (sessions, in-memory caches), horizontal scaling stops being simple: a request routed to a different server instance will not find the session. The fix is to move state out of the server (into Redis, into a database) or to use sticky routing, which limits the flexibility of horizontal scaling.
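A minimal sketch of externalizing session state, using Redis via the redis-py client; the key naming, TTL, and connection details here are my own illustrative choices, not a prescription:

```python
import json
import uuid

import redis  # assumes the redis-py client is installed

# Sessions live in Redis rather than in server memory, so any
# instance can serve any request. The TTL bounds how long an idle
# session survives.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SESSION_TTL_SECONDS = 30 * 60

def create_session(user_id: str) -> str:
    session_id = str(uuid.uuid4())
    r.setex(f"session:{session_id}", SESSION_TTL_SECONDS,
            json.dumps({"user_id": user_id}))
    return session_id

def load_session(session_id: str) -> dict | None:
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```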

The database is the shared bottleneck. Every server instance talks to the same database. As server instances scale horizontally, read and write pressure on the database scales with them. The database becomes the ceiling unless it is scaled separately, and scaling a database is harder than scaling stateless application servers.

The API boundary does not disappear under load. A client waiting for a slow server response holds a connection open. A server waiting for a slow database query holds a thread. Under load, these held resources compound. Timeouts and circuit breakers exist to bound the damage, but they require deliberate design. They do not happen automatically.
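Here is a minimal circuit breaker sketch to make the idea concrete. The thresholds are illustrative, and a production system would likely reach for a hardened library rather than hand-rolling this:

```python
import time

class CircuitBreaker:
    """Sketch of a circuit breaker: after enough consecutive
    failures, fail fast for a cooldown instead of holding resources
    on a downstream that is already struggling."""

    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        # Open state: reject immediately until the cooldown elapses.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

Wrapping an outbound call then looks like `breaker.call(requests.get, url, timeout=2.0)`: the timeout bounds how long any one thread is held, and the breaker bounds how many threads pile up once the downstream is known to be failing.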


What does breaking a monolith into microservices actually cost?

A microservices architecture decomposes a system into independently deployable services, each owning its own data and its own deployment lifecycle. Teams can deploy independently. Services can be scaled independently. A failure in one service can be isolated from the others.

The costs that are underplayed in most introductions:

Network replaces function calls. In a monolith, a function call takes nanoseconds. A network call between services takes milliseconds and can fail. Every cross-service call is a distributed systems problem: timeouts, retries, partial failures, and the need to design for the case where the downstream service is slow or unavailable. Teams migrating to microservices consistently underestimate how much of their existing reliability was coming for free from the monolith's in-process calls.
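To make the contrast concrete, here is a sketch of what a "simple" cross-service read looks like once the network is in the path. The endpoint and constants are illustrative, and blind retries like these are only safe for idempotent operations:

```python
import time

import requests  # assumes the requests library

def fetch_inventory(order_id: str) -> dict:
    """Sketch of a cross-service call that treats the network as
    unreliable: a bounded timeout, a small number of retries, and
    exponential backoff. Endpoint and constants are illustrative."""
    last_error: Exception | None = None
    for attempt in range(3):
        try:
            resp = requests.get(
                f"http://inventory.internal/orders/{order_id}",
                timeout=1.5,  # never wait indefinitely on a slow downstream
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as exc:
            last_error = exc
            if attempt < 2:
                time.sleep(0.2 * (2 ** attempt))  # back off: 0.2s, 0.4s
    raise RuntimeError("inventory service unavailable") from last_error
```

In the monolith, this entire block was `fetch_inventory(order_id)` as an in-process call, with none of the failure handling needed.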

Data ownership creates consistency problems. Each service owns its own database. A transaction that would span a single database in a monolith must now be coordinated across services. Distributed transactions (two-phase commit) exist but are expensive and fragile. The more common approach is eventual consistency: services emit events, other services consume them and update their own state asynchronously. This means the system is consistent eventually, not immediately, and the application must be designed to handle the interim state.

Operational surface multiplies. One service means one deployment, one log stream, one set of metrics. Ten services mean ten deployments, ten log streams, ten metric sets, and the need for distributed tracing to correlate a single user request across all of them. Service discovery becomes necessary. Each service needs its own health checks, alerting, and scaling policies. The infrastructure investment required to operate microservices well is substantial.

The question I am currently using to evaluate when microservices earn their cost: is there a specific team or a specific component where the deployment coupling or the resource contention is causing a measurable problem? If yes, extract that service. If no, the cost is real and the benefit is not yet present.


What does event-driven architecture change about who controls the flow?

In a request-response architecture, the caller controls the flow. Service A calls Service B and waits for the response before continuing. In an event-driven architecture, the flow is inverted. Service A emits an event and continues. Service B (and Service C, and Service D) consume the event and act on it independently.

The structural benefit is decoupling. Service A does not know that B or C exist. Adding a new consumer means adding a new service that subscribes to the event. No change to Service A is required. This is genuinely useful when a single action needs to trigger multiple downstream processes: an order placed event might need to trigger inventory reservation, payment processing, email confirmation, and analytics ingestion.
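An in-memory sketch of the inversion, with the broker (Kafka, RabbitMQ, SNS) replaced by a dictionary of subscribers; the point to notice is that the producer's code never changes as consumers are added:

```python
from collections import defaultdict
from typing import Callable

# In-memory stand-in for a broker: producers publish and move on;
# consumers register interest without the producer knowing they exist.
_subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(event_type: str, handler: Callable[[dict], None]) -> None:
    _subscribers[event_type].append(handler)

def publish(event_type: str, payload: dict) -> None:
    for handler in _subscribers[event_type]:
        handler(payload)  # each consumer acts independently

subscribe("order.placed", lambda e: print("reserve inventory for", e["order_id"]))
subscribe("order.placed", lambda e: print("send confirmation to", e["email"]))

# Adding analytics later means one more subscribe() call; the
# publish below does not change.
publish("order.placed", {"order_id": "o-123", "email": "a@example.com"})
```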

The operational realities:

Debugging is harder. A request-response system has an obvious causal chain: a function returns an error, and the stack trace tells you where. In an event-driven system, a failure in downstream processing may surface as a symptom far removed from its cause. Distributed tracing with a correlation ID on every event is not optional; it is the only way to reconstruct what happened.

Ordering is not guaranteed without design. Kafka partitions guarantee ordering within a partition. If events for the same entity (say, the same order ID) can land on different partitions, consumers may process them out of order. Idempotency and careful partition key design are required to handle this correctly.
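A sketch of both halves of the fix, with an illustrative partition count and a hypothetical apply_to_local_state: a stable hash on the entity ID keeps one order's events on one partition, and an idempotency guard makes redelivery harmless:

```python
import hashlib

NUM_PARTITIONS = 12  # illustrative

def partition_for(order_id: str) -> int:
    # A stable hash of the entity ID sends every event for one order
    # to the same partition, preserving per-order ordering. (Kafka's
    # own clients use murmur2 for this; the principle is the same.)
    digest = hashlib.md5(order_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

def apply_to_local_state(event: dict) -> None:
    ...  # hypothetical: update this consumer's own store

processed_event_ids: set[str] = set()

def handle_event(event: dict) -> None:
    # Idempotency guard: a redelivery or replay must not apply the
    # same event twice. A real consumer would persist this set
    # transactionally with the state update, not hold it in memory.
    if event["event_id"] in processed_event_ids:
        return
    apply_to_local_state(event)
    processed_event_ids.add(event["event_id"])
```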

Consumers can fall behind. Event-driven systems absorb bursts naturally: the queue grows when producers outpace consumers. But a consumer that falls behind builds up a lag. If that lag grows faster than the consumer can clear it, and the event stream has a retention window, the consumer can fall off the end of the log. Monitoring consumer lag is a first-class operational concern, not an afterthought.
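The lag itself is simple to compute, which is part of why there is no excuse not to monitor it. A sketch, with per-partition offsets as plain dictionaries:

```python
def consumer_lag(latest_offsets: dict[int, int],
                 committed_offsets: dict[int, int]) -> dict[int, int]:
    """Lag per partition: how far the consumer's committed position
    trails the end of the log. Alert when the total grows faster than
    the consumer clears it, and well before the retention window
    would truncate unread events."""
    return {
        p: latest_offsets[p] - committed_offsets.get(p, 0)
        for p in latest_offsets
    }

# e.g. consumer_lag({0: 10_500, 1: 9_800}, {0: 10_420, 1: 9_800})
# -> {0: 80, 1: 0}
```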


What is CQRS, and when does separating reads from writes make sense?

CQRS (Command Query Responsibility Segregation) is the pattern of using separate models for reads and writes. The write side accepts commands and updates the authoritative data store. The read side builds and serves optimized read models from that data, often by consuming events from the write side.

The reason this pattern exists is that the optimal data model for writes and the optimal data model for reads are often different. A normalized relational schema is efficient for writes (no data duplication, referential integrity enforced) but requires joins for reads. A denormalized read model (a single table with all the fields a UI needs, pre-joined) is fast for reads but expensive to keep consistent on writes. CQRS lets you have both: the write side stays normalized and consistent, the read side stays fast and pre-computed.
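A minimal in-memory sketch of the shape, with the stores and the event pipeline reduced to Python dictionaries and a list; all the names here are illustrative:

```python
# Write side: normalized, authoritative (in-memory stand-ins).
orders: dict[str, dict] = {}
customers: dict[str, dict] = {"c-1": {"name": "Ada"}}
events: list[dict] = []  # stand-in for the event pipeline

def place_order(order_id: str, customer_id: str, total: int) -> None:
    orders[order_id] = {"customer_id": customer_id, "total": total}
    events.append({"type": "order.placed", "order_id": order_id})

# Read side: denormalized, rebuilt from events, never authoritative.
order_summaries: dict[str, dict] = {}  # pre-joined for the UI

def project(event: dict) -> None:
    if event["type"] == "order.placed":
        order = orders[event["order_id"]]
        order_summaries[event["order_id"]] = {
            "customer_name": customers[order["customer_id"]]["name"],
            "total": order["total"],
        }

place_order("o-1", "c-1", 4200)
project(events[-1])  # in production this runs async, hence the lag
```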

The cost is eventual consistency between the write side and the read side. A user submits a command. The command updates the write store. An event is emitted. A consumer updates the read model. The read model reflects the command after a delay. If the user immediately queries the read model after submitting a command, they may not see their own write.

This lag is acceptable for most applications. It is not acceptable for systems where a user's action must be immediately visible to them (financial account balances, for instance, where seeing a pre-update balance after a transfer would be confusing or dangerous). For those cases, CQRS either needs a way to serve the user's own writes directly from the write model, or it is the wrong pattern.

The operational cost: two data stores instead of one (or one data store with two schemas), an event pipeline between them, and the discipline to never let the read side be the authoritative source of truth.


What does sharding change about how data is accessed and where it breaks?

Sharding splits data across multiple database nodes by a partition key. A users table sharded by user ID means user records are spread across N shards, each holding a range or hash bucket of IDs. Each shard handles a fraction of the reads and writes. As data grows or write throughput increases, more shards can be added.
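A minimal sketch of the routing layer that sharding places in front of every query; the shard names are illustrative, and the important detail is using a stable hash rather than a process-seeded one:

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]  # illustrative

def shard_for(user_id: str) -> str:
    # A stable hash so the same user always routes to the same shard;
    # avoid language-native hash() functions whose seeds vary per process.
    digest = hashlib.sha256(user_id.encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# Every read and write for a user goes through the router first.
print(shard_for("user-42"))  # e.g. "db-shard-1"
```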

The question that determines whether sharding is the right move: is the bottleneck actually the database's write throughput or storage, or is it something else? Sharding is not the first tool to reach for. Read replicas, caching, and query optimization can push a single-node database much further than most applications need before sharding becomes necessary.

When sharding is the right answer, the partition key choice is the most consequential design decision:

  • A range-based key (shard by date range, for instance) is easy to reason about but creates hotspots. If recent data is queried most often, the shard holding the most recent data handles disproportionate load.
  • A hash-based key distributes load evenly but makes range queries difficult or impossible. Fetching all records in a time range requires querying every shard.
  • A directory-based key uses a lookup table to map entities to shards, which allows flexible reassignment but introduces the lookup table as a dependency on every request.

Cross-shard queries are expensive or impossible depending on the database. Joining data across shards requires pulling data from multiple shards into the application layer and joining there. Operations that were free with a single database (count, aggregate, join) become engineering problems at the sharding layer.
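What that looks like in practice: even a count becomes a scatter-gather. A sketch, with hypothetical per-shard results standing in for real COUNT(*) queries:

```python
# Hypothetical per-shard counts standing in for SELECT COUNT(*) results.
FAKE_COUNTS = {"db-shard-0": 1200, "db-shard-1": 1180, "db-shard-2": 1210}

def count_users_on(shard: str) -> int:
    return FAKE_COUNTS[shard]  # would be a COUNT(*) query per shard

def count_all_users(shards: list[str]) -> int:
    # What was one SQL statement against a single database is now a
    # scatter-gather at the application layer: query every shard, sum.
    return sum(count_users_on(s) for s in shards)

print(count_all_users(list(FAKE_COUNTS)))  # 3590
```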

Resharding, the process of moving data between shards as the data grows or the access patterns change, is operationally painful. Consistent hashing minimizes the data movement required when a shard is added or removed, but it does not eliminate it. Teams that shard early and choose the wrong key pay for it for years.


What does the layered architecture pattern actually separate?

Layered architecture organizes a system into horizontal layers: typically presentation (API, UI), application (business logic, use cases), and data (database, external services). Each layer depends only on the layer below it. The presentation layer calls the application layer. The application layer calls the data layer. Dependencies flow downward.

The benefit is testability. Business logic in the application layer can be tested without a running database by mocking the data layer interface. Presentation logic can be tested without running business logic. The seams between layers are explicit.
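A minimal sketch of that seam, using a typing.Protocol as the data-layer interface; the repository and the business rule here are hypothetical:

```python
from typing import Protocol

class UserRepository(Protocol):
    # The data-layer seam the application layer depends on.
    def get_balance(self, user_id: str) -> int: ...

def can_withdraw(repo: UserRepository, user_id: str, amount: int) -> bool:
    # Business logic in the application layer: no database required.
    return amount > 0 and repo.get_balance(user_id) >= amount

# In a test, the data layer is a trivial fake instead of a database.
class FakeRepo:
    def get_balance(self, user_id: str) -> int:
        return 100

assert can_withdraw(FakeRepo(), "u-1", 50)
assert not can_withdraw(FakeRepo(), "u-1", 500)
```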

The cost is that the boundary between layers can become ceremonial. In practice, many applications end up with thin application layers that mostly translate between the presentation format and the database model, with no meaningful business logic to protect. At that point, the layering is adding abstraction without adding clarity.

The pattern is worth using when the business logic is genuinely complex, when testability of that logic matters, and when the team is large enough that clear ownership of each layer is useful. It is worth questioning when the application is primarily CRUD operations and the layer boundaries exist because that is how the framework example was written.


What does consistent hashing solve that naive sharding does not?

In naive sharding, records are assigned to shards by a formula like shard = hash(key) % N, where N is the number of shards. When a shard is added (N becomes N+1), the formula changes, and almost every record maps to a different shard. The result is a massive data migration every time the number of shards changes.

Consistent hashing places both shards and keys on a ring. Each key is assigned to the nearest shard clockwise on the ring. When a shard is added, it takes over a segment of the ring from its neighbor. Only the keys in that segment need to move. When a shard is removed, its segment is absorbed by its neighbor. Only the keys in the removed shard's segment need to move.

The practical consequence is that adding or removing a shard moves a fraction of the data proportional to 1/N rather than nearly all of it. This makes scaling the cluster far less disruptive. Distributed caches (Memcached clusters, distributed Redis) use consistent hashing for this reason. Cassandra uses a token ring that embodies the same idea.

The edge case worth knowing: consistent hashing by itself can create uneven load if the shards land unevenly on the ring. Virtual nodes (each physical shard gets multiple positions on the ring) solve this by distributing the ring positions more evenly and ensuring that removing one physical node spreads its load across many neighbors rather than concentrating it on one.
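A sketch of the ring with virtual nodes included, since in practice they come as a pair; the replica count and hash choice are illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of a hash ring with virtual nodes. Each physical node
    gets `replicas` positions on the ring so load spreads evenly and
    removing a node redistributes its keys across many neighbors."""

    def __init__(self, nodes: list[str], replicas: int = 100):
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # Walk clockwise: the first ring position at or after the
        # key's hash owns the key; wrap around at the end of the ring.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user-42"))  # adding cache-d later moves ~1/4 of keys
```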


What does leader-follower replication actually protect you from?

Leader-follower (or primary-replica) replication is the default durability and availability pattern for most relational databases. One node accepts all writes (the leader). Other nodes receive a stream of changes and apply them (the followers). Reads can be served from followers, which distributes read load. If the leader fails, a follower can be promoted to leader.

The subtleties:

Replication lag is real. Followers apply changes asynchronously in most configurations. A read from a follower may return data that is seconds or minutes behind the leader. For reads that must be current (your own profile data immediately after you updated it), sending the read to the leader or using read-after-write consistency is necessary.
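One common shape of the fix is to pin a user's reads to the leader for a short window after they write. A sketch, with the window as an illustrative constant you would tune to observed lag:

```python
import time

# Track when each user last wrote (illustrative; this could live in
# the user's session). Reads inside the window go to the leader so
# the user always sees their own writes despite replication lag.
RECENT_WRITE_WINDOW = 5.0  # seconds
last_write_at: dict[str, float] = {}

def record_write(user_id: str) -> None:
    last_write_at[user_id] = time.monotonic()

def choose_replica(user_id: str) -> str:
    since_write = time.monotonic() - last_write_at.get(user_id, float("-inf"))
    return "leader" if since_write < RECENT_WRITE_WINDOW else "follower"
```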

Failover is not instant. When the leader fails, a promotion process runs: a follower is selected as the new leader, the other followers are pointed at it, and clients must reconnect. This process takes seconds to minutes depending on the configuration. During that window, writes are unavailable. Building an application that assumes zero-downtime failover without testing it is a mistake.

Split-brain is the failure mode to design against. If a network partition separates the leader from the followers and both sides believe themselves to be the leader, two nodes may accept writes simultaneously. When the partition heals, the writes conflict. Most database clusters handle this with quorum writes (a write must be acknowledged by a majority of nodes to succeed) or by using an external consensus system (like ZooKeeper or etcd) to manage leader election.
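A sketch of the quorum idea, with a hypothetical per-node write call; the property that matters is that a minority partition can never assemble a majority of acknowledgments:

```python
from typing import Callable

def quorum_write(nodes: list[str],
                 write_to_node: Callable[[str], bool]) -> bool:
    """Succeed only if a majority of nodes acknowledge the write.
    Two sides of a network split cannot both hold a majority, so
    they cannot both accept conflicting writes."""
    needed = len(nodes) // 2 + 1
    acks = 0
    for node in nodes:
        try:
            if write_to_node(node):  # hypothetical per-node write call
                acks += 1
        except Exception:
            pass  # an unreachable node counts as no acknowledgment
    return acks >= needed
```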


What is service discovery, and why does it become necessary?

In a small system, services find each other by hardcoded address. Service A calls Service B at b.internal:8080. This works until Service B has more than one instance, or until instances are replaced with different IPs, or until the cluster runs on infrastructure where IPs are not stable.

Service discovery is the mechanism by which services find each other dynamically. There are two models:

Client-side discovery: each service queries a registry (Consul, Eureka) to get the list of healthy instances of the target service and picks one using a client-side load balancing algorithm. The client has more control over routing but must implement load balancing logic.
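A sketch of the client side of this model, with the registry lookup stubbed out; a real client would query Consul or Eureka over HTTP and cache the result briefly:

```python
import random

def healthy_instances(service: str) -> list[str]:
    # Hypothetical registry lookup standing in for a Consul/Eureka query.
    return {"inventory": ["10.0.1.5:8080", "10.0.1.9:8080"]}[service]

def pick_instance(service: str) -> str:
    # Client-side load balancing; random choice is the simplest
    # policy, round-robin or least-connections are common upgrades.
    return random.choice(healthy_instances(service))

print(pick_instance("inventory"))
```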

Server-side discovery: the client sends requests to a load balancer or API gateway, which queries the registry and routes the request to a healthy instance. The client knows nothing about the downstream topology. Kubernetes uses this model: services have a stable DNS name and the cluster routes traffic to healthy pods.

The registry must itself be highly available, since a registry failure means services cannot locate each other. Consul and etcd are designed for this. Both use Raft consensus to remain consistent across failures.

The practical point: service discovery becomes necessary the moment you run more than one instance of any service, or deploy to infrastructure where IPs change (containers, auto-scaling groups). In Kubernetes, it is built in. In a less managed environment, it requires deliberate setup.


What is the pattern that ties these together?

Each of these patterns is a solution to a specific problem. Microservices solve team coupling and independent deployment. Event-driven architecture solves integration coupling and burst absorption. CQRS solves the mismatch between write and read models. Sharding solves write throughput and storage limits. Consistent hashing solves the cost of resharding. Leader-follower replication solves read scaling and basic availability. Service discovery solves service location at scale.

The failure mode I keep encountering in post-mortems and design retrospectives is not that a pattern was applied incorrectly. It is that a pattern was applied before the problem it solves was actually present. The team read that microservices scale better and split the monolith before they had ten engineers or ten million users. The team added CQRS because the pattern was familiar, not because reads and writes had different scaling requirements. The complexity arrived immediately. The benefit did not.

The question worth asking for each pattern is not "is this a good pattern?" but "is this the problem I have right now?" The answer to the first question is almost always yes. The answer to the second question determines whether the pattern helps.
