Decoding Data Consistency: Striking the Balance

Subham Kumar Sahoo
6 min readApr 30, 2024

Is the data on your databases consistent? And how much consistent it should be? Which consistency model best suits your use case? These are few questions we will be trying to get answers to in this blog. While consistency matters, the right amount ensures a robust and more efficient system and experience. We will be exploring different data consistency models along with the use cases.

While the whole world is running on data and people saying data is the new oil, maintaining the accuracy and integrity of data we have becomes quite important. Now a days as most of the modern applications often rely on distributed and scalable architectures, achieving data consistency becomes a nuanced challenge.

✨What is Data Consistency?

Data consistency refers to the degree to which distributed copies of data are synced across a system, ensuring that all the end users and applications receive reliable and expected results.

It revolves around an idea that based on the required use case all the users of the data agree upon a certain view of that data at a point in time. It is also an import component of CAP Theorem which stands for Consistency, Availability, Partition Tolerance. We can dive deep into CAP, ACID, BASE theorems etc. in upcoming blogs.

✨Data Consistency Models

In a distributed system with multiple compute, storage and especially multiple users, an ideal scenario will be to make the data fully consistent around all the systems or nodes i.e., every user will be have the same view of data as well as the most recent data. But maintaining this level of consistency will lead to lot of infrastructure overhead. Also it might increase the processing time leading to bad user experience.

These factors have led to creation of different models which best suits certain type of use-cases. Now let’s have a look into these models and discuss the respective use-cases.

1️⃣Strong Consistency

As mentioned above this is like an ideal consistency model which ensures that all the node show the same data at the same time. Which means that if there is a change in data by one user (at one node) then in no time other users reading the data (maybe at other nodes) should be seeing the updated data. There should not be any lag in the data sync across the system.

Use case

A basic example is a banking application. When a customer transfers money between accounts, it’s crucial that all copies of account balances across distributed nodes are immediately consistent to avoid any risk of overdrawing or double-spending.

Implementation

  • Synchronous replication of data among the database nodes. With this data is written to the primary node and other nodes at the same time.
  • Distributed locking mechanisms can be used to ensure that only one node at a time can perform a critical update, preventing inconsistencies.

Challenges

  • Achieving strong consistency often involves increased coordination and communication overhead between nodes, impacting system performance.
  • Synchronous replication can introduce latency in system processes.

2️⃣Eventual Consistency

This ensures that data across the nodes are consistent eventually. This leads to a small delay in the updates for other users but it reduces the data sync overhead.

Use case

This model should be followed when the real time update of data is not mandatory and a small delay will not affect the user experience. One example can be the Reaction counter on different social media sites and YouTube, like number of likes, dislikes, shares etc.

Implementation

  • Asynchronous replication of data across nodes. A good example of this is the read replicas of database nodes which enables faster reads. While write operation is only done to the primary node the change is synced across all read replicas eventually.
  • Generally the replication is done when the overhead on the system is low so that the primary processes are not affected by replication.

🏆Test your knowledge!

Which model will you use for a Stock Broking App❓

3️⃣Causal Consistency

This model ensures a partial order of events by ensuring that causally related events are updates and viewed across nodes in the same order they occur. Causally related events are those events which have dependency on each other.

This type of consistency follows an eventual consistency while maintaining the order of related events or data changes. Used where the order of events matter and real-time update at every node is not mandatory.

Use case

  • When people post messages in a chat room, the data there might be distributed across different nodes and there can be bit delay in receiving the message for different user. But when the messages get updated they are shown in the same order they were posted.
  • Let’s say there is an online multiplayer game. Here we can expect some delay in the actions but the actions by each player should happen in the same order they have been performed. For example, if player A shot B first followed by player C shooting A, then all the players should see the visual in same order only. But we need not follow order of events across separate game rooms as they are not related.

Implementation

  • Causal delivery mechanism can be used to ensure that messages or updates are delivered to nodes in an order consistent with their causality. Events that are causally related are delivered in the correct order. This requires coordination among nodes to exchange information about the causal dependencies of their respective events.
  • Vector clocks or a similar mechanism are often used to track the dependencies between events. Each event is associated with a vector clock that captures information about the events it depends on. Vector clocks provide a partial order of events, allowing nodes to determine the causal relationships between different operations.

4️⃣Read Your Writes Consistency

This ensures that if an user do some data changes then the subsequent read by the same user will show the updated data done by the most recent write. This consistency model is particularly important in user-facing applications where users expect immediate feedback and visibility into their own updates.

Use case

In a social media app, when an user post something or do some actions like reactions, comments etc. then the same is reflected on his/her timeline. Here the read happens automatically and the updated data is visible to the user. While there can be a bit delay in showing the same post or comment to other users.

Implementation

  • Write identification: Each write operation initiated by a client is associated with a unique identifier like version number or timestamp. This identifier helps in tracking the specific write that a subsequent read should reflect.
  • Consistent hashing techniques may be employed to direct read requests from a specific client to the same node where its writes were processed.

🚀AWS Certification study materials and guide: Click here

While these are the primary models, there are few more consistency models which I would like to mention briefly.

🔸Monotonic Reads Consistency

Here if a client reads a particular value then it should not read an older value in future. A timestamp or version number is maintained with each piece of data. Reads are only allowed to fetch data with a version equal to or greater than the last read.

🔸Monotonic Writes Consistency

Ensures that the order of write operations by a single process is preserved. It guarantees that writes from the same process maintain a consistent order. f a process performs write w1, then w2, then all processes observe w1 before w2. For this also an unique identifier like timestamp or version number is maintained.

These two models are related to Causal Consistency model which preserves the dependency between related events.

🔎While researching I came across this site which have a good flowchart and brief explanation for each model. Feel free to have a look: https://jepsen.io/consistency

🔖Conclusion

Implementing these consistency models often involves careful consideration of the trade-offs between consistency, availability, and partition tolerance, as well as the specific requirements of the application. The choice of a consistency model should align with the desired behavior for the given use case.

⚛️FREE!! End-to-End Cloud and Data Engineering projects — Click here

If you liked this blog, kindly do clap. Feel free to share it with others.

👉Follow me on medium for more such interesting contents and projects.

Check out my other articles and projects here : https://medium.com/@subham-sahoo/

Connect with me at LinkedIn. ✨

Thank you!!

--

--