Design pattern to scale relational database horizontally

Database scalability is one of the critical factor in designing scalable web application. All the four major cloud vendors (Amazon, Microsoft, Google and Oracle) offers relational database as service which abstracts several complexities for developers to setup and manage database servers that helps developers focus on core application development. Even with cloud vendors’ service based offering scaling database horizontally (or scale out) still a challenge which developers can resolve at application level. In this post I would like to talk about the challenges and possible solution for same.

Problem

Most relational databases can have only one master database but several read replicas. One master database server means all the write operations can go in to single database server. For example AWS Aurora MySQL can have up to 15 read replicas but it can have only one master database.

In a traditional architecture the primary application will execute SELECT/INSERT/UPDATE/DELETE operations against master database and read replicas are used for analytics and business intelligence purpose.

Traditional.png

With this model the scalability of the primary application is limited to master database server capacity.

Why? CAP theorem

CAP theorem states that it is impossible for a distributed database to simultaneously provide more than two out of the following three guarantees.

Consistency – Every read receives the most recent write or an error
Availability – Every request receives a (non-error) response – without guarantee that it contains the most recent write
Partition tolerance – The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

Most relational databases (MySQL, Oracle, SQL Server, PostgreSQL) supports only Consistency and Availability hence its hard to support true horizontal scalability.

Solution

Have one master database for write operation that maintains consistency (ACID) and use slave/read replica databases for read-only operations.

In order to utilize the read replica efficiently at the application level separate the read only vs writable transactions. Route all writable transactions to master database and read-only transactions to read-replicas. Most applications I have worked so far have 80% read only transactions and about 20% writable transactions; I assume majority of the web applications might fall in similar category.

Read-Replica.png

Spring framework provides option to route to appropriate data source for read vs write transaction.
You can find a good example from the link below.
http://fedulov.website/2015/10/14/dynamic-datasource-routing-with-spring/

Advantages of read replica

  1. Reduces load to master database instance
  2. Table indexes can be added only in read replica that helps master database to perform INSERT/UPDATE operation faster without index.
  3. If a SQL statement locks the table during table scan the master database read/write operation will not be impacted.
  4. Read replica can be promoted as master database if the master database becomes unavailable.

In a Nutshell

Read-write transactions -> consume master (aka source) database

Read-only transactions -> consume slave (aka read replica) database

References

https://www.facebook.com/note.php?note_id=23844338919

https://en.wikipedia.org/wiki/CAP_theorem

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: