Design pattern to scale relational database horizontally
December 11, 2017 Leave a comment
Database scalability is one of the critical factor in designing scalable web application. All the four major cloud vendors (Amazon, Microsoft, Google and Oracle) offers relational database as service which abstracts several complexities for developers to setup and manage database servers that helps developers focus on core application development. Even with cloud vendors’ service based offering scaling database horizontally (or scale out) still a challenge which developers can resolve at application level. In this post I would like to talk about the challenges and possible solution for same.
Problem
Most relational databases can have only one master database but several read replicas. One master database server means all the write operations can go in to single database server. For example AWS Aurora MySQL can have up to 15 read replicas but it can have only one master database.
In a traditional architecture the primary application will execute SELECT/INSERT/UPDATE/DELETE operations against master database and read replicas are used for analytics and business intelligence purpose.
With this model the scalability of the primary application is limited to master database server capacity.
Why? CAP theorem
CAP theorem states that it is impossible for a distributed database to simultaneously provide more than two out of the following three guarantees.
Consistency – Every read receives the most recent write or an error
Availability – Every request receives a (non-error) response – without guarantee that it contains the most recent write
Partition tolerance – The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
Most relational databases (MySQL, Oracle, SQL Server, PostgreSQL) supports only Consistency and Availability hence its hard to support true horizontal scalability.
Solution
Have one master database for write operation that maintains consistency (ACID) and use slave/read replica databases for read-only operations.
In order to utilize the read replica efficiently at the application level separate the read only vs writable transactions. Route all writable transactions to master database and read-only transactions to read-replicas. Most applications I have worked so far have 80% read only transactions and about 20% writable transactions; I assume majority of the web applications might fall in similar category.
Spring framework provides option to route to appropriate data source for read vs write transaction.
You can find a good example from the link below.
http://fedulov.website/2015/10/14/dynamic-datasource-routing-with-spring/
Advantages of read replica
- Reduces load to master database instance
- Table indexes can be added only in read replica that helps master database to perform INSERT/UPDATE operation faster without index.
- If a SQL statement locks the table during table scan the master database read/write operation will not be impacted.
- Read replica can be promoted as master database if the master database becomes unavailable.
In a Nutshell
Read-write transactions -> consume master (aka source) database
Read-only transactions -> consume slave (aka read replica) database