Polyglot Persistence
Neal Ford coined the term “Polyglot Programming” in 2006, based on the idea that different programming languages are suited to different kinds of problems. The same concept has more recently been applied to databases: using multiple database engines and paradigms to suit different workloads, so that an organization can take advantage of each engine’s strengths, is referred to as Polyglot Persistence (Google Scholar shows me a post from Scott Leberknight’s blog in 2008 as one of the earliest references to the term). Relational databases are great at reducing duplication - by normalizing schemas, repeated values are minimized and storage demands are reduced. However, getting information back out of these databases requires JOINs to bring together data from multiple tables, which makes relational databases more CPU-intensive at read time. NoSQL databases are great at horizontal scalability - they run on clusters of machines by dividing the workload, and they eliminate JOINs by storing related data together, at the expense of storage.
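The trade-off described above can be made concrete with a small sketch. This is an illustrative example, not taken from any particular system: the relational half uses Python’s built-in sqlite3 module with a normalized customers/orders schema that must be JOINed at read time, while the document-style half simply nests the related data together so no JOIN is needed.

```python
import sqlite3

# Normalized relational model: repeated values are factored into
# separate tables, and a JOIN reassembles them at query time
# (CPU cost paid on every read).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")
rows = conn.execute("""
    SELECT c.name, o.id, o.total
    FROM customers c JOIN orders o ON o.customer_id = c.id
""").fetchall()

# Document-style model: related data is stored together, so reads
# need no JOIN -- but the customer's name would be duplicated in
# every document that mentions it (storage cost instead of CPU cost).
customer_doc = {
    "customer": {"id": 1, "name": "Ada"},
    "orders": [{"id": 10, "total": 25.0}, {"id": 11, "total": 40.0}],
}
```

The two halves hold the same information; which shape is cheaper depends on whether the workload is read-heavy (favoring the embedded document) or storage-constrained and update-heavy (favoring the normalized tables).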
Some applications use a combination of databases by design - Neo4j for building social-media recommendation engines, Redis for managing sessions within applications, Oracle for storing financial transactions, and MongoDB for product catalogs whose attributes vary across categories of products. At times an application is tied to a particular primary database (e.g., WordPress’ dependency on MySQL), but plugins can store data in other types of data stores to take advantage of certain characteristics of other database engines. An ERP system might use Oracle or MS SQL Server as its primary data store, while an in-memory data store provides real-time analytics. Traditional ETL tools process changes as a batch, transforming data from a source database into a format that the destination database can read. Change Data Capture (CDC) tools instead migrate changes continuously from one data store to another, enabling near-real-time analysis of the data.
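The CDC idea can be sketched in a few lines. This is a minimal illustration under assumed names (the stores are plain dicts, and `write`, `apply_change`, and `change_log` are hypothetical, not the API of any real CDC tool such as Debezium): every write to the primary store also emits an ordered change event, and a consumer replays those events against a downstream store so it stays continuously in sync rather than waiting for a nightly ETL batch.

```python
primary_store = {}    # stands in for the transactional database
analytics_store = {}  # stands in for the in-memory analytics store
change_log = []       # the CDC stream: an ordered list of change events

def write(key, value):
    """Write to the primary store and emit a change event."""
    primary_store[key] = value
    change_log.append(("upsert", key, value))

def delete(key):
    """Delete from the primary store and emit a change event."""
    primary_store.pop(key, None)
    change_log.append(("delete", key, None))

def apply_change(event, store):
    """Replay one change event against a downstream store."""
    op, key, value = event
    if op == "upsert":
        store[key] = value
    elif op == "delete":
        store.pop(key, None)

write("order:10", {"total": 25.0})
write("order:10", {"total": 30.0})  # an update yields a second event
write("order:11", {"total": 40.0})
delete("order:11")

# The CDC consumer: applying events in order reproduces the
# primary store's current state downstream.
for event in change_log:
    apply_change(event, analytics_store)
```

Because the events are replayed in order, the analytics store converges on the same state as the primary; a real CDC pipeline adds durability, ordering guarantees across restarts, and schema handling on top of this basic loop.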