Things I Wished More Developers Knew About Databases

  1. You are lucky if 99.999% of the time network is not a problem.
  2. ACID has many meanings.
  3. Each database has different consistency and isolation capabilities.
  4. Optimistic locking is an option when you can’t hold a lock.
  5. There are anomalies other than dirty reads and data loss.
  6. My database and I don’t always agree on ordering.
  7. Application-level sharding can live outside the application.
  8. AUTOINCREMENT’ing can be harmful.
  9. Stale data can be useful and lock-free.
  10. Clock skews happen between any clock sources.
  11. Latency has many meanings.
  12. Evaluate performance requirements per transaction.
  13. Nested transactions can be harmful.
  14. Transactions shouldn’t maintain application state.
  15. Query planners can tell a lot about databases.
  16. Online migrations are complex but possible.
  17. Significant database growth introduces unpredictability.

You are lucky if 99.999% of the time network is not a problem.

ACID has many meanings.

An illustration of data loss if MongoDB crashes before it writes to the physical disk.

Each database has different consistency and isolation capabilities.

An overview of the existing concurrency models and the relationships between them.
  • Serializable (most strict, expensive): A serializable execution produces the same effect as some serial execution of those transactions. A serial execution is one in which each transaction executes to completion before the next transaction begins. One note about Serializable level is that it is often implemented as “snapshot isolation” (e.g. Oracle) due to differences in interpretation and “snapshot isolation” is not represented in the SQL standard.
  • Repeatable reads: Uncommitted reads in the current transaction are visible to the current transaction but changes made by other transactions (such as newly inserted rows) won’t be visible.
  • Read committed: Uncommitted reads are not visible to the transactions. Only committed writes are visible but the phantom reads may happen. If another transaction inserts and commits new rows, the current transaction can see them when querying.
  • Read uncommitted (least strict, cheap): Dirty reads are allowed, transactions can see not-yet-committed changes made by other transactions. In practice, this level could be useful to return approximate aggregates, such as COUNT(*) queries on a table.
An overview of concurrency anomalies at different isolation levels per database.

Optimistic locking is an option when you can’t hold a lock.

UPDATE products
SET name = 'Telegraph receiver', version = 2
WHERE id = 1 AND version = 1

There are anomalies other than dirty reads and data loss.

BEGIN tx1;                      BEGIN tx2;SELECT COUNT(*) 
FROM operators
WHERE oncall = true;
0 SELECT COUNT(*)
FROM operators
WHERE oncall = TRUE;
0
UPDATE operators UPDATE operators
SET oncall = TRUE SET oncall = TRUE
WHERE userId = 4; WHERE userId = 2;
COMMIT tx1; COMMIT tx2;

My database and I don’t always agree on ordering.

result1 = T1() // results are actually promises
result2 = T2()

Application-level sharding can live outside the application.

An example architecture where application servers are decoupled from the sharding service..

AUTOINCREMENT’ing can be harmful.

  • In distributed database systems, auto-incrementing is a hard problem. A global lock would be needed to be able to generate an ID. If you can generate a UUID instead, it would not require any collaboration between database nodes. Auto-incrementing with locks may introduce contention and may significantly downgrade the performance for insertions in distributed situations. Some databases like MySQL may require specific configuration and more attention to get things right in master-master replication. The configuration is easy to mess up and can lead to write outages.
  • Some databases have partitioning algorithms based on primary keys. Sequential IDs may cause unpredictable hotspots and may overwhelm some partitions while others stay idle.
  • The fastest way to access to a row in a database is by its primary key. If you have better ways to identify records, sequential IDs may make the most significant column in tables a meaningless value. Please pick a globally unique natural primary key (e.g. a username) where possible.

Stale data can be useful and lock-free.

Application server reads 5-second old stale data from local replica even though the latest version is available on the other side of the Pacific Ocean.

Clock skews happen between any clock sources.

  • TrueTime uses two different sources: GPS and atomic clocks. These clocks have different fail modes, hence using both of them is increasing the reliability.
  • TrueTime has an unconventional API. It returns the time as an interval. The time could be in fact anywhere between the lower bound and the upper bound. Google’s distributed database Spanner then can wait until it is certain the current time is beyond a particular time. This method adds some latency to the system especially when the uncertainty advertised by masters are high but provides correctness even in a globally distributed situation.
Spanner components use TrueTime where TT.now() returns an interval, so Spanner can inject sleeps to ensure the current time has passed a particular timestamp.

Latency has many meanings.

Evaluate performance requirements per transaction.

  • Write throughput and latency when inserting a new row in to table X (with 50M rows) with given constraints and populating rows in related tables.
  • Latency when querying the friends of friends of a user when average number of friends is 500.
  • Latency of retrieving the top 100 records for the user timeline when user is subscribed to 500 accounts which has X entries per hour.

Nested transactions can be harmful.

with newTransaction():
Accounts.create("609-543-222")
with newTransaction():
Accounts.create("775-988-322")
throw Rollback();
function newAccount(id string) {
with newTransaction():
Accounts.create(id)
}
function newAccount(id string) {
Accounts.create(id)
}
// In main application:with newTransaction():
// Read some data from database for configuration.
// Generate an ID from the ID service.
Accounts.create(id)
Uploads.create(id) // create upload queue for the user.

Transactions shouldn’t maintain application state.

var seq int64with newTransaction():
newSeq := atomic.Increment(&seq)
Entries.query(newSeq)
// Other operations...

Query planners can tell about databases.

SELECT * FROM articles where author = "rakyll" order by title;
  • Full table scan: We can go through every entry on the table and return the articles where author name is matching, then order.
  • Index scan: We can use an index to find the matching IDs, retrieve those rows and then order.

Online migrations are complex but possible.

  • Start doing dual writes to both databases. At this stage, new database won’t have all the data but will start seeing the new ones. Once you are confident about this step, you can move on to the second.
  • Start enabling the read path to use both databases.
  • Use the new database primarily for reads and writes.
  • Stop writing to the old database although keep reading from the old database. At this point, new database still doesn’t have all the new data and you might need to fallback to the old database for old records.
  • At this point, old database is read-only. Backfill the new database with the missing data from the old database. Once migration is complete, all the read and write paths can use the new database and the old database can be removed from your system.

Significant database growth introduces unpredictability.

--

--

--

See rakyll.org for more.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Software Engineering vs Software Development

Let’s Not Argue About Technical Debt

What developers do all day long

How to Reduce Software Development Costs in 2021

Presto Federated Queries

Unlock The Power of Community and Co-Creation in Customer Care

Don't Tell Me, Show Me

A Gentle Introduction to Stream Processing

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Jaana Dogan

Jaana Dogan

See rakyll.org for more.

More from Medium

Overview of Caching, Distributed Cache, Caching Patterns & Techniques

On Building Scalable Systems

Distributed Rate-Limiting Algorithms

Linearizability And/Vs Serializability in Distributed Databases