Relational Database vs NoSQL Database

Author: @Alan Huang Date: Sep 24, 2020 性質: Tech Share

Database vs DBMS (database management system)

Relational Database (SQL DB)

SQL = structure query language

Database → Tables (schema) → Rows

very strict data form → schema
multiple table → relations
SQL → capable of querying relations data → join
- benefit:
  - normalize and distribute data
- drawback:
  - join data take longer operation time

Types of Relations

One to One
- unique foreign key
- e.g.
  - student ←> contact
Many to One
- foreign key (allow duplicate)
- e.g.
  - user → product
One to Many
Many to Many
- intermediate table
- e.g.
  - students <-> course
Unidirectional vs Bidirectional
not affect the mapping but will make difference on how you can access your data.
Difference Between One-to-Many, Many-to-One and Many-to-Many?

NoSQL Database

Database → Collections → Documents

No schema → flexible
- downside:
  - not sure data format
No relation → merged data manually (kind of
- very collection contain all the data you need
  - less merging data → super fast!
- duplicated data
  - great for lots of read, not many write (vice versa, since it's normalized and distributed data into multiple tables, it's better for lots write application)

collection -> document
// User
{
	username: 'alan',
	phone: {
		main: 09091234567,
		sub:  09765432111
	},
	email: {
		main: 'main@gamil.com',
		school: 'schoool@nthu.edu.tw'
	},
	school: {
		name: 'nthu',
		building: {
			CS: 'delta',
			room: {
				124: {...},
				302: {...}
			},
		...
	}
}

Compare SQL and NoSQL

Schema vs. no Schema
Relations / distributed across multiple tables
- SQL
  - (y) better for frequently changing data
  - (n) join / query large data is slow
- NoSQL
  - Data is merged / nested in few collection
  - (y) no join, query is very fast
  - (n) one change need to modify multiple collection, if application has lots of write, it may not be a good choice.
Scaling
- Horizontal: add more server / Vertical: improve server capability (have limitatoin)
- SQL hard to horizontal scaling (since relations), possible to vertical scaling
- NoSQL design to horizontal scaling, possible.
limitation to large read / write query
- SQL
  - Read → complex join take more times.
  - Write → (maybe) lock a table when writing (ACID).
- NoSQL
  - great for mass (simple) read & write request

Choose Database

SQL
- need clear schema
- lots of relation
- data change frequently
NoSQL
- read a lot
- write a lot but no update lots of collection
- better to scaling

SQL vs NoSQL or MySQL vs MongoDB

Example

適合 Relational DataBase
1. 帳戶扣款 → 同步、正確，追求 Consistancy，不追求 Availablity
適合 NoSQL Database
1. FB 按讚 → 快速回應、不需要即時正確，追求 Availablity，不強求 Consistancy

ACID

Atomicity (原子性 → can't be split) : transaction中的所有操作，全部完成或者全部不完成，不會結束在中間某個環節。在執行過程中發生錯誤，會被回滾（Rollback）到事務開始前的狀態。即，事務不可分割、不可約簡。
Consistency (一致性)：在事務開始之前和事務結束以後，資料庫的完整性沒有被破壞。
- Consistency in Data
  資料庫的完整性，由使用者定義。例如，轉帳扣款成功後發生錯誤，轉入帳戶沒有增加金額，則資料庫完整性被破壞。
  - Define by the user
  - Atomicity
  - Isolation
- Consistency in reads
  - If a transaction committed a change will a new transaction immediately see the change ?
    - Lag to update replicas of a (horizontal scaling)
    - Every database suffers from this.
  - Eventually consistency
Isolation (隔離性)：資料庫允許多個 transactions 同時對其資料進行操作，但也同時確保這些 transaction 的交叉執行，不會導致數據的不一致
- Read Phenomena
  - Dirty read
    Any transaction can read data before other transaction commit the change.
    - e.g.
      init: cost = 5
      transaction A: read cost (5)
      transaction A: cost = cost + 1 (6)
      transaction B: read cost (6)
      transaction B: cost = cost + 1 (7)
      transaction A: fail and rollback. (cost = 5)
      transaction B: commit. (cost = 7)
      expect: cost = 6, but get cost = 7
    Solve by Read committed isolation.
  - Non-repeatable read
    Read same data, get different value.
    - e.g.
      init: cost = 5
      transaction A: read v1 = cost (5)
      transaction B: read cost (5)
      transaction B: cost = cost + 1 (6)
      transaction B: commit. (cost = 6)
      transaction A: read v2 = cost (6)
      transaction A: assert v1 == v2 fail.
    Solve by Repeatable read isolation.
  - Phantom read
    New data comes in.
    Solve by Serializable transaction.
- Isolation Level
  init var = 0
  transaction A: |———————> set var = 5 ——————-> commit change.
  transaction B: |—> read var ————————> read var —————————> read var
  - Read uncommitted
    No isolation, any change from outside is visible to the transaction.
    → Dirty read.
    Problem: transaction A may fail and rollback, then 2nd read would get a failed value.
  - Read committed
    Each query in a transaction only sees committed stuff.
  - Repeatable Read
    Each query in a transaction only sees committed updates at the beginning of transaction
  - Serializable
    Transaction is serialized. (One transaction a time, no parallel)
  Higher isolation level, lower performance
Durability (耐久性)：transaction 完成後，對資料的操作就是永久的，即便系統故障也不會丟失。

Relational Database ACID Transactions (Explained by Example)

BASE

Basically Available：保持服務基本可用
Soft State: 狀態可以有一段時間的不同步
Eventual consistency (最終一致性)：雖然有一段時間不同步，但追求最後結果一致

Reference

Awesome Articles

none found.