[System Design] Data Partitioning

Justification: after a certain scale point, cheaper and feaisble to scale horizontally rather than vertically

1. Partitioning Methods

a. Horizontal Partitioning(Data Sharding)

e.g.

Table 1: ZIP code < 10000
Table 2: ZIP code > 10000

problems:

hard to pick value range -> unbalanced server

b. Vertical Partitioning

e.g.

Profile： Server1
Photo:   Server2

Pros:

easy to implement Cons:
for user growth, needs to further parition for a specific server

c. Directory-Based Partitioning

Lookup service: to find out where a particular data entity resides,
This means the GetDatabaseFor() method actually
hits a web service or a database that stores/returns the mapping between each entity key and the database server it resides on.: [Entity Key -> Database Server]

This loosely coupled approach means you can perform tasks like adding servers to the database pool or change your partitioning scheme without having to impact your application.

Partitioning Criteria

a. Key or Hash-based Partitioning: problem: rehash is problematic, as the hash function may change when server added.

b. List partition: c. round-robin: d. composite: combine all above ways to partition

Common Problem of data partitioning

operation on diff table/row -> diff server a. Join and denormalization Join across server: solved by denormalization But by denormalization -> data inconsistency problem

b. Referntial Integrity: Foreign key constraint, Most RDBMS do not support foreign keys constraints across databases on different database servers. -> application code to enforce referntial integrity

c. Rebalance:

Indexing

Example： A library Catalog