UUID for better data integrity
Serhii Potapov May 10, 2021 #database #webLet's talk a about UUID today. UUID stays for Universally Unique Identifier, it's just 128 bits that are used to encode ID of some entity. In relation databases like PostgreSQL or MySQL it's usually more common to use sequentially incremented integer for IDs, rather than UUID.
I often observe, teams turn to UUID only when they run into some scenarios, where IDs comes from from outside (a UI client or another service) and it can not be generated within DB to be sequential. Another use case for UUID is to hide real number of database entries in REST URLs.
However, they is another strong use case for UUID.
Problem
Let's say there is a simple DB schema of 3 tables: resources
, admins
, and users
. Table resources
has a foreign key admin_id
that refers to admins.id
.
+-------------------+ +----------------+ +----------------+
| resources | | admins | | users |
+-------------------+ +----------------+ +----------------+
| id INT (PK) | +-->| id INT (PK) | | id INT (PK) |
+-------------------+ | +----------------+ +----------------+
| name VARCHAR | | | name VARCHAR | | name VARCHAR |
+-------------------+ | +----------------+ +----------------+
| admin_id INT (FK) |--+
+-------------------+
Let's say we have 2 admins: Donald and Mickey.
id | name |
---|---|
1 | Donald |
2 | Mickey |
And 3 users: Billy, Willy, Dilly.
id | name |
---|---|
1 | Billy |
2 | Willy |
3 | Dilly |
We want to add a new resource Pancake
but our application has a bug: by mistake
we're adding user's id as admin_id
, instead of admin's id:
Resource.create!(name: , admin_id: willy.id)
Because willy.id
is 2, the database's FK constraint will check presence of record with ID=2 in admins
table, it will find the Mickey record, so the Pancake
resource will be successfully created.
But it's obviously not want we want. We want the DB to yell at us loudly, pointing to the error in our application logic.
UUID to rescue
This kind of scenarios could be eliminated if we use UUID type to represent IDs, instead of sequential integers. The key difference here is that each single instance of UUID is going to be unique globally over all tables, not only within a scope of one single table.
With UUID for primary and foreign keys, the content of our tables may look like this:
admins:
id | name |
---|---|
ef16123e-b1ab-11eb-8529-0242ac130003 | Donald |
d273a69d-5904-47f2-bdc6-86e34120e0a2 | Mickey |
users:
id | name |
---|---|
68295a9c-6c4e-4a34-820f-757aad0efac2 | Billy |
dc619aa3-44e0-40aa-a63c-6c7a1122683a | Willy |
1e1fcc91-8b6b-492a-9bc1-0e969a296681 | Dilly |
Now if our application tries to create a resource with incorrect reference to admins
table:
Resource.create!(name: , admin_id: willy.id)
Our DB will spit out an error, telling us that FK constraint is not satisfied, because there
is no record with id = dc619aa3-44e0-40aa-a63c-6c7a1122683a
in table admins
.
Afterword
UUID takes a twice more disk space (128 bits for UUID VS 64 bits for BIG INT), but I think it's a very good bargain, considering the benefits it brings. However, you can still strengthen data integrity even using sequential integers.
One technique is to allocate a big enough range of possible values for each single table. For example:
1..1_00_000_000
- forusers
,1_000_000_001..2_000_000_000
- foradmins
2_000_000_001..3_000_000_000
- forresources
Ideally ID values of different tables would never overlap.
The very first admin record would have id = 1_000_000_001
instead of 1
.
Another technique, that can be applied only in the languages with advanced type system, is to use wrapper types for each single id. For example in Rust it would look like:
;
;
;
In this case the compiler would never allow us to use UserId
, where AdminId
is expected.
Thanks you for reading.