Good Clustered Index Design Is All About NUNS
Let's talk a little bit about table design in SQL Server.
One of the best guiding principles of good table design involves what is often the first and most important column of the table. Yet it is also a column that your end-users will never see.
Every table you see us design will begin with a column called ID, which is an integer data type created with the IDENTITY property. This means that the ID field will autoincrement with each new row: 1, 2, 3 4... etc. The TSQL code looks like this:
ID int not null IDENTITY(1,1)
I'll also create the Primary Key of the table on this field. This means that ID column is my surrogate key - it's an artificially-generated key in the table. This is opposed to my natural or business key, which could be the product number, patient number (but not SSN!), or other business-generated key.
, CONSTRAINT PK_TableName_ID PRIMARY KEY (ID)
The ID field will be unique in this table, but this doesn't mean we can't enforce uniqueness on another business key as well. For example, this TSQL code:
, CONSTRAINT IDX_UC_TableName UNIQUE (Product_Name)
We'll also never expose system-generated surrogate keys to the end-user in the application. The users will see the product number or patient number, but not the ID field, because this ID field has much more important jobs.
This ID column will also serve as the Clustered Index key of my table. This is ideal, because a good Clustered Index key will follow the helpful rule of NUNS:
Because the ID column is system-generated and never visible to users, they're not aware of it and cannot ask it to be changed. Products can be renamed, people can change names, but a surrogate key never will. And because it's non-changing, this ID field is ideal to be used as the reference for foreign keys in other tables.
The auto-incrementing field will never overlap or roll over. Using the int data type gives us more than two billion positive values. If we need more, we can use the bigint data type for more than nine quintilliion positive values.
The integer data type is only four bytes wide per row. Doesn't sound like it would matter, but it's half the size of an eight-character string and a quarter of the size of a uniqueidentifier (GUID). Narrow is desirable because the clustered index key is reference by foreign keys in other tables, but also because it is part of each nonclustered index. A bloated clustered key will bloat your nonclustered indexes too. At scale, that data type size advantage will really add up.
The auto-incrementing number will always ascend, making for tidy data inserts and efficient storage, not to mention a sensical integer to help developers know a row's insertion order with intuition.
So there you have it, a sensible database design principal to rule each table. At Sparkhound we apply this pattern to both relational and dimensional table designs.
Remember you can and should always enforce other uniqueness in the table's natural or business keys. A unique constraint is the way to do this (which becomes a unique nonclustered indexes). But the ID column is about so much more than just uniqueness, it's about table structure, storage efficiency, and relational integrity.
Is this interesting, want to learn more, or are having the same discussions in your own department? Sparkhound's senior developers actually think this database design stuff is fun! Get in touch with us so we can discuss how we can help your business achieve its goals through leadership and technology.
Information and material in our blog posts are provided "as is" with no warranties either expressed or implied. Each post is an individual expression of our Sparkies. Should you identify any such content that is harmful, malicious, sensitive or unnecessary, please contact firstname.lastname@example.org