'Tis the season for "scary" things, so in the spirit of the fun, I present the five scariest data types to see in the design of your SQL Server tables.
These decisions are much more than a "what's new" or personal preference. Poor data type choices have significant impact on storage, not just on disk, but in backups and in memory. At scale, poor data type changes can limit performance in a very noticeable way, and reversing these poor decisions down the line is time-consuming and requires an outage. Making these choices right from the start is the best way to prevent a performance - or data quality - scare down the line.
Here's my five scariest SQL Server data types, ordered by degree of fright ascending:
5. Datetime – This one has not only been unnecessary for many years, with smalldatetime a better choice in many cases, it's been replaced recently by the superior DateTime2 and Date data types. It might be an old habit, but Datetime just isn't needed any more, there are far superior options out there for every use. Starting with SQL 2008, there is no good reason to use the datetime data type.
- Need to store just mm/dd/yyyy? Prior to SQL 2008, the smalldatetime data type was appropriate. SQL 2008 and later? There's a date data type now that just stores mm/dd/yyyy at less than half the size of datetime.
- Need to store mm/dd/yyyy hh:mm:ss? Prior to SQL 2008, the smalldatetime data type was appropriate. SQL 2008 and later? Datetime2(0) would be slightly larger per row but just fine for this purpose as well, and is what you should use considering recent guidance.
- Need to store mm/dd/yyyy hh:mm:ss.sss? Datetime was your only choice prior, but DateTime2 is now far superior, with far more fractional seconds precision than datetime. Did you know that the best that datetime ever offered was rounded precision to three thousands of a second? Datetime2 can supply precision up to ten millionth of a second (no rounding) for the same cost to store per row.
- Need to store timezone-aware date/time? You can do that now with datetimeoffset, which has all the features of datetime2 plus the ability to store a timezone offset. Much easier than trying to recreate the wheel of timezone logic yourself.
4. Varchar(max) – I've described this before as the data type for two specific cases:
I want to:
- Label myself as a "lazy developer" or
- I really do expect to store more than 8000 characters in a single field in the database.
- Unless you want to be in one of those situations, you should avoid this data type, lest I default to the first explanation above. Too often "(max)" means "I don't feel like estimating". Let's face it, 8000 characters is a lot of text. That's three to four typical news articles.
- Need to store xml data? There's a datatype much more suited for xml called xml. It stores the data natively and even allows for special xml indexes that you might find useful.
- Need to store json data? There's now (in SQL 2016+) a datatype much more suited for json called json.
- An issue endemic to (max) data types - any variable declared of this type automatically spills into tempdb, increasing activity there. So yes, it can be a performance issue. An older issue where you cannot perform ONLINE index maintenance for (max) data types was resolved in SQL 2012.
3. Nvarchar(max) – This is the default data type for any field in EntityFramework, for example, so this is the data type that screams:
- I'm okay with defaults because, again, I'm a "lazy developer" or
- I might not know what the "N" in "Nvarchar" means
- The n data types are for the unicode library - a character set that allows you to store characters needed for working international - but maybe not what you think. The typical varchar library actually stores most characters needed for many western languages, including German, Italian, etc.
- You must use unicode when you will be storing multiple langauges, or languages that do not have a Windows codeset, for example, a large number of Arabic languages. These would be stored as "?" in a varchar. Many Asian languages may also require you to store text in unicode data types.
- In others words, unless you will be storing unicode language(s) text with native users using native language keyboards, you do NOT need to use nvarchar.
- Internally, SQL Server stores many objects as unicode text because SQL Server is an internationally-released product. This doesn't mean it's a best-practice. Unless your application is also international, don’t use unicode data types.
- Same as above, any variable declared of this type automatically spills into tempdb, increasing activity there. So yes, it can be a performance issue.
2. Text or ntext – These data types are deprecated data types that were replaced by the (max) data types (see above)for varchar and nvarchar. You should absolutely never be using them. They were made obsolete in SQL 2005.
- The scariest thing about seeing any of these data types in modern applications is that there are middle school kids who were born after they were replaced. Get with it, developers!
- Managing a legacy application? You'll want to upgrade these data types, there is a major index maintenance feature called "online" that cannot be performed on these ancient data types: text, ntext and image.
- These data types belong in a museum!
1. Float (or Real) – I've written a series of blog posts talking about the woes to those to use float/real data types to store data with numbers to the right of the decimal point.
These data types are also defaults for some code-first architectures, so seeing this data type would tell me:
- I don't know what approximated data types are all about or
- I can't figure out how many values to the right of the decimal I need, and want to screw my data up as a result
- Ever heard this? "I'm comparing these two numbers, and they look the same, but they're not comparing successfully!" or "there must be a bug in SQL when you compare floats with floats... it's like it doesn't recognize it's the same number!" That's not a bug. That's your bad data type decision.
- Float (and real, which is just a subtype) is an approximate data type and in multiple cases has been responsible for what surfaces as "rounding errors" or "off by a cent" or "suspect math." An extreme example: try storing the value 123456789.123456789 in float. You'll silently only store 123456789.123457. Silent, of course, until a vice president comes screaming about his reports being off.
- The scariest thing about this data type is just how quietly it rounds your mortgage amortizations, your prices per square foot, or your instrumentation measurements. If you were to specify a proper decimal data type, this type of silent rounding is not possible. You should definitely use decimal when needing decimal-type numeric data, and make a habit of never using float, even if it is the default data type for some applications.
By the way, numeric is an alias for decimal, pick one or the other, and don’t mix.
More reading on the dangers of float:
- Float Like An Approximation, Stings Like a Bee
- Real Heavyweights: Float vs Decimal
- Fresh New Horror: PK on a Float Data Type
Holy candy corn, William, you forgot uniqueidentifier! Have you gone mad? Have you forgotten about your many rants against uniqueidentifier, especially as the key of a clustered index? I have not forgotten, and uniqueidentifier can be as efficient as a rubber hose for police work. Alone, uniqueidentifier is simply a bulky, bloated, marshmallowy way to get a unique value. It is deployed appropriately in a myriad of places in Microsoft-developer products (ex: SharePoint, SQL Server Integration Services). It is no more evil than the lion stalking in the Serengeti, for fragmenting tables and bloating rowsizes is simply in the uniqueidentifier's nature. When tamed appropriately, a uniqueidentifier field can serve your three-ring circus of an application database well, but when allowed to lead the parade... well, you see.
Do you see how many different metaphors I used in the previous paragraph? Scary, right? That kind of metaphor fragmentation is just what using a uniqueidentifier as a clustered index key can do to your table, scattering the logical order all over the place. Avoid that!
Have questions about database design? Want to talk to some of the folks who have not only passed the certification exams, but helped write them? Want to talk to other developers about methods to avoid bad default data types, store complex data or deal with built-in date time zone offsets? Want to share your own data type horror stories? We have a whole team of SQL Server DBA's and Developers who would love to hear them.