In a recent code review with our DBA (Database Administrator), he pointed out that EF (Entity Framework) Code-First had used some less than desirable conventions when generating our database. This normally doesn’t happen when someone manually architects the database but when you let a tool create it for you sometimes best practices can slip through the cracks.

The issue we came across is that EF Code-First, by default, creates all strings as nvarchar(MAX) when it generates the database. Depending on your scenario, this may be exactly what you want; however, for us, we had no need for Unicode characters and (MAX) has some limitations (e.g. partitioning, indexing, etc.) that specifying a size (4000) will not have. Also, (MAX) is designed to store up to 2GB of data which would be used to store a book or a very large amount of searchable text. It is always a best practice to specify a size that makes sense to your business needs.

Here is a common POCO class that you may have in your project:

public class Person
{
public int Id { get; set; }
public string FirstName { get; set; }
public string MiddleInitial { get; set; }
public string LastName { get; set; }
}

With this code, FirstName, MiddleInitial, and LastName will all be nvarchar(MAX). First, no one’s name will ever need this many characters and this is just wasted space in your database. This may not seem like a problem now, but when you start looking at scalability and you have millions of rows, all of those bytes add up.

We can fix the size issue by adding lengths to each property:

public class Person
{
public int Id { get; set; }

[MaxLength(30)]
public string FirstName { get; set; }

[MaxLength(30)]
public string MiddleInitial { get; set; }


[MaxLength(30)]
public string LastName { get; set; }
}

So how do we change NVARCHARs to VARCHARs to remove the Unicode? EF provides an attribute for specifying or overriding the type:

public class Person
{
public int Id { get; set; }

[Column(TypeName = "varchar")]
[MaxLength(30)]
public string FirstName { get; set; }

[Column(TypeName = "varchar")]
[MaxLength(30)]
public string MiddleInitial { get; set; }


[Column(TypeName = "varchar")]
[MaxLength(30)]
public string LastName { get; set; }
}

This is will make our DBA’s happy; however, now we as developers won’t be happy as we have to override every single property. There has to be a way to do this globally, right? If you are NOT using EF 4.1 or higher, then you can use pluggable conventions to make this change: [1]

public class MakeAllStringsNonUnicode :
IConfigurationConvention<PropertyInfo,
StringPropertyConfiguration>
{
public void Apply(PropertyInfo propertyInfo,
Func<StringPropertyConfiguration> configuration)
{
configuration().IsUnicode = false;
}
}

We then register our convention by overriding the OnModelCreating in our derived context: [1]

public class MyDataContext : DbContext
{
protected override void OnModelCreating(
ModelBuilder modelBuilder)
{
modelBuilder.Conventions.Add<MakeAllStringsNonUnicode>();
}
}

So what if we are using EF 4.1 or higher? It can be done but it takes a little more work: [2]

public class MyDataContext : DbContext
{
protected void SetAllStringPropertiesAsNonUnicode(
DbModelBuilder modelBuilder,
Type entityType)
{
var stringProperties = entityType.GetProperties().Where(
c => c.PropertyType == typeof(string)
&& c.PropertyType.IsPublic
&& c.CanWrite
&& !Attribute.IsDefined(c,
typeof(NotMappedAttribute)));

foreach (PropertyInfo propertyInfo in stringProperties)
{
dynamic propertyExpression =
GetPropertyExpression(propertyInfo);

MethodInfo entityMethod =
typeof(DbModelBuilder).GetMethod("Entity");
MethodInfo genericEntityMethod =
entityMethod.MakeGenericMethod(entityType);
var entityTypeConfiguration =
genericEntityMethod.Invoke(modelBuilder, null);

MethodInfo propertyMethod =
entityTypeConfiguration
.GetType()
.GetMethod(
"Property",
new Type[] { propertyExpression.GetType() }
);

var property =
(StringPropertyConfiguration)propertyMethod.Invoke(
entityTypeConfiguration,
new object[] { propertyExpression });
property.IsUnicode(false);
}
}

private LambdaExpression GetPropertyExpression(
PropertyInfo propertyInfo)
{
var parameter = Expression.Parameter(
propertyInfo.ReflectedType);
return Expression.Lambda(
Expression.Property(parameter, propertyInfo), parameter);
}

private IEnumerable<Type> GetEntityTypes()
{
return this
.GetType().GetProperties()
.Where(a => a.CanWrite &&
a.PropertyType.IsGenericType &&
a.PropertyType.GetGenericTypeDefinition()
== typeof(DbSet<>))
.Select(a => a.PropertyType
.GetGenericArguments()
.Single()
);
}

protected override void OnModelCreating(
DbModelBuilder modelBuilder)
{
base.OnModelCreating(modelBuilder);


foreach (var entityType in GetEntityTypes())
{
SetAllStringPropertiesAsNonUnicode(
modelBuilder, entityType);
}
}
}

You can see how this takes a bit more code, but this is far less tedious and less error-prone than having to attribute every single string property in your model with “varchar” for the column type.

Disclaimer: This is a change that should be done during the development phase. Once you are in production, changing the size or type has possibility to lose data and may cause EF to fail so make sure you have migrations or data-conversion in place before you attempt this in production.

Sources: the first sample code for changing the data type [1] on pre EF 4.1 was taken from MSDN's blog and the second code example [2] was taken from an answer on StackOverflow. I included the code directly in case those pages are no longer available.

Is this interesting, want to learn more, or are having the same discussions in your own department? Get in touch with us so we can discuss how we can help your business achieve its goals through leadership and technology.


Information and material in our blog posts are provided "as is" with no warranties either expressed or implied. Each post is an individual expression of our Sparkies. Should you identify any such content that is harmful, malicious, sensitive or unnecessary, please contact marketing@sparkhound.com

Get Email Notifications