Nothing Shines Light Like a Report

March 22, 2015

This blog post isn't just inspired by my recent forced abandonment of head hair, but by recent light-shining reporting projects on client data and the oh-so predictable problems that arise.

Business Intelligence projects are valuable - if not vital - to modern businesses, and a component of most projects Sparkhound works on in the Application Development and Enterprise Applications departments.

But nothing shines light on an application's data like a report.

Or rather, nothing embarrasses an application like a report.

From simple reports to PowerPivot tables to data warehouses with OLAP on top, these reports often expose problems in underlying relational data integrity and data quality.

And let me be clear - the front end of your application is responsible for most of these problems. Just don't take it personally.

Here's what you should expect when embarking on a new reporting project, and why.

 

"What do you mean there's two? The application doesn't allow that... does it?" 

Data duplication is a very common problem faced by new business intelligence systems. Key here is a distinction between the primary key of a table and any further unique constraints that should have been in place, but weren't.

  1. The primary key is a surrogate key (like an auto-incrementing value), but the actual value the business user sees can be duplicated by accidental operation of the application, causing the set of child records to be related to one of two valid parent records.
  • For example:
    1 | Baton Rouge Office | 225-216-1500
    2 | Houston Office | 713-401-5150
    3 | Dallas Office | 972-295-9110
    4 | Baton Rouge Office | 225-216-1500
Business keys are enforced to be unique in a table - but many different versions of the same text exist. When shown a distinct list of all "Companies" on a report, this lack of data quality will be immediately evident.
  • For example, instead of making users choose from a list of values, a free text field was the only data entry method. "Sparkhound", "Spakrhound" and "Spark hound" all existing in a table, because misspellings in free-text fields allowed them to.  
  • Untrimmed data in an application can be very confusing when that data is visualized. "Pipe" and "  Pipe" and "  Pipe  " appear the same to a user, but in a report, all three values are distinct in the database.
  • For example:
    1 | Baton Rouge Office | 225-216-1500
    2 | Houston Office | 713-401-5150
    3 | Dallas Office | 972-295-9110
    4 |  Baton Rouge Office | 225-216-1500
    5 | BTR | 225-216-1500
    6 | Baton Rogue | 225-216-1500
Lack of foreign keys on tables means that your application doesn't have a failsafe against creating invalid relationships - causing invalid record associations to vanish from a report, or depending on the visualization, appear to all be related to one unassociated parent record.
  • "Assumed" relationships without direct foreign key data may visualize on a report as "everything is related to everything".
Don't assume a user will search for an existing match before typing in the information all over again. In fact, users will typically follow the fastest data entry path – so providing a fast "combo box" search result will alter user behavior and prevent duplicate data entry. Compared to a paged list, a direct lookup on a key performs better and is more likely to find a matched record, preventing duplicate data entry.

 

"Why are we showing data for January 2051? And why are these two dollar figures off by one cent?"

Incorrect data types can have devastating consequences in reports, not to mention a drastic reduction in performance or increase in storage cost.

  1. Numbers stored as varchar won't sort, validate or aggregate correctly.
  • Example:
    1
    10
    11
    ..
    2
    20
    21
Variable length strings stored as char won't trim and compare correctly.
  • Example:
    "Baton Rouge" does not equal "Baton Rouge     ", though impossible to see without the quotes.
Dates stored as varchar won't validate, sort or query correctly.
  • Example:
    1/1/2015
    1/2/2015
    1/3/2051
    1/4/201
Numbers with 3+ significant digits after the decimal stored as float or real won't be stored correctly and will lead to rounding errors.Use regular expressions and enforced formats to ensure that codes are stored with expected spacing, special symbols or punctuation. Think of how this data will look when viewed all at once, instead of just one-row-at-a-time.
  • Example
    111-11-1111
    111111111
    1111-11111
    (111)111-1111

 

"Why would a user ever do that? Our users are smarter than that, they know better."

Applications that fail to enforce drop-down lists, data types, data length limits, null/emptystring differences, or other "sanity checks" for users will inevitably provide bad data to a report.

  1. Abbreviations entered into freetext fields are unreliable over time and between users. Use drop down lists populated by reference tables of known codes/abbreviations.
  • Example:
    Hair Color: "BL
    Eye Color: "BL"
Assuming – but not ensuring - users will follow the standard business logic is NOT a sound development practice.
  • Example:
    User creates a new office location without providing a physical address – the office doesn't show up on a heat map report.
Provide a date-selector with a calendar instead of relying on the user to enter MMDDYYYY formatted data, and insert that data into a date or datetime data type. You will end up with February 30th, DDMMYYYY, or "Aug 20 2200".Dates that intelligently follow business logic will also ensure that your duration and delay calculations make sense. Enforce that StartDate must be before EndDate, Checkin Date must be before Checkout Date, and don't allow overrides.

 

"It's always going to be this way… Well, 99% of the time it's going to be like that... Ok, in the vast majority…"

Tighten up your business logic and handle use case exceptions with well thought-out approaches, exceptions and approvals.

  1. Don't allow bogus data to signify special exceptions.
  • Example
    Current Status:
    100
    101
    102
    "CANCELLED"
    "OTHER SEE NOTES"
Don't allow important data to be stored in "comment" or "notes" fields. Open text fields are nearly impossible to use in any sort of aggregation strategy.
  • If your users need to track history, auditing information, change log comments, or important dates, they should not be in a "text box".
If a field is required, don't let the users bypass your validation by entering dummy data. If data is required, don't allow users to enter a blank, select "dummy" data or type in any variation of "unknown".
  • Example:
    Don't allow "   ", "1/1/1900", -1, (222) 222-2222 or 000-00-0000 or any variations.
Avoid the use of "variant" or "custom" string fields that may vary from user to user. Avoid storing important data in profile-specific fields.
  • Example:
    CustomString1:
    $123.45
    "FORD"

    1/2/2014

 

"I need to know how the data looked this time last month. I need to know how long it took this to move through our pipeline. Also, I need to see who messed this data up."

If you need intermediate and history to be referenced in the application, don't rely on the time-capsule nature of a data warehouse or "reporting copy".

  1. Applications should capture intermediate transactions and avoid overwriting important data if the "old values" would be helpful to the business. Intermediate status timestamps can be extremely valuable in measuring pipeline velocity or identifying process bottlenecks, and should be captured by the application.
  • Example:
    "It took an average of .5 days to go from Status 1 to 2, but it took 2.4 days to go from Status 2 to 3. That's where our efforts should be focused!"
"Slowly-changing dimensions" are a data warehousing concept that provides a structure for handling changes to key data and storing "as of" data over time. But no reporting system should only look forward- you want to be able to show dates back to the start of the application data.
  • Example:
    "Well, I'm sorry but this data was overwritten in the application. The data warehouse will be able to start providing year-over-year data… next year."
  1. Audit tracking, including "modified by" and "modified date" data, should be created by the application. Many business have regulatory environments requiring change tracking – even by administrators or "override" users.
  • The built-in SQL Server features Change Data Capture and SQL Audit may help your application accomplish these with significantly less complexity and overhead than a custom application-based solution.
In this day and age, aggressive data retention policies that delete or "archive" data older than a certain number of years are obsolete.
  • SQL Server table partitioning, data compression, and filtered indexes can help your application perform just as fast with 5 or 15 years of data in the same table.

 

Want to learn more about business intelligence, reporting or application design? Sparkhound employs 40+ folks in our Application Development department, and we handle projects like this all day long. We'd be happy to discuss your reporting solution or application design.

Or, try attending an upcoming SQL Server User Group meeting (www.sqlpass.org) or register for an upcoming SQL Saturday event (www.sqlsaturday.com). There are PASS user groups and SQL Saturday events all over the world, so there is surely one near you.

Information and material in our blog posts are provided "as is" with no warranties either expressed or implied. Each post is an individual expression of our Sparkies. Should you identify any such content that is harmful, malicious, sensitive or unnecessary, please contact marketing@sparkhound.com.

Meet Sparkhound

Review our capabilities and services, meet the leadership team, see our valued partnerships, and read about the hardware we've earned.

Engage with us

Get in touch with any of our offices, or check out our open career positions and consider joining Sparkhound's dynamic team.