With all of the rush and hubbub about digitizing that and transforming this (I should know, I live in the Digital Transformation world), I would like to think that business has moved completely into the digital age and that the days of documents are in the past… The reality is that documents held, and continue to hold, massive amounts of data. The reality is also that getting to that data, and managing the documents, is still as laborious a process now as it was when the world was still paper and pencil.

As much as techno-geeks, like myself, like to deride them, documents are still the most flexible, and friendly, way for businesses to record and do business with one another. In many industries, the document is an absolutely critical piece of their business puzzle. Where would the legal space be if they didn’t have contract documents to read and sign and agree to?

In the near past, only businesses that had massive amounts of data, and funds, could do something about this document conundrum with Artificial Intelligence and Machine Learning. What has recently happened is that the big tech companies have begun to “democratize” their Artificial Intelligence systems for the normal everyday company. With the advent of their Content AI suite of products, Microsoft is leading the way with its Microsoft Syntex product.

Let’s step back just a second and talk about just what Content AI is. For something that is so very complex and technical under the hood, it is a remarkably simple concept to explain. Content AI is the use of a computer that can be trained, much like a human, to pick out and extract relevant information from a document.

What’s a Model?

Ok, trained like a human… what does that even mean? Well… Think of an invoice. If you had a person who had NEVER seen an invoice before, how would you tell them to distinguish that document from another? An invoice is going to have some sort of header on it that describes the company it came from, and it is going to have a due date, a customer number, and some sort of monetary value. All invoices have the same kind of “look” and “feel.” A human can pick up on that. After looking at a couple of invoices, they know how to differentiate an invoice from something else, say a contract or a memo.

When a human does this initial look, they “classify” the document as an invoice. Next, you tell them that these kinds of invoices get filed in this particular bin. The next step is to recognize the information on the invoice and transcribe it into the system. Finally, the human is instructed to destroy old invoices that the company has determined no longer need to be kept. Done! Yay!

What a Syntex Model does is EXACTLY that process! You first create a classifier that teaches the AI how to recognize what an invoice is. Syntex can then designate a “content type” in SharePoint (that is what defines the columns in the library, like Title, Modified By, etc.) that says “this is an invoice, and we need these pieces of information from it.” The model can then be configured to place a records retention label on that document so that the document can be managed by whatever records policy your company has put in place.
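To make the “classify” step a little more concrete, here is a tiny Python sketch of the idea. To be clear, this is NOT how Syntex works under the hood and it is not a Syntex API. The cue list, the `min_hits` value, and the `classify` function are all made up for illustration; the real model learns from the example files you give it.

```python
# Hypothetical cues that give an invoice its "look and feel".
# These are assumptions for illustration, not anything Syntex ships with.
INVOICE_CUES = ["invoice", "due date", "customer number", "total"]


def classify(text: str, cues=INVOICE_CUES, min_hits: int = 3) -> str:
    """Label a document 'invoice' when enough of the cues appear in it."""
    lowered = text.lower()
    hits = sum(1 for cue in cues if cue in lowered)
    return "invoice" if hits >= min_hits else "other"


sample_invoice = (
    "Invoice #1001\n"
    "Customer number: 4471\n"
    "Due date: 2022-03-31\n"
    "Total: $250.00\n"
)
sample_memo = "Memo: the invoice process is changing next quarter."

print(classify(sample_invoice))
print(classify(sample_memo))
```

The memo mentions the word “invoice” once, but one cue on its own is not enough to call it an invoice. That is the same judgment call your human would make.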

Just these two things, designating a content type AND attaching a records retention label, are very important, labor-intensive, and, let’s be real, BORING parts of records management. The major failure of records management systems is the misfiling or mislabeling of documents. That is nearly always caused by a fault between the keyboard and the chair. The work is monotonous, not fun, and done with less than an information worker’s full attention. Syntex removes that fault from the system.

Next in our human process was to get the data locked within that invoice. In your Syntex Model, that is done with Extractors. An extractor is exactly what it sounds like… It extracts data from your invoice.

Just as you had to tell your human, who has never seen an invoice before, what to look for and what to grab, you have to tell your model those same things. Think about how you would tell someone to look for the invoice amount if they didn’t know what it was and had never seen one before. You would say that most invoices are going to have a phrase like “invoice total” or “pay this amount” or the like, and have some currency amount. This phrase, and the amount, are typically at the END of the document.

This is exactly how you train your model to extract this information. You tell the AI that before the amount there will be this phrase list, and the amount will be a currency value, all found at the end of the document.
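If it helps to see that rule written down, here is a rough Python sketch of “a phrase from the list, followed by a currency value, near the end of the document.” This is only an illustration of the concept, not Syntex itself. The phrase list, the `tail_fraction` cut-off, and the function name are all my own assumptions.

```python
import re
from typing import Optional

# Hypothetical phrase list; in practice you would supply your own phrases.
TOTAL_PHRASES = ["invoice total", "pay this amount", "amount due"]

# A currency value such as $1,234.56 or 1234.56 (the number is captured).
CURRENCY = r"\$?\s*(\d[\d,]*(?:\.\d{2})?)"


def extract_invoice_total(text: str, tail_fraction: float = 0.5) -> Optional[float]:
    """Return the amount that follows a known phrase in the last part of the text."""
    # Only look at the tail of the document, where totals usually live.
    tail = text[int(len(text) * (1 - tail_fraction)):]
    phrases = "|".join(re.escape(p) for p in TOTAL_PHRASES)
    pattern = re.compile(rf"(?:{phrases})\s*:?\s*{CURRENCY}", re.IGNORECASE)
    matches = pattern.findall(tail)
    if not matches:
        return None
    # If several candidates appear, keep the one closest to the end.
    return float(matches[-1].replace(",", ""))


sample = (
    "ACME Supplies\n"
    "Invoice #1001\n"
    "Widgets $1,300.00\n"
    "Shipping $12.50\n"
    "Invoice Total: $1,312.50\n"
)
print(extract_invoice_total(sample))
```

A plain pattern match like this is rigid. It only finds exactly what you spelled out, which is a big part of why a trained model that can handle the “fuzzy” cases is so appealing.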

What happens then is that the AI will find these amounts, even in “fuzzy” situations, and put them in the SharePoint list. Again, doing this kind of thing by hand is terribly repetitive and boring, not to mention prone to the inevitable typo. When human data entry is involved, data integrity is always at risk.

The AI doesn’t tire. It doesn’t get bored. Your data integrity is limited only by the clarity of the scan of the document.


The ROI that is received from a system like Syntex comes in a couple of different ways. First, most obviously, is the freeing up of your human capital from doing this mundane data entry work. This is where you will see the most dramatic benefit.

Next, with the information now tagged for each document, retrieval of that document now becomes much easier. Searching for documents is best when the document has specific metadata properties that can be used to narrow and refine searching.

We have all seen how e-commerce sites use filters and sorting to make our shopping experiences easier. We have all also seen how terrible a search experience is when the search engine only has the body of the document to refine its search on.

By extracting information from the document and placing it in metadata columns, Syntex automatically gives you properties to refine your search for documents. With our invoice example, now that the information is in the system, it is dead easy to find all invoices for March 2022 from Matt’s Marker Manson (haha alliteration is fun). The vendor and the invoice date are pulled from the invoices and are now searchable, sortable, and filterable properties. All YOU, the user, had to do was upload the file.
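Here is what that kind of filtering looks like once the vendor and date live in their own columns. This is a plain Python sketch with made-up records, not a SharePoint or Syntex call. The `InvoiceRecord` class, its field names, and the sample rows are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class InvoiceRecord:
    """One library item with hypothetical metadata columns."""
    file_name: str
    vendor: str
    invoice_date: date
    total: float


def find_invoices(records: List[InvoiceRecord], vendor: str, year: int, month: int):
    """Filter on metadata columns instead of searching the document body."""
    return [
        r for r in records
        if r.vendor == vendor
        and r.invoice_date.year == year
        and r.invoice_date.month == month
    ]


library = [
    InvoiceRecord("inv-001.pdf", "Matt's Marker Manson", date(2022, 3, 4), 120.00),
    InvoiceRecord("inv-002.pdf", "Matt's Marker Manson", date(2022, 4, 9), 75.50),
    InvoiceRecord("inv-003.pdf", "Another Vendor", date(2022, 3, 15), 310.00),
]

for record in find_invoices(library, "Matt's Marker Manson", 2022, 3):
    print(record.file_name)
```

Without the metadata, the only option would be a full-text search across every file, hoping the vendor name and the date both show up somewhere in the body.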

This gives you additional soft ROI from Syntex: you can find and retrieve your documents that much quicker, which frees up your people sooner and lets them get to their more important work.

Finally, once you have that data extracted from the documents, you can now USE it. It is an easy thing to set up Power BI to add up your invoices for the month to track what you are spending. Or, better yet, use Power Automate to set thresholds for automatic invoice approval, and simply send those on via API call to your payment system. No human intervention is required.
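Both of those ideas are simple logic once the data is sitting in columns. Here is a small Python sketch of each: a monthly total and a threshold check. This stands in for what you would build in Power BI and Power Automate; it is not code for either product. The `AUTO_APPROVE_LIMIT` value, the field names, and the sample invoices are assumptions I made up for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical threshold; your finance team would set the real one.
AUTO_APPROVE_LIMIT = 500.00


def monthly_totals(invoices):
    """Add up invoice totals per (year, month)."""
    totals = defaultdict(float)
    for inv in invoices:
        key = (inv["invoice_date"].year, inv["invoice_date"].month)
        totals[key] += inv["total"]
    return dict(totals)


def route_invoice(invoice, limit=AUTO_APPROVE_LIMIT):
    """Return 'auto_approve' at or under the limit, otherwise 'manual_review'."""
    return "auto_approve" if invoice["total"] <= limit else "manual_review"


invoices = [
    {"vendor": "Vendor A", "invoice_date": date(2022, 3, 4), "total": 120.00},
    {"vendor": "Vendor B", "invoice_date": date(2022, 3, 18), "total": 980.50},
    {"vendor": "Vendor A", "invoice_date": date(2022, 4, 2), "total": 45.25},
]

print(monthly_totals(invoices))
for inv in invoices:
    print(inv["vendor"], inv["total"], route_invoice(inv))
```

In a real flow, the “auto_approve” branch is where the call to your payment system would go, and the “manual_review” branch is where a person gets a notification.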

Once the Syntex Model is configured and working, your documents become just another data source for you to automate, or report on. This is at the heart of what leveraging Syntex can do to make a true and real difference in your company. This AI system is now allowing you to truly do more with less!

Additional Features

In many situations, your company may need to CREATE documents based on data that is obtained from other sources. A good example is that you have a contract company that does maintenance on your facilities. To get them to do any work, a work order must be sent to them in a specific format. You can use Syntex and a Modern Template in SharePoint to take information from specific databases or SharePoint columns to automatically generate that work order document.

This is called “Content Assembly.” It has been around for a good long time in the SharePoint world, with the use of document library columns and Word “Quick Parts.” What makes this different is that the Quick Part could ONLY pull data from the SharePoint columns, and it was very labor-intensive to set up and get working.

The Syntex iteration is just a matter of highlighting areas in a template document and then configuring the corresponding columns to connect to a data source. That data source could be another SharePoint column or something in Dataverse. This makes it a simple thing to configure and generate these documents.
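If you have never seen template filling before, the basic idea fits in a few lines of Python. This sketch uses the standard library's `string.Template` as a stand-in; it is not the Syntex feature or its API. The work order layout, the placeholder names, and the sample row are all hypothetical.

```python
from string import Template

# Hypothetical work order layout; each $name marks a "highlighted area".
WORK_ORDER_TEMPLATE = Template(
    "WORK ORDER\n"
    "Contractor: $contractor\n"
    "Facility: $facility\n"
    "Task: $task\n"
    "Requested by: $requested_by\n"
)


def build_work_order(row: dict) -> str:
    """Fill the template from one row of a data source.

    substitute() raises KeyError if a placeholder has no value, so a
    half-filled work order never goes out the door.
    """
    return WORK_ORDER_TEMPLATE.substitute(row)


# Stand-in for a row pulled from a list column or a database table.
row = {
    "contractor": "Example Maintenance Co.",
    "facility": "Warehouse 7",
    "task": "Replace HVAC filters",
    "requested_by": "Facilities Team",
}

print(build_work_order(row))
```

The placeholders play the role of the highlighted areas in the template document, and the dictionary plays the role of the columns you connect them to.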

Closing Thoughts

In a nutshell, the primary functions of Syntex allow you to categorize your documents and quickly add them to your records retention policy. They allow you to utilize the documents that you have as actual data sources, by extracting key pieces of information, all without the intervention of a person.

Where do you need a Partner? Sparkhound comes in first with the implementation of Syntex, and with guiding your team on its use and benefits. Really, though, where Sparkhound comes in is in the Information Architecture: now that you have this data… what do we do with it? The drawback of having another data source is that it is just another source of data, along with all of the others that your company has. Sparkhound can help guide your company to envision what the end goal of your data and digital transformation is, and partner with you to get there.

Schedule a Microsoft Syntex demo with us today by reaching out to:

