Understanding Metadata – And Why You Need More of It (Part 2)

October 26, 2016 by Pete Johnson, Director of Product Management

In Part 1 of this blog, we explored what metadata is, using an analogy to digital music stored on your phone or computer. Now let’s extend that analogy into the enterprise data management space, where we would like to use metadata to tag important files. For example, invoices could be tagged with the Company Name, Billing Date, and Invoice Amount, and a sales presentation with the Author, Subject, Client, and Presentation Date.

Having this metadata attached to files enables not only search, but other valuable actions such as:

  • Finding duplicates or near duplicates
  • Grouping files by company, date, or type of document
  • Linking documents together for an audit trail
  • Creating an index of documents
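The grouping and duplicate-finding actions above can be sketched in a few lines of Python. This is a minimal illustration, assuming each file is represented as a simple record holding its path, its raw bytes, and a dictionary of tags; the file names, companies, and the `group_by_tag` and `find_exact_duplicates` helpers are hypothetical. Duplicate detection here only catches exact copies, by comparing SHA-256 hashes of file contents. Finding near duplicates would require a similarity measure, which this sketch does not attempt.

```python
import hashlib
from collections import defaultdict

# Hypothetical in-memory records: each file has its raw bytes plus the kind of
# metadata tags discussed above. A real system would read these from storage.
files = [
    {"path": "invoices/acme_2016-09.pdf", "content": b"Invoice #1001 ACME",
     "tags": {"type": "invoice", "company": "Acme Corp"}},
    {"path": "shared/copy_of_acme_2016-09.pdf", "content": b"Invoice #1001 ACME",
     "tags": {"type": "invoice", "company": "Acme Corp"}},
    {"path": "sales/globex_pitch.pptx", "content": b"Globex pitch deck",
     "tags": {"type": "presentation", "company": "Globex"}},
]


def group_by_tag(records, tag):
    """Group file paths by the value of one metadata tag (e.g. company)."""
    groups = defaultdict(list)
    for rec in records:
        # Files missing the tag land in an "untagged" bucket.
        groups[rec["tags"].get(tag, "untagged")].append(rec["path"])
    return dict(groups)


def find_exact_duplicates(records):
    """Return lists of paths whose contents hash identically (exact copies only)."""
    by_hash = defaultdict(list)
    for rec in records:
        digest = hashlib.sha256(rec["content"]).hexdigest()
        by_hash[digest].append(rec["path"])
    return [paths for paths in by_hash.values() if len(paths) > 1]


print(group_by_tag(files, "company"))
print(find_exact_duplicates(files))
```

Note that grouping is only as good as the tags themselves: any file without a `company` tag ends up in the "untagged" bucket, which is exactly the gap the rest of this post is about.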

According to IDC, approximately 90%¹ of the data in a company is unstructured or “dark”, meaning nobody knows what it is or where it resides, and storing that data costs $3,212 per TB². So getting your arms around that data not only improves productivity, it also reduces costs. The key to doing this is to properly organize your files and tag them with metadata.
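As a rough illustration of the cost side, the sketch below applies the two figures quoted above to a hypothetical company holding 100 TB of data. The total volume and the `dark_data_cost` helper are assumptions for illustration only, and because the storage figure is quoted simply “per TB”, the result is in those same terms rather than, say, an annual budget.

```python
# Figures quoted above: roughly 90% of company data is "dark" (IDC), and
# storage costs about $3,212 per TB. The 100 TB volume is a made-up example.
DARK_FRACTION = 0.90
COST_PER_TB_USD = 3212


def dark_data_cost(total_tb: float) -> float:
    """Estimated storage cost attributable to dark data for a given total volume."""
    return total_tb * DARK_FRACTION * COST_PER_TB_USD


print(f"${dark_data_cost(100):,.0f}")  # prints $289,080
```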

But how is this metadata generated? Sadly, most of it never is. That is a big reason most of a company’s documents are never reused: they are hard to find and manage. The sales presentation probably lacks the metadata that would make it easy to find, so employees waste time searching for words they hope the file contains.

Some specialized systems do generate metadata as part of their workflow; for instance, an invoicing system may require specific fields to be entered and then tag each invoice with those values. But that metadata rarely extends beyond the users of the specialized system, so it is locked away from everyone else. Documents that exist outside such a system usually have little or no metadata associated with them. There is also no easy way to connect a document inside such a system with related documents outside it, so locating all the documents associated with a particular customer, for instance, takes multiple searches.

So one key to getting the most out of your stored documents is to ensure that they are properly tagged when they are created and/or modified. Most ECM (Enterprise Content Management) systems and CMSs (Content Management Systems) support adding tags or metadata. As files are saved to the system, users are prompted for metadata based on the type of file. An invoice may require one set of fields, while a customer presentation requires a different set. As more and more documents are added to the system, the metadata collection grows, enabling accurate search in the future.
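The per-type prompting described above essentially comes down to a schema lookup. The following Python sketch assumes a hypothetical `REQUIRED_FIELDS` table built from the invoice and presentation fields mentioned earlier, plus a hypothetical `missing_fields` helper. Real ECM and CMS products each define their own content types and validation rules, so this illustrates the idea rather than any product’s API.

```python
# Hypothetical schemas: which metadata fields each document type must carry.
# Field names mirror the examples in this post; a real ECM/CMS defines its own.
REQUIRED_FIELDS = {
    "invoice": ["Company Name", "Billing Date", "Invoice Amount"],
    "presentation": ["Author", "Subject", "Client", "Presentation Date"],
}


def missing_fields(doc_type, metadata):
    """Return the required fields that are absent or blank for this document type."""
    missing = []
    for field in REQUIRED_FIELDS.get(doc_type, []):
        value = metadata.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Simulates the save-time check: the user would be prompted for anything listed.
print(missing_fields("invoice", {"Company Name": "Acme Corp", "Billing Date": "2016-10-01"}))
```

With the sample input, the only field the user would still be asked for is `Invoice Amount`. Unknown document types have no required fields in this sketch, which mirrors the problem of documents that fall outside any defined type.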

But what about documents that don’t live in such a system? How can metadata be attached to these files? Typically, little is actually done. The computer’s file system probably records a “date created” and “date modified”, but little else is usually attached.

This is where an automated tagging system can help. By first classifying each document by type and then extracting key data from it, the process of adding metadata to these important document types can be largely automated.
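A minimal sketch of that two-step process, assuming plain text has already been pulled out of each file, might look like the Python below. The keyword lists, regular expressions, field names, and the `classify` and `auto_tag` helpers are all hypothetical. Real automated tagging systems generally rely on trained classifiers and far more robust extraction than a handful of patterns, so treat this only as an outline of classify-then-extract.

```python
import re

# Hypothetical keyword rules for classifying a document by type. Production
# systems would typically use a trained classifier instead of keywords.
TYPE_KEYWORDS = {
    "invoice": ["invoice", "amount due", "bill to"],
    "presentation": ["agenda", "presented by", "slide"],
}

# Hypothetical extraction patterns, one set per document type.
EXTRACTORS = {
    "invoice": {
        "Company Name": re.compile(r"Bill To:\s*(.+)"),
        "Billing Date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
        "Invoice Amount": re.compile(r"Amount Due:\s*\$?([\d,]+\.\d{2})"),
    },
}


def classify(text):
    """Pick the type whose keywords appear most often; 'unknown' if none match."""
    lowered = text.lower()
    scores = {t: sum(k in lowered for k in kws) for t, kws in TYPE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def auto_tag(text):
    """Classify the document, then run only that type's extraction patterns."""
    doc_type = classify(text)
    tags = {"type": doc_type}
    for field, pattern in EXTRACTORS.get(doc_type, {}).items():
        match = pattern.search(text)
        if match:
            tags[field] = match.group(1).strip()
    return tags


sample = "INVOICE\nBill To: Acme Corp\nDate: 2016-10-26\nAmount Due: $1,250.00"
print(auto_tag(sample))
```

Classifying first matters because extraction rules are chosen per document type; it keeps an invoice pattern from being applied to, say, a sales presentation.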

In summary, your company probably has many terabytes or even petabytes of data and documents just eating up storage space and creating no value for you. By properly tagging this data, you can understand what you have and where it is, and reclaim that lost value.