Media content libraries are extremely valuable, and increasingly so: archive content is growing in popularity, which makes it monetisable long after it was first created. Ensuring your content is preserved and usable long into the future has therefore become extremely important, yet many broadcasters and content providers are not taking the necessary steps to ensure fixity and digital preservation.
Some time ago, the National Digital Stewardship Alliance (NDSA) published a number of excellent articles looking at the different levels of digital preservation you should apply to your content and how you can measure them. The concepts they described are still valid and very relevant today. However, the media industry has changed dramatically since then, and the relative importance of some of the categories they defined could be adjusted.
Why is Fixity Important?
A fundamental goal of digital preservation is to establish and check the “fixity”, or stability, of digital content. In the context of digital preservation, fixity is the property of a digital file or object being fixed or unchanged. This is synonymous with bit-level integrity. Fixity information offers evidence that one set of bits is identical to another.
The PREMIS data dictionary defines fixity information as “information used to verify whether an object has been altered in an undocumented or unauthorized way.” Fixity information is normally based on checksums or cryptographic hashes.
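As a minimal sketch of what checksum-based fixity information can look like in practice, the following Python computes a cryptographic hash of a file and compares it with a previously recorded value. The function names, the choice of SHA-256 and the chunk size are assumptions made for illustration; they are not prescribed by PREMIS or the NDSA.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file.

    The file is read in chunks so that large media files do not have to
    fit in memory. The algorithm name is passed to hashlib.new().
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, expected_hexdigest, algorithm="sha256"):
    """Return True if the file still matches the recorded digest."""
    return file_checksum(path, algorithm) == expected_hexdigest
```

A matching digest is evidence that the bits are identical to those that were hashed originally; a mismatch shows that something has changed, but not what or why.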
There are a whole range of reasons to collect, maintain and verify fixity information on your digital content:
- Confirm that the content has been received correctly
- Assure the content hasn’t changed unexpectedly
- Assure the content hasn’t changed in transfers
- Support the repair of corrupted or altered content
- Monitor hardware degradation
- Allow a portion of the content to be changed while leaving the rest intact
- Support the monitoring of production or digitisation processes
- Document provenance and history
- Detect human errors in the manipulation of the content
When and Where Should Fixity be Generated?
There are different approaches to how and when fixity information is generated: on ingest, on transfer, at regular intervals, or when content is moved into storage systems, for example. Sometimes it is generated on portions of the content rather than on the whole object.
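Generating fixity on portions of the content can be sketched as one digest per fixed-size segment, which makes it possible to say which part of a file has changed rather than only that it has changed. The segment size and the function names below are illustrative assumptions, not a standard layout.

```python
import hashlib


def segment_checksums(path, segment_size=8 * 1024 * 1024):
    """Return a list of SHA-256 hex digests, one per fixed-size segment."""
    digests = []
    with open(path, "rb") as handle:
        for segment in iter(lambda: handle.read(segment_size), b""):
            digests.append(hashlib.sha256(segment).hexdigest())
    return digests


def changed_segments(recorded, current):
    """Return the indexes of segments whose digests differ.

    Segments that exist in only one of the two lists (the file grew or
    was truncated) are reported as changed as well.
    """
    longest = max(len(recorded), len(current))
    return [
        index
        for index in range(longest)
        if index >= len(recorded)
        or index >= len(current)
        or recorded[index] != current[index]
    ]
```

Note that fixed-size segments only localise in-place changes. If bytes are inserted or removed, every later segment shifts and is reported as changed too.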
Generating fixity information means that you have to access the content. Something loosely akin to the observer effect applies here: reading the content to extract fixity information affects the systems holding or accessing it, and the effect depends on how, and how often, fixity is generated and checked. Some of those effects can be:
- CPU time taken away from other services while fixity information is being calculated
- Degradation of the hardware holding the data
- Redundant fixity information if different systems are calculating it
All of this means that the right time to generate fixity depends on your specific workflow and on a number of important factors, such as what other actions are happening at a given time. These all need to be taken into consideration so that fixity is generated with minimal impact.
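One way to limit that impact, sketched here under the assumption that read throughput is the resource you want to protect, is to cap how fast a fixity check reads from storage. The function name, the default chunk size and the injectable `sleep` and `clock` parameters are hypothetical choices for illustration.

```python
import hashlib
import time


def throttled_checksum(path, max_bytes_per_second, chunk_size=1024 * 1024,
                       sleep=time.sleep, clock=time.monotonic):
    """Return the SHA-256 hex digest of a file, reading at a capped rate.

    After each chunk the function pauses for as long as needed to keep
    the average read rate at or below max_bytes_per_second.
    """
    if max_bytes_per_second <= 0:
        raise ValueError("max_bytes_per_second must be positive")
    digest = hashlib.sha256()
    started = clock()
    bytes_read = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            bytes_read += len(chunk)
            # Earliest moment at which this many bytes may have been read.
            earliest = started + bytes_read / max_bytes_per_second
            delay = earliest - clock()
            if delay > 0:
                sleep(delay)
    return digest.hexdigest()
```

A scheduler could equally run checks only during quiet periods; which approach suits you depends on the workflow factors described above.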
Another very important aspect to take into account is where to store fixity information. Again, there are several equally valid approaches. It could be stored in the object’s own metadata, alongside the content, embedded in the content itself, or in a separate database. The key is to have a system in place that lets you easily retrieve that information when it is needed.
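As an illustration of the separate-database option, the sketch below keeps fixity records in a small SQLite table and looks them up again when a check is needed. The table layout, the column names and the function names are assumptions made for this example, not a reference schema.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def open_fixity_store(db_path=":memory:"):
    """Open (or create) a SQLite database holding fixity records."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fixity ("
        "object_id TEXT PRIMARY KEY, algorithm TEXT NOT NULL, "
        "digest TEXT NOT NULL, recorded_at TEXT NOT NULL)"
    )
    return conn


def record_fixity(conn, object_id, data, algorithm="sha256"):
    """Store the digest of `data` for `object_id` and return the digest."""
    digest = hashlib.new(algorithm, data).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO fixity VALUES (?, ?, ?, ?)",
        (object_id, algorithm, digest, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return digest


def check_fixity(conn, object_id, data):
    """Return True or False for a match, or None if no record exists."""
    row = conn.execute(
        "SELECT algorithm, digest FROM fixity WHERE object_id = ?",
        (object_id,),
    ).fetchone()
    if row is None:
        return None
    algorithm, digest = row
    return hashlib.new(algorithm, data).hexdigest() == digest
```

Storing the algorithm name next to each digest means that older records remain verifiable if you later adopt a different hash function.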
Is Metadata as Important as Content?
The NDSA defined five general categories in order of importance:
- Storage and Geographic Location
- File Fixity and Data Integrity
- Information Security
- Metadata
- File Formats
However, over the past few years in the media industry, metadata has become as important as the data itself. Without metadata it is impossible to find the correct content at the right time, which in turn makes it impossible to monetise your library, especially as it grows. I would argue that protecting metadata should therefore be as important as protecting the content itself.
It has always been my view that metadata should reside with objects in your storage and get the same protection and policies as the data. In any media workflow, fixity information should be calculated and checked at different points in the workflow, and the more automated that process is, the better. By continually checking this information, you will know immediately when something has changed. With the right tools, you can then regenerate and verify fixity information for the content, including its metadata, and repair a damaged instance from a good instance of the object.
This is especially important as content and its associated metadata are moved around, because transfer errors can then be detected early. At the same time, being able to control that process means you can manage how fixity generation affects hardware performance and so limit the impact on your operations and workflow.
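To make the idea of covering metadata and repairing from a good instance concrete, here is a minimal sketch. It assumes that metadata can be serialised as JSON and that several replicas of an object are reachable; the function names and the in-memory `replicas` dictionary are hypothetical stand-ins for a real storage system.

```python
import copy
import hashlib
import json


def object_fixity(content, metadata):
    """Return digests for both the content bytes and the metadata.

    The metadata is serialised with sorted keys so that the same values
    always produce the same digest, whatever the key order.
    """
    canonical = json.dumps(
        metadata, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return {
        "content": hashlib.sha256(content).hexdigest(),
        "metadata": hashlib.sha256(canonical).hexdigest(),
    }


def repair_from_replicas(replicas, recorded):
    """Overwrite damaged replicas with a copy of a verified good one.

    `replicas` maps a location name to a (content, metadata) tuple.
    Returns (name_of_good_replica, list_of_repaired_names), or
    (None, []) when no replica matches the recorded fixity.
    """
    good = next(
        (name for name, (content, metadata) in replicas.items()
         if object_fixity(content, metadata) == recorded),
        None,
    )
    if good is None:
        return None, []
    repaired = []
    for name in list(replicas):
        content, metadata = replicas[name]
        if object_fixity(content, metadata) != recorded:
            replicas[name] = copy.deepcopy(replicas[good])
            repaired.append(name)
    return good, repaired
```

Because the metadata has its own digest, a change to a title or a rights field is detected in the same way as a flipped bit in the essence.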
Ensuring Security and Integrity
Security and integrity of content are vital for broadcasters and content providers wishing to achieve and maintain a high level of preservation of their content. This should always include the metadata, because without it content is practically useless.
Any media workflow implementation should therefore build in a high level of digital preservation.
By Francisco Ontoso, CTO, Object Matrix