Using Checksums to Ensure the Integrity of Data Files

A checksum is a value that signals whether the original data in a file has been altered during storage or transmission. It’s a kind of “digital fingerprint” that reveals any change at all in a file’s contents.

Checksums have been used in digital cinema for years. They ensure that what was recorded onto a memory card is precisely copied to another medium. Missing bits can cause a video clip not to play. Most of us have had it happen in the past.

Checksums are short, fixed-length values computed from a file’s contents, often by cryptographic hash algorithms. Because the value is derived from the input, it can be stored and transmitted with the file and recomputed later for comparison. Checksums are typically created using readily available open-source tools.
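As a rough illustration of how such a value is generated, here is a minimal sketch using Python’s standard hashlib module. The function name file_checksum and the one-megabyte chunk size are assumptions made for this example, not part of any particular tool. Reading in chunks keeps memory use low, which matters for large video files.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex checksum of the file at `path`.

    The file is read in chunks so that large files never have to fit
    in memory all at once.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_checksum on the same unaltered file always returns the same string, which is what makes it usable as a reference value.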

In general, checksums are used to catch accidental errors in data. Hash values, which are fixed-length numeric values that for practical purposes uniquely identify data, represent large amounts of information and are normally used with digital signatures. Hash values can identify potentially malicious data changes and tend to be larger in size than checksums.
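To make the size difference concrete, the sketch below compares a simple checksum with a cryptographic hash of the same data. CRC-32 and SHA-256 are chosen here only as familiar representatives of each kind, and the sample data is made up.

```python
import hashlib
import zlib

data = b"example frame data"  # made-up sample input

crc = zlib.crc32(data)               # simple checksum: a 32-bit integer
sha = hashlib.sha256(data).digest()  # cryptographic hash: raw bytes

crc_bits = 32
sha_bits = len(sha) * 8  # 32 bytes, so 256 bits

print(f"CRC-32:  {crc:08x} ({crc_bits} bits)")
print(f"SHA-256: {sha.hex()} ({sha_bits} bits)")
```

The short CRC is cheap to compute and good at catching accidental damage, while the longer hash is designed to make deliberate tampering hard to hide.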

If a file changes in the slightest way, the resulting checksum is completely different. Depending on how checksums are being used, the software then reports that the values don’t match. This can prevent the user from installing or uploading corrupted or altered software, and keeps unauthorized file modifications from going undetected.
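This sensitivity is easy to see in a small experiment. The sketch below hashes two made-up byte strings that differ by a single character and counts how many digits of the SHA-256 result differ. SHA-256 is an assumption for the example; other cryptographic hashes behave similarly.

```python
import hashlib

original = b"Scene 12, take 3"
altered = b"Scene 12, take 4"  # a single character differs

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(altered).hexdigest()

# Count the positions at which the two hex strings disagree.
differing = sum(a != b for a, b in zip(h1, h2))

print(h1)
print(h2)
print(f"{differing} of {len(h1)} hex digits differ")
```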

Checksums don’t say where in the file the change occurred — only that the file is not the same as the original.

In video, the main uses of checksums are to ensure that a master file has been correctly received from a content owner and then transferred successfully for transmission or preservation storage. The checksum can also be passed along to future users of the file so they can confirm that they have received it without errors.

A checksum procedure creates a “chain of custody” between the producer of content and those tasked to store or distribute it. If there are multiple copies of files, checksums can be used to monitor each copy and signal if any one of them has changed. If this happens, a new file can be made and checked again.

The checksum for each file copy is recomputed on a regular basis and compared with the correct reference value. If a file is found to be corrupted in some way, a process called “data scrubbing” replaces it with a new copy.
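The check-and-repair cycle can be sketched in a few lines. The code below is an illustration under stated assumptions, not how any particular preservation system works: the names sha256_of and scrub_copies are hypothetical, SHA-256 is an arbitrary choice, and whole files are read into memory for brevity, which real video files would not allow.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    # Reads the whole file at once for brevity; large files need chunked reads.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def scrub_copies(copies, reference):
    """Check every copy against `reference` and repair the ones that differ.

    Returns the list of paths that were replaced.
    """
    good = [p for p in copies if sha256_of(p) == reference]
    bad = [p for p in copies if p not in good]
    if bad and not good:
        raise RuntimeError("no intact copy left to repair from")

    repaired = []
    for path in bad:
        shutil.copyfile(good[0], path)       # make a new file from a good copy
        if sha256_of(path) != reference:     # and check it again
            raise RuntimeError(f"repair of {path} failed verification")
        repaired.append(path)
    return repaired
```

Run on a schedule, a routine like this returns an empty list as long as every copy still matches the reference value.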

Checksums can detect unwanted changes in digital materials, including those done deliberately. When a file’s content changes, a new checksum must be established as the reference for checking the file’s integrity going forward.

Even archived video files should be checked against their checksums on a regular basis. How these checks are performed depends on the type of storage, how well it is maintained and how often the content is used.

As a general guideline in file preservation, checking data tapes might be done annually and checking hard drive-based systems might be done every six months. More frequent checks allow problems to be detected and fixed sooner, but at the expense of more processing resources. A balance is needed depending on the situation.

Checksum data can be stored in a variety of ways. It can be kept in a database or with the file itself in the storage system. Checksum tools are integrated into many digital preservation systems.
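One simple way to keep a checksum with the file itself is a small “sidecar” text file stored next to it. The sketch below assumes a .sha256 suffix and a “checksum, two spaces, file name” line layout, similar to what the common sha256sum command prints. The function names are hypothetical, and whole files are read into memory only to keep the example short.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    # Reads the whole file at once for brevity; large files need chunked reads.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_sidecar(path):
    """Store the file's checksum in `<name>.sha256` next to the file."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".sha256")
    sidecar.write_text(f"{sha256_of(path)}  {path.name}\n")
    return sidecar


def check_sidecar(path):
    """Return True if the file still matches its stored checksum."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".sha256")
    stored = sidecar.read_text().split()[0]  # first field is the checksum
    return stored == sha256_of(path)
```

A database works the same way in principle: the stored value is looked up by file name instead of being read from a neighboring file.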

For example, generating checksums may be part of the ingest process, which stores this information with the ingested files. There are several levels of checksum algorithms, each stronger and more secure than the last.

The stronger the algorithm the harder it is to deliberately change a file in a way that goes undetected. This can be especially important where there is a need to prevent malicious corruption or alteration of digital materials, such as data to be used in trials or other legal proceedings.

Standard checksum algorithms are used to detect accidental loss or damage to files in typical storage situations. Guidance on this comes from the National Digital Stewardship Alliance (NDSA), which recommends four levels of digital preservation depending on the sensitivity of the content.

Write blockers are recommended for higher levels of content protection by the NDSA. They prevent write access to digital media before it is copied to a preservation storage system. For example, if digital content is delivered on a hard disk drive or USB key, then a write blocker could prevent accidental deletion of this digital material when the drive or key is read.

Digital material might not be stored on physical media. Perhaps the data is kept on a server or sent through a network transfer. In such situations, write blockers don’t apply. Other techniques are often used to create “read only” digital files.
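One such technique, shown here only as an assumed example, is clearing a file’s write-permission bits at the operating system level. The helper name make_read_only is hypothetical. This is a much weaker safeguard than a hardware write blocker, since anyone with sufficient privileges can reverse it.

```python
import os
import stat


def make_read_only(path):
    """Clear all write-permission bits on `path`, leaving other bits as they are."""
    mode = os.stat(path).st_mode
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    os.chmod(path, mode & ~write_bits)
```

Combined with a stored checksum, a permission change like this guards against accidental edits while the checksum still catches anything that slips through.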

Write blockers are not always used and are often not the right choice for certain types of media. If a write blocker is applicable, then the extra hassle must be balanced against the need to ensure rigorous data authenticity. In less sensitive applications, the use of write blockers is often considered unnecessary.

In our increasingly digital world, checksums are used in multiple ways and play an important role in data protection and cybersecurity. Even if you don’t understand the technical intricacies of the technology, use it to ensure that your most valuable files are protected.

(A checksum tool targeted for digital cinema use is Copy That from OWC.)

Writer at Broadcast Beat
Frank Beacham is a New York-based writer, director and producer who works in print, radio, television, film and theatre.

Beacham has served as a staff reporter and editor for United Press International, the Miami Herald, Gannett Newspapers and Post-Newsweek. His articles have appeared in the Los Angeles Times, Washington Post, the Village Voice and The Oxford American.

Beacham’s books, Whitewash: A Southern Journey through Music, Mayhem & Murder and The Whole World Was Watching: My Life Under the Media Microscope, are currently in publication. Two of his stories are currently being developed for television.

In 1985, Beacham teamed with Orson Welles over a six-month period to develop a one-man television special. Orson Welles Solo was canceled after Mr. Welles died on the day principal photography was to begin.

In 1999, Frank Beacham was executive producer of Tim Robbins’ Touchstone feature film, Cradle Will Rock. His play, Maverick, about video with Orson Welles, was staged off-Broadway in New York City in 2019.