By Jon-Carlos Rivera

Former Principal Software Engineer at Brightcove

Using an MP4 Inspector to Analyze Video Transmuxing Output

Tech Talk

While building Mux.js—the transmuxer at the heart of videojs-contrib-hls—we faced a problem: How do we determine if the output from Mux.js is correct?

Early on, we managed to figure out how to coax FFmpeg into creating MP4s from MPEG2-TS segments that would play back in a browser with Media Source Extensions (MSE), which at the time meant only Chrome. However, we needed a simple way to compare the output of our transmuxer with what was produced by FFmpeg. The comparison had to be aware of the MP4 format since the two outputs are extremely unlikely to be byte-identical.

Building an MP4 Inspector

The answer was to build an MP4 inspector: a tool that parses MP4s and displays a JSON-like dump of the relevant boxes and their contents. By generating a dump of the output from Mux.js and comparing it to a known-good fragment generated with FFmpeg, we could see where our transmuxer's output differed.
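To make the idea of a box dump concrete, here is a minimal sketch in Python. It is not the inspector's actual code. The names `parse_boxes`, `make_box`, and `CONTAINER_TYPES` are invented for illustration, and the list of container boxes is deliberately incomplete. It relies only on the fact that every box starts with a 32-bit big-endian size followed by a four-character type.

```python
import struct

# Box types that only hold other boxes (a small, non-exhaustive selection).
CONTAINER_TYPES = {"moov", "trak", "mdia", "minf", "stbl", "moof", "traf", "mvex"}

def parse_boxes(data, offset=0, end=None):
    """Walk a byte string and return a nested list of box descriptions."""
    end = len(data) if end is None else end
    boxes = []
    while offset + 8 <= end:
        # Every box starts with a 32-bit big-endian size and a 4-character type.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        box_type = raw_type.decode("latin-1")
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > end:
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of its container.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box: stop rather than guess
        box = {"type": box_type, "size": size}
        if box_type in CONTAINER_TYPES:
            box["children"] = parse_boxes(data, offset + header, offset + size)
        boxes.append(box)
        offset += size
    return boxes

def make_box(box_type, payload=b""):
    """Build a box for demonstration purposes only."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload

# A hand-built fragment: a moof containing mfhd and traf, followed by an mdat.
sample = make_box("moof", make_box("mfhd", b"\x00" * 8) + make_box("traf"))
sample += make_box("mdat", b"\x01\x02")
dump = parse_boxes(sample)
```

A real inspector would also decode the fields inside each box, since two fragments can share the same box layout and still differ in the values that matter.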

The MP4 inspector was built as a web page so that we could have a graphical, color-coded diff of the two segments. Over time, the page gained a video element, and we started appending the results of transmuxing segments directly into the video element's MediaSource for instant feedback and validation of changes to Mux.js.
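The comparison itself can be approximated in a few lines. The sketch below assumes each segment has already been reduced to a JSON-serializable dump. It uses a plain-text unified diff from Python's standard library as a stand-in for the color-coded view. The `diff_dumps` helper and the sample dumps are hypothetical.

```python
import difflib
import json

def diff_dumps(expected, actual):
    """Return unified-diff lines between two JSON-serializable box dumps."""
    a = json.dumps(expected, indent=2, sort_keys=True).splitlines()
    b = json.dumps(actual, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(a, b, fromfile="ffmpeg", tofile="mux.js", lineterm=""))

# Two made-up dumps that differ only in the size of the mdat box.
reference = [{"type": "moof", "size": 32}, {"type": "mdat", "size": 10}]
candidate = [{"type": "moof", "size": 32}, {"type": "mdat", "size": 12}]

diff_lines = diff_dumps(reference, candidate)
print("\n".join(diff_lines))
```

Serializing with sorted keys keeps the diff focused on real differences rather than on the order in which fields happened to be emitted.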

As development continued, we would sometimes encounter streams that would fail in new and interesting ways. Some of these failures were, admittedly, due to bugs in Mux.js. As Mux.js itself became more robust, failures were increasingly caused by problems with the streams or issues with a particular implementation of the MSE specification.

It eventually dawned on us that we needed to learn more about what was happening inside those videos. Seeing what was happening at the media container level was not enough; we needed to peek into the video data itself. For that purpose, we created Thumbcoil.

Thumbcoil is a suite of tools designed to give you a peek into the internals of H.264 video bitstreams contained inside either an MP4 or MPEG2-TS container file. Using the tools in Thumbcoil you can get a detailed view of the internal structure of the two supported media container formats.

In addition, the tools have the ability to show you the information contained within the most important NAL units that make up the H.264 bitstream. Ever wonder what kind of secret information the video encoder has squirreled away for decoders to use? With Thumbcoil, you can see for yourself.

Why We Built an MP4 Inspector

In 2016, there were very few good tools to generate a somewhat graphical display of the structure of media containers and the data that they contain. Debugging problems with video playback is usually a tedious task involving various esoteric FFmpeg and FFprobe incantations. Unfortunately, even at its best, FFprobe is only able to print out a small portion of the data we were interested in.

The exact data inside the various parameter sets, for instance, is not available via the command line. Inside FFprobe, that data is parsed and stored, but there is no easy way to dump it in a human-readable form.

In H.264, there are two special types of NAL units: the seq_parameter_set (SPS) and the pic_parameter_set (PPS). These two NAL units contain a lot of information. The decoders require this information to reconstruct the video.
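As a small illustration of how a tool can tell these NAL units apart, the following sketch decodes the one-byte NAL unit header defined by H.264: one forbidden bit, two bits of nal_ref_idc, and five bits of nal_unit_type, where type 7 is an SPS and type 8 is a PPS. The function name and the short lookup table are assumptions made for this example, not Thumbcoil's API.

```python
# A few common nal_unit_type values; the specification defines more.
NAL_UNIT_TYPES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "seq_parameter_set",
    8: "pic_parameter_set",
    9: "access unit delimiter",
}

def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its three header fields."""
    nal_unit_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": first_byte >> 7,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": nal_unit_type,
        "name": NAL_UNIT_TYPES.get(nal_unit_type, "other"),
    }

# 0x67 and 0x68 are the header bytes commonly seen on SPS and PPS units.
sps_header = parse_nal_header(0x67)
pps_header = parse_nal_header(0x68)
```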

Thumbcoil not only provides parameter set information in excruciating detail but also keeps the information with its surrounding context—the boxes it was contained by or the frame it was specified along with. This context is often very important to understanding issues or peculiarities in streams.

How We Built an MP4 Inspector

One of the more interesting things about how Thumbcoil parses parameter sets is that it builds what is internally called a "codec" for each NAL unit type. These codecs are specified using what is essentially a fancy parser-combinator-style setup.

Much of the data in the two parameter sets is stored using a method called Exponential-Golomb encoding. This method uses a variable number of bits to store numbers and is particularly suited to values that tend to be small.
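For readers who have not met this encoding before, here is a minimal decoding sketch. An unsigned value is written as N zero bits, a one bit, and then N more bits. The decoded number is 2^N - 1 plus the value of those trailing bits. Signed values map the unsigned codes 1, 2, 3, 4 to +1, -1, +2, -2. The `BitReader` class is written for this example. It also assumes that emulation prevention bytes have already been stripped from the NAL unit payload.

```python
class BitReader:
    """Reads bits from a bytes object, most significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_bit(self):
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, count):
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self):
        """Unsigned Exponential-Golomb: count leading zeros, then read that many bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)

    def read_se(self):
        """Signed Exponential-Golomb: odd codes are positive, even codes are negative."""
        code = self.read_ue()
        return (code + 1) // 2 if code % 2 else -(code // 2)

# The bit string 1 010 011 00100 00101, padded with zeros to three bytes.
payload = bytes([0xA6, 0x42, 0x80])

reader = BitReader(payload)
unsigned_values = [reader.read_ue() for _ in range(5)]

reader = BitReader(payload)
signed_values = [reader.read_se() for _ in range(5)]
```

The five values occupy 17 bits in total. The value 0 costs a single bit, which is why the scheme suits fields that are usually small.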

Each function used to build the codec returns an object with two functions: decode and encode. This means that we can specify the format of, say, a seq_parameter_set NAL unit just once and then we can both parse from and write to the bitstream for that particular NAL unit.
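The following sketch shows the general shape of such a setup. It is an illustration under stated assumptions and not Thumbcoil's actual API: the names `Codec`, `Bits`, `u`, `ue`, and `sequence` are invented here, and the bit container is a plain list chosen for clarity over speed. Each builder returns a pair of decode and encode functions, and `sequence` composes them so that one description drives both directions.

```python
from collections import namedtuple

Codec = namedtuple("Codec", ["decode", "encode"])

class Bits:
    """A list of 0/1 values with a read cursor; simple rather than fast."""
    def __init__(self, bits=None):
        self.bits = list(bits or [])
        self.pos = 0

    def read(self, count):
        value = 0
        for _ in range(count):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value

    def write(self, value, count):
        for shift in range(count - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

def u(name, count):
    """Fixed-width unsigned field."""
    def decode(bits, out):
        out[name] = bits.read(count)
    def encode(bits, values):
        bits.write(values[name], count)
    return Codec(decode, encode)

def ue(name):
    """Unsigned Exponential-Golomb field."""
    def decode(bits, out):
        zeros = 0
        while bits.read(1) == 0:
            zeros += 1
        out[name] = (1 << zeros) - 1 + bits.read(zeros)
    def encode(bits, values):
        code = values[name] + 1
        length = code.bit_length()
        bits.write(0, length - 1)  # leading zeros
        bits.write(code, length)   # the value plus one, in binary
    return Codec(decode, encode)

def sequence(*fields):
    """Run the given codecs in order, for decoding and for encoding."""
    def decode(bits, out=None):
        out = {} if out is None else out
        for field in fields:
            field.decode(bits, out)
        return out
    def encode(bits, values):
        for field in fields:
            field.encode(bits, values)
        return bits
    return Codec(decode, encode)

# Only the first few fields of a seq_parameter_set, as a toy example.
sps_prefix = sequence(
    u("profile_idc", 8),
    u("constraint_flags", 8),  # flag bits and reserved bits lumped together here
    u("level_idc", 8),
    ue("seq_parameter_set_id"),
)

fields = {"profile_idc": 66, "constraint_flags": 0xC0, "level_idc": 30, "seq_parameter_set_id": 3}
encoded = sps_prefix.encode(Bits(), fields)
decoded = sps_prefix.decode(Bits(encoded.bits))
```

Because the field list is written once, a round trip through encode and decode doubles as a cheap consistency check on the description itself.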

The grammar used to specify NAL unit codecs is very similar to the grammar used by the H.264 specification (ISO/IEC 14496-10). The data types that the codecs in Thumbcoil understand are, with some extensions, the same types defined in the specification, such as signed and unsigned Exponential-Golomb-encoded integers.

In addition to the parameter sets, Thumbcoil provides insight into the structure of the slice layers themselves by parsing the slice_header data. However, we stop short of parsing any of the actual slice_data because things quickly become more difficult and less useful as you descend into that detail.

As with all Video.js projects, Thumbcoil is open-source software and we welcome suggestions, issues, and contributions on GitHub.

Technical Glossary

  • Transmuxer. A transmuxer takes media contained in some file format, extracts the raw compressed video and audio from inside (a process called demuxing) and repackages the compressed data into another format (termed remuxing) without performing any re-compression.

  • MP4. MP4 files are composed of boxes—hierarchical logical units that, conveniently, all start with a 32-bit length and a 32-bit box-type. Boxes will often contain other sub-boxes.

  • Media container. Inside a media container such as MP4, video and audio are contained in data called bitstreams. Bitstreams are the data produced by encoders to represent the audio signals or video frames. Some common bitstreams are AAC for audio and H.264 (AVC) for video.

  • NAL units. An H.264 encoded bitstream is composed of what are called Network Abstraction Layer (NAL) units. NALs are a simple packet format designed to use bits as efficiently as possible.
