The last two posts (Part 1, Part 2) by your intrepid compressionist focused on dynamic concatenation and sensible rendition selection. This post dives deeper toward the core of building a flexible video platform designed for growth. You can download the series "Architecting a Video Encoding Strategy Designed for Growth" as a whitepaper here.
A Quick Recap
For years, there were two basic models of Internet streaming: server-based streaming using proprietary technology such as RTMP, and progressive download. Server-based streaming allows the delivery of multi-bitrate streams that can be switched on demand, but it requires licensing expensive software. Progressive download can be served by Apache, but switching bitrates requires playback to stop.
The advent of HTTP-based streaming protocols such as HLS and Smooth Streaming meant that streaming delivery was possible over standard HTTP connections using commodity server technology such as Apache; seamless bitrate switching became commonplace, and delivery over CDNs was simple, as it was fundamentally the same as delivering any file over HTTP. HTTP streaming has resulted in nothing short of a revolution in the delivery of streaming media, vastly reducing the cost and complexity of high-quality streaming.
When designing a video platform there are countless things to consider; however, one of the most important and oft-overlooked decisions is how to treat HTTP-based manifest files.
A Static Manifest File
In the physical world, when you purchase a video, you look at the packaging, grab the box, head to the checkout stand, pay the cashier, go home and insert it into your player.
Most video platforms are structured pretty similarly; fundamentally, a collection of metadata (the box) is associated with a playable media item (the video). Most video platforms start with the concept of a single URL that connects the metadata to a single MP4 video. As a platform becomes more complex, there may be multiple URLs connected to the metadata, representing multiple bitrates, resolutions, or other media associated with the main item, such as previews or special features.
Things become more complicated when trying to extend the physical model to an online streaming world that includes HTTP-based streaming protocols such as HLS. HLS is based on many fragments of a video file linked together by a text file called a manifest. When implementing HLS, the most straightforward method is to simply add a URL that links to the manifest, or m3u8 file. This has the benefit of being extremely easy, basically fitting into the existing model.
The drawback is that HLS is not really a static media item. An MP4, for example, is very much like a video track on a DVD: it’s a single video at a single resolution and bitrate. An HLS manifest, by contrast, most likely points to multiple bitrates, multiple resolutions, and thousands of fragmented pieces of video. HLS has the capacity to do so much more than an MP4, so why treat it the same?
The HLS Playlist
An HLS playlist includes some metadata that describes basic elements of the stream, plus an ordered set of links to fragments of the video. By downloading each fragment, or segment, of the video and playing them back in sequence, the user watches what appears to be a single continuous video.
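A minimal media playlist of this form, with hypothetical segment filenames and durations, looks something like:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment00000.ts
#EXTINF:10.0,
segment00001.ts
#EXTINF:10.0,
segment00002.ts
#EXTINF:10.0,
segment00003.ts
#EXT-X-ENDLIST
```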
Above is a basic m3u8 playlist. It links to four video segments. To generate this data programmatically, all that is needed is the filename of the first item, the target duration of the segments (in this case, 10), and the total number of segments.
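As a sketch of that generation step, assuming a zero-padded segment-naming scheme (the function name and naming convention are illustrative, not from the original):

```python
import re

def generate_playlist(first_segment, target_duration, segment_count):
    """Build a basic m3u8 media playlist from the first segment's
    filename, the target duration, and the total segment count."""
    # Split e.g. "segment00000.ts" into prefix, zero-padded index, extension.
    match = re.match(r"(.*?)(\d+)(\.\w+)$", first_segment)
    prefix, start, ext = match.group(1), int(match.group(2)), match.group(3)
    width = len(match.group(2))

    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i in range(segment_count):
        lines.append(f"#EXTINF:{target_duration}.0,")
        lines.append(f"{prefix}{start + i:0{width}d}{ext}")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)
```

In a real system the per-segment durations would come from the encoder rather than being assumed equal to the target duration.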
The HLS Manifest
An HLS manifest is an unordered series of links to playlists. There are two reasons for having multiple playlists: to provide various bitrates and to provide backup playlists. Here is a typical manifest, where each of the .m3u8’s is a relative link to an HLS playlist:
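An illustrative manifest in that shape (the bandwidth values and relative paths are hypothetical, chosen to match the bitrate list used later in this post) might be:

```
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2040000
2040.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1540000
1540.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1040000
1040.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=640000
640.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=440000
440.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=240000
240.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=64000
64.m3u8
```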
The playlists are of varying bitrates and resolutions in order to provide smooth playback regardless of the network conditions. All that is needed to generate a manifest are the bitrates of each playlist and their relative paths.
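That generation step can be sketched in a few lines of Python (the function and its inputs are assumptions for illustration):

```python
def generate_manifest(renditions):
    """Build an HLS master manifest from (bitrate_kbps, relative_path)
    pairs, emitted in the order given."""
    lines = ["#EXTM3U"]
    for bitrate_kbps, path in renditions:
        # BANDWIDTH is expressed in bits per second in the manifest.
        lines.append(
            f"#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH={bitrate_kbps * 1000}"
        )
        lines.append(path)
    return "\n".join(lines)
```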
Filling in the Blanks
There are many other important pieces of information that an online video platform should be capturing for each encoded video asset: video codec, audio codec, container, and total bitrate are just a few. The data stored for a single video item should be meaningful to the viewer (description, rating, cast), meaningful to the platform (duration, views, engagement), and meaningful for applications (format, resolution, bitrate). With this data, you enable a viewer to decide what to watch, the system to decide how to program, and the application to decide how to playback.
By capturing the data necessary to programmatically generate a playlist, a manifest and the codec information for each of the playlists, it becomes possible to have a system where manifests and playlists are generated per request.
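A minimal sketch of per-request generation, assuming a hypothetical in-memory store of rendition metadata (paths, resolutions, and codec strings here are invented for illustration):

```python
# Hypothetical stored metadata for one video's renditions, keyed by bitrate (kbps).
RENDITIONS = {
    2040: {"path": "2040.m3u8", "resolution": "1920x1080",
           "codecs": "avc1.64001f,mp4a.40.2"},
    440:  {"path": "440.m3u8", "resolution": "640x360",
           "codecs": "avc1.42001e,mp4a.40.2"},
}

def manifest_for_request(requested_bitrates):
    """Generate a master manifest on demand, honoring the requested order
    and skipping bitrates the platform has not stored."""
    lines = ["#EXTM3U"]
    for kbps in requested_bitrates:
        meta = RENDITIONS.get(kbps)
        if meta is None:
            continue  # no such rendition stored for this video
        lines.append(
            f'#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH={kbps * 1000},'
            f'RESOLUTION={meta["resolution"]},CODECS="{meta["codecs"]}"'
        )
        lines.append(meta["path"])
    return "\n".join(lines)
```

In practice this would sit behind an HTTP endpoint, with the requested bitrate order coming from the client and the rendition metadata coming from the platform's database.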
Example - The First Playlist
The HLS specification states that whichever playlist comes first in the manifest will be the first chosen for playback. In the previous section’s example, the first item in the list was also the highest-quality track. That is fine for users with a fast, stable Internet connection, but for people with slower connections it will take some time for playback to start.
It would be better to determine whether the device appears to have a good Internet connection and then customize the manifest accordingly. Luckily, with dynamic manifest generation, that is exactly what the system is set up to accomplish.
For the purposes of this exercise, assume a request for a manifest is made with an ordered array of bitrates. For example, the request [2040,1540,1040,640,440,240,64] would return a manifest identical to the one in the previous section. On iOS, it’s possible to determine whether the user is on WiFi or a cellular connection. Since data has been captured about each playlist, including bitrate, resolution, and other parameters, an app can intelligently decide how to order the manifest.
For example, it may be determined that it’s best to start between 800-1200kbps if the user is on WiFi and between 200-600kbps if the user is on a cellular connection. If the user were on WiFi, the app would request an array that looks something like: [1040,2040,1540,640,440,240,64]. If the app detected only a cellular connection, it would request [440,2040,1540,1040,640,240,64].
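A sketch of that reordering logic, taking the kbps ranges above as assumptions:

```python
def reorder_for_connection(bitrates, on_wifi):
    """Move a rendition in the preferred starting range to the front,
    keeping the rest in their original (quality-descending) order.
    Ranges are illustrative: 800-1200 kbps on WiFi, 200-600 on cellular."""
    low, high = (800, 1200) if on_wifi else (200, 600)
    # First bitrate inside the preferred range; fall back to the original first.
    start = next((b for b in bitrates if low <= b <= high), bitrates[0])
    return [start] + [b for b in bitrates if b != start]
```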
Example - The Legacy Device
On Android, video support is a bit of a black box. For years, the official Android documentation only supported the use of 640x480 baseline H.264 MP4 video, even though certain models were able to handle 1080p. In the case of HLS, support is even more fragmented and difficult to understand.
Luckily, Android is dominated by a handful of marquee devices. With dynamic manifests, the app can not only target the best playlist to start with, but can also exclude playlists determined to be incompatible.
Since the media items also capture data such as resolution and codec information, support can be targeted at specific devices. An app could request all of the renditions: [2040,1540,1040,640,440,440,240,64] minus duplicates, i.e. [2040,1540,1040,640,440,240,64]. An older device that only supports up to 720p could drop the highest rendition: [1540,1040,640,440,240,64]. And beyond the world of mobile devices, a connected-TV app could drop the lowest-quality renditions: [2040,1540,1040,640].
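A sketch of that filtering, assuming hypothetical per-device bitrate limits:

```python
def filter_renditions(bitrates, min_kbps=None, max_kbps=None):
    """Drop renditions outside a device's supported range, e.g. cap an
    older phone at its highest supported bitrate, or drop the lowest
    renditions for a connected TV."""
    return [b for b in bitrates
            if (min_kbps is None or b >= min_kbps)
            and (max_kbps is None or b <= max_kbps)]
```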
Dynamic or Static?
Your favorite compressionist’s post on concatenation focused on a single benefit of dynamic manifest generation: the ability to generate different combinations of clips to create an almost limitless number of videos. This post has focused on the other benefits: targeted delivery, optimized start times, and wider device support.
Choosing a static manifest model is perfectly fine. Some flexibility is lost, but there is nothing wrong with simplicity. Many use cases, especially in the user-generated content world, do not require the amount of complexity dynamic generation involves; however, dynamic manifest generation opens a lot of doors for those willing to take the plunge.
How manifests are treated will have a significant impact on the long-term flexibility of a video platform, and it is something that should be discussed with your resident compressionist.