The Evolution of Low-Latency Video Streaming


In the past few years, the video streaming industry has shown strong interest in low-latency OTT (Over-The-Top) streaming, which is often considered an enabler of many emerging OTT video applications, such as online betting and live event streaming that requires high user interactivity (e.g., online webinars, online education, live sports events, video gaming, and video surveillance). Conventionally, live OTT streaming technologies such as HLS and DASH, with delays of 20+ seconds from the live edge, are considered slower than live cable broadcasting, which has a sub-5-second delay. This gap arises because both HLS and DASH encode a video stream into segments a few seconds long, and a video segment cannot be decoded and rendered until the whole segment is downloaded. Furthermore, streaming players often have to buffer a few segments before the video starts to play. Although such additional buffering may give users a smoother OTT streaming experience over the highly dynamic public Internet, it further increases streaming delay.

In recent years, both the HLS and DASH standards have introduced low-latency modes, known as Low-Latency HLS (LL-HLS) and Low-Latency DASH (LL-DASH), which aim to cut streaming delay in order to accommodate highly interactive applications. LL-HLS and LL-DASH share two common design principles: (1) Chunked Video Encoding, which encodes a long video segment as a sequence of shorter chunks; and (2) Chunked Segment Transfer, which transfers these shorter chunks to streaming players as soon as they are generated. Ideally, at any time during a livestreaming session, an LL-HLS or LL-DASH player has to buffer only the latest video chunk generated by the encoder before it can start decoding and rendering the video stream. In this way, the streaming delay is reduced from the duration of one segment (e.g., 10 seconds for HLS live) to the duration of one chunk (e.g., a few hundred milliseconds).
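The arithmetic behind this reduction can be sketched in a few lines of Python. This is only an idealized lower bound under stated assumptions: it counts the media a player must hold before playback and ignores encoding, network, and decoding delay. The names StreamConfig, min_latency_segmented, and min_latency_chunked are hypothetical and chosen for illustration; they do not come from any HLS or DASH library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical stream parameters, used only for this illustration."""
    segment_s: float     # duration of one full segment, in seconds
    chunk_s: float       # duration of one chunk within a segment, in seconds
    buffered_units: int  # segments (or chunks) held before playback starts


def min_latency_segmented(cfg: StreamConfig) -> float:
    """Idealized delay when whole segments must be downloaded and buffered."""
    return cfg.segment_s * cfg.buffered_units


def min_latency_chunked(cfg: StreamConfig) -> float:
    """Idealized delay when chunks are delivered as soon as they are encoded."""
    return cfg.chunk_s * cfg.buffered_units


# Example values taken from the text: 10-second segments, with 0.5 seconds
# standing in for "a few hundred milliseconds" per chunk (an assumption).
cfg = StreamConfig(segment_s=10.0, chunk_s=0.5, buffered_units=1)
print(min_latency_segmented(cfg), min_latency_chunked(cfg))
```

Raising buffered_units shows how player-side buffering multiplies the delay in both modes, which is why a conventional player that holds several segments ends up well above the duration of a single segment.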

There are currently multiple open-source and proprietary implementations of LL-DASH/LL-HLS servers and players on the market. Many of them have demonstrated lower streaming delay when only a single-bitrate HLS/DASH stream is used and when they stream over high-speed network connections. However, the real performance of these low-latency server/player implementations in more complex environments, for example with multi-bitrate streams and variable network conditions, has yet to be evaluated thoroughly.

In this work, we first develop novel low-latency livestreaming evaluation frameworks for both LL-HLS and LL-DASH. We then use these frameworks to evaluate the performance of low-latency players and streaming protocols. The evaluation is based on a series of livestreaming experiments repeated with identical video content, encoding profiles, and network conditions, the latter emulated using traces of real-world networks. In our experiments, we capture and report a variety of system performance metrics, including average stream bitrate, the amount of downloaded media data, streaming latency, and buffering and stream switching statistics. These results are subsequently used to describe the observed differences in the performance of LL-HLS and LL-DASH-based players and systems.
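To make the metrics listed above concrete, the following Python sketch shows one way they could be aggregated from periodic player samples. It is an assumption-laden illustration, not the evaluation framework used in this work: the Sample record, its fields, and the summarize function are hypothetical, and samples are assumed to be taken at a fixed interval so that a plain mean is meaningful.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    """One periodic player measurement (hypothetical schema)."""
    bitrate_kbps: int      # bitrate of the rendition being played
    bytes_downloaded: int  # media bytes fetched since the previous sample
    latency_s: float       # distance from the live edge, in seconds
    stalled: bool          # True if playback was rebuffering at this sample


def summarize(samples: List[Sample]) -> Dict[str, float]:
    """Aggregate per-session metrics from equally spaced samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    n = len(samples)
    # A switch is any change of rendition between consecutive samples.
    switches = sum(
        1 for a, b in zip(samples, samples[1:])
        if a.bitrate_kbps != b.bitrate_kbps
    )
    # A stall event starts when a stalled sample follows a non-stalled one
    # (or when the very first sample is already stalled).
    stall_events = sum(
        1 for i, s in enumerate(samples)
        if s.stalled and (i == 0 or not samples[i - 1].stalled)
    )
    return {
        "avg_bitrate_kbps": sum(s.bitrate_kbps for s in samples) / n,
        "total_bytes": float(sum(s.bytes_downloaded for s in samples)),
        "avg_latency_s": sum(s.latency_s for s in samples) / n,
        "bitrate_switches": float(switches),
        "stall_events": float(stall_events),
        "stalled_samples": float(sum(1 for s in samples if s.stalled)),
    }
```

Running the same aggregation over sessions that share content, encoding profile, and network trace is what would make the numbers comparable across players.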

Overall, our experiments indicate that current implementations of LL-HLS and LL-DASH technologies do not appear to be fully mature. Additional improvements to LL-HLS and LL-DASH streaming clients, encoders, and servers are likely to be needed to make them more reliable and deployable at scale.