Last week was the sixth annual Foundations of Open Media Software (FOMS) workshop, an unconference where engineers working on video-related software get together to discuss future standards and video technology. Topics include browser technology and specifications, video formats, and more.
Google and Brightcove were the main sponsors of the event, with the Internet Archive graciously providing the meeting space. Around 45 engineers attended FOMS, representing a broad range of companies and projects, including YouTube, Netflix, Dreamworks, Internet Archive, Wikimedia, Wowza, Kaltura, JWPlayer, W3C, Apple (WebKit), Chrome, Firefox, Opera, WebM, Ogg, VLC, FFmpeg, Libav, and Brightcove.
The biggest discussions at FOMS this year were around captions and subtitles (WebVTT), adaptive streaming (through Media Source Extensions), DRM (through Encrypted Media Extensions), new codecs, and real-time communication (WebRTC).
More information and session notes are available at foms-workshop.org; below are some of the notes compiled by Brightcove attendees.
Media Source Extensions
Media Source Extensions (MSE) will make adaptive streaming (HLS, DASH) possible in HTML5 video by letting developers control, from JavaScript, exactly which bytes of video are fed to a video element. The API was first proposed at FOMS two years ago, and the Google Chrome team has continued to refine and implement the spec.
- Chrome: Working implementation using a slightly older version of the API; supports WebM/VP8/VP9/Vorbis and H.264/AAC/MP3
- Internet Explorer: Working implementation in IE11; supports H.264/AAC
- Firefox: No support yet, but work is under way
- Safari: No support yet. Apple engineers hinted that they are working on it, but for desktop only.
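Under MSE, page script rather than the browser fetches media segments and hands them to a SourceBuffer through appendBuffer(). A SourceBuffer rejects a new append while the previous one is still in progress (its updating flag is true), so players typically queue segments and feed the next one when the updateend event fires. The sketch below shows that pattern using the names from the current MSE draft, not the older API Chrome shipped; SourceBufferLike, SegmentAppender and FakeSourceBuffer are hypothetical names introduced so it can run outside a browser.

```typescript
// Hypothetical minimal subset of the browser's SourceBuffer used by this sketch.
interface SourceBufferLike {
  readonly updating: boolean;
  appendBuffer(data: ArrayBuffer): void;
  addEventListener(type: "updateend", listener: () => void): void;
}

// Queues media segments and appends them one at a time, because a
// SourceBuffer refuses a new append while `updating` is true.
class SegmentAppender {
  private readonly sb: SourceBufferLike;
  private readonly queue: ArrayBuffer[] = [];

  constructor(sb: SourceBufferLike) {
    this.sb = sb;
    // Each finished append triggers an attempt to append the next segment.
    sb.addEventListener("updateend", () => this.pump());
  }

  enqueue(segment: ArrayBuffer): void {
    this.queue.push(segment);
    this.pump();
  }

  get pending(): number {
    return this.queue.length;
  }

  private pump(): void {
    if (this.sb.updating) return; // wait for the next "updateend"
    const next = this.queue.shift();
    if (next !== undefined) this.sb.appendBuffer(next);
  }
}

// Stand-in for a real SourceBuffer so the sketch runs without a browser:
// an append stays "in progress" until finish() is called.
class FakeSourceBuffer implements SourceBufferLike {
  updating = false;
  readonly appendedSizes: number[] = [];
  private readonly listeners: Array<() => void> = [];

  addEventListener(_type: "updateend", listener: () => void): void {
    this.listeners.push(listener);
  }

  appendBuffer(data: ArrayBuffer): void {
    if (this.updating) throw new Error("InvalidStateError: append in progress");
    this.updating = true;
    this.appendedSizes.push(data.byteLength);
  }

  finish(): void {
    this.updating = false;
    for (const listener of this.listeners) listener();
  }
}

// Usage: three segments arrive at once but are appended sequentially.
const fake = new FakeSourceBuffer();
const appender = new SegmentAppender(fake);
[10, 20, 30].forEach((size) => appender.enqueue(new ArrayBuffer(size)));
console.log(fake.appendedSizes.join(",")); // 10
fake.finish();
fake.finish();
console.log(fake.appendedSizes.join(",")); // 10,20,30
```

In a browser, the SourceBuffer would instead come from MediaSource.addSourceBuffer() with a MIME type such as video/webm; codecs="vp9", once a MediaSource attached to the video element (through URL.createObjectURL()) has fired sourceopen.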
Most interesting notes:
- YouTube is using MSE in production now for Chrome and IE11 users. It gives them greater control over buffering and adapting, and has helped them increase "watch time", their primary metric for A/B testing player changes.
- MSE doesn't care whether audio and video use the same container format, so you can play VP9 video (in WebM) alongside AAC audio (in MP4), or H.264 video with Vorbis audio. This happens on YouTube today, depending on which files are cached.
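Because the player script decides which segments to fetch, the adaptation logic itself lives in the page. The sketch below is a generic throughput-based heuristic, not YouTube's algorithm: it picks the highest-bitrate rendition that fits within a safety margin of measured throughput, and spends less of that throughput when little video is buffered. The Rendition type, chooseRendition, the bitrate ladder, the 0.5/0.8 safety factors and the 10-second buffer threshold are all hypothetical.

```typescript
// One encoded version of the same video (hypothetical shape).
interface Rendition {
  label: string;
  bitrateBps: number; // average bitrate in bits per second
}

// Hypothetical tuning values: be more conservative when the buffer is low,
// since a wrong guess then leads to a stall sooner.
const LOW_BUFFER_SECONDS = 10;
const SAFETY_LOW_BUFFER = 0.5;
const SAFETY_NORMAL = 0.8;

function chooseRendition(
  renditions: readonly Rendition[],
  throughputBps: number,
  bufferedSeconds: number,
): Rendition {
  if (renditions.length === 0) throw new Error("no renditions to choose from");
  const sorted = [...renditions].sort((a, b) => a.bitrateBps - b.bitrateBps);
  const safety = bufferedSeconds < LOW_BUFFER_SECONDS ? SAFETY_LOW_BUFFER : SAFETY_NORMAL;
  const budget = throughputBps * safety;
  // Default to the lowest rendition so there is always something playable,
  // then walk up the ladder while the bitrate still fits the budget.
  let choice = sorted[0]!;
  for (const r of sorted) {
    if (r.bitrateBps <= budget) choice = r;
  }
  return choice;
}

const ladder: Rendition[] = [
  { label: "240p", bitrateBps: 250_000 },
  { label: "360p", bitrateBps: 750_000 },
  { label: "720p", bitrateBps: 1_500_000 },
  { label: "1080p", bitrateBps: 3_000_000 },
];

console.log(chooseRendition(ladder, 2_000_000, 30).label); // 720p
console.log(chooseRendition(ladder, 2_000_000, 5).label); // 360p
```

With a healthy buffer the player spends up to 80% of the measured 2 Mbit/s and picks 720p; with only 5 seconds buffered it drops to 360p.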
Encrypted Media Extensions
Encrypted Media Extensions (EME) is the proposed API for content protection (DRM) in HTML5 video.
- The spec is still being defined. It was described as being in “last call”, though it is currently still a working draft.
- Using it requires a key (license) server, and none is available to the public yet.
- Chrome: Supported (vendor-prefixed, older spec). It is available on millions of TVs through Chrome and is also supported by Chromecast.
- IE11: Supported (vendor-prefixed, current spec)
- Safari: Has an implementation; some of it is exposed in OS X Mavericks
- Firefox: No word yet
Most interesting notes:
- It relies on Common Encryption (CENC), which allows a video to be encrypted once and decrypted by any supporting DRM system. However, this makes it incompatible with existing encrypted files that don't use this scheme.
- YouTube, Netflix, Chromecast and Google Play are all using EME in production. It's replacing Flash Access on YouTube.
- You can try it today, without a license server, using the Clear Key system: http://simpl.info/eme/clearkey/
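With Clear Key, the "license" is simply the content key delivered in the clear, which is why it works for testing without a license server. The W3C EME specification defines the Clear Key license as a JSON Web Key (JWK) set in which the key ID ("kid") and key ("k") are base64url-encoded without padding; the vendor-prefixed implementations discussed at FOMS may expect a different format. The sketch below builds such a license from hex strings; in a browser the resulting string would be UTF-8 encoded and passed to MediaKeySession.update(). The helper names and the example key values are illustrative only.

```typescript
const BASE64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// base64url without "=" padding, as used for the JWK "kid" and "k" fields.
function base64UrlEncode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += BASE64URL_ALPHABET.charAt((triple >> 18) & 63);
    out += BASE64URL_ALPHABET.charAt((triple >> 12) & 63);
    // Emit only as many characters as the input bytes cover (no padding).
    if (i + 1 < bytes.length) out += BASE64URL_ALPHABET.charAt((triple >> 6) & 63);
    if (i + 2 < bytes.length) out += BASE64URL_ALPHABET.charAt(triple & 63);
  }
  return out;
}

// Convert a hex string such as "0011aaff" into bytes.
function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error(`invalid hex string: ${hex}`);
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

// Build a Clear Key license: a JWK set holding one symmetric ("oct") key.
function clearKeyLicense(keyIdHex: string, keyHex: string): string {
  return JSON.stringify({
    keys: [
      {
        kty: "oct",
        kid: base64UrlEncode(hexToBytes(keyIdHex)),
        k: base64UrlEncode(hexToBytes(keyHex)),
      },
    ],
    type: "temporary",
  });
}

// Usage with a made-up 16-byte key ID and an all-zero test key.
const license = clearKeyLicense("000102030405060708090a0b0c0d0e0f", "00".repeat(16));
console.log(license);
```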
WebVTT (Captions in HTML5 Video)
WebVTT is the format for providing captions (and other timed text) in HTML5 video. It does not yet meet FCC captioning requirements, and much of the discussion at FOMS was about how to get it there. There is also currently no provision for live captioning.
- Chrome: almost full support for the current spec
- Safari: almost full support for the current spec
- Firefox: decent support, but playing catch-up with other browsers
- IE: not sure
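A WebVTT file is plain text: a WEBVTT header line, then blank-line-separated cues, each made of an optional identifier, a timing line (start --> end, optionally followed by cue settings), and the cue text. The parser below is a deliberately minimal sketch: it skips blocks without a timing line (such as NOTE comments) and ignores cue settings and markup, so it is not a conforming implementation of the WebVTT parsing rules. The Cue type, parseTimestamp and parseWebVTT are hypothetical names.

```typescript
// One parsed cue (hypothetical shape; settings and markup are not modelled).
interface Cue {
  id: string;
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Parse "hh:mm:ss.ttt" or "mm:ss.ttt" into seconds.
function parseTimestamp(ts: string): number {
  const m = /^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(ts);
  if (!m) throw new Error(`invalid timestamp: ${ts}`);
  const hours = m[1] ? Number(m[1]) : 0;
  return hours * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

function parseWebVTT(source: string): Cue[] {
  // Normalize line endings, then split on blank lines into blocks.
  const blocks = source.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  if (!(blocks[0] ?? "").startsWith("WEBVTT")) throw new Error("missing WEBVTT header");

  const cues: Cue[] = [];
  for (const block of blocks.slice(1)) {
    const lines = block.replace(/\s+$/, "").split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex < 0) continue; // no timing line, e.g. a NOTE comment block

    const [startRaw = "", rest = ""] = lines[timingIndex]!.split("-->");
    // Anything after the end timestamp is cue settings, ignored in this sketch.
    const endRaw = rest.trim().split(/\s+/)[0] ?? "";
    cues.push({
      id: timingIndex > 0 ? lines[0]! : "", // optional identifier line
      start: parseTimestamp(startRaw.trim()),
      end: parseTimestamp(endRaw),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}

const sample = [
  "WEBVTT",
  "",
  "intro",
  "00:00:01.500 --> 00:00:04.000 line:0",
  "Welcome to FOMS.",
  "",
  "NOTE blocks without a timing line are skipped",
  "",
  "01:04.250 --> 01:06.000",
  "Two lines",
  "of text",
].join("\n");

for (const cue of parseWebVTT(sample)) {
  console.log(cue.id || "(no id)", cue.start, cue.end, JSON.stringify(cue.text));
}
```

A browser does this parsing itself when a track element points at a .vtt file; a sketch like this is only useful for tooling or for players that render text outside the video element.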
Most interesting notes:
- Demo of what types of captions need to be supported: CPC Closed Captioning Demo Video
- Newer HLS implementations allow for a WebVTT track
- Chrome with WebM can carry WebVTT text tracks in-band, inside the media file
- JW Player has some good demos of non-caption uses of WebVTT, such as thumbnail previews: http://demo.jwplayer.com/text-tracks/thumbs.html
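In a thumbnail track, each cue's text is an image reference rather than caption text, and the player shows that image while the viewer hovers over the cue's time range. A common convention, which the JW Player demo appears to follow (an assumption here, not something the demo page was checked for), is to point at one tile of a sprite sheet with a Media Fragments xywh suffix. The hypothetical parseThumbnailCue below extracts the image URL and the tile rectangle from such a cue.

```typescript
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Thumbnail {
  url: string; // image or sprite-sheet URL, fragment removed
  region: Region | null; // null means "use the whole image"
}

// Parse thumbnail cue text such as "thumbs.jpg#xywh=160,0,160,90"
// (hypothetical file name). Returns null for empty or malformed payloads.
function parseThumbnailCue(payload: string): Thumbnail | null {
  const text = payload.trim();
  if (text === "") return null;
  const marker = "#xywh=";
  const at = text.indexOf(marker);
  if (at < 0) return { url: text, region: null };
  // Media Fragments allow an optional "pixel:" unit prefix; "percent:" is not handled here.
  const m = /^(?:pixel:)?(\d+),(\d+),(\d+),(\d+)$/.exec(text.slice(at + marker.length));
  if (!m) return null;
  return {
    url: text.slice(0, at),
    region: { x: Number(m[1]), y: Number(m[2]), width: Number(m[3]), height: Number(m[4]) },
  };
}

// Usage: a cue selecting the second 160x90 tile in the top row of a sprite sheet.
console.log(parseThumbnailCue("thumbs.jpg#xywh=160,0,160,90"));
```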
VP9 is now in use on YouTube for a subset of use cases.
- “50%” better compression than VP8 (also stated as 50% of the bitrate at the same quality)
- Many small improvements (larger macroblocks, more efficient signalling for segmented frames, better use of alternate reference frames, etc.) that add up to a substantial overall improvement.
- Hardware decoders coming in Q1 2014
- Bitstream finalized as of June 13, 2013
- Optimizing for ARM as well as Intel
- Has a lossless mode
- Unfortunately, the spec is still pretty rough (it relies on lots of source code snippets, etc.), but they are looking into improving it
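As a rough illustration of what the 50% figure means for storage and delivery, the sketch converts bitrate and duration into bytes (bits per second times seconds, divided by 8). The 2 Mbit/s VP8 bitrate and one-hour duration are made-up inputs, and the 0.5 ratio is the claim as quoted at FOMS, not a measurement; real savings vary with content and encoder settings.

```typescript
// Bytes needed to store `seconds` of media at `bitsPerSecond`.
function streamBytes(bitsPerSecond: number, seconds: number): number {
  return (bitsPerSecond * seconds) / 8;
}

const VP9_TO_VP8_BITRATE_RATIO = 0.5; // the "50% bitrate at same quality" claim

const vp8Bitrate = 2_000_000; // hypothetical VP8 encode at 2 Mbit/s
const oneHour = 3600; // seconds

const vp8Bytes = streamBytes(vp8Bitrate, oneHour);
const vp9Bytes = streamBytes(vp8Bitrate * VP9_TO_VP8_BITRATE_RATIO, oneHour);

console.log(vp8Bytes / 1e6, "MB"); // 900 MB
console.log(vp9Bytes / 1e6, "MB"); // 450 MB
```

At YouTube's scale, halving the bytes per view at equal quality is the main reason a new codec is worth the encoding and decoding cost.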
Opus is an open-source, royalty-free audio codec developed by Xiph/Mozilla folks and included in the WebRTC standard.
- Covers the gamut of Internet use cases, from low-delay speech to high quality surround sound
- Combines two existing codecs, SILK and CELT, and can switch between them or run them together in a hybrid mode
- Was developed within the IETF (it is standardized as RFC 6716), which eased its inclusion in other standards and the processes around them
- Quality on par with or better than AAC at similar bitrates
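The SILK/CELT combination is visible right in the bitstream. Per RFC 6716, every Opus packet starts with a table-of-contents (TOC) byte whose top five bits select one of 32 configurations, each fixing the mode (SILK-only, hybrid, or CELT-only), the audio bandwidth, and the frame duration; the next bit flags stereo and the last two bits give the frame-count code. The sketch decodes that byte following the RFC's configuration table; parseToc and the OpusToc shape are hypothetical names.

```typescript
type OpusMode = "SILK-only" | "Hybrid" | "CELT-only";
type Bandwidth = "NB" | "MB" | "WB" | "SWB" | "FB"; // narrow, medium, wide, super-wide, full

interface OpusToc {
  config: number; // 0..31
  mode: OpusMode;
  bandwidth: Bandwidth;
  frameMs: number;
  stereo: boolean;
  frameCountCode: number; // 0: one frame, 1: two equal, 2: two different sizes, 3: arbitrary
}

function parseToc(toc: number): OpusToc {
  if (!Number.isInteger(toc) || toc < 0 || toc > 255) throw new RangeError("TOC must be a byte");
  const config = toc >> 3; // top 5 bits
  const stereo = ((toc >> 2) & 1) === 1; // "s" bit
  const frameCountCode = toc & 3; // "c" bits

  let mode: OpusMode;
  let bandwidth: Bandwidth;
  let frameMs: number;
  if (config < 12) {
    // Configs 0..11: SILK-only, for speech at NB/MB/WB.
    mode = "SILK-only";
    bandwidth = (["NB", "MB", "WB"] as const)[Math.floor(config / 4)]!;
    frameMs = [10, 20, 40, 60][config % 4]!;
  } else if (config < 16) {
    // Configs 12..15: hybrid, SILK for low frequencies plus CELT above.
    mode = "Hybrid";
    bandwidth = config < 14 ? "SWB" : "FB";
    frameMs = [10, 20][config % 2]!;
  } else {
    // Configs 16..31: CELT-only, including the short low-delay frames.
    mode = "CELT-only";
    bandwidth = (["NB", "WB", "SWB", "FB"] as const)[Math.floor((config - 16) / 4)]!;
    frameMs = [2.5, 5, 10, 20][config % 4]!;
  }
  return { config, mode, bandwidth, frameMs, stereo, frameCountCode };
}

const t = parseToc(0x78);
console.log(t.mode, t.bandwidth, t.frameMs); // Hybrid FB 20
```

The 2.5 ms CELT-only frames are what make Opus suitable for the low-delay end of the range, while the 40 and 60 ms SILK-only frames suit low-bitrate speech.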
Daala is the next-generation video codec being developed by Xiph/Mozilla folks.
- Based on lapped transforms, which avoid blocking artifacts, and solving the major problems with lapped transforms that came up when people considered them in the past.
- Applying lessons learned from Opus: involving the IETF and other standards bodies, and developing in the open, documenting as they go.
- Working with “hardware folks” to avoid features that would make hardware encoders/decoders difficult to implement.
- Using lapped transforms sidesteps a whole class of video patents, since those patents simply don’t apply to the methods used in Daala.
- Could provide the basis for a whole new generation of “incremental improvements”
- Targeting about 20% better compression than H.265
- Expecting a 2015 standards timeline, with a bitstream freeze late in 2015.
- Technically a testbed for codec experiments at this point, not an actual codec.
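A lapped transform avoids hard block edges by letting each block's basis functions overlap its neighbours while still producing one coefficient per input sample. The sketch below uses the MDCT with a sine window, the lapped transform used in audio codecs such as Vorbis and Opus's CELT layer, because it is short to write down in one dimension. Daala's own transform is a different, two-dimensional design, so treat this only as an illustration of the lapping idea: each frame spans 2N samples but yields N coefficients, and overlap-adding the inverse-transformed frames cancels the time-domain aliasing and restores the input. The function names are hypothetical.

```typescript
// Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley).
function sineWindow(N: number): number[] {
  return Array.from({ length: 2 * N }, (_, n) => Math.sin((Math.PI * (n + 0.5)) / (2 * N)));
}

// Forward MDCT: 2N windowed samples -> N coefficients (direct O(N^2) form).
function mdct(frame: number[], N: number): number[] {
  const out: number[] = [];
  for (let k = 0; k < N; k++) {
    let sum = 0;
    for (let n = 0; n < 2 * N; n++) {
      sum += frame[n]! * Math.cos((Math.PI / N) * (n + 0.5 + N / 2) * (k + 0.5));
    }
    out.push(sum);
  }
  return out;
}

// Inverse MDCT: N coefficients -> 2N time-aliased samples.
// The 2/N scale pairs with a Princen-Bradley window applied on both sides.
function imdct(coeffs: number[], N: number): number[] {
  const out: number[] = [];
  for (let n = 0; n < 2 * N; n++) {
    let sum = 0;
    for (let k = 0; k < N; k++) {
      sum += coeffs[k]! * Math.cos((Math.PI / N) * (n + 0.5 + N / 2) * (k + 0.5));
    }
    out.push((2 / N) * sum);
  }
  return out;
}

// Analyse a signal in 50%-overlapping frames and resynthesise it by overlap-add.
// Assumes signal.length is a multiple of N; N zeros are padded on each side
// so every real sample is covered by exactly two frames.
function lappedRoundTrip(signal: number[], N: number): { coeffs: number[][]; output: number[] } {
  if (signal.length % N !== 0) throw new Error("signal length must be a multiple of N");
  const w = sineWindow(N);
  const zeros = new Array<number>(N).fill(0);
  const padded = [...zeros, ...signal, ...zeros];
  const acc = new Array<number>(padded.length).fill(0);
  const coeffs: number[][] = [];
  for (let start = 0; start + 2 * N <= padded.length; start += N) {
    const frame = padded.slice(start, start + 2 * N).map((x, n) => x * w[n]!);
    const c = mdct(frame, N);
    coeffs.push(c);
    const y = imdct(c, N);
    for (let n = 0; n < 2 * N; n++) {
      acc[start + n] = (acc[start + n] ?? 0) + y[n]! * w[n]!; // synthesis window + overlap-add
    }
  }
  return { coeffs, output: acc.slice(N, N + signal.length) };
}

const { coeffs, output } = lappedRoundTrip([1, 2, 3, 4, 5, 6, 7, 8], 4);
console.log(coeffs.length, "frames of", coeffs[0]!.length, "coefficients"); // 3 frames of 4 coefficients
console.log(output.map((v) => v.toFixed(6)).join(" "));
```

Because neighbouring frames share samples, quantization error in one frame is spread smoothly across the overlap instead of producing a visible step at a block boundary.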