Friday, November 07, 2008

Even More Decode Speed

For the last 2-3 weeks my time has been consumed with more decoder optimization. While the core of the CineForm codec has now been around for nearly 7 years, its enhancement has never stopped, whether we are adding new pixel formats or Active Metadata, improving quality, or striving for more performance. Working on the codec core is more rewarding, as an engineering success is not dependent on the eccentricities of third-party applications like Premiere or FCP, which like to get in the way.

The decoding engine has been threaded for 8 cores for some time, but it was only efficiently using about 3-4 cores. This inefficiency was not an issue for real-time playback, as the codec was already very fast, faster than necessary for real-time multi-stream playback (even on dual core systems.) Each decoder during a transition or layered effect would happily use much of the available CPU. Better codec threading was needed for a new market that CineForm is finding itself in: file-based film and television archives and mezzanine storage for HD distribution. These markets have been limited by the real-time nature of tape formats like D5 and HDCAM SR. If you are going to switch to file-based storage, there is no point in limiting yourself to 1:1 real-time; you want faster than real-time for batch processing and file format conversions wherever possible. This is one reason CineForm is displacing JPEG2000 for archives: it is just too slow for batch processing in software (typically much slower than real-time 1:1, i.e. slower than tape.)

While the current public beta is more efficiently threaded for up to 8 cores, giving up to a 50% decoder speed-up for some sources, the in-house decoder (out soon) will support up to 32 cores, ready for the new Intel-powered workstations, which will have 6 and 8 cores per physical part and are coming very shortly.

Some performance numbers from my stock HP xw8600 8-core 3GHz workstation:
444 1080p 12-bit per channel StEM footage -- 64fps.
444 1080p 12-bit per channel Stereo (3D) -- 43fps per eye (86fps total throughput.)
RAW 4K 12-bit per channel with demosaic (no GPU acceleration) -- 22fps.
RAW 4K 12-bit per channel decoding at 2K (no GPU acceleration) -- 59fps.

All testing used Build 186 of Prospect 4K beta.
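
To put those numbers in "faster than real-time" terms, here is a rough sketch that converts each decode rate into a multiple of playback speed. The 24fps playback rate is an assumption (the post does not state the frame rate of the test footage), and the helper names `realtime_factor` and `minutes_to_decode` are made up for illustration.

```python
# Decode rates reported above (frames per second on the 8-core test machine).
decode_fps = {
    "444 1080p 12-bit StEM": 64,
    "444 1080p 12-bit stereo (per eye)": 43,
    "RAW 4K 12-bit with demosaic": 22,
    "RAW 4K 12-bit decoded at 2K": 59,
}

# Assumption: film-rate material. The post does not state the source frame rate.
PLAYBACK_FPS = 24.0


def realtime_factor(fps, playback_fps=PLAYBACK_FPS):
    """Return how many times faster than real-time a decode rate is."""
    return fps / playback_fps


def minutes_to_decode(duration_minutes, fps, playback_fps=PLAYBACK_FPS):
    """Return wall-clock minutes needed to decode a clip of the given length."""
    return duration_minutes / realtime_factor(fps, playback_fps)


if __name__ == "__main__":
    for name, fps in decode_fps.items():
        print(f"{name}: {realtime_factor(fps):.2f}x real-time, "
              f"90-minute feature in {minutes_to_decode(90, fps):.1f} min")
```

Under that 24fps assumption, a factor above 1.0 means a batch conversion would finish faster than a tape transfer of the same material, and a factor below 1.0 means it would not.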

4 comments:

Anonymous said...

Hmmm... seems you are getting into the DPX realm for archive.

David said...

Yes we are, but at a fraction of the cost.

Anonymous said...

"no GPU acceleration"
I assume this is now planned?

David said...

GPU acceleration is good for some tools and not so much for others. We find that we rarely benefit, as we also need to read the data back for further non-GPU processing, yet vendors like Iridas use CineForm with GPU acceleration to great benefit in their DI tools. Wavelet decoding itself is not something that a GPU can help much with, as the entropy decode section is a large portion of the work (CPUs are better at that.)