Wednesday, November 19, 2008

Intel Core i7 and CineForm

Wow! That was my immediate reaction when I first ran a CineForm decoder performance test on the new Core i7 processor. I've had access to these new Intel processors for a while now and I knew they were fast, I just didn't know how fast. The system we were very honored to have early access to was an Intel Core i7-965 Extreme Edition (Nehalem architecture), a 3.2GHz quad core. When we first booted the system, we saw 8 CPUs in Task Manager, even though this is a quad core. These new chips have re-introduced Hyper-Threading: each core can be set up as 2 virtual cores -- this means we will likely see 16 virtual CPUs in upcoming dual quad workstations. Nice! With so many virtual CPUs to run on, I knew we had to upgrade our decoder for better n-way threading (which the encoder already had). This work, which I was most involved with over the last month, resulted in a 50% boost in frame rate over our already fast decoder on Core 2 architecture dual-quads. Now it is time to test the Core i7-965.

In these tests I compared my beloved HP xw8600 workstation (3.16GHz, 8 cores, 4GB RAM, running XP Pro) with a gaming-configured desktop Core i7 (4 cores, 3GB RAM, running Vista 64). No operations used GPU assistance.

Running with only half the number of cores, this new processor nearly doubles the average performance of CineForm HD and 2K in 4:2:2, 4:4:4 and RAW formats, and even approaches real-time full resolution playback of 4K (a workstation class Core i7 will be playing back 4K without issue from CineForm RAW or 4:4:4 encoded sources). All this frame rate overhead greatly eases multiple stream processing and allows for huge efficiency increases in batch processing of mezzanine and image archives. It also allows for much more Active Metadata (AM) processing through CDL style color databases, 3D LUT film looks and other yet-to-be-announced AM features.
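To put "nearly double with half the cores" in per-core terms, here is a small back-of-the-envelope sketch. The frame rates are placeholder values chosen only to show a 2x overall gain, not measurements from these tests, and the function name is made up for illustration.

```python
def per_core_gain(old_fps, old_cores, new_fps, new_cores):
    """Ratio of per-core throughput between two systems (physical cores)."""
    return (new_fps / new_cores) / (old_fps / old_cores)

# Hypothetical numbers: the 4-core system runs at twice the frame rate
# of the 8-core system.
gain = per_core_gain(old_fps=24.0, old_cores=8, new_fps=48.0, new_cores=4)
print(gain)  # 4.0
```

Under that assumption each i7 core is doing roughly four times the work of each Core 2 core, with Hyper-Threading and the updated decoder threading both contributing.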

Intel and/or HP, when can I get my hands on an i7 dual Xeon? Please.

Where the i7 didn't scale as well was with high quality 4K RAW demosaicing filters. Both the 4K R3D decodes and the high quality debayer modes in CineForm RAW produce minimal speed-ups from the 8-core Core 2 to the 4-core i7 (still amazing considering the reduction in cores). Looking at our own code, the demosaic does not use much of the SIMD (media) instruction set, nor is it particularly memory I/O limited; it just has a lot of operations per pixel. It seems we do have room for more performance optimization in the demosaic.

All the transcodes were performed using the CineForm R2CF utility that comes with our NEO 4K and Prospect 4K products. R2CF has a very efficient implementation of the R3DSDK, allowing for close to 100% CPU utilization. I have included the R3D to CineForm transcode times, as REDCODE is known to be a particularly compute heavy format. These times also include the time for a CineForm encode, but this only affects the FPS numbers by around 10%, as our encoder is very fast (up to ten times faster than an R3D decode plus additional processing [adding curves and color space controls]). I'm showing the combined numbers as that is the CineForm workflow for R3D: you do an R3D decode once to convert to CineForm, then work with (decoding multiple times) CineForm files for the extra speed and flexibility.
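The "around 10%" figure follows from the "ten times faster" encoder if you assume the CPU time per frame is simply the decode time plus the encode time. That additive model is my simplification for illustration, and the frame rates below are placeholders rather than measured values.

```python
def combined_fps(decode_fps, encode_fps):
    """Transcode frame rate when per-frame decode and encode times add up."""
    return 1.0 / (1.0 / decode_fps + 1.0 / encode_fps)

def encode_penalty(decode_fps, encode_fps):
    """Fraction of the decode-only frame rate lost by adding the encode."""
    return 1.0 - combined_fps(decode_fps, encode_fps) / decode_fps

# Hypothetical: the encode runs ten times faster than the R3D decode
# plus processing.
print(round(encode_penalty(decode_fps=10.0, encode_fps=100.0), 3))  # 0.091
```

With a 10x faster encoder the penalty is 1/11, about 9%, whatever the absolute decode rate is.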

I expect another factor in the widening margins for CineForm decode performance on the Core i7 is that we avoid arithmetic coding, which is tricky and compute intensive for CPUs (and nearly impossible for GPUs to do efficiently -- we are asked about GPU acceleration often). The CineForm codec was always designed for speed on Intel processors, where faster memory and faster media instructions translate almost directly into improved frame rates, as we have a compute-light entropy coding engine. While arithmetic coding would increase bit-efficiency by maybe 5-10%, the performance gains of 4-6X from not using it made this an easy choice when we started this codec work 7 years ago (on 1.7GHz P4s using MMX -- the fastest we could get could only do NTSC/PAL SD in real-time). Now someone needs to suggest a fun use for Express files running at 450 frames per second.

3 comments:

mikehedge said...

nice.

Anonymous said...

Hi guys!
Please note- the download link to the free neo-player (windows) on your site is broken! there's simply no way to download an up-to-date neo-player for windows. I uninstalled my neo-4k trial, and installed an (older) free neo player. Now my cineform-qt files generated from RED look totally different inside after effects...

David said...

Wrong build number, the link works with 188 instead of 186, it will be fixed in a few hours. Yes, going backwards in software version is not a good idea.