Project Supervision: Dr. Jan Fröhlich, Prof. Stefan Grandinetti
High-quality video sequences are required to evaluate high dynamic range (HDR) imaging for cinematic applications. To find out how much peak luminance and how much dynamic range are useful and satisfying, we carried out a research project in 2013 and 2014 that investigated how valuable contrast and brightness are for cinematic imaging.
In order to provide a data set for the evaluation of tone mapping operators and high dynamic range displays, we produced scenic and documentary scenes with a dynamic range of up to 18 stops. The scenes are staged using professional film lighting, make-up and set design to enable the evaluation of image and material appearance. To address challenges for HDR-displays and temporal tone mapping operators, the sequences include highlights entering and leaving the image, brightness changing over time, high contrast skin tones, specular highlights and bright, saturated colors. HDR-capture is carried out using two cameras mounted on a mirror-rig. To achieve a cinematic depth of field, digital motion picture cameras with Super-35mm size sensors are used. The HDR-video sequences shall serve as a common ground for the evaluation of temporal tone mapping operators and HDR-displays. They are available to the scientific community for further research.
Current trends in mainstream motion picture imaging include higher spatial resolution, higher temporal resolution and the use of high dynamic range (HDR) imaging. Whereas high-resolution and higher-frame-rate video could be acquired using current-generation motion picture cameras, no single HDR camera with a Super 35mm sized sensor was available that could deliver a satisfying dynamic range for studying thresholds, convincing peak luminance levels and dynamic range for future imaging. The marginal application of HDR-video stands in great contrast to the domain of still imaging, where HDR-image capture is well studied and commonly practiced. Although there had been research on HDR-video acquisition, no cinematic HDR-video had yet been gathered.
For scientists developing next-generation HDR monitors and temporal tone mapping operators, it was crucial to have cinematic HDR-content available, because image quality assessments can only be performed using high-fidelity images. These images must be of sufficient spatial resolution, temporal resolution and dynamic range. Throughout the last decade, professional digital film cameras gained around 4 stops of dynamic range, from about 10 stops in 2001 to 14 stops today. With high dynamic range displays on the horizon, we expected this trend to continue. To simulate the dynamic range of prospective recording devices, two exposures had to be combined. Figure 1 shows a comparison between the dynamic range of a professional motion picture camera, our dual camera setup and current display devices.
Image quality is determined not only by the signal quality of the image acquisition system, but also by lighting, make-up and staging. As an example, a faithful skin tone reproduction of a non-powdered actor in typical room lighting will not appear life-like to most observers. People often appear unhealthy or fatigued in reproductions when filmed without cinematic lighting and make-up. This is because our visual expectation for high-quality images is to see staged pictures. Especially when dealing with non-expert observers in user studies, staged images are important to avoid misinterpretation. Our goal was to provide cinematic footage that covers the dynamic range of tomorrow’s sensors, so that aesthetics, tone mapping algorithms and HDR-displays can be evaluated today regarding their ability to handle these future videos.
HDR still images and videos are often captured by taking multiple images with different exposures, one after the other. With moving objects, exposures that are not taken at the same time can introduce artifacts. It is therefore essential to capture all exposures simultaneously.
To generate HDR-video with different exposures captured at the same time, a mirror-rig as shown in Figure 3a is used. A common glass pane with antireflective coating is employed as a beam splitter. This results in a ratio of around 1:16 between reflection and transmittance, shifting the camera exposures by 4 stops. To be able to use large sensor motion picture cameras, the mirror is mounted in front of the lenses, instead of splitting the light behind the lens. Thus, aperture, integration time and sensor gain can be kept at identical settings in both cameras. This results in the same depth of field and motion blur, but different signal to noise ratios, which are used to enhance the dynamic range. Both cameras are adjusted mechanically for geometric alignment and the integration times of the camera-sensors are synchronized to record exactly the same fraction of time.
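The relation between the beam splitter ratio and the exposure shift can be checked with a short calculation. The following is a minimal sketch, not part of the original project tooling: the function names are hypothetical, and the idea that the full exposure shift adds to the range of a single camera is an idealized assumption that ignores noise, flare and the overlap needed for blending.

```python
import math


def exposure_shift_stops(reflection: float, transmittance: float) -> float:
    """Exposure difference in stops between the transmitted and reflected light paths.

    One stop is a factor of two in light, so the shift is log2 of the ratio.
    """
    if reflection <= 0 or transmittance <= 0:
        raise ValueError("reflection and transmittance must be positive")
    return math.log2(transmittance / reflection)


def combined_range_stops(single_camera_stops: float, shift_stops: float) -> float:
    """Idealized dynamic range of the dual camera setup (assumption: ranges simply add)."""
    return single_camera_stops + shift_stops


# Ratio of around 1:16 between reflection and transmittance, as stated in the text.
shift = exposure_shift_stops(reflection=1.0, transmittance=16.0)

# About 14 stops for one camera, as stated in the text for current cameras.
total = combined_range_stops(single_camera_stops=14.0, shift_stops=shift)

print(f"exposure shift: {shift:.1f} stops, combined range: {total:.1f} stops")
```

Under these assumptions, a 1:16 split corresponds to a shift of 4 stops, and a camera with about 14 stops would reach roughly the 18 stops quoted for the data set.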
In postproduction, the highlight-preserving image is aligned to the lowlight-preserving image to increase spatial fit. Depending on the accuracy of the camera alignment, either a homography transform, or warping through local disparity estimation is applied to the highlight-preserving image. Subsequently the colors of this image are matched to the lowlight-preserving image through multiplication of the individual color channels. Then the two images are merged to one HDR-frame, by blending between the lowlight-preserving image and the highlight-preserving image, depending on the brightness of the individual pixels. Finally the border pixels are set to black, to mask pixels where no highlight pass is available due to the spatial displacement.
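The color matching and blending steps can be illustrated per pixel. This is a minimal sketch under stated assumptions, not the project's actual pipeline: it assumes both images are already aligned and stored as linear RGB with 1.0 as the sensor clipping level, it uses the brightest channel of the lowlight-preserving pixel as the brightness measure, and the blend thresholds and the smoothstep curve are hypothetical choices. Alignment and border masking are left out.

```python
def smoothstep(x: float, lo: float, hi: float) -> float:
    """Smooth ramp from 0 (x <= lo) to 1 (x >= hi)."""
    t = min(max((x - lo) / (hi - lo), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def merge_pixel(low_rgb, high_rgb, gains, blend_start=0.5, blend_end=0.9):
    """Merge one pixel pair into an HDR value.

    low_rgb:  pixel from the lowlight-preserving image (clips at 1.0)
    high_rgb: aligned pixel from the highlight-preserving image
    gains:    per-channel multipliers that match high_rgb to low_rgb
              (assumed here to be known, e.g. about 16 for a 4 stop shift)
    """
    # Color matching through multiplication of the individual channels.
    matched = [c * g for c, g in zip(high_rgb, gains)]

    # Assumption: brightness is the largest channel of the lowlight pixel,
    # so the blend moves away from a channel before it clips.
    weight = smoothstep(max(low_rgb), blend_start, blend_end)

    # Dark pixels come from the lowlight image, bright ones from the
    # gain-matched highlight image, with a smooth transition in between.
    return [(1.0 - weight) * lo + weight * hi for lo, hi in zip(low_rgb, matched)]


gains = [16.0, 16.0, 16.0]
shadow = merge_pixel([0.1, 0.1, 0.1], [0.00625, 0.00625, 0.00625], gains)
highlight = merge_pixel([1.0, 1.0, 1.0], [0.25, 0.25, 0.25], gains)
print(shadow, highlight)
```

In this sketch a shadow pixel is taken entirely from the lowlight-preserving image, while a pixel that is clipped there is replaced by the gain-matched value from the highlight-preserving image, which can exceed 1.0.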
The resulting scenes were staged and recorded at the Stuttgart Media University (HdM) between January and October 2013. We planned five categories of scenes to focus on the HDR-challenges inherent to different types of film projects, e.g. documentary, advertising or A-movie. The lenses used and the amount of lighting and make-up correspond to the resources typically available in the respective productions. As an example, a dolly was used only in sequences that are representative of movie or advertising shoots. In simulated documentary shots like “Bistro”, only a reduced amount of make-up was applied.
In choosing the scenes, we followed four key aspects intended to yield the desired knowledge:
For additional technical details, see the supplementary material on the project website and the paper “Creating cinematic wide gamut HDR-video for the evaluation of tone mapping operators and HDR-displays”.
We would like to thank the companies Arri, C-Motion, Dolby, Filmlight, Image Engineering, Angenieux and Zeiss for lending us their equipment. In addition, we are very grateful for the enthusiastic support from the students in the HDR-project group at Stuttgart Media University (HdM) who helped to shoot the sequences and perform the postproduction. Among these, we would especially like to thank Jascha Vick and Jonas Trottnow for their outstanding commitment during the winter shoot.