This is my first project in the research lab. It continues research that my advisor, Professor Yu, started when he worked with Ramesh Raskar at MERL. This camera system uses multiple flashes in different orientations to create shadows that help identify depth edges in a scene.

  

UD Multi-flash Camera System

Overview

   Flashes have been used with cameras in one form or another for almost as long as photography has existed. Along with brightening the photograph, the vast majority of flashes also produce shadows in natural scenes. The shape of these shadows (that is, their width and contours) is a function of the occlusion boundary’s distance from the background.

   The human visual system uses stereo in the real world and is therefore able to distinguish discontinuities caused by occlusion boundaries (depth edges) from those due to changes in texture (material edges). It is not straightforward, however, to tell these kinds of discontinuities apart in a two-dimensional image, because light from all depths is projected onto a single sensor. Given a series of images, each illuminated by a flash from a different direction, it becomes easier to infer information about the structure of the scene from the different shadows cast. In addition to enhancing object boundaries, this approach also suppresses object textures, making it easier to distinguish depth edges from material edges.

  

Algorithm

    By using a camera with multiple flashes at different orientations, one is able to capture a sequence of images in which the occlusion boundaries of some objects in the scene can be more clearly identified. The first step of this process is to estimate the intrinsic image of the scene, in which the scene is brightly illuminated but contains no shadows. This intrinsic image is a composite of the image sequence, consisting of the brightest pixels from each of the images. Once this “max” image is constructed, each of the images in the sequence is used to construct a ratio image that highlights the shadows.
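The max composite and ratio images described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the lab's implementation: it assumes grayscale images of identical size, assumes the ratio is each flash image divided by the max image, and ignores ambient light. The function names and the eps constant are hypothetical.

```python
import numpy as np

def max_composite(images):
    """Per-pixel maximum over all flash images (the shadow-free "max" image)."""
    stack = np.stack([np.asarray(im, dtype=np.float64) for im in images])
    return stack.max(axis=0)

def ratio_images(images, eps=1e-6):
    """Divide each flash image by the max composite (assumed ratio definition).

    Values near 1 mean the pixel is lit in that image; values near 0 mean
    it lies in the shadow cast by that flash. eps is an assumed constant
    that avoids division by zero where every image is dark.
    """
    max_img = max_composite(images)
    return [np.asarray(im, dtype=np.float64) / (max_img + eps) for im in images]

# Two tiny one-row "images": each has one shadowed pixel in a different place.
a = np.array([[10.0, 2.0, 10.0]])
b = np.array([[10.0, 10.0, 3.0]])
print(max_composite([a, b]))
print(ratio_images([a, b])[0])
```

Because a shadowed pixel in one image is usually lit in at least one other image, the per-pixel maximum tends to be shadow free, and texture that appears in every image largely cancels in the ratio.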

   Once the shadows have been highlighted in the ratio images, each image in the sequence is traversed along the epipolar ray formed by the flash and the image plane (e.g., downwards for the top flash). For each traversal of an image’s rows or columns, one can identify a depth edge by a large negative step in pixel intensity.

In the image above, a depth edge is marked by the blue line and a material edge by the red line.

Input images. From top to bottom, images captured with up, left, right and down flash.

Composite image composed of brightest pixels from input images.

Ratio images constructed using the maximum composite image.

Depth edge frame constructed by epipolar traversal of ratio images.
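The traversal step of the algorithm can be sketched as follows. This is a simplified illustration under stated assumptions: the flashes are taken to sit exactly above, below, left, and right of the lens, so the epipolar rays reduce to image columns and rows, and a fixed threshold (a hypothetical value of 0.5) decides what counts as a "large" negative step. The names TRAVERSAL, depth_edges, and depth_edge_map are made up for this example.

```python
import numpy as np

# For each flash position: (array axis to traverse, True if index increases).
# Image rows grow downward, so the top flash is traversed along axis 0, forward.
TRAVERSAL = {
    "up": (0, True),
    "down": (0, False),
    "left": (1, True),
    "right": (1, False),
}

def depth_edges(ratio, flash, threshold=0.5):
    """Mark the last lit pixel before a sharp drop along the traversal direction."""
    ratio = np.asarray(ratio, dtype=np.float64)
    axis, forward = TRAVERSAL[flash]
    if not forward:
        ratio = np.flip(ratio, axis=axis)
    step = np.diff(ratio, axis=axis)  # next pixel minus current pixel
    edges = np.zeros(ratio.shape, dtype=bool)
    index = [slice(None)] * ratio.ndim
    index[axis] = slice(0, -1)
    edges[tuple(index)] = step < -threshold
    if not forward:
        edges = np.flip(edges, axis=axis)
    return edges

def depth_edge_map(ratios_by_flash, threshold=0.5):
    """Combine the per-flash edge masks with a logical OR."""
    masks = [depth_edges(r, f, threshold) for f, r in ratios_by_flash.items()]
    return np.logical_or.reduce(masks)

# One image column: lit, lit, shadow, shadow, lit.
column = np.array([[1.0], [1.0], [0.1], [0.1], [1.0]])
print(depth_edges(column, "up")[:, 0])
print(depth_edges(column, "down")[:, 0])
```

Only negative steps are kept, so the far side of a shadow, where the intensity rises again, is not reported as a depth edge for that flash.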

Results

 

      One of the initial goals of this project was to construct a camera system that could produce depth edge video sequences. Using the Flea2 camera from Point Grey Research, we were able to construct frames from captures at 30 frames per second. Some examples of our results can be seen in the GIFs on the right of the page.

     

 

    The first result shows one of the possible applications of this project. The construction of accurate or realistic three-dimensional models of plants has been the subject of much research. By using our setup to clearly delineate individual leaves and stems, together with a rotating platform, it is possible to automatically generate a model of a plant.

 

     This project can also be used to abstract hand gestures for use in teaching. Often there is a desire to emphasize some aspects of a scene and to suppress others. Discontinuities in depth have a special meaning to the human visual system, and often more information can be gained about a scene by enhancing depth edges than by enhancing textures within the edges. The sequence to the right shows how a hand gesture can be abstracted to aid understanding.

 

      An extension of this application is the abstraction of American Sign Language. The American Sign Language fingerspelling representation of the twenty-six-letter alphabet is important because of its versatility. Fingerspelling can be used to teach beginners more complex signs, to represent words for which there is no equivalent sign, or simply for clarification. An example of abstracted fingerspelling of the letter ‘J’ with depth edges can be seen on the right.

 

Just as the edges of a hand can be enhanced for learning, it is also possible to enhance the edges of a face. Simplifying the features of a face to the important contours of the jaw, ears, nose, etc., while removing clutter or blemishes on the skin, has applications in security.

 

 

 

 

 

People

 

Jingyi Yu                                 yu at cis dot udel dot edu

Christopher Thorpe              thorpe at udel dot  edu

Kevin Kreiser                         kevk at udel dot edu

Feng Li                                     fengli at udel dot edu

Copyright

 

Christopher Thorpe @ Graphics Lab UD