Face Detection Using Facial Geometry and Color

Travis Nielsen
CS 521 Final Project
April 10, 2000

Figure 1  Faces detected in an image

Introduction | Pre-Processing | Feature Matching | Facial Color | Performance | Source | References

Introduction

Automatic face recognition can be divided into two sub-problems: detection (location of the face in the picture) and recognition (distinguishing between individuals). My implementation focuses primarily on face detection in a digitized image with an algorithm using facial geometry in [1].  I also added my own heuristics to increase the accuracy of the algorithm.  Here I will explain the basic algorithm and my extensions.

In addition to knowing where the face is located, it is useful to detect the location of sub-components such as the eyes, mouth, nose, and eyebrows. In order to identify where the face is overall, this algorithm identifies likely locations of each of these components. The combinations of these sub-regions are given a probability of being a face, and the most likely combination(s) are identified as a face.  Thus, multiple faces may be identified in an image.

Some difficulties include the face’s size and orientation, lighting condition, covering (ie. eyes blocked by sunglasses), and facial expression. This algorithm approaches these problems with the following reasonable constraints:

Features from the image are extracted on the basis that the facial features are generally much darker than the remainder of the face and have higher gradients in the image. Feature blocks are extracted first with preprocessing techniques and then matched with each other to determine likely faces; the algorithm for these two steps are found in tables 1 and 2.  In the event that a color image is provided, it is converted to grayscale for processing.  After faces are detected based on the grayscale image, color within the region of each face is used to further discriminate between possible faces as outlined below.

The features are displayed in the images with the following color codes -- eyes: magenta, mouth: red, nose: blue, eyebrows: cyan.

Pre-Processing

The objective of pre-processing is to extract groups of pixels, called feature blocks, that are possibly features in a face.  Each feature block will later be considered as an eye, nose, mouth, or eyebrow that will together give a score of the likelihood of a face.  This processing proceeds in four main steps as enumerated in table 1.
 

Pre-Processing

1.  Morphological grayscale opening
         removes bright light reflections
2.  Boost Filtering
         enhances the contrast between facial features
3.  Thresholding
         produces a binary image
4.  Group feature blocks 
         combines regions that are mistakenly separated

Table 1 Extracting feature blocks

First, in order to eliminate point noise and bright spots caused by light reflections, a morphological graycscale open is performed.  This is a common operation in computer vision and will not be explained here.  Second, the boost filtering is a process that will increase the brightness of pixels that are likely not part of a facial feature.  It is based on a combination of the gradient magnitude, intensity of itself and its neighbors to produce a 3x3 boost filter.  The reader is referred to the original paper for details [1].

Third, the boosted grayscale image needs to be thresholded to produce a binary image for further processing.  I implemented the Otsu thresholding algorithm, which is very robust in general for automated thresholding.  Experiments show that this threshold is too high (produces too many feature pixels) for optimal feature extraction, so I simply subtracted a constant of 50 from the threshold.  This parameter performed well for the images I tested; the original paper did not mention what thresholding strategy they used.  The threshold produced was about 100, plus or minus 20 (256 gray levels) for most images.

The features are now grouped into blocks with a connected components algorithm.  Due to small components that may have been mistakenly separated from larger components, they are grouped together with larger components within a variable radius depending on their size and orientation.  If there are still components that are too small, they are eliminated (about 10 pixels).  Also, regions with an aspect ratio that is too high or too low are eliminated since they are too skinny to be facial features.

Finally, the center of gravity and orientation for each feature block are computed.  Figure 2 shows the results of the pre-processing.  For each component, a color is chosen arbitrarily for display with a bounding box around each block.  The orientation is displayed with a short red line starting from the center of gravity.  My feature merging still needs some refining.  For example, the nose components should not have been merged with the mouth and the small regions around the shirt collar should have been combined with the shirt.
 


Figure 2  Feature Blocks

Feature Matching

Now that the feature blocks have been extracted, combinations of the blocks will be considered to form the features of a face.  Each combination within geometrical restraints is given a score that reflects the probability that it represents a face.  First, all possible pairwise combinations of blocks are considered as an eye pair.  If the orientation between the eyes is between plus or minus 45 degrees, its score is computed based on the relative width of the blocks (eyes are the same with), the distance between the blocks (eyes are about one eye-width apart), and the orientation of the eyes (they should be closely oriented to the base line).  A probability is produced with an exponential of the sum of these terms.  If the eye score is high enough (about 70%), it is saved for further consideration.

For each eye pair identified, each of the blocks are considered as a mouth.  The mouth must lie within the geometrical constraints of the eyes to be considered and given a score.  These constraints require the center of the mouth to lie between and below the centers of the eyes.  In order to provide greater robustness to rotation of the face with 45 degrees, the angle between the eyes is used to form the base line shown as the single black line in figure 3.  The two lines perpendicular to the base line are computed; using an inside/outside test for the block compared to each of the lines, it is determined whether the block lies between the lines.  If it lies within this region, it is given a score base on its distance from the base line (the mouth is about the same distance from the base line as the distance between the eyes) and orientation with respect to the base line.  Another contribution to the score I would consider in future research is how close to the center of the region between the lines that the block is.  In several images, a block close to these lines is identified as a mouth even though the real mouth is a little further from the eyes but closer to the center of the region.
 


Figure 3  Orientation constraints

In a similar way, the nose and eyebrows can be identified with facial geometry.  Their restraints are even more restrictive than for the mouth to ensure that the regions lie in valid places for a face and the scores are also based on their distance from the base line as some fraction of the distance to between the eyes (0.6 and 0.3 for the nose and eyebrows, respectively).

A weighted probability is computed from the probability of each of the feature blocks of a face:

            p = 0.5 * pEyes + 0.25 * pMouth + 0.15 * Nose + 0.05 * pLeftEyebrow + 0.05 * pRightEyebrow.

where pEyes is the probability of the eye pair based on the evaluation function above.  To obtain a probability sufficient to qualify for a face need not contain all features which may have been eliminated in the pre-processing step.  In summary, the algorithm is presented in table 2.
 

Matching
for ( i = 1 to #facial features)
        for ( j = i+1 to #facial features)
               consider featurei and featurej as eyes & find orientation q to x-axis
               if (q is in range) //face cannot be tilted more than 45 degrees
                          evaluate the likelihood E that featurei and featurej are an eye pair
                           if (E > threshold) //threshold » 0.7
                                        add pair to list of eye-pair candidates
for (each eye-pair candidate)
        for (each facial feature)
                find the most likely mouth
                find the most likely eyebrows
                find the most likely nose

for each of the resulting facial feature combinations
         if (face has higher probability than some threshold » 0.6)
                 found a face
 

Table 2 Combining feature blocks to produce faces
Facial Color

Although the geometry will identify the location of faces accurately in an image, alone it will produce many false positives.  Simply said, the geometry does not take into consideration many characteristics that can differentiate a true face from another object or noise that happens to have correct facial geometry.  Thus, I take the color of the face into account to eliminate many of these false positives.  Using color is robust because it "provides a computationally efficient yet effective method which is robust under rotations in depth and partial occlusions" [2].  Also, "human skin forms a relatively tight cluster in colour space even when different races are considered" [3].  A color model for skin can be constucted using Gaussian mixtures with a priori probabilities [2].  Unfortunately I did not have the time to implement this approach.  A neural network would also be an effective approach to approximating colors of the skin.

For each likely face combination (usually between 1-12 possibilities), the average color is computed within the face boundaries excluding the facial features themselves.  In other words, the average color of the skin between the eyes and mouth (excluding those regions) is computed.  If this color is a valid skin tone it is accepted.

My time constraints limited me to performing simple thresholds in RGB and HSB color space.  Analysis of my sample images showed that in RGB space, red is always greater than green and blue, and green and blue are never greater than 200.  Although the latter condition excludes faces under bright lighting conditions, it was a safe threshold for my data.  Also, under the HSB model the brightness was always above 0.5.  If the feature block combinations do not pass these color threshold tests they are rejected even though they have good enough geometry to be considered a face.

Performance

At first glance this algorithm appears slow because of the number of feature block combinations that must be performed.  However, performance is not a big issue due to the simplicity of feature comparison.  Table 3 shows the time to process a typical image on a PIII 450 running Windows NT 4.0.
 

Image 1
400x310
Image 2
400x310
Image 3
240x320
Pre-Processing 1.38s 1.49s 0.66s
Feature Matching 0.22s 0.77s 0.06s

The majority of the time is spent in the pre-processing stage; time is about evenly split three ways between the grayscale opening, boost filtering, and connecting components.  Very little time is spent thresholding or combining small features.  Feature matching time is largely dependent on the number of feature blocks that are created in the pre-processing step.  More heuristics on accurately combining feature blocks in pre-processing would help cut the feature matching time even more.

Furthermore, I didn't go to great pains to optimize for speed and my implementation is in Java 1.2.2.  If this were implemented in C++ and the processing steps optimized, the time would drop significantly.
 
 


Figure 4  Smiling at success

Source

Here is a ZIP file containing all of the Java classes and source files.  Note the Readme.txt to help you get started.

References

  1. Jeng, Shi-Hong et al. "Facial Feature Detection Using Geometrical Face Model: An Efficient Approach". Pattern Recognition, Vol. 31, No. 3, pp. 273-282, 1998.
  2. R. Frischholz.  "Face Detection Homepage: Face detection and recognition." http://home.t-online.de/home/Robert.Frischholz/face.htm.
  3. M. Hunke and A. Waibel. "Face locating and tracking for human-computer interaction." 28th Asilomar Conf. Signals, Systems and Computers, California, 1994.
  4. Govindaraju, V. et al. "Locating Human Faces in Newspaper Photographs". Proc. Computer Vision and Pattern Recognition. 1989.
  5. Manjunath, B. S. "A Feature Based Approach to Face Recognition". Proc. Computer Vision and Pattern Recognition, pp. 373-378, 1992.