|
| Travis
Nielsen
CS 521 Final Project April 10, 2000 |
Figure 1 Faces detected in an image |
Introduction | Pre-Processing | Feature Matching | Facial Color | Performance | Source | References
Automatic face recognition can be divided into two sub-problems: detection (location of the face in the picture) and recognition (distinguishing between individuals). My implementation focuses primarily on face detection in a digitized image with an algorithm using facial geometry in [1]. I also added my own heuristics to increase the accuracy of the algorithm. Here I will explain the basic algorithm and my extensions.
In addition to knowing where the face is located, it is useful to detect the location of sub-components such as the eyes, mouth, nose, and eyebrows. In order to identify where the face is overall, this algorithm identifies likely locations of each of these components. The combinations of these sub-regions are given a probability of being a face, and the most likely combination(s) are identified as a face. Thus, multiple faces may be identified in an image.
Some difficulties include the face’s size and orientation, lighting condition, covering (ie. eyes blocked by sunglasses), and facial expression. This algorithm approaches these problems with the following reasonable constraints:
The features are displayed in the images with the following color codes -- eyes: magenta, mouth: red, nose: blue, eyebrows: cyan.
The objective of pre-processing is to extract groups of pixels, called
feature blocks, that are possibly features in a face. Each feature
block will later be considered as an eye, nose, mouth, or eyebrow that
will together give a score of the likelihood of a face. This processing
proceeds in four main steps as enumerated in table 1.
|
1. Morphological grayscale opening
|
| Table 1 Extracting feature blocks |
First, in order to eliminate point noise and bright spots caused by light reflections, a morphological graycscale open is performed. This is a common operation in computer vision and will not be explained here. Second, the boost filtering is a process that will increase the brightness of pixels that are likely not part of a facial feature. It is based on a combination of the gradient magnitude, intensity of itself and its neighbors to produce a 3x3 boost filter. The reader is referred to the original paper for details [1].
Third, the boosted grayscale image needs to be thresholded to produce a binary image for further processing. I implemented the Otsu thresholding algorithm, which is very robust in general for automated thresholding. Experiments show that this threshold is too high (produces too many feature pixels) for optimal feature extraction, so I simply subtracted a constant of 50 from the threshold. This parameter performed well for the images I tested; the original paper did not mention what thresholding strategy they used. The threshold produced was about 100, plus or minus 20 (256 gray levels) for most images.
The features are now grouped into blocks with a connected components algorithm. Due to small components that may have been mistakenly separated from larger components, they are grouped together with larger components within a variable radius depending on their size and orientation. If there are still components that are too small, they are eliminated (about 10 pixels). Also, regions with an aspect ratio that is too high or too low are eliminated since they are too skinny to be facial features.
Finally, the center of gravity and orientation for each feature block
are computed. Figure 2 shows the results of the pre-processing.
For each component, a color is chosen arbitrarily for display with a bounding
box around each block. The orientation is displayed with a short
red line starting from the center of gravity. My feature merging
still needs some refining. For example, the nose components should
not have been merged with the mouth and the small regions around the shirt
collar should have been combined with the shirt.
Figure 2 Feature Blocks |
Now that the feature blocks have been extracted, combinations of the blocks will be considered to form the features of a face. Each combination within geometrical restraints is given a score that reflects the probability that it represents a face. First, all possible pairwise combinations of blocks are considered as an eye pair. If the orientation between the eyes is between plus or minus 45 degrees, its score is computed based on the relative width of the blocks (eyes are the same with), the distance between the blocks (eyes are about one eye-width apart), and the orientation of the eyes (they should be closely oriented to the base line). A probability is produced with an exponential of the sum of these terms. If the eye score is high enough (about 70%), it is saved for further consideration.
For each eye pair identified, each of the blocks are considered as a
mouth. The mouth must lie within the geometrical constraints of the
eyes to be considered and given a score. These constraints require
the center of the mouth to lie between and below the centers of the eyes.
In order to provide greater robustness to rotation of the face with 45
degrees, the angle between the eyes is used to form the base line shown
as the single black line in figure 3. The two lines perpendicular
to the base line are computed; using an inside/outside test for the block
compared to each of the lines, it is determined whether the block lies
between the lines. If it lies within this region, it is given a score
base on its distance from the base line (the mouth is about the same distance
from the base line as the distance between the eyes) and orientation with
respect to the base line. Another contribution to the score I would
consider in future research is how close to the center of the region between
the lines that the block is. In several images, a block close to
these lines is identified as a mouth even though the real mouth is a little
further from the eyes but closer to the center of the region.
Figure 3 Orientation constraints |
In a similar way, the nose and eyebrows can be identified with facial geometry. Their restraints are even more restrictive than for the mouth to ensure that the regions lie in valid places for a face and the scores are also based on their distance from the base line as some fraction of the distance to between the eyes (0.6 and 0.3 for the nose and eyebrows, respectively).
A weighted probability is computed from the probability of each of the feature blocks of a face:
p = 0.5 * pEyes + 0.25 * pMouth + 0.15 * Nose + 0.05 * pLeftEyebrow + 0.05 * pRightEyebrow.
where pEyes is the probability of the eye pair based on the evaluation
function above. To obtain a probability sufficient to qualify for
a face need not contain all features which may have been eliminated in
the pre-processing step. In summary, the algorithm is presented in
table 2.
|
for ( j = i+1 to #facial features) consider featurei and featurej as eyes & find orientation q to x-axis if (q is in range) //face cannot be tilted more than 45 degrees evaluate the likelihood E that featurei and featurej are an eye pair if (E > threshold) //threshold » 0.7 add pair to list of eye-pair candidates for (each eye-pair candidate) for (each facial feature) find the most likely mouth find the most likely eyebrows find the most likely nose for each of the resulting facial feature combinations
|
| Table 2 Combining feature blocks to produce faces |
Although the geometry will identify the location of faces accurately in an image, alone it will produce many false positives. Simply said, the geometry does not take into consideration many characteristics that can differentiate a true face from another object or noise that happens to have correct facial geometry. Thus, I take the color of the face into account to eliminate many of these false positives. Using color is robust because it "provides a computationally efficient yet effective method which is robust under rotations in depth and partial occlusions" [2]. Also, "human skin forms a relatively tight cluster in colour space even when different races are considered" [3]. A color model for skin can be constucted using Gaussian mixtures with a priori probabilities [2]. Unfortunately I did not have the time to implement this approach. A neural network would also be an effective approach to approximating colors of the skin.
For each likely face combination (usually between 1-12 possibilities), the average color is computed within the face boundaries excluding the facial features themselves. In other words, the average color of the skin between the eyes and mouth (excluding those regions) is computed. If this color is a valid skin tone it is accepted.
My time constraints limited me to performing simple thresholds in RGB and HSB color space. Analysis of my sample images showed that in RGB space, red is always greater than green and blue, and green and blue are never greater than 200. Although the latter condition excludes faces under bright lighting conditions, it was a safe threshold for my data. Also, under the HSB model the brightness was always above 0.5. If the feature block combinations do not pass these color threshold tests they are rejected even though they have good enough geometry to be considered a face.
At first glance this algorithm appears slow because of the number of
feature block combinations that must be performed. However, performance
is not a big issue due to the simplicity of feature comparison. Table
3 shows the time to process a typical image on a PIII 450 running Windows
NT 4.0.
| Image 1
400x310 |
Image 2
400x310 |
Image 3
240x320 |
|
| Pre-Processing | 1.38s | 1.49s | 0.66s |
| Feature Matching | 0.22s | 0.77s | 0.06s |
The majority of the time is spent in the pre-processing stage; time is about evenly split three ways between the grayscale opening, boost filtering, and connecting components. Very little time is spent thresholding or combining small features. Feature matching time is largely dependent on the number of feature blocks that are created in the pre-processing step. More heuristics on accurately combining feature blocks in pre-processing would help cut the feature matching time even more.
Furthermore, I didn't go to great pains to optimize for speed and my
implementation is in Java 1.2.2. If this were implemented in C++
and the processing steps optimized, the time would drop significantly.
Figure 4 Smiling at success |
Here is a ZIP file containing all of the Java classes and source files. Note the Readme.txt to help you get started.