Surveillance cameras are used for a wide variety of applications, from homeland security to securing private homes. Terrorist threats and public safety challenges continue to drive deployments of remote surveillance systems, biometric (facial recognition) access control systems, and numerous methods to secure and manage one's identity.
Physical access control, perimeter monitoring and intelligence gathering solutions are key components of most overall security plans. Law enforcement and public safety organizations depend on a variety of archived image and video data not only to reduce crime and terrorist activity but also to serve as a primary forensic tool for recreating an event. Defense and intelligence organizations rely on accurately establishing and maintaining the identity of individuals of interest across myriad systems, national borders and situationally diverse environments.
It's been reported that the global installed base of closed-circuit TV (CCTV) surveillance systems totals more than 45 million cameras. To put that number in perspective, that's one camera for every 155 people on the planet. Looked at from the other direction, it would require the entire population of Japan to monitor each camera 24 hours a day, seven days a week. Without some form of artificial intelligence, this bottleneck neutralizes CCTV's potential for real-time watch-list monitoring. The good news is that the next generation of software and hardware surveillance tools offers the promise of unattended operation.
With about 3,000 cameras deployed and networked in New York's 'Ring of Steel,' the NYPD can collect real-time video footage and analyze thousands of images to find a particular item. This type of intelligent video analysis is used to detect objects with specific colors, shapes and sizes. The next step is to detect a person of interest in real time. This type of facial recognition is on the cusp of moving from science fiction to deployable fact.
What Is Facial Recognition?
Face recognition begins with an input image that is analyzed by the matching software in a process called enrollment. During the analysis, facial landmarks are identified and distilled into a numerical value called a template. The template (along with any other identifying information, such as a name or address) is then added to the database, called a gallery. An unknown subject presented for comparison is called a probe image. The probe goes through the same process to create a template, which is then compared against the gallery of templates in search of a match.
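The enrollment-and-probe cycle described above can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's actual algorithm: the landmark vectors, the gallery names and the cosine-similarity threshold are hypothetical stand-ins for a real extractor's proprietary features.

```python
import math

def make_template(landmarks):
    """Stand-in for real feature extraction: normalize a landmark
    vector to unit length so templates can be compared by cosine
    similarity. (Hypothetical; commercial extractors are proprietary.)"""
    norm = math.sqrt(sum(x * x for x in landmarks))
    return [x / norm for x in landmarks]

def cosine_similarity(a, b):
    # For unit-length templates, the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Enrollment: each known subject's template joins the gallery.
gallery = {
    "subject_a": make_template([0.9, 0.1, 0.4]),
    "subject_b": make_template([0.2, 0.8, 0.5]),
}

def identify(probe_landmarks, gallery, threshold=0.95):
    """Compare a probe template against every gallery template and
    return the best match above the threshold, or (None, score)."""
    probe = make_template(probe_landmarks)
    best_name, best_score = None, -1.0
    for name, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else (None, best_score)

# A probe close to subject_a's enrolled landmarks matches; a
# dissimilar probe falls below the threshold and returns no match.
print(identify([0.88, 0.12, 0.41], gallery)[0])  # subject_a
print(identify([0.1, 0.1, 0.9], gallery)[0])     # None
```

The threshold is the knob operators turn to trade false accepts against false rejects; real systems tune it per deployment.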
At its core, face recognition is a pattern matching problem – does the face I am looking for match a face in the database? Given an unknown facial input image, a system attempts to identify the person. To accomplish this, the process is generally broken into three main tasks: face detection, feature extraction, and face identification. Face detection is used to distinguish face-like objects from other objects in the image. Feature extraction is then done to reduce the face to its simplest terms for recognition. The face identification task identifies the input person’s face by searching a database of known individuals.
Because there can be so much subjectivity in the composition of an image used for identification purposes, the International Civil Aviation Organization (ICAO) was tasked with defining how a passport picture should be composed. This standard is also the cornerstone of facial recognition systems, because they can assume the presence of both eyes, the nose and the mouth in a straight-on view. Once the software can assume the general position of the eyes, it can establish other facial landmarks, which ultimately form the facial template. This template is the computer's understanding of the uniqueness of the individual and the basis for comparison against the database.
IMAGES 1—ICAO standard vs. posed image illustration
These assumptions are also the source of facial recognition's fragility: unless you have comparably posed images, matching can't occur. In the real world, people only pose for such pictures when they are applying for identification documents like passports and driver's licenses. When people go about their daily business and pass in front of a surveillance camera, the resulting image rarely resembles the one on their driver's license.
Limitations of Facial Recognition
Most surveillance cameras are mounted as high overhead as possible, both to capture a wide view of the scene as a person moves through it and to deter vandalism. Because of this, individual frames taken from these cameras make it difficult, if not impossible, to identify a person unless they happen to look directly into the camera. Such systems can also be defeated by simply hiding your face behind a hood or under a hat.
As the emphasis shifts from deploying surveillance cameras as deterrents and evidentiary collectors to proactive tools for mitigating criminal events, the potential for even greater demands on already thin law enforcement resources is significant.
To marry the unconstrained nature of surveillance with the rigid requirements of template generation, a whole new family of technology is being developed. By fusing surveillance with biometric identification techniques, a new premium security market is emerging, driven primarily by the current difficulty of providing reliable real-time alarms when persons of interest are viewed by surveillance cameras. New software will also require new ways of thinking about the problem and its solutions.
Proactively looking for "the face in the crowd" is only one of the problems. Perhaps even more troubling is how to recognize a person's intent. Unless a person is already on a watch list, no alarms will be triggered. Suicide bombers, for example, are single-use weapons who, for obvious reasons, do not appear in existing databases.
Advances in Surveillance and Facial Recognition
Virtually all facial recognition matching software provides accurate results when a captured image is within +/- 15 degrees of eye level. Unfortunately, most cameras installed for real-time monitoring are nowhere near eye level, rendering the matching portions of most systems almost useless. The best way for law enforcement to counter this is to place cameras at or near eye level. If that is not possible, reduce the downward angle of the camera's view as much as possible. The lower the angle, the better the likelihood that a matchable image can be captured.
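As a rough illustration of why placement matters, the downward view angle can be estimated from the camera's mounting height and the subject's horizontal distance. The 1.6 m average eye height and the example geometries below are assumptions made for this sketch, not measured values from any deployment.

```python
import math

# Assumed average eye height of a standing adult, in meters.
EYE_HEIGHT_M = 1.6

def downward_angle_deg(camera_height_m, distance_m):
    """Downward view angle from camera to a subject's face:
    atan((camera_height - eye_height) / horizontal_distance)."""
    return math.degrees(math.atan2(camera_height_m - EYE_HEIGHT_M, distance_m))

def likely_matchable(camera_height_m, distance_m, tolerance_deg=15.0):
    """True when the view angle falls inside the +/- 15 degree band
    most matching software tolerates."""
    return abs(downward_angle_deg(camera_height_m, distance_m)) <= tolerance_deg

# A camera mounted 3 m up viewing a subject 10 m away sits at about
# 8 degrees -- workable. The same camera at only 3 m of standoff is
# at about 25 degrees -- likely unmatchable.
print(round(downward_angle_deg(3.0, 10.0)))  # 8
print(likely_matchable(3.0, 10.0))           # True
print(likely_matchable(3.0, 3.0))            # False
```

The geometry also shows why distance helps: for a fixed mounting height, the angle shrinks as the subject moves farther from the camera.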
Software like CyberExtruder's Aureus 3D has expanded the face detection and feature extraction steps to include 3D information, which is then fused with the two-dimensional information commonly extracted from a flat image. In doing this, it has additional feature vectors to bring to bear in the face identification step. The net result is an extraction step that is less susceptible to errors introduced by pose, expression and lighting.
These problems are addressed by dramatically broadening the spectrum of facial images that can be handled: problematic images are rendered into 3D versions of the subjects, which are then used to produce corrected images suitable for recognition and identification by any facial recognition package.
When Aureus 3D begins to reconstruct a face, it is also able to determine the person's original pose relative to the camera. Having this information allows Aureus 3D to produce results that lead to matches even when subjects are turned away from the camera by as much as 70 degrees, at downward (or upward) angles of up to 25 degrees.
Notre Dame's Department of Computer Science and Engineering has been developing a tool it calls the Questionable Observer Detector. The software works on the premise that a terrorist will visit his target multiple times prior to an event, and it builds its own database of repeatedly observed individuals. By comparing faces across multiple video streams, the system can begin to build a list of 'suspicious' individuals. Law enforcement could then determine what constitutes a suspicious level of frequency and take the appropriate action.
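The repeat-visitor premise can be sketched with a simple frequency count. This is a toy illustration of the idea, not Notre Dame's actual implementation: it assumes some upstream matcher has already clustered faces into anonymous IDs, and the stream data and threshold below are invented for the example.

```python
from collections import Counter

def flag_repeat_observers(stream_matches, threshold=3):
    """Count how many distinct video streams each face-cluster ID
    appears in, and flag IDs seen in at least `threshold` streams.
    stream_matches: one list of face IDs per stream."""
    seen = Counter()
    for stream in stream_matches:
        for face_id in set(stream):  # count a face once per stream
            seen[face_id] += 1
    return [face for face, n in seen.items() if n >= threshold]

# Hypothetical footage from three different days at the same site:
streams = [
    ["f1", "f2"],        # day 1
    ["f1", "f3"],        # day 2
    ["f1", "f2", "f4"],  # day 3
]
print(flag_repeat_observers(streams, threshold=3))  # ['f1']
```

Here `threshold` plays the role the article assigns to law enforcement: deciding how many repeat appearances constitute a suspicious level of frequency.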
The ability to discern identity from nothing more than image data can be time consuming. With these advances in computer vision technology, however, the critical first step of quickly and accurately establishing an individual's identity is not only within practical reach but can be timely enough to provide actionable results.
Jack Ives is the co-founder and chief operating officer of CyberExtruder, the developer of Aureus 3D. He may be reached at firstname.lastname@example.org.