Facial Feature Detection for Arabic Audio Visual Speech Recognition Systems

Nour Sami Ghadban, Jafar Alkheir, Mariam Saii


The visual speech modality plays an important role in the perception and production of speech. Although visual speech information is not confined purely to the mouth, it is generally agreed that a large proportion of it stems from the mouth region of interest (ROI).

For this reason, it is imperative that an audio-visual speech processing (AVSP) system be able to accurately detect, track and normalize the mouth of a subject within a video sequence. This task is referred to as facial feature detection (FFD). The goal of FFD is to detect the presence and location of features such as the eyes, nose, nostrils, eyebrows, mouth, lips and ears, under the assumption that there is only one face in an image. This differs slightly from the task of facial feature location, which assumes the feature is present and requires only its location. Facial feature tracking extends the task of location by incorporating temporal information from a video sequence to follow the location of a facial feature as time progresses. Throughout this article, the tasks of facial feature detection, location and tracking are all treated as falling under the broad banner of FFD.
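
The difference between detection, location and tracking can be made concrete with a minimal sketch. It is illustrative only and is not the method used in the article: the frame is assumed to be a toy binary mask in which nonzero pixels already mark the mouth, and the function names `detect`, `locate` and `track`, as well as the "hold the last known box" tracking rule, are hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple

# Assumption: a bounding box is (row_min, col_min, row_max, col_max).
Box = Tuple[int, int, int, int]
# Assumption: a frame is a toy binary mask; nonzero pixels mark the feature.
Frame = List[List[int]]


def detect(frame: Frame) -> Optional[Box]:
    """Detection: decide whether the feature is present and, if so, where."""
    hits = [(r, c) for r, row in enumerate(frame) for c, v in enumerate(row) if v]
    if not hits:
        return None  # feature absent from this frame
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))


def locate(frame: Frame) -> Box:
    """Location: the feature is assumed present; only its position is sought."""
    box = detect(frame)
    if box is None:
        raise ValueError("location assumes the feature is present")
    return box


def track(frames: List[Frame]) -> List[Optional[Box]]:
    """Tracking: use temporal information across frames.

    Hypothetical rule: when a frame yields no detection, reuse the most
    recent box found in an earlier frame.
    """
    boxes: List[Optional[Box]] = []
    last: Optional[Box] = None
    for frame in frames:
        box = detect(frame)
        if box is not None:
            last = box
        boxes.append(last)
    return boxes
```

In this sketch, `detect` may return `None`, `locate` never does, and `track` fills a missed frame from its predecessor, which mirrors the three task definitions given above.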



Keywords: audio-visual speech processing, facial feature detection, front-end effect, eye detection, mouth location/tracking




