Introduction

This project focuses on building a webcam-integrated facial recognition application using pre-trained DNN models and Python’s OpenCV library. To recognize faces, the project uses a special kind of neural architecture called a “Siamese Network”, in which the output vectors (embeddings) of two input images are compared against each other to measure the similarity between them.
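
As a minimal sketch of this idea (assuming the two embeddings are NumPy vectors of the same length, and using an illustrative distance threshold similar to the one applied later in the recognition step), two faces can be compared by taking the L2 norm of the difference between their embedding vectors. The helper is_same_person below is hypothetical and used only for illustration:

import numpy as np

def is_same_person(embedding_a, embedding_b, threshold=0.55):
    # a smaller L2 distance between embeddings means the faces are more similar
    distance = np.linalg.norm(embedding_a - embedding_b)
    return distance < threshold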

The project pipeline is divided into four major parts, listed below. The script associated with each stage is run in sequence from the command prompt to launch the facial recognition application.

  • Image data generation (imagecapture.py)
  • Face detection
  • Extraction of image embeddings (extract_embeddings.py)
  • Face recognition (recognize_video.py)

Image Data Generation

For any facial recognition task, the very first step is to build a dataset of user images that can later be compared against live facial data for recognition. The Python script “imagecapture.py” is used here to capture user images, which are then stored on a scratch disk for further processing.

Frames from the webcam video are captured with OpenCV’s VideoCapture class and stored on the local drive, as shown in the code below.

Sample Code (Refer to “imagecapture.py” script for the entire code)

# Capture images from the webcam
import os

import cv2

# 'directory' (the output folder for the captured images) is defined
# earlier in imagecapture.py
print("Starting Webcam...")
capture = cv2.VideoCapture(0)

image_counter = 1

while True:
    # read one frame from the webcam and display it
    _, frame = capture.read()
    cv2.imshow('Image Capture', frame)
    k = cv2.waitKey(100) & 0xff
    if k == 27:
        # ESC pressed
        print("Escape hit. Closing Webcam...")
        break
    elif k == 32:
        # SPACE pressed: save the current frame to the output directory
        print("writing file")
        image_name = "opencv_frame_{}.png".format(image_counter)
        cv2.imwrite(os.path.join(directory, image_name), frame)
        print("{} written!".format(image_name))
        image_counter += 1

capture.release()
cv2.destroyAllWindows()

Face Detection

Since an image can contain many objects besides a face, it is important to crop out just the face region before sending it to the embedding model. To accomplish this, a pre-trained Caffe face detection model loaded through OpenCV’s dnn module is used. The detector outputs face detections, their associated probabilities, and the coordinates of each detection.

To reduce noise in the detections, a confidence limit is used to filter out weak face detections.

detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

# imageBlob is the pre-processed input image (see the sketch after this block)
detector.setInput(imageBlob)
detections = detector.forward()

if len(detections) > 0:
    # keep only the detection with the highest confidence
    i = np.argmax(detections[0, 0, :, 2])
    confidence = detections[0, 0, i, 2]

    # filter out weak detections
    if confidence > confidence_limit:
        # compute the (x, y)-coordinates of the bounding box for the face
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")

        # extract the face from the coordinates
        face = image[startY:endY, startX:endX]
        (fH, fW) = face.shape[:2]

        # ensure the face width and height are sufficiently large;
        # 'continue' skips to the next image of the enclosing loop
        if fW < 20 or fH < 20:
            continue
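
The imageBlob passed to the detector above is built with cv2.dnn.blobFromImage from the input image. A minimal sketch is shown below; the 300×300 input size and per-channel mean values are the ones commonly used with OpenCV’s Caffe SSD face detector and are an assumption here, not values taken from the project scripts.

# minimal sketch of preparing the detector input (size and mean values assumed)
(h, w) = image.shape[:2]
imageBlob = cv2.dnn.blobFromImage(
    cv2.resize(image, (300, 300)),   # resize to the detector's expected input size
    1.0,                             # no pixel scaling
    (300, 300),
    (104.0, 177.0, 123.0),           # per-channel mean values subtracted before inference
    swapRB=False, crop=False)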

Extraction of Image Embeddings

The pre-processed face image is then fed to a widely used pre-trained embedding model called OpenFace, a Python and Torch implementation of face recognition based on the paper “FaceNet: A Unified Embedding for Face Recognition and Clustering”.

The model outputs a 128-D facial embedding vector for every user image in the dataset; the embeddings, together with the corresponding user names, are then stored in a pickle dump.

embedder = cv2.dnn.readNetFromTorch(embedding_model_path)

# inside the loop over detected faces: build a blob from the cropped face
# and run it through the embedding network
faceBlob = cv2.dnn.blobFromImage(face, 1.0 / 255,
    (96, 96), (0, 0, 0), swapRB=True, crop=False)
embedder.setInput(faceBlob)
vec = embedder.forward()

# add the name of the person + corresponding face
# embedding to their respective lists
knownNames.append(name)
knownEmbeddings.append(vec.flatten())
total += 1

# dump the facial embeddings + names to disk
print("[INFO] serializing {} encodings...".format(total))
data = {"embeddings": knownEmbeddings, "names": knownNames}
f = open(out_embeddings, "wb")
f.write(pickle.dumps(data))
f.close()
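
At recognition time, the serialized embeddings are read back from disk so that live embeddings can be compared against them. A minimal sketch, assuming the same out_embeddings path used above:

import pickle

# load the embeddings + names dictionary produced by extract_embeddings.py
with open(out_embeddings, "rb") as f:
    database = pickle.load(f)

print("[INFO] loaded {} embeddings".format(len(database["embeddings"])))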

Face Recognition

This is the stage where the actual face recognition happens. Faces detected in frames from the live video feed are passed through the embedding model to obtain their 128-D embedding vectors. Each live embedding is then compared against every user embedding stored in the database using the L2 norm distance, and the user with the minimum distance (below a set threshold) is chosen as the recognized individual.

Sample Code (Refer to “recognize_video.py” script for the entire code)

def who_is_it(vector, database):
    # start with a distance larger than any expected embedding distance
    min_dist = 100
    identity = "Not in database"

    # compare the live embedding against every stored user embedding
    for i in range(len(database["embeddings"])):
        db_enc = database["embeddings"][i]
        name = database["names"][i]
        dist = np.linalg.norm(vector - db_enc)

        if dist < min_dist:
            min_dist = dist
            identity = name

    # reject matches whose distance exceeds the threshold
    if not min_dist < 0.55:
        identity = "Not in database"

    return min_dist, identity

# build a blob from the detected face and compute its embedding
faceBlob = cv2.dnn.blobFromImage(face, 1.0 / 255,
    (96, 96), (0, 0, 0), swapRB=True, crop=False)
embedder.setInput(faceBlob)
vec = embedder.forward()

# find the closest user in the database
similarity, name = who_is_it(vec, database)

# draw the bounding box of the face along with the
# associated distance
text = "{}: {:.2f}".format(name, similarity)
y = startY - 10 if startY - 10 > 10 else startY + 10
cv2.rectangle(frame, (startX, startY), (endX, endY),
    (0, 0, 255), 2)
cv2.putText(frame, text, (startX, y),
    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

Result

[Figure: Face recognition result]

For the entire Python notebook, please visit the GitHub link.