How to implement a face recognition system in a React Native application
Introduction
I work in a team that supports and develops a mobile application written in React Native. One day, a rather interesting task appeared in the backlog: we needed to implement a face recognition system to automatically mark employees' attendance at their workplace.
At first, it was assumed that the front end would need to take a photo, process it, and send it to the back end, where the main action would take place. And perhaps that would have been the right decision, but things turned out differently.
The business wanted the most budget-friendly solution possible, without third-party services. At the same time, the customers had concerns about transferring photos over the network: the locations where the application was used often had bandwidth issues. Therefore, it was highly desirable to implement the main part of the system directly on the front end. The back end remained, but mostly as data storage.
Approach to implementation
We chose an approach based on embeddings: for a facial image we can compute a characteristic feature vector (an embedding), and if we can obtain this vector on the device, comparing faces becomes a perfectly solvable task.
Thus, the task boiled down to the following:
- Obtain an image of the face.
- Generate an embedding for it.
- Send the embedding to the backend for storage or search in the vector database.
To generate an embedding, you need a special neural network called an embedder. This raises two key questions:
- Which model should you choose?
- How do you run it in a React Native application?
Unexpectedly, the answers were found on the project page of react-native-fast-tflite. This library allows you to run models in TensorFlow Lite format directly on the device. What’s more, the project author has compiled a selection of useful resources, including a repository of models. There I found a suitable model called FaceNet.
It is worth noting that the model on Kaggle was not published in the format I needed. Converted versions (with the extension .tflite) can be found online, but some of them simply did not work for me. In the end, I decided to convert the model myself. I will not describe the conversion process here — there is enough information about it in open sources.
It is much more important to understand how to work with the model. The Netron service is perfect for this, as it shows the input and output parameters.
Everything is clear with the outputs — in my case, the model returned an embedding of size 128. But the inputs initially raised questions.
After figuring it out, I realised that the model expects a Float32Array of size 160 × 160 × 3. This means the image needs to be resized to 160 × 160 pixels, with three values (the RGB channels) per pixel.
There are different approaches to converting an image into an RGB array. We used canvas because it was already in the project.
Implementation
Let’s move on to the code itself. To avoid overloading the article, I will omit details that are not related to recognition (for example, component state management).
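Before the snippets: they import a few values from a local ./constants module that is not shown in the article. Here is a minimal sketch of what it could contain; PHOTO_SIZE and MODEL_INPUT_SIZE follow from the model's input shape, while the error messages and the allowed angle threshold are illustrative assumptions.
// constants.ts: a sketch of the shared constants used in the snippets below.
// The sizes follow from the model's 160 × 160 × 3 input; the error messages
// and the angle threshold are illustrative assumptions, not the original values.
export const PHOTO_SIZE = 160;
export const MODEL_INPUT_SIZE = PHOTO_SIZE * PHOTO_SIZE * 3; // 76,800 values

export const RECOGNITION_ERRORS = {
  MISSING_FACE_IN_FRAME: "No face found in the frame",
  FACE_TILTED_UP: "The face is tilted up too much",
  FACE_TILTED_DOWN: "The face is tilted down too much",
} as const;

export const ALLOWED_ANGLES = {
  X: 15, // maximum allowed vertical head tilt, in degrees
} as const;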
Receiving photos from the camera
First, we use react-native-camera to take a photo and send it for further processing:
import { useRef } from "react";
import { useTensorflowModel } from "react-native-fast-tflite";
import { RNCamera } from "react-native-camera";
import Canvas from "react-native-canvas";

interface Props {
  onEmbeddingReceived: (embedding: Float32Array) => Promise<void>;
}

const FaceRecognitionCamera = ({ onEmbeddingReceived }: Props) => {
  const canvasRef = useRef<Canvas>(null);
  const cameraRef = useRef<RNCamera>(null);

  // Load the TFLite model; it becomes available once the plugin reports "loaded"
  const plugin = useTensorflowModel(require("./facenet.tflite"));
  const model = plugin.state === "loaded" ? plugin.model : undefined;

  const makePhoto = async () => {
    if (!cameraRef.current) return;
    const { uri } = await cameraRef.current.takePictureAsync({
      fixOrientation: true,
      mirrorImage: false,
    });
    runEmbeddingCalculation(uri); // defined in the next section
  };
};
In this step, we obtain the path to the saved photo (uri) and pass it to the runEmbeddingCalculation function.
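The component's markup is omitted in the article. For context, the refs might be wired up roughly like this; the exact RNCamera props and the way the Canvas is hidden are my assumptions, not the original code.
// Inside FaceRecognitionCamera: a sketch of how the refs could be attached.
// The camera props and the hidden Canvas styling are assumptions.
return (
  <>
    <RNCamera
      ref={cameraRef}
      style={{ flex: 1 }}
      type={RNCamera.Constants.Type.front}
      captureAudio={false}
    />
    {/* The canvas is only used for pixel processing, so it can stay invisible */}
    <Canvas ref={canvasRef} style={{ position: "absolute", opacity: 0 }} />
  </>
);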
Calculating the embedding
Before obtaining the result, you need to prepare the input for the model:
import { useRef } from "react";
import { useTensorflowModel } from "react-native-fast-tflite";
import { RNCamera } from "react-native-camera";
import Canvas from "react-native-canvas";

interface Props {
  onEmbeddingReceived: (embedding: Float32Array) => Promise<void>;
}

const FaceRecognitionCamera = ({ onEmbeddingReceived }: Props) => {
  const canvasRef = useRef<Canvas>(null);
  const cameraRef = useRef<RNCamera>(null);

  const plugin = useTensorflowModel(require("./facenet.tflite"));
  const model = plugin.state === "loaded" ? plugin.model : undefined;

  const makePhoto = async () => { /* see the previous snippet */ };

  // getFaceWithImage and base64ImgToFloat32Array are defined in the sections below
  const runEmbeddingCalculation = async (photoUrl: string) => {
    const canvas = canvasRef.current;
    if (!canvas || !model) return; // the model may still be loading

    try {
      // 1. Find the face and get the photo as base64
      const { base64Image, face } = await getFaceWithImage(photoUrl);
      // 2. Crop, resize and normalise it into the model's input format
      const float32Array = await base64ImgToFloat32Array(canvas, base64Image, face);
      // 3. Run inference; the first output tensor is the embedding
      const [result] = (await model.run([float32Array])) as Float32Array[];
      if (!result) return;
      await onEmbeddingReceived(result);
    } catch (e) {
      // error handling
    } finally {
      // do something anyway
    }
  };
};
Here:
- the getFaceWithImage function extracts the face from the photo,
- base64ImgToFloat32Array converts the photo of the face into an array for the model,
- model.run calculates and returns the embedding.
Let’s take a look at everything in order.
Face detection
To highlight the face area, I used the @react-native-ml-kit/face-detection library. The getFaceWithImage function takes the image address as input and returns the image in base64 format as well as a Face object, which contains information about the coordinates of the face in the photo:
import FaceDetection, { Face } from "@react-native-ml-kit/face-detection";
import RNFS from "react-native-fs";
import { RECOGNITION_ERRORS } from "./constants";

const getFaceWithImage = async (
  photoPath: string,
): Promise<{ base64Image: string; face: Face }> => {
  // Take the first detected face (we expect a single person in the frame)
  const [face] = await FaceDetection.detect(photoPath, {
    classificationMode: "none",
    performanceMode: "accurate",
    landmarkMode: "all",
  });
  const base64Image = await RNFS.readFile(photoPath, "base64");
  if (!face) throw new Error(RECOGNITION_ERRORS.MISSING_FACE_IN_FRAME);
  return { base64Image, face };
};
It is important to note that react-native-camera can detect faces itself, but in our project, after the patch, it returned incorrect coordinates, so we had to use a separate library.
Preparing the image for the model
Now you need to crop the face area in the photo and resize it to 160×160, as well as convert it to an array of numbers:
import Canvas, { Image } from "react-native-canvas";
import { Face } from "@react-native-ml-kit/face-detection";
import { PHOTO_SIZE } from "./constants";
import { rgbaToNormalizedRGB } from "./rgbaToNormalizedRGB";

export const base64ImgToFloat32Array = async (
  canvas: Canvas,
  base64Img: string,
  face: Face,
) => {
  const rgba = await base64ImgToRGBA(canvas, base64Img, face);
  return rgbaToNormalizedRGB(rgba);
};

/*
  Draw the face area on the canvas and export it as a Uint8ClampedArray
*/
const base64ImgToRGBA = async (
  canvas: Canvas,
  base64Img: string,
  face: Face,
) => {
  const ctx = canvas.getContext("2d");
  canvas.width = PHOTO_SIZE;
  canvas.height = PHOTO_SIZE;

  const img = new Image(canvas);
  ctx.setTransform(1, 0, 0, 1, 0, 0);
  ctx.clearRect(0, 0, canvas.width, canvas.height);

  img.src = "data:image/png;base64," + base64Img;
  // Wait until the image has actually loaded before drawing it
  await new Promise<void>((resolve) =>
    img.addEventListener("load", () => resolve()),
  );

  // Crop the face rectangle from the source image and scale it to 160 × 160
  ctx.drawImage(
    img,
    face.frame.left,
    face.frame.top,
    face.frame.width,
    face.frame.height,
    0,
    0,
    canvas.width,
    canvas.height,
  );

  return (await ctx.getImageData(0, 0, canvas.width, canvas.height)).data;
};
The output is a Uint8ClampedArray, which is a one-dimensional array containing RGBA colour model data with integer values ranging from 0 to 255. The next step is to remove the alpha channel and normalise the values.
Conversion from RGBA to Float32Array
The FaceNet model expects a Float32Array of size 160 × 160 × 3 as input. To produce it, we convert RGBA to RGB and normalise the values:
import { MODEL_INPUT_SIZE } from "./constants"; // 160 * 160 * 3
import { isNumber } from "./utils";

export const rgbaToNormalizedRGB = (
  rgba: Uint8ClampedArray,
  resultLength = MODEL_INPUT_SIZE,
) => {
  const result = new Float32Array(resultLength);

  // i walks over the RGBA quadruplets, j over the RGB triplets
  for (let i = 0, j = 0; j < result.length; i += 4, j += 3) {
    const [red, green, blue] = [rgba[i], rgba[i + 1], rgba[i + 2]];
    if (isNumber(red) && isNumber(green) && isNumber(blue)) {
      result[j] = red / 255;
      result[j + 1] = green / 255;
      result[j + 2] = blue / 255;
    }
  }

  return result;
};
This function:
- ignores the alpha channel,
- converts each colour channel from the range [0..255] to [0..1],
- returns a Float32Array of length 160 × 160 × 3 = 76,800, which can be fed into FaceNet.
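The isNumber helper imported from ./utils is not shown in the article; a minimal type guard along these lines would do (my assumption of its shape):
// utils.ts: a possible implementation of the isNumber type guard used above
export const isNumber = (value: unknown): value is number =>
  typeof value === "number" && !Number.isNaN(value);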
Why do we divide by 255? Most computer vision models are trained on input data normalised to the range [0,1] (or sometimes [-1,1]). Division allows us to bring pixel values (initially 0–255) to the same scale at which the model was trained. If we feed in ‘raw’ values, the input distribution will be different, and the accuracy of the model will drop sharply. I learned this through experience.
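For completeness, if a particular model build expects inputs in [-1, 1] instead, the per-channel scaling would change to something like this (a sketch, not what we used):
// Maps 0 → -1 and 255 → 1; only appropriate if the model was trained with this scaling
const toSignedRange = (value: number) => value / 127.5 - 1;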
We sent the embedding obtained from the neural network to the backend, where it was stored in a vector database for subsequent comparison. The comparison mechanism and the search for similar (close) embeddings fell on the shoulders of the backend.
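The backend logic itself is outside the scope of this article, but to give an idea of what "close" embeddings means: a common approach is to compare vectors by Euclidean distance (or cosine similarity) and treat distances below a threshold as a match. A rough illustration, with an assumed threshold that would need tuning for the specific model:
// A rough illustration of embedding comparison (the real logic lives on the backend).
// The threshold value is an assumption and has to be tuned for the specific model.
const euclideanDistance = (a: Float32Array, b: Float32Array) => {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
};

const isSamePerson = (a: Float32Array, b: Float32Array, threshold = 1.0) =>
  euclideanDistance(a, b) < threshold;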
The first problems
Overall, the code above worked correctly. However, a nuance that I hadn’t thought of beforehand quickly became apparent.
Anyone who has dealt with face recognition tasks knows that before submitting an image to the model, the face needs to be aligned. Ideally, the eye line should be strictly horizontal and occupy approximately the same position in all photos. Without this, the embedding of the same person can vary greatly.
My implementation did not include alignment. This complicated recognition. If the head was tilted even slightly, the distance between the embeddings for the same person increased dramatically. No matches were found.
After studying the topic in more depth, I realised that one solution could be the OpenCV library, which already has ready-made methods for face alignment. Moreover, there is a port of this library for React Native: react-native-fast-opencv. But then a problem arose: our version of the framework was not supported. We couldn’t update React Native — it was too labour-intensive. We had to find an easier way.
Image rotation
The solution was to manually rotate the image in the opposite direction to the tilt of the head. To do this, I reworked the getFaceWithImage function and renamed it getAlignedFaceWithImage.
import FaceDetection, { Face } from "@react-native-ml-kit/face-detection";
import RNFS from "react-native-fs";
import { rotate } from "@meedwire/react-native-image-rotate";
import { RECOGNITION_ERRORS, ALLOWED_ANGLES } from "./constants";

export const getAlignedFaceWithImage = async (
  photoPath: string,
): Promise<{ base64Image: string; face: Face }> => {
  const { face, photoPath: alignmentPhotoPath } =
    await detectAndAlignFace(photoPath);
  const base64Image = await RNFS.readFile(alignmentPhotoPath, "base64");
  if (!face) throw new Error(RECOGNITION_ERRORS.MISSING_FACE_IN_FRAME);

  // A vertical tilt (head raised or lowered) cannot be compensated by rotating
  // the image, so we reject the photo and report it to the user
  const { rotationX } = face;
  if (Math.abs(rotationX) > ALLOWED_ANGLES.X) {
    throw new Error(
      rotationX > 0
        ? RECOGNITION_ERRORS.FACE_TILTED_UP
        : RECOGNITION_ERRORS.FACE_TILTED_DOWN,
    );
  }

  return { base64Image, face };
};

const detectAndAlignFace = async (photoPath: string) => {
  const faces = await FaceDetection.detect(photoPath, {
    classificationMode: "none",
    performanceMode: "accurate",
    landmarkMode: "all",
  });
  if (!faces[0]) return { face: null, photoPath };

  // Rotate the photo by the opposite of the eye-line angle to level the face
  const alignmentPhotoPath = await rotate({
    type: "file",
    content: photoPath,
    angle: -getHorizontalEyeAngle(faces[0]),
  });

  // Detect the face again on the aligned photo to get up-to-date coordinates
  const alignmentFace = await FaceDetection.detect(alignmentPhotoPath, {
    classificationMode: "none",
    performanceMode: "accurate",
  });
  return { face: alignmentFace[0], photoPath: alignmentPhotoPath };
};

const getHorizontalEyeAngle = (face: Face) => {
  const dx =
    face.landmarks!.rightEye.position.x - face.landmarks!.leftEye.position.x;
  const dy =
    face.landmarks!.rightEye.position.y - face.landmarks!.leftEye.position.y;
  return Math.atan2(dy, dx) * (180.0 / Math.PI);
};
The key idea is simple: the getHorizontalEyeAngle function calculates the angle of the eye line, and we compensate for it using the @meedwire/react-native-image-rotate library. For example, if one eye ends up 20 px lower than the other and 60 px away horizontally, the eye line is tilted by atan2(20, 60) ≈ 18°, so the photo is rotated by the opposite angle. After rotating, we detect the face again, this time on the aligned photo, and use it for further processing.
Yes, this is not the most elegant approach, but rather a simplified workaround. However, it has significantly improved the quality of recognition.
Vertical head tilts (for example, when a person raises or lowers their head significantly) deserve a separate mention. A flat image rotation cannot compensate for that kind of distortion. Therefore, there is a check in the code: if the deviation angle is too large, we throw an error and show it to the user in the interface.
Conclusion
As a result, we obtained a generally functional system for facial recognition. It is fair to say that the system is not perfect. We did not use advanced pipelines or acceleration optimisations. However, within the constraints, the solution proved to be reliable and stable.