Face detection with TensorFlow.js

This guide introduces the integration of TensorFlow.js with Docker to perform face detection. In this guide, you'll explore how to:

Run a containerized TensorFlow.js application using Docker.
Implement face detection in a web application with TensorFlow.js.
Construct a Dockerfile for a TensorFlow.js web application.
Use Docker Compose for real-time application development and updates.
Share your Docker image on Docker Hub to facilitate deployment and extend reach.

Acknowledgment

Docker would like to thank Harsh Manvar for his contribution to this guide.

Prerequisites

You have installed the latest version of Docker Desktop.
You have a Git client. The examples in this guide use a command-line Git client, but you can use any client.

What is TensorFlow.js?

TensorFlow.js is an open-source JavaScript library for machine learning that lets you train and deploy ML models in the browser or on Node.js. It supports creating new models from scratch or using pre-trained models, enabling a wide range of ML applications directly in web environments. TensorFlow.js offers efficient computation, making sophisticated ML tasks accessible to web developers without deep ML expertise.

Why use TensorFlow.js and Docker together?

Environment consistency and simplified deployment: Docker packages TensorFlow.js applications and their dependencies into containers, ensuring consistent runs across all environments and simplifying deployment.
Efficient development and easy scaling: Docker improves development efficiency with features like hot reloading and makes it easier to scale TensorFlow.js applications using orchestration tools like Kubernetes.
Isolation and enhanced security: Docker isolates TensorFlow.js applications in secure environments, minimizing conflicts and security vulnerabilities while running applications with limited permissions.

Get and run the sample application

In a terminal, clone the sample application using the following command.

$ git clone https://github.com/harsh4870/TensorJS-Face-Detection

After cloning the application, you'll notice that it has a Dockerfile. This Dockerfile lets you build and run the application locally with nothing more than Docker.

Before you can run the application as a container, you must build it into an image. Run the following command inside the TensorJS-Face-Detection directory to build an image named face-detection-tensorjs.

$ docker build -t face-detection-tensorjs .

The command builds the application into an image. Depending on your network connection, downloading the necessary components can take several minutes the first time you run the command.

To run the image as a container, run the following command in a terminal.

$ docker run -p 80:80 face-detection-tensorjs

The command runs the container and maps port 80 in the container to port 80 on your system.

Once the application is running, open a web browser and access the application at http://localhost:80. You may need to grant access to your webcam for the application.

In the web application, you can change the backend to use one of the following:

WASM
WebGL
CPU

To stop the application, press ctrl+c in the terminal.

About the application

The sample application performs real-time face detection using MediaPipe, a comprehensive framework for building multimodal machine learning pipelines. It specifically uses the BlazeFace model, a lightweight model for detecting faces in images.

In the context of TensorFlow.js or similar web-based machine learning frameworks, the WASM, WebGL, and CPU backends can be used to execute operations. Each of these backends uses different resources and technologies available in modern browsers and has its own strengths and limitations. The following sections briefly describe each backend.

WASM

WebAssembly (WASM) is a low-level, assembly-like language with a compact binary format that runs at near-native speed in web browsers. It allows code written in languages like C/C++ to be compiled into a binary that can be executed in the browser.

It's a good choice when high performance is required and either the WebGL backend isn't supported or you want consistent performance across all devices without relying on the GPU.
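
As a rough illustration of how a page could check that the runtime supports WebAssembly at all, the following sketch validates the smallest possible module, which consists only of the 8-byte header. The function name hasWasmSupport is hypothetical and is not part of the sample application. TensorFlow.js runs its own checks when the wasm backend is initialized, so this is a sketch of the idea rather than what the library does.

```javascript
// Hypothetical helper (not in the sample app): returns true when the
// runtime exposes the WebAssembly API and accepts a minimal module.
function hasWasmSupport() {
  if (typeof WebAssembly !== "object" || typeof WebAssembly.validate !== "function") {
    return false;
  }
  // "\0asm" magic number followed by binary format version 1.
  const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
  return WebAssembly.validate(emptyModule);
}

console.log("WebAssembly available:", hasWasmSupport());
```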

WebGL

WebGL is a browser API that enables GPU-accelerated physics, image processing, and effects as part of the web page canvas.

WebGL is well suited to operations that are parallelizable and can benefit significantly from GPU acceleration, such as the matrix multiplications and convolutions commonly found in deep learning models.
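
Whether WebGL is usable depends on the browser and the graphics driver. A minimal sketch of a feature check follows; it assumes that asking a canvas for a "webgl2" or "webgl" context is a good enough signal. The helper name hasWebGLSupport and its createCanvas parameter are hypothetical and do not appear in the sample application.

```javascript
// Hypothetical helper (not in the sample app). `createCanvas` is injectable
// so the check can also run outside a browser, for example in tests.
function hasWebGLSupport(createCanvas = () => document.createElement("canvas")) {
  try {
    const canvas = createCanvas();
    // getContext returns null when the requested context type is unavailable.
    return Boolean(canvas.getContext("webgl2") || canvas.getContext("webgl"));
  } catch (err) {
    // No DOM available, or creating the context threw.
    return false;
  }
}
```

In a browser, calling hasWebGLSupport() with no arguments is enough. The injected factory exists only to keep the sketch testable without a DOM.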

CPU

The CPU backend uses pure JavaScript execution on the device's central processing unit (CPU). This backend is the most universally compatible and serves as a fallback when the WebGL and WASM backends aren't available or suitable.
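
The fallback idea described in these sections can be sketched as a loop that tries backends in order of preference and keeps the first one that initializes. This is not how the sample application chooses its backend (it starts with wasm and lets the user switch). The function name setFirstWorkingBackend and the default order are assumptions made for illustration; the only TensorFlow.js call used is tf.setBackend, which returns a promise that resolves to a boolean.

```javascript
// Hypothetical helper (not in the sample app). `tf` is passed in so the
// function works with the global from the <script> tags or with a stub.
async function setFirstWorkingBackend(tf, candidates = ["webgl", "wasm", "cpu"]) {
  for (const name of candidates) {
    try {
      // tf.setBackend resolves to false when the backend can't be initialized.
      if (await tf.setBackend(name)) {
        return name;
      }
    } catch (err) {
      // Treat a thrown error the same as a failed initialization.
    }
  }
  throw new Error(`None of the backends could be initialized: ${candidates.join(", ")}`);
}
```

In the browser, setFirstWorkingBackend(tf).then((name) => console.log("Using backend:", name)) would log the backend that was selected.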

Explore the application's code

Explore the purpose of each file and its contents in the following sections.

The index.html file

The index.html file serves as the frontend for the web application that uses TensorFlow.js for real-time face detection from the webcam video feed. It incorporates several technologies and libraries to run machine learning directly in the browser. It uses several TensorFlow.js libraries, including:

tfjs-core and tfjs-converter for core TensorFlow.js functionality and model conversion.
tfjs-backend-webgl, tfjs-backend-cpu, and the tf-backend-wasm script for the different computational backends that TensorFlow.js can use for processing. These backends let the application perform machine learning tasks efficiently by using the user's hardware capabilities.
The BlazeFace library, a TensorFlow model for face detection.

It also uses the following additional libraries:

dat.GUI for creating a graphical interface to interact with the application's settings in real time, such as switching between TensorFlow.js backends.
Stats.min.js for displaying performance metrics (like FPS) to monitor the application's efficiency during operation.

<style>
  body {
    margin: 25px;
  }

  .true {
    color: green;
  }

  .false {
    color: red;
  }

  #main {
    position: relative;
    margin: 50px 0;
  }

  canvas {
    position: absolute;
    top: 0;
    left: 0;
  }

  #description {
    margin-top: 20px;
    width: 600px;
  }

  #description-title {
    font-weight: bold;
    font-size: 18px;
  }
</style>

<body>
  <div id="main">
    <video
      id="video"
      playsinline
      style="
      -webkit-transform: scaleX(-1);
      transform: scaleX(-1);
      width: auto;
      height: auto;
      "
    ></video>
    <canvas id="output"></canvas>
    <video
      id="video"
      playsinline
      style="
      -webkit-transform: scaleX(-1);
      transform: scaleX(-1);
      visibility: hidden;
      width: auto;
      height: auto;
      "
    ></video>
  </div>
</body>
<script src="https://unpkg.com/@tensorflow/[email protected]/dist/tf-core.js"></script>
<script src="https://unpkg.com/@tensorflow/[email protected]/dist/tf-converter.js"></script>

<script src="https://unpkg.com/@tensorflow/[email protected]/dist/tf-backend-webgl.js"></script>
<script src="https://unpkg.com/@tensorflow/[email protected]/dist/tf-backend-cpu.js"></script>
<script src="./tf-backend-wasm.js"></script>

<script src="https://unpkg.com/@tensorflow-models/[email protected]/dist/blazeface.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/dat-gui/0.7.6/dat.gui.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/stats.js/r16/Stats.min.js"></script>
<script src="./index.js"></script>

The index.js file

The index.js file contains the face detection logic. It demonstrates several advanced concepts in web development and machine learning integration. Here's a breakdown of some of its key components and functionalities:

Stats.js: The script starts by creating a Stats instance to monitor and display the frame rate (FPS) of the application in real time. This is helpful for performance analysis, especially when testing the impact of different TensorFlow.js backends on the application's speed.
TensorFlow.js: The application allows users to switch between different computation backends (wasm, webgl, and cpu) for TensorFlow.js through a graphical interface provided by dat.GUI. Changing the backend can affect performance and compatibility depending on the device and browser. The addFlagLables function (spelled this way in index.js) dynamically checks and displays whether SIMD (Single Instruction, Multiple Data) and multithreading are supported, which is relevant for performance optimization in the wasm backend.
setupCamera function: Initializes the user's webcam using the MediaDevices Web API. It configures the video stream to not include audio and to use the front-facing camera (facingMode: 'user'). Once the video metadata is loaded, it resolves a promise with the video element, which is then used for face detection.
BlazeFace: The core of this application is the renderPrediction function, which performs real-time face detection using the BlazeFace model, a lightweight model for detecting faces in images. The function calls model.estimateFaces on each animation frame to detect faces from the video feed. For each detected face, it fills a semi-transparent red rectangle over the face and draws small blue squares at the facial landmarks on a canvas overlaying the video.
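
The setupCamera function described in the list above doesn't handle the case where getUserMedia rejects, for example when the user denies webcam access. The following sketch shows one way that case could be reported. The names describeCameraError and startCameraOrExplain are hypothetical and are not part of index.js; the error names checked are the ones getUserMedia commonly rejects with, and the messages are placeholders.

```javascript
// Hypothetical helper (not in index.js): turns a getUserMedia rejection
// into a message that can be shown to the user.
function describeCameraError(err) {
  switch (err && err.name) {
    case "NotAllowedError":
      return "Camera access was denied. Allow it in the browser and reload.";
    case "NotFoundError":
      return "No camera was found on this device.";
    case "NotReadableError":
      return "The camera is unavailable, possibly in use by another application.";
    default:
      return `Could not start the camera: ${err && err.message ? err.message : "unknown error"}`;
  }
}

// Sketch of a wrapper around the setupCamera function from index.js.
// Resolves to the video element, or to null after reporting the problem.
async function startCameraOrExplain(setupCamera, report = console.error) {
  try {
    return await setupCamera();
  } catch (err) {
    report(describeCameraError(err));
    return null;
  }
}
```

With such a wrapper, setupPage could stop early when the result is null instead of failing with an unhandled promise rejection.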

const stats = new Stats();
stats.showPanel(0);
document.body.prepend(stats.domElement);

let model, ctx, videoWidth, videoHeight, video, canvas;

const state = {
  backend: "wasm",
};

const gui = new dat.GUI();
gui
  .add(state, "backend", ["wasm", "webgl", "cpu"])
  .onChange(async (backend) => {
    await tf.setBackend(backend);
    addFlagLables();
  });

async function addFlagLables() {
  if (!document.querySelector("#simd_supported")) {
    const simdSupportLabel = document.createElement("div");
    simdSupportLabel.id = "simd_supported";
    simdSupportLabel.style = "font-weight: bold";
    const simdSupported = await tf.env().getAsync("WASM_HAS_SIMD_SUPPORT");
    simdSupportLabel.innerHTML = `SIMD supported: <span class=${simdSupported}>${simdSupported}</span>`;
    document.querySelector("#description").appendChild(simdSupportLabel);
  }

  if (!document.querySelector("#threads_supported")) {
    const threadSupportLabel = document.createElement("div");
    threadSupportLabel.id = "threads_supported";
    threadSupportLabel.style = "font-weight: bold";
    const threadsSupported = await tf
      .env()
      .getAsync("WASM_HAS_MULTITHREAD_SUPPORT");
    threadSupportLabel.innerHTML = `Threads supported: <span class=${threadsSupported}>${threadsSupported}</span>`;
    document.querySelector("#description").appendChild(threadSupportLabel);
  }
}

async function setupCamera() {
  video = document.getElementById("video");

  const stream = await navigator.mediaDevices.getUserMedia({
    audio: false,
    video: { facingMode: "user" },
  });
  video.srcObject = stream;

  return new Promise((resolve) => {
    video.onloadedmetadata = () => {
      resolve(video);
    };
  });
}

const renderPrediction = async () => {
  stats.begin();

  const returnTensors = false;
  const flipHorizontal = true;
  const annotateBoxes = true;
  const predictions = await model.estimateFaces(
    video,
    returnTensors,
    flipHorizontal,
    annotateBoxes,
  );

  if (predictions.length > 0) {
    ctx.clearRect(0, 0, canvas.width, canvas.height);

    for (let i = 0; i < predictions.length; i++) {
      if (returnTensors) {
        predictions[i].topLeft = predictions[i].topLeft.arraySync();
        predictions[i].bottomRight = predictions[i].bottomRight.arraySync();
        if (annotateBoxes) {
          predictions[i].landmarks = predictions[i].landmarks.arraySync();
        }
      }

      const start = predictions[i].topLeft;
      const end = predictions[i].bottomRight;
      const size = [end[0] - start[0], end[1] - start[1]];
      ctx.fillStyle = "rgba(255, 0, 0, 0.5)";
      ctx.fillRect(start[0], start[1], size[0], size[1]);

      if (annotateBoxes) {
        const landmarks = predictions[i].landmarks;

        ctx.fillStyle = "blue";
        for (let j = 0; j < landmarks.length; j++) {
          const x = landmarks[j][0];
          const y = landmarks[j][1];
          ctx.fillRect(x, y, 5, 5);
        }
      }
    }
  }

  stats.end();

  requestAnimationFrame(renderPrediction);
};

const setupPage = async () => {
  await tf.setBackend(state.backend);
  addFlagLables();
  await setupCamera();
  video.play();

  videoWidth = video.videoWidth;
  videoHeight = video.videoHeight;
  video.width = videoWidth;
  video.height = videoHeight;

  canvas = document.getElementById("output");
  canvas.width = videoWidth;
  canvas.height = videoHeight;
  ctx = canvas.getContext("2d");
  ctx.fillStyle = "rgba(255, 0, 0, 0.5)";

  model = await blazeface.load();

  renderPrediction();
};

setupPage();
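
The rectangle arithmetic inside renderPrediction can be isolated into a small pure function, which makes it easier to test without a camera. The sketch below is not part of index.js, and the name predictionToRect is hypothetical. It assumes the coordinates arrive as plain arrays (returnTensors set to false) and normalizes the corners, on the assumption that a horizontally flipped prediction may report its corners in mirrored order.

```javascript
// Hypothetical helper (not in index.js): converts one prediction with
// array coordinates into a rectangle with non-negative width and height.
function predictionToRect(prediction) {
  const [x1, y1] = prediction.topLeft;
  const [x2, y2] = prediction.bottomRight;
  return {
    x: Math.min(x1, x2),
    y: Math.min(y1, y2),
    width: Math.abs(x2 - x1),
    height: Math.abs(y2 - y1),
  };
}

// Example with made-up coordinates.
const rect = predictionToRect({ topLeft: [120, 80], bottomRight: [220, 210] });
console.log(rect); // { x: 120, y: 80, width: 100, height: 130 }
```

The result can be passed to the canvas as ctx.fillRect(rect.x, rect.y, rect.width, rect.height).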

The tf-backend-wasm.js file

The tf-backend-wasm.js file is part of the TensorFlow.js library. It contains initialization logic for the TensorFlow.js WASM backend, some utilities for interacting with the WASM binaries, and functions to set custom paths for the WASM binaries.

The tfjs-backend-wasm-simd.wasm file

The tfjs-backend-wasm-simd.wasm file is part of the TensorFlow.js library. It's a WASM binary that's used for the WebAssembly backend, specifically optimized to utilize SIMD (Single Instruction, Multiple Data) instructions.
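
To illustrate why more than one WASM binary can exist, the following sketch maps the two capability flags that index.js displays to a binary file name. It is an illustration only: the actual selection happens inside tf-backend-wasm.js, the function name wasmBinaryFor is hypothetical, and the file names other than tfjs-backend-wasm-simd.wasm are assumptions based on recent tfjs-backend-wasm releases that may not match the version bundled with the sample application.

```javascript
// Illustration only (not the library's code): picks a binary name from
// the SIMD and multithreading capabilities of the browser.
function wasmBinaryFor({ simd, threads }) {
  if (simd && threads) return "tfjs-backend-wasm-threaded-simd.wasm"; // assumed name
  if (simd) return "tfjs-backend-wasm-simd.wasm";
  return "tfjs-backend-wasm.wasm"; // assumed name
}

// In the browser, the two flags could come from the same calls index.js uses:
//   const simd = await tf.env().getAsync("WASM_HAS_SIMD_SUPPORT");
//   const threads = await tf.env().getAsync("WASM_HAS_MULTITHREAD_SUPPORT");
console.log(wasmBinaryFor({ simd: true, threads: false }));
```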

Explore the Dockerfile

In a Docker-based project, the Dockerfile serves as the foundational asset for building your application's environment.

A Dockerfile is a text file that instructs Docker how to create an image of your application's environment. An image contains everything you want and need when running an application, such as files, packages, and tools.

The following is the Dockerfile for this project.

FROM nginx:stable-alpine3.17-slim
WORKDIR /usr/share/nginx/html
COPY . .

This Dockerfile defines an image that serves static content using Nginx from an Alpine Linux base image.

Develop with Compose

Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application's services, networks, and volumes. In this case, the application isn't a multi-container application, but Docker Compose has other useful features for development, like Compose Watch.

The sample application doesn't have a Compose file yet. To create a Compose file, in the TensorJS-Face-Detection directory, create a text file named compose.yaml and then add the following contents.

services:
  server:
    build:
      context: .
    ports:
      - 80:80
    develop:
      watch:
        - action: sync
          path: .
          target: /usr/share/nginx/html

This Compose file defines a service that is built using the Dockerfile in the same directory. It maps port 80 on the host to port 80 in the container. It also has a develop subsection with the watch attribute that defines a list of rules that control automatic service updates based on local file changes. For more details about the Compose instructions, see the Compose file reference.

Save the changes to your compose.yaml file and then run the following command to run the application.

$ docker compose watch

Once the application is running, open a web browser and access the application at http://localhost:80. You may need to grant access to your webcam for the application.

Now you can make changes to the source code and see the changes automatically reflected in the container without having to rebuild and rerun the container.

Open the index.js file and update the landmark points to be green instead of blue on line 83.

-        ctx.fillStyle = "blue";
+        ctx.fillStyle = "green";

Save the changes to the index.js file and then refresh the browser page. The landmark points should now appear green.

To stop the application, press ctrl+c in the terminal.

Share your image

Publishing your Docker image on Docker Hub makes it easier for others to deploy your application and integrate it into their own projects. To share your image:

1. Sign up or sign in to Docker Hub.
2. Rebuild your image to include the changes to your application. This time, prefix the image name with your Docker ID. Docker uses the name to determine which repository to push it to. Open a terminal and run the following command in the TensorJS-Face-Detection directory. Replace YOUR-USER-NAME with your Docker ID.
   $ docker build -t YOUR-USER-NAME/face-detection-tensorjs .
3. Run the following docker push command to push the image to Docker Hub. Replace YOUR-USER-NAME with your Docker ID.
   $ docker push YOUR-USER-NAME/face-detection-tensorjs
4. Verify that you pushed the image to Docker Hub.
   1. Go to Docker Hub.
   2. Select My Hub > Repositories.
   3. View the Last pushed time for your repository.

Other users can now download and run your image using the docker run command. They need to replace YOUR-USER-NAME with your Docker ID.

$ docker run -p 80:80 YOUR-USER-NAME/face-detection-tensorjs

Summary

This guide demonstrated leveraging TensorFlow.js and Docker for face detection in web applications. It highlighted the ease of running containerized TensorFlow.js applications, and developing with Docker Compose for real-time code changes. Additionally, it covered how sharing your Docker image on Docker Hub can streamline deployment for others, enhancing the application's reach within the developer community.

Related information: