Computer Vision Project: Detecting and Masking Faces with Go and OpenCV

Selvedin Dokara -

Computer vision is becoming a popular branch of software development, with applications that process images and video for various purposes. This short article covers the basics of working with video, using the detection and blurring (masking) of faces as an example.

 

 

Introduction

 

OpenCV is one of the most popular frameworks for Computer Vision, but it's written in C/C++, so using it directly would require knowledge of those languages. Some of you may have seen its Python bindings or used it that way, but in this article, we're going to use something a bit more performant – Go.


You could go full hardcore and write your own Go bindings, or even C code alongside Go, to use OpenCV, but we'll stick to the existing bindings from the GoCV project.

 

 

Prerequisites

 

For this short exercise, we're going to need:

  • Go
  • C/C++ compiler toolkit (e.g. MinGW64)
  • Sample video file with faces
  • Cascade XML classifier file
  • FFmpeg

 

 

Project Setup

 

Download the latest Go installer and install it. While you're at it, make sure you have a C/C++ compiler toolkit present –  gcc  and  g++  should be available. They will be used by the  go  build tool for compilation.

Start a new Go project by running the following in a directory where the project will be stored:

 go mod init example/facemask

 

Then create a  main.go  file:

 package main

 

 func main() {

 

 }

 

Then get the  gocv  module:

 go get gocv.io/x/gocv

 

Obtain a sample XML classifier file. In this sample, we will be using haarcascade_frontalface_default.xml, which is a stump-based 24x24 discrete AdaBoost frontal face detector. It can be downloaded as a gist from here. The same file also ships with the OpenCV sources, in the data/haarcascades directory.

 

Finally, download FFmpeg and add the directory where  ffmpeg  is to the  PATH  variable.

 

 

Coding

 

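The program will consist of the following steps:

  • Reading args from the command line
  • Loading input video file
  • Loading the cascade classifier
  • Starting an ffmpeg process for encoding the output
  • Detecting and blurring faces in each frame
  • Writing the processed frames to ffmpeg and waiting for it to finish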

 

Let's read the required input arguments first:

 package main

 import (
     "flag"
     "fmt"
 )

 func main() {
     var classifierFilePath string
     var inputFilePath string
     var outputFilePath string
     flag.StringVar(&classifierFilePath, "classifier", "", "Classifier file path")
     flag.StringVar(&inputFilePath, "input", "", "Input file path")
     flag.StringVar(&outputFilePath, "output", "", "Output file path")
     flag.Parse()

     // all three paths are required; print usage and stop if any is missing
     if classifierFilePath == "" || inputFilePath == "" || outputFilePath == "" {
         fmt.Printf("Usage: facemask.exe --classifier [path] --input [path] --output [path]\n")
         fmt.Printf("       classifier: XML classifier path\n")
         fmt.Printf("       input: input video file path\n")
         fmt.Printf("       output: path to output the result file at\n")
         return
     }
 }

 

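This part is pretty clear, so let's move on to the actual work. The snippets below go inside main, after the argument check, and additionally need the image, os/exec, and gocv.io/x/gocv imports: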

 // load input file
 input, err := gocv.VideoCaptureFile(inputFilePath)

 if err != nil {
     fmt.Printf("error opening input video file: %v\n", err)
     return
 }
 defer input.Close()

 // read dimensions and fps required for encoding
 width := int(input.Get(gocv.VideoCaptureFrameWidth))
 height := int(input.Get(gocv.VideoCaptureFrameHeight))
 fps := input.Get(gocv.VideoCaptureFPS)
 frameCount := int(input.Get(gocv.VideoCaptureFrameCount))

 

Here we're reading the input as a video capture, and determining its width, height, FPS, and frame count. The dimensions and FPS are needed for encoding, as we will be passing raw video frames to FFmpeg, and we need to tell it what the output video is supposed to be like. Frame count is used only for progress tracking.
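
To make it concrete why FFmpeg has to be told the dimensions, here is a minimal sketch of the arithmetic behind a raw frame. A raw bgr24 stream has no headers, so the only way to find frame boundaries is to know that each frame is exactly width × height × 3 bytes. The helper name rawFrameSize and the 1280x720 at 30 FPS figures are assumptions chosen for illustration, not part of the program above.

```go
package main

import "fmt"

// rawFrameSize returns the number of bytes in one packed raw frame.
// For bgr24, bytesPerPixel is 3 (blue, green, red), with no padding.
// This is a hypothetical helper used only for illustration.
func rawFrameSize(width, height, bytesPerPixel int) int {
	return width * height * bytesPerPixel
}

func main() {
	// Assumed example input: 1280x720 video at 30 frames per second.
	size := rawFrameSize(1280, 720, 3)
	fmt.Printf("bytes per frame: %d\n", size)     // prints 2764800
	fmt.Printf("bytes per second: %d\n", size*30) // prints 82944000
}
```

If the size passed to FFmpeg does not match what is actually written, the frame boundaries drift and the output comes out scrambled, which is why the values are read from the input rather than hard-coded.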

 

 // load classifier to recognize faces
 classifier := gocv.NewCascadeClassifier()
 defer classifier.Close()
 if !classifier.Load(classifierFilePath) {
     fmt.Printf("error reading cascade file: %v\n", classifierFilePath)
     return
 }

 

Here we're loading up the Cascade classifier based on the provided XML file.

 

 // prepare output stream
 cmd := exec.Command(
     "ffmpeg",
     "-y",
     "-f", "rawvideo", // input format is raw
     "-pix_fmt", "bgr24", // opencv outputs bgr24 by default
     "-s", fmt.Sprintf("%dx%d", width, height),
     "-framerate", fmt.Sprintf("%f", fps),
     "-i", "pipe:", // input is stdin
     "-pix_fmt", "yuv420p", // needed to override the input pix_fmt
     "-c:v", "libx264", // x264 codec for output
     "-f", "mp4", // mp4 container
     outputFilePath,
 )
 output, err := cmd.StdinPipe()
 if err != nil {
     fmt.Printf("error creating pipe for output: %v\n", err)
     return
 }
 err = cmd.Start()
 if err != nil {
     fmt.Printf("error starting ffmpeg: %v\n", err)
     return
 }

 

In this section, we're setting up a child  ffmpeg  process by specifying all the parameters required. We're then taking over its  stdin  as a pipe so we can write raw frames to it. At the end, we start it. The  ffmpeg  process will wait for us to write to and finally close the input stream.
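
One way to keep the FFmpeg invocation easy to inspect is to build the argument list in a separate function. The sketch below does that with the same flags as above; the function name ffmpegArgs and the sample values in main are assumptions for illustration. It only prints the command line and does not start FFmpeg.

```go
package main

import (
	"fmt"
	"strings"
)

// ffmpegArgs is a hypothetical helper that returns the arguments used in
// this article: raw bgr24 frames on stdin, H.264 in an MP4 container out.
func ffmpegArgs(width, height int, fps float64, outputPath string) []string {
	return []string{
		"-y",             // overwrite the output file without asking
		"-f", "rawvideo", // input format is raw
		"-pix_fmt", "bgr24", // pixel layout of the frames we write
		"-s", fmt.Sprintf("%dx%d", width, height),
		"-framerate", fmt.Sprintf("%f", fps),
		"-i", "pipe:", // input is stdin
		"-pix_fmt", "yuv420p", // output pixel format
		"-c:v", "libx264", // x264 codec for output
		"-f", "mp4", // mp4 container
		outputPath,
	}
}

func main() {
	// Assumed sample values; the real program reads them from the input video.
	args := ffmpegArgs(1280, 720, 29.97, "out.mp4")
	fmt.Println("ffmpeg " + strings.Join(args, " "))
}
```

The result can be passed on as exec.Command("ffmpeg", args...), and printing it first makes it straightforward to reproduce a failing encode by hand.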

 

 fmt.Printf("processing %s...\n", inputFilePath)

 

 // read, process, and encode frames one by one
 img := gocv.NewMat()
 defer img.Close()
 frame := 0
 for {
     if ok := input.Read(&img); !ok {
         break
     }
     frame++
     if img.Empty() {
         continue
     }

     // print progress, but not too often
     if frame == 1 || frame%10 == 0 || frame == frameCount {
         fmt.Printf("\rprogress: %.2f%%", float64(frame)/float64(frameCount)*100)
     }

     // detect faces
     rects := classifier.DetectMultiScale(img)

     // blur each face on the original image
     for _, r := range rects {
         imgFace := img.Region(r)

         // blur face
         gocv.GaussianBlur(imgFace, &imgFace, image.Pt(75, 75), 0, 0, gocv.BorderDefault)
         _ = imgFace.Close()
     }

     // write frame to ffmpeg stdin
     _, err = output.Write(img.ToBytes())
     if err != nil {
         fmt.Printf("\nerror writing frame to ffmpeg: %v\n", err)
         break
     }
 }
 fmt.Printf("\n")

 

Now, the processing begins. We prepare an OpenCV matrix to store the frame data in, and start reading input frames. When we reach the end, input.Read will return false. Make sure all GoCV resources are closed when you're done with them, because they internally allocate unmanaged memory.
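
The progress line in the loop divides by the frame count reported by the input. For some files that value can be zero or only an estimate, so a slightly more defensive calculation may be worth having. The following is a sketch under that assumption; progressPercent is a hypothetical helper, not something the program above defines.

```go
package main

import "fmt"

// progressPercent returns the completed share of frames as a percentage.
// It returns 0 when the total is unknown (zero or negative) and caps the
// result at 100 in case the reported total was an underestimate.
func progressPercent(frame, frameCount int) float64 {
	if frameCount <= 0 {
		return 0
	}
	p := float64(frame) / float64(frameCount) * 100
	if p > 100 {
		p = 100
	}
	return p
}

func main() {
	fmt.Printf("%.2f%%\n", progressPercent(50, 200)) // prints 25.00%
	fmt.Printf("%.2f%%\n", progressPercent(5, 0))    // prints 0.00%
}
```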

 

The processing first detects rectangles in the frame that are considered to be faces, then it blurs them by applying a simple Gaussian blur, and finally, it writes the frame bytes to the  ffmpeg  pipe.

 

 // close stdin to indicate EOF
 _ = output.Close()

 // wait for encoding
 err = cmd.Wait()
 if err != nil {
     fmt.Printf("error during processing: %v\n", err)
 }

 fmt.Printf("done\n")

 

This is the end of the processing, and of our program as well. By closing the output pipe, we're signaling to ffmpeg that there's no more data. Then we wait for it to complete encoding and print out a suitable message.

 

 

Building and Running

 

To build this application, simply run:

 go build

 

And then run the resulting binary, which go build names after the module (facemask.exe on Windows):

 facemask.exe --classifier [path] --input [path] --output [path]

 

 

How It Looks

 

As a sample, we will use a stock video with a front-facing person:

[Image: input]

Which, after the processing, looks like this:

[Image: output]

 

The detection is by no means perfect, as you can see from the few false-positive rectangles it caught, but it does well enough given its relative simplicity.

 

 

Closing Thoughts

 

This is a simple example covering some processing, but OpenCV (and GoCV as a Go binding library for it) offers many more built-in functions. And the fact that you can access raw frames as bitmaps (or matrices) allows you to do much more with them.