Computer Vision Project: Detecting and Masking Faces with Go and OpenCV

May 13, 2025 10:00:00 AM

Computer vision is becoming a popular branch of software development, with applications processing images and video for various purposes. This short article covers the basics of working with video, using the detection and blurring (masking) of faces as the example.

Introduction

OpenCV is one of the most popular frameworks for Computer Vision, but it's written in C/C++, so using it directly would require knowledge of those languages. Some of you may have seen its Python bindings or used it that way, but in this article, we're going to use something a bit more performant – Go.

You could go full hardcore and write your own Go bindings or even C code alongside Go to use OpenCV, but we'll stick to existing bindings – the GoCV project.

Prerequisites

For this short exercise, we're going to need:

Go
C/C++ compiler toolkit (e.g. MinGW64)
Sample video file with faces
Cascade XML classifier file
FFmpeg

Project Setup

Download the latest Go installer and install it. While you're at it, make sure you have a C/C++ compiler toolkit present – gcc and g++ should be available. They will be used by the go build tool for compilation.

Start a new Go project by running the following in a directory where the project will be stored:

go mod init example/facemask

Then create a main.go file:

package main

func main() {

}

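Then get the gocv module (GoCV links against a local OpenCV installation, so follow the GoCV installation guide for your platform if the build cannot find OpenCV):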

go get gocv.io/x/gocv

Obtain a sample XML classifier file. In this sample, we will be using haarcascade_frontalface_default.xml, which is a stump-based 24x24 discrete AdaBoost frontal face detector. It can be downloaded as a gist from here.

Finally, download FFmpeg and add the directory where ffmpeg is to the PATH variable.

Coding

The program will consist of the following steps:

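Reading args from the command line
Loading input video file
Loading the cascade classifier
Starting FFmpeg as the output encoder
Detecting and blurring faces frame by frame
Closing the stream and waiting for FFmpeg to finish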

Let's read the required input arguments first:

package main

import (
	"flag"
	"fmt"
)

func main() {
	var classifierFilePath string
	var inputFilePath string
	var outputFilePath string

	flag.StringVar(&classifierFilePath, "classifier", "", "Classifier file path")
	flag.StringVar(&inputFilePath, "input", "", "Input file path")
	flag.StringVar(&outputFilePath, "output", "", "Output file path")
	flag.Parse()

	// all three paths are required; print usage and stop if any is missing
	if classifierFilePath == "" || inputFilePath == "" || outputFilePath == "" {
		fmt.Printf("Usage: facemask.exe --classifier [path] --input [path] --output [path]\n")
		fmt.Printf(" classifier: XML classifier path\n")
		fmt.Printf(" input: input video file path\n")
		fmt.Printf(" output: path to output the result file at\n")
		return
	}
}
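If you want the argument handling to be testable on its own, one option is to move it into a function. The following is only a sketch: the options type and the parseArgs helper are hypothetical names that do not appear in the program above, and a flag.FlagSet is used instead of the global flag set so that the function can be called with any argument list.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// options holds the three paths the program needs (hypothetical helper type).
type options struct {
	classifier, input, output string
}

// parseArgs parses args (without the program name) and rejects missing paths.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("facemask", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors
	fs.StringVar(&o.classifier, "classifier", "", "XML classifier path")
	fs.StringVar(&o.input, "input", "", "input video file path")
	fs.StringVar(&o.output, "output", "", "path to output the result file at")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.classifier == "" || o.input == "" || o.output == "" {
		return o, errors.New("classifier, input and output are all required")
	}
	return o, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("classifier=%s input=%s output=%s\n", opts.classifier, opts.input, opts.output)
}
```

Returning an error instead of printing inside the parser keeps the decision about usage text and exit codes in main.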

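This part is pretty clear, so let's move on to the actual work. The snippets that follow go inside main, after the argument check, and they also need "image", "os/exec" and "gocv.io/x/gocv" added to the import block: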

// load input file
input, err := gocv.VideoCaptureFile(inputFilePath)
if err != nil {
	fmt.Printf("error opening input video file: %v\n", err)
	return
}
defer input.Close()

// read dimensions and fps required for encoding
width := int(input.Get(gocv.VideoCaptureFrameWidth))
height := int(input.Get(gocv.VideoCaptureFrameHeight))
fps := input.Get(gocv.VideoCaptureFPS)
frameCount := int(input.Get(gocv.VideoCaptureFrameCount))

Here we're reading the input as a video capture, and determining its width, height, FPS, and frame count. The dimensions and FPS are needed for encoding, as we will be passing raw video frames to FFmpeg, and we need to tell it what the output video is supposed to be like. Frame count is used only for progress tracking.
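To see why the dimensions matter, remember that a raw stream carries no headers: FFmpeg can only split the bytes into frames if it knows how large one frame is. A minimal sketch of that arithmetic, assuming packed bgr24 (three bytes per pixel); the rawFrameSize helper and the 1280x720 resolution are illustrative and not part of the program:

```go
package main

import "fmt"

// bytesPerPixel for bgr24: one byte each for blue, green and red.
const bytesPerPixel = 3

// rawFrameSize returns the byte count of one packed bgr24 frame.
func rawFrameSize(width, height int) int {
	return width * height * bytesPerPixel
}

func main() {
	// 1280x720 at 30 fps is an illustrative example, not the sample video.
	size := rawFrameSize(1280, 720)
	fmt.Printf("one frame: %d bytes\n", size)
	fmt.Printf("one second at 30 fps: %.1f MiB\n", float64(size)*30/(1<<20))
}
```

The numbers also show why the frames are piped to the encoder directly instead of being written to disk uncompressed.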

// load classifier to recognize faces
classifier := gocv.NewCascadeClassifier()
defer classifier.Close()
if !classifier.Load(classifierFilePath) {
	fmt.Printf("error reading cascade file: %v\n", classifierFilePath)
	return
}

Here we're loading up the Cascade classifier based on the provided XML file.

// prepare output stream
cmd := exec.Command(
	"ffmpeg",
	"-y",
	"-f", "rawvideo", // input format is raw
	"-pix_fmt", "bgr24", // opencv outputs bgr24 by default
	"-s", fmt.Sprintf("%dx%d", width, height),
	"-framerate", fmt.Sprintf("%f", fps),
	"-i", "pipe:", // input is stdin
	"-pix_fmt", "yuv420p", // needed to override the input pix_fmt
	"-c:v", "libx264", // x264 codec for output
	"-f", "mp4", // mp4 container
	outputFilePath,
)
output, err := cmd.StdinPipe()
if err != nil {
	fmt.Printf("error creating pipe for output: %v\n", err)
	return
}
err = cmd.Start()
if err != nil {
	fmt.Printf("error starting ffmpeg: %v\n", err)
	return
}

In this section, we're setting up a child ffmpeg process by specifying all the parameters required. We're then taking over its stdin as a pipe so we can write raw frames to it. At the end, we start it. The ffmpeg process will wait for us to write to and finally close the input stream.
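The argument list is easier to inspect if it is built by a separate function. The sketch below uses exactly the flags from the command above; only the ffmpegArgs helper name and the sample values in main are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ffmpegArgs builds the argument list for encoding raw bgr24 frames from stdin.
func ffmpegArgs(width, height int, fps float64, outputPath string) []string {
	return []string{
		"-y",
		// options placed before -i describe the input stream
		"-f", "rawvideo",
		"-pix_fmt", "bgr24",
		"-s", fmt.Sprintf("%dx%d", width, height),
		"-framerate", fmt.Sprintf("%f", fps),
		"-i", "pipe:",
		// options placed after -i describe the output file
		"-pix_fmt", "yuv420p",
		"-c:v", "libx264",
		"-f", "mp4",
		outputPath,
	}
}

func main() {
	// illustrative values; the real program reads them from the capture
	args := ffmpegArgs(640, 480, 25, "out.mp4")
	fmt.Println("ffmpeg " + strings.Join(args, " "))
}
```

Printing the joined command is a handy way to reproduce an encoding problem by hand, and the result can be passed to exec.Command("ffmpeg", args...).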

fmt.Printf("processing %s...\n", inputFilePath)

// process frames one by one
img := gocv.NewMat()
defer img.Close()
frame := 0
for {
	if ok := input.Read(&img); !ok {
		break
	}
	frame++
	if img.Empty() {
		continue
	}

	// print progress, but not too often
	if frame == 1 || frame%10 == 0 || frame == frameCount {
		fmt.Printf("\rprogress: %.2f%%", float64(frame)/float64(frameCount)*100)
	}

	// detect faces
	rects := classifier.DetectMultiScale(img)

	// blur each face on the original image
	for _, r := range rects {
		imgFace := img.Region(r)
		// blur face in place; the region shares memory with img
		gocv.GaussianBlur(imgFace, &imgFace, image.Pt(75, 75), 0, 0, gocv.BorderDefault)
		_ = imgFace.Close()
	}

	// write frame to ffmpeg stdin
	_, err = output.Write(img.ToBytes())
	if err != nil {
		break
	}
}
fmt.Printf("\n")

Now the processing begins. We prepare an OpenCV matrix to store the frame data in and start reading input frames. When we reach the end, input.Read returns false. Make sure all GoCV resources are closed when you're done with them, because they internally allocate unmanaged memory.
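The progress line in the loop divides by frameCount, which is fine as long as the container reports a usable frame count. A hedged sketch of a more defensive version follows; it assumes the count may be zero or only approximate for some files, and the progressLine helper is a hypothetical name, not part of the program.

```go
package main

import "fmt"

// progressLine formats a progress message; total <= 0 means the count is unknown.
func progressLine(frame, total int) string {
	if total <= 0 {
		return fmt.Sprintf("progress: frame %d", frame)
	}
	pct := float64(frame) / float64(total) * 100
	if pct > 100 {
		pct = 100 // assumption: the reported total may be lower than the real one
	}
	return fmt.Sprintf("progress: %.2f%%", pct)
}

func main() {
	fmt.Println(progressLine(50, 200))
	fmt.Println(progressLine(50, 0))
}
```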

The processing first detects rectangles in the frame that are considered to be faces, then it blurs them by applying a simple Gaussian blur, and finally, it writes the frame bytes to the ffmpeg pipe.
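Detected rectangles tend to hug the face quite tightly, so you may want to enlarge them a little before blurring. Since GoCV reports detections as image.Rectangle values, the standard image package is enough for that. This is a sketch of one possible approach, not something the program above does; padAndClamp, the 20 pixel padding and the example rectangles are all illustrative.

```go
package main

import (
	"fmt"
	"image"
)

// Example values only: a 640x480 frame and a face near its top right corner.
var (
	exampleFrame = image.Rect(0, 0, 640, 480)
	exampleFace  = image.Rect(600, 10, 640, 50)
)

// padAndClamp grows r by pad pixels on every side, then clips it to bounds
// so the result can never reach outside the frame.
func padAndClamp(r image.Rectangle, pad int, bounds image.Rectangle) image.Rectangle {
	return r.Inset(-pad).Intersect(bounds)
}

func main() {
	fmt.Println(padAndClamp(exampleFace, 20, exampleFrame))
}
```

In the loop, the clamped rectangle would be passed to img.Region in place of r, with the frame bounds built from the width and height read earlier.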

// close stdin to indicate EOF
_ = output.Close()

// wait for encoding
err = cmd.Wait()
if err != nil {
	fmt.Printf("error during processing: %v\n", err)
}
fmt.Printf("done\n")

This is the end of the processing, and of our program as well. By closing the output pipe, we signal to ffmpeg that there's no more data. Then we wait for it to finish encoding and print a suitable message.
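The close-then-wait order can be tried out without FFmpeg. In the sketch below an in-memory io.Pipe stands in for the child process's stdin, and a goroutine that reads until EOF stands in for the encoder; the feed helper is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"io"
)

// feed writes every frame to a pipe, closes the write end, and returns how
// many bytes the consumer saw before it reached EOF.
func feed(frames [][]byte) (int64, error) {
	r, w := io.Pipe()
	type result struct {
		n   int64
		err error
	}
	done := make(chan result, 1)

	// Stand-in for the encoder: keep reading until EOF, then report.
	go func() {
		n, err := io.Copy(io.Discard, r)
		done <- result{n, err}
	}()

	for _, f := range frames {
		if _, err := w.Write(f); err != nil {
			_ = w.Close()
			return 0, err
		}
	}
	// Without this Close the consumer would wait for more data forever.
	if err := w.Close(); err != nil {
		return 0, err
	}
	res := <-done // plays the role of cmd.Wait()
	return res.n, res.err
}

func main() {
	n, err := feed([][]byte{make([]byte, 12), make([]byte, 12)})
	fmt.Println(n, err)
}
```

Swapping the two steps, waiting first and closing afterwards, would deadlock in the same way the real program would hang on cmd.Wait.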

Building and Running

To build this application, simply run:

go build

And then:

facemask.exe --classifier [path] --input [input] --output [output]

How It Looks

As a sample, we will use a stock video with a front-facing person:

Which, after the processing, looks like this:

The detection is by no means perfect, as you can notice it caught a few false positive rectangles, but it does well enough given its relative simplicity.

Closing Thoughts

This is a simple example covering some processing, but OpenCV (and GoCV as a Go binding library for it) offers many more built-in functions. And the fact that you can access raw frames as bitmaps (or matrices) allows you to do much more with them.
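As a small taste of working on raw frames directly, here is a sketch that converts packed BGR bytes to grayscale without any OpenCV call. It assumes 8-bit, three-channel data in blue, green, red order, like the bgr24 frames sent to ffmpeg above; the grayscaleBGR helper and the two sample pixels are illustrative.

```go
package main

import "fmt"

// grayscaleBGR converts packed BGR pixels to gray in place, keeping three
// channels so the buffer size and pixel format stay the same.
func grayscaleBGR(buf []byte) {
	for i := 0; i+2 < len(buf); i += 3 {
		b, g, r := int(buf[i]), int(buf[i+1]), int(buf[i+2])
		// integer approximation of the common 0.114/0.587/0.299 luma weights
		y := byte((114*b + 587*g + 299*r) / 1000)
		buf[i], buf[i+1], buf[i+2] = y, y, y
	}
}

func main() {
	// two sample pixels: pure red and white
	buf := []byte{0, 0, 255, 255, 255, 255}
	grayscaleBGR(buf)
	fmt.Println(buf)
}
```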


Selvedin Dokara