Computer vision is becoming a popular branch of software development, with applications processing images and video for all sorts of purposes. This short article covers the basics of working with video, using the detection and blurring (masking) of faces as the example.
OpenCV is one of the most popular frameworks for Computer Vision, but it's written in C/C++, so using it directly would require knowledge of those languages. Some of you may have seen its Python bindings or used it that way, but in this article, we're going to use something a bit more performant – Go.
You could go full hardcore and write your own Go bindings or even C code alongside Go to use OpenCV, but we'll stick to existing bindings – the GoCV project.
For this short exercise, we're going to need:
Download the latest Go installer and install it. While you're at it, make sure you have a C/C++ compiler toolkit present – gcc and g++ should be available. They will be used by the go build tool for compilation.
Start a new Go project by running the following in a directory where the project will be stored:
go mod init example/facemask
Then create a main.go file.
Then get the gocv module:
go get gocv.io/x/gocv
Obtain a sample XML classifier file. This example uses haarcascade_frontalface_default.xml, which is a stump-based 24x24 discrete AdaBoost frontal face detector. It can be downloaded as a gist from here.
Finally, download FFmpeg and add the directory containing ffmpeg to the PATH variable.
The program will consist of the following steps:
Let's read the required input arguments first:
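A minimal sketch of this step, assuming the standard library's flag package and assuming all three paths are mandatory. The options struct and the parseOptions helper are names made up for this illustration; only the --classifier, --input and --output flags come from the command line used later in this article.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the three paths the program needs (hypothetical type).
type options struct {
	Classifier string // path to the Haar cascade XML file
	Input      string // path to the source video
	Output     string // path of the video to write
}

// parseOptions parses args (without the program name) and checks that
// every path was provided.
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("facemask", flag.ContinueOnError)
	fs.StringVar(&opts.Classifier, "classifier", "", "path to the classifier XML file")
	fs.StringVar(&opts.Input, "input", "", "path to the input video")
	fs.StringVar(&opts.Output, "output", "", "path to the output video")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if opts.Classifier == "" || opts.Input == "" || opts.Output == "" {
		return options{}, errors.New("--classifier, --input and --output are all required")
	}
	return opts, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		// A real tool would probably also exit with a non-zero status here.
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Printf("classifier=%s input=%s output=%s\n", opts.Classifier, opts.Input, opts.Output)
}
```

Go's flag package accepts both the single-dash and the double-dash spelling, so --input and -input behave the same.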
This part is pretty clear, so let's move on to the actual work:
Here we open the input as a video capture and determine its width, height, FPS, and frame count. The dimensions and FPS are needed for encoding: we will be passing raw video frames to FFmpeg, so we have to tell it what the output video should look like. The frame count is used only for progress tracking.
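To make the role of those four values concrete, here is a small standard-library sketch, so it runs without OpenCV. The videoInfo type and its methods are hypothetical names; in GoCV the numbers themselves would come from the capture's Get method with property constants such as gocv.VideoCaptureFrameWidth. The 3 bytes per pixel figure assumes 8-bit BGR frames.

```go
package main

import "fmt"

// videoInfo mirrors the four properties read from the capture (hypothetical type).
type videoInfo struct {
	Width, Height int
	FPS           float64
	FrameCount    int
}

// frameBytes returns the size of one raw frame, assuming 3 bytes per
// pixel (8-bit BGR, the channel order OpenCV uses for colour frames).
func (v videoInfo) frameBytes() int {
	return v.Width * v.Height * 3
}

// progress returns the percentage completed after `done` frames.
// It guards against a zero or missing frame count.
func (v videoInfo) progress(done int) float64 {
	if v.FrameCount <= 0 {
		return 0
	}
	return 100 * float64(done) / float64(v.FrameCount)
}

func main() {
	info := videoInfo{Width: 1280, Height: 720, FPS: 30, FrameCount: 900}
	fmt.Printf("%dx%d @ %.0f fps, %d bytes per frame\n",
		info.Width, info.Height, info.FPS, info.frameBytes())
	fmt.Printf("after 450 frames: %.1f%%\n", info.progress(450))
}
```

The frame size matters because the encoder receives a headerless byte stream: it can only split that stream back into frames if it knows exactly how many bytes each one occupies.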
Next, we load the cascade classifier from the provided XML file.
Then we set up a child ffmpeg process, specifying all the required parameters. We take over its stdin as a pipe so we can write raw frames to it, and finally we start it. The ffmpeg process will then wait for us to write to its input stream and, at the end, close it.
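One way this setup might look, as a sketch rather than the exact code of this program. The ffmpegArgs and startEncoder helpers are hypothetical names, and the bgr24 input format, the libx264 encoder and the yuv420p output format are assumptions; your FFmpeg build needs to include libx264 for this particular command to work.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
	"strings"
)

// ffmpegArgs describes the raw input stream to ffmpeg and names the output.
// bgr24, libx264 and yuv420p are assumptions of this sketch.
func ffmpegArgs(width, height int, fps float64, output string) []string {
	return []string{
		"-y",             // overwrite the output file if it exists
		"-f", "rawvideo", // the input is headerless raw frames
		"-pix_fmt", "bgr24", // 3 bytes per pixel: blue, green, red
		"-s", fmt.Sprintf("%dx%d", width, height), // frame size
		"-r", fmt.Sprintf("%g", fps), // frame rate
		"-i", "-", // read the input from stdin
		"-c:v", "libx264", // encode as H.264
		"-pix_fmt", "yuv420p", // widely supported output pixel format
		output,
	}
}

// startEncoder launches ffmpeg and returns the pipe connected to its stdin.
func startEncoder(width, height int, fps float64, output string) (*exec.Cmd, io.WriteCloser, error) {
	cmd := exec.Command("ffmpeg", ffmpegArgs(width, height, fps, output)...)
	cmd.Stderr = os.Stderr // surface encoder messages
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return nil, nil, err
	}
	if err := cmd.Start(); err != nil {
		stdin.Close()
		return nil, nil, err
	}
	return cmd, stdin, nil
}

func main() {
	// Only print the command here; startEncoder is what would actually run it.
	fmt.Println("ffmpeg", strings.Join(ffmpegArgs(1280, 720, 30, "out.mp4"), " "))
}
```

Note that the options placed before -i describe the incoming raw stream, while those after it apply to the encoded output.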
Now, the processing begins. We prepare an OpenCV matrix to store the frame data in, and start reading input frames. When we reach the end, input.Read will return false. Make sure all GoCV resources are closed when you're done with them, because they internally allocate unmanaged memory.
The processing first detects rectangles in the frame that are considered to be faces, then blurs them by applying a simple Gaussian blur, and finally writes the frame bytes to the ffmpeg pipe.
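The blur-and-write half of that loop can be sketched with the standard library alone. Detection needs OpenCV, so this sketch simply takes the face rectangles as given (in GoCV they would come from the classifier's DetectMultiScale), and it swaps the Gaussian blur for a plain box blur so that it runs without any bindings. All function names here are hypothetical, and the frame is assumed to be packed 3-bytes-per-pixel data.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"io"
)

// blurRegion box-blurs, in place, the part of a packed 3-bytes-per-pixel
// frame that lies inside rect. It is a stand-in for a Gaussian blur.
func blurRegion(frame []byte, width, height int, rect image.Rectangle, radius int) {
	rect = rect.Intersect(image.Rect(0, 0, width, height))
	src := append([]byte(nil), frame...) // snapshot, so output never feeds back into input
	for y := rect.Min.Y; y < rect.Max.Y; y++ {
		for x := rect.Min.X; x < rect.Max.X; x++ {
			var sum [3]int
			n := 0
			for sy := y - radius; sy <= y+radius; sy++ {
				for sx := x - radius; sx <= x+radius; sx++ {
					if !image.Pt(sx, sy).In(rect) {
						continue // only average pixels inside the rectangle
					}
					i := (sy*width + sx) * 3
					for c := 0; c < 3; c++ {
						sum[c] += int(src[i+c])
					}
					n++
				}
			}
			i := (y*width + x) * 3
			for c := 0; c < 3; c++ {
				frame[i+c] = byte(sum[c] / n)
			}
		}
	}
}

// maskAndWrite blurs every face rectangle and writes the frame to out,
// which in the real program would be the pipe to ffmpeg.
func maskAndWrite(frame []byte, width, height int, faces []image.Rectangle, out io.Writer) error {
	for _, face := range faces {
		blurRegion(frame, width, height, face, 1)
	}
	_, err := out.Write(frame)
	return err
}

// demo runs one 4x4 frame with a single bright pixel through maskAndWrite.
func demo() (frame []byte, written int, err error) {
	const w, h = 4, 4
	frame = make([]byte, w*h*3)
	i := (1*w + 1) * 3 // pixel (1, 1)
	frame[i], frame[i+1], frame[i+2] = 255, 255, 255
	var out bytes.Buffer
	err = maskAndWrite(frame, w, h, []image.Rectangle{image.Rect(0, 0, 2, 2)}, &out)
	return frame, out.Len(), err
}

func main() {
	frame, written, err := demo()
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("wrote", written, "bytes; pixel (1,1) blue channel is now", frame[(1*4+1)*3])
}
```

In the demo, the bright pixel is averaged with its three dark neighbours inside the 2x2 rectangle, while everything outside the rectangle is left untouched.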
This is the end of the processing, and of our program as well. By closing the output pipe, we signal to ffmpeg that there's no more data. Then we wait for it to finish encoding and print out a suitable message.
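The close-then-wait handshake is easy to try out without FFmpeg. In this sketch a goroutine stands in for the ffmpeg process and an in-memory io.Pipe stands in for its stdin; consume and feed are hypothetical names. With a real child process, the final wait would be a call to Wait on the exec.Cmd.

```go
package main

import (
	"fmt"
	"io"
)

// consume stands in for ffmpeg: it reads until EOF and reports the total.
func consume(r io.Reader, done chan<- int64) {
	n, _ := io.Copy(io.Discard, r) // returns once the writing side is closed
	done <- n
}

// feed writes all frames to the consumer, closes the pipe, and waits
// for the consumer to finish. It returns the number of bytes consumed.
func feed(frames [][]byte) int64 {
	r, w := io.Pipe()
	done := make(chan int64)
	go consume(r, done)
	for _, f := range frames {
		if _, err := w.Write(f); err != nil {
			break // the consumer went away; stop writing
		}
	}
	w.Close()     // signals "no more data", like closing ffmpeg's stdin
	return <-done // blocks until the consumer is done, like waiting for ffmpeg
}

func main() {
	frames := [][]byte{make([]byte, 48), make([]byte, 48)}
	fmt.Println("consumer received", feed(frames), "bytes")
}
```

Without the Close call the consumer would never see the end of the stream, and the final wait would block forever; the same applies to the real ffmpeg process.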
To build this application, simply run:
go build
And then run the resulting binary, which go build names after the last element of the module path:
facemask.exe --classifier [path] --input [input] --output [output]
As a sample, we will use a stock video with a front-facing person:
Which, after the processing, looks like this:
The detection is by no means perfect: as you can see, it picked up a few false-positive rectangles, but it does well enough given its relative simplicity.
This is a simple example covering some processing, but OpenCV (and GoCV as a Go binding library for it) offers many more built-in functions. And the fact that you can access raw frames as bitmaps (or matrices) allows you to do much more with them.