What is NMS in object detection?

Question

cyucoofs Aqsa · Answer

Connecticut is cold. Very cold. Sometimes it’s hard to even get out of bed in the morning. And honestly, without the aid of copious amounts of pumpkin spice lattes and the beautiful sunrise over the crisp autumn leaves, I don’t think I would leave my cozy bed.

But I have work to do. And today that work includes writing a blog post about the Felzenszwalb et al. method for non-maximum suppression.

If you remember, last week we discussed Histogram of Oriented Gradients for Object Detection.

This method can be broken into a 6-step process, including:

After applying these steps you’ll have an object detector that is as smooth as, well, John Coltrane:

(Note: Images utilized in this post were taken from the MIT + CMU Frontal Face Images dataset)

These are the bare minimum steps required to build an object classifier using Histogram of Oriented Gradients. Extensions to this method exist including Felzenszwalb et al.’s deformable parts model and Malisiewicz et al.’s Exemplar SVM.

However, no matter which HOG + Linear SVM method you choose, you will (with almost 100% certainty) detect multiple bounding boxes surrounding the object in the image.

For example, take a look at the image of Audrey Hepburn at the top of this post. I forked my Python framework for object detection using HOG and a Linear SVM and trained it to detect faces. Clearly, it has found Ms. Hepburn’s face in the image, but the detection fired a total of six times!

While each detection may in fact be valid, I certainly don’t want my classifier to report back to me saying that it found six faces when there is clearly only one face. Like I said, this is a common “problem” when utilizing object detection methods.

In fact, I don’t even want to call it a “problem” at all! It’s a good problem to have. It indicates that your detector is working as expected. It would be far worse if your detector either (1) reported a false positive (i.e. detected a face where one wasn’t) or (2) failed to detect a face.

To fix this situation we’ll need to apply Non-Maximum Suppression (NMS), also called Non-Maxima Suppression.

When I first implemented my Python object detection framework I was unaware of a good Python implementation for Non-Maximum Suppression, so I reached out to my friend Dr. Tomasz Malisiewicz, whom I consider to be the “go to” person on the topic of object detection and HOG.

Tomasz, being the all-knowing authority on the topic, referred me to two implementations in MATLAB which I have since implemented in Python. We’re going to review the first method by Felzenszwalb et al. Then, next week, we’ll review the (faster) non-maximum suppression method implemented by Tomasz himself.

So without further delay, let’s get our hands dirty.

OpenCV and Python versions: This example will run on Python 2.7/Python 3.4+ and OpenCV 2.4.X/OpenCV 3.0+.

Open up a file, name it nms.py, and let’s get started implementing the Felzenszwalb et al. method for non-maximum suppression in Python:

We’ll start on Line 2 by importing a single package, NumPy, which we’ll utilize for numerical processing.

From there we define our non_max_suppression_slow function on Line 5. This function accepts two arguments, the first being our set of bounding boxes in the form of (startX, startY, endX, endY) and the second being our overlap threshold. I’ll discuss the overlap threshold a little later on in this post.

Lines 7 and 8 make a quick check on the bounding boxes. If there are no bounding boxes in the list, simply return an empty list back to the caller.

From there, we initialize our list of picked bounding boxes (i.e. the bounding boxes that we would like to keep, discarding the rest) on Line 11.

Let’s go ahead and unpack the (x, y) coordinates for each corner of the bounding box on Lines 14-17 — this is done using simple NumPy array slicing.

Then we compute the area of each of the bounding boxes on Line 21 using our sliced (x, y) coordinates.

Be sure to pay close attention to Line 22. We apply np.argsort to grab the indexes of the sorted coordinates of the bottom-right y-coordinate of the bounding boxes. It is absolutely critical that we sort according to the bottom-right corner as we’ll need to compute the overlap ratio of other bounding boxes later in this function.

Now, let’s get into the meat of the non-maxima suppression function:

We start looping over our indexes on Line 26, where we will keep looping until we run out of indexes to examine.

From there we’ll grab the length of the idx list on Line 31, grab the value of the last entry in the idx list on Line 32, append the index i to our list of bounding boxes to keep on Line 33, and finally initialize our suppress list (the list of boxes we want to ignore) with the index of the last entry of the index list on Line 34.

That was a mouthful. And since we’re dealing with indexes into an index list it’s not exactly an easy thing to explain. But definitely pause here and examine this code as it’s important to understand.

Time to compute the overlap ratios and determine which bounding boxes we can ignore:

Here we start looping over the (remaining) indexes in the idx list on Line 37, grabbing the value of the current index on Line 39.

Using the last entry in the idx list from Line 32 and the current entry in the idx list from Line 39, we find the largest (x, y) coordinates for the start of the bounding box and the smallest (x, y) coordinates for the end of the bounding box on Lines 44-47.

Doing this allows us to find the current smallest region inside the larger bounding boxes (and hence why it’s so important that we initially sort our idx list according to the bottom-right y-coordinate). From there, we compute the width and height of the region on Lines 50 and 51.

So now we are at the point where the overlap threshold comes into play. On Line 55 we compute the overlap, which is a ratio defined by the area of the current smallest region divided by the area of the current bounding box, where “current” is defined by the index j on Line 39. Note that this ratio is relative to one box only, so it is not the same quantity as intersection over union.

If the overlap ratio is greater than the threshold on Line 59, then we know that the two bounding boxes sufficiently overlap and we can thus suppress the current bounding box. Common values for overlapThresh are normally between 0.3 and 0.5.

Line 64 then deletes the suppressed bounding boxes from the idx list and we continue looping until the idx list is empty.

Finally, we return the set of picked bounding boxes (the ones that were not suppressed) on Line 67.
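The listing that this walkthrough refers to is not reproduced here, so the "Line N" references above cannot be matched against code on this page. The following is a minimal sketch reconstructed from the description, not the author's original file. The variable names (idxs, pick, suppress) and the +1 inclusive-pixel convention are assumptions.

```python
import numpy as np


def non_max_suppression_slow(boxes, overlapThresh):
    """Greedy NMS sketch; boxes are rows of (startX, startY, endX, endY)."""
    boxes = np.asarray(boxes, dtype=float)
    # Nothing to do if there are no boxes.
    if len(boxes) == 0:
        return []

    pick = []  # indexes of the boxes we keep

    # Slice out the corner coordinates of every box.
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]

    # Area of each box (+1 treats coordinates as inclusive pixel indexes),
    # then sort the indexes by the bottom-right y-coordinate.
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(y2)

    while len(idxs) > 0:
        # Keep the box whose bottom edge is lowest among those remaining.
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)
        suppress = [last]

        for pos in range(last):
            j = idxs[pos]

            # Corners of the intersection rectangle of boxes i and j.
            xx1 = max(x1[i], x1[j])
            yy1 = max(y1[i], y1[j])
            xx2 = min(x2[i], x2[j])
            yy2 = min(y2[i], y2[j])

            # Width and height are clamped to zero when there is no overlap.
            w = max(0, xx2 - xx1 + 1)
            h = max(0, yy2 - yy1 + 1)

            # Overlap is measured relative to the area of box j, not as IoU.
            overlap = (w * h) / area[j]
            if overlap > overlapThresh:
                suppress.append(pos)

        # Drop the picked box and everything it suppressed.
        idxs = np.delete(idxs, suppress)

    return boxes[pick].astype(int)


# Hypothetical detections clustered around one face (illustrative values only).
demo_boxes = [(12, 84, 140, 212), (24, 84, 152, 212), (36, 84, 164, 212),
              (12, 96, 140, 224), (24, 96, 152, 224), (24, 108, 152, 236)]
print(non_max_suppression_slow(demo_boxes, 0.3))
```

Because the inner loop is plain Python, the cost grows quadratically with the number of boxes, which is presumably why the function carries the "slow" suffix.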

Let’s go ahead and create a driver so we can execute this code and see it in action. Open up a new file, name it nms_slow.py, and add the following code:

We start by importing our non_max_suppression_slow function on Line 2. I put this function in the pyimagesearch package for organizational purposes, but you can put the function wherever you see fit. From there, we import NumPy for numerical processing and cv2 for our OpenCV bindings on Lines 3-4.

Then, we define a list of images on Line 8. This list consists of 2-tuples, where the first entry in the tuple is a path to an image and the second entry is the list of bounding boxes. These bounding boxes were obtained from my HOG + Linear SVM classifier detecting potential “faces” at varying locations and scales. Our goal is to take the set of bounding boxes for each image and apply non-maximum suppression.

We start by looping over the image path and bounding boxes on Line 27 and load the image on Line 30.

To visualize the results of non-maximum suppression in action, we first draw the original (non-suppressed) bounding boxes on Lines 34 and 35.

We then apply non-maximum suppression on Line 38 and draw the picked bounding boxes on Lines 42-43.

The resulting images are finally displayed on Lines 46-48.

To see the Felzenszwalb et al. non-maximum suppression method in action, download the source code and accompanying images for this post from the bottom of this page, navigate to the source code directory, and issue the following command:

First, you’ll see the Audrey Hepburn image:

Notice how six bounding boxes were detected, but by applying non-maximum suppression, we are able to prune this number down to one.

The same is true for the second image:

Here we have found three bounding boxes corresponding to the same face, but non-maximum suppression is able to reduce this number to one bounding box.

So far we have only examined images that contain one face. But what about images that contain multiple faces? Let’s take a look:

Even for images that contain multiple objects, non-maximum suppression is able to ignore the smaller overlapping bounding boxes and return only the larger ones. Non-maximum suppression returns two bounding boxes here because the bounding boxes for each face do not overlap at all. And even if they did overlap, the overlap ratio does not exceed the supplied threshold of 0.3.

In this blog post I showed you how to apply the Felzenszwalb et al. method for non-maximum suppression.

When using the Histogram of Oriented Gradients descriptor and a Linear Support Vector Machine for object classification you almost always detect multiple bounding boxes surrounding the object you want to detect.

Instead of returning all of the found bounding boxes you should first apply non-maximum suppression to ignore bounding boxes that significantly overlap each other.

However, there are improvements to be made to the Felzenszwalb et al. method for non-maximum suppression.

Pounikar dfnhjahk Milan · Answer

Computer vision is one of the most prominent fields in data science. Like any other field of data science, its applications have become a part of our personal lives. For example, image classification, pose estimation, and object detection are some of its applications, and we are all surrounded by them.

I was recently studying algorithms for object detection and I came across a very interesting idea that almost all of these algorithms use – Non-Max Suppression (or NMS).

Non-max suppression is the final step of these object detection algorithms and is used to select the most appropriate bounding box for the object.

In this article, I will introduce the concept of non-max suppression, why it is used, and explain how it works in the object detection algorithms.

Object detection is one of the branches of computer vision and is widely used in industry. For example, Facebook uses it to detect faces in uploaded images, and our phones use object detection to enable “face unlock” systems. Object detection involves the following two tasks –

The image below will help you understand the same.

So I hope you have a basic understanding of the concept of object detection. In case you want to study object detection in detail, you can read the following blogs-

There are various algorithms for object detection tasks and these algorithms have evolved in the last decade. To improve the performance further, and capture objects of different shapes and sizes, the algorithms predict multiple bounding boxes, of different sizes and aspect ratios.

But of all the bounding boxes, how is the most appropriate and accurate bounding box selected? This is where NMS comes into the picture.

The objects in the image can be of different sizes and shapes, and to capture each of these perfectly, the object detection algorithms create multiple bounding boxes (left image). Ideally, for each object in the image, we must have a single bounding box, something like the image on the right.

Source: https://pjreddie.com/darknet/yolov1/

To select the best bounding box, from the multiple predicted bounding boxes, these object detection algorithms use non-max suppression. This technique is used to “suppress” the less likely bounding boxes and keep only the best one.

So we now understand why we need NMS and what it is used for. Let us now understand how exactly the concept is implemented.

The purpose of non-max suppression is to select the best bounding box for an object and reject or “suppress” all other bounding boxes. The NMS takes two things into account

As you can see in the image below, along with the bounding boxes, the model returns an objectiveness score (more commonly called an objectness score). This score denotes how certain the model is that the desired object is present in this bounding box.

You can see that all the bounding boxes contain the object, but only the green bounding box is the best one for detecting the object. Now how can we get rid of the other bounding boxes?

The non-max suppression will first select the bounding box with the highest objectiveness score. And then remove all the other boxes with high overlap. So here, in the above image,

The same process goes for the remaining boxes. This process runs iteratively until there is no more reduction of boxes. In the end, we will be left with the following result.

That’s it. That’s how NMS works. To solidify our understanding, let’s write some pseudocode to implement non-max suppression.

By now you would have a good understanding of non-max suppression. Let us break down the process of non-max suppression into steps.

Suppose you built an object detection model to detect the following – Dog or Person. This object detection model has given the following set of bounding boxes along with the objectiveness scores.

The following is the process of selecting the best bounding box using NMS-

Step 1: Select the box with highest objectiveness score

Step 2: Then, compare the overlap (intersection over union) of this box with other boxes

Step 3: Remove the bounding boxes with overlap (intersection over union) >50%

Step 4: Then, move to the next highest objectiveness score

Step 5: Finally, repeat steps 2-4

For our example, this loop will run twice. The below images show the output after different steps.
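The five steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the article's original code. The function names iou and greedy_nms and the six boxes and scores are hypothetical stand-ins for the person/dog example, and boxes are assumed to be in (x1, y1, x2, y2) format.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indexes of the kept boxes, highest score first."""
    # Candidate indexes ordered by decreasing score.
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        # Steps 1 and 4: take the highest-scoring box that is still a candidate.
        best = order.pop(0)
        keep.append(best)
        # Steps 2 and 3: drop every candidate that overlaps it too much.
        order = [k for k in order
                 if iou(boxes[best], boxes[k]) <= iou_threshold]
    return keep


# Hypothetical detections: boxes 0-2 around a person, boxes 3-5 around a dog.
boxes = [(10, 10, 110, 210), (15, 15, 115, 215), (5, 20, 105, 205),
         (200, 100, 300, 200), (205, 105, 305, 205), (195, 95, 295, 195)]
scores = [0.90, 0.80, 0.70, 0.95, 0.60, 0.50]
print(greedy_nms(boxes, scores, iou_threshold=0.5))
```

With these made-up boxes the while loop executes twice, once per object, which mirrors the behaviour described for the example.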

Now that you have a good understanding of non-max suppression and how it works, let us look at a simple implementation of the same. Let us say that we have the same image of person and dog (which we have been using in the previous section) with six bounding boxes and the objectiveness score for each of these bounding boxes.

Let us load the image and plot all the six bounding boxes.

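For this image, we are going to use the non-max suppression function nms() from the torchvision library (available as torchvision.ops.nms). This function requires three parameters: boxes, the bounding box coordinates in (x1, y1, x2, y2) format; scores, the objectiveness score for each box; and iou_threshold, the overlap threshold above which a lower-scoring box is discarded.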

Here, since the above coordinates are in x1, y1, width, height format, we will determine the x2, y2 in the following manner-
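The conversion amounts to adding the width to x1 and the height to y1. A minimal NumPy sketch is shown here. The helper name xywh_to_xyxy and the sample values are hypothetical, and the input is assumed to be an N×4 array of (x1, y1, width, height) rows.

```python
import numpy as np


def xywh_to_xyxy(boxes_xywh):
    """Convert rows of (x1, y1, width, height) to (x1, y1, x2, y2)."""
    boxes_xywh = np.asarray(boxes_xywh, dtype=float)
    # Transposing an (N, 4) array yields four length-N columns.
    x1, y1, w, h = boxes_xywh.T
    # The bottom-right corner is the top-left corner shifted by the size.
    return np.stack([x1, y1, x1 + w, y1 + h], axis=1)


# Made-up boxes purely for illustration.
print(xywh_to_xyxy([[10, 20, 30, 40], [0, 0, 5, 5]]))
```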

This function returns the list of bounding boxes to keep as an output, in decreasing order of objectiveness score. Since I have set a very low threshold, the output has only two boxes. But if you set a higher threshold value, you will get more bounding boxes. In that case, you can then select the top n bounding boxes (where n should be the number of objects in your image).

For our example, this function has returned bounding boxes 1 and 4. Let us plot these on the image to see the final results.

Great! So we have our best bounding boxes for each of the objects in the image. Now this is a very useful technique and is implemented in most of the object detection algorithms. Let us have a look at some of them in the next section.

Almost all object detection algorithms use this technique to get the best bounding boxes from the predicted bounding boxes. The following is a screenshot of the SSD (Single Shot Detector) architecture taken from the research paper –

You can see that at the final step, SSD has 8732 predicted bounding boxes. Further, after these predictions, SSD uses the non-max suppression technique to select the best bounding box for each object in the image.
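For readers wondering where 8732 comes from, the arithmetic can be checked with a few lines. The feature map sizes and default boxes per location below are the SSD300 configuration as given in the SSD paper. They are stated here from the paper rather than from this article, so treat them as an assumption if you are working with a different SSD variant.

```python
# (feature map side length, default boxes per location) for SSD300.
ssd300_layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

# Each layer predicts side * side * n_boxes boxes.
per_layer = [side * side * n_boxes for side, n_boxes in ssd300_layers]
total = sum(per_layer)
print(per_layer, total)
```

Every one of these boxes is a candidate detection, which is why a suppression step is needed before results are reported.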

Similar to SSD, YOLO (You Only Look Once) also uses non-max suppression at the final step. Multiple bounding boxes are predicted to accommodate objects of different sizes and aspect ratios. NMS is then applied to these predictions to select the best bounding box.

mkgwy Shrivastav · Answer

What is Non Max Suppression, and why is it used?

Non max suppression is a technique used mainly in object detection that aims at selecting the best bounding box out of a set of overlapping boxes. In the following image, the aim of non max suppression would be to remove the yellow, and blue boxes, so that we are left with only the green box.

Procedure for calculating NMS:

To get an overview of what a bounding box is, and what IOU means, I have made two posts on the same (Bounding Box, and IOU). The terms and parameters described in the two articles are carried forward in this post. I will first go ahead and describe the procedure of NMS for this particular example, and then explain a more generalized algorithm extending it for different classes.

Explaining the terms used:

Stage 1 (Initial removal of boxes):

Stage 2 (IOU Comparison of boxes):

Algorithm:

Code:

The code below is the basic function to perform Non Max Suppression. The IOU function used in the snippet below is the same function that was used in the previous post (code can be found here). The code below to calculate NMS can be optimized to improve performance.

Final Points:

Lauren-Marie Fadden · Answer

Non Maximum Suppression (NMS) is a technique used in numerous computer vision tasks. It is a class of algorithms to select one entity (e.g., bounding boxes) out of many overlapping entities. We can choose the selection criteria to arrive at the desired results.

Crystle Spione · Answer

Suppose we are building an orange detector which identifies all the oranges growing on a tree. Maybe it will be used by a fruit picking robot. One way to “find” oranges is drawing boxes around orange-like regions of the image and predicting the probability that the box contains an orange.

A common scenario in object detection models is that you end up with many overlapping detections as shown in the image below.

A good bounding box for a given object should contain as much of the object and as little of anything else as possible besides having a high probability predicted for the class of the object. The four sides of an ideal bounding box would be tangential to the uppermost, leftmost, lowermost and rightmost points of the object and have an associated probability prediction of 1 for the class of the object it contains.

Some of the detections above are evidently better than others. For example, if you observe the lowermost orange (with the violet bounding boxes) you can see that some of the boxes only cover about half of the orange. We would like to find a way to keep just the best ones, ideally just one detection per object. We want something that looks more like this:

Typically each detection consists of

(Sometimes it will have other attributes like additional properties and predictions, such as masks in an instance segmentation model, but we will focus on just scores and boxes here).

Simple methods like choosing just the top $N$ detections where $N$ is smaller than the total number of detections don’t make use of the location information. For instance all the top scoring detections could be in one region in the image even though there are predictions in other areas.

Looking at the image we see how the boxes are roughly clustered around the areas where oranges are present. Using location information we can group the bounding boxes so the ones that are sufficiently close to one another (according to some metric) will belong to a cluster. We can then aggregate the predictions within a cluster to get a single prediction per cluster.

A common algorithm is non-maximum suppression or NMS. There is more than one way to do this but here we will discuss the most frequently used approach which is a greedy algorithm.

It can be described very simply as follows:

The boxes in top_boxes when the algorithm terminates represent the final predictions.

In practice you might also select other attributes such as score, segmentation, etc. associated with the bounding box and return them along with the boxes.

For this code we use a box_iou function that is explained in detail in this post. Here all you should know is that, given a pair of arrays, boxes1 of shape $N_1 \times 4$ and boxes2 of shape $N_2 \times 4$, it will return an $N_1 \times N_2$ array ious where ious[i, j] is the intersection-over-union of boxes1[i] and boxes2[j]:
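The box_iou function referred to here is not included on this page. As a stand-in, the following NumPy sketch has the same input and output shapes as described. It is not the author's implementation, and it assumes boxes are (x1, y1, x2, y2) rows with positive area.

```python
import numpy as np


def box_iou(boxes1, boxes2):
    """Pairwise IoU: (N1, 4) and (N2, 4) boxes -> (N1, N2) array."""
    boxes1 = np.asarray(boxes1, dtype=float)
    boxes2 = np.asarray(boxes2, dtype=float)

    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])

    # Broadcast (N1, 1, 2) against (1, N2, 2) so that every pair of boxes
    # gets the corners of its intersection rectangle.
    top_left = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])
    bottom_right = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])

    # Negative extents mean the boxes do not overlap, so clamp them to zero.
    wh = np.clip(bottom_right - top_left, 0, None)
    inter = wh[..., 0] * wh[..., 1]

    union = area1[:, None] + area2[None, :] - inter
    return inter / union


# Made-up boxes: identical, partially overlapping, and disjoint.
ious = box_iou([[0, 0, 2, 2]], [[0, 0, 2, 2], [1, 1, 3, 3], [5, 5, 6, 6]])
print(ious)
```

Computing all pairs at once trades memory for speed, which is usually acceptable for the few hundred boxes a detector emits per image.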

Let us now apply this function to our example using an IoU threshold of 0.9. This threshold basically says two boxes can be considered to belong to the same cluster only if the overlap of the boxes is at least 90% of the total area covered by the boxes, implying a high degree of overlap.

What we end up with looks better. In particular, the fairly isolated orange towards the bottom of the image has just a single box around it, but there are still a lot of boxes towards the top of the image where there are quite a few oranges close to each other.

Notice also in the figure above that the clusters formed by the boxes prior to applying NMS are quite spread out, which suggests that we should try relaxing the threshold for including a box in a cluster. The threshold is a hyperparameter whose optimal value depends on the dataset and needs to be tuned accordingly. Here let us try decreasing it to 0.5.

This has reduced the number of boxes but overlapping boxes still remain at the top right and left of the image. We will try decreasing the threshold again to 0.1.

This looks a lot better. Almost all the oranges have a box that bounds them reasonably well. However an extra box that overlaps with two of the other bounding boxes remains on the top-left. We also note how the orange at the top right of the image does not have such a tight bounding box.

As NMS is a greedy algorithm it sometimes fails to generate the best clusters. In the original set of boxes there were better bounding boxes for the top right orange but they were absorbed into one of the other clusters. This can happen when one detection has a box that bounds the object best but another detection has a better score, since higher scoring detections are prioritised by NMS.

Batán Jinlei · Answer

A typical object detection pipeline has one component for generating proposals for classification. Proposals are nothing but the candidate regions for the object of interest. Most approaches slide a window over the feature map and assign foreground/background scores depending on the features computed in that window. Neighbouring windows have similar scores to some extent and are all considered candidate regions. This leads to hundreds of proposals. As the proposal generation method should have high recall, we keep loose constraints at this stage. However, processing this many proposals through the classification network is cumbersome. This leads to a technique that filters the proposals based on some criteria (which we will see soon), called Non-Maximum Suppression.

NMS:

Input: A list of Proposal boxes B, corresponding confidence scores S and overlap threshold N.

Output: A list of filtered proposals D.

Algorithm:

IOU calculation is actually used to measure the overlap between two proposals.
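For reference, the standard definition of IOU for two boxes $A$ and $B$ is given below, followed by a small worked example with made-up boxes (the numbers are illustrative and do not come from this post):

```latex
\mathrm{IOU}(A, B) = \frac{|A \cap B|}{|A \cup B|}
                   = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}

% Worked example: A = (0, 0, 2, 2), B = (1, 1, 3, 3) in (x1, y1, x2, y2) form.
% |A| = |B| = 4, and the intersection is the unit square (1, 1, 2, 2), so
\mathrm{IOU}(A, B) = \frac{1}{4 + 4 - 1} = \frac{1}{7} \approx 0.143
```

An IOU of 1 means the two boxes coincide, and an IOU of 0 means they do not overlap at all.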

Below is the pseudo code of NMS. I have added comments to understand it better.

Now if you observe the algorithm above, the whole filtering process depends on a single threshold value. So the selection of the threshold value is key for the performance of the model. However, setting this threshold is tricky. Let us see this scenario.

Assume that the overlap threshold N is 0.5. If there is a proposal with 0.51 IOU and a good confidence score, the box will be removed even though its confidence is higher than many other boxes with less IOU. Because of this, if there are two objects side by side, one of them would be eliminated. A proposal with 0.49 IOU is still kept even though its confidence is very low. Of course this is a known issue with any threshold based technique. Now how do we deal with this? Below is an example of such a case. Only the proposal with 0.9 is kept and the others will be removed. This reduces the precision of the model.

The simple yet efficient way to deal with this case is to use Soft-NMS. The idea is very simple: “instead of completely removing the proposals with high IOU and high confidence, reduce the confidences of the proposals proportional to IOU value”. Now let us apply this idea to the above example. Instead of completely removing the proposals with 0.8 score, keep the proposals but reduce their score as shown in the figure below.

As I have mentioned earlier, the scores 0.4 of both the proposals are calculated based on the IOU values. The score calculation is as follows

So this is just a one-line change in the implementation of the NMS algorithm, and it increases the precision to a good extent. The figure below shows both algorithms (NMS and Soft-NMS), which I took from the Soft-NMS paper.
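The rescoring idea can be sketched as follows. This is a minimal illustration of the linear and Gaussian decay rules from the Soft-NMS paper, not the implementation linked from this post. The function names, the default sigma and score_threshold values, and the sample boxes are all assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, iou_threshold=0.5, method="linear",
             sigma=0.5, score_threshold=0.001):
    """Return (kept indexes in selection order, rescored copy of scores)."""
    if method not in ("linear", "gaussian"):
        raise ValueError("method must be 'linear' or 'gaussian'")
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        # Pick the box with the highest current score.
        best = max(remaining, key=lambda k: scores[k])
        remaining.remove(best)
        keep.append(best)
        for k in remaining:
            overlap = iou(boxes[best], boxes[k])
            if method == "linear":
                # Hard NMS would delete the box here; Soft-NMS decays it.
                if overlap >= iou_threshold:
                    scores[k] *= 1.0 - overlap
            else:
                # Gaussian decay applies to every box, scaled by overlap.
                scores[k] *= math.exp(-(overlap ** 2) / sigma)
        # Boxes whose score has decayed to almost nothing are discarded.
        remaining = [k for k in remaining if scores[k] >= score_threshold]
    return keep, scores


# Hypothetical proposals: box 1 has an IOU of exactly 0.5 with box 0.
boxes = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(soft_nms(boxes, scores))
```

With these made-up boxes the linear rule lowers the second proposal's score from 0.8 to 0.4 instead of removing it. The IOU values behind the figure in this post are not given, so this only illustrates how such a number can arise.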

These techniques work well for filtering the predictions of a single model. What if you have predictions from multiple models? Weighted boxes fusion is a novel method for combining predictions of object detection models. Check out my article to know more about that.

I gave the GitHub links for implementations of NMS and Soft-NMS below in the references. That’s all from this post, thank you so much for being with me.
