Calculating Object Sizes in Drone Images
Currently, my focus lies in solving an intriguing problem: calculating the size of objects in drone-captured images, a recurring maths problem in our conservation work. In this case, we wanted to estimate the size of a Ganges river dolphin as captured in videos using small quadcopters. Setting up this workflow is foundational, as solving this problem could help us and other conservationists in a number of areas including, but not limited to, estimating the demographic distribution of animal species, calculating the size of garbage hotspots, working out the ground footprint of an image, tracking an individual animal's body condition over time, and more.
As it turns out, this problem has already been solved, at least under certain assumptions. There are a couple of ways to calculate the size of an arbitrary object, some of which are elaborated upon in this blog post.
A First Step
Images are made up of pixels, and pixels are usually small squares; we are going to assume square pixels here. Each distinct object one can see in an image is made up of pixels. Since we want to measure the real-life size of an object in an image, one way of doing that is to use the number of pixels occupied by that object. If we know how many pixels it occupies, along with how many real-life centimetres/metres each pixel corresponds to, then we can calculate:
Area of an Object = Number of Pixels in the Object * Area Occupied by a Pixel in Square Centimetres
⇒ Area of an Object = Number of Pixels in the Object * GSD²
Where GSD (Ground Sampling Distance) is the distance between two consecutive pixel centres as measured on the ground; equivalently, it is the real-world distance represented by one side of a pixel. Here, it is the number of centimetres a pixel side denotes.
This also works under the assumption that all pixels are of the same size and represent the same real-life distance. Our formula won't work if different pixels capture different distances on the ground, say, if one pixel captures 1 cm of ground while another captures 10 cm.
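To make this concrete, here is a minimal Python sketch of the area calculation, assuming we already know the GSD and have a pixel count for the object (say, from a segmentation mask); the names are ours and purely illustrative:

```python
def object_area_m2(pixel_count: int, gsd_m: float) -> float:
    """Area of an object in square metres.

    pixel_count: number of pixels the object occupies (e.g. from a mask).
    gsd_m: ground sampling distance in metres per pixel side.
    """
    return pixel_count * gsd_m ** 2

# Example: a dolphin mask of 15,000 pixels at a GSD of 1 cm/pixel
print(round(object_area_m2(15_000, 0.01), 2))  # 1.5 square metres
```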
Thus, our immediate problem becomes to find the GSD. Counting the pixels making up an object can be done rather trivially.
The following is the description of the problem statement.
There is an image, I, which has been taken from a drone (UAV) flying at a height, H. The dimensions of this image are I_W, I_L, and I_D, corresponding respectively to the width, length, and diagonal of the image as measured in pixels. Additionally, the actual real-life area being captured has the corresponding dimensions A_W, A_L, and A_D, measured in metres (m).
A nadir image is one which is taken with the drone camera pointing straight down at the ground. This image has been taken in a nadir position.
Additionally, the drone has a camera with a focal length, F, and a sensor to capture the image with dimensions S_W, S_L, and S_D, corresponding respectively to the width, length, and diagonal of the sensor as measured in millimetres (mm). All of these are the real, physical sizes, not 35mm-equivalent dimensions.
The next parameter is the field of view, or FOV, expressed in degrees. Some people call it the angle of view instead. This, again, differs across the width, length, and diagonal of the image, since a different extent of the ground is captured along each of these dimensions. So, we have three views with us: FOV_W, FOV_L, and FOV_D.
The final parameter is the one we are interested in finding: GSD. As defined earlier, this is the real-life, actual distance each pixel side represents, i.e. the distance per pixel side. If we know the distance, in centimetres/metres, covered by the width or length of the image, we can divide it by the number of pixels along that dimension to get the GSD. Thus, we have:
GSD (m) = A_W/ I_W = A_L/ I_L
Where (m) indicates that the GSD is measured in metres.
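As a quick sketch of this definition in Python (again, illustrative names, with the footprint assumed to be in metres):

```python
def gsd_from_footprint(footprint_m: float, pixels: int) -> float:
    """GSD in metres per pixel, given the real-world distance covered
    by one image dimension and the pixel count along that dimension."""
    return footprint_m / pixels

# Example: a 40 m wide footprint imaged across 4000 pixels
print(gsd_from_footprint(40.0, 4000))  # 0.01 m/pixel, i.e. 1 cm/pixel
```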
Now, let’s jump into actually solving this problem.
Our First Approach
This is an easy approach to estimating the dimensions of the area covered by the image using some basic trigonometry. For details and derivations of the formulae used, refer to our short yet detailed tutorial on solving this exact problem in one of our previous posts, back in 2019:
A_D= 2 * H * tan(FOV_D/ 2)
A_W= 2 * H * tan(FOV_W/ 2)
A_L= 2 * H * tan(FOV_L/ 2)
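A minimal sketch of these formulae in Python, assuming a nadir image over level ground (the function name and example values are ours, purely illustrative):

```python
import math

def footprint_from_fov(height_m: float, fov_deg: float) -> float:
    """Ground distance (m) covered along one image dimension,
    for a nadir image at the given flight height and field of view."""
    return 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)

# Example: H = 50 m, diagonal FOV of 84 degrees (a common wide-angle value)
a_d = footprint_from_fov(50.0, 84.0)
print(round(a_d, 1))  # ~90.0 m along the diagonal
```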
Alternatively, we can also find A_W and A_L using the aspect ratio, r = I_W/ I_L, and the fact that A_W, A_L, and A_D form a right triangle, as follows:
A_L = A_D/ √(1 + r²)
A_W = r * A_D/ √(1 + r²)
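Here is the same decomposition as a sketch (illustrative names, square pixels assumed):

```python
import math

def sides_from_diagonal(a_d, aspect_ratio):
    """Split a diagonal ground distance into width and length,
    given the image aspect ratio r = I_W / I_L."""
    a_l = a_d / math.sqrt(1.0 + aspect_ratio ** 2)
    a_w = aspect_ratio * a_l
    return a_w, a_l

# Example: a 90 m diagonal footprint for a 4:3 image
a_w, a_l = sides_from_diagonal(90.0, 4.0 / 3.0)
print(round(a_w, 1), round(a_l, 1))  # 72.0 54.0
```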
Now, for calculating the GSD, we have:
GSD (m) = A_W/ I_W = A_L/ I_L
Tada! We are done with the first approach. If following this was tough, this video explains the approach very well too.
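Putting the first approach together in one self-contained sketch (the example values are assumed for illustration, not taken from our drone):

```python
import math

def gsd_fov(height_m: float, fov_deg: float, pixels: int) -> float:
    """GSD (m/pixel) from flight height, field of view along one
    image dimension, and pixel count along that same dimension."""
    footprint_m = 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)
    return footprint_m / pixels

# Example: H = 50 m, horizontal FOV of 71 degrees, 4000-pixel-wide image
print(round(gsd_fov(50.0, 71.0, 4000), 4))  # ~0.0178 m/pixel, i.e. ~1.8 cm
```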
Our Second Approach
Another common way to solve this problem is to use similarity to derive the more commonly used formula for calculating GSD.
If we take a look at how our camera sensor captures an image of the ground, we can see that two triangles are formed, △AOB and △COD, where O is the camera's optical centre, AB lies on the sensor, and CD lies on the ground. Both of these triangles have a common angle, i.e., FOV = ∠AOB = ∠COD. The FOV being used here depends on which dimension of the sensor we are looking at: if AB is the diagonal of the sensor, S_D, then FOV_D = ∠AOB = ∠COD, and in that case A_D = CD. Similarly, if AB is S_W, then A_W = CD.
Since AB || CD, we see that ∠OAB = ∠ODC and ∠OBA = ∠OCD since they are alternate interior angles.
Since three corresponding angle pairs are equal in the two triangles, we have similarity by the AAA criterion: △AOB ~ △COD. As a consequence of similarity, we know that the ratio of the areas of similar triangles is equal to the square of the ratio of their corresponding sides. Writing each area as half the base times the height, where the focal length F is the perpendicular distance from O to AB and the flight height H is the perpendicular distance from O to CD, we get:
AB²/ CD² = (1/2 * AB * F)/ (1/2 * CD * H)
⇒ AB/ CD = F/ H
⇒ CD = AB * H/ F
Because AB and CD can represent the diagonal, the width, or the length dimensions,
A_D = S_D * H/ F
A_W = S_W * H/ F
A_L = S_L * H/ F
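These translate directly into a sketch; note how the millimetres in the sensor size cancel against the millimetres in the focal length, leaving metres (assumed example values):

```python
def footprint_from_sensor(sensor_mm: float, height_m: float, focal_mm: float) -> float:
    """Ground distance (m) covered along one image dimension,
    computed from the matching sensor dimension via similar triangles."""
    return sensor_mm * height_m / focal_mm

# Example: 13.2 mm sensor width at H = 50 m with F = 8.8 mm (assumed values)
print(round(footprint_from_sensor(13.2, 50.0, 8.8), 1))  # 75.0 m
```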
Finally, since we know that
GSD (m) = A_W/ I_W = A_L/ I_L
We get,
A_W = GSD * I_W = S_W * H/ F
⇒ GSD (m) = S_W * H/ (I_W * F)
Similarly,
GSD (m) = S_L * H/ (I_L * F)
Tada! We have done it once again. We have solved the crisis of the missing GSD! And that’s a wrap!
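For completeness, here is the second approach as a minimal sketch, with the unit conventions from above (sensor size and focal length in mm, height in m; the example values are assumed, loosely typical of a small quadcopter with a 1-inch sensor):

```python
def gsd_sensor(sensor_mm: float, height_m: float,
               pixels: int, focal_mm: float) -> float:
    """GSD (m/pixel) from sensor dimension (mm), flight height (m),
    pixel count along the same dimension, and real focal length (mm).
    The mm units cancel, leaving metres per pixel."""
    return sensor_mm * height_m / (pixels * focal_mm)

# Example: 13.2 mm sensor width, H = 50 m, 5472 px wide, F = 8.8 mm
print(round(gsd_sensor(13.2, 50.0, 5472, 8.8), 4))  # ~0.0137 m/pixel
```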
In conclusion, choosing the appropriate formula from the above depends on which parameters you can access and trust. For example, you might have found the focal length of your camera for a given setting through the EXIF data, but you may not trust the values being reported. On the other hand, you might know the default field of view of your camera from its official documentation, only to find that the field of view changes from one drone mode to another, across aspect ratios, zoom levels, etc. (it is quite a mess).
Going through all these formulae and deriving them was a fun and educational experience. It gives us a clearer understanding of the caveats of using drone parameters in scientific research. We are now using these to estimate the size of river dolphins in the Ganges and better understand their age, body structure and health.
We hope you find this useful for your work. Have fun and tread carefully! If you have any comments, or use a completely different way to solve this problem, we would love to hear from you: write to us at <contact@techforwildlife.com>
Cheers!