Technology for Wildlife Foundation

Using Computer Vision to Identify Mangrove-containing pixels in Satellite Imagery

This blog post was written by a team of third-year BTech students from PES College, Bangalore: B Akhil, Mohammad Ashiq, Hammad Faizan and Prerana Ramachandra. They are collaborating with us on a research project on the use of computer vision and satellite imagery for mangrove conservation.

Mangroves are plants that grow in salt marshes, muddy coasts and tidal estuaries. They are biodiversity hotspots and serve as nurseries for fish stocks. They also help maintain water quality by filtering out pollutants and sediments. Mangroves can flourish in places where no other tree can grow, which makes them important ecosystems that help prevent coastal erosion and provide protection from flooding and cyclonic events. Furthermore, mangroves have the highest per-unit-area rates of carbon sequestration (Alongi 2012) of any ecosystem, terrestrial or marine. Despite the ecosystem services they provide, mangrove forests are among the most threatened ecosystems on the planet. Globally, we have already lost 30-50% of all mangrove forests (WWF Intl. 2018) in the last 50 years, and mangroves continue to be cut down at rates 3-5 times higher than terrestrial forests every year.

One piece of the puzzle of conserving mangroves is to better document and monitor their extent and the ecosystem services they provide. So far, Technology for Wildlife has used traditional remote sensing methods on satellite and RPA imagery to understand the extent of mangroves. Our team is experimenting with the use of computer vision to detect mangroves in satellite imagery. Through this project, we hope to develop this technique and compare its accuracy with that obtained using traditional spatial analysis methods. We are also interested in this project because of the possibility of implementing a machine learning model that could become better at detecting mangroves over time. Finally, the prospect of creating an automated monitoring system that systematically evaluates satellite data and detects changes in mangrove cover could be a significant tool for the conservation of mangrove ecosystems, both in Goa and globally.

In the rest of this post, we will outline the methods we considered for this project, as well as our reasoning for our final selection. The three major categories of methods we considered are:

(i) Machine Learning approaches,

(ii) Deep Learning approaches, and

(iii) Image Processing techniques.

The Machine Learning approach includes techniques such as decision trees, in which vegetation is classified by matching spectral features, or combinations of spectral features, from images against those of candidate vegetation types. Other techniques include the K-Means and IsoData algorithms, both of which are unsupervised, easy to apply and widely available in image processing, geospatial information and statistical software packages.
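
To make the unsupervised approach concrete, here is a minimal sketch of K-Means classification over Landsat spectral bands using scikit-learn. The array shapes, cluster count and random stand-in data are illustrative assumptions, not values from our pipeline.

# A minimal sketch of unsupervised K-Means classification on Landsat
# spectral bands, assuming the bands have been read into a NumPy array
# of shape (rows, cols, n_bands).
import numpy as np
from sklearn.cluster import KMeans

rows, cols, n_bands = 256, 256, 8            # placeholder image dimensions
image = np.random.rand(rows, cols, n_bands)  # stand-in for real band data

# Reshape to (n_pixels, n_bands) so each pixel is one sample
pixels = image.reshape(-1, n_bands)

# Cluster pixels into spectral classes; the analyst then assigns
# semantic labels (e.g. mangrove / non-mangrove) to each cluster
kmeans = KMeans(n_clusters=5, random_state=0, n_init=10).fit(pixels)
class_map = kmeans.labels_.reshape(rows, cols)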

The Deep Learning approach covers architectures such as Siamese residual networks (SiResNet); for example, a 3-D Siamese residual network with spatial pyramid pooling (3-D-SiResNet-SPP) learns discriminative high-level features for hyperspectral mangrove species classification with limited training samples. Other potential techniques include the chopped-picture method, in which images are dissected into numerous small squares to efficiently produce training images, and Convolutional Neural Networks (CNNs), a class of deep neural networks most commonly applied to analysing visual imagery. One could also use Mask R-CNN, a deep neural network designed to solve instance segmentation problems. An architecture well suited to segmentation is the U-Net, a standard CNN architecture for image segmentation tasks.

Under Image Processing, the available techniques include Gabor filtering (widely used in image texture segmentation), feature extraction (where Hadoop can be used to extract features from large datasets) and colour-based approaches (such as k-means clustering and colour extraction using the HSV model), among others.
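
As an illustration of the texture-based approach, the following sketch applies a Gabor filter with scikit-image; the filter frequency and the sample image are arbitrary assumptions for demonstration.

# A minimal sketch of Gabor filtering for texture analysis using scikit-image
import numpy as np
from skimage.filters import gabor
from skimage.data import camera

image = camera()  # sample grayscale image standing in for a satellite band

# gabor() returns the real and imaginary responses of the filter
filt_real, filt_imag = gabor(image, frequency=0.6)

# Per-pixel response magnitude, usable as a texture feature
texture_energy = np.sqrt(filt_real**2 + filt_imag**2)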

Choosing an appropriate method depends significantly on the data available. To train our model, we used USGS EarthExplorer to download Landsat 8 images. Each image consists of 8 channels, containing spectral information across several different wavelengths in the visible and near-infrared portions of the electromagnetic spectrum. The samples used to train the model were labeled at the pixel level, i.e. each pixel in the sample has an attribute value. These attribute values are binary: a value of 1 represents the presence of mangroves, and a value of 0 indicates their absence. Due to the limited spatial resolution of Landsat images, direct visual interpretation is difficult. The criteria initially used to label the mask data were a combination of altitude values from SRTM data and NDVI values from Landsat 8 data: if a specific pixel meets the required criteria to be tagged as 'mangrove', it is labeled 1; otherwise it is labeled 0. For future iterations, we'll be developing a masking process that incorporates aerial imagery and more sophisticated spatial analyses.
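
A hedged sketch of this labelling rule is shown below. The NDVI and altitude thresholds and the stand-in arrays are illustrative assumptions; the red and NIR reflectances correspond to Landsat 8 bands 4 and 5.

# Sketch of the pixel-labelling criteria: an NDVI threshold (from
# Landsat 8 red/NIR bands) combined with an altitude threshold (from SRTM).
import numpy as np

red = np.random.rand(256, 256)             # Landsat 8 band 4 (red), stand-in
nir = np.random.rand(256, 256)             # Landsat 8 band 5 (NIR), stand-in
elevation = np.random.rand(256, 256) * 50  # SRTM elevation in metres, stand-in

ndvi = (nir - red) / (nir + red + 1e-8)    # small epsilon avoids division by zero

# Mangroves are vegetated (high NDVI) and grow near sea level (low altitude);
# the exact cut-offs would be tuned against reference data.
NDVI_MIN = 0.3
ELEVATION_MAX = 10.0  # metres

# mask is 1 where a pixel meets the mangrove criteria, 0 elsewhere
mask = ((ndvi > NDVI_MIN) & (elevation < ELEVATION_MAX)).astype(np.uint8)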

The method we chose for our project is segmentation using a U-Net neural network. U-Net is considered a standard CNN architecture for image segmentation tasks. Segmentation is similar to image classification, but instead of assigning a single label to the whole image based on the objects present, each pixel is classified as belonging to a specific class, i.e. segmentation requires discrimination at the pixel level. U-Net was originally invented and first used for biomedical image segmentation. Its architecture can be broadly thought of as an encoder network followed by a decoder network.

The encoder is the first half of the architecture. It is usually a pre-trained classification network like VGG or ResNet, in which convolution blocks are applied first, followed by maxpool downsampling, to encode the input image into feature representations at multiple levels. The decoder is the second half of the architecture. The goal here is to semantically project the discriminative features learnt by the encoder onto the pixel space to get a dense classification. The decoder consists of upsampling and concatenation followed by regular convolution operations. Upsampling restores the condensed feature map to the original size of the input image, thereby expanding the feature dimensions. Upsampling is also referred to as transposed convolution, up-convolution, or deconvolution.
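
To make this concrete, below is a minimal U-Net-style encoder/decoder sketch in TensorFlow Keras for binary mangrove segmentation. The 8-channel input follows the description above, but the depth and filter counts are illustrative assumptions rather than our final model.

# A minimal U-Net-style encoder/decoder in TensorFlow Keras for
# binary (mangrove / non-mangrove) segmentation.
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions, as in the standard U-Net building block
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

inputs = layers.Input(shape=(None, None, 8))  # size-agnostic 8-channel input

# Encoder: convolution blocks followed by maxpool downsampling
c1 = conv_block(inputs, 32)
p1 = layers.MaxPooling2D(2)(c1)
c2 = conv_block(p1, 64)
p2 = layers.MaxPooling2D(2)(c2)

# Bottleneck
b = conv_block(p2, 128)

# Decoder: upsampling and concatenation with encoder features,
# followed by regular convolutions
u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(b)
c3 = conv_block(layers.Concatenate()([u2, c2]), 64)
u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c3)
c4 = conv_block(layers.Concatenate()([u1, c1]), 32)

# 1x1 convolution with sigmoid gives a per-pixel mangrove probability
outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")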

The U-Net architecture offers some advantages over other segmentation techniques. Because the network contains no fully connected layers, it is agnostic to input image size. This also leads to a smaller model weight size, making it computationally efficient. The architecture is easy to understand and can be scaled to handle multiple classes. It also works well with a small training set, thanks to the robustness provided by data augmentation.

We employ a deep U-Net architecture to perform the segmentation. Image augmentation is applied to the input images to significantly increase the amount of training data. Augmentation is also applied at test time, and the mean of the results is exported. We plan on using TensorFlow Keras with Python and its libraries to build our model, which we'll be running on real-world data.
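
As a sketch of what test-time augmentation with mean results could look like, the function below (a hypothetical helper, not our production code) predicts on flipped copies of a tile, undoes each flip, and averages the probability maps. It assumes `model` is the U-Net sketched earlier and `tile` is an (H, W, 8) NumPy array.

# Test-time augmentation: predict on flipped copies and average
import numpy as np

def predict_with_tta(model, tile):
    variants = [
        tile,
        np.flip(tile, axis=0),   # vertical flip
        np.flip(tile, axis=1),   # horizontal flip
    ]
    preds = []
    for i, v in enumerate(variants):
        p = model.predict(v[np.newaxis, ...], verbose=0)[0]
        # Undo the flip so all predictions align with the original tile
        if i == 1:
            p = np.flip(p, axis=0)
        elif i == 2:
            p = np.flip(p, axis=1)
        preds.append(p)
    return np.mean(preds, axis=0)  # mean per-pixel probability map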

If you have any questions or comments on our work, please reach out to us through the contact form on the website.