How we use Excel for geospatial data analysis and visualisation
I’ve been giving a number of online talks over the past few months, talking about both the general nature of our work as well as about how we accomplish specific technical tasks. During one session of the latter type, I mentioned how a lot of our GIS work actually happens in MS Excel and one of the audience members wanted more information on what that actually entailed. I attempted an answer then, but a GIS task this morning really highlighted how much I need and love MS Excel. I’m going to use this blogpost to document exactly what I did, so if I’m ever asked this question again, I’ll have some documented proof to support my response!
In brief, I was provided with an Indian Government gazette notification detailing the locations of a set of offshore mining blocks off the coast of India. This was in PDF format and while it comprised both text and tables, the relevant data was in tables alone. There seemed to be only 60 or so rows of data, with about 9 columns. In the distant past, I would have just hand-coded the entire thing using the num-pad on my keyboard, but this time I used one of my new favourite OCR tools (Convertio) to rip the relevant pages of the PDF into a Word document. The tables were then copy/pasted into Excel, and that’s where the real work began.
As can be seen from the image above, the spatial information is in an odd format, not really ingestible by most GIS software which needs, at a minimum, coordinate pairs to represent points. In addition, I prefer working with decimal degrees rather than the degree-minute-second format as I’ve had some traumatic experiences with apostrophes and quotes in GIS software. So, now working only in Excel, I first did a quick check/fix to catch the few errors from the OCR process ( 8’s read as 3’s, 7’s read as 1’s), I created four new columns (Lat_E, Lat_W, Long_S, Long_N). Each deg-min column pair was converted into a decimal degree format in one of the new columns using the standard conversion formula [ Decimal Degree = Degree + (Minutes/60) + (Seconds/3600)]. I also created a unique ID (UID) for each row by combining the grid number with the initials of the area; for example, Block 12 in the Arabian Sea has a UID of 12_AS.
Now, for the more complex part; GIS software can ingest text files and visualise spatial data as points, lines or polygons. My desired output for this task was a demarcated text file which had all the information required to visualise each offshore mining block. One method would involve creating 3 new rows for each block record, where each row would eventually contain the coordinates for the NE, NW, SE and SW points. This could be done manually, using cell-handling commands in Excel but would be a lot of very boring manual labour, and I would then need to combine the points together in a GIS package to obtain my polygons. Alternatively, I could write a short piece of Python code which would convert the Excel sheet into a GeoJSON file with the appropriate geometry attributes. This is the most powerful of all available options, but felt like overkill for this task.
Instead, I created a new column and put together a quick cell-handling formula that converted the spatial information into a Well Known Text (WKT) polygon format that GIS software such as QGIS is capable of reading. The formula itself looked like this
=CONCATENATE("Polygon ( (",O2," ",M2,", ",P2," ",M2,", ",P2," ",N2,", ",O2," ",N2,") )")
where the cell reference numbers refer to the (Lat_E, Lat_W, Long_S, Long_N) columns in the correct pair-grouping to obtain the NE, NW, SW and SE corners of each offshore mining block.
The file was then saved as a Comma-Separated Value (CSV) file. I used QGIS to read the file, selecting the WKT format as the input option, and the polygons appeared! After a quick geometry repair process, I configured the labels, exported the file in KML format for visualisation in Google Earth Pro and also created a rough static map, and the task was complete.
This is just one among the many ways in which we’ve used Excel to do the initial heavy lifting for spatial analysis and visualisation tasks, reserving the more specialised GIS tools for when they’re really needed.