The Poor Man’s Proximity Tool: The Haversine Formula

The haversine formula is a method for finding the great-circle distance between a pair of latitude-longitude coordinates. In crime analysis I use the formula to find incidents that took place within a particular distance of a location, for example within 500 metres.

My approach uses Excel to calculate the haversine distance between a reference location and the latitude-longitude of every other location in my spreadsheet. I then sort the results to find the locations that are within the distance that I’m interested in. I’ve worked the following example in Excel to illustrate the method.

Imagine that you have a list of 1000 occurrences with the latitude and longitude of each occurrence location. Now imagine that you have a reference location and want to determine how many of the 1000 occurrences took place within 500 metres of it. I set my spreadsheet up like this:

haversine_1

Note that in column B I have entered all of the latitudes and in column C all of the longitudes. I have also entered my reference latitude and longitude in cells E2 and E3 so that they can be referenced by the haversine formula. Also note that both latitudes and longitudes are in decimal degrees, not degrees, minutes and seconds.

In column D I enter the Excel-ified version of the haversine formula, which I have reproduced below. It gets a bit complex because Excel’s COS and SIN functions expect angles in radians rather than degrees, so it is necessary to use Excel’s RADIANS function to convert the latitudes and longitudes. The final multiplication by 6371 scales the result by the Earth’s radius in kilometres, so the distances come out in kilometres.

haversine_2
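For reference, the haversine formula is usually written in the form below. The symbols are my own notation here rather than anything taken from the spreadsheet: φ₁ and φ₂ are the two latitudes and λ₁ and λ₂ the two longitudes, all in radians, and R is the Earth’s radius (6371 km in this post). The Excel expression may arrange the terms differently.

```latex
d = 2R \arcsin\!\left(
      \sqrt{
        \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
        + \cos\varphi_1 \, \cos\varphi_2 \,
          \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)
      }
    \right)
```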

A note for pedants: I realize that the Earth is not a perfect sphere, which means that the calculated distances will be somewhat off if you are dealing with large distances. But for the distances we are concerned with (tens of kilometres at most) the impact is negligible and can be ignored.

By filling column D with the expression, I calculate the separation distance between the reference point and each incident. Now, if the distance column is sorted from smallest to largest, I can easily see the incidents that occurred within 500 metres (0.5 kilometres) of the reference location.
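For readers who would rather script the same steps, here is a minimal Python sketch of the calculate, filter and sort workflow. It is not the Excel expression from the post: it uses the standard arcsine form of the haversine formula, and the function names, the reference point and the three incidents are all made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, the same figure used in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing `a` slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def incidents_within(incidents, ref_lat, ref_lon, radius_km):
    """Return (distance_km, incident_id) pairs inside the radius, nearest first."""
    hits = []
    for incident_id, lat, lon in incidents:
        d = haversine_km(ref_lat, ref_lon, lat, lon)
        if d <= radius_km:
            hits.append((d, incident_id))
    return sorted(hits)


# Hypothetical reference point and incidents (made-up coordinates)
REF_LAT, REF_LON = 43.6500, -79.3800
INCIDENTS = [
    ("A", 43.6510, -79.3800),  # roughly 0.1 km north of the reference
    ("B", 43.6500, -79.3750),  # roughly 0.4 km east of the reference
    ("C", 43.6600, -79.3800),  # roughly 1.1 km north of the reference
]

nearby = incidents_within(INCIDENTS, REF_LAT, REF_LON, 0.5)
for d, incident_id in nearby:
    print(f"{incident_id}: {d * 1000:.0f} m")
```

The radius is given in kilometres to match the units of the distance column, so 500 metres is passed as 0.5.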

haversine_3

I find this method useful for determining the proximity of all sorts of things. For example, if the reference coordinates are for an intersection, the method can be used to find all motor vehicle collisions that occurred within 100 metres. Or, if there was a break and enter, a proximity search can be done against field contacts that occurred within 1 kilometre of the occurrence address. Really, anything that has a recorded latitude and longitude can be used. Best of all, it doesn’t require a GIS, just Excel and the formula.

 

Creating Heat Maps from Excel Pivot Tables with Conditional Formatting

I frequently use Excel’s Pivot Table functionality to examine the relationships between variables. While it is a common tool in the analyst’s toolbox, I don’t like the default visualization options. This blog post discusses an approach for visualizing pivot table data using Excel’s built-in conditional formatting functionality to create heat maps.

A heat map is a chart that uses colour to visualize a two-dimensional matrix of values. Since pivot tables make it so easy to create two-dimensional output, the heat map is a natural fit for visualizing a pivot table. The process is straightforward and I demonstrate it with an example.

I start with some fictitious data that has day-of-week and hour-of-day properties. Using the standard approach I create a pivot table that has day-of-week as the columns and the hour-of-day as the rows.
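To make the shape of that pivot table concrete, here is a small Python sketch that builds the same hour-of-day by day-of-week grid of counts from a list of timestamps. The function name and the sample timestamps are hypothetical; Excel’s pivot table does the equivalent counting for you.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def pivot_day_hour(timestamps):
    """Return a 24 x 7 grid: rows are hours 0-23, columns are Monday to Sunday."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday
    counts = Counter((ts.hour, ts.weekday()) for ts in timestamps)
    return [[counts[(hour, day)] for day in range(7)] for hour in range(24)]


# Made-up occurrence times, for illustration only
SAMPLE = [
    datetime(2024, 1, 1, 9, 15),  # a Monday, in the 09:00 hour
    datetime(2024, 1, 1, 9, 40),  # same cell as the line above
    datetime(2024, 1, 6, 23, 5),  # a Saturday, in the 23:00 hour
]

grid = pivot_day_hour(SAMPLE)
print(grid[9][DAYS.index("Mon")])
print(grid[23][DAYS.index("Sat")])
```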

sample_pivot_table1

Next, I highlight the pivot table data and copy it to a new worksheet (this makes the later formatting easier). I then highlight the entire data range and select the Conditional Formatting button in the ‘Home’ ribbon. Under Conditional Formatting I select Color Scales and pick one of the available colour ramps.

sample_conditional_formatting

This immediately colour codes the entire range and creates an attractive heat map that will dynamically adjust its formatting if the cell values change. With a small amount of additional formatting (some bolding, a border, centre-aligned text) a very nice chart can be produced that can be added into a report.
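The idea behind a colour scale can be sketched in a few lines of Python. This is only an approximation of what a two-colour scale does, not Excel’s actual implementation: each value is placed between the range’s minimum and maximum and its colour is interpolated linearly between two end colours. The function name and the white-to-red end colours are assumptions chosen for the example.

```python
def colour_scale(grid, low=(255, 255, 255), high=(248, 105, 107)):
    """Map each value in a 2-D grid to a hex colour between `low` and `high`."""
    values = [v for row in grid for v in row]
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1  # avoid dividing by zero when every value is equal

    def to_hex(value):
        t = (value - vmin) / span  # 0.0 at the minimum, 1.0 at the maximum
        rgb = (round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
        return "#{:02X}{:02X}{:02X}".format(*rgb)

    return [[to_hex(v) for v in row] for row in grid]


# Made-up counts: the smallest value gets `low`, the largest gets `high`
counts = [[0, 5], [10, 20]]
colours = colour_scale(counts)
for row in colours:
    print(row)
```

Because every colour is derived from the cell’s value relative to the rest of the range, changing one number can shift the colours of the others, which is the dynamic behaviour described above.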

sample_heat_map1

Contrast the heat map with the following chart that plots the same data using multiple line charts. I find the heat map much more compact, readable and intuitive.

sample_line_chart

Conditional Formatting offers a quick way to produce heat maps for the kind of two-dimensional datasets analysts produce every day. I have found that heat maps are an effective way to visualize a large volume of data in a manner that is easily understood by non-analysts. Best of all, if you have Excel 2007 or later the tools are already installed on your computer.

Bonus Tip
You may come across a situation where you want to keep the underlying heat map cell colours but delete the numbers in the cells. Unfortunately, deleting the numbers also removes the colour, because conditional formatting derives each cell’s colour from its value. Frustratingly, copying and pasting within Excel doesn’t work either, not even with ‘Paste Formatting’, since the format can’t be maintained without the numbers. Despite these headaches, I did find a trick for keeping the colours without the numbers.

First, highlight the whole heat map and copy it. Now open Microsoft Word and paste the heat map into a Word document.

Second, select the heat map in Word and copy it. Now open up a new Excel document and paste into the first cell. The heat map should be pasted in its entirety back into Excel and if you delete the numbers the colours should still be there.

Update: Reader Frank Fery writes that it is possible to remove the numbers from a heat map without resorting to the copy-to-Word trick. He provided a link to superuser.com that states that if you change the custom number format of the cells to ;;; it will hide the numbers. I still prefer the Word way, though, because it allows me to lay out different numbers on top of the heat map. Why would you want this? As an example, I once created a heat map where the colours represented calls-for-service volume but the numbers on the chart were the numbers of officers working. Thanks for the feedback, Frank.