Using Excel to Jitter Coincident Point Features

When working with crimes as point features some addresses such as apartment complexes or shopping centres have multiple occurrences geocoded to the same location. When the points are mapped they are stacked on top of each other and look like a single point. One solution is to use graduated symbols that scale the point but I find that approach an inelegant solution because I don’t think one large dot has the same visual oomph as a collection of tightly clustered points. Also, when you collapse all coincident points into a graduated symbol you can’t colour code them by age since there is only one feature.

So what to do? Jitter the points. By jitter I mean add a small, random fluctuation in the x and y coordinates of every point so that previously coincident points become overlapping but visually discernable points. Many GIS programs have the ability to do this with some tool or another but I prefer to jitter points at the data file level so that I can preserve the jitter, edit it by hand if need be, and maintain a record of the pre- and post-jittered locations. I have found that Excel makes jittering points quick and easy using the built-in function RandBetween().

A worked example will help illustrate the benefits of the technique. Below is a list of point features that I am going to map in ArcGIS. You’ll notice that this contrived example has 6 points that all have the exact same UTM X and Y coordinates.

If I put these on a map I will get the following not very helpful map.

To better visualize the points I will jitter them using the aforementioned Excel function RandBetween(). RandBetween() is superior to regular old Rand() because it produces a random number between any user-specified lower and upper boundary which makes it ideal for moving points around. The following image shows how simple it is to use.

The X_UTM_JITTER column is calculated by adding the current X_UTM value to the output of the RandBetween() function where the lower and upper boundaries are -100 and 100. Since the range for RandBetween() is -100 to 100 there is an equal chance of moving the point left or right (this is the X-axis recall). Doing the same for Y_UTM (random up or down movement) means that the coordinates for each point feature will be randomly translated in a square around the true values. Since the numbers output from RandBetween() are random each point will be uniquely jittered ensuring that the points no longer overlap. The result is this Excel table:

Which, when mapped, looks like this:

Recognizing that my UTM units are meters my choice of -100 and 100 comes down to personal choice — you don’t want to jitter so much that the points stop clustering but you want to jitter enough to create the effect. The correct amount of jitter comes down to the scale of the map and I have found that at the map scales I traditionally work with a 100 meter jitter works well. If it looks right at the desired scale, go with it.

I’ve had some people object to the jittering process saying it introduces imprecision in the map. That’s true, but once again it has to do with scale. Look at the following three map snippets.

Too much jitter.

Not bad.

Maybe not enough?

Obviously, in the first map, the jitter of 100 meters has significantly shifted the point features and if someone was using the map to locate the points in real life then they would not be well served by this technique. But, as you scale out the points begin to resemble a simple cluster of point features which emphasizes that there is multiple points in close proximity. For crime mapping, which is often intended to give a flavour of an area as opposed to specify exact locations, the scaled out clusters work quite well.

Two final things. First, once the jitters have been calculated in Excel it makes good sense to copy and paste as values the jitter columns. The reason is that RandBetween() will be executed every time the spreadsheet is loaded or one of the columns changes and this means that the values will be constantly changing. By pasting as values the jitters are locked in place and can be referenced at a later date without having them change. Second, it is possible to jitter using decimal degrees, you just have to pick smarter RandBetween() parameters so that you don’t end up sending your points to other side of the planet.

This entry was posted in Excel Tricks and tagged . Bookmark the permalink.