Prioritizing Using Exponential Decay

I had an officer ask the other day if I could identify the top 10 subjects in our records management system that were involved in mental health occurrences. On the surface this is a straightforward request: identify the people with the most activity but it gets trickier when you factor in the passage of time. For example, is someone who had 3 incidents over 6 months ago more of a concern than someone who had 2 incidents in the last 8 weeks? Clearly recency should factor into how a person is prioritized but history should also weigh in. To address questions like this we can use an exponential decay equation in Excel to preferentially weight recent incidents while still giving some weight to the past. In this post I will outline how to apply the method.

The first step is to obtain some data. Depending on the nature of the incident I will set an arbitary historical cut-off; in this example we’ll set it a year. This then means that we need to extract every mental health occurrence over the last year along with the date of the occurrence and a unique person identifier.


For each occurrence we then calculate the age of the incident in days by subtracting the date of the occurrence from today. This is done using Excel’s NOW() function to get the current date and the INT() function to trim off the time since we are working strictly with days. the following screen shot shows the formula in the function bar.


Next we want to create a column for the exponential decay formula. Most people are familiar with the idea of exponential decay, it is a mathematical formula where a quantity decreases at a rate proportional to its current value. In practice that means that the value drops off quickly and first and then more slowly as the value shrinks. The chart below shows a graph of exponential decay for a number of different constants. As can be seen in the chart we can tune how fast the values drop off by adjusting the constant.


In Excel, the formula for exponential decay is =EXP(-constant*time) where we get to specify the constant and time, in our case, is the age of the incident. Since we’re dealing with days and we only have a year of data we can decide how much we want past incidents to influence a person’s priority. For example, if we say that a mental health incident that happened today is worth 1 point, how much is an incident that happened 6 months ago worth? If we say that it’s work 0.5 points then we can calculate our constant as: -1*LN(0.5)/180. Where LN() is Excel’s natural logarithm function, 0.5 is point value we want to calculate and 180 is the (approximate) number of days in 6 months. This calculation results in a value of 0.003851 which we can use as our constant.


Back in our data worksheet we can finally create our new column that calculates the decay score for each incident based on its age. We use the exponential decay formula =EXP(-1*0.003851*age) to calculate the decay score for each incident as in the following screen shot. Note that the decay value will always be less than 0 but never reach 0.


Now that we have a decay score for each occurrence we need to create a pivot table of all the data. Highlight everything and select ‘PivotTable’ from the ‘Insert’ tab. This will open a modal window and you can just hit the ‘OK’ button to create your pivot worksheet.


On the pivot table worksheet drag the Identifier field to the Row Labels box and the Decay Score field to the Values box. Click on ‘Count of Decay Score’ in the Values box and select the ‘Value Field Settings’ option at the bottom of the pop-up window. Change the selected option from Count to Sum and click ‘OK’.


Almost done. Back in your pivot table right-click the Sum of Decay Score column and sort ‘Largest to Smallest’. This will sort the persons by their aggregated decay scores and the people at the top are the highest priority individuals.


This method works because the exponential function embodies the principles we were originally trying to capture: it gives more weight (a higher score) to recent incidents but it also gives some score to historic incidents as well. For every occurrence a person is involved with they have the potential to add up to 1 point to their total score (if the incident happened just today) but as the incidents age they influence the total score less. In our example above the person with the 3 incidents over six months ago has, at most, a score of 1.5 (since 0.5 is the 6 month mark) while our 2 incidents in the last 8 weeks person is likely to have a score a little less than 2. So this system weights these people the same by balancing the influence of recent and history. The best part is that we can tune how much weight we give to the two factors by adjusting the exponential decay constant. If we decide that an incident 6 months is only worth 0.1 point rather than 0.5, which greatly privileges recency over history, we can adjust the constant and rescore everyone. On the other hand, if we wanted to make a 6 month incident worth 0.9 points, we could recalculate the scores and privilege history over recency.

I like this method of prioritizing because it’s simple to implement in Excel (no plug-ins!), it’s logically justifiable and it allows the impact of recency and history to vary according to the expertise of the analyst. I hope you find it useful and good luck prioritizing!

Performing a Statistical T-Test in Excel

The value of the t-Test is that it can tell you if a difference you are seeing in two measurements is legitimate or if the difference is likely to be just from randomness. In this blog post I am going to use the t-Test to determine if the difference in the number of daily motor vehicle collisions during the winter is meaningfully different then the number during the summer. While this is kind of a no-brainer example it will help illustrate the value of the t-Test and give you some ideas on how you might apply it to give some statistical rigour to your analytical findings.

Before we get into the test let me provide some background to help understand what it is we’re trying to do. In statistics we are always comparing two hypotheses. The first is named the null hypothesis and the second is named the alternate hypothesis. We almost always want to reject the null hypothesis and accept the alternate one because in that situation it means something interesting has occurred. For our motor vehicle collision example we want to determine if there is a meaningful difference in the number of daily motor vehicle collisions during the first quarter of the year (i.e. the winter) versus the third quarter of the year (i.e. the summer). Now, you may be thinking “obviously there’s a difference, the weather” but let’s, for the sake of education, assume that we need to demonstrate statistically that there is a difference. In this study the null hypothesis would be that “there is no difference between the quarters, the  number of collisions are the same” while the alternate hypothesis would be “yes, there is a statistically significant difference between the two quarters”.

The term “statistically significant” needs some explanation. When we want something to be statistically significant we want to be able to say, with a particular level of confidence, that the results we are seeing are not just due to chance. First we pick our “confidence level” or how sure we want to be and then, through the magic of statistics, we are provided a number that our results have to beat in that we can be that sure. In this example we’re going with the 95% confidence level which means that if we find a difference between the average number of MVCs during the winter and summer we want to be 95% sure it’s not due to chance. Or, in other words, we are willing to wrongly reject the null hypothesis only 5% of the time. We can up the confidence interval to 99% if we want but, as we’ll see, we’ll need to adhere to even stricter conditions. Whatever we choose just remember that the “significant” part in statistically significant doesn’t mean “important” (as significant typically means in everyday conversation) it just means “not due to randomness”.

How do we go about demonstrating that the two quarters have real, non-random differences in the number of daily motor vehicle collisions? The first thing we do randomly select from the first and third quarters 30 different days worth of MVC counts. The screen grab below shows my randomly selected MVC data.


Notice that we there are two columns of data, one marked Winter and one marked Summer and each has 30 entries. Also notice that at the bottom of each column is a number labeled “Mean”. In those cells I have used Excel’s AVERAGE function to find the mean of all the numbers (add them all up, divide by 30). And behold, the means are different. That proves the quarters are different, right? Not quite. We still have to deal with the peskiness that is statistical significance. It’s possible, after all, that it’s just through chance those means are different. As discussed above we want to be 95% sure that they aren’t.

To prolong your agony I’m actually going to show you two ways to perform the t-Test: the (relatively) quick way using Microsoft’s Data Analysis ToolPak Add-in and the longer, manual way that uses built in Excel functions.

First, using the Add-In. 

The Analysis ToolPak is a free Add-In provided by Microsoft as part of the default installation of Excel. Add-Ins are a kind of Excel file, typically with a .xlam extension, that package together a bunch of related functionality. To get started with the Analysis ToolPak Add-In we need to check if it is installed. You may or may not have it already installed but to check if you do you click the “Data” tab in Excel and look at the far right. If you see “Data Analysis” as an option, it’s already active (see screen shot).


If you don’t see it that means we need to turn it on. First, click the green “File’”tab (or the Excel jewel if you’re in 2007) to call up the Save/Open/Close menu. Look around the option named “Options’”and click it. This will open up a window with a menu on the left. Click on the menu option named “Add-Ins” on the left hand side. The screen will change and at the bottom, beside the word “Manage”, will be a select box and a button labeled“Go”. Click the“Go”button to open yet another window and you will see a series of check- boxes. Fingers crossed that one of them says “Analysis ToolPak” and is unchecked. Click the checkbox and click ok. Now go back to Data tab, look at the far right and you’ll see “Data Analysis’”. You’re set.

Now that the ToolPak is installed click the “Data Analysis” button and a modal window will open up. Click “t-Test: Two-Sample Assuming Unequal Variances” in the list and click the “OK” button and this will open up a new window (check the screen shot below) with some options. For the “Variable 1 Range” click the little arrow button and select all of the “Winter” data, which is in B2:31. For “Variable 2 Range” select all the “Summer” data from C2:C31. For the “Hypothesized Mean Difference” enter 0 (since we’re trying to prove they are the same, the difference of the mean should be 0) and everything else can stay the same. However, notice that there is field named “Alpha” that has the value 0.05. It’s not a coincidence that 0.05 = 1 – 0.95. Alpha is another way of asking how sure we want to be.


Click the “OK” button and Excel will open up a new worksheet populated with a bunch of labels and values just like the one in the screen shot below.


So what are we looking at? Well, there are the means we saw. Next comes the variances, which are measures of how spread out the values are, and they are quite different (which is why we needed to use the test that assumed unequal variances) and a little further down we see our “t Stat” and it has a value of about 4.35. Skip down two lines to the value labeled “t Critical one-tail” and note that it is about 1.68. That’s good news for us because the rule is that if your “t Stat” is larger than your “t Critical” value then the null hypothesis can be rejected and our results are therefore statistically significant. This means that we can now state, with the power of math backing us up, that we are 95% sure that the daily average number of MVCs in the winter are greater than the daily average number of MVCs in the summer.

Second, the manual way. 

So we used a built-in tool to do a bunch of heavy lifting for us (as tools should) and we read an answer off a table. It works, but how did it work? How can we calculate these numbers if the ToolPak isn’t around but we still, for some reason, have access to Excel?

First, have a look at this screen shot because it will help to explain each of the steps I used to perform a manual unequal variance t-Test.


On the left are the two original columns of data, one for winter and one for summer. Notice that there are 30 entries for each season but that I used “freeze panes” so that I can show the bottom of the data (starting in row 32). For both seasons I calculated three additional variables, the mean (using Excel’s AVERAGE function), the standard deviation (using Excels SDEV function) and the count of the number of data points (using Excel’s COUNT function). These will all be needed in the coming calculations.

On the right hand side of the screen shot I produced my own little table and I’m going to walk through each of these variables and explain the thinking behind them.

The first needed variable is the Sum of the Squared Deviations for which I used Excel’s DEVSQ function as a short cut. Think of the variable like this: you have a mean value for  a season and you have 30 different data points that are either going to be a little bit more or a little bit less than the average (as data points usually are). For each data point, subtract its value from the mean and square it. Now sum those squared differences all up and that’s the sum of the squared deviations (or differences). Notice that I added together the sum of the squared deviations for both the winter and summer seasons together into a single value.

Why did we need the sum of the squared deviations? Because we want to calculate the pooled sample variance. Why pooled? Because we put both the winter and the summer values together. But what’s variance? The average of the squared differences from the mean. This is easy to calculate, just divide the sum of the squared deviations by the number of points we have in our two seasons, minus 2 (1 for each season) to account for Bessel’s correction which is a correction for the bias in the estimation of the population variance (just trust me, subtract 2, Excel does).

The next variable is the Standard Error of Difference in Means and to calculate it we take the square root of the pooled sample variance times the sum of one over the counts for the winter and summer data points (i.e. 1/30 + 1/30). But what is the standard error of difference in means? It’s just the average expected difference between the means for two samples that have this many data points. We’re trying to determine, after all, if two means are significantly different from one another and, given the size of the two sample groups, we should expect that there would be some difference. Now, imagine that we run this little study over and over again with other, randomly selected groups of MVCs, each pair would have their own difference between the means. That sounds like a lot of work, so instead of running this study over and over again we can instead use the formula for the standard error of difference to answer the question: “what’s the expected average of that difference?”.

But what do we need that for? To calculate our t-statistic. And to perform that calculation we just divide the actual difference in the means (winter mean minus summer mean) by the standard error of difference in means to end up with approximately 4.35, just like the ToolPak did.

But we’re not done, we still need to determine our critical t-score at the 95% confidence interval. To do that we cheat a bit and use Excel’s built in TINV function. The first input is the probability that we want to test, in this case it’s 0.05 (i.e. 1-0.95) or the alpha from the ToolPak. The eagle-eyed will note that in the calculation in the screen shot 0.05 is actually multiplied by 2 and this is because we’re interested in replicating the “one-tailed” result from the ToolPak and since TINV is “two-tailed” we need to double up the probability to account for that.

TINV also takes a second parameter, degrees of freedom, and usually degrees of freedom is pretty straightforward (it’s just the sum of the two counts, minus 2, like for the pooled variance) except when we’re dealing with an unequal variance scenario. In order to handle this scenario we need to break out the more elaborate calculation defined by Moser and Stevens 1992 and spelled out in the ugly Excel formula:

=(1/B34 + F8/C34)^2/(1/(B34^2*(B34-1)) + F8^2/(C34^2*(C34-1)))

Where the value in cell F8 is calculated as =C32^2/B32^2 or the ratio of the squared standard deviations for each season. God, it’s a mess, I know. But just dissect the formula and you’ll see that it’s just a complicated formula using the counts for the two seasons in multiple places. One thing you may notice is that the resulting degrees of freedom is not an integer and that we have the approximate value 48.58. The ToolPak results above have a degrees of freedom of 49. This is because Microsoft rounds. You can choose to round or not, the results are pretty much the same.

With degrees of freedom we can finally run the TINV function and when we do we get a critical t-stat of 1.68, just like the ToolPak (although the ToolPak number is slightly different further into the decimals because I didn’t round my degrees of freedom). And just like with the ToolPak, our t-stat is way bigger than that critical t-value so we can reject the null hypothesis and assume that we have statistically significant results at the 95% confidence level. Yes!

And with that we are finally done our odyssey of manually calculating the unequal variance t-test. We’ve calculated the same results as the ToolPak and we got right into the nitty-gritty of it. Well done.


If you made it this far (and didn’t just skip the manual section) then bravo, that was a real grind. Either way, I hope you learned something about the t-statistic and how it can be used to add some rigour to pronouncements about means. I do, somewhat unfortunately, find that a lot of the time the t-stat plays spoiler in that decision makers see a difference in averages and conclude that something is working when the reality is that, statistically speaking, there is no significant difference. Nevertheless, it’s important to be able to back up your analysis with some math and t-stats are a great way to provide that oomph. Good luck.


Understanding the Difference Between Mean and Median

As analysts we are often interested in averages. But an average is a tricky thing because the word average actually covers several related concepts. In this blog post I am going to discuss two different concepts of average – the Mean and the Median – and why it’s crucial to understand their differences.

The Mean is what we typically mean when we use the term average. The Mean is calculated by adding everything up and dividing by the number of items. The resulting value is supposed to represent the centre of all the values. People intuitively understand the Mean because a lifetime of being exposed to the concept of the bell curve (or the normal distribution as it is called in math circles) has taught them that the big peak in the middle of the curve is the average.

The Median is less well known but just as easy to conceptualize. Consider all of the items in your collection and sort them from smallest to largest. The Median is then the value of the item that sits in the middle of the sorted list. In other words, it’s the value of the item that splits the collection in half with an equal number of items above and below it.

When dealing with data that follows the bell curve, that is data that produces a symmetric and unimodal (single peak) distribution, the Mean and the Median have the same value and we don’t have any problems. The issue is that a lot of data we’re likely to encounter in our day-to-day work— such as call response times, time spent on patrol, man hours spent on an occurrence—are not accurately represented by a symmetric bell curve but instead follow a skewed distribution that is weighted more to one side than the other.

But what does this mean for averages? It means that for data that is positively skewed (the bulk of the data is on the left side of the chart) that the Mean is going to be larger than the Median and vice versa for negatively skewed data. The problem arises when the difference between the Mean and Median gets large enough to meaningfully change the stat that you are reporting.

For example, the number of man-hours spent on an occurrence is the kind of data that is going to be positively skewed. This is because of the nature of the data: 0 is the lowest value that can exist but the upper limit is not bounded. If you create a histogram of the data you’re likely to find a whole lot of occurrences with a small to medium amount of time being spent (the peak on the left) but also a few occurrences with a lot of time being spent (the long tail on the right). I’ve created the following simplified data set below to illustrate the scenario.

You can see in the screen shot that the occurrence that took 24 hours is clearly an extreme example but it has a significant impact on the Mean. If a Commander came to you and asked for the average number of man-hours spent on occurrences do you feel that the Mean of 8.8 would be representative, considering that only one occurrence actually took longer than that? Clearly the Median value of 5 hours is more representative of man-hours and is a more accurate average.

This result comes about because the Mean is sensitive to the inclusion of extreme values because of the way it is calculated while the Median, which is relatively indifferent to the inclusion of extreme values because it only concerns itself with the middle value for a data set, is more likely to provide an appropriate number that is stable in the face of outliers.

The screen shot also conveniently illustrates the function calls used to calculate both the Mean and Median in Excel. Frustratingly, Microsoft has opted to use the function name AVERAGE for the Mean further erroneously cementing them as synonyms. Thankfully the Median is a straightforward call to the MEDIAN function.

You’ll notice a third function in the list: SKEW. I discuss above about the positive and negative skewness of data and the SKEW function is useful for coming to grips with the direction and magnitude of a data set’s skewness. In the occurrence man-hours example I state that the data is positively skewed and that’s reflected by the positive (greater than 0) value returned by the SKEW function and that indicates that the Mean likely overestimates the Median. Contrarily, if SKEW returns a value that is less than 0 you’ll know that the data is negatively skewed and that the Mean likely underestimates the Median. Finally, if the value is close to 0 your data set likely follows the bell curve or normal distribution and the Mean and Median will be nearly the same.

The key take away from this post is that it is important to not just blindly take the average (by which I mean the Mean) of a stat without first considering how the data is distributed. So much law enforcement data is skewed one way or the other that it makes sense to take a minute and run the MEDIAN and SKEW functions in Excel to make sure you’re reporting the most appropriate number to your superiors.

Cumulative Distribution Function Charts in Excel

Most analysts know what a histogram is: it’s a chart where each piece of data is put into one of several bins and then the bin counts are plotted in order to gain insight into how the data is distributed. Histograms are useful as they have an intuitive appeal because it’s easy to see how the data is distributed just by looking at the chart. Of course, they aren’t perfect. The whole binning process is pretty subjective and depending on how the bins are selected the results can be different, which is deceptive. If only there was an alternative that could address these shortcomings…

Well, there is: histograms have an underutilized sibling known as the Cumulative Distribution Function (CDF) chart which compliments and, I think, improves on the histogram by doing away with some of the histogram’s problems and offering some useful innovations of its own. Namely, the ability to read off the chart what percentage of the data falls between two points, something that is not possible with a histogram. In this post I will discuss why I think you should make the CDF part of your analytical toolbox and also demonstrate how easy it is to create a CDF in Excel.

Convince me

First, two example charts that describe my dataset: a completely made up list of 30,000 calls-for-service and how long they last from initial call to the closing off of the incident. The first chart is a standard histogram and the second is the corresponding CDF. For both charts the horizontal axis is the same: it’s hours. For the histogram the vertical axis is a count of how many calls are in each bin while the vertical axis of the CDF is a measure of percentage (why percentage will become clear in a bit).


So the histogram is the same old boring story. We have a normal distribution that is skewed to the left (fatter on the left than the right). Looks like most calls take about five hours and only a few go above 12 hours. Yawn.


Ok, the CDF looks great. But what’s it good for? What can it tell us that the histogram can’t? Remember that I mentioned you can use a CDF to determine what percentage of the data falls between two points? Let’s explore that now.

Suppose I want to know how many calls take less than 5 hours (that is, between 0 and 5 hours). I can read that right off the graph by going across the horizontal axis until I reach 5 and then going up until I hit the trend line. I then read across to the vertical axis and see that the value is about 60%. What this means is that 60% of all calls take 5 hours or less. That’s a much firmer number than ‘the most calls are around 5 hours’. Similarly, what if the question was how many calls take between 5 and 10 hours. Well, first read off the 10 hour value, which is about 99% and we know the 5 hour value is 60% so just take the difference 99-60 to arrive at 33% of all calls take between 5 and 10 hours. Let’s see you do that with a histogram (hint: you can’t do it).

I’m convinced. How do I make one in Excel?

Creating a CDF is very straightforward and the following five steps describe the entire process. I’ve included a quick Excel screen grab to illustrate what the first few rows of the data will look like when you’re done (it’ll make more sense when you’re done the steps).


Step One: Start with your data in Column A in Excel sorted from smallest value to largest. My data had a lot of zero values, so that is why the column starts off with a series of zeros.

Step Two: In Column C (yes I intentionally skipped B) put the number 1 in C1, 2 in C2, 3 in C3, etc.  Here’s a useful hint to speed this up: use =C1+1 for C2 and fill down.

Step Three: In B1 enter how many data points you have. I had 30,000 so I dropped that number in there.

Step Four: For B2 use the formula =C2/$B$1 and fill down. The idea is that you want to divide the value in Column C by the total count. When you get to the bottom of your data the last value should be 1.

Step Five: Now that the data has been prepped highlight all of columns A and B and insert a new Scatter chart. What you’ll get is the monotonically increasing (i.e. always going up) trend line. You can edit the styling so that the graph is a line rather than points and make whatever other changes make it look it pretty. I recommend fixing the vertical axis at 1 (since the data can never go above 100%) and the minimum horizontal axis at 0, if that’s appropriate. Voila, now you have a CDF chart.

But how do CDFs address the histogram’s shortcomings? Did you notice any bins during the calculation? Nope. CDFs use the value of every data point and therefore do away with binning and the loss of information binning introduces. And because there are no bins the shape of the CDF can’t change like it can in a histogram so that source of subjectivity is neatly side-stepped. And you want to know what else is great? It’s easy to compare two data sets with CDFs by simply plotting them on the same chart. Comparing two histograms on the same plot is not particularly enlightening besides noting how the peaks compare.

In conclusion, CDFs are like grilled cheese: great and easy. And like grilled cheese everyone should try one at lunch (or whenever you next do a frequency analysis). The CDF is a great compliment to traditional histogram analysis and will hopefully become a valuable tool in your graphic analysis toolbox.

An olive branch to histograms

Making a histogram is actually more work than creating a CDF. But if you really want to create one you can either use Microsoft’s Analysis ToolPak (what happened to the ‘c’ MS?) or perform some more elaborate Excel JuJu. Good luck.

Being Careful with Averages

In crime analysis we are used to calculating averages and when I say average I mean the “mean”—e.g. add up ten numbers and divide by ten—that kind of average. But in crime analysis we are also frequently interested in examining rates or ratios. For example: how many hours of patrol time are spent responding to calls. When we combine the two there is the potential to be tripped up since, if we are talking ratios, we need to be very careful about how we calculate an average. Depending on the approach we take we’ll get different numbers.

An example will help illustrate the problem.

Let’s say that your CAD system keeps track of how many hours an officer spends on patrol and, of that time, how much of it is spent responding to calls. The following table illustrates a small sample of this kind of data.


From the data in the table I calculated a ‘busy-ness’ ratio for each officer; this is Call Time divided by Patrol Time. From those values I in turn calculated the average Busy-ness, which is 0.51, or 51% busy. But wait, let’s add up all the Patrol Time (78.6 hours) and all the Call Time (40.6 hours) and find the ratio. (40.6/78.6)=0.52 or 52% busy. These numbers are different because of the mathematical principle that says that the ratio of averages (0.52) does not necessarily equal the average of ratios (0.51). But which one is ‘correct’? Well, they both are, and the one you go with all depends on what question you are trying to answer.

For the ratio of averages (0.52) we are asking a question about the entire police service: “how much patrol time did we spend on calls?” For this question 52% is the correct answer because it takes into account the total time spent on patrol and the total time spent on calls. If an officer spends 3 hours on patrol or 12 hours, all of that is taken into account in the final number.

The average of ratios (0.51) is trickier as it answers a question about officers going on patrol: “how much of their patrol can an officer expect to spend on calls?” The difference here arises because the average of ratios treats all values contributing to the mean as having equal weight and this is a fine assumption when the question is focused on how the averages behave. In other words, the length of the shift doesn’t matter, only how much of the shift was spent on calls.

But why am I telling you this since the difference seems to be pretty small? It’s because there are situations where the difference can be significant. For example, look at the modified table below where by adjusting the patrol times I was able to increase the difference between the averages to 4%. Making a 4% mistake on something like officer busy-ness can be the difference between hiring more officers and not. That’s not a mistake I would want hanging around my neck.


The take away from this post is to be mindful when taking averages because so many of the numbers crime analysts look at are ratios. Whether we’re looking at officer busy-ness or per capita car thefts or tickets issued per traffic blitz it is important to understand the question that the average is supposed to answer.

Bonus commentary for pedants
But wait, you might be saying, how is the 0.52 number a ratio of averages? No average was even calculated; we just summed the numbers and divided the Patrol Time by the Call Time. That’s a ratio of sums. Yes, that’s true, but if you had wanted to be a real stickler you could have found the average Patrol Time (78.6/8 = 9.825) and the Call Time (40.6/8 = 5.075) and divided those numbers and arrived at the same answer (5.075/9.825) = 0.52. It works this way because the divisor in both the averages is the same (it’s 8 representing the 8 shifts included in the calculation) and therefore it cancels out when the ratio is taken.