Halton Regional Police Service wins 2016 ComputerWorld Data+ Editors’ Choice Award

In the September issue of ComputerWorld, the Halton Regional Police Service was awarded a 2016 Data+ Editors’ Choice Award for its innovative use of data and analytics to improve the effectiveness and efficiency of the police service.

For all the talk of big data and technology, I think this is the most important insight for organizations looking to adopt data analytics:

“Unless people can believe in the product, the solving of the technical challenges doesn’t mean much.”

Read the article: http://www.computerworld.com/article/3117301/data-analytics/halton-regional-police-service.html

Comparing Event Counts with the Poisson Means Test

An exceedingly common task for crime analysts is to count the number of events that occur in one time period (e.g. day, week, month, YTD) and compare the total to a prior time period. The change between these two numbers is often the source of much strategy and project-making in law enforcement. What does it mean that we had more motor vehicle collisions this week compared to last? What should we do about the increased number of mental health apprehensions YTD? What are we doing right to make robberies go down this month? These types of questions, and the actions that are taken from the discussions they generate, are the daily bread of law enforcement. Unfortunately, one thing that is seldom done is to check whether the change in the event counts is statistically significant. That is, is there really a difference between these numbers, or can we just chalk it up to randomness?

This is where the Poisson Means Test comes into play. It is a statistical test that can be used to determine whether the difference between two counts is statistically significant. In prior posts I have written about statistical significance being about deciding between two hypotheses. The first (or null) hypothesis says that there is no difference between the two sets of events. The second hypothesis (the exciting one) says that, yes, these numbers are different enough that we would expect to see this level of difference by chance only about 1 time in 20 (when working at a 95% confidence level). In this case, what we want to be able to do is pick a confidence level (I’m sticking with 95%), look at two different event counts (let’s say robberies YTD this year and last) and say, according to the statistical test, that ‘yes, something is meaningfully different about these two numbers’. And we want to be able to do this in Excel.

But before we get into Excel, a little background on Poisson. A Poisson process is a process that generates independent events over a period of time. Because the events are independent, the time between consecutive events is random. It’s a natural fit for modelling things like criminal occurrences because, notionally, they are unrelated and we can’t predict with specificity when the phone is next going to ring. I’m not going to get into any math in this post, as it’s not essential for using the test; just know that Poisson is about counting the number of times something happens in a time period.

In order to compare two sets of events we need to define a couple of terms. The first set of events has a total count C1 that occurred over a time period T1. The count C1 is the total number of events, but there is some flexibility in defining the time period, and that flexibility has to do with the other event count you want to compare. In our example I have the number of robberies in Jan-Feb-Mar of 2015 and 2014. I could say that T1=1 quarter, or T1=3 months, or T1=90 days; it really doesn’t matter, because T2, the time for the events counted in C2, is exactly the same, since we’re doing a YTD comparison between two years. However, not all use cases are like that: it is entirely possible to compare Jan-Feb-Mar 2015 to April 2015, in which case you’d want T1 to be 3 (months) and T2 to be 1 (month). Basically, you pick the largest common time period between the two data sets and use that as the basis for T.

Enough chit-chat; an example will make things clearer.

[Figure: poisson_1 — a table of Q1 robbery counts for 2014 and 2015, with the difference and percent increase]

I am sure this looks familiar to everyone: two years’ worth of crime data, the difference in the events and the percent increase. Can you just hear the operations inspector demanding to know why robberies are up 27% and the detectives scrambling for answers? The problem is that we don’t know whether that change reflects anything real or is just due to the randomness of the data. But we can use the Poisson Means Test to help us figure it out, and in this example, using the nomenclature above, we have C1=44, T1=1 (because it’s one quarter), C2=56 and T2=1. What we need to do next is send those numbers through a function that tells us whether the difference is statistically significant. I sure hope Excel has a function for that … but it does not. However, thanks to some talented programmers, I found a solution.

I first learned about the Poisson Means Test from the blog of Evan Miller, a programmer and statistician who has developed a great piece of statistical software for Mac named Wizard. Evan has a blog post on the subject of event counting where he explicitly tells readers never to make decisions on plus/minus differences without first checking the Poisson Means Test. To facilitate this he has created a great online tool for checking whether there is a meaningful difference between two event counts.

The online tool works great, but I need to be able to run these kinds of tests in bulk in Excel (where all crime data resides), so I wanted to see if I could make Excel do the test. Following Evan’s notes in his blog post and peeking under the hood at his tool’s JavaScript, I discovered that most of the heavy lifting is performed by a JavaScript library known as jStat. jStat provides numerous special functions, including the, ahem, ‘Inverse of the Lower Regularized Incomplete Gamma Function’. I took the time to figure out what that means so you don’t have to, and what I learned is that Excel doesn’t offer a function to do it. But Excel does offer VBA, and the jStat source code is available on GitHub, so I ported all the JavaScript functions I needed into VBA user-defined functions and made them available through the following GIST.

Now, I don’t expect you to understand this code, because I don’t understand a lot of it. I know what it is doing, but there is a lot of code in here that I suspect comes from some flavor of a Numerical Recipes book. The point is, we don’t need to know how it works; the geniuses at jStat already figured it out. I just need to make sure that it works in VBA and that I didn’t mess it up during the translation. I’m pretty confident in my port after a lot of testing, and you can test it yourself by running Evan’s tool and my Excel macro together and seeing if you get the same results (you will). You can also, if you are adventurous, try out the R library zipfR, which has a function named Rgamma.inv that does the same thing.
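If you want to sanity-check the port without firing up R, Python works too: scipy exposes the same special function under a different name. A minimal sketch, assuming you have scipy installed:

```python
from scipy.special import gammaincinv

# gammaincinv(a, y) returns x such that P(a, x) = y, where P is the
# lower regularized incomplete gamma function -- the same quantity
# computed by the jStat code and by zipfR's Rgamma.inv.
x = gammaincinv(5, 0.5)
print(x)  # ~4.67, the median of a Gamma(shape=5, scale=1) distribution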

So how do we use the VBA to do something useful? Check the screenshot below, where I have added a fifth column to my table named P-value. This is your usual statistical p-value which, for a 95% confidence level, needs to be less than 0.05 in order to indicate statistical significance. As you can see in the image, I have called the custom function PoissonPValue from the VBA code and passed in the four values I identified above: C1, T1, C2, T2. The output of the function call is 0.271, which means that, statistically speaking, there is no meaningful difference between 44 and 56, so that 27% increase is nothing to get worked up about (certainly not something to change strategy over).

[Figure: poisson_2 — the same table with a fifth P-value column showing the 0.271 returned by PoissonPValue]
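Incidentally, if you are comfortable working outside Excel there is a shortcut worth knowing. One standard form of the Poisson means test (the conditional test) works by conditioning on the total count: under the null hypothesis of equal rates, C1 given the total C1+C2 follows a binomial distribution with proportion T1/(T1+T2). Here is a minimal Python sketch of that form; poisson_pvalue is my illustrative stand-in, not the VBA function from the GIST, and it assumes scipy 1.7+:

```python
from scipy.stats import binomtest

def poisson_pvalue(c1, t1, c2, t2):
    # Conditional test: under the null hypothesis of equal rates,
    # c1 given the total c1 + c2 is Binomial(c1 + c2, t1 / (t1 + t2)).
    return binomtest(c1, c1 + c2, t1 / (t1 + t2)).pvalue

print(poisson_pvalue(44, 1, 56, 1))  # ~0.271, matching the screenshot
```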

I know that this is a hard truth to swallow, because it runs counter to the thinking of most law enforcement personnel. They want to believe that if a number goes up then something is to blame, and that something can be handled to make the number go back down. However, from personal experience I can attest that much of the week-to-week and month-to-month difference is not statistically significant. The reality is that randomness happens, and unless we want to be chasing our tails running after statistical ghosts, we would be well advised to concentrate our efforts on differences that the stats indicate are real.

To wrap up, the Poisson Means Test is an incredibly valuable tool that you should put in your toolbox. It’s not just for your weekly/monthly/annual crime report either. You can use it to test for meaningful differences between any two sets of events. So, for example, say someone ran a project over the last two months and wants to know if you can see an impact in the numbers. If you test the events from a period before against the 2-month project period you can see if there is a stat-sig difference between the counts. Now, that’s not to say that the project is responsible but at least you can demonstrate statistically there is a difference. Anecdotally, I have found new applications for the test every week and I like it because it adds a further layer of rigour to our analysis and I think that’s important. I hope you do too.
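As a postscript, here is what that before-and-after comparison looks like with the illustrative Python helper from earlier; the counts are invented, and note that T1 and T2 no longer match:

```python
# Invented counts: 60 events in the 3 months before the project,
# 30 events during the 2 project months (T measured in months).
print(poisson_pvalue(60, 3, 30, 2))  # ~0.2: a 25% drop in the monthly
                                     # rate, yet not statistically significant
```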

Police Analytics in HQ Magazine

I wrote a Police Analytics article for the most recent issue of the OACP’s H.Q. Magazine. I tried to communicate the value of data to police services and how my Service has approached the challenge.

This article was originally published in the WINTER 2014-15 issue of H.Q. Magazine, the official publication of the Ontario Association of Chiefs of Police.

At the beginning of 2014, Halton Regional Police Service (HRPS) created a new unit with a mandate to exploit police data assets to support evidence-based decision making. Combining scientists, programmers and police experts, the new unit serves as a clearinghouse for quantitative analysis and report development for customers throughout the Service. From frontline platoons and investigative bureaus to administrative units and the executive branch, this unit, named Police Analytics, has changed the way analytic insights are generated and delivered to in-house customers.

Historically, police services have spent considerable resources gathering data, first in the form of paper documents and subsequently in records management systems. The primary function of this data has been to document activity and support investigations, but as these data stores have grown in size and complexity, the possibility of conducting large-scale analysis to produce efficiencies and insights has been recognized. The challenge is that the data is not in a format conducive to analysis; if that challenge can be overcome, tremendous value can be realized.

Business Intelligence and Police Analytics

Recognizing this opportunity, HRPS chose to invest in a Business Intelligence (BI) software system to better exploit these data assets. Business Intelligence is a combination of hardware and software designed to transform existing data resources into a form better suited to reporting and analysis. The work is highly technical, but the end result is a single database that contains the entirety of a Service’s data. The HRPS IT department, working closely with our BI vendor, spent over 18 months creating a BI database tuned to the Service’s needs, and the final result was unparalleled access to our information. But data alone is not enough; you also need skilled analysts who can intelligently and imaginatively utilize that data to produce results, and those skilled analysts work in our new Police Analytics unit.

Police Analytics was envisioned as a different kind of law enforcement analysis; it isn’t crime analysis or intelligence analysis but is instead a data-centric role that provides quantitative analytical products to all levels of the Service. We realized early on that we wanted individuals with a background in math, engineering or the sciences so that they would be capable of performing complex statistical work. Further, the new analysts needed to be highly technical so that they would be comfortable working with databases and writing software to perform their analysis. This combination of skill sets echoes many of the talents of programmers and developers in the world of tech companies, and so that was the model we adopted. To get the best people we decided to hire expertise from outside law enforcement, and in so doing we created a tech start-up inside the police service.

Challenges for a New Unit

Like any start-up, the unit has had growing pains. Our initial conversations focused on where in the organization to position this new unit. From a technical perspective the unit could logically fit in IT, but from an analytical perspective it was decided that the unit should fall under the Chief’s Staff. The access provided by working directly for the executive allows the analysts to have a more direct line to senior command—better to communicate analytical findings and field requests—and working alongside other executive units such as planning, audits and policy means that the analysts can develop a holistic understanding of how data flows throughout the organization. The placement of a highly technical unit outside the traditional IT infrastructure was a novel undertaking, and providing the access police analysts need to do their work meant that policies and practices had to be adapted. Consensus was reached through negotiation and collaboration between departments, and we were able to ensure data integrity and successfully address security concerns.

The next challenge was one of branding. We had constructed a high-functioning unit that produced useful analysis but we needed the Service to know about it. To address that issue we started an internal campaign advertising ourselves throughout the Service as the source for statistics and analytical products. We positioned ourselves as a resource for Service members of any rank to get data and advice to support their projects and we emphasized the value of having a solid quantitative foundation to produce successful project outcomes.

Evidence-based Decision Making

Our outreach efforts focused on promoting a culture of data-driven, evidence-based decision making, and we encouraged Service members to think about data collection and how subtly adjusting business practices could lead to better data, which in turn would lead to better analysis. As an example, our computer systems allow officers to push a button every time they change activity, but some officers had gotten into the habit of not pushing the button, and this led to data gaps. To address this issue we communicated to officers how consistently updating their status led to much more accurate activity reporting that better captured all of the work they performed throughout the day. When officers discovered the benefit of pushing the buttons, namely credit where credit is due, they modified their behaviour and adopted a data-driven mentality.

We’ve worked hard to change behaviours and clean up our data sets, and we’ve started to see rewards for those efforts. One of our successes has been a project that studied call volume and peak-period staffing. Calls for service fluctuate throughout the day, but our officers work 12-hour shifts, and that leads to peaks and valleys in officer busy-ness. By accurately capturing officer status changes we obtain an extremely detailed view of what patrol officers are doing throughout their shifts, and with this level of detail it is possible to broadly categorize officer time as either ‘busy’ or ‘available’. For our analysis, we aggregated hundreds of thousands of time-stamped officer activities to construct daily and hourly profiles of when our officers have the heaviest workload, and with that data senior commanders are able to construct additional shifts to move officers to busy times. The end result is a reduction in officer busy-ness during peak periods and a levelling of the overall workload across day and night shifts. Because the system is data driven, we are able to measure the impact of the shift changes and quantitatively demonstrate our success.
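For the technically inclined, the aggregation step is conceptually simple once status changes are captured consistently. The following is a toy sketch in Python, with hypothetical file and column names, not our production code:

```python
import pandas as pd

# One row per officer status change: a timestamp plus a status label.
events = pd.read_csv("status_changes.csv", parse_dates=["timestamp"])

# Count 'busy' status changes by hour of day to get a crude hourly
# workload profile; peak periods stand out against shift boundaries.
busy = events[events["status"] == "busy"]
profile = busy["timestamp"].dt.hour.value_counts().sort_index()
print(profile)
```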

Increased Data Awareness

Beyond analytical work, the police analytics unit also studies organizational business practices to identify “pain points”. If a business process is cumbersome or time consuming, we brainstorm as a unit how to rectify the problem. This practice has led to the development of a number of reporting tools that repackage existing data assets into more useable formats. A few examples include summary reports of field contacts for frontline officers, absence management reports for human resources, and occurrence mapping tools for crime analysis. The point of these reports is not that they contain analysis, but that they take existing data that is stored in isolation and synthesize it into an easily read report. Where before an individual tasked with reviewing these entities may have spent hours clicking through the RMS, they can now see everything in one report.

Perhaps our biggest success in this vein is the introduction of our officer activity report that allows Service members to review their activity. Our RMS captures officer activities such as arrests, charges, and tickets and our new reporting tools allow supervisors to review monthly summaries of this information for their units in a web-based, on-demand format. This reporting tool offers many improvements over the old, self-reported monthlies including accuracy, standardization and time savings. This tool has eliminated the need for members to waste time collecting and collating data that has already been captured in our databases and has resulted in a greater awareness, both for members and for command, of officer activity.

Lessons Learned and the Future

With our success establishing the police analytics unit, HRPS has learned a number of lessons that may be instructive to other Services looking to create a data-focused unit:

  • You need buy-in from the highest levels of the organization. When attempting to create a data-driven culture of analysis, that message needs to be embraced and communicated through management.
  • Success is built around people, not technology. Simply buying a piece of software does not solve your problems; success requires a small core of dedicated experts who believe in the mission.
  • Success takes time and progress will be slow. It is not possible to quickly influence the culture of a large organization, and that’s doubly true when the changes relate to a matter as complex as data.
  • Change is an iterative process. Only when you start looking will you see how your processes need improvement, and once those improvements are in place you’ll need to gather more data, which will likely reveal more necessary changes.
  • The unit needs access to senior management. The police analytics unit needs the latitude to communicate proposed changes to senior officers and receive approvals quickly.

HRPS is proud of what our police analytics unit has accomplished in a short time. We have made meaningful contributions to improving the efficiency and effectiveness of the Service and have successfully promoted the value of data-driven decision making in the organization. We also spent 2014 promoting our vision for police analytics to other Services in Canada as well as at conferences in North America and Europe, where our ideas have been enthusiastically received. In 2015 we plan to ramp up our software development and update our web-based tools so that officers in the field can access information on their tablets and phones. The volume of data captured by the police is going to keep growing, and specialized systems and analysts are needed to extract value from those assets. We believe Police Analytics is the future of law enforcement analysis and we encourage everyone to consider how it might benefit their organization.

Month-over-Month Crime Stats Aren’t Useful (and two alternatives)

Do you prepare a monthly crime report? Good. Do you break down the number of incidents by crime type, for example assaults, break and enters, robberies, etc.? Good. Once you have this month’s numbers, do you prepare a plus/minus over last month’s numbers, maybe with a splash of red or green to indicate direction? Pointless. I’ve seen too many crime reports, CompStat presentations, PowerPoint slides, what have you, that compare a current month’s crime (or week’s crime) to the prior month (or week). This is not a helpful exercise because crime has, as any large-scale analysis of your incidents will indicate, an annual cycle.

The number of crimes reported to the police has three distinctive cycles. The first is daily: the number of crimes reported varies with the hour of the day. The second is weekly: some days of the week generate more calls than others. The third and final cycle is annual: you get more calls in the summer and fewer in the winter (in the northern hemisphere, anyway). The following diagrams illustrate these cycles.

[Figure: hod — calls for service by hour of day]

[Figure: dow — calls for service by day of week]

[Figure: moy — calls for service by month of year]

Now, I realize that the day-of-week cycle doesn’t look like much of a cycle, but keep in mind that the volume of incidents is high enough that the differences are statistically significant. I encourage you to perform an analysis on your own data to demonstrate to yourself that what I’m saying is true.

Because crime has an annual cycle, if your crimes bottom out in January and peak in July then every month from January to July your current month’s stats are, on average, going to be higher than the previous month’s. Similarly, after the peak, your current month’s stats are going to be lower than the previous month’s. The problem, of course, is that when these stats are reviewed each month, commanders are going to be chewing out underlings and developing strategies to counteract the perceived rise in incidents during the first half of the year, and celebrating a job well done in the second half. This is an exaggeration, of course; cops know as well as anyone that they are busier in the summer than in the winter, but to the untrained observer (the media, the public, etc.) this kind of stat reporting is not helpful.

Fortunately, there are a couple of alternatives:

The first is to compare this month to the same month from the previous year. This is a solid indicator since, barring drastic changes in the long-term trend (which you should quantify!), the numbers are viable for comparison. The comparison is viable because it uses two points at the same position in the annual crime cycle.

The second alternative is to compare year-to-date (YTD) stats and forget the whole month-over-month nonsense. Like option one above, a YTD stat provides a legitimate yardstick for comparing one time period to another. The yardstick is legitimate because it takes the annual cycle into account by summing everything from the same starting point in each year’s cycle.
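Both alternatives are easy to compute. A minimal pandas sketch, assuming a table with one row per incident (the file and column names here are hypothetical):

```python
import pandas as pd

incidents = pd.read_csv("incidents.csv", parse_dates=["date"])
monthly = incidents.set_index("date").resample("MS").size()

# Alternative 1: each month versus the same month last year
same_month_change = monthly - monthly.shift(12)

# Alternative 2: year-to-date totals, which respect the annual cycle
ytd = monthly.groupby(monthly.index.year).cumsum()
```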

So there it is: don’t perform month-over-month stat comparisons. There are two equally easy-to-calculate stats that will fill that spot in the report and provide actual insight. The key is to remember the natural crime cycles.

Year-over-year Crime Stat Reporting with ISO Week Date

Does your organization report a weekly crime stat? Is part of that report a measure of how the week compares to the same week the year before? If so, it’s important that you understand how ISO week dates work so that your report offers an accurate comparison between this year and prior years. In this post I am going to first discuss what an ISO week date is and then I am going to explain how it helps create better crime stat reports.

First, what is an ISO week date? Hopefully you’re familiar with the International Organization for Standardization (ISO). They publish a lot of standards, and ISO 8601 is the one that deals with dates and times, including a calendar system that gives each week of the year a number. For example, this article was written on October 21, 2012, which, in ISO week date notation, is written 2012 W42 7. Breaking the date down: the first number is the year, the second number is the week of the year (42 in this case) and the third number is the day of the week (the 7th day is a Sunday, because the system states that Monday is the first day of the week). Most of the time a year has 52 weeks, but sometimes it has 53 (this happens when the year contains 53 Thursdays, not simply in leap years). According to Wikipedia the system is most often used in government and business for fiscal years.

This is pretty straightforward to understand, but the tricky part comes when determining the first week of the year, as the first week introduces slight discrepancies between the ISO system and the traditional Gregorian calendar that people are used to. To wit: ISO 8601 defines the first week as the week containing the year’s first Thursday. Using 2012 as an example, the first Thursday of 2012 was January 5th and, recalling that the ISO week begins on Monday, this means that the first day of the ISO year, that is 2012 W01 1, was January 2nd, 2012. But what about January 1st, 2012? It was actually the last part of week 52 of 2011, or 2011 W52 7. Yes, I know, it’s weird, but I think its utility outweighs its weirdness.
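You don’t have to track any of this by hand, either; most date libraries implement the standard. Python’s, for example:

```python
from datetime import date

# isocalendar() returns (ISO year, ISO week, ISO weekday)
print(date(2012, 10, 21).isocalendar())  # (2012, 42, 7): week 42, a Sunday
print(date(2012, 1, 2).isocalendar())    # (2012, 1, 1): first day of ISO 2012
print(date(2012, 1, 1).isocalendar())    # (2011, 52, 7): still part of ISO 2011
```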

If you’ve made it this far in the post you’re probably thinking: “This seems overly complicated, why should it concern someone reporting crime stats?” A valid question. Consider a weekly crime report that covers October 8th through October 14th, 2012, from midnight to midnight. That’s a whole week, and conveniently it is also ISO week 41 of 2012. Now let’s say that for comparison our theoretical crime report also tabulates the stats for October 8th through October 14th of 2011. Is there a problem with this? Does it matter that the 2011 numbers run from October 8th (a Saturday in 2011) to October 14th (a Friday in 2011) instead of Monday to Sunday like they do in 2012? Is it enough that we capture one of each weekday in a weekly crime report? Is it relevant that the Saturday from 2011 is part of ISO week 40 while the Saturday from 2012 is part of week 41?

I think this last point is key. Calls for service have a definite seasonal trend, which means that for any particular day of the week, Saturdays for example, the call volume will vary according to the week of the year. This means that, historically speaking, the number of calls for service on the Saturday in week 41 is likely to be consistently different from the number on the Saturday in week 40, and the same goes for every other day of the week. Basically, days are not created equal, and if you want to compare apples to apples in crime reports you should compare equivalent time periods. The easiest way to do this is, you guessed it, by using ISO week dates and comparing week 41 of 2012 to week 41 of 2011. In our example, compare October 8th through October 14th, 2012 to October 10th through October 16th, 2011.
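If you want to generate those equivalent ranges programmatically, here is a small sketch (date.fromisocalendar requires Python 3.8+):

```python
from datetime import date, timedelta

def iso_week_range(iso_year, iso_week):
    """Return the (Monday, Sunday) calendar dates of an ISO year and week."""
    monday = date.fromisocalendar(iso_year, iso_week, 1)
    return monday, monday + timedelta(days=6)

print(iso_week_range(2012, 41))  # 2012-10-08 to 2012-10-14
print(iso_week_range(2011, 41))  # 2011-10-10 to 2011-10-16
```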

But is this reasonable? Why is the week the unit of analysis here? In non-leap years October 8th through 14th is always the 281st through 287th day of the year; isn’t there a consistency in that which gets thrown out if we adopt the weekly approach? Perhaps, but I don’t think day-of-year consistency is more useful than weekly consistency. Most analysts recognize that calls for service have a distinct weekly pattern—there are more calls for service on Friday and Saturday than on other days of the week—that is imposed on top of the larger seasonal trend. By adopting the ISO week date system we align reporting with this natural frequency in the crime data, and by locking into that frequency we can exploit it to make comparisons easier.

So what’s the bottom line? Because calls for service show a seasonal trend, you should endeavor to compare the same ISO weeks when performing a year-over-year analysis. And because calls for service show a weekly trend, it makes sense to lock your reporting to a standard week that makes comparisons easy and straightforward. The best way to accomplish both of these goals is to adopt the established ISO week date system for weekly crime reports.