VLOOKUP is one of those functions that causes a lot of grief for analysts. I think it’s because the function is kind of complex to implement and most people don’t use it often enough to commit to memory exactly how it works. In this blog post I am going to explain exactly how to use VLOOKUP with the aid of two analyst-orientated examples.
An Introductory Example
At its core VLOOKUP (the V stands for vertical by the way) is used for looking up one value based on another … vertically. To demonstrate what I mean have a look at the screen shot below in which a list of Universal Crime Report (UCR) codes (column A) is listed side-by-side with the description of the crime (column B).
Together column A and column B constitute a “look up table” because you can use a known value in column A to find the corresponding value in column B (and vice versa). But why do we need look up tables? We need them because it’s likely that when you export a list of crime stats from your database that you’ll see the codes rather than the descriptions. And while the UCR code is shorter and easier to manage there will eventually come a time when you want to replace the code “1110” with the friendlier “Murder 1st Degree” (maybe when you are preparing a final report to send around) and that’s when you use VLOOKUP to look up the description based on the code
The key thing to remember about VLOOKUP is that when you’re using it you always have two different sets of data: the first is the data you are working on and the second is the lookup table. The screen shot below shows a sheet with both sets of data; the working data is on the left and the lookup table is on the right in grey. Notice that the data being worked on has an occurrence number and a UCR code but no Crime.
But how to use it? Have a look at the above screenshot again, specifically cell C2, and note that the function in that cell is =VLOOKUP(B2, $D$1:$E$16,2,FALSE). Let’s break that function down into pieces.
- For the first input to VLOOOKUP, we’re instructing Excel to use the value in cell B2 (the known UCR code) to find the description to put into cell C2.
- For the second input we’re telling VLOOKUP to look for the value of B2 in the cell range $F$1:$G:$16 which is the entire range occupied by the lookup table. (As an aside, you’ll notice that the range values all have dollar signs in front of each letter and number. This is because when we fill this function down we don’t Excel to act ‘intelligently’ and start shifting the references. Using the dollar sign locks the reference so that no matter how we fill it, it will always point to the same set of cells.)
- For the third input we use the value 2. Notice that our look up table has only two columns (D and E) in this example, one that is the code and one that is the crime. Since we already know the code (from column B) what we want is the description and since the description is in the second of the two columns in the lookup table we specify that we want the result to come from column 2.
- The final input to the function is ‘FALSE’. By entering FALSE we are telling Excel that we want an exact match. That means that if Excel can’t find UCR code 1110 it’s going to return #N/A. There are situations where you might not want exact matches but they are less common and sticking with FALSE will usually get you what you want 90% of the time.
With the function input it’s just a matter of filling the function down to populate all of the cells in column C with the crime description based on the UCR code. If you click on any of the other cells in column C, as I did in the example below, you’ll see that the lookup table, locked as it is by the dollar sign $ notation, is always referencing the same range of cells. I bring this up again because the VLOOKUP mistake I see most often is that people don’t lock the range and get weird results when they fill the formula down. Make sure to use those dollar signs!
A More Advanced Example
While the example above is the most common usage of VLOOKUP I frequently use it for another purpose: to compare two lists of things (e.g. occurrence numbers, suspect names) to determine if the lists are the same and highlight those elements that are different. The following screen shot demonstrates how I lay out my EXCEL file to perform this work.
The two columns of occurrence numbers in column A and C are almost the same and in fact they only differ by 3 values. To identify those three values we can use VLOOKUP to look up one set of occurrence numbers against the other and see what happens. The above screen shot shows the function in cell B5 to be =VLOOKUP(A5, $C2:$C$20, 1, FALSE). Just like before we are checking each value against a table of values but this time it just so happens that the lookup table only has one column (instead of the two above) and that’s why we have a “1” as the third input.
If you fill the function down the occurrence numbers get filled in and where VLOOKUP can’t find a match for the cell in column A in column C Excel outputs #N/A. That last sentence may seem overly precise but it is meant to illustrate an important point: the VLOOKUP detailed above is only checking the values in A against the look up table and therefore will only catch those values in A that aren’t in C. To get the full picture you also need to perform the reverse lookup, that is, catch the values that are in C but not in A. To do that you can put the following function call =VLOOKUP(C2, $A2:$A$20, 1, FALSE) in cell D2 and fill down. Now, just like before, we’ll see matching occurrence numbers if the values are in both columns and #N/As for those values that are in C but not in A. Taken together these two VLOOKUPs will help identify how the two lists differ.
Getting Even More Advanced
One more thing, I really don’t like Excel’s blaring #N/A error value. It’s ugly. Also, I don’t know about you, but I don’t really care about when the occurrence numbers match, I’m more interested when they don’t match and I’d prefer not to see anything for the positive matches. Let’s see what we can do about fixing this up.
To accomplish this, we’ll need to use the VLOOKUP function in conjunction with another Excel function, ISERROR, that we haven’t talked about. I’m not going to pretend this isn’t getting complicated, because it is, but I’m doing it anyway. In cell B2 I replace the old VLOOKUP function call with this new one:
=IF(ISERROR(VLOOKUP(A2, $C$2:$C$20,1,FALSE)), “Diff”, “”)
I know that looks intimidating but let’s break it down. The part about VLOOKUP we already know, it’s the same as above, but what we’re doing now is feeding the output of VLOOKUP to the function ISERROR. ISERROR’s only purpose is to return a TRUE or FALSE based on whether an error occurs within the VLOOKUP and, as we know, 3 errors occur, one for the each of those #N/As. For each of those three #N/As the ISERROR function will return TRUE and for all the rest it will return FALSE. This behaviour is very handy because the IF function checks for TRUE or FALSE when trying to figure out what to do. In this case, if it gets a TRUE from ISERROR (that is, it’s true that there was an error) then it prints the word “DIFF” but if it gets a FALSE, it doesn’t print anything because there was no error.
I know that’s a brainful but have a look at the screen shot below and you can see that each of the cells that was previously an #N/A is now a “DIFF”, because those are the ones that caused a VLOOKUP error that in turn caused ISERROR to be TRUE which caused the IF function to print the value for TRUE. All the rest, the ones without VLOOKUP errors, display nothing, just as I wanted.
Phew, that’s a lot of Excel-fu for a post that started up just about VLOOKUP. I hope the post (or at least the first two parts) give you the confidence to use VLOOKUP more often in your daily work. And I hope that the third part, as intimidating as it looks (but it’s really not, just follow the logic of the TRUEs and FALSEs) will give you ideas about how you can use the function to accomplish more complex tasks. Good luck!