4 Advanced Excel
4.1 Advanced Functions
4.1.1 COUNTIFS, SUMIFS, AVERAGEIFS
When we used COUNTIF, SUMIF, and AVERAGEIF in a prior chapter, we were counting/summing/averaging numbers that fit a single condition (Averaging scores of only females, for example). We can add another layer (or more!) to this using the COUNTIFS, SUMIFS, and AVERAGEIFS functions. These accept multiple conditions as arguments.
Using the following example table, we will look at how these functions work.
| A | B | C | D | E | F | 
|---|---|---|---|---|---|
| 1 | ID | Gender | Facility | Social Score | Activity Level | 
| 2 | 1 | F | Ind Liv | 12 | 5 | 
| 3 | 2 | M | Ind Liv | 14 | 4 | 
| 4 | 3 | F | Ind Liv | 17 | 8 | 
| 5 | 4 | F | Grp Home | 11 | 3 | 
| 6 | 5 | M | Grp Home | 19 | 7 | 
| 7 | 6 | M | Grp Home | 13 | 4 | 
| 8 | 7 | F | Grp Home | 15 | 6 | 
We can see we have binary gender reported, along with living facility. We also have two scores: social and activity level. Using AVERAGEIF, we could get an average social score for males, or for those living in a group home. Using AVERAGEIFS, we can get an average social score for males who live in a group home.
The function takes the following arguments: =AVERAGEIFS(range_to_average, criteria1_range, criteria1_value, criteria2_range, criteria2_value). If we wanted more criteria, we would just continue the pattern.
Using our example, we could find the average social score for males living in a group home with =AVERAGEIFS(E2:E8, C2:C8, “M”, D2:D8, “Grp Home”), giving us 16. We could also count how many males were in a group home with =COUNTIFS(C2:C8, “M”, D2:D8, “Grp Home”).
Video: COUNTIFS, SUMIFS, AVERAGEIFS
4.1.2 VLOOKUP
While perhaps less for data wrangling and more for data investigation, this is still a useful function. VLOOKUP is being replaced by XLOOKUP, but given the XLOOKUP function is only available in versions of Excel released after 2021 and in Microsoft 365, we will continue to use it for compatibility.
VLOOKUP takes the following arguments: (known_value, table_range, column_index_number, exact(false)_or_approximate(true)_match). The known_value is what you want it to search - you know this. the table_range is the entire range of cells you want it to search. Since this range must include both the known value and the return value, it is often easiest to select the whole table. The column_index_number is which column you want to search from the selected range, where 1 is the left most column of that range. Lastly, the last argument is a TRUE/FALSE option, based on if you want an approximate (TRUE) or exact (FALSE) match.
Using the small table below - current as of the 2024 Olympics - we can see how this function works. Of note: we wouldn’t use XLOOKUP for such a small table. This is simply to illustrate how the function works.
| A | B | C | D | E | 
|---|---|---|---|---|
| 1 | Athlete | Sport | At least one gold? | Country | 
| 2 | Katie Ledecky | Swimming | Yes | USA | 
| 3 | Simone Biles | Gymnastics | Yes | USA | 
| 4 | Ilona Maher | Rugby | No | USA | 
| 5 | Ariarne Titmus | Swimming | Yes | Aus | 
| 6 | Rebeca Andrade | Gymnastics | Yes | Brazil | 
| 7 | Coco Yoshizawa | Street Skateboarding | Yes | Japan | 
Let’s say we wanted to see if Rebeca Andrade has won a gold medal. We can use VLOOKUP, with the following arguments: =VLOOKUP(“Rebeca Andrade”, B3:E8, 3, FALSE). From this, we see that she has won a gold medal.
If you have a compatible version of Excel, you can use XLOOKUP: =XLOOKUP(“Rebeca Andrade”, A2:A7, D2:D7). A nice advantage of XLOOKUP is that you can return multiple columns. So, if we wanted to know if she had won a gold medal and what country she was from, we would use =XLOOKUP(“Rebeca Andrade”, B2:B7, D2:E7).
Video: VLOOKUP and XLOOKUP
4.1.3 CONCATENATE
The CONCAT() [CONCATENATE() in older versions of Excel]function will allow you to combine two or more cell values into one. For example, if you had ‘Last Name’ and ‘First Name’ and wanted to combine them into ‘Name’ you could use this function.
As an example, consider the table below:
| A | B | C | D | 
|---|---|---|---|
| 1 | First Name | Last Name | Name | 
| 2 | Katie | Ledecky | |
| 3 | Simone | Biles | |
| 4 | Ilona | Maher | |
| 5 | Ariane | Titmus | |
| 6 | Rebeca | Andrade | |
| 7 | Coco | Yoshizawa | 
To create the ‘Name’ column, we would use =CONCAT(B2, ” “, C2). Notice how we had to indicate we wanted a space between the first and last name with” “. If we did not include that, it would have resulted in ‘KatieLedecky’ rather than ‘Katie Ledecky’.
If we had wanted the name in ‘Lastname, Firstname’ format, we would have simply used =CONCAT(C2, “,”, B2). This time our separator is a comma and a space (“,”) rather than just a space.
Video: Concatenate
4.1.4 IF(AND)
Building on prior IF functions, we can combine IF() with AND(), OR(), and NOT() functions. Like the basic IF function, the conditions are first evaluated, followed by the value to provide if true, then the value to provide if false. However, we are combining the IF() function with the ability to test multiple conditions.
For example, If we have a data set with age (column A) and mobility score (Column B), we might generate a statement to suggest activities with =IF(AND(A2 > 65, B2 = “High”), Bowling, Bingo). This is saying that if an individual is both over 65 (A2 > 65) AND their mobility score is ‘high’ (B2 = “High”), then bowling is a suggested activity. Otherwise bingo is a suggested activity. Notice with AND, both conditions have to be met for the first IF value to be returned. If one, or both, are not met, then the second IF value will be returned.
If we just needed one of the conditions to be met, we could change AND() to OR(): =IF(OR(A2 > 65, B2 = “High”), Bowling, Bingo). Here, if an individual is either over 65 or has a high mobility score, Bowling will be returned. If neither condition is met, Bingo will be returned.
Lastly, if we only want bowling returned for individuals who do not have ‘low’ mobility, and bingo returned for individuals with low mobility, we could use =IF(NOT(B2 = “Low”), Bowling, Bingo).
Video: IF(AND)
4.2 Sum a Series of Numbers in a Cell
The SUM function works well when you have a single number per cell. However, if you have a series of numbers in each cell (e.g.: 14, 85, 65, 23, 50), the SUM function will not work to sum all the numbers in the cell. However, combining SUM with VALUE and TEXTSPLIT will allow you to sum all the numbers within the cell. NOTE: TEXTSPLIT is only available in newer versions of Excel (i.e. Microsoft Office 365).
The formula will take the form =SUM(VALUE(TEXTSPLIT(cell, delimiter))) where “cell” is the cell you are summing (e.g. A1) and “delimiter” is what separates the values in the cell (e.g. “,” or “;”).
To break this down:
=SUM(VALUE(TEXTSPLIT(cell, delimiter))) - This is specifying which cell to use, and what type of delimiter (separator) is between the numbers
=SUM(VALUE(TEXTSPLIT(cell, delimiter))) - The TEXTSPLIT function is splitting the text in the specified cell based on the specified delimiter. If the cell contains (14, 85, 65), it will produce an array containing the individual numbers
=SUM(VALUE(TEXTSPLIT(cell, delimiter))) - The VALUE function will convert text strings to numeric values. This is useful if the TEXTSPLIT function turned your series of numbers into what Excel is interpreting as text.
=SUM(VALUE(TEXTSPLIT(cell, delimiter))) - Lastly, the SUM function is adding up all the values resulting from the prior steps.
Video: Sum a series of numbers
4.3 Separate Text
Sometimes, your data may contain a column for “Name” that has both the first and last name in the same cell. However, you need your data to have a single column for last name, and a separate column for first name. Or you might have “city, state” in a cell but would really like to sort by state.
As a small example, say you have the following table of names that you’d like to separate:
| Name | First | Last | 
|---|---|---|
| Katie Ledecky | ||
| Simone Biles | ||
| Ilona Maher | 
You can see that there is a column for ‘Name’, and I have created two empty columns for first name (First) and last name (Last). To get the first name, we would use the formula =LEFT(A2, FIND(” “, A2)-1). NOTE: There IS a space between the sets of quotation marks. This is important.
Breaking the formula down, starting from the inside:
- =LEFT(A2, FIND(” “, A2)-1) - This is using the FIND function to look for the first space (” “) it finds within the text contained in A2. It will then count the number of characters it took to get there. Using our example of”Katie Ledecky”, the space is between ‘Katie’ and ‘Ledecky’ and the space is the 6th character. FIND() takes the arguments =FIND(thing_to_find, text_to_search). 
- =LEFT(A2, FIND(” “, A2)-1) - This is subtracting one from the value calculated in step 1: 6 - 1 = 5 
- =LEFT(A2, FIND(” “, A2)-1) - This is taking the leftmost characters from the text in A2 up to the number calculated in step 2. LEFT() takes the arguments =LEFT(text, number_of_characters) 
We can then use AutoFill to complete the rest of the cells, making our example table look as below:
| Name | First | Last | 
|---|---|---|
| Katie Ledecky | Katie | |
| Simone Biles | Simone | |
| Ilona Maher | Ilona | 
To get the last name, we would use this formula: =RIGHT(A2, LEN(A2) - FIND(” “, A2)). This is similar to the formula used to get the first name, with a few changes.
- =RIGHT(A2, LEN(A2) - FIND(” “, A2)) - This is again finding the first space (” “) in A2 and calculating what character it is. 
- =RIGHT(A2,LEN(A2) - FIND(” “, A2)) - This is calculating the total number of characters in the text of A2.”Katie Ledecky” would return 13 characters. 
- =RIGHT(A2, LEN(A2) - FIND(” “, A2)) - This is using the values calculated in steps 1 and 2 above to subtract the position of the first space from the total length. Using our example, this would be 13 - 6. 
- =RIGHT(A2, LEN(A2) - FIND(” “, A2)) - Finally, this takes the number calculated in step 3 and gets all the characters to the right of that character location. As with LEFT(), the RIGHT() function takes the arguments =RIGHT(text, number_of_characters). 
We can again use AutoFill to complete the rest of the cells, completing our table:
| Name | First | Last | 
|---|---|---|
| Ledecky, Katie | Katie | Ledecky | 
| Biles, Simone | Simone | Biles | 
| Maher, Ilona | Ilona | Maher | 
We can apply similar formulas if we have names in the format of “Last name, First name”, as seen in our (modified) example table below:
| Name | First | Last | 
|---|---|---|
| Ledecky, Katie | ||
| Biles, Simone | ||
| Maher, Ilona | 
This time, notice how the names are separated by a comma, not a space. We will then be looking for “,” rather than ” ” in our FIND() functions.
We will get the first name using =RIGHT(A2, LEN(A2) - FIND(“,”, A2)) and the last name using =LEFT(A2, FIND(“,”, A2) - 1). Fast-forwarding a bit, we can complete our table again.
| Name | First | Last | 
|---|---|---|
| Ledecky, Katie | Katie | Ledecky | 
| Biles, Simone | Simone | Biles | 
| Maher, Ilona | Ilona | Maher | 
Video: Separating text
4.4 PivotTables
PivotTables allow us to group and examine our data for trends. Many times, data is in ‘wide’ format, where each variable is a separate column and each row is a single individual. This can make it challenging to see if trends on an outcome are different for different demographic variables, for example. In order to use a PivotTable, you need to have one or more grouping variables that you will ‘pivot’ on. These variables will be categorical in nature (e.g. Male/Female, Mild/Moderate/Severe, etc.). An important data-cleaning aspect: your categories must be consistent. If you have Moderate and Mod as indicators of ‘moderate’, they will be two different categories rather than one.
As an example, let’s say we have the data below:
| A | B | C | D | E | F | 
|---|---|---|---|---|---|
| 1 | ID | Gender | State | Interview? | Social_Score | 
| 2 | 101 | M | VA | Yes | 24 | 
| 3 | 102 | F | VA | Yes | 30 | 
| 4 | 103 | F | VA | No | 51 | 
| 5 | 104 | F | VA | No | 17 | 
| 6 | 105 | M | VA | No | 52 | 
| 7 | 106 | M | VA | Yes | 44 | 
| 8 | 107 | F | VA | Yes | 17 | 
| 9 | 108 | M | VA | Yes | 31 | 
| 10 | 109 | M | VA | No | 30 | 
| 11 | 110 | F | VA | Yes | 25 | 
| 12 | 111 | F | NC | Yes | 43 | 
| 13 | 112 | F | NC | No | 20 | 
| 14 | 113 | M | NC | No | 7 | 
| 15 | 114 | F | NC | No | 37 | 
| 16 | 115 | M | NC | Yes | 50 | 
| 17 | 116 | M | NC | No | 7 | 
| 18 | 117 | M | NC | Yes | 16 | 
| 19 | 118 | M | NC | No | 47 | 
| 20 | 119 | F | NC | Yes | 54 | 
We want to see the number of people, first grouped by state, then by who has and has not had an interview. The video will show these steps in more detail, but your basic steps are:
- Start the pivot table: Insert tab -> PivotTable
- Verify data range and PivotTable output. Click OK.
 
- Select your grouping variables under PivotTable Fields. For our example, we will select State and Interview?. Notice that the order in which you select your grouping variables will influence how your data is grouped. We wanted it to be first grouped by state, then by interview, so we first selected state, then interview. If we selected in the opposite order, we would have data first grouped by interview, then by state.
- Select your ‘calculation’ variable - what do you want to perform a calculation on? Remember, that count is a calculation, so in our case, our ‘calculation variable’ is ID. We’re not summing or averaging ID numbers, but we do want a count of individuals. We get this by clicking ‘Values’ under our PivotTable Fields.
- Adjust calculation variable (if necessary) - Excel assumed we wanted a sum of our IDs. Click the drop-down arrow next to Sum of ID -> Value Field Settings -> Change ‘Sum’ to ‘Count’ -> OK
- Format PivotTable (if desired) - PivotTable Tools section -> Design tab -> Select style of choice. Under this tab you can also select where (and if) subtotals, grand totals, extra rows, etc. are shown.
This results in the following Pivot Table:
| Row Labels | Count of ID | 
|---|---|
| NC | 9 | 
| No | 5 | 
| Yes | 4 | 
| VA | 10 | 
| No | 4 | 
| Yes | 6 | 
| Grand Total | 19 | 
Video: PivotTables
4.4.1 Potential Issues with PivotTables
Create the Pivot Table as described above. Now, double-click on the “5” next to “No” in under NC. What happens? Important to consider when making PivotTables is that the underlying (potentially sensitive) data is still accessible to you, and others, who wish to see it.
But what if you wanted to share your PivotTable without sharing the underlying data? One way to get around this is to copy your PivotTable -> open a blank workbook -> Right click -> paste special -> Select ‘Values’ -> OK -> Right click (again) -> paste special (again) -> Select ‘Formats’ -> OK. This will first only paste the numbers (not the underlying data) from your PivotTable, then paste the special formatting you spent time on.
Video: Sharing PivotTables
4.4.2 Changing the PivotTable
If you decide you want to sort your data a different way, you do not need to make a whole new PivotTable; you can adjust the one you have already made.
Under the PivotTable Tools section -> Analyze -> Show -> Field List will bring up your field selection. If we no longer wanted a count but an average social score for our groupings, we can change the Values variable. We would unselect ID, and select instead Social_Score. Remember to adjust what calculation is being performed!
Video: Changing data selection
4.4.3 PivotCharts
It is possible to create a PivotChart that reflects the data and grouping of your PivotTable, as well as being able to dynamically change with your PivotTable. In other words, if you change what variables you are examining, the chart will also change.
To do this, go to the Insert tab -> Charts section -> PivotChart -> PivotChart. This will bring up a menu for you to select what type of chart you would like to make, much like the ones we explored earlier. Remember: good data visualization rules still apply here!
Video: PivotCharts