2 Excel Basics
Excel is one of the software programs included with Microsoft Office. As of the most recent publication of this (as well as in the videos), I will be using the most recent version of Excel contained in Office 365 on a PC. As differences between the PC version and Mac version arise, I will do my best to point them out. Please note that older versions of Excel will likely look different, though much of the functionality should remain the same.
Excel is useful to store smaller data sets (it does have a max capacity), to perform simple calculations, and for the creation of some graphs and tables for publication. A particularly useful function with Excel is the responsiveness of graphs to data manipulation - you can see how the graph changes when you change the input in ‘real time’. If you are working with large or particularly complex data sets or want to perform a series of analyses, other programs would be better suited. Additionally, as will be discussed in later chapters, reproducibility and documentation of your work can be a challenge.
2.1 Basic Formatting
As with most documents, cell contents can be bold or italics. They can also be right, left, or center justified, as well as centered at the top, bottom, or middle of the cell. The cell color (background) and text color are also able to be changed to suit your particular needs. Most of these formatting changes are for aesthetic reasons, and do not affect the character of the data. These changes can be applied to a single cell, a whole row or column, or the entire spreadsheet. Formatting options are available on the toolbar, or can be accomplished using some of the keyboard short-cuts listed in the tables below.
To select an entire row, you would click on the row number (or numbers). Similarly, to select an entire column, you would click on the letter of the column(s). Notice how when you hover your mouse over the letters (for columns) or numbers (for rows), it changes to a directional arrow.
Video: The Excel Window
We are able to adjust the height and width of the columns and rows individually, as a selected group, or for the whole sheet. To adjust the dimensions of a single column (or row), hover your mouse over the line between the column you wish to adjust and the one adjacent to it. For example, to make column “B” wider, I would hover my mouse over the line between “B” and “C”. When I do this, my cursor changes to a bold, double-headed arrow, which allows me to click and drag the border of the column to my desired width. The same process works to adjust row height.
You can also make the column/row size jump to the size of the largest entry by double-clicking on the margin between the columns or rows. This can be handy with text entries, to allow the full text to be seen. However, if you have obnoxiously large numbers, it can also be less than helpful.
Video: Adjusting Row and Column Widths
The cell contents can also be formatted with regard to what type of information is contained in each cell. For example, you can select “General” information, which can be a mix of numbers and words, “Number”, which is only numbers, “Percentage” or “Fraction”, which are as the names imply, etc. Excel also tends to interpret number ranges as dates (e.g. an age range of 1-5 may be interpreted as 5-Jan). To avoid this, you should format cells containing number ranges as “Text” rather than “General”.
Occasionally, you may want to merge cells together across consecutive rows or columns. This may be useful when you are trying to create an overall heading that multiple columns fall under: “Year in school” under which falls “Freshman”, “Sophomore”, “Junior”, and “Senior” for example.
Video: Formatting and Merging Cells
2.2 Inserting Rows and Columns
To insert a column, right-clicking on the column header (e.g. “D”) will bring up a menu with some options, one of which is “Insert”. Choosing that will insert a column to the LEFT of the column you initially clicked on. The same process works with rows: right click the row you would like to insert a row ABOVE. You can also click the column or row header, go to the “Cells” group on the “Home” tab, click “Insert” and select “Insert Sheet Columns” or “Insert Sheet Rows”. Only one option will be available at any given time, depending on if you have selected a row or a column header. Lastly, from a single cell you can insert new columns, rows, or a single cell. Right-clicking on a cell and clicking “Insert” will bring up a menu with four options - shift the cells down or to the right, as well as to add an entire row or column. The first two options are inserting a single cell and shifting the rest of the row or column while the last two options are adding a new row or column entirely.
Deleting columns and rows works in much the same way. You can right-click on the column or row you wish to delete and select “Delete” from the menu that comes up. Here, the row or column you have selected will be the one you delete! You can also select a row or column, go to the “Cells” group on the “Home” tab, click “Delete” and select either “Delete Sheet Rows” or “Delete Sheet Columns”. And lastly, you can delete from a single cell: right-click on the cell and click “Delete”. From there, you can select to delete the entire row or column, or just a single cell.
2.3 Keyboard Shortcuts
Occasionally, it is easier to use a keyboard shortcut to accomplish a task (and keep hands on the keyboard) rather than using the mouse to accomplish the task. However, it is never necessary to memorize lists of shortcuts - these are here as a reference. If you find them useful, great! There are some I use every time I use Excel, and others I have listed below that I have never used.
Some document-wide shortcuts (very similar to what you would use in other documents or programs):
Keys | Action |
---|---|
Ctrl + o | Open a Workbook |
Ctrl + s | Save the Workbook |
Ctrl + x | Cut selection |
Ctrl + c | Copy selection |
Ctrl + v | Paste selection |
Ctrl + z | Undo last action |
Ctrl + y | Repeat last action (if possible) |
Ctrl + End | Go to end of file |
Ctrl + Home | Go to beginning of file |
Alt + Page Down | Page Right |
Alt + Page Up | Page Left |
Cell Navigation Shortcuts (More specific to Excel):
Keys | Action |
---|---|
Shift + Tab | Move to the previous cell in a worksheet |
Up arrow | Move one cell up in a worksheet |
Down Arrow | Move one cell down in a worksheet |
Left Arrow | Move one cell left in a worksheet |
Right Arrow | Move one cell right in a worksheet |
Ctrl + Arrow key | Move to the edge of the current data region |
Tab | Move one cell to the right in a worksheet |
Ctrl + ; | Enter the current date |
Ctrl + : | Enter the current time |
Ctrl + ` | Show formulas in cells |
Delete | Remove cell contents |
2.4 Entering Data
When entering in data to Excel, there are two native tools that can make your task much easier: AutoFill and AutoComplete. AutoFill will continue filling in a sequence of numbers, words, or formulas following the established pattern. AutoComplete, on the other hand, will recognize the first few letters of prior entries and suggest to complete the cell with that information. Think of AutoComplete as predictive text on a texting app.
AutoFill can be accomplished by entering in your data (or function), then clicking the cell you wish to extend. You will see a outline with a small square on the lower right corner. Hovering over this square will change your cursor from a thick white ‘plus’ to a thinner, black one. Dragging the box down will extend the data. Of note, if you wish to extend a pattern, you can do that as well. For example, entering in “Red”, “Orange”, “Yellow”, “Green”, “Blue”, and “Purple” in a row, selecting all six cells, then dragging the little box will repeat the pattern of colors for your selected range. You will also see a little pop-up letting you know where you are in the pattern you are extending (i.e., if you wanted to end on “Purple”, wait until you see purple in the pop-up).
Video: Entering Data into Excel
2.5 Sorting Data
We can sort data in text cells from A - Z, or from Z - A using the ‘Filter and Sort’ option. NOTE: you must select all your data prior to sorting, or only the selected column will be sorted, rather than all your data. This will REALLY mess up your data! Excel will typically throw a warning message if you haven’t selected all your data, but you shouldn’t rely on that.
Sort can be found under the ‘Home’ tab -> ‘Editing’ panel -> ‘Filter and Sort’. It can also be found under the ‘Data’ tab -> ‘Sort and Filter’ panel -> ‘Sort’.
You can also sort by multiple data columns. For example, if you wanted to sort first by self-reported gender, then by evaluation score, you would first sort the gender column, followed by the score column. To do this, first, highlight ALL your data. Then, under the ‘Data’ tab -> ‘Sort and Filter’ -> ‘Sort’. Select the first column you want to sort on, and how you want to sort it - Excel will automatically detect if you have a character or numeric column. Then, click the ‘Add level’ button at the top of the window. This will create another sorting argument. Notice how the two sorting rows read ‘Sort by’ [first column] ‘Then by’ [second column]. This means that Excel will first sort by your first choice, then by the second. You can add more than two columns if it makes sense for your data situation.
Video: Sorting Data
2.6 Basic Functions
Excel has a number of different functions that you can use to explore and describe your data, as well as to perform different calculations. Under the ‘Formula’ tab, there is a series of function libraries sorted into categories such as Financial or Math & Trig. The formulas that will likely be most useful to data exploration will be under ‘Math & Trig.’ as well as ‘More Functions’ -> ‘Statistical’.
Functions in Excel, as in all programs, require arguments (i.e., information) be supplied to them. Some functions only take a single value, or a range of values. Other functions need more information supplied. If you’re unsure what arguments need to be supplied to a function, you can click the \(fx\) icon next to the formula bar. This will indicate what arguments the function is expecting.
When you use a function, you would first type “=” followed by the function name (e.g. AVERAGE), parenthesis, the arguments, and a close parenthesis. Using AVERAGE as an example, and saying I wanted to average the numbers in cells A1 through A5, I would type the following: =AVERAGE(A1:A5). Once I hit enter, the cell contents would then reflect the average of those five numbers. Since it is a function, this will not be a static number – if I change the value in A2, for example, the average would also change.
Video: Functions
The tables below have a selection of useful functions; some of these we will discuss further below and others are there for quick reference. They are divided into statistical functions, mathematical functions, and distributional functions. Remember that these are just a selection – functions that I find I use most often. There are more in the aforementioned menus.
Statistical functions:Function | Description |
---|---|
AVERAGE | Finds the average (mean) of the selected values |
MEDIAN | Finds the median of the selected values |
MODE | Finds the mode of the selected values |
MIN | Finds the minimum of the selected values |
MAX | Finds the maximum of the selected values |
STDEV.S | Finds the standard deviation of the selected values, using N-1 in the denominator; sample standard deviation |
VAR.S | Finds the variance of the selected values, using N-1 in the denominator |
CORREL | Calculates the correlation of the selected values |
Function | Description |
---|---|
COUNT | Counts the number of cells that have valid numeric data |
COUNTA | Counts the number of cells that have valid data of any type |
SUM | Adds the selected cells together |
ABS | Finds the absolute value of the cell |
EXP | Calculates the exponential of a given number, using the constant ‘e’ as the base |
SQRT | Calculates the square root of a number |
RAND | Generates a random number between 0 and 1 |
ROUND | Rounds a value to the specified number of decimal places |
Function | Description |
---|---|
KURT | Gives the kurtosis value of a sample |
SKEW | Gives the skew value of a sample |
NORM.DIST | Gives the area to the left of the value in a normal distribution with the given mean and standard deviation |
CHISQ.DIST.RT | Gives the area to the right of a chi-square statistic in a chi-square distribution |
CONFIDENCE.NORM | Gives the margin of error used to construct a confidence interval based on the normal distribution |
2.6.1 Addition and Sum
Frequently, we want to create a total column, which reflects a sum of other columns. We can accomplish this using a function (SUM), or the ‘old-fashioned’ way of manually adding them together. As an example, say we want to sum columns C and D. We can use one of three approaches:
- =C2 + D2
- =SUM(C2:D2)
- =SUM(C2, D2)
Importantly, all the above options work with AutoFill. In other words, you can type your calculation of choice in the first cell, then drag to AutoFill the rest.
2.6.2 Mathematical Equations
It is also possible to type a mathematical equation into a cell in Excel, rather than using a function (we saw that above with SUM). Along with + and -, operators you can use include * (multiplication), / (division), and ^ (exponents). Keep in mind that Excel will follow order of operations (PEMDAS). In other words, Excel performs operations in the following order:
- Parenthesis
- Exponents
- Multiplication & Division (left to right)
- Addition & Subtraction (left to right)
This can be seen in the difference between =5 - 3/2 and =(5 - 3)/2; the first one gives 3.5 while the second gives 1. Additionally, you still need to include the “=” sign in front of your mathematical equation; = 5-3/2 gives you 3.5 while 5-3/2 gives the date 5/3/2002.
2.6.3 Logical Operators
Along with the operators discussed above, you will also encounter situations where you want to use “greater than” or “greater than or equal to” (or less than) in your functions. While ‘greater than’ and ‘less than’ may be intuitive (> and <, respectively), the ‘or equal to’ portion may be less so. To say ‘greater than or equal to’ you would use ‘>=’. Lastly, if you wanted to say ‘not equal to’ you would use ‘<>’. You might want to use this when you run a function on all races except white (race column <> “white”).
2.6.4 AVERAGEIF, SUMIF, COUNTIF
Sometimes we will want to put conditions on our functions. Three examples of this are AVERAGEIF, SUMIF, and COUNTIF. Notice how they are the functions described above (AVERAGE, SUM, and COUNT), but with the addition of ‘IF’ on the end. This is saying to only perform the function on rows that meet a specific criteria (the IF condition). This can be useful if we are creating an average or total column, but plan to later sort our data. Rows B2:B20 may not stay the same after sorting by gender, for example.
These functions take the form (range of cells for IF condition, IF condition, cells to perform function if the IF condition is satisfied). As an example, say we have the following table:A | B | C | D | ||
---|---|---|---|---|---|
1 | ID | Gender | Visual | Social | Total |
2 | 1 | M | 21 | 15 | 40 |
3 | 2 | F | 22 | 19 | 37 |
4 | 3 | M | 23 | 16 | 39 |
5 | 4 | F | 24 | 14 | 38 |
We want to create a total for males, and another for females. We can use SUMIF to accomplish this, and the value will stay the same regardless of any later changes to our data.
For females, the function would look like: =SUMIF(B2:B4, “F”, C2:D5). The function for males is nearly identical, with the switch of the IF condition: =SUMIF(B2:B4, “M”, D2:D5).
We first indicate what the SUMIF function should look at to find the IF condition, then indicate the IF condition (“M” or “F”), followed by the range to perform the function on.
The COUNTIF function can be particularly useful if you want to quickly find out how many records satisfy a particular condition. This takes the arguments (range to count, IF condition to count). Using the example above, if we wanted to know how many cases were female, we would use =COUNTIF(B2:B5, “F”). Perhaps silly on such a little table, but very useful on larger ones.
Video: SUMIF, COUNTIF functions
2.7 Absolute vs. Relative Referencing
Occasionally, you may want to use functions or mathematical equations that consistently reference a cell. As an example, if you are calculating z scores from your data, you will want to consistently reference the row mean and standard deviation, but allow the value of the raw score to shift. Or, if you have a constant entered into a cell for use in an equation, you would not want that to change. Using absolute referencing will keep the cell the same regardless of if you use AutoFill or even copy/paste contents.
Absolute referencing can be applied to a row, column, or both, and is indicated by a ‘$’ before the row or column designation. $A1: Absolute reference of the column - the column will not change, but the row will A$1: Absolute reference of the row - the row will not change, but the column will $A$1: Absolute reference of both the column and the row - neither the column nor the row will change; this is an absolute reference to a cell.
2.8 Transposing Data
You will occasionally need to change the layout of your data. One format is ‘long’ data - this is where each row is a separate entry (e.g. people). However, you may need your data in ‘wide’ format - where each column is a separate entry. This change is very easy to accomplish via copy/paste. Copy the data you need to transpose, move to the sheet where you wish to have your data in a new format, and rather than simply pasting it, right click -> paste options -> (4th option from the left) transpose. This will paste your data in the opposite format. You can go from wide to long, or long to wide.
Video: Transposing data
2.9 More Advanced Functions
It is possible to use more advanced logic in Excel to make examining and navigating your data easier. The IF and LOOKUP functions, discussed below, employ logic to return values based on the arguments input by the user. Other logic functions include AND, NOT, and OR, among others.
2.9.1 IF
The format of the IF function is: =IF(logical test, value if true, value if false). Say, for example, we wanted to sort income levels into High and Low, and use that for further analysis. We could use an IF function to create a new column with “High” if income was greater than 50000 and “Low” if income was lower than 50000. To do this, we would use the function like this: =IF(D1 > 50000, “High”, “Low”). Reading that in ‘regular’ language we would say “If the value in D1 (our income column) is greater than 50000, return High. Otherwise, return Low.”
Or, if you needed to see if a patient had completed a certain number of tests, and you had columns for each test. It would be tedious to manually count for each patient. Instead, you may create a function like =IF(COUNT(A2:F2) > 3, “Complete”, “Incomplete”). Here, we are using a function within the IF function - COUNT - to do the counting of tests completed by the patients. Reading the function, we are saying to first count across columns A through F. Then, if that is greater than three, return “Complete”. Otherwise, return “Incomplete”. This will let us quickly know if patients have completed 4 or more tests.
Video: IF function
2.9.2 Nested IF
It is also possible to nest IF statements - have one IF statement as an argument of another IF statement. Going back to our example above, perhaps rather than “High” and “Low” salary, we would prefer “High”, “Mid”, and “Low”. We could nest two IF statements like so:
=IF(D2 > 50000, IF(D2 > 75000, “High”, “Mid”), “Low”).
Breaking this down, we can examine the inner IF statement first: IF(D2 > 75000, “High”, “Mid”). This says that if the value in D2 is greater than 75000, return “High”, otherwise return “Mid”. However, if we look at the outer IF statement, we see that the inner IF statement is only evaluated if the value in D2 is greater than 50000. The outer IF statement says “If the value in D2 is greater than 50000, go to the second IF statement. Otherwise, return ‘Low’.” In other words, we have set anything less than 50000 as Low, between 50000 and 75000 as Mid, and anything greater than 75000 as High.
2.9.3 LOOKUP
Occasionally, it is easier to create a reference table and use the LOOKUP function rather than a series of nested IF statements. As an example, if the results of a blood test have a number of different potential implications depending on the value returned, you could create a series of nested IF statements. Or, you could create a reference table and use LOOKUP instead.
For example, say we had the following data:
A | B | |
---|---|---|
1 | ID | LDL |
2 | 1 | 124 |
3 | 2 | 198 |
4 | 3 | 98 |
5 | 4 | 125 |
6 | 5 | 174 |
We can see in this minimal example that each patient has an ID number, and an LDL value. We know that the ranges for LDL are <100mg/dl is optimal, 100-129mg/dl is near optimal, 130-159mg/dl is borderline high, 160-189mg/dl is high, and anything over 190mg/dl is very high. We could create a series of nested IF statements to determine where the LDL values fall, but that would get quite cumbersome.
Instead, if we create a reference table, we can use LOOKUP. Our table would now look like this:A | B | C | D | E | |
---|---|---|---|---|---|
1 | ID | Cholesterol | mg/dl | LOOKUP value | Health Result |
2 | 1 | 124 | < 100 | 0 | Optimal |
3 | 2 | 198 | 100-129 | 100 | Near Optimal |
4 | 3 | 98 | 130-159 | 130 | Borderline High |
5 | 4 | 125 | 160-189 | 160 | High |
6 | 5 | 174 | > 190 | 190 | Very High |
Notice in the reference table I have included both the range of values, but then a separate column for LOOKUP values. This is because the function cannot evaluate a range. Instead, it works by finding the largest value in the reference table that is less than or equal to the value you are looking up. In other words, for a LDL of 125, LOOKUP evaluates the reference table by looking for the largest number less than 125: 100 in this case. This is associated with Near Optimal. If we had instead used 129 as the reference table value for Near Optimal and 100 for Optimal, 125 would be incorrectly classified as Optimal.
Video: LOOKUP function