3 Graphing in Excel
Excel offers a built-in way to visualize your data. Remember when creating any of these visualizations to keep in mind the best practices discussed during the Data Visualization module. The default setting of visualizations in Excel are frequently too busy (grid lines, etc.) and you will likely want to do some editing to the graph before you would include it in a publication or presentation.
Conveniently, there is a wide variety of graph types to choose from in Excel. We will be discussing three types below: Bar charts, line graphs, and scatter plots. Notice we are not discussing pie charts - look back to the Data Visualization module if you are curious as to why!
Note: Much of the information for this portion will be on the associated videos. Some basic information will be included below as well as brief steps to take. However, given the nature of this information, please do reference the videos for a more thorough walk-through.
3.1 Bar Charts
Bar charts are used with count-type data - number of people who responded Urban vs. Suburban vs. Rural, for example. They can also be used to express proportions (e.g. proportion of seniors who engage in different types of social activities). You may need to create separate columns in your data for your bar chart. The COUNTIF function can come in handy here!
Basic steps to creating a bar chart:
- Decide if you want to sort your data. If it would make your message more clear to have the largest category first, and the rest in decreasing order, sort your data from highest to lowest. Other times it may not make sense to sort your data.
- Select the cells you wish to use to make the bar chart
- Insert tab -> Charts section -> Bar chart icon (“Insert Column or Bar Chart) -> 2D Column (for vertical bars) or 2D Bar (for horizontal bars)
- Tidy up the chart: click on the gridlines (a dot will appear at each end of all the gridlines) and hit ‘delete’
- Edit title (if necessary)
Video: Bar Chart
3.1.1 Clustered Bar Chart
Sometimes you may have multiple pieces of data for each category. This is common in longitudinal studies (e.g. perceptions of a topic over five years). A bar chart would still be appropriate, but you would use a clustered bar chart, where a series of bars are present in a particular category.
For example, say you were looking at nurse satisfaction in 2019, 2020, 2021, and 2022 and one category was ‘number of patients’. That category would have four bars, each a different color, for each of the years. It is very important with these types of bar charts to select colors carefully to be as accessible as possible.
Basic steps to creating a clustered bar chart:
- Sort your data - often for these types of graph, a horizontal orientation is visually more clear. Additionally, sorting your data to create some sort of hierarchy can aid in interpretation (If possible).
- Select the cells you wish to use to make the clustered bar chart
- Insert tab -> Charts section -> Bar chart icon (“Insert Column or Bar Chart) -> 2D Column (for vertical bars) or 2D Bar (for horizontal bars)
- Change bar colors if necessary: Left click on a bar to select it -> Right click -> Change color under “Fill” and/or “Outline” to desired color. You can also select the graph, go under the Chart Tools tab -> Design -> Change colors -> Select one of the pre-designed themes you prefer
- Tidy up the chart: click on the gridlines (a dot will appear at each end of all the gridlines) and hit ‘delete’
- Format legend (if necessary): Left click to select legend -> Right click -> Format legend -> Select location and other options
- Edit title (if necessary)
- Add data labels: Chart Tools tab -> Design -> Add Chart Element -> Data Labels -> Outside end. These are sometimes helpful, sometimes distracting. If you only want certain labels, you can either individually delete the labels you do not want (click twice on the label you wish to delete) or you can create a second table with the values you wish to substitute in, leaving values you wish to exclude blank, and reference these by series. You would do this by single clicking one of the data labels (notice how the label for that category is selected across all other categories) -> Chart Tools tab -> Design -> Add Chart Element -> Data Labels -> More Data Label options -> Uncheck “Value” -> Check “Value From Cells” -> Select the data from the second table you wish to substitute
video: Clustered Bar Chart
3.1.2 Stacked Bar Chart
A stacked bar chart is taking the idea of a clustered bar chart, but is used when percentages sum to 100% (e.g. in opinion scales). There is a single bar for a category (e.g. Nurses reported satisfaction), but it has different colors within it representing different answer choices (e.g. Very satisfied, satisfied, unsatisfied, very unsatisfied).
Creating a stacked bar chart is very similar to creating a bar chart: Select your data -> Insert tab -> Charts section -> Insert Column or Bar Chart -> More Column Charts -> Choose the Icon at the top: Stacked Bar or 100% Stacked Bar.
You would select 100% Stacked bar when your data adds to 100% - this is also the more interpretable version. The ‘plain’ version can get misleading, so use with caution.
Another consideration with these comes in color choice. You want viewers to be able to determine where one section ends and the next starts.
Video: Stacked Bar Chart
3.1.3 Diverging Stacked Bar Chart
A diverging stacked bar chart is used with horizontal bars, and rather than having them all start even on the Y axis, they are instead aligned along a vertical line. Creating a diverging stacked bar chart requires you to create some buffer columns depending on where you want the alignment to occur. The left buffer column and the answer choices you want to fall to the left of the divergence line should sum to 100%; same with the right buffer column and the answer choices falling to the right of the divergence line.
Basic steps to creating a diverging stacked bar chart:
- Decide where you want your bars centered
- Create your buffer columns and fill appropriately
- Create the chart: Insert tab -> Charts section -> Insert Column or Bar Chart -> More Column Charts -> Choose 100% Stacked Bar
- Change colors, label categories with text boxes, add vertical line
Video: Diverging Stacked Bar Chart
3.2 Line Graph
Line graphs are used when you are displaying how the value of one variable changes across different levels of another, or how a statistical value changes across levels of another variable (e.g. attitudes towards HIV over time).
Basic steps to creating a line graph:
- Select data
- Insert tab -> Charts section -> Insert Line or Area Chart -> Line
- Get rid of chart border, gridlines, y axis line, etc.
- Create descriptive graph title
- Decide on legend placement (if applicable)
Video: Line Graphs
3.3 Scatterplots
Scatterplots are used to show the relationship between two continuous variables. From them, you can obtain trendlines and
Basic steps to creating a scatterplot:
- Select data
- Insert tab -> Charts section -> Insert Scatter (X,Y) or Bubble Chart -> Option on the top left
- Change axis range if it makes sense to do so
- Eliminate grid lines
- Add descriptive title
- Add trendline,
value - Make any color, shape, line edits desired
Video: Scatterplots
3.4 Scatter Plots by Group
It is also possible to make scatterplots where each group has their own trendline,
Basic steps to creating a scatterplot by group:
- Sort your data, if necessary
- Insert tab -> Charts section -> Insert Scatter (X,Y) or Bubble Chart -> Option on the top left
- Chart Design -> Data section -> Select data -> Legend Entries (Series) -> Add
- Select appropriate data for your first group, repeat for subsequent groups
- Add trendline,
, label for each group - Make any style changes - make sure trendlines are of different styles!
- Change dot shapes for each group
- Add descriptive title
Video: Scatterplots by group
3.5 Confidence Intervals
When reporting means, it is good practice to include 95% confidence intervals (CIs). Adding these informs viewers about the amount of sampling error associated with our sample estimate of the population parameter - how accurate we are in our estimates. Larger confidence intervals indicate more error (and perhaps less confidence) while narrower confidence intervals indicate less error (and more confidence).
To plot confidence intervals, you will need to first calculate their width in a new column. There is a function (=CONFIDENCE.T()) that you can use to easily accomplish this. The arguments you will need to tell it are (alpha level, standard deviation, sample size). For a 95% CI, the alpha level would be 0.05; if you are using a 90% confidence level, your alpha would be 0.1. If you create columns for standard deviation and sample size, using this function becomes very easy.
Once you have your confidence intervals calculated, you can use them as a “Custom Error Amount”. Given that the confidence interval around a mean is symmetrical, you will use the same column for both the positive and negative offsets.
NOTE: When using confidence intervals, you should NOT say “There is a 95% chance the confidence interval contains the true population value.” That is WRONG. A confidence interval either does, or does not, contain the true population value. Correct interpretation would be that if an infinite number of confidence intervals were created from random samples of the same population, 95% of the confidence intervals would contain the population value. Cumbersome, but accurate. A fun animation illustrating this can be found at http://rpsychologist.com/d3/CI/.
Video: Confidence Intervals