Stacked barplots, graphs that depict the conditional distribution of one categorical variable within another, are great for seeing a level-wise breakdown of the data. Unfortunately, base R has no one-step function for generating a stacked barplot from a data frame: barplot() can draw one, but only after the counts have been reshaped into a matrix. In fact, many R users suggest creating stacked barplots with ggplot2, since it is easier and arguably looks a bit better. However, for those not yet aboard the ggplot2 train, or for those who prefer base R, fear not! We are about to embark on an adventure that will bring stacked barplots to base R, along with the ability to group bars together!

## Examining the Data

For this example, we will use data concerning web domains in different journals. The dataset looks like the following:

**internetrefs**

| Domain | Journal | Count |
|---|---|---|
| gov | NEJM | 41 |
| gov | JAMA | 103 |
| gov | Science | 111 |
| org | NEJM | 37 |
| org | JAMA | 46 |
| org | Science | 162 |
| com | NEJM | 6 |
| com | JAMA | 17 |
| com | Science | 14 |
| edu | NEJM | 4 |
| edu | JAMA | 8 |
| edu | Science | 47 |
| other | NEJM | 9 |
| other | JAMA | 15 |
| other | Science | 52 |

Notice that the data has a consistent order: all **Domain** entries are grouped together, and within each group the **Journal** variable cycles through *NEJM*, then *JAMA*, and finally *Science*. We will exploit this feature of the data. I will also provide a method for arranging the data when no such feature exists.

### Creating the Matrix

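In order to create a stacked barplot, we must first know each variable's levels(). Levels belong to factors, which are R's representation of categorical (string) variables. The levels() function returns the distinct values that a factor takes on, sorted alphabetically by default. (If the columns were read in as plain character vectors, which is the default behaviour of read.delim() since R 4.0.0, levels() returns NULL until they are converted with factor().) So: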

```r
# Attach the data frame so that we can refer to Domain, Journal, and Count
# directly instead of writing internetrefs$Count
attach(internetrefs)

levels(Journal)
```

```
## [1] "JAMA" "NEJM" "Science"
```

```r
levels(Domain)
```

```
## [1] "com" "edu" "gov" "org" "other"
```
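Ordering matters whenever labels are attached to a matrix by position, so it may help to compare the two orderings directly. The following is a minimal sketch; the vectors `journal` and `domain` are illustrative stand-ins that rebuild the two columns in the order they appear in the dataset table.

```r
# Rebuild the two categorical columns in the order they appear in the table.
journal <- rep(c("NEJM", "JAMA", "Science"), times = 5)
domain  <- rep(c("gov", "org", "com", "edu", "other"), each = 3)

# factor() sorts its levels alphabetically by default ...
journal_levels <- levels(factor(journal))
domain_levels  <- levels(factor(domain))

# ... whereas unique() keeps the order of first appearance.
journal_seen <- unique(journal)
domain_seen  <- unique(domain)

print(journal_levels)
print(journal_seen)
print(domain_levels)
print(domain_seen)
```

The two orderings differ for both variables, so labels taken from levels() only line up with data that has been sorted the same way.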

Then, using this information, the data must be transformed into a matrix. The matrix takes the following form: each row represents one value of the **Domain** variable and each column represents one value of the **Journal** variable. Because levels() sorts alphabetically while the data is stored in a different order (*gov* comes first, and *NEJM* comes before *JAMA*), the labels here must follow the order of appearance in the data, not the order of levels(). We build the matrix by:

```r
### USING THE PRE-EXISTING ORDER FEATURE OF THE DATA ###

# Load the count values
data = Count

# Place data in a matrix that will have 3 columns, since Journal takes
# 3 distinct values. Based on the ordering feature of the data, fill the
# matrix by row.
data = matrix(data, ncol = 3, byrow = TRUE)

# Label the columns and rows in order of appearance. unique() keeps that
# order, whereas levels() would sort alphabetically and mislabel the counts.
colnames(data) = unique(as.character(Journal))
rownames(data) = unique(as.character(Domain))
```

The matrix structure of data is then:

| Domain | NEJM | JAMA | Science |
|---|---|---|---|
| gov | 41 | 103 | 111 |
| org | 37 | 46 | 162 |
| com | 6 | 17 | 14 |
| edu | 4 | 8 | 47 |
| other | 9 | 15 | 52 |

But let's say that you lack the nice ordering feature of the data that we discussed earlier. One way to obtain it is to sort the rows of the initial data frame by **Domain** and then by **Journal**. Because factors sort in the order of their levels(), the sorted data lines up with the levels() labels:

```r
internetrefs_ordered = internetrefs[with(internetrefs, order(Domain, Journal)), ]
```

| Row | Domain | Journal | Count |
|---|---|---|---|
| 8 | com | JAMA | 17 |
| 7 | com | NEJM | 6 |
| 9 | com | Science | 14 |
| 11 | edu | JAMA | 8 |
| 10 | edu | NEJM | 4 |
| 12 | edu | Science | 47 |
| 2 | gov | JAMA | 103 |
| 1 | gov | NEJM | 41 |
| 3 | gov | Science | 111 |
| 5 | org | JAMA | 46 |
| 4 | org | NEJM | 37 |
| 6 | org | Science | 162 |
| 14 | other | JAMA | 15 |
| 13 | other | NEJM | 9 |
| 15 | other | Science | 52 |

From here, the problem reduces to filling a matrix by row. Since the rows are now sorted in the same order as the levels(), the levels() can be used directly as labels:

```r
### AFTER CREATING THE ORDER FEATURE OF THE DATA ###

# Access and load the count values
data_ordered = internetrefs_ordered$Count

# Place data in a matrix that will have 3 columns, since the number of
# levels() for Journal is 3. Based on the ordering feature of the data,
# fill the matrix by row.
data_ordered = matrix(data_ordered, ncol = 3, byrow = TRUE)

# Label the columns and rows
colnames(data_ordered) = levels(internetrefs_ordered$Journal)
rownames(data_ordered) = levels(internetrefs_ordered$Domain)
```

| Domain | JAMA | NEJM | Science |
|---|---|---|---|
| com | 17 | 6 | 14 |
| edu | 8 | 4 | 47 |
| gov | 103 | 41 | 111 |
| org | 46 | 37 | 162 |
| other | 15 | 9 | 52 |
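As an alternative to sorting and filling by row, base R's xtabs() can build the same kind of matrix without relying on row order at all. This is a swapped-in technique, not the method used in the rest of this post; the sketch below rebuilds `internetrefs` inline from the dataset table so that it runs on its own, and `data_xtabs` is an illustrative name.

```r
# Rebuild the example data frame from the dataset table.
internetrefs <- data.frame(
  Domain  = rep(c("gov", "org", "com", "edu", "other"), each = 3),
  Journal = rep(c("NEJM", "JAMA", "Science"), times = 5),
  Count   = c(41, 103, 111, 37, 46, 162, 6, 17, 14, 4, 8, 47, 9, 15, 52)
)

# xtabs() sums Count within each Domain/Journal combination and attaches
# the row and column labels itself, so no manual ordering is needed.
data_xtabs <- xtabs(Count ~ Domain + Journal, data = internetrefs)
print(data_xtabs)
```

Because each count is matched to its labels by value and not by position, shuffling the rows of `internetrefs` should not change the result.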

## Building the Stacked Barplot

There are two ways to build a stacked barplot: percentage-based and counts-based.

However, special care needs to be taken when including a legend. By default, barplot()'s legend generating capabilities are pretty lacking. As a result, one needs to modify the margin space and how clipping is handled. This is achieved by setting par():

```r
# mar is defined to receive: c(bottom, left, top, right).
# The default margin is: c(5, 4, 4, 2) + 0.1.
# As a result, we have expanded the right-hand side of the figure to hold
# the legend.
# xpd = TRUE forces all plotting to be clipped to the figure region
# (instead of the plot region), so the legend can be drawn in the margin.
par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)
```
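Settings changed with par() stay in effect for later plots on the same device, so it can be useful to restore them afterwards. par() invisibly returns the previous values of whatever it changes, which makes that straightforward. A minimal sketch follows; `old_par` and `restored_mar` are illustrative names, and the pdf(NULL) device is only there so the example runs without a display.

```r
# Open a throwaway device so the sketch runs without a display; in an
# interactive session this line and the final dev.off() can be dropped.
pdf(NULL)

# par() returns the previous values of the settings it changes.
old_par <- par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

widened_mar <- par("mar")  # the widened right margin is now active

# ... draw the barplot and legend here ...

# Restore the previous margins and clipping behaviour.
par(old_par)
restored_mar <- par("mar")

invisible(dev.off())
```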

To build percentage-based barplots, we must use prop.table() to convert the counts in each column into proportions of that column's total:

```r
# Here, margin states whether the proportions are computed within rows (1)
# or within columns (2). We use columns because each column of our matrix
# is a journal, and each journal becomes one bar.
prop = prop.table(data, margin = 2)
```
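A quick way to check the result is to confirm that every column sums to 1. The sketch below rebuilds the count matrix inline from the dataset table (rows and columns in alphabetical order) so that it is self-contained; multiplying by 100 is an optional extra step for readers who prefer percentages on the axis.

```r
# Count matrix rebuilt from the dataset table: rows are domains,
# columns are journals.
data <- matrix(
  c( 17,  6,  14,
      8,  4,  47,
    103, 41, 111,
     46, 37, 162,
     15,  9,  52),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("com", "edu", "gov", "org", "other"),
    c("JAMA", "NEJM", "Science")
  )
)

# Proportions within each column (journal).
prop <- prop.table(data, margin = 2)

# Each journal's proportions should add up to 1.
col_totals <- colSums(prop)
print(col_totals)

# The same values expressed as percentages.
pct <- round(100 * prop, 1)
print(pct)
```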

If we are looking to build a **percentage**-based *vertical* stacked barplot then:

```r
par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)
barplot(prop, col = heat.colors(length(rownames(prop))), width = 2)
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(length(rownames(prop))),
       legend = rownames(data))
```

If we are looking to build a **counts**-based *vertical* stacked barplot then:

```r
par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)
barplot(data, col = heat.colors(length(rownames(data))), width = 2)
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(length(rownames(data))),
       legend = rownames(data))
```
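A separate legend() call is not the only option: barplot() also accepts `legend.text` and `args.legend`, which forward the legend settings in a single call. The sketch below is one possible variant under that approach, not a replacement for the code above; the matrix is rebuilt inline from the dataset table, `mids` is an illustrative name, and pdf(NULL) is used only so that it runs without a display.

```r
# Throwaway device so the sketch runs without a display.
pdf(NULL)

# Count matrix rebuilt from the dataset table.
data <- matrix(
  c(17, 6, 14, 8, 4, 47, 103, 41, 111, 46, 37, 162, 15, 9, 52),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("com", "edu", "gov", "org", "other"),
    c("JAMA", "NEJM", "Science")
  )
)

par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

# legend.text = TRUE uses the row names of the matrix as legend labels;
# args.legend is a list of arguments passed on to legend().
mids <- barplot(
  data,
  col = heat.colors(nrow(data)),
  legend.text = TRUE,
  args.legend = list(x = "topright", inset = c(-0.25, 0))
)

invisible(dev.off())
```

barplot() returns the midpoints of the bars it drew, which is handy for adding text labels later.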

## Grouped Barplots

If we are looking to build a **percentage**-based grouped barplot, with the bars for each journal placed side by side via beside = TRUE, then:

```r
par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)
barplot(prop, col = heat.colors(length(rownames(prop))), width = 2,
        beside = TRUE)
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(length(rownames(prop))),
       legend = rownames(data))
```

If we are looking to build a **counts**-based grouped barplot, again with the bars placed side by side, then:

```r
par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)
barplot(data, col = heat.colors(length(rownames(data))), width = 2,
        beside = TRUE)
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(length(rownames(data))),
       legend = rownames(data))
```
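With beside = TRUE alone, the bars still rise from the x-axis. If the goal is bars that run left to right, barplot() has a separate `horiz` argument. A minimal sketch, assuming the same count matrix rebuilt inline from the dataset table; `mids` is an illustrative name, `las = 1` is an optional choice that keeps the axis labels upright, and pdf(NULL) is only there so the example runs without a display.

```r
# Throwaway device so the sketch runs without a display.
pdf(NULL)

# Count matrix rebuilt from the dataset table.
data <- matrix(
  c(17, 6, 14, 8, 4, 47, 103, 41, 111, 46, 37, 162, 15, 9, 52),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("com", "edu", "gov", "org", "other"),
    c("JAMA", "NEJM", "Science")
  )
)

par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

# beside = TRUE groups the bars; horiz = TRUE draws them left to right.
mids <- barplot(
  data,
  col = heat.colors(nrow(data)),
  beside = TRUE,
  horiz = TRUE,
  las = 1
)
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(nrow(data)),
       legend = rownames(data))

invisible(dev.off())
```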

## In Summary

We've talked about a lot of different components required to pull off a stacked barplot and a grouped barplot in R. Below is the script that you will want to modify to suit your own data:

```r
# stringsAsFactors = TRUE keeps Domain and Journal as factors so that
# levels() works; since R 4.0.0 the default is FALSE.
internetrefs = read.delim("~/Desktop/internetrefs.txt", stringsAsFactors = TRUE)

# Force order
data_ordered = internetrefs[with(internetrefs, order(Domain, Journal)), ]

# Load the count values
data = data_ordered$Count
data = matrix(data, ncol = 3, byrow = TRUE)
colnames(data) = levels(data_ordered$Journal)
rownames(data) = levels(data_ordered$Domain)

prop = prop.table(data, margin = 2)

par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

# Percent-based vertically stacked barplot
barplot(prop, col = heat.colors(length(rownames(prop))), width = 2)
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(length(rownames(prop))),
       legend = rownames(data))

# Percent-based grouped barplot
barplot(prop, col = heat.colors(length(rownames(prop))), width = 2,
        beside = TRUE)
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(length(rownames(prop))),
       legend = rownames(data))

# Counts-based vertically stacked barplot
barplot(data, col = heat.colors(length(rownames(data))), width = 2)
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(length(rownames(data))),
       legend = rownames(data))

# Counts-based grouped barplot
barplot(data, col = heat.colors(length(rownames(data))), width = 2,
        beside = TRUE)
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(length(rownames(data))),
       legend = rownames(data))
```

### Thanks!

Special thanks go out to Weihong Huang!