I have a sample data frame with 3600 rows and 6 columns. I want to create a plot in R that shows six boxplots, one for each of the 6 columns. I am using ggplot. I can create them in Excel easily enough (shown below), but I want to be able to do it in R because my future data frames are going to be much larger and R seems to handle large datasets much more easily.

excel plot

Using the code below I can plot the first column fine, but can't figure out how to add the data from the other 5 columns.

ggplot(data=df)+
 geom_boxplot(aes(x="Label", y=col1))
callin
  • Please provide a reproducible example of your dataset (see this link: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – dc37 Mar 16 '20 at 02:52

1 Answer

Using geom_boxplot from ggplot2

To get a boxplot for each of your 6 columns with ggplot2, you first need to reshape your data frame into a longer format that matches the grammar of ggplot2 (one column for x values, one column for y values, and one or more columns of categorical values). Then you can use ggplot2 and the geom_boxplot function.

Here is an example using the built-in iris dataset:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Using the pivot_longer function from the tidyr package, you can reshape the first 4 columns of this dataset into a longer format:

library(tidyr)
library(dplyr)
iris2 <- iris %>%
  pivot_longer(cols = Sepal.Length:Petal.Width, names_to = "Var", values_to = "val")

# A tibble: 600 x 3
   Species Var            val
   <fct>   <chr>        <dbl>
 1 setosa  Sepal.Length   5.1
 2 setosa  Sepal.Width    3.5
 3 setosa  Petal.Length   1.4
 4 setosa  Petal.Width    0.2
 5 setosa  Sepal.Length   4.9
 6 setosa  Sepal.Width    3  
 7 setosa  Petal.Length   1.4
 8 setosa  Petal.Width    0.2
 9 setosa  Sepal.Length   4.7
10 setosa  Sepal.Width    3.2
# … with 590 more rows

Then you can pass this new dataset to ggplot2 to get one boxplot for each value of Var:

library(ggplot2)
ggplot(iris2, aes(x = Var, y = val, fill = Var)) +
  geom_boxplot()
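
Applied to a six-column data frame like the one in the question, the same two steps could look like the sketch below. The data frame `df` and its column names `col1` to `col6` are made up here (random numbers standing in for the real data), so treat this as an illustration under those assumptions rather than code for the actual dataset.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data: 3600 rows, 6 numeric columns
set.seed(1)
df <- as.data.frame(matrix(rnorm(3600 * 6), ncol = 6))
names(df) <- paste0("col", 1:6)

# Reshape all six columns into name/value pairs (6 * 3600 rows)
df_long <- pivot_longer(df, cols = everything(),
                        names_to = "Var", values_to = "val")

# One box per original column
p <- ggplot(df_long, aes(x = Var, y = val, fill = Var)) +
  geom_boxplot()
print(p)
```

Because every column is pivoted, `cols = everything()` is enough here; if the data frame also held an ID or grouping column, it would have to be excluded from `cols`.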


Alternative using base R

If you do not want to reshape your data frame, you can get the boxplots directly with the boxplot function in base R:

boxplot(iris[,c(1:4)], col = c("red","green","blue","orange"))

Does this answer your question?

dc37
  • Thank you so much. I had no idea that pivot data function existed but was just able to manipulate my data frame easy enough using your example. Thank you for explaining it so clearly, it really helped me walk through the process – callin Mar 16 '20 at 03:10
  • You're welcome ;). With time and practice, you will see that it is a really common process when you want to use `ggplot2` – dc37 Mar 16 '20 at 03:13
  • Is there a way to use pivot_longer to just select specific columns instead of the whole range? – callin Mar 16 '20 at 03:29
  • Yes, that is what I did with `cols = Sepal.Length:Petal.Width`: it pivots only the columns in this range and leaves out the "Species" column. Take a look at the official documentation: https://tidyr.tidyverse.org/reference/pivot_longer.html – dc37 Mar 16 '20 at 10:52
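
For picking individual columns rather than a contiguous range, as asked in the comments above, a minimal sketch with the iris data might look like this. The object names `iris_sel` and `iris_petal` are only illustrative; `c()` and `starts_with()` are standard tidyselect ways to fill the `cols` argument.

```r
library(tidyr)

# Pivot only two named columns; all other columns are kept as they are
iris_sel <- pivot_longer(iris, cols = c(Sepal.Length, Petal.Width),
                         names_to = "Var", values_to = "val")

# Same idea with a selection helper: every column whose name starts with "Petal"
iris_petal <- pivot_longer(iris, cols = starts_with("Petal"),
                           names_to = "Var", values_to = "val")
```

Columns that are not listed in `cols` (here, for example, Species) stay in the result and are repeated for each pivoted value, so they can still be used for grouping or faceting.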