
My dataset looks like the following, where there is an ID column that is either 1 or 2, then there are several other columns which label several measurements.

ID      1        2     3    ...
1       0.3002   0.05  0.4
2       0.12     0.5   0.32
1       0.05     0.12  0.2
1       0.74     0.12  0.32

I am trying to make a boxplot using ggplot, where x is the non-ID column names, y is the measurements in the table, and the fill is the ID. Here is my current code attempt, but this gives me an error that "Aesthetics must be either length 1 or the same as data":

ggplot(df, aes(x=colnames(df), y=df[,-1], fill=ID)) + geom_boxplot()

Any help would be appreciated.

user2657817
  • [Reshape your data from wide to long format](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format), then plot. `library(tidyverse); gather(df, key, value, -1) %>% ggplot(aes(key, value, fill = ID)) + geom_boxplot()` – markus Jun 02 '18 at 21:06

3 Answers


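If I understood you correctly, you can do this with the tidyverse. I'm using the iris dataset for this example, with the row number standing in for ID. Because that ID is continuous, the boxes themselves are not split by it; the points added by geom_jitter() are what get colored by ID.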

library(tidyverse)

iris %>% 
  # Keep only the numeric variables
  select_if(is.numeric) %>% 
  # Use the row number as an ID column
  rownames_to_column("ID") %>% 
  mutate(ID = as.numeric(ID)) %>%
  # Reshape to long format: one row per (ID, variable) pair
  gather(vars, value, -ID) %>% 
  ggplot(aes(vars, value, color = ID, fill = ID)) + 
  geom_boxplot() +
  geom_jitter()


DJV

Your data:

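# check.names = FALSE keeps the column names as "1", "2", "3";
# the default would rename them to X1, X2, X3
df <- read.table(text = 
  "ID 1      2    3
  1  0.3002 0.05 0.4
  2  0.12   0.5  0.32
  1  0.05   0.12 0.2
  1  0.74   0.12 0.32", header = TRUE, check.names = FALSE)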

If my understanding is correct, you want one box plot per measurement column.

First, you need to convert your data to long format using tidyr's gather:

library(tidyr)
long_df <- df %>% gather(Key, Value, -ID)
#   ID Key  Value
#1   1   1 0.3002
#2   2   1 0.1200
#3   1   1 0.0500
#4   1   1 0.7400
#5   1   2 0.0500
#6   2   2 0.5000
#7   1   2 0.1200
# ...

Then you can plot simply:

library(ggplot2)
ggplot(long_df, aes(x = Key, y = Value)) + 
  geom_boxplot()

which gives a graph with one box for each value of Key.

camille
byouness

The thing missing from the other answers is that you wanted to set the fill based on ID. Your sample doesn't include enough data to really show the different colors (there's only one row with ID = 2), so I created some random data with a similar structure to illustrate.

library(tidyverse)

set.seed(123)
df <- tibble(
  ID = rep(c(1, 2), 20),
  `1` = rnorm(40),
  `2` = rnorm(40, sd = 0.5),
  `3` = rnorm(40, sd = 1.2)
)

First you need this in long format, so that you have a single column (I called it "key", but you can give it a more descriptive name in your gather call) which you can map onto your x aesthetic.

df_long <- df %>%
  gather(key = key, value = value, -ID)

The long-format data looks like this:

head(df_long)
#> # A tibble: 6 x 3
#>      ID key     value
#>   <dbl> <chr>   <dbl>
#> 1     1 1     -0.560 
#> 2     2 1     -0.230 
#> 3     1 1      1.56  
#> 4     2 1      0.0705
#> 5     1 1      0.129 
#> 6     2 1      1.72
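
As a side note, tidyr 1.0.0 and later mark `gather()` as superseded in favor of `pivot_longer()`. Here is a minimal sketch of the same reshape, assuming a recent tidyr is installed. It rebuilds the example `df` so the block runs on its own, and the name `df_long2` is only illustrative:

```r
library(tibble)
library(tidyr)

# Same simulated data as above
set.seed(123)
df <- tibble(
  ID = rep(c(1, 2), 20),
  `1` = rnorm(40),
  `2` = rnorm(40, sd = 0.5),
  `3` = rnorm(40, sd = 1.2)
)

# Stack every column except ID into key/value pairs
df_long2 <- pivot_longer(
  df,
  cols = -ID,
  names_to = "key",
  values_to = "value"
)
```

The rows come out in a different order than with `gather()` (grouped by original row rather than by column), which makes no difference to the boxplot.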

Then, to get a separate filled box for each ID, make ID a factor; as a plain number it is treated as continuous and won't split the boxes. You can do that in the dataset or, as I did here, directly inside aes.

ggplot(df_long, aes(x = key, y = value, fill = as.factor(ID))) +
  geom_boxplot()
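
For the other option, converting ID in the dataset itself, a sketch might look like the following. The factor labels "ID 1" and "ID 2" and the axis and legend titles are illustrative choices, not something taken from the question:

```r
library(tidyverse)

# Same simulated data as above
set.seed(123)
df <- tibble(
  ID = rep(c(1, 2), 20),
  `1` = rnorm(40),
  `2` = rnorm(40, sd = 0.5),
  `3` = rnorm(40, sd = 1.2)
)

df_long <- df %>%
  gather(key = key, value = value, -ID) %>%
  # Store ID as a factor so ggplot treats it as a discrete grouping
  mutate(ID = factor(ID, levels = c(1, 2), labels = c("ID 1", "ID 2")))

p <- ggplot(df_long, aes(x = key, y = value, fill = ID)) +
  geom_boxplot() +
  labs(x = "Measurement", y = "Value", fill = "ID")
p
```

Doing the conversion once in the data keeps the legend title clean (`ID` rather than `as.factor(ID)`) and carries over to any other plots built from `df_long`.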

Created on 2018-06-03 by the reprex package (v0.2.0).

camille
  • Thanks, what would you do if your key consists of multiple columns? I am unable to get the syntax right using gather. – user2657817 Aug 24 '18 at 01:44
  • You can do multiple `gather` calls. If you have a different situation from what's in your question, you should either post a new question or find a question that's already been posted dealing with multiple keys – camille Aug 24 '18 at 14:44