
I want to generate a boxplot of a continuous variable (N) against a categorical variable (BL), grouped by another factor (A, with 2 levels).

When I use the following code:

data$BL <- factor(data$BL, labels = c("0", "1", "2"))
data$A <- factor(data$A, labels = c("0", "1"))

plot1 <- ggplot(data=data, aes(x = BL , y = N, fill = A)) +
    geom_boxplot()
plot1

I end up with no box for the combination 0 by 0. I know from the dataset that there are individuals with the combination 0-0, so why is no box displayed?

Any advice and suggestions would be much appreciated. Thanks.

Maurits Evers
Sneha
  • I've edited your code; please check if this is correct. I removed an additional `+` after `geom_boxplot()`, and column `BL` was originally defined as `data$BL_` but referenced as `BL` in `ggplot`. – Maurits Evers Mar 21 '18 at 13:04
  • @MauritsEvers, I'm afraid it still doesn't work. Is it something to do with interactions? It's quite strange, as when I change the fill to another variable (not mentioned), the plots are fine and I get all the combinations. It's clearly to do with variable A, but I don't know what it is! – Sneha Mar 21 '18 at 13:22
  • I didn't offer a solution; I only corrected your syntax errors. – Maurits Evers Mar 21 '18 at 13:22
  • Thanks for editing – Sneha Mar 21 '18 at 13:24

3 Answers


Note that since you don't provide any sample data, trying to debug what is going on with your data is a bit of a guessing game. It's best to always provide a reproducible & minimal example including sample data.

In the following example I generate some sample data.

set.seed(2017);
N <- 60;
data <- data.frame(
    N = sample(1:60, N),
    BL = sample(rep(c("0", "1", "2"), each = N / 3)),
    A = sample(rep(c("0", "1"), each = N / 2)));

Show the distribution of N for different BL groups:

library(ggplot2);
ggplot(data, aes(x = BL, y = N)) + geom_boxplot();

[image: boxplot of N by BL]

Show the distribution of N for different BL and A groups:

ggplot(data, aes(x = BL, y = N, fill = A)) + geom_boxplot();

[image: boxplot of N by BL, filled by A]

Maurits Evers
  • Thanks. That's not the problem though. Maybe I should be a bit more specific: I only get the median line, but no quartiles or whiskers, for the combination 0-0. – Sneha Mar 21 '18 at 13:41
  • Please note my answer below/above then – waskuf Mar 21 '18 at 13:47
  • @Sneha Agreed with @waskuf; if you only have one count in a group, you can't draw any whiskers or upper/lower quartiles. Again, you really need to provide sample data. You can check counts per group with e.g. `table(data[, c("BL", "A")])`. That table will hold the answer. – Maurits Evers Mar 21 '18 at 13:51
  • That makes sense, and I do have that group. But it's BL2 with A0. Clearly the labelling is going wrong somewhere then? Or how can I just drop that group? – Sneha Mar 21 '18 at 13:59
  • @Sneha What do you mean by *"drop that group"*? If there's only one (or even zero) count(s), that's what it is, isn't it. I'm not sure what you mean by *"labelling is going wrong somewhere"*. There's nothing wrong with showing your data in a plot similar to waskuf's first plot. – Maurits Evers Mar 21 '18 at 14:03
  • I think I know what @Sneha means, provided a new answer. – waskuf Mar 21 '18 at 14:06
  • Excellent. Over to you then @waskuf;-) – Maurits Evers Mar 21 '18 at 14:07
  • @MauritsEvers I agree, and I will happily show the same plot. But the group that has n = 1 in the dataset is different from the one shown in the graph. For example: BL0-A0 has n = 30, BL0-A1 has n = 26, BL2-A0 has n = 1 and BL2-A1 has n = 35. However, my plot shows BL0-A0 with n = 1 (which isn't the case). – Sneha Mar 21 '18 at 14:14

It seems as if all individuals with the combination BL=0 and A=0 have the same value for N. Could you please check if that isn't the case?
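
One way to run that check is sketched below. The data is made up, because the original dataset was not posted: the column names `BL`, `A` and `N` follow the question, while the values are hypothetical and chosen so that the 0-0 group has a constant N.

```r
# Hypothetical stand-in for the real data: the 0-0 group has a constant N.
data <- data.frame(
  BL = c(0, 0, 0, 0, 1, 1, 2, 2),
  A  = c(0, 0, 1, 1, 0, 1, 0, 1),
  N  = c(5, 5, 3, 7, 2, 9, 4, 6)
)

# Number of rows in each BL x A combination.
counts <- table(data$BL, data$A, dnn = c("BL", "A"))
print(counts)

# Number of distinct N values per combination; a value of 1 means the box
# collapses to a single line at the median.
distinct_n <- aggregate(N ~ BL + A, data = data,
                        FUN = function(v) length(unique(v)))
print(distinct_n)
```

A combination with only one row, or with several rows that all share the same N, shows up with a distinct count of 1.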

If that is the case, the plot produced by your code with simulated data would look like this:

[image: boxplot]

Otherwise, with exactly your code and data where N varies among individuals with the combination 0-0, the plot looks just fine:

[image: boxplot]

Here's the code:

library(ggplot2)
data <- data.frame(BL = c(0,0,1,1,2,2),
                   A = c(0,1,0,1,0,1),
                   N = c(1:30))
data$BL <- factor(data$BL, labels = c("0", "1", "2"))
data$A <- factor(data$A, labels = c("0", "1"))

ggplot(data=data, aes(x = BL , y = N, fill = A)) +
  geom_boxplot()
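
The collapsed case can be reproduced with a small variation of this data. This is only a sketch, under the assumption that every row with BL = 0 and A = 0 shares one value of N (the value 10 is arbitrary):

```r
library(ggplot2)

data <- data.frame(BL = c(0, 0, 1, 1, 2, 2),
                   A  = c(0, 1, 0, 1, 0, 1),
                   N  = 1:30)

# Assumption for illustration: give every BL = 0, A = 0 row the same N.
data$N[data$BL == 0 & data$A == 0] <- 10

data$BL <- factor(data$BL, labels = c("0", "1", "2"))
data$A  <- factor(data$A, labels = c("0", "1"))

# All five box statistics coincide for this group, so only a line is drawn.
collapsed <- fivenum(data$N[data$BL == "0" & data$A == "0"])
print(collapsed)

ggplot(data, aes(x = BL, y = N, fill = A)) +
  geom_boxplot()
```
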
waskuf
  • So, my plot looks exactly like plot 1, where you say N is the same for BL0 and A0. I have gone back to check my data. For this combination, quartile 1 is 1.36 and quartile 3 is 1.63, with an IQR of 0.27. This is pretty much the same as for combination BL0 and A1, which has a quartile 1 of 1.35, a quartile 3 of 1.63 and an IQR of 0.277. So if that combination is plotted, how come the first one isn't? – Sneha Mar 21 '18 at 13:56

Can you please check whether the following code yields the desired result:

data$BL <- as.factor(data$BL)
data$A <- as.factor(data$A)
ggplot(data=data, aes(x = BL , y = N, fill = A)) +
  geom_boxplot()

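The way you converted BL and A to factors can mislabel your data: `factor(x, labels = ...)` without `levels` assigns the labels by position to the sorted unique values of `x`, so the labels are wrong whenever those values do not line up with the labels in the order you list them.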
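
Here is a minimal sketch of how the relabelling can go wrong, using a made-up vector rather than the asker's data. Without a `levels` argument, `factor()` takes the sorted unique values as levels and applies `labels` to them by position. The assumption here is that the raw column is coded 1, 2, 3; the real coding was never posted.

```r
# Hypothetical raw codes; the real column was not shown in the question.
raw <- c(3, 1, 2, 3, 1)

# labels only: the sorted unique values 1, 2, 3 become "0", "1", "2".
relabelled <- factor(raw, labels = c("0", "1", "2"))
print(as.character(relabelled))

# as.factor() keeps the values that are actually in the data as the levels.
kept <- as.factor(raw)
print(as.character(kept))

# Spelling out both levels and labels makes the intended mapping explicit.
explicit <- factor(raw, levels = c(1, 2, 3), labels = c("1", "2", "3"))
print(as.character(explicit))
```

Comparing `table(raw, relabelled)` against what you expect is a quick way to spot a shifted mapping in your own data.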

waskuf
  • please make sure you run this code instead of your original code, not after it, i.e. before you transform BL and A to factors ;) – waskuf Mar 21 '18 at 14:08
  • Hurray! It works. Thanks a lot. Could I ask what was wrong with the original code? – Sneha Mar 21 '18 at 14:25
  • Great! In the original code, when you transform your variables to factors, e.g. BL, you tell R that the levels of the newly created factor variable are c("0", "1" , "2") in exactly this order. Now, if the first observation in your data has BL=2, R will match all BL-values of 2 to the factor-level "0" (because of the order in which you provided the levels). Thus, what appears to be the group BL0-A0 will in fact be the group BL2-A0. Hope this helps. – waskuf Mar 21 '18 at 14:33
  • Super explanation. Thanks! – Sneha Mar 21 '18 at 14:41
  • @Sneha I'm glad it worked out, but that's not really how `factor`s work. You can try it out yourself: Case 1 `f <- factor(c("0", "1", "2"), levels = c("0", "1", "2")); as.numeric(f)`. The order of the factor levels is `1 2 3`. Case 2: Now we reorder the character vector to simulate encountering a `"2"` first `f <- factor(c("2", "1", "0"), levels = c("0", "1", "2")); as.numeric(f)`. You can see that the order of factor levels is `3 2 1`, consistent with the order of your character vector. **It's not `1 2 3` again!** Imagine the consequence of such a behaviour! [...] – Maurits Evers Mar 21 '18 at 20:47
  • [...] So I'm not sure why it worked, but `factor` does **not** re-assign *matching* levels based on the order the entries appear in the character vector. – Maurits Evers Mar 21 '18 at 20:51
  • @MauritsEvers, the difference is, that Sneha uses the argument "labels" in the call to factor(), not "levels" as you did in your example. Therefore the matching levels (to be precise, the labels of the matching levels) are in fact assigned based on the order of appearance ;) – waskuf Mar 22 '18 at 09:41
  • @waskuf You are absolutely correct! I misread `labels` for `levels`. Thanks for the clarification. – Maurits Evers Mar 22 '18 at 10:58