I managed to build a data frame like this:

df<- structure(list(label = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("boys", 
"girls"), class = "factor"), variable = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 5L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 
4L), .Label = c(" G1", " G20", " G5", " G52", " G9"), class = "factor"), 
    value = structure(c(3L, 8L, 18L, 1L, 15L, 17L, 19L, 7L, 2L, 
    2L, 11L, 10L, 6L, 4L, 9L, 12L, 14L, 5L, 13L, 16L), .Label = c("112864.443", 
    "11319531", "12874.443", "142983324", "1612410048", "16349475.63", 
    "184901841", "2223793.8", "30553282.01", "312004.547", "3135868.44", 
    "317403612.9", "3686081.063", "43701608", "623793.8", "64959501.42", 
    "67666215", "767666215", "775987137.8"), class = "factor")), .Names = c("label", 
"variable", "value"), class = "data.frame", row.names = c(NA, 
-20L))

Now I am trying to make a box plot for each set.

When I do this:

ggplot(data = df, aes(x=variable, y=value)) + geom_boxplot(aes(fill=label))

it plots every data point separately instead of grouping them into boxes.

What I want is to have them grouped together in one box. These are all G1, which means they should be boxed together (girls in one color and boys in another) at the first x-axis position. In this set, girls have 3 replicates (samp1, 2 and 3) and boys have 3 replicates (samp4, 5, 6).

Then the second box will be the next set. In this case girls have 3 replicates (samp1, 2, 3) and boys have 2 replicates (samp5, 6).

Something like this would also be great, in case a few points cannot be drawn as a box plot: https://www.r-graph-gallery.com/47-groups-distribution-with-ggplot2/

I want to test for significant differences between girls at different x-axis positions and between boys at different x-axis positions, something like this: Put stars on ggplot barplots and boxplots - to indicate the level of significance (p-value)

  • You need to convert the column "value" from a factor to a number. `df$value<-as.numeric(as.character(df$value))` – Dave2e Nov 29 '17 at 18:07
  • You deleted your other question before I could respond, but I want to make sure you got it. To do the split, you just need `df_list = split(df, df$data)`. If you insist on multiple data frame objects not in a list, you can then do `list2env(df_list)`, but you'll almost certainly create more work for yourself by taking them out of the list. – Gregor Thomas Nov 29 '17 at 23:11
  • @Gregor yes, I got it. I saw people hated my question and I felt so dumb that I could not do it, so I did it. :-) I really appreciate your help, thanks –  Nov 29 '17 at 23:19
  • @Gregor if you can please help me with this question. This has made me crazy lol :-) –  Nov 29 '17 at 23:22
  • 3 points aren't enough for a boxplot. Just use `geom_point()`. – Gregor Thomas Nov 30 '17 at 00:19
  • @Gregor is it possible to give me a solution? I have been trying with no success –  Nov 30 '17 at 01:28

1 Answer

Okay, your real problem is that you have your value stored as a factor as if it was categorical data. We can fix this and then plot:

# Go through as.character() first; as.numeric() on a factor returns the level codes
df$value = as.numeric(as.character(df$value))

ggplot(df, aes(x = variable, y = value, fill = label)) +
    geom_boxplot()
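
As an aside on the conversion step, here is a minimal sketch of why it goes through `as.character()`. The vector `f` is made up for illustration and is not part of the question's data. Calling `as.numeric()` directly on a factor returns its internal level codes rather than the numbers shown in the labels.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("12874.443", "2223793.8", "767666215"))

codes  <- as.numeric(f)                # internal level codes, not the values
values <- as.numeric(as.character(f))  # the numbers written in the labels

codes
values
```

The same thing happens silently inside a plot: with `value` left as a factor, ggplot2 treats each distinct label as its own category on the y axis.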

But you really don't have enough data for boxplots. I would just use points, maybe dodged a little bit like this:

ggplot(df, aes(x = variable, y = value, color = label)) +
    geom_point(position = position_dodge(width = 0.2))
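
The x axis follows the factor's level order, which for this data is alphabetical (" G1", " G20", " G5", " G52", " G9"). A rough sketch of putting the groups in numeric order is below. The data frame `toy` and its values are made up so the snippet runs on its own, and trimming the leading space from the labels is my assumption about what is wanted.

```r
library(ggplot2)

# Toy data standing in for df (values are invented); labels keep the leading space
toy <- data.frame(
    label    = rep(c("girls", "boys"), times = 5),
    variable = factor(rep(c(" G1", " G5", " G9", " G20", " G52"), each = 2)),
    value    = c(3, 5, 4, 6, 2, 7, 8, 1, 9, 10)
)

# Trim the labels, then set the level order explicitly
toy$variable <- factor(trimws(as.character(toy$variable)),
                       levels = c("G1", "G5", "G9", "G20", "G52"))

ggplot(toy, aes(x = variable, y = value, color = label)) +
    geom_point(position = position_dodge(width = 0.2))
```

Any label missing from `levels` would become `NA`, so it is worth checking the result with `levels(toy$variable)` before plotting.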


Gregor Thomas
  • How can I see the significant differences? Sorry for the confusion. I mean that if I plot the points, I still just have a dot plot and cannot show any differences between boys from G1 to G5, G9, G20 and G52. By the way, the x-axis is not in consecutive order either –  Nov 30 '17 at 04:07
  • What differences are significant? You have to do a model or a test or something. Don't rely on a graphics package to do statistical significance calculations. – Gregor Thomas Nov 30 '17 at 15:42
  • For example, between the girls in G1 and the boys in G1. Also, is it possible to order the variables so that they come one after the other? For example G1, G5, G9, G20 and G52? –  Nov 30 '17 at 16:17
  • Please see the [FAQ on ordering ggplot axis](https://stackoverflow.com/q/12774210/903061). – Gregor Thomas Nov 30 '17 at 16:33
  • Great, I solved that. I am still struggling to make a plot that shows significant changes. I will improve my question. I am working on it –  Nov 30 '17 at 16:41
  • Please, this question has been mostly answered. I will not be working more on it. Instead of continuing to move the goal posts, make a new question. What you need to address is *"what are significant changes?"*. Once you have that answer, you can use `ggplot2` to visualize them, but you should not rely on a graphics package to do statistical tests for you. – Gregor Thomas Nov 30 '17 at 16:52
  • You are right. I accepted your answer. I will write a clearer question –  Nov 30 '17 at 17:00
  • I made another question, do you think that I am clear now? https://stackoverflow.com/questions/47579156/how-can-i-make-statistical-differences-within-each-interval-and-between-two-grou –  Nov 30 '17 at 17:22
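
On the point made in the comments that the test should be done outside the graphics package, a minimal sketch might look like the following. Everything here is an assumption for illustration: the data frame `toy` and its values are made up, and a Welch t-test per variable is just one possible choice, not something the thread settles on. With only two or three replicates per group any such test has very little power, and `t.test()` fails for a group with fewer than two observations.

```r
# Toy data (invented values): girls vs boys, three replicates each, two variables
toy <- data.frame(
    label    = rep(rep(c("girls", "boys"), each = 3), times = 2),
    variable = rep(c("G1", "G5"), each = 6),
    value    = c(10.1, 11.3, 9.8, 14.2, 15.0, 13.7,
                 20.5, 19.9, 21.2, 20.1, 21.0, 19.5)
)

# One Welch t-test per variable, comparing girls with boys
p_values <- sapply(split(toy, toy$variable), function(d) {
    t.test(value ~ label, data = d)$p.value
})

p_values
```

When several groups are tested this way, the p-values can be passed through `p.adjust()` to account for multiple comparisons before anything is drawn on a plot.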