
My graph displays correctly without using a scale. I want it to look better, so I convert the factor to numeric and then use scale_x_continuous. However, the graph looks incorrect when I convert from factor to numeric (How to convert a factor to an integer\numeric without a loss of information?). I can't use the scale without converting to numeric. Please run the sample code below with and without these lines (main$U <- as.numeric(as.character(main$U)) and + scale_x_continuous(name="Temperature", limits=c(0, 160))). Thank you.

library("ggplot2")
library("plyr")

df<-data.frame(U = c(25, 25, 25, 25, 25, 85, 85, 85, 125, 125), 
               V =c(1.03, 1.06, 1.1,1.08,1.87,1.56,1.75,1.82, 1.85, 1.90), 
               type=c(2,2,2,2,2,2,2,2,2,2)) 

df1<-data.frame(U = c(25, 25,25,85, 85, 85, 85, 125, 125,125), 
                V =c(1.13, 1.24,1.3,1.17, 1.66,1.76,1.89, 1.90, 1.95,1.97), 
                type=c(5,5,5,5,5,5,5,5,5,5)) 

df2<-data.frame(U = c(25, 25, 25, 85, 85,85,125, 125,125), 
                V =c(1.03, 1.06, 1.56,1.75,1.68,1.71,1.82, 1.85,1.88), 
                type=c(7,7,7,7,7,7,7,7,7))

main <- rbind(df,df1,df2)
main$type <- as.factor(main$type)
main <- transform(main, type = revalue(type,c("2"="type2", "5"="type5", "7" = "type7")))
main$U <- as.factor(main$U)
main$U <- as.numeric(as.character(main$U))   # compare with and without this line

ggplot(main, aes(U, V,color=type)) + 
  geom_boxplot(width=0.5/length(unique(main$type)), size=.3, position="identity") + 
  scale_x_continuous(name="Temperature", limits=c(0, 160))   # compare with and without this line
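For readers unfamiliar with the factor-to-numeric step used here: a factor stores integer codes plus a set of level labels, so plain `as.numeric()` returns the codes rather than the temperatures. The small sketch below uses toy values chosen to mirror `U`; `f` is a hypothetical name. The `as.numeric(levels(f))[f]` form is the one recommended in the R FAQ. Once `U` holds the real numbers, a continuous axis spaces 25, 85 and 125 by their actual distances (60 and 40), whereas a factor axis spaces its levels evenly.

```r
# A factor stores integer codes plus a vector of level labels
f <- factor(c(25, 85, 125, 25))

levels(f)                     # "25" "85" "125"
as.numeric(f)                 # 1 2 3 1  (the codes, not the temperatures)
as.numeric(as.character(f))   # 25 85 125 25
as.numeric(levels(f))[f]      # 25 85 125 25  (same values; converts each level once)
```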
MrFlick
Peter Rowan
  • Can you describe what exactly you want it to look like? Right now you have temperature on the x-axis for a boxplot, which isn't usually how boxplots are used (they relate one discrete variable to one continuous variable). I see `scale_x_continuous` changing the limits as expected. – Calum You Feb 26 '18 at 19:50
  • @CalumYou, if you run it without the lines (main$U <- as.numeric(as.character(main$U)) and scale_x_continuous(name="Temperature", limits=c(0, 160))), the boxplot displays fine. However, the x-axis doesn't look good: the interval from 25 to 85 is currently drawn the same width as the interval from 85 to 125. Thanks. – Peter Rowan Feb 26 '18 at 20:00

1 Answer

You have to specify the group aesthetic in your call to geom_boxplot, and to keep the legend you can use color=factor(U) (i.e., converting U back). To avoid losing information about groups that share the same x-value, I think it is best to create a new grouping column first: take all unique pairs of U and type, then label each row with the pair it falls into.

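# apply() below turns each row into a character vector,
# so keep the key columns as character too
main$U <- as.character(main$U)
main$type <- as.character(main$type)

# One row per unique (U, type) pair; its row number is the group index
grp_keys <- unique(as.matrix(main[, c("U", "type")]))
grp_inds <- 1:nrow(grp_keys)

# For each row, find the key whose U and type both match
main$grps <- apply(main, 1, function(x) {
  grp_inds[colSums(as.character(x[c("U", "type")]) == t(grp_keys)) == length(c("U", "type"))]
})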
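If the `apply()` loop feels opaque, the same kind of group index can be built more compactly. This is a sketch of an alternative, not what the answer above uses: it assumes the key columns are named `U` and `type` and that no value contains the `|` separator; `d`, `key_cols`, `grps2` and the toy values are hypothetical. `match()` numbers the combinations in order of first appearance, like the rows of `grp_keys`, and only `key_cols` needs to change when there are more key columns.

```r
# Hypothetical toy data with the same column layout as `main`
d <- data.frame(U    = c(25, 25, 85, 25, 85, 85),
                V    = c(1.03, 1.06, 1.56, 1.13, 1.17, 1.66),
                type = c("type2", "type2", "type2", "type5", "type5", "type5"))

# Option 1: paste the key columns together and number each distinct key
key_cols <- c("U", "type")
keys <- do.call(paste, c(d[key_cols], sep = "|"))
d$grps <- match(keys, unique(keys))

# Option 2: interaction() builds one factor level per observed combination
d$grps2 <- interaction(d$U, d$type, drop = TRUE)

print(d)
```

Either column can then be passed to `aes(group = ...)`.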

Then plot it (the width is adjusted because the boxes look very small over the larger x range):

# Convert U back to numeric so the x-axis uses the real temperature distances
main$U <- as.numeric(as.character(main$U))
ggplot(main, aes(U, V, color = type)) +
  geom_boxplot(aes(group = grps, color = type), width = 20/length(unique(main$type)), size = .3, position = "identity") +
  scale_x_continuous(name = "Temperature", limits = c(0, 160))

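As a variant (an assumption, not part of the original answer), the grouping column can be skipped by passing `group = interaction(U, type)` directly. The sketch below also drops `position = "identity"`: with that setting, boxes of different types at the same temperature are drawn on top of each other, which may be why some data appear to be missing. It relies on `geom_boxplot()` defaulting to `position_dodge2()` in current ggplot2 releases, which places overlapping boxes side by side; `width = 20` is an illustrative value, and the data are the question's three data frames combined.

```r
library(ggplot2)

# The question's df, df1 and df2 combined (10 + 10 + 9 rows)
main <- data.frame(
  U    = c(25, 25, 25, 25, 25, 85, 85, 85, 125, 125,
           25, 25, 25, 85, 85, 85, 85, 125, 125, 125,
           25, 25, 25, 85, 85, 85, 125, 125, 125),
  V    = c(1.03, 1.06, 1.1, 1.08, 1.87, 1.56, 1.75, 1.82, 1.85, 1.90,
           1.13, 1.24, 1.3, 1.17, 1.66, 1.76, 1.89, 1.90, 1.95, 1.97,
           1.03, 1.06, 1.56, 1.75, 1.68, 1.71, 1.82, 1.85, 1.88),
  type = factor(rep(c("type2", "type5", "type7"), times = c(10, 10, 9)))
)

# One box per (U, type) pair; the default dodging moves boxes that share
# a temperature next to each other instead of stacking them
p <- ggplot(main, aes(U, V, color = type, group = interaction(U, type))) +
  geom_boxplot(width = 20) +
  scale_x_continuous(name = "Temperature", limits = c(0, 160))

print(p)
```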

erocoar
  • Thank you so much. You saved me again. That is exactly the plot I was expecting. Thanks. – Peter Rowan Feb 26 '18 at 20:02
  • Glad it helped! :) – erocoar Feb 26 '18 at 20:04
  • The x-axis looks good now, but it looks like some data sets are missing. Please compare with the original graph without the two lines I mentioned above. Thank you. – Peter Rowan Feb 26 '18 at 20:17
  • Could you help explain this line: "function(x) grp_inds[colSums(as.character(x[c(1, 3)]) == t(grp_keys)) == 2])"? Understanding it clearly will help me connect the boxplot medians without getting this error: "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?". Thank you. – Peter Rowan Feb 28 '18 at 18:02
  • That line determines, for every single row, which combination of U and type it matches (remember that before this we built the set of all unique combinations of U and type). So we simply record, for every row, which unique combination it belongs to. Do you get an error running it? – erocoar Feb 28 '18 at 20:35
  • No, it worked great. My application has a lot of columns, so I just don't understand the hardcoded part "as.character(x[c(1, 3)]) == t(grp_keys)) == 2])". I didn't understand what the 2 means here. I tried to rewrite it using column names, but it didn't work. Thank you. – Peter Rowan Feb 28 '18 at 20:45
  • The 2 represents the number of columns! To match a unique pair, columns 1 and 3 both have to equal that pair, so `sum(TRUE, TRUE)` is 2 :) – erocoar Mar 01 '18 at 14:11
  • Thank you for your response. I understand your logic as you explained it in the previous comment. Should it be rowSums instead of colSums in "colSums(as.character(x[c(1, 3)]) == t(grp_keys)) == 2"? If it doesn't take too much of your time, could you rewrite it using column names? Thank you. – Peter Rowan Mar 01 '18 at 14:44
  • See my edit! If anything is still unclear, please ask :) I use colSums because otherwise the `==` comparison won't work, due to the way R recycles vectors that aren't long enough (length 2 for 2 columns, compared against an entire data frame). When I transpose the data frame, it does work. – erocoar Mar 01 '18 at 17:40
  • Thank you so much. I tried it with my column indexes, but it didn't work. I will try again with column names as you showed above. Thank you. – Peter Rowan Mar 01 '18 at 17:52
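To illustrate the `colSums()`/`t()` logic discussed in these comments, here is a toy sketch with three hypothetical keys and one data row (`keys`, `row_vals` and `hits` are illustrative names). R compares a vector with a matrix element by element, recycling the vector down each column, so each key must occupy one column; that is why the keys are transposed and summed with `colSums()`. A key matches when the number of `TRUE`s equals the number of key columns, which is what the hardcoded `2` stood for.

```r
# Unique (U, type) keys, one per row, as a character matrix
keys <- matrix(c("25", "type2",
                 "85", "type2",
                 "25", "type5"),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("U", "type")))

# The key values taken from one data row
row_vals <- c(U = "25", type = "type5")

# t(keys) is 2 x 3, one key per column; row_vals is recycled down each
# column, so hits[i, j] compares row_vals[i] with field i of key j
hits <- row_vals == t(keys)

colSums(hits)                              # 1 0 2
which(colSums(hits) == length(row_vals))   # 3
```

With named columns, `row_vals` plays the role of `x[c("U", "type")]` inside the `apply()` call, so extending the key means adding names to that vector and to the columns used to build `grp_keys`.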