
I have the following data frame:

A       B       C       D       Xax
0.451   0.333   0.034   0.173   0.22        
0.491   0.270   0.033   0.207   0.34    
0.389   0.249   0.084   0.271   0.54    
0.425   0.819   0.077   0.281   0.34
0.457   0.429   0.053   0.386   0.53    
0.436   0.524   0.049   0.249   0.12    
0.423   0.270   0.093   0.279   0.61    
0.463   0.315   0.019   0.204   0.23

I need to plot all these columns in the same plot (with the variable Xax on the x-axis and the variables A, B, C and D on the y-axis) and also draw a separate regression line for each variable.

I tried this:

pl<-ggplot(data=df) + geom_point(aes(x=Xax,y=A,size=10)) + 
  geom_point(aes(x=Xax,y=B,size=10)) + 
  geom_point(aes(x=Xax,y=C,size=10)) + 
  geom_point(aes(x=Xax,y=D,size=10)) + 
  geom_smooth(method = "lm", se=FALSE, color="black")

But it only plots the first one (Xax and A).

Henrik
ifreak

4 Answers


The easiest way is to convert your data to a "tall" (long) format.

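s <- 
"A       B       C       D       Xax
0.451   0.333   0.034   0.173   0.22
0.491   0.270   0.033   0.207   0.34
0.389   0.249   0.084   0.271   0.54
0.425   0.819   0.077   0.281   0.34
0.457   0.429   0.053   0.386   0.53
0.436   0.524   0.049   0.249   0.12
0.423   0.270   0.093   0.279   0.61
0.463   0.315   0.019   0.204   0.23
"
# read.table(text = ...) reads the string directly, without opening a connection
d <- read.table(text = s, header = TRUE)

library(ggplot2)
library(reshape2)
# Wide to tall: one row per (Xax, variable) pair, with columns Xax, variable, value
d <- melt(d, id.vars = "Xax")

# Everything on the same plot: one colour and one linear fit per variable
ggplot(d, aes(Xax, value, col = variable)) + 
  geom_point() + 
  stat_smooth(method = "lm", se = FALSE)

# Separate plots: one panel and one linear fit per variable
ggplot(d, aes(Xax, value)) + 
  geom_point() + 
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~variable)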
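To see what the "tall" layout looks like without reshape2, here is a minimal base-R sketch. It uses `stack()` as a swapped-in alternative to `melt()`; the names `wide`, `tall` and `value_cols` are illustrative, and the column order of the result (value, variable, Xax) differs from `melt()`'s (Xax, variable, value).

```r
# Sketch: build the "tall" layout with base R's stack() instead of reshape2::melt().
# The names `wide`, `tall` and `value_cols` are illustrative.
wide <- read.table(text = "
A     B     C     D     Xax
0.451 0.333 0.034 0.173 0.22
0.491 0.270 0.033 0.207 0.34
0.389 0.249 0.084 0.271 0.54
0.425 0.819 0.077 0.281 0.34
0.457 0.429 0.053 0.386 0.53
0.436 0.524 0.049 0.249 0.12
0.423 0.270 0.093 0.279 0.61
0.463 0.315 0.019 0.204 0.23", header = TRUE)

value_cols <- c("A", "B", "C", "D")

# stack() puts all values in one column and records the source column in `ind`
tall <- stack(wide[value_cols])
names(tall) <- c("value", "variable")

# stack() goes column by column (all of A, then all of B, ...),
# so Xax is repeated once per stacked column in the same order
tall$Xax <- rep(wide$Xax, times = length(value_cols))

head(tall)
```

Each original row now appears once per variable (8 rows times 4 variables = 32 rows), which is what lets `col = variable` or `facet_wrap(~variable)` fit one line per variable.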
Vincent Zoonekynd
  • I did not get the solution. This is a small part of the data frame; it's much bigger. Can you please explain the answer and apply it to the original data frame? – ifreak Mar 02 '12 at 15:26
  • And by the way, this is not working :/ – ifreak Mar 02 '12 at 15:48
  • @ifreak How could anyone apply this code to the original full data frame, which exists only on your computer and which you haven't provided? And saying that "it's not working" is about the least helpful comment imaginable, since it doesn't provide any information about how or why it isn't working. – joran Mar 02 '12 at 16:18
  • My data frame has around 500 rows. I copied the same code that Vincent provided, tried it outside my script, and it did not work either. That's what I meant by "it did not work". – ifreak Mar 02 '12 at 17:04
  • To be able to help, we need to know what you mean by "it did not work": was there any error message? – Vincent Zoonekynd Mar 02 '12 at 22:34
  • Yes, the error message was about the close(s) call, which I did not understand. After removing it I get a plot, but also a warning: closing unused connection 3 (s). In any case my data frame is bigger, so how can I adapt the script to work on bigger data frames? And in this case, how should I fit a linear regression to each variable, since all the variables are melted? – ifreak Mar 05 '12 at 08:53
  • You can ignore the warning about the connection: it comes from my reading the data from a string instead of from a file. The size of the data should not matter: if your real data is similar to the example, no code change is needed. Since the data has been grouped, either thanks to the `col` argument (first example) or the `facet_wrap` call (second example), there is one separate regression for each variable. Since you want a linear regression, you should add `method="lm"` to `geom_smooth`, as in your question. – Vincent Zoonekynd Mar 05 '12 at 09:21
  • But how should I transform the whole data frame into the format used in the answer? – ifreak Mar 05 '12 at 09:41
  • If the data is in the same format as in the question, there is nothing to do (the code only assumes that there is an `Xax` column). If it is in a different format, I have no idea what it looks like... – Vincent Zoonekynd Mar 05 '12 at 10:06
  • OK, thank you, it worked. But I have another question: I plotted the regression line, but how do I get the regression equation and R^2? – ifreak Mar 05 '12 at 10:55
  • The easiest is probably to compute the regressions separately: `library(plyr); ddply( d, "variable", function(u) { r <- lm(value ~ Xax, data=u); c(coef(r), r.squared=summary(r)$r.squared) } )`. – Vincent Zoonekynd Mar 05 '12 at 12:01
  • You mean computing a regression for each variable and then passing it to geom_text? – ifreak Mar 05 '12 at 12:07
  • The code above only computes the coefficients and the R^2 of the regression: if you want to add all these numbers to the plot, you would use `geom_text` (or perhaps `annotate`), but it is more complicated -- you have to decide and compute where to put the text. – Vincent Zoonekynd Mar 05 '12 at 12:56
  • Yes, I want to add the numbers to the plot. How can I do this using geom_text? – ifreak Mar 05 '12 at 13:27
  • Precisely positioning the text can be tricky: you should probably post the problem as a new question, explaining where you want the text to appear. – Vincent Zoonekynd Mar 05 '12 at 13:33
  • This solution doesn't draw the variables on the same plot. – ABCD Jul 23 '16 at 13:39
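The `plyr::ddply()` one-liner in the comments can also be written with base R only. The sketch below fits `value ~ Xax` separately for each column, collects the intercept, slope and R^2, and builds a label string. The label format and the top-left placement (`Xax` at its minimum, `value` at its maximum) are arbitrary assumptions, not something the answer prescribes.

```r
# Sketch: one linear regression per variable, plus a label string that could
# be passed to geom_text(). Base R only; `wide`, `fits`, `coefs` are illustrative names.
wide <- read.table(text = "
A     B     C     D     Xax
0.451 0.333 0.034 0.173 0.22
0.491 0.270 0.033 0.207 0.34
0.389 0.249 0.084 0.271 0.54
0.425 0.819 0.077 0.281 0.34
0.457 0.429 0.053 0.386 0.53
0.436 0.524 0.049 0.249 0.12
0.423 0.270 0.093 0.279 0.61
0.463 0.315 0.019 0.204 0.23", header = TRUE)

value_cols <- c("A", "B", "C", "D")

# Fit A ~ Xax, B ~ Xax, ... one model per column
fits <- lapply(value_cols, function(v) lm(reformulate("Xax", response = v), data = wide))
names(fits) <- value_cols

# One row per variable: intercept, slope and R^2
coefs <- data.frame(
  variable  = value_cols,
  intercept = sapply(fits, function(f) coef(f)[[1]]),
  slope     = sapply(fits, function(f) coef(f)[["Xax"]]),
  r.squared = sapply(fits, function(f) summary(f)$r.squared)
)

# Text of the form "y = a +b x, R^2 = r" (%+ prints the sign of the slope)
coefs$label <- sprintf("y = %.3f %+.3f x, R^2 = %.2f",
                       coefs$intercept, coefs$slope, coefs$r.squared)

# Assumed placement: top-left corner of the data range
coefs$Xax   <- min(wide$Xax)
coefs$value <- max(unlist(wide[value_cols]))

print(coefs)
```

Assuming the melted `d` and the faceted plot from the answer, the labels could then be added with something like `+ geom_text(data = coefs, aes(label = label), hjust = 0, vjust = 1)`; because `coefs` has a `variable` column, `facet_wrap(~variable)` should put each label in its own panel.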

A very simple solution:

# Assumes the data frame was saved to df.csv with columns A, B, C, D and Xax
df <- read.csv("df.csv", header = TRUE)
x <- cbind(df$Xax, df$Xax, df$Xax, df$Xax)
y <- cbind(df$A, df$B, df$C, df$D)
matplot(x, y, type = "p")

Please note that it only plots the data; it does not draw any regression lines.
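To add the regression lines that this answer leaves out, one option is to fit each column with `lm()` and draw each fit with `abline()` on top of `matplot()`. This is a sketch that rebuilds the example data inline instead of reading `df.csv`; colours, plotting symbols and the legend position are arbitrary choices.

```r
# Sketch: matplot() points plus one least-squares line per column via abline().
# Base R graphics only; the example data is rebuilt inline.
df <- read.table(text = "
A     B     C     D     Xax
0.451 0.333 0.034 0.173 0.22
0.491 0.270 0.033 0.207 0.34
0.389 0.249 0.084 0.271 0.54
0.425 0.819 0.077 0.281 0.34
0.457 0.429 0.053 0.386 0.53
0.436 0.524 0.049 0.249 0.12
0.423 0.270 0.093 0.279 0.61
0.463 0.315 0.019 0.204 0.23", header = TRUE)

value_cols <- c("A", "B", "C", "D")
cols <- seq_along(value_cols)  # one colour/symbol index per variable

# x can be a single vector; matplot() reuses it for every column of y
matplot(df$Xax, df[, value_cols], type = "p", pch = cols, col = cols,
        xlab = "Xax", ylab = "value")

# Fit each column against Xax and overlay the fitted line in the matching colour
fits <- lapply(value_cols, function(v) lm(reformulate("Xax", response = v), data = df))
for (i in cols) abline(fits[[i]], col = cols[i])

legend("topright", legend = value_cols, pch = cols, col = cols, lty = 1)
```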

Alessandro Jacopson

Using the tidyverse:

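library(ggplot2)
library(dplyr)  # provides the %>% pipe

# Columns 1:4 (A to D) are gathered into id/value pairs; Xax is kept as is
df %>% tidyr::gather("id", "value", 1:4) %>% 
  ggplot(., aes(Xax, value)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  facet_wrap(~id)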

DATA

df <- read.table(text = "
A       B       C       D       Xax
0.451   0.333   0.034   0.173   0.22
0.491   0.270   0.033   0.207   0.34
0.389   0.249   0.084   0.271   0.54
0.425   0.819   0.077   0.281   0.34
0.457   0.429   0.053   0.386   0.53
0.436   0.524   0.049   0.249   0.12
0.423   0.270   0.093   0.279   0.61
0.463   0.315   0.019   0.204   0.23", header = TRUE)
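In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. Below is a sketch of the same faceted plot with `pivot_longer()`, assuming tidyr (1.0.0 or later) and ggplot2 are installed; `cols = -Xax` selects every column except `Xax`, so the code does not depend on the column positions `1:4`.

```r
library(tidyr)
library(ggplot2)

df <- read.table(text = "
A     B     C     D     Xax
0.451 0.333 0.034 0.173 0.22
0.491 0.270 0.033 0.207 0.34
0.389 0.249 0.084 0.271 0.54
0.425 0.819 0.077 0.281 0.34
0.457 0.429 0.053 0.386 0.53
0.436 0.524 0.049 0.249 0.12
0.423 0.270 0.093 0.279 0.61
0.463 0.315 0.019 0.204 0.23", header = TRUE)

# Stack every column except Xax into (id, value) pairs
long <- pivot_longer(df, cols = -Xax, names_to = "id", values_to = "value")

# One panel and one linear fit per original column
p <- ggplot(long, aes(Xax, value)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  facet_wrap(~id)
print(p)
```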
shiny

To select columns to plot, I added 2 lines to Vincent Zoonekynd's answer:

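# d must be the wide data frame from Vincent's answer (before it is melted)
# Convert to tall/long format (from wide format), keeping only the selected columns
col_plot <- c("A", "B")
dlong <- melt(d[, c("Xax", col_plot)], id.vars = "Xax")

# "value" and "variable" are the default output column names of melt()
ggplot(dlong, aes(Xax, value, col = variable)) +
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE)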

Search for "tidy data" to learn more about the tall (or long) and wide formats.

user3226167