My data looks like this:

```
Fasting_glucose  sample   Prevotella  Turicibacter  Mitsuokella  Description
138              PCS119F  0.005782    0             0            Known_Diabetic
114              PCS119M  0.062654    0.000176      0.020358     New_Diagnosed
100              PCS11F   0.33044     0.000469      0.000352     New_Diagnosed
88               PCS120M  0.097811    0.000135      0            Normoglycemic
228              PCS125F  0.17703     0.000264      0.06429      Known_Diabetic
98               PCS127M  0.466902    0             0.011735     Normoglycemic
148              PCS130F  0.186682    0             0.000131     New_Diagnosed
233              PCS132F  0.003126    0             0            Known_Diabetic
```

I want to use the lm function to plot a simple linear regression between Fasting_glucose and each of the other columns, using the Description column as a grouping variable.

Currently, I am using the following script:

```r
Prevotella <- ggplot(fasting.glucose, aes(Fasting_glucose, Prevotella)) +
  geom_point() +
  geom_smooth(method = "lm") +
  geom_point(aes(size = Fasting_glucose)) +
  geom_point(aes(fill = Description, size = Fasting_glucose), shape = 21) +
  theme(panel.background = element_rect(fill = "white", colour = "black"),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank())
```

But it produces only one plot at a time.

How can I apply the lm function and plot the fit for all the columns at once?

Jamie Taylor
  • You need to specify the output you are looking for. There are many ways to look at the fit between multiple variables. – Ben Rollert May 01 '14 at 08:36
  • The output I expect is a single plot containing multiple scatterplots, each with a regression line, for all the columns (Prevotella, Turicibacter, Mitsuokella, etc.) against the Fasting_glucose level. – user3526009 May 01 '14 at 08:46
  • Try to add `+ facet_wrap( ~ Description)` – David Arenburg May 01 '14 at 08:47
  • `facet_wrap(~Description)` separates the samples into the groups in the Description column, but I am still not able to plot all the columns against the Fasting_glucose column. – user3526009 May 01 '14 at 09:41

1 Answer

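You need your data in tidy (long) format to do this with ggplot2: one row per sample per taxon, with the taxon name in one column and its abundance in another. The melt function from the reshape2 package does this reshaping.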

```r
library(ggplot2)
library(reshape2)

# Read the sample data from the question
x <- read.table(text = "Fasting_glucose sample  Prevotella  Turicibacter    Mitsuokella Description
138 PCS119F 0.005782    0   0   Known_Diabetic
114 PCS119M 0.062654    0.000176    0.020358    New_Diagnosed
100 PCS11F  0.33044 0.000469    0.000352    New_Diagnosed
88  PCS120M 0.097811    0.000135    0   Normoglycemic
228 PCS125F 0.17703 0.000264    0.06429 Known_Diabetic
98  PCS127M 0.466902    0   0.011735    Normoglycemic
148 PCS130F 0.186682    0   0.000131    New_Diagnosed
233 PCS132F 0.003126    0   0   Known_Diabetic", header = TRUE)

# Long format: taxon names go into 'variable', abundances into 'value'
y <- melt(x, id.vars = c("Fasting_glucose", "sample", "Description"))

# One panel per taxon, one lm line per Description group
ggplot(y, aes(Fasting_glucose, value, colour = Description)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ variable)
```
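The plot draws one lm fit per panel and group, but the question also asks how to apply lm to every column. A minimal sketch of that, assuming you want the fitted coefficients as a table: it builds the same long layout in base R (unlist() and rep() in place of melt(), so no extra package is needed) and fits value ~ Fasting_glucose separately for each taxon and Description group. The names taxa, long and fits are illustrative only.

```r
# Sample data from the question
x <- read.table(text = "Fasting_glucose sample Prevotella Turicibacter Mitsuokella Description
138 PCS119F 0.005782 0 0 Known_Diabetic
114 PCS119M 0.062654 0.000176 0.020358 New_Diagnosed
100 PCS11F 0.33044 0.000469 0.000352 New_Diagnosed
88 PCS120M 0.097811 0.000135 0 Normoglycemic
228 PCS125F 0.17703 0.000264 0.06429 Known_Diabetic
98 PCS127M 0.466902 0 0.011735 Normoglycemic
148 PCS130F 0.186682 0 0.000131 New_Diagnosed
233 PCS132F 0.003126 0 0 Known_Diabetic", header = TRUE)

taxa <- c("Prevotella", "Turicibacter", "Mitsuokella")  # hypothetical: columns to model

# Long layout (same shape melt() produces): one row per sample per taxon.
# unlist() concatenates the taxon columns in order, so 'variable' uses each = nrow(x).
long <- data.frame(
  Fasting_glucose = rep(x$Fasting_glucose, times = length(taxa)),
  Description     = rep(x$Description, times = length(taxa)),
  variable        = rep(taxa, each = nrow(x)),
  value           = unlist(x[taxa], use.names = FALSE)
)

# Fit value ~ Fasting_glucose for every taxon x Description combination
groups <- split(long, list(long$variable, long$Description), drop = TRUE)
fits <- do.call(rbind, lapply(groups, function(d) {
  fit <- lm(value ~ Fasting_glucose, data = d)
  data.frame(
    variable    = as.character(d$variable[1]),
    Description = as.character(d$Description[1]),
    n           = nrow(d),
    intercept   = unname(coef(fit)[1]),
    slope       = unname(coef(fit)[2]),
    # For a simple regression with an intercept, R^2 equals the squared correlation
    r.squared   = cor(d$Fasting_glucose, d$value)^2
  )
}))
rownames(fits) <- NULL
fits
```

With only two Normoglycemic samples, each of those fits passes exactly through both points, so its R² of 1 carries no information; the three-sample groups call for similar caution.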
nacnudus
  • I'm guessing the OP wants to do `facet_wrap(Description~variable)` but otherwise this approach is correct – Ben Rollert May 01 '14 at 11:11
  • Thank you, nacnudus, it worked the way I wanted. I was wondering if I can add the R values to each of the plots? – user3526009 May 02 '14 at 05:58
  • Glad to have helped and welcome to StackOverflow. You can let others know that this question is answered by clicking the tick next to my answer. The R values question is a popular one, already answered [here](http://stackoverflow.com/a/7549819/937932) – nacnudus May 02 '14 at 07:17
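For the follow-up about R values: the linked answer shows how to print a regression equation and R² on a plot. A rough sketch of the same idea for this faceted plot, computing R² for each taxon and Description group as the squared correlation (which equals the R² of a simple lm with an intercept) and writing one multi-line label per panel. The objects taxa, long, keys and labels are illustrative; se = FALSE avoids confidence bands that cannot be estimated from two points, and scales = "free_y" is a choice made here because the taxa differ by orders of magnitude in abundance.

```r
library(ggplot2)

# Sample data from the question
x <- read.table(text = "Fasting_glucose sample Prevotella Turicibacter Mitsuokella Description
138 PCS119F 0.005782 0 0 Known_Diabetic
114 PCS119M 0.062654 0.000176 0.020358 New_Diagnosed
100 PCS11F 0.33044 0.000469 0.000352 New_Diagnosed
88 PCS120M 0.097811 0.000135 0 Normoglycemic
228 PCS125F 0.17703 0.000264 0.06429 Known_Diabetic
98 PCS127M 0.466902 0 0.011735 Normoglycemic
148 PCS130F 0.186682 0 0.000131 New_Diagnosed
233 PCS132F 0.003126 0 0 Known_Diabetic", header = TRUE)

# Long layout: one row per sample per taxon (base R stand-in for melt())
taxa <- c("Prevotella", "Turicibacter", "Mitsuokella")
long <- data.frame(
  Fasting_glucose = rep(x$Fasting_glucose, times = length(taxa)),
  Description     = rep(x$Description, times = length(taxa)),
  variable        = rep(taxa, each = nrow(x)),
  value           = unlist(x[taxa], use.names = FALSE)
)

# R^2 for every taxon x Description group
keys <- unique(long[c("variable", "Description")])
keys$r.squared <- mapply(function(v, g) {
  d <- long[long$variable == v & long$Description == g, ]
  cor(d$Fasting_glucose, d$value)^2
}, keys$variable, keys$Description)

# One label per facet, one line per group, e.g. "Known_Diabetic: R^2 = 0.12"
labels <- do.call(rbind, lapply(split(keys, keys$variable), function(k) {
  data.frame(variable = k$variable[1],
             label = paste0(k$Description, ": R^2 = ",
                            sprintf("%.2f", k$r.squared), collapse = "\n"))
}))

p <- ggplot(long, aes(Fasting_glucose, value, colour = Description)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # x = -Inf, y = Inf pins each label to the top-left corner of its panel
  geom_text(data = labels, aes(x = -Inf, y = Inf, label = label),
            inherit.aes = FALSE, hjust = -0.05, vjust = 1.1, size = 3) +
  facet_wrap(~ variable, scales = "free_y")
p
```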