-3

New user to R (like 2 days of use new) and coming from MATLAB, syntax nuances are driving me a little crazy. If anyone can point me in a direction on this topic I would really appreciate it. I have this dataset (fl1.back), that has 32 variables (columns) and 513 measurements (rows), and I want to create a table with basic stat tables of 9 of the 32 columns of data. There's a separate datset(fl2.back) that I would also like to pull 1 column of data from for the final table.

Here's the code I used to do the above tasks for 1 of the columns of data (sodium measurements) from fl1.back and fl2.back:

fl1.back <- read.delim("web.flat",comment.char="#",colClasses="character")
fl1.back <- fl1.back[-1,]
fl2.back <- read.delim("web.flat2",comment.char="#",colClasses="character")
fl2.back <- fl2.back[-1,]
head(fl1.back)
head(fl2.back)

#for rep criteria for sodium
back.sod.rep <- fl2.back[fl2.back$P00930!="",]
back.sod.rep$P00930 <- as.numeric(back.sod.rep$P00930)
back.sod.rep$P00930

#for samples...sodium 
back.sod <- fl1.back[fl1.back$P00930!="",] 
back.sod$P00930 <- as.numeric(back.sod$P00930)
back.sod$P00930
head(back.sod)
back.sod.summ <- data.frame("Sodium")
back.sod.summ
colnames(back.sod.summ) <- "Compound"

back.sod.summ$WQ_crit <- "20 mg/L"
back.sod.summ$n <- nrow(back.sod)
back.sod.summ$n_det <- nrow(back.sod[back.sod$R00930!="<",]) 

back.sod.summ$min <- min(back.sod[back.sod$R00930!="<","P00930"])
back.sod.summ$max <- max(back.sod[back.sod$R00930!="<","P00930"])
back.sod.summ$mean <- mean(back.sod[back.sod$R00930!="<","P00930"])
back.sod.summ$median <- median(back.sod[back.sod$R00930!="<","P00930"])
back.sod.summ$percent_samp_det <- 100*(back.sod.summ$n_det/back.sod.summ$n)
back.sod.summ$percent_samp_above_crit <- 100*(length(back.sod[back.sod$P00930>20,"P00930"])/back.sod.summ$n)
back.sod.summ$percent_rep_above_crit <- (sum(back.sod.rep$P00930>=20)/(nrow(back.sod.rep)))

back.sod$P00930
length(back.sod[back.sod$P00930>back.sod.summ$WQ_crit,"P00930"])

back.sod.summ
final <- data.frame(back.sod.summ)

Instead of rewriting/copying and pasting the above code to create the data frame final, I would like to loop over the two datasets since I'm looking to repeat the same task, just on different columns of data. I really don't know where to start, and there doesn't seem to be much literature on for loops in R.

Any insight is appreciated!

Jaap
  • 81,064
  • 34
  • 182
  • 193
  • 2
    Please learn how to produce a [**minimal reproducible example**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) – Henrik May 15 '14 at 13:55

1 Answers1

0

Here is an example of what I think you want with the iris dataset:

library(plyr)
dlply(iris, .(Species), summary)

This can be extended if you need additional stats. Anyway, you probably should use (as I show above) the "split-apply-combine" approach as implemented in various functions and packages.

Roland
  • 127,288
  • 10
  • 191
  • 288