
I won't pretend that this code is anywhere near optimal, but here is my problem. I have a list of files, each with multiple columns, read in with sapply(), such that file.list[[1]] gives me that data.frame and summary(file.list) gives a list of the files.

I am fitting curves to the data using the mgcv package as follows:

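library(mgcv)

# Fit a GAM of column 15 on a smooth of column 23.
# plot() draws the smooth and invisibly returns the data used for the plot.
gam_data <- function(curves)
{
  out <- gam(curves[, 15] ~ s(curves[, 23]))
  pd <- plot(out)
  return(pd)
}
out <- lapply(file.list, gam_data)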

# Pull the curve, its x grid and its standard error for the first smooth term.
split_curves <- function(splitting)
{
  pd_2 <- c(splitting[[1]]$fit)
  pd_3 <- c(splitting[[1]]$x)
  pd_4 <- c(splitting[[1]]$se)
  curveg <- cbind(pd_2, pd_3, pd_4)
  colnames(curveg) <- c("fitted", "sprho", "se")
  return(curveg)
}

out2 <- lapply(out, split_curves)

The first block fits the GAM and the second extracts the fitted curve. However, after all of that, the original information in file.list, such as replicate, genotype, etc., is lost, and the resulting data.frames are no longer the same length as the originals. This is probably a trivial question, but how does one retain that information through processing? I'm applying this to hundreds of data frames, so I cannot recreate the columns manually.

Phil
user2472414
  • Can you post some example data with `dput()`? See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for help – Phil May 12 '17 at 19:10
  • Maybe just `dput` a sample of the data then? – Sraffa May 12 '17 at 20:50
  • The point of a *minimal, reproducible example*, as described in the link I posted, is that it's easier for us to: get some example data into R to work with; and be sure our answer works with your data. If you can post this we can help you out, but it's difficult to see what's going on without this. – Phil May 14 '17 at 18:21
  • structure(list(ch1_mrnas_corr = c(1L, 3L, 5L, 0L, 0L), genotype = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "r153", class = "factor"), sex = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "f", class = "factor"), rep = c(1L, 1L, 1L, 1L, 1L), sprho = c(75.18126263, 32.15594264, 97.2410413, 51.78756296, 119.0448949), spphi = c(46.86455994, 23.07028426, 62.73749975, 33.91775658, 73.19567605)), .Names = c("ch1_mrnas_corr", "genotype", "sex", "rep", "sprho", "spphi"), row.names = c(NA, 5L), class = "data.frame") – user2472414 May 16 '17 at 22:22
  • In this case the example I posted would then be `out <- gam(curves[, 1] ~ s(curves[, 5]))`. The point would be to process all of this data using the pipeline above and still retain the `sex`, `rep` and `genotype` columns for all members of file.list. – user2472414 May 16 '17 at 22:23
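
One possible way to keep the metadata, offered as a sketch rather than a definitive answer: `plot()` on a gam object evaluates each smooth on a regular grid (100 points by default), which is why the extracted curves no longer match the input rows. The sketch below swaps in a different technique, `predict(..., type = "terms", se.fit = TRUE)`, to evaluate the smooth at the observed rows instead, so the result can be bound back to the original columns. The simulated data, the helper names `make_df` and `fit_and_keep`, and the use of column names instead of column positions are all assumptions for illustration; only the column names come from the `dput()` sample above.

```r
library(mgcv)

# Hypothetical stand-in for file.list: two simulated data frames that reuse
# the column names from the dput() sample (the values are made up).
set.seed(1)
make_df <- function(genotype, rep_id, n = 60) {
  sprho <- runif(n, 20, 120)
  data.frame(
    ch1_mrnas_corr = rpois(n, lambda = exp(0.5 + sin(sprho / 20))),
    genotype = genotype,
    sex = "f",
    rep = rep_id,
    sprho = sprho
  )
}
file.list <- list(a = make_df("r153", 1L), b = make_df("r153", 2L))

# Fit the smooth by column name, then evaluate it at the observed rows so the
# output has one row per input row and the metadata columns still line up.
fit_and_keep <- function(df) {
  fit <- gam(ch1_mrnas_corr ~ s(sprho), data = df)
  pr <- predict(fit, type = "terms", se.fit = TRUE)
  data.frame(
    df[, c("genotype", "sex", "rep", "sprho")],
    # The model has a single smooth term, so each matrix holds one column.
    fitted = as.numeric(pr$fit),
    se = as.numeric(pr$se.fit)
  )
}

out2 <- lapply(file.list, fit_and_keep)

# Optionally stack everything into one data frame; genotype, sex and rep
# identify which file each row came from.
all_curves <- do.call(rbind, out2)
```

If the evenly spaced grid from `plot()` is preferred, the metadata appears to be constant within each file, so another option would be to take those values from the first row of each data frame and attach them to the extracted curve.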

0 Answers