I am pretty new to R and am finding it really powerful. I am trying to create a data frame with multiple levels. I have 3 groups of subjects called "group1", "group2" and "group3", and each subject has 19 tests (the names do not matter). Each of these 19 tests has 7 components, of which I am interested in 3, called b1, b2 and b3.

What I have tried so far: for each patient, the 19 tests form 19 columns with 7 rows (components), so the components I am interested in are rows 5, 6 and 7 for each of the 19 tests:

sapply(x, '[', XXX, j)

My final code is:

for (j in 1:19) {
   x = lapply(c('group1', 'group2', 'group3'), function(i) {
      filenames = dir(file.path('c:/19/7', i),
                 pattern = 'sRL.txt', full.names = T)
      x = lapply(filenames, read.table, sep = '')
      bb1 = sapply(x, '[', 5, j)
      bb2 = sapply(x, '[', 6, j)
      gg = sapply(x, '[', 7, j)
      pcode = as.numeric(gsub('_.*', '', basename(filenames)))
      data.frame(group = i, tests= j, pcode, b1= bb1, b2= bb2, g = gg)
   })
}

Could you please advise how to create a nice data frame that combines components 5, 6 and 7 of the 19 tests for each subject in each group?

gagolews
  • Welcome to SO. Could you provide some sample data to run your code and easily understand the transformation from input to output? Have a look at [how to make a great reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – talat Jun 15 '14 at 13:30

1 Answer

It would be better to provide sample data, but in your case I can see how it would be difficult as the organization of the files is the problem you are trying to deal with.

If I understand you correctly, each patient has their own file with all the tests. Each file has 19 columns (one column per test) and 7 rows (one row per test component). The files (patients) are organized into groups where each group is in a different directory.

As output you want rows 5:7 of each file, transposed so there is one column per component and 1 row for each test, for each patient, in each group.
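To make that target shape concrete, here is a minimal sketch of the transposition on a single made-up table. The values are random stand-ins for one patient file, and the column names b1, b2 and b3 are taken from the question; nothing here reads your real files.

```r
# Toy stand-in for one patient file: 7 components (rows) x 19 tests (columns).
# The values are invented purely for illustration.
set.seed(1)
df <- as.data.frame(matrix(round(rnorm(7 * 19), 2), nrow = 7, ncol = 19))

# Keep components 5 to 7 and transpose, so that each test becomes a row.
wide <- t(df[5:7, ])
long <- data.frame(test = seq_len(ncol(df)), wide, row.names = NULL)
colnames(long)[2:4] <- c("b1", "b2", "b3")

dim(long)  # 19 rows (one per test) by 4 columns (test, b1, b2, b3)
```

Each row of `long` is one test, and each of the three component rows of the original table has become a column.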

I think this has a reasonable shot at doing what you want, although it's impossible to test without your complete dataset...

## not tested...
get.group <- function(g) {
  get.file <- function(x, g) {
    # x is the file path; df holds the 7 x 19 table read from it
    df <- read.table(x, sep="")
    data.frame(group=g, test=1:ncol(df),
               pcode=gsub('_.*', '', basename(x)), t(df[5:7,]))
  }
  filenames <- dir(file.path('c:/19/7', g), pattern = 'sRL.txt', full.names = TRUE)
  # return the stacked per-file data frames for this group
  do.call(rbind, lapply(filenames, get.file, g))
}
final <- do.call(rbind, lapply(paste0("group",1:3), get.group))
colnames(final)[4:6] <- c("bb1","bb2","g")

So, working from the outside in: get.group(...) is called for each group, and the results are bound together row-wise. get.group(...) in turn grabs the file names for that group and calls get.file(...) for each file name, again binding the results together row-wise. Finally, get.file(...) reads one file and creates a data frame with the group name, the test number, the patient code, and the results for the three components of all 19 tests.
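Since the real files are not available, the same outside-in structure can be tried on fake data. The sketch below is only an illustration under stated assumptions: it writes random 7 x 19 tables into a temporary directory, the file names (`101_sRL.txt`, `102_sRL.txt`), the helper names `read_patient` and `read_group`, and the `root` argument are all hypothetical, and the output columns are named b1, b2 and b3 after the question.

```r
# Hypothetical layout: <root>/group1/101_sRL.txt, ... created in a temp dir.
root <- file.path(tempdir(), "demo_19_7")
set.seed(42)
for (g in paste0("group", 1:3)) {
  dir.create(file.path(root, g), recursive = TRUE, showWarnings = FALSE)
  for (p in 1:2) {
    # Fake patient file: 7 components (rows) x 19 tests (columns).
    m <- matrix(round(runif(7 * 19), 3), nrow = 7)
    write.table(m, file.path(root, g, sprintf("%d_sRL.txt", 100 + p)),
                row.names = FALSE, col.names = FALSE)
  }
}

# Innermost step: one file becomes 19 rows (one per test).
read_patient <- function(path, group) {
  df <- read.table(path, sep = "")
  data.frame(group = group,
             test  = seq_len(ncol(df)),
             pcode = as.numeric(sub("_.*", "", basename(path))),
             t(df[5:7, ]),
             row.names = NULL)
}

# Middle step: stack all patient files found in one group directory.
read_group <- function(group, root) {
  files <- dir(file.path(root, group), pattern = "sRL\\.txt$",
               full.names = TRUE)
  do.call(rbind, lapply(files, read_patient, group = group))
}

# Outermost step: stack the three groups.
final <- do.call(rbind, lapply(paste0("group", 1:3), read_group, root = root))
colnames(final)[4:6] <- c("b1", "b2", "b3")

nrow(final)  # 3 groups x 2 patients x 19 tests = 114 rows
```

Passing `root` as an argument rather than hard-coding the path is just a convenience for testing; pointing it at the real directory should be the only change needed, provided the real files are whitespace-separated with no header row.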

jlhoward