
I have a file that is laid out in the following way:

# Query ID 1
# note
# note
tab delimited data across 12 columns
# Query ID 2
# note
# note
tab delimited data across 12 columns

I'd like to import this data into R so that each query is its own data frame, ideally as a list of data frames with the query ID as the name of each item in the list. I've been searching for a while, but I haven't seen a good way to do this. Is this possible? Thanks

Jake
Please provide a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Metrics Feb 22 '15 at 01:55

1 Answer


The example below uses commas instead of tabs to make the data easier to see, and puts the body of the file in a string; apart from those obvious changes it should apply to your file as is. First we use readLines to read in the file. Then we determine where the headers are and create a grp vector that has one element per line of the file, each element being the header of the block that line belongs to. Finally we split the lines by grp and apply Read to each group:

# test data

Lines <- "# Query ID 1
# note
# note
1,2,3,4,5,6,7,8,9,10,11,12
1,2,3,4,5,6,7,8,9,10,11,12
# Query ID 2
# note
# note
1,2,3,4,5,6,7,8,9,10,11,12
1,2,3,4,5,6,7,8,9,10,11,12"

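# Read all lines; for a real file use: L <- readLines("myfile")
L <- readLines(textConnection(Lines))
# Flag the header line that starts each query block
isHdr <- grepl("Query", L)
# Label every line with the header of the block it belongs to
grp <- L[isHdr][cumsum(isHdr)]
# Parse one block; lines beginning with "#" are skipped as comments.
# For tab-delimited data use sep = "\t" instead of sep = ","
Read <- function(x) read.table(text = x, sep = ",", fill = TRUE, comment.char = "#")
Map(Read, split(L, grp))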

giving:

$`# Query ID 1`
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1  1  2  3  4  5  6  7  8  9  10  11  12
2  1  2  3  4  5  6  7  8  9  10  11  12

$`# Query ID 2`
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1  1  2  3  4  5  6  7  8  9  10  11  12
2  1  2  3  4  5  6  7  8  9  10  11  12
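The list names in this output keep the leading "# " from the header lines. If you want the names to be just the query IDs, as the question asks, one possible follow-up is sketched here. It assumes tab-delimited input and header lines that start with "# Query"; the three-column test string and the name dfs are made up for illustration.

```r
# Hypothetical tab-delimited input mirroring the layout in the question
Lines <- "# Query ID 1\n# note\n# note\n1\t2\t3\n4\t5\t6\n# Query ID 2\n# note\n# note\n7\t8\t9"

L <- readLines(textConnection(Lines))   # for a real file: readLines("myfile")
isHdr <- grepl("^# Query", L)           # match only lines that start with "# Query"
grp <- L[isHdr][cumsum(isHdr)]          # header text for every line

# Parse one block of lines; "#" lines are treated as comments and skipped
Read <- function(x) read.table(text = x, sep = "\t", fill = TRUE, comment.char = "#")
dfs <- lapply(split(L, grp), Read)

# Strip the leading "#" and any following whitespace from the names
names(dfs) <- sub("^#\\s*", "", names(dfs))
names(dfs)
```

After this, a single query can be pulled out by name, for example dfs[["Query ID 1"]].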

No packages needed.
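If the grp line is hard to follow, it may help to look at the intermediate values on a toy input. This is only an illustration of the same indexing idea; the six-line input and the name idx are made up here.

```r
# Toy input: two header lines, each followed by a note and one data line
L <- c("# Query ID 1", "# note", "a,b", "# Query ID 2", "# note", "c,d")

isHdr <- grepl("Query", L)   # TRUE only on the two header lines
idx   <- cumsum(isHdr)       # running count of headers seen so far
grp   <- L[isHdr][idx]       # header text repeated for each line of its block

# Show the intermediate vectors side by side
print(data.frame(line = L, isHdr, idx, grp))
```

Because idx increases by one at each header, indexing the header lines by idx gives every line the header of the block it sits in. Note that this relies on the first line of the file being a header.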

G. Grothendieck