
I have multiple files with tab-separated data that look like this:

A 25
B 50
C 10
D 30

What I would like is to invert and combine them, so that the result looks like this:

filename A B C D
file1 25 50 10 30
file2 20 15 0 10
file3 60 20 30 0

As you can see there are some files that have missing data (file2 lacks a value for C so there is no row C in that file). I would like to have any missing columns reported as 0.

I tried using `data = lapply(filelist, read.table, sep = "\t")`, but this just gives me:

data
[[1]]
         V1      V2
1         C   27660
2         B       4
3         E   40128
4         D    4584
5         G   43078

[[2]]
         V1      V2
1         C   31530
2         E   47978
3         D    5268
4         G   54636

This is not what I want. I want the letters to be the columns and the file names to be the rows.

helicase
  • If the files don't all have the same number of columns, how is your example file sufficient information for us to use to craft a solution? Wouldn't it make sense to provide an example that actually represents your files? – joran Feb 22 '12 at 15:58
  • Read your files and then use `merge` – Andrie Feb 22 '12 at 16:44
  • can merge be used with more than 2 files? – helicase Feb 22 '12 at 17:58
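
On the last comment: `merge` itself takes two data frames at a time, but it can be folded over a list with `Reduce`. The following is only a sketch under assumptions: the list `d` stands in for the files that would be read in (its values are taken from the desired output in the question), and the names `named`, `merged`, `m` and `wide` are made up for illustration.

```r
# Stand-in for the files that would be read in (values from the question's example)
d <- list(
  file1 = data.frame(V1 = c("A", "B", "C", "D"), V2 = c(25, 50, 10, 30)),
  file2 = data.frame(V1 = c("A", "B", "D"),      V2 = c(20, 15, 10)),
  file3 = data.frame(V1 = c("A", "B", "C"),      V2 = c(60, 20, 30))
)

# Rename each value column after its file so the columns stay distinct after merging
named <- lapply(names(d), function(n) setNames(d[[n]], c("V1", n)))

# Fold merge() over the list; all = TRUE keeps labels missing from some files
merged <- Reduce(function(x, y) merge(x, y, by = "V1", all = TRUE), named)

# Labels absent from a file come back as NA; report them as 0
merged[is.na(merged)] <- 0

# Transpose so that files are rows and labels are columns
m <- t(merged[-1])
colnames(m) <- as.character(merged$V1)
wide <- data.frame(filename = rownames(m), m, row.names = NULL)
wide
```

With many files, the repeated merges can be slower than stacking everything in long format and reshaping once.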

2 Answers


You can add the file name in a new column of the data.frames, concatenate them, and reshape the result.

# Not run:
# d <- lapply(filelist, read.table, sep = "\t")
# names(d) <- filelist

# Use sample data instead
d <- list(
  file1 = data.frame( V1 = sample(LETTERS, 10), V2 = rpois(10,10) ),
  file2 = data.frame( V1 = sample(LETTERS, 10), V2 = rpois(10,10) ),
  file3 = data.frame( V1 = sample(LETTERS, 10), V2 = rpois(10,10) )
)

# Add the file name as a column
for(i in names(d)) {
  d[[i]] <- data.frame( file=i, d[[i]] )
}

# Concatenate everything
d <- do.call(rbind, d)

# Convert to wide format
library(reshape2)
dcast(d, file ~ V1, value.var = "V2", fill = 0)
Vincent Zoonekynd

You can probably use the plyr package's rbind.fill() function. Basically you would read in your files, transpose them with t(), and then use rbind.fill to join them all up into one big data frame.
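
A minimal sketch of this idea, under some assumptions: the list `d` stands in for the files that would be read in (its values are taken from the desired output in the question), and the names `rows` and `wide` are made up for illustration. Calling `t()` directly on a two-column data frame gives a character matrix without useful column names, so the sketch instead builds a one-row data frame per file, named by the labels. `rbind.fill()` fills missing columns with `NA`, so those are replaced by 0 afterwards.

```r
library(plyr)

# Stand-in for the files that would be read in (values from the question's example)
d <- list(
  file1 = data.frame(V1 = c("A", "B", "C", "D"), V2 = c(25, 50, 10, 30)),
  file2 = data.frame(V1 = c("A", "B", "D"),      V2 = c(20, 15, 10)),
  file3 = data.frame(V1 = c("A", "B", "C"),      V2 = c(60, 20, 30))
)

# "Transpose" each file: one row, with the labels as column names
rows <- lapply(d, function(x) {
  as.data.frame(as.list(setNames(x$V2, as.character(x$V1))))
})

# Stack the rows; columns missing from a file are filled with NA
wide <- rbind.fill(rows)

# Report missing values as 0, then add the file names
wide[is.na(wide)] <- 0
wide <- data.frame(filename = names(d), wide)
wide
```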

jonchang