1

Three text files are in the same directory ("data001.txt", "data002.txt", "data003.txt"). I write a loop to read each data file and generate three data tables;

for(i in files) {
x <- read.delim(i, header = F, sep = "\t", na = "*")
setnames(x, 2, i)
assign(i,x)
}

So let's say each individual table looks something like this:

       var1    var2    var3
row1   2       1       3

I've used rbind to combine all of the tables...

combined <- do.call(rbind, mget(ls(pattern="^data")))

and get something like this:

       var1    var2    var3
row1   2       1       3
        var1    var2    var3
row1   3       2       4   
       var1    var2    var3
row1   1       3       5

leaving me with superfluous column names. At the moment I can get around this by just deleting that specific row containing the column names, but it's a bit clunky.

colnames(combined) = combined[1, ]      # make the first row the column names
combined <- combined[-1, ]              # delete the now-unnecessary first row
toDelete <- seq(1, nrow(combined), 2)   # define which rows to be deleted i.e. every second odd row
combined <- combined[ toDelete ,]       # delete them suckaz

This does give me what I want...

       var1    var2    var3
row1   2       1       3
row1   3       2       4   
row1   1       3       5

But I feel like a better way would simply be to extract the values of "row1" as a vector or as a list or whatever, and combine them all together into one data table. I feel like there is a quick and easy way to do this but I haven't been able to find anything yet. I've had a look here and here and here.

One possibility is to take the second row (that I want), and convert it into a matrix (then transpose it to make it a row instead of column!?) and rbind:

data001.txt <- as.matrix(data001.txt[2,])
data001.txt <- t(data001.txt)
combined <- rbind(data001.txt, data002.txt)

This gives me more or less what I want except without the column name headers (e.g. va1, var2, var3).

   v1      v2      v3
   2       1       3
   3       2       4

Any ideas? Would this second method work well if there is some way to add the column names? I feel like it's less clunky than the first method. Thanks for any input :)

edit - solved in answer below.

Community
  • 1
  • 1
heds1
  • 3,203
  • 2
  • 17
  • 32
  • You could use of the `skip` argument in `read.delim` to skip rows for 2 of the files and only take headers of the first one. – JohnCoene May 17 '17 at 04:55
  • You could read directly into a list and then take advantage of the functions which work on lists, like `lapply`, `sapply`... – Roman Luštrik May 17 '17 at 08:01
  • How about setting header = TRUE in `read.delim`? I tried to reproduce your problem with built-in airquality data, but no such issue happened. Theoretically, `rbind` will not replicate the same column names, unless R regards them as parts of data instead of names. – ytu May 17 '17 at 10:10
  • Hi, thanks for the comments everyone. @SocialFunction: unfortunately the data file is oriented such that I have to import it as is, then remove most of the columns (there are about 22 unrequired columns), then transform it (columns to rows and vice versa). So skipping x amount of rows of the data file would delete some of the data I need. – heds1 May 17 '17 at 21:21
  • @RomanLuštrik: same issue here, sorry that I wasn't clear in the post - there's a lot of data that I have to remove after I import it so I don't think importing it as a list will work, although I'm new to R so there's probably a way. I can convert the processed dataframe to a list: `listdata001 <- split(data001.txt, seq_len(nrow(data001.txt)))` but then if I try to rbind for example the second row of that list to a different list, e.g. `combinedlist <- rbind(data002.txt, listdata001[2])` then it doesn't really work either. I think the issue is that the list values are characters – heds1 May 17 '17 at 22:19
  • To clarify, I can get the list values as numbers with your `lapply` suggestion: `values <- lapply(listdata001, as.numeric)` but then trying to rbind that to another data table doesn't proceed as I expected (i.e. each value goes to its corresponding column); instead, the entire list is repeated in each column. – heds1 May 17 '17 at 22:40
  • @Cory - The eventual column names are row names in the initial file, so I can set `row.names = 1` and `header = TRUE` while reading the file, but what I end up with is it missing out the first row name for some reason. If I do row.names = 1 and header = F, that works but I get an extra row of zeroes for some reason; then `rbind` works but the data table ends up with a row of zeroes every two rows so I'm back to square 1 with having to delete every second row after the fact. – heds1 May 17 '17 at 23:09
  • I see. I can imagine why row.names = 1 and header = TRUE miss the row names (probably because the row names in your file have a header as well), but I don't see why row.names = 1 and header = FALSE give you extra rows of zeros. Do you have some pseudo-data that people can reproduce your problem with? – ytu May 18 '17 at 02:40
  • Whip up an answer and post it as such. – Roman Luštrik May 18 '17 at 08:42

1 Answers1

0

Figured it out. Converting to data matrix and using set.names from data.table package required. Say you have a range of text data files like the one that follows, and you want to extract just the seventh column (the one with the numbers, not letters), and combine them together in their own data table including the row names:

chemical1 a b c d e 1 g h i j k l m 
chemical2 a b c d e 2 g h i j k l m 
chemical3 a b c d e 3 g h i j k l m 
chemical4 a b c d e 4 g h i j k l m 
chemical5 a b c d e 5 g h i j k l m 

setting row.names = 1 and header = F.

setwd("directory") 
files <- list.files(pattern = "data")  # take all files with 'data' in their name
for(i in files) {
    x <- read.delim(i, row.names = 1, header = F, sep = "\t", na = "*")
    setnames(x, 6, i)    # if the data you want is in column six. Sets data file name as the column name.
    x <- as.matrix(x[6]) # just take the sixth column with the numeric data (delete everything else)
    x <- t(x)            # transform (if you want..)
    assign(i,x)
    }

combined <- do.call(rbind, mget(ls(pattern="^data")))    # combine the data matrices into one table
write.table(combined, file="filename.csv", sep=",", row.names=T, col.names = NA)
heds1
  • 3,203
  • 2
  • 17
  • 32