How to select the same columns from many files

Question

I have many text files. I want to load them all and then build a new matrix from certain columns of every file.

For example, some of the matrices look like this (the names are backquoted because an R name cannot start with a digit):

`1a` <- replicate(10, rnorm(20))
`1b` <- replicate(10, rnorm(19))
`2a` <- replicate(10, rnorm(18))
`2b` <- replicate(10, rnorm(15))

To identify them, I put them all in one folder, set my working directory there, and then list them like this:

filelist = list.files(pattern = ".*.txt")

Then I want to put the first column of 1a, together with its V6 and V7, in a new matrix. I want to do the same with V6 and V7 from 1b, then from 2a, then from 2b.

The files do not have the same length (their row counts differ). I would like to do two things:

1- Save each file with only the selected columns, adding an R to the name. For example, if the original file is 1a, select V6 and V7 and save a new file with only those 2 columns, named 1aR.

2- Make a new matrix and put all the selected columns in it (where the lengths are not equal, the gaps can be filled with NA or 0).

@Heroka did you read my question fully? is it the same as that question???????? I am sorry but the title is similar — koskesh kiramtodahanet, Feb 21 '16 at 14:41
In my opinion, at least the first part of your question is answered by the duplicate. For the second part, you might need to do manual subsetting/column selecting as you need a different column from each. — Heroka, Feb 21 '16 at 14:43
@Heroka you reply when I need to pick up the same columns from all. if you reply that I will accept. if you dont know, please don't just down vote a question. There are many people out there who are expert in R. you cannot just down vote a question, if you cannot reply !!! — koskesh kiramtodahanet, Feb 21 '16 at 14:45
@Heroka I changed my question , the first part I removed and now I did it the way you said you can do. — koskesh kiramtodahanet, Feb 21 '16 at 14:54
Given the new question (load 100 files, subset the 6th and 7th column for each), look into how for-loops or how apply functions work. Here is a slightly more complicated version of your question: http://stackoverflow.com/questions/27437144/how-to-automate-subsetting-multiple-files-using-r — slamballais, Feb 21 '16 at 15:42
You may try `filelist <- list.files(pattern='\\d+[^0-9]+\\.txt', full.names=TRUE); lst <- lapply(filelist, read.csv, header=TRUE, stringsAsFactors=FALSE); cbind(lst[[1]][1], do.call(cbind, lapply(lst, `[`, 6:7)))` — akrun, Feb 21 '16 at 16:24
@akrun thanks do you load the txt file with read.csv? or fread or read.table? however, I would like to know what if the length of the rows where not equal ? Please check the question again. because I want the question to be different that people dont down vote it :-( — koskesh kiramtodahanet, Feb 21 '16 at 16:29
@koskeshkiramtodahanet You can use either one of the options to load the data. With `fread`, it would be fast though. If the lengths of the rows are not equal, the `cbind` option will not work. Then, we may need to populate with `NA` rows so that the nrow will be equal for all. — akrun, Feb 21 '16 at 16:36
@koskeshkiramtodahanet I posted a solution below. Hope it helps you. — akrun, Feb 21 '16 at 16:51
Just a doubt, do you want to `cbind` the selected columns only in the `1a` and `1b` together, likewise, only `2a` and `2b` together? In that case, there will be 2 output dataset. — akrun, Feb 21 '16 at 16:55
@akrun no all together but in the right way. 1a then 1b then 2a then 2b and also I want to save each file separated saved too — koskesh kiramtodahanet, Feb 21 '16 at 17:00
@koskeshkiramtodahanet Can you check the solution below. If I understand, you want a single file that contains the 6th and 7th column from all the dataset and then save it as a separate file. — akrun, Feb 21 '16 at 17:11

akrun · Accepted Answer · 2016-02-21T17:30:42.790

Here is an option to read the files, select the concerned columns from the dataset, and create a new dataset.

We get the files that follow a particular file name pattern in the working directory using list.files.

filelist <- list.files(pattern='\\d+[^0-9]+\\.txt', full.names=TRUE)
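To see how this pattern differs from the `".*.txt"` used in the question, here is a small sketch that tests both against a handful of made-up file names (the names are hypothetical and only serve the comparison). Neither pattern is anchored, so both can match in the middle of a name. The `anchored` variant is an assumption about what is probably intended, not part of the original answer.

```r
# Hypothetical file names, used only to compare the patterns
files <- c("1a.txt", "2b.txt", "notes.txt", "1a.txt.bak", "10.txt", "atxt")

loose    <- ".*.txt"                # the unescaped '.' matches any character
strict   <- "\\d+[^0-9]+\\.txt"     # digits, then non-digits, then a literal ".txt"
anchored <- "^\\d+[^0-9]+\\.txt$"   # assumption: same idea, tied to start and end

loose_hits    <- files[grepl(loose, files)]     # every name above, even "atxt"
strict_hits   <- files[grepl(strict, files)]    # "1a.txt" "2b.txt" "1a.txt.bak"
anchored_hits <- files[grepl(anchored, files)]  # "1a.txt" "2b.txt"

print(loose_hits)
print(strict_hits)
print(anchored_hits)
```

`list.files` applies the pattern to file names in the same way, so the anchored form may be worth considering if the folder also holds backups or other files whose names contain `.txt`.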

Then, read all the files into a list using either read.csv/read.table or fread from data.table

lst <- lapply(filelist, read.csv, header=TRUE, stringsAsFactors=FALSE)

Extract the columns named V6 and V7 (the 6th and 7th in the question's files) from each element of 'lst'

lst1 <- lapply(lst, "[", c("V6", "V7"))
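The names `V6` and `V7` only exist if the files carry them as a header or if they are read without a header, in which case `read.table` assigns `V1`, `V2`, and so on. A minimal sketch of the second case, assuming a header-less, whitespace-separated file (the temporary file and its contents are made up for illustration):

```r
# Write a small header-less file with 7 columns to a temporary location
tmp <- tempfile(fileext = ".txt")
write.table(matrix(1:14, nrow = 2), tmp, row.names = FALSE, col.names = FALSE)

# Without a header, read.table names the columns V1 ... V7
df <- read.table(tmp, header = FALSE)
print(names(df))

# Selecting by name and by position gives the same result here
by_name <- df[c("V6", "V7")]
by_pos  <- df[6:7]
print(identical(by_name, by_pos))
```

Selecting by name is the safer choice when the column order may differ between files, as it does in the reproducible data at the end of this answer.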

If the data.frame elements in the list have unequal numbers of rows, one option is cbind.fill from library(rowr), which pads the shorter ones with the fill value

library(rowr)
res <- cbind.fill(lst[[1]][1], do.call(cbind.fill, 
           c(lst1, list(fill=NA))), fill=NA)
res 
#   V1 V6 V7 V6.1 V7.1
#1  21  1 11    1   11
#2  22  2 12    2   12
#3  23  3 13    3   13
#4  24  4 14   NA   NA
#5  25  5 15   NA   NA
#6  26  6 16   NA   NA
#7  27  7 17   NA   NA
#8  28  8 18   NA   NA
#9  29  9 19   NA   NA
#10 30 10 20   NA   NA

Then, we write the file as .txt

write.table(res, 'CombinedV6_V7.txt', row.names=FALSE, quote=FALSE)
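If `rowr` is not available in your R installation, the same padding can be sketched in base R. This is an alternative to `cbind.fill`, not the method used above. `pad_rows` is a hypothetical helper, and the toy data mirrors the reproducible example given at the end of this answer.

```r
# Toy data mirroring the answer's reproducible example
lst <- list(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
            data.frame(V6 = 1:3, V7 = 11:13, V1 = 21:23))
lst1 <- lapply(lst, "[", c("V6", "V7"))

# Hypothetical helper: extend a data.frame to n rows, filling with NA.
# Indexing rows beyond nrow(df) yields NA rows.
pad_rows <- function(df, n) {
  out <- df[seq_len(n), , drop = FALSE]
  rownames(out) <- NULL
  out
}

n_max  <- max(vapply(lst1, nrow, integer(1)))
padded <- lapply(lst1, pad_rows, n = n_max)

# First column of the first file, then V6/V7 from every file
res <- cbind(pad_rows(lst[[1]]["V1"], n_max), do.call(cbind, padded))
print(res)
```

Unlike the `cbind.fill` output shown above, the repeated column names are left as `V6`, `V7`, `V6`, `V7`. Renaming them (for example with the file names as prefixes) would make the combined table easier to read.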

Update

Using the data from the link

lst <- lapply(filelist, read.csv, sep='\t',
              header=TRUE, stringsAsFactors=FALSE)
lst1 <- lapply(lst, "[", c("Time", "X220"))
res <- do.call(cbind.fill, c(lst1, list(fill=NA)))
head(res)
#   Time   X220  Time   X220  Time  X220   Time  X220
#1 0.700    111 1.400   2370 0.850   520  1.600 21216
#2 2.083 131747 1.650 179289 1.633 54607  1.900  3816
#3 2.517  23428 2.100  21690 2.117 13677  2.117  3573
#4 2.667  12528 2.267  10383 2.267 13448  2.300 11349
#5 3.883   1055 3.017    816 3.567  1346  9.717   292
#6 4.500    881 3.383    637 5.350   772 21.600  3774

data

lst <- list(data.frame(V1=21:30, V6=1:10, V7=11:20),
            data.frame(V6=1:3, V7=11:13, V1=21:23))

NOTE: The above data is just for reproducing the problem.
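The question also asks to save each reduced file under its original name with an added `R` (for example `1a` becomes `1aR`). A minimal sketch of that step, assuming `lst1` holds the selected columns in the same order as `filelist`. The paths below point to a temporary folder and are hypothetical stand-ins for the real result of `list.files`.

```r
# Hypothetical inputs; in practice filelist comes from list.files(..., full.names = TRUE)
filelist <- file.path(tempdir(), c("1a.txt", "1b.txt"))
lst1 <- list(data.frame(V6 = 1:3, V7 = 11:13),
             data.frame(V6 = 4:5, V7 = 14:15))

# "1a.txt" -> "1aR.txt", written next to the original file
out_names <- file.path(
  dirname(filelist),
  paste0(tools::file_path_sans_ext(basename(filelist)), "R.txt")
)

# Write each reduced data.frame to its own file
invisible(Map(function(df, path) {
  write.table(df, path, row.names = FALSE, quote = FALSE)
}, lst1, out_names))

print(basename(out_names))
```

Building the output path from `dirname(filelist)` keeps the new files in the folder the originals were read from, regardless of the current working directory.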

edited Feb 21 '16 at 17:30

answered Feb 21 '16 at 16:47


@koskeshkiramtodahanet Can you check the `str(lst)`? I have to use `sep="\t"` as the delimiter. It is working for me after that change. Please check the updated solution – akrun Feb 21 '16 at 17:31
I have only a few questions, to learn. What is the difference between using ".*.txt" and "\\d+[^0-9]+\\.txt"? What if I set recursive=TRUE there (based on the previous comments you posted)? And what does the last statement, lst <- list, do? – koskesh kiramtodahanet Feb 21 '16 at 17:38
@koskeshkiramtodahanet I used the `\\d+[^0-9]+` to make it a bit more specific as it seems like your file names start with one or more numbers followed by one or more characters followed by `.txt`. The last one `lst <- list(..` was just to create a reproducible data in case anybody wants to test it. – akrun Feb 21 '16 at 17:42
thank you so much for the explanation and your amazing help. I accepted it. However, I would like to know something (first part of my question) I read the files in a listfiles and I select two columns from each file. is it possible to save them as a new file automatically ? in the same folder ? or I must do it manually – koskesh kiramtodahanet Feb 21 '16 at 17:46
@koskeshkiramtodahanet We can save them as new files. For example, try `nm1 <- paste0('new', basename(filelist)); invisible(lapply(seq_along(lst1), function(i) write.table(lst1[[i]], nm1[i], row.names=FALSE, quote=FALSE)))` – akrun Feb 21 '16 at 17:51
this will create 4 files with only those selected columns in them? – koskesh kiramtodahanet Feb 21 '16 at 17:52
@koskeshkiramtodahanet Yes, it should create 4 files with the selected columns, as `lst1` is a list of data.frames with those columns. If the working directory is different from the one where you loaded the files, then you may need to specify the `path` with `paste` to save it in that directory. – akrun Feb 21 '16 at 17:53
