
I have a data set in txt format consisting of 31968 files, each containing 365 values in one column. I want to combine every 48 consecutive files, so that each combined result holds 48 × 365 = 17520 values.

The inputs look like this:

    a = (x1, x2, …, x48)
    b = (x49, x50, …, x96)

The expected outputs look like this:

    a = (1, 2, 3, 4, …, 17520)
    b = (1, 2, 3, …, 17520)

How can I load all 31968 files and carry out this task in R?

irfan
  • Did you read the files into R? What are `X1, X2, ...`? If X1, X2, etc. are columns of a dataset, try `unlist(dat[paste0("X", 1:48)], use.names=FALSE)` – akrun Jun 14 '16 at 03:43
  • Are the files named in a nice way to make it predictable which ones should be combined? You should probably begin by reading them in [as a list of data frames](http://stackoverflow.com/a/24376207/903061). – Gregor Thomas Jun 14 '16 at 03:47

2 Answers

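in_dir  <- 'input files location'   # directory holding only the input files
out_dir <- 'output location'
file_names <- list.files(in_dir)
# Step through the files 48 at a time, using separate loop variables
for (start in seq(1, length(file_names), by = 48)) {
    end <- min(start + 47, length(file_names))
    p <- data.frame()
    # start:end, not i:i+47, which R parses as (i:i)+47 (a single index)
    for (j in start:end) {
        d <- read.table(file.path(in_dir, file_names[j]), col.names = c("id"), strip.white = TRUE)
        p <- rbind(p, d)
    }
    # Name the output after the last file read in this group
    write.table(p, file.path(out_dir, paste(file_names[end], ".txt", sep = "")), row.names = FALSE)
}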

Make sure that all your input files are in a single directory and that no other files are present in it. Each output file is named after the 48th (last) file read in its group of 48 files.

Raviteja Reddy
  • @Reddy, I got this error from your proposed solution: `Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file 'NA': No such file or directory` – irfan Jun 14 '16 at 14:25
  • Can you post the code you tried here, and tell me which line gave you that error? – Raviteja Reddy Jun 14 '16 at 14:52
  • `setwd('') file_names = list.files() for(i in 1:length(file_names)){ p = data.frame() for(i in i:i+47){ setwd('input location) d = read.table(file_names[i],col.names=c("id"),strip.white=TRUE) p = rbind(p,d) } setwd('output location') write.table(p,paste(file_names[i],".txt",sep=""),row.names=F) }` I could not understand the "file, rt" shown in the error; it probably occurs in the second-to-last command. – irfan Jun 14 '16 at 15:15
  • Replace the placeholder values in setwd() with your actual locations. See the following link for how setwd() works. You need to change the locations to the required ones in your code; an exact copy-paste will not work. [link](http://www.r-bloggers.com/r%E2%80%99s-working-directory/) – Raviteja Reddy Jun 14 '16 at 18:46
  • Yes, I did the same as described in the link above, but the error is still there. – irfan Jun 15 '16 at 03:56
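
A possible explanation for the `cannot open file 'NA'` error, assuming the code was run as posted: in R the `:` operator binds more tightly than `+`, so `i:i+47` is evaluated as `(i:i)+47`, a single index that runs past the end of `file_names` for the last files. The sketch below illustrates this with a hypothetical 100-element vector standing in for the real file list.

```r
# Hypothetical stand-in for the real file list
file_names <- sprintf("file_%03d.txt", 1:100)

i <- 60
idx_wrong <- i:i+47      # parsed as (i:i) + 47, i.e. the single value 107
idx_right <- i:(i + 47)  # the intended 48 consecutive indices

length(idx_wrong)        # 1
file_names[idx_wrong]    # NA, because index 107 is past the end of the vector
length(idx_right)        # 48

# Group starts for non-overlapping blocks of 48: 1, 49, 97
starts <- seq(1, length(file_names), by = 48)
```

Passing that `NA` to `read.table()` would produce exactly the "cannot open file 'NA'" message quoted in the comments.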

Since 31968/48 gives 666, create a list of 666 vectors, each containing 48 file names.

file_names <- list.files(path=".", pattern="\\.txt") # change the path to the directory where the files are kept
list_of_files <- lapply(1:666, function(x) file_names[((x-1)*48 + 1):((x-1)*48 + 48)])
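
The index arithmetic can be checked on a small case. The sketch below uses hypothetical file names and a group size of 3 instead of 48, and compares the result with an equivalent grouping via `split()`, which does not need the number of groups hard-coded.

```r
file_names <- sprintf("f%02d.txt", 1:9)   # hypothetical names
group_size <- 3                            # 48 in the question

# Same index formula as above, with the group size as a variable
n_groups <- length(file_names) / group_size
by_index <- lapply(seq_len(n_groups), function(x)
    file_names[((x - 1) * group_size + 1):(x * group_size)])

# Equivalent grouping: assign each position a group number, then split
by_split <- unname(split(file_names, ceiling(seq_along(file_names) / group_size)))

identical(by_index, by_split)
```

Note that `list.files()` returns names sorted alphabetically, so the groups follow the intended sequence only if the file names sort that way (for example, zero-padded numbers).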

Read each group of files into R as list_of_data, then use do.call with rbind to combine the group into a single data.frame.

for(i in 1:666){
    list_of_data <- lapply(list_of_files[[i]], read.table, sep="\t") # put in appropriate read.table parameters for the text files
    assign(paste0("a", i), do.call(rbind, list_of_data))
    }
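
As a variation, the combined data.frames can be kept in one list rather than created as separate objects with `assign()`. This is only a sketch under stated assumptions: the temporary directory, the file names, the column name `value`, and the group size of 3 are all made up for illustration.

```r
# Create a small hypothetical data set: 6 files with 4 values each
tmp_dir <- file.path(tempdir(), "combine_demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (k in 1:6) {
    writeLines(as.character(k * 10 + 1:4),
               file.path(tmp_dir, sprintf("file_%02d.txt", k)))
}

file_names <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
group_size <- 3   # 48 in the question
groups <- split(file_names, ceiling(seq_along(file_names) / group_size))

# One combined data.frame per group, kept together in a list
combined <- lapply(groups, function(files) {
    do.call(rbind, lapply(files, read.table, col.names = "value"))
})

length(combined)      # number of groups
nrow(combined[[1]])   # group_size * values per file
```

Keeping the results in a list means they stay in group order and can be passed directly to `do.call(rbind, combined)` later.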

Alternative, with zero-padded object names so that they sort in sequence:

for(i in 1:666){
    list_of_data <- lapply(list_of_files[[i]], read.table, sep="\t")
    assign(sprintf("a.%03d", i), do.call(rbind, list_of_data))
    }

This should create 666 objects, e.g.

"a.001" "a.002" "a.003" "a.004" "a.005" "a.006" "a.007" "a.008" "a.009" "a.010" "a.011"
"a.012" "a.013" "a.014" "a.015" "a.016" "a.017" "a.018" "a.019" "a.020" "a.021" "a.022"
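
The zero-padding matters because `ls()` and `sort()` order names as text, so `a.10` comes before `a.2`. A small sketch of the difference, using illustrative names only:

```r
plain  <- paste0("a.", 1:12)
padded <- sprintf("a.%03d", 1:12)

sort(plain)    # text order: "a.1" "a.10" "a.11" "a.12" "a.2" ...
sort(padded)   # "a.001" "a.002" ... "a.012", the intended sequence

# If unpadded names already exist, order them by their numeric suffix instead
plain_sorted <- plain[order(as.integer(sub("^a\\.", "", plain)))]
```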

To merge all 666 data.frames:

frames <- grep("^a[.]", ls(), value=TRUE) # anchored so only names starting with "a." match
library(plyr)
output <- ldply(frames, get)
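
A base-R sketch of the same merge, followed by an export to CSV. The three small data.frames and the output path are hypothetical stand-ins; with the real data the `a.001` to `a.666` objects would already exist.

```r
# Hypothetical stand-ins for the a.001 ... a.666 objects
a.001 <- data.frame(V1 = 1:3)
a.002 <- data.frame(V1 = 4:6)
a.003 <- data.frame(V1 = 7:9)

# Collect the object names in sequence, then fetch them into one list
frames <- sort(grep("^a[.][0-9]+$", ls(), value = TRUE))
frame_list <- mget(frames)

output <- do.call(rbind, frame_list)
rownames(output) <- NULL   # drop the "a.001.1"-style row names added by rbind

out_file <- file.path(tempdir(), "combined.csv")   # hypothetical output path
write.csv(output, out_file, row.names = FALSE)
```

`write.table()` can be used in the same way for a plain txt export.
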
Adam Quek
  • @Adam, thank you, this command works correctly. One more thing: could you suggest an easy and fast command to merge all 666 data.frames and export them to csv, txt or xlsx format? – irfan Jun 15 '16 at 03:59
  • Assuming that you have 666 data.frames, `a.1, a.2, a.3, ..., a.666`: (i) create a vector with the names of the data.frames with `frames <- grep("a[.]", ls(), value=TRUE)`; (ii) put them in one list with `list_of_frames <- lapply(frames, function(x) get(x))`; (iii) use do.call to rbind the list_of_frames with `dat <- do.call(rbind, list_of_frames)` – Adam Quek Jun 15 '16 at 04:05
  • Export dat to csv with write.csv; dat to txt with write.table. Not going to help with exporting into xlsx format. – Adam Quek Jun 15 '16 at 04:06
  • @Adam, it did export the data properly, but there were two issues: (1) because of names like (a1, a2, a3, a4, ..., a666), the files do not appear in sequence but rather like this: (a1, a10, a100, a101, ...); (2) as a result of the last command, list_of_frame_with_dat, I got only 189 frames instead of 666. Thank you in advance for a prompt response. – irfan Jun 15 '16 at 07:23
  • LOL. I neglected the weird R behaviour of treating 10 as the next number after 1. Gonna dig into my notes for this... – Adam Quek Jun 15 '16 at 07:56
  • I really can't recall how to force R to read 1, 2, 3 instead of 1, 10, 100, 101, 2. Anyway, I added an alternative to the answer above that forces the object names to 3 digits. The grep should then pick up the data.frames in sequence. As for your 2nd question, I can't really answer without knowing what your output looks like, so I have given an ldply option in the answer above. Hope that works better. – Adam Quek Jun 15 '16 at 08:12