I'm working on something that I think is pretty easy to solve, but I can't put all the necessary steps together.

Situation

I have a directory of .txt files (~17,000 files, ~20 GB in total). They are all structured more or less the same way (each represents a bill of materials) and share a common separator.

What I want to do now is load all of these files into a single list whose second level again consists of lists, each named after its file. Within each second-level list, the content of the text file should be stored as a data frame.

So to make it more explicit, I have files like this:

A1.txt
B1.txt
C1.txt

All have more or less the same structure but different content (BOM). The design of the list should be something like this:

list of list
  |list(named A1.txt)
     $ Content of A1.txt as dataFrame
     $ Content of other files concerning A1.txt (dataframe)
     $ Content of other files concerning A1.txt (other, e.g. timeseries)
  |list(named B1.txt)
     $ Content of B1.txt as dataFrame
     $ Content of other files concerning B1.txt (dataframe)
     $ Content of other files concerning B1.txt (other, e.g. timeseries)
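
For illustration, the target structure above could be built by hand in R roughly as follows. This is only a sketch with made-up toy data; the element names `bom`, `related`, and `timeseries` are hypothetical labels, not something defined in the question.

```r
# Toy stand-ins for the parsed file contents (made-up values).
a1_bom <- data.frame(part = c("P1", "P2"), qty = c(2, 5), stringsAsFactors = FALSE)
a1_related <- data.frame(status = "released", stringsAsFactors = FALSE)
a1_series <- ts(c(1.0, 1.5, 2.0))

b1_bom <- data.frame(part = "P9", qty = 1, stringsAsFactors = FALSE)

# Outer list is keyed by file name; each inner list holds that file's objects.
list_of_lists <- list(
  "A1.txt" = list(bom = a1_bom, related = a1_related, timeseries = a1_series),
  "B1.txt" = list(bom = b1_bom)
)

# Access by file name, then by element name.
list_of_lists[["A1.txt"]]$bom
str(list_of_lists, max.level = 2)
```

Because the inner elements are ordinary list entries, a data frame and a time series can sit side by side under the same file name.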

I am quite new to R, so all I have so far is a data frame of a text file's contents inside a list, without the naming and nesting I would like to achieve. Here is the code so far:

library(dplyr)  # provides %>% and select()

list_of_lists <- list(1, 2)

# `file` is the path to a single .txt file
content <- read.delim(file, sep = "|", header = FALSE, stringsAsFactors = FALSE) %>%
  select(V1, V2, V3, V8, V14)
list_of_lists[[1]] <- as.data.frame(content)

Thank you in advance for your help.

SandPiper
rued
  • Use `list.files` to get all file names; then `lapply` to read all those files into a list and finally assign the file names to the list using `names` or `setNames`. Check this related Q&A: https://stackoverflow.com/a/9565095/3521006 – talat Aug 02 '17 at 14:31
  • Thank you for the link. I tried it, and it now looks like this: `filenames <- list.files(dir, pattern = ".txt", full.names = TRUE)` `ldf <- lapply(filenames, read.delim, sep = "|", stringsAsFactors = FALSE)`. When I start the script it runs for a very long time, and when I abort it I get this warning: `Warning message: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string`. I didn't get this when I tried it on a single file. Am I too impatient, or did I forget something in the `lapply` call? Thank you for the help – rued Aug 02 '17 at 15:15
  • What does `$ Content of other files concerning A1.txt (dataframe)` mean? – Carl Boneri Aug 02 '17 at 16:22
  • Hey, it means that there is other information or data for the "product" A1 which is not the bill of materials, such as status data or time logs, br – rued Aug 03 '17 at 07:09
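
The approach suggested in the comments could be sketched as below. It is a minimal sketch under assumptions: `read_bom` and the element name `bom` are hypothetical, the demo writes two tiny files to a temporary directory in place of the real one, and `quote = ""` is a guess at the cause of the `EOF within quoted string` warning (a stray quote character in the data), which may or may not apply to these files.

```r
# Demo directory with two tiny pipe-separated files standing in for the real data.
dir <- file.path(tempdir(), "bom_demo")
dir.create(dir, showWarnings = FALSE)
writeLines(c("P1|bolt 5\" long|2", "P2|nut|5"), file.path(dir, "A1.txt"))
writeLines("P9|washer|1", file.path(dir, "B1.txt"))

# Hypothetical helper: read one file. quote = "" disables quote handling, so a
# stray " or ' inside a field is kept as plain text.
read_bom <- function(path) {
  read.delim(path, sep = "|", header = FALSE, quote = "",
             stringsAsFactors = FALSE)
}

# "\\.txt$" matches only names ending in .txt; in a plain ".txt" pattern the
# dot would match any character.
filenames <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# One inner list per file, holding the data frame under the name "bom".
list_of_lists <- lapply(filenames, function(f) list(bom = read_bom(f)))

# Name the outer list by file name (without the directory part).
list_of_lists <- setNames(list_of_lists, basename(filenames))

str(list_of_lists, max.level = 2)
```

With roughly 17,000 files, a first run on a small subset such as `head(filenames, 10)` may help tell a parsing problem apart from plain slowness before the full 20 GB is read.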

0 Answers