I am processing all files in a directory. I want to collect metadata for each filename in a data frame and, once the directory has been processed, load the data frame into an RSQLite table.

Ref: https://stackoverflow.com/a/51913491/9410024 and maybe https://stackoverflow.com/a/45522323/9410024

I don't understand the warning messages and why the filenames haven't been loaded into the dataframe:

setwd('C://tst//')
df <- data.frame("filename"= character(0), stringsAsFactors=FALSE)
for (fn in Sys.glob("tst*.dat")) {
    print(fn)
    df[nrow(df) + 1,] = list(fn)
}

Output:

[1] "tst1.dat"
[1] "tst2.dat"
[1] "tst3.dat"
Warning messages:
1: In `[<-.data.frame`(`*tmp*`, nrow(df) + 1, , value = list("tst1.dat")) :
  replacement element 1 has 1 row to replace 0 rows
2: In `[<-.data.frame`(`*tmp*`, nrow(df) + 1, , value = list("tst2.dat")) :
  replacement element 1 has 1 row to replace 0 rows
3: In `[<-.data.frame`(`*tmp*`, nrow(df) + 1, , value = list("tst3.dat")) :
  replacement element 1 has 1 row to replace 0 rows
> df
[1] filename
<0 rows> (or 0-length row.names)
>
nealei
  • You need to first create a data frame that has the number of rows you will eventually have. You can't grow a data frame in the way you are attempting to do so, thus the warning that you are trying to replace something empty (length zero) with something longer than zero. Also the code you're using will do some really weird stuff - you probably want `<- fn` not `<- list("fn")`. – Thomas Aug 19 '18 at 02:49
  • This seems inconsistent with the first reference and I have no way of knowing how many files there may be. Edited to remove double quotes around fn (left over debug code). – nealei Aug 19 '18 at 03:06
  • You do. Call `Sys.glob("tst*.dat")` first to determine number of files, use that to build the empty data frame, and then loop over it. – Thomas Aug 19 '18 at 03:13
  • Original code works fine now - no idea what happened. – nealei Aug 19 '18 at 05:23
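The preallocation suggested in the comments above could look roughly like the sketch below. It is only an illustration: the temporary directory and the three empty files are assumptions made so that the snippet runs anywhere, and only the filename column from the question is filled in.

```r
# Hypothetical setup: three empty files in a fresh temporary directory
tst_dir <- tempfile("tst")
dir.create(tst_dir)
file.create(file.path(tst_dir, sprintf("tst%d.dat", 1:3)))

# Glob first, so the number of rows is known before the loop starts
files <- Sys.glob(file.path(tst_dir, "tst*.dat"))

# Preallocate one row per file instead of growing the data frame
df <- data.frame(filename = character(length(files)),
                 stringsAsFactors = FALSE)

# Fill the preallocated rows by index
for (i in seq_along(files)) {
    df$filename[i] <- basename(files[i])
}

print(df)
```

Because the row count is fixed up front, each iteration assigns into an existing row rather than appending a new one.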

1 Answer

There's no need to grow a data frame or use a loop here.

Say you have these files:

ls ~/tst/*.dat
# tst1.dat tst2.dat tst3.dat

You can write some simple R code:

library(purrr)
library(dplyr) 

my_files <- Sys.glob(file.path("~", "tst", "*.dat"))
df <- data.frame(filename=my_files, stringsAsFactors = FALSE)

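decode_files <- function(x) {
    # some function that processes a file;
    # read only the first line so that exactly one string is returned,
    # which is what map_chr() requires
    first_line <- readLines(x, n = 1)
    substr(first_line, 1, 5)
}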

df %>% 
    mutate(output = map_chr(filename, decode_files))

Which gives you:

                    filename output
1 /Users/pedram/tst/tst1.dat  hfrsh
2 /Users/pedram/tst/tst2.dat  ifhju
3 /Users/pedram/tst/tst3.dat  fdnfd
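For readers who would rather not load purrr and dplyr, the same idea can be sketched in base R with `vapply()`. This is a minimal sketch, not the only way to do it: the temporary directory, the file contents, and the extra `size_bytes` column are assumptions added for illustration, and `decode_files()` here reads only the first line of each file.

```r
# Hypothetical setup: three one-line files in a fresh temporary directory
tst_dir <- tempfile("tst")
dir.create(tst_dir)
for (i in 1:3) {
    writeLines(paste0("line", i, " payload"),
               file.path(tst_dir, sprintf("tst%d.dat", i)))
}

my_files <- Sys.glob(file.path(tst_dir, "*.dat"))

decode_files <- function(x) {
    # assumption: only the first five characters of the first line matter
    substr(readLines(x, n = 1), 1, 5)
}

# One row per file, built in a single step rather than grown in a loop
df <- data.frame(filename = basename(my_files), stringsAsFactors = FALSE)

# vapply() applies decode_files() to every path and checks that each
# call returns exactly one character string
df$output <- vapply(my_files, decode_files, character(1), USE.NAMES = FALSE)

# Example of an extra metadata column (illustrative only)
df$size_bytes <- file.size(my_files)

print(df)
```

`vapply(..., character(1))` plays the role of `map_chr()`, and assigning to `df$output` plays the role of `mutate()`.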
pedram
  • The loop is essential to the main task of decoding the third line of the files and selecting two blocks out of the data files. Capturing metadata for the files processed is an ancillary process. I do want an efficient way of aggregating processed file metadata pending loading it into an rsqlite table. – nealei Aug 19 '18 at 04:34
  • You can avoid a loop there as well. I've updated my answer to show you how to use functional programming and purrr to work on vectors rather than through loops. – pedram Aug 19 '18 at 04:52
  • You've convinced me. Metadata for road, option, traffic, year, period, seed etc will be collected in https://stackoverflow.com/questions/51903990/add-unique-values-to-r-sqlite-database before `end = 0`. I need to get it into codereview. – nealei Aug 19 '18 at 05:19
  • I don't know why you loaded the two libraries or what the last two lines to generate the output mean but this approach improved my code greatly. – nealei Aug 19 '18 at 13:36