CSV to disk frame with multiple CSVs

Question

I'm getting this error when trying to import CSVs using this code:

some.df = csv_to_disk.frame(list.files("some/path"))

Error in split_every_nlines(name_in = normalizePath(file, mustWork = TRUE), : Expecting a single string value: [type=character; extent=3].

As a temporary workaround, I used a for loop that converted each file separately and then row-bound all the disk frames together.

I took the code from the ingesting data doc.

It looks like the function can only take a single value and not a vector of file names. Thus a loop is a valid option. So what is the question? – Dave2e, Sep 18 '20 at 18:56

xiaodai · Accepted Answer · 2020-09-20T06:09:07.463

This seems to be an error triggered by the bigreadr package. I wonder if you have a way to reproduce the chunks.

Or maybe try a different chunk reader:

csv_to_disk.frame(..., chunk_reader = "data.table")

Also, if all else fails (CSV reading is hard), reading the files in a loop and then appending them would work as well.
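
The loop fallback can be sketched roughly as follows. This is only an illustration under stated assumptions: the scratch folder and the two small CSVs are made up for the demo, and it assumes `rbindlist.disk.frame()` is available in your installed version of disk.frame.

```r
library(disk.frame)
setup_disk.frame()

# Hypothetical scratch folder with two small CSVs that share the same columns.
csv_dir <- file.path(tempdir(), "loop_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(mtcars[1:10, ], file.path(csv_dir, "part1.csv"), row.names = FALSE)
write.csv(mtcars[11:20, ], file.path(csv_dir, "part2.csv"), row.names = FALSE)

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Convert one file per call, so each call receives a single path.
pieces <- lapply(files, function(f) {
  csv_to_disk.frame(f, outdir = tempfile(fileext = ".df"))
})

# Bind the per-file disk.frames into one disk.frame.
some.df <- rbindlist.disk.frame(pieces)
nrow(some.df)
```

Because every call sees exactly one path, a vector of file names never reaches the chunk splitter, at the cost of an extra binding step at the end.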

Perhaps you need to specify that only CSVs should be read? For example:

list.files("some/path", pattern = "\\.csv$", full.names = TRUE)

Otherwise, it normally works:

library(disk.frame)
setup_disk.frame()

tmp = tempdir()

# write ten copies of the flights data as separate CSVs
sapply(1:10, function(x) {
  data.table::fwrite(nycflights13::flights, file.path(tmp, sprintf("tmp%s.csv", x)))
})

# pattern is a regular expression, so match the .csv extension explicitly
some.df = csv_to_disk.frame(list.files(tmp, pattern = "\\.csv$", full.names = TRUE))
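
On the `list.files()` point, it may help to inspect the vector before passing it on. The sketch below uses only base R; the folder and file names are hypothetical. It shows that the default call returns bare file names for everything in the folder, while an anchored regular expression with `full.names = TRUE` returns usable paths to the CSVs only.

```r
# Hypothetical scratch folder with two CSVs and one non-CSV file.
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(head(mtcars), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(head(iris), file.path(demo_dir, "b.csv"), row.names = FALSE)
writeLines("not a csv", file.path(demo_dir, "notes.csv.bak"))

# Default call: bare file names, every file in the folder.
bare <- list.files(demo_dir)

# `pattern` is a regular expression, so anchor the extension explicitly.
csvs <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

print(bare)
print(basename(csvs))
print(all(file.exists(csvs)))
```

This does not by itself explain the `split_every_nlines` error, but it rules out stray files and relative paths as a cause.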

edited Sep 20 '20 at 06:09

answered Sep 20 '20 at 04:02


I tried adding the "*.csv" filter, and it printed this message once for each file: `Stage 1 of 2: splitting the file [path]`. Then it gave me this error: `Error in split_every_nlines(name_in = normalizePath(file, mustWork = TRUE), : Expecting a single string value: [type=character; extent=25].` – Cauder Sep 20 '20 at 05:19
Are you able to read the files properly using data.table? It sounds like bigreadr does not like your files, perhaps due to formatting. Do you need to set the separator? The other thing you can try (which is slower) is csv_to_disk.frame(..., chunk_reader = "data.table"). The loop approach you have is fine. – xiaodai Sep 20 '20 at 06:08
