Questions tagged [disk.frame]
30 questions
5
votes
2 answers
Problem with non-standard evaluation in disk.frame objects using data.table syntax
Problem
I'm currently trying to write a function that filters some rows of a disk.frame object using regular expressions. Unfortunately, I run into issues with the evaluation of my search string in the filter function. My idea was to pass a…

Joshua Entrop
- 73
- 6
3
votes
1 answer
How do I count unique entities with disk.frame in R?
I'd like to convert a data frame to a disk frame and then count the first column. It's not counting the number of unique values of the column when I try it. It appears to be counting the number of…

Cauder
- 2,157
- 4
- 30
- 69
3
votes
0 answers
Is there a better way to use disk.frame within a function?
I have created some functions that need to handle either a disk.frame or a data.table as input. I am getting errors from the future package used within disk.frame due to an object not being found upon execution. I think this is due to the fact that…

pmbrophy
- 31
- 2
2
votes
1 answer
Do I have to use collect with disk frames?
This question is a follow-up from this thread
I'd like to perform three actions on a disk frame
Count the distinct values of the field id grouped by two columns (key_a and key_b)
Count the distinct values of the field id grouped by the first of two…

Cauder
- 2,157
- 4
- 30
- 69
2
votes
1 answer
Error in serialize(data, node$con) : error writing to connection with disk frame
I'm trying to perform a group by on a disk frame and it's getting this error
Error in serialize(data, node$con) : error writing to connection with
disk frame
I'm wondering if I might be able to get around this by changing the sizes of the chunks.…

goollan
- 765
- 8
- 19
2
votes
0 answers
How to convert a very large 40gb ffdf to a disk.frame?
Had it been smaller it would not have been difficult to use the as.data.table.ffdf function. But as it is, the file is much larger than my ram.
Is there any way I can convert it or do I need to write it to disk and then reload it?

Tobias Karlsson
- 31
- 4
1
vote
0 answers
The disk.frame in R doesn't work with data
I have a data set with 80+ million rows. Because of a memory shortage, I can't manipulate this data correctly and I'm getting error messages like "can not allocate vector of 180 MB" or so.
I found the disk.frame library, which helps to manipulate…

grislepak
- 31
- 3
1
vote
2 answers
How can I input a single additional parameter to disk.frame's inmapfn at readin?
According to the article https://diskframe.com/articles/ingesting-data.html, a good use case for inmapfn as part of csv_to_disk.frame(...) is date conversion. In my data I know the name of the date column at runtime and would like to feed in the…

Joel Kandiah
- 1,465
- 5
- 15
1
vote
1 answer
In format.default(nam.ob, width = max(ncn), justify = "left") : NAs introduced by coercion to integer range
I have a disk frame that I've saved into a file. It's made up of ten chunks.
I coded every one of the columns as a character because I intend to combine these individual disk frames into one large disk frame and set the column types at that…

Cauder
- 2,157
- 4
- 30
- 69
1
vote
1 answer
CSV to disk frame with multiple CSVs
I'm getting this error when trying to import CSVs using this code:
some.df = csv_to_disk.frame(list.files("some/path"))
Error in split_every_nlines(name_in = normalizePath(file, mustWork =
TRUE), : Expecting a single string value:…

Cauder
- 2,157
- 4
- 30
- 69
1
vote
1 answer
Is n_distinct an exact calculation with disk frames?
I'm running n_distinct on a large file (>30GB) and it doesn't appear to produce an exact result.
I have another reference point for the data, and the output is off in the disk frame aggregate.
It mentions in the docs that n_distinct is an exact…

Cauder
- 2,157
- 4
- 30
- 69
1
vote
1 answer
How should we select the chunk size in disk frame?
I'm working with disk frame and it's great so far.
One piece that confuses me is the chunk size. I sense that a small chunk might create too many tasks and disk frame might eat up time managing those tasks. On the other hand, a big chunk might be…

Cauder
- 2,157
- 4
- 30
- 69
1
vote
1 answer
How do I read a disk frame that's already been saved?
I saved a disk frame to its output directory and then restarted my R session.
I'd like to read the existing disk frame instead of recreating it elsewhere.
How might I be able to accomplish this? My folder is called outdir.df
This is how I saved the…

Cauder
- 2,157
- 4
- 30
- 69
1
vote
1 answer
What's the best way to write a disk frame to CSV?
I'm looking through the docs and I don't see a function for writing to CSV.
It appears there's a function for writing the disk frame, but it's unclear what format it gets stored in
write_disk.frame
Write a data.frame/disk.frame to a disk.frame…

Cauder
- 2,157
- 4
- 30
- 69
0
votes
0 answers
MCA with a dataset larger than RAM capacity
I'm currently trying to run an MCA on a dataset of 100'572 x 52. However, I receive the message:
Error: cannot allocate vector of size 37.7 Gb
Before buying some new RAM (I've 2x8GB of RAM and I'm working on Windows 11), I wanted to try to…

Max_Wlt
- 3
- 1