
I'm trying to perform a group by on a disk.frame and I'm getting this error:

```
Error in serialize(data, node$con) : error writing to connection
```

I'm wondering if I can get around this by changing the chunk sizes. The error seems to indicate that some chunks are too big to be processed (my file currently has sixteen chunks). I'm considering recreating the disk.frame with 30 much smaller chunks and then retrying the aggregation. Specifically, the aggregation does an n_distinct.

Does that sound about right?
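
In case it's useful, here is a minimal sketch of the re-chunking step I have in mind, assuming disk.frame's `rechunk()` (the path and variable names below are placeholders, not my real data):

```r
library(disk.frame)
setup_disk.frame()  # start disk.frame's parallel workers

# Load the existing disk.frame (placeholder path) and rewrite it
# from its current 16 chunks into 30 smaller chunks.
df <- disk.frame("path/to/my_data.df")
df <- rechunk(df, nchunks = 30)

nchunks(df)  # should now report 30
```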

goollan
  • Does this answer your question? [My group by doesn't appear to be working in disk frames](https://stackoverflow.com/questions/63851782/my-group-by-doesnt-appear-to-be-working-in-disk-frames) – xiaodai Sep 17 '20 at 02:43

1 Answer


Are you using the data.table syntax? Use the dplyr syntax instead. See https://stackoverflow.com/a/63929173/239923
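
For illustration, something along these lines should work (a minimal sketch; the data and column names are made up, and it assumes a disk.frame version whose group_by framework supports `n_distinct` inside `summarize()`):

```r
library(disk.frame)
library(dplyr)
setup_disk.frame()  # set up the parallel backend

# Placeholder data: build a small disk.frame from an in-memory data frame.
df <- as.disk.frame(
  data.frame(grp = rep(letters[1:3], 100),
             val = sample(10, 300, replace = TRUE)),
  nchunks = 6
)

# dplyr-style group by with n_distinct, then collect the result into memory.
result <- df %>%
  group_by(grp) %>%
  summarize(n_vals = n_distinct(val)) %>%
  collect()

print(result)
```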

xiaodai