I know the question is a bit old, but I came across a similar problem recently and still want to share my code.
I wanted to split a data.table into equally sized chunks. I computed the number of chunks beforehand by dividing the total number of rows of the data.table by the number of smaller data.tables I intended to receive. I wrote a function that splits the data.table (input x) into chunks with an equal number of rows (no_rows_per_frame) and takes a path where to store the frames (path_to_store).
I needed it to hand-collect variables for the chunks, but you could rewrite it to simply return all data.tables separately. Or better, following @David Arenburg's answer: store them in a list and don't pollute your global environment. NB: the code might not be efficient as it uses a loop, but it was pretty fast for my sample of almost 500k observations (as data.tables are).
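For the list-based variant mentioned above, here is a minimal sketch using base R's split() with a computed grouping vector (the table dt and the chunk size are illustrative, not from the original answer):

```r
library(data.table)

# hypothetical example table
dt <- data.table(id = 1:12, value = rnorm(12))

# assign each row to a chunk of at most 5 rows (use e.g. 5000 in practice)
no_rows_per_frame <- 5
chunk_id <- ceiling(seq_len(nrow(dt)) / no_rows_per_frame)

# returns a named list of data.tables; nothing lands in the global environment
chunks <- split(dt, chunk_id)
```

With 12 rows and chunks of 5, this yields a list of three data.tables with 5, 5, and 2 rows.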
# function: split into equally-sized samples for hand-collection
split_data_table <- function(x, no_rows_per_frame, path_to_store) {
  split_vec <- seq(1, nrow(x), by = no_rows_per_frame)
  for (split_cut in split_vec) {
    # cap the upper bound so the last chunk doesn't run past the table
    upper <- min(split_cut + no_rows_per_frame - 1, nrow(x))
    sample <- x[split_cut:upper]
    fwrite(sample, paste0(path_to_store, "sample_until_", upper, ".csv"))
  }
}
# apply sample cut
split_data_table(x = vendor_tab, no_rows_per_frame = 5000,
path_to_store = "C/...")
Hope it helps!