@JosephWood nailed it. If event_list has 700,000 rows, then you're trying to create a data.frame with 700,000 rows and 700,000 columns. strsplit(event_list$event_list, ",") returns a list of length 700,000, so length(strsplit(event_list$event_list, ",")) gives a single number: 700000. Taking max of one number just returns that number. You should use lengths instead, which returns the length of each list element.
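A small illustration of the difference, using a made-up four-element vector (x is hypothetical, not your data):

```r
# Hypothetical toy input standing in for event_list$event_list
x <- c("a,b,c", "d", "e,f", "g")
parts <- strsplit(x, ",")

n_rows  <- length(parts)   # number of list elements (one per input string): 4
per_row <- lengths(parts)  # number of values in each element: 3 1 2 1

max(n_rows)   # 4, just the row count again
max(per_row)  # 3, the longest split, which is what n should be
```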
So your call to str_split_fixed ends up acting like this:
str_split_fixed(event_list$event_list, ",", n = 700000)
That gives a character matrix with 700,000 rows (the length of event_list$event_list) and 700,000 columns (n).
On my machine, I roughly estimated the necessary memory:
format(700000 * object.size(character(700000)), "GB")
# [1] "3650.8 Gb"
That's not counting any extra memory required to store those values in a data.frame.
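The same figure can be reproduced by hand. This is only a rough sketch: it assumes 8 bytes per string pointer (a 64-bit build of R) and ignores the small per-vector header.

```r
n <- 700000
bytes_per_pointer <- 8              # assumption: 64-bit R
total_bytes <- n * n * bytes_per_pointer
total_gib <- total_bytes / 1024^3   # the legacy "Gb" unit divides by 1024^3
round(total_gib)                    # roughly 3651
```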
The solution:
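# Split each string into its comma-separated values (a list of character vectors)
split_values <- strsplit(event_list$event_list, ",")
# Number of values in each row
value_counts <- lengths(split_values)
# Empty strings needed to pad each row out to the longest row
extra_blanks <- lapply(max(value_counts) - value_counts, character)
values_with_blanks <- mapply(split_values, extra_blanks, FUN = c, SIMPLIFY = FALSE)
# Bind the padded vectors row-wise so each input string becomes one row
DF <- as.data.frame(do.call(rbind, values_with_blanks), stringsAsFactors = FALSE)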
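To see what the padding does, here is the same idea on a toy input. The vector x and the names padded and toy_df are made up for illustration. The padded vectors are bound row-wise here so that each input string ends up as one row.

```r
# Hypothetical toy input standing in for event_list$event_list
x <- c("a,b,c", "d", "e,f")

split_values <- strsplit(x, ",")
value_counts <- lengths(split_values)
# character(k) creates k empty strings, so shorter rows get padded with ""
extra_blanks <- lapply(max(value_counts) - value_counts, character)
padded <- mapply(split_values, extra_blanks, FUN = c, SIMPLIFY = FALSE)

# One row per input string, one column per position
toy_df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
dim(toy_df)  # 3 rows, 3 columns
```

Every element of padded has the same length, which is what allows the row-wise bind to produce a rectangular result.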