There are two huge datasets which I need to merge with a left join (left_join from dplyr). The datasets have 15 million rows each.
I tried several ways to merge them, but every attempt ended in an error. To simplify, dataset X has the variables NAME and value_Q1, and dataset Y has the variables NAME and value_Q2. Here is what I tried:
Try1 <- left_join(X, Y)
- Error: cannot allocate vector of size 19251.5 Gb
Try2 <- merge(X, Y, by = "NAME", all.x = TRUE)
- Error: negative length vectors are not allowed
Try3_1 <- merge(X[1:10000000, ], Y, by = "NAME", all.x = TRUE)
- worked
Try3_2 <- merge(X[10000001:nrow(X), ], Y, by = "NAME", all.x = TRUE)
- Error: negative length vectors are not allowed
The result of Try3_2 surprised me the most: Try3_1 worked with the first 10 million rows of X, but Try3_2 failed with only the remaining 5 million rows. How does merge in R actually work?
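A likely explanation, assuming NAME is not unique in Y: merge (like left_join) pairs each row of X with every row of Y that has the same NAME, so the size of the result depends on how often each NAME repeats, not on how many rows of X go in. The sketch below uses hypothetical toy data and a hypothetical helper, expected_left_join_rows, to count the rows a left join would return before actually running it. In the toy data a 1-row chunk of X produces more output than a 3-row chunk, which would mirror Try3_1 working and Try3_2 failing.

```r
# Hypothetical toy data standing in for the real tables. The assumption is
# that NAME repeats in Y (here "a" appears four times).
X <- data.frame(NAME = c("b", "c", "d", "a"), value_Q1 = 1:4)
Y <- data.frame(NAME = c("a", "a", "a", "a", "b"), value_Q2 = 1:5)

# Hypothetical helper: number of rows a left join of x_keys onto y_keys returns.
# Each row of X is repeated once per row of Y with the same NAME;
# rows of X with no match in Y still appear once (filled with NA).
expected_left_join_rows <- function(x_keys, y_keys) {
  y_counts <- table(y_keys)                 # rows per NAME in Y
  idx <- match(x_keys, names(y_counts))     # NA where a NAME is absent from Y
  per_row <- as.numeric(y_counts)[idx]      # doubles, so the sum cannot overflow
  per_row[is.na(per_row)] <- 0
  sum(pmax(per_row, 1))                     # unmatched rows still count once
}

expected_left_join_rows(X$NAME, Y$NAME)       # 7
nrow(merge(X, Y, by = "NAME", all.x = TRUE))  # 7, matches the estimate

# Output size depends on the keys in a chunk, not on the chunk's length:
expected_left_join_rows(X$NAME[1:3], Y$NAME)  # 3 rows of X -> 3 result rows
expected_left_join_rows(X$NAME[4], Y$NAME)    # 1 row of X  -> 4 result rows
```

Counting with table() and match() only touches the key columns, so on the real data this check should be far cheaper than the join itself.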
Finally, I also tried the code above with data.table; its error message says that the result would have more than 2^31 rows.
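The data.table message suggests that the joined result, not either input, is what is too large. The "negative length vectors" error from merge would plausibly be the same problem showing up as an integer overflow in the row count, though that is an assumption. A minimal sketch for finding which NAME values drive the blow-up and comparing the total against 2^31 - 1 (.Machine$integer.max, the largest R integer) before joining; rows_per_key and the toy data are hypothetical, for illustration only.

```r
# Hypothetical toy data; in the real case these would be the 15-million-row tables.
X <- data.frame(NAME = c("b", "c", "d", "a"), value_Q1 = 1:4)
Y <- data.frame(NAME = c("a", "a", "a", "a", "b"), value_Q2 = 1:5)

# Hypothetical helper: joined rows contributed by each NAME present in both
# tables, i.e. (rows with that NAME in X) * (rows with that NAME in Y).
rows_per_key <- function(x_keys, y_keys) {
  x_counts <- table(x_keys)
  y_counts <- table(y_keys)
  common <- intersect(names(x_counts), names(y_counts))
  rows <- as.numeric(x_counts[common]) * as.numeric(y_counts[common])
  sort(setNames(rows, common), decreasing = TRUE)  # worst offenders first
}

per_key <- rows_per_key(X$NAME, Y$NAME)
per_key  # a = 4, b = 1: "a" dominates the result

# Total left-join size: matched pairs plus X rows whose NAME is missing from Y.
total <- sum(per_key) + sum(!(X$NAME %in% Y$NAME))
total > .Machine$integer.max  # TRUE would mean more than 2^31 - 1 rows
```

If the first few entries of per_key account for most of the total on the real data, those NAME values (for example, blanks or placeholder names repeated many times) would be the place to look first.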