I have two data frames that I am trying to merge. `df1` has 20,015 rows and 7 variables; `df2` has 8,534,664 rows and 29 variables.
When I run `full_join(df1, df2, by = "KEY")` I get `Error: cannot allocate vector of size 891.2 Mb`. I set `memory.limit(1000000)` and still get the same error. If I run the `full_join()` while watching the CPU usage graph in the Windows Task Manager, usage climbs rapidly until the error appears. I have also called `gc()` throughout my code.
My question is: is there a function that can join the first 1,000,000 rows, pause, then join the next 1,000,000 rows, and so on until all rows have been joined? In other words, is there a way to run `full_join()` in batches?
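
To illustrate the kind of thing I have in mind, here is a rough sketch of a hand-rolled chunked join using dplyr. This is only what I am picturing, not something I know to exist as a ready-made function, and it assumes `KEY` is the only column the two data frames share (otherwise the `.x`/`.y` suffixes would differ from what `full_join()` produces):

```r
library(dplyr)

chunk_size <- 1e6
n <- nrow(df2)
starts <- seq(1, n, by = chunk_size)

# Join df2 against df1 one 1,000,000-row chunk at a time
pieces <- lapply(starts, function(s) {
  e <- min(s + chunk_size - 1, n)
  left_join(df2[s:e, ], df1, by = "KEY")
})

# left_join() drops df1 rows whose KEY never appears in df2,
# so add those back to mimic a full_join()
df1_only <- anti_join(df1, df2, by = "KEY")

result <- bind_rows(c(pieces, list(df1_only)))
```

I realise the final bound result still has to fit in memory; I am mainly hoping to avoid the large intermediate allocations during the join itself. Is there an existing function (in dplyr or elsewhere) that does this kind of batching for me?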