I am trying to do a left join on two data tables with 2 million rows each, and R is crashing because it runs out of memory. I am using the `join` function from `plyr`.

Is there a solution for this?

Sarayu
  • Have you tried `left_join` from `dplyr`? – akrun May 26 '17 at 17:38
  • Or doing a `data.table` join? – Mike H. May 26 '17 at 17:40
  • Or if data.table fails, you could look at the RSQLite package. – Ian Wesley May 26 '17 at 17:44
  • Please see: [*How to join (merge) data frames (inner, outer, left, right)?*](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – Jaap May 26 '17 at 17:50
  • I tried `left_join` from `dplyr` and hit the same memory issue. When I tried the `data.table` join, I got the following error: `Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in more than 2^31 rows (internal vecseq reached physical limit). Very likely misspecified join. Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation.` – Sarayu May 26 '17 at 17:54
  • The data is in the workspace. I'm not sure how RSQLite will help. – Sarayu May 26 '17 at 17:55
  • Write the tables to the database, then do the join, then pull out what you need. – Ian Wesley May 26 '17 at 17:56
  • Based on the error it sounds like there are quite a few duplicate values in the keys you are trying to join on. Are you really expecting 2 billion + rows? – Ian Wesley May 26 '17 at 18:06
  • There are some duplicates in the IDs but they cannot be removed. They are required for the computation. Yes I am expecting 2 billion + rows since the input has 2 billion + rows. The fix should be within the code and not database dependent. – Sarayu May 26 '17 at 18:11
  • When I tried `left_join` from `dplyr`, I got the following error: `Error in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) : std::bad_alloc`. Any suggestions? – Sarayu May 26 '17 at 20:10
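
The duplicate-key point raised in the comments can be checked before running the join at all. Below is a minimal base R sketch that estimates how many rows a left join would produce, so the result can be compared against the 2^31 limit named in the `data.table` error. The toy tables `x` and `y` and the key column `id` are assumptions for illustration, not the asker's data, and `table()` ignores `NA` keys by default.

```r
# Hypothetical toy tables; `id` stands in for the real join key.
x <- data.frame(id = c(1, 1, 2, 3), a = 1:4)
y <- data.frame(id = c(1, 1, 1, 2), b = 1:4)

# Count how often each key value occurs on each side.
nx <- table(x$id)
ny <- table(y$id)

# For every key in x, look up its count in y (0 when the key is absent from y).
idx <- match(names(nx), names(ny))
matches <- ifelse(is.na(idx), 0, as.numeric(ny)[idx])

# A left join keeps an unmatched x row once, so each key contributes
# count_in_x * max(count_in_y, 1) rows to the result.
expected_rows <- sum(as.numeric(nx) * pmax(matches, 1))

# TRUE would mean the result cannot fit in a single R data frame index range.
too_large <- expected_rows > .Machine$integer.max
```

If `expected_rows` turns out to be far larger than intended, the keys with the highest `as.numeric(nx) * pmax(matches, 1)` values are probably the ones to inspect first.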

0 Answers