-4

I have two dataframes : df1 with 62,107 lines and 68 columns and df2 with 98,323 lines and 68 columns, same structure.

A want ALL the lines from df1 and ONLY the non_matching lines of df2 (on a column names "PK") ; into a final dataframe, df.

I've been looking around for outer join WITH ADDING a non matching request, but did not find any combined syntax.

I finally tried things like that :

df <- (merge(df1, df2, by.x = c("PK"), by.y = -c("PK")))

did not work !

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Elido
  • 3
  • 2

1 Answers1

0

You should use sqldf for this, below you will return all rows that do not exist in df2.

df <- sqldf("Select * from df1 LEFT JOIN df2 on df1.ID = df2.ID
      WHERE df2.ID IS NULL")

"A want ALL the lines from df1 and ONLY the non_matching lines of df2 (on a column names "PK") ; into a final dataframe, df." Do essentially accomplish this you would just perform a left join without the null clause.

user2600629
  • 559
  • 6
  • 16
  • This exclude the entries from df1 matching in df2... Not what I understand from OP sentence, but it's so badly written I can be wrong... – Tensibai Feb 09 '16 at 15:56
  • yeah, hard to say what this person is exactly after. Really the only way to keep all rows from the original dataframe is to use a left a join but then it will return matches and non matches. – user2600629 Feb 09 '16 at 16:05
  • Thank you, that is what I finally did. – Elido Feb 10 '16 at 16:27
  • The problem is over ! Thanks everybody. – Elido Feb 10 '16 at 16:27