0

I am working on creating an application that will load up a CSV into one dataframe and a SQL lookup table in another.

The first frame has columns: col1, col2, ..., coln.

The second frame is a lookup table containing just col1 and col2.

The lookup table looks like

col1 col2
AB 123
AB 234
BC 456

and so on. The first table needs to have a valid combination of keys. I want to get the record set that are invalid.

I need to do something like "if the PAIR of keys doesn't exist, then grab the record. else drop"

I think I could do a nested loop through each one, but am wondering if there is a more efficient way to do this, such as with pd.merge or something.

Here's what I've got so far:

... create two dataframes
...

frame = d1.merge(df2, how='outer', indicator=True, left_on=['col1', 'col2']).loc[lambda x: x['_merge'] == 'left_only']

I get this error:

TypeError: object of type 'NoneType' has no len()

Is there a better way to do this?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
seve
  • 159
  • 12

1 Answers1

0

Found it thanks to this post

The correct syntax to capture the fallout records is:

frame = df1.merge(df2, how='left', indicator=True, left_on=cols, right_on=cols).query('_merge == "left_only"').drop('_merge', 1)
seve
  • 159
  • 12