I am working on creating an application that will load up a CSV into one dataframe and a SQL lookup table in another.
The first frame has columns: col1, col2, ..., coln.
The second frame is a lookup table containing just col1 and col2.
The lookup table looks like
col1 | col2 |
---|---|
AB | 123 |
AB | 234 |
BC | 456 |
and so on. The first table needs to have a valid combination of keys. I want to get the record set that are invalid.
I need to do something like "if the PAIR of keys doesn't exist, then grab the record. else drop"
I think I could do a nested loop through each one, but am wondering if there is a more efficient way to do this, such as with pd.merge
or something.
Here's what I've got so far:
... create two dataframes
...
frame = d1.merge(df2, how='outer', indicator=True, left_on=['col1', 'col2']).loc[lambda x: x['_merge'] == 'left_only']
I get this error:
TypeError: object of type 'NoneType' has no len()
Is there a better way to do this?