Compare pandas dataframe columns to sql table dataframe columns

Question

I am working on creating an application that will load up a CSV into one dataframe and a SQL lookup table in another.

The first frame has columns: col1, col2, ..., coln.

The second frame is a lookup table containing just col1 and col2.

The lookup table looks like

col1	col2
AB	123
AB	234
BC	456

and so on. The first table needs to have a valid combination of keys. I want to get the record set that are invalid.

I need to do something like "if the PAIR of keys doesn't exist, then grab the record. else drop"

I think I could do a nested loop through each one, but am wondering if there is a more efficient way to do this, such as with pd.merge or something.

Here's what I've got so far:

... create two dataframes
...

frame = d1.merge(df2, how='outer', indicator=True, left_on=['col1', 'col2']).loc[lambda x: x['_merge'] == 'left_only']

I get this error:

TypeError: object of type 'NoneType' has no len()

Is there a better way to do this?

score 0 · Answer 1 · answered Jan 13 '21 at 21:24

0

Found it thanks to this post

The correct syntax to capture the fallout records is:

frame = df1.merge(df2, how='left', indicator=True, left_on=cols, right_on=cols).query('_merge == "left_only"').drop('_merge', 1)

answered Jan 13 '21 at 21:24

seve

159
12

Compare pandas dataframe columns to sql table dataframe columns

1 Answers1

Linked