Python - Remove tuple from df column if present in another df column

Question

I have a function which removes a string from a df column, if the string is present in the column of another df:

df1['col'] = df1['col'][~df1['col'].isin(df2['col'])]

The problem is that I now have to use this function on a column of tuples, and it does not work with them. Is there an easy way to adapt the above function to handle tuples? Data:

df1:                                         df2:
index   col1                                 index      col
0       ('carol.clair', 'mark.taylor')       0          ('james.ray', 'tom.kopeland')
1       ('james.ray', 'tom.kopeland')        1          ('john.grisham', 'huratio.kane')
2       ('andrew.french', 'jack.martin') 
3       ('john.grisham', 'huratio.kane')                                               
4       ('ellis.taylor', 'sam.johnson')      

Desired output:
df1
index      col1
0          ('carol.clair', 'mark.taylor')
1          ('andrew.french', 'jack.martin') 
2          ('ellis.taylor', 'sam.johnson')

The function does work if the column is first converted to string; however, this raises an error later in my code (I tried using .astype(tuple) to convert the column back after removing the tuples, but the same error arose):

ValueError: too many values to unpack (expected 2)

Do both col1 and col2 contain tuples in your problem, or just one of them? What objects are in the tuple, and how many? To get a quick answer to your question, best provide a reproducible example: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – w-m, Jul 20 '18 at 12:23
`isin()` does work with tuples. Where exactly is your problem, what's the error? – w-m, Jul 20 '18 at 14:00
I used the .isin() method, but it did not remove any of the tuples that should have been removed. This was fixed by converting the column to string; however, doing so caused the error listed above. – Laurie, Jul 20 '18 at 14:43
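
A possible explanation for the behaviour described in the last comment, offered only as a guess since the thread never confirms it: the two columns may hold different types, for example real tuples in one and the string form of tuples in the other (which can happen after a CSV round trip). In that case `isin()` matches nothing, and converting to string makes the values comparable but leaves strings where later code expects tuples. The sketch below uses made-up data (df2 holding strings is an assumption) to show how to check the element types and parse the strings back with `ast.literal_eval`:

```python
import ast

import pandas as pd

df1 = pd.DataFrame({"col1": [("carol.clair", "mark.taylor"),
                             ("james.ray", "tom.kopeland")]})
# Assumption for illustration: df2 stores the *string* form of the tuple
df2 = pd.DataFrame({"col": ["('james.ray', 'tom.kopeland')"]})

# Inspect what is actually stored in each column
print(df1["col1"].map(type).unique())
print(df2["col"].map(type).unique())

# A tuple never equals a string, so nothing matches here
no_match = df1["col1"].isin(df2["col"])

# Parse the strings back into real tuples, then filter as usual
df2["col"] = df2["col"].map(ast.literal_eval)
mask = df1["col1"].isin(df2["col"])
result = df1.loc[~mask].reset_index(drop=True)
print(result)
```

If both columns already report `tuple` as their element type, this is not the cause and the problem lies elsewhere.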

score 1 · Accepted Answer · answered Jul 20 '18 at 14:23

This will give you the desired output:

df1.loc[~df1['col1'].isin(df2['col'])].reset_index(drop=True)
#                           col1
#0    (carol.clair, mark.taylor)
#1  (andrew.french, jack.martin)
#2   (ellis.taylor, sam.johnson)
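
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frames are rebuilt from the data in the question; the variable names `mask` and `filtered` are mine, not from the original post.

```python
import pandas as pd

# Data from the question, with real tuples as the column values
df1 = pd.DataFrame({"col1": [
    ("carol.clair", "mark.taylor"),
    ("james.ray", "tom.kopeland"),
    ("andrew.french", "jack.martin"),
    ("john.grisham", "huratio.kane"),
    ("ellis.taylor", "sam.johnson"),
]})
df2 = pd.DataFrame({"col": [
    ("james.ray", "tom.kopeland"),
    ("john.grisham", "huratio.kane"),
]})

# True for every row of df1 whose tuple also appears in df2['col']
mask = df1["col1"].isin(df2["col"])

# Keep the rows that are NOT in df2 and renumber the index from 0
filtered = df1.loc[~mask].reset_index(drop=True)
print(filtered)
```

Note the difference from the one-liner in the question: assigning a filtered Series back to a column aligns on the index, so the unwanted rows stay in the frame with NaN instead of being dropped. Selecting rows with `.loc` removes them.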

zipa

Perfect, thanks for the help mate. I came up with my own solution of appointing a discriminator column which could be used for wrangling, however this is much more elegant. Thanks again. – Laurie Jul 20 '18 at 14:44
