I have a function that removes a string from a DataFrame column if that string is present in a column of another DataFrame:

df1['col'] = df1['col'][~df1['col'].isin(df2['col'])]

The problem is that I now have to use this function on a column of tuples, and it does not work with them. Is there an easy way to adapt the function above to handle tuples? Data:

df1:                                         df2:
index   col1                                 index      col
0       ('carol.clair', 'mark.taylor')       0          ('james.ray', 'tom.kopeland')
1       ('james.ray', 'tom.kopeland')        1          ('john.grisham', 'huratio.kane')
2       ('andrew.french', 'jack.martin') 
3       ('john.grisham', 'huratio.kane')                                               
4       ('ellis.taylor', 'sam.johnson')      

Desired output:
df1
index      col1
0          ('carol.clair', 'mark.taylor')
1          ('andrew.french', 'jack.martin') 
2          ('ellis.taylor', 'sam.johnson') 

The function does work if the column is first converted to string; however, this raises an error later in my code (I tried using .astype(tuple) to convert the column back after removing the tuples, but the same error arose):

ValueError: too many values to unpack (expected 2)

Laurie
  • Do both col1 and col2 contain tuples in your problem, or just one of them? What objects are in the tuple, and how many? To get a quick answer to your question, best provide a reproducible example: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – w-m Jul 20 '18 at 12:23
  • Please find edited post containing full example. – Laurie Jul 20 '18 at 13:04
  • `isin()` does work with tuples. Where exactly is your problem, what's the error? – w-m Jul 20 '18 at 14:00
  • I used the .isin() method, but it did not remove any of the tuples that should have been removed. Converting the column to string fixed that, but it caused the error listed above. – Laurie Jul 20 '18 at 14:43

1 Answer

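This will give you the desired output. It filters the rows of df1 rather than assigning the filtered column back, and then resets the index: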

df1.loc[~df1['col1'].isin(df2['col'])].reset_index(drop=True)
#                           col1
#0    (carol.clair, mark.taylor)
#1  (andrew.french, jack.martin)
#2   (ellis.taylor, sam.johnson)
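
For reference, here is a minimal self-contained sketch that rebuilds the two frames from the question and applies the same filter. The last part is only an assumption about why `isin()` might appear not to match: if one column actually held string representations of tuples (for example after a round trip through CSV), `ast.literal_eval` could parse them back into real tuples. The names `as_text` and `parsed` are made up for illustration.

```python
import ast

import pandas as pd

# Rebuild the example data from the question; both columns hold real tuples.
df1 = pd.DataFrame({'col1': [
    ('carol.clair', 'mark.taylor'),
    ('james.ray', 'tom.kopeland'),
    ('andrew.french', 'jack.martin'),
    ('john.grisham', 'huratio.kane'),
    ('ellis.taylor', 'sam.johnson'),
]})
df2 = pd.DataFrame({'col': [
    ('james.ray', 'tom.kopeland'),
    ('john.grisham', 'huratio.kane'),
]})

# Keep only the rows of df1 whose tuple does not occur in df2['col'].
result = df1.loc[~df1['col1'].isin(df2['col'])].reset_index(drop=True)
print(result)

# Assumption: if a column contained strings such as "('a', 'b')" instead of
# tuples, they could be parsed back into tuples before filtering.
as_text = df1['col1'].astype(str)      # simulate a stringified column
parsed = as_text.map(ast.literal_eval)  # each element is a tuple again
```

Checking `df1['col1'].map(type).unique()` on the real data is a quick way to see whether the column holds tuples or strings before filtering.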
zipa
  • Perfect, thanks for the help mate. I came up with my own solution of appointing a discriminator column which could be used for wrangling, however this is much more elegant. Thanks again. – Laurie Jul 20 '18 at 14:44