Pandas Series String Comparison

Question

I have a huge dataframe that I am filtering on two conditions.

A reproducible toy example is as follows:

import pandas as pd
df_ = pd.DataFrame([["A",91,1], ["B",91,2], ["C",92,1]], 
                   columns=['Name','Iteration','IP Weight'])
df2 = pd.DataFrame([["D",91,1], ["E",91,1], ["F",91,1]], 
                   columns=['Name','Iteration','IP Weight'])

Objective: If any rows of df_ have the same "Iteration" and "IP Weight" combination as the 1st row of df2, filter those rows out of df_ and append df2. Here the 1st row will get removed from df_ and df2 will get appended to it.

I filtered it as follows,

df_[~((df_['Iteration']==df2['Iteration'][0]) & (df_['IP Weight']==df2['IP Weight'][0]))]

It runs fine in the notebook, but when I put it in a script it fails with the message:

"FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison"

Any help is highly appreciated.

In the Docker container the version is 1.0.3, while locally (the notebook where it works as expected) it is 0.24.2. — Pragyan, Jul 20 '20 at 08:14
Does this help [FutureWarning elementwise comparison failed](https://stackoverflow.com/questions/40659212/futurewarning-elementwise-comparison-failed-returning-scalar-but-in-the-futur)? — FBruzzesi, Jul 20 '20 at 08:15

Valdi_Bo · Accepted Answer · 2020-07-20T08:24:24.137

1

Create the following mask:

msk = df_['Iteration'].eq(df2.loc[0, 'Iteration'])\
    & df_['IP Weight'].eq(df2.loc[0, 'IP Weight'])

I assume that the initial row in df2 has index == 0. True values in this mask indicate the rows to move from df_ to df2.

Then append rows to be moved to df2:

df2 = pd.concat([df2, df_[msk]], ignore_index=True)  # DataFrame.append was removed in pandas 2.0

And finally drop them from df_:

df_ = df_[~msk]
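Put together on the toy data from the question, the three steps might look like the sketch below. It uses pd.concat for the append step, since DataFrame.append is no longer available from pandas 2.0 onwards; the printed name lists are only there to make the result easy to check.

```python
import pandas as pd

df_ = pd.DataFrame([["A", 91, 1], ["B", 91, 2], ["C", 92, 1]],
                   columns=["Name", "Iteration", "IP Weight"])
df2 = pd.DataFrame([["D", 91, 1], ["E", 91, 1], ["F", 91, 1]],
                   columns=["Name", "Iteration", "IP Weight"])

# Compare each column of df_ with a scalar taken from the first row of df2.
# .loc[0, ...] assumes that the first row of df2 carries the index label 0.
msk = (df_["Iteration"].eq(df2.loc[0, "Iteration"])
       & df_["IP Weight"].eq(df2.loc[0, "IP Weight"]))

# Move the matching rows to df2, then keep only the non-matching rows in df_.
df2 = pd.concat([df2, df_[msk]], ignore_index=True)
df_ = df_[~msk]

print(df2["Name"].tolist())  # ['D', 'E', 'F', 'A']
print(df_["Name"].tolist())  # ['B', 'C']
```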

Edit

Another, more concise way to create the mask is:

msk = df_.iloc[:, 1:].eq(df2.iloc[0, 1:]).all(axis=1)

This time it will work regardless of the index in the first row of df2.
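To illustrate the point about the index, here is a small sketch with a hypothetical df2 whose index starts at 10 instead of 0 (that index is an assumption made up for the example). A label-based lookup such as df2.loc[0, 'Iteration'] would raise a KeyError here, while the positional version still finds the first row.

```python
import pandas as pd

df_ = pd.DataFrame([["A", 91, 1], ["B", 91, 2], ["C", 92, 1]],
                   columns=["Name", "Iteration", "IP Weight"])
# Hypothetical variant of df2 whose index labels do not start at 0.
df2 = pd.DataFrame([["D", 91, 1], ["E", 91, 1], ["F", 91, 1]],
                   columns=["Name", "Iteration", "IP Weight"],
                   index=[10, 11, 12])

# .iloc selects by position, so the first row is found whatever its label is.
first = df2.iloc[0, 1:]  # Series indexed by 'Iteration' and 'IP Weight'

# DataFrame.eq matches the Series index against the column labels of df_;
# .all(axis=1) then requires every compared column to be equal in a row.
msk = df_.iloc[:, 1:].eq(first).all(axis=1)

print(msk.tolist())  # [True, False, False]
```

Note that this relies on the compared columns having the same names in both frames, because the comparison is aligned on column labels.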

edited Jul 20 '20 at 08:24

answered Jul 20 '20 at 08:16

Valdi_Bo


How can I reach the same result without using "~"? – Pragyan Jul 20 '20 at 11:22
Look at *df_ = df_[~msk]* in my answer. This is just what you ask for - an example of *boolean indexing* with **negated** mask. – Valdi_Bo Jul 20 '20 at 15:22
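If the goal is to avoid the "~" operator altogether, one possible approach (a sketch that is not part of the original answer) is to build the "keep" mask directly. By De Morgan's law, not (a and b) is the same as (not a) or (not b), so the negated mask can be written with .ne and "|". The names it0, w0, keep and df_kept are made up for this example.

```python
import pandas as pd

df_ = pd.DataFrame([["A", 91, 1], ["B", 91, 2], ["C", 92, 1]],
                   columns=["Name", "Iteration", "IP Weight"])
df2 = pd.DataFrame([["D", 91, 1], ["E", 91, 1], ["F", 91, 1]],
                   columns=["Name", "Iteration", "IP Weight"])

# Scalars from the first row of df2 (assumes its index label is 0).
it0 = df2.loc[0, "Iteration"]
w0 = df2.loc[0, "IP Weight"]

# A row is kept if at least one of the two columns differs from the first row of df2.
keep = df_["Iteration"].ne(it0) | df_["IP Weight"].ne(w0)
df_kept = df_[keep]

print(df_kept["Name"].tolist())  # ['B', 'C']
```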

score 1 · Answer 2 · answered Jul 20 '20 at 08:17

When I run something like your example in my notebook, it runs fine, as you say. However, while researching this I found this link:

FutureWarning: elementwise comparison failed; returning scalar instead

The top answer is informative. My best guess would be that maybe in your actual data some of the ints are recorded as strings?

For example, see my code below:

import pandas as pd
df_ = pd.DataFrame([["A",91,1], ["B",91,2], ["C",92,1]], 
                   columns=['Name','Iteration','IP Weight'])
df2 = pd.DataFrame([["D","91",1], ["E",91,1], ["F",91,1]], 
                   columns=['Name','Iteration','IP Weight'])

k=df_[((df_['Iteration']==df2['Iteration'][0]) & (df_['IP Weight']==df2['IP Weight'][0]))]

g=pd.concat([df2,k])

print(g)

By making the Iteration of the first row of df2 a string, I can recreate your error. Make it an integer and it works.
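If mixed types do turn out to be the cause, a quick way to check and repair the column could look like the sketch below. It assumes that every value in the column can be parsed as a number; pd.to_numeric raises an error otherwise.

```python
import pandas as pd

df_ = pd.DataFrame([["A", 91, 1], ["B", 91, 2], ["C", 92, 1]],
                   columns=["Name", "Iteration", "IP Weight"])
df2 = pd.DataFrame([["D", "91", 1], ["E", 91, 1], ["F", 91, 1]],
                   columns=["Name", "Iteration", "IP Weight"])

# A column that mixes strings and ints is stored with the generic object dtype.
print(df2["Iteration"].dtype)               # object
# Listing the Python types of the values shows which kinds are mixed together.
print(df2["Iteration"].map(type).unique())

# Convert the column to a numeric dtype; errors="raise" (the default)
# fails loudly if a value cannot be parsed as a number.
df2["Iteration"] = pd.to_numeric(df2["Iteration"], errors="raise")

# With both sides numeric, the comparison from the question is elementwise again.
k = df_[(df_["Iteration"] == df2["Iteration"].iloc[0])
        & (df_["IP Weight"] == df2["IP Weight"].iloc[0])]
print(k["Name"].tolist())  # ['A']
```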
