
I have a large DataFrame that I am filtering on two conditions.

A reproducible toy example is as follows:

import pandas as pd
df_ = pd.DataFrame([["A",91,1], ["B",91,2], ["C",92,1]], 
                   columns=['Name','Iteration','IP Weight'])
df2 = pd.DataFrame([["D",91,1], ["E",91,1], ["F",91,1]], 
                   columns=['Name','Iteration','IP Weight'])

Objective: if any rows of df_ have the same "Iteration" and "IP Weight" combination as the first row of df2, remove those rows from df_ and then append df2 to it. Here, the first row of df_ is removed and df2 is appended to what remains.

I filtered it as follows,

df_[~((df_['Iteration']==df2['Iteration'][0]) & (df_['IP Weight']==df2['IP Weight'][0]))]

It runs fine in the notebook, but when I put it in a script it fails with the message:

"FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison"

Any help is highly appreciated.

FBruzzesi
Pragyan
  • which version of pandas are you using? – quest Jul 20 '20 at 08:00
  • In the Docker container it's 1.0.3, while locally (the notebook, where it works as expected) it's 0.24.2 – Pragyan Jul 20 '20 at 08:14
  • Does this help: [FutureWarning elementwise comparison failed](https://stackoverflow.com/questions/40659212/futurewarning-elementwise-comparison-failed-returning-scalar-but-in-the-futur)? – FBruzzesi Jul 20 '20 at 08:15

2 Answers


Create the following mask:

msk = df_['Iteration'].eq(df2.loc[0, 'Iteration'])\
    & df_['IP Weight'].eq(df2.loc[0, 'IP Weight'])

I assume that the first row of df2 has the index label 0. True values in this mask indicate the rows to move from df_ to df2.

Then append rows to be moved to df2:

# pd.concat works across pandas versions; DataFrame.append was removed in pandas 2.0
df2 = pd.concat([df2, df_[msk]], ignore_index=True)

And finally drop them from df_:

df_ = df_[~msk]
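
If the goal is instead the one stated in the question (drop the matching rows from df_ and append df2 to what remains), the same mask can be reused. A minimal sketch on the toy data from the question; the name `result` is only illustrative and is not part of the original post:

```python
import pandas as pd

df_ = pd.DataFrame([["A", 91, 1], ["B", 91, 2], ["C", 92, 1]],
                   columns=["Name", "Iteration", "IP Weight"])
df2 = pd.DataFrame([["D", 91, 1], ["E", 91, 1], ["F", 91, 1]],
                   columns=["Name", "Iteration", "IP Weight"])

# True for rows of df_ whose (Iteration, IP Weight) pair matches the first row of df2
msk = (df_["Iteration"].eq(df2.loc[0, "Iteration"])
       & df_["IP Weight"].eq(df2.loc[0, "IP Weight"]))

# Keep the non-matching rows of df_, then stack df2 underneath them
result = pd.concat([df_[~msk], df2], ignore_index=True)
print(result)
```

On this data, row "A" is dropped and rows "D", "E" and "F" follow "B" and "C".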

Edit

Another, more concise way to create the mask is:

msk = df_.iloc[:, 1:].eq(df2.iloc[0, 1:]).all(axis=1)

This version selects the first row of df2 by position, so it works regardless of that row's index label.
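
As a quick check of that claim, here is a small sketch in which df2 is given a non-default index (the labels 10, 11, 12 are made up for illustration). The positional lookup still finds the first row, whereas `df2.loc[0, ...]` would raise a KeyError here:

```python
import pandas as pd

df_ = pd.DataFrame([["A", 91, 1], ["B", 91, 2], ["C", 92, 1]],
                   columns=["Name", "Iteration", "IP Weight"])
# Hypothetical index labels: there is no row labelled 0 in this frame
df2 = pd.DataFrame([["D", 91, 1], ["E", 91, 1], ["F", 91, 1]],
                   columns=["Name", "Iteration", "IP Weight"],
                   index=[10, 11, 12])

# First row of df2 by position, restricted to the columns after 'Name'
ref = df2.iloc[0, 1:]

# eq() aligns ref with the column labels of df_; all(axis=1) requires every column to match
msk = df_.iloc[:, 1:].eq(ref).all(axis=1)
print(msk.tolist())
```

Note that this form compares every column after the first, so it relies on both frames having the same column layout.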

Valdi_Bo

When I run something like your example in my notebook, it runs fine, as you say. While researching this, though, I found this link:

FutureWarning: elementwise comparison failed; returning scalar instead

The top answer there is informative. My best guess is that some of the integers in your actual data are stored as strings.

For example, see my code below:

import pandas as pd
df_ = pd.DataFrame([["A",91,1], ["B",91,2], ["C",92,1]], 
                   columns=['Name','Iteration','IP Weight'])
df2 = pd.DataFrame([["D","91",1], ["E",91,1], ["F",91,1]], 
                   columns=['Name','Iteration','IP Weight'])

# Rows of df_ that match the first row of df2 on both columns
k = df_[(df_['Iteration'] == df2['Iteration'][0]) & (df_['IP Weight'] == df2['IP Weight'][0])]

g = pd.concat([df2, k])

print(g)

By making the Iteration of the first row of df2 a string, I can recreate your error. Make it an integer and it works.
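
If mixed types do turn out to be the cause, one possible fix is to inspect the column dtypes and convert the column to numbers before comparing. A minimal sketch, assuming every value in 'Iteration' is either an integer or a string that parses as one (`pd.to_numeric` raises a ValueError otherwise):

```python
import pandas as pd

df_ = pd.DataFrame([["A", 91, 1], ["B", 91, 2], ["C", 92, 1]],
                   columns=["Name", "Iteration", "IP Weight"])
# The first 'Iteration' value is deliberately a string, as in the example above
df2 = pd.DataFrame([["D", "91", 1], ["E", 91, 1], ["F", 91, 1]],
                   columns=["Name", "Iteration", "IP Weight"])

# A column that mixes str and int is stored with the generic object dtype
print(df2.dtypes)

# Convert the column so every value is numeric
df2["Iteration"] = pd.to_numeric(df2["Iteration"])

# The original comparison now compares numbers with numbers
k = df_[(df_["Iteration"] == df2["Iteration"].iloc[0])
        & (df_["IP Weight"] == df2["IP Weight"].iloc[0])]
print(k)
```

Running `print(df.dtypes)` on the real data in both environments is a cheap way to confirm or rule out this guess.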