
I am having trouble filtering rows in pandas. I am trying to select the teams whose weights do not sum to one.

dfteam
Team    Weight
A       0.2
A       0.5
A       0.2
A       0.1
B       0.5
B       0.25
B       0.25

dfteamtemp = dfteam.groupby(['Team'], as_index=False)['Weight'].sum()
dfweight = dfteamtemp[(dfteamtemp['Weight'].astype(float)!=1.0)]

dfweight
  Team  Weight
0  A     1.0

I am not sure why I get this output. I expected an empty dataframe, but it returns Team A even though its sum is 1.

cs95
Arif Akhtar

1 Answer


You are a victim of floating point inaccuracy. The weights of the first team (A) do not add up to exactly 1.0 (here df is your dfteam):

df.groupby('Team').Weight.sum().iat[0]
0.99999999999999989
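
The same effect can be reproduced without pandas. This is a minimal sketch in plain Python, assuming the weights of team A are added one after another as ordinary double-precision floats; the variable names are only illustrative.

```python
# Weights of team A, taken from the question.
weights_a = [0.2, 0.5, 0.2, 0.1]

# Add the values one by one; every addition is rounded to the
# nearest representable double, so small errors can accumulate.
total = 0.0
for w in weights_a:
    total += w

# Exact comparison fails even though the "true" sum is 1.
exactly_one = (total == 1.0)

# The gap is tiny (on the order of 1e-16) but not zero.
gap = 1.0 - total

print(total, exactly_one, gap)
```

None of 0.2 and 0.1 has an exact binary representation, which is why the running total ends up slightly below 1.0 and a strict `!= 1.0` test reports a mismatch.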

You can resolve this by using np.isclose instead:

np.isclose(df.groupby('Team').Weight.sum(), 1.0)
array([ True,  True], dtype=bool)

You can then filter on this array. Alternatively, as @ayhan suggested, use groupby + filter:

df.groupby('Team').filter(lambda x: not np.isclose(x['Weight'].sum(), 1))

Empty DataFrame
Columns: [Team, Weight]
Index: []
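
For completeness, filtering on the np.isclose array could look roughly like the sketch below. It rebuilds the question's dfteam and keeps the aggregated one-row-per-team shape of the original attempt. The intermediate names (sums, close_to_one) are assumptions for illustration, and np.isclose is used with its default tolerances.

```python
import numpy as np
import pandas as pd

# Data from the question.
dfteam = pd.DataFrame({
    "Team": ["A", "A", "A", "A", "B", "B", "B"],
    "Weight": [0.2, 0.5, 0.2, 0.1, 0.5, 0.25, 0.25],
})

# Per-team totals as a Series indexed by team name.
sums = dfteam.groupby("Team")["Weight"].sum()

# Boolean array: True where the total is within tolerance of 1.0.
close_to_one = np.isclose(sums, 1.0)

# Keep only the teams whose total is NOT close to 1.0,
# then turn the index back into a "Team" column.
dfweight = sums[~close_to_one].reset_index()

print(dfweight)
```

Unlike groupby + filter, which returns the original rows of the offending teams, this version returns one row per team with its summed weight. For the data above both approaches give an empty result.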
cs95