1
rm_id  a         b       c      d   r_id
12  TRUE    TRUE    TRUE    0.2     1
13  TRUE    TRUE    TRUE    0.32    1
14  TRUE    TRUE    TRUE    0.02    1
15  TRUE    TRUE    FALSE   1.2     1
16  TRUE    TRUE    TRUE    0.05    1
17  FALSE   TRUE    FALSE   0.06    2
18  FALSE   TRUE    TRUE    0.8     1
19  TRUE    TRUE    FALSE   0.32    2
20  FALSE   TRUE    TRUE    0.54    1
13  TRUE    TRUE    FALSE   0.12    2
14  FALSE   TRUE    TRUE    0.012   2
16  FALSE   FALSE   FALSE   0.5     2
12  TRUE    FALSE   FALSE   0.9     2
11  FALSE   TRUE    TRUE    0.37    1

Hi Everyone: I have the above table, I want to get the values as displayed below when I filter the values by r_id i.e. sum of each column. Can you help me?

rm_id a       b       c      d    r_id
12  TRUE    TRUE    TRUE    0.2     1
13  TRUE    TRUE    TRUE    0.32    1
14  TRUE    TRUE    TRUE    0.02    1
15  TRUE    TRUE    FALSE   1.2     1
16  TRUE    TRUE    TRUE    0.05    1
18  FALSE   TRUE    TRUE    0.8     1
20  FALSE   TRUE    TRUE    0.54    1
11  FALSE   TRUE    TRUE    0.37    1

    FALSE   TRUE    FALSE   3.5 
  • In dupe solution is for `filter the values by r_id`, can you explain more `sum` Because `sum` of boolean is not `False, True, False` and also `5.4012` is not sum of filtered rows – jezrael Jan 28 '20 at 09:56
  • Thanks for your reply. I have corrected the sum. for columns containing the numerical values ('d' in the above example), it should sum the row values associated with that respective r_id and for boolean values, it should give me the above result. – Siddu Kattimani Jan 28 '20 at 10:02
  • "There is appended new row in filtered data?" : No – Siddu Kattimani Jan 28 '20 at 11:00
  • Hi Jezrael, Thanks for your quick reply.. The above solution did not solve my problem. – Siddu Kattimani Jan 28 '20 at 11:09
  • 1
    I want to get the result of each column associated with respective r_id. i need to get the sum for columns with numerical values (‘d’) and for the columns with boolean values I want ‘False’ if any rows of the respective column associated with that particular r_id had the value ‘False’ in it, Ex: (‘a’ , ’c’), if all the values in a column are true then only I need to get the value as true. Ex(‘b’) – Siddu Kattimani Jan 28 '20 at 11:34
  • Can you check `df1 = df.groupby('r_id', as_index=False).agg({'a':'all', 'b':'all','c':'all', 'd':'sum'})` ? – jezrael Jan 28 '20 at 11:41
  • Sorry, do you check comment link above? – jezrael Jan 28 '20 at 13:09
  • how is the condition being checked? is it logical 'AND' or 'OR'.. because i want to get "FALSE" result if any one of the column row contains 'False' value.. Now i am getting TRUE for all the resulting columns @jezrael – Siddu Kattimani Jan 28 '20 at 13:31
  • Are columns boolean? what is `print (df[['a','b','c']].dtypes)` ? – jezrael Jan 28 '20 at 13:32
  • right now they are of type 'Object' – Siddu Kattimani Jan 28 '20 at 13:35
  • Solution not working? – jezrael Jan 28 '20 at 13:41

1 Answers1

0

Use GroupBy.agg with GroupBy.all and sum function in dictionary:

If data contains TRUE and FALSE strings use:

print (df[['a','b','c']].dtypes)
a    object
b    object
c    object
dtype: object

#check real data
print (df[['a','b','c']].stack().unique())
['TRUE' 'FALSE']

#replace to boolean
df[['a','b','c']] = df[['a','b','c']].replace({'TRUE':True, 'FALSE':False})

print (df[['a','b','c']].dtypes)
a    bool
b    bool
c    bool
dtype: object

df1 = df.groupby('r_id', as_index=False).agg({'a':'all', 'b':'all','c':'all', 'd':'sum'})
print (df1)
   r_id      a      b      c      d
0     1  False   True  False  3.500
1     2  False  False  False  1.912
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252