1

I have a Pandas dataframe that looks like this:

    ID                                               Dyn
0 AA01   0.084, 0.049, 0.016, -0.003, 0, 0.025, 0.954, 1
1 BG54   0.216, 0.201, 0.174, 0.175, 0.179, 0.191, 0.200

And I'm looking for a way to iter trough the Dyn column, generating another one that sums only the numbers that are bigger than a cutoff, i.e.: 0.150, assigning all the values that pass it a value of one. This is what the expected result should look like:

    ID                                               Dyn Sum
0 AA01   0.084, 0.049, 0.016, -0.003, 0, 0.025, 0.954, 1   2
1 BG54   0.216, 0.201, 0.174, 0.175, 0.179, 0.191, 0.200   7

I thought I could use apply, while ittering trough all of the rows:

for index, rows in df.iterrows():
   df['Sum'] = df['Dyn'].apply(lambda x: x = 1 if int(x) > 0.150 ) 

But I'm lost on how to apply the condition (only sum it if it's greater than 0.150) to all the values inside 'Dyn' and how to assign the value of 1 to them. All advice is accepted. Thanks!

JuanMacD
  • 171
  • 7
  • https://stackoverflow.com/questions/20995196/python-pandas-counting-and-summing-specific-conditions – ddejohn Feb 11 '21 at 00:04

2 Answers2

1
#Create temp column to hold Dyn convereted into list
df=df.assign(sum=df['Dyn'].str.split(','))

#Explode DataFrame
df=df.explode('sum')
#Convert to float
df['sum']=df['sum'].astype(float)
#Filter out values greater that 0.015, groupby and sum
df[df['sum'].gt(0.150)].groupby(['ID','Dyn'])['sum'].sum().reset_index()
wwnde
  • 26,119
  • 6
  • 18
  • 32
  • Thanks for the answer. I understand how it works. I don't know why .sum() is not working, it still returns all the columns after the explode, and the column only has the individual values of each 'Dyn' row. – JuanMacD Feb 11 '21 at 00:25
  • Works for me. Check the column's naming. Maybe have white spaces. Can you `df.columns` and copy the columns as they appear into the code? – wwnde Feb 11 '21 at 00:29
  • Yeah, it should work. I don't know why it doesn't. I'm getting the columns right (df.columns returns: Index(['ID', 'Dyn', 'Dyn_Sum'], dtype='object')), but I'm getting the df exploded (22899 rows) instead of it gruped by and with sum (67 rows, expected). – JuanMacD Feb 11 '21 at 12:32
0

What about the following? You can filter out the records that don't meet that condition, and then sum:

>>> x = pd.DataFrame({'ID': {0: 'A', 1: 'B'}, 'Dyn': {0: '1,2,3', 1: '2,3,4'}})
>>> x
  ID    Dyn
0  A  1,2,3
1  B  2,3,4
>>> y = x.set_index('ID').Dyn.str.split(',').explode().astype(int)
>>> y = y[y>=1.5]
>>> x.merge(y.groupby('ID').sum().reset_index().rename({'Dyn':'Sum'}, axis = 1))
  ID    Dyn  Sum
0  A  1,2,3    5
1  B  2,3,4    9
Adam Zeldin
  • 898
  • 4
  • 6
  • I believe this can work, but I had the error: 'ValueError: invalid literal for int() with base 10: '0.084'', maybe I need to convert the values first? – JuanMacD Feb 11 '21 at 00:22
  • Ah, just do .astype(float) instead. My mockup only accounted for integers. See if that works. – Adam Zeldin Feb 11 '21 at 03:08