I am trying to sum the values in a DataFrame column whose entries are sets of ints.

For example, a row like ['xxxx', {1, 2, 3}] should become ['xxxx', 6].

Thanks for your help.


for index, row in df_clusters.iterrows():
    if isinstance(row['sum_coefs'], set):
        row.loc['sum_coefs'] = sum(row['sum_coefs'])

The output is an unchanged DataFrame: the sum_coefs column still contains the set, not its sum.

Vadim Kotov
rednight
  • Please provide a sample of your `DataFrame`. Also, in general storing lists in a DataFrame (and in this case a list with a set) is not a good idea. – ALollz May 28 '19 at 13:50
  • Have a look at this question: https://stackoverflow.com/questions/41286569/get-total-of-pandas-column – sekky May 28 '19 at 13:50

1 Answer

You could try using Series.apply:

# Setup
import pandas as pd

df_clusters = pd.DataFrame(['xxxx', set([1, 2, 3])], columns=['sum_coefs'])

def sum_sets(val):
    if isinstance(val, set):
        return sum(val)
    return val

df_clusters['sum_coefs'] = df_clusters['sum_coefs'].apply(sum_sets)

[out]

0    xxxx
1       6
dtype: object

Or alternatively, using an inline lambda function to achieve the same result:

df_clusters['sum_coefs'] = df_clusters['sum_coefs'].apply(lambda x: sum(x) if isinstance(x, set) else x)
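
As for why the loop in the question has no effect: depending on the dtypes, iterrows can hand back a copy of each row, so assigning to row.loc may never reach df_clusters. If you want to keep the loop, one option is to write through the DataFrame itself with DataFrame.at. This is only a sketch: the two-column layout and the sample values are my assumptions, since the question did not include the real data.

```python
import pandas as pd

# Hypothetical sample data (assumed, not taken from the question)
df_clusters = pd.DataFrame({
    "name": ["xxxx", "yyyy"],
    "sum_coefs": [{1, 2, 3}, 10],
})

for index, row in df_clusters.iterrows():
    if isinstance(row["sum_coefs"], set):
        # Assign via the DataFrame, not via the row object from iterrows
        df_clusters.at[index, "sum_coefs"] = sum(row["sum_coefs"])

print(df_clusters)
```

For anything beyond a small frame, the apply version above is usually the simpler choice, because it avoids the explicit Python loop over rows.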
Chris Adams