Summing up numbers, which are stored as strings, of all cells of a specific column in pandas

Question

The dataframe I am using has a column called "NUM_EMPL" which stores the number of employees of a specific company.

Those cells contain strings, each holding one or more comma-separated numbers.

Now I have written a piece of code which can sum up one specific cell of that column:

parts = buildings.loc[61, 'NUM_EMPL'].split(', ')
values = [float(i) for i in parts]
print(sum(values))

Now I want to do that with every cell and store the sum of every single cell in a new dataframe.

How do I iterate through the cells?

Use the apply function, you can read more about this here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html — Suleman, Nov 08 '21 at 15:15

score 0 · Answer 1 · answered Nov 08 '21 at 15:18


Use apply and a lambda function:

import pandas as pd

df = pd.DataFrame({"NUM_EMPL": ["32.0, 2.0", "3.0"]})
df.NUM_EMPL.apply(lambda x: sum(map(float, x.split(","))))

# Out:
0    34.0
1     3.0
Name: NUM_EMPL, dtype: float64
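
Since the question asks for the sums to be stored in a new dataframe, the same idea can be written with a named helper. This is only a minimal sketch and not part of the original answer: the helper name `sum_cell`, the `result` dataframe, and the `NUM_EMPL_SUM` column are made up for illustration, and it assumes every cell is a non-empty string of comma-separated numbers.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({"NUM_EMPL": ["32.0, 2.0", "3.0"]})

def sum_cell(cell):
    # Hypothetical helper: split on commas, convert each piece to float, add them up.
    # float() ignores surrounding whitespace, so " 2.0" converts fine.
    return sum(float(part) for part in cell.split(","))

# Build a new dataframe that holds one sum per original row
result = pd.DataFrame({"NUM_EMPL_SUM": df["NUM_EMPL"].apply(sum_cell)})
print(result)
```

The new dataframe keeps the original index, so it can be joined back to the source rows later if needed.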

answered Nov 08 '21 at 15:18

mcsoini


not_speshal · Answer 2 · 2021-11-08T15:32:24.493

0

You could use str.split, explode and groupby for a vectorized approach that avoids apply:

srs = df["NUM_EMPL"].str.split(", ").explode().astype(float)
df["NUM_EMPL"] = srs.groupby(srs.index).sum()

>>> df
   NUM_EMPL
0       3.0
1     794.0
2      35.0
3      42.0
4       3.0
5       3.0
6     794.0
7       8.0
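
To see what each step of that chain does, it can be broken into intermediate variables. This is a sketch on made-up sample data (the dataframe from the question is not available), and the variable names are illustrative only.

```python
import pandas as pd

# Made-up sample data, assumed to look like the column in the question
df = pd.DataFrame({"NUM_EMPL": ["3.0", "794.0", "32.0, 3.0"]})

# 1. Split every cell into a list of strings
parts = df["NUM_EMPL"].str.split(", ")

# 2. Give each list element its own row; the original index label is repeated
exploded = parts.explode()

# 3. Convert the strings to numbers
values = exploded.astype(float)

# 4. Sum the rows that share the same original index label
totals = values.groupby(level=0).sum()
print(totals)
```

Grouping by `level=0` is equivalent to `groupby(srs.index)` in the answer: both collect the exploded rows back under their original row label.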

edited Nov 08 '21 at 15:32

answered Nov 08 '21 at 15:23

not_speshal


score 0 · Answer 3 · answered Nov 08 '21 at 15:25

Set up a minimal reproducible example:

import pandas as pd

df = pd.DataFrame({'NUM_EMPL': ['3.0', '794.0', '32.0, 3.0',
                                '32.0, 3.0, 3.0, 2.0, 2.0']})
print(df)

# Output:
                   NUM_EMPL
0                       3.0
1                     794.0
2                 32.0, 3.0
3  32.0, 3.0, 3.0, 2.0, 2.0

Use:

df['NUM_EMPL'] = df['NUM_EMPL'].str.split(', ') \
                               .apply(lambda x: sum([float(i) for i in x])) \
                               .astype(int)
print(df)

# Output:
   NUM_EMPL
0         3
1       794
2        35
3        42
