The dataframe I am using has a column called "NUM_EMPL" which stores the number of employees of a specific company.

As you can see in the picture, those cells contain strings.

Now I have written a piece of code which can sum up one specific cell of that column:

# Avoid naming a variable `list`, which would shadow the built-in type
parts = buildings.loc[61, 'NUM_EMPL'].split(', ')
float_list = [float(i) for i in parts]
print(sum(float_list))

Now I want to do that with every cell and store the sum of every single cell in a new dataframe.

How do I iterate through the cells?

not_speshal

3 Answers

Use apply with a lambda function:

import pandas as pd

df = pd.DataFrame({"NUM_EMPL": ["32.0, 2.0", "3.0"]})
df.NUM_EMPL.apply(lambda x: sum(map(float, x.split(","))))

# Out:
0    34.0
1     3.0
Name: NUM_EMPL, dtype: float64
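
Since the question asks for the sums in a new dataframe, the same apply idea can be sketched as below. The helper name `sum_cell`, the column name `NUM_EMPL_SUM`, and the sample row with a missing value are assumptions made for illustration; they do not come from the question's data.

```python
import pandas as pd

def sum_cell(cell):
    """Sum the comma-separated numbers in one cell; return NaN for a missing cell."""
    if pd.isna(cell):
        return float("nan")
    # float() ignores surrounding whitespace, so splitting on "," is enough
    return sum(float(part) for part in cell.split(","))

# Hypothetical stand-in for the question's `buildings` dataframe
buildings = pd.DataFrame({"NUM_EMPL": ["32.0, 2.0", "3.0", None]})

# Store the per-cell sums in a new dataframe instead of overwriting the column
totals = pd.DataFrame({"NUM_EMPL_SUM": buildings["NUM_EMPL"].apply(sum_cell)})
print(totals)
```

The new dataframe shares its index with `buildings`, so the sums can be joined back to the original rows later if needed.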
mcsoini

For a vectorized approach that avoids apply, you could use str.split, explode and groupby:

srs = df["NUM_EMPL"].str.split(", ").explode().astype(float)
df["NUM_EMPL"] = srs.groupby(srs.index).sum()

>>> df
   NUM_EMPL
0       3.0
1     794.0
2      35.0
3      42.0
4       3.0
5       3.0
6     794.0
7       8.0
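
To make the intermediate step visible, here is a self-contained sketch of the same explode/groupby idea. The sample values and the column name `NUM_EMPL_SUM` are assumptions, since the question's real data is only shown as a picture.

```python
import pandas as pd

# Sample data (an assumption, not the question's actual dataframe)
df = pd.DataFrame({"NUM_EMPL": ["3.0", "794.0", "32.0, 3.0",
                                "32.0, 3.0, 3.0, 2.0, 2.0"]})

# One row per number; each row keeps the index label of the cell it came from
exploded = df["NUM_EMPL"].str.split(", ").explode().astype(float)

# Summing per index label collapses the rows back to one total per original cell
totals = exploded.groupby(level=0).sum()

# Keep the result in a new dataframe rather than overwriting the column
result = pd.DataFrame({"NUM_EMPL_SUM": totals})
print(result)
```

Grouping with `level=0` is equivalent to grouping by `srs.index` in the code above; both rely on explode repeating the original index label for every element of a split cell.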
not_speshal

Set up a minimal reproducible example:

import pandas as pd

df = pd.DataFrame({'NUM_EMPL': ['3.0', '794.0', '32.0, 3.0',
                                '32.0, 3.0, 3.0, 2.0, 2.0']})
print(df)

# Output:
                   NUM_EMPL
0                       3.0
1                     794.0
2                 32.0, 3.0
3  32.0, 3.0, 3.0, 2.0, 2.0

Use:

df['NUM_EMPL'] = df['NUM_EMPL'].str.split(', ') \
                               .apply(lambda x: sum([float(i) for i in x])) \
                               .astype(int)
print(df)

# Output:
   NUM_EMPL
0         3
1       794
2        35
3        42
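
One thing to keep in mind with the final astype(int): it drops any fractional part rather than rounding. The values in the example are all whole numbers, so nothing is lost there, but a small sketch with hypothetical fractional data shows the difference:

```python
import pandas as pd

# Hypothetical data with fractional values, only to illustrate the cast
df = pd.DataFrame({"NUM_EMPL": ["1.5, 2.25", "3.0"]})
sums = df["NUM_EMPL"].str.split(", ").apply(lambda x: sum(float(i) for i in x))

truncated = sums.astype(int)        # drops the fractional part
rounded = sums.round().astype(int)  # rounds to the nearest integer first
print(truncated.tolist(), rounded.tolist())
```

If the counts are always whole numbers, the plain cast is fine; otherwise rounding first, or keeping the column as float, may be the safer choice.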
Corralien