
I can't find the right phrasing to search for this question: the results I get are close to what I need, but not quite correct.

I'm working with the Titanic dataset and want to count the surviving members of each family, where a family is the set of passengers sharing a surname. The dataset looks like this:

+-------------+----------+-----------+-------------+
| PassengerId | Survived | Surname   | NumSurvived |
+-------------+----------+-----------+-------------+
| 1           | 0        | Braund    |             |
| 2           | 1        | Cumings   |             |
| 3           | 1        | Heikkinen |             |
| 4           | 1        | Futrelle  |             |
| 5           | 0        | Braund    |             |
| 6           | 0        | Moran     |             |
| 7           | 0        | Futrelle  |             |
| 8           | 0        | Braund    |             |
| 9           | 1        | Cumings   |             |
+-------------+----------+-----------+-------------+

I need the NumSurvived column to hold the sum of the Survived values for each row's surname, like so:

+-------------+----------+-----------+-------------+
| PassengerId | Survived | Surname   | NumSurvived |
+-------------+----------+-----------+-------------+
| 1           | 0        | Braund    | 0           |
| 2           | 1        | Cumings   | 2           |
| 3           | 1        | Heikkinen | 1           |
| 4           | 1        | Futrelle  | 1           |
| 5           | 0        | Braund    | 0           |
| 6           | 0        | Moran     | 0           |
| 7           | 0        | Futrelle  | 1           |
| 8           | 0        | Braund    | 0           |
| 9           | 1        | Cumings   | 2           |
+-------------+----------+-----------+-------------+

Izak Joubert

1 Answer


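Try groupby with transform, which computes one value per surname and broadcasts it back to every row of that group: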

df['NumSurvived']=df.groupby('Surname')['Survived'].transform(lambda x: x.eq(1).sum())

print(df)

   PassengerId  Survived    Surname  NumSurvived
0            1         0     Braund            0
1            2         1    Cumings            2
2            3         1  Heikkinen            1
3            4         1   Futrelle            1
4            5         0     Braund            0
5            6         0      Moran            0
6            7         0   Futrelle            1
7            8         0     Braund            0
8            9         1    Cumings            2
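
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the nine sample rows from the question by hand and, as a simplification that is not part of the answer above, uses transform("sum") instead of the lambda. That shortcut assumes Survived only ever contains 0 or 1; if other values can appear, keep the x.eq(1).sum() form.

```python
import pandas as pd

# Sample rows copied from the question (not the full Titanic dataset).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0, 1],
    "Surname": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Braund",
                "Moran", "Futrelle", "Braund", "Cumings"],
})

# Assumption: Survived is strictly 0/1, so a plain sum counts the survivors.
# transform returns a result aligned with the original rows, so it can be
# assigned directly as a new column.
df["NumSurvived"] = df.groupby("Surname")["Survived"].transform("sum")

print(df)
```

Because transform keeps the original index, no merge back onto the frame is needed, which is the main reason to prefer it over agg here.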
anky