I can't seem to find the right phrasing for Googling this question because I get closely similar but not correct answers.
I'm busy with the Titanic dataset and want to sum the number of surviving members in a family. So the dataset looks like this:
+-------------+----------+-----------+-------------+
| PassengerId | Survived | Surname | NumSurvived |
+-------------+----------+-----------+-------------+
| 1 | 0 | Braund | |
| 2 | 1 | Cumings | |
| 3 | 1 | Heikkinen | |
| 4 | 1 | Futrelle | |
| 5 | 0 | Braund | |
| 6 | 0 | Moran | |
| 7 | 0 | Futrelle | |
| 8 | 0 | Braund | |
| 9 | 1 | Cumings | |
+-------------+----------+-----------+-------------+
I need to sum the Survived value for each surname in the NumSurvived column like so:
+-------------+----------+-----------+-------------+
| PassengerId | Survived | Surname | NumSurvived |
+-------------+----------+-----------+-------------+
| 1 | 0 | Braund | 0 |
| 2 | 1 | Cumings | 2 |
| 3 | 1 | Heikkinen | 1 |
| 4 | 1 | Futrelle | 1 |
| 5 | 0 | Braund | 0 |
| 6 | 0 | Moran | 0 |
| 7 | 0 | Futrelle | 1 |
| 8 | 0 | Braund | 0 |
| 9 | 1 | Cumings | 2 |
+-------------+----------+-----------+-------------+