Pandas Dataframe and duplicate names

Question

I have a Pandas DataFrame with some numerical data about people. I need to find the people who appear more than once in the DataFrame and replace all of the rows for each such person with a single row whose numeric values are the sums of the numeric values in those rows.

Example:

Names  Column1  Column2
John         1        2
Bob          2        3
Pier         1        1
John         3        3
Bob          1        0

This should become:

Names  Column1  Column2
John         4        5
Bob          3        3
Pier         1        1

How can I do this?

Use `df.groupby('Names', as_index=False, sort=False).sum()` – jezrael, Nov 03 '18 at 13:08
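For reference, the one-liner in the comment above can be tried on the question's data with a sketch like the following. It assumes the names are spelled consistently (`John` in both rows) and that the second numeric column is called `Column2`; the variable names `df` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question (assumed: consistent spelling of "John",
# second numeric column named "Column2").
df = pd.DataFrame({
    "Names": ["John", "Bob", "Pier", "John", "Bob"],
    "Column1": [1, 2, 1, 3, 1],
    "Column2": [2, 3, 1, 3, 0],
})

# as_index=False keeps "Names" as a regular column instead of the index;
# sort=False keeps the groups in order of first appearance.
result = df.groupby("Names", as_index=False, sort=False).sum()
print(result)
```

With these two options the result should keep `Names` as an ordinary column and list the people in the order they first appear (John, Bob, Pier), which matches the layout asked for in the question.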

score 1 · Answer 1 · answered Nov 03 '18 at 13:08

Try this:

In [975]: df.groupby('Names')[['Column1','Column2']].sum()
Out[975]: 
       Column1  Column2
Names                  
Bob          3        3
John         4        5
Pier         1        1
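Note that in the output above `Names` has become the index and the groups are sorted alphabetically. If a plain column is preferred, one possible follow-up is `reset_index()`. This is a minimal sketch, assuming the same sample data as in the question with `John` spelled consistently; `summed` and `flat` are illustrative names.

```python
import pandas as pd

# Sample data from the question (assumed: consistent spelling of "John").
df = pd.DataFrame({
    "Names": ["John", "Bob", "Pier", "John", "Bob"],
    "Column1": [1, 2, 1, 3, 1],
    "Column2": [2, 3, 1, 3, 0],
})

# Sum the selected numeric columns per name; "Names" becomes the index.
summed = df.groupby("Names")[["Column1", "Column2"]].sum()

# Move "Names" back into a regular column with a fresh integer index.
flat = summed.reset_index()
print(flat)
```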

Mayank Porwal


score 0 · Answer 2 · answered Nov 03 '18 at 13:09

`groupby` followed by `sum` should do the job:

df.groupby('Names').sum().sort_values('Column1', ascending=False)

       Column1  Column2
Names                  
John         4        5
Bob          3        3
Pier         1        1

nimrodz

