Python Pandas: select rows based on comparison across rows

Question

In the dataframe below, the first column is the index with occasional non-unique values.

|   | col1 |
|---|------|
| A |  120 |
| A |   90 |
| A |   80 |
| B |   80 |
| B |   50 |
| C |  120 |
| D |  150 |
| D |  150 |

I want to select rows so that I can get the following dataframe.

|   | col1 |
|---|------|
| A |  120 |
| B |   80 |
| C |  120 |
| D |  150 |

Basically, I just want to keep the first row for each unique index value.
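The goal can also be expressed without `groupby`. A boolean mask built from `Index.duplicated(keep='first')` keeps exactly the first row for each label. This is a minimal illustrative sketch, not part of the original question or answers, and it rebuilds the question's data by hand:

```python
import pandas as pd

# Rebuild the question's frame: the index holds the repeated labels
df = pd.DataFrame({'col1': [120, 90, 80, 80, 50, 120, 150, 150]},
                  index=list('AAABBCDD'))

# Index.duplicated(keep='first') marks every repeat after the first
# occurrence as True, so negating it keeps one row per label
result = df[~df.index.duplicated(keep='first')]
```

`result` has one row each for A, B, C and D (120, 80, 120, 150), and because it selects whole rows, it keeps every column of those rows unchanged.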

score 3 · Accepted Answer · answered Jun 22 '15 at 18:22

Try this.

```python
import pandas as pd

index = 'A A A B B C D D'.split()
col1 = [120, 90, 80, 80, 50, 120, 150, 150]
ser = pd.Series(col1, index=index)
# group by the index (level=0) and keep the first element of each group
ser.groupby(level=0).first()
```

Output:

```
A    120
B     80
C    120
D    150
dtype: int64
```
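One caveat: `GroupBy.first()` returns the first *non-null* value in each column, so on a DataFrame with missing values it can combine values from different rows. A minimal sketch, assuming a hypothetical second column `col2` with a NaN in A's first row, contrasting it with `GroupBy.head(1)`, which returns the literal first row of each group:

```python
import numpy as np
import pandas as pd

# The question's data as a DataFrame; col2 is hypothetical and has
# a missing value in A's first row
df = pd.DataFrame({'col1': [120, 90, 80, 80, 50, 120, 150, 150],
                   'col2': [np.nan, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]},
                  index=list('AAABBCDD'))

# first() takes the first non-null value per column within each group,
# so A's col2 value comes from A's second row
firsts = df.groupby(level=0).first()

# head(1) returns the actual first row of each group, NaN included,
# and keeps the original index labels
heads = df.groupby(level=0).head(1)
```

Here `firsts.loc['A', 'col2']` is 1.0 while `heads.loc['A', 'col2']` is NaN. For the question's own data, which has no missing values, both approaches give the same rows.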

Jianxun Li

Somehow I saw the other response first. You both gave the same answer. Since you were here first, I'm choosing yours. – ba_ul Jun 22 '15 at 18:42

score 1 · Answer 2 · answered Jun 22 '15 at 18:34

Here's an example:

We start with a dataframe that looks like this:

|   | col |
|---|-----|
| a |   1 |
| b |   2 |
| b |   3 |
| c |   4 |
| c |   5 |
| c |   6 |

Use the `groupby` method:

```python
import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'b', 'c', 'c', 'c'],
                  data=[1, 2, 3, 4, 5, 6], columns=['col'])

# group rows by their index label and keep the first value in each group
group = df.groupby(level=0)
df = group.first()
```

And end up with:

|   | col |
|---|-----|
| a |   1 |
| b |   2 |
| c |   4 |

Use `group.last()` instead if you want to keep the last value for each index label.
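A minimal sketch of the `last` variant on this answer's example frame, alongside a mask-based equivalent using `Index.duplicated(keep='last')` (an alternative not mentioned in the answer):

```python
import pandas as pd

# The answer's example frame: index labels a, b, b, c, c, c
df = pd.DataFrame(index=['a', 'b', 'b', 'c', 'c', 'c'],
                  data=[1, 2, 3, 4, 5, 6], columns=['col'])

# Last value per label via groupby, as the answer suggests
last_grouped = df.groupby(level=0).last()

# Same rows via a boolean mask: keep='last' flags every occurrence
# of a label except its final one
last_masked = df[~df.index.duplicated(keep='last')]
```

Both keep `1`, `3` and `6` for `a`, `b` and `c`. The mask selects whole original rows, so unlike `last()` it does not skip missing values column by column.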

Fantastic. Didn't know `first` and `last` exist. Thanks! – ba_ul Jun 22 '15 at 18:40
