Pandas dataframe group by order

Question

I have the input dataframe:

import pandas

df1 = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Mallory", "Bob", "Bob", "Mallory", "Alice"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland", "Portland", "Seattle", "Seattle"],
})

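I want to group by Name, but only across consecutive rows: each run of identical adjacent names should collapse into one entry, and a name that reappears later should start a new group. The output should be: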

["Alice","Bob","Mallory","Bob","Mallory", "Alice"]

I couldn't find an efficient way to do this. Is there a way that avoids iterating over every row?

`df1.groupby(df1.Name.ne(df1.Name.shift()).cumsum()).Name.first()` — user3483203, Oct 15 '18 at 15:26

score 1 · Accepted Answer · answered Oct 15 '18 at 15:25

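You can compare each name with the one in the previous row, take the cumulative sum of those changes to label each consecutive run, and group by that label: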

df1.groupby((df1['Name'] != df1['Name'].shift()).cumsum()).first()

Yields:

         Name      City
Name                   
1       Alice   Seattle
2         Bob   Seattle
3     Mallory  Portland
4         Bob  Portland
5     Mallory   Seattle
6       Alice   Seattle
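
To see why this works, it can help to look at the intermediate values. The sketch below rebuilds the question's dataframe and prints the grouping key on its own; the variable names `changed` and `run_id` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Mallory", "Bob", "Bob", "Mallory", "Alice"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland", "Portland", "Seattle", "Seattle"],
})

# True wherever the name differs from the previous row.
# The first row is compared against NaN, so it is always True.
changed = df1["Name"] != df1["Name"].shift()

# Running count of changes: each consecutive run of the same name
# receives its own integer label (illustrative name: run_id).
run_id = changed.cumsum()

print(changed.tolist())
print(run_id.tolist())
```

Rows that share a label in `run_id` belong to the same consecutive run, which is why `first()` returns one row per run rather than one row per distinct name.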

If you only want the 'Name' column, as a NumPy array:

df1.groupby((df1['Name'] != df1['Name'].shift()).cumsum())['Name'].first().values

Yields:

['Alice' 'Bob' 'Mallory' 'Bob' 'Mallory' 'Alice']
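
If only the names are needed, the same comparison could probably be used directly as a boolean mask, skipping the groupby step. This is an alternative sketch rather than part of the accepted answer; `mask` and `names` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Mallory", "Bob", "Bob", "Mallory", "Alice"],
})

# Keep only the rows where the name differs from the previous row,
# i.e. the first row of every consecutive run.
mask = df1["Name"] != df1["Name"].shift()
names = df1.loc[mask, "Name"].tolist()

print(names)
```

Note that because NaN never compares equal to NaN, consecutive missing names would each be treated as the start of a new run in both approaches.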
