Grouping by multiple parameters using Pandas dataframe

Question

I have a data frame that I would like to group by two criteria: (1) consecutive numbers in the first column and (2) matching values in the second column.

data frame:

In [20]: import pandas as pd

In [21]: df1 = pd.DataFrame({"res": [30, 31, 35, 36], "ss": ["H", "H", "H", "E"], "AA": ["A", "B", "C", "D"]})

In [22]: df1
Out[22]:
   res ss AA
0  30  H  A
1  31  H  B
2  35  H  C
3  36  E  D

Desired output:

group 1: (30, H, A), (31, H, B)

group 2: (35, H, C)

group 3: (36, E, D)

Group 1 includes the first two rows because 30 and 31 are consecutive and their second-column values match. Group 2 starts because 31 and 35 in the first column are not consecutive. Group 3 starts because H and E in the second column do not match.

I am trying to use groupby and enumerate together, but I can't seem to combine them.

Identify groups of continuous numbers in a list

grouping rows in list in pandas groupby

I appreciate any tips on how to combine the selections.

I don't understand your desired output at all. What is "AB, C, D", and how do you get it from `df1`? Please edit your question to be more specific. — DSM, May 19 '16 at 18:06

score 0 · Answer 1 · answered May 19 '16 at 18:15

I took some liberties with what you meant. Let me know if I understood correctly.

Setup: copy & paste to set up problem

import pandas as pd

df1 = pd.DataFrame({"res": [30, 31, 35, 36],
                    "ss": ["H", "H", "H", "E"],
                    "AA": ["A", "C", "D", "B"]})  # I made 'F' a 'B'

df1

df1 looks like:

  AA  res ss
0  A   30  H
1  C   31  H
2  D   35  H
3  B   36  E

I believe you just want to sort, not group by.

Solution

print(df1.sort_values(['AA', 'res']))

Looks like:

  AA  res ss
0  A   30  H
3  B   36  E
1  C   31  H
2  D   35  H

However, parts of your question don't make sense. Hopefully this is helpful.

Sorry, I mixed up the columns. Column 1 should have numbers, column 2 should have either "H" or "E", and the last column should have letters. I want to create groups where the first-column values are consecutive and the second-column values match, i.e. 30, H, A and 31, H, B form one group. 35, H, D is a new group because 31 and 35 are not consecutive. 36, E, C would be a new group because H and E do not match. — Vonler01, May 19 '16 at 18:44
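For what it's worth, the grouping described in that comment could probably be done with a shift-and-cumsum pattern rather than a sort. This is only a sketch under a few assumptions: the rows are already in the order you want, "consecutive" means res goes up by exactly 1 from the previous row, and the data is the df1 from the question. The column name "group" and the variable new_group are made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({"res": [30, 31, 35, 36],
                    "ss": ["H", "H", "H", "E"],
                    "AA": ["A", "B", "C", "D"]})

# A new group starts wherever res is not exactly previous res + 1,
# or ss differs from the previous row. The first row always starts a group
# because diff() is NaN there.
new_group = (df1["res"].diff() != 1) | (df1["ss"] != df1["ss"].shift())

# Running total of the "group starts here" flags gives a 1-based label per row.
# "group" is a hypothetical column name used only in this sketch.
df1["group"] = new_group.cumsum()

# Iterate over the groups and show their rows as (res, ss, AA) tuples.
for label, rows in df1.groupby("group"):
    print("group", label, list(rows[["res", "ss", "AA"]].itertuples(index=False, name=None)))
```

Rows 0 and 1 should end up in one group and rows 2 and 3 in a group each, which matches the desired output in the question. If the frame is not already sorted by res, it would need sorting first for diff() to make sense.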
