Python
Using itertools.groupby()
method
The ìtertools.groupby()
function returns consecutive keys and groups from the iterable.
You can read more about how to use itertools.groupby()
on the documentation or on this stackoverflow post.
import pandas as pd
import itertools
df = pd.DataFrame({
'col_a': ['A','B','B','C','D','D','E','A','A'],
'col_b': [1,2,3,4,5,6,3,2,1]
})
groups = []
for key, group in itertools.groupby(df.index, lambda x: df['col_a'][x]):
col_b_values = df.loc[group, 'col_b'].tolist() # Extract the 'col_b' values as a list
groups.append({
'col_a': key,
'min': min(col_b_values),
'max': max(col_b_values)
})
pd.DataFrame(groups)
Output:
col_a min max
0 A 1 1
1 B 2 3
2 C 4 4
3 D 5 6
4 E 3 3
5 A 1 2
Using Pandas shift().cumsum()
and groupby
methods
You can create a new column which identifies consecutive rows in col_a
using shift().cumsum()
import pandas as pd
df = pd.DataFrame({
'col_a': ['A','B','B','C','D','D','E','A','A'],
'col_b': [1,2,3,4,5,6,3,2,1]
})
df['group'] = (df.col_a != df.col_a.shift()).cumsum()
The output will look like:
col_a col_b group
0 A 1 1
1 B 2 2
2 B 3 2
3 C 4 3
4 D 5 4
5 D 6 4
6 E 3 5
7 A 2 6
8 A 1 6
And then group by the df['group'] column:
df.groupby(['group','col_a'], as_index=True).agg(min=('col_b','min'), max=('col_b', 'max')).reset_index(level='group',drop=True).reset_index()
Output:
col_a min max
0 A 1 1
1 B 2 3
2 C 4 4
3 D 5 6
4 E 3 3
5 A 1 2
Excel
Using LAMBDA
and OFFSET
functions
With Microsoft's introduction of dynamic arrays in the recent years, we can implement approximately same logic in Excel without the use of VBA.
First we will create an array which for each row of col_a identifies whether it is a duplicate consecutive. Then we create a range that generates a unique consecutive group ID and then add min/max arrays to get minimum and maximum col_b values for each consecutive group ID. Finally we group it by group ID and drop duplicates.
The function is:
=LET(
col_a,$A$2:$A$10,
col_b,$B$2:$B$10,
is_dup,MAP(col_a,OFFSET(col_a,-1,0),LAMBDA(a_1,a_2,a_1=a_2)),
groupids,SCAN(0,is_dup,LAMBDA(a,r,IF(r,0+a,a+1))),
_min_val,BYROW(groupids,LAMBDA(r,TAKE(SORT(FILTER(col_b,groupids=r),,1),1))),
_max_val,BYROW(groupids,LAMBDA(r,TAKE(SORT(FILTER(col_b,groupids=r),,-1),1))),
VSTACK({"col_a","min","max"},CHOOSECOLS(UNIQUE(HSTACK(col_a,groupids,_min_val,_max_val)),1,3,4))
)
Excel implementation
Other methods using Excel
You might be able to achieve similar result using the Scripting.Dictionary
object in VBA or combination of INDEX
and AGGREGATE
functions (see this stackoverflow post)