Input data:
            1           0.0   6.400000000
            2   6.400000000          27.0
            2          27.0  27.100000000
            2  27.100000000  27.400000000
            2  27.400000000  27.700000000
            2  27.700000000  30.600000000
            1  30.600000000          31.0
            1          31.0  36.600000000
            1  36.600000000  36.900000000
            1  36.900000000  37.100000000
            1  37.100000000  37.100000000
            1  37.100000000  37.300000000
            1  37.300000000  37.800000000
            1  37.800000000  38.900000000
            1  38.900000000  39.200000000
            1  39.200000000  39.300000000
            1  39.300000000  39.500000000
            1  39.500000000  39.700000000
            2  39.700000000  42.600000000
            2  42.600000000  42.600000000
            2  42.600000000  42.800000000
            1  42.800000000  44.900000000
            1  44.900000000  45.600000000
            2  45.600000000          51.0
            1          51.0  51.800000000
Expected output:

            1           0.0   6.400000000
            2   6.400000000  30.600000000
            1  30.600000000  39.500000000
... and so on

I want to transform my data into the above format. I tried groupby with min and max on the columns, but it doesn't give me the desired result.

Rishabh Ojha

2 Answers


From your expected result, it appears that each "group" is a run of contiguous id values. For this, you can use the compare-cumsum-groupby pattern to label the runs, and then agg to get the min and max values per run.

import pandas as pd

# Sample data.
df = pd.DataFrame(
    {'id': [1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1],
     'A': [0.0, 6.4, 27.0, 27.1, 27.4, 27.7, 30.6, 31.0, 36.6, 36.9, 37.1, 37.1, 37.3, 37.8, 38.9, 39.2, 39.3, 39.5, 39.7, 42.6, 42.6, 42.8, 44.9, 45.6, 51.0],
     'B': [6.4, 27.0, 27.1, 27.4, 27.7, 30.6, 31.0, 36.6, 36.9, 37.1, 37.1, 37.3, 37.8, 38.9, 39.2, 39.3, 39.5, 39.7, 42.6, 42.6, 42.8, 44.9, 45.6, 51.0, 51.8]}
)

# Solution: label contiguous runs of `id`, then aggregate each run.
out = (df
       .groupby(df['id'].ne(df['id'].shift()).cumsum())
       .agg({'id': 'first', 'A': 'min', 'B': 'max'})
       .set_index('id'))
print(out)
       A     B
id            
1    0.0   6.4
2    6.4  30.6
1   30.6  39.7
2   39.7  42.8
1   42.8  45.6
2   45.6  51.0
1   51.0  51.8
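To see how the pattern works, it can help to inspect the intermediate grouping key on its own: comparing each id against the previous row flags the start of every new run, and the cumulative sum turns those flags into run labels. A small sketch using the same ids as the sample data above:

```python
import pandas as pd

# Same ids as in the sample data above.
ids = pd.Series([1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
                 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1])

# True wherever the id differs from the previous row; the first row
# compares against NaN, so it is always True.
starts = ids.ne(ids.shift())

# Cumulative sum of the booleans labels each contiguous run 1, 2, 3, ...
key = starts.cumsum()
print(key.unique())  # → [1 2 3 4 5 6 7], i.e. seven contiguous runs
```

Passing this key to groupby is what makes each contiguous run its own group, which is why the result has seven rows.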
Alexander

It seems the data are not numeric, so first convert them to numeric, then use GroupBy.agg with named aggregation (available in pandas 0.25+):

df[['b','c']] = df[['b','c']].astype(float)
df1 = (df.groupby('a').agg(min_b=('b','min'),
                           max_b=('b','max'),
                           min_c=('c','min'),
                           max_c=('c','max'))
                      .reset_index())
print(df1)
   a  min_b  max_b  min_c  max_c
0  1    0.0   51.0    6.4   51.8
1  2    6.4   45.6   27.0   51.0
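If the string columns might contain values that fail to parse, pd.to_numeric with errors='coerce' is a more forgiving alternative to astype. A sketch with hypothetical string data mirroring the setup below ('bad' is an invented unparseable value):

```python
import pandas as pd

# Hypothetical strings mirroring the setup below; 'bad' fails to parse.
raw = pd.DataFrame({'a': [1, 2, 2],
                    'b': ['0.0', '6.400000000', 'bad'],
                    'c': ['6.400000000', '27.0', '27.100000000']})

# errors='coerce' turns unparseable strings into NaN instead of raising,
# and min/max then skip NaN by default.
raw[['b', 'c']] = raw[['b', 'c']].apply(pd.to_numeric, errors='coerce')
print(raw['b'].tolist())  # → [0.0, 6.4, nan]
```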

Setup:

   a             b             c
0  1           0.0   6.400000000
1  2   6.400000000          27.0
2  2          27.0  27.100000000
3  2  27.100000000  27.400000000
4  2  27.400000000  27.700000000
...

If you need each group separately (contiguous runs of a):

df[['b','c']] = df[['b','c']].astype(float)
df1 = (df.groupby(df['a'].ne(df['a'].shift()).cumsum())
         .agg(a=('a','first'),
              min_b=('b','min'),
              max_b=('b','max'),
              min_c=('c','min'),
              max_c=('c','max'))
         .reset_index(drop=True))
print(df1)
   a  min_b  max_b  min_c  max_c
0  1    0.0    0.0    6.4    6.4
1  2    6.4   27.7   27.0   30.6
2  1   30.6   39.5   31.0   39.7
3  2   39.7   42.6   42.6   42.8
4  1   42.8   44.9   44.9   45.6
5  2   45.6   45.6   51.0   51.0
6  1    0.0   51.0    6.4   51.8
7  2    6.4    6.4   30.6   30.6
8  1   30.6   30.6   39.5   39.5
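If by "separately" you instead want each contiguous run as its own DataFrame rather than one aggregated row per run, you can iterate over the same groupby. A sketch on a small invented frame (the dict comprehension and the `groups` name are illustrative, not part of the answers above):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [0.0, 6.4, 27.0, 30.6, 31.0],
                   'c': [6.4, 27.0, 27.1, 31.0, 36.6]})

# Keyed by run number (1, 2, 3, ...); each value holds that run's rows.
groups = {run: g for run, g in
          df.groupby(df['a'].ne(df['a'].shift()).cumsum())}
print(len(groups))  # → 3 runs: [1], [2, 2], [1, 1]
```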
jezrael