Input data:
            1           0.0   6.400000000
            2   6.400000000          27.0
            2          27.0  27.100000000
            2  27.100000000  27.400000000
            2  27.400000000  27.700000000
            2  27.700000000  30.600000000
            1  30.600000000          31.0
            1          31.0  36.600000000
            1  36.600000000  36.900000000
            1  36.900000000  37.100000000
            1  37.100000000  37.100000000
            1  37.100000000  37.300000000
            1  37.300000000  37.800000000
            1  37.800000000  38.900000000
            1  38.900000000  39.200000000
            1  39.200000000  39.300000000
            1  39.300000000  39.500000000
            1  39.500000000  39.700000000
            2  39.700000000  42.600000000
            2  42.600000000  42.600000000
            2  42.600000000  42.800000000
            1  42.800000000  44.900000000
            1  44.900000000  45.600000000
            2  45.600000000          51.0
            1          51.0  51.800000000
Expected output:

            1           0.0   6.400000000
            2   6.400000000  30.600000000
            1  30.600000000  39.500000000
... and so on

I want to transform my data into the above format. I tried groupby with min and max on the columns, but it doesn't give me the desired result.

Rishabh Ojha

2 Answers


From your expected result, it appears that each "group" is a run of contiguous id values. For this, you can use the compare-cumsum-groupby pattern to label the runs, and then agg to get the min and max values per run.

import pandas as pd

# Sample data.
df = pd.DataFrame(
    {'id': [1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1],
     'A': [0.0, 6.4, 27.0, 27.1, 27.4, 27.7, 30.6, 31.0, 36.6, 36.9, 37.1, 37.1, 37.3, 37.8, 38.9, 39.2, 39.3, 39.5, 39.7, 42.6, 42.6, 42.8, 44.9, 45.6, 51.0],
     'B': [6.4, 27.0, 27.1, 27.4, 27.7, 30.6, 31.0, 36.6, 36.9, 37.1, 37.1, 37.3, 37.8, 38.9, 39.2, 39.3, 39.5, 39.7, 42.6, 42.6, 42.8, 44.9, 45.6, 51.0, 51.8]}
)

# Solution: label contiguous runs of `id`, then aggregate each run.
out = (df
       .groupby(df['id'].ne(df['id'].shift()).cumsum())
       .agg({'id': 'first', 'A': 'min', 'B': 'max'})
       .set_index('id'))
print(out)
       A     B
id            
1    0.0   6.4
2    6.4  30.6
1   30.6  39.7
2   39.7  42.8
1   42.8  45.6
2   45.6  51.0
1   51.0  51.8
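To see how the pattern works, it can help to inspect the intermediate grouping key on its own: comparing each id against the previous row flags the start of every new run, and the cumulative sum turns those flags into run labels. A small sketch using the same ids as the sample data above:

```python
import pandas as pd

# Same ids as in the sample data above.
ids = pd.Series([1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
                 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1])

# True wherever the id differs from the previous row; the first row
# compares against NaN, so it is always True.
starts = ids.ne(ids.shift())

# Cumulative sum of the booleans labels each contiguous run 1, 2, 3, ...
key = starts.cumsum()
print(key.unique())  # → [1 2 3 4 5 6 7], i.e. seven contiguous runs
```

Passing this key to groupby is what makes each contiguous run its own group, which is why the result has seven rows.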
Alexander

It seems the data are not numeric, so first convert them to numeric, then use GroupBy.agg with named aggregation (available in pandas 0.25+):

df[['b','c']] = df[['b','c']].astype(float)
df1 = (df.groupby('a').agg(min_b=('b','min'),
                           max_b=('b','max'),
                           min_c=('c','min'),
                           max_c=('c','max'))
                      .reset_index())
print(df1)
   a  min_b  max_b  min_c  max_c
0  1    0.0   51.0    6.4   51.8
1  2    6.4   45.6   27.0   51.0
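If the string columns might contain values that fail to parse, pd.to_numeric with errors='coerce' is a more forgiving alternative to astype. A sketch with hypothetical string data mirroring the setup below ('bad' is an invented unparseable value):

```python
import pandas as pd

# Hypothetical strings mirroring the setup below; 'bad' fails to parse.
raw = pd.DataFrame({'a': [1, 2, 2],
                    'b': ['0.0', '6.400000000', 'bad'],
                    'c': ['6.400000000', '27.0', '27.100000000']})

# errors='coerce' turns unparseable strings into NaN instead of raising,
# and min/max then skip NaN by default.
raw[['b', 'c']] = raw[['b', 'c']].apply(pd.to_numeric, errors='coerce')
print(raw['b'].tolist())  # → [0.0, 6.4, nan]
```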

Setup:

   a             b             c
0  1           0.0   6.400000000
1  2   6.400000000          27.0
2  2          27.0  27.100000000
3  2  27.100000000  27.400000000
4  2  27.400000000  27.700000000
...

If you need each group separately (contiguous runs of a):

df[['b','c']] = df[['b','c']].astype(float)
df1 = (df.groupby(df['a'].ne(df['a'].shift()).cumsum())
         .agg(a=('a','first'),
              min_b=('b','min'),
              max_b=('b','max'),
              min_c=('c','min'),
              max_c=('c','max'))
         .reset_index(drop=True))
print(df1)
   a  min_b  max_b  min_c  max_c
0  1    0.0    0.0    6.4    6.4
1  2    6.4   27.7   27.0   30.6
2  1   30.6   39.5   31.0   39.7
3  2   39.7   42.6   42.6   42.8
4  1   42.8   44.9   44.9   45.6
5  2   45.6   45.6   51.0   51.0
6  1    0.0   51.0    6.4   51.8
7  2    6.4    6.4   30.6   30.6
8  1   30.6   30.6   39.5   39.5
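If by "separately" you instead want each contiguous run as its own DataFrame rather than one aggregated row per run, you can iterate over the same groupby. A sketch on a small invented frame (the dict comprehension and the `groups` name are illustrative, not part of the answers above):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [0.0, 6.4, 27.0, 30.6, 31.0],
                   'c': [6.4, 27.0, 27.1, 31.0, 36.6]})

# Keyed by run number (1, 2, 3, ...); each value holds that run's rows.
groups = {run: g for run, g in
          df.groupby(df['a'].ne(df['a'].shift()).cumsum())}
print(len(groups))  # → 3 runs: [1], [2, 2], [1, 1]
```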
jezrael