-1

I have a pandas dataframe like this

    snapDate     instance   waitEvent                   AvgWaitInMs
0   2015-Jul-03  XX         gc cr block 3-way               1
1   2015-Jun-29  YY         gc current block 3-way          2
2   2015-Jul-03  YY         gc current block 3-way          1
3   2015-Jun-29  XX         gc current block 3-way          2
4   2015-Jul-01  XX         gc current block 3-way          2
5   2015-Jul-01  YY         gc current block 3-way          2
6   2015-Jul-03  XX         gc current block 3-way          2
7   2015-Jul-03  YY         log file sync                   9
8   2015-Jun-29  XX         log file sync                   8
9   2015-Jul-03  XX         log file sync                   8
10  2015-Jul-01  XX         log file sync                   8
11  2015-Jul-01  YY         log file sync                   9
12  2015-Jun-29  YY         log file sync                   8

I need to transform this to

snapDate        instance    gc cr block 3-way    gc current block 3-way  log file sync  
2015-Jul-03       XX              1                      Na                  8
2015-Jun-29       YY              Na                     2                   8 
2015-Jul-03       YY              Na                     1                   9
...

I tried pivot but it returns an error dfWaits.pivot(index = 'snapDate', columns = 'waitEvent', values = 'AvgWaitInMs') Index contains duplicate entries, cannot reshape

The result should be another dataFrame

jhon.smith
  • 1,963
  • 6
  • 30
  • 56

2 Answers2

2

You can also use pivot_table:

df.pivot_table(index=['snapDate','instance'], columns='waitEvent', values='AvgWaitInMs')

Out[64]:
waitEvent             gc cr block 3-way  gc current block 3-way  log file sync
snapDate    instance
2015-Jul-01 XX                      NaN                       2              8
            YY                      NaN                       2              9
2015-Jul-03 XX                        1                       2              8
            YY                      NaN                       1              9
2015-Jun-29 XX                      NaN                       2              8
            YY                      NaN                       2              8

Data:

I used the following txt file as input (with read_csv from pandas to get the data.frame):

snapDate;instance;waitEvent;AvgWaitInMs
0;2015-Jul-03;XX;gc cr block 3-way;1
1;2015-Jun-29;YY;gc current block 3-way;2
2;2015-Jul-03;YY;gc current block 3-way;1
3;2015-Jun-29;XX;gc current block 3-way;2
4;2015-Jul-01;XX;gc current block 3-way;2
5;2015-Jul-01;YY;gc current block 3-way;2
6;2015-Jul-03;XX;gc current block 3-way;2
7;2015-Jul-03;YY;log file sync;9
8;2015-Jun-29;XX;log file sync;8
9;2015-Jul-03;XX;log file sync;8
10;2015-Jul-01;XX;log file sync;8
11;2015-Jul-01;YY;log file sync;9
12;2015-Jun-29;YY;log file sync;8
Colonel Beauvel
  • 30,423
  • 11
  • 47
  • 87
1

Here is one way to reshape the dataframe to something similar to what you want. Let me know if you have any additional specific requirements on the resulting dataframe.

import pandas as pd

# your data
# ====================================
print(df)

       snapDate instance               waitEvent  AvgWaitInMs
0                                                            
0   2015-Jul-03       XX       gc cr block 3-way            1
1   2015-Jun-29       YY  gc current block 3-way            2
2   2015-Jul-03       YY  gc current block 3-way            1
3   2015-Jun-29       XX  gc current block 3-way            2
4   2015-Jul-01       XX  gc current block 3-way            2
5   2015-Jul-01       YY  gc current block 3-way            2
6   2015-Jul-03       XX  gc current block 3-way            2
7   2015-Jul-03       YY           log file sync            9
8   2015-Jun-29       XX           log file sync            8
9   2015-Jul-03       XX           log file sync            8
10  2015-Jul-01       XX           log file sync            8
11  2015-Jul-01       YY           log file sync            9
12  2015-Jun-29       YY           log file sync            8

# processing
# ====================================
df_temp = df.set_index(['snapDate', 'instance', 'waitEvent']).unstack().fillna(0)

df_temp.columns = df_temp.columns.get_level_values(1).values

df_temp = df_temp.reset_index('instance')

print(df_temp)

            instance  gc cr block 3-way  gc current block 3-way  log file sync
snapDate                                                                      
2015-Jul-01       XX                  0                       2              8
2015-Jul-01       YY                  0                       2              9
2015-Jul-03       XX                  1                       2              8
2015-Jul-03       YY                  0                       1              9
2015-Jun-29       XX                  0                       2              8
2015-Jun-29       YY                  0                       2              8
Jianxun Li
  • 24,004
  • 10
  • 58
  • 76
  • Almost there is there a way to repeat the snapDate instead of the blanks. – jhon.smith Jul 15 '15 at 08:30
  • @jhon.smith sure, just add `.reset_index()`. I've updated the code. Please have a look. – Jianxun Li Jul 15 '15 at 08:32
  • Hmm My dataframe had 12 rows but when i tried the unstack operation the resulting dataframe has only 6 rows not exactly what i want.My resulting dataframe should also have 12 rows – jhon.smith Jul 15 '15 at 08:41
  • @jhon.smith I don't think you can keep the same number of rows as your original df because some rows have to mov to columns. For example, say `Jul-03` data, row `0, 6, 9` are all records about the same `snapDate` with instance `XX`. So doing a pivot would reshape these 3 rows to only one row because those data have been moved to columns. – Jianxun Li Jul 15 '15 at 08:44