0

I have a pandas dataframe with around ~70millions rows and 3 columns (3x70,000,000). Let's simplify it to the following:

df = pd.DataFrame([[1, '2010-01-31', 20], 
                   [1, '2010-02-28', 20], 
                   [1, '2016-07-31', 23],
                   [2, '2016-07-31', 15],
                   [2, '2016-08-31', 15],
                   [3, '2013-03-31', 19], 
                   [3, '2013-04-30', 23],
                   [3, '2013-05-31', 23],
                   [3, '2013-06-30', 23]], columns=['User', 'Date', 'Salary'])

╔══════╦════════════╦════════╗
║ User ║    Date    ║ Salary ║
╠══════╬════════════╬════════╣
║    1 ║ 2010-01-31 ║     20 ║
║    1 ║ 2010-02-28 ║     20 ║
║    1 ║ 2010-03-31 ║     23 ║
║    2 ║ 2016-07-31 ║     15 ║
║    2 ║ 2016-08-31 ║     15 ║
║    3 ║ 2013-03-31 ║     19 ║
║    3 ║ 2013-04-30 ║     23 ║
║    3 ║ 2013-05-31 ║     23 ║
║    3 ║ 2013-06-30 ║     23 ║
╚══════╩════════════╩════════╝

I'd like to transpose it as follows:

╔══════╦══════════╦══════════╦══════════╦══════════╗
║ User ║ Salary_1 ║ Salary_2 ║ Salary_3 ║ Salary_4 ║
╠══════╬══════════╬══════════╬══════════╬══════════╣
║    1 ║       20 ║       20 ║ 23       ║ NaN      ║
║    2 ║       15 ║       15 ║ NaN      ║ NaN      ║
║    3 ║       19 ║       23 ║ 23       ║ 23       ║
╚══════╩══════════╩══════════╩══════════╩══════════╝

I tried this:

(df.pivot_table(index='User', columns='Date', aggfunc=len, fill_value=0))

But couldn't acchieve what I want :(

Snedecor
  • 689
  • 1
  • 6
  • 14
  • 2
    `df.set_index(['User',df.groupby('User').cumcount().add(1)])['Salary'].unstack()` refer Q10 of dupe link – anky Jun 11 '20 at 16:42
  • 1
    adding column headers do @anky's solution: `df.set_index(['User',df.groupby('User').cumcount().add(1).map('Salary_{}'.format)])['Salary'].unstack()` – Stef Jun 11 '20 at 17:04

0 Answers0