I have a pandas dataframe with around ~70millions rows and 3 columns (3x70,000,000). Let's simplify it to the following:
df = pd.DataFrame([[1, '2010-01-31', 20],
[1, '2010-02-28', 20],
[1, '2016-07-31', 23],
[2, '2016-07-31', 15],
[2, '2016-08-31', 15],
[3, '2013-03-31', 19],
[3, '2013-04-30', 23],
[3, '2013-05-31', 23],
[3, '2013-06-30', 23]], columns=['User', 'Date', 'Salary'])
╔══════╦════════════╦════════╗
║ User ║ Date ║ Salary ║
╠══════╬════════════╬════════╣
║ 1 ║ 2010-01-31 ║ 20 ║
║ 1 ║ 2010-02-28 ║ 20 ║
║ 1 ║ 2010-03-31 ║ 23 ║
║ 2 ║ 2016-07-31 ║ 15 ║
║ 2 ║ 2016-08-31 ║ 15 ║
║ 3 ║ 2013-03-31 ║ 19 ║
║ 3 ║ 2013-04-30 ║ 23 ║
║ 3 ║ 2013-05-31 ║ 23 ║
║ 3 ║ 2013-06-30 ║ 23 ║
╚══════╩════════════╩════════╝
I'd like to transpose it as follows:
╔══════╦══════════╦══════════╦══════════╦══════════╗
║ User ║ Salary_1 ║ Salary_2 ║ Salary_3 ║ Salary_4 ║
╠══════╬══════════╬══════════╬══════════╬══════════╣
║ 1 ║ 20 ║ 20 ║ 23 ║ NaN ║
║ 2 ║ 15 ║ 15 ║ NaN ║ NaN ║
║ 3 ║ 19 ║ 23 ║ 23 ║ 23 ║
╚══════╩══════════╩══════════╩══════════╩══════════╝
I tried this:
(df.pivot_table(index='User', columns='Date', aggfunc=len, fill_value=0))
But couldn't acchieve what I want :(