3

I have a pandas dataframe structured like this:

+---------------+---------------------------+--------+
|     Email     |          Paid at          |  Name  |
+---------------+---------------------------+--------+
| john@mail.com | 2019-11-20 12:23:06 +0100 | #10710 |
| ed@mail.com   | 2019-11-20 11:36:24 +0100 | #10709 |
| john@mail.com | 2019-11-19 14:58:41 +0100 | #10700 |
| ed@mail.com   | 2019-11-19 14:41:30 +0100 | #10699 |
| dev@mail.com  | 2019-11-19 14:20:26 +0100 | #10697 |
+---------------+---------------------------+--------+

And my final goal is to aggregate all the transactions made by one user, in a format like this one:

+---------------+---------------------------+--------+---------------------------+--------+--+
|     Email     |          Paid at          |  Name  |          Paid at          |  Name  |  |
+---------------+---------------------------+--------+---------------------------+--------+--+
| john@mail.com | 2019-11-20 12:23:06 +0100 | #10710 | 2019-11-19 14:58:41 +0100 | #10700 |  |
| ed@mail.com   | 2019-11-20 11:36:24 +0100 | #10709 | 2019-11-19 14:41:30 +0100 | #10699 |  |
| dev@mail.com  | 2019-11-19 14:20:26 +0100 | #10697 |                           |        |  |
+---------------+---------------------------+--------+---------------------------+--------+--+

My starting dataframe has been constructed like this:

df = pd.DataFrame({'Email':['john@mail.com', 'ed@mail.com', 
                       'john@mail.com', 'ed@mail.com', 'dev@mail.com'],
             'Paid at':['2019-11-20 12:23:06 +0100', 
                        '2019-11-20 11:36:24 +0100', 
                        '2019-11-19 14:58:41 +0100', 
                        '2019-11-19 14:41:30 +0100',
                       '2019-11-19 14:20:26 +0100'],
             'Name':['#10710', '#10709', '#10700', '#10699', '#10697']})

I have tried using the pivot functions df.pivot(index='Email', columns='Name', values='Paid at') and I can get a dataframe for which every timestamp is a column and the index in the email, but I am stuck in understanding how I can create the columns I want.

sanna
  • 1,398
  • 5
  • 16
  • 24
  • Mark link Q10 . – BENY Nov 24 '19 at 15:02
  • @WeNYoBen still a duplicate? You marked it as a duplicate of a question regarding pivot tables in pandas. The solution does not contain a pivot table at all. Why did you do this? – sanna Nov 25 '19 at 08:45
  • This still a dup , try to look at the whole link I shared, your question can be solved by q10 + q11 – BENY Nov 25 '19 at 14:40

1 Answers1

1

Use:

#convert column to datetimes
df['Paid at'] = pd.to_datetime(df['Paid at'])
#descending sorting by datetimes
df = df.sort_values('Paid at', ascending=False)
#create MultiIndex by counter with cumcount, reshape by unstack, change order of columns
df = (df.set_index(['Email', df.groupby('Email', sort=False).cumcount()])
        .unstack()
        .sort_index(axis=1, level=[1,0], ascending=[True, False]))
#flatten MultIndex in columns
df.columns = df.columns.map('{0[0]}_{0[1]}'.format)
#convert index to column
df = df.reset_index()
print (df)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, this is indeed a nice approach, however, it can happen that a transaction happens more than 2 times, so the final dataframe should also 'capture' this. How can it be modified? – sanna Nov 24 '19 at 15:06
  • @sanna - Not sure if understand, is possible create sample data for see it? – jezrael Nov 24 '19 at 15:07
  • `df = pd.DataFrame({'Email':['john@mail.com', 'ed@mail.com', 'john@mail.com', 'ed@mail.com', 'dev@mail.com', 'ed@mail.com'], 'Paid at':['2019-11-20 12:23:06 +0100', '2019-11-20 11:36:24 +0100', '2019-11-19 14:58:41 +0100', '2019-11-19 14:41:30 +0100', '2019-11-19 14:20:26 +0100', '2019-11-14 20:13:02 +0100'], 'Name':['#10710', '#10709', '#10700', '#10699', '#10697', '#10623']})` – sanna Nov 24 '19 at 15:09
  • @sanna - What is expected output with new data? – jezrael Nov 24 '19 at 15:11
  • In this case, Ed did 3 transactions, while John only two. How to create n columns (probably corresponding to the maximum number of transactions occurred per customer) so that if one user did 2 transactions only two columns will have dates, and if another user did four only four columns are filled. – sanna Nov 24 '19 at 15:11
  • I am creating a table for you! – sanna Nov 24 '19 at 15:13
  • 1
    Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/202976/discussion-between-sanna-and-jezrael). – sanna Nov 24 '19 at 15:21
  • @sanna - Super, I hope working nice my solution. Don't forget to accept the answer, if it suits you! :) – jezrael Nov 25 '19 at 06:39