How do I add duplicates from a CSV to the end of a specific row over several instances using Python?

Question

I have two CSV's that I would like to merge, one looks like this (with many more customers)

The tables and csv's below have changed from the original versions for the example in the "follow up question." Stack Overflow keeps not accepting the edits to the Profiles and Events Tables, but the csv's below are correct.

Profiles Table (`df_profiles`)

ID	Name	Email
12	Mike	Mike@AOL.com
78	Jane	Jane@AOL.com

And the other looks like this (with many more events)

Events Table (`df_events`)

ID	Name	Event Type	Event Date
12	Mike	upgrade	10/02/20
78	Jane	upgrade	9/18/20
12	Mike	cancel	10/30/20
12	Mike	upgrade	11/25/20
78	Jane	cancel	11/14/20

In the end, I would like the final CSV export to look like this (with more customers and events)

ID	Name	Email	Event Type 1	Event Date 1	Event Type 2	Event Date 2	Event Type 3	Event Date 3	Event Type 4	Event Date 4
12	Mike	Mike@AOL.com	upgrade	10/2/2020	cancel	10/30/2020	upgrade	11/25/2020	cancel	12/03/2020
78	Jane	Jane@AOL.com	upgrade	9/18/2020	upgrade	10/30/2020	cancel	11/14/2020

This is what I have so far

    import pandas as pd

    df_profiles = pd.read_csv('profiles.csv')
    df_events = pd.read_csv('events.csv')

    df_merge = pd.merge(df_profiles[['ID', 'NAME', 'email']], df_events[['ID', 'Event Type', 'Event Date']], on='ID', how='outer')
    df_duplicates = df_merge[df_merge.duplicated('ID')]
    df_stitch = pd.merge(df_merge, df_duplicates[['ID', 'Event Type', 'Event Date']], on='ID')

This outputs the table like this

ID	Name	Email	Event Type 1	Event Date 1	Event Type 2	Event Date 2
12	Mike	Mike@AOL.com	upgrade	10/02/20	cancel	10/30/20
12	Mike	Mike@AOL.com	upgrade	10/02/20	upgrade	11/25/20
78	Jane	Jane@AOL.com	upgrade	9/18/20	cancel	11/14/20

So instead of adding it to the third event, it just creates another column. This is just an example and in reality many users have 10+ events, but some have only 1 or 2. I am sure the solution is just a certain loop, but I can't seem to get one that works. Any help would be greatly appreciated, thank you in advance.

File Contents are shown below:

events.csv

ID,NAME,Event Type,Event Date,email
12,Mike,upgrade,10/2/2020,Mike@AOL.com
78,Jane,upgrade,9/18/2020,Jane@AOL.com
12,Mike,cancel,10/30/2020,Mike@AOL.com
12,Mike,upgrade,11/25/2020,Mike@AOL.com
78,Jane,cancel,11/14/2020,Jane@AOL.com
12,Mike,cancel,12/03/2020,Mike@AOL.com
78,Jane,upgrade,10/30/2020,Jane@AOL.com

profiles.csv

ID,NAME,email
12,Mike,Mike@AOL.com
78,Jane,Jane@AOL.com

Follow Up Question

With the help of anky, I arrived at a solution that yielded what I was looking for, however, there is one formatting hurdle I would like to clear that will make my life easier when doing the rest of the this project.

The output table is grouping the event types all together, and sorting them by date within those groupings. For example, outputs look like this:

ID	Name	Email	Event Type 1	Event Date 1	Event Type 2	Event Date 2	Event Type 3	Event Date 3	Event Type 4	Event Date 4
12	Mike	Mike@AOL.com	cancel	10/30/20	cancel	12/03/20	upgrade	10/2/20	upgrade	11/25/20
78	Jane	Jane@AOL.com	cancel	11/14/20	upgrade	9/18/20	upgrade	10/30/20

Versus the intended output of (as shown above):

ID	Name	Email	Event Type 1	Event Date 1	Event Type 2	Event Date 2	Event Type 3	Event Date 3	Event Type 4	Event Date 4
12	Mike	Mike@AOL.com	upgrade	10/2/2020	cancel	10/30/2020	upgrade	11/25/2020	cancel	12/03/2020
78	Jane	Jane@AOL.com	upgrade	9/18/2020	upgrade	10/30/2020	cancel	11/14/2020

In order to make sure I was not messing up my date sorting, I went into the CSV itself and ordered the events correctly by date and I still got the output grouped by events rather than date. I am reposting my code below in case there I made an error.

import pandas as pd

df_profiles = pd.read_csv('profiles.csv')
df_events = pd.read_csv('events.csv')
df_events = df_events[['ID','Event Type','Event Date','NAME','email']]
df_profiles = df_profiles[['ID','NAME','email']]

events = df_events.groupby('ID')['Event Type'].apply(lambda x: x.ne(x.shift).cumsum())

out = (df_profiles.merge(df_events.drop(['NAME','email'],1).assign(Events=events),on='ID',how='outer')
        .pivot_table(index=['ID','NAME','email'],columns='Events',aggfunc='first')
        .sort_index(axis=1,level=1))

out.columns = out.columns.map('{0[0]} {0[1]}'.format)
out = out.reset_index()

out.to_csv('testingfinal.csv',index=False,encoding='utf-8-sig')

Please don't post images of code, data, or Tracebacks. Copy and paste it as text then format it as code (select it and type `ctrl-k`) ... [Discourage screenshots of code and/or errors](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors)...[Why not upload images of code on SO when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question) ... [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — wwii, Dec 29 '20 at 17:33
@wwii I do not see any screen shots. There are no `JPEG` or `PNG` files embedded in the post. Go view the source code. All of the tables were created using stack overflow mark-down language. — Toothpick Anemone, Dec 29 '20 at 17:44
First SO post, so apologies if I messed up formatting. But there are no images in the post. I used `ctrl-k` and copy/pasted into the "insert code here". Any tips on how I should have formatted it instead @wwii? — jlambert, Dec 29 '20 at 17:44
@jlambert do not listen to [@wwii](https://stackoverflow.com/users/2823755/wwii). Your stack overflow post is definitely in the top 25% in terms of good formatting. You also provided short examples of simple test cases, etc... I do have no idea why [wwii](https://stackoverflow.com/users/2823755/wwii) dislikes your question. — Toothpick Anemone, Dec 29 '20 at 17:48

anky · Accepted Answer · 2020-12-30T09:48:15.383

You can use the below to dynamically transpose event types and dates. This uses a helper column which groups on ID and crates an index, then we pivot using df.pivot_table.

events = df_events.groupby("ID")['Event Type'].apply(lambda x: x.ne(x.shift).cumsum())

out = (df_profiles.merge(df_events.drop("Name",1).assign(Events=events),on='ID')
        .pivot_table(index=['ID','Name','Email'],columns='Events',aggfunc='first')
        .sort_index(axis=1,level=1))

out.columns = out.columns.map('{0[0]} {0[1]}'.format)
out = out.reset_index()

print(out)

	ID	Name	Email	Event Date 1	Event Type 1	Event Date 2	Event Type 2	Event Date 3	Event Type 3
0	12	Mike	Mike@AOL.com	10/02/20	upgrade	10/30/20	cancel	11/25/20	upgrade
1	78	Jane	Jane@AOL.com	9/18/20	upgrade	11/14/20	cancel	NaN	NaN

EDIT:

As per edited question we could try converting the date field to datetime and fo an argsort , then use the same code:

u = df_events.loc[pd.to_datetime(df_events['Event Date'],format='%m/%d/%Y').argsort()]

events = u.groupby('ID')['Event Type'].apply(lambda x: x.ne(x.shift).cumsum())

out = (df_profiles.merge(u.drop(['NAME','email'],1).assign(Events=events),
         on='ID',how='outer')
        .pivot_table(index=['ID','NAME','email'],columns='Events',aggfunc='first')
        .sort_index(axis=1,level=1))

out.columns = out.columns.map('{0[0]} {0[1]}'.format)
out = out.reset_index()
print(out)

	ID	NAME	email	Event Date 1	Event Type 1	Event Date 2	Event Type 2	Event Date 3	Event Type 3	Event Date	4 Event Type 4
0	12	Mike	Mike@AOL.com	10/2/2020	upgrade	10/30/2020	cancel	11/25/2020	upgrade	12/03/2020	cancel
1	78	Jane	Jane@AOL.com	9/18/2020	upgrade	10/30/2020	upgrade	11/14/2020	cancel	NaN	NaN

Sorry to ask another question in this thread, and if I should start a new one, let me know: But is there a way to make the Event Dates go in order? It keeps grouping it so all of the cancels are first then all of the upgrades. I tried sorting the dates in the events.csv first, but I still get this grouping. — jlambert, Dec 29 '20 at 19:51
The dates are already in datetime. I'll edit the question to include this as well with examples. Whenever you get around to it is great. In the meantime I'll keep at it and update it if I find a solution. Thank you for all of your help. — jlambert, Dec 29 '20 at 20:01
@jlambert My pleasure!! Glad I could help. :) Happy coding. Happy new year — anky, Dec 30 '20 at 17:07

How do I add duplicates from a CSV to the end of a specific row over several instances using Python?

Profiles Table (df_profiles)

Events Table (df_events)

Follow Up Question

1 Answers1

Profiles Table (`df_profiles`)

Events Table (`df_events`)