I have two DataFrames:
import pandas as pd

df1 = pd.DataFrame({'User': ['user1', 'user1', 'user2', 'user3'],
                    'Grade': ['XLM', 'YK', 'AAO', 'FRT']})
df2 = pd.DataFrame({'User': ['user1', 'user1', 'user1', 'user2', 'user2', 'user3'],
                    'SocMed': ['Instagram', 'FB', 'Twitter', 'Quora', 'Pinterest', 'Snapchat']})
I want to use pd.merge (or any other command that is more appropriate) to get a third DataFrame that looks as follows:
merged = pd.DataFrame({'User': ['user1', 'user1', 'user2', 'user3'],
                       'Grade': ['XLM', 'YK', 'AAO', 'FRT'],
                       'SocMed': [['Instagram', 'FB', 'Twitter'], ['Instagram', 'FB', 'Twitter'], ['Quora', 'Pinterest'], ['Snapchat']]})
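One way this list-valued result might be produced, sketched here as a starting point rather than a definitive solution, is to collapse the second DataFrame to one list per user and then left-merge it onto the first. The names socmed_lists and merged_lists are illustrative only, and replacing missing values with an empty list is an assumption about what the "null list" should look like.

```python
import pandas as pd

df1 = pd.DataFrame({'User': ['user1', 'user1', 'user2', 'user3'],
                    'Grade': ['XLM', 'YK', 'AAO', 'FRT']})
df2 = pd.DataFrame({'User': ['user1', 'user1', 'user1', 'user2', 'user2', 'user3'],
                    'SocMed': ['Instagram', 'FB', 'Twitter', 'Quora', 'Pinterest', 'Snapchat']})

# Collapse df2 to one row per user, gathering the SocMed values into a list.
socmed_lists = df2.groupby('User')['SocMed'].agg(list).reset_index()

# A left merge keeps every row of df1; users absent from df2 get NaN in SocMed.
merged_lists = df1.merge(socmed_lists, on='User', how='left')

# Assumption: unmatched users should carry an empty list instead of NaN.
merged_lists['SocMed'] = merged_lists['SocMed'].apply(
    lambda v: v if isinstance(v, list) else [])
```

Aggregating the small DataFrame first means the merge adds only one column and no extra rows to the large one.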
Note: These are samples only. My actual first DataFrame has 15 columns and ~1,000,000 rows (370 unique users), and my second one has 600 rows (~350 unique users). This means that, after the merge, some users will have no match in the second DataFrame, so their entry will be a null (empty) list. I am also fine with getting an 'exploded' DataFrame like this:
User   Grade  SocMed
user1  XLM    Instagram
user1  XLM    FB
user1  XLM    Twitter
user1  YK     Instagram
user1  YK     FB
user1  YK     Twitter
user2  AAO    Quora
user2  AAO    Pinterest
user3  FRT    Snapchat
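The exploded layout above appears to be what a plain left merge on User would give, since each row of the first DataFrame is paired with every matching row of the second. A minimal sketch, assuming User is the only shared key (the name exploded is illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame({'User': ['user1', 'user1', 'user2', 'user3'],
                    'Grade': ['XLM', 'YK', 'AAO', 'FRT']})
df2 = pd.DataFrame({'User': ['user1', 'user1', 'user1', 'user2', 'user2', 'user3'],
                    'SocMed': ['Instagram', 'FB', 'Twitter', 'Quora', 'Pinterest', 'Snapchat']})

# how='left' keeps every row of df1, repeated once per matching row of df2.
# Users that do not appear in df2 would keep a single row with NaN in SocMed.
exploded = df1.merge(df2, on='User', how='left')
print(exploded)
```

With the real data, the row count grows with the number of SocMed entries per user, so it may be worth checking memory use before running this on the full ~1,000,000 rows.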
I have read up on pd.merge and DataFrame.explode, but I do not know how to get started.