Observations disappear after I merge two dataframes.
I have two dataframes that look like this:
df_1
text         user
bla bla bla  user1
ga ga ga ga  user1
bur bur bur  user2
.            .
df_2
user   url
user1  asd.com
user2  dsa.com
.      .
I built the second dataframe by taking the unique list of users from the first one and web scraping data for each of them. I would like to merge the two so that the result looks like this:
df_merged
text         user   url
bla bla bla  user1  asd.com
ga ga ga ga  user1  asd.com
bur bur bur  user2  dsa.com
.            .      .
I merge them with:
df_merged = df_1.merge(df_2, on='user', validate='m:m')
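For reference, here is a minimal, self-contained version of the setup. It uses only the illustrative rows shown above, not the real data, so it produces the expected result and does not reproduce the problem:

```python
import pandas as pd

# Illustrative rows only; the real df_1 has about 70k rows.
df_1 = pd.DataFrame({
    "text": ["bla bla bla", "ga ga ga ga", "bur bur bur"],
    "user": ["user1", "user1", "user2"],
})
df_2 = pd.DataFrame({
    "user": ["user1", "user2"],
    "url": ["asd.com", "dsa.com"],
})

# Same call as above. With no `how` argument, merge performs an inner join.
df_merged = df_1.merge(df_2, on="user", validate="m:m")
print(df_merged)
```

Note that `validate="m:m"` is accepted but performs no check, so it cannot flag a problem with the keys.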
The problem is that observations disappear after the merge, seemingly at random. For example:
len(df_1['user'].drop_duplicates())
returns 11115
len(df_2['user'])
returns 11115
len(df_merged['user'].drop_duplicates())
returns 7076
df_1 contains about 70k observations, while df_merged contains only about 30k.
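Since `merge` defaults to an inner join, any row whose `user` value has no exactly equal counterpart in the other frame is dropped. One way to see which keys fail to match is an outer merge with `indicator=True`. The sketch below uses made-up rows: the trailing space in `"user2 "` is purely an assumption for illustration, not something confirmed in the real data.

```python
import pandas as pd

# Hypothetical stand-in data: "user2 " (trailing space) will not match "user2".
df_1 = pd.DataFrame({
    "text": ["bla bla bla", "ga ga ga ga", "bur bur bur"],
    "user": ["user1", "user1", "user2 "],
})
df_2 = pd.DataFrame({
    "user": ["user1", "user2"],
    "url": ["asd.com", "dsa.com"],
})

# The default inner join silently drops the unmatched row.
inner = df_1.merge(df_2, on="user")

# An outer join with an indicator column keeps every row and labels its origin
# as "both", "left_only" or "right_only" in the "_merge" column.
check = df_1.merge(df_2, on="user", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

print(check["_merge"].value_counts())
print(unmatched)
```

Rows labelled `left_only` are the ones an inner merge drops from df_1. Printing their keys with `repr` can reveal differences that are invisible in normal output, such as whitespace or letter case.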
Does anyone know what's going on?