I'm working with pandas in Python and have run into a problem. I have a dataset called master, and its length is:
print(len(master))
120000
Then I left outer join it with another dataset called click:
master_active=pd.merge(master, click, how='left', on='user_id')
print(len(master_active))
120799
I don't understand why the row count changes from 120000 to 120799, since I expected a left join to keep exactly the rows of master.
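One thing I have not ruled out is that click contains more than one row for some user_id values. Below is a minimal sketch on made-up toy data (not my real datasets; the page column and all values are assumptions for illustration) that reproduces the same kind of growth and counts the duplicated keys:

```python
import pandas as pd

# Hypothetical toy data standing in for the real master/click datasets.
master = pd.DataFrame({"user_id": [1, 2, 3]})
click = pd.DataFrame({"user_id": [1, 1, 3], "page": ["a", "b", "c"]})

# A left merge keeps every master row, but emits one output row
# for each matching row in click, so duplicate keys add rows.
master_active = pd.merge(master, click, how="left", on="user_id")
print(len(master), len(master_active))  # prints: 3 4

# Number of click rows whose user_id appears more than once.
dup_rows = click["user_id"].duplicated(keep=False).sum()
print(dup_rows)  # prints: 2

# validate="many_to_one" asks pandas to raise MergeError
# when the keys in the right-hand frame are not unique.
try:
    pd.merge(master, click, how="left", on="user_id", validate="many_to_one")
    keys_unique = True
except pd.errors.MergeError:
    keys_unique = False
print(keys_unique)  # prints: False
```

If the same check on my real click data reports duplicated user_id values, that would be consistent with the 799 extra rows, but I have not confirmed it yet.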
Any ideas on how to solve this would be appreciated. Thanks!