I have multiple dataframes, example data:
df1:
user_id username firstname lastname
123 abc abc abc
456 def def def
789 ghi ghi ghi
df2:
user_id username firstname lastname
111 xyz xyz xyz
456 def def def
234 mnp mnp mnp
df3:
user_id username firstname lastname
789 ghi ghi ghi
456 def def def
222 uwv uwv uwv
I want to concatenate those dataframe, drop duplicate rows, and keep track of the rows' origins by adding more columns. Desired output:
df1:
user_id username firstname lastname df1 df2 df3
123 abc abc abc 1 0 0
456 def def def 1 1 1
789 ghi ghi ghi 1 0 1
111 xyz xyz xyz 0 1 0
234 mnp mnp mnp 0 1 0
222 uwv uwv uwv 0 0 1
I can concatenate do the first step by using:
pd.concat([df1, df2, df3]).drop_duplicates('user_id').reset_index(drop=True)
How can I do the last step (making the origin columns)? I don't know how to do this without forloop, which is not practical for big dataframe. Thank you