I am trying to merge 7 different data frames on the basis of same column (accident_no) but the problem is some data frame contains more rows and duplication of (accident_no) e.g
table 1(Accident) contains 200 accident_no (all unique), table 3 contains 196 accident_no (all unique) but table 4 (Person) contains 400 accident_no (some duplications) as there may be multiple passengers were involved in the same crash so accident_no would be same and information can be used for analysis.
The problem I am facing is I have tried concat, join, merge but the answer reaches the highest number of rows and I am getting more rows than 400.
So far I tried below methods:
dfs = [df1,df2,df3,df5,df6,df7]
df_final = reduce(lambda left,right: pd.merge(left,right,on='ACCIDENT_NO', how = 'left'), dfs)
AND
dfs = [df.set_index(['ACCIDENT_NO']) for df in [df1, df2, df3, df4, df5, df6, df7]]
print(pd.concat(dfs, axis=1).reset_index())
So, is it possible that I may get more rows than 400 or am I doing something wrong?
Thanks