To make this as clear as possible, I started with a simple example. I created two small dummy dataframes:

import pandas as pd

# First toy frame: ids 1 to 5
dummy_data1 = {
    'id': ['1', '2', '3', '4', '5'],
    'Feature1': ['A', 'C', 'E', 'G', 'I'],
    'Feature2': ['B', 'D', 'F', 'H', 'J']}
df1 = pd.DataFrame(dummy_data1, columns=['id', 'Feature1', 'Feature2'])

# Second toy frame: only ids 1 and 2 overlap with df1
dummy_data2 = {
    'id': ['1', '2', '6', '7', '8'],
    'Feature3': ['K', 'M', 'O', 'Q', 'S'],
    'Feature4': ['L', 'N', 'P', 'R', 'T']}
df2 = pd.DataFrame(dummy_data2, columns=['id', 'Feature3', 'Feature4'])

If I apply df_merge = pd.merge(df1, df2, on='id', how='outer') or df_merge = df1.merge(df2, how='left', left_on='id', right_on='id'), I get the desired output:

[image: the merged dataframe]
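As a self-contained check of the toy example, the sketch below rebuilds both frames and compares the row counts of the two merges. It also shows that the two calls are not equivalent: the outer join keeps the ids from both frames, while the left join keeps only the rows of df1. The names `outer` and `left` are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5'],
    'Feature1': ['A', 'C', 'E', 'G', 'I'],
    'Feature2': ['B', 'D', 'F', 'H', 'J']})
df2 = pd.DataFrame({
    'id': ['1', '2', '6', '7', '8'],
    'Feature3': ['K', 'M', 'O', 'Q', 'S'],
    'Feature4': ['L', 'N', 'P', 'R', 'T']})

# Outer join: union of the ids in both frames
outer = pd.merge(df1, df2, on='id', how='outer')
# Left join: every row of df1, with NaN where df2 has no matching id
left = df1.merge(df2, how='left', on='id')

print(len(outer))  # 8
print(len(left))   # 5
```

The left join keeps exactly five rows here only because every id is unique in df2.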

Now I am trying to apply the same technique to two large datasets that have the same number of rows. All I want to do is join the columns together into one large dataframe. Each dataframe has 512573 rows, but when I apply

df_merge = orig_data_updated.merge(demographic_data1,how='left', left_on='Location+Type', right_on='Location+Type')

the length of the result becomes 3596301, which I did not think was possible. My question is simple: how do I do a left join on two dataframes so that the number of rows stays the same and only the columns are joined together?
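A minimal sketch of how a left join can return more rows than the left frame, assuming the key column contains repeated values. The two tiny frames are hypothetical stand-ins for orig_data_updated and demographic_data1, and the columns `x` and `y` are made up for illustration. The `validate` argument of `merge` is one way to make pandas raise an error instead of silently multiplying rows.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two large frames
left = pd.DataFrame({'Location+Type': ['a', 'a', 'b'], 'x': [1, 2, 3]})
right = pd.DataFrame({'Location+Type': ['a', 'a', 'b'], 'y': [10, 20, 30]})

# Each left row pairs with every right row sharing its key:
# 'a' gives 2 * 2 = 4 rows, 'b' gives 1 * 1 = 1 row
merged = left.merge(right, how='left', on='Location+Type')
print(len(left), len(merged))  # 3 5

# validate='many_to_one' asserts that the key is unique in the right frame
try:
    left.merge(right, how='left', on='Location+Type', validate='many_to_one')
    is_many_to_one = True
except pd.errors.MergeError:
    is_many_to_one = False
print(is_many_to_one)  # False
```

If the two frames are already aligned row by row, `pd.concat([df_a, df_b], axis=1)` after resetting both indexes places the columns side by side without matching on a key at all. That is only appropriate when the row order really does correspond.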

Snorrlaxxx
  • Does this answer your question? [Pandas Left Outer Join results in table larger than left table](https://stackoverflow.com/questions/22720739/pandas-left-outer-join-results-in-table-larger-than-left-table) – M_S_N Jan 23 '20 at 17:20
  • What is happening here is a Cartesian product caused by multiple records with the same key in either or both dataframes. – Scott Boston Jan 23 '20 at 17:20
  • Do a groupby on the key and count, then filter the results to those with a count greater than 1. If the count is greater than 1, the records get multiplied. – Scott Boston Jan 23 '20 at 17:22
  • @ScottBoston Could you provide an answer please? – Snorrlaxxx Jan 23 '20 at 17:35
  • @Snorrlaxxx Sorry, I don't have an answer for your problem. This is just a suggestion on what I think is causing your issues. – Scott Boston Jan 23 '20 at 17:37
  • @M_S_N The link provided answers my question. Thank you! – Snorrlaxxx Jan 23 '20 at 17:42
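The groupby-and-count check suggested in the comments could look roughly like the sketch below. The frame `right` is a hypothetical stand-in for demographic_data1, and keeping the first row per key is only one possible way to deduplicate. Whether that is correct depends on what the repeated rows actually mean in the real data.

```python
import pandas as pd

# Hypothetical small frame standing in for demographic_data1
right = pd.DataFrame({
    'Location+Type': ['a', 'a', 'b', 'c', 'c', 'c'],
    'y': [10, 20, 30, 40, 50, 60]})

# Count rows per key and keep only the keys that occur more than once
counts = right.groupby('Location+Type').size()
dupes = counts[counts > 1]
print(dupes)

# Assumption: one row per key is wanted, so keep the first occurrence
right_unique = right.drop_duplicates(subset='Location+Type', keep='first')
print(len(right_unique))  # 3
```

Once the key is unique in the right frame, a left join returns exactly as many rows as the left frame, however often a key repeats on the left side.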
