My understanding of an inner join is that the resulting table should never have more rows than either of the input tables, since it should keep only the key values that are common to both tables.
- The first data frame (df) has 79972 rows × 32 columns
- The second data frame (df1) has 8344745 rows × 4 columns
Both data frames have a column named 'C_LATITUDE', which I want to use as the join key.
I merged them as follows:
merged_df = df.merge(df1, left_on='C_LATITUDE', right_on='C_LATITUDE', how='inner')
merged_df
The resulting data frame shape is 45030823 rows × 35 columns
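A minimal sketch of how repeated key values could account for a row count like this, assuming 'C_LATITUDE' is not unique in either frame. The toy frames and the helper name expected_inner_rows are hypothetical and only stand in for df and df1. Each key value contributes (occurrences on the left) × (occurrences on the right) rows to an inner merge, so the result can be larger than both inputs:

```python
import pandas as pd


def expected_inner_rows(left, right, key):
    """Predict the row count of an inner merge on `key` from key multiplicities.

    Hypothetical helper for illustration. It ignores NaN keys, which
    pandas' merge would match against each other.
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Align on key value; a key missing from one side contributes 0 rows.
    return int(left_counts.mul(right_counts, fill_value=0).sum())


# Toy stand-ins for df and df1 (made-up data).
left = pd.DataFrame({"C_LATITUDE": [1.0, 1.0, 2.0, 3.0], "a": range(4)})
right = pd.DataFrame({"C_LATITUDE": [1.0, 1.0, 1.0, 2.0, 4.0], "b": range(5)})

merged = left.merge(right, on="C_LATITUDE", how="inner")

# Key 1.0 appears 2 times on the left and 3 times on the right: 2 * 3 = 6 rows.
# Key 2.0 appears once on each side: 1 row. Keys 3.0 and 4.0 have no match.
print(len(left), len(right), len(merged))
print(expected_inner_rows(left, right, "C_LATITUDE"))
```

Running the same helper on the real df and df1 should show whether duplicate latitudes alone explain the 45030823 rows.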
I referred to this question: "Pandas Left Outer Join results in table larger than left table", and tried removing duplicates, but that leaves me with only 8,000 rows, which is incorrect.
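A possible reading of the 8,000 rows, assuming the duplicates were dropped on 'C_LATITUDE' alone (this is an assumption, the exact call is not shown above): dropping duplicates on the key in both frames leaves at most one row per distinct latitude, while dropping them only on the lookup side keeps every row of the other frame that has a match. The sketch below uses made-up toy data, and the validate argument of merge makes pandas raise an error if the key uniqueness is not what was expected:

```python
import pandas as pd

# Toy stand-ins for df and df1 (made-up data).
left = pd.DataFrame({"C_LATITUDE": [1.0, 1.0, 2.0, 3.0], "a": [10, 11, 12, 13]})
right = pd.DataFrame({"C_LATITUDE": [1.0, 1.0, 1.0, 2.0, 4.0], "b": [20, 21, 22, 23, 24]})

# Keep one row per latitude on the right side only.
right_unique = right.drop_duplicates(subset="C_LATITUDE")

# Every left row with a matching latitude survives exactly once.
many_to_one = left.merge(
    right_unique, on="C_LATITUDE", how="inner", validate="many_to_one"
)

# Dropping duplicates on BOTH sides collapses the result to one row
# per distinct latitude that the two frames share.
both_unique = left.drop_duplicates(subset="C_LATITUDE").merge(
    right_unique, on="C_LATITUDE", how="inner", validate="one_to_one"
)

print(len(many_to_one), len(both_unique))
```

Which of the two is appropriate depends on whether one row per latitude in df1 is actually meaningful for the data.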
Where am I going wrong?