
My understanding of an inner join is that the resulting table should never have more rows than either input table, since it should return only the keys that are common to both tables.

  1. Dataframe has 79972 rows × 32 columns
  2. Dataframe1 has 8344745 rows × 4 columns

Both data frames have a column named 'C_LATITUDE', which I use as the join key.

I merged them as follows:

merged_df = df.merge(df1, left_on='C_LATITUDE', right_on='C_LATITUDE', how='inner')
merged_df

The resulting data frame shape is 45030823 rows × 35 columns

I referred to the question "Pandas Left Outer Join results in table larger than left table" and tried removing duplicates, but that gives me only about 8,000 rows, which is incorrect.

Where am I going wrong?

Sukhi296
  • Please check whether `C_LATITUDE` contains null values; if it does, that will change the result. – simpleApp Jun 13 '23 at 01:46
  • I checked both data frames, and C_LATITUDE has no null values in either of them. – Sukhi296 Jun 13 '23 at 01:58
  • Is it possible to share some rows from both data frames, so we can reproduce and debug this? – simpleApp Jun 13 '23 at 02:04
  • Is C_LATITUDE repeated in your right or left data frame? If it is, every combination of the rows sharing a repeated value will end up in the merged data frame. See this answer for example: https://stackoverflow.com/questions/12389284/inner-join-returning-more-rows-then-exist-in-tables – mirkhosro Jun 13 '23 at 02:13
  • Unfortunately, I cannot do that since it is private data. C_LATITUDE contains latitude points. Some other columns do have null values. – Sukhi296 Jun 13 '23 at 02:14
  • Thank you, @mirkhosro. That makes sense. As stated in the comment, it is only possible to say how to resolve the query if the data is shared. And unfortunately, I am not able to share the data. Please let me know if there is any other possible way to do this. Thanks. – Sukhi296 Jun 13 '23 at 02:27
  • If you have duplicates in the key column in both dataframes, you'll get a cartesian join for each duplicated key. – sammywemmy Jun 13 '23 at 02:45
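
The point made in the comments can be checked without sharing the private data. The following is only a minimal sketch on made-up toy frames (the names `left`, `right`, and the columns `a` and `b` are hypothetical, not from the question): it shows how repeated key values multiply rows in an inner merge, how the merged row count can be predicted from per-key counts, and how the `validate` argument of `merge` can flag non-unique keys.

```python
import pandas as pd

# Hypothetical toy data; the real frames from the question were not shared.
left = pd.DataFrame({"C_LATITUDE": [10.0, 10.0, 20.0], "a": [1, 2, 3]})
right = pd.DataFrame({"C_LATITUDE": [10.0, 10.0, 10.0, 30.0], "b": [4, 5, 6, 7]})

# Inner merge: each left row pairs with every right row that has the same key.
merged = left.merge(right, on="C_LATITUDE", how="inner")

# Predict the merged row count: for every key, multiply the number of
# occurrences on the left by the number of occurrences on the right.
left_counts = left["C_LATITUDE"].value_counts()
right_counts = right["C_LATITUDE"].value_counts()
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())

print(len(merged), expected_rows)

# Ask pandas to verify that the keys are unique on both sides.
# With duplicated keys this raises MergeError instead of silently expanding.
try:
    left.merge(right, on="C_LATITUDE", how="inner", validate="one_to_one")
    keys_unique = True
except pd.errors.MergeError:
    keys_unique = False

print(keys_unique)
```

Running the same `value_counts` comparison on the real `df` and `df1` should show which latitude values account for most of the extra rows. Whether to deduplicate, aggregate, or join on additional columns depends on what a row in each frame represents, which cannot be determined from the question alone.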
