I have two DataFrames to merge. One is df, [19894 rows x 4 columns] (so pretty big), and the other is airports, [2k rows x 4 columns]. When I merge them with pd.merge:

df = pd.merge(df, airports, on=['origin'], how='inner')

I get a df that is [258031994 rows x 6 columns]! However, when I call df.drop_duplicates(inplace=True), it goes back to 19894 rows, which is fine, and the frames are indeed merged.

My issue is that I have 12 months of data per year and 3 years' worth of data to analyse, and I get the error "Process finished with exit code 137 (interrupted by signal 9: SIGKILL)" (my second month creates a DataFrame of 340m rows x 6 columns...).

Is there a way to merge/join data frames without creating a multimillion monstrosity?
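
For illustration, here is a minimal sketch of how this kind of blow-up can arise and one possible way around it. It assumes the extra rows come from repeated `origin` values in the airports table, which the post does not confirm. The toy data and the `flight` and `city` columns are hypothetical stand-ins, not the real schema.

```python
import pandas as pd

# Toy stand-ins for the real frames; `flight` and `city` are hypothetical columns.
df = pd.DataFrame({
    "origin": ["JFK", "JFK", "LAX"],
    "flight": [1, 2, 3],
})
airports = pd.DataFrame({
    "origin": ["JFK", "JFK", "LAX"],  # "JFK" appears twice: a duplicated key
    "city": ["New York", "New York", "Los Angeles"],
})

# Each "JFK" row in df matches both "JFK" rows in airports, so rows multiply.
naive = pd.merge(df, airports, on="origin", how="inner")
print(len(naive))  # 5 rows from a 3-row df

# Keep one row per key in the lookup table before merging.
airports_unique = airports.drop_duplicates(subset="origin")

# validate="many_to_one" raises pandas.errors.MergeError if the
# right-hand keys are still not unique, instead of silently exploding.
merged = pd.merge(df, airports_unique, on="origin", how="inner",
                  validate="many_to_one")
print(len(merged))  # 3 rows, same as df
```

Note that `drop_duplicates(subset="origin")` keeps the first row for each key, so if the repeated airport rows differ in their other columns, which row survives should be a deliberate choice rather than an accident of ordering.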
