0

I'm trying to merge two dataframes: 'new_df' and 'df3'. new_df contains years and months, and df3 contains years, months and other columns.

I've cast most of the columns as object, and tried to merge them both. The merge 'works' as doesn't return an error, but my final datafram is all empty, only the year and month columns are correct.

new_df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119 entries, 0 to 118
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   date_test  119 non-null    datetime64[ns]
 1   year       119 non-null    object        
 2   month      119 non-null    object        
dtypes: datetime64[ns](1), object(2)

df3

<class 'pandas.core.frame.DataFrame'>
Int64Index: 191 entries, 53 to 1297
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   case_number  191 non-null    object
 1   date         191 non-null    object
 2   year         191 non-null    object
 3   country      191 non-null    object
 4   area         191 non-null    object
 5   location     191 non-null    object
 6   activity     191 non-null    object
 7   fatal_y_n    182 non-null    object
 8   time         172 non-null    object
 9   species      103 non-null    object
 10  month        190 non-null    object
dtypes: object(11)

I've tried this line of code:

df_joined = pd.merge(left=new_df, right=df3, how='left', on=['year','month'])

I was expecting a table with only filled fields in all columns, instead i got the table: merged table

  • 1
    Sorry for the "Is the computer in on the position" type question, but do year/month combinations `2008/1`, `2008/2`, `2008/3` and `2008/4` exist in your `df3` dataframe? – JNevill Nov 08 '22 at 17:35

1 Answers1

1

Your issue is with the data types for month and year in both columns - they're of type object which gets a bit weird during the join.

Here's a great answer that goes into depth about converting types to numbers, but here's what the code might look like before joining:

# convert column "year" and "month" of new_df
new_df["year"] = pd.to_numeric(new_df["year"])
new_df["month"] = pd.to_numeric(new_df["month"])

And make sure you do the same with df3 as well.

You may also have a data integrity problem as well - not sure what you're doing before you get those data frames, but if it's casting as an 'Object', you may have had a mix of ints/strings or other data types that got merged together. Here's a good article that goes over Panda Data Types. Specifically, and Object data type can be a mix of strings or other data, so the join might get weird.

Hope that helps!

Zach J.
  • 366
  • 6