1

Trying to merge these two dataframes without any common column: enter image description hereenter image description here

However, when I enter this code:

combined = cleaned.merge(weather, how='cross')
combined

I get a merged dataset that contains 90 rows. enter image description here

I know one dataset has one row less than another. But how can I merge them without somehow duplicating 90 times? I just want to simply mash these two datasets together and drop the 9th row.

Tried variations on how=None

CWK
  • 33
  • 4
  • No you have 10 rows (not 9, 0 -> 9) and 9 rows (not 8, 0 -> 8) so the cross merge will give you 10 x 9 = 90 rows. – Corralien Jun 09 '23 at 04:19
  • `df1.merge(df2, left_on='Date', right_on=df2['date_time'].str.extract('(\d{4})', expand=False))` – mozway Jun 09 '23 at 04:33

2 Answers2

2

To merge two dataframes, you have to find a common key. You can extract the year from the Date column on your second dataframe. Now, you have a common key to process an inner join:

# If needed, convert date_time as datetime64
weather['date_time'] = pd.to_datetime(weather['date_time'])

# If needed, convert Date as int
cleaned['Date'] = cleaned['Date'].astype(int)

out = cleaned.merge(weather.assign(Date=df2['datetime'].dt.year), on='Date')
Corralien
  • 109,409
  • 8
  • 28
  • 52
1

You need to create a new column from 'date_time' which contains only year and then apply merge on 'Year' column. After that drop that date_time column if you don't want to keep in your dataframe.

weather['date_time'] = pd.to_datetime(df['date_time'])
weather['Date'] = weather['date_time'].dt.year

combined = cleaned.merge(weather, on = 'Date')

# after that drop the 'date_time' column

combined.drop(['date_time'], axis = 1, inplace = True)