
I have three data frames.

d.head()

          hotel        DATE
0  Resort Hotel  2015-07-01
1  Resort Hotel  2015-07-01
2  Resort Hotel  2015-07-01
3  Resort Hotel  2015-07-01
4  Resort Hotel  2015-07-01

r_w.head()

             hotel        DATE  PRCP  TAVG
4744  Resort Hotel  2015-07-01   0.0    70
4745  Resort Hotel  2015-07-02   0.0    73
4746  Resort Hotel  2015-07-03   0.0    74
4747  Resort Hotel  2015-07-04   0.0    76
4748  Resort Hotel  2015-07-05   0.0    80

c_w.head()

           hotel        DATE  PRCP  TAVG
7111  City Hotel  2015-07-01  0.00    68
7112  City Hotel  2015-07-02  0.09    69
7113  City Hotel  2015-07-03  0.00    70
7114  City Hotel  2015-07-04  0.00    74
7115  City Hotel  2015-07-05  0.00    71

Data frame d has ~100,000 rows and the two other data frames have ~500 rows. I am trying to add PRCP and TAVG columns to data frame d based on columns hotel and DATE. I have tried

d.merge(r_w, how='outer', on=['hotel', 'DATE'])

but I get the error:

ValueError: You are trying to merge on datetime64[ns] and object columns. If you wish to proceed you should use pd.concat

To solve this issue, I tried r_w.loc['DATE'] = pd.to_datetime(r_w.DATE), but I still get the same error, now along with the following warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  iloc._setitem_with_indexer(indexer, value, self.name)

How do I merge/concat the columns and what does the warning mean?

user572780
  • Use `r_w['DATE'] = pd.to_datetime(r_w.DATE)` without `.loc` ? – SeaBean Apr 08 '21 at 20:58
  • @SeaBean I get another warning and 4 additional columns: `PRCP_x`, `PRCP_y`, `TAVG_x` and `TAVG_y` with NaNs. I don't want 4 additional columns. – user572780 Apr 08 '21 at 21:06
  • `SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy` – user572780 Apr 08 '21 at 21:09
  • You mean you got past the ValueError by amending the code above to drop `.loc`? – SeaBean Apr 08 '21 at 21:09
  • @SeaBean Yes but the warning seems like it's asking me to use `.loc`. – user572780 Apr 08 '21 at 21:11
  • SettingWithCopyWarning is mostly raised for chained assignment on a DataFrame. When you do it on the DATE column, you are working with a Series, so the warning should not be related to this statement. Also, using `.loc` this way only gives you the unexpected result of appending a row with index 'DATE' at the end. – SeaBean Apr 08 '21 at 21:15
  • Take a look at [this post](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – SeaBean Apr 08 '21 at 21:19
  • I cannot reproduce your problem and don't know how your dataframes were copied or derived, so I can't give you a concrete answer. See if the post above gives you a hint. – SeaBean Apr 08 '21 at 21:21
  • What is the expected output? – Joe Ferndz Apr 08 '21 at 22:17
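
One possible way to get PRCP and TAVG onto d, sketched under a few assumptions: the frames below are tiny stand-ins built from the head() output above, the City Hotel row in d is made up for illustration, and DATE in the weather frames is assumed to hold strings (which is what the ValueError suggests). The idea is to stack r_w and c_w into one new frame with pd.concat, convert its DATE column by plain column assignment, and then do a single left merge.

```python
import pandas as pd

# Stand-in for d: DATE is already datetime64[ns], as the error message implies.
# The City Hotel row is a hypothetical addition, not taken from the question.
d = pd.DataFrame({
    "hotel": ["Resort Hotel", "Resort Hotel", "City Hotel"],
    "DATE": pd.to_datetime(["2015-07-01", "2015-07-01", "2015-07-02"]),
})

# Stand-ins for the weather frames: DATE is stored as strings (object dtype).
r_w = pd.DataFrame({
    "hotel": ["Resort Hotel", "Resort Hotel"],
    "DATE": ["2015-07-01", "2015-07-02"],
    "PRCP": [0.0, 0.0],
    "TAVG": [70, 73],
}, index=[4744, 4745])

c_w = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel"],
    "DATE": ["2015-07-01", "2015-07-02"],
    "PRCP": [0.00, 0.09],
    "TAVG": [68, 69],
}, index=[7111, 7112])

# Stack both weather tables into one new frame. pd.concat returns a fresh
# object, so assigning a column to it does not touch r_w or c_w.
weather = pd.concat([r_w, c_w], ignore_index=True)

# Assign the whole column. weather.loc['DATE'] would address a ROW labelled
# 'DATE', not the column, which is why the dtype never changed.
weather["DATE"] = pd.to_datetime(weather["DATE"])

# A left merge keeps every row of d and looks up its weather values.
# validate raises if a (hotel, DATE) pair appears more than once in weather.
merged = d.merge(weather, how="left", on=["hotel", "DATE"],
                 validate="many_to_one")

print(merged)
```

Merging once against the combined frame avoids the `PRCP_x`/`PRCP_y` columns, which show up when a frame that already has PRCP and TAVG is merged with another frame that also has them. As for the warning: the indices 4744 and 7111 suggest r_w and c_w were sliced out of a larger frame (an assumption, the question does not say), and pandas warns when you assign into such a slice. Adding `.copy()` at the point where the slice is taken is the usual way to make the assignment unambiguous.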
