2

I have these two pandas dataframes I created and cleaned from online data, and I was trying to merge them based on their dates, which are all by month. However, the first dataset has its days on the last day of the month, whether the second dataset is based on the first day of the month.

# data1
0    1987-01-01  63.752
1    1987-02-01  64.152
2    1987-03-01  64.488
3    1987-04-01  64.995

# data2
0   1987-01-31  1115.10
1   1987-02-30  1095.63
2   1987-03-30  1036.19
3   1987-04-30  1057.08

I would normally merge them by something like this if I had daily data with a few missing days

data3 = pd.merge(left=data1, left_on='Date', right=data2, right_on='Date')

but in this case they are never matching, even though they are all similar dates.

How would I go about "telling" Pandas to combine the datasets based on dates that are just a few days apart, and name each data by just "month - year"? I don't know where to begin.

halfer
  • 19,824
  • 17
  • 99
  • 186
Coolio2654
  • 1,589
  • 3
  • 21
  • 46
  • You could just add one day to the second data frame, as the last day in a month is always followed by a first day in a month. However, October has 31 days, so 2009-10-30 is not on the last day of the month. – fuglede Mar 02 '18 at 22:45
  • ^This is a good point. If that was just a mistype, then I would suggest editing and sorting the dates using the datetime module and looking into strptime: https://docs.python.org/2/library/datetime.html You can then add 1 or subtract 1 to the month (depending on which direction you want to go) and then drop the day inside of the date. – rahlf23 Mar 02 '18 at 22:46
  • Does this question help? https://stackoverflow.com/q/21201618/3639023 – johnchase Mar 02 '18 at 22:53
  • 2
    Give a [**Minimal, Complete, Verifiable** Example](https://stackoverflow.com/help/how-to-ask) in your question. In your case the two example DataFrames have no overlap. Also what is the expected result? – Alex Mar 02 '18 at 23:00

2 Answers2

4

IIUC, you want to merge on the closest dates? This is what merge_asof is for

If dates are not already converted to datetime, convert them like so

data1.date = pd.to_datetime(data1.date)
data2.date = pd.to_datetime(data2.date)

now complete the merge

pd.merge_asof(data1,data2,on='date',direction='nearest')

       date  value_x  value_y
0 1987-01-01   63.752  1115.10
1 1987-02-01   64.152  1036.19
2 1987-03-01   64.488  1057.08
3 1987-04-01   64.995  1057.08
DJK
  • 8,924
  • 4
  • 24
  • 40
  • I am glad to see there seems to be a tailor-made solution. However, I get a "left keys must be sorted" error when I try this, even though I made sure both data sets only have two columns and their date columns are named the same and already converted to datetime. – Coolio2654 Mar 02 '18 at 23:17
  • Right you just need to sort the columns before you merge them with this technique i.e. `data1.sort_values('date',ascending=True)` – DJK Mar 02 '18 at 23:19
  • I am sorting them using this method, but nothing gets sorted, despite no errors getting thrown at me. – Coolio2654 Mar 03 '18 at 00:36
  • I happened to guess that extra parameter myself a moment ago, and the data sorts now. Thanks a lot! – Coolio2654 Mar 03 '18 at 01:39
1

If your date columns are called date, you can do something along these lines:

data1['date'] = pd.to_datetime(data1['date'])
data2['date'] = pd.to_datetime(data2['date'])

data1['month'] = data1['date'].dt.month
data1['year'] = data1['date'].dt.year

data2['month'] = data2['date'].dt.month
data2['year'] = data2['date'].dt.year

Which would result in something like this:

>>> data1
        date   value  month  year
0                                
0 1987-01-01  63.752      1  1987
1 1987-02-01  64.152      2  1987
2 1987-03-01  64.488      3  1987
3 1987-04-01  64.995      4  1987
>>> data2
        date    value  month  year
0                                 
0 1987-01-01  1115.10      1  1987
1 1987-01-02  1095.63      1  1987
2 1987-02-01  1036.19      2  1987
3 1987-02-28  1057.08      2  1987

You can then merge those on month and year:

data3=data1.merge(data2, on=['month', 'year'])
sacuL
  • 49,704
  • 8
  • 81
  • 106
  • While I chose DJK's answer bec. he showed a powerful function specifically meant to handle such situations, your solution also showed a valid and important to remember way of approaching this problem. Thanks a lot! – Coolio2654 Mar 03 '18 at 01:41
  • No problem, I also would have chosen his answer! I didn't know about `merge_asof` – sacuL Mar 03 '18 at 01:59