1

Below first two table is cat1 and cat2 table basically, I hope to left join cat1 with cat2, where cat1 date is between start/end date of cat2 Last table is the outcome, how should I achieve this?

enter image description here

Please find the code below for cat1 and cat2

import pandas as pd
from datetime import date

cat1=pd.DataFrame({
    "CatID":[1, 2, 3],
    "Date":[date(2019,10,1), date(2019,11,1), date(2019,12,1)]
})

cat2=pd.DataFrame({
    "Interesting":[1, 2, 3],
    "StartDate":[date(2019,10,1), date(2019,11,1), date(2018,10,6)],
    "EndDate":[date(2019,10,30), date(2019,11,30), date(2018,10,6)]
})
Himanshu
  • 666
  • 1
  • 8
  • 18
william007
  • 17,375
  • 25
  • 118
  • 194
  • what if you have `Date` of cat1 falling in two `StartDate` & `EndDate` (two rows of cat2), what value of `interesting` would you have to keep(or all). – Himanshu Nov 19 '19 at 06:29
  • Possible duplicate of [How to merge two dataframe with column in overlapping datetime range](https://stackoverflow.com/questions/58155328/how-to-merge-two-dataframe-with-column-in-overlapping-datetime-range) and [How to join two dataframes for which column values are within a certain range?](https://stackoverflow.com/questions/46525786/how-to-join-two-dataframes-for-which-column-values-are-within-a-certain-range). – giser_yugang Nov 19 '19 at 06:31
  • @Himanshu good question, all will be kept – william007 Nov 19 '19 at 07:04

3 Answers3

0

The simplest way..

import pandas as pd
from datetime import date

cat1=pd.DataFrame({"CatID":[1,2,3],
                 "Date":[date(2019,10,1),date(2019,11,1),date(2019,12,1)]})
cat2=pd.DataFrame({"Interesting":[1,2,3],
                 "StartDate":[date(2019,10,1),date(2019,11,1),date(2018,10,6)],
                  "EndDate":[date(2019,10,30),date(2019,11,30),date(2018,10,6)]})

cat3_data = []
for interesting, start, end in cat2.values:
    for d in pd.date_range(start, end, freq='D'):
        cat3_data.append([d.date(), interesting])
cat3 = pd.DataFrame(cat3_data, columns=['Date', 'Interesting'])

result = pd.merge(cat1, cat3, how='left', on='Date')

result

    CatID   Date    Interesting
0   1   2019-10-01  1.0
1   2   2019-11-01  2.0
2   3   2019-12-01  NaN
mustnot
  • 159
  • 5
0
import numpy as np
import pandas as pd
from datetime import date

cat1=pd.DataFrame({
          "CatID":[1, 2, 3],
          "Date":[date(2019,10,1), date(2019,11,1), date(2019,12,1)]
     })

cat2=pd.DataFrame({
          "Interesting":[1, 2, 3],
          "StartDate":[date(2019,10,1), date(2019,11,1), date(2018,10,6)],
          "EndDate":[date(2019,10,30), date(2019,11,30), date(2018,10,6)]
     })
def get_interesting(x):
    val = cat2[(cat2['StartDate']<=x) & (cat2['EndDate']>=x)]['Interesting'].values
    if len(val)==0:
        return np.nan
    else:
        return val[0]

interesting = cat1['Date'].apply(get_interesting).rename('Interesting')
out = pd.concat([cat1,interesting],axis=1)

out:

  CatID     Date   Interesting
0   1   2019-10-01     1.0
1   2   2019-11-01     2.0
2   3   2019-12-01     NaN
Himanshu
  • 666
  • 1
  • 8
  • 18
0

If you dont want to left join, you could use this.

import pandas as pd
from datetime import date
import numpy as np
cat1=pd.DataFrame({"CatID":[1,2,3],
                 "Date":[date(2019,10,1),date(2019,11,1),date(2019,12,1)]})
cat2=pd.DataFrame({"Interesting":[1,2,3],
                 "StartDate":[date(2019,10,1),date(2019,11,1),date(2018,10,6)],
                  "EndDate":[date(2019,10,30),date(2019,11,30),date(2018,10,6)]})

cat3 = cat1.copy()
cat3['range'] = cat2.apply(lambda x: pd.date_range(start=x['StartDate'], end=x['EndDate']), axis=1)
cat3['Interesting'] = cat2['Interesting']
cat1['Interesting'] = cat3.apply(lambda x: x['Interesting'] if x['Date'] in x['range'] else np.nan, axis=1)

cat1 outputs:

   CatID        Date  Interesting
0      1  2019-10-01          1.0
1      2  2019-11-01          2.0
2      3  2019-12-01          NaN
luigigi
  • 4,146
  • 1
  • 13
  • 30