-1

i have table A and table B as follows:

Table A

     from      to   comercial_event
0   01/07   02/07   January sales
1   02/12   02/12   Chinese new year
2   02/13   02/18   Carnival
3   02/07   02/14   Valentine's Day
4   03/12   03/19   Father's day

Table B

                        dates   comercial_event     month_day
0   2017-01-14 23:01:10+00:00           unknown         01/14
1   2017-08-29 23:01:10+00:00           unknown         08/29
2   2017-02-13 23:01:10+00:00           unknown         02/13
3   2017-08-31 23:01:10+00:00           unknown         08/31
4   2017-03-15 23:01:10+00:00           unknown         03/15

What i want to do is to merge the 2 tables using the month_day column from table B as reference. The month_day column indicates month and day only (no year). If the value of this column falls between the values of the columns from and to of table A, the comercial_event column of table B should be updated with the value of the comercial_event column of Table A for that range of month and day. The output would be something like this:

                            dates   comercial_event     month_day
    0   2017-01-14 23:01:10+00:00   January Sales           01/14
    1   2017-08-29 23:01:10+00:00           unknown         08/29
    2   2017-02-13 23:01:10+00:00   Valentine's Day         02/13
    3   2017-08-31 23:01:10+00:00           unknown         08/31
    4   2017-03-15 23:01:10+00:00   Father's day            03/15

If a month_day value was to coincide with multiple comercial events, the comercial_event column value should be "multiple".

I have seen this question and this question but none of them seems to really address this problem.

I'm using python 2.7 and pandas 0.24.

How could i solve this?

Miguel 2488
  • 1,410
  • 1
  • 20
  • 41
  • Does this answer your question? [How to join two dataframes for which column values are within a certain range?](https://stackoverflow.com/questions/46525786/how-to-join-two-dataframes-for-which-column-values-are-within-a-certain-range) – BallpointBen Feb 23 '21 at 01:32
  • Correct me if i'm wrong but that solution expects the dates to be in timestamp format and i'm working with strings here in fact. There are no complete datetime objects, only months and days pairs – Miguel 2488 Feb 23 '21 at 02:15

1 Answers1

1

Let us do

idx = pd.IntervalIndex.from_arrays(pd.to_datetime(df1['from'], format='%m/%d'),
           pd.to_datetime(df1['to'],format='%m/%d'), 'both')
s = pd.Series(df1.comercial_event.tolist(), index=idx)
look = pd.to_datetime(df2['month_day'], format='%m/%d')

l = []
for x in look:

    try:
        l.append(s.loc[[x]].tolist())
    except:
        l.append(np.nan)
df2['out'] = l
df2
Out[117]: 
        dates comercial_event month_day                         out
0  2017-01-14         unknown     01/14              [Januarysales]
1  2017-08-29         unknown     08/29                         NaN
2  2017-02-13         unknown     02/13  [Carnival, Valentine'sDay]
3  2017-08-31         unknown     08/31                         NaN
4  2017-03-15         unknown     03/15               [Father'sday]
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Hi @BENY , thank you very much for your answer. Though it seems functional, it doesn't do exactly what i wanted, it's pretty hard to read in addition, but i think this should do it. Thank you very much again – Miguel 2488 Feb 23 '21 at 01:04