I have a csv file (Delimited with '|') with data something like this:
|origin|destination|class|eff_start_date|eff_end_date|price
0|A|B|1|2016-01-01|2016-12-31|37.4
1|A|B|2|2016-01-01|2016-12-31|30.4
2|A|C|1|2016-01-01|2016-12-31|94.0
3|A|C|2|2016-01-01|2016-12-31|80.0
4|A|B|1|2017-01-01|2017-12-31|38.4
5|A|B|2|2017-01-01|2017-12-31|31.4
6|A|C|1|2017-01-01|2017-12-31|95.0
7|A|C|2|2017-01-01|2017-12-31|81.0
8|A|C|2|2017-02-01|2017-05-20|82.0
This file gives the fare for a particular origin, destination. eff_start_date & eff_end_date show the period for which a particular fare is valid.
I am writing a code to validate this file in a way that there should never be 2 separate fare values for any date. in the example above, records, 7 & 8 are conflicting as for same origin, destination & class, between February, 2017 to 20-may-2017 there may be 2 separate fare records.
Below is the snippet of my code but it didn't work, i am not able to figure out whats wrong.
import pandas as pd
df_fares = pd.read_csv('input.csv', sep='|')
df_fares.eff_start_date = pd.to_datetime(df_fares.eff_start_date, format='%Y-%m-%d')
df_fares.eff_end_date = pd.to_datetime(df_fares.eff_end_date, format='%Y-%m-%d')
df_fares['overlap'] = (df_fares.groupby(['origin', 'destination', 'class'])
.apply(lambda x: (x['eff_start_date'].shift() - x['eff_end_date']) > timedelta(0))
.reset_index(drop=True))