0

i need the help of the community to solve an issue. Basically i have a excel database with a series of information: train number, departure date, fare, owner of the train (NTV or TRN) and the market (which station the train is going to cover).

Now, i have created a code that filter some of those information but i miss a piece: how can I, given a specific market, class and range of departure dates, find FOR EACH TRAIN WITH CXR = "NTV" a train that has a CXR = "TRN" AND has a departure date that falls within plus or minus 40min from the NTV train (i would need to display those information and also the fare)?

I would also need to display the result side by side (preferably also exporting on excel) for a quick and easy comparison. Following, the code.

excel file photo enter image description here

import pandas as pd
import datetime

data = pd.read_excel (r'C:\Users\robda\Downloads\out_ow_train.xlsx')



pd.DataFrame([['2020-03-05 00:00:00', 8502, 'B',71,'FCOFLR','TRN']], 
             columns = ['DDTIME2','DTRAIN','CLASS','FARE','MARKET','CXR'])

OUTPUT 
DDTIME2             DTRAIN          CLASS   FARE    MARKET  CXR
2020-03-05 00:00:00 8502    B       71      FCOFLR  TRN

NS = ['TRNMXP','TRNPMF','TRNBLQ','TRNFLR','TRNFCO','TRNNAP','TRNQSR','MXPPMF','MXPBLQ','MXPFLR','MXPFCO','MXPNAP','MXPQSR',
  'PMFBLQ','PMFFLR','PMFFCO','PMFNAP','PMFQSR','BLQFLR','BLQFCO','BLQNAP','BLQQSR','FLRFCO',
  'FLRNAP','FLRQSR','FCONAP','FCOQSR','NAPQSR']
dates = ["2020-03-05","2020-03-10"]
classes = ['E']
time = pd.date_range( start = min(dates), end = max(dates), freq='Min')

df.MARKET.isin(NS)
df.DDATE.isin(dates)
df.CLASS.isin(classes)
df.DTIME2.isin(time)

df[df.MARKET.isin(NS) & df.DDATE.isin(dates) & df.CLASS.isin(classes) & df.DTIME2.isin(time)]

P.S. sorry for low quality pic, i have no idea how to update the excel file

Rob
  • 25
  • 1
  • 7
  • run pd.read_excel(excelfile).head(5).to_dict() and paste the print out. also share ur expected output. you can use this [link](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as a guide – sammywemmy Mar 07 '20 at 13:08
  • Edit: same content like sammywemmy – incognito Mar 07 '20 at 13:10

1 Answers1

0

Not knowing exactly how your data looks like, I would suggest the following approach:

  • Split your data into two dataframes - one for each of the trains: df1 = data.loc[data['CXR']=='NTV'] and df2 = data.loc[data['CXR']=='TRN']
  • Then, merge both dataframes again using pandas.merge_asof for an approximate merge with the appropriate tolerance.
  • Finally, you can check if you get multiple matches, and then only choose the closest one, maybe using merge_data.groupby(...).first()

Maybe, with some example data you will be able to get a more accurate answer.

divingTobi
  • 2,044
  • 10
  • 25
  • How can i add an excel file? That would make things very clear – Rob Mar 07 '20 at 13:49
  • `df = pd.read_excel("file.xlsx")`, see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html – divingTobi Mar 07 '20 at 13:56
  • i mean, on stack overflow so you could easily give it a check – Rob Mar 07 '20 at 14:00
  • As sammywemmy wrote above, `pd.read_excel(excelfile).head(5).to_dict()` is the way to go. That output can easily be read into a dataframe. Just add as many lines as are necessary to recreate the problem. Maybe it would be worthwhile to create an example dataset with a couple of entries where you can test your code. That way it is also possible to check manually if the result is correct. – divingTobi Mar 07 '20 at 14:04
  • Ok, i printed the result but i still have to figure out how to post in a "readable" way for other users. Sorry for that, but i'm kinda new to stack overflow and python itself! – Rob Mar 07 '20 at 15:43