I have a code Filtering Data, that I wan't displayed from an Excel Document using Pandas in Jupyter Notebook. It is for a UK RAF Historic Aircraft Display Team, Year 2009 Appearance Schedule.
Here is my Python Code :-
import pandas as pd
xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')
data = pd.read_excel(xls, sheet_name="Sheet1")
pd.options.display.max_rows = 1000
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID'])
df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))]
I am unsure what to type, to filter the Data, for when the Numerical Value in the BID Column, is the same in the BID column in the next row. And also in addition, only when one of the Aircraft in the A/C Column, where both Numerical Values in the BID Column below and above are the same, is DAK, and excluding that principle, only if in a row for the A/C Column shows DHS Could someone please tell me, what I should add to my Python Code to enable this, it would be much appreciated if someone could.
Also for example with the filtered data for example, I would like :-
Output:
145 SCARBOROUGH DAK DISPLAY 2008-05-25 00:00:00 610
150 SCARBOROUGH SPIT DISPLAY 2008-05-25 00:00:00 610
Changed to showing the following, i.e. merging the two lines together :-
Output:
SCARBOROUGH DS DISPLAY 2008-05-25 00:00:00 610
And
Output:
173 TARRANT RUSHDEN HS DISPLAY NaN 132
174 TARRANT RUSHDEN DAK DISPLAY NaN 132
Changed to showing :-
Output:
TARRANT RUSHDEN DHS DISPLAY NaN 132
I mean changed to showing, for all those occurrences,
Not just for those two Venues.
Here is an Sample Of My Output Data :-
Venue A/C DISPLAY/ Date BID
25 SHUTTLEWORTH DAK DISPLAY NaN 529
55 KEMBLE DAK DISPLAY NaN 461
69 NORTHWICH SPIT DISPLAY 2008-05-10 00:00:00 514
72 POCKLINGTON SPIT DISPLAY 2009-05-10 00:00:00 821
75 BERLIN DAK DISPLAY 2008-05-12 00:00:00 587
78 MILDENHALL SPIT DISPLAY 2009-05-15 00:00:00 920
93 DUXFORD HS DISPLAY NaN 611
103 CRANWELL HS DISPLAY 2008-05-20 00:00:00 44
145 SCARBOROUGH DAK DISPLAY 2008-05-25 00:00:00 610
150 SCARBOROUGH SPIT DISPLAY 2008-05-25 00:00:00 610
151 CORBRIDGE SPIT DISPLAY NaN 353
167 BRIDGEND-CNX SPIT DISPLAY 2008-05-31 00:00:00 527
173 TARRANT RUSHDEN HS DISPLAY NaN 132
174 TARRANT RUSHDEN DAK DISPLAY NaN 132
179 NORTHOLT SPIT DISPLAY 2009-06-05 00:00:00 870
214 BRIZE NORTON HS DISPLAY NaN 939
218 ROPLEY HS DISPLAY 2008-06-13 00:00:00 355
223 THWAITES HS DISPLAY NaN 364
231 ROPLEY HS DISPLAY NaN 355
240 COSFORD HS DISPLAY 2008-06-14 00:00:00 667
241 QUORN HS DISPLAY NaN 314
244 COSFORD DAK DISPLAY 2008-06-14 00:00:00 NaN
260 REDHILL SPIT DISPLAY NaN 686
269 KEMBLE DAK DISPLAY NaN 316
270 KEMBLE HS DISPLAY NaN 316
280 KEMBLE SPIT DISPLAY 2008-06-21 00:00:00 316
285 KEMBLE DAK DISPLAY 2008-06-21 00:00:00 316
Here is the Website Link, to the .xls i.e. Excel Document File :-
You will obviously need to change the following in my Python Code, to whatever you call the .xls File. And the path, of where you save it, on your Computer :-
xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')
I have changed the end bit of the Code to :-
selected = df.loc[df['A/C'] == 'DS', 'DH', 'DHS']
groupby_venue_date = selected.groupby(['Venue', 'BID', 'DISPLAY/'])
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
print(aircraft.shape)
pd.DataFrame(aircraft)
But get a :- IndexingError: Too many indexers message, when I run the Code, what does that mean ? And what has caused the Error Bill ?
This is the Code I am currently running as of 2nd January 2020 :-
import pandas as pd
xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')
data = pd.read_excel(xls, sheet_name="Sheet1")
pd.options.display.max_rows = 1000
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID'])
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))]
df["Date"].fillna("No Date", inplace = True)
df['A/C'].unique().tolist()
rename_map = {
'DAK': 'D',
'SPIT': 'S',
'LANC': 'L',
'HURRI': 'H',
'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
#selected = df.loc[df['A/C'] == 'DS', 'DH', 'DHS']
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/'])
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
print(aircraft.shape)
pd.DataFrame(aircraft)