I have a dataframe whose column names all share the same format, date_sensor, where the date is in yymmdd format. Here is a subset of it:
Considering the last date (180722), I would like to keep only one column for that date, chosen according to a pre-defined sensor priority. For example, I would like to define that SN1 is more important than SK3. The desired result would then be the same dataframe, only without the column 180722_SK3. The number of columns sharing the last date can be more than two.
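To make the requirement concrete, here is a minimal sketch of the input and the desired output. The values and the earlier column names are made up for illustration and are not the real data; only the date_sensor naming pattern and the 180722 columns come from the description above.

```python
import pandas as pd

# Hypothetical data: column names follow the date_sensor pattern.
df = pd.DataFrame({
    "180720_SN1": [1.0, 2.0],
    "180721_SK3": [3.0, 4.0],
    "180722_SN1": [5.0, 6.0],
    "180722_SK3": [7.0, 8.0],
})

# Desired outcome: for the last date only, keep the highest-priority
# sensor (SN1) and drop the other columns sharing that date.
expected = df.drop(columns=["180722_SK3"])
print(expected.columns.tolist())
```

Columns for earlier dates are left untouched, whichever sensor they come from.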
This is the solution I implemented:
sensorsImportance = ['SN1', 'SK3']  # priority list; the first item is the most important
sensorsOrdering = {word: i for i, word in enumerate(sensorsImportance)}

def remove_duplicate_last_date(df, sensorsOrdering):
    s = []
    # The date prefix of the last column is taken as the last date
    lastDate = df.columns.tolist()[-1].split('_')[0]
    # Collect the sensor names of all columns that contain that date
    for i in df.columns.tolist():
        if lastDate in i:
            s.append(i.split('_')[1])
    if len(s) > 1:
        keepCol = lastDate + '_' + sorted(s, key=sensorsOrdering.get)[0]
        dropCols = [lastDate + '_' + i for i in sorted(s, key=sensorsOrdering.get)[1:]]
        df.drop(dropCols, axis=1, inplace=True)
    return df
It works fine, but it feels too cumbersome. Is there a better way?
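For comparison, one possible more compact direction is sketched below. It is only a sketch under assumptions, not a definitive answer: the helper name drop_lower_priority_last_date and the sample data are hypothetical, it assumes (like the function above) that the last column carries the latest date, and it ranks sensors missing from the priority list last. Unlike the original, it matches the date prefix exactly and returns a new dataframe instead of modifying the input in place.

```python
import pandas as pd

def drop_lower_priority_last_date(df, importance):
    """Keep only the top-priority sensor column for the last date."""
    rank = {sensor: i for i, sensor in enumerate(importance)}
    # Assumption: the last column carries the latest date.
    last_date = df.columns[-1].partition("_")[0]
    # All columns whose date prefix equals the last date.
    same_date = [c for c in df.columns if c.partition("_")[0] == last_date]
    # Assumption: sensors missing from the priority list rank last.
    keep = min(same_date, key=lambda c: rank.get(c.partition("_")[2], len(rank)))
    return df.drop(columns=[c for c in same_date if c != keep])

# Hypothetical sample data following the date_sensor pattern.
sample = pd.DataFrame({
    "180721_SK3": [1, 2],
    "180722_SK3": [3, 4],
    "180722_SN1": [5, 6],
})
result = drop_lower_priority_last_date(sample, ["SN1", "SK3"])
print(result.columns.tolist())
```

Using min with a key avoids sorting the sensor list twice, and the same call handles any number of columns sharing the last date.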