For a given dataframe df, imported from a CSV file and containing redundant columns, I would like to write a function that performs recursive filtering and subsequent renaming of df.columns, based on the number of arguments given.
Ideally, the function should behave as follows.
When the input is (df, 'string1a', 'string1b', 'new_col_name1'), then:
filter1 = [col for col in df.columns if 'string1a' in col and 'string1b' in col]
df_out = df[filter1]
df_out.columns = ['new_col_name1']
return df_out
Whereas, when the input is:
(df, 'string1a', 'string1b', 'new_col_name1', 'string2a', 'string2b', 'new_col_name2', 'string3a', 'string3b', 'new_col_name3')
the function should do the equivalent of:
filter1 = [col for col in df.columns if 'string1a' in col and 'string1b' in col]
filter2 = [col for col in df.columns if 'string2a' in col and 'string2b' in col]
filter3 = [col for col in df.columns if 'string3a' in col and 'string3b' in col]
df_out = df[filter1 + filter2 + filter3]
df_out.columns = ['new_col_name1', 'new_col_name2', 'new_col_name3']
return df_out
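
For reference, one possible shape for such a function is sketched below. It reads the extra arguments in groups of three via *args. The function name filter_and_rename, the sample column names, and the rule that each pair of substrings must match exactly one column are my own assumptions for illustration (the last one because each filter gets a single new name); none of this is a built-in pandas feature.

```python
import pandas as pd


def filter_and_rename(df, *args):
    """Select columns by substring pairs and rename them.

    `args` is read in groups of three: (substring_a, substring_b, new_name).
    The function name is hypothetical, chosen only for this sketch.
    """
    if len(args) == 0 or len(args) % 3 != 0:
        raise ValueError("expected arguments in groups of three: str_a, str_b, new_name")

    selected = []   # original column labels, in the order the groups were given
    new_names = []  # one new name per selected column
    for str_a, str_b, new_name in zip(args[0::3], args[1::3], args[2::3]):
        matches = [col for col in df.columns if str_a in col and str_b in col]
        # Assumption: each group must match exactly one column, because a
        # single new name is assigned per filter.
        if len(matches) != 1:
            raise ValueError(
                f"{str_a!r}/{str_b!r} matched {len(matches)} columns, expected 1"
            )
        selected.extend(matches)
        new_names.append(new_name)

    df_out = df[selected].copy()  # copy so the rename does not affect df
    df_out.columns = new_names
    return df_out


# Usage with made-up column names
df = pd.DataFrame({"temp_in_C": [1, 2], "temp_out_C": [3, 4], "press_in_bar": [5, 6]})
out = filter_and_rename(df, "temp", "in", "t_in", "press", "bar", "p_in")
```

If a pair of substrings can legitimately match several columns, the exactly-one check would need to be replaced by some naming rule for the extra matches, which the description above does not specify.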