I have many dataframes in pickle format, they have the same information, but columns name are not identical in all of them. For example:
> file name: Columns_['Weryfikacja UK - 2022-04-14.xlsx_20uk-woocommerce-last-5yrs-expo.pkl'].pkl
> columns_nam: Index(['domain', 'phones', 'phones_data_source', 'company_name', 'SourceFile'], dtype='object')
>
> file name: Columns_['US _ Canada - rechurn - 01.2022-04.2022.xlsx_Other.pkl'].pkl
> columns_nam: Index(['Phone', 'Domain', 'SourceFile'], dtype='object')
>
> file name: Columns_['2022-08 - US _ Canada.xlsx_29.08-02.09 WixStore USCA.pkl'].pkl
> columns_nam: Index(['Phone', 'Alternative phone 1', 'Alternative phone2', 'Alternative phone3', 'Alternative phone4', 'SourceFile'], dtype='object')
I have a dict like this to rename columns names in all files:
my_dict = {
"Domain": ['Domain','domain', 'WWW', 'www'],
"Phone": ['Phone','phone_number', 'phones', 'Tel'],
"AlternativePhone1": ['Alternative phone1','Alternative phone 2', 'phones2'],
"AlternativePhone2": ['Alternative phone2','Alternative phone 3', 'phones3'],
"AlternativePhone3": ['Alternative phone3','Alternative phone 4', 'phones4'],
"AlternativePhone4": ['Alternative phone4','Alternative phone 5', 'phones5'],
"SourceFile": ['SourceFile']
}
I need a help with code, how should I do it?
for file in glob.glob("*.pkl"):
df = pd.read_pickle(file)
On output I would like to have something like this
> file name: Columns_['2022-08 - US _ Canada.xlsx_29.08-02.09 WixStore USCA.pkl'].pkl
> columns_nam: Index(['Phone', 'AlternativePhone1', 'AlternativePhone2', 'AlternativePhone3', 'AlternativePhone4', 'SourceFile'], dtype='object')