Help & valuable advise needed from the experts please.
I have 100s of csv files which i would like to merge but table columns aren't same. For example
- File 1: Header1, Header2, Header3
- File 2: Header1, Header2
- Files 3: Header3, Header4
- Files 4: Header1, Header3, Header4
I would like to merge the data from all of these csv files.
I thought of a logic but I am struggling to implement it in a python code. I think it requires a Dataframe with pre-defined headers. (Header1, Header2, Header3, Header4). Then i should loop through each csv file to search for that header. If the header exists then the data is appended to the dataframe else skip it.
Can someone please advise if there is a function in Python that can make it simple ? I have tried to use read_csv but the data structure should be the same when looping throw the csv files.
import pandas as pd
import os
import glob
# Location of CSV Files
data_location='C:\\Tableau Reports\\ST Database\\Files\\Downloads\\CSV Files\\*.csv'
# This will give path location to all CSV files
csv_files=glob.glob(data_location)
for csv_file in csv_files:
# Encoding to be changed (UTF-8).
with open(csv_file, newline='', encoding='cp1252') as csvfile:
df_file = pd.read_csv(csvfile)
Combined_df = pd.concat(df_file)
print(Combined_df)
I tried to following advise given in this forum but getting error at line 12 df_file = pd.read_csv(csvfile).
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 114934: character maps to
Any help or advise is greatly appreciated. Thank you