I read all CSV files from a Google Drive folder into a data frame in a Colab notebook using this solution (reading csv file with specific name in Python). Each file follows the same naming convention, and I want to split the file name into two new columns and add those to the dataframe.
The file names are structured like this: Platform_Company.csv (for example, Instagram_Microsoft.csv), and I want the new columns to be inserted at the beginning of the dataframe.
platform | company | employee id | employee email |
---|---|---|---|
Instagram | Microsoft | person 1 | humanperson@microsoft.com |
So far, I've used this to read in the files. I'm not sure what the layer number is or whether I need it.
from pathlib import Path
import pandas as pd
ls_data = []
csv_directory = '/content/drive/MyDrive/Colab Notebooks/'
for idx, filename in enumerate(Path(csv_directory).glob('*Instagram_*.csv')):
    df_temp = pd.read_csv(filename)
    df_temp.insert(0, 'layer_number', idx)
    ls_data.append(df_temp)
df = pd.concat(ls_data, axis=0)
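For what it's worth, `layer_number` in the loop above is just the running index from `enumerate`, so it only records which file a row came from and can probably be dropped. A minimal sketch of how the file name might be split inside the same kind of loop is below. The function name `read_with_name_columns` and the default pattern `*_*.csv` are made up for illustration, and the sketch assumes every file name has the form Platform_Company.csv with a single underscore between the two parts.

```python
from pathlib import Path

import pandas as pd


def read_with_name_columns(csv_directory, pattern="*_*.csv"):
    """Read every matching CSV and prepend platform/company columns.

    Hypothetical helper: assumes file names look like Platform_Company.csv.
    """
    frames = []
    # sorted() only makes the row order reproducible between runs
    for filename in sorted(Path(csv_directory).glob(pattern)):
        # Path.stem drops the ".csv" suffix, e.g. "Instagram_Microsoft"
        platform, company = filename.stem.split("_", maxsplit=1)
        df_temp = pd.read_csv(filename)
        # insert(0, ...) puts a column at the front; company goes in first
        # so that platform ends up to its left.
        df_temp.insert(0, "company", company)
        df_temp.insert(0, "platform", platform)
        frames.append(df_temp)
    # ignore_index=True gives one continuous index across all files;
    # pd.concat raises ValueError if no file matched the pattern.
    return pd.concat(frames, ignore_index=True)
```

In the notebook this could be called as `df = read_with_name_columns('/content/drive/MyDrive/Colab Notebooks/')`. If a company name can itself contain an underscore, `maxsplit=1` keeps everything after the first underscore together in `company`.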
I tried incorporating the following script (Read multiple csv files and Add filename as new column in pandas), but it isn't working and I'm not sure how to add it into the current version.
import glob
import os
import pandas as pd
path = r'\OUTPUT'
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pd.read_csv(f, delimiter='|') for f in all_files)
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
concatenated_df['filename'] =(all_files[f] for f in all_files)
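As far as I can tell, the last line of that script indexes the list `all_files` with a path string instead of an integer position, and even if it worked it would produce one value per file rather than one per row. A possible variant, sketched under the same Platform_Company.csv assumption, tags each frame with its own file name before concatenating and then splits that column. The helper name `concat_with_filename` is hypothetical, and the delimiter defaults to a comma here because the `'|'` in the borrowed script probably does not match these files.

```python
import glob
import os

import pandas as pd


def concat_with_filename(path, delimiter=","):
    """Hypothetical helper: concat all CSVs in `path`, adding platform/company."""
    all_files = sorted(glob.glob(os.path.join(path, "*.csv")))
    # Tag every frame with its own file name *before* concatenating,
    # so each row carries the name of the file it came from.
    frames = [
        pd.read_csv(f, delimiter=delimiter).assign(filename=os.path.basename(f))
        for f in all_files
    ]
    df = pd.concat(frames, ignore_index=True)
    # "Instagram_Microsoft.csv" -> "Instagram_Microsoft" -> two columns (0 and 1).
    # Assumes at least one file name contains an underscore.
    parts = (
        df["filename"]
        .str.replace(r"\.csv$", "", regex=True)
        .str.split("_", n=1, expand=True)
    )
    df.insert(0, "platform", parts[0])
    df.insert(1, "company", parts[1])
    # The raw file name is no longer needed once it has been split
    return df.drop(columns="filename")
```

Compared with splitting the name inside the loop, this keeps the reading step generic and does the split once on the whole column, at the cost of a temporary `filename` column.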
Thanks for any guidance and/or suggestions!