I am merging numerous CSV files using glob. However, not all CSV files contain all fields. What logic should I use in my if statement to:
- Create a column if it does not already exist in the temporary dataframe
- Fill that column with NaN values
Here's a simplified extract of my code for reference:
    for file in allFiles:
        try:
            df_temp = pd.read_csv(os.path.join(file))
            if 'text' in df_temp:  # if the file contains a 'text' column
                print(file)
                # Keep only the rows whose text matches the pattern
                df_temp['mask'] = df_temp['text'].str.contains(regex_pattern)
                df_temp = (df_temp[df_temp['mask'] == True]).drop('mask', axis=1)
                df_temp['dataset_source'] = str(file)  # Create source file column
            # Append inside the try block so an empty file is not appended
            dataframes.append(df_temp)
        except pd.errors.EmptyDataError:
            print(file, "is empty and has been skipped.")
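To illustrate the behaviour I'm after, here is a minimal sketch. The column list `EXPECTED_COLUMNS`, the helper `add_missing_columns`, and the sample frame are all hypothetical and made up for this example; they are not part of my real code.

```python
import numpy as np
import pandas as pd

# Hypothetical list of columns every file should end up with (an assumption).
EXPECTED_COLUMNS = ["text", "author", "date"]


def add_missing_columns(df, columns):
    """Return a copy of df with any missing columns added and filled with NaN."""
    df = df.copy()
    for col in columns:
        if col not in df.columns:  # create the column only if it is absent
            df[col] = np.nan
    return df


# Made-up frame standing in for one CSV that lacks 'author' and 'date'.
df_temp = pd.DataFrame({"text": ["foo", "bar"]})
df_temp = add_missing_columns(df_temp, EXPECTED_COLUMNS)
print(df_temp)
```

As far as I understand, `pd.concat` with its default outer join already aligns columns and fills the gaps with NaN, so an explicit check like this may only be needed when a column has to exist before concatenation.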
Thanks!