I am merging many CSV files that I collect with glob, but not every file contains every field. What logic should I use in my if statement to:

  1. Create a column if it does not already exist in the temporary dataframe
  2. Fill that column with NaN values

Here's a simplified extract of my code for reference:

import pandas as pd

# allFiles and regex_pattern are defined earlier in the full script
dataframes = []
for file in allFiles:
    try:
        df_temp = pd.read_csv(file)
    except pd.errors.EmptyDataError:
        print(file, "is empty and has been skipped.")
        continue  # skip the append so a stale or undefined df_temp is never added
    if 'text' in df_temp.columns:  # if the file contains 'text' column
        print(file)
        # Keep only rows whose 'text' matches the pattern; missing text counts as no match
        mask = df_temp['text'].str.contains(regex_pattern, na=False)
        df_temp = df_temp[mask].copy()
        df_temp['dataset_source'] = str(file)  # Create source file column
    dataframes.append(df_temp)
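
For reference, here is a minimal sketch of one possible way to do the two steps listed above (create a column when it is missing, then fill it with NaN). It is not taken from my real script: `EXPECTED_COLUMNS` and `ensure_columns` are hypothetical names, and the two small in-memory frames only stand in for CSV files with different fields.

```python
import numpy as np
import pandas as pd

# Hypothetical list of columns that every merged frame should end up with.
EXPECTED_COLUMNS = ["text", "label", "dataset_source"]


def ensure_columns(df, columns):
    """Return a copy of df in which every name in columns exists.

    Columns that are missing are created and filled with NaN;
    columns that already exist are left untouched.
    """
    df = df.copy()
    for col in columns:
        if col not in df.columns:
            df[col] = np.nan
    return df


# Two small frames standing in for CSV files with different fields.
df_a = pd.DataFrame({"text": ["foo", "bar"], "label": [1, 0]})
df_b = pd.DataFrame({"text": ["baz"]})

frames = [ensure_columns(df, EXPECTED_COLUMNS) for df in (df_a, df_b)]
merged = pd.concat(frames, ignore_index=True)
print(merged)
```

Two related points may make the explicit check unnecessary, depending on the goal. `pd.concat` already aligns frames on column names and fills the gaps with NaN, so the check only matters when a column has to exist before the concat (for example, to filter on it). `df.reindex(columns=EXPECTED_COLUMNS)` also adds missing columns as NaN, but it drops any column that is not in the list.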

Thanks!

jim_jones
  • do all files have the same `len` (number of rows)? – Danila Ganchar May 08 '20 at 11:40
  • [jfyi](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). could you show a few `df` and expected `output`? – Danila Ganchar May 08 '20 at 11:42
  • thanks for your initial responses. no, the files have arbitrary numbers of rows. i'll put some time aside to write up a more complete example shortly. thanks! – jim_jones May 09 '20 at 20:54
