Python pandas read_csv optional columns (handling files with different number of columns)

Question

I need to read CSV files (one at a time) that can have different numbers of columns: newer files have extra columns that older files don't have.

date|time|name|math
20101230|1345|mickey|0.5

date|time|name|math|literature|physics
20101230|1345|mickey|0.5|3.5|9

date|time|name|math|literature|physics|chemistry|art
20101230|1345|mickey|0.5|3.5|9|6|7.4

I need to write code that can read both the old and new formats. The output dataframe will always use the latest format. When the code reads a file in an old format, each unavailable column will be initialized with a default value. So in the above example, the output will always contain 8 columns, even if the file only contains 4.

The simplest solution is:

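import pandas

def load(path='input.txt'):
  # The files are pipe-delimited, so the separator must be given explicitly.
  df = pandas.read_csv(path, sep='|',
    dtype = {'date': int, 'time': int, 'name': str, 'math': float,
             'literature': float, 'physics': float, 'chemistry': float, 'art': float})
  n_cols = len(df.columns)
  if n_cols == 4:
    # Oldest format: add every column introduced later.
    df['literature'] = 0.0
    df['physics'] = 0.0
    df['chemistry'] = 0.0
    df['art'] = 0.0
  elif n_cols == 6:
    df['chemistry'] = 0.0
    df['art'] = 0.0
  # elif ...: one more branch for every other old format
  return df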

However, this solution doesn't look good, since you have to change a lot of old code every time there's a new format.

How should I handle this problem?

Edit: the question was closed because it's "similar" to this. But it's very clearly a different question. I need to load 1 file that might have missing columns (compared to the latest format), not load multiple files and then concatenate them.

Something like `df = pd.concat([pd.read_csv(f) for f in files]).fillna(0)` — mozway, Aug 16 '23 at 09:28
That's totally unrelated. That code tries to load multiple files and then concatenate them. Meanwhile, I need to load 1 file (that might have missing columns). It isn't similar to the example code I showed at all. — Huy Le, Aug 16 '23 at 09:49
It's only one file? Then simply `reindex` the columns with `fill_value=0` — mozway, Aug 16 '23 at 10:24
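
The `reindex` suggestion in the last comment could look roughly like the sketch below. It is only a sketch under some assumptions: the names `DTYPES`, `DEFAULT` and `read_any_format` are made up for illustration, every missing column gets the same default of 0.0 (as in the question's code), and the file is pipe-separated as in the samples. The latest format is described once in a dictionary, so a new format should only need a new entry there rather than another `elif` branch.

```python
import io

import pandas as pd

# Latest format, in output order (hypothetical name, not from the question).
# Supporting a newer format means adding an entry here.
DTYPES = {
    'date': int, 'time': int, 'name': str, 'math': float,
    'literature': float, 'physics': float, 'chemistry': float, 'art': float,
}
DEFAULT = 0.0  # assumed default for every missing column


def read_any_format(source):
    """Read one pipe-separated file and return it in the latest format."""
    df = pd.read_csv(source, sep='|')
    # reindex keeps the columns that exist, adds the missing ones filled
    # with DEFAULT, and orders everything as in DTYPES. Columns that are
    # not listed in DTYPES are dropped.
    df = df.reindex(columns=list(DTYPES), fill_value=DEFAULT)
    return df.astype(DTYPES)


# An in-memory stand-in for a file in the oldest (4-column) format.
old_file = io.StringIO("date|time|name|math\n20101230|1345|mickey|0.5\n")
df = read_any_format(old_file)
print(df.columns.tolist())
```

Passing a file path instead of the `StringIO` object should work the same way. If different columns need different defaults, one option is to assign the missing columns in a loop over a dictionary of defaults instead of using a single `fill_value`.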
