I have some code that reads multiple CSV files into a single pandas DataFrame. The problem is that the first two lines of every file need to be ignored, and I can't figure out how to do this.

```python
import pandas as pd
import glob
import os

path = r'D:\E\Traficc\migration\Zambia-Mining\DATA\24monthimport'  # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))  # advisable to use os.path.join as this makes concatenation OS independent

# generator expression: doesn't create a list, nor does it append to one
df_from_each_file = (pd.read_csv(f) for f in all_files)

data = pd.concat(df_from_each_file, ignore_index=True)

print(data.tail())
```

I tried using `next(df)`, but I get an error saying that `df` is not iterable.

Can I do all of this within the existing one-line generator expression, or do I need to break it up into a full loop? What can I use to accomplish this?
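Keeping the one-line generator expression should work: `pd.read_csv` accepts a `skiprows` argument, and `skiprows=2` discards the first two lines of each file before parsing. The sketch below assumes each file has two junk lines followed by a normal header row. It writes hypothetical sample files to a temporary directory (standing in for the real data folder) so that it runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample contents: two junk lines, then a header row, then data.
SAMPLE = "Exported by some tool\nReport period: 24 months\nitem,value\nx,1\ny,2\n"

with tempfile.TemporaryDirectory() as path:  # stands in for the real data folder
    for name in ("a.csv", "b.csv"):
        with open(os.path.join(path, name), "w") as fh:
            fh.write(SAMPLE)

    all_files = glob.glob(os.path.join(path, "*.csv"))

    # skiprows=2 drops the first two lines of every file before parsing,
    # so the third line is used as the header row.
    df_from_each_file = (pd.read_csv(f, skiprows=2) for f in all_files)

    # Consume the generator while the temporary files still exist.
    data = pd.concat(df_from_each_file, ignore_index=True)

print(data)
```

Against the real folder, only the `read_csv` call inside the generator expression needs to change.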

[`skiprows`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)? – anky Apr 29 '19 at 11:48
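`skiprows` accepts an integer (number of lines to skip at the top), a list of 0-indexed line numbers, or a callable that returns `True` for each row index to skip. After skipping, the next line is treated as the header unless `header=None` is passed. A small sketch on hypothetical in-memory contents illustrating these forms:

```python
import io

import pandas as pd

text = "junk line 1\njunk line 2\ncol_a,col_b\n1,2\n3,4\n"

# An integer skips that many lines from the top of the file.
by_count = pd.read_csv(io.StringIO(text), skiprows=2)

# A list skips those 0-indexed line numbers; [0, 1] is equivalent here.
by_index = pd.read_csv(io.StringIO(text), skiprows=[0, 1])

# A callable receives each row index and skips the row when it returns True.
by_func = pd.read_csv(io.StringIO(text), skiprows=lambda i: i < 2)

# If there is no header row after the junk lines, add header=None
# so the first data line is not consumed as column names.
no_header_text = "junk line 1\njunk line 2\n1,2\n3,4\n"
no_header = pd.read_csv(io.StringIO(no_header_text), skiprows=2, header=None)

print(by_count.equals(by_index) and by_index.equals(by_func))
print(no_header)
```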
