I am looking for a a way to read just the header row of a large number of large CSV files.
Using Pandas, I have this method available, for each csv file:
>>> df = pd.read_csv(PATH_TO_CSV)
>>> df.columns
I could do this with just the csv module:
>>> reader = csv.DictReader(open(PATH_TO_CSV))
>>> reader.fieldnames
The problem with these is that each CSV file is 500MB+ in size, and it seems to be a gigantic waste to read in the entire file of each just to pull the header lines.
My end goal of all of this is to pull out unique column names. I can do that once I have a list of column headers that are in each of these files.
How can I extract only the header row of a CSV file, quickly?