I have a large tab-delimited .txt file that I want to read in Python, including the header row. I found the Stack Overflow question "Read first N lines of a file in python", but it doesn't show how to specify the number of rows to read while also setting the delimiter and the line break.

Scott Davis

1 Answer

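pandas can do this for you: read_csv takes the delimiter through sep and, by default, uses the first line of the file as the column headers.

import pandas as pd

# myfile is the path to your tab-delimited text file
df = pd.read_csv(myfile, sep='\t')  # the header row becomes the column names
df.head(n=5)  # first 5 rows; note that read_csv above still parses the whole file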

For more info, see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv
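If only the first N rows are needed, read_csv can stop early through its nrows parameter instead of loading everything and then calling head(). Below is a minimal sketch under that assumption; the helper name read_first_rows and its defaults are hypothetical, chosen only for illustration. For an unusual line break, read_csv also has a lineterminator parameter (a single character, C parser only), which is not needed for ordinary newline-terminated files.

```python
import pandas as pd


def read_first_rows(path, n=5, sep="\t"):
    """Return the header plus the first n data rows of a delimited text file.

    read_first_rows is a hypothetical helper name, not part of pandas.
    """
    # nrows limits parsing to n data rows, so the rest of the file is not loaded.
    # header=0 (the pandas default) treats the first line as column names.
    return pd.read_csv(path, sep=sep, nrows=n, header=0)
```

Calling read_first_rows(myfile, n=5) would return a DataFrame with at most 5 rows and the file's headers as column names.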

Lawrence
  • Thanks! I ran into memory issues. Is there a read function that doesn't load the entire file into memory first? I'm looking at a txt file with 45,000,000 records... – Scott Davis Feb 09 '19 at 00:24
  • @ScottDavis try using `df = pd.read_csv(myfile,sep='\t',low_memory=False)`, or try using chunks. I've never had this problem before, but my files are much smaller. Good luck! – Lawrence Feb 11 '19 at 02:53
  • Thanks Lawrence! I used the suggested code and still hit the "pandas.errors.ParserError: Error tokenizing data. C error: out of memory" – Scott Davis Feb 11 '19 at 23:35
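For the out-of-memory error discussed in the comments, the "chunks" suggestion could look roughly like the sketch below. It assumes the per-chunk work can be done independently (here it only counts rows as a stand-in); the function name count_rows_in_chunks and the chunk size of 100,000 are hypothetical choices, not values from the thread.

```python
import pandas as pd


def count_rows_in_chunks(path, sep="\t", chunksize=100_000):
    """Stream a delimited file in chunks and return the total number of data rows.

    count_rows_in_chunks is a hypothetical helper name, not part of pandas.
    """
    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames
    # rather than one large DataFrame, so only one chunk is held at a time.
    for chunk in pd.read_csv(path, sep=sep, chunksize=chunksize):
        total += len(chunk)  # stand-in for real per-chunk processing
    return total
```

Each chunk carries the same column headers, so filtering or aggregating per chunk and keeping only the reduced result is one way to stay within memory on a file with tens of millions of records.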