With Python's readlines() function I can retrieve a list of each line in a file:
with open('dat.csv', 'r') as dat:
    lines = dat.readlines()
I am working on a problem involving a very large file, and this method produces a memory error. Is there a pandas equivalent to Python's readlines() function? The chunksize option of pd.read_csv() seems to add numbers to my lines, which is far from ideal.
Minimal example:
In [1]: lines = []
In [2]: for df in pd.read_csv('s.csv', chunksize=100):
   ...:     lines.append(df)
In [3]: lines
Out[3]:
[   hello here is a line
 0  here is another line
 1  here is my last line]
In [4]: with open('s.csv', 'r') as dat:
   ...:     lines = dat.readlines()
   ...:
In [5]: lines
Out[5]: ['hello here is a line\n', 'here is another line\n', 'here is my last line\n']
In [6]: cat s.csv
hello here is a line
here is another line
here is my last line
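
For reference, the numbers in Out[3] appear to be the DataFrame's row index, and the first line of the file has been taken as the column header (passing header=None to pd.read_csv() should keep it as data). If the goal is only to get the raw lines without loading the whole file, one possible sketch is to iterate over the file object lazily in batches, without pandas. The helper name iter_line_batches, the batch size, and the demo file path are assumptions made for illustration, not part of the original setup.

```python
import os
import tempfile
from itertools import islice


def iter_line_batches(path, batch_size=100):
    """Yield lists of at most batch_size lines, reading the file lazily.

    Hypothetical helper: only one batch is held in memory at a time.
    """
    with open(path, 'r') as handle:
        while True:
            # islice pulls the next batch_size lines from the open file
            batch = list(islice(handle, batch_size))
            if not batch:
                break
            yield batch


# Demo file with the same three lines as s.csv (the path is an assumption).
demo_path = os.path.join(tempfile.mkdtemp(), 's_demo.csv')
with open(demo_path, 'w') as out:
    out.write('hello here is a line\n'
              'here is another line\n'
              'here is my last line\n')

batches = list(iter_line_batches(demo_path, batch_size=2))
print(batches)
```

Each batch is a plain list of strings, newline included, in the same form readlines() returns. In real use the batches would be processed one at a time inside a for loop rather than collected with list(), since collecting them all brings the whole file back into memory.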