I am trying to read and reformat a very large (2 GB+) .out file that is structured like a CSV. I previously used the built-in open() without this problem, but switched to codecs.open() because open() was having trouble with some characters.
It now throws:
Traceback (most recent call last):
  line 21, in <module>
    if(r[5]==""):
IndexError: list index out of range
This happens on the very first row, even though that row definitely has an element at r[5]. (The script fails after 0.301 s.)
import sys
import csv
import datetime
import codecs

maxInt = sys.maxsize  # raise csv's field size limit as far as the platform allows
decrement = True
while decrement:
    decrement = False
    try:
        csv.field_size_limit(maxInt)
    except OverflowError:
        maxInt = int(maxInt/10)
        decrement = True
with codecs.open("file.out", 'rU', 'utf-16-be') as source:  # previously plain open()
    rdr = csv.reader(source)
    with open("out.csv", "w", newline='') as result:
        wtr = csv.writer(result)
        wtr.writerow(("Column1", "column2", "column3", "etc..."))
        for r in rdr:
            if(r[5]==""):  # skip rows whose date column is empty
                continue
            # reformat the MM/DD/YYYY date as YYYY-MM-DD and keep selected columns
            wtr.writerow((
                datetime.datetime.strptime(r[5], '%m/%d/%Y').strftime('%Y-%m-%d'),
                r[3], r[7], r[9]+r[10]+" "+r[12]))
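A plausible cause of the IndexError, assuming the file is really single-byte text rather than UTF-16 (the utf-8 and latin-1 attempts described in this question point that way): decoding it as UTF-16-BE glues every two bytes into one character, so the decoded text contains no commas, quotes or newlines, and csv.reader returns the whole line as a single field. The sketch below reproduces this on the start of the first sample row, using bytes.decode directly as a simplification of what codecs.open() does while streaming.

```python
import csv
import io

# The first fields of the sample's first row as single-byte (ASCII) text.
# It is 44 bytes long, so UTF-16 decoding leaves no odd byte over.
raw = b'"A00017","K","G","1999","4530","01/12/1999"\n'

# UTF-16-BE reads each pair of bytes as one code unit, e.g. b'"A' -> U+2241.
# No byte here is 0x00, so no code unit can be ',' (U+002C), '"' or '\n'.
garbled = raw.decode("utf-16-be")
print(len(raw), len(garbled))  # 44 22
print("," in garbled)  # False

# csv.reader therefore sees one line holding one field...
rows = list(csv.reader(io.StringIO(garbled)))
print(len(rows), len(rows[0]))  # 1 1

# ...so indexing column 5 fails exactly like the traceback.
try:
    rows[0][5]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```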
Using utf-8 instead throws:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 12: invalid continuation byte
Using latin-1 or ISO-8859-1 throws:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 57-58: character maps to <undefined>
although only after running for much longer.
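Both of those errors can be reproduced on a few hypothetical bytes (not taken from the real file), assuming the .out file is Windows-1252 (cp1252) text and that out.csv, opened without an encoding, ends up using cp1252, a common default for open() on Windows. Note that latin-1 and ISO-8859-1 are two names for the same codec in Python, so they fail the same way. The sketch shows why UTF-8 rejects byte 0xC9, why latin-1 decodes fine but produces a character cp1252 cannot write back, and that decoding as cp1252 and writing UTF-8 goes through.

```python
# Hypothetical bytes, not from the real file. In Windows-1252,
# 0xC9 is 'É' and 0x92 is a right single quotation mark.
data = b"ANDR\xc9 O\x92BRIEN"

# 1) UTF-8: 0xC9 starts a two-byte sequence, but the next byte (a space)
#    is not a continuation byte (0x80-0xBF), so decoding stops there.
utf8_error = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc.reason
print(utf8_error)  # invalid continuation byte

# 2) Latin-1 maps every byte to a character, but 0x92 becomes U+0092,
#    a C1 control character. cp1252 (assumed to be the default encoding
#    of the output file) has no byte for it, so the 'charmap' codec fails.
text_latin1 = data.decode("latin-1")
cp1252_error = None
try:
    text_latin1.encode("cp1252")
except UnicodeEncodeError as exc:
    cp1252_error = exc.reason
print(cp1252_error)  # character maps to <undefined>

# 3) Decoding as cp1252 turns 0x92 into U+2019 instead, and encoding the
#    text as UTF-8 works because UTF-8 can represent any character.
text_cp1252 = data.decode("cp1252")
encoded = text_cp1252.encode("utf-8")
```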
The input file looks like this:
"A00017","K","G","1999","4530","01/12/1999","","","","PEOPLE TO ELECT MANGINELLI","","","","258 MAGNIOLIA DRIVE","SELDEN","NY","11784","","","404.57","","","","","","","2","","NAA","07/22/1999 08:43:59"
"A00037","K","G","1999","999999","01/12/1999","","","","CITIZENS TO ELECT TEDISCO TO ASSEMBLY","","","","","","","","","","0","","","","","","","2","","",""
"A00037","K","N","1999","1693","01/15/1999","","","","OUTSTANDING LOAN","","","","2176 GUILDERLAND AVE","SCHENECTADY","NY","12306","","","10474.8","10474.8","","","OTHER","","PREVIOUS LOAN FROM JAMES TEDISCO","","P","JM","07/15/1999 15:08:17"
"A00037","J","N","2000","1694","01/13/2000","","","","OUTSTANDING LOAN","","","","2176 GUILDERLAND","SCHENECTADY","NY","12306","","","10474.8","10474.8","","","OTHER","","LOANS FROM PREVIOUS CAMPAIGNS FROM J","","P","JM","01/14/1900 16:35:09"
"A00037","K","X","2000","999999","","","","","","","","","","","","","","","","","","","","","","","","","07/20/2000 00:00:00"
"A00037","J","X","2001","999999","","","","","","","","","","","","","","","","","","","","","","","","","01/17/2001 00:00:00"
"A00037","K","X","2002","999999","","","","","","","","","","","","","","","","","","","","","","","","","07/19/2002 00:00:00"
"A00037","J","X","2003","999999","","","","","","","","","","","","","","","","","","","","","","","","","01/21/2003 00:00:00"
"A00037","K","X","2003","999999","","","","","","","","","","","","","","","","","","","","","","","","","07/16/2003 00:00:00"
"A00037","J","X","2004","999999","","","","","","","","","","","","","","","","","","","","","","","","","01/22/2004 00:00:00"
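Putting this together, one possible sketch, again assuming the file is cp1252 encoded: read it with the built-in open() and an explicit encoding instead of codecs.open(), and give out.csv an explicit UTF-8 encoding as well. The names reformat and reformat_file are hypothetical; the header and column positions are copied from the question's code. Unlike the original loop, it also skips rows with fewer than 13 fields instead of raising IndexError.

```python
import csv
import datetime
import io


def reformat(source, result):
    """Hypothetical helper: copy the question's columns from one text-mode
    CSV stream to another, converting the date in column 5 to YYYY-MM-DD."""
    rdr = csv.reader(source)
    wtr = csv.writer(result)
    wtr.writerow(("Column1", "column2", "column3", "etc..."))
    for r in rdr:
        # Skip blank or short rows and rows whose date column is empty.
        if len(r) < 13 or r[5] == "":
            continue
        date = datetime.datetime.strptime(r[5], "%m/%d/%Y").strftime("%Y-%m-%d")
        wtr.writerow((date, r[3], r[7], r[9] + r[10] + " " + r[12]))


def reformat_file(in_path, out_path, in_encoding="cp1252"):
    """Hypothetical wrapper. in_encoding="cp1252" is an assumption about
    the .out file; the output is always written as UTF-8."""
    # newline="" is what the csv module expects for files it reads or writes.
    with open(in_path, encoding=in_encoding, newline="") as source:
        with open(out_path, "w", encoding="utf-8", newline="") as result:
            reformat(source, result)


# Demo on in-memory data: the first sample row, plus a row with an empty
# date column modeled on the later sample rows.
SAMPLE = (
    '"A00017","K","G","1999","4530","01/12/1999","","","",'
    '"PEOPLE TO ELECT MANGINELLI","","","","258 MAGNIOLIA DRIVE",'
    '"SELDEN","NY","11784","","","404.57","","","","","","","2","",'
    '"NAA","07/22/1999 08:43:59"\n'
    '"A00037","K","X","2000","999999",' + '"",' * 24 + '"07/20/2000 00:00:00"\n'
)
out = io.StringIO()
reformat(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With the real file this would be called as reformat_file("file.out", "out.csv"). If cp1252 turns out to be the wrong guess, latin-1 never fails while decoding (every byte maps to some character), but bytes 0x80 to 0x9F then come through as control characters. The field_size_limit loop from the question can still be run first if very long fields remain a problem.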
I've gotten this far thanks to: