I have a 102 GB CSV file exported from MongoDB that I'm trying to load into Postgres. The file contains ~55 million rows, and I'm using \copy to do the import. However, it fails with a carriage return error on line 47,867,184:
ERROR: unquoted carriage return found in data
HINT: Use quoted CSV field to represent carriage return.
CONTEXT: COPY reso_facts, line 47867184
To my knowledge, Postgres doesn't allow skipping bad rows on import, so it seems I need to fix the file itself. Is there a way to delete a CSV row in-place using Python? I strongly prefer not to have to write a modified copy of the file to an external hard drive.
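For context, the \copy invocation looks roughly like this (path shortened; the options may not be verbatim, but the table name matches the error above):

\copy reso_facts FROM '/data/reso_facts.csv' WITH (FORMAT csv, HEADER)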
I found this elegant solution for txt files:
import os
from mmap import mmap

def removeLine(filename, lineno):
    f = os.open(filename, os.O_RDWR)
    m = mmap(f, 0)
    p = 0
    # walk forward past the first (lineno - 1) newlines to reach the target line
    for i in range(lineno - 1):
        p = m.find('\n', p) + 1
    q = m.find('\n', p)
    # overwrite the line with spaces so nothing has to be shifted on disk
    m[p:q] = ' ' * (q - p)
    os.close(f)
But it errors out when fed a CSV: TypeError: a bytes-like object is required, not 'str'. Is there a way to modify the above for CSVs, or is there an alternative method entirely?
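From the traceback I'm guessing the problem isn't the CSV format at all, but that mmap in Python 3 exposes the file as bytes, so the '\n' and ' ' literals need to be bytes too. Here is my untested adaptation (renamed remove_line; the logic is otherwise unchanged), in case that's the right direction:

import os
from mmap import mmap

def remove_line(filename, lineno):
    """Blank out line `lineno` (1-indexed) in-place by overwriting it with spaces."""
    f = os.open(filename, os.O_RDWR)
    m = mmap(f, 0)
    try:
        p = 0
        # mmap gives a bytes-like view, so search with a bytes pattern
        for _ in range(lineno - 1):
            p = m.find(b'\n', p) + 1
        q = m.find(b'\n', p)
        # same-length replacement: blank the line without moving the rest of the file
        m[p:q] = b' ' * (q - p)
        m.flush()
    finally:
        m.close()
        os.close(f)

I haven't been able to verify this at the full 102 GB scale, and I'm also unsure whether COPY will accept a line of spaces afterwards, so alternative approaches are still welcome.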