I'm trying to extract a huge amount of data from a DB and write it to a CSV file, and I want to find the fastest way to do this. I found that calling writerows on the result of a fetchall was about 40% slower than the code below.
import csv

with open(filename, 'a') as f:
    writer = csv.writer(f, delimiter='\t')
    cursor.execute("SELECT * FROM table")
    # Header row built from the cursor metadata
    writer.writerow([i[0] for i in cursor.description])

    count = 0
    builder = []
    row = cursor.fetchone()
    # Tab between columns, newline after the last one (assumes at least one row)
    DELIMITERS = ['\t'] * (len(row) - 1) + ['\n']
    while row:
        count += 1
        # Interleave the row's values with the delimiters and add them to the builder
        builder += [str(item) for pair in zip(row, DELIMITERS) for item in pair]
        if count == 1000:
            # Flush to disk every 1000 rows
            count = 0
            f.write(''.join(builder))
            builder[:] = []
        row = cursor.fetchone()
    # Write out whatever is left over
    f.write(''.join(builder))
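For reference, the fetchall version I benchmarked against looked roughly like this (same cursor and file setup as above):

import csv

with open(filename, 'a') as f:
    writer = csv.writer(f, delimiter='\t')
    cursor.execute("SELECT * FROM table")
    writer.writerow([i[0] for i in cursor.description])
    # Pull every row into memory at once and hand the whole list to the csv module
    writer.writerows(cursor.fetchall())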
Edit: The database I'm using is unique to the small company that I'm working for, so unfortunately I can't provide much information on that front. I'm using jpype to connect to the database, since the only means of connecting is via a JDBC driver. I'm running CPython 2.7.5; I would love to use PyPy, but it doesn't work with Pandas.
Since I'm extracting such a large number of rows, I'm hesitant to use fetchall for fear that I'll run out of memory. Using writerows on batches from fetchmany has comparable performance and is much easier on the eyes, so I think I'll use that. Thanks a bunch!
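In case it's useful to anyone else, here's a minimal sketch of that batched approach, assuming a standard DB-API cursor and an arbitrary batch size of 1000:

import csv

BATCH_SIZE = 1000  # arbitrary; adjust to taste

with open(filename, 'a') as f:
    writer = csv.writer(f, delimiter='\t')
    cursor.execute("SELECT * FROM table")
    writer.writerow([i[0] for i in cursor.description])
    while True:
        rows = cursor.fetchmany(BATCH_SIZE)
        if not rows:
            break
        # Only one batch is ever held in memory at a time
        writer.writerows(rows)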