
I need to output the results of a SQL query to CSV. The query results will not fit in my available memory. Also, CSV operations have typically been much slower for me in pandas than with the csv library, so I'd prefer not to use pandas.

I tried to write the code below with the intent of batching rows into lists of 1000 and then appending each batch to a CSV file. When I run it, my system just runs out of memory and it doesn't work the way I anticipated.

I'm not sure what I'm doing wrong or if there is something I just don't understand.

ROWS_AT_ONCE = 1000

curr.execute(
'''
SELECT
*
FROM '''+Table_Name+'''

;
'''
)
rows = curr.fetchall()
headers = list(map(lambda x: x[0], curr.description))
headers = tuple(headers)
csv_w_pointer = open(Export_Path_Name, 'a' , newline='')
csv_writer = csv.writer(csv_w_pointer, delimiter='\t', quotechar='"')
csv_writer.writerow(headers)
batch = list()
for row in rows:
    batch.append(row)
    if len(batch) >= ROWS_AT_ONCE:
        csv_writer.writerow(batch)
    batch = list()
if batch:
    csv_writer.writerow(batch)
del batch

I need to write the SQL results to CSV without running out of memory; my current attempt fails because it exceeds available memory.

David 54321
  • At what point is it running out of memory? When you're writing to CSV or at `curr.fetchall()`? – rickdenhaan Mar 28 '19 at 21:52
  • It looks like it's during the fetchall, actually. Any way to manage that? – David 54321 Mar 28 '19 at 21:56
  • Unless this is part of a bigger program (and maybe even then), I'd just use the sqlite3 command line tool instead of python: https://www.sqlite.org/cli.html#csv_export – Shawn Mar 28 '19 at 22:19
  • I'm trying to have an automated process. Is there a way to have my python script open the sqlite3 command line to export the table? Using the sqlite3 command line gives me really good performance, but I need a solution I can integrate into my python script. – David 54321 Mar 28 '19 at 22:35
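
(A minimal sketch of the approach discussed in the last comment, i.e. driving the sqlite3 command-line shell from Python with subprocess. The database path, table name and output path below are placeholders, and it assumes the sqlite3 binary is on the PATH.)

import subprocess

db_path = 'my_database.db'      # placeholder: path to the SQLite database file
table_name = 'my_table'         # placeholder: table to export
export_path = 'export.csv'      # placeholder: output CSV file

# -header prints the column names, -csv switches the shell's output mode to CSV;
# the shell streams the result to stdout, so Python never holds the rows in memory.
with open(export_path, 'w') as out_file:
    subprocess.run(
        ['sqlite3', '-header', '-csv', db_path, 'SELECT * FROM {};'.format(table_name)],
        stdout=out_file,
        check=True,
    )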

2 Answers


You should avoid using the fetchall method if your data set is large. Use the fetchmany method instead to fetch only a limited number of rows at a time, and write each batch to the CSV immediately so that only one batch is ever held in memory:

ROWS_AT_ONCE = 1000

curr.execute(
'''
SELECT
*
FROM '''+Table_Name+'''

;
'''
)
headers = list(map(lambda x: x[0], curr.description))
headers = tuple(headers)
with open(Export_Path_Name, 'a' , newline='') as csv_w_pointer:
    csv_writer = csv.writer(csv_w_pointer, delimiter='\t', quotechar='"')
    csv_writer.writerow(headers)
    # fetch ROWS_AT_ONCE rows at a time and write each batch immediately,
    # so only one batch is ever held in memory
    while True:
        rows = curr.fetchmany(ROWS_AT_ONCE)
        if not rows:
            break
        csv_writer.writerows(rows)
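
(Side note, not part of the answer itself: the sqlite3 cursor is a lazy iterator over the result set, so the while loop above could arguably be replaced by handing the cursor straight to writerows; a sketch under that assumption, using the same curr and csv_writer.)

# the cursor yields one row at a time, so this also streams instead of
# loading the whole result set; it replaces the while loop above
csv_writer.writerows(curr)
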
blhsing

Instead of all the csv loops, you can use the odo library:

  • writing csv to sql
  • sqlite example
  • an example of writing a 33 GB csv file


Another option is the csv2sqlite.py script
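
(For what it's worth, a hedged sketch of what the sqlite-to-CSV direction might look like with odo; the database path, table name and output file are placeholders, and the URI syntax should be checked against the odo documentation.)

from odo import odo

# source: an sqlite URI of the form sqlite:///<db file>::<table name>
# target: a plain CSV file name
odo('sqlite:///my_database.db::my_table', 'my_export.csv')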

philshem
  • I've never been smart enough to understand how odo works. The documentation doesn't make sense to me and I can only really follow simple examples. If there were a simple example of a full script that converted csv to sqlite and a sqlite query to csv, I would really like that. I can follow the "sqlite example" okay, but that looks like it's for csv to sqlite; I'd like a simpler example in the other direction. An example based on a query instead of just a full table would also be helpful. – David 54321 Mar 28 '19 at 22:32
  • If you can create a view then that’s a simple way to not select the entire table. As for an example, did you see the 3rd link (added as an edit)? – philshem Mar 28 '19 at 22:49
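
(A small sketch of the view idea from the last comment: wrap the query in a view so that any whole-table export approach can target it like a table. The view, table and column names are made up, and whether a given tool such as odo reflects views should be verified.)

# create a view over the query you actually want to export
curr.execute(
    '''
    CREATE VIEW IF NOT EXISTS my_export_view AS
    SELECT col_a, col_b
    FROM my_table
    WHERE col_a IS NOT NULL;
    '''
)
# my_export_view can now be exported like a table, e.g. with the
# fetchmany loop from the first answer or the sqlite3 command-line shell.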