
Using psycopg2, I could write large results as CSV through copy_expert and a BytesIO buffer, then load them with pandas, like this:

copy_sql = "COPY (SELECT * FROM big_table) TO STDOUT CSV"

buffer = BytesIO()
cursor.copy_expert(copy_sql, buffer, size=8192)
buffer.seek(0)
pd.read_csv(buffer, engine="c").to_excel(self.output_file)

However, I can't figure out how to replace the buffer in copy_expert with psycopg3's new copy command. Has anyone figured out a way to do this?

FlipperPA
  • That is what the docs are for. Read [psycopg3 COPY](https://www.psycopg.org/psycopg3/docs/basic/copy.html). There are plenty of examples. FYI, there is no `copy_expert` in `psycopg3`. – Adrian Klaver Dec 16 '22 at 19:53
  • Also see my answer here [Psycopg COPY](https://stackoverflow.com/questions/73623152/psycopg2-copy-from-is-inserting-data-with-double-quotes-when-whitespace-is-pre/73623435#73623435). It has both `psycopg2` and `psycopg3` examples. – Adrian Klaver Dec 16 '22 at 19:59
  • I read the docs; there is no example with a buffer and a size. Thus, why I said: `However, I can't figure out how to replace the buffer in copy_expert with psycopg3's new copy command. Has anyone figured out a way to do this?` Thanks for the answer, an example like that in the docs would go a long way. I'll issue a PR. – FlipperPA Dec 16 '22 at 20:00
  • Not following. Per the [psycopg2 copy_expert](https://www.psycopg.org/docs/cursor.html#cursor.copy_expert) docs, `size` is only used for `COPY some_table FROM some_file`: *size – size of the read buffer to be used in COPY FROM.* In your case it would be a no-op. You should be able to use the `COPY ... TO STDOUT` example from *Copying block-by-block* to achieve the same as your `psycopg2` case. Just substitute a buffer for the file. – Adrian Klaver Dec 17 '22 at 01:30

2 Answers


Data and table setup:

cat test.csv                                                                                                                                                            
1,john123@gmail.com,"John Stokes"
2,emily123@gmail.com,"Emily Ray"

create table test_csv (id integer, mail varchar, name varchar);

import psycopg

# Connection values are illustrative; point these at your own database.
con = psycopg.connect("dbname=test")
cur = con.cursor()

with open('test.csv') as f:
    with cur.copy("COPY test_csv FROM STDIN WITH CSV") as copy:
        while data := f.read(1000):
            copy.write(data)
con.commit()
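
If you do not want to hand over pre-formatted CSV blocks, the copy object also has write_row(), which serializes and escapes each value for you. A minimal sketch reusing the table above (the sample rows are just the test data):

rows = [(1, 'john123@gmail.com', 'John Stokes'),
        (2, 'emily123@gmail.com', 'Emily Ray')]

with cur.copy("COPY test_csv (id, mail, name) FROM STDIN") as copy:
    for row in rows:
        # write_row() handles formatting and escaping per row
        copy.write_row(row)
con.commit()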

Using a buffer:

from io import BytesIO

import pandas as pd

buffer = BytesIO()
# Ask for CSV output so pandas.read_csv can parse it with default settings
with cur.copy('copy test_csv to stdout with csv') as copy:
    for data in copy:
        buffer.write(data)
buffer.seek(0)
pd.read_csv(buffer, engine="c").to_excel('test_csv.xlsx')
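
If the spreadsheet is the only goal, you can also skip the intermediate CSV and read rows straight off the copy object with read_row(); values come back as strings unless you declare the types with set_types(). A rough sketch:

import pandas as pd

rows = []
with cur.copy('copy test_csv to stdout') as copy:
    # Declare column types so values are converted from their text form
    copy.set_types(['int4', 'varchar', 'varchar'])
    while (row := copy.read_row()) is not None:
        rows.append(row)

pd.DataFrame(rows, columns=['id', 'mail', 'name']).to_excel('test_csv.xlsx')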

Since you are using pandas, there is also:

from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres:@localhost:5432/test')
pd.read_sql('select * from test_csv', engine).to_excel('test_csv.xlsx')
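
One thing to note: a plain postgresql:// URL makes SQLAlchemy choose its default psycopg2 driver. If you specifically want it to go through psycopg 3, SQLAlchemy 2.0 ships a postgresql+psycopg dialect, e.g.:

engine = create_engine('postgresql+psycopg://postgres:@localhost:5432/test')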

Adrian Klaver

The key to writing a large query result to a file through psycopg3 in this fashion is to use a SpooledTemporaryFile, which caps how much data is held in memory (see max_size; anything beyond that spills to disk). Once the CSV has been written out, convert it with pandas.

from tempfile import SpooledTemporaryFile
from pandas import read_csv
from psycopg import connect

cursor = connect([connection]).cursor()
# Ask for CSV output so read_csv can parse it with default settings
copy_sql = "COPY (SELECT * FROM stocks WHERE price > %s) TO STDOUT CSV"
price = 100

with SpooledTemporaryFile(
    mode="w+b",  # read/write, so the buffer can be read back after it spills to disk
    max_size=65546,
    buffering=8192,
) as tmpfile:
    # Passing query parameters to cursor.copy() requires psycopg >= 3.1
    with cursor.copy(copy_sql, (price,)) as copy:
        for data in copy:
            tmpfile.write(data)
    tmpfile.seek(0)

    read_csv(tmpfile, engine="c").to_excel("my_spreadsheet.xlsx")
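
If you are still on psycopg 3.0, cursor.copy() does not accept parameters yet; one workaround (a sketch using the psycopg.sql composition helpers, not code from the docs) is to merge the value client-side before calling copy():

from psycopg import sql

copy_stmt = sql.SQL(
    "COPY (SELECT * FROM stocks WHERE price > {}) TO STDOUT CSV"
).format(sql.Literal(price))

with cursor.copy(copy_stmt) as copy:
    for data in copy:
        tmpfile.write(data)
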
FlipperPA