
I have Postgres tables and I want to run a PostgreSQL script file on these tables using Python, then write the result of the queries to a CSV file. The script file has multiple queries separated by a semicolon `;`. A sample script is shown below.

Script file:

--Duplication Check
select p.*, c.name
from scale_polygons_v3 c inner join cartographic_v3 p
on (metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20)) AND c.kind NOT IN (9,10)
where ST_Contains(c.geom, p.geom);

--Area Check
select sp.areaid,sp.name_displ,p.road_id,p.name
from scale_polygons_v3 sp, pak_roads_20162207 p
where st_contains(sp.geom,p.geom) and sp.kind = 1
and p.areaid != sp.areaid;

When I run the Python code, it executes successfully without any error, but I face a problem when writing the result of the queries to a CSV file: only the result of the last executed query is written. That is, the first query's result is overwritten by the second, the second by the third, and so on, until only the last query's result remains.

Here is my python code:

import psycopg2
import sys
import csv
import datetime, time

def run_sql_file(filename, connection):
    '''
    The function takes a filename and a connection as input
    and runs the SQL queries in the file on the given connection.
    '''
    start = time.time()

    file = open(filename, 'r')
    sql = " ".join(file.readlines())
    print "Start executing at " + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M")) + "\n"
    print "Query:\n", sql + "\n"
    cursor = connection.cursor()
    cursor.execute(sql)
    records = cursor.fetchall()
    with open('Report.csv', 'a') as f:
        writer = csv.writer(f, delimiter=',')
        for row in records:
            writer.writerow(row)
    connection.commit()
    end = time.time()
    row_count = len(records)
    print "Done Executing:", filename
    print "Number of rows returned:", row_count
    print "Time elapsed to run the query:", str((end - start) * 1000) + ' ms'
    print "\t ==============================="

def main():    
    connection = psycopg2.connect("host='localhost' dbname='central' user='postgres' password='tpltrakker'")
    run_sql_file("script.sql", connection)
    connection.close()

if __name__ == "__main__":
    main()

What is wrong with my code?

Shahzad Bacha
  • probably not helpful, but the code looks good to me. You are opening the file with mode 'a' as is done [here](http://stackoverflow.com/questions/2363731/append-new-row-to-old-csv-file-python) – Matthias Aug 25 '16 at 10:57
  • Executing a script in a single call, you'll get the result only for the last command executed (or an error if something goes wrong). – Abelisto Aug 25 '16 at 11:42
  • Do all the queries have the same number of columns, and are they of the same type (and order)? – Clodoaldo Neto Aug 25 '16 at 11:46
  • @ClodoaldoNeto No, the queries have different numbers of columns and different types, i.e. one table is lines and another is polygons data – Shahzad Bacha Aug 25 '16 at 11:49
  • @Abelisto So what are you suggesting? Should I create an SQL file for every single query? – Shahzad Bacha Aug 25 '16 at 11:53
  • So how do you intend to use a single CSV file with different line layouts? – Clodoaldo Neto Aug 25 '16 at 11:54
  • There are a lot of suggestions: a single query per file; parsing the file for every single command; ... My suggestion is to use [cursors](https://www.postgresql.org/docs/current/static/sql-declare.html) or [prepared statements](https://www.postgresql.org/docs/current/static/sql-prepare.html) instead of the simple queries and then fetch the results using their names. – Abelisto Aug 25 '16 at 11:57
  • @ClodoaldoNeto Does it matter? – Shahzad Bacha Aug 25 '16 at 12:00
  • If a single CSV file with multiple layouts is what you really want, then say so, because the solution is different and more complex. – Clodoaldo Neto Aug 25 '16 at 12:07
  • @Abelisto Can I parse the file for every query? If yes, please guide me. Thank you – Shahzad Bacha Aug 25 '16 at 12:08
  • It seems like csv.writer is ignoring your open mode and is using 'w' anyway. Can you please try open('Report.csv', 'ab') instead of 'a'? – Andrey Vykhodtsev Aug 26 '16 at 10:59
  • Thanks, I tried it already but the issue is still there – Shahzad Bacha Aug 26 '16 at 11:34
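One direction suggested in the comments (parsing the file for every single command) can be sketched as follows. This is a naive splitter I wrote for illustration: it breaks the script on `;` and drops comment-only lines, but it does not handle semicolons inside string literals or dollar-quoted bodies, so treat it as a starting point rather than a full SQL parser:

```python
def split_sql_script(script):
    """Naively split a SQL script into individual statements.

    Splits on ';' and discards empty chunks and '--' comment lines.
    Not safe for semicolons inside string literals or dollar-quoted
    function bodies.
    """
    statements = []
    for chunk in script.split(';'):
        # keep real SQL lines, drop blank and comment-only lines
        lines = [ln for ln in chunk.splitlines()
                 if ln.strip() and not ln.strip().startswith('--')]
        if lines:
            statements.append('\n'.join(lines).strip())
    return statements


def run_each_query(connection, script, csv_path):
    # execute every statement separately and append each result
    # to the same CSV file, so earlier results are not overwritten
    import csv
    cursor = connection.cursor()
    with open(csv_path, 'a') as f:
        writer = csv.writer(f)
        for stmt in split_sql_script(script):
            cursor.execute(stmt)
            for row in cursor.fetchall():
                writer.writerow(row)
```

With the sample script above, `split_sql_script` yields the two SELECT statements, which are then executed and written one at a time.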

2 Answers


The simplest approach is to output each query to a different file, using copy_expert:

import psycopg2

# connect and obtain a cursor first (connection parameters from the question)
conn = psycopg2.connect("host='localhost' dbname='central' user='postgres' password='tpltrakker'")
cursor = conn.cursor()

query = '''
    select p.*, c.name
    from
        scale_polygons_v3 c
        inner join
        cartographic_v3 p on metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20) and c.kind not in (9,10)
    where ST_Contains(c.geom, p.geom)
'''
copy = "copy ({}) to stdout (format csv)".format(query)
f = open('Report.csv', 'wb')
cursor.copy_expert(copy, f, size=8192)
f.close()

query = '''
    select sp.areaid,sp.name_displ,p.road_id,p.name
    from scale_polygons_v3 sp, pak_roads_20162207 p
    where st_contains(sp.geom,p.geom) and sp.kind = 1 and p.areaid != sp.areaid;
'''
copy = "copy ({}) to stdout (format csv)".format(query)
f = open('Report2.csv', 'wb')
cursor.copy_expert(copy, f, size=8192)
f.close()

If you want to append the second output to the same file, just keep the first file object open and pass it to both copy_expert calls.

Notice that the copy must output to stdout to make the result available to copy_expert.
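The two copy calls above can be folded into a small helper. Only the COPY-statement building is shown here; the names `make_copy_sql` and `export_query` are mine, and the optional `header` flag uses COPY's csv `header` option:

```python
def make_copy_sql(query, header=False):
    """Wrap a SELECT in a COPY ... TO STDOUT statement.

    The query inside the parentheses must not end with a
    semicolon, so any trailing ';' is stripped first.
    """
    options = "format csv, header" if header else "format csv"
    return "copy ({}) to stdout ({})".format(query.strip().rstrip(';'), options)


def export_query(cursor, query, path, header=False):
    # stream the query result straight into a CSV file
    with open(path, 'wb') as f:
        cursor.copy_expert(make_copy_sql(query, header), f, size=8192)
```

With this, each query from the question becomes a one-liner, e.g. `export_query(cursor, query, 'Report.csv')` and `export_query(cursor, query2, 'Report2.csv')`.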

Clodoaldo Neto
  • This worked for me OK, but I don't understand the code as I have to do some further modification – Shahzad Bacha Aug 26 '16 at 11:35
  • @ShahzadBacha You mean why `copy` to `stdout`? I updated the answer. Otherwise, what part do you not understand? – Clodoaldo Neto Aug 26 '16 at 12:19
  • Actually this should be a separate question, but anyway, how can I modify the code also to count the total number of rows returned from the query and only write limited rows, like `LIMIT 10` at the end of the query – Shahzad Bacha Aug 26 '16 at 13:05
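For the LIMIT part of the last comment, one hedged sketch is to wrap the original query in a subselect so a LIMIT can be bolted on without editing the SQL; the name `limit_query` is mine. The row count can then be taken from `len(records)` after a fetch, or by counting the lines of the CSV that COPY produced:

```python
def limit_query(query, n):
    """Wrap a query in a subselect so a LIMIT can be added
    without modifying the original SQL text."""
    return "select * from ({}) q limit {}".format(query.strip().rstrip(';'), n)
```

For example, `limit_query(query, 10)` can then be passed to the same COPY export used in the answer.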

If you are able to change the SQL script a bit, here is a workaround:

#!/usr/bin/env python

import psycopg2

script = '''
    declare cur1 cursor for
        select * from (values(1,2),(3,4)) as t(x,y);

    declare cur2 cursor for
        select 'a','b','c';
    '''
print script

conn = psycopg2.connect('')

# Cursors exist and are available only inside the transaction
conn.autocommit = False

# Create the cursors from the script
conn.cursor().execute(script)

# Read the names of the cursors
cursors = conn.cursor()
cursors.execute('select name from pg_cursors;')
cur_names = cursors.fetchall()

# Read data from each available cursor
for cname in cur_names:
    print cname[0]
    cur = conn.cursor()
    cur.execute('fetch all from ' + cname[0])
    rows = cur.fetchall()
    # Here you can save the data to the file
    print rows


conn.rollback()

print 'done'
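The "save the data to the file" step above could look like the following, writing each cursor's rows to its own CSV file with a header row taken from `cur.description` (the DB-API attribute psycopg2 fills after execute). The helper name `save_rows` is mine, and since it takes plain tuples it can be exercised without a database:

```python
import csv

def save_rows(path, column_names, rows):
    # write one cursor's result set to its own CSV file,
    # header row first, then the data rows
    with open(path, 'w') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(column_names)
        writer.writerows(rows)
```

Inside the loop it would be called as, for example, `save_rows(cname[0] + '.csv', [d[0] for d in cur.description], rows)`.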

Disclaimer: I am a total newbie with Python.

Abelisto