0

I am trying to do a script in python that will fetch data from table one and input into another table. Sort of like an ETL of some sorts. I am however running into this SyntaxError: unexpected EOF while parsing error. I am sort of harzading my way around and trying to use techniques that I have seen others use so I don't really know my way around much.

Here's my code so far:

import psycopg2
try:
    connectionone = psycopg2.connect(user = "postgres",
                      password = "xxxxxx",
                      host = "127.0.0.1",
                      port = "5432",
                      database = "xxxxxx")
    connectiontwo = psycopg2.connect(user = "postgres",
                      password = "xxxxxx",
                      host = "127.0.0.1",
                      port = "5432",
                      database = "xxxxxx")

    cursorsource = connectionone.cursor()
    cursordest = connectiontwo.cursor()
    #Truncating dest table
    print("Truncating Destination")
    cursordest.execute('delete from testarea.salepersons_2')
    connectiontwo.commit()

    #Fetch source data
    cursorsource.execute('SELECT sp_no, sp_name, sp_territory, sp_product, 
     active FROM testarea.salepersons_original;') 
    rows = cursorsource.fetchall()

    sql_insert = 'INSERT INTO testarea.salepersons_2 (sp_no, sp_name,  
      p_territory, sp_product, active) values '
    sql_values = ['(%s, %s, %s, %s, %s)'] 

    data_values = []

    batch_size = 1000 #customize for size of tables... 

    sql_stmt = sql_insert + ','.join(sql_values*batch_size) + ';'

    for i, row in enumerate(rows, 1):

                data_values += row[:5] #relates to number of columns (%s)
                if i % batch_size == 0:
                    cursordest.execute (sql_stmt , data_values )
                    cursordest.commit()
                    print("Inserting")
                    data_values = []

    if (i % batch_size != 0):
        sql_stmt = sql_insert + ','.join(sql_values*(i % batch_size)) + 
        ';'
        cursordest.execute (sql_stmt, data_values)
        print("Last Values ....")
        connectiontwo.commit()
except (Exception, psycopg2.Error) as error :
        print ("Error occured :-(", error)
finally:
    #closing database connection.
        if(connectionone):
            cursorsource.close()
            connectionone.close()
            print("PostgreSQL connection is closed")

    #closing database connection.
        if(connectiontwo):
            cursordest.close()
            connectiontwo.close()
            print("PostgreSQL connection is closed")
#close connections
cursorsource.close()
cursordest.close()
cursorsource.close()
cursordest.close()
kanataki
  • 433
  • 2
  • 21
  • 1
    `'SELECT sp_no, sp_name, sp_territory, sp_product, sp_territory, sp_product, active) values'` is a multi-line string. It needs to be contained in triple quotes. – roganjosh Feb 23 '19 at 18:06
  • The same problem applies with the next string. But then I get confused about the purpose of `sql_values = ['(%s, %s, %s, %s, %s)']` and the way you build the remainder of the string. – roganjosh Feb 23 '19 at 18:08
  • If you use triple quotes, you'll end up with an EOL in your string, which maybe you don't want. You can also make the substrings individual string literals, and add them together across the line boundary: So add a single quote followed by a plus sign to the first line and put a single quote at the front of the second line. - oh...and the second expression would have to be put in parentheses. – CryptoFool Feb 23 '19 at 18:11
  • Your `try` statement has no corresponding `else`, `except`, or `finally`, although that would be a general syntax error with no mention of EOF, so there's still some other problem, probably related to parentheses. – chepner Feb 23 '19 at 18:16
  • @Steve right - we have wires crossed sorry. Yes, you will have EOL and the missing piece was that SQL will tolerate it just fine. – roganjosh Feb 23 '19 at 18:21
  • To summarize @roganjosh's correct implication, "SQL will be fine with an EOL character, and so using triple quotes might be best here as it is the most straightforward and efficient change". - I'd +1 that all day – CryptoFool Feb 23 '19 at 18:28
  • BTW, there's a reason I wouldn't put a EOL in my SQL query string. Both my own code and SQL client code will often do logging that I'll be interested in. EOLs in log statements make logs more confusing, and harder to parse if reading them with some sort of tool. Same goes for looking at them in a debugger. I prefer to not introduce EOLs where they aren't serving any purpose. - code format and code logic should be kept separate – CryptoFool Feb 23 '19 at 18:36
  • @Steve that's a fair point. I am working on an answer because I think the next parts of the code are going off-course if I understand them correctly. I happen to route all my INSERT/UPDATE/DELETE etc queries through a common method on a webapp and then use custom formatting in that method to record them legibly because your observation does trash the logs. I accept that it's some overhead, but it's an intranet app thankfully so doesn't matter – roganjosh Feb 23 '19 at 18:40
  • I forgot to add the except finally part, I was really sleepy while doing this – kanataki Feb 24 '19 at 06:14

1 Answers1

0

The first issue is simple enough to solve. You have a multi-line string enclosed by only single quotes:

cursorsource.execute('SELECT sp_no, sp_name, sp_territory, sp_product, 
    active FROM testarea.salepersons_original;') 

You should enclose this in triple quotes, which will not impact the SQL execution:

cursorsource.execute("""SELECT sp_no, sp_name, sp_territory, sp_product, 
                        active FROM testarea.salepersons_original;""") 

The remaining code is hard for me to follow. I doubt you actually have a table with 5000 columns, so I think you're trying to do 1000 inserts of rows containing 5 values. I can only give a general approach for this, if my understanding is correct:

import random
import string

# Create some fake data to visualise
fake_data = [random.choice(list(string.ascii_letters)) for x in range(50)]

# Chunk the data (https://stackoverflow.com/a/1751478/4799172)
# This reshapes it into sublists each of length 5.
# This can fail if your original list is not a multiple of 5, but I think your
# existing code will still throw the same issues.
def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))


chunked_data = chunks(fake_data, 5)

sql_insert = """INSERT INTO testarea.salepersons_2 (sp_no, sp_name, 
                sp_territory, sp_product, active) values (?, ?, ?, ?, ?)"""

# Use executemany, not execute in a loop, to repeat for each sublist in 
# chunked_data
cursor.executemany(sql_insert, chunked_data)

Note that, in this case, I'm using a parameterized query to guard against SQL injection (I'm using ? as placeholders for values). Different libraries have different placeholders; for example, the MySQL wrappers expect %s while SQLite expects ? - in this case I've used ? to remove ambiguity that it's not just regular string formatting, but you may have to change back to %s.

roganjosh
  • 12,594
  • 4
  • 29
  • 46
  • he part about the multiline string with only one quotation mark, this line fits well in sublime as a single line, I guess it got broken into multiple lines when I was indenting on Stack Overflow – kanataki Feb 24 '19 at 06:21