4

As per Unload to S3 with Python using IAM Role credentials, the unload statement worked perfectly. So did other commands I tried, like copy and select statements.

However, I also tried to run a query which creates a table.. The create table query runs without error, but when it gets to the select statement, it throws an errors that relation "public.test" does not exist.

Any idea why is the table not created properly? Query below:

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
import config
import pandas as pd

#>>>>>>>> MAKE CHANGES HERE >>>>>>>>
DATABASE = "db"
USER = "user"
PASSWORD = getattr(config, 'password') #see answer by David Bern https://stackoverflow.com/questions/43136925/create-a-config-file-to-hold-values-like-username-password-url-in-python-behave/43137301
HOST = "host"
PORT = "5439"
SCHEMA = "public"      #default is "public"

########## connection and session creation ##########
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)

--create table example
query2 = '''\ 
create table public.test (
id integer encode lzo,
user_id integer encode lzo,
created_at timestamp encode delta32k,
updated_at timestamp encode delta32k
)
distkey(id)
sortkey(id)
'''

r2 = s.execute(query2)

--select example
query4 = '''\ 
select * from public.test
'''

r4 = s.execute(query4)

########## create DataFrame from SQL query output ##########
df = pd.read_sql_query(query4, connection_string)

print(df.head(50))

########## close session in the end ##########
s.close()

If I run the same directly in Redshift, it works just fine..

--Edit--

Some of the things tried:

  • Removing "\" from query string

  • adding ";" at the end of query string

  • changing "public.test" to "test"

  • removing SetPath = "SET search_path TO %s" % SCHEMA and s.execute(SetPath)

  • breaking the create statement- generates expected error

  • adding copy from S3 command after create- runs without error, but again no table created

  • adding a column to create statement that doesnt exist in the file that is generated from the copy command- generates expected error

  • adding r4 = s.execute(query4)- runs without error, but again created table not in Redshift

user8834780
  • 1,620
  • 3
  • 21
  • 48
  • Remove the "\" character from your query string. Multi-line strings start and end with triple quotes or triple single quotes. I think the backslash character is breaking your query. However, you should be getting an error if this is true. When you look at Redshift, do you see any new tables? – John Hanley Jan 12 '18 at 19:38
  • @JohnHanley The "\" character allows me to unload/create/select so that shouldn't be it, but after removing it still same issue. No the table is not being created from the script above. However running the create statement directly in redshift does create the table – user8834780 Jan 12 '18 at 19:54
  • Try changing "public.test" to "test" OR comment out the two lines that set the schema: SetPath = "SET search_path TO %s" % SCHEMA s.execute(SetPath) – John Hanley Jan 12 '18 at 19:59
  • @JohnHanley tried both- no go. Does it work if you run the same on your end? – user8834780 Jan 12 '18 at 20:03
  • Deliberately break the CREATE Table sql statement and verify that you are getting an error message. Example change "CREATE" to XXCREATE". – John Hanley Jan 12 '18 at 20:05
  • @JohnHanley as expected- I get syntax error at or near "xcreate" – user8834780 Jan 12 '18 at 20:06
  • @JohnHanley If I actually add a copy command after the create table to populate from S3 with the data- it also runs without issue, yet again going to Redshift- the table is not found there. Running create -> copy -> select gives same error as before (table doesnt exist) – user8834780 Jan 12 '18 at 20:14
  • I am trying to research this. The only thing that I can find on "relation does not exist" is that the table existed and was dropped. This does not make sense in the scope of your example. – John Hanley Jan 12 '18 at 20:19
  • Did you change both locations "public.test" to "test". Also all of my code has semicolons ";" at the end of the SQL statements. I have to leave for a conference call. I will check back in later this afternoon. I am very interested in the solution for this issue. – John Hanley Jan 12 '18 at 20:22
  • @Michael - sqlbot - Do you have any ideas for this issue? – John Hanley Jan 12 '18 at 20:24
  • @JohnHanley changed both, added ";" at end. Same situation. If I add r4 = s.execute(query4) it actually runs without errors, but again table is not actually found in Redshift and the create DataFrame from query fails too with same error. If I add a column to create statement that doesnt exist in the file that is generated from the "copy" command- I get an error as expected. Perhaps the table is being created elsewhere outside of Redshift?! – user8834780 Jan 12 '18 at 20:31
  • @JohnHanley I have the solution.. – user8834780 Jan 12 '18 at 21:24

1 Answers1

3

Apparently need to add s.commit() in order to create the table.. If you are populating it via copy command or insert into: then add it after the copy command (after the create table is optional). Basically, it does not auto commit for create/alter commands!

http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#session-faq-whentocreate http://docs.sqlalchemy.org/en/latest/core/connections.html#understanding-autocommit

user8834780
  • 1,620
  • 3
  • 21
  • 48
  • Thank you for posting the solution. I cannot believe that I overlooked the commit part. My code always includes the commit, I just did not notice it missing in the code. – John Hanley Jan 12 '18 at 23:20