6

I'm trying to import 200 SAS XPT files to my PostgreSQL database:

engine = create_engine('postgresql://user:pwd@server:5432/dbName')
for file in listdir(dataPath):
    name, ext = file.split('.', 1)
    with open(join(dataPath, file), 'rb') as f:
        xport.to_dataframe(f).to_sql(name, engine, schema='schemaName', if_exists='replace', index=False)
    print("Successfully wrote ", file, " to database.")

However, the SQL generated has double quotation marks for all identifiers, for example: CREATE TABLE "Y2009"."ACQ_F" ("SEQN" FLOAT(53), "ACD010A" FLOAT(53));. The problem is, if the column / table / schema is created with quotation marks, every time I need to query them, I must include the quotation marks as well, at the same time use the exact capitalization.

I want to get rid of quotation marks, while I cannot write custom SQLs myself, because these files each has very different structure.

Ryan
  • 1,040
  • 3
  • 16
  • 25
  • 1
    If you ensure that the schema, table, and column names are all lower-case, then you can write queries without double-quotes, regardless of how SQLAlchemy creates the table. – cco Jan 03 '18 at 03:18
  • 2
    If you are reading SAS version 5 transport files then all of the variable (column) names will be in uppercase. You should add a step in your process to change the names to lowercase before pushing into PostgreSQL Then you don't need to worry that quoted identifiers will be used to create them. Note you will still need to use quoted identifiers to refer to columns with names like "from", "select" or other PostgreSQL keywords. – Tom Jan 03 '18 at 03:28
  • @Tom thanks that solved my problem. To help others better understand this: Postgres requires uppercase table / column names to be quoted ([reference](https://stackoverflow.com/a/6331643/6305831)). – Ryan Jan 03 '18 at 04:43
  • @Tom could you please add the answer below so I can mark this question as answered? Thanks! – Ryan Jan 03 '18 at 04:46
  • @Ryan It would be better if someone that actually knows HOW to change the case of the variable names in Python could post a working answer. – Tom Jan 03 '18 at 14:24

1 Answers1

5

PostgreSQL requires uppercase table / column names to be quoted (reference). That is why identifiers in the SQL constructed by SQLalchemy are quoted. To avoid this, convert the column names of the dataframe to all lowercase:

with open(join(dataPath, file), 'rb') as f:
     data = xport.to_dataframe(f)
     data.columns = map(str.lower, data.columns)
     data.to_sql(name.lower(), engine, schema='y2007')
Ryan
  • 1,040
  • 3
  • 16
  • 25