10

I have found some other questions that have a similar error to what I am getting, but have not been able to figure out how to resolve this based on the answers. I am trying to import an excel file into SQL Server with the help of python. This is the code I wrote:

import pandas as pd
import numpy as np
import pandas.io.sql
import pyodbc
import xlrd

server = "won't disclose private info"
db = 'private info'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + Server + ';DATABASE=' + 
db + ';Trusted_Connection=yes')

cursor = conn.cursor()
book = xlrd.open_workbook("Daily Flash.xlsx")
sheet = book.sheet_by_name("Sheet1")

query1 = """CREATE TABLE [LEAF].[MK] ([LEAF][Lease_Number] varchar(255), 
[LEAF][Start_Date] varchar(255), [LEAF][Report_Status] varchar(255), [LEAF] 
[Status_Date] varchar(255), [LEAF][Current_Status] varchar(255), [LEAF] 
[Sales_Rep] varchar(255), [LEAF][Customer_Name] varchar(255),[LEAF] 
[Total_Finance] varchar(255),
[LEAF][Rate_Class] varchar(255) ,[LEAF][Supplier_Name] varchar(255) ,[LEAF] 
[DecisionStatus] varchar(255))"""


query = """INSERT INTO [LEAF].[MK] (Lease_Number, Start_Date, Report_Status, 
Status_Date, Current_Status, Sales_Rep, Customer_Name,Total_Finance,
Rate_Class,Supplier_Name,DecisionStatus) VALUES (%s, %s, %s, %s, %s, %s, %s, 
%s, %s, %s, %s)"""

for r in range(1, sheet.nrows):
    Lease_Number  = sheet.cell(r,0).value
    Start_Date    = sheet.cell(r,1).value
    Report_Status = sheet.cell(r,2).value
    Status_Date   = sheet.cell(r,3).value
    Current_Status= sheet.cell(r,4).value
    Sales_Rep     = sheet.cell(r,5).value
    Customer_Name = sheet.cell(r,6).value
    Total_Financed= sheet.cell(r,7).value
    Rate_Class    = sheet.cell(r,8).value
    Supplier_Name = sheet.cell(r,9).value
    DecisionStatus= sheet.cell(r,10).value


    values = (Lease_Number, Start_Date, Report_Status, Status_Date, 
    Current_Status, Sales_Rep, Customer_Name, Total_Financed, Rate_Class, 
    Supplier_Name, DecisionStatus)

    cursor.execute(query1)

    cursor.execute(query, values)


database.commit()


database.close()


database.commit()

The error message I get is:

ProgrammingError                          Traceback (most recent call last)
<ipython-input-24-c525ebf0af73> in <module>()
 16 
 17     # Execute sql Query
 ---> 18     cursor.execute(query, values)
 19 
 20 # Commit the transaction

 ProgrammingError: ('The SQL contains 0 parameter markers, but 11 parameters 
 were supplied', 'HY000')

Can someone please explain the problem to me and how I can fix it? Thank you!

Update:

I have gotten that error message to go away based on the comments below. I modified my query also because the table into which I am trying to insert values into was not previously created, so I updated my code in an attempt to create it.

However, now I am getting the error message:

ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL 
Server]The specified schema name "dbo" either does not exist or you do not 
have permission to use it. (2760) (SQLExecDirectW)')

I tried changing that slightly by writing CREATE [HELLO][MK] instead of just CREATE MK but that tells me that MK is already in the database... What steps should I take next?

bernando_vialli
  • 947
  • 4
  • 12
  • 27
  • 5
    Instead of using `%s` as the placeholders, use `?` instead. – Scratch'N'Purr Jul 10 '18 at 15:29
  • Thank you that got rid of that error message and created a new one: ProgrammingError: ('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'orders'. (208) (SQLExecDirectW)"). What can I do about that? – bernando_vialli Jul 10 '18 at 15:31
  • I just realized 'Orders' that is listed as invalid object is the name I created for the table I want to create in SQL. Why is that invalid, what should it be renamed to? – bernando_vialli Jul 10 '18 at 15:36
  • `ORDER` is an SQL keyword (as in "ORDER BY"), but I wouldn't normally expect `orders` to be a problem. In any case, try using `[orders]` in your SQL command text. – Gord Thompson Jul 10 '18 at 15:38
  • just gave that a shot, that didn't change anything, thank you regardless. Any other ideas? – bernando_vialli Jul 10 '18 at 15:40
  • question, does the code I have create a table as well or does it only insert in a previously existing table? – bernando_vialli Jul 10 '18 at 15:48
  • That form of INSERT statement will only insert into an existing table; it won't automatically create a new table. – Gord Thompson Jul 10 '18 at 15:49
  • so I added to my code an extra query: query1 = """CREATE TABLE orders (Lease_Number varchar(255), Start_Date varchar(255), Report_Status varchar(255), Status_Date varchar(255), Current_Status varchar(255), Sales_Rep varchar(255), Customer_Name varchar(255),Total_Finance varchar(255), Rate_Class varchar(255) ,Supplier_Name varchar(255) ,DecisionStatus varchar(255))""" and also added cursor.execute(query1) should that resolve my problem, I am getting a new error now: – bernando_vialli Jul 10 '18 at 15:58
  • do you know if this error is because I do not have administrative privilges or because I am doing something wrong? – bernando_vialli Jul 10 '18 at 16:40
  • It looks like you created a table called `MK` in schema `LEAF`, but you want to insert into table `orders` which is in schema `dbo` by default. Therefore, you are referencing two different tables. You may also have to commit the query after your `CREATE TABLE` statement or your subsequent `INSERT` queries won't find the table. – Scratch'N'Purr Jul 10 '18 at 17:32
  • @Scratch'N'Purr sorry, I forgot to update the other name, I just updated them both now. Orders was just a copy and paste from a tutorial I tried using, I actually decided to use "MK" for the table name and it is telling me that there is an object named "MK" already in the database despite my CREATE TABLE and INSERT INTO both being into [LEAF].[MK] – bernando_vialli Jul 10 '18 at 17:36
  • @Parfait, I think its relevant because that table did not previously exist in the database – bernando_vialli Jul 10 '18 at 17:39
  • @mkheifetz Ok, if your table already exists, then you don't need to create it again. Just make sure your column names match up with the column definitions when you created your table. – Scratch'N'Purr Jul 10 '18 at 17:48
  • @Scratch'N'Purr see the thing is it doesn't exist, I don't know why it is showing that it exists because it doesn't... Guess I should have specified that in my previous comment. I don't get why it is saying that it exists – bernando_vialli Jul 10 '18 at 17:49
  • it seems like regardless of what I put in there it says the same thing, whether it's 'MK" or any other letters, it gives the same error message – bernando_vialli Jul 10 '18 at 17:53
  • any thoughts on this? – bernando_vialli Jul 10 '18 at 18:29
  • You are insert into `[LEAF].[MK]` right? And you have already committed the `CREATE TABLE` transaction with `conn.commit()` right? – Scratch'N'Purr Jul 10 '18 at 18:41
  • for the later, I have database.commit() – bernando_vialli Jul 10 '18 at 18:44
  • Use `conn.commit()` since that's the connector object that you created earlier. Also, don't put `cursor.execute(query1)` in your `for` loop because that's asking Python to create the table at each row. Put it before your `for` loop and then commit the table creation with `conn.commit()`. You might have to flush your journal first since there's probably a pending transaction for creating your table. – Scratch'N'Purr Jul 10 '18 at 18:55
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/174751/discussion-between-mkheifetz-and-scratchnpurr). – bernando_vialli Jul 10 '18 at 19:07

1 Answers1

11

Based on the conversation we had in our chat, here are a few takeaways:

  1. After executing your CREATE TABLE query, make sure to commit it immediately before running any subsequent INSERT queries.
  2. Use error catching for cases when the table already exists in the database. You asked that if you wanted to import more data to the table, would the script still run. The answer is no, since Python will throw an exception at cursor.execute(query1).
  3. If you want to validate whether your insert operations were successful, you can do a simple record count check.

EDIT Yesterday, when I had @mkheifetz test my code out, he caught a minor bug where the validation check would return False, and the reason was because the database already had existing records, so when comparing against only the current data being imported, the validation would fail. Therefore, as a solution to address the bug, I have modified the code again.

Below is how I would modify your code:

import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt

import pandas.io.sql
import pyodbc

import xlrd
server = 'XXXXX'
db = 'XXXXXdb'

# create Connection and Cursor objects
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
cursor = conn.cursor()

# read data
data = pd.read_excel('Flash Daily Apps through 070918.xls')

# rename columns
data = data.rename(columns={'Lease Number': 'Lease_Number',
                            'Start Date': 'Start_Date',
                            'Report Status': 'Report_Status',
                            'Status Date': 'Status_Date',
                            'Current Status': 'Current_Status',
                            'Sales Rep': 'Sales_Rep',
                            'Customer Name': 'Customer_Name',
                            'Total Financed': 'Total_Financed',
                            'Rate Class': 'Rate_Class',
                            'Supplier Name': 'Supplier_Name'})

# export
data.to_excel('Daily Flash.xlsx', index=False)

# Open the workbook and define the worksheet
book = xlrd.open_workbook("Daily Flash.xlsx")
sheet = book.sheet_by_name("Sheet1")

query1 = """
CREATE TABLE [LEAF].[ZZZ] (
    Lease_Number varchar(255),
    Start_Date varchar(255),
    Report_Status varchar(255),
    Status_Date varchar(255),
    Current_Status varchar(255),
    Sales_Rep varchar(255),
    Customer_Name varchar(255),
    Total_Finance varchar(255),
    Rate_Class varchar(255),
    Supplier_Name varchar(255),
    DecisionStatus varchar(255)
)"""

query = """
INSERT INTO [LEAF].[ZZZ] (
    Lease_Number,
    Start_Date,
    Report_Status,
    Status_Date,
    Current_Status,
    Sales_Rep,
    Customer_Name,
    Total_Finance,
    Rate_Class,
    Supplier_Name,
    DecisionStatus
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"""

# execute create table
try:
    cursor.execute(query1)
    conn.commit()
except pyodbc.ProgrammingError:
    pass

# grab existing row count in the database for validation later
cursor.execute("SELECT count(*) FROM LEAF.ZZZ")
before_import = cursor.fetchone()

for r in range(1, sheet.nrows):
    Lease_Number = sheet.cell(r,0).value
    Start_Date = sheet.cell(r,1).value
    Report_Status = sheet.cell(r,2).value
    Status_Date = sheet.cell(r,3).value
    Current_Status= sheet.cell(r,4).value
    Sales_Rep = sheet.cell(r,5).value
    Customer_Name = sheet.cell(r,6).value
    Total_Financed= sheet.cell(r,7).value
    Rate_Class = sheet.cell(r,8).value
    Supplier_Name = sheet.cell(r,9).value
    DecisionStatus= sheet.cell(r,10).value

    # Assign values from each row
    values = (Lease_Number, Start_Date, Report_Status, Status_Date, Current_Status,
              Sales_Rep, Customer_Name, Total_Financed, Rate_Class, Supplier_Name,
              DecisionStatus)

    # Execute sql Query
    cursor.execute(query, values)

# Commit the transaction
conn.commit()

# If you want to check if all rows are imported
cursor.execute("SELECT count(*) FROM LEAF.ZZZ")
result = cursor.fetchone()

print((result[0] - before_import[0]) == len(data.index))  # should be True

# Close the database connection
conn.close()
Scratch'N'Purr
  • 9,959
  • 2
  • 35
  • 51
  • 2
    thank you very much for your help! You are an amazing asset on stackoverflow! – bernando_vialli Jul 10 '18 at 20:22
  • @mkheifetz I just made some changes to my response to address the validation bug that you caught last night! It should return `True` now. – Scratch'N'Purr Jul 11 '18 at 11:38
  • thank you! Do you by any chance know, I have been trying to fix my 2 date columns where I changed them from just saying Start_Date to showing Start_Date datetime DEFAULT(getdate()) in the insert. The bizzare thing is when I check my SQL table, there are dates but they are off for some reason, and it seems like usually or always its by exactly by 2 days. For example, in my excel file it is 2018-07-09 00:00:00 in SQL it's 07-11 instead of 07-09, very strange – bernando_vialli Jul 11 '18 at 12:35
  • @mkheifetz That's interesting. It might be a difference in how Excel serializes numbers to dates and how MSSQL serializes numbers to dates. Here's a [post](https://stackoverflow.com/questions/27065040/sql-server-and-excel-float-to-date-offset) on this. What you could try (and it will involve some major code changes) is to use SQLAlchemy and then use panda's [`to_sql`](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.to_sql.html) method to dump the data to the db. The method will do the date abstractions for you. – Scratch'N'Purr Jul 11 '18 at 12:53
  • interesting, but the to_sql method is only applicable to MySQL, not SQL Server, do I understand that correctly? Does this code from this other stack overflow question: https://stackoverflow.com/questions/48689682/exporting-pandas-dataframe-to-mysql-using-sqlalchemy – bernando_vialli Jul 11 '18 at 13:24
  • 1
    It will work with MSSQL, but you'll need the SQLAlchemy package to create your MSSQL engine. There's some info in their [docs](http://docs.sqlalchemy.org/en/latest/dialects/mssql.html#module-sqlalchemy.dialects.mssql.pyodbc) for creating one. – Scratch'N'Purr Jul 11 '18 at 13:28
  • question, is it sufficient to just do this with my dataframe and it will be in SQL: https://stackoverflow.com/questions/43453420/import-csv-to-database-using-sqlalchemy engine = create_engine('sqlite:///cdb.db') Base.metadata.create_all(engine) file_name = 'client_db.csv' df = pandas.read_csv(file_name) df.to_sql(con=engine, index_label='id', name=cdb1.__tablename__, if_exists='replace') – bernando_vialli Jul 11 '18 at 15:15
  • 1
    No, the example you provided uses a SQLite database as the backend. Since your database is MSSQL, your connection string should look like `engine = create_engine("mssql+pyodbc://username:password@myhost:port/databasename?driver=SQL+Server+Native+Client+10.0")`. Once you have that, then your export code would look like `data.to_sql(con=engine, index=False, if_exists='append')`, where `data` is the FIRST excel file that you read. You don't have to create an additional excel file. Make sure to change `SQL+Server+Native+Client+10.0` to whichever SQL Server ODBC driver you have installed. – Scratch'N'Purr Jul 11 '18 at 15:25
  • sorry for the stupid question, how do I determine what goes inside the create_engine() arguments? How is it different from what goes into pyodbc.connect()? – bernando_vialli Jul 11 '18 at 15:31
  • Well, if you look at the connection string, just replace the `username`, `password`, `myhost`, `port`, and `databasename` with your credentials, and the server info. `myhost` would be the `server` variable, and `databasename` would be your `db` variable. If the server doesn't need credentials, you can leave it out. – Scratch'N'Purr Jul 11 '18 at 16:51
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/174831/discussion-between-mkheifetz-and-scratchnpurr). – bernando_vialli Jul 11 '18 at 17:08
  • what would I put for port and is it possible not to have a password? When I login to the SQL database, there is no password – bernando_vialli Jul 11 '18 at 20:27
  • If you dont have a password, you can leave it out. You should also leave out the user, e.g. `mssql+pyodbc://@myhost:port/database?driver=SQL+Server`. With the port, your MSSQL might be running on the default port of 1433, but if it's not the case, you'd have to ask the system admin who installed MSSQL. – Scratch'N'Purr Jul 12 '18 at 06:52
  • This is the code I have in there now: engine = create_engine("mssql+pyodbc://LEAFNOW\mkheifetz:@LCCPHL-PDSQL003/LEAFCore?driver=SQL+Server+Native+Client+10.0") When I run it I get a: NameError: name 'create_engine' is not defined – bernando_vialli Jul 12 '18 at 12:31
  • Let's continue discussion in chat – Scratch'N'Purr Jul 12 '18 at 13:35