40

I am trying to understand how Python can pull data from an FTP server into pandas and then move it into SQL Server. My code here is very rudimentary, to say the least, and I am looking for any advice or help at all. I have tried to load the data from the FTP server first, which works fine... If I then remove this code and change it to a select from MS SQL Server, it is fine, so the connection string works, but the insertion into SQL Server seems to be causing problems.

import pyodbc
import pandas
from ftplib import FTP
from StringIO import StringIO
import csv

ftp = FTP('ftp.xyz.com', 'user', 'pass')
ftp.set_pasv(True)
r = StringIO()
ftp.retrbinary('RETR filename.csv', r.write)  # retrbinary needs the full "RETR <file>" command

r.seek(0)  # rewind the buffer before handing it to pandas
df = pandas.read_table(r, delimiter=',')


connStr = 'DRIVER={SQL Server Native Client 10.0};SERVER=localhost;DATABASE=TESTFEED;UID=sa;PWD=pass'
conn = pyodbc.connect(connStr)

cursor = conn.cursor()
# placeholder column names and values from my real table
cursor.execute("INSERT INTO dbo.tblImport (Startdt, Enddt, x, y, z) "
               "VALUES (x, x, x, x, x)")
cursor.close()
conn.commit()
conn.close()
print "Script has successfully run!"

When I remove the FTP code, this runs perfectly, but I do not understand how to make the next jump to get the data into Microsoft SQL Server, or even whether it is possible without saving to a file first.

andy redmayne
  • Do you know how to pass parameters to the `execute` function? If so, all you need to do is iterate over the rows of the `DataFrame` and, for each one, call `execute` and pass the row as the values for the SQL parameters. Or, if PyODBC supports `executemany`, that's even easier; just pass any iterable of rows, which you already have (a sketch follows after these comments). – abarnert Sep 04 '14 at 09:28
  • That being said, is there a reason you're using Pandas instead of `csv` here? You're not actually doing anything Pandas-y with the data, just iterating over it. In a case like that, `csv` is a lot simpler, and doesn't require reading the whole thing into memory all at once. – abarnert Sep 04 '14 at 09:28
  • Thanks for the reply. I'm not really using pandas for any reason other than that I read about it and it seemed logical to dump into a DataFrame. My basic aim is to get the FTP data into SQL; would this only be possible via a CSV file after the event? Ideally I'd like to pull and push into SQL in one go. – andy redmayne Sep 04 '14 at 09:39
  • In terms of the data, the files are quite small (around 10 KB), so reading into memory isn't an issue. – andy redmayne Sep 04 '14 at 09:40
  • You don't need to create an actual CSV file; the `csv` module works just fine with a `StringIO`, as you're already doing, and I'm pretty sure Pandas does also. Either way, the goal is just to get some iterable of rows and insert each one by looping over the rows and calling `execute`. – abarnert Sep 04 '14 at 09:44
  • OK, thanks. Given my limited knowledge of this, what is the way to do it in code? Also, how does this work with pyodbc to insert the values? – andy redmayne Sep 04 '14 at 09:47
  • Did you read my first comment? You haven't responded to it, and it has the key to your question. – abarnert Sep 04 '14 at 10:24
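
A minimal sketch of what abarnert's first comment describes, reusing `r` and `conn` from the question (the column names are the question's own placeholders):

import csv

r.seek(0)                    # rewind the buffer filled by retrbinary
rows = list(csv.reader(r))   # executemany accepts any sequence of row tuples

cursor = conn.cursor()
cursor.executemany(
    "INSERT INTO dbo.tblImport (Startdt, Enddt, x, y, z) VALUES (?, ?, ?, ?, ?)",
    rows,
)
conn.commit()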

9 Answers

70

For the 'write to SQL Server' part, you can use the convenient to_sql method of pandas (so there is no need to iterate over the rows and do the inserts manually). See the docs on interacting with SQL databases with pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#io-sql

You will need at least pandas 0.14 for this to work, and you also need sqlalchemy installed. An example, assuming df is the DataFrame you got from read_table:

import sqlalchemy
import pyodbc
engine = sqlalchemy.create_engine("mssql+pyodbc://<username>:<password>@<dsnname>")

# write the DataFrame to a table in the sql database
df.to_sql("table_name", engine)

See also the documentation page of to_sql.
More info on how to create the connection engine with sqlalchemy for SQL Server with pyodbc can be found here: http://docs.sqlalchemy.org/en/rel_1_1/dialects/mssql.html#dialect-mssql-pyodbc-connect
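
For example, a hostname-style connection (the driver name below is an assumption; substitute whichever ODBC driver is installed on your machine):

engine = sqlalchemy.create_engine(
    "mssql+pyodbc://username:password@myhost:1433/TESTFEED"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)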


But if your goal is just to get the CSV data into the SQL database, you could also consider doing this directly from SQL. See e.g. Import CSV file into SQL Server.
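
For illustration, a minimal sketch of that route driven from Python (the file path and table name are placeholders; note that BULK INSERT reads the file on the server, not on the client):

import pyodbc

conn = pyodbc.connect(connStr)  # connection string as in the question
cursor = conn.cursor()
cursor.execute("""
    BULK INSERT dbo.tblImport
    FROM 'C:\\data\\filename.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2)
""")
conn.commit()
conn.close()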

joris
  • Thanks for the info; apologies for the late reply. I will work with this and I am sure it will be OK. – andy redmayne Sep 10 '14 at 15:17
  • I had to pass in engine.raw_connection(), as explained here: https://stackoverflow.com/questions/20401392/read-frame-with-sqlalchemy-mysql-and-pandas – Richard Blackman Dec 29 '14 at 05:04
  • @RichardBlackman That is not correct for pandas versions 0.14 and above. In that case, you should just pass the engine itself (but the answer you linked to is relevant for pandas <= 0.13). – joris Dec 29 '14 at 10:55
  • @joris, please update the answer: the doc page linked suggests that starting with 1.0.0 you need to explicitly specify a driver, `create_engine("mssql+pyodbc://scott:tiger@myhost:port/databasename?driver=SQL+Server+Native+Client+10.0")`. I was trying to connect with the engine without specifying one, and my to_sql kept complaining that I didn't choose a driver. – Some Guy Mar 16 '17 at 21:19
  • @SomeGuy Thanks for noting, I updated the link to the sqlalchemy docs to link to a newer version. – joris Mar 17 '17 at 14:02
  • It may be obvious from previous answers, but in case of SSPI, engine = sqlalchemy.create_engine('mssql+pyodbc://{svr}/{db}?DRIVER=SQL+Server+Native+Client+11.0&trusted_connection=yes') – benik9 Mar 01 '23 at 17:51
30

Python 3 version using a LocalDB SQL instance:

from sqlalchemy import create_engine
import urllib
import pyodbc
import pandas as pd

df = pd.read_csv("./data.csv")

quoted = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};SERVER=(localDb)\ProjectsV14;DATABASE=database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))

df.to_sql('TargetTable', schema='dbo', con = engine)

result = engine.execute('SELECT COUNT(*) FROM [dbo].[TargetTable]')
result.fetchall()
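
Note that engine.execute was removed in SQLAlchemy 2.0; there, an equivalent of the count check above would be something like:

from sqlalchemy import text

with engine.connect() as connection:
    result = connection.execute(text('SELECT COUNT(*) FROM [dbo].[TargetTable]'))
    print(result.fetchall())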
Random
10

Yes, the bcp utility seems to be the best solution for most cases.

If you want to stay within Python, the following code should work.

from sqlalchemy import create_engine
import urllib
import pyodbc

quoted = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=YOUR\ServerName;DATABASE=YOur_Database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))

df.to_sql('Table_Name', schema='dbo', con = engine, chunksize=200, method='multi', index=False, if_exists='replace')

Don't skip method='multi'; it significantly reduces the task's execution time.

Sometimes you may encounter the following error.

ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request. (8003) (SQLExecDirectW)')

In such a case, determine the number of columns in your DataFrame with df.shape[1]. Divide the maximum supported number of parameters by this value and use the floor of the result as your chunk size.
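
For example, a sketch of that calculation against SQL Server's 2100-parameter limit:

# with method='multi', each chunk sends chunksize * n_columns parameters
chunksize = 2100 // df.shape[1]

df.to_sql('Table_Name', schema='dbo', con=engine,
          chunksize=chunksize, method='multi', index=False, if_exists='replace')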

3

I found that the bcp utility (https://learn.microsoft.com/en-us/sql/tools/bcp-utility) works best when you have a large dataset. I have 2.7 million rows that insert at 80K rows/sec. You can store your DataFrame as a CSV file (use tabs as the separator if your data doesn't contain tabs, and UTF-8 encoding). With bcp, I've used format "-c" and it has worked without issues so far.
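
For illustration, a hedged sketch of that workflow (the table, server, and file names are placeholders; -T assumes Windows authentication, so use -U/-P for SQL logins):

import subprocess
import pandas as pd

# dump the DataFrame tab-separated, UTF-8, without header or index
df.to_csv("export.tsv", sep="\t", index=False, header=False, encoding="utf-8")

# load it with bcp in character mode (-c, which defaults to tab field terminators);
# -r "\n" matches to_csv's bare-newline row endings
subprocess.run(
    ["bcp", "dbo.TargetTable", "in", "export.tsv",
     "-c", "-r", "\\n",
     "-S", "myserver", "-d", "mydb", "-T"],
    check=True,
)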

  • bcp tends to throw errors with types and formatting in the csv file. Using a separator like "|" (remove it from data) works better than ",". It is better to have the target table as all varchar columns so that bcp doesn't throw errors. You can then run an "insert into" to your target table with the correct types by casting it. – Babu Arunachalam Oct 17 '18 at 17:20
3

This worked for me on Python 3.5.2:

import sqlalchemy as sa
import urllib
import pyodbc

# server, database, username, and password are assumed to be defined beforehand
conn = urllib.parse.quote_plus('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(conn))

frame.to_sql("myTable", engine, schema='dbo', if_exists='append', index=False, index_label='myField')  # note: index_label has no effect when index=False
FedericoSala
1

"As the Connection represents an open resource against the database, we want to always limit the scope of our use of this object to a specific context, and the best way to do that is by using Python context manager form, also known as the with statement." https://docs.sqlalchemy.org/en/14/tutorial/dbapi_transactions.html

The example would then be

from sqlalchemy import create_engine
import urllib
import pyodbc
connection_string = (
    "Driver={SQL Server Native Client 11.0};"
    "Server=myserver;"
    "UID=myuser;"
    "PWD=mypwd;"
    "Database=mydb;"
)
quoted = urllib.parse.quote_plus(connection_string)
engine = create_engine(f'mssql+pyodbc:///?odbc_connect={quoted}')
with engine.connect() as cnn:
    df.to_sql('mytable',con=cnn, if_exists='replace', index=False)
ekraus
  • ERROR: function schema_name() does not exist;\nError while executing the query (1) (SQLExecDirectW)') [SQL: SELECT schema_name()] – thistleknot Sep 10 '22 at 15:39
0

Following is what worked for me using sqlalchemy. Pay attention to the last part, ?driver=SQL+Server.

import sqlalchemy
import pyodbc
engine = sqlalchemy.create_engine('mssql+pyodbc://MyUser:MyPWD@dataserver.sandbox.myserver/MY_DB?driver=SQL+Server')
dt.to_sql("PatientResultTest", engine, if_exists='append')  # dt is the DataFrame to write

The SQL table needs an index column at the beginning to store the DataFrame's index values (alternatively, pass index=False to to_sql to skip writing the index).

LCJ
0
# using a class
import pandas as pd
import pyodbc
import sqlalchemy
import urllib

class data_frame_to_sql():
    def __init__(self, dataFrame, sql_table_name):
        self.dataFrame = dataFrame
        self.sql_table_name = sql_table_name

    def conversion(self):
        params = urllib.parse.quote_plus("DRIVER={SQL Server};"
                                         "SERVER=######;"
                                         "DATABASE=####;"
                                         "UID=#####;"
                                         "PWD=###;")
        try:
            engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
            return f"Table '{self.sql_table_name}' added successfully to the database", self.dataFrame.to_sql(self.sql_table_name, engine)
        except Exception as e:
            e = str(e).replace(".", "")
            print(f"{e} in database.")

data = {"BusinessEntityID": ["1", "2", "3"],
        "FirstName": ["raj", "abhi", "amir"],
        "LastName": ["kapoor", "bachn", "khhan"]}
df = pd.DataFrame(data, columns=['BusinessEntityID', 'FirstName', 'LastName'])
ab = data_frame_to_sql(df, "ab").conversion()
print(ab)
Flair
-2

It's not necessary to use sqlalchemy; you can create a connection with pyodbc directly and use it with pandas, as below. Note that this example reads from SQL Server with read_sql; for writing with to_sql, pandas expects an SQLAlchemy connectable (see the link in the comment below).

import pandas as pd
import pyodbc

# server, database, username, and password are assumed to be defined beforehand
with pyodbc.connect('DRIVER={ODBC Driver 18 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password) as newconn:
    df = pd.read_sql(<your sql goes here>, newconn)
Standin.Wolf
  • https://stackoverflow.com/q/71082494/2144390 – Gord Thompson May 17 '22 at 14:42