Pandas dataframe for loading into SQL server studio through Import wizard

Question

I am new to Python as well as SQL server studio. I have a scrapping code in python which collects data off the internet, saves it into pandas data frame, which eventually writes the data into csv. It scraped 100 rows at a time and saves it into csv. I get 100 such csv's which I append together, and convert into one big file. I do need the 100 separate csv.

Now, when I am trying to load this merged csv into SQL server management studio, through the import wizard, I get the truncation error. I checked and found a few rows having a column with string length more than 256 characters.

Is there a work around through this. An alternative I though was to limit the character length to 256 characters while exporting it to csv(which I know is not an optimal solution but it will work for me), how can I achieve that while writing the pandas data frame to csv. Is there any other format I can save it in and then load it through the windows import wizard.

Any help would be appreciated as I am really new to databases and python. Thank you in advance.

score 0 · Answer 1 · answered Nov 06 '17 at 17:27

0

Consider pandas' to_sql method layered on the SQLAlchemy API where you can append or replace data in database table. SQLAlchemy interfaces with a standard Python-SQL Sever DB-API such as pyodbc or pymssql. So install either before use.

import pyodbc
import sqlalchemy

engine = sqlalchemy.create_engine("mssql+pyodbc://user:pwd@server/database", echo=False)

mydataframe.to_sql('mytable', con=engine, if_exists='replace')

engine.dispose()

Unlike various import wizards, this method does a straight SQL connection between pandas dataframe and your SQL Server database table. With replace, the method actually runs DROP TABLE ..., then CREATE TABLE ..., finally INSERT INTO ... using data types of pandas dataframe or as specified by user in dtype argument of to_sql.

answered Nov 06 '17 at 17:27

Parfait

104,375
17
94
125

The only caveat, from my experience, is that the direct to_sql method is fine only for small data sets. When it becomes a larger set of items (>10000 rows) the to_sql takes a LONG time. I have preferred to dump to CSV files using the same approach, but with the to_csv method, and then using the database importer to import the CSV files from a location on the same network as the database. – dg3781 Nov 06 '17 at 17:31
@Parfait I will try that, can you confirm the arguments I need to pass with example, like where do I pass my username, server etc. is this right ("mssql+pyodbc://my_user_name:my_password@my_server/my_database", echo=False) and then mydataframe.to_sql('my_database', con=engine, if_exists='replace') I want my database and table name to be same – Neil S Nov 06 '17 at 17:47
That looks to be correct. Give it a try and report back. However, you are not dumping into a database but a database table. So, *my_database* should be *my_table*. – Parfait Nov 06 '17 at 18:03
And @dg3781 ... Interesting point. Other factors can play into it including local/remote server connections, number of columns and data types, etc. Of course too *LONG* is relative. Some can wait 5 mins. for bulk process! – Parfait Nov 06 '17 at 18:05
@Parfait Initially it gave me this error `C:\ProgramData\Anaconda2\lib\site-packages\sqlalchemy\connectors\pyodbc.py:92: SAWarning: No driver name specified; this is expected by PyODBC when using DSN-less connections "No driver name specified; "` for which i added `?driver=SQL+Server+Native+Client+11.0", echo=False)` like this `engine = sqlalchemy.create_engine("mssql+pyodbc://user:pwd@server/database?driver=SQL+Server+Native+Client+11.0", echo=False)` but now, I have this error – Neil S Nov 06 '17 at 18:49
`sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]Named Pipes Provider: Could not open a connection to SQL Server [53]. (53) (SQLDriverConnect); [HYT00] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see ` – Neil S Nov 06 '17 at 18:52
SQL Server Books Online. (53)') – Neil S Nov 06 '17 at 18:52
This is not a python error but database connection issue. Make sure MSSQLSERVER service is running. Check this [answer](https://stackoverflow.com/a/34573742/1422451). – Parfait Nov 21 '17 at 19:10
From your recent [question](https://stackoverflow.com/q/47416699/1422451), try this connection: `engine = sqlalchemy.create_engine("mssql+pyodbc://neils:neil!pass@bidept/BIDB", echo=False)` – Parfait Nov 21 '17 at 19:12

Pandas dataframe for loading into SQL server studio through Import wizard

1 Answers1