5

In a previous programme I was reading data from a csv file like this:

AllData = np.genfromtxt(open("PSECSkew.csv", "rb"),
                        delimiter=',',
                        dtype=[('CalibrationDate', datetime),('Expiry', datetime), ('B0', float), ('B1', float), ('B2', float), ('ATMAdjustment', float)],
                        converters={0: ConvertToDate, 1: ConvertToDate})

I'm now writing an incredibly similar programme but this time I want to get a really similar data structure to AllData (except the floats will all be in a csv string this time) but from SQL Server instead of a csv file. What's the best approach?

pyodbc looks like it involves using cursors a lot which I'm not familiar with and would like to avoid. I just want to run the query and get the data in a structure like above (or like a DataTable in C#).

Dan
  • 45,079
  • 17
  • 88
  • 157
  • Can you be more specific about what you need? You have two main options: query the database with pyodbc or some other library, get a result set and convert it into a Python data structure; or use database-specific tools (e.g. bcp.exe) to export the data to a flat file then read the file in Python. It isn't really clear why pyodbc isn't suitable for you, it's easy enough to use and you could use it with sqlalchemy if you prefer an ORM. – Pondlife May 07 '13 at 16:02
  • @Pondlife pyodbc may very well be suitable, could you point me to an example of how to use it to get the data from the query into a data structure similar to what genfromtext creates all in one go rather than having a cursor and having to navigate it record by record? I don't want to take the flat file approach. – Dan May 07 '13 at 16:56
  • I know nothing about numpy, although I've used pyodbc and sqlalchemy successfully with SQL Server. But queries always return result sets (a cursor in pyodbc), so if you want to have that result set in another data structure then you'll have to transform it somehow. You might want to ask a question about how to implement the equivalent of `numpy.genfromcursor()` and perhaps someone with more knowledge will be able to help. – Pondlife May 07 '13 at 18:42
  • @Pondlife thanks, could you maybe post a small example transforming a general result set from pyodbc to anything, like a tuple array or something maybe? I'm sure I'll be able to adapt that to a numpy structure. Thanks – Dan May 08 '13 at 06:25
  • @Pondlife actually I think this may be what I'm looking for: http://stackoverflow.com/questions/7061824/whats-the-most-efficient-way-to-covert-mysql-output-into-a-numpy-array-in-pytho going to give it a try – Dan May 08 '13 at 07:09
  • No luck :( couldn't get it to work – Dan May 08 '13 at 12:30

4 Answers4

4

Here's a minimal example, based on the other question that you linked to:

import pyodbc
import numpy

conn = pyodbc.connect('DRIVER={SQL Server};SERVER=MyServer;Trusted_Connection=yes;')
cur = conn.cursor()
cur.execute('select object_id from sys.objects')
results = cur.fetchall()
results_as_list = [i[0] for i in results]
array = numpy.fromiter(results_as_list, dtype=numpy.int32)
print array
Community
  • 1
  • 1
Pondlife
  • 15,992
  • 6
  • 37
  • 51
  • if I define `results_as_list` like this `[item for sublist in results for item in sublist]` (form here: http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python) then I can get this to work for the float columns after calling `np.reshape(array, [-1, 4])`, thanks! But is there a way to handle the date columns as well? I tried `dtype=[date, date, float, float, float, float]` but no luck :( – Dan May 08 '13 at 13:55
  • @Dan I don't know enough about Numpy to answer that directly, but the [documentation](http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html) says that you can initialize a Numpy `datetime` type from from a date in `YYYY-MM-DD` format, so perhaps you could try [`CONVERT`ing](http://msdn.microsoft.com/en-us/library/ms187928.aspx) the SQL Server `datetime` values to that format in your query: `select convert(char(10), DateTimeColumn, 21)`? – Pondlife May 08 '13 at 14:29
  • Thanks for the help, I couldn't use this method because I can't work out how to get it to accept multiple data types, but I took your advice about just iterating with the cursor in order to 'transform' it to my desired data structure and finally got there. Solution provide. Thanks again! – Dan May 08 '13 at 15:41
3

In the mean time, there is a better way. Check out the turbodbc package. To transform your result set into an OrderedDict of NumPy arrays, just do this:

import turbodbc
connection = turbodbc.connect(dsn="My data source name")
cursor = connection.cursor()
cursor.execute("SELECT 42")
results = cursor.fetchallnumpy()

It should also be much faster than pyodbc (depending on your database, factor 10 is absolutely possible).

Michael Koenig
  • 364
  • 2
  • 9
2

How about using pandas? For example:

import psycopg2
import pandas

try :
    con = psycopg2.connect(
    host = "host",
    database = "innovate",
    user = "username",
    password = "password")
except:
    print "Could not connect to database."

data = pandas.read_sql_query("SELECT * FROM table", con)
ste_j
  • 115
  • 1
  • 8
0

In the end I just used pyodbc and iterated through the cursor / result set put each result in a manually constructed structured array through a lot of trial and error. If there is a more direct way, I'm all ears!

import numpy as np
import pyodbc as SQL
from datetime import datetime


cxn = SQL.connect('Driver={SQL Server};Server=myServer; Database=myDB; UID=myUserName; PWD=myPassword')
c = cxn.cursor()

#Work out how many rows the query returns in order to initialise the structured array with the correct number of rows
num_rows = c.execute('SELECT count(*) FROM PSECSkew').fetchone()[0]

#Create the structured array
AllData = np.zeros(num_rows, dtype=[('CalibrationDate', datetime),('Expiry', datetime), ('B0', float), ('B1', float), ('B2', float), ('ATMAdjustment', float)])

ConvertToDate = lambda s:datetime.strptime(s,"%Y-%m-%d")

#iterate using the cursor and fill the structred array.
r = 0
for row in c.execute('SELECT * FROM PSECSkew ORDER BY CalibrationDate, Expiry'):
    AllData[r] = (ConvertToDate(row[0]), ConvertToDate(row[1])) + row[2:] #Note if you don't need manipulate the data (i.e. to convert the dates in my case) then just tuple(row) would have sufficed
    r = r + 1
Dan
  • 45,079
  • 17
  • 88
  • 157