1

I am trying to use sqlite3 to compute the average of a numpy.array and I would like to take advantage of the sum function.

So far I have taken advantage of this post : stackoverflow numpy.array which help me to store and retreive easily the arrays I need.

import sqlite3
import numpy
import io


def adapt_array(arr):
    out = io.BytesIO()
    numpy.save(out, arr)
    out.seek(0)
    a = out.read()
    return buffer(a)

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return numpy.load(out)


sqlite3.register_adapter(numpy.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

x1 = numpy.arange(12)
x2 = numpy.arange(12, 24)

con = sqlite3.connect(":memory:", detect_types = sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test (idx int, arr array)")

cur.execute("insert into test (idx, arr) values (?, ?)", (1, x1))
cur.execute("insert into test (idx, arr) values (?, ?)", (2, x2))
cur.execute("select idx, sum(arr) from test")
data = cur.fetchall()
print data

but unfortunately the request output does not give me the sum of the arrays.

[2, (0.0))

I would like to go one step further and get directly the result I want from an sql request. Thanks.

Edit : after reading stackoverflow : manipulation of nyumpy.array witl sqlite3 I am more sceptical about the feasibility of this. Any way to get a result close to what I want would be appreciated.

Edit2 : in other words what I am trying to do is to redefine the sum function to the particular kind of data I am using. IS it doable ? That's what was done to compress / uncompress the numpy.array.

Community
  • 1
  • 1
Dvx
  • 289
  • 2
  • 10
  • As usual when something is not easily doable, it is because you are not using the right tools. Simply use pandas and the magic groupby function of DataFrame is the right way to do what I was trying to do at that time. – Dvx Oct 04 '15 at 20:23
  • i dont mean disrespect but you should run a memory profiler on a pandas dataframe to see how horribly bloated it is, took 7 mb of memory to have 100 rows and 8 columns with the data types 'U16,i8,f8,f8,f8,f8,f8,f8', a simple numpy array without the entire data frame in it takes 0.2 mb or so, it is almost 35x more performant and most operations that pandas provides such as rolling mean can be easily vectorized with strides – PirateApp May 15 '18 at 05:29

0 Answers0