0

I have an ndarray of np.float32 that is saved in a Postgres database in the bytea format:

import pandas as pd
import numpy as np
import sqlite3

myndarray=np.array([-3.55219245e-02, 1.33227497e-01, -4.96977456e-02, 2.16857344e-01], dtype=np.float32)
myarray=[myndarray.tobytes()]
mydataframe=pd.DataFrame(myarray, columns=['Column1'])
mydataframe.to_sql('mytable', sqlite3.connect("/tmp/floats.sqlite"))

In SQLITE3, this will produce:

CREATE TABLE IF NOT EXISTS "mytable" ("index" INTEGER, "Column1" TEXT);
INSERT INTO mytable VALUES(0,X'707f11bdca6c083edd8f4bbdda0f5e3e');

In Postgresql, this will produce:

mydatabase=# select * from mytable;
 index |              Column1
-------+------------------------------------
     0 | \x707f11bdca6c083edd8f4bbdda0f5e3e

Which format is bytea. How to convert that \x707f... back to myndarray? No expert here, I've found a lot of obscure documentation about frombuffer(), python2 buffer(), memoryview() but I am far from a proper result.

My best so far is:

np.frombuffer(bytearray('707f11bdca6c083edd8f4bbdda0f5e3e', 'utf-8'), dtype=np.float32)

which is completely wrong (myndarray has 4 values):

[2.1627062e+23 1.6690035e+22 3.3643249e+21 5.2896255e+22 2.1769183e+23
 1.6704162e+22 2.0823326e+23 5.2948159e+22]
RodolfoAP
  • 743
  • 1
  • 7
  • 18
  • 2
    it isn't clear to me, how are you retrieving the information from the database? If you query it through a postgres driver in python (psycopg2 etc), it should give you a bytes object I believe that you can just pass to `frombuffer`. Although. You shouldn't be using `tobytes` you should be using `numpy.save` as a serialization format – juanpa.arrivillaga Feb 16 '23 at 21:06
  • Does this answer your question? [How to read and insert bytea columns using psycopg2?](https://stackoverflow.com/questions/40049046/how-to-read-and-insert-bytea-columns-using-psycopg2) Once you do this, you have a `bytes` object, which _is_ a python datatype, and `np.frombuffer` will parse it as expected: https://stackoverflow.com/q/47637758/843953 – Pranav Hosangadi Feb 16 '23 at 21:12
  • @juanpa.arrivillaga "It should give you a bytes object": No. `to_sql` creates a TEXT column, but I found how to get the actual data. Will provide an example on my answer. – RodolfoAP Feb 17 '23 at 08:26
  • @PranavHosangadi I cannot. Data has been saved like this. But I've solved the problem, thanks. Please see my answer. – RodolfoAP Feb 17 '23 at 08:27

1 Answers1

0

After a lot of trial and error (repeat, I don't know python), I've found a solution.

ndarray=np.frombuffer(bytes.fromhex("707f11bdca6c083edd8f4bbdda0f5e3e"), np.float32)

print(ndarray)
# [-0.03552192  0.1332275  -0.04969775  0.21685734]

print(type(ndarray))
# <class 'numpy.ndarray'>

print(type(ndarray[0]))
# <class 'numpy.float32'>

Now, the full example:

ngine=create_engine('postgresql://postgres:mypassword@localhost/mydatabase')
myndarray=np.array([-3.55219245e-02, 1.33227497e-01, -4.96977456e-02, 2.16857344e-01], dtype=np.float32)

print(myndarray)
# [-0.03552192  0.1332275  -0.04969775  0.21685734]

myarray=[myndarray.tobytes()]
mydataframe=pd.DataFrame(myarray, columns=['Column1'])
mydataframe.to_sql('mytable', ngine)
# SELECT * FROM mytable
#
#index  Column1
#    0  \x707f11bdca6c083edd8f4bbdda0f5e3e

bytea=pd.read_sql(sql=select(mytable), con=ngine).iloc[0]['Column1']
print(type(bytea))
# <class 'str'>

print(bytea)
# \x707f11bdca6c083edd8f4bbdda0f5e3e

print(bytea[2:])
# 707f11bdca6c083edd8f4bbdda0f5e3e
# Surprise! The \x is not interpreted! 

ndarray=np.frombuffer(bytes.fromhex(bytea[2:]), np.float32)
print(ndarray)
# [-0.03552192  0.1332275  -0.04969775  0.21685734]

Thanks, @hpaulj, @PranavHosangadi @juanpa.arrivillaga, @ACarter.

RodolfoAP
  • 743
  • 1
  • 7
  • 18