2

I have a PostgreSQL database (v 9.6) in which images are stored as bytea data. I do not know the image encoding. (I did not set up this database, and it's not clear if I can change this setup, although I would like to, as storing large images in PostgreSQL database is not (IIUC) best practice.)

I'd like to extract this data into an image, or better yet, directly into a NumPy array.

Using SQLAlchemy, I can connect and extract the data:

engine = create_engine(postgresql+psycopg2://user:password@server:port/database)
connection = engine.connect()
result = connection.execute('SELECT image FROM database.table LIMIT 1;')

The image in question is returned as a memoryview object; cast as a numpy array, it looks like this (per Cython: Convert memory view to NumPy array):

[b'\xaa' b'\x04' b'u' b'\x04' b'\x85' b'\x04' b'E' b'\x04' b'\x7f' b'\x04'
 b'\xa5' b'\x04' b'K' b'\x04' b'j' b'\x04' b'\x97' b'\x04' b';' b'\x04'
 b'w' b'\x04' b'k' b'\x04' b'E' b'\x04' b'b' b'\x04' b's' b'\x04']

I tried saving to jpg or tiff files (per Converting BLOB, stored on a database, to an image on an HTML website), but was not able to open the resulting files with an image viewer.

I also tried this (Open PIL image from byte file), but get this result:

OSError: cannot identify image file <_io.BytesIO object at 0x000002299F4DD830>

Or, from How to convert hex string to color image in python?, I get this error:

ValueError: non-hexadecimal number found in fromhex() arg at position 0

So: How do I convert this bytea data or this memoryview object into a NumPy array?

I may be missing something simple, or this may just be one reason why images should not be stored in SQL databases.

Evan
  • 2,121
  • 14
  • 27

1 Answers1

1

For posterity, here's the simplest solution I arrived at.

Best practice would be to NOT store the images in a database, but rather store multiple versions (different resolutions, from thumbnail (64x64 ish) to full res (2504x2504 in this case) in the file system, with filepaths to those images. Images can be sorted by a hash (some overhead) or by something like timestamp; the latter works for us because all the data is coming from one camera and will therefore have different timestamps.

The data in question is a 16-bit grayscale TIFF file. Python Image Library (PIL) isn't able to translate those images. OpenCV can. However, since I want a NumPy array anyway, that doesn't really matter. MatPlotLib can display the arrays directly. Numpy can slice or downsample as needed.

engine = create_engine('postgresql+psycopg2://user:pass@server:port/database')
connection = engine.connect()

query = 'SELECT * FROM database.schema.table WHERE "ID" = 1234'
result = connection.execute(query)

for row in result:
    data = row[-1] # our image is the last column in the table

connection.close()

From here, numpy and matplotlib can do the lifting. I know the resolution of the image, but that is also stored elsewhere in the database table.

img_array = np.reshape(np.frombuffer(data, dtype="Int16"), (2504, 2504))

norm = cm.colors.Normalize(vmax=abs(img_array).max(), vmin=-abs(img_array).max())
plt.matshow(img_array, norm=norm, cmap="gray")
plt.show()

plt.imshow() also works.

With OpenCV, the code we used was this:

cv2.namedWindow("Image", cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)
cv2.imshow("Image", img_array)
cv2.waitKey(0)
cv2.destroyAllWindows()
Evan
  • 2,121
  • 14
  • 27