
I am going to work with a very large spatio-temporal dataset stored on a PostgreSQL server with read-only access. The goal is to load this dataset into a Python environment, do all the processing there, and build some kind of learning model.

It wouldn't be a good idea to load this dataset into Python from Postgres every time I want to work with it.

I am wondering if there is a way to have Postgres dump this dataset to disk in a format (e.g. .pkl or .hdf) that would support faster loading in the future. I really don't think \COPY to CSV is an option.

desertnaut
arilwan

1 Answer

If you're exporting from PostgreSQL itself, then COPY or \copy is your only option. You haven't said why that isn't viable.

I'm assuming the database is changing and you want to pull recent data into your Python program. If the database is static, then you could possibly:

  • use COPY or \copy once
  • read the result into Python
  • save the data into another format (a sketch of this workflow follows below)
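
Here is a minimal sketch of that one-off export, assuming pandas and SQLAlchemy are available; the connection details, table name, and output paths are placeholders taken from the example further down and should be adapted to your setup:

import pandas as pd
from sqlalchemy import create_engine

# read-only connection; adjust credentials and database name to your setup
engine = create_engine('postgresql+psycopg2://bob:mypasswd@localhost/bigdb')

# pull the query result into a DataFrame once
df = pd.read_sql_query('SELECT id, rating FROM video', engine)

# save to a fast-loading format for later sessions
df.to_pickle('video.pkl')            # .pkl, as mentioned in the question
df.to_hdf('video.h5', key='video')   # .hdf, requires PyTables

# later: reload from disk instead of hitting PostgreSQL again
df = pd.read_pickle('video.pkl')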

Another way to work the problem is to pull the data into Python using a database connector. This solution is specific to MySQL, but the same approach should work with a PostgreSQL connector. Here is that code adapted for PostgreSQL:

import psycopg2
import numpy

# connect with the read-only credentials; psycopg2 uses password= and dbname=
conn = psycopg2.connect(host='localhost', user='bob', password='mypasswd', dbname='bigdb')
curs = conn.cursor()
curs.execute("SELECT id, rating FROM video")
numrows = curs.rowcount  # psycopg2's execute() returns None, so take the row count from the cursor

# dtype='i4,i4' means two columns, both 4 byte (32 bit) integers;
# you will have to adapt this to your data types and structure
A = numpy.fromiter(curs.fetchall(), count=numrows, dtype='i4,i4')
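
Since the question's goal is faster reloading, the resulting array can also be persisted once in NumPy's own format and reloaded (or memory-mapped) in later sessions without touching the database again. A minimal sketch, with an arbitrary file name:

# save the array once; the .npy format preserves the structured dtype
numpy.save('video_ratings.npy', A)

# later sessions: reload from disk, or memory-map very large arrays
A = numpy.load('video_ratings.npy', mmap_mode='r')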
bfris