I have a Postgres database which contains time series data. The size of the database is around 1 GB. Currently, to read data, this is what I do:

import psycopg2
import pandas as pd

conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")
# reads the entire table into a single DataFrame
df = pd.read_sql("SELECT * FROM timeseries", conn)
print(df)

But this loads the entire table into memory. Now, I am aware of techniques where the database can be dumped to a CSV file and the CSV can then be read in chunks, as suggested here: How to read a 6 GB csv file with pandas

But for me that is not an option, since the database will be continuously changing and I need to read it on the fly. Is there any technique to read the database content, maybe in chunks, or using a third-party library?

Souvik Ray

2 Answers

pd.read_sql() also has a chunksize parameter, so you can read data from a SQL table/query in chunks:

for df in pd.read_sql("Select * from timeseries", conn, chunksize=10**4):
    # process `df` chunk here...
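
For example, here is a minimal sketch of processing the stream of chunks (assuming the metrics connection from the question; the row count is just a placeholder for real per-chunk work):

import psycopg2
import pandas as pd

conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")

total_rows = 0
for df in pd.read_sql("SELECT * FROM timeseries", conn, chunksize=10**4):
    # each `df` is a DataFrame of at most 10**4 rows
    total_rows += len(df)  # replace with your own per-chunk processing

print(total_rows)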
MaxU - stand with Ukraine

Consider loading the file directly into the database using psql. From your console, try this command:

$ cat file.csv | psql db -c "COPY your_table FROM STDIN DELIMITER ',' CSV HEADER"

Example:

Consider the following table structure ..

CREATE TEMPORARY TABLE t (foo INT, bar TEXT);

and the following CSV file ..

x,y
1,foo
2,bar
3,foo-bar

Execute the following command from your terminal:

$ cat /tmp/file.csv | psql db -c "COPY t FROM STDIN DELIMITER ',' CSV HEADER"

And here is your data:

db=# SELECT * FROM t;
 foo |   bar
-----+---------
   1 | foo
   2 | bar
   3 | foo-bar
(3 rows)
Jim Jones
  • this is great but I will be directly given an operational database which is continuously changing, and I need to load data from that into pandas. I am looking for ways to maybe load the data in chunks and then process it, just like it can be done with `pd.read_csv(filename, chunksize=10**6)` – Souvik Ray May 04 '18 at 12:59
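
One caveat worth knowing about the chunksize approach above: with a plain psycopg2 connection, the default client-side cursor may fetch the entire result set into client memory when the query executes, with pandas only slicing it into chunks afterwards. A server-side (named) cursor streams rows from the server instead. A minimal sketch, assuming the same metrics database and timeseries table from the question (the cursor name timeseries_stream is an arbitrary label):

import psycopg2
import pandas as pd

conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")

# Giving the cursor a name makes it a PostgreSQL server-side cursor:
# rows are fetched from the server in batches instead of all at once.
with conn.cursor(name="timeseries_stream") as cur:
    cur.execute("SELECT * FROM timeseries")
    while True:
        rows = cur.fetchmany(10000)  # fetch the next 10,000 rows
        if not rows:
            break
        df = pd.DataFrame(rows, columns=[col[0] for col in cur.description])
        # process `df` chunk here...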