I have a Postgres database which contains time series data. The size of the database is around 1 GB. Currently, to read data, this is what I do:

import psycopg2
import pandas as pd

conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")
# reads the entire table into a single DataFrame
df = pd.read_sql("SELECT * FROM timeseries", conn)
print(df)

But this loads the entire table into memory. Now, I am aware of techniques where the database can be dumped to a CSV file and the CSV can then be read in chunks, as suggested here: How to read a 6 GB csv file with pandas

But for me that is not an option, since the database will be continuously changing and I need to read it on the fly. Is there any technique to read the database content, maybe in chunks, or using a third-party library?

Souvik Ray

2 Answers

pd.read_sql() also has a chunksize parameter, so you can read data from a SQL table/query in chunks:

for df in pd.read_sql("Select * from timeseries", conn, chunksize=10**4):
    # process `df` chunk here...
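
For example, here is a minimal sketch of processing the stream of chunks (assuming the metrics connection from the question; the row count is just a placeholder for real per-chunk work):

import psycopg2
import pandas as pd

conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")

total_rows = 0
for df in pd.read_sql("SELECT * FROM timeseries", conn, chunksize=10**4):
    # each `df` is a DataFrame of at most 10**4 rows
    total_rows += len(df)  # replace with your own per-chunk processing

print(total_rows)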
MaxU - stand with Ukraine

Consider loading the file directly into the database using psql. From your console, try this command:

$ cat file.csv | psql db -c "COPY your_table FROM STDIN DELIMITER ',' CSV HEADER"

Example:

Consider the following table structure ..

CREATE TEMPORARY TABLE t (foo INT, bar TEXT);

and the following CSV file ..

x,y
1,foo
2,bar
3,foo-bar

Execute the following command from your terminal:

$ cat /tmp/file.csv | psql db -c "COPY t FROM STDIN DELIMITER ',' CSV HEADER"

And here is your data:

db=# SELECT * FROM t;
 foo |   bar
-----+---------
   1 | foo
   2 | bar
   3 | foo-bar
(3 rows)
Jim Jones
  • this is great but I will be directly given an operational database which is continuously changing, and I need to load data from that into pandas. I am looking for ways to maybe load the data in chunks and then process it, just like it can be done with `pd.read_csv(filename, chunksize=10**6)` – Souvik Ray May 04 '18 at 12:59
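
One caveat worth knowing about the chunksize approach above: with a plain psycopg2 connection, the default client-side cursor may fetch the entire result set into client memory when the query executes, with pandas only slicing it into chunks afterwards. A server-side (named) cursor streams rows from the server instead. A minimal sketch, assuming the same metrics database and timeseries table from the question (the cursor name timeseries_stream is an arbitrary label):

import psycopg2
import pandas as pd

conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")

# Giving the cursor a name makes it a PostgreSQL server-side cursor:
# rows are fetched from the server in batches instead of all at once.
with conn.cursor(name="timeseries_stream") as cur:
    cur.execute("SELECT * FROM timeseries")
    while True:
        rows = cur.fetchmany(10000)  # fetch the next 10,000 rows
        if not rows:
            break
        df = pd.DataFrame(rows, columns=[col[0] for col in cur.description])
        # process `df` chunk here...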