
I'm trying to retrieve information from a database with Python and SQL.

Currently, one of the tables takes a huge amount of time to query and is effectively unloadable.

How can I more effectively load this data?

This is the current code I am using:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import psycopg2 as pg
from sqlalchemy import create_engine

# establish the connection
conn = pg.connect(
    database=POSTGRESQL_DATABASE,
    user=POSTGRESQL_USER,
    password=POSTGRESQL_PASSWORD,
    host=POSTGRESQL_HOST,
    port=POSTGRESQL_PORT,
)

SQL = '''
select sentiments.user_id,
       settings.diagnosis, expressions.angry, expressions.disgusted,
       expressions.fearful, expressions.happy, expressions.neutral,
       expressions.sad, expressions.surprised, expressions.created,
       settings.medication, settings.dosage_times, settings.medication_activity_time,
       settings.gender, settings.age, sentiments.id, sentiments.sentiment,
       sentiments.magnitude, sentiments.created,
       pupil_detections.created, pupil_detections.left_eye_upper,
       pupil_detections.left_eye_lower, pupil_detections.right_eye_upper,
       pupil_detections.right_eye_lower, pupil_detections.left_eye_iris,
       pupil_detections.right_eye_iris
from sentiments
left join expressions
       on sentiments.user_id = expressions.user_id
left join pupil_detections
       on sentiments.user_id = pupil_detections.user_id
left join settings
       on sentiments.user_id = settings.user_id;
'''

df = pd.read_sql(SQL, con=conn)
df

Is there a way to batch the request, or something like that? The massive table is pupil_detections.
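
To clarify what I mean by batching: I'm imagining something like pandas' chunksize argument, which (as I understand it) makes read_sql return an iterator of smaller DataFrames instead of one huge frame. A rough, untested sketch using the same conn and SQL as above (the 50,000 chunk size is arbitrary):

# chunksize makes read_sql yield DataFrames of up to 50,000 rows each
chunks = []
for chunk in pd.read_sql(SQL, con=conn, chunksize=50_000):
    # each chunk could be filtered or aggregated here before keeping it,
    # so the full result never has to be held as one giant DataFrame
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)

One caveat I've read about: with a plain psycopg2 connection, the driver may still download the entire result set on execute, so chunking alone might not reduce memory use.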

Instead of using pandas, is there another way I could access this data to do analysis on it?
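
For example, one alternative I'm considering is a psycopg2 server-side ("named") cursor, which streams rows from PostgreSQL in batches rather than materialising the whole result set on the client. A minimal sketch, assuming the same conn and SQL as above; the cursor name, batch size, and process() are placeholders of mine:

# a named cursor is server-side: rows arrive in batches of itersize
cur = conn.cursor(name="pupil_stream")
cur.itersize = 50_000          # rows fetched per network round trip
cur.execute(SQL)

for row in cur:                # iterating pulls itersize rows at a time
    process(row)               # hypothetical per-row analysis step

cur.close()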

Thanks!

LeCoda
  • Do you have indexes created on the tables? – JD2775 Apr 13 '21 at 14:06
  • I believe so - under Indexes, each table has an index with the ID column specified. (A sketch of indexing the join keys instead is below these comments.) – LeCoda Apr 13 '21 at 14:08
  • Have a look at this, using chunk may help: https://stackoverflow.com/questions/18107953/how-to-create-a-large-pandas-dataframe-from-an-sql-query-without-running-out-of – JD2775 Apr 13 '21 at 14:12
  • How many rows is "massive"? PostgreSQL can manage simple joins like yours on very large tables - maybe it is a problem with loading lots of data into pandas? – fog Apr 13 '21 at 18:53
  • Yeah, you're probably correct, hmm. – LeCoda Apr 14 '21 at 04:02
  • https://aetperf.github.io/2019/07/30/Loading_data_into_a_Pandas_DataFrame_-_a_performance_study.html – Mitch Wheat Apr 14 '21 at 04:55
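
Following up on the index question above: all three joins are on user_id, so (if I understand correctly) an index on the ID column alone would not help them. A hypothetical way to add the missing indexes through the same connection - the idx_* names are mine, and CREATE INDEX IF NOT EXISTS needs PostgreSQL 9.5+:

with conn.cursor() as cur:
    # index the join key on each joined table, if not already indexed
    for table in ("expressions", "pupil_detections", "settings"):
        cur.execute(
            f"CREATE INDEX IF NOT EXISTS idx_{table}_user_id "
            f"ON {table} (user_id);"
        )
conn.commit()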

0 Answers