I need to run some statistical analyses on data stored in PostgreSQL tables, and I have been hesitating between R and Python.
With R, I use the following code:
require("RPostgreSQL")
(...) #connection to the database, etc
my_table <- dbGetQuery(con, "SELECT * FROM some_table;")
which is very fast: it takes only about 5 seconds to fetch a table with ~200,000 rows and 15 columns, with almost no NULLs in it.
With Python, I use the following code:
import psycopg2
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
cursor.execute("SELECT * FROM some_table;")
my_table = cursor.fetchall()
and, surprisingly, this freezes my Python session and crashes my computer.
As I use these libraries as "black boxes", I don't understand why something that is so quick in R can be so slow in Python (slow to the point of being unusable in practice).
Can someone explain this difference in performance, and is there a more efficient way to fetch a PostgreSQL table in Python?
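For reference, the closest thing I have found so far is psycopg2's named (server-side) cursors, which, as far as I understand, stream the result in batches rather than materializing it all at once like fetchall(). A minimal sketch, reusing conn_string from above (the cursor name and itersize value are arbitrary placeholders I picked):

import psycopg2

conn = psycopg2.connect(conn_string)

# Giving the cursor a name makes psycopg2 create a server-side cursor,
# so rows are streamed in batches instead of being built up in client
# memory all at once.
cursor = conn.cursor(name="streaming_cursor")
cursor.itersize = 10000  # rows fetched per round trip to the server
cursor.execute("SELECT * FROM some_table;")

my_table = [row for row in cursor]  # iterates batch by batch

cursor.close()
conn.close()

I have also seen pandas.read_sql_query(query, conn) suggested, which returns a DataFrame much like R's dbGetQuery does, but I don't know how it compares speed-wise.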