I have a table in a PostgreSQL database with ~900,000 rows. I want to copy it row by row to another table that has some extra columns, transforming each row and filling the new columns along the way. The problem is that RAM fills up.
Here is the relevant part of the code:
```python
import datetime

import sqlalchemy
from sqlalchemy.engine.url import URL
from sqlalchemy.orm import sessionmaker

engine = sqlalchemy.create_engine(URL(**REMOTE), echo=False)
Session = sessionmaker(bind=engine)
session = Session()

n = 1000
counter = 1
for i in range(1, total + 1, n):
    # build the next chunk of 1000 ids and select those rows
    ids = ",".join(str(j) for j in range(i, i + n))
    q = "SELECT * from table_parts where id in (%s)" % ids  # <-- the query discussed below
    r = session.execute(q).fetchall()
    for element in r:
        data = {}
        # ...
        # [taking data from each row, extracting strings, doing calculations,
        #  and filling the extra columns that the new table has]
        # ...
        # `query` is the prepared INSERT into the new table, built elsewhere
        query = query.bindparams(**data)
        try:
            session.execute(query)
        except:
            session.rollback()
            raise
        if counter % n == 0:
            print "COMMITING....", counter, datetime.datetime.now().strftime("%H:%M:%S")
            session.commit()
        counter += 1
```
The queries are correct, so there are no errors there. Up until the point where I press Ctrl+C, the new table is being updated correctly.

The problem seems to be this query:

```sql
SELECT * from table_parts where id in (1, 2, 3, 4, ..., 1000)
```
I have already tried this with a PostgreSQL array instead of the long IN list.
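Roughly, that array attempt looked like the sketch below (reconstructed, so the `:ids` bind name is only illustrative): the whole chunk of ids is bound as a single array parameter instead of being pasted into the SQL string, and psycopg2 adapts the Python list to a PostgreSQL array.

```python
from sqlalchemy import text

# sketch of the array variant, inside the same outer loop as above:
# bind the chunk of ids as one array parameter instead of formatting
# them into the SQL text
q = text("SELECT * FROM table_parts WHERE id = ANY(:ids)")
r = session.execute(q, {"ids": list(range(i, i + n))}).fetchall()
```

Since it still ends with `fetchall()`, the whole chunk is loaded at once, just like with the original query.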
Things I have already tried:
- Streaming the results with a server-side cursor (`stream_results`), from here:

  ```python
  results = (connection
             .execution_options(stream_results=True)  # Added this line
             .execute(query))
  ```

  As far as I know, this uses a server-side cursor when used with PostgreSQL. I ditched the session object from my posted code and used `engine.connect()` instead; the first sketch after this list shows roughly how.
- Creating a new connection object on each iteration. Surprisingly, this does not work either; RAM still gets full.
- According to the documentation, `yield_per` from the Query API is the same as the `stream_results` option mentioned above (the second sketch after this list shows the form I mean).
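For reference, the first two bullets combined looked roughly like the sketch below; it is reconstructed from memory, so the `conn` name and the exact placement of `connect()` are only illustrative.

```python
# sketch of the stream_results attempt: replace the session with a plain
# connection and ask for a server-side cursor so psycopg2 does not buffer
# the whole result set on the client
conn = engine.connect()
for i in range(1, total + 1, n):
    ids = ",".join(str(j) for j in range(i, i + n))
    q = "SELECT * from table_parts where id in (%s)" % ids
    results = (conn
               .execution_options(stream_results=True)  # Added this line
               .execute(q))
    for element in results:
        # same per-row transformation and INSERT into the new table as above
        pass

# the "new connection on each iteration" variant simply moved
# engine.connect() inside the outer loop and closed it after every chunk
```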
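And for completeness, the `yield_per` form from the documentation. This is only a sketch: it assumes an ORM-mapped `TableParts` class, which my actual code does not have (I work with raw SQL strings).

```python
# sketch only: TableParts would have to be an ORM-mapped class, which my
# real code does not define; yield_per(1000) fetches rows in batches of
# 1000 and, as I understand the docs, uses the same stream_results mechanism
for element in session.query(TableParts).yield_per(1000):
    # per-row transformation and INSERT into the new table
    pass
```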
thanks