I have a generator whoms content I need to get into an SQL database. The generator contains more than a million items and while iterating over them I send them in bins of 10.000 into a pandas dataframe, and commit that via the to_sql function into the database.
That works fine as long as I can fill the bin with 10.000 items. However, the StopIteration is never triggered, even though the last item of the generator was called by the next method.
Debugging it, I can see item 1.384.471 is processed in the try-part, and then while
is still True
, next
does not get a new item, but instead of activating the except-part the program simply runs into nirvana.
It would be nice if some of you can point me to my mistake.
import ijson
import pandas as pd
import sqlalchemy as sa
def connect_to_database():
engine = sa.create_engine(f"postgresql+psycopg2://{USER}:{PWD}@{SERVERNAME}:{PORT}/{DB}")
return engine
engine = connect_to_database()
def create_content(objects, table_name):
''' simple helper function to get content
for every object list from the JSON file'''
item_lst = []
item_count = 0
while True:
try:
item = next(objects)
value_conversion(item)
item_lst.append(item)
if item_count == 1380000 and len(item_lst) == 4470:
print('Watch it!')
if len(item_lst) == 10000:
item_count += len(item_lst)
submit_sql(table_name, item_lst)
item_lst.clear()
except StopIteration:
if len(item_lst) > 0:
item_count += len(item_lst)
submit_sql(table_name, item_lst)
item_lst.clear()
print(item_count)
break
except Exception as e:
print(e)
def submit_sql(engine, table_name, items):
''' submit dataframe content of items to sql database '''
engine = connect_to_database()
temp_df = pd.DataFrame(items)
temp_df.to_sql(table_name, engine, if_exists='append', index=False)
engine.dispose()
with open("C:/Path/To/Json/file.json", "r", encoding="utf-8") as json_file:
s_objects = ijson.items(json_file, 'ObjectList.item')
create_content(s_objects, 'tablename')
The version with a simple for-loop doesn't work either. The if-statement after the for-loop is never reached.
try:
for item in objects:
value_conversion(item)
item_lst.append(item)
if len(item_lst) == 10000:
item_count += len(item_lst)
submit_sql(table_name, item_lst)
item_lst.clear()
print(item_count)
if len(item_lst) > 0:
item_count += len(item_lst)
submit_sql(table_name, item_lst)
item_lst.clear()
print(item_count)
except Exception as e:
print(e)