0

I'm trying to improve some code and stop it throwing a warning/error once it's finished (due to the use of next(iter).

I'm essentially creating a database engine, reading in a large CSV file in chunks, and appending the data here to a PostgreSQL database.

However once the while loop is finished, there's an error. I could probably add some error handling to suppress this, but wondering if there's a more efficient way to do this in general.

   # Create database engine
   engine = create_engine(# contains Postgres connection stuff)

   # Read in CSV
   iter = pd.read_csv("myfile.csv", iterator=True, chunksize=100000)

   df = next(iter)

   # Convert column to datetime
   df.date_col = pd.to_datetime(df.date_col)

   # Create table
   df.head(0).to_sql(name = "mytable", con = engine, if_exists='replace')

   # Add first chunk
   df.to_sql(name="mytable", con=engine, if_exists='append')

   # Add remaining CSV data to database table
   while True:
       df = next(iter)
       df.date_col = pd.to_datetime(df.date_col)
       df.to_sql(name="mytable", con=engine, if_exists='append')

1 Answers1

0

The problem here is usage of while(True) loop. Ideally to iterate an iterator, you can use a for loop.

Incase you want to use a while loop, then take a look at how to use iterator in while loop.

I would achieve what you are looking for using something like this:

_iter = pd.read_csv("myfile.csv", iterator=True, chunksize=100000)

df = next(_iter)

# Convert column to datetime
df.date_col = pd.to_datetime(df.date_col)

# Create table
df.head(0).to_sql(name = "mytable", con = engine, if_exists='replace')

# Add first chunk
df.to_sql(name="mytable", con=engine, if_exists='append')

for i in _iter:
    i.date_col = pd.to_datetime(i.date_col)
    i.to_sql(name="mytable", con=engine, if_exists='append')

Using a for loop avoids running into StopIteration exception which indicates end of an iterator. Read More on Iterators here.

Irfanuddin
  • 2,295
  • 1
  • 15
  • 29