Using Python to load a dataset of 10MM records into an Oracle Database table. The dataframe is created with no issue, but when loading it cx_Oracle raises a record-count-too-large error.
Seeking to loop over the dataframe and batch-load the 10MM records by inserting them 100k records at a time.
Code shown below works, but only for small datasets that fit in allocated memory. I need a version that works in batches for large datasets (see the sketch after the script).
Have tried iterating over rows, but this takes very long. Also have tried loading a much smaller dataframe; this works but does not achieve the goal.
Also tried setting bindarraysize and arraysize to load the dataframe, but neither worked.
import pandas as pd
import datetime
import cx_Oracle
# date helpers (not used by the insert below)
format = '%y_%m_%d'
TODAY = datetime.date.today()
add = datetime.timedelta(days=1)
yesterday = datetime.date.today() - add
dd = datetime.date.strftime(TODAY,format)
# connection variables
connection = cx_Oracle.connect("user/Oracle_database_connect_info")
cur = connection.cursor()
# dataframe headers
columns = ['C1','C2','C3','C4']
# -- >> test and sample the file
csv_df = pd.read_csv(r'csvfile_location')
# add record_id for values
csv_df_idx = csv_df.index.values + 1
csv_df.insert(0, column='RECORD_ID', value=csv_df_idx)
### TABLE ALREADY CREATED IN DATABASE ###
# Convert the dataframe to a list of lists for executemany()
# (inserting row by row with iterrows() proved far too slow)
csv_df_dataset_lst = csv_df.values.tolist()
insert_statement = """
INSERT INTO TEST_LOAD
(RECORD_ID, C1, C2, C3, C4) VALUES (:1, :2, :3, :4, :5)"""
# control number of records to bind for insert
# cur.bindarraysize = 100000 # --->>> did not work
# cur.arraysize = 100000 # --->>> did not work
cur.executemany(insert_statement, csv_df_dataset_lst)
connection.commit()
connection.close()
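
For clarity, below is a minimal sketch of the batching I am after, reusing the table, columns, and connection placeholder from the script above. BATCH_SIZE and the slicing loop are my own additions, untested at 10MM scale, not a confirmed fix.

import pandas as pd
import cx_Oracle

BATCH_SIZE = 100000  # rows per executemany() call; placeholder value, tune as needed

connection = cx_Oracle.connect("user/Oracle_database_connect_info")
cur = connection.cursor()

insert_statement = """
INSERT INTO TEST_LOAD
(RECORD_ID, C1, C2, C3, C4) VALUES (:1, :2, :3, :4, :5)"""

csv_df = pd.read_csv(r'csvfile_location')
csv_df.insert(0, column='RECORD_ID', value=csv_df.index.values + 1)
rows = csv_df.values.tolist()

# feed executemany() one 100k slice at a time instead of all 10MM rows at once
for start in range(0, len(rows), BATCH_SIZE):
    cur.executemany(insert_statement, rows[start:start + BATCH_SIZE])
    connection.commit()  # commit per batch keeps each transaction small

connection.close()

If holding the full 10MM-row list in memory is also a problem, pandas' read_csv(chunksize=BATCH_SIZE) yields the file as dataframe chunks, and the same executemany() call can run once per chunk.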