I have the following code, which recursively iterates over a directory containing thousands of CSVs and attempts to read each one and append it to a single DataFrame:
import glob

import pandas as pd

# `symbol` and `numToLetter` (month number -> futures month code) are defined earlier
df = pd.DataFrame()
symbol = symbol.upper()

for filepath in glob.iglob(r'W:\data\{0}\option\**\**\**.csv'.format(188), recursive=True):
    # the file name (without extension) identifies the option
    optionNameCSI = filepath.split("\\")[-1].split('.')[0]
    try:
        tmp = pd.read_csv(filepath, engine='c')
        strike = tmp['Strike'].iloc[-1]
        expiry = pd.to_datetime(tmp['Option Expiration Date'].iloc[-1])
        m = expiry.month
        y = expiry.year
        PutCall = tmp['PutCall'].iloc[-1]
        future = symbol + numToLetter[m] + str(y)
    except (IndexError, KeyError):
        # skip files missing the expected rows or columns
        continue

    if df.empty:    # seed the DataFrame with the first file...
        df = tmp
    else:           # ...then append every subsequent one
        df = df.append(tmp)
    print(optionNameCSI, 'loaded')
However, this code starts off iterating quickly, then slows down dramatically and never completes. Is there something I'm doing wrong? I know the file paths are all being found correctly, so I suspect the problem is in how the DataFrame is grown.
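Would collecting each file's DataFrame in a list and concatenating once after the loop avoid this? A minimal sketch of that pattern, assuming the same directory layout as above and keeping only the load-and-accumulate logic (the per-file Strike/expiry extraction is omitted for brevity):

import glob

import pandas as pd

frames = []
for filepath in glob.iglob(r'W:\data\{0}\option\**\**\**.csv'.format(188), recursive=True):
    try:
        tmp = pd.read_csv(filepath, engine='c')
    except pd.errors.EmptyDataError:
        # assumption: skip empty files, mirroring the
        # continue-on-error behaviour of the original loop
        continue
    frames.append(tmp)

# a single concat copies each row once, instead of re-copying the
# accumulated frame on every append call
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

My understanding is that df.append() returns a new DataFrame on every call, so each iteration re-copies everything accumulated so far, which would explain the slowdown as the frame grows.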