I have almost 7,000 CSV files with almost 2.4 million rows in total. I've written code that opens each CSV and does some calculations to add new columns. In the end I would like to vstack all of these into one master CSV/txt file.
An example of my code (please excuse any dumb mistakes, as this is example code):
import numpy as np
import pandas as pd

def my_func(file):
    df = pd.read_csv(file)
    new_df = custom_calculations(df)
    return new_df

newarray = np.empty((85,), int)
a = my_func(file)
newarray = np.vstack([newarray, a])
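Looped over the full file list, the serial version I have in mind looks roughly like this (a minimal sketch: `my_func` is stubbed out with a hypothetical stand-in that returns one row of 85 values per file, and the real version would read each CSV instead):

```python
import numpy as np

# Hypothetical stand-in for my_func(file); the real one reads the CSV
# and runs custom_calculations on it.
def my_func(file):
    return np.full((1, 85), fill_value=len(file), dtype=int)

csv_list = ["a.csv", "b.csv", "c.csv"]  # placeholder file names

# Collect each result in a list and stack once at the end; seeding with
# np.empty would leave one row of uninitialized garbage in the output.
rows = [my_func(f) for f in csv_list]
master = np.vstack(rows)
print(master.shape)  # prints (3, 85)
```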
I've been reading the threading documentation, hoping this will make it go faster. I followed some examples and came up with this code:
import threading

threads = []
for ii in csv_list:
    process = threading.Thread(target=my_func, args=[ii])
    process.start()
    threads.append(process)
    print('process: ', type(process), process)

for process in threads:
    process.join()
It doesn't seem to actually be stacking the arrays together, though, and I'm not sure what I'm doing wrong.
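For reference, the result-collection behavior I'm after can be sketched with `concurrent.futures` (a minimal sketch, with `my_func` again stubbed out as a hypothetical stand-in, since a plain `threading.Thread` discards the target's return value):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for my_func(file).
def my_func(file):
    return np.zeros((1, 85), dtype=int)

csv_list = [f"file_{i}.csv" for i in range(10)]  # placeholder file names

# executor.map returns each call's result (in input order), so the
# per-file arrays can actually be gathered and stacked afterwards.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(my_func, csv_list))

master = np.vstack(results)
print(master.shape)  # prints (10, 85)
```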