The raw data is in Excel workbooks with three columns (see the screenshot). The script calculates a result from those columns with a simple formula, and prints the row's product code whenever the result reaches a limit.
import pandas as pd

# One workbook; raw string so the backslash in the Windows path is not treated as an escape
df = pd.read_excel(r"C:\excel_file.xlsx", sheet_name="Sheet1")

P1 = df['Period 1']
P2 = df['Period 2']
P3 = df['Period 3']

# Simple linear formula over the three period columns
df['Predict'] = 12.5 + (0.35 * P1 + 0.5 * P2 + 0.8 * P3)

# Print the product code of every row whose prediction reaches the limit
for index, row in df.iterrows():
    if row['Predict'] >= 100:
        print(row['SKU and Product code'])
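For reference, I understand the explicit row loop is not strictly required; the same check can be written as a vectorised column operation with a boolean filter (still plain pandas on the CPU), roughly like this:

import pandas as pd

# Same formula, but as one column operation and a boolean mask instead of iterrows
df = pd.read_excel(r"C:\excel_file.xlsx", sheet_name="Sheet1")
df['Predict'] = (12.5 + 0.35 * df['Period 1']
                      + 0.5 * df['Period 2']
                      + 0.8 * df['Period 3'])

# Keep only the rows whose prediction reaches the limit and print their product codes
hits = df.loc[df['Predict'] >= 100, 'SKU and Product code']
for code in hits:
    print(code)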
The problem: each file has many rows, and there are more than 100k files. One full run currently takes about 3 full days.
The calculation and the logic are simple, but the data volume and the number of files are huge, and this task runs frequently.
Since reducing the number of rows or files in the raw data is not an option, I am wondering whether GPU programming could shorten the processing time.
I googled, flipped through a book, and got the feeling that GPU programming is more for advanced tasks like machine learning.
If GPU programming can be used for this case, what would the rewritten code look like? I have put my own rough guess below. Thank you.
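This guess is only based on skimming the cuDF documentation (the RAPIDS GPU DataFrame library that mimics the pandas API); the library choice and the GPU/CUDA setup are my assumptions, I have not been able to test it, and I am not sure it would even help, since the Excel file still has to be read on the CPU:

import pandas as pd
import cudf  # RAPIDS GPU DataFrame library; assumes an NVIDIA GPU with CUDA installed

# As far as I can tell cuDF has no Excel reader, so the file is still read on the CPU
pdf = pd.read_excel(r"C:\excel_file.xlsx", sheet_name="Sheet1")

# Move the table into GPU memory; the column maths then runs on the GPU
gdf = cudf.from_pandas(pdf)
gdf['Predict'] = (12.5 + 0.35 * gdf['Period 1']
                       + 0.5 * gdf['Period 2']
                       + 0.8 * gdf['Period 3'])

# Filter on the GPU, then copy only the matching codes back to the CPU to print them
hits = gdf[gdf['Predict'] >= 100]['SKU and Product code'].to_pandas()
for code in hits:
    print(code)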