I'm a very novice web developer and I am currently building a website from scratch. I have most of the frontend part setup, but I am really struggling with backend and databases.
The point of the website is to display a graph with class completion status (for each class, it will display what percent is complete/incomplete, and how many total users). It will retrieve this data from a CSV file on an SFTP server. The issue I am having is when I try to directly access the data, it loads incredibly slowly.
Here is the code I am using to retrieve the data:
Courses = ['']
Total =[0]
Compl =[0]
csvreal = pandas.read_csv(file)
for index, row in csvreal.iterrows():
string =(csvreal.loc[[index]].to_string(index=False, header=False))
if(Courses[i] !=string.split(' ')[0]):
i+=1
Courses.append(string.split(' ')[0])
Total.append(0)
Compl.append(0)
if(len(string.split(' ')[2])>3):
Compl[i]+=1
Total[i]+=1
To explain it a little bit, the CSV file has the roster information, i.e. each row has a name of course, name of user, completion date, and course code. The course name is the first column so that is why in the code, you see string,split(' ')[0], as it is the first part of the string. If the user has completed it, then the third column (completion date) is empty, so that is why it checks if it is longer than 3 chars, because if it is, then the user has completed it.
This takes entirely too long to compute. About 30 seconds with around 7,000 entries. Recently the CSV size was increased to something like 36,000.
I was advised to setup a database using SQL and have a nightly cronjob to parse the data and have the website retrieve the data from the database, instead of the CSV.
Any advice on where to even begin, or how to do this would be greatly appreciated.