
I'm a very novice web developer currently building a website from scratch. I have most of the frontend set up, but I am really struggling with the backend and databases.

The point of the website is to display a graph of class completion status (for each class, it shows what percent is complete/incomplete and the total number of users). It retrieves this data from a CSV file on an SFTP server. The issue I am having is that when I try to access the data directly, it loads incredibly slowly.

Here is the code I am using to retrieve the data:

import pandas

Courses = ['']
Total = [0]
Compl = [0]
i = 0  # index of the course currently being tallied

csvreal = pandas.read_csv(file)  # `file` is the CSV pulled from the SFTP server
for index, row in csvreal.iterrows():
    # Render the row back into a string and split it on double spaces
    string = csvreal.loc[[index]].to_string(index=False, header=False)
    if Courses[i] != string.split('  ')[0]:
        # New course name: start a new set of counters
        i += 1
        Courses.append(string.split('  ')[0])
        Total.append(0)
        Compl.append(0)
    # A non-empty completion date means this user finished the course
    if len(string.split('  ')[2]) > 3:
        Compl[i] += 1
    Total[i] += 1

To explain it a little bit: the CSV file has the roster information, i.e. each row has a course name, user name, completion date, and course code. The course name is the first column, which is why the code uses string.split('  ')[0], as it is the first part of the string. If the user has not completed the course, the third column (completion date) is empty, so the code checks whether that field is longer than 3 characters; if it is, the user has completed it.

This takes entirely too long to compute: about 30 seconds with around 7,000 entries. Recently the CSV size was increased to something like 36,000 rows.

I was advised to set up a database using SQL and have a nightly cronjob parse the data, so the website retrieves the data from the database instead of the CSV.

Any advice on where to even begin, or how to do this would be greatly appreciated.


1 Answer


This takes entirely too long to compute: about 30 seconds with around 7,000 entries. Recently the CSV size was increased to something like 36,000 rows.

I was advised to set up a database using SQL and have a nightly cronjob parse the data, so the website retrieves the data from the database instead of the CSV.

Before I recommend using a database: how fast is the connection to the SFTP server you are getting the data from? Would it be faster to host the file on the local machine? If this isn't the issue, see below.
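A quick way to rule that out is to time the transfer on its own. Here is a minimal sketch using paramiko; the host, credentials, and paths are placeholders you would substitute with your own:

import time
import paramiko

start = time.time()

# Placeholder connection details -- substitute your own
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="password")
sftp = paramiko.SFTPClient.from_transport(transport)

# Copy the roster to local disk instead of reading it in place
sftp.get("/remote/path/roster.csv", "roster.csv")

sftp.close()
transport.close()

print(f"Download took {time.time() - start:.1f} seconds")

If the download alone accounts for most of your 30 seconds, caching a local copy (or moving to a local database, below) fixes the problem regardless of how you parse.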

Yes, in this case a database would speed up both your computation time and retrieval time. You need to set up a SQL database, have a way to put data into it, and then retrieve it. I included resources at the bottom that will help you familiarize yourself with SQL. Knowledge of PHP will be needed in order to interact with and manipulate the database.
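To sketch the "put data into it" half, a nightly import script in Python could look like the following. I'm assuming SQLite for simplicity and making up the table and column names to match your CSV; MySQL or PostgreSQL would work the same way with a different driver:

import csv
import sqlite3

conn = sqlite3.connect("roster.db")
conn.execute("""CREATE TABLE IF NOT EXISTS roster (
    course TEXT,
    user_name TEXT,
    completion_date TEXT,
    course_code TEXT)""")

# Rebuild the table from scratch each night so it mirrors the CSV
conn.execute("DELETE FROM roster")

with open("roster.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row, if there is one
    conn.executemany("INSERT INTO roster VALUES (?, ?, ?, ?)", reader)

conn.commit()
conn.close()

You would schedule it nightly with a crontab entry along the lines of 0 2 * * * python3 /path/to/import_roster.py (the time of day is arbitrary).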

Using SQL will also be much simpler for you to interact with. For example, you needed to check whether a cell is empty. In SQL, this can be done with:

SELECT * FROM table WHERE some_col IS NULL OR some_col = '';
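More to the point for your graph, the per-course tallies you are computing in the loop collapse into a single aggregate query. A sketch of the retrieval side, against the hypothetical roster table from the import script above:

import sqlite3

conn = sqlite3.connect("roster.db")

# One row per course: total users, and how many have a completion date
query = """SELECT course,
       COUNT(*) AS total,
       SUM(CASE WHEN completion_date <> '' THEN 1 ELSE 0 END) AS completed
FROM roster
GROUP BY course"""

for course, total, completed in conn.execute(query):
    print(f"{course}: {completed}/{total} complete ({100 * completed / total:.0f}%)")

conn.close()

The database does the grouping for you, so the website only ever handles a handful of rows instead of 36,000.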

https://www.khanacademy.org/computing/computer-programming/sql

https://www.w3schools.com/sql/

https://www.guru99.com/introduction-to-database-sql.html

  • I see, that's very helpful! Thank you! How would I go about actually creating the database on the SFTP server? Can you please provide some resources for that as well? – macosx136 Apr 27 '20 at 20:57
  • @macosx136 This depends a lot on what kind of a server you are running. I would strongly recommend running the database on your web server, but if you must, you might be able to accomplish this over SSH. This thread might help: https://stackoverflow.com/questions/18373366/mysql-connection-over-ssh-tunnel-how-to-specify-other-mysql-server I strongly advise against pulling in the entire database from a remote server. – Zach P. Apr 27 '20 at 23:58