I have an SFTP server that hosts a single CSV file containing data about multiple courses. The data has four columns, in the following format:
Activity Name,Activity Code,Completion Status,Full Name
Safety with Lasers, 3XX1, 10-Jul-20, "Person, Name"
Safety with Lasers, 3XX1, NaN, "OtherP, OtherName"
How to use wrench, 7NPA, 10-Aug-19, "OtherName, Person"
etc...
I am using Paramiko to open the file with the following code:
file = sftp.open('Data.csv')
The problem is that the result is an SFTPFile object. How can I parse the data from it? I need to extract the course names and count how many people have and have not completed each course. The code I am using at the moment is below, but it is extremely slow. Any suggestions would be appreciated:
import pandas

Courses = ['']
Total = [0]
Compl = [0]
i = 0  # index of the course currently being counted
csvreal = pandas.read_csv(file)
for index, row in csvreal.iterrows():
    # Render the current row as one string, then split it on spaces
    string = csvreal.loc[[index]].to_string(index=False, header=False)
    if Courses[i] != string.split(' ')[0]:
        # A new course name starts here, so open a new set of counters
        i += 1
        Courses.append(string.split(' ')[0])
        Total.append(0)
        Compl.append(0)
    if len(string.split(' ')[2]) > 3:  # Note that incomplete courses do not have a completion date, so it is NaN
        Compl[i] += 1
    Total[i] += 1
I know the code is terrible; I am new to this and not sure what I am doing. Pointers to relevant documentation or tutorials would also be appreciated. Thank you!
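For reference, here is a minimal sketch of the result I am after, run against an in-memory copy of the sample rows rather than the real SFTP file. It is only a sketch under some assumptions: `load_courses` and `summarize` are hypothetical helper names, a row counts as completed when `Completion Status` is not NaN, and `skipinitialspace=True` is used because every comma in the sample is followed by a space. The `prefetch()` call is only attempted when the object provides it, as Paramiko's SFTPFile does; I have not measured whether it helps on my server.

```python
import io

import pandas as pd

# In-memory stand-in for Data.csv, built from the sample rows (not the real file).
SAMPLE = b'''Activity Name,Activity Code,Completion Status,Full Name
Safety with Lasers, 3XX1, 10-Jul-20, "Person, Name"
Safety with Lasers, 3XX1, NaN, "OtherP, OtherName"
How to use wrench, 7NPA, 10-Aug-19, "OtherName, Person"
'''


def load_courses(fileobj):
    """Read the CSV from any binary file-like object (hypothetical helper)."""
    # Paramiko's SFTPFile offers prefetch(); plain objects such as BytesIO do not.
    prefetch = getattr(fileobj, "prefetch", None)
    if callable(prefetch):
        prefetch()
    # Skip the blank after each comma so the quoted "Last, First" names stay in one field.
    return pd.read_csv(fileobj, skipinitialspace=True)


def summarize(df):
    """Count total and completed rows per course (hypothetical helper)."""
    # Assumption: a missing completion date (NaN) means "not completed".
    done = df["Completion Status"].notna()
    return (
        df.assign(Completed=done)
        .groupby("Activity Name", sort=False)["Completed"]
        .agg(Total="count", Completed="sum")
    )


summary = summarize(load_courses(io.BytesIO(SAMPLE)))
print(summary)
```

With the real server, the object returned by `sftp.open('Data.csv')` would presumably be passed to `load_courses` in place of the `BytesIO` stand-in, and the number of people who have not completed a course would be `Total - Completed`.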