Python List Slicing from CSV

Question

I'm trying to parse the following file:

student_id, 521, 597, 624, 100,
1, 99, 73, 97, 98,
2, 98, 71, 70, 99,

I have the following code:

def load_students(filename):
    exercises = []
    students = []
    grades = []
    fr = None
    try:
        fr = open(filename, 'r')
        for line in fr:
            tokens = line.strip('\n').split(',')

            # Get Exercises
            # Need help here

            # Get Students
            if tokens[0].isdigit():
                students.append(tokens[0])

            # Get grades
            grades.append([int(x) for x in tokens[1:]])
    except IOError:
        print("IO Error!")

    finally:
        if fr is not None:
            fr.close()
            print(exercises)
            print(students)
            print(grades)
        return np.array(exercises), np.array(students), np.array(grades)

How can I get the file header (521, 597, 624, 100) as an array, excluding the student_id string?

Since you are already using Numpy, did you try just [using Numpy functionality to read the CSV file](https://stackoverflow.com/questions/3518778/how-do-i-read-csv-data-into-a-record-array-in-numpy)? — Karl Knechtel, Jun 28 '20 at 07:35
Whatever wrote that file was not CSV conformant. It should not have spaces after the commas. Those may be treated as valid column values by CSV parsers. — tdelaney, Jun 28 '20 at 07:43
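As a rough illustration of the point about spaces after the commas, here is a minimal sketch using the standard library's `csv` module with `skipinitialspace=True`. The sample text is copied from the question and read from an in-memory string, not a file. Filtering out empty fields is an assumption made here to cope with the trailing comma on each line.

```python
import csv
import io

# Sample data copied from the question, including the spaces and trailing commas.
sample = (
    "student_id, 521, 597, 624, 100,\n"
    "1, 99, 73, 97, 98,\n"
    "2, 98, 71, 70, 99,\n"
)

# skipinitialspace=True ignores whitespace that directly follows a delimiter.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)

# The trailing comma yields an empty last field, so empty strings are filtered out.
rows = [[field for field in row if field] for row in reader]

header, *records = rows
exercises = [int(x) for x in header[1:]]  # skip the 'student_id' label
students = [row[0] for row in records]
grades = [[int(x) for x in row[1:]] for row in records]

print(exercises)
print(students)
print(grades)
```

With a real file, the same reader can be built from an open file object in place of `io.StringIO(sample)`.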

arshovon · Accepted Answer · 2020-06-28T07:51:53.703

Code:

def load_students(filename):
    exercises = []
    students = []
    grades = []
    fr = None
    try:
        fr = open(filename, 'r')
        for line in fr:
            # Strip each value and drop empty ones (e.g. after the trailing comma)
            tokens = [val.strip() for val in line.strip('\n').split(',') if val.strip()]
            if not tokens:
                continue  # skip blank lines

            if tokens[0].isdigit():
                # Get Students
                students.append(tokens[0])
                # Get grades
                grades.append([int(x) for x in tokens[1:]])
            else:
                # Get Exercises (the header row starts with a non-numeric label)
                exercises += [int(x) for x in tokens[1:]]
    except IOError:
        print("IO Error!")

    finally:
        if fr is not None:
            fr.close()
            print(exercises)
            print(students)
            print(grades)


load_students("data.csv")

Output:

[521, 597, 624, 100]
['1', '2']
[[99, 73, 97, 98], [98, 71, 70, 99]]

Explanation:

The list comprehension `[val.strip() for val in line.strip('\n').split(',') if val.strip()]` strips the whitespace around each value and drops empty values, such as the one produced by the trailing comma on each line.

I also used the same check you already had to identify the first line as the exercise numbers: its first token is not numeric.
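To make the effect of that cleanup concrete, here is a small sketch that compares a plain `split(',')` with the stripped and filtered version on two lines from the question. The helper name `clean_tokens` is made up for this illustration and is not part of the answer's code.

```python
header_line = "student_id, 521, 597, 624, 100,\n"
grade_line = "1, 99, 73, 97, 98,\n"


def clean_tokens(line):
    """Split on commas, strip each value, and drop values that are empty."""
    return [val.strip() for val in line.strip('\n').split(',') if val.strip()]


# A plain split keeps the leading spaces and ends with an empty string,
# because each line finishes with a comma.
raw = header_line.strip('\n').split(',')
print(raw)

# int() tolerates surrounding whitespace but not an empty string,
# so the empty last value is what breaks int(x) in the original loop.
try:
    int(raw[-1])
    empty_value_fails = False
except ValueError:
    empty_value_fails = True
print(empty_value_fails)

print(clean_tokens(header_line))
print(clean_tokens(grade_line))
```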

Nick · Answer 2 · 2020-06-28T09:17:14.893

To slot this into your existing code, you could add an `else` clause to your `if tokens[0].isdigit():` check:

    for line in fr:
        tokens = line.strip('\n').split(',')

        if tokens[0].isdigit():
            # Get Students
            students.append(tokens[0])
            # Get grades
            grades.append([int(x) for x in tokens[1:] if x.strip().isdigit()])
        else:
            exercises = [int(x) for x in tokens[1:] if x.strip().isdigit()]

If you don't need the exercises values to be integers, just use

exercises = tokens[1:]

Also, if there might be other random data in the file, you could make the else be

elif tokens[0] == 'student_id':
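Putting the pieces of this answer together, a self-contained sketch might look like the following. The function name `parse_lines`, the choice to take a list of lines instead of a filename, and the extra unrelated line in the sample are all assumptions made for illustration.

```python
def parse_lines(lines):
    """Hypothetical helper: separate the header row from the grade rows."""
    exercises, students, grades = [], [], []
    for line in lines:
        tokens = line.strip('\n').split(',')
        # Keep only numeric values; this also drops the empty string
        # left behind by the trailing comma.
        numbers = [int(x) for x in tokens[1:] if x.strip().isdigit()]
        if tokens[0].isdigit():
            students.append(tokens[0])
            grades.append(numbers)
        elif tokens[0] == 'student_id':
            exercises = numbers
        # Any other line is ignored.
    return exercises, students, grades


lines = [
    "student_id, 521, 597, 624, 100,\n",
    "1, 99, 73, 97, 98,\n",
    "2, 98, 71, 70, 99,\n",
    "some unrelated note\n",
]
exercises, students, grades = parse_lines(lines)
print(exercises)
print(students)
print(grades)
```

An open file object can be passed in place of the list, since iterating over a file also yields one line at a time.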

edited Jun 28 '20 at 09:17

answered Jun 28 '20 at 07:35


But I need all of them: `students`, `grades`, and `exercises`. The `exercises` are the IDs in the header only, while `grades` start from the second line. – TheUnreal Jun 28 '20 at 07:37
@TheUnreal that's what this will do; if the first entry on the line is a digit it will grab the student and grades, otherwise (presumably first line only) it will grab the exercises – Nick Jun 28 '20 at 07:40
Thanks, not sure why but the grades are missing the last character of the last element (showing `9` instead of `98`) – TheUnreal Jun 28 '20 at 09:07
@TheUnreal that is weird; if I just process a string line as `'1, 99, 73, 97, 98, '` it gets the expected result of `[[99, 73, 97, 98]]`. Note I have changed the answer slightly to deal with the trailing `, ` in the line. See https://rextester.com/EPRB39947 – Nick Jun 28 '20 at 09:17
Not sure why it's not working from the CSV, `[[99 73 97 9] [98 71 70 9]]` – TheUnreal Jun 28 '20 at 09:29
Can you verify what's in `line`? – Nick Jun 28 '20 at 09:50
It's weird that the answer you've accepted works - it's essentially exactly the same code. Anyway, I'm glad you've got a working solution so let's leave it at that. – Nick Jun 28 '20 at 11:53

Balaji Ambresh · Answer 3 · 2020-06-28T11:17:23.390

Is this what you want?

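import pandas as pd

def load_students(filename):
    df = pd.read_csv(filename)  # read the file passed in, not a hard-coded path
    # The trailing comma on each line creates an extra empty column; drop it
    df.drop(columns=df.columns[-1], inplace=True)
    # Header names keep the space after each comma; strip it
    df.columns = [col.strip() for col in df.columns]
    exercises = df.columns[1:].to_numpy()  # header without 'student_id'
    students = df.student_id.to_numpy()
    grades = df.iloc[:, 1:].to_numpy()
    return exercises, students, grades

print(load_students('data.csv'))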

Output:

(array(['521', '597', '624', '100'], dtype=object), array([1, 2]), array([[99, 73, 97, 98],
       [98, 71, 70, 99]]))

edited Jun 28 '20 at 11:17

answered Jun 28 '20 at 07:37


Very close, I just don't need the `student_id` as part of the headers. – TheUnreal Jun 28 '20 at 07:41
Not sure why but the exercises array is missing the `100` (shows only 521 597 624) – TheUnreal Jun 28 '20 at 09:24
@TheUnreal Please check the csv at your end. – Balaji Ambresh Jun 28 '20 at 11:18

Tinu · Answer 4 · 2020-06-28T07:42:57.423

To process CSV files in Python, I would highly recommend using pandas.

Here is an example. Your file (slightly modified: spaces removed from the header and trailing commas removed from the ends of the lines):

student_id,521,597,624,100
1, 99, 73, 97, 98
2, 98, 71, 70, 99

Code:

import pandas as pd
df = pd.read_csv(filename, index_col=0) # student_id becomes an index now
df.keys()

`df.keys()` returns the column labels (a pandas `Index`, not a plain list). Because `student_id` is now the index, these are just the exercise IDs, which is the desired result. Other things also become much simpler: for example, `df['521'].values` gives you a NumPy array with the values of that column.

edited Jun 28 '20 at 07:42

answered Jun 28 '20 at 07:37

Or perhaps `pd.read_csv(filename, index_col=0)` so that student_id becomes an index. – tdelaney Jun 28 '20 at 07:41
And it would be good to mention why you had to remove spaces from the csv. – tdelaney Jun 28 '20 at 07:42
