I'm new to Python (I usually work with R), so sorry if this is too easy. I am trying to convert a CSV file containing student numbers, course IDs (7 courses in total) and ratings into a dictionary. This differs from the other questions because the key column in the CSV file is not unique: each student number is repeated once for every course that student evaluated. The sample data looks like this:

participant_id;course_id;rating
103;4;2
104;5;3.5
104;7;2.5
108;3;3.5
108;5;2
114;2;4.5
114;5;3.5
114;7;4.5
116;1;2
116;2;3
116;3;3
116;4;4
126;5;3
129;1;4
129;5;3.5
135;1;4.5

The desired outcome looks like this: the student number is the key and the value is a list in which the position corresponds to course_id (course 1 first) and the element is the rating. Every other position is 'NA'.

{'103': ['NA', 'NA', 'NA', 2.0, 'NA', 'NA', 'NA'],
 '104': ['NA', 'NA', 'NA', 'NA', 3.5, 'NA', 2.5],
 '108': ['NA', 'NA', 3.5, 'NA', 2.0, 'NA', 'NA'],
 '114': ['NA', 4.5, 'NA', 'NA', 3.5, 'NA', 4.5],
 '116': [2.0, 3.0, 3.0, 4.0, 'NA', 'NA', 'NA'],
 '126': ['NA', 'NA', 'NA', 'NA', 3.0, 'NA', 'NA'],
 '129': [4.0, 'NA', 'NA', 'NA', 3.5, 'NA', 'NA'],
 '135': [4.5, 'NA', 'NA', 'NA', 'NA', 'NA', 'NA']}

I extracted the student numbers with set(), so I now have the unique values and can build a dictionary with the right keys. However, every course rating is still 'NA', because I don't know how to group the course_id and rating values by student and put them into the list. Here is my code so far:

def ratings(filename):
    with open(filename) as fp: 
        buffer = fp.readlines()
        stu_id = []
        dic = {}

        for i in (buffer):
            stu_id.append(i.split(';')[0])
            stu_id_set = list(set(stu_id))
            for j in stu_id_set:
                dic[j] = ['NA','NA','NA','NA','NA','NA','NA']
    return dic


senera
    Possible duplicate of [Creating a dictionary from a csv file?](https://stackoverflow.com/questions/6740918/creating-a-dictionary-from-a-csv-file) – clubby789 Oct 11 '19 at 10:11

2 Answers

We can do something like this:

def ratings(filename):
    d = {}
    max_col = 0                                     # Number of columns needed. Maximum course_id.
    idx_col_val_list = []

    with open(filename) as fp:
        fp.readline()                               # Ignore "participant_id;course_id;rating"

        for line in fp.readlines():
            line = line.strip()
            idx, col, val = line.split(';')
            col = int(col)
            val = float(val)

            max_col = max(max_col, col)
            idx_col_val_list.append((idx, col, val))

    for idx, col, val in idx_col_val_list:
        if idx not in d:
            d[idx] = ['NA'] * max_col
        d[idx][col - 1] = val

    return d


ans = ratings('input.txt')

assert ans == {
    '103': ['NA', 'NA', 'NA', 2.0, 'NA', 'NA', 'NA'],
    '104': ['NA', 'NA', 'NA', 'NA', 3.5, 'NA', 2.5],
    '108': ['NA', 'NA', 3.5, 'NA', 2.0, 'NA', 'NA'],
    '114': ['NA', 4.5, 'NA', 'NA', 3.5, 'NA', 4.5],
    '116': [2.0, 3.0, 3.0, 4.0, 'NA', 'NA', 'NA'],
    '126': ['NA', 'NA', 'NA', 'NA', 3.0, 'NA', 'NA'],
    '129': [4.0, 'NA', 'NA', 'NA', 3.5, 'NA', 'NA'],
    '135': [4.5, 'NA', 'NA', 'NA', 'NA', 'NA', 'NA'],
}
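The same result can also be built in a single pass over the rows. The following is only a minimal sketch, not the code from the answer above: it assumes the course count is fixed at 7 (as stated in the question), uses the standard csv module with a ';' delimiter, and feeds the reader from an in-memory io.StringIO so that it runs without a file on disk. The names ratings_from_lines, N_COURSES and sample are hypothetical.

```python
import csv
import io

N_COURSES = 7  # assumption: the question states there are 7 courses in total


def ratings_from_lines(lines, n_courses=N_COURSES):
    """Build {participant_id: [rating or 'NA' per course]} in one pass."""
    result = {}
    reader = csv.DictReader(lines, delimiter=';')  # header row supplies the keys
    for row in reader:
        # setdefault creates the 'NA'-filled list the first time an ID is seen
        slots = result.setdefault(row['participant_id'], ['NA'] * n_courses)
        # course_id is 1-based, list positions are 0-based
        slots[int(row['course_id']) - 1] = float(row['rating'])
    return result


# Stand-in for an open file; any iterable of lines works with csv.DictReader
sample = io.StringIO(
    "participant_id;course_id;rating\n"
    "103;4;2\n"
    "104;5;3.5\n"
    "104;7;2.5\n"
)
result = ratings_from_lines(sample)
```

With a real file, the open file object could be passed in place of sample. Fixing the list length up front avoids the second loop, at the cost of an IndexError if a course_id larger than 7 ever appears.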
Dipen Dadhaniya
  • Hello Dipen, thank you for answering the question. I have one additional question: how does fp.readline() remove the participant_id;course_id;rating line? I always thought the only difference between readline() and readlines() was that readlines() reads more lines at once. So after calling readline(), the first line is effectively popped off the file? – senera Oct 11 '19 at 20:35
  • You might want to look at the example in Python's official documentation: https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects. `readline()` reads one line at a time, so we used it to consume the first line and then processed the subsequent lines. – Dipen Dadhaniya Oct 12 '19 at 07:21
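To illustrate the point in the comments: a file object keeps track of a current position, and every read continues from there. Nothing is removed from the file itself; readline() just moves the position past the first line, so a later readlines() starts at the second line. A small sketch, using io.StringIO as a stand-in for an open file (that substitution is an assumption made so the snippet runs on its own):

```python
import io

# In-memory stand-in for open('input.txt'); it behaves like a text file object
fp = io.StringIO("participant_id;course_id;rating\n103;4;2\n104;5;3.5\n")

header = fp.readline()      # reads one line and advances the position
remaining = fp.readlines()  # reads from the current position to the end

print(repr(header))
print(remaining)
```

Here header holds the column names and remaining holds only the two data lines.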

Here's a compact approach using pandas and dictionaries:

import pandas as pd

# the sample file is semicolon-separated, so the separator must be given
df = pd.read_csv('your_csv_file.csv', sep=';')

# build a list of dictionaries
# each element will look like {'participant_id': 104, 'course_id': 5, 'rating': 3.5}
records = df.to_dict(orient='records')

# initialize the final dictionary
# assign a 7-element list to each participant, filled with 'NA'
# (the keys are the integer IDs as read by pandas, not strings)
performance = {r['participant_id']: 7 * ['NA'] for r in records}

# populate the final dictionary
# course_id is 1-based, list positions are 0-based
for r in records:
    performance[r['participant_id']][int(r['course_id']) - 1] = r['rating']
Patrizio G