Pandas: Import feature vectors from list of dictionaries into dataframe

Question

I have a list of dictionaries, and each dictionary consists of two key-value pairs. The first pair holds a person's name and the second holds a feature vector of the grades that person achieved in different courses. For example:

ListOfGrades=[{'Name':"Mike", 'grades':[98,86,90,72]},{'Name':"Sasha", 'grades':[92,79,85,94]},{'Name':"Beth", 'grades':[89,89,76,90]}]

I want to import this data into a pandas dataframe such that each row has the label of a person's name with each column filled with their grades. In short, I need to get something like this:

Mike    98  86  90  72
Sasha   92  79  85  94
Beth    89  89  76  90

I know I should use pd.DataFrame(ListOfGrades), but I'm not sure how to set it for my purpose. I have seen Convert list of dictionaries to Dataframe, but it's different from the way I want to order my data in the data frame. I have tried this:

for i in ListOfGrades:
    ListOfGrades[i]=str(ListOfGrades[i]['grades'])

# Convert to dataframe
df = pd.DataFrame.from_dict(ListOfGrades, orient='index').reset_index()

But Python throws an error:

 ListOfGrades[i]=str(ListOfGrades[i]['grades'])
 TypeError: list indices must be integers, not dict

Also, I don't know how to add the names to each row so that the first column of my data frame holds the names, as in the layout I showed above. Any help is appreciated!

Just curious, why not have a single dictionary where the keys are the student name and the values are a list of grades? — nbryans, Jun 14 '16 at 23:14
Actually, this is a simple example. In reality, I have a very big list and each dictionary consists of several key-values. I need to keep these dictionaries separate for another purpose in my code. — Miranda, Jun 14 '16 at 23:25

Merlin · Accepted Answer · 2016-06-15T02:00:51.487

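Try this:

import pandas as pd
# Index the records by 'Name', then expand each grades list into one column per grade
df  = pd.DataFrame.from_records(ListOfGrades, index='Name')['grades'].apply(pd.Series)
df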

#         0   1   2   3
# Name                 
# Mike   98  86  90  72
# Sasha  92  79  85  94
# Beth   89  89  76  90

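With grade lists of unequal length, the same call pads the shorter rows with NaN, which also turns the columns into floats:

ListOfGrades=[{'Name':"Mike", 'grades':[98,86,90,72, 34]},{'Name':"Sasha", 'grades':[92,79,85,94,78]},{'Name':"Beth", 'grades':[89,89,76,90]}]

df  = pd.DataFrame.from_records(ListOfGrades, index='Name')['grades'].apply(pd.Series)
df
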
#           0     1     2     3     4
# Name                               
# Mike   98.0  86.0  90.0  72.0  34.0
# Sasha  92.0  79.0  85.0  94.0  78.0
# Beth   89.0  89.0  76.0  90.0   NaN
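For comparison, here is a minimal sketch of another way to get the same layout, not the method from the answer above: pull the names and the grade lists out separately and pass them straight to the `DataFrame` constructor. The variable names `names` and `grades` are only illustrative, and the sketch assumes every dictionary has both a 'Name' and a 'grades' key.

```python
import pandas as pd

ListOfGrades = [
    {'Name': "Mike", 'grades': [98, 86, 90, 72]},
    {'Name': "Sasha", 'grades': [92, 79, 85, 94]},
    {'Name': "Beth", 'grades': [89, 89, 76, 90]},
]

# Assumption: every dictionary has a 'Name' and a 'grades' key.
names = [student['Name'] for student in ListOfGrades]
grades = [student['grades'] for student in ListOfGrades]

# One row per student, labelled by name; columns follow the list order 0, 1, 2, ...
df = pd.DataFrame(grades, index=pd.Index(names, name='Name'))
print(df)
```

This leaves ListOfGrades itself unchanged and skips the per-row `apply(pd.Series)` step, which may matter for a very long list.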

score 1 · Answer 2 · answered Jun 14 '16 at 23:19

The reason you are getting an error is that i is already an item from the list (in this case a dictionary), not an index. To make the loop work, you could change it as follows:

for i in range(len(ListOfGrades)):
    ListOfGrades[i]=str(ListOfGrades[i]['grades'])

This makes i a proper index. However, as I mentioned in my previous comment, there may be more practical ways of solving this problem, such as keeping a single dictionary whose keys are names and whose values are lists of grades. Then you would not need a list of dictionaries.
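A rough sketch of both points, under the assumption that the original list has to stay intact: looping over a list yields its items rather than indices, and a separate name-to-grades dictionary can be built from the list and handed to `pd.DataFrame.from_dict` with `orient='index'`. The name `grades_by_name` is made up for this illustration.

```python
import pandas as pd

ListOfGrades = [
    {'Name': "Mike", 'grades': [98, 86, 90, 72]},
    {'Name': "Sasha", 'grades': [92, 79, 85, 94]},
    {'Name': "Beth", 'grades': [89, 89, 76, 90]},
]

# Iterating over a list yields the items themselves, so `student` is a dict.
for student in ListOfGrades:
    print(student['Name'], student['grades'])

# enumerate() supplies the position as well, when an index really is needed.
for i, student in enumerate(ListOfGrades):
    assert ListOfGrades[i] is student

# Build a separate name -> grades mapping; ListOfGrades is not modified.
grades_by_name = {student['Name']: student['grades'] for student in ListOfGrades}

# orient='index' turns each key into a row label and each list into a row.
df = pd.DataFrame.from_dict(grades_by_name, orient='index')
print(df)
```

Because the mapping is a new object, the list of dictionaries is still available for the rest of the program.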

Thank you for your response. This solved the previous error. But, I still get another error: AttributeError: 'list' object has no attribute 'values'. I'm looking for a way to create a data frame that looks like what I described in my question. — Miranda, Jun 14 '16 at 23:31

score 1 · Answer 3 · answered Jun 14 '16 at 23:50

OK, this approach is a bit of a hack, and it will quickly run into problems if the students don't all have the same number of grades, but essentially you need to build a new list and create the dataframe from that list. For Python 3.5:

new_list = []
for student in ListOfGrades:
    new_list.append({'Name': student['Name'], **{'grade_'+str(i+1): grade for i, grade in enumerate(student['grades'])}})

df = pd.DataFrame(new_list)

This is the dataframe I'm getting:

    Name  grade_1  grade_2  grade_3  grade_4
0   Mike       98       86       90       72
1  Sasha       92       79       85       94
2   Beth       89       89       76       90

If you don't have Python 3.5 but have another version of Python 3, this should work:

new_list = []
for student in ListOfGrades:
    new_list.append(dict(Name = student['Name'], **{'grade_'+str(i+1): grade for i, grade in enumerate(student['grades'])}))

df = pd.DataFrame(new_list)

Edited to add: the above should also work for Python 2.7.

juanpa.arrivillaga


Thank you for your response; it worked! In the list that I have, all the feature vectors have the same length, so I don't think it would cause any problem. – Miranda Jun 15 '16 at 01:03
Just wondering if there is a way to do this without creating a new list – Miranda Jun 15 '16 at 01:11
Yes, by using a generator expression inside the `DataFrame` constructor: `pd.DataFrame({'Name': student['Name'], **{'grade_'+str(i+1): grade for i, grade in enumerate(student['grades'])}} for student in ListOfGrades)` I originally wrote it out as a loop to keep the logic clear. I'm not sure whether you'll actually save memory this way; it depends on how pandas creates the dataframe under the hood. – juanpa.arrivillaga Jun 15 '16 at 01:28
@juanpa.arrivillaga What if the order of the grades matters (or, in general, the order of the feature vector)? Is there a way to keep the grades in their original order? This method puts the grades in a random order. It works fine with this example, but when I apply it to my own data, which has a very large feature vector, the order of the features changes. – Miranda Jun 15 '16 at 02:21
Aha, I hadn't thought of that! Actually, you should go with Merlin's answer below. It is the most straightforward and it should be the accepted answer. – juanpa.arrivillaga Jun 15 '16 at 02:25
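If the `grade_N` column names are still wanted, one possible way to pin down the order is to pass an explicit `columns=` list to the `DataFrame` constructor. This is only a sketch: the thread does not confirm why the order changed (my guess is unordered dicts in older Python versions or key sorting in older pandas), and the names `grade_columns` and `records` are hypothetical. It also assumes all grade lists have the same length, as Miranda says hers do.

```python
import pandas as pd

ListOfGrades = [
    {'Name': "Mike", 'grades': [98, 86, 90, 72]},
    {'Name': "Sasha", 'grades': [92, 79, 85, 94]},
    {'Name': "Beth", 'grades': [89, 89, 76, 90]},
]

# Assumption: every grade list has the same length as the first one.
n_grades = len(ListOfGrades[0]['grades'])
grade_columns = ['grade_' + str(i + 1) for i in range(n_grades)]

# One flat record per student: Name plus grade_1 ... grade_N.
records = [
    dict(zip(grade_columns, student['grades']), Name=student['Name'])
    for student in ListOfGrades
]

# columns= states the column order explicitly instead of relying on dict order.
df = pd.DataFrame(records, columns=['Name'] + grade_columns)
print(df)
```

With the order spelled out in `columns=`, the result does not depend on how the dictionaries happen to order their keys.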
