I'm a Pandas newbie and have written some code that should append a dictionary to the last column in a row. The last column is named "Holder".
The part of my code that offends the pandas engine is shown below:
df.loc[df[innercat] == -1, 'Holder'] += str(odata)
I get the error message
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S75') dtype('S75') dtype('S75')
When I run my code replacing the "+=" with "=" the code runs just fine although I only get part of the data I want. What am I doing wrong? I've tried removing the str() cast and it still works as an assignment, not an append.
Further clarification:
Math1 Math1_Notes Physics1 Physics1_Notes Chem1 Chem1_Notes Bio1 Bio1_Notes French1 French1_Notes Spanish1 Spanish1_Notes Holder
-1 Gr8 student 0 0 0 0 -1 Foo NaN
0 0 0 0 0 -1 Good student NaN
0 0 -1 So so 0 0 0 NaN
0 -1 Not serious -1 Hooray -1 Voila 0 NaN
My original dataset contains over 300 columns of data, but I've created an example that captures the spirit of what I'm trying to do. Imagine a college with 300 departments, each offering one (or more) courses. The above data is a micro-sample of that data. For each student, next to their name or admission number, there is a "-1" indicating that they took a certain course. In addition, the next column USUALLY contains notes from that department about that student.
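For anyone who wants a DataFrame to experiment with, the first three rows of the sample can be rebuilt roughly like this. It is only a sketch: I'm assuming the Notes cells are empty (None) wherever the course flag is 0, and that Holder starts out as NaN.

```python
import numpy as np
import pandas as pd

courses = ["Math1", "Physics1", "Chem1", "Bio1", "French1", "Spanish1"]

# One dict per student: course taken -> note left by that department.
# Only the first three sample rows are rebuilt here.
taken_per_student = [
    {"Math1": "Gr8 student", "Spanish1": "Foo"},
    {"Spanish1": "Good student"},
    {"Chem1": "So so"},
]

records = []
for taken in taken_per_student:
    record = {}
    for course in courses:
        record[course] = -1 if course in taken else 0   # -1 marks "took the course"
        record[course + "_Notes"] = taken.get(course)    # assumption: None when there is no note
    record["Holder"] = np.nan                            # assumption: Holder starts empty
    records.append(record)

df = pd.DataFrame(records)
print(df[["Math1", "Math1_Notes", "Spanish1", "Spanish1_Notes", "Holder"]])
```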
Looking at the 1st row of the data above, we have a student who took Math & Spanish, and each department added some comments about the student. For each row, I want to add a dict that summarises the data for each student: basically a JSON summary of each department's entry. Assuming a string of the general form
json_string = {"student name": a, "data": {"notes": b, "Course name": c}}
I intend my code to read my csv, form a dict for each department and APPEND it to the Holder column. Thus for the above student (1st row), there will be 2 dicts, namely
{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}
{"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}
and the final contents of Holder for row 1 will be
{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}} {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}
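To make the target concrete, here is a rough sketch of how the summary for a single student could be assembled. The helper name summarise_row and the plain-dict row are made up for illustration, and the student name is passed in separately since it is not one of the columns shown above.

```python
import json

def summarise_row(name, row, courses):
    """Return one JSON string per course the student took (flag == -1)."""
    entries = []
    for course in courses:
        if row[course] == -1:
            entry = {
                "student name": name,
                "data": {"notes": row.get(course + "_Notes"), "Course name": course},
            }
            entries.append(json.dumps(entry))
    return entries

# Hypothetical cut-down version of the 1st sample row.
row = {
    "Math1": -1, "Math1_Notes": "Gr8 student",
    "Physics1": 0, "Physics1_Notes": None,
    "Spanish1": -1, "Spanish1_Notes": "Foo",
}

# Join the per-department summaries with a space (could be ',' or '|' instead).
holder = " ".join(summarise_row("Peter", row, ["Math1", "Physics1", "Spanish1"]))
print(holder)
```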
Once I can successfully append the data, I will probably add a comma or '|' in between the separate dicts. The line of code that I have written is
df.loc[df[innercat] == -1, 'Holder'] = str(odata)
Whether or not I wrap odata in str(), using the assignment instead of the append operator appears to overwrite all the previous values and write only the last value into Holder, something like
-1 Gr8 student 0 0 0 0 -1 Foo {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}
while I want
-1 Gr8 student 0 0 0 0 -1 Foo {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}} {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}
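Below is a sketch of the append behaviour I'm after, on a cut-down two-course frame. It assumes Holder is initialised to empty strings rather than NaN (my guess is that mixing NaN with strings is part of what "+=" trips over), and it leaves out the student name for brevity.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Math1": [-1, 0],
    "Math1_Notes": ["Gr8 student", None],
    "Spanish1": [-1, -1],
    "Spanish1_Notes": ["Foo", "Good student"],
})

# Assumption: start from empty strings so that concatenation is well defined.
df["Holder"] = ""

for course in ["Math1", "Spanish1"]:
    mask = df[course] == -1                      # students who took this course
    notes = df.loc[mask, course + "_Notes"]      # that department's notes for them
    entries = notes.map(
        lambda n, c=course: json.dumps({"data": {"notes": n, "Course name": c}})
    )
    # Append to whatever is already in Holder instead of overwriting it.
    df.loc[mask, "Holder"] = df.loc[mask, "Holder"] + entries + " "

df["Holder"] = df["Holder"].str.strip()          # drop the trailing separator
print(df["Holder"].tolist())
```

The per-row strings are built with a mask-aligned Series, so each student receives the notes from their own row rather than one shared value.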
For anyone interested in reproducing what I have done, the main part of my code is shown below
count = 0
substrategy = 0
for cat in col_array:
    count += 1
    for innercat in cat:
        if "Notes" in innercat:
            #b = str(df[innercat])
            continue
        substrategy += 1
        c = count
        a = substrategy
        odata = {}
        odata['did'] = a
        odata['id'] = a
        odata['data'] = {}
        odata['data']['notes'] = b
        odata['data']['substrategy'] = a
        odata['data']['strategy'] = c
        df.loc[df[innercat] == -1, 'Holder'] += str(odata)