I'm a pandas newbie and have written some code that should append a dictionary to the last column in a row. The last column is named "Holder".

The part of my code that offends the pandas engine is shown below:

df.loc[df[innercat] == -1, 'Holder'] += str(odata)

I get the error message

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S75') dtype('S75') dtype('S75')

When I run my code replacing the "+=" with "=" the code runs just fine although I only get part of the data I want. What am I doing wrong? I've tried removing the str() cast and it still works as an assignment, not an append.

Further clarification:

Math1 Math1_Notes Physics1 Physics1_Notes Chem1 Chem1_Notes Bio1 Bio1_Notes French1 French1_Notes Spanish1 Spanish1_Notes Holder
-1    Gr8 student  0                        0                  0              0                    -1        Foo            NaN
0                  0                        0                  0              0                    -1        Good student   NaN
0                  0                       -1                  So so          0                    0         0              NaN
0                 -1        Not serious    -1                  Hooray        -1                    Voila         0          NaN

My original dataset contains over 300 columns of data, but I've created an example that captures the spirit of what I'm trying to do. Imagine a college with 300 departments, each offering one (or more) courses. The above data is a micro-sample of that data. For each student, next to their name or admission number, there is a "-1" indicating that they took a certain course. In addition, the next column USUALLY contains notes from that department about that student.

Looking at the 1st row of the data above, we have a student who took Math and Spanish, and each department added some comments about the student. For each row, I want to add a dict that summarises the data for each student: basically a JSON summary of each department's entry. Assuming a string of the general form

json_string = {"student name": a, "data": {"notes": b, "Course name": c}}

I intend my code to read my CSV, form a dict for each department, and APPEND it to the Holder column. Thus for the above student (1st row), there will be 2 dicts, namely

{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}
{"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}

and the final contents of Holder for row 1 will be

{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}} {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}

Once I can successfully append the data, I will probably add a comma or '|' between the separate dicts. The line of code that I have written is

df.loc[df[innercat] == -1, 'Holder'] = str(odata)

Whether or not I wrap the value in str(), using assignment instead of the append operator overwrites all the previous values and writes only the last one into Holder, something like

-1    Gr8 student  0                        0                  0              0                    -1        Foo            {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}

while I want

-1    Gr8 student  0                        0                  0              0                    -1        Foo            {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}} {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}

For anyone interested in reproducing what I have done, the main part of my code is shown below

count = 0
substrategy = 0
for cat in col_array:
    count += 1
    for innercat in cat:        
        if "Notes" in innercat:
            #b = str(df[innercat])
            continue
        substrategy += 1
        c = count
        a = substrategy
        odata = {}
        odata['did'] = a
        odata['id'] = a        
        odata['data'] = {}
        odata['data']['notes'] = b
        odata['data']['substrategy'] = a
        odata['data']['strategy'] = c
        df.loc[df[innercat] == -1, 'Holder'] += str(odata)
user1801060

1 Answer

Is this what you want?

In [190]: d1 = {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}

In [191]: d2 = {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}

In [192]: import json

In [193]: json.dumps(d1)
Out[193]: '{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}'

In [194]: df
Out[194]:
   Investments_Cash  Holder
0                 0     NaN
1                 0     NaN
2                -1     NaN

In [196]: df.Holder = ''

In [197]: df.loc[df.Investments_Cash == -1, 'Holder'] += json.dumps(d1)

In [198]: df.loc[df.Investments_Cash == -1, 'Holder'] += ' ' + json.dumps(d2)

In [199]: df
Out[199]:
   Investments_Cash
                   Holder
0                 0
1                 0
2                -1  {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}} {"student name": "Peter", "data": {"notes": "Foo", "Course nam...
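For a version that runs on its own, here is a minimal sketch of the same idea using `.loc`. The frame below is invented for illustration (the `Math1` and `Spanish1` columns only mirror the question's layout). The key step is replacing the `NaN` placeholder with an empty string, so that `+=` has a string to concatenate onto:

```python
import json

import pandas as pd

# Hypothetical miniature of the question's data: -1 marks a course taken.
df = pd.DataFrame({"Math1": [-1, 0, 0], "Spanish1": [-1, -1, 0]})

# Start from empty strings rather than NaN so that += concatenates text.
df["Holder"] = ""

d1 = {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}
d2 = {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}

# Append to the matching rows only; other rows keep their empty string.
df.loc[df["Math1"] == -1, "Holder"] += json.dumps(d1)
df.loc[df["Spanish1"] == -1, "Holder"] += " " + json.dumps(d2)

print(df.loc[0, "Holder"])
```

Rows that match neither condition are left with an empty string, which is easier to test for later than a mix of `NaN` and text.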

NOTE: working with the Holder column will be painful later on, because space-separated dicts are not a standard format: you won't be able to parse them back without extra preprocessing (for example, splitting with complex regular expressions).
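As a quick illustration of that parsing problem, the sketch below joins two JSON objects with a space and tries to read the result back with the standard `json` module. The variable names are only for illustration:

```python
import json

d1 = {"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}
d2 = {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}

# Two JSON objects joined by a space are not one valid JSON document.
concatenated = json.dumps(d1) + " " + json.dumps(d2)

try:
    json.loads(concatenated)
    parsed_ok = True
except json.JSONDecodeError:
    # The parser reads the first object and then rejects the extra data.
    parsed_ok = False

print(parsed_ok)  # False
```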

So I would strongly recommend converting a list of dicts to JSON instead. You'll be able to read it back with json.loads():

In [201]: df.loc[df.Investments_Cash == -1, 'Holder'] = json.dumps([d1, d2])

In [202]: df
Out[202]:
   Investments_Cash
                   Holder
0                 0
1                 0
2                -1  [{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}}, {"student name": "Peter", "data": {"notes": "Foo", "Course n...

Parse it back:

In [204]: lst = json.loads(df.loc[2, 'Holder'])

In [205]: lst
Out[205]:
[{'data': {'Course name': 'Math1', 'notes': 'Gr8 student'},
  'student name': 'Peter'},
 {'data': {'Course name': 'Spanish1', 'notes': 'Foo'},
  'student name': 'Peter'}]

In [206]: lst[0]
Out[206]:
{'data': {'Course name': 'Math1', 'notes': 'Gr8 student'},
 'student name': 'Peter'}

In [207]: lst[1]
Out[207]: {'data': {'Course name': 'Spanish1', 'notes': 'Foo'}, 'student name': 'Peter'}
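To apply the same idea to the wide layout in the question (many courses per row), one option is to build the list row by row and serialize it once, instead of appending inside a column-wise loop. The sketch below is not the asker's code: the `Student` column, the `summarize` helper, and the `<course>_Notes` pairing rule are assumptions made for illustration.

```python
import json

import pandas as pd

# Hypothetical sample: each course column is paired with "<course>_Notes".
df = pd.DataFrame({
    "Student": ["Peter", "Mary"],
    "Math1": [-1, 0],
    "Math1_Notes": ["Gr8 student", None],
    "Spanish1": [-1, -1],
    "Spanish1_Notes": ["Foo", "Good student"],
})

# Course columns are those that have a matching notes column.
courses = [c for c in df.columns if f"{c}_Notes" in df.columns]


def summarize(row):
    """Return a JSON array with one object per course the student took."""
    entries = [
        {
            "student name": row["Student"],
            "data": {"notes": row[f"{course}_Notes"], "Course name": course},
        }
        for course in courses
        if row[course] == -1
    ]
    return json.dumps(entries)


# One JSON string per row, readable later with json.loads().
df["Holder"] = df.apply(summarize, axis=1)

print(json.loads(df.loc[0, "Holder"]))
```

A row-wise `apply` is not the fastest approach for very large frames, but it keeps each student's entries together and produces a value that round-trips through `json.loads()`.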
MaxU - stand with Ukraine
  • @user1801060, it's not yet clear. Give an example of what your `dict1 dict2 dict3` should look like: is it a concatenated string representation of dicts? Is it a list of dicts? Is it something different? Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and change your question accordingly – MaxU - stand with Ukraine Jul 30 '16 at 16:01
  • Updated the question – user1801060 Jul 30 '16 at 17:03
  • @user1801060, what is `{"student name": "Peter", "data": {"notes": "Gr8 student", "Course name": "Math1"}} {"student name": "Peter", "data": {"notes": "Foo", "Course name": "Spanish1"}}`? It's not a pandas/numpy/python dtype. Is it a string? – MaxU - stand with Ukraine Jul 30 '16 at 17:09
  • It is created as a dict, then cast to a string. Notice that there are 2 dicts, separated by a space – user1801060 Jul 30 '16 at 17:20
  • That's close to it. However, a student can take more than 2 courses. In the full-scale model, there's at least one entry (dict), but there can be up to 50 separate dicts per row. Also, "Holder" doesn't contain a -1; by default it's "NaN" – user1801060 Jul 30 '16 at 19:49
  • @user1801060, so just collect all your dicts into a list and convert this list to JSON string: `json.dumps(list_of_dicts)` - that's it – MaxU - stand with Ukraine Jul 30 '16 at 19:51