
My current code is

from numpy import *

def buildRealDataObject(x):
    loc = array(x[0])
    trueClass = x[1]
    evid = ones(len(loc))
    evid[isnan(loc)] = 0
    loc[isnan(loc)] = 0
    return DataObject(location=loc, trueClass=trueClass, evidence=evid)

if trueClasses is None:
    trueClasses = zeros(len(dataset), dtype=int8).tolist()    
realObjects = list(map(lambda x: buildRealDataObject(x), zip(dataset, trueClasses)))

and it is working. What I expect is to create one realObject for each row of the DataFrame dataset, combined with the corresponding entry of trueClasses. I am not really sure why it works, though, because if I run list(zip(dataset, trueClasses)) I just get something like [(0, 0.0), (1, 0.0)]. The two columns of dataset are called 0 and 1. So my first question is: why is this working, and what is happening here?

However, I think this might still be wrong on some level, because it might only work due to some "clever implicit transformation" on the part of pandas. Also, for the line evid[isnan(loc)] = 0 I now get the error

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

How should I rewrite this code instead?

Make42
  • Your code looks mystical, e.g. where is `isnan` from? It is also not very pythonic. You had better give some input and tell people what your expected output is. – zyxue May 25 '17 at 18:55
  • @zyxue: It's from numpy. – Make42 May 25 '17 at 19:15
  • Give minimal executable code so people can copy/paste it to see the exception you describe; otherwise people have to guess what dataset is and what DataObject is. – jonnybazookatone May 26 '17 at 07:54

1 Answer


Currently, zip works on columns instead of rows, because iterating over a DataFrame yields its column labels. Use one of the methods from "Pandas convert dataframe to array of tuples" to make zip work on rows instead. For example, substitute

zip(dataset, trueClasses)

with

zip(dataset.values, trueClasses)
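To see the difference, here is a minimal sketch. The two-row DataFrame and the trueClasses list are made up for illustration and are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows, columns labelled 0 and 1
dataset = pd.DataFrame([[1.0, np.nan], [3.0, 4.0]])
trueClasses = [0, 0]

# Iterating over a DataFrame yields its column labels, not its rows
by_columns = list(zip(dataset, trueClasses))

# .values is a 2-D NumPy array; iterating over it yields one row at a time
by_rows = list(zip(dataset.values, trueClasses))

print(by_columns)     # column labels paired with the classes
print(by_rows[0][0])  # the first row as a NumPy array
```

With only two columns, the first zip stops after two pairs no matter how many rows dataset has, which is why its output looks like the list in the question.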

Considering this post: if you already have l = list(data_train.values) for some reason, then zip(l, eClass) is faster than zip(dataset.values, trueClasses). However, if you don't, then in my tests the conversion takes too long to be worth it.
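Putting it together, the code from the question could be rewritten roughly as follows. This is only a sketch under assumptions: DataObject is replaced by a hypothetical namedtuple stand-in, the sample data is made up, and all columns are assumed to be numeric. The explicit cast to float is there because np.isnan raises the TypeError quoted in the question when it is given an object-dtype array.

```python
from collections import namedtuple

import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataObject class from the question
DataObject = namedtuple("DataObject", ["location", "trueClass", "evidence"])


def build_real_data_object(row, true_class):
    # Copy the row as floats so that np.isnan works and the DataFrame is not modified
    loc = np.array(row, dtype=float)
    missing = np.isnan(loc)
    evid = np.where(missing, 0.0, 1.0)  # 1 where a value is present, 0 where it is NaN
    loc[missing] = 0.0
    return DataObject(location=loc, trueClass=true_class, evidence=evid)


# Made-up sample data with one missing value
dataset = pd.DataFrame([[1.0, np.nan], [3.0, 4.0]])
true_classes = [0] * len(dataset)

# zip over dataset.values pairs each row with its class
real_objects = [
    build_real_data_object(row, cls)
    for row, cls in zip(dataset.values, true_classes)
]
```

Passing the row and the class as two arguments avoids the x[0] / x[1] indexing, and the explicit numpy import makes it clear where isnan comes from.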

Make42