
I have an array like below:

np.array(["hello","world",{"a":5,"b":6,"c":8},"usa","india",{"d":9,"e":10,"f":11}])

and a pandas DataFrame like below:

df = pd.DataFrame({'A': ["hello","world",{"a":5,"b":6,"c":8},"usa","india",{"d":9,"e":10,"f":11}]})

When I apply np.isreal to the DataFrame:

df.applymap(np.isreal)
Out[811]: 
       A
0  False
1  False
2   True
3  False
4  False
5   True

When I call np.isreal on the NumPy array:

np.isreal( np.array(["hello","world",{"a":5,"b":6,"c":8},"usa","india",{"d":9,"e":10,"f":11}]))
Out[813]: array([ True,  True,  True,  True,  True,  True], dtype=bool)

I must be using np.isreal for the wrong use case, but can you help me understand why the results are different?

BENY
  • This is even more confusing to me than the answer you gave to trigger this question! :). Not only why is it different, but why does it differentiate between strings and dicts in `pandas`? – roganjosh Oct 20 '17 at 20:40
  • @roganjosh I just had time to test it. Even if we use it the wrong way, we would expect the same wrong answer, but this one... LOL – BENY Oct 20 '17 at 20:42
  • Pandas is a bit of a red herring here; it just uses the element-wise behavior, e.g. `[np.isreal(aa) for aa in np.array(["hello","world",{"a":5,"b":6,"c":8},"usa","india",{"d":9,"e":10,"f":11}])]` – Andy Hayden Oct 20 '17 at 20:42

3 Answers


A partial answer is that isreal is only intended to be used on an array-like as its first argument.

You want to use isrealobj on each element to get the behavior you see here:

In [11]: a = np.array(["hello","world",{"a":5,"b":6,"c":8},"usa","india",{"d":9,"e":10,"f":11}])

In [12]: a
Out[12]:
array(['hello', 'world', {'a': 5, 'b': 6, 'c': 8}, 'usa', 'india',
       {'d': 9, 'e': 10, 'f': 11}], dtype=object)

In [13]: [np.isrealobj(aa) for aa in a]
Out[13]: [True, True, True, True, True, True]

In [14]: np.isreal(a)
Out[14]: array([ True,  True,  True,  True,  True,  True], dtype=bool)

That does leave the question: what does np.isreal do on something that isn't array-like? E.g.:

In [21]: np.isrealobj("")
Out[21]: True

In [22]: np.isreal("")
Out[22]: False

In [23]: np.isrealobj({})
Out[23]: True

In [24]: np.isreal({})
Out[24]: True
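A minimal sketch of why `np.isrealobj` says `True` for both `""` and `{}`: as I understand it, it only looks at the dtype the input converts to and returns `True` whenever that dtype is not complex, without inspecting any values. `isrealobj_sketch` is a hypothetical helper written for illustration; it approximates NumPy's own check rather than reproducing its source.

```python
import numpy as np

def isrealobj_sketch(x):
    """Hypothetical approximation of np.isrealobj: True unless x has a complex dtype."""
    # Only the dtype of the converted input matters; element values are never read.
    dtype = np.asarray(x).dtype
    return not issubclass(dtype.type, np.complexfloating)

for value in ["", {}, 1.5, 1 + 0j]:
    # dtype, sketch result, NumPy's result
    print(repr(value), np.asarray(value).dtype,
          isrealobj_sketch(value), np.isrealobj(value))
```

The last case is the flip side: `1 + 0j` has a zero imaginary part, yet both report `False`, because only the complex dtype is checked.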

It turns out this stems from .imag since the test that isreal does is:

return imag(x) == 0   # note imag == np.imag

and that's it.
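The check quoted above can be mirrored directly to confirm that it reproduces the results in this thread. `isreal_sketch` is a hypothetical name; its body is just the `imag(x) == 0` comparison quoted from NumPy's source.

```python
import numpy as np

def isreal_sketch(x):
    """Hypothetical stand-in for np.isreal: compare the imaginary part with 0."""
    return np.imag(x) == 0

# dtype=object is also what NumPy infers for this mix of str and dict.
a = np.array(["hello", "world", {"a": 5, "b": 6, "c": 8},
              "usa", "india", {"d": 9, "e": 10, "f": 11}], dtype=object)

print(isreal_sketch(a), np.isreal(a))      # the whole object array
print(isreal_sketch(""), np.isreal(""))    # a bare string
print(isreal_sketch({}), np.isreal({}))    # a bare dict
```

On older NumPy versions the string case may also emit a warning about a failed elementwise comparison, but the result is still false.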

In [31]: np.imag(a)
Out[31]: array([0, 0, 0, 0, 0, 0], dtype=object)

In [32]: np.imag("")
Out[32]:
array('',
      dtype='<U1')

In [33]: np.imag({})
Out[33]: array(0, dtype=object)

This looks up the .imag attribute on the array.

In [34]: np.asanyarray("").imag
Out[34]:
array('',
      dtype='<U1')

In [35]: np.asanyarray({}).imag
Out[35]: array(0, dtype=object)

I'm not sure why this isn't set in the string case yet...
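One plausible explanation for the string case, which I have only checked by experiment: for a non-complex array, `.imag` seems to return an array of zeros with the *same dtype* as the input. The zero of a `'<U1'` dtype is the empty string, while the zero of an object array is the integer `0`, so only the object case compares equal to `0`.

```python
import numpy as np

# For non-complex input, .imag appears to be a zero-filled array of the
# input's own dtype, so "zero" means different things for different dtypes.
for value in ["", {}]:
    arr = np.asanyarray(value)
    imag = arr.imag
    # input dtype, .imag dtype, the zero value, and whether it equals 0
    print(arr.dtype, imag.dtype, repr(imag[()]), imag[()] == 0)

# The same zero values, built directly with np.zeros for comparison.
print(repr(np.zeros((), dtype="<U1")[()]))
print(repr(np.zeros((), dtype=object)[()]))
```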

Andy Hayden
  • This doesn't fully answer the question of what isreal is doing here, but the docs make it clear the first argument should be array-like (so I think all bets are off...) and we have to look at the code – Andy Hayden Oct 20 '17 at 20:49
  • Thank you, this makes me aware that not every `numpy` function can be borrowed for pandas calculations without thinking – BENY Oct 20 '17 at 20:52
  • @Wen it turns out it comes from interesting behavior of `.imag` https://github.com/numpy/numpy/blob/v1.13.0/numpy/lib/type_check.py#L221-L249 – Andy Hayden Oct 20 '17 at 20:54
  • I found it... just not sure about `.imag` – BENY Oct 20 '17 at 20:56
  • @Wen Ah, that actually makes a *little* more sense than my theory, but it's still definitely peculiar. I had assumed that `isreal` was a ufunc but clearly it is not. – Iguananaut Oct 20 '17 at 21:03
  • @Iguananaut I think the imag property is set somewhere in C code (in constructor of array object)... – Andy Hayden Oct 20 '17 at 21:12
  • @Wen will update if I find where imag is set in the C constructor. :) – Andy Hayden Oct 20 '17 at 21:15
  • @AndyHayden this is so interesting, you brought me deep into it. :-) I appreciate your help – BENY Oct 20 '17 at 21:17

I think this is a small bug in NumPy, to be honest. Here pandas is just looping over each item in the column and calling np.isreal() on it, e.g.:

>>> np.isreal("a")
False
>>> np.isreal({})
True

I think the paradox here has to do with how np.real() treats inputs of dtype=object. My guess is that it's taking the object pointer and treating it like an int, so of course np.isreal(<some object>) returns True. For an array of mixed types like np.array(["A", {}]), the array has dtype=object, so np.isreal() treats all the elements (including the strings) the way it would treat anything with dtype=object.

To be clear, I think the bug is in how np.isreal() treats arbitrary objects in a dtype=object array, but I haven't confirmed this explicitly.
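A small experiment that supports the dtype=object part of this guess (without confirming the pointer theory): the same string gets a different answer depending on whether it sits in a unicode array or in an object array. The array names are illustrative only.

```python
import numpy as np

strings_only = np.array(["A", "B"])          # NumPy picks a unicode dtype ('<U1')
mixed = np.array(["A", {}], dtype=object)    # a dict in the mix means dtype=object

print(strings_only.dtype, np.isreal(strings_only))
print(mixed.dtype, np.isreal(mixed))

# Element by element, "A" is passed on its own and becomes a '<U1' array again.
print([bool(np.isreal(x)) for x in mixed])
```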

Iguananaut
  • we were both very fast, you just beat me :) – Andy Hayden Oct 20 '17 at 20:51
  • It's a strange question but definitely an interesting one? – Iguananaut Oct 20 '17 at 20:52
  • Speaking of the strangeness of the question: what is the purpose of the OP here? Why is @wen trying to use isreal on these strings and dictionaries, which are not even numbers? According to the NumPy documentation, the purpose of isreal is to tell whether something is complex or not. – Semihcan Doken Oct 20 '17 at 20:59
  • @Semihcan I raised it as a comment to an answer [here](https://stackoverflow.com/a/46856468/4799172) because I couldn't understand why `isreal` could be used for differentiating between dicts and strings. – roganjosh Oct 20 '17 at 21:01
  • @Semihcan sorry, I originally gave the wrong link. Fixed that in my last comment. I'm also curious about the origin of this as a filter on dataframe columns for this purpose. – roganjosh Oct 20 '17 at 21:08
  • Thank you, your answer makes a lot of sense too – BENY Oct 20 '17 at 21:14

There are a couple of things going on here. The first, pointed out by the previous answers, is that np.isreal acts strangely when passed objects. However, I think you are also confused about what applymap is doing. The question "Difference between map, applymap and apply methods in Pandas" is always a great reference.

In this case what you think you are doing is actually:

df.apply(np.isreal, axis=1)

This essentially calls np.isreal(df), whereas df.applymap(np.isreal) calls np.isreal on each individual element of df, e.g.:

np.isreal(df.A)

array([ True,  True,  True,  True,  True,  True], dtype=bool)

np.array([np.isreal(x) for x in df.A])

array([False, False,  True, False, False,  True], dtype=bool)
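For readers without pandas at hand, the same per-element behavior can be reproduced in plain NumPy with `np.vectorize`, which calls the function once per element, much like `applymap` does for each cell. This is a sketch of the equivalence, not how pandas implements `applymap`; `col`, `whole` and `per_element` are illustrative names.

```python
import numpy as np

col = np.array(["hello", "world", {"a": 5, "b": 6, "c": 8},
                "usa", "india", {"d": 9, "e": 10, "f": 11}], dtype=object)

# Whole-array call: the object dtype decides, so every entry is True.
whole = np.isreal(col)

# Per-element call: each string is converted to a unicode array on its own.
per_element = np.vectorize(np.isreal, otypes=[bool])(col)

print(whole)
print(per_element)
```

In pandas 2.1 and later, `DataFrame.applymap` is deprecated in favor of `DataFrame.map`, which applies the function element-wise in the same way.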
Grr
  • Related to this earlier comment https://stackoverflow.com/questions/46856988/np-isreal-behavior-different-in-pandas-dataframe-and-numpy-array/46857114#comment80661655_46856988 – Andy Hayden Oct 20 '17 at 21:14