
I have a NumPy array-like (as stated here, simply an object that can be used to create a NumPy array) and want to create a pandas.Series from it. According to the pandas.Series documentation, the constructor accepts array-likes. Now consider the following MWE.

import numpy as np
import pandas as pd

class ArrayLike:
    def __array__(self, dtype=None):
        return np.asarray([0, 1])

a = ArrayLike()
print(pd.Series(a))
print(pd.Series(np.asarray(a)))

This results in

0    <__main__.ArrayLike object at [...]>
dtype: object
0    0
1    1
dtype: int64

This is not what I would expect: the whole point of an array-like is that it can be converted to a numpy array, so the behaviour when creating the Series directly from my ArrayLike seems weird to me.

Is this intentional on the part of pandas, and if so, what is the reasoning behind it? And is there any way to achieve the behaviour of the second statement when calling pd.Series directly on my object?

502E532E
  • what is your goal? – adir abargil Jul 20 '22 at 11:44
  • I haven't followed this closely, but I believe the test for an `__array__` method is a relatively recent addition to `numpy`. Even in your linked SO, the original 2016 answers don't mention it, and newer ones talk about it in the context of `typing`. A Series does have an `__array__` method (used by `np.asarray(aSeries)`). But `Series.__init__` is much more complex, creating or using an `index` as well as the `data`. Looking for the `__array__` method doesn't have the same priority as with `np.asarray`. – hpaulj Jul 20 '22 at 16:14
  • The `Series` docs do not define `array like` in the same way as `np.array`. There's no mention of the `__array__` method. – hpaulj Jul 20 '22 at 16:20
  • This may be relevant: https://github.com/pandas-dev/pandas/issues/41807 – Dani Mesejo Jul 20 '22 at 17:24
  • @DaniMesejo yes, that is basically the answer I was looking for. So (sadly), pandas does not support array-likes the way I need, and has a different understanding of the term than numpy. – 502E532E Jul 20 '22 at 17:40

1 Answer

The problem seems to be that pandas first checks whether the passed object is list-like, and if it is not, wraps it in a list (see source code):

if index is None:
    if not is_list_like(data):
        data = [data]

As a result, the later check for the __array__ attribute (see source code) fails, because at that point data refers to the wrapping list rather than to the original object:

if hasattr(data, "__array__"):
    # e.g. dask array GH#38645
    data = np.asarray(data)
else:
    data = list(data)
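
The two checks quoted above can be reproduced outside of the Series constructor. The following is only a minimal sketch: it assumes that the public `pandas.api.types.is_list_like` behaves like the internal `is_list_like` used in the constructor, and the `copy=None` parameter is added to `__array__` here only for compatibility with newer NumPy versions (it is not part of the question's class).

```python
import numpy as np
from pandas.api.types import is_list_like


class ArrayLike:
    # Same idea as the class in the question: only __array__ is defined.
    # copy=None is an addition for newer NumPy versions (assumption).
    def __array__(self, dtype=None, copy=None):
        return np.asarray([0, 1])


a = ArrayLike()

# Without __iter__, pandas does not consider the object list-like ...
print(is_list_like(a))                 # False

# ... so it gets wrapped in a list, and a plain list has no __array__.
wrapped = [a]
print(hasattr(wrapped, "__array__"))   # False

# The original object still converts fine when asked explicitly.
print(np.asarray(a).tolist())          # [0, 1]
```

This matches the output in the question: the wrapped object ends up as a single element of an object-dtype Series.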

One solution is to define __iter__:

import numpy as np
import pandas as pd

class ArrayLike:
    def __array__(self, dtype=None):
        return np.asarray([0, 1])

    def __iter__(self):
        return iter(np.asarray([0, 1]))

a = ArrayLike()
print(pd.Series(a))

Output

0    0
1    1
dtype: int64
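
If the data-producing logic should live only in `__array__`, one possible variant is to let `__iter__` delegate to it through `np.asarray(self)`. This is a sketch under assumptions, not something confirmed by the pandas documentation: it relies on NumPy preferring `__array__` over iteration when converting the object, and the `copy=None` parameter is again only there for newer NumPy versions. Note that any checks inside `__array__` will then also run whenever the object is iterated.

```python
import numpy as np
import pandas as pd


class ArrayLike:
    def __array__(self, dtype=None, copy=None):
        # Any validation would live here, in one place only.
        return np.asarray([0, 1])

    def __iter__(self):
        # Delegate to __array__ via np.asarray instead of repeating the logic.
        return iter(np.asarray(self))


a = ArrayLike()
s = pd.Series(a)
print(s)
```

With this class, `pd.Series(a)` should contain the two values 0 and 1 rather than the object itself.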
Dani Mesejo
  • The `__iter__` solution sounds quite good, should work in most cases. However, I have some checks inside `__array__` (raises an Error otherwise), which I do not want to have inside `__iter__`. So sadly this does not solve it for me. – 502E532E Jul 20 '22 at 17:38