Pandas gives incorrect result when asking if Timestamp column values have attr astype

Question

With a column containing Timestamp values, I am getting inconsistent results about whether the elements have the attribute astype:

In [30]: o.head().datetime.map(lambda x: hasattr(x, 'astype'))
Out[30]: 
0    False
1    False
2    False
3    False
4    False
Name: datetime, dtype: bool

In [31]: map(lambda x: hasattr(x, 'astype'), o.head().datetime.values)
Out[31]: [True, True, True, True, True]

In [32]: o.datetime.dtype
Out[32]: dtype('<M8[ns]')

In [33]: o.datetime.head()
Out[33]: 
0   2012-09-30 22:00:15.003000
1   2012-09-30 22:00:16.203000
2   2012-09-30 22:00:18.302000
3   2012-09-30 22:03:37.304000
4   2012-09-30 22:05:17.103000
Name: datetime, dtype: datetime64[ns]

If I pick off the first element (or any single element) and ask if it has attr astype, I see that it does, and I even can convert to other formats.

But if I try to do this to the entire column in one go, with Series.map, I get an error claiming that Timestamp objects do not have the attribute astype (though they clearly do).

How can I map this operation over the column with Pandas? Is this a known error?

Version: pandas 0.13.0, numpy 1.8

Added

It appears to be some sort of implicit casting on the part of either pandas or numpy:

In [50]: hasattr(o.head().datetime[0], 'astype')
Out[50]: False

In [51]: hasattr(o.head().datetime.values[0], 'astype')
Out[51]: True

unutbu · Accepted Answer · 2014-10-14T00:26:37.440

Timestamps do not have an astype method. But numpy.datetime64's do.

NDFrame.values returns a numpy array. o.head().datetime.values returns a numpy array of dtype numpy.datetime64, which is why

In [31]: map(lambda x: hasattr(x, 'astype'), o.head().datetime.values)
Out[31]: [True, True, True, True, True]
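A minimal sketch of the same check, assuming a small stand-in Series `s` in place of the question's `o.datetime` column (the name and the two sample values are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `o.datetime` column.
s = pd.Series(pd.to_datetime(["2012-09-30 22:00:15.003",
                              "2012-09-30 22:00:16.203"]))

arr = s.values    # a plain numpy array, no longer a pandas object
first = arr[0]    # a numpy.datetime64 scalar

print(type(arr).__name__, arr.dtype.kind)  # ndarray M
print(type(first).__name__)                # datetime64
print(hasattr(first, "astype"))            # True
```

The elements taken from `.values` are numpy scalars, so they carry numpy's `astype`.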

Note that Series.__iter__ is defined this way:

def __iter__(self):
    if com.is_categorical_dtype(self.dtype):
        return iter(self.values)
    elif np.issubdtype(self.dtype, np.datetime64):
        return (lib.Timestamp(x) for x in self.values)
    elif np.issubdtype(self.dtype, np.timedelta64):
        return (lib.Timedelta(x) for x in self.values)
    else:
        return iter(self.values)

So, when the dtype of the Series is np.datetime64, iteration over the Series returns Timestamps. This is where the implicit conversion takes place.
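The difference can be observed directly, and it also suggests how to apply a numpy-style conversion to the whole column. The following is a rough sketch rather than a definitive recipe: the Series `s` is a hypothetical stand-in for `o.datetime`, and the two conversions shown (calling `astype` on the underlying array, or calling `Timestamp.to_datetime64()` per element) are just two possible options.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `o.datetime` column.
s = pd.Series(pd.to_datetime(["2012-09-30 22:00:15.003",
                              "2012-09-30 22:00:16.203"]))

# Iterating or mapping over the Series hands over each element as a Timestamp.
kinds_from_series = [type(x).__name__ for x in s]
mapped_kinds = s.map(lambda x: type(x).__name__).tolist()

# Iterating over the underlying array yields numpy.datetime64 scalars instead.
kinds_from_values = [type(x).__name__ for x in s.values]

# Option 1: convert the whole column at once on the numpy array.
as_seconds = s.values.astype("datetime64[s]")

# Option 2: convert each Timestamp back to a numpy.datetime64 explicitly.
per_element = [ts.to_datetime64() for ts in s]

print(kinds_from_series, kinds_from_values)
print(as_seconds.dtype)
```

Working on `s.values` keeps the operation vectorised, while the per-element route is only worthwhile when the mapped function needs numpy scalars one at a time.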

edited Oct 14 '14 at 00:26

answered Oct 13 '14 at 23:38

So there is an implicit type conversion depending on whether you access the data within a pandas context, or bring it out to a numpy context and then access it? – ely Oct 13 '14 at 23:45
I'm not sure what you mean by context. Objects are objects. If you iterate through the Series, Timestamps are returned. But if you iterate through Series.values, then datetime64s are returned. – unutbu Oct 13 '14 at 23:46
Why would `o.head().datetime[0]` and `o.head().datetime.values[0]` have different attributes? In other words, if something is a pandas Series of `x`'s, then shouldn't each entry in pandas.Series.values be an `x`? If it is a pandas Series of `Timestamp`s why isn't it a numpy ndarray of `Timestamp`s when you use `values`? – ely Oct 13 '14 at 23:48
When you go to `.values` on any Pandas object, you are leaving 'Pandas' and going back to the underlying numpy realm. That's why iterating over or indexing a Pandas Series returns an object that is of higher value in the Pandas world, while accessing `.values` on *any* Pandas object just returns the underlying numpy object, which is useful if the Pandas features offered are not enough for your needs. – K.-Michael Aye Oct 14 '14 at 00:01
It's still quite misleading. Imagine if this happened when you called `to_dict`. You could just as well say that you are "leaving Pandas" and going back to "pure Python", and then apply some type conversion on the values that will be `dict` values. Then `o.head().datetime.to_dict()[0]` would be different than `o.head().datetime[0]`. In any of these cases, if you are asking for some iterable thing that has values in it as a sequence (whether dict, Series, or ndarray), you expect the entries to be references to a single value in memory. You don't expect to get a different value. – ely Oct 14 '14 at 00:08
In other words, it's reasonable to expect (dare I say least astonishment) that `id(o.datetime[0]) == id(o.datetime.values[0]) == id(o.datetime.to_dict()[0])` when the elements are complex objects. Or *at least* that they satisfy `operator.eq`. But now that I know this is not true, I can just code around it. – ely Oct 14 '14 at 00:11
