
With a column containing Timestamp values, I am getting inconsistent results about whether the elements have the attribute astype:

In [30]: o.head().datetime.map(lambda x: hasattr(x, 'astype'))
Out[30]: 
0    False
1    False
2    False
3    False
4    False
Name: datetime, dtype: bool

In [31]: map(lambda x: hasattr(x, 'astype'), o.head().datetime.values)
Out[31]: [True, True, True, True, True]

In [32]: o.datetime.dtype
Out[32]: dtype('<M8[ns]')

In [33]: o.datetime.head()
Out[33]: 
0   2012-09-30 22:00:15.003000
1   2012-09-30 22:00:16.203000
2   2012-09-30 22:00:18.302000
3   2012-09-30 22:03:37.304000
4   2012-09-30 22:05:17.103000
Name: datetime, dtype: datetime64[ns]

If I pick off the first element (or any single element) and ask whether it has the attribute astype, I see that it does, and I can even convert it to other formats.

But if I try to do this to the entire column in one go, with Series.map, I get results claiming that Timestamp objects do not have the attribute astype (though they clearly do).

How can I map this operation over the whole column with pandas? Is this a known bug?

Version: pandas 0.13.0, numpy 1.8

Added

It appears to be some sort of implicit casting on the part of either pandas or numpy:

In [50]: hasattr(o.head().datetime[0], 'astype')
Out[50]: False

In [51]: hasattr(o.head().datetime.values[0], 'astype')
Out[51]: True

ely

1 Answer


Timestamps do not have an astype method, but numpy.datetime64 objects do.
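To see the difference in isolation, here is a minimal sketch that builds one scalar of each kind directly. The timestamp value comes from the question's output; the sketch assumes a recent pandas and NumPy rather than the 0.13/1.8 versions in the question, and uses `Timestamp.to_datetime64` as the way back from a Timestamp to NumPy.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2012-09-30 22:00:15.003")            # pandas scalar
dt64 = np.datetime64("2012-09-30T22:00:15.003", "ns")   # NumPy scalar

print(hasattr(ts, "astype"))    # False: Timestamp subclasses datetime.datetime, not a NumPy scalar
print(hasattr(dt64, "astype"))  # True: NumPy scalars carry astype

# A Timestamp can be unboxed to numpy.datetime64 first, then converted with astype
print(ts.to_datetime64().astype("datetime64[s]"))
```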

NDFrame.values returns a numpy array. o.head().datetime.values is a numpy array of dtype datetime64[ns], and its elements are numpy.datetime64 scalars, which is why

In [31]: map(lambda x: hasattr(x, 'astype'), o.head().datetime.values)
Out[31]: [True, True, True, True, True]
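A small self-contained version of the same comparison, with a hypothetical three-row Series `s` standing in for `o.head().datetime` (again assuming a recent pandas; the dtype may print with a different resolution than `[ns]` on newer versions):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `o.datetime` column (values from its output)
s = pd.Series(pd.to_datetime([
    "2012-09-30 22:00:15.003",
    "2012-09-30 22:00:16.203",
    "2012-09-30 22:00:18.302",
]), name="datetime")

print(s.dtype)            # a datetime64 dtype (datetime64[ns] in the question)
print(type(s.values))     # numpy.ndarray: .values hands back the raw array
print(type(s[0]))         # pandas Timestamp: indexing the Series boxes the value
print(type(s.values[0]))  # numpy.datetime64: raw element of the ndarray
```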

Note that Series.__iter__ is defined this way:

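def __iter__(self):
    if com.is_categorical_dtype(self.dtype):
        return iter(self.values)
    elif np.issubdtype(self.dtype, np.datetime64):
        # box each numpy.datetime64 into a pandas Timestamp
        return (lib.Timestamp(x) for x in self.values)
    elif np.issubdtype(self.dtype, np.timedelta64):
        # box each numpy.timedelta64 into a pandas Timedelta
        return (lib.Timedelta(x) for x in self.values)
    else:
        # all other dtypes: iterate over the underlying array as-is
        return iter(self.values)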

So, when the dtype of the Series is np.datetime64, iteration over the Series returns Timestamps. This is where the implicit conversion takes place.
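For the original goal (applying astype across the whole column), one option is to unbox each Timestamp inside the function passed to map, and another is to skip map and work on the ndarray directly. This is a sketch assuming a recent pandas; `s` is a hypothetical stand-in for the real column, and conversion to whole seconds since the Unix epoch is only an illustrative target format.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for the question's `o.datetime` column
s = pd.Series(pd.to_datetime([
    "2012-09-30 22:00:15.003",
    "2012-09-30 22:03:37.304",
]), name="datetime")

# Iterating over the Series and Series.map both hand out boxed Timestamps
print([type(x).__name__ for x in s])                   # ['Timestamp', 'Timestamp']
print(s.map(lambda x: hasattr(x, "astype")).tolist())  # [False, False]

# Option 1: unbox each element to numpy.datetime64 inside the mapped function,
# then use NumPy's astype (here: truncate to seconds, then to an integer count)
per_element = s.map(lambda x: x.to_datetime64().astype("datetime64[s]").astype("int64"))

# Option 2: convert the whole underlying ndarray at once, no per-element Python calls
vectorized = s.values.astype("datetime64[s]").astype("int64")

print(per_element.tolist() == vectorized.tolist())     # True
```

The vectorized form avoids creating one Timestamp per row, so it is usually the better choice on large columns.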

unutbu
  • So there is an implicit type conversion depending on whether you access the data within a pandas context, or bring it out to a numpy context and then access it? – ely Oct 13 '14 at 23:45
  • I'm not sure what you mean by context. Objects are objects. If you iterate through the Series, Timestamps are returned. But if you iterate through Series.values, then datetime64s are returned. – unutbu Oct 13 '14 at 23:46
  • Why would `o.head().datetime[0]` and `o.head().datetime.values[0]` have different attributes? In other words, if something is a pandas Series of `x`'s, then shouldn't each entry in pandas.Series.values be an `x`? If it is a pandas Series of `Timestamp`s why isn't it a numpy ndarray of `Timestamp`s when you use `values`? – ely Oct 13 '14 at 23:48
  • When you go to `.values` on any Pandas object, you are leaving 'Pandas' and going back to the underlying numpy realm. That's why iterating over or indexing a Pandas Series returns an object that is of higher value in the Pandas world, while accessing .values on *any* Pandas object just returns the underlying numpy object, which is useful if the features Pandas offers are not enough for your needs. – K.-Michael Aye Oct 14 '14 at 00:01
  • It's still quite misleading. Imagine if this happened when you called `to_dict`. You could just as well say that you are "leaving Pandas" and going back to "pure Python", and then apply some type conversion on the values that will be `dict` values. Then `o.head().datetime.to_dict()[0]` would be different than `o.head().datetime[0]`. In any of these cases, if you are asking for some iterable thing that has values in it as a sequence (whether dict, Series, or ndarray), you expect the entries to be references to a single value in memory. You don't expect to get a different value. – ely Oct 14 '14 at 00:08
  • In other words, it's reasonable to expect (dare I say least astonishment) that `id(o.datetime[0]) == id(o.datetime.values[0]) == id(o.datetime.to_dict()[0])` when the elements are complex objects. Or *at least* that they satisfy `operator.eq`. But now that I know this is not true, I can just code around it. – ely Oct 14 '14 at 00:11
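A short check of the point in the last comment, assuming a recent pandas and a hypothetical one-row Series `s`: the boxed and raw elements are different types and different objects, yet they compare equal, so `operator.eq` holds even though identity does not.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row Series standing in for `o.datetime`
s = pd.Series(pd.to_datetime(["2012-09-30 22:00:15.003"]))

boxed = s[0]        # pandas Timestamp, built when the element is accessed
raw = s.values[0]   # numpy.datetime64, read from the ndarray

print(type(boxed) is type(raw))  # False: different classes
print(boxed == raw)              # True: both represent the same instant
print(s[0] is s[0])              # False: each access builds a new Timestamp object
```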