Why is Pandas not returing a scalar/string instead of a Series when accessing a DataFrame with both a column and an index value?

Question

Assuming the following 3 Pandas Dataframes:

df1:

   ask.c    ask.h    ask.l    ask.o    bid.c    bid.h    bid.l    bid.o complete    mid.c    mid.h    mid.l    mid.o  volume     date 

0  1.39146  1.39148  1.39146  1.39146  1.39138  1.39140  1.39136  1.39136     True  1.39142  1.39144  1.39141  1.39141      5    2014-03-19 09:00:00
.  .  
.  . 
.  .

df2:

   ask.c    ask.h    ask.l    ask.o    bid.c    bid.h    bid.l    bid.o complete    mid.c    mid.h    mid.l    mid.o  volume   date

0  1.39147  1.39148  1.39147  1.39147  1.39138  1.39138  1.39137  1.39137     True  1.39142  1.39142  1.39142  1.39142    4    2014-03-19 09:00:05
1  1.39149  1.39149  1.39148  1.39149  1.39138  1.39141  1.39138  1.39141     True  1.39143  1.39145  1.39143  1.39145    3    2014-03-19 09:00:10
.  .  
.  . 
.  .

df is built with:

df = pandas.concat([df1,df2], ignore_index=True)
df.drop_duplicates("date", keep="first", inplace=True)
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

Why is df["ask.c"]["2014-03-19 09:00:00"] returning a Panda Series instead of a scalar/string value ?

print(type(df1["ask.c"]["2014-03-19 09:00:00"]))¬                                                                              
<class 'pandas.core.series.Series'>

print(df1["ask.c"]["2014-03-19 09:00:00"])¬
Date
2014-03-19 09:00:00    1.39146
Name: ask.c, dtype: object

EDIT:

@IIya V.Schurov's explanation got me on the right track but we are not quite fully there yet. Pandas has something called "DatetimeIndex Partial String Indexing" which is a form slicing.

Hence the expression df["2016-12-07"] would return a series rather than a scalar. The documentation mentions that df["2016-12-07 09:00:00"] isnot a slice nor resolve to one and will raise a KeyError but in my case, it does not raise any KeyError and does resolve like a slice.

Could you make your example [reproducible](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)? Currently I cannot reproduce it as `pd.read_clipboard` fails. — Ilya V. Schurov, Dec 06 '16 at 16:14
If you are using `ignore_index=True` then your index will be 0 to n-1 so something is wrong with `df`. Also, if you want to ensure a scalar is returned use `.at` or `.iat`. — Ted Petrou, Dec 06 '16 at 16:19
@IlyaV.Schurov I had simplified the step in the post. The post is now updated with all operations applied onto the dataframe. To fully reproduce it, you would need to clone + checkout the pull request https://github.com/pydata/pandas-datareader/pull/257 and then run the test "test_oanda_historical_currencypair2:" cd "`pwd`/panda-datareader-oanda.git"; nosetests --with-coverage --cover-package=pandas_datareader --nocapture -v — David Andreoletti, Dec 07 '16 at 00:04
is the Date the index in your dataframe? If you set ignore_index = True, you will lose the index. I don't think you can do the following df["ask.c"]["2014-03-19 09:00:00"] — Just don't rain, Dec 06 '16 at 16:15
You are right. I edited the post showing all operations carried onto the dataframe. — David Andreoletti, Dec 07 '16 at 00:05

Ilya V. Schurov · Accepted Answer · 2016-12-22T22:15:04.483

Adding a new answer to keep previous one for historical purposes.

Minimal example

import pandas as pd
series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-07 09:00:00',
                                                '2016-12-07 09:00:05',
                                                '2016-12-08 09:00:10']))
print(type(series["2016-12-07 09:00:00"]))
# <class 'pandas.core.series.Series'>

The documentation

As the docs discuss, one can use string on DateTimeIndex's __getitem__/[] operator to get so-called Partial String Indexing. It works like this:

series['2016-12-07']
# give me all records for December, 07 2016

2016-12-07 09:00:00    1
2016-12-07 09:00:05    2
dtype: int64

One can explicitly provide a timestamp to get scalar instead of Series:

print(type(series[pd.to_datetime("2016-12-07 09:00:00")]))
# <class 'numpy.int64'>

Resolutions

String used as subscript can be detected as coerceable to slice or not. In the former case, an output of [] is Series. Otherwise, it is scalar (or KeyError). Moreover, if string is detected as coercable to slice, it can be used to index DataFrame as a slice (i.e. row-wise, not column-wise):

df = pd.DataFrame(series)
print(df["2016-12-07"])

                     0
2016-12-07 09:00:00  1
2016-12-07 09:00:05  2

When string can be considered as a slice? It depends on the resolution of this string and DateTimeIndex instance's resolution. Examples:

series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-06 23:59:00'
                                                '2016-12-07 01:00:00',
                                                '2016-12-07 01:01:00',
                                                '2016-12-07 01:02:00']))
print(series.index.resolution)
# minute
print(type(series["2016-12-06"]))
# <class 'pandas.core.series.Series'>
print(type(series["2016-12-07"]))
# <class 'pandas.core.series.Series'>
print(type(series["2016-12-07 01"]))
# <class 'pandas.core.series.Series'>
print(type(series["2016-12-07 01:01"]))
# <class 'numpy.int64'>

So generally string is a slice if its resolution is less than the resolution of DateTimeIndex instance. However, there is an exception for second resolution.

series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-06 23:59:00'
                                                '2016-12-07 01:00:00',
                                                '2016-12-07 01:01:00',
                                                '2016-12-07 01:02:01']))
print(series.index.resolution)
# second
# this is due to 1 second at the last timestamp
print(type(series["2016-12-07 01:01:00"]))
# <class 'pandas.core.series.Series'>

I can't answer, why this decision was made (see the discussion here for the references in the code), but it definitely the OP's case (as df2['date']'s resolution is second). So every timestamp with second resolution will be resolved to slice.

UPD. Finally the behavior of slicing for second resolution is fixed (in currrent master, pending to 0.20 release).

@IIya V. Schurov It is exactly my case. I had other datasets with minutes resolution/periodicity and Pandas returned a scalar (which triggers this particular question on SO). — David Andreoletti, Dec 09 '16 at 15:39
Note that currently it is a confirmed pandas [bug](https://github.com/pandas-dev/pandas/issues/14826), so this behaviour may change in the future. Thanks for interesting question. — Ilya V. Schurov, Dec 09 '16 at 16:45

score 0 · Answer 2 · edited May 23 '17 at 11:45

This answer is kept for historical purposes only. Please, refer to the other answer.

Okay, I got it. First of all, here is a simple reproduction:

import pandas as pd
series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-07 09:00:00',
                                                '2016-12-07 09:00:05',
                                                '2016-12-08 09:00:10']))
print(type(series["2016-12-07 09:00:00"]))
# <class 'pandas.core.series.Series'>

My first idea was to replace the string with an actual Timestamp instance.

print(type(series[pd.to_datetime("2016-12-07 09:00:00")]))
# <class 'numpy.int64'>

Woah, it works! Then I tried to figure out, why pandas behave this way, and I runned the code above in the debugger. My first suggestion was it treats strings as sequences and therefore produces new Series instead of scalar value, but in fact it is more interesting. This behavior is due to fact that you indeed can pick a series of values with single string, like this:

series['2016-12-07']

2016-12-07 09:00:00    1
2016-12-07 09:00:05    2
dtype: int64

So pandas can understand that a string can represent date-time with some precision and if DateTimeIndex has several records that agrees with the string (up to precision given by the string), it returns all of them.

As it cannot understand, do you really want to get an interval or just one record from the datetime-like string, it returns an interval. Thus Series instead of scalar.

Morale: date-time is tricky. Explicit is better then implicit. Use explicit conversion to Timestamp and the Force will be with you.

EDIT. (Re: EDIT in the question.)

It seems that the behaviour depends on periodicity of index. If I make index periodic, I will get the following output:

series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-07 09:00:00',
                                                '2016-12-08 09:00:00',
                                                '2016-12-09 09:00:00']))
print(type(series["2016-12-07 09:00:00"]))

# <class 'numpy.int64'>

Even more interesting:

df = pd.DataFrame(series)
df['2016-12-07 09:00:00']

KeyError: '2016-12-07 09:00:00'

I believe that this is the thing described in the docs:

Warning The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)

However, it is seem to me a bit strange that behaviour of Series is not the same as in DataFrame and that it depends so heavily on periodicity of index.

@IIya V. Schurov Your logic is correct in general and matches most Pandas's documenation about it (see details in my updated post). `series[pd.to_datetime("2016-12-07 09:00:00")]` is the way to go. Thanks for your explanation. — David Andreoletti, Dec 08 '16 at 00:35

Why is Pandas not returing a scalar/string instead of a Series when accessing a DataFrame with both a column and an index value?

2 Answers2

Minimal example

The documentation

Resolutions