Why does the index seem to get appended to the first column of a .loc boolean selected row of a dataframe?

Dataframe:

       date  price
0  20180926    100
1  20180925     99
2  20180924     98
3  20180923     97

Code:

import pandas as pd
d = {'date': ['20180926', '20180925','20180924','20180923'], 'price': [100,99,98,97]}
df = pd.DataFrame(d)
a = df.loc[df['date'] == '20180924']
print(a['date'])

Yields:

2    20180924
Name: date, dtype: object

The "2" index seems to be automatically appended to the front of the 'date' field.

Whereas:

b = a.iloc[0]['date']
print(b)

Yields:

20180924

I expected both methods to yield the same result as 'b'.

tommylicious

2 Answers

When you pass a list-like selector into loc or iloc on a dataframe, a dataframe is always returned, even if only one row matches. Here df['date'] == '20180924' is a boolean Series (one True/False value per row), which loc treats as a row mask.

type(df.loc[df['date'] == '20180924'])  # pandas.core.frame.DataFrame
type(df.loc[[0]])                       # pandas.core.frame.DataFrame
type(df.iloc[[0]])                      # pandas.core.frame.DataFrame
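
As a minimal sketch of the rule above, using the question's data (the variable names mask, by_mask, by_list and by_label are only illustrative), the following compares what each kind of selector returns:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['20180926', '20180925', '20180924', '20180923'],
    'price': [100, 99, 98, 97],
})

mask = df['date'] == '20180924'  # boolean Series, one value per row
by_mask = df.loc[mask]           # list-like selector -> DataFrame
by_list = df.loc[[2]]            # list of labels     -> DataFrame
by_label = df.loc[2]             # single label       -> Series

for name, obj in [('mask', mask), ('by_mask', by_mask),
                  ('by_list', by_list), ('by_label', by_label)]:
    print(name, type(obj).__name__)

# The selected row keeps its original index label, which is the "2"
# printed in front of the date in the question.
print(list(by_mask.index))
```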

However, if you pass a single label into loc, or a single integer position into iloc, on a dataframe (assuming your dataframe is not multiindexed), the result is a Series:

type(df.loc[0])   # pandas.core.series.Series
type(df.iloc[0])  # pandas.core.series.Series

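For this dataframe, df.loc[0] and df.iloc[0] are identical because the default index labels coincide with the row positions. This is not always the case: loc selects by label, while iloc selects by position. This is the result: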

date     20180926
price         100
Name: 0, dtype: object
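
To see where loc and iloc diverge, the sketch below reuses the filtered frame a from the question, whose only index label is 2. The names row_by_position, row_by_label and missing_label_raised are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['20180926', '20180925', '20180924', '20180923'],
    'price': [100, 99, 98, 97],
})
a = df.loc[df['date'] == '20180924']  # one row, index label 2

row_by_position = a.iloc[0]  # first (and only) row by position
row_by_label = a.loc[2]      # the same row, addressed by its label

print(row_by_position.equals(row_by_label))

# Position 0 exists in a, but the label 0 does not.
missing_label_raised = False
try:
    a.loc[0]
except KeyError:
    missing_label_raised = True
print(missing_label_raised)
```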

Likewise, if you pass a single position into iloc on a Series, the result is a scalar (i.e. a single value is returned):

type(df.iloc[0].iloc[0])  # str

In this case you are picking the element at position 0 in the series df.iloc[0], which is '20180926'. Notice that df.iloc[0].loc[0] is not valid and raises a KeyError, because 0 is NOT a label in this series. The index labels of the series df.iloc[0] are date and price.
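
The same point as a short runnable sketch (the names row, first_value, same_value and label_zero_raised are illustrative): a row pulled out of the dataframe is a Series indexed by the column names, so it can be addressed by position or by column label, but not by the label 0.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['20180926', '20180925', '20180924', '20180923'],
    'price': [100, 99, 98, 97],
})

row = df.iloc[0]                # Series whose index is ['date', 'price']
first_value = row.iloc[0]       # by position -> '20180926'
same_value = row.loc['date']    # by label    -> '20180926'

print(list(row.index))
print(first_value, same_value)

# 0 is not one of the labels of this Series, so .loc[0] fails.
label_zero_raised = False
try:
    row.loc[0]
except KeyError:
    label_zero_raised = True
print(label_zero_raised)
```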

Joe Patten

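Selecting a single column with [] returns a pd.Series, so you still need to index into it, by label or by position, to get the scalar: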

a['date'][2]
Out[257]: '20180924'
a.iloc[0]['date']
Out[258]: '20180924'

a.loc[2,'date']
Out[259]: '20180924'
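
If the goal is a single value, pandas also provides scalar accessors. The sketch below assumes the df and a from the question and compares a few options. Which one fits best depends on whether you know the label, the position, or only that exactly one row matched.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['20180926', '20180925', '20180924', '20180923'],
    'price': [100, 99, 98, 97],
})
a = df.loc[df['date'] == '20180924']

by_position = a['date'].iloc[0]  # first element of the column, by position
by_label = a.at[2, 'date']       # scalar access by row label and column name
by_int_pos = a.iat[0, 0]         # scalar access by row and column position
only_value = a['date'].item()    # raises ValueError unless exactly one element

print(by_position, by_label, by_int_pos, only_value)
```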
BENY
  • how come a['date'][0] and a['date'][1] throw KeyErrors? what is in those locations? – tommylicious Sep 27 '18 at 02:57
  • @tommylicious In a['date'][2], ['date'] selects the column and [2] selects by index label. If you print a, you will see that the only index label it has is 2, so a['date'][0] has nothing to match. – BENY Sep 27 '18 at 02:58
  • `a['date'][0]` and `a['date'][1]` throw KeyErrors since `a` is a dataframe with only one index: 2. It kept that index from the original dataframe, df. Now `a['date']` makes a series (it converts the column 'date' into a series, and keeps the original index). Thus the series will have only one index: 2. – Joe Patten Sep 27 '18 at 03:07
  • @Wen no problem :D – Joe Patten Sep 27 '18 at 03:08
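
The KeyError discussed in the comments can be reproduced with the sketch below. Resetting the index with reset_index(drop=True) is one possible way to make label 0 valid again; it is an illustration, not something proposed in the thread. The names dates, key_error_raised and renumbered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['20180926', '20180925', '20180924', '20180923'],
    'price': [100, 99, 98, 97],
})
a = df.loc[df['date'] == '20180924']
dates = a['date']  # Series that keeps the original index label 2

# With an integer index, dates[0] is a label lookup, and label 0 is absent.
key_error_raised = False
try:
    dates[0]
except KeyError:
    key_error_raised = True
print(key_error_raised)

# Dropping the old labels renumbers the rows from 0.
renumbered = dates.reset_index(drop=True)
print(renumbered[0])
```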