
Consider the following:

import pandas as pd

d = {'a': 0.0, 'b': 1.0, 'c': 2.0}
e = pd.Series(d, index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

When I try to access the first row of column B in the following way:

>>> df.B[0]
0.0

I get the correct result.

However, after reading "KeyError: 0 when accessing value in pandas series", I was under the assumption that, since I have specified the index as 'a', 'b' and 'c', the correct way to access the first row of column B by position is df.B.iloc[0], and that df.B[0] should raise a KeyError. I don't know what I am missing. Can someone clarify in which cases I get a KeyError?

Yash
    You should be using loc or at. See more info at https://stackoverflow.com/questions/48035493/pandas-select-rows-and-columns-based-on-boolean-condition/48035642#48035642. – cs95 Jul 20 '18 at 15:23

3 Answers


The problem in the question you referenced is that the index of that DataFrame is made of integers but does not start from 0.

Pandas' behaviour for df.B[0] is ambiguous: it depends on the data type of the index and the data type of the value passed in the square brackets. It can behave like df.B.loc[0] (label based) or df.B.iloc[0] (position based), or possibly in some other way I'm not aware of. For predictable behaviour I recommend using loc and iloc explicitly.

To illustrate this with your example:

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index = ['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

df.B[0] # 0.0 - fall back to position based
df.B['0'] # KeyError - no label '0' in index
df.B['a'] # 0.0 - found label 'a' in index
df.B.loc[0] # TypeError - string index queried by integer value
df.B.loc['0'] # KeyError - no label '0' in index
df.B.loc['a'] # 0.0 - found label 'a' in index
df.B.iloc[0] # 0.0 - position based query for row 0
df.B.iloc['0'] # TypeError - string can't be used for position
df.B.iloc['a'] # TypeError - string can't be used for position

With the example from the referenced question:

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index = [4, 5, 6])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

df.B[0] # KeyError - label 0 not in index
df.B['0'] # KeyError - label '0' not in index
df.B.loc[0] # KeyError - label 0 not in index
df.B.loc['0'] # KeyError - label '0' not in index
df.B.iloc[0] # 0.0 - position based query for row 0
df.B.iloc['0'] # TypeError - string can't be used for position
Justinas Marozas

df.B is actually a pandas.Series object (a shortcut for df['B']), which can be iterated. df.B[0] is no longer a "row" but just the first element of df.B, since by writing df.B you basically create a 1-D object.

More information is available in the data structure documentation:

You can treat a DataFrame semantically like a dict of like-indexed Series objects.
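As a rough sketch of that idea, the snippet below rebuilds the DataFrame from the question and checks that df.B and df['B'] give the same Series, which carries the 'a', 'b', 'c' index. The variable names (col_attr, col_item and so on) are only illustrative.

```python
import pandas as pd

# Same data as in the question.
e = pd.Series({'a': 0.0, 'b': 1.0, 'c': 2.0})
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

col_attr = df.B      # attribute access
col_item = df['B']   # dict-style access, same column

is_series = isinstance(col_attr, pd.Series)   # the column is a 1-D Series
same_values = col_attr.equals(col_item)       # both spellings give equal data
index_labels = list(col_attr.index)           # the Series keeps the row labels

# Explicit accessors avoid any ambiguity about label vs position.
first_by_position = col_attr.iloc[0]
first_by_label = col_attr.loc['a']
```

Here iloc[0] and loc['a'] refer to the same element, because 'a' is the label at position 0.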

NiGiord

df.B returns a pandas Series, which is why you can do positional indexing. If you select column B as a DataFrame instead, the same lookup raises a KeyError, because square brackets on a DataFrame look up a column label:

df[['B']][0]
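A minimal sketch of this, assuming the DataFrame from the question (the names sub, raised, first_row and first_value are just for illustration): indexing the one-column DataFrame with 0 looks for a column labelled 0, whereas iloc still works by position.

```python
import pandas as pd

# Same data as in the question.
e = pd.Series({'a': 0.0, 'b': 1.0, 'c': 2.0})
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

sub = df[['B']]  # a list of column names returns a DataFrame, not a Series

try:
    sub[0]       # looks for a column labelled 0, which does not exist
    raised = False
except KeyError:
    raised = True

first_row = sub.iloc[0]       # first row by position, as a Series indexed by column name
first_value = sub.iloc[0, 0]  # first row, first column, as a scalar
```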
xyzjayne