
I have a question about the differences between df.loc and df.at with respect to DataFrames with a MultiIndex. I've been looking at a few wonderful resources on Stack Overflow, but they don't seem to shed any light on my issue, this one in particular: pandas .at versus .loc (or at least I do not fully understand what is being displayed there).

According to the pandas documentation, https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.at.html, df.at is supposed to return a single value, and it's faster than df.loc, so I'm inclined to use df.at. Let me show my confusion as it applies to using df.at with a MultiIndex.
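For reference, a minimal sketch of what that description means on a plain, unique index (the trimmed-down frame below is an illustrative assumption, not the full example): both accessors return the same scalar for one row label plus one column label, but only .loc also accepts slices, lists and boolean masks.

```python
import pandas as pd

# Trimmed-down frame (illustrative) with the default, unique RangeIndex
df = pd.DataFrame({'field3': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'value1': [0.4, 0.5, 0.4, 0.7, 0.9, 0.4]})

# .at: exactly one row label and one column label, returns a scalar
fast = df.at[5, 'value1']

# .loc: the same scalar for the same labels; .loc additionally accepts
# slices, lists and boolean masks (and then returns a Series/DataFrame)
general = df.loc[5, 'value1']

print(fast, general)  # 0.4 0.4
```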

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'field1': ['foo']*6, 'field2': ['bar']*6,
                   'field3': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'value1': [0.4, 0.5, 0.4, 0.7, 0.9, 0.4],
                   'value2': [4000, 4000, 9000, 9000, 9000, 10000]},
                  index=range(6))

df
Out[329]: 
  field1 field2 field3  value1  value2
0    foo    bar      a     0.4    4000
1    foo    bar      a     0.5    4000
2    foo    bar      b     0.4    9000
3    foo    bar      b     0.7    9000
4    foo    bar      b     0.9    9000
5    foo    bar      c     0.4   10000

I'd like to access this dataframe using a MultiIndex, so I'm doing the following:

df = df.set_index(['field1','field2','field3'])
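A quick diagnostic worth running at this point (a suggested check; the names dupes and unique_keys are illustrative): whether the new MultiIndex is unique. The error further down ends in _get_loc_duplicates, and the field3 values 'a' and 'b' do repeat, so duplicate labels look like a plausible suspect.

```python
import pandas as pd

df = pd.DataFrame({'field1': ['foo']*6, 'field2': ['bar']*6,
                   'field3': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'value1': [0.4, 0.5, 0.4, 0.7, 0.9, 0.4],
                   'value2': [4000, 4000, 9000, 9000, 9000, 10000]},
                  index=range(6))
df = df.set_index(['field1', 'field2', 'field3'])

# False: ('foo', 'bar', 'a') appears twice and ('foo', 'bar', 'b') three times
print(df.index.is_unique)

# keep=False flags every occurrence of a repeated label, not just the later ones
dupes = df.index.duplicated(keep=False)
unique_keys = df.index[~dupes].tolist()
print(unique_keys)  # [('foo', 'bar', 'c')]
```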

So now I'd like to access value1 in my df at ('foo','bar','c'), which is a single value, and it raises an error:

df.at[('foo','bar','c'),'value1']
Traceback (most recent call last):

  File "<ipython-input-344-921b8b658a49>", line 1, in <module>
    df.at[('foo','bar','c'),'value1']

  File "C:\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 1610, in __getitem__
    return self.obj.get_value(*key, takeable=self._takeable)

  File "C:\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1836, in get_value
    return engine.get_value(series.get_values(), index)

  File "pandas\index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas\index.c:3234)

  File "pandas\index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas\index.c:2931)

  File "pandas\index.pyx", line 152, in pandas.index.IndexEngine.get_loc (pandas\index.c:3830)

  File "pandas\index.pyx", line 170, in pandas.index.IndexEngine._get_loc_duplicates (pandas\index.c:4154)

TypeError: only integer arrays with one element can be converted to an index

I'm assuming this is returning a Series object, which cannot be expressed as a single value? That is just my assumption, given the output with df.loc:

df.loc[('foo','bar','c')]['value1']
Out[345]: 
field1  field2  field3
foo     bar     c         0.4
Name: value1, dtype: float64
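Assuming the duplicate labels are what turn this into a Series, here is a sketch of pulling a plain scalar out of .loc (checked against recent pandas; the version in the traceback may differ). On a non-unique MultiIndex even a full key returns a Series, here of length 1, and Series.item() unwraps it.

```python
import pandas as pd

df = pd.DataFrame({'field1': ['foo']*6, 'field2': ['bar']*6,
                   'field3': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'value1': [0.4, 0.5, 0.4, 0.7, 0.9, 0.4],
                   'value2': [4000, 4000, 9000, 9000, 9000, 10000]},
                  index=range(6))
df = df.set_index(['field1', 'field2', 'field3'])

# With duplicate labels in the index, a full key still goes through the
# non-unique lookup path and comes back as a Series, here of length 1
matches = df.loc[('foo', 'bar', 'c'), 'value1']
print(type(matches).__name__, len(matches))  # Series 1

# Series.item() returns the lone element; it raises ValueError if the
# key had matched more than one row (e.g. ('foo', 'bar', 'b'))
value = matches.item()
print(value)  # 0.4
```

The ValueError from .item() is arguably a feature here: it surfaces an ambiguous key instead of silently picking one row.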

Now, if I weren't using a MultiIndex, I assume this issue would not arise...
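One possible workaround, sketched under the assumption that the duplicate labels rather than the MultiIndex itself are what trips up df.at: add a hypothetical occurrence level built with groupby().cumcount(), so every row gets a unique full key. This was written against recent pandas; older versions may behave differently.

```python
import pandas as pd

df = pd.DataFrame({'field1': ['foo']*6, 'field2': ['bar']*6,
                   'field3': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'value1': [0.4, 0.5, 0.4, 0.7, 0.9, 0.4],
                   'value2': [4000, 4000, 9000, 9000, 9000, 10000]},
                  index=range(6))

# 'occurrence' is a hypothetical helper column: 0 for the first row of each
# (field1, field2, field3) combination, 1 for the second, and so on
df['occurrence'] = df.groupby(['field1', 'field2', 'field3']).cumcount()
indexed = df.set_index(['field1', 'field2', 'field3', 'occurrence'])
print(indexed.index.is_unique)  # True

# Every full key now identifies exactly one row, so .at returns a scalar
value = indexed.at[('foo', 'bar', 'c', 0), 'value1']
print(value)  # 0.4
```

The trade-off is that every lookup now has to include the occurrence number, which only makes sense if "the n-th row for this key" is meaningful for your data.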

Is there any way around this, or am I clearly missing something? Thank you.

jboxxx

1 Answer


You can try something like this:

# setting multiindex
df = df.set_index(['field1','field2','field3'])

Now, when you use df.at like this, df.at[('foo','bar','c')]['value1'], you will get the desired result:

field1  field2  field3
foo     bar     c         0.4
Name: value1, dtype: float64

From my attempts, it seems you probably didn't query the DataFrame using at correctly.

iamarchisha