I have a question about the differences between df.loc and df.at with respect to DataFrames with a MultiIndex. I've been looking at a few wonderful resources on Stack Overflow, but they don't seem to shed light on my issue. This one in particular: pandas .at versus .loc (or at least I do not fully understand what is being shown there).
Per the pandas documentation, https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.at.html, df.at is supposed to return a single value, and it's faster than df.loc, so I'm inclined to use df.at. Let me show my confusion as it applies to using df.at with a MultiIndex.
I have the following dataframe:
import pandas as pd

df = pd.DataFrame({'field1': ['foo']*6, 'field2': ['bar']*6,
                   'field3': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'value1': [0.4, 0.5, 0.4, 0.7, 0.9, 0.4],
                   'value2': [4000, 4000, 9000, 9000, 9000, 10000]}, index=range(6))
df
Out[329]:
  field1 field2 field3  value1  value2
0    foo    bar      a     0.4    4000
1    foo    bar      a     0.5    4000
2    foo    bar      b     0.4    9000
3    foo    bar      b     0.7    9000
4    foo    bar      b     0.9    9000
5    foo    bar      c     0.4   10000
I'd like to access this dataframe using a MultiIndex, so I'm doing the following:
df = df.set_index(['field1','field2','field3'])
So now I'd like to access value1 in my df at ('foo','bar','c'), which should be a single value, but it raises an error.
df.at[('foo','bar','c'),'value1']
Traceback (most recent call last):
File "<ipython-input-344-921b8b658a49>", line 1, in <module>
df.at[('foo','bar','c'),'value1']
File "C:\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 1610,
in __getitem__
return self.obj.get_value(*key, takeable=self._takeable)
File "C:\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1836, in
get_value
return engine.get_value(series.get_values(), index)
File "pandas\index.pyx", line 103, in pandas.index.IndexEngine.get_value
(pandas\index.c:3234)
File "pandas\index.pyx", line 111, in pandas.index.IndexEngine.get_value
(pandas\index.c:2931)
File "pandas\index.pyx", line 152, in pandas.index.IndexEngine.get_loc
(pandas\index.c:3830)
File "pandas\index.pyx", line 170, in
pandas.index.IndexEngine._get_loc_duplicates (pandas\index.c:4154)
TypeError: only integer arrays with one element can be converted to an index
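The traceback above ends in IndexEngine._get_loc_duplicates, which may hint that the lookup is taking a path meant for repeated index entries. Here is a minimal sketch for checking whether the MultiIndex built above is unique. The names index_is_unique and duplicate_keys are just illustrative, and whether uniqueness is what df.at actually cares about here is only my guess:

```python
import pandas as pd

# Rebuild the frame from the question and set the same three-level index.
df = pd.DataFrame({'field1': ['foo']*6, 'field2': ['bar']*6,
                   'field3': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'value1': [0.4, 0.5, 0.4, 0.7, 0.9, 0.4],
                   'value2': [4000, 4000, 9000, 9000, 9000, 10000]},
                  index=range(6))
df = df.set_index(['field1', 'field2', 'field3'])

# True only if every (field1, field2, field3) combination occurs once.
index_is_unique = df.index.is_unique

# Index keys that occur more than once, each listed a single time.
duplicate_keys = df.index[df.index.duplicated()].unique()

print(index_is_unique)
print(list(duplicate_keys))
```

In this frame ('foo','bar','a') and ('foo','bar','b') each appear more than once, so the index as a whole is not unique even though ('foo','bar','c') itself appears only once.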
I'm assuming the df.at lookup is finding a Series object, which cannot be expressed as a single value? That is just my assumption given the output with df.loc.
df.loc[('foo','bar','c')]['value1']
Out[345]:
field1  field2  field3
foo     bar     c         0.4
Name: value1, dtype: float64
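Since df.loc hands back a one-element Series here, one way to get at the bare number is to take the first element of that Series. This is only a sketch of a workaround, not a replacement for df.at. Passing the key inside a list is my own choice so that the result is always a Series, and the names matches and scalar are illustrative:

```python
import pandas as pd

# Rebuild the frame from the question with the same MultiIndex.
df = pd.DataFrame({'field1': ['foo']*6, 'field2': ['bar']*6,
                   'field3': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'value1': [0.4, 0.5, 0.4, 0.7, 0.9, 0.4],
                   'value2': [4000, 4000, 9000, 9000, 9000, 10000]},
                  index=range(6))
df = df.set_index(['field1', 'field2', 'field3'])

# A list of keys makes .loc return a Series of all matching rows.
matches = df.loc[[('foo', 'bar', 'c')], 'value1']

# Take the first (and here the only) matching value as a plain scalar.
scalar = matches.iloc[0]

print(len(matches))
print(scalar)
```

If the key matched several rows, as ('foo','bar','b') would, matches.iloc[0] would silently pick the first one, so checking len(matches) first seems prudent.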
Now if I wasn't using a MultiIndex, I assume this issue does not arise...
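To illustrate what I mean, here is a small sketch using the original frame before set_index, where the index is just range(6) and every label is unique. The variable name flat is illustrative:

```python
import pandas as pd

# The same data as above, keeping the default-style integer index 0..5.
flat = pd.DataFrame({'field1': ['foo']*6, 'field2': ['bar']*6,
                     'field3': ['a', 'a', 'b', 'b', 'b', 'c'],
                     'value1': [0.4, 0.5, 0.4, 0.7, 0.9, 0.4],
                     'value2': [4000, 4000, 9000, 9000, 9000, 10000]},
                    index=range(6))

# Row label 5 is the ('foo', 'bar', 'c') row; .at returns a single scalar.
value = flat.at[5, 'value1']

print(value)
```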
Is there any way around this, or am I clearly missing something? Thank you.