I'm sensing some weird pandas behavior here. I have a dataframe that looks like

import pandas as pd

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=[('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])

In [14]: df
Out[14]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN   NaN   NaN
(2, b)   NaN   NaN   NaN

I can set the value of an arbitrary element

In [15]: df['Col 2'].loc[('1', 'b')] = 6

In [16]: df
Out[16]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN     6   NaN
(2, b)   NaN   NaN   NaN

But when I go to reference the element that I just set using the same syntax, I get

In [17]: df['Col 2'].loc[('1', 'b')]
KeyError: 'the label [1] is not in the [index]'

Can someone tell me what I'm doing wrong or why this behavior occurs? Am I simply not allowed to set the index as a multi-element tuple?

Edit

Apparently, wrapping the tuple index in a list works.

In [38]: df['Col 2'].loc[[('1', 'b')]]
Out[38]:
(1, b)    6
Name: Col 2, dtype: object

I'm still getting some weird behavior in my actual use case, though, so it'd be nice to know whether this usage is discouraged.

lanery
  • The response in [this question](https://stackoverflow.com/questions/25476880/using-dataframe-ix-with-a-tuple-index-in-pandas) suggests it's not recommended usage because of the ambiguity between tuple keys and MultiIndex selection. – p-robot Oct 21 '16 at 23:15
  • wrapping the tuple index in a list worked for me – ZakS Aug 30 '22 at 11:02

1 Answer

The tuple inside the selection brackets is interpreted as a sequence of separate keys to look up, not as a single label. It is as if you had passed ['1', 'b'] as the argument. Hence the KeyError message: pandas looks for the key '1' and doesn't find it, because the index contains only tuples.

That's why it works when you add the extra brackets: the argument is now a sequence with one element, your tuple.
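
A minimal sketch of the difference, assuming a reasonably recent pandas. The names `labels`, `selected`, `bare_tuple_failed` and `pos` are illustrative and not from the question. The bare-tuple lookup is wrapped in a broad `try` because the exact exception raised may vary between pandas versions:

```python
import pandas as pd

# Flat index whose labels are tuples, built the same way as in the question.
labels = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]
df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'], index=labels)
is_multi = isinstance(df.index, pd.MultiIndex)  # a plain Index, not a MultiIndex

# A list containing the tuple is a list-like key holding one label.
df.loc[[('1', 'b')], 'Col 2'] = 6
selected = df.loc[[('1', 'b')], 'Col 2']  # one-row Series
value = selected.iloc[0]

# The bare tuple is not looked up as one label; record whether the lookup
# fails without relying on a particular exception type.
try:
    df['Col 2'].loc[('1', 'b')]
    bare_tuple_failed = False
except Exception:
    bare_tuple_failed = True

# Asking the index directly treats the tuple as a single label.
pos = df.index.get_loc(('1', 'b'))
print(is_multi, value, bare_tuple_failed, pos)
```

Going through `df.index.get_loc` and then `.iloc` is another way to sidestep the ambiguity, since the lookup on the index itself takes the tuple as one hashable label.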

It's best to avoid the ambiguities around list and tuple arguments in selection. The behavior can also differ depending on whether the index is a simple Index or a MultiIndex.

In any case, since you ask for a recommendation: try not to build simple indexes made of tuples. pandas works better, and is more powerful to use, if you build an actual MultiIndex instead:

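import pandas as pd

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=pd.MultiIndex.from_tuples([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]))

# Assign with a single .loc call; chained assignment (df['Col 2'].loc[...] = 6)
# may write to a temporary copy, depending on the pandas version.
df.loc[('1', 'b'), 'Col 2'] = 6

df['Col 2'].loc[('1', 'b')]
Out[13]: 6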

df
Out[14]: 
    Col 1 Col 2 Col 3
1 a   NaN   NaN   NaN
2 a   NaN   NaN   NaN
1 b   NaN     6   NaN
2 b   NaN   NaN   NaN
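
If the frame already exists with a flat index of tuples, one way to follow this advice is to convert the index with `pd.MultiIndex.from_tuples`. A sketch under that assumption; the level names `num` and `letter` and the variable `b_rows` are made up for illustration:

```python
import pandas as pd

# Start from the flat tuple index used in the question.
df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=[('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])

# Promote the tuples to a two-level MultiIndex (level names are hypothetical).
df.index = pd.MultiIndex.from_tuples(df.index, names=['num', 'letter'])

# A tuple now addresses exactly one row, for both writing and reading.
df.loc[('1', 'b'), 'Col 2'] = 6
value = df.loc[('1', 'b'), 'Col 2']

# Level-aware selection: every row whose 'letter' level equals 'b'.
b_rows = df.xs('b', level='letter')
print(value)
print(b_rows)
```

`xs` is used here rather than a slice-based `.loc`, since slicing a MultiIndex generally expects the index to be sorted first, and this one is not.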
Zeugma