
I have the following DataFrame named pop:

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561

I want to print the 2010 values for California and Texas. When I try pop[['California','Texas'], 2010], I get the error: '(['California', 'Texas'], 2010)' is an invalid key. How can I print this information?

zabop

2 Answers


TLDR

df.loc[(level_1_indices_list, level_2_indices_list), :]

which is, in this case (writing 2010 as an integer, as in your attempt; use '2010' instead if the year level holds strings):

df.loc[(['California','Texas'], [2010]), :]

Below is a more elaborated version.


# import packages & set seed
import numpy as np
import pandas as pd
np.random.seed(42)

Create example dataframe

Using the example from the pandas docs:

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

s = pd.Series(np.random.randn(8), index=arrays)

df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

This will look like:

[image: the example dataframe]


Slicing using multiindex

With df you can do:

df.loc[(['qux','foo'], 'one'), :]

For notational consistency you can use [] on the second element of the slice:

df.loc[(['qux','foo'], ['one']), :]

which will yield the same result:

[image: the 'one' rows of df for 'qux' and 'foo']

(Selecting 'one' is equivalent to selecting 2010 in your df. ['qux','foo'] should be equivalent to selecting ['California','Texas']. Based on this, I think you can apply the steps here to your data.)
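Applied to the data in the question, a minimal sketch could look like the following. It assumes pop is a DataFrame with a two-level index and that the years are stored as integers. The column name population is made up for illustration, since the question does not show one.

```python
import pandas as pd

# Rebuild the data from the question. The column name "population" is an
# assumption; the question does not show a column name.
index = pd.MultiIndex.from_product(
    [["California", "New York", "Texas"], [2000, 2010]]
)
pop = pd.DataFrame(
    {"population": [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]},
    index=index,
)

# One list of labels per index level, wrapped in a tuple; ":" keeps all columns.
result = pop.loc[(["California", "Texas"], [2010]), :]
print(result)
```

If the year level holds strings rather than integers, pass ['2010'] as the second list instead.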


This might also be helpful.

zabop

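I would do something like this, filtering on column names; it can be extended by adding more conditions with &. Note that it assumes the index levels have first been turned into columns (for example with df = df.reset_index(), which names unnamed levels level_0 and level_1) and that the years are stored as strings: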

df.loc[(df['level_0'].isin(['California', 'Texas'])) & (df['level_1'].isin(['2010']))]
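As a rough sketch of that boolean-mask approach on the data from the question: this assumes the index levels are unnamed, so reset_index() calls them level_0 and level_1, and that the years are integers. The column name population is hypothetical. The second variant builds the same mask straight from the index with get_level_values, which avoids flattening the index.

```python
import pandas as pd

# Data from the question; the column name "population" is an assumption.
index = pd.MultiIndex.from_product(
    [["California", "New York", "Texas"], [2000, 2010]]
)
pop = pd.DataFrame(
    {"population": [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]},
    index=index,
)

# Variant 1: move the unnamed index levels into columns level_0 / level_1,
# then combine one isin() condition per column with &.
flat = pop.reset_index()
mask = flat["level_0"].isin(["California", "Texas"]) & flat["level_1"].isin([2010])
selected = flat.loc[mask]

# Variant 2: build the same mask from the index levels directly,
# keeping the MultiIndex intact.
states = pop.index.get_level_values(0)
years = pop.index.get_level_values(1)
selected_indexed = pop.loc[states.isin(["California", "Texas"]) & years.isin([2010])]

print(selected)
print(selected_indexed)
```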

geekzeus