1

I have a DataFrame that looks like this:

+-----+------+--------+
| idx | Col1 |  Col2  |
+-----+------+--------+
|   0 | A    | [1, 2] |
|   1 | B    | [3, 4] |
|   2 | C    | [5, 6] |
+-----+------+--------+

What I would like to accomplish is a new column layout like this:

+-----+------+-------------+
| idx | Col1 |    Col2     |
+-----+------+------+------+
|     |      | sub1 | sub2 |
+-----+------+------+------+
|   0 | A    |   1  |   2  |
|   1 | B    |   3  |   4  |
|   2 | C    |   5  |   6  |
+-----+------+------+------+

The end goal is to be able to do a df.query() like the following:

df.query("Col2.sub1 == 3 & Col2.sub2 == 4")

to get the row at index 1.

Is this even possible with df.query()?

Edit This is what produces the first table.

records = [{'Col1': 'A', 'Col2': [1, 2]},{'Col1': 'B', 'Col2': [3,4]},{'Col1': 'C', 'Col2': [5,6]}]
df = pd.DataFrame.from_records(records)
Zexelon
  • 495
  • 5
  • 18
  • 1
    you could just do `df.loc[df['Col2'].isin([[1,2]])]` – rhug123 Aug 17 '22 at 20:42
  • 1
    Please provide the DataFrame constructor of your input for easy reproducibility. – mozway Aug 17 '22 at 20:47
  • Added constructor. – Zexelon Aug 17 '22 at 20:55
  • I would rename this question to "Can you query a Multiindex?". The first part of the question is answered [here](https://stackoverflow.com/q/35491274/14627505), but I couldn't find the solution to the second part anywhere online. This could be the place for it. – Vladimir Fokow Aug 17 '22 at 21:49

1 Answers1

0

Firstly, split lists into columns:

df[['sub1', 'sub2']] = pd.DataFrame(df['Col2'].tolist(), index=df.index)
df = df.drop(columns='Col2')

  Col1  sub1  sub2
0    A     1     2
1    B     3     4
2    C     5     6

Create a Multiindex:

df.columns = pd.MultiIndex.from_arrays([['0', 'Col2', 'Col2'], 
                                       df.columns.tolist()])
  0     Col2     
  Col1  sub1  sub2
0    A     1     2
1    B     3     4
2    C     5     6

Now, here is how you can query the Multiindex:

df.query("`('Col2', 'sub1')` == 3 & `('Col2', 'sub2')` == 4")

  0     Col2     
  Col1  sub1  sub2
1    B     3     4
Vladimir Fokow
  • 3,728
  • 2
  • 5
  • 27