1

Given lists of indices and columns, one can easily pull out the elements of a pandas DataFrame using the lookup() method. Is there a way to get the ordered lists of indices and columns from a given data frame (for example, after applying a boolean operation)? To be clear, I want the names of the indices and columns, not just their integer locations.

This is the closest I've come up with, although it's a bit contorted:

In [137]: df = pandas.DataFrame({"a":range(3), "b":range(10,13), "c":range(20,23)}, index=list("ABC"))

In [138]: df
Out[138]: 
   a   b   c
A  0  10  20
B  1  11  21
C  2  12  22

In [139]: df % 3 == 0
Out[139]: 
       a      b      c
A   True  False  False
B  False  False   True
C  False   True  False

In [140]: numpy.where(df % 3 == 0)
Out[140]: (array([0, 1, 2]), array([0, 2, 1]))

In [141]: iindices, icolumns = numpy.where(df % 3 == 0)

In [142]: indices = df.index[iindices]

In [143]: columns = df.columns[icolumns]

The result I'm looking for:

In [144]: indices, columns
Out[144]: 
(Index([u'A', u'B', u'C'], dtype='object'),
 Index([u'a', u'c', u'b'], dtype='object'))

Alternate form that's easier to look at by eye:

In [145]: zip(indices, columns)
Out[145]: [('A', 'a'), ('B', 'c'), ('C', 'b')]

(h/t Python - find integer index of rows with NaN in pandas)

smci
  • 32,567
  • 20
  • 113
  • 146
Noah
  • 21,451
  • 8
  • 63
  • 71
  • Here's a similar questions (with a similar response as DSM's below: http://stackoverflow.com/questions/20935264/finding-bogus-data-in-a-pandas-dataframe/20936428#20936428 – Paul H Jan 22 '14 at 00:41
  • This is like [COO format](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)): represent the dataframe(/matrix) as list of tuples of `(row-name, column-name, value)` except here you don't need `value`. Can you please edit your title to be more accurate? – smci Dec 19 '17 at 00:15

1 Answers1

4

How about:

>>> s = df.stack()
>>> s[s % 3 == 0].index.tolist()
[('A', 'a'), ('B', 'c'), ('C', 'b')]

Step-by-step, first we stack:

>>> s = df.stack()
>>> s
A  a     0
   b    10
   c    20
B  a     1
   b    11
   c    21
C  a     2
   b    12
   c    22
dtype: int64

Select:

>>> s % 3 == 0
A  a     True
   b    False
   c    False
B  a    False
   b    False
   c     True
C  a    False
   b     True
   c    False
dtype: bool

Use this to filter the series:

>>> s[s % 3 == 0]
A  a     0
B  c    21
C  b    12
dtype: int64

Get the index:

>>> s[s % 3 == 0].index
MultiIndex(levels=[[u'A', u'B', u'C'], [u'a', u'b', u'c']],
           labels=[[0, 1, 2], [0, 2, 1]])

And the values we're looking for:

>>> s[s % 3 == 0].index.tolist()
[('A', 'a'), ('B', 'c'), ('C', 'b')]
DSM
  • 342,061
  • 65
  • 592
  • 494