8

I have a DataFrame in which the columns are a MultiIndex and the index is a list of names, i.e. index=['Andrew', 'Bob', 'Calvin',...].

I would like to create a function that returns all rows of the DataFrame whose index label is 'Bob', or perhaps starts with the letter 'A', or starts with a lowercase letter. How can this be done?

I looked into df.filter() with the regex argument, but it fails and I get:

df.filter(regex='a')
TypeError: expected string or buffer

or:

df.filter(regex=('a', 1))
TypeError: first argument must be string or compiled pattern

I've tried other things, such as passing re.compile('a'), to no avail.

rypel
Shatnerz

3 Answers

9

It looks like part of my problem with filter was that I was using an outdated version of pandas. After updating, I no longer get the TypeError. After some experimenting, it looks like filter does fit my needs. Here is what I found.

Calling df.filter(regex='string') returns the columns whose labels match the regex. This appears to be equivalent to df.filter(regex='string', axis=1).

To search the index (the row labels) instead, I simply need to call df.filter(regex='string', axis=0).
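As a minimal sketch of the three cases from the question, assuming a recent pandas version and a small made-up frame with a single hypothetical column named score (not the MultiIndex columns of the real data): filter matches with a regex search, so anchors such as ^ and $ are needed to pin the match to the start or to the whole label.

```python
import pandas as pd

# Hypothetical example data; "score" is an illustrative column name.
df = pd.DataFrame({"score": range(4)}, index=["Andrew", "Bob", "Calvin", "yosef"])

# axis=0 applies the regex to the row labels instead of the column labels.
exactly_bob = df.filter(regex=r"^Bob$", axis=0)        # label is exactly 'Bob'
starts_with_a = df.filter(regex=r"^A", axis=0)         # label starts with 'A'
starts_lowercase = df.filter(regex=r"^[a-z]", axis=0)  # label starts with a lowercase ASCII letter

print(exactly_bob)
print(starts_with_a)
print(starts_lowercase)
```

Without the anchors, a pattern like 'a' would match any label that contains an 'a' anywhere, which here would be 'Calvin' only, since the search is case-sensitive.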

Shatnerz
5

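Maybe try a different approach, using a list comprehension and label-based indexing. This was originally written with .ix, which was removed in pandas 1.0; .loc does the same job here:

import pandas as pd

df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])

# rows whose label is exactly 'Bob'
df.loc[[x for x in df.index if x == 'Bob']]

# rows whose label starts with 'A'
df.loc[[x for x in df.index if x[0] == 'A']]

# rows whose label starts with a lowercase letter
df.loc[[x for x in df.index if x[0].islower()]]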
Ezer K
  • Thanks, this answers what I was asking. Any idea if anyone uses `df.filter`? It would be nice to see some examples. This is nice, but then I need to handle searching the columns separately, which makes my code less concise. – Shatnerz Feb 26 '16 at 13:58
2

How about using pandas.Series.str.contains()? It works on both a Series and an Index, as long as the index holds strings. For non-string entries the result is NaN rather than a boolean.

import pandas as pd
df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])
mask = df.index.str.contains(r"^A")  # boolean mask over the row labels
rows = df.index[mask]  # the matching labels: only 'Andrew'
df[mask]  # the matching rows
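If the index may contain non-string labels, the NaN entries mentioned above get in the way of boolean indexing. A minimal sketch, assuming a made-up frame with one integer label and a hypothetical column named score: passing na=False to str.contains treats every non-string label as a non-match.

```python
import pandas as pd

# Hypothetical example data with one non-string label (7) in the index.
mixed = pd.DataFrame({"score": range(4)}, index=["Andrew", "Bob", 7, "yosef"])

# na=False fills the result for non-string labels with False instead of NaN,
# so the mask is purely boolean and safe to index with.
mask = mixed.index.str.contains(r"^A", na=False)
selected = mixed.loc[mask]

print(selected)
```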
MgAl2O4