Use regex to filter a pandas DataFrame

Question

My dataframe, `df`, contains a set of columns, including two like these:

`'age-15y'`, `'age-5y'`

I want to filter the dataframe to obtain the columns whose names end in each of these strings, so that `'5y'` and `'15y'` select different columns.

If I try

    df.filter(regex='5y'+'$')

then I also get the columns ending in `'15y'`, which is not what I am after.
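In the attempt above, `'5y'+'$'` is ordinary Python string concatenation, so the regex pandas receives is simply `5y$`. The pandas documentation describes `filter(regex=...)` as keeping labels for which `re.search(regex, label)` is true, so the standard `re` module reproduces the behaviour. A short check using the question's column names:

```python
import re

# '5y' + '$' is plain string concatenation; the resulting pattern is '5y$'
pattern = '5y' + '$'
print(pattern)

# '$' anchors only the end of the name, and 'age-15y' also ends in '5y'
matches = {name: re.search(pattern, name) is not None
           for name in ['age-15y', 'age-5y']}
print(matches)
```

Both names match, so the end anchor alone cannot tell `15y` from `5y`; the pattern also has to say what comes before the `5`.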

Is there a convenient way to accomplish this? I was hoping there was a way to use the regex and specify the number of characters it should apply to.
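One way to keep the two suffixes apart, assuming the number in each column name is preceded by a non-digit (here `-`), is a negative lookbehind `(?<!\d)`, which forbids a digit immediately before the suffix. `filter_suffix` is a hypothetical helper name and the frame below is illustrative data; `re.escape` guards against suffixes that contain regex metacharacters.

```python
import re

import pandas as pd


def filter_suffix(df: pd.DataFrame, suffix: str) -> pd.DataFrame:
    """Hypothetical helper: keep columns whose names end in `suffix`
    where the suffix is not preceded by another digit."""
    # (?<!\d) is a negative lookbehind: the suffix must not follow a digit,
    # so '5y' cannot match the tail of '15y'
    pattern = r'(?<!\d)' + re.escape(suffix) + r'$'
    return df.filter(regex=pattern)


# Illustrative data using the question's column names
df = pd.DataFrame({'age-15y': [1, 2, 3], 'age-5y': [4, 5, 6]})

five = filter_suffix(df, '5y')
fifteen = filter_suffix(df, '15y')
print(five.columns.tolist())
print(fifteen.columns.tolist())
```

Because the lookbehind only inspects the single character before the suffix, the same helper works for suffixes of any length without counting characters.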

Try without the `+` in your regex -- `df.filter(regex='5y$')` — Brendan, Jul 13 '19 at 17:50
Try `df.filter(regex='\d{1}y$')`? Or `df.filter(regex='-5y$')` — Erfan, Jul 13 '19 at 17:58
`df.filter(regex=r'\b5y$')` and `df.filter(regex=r'\b15y$')` — Wiktor Stribiżew, Jul 13 '19 at 17:59
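To see which of the patterns suggested in the comments actually separate the two columns, they can be run side by side on a frame with the question's column names (the values are placeholders). `\b` is a word boundary: it holds between `-` and `5` but not between `1` and `5`, which is why `\b5y$` excludes `age-15y`. By contrast, `\d{1}y$` says nothing about the character before the digit, so it still matches the `5y` at the end of `age-15y`.

```python
import pandas as pd

# Column names from the question; the values are placeholders
df = pd.DataFrame({'age-15y': [1, 2], 'age-5y': [3, 4]})

# Patterns taken from the comments (raw strings so '\d' and '\b' reach the regex engine)
suggestions = ['5y$', r'\d{1}y$', '-5y$', r'\b5y$', r'\b15y$']

# Map each pattern to the column names DataFrame.filter keeps
selected = {p: df.filter(regex=p).columns.tolist() for p in suggestions}
for p, cols in selected.items():
    print(p, '->', cols)
```

Only `-5y$`, `\b5y$` and `\b15y$` pick out a single column here; `5y$` and `\d{1}y$` still return both.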

score 0 · Answer 1 · answered Jul 13 '19 at 17:58

`.filter()` with the `$` end anchor works for me:

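    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'age-15y': np.random.choice(['A', 'B'], 500),
                       'age-5y': np.random.uniform(10, 15, 500),
                       'age-15y-abc': np.random.uniform(-32, 105, 500)})

    print(df.filter(regex='5y').head(2))   # no anchor: '5y' may appear anywhere in the name
    print(df.filter(regex='5y$').head(2))  # '$' requires the name to end in '5y'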

which prints (the values are random, so yours will differ):

    # without '$'
      age-15y     age-5y  age-15y-abc
    0       B  14.044916    -4.875092
    1       B  13.271348    28.054364

    # with '$'
      age-15y     age-5y
    0       B  14.044916
    1       B  13.271348

Brendan

I'm not sure what this is illustrating; you are still getting the `age-15y` column, which is not the intended result. – laszlopanaflex Jul 13 '19 at 18:07
@laszlopanaflex In the OP, it seemed like you wanted to filter anything ending in `5y`. If that isn't the case, it would help to be more specific with the question. Saying *I want things that end in `5y`, except for some things that don't* is too vague. What is the rule for which things that end in `5y` are not included in the result? – Brendan Jul 13 '19 at 19:19
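If the intended rule is that the number before `y` must equal a given value, an alternative to a more elaborate regex is to extract that number and compare it. This sketch assumes the relevant columns follow a `<name>-<digits>y` pattern and reuses the column names from the answer with placeholder values. Names that do not fit the pattern (such as `age-15y-abc`) extract to NaN, which compares unequal to any string and is therefore excluded.

```python
import pandas as pd

# Column names from the answer; the values are placeholders
df = pd.DataFrame({'age-15y': ['A', 'B'],
                   'age-5y': [14.0, 13.3],
                   'age-15y-abc': [-4.9, 28.1]})

# Capture the digits between '-' and a trailing 'y'; with expand=False and
# a single group, Index.str.extract returns an Index (NaN where no match)
years = df.columns.str.extract(r'-(\d+)y$', expand=False)

# Comparing the Index to a string gives a boolean array usable with .loc
only_5y = df.loc[:, years == '5']
only_15y = df.loc[:, years == '15']
print(only_5y.columns.tolist())
print(only_15y.columns.tolist())
```

Comparing the extracted number, rather than the tail of the name, makes the "which `5y` columns count" rule explicit in the code.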