My DataFrame, `df`, has a set of columns, including these two:

'age-15y','age-5y'

I want to filter the DataFrame to get the columns whose names end in each string, with '5y' and '15y' matched separately.

If I try

    df.filter(regex='5y'+'$')

then I also get the columns ending in '15y', which is not what I am after.

Is there a convenient way to accomplish this? I was hoping there was a way to use the regex and specify the number of characters it should apply to.

laszlopanaflex

1 Answer

`.filter()` with the `$` anchor works for me:

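    import numpy as np
    import pandas as pd

    # Three columns: two end in '5y', one only contains it
    df = pd.DataFrame({'age-15y': np.random.choice(['A', 'B'], 500),
                       'age-5y': np.random.uniform(10, 15, 500),
                       'age-15y-abc': np.random.uniform(-32, 105, 500)})

    print(df.filter(regex='5y').head(2))   # '5y' anywhere in the name
    print(df.filter(regex='5y$').head(2))  # '5y' at the end of the name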

returns

    # without '$'
      age-15y     age-5y  age-15y-abc
    0       B  14.044916    -4.875092
    1       B  13.271348    28.054364

    # with '$'
      age-15y     age-5y
    0       B  14.044916
    1       B  13.271348

Brendan
  • I'm not sure what this is illustrating. You are still getting the age-15y column, which is not the intended result. – laszlopanaflex Jul 13 '19 at 18:07
  • @laszlopanaflex In the OP, it seemed like you wanted to filter anything ending in `5y`. If that isn't the case, it would help to be more specific with the question. Saying *I want things that end in `5y`, except for some things that don't* is too vague. What is the rule for which things that end in `5y` are not included in the result? – Brendan Jul 13 '19 at 19:19
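
One possible rule, assuming the goal is to keep `5y` only when it is not the tail of a larger number such as `15y`, is to add a negative lookbehind for a digit. The sketch below is not from the thread: the small frame and the names `only_5y` and `only_15y` are made up for illustration, and it relies on `DataFrame.filter(regex=...)` matching column labels with Python's `re.search`.

```python
import pandas as pd

# Hypothetical frame using the column names from the thread
df = pd.DataFrame({
    "age-15y": ["A", "B"],
    "age-5y": [14.0, 13.3],
    "age-15y-abc": [-4.9, 28.1],
})

# (?<!\d) is a negative lookbehind: the match must not be preceded by a digit,
# so '5y$' no longer matches the tail of '15y'
only_5y = df.filter(regex=r"(?<!\d)5y$")
only_15y = df.filter(regex=r"(?<!\d)15y$")

print(list(only_5y.columns))
print(list(only_15y.columns))
```

If every name is known to put a hyphen right before the age token, anchoring on it (`regex=r'-5y$'`) would be a simpler alternative. The lookbehind version just avoids depending on that separator.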