I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'a': ['abc', 'r00001', 'r00010', 'rfoo', 'r01234', 'r1234'], 'b': range(6)})
        a  b
0     abc  0
1  r00001  1
2  r00010  2
3    rfoo  3
4  r01234  4
5   r1234  5
I now want to select all rows of this dataframe where the entries in column a start with r followed by five digits.
From here I learned how to do this when the entry only has to start with r, without the digits:
print(df.loc[df['a'].str.startswith('r'), :])
        a  b
1  r00001  1
2  r00010  2
3    rfoo  3
4  r01234  4
5   r1234  5
Something like this:
print(df.loc[df['a'].str.startswith(r'[r]\d{5}'), :])
does not work, of course, since str.startswith treats its argument as a literal string rather than a regular expression. How would one do this properly?
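One possible approach, sketched here as an illustration rather than a definitive answer, is to switch to a regex-aware string method such as Series.str.match, which anchors the pattern at the start of each entry. The sketch assumes that "five numbers" means exactly five digits with nothing after them, which is why the pattern ends in $; the names mask and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['abc', 'r00001', 'r00010', 'rfoo', 'r01234', 'r1234'],
                   'b': range(6)})

# str.match applies the regex from the start of each string;
# the trailing $ (an assumption about the requirement) rejects
# entries that continue after the five digits.
mask = df['a'].str.match(r'r\d{5}$')

# Boolean indexing keeps only the rows where the mask is True.
result = df.loc[mask, :]
print(result)
```

With the sample data this keeps rows 1, 2 and 4 (r00001, r00010, r01234), while r1234 is dropped because it has only four digits. If trailing characters after the five digits should be allowed, dropping the $ from the pattern would be the corresponding change.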