I have the following pandas Series, which I would like to scan with a regex.
Sample data:
import numpy as np
import pandas as pd

s1 = pd.Series(['PQ4Y-ab56-kj23', 'dont-pick-this', '23', 'dont-pick-these', np.nan])
The objective is to match JUST the first element in the series with a regex.
I have the following regex as a starter for ten, but it matches the first, second and fourth elements in the series:
s1.str.match(r'\w{4}-\w{4}-\w{4}')
0     True
1     True
2    False
3     True
4      NaN
I need the regex modified to account for two factors that are causing problems:
- Each hyphen-separated segment must be exactly four characters long, not four or more (the fourth element ends in the five-character 'these' yet still matches)
- Each four-character segment must contain BOTH letters and numbers; segments made up exclusively of letters (A-Z or a-z) must NOT match
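To make the two requirements concrete, here is a minimal sketch of one way they might be expressed. It rests on assumptions that are not stated above: that "characters" means ASCII letters and digits only (so `[A-Za-z0-9]` replaces `\w`, which also accepts underscores), and that the whole string must consist of exactly three such segments. The names `segment`, `pattern` and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['PQ4Y-ab56-kj23', 'dont-pick-this', '23', 'dont-pick-these', np.nan])

# One segment: exactly four ASCII alphanumerics. The two lookaheads require
# at least one digit and at least one letter among those four characters.
segment = r'(?=[A-Za-z0-9]{0,3}\d)(?=[A-Za-z0-9]{0,3}[A-Za-z])[A-Za-z0-9]{4}'
pattern = rf'{segment}-{segment}-{segment}'

# str.fullmatch (pandas >= 1.1) requires the entire string to match, so a
# trailing fifth character such as the 'e' in 'these' is rejected.
result = s1.str.fullmatch(pattern)
print(result)
```

Under these assumptions only the first element should come back as True. With `str.match` the same effect would need a `$` at the end of the pattern, because `str.match` anchors only at the start of the string.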
Many thanks