pandas regex match: strings must contain both numbers and text

Question

I have the following series in pandas which I would like to scan with regex:

Sample data:

import numpy as np
import pandas as pd

s1 = pd.Series(['PQ4Y-ab56-kj23', 'dont-pick-this', '23', 'dont-pick-these', np.nan])

The objective is to use a regex that identifies JUST the first element in the series.

I have the following regex as a starter for ten, but it identifies the first, second and fourth elements in the series:

s1.str.match(r'\w{4}-\w{4}-\w{4}')

0     True
1     True
2    False
3     True
4      NaN
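A quick way to see why rows 1 and 3 match: `Series.str.match` only anchors at the start of the string (like `re.match`), so the prefix `dont-pick-thes` is enough for `'dont-pick-these'`, and `\w` accepts letters, digits and underscore alike. The sketch below is only an illustration; `na=False` (to turn the NaN row into `False`) and the variable names `naive` and `anchored` are choices made here, not part of the question.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['PQ4Y-ab56-kj23', 'dont-pick-this', '23', 'dont-pick-these', np.nan])

# str.match anchors only at the start, so the 'dont-pick-thes' prefix of
# 'dont-pick-these' is enough for a match.
naive = s1.str.match(r'\w{4}-\w{4}-\w{4}', na=False)
print(naive.tolist())     # -> [True, True, False, True, False]

# Anchoring the end with $ rejects the 5-letter group 'these' ...
anchored = s1.str.match(r'\w{4}-\w{4}-\w{4}$', na=False)
print(anchored.tolist())  # -> [True, True, False, False, False]

# ... but 'dont-pick-this' still passes, because \w does not care whether a
# character is a letter or a digit. The "letters AND digits" rule needs more.
```

`str.fullmatch` (pandas 1.1.0 or later) is an alternative to adding `$` by hand, since it anchors both ends.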

I need to modify the regex to account for two factors that are causing problems:

1. It must match exactly four-character strings, not strings that merely contain four or more characters.
2. It must match only four-character strings that contain BOTH letters and numbers, and must NOT select strings made up exclusively of letters A-Z (or lowercase a-z).

Many thanks

`s1.str.match(r'^(?=[^A-Za-z]*[A-Za-z])(?=\D*\d)\w{4}-\w{4}-\w{4}$')`: the lookaheads make sure there is at least 1 letter and at least 1 digit — Wiktor Stribiżew, May 19 '20 at 12:51
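The pattern from the comment above can be checked against the sample data and against the two strings raised in the follow-up comments. This is a minimal sketch; the raw-string prefix, `na=False`, and the names `pattern`, `picked`, `extra` and `extra_picked` are additions made here for illustration.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['PQ4Y-ab56-kj23', 'dont-pick-this', '23', 'dont-pick-these', np.nan])

# (?=[^A-Za-z]*[A-Za-z])  lookahead: at least one ASCII letter anywhere
# (?=\D*\d)               lookahead: at least one digit anywhere
# \w{4}-\w{4}-\w{4}$      exactly three 4-character groups, end anchored
pattern = r'^(?=[^A-Za-z]*[A-Za-z])(?=\D*\d)\w{4}-\w{4}-\w{4}$'

picked = s1.str.match(pattern, na=False)
print(picked.tolist())        # -> [True, False, False, False, False]

# The two strings discussed in the comments:
extra = pd.Series(['4ACE-ZXYZ-1a2b', '4ABCD-ZXYZ-1a2b'])
extra_picked = extra.str.match(pattern)
print(extra_picked.tolist())  # -> [True, False]
```

Both lookaheads run from the start of the whole string, so they require a letter and a digit somewhere in the value, not in every group; that is why `4ACE-ZXYZ-1a2b` passes. Note also that `\w` matches underscore, so `[A-Za-z0-9]` is the stricter choice if underscores must be rejected.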
@WiktorStribiżew won't match either `'4ACE-ZXYZ-1a2b'` or `'4ABCD-ZXYZ-1a2b'` — Quang Hoang, May 19 '20 at 12:53
@QuangHoang `4ABCD-ZXYZ-1a2b` should not be matched, it has 5 chars at the start. `4ACE-ZXYZ-1a2b` [will be matched well](https://regex101.com/r/D56Usk/1). — Wiktor Stribiżew, May 19 '20 at 12:55
The second string, my mistake. The first should not be matched, as `ZXYZ` does not contain a digit. — Quang Hoang, May 19 '20 at 12:59
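If every group, rather than the string as a whole, must mix letters and digits (the stricter reading in the comment above), the lookaheads can be moved inside each group. This is a sketch under that assumption, not the approach the asker accepted; `group`, `per_group`, `s` and `result` are hypothetical names, and `str.fullmatch` needs pandas 1.1.0 or later.

```python
import numpy as np
import pandas as pd

# One 4-character group that must mix letters and digits:
#   (?=[A-Za-z]*\d)   a digit follows zero or more letters (cannot cross '-')
#   (?=\d*[A-Za-z])   a letter follows zero or more digits
#   [A-Za-z0-9]{4}    exactly four ASCII letters/digits (no underscore)
group = r'(?=[A-Za-z]*\d)(?=\d*[A-Za-z])[A-Za-z0-9]{4}'
per_group = '-'.join([group] * 3)  # three such groups separated by hyphens

s = pd.Series(['PQ4Y-ab56-kj23', 'dont-pick-this', '23', 'dont-pick-these',
               np.nan, '4ACE-ZXYZ-1a2b', '4ABCD-ZXYZ-1a2b'])

# fullmatch anchors both ends, so no ^...$ is needed in the pattern.
result = s.str.fullmatch(per_group, na=False)
print(result.tolist())
# -> [True, False, False, False, False, False, False]
```

Under this reading `4ACE-ZXYZ-1a2b` is rejected because `ZXYZ` has no digit, which is the behaviour described in the comment.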
Thanks Wiktor, works great. And I can extend it to other `\w{}` types. Brilliant! — Michael K, May 19 '20 at 15:28

0 Answers