
I have the following series in pandas which I would like to scan with regex:

Sample data:

s1 = pd.Series(['PQ4Y-ab56-kj23', 'dont-pick-this', '23', 'dont-pick-these', np.nan])

The objective is to identify JUST the first element in the series with a regex.

I have the following regex as a starting point, but it matches the first, second and fourth elements in the series.

s1.str.match(r'\w{4}-\w{4}-\w{4}')

0     True
1     True
2    False
3     True
4      NaN
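
As far as I can tell, the True at index 3 appears because `str.match` only anchors the pattern at the start of the string, so 'dont-pick-thes' satisfies it and the trailing 'e' is ignored. A minimal sketch of the difference, using a two-element series `s` made up for illustration and `str.fullmatch` (available in pandas 1.1 and later):

```python
import pandas as pd

# Illustrative series: only the two "dont-pick" strings from the sample data.
s = pd.Series(['dont-pick-this', 'dont-pick-these'])
pattern = r'\w{4}-\w{4}-\w{4}'

# str.match anchors at the start only, so a longer last group still matches.
start_only = s.str.match(pattern)

# str.fullmatch requires the whole string to match the pattern.
whole = s.str.fullmatch(pattern)

print(start_only.tolist())
print(whole.tolist())
```

This only addresses the length problem; the all-letter groups still match.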

I need the regex to be modified to take account of two factors which are causing problems:

  1. Each hyphen-separated group must be exactly four characters long - a group of five or more characters should not match
  2. Each four-character group must contain BOTH letters and numbers - groups that are exclusively letters (A-Z or a-z) should NOT be selected
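
To make the two requirements concrete, here is a rough sketch of one pattern that might satisfy them. It rests on assumptions of mine: every one of the three groups must contain at least one letter and one digit, and only ASCII letters and digits are allowed (unlike `\w`, which also accepts underscores). The names `group`, `pattern` and `mask` are made up for illustration.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['PQ4Y-ab56-kj23', 'dont-pick-this', '23', 'dont-pick-these', np.nan])

# One group: exactly four ASCII alphanumerics. The two lookaheads require
# at least one letter and at least one digit before the next hyphen.
group = r'(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]{4}'

# Three such groups separated by hyphens.
pattern = '-'.join([group] * 3)

# fullmatch rejects trailing characters; na=False maps missing values to False.
mask = s1.str.fullmatch(pattern, na=False)

print(mask.tolist())
print(s1[mask].tolist())
```

If only some of the groups need to mix letters and digits, the lookaheads would have to be moved or relaxed accordingly.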

Many thanks

Michael K