I have a bunch of strings that I need to clean, and they follow these patterns:

12345SNET1
1234567SNETA2
123456SNET3

The headache is that whatever follows SNET can be either a single digit (0-9) or an uppercase letter (A-Z) followed by a single digit.

Is there any way to use a regex to detect which pattern a string has, so that I could do something like:

if regex detect (returns True):
    str = str[:-1]

elif regex detect (returns True):
    str = str[:-2]
ACuriousCat

3 Answers

  • You can use re.fullmatch for the check (it returns a match object if the whole string matches the regex, and None otherwise) with basic regexes like .*SNET\d and .*SNET[A-Z]\d. Also, don't use str as a variable name, since it shadows the built-in type.

    if re.fullmatch(r".*SNET\d", value):
        value = value[:-1]
    
    if re.fullmatch(r".*SNET[A-Z]\d", value):
        value = value[:-2]
    
  • You can use re.sub directly to strip the suffix

    value = re.sub(r"(?<=SNET)[A-Z]?\d", "", value)
    

For reuse, you can wrap this in a function

import re


def clean(value):
    if re.fullmatch(r".*SNET\d", value):
        return value[:-1]
    if re.fullmatch(r".*SNET[A-Z]\d", value):
        return value[:-2]
    return value

# OR
def clean(value):
    return re.sub(r"(?<=SNET)[A-Z]?\d", "", value)


if __name__ == '__main__':
    values = ["12345SNET1", "1234567SNETA2", "123456SNET3"]
    print(values)  # ['12345SNET1', '1234567SNETA2', '123456SNET3']
    values = list(map(clean, values))
    print(values)  # ['12345SNET', '1234567SNET', '123456SNET']
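
One caveat with the re.sub version: the pattern is not anchored, so it removes the suffix wherever SNET plus letter/digit occurs, not only at the end of the string. If that matters for your data, here is a minimal sketch of an anchored variant. The trailing $ and the names SUFFIX and clean_anchored are my own additions for illustration, not something the question requires.

```python
import re

# Hypothetical names; the trailing $ is an assumption about the desired behavior.
SUFFIX = re.compile(r"(?<=SNET)[A-Z]?\d$")


def clean_anchored(value):
    """Remove an optional A-Z letter plus one digit, but only at the end of the string."""
    return SUFFIX.sub("", value)


print(clean_anchored("12345SNET1"))     # 12345SNET
print(clean_anchored("1234567SNETA2"))  # 1234567SNET
print(clean_anchored("12SNET3SNET4"))   # 12SNET3SNET (only the final suffix is removed)
```

With the unanchored pattern the last input would come out as 12SNETSNET, so which one you want depends on whether SNET can appear more than once in a value.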
azro

You might use re.sub combined with a positive lookbehind to jettison the unwanted characters in the following way:

import re
s1 = "12345SNET1"
s2 = "1234567SNETA2"
s3 = "123456SNET3"
out1 = re.sub(r"(?<=SNET)[A-Z]?\d", "", s1)
out2 = re.sub(r"(?<=SNET)[A-Z]?\d", "", s2)
out3 = re.sub(r"(?<=SNET)[A-Z]?\d", "", s3)
print(out1)  # 12345SNET
print(out2)  # 1234567SNET
print(out3)  # 123456SNET
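
To make the role of the lookbehind a bit more concrete, here is a small sketch (the variable names are just for illustration). The lookbehind requires SNET to come right before the match without making it part of the match, so only the suffix gets replaced. A capturing group plus a backreference should give the same result if you find that easier to read.

```python
import re

s = "1234567SNETA2"

# Lookbehind: SNET must precede the match but is not included in it.
m = re.search(r"(?<=SNET)[A-Z]?\d", s)
print(m.group(0))  # A2

# Alternative without lookbehind: match SNET too, then put it back with \1.
print(re.sub(r"(SNET)[A-Z]?\d", r"\1", s))  # 1234567SNET
```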
Daweo

You don't need to have two cases if you use the right regular expression.

import re

values = ["12345SNET1", "1234567SNETA2", "123456SNET3"]
for value in values:
    m = re.match(r'\d+SNET([A-Z]?\d)', value)
    if m:
        print(m.group(1))

This will print

1
A2
3

If you want the text before the last character(s) you can add extra parentheses in the regular expression to catch that part:

values = ["12345SNET1", "1234567SNETA2", "123456SNET3"]
for value in values:
    m = re.match(r'(\d+SNET)([A-Z]?\d)', value)
    if m:
        print(m.group(1))

Result

12345SNET
1234567SNET
123456SNET
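
Note that re.match only anchors at the start of the string, so trailing characters after the suffix are silently ignored. If you want to be strict about the whole string, a possible sketch uses fullmatch and returns both parts. The names PATTERN and split_suffix are hypothetical, and returning the value unchanged on a non-match is my assumption about what you would want.

```python
import re

# Hypothetical helper: group 1 is the part to keep, group 2 the suffix to drop.
PATTERN = re.compile(r"(\d+SNET)([A-Z]?\d)")


def split_suffix(value):
    """Return (cleaned, suffix); the value is left unchanged if it does not match."""
    m = PATTERN.fullmatch(value)
    if m is None:
        return value, ""
    return m.group(1), m.group(2)


for value in ["12345SNET1", "1234567SNETA2", "123456SNET", "123456SNET3x"]:
    print(split_suffix(value))
```

The first two inputs are split into prefix and suffix; the last two do not match the full pattern, so they come back untouched with an empty suffix.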
Matthias