How can I use a regex to match a word in the following format: letter, digit, alphanumeric, dot (.), then zero to four alphanumerics?

Examples:

A24.L
A2F.L9
A2F.LG4

This is what I've come up with so far:

answer=re.findall(r'[A-Za-z]\d\w\.\w{0-4}', text)
Kraigolas
  • “*is it correct?*” I’m not understanding, have you not tested it yet? Why not? – esqew Oct 04 '22 at 04:34
  • Search "test regex" for sites that can help you with this. Or just try it out in Python. There's no need for a question as you've posed it. – Kraigolas Oct 04 '22 at 04:36
  • A common beginner error is forgetting word boundaries. Your regex will match any substring in a longer string; I'm guessing that's not what you want. For example, it will pick out "a24.exam" from "ba24.example.com" – tripleee Oct 04 '22 at 04:40
  • `0-4` should be `0,4` – Barmar Oct 04 '22 at 04:44

1 Answer

As you are using re.findall, I assume you are looking for partial matches inside longer text. Bearing that in mind, you need to fix the following:

  • \w matches not only alphanumeric chars, but also the _ char
  • {0-4} is not a valid limiting ("range" or "interval") quantifier; the syntax is {min,max}. Do not omit the min value: some regex engines default it to 0, but others either do not support the omission or handle it incorrectly
  • In Python 3, \d matches any Unicode digit (like ٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789), so you probably want to use (?a) inline modifier (to only match ASCII digits) or an explicit [0-9].
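
The three points above can be checked quickly in an interpreter. This is only a minimal sketch; the sample strings are made up for illustration, and '\u0663' is the Arabic-Indic digit three.

```python
import re

# \w accepts the underscore as well as letters and digits.
underscore_match = re.findall(r'\w', 'a_1')
print(underscore_match)   # ['a', '_', '1']

# {0-4} is not a quantifier, so Python falls back to matching
# the characters "{0-4}" literally after the x.
literal_braces = re.findall(r'x{0-4}', 'xxxx x{0-4}')
print(literal_braces)     # ['x{0-4}']

# With a str pattern, \d matches any Unicode decimal digit ...
unicode_digits = re.findall(r'\d', '7\u0663')
# ... unless the ASCII flag is set (inline form: (?a)) or [0-9] is used.
ascii_digits = re.findall(r'(?a)\d', '7\u0663')
explicit_digits = re.findall(r'[0-9]', '7\u0663')
print(len(unicode_digits), ascii_digits, explicit_digits)   # 2 ['7'] ['7']
```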

So, you can use

answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b', text)

if the alphanumeric after . is obligatory, and the following if the match can end in a dot:

answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)', text)

Details:

  • \b - word boundary
  • [A-Za-z] - a letter
  • [0-9] - an ASCII digit
  • [A-Za-z0-9] - an ASCII alphanumeric
  • \. - a . char
  • [A-Za-z0-9]{1,4}\b - one to four ASCII alphanumeric chars followed by a word boundary.

The second regex has no word boundary at the end, since the match must be able to end in a . (which is not a word char), and \b would fail between a . and a following non-word char. Instead, (?<!\w\B) acts as a right-hand dynamic word boundary: it requires a non-word char or the end of the string only if the preceding char is a word char.
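
To see how the two patterns differ, here is a small sketch run against a made-up sample string (the three examples from the question, a bare "A2F." and the "ba24.example.com" case from the comments). The names `strict`, `lenient` and `text` are only illustrative.

```python
import re

# Illustrative sample text, assembled for this sketch.
text = "A24.L A2F.L9 A2F.LG4 A2F. ba24.example.com"

# One to four alphanumerics are required after the dot.
strict = re.compile(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b')
# Zero to four alphanumerics after the dot, so a match may end in ".".
lenient = re.compile(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)')

strict_matches = strict.findall(text)
lenient_matches = lenient.findall(text)

print(strict_matches)    # ['A24.L', 'A2F.L9', 'A2F.LG4']
print(lenient_matches)   # ['A24.L', 'A2F.L9', 'A2F.LG4', 'A2F.']
```

Neither pattern picks anything out of "ba24.example.com", because the leading \b cannot match between "b" and "a".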

See the regex demo.

Wiktor Stribiżew