Question about matching RE in a complicated form

Question

How can I use a regular expression to match a word in the following format: letter, digit, alphanumeric, dot (.), then zero to four alphanumerics?

Examples:

A24.L
A2F.L9
A2F.LG4

This is what I've come up with so far:

answer=re.findall(r'[A-Za-z]\d\w\.\w{0-4}', text)

“*is it correct?*” I’m not understanding, have you not tested it yet? Why not? — esqew, Oct 04 '22 at 04:34
Search "test regex" for sites that can help you with this. Or just try it out in Python. There's no need for a question as you've posed it. — Kraigolas, Oct 04 '22 at 04:36
A common beginner error is forgetting word boundaries. Your regex will match any substring in a longer string; I'm guessing that's not what you want. For example, it will pick out "a24.exam" from "ba24.example.com" — tripleee, Oct 04 '22 at 04:40

score 0 · Answer 1 · answered Oct 04 '22 at 07:43

As you are using re.findall, I assume you are looking for partial matches inside longer text. Bearing that in mind, you need to fix the following:

\w matches not only alphanumeric, but also a _ char
{0-4} is not a valid limiting ("range" or "interval") quantifier; the syntax is {min,max}. The min value should not be omitted: some regex engines default it to 0, but others either do not support the omission or handle it incorrectly.
In Python 3, \d matches any Unicode digit (like ٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙０１２３４５６７８９), so you probably want to use (?a) inline modifier (to only match ASCII digits) or an explicit [0-9].
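
The three points above can be checked quickly in Python. This is only a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# 1. \w also accepts an underscore, not just letters and digits.
underscore_ok = re.fullmatch(r'\w', '_') is not None

# 2. {0-4} is not a quantifier. Python's re reads it as the literal text "{0-4}",
#    so the pattern from the question finds nothing in ordinary input.
original = r'[A-Za-z]\d\w\.\w{0-4}'
no_match = re.findall(original, 'A24.L A2F.L9 A2F.LG4')
literal_match = re.findall(original, 'A2F.L{0-4}')

# 3. \d accepts non-ASCII digits in str patterns unless (?a) / re.ASCII is set.
arabic_three = '\u0663'  # ARABIC-INDIC DIGIT THREE
unicode_digit = re.fullmatch(r'\d', arabic_three) is not None
ascii_only = re.fullmatch(r'(?a)\d', arabic_three) is not None

print(underscore_ok, no_match, literal_match, unicode_digit, ascii_only)
```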

So, you can use

answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b', text)

if the alphanumeric after . is obligatory, and the following if the match can end in a dot:

answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)', text)

Details:

\b - word boundary
[A-Za-z] - a letter
[0-9] - an ASCII digit
[A-Za-z0-9] - an ASCII alphanumeric
\. - a . char
[A-Za-z0-9]{1,4}\b - one to four alphanumeric chars at the word boundary.
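
For readability, the same breakdown can be written into the pattern itself with re.VERBOSE. This is a sketch of the first regex only; the name CODE_RE and the sample text are assumptions for illustration.

```python
import re

# The first regex, spelled out with re.VERBOSE so each part carries its own comment.
CODE_RE = re.compile(r'''
    \b                  # word boundary
    [A-Za-z]            # a letter
    [0-9]               # an ASCII digit
    [A-Za-z0-9]         # an ASCII alphanumeric
    \.                  # a literal dot
    [A-Za-z0-9]{1,4}    # one to four ASCII alphanumerics
    \b                  # word boundary
''', re.VERBOSE)

text = 'codes: A24.L, A2F.L9 and A2F.LG4, but not ba24.example.com'
matches = CODE_RE.findall(text)
print(matches)
```

Thanks to the leading \b, nothing is picked out of "ba24.example.com", which is the case raised in the comments.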

The second regex does not contain a word boundary at the end since the match is supposed to be able to end in a . (that is not a word char). The (?<!\w\B) is a right-hand dynamic word boundary that only requires a non-word char or end position if the preceding char is a word char.
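
To see the difference between the two patterns, here is a small comparison on made-up text where one code ends in a dot. The names strict and lenient are just labels for this sketch.

```python
import re

# First regex: at least one alphanumeric is required after the dot.
strict = r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b'
# Second regex: the part after the dot is optional, so a match may end in ".".
lenient = r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)'

text = 'A24. then A2F.L9, then A2F.LG4'

strict_hits = re.findall(strict, text)
lenient_hits = re.findall(lenient, text)

print(strict_hits)   # the code that ends in a dot is skipped
print(lenient_hits)  # the code that ends in a dot is included as "A24."
```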

See the regex demo.
