How can I match a word using RE in the following format: Letter number Alphanumeric dot(.) Alphanumeric{0-4}
Examples:
A24.L
A2F.L9
A2F.LG4
This is what I've come up with so far:
answer=re.findall(r'[A-Za-z]\d\w\.\w{0-4}', text)
As you are using re.findall, I assume you are looking for partial matches inside longer text. Bearing that in mind, you need to fix the following:
- \w matches not only alphanumeric chars, but also the _ char
- {0-4} is not a valid limiting ("range", or "interval") quantifier: the syntax is {min,max} (note that the min value should not be omitted; some regex engines allow that and default it to 0, but others either do not support the omission or do not handle it correctly)
- \d
matches any Unicode digit (like ٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789
), so you probably want to use the (?a) inline modifier (to match only ASCII digits) or an explicit [0-9].
So, you can use
answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b', text)
if the alphanumeric after the . is obligatory, and the following if the match can end in a dot:
answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)', text)
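To see the difference between the two patterns, here is a minimal sketch that runs both on one string. The sample text and the variable names are my own assumptions, not something from the question.

```python
import re

# Hypothetical sample text (not from the question).
text = "Codes: A24.L, A2F.L9, A2F.LG4, B12. and XA24.L9 end here"

# Variant 1: at least one alphanumeric is required after the dot.
strict = re.compile(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b')

# Variant 2: the match may end right after the dot.
lenient = re.compile(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)')

strict_matches = strict.findall(text)
lenient_matches = lenient.findall(text)

print(strict_matches)   # ['A24.L', 'A2F.L9', 'A2F.LG4']
print(lenient_matches)  # ['A24.L', 'A2F.L9', 'A2F.LG4', 'B12.']
```

XA24.L9 is skipped by both variants because the leading \b does not allow the match to start in the middle of a word.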
Details:
\b - word boundary
[A-Za-z] - an ASCII letter
[0-9] - an ASCII digit
[A-Za-z0-9] - an ASCII alphanumeric
\. - a . char
[A-Za-z0-9]{1,4}\b - one to four ASCII alphanumeric chars, followed by a word boundary.
The second regex does not contain a word boundary at the end since the match is supposed to be able to end in a . (which is not a word char). The (?<!\w\B) is a right-hand dynamic word boundary that only requires a non-word char or the end position if the preceding char is a word char.
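A small sketch comparing a plain trailing \b with the (?<!\w\B) construct may make the above easier to follow. The names prefix, plain, dynamic and the sample inputs are hypothetical, chosen only to exercise each case.

```python
import re

# Shared part of the pattern, with the optional {0,4} tail.
prefix = r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}'

plain = re.compile(prefix + r'\b')           # ordinary trailing word boundary
dynamic = re.compile(prefix + r'(?<!\w\B)')  # right-hand dynamic word boundary

# Hypothetical inputs, one per case of interest.
samples = ["A24. next", "A24.L9 next", "A24.", "A24.LG4XY"]

results = {s: (plain.findall(s), dynamic.findall(s)) for s in samples}
for s, (p, d) in results.items():
    print(f"{s!r}: plain={p} dynamic={d}")
```

With a plain \b, "A24. next" and "A24." give no match, since there is no word boundary between a dot and a space (or the end of the string), while the dynamic version returns A24. for both. Also note that for "A24.LG4XY", where more than four alphanumerics follow the dot, both variants backtrack and return just A24., so add a further check if that is not what you want.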
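Going back to the three issues listed at the start of this answer, each one can be checked quickly in Python. This is only a sketch: the test strings are made up, and the non-ASCII digits are written as escapes (Arabic-Indic three and Devanagari five) to keep the snippet copy-paste safe.

```python
import re

# 1. \w also matches the underscore.
underscore = re.findall(r'\w', 'a_1')
print(underscore)       # ['a', '_', '1']

# 2. {0-4} is not a quantifier; Python's re treats it as the literal text "{0-4}".
literal_braces = re.findall(r'\w{0-4}', 'L9 x{0-4}')
print(literal_braces)   # ['x{0-4}']

# 3. On str patterns, \d matches any Unicode decimal digit...
mixed = '7\u0663\u096b'  # ASCII 7, Arabic-Indic 3, Devanagari 5
unicode_digits = re.findall(r'\d', mixed)
print(len(unicode_digits))  # 3

# ...unless ASCII mode is switched on with (?a) (or flags=re.ASCII).
ascii_digits = re.findall(r'(?a)\d', mixed)
print(ascii_digits)     # ['7']
```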