
I'm trying to reconstruct an example string from a given regular expression

test_re = r'\s([0-9A-Z]+\w*)\s+\S*[Aa]lloy\s'

However, the code below only gives ['1AZabc']

import re 
txt = " 1AZabc sdfsdfAlloy "
test_re = r'\s([0-9A-Z]+\w*)\s+\S*[Aa]lloy\s'
# test_re = r'\s+\S*[Aa]lloy\s'
x = re.findall(test_re,txt)
print(x)

Why is the content after the space (the part matched by \s+ and what follows it) not returned by re.findall? What is a simple, valid example string that matches test_re?

wovano
meTchaikovsky

1 Answer


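Your code works, and the pattern matches the whole string. What you are missing is how capturing groups affect findall: when the pattern contains exactly one group, re.findall returns only the text captured by that group, not the full match: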

# code partially generated by regex101.com to demonstrate the issue
# see  https://regex101.com/r/Gngy0r/1

import re

regex = r"\s([0-9A-Z]+\w*)\s+\S*?[Aa]lloy\s"

test_str = " 1AZabc sdfsdfAlloy "

matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    # group() with no argument is the full match (group 0)
    print(f"Match {match_num} was found at {match.start()}-{match.end()}: {match.group()}")

    # capturing groups are numbered from 1
    for group_num in range(1, len(match.groups()) + 1):
        print(f"Group {group_num} found at {match.start(group_num)}-{match.end(group_num)}: {match.group(group_num)}")

# use findall and print its results
print(re.findall(regex, test_str))

Output:

# full match that you got 
Match 1 was found at 0-20:  1AZabc sdfsdfAlloy 
# and what was captured
Group 1 found at 1-7: 1AZabc

# findall only gives you the groups ...
['1AZabc']
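
The rule behind that last line can be checked on a smaller input. This is a minimal sketch with a made-up sample string and made-up patterns (none of them come from the question), showing how the number of capturing groups changes what re.findall returns:

```python
import re

# Hypothetical sample text, chosen only for illustration.
sample = "a1 b2 c3"

# No capturing group: findall returns the full matches.
no_group = re.findall(r"[a-z]\d", sample)

# One capturing group: findall returns only that group's text.
one_group = re.findall(r"[a-z](\d)", sample)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"([a-z])(\d)", sample)

print(no_group)    # ['a1', 'b2', 'c3']
print(one_group)   # ['1', '2', '3']
print(two_groups)  # [('a', '1'), ('b', '2'), ('c', '3')]
```

The pattern in the question has exactly one group, so it falls into the second case.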

Either remove the parentheses, so that findall returns the full match, or wrap everything you are interested in inside a single group:

regex = r"\s([0-9A-Z]+\w*\s+\S*?[Aa]lloy)\s"
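
Here is a sketch of both options applied to the string from the question; the variable names are illustrative only. It also shows a third possibility, which is to keep the original pattern and read group(0) from finditer:

```python
import re

test_str = " 1AZabc sdfsdfAlloy "

# Option 1: no capturing group, so findall returns the full match,
# including the leading and trailing whitespace matched by \s.
no_group = r"\s[0-9A-Z]+\w*\s+\S*?[Aa]lloy\s"
full_matches = re.findall(no_group, test_str)

# Option 2: a single group around everything of interest.
wide_group = r"\s([0-9A-Z]+\w*\s+\S*?[Aa]lloy)\s"
wide_matches = re.findall(wide_group, test_str)

# Option 3: keep the original pattern and take group(0), the full match.
original = r"\s([0-9A-Z]+\w*)\s+\S*?[Aa]lloy\s"
group0 = [m.group(0) for m in re.finditer(original, test_str)]

print(full_matches)  # [' 1AZabc sdfsdfAlloy ']
print(wide_matches)  # ['1AZabc sdfsdfAlloy']
print(group0)        # [' 1AZabc sdfsdfAlloy ']
```

Option 2 is the only one that excludes the surrounding whitespace, because the two outer \s tokens sit outside the group.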
Patrick Artner