First of all, I checked these previous posts (1, 2, 3), and they did not help me.
I have this string (or similar ones) that needs to be handled with a regex:
"Text Table 6-2: Management of children study and actions"
- Detect the word Table and the word(s) before it, if any
- Detect the numbers that follow, which can be in any of these formats:
6 or 6-2 or 66-22 or 66-2
- Finally, capture the rest of the string (in this case: Management of children study and actions)
After doing so, the return value must look like this:
the first two items as one string, and the rest as another string,
e.g. the returned value must look like this: Text Table 6-2, Management of children study and actions
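To make the expected behaviour concrete, here is a minimal sketch of one way the requirements above could be read, not a definitive solution. The names `TITLE_RE`, `split_title`, `label` and `rest` are hypothetical, and it assumes that an optional colon and whitespace separate the number from the rest, and that matching the keyword case-insensitively is acceptable.

```python
import re

# Hypothetical pattern: optional leading word, a keyword, a number such as
# 6, 6-2, 66-22 or 66-2, an optional separator, then the rest of the title.
TITLE_RE = re.compile(
    r"(?P<label>(?:[A-Za-z0-9]+ )?(?:figure|list|table) \d+(?:-\d+)?)"  # first two items
    r"\s*:?\s*"       # assumed separator, dropped from the result
    r"(?P<rest>.*)",  # third item: everything that follows
    re.IGNORECASE,    # assumption: also accepts e.g. "TABLE"
)


def split_title(title):
    """Return (label, rest), or None if the title does not match."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    return match.group("label"), match.group("rest")


print(split_title("Text Table 6-2: Management of children study and actions"))
```

Named groups are used here so that the two returned strings can be read by name instead of by position.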
Below is my code:
import re

mystr = "Text Table 6-2: Management of children study and actions"
# Check that the title starts with an optional word, a keyword and a number
if re.match("([a-zA-Z0-9]+[ ])?(figure|list|table|Figure|List|Table)[ ][0-9]([-][0-9]+)?", mystr):
    print("True matched")
parts_of_title = re.search("([a-zA-Z0-9]+[ ])?(figure|list|table|Figure|List|Table)[ ][0-9]([-][0-9]+)?", mystr)
print(parts_of_title)
# Attempt to print the label and then the rest of the title
print(" ".join(parts_of_title.group().split()[0:3]), parts_of_title.group().split()[-1])
The first requirement works as expected (the match succeeds), but the second does not. So I changed the code to use compile, but then the regex behaviour changed. The code now looks like this:
import re

mystr = "Text Table 6-2: Management of children study and actions"
if re.match("([a-zA-Z0-9]+[ ])?(figure|list|table|Figure|List|Table)[ ][0-9]([-][0-9]+)?", mystr):
    print("True matched")
parts_of_title = re.compile("([a-zA-Z0-9]+[ ])?(figure|list|table|Figure|List|Table)[ ][0-9]([-][0-9]+)?").split(mystr)
print(parts_of_title)
Output:
True matched
['', 'Text ', 'Table', '-2', ':\tManagement of children study and actions']
So, based on this, how can I achieve this while keeping the code clean and readable? And why does using compile change the matching?
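One thing worth checking is whether compile is really what changed: the second snippet also swaps search for split. The sketch below compares the two with the same pattern. As far as I understand the `re` documentation, split returns the text between matches and also inserts the text of every capturing group, which would explain the 'Text ', 'Table' and '-2' entries and the missing 6 (it is matched but not captured). The variable names are only for illustration.

```python
import re

pattern = re.compile(
    r"([a-zA-Z0-9]+[ ])?(figure|list|table|Figure|List|Table)[ ][0-9]([-][0-9]+)?"
)
mystr = "Text Table 6-2: Management of children study and actions"

# search() on the compiled pattern matches the same text as re.search().
compiled_match = pattern.search(mystr)
module_match = re.search(pattern.pattern, mystr)
print(compiled_match.group() == module_match.group())  # True

# split() cuts the string at each match and also inserts the captured groups.
print(pattern.split(mystr))

# With non-capturing groups (?:...), split() returns only the text around the match.
no_groups = re.compile(
    r"(?:[a-zA-Z0-9]+[ ])?(?:figure|list|table|Figure|List|Table)[ ][0-9](?:[-][0-9]+)?"
)
print(no_groups.split(mystr))
```

Note that `[0-9]` without a `+` accepts only a single digit before the hyphen, so a title such as "Table 66-22" would need `[0-9]+` there.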