I have a text file on my computer that contains a lot of markup. I'm interested in two different kinds of entries in the file. They are:
<string>objectiwant1 <string2>objectiwant2</string2></string>
and
<string>objectiwant1 </string>
The first one would return [(objectiwant1, objectiwant2)] (with more tuples if they exist) while the second one would return [(objectiwant1, None)].
I'm trying to create a regular expression and the flawed code I have so far looks something like this:
regularexpression = r'<string>(.*) <string2>(.*)</string2>'
I'm using "re.findall(regularexpression, file)" to return the data, which returns what I want only if both string and string2 are present. Using:
regularexpression = r'<string>(.*) (<string2>(.*)</string2>)|(</string>)'
Returns everything within the larger parentheses, sometimes twice (as opposed to only the data within (.*)). Those larger parentheses are necessary to separate the alternatives I want to combine with the OR operator.
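For reference, here is a minimal reproduction of both attempts. The sample text and the variable names are made up for illustration; only the two patterns come from my actual code.

```python
import re

# Hypothetical sample input covering both kinds of entry.
sample = (
    "<string>alpha <string2>beta</string2></string>\n"
    "<string>gamma </string>\n"
)

# First attempt: only matches entries that contain <string2>,
# so the "gamma" entry is skipped entirely.
first_attempt = r'<string>(.*) <string2>(.*)</string2>'
first_result = re.findall(first_attempt, sample)
print(first_result)

# Second attempt: every pair of parentheses is a capturing group, and the
# top-level | splits the whole pattern into two alternatives, so findall
# returns one 4-tuple per match, with '' for groups that did not take part.
second_attempt = r'<string>(.*) (<string2>(.*)</string2>)|(</string>)'
second_result = re.findall(second_attempt, sample)
for row in second_result:
    print(row)
```

In the second result, objectiwant2 shows up twice in the same tuple (once wrapped in its tags, once bare), and each closing </string> is reported as a separate match.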
I'm wondering whether there is something I could use in place of the parentheses to group the alternatives, so that re.findall doesn't output the data twice or output so much data at once.
I'm also wondering whether there is a way to have the regex produce output when part of the pattern is not matched (so if objectiwant2 doesn't exist, I get to choose what the output is).
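To make the goal concrete, here is a sketch of the kind of result I'm after. It rests on a few assumptions of mine: that a non-capturing group (?:...) made optional with ? is a suitable tool, that non-greedy (.*?) is acceptable for the captured parts, and that re.finditer with Match.groups() can stand in for re.findall. The sample text and the "MISSING" placeholder are made up.

```python
import re

# Hypothetical sample input covering both kinds of entry.
sample = (
    "<string>alpha <string2>beta</string2></string>\n"
    "<string>gamma </string>\n"
)

# (?:...) groups without capturing, and the trailing ? makes the whole
# <string2> part optional. Only the two (.*?) groups are captured.
pattern = r'<string>(.*?) (?:<string2>(.*?)</string2>)?</string>'

# re.findall reports a group that did not take part in the match as ''.
found = re.findall(pattern, sample)

# Match.groups() reports it as None by default...
pairs = [m.groups() for m in re.finditer(pattern, sample)]

# ...or as whatever value is passed as the default argument.
with_default = [m.groups(default="MISSING") for m in re.finditer(pattern, sample)]

print(found)
print(pairs)
print(with_default)
```

With this sample, the first entry comes back as a pair of both objects and the second as objectiwant1 paired with '', None, or the chosen placeholder, depending on which of the three calls is used.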
Thank you in advance.