I'm trying to solve a problem that I know I can solve by iterating through the string, but since this is Python I'm sure there is a regex that would solve it more elegantly... resorting to an iterative process feels like giving up!
Basically, I have a list of properties in a single cell, and I need to work out which ones are subproperties and which ones are sub-subproperties, then match each one to the property it sits under. For example:
ID=11669 Antam Laterite Nickel/Ferronickel Operation
    ID=19807 Gebe Laterite Nickel Mine
    ID=19808 Gee Island Laterite Nickel Mine
    ID=18923 Mornopo Laterite Nickel Mine
    ID=29411 Pomalaa Ferronickel Smelter
    ID=19806 Pomalaa Laterite Nickel Mine
        ID=29412 Maniang Laterite Nickel Project
    ID=11665 Southeast Sulawesi Laterite Nickel Project
        ID=27877 Bahubulu Laterite Nickel Deposit
Should generate:
MasterProp, SubProp
11669, 19807
11669, 19808
11669, 18923
11669, 29411
11669, 19806
19806, 29412
11669, 11665
11665, 27877
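For reference, here is a minimal sketch of the iterative fallback that produces this table. It rests on assumptions that are not confirmed above: that the indentation is stored in the cell as literal `&nbsp;` entities, and that each nesting level adds four of them (so the third level sits behind eight). The names `NBSP`, `entries`, `cell` and `master_sub_pairs` are made up for illustration.

```python
import re

NBSP = "&nbsp;"

# Sample cell rebuilt from the example above.
# Assumption: each nesting level adds four literal &nbsp; entities.
entries = [
    (0, "ID=11669 Antam Laterite Nickel/Ferronickel Operation"),
    (1, "ID=19807 Gebe Laterite Nickel Mine"),
    (1, "ID=19808 Gee Island Laterite Nickel Mine"),
    (1, "ID=18923 Mornopo Laterite Nickel Mine"),
    (1, "ID=29411 Pomalaa Ferronickel Smelter"),
    (1, "ID=19806 Pomalaa Laterite Nickel Mine"),
    (2, "ID=29412 Maniang Laterite Nickel Project"),
    (1, "ID=11665 Southeast Sulawesi Laterite Nickel Project"),
    (2, "ID=27877 Bahubulu Laterite Nickel Deposit"),
]
cell = "".join(NBSP * 4 * depth + text for depth, text in entries)


def master_sub_pairs(cell):
    """Pair each indented ID with the most recent ID one level above it."""
    pairs = []
    latest = {}  # depth -> most recent ID seen at that depth
    # Group 1 is the run of &nbsp; directly before the ID, group 2 the digits.
    for indent, prop_id in re.findall(r"((?:&nbsp;)*)ID=(\d+)", cell):
        depth = indent.count(NBSP) // 4
        latest[depth] = prop_id
        if depth > 0:
            pairs.append((latest[depth - 1], prop_id))
    return pairs


print("MasterProp, SubProp")
for master, sub in master_sub_pairs(cell):
    print(master + ", " + sub)
```

The sketch does no error handling: an entry indented deeper than its predecessor by more than one level would raise a `KeyError`.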
Getting 11669 and the second level is easy: just grab the first ID I find and pair it with all the rest. But getting the "3rd level" is a lot harder.
I tried the following:
import re

# Capture an ID, then anything, then eight &nbsp; entities
tags = re.compile(r'ID=(\d+).+(&nbsp;){8}')
for tag, space in tags.findall(str(cell)):
    print(tag)
But that gives me the first ID before the 8 spaces rather than the last ID before them, so in the example above I get 11669 rather than 19806. I suspect there is an expression that says "find an ID=(\d+) where there are no other ID=(\d+) between it and the 8 spaces", but that has proven beyond my (novice) capabilities! Any help would be welcome...
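The "no other ID in between" idea can be written with a negative lookahead that is checked before every character of the gap. The following is only a sketch under assumptions: the indentation is taken to be literal `&nbsp;` entities (four for the second level, eight for the third), and `NBSP`, `cell`, `pattern` and `pairs` are illustrative names with a shortened sample.

```python
import re

NBSP = "&nbsp;"

# Shortened, hypothetical cell content.
# Assumption: 4 x &nbsp; before level-2 entries, 8 x &nbsp; before level-3 entries.
cell = (
    "ID=11669 Antam Laterite Nickel/Ferronickel Operation"
    + NBSP * 4 + "ID=19807 Gebe Laterite Nickel Mine"
    + NBSP * 4 + "ID=19806 Pomalaa Laterite Nickel Mine"
    + NBSP * 8 + "ID=29412 Maniang Laterite Nickel Project"
    + NBSP * 4 + "ID=11665 Southeast Sulawesi Laterite Nickel Project"
    + NBSP * 8 + "ID=27877 Bahubulu Laterite Nickel Deposit"
)

pattern = re.compile(
    r"ID=(\d+)"          # candidate parent ID
    r"(?:(?!ID=).)*?"    # the gap: any characters, but never across another "ID="
    r"(?:&nbsp;){8}"     # the eight-entity indent that marks a third-level entry
    r"ID=(\d+)"          # the third-level ID itself
)

pairs = pattern.findall(cell)
for parent, child in pairs:
    print(parent + ", " + child)
```

Because `(?!ID=)` is tested at every step of the gap, the match cannot start at 11669 and run across later IDs, which is what the greedy `.+` does. One limitation of this sketch: if a parent has two third-level children in a row, only the first is paired, since the second child's nearest preceding ID is its sibling and that text has already been consumed by the previous match.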