
I need help developing a robust regular expression for key-value pairs written in Fortran syntax, in the format:

name = values* name=values* ...

Example

The string:

name="my name is" multipleValues = 0.543 0.754 1.166 multipleValues(2) = 'value' "Value2" 4.76454 100 single(1) = 10 single(2)=3.589  boolean = .True. .F. ! comment to mess things up

Should be split up into:

(name, "my name is")
(multipleValues, [0.543, 0.754, 1.166])
(multipleValues(2), ['value', "Value2", 4.76454, 100])
(single(1), 10)
(single(2), 3.589)
(boolean, [.True., .F.])

Tried

Using the regex from this question sort of works:

"((?:\"[^\"]*\"|[^=,])*)=((?:\"[^\"]*\"|[^=,])*)"

However, each value list also swallows all the text up to the next equals sign, including the next key:

>>> re.findall('((?:\"[^\"]*\"|[^=,])*)=((?:\"[^\"]*\"|[^=,])*)', testStr)
[('name', "'my name is' multipleValues "), ('', ' 0.543 0.754 1.166 multipleValues(2) '), ('', " 'value' 'Value2' 4.76454 100 single(1) "), ('', ' 10 single(2)'), ('', '3.589  boolean '), ('', ' .True. .F. ! comment to mess things up')]

Maybe I need a lookbehind?

Note: The solution does not need to be a single expression.

Onlyjus

1 Answer


You can use the following regex to get each key and a string containing all of its values:

(\w+(?:\(\d+\))?)\s*=\s*(.*?)(?=(!|$|\w+(\(\d+\))?\s*=))

Group 1 is the key, Group 2 is all of its values combined. RegExr Example.
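
For readability, the same pattern can be written out with re.VERBOSE. This is only a sketch: the names PAIR_RE and test_str are mine, the lookahead groups are made non-capturing so that findall returns plain (key, values) pairs, and it assumes that neither a `!` nor a `key =` sequence appears inside a quoted value.

```python
import re

# The pattern from above, spelled out piece by piece.
# Assumption: the lookahead groups are non-capturing here, so findall
# yields 2-tuples instead of 4-tuples.
PAIR_RE = re.compile(r"""
    (\w+(?:\(\d+\))?)          # group 1: key, optionally indexed, e.g. single(1)
    \s*=\s*                    # equals sign with optional surrounding whitespace
    (.*?)                      # group 2: the values, matched lazily
    (?=                        # stop, without consuming, just before:
        !                      #   a comment marker,
      | $                      #   the end of the string, or
      | \w+(?:\(\d+\))?\s*=    #   the next key followed by '='
    )
""", re.VERBOSE)

# Shortened version of the example string (hypothetical variable name).
test_str = 'single(1) = 10 single(2)=3.589  boolean = .True. .F. ! comment'

pairs = [(key, values.strip()) for key, values in PAIR_RE.findall(test_str)]
print(pairs)
# [('single(1)', '10'), ('single(2)', '3.589'), ('boolean', '.True. .F.')]
```

The lazy `.*?` is what keeps each value list short: it grows one character at a time until the lookahead sees a comment, the end of the string, or the next key.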

Python:

Use that regex, then split Group 2 on every space that is not followed by a letter, so that the words inside a quoted string stay together.

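>>> import re
>>> matches = re.findall(r'(\w+(?:\(\d+\))?)\s*=\s*(.*?)(?=(!|$|\w+(\(\d+\))?\s*=))', testStr)
>>> keyval = {}
>>> for match in matches:
...     # match[0] is the key, match[1] is the raw value string
...     vals = match[1].strip()
...     # split on spaces that are not followed by a letter
...     keyval[match[0]] = re.split(r' (?![A-Za-z])', vals)
...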

Output:

{
    'name': ['"my name is"'], 
    'single(1)': ['10'], 
    'single(2)': ['3.589'], 
    'multipleValues': ['0.543', '0.754', '1.166'], 
    'boolean': ['.True.', '.F.'], 
    'multipleValues(2)': ["'value'", '"Value2"', '4.76454', '100']
}
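
One caveat with splitting on spaces not followed by a letter: a quoted value such as "my name is 2" would be cut before the 2. A possible alternative, sketched below, is to tokenize the value string so that quoted strings are matched as whole tokens, and then convert each token to a Python value. TOKEN_RE, split_values and convert are hypothetical names, not part of the answer above, and the conversion rules are my assumptions about what is wanted (for instance, Fortran exponents like 1.0d0 are left as strings).

```python
import re

# A token is a double-quoted string, a single-quoted string,
# or a run of non-whitespace characters.
TOKEN_RE = re.compile(r'"[^"]*"|\'[^\']*\'|\S+')


def split_values(values):
    """Split a raw value string into tokens, keeping quoted strings whole."""
    return TOKEN_RE.findall(values)


def convert(token):
    """Hypothetical helper: map one Fortran-style token to a Python value."""
    # Quoted string: strip the matching quotes.
    if len(token) >= 2 and token[0] in "'\"" and token[-1] == token[0]:
        return token[1:-1]
    # Logical literals, case-insensitive.
    lowered = token.lower()
    if lowered in ('.true.', '.t.'):
        return True
    if lowered in ('.false.', '.f.'):
        return False
    # Numbers: try int first, then float.
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    # Anything else is returned unchanged.
    return token


raw = '\'value\' "Value2" 4.76454 100'
print([convert(tok) for tok in split_values(raw)])
# ['value', 'Value2', 4.76454, 100]
```

This can replace the re.split call in the loop above, e.g. `keyval[match[0]] = [convert(t) for t in split_values(vals)]`.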
OGHaza
  • Nice! This is exactly what I needed. One last edit though: either `vale` should be `vals` or `vals` should be `vale` ;). – Onlyjus Dec 03 '13 at 17:16