
I am parsing some data that looks like this:

Fourier analysis for v(1):
  No. Harmonics: 20, THD: 24.6928 %, Gridsize: 200, Interpolation Degree: 1

Harmonic Frequency    Magnitude    Phase        Norm. Mag    Norm. Phase 
-------- ---------    ---------    -----        ---------    ----------- 
 0       0            -1.4108e-005 0            0            0           
 1       100          1.81678      179.986      1            0           
 2       200          2.67431e-005 -89.68       1.472e-005   -269.67     
 3       300          0.374737     179.937      0.206264     -0.049661   
 4       400          2.57338e-005 -89.357      1.41645e-005 -269.34     
 5       500          0.185804     179.876      0.102271     -0.1108     
 6       600          2.46676e-005 -89.033      1.35777e-005 -269.02     
 7       700          0.112225     179.799      0.0617716    -0.18748    
 8       800          2.37755e-005 -88.71       1.30866e-005 -268.7      
 9       900          0.0757484    179.708      0.0416937    -0.27803    
 10      1000         2.31014e-005 -88.392      1.27156e-005 -268.38     
 11      1100         0.0558207    179.611      0.0307251    -0.37527    
 12      1200         2.25406e-005 -88.082      1.24069e-005 -268.07     
 13      1300         0.0439558    179.513      0.0241943    -0.47325    
 14      1400         2.19768e-005 -87.779      1.20966e-005 -267.77     
 15      1500         0.0362049    179.416      0.019928     -0.5704     
 16      1600         2.13218e-005 -87.483      1.1736e-005  -267.47     
 17      1700         0.0305653    179.316      0.0168239    -0.67046    
 18      1800         2.0553e-005  -87.194      1.13128e-005 -267.18     
 19      1900         0.0260612    179.207      0.0143447    -0.77967

Several columns contain float data. I can write a regular expression for a float; here is the part of the pattern that handles one such column:

(?P<Magnitude>[-+]?(?:(?:\d*\.\d+)|(?:\d+\.?))(?:[Ee][+-]?\d+)?)

This part, [-+]?(?:(?:\d*\.\d+)|(?:\d+\.?))(?:[Ee][+-]?\d+)?, is pretty complicated, so it would be nice if there were some way to name it and reuse it, as if you could define your own meta-character and write something like \f+ instead (there is no such meta-character for floats, of course). By the way, I got this pattern from this question.

So I am looking for a good approach for containing this complexity. I could probably use string concatenation or formatting on the pattern string, but I am wondering if there is some better way. Maybe I am missing something obvious.

Here is the unwieldy expression

re.compile(r"^\s*(?P<Harmonic>\d+)\s+(?P<Frequency>\d+)\s+(?P<Magnitude>[-+]?(?:(?:\d*\.\d+)|(?:\d+\.?))(?:[Ee][+-]?\d+)?)\s+(?P<Phase>[-+]?(?:(?:\d*\.\d+)|(?:\d+\.?))(?:[Ee][+-]?\d+)?)\s+(?P<NormMag>[-+]?(?:(?:\d*\.\d+)|(?:\d+\.?))(?:[Ee][+-]?\d+)?)\s+(?P<NormPhase>[-+]?(?:(?:\d*\.\d+)|(?:\d+\.?))(?:[Ee][+-]?\d+)?)\s+$", re.MULTILINE)
ChipJust
    I think [this is related](http://stackoverflow.com/questions/19794603/reuse-part-of-a-regex-pattern).. – alecxe Nov 01 '16 at 04:03
    By the way, think about using the *verbose mode* and splitting the complex regex into multiple lines to increase readability. – alecxe Nov 01 '16 at 04:04
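
The verbose-mode suggestion could look roughly like the sketch below, applied only to the float sub-pattern from the question. The name `FLOAT` and the sample strings are assumptions chosen for illustration; the values are copied from the data table above.

```python
import re

# Verbose-mode version of the float sub-pattern from the question.
# In re.VERBOSE mode, whitespace in the pattern is ignored and "#" starts a comment.
FLOAT = re.compile(r"""
    [-+]?                # optional sign
    (?:
        \d*\.\d+         # digits around a decimal point, such as 0.374737
      | \d+\.?           # or an integer with an optional trailing point, such as 100
    )
    (?:[Ee][+-]?\d+)?    # optional exponent, such as e-005
""", re.VERBOSE)

# A few values taken from the table in the question.
samples = ["-1.4108e-005", "179.986", "0", "-269.67"]
matched = [FLOAT.fullmatch(s) is not None for s in samples]
```

This only makes the float pattern easier to read; it does not by itself remove the repetition in the full line pattern.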

1 Answer


For these kinds of larger regexes with repeated parts, I tend to have a list of chunks of regex, then combine them all into one programmatically.

So in JavaScript:

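// Keep each reusable piece as a pattern source string.
var string = /"[^"]+(?:\\"[^"]+)+"/.source,
    float = /\d+\.\d+/.source,
    int = /\d+/.source; // assumed: a plain run of digits

// Combine the pieces into one regex programmatically.
var string_and_int = new RegExp(float + "|(?:" + int + ")?" + string + "|" + string + float, "g");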

A bit rudimentary, but you get the idea. This can make it way easier to reuse bits of regex code, and arrange it in a somewhat more readable fashion.
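
Since the question is about Python's `re` module, the same idea might be sketched there as follows. This is only one possible arrangement: the helper `named` and the constants `FLOAT`, `INT`, and `ROW` are hypothetical names, the last group is spelled `NormPhase`, and the trailing `\s*$` (instead of the question's `\s+$`) is an assumption so that lines without trailing whitespace also match.

```python
import re

# Reusable pattern chunks, kept as plain strings.
FLOAT = r"[-+]?(?:(?:\d*\.\d+)|(?:\d+\.?))(?:[Ee][+-]?\d+)?"
INT = r"\d+"

def named(name, pattern):
    """Wrap a pattern chunk in a named group (hypothetical helper)."""
    return f"(?P<{name}>{pattern})"

# One entry per column of the table, in order.
columns = [
    named("Harmonic", INT),
    named("Frequency", INT),
    named("Magnitude", FLOAT),
    named("Phase", FLOAT),
    named("NormMag", FLOAT),
    named("NormPhase", FLOAT),
]

# Columns are separated by whitespace; \s*$ is an assumption (see lead-in).
ROW = re.compile(r"^\s*" + r"\s+".join(columns) + r"\s*$", re.MULTILINE)

# One data line copied from the question.
sample = " 3       300          0.374737     179.937      0.206264     -0.049661   "
m = ROW.match(sample)
row = {key: float(value) for key, value in m.groupdict().items()}
```

Keeping the float chunk in one place means a fix to it applies to every column, and adding a column is a one-line change to the list.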

Whothehellisthat