
I'm trying to parse a document structure like this:

Headline
c=myClass1 myClass2 myClass3

Some text plus a number3gr
More text plus another number2cm
More text plus another number2.2m

I have a regular expression that is capturing the important parts into groups:

/(.*)[\r\n]c=(.*)[\r\n]*([a-zA-Z\s]*)(\d*\.?\d*)(\w*)[\r\n]/g

Later I use the groups to build an HTML string:

'<xmp><!--begin recipe--\><h2>$1</h2><div class="$2"><div class="serves">Serves: <input type="text" class="servesinput" value="2" size="3"></div><span class="oldMulti">2</span></br><table class="ingredients"><tr><th>Amount:</th><th>Ingredient:</th></tr><tr><td class="amount $5 ">$4</td><td>$3</td></tr></div></xmp>'

This is where I am stuck: after the empty line, there can be any number of lines like these:

 Some text plus a number3gr

Is there a way to reuse the following part of my regular expression as many times as necessary, once for each row of that type?

([a-zA-Z\s]*)(\d*\.?\d*)(\w*)[\r\n]

Maybe I can make use of subgroups? But then I have no idea how to repeat the results inside the HTML string.

Antti
  • pro tip: http://debuggex.com and http://regex101.com. – Joeytje50 Dec 31 '14 at 14:51
  • My brain is stuck on how to use these subgroups inside the html string so that the relevant html is repeated as many times as necessary. – Antti Dec 31 '14 at 15:02
  • Of course you can repeat any part using quantifiers (what you do already). See if [this sample helps you](https://regex101.com/r/iR3gA4/1). – Jonny 5 Dec 31 '14 at 15:09
  • @Jonny5 however if you actually try that in JavaScript with `.exec()`, you'll see that each `.exec()` call returns only one match, even with the "g" flag (the flag just makes the next call resume from `lastIndex`). If you match with `String.prototype.match()` instead, then the "g" flag causes the function to return an array containing the *complete* matches, so again the "g" flag doesn't give you the groups. – Pointy Dec 31 '14 at 15:27
  • You cannot use an unknown number of capturing groups. You'll have to use a separate regex for the second part, and merely parse the tokens separately into an array or something. Regexes are not the best or most efficient way to do what you're trying to do though. – mbomb007 Dec 31 '14 at 15:33
  • @mbomb007 What would be a more efficient way? – Antti Dec 31 '14 at 15:47
  • Iterate through the file's text line by line. Then iterate through each line's tokens. – mbomb007 Dec 31 '14 at 15:49
  • @mbomb007 If you want to make an answer pointing out what I'm asking for is impossible, I can accept it as the correct answer. It would be awesome if you could throw me a bone on how to do this iteration (but up to you)! – Antti Dec 31 '14 at 15:54

1 Answer


For information on capturing a repeated group: http://www.regular-expressions.info/captureall.html
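As a minimal sketch of the issue that page describes: a repeated capturing group keeps only its last iteration, so one workaround is to match the repeated row on its own with a global regex and call `exec()` in a loop. The sample string, the `rowPattern` regex, and the property names below are illustrative assumptions, not taken from the question, and the sketch assumes `\n` line endings.

```javascript
// A repeated capturing group only keeps its final iteration.
var lastOnly = /^(?:(\d+),)+$/.exec("1,2,3,")[1];
console.log(lastOnly); // "3"

// Hypothetical sample rows in the same shape as the question's input.
var rowsText = "Flour 3gr\nSugar 2cm\nMilk 2.2m\n";

// One row: text, then a number, then a unit. The "g" flag makes each exec()
// call resume where the previous match ended; "m" lets ^ and $ match per line.
var rowPattern = /^([a-zA-Z ]*?)[ \t]*(\d+(?:\.\d+)?)([a-zA-Z]*)$/gm;

var rows = [];
var match;
while ((match = rowPattern.exec(rowsText)) !== null) {
    rows.push({ name: match[1], amount: match[2], unit: match[3] });
}
console.log(rows.length); // 3
```

Each entry in `rows` then holds the groups for one line, which can be turned into one piece of HTML per row.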

For a simpler and more efficient way, I'd try parsing the file line by line manually: a repeated capturing group only keeps its last match, and regular expressions can be quite inefficient for this kind of structured input.

Once you have the text (see, for example, the question "How can you read a file line by line in JavaScript?"), I would split it into an array of lines as that example does and iterate through them in a for loop:

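// `lines` is assumed to already hold the file's text as an array of lines
// (see the linked example for how to read and split the file).
var headline = lines[0];
var classes = lines[1].split(" ");
classes[0] = classes[0].substring(2); // cut off "c=" in first token

var lineList = []; // collects the rows that follow the blank line

// Index 0 is the headline, 1 is the class line and 2 is the blank line.
// A counting loop is used because for...in yields array indices, not lines.
for (var i = 3; i < lines.length; i++) {
    if (lines[i] !== "") {
        // line is after the blank line
        lineList.push(lines[i]); // do something with it, e.g. split it into tokens
    }
}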
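To show how the loop could produce the repeated HTML the question asks about, here is a rough sketch that builds one table row per ingredient line. The function name `buildRecipeHtml`, the per-line regex, and the trimmed-down markup are assumptions made for illustration (the question's full template has more elements), and the input is assumed to be trusted, since nothing is HTML-escaped.

```javascript
// Hypothetical helper: turns the question's document format into HTML.
function buildRecipeHtml(text) {
    var lines = text.split(/\r?\n/);
    var headline = lines[0];
    var classes = lines[1].substring(2); // drop the leading "c="

    var rowHtml = "";
    // Index 0 is the headline, 1 the class line, 2 the blank line.
    for (var i = 3; i < lines.length; i++) {
        // Text, then a number, then a unit, e.g. "number2.2m".
        var parts = /^(.*?)(\d+(?:\.\d+)?)([a-zA-Z]*)$/.exec(lines[i]);
        if (parts === null) {
            continue; // skip blank or malformed lines
        }
        rowHtml += '<tr><td class="amount ' + parts[3] + '">' + parts[2] +
            '</td><td>' + parts[1].trim() + '</td></tr>';
    }

    return '<h2>' + headline + '</h2><div class="' + classes + '">' +
        '<table class="ingredients"><tr><th>Amount:</th><th>Ingredient:</th></tr>' +
        rowHtml + '</table></div>';
}

var sample = "Headline\nc=myClass1 myClass2 myClass3\n\n" +
    "Some text plus a number3gr\nMore text plus another number2.2m\n";
console.log(buildRecipeHtml(sample));
```

Because the rows are concatenated in the loop, the number of ingredient lines does not need to be known in advance.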
mbomb007
  • Quick question: is "lines" a built in javascript function? I can't find any info on it. Or did you type in "lines" by mistake? Should it be "lineList"? – Antti Dec 31 '14 at 16:50
  • I'm assuming you put the lines from the file into that variable. Look at the example I linked to. – mbomb007 Dec 31 '14 at 17:03