
I learned that indexOf() cannot be used to search for a regular expression in a string, and search() does not accept start and end positions as optional parameters. How can I find and replace all matches of a certain regular expression in the same string? I have added an example problem where a plain replace() is not enough.

Problem example

  1. Replace every two consecutive <br/><br/> with </p><p> if the second <br/> is followed by letters or digits (\w).
  2. Leave a single <br/>, and three or more consecutive <br/>, as they are.
  3. If no letters or digits follow the two consecutive <br/><br/>, leave them as they are.

If we use replace() to solve this problem, not only <br/><br/> but also the characters following it will be replaced. To avoid this, we need to:

  1. Find the start of the match for the regular expression /(?:<br\s*[\/]?>){2}\s*\w+/.
  2. Starting from the start of the match, find the start position of the \w part.
  3. Replace the /(?:<br\s*[\/]?>){2}\s*/ part with </p><p>.
  4. Repeat steps 1-3 in a loop, starting each time from the end of the previous match, until no more matches exist.
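The steps above could be sketched roughly like this. It assumes a capture group is added around `\w+` (not part of the original regex) so that step 2 can be computed from the match, and the function name is hypothetical. With the `g` flag, `RegExp.prototype.exec()` starts each search at `re.lastIndex`, which is how step 4 continues from the end of the previous match. JavaScript strings are immutable, so the result is still assembled from pieces; the difference is that the search is never rerun on a sliced copy. Like the regex in step 1, this does not yet check for a third `<br/>` before the pair.

```javascript
// Sketch of steps 1-4. Hypothetical function name; the capture group around
// \w+ is an assumption added so the start of the \w part can be computed.
function replacePairsStepByStep(html) {
  const re = /(?:<br\s*[\/]?>){2}\s*(\w+)/g; // g: exec() resumes at re.lastIndex
  let out = '';
  let last = 0; // end position of the previous match
  let m;
  while ((m = re.exec(html)) !== null) {          // step 1: find the next match
    const wordStart = re.lastIndex - m[1].length; // step 2: start of the \w part
    // Step 3: keep the text between matches, write </p><p> in place of the
    // <br> pair plus whitespace, then keep the \w part unchanged.
    out += html.slice(last, m.index) + '</p><p>' + html.slice(wordStart, re.lastIndex);
    last = re.lastIndex; // step 4: the next exec() starts from here
  }
  return out + html.slice(last); // append whatever follows the last match
}

console.log(replacePairsStepByStep('one<br/><br/>two<br/><br/>three'));
// one</p><p>two</p><p>three
```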

As I said above, I don't know how to search for the next match starting from a certain position. Is there a way to do this other than slicing the string and joining it back together?
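A minimal sketch of searching from a given start position, using a hypothetical helper named `searchFrom`: copy the regex with the `g` flag added and set `lastIndex` on the copy, because `exec()` on a global regex begins scanning at `lastIndex`. RegExp has no built-in end bound, so this only covers the start position.

```javascript
// Hypothetical helper: like String.prototype.search(), but scanning starts at `start`.
// A copy of the regex is used so the caller's regex object (and its lastIndex)
// is left untouched.
function searchFrom(str, re, start) {
  const flags = re.flags.includes('g') ? re.flags : re.flags + 'g';
  const copy = new RegExp(re.source, flags);
  copy.lastIndex = start; // exec() on a global regex begins at lastIndex
  const m = copy.exec(str);
  return m ? m.index : -1; // same "not found" value as search()
}

const html = 'ab<br/><br/>cd<br/><br/>ef';
const pairThenWord = /(?:<br\s*[\/]?>){2}\s*\w+/;
console.log(searchFrom(html, pairThenWord, 0)); // 2
console.log(searchFrom(html, pairThenWord, 3)); // 14
```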

var testString = $('#container').html();
console.log(testString.search(/(?:<br\s*[\/]?>){2}\s*\w+/));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="container">
  <p> 
    <!-- Only one br: leave as is --> 
    Brick quiz whangs jumpy veldt fox! <br/>

    <!-- Two br followed by letters: replace with </p><p> --> 
    Sphinx of black quartz judge my vow! <br/><br />

    <!-- No letters or digits after the 2nd br: leave as is --> 
    Pack my box with five dozen liquor jugs. <br/><br /><br/>

    <!-- Two br followed by letters: replace with </p><p> --> 
    The vixen jumped quickly on her foe barking with zeal. <br/><br />

    <!-- No letters after <br/><br/>: leave as is --> 
    Brawny gods just flocked up to quiz and vex him.<br/><br />
  </p>
</div>
Takeshi Tokugawa YD
  • matching html with reg exp is normally a bad idea... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – epascarello May 09 '17 at 01:43
  • Interesting. Then how do HTML validators and HTML code inspection tools work? – Takeshi Tokugawa YD May 09 '17 at 01:53
  • @GurebuBokofu They use HTML parsers, for the simple reason that it is theoretically impossible, in the computer science sense, to reliably parse HTML with regexp. Which is not to say that it *might* be an option in your case. With regard to your specific problem, have you considered using lookaheads? –  May 09 '17 at 02:28
  • @torazaburo I'm not familiar with syntax analysis yet, but it looks like I need to study it. – Takeshi Tokugawa YD May 09 '17 at 02:44
  • OK, even if I should not match HTML with regexp, would this problem become easier to solve if there were `\n\n` instead of `<br/><br/>`? – Takeshi Tokugawa YD May 09 '17 at 02:47
  • See https://jsfiddle.net/7wsvvof1/. – Wiktor Stribiżew May 09 '17 at 06:36
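The lookahead idea @torazaburo raised could look roughly like this (a sketch; the function name is hypothetical). The lookahead `(?=\s*\w)` requires optional whitespace and then a letter or digit after the pair but does not consume them, so no capture group or `$1` is needed. It shares the original regex's blind spot: for three consecutive `<br/>` followed by text, the last two are still replaced.

```javascript
// (?=\s*\w) checks that whitespace + a letter/digit follows the pair without
// consuming it, so only the two <br> tags themselves are replaced.
function replaceWithLookahead(html) {
  return html.replace(/(?:<br\s*[\/]?>){2}(?=\s*\w)/g, '</p><p>');
}

console.log(replaceWithLookahead('a<br/><br /> b'));    // a</p><p> b
console.log(replaceWithLookahead('p<br/><br/><br/>q')); // p<br/></p><p>q (rule 2 not enforced)
```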

1 Answer


As @epascarello and @torazaburo commented, it's NOT recommended to use RegExp for parsing HTML; you'd better use an HTML parser to be on the safe side.

But if the HTML string you want to parse always follows a fixed template / format, you can still use RegExp to parse it.

Assuming the current RegExp that you have posted returns the expected search results for you, you can try the following code to replace the string and use </p><p> as required.

var testString = $('#container').html();

console.log(testString.replace(/(?:<br\s*[\/]?>){2}(\s*\w+)/gi, '</p><p>$1'));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="container">
  <p> 
    Brick quiz whangs jumpy veldt fox! <br/>
    Sphinx of black quartz judge my vow! <br/><br />
    Pack my box with five dozen liquor jugs. <br/><br /><br/>
    The vixen jumped quickly on her foe barking with zeal. <br/><br />
    Brawny gods just flocked up to quiz and vex him.<br/><br />
  </p>
</div>

Note:

  1. I've kept your RegExp as is, assuming it finds the <br> tags as per your requirement, and just added () around \s*\w+ because we want to remember (keep) that string in the output.
  2. I've used the gi flags in the RegExp: g replaces all occurrences instead of only the first one, and i makes the match case-insensitive. You can find details here
  3. $1 in the replacement string inserts the remembered string that was matched by \s*\w+.
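The effect of the capture group, `$1`, and the `g`/`i` flags from the notes can be checked on a small made-up string, without jQuery:

```javascript
// Same pattern as in the answer: (\s*\w+) is capture group 1.
const pairThenWord = /(?:<br\s*[\/]?>){2}(\s*\w+)/gi;
const sample = 'One<br/><br/> two<BR><BR>three';

// With g, every match is replaced; i lets <BR> match as well.
// "$1" puts back what group 1 matched (" two", then "three").
const allReplaced = sample.replace(pairThenWord, '</p><p>$1');

// Without g, only the first match is replaced.
const firstOnly = sample.replace(/(?:<br\s*[\/]?>){2}(\s*\w+)/i, '</p><p>$1');

console.log(allReplaced); // One</p><p> two</p><p>three
console.log(firstOnly);   // One</p><p> two<BR><BR>three
```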
Wiktor Stribiżew
Vivek Athalye
  • Magic! Very simple solution! I didn't know it was possible to keep part of the match in the output. Thank you very much for your time! – Takeshi Tokugawa YD May 09 '17 at 06:47
  • @GurebuBokofu: However, it does not check if there is a third `<br/>` before the matched two. The solution is not correct as per the requirements. – Wiktor Stribiżew May 09 '17 at 07:29
  • @Wiktor Stribiżew: Really? After the second `<br/>` has been checked, `\s*\w+` will be checked, won't it? The `<` of `<br/>` is neither a letter nor a digit, so it is not `\w`. What have I missed? – Takeshi Tokugawa YD May 09 '17 at 07:44
  • @GurebuBokofu: See [this regex demo](https://regex101.com/r/WP0grl/1). You think `{2}` is checking the context - but it is not. The comments in your HTML must be removed for a proper testing, Vivek cheated a bit. – Wiktor Stribiżew May 09 '17 at 07:47
  • @Wiktor Stribiżew: I see. In my real project this could be solved simply, but in this example problem I didn't think about what must come before the double `<br/>`. I'll keep Vivek's solution accepted until someone writes a perfect solution. Thank you for your attention. – Takeshi Tokugawa YD May 09 '17 at 08:03
  • @WiktorStribiżew I had clearly mentioned that I'm `Assuming the current RegExp that you have posted returns expected search results for you, you can try following code to replace the string and use </p><p> as required.` So focus of my answer was on showing OP how to remember some part of matched string and using it in replace function and also replacing all the occurrences at one go using `/g` flag. And just to demo how multiple replacements work I had to change the test data as per the search criteria used by OP. Not sure how I `cheated a bit`. – Vivek Athalye May 10 '17 at 08:53
  • Your code replaced 2 `<br/>` after 1 or more `<br/>`, that is how you "cheated". Now, it produces `Pack my box with five dozen liquor jugs. <br/></p><p>`. – Wiktor Stribiżew May 10 '17 at 08:57
  • @WiktorStribiżew Ok. Now I understand what you mean. But as I said my focus was not on fixing the RegExp used by OP but to show him how the replacement can be done. I don't prefer and don't recommend playing with RegExp to process HTML anyway. So I didn't find it worth spending time on validating / fixing the RegExp used by OP. – Vivek Athalye May 10 '17 at 09:03
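Regarding the point that a third `<br/>` is not checked, one possible sketch (not from the answer; the function name is hypothetical) matches the whole run of consecutive `<br>` tags that is followed by a letter or digit, and lets a replacement callback replace the run only when it contains exactly two tags. It keeps the regex-on-HTML caveats discussed above, and assumes the tags in a run have no whitespace between them and that HTML comments have been removed, as in the answer's test data.

```javascript
// One or more consecutive <br> tags, followed (lookahead, not consumed) by
// optional whitespace and a letter/digit. The + is greedy, so a run of three
// followed by text is matched as one unit starting at its first tag.
const brRunBeforeWord = /(?:<br\s*[\/]?>)+(?=\s*\w)/gi;
const oneBr = /<br\s*[\/]?>/gi;

// Hypothetical helper: replaces only runs of exactly two <br> tags.
function replaceExactPairs(html) {
  return html.replace(brRunBeforeWord, (run) =>
    run.match(oneBr).length === 2 ? '</p><p>' : run // keep 1 or 3+ tags as is
  );
}

console.log(replaceExactPairs('vow! <br/><br />\nPack'));        // pair replaced
console.log(replaceExactPairs('jugs. <br/><br /><br/>\nThe'));   // left unchanged
```

Returning `run` unchanged from the callback is what leaves single and triple runs alone; the lookahead keeps the whitespace and following text in place, so no `$1` is needed.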