I have a quote syntax for my users, similar to SO:

So, Mike, you say:

>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
>Nam mi dui, porta non gravida id
>sodales venenatis tellus

But this makes no sense!

There can also be multiple quotes. I need to translate this into HTML markup like this, using JavaScript:

So, Mike, you say:

<div class="quote">
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Nam mi dui, porta non gravida id
    sodales venenatis tellus
</div>

But this makes no sense!

Here is the best I have come up with, but it wraps every line in the markup separately rather than wrapping the whole block of lines.

x = x.replace(/^&(amp;)?gt;([^\n]+)$/mg, "<div class=\"quote\"> $2 </div>");

Is it possible to write such a regular expression? If yes, what would it look like?

Lucas Trzesniewski
Silver Light

2 Answers

This is trying to manipulate HTML with a regular expression. (I say that based on the fact you're searching for HTML entities for > rather than literally for >.) That is almost always a bad idea, nearly as bad as trying to parse HTML with just a regular expression. Obligatory Link.

You cannot do this with a single replace call. But to my surprise, you can do it with two, or with a single outer call that uses a function callback to make a bunch of inner calls.

Here's the two-call version:

x = x.replace(/^(?:(?:>|&gt;|&amp;gt;).*?[\r\n]+)+/gm, '<div class="quote">***$&</div>');
x = x.replace(/(?:\*\*\*|^)(?:>|&gt;|&amp;gt;)/gm, '');

The first finds each run of consecutive lines that start with a quote marker and wraps the div markup around the run. (I included a raw > in the set, so it looks for >, &gt;, and &amp;gt; [see note below about that last one, though].) The second removes the quote markers. You can't remove them in the first replace because you're replacing the entire group of lines. Also note that I had to prefix the first marker with ***, since once we've added the div markup, that marker is no longer at the beginning of a line.
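
To make the two steps easier to follow, here is a small sketch that keeps the intermediate string around. The variable names (`input`, `wrapped`, `output`) and the shortened sample text are only for illustration; the two regular expressions are the ones shown above.

```javascript
// Shortened sample input using the plain ">" marker (illustrative only).
var input =
  "So, Mike, you say:\n" +
  "\n" +
  ">Lorem ipsum\n" +
  ">dolor sit amet\n" +
  "\n" +
  "But this makes no sense!\n";

// Step 1: wrap each run of quoted lines and tag the first marker with ***.
var wrapped = input.replace(
  /^(?:(?:>|&gt;|&amp;gt;).*?[\r\n]+)+/gm,
  '<div class="quote">***$&</div>'
);

// Step 2: remove the tagged first marker and every marker at a line start.
var output = wrapped.replace(/(?:\*\*\*|^)(?:>|&gt;|&amp;gt;)/gm, '');

console.log(wrapped);
console.log(output);
```

Logging `wrapped` shows the first marker sitting right after the opening div tag as `***>`, which is why the second pattern accepts either `***` or a line start in front of a marker. Note that the trailing `[\r\n]+` also pulls the blank line after the quote into the div.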

Here's the one-call-with-subcalls version:

x = x.replace(/^(?:(?:>|&gt;|&amp;gt;).*?[\r\n]+)+/gm, function(m) {
  return '<div class="quote">' + m.replace(/^(?:>|&gt;|&amp;gt;)/gm, '') + '</div>';
});

How robust are they? Probably not very; see the first paragraph of this answer. :-)
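
If robustness matters more than brevity, one alternative is to skip the big multi-line pattern and walk the text line by line. This is a different technique from the two versions above, not a refinement of them. It is only a sketch: `wrapQuotes` is a hypothetical helper name, and it assumes the same three quote markers and that the div should sit on its own lines.

```javascript
// Hypothetical helper, not part of the original answer: groups consecutive
// quoted lines into one <div class="quote"> block by scanning line by line.
function wrapQuotes(text) {
  var marker = /^(?:>|&gt;|&amp;gt;)/; // same markers as above, no g flag
  var out = [];
  var block = null; // lines of the quote being collected, or null

  // Emit the collected quote block, if any, as a single div.
  function flush() {
    if (block !== null) {
      out.push('<div class="quote">\n' + block.join("\n") + '\n</div>');
      block = null;
    }
  }

  text.split(/\r?\n/).forEach(function (line) {
    if (marker.test(line)) {
      if (block === null) block = [];
      block.push(line.replace(marker, "")); // drop the leading marker
    } else {
      flush(); // a non-quoted line ends the current block
      out.push(line);
    }
  });
  flush(); // handle a quote that runs to the end of the input
  return out.join("\n");
}

console.log(wrapQuotes("So, Mike, you say:\n\n>Lorem ipsum\n>dolor sit amet\n\nBut this makes no sense!"));
```

Because the block is closed explicitly, a quote at the very end of the input is handled even without a trailing newline, and two quotes separated by a blank line stay separate.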

Side note: Your regex seems to be looking for &amp;gt; as a quote marker. I've preserved that above, but if you have &amp;gt; in your HTML, you have double-encoded HTML, which is usually an indicator of a problem elsewhere.
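
As an illustration of how &amp;gt; typically arises, here is a sketch in which the same text is escaped twice. `escapeHtml` is a hypothetical minimal escaper written for this example, not something from your code.

```javascript
// Hypothetical minimal escaper, for illustration only.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

var once = escapeHtml(">quoted line");  // "&gt;quoted line"
var twice = escapeHtml(once);           // "&amp;gt;quoted line"

console.log(once);
console.log(twice);
```

If the text passes through an escaping step more than once before it reaches the replace call, the marker shows up as &amp;gt;, and the cleaner fix is usually to remove the extra escaping rather than to match it.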

Live Example of the two-calls version with variations for the various different quote markers:

function test(x) {
  snippet.log("Before:");
  snippet.log(x);
  x = processString(x);
  snippet.log("After:");
  snippet.log(x);
  document.body.appendChild(document.createElement('hr'));
}

function processString(x) {
  x = x.replace(/^(?:(?:>|&gt;|&amp;gt;).*?[\r\n]+)+/gm, '<div class="quote">***$&</div>');
  x = x.replace(/(?:\*\*\*|^)(?:>|&gt;|&amp;gt;)/gm, '');
  return x;
}

test(
  "So, Mike, you say:\n" +
  "\n" +
  ">Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
  "&amp;gt;Nam mi dui, porta non gravida id\n" +
  "&gt;sodales venenatis tellus\n" +
  "\n" +
  "But this makes no sense!\n"
);
test(
  "So, Mike, you say:\n" +
  "\n" +
  "&amp;gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
  ">Nam mi dui, porta non gravida id\n" +
  "&gt;sodales venenatis tellus\n" +
  "\n" +
  "But this makes no sense!\n"
);
test(
  "So, Mike, you say:\n" +
  "\n" +
  "&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
  "&amp;gt;Nam mi dui, porta non gravida id\n" +
  ">sodales venenatis tellus\n" +
  "\n" +
  "But this makes no sense!\n"
);
<!-- Script provides the `snippet` object, see http://meta.stackexchange.com/a/242144/134069 -->
<script src="http://tjcrowder.github.io/simple-snippets-console/snippet.js"></script>

Live Example of the one-call-with-subcalls version:

function test(x) {
  snippet.log("Before:");
  snippet.log(x);
  x = processString(x);
  snippet.log("After:");
  snippet.log(x);
  document.body.appendChild(document.createElement('hr'));
}

function processString(x) {
  x = x.replace(/^(?:(?:>|&gt;|&amp;gt;).*?[\r\n]+)+/gm, function(m) {
    return '<div class="quote">' + m.replace(/^(?:>|&gt;|&amp;gt;)/gm, '') + '</div>';
  });
  return x;
}

test(
  "So, Mike, you say:\n" +
  "\n" +
  ">Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
  "&amp;gt;Nam mi dui, porta non gravida id\n" +
  "&gt;sodales venenatis tellus\n" +
  "\n" +
  "But this makes no sense!\n"
);
test(
  "So, Mike, you say:\n" +
  "\n" +
  "&amp;gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
  ">Nam mi dui, porta non gravida id\n" +
  "&gt;sodales venenatis tellus\n" +
  "\n" +
  "But this makes no sense!\n"
);
test(
  "So, Mike, you say:\n" +
  "\n" +
  "&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
  "&amp;gt;Nam mi dui, porta non gravida id\n" +
  ">sodales venenatis tellus\n" +
  "\n" +
  "But this makes no sense!\n"
);
<!-- Script provides the `snippet` object, see http://meta.stackexchange.com/a/242144/134069 -->
<script src="http://tjcrowder.github.io/simple-snippets-console/snippet.js"></script>
T.J. Crowder
  • Be careful with `[\r\n]+` since it may match more than one newline, in this case the contiguity of lines starting with `>` will be broken. – Casimir et Hippolyte Sep 03 '15 at 11:02
  • @CasimiretHippolyte: In HTML, a *series* of whitespace is just one bit of whitespace. (Based on what the OP posted, apparently they've already at least partially converted the original text to HTML.) – T.J. Crowder Sep 03 '15 at 11:07
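
The point raised in these comments can be checked with a small sketch. The names `greedy`, `strict` and `wrap` are made up for this illustration, and `strict` is one possible variant that consumes exactly one line ending per quoted line, not something proposed in the answer.

```javascript
// Two quotes separated by a blank line.
var input = ">first quote\n\n>second quote\n";

// Pattern from the answer: [\r\n]+ can swallow the blank line between quotes.
var greedy = /^(?:(?:>|&gt;|&amp;gt;).*?[\r\n]+)+/gm;

// Assumed variant: each quoted line ends at exactly one newline (or end of input).
var strict = /^(?:(?:>|&gt;|&amp;gt;).*(?:\r?\n|$))+/gm;

// Wrap each match in a div and strip the markers, as in the callback version.
function wrap(text, re) {
  return text.replace(re, function (m) {
    return '<div class="quote">' + m.replace(/^(?:>|&gt;|&amp;gt;)/gm, '') + '</div>';
  });
}

var merged = wrap(input, greedy);   // one div containing both quotes
var separate = wrap(input, strict); // two divs, one per quote

console.log(merged);
console.log(separate);
```

Whether the merged result is a problem depends on whether two adjacent quotes need to stay visually distinct; as noted in the reply, the extra whitespace itself collapses when rendered as HTML.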
Here is how it can be done: match the whole block of quoted lines, then, inside a callback function, split it into lines, remove the leading > marker from each line, and wrap the result in a div:

var s = 'So, Mike, you say:\n\n>Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n>Nam mi dui, porta non gravida id\n>sodales venenatis tellus\n\nBut this makes no sense!';
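// Match a run of consecutive lines that each start with a quote marker.
// The optional \n inside the group lets the run span several lines.
var res = s.replace(/^(?:(?:>|&(?:amp;)?gt;)[^\n]*\n?)+/gm, function(m) {
     // Strip only the marker at the start of each line, then wrap the block once.
     return '<div class="quote">' + m.split("\n").map(function(el) {
         return el.replace(/^(?:>|&(?:amp;)?gt;)/, '');
     }).join("\n") + "</div>";
});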
alert(res);
Wiktor Stribiżew