Regular expressions: grab part of string

Question

I have the following input in my textarea element:

<quote>hey</quote>

what's up?

I want to extract the text between <quote> and </quote>, so the result would be 'hey' and nothing else in this case.

I tried .replace with the following regular expression, but it doesn't give the right result and I can't see why:

quoteContent = value.replace(/<quote>|<\/quote>.*/gi, '');

The result is 'hey what's up?': it only removes the quote tags and leaves the last part, 'what's up?', in place.

Does anyone know how to solve this?

In ESNext, with the [`s` mode](https://github.com/tc39/proposal-regexp-dotall-flag). Before that, use `[\s\S]` or similar to match everything. – ASDFGerte, Jun 14 '18 at 19:12
obligatory https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – epascarello, Jun 14 '18 at 19:15
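A minimal Node.js sketch of the comment above, assuming the textarea value is exactly `<quote>hey</quote>`, a blank line and `what's up?`, separated by `\n`. Alternation has the lowest precedence, so the original pattern already means `(<quote>)|(<\/quote>.*)`; that part is fine. The only problem is that `.` stops at the first line break. The variable names are illustrative, and the `s` flag needs an ES2018-capable engine.

```javascript
// Assumed textarea value for illustration
const value = "<quote>hey</quote>\n\nwhat's up?";

// Original attempt: after "</quote>", `.*` matches nothing because the next
// character is "\n", so the blank lines and "what's up?" survive.
const original = value.replace(/<quote>|<\/quote>.*/gi, '');

// ES2018 `s` (dotAll) flag: `.` also matches line terminators.
const withDotAll = value.replace(/<quote>|<\/quote>.*/gis, '');

// Pre-ES2018 alternative: [\s\S] matches any character, line breaks included.
const withClass = value.replace(/<quote>|<\/quote>[\s\S]*/gi, '');

console.log(JSON.stringify(original)); // "hey\n\nwhat's up?"
console.log(withDotAll); // hey
console.log(withClass); // hey
```

Both fixes keep the asker's `replace` approach; they only change what the trailing part of the pattern is allowed to consume.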

baao · score 3 · Answer 1 · 2018-06-14T19:35:27.730

Even if it's only a small HTML snippet, don't use a regex for HTML parsing. Instead, take the value, parse it with DOM methods and extract the text from the element. It's a bit more code, but it's the better and safer way to do it:

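const el = document.getElementById('foo');
// Parse the textarea's value inside a <template>, whose content is not rendered
const tmp = document.createElement('template');
tmp.innerHTML = el.value;
// Find the first <quote> element and read its plain text
const quote = tmp.content.querySelector('quote');
console.log(quote ? quote.textContent : null);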

<textarea id="foo">
<quote>hey</quote>

what's up?
</textarea>

edited Jun 14 '18 at 19:35

answered Jun 14 '18 at 19:14

baao


I see, but here it's more of a random string I build to create a forum reply, not an actual element. – tilly Jun 14 '18 at 19:17

You're absolutely free to use a regex approach for your HTML parsing; just don't be surprised if your code breaks later. That's why I posted the correct approach, @tilly. Changing the tag name passed to getElementsByTagName is much easier and more flexible than changing the regex. – baao Jun 14 '18 at 19:18

And don't use `.innerHTML` on anything other than a ` – Mike Samuel Jun 14 '18 at 19:27

score 1 · Accepted Answer · answered Jun 14 '18 at 19:13


You could also try using the match method:

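const match = value.match(/<quote>(.+?)<\/quote>/); // non-greedy: stop at the first </quote>
quoteContent = match ? match[1] : ''; // match is null when there is no <quote> element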

answered Jun 14 '18 at 19:13

clarmond


You may need the non-greedy modifier: `(.*?)` – Scrimothy Jun 14 '18 at 19:15
Yes, good point. You'll also probably want to make sure the match succeeded before accessing it; otherwise it will throw an error. – clarmond Jun 14 '18 at 19:17
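Putting both comments together, here is a small sketch that uses a non-greedy capture, checks the match before indexing into it, and uses `[\s\S]` so a quote spanning several lines still matches. The helpers `firstQuote` and `extractQuotes` are hypothetical names for illustration; the usual caveats about parsing HTML with regexes (comments, nested quotes, attributes) still apply.

```javascript
// Hypothetical helper: text of the first <quote>...</quote>, or null if none.
function firstQuote(value) {
  // [\s\S]*? : any characters (line breaks included), as few as possible
  const m = value.match(/<quote>([\s\S]*?)<\/quote>/i);
  return m ? m[1] : null; // check the match before accessing m[1]
}

// Hypothetical helper: text of every <quote>...</quote> pair, in order.
function extractQuotes(value) {
  const re = /<quote>([\s\S]*?)<\/quote>/gi; // g flag: exec() resumes at lastIndex
  const quotes = [];
  let m;
  while ((m = re.exec(value)) !== null) {
    quotes.push(m[1]);
  }
  return quotes;
}

console.log(firstQuote("<quote>hey</quote>\n\nwhat's up?")); // hey
console.log(firstQuote("no quote here")); // null
console.log(JSON.stringify(extractQuotes("<quote>a</quote> and <quote>b\nc</quote>"))); // ["a","b\nc"]
```

Without the `?`, the capture would run from the first `<quote>` to the last `</quote>` in the string whenever there is more than one quote.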

score 1 · Answer 3 · answered Jun 14 '18 at 19:19

You should try to avoid parsing HTML using regular expressions.

<quote><!-- parsing HTML is hard when </quote> can appear in a comment -->hey</quote>
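To see the problem concretely, here is what a regex extraction returns for that string in Node.js (variable names are illustrative). Neither the non-greedy nor the greedy capture yields `hey`, because a regex has no notion of where a comment starts and ends.

```javascript
// The example fragment above, as a JavaScript string
const html = '<quote><!-- parsing HTML is hard when </quote> can appear in a comment -->hey</quote>';

// Non-greedy: stops at the first "</quote>", which sits inside the comment
const nonGreedy = html.match(/<quote>([\s\S]*?)<\/quote>/)[1];

// Greedy: runs to the last "</quote>", dragging the comment along
const greedy = html.match(/<quote>([\s\S]*)<\/quote>/)[1];

console.log(JSON.stringify(nonGreedy)); // "<!-- parsing HTML is hard when "
console.log(JSON.stringify(greedy)); // "<!-- parsing HTML is hard when </quote> can appear in a comment -->hey"
```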

You can just use the DOM to do it for you.

// Parse your fragment into a new Document
const doc = new DOMParser().parseFromString(
    '<quote>hey</quote>\nWhat\'s up?', 'text/html');
// Use DOM lookup to find the first <quote> element and get its
// text content.
const { textContent } = doc.getElementsByTagName('quote')[0];
// We get plain text and don't need to worry about "&lt;"s
console.log(textContent === 'hey'); // true

score -1 · Answer 4 · answered Jun 14 '18 at 19:11


The dot `.` does not match line breaks.

Try this:

// (.|\n)* matches a run of characters, each either a non-line-break character (.) or a \n
quoteContent = value.replace(/<quote>|<\/quote>(.|\n)*/gi, '');

answered Jun 14 '18 at 19:11

scunliffe


Yes! That was it. Thanks everyone for helping – tilly Jun 14 '18 at 19:13

Besides recommending parsing HTML with regular expressions, `.|\n` will not match `[\r\u2028\u2029]`. – Mike Samuel Jun 14 '18 at 19:25

**Never** use `(.|\n)*` unless you are using a Lucene regex engine. It is a performance killer. – Wiktor Stribiżew Jun 14 '18 at 19:32
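Both comments can be checked with a short Node.js sketch. It assumes input with Windows-style `\r\n` line endings (for example text read from a file or a server response); the variable names are illustrative. `.` matches neither `\r` nor `\n`, and the `\n` alternative does not match `\r`, so `(.|\n)*` stops right before the first `\r`. `[\s\S]*` has no such gap and avoids the per-character alternation and capture group.

```javascript
// Assumed input with \r\n line endings
const value = "<quote>hey</quote>\r\n\r\nwhat's up?";

// (.|\n)* cannot step over "\r", so it matches zero characters here
const alternation = value.replace(/<quote>|<\/quote>(.|\n)*/gi, '');

// [\s\S]* matches every character, including \r, \n, \u2028 and \u2029
const charClass = value.replace(/<quote>|<\/quote>[\s\S]*/gi, '');

console.log(JSON.stringify(alternation)); // "hey\r\n\r\nwhat's up?"
console.log(JSON.stringify(charClass)); // "hey"
```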
