
I need to split up a string like this

<p>foo</p><p>bar</p>

into an array containing "foo" and "bar".

I thought a regex could help me, but it seems I don't understand regexes. This is my attempt:

var inputText = "<p>foo</p><p>bar</p>";
splittedSelection = inputText.split("/<p>|<\/p>/g");

But all I get is an array with a single entry, identical to inputText.
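For reference, here is a minimal sketch (runnable in Node.js or a browser console) of why this happens. When `split` receives a string, it searches for that exact text and does not interpret it as a pattern. The characters `/<p>|</p>/g` never occur in the input. The variable names `asString` and `asRegex` are illustrative only.

```javascript
const inputText = "<p>foo</p><p>bar</p>";

// A string separator is matched as literal text. "/<p>|<\/p>/g" (which is
// "/<p>|</p>/g" once the string escape is processed) never occurs in
// inputText, so split returns the whole input as a single element.
const asString = inputText.split("/<p>|<\/p>/g");
console.log(asString); // ["<p>foo</p><p>bar</p>"]

// A regex literal (no quotes) is interpreted as a pattern instead.
const asRegex = inputText.split(/<p>|<\/p>/g);
console.log(asRegex); // ["", "foo", "", "bar", ""]
```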

I made a little fiddle for you.

Thanks for any help.

Yashia
  • You're not using a regex here, you're using a string. `splittedSelection = inputText.split(/<p>|<\/p>/g);` – Axnyff Aug 03 '17 at 15:10
  • https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454 – epascarello Aug 03 '17 at 15:10
  • Thanks for that, @epascarello. Everybody go click that link – jhhoff02 Aug 03 '17 at 15:13
  • [Do not parse HTML with Regex](https://stackoverflow.com/a/1732454/6320039) – Ulysse BN Aug 03 '17 at 15:16
  • Please take a look at @baao's answer :) – Erazihel Aug 03 '17 at 15:21
  • I should have been more specific about my aim. I need exactly the content between the opening and closing tags, because I have to modify it, including any other HTML tags that may or may not be inside them. – Yashia Aug 03 '17 at 17:59

6 Answers


You should pass the regex literal /<p>|<\/p>/g instead of putting it inside quotation marks. However, this will produce ["", "foo", "", "bar", ""], which is undesirable, so you can .filter() out the empty results, like this:

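var inputText = "<p>foo</p><p>bar</p>";

// Split on opening and closing <p> tags, then drop the empty strings
// produced where two tags are adjacent or at either end of the input.
var splittedSelection = inputText.split(/<p>|<\/p>/g).filter(function(value) {
  return value !== "";
});

// Use <br> rather than "\n": newlines are not rendered as line breaks in HTML.
document.getElementById("bar").innerHTML += "0: " + splittedSelection[0] + "<br>" + "1: " + splittedSelection[1] + "<br>";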
<div id="bar">
</div>
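A DOM-free sketch for Node.js or a console, using a hypothetical helper `splitParagraphs`, shows one caveat of this approach. `split` only removes the delimiters, so anything between a closing tag and the next opening tag also ends up in the array. The second input below is made up to show this.

```javascript
// Hypothetical helper wrapping the split-and-filter approach above.
function splitParagraphs(html) {
  // filter(Boolean) drops empty strings, like the value !== "" check
  return html.split(/<p>|<\/p>/g).filter(Boolean);
}

console.log(splitParagraphs("<p>foo</p><p>bar</p>")); // ["foo", "bar"]

// Only the <p> tags act as separators, so text outside any paragraph
// ("baz" in this made-up input) is kept in the result too.
console.log(splitParagraphs("<p>foo</p>baz<p>bar</p>")); // ["foo", "baz", "bar"]
```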
Angelos Chalaris

You can start from something like this:

  1. .+ handles different tag names and attributes
  2. .+? is a lazy quantifier: it matches as few characters as possible, so each match stops at the first closing tag

const text = "<p>foo</p><p>bar</p>";

const re = /<.+?>(.+?)<\/.+?>/g;

console.log(text.split(re).filter(t => t));
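To show what the lazy quantifier contributes here, this small comparison sketch (runnable in Node.js or a browser console) runs the same split with and without the `?`. It also relies on a documented detail of `split`: when the separator regex contains a capturing group, the captured text is included in the output array. That is why `foo` and `bar` survive while the tags disappear. The names `lazyResult` and `greedyResult` are illustrative.

```javascript
const text = "<p>foo</p><p>bar</p>";

// Lazy quantifiers: each .+? stops as early as it can, so every
// <p>...</p> pair is a separate match. Because (.+?) is a capturing
// group, split puts the captured text into the result array.
const lazyResult = text.split(/<.+?>(.+?)<\/.+?>/g).filter(t => t);
console.log(lazyResult); // ["foo", "bar"]

// Greedy quantifiers: the first .+ runs as far as it can and only
// backtracks as much as needed, so one match spans the whole string
// and just the last paragraph's text is captured.
const greedyResult = text.split(/<.+>(.+)<\/.+>/g).filter(t => t);
console.log(greedyResult); // ["bar"]
```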
Hitmands
  • "Lazy quantifier" = "By adding the ? after the +, we tell it to repeat as few times as possible, so the first match it comes across, is where we want to stop the matching." – lazy vs. greedy https://stackoverflow.com/a/2301298/1066234 – Avatar May 10 '22 at 05:44

ES6-based answer:

const regex = /<[^>]*>/gi;
let string = '<p>foo</p><p>bar</p>';
let result = string.split(regex).filter(e => e);
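Because every tag is treated as a separator, nested markup gets split apart as well. This matters for the follow-up comment on the question about other HTML tags possibly being inside the paragraphs. Here is a quick sketch with a made-up nested input:

```javascript
const regex = /<[^>]*>/gi;

// Made-up input: the first paragraph contains a nested <b> element.
const nested = "<p>foo <b>bar</b></p><p>baz</p>";

// [^>]* matches everything up to the next ">", so each tag (nested or
// not) becomes a separator; the <b> markup is lost and the first
// paragraph's text comes back in two pieces.
const pieces = nested.split(regex).filter(e => e);
console.log(pieces); // ["foo ", "bar", "baz"]
```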
oboshto

Forget about the answers that try to fix your regex. Don't do it with regex.

Instead, get the elements and map their textContent to an array:

let res = Array.from(document.getElementsByTagName('p')).map(e => e.textContent);
console.log(res);
<p>foo</p><p>bar</p>

If you only have the string and it is not part of the document, create an element and let the browser parse it (you don't even need to append the element to the DOM):

let s = "<p>foo</p><p>bar</p>";
let el = document.createElement('div');
el.innerHTML = s;

let res = Array.from(el.getElementsByTagName('p')).map(e => e.textContent);
console.log(res);

If you're doing this in Node.js, you can use cheerio:

const cheerio = require('cheerio')
let html = "<p>foo</p><p>bar</p>";
const $ = cheerio.load(html);
let res = [];
$('p').each((i,e) => res.push($(e).text()));
console.log(res);

If you are doing this in any other environment, chances are extremely high that a DOM/XML/HTML parser is available there, too.

baao
  • This is like offering apples to someone who's asking for milk, isn't it? What if this task has to be done in Node.js? – Hitmands Aug 03 '17 at 15:21
  • No, it isn't, @Hitmands. It's explaining to someone who is doing it wrong how to do it right. If you asked me how to jump off a bridge, I'd also say better not do it instead of answering your original question. I've added a version for node... – baao Aug 03 '17 at 15:28
  • All of us are aware that `Regex shouldn't be used as parsers`, but he is asking for that... You can add a comment with a suggestion to better handle the problem, but answers should be answers... – Hitmands Aug 03 '17 at 15:30

Assuming this is on the client you can use jQuery instead of regex.

var inputText = "<p>foo</p><p>bar</p>";
var splittedSelection = $('<div>'+inputText+'</div>').find("p").map(function() { 
  return $(this).text() 
});
$.each(splittedSelection, function(i,item) {
  $("#bar").append(i+": " +item + "<br/>");
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
<div id="bar"></div>
mplungjan

Another solution with regex:

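// (?<=<p>) looks behind for a literal "<p>" and (?=<\/p>) looks ahead for
// a literal "</p>"; square brackets like [<p>] would instead match any single
// one of those characters. Lookbehind requires an ES2018+ engine.
let regex = /(?<=<p>)(.*?)(?=<\/p>)/g
  , inputText = "<p>foo</p><p>bar</p>";

// filter drops empty matches, e.g. from empty <p></p> elements
let array = inputText.match(regex).filter(i => i);

console.log(array);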
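If the inner markup should be preserved, another regex-based option is `String.prototype.matchAll` (ES2020) with a capturing group, reading `m[1]` from each match. This is a swapped-in technique, not part of the answer above. The sketch assumes plain `<p>` tags without attributes and no nested `<p>` elements. Also, `.` does not match line breaks, so multi-line paragraphs would need the `s` flag. The sample input is made up.

```javascript
const inputText = "<p>foo</p><p>b<i>a</i>r</p>";

// (.*?) lazily captures everything up to the nearest "</p>", so tags
// nested inside a paragraph (here <i>) are kept as part of the text.
// Assumes <p> without attributes and no nested <p> elements.
const contents = Array.from(inputText.matchAll(/<p>(.*?)<\/p>/g), m => m[1]);

console.log(contents); // ["foo", "b<i>a</i>r"]
```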
BrTkCa