
Let's say we have a string with some text between two tags:

"<p>This is some text</p> and then this is some more"

How could we get only "This is some text"?

cweiske
Cybernetic

5 Answers

3

var str = "<p>This is some text</p> and then this is some more";
// "<p>".length is 3, so +3 skips past the opening tag itself
var p = str.substring(str.lastIndexOf("<p>") + 3, str.lastIndexOf("</p>"));
console.log(p); // This is some text
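A slightly more defensive sketch of the same substring idea, assuming you want the first <p>...</p> pair rather than the last one, and want null back when either tag is missing. The helper name textBetween is hypothetical, not part of any library:

```javascript
// Hypothetical helper: returns the text between the first `open` marker and
// the first `close` marker that follows it, or null if either is missing.
function textBetween(str, open, close) {
  const start = str.indexOf(open);
  if (start === -1) return null;
  const from = start + open.length; // skip past the opening marker itself
  const end = str.indexOf(close, from); // only search after the opening marker
  if (end === -1) return null;
  return str.substring(from, end);
}

const str = "<p>This is some text</p> and then this is some more";
console.log(textBetween(str, "<p>", "</p>")); // This is some text
console.log(textBetween(str, "<div>", "</div>")); // null
```

Passing the start position to the second indexOf call means a stray "</p>" that appears before the opening tag cannot produce a reversed or empty range.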

In case there is more than one occurrence of the tag, use this:

// With the g flag, match() returns each whole match, tags included
// (e.g. "<p>This is some text</p>"); map() then strips the tags with /<\/?p>/g.
var str = "<p>This is some text</p> and then this is some more.<p>hello</p>";
// match() returns null when nothing matches, so fall back to an empty array
var p = (str.match(/<p>(.*?)<\/p>/g) || []).map(function (val) {
  return val.replace(/<\/?p>/g, '');
});

console.log(p); // ["This is some text", "hello"]
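On engines that support String.prototype.matchAll (ES2020), the capture group can be read directly, which avoids the second replace() pass. This is a sketch of an alternative, not what the answer above uses:

```javascript
const str = "<p>This is some text</p> and then this is some more.<p>hello</p>";

// matchAll requires the g flag; m[1] is the lazily captured text between
// one <p> and the nearest following </p>.
const texts = Array.from(str.matchAll(/<p>(.*?)<\/p>/g), (m) => m[1]);

console.log(texts); // → ["This is some text", "hello"]
```

Unlike match(), matchAll never returns null, so an input with no <p> tags simply yields an empty array.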

As RobG suggested, if you can build HTML from the string, you can let the browser parse it and read each <p> element's content with jQuery:

var p = $('p').map(function () {
  return this.innerHTML;
}).get();

console.log(p);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p id="para">This is some text</p> and then this is some more<p>hello</p>

A similar version using jQuery's html() function:

var p = $('p').map(function () {
  return $(this).html();
}).get();

console.log(p);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p id="para">This is some text</p> and then this is some more<p>hello</p>
Suchit kumar
  • Excellent. I was just about to ask about multiple instances. Thank you. – Cybernetic Jan 12 '17 at 04:28
  • It is well documented that parsing HTML with regular expressions is not a good idea, see [*RegEx match open tags except XHTML self-contained tags*](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – RobG Jan 12 '17 at 04:34
0

Try this. HTML:

<p id="para">This is some text</p> and then this is some more

JavaScript:

<script>
var text = document.getElementById("para").innerHTML;
alert(text);
</script>
Hikmat Sijapati
  • But everything is a string. The site has been extracted as one long string. – Cybernetic Jan 12 '17 at 04:10
  • But this does highlight that parsing HTML using a regular expression is bound to fail, because HTML does not have regular patterns. – RobG Jan 12 '17 at 04:31
0

Another way:

// The lazy (.+?) stops at the first "<" after the first ">"
var text = />(.+?)</.exec("<p>This is some text</p> and then this is some more")[1];
console.log(text); // This is some text
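Whether the quantifier is greedy (.+) or lazy (.+?) only matters once the string contains more than one tag. A quick comparison on a made-up two-paragraph input:

```javascript
const html = "<p>a</p><p>b</p>";

// Greedy: (.+) runs to the end of the string, then backtracks to the LAST "<".
const greedy = />(.+)</.exec(html)[1];

// Lazy: (.+?) stops at the FIRST "<" that lets the match succeed.
const lazy = />(.+?)</.exec(html)[1];

console.log(greedy); // a</p><p>b
console.log(lazy); // a
```

The greedy version swallows everything between the first ">" and the last "<", including the tags in between.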
MotKohn
0

Since you've said in a comment that "The site has been extracted as one long string", then the most robust way is to parse the site back into a document and use DOM methods, e.g.

var s = "<p>This is some text</p> and then this is some more";
var div = document.createElement('div');
div.innerHTML = s;

console.log(div.querySelector('p').textContent); // This is some text

Using a regular expression (or series of regular expressions) is bound to fail as HTML is not a regular language and regular expressions are of insufficient complexity to parse it.
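To make that concrete, here are a few inputs that are ordinary HTML but that the /<p>(.*?)<\/p>/ pattern from the first answer does not handle as intended. The sample strings are invented for illustration:

```javascript
const re = /<p>(.*?)<\/p>/; // no g flag, so exec() keeps no lastIndex state

const samples = [
  '<p class="intro">Hello</p>', // attribute on the tag: no match at all
  "<p>line one\nline two</p>", // "." does not match a newline: no match
  "<p>outer <b>bold</b> text</p>", // matches, but the inner <b> markup is kept
];

for (const s of samples) {
  const m = re.exec(s);
  console.log(JSON.stringify(s), "->", m ? JSON.stringify(m[1]) : "no match");
}
```

The DOM approach above handles all three: querySelector('p') matches the tag regardless of attributes, line breaks are ordinary text, and textContent drops the inner <b> tags.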

RobG
-1

If you want to get the text of a tag, use the following code:

HTML Code

<p id="para">This is some text</p> and then this is some more

jQuery Code

var text = $("#para").text();

The variable text now holds the text of the <p> tag.

Learner