5

Page contents:

aa<b>1;2'3</b>hh<b>aaa</b>..
 .<b>bbb</b>
blabla..

I want to get this result:

1;2'3aaabbb

The tags to match are <b> and </b>.

How do I write this regex in JavaScript? Thanks!

Alan Moore
Koerr

5 Answers

9

Lazyanno,

If and only if:

  1. you have read SLaks's post (as well as the previous article he links to), and
  2. you fully understand the numerous and wondrous ways in which extracting information from HTML using regular expressions can break, and
  3. you are confident that none of the concerns apply in your case (e.g. you can guarantee that your input will never contain nested, mismatched etc. <b>/</b> tags or occurrences of <b> or </b> within <script>...</script> or comment <!-- .. --> tags, etc.), and
  4. you absolutely and positively want to proceed with regular expression extraction

...then use:

var str = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";

var match, result = "", regex = /<b>(.*?)<\/b>/ig;
while (match = regex.exec(str)) { result += match[1]; }

alert(result);

Produces:

1;2'3aaabbb
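
One caveat with the pattern above: in JavaScript, `.` does not match line terminators, so a <b> element whose text spans several lines would be skipped. Here is a minimal sketch of a variant that uses `[\s\S]*?` instead, wrapped in a hypothetical helper `extractBold` (the name is an illustration, not something from the question). All of the conditions listed above still apply.

```javascript
// Hypothetical helper: concatenates the text between every <b>...</b> pair.
// Assumes no nested <b> tags and no attributes on the tags.
function extractBold(str) {
  // [\s\S] matches any character, including newlines; *? keeps the match lazy.
  var regex = /<b>([\s\S]*?)<\/b>/ig;
  var match, result = "";
  while ((match = regex.exec(str)) !== null) {
    result += match[1];
  }
  return result;
}

var sample = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";
console.log(extractBold(sample)); // 1;2'3aaabbb

// The text inside this <b> spans two lines and is still captured.
console.log(extractBold("x<b>line1\nline2</b>y"));
```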
vladr
  • @lazyanno, before picking either the regex or DOM solution (based on the criteria of performance), make sure to **time both** (**parse a "representative" string** with both methods several times, in a loop, and see what the **actual timing is** on a **variety of browsers**.) – vladr Apr 12 '10 at 17:55
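
A rough sketch of such a timing loop, assuming a hypothetical helper `timeIt` and using the regex extraction from the answer above as the function under test. A DOM-based extractor could be passed in the same way when this runs in a browser. `Date.now()` only has millisecond resolution, so the iteration count has to be large enough for a difference to show.

```javascript
// Hypothetical helper: runs fn `iterations` times and returns elapsed milliseconds.
function timeIt(fn, iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    fn();
  }
  return Date.now() - start;
}

var str = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";

// Regex-based extraction, same pattern as in the answer.
function viaRegex() {
  var match, result = "", regex = /<b>(.*?)<\/b>/ig;
  while ((match = regex.exec(str)) !== null) { result += match[1]; }
  return result;
}

console.log("regex: " + timeIt(viaRegex, 100000) + " ms");
```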
8

You cannot parse HTML using regular expressions.

Instead, you should use Javascript's DOM.

For example (using jQuery):

var text = "";
$('<div>' + htmlSource + '</div>')
    .find('b')
    .each(function() { text += $(this).text(); });

I wrap the HTML in a <div> tag to find both nested and non-nested <b> elements.

SLaks
  • 1732348 is SO's 42. it answers a huge amount of questions. upvoting for it starts feeling daft, but heck, it won't stop being true any time soon... – David Hedlund Apr 12 '10 at 14:51
  • 3
    For the record, you cannot **reliably** parse HTML using regular expressions. If certain conditions are met, information can be *extracted* just fine from well-formed (X)HTML with regular expressions. – vladr Apr 12 '10 at 14:52
  • I want to use a JavaScript regex to get the result. I'd rather not parse the HTML (it's slow). Any other ideas? Thanks :) – Koerr Apr 12 '10 at 14:53
  • @lazyanno, if you are trying to extract information from the page itself, then the HTML has already been parsed by the browser and you don't pay any additional penalty for using the DOM like `SLaks` suggested – vladr Apr 12 '10 at 14:55
  • You cannot do this with a regex. (Unless you want it to mysteriously fail every couple of hours) – SLaks Apr 12 '10 at 15:01
  • @Vlad Romascanu, I get this content from an XHR stream. It's not an HTML page and hasn't been parsed by my browser; it's only a JavaScript variable, so I want to use a regex to get the result. – Koerr Apr 12 '10 at 15:08
  • I used `$('<div>'+c+'</div>').find('b')` and it works, thanks. I don't know of a better solution, though I still think a regex would be faster. – Koerr Apr 12 '10 at 15:23
2
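var regex = /(<([^>]+)>)/ig;
var bdy = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";

// Note: this strips every tag and keeps all of the surrounding text,
// not only the text inside <b>...</b>.
var result = bdy.replace(regex, "");
alert(result);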

See : http://jsfiddle.net/abdennour/gJ64g/

Abdennour TOUMI
2

Here is an example without a jQuery dependency:

// get all elements with a certain tag name
var b = document.getElementsByTagName("B");

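// getElementsByTagName() returns an HTMLCollection, which has no map() of
// its own, so borrow Array.prototype.map; it executes a function on each
// member and builds a new array from the function results...
var text = Array.prototype.map.call(b, function(element) {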
  // ...in this case we are interested in the element text
  if (typeof element.textContent != "undefined")
    return element.textContent; // standards compliant browsers
  else
    return element.innerText;   // IE
});

// now that we have an array of strings, we can join it
var result = text.join('');
Tomalak
1

If you want to use regular expressions, just add a '?' after the quantifier in the pattern for the inner text, which makes the match non-greedy. For example, change:

".*" to "(.*?)"
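
To see why the '?' matters, here is a small sketch comparing the greedy and the lazy pattern on part of the string from the question (the variable names are only for illustration):

```javascript
var str = "aa<b>1;2'3</b>hh<b>aaa</b>";

// Greedy: .* runs to the LAST </b> on the line, swallowing the tags in between.
var greedy = /<b>(.*)<\/b>/.exec(str)[1];

// Lazy: .*? stops at the FIRST </b> it can reach.
var lazy = /<b>(.*?)<\/b>/.exec(str)[1];

console.log(greedy); // 1;2'3</b>hh<b>aaa
console.log(lazy);   // 1;2'3
```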
Soheil Setayeshi