
I am currently getting an error response in HTML format. It is of type string:

"<!DOCTYPE html>\r\n
<html>
  <head>
    <title>Data already exists</title>
  </head>
</html>"

I want to retrieve the content inside the <title>, in the above instance "Data already exists". Can anybody suggest an appropriate regular expression to capture that text?

Any help is appreciated!

inspiringmyself

3 Answers


First, you can do it without a regex by creating a dummy element and injecting the HTML into it:

var s = "your_html_string";
var dummy = document.createElement("div");
dummy.innerHTML = s;
// textContent is more widely supported than innerText (older Firefox lacked innerText)
var title = dummy.getElementsByTagName("title")[0].textContent;

But if you really insist on using regex:

var s = "your_html_string";
var title = s.match(/<title>([^<]+)<\/title>/)[1];
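RobG's comment below points out that `<title>` may carry attributes, and `s.match(...)[1]` throws a TypeError when there is no match (because `match` returns `null`). A slightly more defensive variant, sketched under the assumption that the response contains at most one plain-text `<title>` (the name `extractTitle` is just illustrative):

```javascript
// Hypothetical helper: returns the text inside the first <title> element,
// or null when the string contains no <title>...</title> pair.
function extractTitle(html) {
  // \b stops "<titles>" from matching; [^>]* allows attributes on the tag;
  // [\s\S]*? matches lazily, including line breaks; the i flag accepts <TITLE>.
  var match = /<title\b[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

var response = "<!DOCTYPE html>\r\n<html>\r\n  <head>\r\n" +
  "    <title>Data already exists</title>\r\n  </head>\r\n</html>";

console.log(extractTitle(response));                      // Data already exists
console.log(extractTitle('<title lang="en">Hi</title>')); // Hi
console.log(extractTitle("<p>no title here</p>"));        // null
```

Unlike `[^<]+`, the lazy `[\s\S]*?` also accepts an empty `<title></title>` (returning `""`); both forms already match across line breaks.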

João Silva
  • You don't need to use *getElementsByTagName*, there is a [document.title](http://dev.w3.org/html5/spec/single-page.html#document.title) property that is more convenient. Also, the [title element](http://dev.w3.org/html5/spec/single-page.html#the-title-element) can have attributes, so the regular expression needs to be more sophisticated (parsing HTML with a regular expression is generally a bad idea). – RobG Aug 29 '12 at 01:40
  • @RobG: I absolutely agree that parsing HTML with a regex is generally a bad idea; however, the OP explicitly said that it was a response error that follows the above format. `document.title` will get the current document's title. Note that the OP is not trying to parse the current document but a specific response message (probably from an `ajax` call). – João Silva Aug 29 '12 at 01:46
  • Hmm... One line of regex, or three lines of dummy element manipulation? One or three? I know which I'd choose. (I too agree that in a general sense parsing HTML with regex is not the way to go, but as you said João, for a specific case with a known format I think it is OK.) – nnnnnn Aug 29 '12 at 02:06
  • Yes, all good. The OP could use the response text to create a new document, then just use *document.title*. – RobG Aug 29 '12 at 02:10
  • I really appreciate everyone's suggestion and thanks for taking time to share the knowledge. You guys are awesome. – inspiringmyself Aug 29 '12 at 14:07

The most basic way to match an HTML tag with a regex is this: http://jsbin.com/oqivup/1/edit

var text = /<(title)>(.+)<\/\1>/.exec(html).pop();
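Here `\1` is a backreference to the first capture group, so the closing tag must repeat whatever name `(title)` matched, and `.pop()` takes the last element of the match array, which is the text captured by `(.+)`. Note that `exec` returns `null` when nothing matches, so `.pop()` would then throw, and `.+` does not cross line breaks. A sketch that generalizes the same backreference idea to any simple tag name (the `getTagText` name and the escaping step are illustrative assumptions, not part of the original answer):

```javascript
// Hypothetical helper: text of the first <tagName>...</tagName> pair, or null.
function getTagText(html, tagName) {
  // Escape regex metacharacters so tagName is matched literally.
  var name = tagName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Group 1 captures the tag name; \1 requires the closing tag to repeat it.
  // [\s\S]*? (group 2) lazily captures the content, including line breaks.
  var re = new RegExp("<(" + name + ")\\b[^>]*>([\\s\\S]*?)</\\1>", "i");
  var match = re.exec(html);
  return match ? match[2] : null;
}

var html = "<html><head><title>Data already exists</title></head></html>";
console.log(getTagText(html, "title")); // Data already exists
console.log(getTagText(html, "h1"));    // null
```

This still only suits simple, known markup such as the error page in the question; nested tags of the same name will defeat it.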

For anything more complicated, I would use a proper HTML parser.

elclanrs

You could parse it using DOMParser():

var parser = new DOMParser(),
    doc = parser.parseFromString("<!DOCTYPE html><html><head><title>Data already exists</title></head></html>", "text/html");

doc.title; /* "Data already exists" */
Oriol
  • You probably need to use an `ActiveXObject` for IE < 9. – João Silva Aug 29 '12 at 01:32
  • and how we can use the `doc` variable with jquery? – Dariush Jafari Aug 29 '12 at 01:32
  • @DariushJafari Do you mean `$(doc)`? – Oriol Aug 29 '12 at 01:33
  • Chrome 23 Canary doesn't parse HTML with `DOMParser` though. If the HTML string is XML-valid, you can always use the `application/xml` parsing for cross-browser parsing. – Fabrício Matté Aug 29 '12 at 01:34
  • @Oriol how do you select some elements of `doc`? `$('div.cc')` selects the current document elements. – Dariush Jafari Aug 29 '12 at 01:51
  • @DariushJafari Sorry I can't help you more, but I'm not an expert. In fact, I have never used that function, but I knew it and I thought it could be a good solution to your problem. If you want to know more, you should ask it on another question, sorry. – Oriol Aug 29 '12 at 01:54
  • Cool, but that will go belly–up in IE 9 and lower (maybe 10 too). I guess the code is from the [MDN DOM Parser](https://developer.mozilla.org/en-US/docs/DOM/DOMParser) article, which also has a more general solution. – RobG Aug 29 '12 at 02:34
  • @RobG No, the code is from w3schools (http://www.w3schools.com/dom/dom_loadxmldoc.asp). But your link is great. Does "DOMParser HTML extension for other browsers" work for all browsers? It says that "text/html parsing is natively supported", but in "Browser compatibility" table it seems that only works on Firefox... – Oriol Aug 29 '12 at 02:44
  • The MDN code doesn't work in IE (can't set innerHTML of HTML element and IE doesn't support text/html with `parseFromString`), see my answer to [How to create Document objects with JavaScript](http://stackoverflow.com/questions/8227612/how-to-create-document-objects-with-javascript/12172023#12172023). W3Schools is very ordinary, much better to reference appropriate specifications with MDN and MSDN for examples. – RobG Aug 29 '12 at 06:39
  • I really appreciate everyone's suggestion and thanks for taking time to share the knowledge. You guys are awesome. – inspiringmyself Aug 29 '12 at 14:09