1

I'm sorry, I can't believe this question hasn't already been answered on Stack Overflow, but I've searched a lot and can't find a solution.

I want to change HTML code with regular expressions in this way:

testing <a href="url">anchor</a>

to

testing anchor

I only want to unlink the text, without using DOM functions: the code is in a string, not in the document, and I don't want to remove any tags other than the a tags.

Denys Séguret
Oscardrbcn

4 Answers

5

If you really don't want to use DOM functions (why?), you could do

str = str.replace(/<[^>]*>/g, '')

You can use it if you're fairly confident the HTML is simple, but it will fail in many cases, for example with some nested tags or a > inside an attribute value. More complex regular expressions can fix some of these problems, but regular expressions aren't the right tool for this job in the general case.
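To make the "> in an attribute" failure concrete, here is a small sketch. The helper name `stripAllTags` and the sample strings are made up for illustration.

```javascript
// The strip-everything regex from above, wrapped in a hypothetical helper.
function stripAllTags(str) {
  return str.replace(/<[^>]*>/g, '');
}

// Simple markup: behaves as expected.
const simple = stripAllTags('testing <a href="url">anchor</a>');

// A ">" inside an attribute value ends the match too early,
// so the rest of the opening tag leaks into the output.
const broken = stripAllTags('<a title="a > b" href="url">anchor</a>');

console.log(simple); // prints: testing anchor
console.log(JSON.stringify(broken)); // prints: " b\" href=\"url\">anchor"
```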

If you only want to remove the a tags and keep the others, do this:

str = str.replace(/<\/?a( [^>]*)?>/g, '')

This changes

<a>testing</a> <a href="url"><b>a</b>nchor</a><div>test</div><aaa>E</aaa>

to

testing <b>a</b>nchor<div>test</div><aaa>E</aaa>
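As a quick check, the same regex can be wrapped in a function and run on a string. This is only a sketch: the helper name `unlink` and the second sample string are assumptions made for illustration.

```javascript
// Hypothetical helper wrapping the a-only regex from above.
function unlink(str) {
  // Matches <a>, <a ...attributes> and </a>, but not <aaa> or <abbr>,
  // because the "a" must be followed by a space or by ">".
  return str.replace(/<\/?a( [^>]*)?>/g, '');
}

const first = unlink('<a>testing</a> <a href="url"><b>a</b>nchor</a><div>test</div><aaa>E</aaa>');
const second = unlink('testing <a href="url">anchor</a> and <abbr>HTML</abbr>');

console.log(first);  // prints: testing <b>a</b>nchor<div>test</div><aaa>E</aaa>
console.log(second); // prints: testing anchor and <abbr>HTML</abbr>
```

Note that the pattern is case-sensitive, so an uppercase `<A HREF="...">` is left alone unless the `i` flag is added.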
Denys Séguret
  • 2
    +1 Works beautifully for OP's simple use case, I think this is the simplest regex solution. OP, if you're doing anything more complicated avoid this. – Benjamin Gruenbaum May 24 '13 at 11:23
  • Thank you very much, that's all I need. I definitely have to study a regular expressions tutorial, I don't know anything about them. It's enough even though it fails with nested tags. I can't use DOM functions (I suppose) because the code is in a string; it's not shown in the document object. – Oscardrbcn May 24 '13 at 11:26
  • @user1901219 Is this regex clear or do you want me to explain it? – Denys Séguret May 24 '13 at 11:27
  • Now I think, it doesn't work because I only want to remove the link tags, if I have I want the result
    anchor
    – Oscardrbcn May 24 '13 at 11:34
  • @user1901219 You specifically said you need to match _that_ case. Of course his answer wouldn't work in a more general case. Why are you against using a built in DOM method? – Benjamin Gruenbaum May 24 '13 at 11:35
  • @dystroy thank you very much, I've marked it as solved, I just need this simple expression. – Oscardrbcn May 24 '13 at 11:42
  • @Benjamin Gruenbaum you are right, I'm sorry, I should have specified the case better. I didn't want to use the DOM method because I have the code in a string and I want to parse it before showing it in the document. – Oscardrbcn May 24 '13 at 11:43
  • 1
    You can create a DOM object from the string and use DOM methods to parse it, without ever appending that DOM object to the document. – andy magoon May 24 '13 at 12:06
4

I know you only want a regex, but for future viewers, here is a trivial solution using DOM methods.

var a = document.createElement("div");
a.innerHTML = 'testing <a href="url">anchor</a>';
var wordsOnly = a.textContent || a.innerText; 

This will not fail on complicated use cases, allows nested tags and it's perfectly clear what's happening:

  • Hey browser! Create an element
  • Put that HTML in it
  • Give me back just the text, that's what I want now.

NOTE:

The element we're creating will not be added to the actual DOM since we're not adding it anywhere, it'll stay invisible. Here is a fiddle to illustrate how this works.

Benjamin Gruenbaum
  • Note to future readers, this is also possible if you're using `nodejs` or another JavaScript framework. No need to reinvent wheels most of the time. – Benjamin Gruenbaum May 24 '13 at 11:24
  • 1
    +1 because even while it wasn't what OP asks, it's generally a better solution. Shouldn't that be a little more complex for compatibility with IE8, like `a.textContent||a.innerText` ? – Denys Séguret May 24 '13 at 11:26
  • What if I want to keep **bold** things, but just remove links? i.e. turn `foo bar baz blep awoo` into `foo bar baz blep awoo`? This gets rid of all HTML in it, giving back `foo bar baz blep awoo`. I wouldn't call that a complicated use case, and I wouldn't say that it "allows nested tags". – Nic Sep 01 '17 at 14:31
  • @QPaysTaxes still easy, you would just do a `querySelectorAll('a')` on it and then call `.remove` on the elements. – Benjamin Gruenbaum Sep 01 '17 at 15:08
  • @BenjaminGruenbaum That... also doesn't work, as far as I can tell. (I assume you mean something like `[].forEach(a.querySelectorAll('a'), function(l) { a.remove(l); }`; if that's incorrect, please clarify.) – Nic Sep 01 '17 at 21:53
  • This is removing the element https://jsfiddle.net/58Lz1ay7/ and this is unwrapping its text https://jsfiddle.net/9dawg6ee/ – Benjamin Gruenbaum Sep 02 '17 at 09:24
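For readers following the comments above, here is a minimal sketch of unwrapping links with DOM methods while keeping other tags such as bold. It is not necessarily what the linked fiddles do. The helper name `unwrapAnchors` and its `doc` parameter are assumptions for illustration, and it relies on `NodeList.prototype.forEach`, which older browsers such as IE lack.

```javascript
// Hypothetical helper: remove every <a> from an HTML string but keep its contents.
// `doc` defaults to the browser's global document.
function unwrapAnchors(html, doc = document) {
  // Detached container: it is never added to the page.
  const container = doc.createElement('div');
  container.innerHTML = html;

  // querySelectorAll returns a static list, so editing the tree while looping is safe.
  container.querySelectorAll('a').forEach(function (link) {
    // Move each child of the link in front of it, then drop the now-empty <a>.
    while (link.firstChild) {
      link.parentNode.insertBefore(link.firstChild, link);
    }
    link.parentNode.removeChild(link);
  });

  return container.innerHTML;
}

// Usage, in a browser:
// unwrapAnchors('testing <a href="url"><b>a</b>nchor</a>');
// returns 'testing <b>a</b>nchor'
```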
0

As has been mentioned, you cannot parse HTML with regular expressions. The principal reason is that HTML elements nest and regular expressions cannot handle that.

That said, with a few restrictions which I will mention, you can do the following:

string.replace(/(\b\w+)\s*<a\s+href="([^"]*)">(.*?)<\/a>/g, '$1 $3')

This requires a word before the tag (spacing between the word and the tag is optional) and no attributes other than the href in the <a> tag, and it accepts anything between the <a> and the </a>.
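A small sketch of how a restricted pattern of this kind behaves. The helper name `unlinkAfterWord` and the sample strings are made up for illustration, and the lazy `(.*?)` is an assumption made here so that two links on one line are handled separately.

```javascript
// Hypothetical helper: a word, optional whitespace, then <a href="..."> with no other attributes.
function unlinkAfterWord(str) {
  return str.replace(/(\b\w+)\s*<a\s+href="[^"]*">(.*?)<\/a>/g, '$1 $2');
}

// The case from the question.
const one = unlinkAfterWord('testing <a href="url">anchor</a>');

// Two links on the same line are unlinked independently.
const two = unlinkAfterWord('see <a href="u1">one</a> and <a href="u2">two</a>');

// An extra attribute before href breaks the restriction, so nothing is replaced.
const skipped = unlinkAfterWord('testing <a class="x" href="url">anchor</a>');

console.log(one);     // prints: testing anchor
console.log(two);     // prints: see one and two
console.log(skipped); // prints: testing <a class="x" href="url">anchor</a>
```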

HBP
  • 1
    This gives me "testing url" and not "testing anchor" like OP asked for – Benjamin Gruenbaum May 24 '13 at 11:34
  • It didn't work for my simple code. I don't know if I understood "This requires there to be a word before the tag" correctly; I tried it with a word before. But anyway, @dystroy's expression is enough for me. Thank you! – Oscardrbcn May 24 '13 at 11:48
0

You can create a DOM object from the string, use DOM methods to parse, without having had appended said DOM object to the document
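One way to do this, sketched under the assumption that the code runs in a browser where `DOMParser` is available; the helper name `textFromHtml` is made up for illustration.

```javascript
// Hypothetical helper: parse an HTML string into a separate document and read its text.
function textFromHtml(html) {
  // DOMParser builds a standalone document that is never attached to the page.
  const parsed = new DOMParser().parseFromString(html, 'text/html');
  return parsed.body.textContent;
}

// Usage, in a browser:
// textFromHtml('testing <a href="url">anchor</a>');
// returns 'testing anchor'
```

The parsed document also supports the usual DOM methods such as `querySelectorAll`, so it can be inspected or modified before anything is shown on the page.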

andy magoon
  • 2
    Hey andy, did you mean to post it as a comment and not an answer perhaps? – Benjamin Gruenbaum May 24 '13 at 12:11
  • Yes, it's true, but I thought it was quicker and more elegant to do it with regular expressions. Now that I've seen Mat's answer [link](http://stackoverflow.com/a/1732454/635608), maybe I was wrong. – Oscardrbcn May 24 '13 at 12:20