1

How can I look for links in HTML and remove them?

$html = '<p><a href="javascript:doThis(\'Test Title 1\')">Test Title 1</a></p>';
$html .= '<p><a href="javascript:doThis(\'Test Title 2\')">Test Title 2</a></p>';
$html .= '<p><a href="javascript:doThis(\'Test Title 3\')">Test Title 3</a></p>';

$match = '<a href="javascript:doThis(\'Test Title 2\')">';

I want to remove the anchor but keep the text, as shown below.

Test Title 1

Test Title 2

Test Title 3

I've never used regular expressions before, but maybe I can avoid them here as well. Let me know if I'm not being clear.

Thanks

Mark

EDIT: It's not a client-side thing. I can't use JavaScript for this. I have a custom CMS and want to edit HTML stored in a database.

madphp
  • 7
    If you use regex for parsing HTML bobince will hunt you down (http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Dominic Rodger Nov 23 '09 at 17:04
  • wow. I just read that last week too. Whooops. haha. – madphp Nov 23 '09 at 17:06

6 Answers

4

You may try the simplest thing:

echo strip_tags($html, '<p>');

This strips all tags except <p>, so any other markup in the stored HTML is removed as well.

If you really like regexp:

echo preg_replace('=</?a(\s[^>]*)?>=ims', '', $html);

EDIT:

Delete the <a> tag AND its surrounding tags (the code gets messy and doesn't work with broken (X)HTML):

echo preg_replace('=<([a-z]+)[^>]*>\s*<a(\s[^>]*)?>(.*?)</a>\s*</\\1>=ims', '$3', $html);

However, if your problem is that complicated, I recommend that you try XPath.

hegemon
  • That will strip all links, but how can I search for one link using the $match variable, so that it removes only that link and its closing tag? – madphp Nov 23 '09 at 17:14
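For the follow-up in the comment above, here is a minimal sketch that needs neither regex nor a parser. It assumes `$match` is the exact opening tag as stored (byte for byte) and that anchors are not nested, which HTML does not allow anyway. The function name `unwrapExactLink` is made up for this example and only the first occurrence is handled.

```php
<?php
// Hypothetical helper: remove one exact opening tag ($match) and the first
// closing </a> that follows it, leaving the link text in place.
function unwrapExactLink(string $html, string $match): string
{
    $start = strpos($html, $match);
    if ($start === false) {
        return $html; // opening tag not found: nothing to do
    }

    $afterOpen = $start + strlen($match);
    $close = stripos($html, '</a>', $afterOpen);
    if ($close === false) {
        return $html; // unbalanced markup: leave it untouched
    }

    // Glue together: text before the tag, the link text, text after </a>.
    return substr($html, 0, $start)
        . substr($html, $afterOpen, $close - $afterOpen)
        . substr($html, $close + strlen('</a>'));
}

$html  = '<p><a href="javascript:doThis(\'Test Title 1\')">Test Title 1</a></p>';
$html .= '<p><a href="javascript:doThis(\'Test Title 2\')">Test Title 2</a></p>';
$html .= '<p><a href="javascript:doThis(\'Test Title 3\')">Test Title 3</a></p>';

$match = '<a href="javascript:doThis(\'Test Title 2\')">';

echo unwrapExactLink($html, $match), "\n";
```

On the sample from the question, only the second paragraph changes: it becomes <p>Test Title 2</p> and the other two links stay exactly as stored. If the stored markup can differ from `$match` in whitespace or attribute order, this will not find it and a DOM-based approach is probably the safer route.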
3

You could see if Simple HTML DOM does the trick.

Yacoby
1

You might have some joy with Beautiful Soup - http://www.crummy.com/software/BeautifulSoup/ (Python HTML parsing / manipulation API)

Brian Lyttle
Justin
0

You can use

var foo = document.getElementsByTagName('a');

to fetch all the link tags. No need for regular expressions here...

EDIT: I'm just learning to read... ;) Go with PHP's DOM or XML abilities. It should be pretty easy using those.
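To make the DOM suggestion concrete, here is a rough sketch using PHP's bundled DOM extension. The function name `unwrapLinks`, its optional `$href` filter, and the temporary <div> wrapper are assumptions made for this example, not something from the question. Comparing the href in PHP avoids having to quote the value inside an XPath expression.

```php
<?php
// Hypothetical helper, not part of any answer above: unwrap <a> elements,
// keeping their children. If $href is given, only links whose href
// attribute equals it are unwrapped.
function unwrapLinks(string $html, ?string $href = null): string
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    // The XML declaration is a common trick to make libxml read UTF-8;
    // the <div> is a temporary container for the fragment.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array before modifying the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        if ($href !== null && $a->getAttribute('href') !== $href) {
            continue;
        }
        while ($a->firstChild !== null) {
            // insertBefore() moves the child out of <a> and in front of it.
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // Serialize only the contents of the temporary container.
    $container = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($container->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<p><a href="javascript:doThis(\'Test Title 2\')">Test Title 2</a></p>';
echo unwrapLinks($html), "\n";
```

Calling `unwrapLinks($html)` unwraps every link, while `unwrapLinks($html, "javascript:doThis('Test Title 2')")` targets a single one. One caveat: the parser re-serializes the markup, so untouched parts may not come back byte-identical (libxml may, for example, percent-encode spaces inside href values). Compare the result before writing it back to the database.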

Franz
  • It's not a client-side thing. I have a custom CMS and want to edit HTML stored in a database. – madphp Nov 23 '09 at 17:03
  • Whoops, sorry. You should prefer using PHP's DOM or XML abilities instead of RegEx in that case... – Franz Nov 23 '09 at 17:07
0

sed -i -E -e 's#</?a([[:space:]][^>]*)?>##g' filename.html

Note that using regular expressions for hacking HTML is a... dubious proposition, but it might just work in practice ;-)

Jonas Kölker
  • Just to warn you... you will get voted down for this one by some community members... – Franz Nov 23 '09 at 17:09
  • You sure, Franz? I keep reading that it's OK to use it if it's a small portion of HTML. – madphp Nov 23 '09 at 17:17
  • Yeah, I know. But you can almost certainly always figure out a way to make your RegEx not work... – Franz Nov 23 '09 at 17:31
  • "[make regex break]" -- I agree. That's why I said using regexes for HTML may be a dubious proposition ||| "[downmod for even suggesting it]" -- well, so be it :( If the HTML is laid out right, which the OP might be in control of, regexes might actually be the best solution: it works and it's easy/fast to hack up. Not the cleanest, sure, but sometimes you just need something that works on the data you have (and not the data you don't have). – Jonas Kölker Nov 24 '09 at 09:21
0

Open the HTML file in Microsoft Expression. Press Ctrl+F and then choose to replace tags or tag attribute contents. Easy and quick solution. Thanks, Shomaail

  • 1
    no need to close your answer with Thanks, and your username is already displayed. Also, this question is tagged "regex" which means the user is looking for a regular expression. Your answer is simple and elegant, so I would consider it useful to people searching, but it's not necessarily within the requirements of the question. – deltree Mar 21 '12 at 20:50