17

Possible Duplicate:
PHP String Manipulation: Extract hrefs

I am using PHP and have a string with the following content:

<a href="www.something.com">Click here</a>

I need to get rid of everything except "www.something.com". I assume this can be done with regular expressions. Any help is appreciated! Thank you.

Community
5et

5 Answers

57

This is very easy to do using SimpleXML:

$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com
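
SimpleXML expects well-formed XML, so its constructor throws on much real-world HTML. As a rough sketch of an alternative (not part of the original answer), the DOM extension's HTML parser tolerates sloppier markup. The helper name `extractHrefs` is made up for illustration.

```php
<?php
// Hypothetical helper, not from the original answer: collects the href
// attribute of every <a> element found in an HTML fragment.
function extractHrefs(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }
    return $hrefs;
}

print_r(extractHrefs('<a href="www.something.com">Click here</a>'));
```

Because the parser does the tokenizing, this handles double-quoted, single-quoted, and unquoted attribute values alike.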
mfonda
28

Give this a whirl:

$link = '<a href="www.something.com">Click here</a>';
preg_match_all('/<a[^>]+href=([\'"])(?<href>.+?)\1[^>]*>/i', $link, $result);

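// $result itself is never empty after preg_match_all, so check the named group.
if (!empty($result['href'])) {
    // Found at least one link.
    echo $result['href'][0];
}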

Result: www.something.com

Updated: Now requires the quoting style to match, addressing the comment below.
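
Since `preg_match_all` collects every match, the same pattern can list all links in a longer string. A minimal sketch, assuming a made-up two-link input (`$html` and its contents are illustrative only):

```php
<?php
// Illustrative input with one double-quoted and one single-quoted href.
$html = '<a href="one.html">One</a> <a href=\'two.html\'>Two</a>';

// Same pattern as in the answer: group 1 is the opening quote,
// and \1 requires the closing quote to be the same character.
preg_match_all('/<a[^>]+href=([\'"])(?<href>.+?)\1[^>]*>/i', $html, $result);

// With the default PREG_PATTERN_ORDER, $result['href'] is a list of all captures.
foreach ($result['href'] as $href) {
    echo $href, "\n";
}
```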

leek
Tails
  • this can match this: `href="_url_'`, and that's wrong – dynamic Jun 15 '11 at 23:54
  • not 100%. if you allow `'` then you should allow even no quote at all: `href=url`. Now stuff **gets harder**. – dynamic Jun 15 '11 at 23:59
  • It does, with increased accuracy comes increased complexity. I'll leave it up to the OP to decide if what I've proposed is 'good enough' for his application. If you're going to go down the rabbit hole, read this first: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Tails Jun 16 '11 at 00:03
  • that link was where I wanted to bring you ^_^ – dynamic Jun 16 '11 at 00:04
3

I would suggest following code for this:

$str = '<a href="www.something.com">Click here</a>';
// A backreference does not work inside a character class, so use a lazy match up to the same quote.
preg_match('/href=(["\'])(.*?)\1/i', $str, $m);
echo $m[2] . "\n";

OUTPUT

www.something.com

This will take care of both single quote ' and double quote " in the href link.
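
As the comments on the previous answer note, HTML also permits unquoted attribute values. A hedged sketch of one way to cover all three forms uses PCRE's branch-reset group `(?|...)`, so the value always lands in group 1. The function name `hrefViaRegex` is hypothetical, and the pattern still assumes no `>` appears inside an earlier attribute value.

```php
<?php
// Hypothetical helper: returns the first href value of an <a> tag, or null.
function hrefViaRegex(string $html): ?string
{
    // Alternatives: "double-quoted", 'single-quoted', or unquoted value.
    // (?|...) resets group numbering so each alternative fills group 1.
    $pattern = '/<a\b[^>]*?\shref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i';

    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

var_dump(hrefViaRegex('<a href="www.something.com">Click here</a>'));
var_dump(hrefViaRegex('<a href=www.something.com>Click here</a>'));
```

For anything beyond simple, known input, an HTML parser is usually the safer choice.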

anubhava
1

Assuming that is ALWAYS the format of the variable, the code below should do the trick. If the content may not be a link, this won't work. Essentially, it looks for text enclosed in double quotes and captures what is between them.

<?php

$string = '<a href="www.something.com">Click here</a>';

// The parentheses capture the text between the quotes, without the quotes.
$pattern = '/"([a-zA-Z0-9.\/\-\?\&]*)"/';

preg_match($pattern, $string, $matches);
print_r($matches); // $matches[1] holds www.something.com
?>
-1

You probably didn't mean your question quite this literally, but this does exactly what you're asking for:

$link = '<a href="www.something.com">Click here</a>';
$href = substr($link, 9, -16);

$href is:

string(17) "www.something.com"

As a regular expression, this can be expressed as:

$href = preg_match('(^<a href="([^"]*)">Click here</a>$)', $link, $matches) ? $matches[1] : die('Invalid input data.');

Is this helpful?

hakre
  • lol, are you asking for a lot of -1s? Note I didn't -1 you. You can check from my profile/reputation – dynamic Jun 15 '11 at 23:47