I currently have this but it's not flawless:

$testcases = array(
array("I love mywebsite.com", true),
array("mywebsite.com/ is what I like", true),
array("www.mywebsite.com is my website", true),
array("Check out www.mywebsite.com/", true),
array("... http://mywebsite.com ...", true),
array("... http://mywebsite.com/ ...", true),
array("... http://www.mywebsite.com ...", true),
array("... http://www.mywebsite.com/ ...", true),
array("I like commas and periods. Just like www.mywebsite.com, they do it too!", true),
array("thisismywebsite.com is a lot better", false),
array("The URL fake.mywebsite.com is unknown to their server", false),
array("Check out http://redirect.mywebsite.com/www.ultraspammer.com", false)
);

function contains_link($text) {
return preg_match("/(https?:\/\/(?:www\.)?|(?:www\.))mywebsite\.com/", $text) > 0;
}

foreach ($testcases as $case) {
echo $case[0] . "=".(contains_link($case[0]) ? "true" : "false") . " and it should be " . ($case[1] ? "true" : "false") . "<br />";
}

Output:

I love mywebsite.com=false and it should be true
mywebsite.com/ is what I like=false and it should be true
www.mywebsite.com is my website=true and it should be true
Check out www.mywebsite.com/=true and it should be true
... http://mywebsite.com ...=true and it should be true
... http://mywebsite.com/ ...=true and it should be true
... http://www.mywebsite.com ...=true and it should be true
... http://www.mywebsite.com/ ...=true and it should be true
I like commas and periods. Just like www.mywebsite.com, they do it too!=true and it should be true
thisismywebsite.com is a lot better=false and it should be false
The URL fake.mywebsite.com is unknown to their server=false and it should be false
Check out http://redirect.mywebsite.com/www.ultraspammer.com=false and it should be false
BronzeByte

3 Answers

13

An alternative to regex: parse_url()

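$url = parse_url($text);
// 'host' is only set when $text is a single URL that includes a scheme such as http://
if (isset($url['host']) && ($url['host'] == 'www.mywebsite.com' || $url['host'] == 'mywebsite.com')) {
    // $text points at mywebsite.com
}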
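
To make the limits of this approach concrete, here is a minimal sketch of the host check. The helper name `is_mywebsite_url` is made up for illustration, and it assumes the input is a single URL rather than free text. Note that `parse_url()` only reports a `host` when a scheme is present.

```php
<?php
// Hypothetical helper (not part of the original answer):
// true when $url is a full URL whose host is mywebsite.com or www.mywebsite.com.
function is_mywebsite_url($url) {
    $parts = parse_url($url);
    // parse_url() returns false on seriously malformed input,
    // and leaves out the 'host' key when the string has no scheme.
    if (!is_array($parts) || !isset($parts['host'])) {
        return false;
    }
    $allowed = array('mywebsite.com', 'www.mywebsite.com');
    return in_array(strtolower($parts['host']), $allowed, true);
}

var_dump(is_mywebsite_url("http://www.mywebsite.com/"));  // bool(true)
var_dump(is_mywebsite_url("https://mywebsite.com"));      // bool(true)
var_dump(is_mywebsite_url("http://fake.mywebsite.com/")); // bool(false)
var_dump(is_mywebsite_url("www.mywebsite.com"));          // bool(false): no scheme, so no 'host'
```

So this validates a URL you already have; it does not find a link inside a longer text.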

UPDATE:

Assuming that $text can contain several domains, use strstr() instead.

if(strstr($text,"mywebsite.com") !== FALSE)
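
A quick sketch of what the substring check does with some of the test cases from the question. The wrapper name `contains_text` is only for illustration. Because it matches anywhere in the string, it also accepts the cases the question wants rejected.

```php
<?php
// Hypothetical wrapper around the strstr() check, for illustration only.
function contains_text($text) {
    return strstr($text, "mywebsite.com") !== FALSE;
}

var_dump(contains_text("I love mywebsite.com"));
// bool(true)
var_dump(contains_text("thisismywebsite.com is a lot better"));
// bool(true), but the question expects false
var_dump(contains_text("The URL fake.mywebsite.com is unknown to their server"));
// bool(true), but the question expects false
```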

UPDATE 2:

function contains_link($text) {
        return preg_match("/(^(https?:\/\/(?:www\.)?|(?:www\.))?|\s(https?:\/\/(?:www\.)?|(?:www\.))?)mywebsite\.com/", $text);
}

and:

  contains_link("AAAAAAA http://mywebsite.com"); //1
  contains_link("foo BAaa http://www.mywebsite.com"); //1
  contains_link("abc.com www.mywebsite.com"); // 1
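
The calls above only cover positive cases. As a rough check, here is the same pattern run against a few inputs that should be rejected, taken from the question and the comments. The pattern is copied from UPDATE 2 unchanged; only the sample calls are added.

```php
<?php
// Pattern copied from UPDATE 2: the domain must follow the start of the
// string or a whitespace character, with an optional http(s):// and/or www. prefix.
function contains_link($text) {
    return preg_match('/(^(https?:\/\/(?:www\.)?|(?:www\.))?|\s(https?:\/\/(?:www\.)?|(?:www\.))?)mywebsite\.com/', $text);
}

var_dump(contains_link("I love mywebsite.com"));                // int(1)
var_dump(contains_link("notmywebsite.com"));                    // int(0)
var_dump(contains_link("thisismywebsite.com is a lot better")); // int(0)
var_dump(contains_link("The URL fake.mywebsite.com is unknown to their server")); // int(0)
```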
The Mask
  • Well, `parse_url()` won't work... What if someone puts another site before his? Like: `www.stopsearchinghere.com http://mywebsite.com` – Robert Martin May 08 '12 at 16:25
  • I am looking for a real URL, not a static piece of text – BronzeByte May 08 '12 at 16:41
  • I asked for regex for a reason, because I have to FIND a link, not validate :) – BronzeByte May 08 '12 at 17:07
  • @BronzeByte: Maybe? `return preg_match("/(https?:\/\/(?:www\.)?|(?:www\.))mywebsite\.com/", $text,$match)? $match[0] : 0;` – The Mask May 08 '12 at 17:12
  • searching for optional prefixes is pointless.. `contains_link("notmywebsite.com")` – Karoly Horvath May 08 '12 at 17:12
  • @TheMask Sorry, I meant knowing it is there, not finding the actual link – BronzeByte May 08 '12 at 17:17
  • @KarolyHorvath: I will fix it. Sorry, I don't have my environment on this computer to run any tests. – The Mask May 08 '12 at 17:20
  • Without both http:// and www. it doesn't work, and it doesn't JUST detect separated words, this one DID work however :) – BronzeByte May 08 '12 at 17:32
  • @BronzeByte: Look at the new regular expression. It can detect the site with or without `http://`, `http://www.`, `https://`, `https://www.`, or `www.`. And it doesn't match something like `foo.mywebsite.com`, `thisismywebsite.com is a lot better` or `aaamywebsite.com`. It can also detect the site at the start of the string: `www.mywebsite.com is my website`. – The Mask May 08 '12 at 19:46
  • @The Mask: it's probably fine, I haven't checked it thoroughly because it looks needlessly complicated ;) Check my updated answer. – Karoly Horvath May 08 '12 at 22:00
  • @KarolyHorvath: Hm, really, it is a bit complicated. But I had tried exactly that, and it did not work as expected. It doesn't match, for example, `www.mywebsite.com is my website` (the pattern I wrote this regex for). I tested using C#.NET, maybe that's why. – The Mask May 08 '12 at 23:32
5

I think what you're looking for is this:

^(https?://)?(www\.)?mywebsite\.com/?

See it here in action: http://regexr.com?30t6m


Here it is in PHP:

function contains_link($text) {
    return preg_match("~^(https?://)?(www\.)?mywebsite\.com/?~", $text);
}

P.S. If you want to be sure that there's nothing after it, you should append a $ to the end.
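
A small sketch of how the `^` anchor (and the optional `$`) change the result. The second helper, `is_only_link`, is a made-up name for the `$` variant mentioned in the P.S. With `^`, the pattern only matches when the link is at the very start of the string, so it validates a string rather than finding a link inside a text.

```php
<?php
// Pattern from the answer: anchored at the start of the string.
function contains_link($text) {
    return preg_match("~^(https?://)?(www\.)?mywebsite\.com/?~", $text);
}

// Hypothetical variant with a trailing $, so nothing may follow the link.
function is_only_link($text) {
    return preg_match("~^(https?://)?(www\.)?mywebsite\.com/?$~", $text);
}

var_dump(contains_link("http://www.mywebsite.com/"));     // int(1)
var_dump(contains_link("mywebsite.com/ is what I like")); // int(1)
var_dump(contains_link("I love mywebsite.com"));          // int(0): link is not at the start
var_dump(is_only_link("www.mywebsite.com"));              // int(1)
var_dump(is_only_link("mywebsite.com/ is what I like"));  // int(0): text follows the link
```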

Joseph Silber
  • @KarolyHorvath - By that reasoning, your answer would fail on `thisismywebsite.com`. The OP's question is ambiguous enough to allow all this. He should clarify his needs for further assistance. – Joseph Silber May 08 '12 at 16:39
  • How to get that as PHP regex which will work with preg_match? And will it also detect regular http:// ones? – BronzeByte May 08 '12 at 16:40
  • @BronzeByte - Yes it will match `http://`. See the demo link. I also updated the answer with the PHP code. – Joseph Silber May 08 '12 at 16:45
  • @JosephSilber: I removed `^` from my regex because there can be anything (text, another domain, etc.) at the start of the string. – The Mask May 08 '12 at 17:00
  • @The Mask: in that case it won't work.. searching for optional prefixes and postfixes is pointless.. `contains_link("notmywebsite.com")` – Karoly Horvath May 08 '12 at 17:14
  • @KarolyHorvath: I hope this has been resolved. Check out the updated regular expression. – The Mask May 08 '12 at 19:50
  • Why are you escaping forward slashes? Isn't that the whole point of using `~` as the RegEx bookends, so you don't have to? – Madbreaks May 09 '12 at 00:26
  • @Madbreaks - Correct. I'm just used to doing it this way in Javascript. But you're absolutely right, and I updated the code to reflect that. – Joseph Silber May 09 '12 at 16:18
4

If you only want to search for the text:

strpos($text, "mywebsite.com") !== FALSE

If you want to search for an exact "word" (start):

preg_match("/(^|\s)(https?:\/\/)?(www\.)?mywebsite\.com/", $text);

or (start & end):

preg_match("/(^|\s)(https?:\/\/)?(www\.)?mywebsite\.com\/?(\s|[,.]|$)/", $text);
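
For reference, a small harness that runs the "start & end" pattern against the twelve test cases from the question. The function name `count_passing` is made up for this sketch; the pattern and the test data are taken as posted.

```php
<?php
// The "start & end" pattern: the link must be preceded by the start of the string
// or whitespace, and followed by whitespace, a comma, a period, or the end of the string.
function contains_link($text) {
    return preg_match("/(^|\s)(https?:\/\/)?(www\.)?mywebsite\.com\/?(\s|[,.]|$)/", $text) > 0;
}

// Hypothetical helper: counts how many test cases give the expected result.
function count_passing($testcases) {
    $passed = 0;
    foreach ($testcases as $case) {
        if (contains_link($case[0]) === $case[1]) {
            $passed++;
        }
    }
    return $passed;
}

$testcases = array(
    array("I love mywebsite.com", true),
    array("mywebsite.com/ is what I like", true),
    array("www.mywebsite.com is my website", true),
    array("Check out www.mywebsite.com/", true),
    array("... http://mywebsite.com ...", true),
    array("... http://mywebsite.com/ ...", true),
    array("... http://www.mywebsite.com ...", true),
    array("... http://www.mywebsite.com/ ...", true),
    array("I like commas and periods. Just like www.mywebsite.com, they do it too!", true),
    array("thisismywebsite.com is a lot better", false),
    array("The URL fake.mywebsite.com is unknown to their server", false),
    array("Check out http://redirect.mywebsite.com/www.ultraspammer.com", false)
);

echo count_passing($testcases) . " of " . count($testcases) . " test cases pass\n";
// prints "12 of 12 test cases pass"
```

One caveat with this sketch: since `.` counts as a valid terminator, a string like `mywebsite.com.example.org` would also be accepted. Whether that matters depends on your input.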
Karoly Horvath