0

I would like to use preg_match in PHP to test the format of a URL. The URL looks like this:

<a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a>

Honestly, I have no idea how to write a preg_match pattern, but my goal is:

a pattern that starts with <a href=, contains the string ~dead host~, and ends with </a>

I tried PHP's native string functions to check whether the string contains it, but that was not flexible enough, so I think preg_match is the only choice.

Andy Lester
  • If you are processing HTML data, it would be better to use `DOM`. – Passerby Apr 01 '14 at 04:45
  • Thanks for the suggestion, but preg_match will be the best option. – user3479821 Apr 01 '14 at 05:00
  • http://stackoverflow.com/q/590747/570812 http://stackoverflow.com/q/1732348/570812 – Passerby Apr 01 '14 at 05:12
  • Yeah, you are right, it is useful when you parse a large amount of data. In my case I have approximately 10 links, some of which contain the ~dead host~ string. After careful thought I decided preg_match will be best, so that is why I am working on it. – user3479821 Apr 01 '14 at 05:31

2 Answers

1

I was not entirely clear on what your text looks like, vs what you want to match against, but I will do my best to try and get it right.

Basically, what I am doing here is looking for an opening link tag <a, followed by some stuff (anything except a closing angle bracket >), followed by the text dead host wrapped in tildes ~. Then some more stuff, followed by the closing link tag </a>.

$string = "<a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a>";

if (preg_match('%<a[^>]*?~dead host~.*?</a>%i', $string)) {
    print "Circle up the wagons - a match was found!";  
}
else {
    print "Let's pitch camp here - no match was found!";
}

Here is an explanation of the REGEX:

%   <a   [^>]*?   ~dead host~   .*?   </a>   %   i
^    ^      ^          ^         ^      ^    ^   ^
1    2      3          4         5      6    7   8
  1. % Delimiter - Tells the script that the pattern starts here.
  2. <a Look for an opening link tag.
  3. [^>]*? This is a character class [] telling the script to find any character that is not ^ a closing angle bracket >, any number of times *, but as few times as possible ?, so it stops as soon as the next part of the expression can match. In this case, it will stop when it finds ~dead host~. This is similar to item #5, except that here it cannot run past the end of the opening tag, whereas in #5 it can match any character, including >.
  4. ~dead host~ Look for the literal string 'dead host' wrapped in tildes '~'.
  5. .*? This means find any character ., any number of times *, but as few times as possible ?, so it stops at the first place where the next part of the expression matches. In this case, that is </a>.
  6. </a> Look for a closing link tag.
  7. % Delimiter - Tells the script that the pattern ends here.
  8. i Pattern modifier - Tells the script to ignore the case. If a link may be spread over multiple lines instead of just one line, you may want to add the s flag as well, so that the dot . also matches newline characters. Your pattern modifier would then look like this: is. The m flag is often added alongside it (ims), but it only changes how ^ and $ behave, and this pattern does not use them.
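To make items #5 and #8 more concrete, here is a minimal sketch comparing the lazy .*? with a greedy .* and showing the effect of the s flag. The input strings and the example.com link are made up for illustration and are not part of the original question.

```php
<?php
// Hypothetical input: two links on one line, only the first points at ~dead host~.
$html = "<a href='http://~dead host~/a'>one</a><a href='http://example.com/b'>two</a>";

// Lazy .*? stops at the first </a>, so only the dead link is matched.
preg_match('%<a[^>]*?~dead host~.*?</a>%i', $html, $lazy);

// Greedy .* runs on to the last </a>, swallowing the healthy link as well.
preg_match('%<a[^>]*?~dead host~.*</a>%i', $html, $greedy);

// Hypothetical input: the link text is broken across two lines.
$multiline = "<a href='http://~dead host~/a'>part\n2</a>";
$withoutS = preg_match('%<a[^>]*?~dead host~.*?</a>%i', $multiline);  // . does not match \n
$withS    = preg_match('%<a[^>]*?~dead host~.*?</a>%is', $multiline); // s lets . match \n

print $lazy[0] . "\n";
print $greedy[0] . "\n";
var_dump($withoutS, $withS);
```

The lazy version should return just the first link, the greedy version the whole string, and the multi-line link should only match once the s flag is added.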

Hopefully this is what you were looking for. If I was off in my understanding of what you were looking for, let me know and I can make an edit to adjust it to get you what you want.

Here is a working demo

EDIT:

In response to your comment, you can use preg_replace instead of preg_match to replace stuff.

$string = " 

<a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://stackoverflow.com' rel='nofollow' target='blank'>part-2</a><a href='http://stackoverflow.com' rel='nofollow' target='blank'>part-2</a><a href='http://stackoverflow.com' rel='nofollow' target='blank'>part-2</a><a href='http://stackoverflow.com' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a>
<a href='http://stackoverflow.com' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a>
<a href='http://stackoverflow.com' rel='nofollow' target='blank'>part-2</a><a href='http://stackoverflow.com' rel='nofollow' target='blank'>part-2</a><a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a>

";

$string = preg_replace('%<a[^>]*?~dead host~.*?</a>%i', ' ', $string);

print $string;

This will replace all of the matches with a space instead of just matching them.

Here is a working demo of the replacement
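If you also want to know how many links were replaced, preg_replace accepts a limit and a by-reference count as its fourth and fifth arguments. The following is only a sketch with a shortened, made-up input string; the variable names $cleaned and $count are my own.

```php
<?php
// Hypothetical input: three links, two of which point at ~dead host~.
$string = "<a href='http://~dead host~/x'>part-1</a>"
        . "<a href='http://stackoverflow.com'>part-2</a>"
        . "<a href='http://~dead host~/y'>part-3</a>";

// A limit of -1 means "no limit"; $count receives the number of replacements made.
$cleaned = preg_replace('%<a[^>]*?~dead host~.*?</a>%i', ' ', $string, -1, $count);

print $cleaned . "\n";
print "Replaced $count dead link(s)\n";
```

Because [^>]*? cannot run past the end of an opening tag, the stackoverflow.com link in the middle is left untouched even though dead links surround it.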

Quixrick
  • I saw your reply. First of all, I am very surprised: it is brief and complete. I am not sure, but I would claim that even if I had paid someone, no one could have written it as you did. The code is complete and works as I intended; there is no need to change anything. P.S. Any idea what to do if I have 100 links, half of them like the string above (the one preg_match matches), and I want to replace those with a space? E.g. http://pastebin.com/CfQdrihS – user3479821 Apr 02 '14 at 04:31
  • Yes, this is pretty easy to do. Instead of using `preg_match`, you'd use `preg_replace`. I have made an edit to my code above to show you how. – Quixrick Apr 03 '14 at 18:54
0

If you want to match only the URL:

$text = "<a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a>";

// % is the delimiter, so the slashes in the URL and in the comments need no escaping
// (an unescaped / inside a /.../ pattern, even in a comment, ends the pattern early).
preg_match_all("%http://   # start at http://
    ~dead\shost~           # followed by ~dead host~
    [^\"']+                # then one or more characters up to the next quote
    %x", $text, $matches); // x - ignore whitespace and allow # comments inside the pattern
print_r($matches);

Working Demo
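For reference, here is a sketch of the same idea written as a single-line pattern, printing only the matched URLs. Using % as the delimiter and looping over $matches[0] are my own choices for illustration, not something the original snippet requires.

```php
<?php
$text = "<a href='http://~dead host~/vypdye57f25o' rel='nofollow' target='blank'>part-2</a>";

// Single-line form of the pattern; % as the delimiter avoids escaping the slashes.
preg_match_all('%http://~dead\shost~[^"\']+%', $text, $matches);

// $matches[0] lists every full match; there are no capture groups here.
foreach ($matches[0] as $url) {
    print $url . "\n";
}
```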

Shafeeq