
I need to get the href of a link with specific anchor text within an HTML page.

    </tr>
    <tr>
      <td><a href="/thisisafile.pdf" target="_blank" class="body1">
        This is some anchor text </a></td>
    </tr>
    <tr>

I need to pull out /thisisafile.pdf. I have tried the following code:

    preg_match('~<a.*href="(.*?)".?>.?This is some anchor text.?</a>~sm',$temp,$matches,0);

$temp contains the code above.

I have tried the regex in an online preg_match tester, and it matches. I have also tried it in a regex tester without the delimiters, and it works. But when I run it on my server (Linux), I get 0 matches (not false).

user1400987

2 Answers


Possibly a duplicate. Check the first answer to this question. Regular expressions can get clunky at times, especially when you use the greedy .* pattern.

Grabbing the href attribute of an A element
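The usual alternative to a regex here is to parse the HTML. A minimal sketch with PHP's built-in DOMDocument, assuming the dom extension is enabled (it is in most builds); hrefForAnchorText is a hypothetical helper name, and comparing the trimmed link text for equality is an assumption about how strict the match should be:

```php
<?php
// Hypothetical helper: returns the href of the first <a> whose trimmed
// text content equals $text, or null if there is no such link.
function hrefForAnchorText(string $html, string $text): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about an incomplete document) instead
    // of printing them, then restore the previous setting.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (trim($a->textContent) === $text) {
            return $a->getAttribute('href');
        }
    }
    return null;
}

// Usage with the markup from the question.
$temp = '<tr><td><a href="/thisisafile.pdf" target="_blank" class="body1">
        This is some anchor text </a></td></tr>';
echo hrefForAnchorText($temp, 'This is some anchor text'), PHP_EOL;
```

Because the parser handles attributes and whitespace, the target and class attributes and the newline before the text need no special treatment.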

verisimilitude

You should change

    ~<a.*href="(.*?)".?>.?This is some anchor text.?</a>~sm

into

    ~<a.*?href="(.*?)".*?>.*?This is some anchor text.*?</a>~sm

You were missing the *. A .? allows only one character or none, so the target="_blank" class="body1" attributes and the newline and spaces before your anchor text were not allowed, which caused your regular expression to fail.
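A small script to check both patterns against the markup from the question, including the newline and indentation before the anchor text. It is a sketch for reproducing the behaviour locally; the variable names other than $temp are illustrative:

```php
<?php
// Markup from the question, with the line break before the anchor text.
$temp = <<<'HTML'
  </tr>
       <tr>
      <td><a href="/thisisafile.pdf" target="_blank" class="body1">
        This is some anchor text </a></td>
    </tr>
      <tr>
HTML;

// Original pattern: each ".?" allows at most one character.
$original = '~<a.*href="(.*?)".?>.?This is some anchor text.?</a>~sm';
// Corrected pattern: each ".*?" allows any number of characters.
$fixed = '~<a.*?href="(.*?)".*?>.*?This is some anchor text.*?</a>~sm';

$origCount = preg_match($original, $temp, $origMatches);
$fixedCount = preg_match($fixed, $temp, $fixedMatches);

echo $origCount, PHP_EOL;       // 0: no match, but not false (no error)
echo $fixedCount, PHP_EOL;      // 1
echo $fixedMatches[1], PHP_EOL; // /thisisafile.pdf
```

The 0 (rather than false) return value is consistent with the symptom in the question: the pattern compiles and runs, it simply does not match text that has more than one character in those gaps.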

EDIT: I also made your first .* lazy by replacing it with .*?, to prevent future problems.
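One caveat when $temp holds a whole page: a lazy .*? still does not stop at tag boundaries, so if another link appears before the target one, the match can start at the earlier <a> and capture its href. A sketch under stated assumptions (a hypothetical two-link page, double-quoted href values, only whitespace around the anchor text) comparing the pattern above with a tighter variant:

```php
<?php
// Hypothetical page: the link we want is the second one.
$page = '<a href="/other.pdf">Other</a>'
      . '<a href="/thisisafile.pdf" target="_blank">'
      . "\n  This is some anchor text </a>";

// Lazy pattern: ".*?" may run from the first <a> across "Other</a>"
// into the second link's text, so the first href is captured.
$lazy = '~<a.*?href="(.*?)".*?>.*?This is some anchor text.*?</a>~s';
preg_match($lazy, $page, $m1);
echo $m1[1], PHP_EOL; // /other.pdf

// Tighter sketch: [^>]* cannot leave the opening tag, [^"]* cannot leave
// the attribute value, and \s* allows only whitespace around the text.
$tight = '~<a\s[^>]*href="([^"]*)"[^>]*>\s*This is some anchor text\s*</a>~';
preg_match($tight, $page, $m2);
echo $m2[1], PHP_EOL; // /thisisafile.pdf
```

The tighter pattern still assumes double quotes and plain text inside the link; anything more varied is where a real HTML parser becomes the safer choice.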

soimon