
The title is kind of unclear, but I couldn't find a better way to describe my problem. I am trying to get some pictures from Reddit, and when I tried to get the URL of each image I ran into some problems.

    $url = 'http://www.reddit.com/r/pics';
    $str = file_get_contents($url);

This is what I currently have. To get the image URL out of the page source, I need to find this part of the HTML:

`<a class="thumbnail may-blank " href="http://i.imgur.com/K4q9i5c.jpg">`

As I was trying to figure out how to get the href of every such link on the page, regex was the only approach I could think of. By finding the part

    <a class="thumbnail may-blank "

and then the closing `>` sign, I could get the whole tag, from which I could eventually extract the URL of the picture.
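The two-step idea described above (find the opening tag, then pull out its href) can be sketched in PHP roughly like this. It is a minimal sketch on a hypothetical sample string: `$str` stands in for the page fetched with `file_get_contents`, and it assumes the class attribute is written exactly as `"thumbnail may-blank "` and that no attribute value contains a `>`.

```php
<?php
// Hypothetical sample markup; in the question, $str comes from file_get_contents($url).
$str = '<div><a class="thumbnail may-blank " href="http://i.imgur.com/K4q9i5c.jpg"><img src="x.jpg"></a>'
     . '<a class="title" href="/r/pics/comments/abc">Title</a></div>';

// Step 1: grab every whole opening tag that starts with the thumbnail class,
// stopping at the first ">" ([^>]* cannot run past the end of the tag).
preg_match_all('/<a class="thumbnail may-blank "[^>]*>/', $str, $tags);

// Step 2: pull the href value out of each tag found in step 1.
$urls = [];
foreach ($tags[0] as $tag) {
    if (preg_match('/href="([^"]*)"/', $tag, $m)) {
        $urls[] = $m[1];
    }
}

print_r($urls);
```

Using `[^>]*` and `[^"]*` instead of `.*` keeps each match inside a single tag and a single attribute value, which matters when several links sit on one line.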

So I have been trying and trying to find a regex that matches it, but I couldn't get it to work. Maybe someone here can help me, or has a better solution.

Any help would be highly appreciated. Thanks!

  • There are simpler options, like QueryPath with `qp($url)->find("a.thumbnail.may-blank").attr("href");` (and a loop). Regex is only advisable with consistent input *and/or* if you're versed with it. – mario Mar 19 '14 at 19:41
  • Obligatory -> http://stackoverflow.com/a/1732454/1112089 – Crisp Mar 19 '14 at 19:46
  • Thanks for the tip about QueryPath, mario. I am kind of lost right now, as I am trying to get it working on my Windows PC. The installer seems to be for Linux, if I am right. – user3439303 Mar 19 '14 at 20:10
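Since QueryPath needs an extra install, here is a minimal sketch of the same lookup using PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`), which is usually enabled by default, including in typical Windows builds. The sample markup is hypothetical, and the XPath expression is a hand-written equivalent of the `a.thumbnail.may-blank` selector from the comment, not something QueryPath itself produces.

```php
<?php
// Hypothetical sample markup; with live data you would pass the result of
// file_get_contents('http://www.reddit.com/r/pics') instead.
$html = '<div><a class="thumbnail may-blank " href="http://i.imgur.com/K4q9i5c.jpg"><img src="x.jpg"></a>'
      . '<a class="title may-blank" href="/r/pics/comments/abc">Title</a></div>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; keep libxml's parser warnings out of the output.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// XPath equivalent of the CSS selector a.thumbnail.may-blank: the class
// attribute must contain both tokens as whole, space-separated words.
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' thumbnail ')"
       . " and contains(concat(' ', normalize-space(@class), ' '), ' may-blank ')]";

$urls = [];
foreach ($xpath->query($query) as $a) {
    $urls[] = $a->getAttribute('href');
}

print_r($urls);
```

Unlike a regex, this does not care about attribute order, quote style, or extra whitespace inside the tag; the second link is skipped because its class lacks `thumbnail`.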

2 Answers


You shouldn't use regex to parse HTML; it's really a bad choice.
But if you really have to, something like this might work
(untested):

    #  '/(?s)<a\s+class\s*=\s*(["\'])(?:(?!\1|[<>]).)*\1\s+href\s*=\s*(["\'])((?:(?!\2|[<>]).)*)\2/'

    (?s)                               # Dot-All
    <a \s+ class \s* = \s*             # class
    ( ["'] )                           # (1), delimiter
    (?:
         (?! \1 | [<>] )
         .
    )*
    \1                                 # delimiter
    \s+
                                       # [^<>]* ( add if necessary )
    href \s* = \s*                     # href
    ( ["'] )                           # (2), delimiter
    (                                  # (3 start), Url
         (?:
              (?! \2 | [<>] )
              .
         )*
    )                                  # (3 end)
    \2                                 # delimiter
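A minimal sketch of how the pattern above might be used with `preg_match_all`; the URL ends up in capture group 3. The sample markup is hypothetical. Note that the pattern matches any `<a>` whose `class` attribute comes directly before `href`, not only the thumbnail links, so the results may need filtering.

```php
<?php
// Hypothetical sample: one double-quoted and one single-quoted link.
$html = '<a class="thumbnail may-blank " href="http://i.imgur.com/K4q9i5c.jpg">'
      . '<a class=\'title\' href=\'/r/pics/comments/abc\'>Title</a>';

// The pattern from the answer; inside a single-quoted PHP string, \' is a literal quote.
$re = '/(?s)<a\s+class\s*=\s*(["\'])(?:(?!\1|[<>]).)*\1\s+href\s*=\s*(["\'])((?:(?!\2|[<>]).)*)\2/';

preg_match_all($re, $html, $matches);
// Group 3 holds the href value of every match.
$urls = $matches[3];

print_r($urls);
```

The backreferences `\1` and `\2` let the same pattern handle both quote styles, which is why the single-quoted link is matched too.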

If you just want the `href` values of the `a` tags, try:

    '/<a[^>]*href="([^"]*)"/'
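A minimal sketch of applying an href-only pattern with `preg_match_all` on a hypothetical sample. It uses `[^>]*` and `[^"]*` rather than greedy `.*`, because `.*` can run past the closing quote to the last quote on the line, so several links on one line would collapse into a single wrong match.

```php
<?php
// Hypothetical sample: two links on one line, as is common in minified HTML.
$html = '<a class="thumbnail may-blank " href="http://i.imgur.com/K4q9i5c.jpg"></a>'
      . '<a class="title" href="/r/pics/comments/abc">Title</a>';

// [^>]* stays inside the current tag; [^"]* stops at the closing quote.
preg_match_all('/<a[^>]*href="([^"]*)"/', $html, $matches);
$hrefs = $matches[1];

print_r($hrefs);
```

This returns every link's href, thumbnails or not; it also assumes double-quoted attributes and no `>` inside attribute values.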
pokero