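// Download the page and capture the src attribute of every <img> tag.
$html = file_get_contents('https://website.com');
$regex = '/< *img[^>]*src *= *["\']?([^"\']*)/i';
preg_match_all($regex, $html, $matches);
// $matches[1] holds the captured URLs as plain strings; [1] is the second image.
return $matches[1][1];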

The problem is that the image URL I get back is broken. In view-source, the URL is shown as a blue, clickable link; after I click it, it becomes a new URL, and that new URL is the one I want to return, not the unclicked one. In other words, I want to return the address the link actually leads to, which is different from the URL as it appears in the source.

Toto
  • " after getting clicked it become a new url"... I guess you are saying that the server redirects the user to another URL? If so, you can't determine that without visiting the first URL and seeing what location it redirects you to. But why does it matter really? If you capture the original URL, then when it is used, it will redirect no problem. That's a nice thing about redirects - you can have many URLs all going to the same place, and which one you use doesn't matter. Why do you care about knowing the second URL? – ADyson Mar 02 '20 at 10:23
  • Save your sanity: use a DOM parser, NOT regex. See https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Wesley Smith Mar 02 '20 at 10:28
  • Well, actually the first URL is broken; I think it's a protection or something. After being clicked it becomes a new one that has changed a bit and is not broken. I want the second one. – Nacereddine Allal Mar 02 '20 at 10:28
  • OK Wesley, thank you, I'll take a look at it :) – Nacereddine Allal Mar 02 '20 at 10:29
  • Not yet, can you explain it to me, please? – Nacereddine Allal Mar 02 '20 at 10:46
  • When you click on a URL, a request goes to the server mentioned in the URL. When the request arrives, the server can choose to reply directly with some content, or it can choose to redirect the user to somewhere else. It does the redirection by sending an HTTP header called "Location" which contains the URL it wants to send the user to. It's then up to the browser to visit this new URL instead. This is likely to be why you see the URL change in the browser after you click on the link. If you watch progress in your browser's Network tools while clicking, you can likely see what happens. – ADyson Mar 02 '20 at 11:07
  • So, because the server is doing that, it doesn't matter that you capture the first URL from the page source, because when you actually come to use that URL, the user will be redirected to the final destination. So you don't really need to directly capture the final URL, because the first one is basically just a link to it anyway. – ADyson Mar 02 '20 at 11:08
  • OK, I understand that, but the first URL exists; it's not a redirection. It's like a name for the href, maybe to protect against scraping. When I use preg_match, it gets me the first URL, before redirection. When you copy the first URL and paste it into the browser you get the same URL, but when you click on it directly it shows you the second URL, which shows the image. – Nacereddine Allal Mar 02 '20 at 12:06
  • I'm talking about instagram profile picture url to be honest – Nacereddine Allal Mar 02 '20 at 12:09
  • Ok. What difference does that make? – ADyson Mar 02 '20 at 12:29
  • The first image URL is broken ("Bad URL timestamp"); when clicked, it changes a little bit but the image works. – Nacereddine Allal Mar 02 '20 at 18:23
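
The DOM parser suggestion in the comments can be sketched as follows. This is a minimal illustration, not a confirmed fix for the problem in the question: the function name `image_sources` and the sample markup are made up for the example. One relevant difference from the regex is that `getAttribute()` returns the decoded attribute value, so an `&amp;` in the page source comes back as `&`, which is also what a browser uses when a link is clicked. Whether that accounts for the changed URL here is an assumption the thread does not settle.

```php
<?php
// Hypothetical helper: collect the src attribute of every <img> element
// found in an HTML string, using PHP's built-in DOM extension.
function image_sources(string $html): array
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid, so buffer libxml warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Skip <img> elements that have no src attribute at all.
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Sample markup (made up); in the question this would be the downloaded page.
$html = '<p><img src="https://example.com/a.jpg?x=1&amp;y=2"><img alt="no src"></p>';
print_r(image_sources($html));
```

With a real page, the string passed in would be the result of `file_get_contents()`, checked for `false` first.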

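The redirect mechanism ADyson describes can be checked from PHP by reading the "Location" headers the server sends. The sketch below assumes the header list has the shape returned by `get_headers($url)` without its second argument, that is, a flat list of raw header lines covering every response in the redirect chain. The function name `last_location` is hypothetical, and the asker states that the URL in question is not a redirect, so this only shows how one would verify that.

```php
<?php
// Hypothetical helper: return the value of the last "Location" header in a
// list of raw header lines, or null when no redirect was reported.
function last_location(array $headers): ?string
{
    $location = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive, so compare without regard to case.
        if (is_string($line) && stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    return $location;
}

// Example usage (performs a real request, so it is left commented out):
// $headers = get_headers('https://website.com');
// var_dump(last_location($headers ?: []));
```

A `null` result means the server answered directly, which would support the asker's claim that no redirect is involved. Note that a "Location" value may be relative, in which case it has to be resolved against the requested URL.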
0 Answers