I'm working on a small scraper for fun, and when I grab image URLs from certain sites they come back with every forward slash escaped by a backslash.
For example:
scraped url:
https:\/\/cdn1.vox-cdn.com\/thumbor\/zN9XawbQJgFPkuAcA2JEGgqApm8=\/cdn0.vox-cdn.com\/uploads\/chorus_asset\/file\/3700712\/tomorrowland54fdf04f23efb_2040.0.jpg
desired url:
https://cdn1.vox-cdn.com/thumbor/zN9XawbQJgFPkuAcA2JEGgqApm8=/cdn0.vox-cdn.com/uploads/chorus_asset/file/3700712/tomorrowland54fdf04f23efb_2040.0.jpg
The scraped version has an unnecessary backslash before each forward slash, so the URL doesn't work when you follow it; it gives an error.
I tried using the stripslashes function, as it seems like that's its purpose, but it didn't work. The URL just stayed the same.
(Edit) Here's the code I'm using to grab URLs:
function GetImages($page_dom) {
    $found_links = [];
    $images = $page_dom->getElementsByTagName('img');
    foreach ($images as $image) {
        $img_src = $image->getAttribute('src');
        $found_links[] = $img_src;
    }
    return $found_links;
}
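For reference, here is a minimal sketch of the cleanup step I'm after, run against the example URL above. It assumes the backslashes are literally present in the string (as they would be with JSON-style `\/` escaping), and `unescape_slashes` is just a hypothetical helper name, not something from my scraper. One thing worth noting is that stripslashes returns the cleaned string rather than modifying its argument, so the result has to be assigned:

```php
<?php
// Hypothetical helper (not part of the scraper above): removes the
// backslashes that escape each forward slash in a scraped URL.
function unescape_slashes(string $url): string
{
    // stripslashes() returns a new string; it does not change $url in place.
    return stripslashes($url);
}

// Single quotes keep the backslashes literal, matching the scraped value.
$scraped = 'https:\/\/cdn1.vox-cdn.com\/thumbor\/zN9XawbQJgFPkuAcA2JEGgqApm8=\/cdn0.vox-cdn.com\/uploads\/chorus_asset\/file\/3700712\/tomorrowland54fdf04f23efb_2040.0.jpg';

$clean = unescape_slashes($scraped);
echo $clean, PHP_EOL;
```

If this prints the desired URL but the scraper output is still escaped, the likely difference is that the return value isn't being stored, or that the string in the page isn't escaped the way this sketch assumes.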