I'm working on a small scraper for fun, and when I grab image URLs from certain sites they come back with extra characters in them.

For example:

scraped url:

https:\/\/cdn1.vox-cdn.com\/thumbor\/zN9XawbQJgFPkuAcA2JEGgqApm8=\/cdn0.vox-cdn.com\/uploads\/chorus_asset\/file\/3700712\/tomorrowland54fdf04f23efb_2040.0.jpg

desired url:

https://cdn1.vox-cdn.com/thumbor/zN9XawbQJgFPkuAcA2JEGgqApm8=/cdn0.vox-cdn.com/uploads/chorus_asset/file/3700712/tomorrowland54fdf04f23efb_2040.0.jpg

It's adding unnecessary backslashes, so the URL fails with an error when you follow it.

I tried the stripslashes function, since that seems to be its purpose, but it didn't work: the URL stayed the same.

(edit) Here's the code I'm using to grab URLs:

// Collect the src attribute of every <img> element in the parsed page.
function GetImages($page_dom) {
    $found_links = [];

    $images = $page_dom->getElementsByTagName('img');
    foreach ($images as $image) {
        $img_src = $image->getAttribute('src');
        $found_links[] = $img_src;
    }

    return $found_links;
}
Scott

2 Answers

When you call json_encode, use the JSON_UNESCAPED_SLASHES option to prevent it from escaping slashes.

But this shouldn't really be necessary. If you're outputting JSON, you should be sending it to a program that parses JSON, and the JSON parser will translate \/ to /.
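A minimal sketch of both points, assuming the scraped links are passed through json_encode before being displayed (the question's code does not show that step). The example.com URL and the variable names are hypothetical.

```php
<?php
// Hypothetical sample data standing in for the scraped image links.
$links = ['https://example.com/uploads/photo.jpg'];

// By default, json_encode escapes every "/" as "\/".
$escaped = json_encode($links);

// With JSON_UNESCAPED_SLASHES the slashes are left as they are.
$unescaped = json_encode($links, JSON_UNESCAPED_SLASHES);

// A JSON parser turns "\/" back into "/", so decoding the escaped
// form recovers the original URL.
$decoded = json_decode($escaped, true);

echo $escaped, PHP_EOL;    // ["https:\/\/example.com\/uploads\/photo.jpg"]
echo $unescaped, PHP_EOL;  // ["https://example.com/uploads/photo.jpg"]
echo $decoded[0], PHP_EOL; // https://example.com/uploads/photo.jpg
```

Both encoded strings are valid JSON for the same data, so the flag only matters when a person, rather than a JSON parser, reads the output.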

Barmar

If this is the only pattern you are expecting, you can use str_replace('\/', '/', $url). To also collapse doubled backslashes, you can use str_replace(array('\/', '\\\\'), array('/', '\\'), $url).
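As a rough illustration of the single-pattern replacement, here is a short sketch. The example.com URL and the variable names are hypothetical; note that inside a single-quoted PHP string, '\/' is a literal backslash followed by a slash.

```php
<?php
// Hypothetical scraped value containing JSON-style escaped slashes.
$scraped = 'https:\/\/example.com\/uploads\/photo.jpg';

// Replace each backslash-slash pair with a plain slash.
// str_replace returns a new string, so the result must be assigned.
$clean = str_replace('\/', '/', $scraped);

echo $clean, PHP_EOL; // https://example.com/uploads/photo.jpg
```

This only treats the one escape sequence; if the string is actually JSON, decoding it with json_decode handles every JSON escape rather than just this one.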

device_exec