
I am screenshotting a bunch of web pages, using Python with Selenium. I want to save the PNGs locally for reference. The list of URLs looks something like this:

www.mysite.com/dir1/pageA
www.mysite.com/dir1/pageB

My question is about what filenames to give the screenshotted PNGs.

If I name the image files after the URL, e.g. www.mysite.com/dir1/pageA.png, the slashes will be treated as path separators rather than as part of the name, which will inevitably cause problems at some point.

I could replace all the / characters in the URL with _, but I suspect that might cause problems too, e.g. if there are already _ characters in the URL. (I don't strictly need to be able to work backwards from the filename to the URL, but it wouldn't be a bad thing.)
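To illustrate the concern, here is a minimal sketch. The `flatten` helper and the two URLs are made up for the example; they only show how two different URLs can end up with the same filename.

```python
def flatten(url: str) -> str:
    """Hypothetical helper: replace every '/' with '_' and append '.png'."""
    return url.replace("/", "_") + ".png"


# Two distinct (made-up) URLs, one of which already contains an underscore.
a = flatten("www.mysite.com/dir1/page_A")
b = flatten("www.mysite.com/dir1_page/A")

# Both produce the same name, so the second screenshot would overwrite the
# first, and the filename can no longer be mapped back to a single URL.
print(a == b)
```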

What's a sensible way to handle the naming?

Richard

2 Answers


The easiest way to represent what's almost certainly a directory structure on the server is to do what wget does and replicate that structure on your local machine.

Thus the / characters become directory delimiters, and your www.mysite.com/dir1/pageA.png would become a PNG file called pageA.png inside a directory called dir1, which in turn sits inside a directory called www.mysite.com.

It's simple, guaranteed to be reversible, and doesn't risk ambiguous results.
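A minimal sketch of this idea in Python, assuming the URLs have no scheme (as in the question) and that a local root directory called `screenshots` is acceptable. The function name and the root directory are illustrative choices, not anything from the question, and query strings or characters that are illegal in file names on your platform are not handled.

```python
from pathlib import Path


def screenshot_path(url: str, root: str = "screenshots") -> Path:
    """Map a scheme-less URL like 'www.mysite.com/dir1/pageA' to a local PNG path.

    Each '/'-separated piece becomes a directory level; the last piece becomes
    the file name with '.png' appended (not substituted, so 'page.html' stays
    recognisable as 'page.html.png').
    """
    parts = [p for p in url.split("/") if p]  # drop empty pieces, e.g. from a trailing '/'
    path = Path(root, *parts)
    return path.with_name(path.name + ".png")


target = screenshot_path("www.mysite.com/dir1/pageA")
print(target)
```

Before saving, the parent directories need to exist, e.g. `target.parent.mkdir(parents=True, exist_ok=True)`, after which Selenium's `driver.save_screenshot(str(target))` can write the file.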

Andrew Henle

What if you use '%2F'? It's the '/' character, but URL-encoded (percent-encoded).

source: http://www.w3schools.com/tags/ref_urlencode.asp
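As a rough sketch, Python's standard `urllib.parse` module can do the encoding and decoding. The two helper names below are made up for illustration. Whether `%` is acceptable in file names on your system is a separate question that this sketch does not address.

```python
from urllib.parse import quote, unquote

SUFFIX = ".png"


def url_to_filename(url: str) -> str:
    """Percent-encode the whole URL (including '/') and append '.png'."""
    # safe="" means '/' is encoded too; by default quote() leaves it alone.
    return quote(url, safe="") + SUFFIX


def filename_to_url(filename: str) -> str:
    """Reverse url_to_filename: strip '.png' and decode the percent escapes."""
    if not filename.endswith(SUFFIX):
        raise ValueError("expected a name ending in " + SUFFIX)
    return unquote(filename[: -len(SUFFIX)])


name = url_to_filename("www.mysite.com/dir1/pageA")
print(name)                   # www.mysite.com%2Fdir1%2FpageA.png
print(filename_to_url(name))  # www.mysite.com/dir1/pageA
```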

BiggerD
  • In general, `%` is a problematic character for use in a file name. See http://stackoverflow.com/questions/4814040/allowed-characters-in-filename for a full discussion. – Andrew Henle Jun 02 '16 at 09:45