Hi, I am looking to save the HTML of URLs. Ideally I would use the URL as the file name; however, URLs often contain special characters that are not allowed in file names, so this is not possible.

What I want is a way of converting the URL into a 'code' which I can then use as the file name. I also need a way to go back (i.e. from the file name to the original URL).

I am assuming that the best way of doing this is hashing of some kind, but I am not sure where to start. I will be working in Python, so ideally I want something of the form:

    def url_to_file_name(url):
        ...
        return file_name

    def file_name_to_url(file_name):
        ...
        return url

kyrenia
  • Hashing is a one-way operation. Would it be acceptable to fully escape these characters in your file name (into Unicode values, for example)? – sobolevn Oct 11 '15 at 18:35
  • @sobolevn Ideally yes, but given your comment, on balance I may just hash and accept that I won't be able to go back from the file name to the original URL. – kyrenia Oct 11 '15 at 18:45
  • Then see http://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename-in-python – sobolevn Oct 11 '15 at 18:48

1 Answer

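You can use URL encoding (percent-encoding). As far as I know, it replaces most special characters with a '%' followed by two hex digits and leaves letters, digits and dots as they are, so the result can be decoded back to the original URL.

For example, see this online encoder/decoder: http://meyerweb.com/eric/tools/dencoder/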

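In Python 2 you can use urllib.quote_plus; in Python 3 the same function lives in urllib.parse:

    import urllib.parse

    urllib.parse.quote_plus("stackoverflow.com?it!@#$%^&*()cool")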
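
To tie this back to the two functions asked for in the question, here is a minimal sketch of a reversible mapping, assuming Python 3. It uses `urllib.parse.quote` with `safe=""` (rather than `quote_plus`) so that `/` is escaped as well, and `urllib.parse.unquote` to go back. The example URL is only an illustration.

```python
from urllib.parse import quote, unquote


def url_to_file_name(url):
    # safe="" makes quote() escape "/" too, which cannot appear in a file name.
    return quote(url, safe="")


def file_name_to_url(file_name):
    # unquote() reverses the percent-encoding applied above.
    return unquote(file_name)


url = "http://stackoverflow.com/questions/295135?a=1&b=2"
file_name = url_to_file_name(url)
print(file_name)
print(file_name_to_url(file_name) == url)
```

One caveat: the encoded name is longer than the URL, and many file systems limit file names to roughly 255 bytes, so very long URLs may still need a different scheme.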

Hashing might not be a good solution as you won't be able to convert back.
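
If you decide to hash anyway, the only way back is to store the mapping yourself. The sketch below is one possible approach, not something from the question: it assumes SHA-256 from `hashlib`, and the `index` dictionary and the function names `url_to_hashed_name` and `remember` are hypothetical. In practice the index would have to be saved to disk alongside the files.

```python
import hashlib

# Hypothetical in-memory index from file name back to URL.
index = {}


def url_to_hashed_name(url):
    # The digest has a fixed length and uses only hex characters,
    # but it cannot be turned back into the URL on its own.
    return hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"


def remember(url):
    # Record the mapping so the URL can be looked up from the file name later.
    file_name = url_to_hashed_name(url)
    index[file_name] = url
    return file_name


file_name = remember("http://stackoverflow.com/questions/295135?a=1&b=2")
print(len(file_name))
print(index[file_name])
```

The benefit over percent-encoding is that the file name length stays constant however long the URL is; the cost is the extra index to maintain.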

frunkad