
I want to save HTML to a file whose name is based on the URL.

To get a unique name for each URL, I am using uuid:

>>> import uuid
>>> url = "https://www.google.co.in/?gfe_rd=cr&ei=-koUWPf4HqzT8ge2g6HoBg&gws_rd=ssl"
>>> uuidstring = str(uuid.uuid5(uuid.NAMESPACE_DNS, url))
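For context, a small sketch of what `uuid5` gives here, assuming Python 3: it is name-based (a SHA-1 hash of namespace plus name), so the same URL always maps to the same UUID and therefore to the same file name. The module also defines `uuid.NAMESPACE_URL`, which may be a more fitting namespace than `NAMESPACE_DNS` for URLs; the variable names `a`, `b` and `u` are illustrative only.

```python
import uuid

url = "https://www.google.co.in/?gfe_rd=cr&ei=-koUWPf4HqzT8ge2g6HoBg&gws_rd=ssl"

# uuid5 is deterministic: hashing the same URL twice gives the same UUID.
a = uuid.uuid5(uuid.NAMESPACE_DNS, url)
b = uuid.uuid5(uuid.NAMESPACE_DNS, url)
assert a == b

# NAMESPACE_URL is the namespace the uuid module provides for URLs;
# it yields a different, but equally deterministic, UUID.
u = uuid.uuid5(uuid.NAMESPACE_URL, url)

print(len(str(a)))  # 36: 32 hex digits plus 4 dashes
print(len(a.hex))   # 32: same value without the dashes
```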

But I want to shorten the name further. Is there a way to turn this string into a shorter string that is still unique?

I tried base64, but I could not figure it out:

>>> uuid.UUID(uuidstring).bytes.encode('base64').rstrip('=\n').replace('/', '_')
AttributeError: 'bytes' object has no attribute 'encode'
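The error happens because Python 3 `bytes` objects have no `.encode()` method; the `'base64'` codec that the Python 2 idiom relied on is now a bytes-to-bytes codec reachable through `codecs.encode()`. A minimal sketch of the same one-liner adapted that way, assuming Python 3.4 or later:

```python
import codecs
import uuid

url = "https://www.google.co.in/?gfe_rd=cr&ei=-koUWPf4HqzT8ge2g6HoBg&gws_rd=ssl"
raw = uuid.uuid5(uuid.NAMESPACE_DNS, url).bytes  # the 16 raw bytes of the UUID

# codecs.encode() applies the bytes-to-bytes 'base64' codec and returns
# bytes with a trailing newline (24 base64 characters, then b"\n").
encoded = codecs.encode(raw, "base64")

# Decode to str, drop the "==" padding and newline, make '/' filename-safe.
short = encoded.decode("ascii").rstrip("=\n").replace("/", "_")
print(short)
```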

Linked question: Convert UUID 32-character hex string into a "YouTube-style" short id and back

Rahul

1 Answer


Use the base64 module, which handles binary data, and then decode the result as ASCII (this works because base64 output is always ASCII).

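import base64
import uuid

url = "https://www.google.co.in/?gfe_rd=cr&ei=-koUWPf4HqzT8ge2g6HoBg&gws_rd=ssl"
uuidstring = str(uuid.uuid5(uuid.NAMESPACE_DNS, url))

# base64 works on bytes: encode the 16 raw UUID bytes, decode the result to str,
# strip the "==" padding and trailing newline, and make '/' filename-safe.
raw = uuid.UUID(uuidstring).bytes
z = base64.encodebytes(raw).decode("ascii").rstrip('=\n').replace('/', '_')
print(z)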

Result:

pvEA9qOdX8COYyJf8zgzRA
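A variant sketch, not the answer's method: `base64.urlsafe_b64encode` uses `-` and `_` in place of `+` and `/`, so neither character needs manual replacing, and restoring the padding lets you recover the UUID (the "and back" part of the linked question). The helper names `short_name` and `to_uuid` are hypothetical.

```python
import base64
import uuid


def short_name(url: str) -> str:
    """Hypothetical helper: map a URL to a 22-character, filename-safe id."""
    raw = uuid.uuid5(uuid.NAMESPACE_DNS, url).bytes  # 16 bytes
    # The URL-safe alphabet never emits '+' or '/'; only the '==' padding is stripped.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def to_uuid(name: str) -> uuid.UUID:
    """Hypothetical helper: reverse short_name() by re-adding the padding."""
    padded = name + "=" * (-len(name) % 4)  # 22 chars -> 24 chars
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


url = "https://www.google.co.in/?gfe_rd=cr&ei=-koUWPf4HqzT8ge2g6HoBg&gws_rd=ssl"
name = short_name(url)
filename = f"{name}.html"  # the file the HTML would be saved to
print(name, len(name), filename)
```

One caveat on the design: base64 ids are case-sensitive, so on a case-insensitive filesystem (the default on Windows and macOS) two names that differ only in letter case would refer to the same file. The 32-character hex form (`uuid.uuid5(...).hex`) avoids that at the cost of length.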
Jean-François Fabre
  • Is `.replace('/', '_')` necessary? Will any '/' be present when converting the uuidstring to base64? – Rahul Oct 29 '16 at 07:57
  • It is possible to get `/`, for instance with `base64.encodebytes(b"???")`. There is another character you may need to replace: `+`. With your string `a6f100f6-a39d-5fc0-8e63-225ff3383344` neither appears; `/` only shows up when a 6-bit group of the input is all 1 bits. – Jean-François Fabre Oct 29 '16 at 08:03
  • Thanks. I will check against a large data set and update accordingly. – Rahul Oct 29 '16 at 08:06
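To check the point raised in the comments directly, a short sketch assuming Python 3: standard base64 emits `/` for any 6-bit group equal to 63 (`111111`) and `+` for 62 (`111110`), so whether they appear depends on how the bits line up in each group. The last lines count how many random `uuid4` values contain either character; the sample size of 10,000 is arbitrary.

```python
import base64
import uuid

# b"???" is 0x3f 0x3f 0x3f; its last 6-bit group is 111111, which encodes as '/'.
slash = base64.encodebytes(b"???")                     # b'Pz8/\n'
# 0xfb 0xef 0xbe splits into four 6-bit groups of 111110, each encoding as '+'.
plus = base64.encodebytes(b"\xfb\xef\xbe")             # b'++++\n'

# The URL-safe alphabet swaps '/' for '_' and '+' for '-'.
safe_slash = base64.urlsafe_b64encode(b"???")          # b'Pz8_'
safe_plus = base64.urlsafe_b64encode(b"\xfb\xef\xbe")  # b'----'

# Count random UUIDs whose standard base64 form contains '+' or '/'.
bad = {ord("+"), ord("/")}
samples = [uuid.uuid4().bytes for _ in range(10_000)]
hits = sum(1 for s in samples if bad & set(base64.b64encode(s)))

print(slash, plus, safe_slash, safe_plus)
print(f"{hits} of {len(samples)} random UUIDs contain '+' or '/'")
```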