This is a bad idea: URLs tend to be very long, and even longer once base64-encoded, so you will hit the 255-byte filename limit that most filesystems enforce.
You can compress and then base64-encode, but it won't get you very far:
from base64 import b64encode
import bz2
import zlib
from urllib.parse import quote

def url_strategies(url):
    """Print the length of `url` under several encoding strategies."""
    raw = url.encode('utf8')
    print(url)
    print(f'normal : {len(raw)}')
    # safe="" also percent-encodes "/", which a filename cannot contain
    print(f'quoted : {len(quote(raw, ""))}')
    b64url = b64encode(raw)
    print(f'b64 : {len(b64url)}')
    # note: compression is applied to the base64 text, then encoded again
    print(f'b64+zlib: {len(b64encode(zlib.compress(b64url)))}')
    print(f'b64+bz2 : {len(b64encode(bz2.compress(b64url)))}')
Here's a fairly typical URL I found on angel.co:
URL = 'https://angel.co/job_listings/browse_startups_table?startup_ids%5B%5D=972887&startup_ids%5B%5D=365478&startup_ids%5B%5D=185570&startup_ids%5B%5D=32624&startup_ids%5B%5D=134966&startup_ids%5B%5D=722477&startup_ids%5B%5D=914250&startup_ids%5B%5D=901853&startup_ids%5B%5D=637842&startup_ids%5B%5D=305240&tab=find&page=1'
Calling url_strategies(URL) shows that even b64+zlib doesn't fit within the 255-byte limit:
normal : 316
quoted : 414
b64 : 424
b64+zlib: 304
b64+bz2 : 396
Even the best strategy here, zlib compression followed by base64, still leaves you over the limit.
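The 255-byte figure is the usual limit on common filesystems, but it can be checked rather than assumed. Here is a minimal sketch, assuming a Unix-like system where os.pathconf is available. The helper names name_max and fits_as_filename are made up for illustration and fall back to 255 when the limit cannot be queried:

```python
import os

def name_max(directory="."):
    """Return the filename length limit (in bytes) for `directory`.

    Falls back to 255 when the limit cannot be queried, e.g. on
    platforms without os.pathconf.
    """
    try:
        limit = os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, ValueError, OSError):
        return 255
    # pathconf may report a non-positive value when the limit is indeterminate
    return limit if limit > 0 else 255

def fits_as_filename(name, directory="."):
    """True if `name`, encoded as UTF-8, fits within the filename limit."""
    return len(name.encode("utf-8")) <= name_max(directory)
```

The limit is counted in bytes, not characters, which is why the name is encoded before measuring it.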
Proper Solution
Instead, hash the URL to get a short, fixed-length filename and attach the original URL to the file as an extended attribute:
import os
from hashlib import sha256

def save_file(url, content, char_limit=13):
    # use the first char_limit hex characters of the URL's sha256 as the filename
    digest = sha256(url.encode()).hexdigest()[:char_limit]
    filename = f'{digest}.html'
    # 93fb17b5fb81b.html
    with open(filename, 'w') as f:
        f.write(content)
    # store the original URL as an extended attribute (os.setxattr is Linux-only)
    os.setxattr(filename, 'user.url', url.encode())
    return filename
Later, given the filename, you can read the URL back from the attribute:
print(os.getxattr(filename, 'user.url').decode())
'https://angel.co/job_listings/browse_startups_table?startup_ids%5B%5D=972887&startup_ids%5B%5D=365478&startup_ids%5B%5D=185570&startup_ids%5B%5D=32624&startup_ids%5B%5D=134966&startup_ids%5B%5D=722477&startup_ids%5B%5D=914250&startup_ids%5B%5D=901853&startup_ids%5B%5D=637842&startup_ids%5B%5D=305240&tab=find&page=1'
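Reading the attribute can fail: os.getxattr raises OSError when the attribute is missing or the filesystem does not support extended attributes, and the function is only defined on Linux builds of Python. A defensive sketch follows. The helper name read_url is hypothetical, and returning None on failure is just one possible design choice:

```python
import os

def read_url(path, attr="user.url"):
    """Return the URL stored in the `attr` extended attribute of `path`.

    Returns None if the attribute is missing, the file does not exist,
    or extended attributes are unavailable on this platform/filesystem.
    """
    getxattr = getattr(os, "getxattr", None)  # only defined on Linux
    if getxattr is None:
        return None
    try:
        return getxattr(path, attr).decode()
    except OSError:
        return None
```

A caller can then treat a None result as "URL unknown" rather than crashing partway through a directory of saved pages.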
Note: in Python, setxattr and getxattr require the user. prefix on the attribute name.
For more on file attributes in Python, see this related answer: https://stackoverflow.com/a/56399698/3737009
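One thing to keep in mind about char_limit: truncating the sha256 hex digest to 13 characters leaves 52 bits, so two different URLs can in principle map to the same filename. The sketch below estimates how likely that is using the standard birthday approximation. The function name collision_probability is made up for illustration, and it assumes the hash output is uniformly distributed:

```python
import math

def collision_probability(n_urls, hex_chars=13):
    """Approximate chance that at least two of `n_urls` distinct URLs
    share the same truncated hash (birthday approximation)."""
    space = 2 ** (4 * hex_chars)  # each hex character carries 4 bits
    # 1 - exp(-n(n-1) / (2N)), computed with expm1 for accuracy near zero
    return -math.expm1(-n_urls * (n_urls - 1) / (2 * space))
```

For a million URLs at the default 13 characters the estimate is on the order of 1e-4, so raising char_limit is cheap insurance if you crawl at scale. Even the full 64-character digest stays well under the 255-byte limit.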