Please help because this flipping program is my ongoing nightmare!
I have several files that include some base64-encoded strings. For example, part of one file reads as follows:
charset=utf-8;base64,I2JhY2tydW5uZXJfUV81c3R7aGVpZ2h0OjkzcHg7fWJhY2tydW5uZXJfUV81c3R7ZGlzcGxheTpibG9jayFpbXBvcnRhbnQ7fQ=="
They are always in the format "ANYTHINGbase64,STRING". The files are HTML, but I am treating each one as one large string (I use BeautifulSoup elsewhere). I am using a regular expression, 'base', to extract the base64 string, and then the base64 module to decode it in my function "debase".
This seems to work up to a point, but for some reason the output of b64decode includes extra characters:
b'#backrunner_Q_5st{height:93px;}backrunner_Q_5st{display:block!important;}'
where the string I actually want is the part between the quotes.
I'm guessing this means the result is in bytes, so I have tried getting my function to encode it as UTF-8, but basically I am out of my depth.
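A minimal sketch of what is happening, using the sample payload above: base64.b64decode returns a bytes object, and calling str() on bytes gives its repr (the b'...' wrapper) rather than the text. Decoding the bytes with .decode('utf-8') gives a plain str. This assumes the payload is UTF-8 text, as the charset=utf-8 prefix suggests.

```python
import base64

# The sample payload from the file, split across two literals for readability.
payload = ("I2JhY2tydW5uZXJfUV81c3R7aGVpZ2h0OjkzcHg7fWJhY2tydW5uZXJfUV81c3R7"
           "ZGlzcGxheTpibG9jayFpbXBvcnRhbnQ7fQ==")

raw = base64.b64decode(payload)
print(type(raw))   # <class 'bytes'>
print(str(raw))    # the repr, including the b'...' wrapper

# Decoding the bytes (not str()) yields the text itself.
text = raw.decode('utf-8')
print(text)
```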
The end result I want is for every "base64,STRING" in my HTML to be decoded and replaced with its DECODEDSTRING.
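One way to do the replacement in a single pass, sketched under the assumption that every payload is UTF-8 text terminated by a double quote: re.sub accepts a function as the replacement, calls it with each match object, and substitutes whatever string it returns. The lookahead (?=") leaves the closing quote in place. The names BASE64_RE, decode_match and replace_base64 are illustrative, not from the original script.

```python
import base64
import re

# "base64," plus the payload, stopping before (not consuming) the closing quote.
BASE64_RE = re.compile(r'base64,([^"]*)(?=")')

def decode_match(match):
    # Replace the whole "base64,STRING" match with the decoded text.
    return base64.b64decode(match.group(1)).decode('utf-8')

def replace_base64(html):
    # Every match is passed to decode_match; unmatched text is kept as-is.
    return BASE64_RE.sub(decode_match, html)

sample = 'href="charset=utf-8;base64,I2JhY2tydW5uZXJfUV81c3R7aGVpZ2h0OjkzcHg7fQ=="'
print(replace_base64(sample))
# href="charset=utf-8;#backrunner_Q_5st{height:93px;}"
```

Passing a function rather than a fixed string is what lets each match be replaced by something computed from its own payload.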
Please help!
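```python
import base64
import os
import re
import sys

from bs4 import BeautifulSoup

def debase(instr):
    # b64decode returns bytes. Decode them to str; calling str() on bytes
    # gives the repr with the b'...' wrapper instead of the text.
    # Assumes the payload is UTF-8 text (e.g. CSS), not binary data.
    return base64.b64decode(instr).decode('utf-8')

# Capture the payload after "base64," up to, but not including, the closing quote.
base = re.compile(r'base64,([^"]*)(?=")')

for eachArg in sys.argv[1:]:
    with open(eachArg, 'r', encoding='utf8') as a:
        presoup = a.read()
    # re.sub needs the input string and returns a new string; a function
    # replacement decodes each match's payload in place.
    presoup = base.sub(lambda m: debase(m.group(1)), presoup)
    soup = BeautifulSoup(presoup, 'lxml')
    # splitext only strips the extension, unlike split('.')[0],
    # which breaks on paths such as ./page.htm.
    bname = os.path.splitext(eachArg)[0]
    # Remove every <script> element from the tree.
    for s in soup('script'):
        s.extract()
    os.remove(eachArg)
    with open(bname + '.html', 'w', encoding='utf8') as b:
        b.write(soup.prettify())
```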
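Data URIs in real pages often carry binary payloads such as PNG images, which will not decode as UTF-8 text and would make the script above raise UnicodeDecodeError. A hedged variant, assuming you would rather leave such payloads untouched: it uses the charset= parameter when one directly precedes base64, (falling back to UTF-8), re-emits that prefix, and returns the original match if decoding fails. DATA_RE and safe_replace_base64 are hypothetical names.

```python
import base64
import binascii
import re

# Optional "charset=NAME;" prefix (group 1, name in group 2), then the payload up to the quote.
DATA_RE = re.compile(r'(charset=([\w-]+);)?base64,([^"]*)(?=")')

def _decode(match):
    prefix = match.group(1) or ''
    charset = match.group(2) or 'utf-8'
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(match.group(3), validate=True)
        return prefix + raw.decode(charset)
    except (binascii.Error, UnicodeDecodeError, LookupError):
        # Invalid base64, not text in that charset, or unknown charset name:
        # keep the original text unchanged.
        return match.group(0)

def safe_replace_base64(html):
    return DATA_RE.sub(_decode, html)
```

With this, an image such as src="data:image/png;base64,iVBORw0KGgo=" passes through unchanged, while the CSS payloads are decoded as before.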