I have a JSON file that contains text like this:
.....wax, and voila!\u00c2\u00a0At the moment you can't use our ...
My question is simple: how do I CONVERT (not remove) these \u codes into spaces, apostrophes, etc.?
Input: a text file with .....wax, and voila!\u00c2\u00a0At the moment you can't use our ...
Output: .....wax, and voila!(converted to the line break)At the moment you can't use our ...
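For context, the \u sequences are JSON string escapes, and the standard library's `json.loads` turns them into the corresponding characters. The minimal sketch below uses a shortened version of the sample text, wrapped in quotes so it is a valid JSON string. Note that \u00c2\u00a0 decodes to two characters (Â followed by a non-breaking space), not to a line break.

```python
import json

# Shortened sample from the question; the raw string keeps the backslashes literal.
raw = r'"voila!\u00c2\u00a0At the moment"'

# json.loads interprets each \uXXXX escape as one Unicode code point.
text = json.loads(raw)
print(repr(text))
```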
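Python code:

    import requests

    def TEST():
        # .text is already a decoded str in Python 3, so calling .decode() on it fails
        export = requests.get('https://sample.uk/', auth=('user', 'pass')).text
        with open("TEST.json", 'w', encoding='utf-8') as file:
            file.write(export)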
What I have tried:
- Using .json()
- Various combinations of .encode().decode(), etc.
Edit 1
When I upload this file to BigQuery, I get the Â symbol.
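A stray Â is the usual signature of UTF-8 bytes having been decoded as Latin-1 (or a similar single-byte encoding) somewhere upstream. A non-breaking space is the two bytes C2 A0 in UTF-8, and reading those bytes as Latin-1 yields Â plus a non-breaking space. The sketch below reproduces and reverses that, assuming the damage is exactly one round of Latin-1 misdecoding:

```python
nbsp = "\u00a0"  # non-breaking space

# Reproduce the damage: UTF-8 bytes misread as Latin-1.
mangled = nbsp.encode("utf-8").decode("latin-1")
print(repr(mangled))

# Reverse it: re-encode as Latin-1 to recover the bytes, then decode as UTF-8.
repaired = mangled.encode("latin-1").decode("utf-8")
print(repaired == nbsp)
```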
Bigger Sample:
{
"xxxx1": "...You don\u2019t nee...",
"xxxx2": "...Gu\u00e9rer...",
"xxxx3": "...boost.\u00a0Sit back an....",
"xxxx4": "\" \u306f\u3058\u3081\u307e\u3057\u3066\"",
"xxxx5": "\u00a0\n\u00a0",
"xxxx6": "It was Christmas Eve babe\u2026",
"xxxx7": "It\u2019s xxx xxx\u2026"
}
Python code:
    import json
    import re
    import codecs

    def load():
        epos_export = r'{"xxxx1": "...You don\u2019t nee...","xxxx2": "...Gu\u00e9rer...","xxxx3": "...boost.\u00a0Sit back an....","xxxx4": "\" \u306f\u3058\u3081\u307e\u3057\u3066\"","xxxx5": "\u00a0\n\u00a0","xxxx6": "It was Christmas Eve babe\u2026","xxxx7": "It\u2019s xxx xxx\u2026"}'
        # Rewrite runs of \u00XX escapes before parsing the JSON
        x = json.loads(re.sub(r"(?i)(?:\\u00[0-9a-f]{2})+", unmangle_utf8, epos_export))
        with open("TEST.json", "w") as file:
            json.dump(x, file)

    def unmangle_utf8(match):
        escaped = match.group(0)                   # '\\u00e2\\u0082\\u00ac'
        hexstr = escaped.replace(r'\u00', '')      # 'e282ac'
        buffer = codecs.decode(hexstr, "hex")      # b'\xe2\x82\xac'
        try:
            return buffer.decode('utf8')           # '€'
        except UnicodeDecodeError:
            print("Could not decode buffer: %s" % buffer)
            # Returning None here would make re.sub drop the match, so keep the escapes
            return escaped

    if __name__ == '__main__':
        load()
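One possible alternative, offered as a sketch rather than a definitive fix, is to parse the JSON first and repair the decoded strings afterwards. That avoids editing the raw JSON text with a regex. The names `LATIN1_RUN`, `repair_run` and `repair` are hypothetical helpers, and the sketch assumes the only damage is UTF-8 bytes misread as Latin-1. Runs that are not valid UTF-8, such as a genuine é or a lone non-breaking space, are left untouched.

```python
import json
import re

# Only characters in U+0080..U+00FF can be the result of
# UTF-8 bytes having been misread as Latin-1.
LATIN1_RUN = re.compile("[\u0080-\u00ff]+")

def repair_run(match):
    run = match.group(0)
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return run.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so this run was probably legitimate text: keep it.
        return run

def repair(value):
    """Recursively repair every string inside parsed JSON data."""
    if isinstance(value, str):
        return LATIN1_RUN.sub(repair_run, value)
    if isinstance(value, list):
        return [repair(v) for v in value]
    if isinstance(value, dict):
        return {k: repair(v) for k, v in value.items()}
    return value

# Shortened sample values from the question.
raw = r'{"a": "voila!\u00c2\u00a0At the moment", "b": "Gu\u00e9rer", "c": "don\u2019t"}'
data = repair(json.loads(raw))

# ensure_ascii=False emits the characters themselves instead of \u escapes.
out = json.dumps(data, ensure_ascii=False)
print(out)
```

When writing to a file, the same idea applies with `json.dump(data, file, ensure_ascii=False)` and a file opened with `encoding="utf-8"`. A run that mixes damaged and legitimate characters would fail to decode as a whole and stay unchanged, so this sketch is conservative rather than exhaustive.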