I have a JSON file containing many different kinds of characters, which I'm using for an NLP project. I need to load it into a dictionary and then write the keys, exactly as they appear in the file, to another file for further pre-processing. The keys are a mix of digits, letters and Unicode escape sequences such as \u00a1. The problem is that when I write the dictionary to a text file, the escape sequences are converted into the characters they represent: \u00a1 becomes ¡, \u00a2 becomes ¢, and so on. I'd like to write the escaped code points, not the characters they stand for.
The file I am trying to process is here: https://storage.googleapis.com/gpt-2/encoder.json
This is the code I have been using to write the dictionary to a text file:
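import json

# Load the vocabulary; json.load decodes \uXXXX escapes into characters.
with open(r"file/path/to/encoder.json", encoding="utf-8") as f:
    encoder = json.load(f)

# Append one "key value" pair per line.
with open(r"file/path/to/file.txt", "a", encoding="utf-8") as file1:
    for key in encoder:
        file1.write(key + " " + str(encoder[key]) + "\n")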
How do I write the code points without changing them?
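One approach that might work is to re-escape each key before writing it, since json.load has already decoded the escapes by the time the dictionary exists. The sketch below assumes that the JSON-style \uXXXX form is the representation wanted. The names escape_key, format_line and write_vocab are hypothetical helpers, and sample is a made-up stand-in for the real encoder dictionary.

```python
import json


def escape_key(key):
    # json.dumps escapes every non-ASCII character as \uXXXX by default
    # (ensure_ascii=True); [1:-1] drops the surrounding double quotes.
    # Note: it also escapes '"' and '\' the way JSON does.
    return json.dumps(key, ensure_ascii=True)[1:-1]


def format_line(key, value):
    # Build one "key value" line with the key in escaped form.
    return escape_key(key) + " " + str(value) + "\n"


def write_vocab(encoder, out_path):
    # After escaping, the output is pure ASCII; utf-8 is only a safeguard.
    with open(out_path, "w", encoding="utf-8") as out:
        for key, value in encoder.items():
            out.write(format_line(key, value))


# Tiny stand-in for the real encoder dictionary (values are made up).
sample = {"\u00a1": 0, "\u00a2": 1, "abc": 2}
print("".join(format_line(k, v) for k, v in sample.items()), end="")
```

With the real data, write_vocab(encoder, r"file/path/to/file.txt") would replace the write loop. json.dumps is used here rather than key.encode("unicode_escape") because the latter produces \xa1 instead of \u00a1 for code points below U+0100, which would not match the form used in the original file.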