I'm scraping data from ~500 .js files, all of which are formatted like this:
dict[0]=[{"some_key": "<b>名詞</b>", "another_key": "modification"}, {"some_key": "<b>名詞</b>", "another_key": "idea"}]
My code looks like this:
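with open(filename, 'r', encoding='utf-8', errors='ignore') as my_file:
    obj = my_file.read()
# Mask the '[' in "dict[0]" with a single character so indices stay aligned,
# then locate the '[' that opens the list.
my_indexer_left = obj.replace('[', 'x', 1).find('[')
my_indexer_right = obj.rfind(']')
new_obj = obj[my_indexer_left:my_indexer_right+1]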
After new_obj is created, it is still a string, and I can't convert it into anything else.
I tried list(new_obj):
new_list_obj = list(new_obj)
for item in new_list_obj:
    print(item)
While print(type(new_list_obj)) tells me list, the loop prints out one character at a time.
I've tried several other things along these lines to get this to work.
The closest I came was referencing the answer here to come up with the following:
j = json.dumps(new_obj,ensure_ascii=False).encode('utf8').decode()
But when I print(j), all of the quotation marks (") are turned into \", and print(type(j)) says str.
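For what it's worth, here is a small sketch of what seems to be going on, using a made-up one-item string rather than a real file. As far as I understand it, json.dumps serializes a Python object into JSON text (so a str gets wrapped in quotes and its inner quotes escaped), while json.loads goes the other way and parses JSON text into Python objects:

```python
import json

# Hypothetical sample: JSON text held in a Python str
text = '[{"some_key": "<b>名詞</b>", "another_key": "idea"}]'

dumped = json.dumps(text, ensure_ascii=False)  # str -> JSON string literal
parsed = json.loads(text)                      # JSON text -> Python objects

print(type(dumped).__name__)                 # str
print(dumped.startswith('"[{\\"some_key'))   # True: inner quotes are escaped
print(type(parsed).__name__)                 # list
print(parsed[0]["another_key"])              # idea
```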
I want to be able to read these files, iterate over all the dictionary (JSON) objects, and access their keys and values.
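One possible direction, as a sketch rather than something tested against all 500 files: drop everything up to and including the first '=' and hand the rest to json.loads. This assumes each file holds exactly one assignment of the form dict[0]=[...] and that the right-hand side is valid JSON (double-quoted strings, no trailing commas). The helper name parse_js_assignment and the sample text are hypothetical, introduced only for illustration:

```python
import json

def parse_js_assignment(text):
    """Return the parsed value to the right of the first '=' in text."""
    # Assumption: one assignment per file, e.g. dict[0]=[{...}, {...}]
    _, _, payload = text.partition('=')
    # Drop surrounding whitespace and an optional trailing semicolon
    payload = payload.strip().rstrip(';').strip()
    return json.loads(payload)

# Sample text in the format shown above
sample = ('dict[0]=[{"some_key": "<b>名詞</b>", "another_key": "modification"}, '
          '{"some_key": "<b>名詞</b>", "another_key": "idea"}]')

records = parse_js_assignment(sample)
for record in records:
    # Each record is a regular dict, so keys and values are directly accessible
    for key, value in record.items():
        print(key, value)
```

For a real file, the text would come from reading it with open(filename, encoding='utf-8'). Note that errors='ignore' silently drops undecodable bytes, which could hide encoding problems in the Japanese text.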