I'm new to programming (I started 7 days ago), so I hope my question is not too stupid. I don't want to be the "Can someone code me this..." guy; just a keyword or a method name that I can search for on my own would already help me a lot.
I have a huge number (500,000 to 1,000,000) of JSON files in a directory, all with the same format. My goal is to load the files and write some values into a CSV file. I have already started writing the code, but when I run it, it stops after about 80 files with the following error:
```
Traceback (most recent call last):
  File "C:\Users\Hauke\PycharmProjects\Json Merge\CSV.py", line 14, in <module>
    data = json.load(g)
  File "C:\Users\Hauke\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Users\Hauke\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 8934: character maps to <undefined>
```
I don't really understand why. Is it because the program cannot load that many JSON files? I know the "if" part is a real mess; if it has no impact on my problem, you can ignore it.
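Judging from the traceback, the number of files is probably not the problem (each file is closed when its `with` block ends). The failing frame is `encodings\cp1252.py`: because `open()` was called without an `encoding` argument, Python on this Windows machine fell back to the locale encoding, cp1252, and byte 0x8d has no character in cp1252. The search keywords are "file encoding" and "UnicodeDecodeError". Assuming the JSON files are UTF-8 (the usual encoding for JSON, but an assumption here), the sketch below reproduces the error with a made-up screen name containing "č", whose UTF-8 bytes are C4 8D, and shows that decoding the same bytes as UTF-8 works. `parse_with` is a hypothetical helper for the demonstration.

```python
import json

# Made-up screen name for illustration; "č" (U+010D) is stored in UTF-8 as the bytes C4 8D.
raw = '{"ScreenName": "Ivančič"}'.encode("utf-8")


def parse_with(encoding):
    """Decode the raw bytes with the given codec, then parse the text as JSON.
    Returns the parsed object, or the UnicodeDecodeError if decoding fails."""
    try:
        return json.loads(raw.decode(encoding))
    except UnicodeDecodeError as exc:
        return exc


# cp1252 is the codec open() fell back to in the traceback; 0x8D is undefined there.
cp1252_result = parse_with("cp1252")
# Decoding the same bytes as UTF-8 succeeds.
utf8_result = parse_with("utf-8")

print(repr(cp1252_result))
print(utf8_result)

# In the real loop, the corresponding change is to name the encoding explicitly:
#     with open(fileName, "r", encoding="utf-8") as g:
```

If some files turn out not to be UTF-8 at all, `open()` also accepts an `errors` argument (for example `errors="replace"`), at the cost of replacing the undecodable bytes.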
My code:
```python
import json
import os
import csv
import time
start = time.time()
for path, dirs, files in os.walk("Json"):
    for f in files:
        fileName = os.path.join(path, f)
        print(fileName)
        with open(fileName, "r") as g:
            data = json.load(g)
        if data.get("Retweeted") == True:
            name1 = data.get("ScreenName")
            rtstatus = data.get("RetweetedStatus")
            rtent = rtstatus.get("User")
            name = rtent.get("ScreenNameResponse")
            fields = [name1, name]
            with open("Import.csv", "a", newline="") as t:
                writer = csv.writer(t)
                writer.writerow(fields)
        if data.get("InReplyToScreenName") != "null":
            name2 = data.get("ScreenName")
            RPName = data.get("InReplyToScreenName")
            fields2 = [name2, RPName]
            with open("Import.csv", "a", newline="") as u:
                writer2 = csv.writer(u)
                writer2.writerow(fields2)
end = time.time()
print('Time taken for fun program: ', end - start)
```
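For the "if" part, a revised loop might look like the sketch below. It assumes the files are UTF-8 and keeps the field names from the code above, but changes a few things. It passes `encoding="utf-8"` to `open()`. It compares `InReplyToScreenName` with `None`, because `json.load` turns JSON `null` into Python `None`, so the original `!= "null"` check is true for every file unless the value is literally the string "null". It guards against a missing `RetweetedStatus` or `User`, which would otherwise raise `AttributeError` on `.get`. And it opens `Import.csv` once instead of once per row. The function name `extract_pairs` is made up for illustration.

```python
import csv
import json
import os


def extract_pairs(json_dir, csv_path):
    """Walk json_dir, read every file as UTF-8 JSON and append
    [ScreenName, other user] rows to csv_path. Returns the number of rows written."""
    rows_written = 0
    # Open the CSV once for the whole run; "a" keeps the original append behaviour.
    with open(csv_path, "a", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path, dirs, files in os.walk(json_dir):
            for f in files:
                file_name = os.path.join(path, f)
                # Explicit encoding: without it, open() uses the locale codec (cp1252 here).
                with open(file_name, "r", encoding="utf-8") as g:
                    data = json.load(g)

                name = data.get("ScreenName")
                # JSON true is loaded as Python True.
                if data.get("Retweeted"):
                    # Fall back to {} if RetweetedStatus or User is missing or null.
                    user = (data.get("RetweetedStatus") or {}).get("User") or {}
                    writer.writerow([name, user.get("ScreenNameResponse")])
                    rows_written += 1

                # JSON null is loaded as Python None, not as the string "null".
                reply_to = data.get("InReplyToScreenName")
                if reply_to is not None:
                    writer.writerow([name, reply_to])
                    rows_written += 1
    return rows_written
```

It would be called as `extract_pairs("Json", "Import.csv")`. Writing the CSV as UTF-8 avoids the same kind of problem in the other direction (a `UnicodeEncodeError` for screen names that cp1252 cannot represent); if the CSV is opened in Excel, `encoding="utf-8-sig"` may display the characters more reliably.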
Thanks for your help, I hope I'm not too stupid.