
I'm trying to retrieve some tweets with snscrape, but the JSON file it generates is encoded as 'cp1252'. I couldn't find anything in the documentation about requesting a specific encoding for the JSON file. If that isn't possible, how can I convert a fairly large text file from cp1252 to UTF-8? I've seen plenty of questions like this, but they all explain how to print the correct text rather than how to store it in a file.
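For a file that really is cp1252-encoded, a minimal sketch of the conversion in Python is to open the input in text mode with `encoding="cp1252"` and the output with `encoding="utf-8"`, and let Python decode and re-encode. `shutil.copyfileobj` copies in fixed-size chunks, so a large file is never held in memory all at once. The function name and paths here are hypothetical placeholders. Note that a few byte values are undefined in cp1252, so a file containing them would raise `UnicodeDecodeError`.

```python
import shutil


def convert_cp1252_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a cp1252 text file as UTF-8, chunk by chunk."""
    # newline="" disables newline translation on both sides,
    # so line endings are copied through unchanged.
    with open(src_path, "r", encoding="cp1252", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Each chunk is decoded from cp1252 on read and
        # encoded to UTF-8 on write.
        shutil.copyfileobj(src, dst)
```

Reading the result in binary mode should show two bytes (`C3 A8`) where the input had one byte (`E8`) for "è".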

This question is not a duplicate of this one, because I'm not trying to do it from the command line but in Python.

EDIT: Let me explain the situation better. The tweets I'm retrieving contain Unicode characters. This is an example of a sentence I'd like to decode:

`La mia vita \u00e8 fantastica`

I checked the encoding of the file this sentence is written in, and it is 'cp1252'. I'm no longer sure whether this is a cp1252 file containing Unicode characters (is that even possible?), but I had no luck converting that `\u00e8` into "è".
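One possible reading, assuming the text comes from a JSON string value: `\u00e8` may not be an encoded character at all but a JSON escape sequence, six plain ASCII characters that a JSON parser turns into "è". Since they are ASCII, they can sit in a cp1252 file (or a UTF-8 one) unchanged. Python's `json` module illustrates the difference:

```python
import json

# Text exactly as it appears in the file: backslash, 'u', four hex digits.
# The r prefix stops Python itself from interpreting the escape.
raw = r"La mia vita \u00e8 fantastica"

# Quoting it makes it a JSON string literal; json.loads then
# replaces the \u00e8 escape with the character it stands for.
decoded = json.loads('"' + raw + '"')

print(raw)      # La mia vita \u00e8 fantastica
print(decoded)  # La mia vita è fantastica
```

Quoting a fragment by hand only works when it contains no unescaped quotes; in practice you would pass the whole JSON document to `json.loads`.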

After the first comment, here's what I tried:

```python
file = open(file_name_input, encoding='cp1252')
file_output = open(file_name_output, 'w')
for line in file:
    file_output.write(line.encode('utf-8').decode())
```
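A quick check of the expression inside that loop: encoding a `str` to UTF-8 and decoding the bytes with the default codec (UTF-8 in Python 3) returns the identical string, so each line is written unchanged. The output file's encoding is then whatever `open` defaults to when `encoding` is omitted, which is the locale's encoding and, on many Windows setups, cp1252. The sample strings below are illustrative.

```python
line = "La mia vita \u00e8 fantastica\n"

# str -> UTF-8 bytes -> str: bytes.decode() defaults to UTF-8,
# so this round trip reproduces the original string exactly.
round_tripped = line.encode("utf-8").decode()
print(round_tripped == line)  # True

# The same holds for text containing the literal escape: afterwards it
# is still the six characters \u00e8, not the letter è.
literal = r"La mia vita \u00e8 fantastica"
print(literal.encode("utf-8").decode() == literal)  # True
```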
pedro
  • Do you have it as a file or as a bytestring? If you have it as bytes, just use `ustr = bstr.decode('cp1252')` to get it as a unicode str. If it's a file, open it using the cp1252 codec to get a unicode string. In either case, you can write it to a file using any codec you like or manipulate it in memory. – cco Mar 05 '21 at 20:47
  • I tried it as suggested (I edited the original post) but without luck. Did I misunderstand your suggestion? – pedro Mar 05 '21 at 21:12
  • You are writing Unicode to the file, which is what I'm guessing you mean by "it didn't work". You should open the output file with an encoding (`file_output = open(file_name_output, 'w', encoding='utf-8')`) and write to it using `file_output.write(line)`. – cco Mar 06 '21 at 03:26
  • Does this answer your question? [Best way to convert text files between character sets?](https://stackoverflow.com/questions/64860/best-way-to-convert-text-files-between-character-sets) – Joe Mar 07 '21 at 10:59
  • @cco I also tried your second tip, but without luck; I still can't decode that string. I added some information to my original post, maybe I wasn't explaining myself well enough :) – pedro Mar 07 '21 at 17:21
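A possible explanation for why transcoding had no visible effect (an assumption, not confirmed in the thread): if the file contains the six ASCII characters `\u00e8` rather than an actual "è", its cp1252 and UTF-8 forms are byte-for-byte identical, so re-encoding cannot change it. The escapes have to be undone by a JSON parser instead. The sketch below assumes the file is JSON Lines (one JSON object per line); the function name and paths are hypothetical. It parses each line and serializes it again with `ensure_ascii=False`, which makes `json.dumps` emit non-ASCII characters as-is rather than as escapes.

```python
import json


def unescape_jsonl(src_path: str, dst_path: str) -> None:
    """Rewrite a JSON Lines file so that non-ASCII characters are stored
    as UTF-8 text instead of JSON escape sequences."""
    # cp1252 matches the encoding reported in the question; for
    # ASCII-only JSON it decodes exactly as UTF-8 would.
    with open(src_path, encoding="cp1252") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            obj = json.loads(line)  # turns \u00e8 escapes into real characters
            # ensure_ascii=False keeps 'è' as a character instead of escaping
            # it again; the UTF-8 file encoding stores it as bytes C3 A8.
            dst.write(json.dumps(obj, ensure_ascii=False) + "\n")
```

Usage would be something like `unescape_jsonl("tweets.json", "tweets_utf8.json")` with your own file names. Because each line is re-serialized, whitespace inside the objects may differ from the original even though the data is the same.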

0 Answers