I'm trying to build a simple sentiment analysis program. While I learn how to do that, I have already started saving data from the Twitter API. Because of my limited coding skills, I currently access the Twitter API, run the code, and save the output as a .txt file.

I'm now trying to open the saved .txt file so that the rest of the code can process it.

Here's my code for getting search results from the Twitter API:

import tweepy
import re
from textblob import TextBlob
api_key = CENSORED
api_key_secret = CENSORED
bearer_token = CENSORED
access_token = CENSORED
access_token_secret = CENSORED
auth = tweepy.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token,access_token_secret)
api=tweepy.API(auth)


hasilSearch = api.search_tweets(q="(rokok OR merokok OR nyebat OR ngerokok OR ngudut OR ngudud OR udud OR sebat OR sebats OR roko OR cerutu OR cangklong OR tembakau) (kanker OR tumor OR ganas)", lang ='id',count ="10000")

print(type(hasilSearch))

(I then copy and paste the output into Notepad and save it as 20230227.txt.)

Then I tried to open the file with the code below, but I get an error:

import tweepy
import re
data = open('20230227.txt', 'r')


print(data.read())

Here is the error I get:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_2712\2281530545.py in <cell line: 6>()
      4 
      5 
----> 6 print(data.read())

C:\Data\Software\WinPhyton 3.10\WPy64-31050\python-3.10.5.amd64\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 24825: character maps to <undefined>

Any idea what causes this error? And how should I save the search results for future use?

I tried to see the type of variable using:

print(type(hasilSearch))

The type of variable is: <class 'tweepy.models.SearchResults'>

Lidor Eliyahu Shelef
  • If you're copying and pasting the output to notepad, you may be copying in some unexpected characters from your terminal. Why not write the results to a file in code? From the documentation, it looks like `SearchResults` is in a JSON format, so it should be trivial to save that to a file using the `json` module. – nigh_anxiety Feb 27 '23 at 07:31
  • Maybe this can help: https://stackoverflow.com/questions/42019117/unicodedecodeerror-charmap-codec-cant-decode-byte-0x8f-in-position-xxx-char – Alessandro Togni Feb 27 '23 at 11:57
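
Following up on the two comments above, here is a minimal sketch rather than a definitive answer. The traceback shows that open() fell back to the Windows default encoding, cp1252, in which byte 0x8f has no character assigned. Passing an explicit encoding when writing and reading avoids that. The helper names save_tweets, load_tweets and decodes_as, the sample data, and the .json file name are hypothetical. With tweepy, the list passed to save_tweets could probably be built as [status._json for status in hasilSearch]; that attribute is an assumption about how tweepy stores the raw payload, so check it against your installed version.

```python
import json
import os
import tempfile


def save_tweets(tweets, path):
    """Write JSON-serializable tweet dicts to `path`, always as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Indonesian text and emoji readable in the file
        json.dump(list(tweets), f, ensure_ascii=False, indent=2)


def load_tweets(path):
    """Read the file back with the same explicit encoding it was written in."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def decodes_as(raw, encoding):
    """Return True if the bytes `raw` can be decoded with `encoding`."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical stand-in for real API output (not actual tweets).
# Assumption: with tweepy this could be [s._json for s in hasilSearch].
sample = [{"id": 1, "text": "rokok dan kanker \U0001F60F"}]

path = os.path.join(tempfile.mkdtemp(), "20230227.json")
save_tweets(sample, path)
restored = load_tweets(path)

# The UTF-8 bytes of this emoji end in 0x8f, the byte named in the traceback.
raw = "\U0001F60F".encode("utf-8")
print(decodes_as(raw, "cp1252"), decodes_as(raw, "utf-8"))  # prints: False True
```

For the existing 20230227.txt, open('20230227.txt', 'r', encoding='utf-8') may already be enough, assuming Notepad saved the file as UTF-8. If it was saved in another encoding, that encoding's name would have to be passed instead.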
