I am reading a Twitter feed in JSON format to count the number of users. Some lines in the input file might not be tweets, but messages that the Twitter server sent to the developer (such as limit notices). I need to ignore these messages.
These messages would not contain the created_at field and can be filtered out accordingly.
I have written the following code to extract the valid tweets and then extract the user ID and the text:
import json

def safe_parse(raw_json):
    # Return the parsed object only if it looks like a tweet (has 'created_at').
    try:
        json_object = json.loads(raw_json)
        if 'created_at' in json_object:
            return json_object
        else:
            return
    except ValueError as error:
        return

def get_usr_txt(line):
    # Return a (user id, text) tuple for a valid tweet.
    tmp = safe_parse(line)
    if tmp is not None:
        return (tmp.get('user').get('id_str'), tmp.get('text'))
    else:
        return
My problem is that I get one extra user called "None".
Here is a sample of the output (it is a large file):
('49838600', 'This is the temperament you look for in a guy who would have access to our nuclear arsenal. ), None, ('2678507624', 'RT @GrlSmile: @Ricky_Vaughn99 Yep, which is why in 1992 I switched from Democrat to Republican to vote Pat Buchanan, who warned of all of t…'),
I am struggling to find out what I am doing wrong. There is no None in the Twitter file, so I assume that I am reading a limit notice such as
{"limit":{"track":1,"timestamp_ms":"1456249416070"}}
but the code above should not include it, unless I am missing something.
Any pointers? Thanks for your help and your time.
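
For reference, here is a minimal sketch that may reproduce the behaviour. It assumes every input line is passed through `get_usr_txt` in a map-style step, so each line produces exactly one result, including lines that are not tweets. The sample lines, the `mapped` and `filtered` names, and the tweet content are hypothetical, not taken from the real file.

```python
import json


def safe_parse(raw_json):
    """Return the parsed tweet, or None for invalid JSON and non-tweet messages."""
    try:
        json_object = json.loads(raw_json)
    except ValueError:
        return None
    # Only dict-like messages carrying 'created_at' are treated as tweets.
    if isinstance(json_object, dict) and 'created_at' in json_object:
        return json_object
    return None


def get_usr_txt(line):
    """Return (user id, text) for a tweet, or None for anything else."""
    tweet = safe_parse(line)
    if tweet is None:
        return None
    return (tweet['user']['id_str'], tweet['text'])


# Hypothetical input: one tweet-like line and one limit notice.
lines = [
    '{"created_at": "Tue Feb 23 17:43:36 +0000 2016", '
    '"user": {"id_str": "49838600"}, "text": "hello"}',
    '{"limit":{"track":1,"timestamp_ms":"1456249416070"}}',
]

# One result per input line: the limit notice is mapped to None, not dropped.
mapped = [get_usr_txt(line) for line in lines]

# Dropping the None entries leaves only real tweets.
filtered = [pair for pair in mapped if pair is not None]
```

If this matches what is happening, the limit notice is recognised correctly, but a one-to-one mapping still emits a `None` entry for it, so an explicit filtering step after the mapping would be needed to remove it.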