I'm a complete beginner with JSON, so I'm struggling with the simple task of extracting Twitter screen_names from a JSON file received via tweepy.

Trying to load the file with json.load(file) raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "\Python3.5\lib\json\__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "D:\Programme\Python3.5\lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 7250)

What exactly does the "Extra data" error mean? Is the file not in the correct format?
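For reference, the same message can be reproduced with a tiny made-up input. As far as I understand, the decoder expects exactly one JSON document per call, so any text left over after the first complete object is reported as "Extra data". The two-line string below is hypothetical and not taken from the tweet file.

```python
import json

# Hypothetical input: two JSON objects, one per line (not from the real file).
two_docs = '{"a": 1}\n{"b": 2}'

try:
    json.loads(two_docs)
    error_message = None
except json.JSONDecodeError as exc:
    # The first object parsed fine; the decoder then found more text after it.
    error_message = str(exc)

print(error_message)
```

The reported position (line 2, column 1) is where the second object starts, which matches the position in the traceback above.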

Snippet from the json file (json file consists of about 7000 lines):

{"in_reply_to_user_id_str": null, "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>", "retweeted": false, "in_reply_to_screen_name": null, "geo": null, "contributors": null, "in_reply_to_status_id": null, "retweet_count": 0, "coordinates": null, "id": 802931232334114816, "lang": "de", "id_str": "802931232334114816", "possibly_sensitive": false, "text": "39'788 mal #Danke!\nDie Medienmitteilung zum Resultat der #Regierungsratswahlen finden Sie hier: #WahlAG16 #rrag16", "in_reply_to_status_id_str": null, "favorite_count": 2, "in_reply_to_user_id": null, "favorited": false, "entities": {"urls": [{"display_url": "goo.gl/CkKjHO", "expanded_url": "shortened", "url": "shortened", "indices": [96, 119]}], "hashtags": [{"indices": [11, 17], "text": "Danke"}, {"indices": [57, 78], "text": "Regierungsratswahlen"}, {"indices": [121, 130], "text": "WahlAG16"}, {"indices": [131, 138], "text": "rrag16"}], "symbols": [], "user_mentions": []}, "is_quote_status": false, "created_at": "Sun Nov 27 17:44:58 +0000 2016", "place": null, "truncated": false, "user": {"profile_banner_url": "pbs.twimg.com/profile_banners/85668573/1480374176", "listed_count": 38, "friends_count": 237, "geo_enabled": true, "profile_background_tile": true, "protected": false, "default_profile": false, "profile_link_color": "0084B4", "favourites_count": 359, "has_extended_profile": false, "screen_name": "aargauer_bdp", "translator_type": "none", "default_profile_image": false, "profile_image_url": "http://pbs.twimg.com/profile_images/493457946/bdp_normal.png", "description": "Es zwitschern f\u00fcr Sie @BernhardGuhl und @PhTschopp", "url": "shortened", "follow_request_sent": false, "profile_sidebar_border_color": "D9D9D9", "contributors_enabled": false, "id": 85668573, "lang": "de", "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/48362618/bgbdp.png", "id_str": "85668573", "profile_use_background_image": false, "profile_text_color": "333333", 
"time_zone": "Bern", "is_translation_enabled": false, "location": "Kanton Aargau", "name": "BDP Kanton Aargau", "profile_background_image_url_https": "pbs.twimg.com/profile_background_images/48362618/bgbdp.png", "utc_offset": 3600, "following": false, "verified": false, "profile_image_url_https": "pbs.twimg.com/profile_images/493457946/bdp_normal.png", "entities": {"description": {"urls": []}, "url": {"urls": [{"display_url": "aargauer-bdp.ch", "expanded_url": "http://www.aargauer-bdp.ch", "url": "shortened", "indices": [0, 22]}]}}, "statuses_count": 425, "followers_count": 368, "is_translator": false, "profile_sidebar_fill_color": "EBEBEB", "profile_background_color": "FFE640", "notifications": false, "created_at": "Tue Oct 27 21:35:17 +0000 2009"}}

Edit: My current idea is to use code with the following structure:

import json

with open("file.json", "r") as f:
    for line in f:
        # Parse one tweet per line and keep the result
        tweet = json.loads(line)
        # MISSING: code to access the user entity with name and ID

Does this look like the right idea to pursue?
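A minimal sketch of that idea, assuming each line of the file holds one complete tweet object (which the error position "line 2 column 1" suggests) and that every tweet has a "user" object with "screen_name" and "id" keys, as in the snippet. The helper name `iter_users` and the two sample tweets are made up for illustration. An in-memory `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io
import json


def iter_users(lines):
    """Yield (screen_name, user_id) for each non-empty line of JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between tweets
        tweet = json.loads(line)  # assumes one complete JSON object per line
        user = tweet["user"]
        yield user["screen_name"], user["id"]


# Stand-in for open("file.json"): two tiny made-up tweets, one per line.
sample = io.StringIO(
    '{"text": "first", "user": {"screen_name": "aargauer_bdp", "id": 85668573}}\n'
    '{"text": "second", "user": {"screen_name": "example_user", "id": 1}}\n'
)

users = list(iter_users(sample))
for screen_name, user_id in users:
    print(screen_name, user_id)
```

With the real data, the open file handle from `with open("file.json", "r") as f:` could be passed to `iter_users(f)` instead of `sample`. If a single tweet turns out to span several lines in the file, this per-line approach would not work as written.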

Gladan
  • Please share the json file, or at least a snippet of it so that we can tell which key needs to be used. – Meghdeep Ray Feb 14 '17 at 09:18
  • http://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data – Harsha Biyani Feb 14 '17 at 09:19
  • Added an example line from the JSON; my best guess is that the user keys containing "screen_name" would need to be accessed – Gladan Feb 14 '17 at 09:30
  • @HarshaBiyani: I see, so I need to create multiple dictionaries, one for each line? My JSON consists of about 7000 tweets/lines – Gladan Feb 14 '17 at 09:34
  • Can you post your tweepy code? For each `status` returned you should be able to use `status._json` to access the named elements – asongtoruin Feb 14 '17 at 09:37
  • Is there a reason why you're saving the data to a file? Can you process it as it is received? – asongtoruin Feb 14 '17 at 09:58
