
I'm trying to load some JSON data using json.load() but I keep receiving an error message and I have no idea how to fix it.

Here is a sample of part of the JSON file; it contains deleted tweets (which start with {"delete":{) and created ones (which start with {"created_at":):

{"delete":{"status":{"id":509743302972043264,"id_str":"509743302972043264","user_id":1366812392,"user_id_str":"1366812392"},"timestamp_ms":"1410368494532"}}
{"delete":{"status":{"id":64472572007428096,"id_str":"64472572007428096","user_id":31473446,"user_id_str":"31473446"},"timestamp_ms":"1410368494565"}}
{"created_at":"Wed Sep 10 17:01:34 +0000 2014","id":509748529070616576,"id_str":"509748529070616576","text":"Metin \u015eent\u00fcrk 
Twitterda @metinsenturk MUHTE\u015eEM \u00dc\u00c7L\u00dc; SEN, BEN, M\u00dcZ\u0130K","source":"\u003ca href=\"http:\/\/www.twitter.com\" 
rel=\"nofollow\"\u003eTwitter for Windows\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":
null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2748960160,"id_str":"2748960160","name":"Enise Erkuzu\n",
"screen_name":"eniseerkuzu38","location":"Denizli\n","url":null,"description":"Tipe bakarak a\u015f\u0131k olanlar , am\u0131n\u0131za koyay\u0131m.",
"protected":false,"verified":false,"followers_count":36,"friends_count":32,"listed_count":0,"favourites_count":75,"statuses_count":595,"created_at":
"Thu Aug 21 10:17:18 +0000 2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,
"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":
"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED",
"profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":
"http:\/\/pbs.twimg.com\/profile_images\/502399080686190592\/tRqoEQyM_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/502399080686190592\/tRqoEQyM_normal.jpeg",
"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,
"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[{"screen_name":"metinsenturk","name":"Metin \u015eent\u00fcrk","id":523497734,"id_str":"523497734","indices":[24,37]}],
"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"tr","timestamp_ms":"1410368494662"}

My ultimate goal is to extract the text of the tweets from this file, but for that I first need to load it as JSON in Python, so this is what I've tried so far:

with open('tweets.json', 'r') as f:
    data = json.load(f) 

And this is the error message I get:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-42-212742fc8eeb> in <module>
      1 with open('tweets.json', 'r') as f:
----> 2     data = json.load(f)

/opt/anaconda3/lib/python3.7/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
--> 296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
    297 
    298 

/opt/anaconda3/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

/opt/anaconda3/lib/python3.7/json/decoder.py in decode(self, s, _w)
    338         end = _w(s, end).end()
    339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
    341         return obj
    342 

JSONDecodeError: Extra data: line 2 column 1 (char 153)

There seems to be extra data, but I'm not really familiar with processing JSON files and I have no idea what exactly is causing the error or how to fix it. Could you point me in the right direction? What exactly is causing the error, and how can I fix it?

JMarcos87

2 Answers


Try validating the JSON with something like https://jsonlint.com/. You will notice the text isn't valid JSON: it already breaks after the first delete element, and there are more issues in the JSON after that.
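
If you would rather check this programmatically than paste the file into a website, here is a minimal sketch (assuming the file is named tweets.json and holds one record per line, which appears to be the layout here) that reports every line the decoder cannot parse on its own:

import json

# Try to parse each line as a standalone JSON document and report the
# ones that fail, together with the decoder's error message.
with open('tweets.json', 'r') as f:
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            print(f'line {lineno}: {e}')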

r3verse

In the provided JSON there is more than one top-level JSON object:

{"delete":{"status":{"id":509743302972043264,"id_str":"509743302972043264","user_id":1366812392,"user_id_str":"1366812392"},"timestamp_ms":"1410368494532"}}

{"delete":{"status":{"id":64472572007428096,"id_str":"64472572007428096","user_id":31473446,"user_id_str":"31473446"},"timestamp_ms":"1410368494565"}}

{"created_at":"Wed Sep 10 17:01:34 +0000 2014","id":509748529070616576,"id_str":"509748529070616576","text":"Metin \u015eent\u00fcrk 
Twitterda @metinsenturk MUHTE\u015eEM \u00dc\u00c7L\u00dc; SEN, BEN, M\u00dcZ\u0130K","source":"\u003ca href=\"http:\/\/www.twitter.com\" 
rel=\"nofollow\"\u003eTwitter for Windows\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":
null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2748960160,"id_str":"2748960160","name":"Enise Erkuzu\n",
"screen_name":"eniseerkuzu38","location":"Denizli\n","url":null,"description":"Tipe bakarak a\u015f\u0131k olanlar , am\u0131n\u0131za koyay\u0131m.",
"protected":false,"verified":false,"followers_count":36,"friends_count":32,"listed_count":0,"favourites_count":75,"statuses_count":595,"created_at":
"Thu Aug 21 10:17:18 +0000 2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,
"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":
"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED",
"profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":
"http:\/\/pbs.twimg.com\/profile_images\/502399080686190592\/tRqoEQyM_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/502399080686190592\/tRqoEQyM_normal.jpeg",
"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,
"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[{"screen_name":"metinsenturk","name":"Metin \u015eent\u00fcrk","id":523497734,"id_str":"523497734","indices":[24,37]}],
"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"tr","timestamp_ms":"1410368494662"}

so you will have to either split them into separate files or wrap them all in a single JSON structure like this:

{
"all": [
{"delete":{"status":{"id":509743302972043264,"id_str":"509743302972043264","user_id":1366812392,"user_id_str":"1366812392"},"timestamp_ms":"1410368494532"}}

{"delete":{"status":{"id":64472572007428096,"id_str":"64472572007428096","user_id":31473446,"user_id_str":"31473446"},"timestamp_ms":"1410368494565"}}

{"created_at":"Wed Sep 10 17:01:34 +0000 2014","id":509748529070616576,"id_str":"509748529070616576","text":"Metin \u015eent\u00fcrk 
Twitterda @metinsenturk MUHTE\u015eEM \u00dc\u00c7L\u00dc; SEN, BEN, M\u00dcZ\u0130K","source":"\u003ca href=\"http:\/\/www.twitter.com\" 
rel=\"nofollow\"\u003eTwitter for Windows\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":
null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2748960160,"id_str":"2748960160","name":"Enise Erkuzu\n",
"screen_name":"eniseerkuzu38","location":"Denizli\n","url":null,"description":"Tipe bakarak a\u015f\u0131k olanlar , am\u0131n\u0131za koyay\u0131m.",
"protected":false,"verified":false,"followers_count":36,"friends_count":32,"listed_count":0,"favourites_count":75,"statuses_count":595,"created_at":
"Thu Aug 21 10:17:18 +0000 2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,
"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":
"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED",
"profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":
"http:\/\/pbs.twimg.com\/profile_images\/502399080686190592\/tRqoEQyM_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/502399080686190592\/tRqoEQyM_normal.jpeg",
"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,
"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[{"screen_name":"metinsenturk","name":"Metin \u015eent\u00fcrk","id":523497734,"id_str":"523497734","indices":[24,37]}],
"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"tr","timestamp_ms":"1410368494662"}
]
}

and then load it like this:

with open("tweets.json", "r") as file:
    data = json.load(file)["all"]

Now data will hold a list of all the records from your JSON file.
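
Since the original goal was to get at the tweet text, here is a small follow-up sketch (assuming data is the list loaded from the "all" key as above) that collects the text of the created tweets and skips the delete records:

# data is the list loaded from the "all" key above
texts = []
for record in data:
    # created tweets carry a "text" field; delete records ({"delete": ...}) do not
    if "text" in record:
        texts.append(record["text"])

print(texts)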

AmaanK
  • I tried it and I now get a new error message saying 'JSONDecodeError: Expecting ',' delimiter: line 4 column 1 (char 164)' – JMarcos87 Dec 27 '20 at 13:54