
I've been collecting tweets into a JSON file and need to compute some statistics on certain fields in it. I Googled several ways of doing this, but none of them gave me a working solution.

The JSON looks like this:

{"contributors": null, "truncated": false, "text": .... }

I used this code to try to load it:

 import json
 f = open("user_timeline_Audi.jsonl",'r')
 data = f.read()
 print(data)
 bla = json.loads(data)

The call to json.loads() gives me the following error:

json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 2698)

The end goal is to get the followers_count and the number of likes from several JSON files. Hope that someone can help!

EDIT:

Based on the answer from Alex Hall, my code now is:

import json

with open("user_timeline_BMW.jsonl",'r') as f:
    for line in f:
        obj = json.loads(line)
        bla = ["followers_count"]
        print(bla)

This just outputs a list, instead of the values behind it:

....
['followers_count']
['followers_count']
....

Hope someone has a suggestion for this step!

JeBo
  • It's difficult to say without seeing the JSON file, but it looks like you are trying to load a file with multiple dictionaries? Take a look at [the answer in this thread](https://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data); it might be what you are looking for. – jaumebonet Mar 11 '18 at 10:27
  • Thanks for your reply, but I couldn't find the solution there. – JeBo Mar 11 '18 at 10:43
  • check out @alex reply. might that be your issue? – jaumebonet Mar 11 '18 at 11:03
  • It was for the json.loads() error! Now I need to figure out how to get the value from the lines. – JeBo Mar 11 '18 at 11:09

2 Answers


You are dealing with JSON lines, where each line contains one JSON object. You should do:

for line in f:
    obj = json.loads(line)

and then do what you want with each object.
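To illustrate the difference, here is a minimal sketch. The two records in `sample` are made up for the example and are not taken from the asker's file. It shows that parsing the whole text at once fails with the "Extra data" error, while parsing one line at a time works:

```python
import json

# Two made-up records standing in for the lines of a .jsonl file (hypothetical data).
sample = '{"id": 1, "text": "first"}\n{"id": 2, "text": "second"}\n'

# Parsing the whole string at once fails: a second object follows the first one.
try:
    json.loads(sample)
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg

# Parsing line by line works; blank lines are skipped.
objects = [json.loads(line) for line in sample.splitlines() if line.strip()]

print(whole_file_error)
print([obj["id"] for obj in objects])
```

Iterating over an open file object yields the same kind of lines as `sample.splitlines()` here, so the loop above carries over to a real file.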

Alex Hall
  • Thanks for your reply, it helped me a bit further, now stuck on how to get the values from the lines. – JeBo Mar 11 '18 at 11:07

I think it is supposed to be bla = obj["followers_count"]

postmalloc
  • That gives me the error: _KeyError: 'followers_count'_ – JeBo Mar 11 '18 at 11:16
  • Can you print out all the keys in the dict by doing `print(obj.keys())` and make sure the key you need is indeed present? – postmalloc Mar 11 '18 at 11:21
  • When I did print(obj), the _followers_count_ was present. But with the print(obj.keys()) you mentioned, it is NOT present – JeBo Mar 11 '18 at 11:27
  • It seems that `followers_count` might be a second-level key, nested inside another one? It would be much easier to give a proper answer if you posted the output of `print(obj.keys())`; otherwise it's just guesswork. – jaumebonet Mar 11 '18 at 11:34
  • Sorry, this is the output of 1 of the lines: dict_keys(['contributors', 'truncated', 'text', 'is_quote_status', 'in_reply_to_status_id', 'id', 'favorite_count', 'source', 'retweeted', 'coordinates', 'entities', 'in_reply_to_screen_name', 'in_reply_to_user_id', 'retweet_count', 'id_str', 'favorited', 'retweeted_status', 'user', 'geo', 'in_reply_to_user_id_str', 'possibly_sensitive', 'lang', 'created_at', 'in_reply_to_status_id_str', 'place', 'extended_entities']) – JeBo Mar 11 '18 at 11:35
  • ok... `followers_count` is not a key there. You should try `print([(k, obj[k]) for k in obj.keys()])` or check in the JSON file which keys are the parents of `followers_count`. That should give you the proper set of keys to call; in the end you want something like `obj[parent_key1][parent_key2]...["followers_count"]`, depending on the number of parent keys – jaumebonet Mar 11 '18 at 11:45
  • From what it looks like, you are mixing up `Tweet` and `User` objects returned by the Twitter API. `followers_count` belongs to the `User` object. – postmalloc Mar 11 '18 at 11:47
  • I would recommend you to edit @segfaux solution with the correct call and accept it as a solution, so others might find the actual reply to your query – jaumebonet Mar 11 '18 at 12:21
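
Putting the comments above together, a possible sketch of the corrected call follows. It assumes that `followers_count` sits under the `user` key shown in the `dict_keys` output, and that the "likes" the question asks about correspond to `favorite_count`. Both are assumptions about this particular data, and the helper name `collect_counts` and the sample records are invented for illustration:

```python
import io
import json


def collect_counts(lines):
    """Return (followers_count, favorite_count) pairs, one per non-blank line."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip empty lines, e.g. a trailing newline
        tweet = json.loads(line)
        # Assumption: followers_count lives in the nested "user" object.
        user = tweet.get("user") or {}
        rows.append((user.get("followers_count"), tweet.get("favorite_count")))
    return rows


# Hypothetical two-line sample standing in for a real .jsonl file.
sample = io.StringIO(
    '{"favorite_count": 3, "user": {"followers_count": 100}}\n'
    '{"favorite_count": 0, "user": {"followers_count": 100}}\n'
)
print(collect_counts(sample))
```

With a real file, the same function could be called as `collect_counts(f)` inside `with open("user_timeline_BMW.jsonl", "r") as f:`. Using `.get()` means a missing key gives `None` instead of raising `KeyError`.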