I am trying to load data from a JSON file into a pandas DataFrame and am stuck on how to retrieve a value nested inside an object in the JSON.

I want to retrieve the followers_count field inside the user object of this JSON and include it in the DataFrame.

JSON File (sample record) below:

{"created_at": "Tue Aug 01 16:23:56 +0000 2017", "id": 892420643555336193, "retweet_count": 12345, "favorite_count": 23456, "user": {"id": 4196983835, "followers_count": 3200889, "friends_count": 104}}

Here is the code I have so far. It doesn't work because I don't know how to fetch followers_count from within the user object:

        import pandas as pd

        tweet_data_df = pd.read_json('tweet-json.txt', lines=True)
        # Doesn't work
        #tweet_data_df = tweet_data_df[['id', 'favorite_count', 'retweet_count', 'created_at', 'user''followers_count']]
        # Works, but isn't enough for me
        tweet_data_df = tweet_data_df[['id', 'favorite_count', 'retweet_count', 'created_at']]
        tweet_data_df.head(5)

Appreciate your help!

Prashanth
  • Try [json_normalize](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html) – sushanth Jun 28 '20 at 18:54
  • If the JSON dictionary has depth = 2, could you use `pd.DataFrame(json_dict).apply(pd.Series)`? – 4.Pi.n Jun 28 '20 at 18:56
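
As a minimal sketch of the json_normalize suggestion above, assuming pandas 1.0 or later (where `pd.json_normalize` is available as a top-level function) and using only the sample record from the question, nested keys come out as dotted column names such as `user.followers_count`:

```python
import pandas as pd

# Sample record copied from the question; a real file would hold many of these.
records = [
    {
        "created_at": "Tue Aug 01 16:23:56 +0000 2017",
        "id": 892420643555336193,
        "retweet_count": 12345,
        "favorite_count": 23456,
        "user": {"id": 4196983835, "followers_count": 3200889, "friends_count": 104},
    }
]

# Flatten nested dictionaries; "user" -> "followers_count" becomes "user.followers_count".
flat = pd.json_normalize(records)

# Select the columns from the question plus the nested follower count.
tweet_data_df = flat[
    ["id", "favorite_count", "retweet_count", "created_at", "user.followers_count"]
]
print(tweet_data_df.head(5))
```

The list `records` is a stand-in here; how the records get loaded from `tweet-json.txt` is left out of this sketch.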

1 Answer

If the JSON object (dictionary) has depth = 2 (i.e. just two levels of nested dictionaries), you can use .apply(pd.Series).

{"key": {"key2": {val1, val2}}}  # depth = 2 (pseudocode)
{"key": {"key2": {val1, "key3": {val2}}}}  # depth > 2, here depth = 3 (pseudocode)

pd.DataFrame(dic).apply(pd.Series).reset_index(drop = True)

Otherwise, if depth > 2, you can iterate through its keys recursively:

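def shrink_depth(dic, output_dict: dict, pkey=None):
    """Flatten nested dictionaries into output_dict, mapping each key to a list of leaf values."""
    if isinstance(dic, dict):
        for key in dic:
            # Create the list once; a key repeated at another level reuses it
            if key not in output_dict:
                output_dict[key] = []

            shrink_depth(dic[key], output_dict, key)  # recurse into the value

    elif isinstance(dic, (list, set)):
        # Each element of a collection is appended to the parent key's list
        for val in dic:
            output_dict[pkey].append(val)
    else:
        # Scalar leaf: store it under the parent key
        output_dict[pkey].append(dic)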

# Update: the "id" key inside "user" now holds a nested dictionary
dic = {"created_at": "Tue Aug 01 16:23:56 +0000 2017", "id": 892420643555336193, "retweet_count": 12345, "favorite_count": 23456,
       "user": {"id": {4196983835: 43424}, "followers_count": 3200889, "friends_count": 104}}

output = {}

shrink_depth(dic, output)

output

{'created_at': ['Tue Aug 01 16:23:56 +0000 2017'],
 'id': [892420643555336193],
 'retweet_count': [12345],
 'favorite_count': [23456],
 'user': [],
 4196983835: [43424],
 'followers_count': [3200889],
 'friends_count': [104]}
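
If the goal is a DataFrame, the flattened dictionary above could be passed to pandas roughly as follows. This is only a sketch and not part of the original answer: it hard-codes the printed output, and it drops keys whose lists are empty (such as 'user'), on the assumption that those columns are not wanted, because `pd.DataFrame` needs all columns to have the same length.

```python
import pandas as pd

# The flattened dictionary exactly as printed above.
output = {
    "created_at": ["Tue Aug 01 16:23:56 +0000 2017"],
    "id": [892420643555336193],
    "retweet_count": [12345],
    "favorite_count": [23456],
    "user": [],
    4196983835: [43424],
    "followers_count": [3200889],
    "friends_count": [104],
}

# Keep only keys that actually collected values; "user" is empty because
# all of its content was stored under the nested keys instead.
non_empty = {key: values for key, values in output.items() if values}

df = pd.DataFrame(non_empty)
print(df[["id", "followers_count"]])
```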
4.Pi.n
  • Just to understand: are you suggesting that the JSON object be modified to accommodate this? – Prashanth Jun 28 '20 at 19:47
  • Yes, convert the JSON to a dictionary, then shrink its depth. In your case, though, you can use the first approach directly. – 4.Pi.n Jun 28 '20 at 19:51
  • Sorry, I am stuck converting a txt file with JSON data into a dictionary. I tried some code, but it throws an error. – Prashanth Jun 28 '20 at 20:11
  • Tried this: `with open('tweet-json.txt', 'r') as json_file: json_dict = json.load(json_file)` – Prashanth Jun 28 '20 at 20:11
  • `JSONDecodeError: Extra data: line 2 column 1 (char 3974)`. It is a valid JSON file from a third-party site, though. – Prashanth Jun 28 '20 at 20:19
  • Check this solution: https://stackoverflow.com/questions/8381193/handle-json-decode-error-when-nothing-returned – 4.Pi.n Jun 28 '20 at 20:22
  • You could try editing the JSON file's contents manually and checking whether they are valid JSON. – 4.Pi.n Jun 28 '20 at 20:30
  • It is too big for me to check manually. I tried online formatters with no luck. – Prashanth Jun 28 '20 at 20:31
  • I have no idea what your file looks like. Check this one: https://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data. If it doesn't work for you, you may want to post another question. – 4.Pi.n Jun 28 '20 at 20:38
  • Is there any way to do this without using a dict and flattening the structure? This doesn't seem to be working for me. – Prashanth Jun 28 '20 at 20:49
  • A poor approach: parse the data as a string, use a regex to strip special symbols, then split on `:` and `,` with some logic. – 4.Pi.n Jun 28 '20 at 21:13
  • Oh, isn't there a way to access the elements within the user object using '.' notation or something like ['user']['followers_count']? – Prashanth Jun 28 '20 at 21:15
  • `from types import SimpleNamespace`, or define a class with some attributes such as `self.user = ['user']`; see https://stackoverflow.com/questions/16279212/how-to-use-dot-notation-for-dict-in-python – 4.Pi.n Jun 28 '20 at 21:20
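
The "Extra data: line 2 column 1" error in the comments above is what `json.load` typically raises when a file holds one JSON object per line rather than a single JSON document, which would also fit the `lines=True` call in the question. Assuming that is the layout of `tweet-json.txt`, a sketch like the following parses each line on its own and reads the nested value with plain `['user']['followers_count']` indexing. The in-memory `sample_file` and its second record are made up for illustration; with the real file, `open('tweet-json.txt')` would take its place.

```python
import io
import json

# Stand-in for open('tweet-json.txt'): one JSON object per line.
# The first line is the record from the question; the second is invented.
sample_file = io.StringIO(
    '{"created_at": "Tue Aug 01 16:23:56 +0000 2017", "id": 892420643555336193, '
    '"retweet_count": 12345, "favorite_count": 23456, '
    '"user": {"id": 4196983835, "followers_count": 3200889, "friends_count": 104}}\n'
    '{"created_at": "Tue Aug 01 00:00:00 +0000 2017", "id": 1, '
    '"retweet_count": 2, "favorite_count": 3, '
    '"user": {"id": 4, "followers_count": 5, "friends_count": 6}}\n'
)

rows = []
for line in sample_file:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    tweet = json.loads(line)  # parse one record at a time
    rows.append(
        {
            "id": tweet["id"],
            "favorite_count": tweet["favorite_count"],
            "retweet_count": tweet["retweet_count"],
            "created_at": tweet["created_at"],
            # Nested access: the "user" value is an ordinary dict.
            "followers_count": tweet["user"]["followers_count"],
        }
    )

print(rows[0]["followers_count"])
```

A list of flat dictionaries like `rows` can then be handed to `pd.DataFrame(rows)` to get the columns the question asks for.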