
I have a text file which has a JSON structure and I want to transform it to a data frame.

The file includes several JSON strings like this one:

{'cap': {'english': 0.1000, 'universal': 0.225}, 'display_scores': {'english': {'astroturf': 0.5, 'fake_follower': 0.8, 'financial': 0.2, 'other': 1.8, 'overall': 1.8, 'self_declared': 0.0, 'spammer': 0.2}, 'universal': {'astroturf': 0.4, 'fake_follower': 0.2, 'financial': 0.2, 'other': 0.4, 'overall': 0.8, 'self_declared': 0.0, 'spammer': 0.0}}, 'raw_scores': {'english': {'astroturf': 0.1, 'fake_follower': 0.16, 'financial': 0.05, 'other': 0.35, 'overall': 0.35, 'self_declared': 0.0, 'spammer': 0.04}, 'universal': {'astroturf': 0.07, 'fake_follower': 0.03, 'financial': 0.05, 'other': 0.09, 'overall': 0.16, 'self_declared': 0.0, 'spammer': 0.01}}, 'user': {'majority_lang': 'de', 'user_data': {'id_str': '123456', 'screen_name': 'beispiel01'}}}

import json
import pandas as pd

tweets_data_path = "data.txt"
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue

tweets_data

df = pd.DataFrame.from_dict(pd.json_normalize(tweets_data), orient='columns')
df

However, something seems to be wrong with either json.loads or the append call, because tweets_data is empty when I inspect it.

Any idea what is going wrong?

Karl Knechtel
  • Is every single line a valid JSON object in your text file? You should print the exception in the `except` clause instead of using a bare `continue` statement. Your code may be throwing an error when decoding the JSON, and you won't know because you are not keeping track of the exceptions. – Divyesh Peshavaria Aug 26 '21 at 09:13
  • `df = pd.read_json('data.txt')` should do it easily? – Bijay Regmi Aug 26 '21 at 10:01
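
To illustrate the first comment, here is a minimal sketch of catching the specific exception and reporting it instead of skipping the line silently. The `line` value is a shortened stand-in for one record from the question, and the `errors` list is a hypothetical helper added only for illustration.

```python
import json

# Shortened stand-in for one line of the file: a Python dict repr with single quotes.
line = "{'cap': {'english': 0.1, 'universal': 0.225}}"

errors = []  # hypothetical helper: collects messages for lines that fail to parse
try:
    json.loads(line)
except json.JSONDecodeError as exc:
    # Report the problem instead of hiding it behind a bare `continue`.
    errors.append(str(exc))
    print(f"could not parse line: {exc}")
```

JSON requires double-quoted keys and strings, so a single-quoted record like this one is rejected by `json.loads`. With a bare `except: continue`, every line would be skipped, which would explain the empty list.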

2 Answers


Instead of loading the JSON into a dictionary and then converting that dictionary into a pandas DataFrame, use pandas' built-in function to read JSON directly into a DataFrame:

df = pd.read_json(tweets_file, lines=True)  # lines=True: one JSON object per line

Alternatively, if you want to load the JSON into a dictionary first and then convert it to a DataFrame:

tweets_data = json.loads(tweets_file.read())  # assumes the file holds a single JSON document
df = pd.DataFrame.from_dict(tweets_data, orient='columns')
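
As a rough sketch of the `read_json` route, the example below parses two records stored one per line. It assumes the data is valid JSON (double-quoted), which is not the case for the sample in the question; the `sample` content is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: two valid JSON objects, one per line.
sample = io.StringIO(
    '{"cap": {"english": 0.1, "universal": 0.225}, "user": {"majority_lang": "de"}}\n'
    '{"cap": {"english": 0.3, "universal": 0.4}, "user": {"majority_lang": "en"}}\n'
)

# lines=True makes pandas treat every line as a separate JSON object.
df_lines = pd.read_json(sample, lines=True)
print(df_lines.shape)
```

Nested objects stay as dictionaries inside the cells here; flattening them into separate columns needs an extra step such as `pd.json_normalize`.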

This is how your code should look in order to append data to tweets_data.

import json
tweets_data_path = "data.txt"
tweets_data = []

with open(tweets_data_path, 'r') as f:
    for line in f.readlines():
        try:
            tweet = json.loads(json.dumps(line))
            tweets_data.append(tweet)
        except:
            continue
            

print(tweets_data)
["{'cap': {'english': 0.1000, 'universal': 0.225}, 'display_scores': {'english': {'astroturf': 0.5, 'fake_follower': 0.8, 'financial': 0.2, 'other': 1.8, 'overall': 1.8, 'self_declared': 0.0, 'spammer': 0.2}, 'universal': {'astroturf': 0.4, 'fake_follower': 0.2, 'financial': 0.2, 'other': 0.4, 'overall': 0.8, 'self_declared': 0.0, 'spammer': 0.0}}, 'raw_scores': {'english': {'astroturf': 0.1, 'fake_follower': 0.16, 'financial': 0.05, 'other': 0.35, 'overall': 0.35, 'self_declared': 0.0, 'spammer': 0.04}, 'universal': {'astroturf': 0.07, 'fake_follower': 0.03, 'financial': 0.05, 'other': 0.09, 'overall': 0.16, 'self_declared': 0.0, 'spammer': 0.01}}, 'user': {'majority_lang': 'de', 'user_data': {'id_str': '123456', 'screen_name': 'beispiel01'}}}\n", "{'cap': {'english': 0.1000, 'universal': 0.225}, 'display_scores': {'english': {'astroturf': 0.5, 'fake_follower': 0.8, 'financial': 0.2, 'other': 1.8, 'overall': 1.8, 'self_declared': 0.0, 'spammer': 0.2}, 'universal': {'astroturf': 0.4, 'fake_follower': 0.2, 'financial': 0.2, 'other': 0.4, 'overall': 0.8, 'self_declared': 0.0, 'spammer': 0.0}}, 'raw_scores': {'english': {'astroturf': 0.1, 'fake_follower': 0.16, 'financial': 0.05, 'other': 0.35, 'overall': 0.35, 'self_declared': 0.0, 'spammer': 0.04}, 'universal': {'astroturf': 0.07, 'fake_follower': 0.03, 'financial': 0.05, 'other': 0.09, 'overall': 0.16, 'self_declared': 0.0, 'spammer': 0.01}}, 'user': {'majority_lang': 'de', 'user_data': {'id_str': '123456', 'screen_name': 'beispiel01'}}}"]
Ram
  • It works this way, but then I cannot make a dataframe out of it because it shows me **'str' object has no attribute 'values'** – Data_Science_110 Aug 26 '21 at 09:50
  • I am not well versed in pandas and that is why I haven't mentioned anything about dataframe. – Ram Aug 26 '21 at 09:59
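
The error in the comment above is consistent with `tweets_data` holding plain strings rather than dictionaries, since `json.loads(json.dumps(line))` returns the original string. One possible way around it, assuming each line is a Python dict literal with single quotes as in the question, is to parse the lines with `ast.literal_eval` and flatten them with `pd.json_normalize`. The sample `lines` below are shortened stand-ins, not the real file.

```python
import ast

import pandas as pd

# Shortened stand-ins for lines of the file; the second one is deliberately malformed.
lines = [
    "{'cap': {'english': 0.1, 'universal': 0.225}, "
    "'user': {'majority_lang': 'de', "
    "'user_data': {'id_str': '123456', 'screen_name': 'beispiel01'}}}\n",
    "not a dict\n",
]

records = []
for line in lines:
    try:
        # literal_eval only evaluates Python literals (dicts, lists, numbers, strings).
        records.append(ast.literal_eval(line.strip()))
    except (ValueError, SyntaxError) as exc:
        print(f"skipping line: {exc}")

# json_normalize flattens nested dicts into dot-separated column names.
df = pd.json_normalize(records)
print(df.columns.tolist())
```

Because the records are now real dictionaries, the nested keys end up as columns such as `cap.english` and `user.user_data.screen_name`.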