Reading dictionary stored on text file and convert to pandas dataframe

Question

I have a text file that contains a series of data in the form of dictionary. I would like to read and store as a data frame in pandas. How would I read. I read pd.csv yet it does not give me the dataframe. Can anyone help me with that? You can download the text file Here

Thanks,

Zep,

Have you tried anything yet? Looking at the file, this seems like a standard json file. However all the data inside is nested which won't work with a pandas dataframe so you'll need to change the shape of the de-serialized json. — Alexander Ejbekov, Oct 12 '18 at 15:04

devdob · Answer 1 · 2018-10-13T10:12:41.687

1

You can convert you data to json after reading it as string, then use pandas.read_json() to convert your json to a dataframe.

Example:

import json
from pandas.io.json import json_normalize

f = open("file.txt", "w+")
contents = f.read()
contents = contents.replace("\n", "")
json_data = json.loads(contents)
df = json_normalize(json.loads(data))

You should have your data as a dataframe after that.

Hope this helps!

edited Oct 13 '18 at 10:12

answered Oct 12 '18 at 15:04

devdob

1,404
12
13

Hi I just tried out and it showed the following error: JSONDecodeError: Expecting value: line 1 column 1 (char 0) – Zephyr Oct 13 '18 at 08:50
Thats because of your new line character. you need to strip that out, check the updated answer above. – devdob Oct 13 '18 at 10:00
Your data also needs to be normalized. This will require you to use normalize_json – devdob Oct 13 '18 at 10:11
Thanks Devdob. Appreciate your advice. – Zephyr Oct 13 '18 at 14:38
1

Glad I could help! – devdob Oct 13 '18 at 14:43
I have other json file that contain nested elements. How can I flatten the nested json data? Can you briefly explain. I used the following code: Position_data = json_normalize(data =jsonstr['events'], record_path='positions', meta = ['x','y','x','y'] ) positions has nested data – Zephyr Oct 13 '18 at 17:38
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181815/discussion-between-zephyr-and-devdob). – Zephyr Oct 13 '18 at 17:56

Anton vBR · Accepted Answer · 2018-10-13T13:57:23.740

The problem is you have a nested json. Try using json_normalize instead:

import requests     #<-- requests library helps us handle http-requests
import pandas as pd

id_ = '1DbfQxBJKHvWO2YlKZCmeIN4al3xG8Wq5'
url = 'https://drive.google.com/uc?authuser=0&id={}&export=download'.format(id_)
r = requests.get(url)

df = pd.io.json.json_normalize(r.json())
print(df.columns)

or from hard drive, and json_normalize as wants to read a dictionary object and not a path:

import pandas as pd
import json

with open('myfile.json') as f:
    jsonstr = json.load(f)

df = pd.io.json.json_normalize(jsonstr)

Returns:

Index(['average.accelerations', 'average.aerialDuels', 'average.assists',
       'average.attackingActions', 'average.backPasses', 'average.ballLosses',
       'average.ballRecoveries', 'average.corners', 'average.crosses',
       'average.dangerousOpponentHalfRecoveries',
       ...
       'total.successfulLongPasses', 'total.successfulPasses',
       'total.successfulPassesToFinalThird', 'total.successfulPenalties',
       'total.successfulSmartPasses', 'total.successfulThroughPasses',
       'total.successfulVerticalPasses', 'total.throughPasses',
       'total.verticalPasses', 'total.yellowCards'],
      dtype='object', length=171)

Another idea would be to store the nested objects in a Series (and you can let a dictionary hold that those series).

dfs = {k: pd.Series(v) for k,v in r.json().items()}    
print(dfs.keys()) 
# ['average', 'seasonId', 'competitionId', 'positions', 'total', 'playerId', 'percent'])
print(dfs['percent'])

Returns:

aerialDuelsWon                   23.080
defensiveDuelsWon                18.420
directFreeKicksOnTarget           0.000
duelsWon                         33.470
fieldAerialDuelsWon              23.080
goalConversion                   22.581
headShotsOnTarget                 0.000
offensiveDuelsWon                37.250
penaltiesConversion               0.000
shotsOnTarget                    41.940
...
yellowCardsPerFoul               12.500
dtype: float64

The data only has one entry though.

Dear Anton, Thanks for the multiple solutions. Related to your first solution, If I want to read from the local drive, I used pd.read_csv and destination of the file. It did not work. Can you advice how used your first solution using the file in local drive. Thanks — Zephyr, Oct 13 '18 at 11:13
Thanks Anton, I appreciate much. Yes, the data file only contains 1 row. However, I am json file that contains nested dictionaries. I tried the same method, But dataframe showed differently. If you dont mind, May I discuss with you on that too? — Zephyr, Oct 13 '18 at 14:35
@Zephyr wait now. The whole flattening is handled by json_normalize here. What is your real question? — Anton vBR, Oct 13 '18 at 18:07
Thanks for the prompt reply. I posted a new question as follow in the link https://stackoverflow.com/q/52795561/9561497 — Zephyr, Oct 13 '18 at 18:11

Reading dictionary stored on text file and convert to pandas dataframe

2 Answers2