-1

I have a text file that contains a series of data in the form of dictionary. I would like to read and store as a data frame in pandas. How would I read. I read pd.csv yet it does not give me the dataframe. Can anyone help me with that? You can download the text file Here

Thanks,

Zep,

Zephyr
  • 1,332
  • 2
  • 13
  • 31
  • 1
    Have you tried anything yet? Looking at the file, this seems like a standard json file. However all the data inside is nested which won't work with a pandas dataframe so you'll need to change the shape of the de-serialized json. – Alexander Ejbekov Oct 12 '18 at 15:04

2 Answers2

1

You can convert you data to json after reading it as string, then use pandas.read_json() to convert your json to a dataframe.

Example:

import json
from pandas.io.json import json_normalize

f = open("file.txt", "w+")
contents = f.read()
contents = contents.replace("\n", "")
json_data = json.loads(contents)
df = json_normalize(json.loads(data))

You should have your data as a dataframe after that.

Hope this helps!

devdob
  • 1,404
  • 12
  • 13
  • Hi I just tried out and it showed the following error: JSONDecodeError: Expecting value: line 1 column 1 (char 0) – Zephyr Oct 13 '18 at 08:50
  • Thats because of your new line character. you need to strip that out, check the updated answer above. – devdob Oct 13 '18 at 10:00
  • Your data also needs to be normalized. This will require you to use normalize_json – devdob Oct 13 '18 at 10:11
  • Thanks Devdob. Appreciate your advice. – Zephyr Oct 13 '18 at 14:38
  • 1
    Glad I could help! – devdob Oct 13 '18 at 14:43
  • I have other json file that contain nested elements. How can I flatten the nested json data? Can you briefly explain. I used the following code: Position_data = json_normalize(data =jsonstr['events'], record_path='positions', meta = ['x','y','x','y'] ) positions has nested data – Zephyr Oct 13 '18 at 17:38
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181815/discussion-between-zephyr-and-devdob). – Zephyr Oct 13 '18 at 17:56
1

The problem is you have a nested json. Try using json_normalize instead:

import requests     #<-- requests library helps us handle http-requests
import pandas as pd

id_ = '1DbfQxBJKHvWO2YlKZCmeIN4al3xG8Wq5'
url = 'https://drive.google.com/uc?authuser=0&id={}&export=download'.format(id_)
r = requests.get(url)

df = pd.io.json.json_normalize(r.json())
print(df.columns)

or from hard drive, and json_normalize as wants to read a dictionary object and not a path:

import pandas as pd
import json

with open('myfile.json') as f:
    jsonstr = json.load(f)

df = pd.io.json.json_normalize(jsonstr)

Returns:

Index(['average.accelerations', 'average.aerialDuels', 'average.assists',
       'average.attackingActions', 'average.backPasses', 'average.ballLosses',
       'average.ballRecoveries', 'average.corners', 'average.crosses',
       'average.dangerousOpponentHalfRecoveries',
       ...
       'total.successfulLongPasses', 'total.successfulPasses',
       'total.successfulPassesToFinalThird', 'total.successfulPenalties',
       'total.successfulSmartPasses', 'total.successfulThroughPasses',
       'total.successfulVerticalPasses', 'total.throughPasses',
       'total.verticalPasses', 'total.yellowCards'],
      dtype='object', length=171)

Another idea would be to store the nested objects in a Series (and you can let a dictionary hold that those series).

dfs = {k: pd.Series(v) for k,v in r.json().items()}    
print(dfs.keys()) 
# ['average', 'seasonId', 'competitionId', 'positions', 'total', 'playerId', 'percent'])
print(dfs['percent'])

Returns:

aerialDuelsWon                   23.080
defensiveDuelsWon                18.420
directFreeKicksOnTarget           0.000
duelsWon                         33.470
fieldAerialDuelsWon              23.080
goalConversion                   22.581
headShotsOnTarget                 0.000
offensiveDuelsWon                37.250
penaltiesConversion               0.000
shotsOnTarget                    41.940
...
yellowCardsPerFoul               12.500
dtype: float64

The data only has one entry though.

Anton vBR
  • 18,287
  • 5
  • 40
  • 46
  • Dear Anton, Thanks for the multiple solutions. Related to your first solution, If I want to read from the local drive, I used pd.read_csv and destination of the file. It did not work. Can you advice how used your first solution using the file in local drive. Thanks – Zephyr Oct 13 '18 at 11:13
  • 1
    @Zephyr sorry went for lunch. updated. – Anton vBR Oct 13 '18 at 13:57
  • Thanks Anton, I appreciate much. Yes, the data file only contains 1 row. However, I am json file that contains nested dictionaries. I tried the same method, But dataframe showed differently. If you dont mind, May I discuss with you on that too? – Zephyr Oct 13 '18 at 14:35
  • @Zephyr Feel free to contact me. – Anton vBR Oct 13 '18 at 17:44
  • Can you briefly advise me how to flatten nested json. – Zephyr Oct 13 '18 at 17:46
  • Unfortunately, Pc is ongoing window update. – Zephyr Oct 13 '18 at 17:50
  • @Zephyr wait now. The whole flattening is handled by json_normalize here. What is your real question? – Anton vBR Oct 13 '18 at 18:07
  • Thanks for the prompt reply. I posted a new question as follow in the link https://stackoverflow.com/q/52795561/9561497 – Zephyr Oct 13 '18 at 18:11