0

I would like to read JSON formatter file into pandas DataFrame organizing datasets into row and columns. I have attached the picture of how data in my JSON file looks like. The raw data has 30 days of daily temperature data from 2010/01/01 to 2010/01/30.

Raw JSON data in the input file '09386950_NM_USGS_TOBS.json':

[{
    'beginDate': '2010-01-01 00:00:00',
    'collectionDates': [],
    'duration': 'DAILY',
    'endDate': '2010-01-30 00:00:00',
    'flags': [],
    'stationTriplet': '302:OR:SNTL',
    'values': [
        Decimal('31.820'),
        Decimal('27.140'),
        Decimal('14.900'),
        Decimal('35.600'),
        Decimal('34.340'),
        Decimal('31.100'),
        Decimal('9.140'),
        Decimal('21.380'),
        Decimal('36.140'),
        Decimal('28.040'),
        Decimal('35.960'),
        Decimal('34.700'),
        Decimal('28.040'),
        Decimal('25.160'),
        Decimal('33.980'),
        Decimal('31.640'),
        Decimal('31.640'),
        Decimal('29.300'),
        Decimal('29.120'),
        Decimal('25.880'),
        Decimal('24.980'),
        Decimal('21.740'),
        Decimal('18.320'),
        Decimal('20.480'),
        Decimal('26.780'),
        Decimal('28.940'),
        Decimal('24.440'),
        Decimal('15.440'),
        Decimal('21.020'),
        Decimal('31.820')
    ]
}]

The code that I tried is as follows:

import pandas as pd
import json
with open(r'C:\Users\Anurag.Srivastava\Downloads\09386950_NM_USGS_TOBS.json', 'r') as f:
    data = json.load(f)
    df = pd.DataFrame(data)

The error message that I am getting is:

Traceback (most recent call last):

  File "<ipython-input-54-2d2966de6f43>", line 1, in <module>
    runfile('C:/Users/Anurag.Srivastava/Downloads/Learing_pandas.py', wdir='C:/Users/Anurag.Srivastava/Downloads')

  File "C:\Users\Anurag.Srivastava\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\Anurag.Srivastava\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Anurag.Srivastava/Downloads/Learing_pandas.py", line 63, in <module>
    data = json.load(f)

  File "C:\Users\Anurag.Srivastava\anaconda3\lib\json\__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

  File "C:\Users\Anurag.Srivastava\anaconda3\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)

  File "C:\Users\Anurag.Srivastava\anaconda3\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())

  File "C:\Users\Anurag.Srivastava\anaconda3\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)

JSONDecodeError: Expecting property name enclosed in double quotes
ldz
  • 2,217
  • 16
  • 21
azs 0004
  • 3
  • 3
  • [JSON uses double quotes, not single quotes](https://stackoverflow.com/questions/14355655/jquery-parsejson-single-quote-vs-double-quote). this isn't properly formatted JSON. To make it properly formatted, you could try a mass replacement of single to double quotes, and that might work, depending on your data. If it's manually written data, write it again with double quotes. If not, you'll have to do the replacement transformation. – David Culbreth Dec 12 '19 at 20:18
  • Thank you for your comments. I am new to learning processing JSON file, so did not have a clue about quotes. – azs 0004 Dec 12 '19 at 21:30

2 Answers2

0

The file is not a valid JSON data hence the pasrser is not able to detect objects properly. Use any online JSON validator like this: https://jsonlint.com/ Once you fix the JSON. I would also advise you to use

pd.read_json(<Path>)

This approach will create dataframes in right way.

c0der512
  • 546
  • 5
  • 18
0

In order to enable Python to treat the file as json, the single quotes need to be replaced with double quotes and the Decimal() notation needs to be cleaned up. You could do something like follows:

import pandas as pd
import json
with open(file_name) as f:
    s = f.read()
    s = s.replace("'", '"')
    s = s.replace('Decimal("', '')
    s = s.replace('")', '')
    data = json.loads(s)
    for item in data:
        df = pd.DataFrame(item)

Then an index might be created and the corresponding values could be attached like:

idx = pd.date_range(item['beginDate'], item['endDate'])
df = pd.DataFrame(idx)
df['values'] = item['values']
ldz
  • 2,217
  • 16
  • 21
  • This works great. The only issue is the last line " df['values'] = data['values'] ", where I am getting an error "list indices must be integers or slices, not str". Probably, it has something to do with accessing lists within the dictionary. – azs 0004 Dec 12 '19 at 21:32
  • As `data` is a list of objects due to the json being a list, it needs to be iterated accordingly. I adapted the answer. – ldz Dec 12 '19 at 21:38
  • I found this modification to the code working fine: beginDate='2010-01-01' endDate='2010-01-30' with open('09386950_NM_USGS_TOBS.json', 'r') as f: s = f.read() s = s.replace("'", '"') s = s.replace('Decimal("', '') s = s.replace('")', '') data = json.loads(s) f.close() idx = pd.date_range(start=beginDate, end=endDate) df = pd.DataFrame(idx) df['values'] = data[0]['values'] – azs 0004 Dec 12 '19 at 23:38