
I'm trying to load a dataset from this link in Python in two ways: with pandas' read_json() directly from the URL, and with Python's built-in json library on a copy of the same dataset saved on my computer (in the same folder as my notebook). Here is my code:

import json
import pandas as pd
path='https://s3.amazonaws.com/istarsshare/normround1.json'
data=pd.read_json(path,orient='columns')
data.head(10)

The other one is:

import json
with open("normround1.json", "r") as read_it:
    data = json.load(read_it)

In the first case I got the following error:

ValueError                                Traceback (most recent call last) <ipython-input-12-a81b255c7afd> in <module>
      1 path='https://s3.amazonaws.com/istarsshare/normround1.json'
----> 2 data=pd.read_json(path,orient='columns')
      3 data.head(10)
      4 #with open("normround1.json", "r") as read_it:
      5      #data = json.load(read_it)
 
~/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
    590         return json_reader
    591 
--> 592     result = json_reader.read()
    593     if should_close:
    594         try:
 
~/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in read(self)
    715             obj = self._get_object_parser(self._combine_lines(data.split("\n")))
    716         else:
--> 717             obj = self._get_object_parser(self.data)
    718         self.close()
    719         return obj
 
~/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
    737         obj = None
    738         if typ == "frame":
--> 739             obj = FrameParser(json, **kwargs).parse()
    740 
    741         if typ == "series" or obj is None:
 
~/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in parse(self)
    847 
    848         else:
--> 849             self._parse_no_numpy()
    850 
    851         if self.obj is None:
 
~/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1091         if orient == "columns":
   1092             self.obj = DataFrame(
-> 1093                 loads(json, precise_float=self.precise_float), dtype=None
   1094             )
   1095         elif orient == "split":

ValueError: Trailing data

And in the second case I got:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last) <ipython-input-13-b4c6a39520ca> in <module>
      3 #data.head(10)
      4 with open("normround1.json", "r") as read_it:
----> 5      data = json.load(read_it)
 
~/anaconda3/lib/python3.7/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
--> 296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
    297 
    298 
 
~/anaconda3/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

~/anaconda3/lib/python3.7/json/decoder.py in decode(self, s, _w)
    338         end = _w(s, end).end()
    339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
    341         return obj
    342 

JSONDecodeError: Extra data: line 2 column 1 (char 161)

I'm using Jupyter Notebook on Ubuntu 18.04.

Martijn Pieters
sow_gali
  • First check what is in the file. It looks like invalid JSON. – furas Dec 13 '19 at 05:52
  • I checked the file: it is not a single JSON document but `multi-JSON`, where every line is its own JSON document. `json.load()` can't read `multi-JSON` in one call, so you have to read each line separately and parse it as JSON. – furas Dec 13 '19 at 05:54

2 Answers


It is not a single JSON document but a multi-JSON file: every line is a separate JSON document.

json.load() expects exactly one document, which is why it reports "Extra data" at the start of line 2. You have to read each line separately and parse it as JSON:

import json

data = []

with open('normround1.json') as fh:
    for line in fh:
        data.append(json.loads(line))

#print(data)
print(data[0]['_id'])
print(data[0]['message'])
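
If the goal is a DataFrame, pandas can also parse this one-object-per-line layout itself through the lines=True argument of read_json(). A minimal sketch, assuming each line is a flat JSON object; the two sample records below are made up and only stand in for the real file:

```python
import io

import pandas as pd

# Hypothetical two-record sample standing in for normround1.json;
# the real file has its own fields and values.
sample = (
    '{"_id": 1, "message": "first"}\n'
    '{"_id": 2, "message": "second"}\n'
)

# lines=True tells pandas to treat every line as a separate JSON object.
df = pd.read_json(io.StringIO(sample), lines=True)

print(df.head())
```

With the real data, the file name or the URL from the question would take the place of the StringIO object.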
furas

I fixed the file. Open it in vi and run this (it adds a comma at the end of each line):

:%s/$/,/g

Then add a [ at the beginning of the file and a ] at the end, and remove the comma after the last record, because JSON does not allow a trailing comma. After that I was able to load it.
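
The same conversion can be done in Python without editing the file by hand. This is only a sketch, assuming one JSON object per line; the helper name json_lines_to_array and the sample records are made up for illustration:

```python
import json


def json_lines_to_array(text):
    """Wrap one-object-per-line text into a single JSON array string."""
    # Skip blank lines so a trailing newline does not create an empty record.
    records = [line.strip() for line in text.splitlines() if line.strip()]
    # Commas go only between records, so there is no trailing comma.
    return "[" + ",\n".join(records) + "]"


# Hypothetical sample standing in for the contents of normround1.json.
sample = '{"_id": 1, "message": "first"}\n{"_id": 2, "message": "second"}\n'

data = json.loads(json_lines_to_array(sample))
print(data[0]["message"])
```

For the real file, read its contents with open("normround1.json").read() and pass that string to the helper.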

oppressionslayer