
I am new to JSON. I am working on a vehicle number plate detection project, and my dataset looks like this:

{"content": "http://com.dataturks.a96-i23.open.s3.amazonaws.com/2c9fafb0646e9cf9016473f1a561002a/77d1f81a-bee6-487c-aff2-0efa31a9925c____bd7f7862-d727-11e7-ad30-e18a56154311.jpg.jpeg","annotation":[{"label":["number_plate"],"notes":"","points":[{"x":0.7220843672456576,"y":0.5879828326180258},{"x":0.8684863523573201,"y":0.6888412017167382}],"imageWidth":806,"imageHeight":466}],"extras":null},
{"content": "http://com.dataturks.a96-i23.open.s3.amazonaws.com/2c9fafb0646e9cf9016473f1a561002a/4eb236a3-6547-4103-b46f-3756d21128a9___06-Sanjay-Dutt.jpg.jpeg","annotation":[{"label":["number_plate"],"notes":"","points":[{"x":0.16194331983805668,"y":0.8507795100222717},{"x":0.582995951417004,"y":1}],"imageWidth":494,"imageHeight":449}],"extras":null},

There are 240 records (blocks) in total. I want to do two things with this dataset: first, download the image referenced in each record, and second, write the values of the "points" field to a text file.

I am having trouble getting the values out of these fields.

import json
jsonFile = open('Indian_Number_plates.json', 'r')
x = json.load(jsonFile)
for criteria in x['annotation']:
    for key, value in criteria.items():  # .iteritems() exists only in Python 2
        print(key, 'is:', value)
    print('')

I wrote the code above to print all the values under "annotation", but I get the following error:

Traceback (most recent call last):
  File "prac.py", line 13, in <module>
    x = json.load(jsonFile)
  File "C:\python364\Lib\json\__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "C:\python364\Lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\python364\Lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 394 (char 393)

How can I get the values of the "points" field, and how can I download the images from the links in the "content" field?

Chris Doyle
  • Dataturks has provided instructions for converting Dataturks annotations to Pascal VOC format. The script downloads the corresponding images and stores the annotations as XML files that can be used to train TensorFlow object detection models. https://dataturks.com/help/ibbx_dataturks_to_pascal_voc_format.php – sthphoenix Feb 03 '20 at 07:50
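Pascal VOC stores each box as pixel xmin/ymin/xmax/ymax. The "points" values in the sample look like fractions of imageWidth and imageHeight (every one lands on a whole pixel when scaled), so a minimal conversion sketch, under that assumption, multiplies them back out. The helper name `to_pixel_box` is hypothetical, and min/max is used so the order of the two points does not matter.

```python
def to_pixel_box(annotation):
    """Convert one annotation's "points" to (xmin, ymin, xmax, ymax) in pixels.

    Assumption: each point's x/y is a fraction of imageWidth/imageHeight.
    """
    w = annotation["imageWidth"]
    h = annotation["imageHeight"]
    xs = [round(p["x"] * w) for p in annotation["points"]]
    ys = [round(p["y"] * h) for p in annotation["points"]]
    return min(xs), min(ys), max(xs), max(ys)

# The annotation from the first record in the question
ann = {"points": [{"x": 0.7220843672456576, "y": 0.5879828326180258},
                  {"x": 0.8684863523573201, "y": 0.6888412017167382}],
       "imageWidth": 806, "imageHeight": 466}
print(to_pixel_box(ann))  # (582, 274, 700, 321)
```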

2 Answers


The error occurs because your file contains two or more top-level records, one after another:

{"content": "http://com.dataturks.a96- } ..... {"content": .....
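A tiny reproduction, using made-up two-object text rather than the real dataset, shows the same failure: `json.loads` parses the first object and then rejects whatever follows it as "extra data".

```python
import json

# Two JSON objects back to back, like two consecutive lines of the dataset file
text = '{"a": 1}\n{"a": 2}'

try:
    json.loads(text)
except json.JSONDecodeError as err:
    # The first object parses fine; the second one is the "extra data"
    print(err)  # Extra data: line 2 column 1 (char 9)
```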

To fix this, reformat your JSON so that all the records are contained in a single array:

{ "data" :  [ {"content": "http://com.dataturks.a96- .... },{"content":... }]}
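A minimal sketch of that reformatting, assuming the file holds one record per line as in the sample (a trailing comma after a record is tolerated). Once the records sit in a list, writing the "points" values to a text file is a plain loop. The function names and the output file names in the usage comment are hypothetical.

```python
import json

def load_records(path):
    """Read one JSON object per non-empty line, ignoring a trailing comma."""
    records = []
    with open(path, 'r') as f:
        for line in f:
            line = line.strip().rstrip(',')
            if line:
                records.append(json.loads(line))
    return records

def rewrap(src_path, dst_path):
    """Write the records back out as {"data": [...]}, which json.load accepts."""
    records = load_records(src_path)
    with open(dst_path, 'w') as f:
        json.dump({"data": records}, f)
    return records

def write_points(records, out_path):
    """Write one line per annotation: the x/y values of its "points", in order."""
    with open(out_path, 'w') as f:
        for record in records:
            # Assumption: a record without labels may have a null "annotation"
            for ann in record.get("annotation") or []:
                coords = []
                for p in ann["points"]:
                    coords.extend([p["x"], p["y"]])
                f.write(" ".join(str(c) for c in coords) + "\n")

# Usage (hypothetical output file names):
# records = rewrap('Indian_Number_plates.json', 'Indian_Number_plates_array.json')
# write_points(records, 'points.txt')
```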

To download the images, extract the image names and URLs from the records and use requests:

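import requests

# pic_url is a record's "content" URL; image_name is the local file to save it as
response = requests.get(pic_url, stream=True)

if not response.ok:
    print(response)
else:
    with open(image_name, 'wb') as handle:
        # Stream the image to disk in 1 KB chunks
        for block in response.iter_content(1024):
            if block:
                handle.write(block)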
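To cover all 240 records, the same idea can be wrapped in a loop. This sketch swaps in the standard library's `urllib.request` for requests so it has no third-party dependency, takes the last path segment of each "content" URL as the file name, and assumes `records` is a list of already-parsed records. The function names are hypothetical; a URL that fails is reported and skipped.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse

def image_name_from_url(url):
    """Use the last path segment of the URL as the local file name."""
    return posixpath.basename(urlparse(url).path)

def download_images(records, out_dir):
    """Download each record's "content" URL into out_dir; return the failed URLs."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for record in records:
        url = record["content"]
        path = os.path.join(out_dir, image_name_from_url(url))
        try:
            with urllib.request.urlopen(url) as response, open(path, 'wb') as handle:
                # Copy the body to disk in chunks instead of reading it all at once
                shutil.copyfileobj(response, handle)
        except OSError as err:  # urllib.error.URLError is a subclass of OSError
            print(url, err)
            failed.append(url)
    return failed

# Usage (hypothetical output directory):
# failed = download_images(records, 'images')
```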
nassim

I found this answer while searching. Essentially, you read one object, catch the exception raised when the parser runs into the next object, then seek back, reparse from that position, and build up a list of objects.

In Java, I'd just tell you to use Jackson and its SAX-style streaming interface, since I've used it to read a list of objects formatted like this. If Python's JSON support has a streaming API, I'd use that instead of the exception-handler workaround.
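In Python, the standard library's `json.JSONDecoder.raw_decode(s, idx)` already does the "parse one object and report where it stopped" step, returning `(value, end_index)`, so no exception handling is needed. A minimal sketch, assuming the whole file fits in memory (240 records should); unlike Jackson's streaming parser it reads the full text first. The function name `iter_concatenated` is hypothetical.

```python
import json

def iter_concatenated(text):
    """Yield each top-level JSON value in text, in order."""
    decoder = json.JSONDecoder()
    idx = 0
    n = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here,
        # along with any commas separating records as in the sample
        while idx < n and (text[idx].isspace() or text[idx] == ','):
            idx += 1
        if idx >= n:
            return
        # Parse one value starting at idx; idx becomes the index just after it
        value, idx = decoder.raw_decode(text, idx)
        yield value

# Usage:
# with open('Indian_Number_plates.json') as f:
#     records = list(iter_concatenated(f.read()))
```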

Mr. Cat