0

I am coding using Python, Flask, pandas. I am reading data from a REST API.

When I get the data from the REST API, the dispenserId used to be an Integer meaning that each value started with a number different from 0.

This weekend, I received dispenserIds starting with a 0 (zero) character, so calling json.load(path_to_filenamen) does not parse the JSON file anymore due to errors.

See the sample

{
    "result": {
        "dispensers": [
            {
                "dispenserId": 00000,
                "dispenserName": "1st Floor",
                "dispenserType": "H2",
                "status": "Green",
                "locationId": 12345
            },
            {
                "dispenserId": 98765,
                "dispenserName": "2nd Floor",
                "dispenserType": "S4",
                "status": "Green",
                "locationId": 23456
            },
            {
                "dispenserId": 00001,
                "dispenserName": "3rd Floor",
                "dispenserType": "H2",
                "status": "Green",
                "locationId": 34567
            }
           ]
       }
}

I receive Exception has occurred: TypeError string indices must be integers when I call data["result"]["dispensers"].

How can I indicate to the JSON parser that the dispenserId is a string instead of an Integer?

Abdelkrim
  • 2,048
  • 5
  • 30
  • 44
  • 1
    See e.g. https://stackoverflow.com/q/27361565/3001761 - a number with a leading zero isn't really valid JSON, you should try to get that fixed upstream. – jonrsharpe Dec 14 '20 at 09:08
  • @jorsharpe I do agree with you but this isn't like I do have access to the developers of the API platform. Your proposal was on my task list though. – Abdelkrim Dec 14 '20 at 13:10

2 Answers2

0

Few things :

1. your dispensers collection is not closed (no closing squarre braket, so it cannot work)

2. since you got integers, you should not get that much zeros. you should have :

"dispenserId": 0,

or

"dispenserId": 1,

Once you got this corrected, a["result"]["dispensers"] will work just fine, undepending of the values of "dispenserId".

OR :

The values should be given as strings :

"dispenserId": "00000",

and then convert them into integers :

int(a["result"]["dispensers"][0]["dispenserId"])

But anyway, your json file does not respect the json format.

Fred
  • 26
  • 2
  • thank you for your comment but I didn't copy/paste the entire json file. The oroginal file HAS a correct format. The issue doesn't come from the json format. – Abdelkrim Dec 14 '20 at 13:13
  • Well, if in your json you have keys / values such as : "dispenserId": 00001, then the format is not right. Hence, the json.load function cannot behave properly. You should either have : "dispenserId": "00001" or "dispenserId": 1 – Fred Dec 14 '20 at 13:51
  • What you could do if you cannot change the source, is a preprocessing of your file, removing the "0" padding before numbers using regular expression, and then load your file into the json format – Fred Dec 14 '20 at 13:53
0

This piece of code should "clean" your file, by deleting all non wanted "0" and convert into Json format :

import re
import codecs
import json

pattern = "(:\s*)0*(\d)"
with codecs.open(path_to_filenamen,"r","utf-8") as f:
    myJson = json.loads(re.sub(pattern,'\\1\\2', f.read()))

The myJson var is in Json format, you can therefore use myJson["result"]["dispensers"]

Fred
  • 26
  • 2
  • Parsing with regular expressions is fragile, it's entirely possible that pattern would appear inside real data. – jonrsharpe Dec 14 '20 at 21:07
  • I agree, but considering the data involved, it is most unlikely. And as said, the best would be to ask for a proper Json.... – Fred Dec 15 '20 at 08:46