The following code

import pandas as pd

dic = {'_id': '5436e3abbae478396759f0cf', 'meta': {'clinical': {'benign_malignant': 'benign', 'age_approx': 55, 'sex': 'female', 'diagnosis': 'nevus', 'diagnosis_confirm_type': None, 'anatom_site_general': 'anterior torso', 'melanocytic': True}, 'acquisition': {'image_type': 'dermoscopic', 'pixelsX': 1022, 'pixelsY': 767}}, 'name': 'ISIC_0000000'}

frame = pd.io.json.json_normalize(dic)

throws a

KeyError: 'diagnosis_confirm_type'

I'm using pandas version 0.23.0. The code works without error in version 0.22.0.

Update:

Apparently, there really was a bug in 0.23.0 causing this problem. See https://github.com/pandas-dev/pandas/pull/21164
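Until upgrading to a pandas release that contains the fix is possible, one workaround is to flatten the nested dictionary by hand instead of going through `json_normalize`. The sketch below is only an illustration: `flatten` is a hypothetical helper, not part of pandas, and it assumes the input holds only nested dicts and scalar values (no lists of records). It keeps `None` values rather than dropping them.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in nested.items():
        # Build the dotted column name, e.g. "meta.clinical.sex".
        full_key = parent_key + sep + str(key) if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars, including None, are stored unchanged.
            flat[full_key] = value
    return flat


dic = {'_id': '5436e3abbae478396759f0cf', 'meta': {'clinical': {'benign_malignant': 'benign', 'age_approx': 55, 'sex': 'female', 'diagnosis': 'nevus', 'diagnosis_confirm_type': None, 'anatom_site_general': 'anterior torso', 'melanocytic': True}, 'acquisition': {'image_type': 'dermoscopic', 'pixelsX': 1022, 'pixelsY': 767}}, 'name': 'ISIC_0000000'}

flat = flatten(dic)
```

The flat dict can then be passed to `pd.DataFrame([flat])`, which gives a single row with columns such as `meta.clinical.diagnosis_confirm_type`.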

Oblomov
  • Sounds like re.sub(). Though using json library ( json.dump() / json.dumps() ) might help, too. – Mika72 Jun 07 '18 at 08:52
  • Ok, as you could load it from file, the file did contain a valid json string. The json module converted it successfully to a Python `dict` object where all keys are strings and values are strings, integers, booleans, None or other `dict`s. If you want to print it back as a json string, first convert it to json: `print(json.dumps(jsonObject))` – Serge Ballesta Jun 07 '18 at 09:08
  • Not even if you take the printed object and pass it into pd.io.json.json_normalize(object)? – Oblomov Jun 07 '18 at 09:50
  • See updated question for the pandas version – Oblomov Jun 07 '18 at 09:56
  • Bingo, now I can reproduce it :). Something's changed between 0.22.0 and 0.23.0. And I must give you huge kudos for taking the time to turn this into a [mcve] and a clear problem statement. – Ilja Everilä Jun 07 '18 at 09:57
  • Thanks for all your help. I was already about to give up, somewhere along the way... – Oblomov Jun 07 '18 at 09:59
  • @IljaEverilä: Does that mean the previous version of pandas would be able to deal with the input correctly? Because that is all I need. – Oblomov Jun 07 '18 at 10:03
  • If you are driving at the format of the data with your last comment: That is out of my hands. It is part of the large, public ISIC dataset. If you are talking about something else, please clarify. – Oblomov Jun 07 '18 at 10:08
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/172671/discussion-between-user1934212-and-ilja-everila). – Oblomov Jun 07 '18 at 10:10

1 Answer

If you're getting it originally as a string you don't even need regex:

validPJson = [line.replace('None', '"None"').replace('True', '"True"') for line in invalidJsonObjects]

See here for why it's better than regex: Use Python's string.replace vs re.sub

EDIT: From the comments I have understood that your problem is loading a file of that format without fixing it first, and that's why you're getting errors in loading (btw, those errors should really be in your question, otherwise you've just confused a whole lot of people trying to help).

My suggestion: fix the file first with a similar method:

with open(pathToFile, 'r') as fp:
    contents = fp.read()
with open(pathToFile, 'w') as fp:
    fp.write(contents.replace('None', '"None"').replace('True', '"True"'))

Only after that, try to use json to read the file and see if that works.
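As the comments point out, text containing `None`, `True` and single-quoted strings looks like the printed form of a Python dict rather than JSON. If that is really what the input holds (an assumption here), an alternative to string replacement is to parse it with `ast.literal_eval` and re-serialize it with `json.dumps`. That keeps `None` and `True` as a real null and boolean instead of turning them into the strings "None" and "True". The sample text below is made up for illustration.

```python
import ast
import json

# Made-up sample: the printed form of a Python dict, which is not valid JSON.
python_repr = "{'diagnosis_confirm_type': None, 'melanocytic': True, 'age_approx': 55}"

# literal_eval safely evaluates Python literals (dicts, strings, numbers, None, True/False).
obj = ast.literal_eval(python_repr)

# json.dumps emits valid JSON: None becomes null and True becomes true.
valid_json = json.dumps(obj)

# Round trip: the JSON text loads back to the same Python object.
assert json.loads(valid_json) == obj
```

This only applies when the text is a Python literal; a file that already contains `null` and `true` is valid JSON and can be read with `json.load` directly.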

Ofer Sadan
  • @Sraw It's not an original json; he's basically getting that from a text file, at least that's what I understand – Ofer Sadan Jun 07 '18 at 09:02
  • What he thinks is an invalid json string is actually a **python object** deserialized by `json.load`. If he really reads it as a raw string, it should be `null` and `true` inside the string. See the sixth comment under the question. – Sraw Jun 07 '18 at 09:10
  • Perhaps I'm wrong, but this same user asked another question a few minutes ago about why he can't `json.loads` from a string that doesn't contain those quotes. I don't think he explained this well enough here – Ofer Sadan Jun 07 '18 at 09:14
  • That's because you're still trying to do `loads` before fixing the file contents, I suggest you fix the file first, save it as a proper json that can be read by loads (this answer should help you do that) – Ofer Sadan Jun 07 '18 at 09:17
  • @IljaEverilä you are correct, I've updated the answer to reflect that – Ofer Sadan Jun 07 '18 at 09:22