Restructure json data

Question

I have a JSON file with the following structure:

    {
        "id": 2,
        "image_id": 2,
        "segmentation": [
            [
                913.0,
                659.5,
                895.0
            ],
            [
                658.5,
                875.0,
                652.5,
                659.5
            ]
        ],
        "iscrowd": 0,
        "bbox": [
            4.5,
            406.5,
            1098.0,
            1096.0
        ],
        "area": 579348.0,
        "category_id": 0
    },

Now I need to split each entry into separate entries, like these:

    {
        "id": 2,
        "image_id": 2,
        "segmentation": [
            [
                658.5,
                875.0,
                652.5,
                659.5
            ]
        ],
        "iscrowd": 0,
        "bbox": [
            4.5,
            406.5,
            1098.0,
            1096.0
        ],
        "area": 579348.0,
        "category_id": 0
    },
    {
        "id": 3,
        "image_id": 2,
        "segmentation": [
            [
                913.0,
                659.5,
                895.0
            ]
        ],
        "iscrowd": 0,
        "bbox": [
            4.5,
            406.5,
            1098.0,
            1096.0
        ],
        "area": 579348.0,
        "category_id": 0
    },

Each new entry should have the same image_id, iscrowd, bbox, area and category_id as the original entry. It should get a new (incremental) id and contain only one polygon in its "segmentation" list. So if the original entry had 15 segmentations, the code should split it into 15 entries with unique IDs.

Any tips on how to do this? I have found some posts on how to merge entries based on a key value, but not on how to split them.
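The splitting rule described above can be sketched as a small pure function. This is a minimal sketch, not code from the thread. It assumes the file holds a list of annotation dicts with the keys shown. The name `split_annotations` and its `start_id` parameter are hypothetical.

Each polygon is wrapped in its own one-element list, so the output keeps the `"segmentation": [[...]]` shape of the expected output. Polygons are numbered in the order they appear in the input. The sample above lists them the other way round, which is assumed not to matter.

```python
import copy


def split_annotations(annotations, start_id=0):
    """Return one annotation per polygon, each with a new incremental id.

    Hypothetical helper (not from the thread). Assumes `annotations` is a
    list of dicts that each contain a "segmentation" list of polygons.
    """
    result = []
    next_id = start_id
    for ann in annotations:
        for polygon in ann["segmentation"]:
            new_ann = copy.deepcopy(ann)               # independent copy of every field
            new_ann["id"] = next_id                    # new incremental id
            new_ann["segmentation"] = [list(polygon)]  # exactly one polygon, still a list of lists
            result.append(new_ann)
            next_id += 1
    return result


entry = {
    "id": 2,
    "image_id": 2,
    "segmentation": [[913.0, 659.5, 895.0], [658.5, 875.0, 652.5, 659.5]],
    "iscrowd": 0,
    "bbox": [4.5, 406.5, 1098.0, 1096.0],
    "area": 579348.0,
    "category_id": 0,
}

for ann in split_annotations([entry], start_id=2):
    print(ann["id"], ann["image_id"], ann["segmentation"])
# 2 2 [[913.0, 659.5, 895.0]]
# 3 2 [[658.5, 875.0, 652.5, 659.5]]
```

Passing `start_id=2` reproduces the IDs 2 and 3 from the expected output. With several annotations, the single counter keeps IDs unique across all of them.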

a) I'm confused about what has changed in the new structure. b) What have you tried already? c) Have you seen [how to ask](https://stackoverflow.com/help/how-to-ask)? — blueteeth, Sep 05 '22 at 06:54
@blueteeth a) In the original file, there are multiple segmentations under one annotation ID (which refers to a single image ID). I need to split those segmentations so that each has a unique annotation ID (one segmentation per annotation), all referring to the original image ID. b) I haven't found anything that would help me yet. c) Yes. — Deamoon, Sep 05 '22 at 07:36

iamtrappedman · Accepted Answer · 2022-09-05T14:30:40.040


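import json

# Load the original annotations; the input filename is a placeholder
with open("original_json_file.json") as f:
    original_json = json.load(f)  # a list of annotation dicts

new_json = []
ids = 0  # counter for the new, unique annotation IDs

for i in original_json:
    segms = i["segmentation"]
    for j in segms:
        # Copy every field (image_id, iscrowd, bbox, area, category_id, ...)
        dummy = dict(i)
        dummy["id"] = ids
        # Keep the list-of-lists shape, with a single polygon per entry
        dummy["segmentation"] = [j]
        ids += 1
        new_json.append(dummy)

with open("new_json_file.json", "w") as f:
    json.dump(new_json, f, indent=4)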

Hope this helps
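One design point in the approach above: copying an entry field by field (or with `dict(entry)`) makes a shallow copy. All entries split from the same annotation therefore share one `bbox` list object. That is harmless if the data is only dumped to a file. If one entry's `bbox` is edited in place later, though, all of its siblings change too; `copy.deepcopy` avoids that. A small illustration, using the bbox values from the example:

```python
import copy

entry = {"id": 0, "bbox": [0.5, 407.5, 869.0, 791.0]}

shallow = [dict(entry), dict(entry)]                # top-level copies only
deep = [copy.deepcopy(entry), copy.deepcopy(entry)]  # fully independent copies

shallow[0]["bbox"][0] = 99.0  # mutates the list shared with entry and shallow[1]
deep[0]["bbox"][1] = -1.0     # affects only deep[0]

print(shallow[1]["bbox"][0])  # 99.0
print(deep[1]["bbox"][1])     # 407.5
```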

edited Sep 05 '22 at 14:30

answered Sep 05 '22 at 11:17

iamtrappedman

Thanks. I have tried this, but I get a "list indices must be integers or slices, not str" error on the `segms = original_json["segmentation"]` line – Deamoon Sep 05 '22 at 11:31
How are you reading the original JSON file? Try `segms = original_json[0]["segmentation"]` – iamtrappedman Sep 05 '22 at 11:33
Using `segms = original_json[0]["segmentation"]` works for reading the original JSON, but the output JSON has the same structure; no segmentations have been split – Deamoon Sep 05 '22 at 11:48
So it seems that `segms = original_json[0]["segmentation"]` results in `segms` being only the list of floats of the first segmentation (in my example it would be `segms = [[658.5, 875.0, 652.5, 659.5]]`) – Deamoon Sep 05 '22 at 12:09
Can you share your new output? I also haven't done anything to increment the ID; you need to implement that too. – iamtrappedman Sep 05 '22 at 12:37
Added as a new answer so I can share the code better – Deamoon Sep 05 '22 at 12:48
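The "list indices must be integers or slices, not str" error comes from the shape of the file. `json.load` turns a top-level JSON array into a Python list, so the list has to be indexed by position (or looped over) before the `"segmentation"` key can be looked up. A small illustration, using a trimmed, made-up stand-in for the data:

```python
import json

# Trimmed, made-up stand-in for the file: a JSON array of annotation objects
text = '[{"id": 0, "segmentation": [[1.0, 2.0], [3.0, 4.0]]}]'
original_json = json.loads(text)

print(type(original_json).__name__)  # list: the top-level array becomes a Python list

try:
    original_json["segmentation"]  # indexing a list with a string key
except TypeError as err:
    print(err)  # list indices must be integers or slices, not str

# Loop over the list first, then look up the key on each dict
for annotation in original_json:
    print(len(annotation["segmentation"]))  # 2
```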

Deamoon · Answer 2 · 2022-09-05T13:28:33.043

So the code provided by @iamtrappedman sort of works:

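import json

test_loc = "/content/TEST.json"
with open(test_loc) as j_f:
  original_json = json.load(j_f)

  segms = original_json[0]["segmentation"]  # only the first annotation is read
  new_json = []

  for i in segms:
    # Overwrites the same dict on every pass and appends the same list object,
    # so every element of new_json ends up identical
    original_json[0]["segmentation"] = i
    new_json.append(original_json)

  with open("new_json_file.json", "w") as f:
    json.dump(new_json, f, indent=4)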

If I input the following JSON:

[
{
    "id": 0,
    "image_id": 0,
    "segmentation": [
        [
            465.0,
            1198.5,
            432.0,
            1190.5
        ],
        [
            525.0,
            2424.5,
            1257.0,
            2578.5
        ]
    ],
    "iscrowd": 0,
    "bbox": [
        0.5,
        407.5,
        869.0,
        791.0
    ],
    "area": 425968.25,
    "category_id": 0
}
]

I get JSON that is split, but both entries are identical:

[
[
    {
        "area": 425968.25,
        "bbox": [
            0.5,
            407.5,
            869.0,
            791.0
        ],
        "category_id": 0,
        "id": 0,
        "image_id": 0,
        "iscrowd": 0,
        "segmentation": [
            525.0,
            2424.5,
            1257.0,
            2578.5
        ]
    }
],
[
    {
        "area": 425968.25,
        "bbox": [
            0.5,
            407.5,
            869.0,
            791.0
        ],
        "category_id": 0,
        "id": 0,
        "image_id": 0,
        "iscrowd": 0,
        "segmentation": [
            525.0,
            2424.5,
            1257.0,
            2578.5
        ]
    }
]
]
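Both entries come out identical for a simple reason. `new_json.append(original_json)` appends a reference to the same list on every iteration, and the loop keeps overwriting the `"segmentation"` key of the same dict. After the loop, every element therefore shows the last polygon, and each one is wrapped in an extra list. A trimmed reproduction, with the file I/O removed and the data shortened:

```python
# Same loop as above, without the file I/O and with shortened polygons
original_json = [{"id": 0, "segmentation": [[465.0, 1198.5], [525.0, 2424.5]]}]

segms = original_json[0]["segmentation"]
new_json = []
for i in segms:
    original_json[0]["segmentation"] = i  # overwrites the same dict every time
    new_json.append(original_json)        # appends a reference, not a copy

print(new_json[0] is new_json[1])      # True: both elements are the same list object
print(new_json[0][0]["segmentation"])  # [525.0, 2424.5]: only the last polygon survives
```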

EDIT: Now for a JSON with two annotations:

[
{
    "id": 0,
    "image_id": 0,
    "segmentation": [
        [
            465.0,
            1198.5,
            432.0,
            1190.5
        ],
        [
            525.0,
            2424.5,
            1257.0,
            2578.5
        ]
    ],
    "iscrowd": 0,
    "bbox": [
        0.5,
        407.5,
        869.0,
        791.0
    ],
    "area": 425968.25,
    "category_id": 0
},
{
    "id": 1,
    "image_id": 2,
    "segmentation": [
        [
            4241.0,
            14.5,
            141.0,
            7557.5
        ],
        [
            578.0,
            2424.5,
            141.0,
            965.5
        ]
    ],
    "iscrowd": 0,
    "bbox": [
        0.5,
        407.5,
        869.0,
        791.0
    ],
    "area": 425968.25,
    "category_id": 0
}
]

It does not split the annotations; it duplicates them:

[
[
    {
        "id": 0,
        "image_id": 0,
        "segmentation": [
            525.0,
            2424.5,
            1257.0,
            2578.5
        ],
        "iscrowd": 0,
        "bbox": [
            0.5,
            407.5,
            869.0,
            791.0
        ],
        "area": 425968.25,
        "category_id": 0
    },
    {
        "id": 1,
        "image_id": 2,
        "segmentation": [
            [
                4241.0,
                14.5,
                141.0,
                7557.5
            ],
            [
                578.0,
                2424.5,
                141.0,
                965.5
            ]
        ],
        "iscrowd": 0,
        "bbox": [
            0.5,
            407.5,
            869.0,
            791.0
        ],
        "area": 425968.25,
        "category_id": 0
    }
],
[
    {
        "id": 0,
        "image_id": 0,
        "segmentation": [
            525.0,
            2424.5,
            1257.0,
            2578.5
        ],
        "iscrowd": 0,
        "bbox": [
            0.5,
            407.5,
            869.0,
            791.0
        ],
        "area": 425968.25,
        "category_id": 0
    },
    {
        "id": 1,
        "image_id": 2,
        "segmentation": [
            [
                4241.0,
                14.5,
                141.0,
                7557.5
            ],
            [
                578.0,
                2424.5,
                141.0,
                965.5
            ]
        ],
        "iscrowd": 0,
        "bbox": [
            0.5,
            407.5,
            869.0,
            791.0
        ],
        "area": 425968.25,
        "category_id": 0
    }
]
]
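Both symptoms come from the loop only ever reading and writing `original_json[0]`. Below is a minimal sketch of a fix, assuming the same list-of-annotations input. It loops over every annotation and builds a new dict per polygon instead of mutating and appending `original_json`. IDs are drawn from a single `itertools.count` counter so they stay unique across annotations. The data is the two-annotation example with the polygons shortened.

```python
import itertools
import json

# Two-annotation example from above, with the polygons shortened
original_json = [
    {"id": 0, "image_id": 0, "segmentation": [[465.0, 1198.5], [525.0, 2424.5]],
     "iscrowd": 0, "bbox": [0.5, 407.5, 869.0, 791.0], "area": 425968.25, "category_id": 0},
    {"id": 1, "image_id": 2, "segmentation": [[4241.0, 14.5], [578.0, 2424.5]],
     "iscrowd": 0, "bbox": [0.5, 407.5, 869.0, 791.0], "area": 425968.25, "category_id": 0},
]

new_ids = itertools.count(0)  # one counter shared by all annotations
new_json = [
    # a fresh dict per polygon; the other fields come from the parent annotation
    {**ann, "id": next(new_ids), "segmentation": [polygon]}
    for ann in original_json            # every annotation, not just original_json[0]
    for polygon in ann["segmentation"]
]

print(json.dumps([(a["id"], a["image_id"]) for a in new_json]))
# [[0, 0], [1, 0], [2, 2], [3, 2]]
```

The `{**ann, ...}` literal copies the top-level fields shallowly, so the `bbox` lists are still shared with the input. That is fine when the result is only written out with `json.dump`.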

Now this works on the sample JSON: there, `segms` is a list of 2 values (2 sublists). When more than one annotation is present, only the first one is taken into consideration — Deamoon, Sep 05 '22 at 12:53
Thank you for providing the output; now please provide your code too, and if possible an example where more than one annotation is present. — iamtrappedman, Sep 05 '22 at 13:05
