
I have a JSON file containing an array of objects; the data inside the file is something like this:

[
 {"name": "A",
 "address": "some address related to A",
 "details": "some details related to A"},
 {"name": "B",
 "address": "some address related to A",
 "details": "some details related to B"},
 {"name": "C",
 "address": "some address related to A",
 "details": "some details related to C"}
]

and I want to remove the redundant key-value pairs, so the output should be something like this:

  [
   {"name": "A",
   "address": "some address related to A",
   "details": "some details related to A"},
   {"name": "B",
   "details": "some details related to B"},
   {"name": "C",
   "details": "some details related to C"}
  ]

so I've tried this code, which I found in this link:

import json

with open('./myfile.json') as fp:
    data = fp.read()

unique = []
for n in data:
    if all(unique_data["address"] != data for unique_data["address"] in unique):
        unique.append(n)

#print(unique)
with open("./cleanedRedundancy.json", 'w') as f:
    f.write(unique)

but it gives me this error:

TypeError: string indices must be integers
n_dev
  • `for n in data` actually iterates through each character of the string `data`, so on each iteration `n` is one character of text. Is that what you really wanted? – Arty Oct 07 '20 at 13:37
  • You have to parse the JSON. See [How to parse JSON in Python?](https://stackoverflow.com/q/7771011/218196) – Felix Kling Oct 07 '20 at 13:37
  • Also, `for unique_data["address"] in unique` should really be `for unique_data in unique`. – Arty Oct 07 '20 at 13:37
  • @Arty, thanks for your reply, but can you please clarify more, I didn't really get what you said! – n_dev Oct 07 '20 at 13:45
  • @n_dev Can you describe in more detail the algorithm for removing redundant entries? Then we can write working code implementing that algorithm. – Arty Oct 07 '20 at 13:49
  • Do you want to remove every (key, value) pair that was already present in an earlier entry? Or do you want to delete such pairs only if they contain words like `to A`? From your example input and output it is not very clear what the correct algorithm for removing redundant data is. – Arty Oct 07 '20 at 13:49
  • @Arty, yes I want to remove all (key, value) if it was already present in entries before. – n_dev Oct 07 '20 at 13:53
  • I don't think it has anything to do with json. It looks like a simple error. Look at unique_data. Is it supposed to be a dict or a string? Does it have the following literal string as a key: "address" ? – Kenny Ostrom Oct 07 '20 at 13:55
  • @KennyOstrom, No, it doesn't have nested (key, value) pairs. – n_dev Oct 07 '20 at 13:59
  • @n_dev Created [this answer](https://stackoverflow.com/a/64245698/941531) to solve your task. – Arty Oct 07 '20 at 14:06
  • But you're addressing the nested key value pairs, and the error message is saying literally "you can't do that because that's not what kind of data this is" – Kenny Ostrom Oct 07 '20 at 14:10
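As the comments above point out, `fp.read()` returns a plain string, so `for n in data` iterates characters, and indexing with `"address"` then raises the `TypeError`. A minimal sketch of the fix, first parsing the JSON and then dropping repeated `"address"` values (the inline string stands in for the question's `myfile.json`; with a file you would call `json.load(fp)` instead):

```python
import json

# Parse the JSON text into Python objects first; iterating over the raw
# string was the source of the TypeError in the question.
data = '''
[
 {"name": "A", "address": "some address related to A", "details": "some details related to A"},
 {"name": "B", "address": "some address related to A", "details": "some details related to B"},
 {"name": "C", "address": "some address related to A", "details": "some details related to C"}
]
'''
entries = json.loads(data)

seen = set()
unique = []
for entry in entries:
    addr = entry.get('address')
    if addr in seen:
        # The value appeared in an earlier entry: drop the redundant key.
        entry = {k: v for k, v in entry.items() if k != 'address'}
    elif addr is not None:
        seen.add(addr)
    unique.append(entry)

# json.dumps() serializes back to valid JSON; f.write(unique) in the
# original code fails because write() expects a string, not a list.
print(json.dumps(unique, indent=4))
```

This keeps the first occurrence of each address and removes later ones, matching the desired output shown in the question.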

1 Answer


I wrote a solution that works with or without files (without, by default). For your case, to use files, change `use_files = False` to `use_files = True` inside my script.

I assumed that you want to remove duplicates having the same (key, value) pair.

Try it online!

import json

use_files = False
# Only duplicates with next keys will be deleted
only_keys = {'address', 'complex'}

if not use_files:
    fdata = """
    [
     {
       "name": "A",
       "address": "some address related to A",
       "details": "some details related to A"
     },
     {
       "name": "B",
       "address": "some address related to A",
       "details": "some details related to B",
       "complex": ["x", {"y": "z", "p": "q"}],
       "dont_remove": "test"
     },
     {
       "name": "C",
       "address": "some address related to A",
       "details": "some details related to C",
       "complex": ["x", {"p": "q", "y": "z"}],
       "dont_remove": "test"
     }
    ]
    """

if use_files:
    with open("./myfile.json", 'r', encoding = 'utf-8') as fp:
        data = fp.read()
else:
    data = fdata

entries = json.loads(data)

unique = set()
for e in entries:
    for k, v in list(e.items()):
        if k not in only_keys:
            continue
        v = json.dumps(v, sort_keys = True)
        if (k, v) in unique:
            del e[k]
        else:
            unique.add((k, v))

if use_files:
    with open("./cleanedRedundancy.json", "w", encoding = 'utf-8') as f:
        f.write(json.dumps(entries, indent = 4, ensure_ascii = False))
else:
    print(json.dumps(entries, indent = 4, ensure_ascii = False))

Output:

[
    {
        "name": "A",
        "address": "some address related to A",
        "details": "some details related to A"
    },
    {
        "name": "B",
        "details": "some details related to B",
        "complex": [
            "x",
            {
                "y": "z",
                "p": "q"
            }
        ],
        "dont_remove": "test"
    },
    {
        "name": "C",
        "details": "some details related to C",
        "dont_remove": "test"
    }
]
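The value-comparison trick in the script above, serializing each value with `json.dumps(v, sort_keys=True)` before putting it in a set, is what makes unhashable values like lists and nested dicts comparable. A small illustration:

```python
import json

# Lists and dicts are unhashable, so they cannot go into a set directly.
# Serializing with sort_keys=True produces one canonical string per value,
# regardless of key order in nested dicts.
a = ["x", {"y": "z", "p": "q"}]
b = ["x", {"p": "q", "y": "z"}]  # same content, different key order

sa = json.dumps(a, sort_keys=True)
sb = json.dumps(b, sort_keys=True)

print(sa == sb)   # the two values compare equal in canonical form
seen = {sa}
print(sb in seen) # and the string form is hashable, so set lookups work
```

With `sort_keys=False` the two strings would differ, which is why the answer exposes that argument as the knob for treating differently-ordered dicts as unequal.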
Arty
  • I'd suggest a small change to avoid modifying entries while iterating. I know you created a copy of the sub-entry, but canonically we'd just create a new cleaned copy? – Kenny Ostrom Oct 07 '20 at 14:56
  • @KennyOstrom Why? I read in, process, and write out this object, doing the transform only once for the given task. Why can't I delete entries? I also use `list(e.items())`; here `list()` makes a copy of the dictionary's items, so the loop iteration has no trouble if entries change on the fly. – Arty Oct 07 '20 at 15:06
  • @Arty, I really appreciate your help, but when I tried your code I got this error: `TypeError: unhashable type: 'list'` – n_dev Oct 07 '20 at 19:08
  • @Arty, actually I didn't mention that the redundant key in my case has a value of type array, is that why I have got this error? because it seems to work fine in the case "without using the file". – n_dev Oct 08 '20 at 02:42
  • @n_dev Fixed my answer to support lists and any complex value types! I used a nice trick: converting each value to a JSON string for comparison, since equal values produce equal strings. – Arty Oct 08 '20 at 03:25
  • @n_dev Also note that in my example the nested dictionary has the `"y"` and `"p"` keys in a different order in two places. I still consider such dictionaries equal if they are equal under a sorted key order, which is why I used the argument `sort_keys = True`. If such dictionaries should be considered unequal, replace it with `sort_keys = False`. – Arty Oct 08 '20 at 03:28
  • @Arty, if I want to delete a specific redundant key, how can I do it? E.g. if I want to always delete the redundancy of the "address" key. – n_dev Oct 08 '20 at 04:00
  • @n_dev At the beginning of the script I just added a new constant, `only_keys = ...`; put there only the keys that should be deleted. In your case, setting `only_keys = {'address'}` will solve your last requested task of deleting only addresses. – Arty Oct 08 '20 at 04:12
  • @Arty, it did exactly what I wanted, I really appreciate your help! Thanks a lot. – n_dev Oct 08 '20 at 04:26
  • @n_dev Welcome! :) – Arty Oct 08 '20 at 05:24