
Original post: Remove duplicates from json data

This is only my second post. I didn't have enough points to comment on the original post, so I am asking my question here.

Andy Hayden makes a great point - "Also, those aren't really duplicates... – Andy Hayden"

My question is about exactly that situation: how can you remove duplicates from a JSON file by matching against more than one key?

Here is the original example (it was pointed out that it is not valid JSON):

{
  {obj_id: 123,
    location: {
      x: 123,
      y: 323,
  },
  {obj_id: 13,
    location: {
      x: 23,
      y: 333,
  },
 {obj_id: 123,
    location: {
      x: 122,
      y: 133,
  },
}

My case is very similar to this example, except that in my case all of these entries would be kept, because the x and y values for each obj_id are unique. However, if x and y were the same, then one of the entries would be removed from the JSON file.

All the examples I have found remove entries based on a single key match only.

I don't know if it matters, but the keys that I need to match against are "Company Name", "First Name", and "Last Name". The file is a 100k-plus-line JSON of companies and contacts, and the same person is sometimes a contact for multiple companies, which is why I need to match against multiple keys.

Thanks.

martineau
Jbigger
  • All the keys in a dictionary must be unique, so there's no way for `json.load()` or `json.loads()` to return one that has values with duplicate keys. It's one of the differences between Python dictionaries and JSON objects. Would getting a `list` of the objects be useful? That might be possible. – martineau Mar 22 '18 at 19:06
  • @martineau That is true, but JSON allows `arrays` (or `lists` in Python), that *can* have duplicates... consider: `json.loads("[1,2,3,1,1]")` – Joe Iddon Mar 22 '18 at 19:10
  • @Joe: I know that, which is why I asked in my comment whether getting the objects in a `list` (aka JSON array) would be acceptable to them. – martineau Mar 22 '18 at 19:14
  • @martineau Sorry but I am unsure what you mean... the OP gives an example of the input data, *as a list of objects*. – Joe Iddon Mar 22 '18 at 19:16
  • Joe: It was a dictionary of dictionaries when I posted my comment(s), but now @MushroomMauLa has changed it (which I don't think is what the OP intended, so I'm going to roll back his/her changes). – martineau Mar 22 '18 at 20:16
  • @martineau Let's be honest, as it stands, this question is extremely unclear – Joe Iddon Mar 22 '18 at 21:23
  • @joe: Then everyone should just cool their jets and wait until the OP's online again and has had a chance to respond. – martineau Mar 22 '18 at 21:48
  • Does this answer your question? [Remove duplicates from json data](https://stackoverflow.com/questions/17076345/remove-duplicates-from-json-data) – Alireza Abdi Aug 22 '22 at 15:20

1 Answer

I hope this does what you are looking for. (It only checks whether the First and Last Name differ; the Company is not compared.)

raw_data = [
    {
        "Company": 123,
        "Person": {
            "First Name": 123,
            "Last Name": 323
        }
    },
    {
        "Company": 13,
        "Person": {
            "First Name": 123,
            "Last Name": 323
        }
    },
    {
        "Company": 123,
        "Person": {
            "First Name": 122,
            "Last Name": 133
        }
    }
]

unique = []
for company in raw_data:
    # Keep this entry only if no already-kept entry has an equal "Person" dict.
    if all(unique_comp["Person"] != company["Person"] for unique_comp in unique):
        unique.append(company)

print(unique)

#>>> [{'Company': 123, 'Person': {'First Name': 123, 'Last Name': 323}}, {'Company': 123, 'Person': {'First Name': 122, 'Last Name': 133}}]
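
To match on all three keys named in the question, one possible variation is sketched below. It assumes the real data is a flat list of records whose keys are literally "Company Name", "First Name", and "Last Name", and that their values are hashable (strings or numbers). The `dedupe` helper and the sample `contacts` list are made up for illustration and are not taken from the question's file.

```python
def dedupe(records, keys=("Company Name", "First Name", "Last Name")):
    """Keep the first record for each distinct combination of `keys`.

    Hypothetical helper: assumes each record is a flat dict and that the
    values stored under `keys` are hashable.
    """
    seen = set()      # combinations of key values already encountered
    unique = []       # records kept, in their original order
    for record in records:
        # Build a tuple of the values to compare; missing keys become None.
        signature = tuple(record.get(k) for k in keys)
        if signature not in seen:
            seen.add(signature)
            unique.append(record)
    return unique


# Made-up sample data: the same person at two companies, plus one exact repeat.
contacts = [
    {"Company Name": "Acme", "First Name": "Ann", "Last Name": "Lee"},
    {"Company Name": "Globex", "First Name": "Ann", "Last Name": "Lee"},
    {"Company Name": "Acme", "First Name": "Ann", "Last Name": "Lee"},
]

result = dedupe(contacts)
print(result)
```

Here the second record is kept because the company differs, and only the third (identical on all three keys) is dropped. Using a set for the lookups avoids comparing every record against every kept record, which may matter for a file with 100k-plus lines.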
optimalic