5

Is there an efficient way to remove objects with duplicate 'obj_id' values from this data with Python? In this case, just keep the first occurrence.

{
  {obj_id: 123,
    location: {
      x: 123,
      y: 323,
  },
  {obj_id: 13,
    location: {
      x: 23,
      y: 333,
  },
 {obj_id: 123,
    location: {
      x: 122,
      y: 133,
  },
}

Should become:

{
  {obj_id: 123,
    location: {
      x: 123,
      y: 323,
  },
  {obj_id: 13,
    location: {
      x: 23,
      y: 333,
  },
}
9-bits • 10,395 • 21 • 61 • 83

4 Answers

10

Presuming your JSON is valid syntax and you are indeed requesting help for Python, you will need to do something like this (note that this dict comprehension keeps the last occurrence of each obj_id):

import json
ds = json.loads(json_data_string)  # json_data_string holds the raw JSON text (a list of objects)
unique_stuff = { each['obj_id'] : each for each in ds }.values()

If you want to always retain the first occurrence, you will need to do something like this:

all_ids = [ each['obj_id'] for each in ds ]  # get 'ds' from the snippet above
unique_stuff = [ ds[ all_ids.index(oid) ] for oid in set(all_ids) ]
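
As a quick, self-contained check of the difference between the two snippets, here is a sketch that hardcodes the question's three objects (the variable names are just for illustration):

# Hardcoded copies of the question's objects, so the snippets above can be run directly
ds = [
    {"obj_id": 123, "location": {"x": 123, "y": 323}},
    {"obj_id": 13, "location": {"x": 23, "y": 333}},
    {"obj_id": 123, "location": {"x": 122, "y": 133}},
]

# The dict comprehension keeps the *last* occurrence of each obj_id,
# because later entries overwrite earlier ones under the same key
last_kept = list({each['obj_id']: each for each in ds}.values())

# The index-based variant keeps the *first* occurrence, since list.index()
# returns the earliest position; note that iterating a set may reorder results
all_ids = [each['obj_id'] for each in ds]
first_kept = [ds[all_ids.index(oid)] for oid in set(all_ids)]

print(len(last_kept), len(first_kept))  # both contain 2 objects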
Kalyan02 • 1,416 • 11 • 16
  • Assuming `ds` is an array of dictionaries (as that's the option that makes the most sense), this works well, but keeps the last occurrence instead of the first one. – lqc Jun 12 '13 at 22:44
  • 1
    +1: elegant (the first example). Though `.values()`/`set()` may return objects in any order. Assuming the input is json array then it might matter. Here's [order preserving algorithm](http://stackoverflow.com/a/17076805/4279) – jfs Jun 12 '13 at 23:09
  • 1
    micro-nitpick: `obj` and `json_array` might be better names than `each` and `ds` – jfs Jun 12 '13 at 23:11
  • what's the difference between `json.loads(...)` and `json.load(...)`? – Robert Johnstone Nov 18 '16 at 09:20
  • 1
    @Sevenearths `json.loads(...)` loads from string (loads:load string); `json.load(...)` loads from a file handler (or a `read()` supporting file-like object) – Kalyan02 Nov 23 '16 at 12:27
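
To illustrate the loads/load distinction from the comment above, here is a minimal sketch (the sample string and the StringIO wrapper are just for demonstration):

import io
import json

json_text = '[{"obj_id": 1}]'

from_string = json.loads(json_text)            # loads: parse a str
from_file = json.load(io.StringIO(json_text))  # load: parse a file-like object with read()

assert from_string == from_file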
5

Here's an implementation that preserves the order of the input JSON objects and keeps the first occurrence of objects with the same obj_id:

import json
import sys
from collections import OrderedDict

# object_pairs_hook=OrderedDict keeps each object's own key order as well
L = json.load(sys.stdin, object_pairs_hook=OrderedDict)

seen = OrderedDict()
for d in L:
    oid = d["obj_id"]
    if oid not in seen:  # only the first object with a given obj_id is kept
        seen[oid] = d

json.dump(list(seen.values()), sys.stdout, indent=2)

Input

[
  {
    "obj_id": 123, 
    "location": {
      "x": 123, 
      "y": 323
    }
  }, 
  {
    "obj_id": 13, 
    "location": {
      "x": 23, 
      "y": 333
    }
  }, 
  {
    "obj_id": 123, 
    "location": {
      "x": 122, 
      "y": 133
    }
  }
]

Output

[
  {
    "obj_id": 123, 
    "location": {
      "x": 123, 
      "y": 323
    }
  }, 
  {
    "obj_id": 13, 
    "location": {
      "x": 23, 
      "y": 333
    }
  }
]
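
On Python 3.7 and later, plain dicts also preserve insertion order, so the same idea works without OrderedDict; a minimal sketch, using a hypothetical deduplicate helper:

import json
import sys

def deduplicate(objects):
    # setdefault stores a value only the first time a key is seen,
    # so the first object with each obj_id wins and input order is preserved
    seen = {}
    for obj in objects:
        seen.setdefault(obj["obj_id"], obj)
    return list(seen.values())

json.dump(deduplicate(json.load(sys.stdin)), sys.stdout, indent=2)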
jfs • 399,953 • 195 • 994 • 1,670
-3

(if you had valid json)

from simplejson import loads, dumps
dumps(loads(my_json))
Rich Tier • 9,021 • 10 • 48 • 71
  • How can you know it will do anything, if you don't know how the correct input looks like? – lqc Jun 12 '13 at 22:48
  • 1
    The question title is "Remove duplicates from json data". I caveat'd "if valid json" and provided an answer. This is what everyone else in the question has also done. – Rich Tier Jun 12 '13 at 22:56
-4

This is not valid JSON. On a valid JSON array, you can use jQuery's $.each and look at the obj_id to find and remove duplicates.

Something like this:

var seen = {}, uniques = [];
$.each(myArrayOfObjects, function(i, v) {
      // add only the first object seen for each obj_id to the new array
      if (!(v.obj_id in seen)) {
          seen[v.obj_id] = true;
          uniques.push(v);
      }
});
2D3D4D • 131 • 13