
I have a large GeoJSON file with the following general structure:

{
  "features": [{
      "geometry": {
        "coordinates": [
          [
            [-12.345, 26.006],
            [-78.56, 24.944],
            [-76.44, 24.99],
            [-76.456, 26.567],
            [-78.345, 26.23456]
          ]
        ],

        "type": "Polygon"
      },

      "id": "Some_ID_01",

      "properties": {
        "parameters": "elevation"
      },
      "type": "Feature"
    },

    {
      "geometry": {
        "coordinates": [
          [
            [139.345, 39.2345],
            [139.23456, 37.3465],
            [141.678, 37.7896],
            [141.2345, 39.6543],
            [139.7856, 39.2345]
          ]
        ],
        "type": "Polygon"
      },
      "id": "Some_OtherID_01",
      "properties": {
        "parameters": "elevation"
      },
      "type": "Feature"
    }, {
      "geometry": {
        "coordinates": [
          [
            [143.8796, -30.243],
            [143.456, -32.764],
            [145.3452, -32.76],
            [145.134, -30.87],
            [143.123, -30.765]
          ]
        ],
        "type": "Polygon"
      },
      "id": "Some_ID_02",
      "properties": {
        "parameters": "elevation"
      },
      "type": "Feature"
    }
  ],
  "type": "FeatureCollection"
}

I'm trying to remove duplicate IDs and keep only the newest version (i.e. Some_ID_01 and Some_ID_02 are considered duplicates for my purposes, and I would like to keep Some_ID_02). These "duplicates" are not in any particular order (though it would be great if I could sort them in the process, probably alphabetically), and they don't necessarily have the same coordinates, since each one is a newer version of the same feature.
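The duplicate rule described here (IDs that differ only in a trailing version number) can be written as a small helper. This is a minimal Python sketch, assuming the version is always the numeric text after the last underscore; the name `split_id` is hypothetical. Comparing the version as an integer rather than a string means `_10` still ranks above `_9` even if the zero-padding is inconsistent.

```python
def split_id(feature_id):
    """Split an ID such as 'Some_ID_02' into its base name and numeric version.

    Assumes (hypothetically) that the version is always the digits after the
    last underscore; a non-numeric suffix raises ValueError from int().
    """
    base, _, version = feature_id.rpartition("_")
    return base, int(version)


print(split_id("Some_ID_02"))       # ('Some_ID', 2)
print(split_id("Some_OtherID_01"))  # ('Some_OtherID', 1)
```

Two features are then duplicates when their base names match, and the one with the larger version is the one to keep.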

So far I have read a couple of guides on removing duplicate JSON entries (in particular, I tried modifying the code from this guide here), but I don't know enough JavaScript to adapt it to my needs. I'm reading up on Underscore.js to see if that would help (based on suggestions in other threads), and I'm also going to look into Python or Excel (as a CSV file) to see if either of those makes this simpler.

Would it be possible to feed the GeoJSON file into the program and get a file back (even if it's a text file), or would it be simpler to paste the data inline?
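Feeding in a file and getting a file back is straightforward in Python. Below is a minimal sketch of a command-line script that takes an input path and an output path, keeps the highest version of each base ID, sorts the survivors alphabetically by ID, and writes a complete FeatureCollection back out. It assumes the same hypothetical last-underscore version rule as above; the script name `dedupe_geojson.py` and the function names are illustrative, not from any library.

```python
import json
import sys


def split_id(feature_id):
    # Hypothetical rule: the version is the numeric text after the last underscore,
    # e.g. 'Some_ID_02' -> ('Some_ID', 2).
    base, _, version = feature_id.rpartition("_")
    return base, int(version)


def keep_newest(collection):
    """Return a copy of a FeatureCollection holding only the newest version
    of each base ID, sorted alphabetically by ID."""
    newest = {}  # base ID -> (version, feature)
    for feature in collection["features"]:
        base, version = split_id(feature["id"])
        # Strict '>' keeps the first feature seen if the same version appears twice
        if base not in newest or version > newest[base][0]:
            newest[base] = (version, feature)
    features = sorted((feature for _, feature in newest.values()),
                      key=lambda feature: feature["id"])
    # Copy the top-level members (e.g. "type") and replace only "features"
    return dict(collection, features=features)


def main(in_path, out_path):
    with open(in_path) as infile:
        collection = json.load(infile)
    result = keep_newest(collection)
    with open(out_path, "w") as outfile:
        json.dump(result, outfile, indent=2)
    removed = len(collection["features"]) - len(result["features"])
    print("Removed", removed, "entries from", in_path)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python dedupe_geojson.py INPUT.geojson OUTPUT.geojson",
              file=sys.stderr)
```

Usage would be something like `python dedupe_geojson.py features.json features_dedup.geojson`. The dictionary keeps only one feature per base ID, so a single pass over the file is enough; note that `json.load` reads the whole file into memory, which may matter for very large inputs.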

Keki

1 Answer


I opted to go with Python, as I was stronger in that language. I will post my code below for reference, but you can also find another post I made here with more details about a problem I faced when removing keys from a dictionary using a list.

import json

# Load the whole FeatureCollection into memory
with open('features.json') as json_file:
    json_data = json.load(json_file)

dictionaryOfJsonId = {}   # base ID -> list of version suffixes, e.g. 'Some_ID_' -> ['01', '02']

for values in json_data['features']:    # group the feature IDs by base name
    stringToSplit = values["id"]        # the id value from the json file
    newKey = stringToSplit[:-2]         # everything except the last 2 characters (the base name)
    newValue = stringToSplit[-2:]       # the last two characters (the version)

    if newKey in dictionaryOfJsonId:
        dictionaryOfJsonId[newKey].append(newValue)
    else:
        dictionaryOfJsonId[newKey] = [newValue]


idsToKeep = set()
for key in dictionaryOfJsonId:          # pick the newest version of every base name
    # max() compares the suffixes as strings, which works while they are
    # two zero-padded digits ('01' ... '99')
    idsToKeep.add(key + max(dictionaryOfJsonId[key]))


# Keep only the features whose ID is the newest version of its group
good_features = [i for i in json_data['features'] if i['id'] in idsToKeep]
removedCount = len(json_data['features']) - len(good_features)


json_data['features'] = good_features           # keep the FeatureCollection wrapper so the output is valid GeoJSON
with open('features.geojson', 'w') as outfile:  # write the filtered collection to a new file
    json.dump(json_data, outfile)


print("Removed", removedCount, "entries from JSON")
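To double-check the written file, here is a small sketch (the function name `remaining_duplicates` is hypothetical) that counts base IDs using the same "last two characters are the version" rule as the code above and reports any base name that still appears more than once.

```python
from collections import Counter


def remaining_duplicates(features):
    """Return the base IDs that still occur more than once.

    Uses the same rule as the answer's code: the base ID is the ID
    minus its last two characters (the version suffix).
    """
    counts = Counter(feature["id"][:-2] for feature in features)
    return sorted(base for base, count in counts.items() if count > 1)
```

Passing it the `features` list loaded from features.geojson should return an empty list if the deduplication worked.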
Keki