
I have a GeoJSON file with duplicate coordinates stored as nested arrays. Here is a single feature for reference; see the "coordinates" property towards the end:

{"type": "FeatureCollection", "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"}}, "features": [{"type": "Feature", "properties": {"original": {"Basin": "Valley and Ridge", "Lithology": "Shale", "Shale_play": "Conasauga", "Source": "EIA", "Area_sq_mi": 3240.58067885, "Area_sq_km": 8393.06542771, "Age_shale": "Cambrian", "Age_color": 1}, "required": {"unit": null, "viz_dim": "Area_sq_mi", "legend": "Tight Oil/Shale Gas", "years": []}, "optional": {"description": ""}, "type": {"primary": "shale", "secondary": "gas"}}, "geometry": {"type": "Polygon", "coordinates": [[[-87.37001800036694, 33.014571999710284], [-87.40415900005101, 33.02085199986492], [-87.32181299995092, 33.09078200035857], [-87.1556450003654, 33.23547099972991], [-87.04042899992734, 33.369561999932664], [-86.95666900036612, 33.44300300031086], [-86.8689999998124, 33.48764400030333], [-86.72767400031411, 33.59278400032282], [-86.50615199977062, 33.63360699988983], [-86.27221800030279, 33.7633590001767], [-86.0292749998697, 33.95005700030741], [-85.65364500010934, 34.20510899976165], [-85.48674699987951, 34.29893200020162], [-85.32838800026164, 34.33198600032085], [-85.19582200040985, 34.413098000151734], [-85.11523100025278, 34.571478000220715], [-85.1240029995949, 34.69609600001083], [-85.02976600007895, 34.92122100030121], [-85.0158389996143, 35.016744000332146], [-84.96455600003074, 35.05100200027126], [-84.9104880002841, 35.04050700010535], [-84.88522600007232, 34.966663999880176], [-84.9080849997126, 34.829821999995815], [-84.92937599990591, 34.597192000205226], [-85.01071799972527, 34.42299399983415], [-85.15765899975534, 34.22100600019924], [-85.21265299996477, 34.164361999669545], [-85.3834629997636, 34.0677709999621], [-85.485048000013, 33.963717000175365], [-85.65833099980517, 33.90826299963804], [-85.85014100044248, 33.88146900022848], [-85.95754299999173, 33.86326899965542], [-86.02704000042596, 33.80956099998609], [-86.06997199971867, 33.730073000042616], 
[-86.14711699956369, 33.657195000306224], [-86.24724099959798, 33.558865000003706], [-86.32383599962213, 33.51779899977148], [-86.46862800042221, 33.528083000339706], [-86.59074499990832, 33.50300299973619], [-86.68252000019984, 33.42025199969136], [-86.8427049999059, 33.276867000263394], [-87.02894500021725, 33.165200999956], [-87.19579499986818, 33.07244600001068], [-87.37001800036694, 33.014571999710284]]]}}, ...

I'm trying to use Python to remove these duplicate points by converting them into a set, but I keep running into TypeError: unhashable type: 'list'. I understand that Python loads JSON arrays as lists, and since a mutable type can't be used as a set element or dictionary key, I've made various attempts to convert the lists into tuples. However, when I try set(map(tuple, coords)) or list(set(map(tuple, ppoint))), as suggested in this SO post, I still get TypeError: unhashable type: 'list'.

Full script for reference:

# basic script to remove duplicate points from geojson coordinates
# a response to SO post at https://stackoverflow.com/questions/70071555/how-do-i-create-a-mongodb-2dsphere-index-on-a-document-containing-an-array-of-ar

try:
    import simplejson as json
except ImportError:
    import json

input = '../data/TightOil_ShaleGas_US_Aug2015.geojson'
output = '../data/new_data/TightOil_ShaleGas_US_Aug2015_no_duplicates.geojson'
with open(input, 'r') as f:
    file_data = json.loads(f.read())

with open(output, 'w') as f:
    for feature in file_data["features"]:
        coords = feature["geometry"]["coordinates"]
        set(map(tuple, coords))
        print(type(coords))

I'm also having trouble just converting my lists to tuples. When I run:

...

for feature in file_data["features"]:
    coords = feature["geometry"]["coordinates"]
    tuple(coords)
    print(type(coords))

I still get <class 'list'> for every feature.

How do I remove these duplicates and wind up with a type that I can use in a dictionary to write out as an array to a JSON file?

bkleeman

1 Answer


Here are a few ideas:

  1. In your second example, the line tuple(coords) builds a new tuple, but it doesn't alter the coords variable and the result is thrown away. You probably want to write coords = tuple(coords) to get the effect you are looking for.

  2. In your first example, you have the same problem with the line set(map(tuple, coords)). You need to assign the result to a variable so you can use it in the rest of your code.

  3. In your first example, the coordinates are apparently a list-of-lists-of-lists, but it seems that you are treating them as a list-of-lists. The top-level list has only one element. Do you know if that is always the case? Do you know why there is an extra layer of lists here? If you knew that the top-level list always had only one element, then you could use something like coords = set(map(tuple, coords[0])), but if the top-level list might be longer, then you would need to process every element in it as well.
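Putting these points together, here is a minimal sketch, assuming the file only needs its Polygon and MultiPolygon geometries cleaned. In GeoJSON (RFC 7946), a Polygon's "coordinates" is a list of linear rings (exterior first, then any holes), which is where the extra layer of lists comes from. The sketch uses dict.fromkeys instead of set because a set discards vertex order, which matters for a polygon. It then re-appends the first position, because GeoJSON requires each ring's first and last positions to be identical, so that closing point is a "duplicate" that has to stay. The helper names (dedupe_ring, dedupe_polygon, dedupe_feature_collection) and the sample data are hypothetical. Be aware that dropping a vertex that the ring genuinely revisits can change the polygon's shape, so it is worth checking the output.

```python
import json


def dedupe_ring(ring):
    """Drop repeated positions from one linear ring, keeping first-seen order,
    then re-close the ring so its first and last positions match again."""
    # dict.fromkeys keeps insertion order (Python 3.7+); a set would not.
    unique = list(dict.fromkeys(tuple(pos) for pos in ring))
    # GeoJSON rings must end where they start, so restore the closing position.
    if unique and unique[0] != unique[-1]:
        unique.append(unique[0])
    return [list(pos) for pos in unique]


def dedupe_polygon(rings):
    # Polygon coordinates are a list of rings: the exterior first, then any holes.
    return [dedupe_ring(ring) for ring in rings]


def dedupe_feature_collection(data):
    """Deduplicate Polygon and MultiPolygon geometries in place.
    Other geometry types (and null geometries) are left untouched."""
    for feature in data["features"]:
        geom = feature.get("geometry")
        if not geom:
            continue
        if geom["type"] == "Polygon":
            geom["coordinates"] = dedupe_polygon(geom["coordinates"])
        elif geom["type"] == "MultiPolygon":
            # MultiPolygon coordinates are a list of Polygon coordinate lists.
            geom["coordinates"] = [dedupe_polygon(p) for p in geom["coordinates"]]
    return data


# A tiny, made-up FeatureCollection: one ring with an interior duplicate (1.0, 0.0).
sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0],
                             [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]],
        },
    }],
}

# Reproduce the original error: map(tuple, ...) over the *rings* yields
# tuples that still contain lists, and those cannot be hashed.
error_message = None
try:
    set(map(tuple, sample["features"][0]["geometry"]["coordinates"]))
except TypeError as exc:
    error_message = str(exc)
print(error_message)

dedupe_feature_collection(sample)
print(json.dumps(sample["features"][0]["geometry"]["coordinates"]))
# [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]]
```

To apply this to the file in the question, you could load it with json.load, pass the result to dedupe_feature_collection, and write it back with json.dump(file_data, f) inside the existing with open(output, 'w') as f: block. The sketch converts positions back to lists, although the json module would also serialize tuples as JSON arrays.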

Andrew Merrill