I have a GeoJSON file with duplicate coordinates stored as nested arrays. A single feature for reference; see the `"coordinates"` property towards the end:
{"type": "FeatureCollection", "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"}}, "features": [{"type": "Feature", "properties": {"original": {"Basin": "Valley and Ridge", "Lithology": "Shale", "Shale_play": "Conasauga", "Source": "EIA", "Area_sq_mi": 3240.58067885, "Area_sq_km": 8393.06542771, "Age_shale": "Cambrian", "Age_color": 1}, "required": {"unit": null, "viz_dim": "Area_sq_mi", "legend": "Tight Oil/Shale Gas", "years": []}, "optional": {"description": ""}, "type": {"primary": "shale", "secondary": "gas"}}, "geometry": {"type": "Polygon", "coordinates": [[[-87.37001800036694, 33.014571999710284], [-87.40415900005101, 33.02085199986492], [-87.32181299995092, 33.09078200035857], [-87.1556450003654, 33.23547099972991], [-87.04042899992734, 33.369561999932664], [-86.95666900036612, 33.44300300031086], [-86.8689999998124, 33.48764400030333], [-86.72767400031411, 33.59278400032282], [-86.50615199977062, 33.63360699988983], [-86.27221800030279, 33.7633590001767], [-86.0292749998697, 33.95005700030741], [-85.65364500010934, 34.20510899976165], [-85.48674699987951, 34.29893200020162], [-85.32838800026164, 34.33198600032085], [-85.19582200040985, 34.413098000151734], [-85.11523100025278, 34.571478000220715], [-85.1240029995949, 34.69609600001083], [-85.02976600007895, 34.92122100030121], [-85.0158389996143, 35.016744000332146], [-84.96455600003074, 35.05100200027126], [-84.9104880002841, 35.04050700010535], [-84.88522600007232, 34.966663999880176], [-84.9080849997126, 34.829821999995815], [-84.92937599990591, 34.597192000205226], [-85.01071799972527, 34.42299399983415], [-85.15765899975534, 34.22100600019924], [-85.21265299996477, 34.164361999669545], [-85.3834629997636, 34.0677709999621], [-85.485048000013, 33.963717000175365], [-85.65833099980517, 33.90826299963804], [-85.85014100044248, 33.88146900022848], [-85.95754299999173, 33.86326899965542], [-86.02704000042596, 33.80956099998609], [-86.06997199971867, 33.730073000042616], 
[-86.14711699956369, 33.657195000306224], [-86.24724099959798, 33.558865000003706], [-86.32383599962213, 33.51779899977148], [-86.46862800042221, 33.528083000339706], [-86.59074499990832, 33.50300299973619], [-86.68252000019984, 33.42025199969136], [-86.8427049999059, 33.276867000263394], [-87.02894500021725, 33.165200999956], [-87.19579499986818, 33.07244600001068], [-87.37001800036694, 33.014571999710284]]]}}, ...
I'm trying to use Python to remove these duplicate points by converting them into a `set`, but I keep running into `TypeError: unhashable type: 'list'`. I understand that Python parses JSON arrays as lists, and that mutable types aren't hashable (which is why they can't be set members or dictionary keys), so I've made various attempts to convert the lists into tuples. But when I try `set(map(tuple, coords))` or `list(set(map(tuple, ppoint)))`, as suggested in this SO post, I still get `TypeError: unhashable type: 'list'`.
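To illustrate (with made-up, shortened coordinates), here is a minimal reproduction of what I'm seeing. A GeoJSON Polygon's `coordinates` value is a list of *rings*, and each ring is a list of `[x, y]` points, so tupling only the outer level still leaves lists inside:

```python
# Minimal reproduction with made-up coordinates: a Polygon's "coordinates"
# is a list of rings, and each ring is a list of [x, y] point lists.
coords = [[[-87.37, 33.01], [-87.40, 33.02], [-87.37, 33.01]]]

# map(tuple, coords) tuples each *ring*; the points inside are still lists,
# and a tuple that contains lists is itself unhashable:
try:
    set(map(tuple, coords))
except TypeError as e:
    print(e)  # unhashable type: 'list'

# Tupling one level deeper (each point) does work for a single ring:
ring = coords[0]
unique = set(map(tuple, ring))
print(sorted(unique))  # [(-87.4, 33.02), (-87.37, 33.01)]
```

So the error seems to come from the extra level of nesting, not from the conversion itself.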
Full script for reference:

```python
# basic script to remove duplicate points from geojson coordinates
# a response to SO post at https://stackoverflow.com/questions/70071555/how-do-i-create-a-mongodb-2dsphere-index-on-a-document-containing-an-array-of-ar
try:
    import simplejson as json
except ImportError:
    import json

input = '../data/TightOil_ShaleGas_US_Aug2015.geojson'
output = '../data/new_data/TightOil_ShaleGas_US_Aug2015_no_duplicates.geojson'

with open(input, 'r') as f:
    file_data = json.loads(f.read())

with open(output, 'w') as f:
    for feature in file_data["features"]:
        coords = feature["geometry"]["coordinates"]
        set(map(tuple, coords))  # raises TypeError: unhashable type: 'list'
        print(type(coords))
```
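For what it's worth, here is a sketch of what I *think* I need to end up with (the helper name, the per-ring loop, and the ring-closing step are my own guesses). It tuples each point so it is hashable, drops duplicates while preserving order (a plain `set` would scramble the ring), and converts back to lists for JSON. Is something like this the right direction?

```python
import json

def dedupe_ring(ring):
    # Tuple each [x, y] point so it is hashable, then use dict.fromkeys
    # to drop duplicates while preserving insertion order.
    unique = list(dict.fromkeys(map(tuple, ring)))
    # GeoJSON rings must be closed (first point == last point), and the
    # dedupe step strips the closing point, so re-append it here.
    unique.append(unique[0])
    # Convert the tuples back to lists so json.dump writes plain arrays.
    return [list(pt) for pt in unique]

# Hypothetical usage against the structure above:
# for feature in file_data["features"]:
#     rings = feature["geometry"]["coordinates"]
#     feature["geometry"]["coordinates"] = [dedupe_ring(r) for r in rings]

ring = [[-87.37, 33.01], [-87.40, 33.02], [-87.40, 33.02], [-87.37, 33.01]]
print(json.dumps(dedupe_ring(ring)))
# [[-87.37, 33.01], [-87.4, 33.02], [-87.37, 33.01]]
```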
I'm also experiencing issues even just casting my lists to tuples. When I run:

```python
...
for feature in file_data["features"]:
    coords = feature["geometry"]["coordinates"]
    tuple(coords)
    print(type(coords))
```

I get `<class 'list'>` for each feature.
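As a sanity check (my own minimal example), if I actually bind the result, the cast does produce a tuple; `tuple()` just returns a new object rather than converting the list in place:

```python
coords = [[1.0, 2.0], [1.0, 2.0]]

# tuple() builds a NEW tuple from coords; it does not convert
# coords in place, so type(coords) afterwards is still list.
as_tuple = tuple(coords)
print(type(coords))    # <class 'list'>
print(type(as_tuple))  # <class 'tuple'>
```

So the bare `tuple(coords)` in my loop above is discarded, which I assume explains that part.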
How do I remove these duplicates and wind up with a type that I can use in a dictionary to write out as an array to a JSON file?