I have a JSON array of objects in which some entries are randomly repeated. I want to remove the repeated entries based on the "word" key
and keep only the first occurrence, as in this example:
{ "word" : "Apple", "meaning" : "First meaning" },
{ "word" : "Ball", "meaning" : " \u090f\u0909\u091f\u093e" },
{ "word" : "Cat", "meaning" : " \u090f\u0909\u091f\u093e" },
{ "word" : "Apple", "meaning" : "Repeated, but has another meaning" },
{ "word" : "Doll", "meaning" : " \u090f\u0909\u091f\u093e" },
I'm a Python beginner and haven't been able to get beyond this attempt so far:
#!/usr/bin
import json

source="/var/www/dictionary/repeated.json"
destination="/var/www/dictionary/corrected.json"

def remove_redundant():
    with open(source, "r") as src:
        src_object = json.load(src)
        for i in xrange(len(src_object)):
            escape = 1
            for j in xrange(len(src_object)):
                if src_object[j]["word"] == src_object[i]["word"]:
                    # leave the first occurrence
                    if escape == 1:
                        escape = 2
                        continue
                    else:
                        src_object.pop(j)
        # open(destination, "w+").write(json.dumps(src_object, sort_keys=True, indent=4, separators=(',', ': ')))
        src.close()

remove_redundant()
The error I keep getting is IndexError: list index out of range,
because the length of the list keeps changing while the loops are still running over the original range. Thanks for any help.
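
For reference, one way to avoid the changing-length problem is to build a new list instead of popping from the list being iterated. The following is only a minimal sketch under that assumption, not a definitive solution: the names `remove_repeated`, `seen`, and `sample` are hypothetical, and the sample data is a shortened stand-in for the real file.

```python
import json

def remove_repeated(entries, key="word"):
    """Return a new list that keeps only the first entry for each value of `key`."""
    seen = set()      # values of `key` that have already been kept
    unique = []       # entries in their original order, repeats dropped
    for entry in entries:
        if entry[key] not in seen:
            seen.add(entry[key])
            unique.append(entry)
    return unique

# Hypothetical stand-in for the contents of repeated.json
sample = json.loads("""[
    {"word": "Apple", "meaning": "First meaning"},
    {"word": "Ball", "meaning": "b"},
    {"word": "Apple", "meaning": "Repeated, but has another meaning"}
]""")

result = remove_repeated(sample)
print([entry["word"] for entry in result])
```

Because the input list is never modified, no index can fall out of range, and the set lookup keeps this to a single pass over the data rather than a nested loop.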