I have a JSON file which contains some duplicates, and I am looking for a way to remove them. Here are two examples of the beginning of my JSON records:
"date": "May 16, 2012 Wednesday", "body": "THE future of one of Scotland's most important listed buildings .... World Monuments Fund. o See a picture gallery of Mavisbank House at scotsman.com/scotland ", "title": "Rescue deal to bring Adam mansion back from brink"
"date": "May 16, 2012 Wednesday", "body": "The future of one of Scotland's most important listed buildings .... World Monuments Fund.", "title": "Rescue deal to bring Adam mansion back from brink"
I have cut the text in the middle because it is long and not relevant here: the two versions match perfectly in that part. As we can see, the texts match almost 100%, except at the beginning ("THE" vs "The") and at the end (an extra sentence: "o See a picture gallery of Mavisbank House at scotsman.com/scotland"). With this in mind, I would like to come up with a way to I) find the duplicates and II) remove one of the duplicates (note that there can also be more than one duplicate of the same text). I have just started programming in Python and I am not sure how to handle this problem. Any help is really appreciated!
Kind regards!
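
One possible starting point, as a rough sketch rather than a definitive solution: treat two records as duplicates when they share the same "date" and "title" and their "body" texts are nearly identical after lowercasing and collapsing whitespace. The sketch below uses difflib.SequenceMatcher from the standard library for the similarity score. It assumes the JSON file holds a top-level list of objects with the keys "date", "body" and "title"; the function names, the file paths and the 0.9 threshold are all hypothetical and would need tuning on the real data.

```python
import json
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so 'THE' and 'The' compare equal."""
    return " ".join(text.lower().split())


def is_near_duplicate(body_a, body_b, threshold=0.9):
    """Return True if the two bodies are at least `threshold` similar.

    The threshold of 0.9 is an assumed value, not something taken from the data.
    """
    matcher = SequenceMatcher(None, normalize(body_a), normalize(body_b), autojunk=False)
    return matcher.ratio() >= threshold


def remove_near_duplicates(records, threshold=0.9):
    """Keep the first record of every group of near-duplicates, in input order."""
    kept = []
    for record in records:
        duplicate = any(
            # The cheap equality checks run first; the slower body comparison
            # only happens for records with the same date and title.
            record["date"] == other["date"]
            and record["title"] == other["title"]
            and is_near_duplicate(record["body"], other["body"], threshold)
            for other in kept
        )
        if not duplicate:
            kept.append(record)
    return kept


def dedupe_file(in_path, out_path, threshold=0.9):
    """Read a JSON list of records, drop near-duplicates, write the result."""
    with open(in_path, encoding="utf-8") as handle:
        records = json.load(handle)  # assumed to be a list of dicts
    unique = remove_near_duplicates(records, threshold)
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(unique, handle, ensure_ascii=False, indent=2)
    return unique
```

Calling dedupe_file("articles.json", "articles_unique.json") (both file names are placeholders) would write the de-duplicated list to a new file and leave the original untouched. This version keeps whichever copy appears first; keeping the longest or shortest body instead would be a small change. Since the similarity is a ratio over the whole text, a short extra sentence at the end of a long article barely lowers the score, but on very short texts the same extra sentence could push the score below the threshold.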