
I have ~2000 JSON files that contain duplicated objects like:

{
    "vulnerabilities": [],
    "ok": true,
    "dependencyCount": 2,
    "org": "org",
    "isPrivate": true,
    "licensesPolicy": {
      "severities": {},
      "orgLicenseRules": {
        "AGPL-1.0": {
          "licenseType": "AGPL-1.0",
          "severity": "high",
          "instructions": ""
        }
      }
    }
}
{
    "vulnerabilities": [],
    "ok": true,
    "dependencyCount": 2,
    "org": "org",
    "isPrivate": true,
    "licensesPolicy": {
      "severities": {},
      "orgLicenseRules": {
        "AGPL-1.0": {
          "licenseType": "AGPL-1.0",
          "severity": "high",
          "instructions": ""
        }
      }
    }
}

I essentially want to remove the duplicate entries from a directory of JSON files.

How might I do that?

Jshee
  • Maybe you can iterate through the JSON files, build a new array, and each time you find an object check whether it's already in that array before adding it. – Raffon Feb 14 '21 at 13:11
  • @Raffon The OP has 2000 files, and we don't even know how many items are in each file. That method could end up using a massive amount of memory. – Doan Van Thang Feb 14 '21 at 19:44
  • What happens if only the `vulnerabilities` key is duplicated and the other keys aren't? For example `org`: will you keep it or discard it? – Doan Van Thang Feb 14 '21 at 19:50
  • Maybe this can help you: https://stackoverflow.com/questions/17967114/how-to-efficiently-remove-duplicates-from-an-array-without-using-set (see the sketch below these comments). – Doan Van Thang Feb 14 '21 at 20:06
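
Building on the set-based suggestion in the comments, here is a minimal Python sketch. It makes a few assumptions not stated in the question: each file is a stream of concatenated JSON objects exactly as shown above, duplicates should be removed within each file, and "reports" is a stand-in for the actual directory name:

import json
from pathlib import Path

def iter_json_objects(text):
    # Yield successive objects from a stream of concatenated JSON values.
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # raw_decode() rejects leading whitespace, so skip it first.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

def dedupe_file(path):
    seen = set()
    unique = []
    for obj in iter_json_objects(path.read_text()):
        # Canonical serialization, so key order does not affect equality.
        key = json.dumps(obj, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(obj)
    path.write_text("\n".join(json.dumps(o, indent=2) for o in unique))

for path in Path("reports").glob("*.json"):  # hypothetical directory name
    dedupe_file(path)

To deduplicate across all ~2000 files instead, hoist `seen` out of `dedupe_file` so it is shared between calls; storing a hash of each canonical dump (e.g. via hashlib.sha256) rather than the full string keeps the memory concern raised in the comments in check.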

0 Answers