Remove duplicate package dependencies, sort by version

Question

For example, I have a file like this:

"grunt": "0.4.5",
"grunt": "1.0.1"
"grunt": "1.0.1",
"grunt-angular-templates": "0.5.7",
"grunt-cli": "^0.1.13",
"grunt-contrib-clean": "0.6.0",
"grunt-contrib-compress": "0.12.0",
"grunt-contrib-concat": "1.0.1",

Now I want to remove lines that have duplicate prefixes but keep the ones that have more recent versions. So for the line that starts with grunt I want to keep the one that has version 1.0.1 but remove the other ones.

Is there a straightforward way to do this?

I think it'd be a simple task to remove duplicates, but determining "the more recent version" may prove to be difficult. It means that you would have to interpret and read the versions and determine which one is the latest. Here's some resources: https://stackoverflow.com/questions/4023830/how-compare-two-strings-in-dot-separated-version-format-in-bash — PressingOnAlways, Aug 18 '17 at 18:47
You didn't specify what exactly constitutes a *prefix* for you. From what I got I'd suggest `sed 's/.*//' | sort -u | head -1`. Just teasing here. :) — yacc, Aug 18 '17 at 18:54
For prefixes, the strings should match letter by letter. In this case "grunt" would be an exact match. — KanwarG, Aug 18 '17 at 19:51

randomir · Accepted Answer · 2017-08-18T20:21:27.790

A naive approach is very simple to implement:

sort -k1,1 -k2,2Vr file | sort -k1,1 -u

That is: the first sort orders lines by the first whitespace-separated field (the package name) ascending, then by the second field (the version) descending, using GNU sort's -V/--version-sort (natural sort for version numbers). The second sort invocation, with the -u/--unique flag, compares by package name only and drops all duplicates, i.e. packages with the same name but a smaller version number, since after the first pass the greatest version of each package appears first.
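
To make the two passes concrete, here is a minimal sketch that wraps the same pipeline in a function and feeds it a case where plain lexical sorting would pick the wrong line. The name latest_versions is only an illustrative choice, and the sketch assumes GNU sort and one "name": "version" pair per line.

```shell
#!/usr/bin/env bash
# latest_versions is a hypothetical helper name, not part of any tool.
# Assumes GNU sort (for -V) and one "name": "version" pair per line.
latest_versions() {
    # Pass 1: package name ascending, version descending (version-aware).
    # Pass 2: compare by package name only and keep the first line per name.
    sort -k1,1 -k2,2Vr "$@" | sort -k1,1 -u
}

# Lexically "0.6.0" sorts after "0.12.0"; -V compares 6 and 12 as numbers.
printf '%s\n' '"pkg": "0.6.0",' '"pkg": "0.12.0",' | latest_versions
# prints: "pkg": "0.12.0",
```

With no arguments the function reads standard input; with a file name (latest_versions file) it behaves like the one-liner above.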

The result for your sample input is:

"grunt": "1.0.1",
"grunt-angular-templates": "0.5.7",
"grunt-cli": "^0.1.13",
"grunt-contrib-clean": "0.6.0",
"grunt-contrib-compress": "0.12.0",
"grunt-contrib-concat": "1.0.1",

However, since npm (I'm assuming those lines come from package.json) uses semantic versioning (semver), sorting these values properly is a lot more complex than the sort approach above can handle.

For example, you would have to handle version specifiers like >=version, ~version, ^version, version1 - version2, range1 || range2, and even URLs, files/paths, GitHub URLs, tags, etc.

To handle all those (valid) versions, it's best to use a specialized tool, for example semver.
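
Before reaching for such a tool, it may help to check whether your file contains any of these specifiers at all. The following is a rough sketch, not a semver parser: it assumes one dependency per line and deliberately treats anything other than a bare x.y.z version (including prereleases) as "needs a closer look". The function name non_plain_versions is made up for this example.

```shell
#!/usr/bin/env bash
# non_plain_versions is a hypothetical helper name.
# Prints every line whose version is NOT a bare x.y.z number, i.e. the
# entries the naive sort approach cannot be trusted to order correctly.
non_plain_versions() {
    grep -Ev '^[[:space:]]*"[^"]+":[[:space:]]*"[0-9]+\.[0-9]+\.[0-9]+",?[[:space:]]*$' "$@"
}

printf '%s\n' '"grunt": "1.0.1",' '"grunt-cli": "^0.1.13",' | non_plain_versions
# prints: "grunt-cli": "^0.1.13",
```

If this prints nothing for your file, the two-pass sort should be adequate; otherwise the listed lines are the ones to resolve with a semver-aware tool.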
