
Does the order of uniq and sort make a difference when calling them in a shell script? I’m talking about time and space (memory) usage.

grep 'somePattern' | uniq | sort

vs.

grep 'somePattern' | sort | uniq

A quick test on a 140k-line text file showed a slight speed improvement (5.5 s vs. 5.0 s) for the first method (get unique values, then sort).

I don’t know how to measure memory usage, though.

The question now is: does the order make a difference? Or does it depend on the lines returned by grep (many or few duplicates)?
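
A minimal sketch of how a comparison like the one above could be reproduced in bash. The file name test.txt, the pattern 'foo' and the generated contents are hypothetical; the data cycles through 1,000 values so that duplicate lines are never adjacent. Bash's built-in time reports only elapsed and CPU time; for peak memory, GNU time (often installed as /usr/bin/time on Linux) run with -v prints a "Maximum resident set size" line.

```shell
#!/usr/bin/env bash
# Sketch: time both pipeline orders on a generated test file.
# Hypothetical assumptions: file name test.txt, pattern 'foo', and the
# generated data (140,000 lines cycling through foo0..foo999, so no two
# adjacent lines are ever equal).
seq 1 140000 | awk '{ print "foo" ($1 % 1000) }' > test.txt

echo "uniq | sort:"
time (grep 'foo' test.txt | uniq | sort > out_uniq_sort.txt)

echo "sort | uniq:"
time (grep 'foo' test.txt | sort | uniq > out_sort_uniq.txt)

# Compare how many lines each order leaves behind.
wc -l out_uniq_sort.txt out_sort_uniq.txt
```

On this particular data the final wc -l should also show that the two orders do not produce the same output: uniq | sort keeps all 140,000 lines, while sort | uniq leaves 1,000.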

knittl
  • I would humbly recommend accepting a different answer - sort -u is a more correct way of doing this than either of your alternatives. – DVK Sep 10 '09 at 14:22
  • sure, but the accepted answer explains the _why_ better – knittl Sep 10 '09 at 14:25

3 Answers


I believe that sort -u is suited to this exact scenario: it both sorts and removes duplicates in a single step. That should be more efficient than calling sort and uniq separately, in either order.
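
A quick way to check that sort -u gives the same result as sort | uniq, using a small made-up input (the fruit names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: compare `sort | uniq` with `sort -u` on a hypothetical input
# that contains non-adjacent duplicates.
input=$'pear\napple\npear\nbanana\napple'

with_uniq=$(printf '%s\n' "$input" | sort | uniq)
with_sort_u=$(printf '%s\n' "$input" | sort -u)

printf '%s\n' "$with_sort_u"   # prints apple, banana, pear (one per line)

if [ "$with_uniq" = "$with_sort_u" ]; then
  echo "sort | uniq and sort -u agree"
fi
```

Without key options the two should agree. With options such as -k or -f, sort -u keeps one line per set of lines with equal sort keys, which can differ from uniq's whole-line comparison.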

mqp
  • `sort -u` is a great hint, and no doubt, it’s more efficient than calling the two in either order. BUT, the order makes a difference (uniq | sort not working) – knittl Sep 09 '09 at 21:55
  • In a quick test, I found that `sort -u` is about 7% faster than `sort|uniq`. – Dennis Williamson Sep 09 '09 at 23:22

The only correct order is to call uniq after sort, since the man page for uniq says:

Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).

Therefore it should be

grep 'somePattern' | sort | uniq
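
To see why the order matters, here is a minimal sketch with a hypothetical four-line input. uniq collapses only the run of adjacent "b" lines, so sorting afterwards cannot remove the "b" that was separated from the others:

```shell
#!/usr/bin/env bash
# Sketch: uniq only removes *adjacent* duplicate lines.
# The input is hypothetical: b, a, b, b.
input=$'b\na\nb\nb'

only_uniq=$(printf '%s\n' "$input" | uniq)              # b, a, b
uniq_then_sort=$(printf '%s\n' "$input" | uniq | sort)  # a, b, b (duplicate survives)
sort_then_uniq=$(printf '%s\n' "$input" | sort | uniq)  # a, b

printf 'uniq:        %s\n' "$(echo $only_uniq)"
printf 'uniq | sort: %s\n' "$(echo $uniq_then_sort)"
printf 'sort | uniq: %s\n' "$(echo $sort_then_uniq)"
```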
Robert Munteanu
  • I've used | uniq | sort | uniq when grepping gigabytes worth of stuff out of sorted files just to try to keep the sort from having to sort an excessive amount of data. – Shizzmo Sep 10 '09 at 04:25
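
The uniq | sort | uniq pre-filter from the comment above can be sketched as follows. The data is hypothetical and assumes the input already contains long runs of adjacent duplicates, as grep output from sorted files tends to; the first uniq shrinks what sort has to handle, and the result is checked against sort -u.

```shell
#!/usr/bin/env bash
# Sketch of the `uniq | sort | uniq` pre-filter.
# Hypothetical data: 5,000 lines forming runs of the keys key0..key50
# (51 distinct keys), each run made of adjacent identical lines.
input=$(seq 1 5000 | awk '{ print "key" int($1 / 100) }')

total=$(printf '%s\n' "$input" | wc -l)
after_first_uniq=$(printf '%s\n' "$input" | uniq | wc -l)
result=$(printf '%s\n' "$input" | uniq | sort | uniq)
reference=$(printf '%s\n' "$input" | sort -u)

echo "lines in: $total, lines passed to sort: $after_first_uniq"

if [ "$result" = "$reference" ]; then
  echo "same result as sort -u"
fi
```

The trailing uniq is still required: the first uniq only drops adjacent repeats, so a key that shows up in more than one run (say, from two different files) would otherwise appear twice after the sort. The final sort | uniq could equally be sort -u.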

uniq depends on the input being sorted to remove all duplicates (since it only compares each line with the previous one), which is why sort is always run before uniq. Try it and see.

Sven Schott