Sort & uniq in Linux shell

Question

What is the difference between the following two commands?

sort -u FILE

sort FILE | uniq

When you ran them, what did you see? Did you try collecting timing differences for different sized files? You could run a few experiments and post the results as part of your question. — S.Lott, Aug 01 '10 at 17:08
I want to know if there is a special case where the two commands behave differently; in normal execution they both give the same results — yassin, Aug 01 '10 at 17:12
[What is the difference between “sort -u” and “sort | uniq”?](http://unix.stackexchange.com/q/76049/17265) — mtk, Apr 15 '14 at 10:04
@mtk: The Q&A on U&L is essentially a duplicate of this, but was asked years after this question. The historical commentary in the answer is interesting. The `-u` option was in UNIX 7th Edition `sort` (circa 1979), so the ancient history referred to is truly ancient. — Jonathan Leffler, Feb 18 '15 at 16:31
@Jonathan Well, I posted it for the same reason: the historical commentary is interesting :) Plus it has some timing experiments illustrated. — mtk, Feb 19 '15 at 10:16

score 91 · Accepted Answer · edited Jan 24 '18 at 16:12

Using sort -u does less I/O than sort | uniq, but the end result is the same. In particular, if the file is big enough that sort has to create intermediate files, there's a decent chance that sort -u will use slightly fewer or slightly smaller intermediate files, since it can eliminate duplicates while sorting each set. If the data is highly duplicative, this could be beneficial; if there are few duplicates, it won't make much difference (definitely a second-order performance effect, compared to the first-order effect of the pipe).
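A minimal sketch of the kind of timing experiment suggested in the comments, assuming bash and GNU coreutils. The file names (dups.txt, out_sort_u.txt, out_sort_uniq.txt) and sizes are arbitrary choices for this demo, and the timings will vary by machine; the point is only to compare the two forms on highly duplicative data and confirm the outputs match.

```shell
#!/usr/bin/env bash
# Hypothetical demo files: dups.txt, out_sort_u.txt, out_sort_uniq.txt.
set -euo pipefail

# 200,000 lines drawn from only 1,000 distinct values (0..999).
seq 1 200000 | awk '{ print $1 % 1000 }' > dups.txt

# bash's `time` keyword reports on stderr and times the whole pipeline.
time sort -u dups.txt > out_sort_u.txt
time sort dups.txt | uniq > out_sort_uniq.txt

# Same data, same locale, so the outputs should be byte-for-byte identical;
# under set -e the script stops here if cmp finds a difference.
cmp out_sort_u.txt out_sort_uniq.txt
echo "identical: $(wc -l < out_sort_u.txt) unique lines"
```

Varying the seq upper bound or the modulus changes the file size and how duplicative the data is.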

Note that there are times when the pipe is appropriate. For example:

sort FILE | uniq -c | sort -n

This sorts the file into order of the number of occurrences of each line in the file, with the most repeated lines appearing last. (It wouldn't surprise me to find that this combination, which is idiomatic for Unix or POSIX, can be squished into one complex 'sort' command with GNU sort.)
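A small sketch of the counting idiom on a throwaway sample file (words.txt is a hypothetical name created just for this demo), assuming bash and standard sort/uniq.

```shell
#!/usr/bin/env bash
# words.txt is a throwaway sample: pear x3, apple x2, fig x1.
printf '%s\n' pear apple pear fig apple pear > words.txt

# sort groups equal lines together, uniq -c prefixes each distinct line
# with its count, and sort -n orders by that leading numeric count,
# so the most repeated line (pear) comes last.
sort words.txt | uniq -c | sort -n

# Reversing the final sort (-rn) puts the most repeated line first.
sort words.txt | uniq -c | sort -rn | head -n 1
```

The first sort is what makes the pipe necessary here: uniq -c only counts adjacent duplicates, so the input must already be grouped.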

There are times when not using the pipe is important. For example:

sort -u -o FILE FILE

This sorts the file 'in situ'; that is, the output file is specified by -o FILE, and this operation is guaranteed safe (the file is read before being overwritten for output).
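A sketch contrasting -o with shell redirection, assuming bash; data_o.txt and data_redir.txt are hypothetical demo files. With `> FILE`, the shell truncates FILE before sort even starts, so sort reads an empty file.

```shell
#!/usr/bin/env bash
# Two copies of the same unsorted data (hypothetical demo files).
printf '%s\n' b a b c > data_o.txt
cp data_o.txt data_redir.txt

# Safe: sort allows the -o file to be one of its inputs, so the data
# is read before the file is overwritten.
sort -u -o data_o.txt data_o.txt

# Unsafe: the shell empties data_redir.txt before sort runs.
sort -u data_redir.txt > data_redir.txt

cat data_o.txt            # a, b, c on separate lines
wc -c < data_redir.txt    # 0 bytes: the data is gone
```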

GNU sort does not have a way to do all of `sort | uniq -c | sort -n`, and neither have I found any other tool to do it efficiently. Seems like a worthwhile thing to code up. — mc0e, Feb 18 '15 at 16:23

score 11 · Answer 2 · answered Aug 01 '10 at 19:05

There is one slight difference: return code.

The thing is that unless the pipefail option is set (set -o pipefail, or equivalently shopt -so pipefail in bash), the exit status of a pipeline is the exit status of its last command. Here uniq returns zero (success), because it simply reads empty input. Examine the exit status and you'll see something like this (pipefail is not set here):

pavel@lonely ~ $ sort -u file_that_doesnt_exist ; echo $?
sort: open failed: file_that_doesnt_exist: No such file or directory
2
pavel@lonely ~ $ sort file_that_doesnt_exist | uniq ; echo $?
sort: open failed: file_that_doesnt_exist: No such file or directory
0
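Assuming bash (PIPESTATUS is a bash feature), a short sketch of two ways to recover sort's failure from the pipeline. missing_file.txt is a hypothetical name chosen so that it does not exist.

```shell
#!/usr/bin/env bash
# missing_file.txt is a hypothetical name; make sure it really is missing.
rm -f missing_file.txt

# Default: the pipeline's status is uniq's, and uniq succeeds on empty input.
sort missing_file.txt | uniq
default_status=$?

# PIPESTATUS (bash only) records the status of each command in the last
# pipeline; it must be read immediately, before another command resets it.
sort missing_file.txt | uniq
statuses=("${PIPESTATUS[@]}")

# With pipefail, the pipeline reports the rightmost non-zero status (sort's).
set -o pipefail
sort missing_file.txt | uniq
pipefail_status=$?
set +o pipefail

echo "default: $default_status, per command: ${statuses[*]}, pipefail: $pipefail_status"
```

With GNU sort, which exits with status 2 on errors as in the transcript above, the last line should read "default: 0, per command: 2 0, pipefail: 2".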

Other than this, the commands are equivalent.

score 9 · Answer 3 · answered Aug 23 '12 at 20:36

Beware! While it's true that "sort -u" and "sort|uniq" are equivalent, any additional options to sort can break the equivalence. Here's an example from the coreutils manual:

For example, 'sort -n -u' inspects only the value of the initial numeric string when checking for uniqueness, whereas 'sort -n | uniq' inspects the entire line.

Similarly, if you sort on key fields, the uniqueness test used by sort won't necessarily look at the entire line anymore. After being bitten by that bug in the past, these days I tend to use "sort|uniq" when writing Bash scripts. I'd rather have higher I/O overhead than run the risk that someone else in the shop won't know about that particular pitfall when they modify my code to add additional sort parameters.
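A sketch of the pitfall described above, assuming bash and a POSIX-style sort; nums.txt and keys.txt are throwaway demo files. With -n or -k, sort -u judges uniqueness on the sort key only, while uniq compares whole lines.

```shell
#!/usr/bin/env bash
# Hypothetical sample files whose lines differ only outside the sort key.
printf '%s\n' '1 apple' '1 banana' '2 cherry' > nums.txt
printf '%s\n' 'a,1' 'a,2' 'b,1' > keys.txt

# -n -u: "1 apple" and "1 banana" have equal numeric keys, so one is dropped.
sort -n -u nums.txt | wc -l               # 2 lines
sort -n nums.txt | uniq | wc -l           # 3 lines

# -k -u: uniqueness is judged on field 1 only, so "a,1" and "a,2" collapse.
sort -t, -k1,1 -u keys.txt | wc -l        # 2 lines
sort -t, -k1,1 keys.txt | uniq | wc -l    # 3 lines
```

If whole-line uniqueness is what you want alongside key options, keeping the separate uniq step gives that behavior.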

score 7 · Answer 4 · edited May 23 '17 at 12:25

sort -u will be slightly faster, because it does not need to pipe the output between two commands.

Also see my question on the topic: calling uniq and sort in different orders in shell

answered Aug 01 '10 at 17:11 by knittl

score 3 · Answer 5 · edited Apr 15 '15 at 21:06

I have worked on some servers where sort doesn't support the -u option. There we have to use

sort xyz | uniq

answered Aug 02 '10 at 12:11 by Hemant; edited by Chris Seymour

Would you care to specify which servers with which o/s version and when, roughly? The 7th Edition UNIX™ `sort` supported `-u` and that was the first widely used version of UNIX, so all others (System III, System V, BSD, etc) tended to follow it, so I'd be surprised indeed to find a Unix-like system where `sort` did not support `-u`. – Jonathan Leffler Dec 27 '12 at 15:05

score 2 · Answer 6 · answered Aug 01 '10 at 17:09

Nothing; they produce the same result.

answered Aug 01 '10 at 17:09 by Jauzsika
