How to remove duplicate lines in a file?

Question

I understand that the general approach is to use something like

$ sort file1.txt | uniq > file2.txt

But I was wondering if there was a way to do this without needing separate source and destination files, even if it means it can't be a one-liner.

See: [How can I use a file in a command and redirect output to the same file without truncating it?](https://stackoverflow.com/q/6696842/3776858) — Cyrus, May 09 '22 at 20:48

Ed Morton · Answer 1 · 2022-05-09T18:48:38.267

With GNU awk for "inplace" editing:

awk -i inplace '!seen[$0]++' file1.txt

Like every tool that supports "inplace" editing (sed -i, perl -i, ruby -i, etc.), this uses a temp file behind the scenes. The one exception is ed, which instead reads the whole file into memory first.

With any awk you can do the following, which uses no temp file but about twice the memory instead, since each unique line is held in both seen and a:

awk '!seen[$0]++{a[++n]=$0} END{for (i=1;i<=n;i++) print a[i] > FILENAME}' file
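To see what the portable version does, here is a small sketch on a throwaway file. The file contents and the variable names `demo` and `awk_result` are made up for illustration. The first occurrence of each line is kept and the original order is preserved:

```shell
# Hypothetical sample data in a throwaway file.
demo=$(mktemp)
printf '%s\n' banana apple banana cherry apple > "$demo"

# Buffer the first occurrence of each line, then overwrite the
# input file from the END block once all input has been read.
awk '!seen[$0]++{a[++n]=$0} END{for (i=1;i<=n;i++) print a[i] > FILENAME}' "$demo"

awk_result=$(cat "$demo")
printf '%s\n' "$awk_result"   # banana, apple, cherry (one per line)
rm -f "$demo"
```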

M. Nejat Aydin · Accepted Answer · 2022-05-09T20:10:08.587

3

Simply use the -o and -u options of sort:

sort -o file -u file

You don't even need to pipe into another command such as uniq.
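One thing to keep in mind is that this sorts the file, so the original line order is not kept. A minimal sketch on a throwaway file (the contents and the names `demo` and `sort_result` are invented for the example, and `LC_ALL=C` is only there to make the ordering predictable):

```shell
# Hypothetical sample data in a throwaway file.
demo=$(mktemp)
printf '%s\n' banana apple banana cherry apple > "$demo"

# -u drops duplicate lines; -o writes the result to the named file,
# which is allowed to be the same file that is being read.
LC_ALL=C sort -o "$demo" -u "$demo"

sort_result=$(cat "$demo")
printf '%s\n' "$sort_result"   # apple, banana, cherry (one per line)
rm -f "$demo"
```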

edited May 09 '22 at 20:10

answered May 09 '22 at 19:47

M. Nejat Aydin


score 2 · Answer 3 · answered May 09 '22 at 18:37

With Perl's -i:

perl -i -lne 'print unless $seen{$_}++' original.file

-i changes the file "in place";
-n reads the input line by line, running the code for each line;
-l removes newlines from input and adds them to print;
The %seen hash idiom is described in perlfaq4.

answered May 09 '22 at 18:37

choroba


I don't think `-l` is needed here: you're not manipulating the line, just treating it as an atomic string. – glenn jackman May 09 '22 at 19:01
@glennjackman: If the last line didn't end in a newline, you could get a duplicate... – choroba May 09 '22 at 19:11
@choroba if the last line didn't end in a newline then it wouldn't be a valid POSIX text file and so YMMV with what any tool does with it. – Ed Morton May 09 '22 at 19:57
@EdMorton: The OP doesn't mention POSIX, it doesn't mention "text file" either. – choroba May 09 '22 at 20:52
I know, but the `sort` utility the OP is using in the question is only guaranteed to work on text files so I think it's reasonable to assume the input is text files. – Ed Morton May 09 '22 at 22:24

score 1 · Answer 4 · answered May 09 '22 at 18:35

A common idiom is:

temp=$(mktemp)
some_pipeline < original.file > "$temp" && mv "$temp" original.file

The && is important: if the pipeline fails, then the original file won't be overwritten with (perhaps) garbage.
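Applied to this question, the idiom could look something like the sketch below. The awk filter stands in for some_pipeline, and `original` and `idiom_result` are placeholder names. Creating the temp file next to the original is my own assumption rather than part of the idiom; it keeps the mv on the same filesystem:

```shell
# Stand-in for original.file, with hypothetical contents.
original=$(mktemp)
printf '%s\n' x y x z > "$original"

# Temp file created alongside the original so mv is a plain rename.
temp=$(mktemp "${original}.XXXXXX") || exit 1

# Only replace the original if the filter succeeded;
# otherwise discard the partial output.
if awk '!seen[$0]++' < "$original" > "$temp"; then
    mv "$temp" "$original"
else
    rm -f "$temp"
fi

idiom_result=$(cat "$original")
printf '%s\n' "$idiom_result"   # x, y, z (one per line)
rm -f "$original"
```

Note that the replaced file takes on the temp file's ownership and permissions, which may differ from the original's.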

The moreutils package contains a program, sponge, that encapsulates this:

some_pipeline < original.file | sponge original.file
