I have files containing lots of lines of data, some of which are duplicated. I want to delete duplicate lines if they follow each other.

For example if the input file contained this:


I would want the output file to read:


I am fairly new to bash scripts. I am presuming awk is the way to go but I am a bit stumped. Any help appreciated.


3 Answers


The command uniq does precisely this.

It is very often used in combination with sort so that duplicates will be adjacent.
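As a minimal sketch of the difference between plain `uniq` and `sort | uniq`: the sample lines below are invented for illustration (the question's own example was not preserved), and the `sample` function and variable names are hypothetical.

```shell
# Hypothetical sample data: "apple" appears both adjacent and non-adjacent.
sample() { printf '%s\n' apple apple banana apple cherry cherry; }

# uniq alone collapses only runs of identical adjacent lines and keeps the order.
adjacent_only=$(sample | uniq)
printf '%s\n' "$adjacent_only"

# Sorting first makes every duplicate adjacent, at the cost of reordering the lines.
all_duplicates=$(sample | sort | uniq)
printf '%s\n' "$all_duplicates"
```

For the question as asked (duplicates that follow each other), the first form should be enough; `sort` is only needed when duplicates anywhere in the file have to go.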


You can use awk:

awk '$0==b{next}{b=$0;print}' a.txt

I'm using the variable b, which stands for buffer and holds the previous line. If the current line is equal to the buffer, it is skipped. Otherwise the line is stored in the buffer and printed.
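The same logic can be spelled out with one rule per line, which may be easier to follow. This is only a sketch: `dedupe_adjacent` and `prev` are hypothetical names, and the sample input is invented. The `NR > 1` guard is an addition to the one-liner above; without it, an unset awk variable compares equal to the empty string, so a file whose first line is empty would lose that line.

```shell
# dedupe_adjacent is a hypothetical helper name, not part of the answer above.
dedupe_adjacent() {
  awk '
    NR > 1 && $0 == prev { next }   # same as the previous line: skip it
    { prev = $0; print }            # otherwise remember the line and print it
  ' "$@"
}

# Sample input invented for illustration.
result=$(printf '%s\n' a a b a a a c | dedupe_adjacent)
printf '%s\n' "$result"
```

The function reads standard input when called without arguments, or the named files otherwise, for example `dedupe_adjacent a.txt`.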

  • You made it, I prefer this to my answer. Good one. – fedorqui Jun 11 '14 at 15:24
  • @fedorqui Thx! The one from anubhava is even shorter as he reverted the logic.. – hek2mgl Jun 11 '14 at 15:25
  • @fedorqui anubhava's solution is in fact same as yours. in your deleted solution, the `!a[$0]++` doesn't make any sense, you can just remove it, then it would be same as anub..'s. – Kent Jun 11 '14 at 15:28
  • @Kent I see... I was thinking in [How can I delete duplicate lines in a file in Unix?](http://stackoverflow.com/questions/1444406/how-can-i-delete-duplicate-lines-in-a-file-in-unix) and then noticed it wasn't the way. But people is so fast here and other better answer were already posted, so I just deleted instead of trying to reformulate it :) – fedorqui Jun 11 '14 at 15:30
  • I also tried posting an answer, I came with same as yours @fedorqui, (without the `!a[$0]++`) then you posted first. I think I should comment your answer, after wrote the comment, you deleted!! and anubhava's answer came, I even thought I can write if check in a positive check, which collided hek2mgls again!.... you guys are too damn fast!!! – Kent Jun 11 '14 at 15:34
  • @Kent I hope your brain didn't explode after this situation :D In general, it is always complicated to know what to do. Sometimes we write some answers within a couple of minutes and any small change can make the answer collide with another one... I am learning to delete on the spot, to avoid strange situations. Also, you are so polite commenting first and giving good ideas! :) – fedorqui Jun 11 '14 at 15:49

This awk should also work:

awk '$1!=p{print} {p=$1}' file

Or you can shorten this even further:

awk '$1!=p; {p=$1}' file
  • Brilliant, just what I needed. Thanks for the fast response everyone. – ghfunk Jun 11 '14 at 15:36
  • Nice awk solution, although I didn't know about 'uniq' which does just what I want. – ghfunk Jun 11 '14 at 15:46
  • @ghfunk Note that this would only work as long as the _strings_ do not contain any spaces. – devnull Jun 11 '14 at 15:47
  • It can be easily changed to use `$0` instead of `$1` for including spaces. – anubhava Jun 11 '14 at 15:52
  • That might be obvious to those who already know the answer, not to somebody asking this. – devnull Jun 11 '14 at 15:53
  • Besides, `awk` is clearly the wrong tool for this specific case and is several times slower than `uniq`. – devnull Jun 11 '14 at 15:55
  • @devnull: Just curious if you have run any benchmarks on `sort+uniq` vs `awk` – anubhava Jun 11 '14 at 15:58
  • `sort` is not even required. It seems that you need to read the `uniq` manual. Regarding the benchmarks, there are some statistics available on the linked duplicate. – devnull Jun 11 '14 at 15:59
  • So it makes it evident that you need to RTFM. See `man uniq`. Just because a flawed answer gathers upvotes doesn't necessarily make it a good or great answer. Good luck! – devnull Jun 11 '14 at 16:01
  • I very well know what `uniq` does and how to use it. I mentioned `sort` since triplee suggested use of `sort` with `uniq`. – anubhava Jun 11 '14 at 16:03
  • Thanks for all your responses I am learning a lot. uniq does just what I want. – ghfunk Jun 11 '14 at 16:07
  • @ghfunk: Yes `uniq` (without sort) is indeed the simplest solution for your problem. – anubhava Jun 11 '14 at 16:08
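
The `$1` versus `$0` point raised in the comments can be illustrated with a small sketch. The sample lines are invented, and `sample`, `by_first_field` and `by_whole_line` are hypothetical names: comparing `$1` treats two lines as duplicates whenever their first whitespace-separated field matches, while comparing `$0` looks at the whole line.

```shell
# Hypothetical sample: two different lines that share the same first field.
sample() { printf '%s\n' 'foo 1' 'foo 2' 'bar' 'bar'; }

# Comparing only the first field drops "foo 2", although it differs from "foo 1".
by_first_field=$(sample | awk '$1 != p; { p = $1 }')
printf '%s\n' "$by_first_field"

# Comparing the whole line keeps both "foo" lines and drops only the repeated "bar".
by_whole_line=$(sample | awk '$0 != p; { p = $0 }')
printf '%s\n' "$by_whole_line"
```

On this input, `sample | uniq` gives the same result as the `$0` variant.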