
I have a file as follows:

abc
abc
123
xyz
foo
foo
bar

What I'm trying to do is determine which lines in the file are unique, perhaps using grep or awk. In this case, my command should output:

123
xyz
bar

...and ignore any line whose contents occur more than once. The file has already been sorted. Thanks in advance.
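Since the duplicate lines in the sample are adjacent, `uniq -u` (suggested in the comments) prints only the lines that are not repeated. A minimal sketch, feeding the sample data through `printf` instead of reading a file (the inline data is only for illustration):

```shell
# Print the sample lines one per line, then keep only the lines that are
# not part of a run of identical adjacent lines.
printf '%s\n' abc abc 123 xyz foo foo bar | uniq -u
```

This prints `123`, `xyz` and `bar`. Because `uniq` only compares adjacent lines, it relies on duplicates being grouped together, which is what "already sorted" provides here.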

user3653270
  • What about `uniq -u`? Pretty sure this has been asked before. – Benjamin W. Apr 19 '16 at 14:20
  • @BenjaminW. Or, more appropriately, possibly `sort -u`, since the example input is definitely not sorted, despite the OP's comment that the file has already been sorted. Also, I think `-u` might be a bit redundant, although the man page I have for `uniq` is a bit sparse. I can imagine that option might exist solely to override a previous `-d` or something, but otherwise it's what `uniq` does by default... – twalberg Apr 19 '16 at 14:30
  • ok - great. That makes sense and works. Sorry for the duplicate - had searched on the wrong type of questions. – user3653270 Apr 19 '16 at 14:30
  • using `awk`: `awk '!seen[$0]++ {lines[i++]=$0} END {for (i in lines) if (seen[lines[i]]==1) print lines[i]}' file` outputs `bar`, `123`, `xyz` (one per line) – justaguy Apr 19 '16 at 14:42
  • This is not about getting the unique entries; the referenced question and this one are not the same. This one asks for the entries that don't have duplicates. – karakfa Apr 19 '16 at 14:51
  • @karakfa There are two duplicate questions linked, and [this one](http://stackoverflow.com/questions/13778273/find-unique-lines) seems exactly what was asked, no? – Benjamin W. Apr 19 '16 at 15:00
  • @twalberg Try `printf "1\n1\n2\n" | uniq` and `printf "1\n1\n2\n" | uniq -u` to see the difference. "Only print unique lines" in this context means "don't print repeated lines at all, not even once." – Benjamin W. Apr 19 '16 at 15:03
  • OK, I missed the second link. I was talking about the first one, which is not the same. The second one is identical. – karakfa Apr 19 '16 at 15:08
  • @BenjaminW. Yep... Went and researched it after I commented. I understand the difference now, and it does sound a bit more like what the OP was asking, although it wasn't extremely clear... – twalberg Apr 19 '16 at 15:28
  • `awk '{a[$0]++}END{for(i in a)if(a[i]<2)print i}'` –  Apr 19 '16 at 17:57
  • @twalberg The OP says that the file is already sorted. We have no reason to doubt that statement, even though the file is not sorted in lexicographical order. Therefore, the order of lines must not be changed. – Michael Vehrs Apr 20 '16 at 06:04
  • @MichaelVehrs You are correct - the question states that the input is sorted. However, the example data provided is arguably not sorted - at least by the most common collating sequences - although it does at least appear that duplicate lines are collated together. The fact that the question text and example data seem to disagree makes the question somewhat unclear. – twalberg Apr 20 '16 at 19:07
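The `for (i in ...)` loops in the awk comments above visit array elements in an unspecified order, so they may not keep the original line order that Michael Vehrs says must be preserved. One order-preserving alternative (not taken from the thread) reads the input twice: the first pass counts each line and the second prints the lines whose count is exactly 1. A sketch, using a temporary file as a stand-in for the OP's file (the `mktemp` path is an assumption for illustration):

```shell
# Stand-in for the OP's input file (assumes mktemp is available).
tmp=$(mktemp)
printf '%s\n' abc abc 123 xyz foo foo bar > "$tmp"

# Pass 1 (NR == FNR holds only while reading the first copy): count each line.
# Pass 2: print a line only if it occurred exactly once, in original order.
awk 'NR == FNR { count[$0]++; next } count[$0] == 1' "$tmp" "$tmp"

rm -f "$tmp"
```

This prints `123`, `xyz` and `bar`. Unlike `uniq -u`, it also works when duplicates are not adjacent, at the cost of reading the input twice and keeping one counter per distinct line in memory.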

0 Answers