
I have a set of comma-separated files in a directory. They have no headers, and unfortunately the rows are not even all the same length.

I want to find the unique entry in the first column across all files.

What's the quickest way of doing it in shell programming?

awk -F "," '{print $1}' *.txt | uniq

seems to only get unique entries within each file. I want uniqueness across all files.


1 Answer


Shortest is still awk (this prints the whole row):

awk -F, '!a[$1]++' *.txt

To get just the first field:

awk -F, '!a[$1]++ {print $1}' *.txt
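The idiom works because `a[$1]++` evaluates to 0 (falsy) the first time a key is seen and to a positive count afterwards, so `!a[$1]++` is true exactly once per key. A quick sanity check on throwaway sample data (the filenames here are hypothetical):

```shell
# Two hypothetical sample files sharing the key "a" in column 1.
printf 'a,1\nb,2\n' > sample1.txt
printf 'a,3\nc,4\n' > sample2.txt

# a[$1]++ is 0 the first time a key appears, so each key prints once
# across both files, in order of first appearance.
awk -F, '!a[$1]++ {print $1}' sample1.txt sample2.txt
# a
# b
# c
```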
    It doesn't print singular elements, but whole rows on my machine (GNU Awk 3.1.5). You probably meant this `awk -F, '!a[$1]++ {print $1}' *.txt` – Eugeniu Rosca Jun 10 '15 at 16:15
  • @chatraed yes, default is to print the row, need to add `{print $1}` to just to get the first field. – karakfa Jun 10 '15 at 16:24
  • Thanks. but why does my script not work tho? `awk -F "," '{print $1}' *.txt | uniq` – CuriousMind Jun 10 '15 at 16:27
  • @CodeNoob `uniq` only removes consecutive duplicates, so the input must be sorted first; insert a `sort` between `awk` and `uniq`. – karakfa Jun 10 '15 at 16:28
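Following that last comment, the asker's original pipeline can be repaired by sorting before deduplicating; `sort -u` does both in one step (a sketch, assuming standard POSIX `sort`):

```shell
# Extract column 1 from every file, then sort and deduplicate.
# sort -u replaces the separate "sort | uniq" pair.
awk -F, '{print $1}' *.txt | sort -u
```

Note this prints the keys in sorted order, whereas the `!a[$1]++` approach preserves order of first appearance.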