Suppose I have a file similar to the following:

hello
hello
hi
hi
hello
hey

I would like to find the line numbers at which each unique line occurs, with the line numbers separated by commas. Ideally, the output would look like this:

hello 1,2,5
hi 3,4
hey 6

So far, I can get the unique lines themselves with the following code:

{ arr[$0]++ }
END { for (i in arr) {
        print i
    }
}

The result is:

hey
hi
hello
hsuh01

1 Answer

Try using this script:

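{
  # Append the current line number (FNR) to the comma-separated list kept for this line's text.
  # The first occurrence stores FNR alone, so the list never starts with a comma.
  words[$0] = (words[$0] == "" ? FNR : words[$0] "," FNR)
}

END {                        # once the whole file has been read
  for (w in words)           # visit each distinct line; the traversal order is not guaranteed
  {
    print w, words[w]        # print the line text followed by its line numbers
  }
}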

The order in which the lines are printed is not guaranteed. Please see "Controlling Array Traversal" in the gawk manual for more details on how the entries are going to be printed out and how to control it. For more details on FNR, see "What are NR and FNR".
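If you need the lines in the order they first appear (as in the desired output in the question), one portable option is to record that order in a second array instead of relying on `for (w in words)`. The following is only a sketch under a few assumptions: the file names `sample.txt` and `index_lines.awk` and the array names `idx` and `order` are made up for illustration, and the sample data is the one from the question.

```sh
# Recreate the sample input from the question (hypothetical file name).
printf '%s\n' hello hello hi hi hello hey > sample.txt

# Write the awk program to a file (hypothetical name) so it can be reused.
cat > index_lines.awk <<'EOF'
{
    if ($0 in idx) {
        idx[$0] = idx[$0] "," FNR    # seen before: append this line number
    } else {
        order[++n] = $0              # first occurrence: remember its position
        idx[$0] = FNR                # and start its list of line numbers
    }
}
END {
    # Walk the lines in first-seen order instead of "for (w in idx)".
    for (i = 1; i <= n; i++)
        print order[i], idx[order[i]]
}
EOF

awk -f index_lines.awk sample.txt
# hello 1,2,5
# hi 3,4
# hey 6
```

This uses only standard awk features, so it should not depend on gawk. The trade-off is the extra `order` array, which stores each distinct line a second time.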

Daemon Painter
  • I have tried this script, however, there exist some problems. First, parentheses are needed in the _for_ statement. Second, after fixing it, the result looks like `hey 2`, where the second column of all three lines is 2. – hsuh01 Dec 10 '20 at 23:44
  • Ah yes, you are right. Let me fix that for you. In addition, I took the opportunity to use a more elegant solution to manage the comma on the first line. Still, it is one of the many possibilities. Your implementation was not too far out, but you weren't tracking the line number, only the record content. – Daemon Painter Dec 11 '20 at 09:01