Indexing words in a file according to their line with AWK

Question

Suppose I have a file similar to the following:

hello
hello
hi
hi
hello
hey

I would like to find the indices (line numbers) of every unique line, using a comma as the separator between indices. Ideally, the output would look like this:

hello 1,2,5
hi 3,4
hey 6

So far I have managed to get the distinct line values with the following code:

{ arr[$0]++ }              # count how many times each line appears
END { for (i in arr) {     # visit each distinct line (order is unspecified)
        print i
    }
}

and the result is:

hey
hi
hello

Please edit your question to show what you've tried so far so we can help you with that. See How to Ask. — Ed Morton, Dec 10 '20 at 14:36
Please note that the only unique line in your file is "hey". — Daemon Painter, Dec 10 '20 at 15:40
If you want a quick look at the line numbers, I would use `cat -n`. — azbarcea, Dec 10 '20 at 15:42

Daemon Painter · Accepted Answer · 2020-12-11T08:59:53.757


Try using this script:

{
  words[$0] = (words[$0] == "" ? FNR : words[$0] "," FNR)   # append this line number to the list for the line's content
}

END {                          # once we are done reading the file
  for (w in words)             # for each distinct line; the traversal order is unspecified
  {
    print w, words[w]          # print the line followed by its comma-separated line numbers
  }
}

See "Controlling Array Traversal" in the GNU Awk manual for details on the order in which the words are printed and how to control it. For more details on FNR, see "What are NR and FNR".
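Because `for (w in words)` visits keys in an unspecified order, the output may not match the order shown in the question. One portable way to get first-appearance order, sketched here and not part of the original answer, is to record each new line in a separate numerically indexed array and loop over that in `END`. The shell function name `index_lines` is a hypothetical wrapper chosen for illustration; the awk program itself sticks to POSIX features.

```shell
#!/bin/sh
# Sketch of an order-preserving variant of the answer's script.
# index_lines is a hypothetical helper; it reads the named files, or stdin.
index_lines() {
  awk '
    !($0 in words) {                    # first time this line content is seen
      order[++n] = $0                   # remember first-appearance order
      words[$0] = FNR                   # start the list without a leading comma
      next                              # skip the append rule below
    }
    { words[$0] = words[$0] "," FNR }   # later occurrences: append ",FNR"
    END {
      for (i = 1; i <= n; i++)          # numeric loop gives a deterministic order
        print order[i], words[order[i]]
    }
  ' "$@"
}

# Usage with the sample data from the question
printf 'hello\nhello\nhi\nhi\nhello\nhey\n' | index_lines
# hello 1,2,5
# hi 3,4
# hey 6
```

Call it as `index_lines file.txt` or pipe data into it. The `next` statement handles the comma problem differently from the ternary in the answer: the first occurrence starts the list, and only later occurrences append.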

edited Dec 11 '20 at 08:59

answered Dec 10 '20 at 15:36

Daemon Painter


I have tried this script; however, there are some problems. First, the `for` statement needs parentheses. Second, after fixing that, the result looks like `hey 2`: the second column is 2 on all three lines. – hsuh01 Dec 10 '20 at 23:44
Ah yes, you are right. Let me fix that for you. I also took the opportunity to use a more elegant way of avoiding a leading comma before the first index. Still, it is only one of many possible approaches. Your implementation was not too far off, but you were only tracking the record content, not the line number. – Daemon Painter Dec 11 '20 at 09:01
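Since the answer tracks line numbers with FNR rather than NR, it may help to see how the two differ. In this sketch (the file names `a.txt` and `b.txt` and the `show_nr_fnr` helper are hypothetical), NR keeps counting across all input files, while FNR restarts at 1 for each file. With a single input file, as in the question, the two are equal, so either works; with several files, the answer's script would merge per-file line numbers for the same word into one list.

```shell
#!/bin/sh
# Sketch: NR counts records across all input, FNR restarts per file.
# a.txt and b.txt are hypothetical sample files in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT          # remove the sample files when the script exits
printf 'hello\nhi\n'  > "$tmp/a.txt"
printf 'hello\nhey\n' > "$tmp/b.txt"

# show_nr_fnr is a hypothetical helper: prints NR, FNR and the record.
show_nr_fnr() {
  awk '{ print NR, FNR, $0 }' "$@"
}

show_nr_fnr "$tmp/a.txt" "$tmp/b.txt"
# 1 1 hello
# 2 2 hi
# 3 1 hello
# 4 2 hey
```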
