
I was trying to use sed to remove lines whose element (an ID) had already been seen, while keeping the first line with each ID. I found a solution, but it came with no explanation at all and I am struggling to understand it.

Example of test.txt (IDs will not always be numerically sorted, but duplicates will follow each other):

1
2
3
3
4
4
4
5
6
7
7

Desired result:

1
2
3
4
5
6
7

The code:

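#creates array of IDs (first tab-separated field of each line)
mapfile -t id_array < <(cut -f1 test.txt)
index_array=()
#loops over IDs; starts at 1 because id_array[-1] would wrap around to the last element
for (( i=1; i < ${#id_array[@]}; i++ ))
do
     #compares each ID with the previous one (as strings, so non-numeric IDs work too)
     if [[ ${id_array[i]} == "${id_array[i-1]}" ]]
     then
          #sed numbers lines from 1 but array indices start at 0, hence i+1;
          #storing i itself would delete the line before each duplicate and keep the last occurrence
          index_array+=( $((i+1)) )
     fi
done
#line I don't fully understand: deletes the collected line numbers from test.txt
#(GNU sed -i syntax; on BSD/macOS, -i expects a separate backup-suffix argument)
sed -i ''"${index_array[*]/%/d;}"'' test.txt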

The last line deletes, in place, the lines whose numbers are stored in the array. [*] expands all values into a single word ([@] would not work, as it expands each value into its own word). The /%/d; part is a parameter-expansion substitution: % anchors an empty pattern at the end of each element, so d; is appended to every element, and [*] then joins the results with spaces. But I completely fail to understand the '' on each side. Using just one single quote on each side does not work. Why?
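A small sketch of what that expansion actually produces, using made-up line numbers (the values in `index_array` here are hypothetical, chosen only for illustration). It shows that `/%/d;` appends `d;` to each element, and that the spaces between the commands come from `[*]` joining the elements with the first character of `IFS`:

```shell
#!/usr/bin/env bash
# Hypothetical 1-based line numbers to delete (illustration only).
index_array=(4 6 7 10)

# ${arr[*]/%/d;}: the empty pattern anchored at the end (%) of each element
# is replaced by "d;", i.e. "d;" is appended to every element.
# Inside double quotes, [*] joins the results with the first char of IFS (a space).
script="${index_array[*]/%/d;}"
printf '%s\n' "$script"            # 4d; 6d; 7d; 10d;

# With [@] each element stays its own word, so sed would take "4d;" as the
# script and "6d;", "7d;", "10d;" as file names.
printf '<%s>\n' "${index_array[@]/%/d;}"

# Applying the generated script to lines 1..10 deletes lines 4, 6, 7 and 10.
seq 1 10 | sed "$script"           # prints 1 2 3 5 8 9, one per line
```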

EDIT: It occurred to me that the first (inner) ' of each pair might be there to keep the sed expression in single quotes, as required. Is that true?
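The quoting question can be checked directly by asking the shell which arguments it passes to a command. A minimal sketch, using a hypothetical helper `show_args` that prints each argument it receives between `<` and `>`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument it receives as <arg>, one per line,
# so we can see exactly how many words the shell hands to a command.
show_args() { printf '<%s>\n' "$@"; }

script='3d; 5d;'

# '' is an empty single-quoted string. Quoted pieces written with no space
# between them are joined into ONE word, so the empty strings contribute nothing.
show_args ''"$script"'' file.txt   # <3d; 5d;>  then  <file.txt>
show_args "$script" file.txt       # the same two arguments

# Wrapping the whole thing in single quotes instead disables expansion:
# the command receives the literal text "$script", double quotes included.
show_args '"$script"'              # <"$script">
```

Because quote removal happens before sed runs, sed never sees any quote characters, and it does not require its script to arrive in single quotes. If the single-quote variant you tried looked like `'"${...}"'`, the outer single quotes suppress the expansion, so sed receives the literal text and fails to parse it.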

Yama
  • Why not just use `uniq test.txt`? – Barmar Aug 31 '23 at 17:11
  • They serve no purpose. As for your task, it looks like you want to "uniq based on a single column", something like: https://stackoverflow.com/a/76605540 – jqurious Aug 31 '23 at 17:11
  • `''` is an empty string. So this is simply concatenating two empty strings around the `sed` argument, which has no effect. – Barmar Aug 31 '23 at 17:12
  • @Barmar My file has other columns. Each line might have the same ID but will have different values in other columns. I did not include more information because it was not relevant. – Yama Aug 31 '23 at 17:38
  • Then `awk` would probably be easier. – Barmar Aug 31 '23 at 17:39
  • @jqurious I do not want just 'uniq', but the first occurrence. There are other columns, which will have different values even when the first column has the same value. – Yama Aug 31 '23 at 17:40
  • `awk '!a[$1] { a[$1] = 1; print }' test.txt` – Barmar Aug 31 '23 at 17:40
  • @Barmar I'll keep that solution; I really need to learn awk. Indeed, the quotes have no effect; that went over my head. Solved! – Yama Aug 31 '23 at 17:46
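For the real multi-column case, Barmar's awk one-liner can be tried on a small sample. The file `sample.tsv` and its contents are made up for illustration: lines share an ID but differ in the second column, and only the first line per ID should survive. Unlike the sed loop, this approach does not depend on duplicates being adjacent.

```shell
#!/usr/bin/env bash
# Hypothetical tab-separated sample: same ID in column 1, different data in column 2.
printf '1\ta\n2\tb\n3\tc\n3\td\n4\te\n4\tf\n4\tg\n' > sample.tsv

# a[$1] is empty (false) the first time an ID appears, so the line is printed
# and the ID is recorded; later lines with the same ID are skipped.
awk '!a[$1] { a[$1] = 1; print }' sample.tsv

# Common shorter form: seen[$1]++ evaluates to 0 (false) only on first sight.
awk '!seen[$1]++' sample.tsv
```

Both commands print `1 a`, `2 b`, `3 c` and `4 e` (tab-separated). awk has no portable in-place option, so to update the file itself you would write to a temporary file and move it back, e.g. `awk '!seen[$1]++' test.txt > test.tmp && mv test.tmp test.txt`.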
