I was trying to use sed to remove lines whose ID has already been seen, keeping only the first line for each ID. I found a solution, but it came with no explanation and I am struggling to understand it.
Example of test.txt (IDs will not always be numerically sorted, but duplicates always follow each other):
1
2
3
3
4
4
4
5
6
7
7
Desired result:
1
2
3
4
5
6
7
The code:
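# create an array of IDs (first tab-separated field of each line)
mapfile -t id_array < <(cut -f1 test.txt)
# loop over the IDs, starting at the second one so there is always a previous ID
for (( i=1; i < ${#id_array[@]}; i++ ))
do
    prev=$((i-1))
    # compare each ID with the previous one; if they match, record the line number
    # (sed counts lines from 1, the array from 0, hence i+1)
    if (( ${id_array[$prev]} == ${id_array[$i]} ))
    then
        index_array+=($((i+1)))
    fi
done
# the line I don't fully understand: deletes the recorded lines from the file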
sed -i ''"${index_array[*]/%/d;}"'' test.txt
The last line deletes, in place, the lines whose numbers are stored in the array. [*] expands all the values into a single word ([@] would not work, as it expands each value into its own word). The /%/d; is a parameter expansion that appends d; to the end of each value (the % anchors the empty pattern at the end of the value). But I completely fail to understand the '' on each side. Using just one single quote on each side does not work. Why?
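To see what that expansion hands to sed, it can be printed on its own. This is a minimal sketch, assuming bash and using made-up line numbers (4 6 7 11) in place of the ones the loop collects; the variable names script and words are only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sample values standing in for the line numbers collected by the loop.
index_array=(4 6 7 11)

# "/%/d;" matches the empty string at the end of each element and puts "d;" there.
# "[*]" inside double quotes then joins the elements into one word,
# separated by the first character of IFS (a space by default).
script="${index_array[*]/%/d;}"
printf '%s\n' "$script"          # prints: 4d; 6d; 7d; 11d;

# With "[@]" each element stays a separate word, so sed would get
# several arguments instead of one script.
words=("${index_array[@]/%/d;}")
printf '%s\n' "${#words[@]}"     # prints: 4
```

So sed receives a single script of the form Nd; Nd; ..., where each Nd deletes line N.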
EDIT: it occurred to me that the '' might be there to keep the first (inner) '
so that the sed expression stays in single quotes as required. Is that right?
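One way to test this guess is to print what each quoting form turns into before sed ever sees it. This is a minimal sketch, assuming bash and the same made-up line numbers as sample data; the three variable names are hypothetical and only label the forms being compared:

```shell
#!/usr/bin/env bash
# Hypothetical sample values standing in for the line numbers collected by the loop.
index_array=(4 6 7 11)

# Form used in the script: an empty '' pair on each side of a double-quoted expansion.
with_empty_quotes=''"${index_array[*]/%/d;}"''
# Plain double quotes only.
double_only="${index_array[*]/%/d;}"
# One single quote on each side: the double quotes and the ${...} become literal text.
single_outside='"${index_array[*]/%/d;}"'

printf '[%s]\n' "$with_empty_quotes" "$double_only" "$single_outside"
# prints:
# [4d; 6d; 7d; 11d;]
# [4d; 6d; 7d; 11d;]
# ["${index_array[*]/%/d;}"]
```

In this check the first two forms give the same string, so the '' pairs appear to be empty strings that the shell simply concatenates away, while a single ' on each side turns the whole thing into literal text that sed cannot parse as a script.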