Capturing a unique section irrespective of even or odd occurrences

Question

I have a text file where a particular set of consecutive lines appears again and again. I need to trim all the duplicate occurrences and print only the first occurrence.

Input:

$ cat log_repeat.txt
total bytes = 0, at time = 1190554
time window = 0, at time = 1190554
BW in Mbps = 0, at time = 1190554
total bytes = 0, at time = 1190554
time window = 0, at time = 1190554
BW in Mbps = 0, at time = 1190554
total bytes = 0, at time = 1190554
time window = 0, at time = 1190554
BW in Mbps = 0, at time = 1190554
total bytes = 0, at time = 1190554
time window = 0, at time = 1190554
BW in Mbps = 0, at time = 1190554
total bytes = 0, at time = 1190554
time window = 0, at time = 1190554
BW in Mbps = 0, at time = 1190554

$

The Perl solution below works only when the section occurs an odd number of times,

$ perl -0777 -pe 's/(^total.*)\1//gms ' log_repeat.txt
total bytes = 0, at time = 1190554
time window = 0, at time = 1190554
BW in Mbps = 0, at time = 1190554

$

and prints nothing when it occurs an even number of times. How do I get the first occurrence regardless of whether the section repeats an odd or even number of times?

You can simply load all lines into an array, use the `uniq()` function, and then print all elements of the array. This question can help you: https://stackoverflow.com/questions/7651/how-do-i-remove-duplicate-items-from-an-array-in-perl — Mobrine Hayde, Mar 01 '19 at 12:32
@MobrineHayde.. no, I need to get them in order.. also the section can span many lines.. in the given sample, it spans across 3 lines.. — stack0114106, Mar 01 '19 at 12:35

zdim · Accepted Answer · 2019-03-04T01:44:35.587

2

Match your block greedily, multiple times, as long as all of that is followed by yet another copy:

perl -0777 -wpe's/(total.*)+(?=\1)//s' log_repeat.txt

The lookahead ensures that one block (the last one) remains, since a lookahead doesn't consume what it matches.

Or keep the first block by excluding it from the match with \K, and remove the others:

perl -0777 -wpe's/(total.*?)\K\1+//s' log_repeat.txt

Note that .*? must be used here, and that it does behave differently from .*, though probably not in ways that matter in practice.

edited Mar 04 '19 at 01:44

answered Mar 01 '19 at 17:41

zdim

I left out `^` (and thus `/m`) as it takes a great conspiracy to have another `total` inside a line _and_ the same pattern between pairs of them; it's kinda a little impossible -- or, it doesn't make sense. However, the `^` _is_ informative there and it doesn't hurt adding it. – zdim Mar 01 '19 at 18:59

score 1 · Answer 2 · answered Mar 01 '19 at 12:54

The problem is that the substitution s/(^total.*)\1//gms deletes pairs of blocks. You can fix this by only deleting a single block at a time using a lookahead:

perl -0777 -pe 's/(^total.*)(?=\1)//gms' log_repeat.txt

answered Mar 01 '19 at 12:54

Håkon Hægland
