Remove line (omit from the output) if the previous line is a prefix

Question

This is very similar to the question "remove duplicate lines with similar prefix" but it's the other way around:

Given an input of sorted strings (in this case, directories) like:

a/
a/b/c/
a/d/
bar/foo/
bar/foo2/
c/d/
c/d/e/

I want to remove a line from the output if the previous line is a prefix of it. In this case, the output would be:

a/
bar/foo/
bar/foo2/
c/d/

This would be pretty easy to code in Python etc., but in this case I am in a shell environment (bash, sort, sed, awk...). (Re-sorting is fine.)

Is `a/b/c/` a prefix of `a/d/`? – Cyrus May 27 '18 at 00:57
Since `a/` exists `a/d/` should not exist in the output. – ahmet alp balkan May 27 '18 at 02:51
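The clarification means a line is compared with the last line that was kept, not with the line physically above it. A minimal awk sketch contrasting the two readings on the sample input (the helper function `sample` is hypothetical, only there to emit the question's data):

```shell
#!/bin/sh
# Hypothetical helper that prints the question's sample input.
sample() {
    printf '%s\n' a/ a/b/c/ a/d/ bar/foo/ bar/foo2/ c/d/ c/d/e/
}

# Reading 1: compare with the line physically above (a/d/ survives).
sample | awk 'NR == 1 || index($0, prev) != 1; { prev = $0 }'

# Reading 2 (what the OP wants): compare with the last line that was kept.
sample | awk 'NR == 1 || index($0, kept) != 1 { kept = $0; print }'
```

The first pipeline prints `a/`, `a/d/`, `bar/foo/`, `bar/foo2/`, `c/d/`; the second prints the desired `a/`, `bar/foo/`, `bar/foo2/`, `c/d/`.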

jxc · Accepted Answer · score 4 · 2018-05-27T15:54:25.040

Use awk:

awk '{if(k && match($0, k))next; k="^"$0}1' file

`k="^"$0` anchors the pattern to the beginning of the string.

You probably need `NF>0` before the main block in case there are empty lines.
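Applied to the first command, the suggested `NF>0` guard might look like the sketch below (the input is made up for illustration). Non-blank lines go through the prefix check, while blank lines are still printed by the trailing `1` but never become the prefix. With default field splitting, whitespace-only lines also count as blank.

```shell
#!/bin/sh
# Illustrative input with a blank line in the middle.
printf '%s\n' a/ a/b/ '' c/ c/d/ |
awk 'NF > 0 { if (k && match($0, k)) next; k = "^" $0 } 1'
```

This prints `a/`, the blank line, and `c/`. Without the guard, the blank line would set `k` to `^`, which matches every following line, so `c/` would be dropped as well.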

Update: there could be issues if regex metacharacters appear in the variable `k`. The line below avoids regexes entirely and should be better:

awk '{if(k && index($0, k)==1)next; k=$0}1' file
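To see why the regex version can misfire, consider a directory name containing `.`, a regex metacharacter that matches any character. In this made-up input, `abc/` does not start with the literal string `a.c/`, yet the regex `^a.c/` matches it:

```shell
#!/bin/sh
# Illustrative input: "a.c/" is not a literal prefix of "abc/".
input='a.c/
abc/'

# Regex-based version: "." matches "b", so abc/ is wrongly dropped.
printf '%s\n' "$input" | awk '{if(k && match($0, k))next; k="^"$0}1'

# index()-based version: literal comparison, so abc/ is kept.
printf '%s\n' "$input" | awk '{if(k && index($0, k)==1)next; k=$0}1'
```

The first command prints only `a.c/`; the second prints both lines.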

Update-2: thanks @Ed, I've adjusted the 2nd method to also handle non-empty lines that evaluate to zero (empty lines will still be kept as-is, though):

awk '{if(k!="" && index($0,k)==1)next;k=$0}1' file
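The case Update-2 addresses can be reproduced with an illustrative input whose first line is `0`. Because `k=$0` copies a numeric-looking input string, a bare `k` in a condition can be evaluated as the number zero (false), so the earlier version would never compare `0/x` against it. Testing `k!=""` instead compares as strings:

```shell
#!/bin/sh
# Illustrative input: the line "0" is a literal prefix of "0/x".
printf '%s\n' 0 0/x 1/ |
awk '{if(k!="" && index($0,k)==1)next;k=$0}1'
```

This prints `0` and `1/`; `0/x` is dropped because its literal prefix `0` was kept.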

edited May 27 '18 at 15:54

answered May 27 '18 at 01:18

jxc



@EdMorton, good catch; the 2nd one does have an issue when lines are zero or empty. The first one should be fine due to the prefixed `^`. – jxc May 27 '18 at 15:12

score 2 · Answer 2 · answered May 27 '18 at 01:09

Perl one-liner. `-n` loops over the input lines and `-e` supplies the program: a line is printed unless it begins with the last printed line (`\Q...\E` quotes any regex metacharacters in it), and each printed line becomes the new prefix.

perl -ne 'unless (defined $last && /^\Q$last\E/) { print; chomp($last = $_) }' file_list.txt

score 2 · Answer 3 · answered May 27 '18 at 01:45

Bash itself (in fact, any POSIX shell) provides all you need through parameter expansion with substring removal. Check whether the line you read is equal to itself with the prefix removed. If it isn't, the line starts with the prefix; otherwise, it is a non-prefixed line. Then it is a simple matter of outputting the non-prefixed line, setting the prefix to the current line, and repeating, e.g.

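#!/bin/bash

pfx=    ## prefix

## read each line (IFS= keeps leading/trailing whitespace intact)
while IFS= read -r line; do
    ## if no prefix yet, or removing the prefix leaves the line unchanged
    ## (quoting "$pfx" makes the match literal rather than a glob pattern)
    if [ -z "$pfx" ] || [ "$line" = "${line#"$pfx"}" ]
    then
        printf "%s\n" "$line"   ## output line
        pfx="$line"             ## set prefix to line
    fi
done < "$1"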
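The test relies on `${line#prefix}`, which removes the shortest leading match of a *pattern*. If the prefix is left unquoted inside the expansion, glob characters in it (`*`, `?`, `[`) act as wildcards; quoting it makes the match literal. A small sketch with a made-up prefix:

```shell
#!/bin/sh
line='abc/'
pfx='a*'   # hypothetical prefix containing a glob character

# Unquoted: "a*" is a pattern whose shortest leading match is "a".
printf '%s\n' "${line#$pfx}"

# Quoted: "a*" is taken literally and is not a prefix of "abc/".
printf '%s\n' "${line#"$pfx"}"
```

The first `printf` shows `bc/`, so an unquoted prefix would make `abc/` look prefixed; the second shows `abc/` unchanged.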

(note: if there is a chance that an input file does not end with a newline (POSIX requires every line, including the last, to end with '\n'), then you should also check the contents of line as a condition of your while, e.g. while IFS= read -r line || [ -n "$line" ]; do ... )
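The effect can be seen with a two-line input whose last line has no trailing newline (made up for illustration). Plain `read` returns non-zero on that final line even though it has filled `line`, so the loop body never sees it; the `|| [ -n "$line" ]` test lets it through:

```shell
#!/bin/sh
# The final line "c/" has no trailing newline.
printf 'a/\nc/' | while IFS= read -r line; do
    printf 'plain: %s\n' "$line"
done

printf 'a/\nc/' | while IFS= read -r line || [ -n "$line" ]; do
    printf 'guarded: %s\n' "$line"
done
```

The first loop prints only `plain: a/`; the second prints `guarded: a/` and `guarded: c/`.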

Example Input File

$ cat string.txt
a/
a/b/c/
a/d/
bar/foo/
bar/foo2/
c/d/
c/d/e/

Example Use/Output

$ bash nonprefix.sh string.txt
a/
bar/foo/
bar/foo2/
c/d/

score 1 · Answer 4 · answered May 27 '18 at 14:54


$ awk 'NR==1 || index($0,prev)!=1{prev=$0; print}' file
a/
bar/foo/
bar/foo2/
c/d/

answered May 27 '18 at 14:54

Ed Morton


this one will not survive if the first line is EMPTY – jxc May 27 '18 at 15:13

I get what you're saying but we don't know if this will behave correctly or not since the OP hasn't told us what the correct behavior is if the previous line is empty. In other words it's Undefined Behavior so anything a tool does is compliant/correct :-). – Ed Morton May 27 '18 at 15:15
