Select lines that contain a digit but not a pair of that digit in Bash using regular expressions

Question

I have a file containing one number per line.

data.txt

I want to print lines that contain a digit 7 and at the same time do not contain its pair 77. I wrote a simple script for this.

script.sh

#!/usr/bin/env bash

cat data.txt | grep -E '\d*7' | grep -v -E '\d*77'

Quick explanation of the regular expression above:

First take any digit zero or more times.
Then take 7.
If the line satisfies the rules above run the search again.
Take any digit zero or more times.
Take double 7.
If the line again satisfies the rules, remove it from the selection (the -v option inverts the selection).

This works fine and outputs the desired result.

output

However I had to start the grep program twice. I then tried a different regular expression.

script.sh

#!/usr/bin/env bash

cat data.txt | grep -E '\d*7[0-689]?\d*'

Which should in my understanding:

However it also selects the lines that contain 77.

output

Is there a better way that starts grep or any other program that uses regular expressions only once?

Incorrect dupe. OP is asking `I want to print lines that contain a digit 7 and at the same time do not contain its pair 77` — anubhava, Feb 18 '21 at 18:50
`cat data.txt | grep -v 77 | grep 7` probably the easiest here — Aven Desta, Feb 18 '21 at 19:27
@AvenDesta: Please read full question. OP already knows how to run `grep` twice and wants to avoid doing that. — anubhava, Feb 18 '21 at 19:29
Does this answer your question? [Unix grep regex containing 'x' but not containing 'y'](https://stackoverflow.com/questions/6063258/unix-grep-regex-containing-x-but-not-containing-y) — Ryszard Czech, Feb 18 '21 at 22:47
Please read question completely. Obviously OP already knows how to use a `grep` for `containing 'x' but not containing 'y'`. This question is about doing it in single `grep` and both GNU and POSIX solutions are required. — anubhava, Feb 19 '21 at 06:26

RavinderSingh13 · Answer 1 · 2021-02-18T20:05:43.897

For the shown samples only, this can be done with a simple regex in GNU grep:

grep -P '(?<!7)7(?!7)' Input_file

Or, to cover edge cases (for example a line that has a 77 as well as a lone 7, which the regex above would still match), try:

grep -P '^(?!.*77)(?=.*7)' Input_file

Explanation:

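The -P option enables PCRE regexes in GNU grep.
In the first command, the negative lookbehind (?<!7) makes sure the 7 is not immediately preceded by another 7.
The negative lookahead (?!7) then makes sure that 7 is not immediately followed by another 7.
In the second command, (?!.*77) rejects any line that contains 77 anywhere, and (?=.*7) requires at least one 7.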
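
To see how the two patterns differ, here is a minimal sketch, assuming GNU grep built with PCRE support. The input values and the function names are made up for illustration and do not come from the original data.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the two patterns from this answer.
isolated_seven() { grep -P '(?<!7)7(?!7)'; }
no_pair()        { grep -P '^(?!.*77)(?=.*7)'; }

# Hypothetical input: a lone 7 at the start, then a 77.
line_with_both=7077

# Matches: the first 7 is neither preceded nor followed by a 7.
printf '%s\n' "$line_with_both" | isolated_seven

# No match: (?!.*77) fails at the start of the line because 77 occurs later.
printf '%s\n' "$line_with_both" | no_pair || echo "rejected"
```

The first command prints the line even though it contains 77, while the second one rejects it, which is why the anchored form is the safer choice when a line can mix both cases.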

anubhava · Accepted Answer · 2021-02-18T19:54:07.527


You may try this awk alternative:

awk '/7/ && !/77/' data.txt

00700
07070
70000
00007

An equivalent GNU grep command would be this:

grep -P '^(?!.*77).*7' data.txt

Or in POSIX grep:

grep -vE '77|^[^7]*$' data.txt
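
As a quick sanity check, the awk and POSIX grep variants can be run side by side. This is only a sketch: the sample values and the function names below are assumptions, since the original data.txt is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; the real data.txt is not part of the post.
sample=$(printf '%s\n' 00700 07070 70000 00007 00100 00770 77000 70770)

# awk: keep lines that match /7/ and do not match /77/.
with_awk() { awk '/7/ && !/77/'; }

# POSIX grep: invert the match, dropping lines that contain 77 or that
# contain no 7 at all (^[^7]*$ also covers empty lines).
with_grep() { grep -vE '77|^[^7]*$'; }

printf '%s\n' "$sample" | with_awk
printf '%s\n' "$sample" | with_grep
```

Both commands should print the same four lines for this input (00700, 07070, 70000, 00007), each using a single process and no PCRE.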

answered Feb 18 '21 at 18:55, edited Feb 18 '21 at 19:54

MonkeyZeus · Answer 3 · 2021-02-18T19:31:25.310


This would do it using PCRE:

grep -P '^(?=.*7)((?!77).)*$' Input_file
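
A rough breakdown of that pattern, assuming GNU grep built with PCRE support. The function name and the sample values are hypothetical and only serve as an illustration.

```shell
#!/usr/bin/env bash
# ^(?=.*7)     from the start of the line, require a 7 somewhere ahead
# ((?!77).)*   consume one character at a time, never starting on a 77
# $            the consumption has to reach the end of the line
no_double_seven() { grep -P '^(?=.*7)((?!77).)*$'; }

# Hypothetical input: only the line with a 7 and no 77 should survive.
printf '%s\n' 00700 00770 00100 | no_double_seven
# prints 00700
```

The per-character lookahead does the same job as a single `(?!.*77)` at the start of the line; it is just a different way to express "no 77 anywhere".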

answered Feb 18 '21 at 19:03, edited Feb 18 '21 at 19:31


However that also prints `00100`, `01010` and `10000`. – sanitizedUser Feb 18 '21 at 19:05
@sanitizedUser Well that's embarrassing, fixed! – MonkeyZeus Feb 18 '21 at 19:31
