I have a file containing one number per line.

data.txt

00700
00100
07070
01010
07700
07770
70000
10000
00007

I want to print the lines that contain the digit 7 but do not contain the pair 77. I wrote a simple script for this:

script.sh

#!/usr/bin/env bash

cat data.txt | grep -E '\d*7' | grep -v -E '\d*77'

Quick explanation of the regular expressions above:

  • First, match any digit zero or more times.
  • Then match 7.
  • Pass each line that satisfies the rules above to a second search.
  • Match any digit zero or more times.
  • Then match 77.
  • If the line also satisfies these rules, remove it from the selection (the -v option inverts the selection).
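`\d` is a Perl/PCRE shorthand that POSIX extended regular expressions do not define, so how `grep -E` treats it depends on the implementation (GNU grep, for instance, does not treat it as a digit class). The leading `\d*` is also redundant, because grep looks for the pattern anywhere in the line and `*` may match nothing. A minimal portable sketch of the same two-pass filter, assuming the sample data above is fed in through a hypothetical `sample` helper instead of data.txt:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the sample lines from the question
# (stands in for data.txt so the sketch is self-contained).
sample() { printf '%s\n' 00700 00100 07070 01010 07700 07770 70000 10000 00007; }

# First pass keeps lines containing a 7; second pass (-v) drops lines with 77.
# No digit prefix is needed: grep matches the pattern anywhere in the line.
result=$(sample | grep -E '7' | grep -vE '77')
printf '%s\n' "$result"
```

This still runs grep twice; it only removes the non-portable escape and the unneeded prefix.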

This works fine and outputs the desired result.

output

00700
07070
70000
00007

However, this starts grep twice, so I tried a different regular expression:

script.sh

#!/usr/bin/env bash

cat data.txt | grep -E '\d*7[0-689]?\d*'

As I understand it, this should:

  • Match any digit zero or more times.
  • Then match 7.
  • Then match one digit other than 7, zero or one time.
  • Then match any digits until the end of the line.

However, it also selects the lines that contain 77:

output

00700
07070
07700
07770
70000
00007
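A minimal sketch of why, with `\d` written as the POSIX class `[0-9]` (an assumption, for portability): the `?` lets `[0-689]` match zero characters and grep searches anywhere in the line, so the first 7 of a 77 satisfies `7[0-689]?` on its own. The pattern therefore ends up selecting every line that contains a 7. The `sample` helper is hypothetical and just prints the question's data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the sample lines from the question.
sample() { printf '%s\n' 00700 00100 07070 01010 07700 07770 70000 10000 00007; }

# The question's pattern, with \d replaced by [0-9].
pattern='[0-9]*7[0-689]?[0-9]*'

# It selects exactly as many lines as a plain search for 7 does.
with_pattern=$(sample | grep -cE "$pattern")   # 6
with_plain_7=$(sample | grep -c 7)             # 6

# On 07700 the first 7 matches by itself, because the next character is
# another 7 and [0-689]? is allowed to match nothing. grep -o prints each
# match on its own line: 7, then 70.
core_matches=$(printf '07700\n' | grep -oE '7[0-689]?')
printf '%s\n' "$core_matches"
```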

Is there a better way to do this that runs grep, or any other regex-based program, only once?

Inian
sanitizedUser
  • Incorrect dupe. OP is asking `I want to print lines that contain a digit 7 and at the same time do not contain its pair 77` – anubhava Feb 18 '21 at 18:50
  • `cat data.txt | grep -v 77 | grep 7` probably the easiest here – Aven Desta Feb 18 '21 at 19:27
  • @AvenDesta: Please read full question. OP already knows how to run `grep` twice and wants to avoid doing that. – anubhava Feb 18 '21 at 19:29
  • Does this answer your question? [Unix grep regex containing 'x' but not containing 'y'](https://stackoverflow.com/questions/6063258/unix-grep-regex-containing-x-but-not-containing-y) – Ryszard Czech Feb 18 '21 at 22:47
  • Please read question completely. Obviously OP already knows how to use a `grep` for `containing 'x' but not containing 'y'`. This question is about doing it in single `grep` and both GNU and POSIX solutions are required. – anubhava Feb 19 '21 at 06:26

3 Answers

With the shown samples only, this can be done with a simple regex in GNU grep:

grep -P '(?<!7)7(?!7)' Input_file

Or, to also cover edge cases such as a line that contains both a lone 7 and a 77, try:

grep -P '^(?!.*77)(?=.*7)' Input_file

Explanation (of the first command):

  • The -P option enables PCRE (Perl-compatible regular expressions).
  • The negative lookbehind (?<!7) makes sure the 7 is not preceded by another 7.
  • The negative lookahead (?!7) then makes sure the 7 is not followed by another 7.
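A sketch of the edge case, assuming a grep built with PCRE support (`-P`): on a hypothetical line such as 07077, the lookbehind/lookahead pattern still matches the lone 7 near the start, while the anchored pattern rejects the whole line because it contains 77.

```shell
#!/usr/bin/env bash
# Assumes a grep with PCRE support (-P), as in most GNU grep builds.
# Hypothetical test line: a lone 7 near the start, then a 77 at the end.
line='07077'

# First command: matches, because the 7 at index 1 has no 7 on either side.
if printf '%s\n' "$line" | grep -qP '(?<!7)7(?!7)'; then
  first=match
else
  first=no-match
fi

# Second command: (?!.*77) fails at the start of the line, so no match.
if printf '%s\n' "$line" | grep -qP '^(?!.*77)(?=.*7)'; then
  second=match
else
  second=no-match
fi

echo "first: $first, second: $second"
```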
RavinderSingh13

You may try this awk alternative:

awk '/7/ && !/77/' data.txt

00700
07070
70000
00007

An equivalent GNU grep command:

grep -P '^(?!.*77).*7' data.txt

Or in POSIX grep:

grep -vE '77|^[^7]+$' data.txt
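A quick cross-check, assuming the sample data from the question: the awk filter and the POSIX grep should select the same lines. The sketch also appends a hypothetical empty line, which exposes one subtlety of `^[^7]+$`: `+` needs at least one character, so an empty line is not matched and `-v` lets it through, whereas a `^[^7]*$` variant drops it too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the question's sample lines plus one empty line.
sample() { printf '%s\n' 00700 00100 07070 01010 07700 07770 70000 10000 00007 ''; }

awk_out=$(sample | awk '/7/ && !/77/')
star_out=$(sample | grep -vE '77|^[^7]*$')

# With -v, -c counts the lines grep selects (those that do NOT match).
plus_count=$(sample | grep -cvE '77|^[^7]+$')   # 5: the empty line is kept
star_count=$(sample | grep -cvE '77|^[^7]*$')   # 4: the empty line is dropped

printf '%s\n' "$awk_out"
```

On files without empty lines, both variants give the same result.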
anubhava

This would do it using PCRE:

grep -P '^(?=.*7)((?!77).)*$' Input_file

https://regex101.com/r/ECrwky/1
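A quick check, assuming a grep with PCRE support: `((?!77).)*` is a tempered token that consumes the line one character at a time and refuses to step onto a position where 77 starts, so `$` can only be reached on lines without 77, while `(?=.*7)` requires at least one 7. On the sample data it should agree with the two-pass `grep 7 | grep -v 77` pipeline. The `sample` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Assumes a grep with PCRE support (-P), as in most GNU grep builds.
# Hypothetical helper: prints the sample lines from the question.
sample() { printf '%s\n' 00700 00100 07070 01010 07700 07770 70000 10000 00007; }

# Single PCRE pass from this answer.
pcre_out=$(sample | grep -P '^(?=.*7)((?!77).)*$')

# Reference result from the two-pass pipeline.
two_pass_out=$(sample | grep 7 | grep -v 77)

printf '%s\n' "$pcre_out"
```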

MonkeyZeus