I have a large number of files that I need to scan, returning a line and its following line, but only when both lines begin with specific strings.

String one: the first line must begin with 'Bill'.
String two: the second line must begin with 'Jones'.

If both criteria are met, the two lines should be returned. Repeat for the whole file.

For example, given this original file:

Edith Blue
Edith Green
Edith Red
Bill Blue
Jones Red
Edith Green
Bill Green
Edith Red
Jones Green
Bill Blue

I'd want it to return only:

Bill Blue
Jones Red

Any ideas? I have no idea where to begin with this, as I only have basic scripting skills with sed, awk and so on. At the moment I am using the command below to get each matching line and the line after it, but it gives me too much useless information (such as the filename) that I have to strip off with other sed commands.

grep -A 1 "^Bill" * > test.txt

I guess there's a far more elegant way of getting only the lines I need. Any help would be lovely!

captain yossarian
What you really need is pcregrep. Have a look here: http://stackoverflow.com/questions/152708/how-can-i-search-for-a-multiline-pattern-in-a-file-use-pcregrep/152711#152711 – Nehal Dattani Oct 18 '13 at 18:09

5 Answers

As an extension of your initial approach, a simple solution is to grep for lines starting with "Bill", returning one line after each match, and then grep that output for lines starting with "Jones", returning one line before each match:

grep -A1 "^Bill" myfile.txt | grep "^Jones" -B1

Output:

Bill Blue
Jones Red

Side note: as a true test, your input file should probably include some lines where Bill and Jones are not at the start of the line:

Edith Blue
Edith Jones
Edith Red
Bill Blue
Jones Red
Edith Bill
Bill Jones
Edith Red
Jones Green
Bill Blue
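
The point about anchoring can be checked with a small sketch. The Python below is not part of the grep approach above; it assumes the sample lines from the side note and uses a hypothetical helper, matching_pairs, to compare anchored and unanchored patterns on consecutive lines.

```python
import re

# Sample input from the side note: some names appear mid-line on purpose.
SAMPLE = """\
Edith Blue
Edith Jones
Edith Red
Bill Blue
Jones Red
Edith Bill
Bill Jones
Edith Red
Jones Green
Bill Blue
"""


def matching_pairs(text, first=r"^Bill", second=r"^Jones"):
    """Return (line, next_line) pairs where line matches `first`
    and next_line matches `second` (hypothetical helper)."""
    lines = text.splitlines()
    return [
        (a, b)
        for a, b in zip(lines, lines[1:])
        if re.search(first, a) and re.search(second, b)
    ]


# Anchored patterns, like grep "^Bill" and grep "^Jones"
print(matching_pairs(SAMPLE))
# -> [('Bill Blue', 'Jones Red')]

# Unanchored patterns also pick up names in the middle of a line
print(matching_pairs(SAMPLE, "Bill", "Jones"))
# -> [('Bill Blue', 'Jones Red'), ('Edith Bill', 'Bill Jones')]
```

Without the ^ anchor, the pair "Edith Bill" / "Bill Jones" is reported as well, which is why a test file with mid-line names is useful.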
beroe

Use awk's getline instruction for each line that begins with Bill:

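awk '
    # For each line whose first field starts with Bill...
    $1 ~ /^Bill/ {
        # ...read the next line into l (getline returns 1 on success).
        # Note that getline consumes that line, so a Bill line that
        # directly follows another Bill line is not tested itself.
        if ( (getline l) > 0 && l ~ /^Jones/ ) {
            printf "%s\n%s\n", $0, l
        }
    }
' infile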

It yields:

Bill Blue
Jones Red
Birei

And here is another way using awk with a flag:

$ awk '$1=="Bill"{p=1;a=$0;next};$1=="Jones"&&p{print a;print};{p=0}' file
Bill Blue
Jones Red
user000001

Here is a simple Python script:

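FILE = 'test.text'

one = 'Bill'
two = 'Jones'

prev = ''

# Walk the file, remembering the previous line
with open(FILE, 'r') as f:
    for line in f:
        if prev.startswith(one) and line.startswith(two):
            # Both lines match: print the pair without trailing newlines
            print(prev.rstrip('\n'))
            print(line.rstrip('\n'))
        prev = line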

Yields:

python FileRead.py
Bill Blue
Jones Red
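
Since the question mentions a large number of files, the same idea could be wrapped in a function and applied to each file in turn. This is only a sketch: the names find_pairs and print_pairs are made up for illustration, and it assumes plain text files that can be read line by line.

```python
def find_pairs(path, first="Bill", second="Jones"):
    """Yield (line, next_line) for consecutive lines in `path` where the
    first begins with `first` and the second begins with `second`."""
    prev = ""
    with open(path, "r") as f:
        for line in f:
            line = line.rstrip("\n")
            if prev.startswith(first) and line.startswith(second):
                yield prev, line
            prev = line


def print_pairs(paths):
    """Print every matching pair found in each of the given files."""
    for path in paths:
        for a, b in find_pairs(path):
            print(a)
            print(b)
```

Calling print_pairs(sys.argv[1:]) from a script (after importing sys) would let the shell expand a wildcard such as * into the list of files, much like the original grep command.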
m3h2014

This might work for you (GNU sed):

sed -n '$!N;/^Bill.*\nJones/p;D' file
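
As I read it, the one-liner keeps a two-line window in the pattern space: N appends the next line, p prints the window when it matches, and D drops the first line before the cycle restarts. A rough Python sketch of that sliding window follows; the function name sliding_window_pairs is hypothetical, and the sketch mirrors the idea rather than sed's exact semantics.

```python
from collections import deque


def sliding_window_pairs(lines, first="Bill", second="Jones"):
    """Yield (line, next_line) pairs using a two-line sliding window."""
    window = deque(maxlen=2)  # plays the role of the pattern space
    for line in lines:
        window.append(line)   # like N: bring the next line into the window
        if (
            len(window) == 2
            and window[0].startswith(first)
            and window[1].startswith(second)
        ):
            yield window[0], window[1]  # like p: emit the matching pair
        # With maxlen=2 the oldest line is dropped on the next append (like D)


sample = [
    "Edith Blue", "Edith Green", "Edith Red", "Bill Blue", "Jones Red",
    "Edith Green", "Bill Green", "Edith Red", "Jones Green", "Bill Blue",
]
print(list(sliding_window_pairs(sample)))
# -> [('Bill Blue', 'Jones Red')]
```

Because every line gets a turn as the first line of the window, a Bill line that directly follows another Bill line is still checked against the line after it.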
potong