-3

How can we achieve this using sed or awk?

I have now included the text in a code block to make it clear.

The requirement is to print the part shown in the code block.

LOGIC 1:

The text 'abc' is our keyword here. It is unique and occurs only within the code block part.

So we need to find the first line containing 'abc' and print every line from there through the last line containing 'abc', inclusive.

LOGIC 2:

Based on page numbers, i.e. select the text between Page 1 and Page n, again inclusive. Note: 'Page 1' and 'Page 1 - Page n' can occur multiple times.

The whole text is part of a 4GB file, which needs to be parsed for similar occurrences.

Apologies for not being clear.

START OF TEXT IN THE FILE:

Xyz Page: 1

a

b

c

d

e

QWE Page: 1

e

r

t

y

asdabc       Page: 1

t

y

u

I

o

ghjabc       Page: 2

e

d

c

b

bnmabc       Page: 3

uia

asd

ads

thm Page: 1

as

das

da

END OF TEXT IN FILE

m21
    Read https://stackoverflow.com/a/17914105/1745001 and if afterwards you still have a question then read [ask] and try again. – Ed Morton Jun 08 '17 at 03:20

2 Answers

1

I really don't know what exactly you want to print, but you should be able to use sed:

sed -n '/start pattern/,/end pattern/p' <file>
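As a rough illustration of how the range address behaves, here is a minimal sketch on a shortened copy of the question's sample. It assumes the first and last keyword lines are known in advance (here 'asdabc' and 'bnmabc'); the temporary file and the variable names `sample` and `out` are only for this example.

```shell
# Build a small sample resembling the question's data (blank lines omitted).
sample=$(mktemp)
printf '%s\n' 'QWE Page: 1' 'e' 'asdabc       Page: 1' 't' \
  'ghjabc       Page: 2' 'e' 'bnmabc       Page: 3' 'uia' \
  'thm Page: 1' 'as' > "$sample"

# Print from the first line matching "asdabc" through the first later
# line matching "bnmabc"; both boundary lines are included.
out=$(sed -n '/asdabc/,/bnmabc/p' "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

Note that the range ends on the end-pattern line itself, so any lines following it (such as 'uia' here) are not printed.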
Jack
  • Thanks Jack, but I guess I didn't state my requirements precisely the first time. Could you have a look now, please? – m21 Jun 08 '17 at 03:55
0

You can achieve this with awk:

awk 'BEGIN{a=0} /.*Page/{if(index($0,"abc")!=0){a=1} else{a=0}} a==1{print}' <Your_File>

Output:

asdabc       Page: 1

t

y

u

I

o

ghjabc       Page: 2

e

d

c

b

bnmabc       Page: 3

uia

asd

ads

Here's what the command does:

  1. The flag 'a' determines whether lines are printed.
  2. On each line containing "Page", check whether it also contains "abc".
  3. If it does, set the flag and start printing.
  4. Keep printing until the next "Page" line that does not contain "abc".
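The steps above can also be sketched as a commented, multi-line awk script run on a shortened copy of the sample. This is only an illustration: the flag name `keep` and the temporary file are made up for the example, and it assumes that ordinary body lines never contain the word "Page".

```shell
# Build a small sample resembling the question's data (blank lines omitted).
sample=$(mktemp)
printf '%s\n' 'QWE Page: 1' 'e' 'r' 'asdabc       Page: 1' 't' 'y' \
  'ghjabc       Page: 2' 'e' 'bnmabc       Page: 3' 'uia' \
  'thm Page: 1' 'as' > "$sample"

out=$(awk '
  # On every page-header line, decide whether this page should be printed.
  /Page/ { keep = (index($0, "abc") > 0) }
  # Print each line while the flag is set (header lines included).
  keep
' "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

Because awk reads the input one line at a time and keeps only the flag between lines, this approach should stay within constant memory even on a multi-gigabyte file.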
CWLiu