I have thousands of files in a directory, and each file contains a number of variable definitions that start with the keyword DEFINE and end with a semicolon (;). I want to copy every occurrence of the text between the keyword and the semicolon (inclusive) into a target file.

Example: Below is the content of the text file:

/* This code is for lookup */
DEFINE variable as a1 expr= extract (n123f1 using brach, code);

END.

Now, from the above content, I just want to copy the section starting with DEFINE and ending with ; into a target file, i.e. the output should be:

DEFINE variable as a1 expr= extract (n123f1 using brach, code);

This needs to be done for thousands of scripts and multiple occurrences. Please help.

Thanks a lot. The provided code works, but only when the whole statement is on a single line. The data is not always on one line; it can be spread over multiple lines, as below:

/* This code is for lookup */
DEFINE variable as a1 expr= if branchno > 55
then
extract (n123f1 using brach, code)
else
branchno = null
;

END.

The code also appears in the above form. I need to capture all the data between DEFINE and the semicolon (;). After every DEFINE there will be an ending semicolon; this is the pattern.

Dale
Bipin

3 Answers

It sounds like you want grep(1):

grep '^DEFINE.*;$' input > output
Carl Norum

Try using grep. Let's say you have files with the extension .txt in the present directory:

grep -ho 'DEFINE.*;' *.txt > outfile 

Output:

DEFINE variable as a1 expr= extract (n123f1 using brach, code);

Short Description

-o prints only the matching string rather than the whole line, which helps if the line also contains other text that you want to omit.

-h suppresses the file names before the matching results.

Read the man page of grep by typing man grep in your terminal.
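
To make the effect of the two options concrete, here is a small sketch. The two sample files and their contents are hypothetical, created in a temporary directory only for the demonstration, and it assumes a grep that supports -o and -h (GNU and BSD grep both do).

```shell
# Hypothetical sample files, created only for this demonstration.
demo_dir=$(mktemp -d)
printf '%s\n' '/* lookup */ DEFINE variable as a1 expr= extract (n123f1 using brach, code); END.' > "$demo_dir/a.txt"
printf '%s\n' 'DEFINE variable as b2 expr= 1;' > "$demo_dir/b.txt"

# Plain grep over several files: whole matching lines, each prefixed with its file name.
grep 'DEFINE.*;' "$demo_dir"/*.txt

# -h drops the file-name prefix, -o keeps only the text that matched the pattern.
grep_ho_output=$(grep -ho 'DEFINE.*;' "$demo_dir"/*.txt)
printf '%s\n' "$grep_ho_output"

rm -rf "$demo_dir"
```

With -ho the comment before DEFINE and the trailing END. in the first file are dropped, so only the two definitions remain.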

EDIT

If you need to match across multiple lines, you can use pcregrep with the -M option:

pcregrep -M 'DEFINE.*?(\n|.)*?;' *.txt > outfile

Works fine on my system. Check man pcregrep for more details

Reference : SO Question

Community
jkshah
  • Thanks a lot, the above code works, but only when the whole statement is on a single line. The data is not always on one line; it can be spread over multiple lines, as below: /* This code is for lookup */ DEFINE variable as a1 expr= if branchno > 55 then extract (n123f1 using brach, code) else branchno = null ; END. The code also appears in the above form. I need to capture all the data between DEFINE and the semicolon (;). After every DEFINE there will be an ending semicolon; this is the pattern. – Bipin Oct 18 '13 at 09:08
  • @Bipin Check **EDIT** in my answer. It would be good if you could update the question with the additional requirement, so that it is helpful to people who come across this question in the future. – jkshah Oct 18 '13 at 09:55

A simple solution can be built with sed:

sed -n -e '/^DEFINE/{:a p;/;$/!{n;ba}}' your-file

The -n option prevents sed from printing every line. Each time a line begins with DEFINE, sed prints the line (command p) and then enters a loop: until it finds a line ending with ;, it fetches the next line (n) and branches back (ba) to the print command. On leaving the loop, nothing more is done.

It looks a bit dirty; it seems that the version sed15 has a shorter (and more straightforward) way to achieve this in one line:

sed -n -e '/^DEFINE/,/;$/p' your-file

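Indeed, it seems that only this version of sed tests both patterns against the same line. In other versions of sed, like mine under Cygwin, the end pattern is only checked starting from the line after the one that matched the start pattern, so the start and end patterns must fall on separate lines for the range to work properly.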
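
To illustrate the separate-lines caveat, here is a small sketch. It assumes a sed that first tests the end pattern on the line after the start match (the behaviour POSIX describes for address ranges); the sample input is made up for the demonstration.

```shell
# Hypothetical input: a single-line definition followed by a multi-line one.
sample=$(mktemp)
printf '%s\n' 'DEFINE a = 1;' 'END.' 'DEFINE b = if x' 'then 2' ';' 'END.' > "$sample"

# The range opens on "DEFINE a = 1;" but its trailing ; is not tested as the end,
# so the range stays open until the lone ";" line further down.
range_output=$(sed -n -e '/^DEFINE/,/;$/p' "$sample")
printf '%s\n' "$range_output"

rm -f "$sample"
```

Under that assumption the first END. line is printed as well, because the range that opened on the single-line definition only closes at the next line ending with ;.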

One last thing to remember: sed does not handle nested ranges, i.e. it stops printing at the first end pattern it encounters, even if several start patterns have been matched in the meantime. Prefer something with awk if this is a feature you are looking for.
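
As a rough sketch of the awk route, the function below copies each definition from its DEFINE line up to and including the first semicolon, whether it sits on one line or spans several. The name extract_defines is hypothetical. It assumes every definition starts at the beginning of a line, that no ; appears inside a definition before its terminator, and that every definition is closed before its file ends.

```shell
# extract_defines is a hypothetical helper name used only for this sketch.
extract_defines() {
    awk '
        # A line beginning with DEFINE switches capturing on.
        /^DEFINE/ { capturing = 1 }
        capturing {
            semi = index($0, ";")             # position of the first ; (0 if none)
            if (semi > 0) {
                print substr($0, 1, semi)     # keep text up to and including ;
                capturing = 0                 # definition finished
            } else {
                print                         # middle of a multi-line definition
            }
        }
    ' "$@"
}
```

It could then be called as extract_defines *.txt > outfile, or fed from a pipe when no file names are given.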

Bentoy13