This is a follow-up to my previous question, posted at the answerer's suggestion. I need sed to print the text between delimiters. A pair of delimiters can span multiple lines, like:

(abc
d)

Below are sample input and output.

Input

(123) 
aa (a(b)cd)) (g) (c)fff 
abcd(aabb 
d)aa (aeb)oe

Correct output

123
a(b  
aabb
d

Note: I only want the text between the first pair of delimiters. If that pair spans two lines, then I just want the text between that first pair, and processing should move on to the third (next) line. Thus, for the last line in the input, I printed 'd', skipped (aeb), and moved on to the next line.

Mark
  • Thanks Mark for posting this. Now what I don't get is how come `(a(b)cd))` became `a(b` and why not show `g` and `c` for `(g)` and `(c)` respectively. – anubhava May 16 '11 at 21:30
  • I have added a further note to clarify things. Basically I want the text between the first pair of delimiters (on each line or if it spans several lines I'll choose that pair and print the text between them). – Mark May 16 '11 at 21:32
  • Thanks for clarifying but it seems to me that these nested `(` and `)` will make it very hard for regex to capture. – anubhava May 16 '11 at 21:37
  • Before reading the answer to your last question I had suggested to switch to awk for doing the job, but now I'm not sure anymore. :-) – ofi May 16 '11 at 21:42
  • Hmm... I don't understand the case you have in mind; could you elaborate? For example, if it's just ((a)()) then => ((a), right? – Mark May 16 '11 at 21:42
  • Why does everyone want to solve a multiple-line problem by tweaking single-line utilities? – karatedog May 16 '11 at 21:53
  • @karatedog I am pretty sure sed is powerful enough. – Mark May 16 '11 at 21:57
  • @Mark I understand that sed is powerful. Although I hate the "use the right tool for the right job" type answers as well, I just suggested that sed's core functionality is not about handling entire textfiles. – karatedog May 17 '11 at 12:46

1 Answer

I used a sed script file (called sedscript) containing:

/^[^(]*([^)]*$/N
s/^[^(]*(\([^)]*\)).*/\1/p

and the command line:

sed -n -f sedscript data

Given the input:

(123)
aa (a(b)cd)) (g) (c)fff
abcd(aabb
d)aa (aeb)oe

It produced the output:

123
a(b
aabb
d

You can do it on a single command line with:

sed -n -e '/^[^(]*([^)]*$/N' -e 's/^[^(]*(\([^)]*\)).*/\1/p' data

The first command matches a line whose first open parenthesis has no close parenthesis after it, and appends the next input line to the pattern space (N). As written, N runs once per cycle, so this covers a pair that spans two lines, as in the question; it does not keep reading until a close parenthesis appears. The second command matches everything before the first open parenthesis, the open parenthesis itself, captures the text up to the first close parenthesis, and then matches that close parenthesis and the rest of the pattern space; it replaces the whole match with the captured text and prints it. Nothing else is printed because of -n.
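If a pair could span three or more lines, one possible extension is to loop with a label and a branch, so that lines keep being appended until the first open parenthesis is closed or the input ends. This is only a sketch, not part of the script tested above; the function name `first_paren_text` and the label `again` are made up for illustration.

```shell
# Hypothetical helper: print the text inside the first (...) pair,
# appending further lines until that pair is closed or input ends.
first_paren_text() {
    sed -n '
:again
/^[^(]*([^)]*$/{
    $!{
        N
        b again
    }
}
s/^[^(]*(\([^)]*\)).*/\1/p
' "$@"
}

# Example: a pair that spans three lines, followed by a one-line pair.
printf '%s\n' 'x (a' 'b' 'c) y (z)' '(q)' | first_paren_text
```

The `$!` guard skips N on the last input line, so an unclosed parenthesis at end of file simply prints nothing instead of relying on what a given sed does when N has no next line. On the sample input from the question, this variant should behave the same as the two-command script.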

Jonathan Leffler
  • Thanks Jonathan! I have tested this for many cases and I am pretty sure it has finally solved the problem. Also thanks to anubhava for his great help as well. – Mark May 16 '11 at 23:41