(I apologize for the vague title. If someone has a better wording, please let me know.)
My question is about a task that comes up again and again and that I would like to handle with sed. I currently have a solution, but it is ugly and destroys some of the formatting. I describe both below.
Question
Usually I have to handle a file like this
.
.
<pattern A>
.
.
<pattern B>
.. <pattern B1>
..
.. <pattern B2>
..
.. <pattern B3>
<pattern B>
.
.
<pattern A>
<pattern B>
.
.
I often want to focus either on everything between (or outside) the <pattern A> lines, or on a block such as
<pattern B>
.. <pattern B1>
..
.. <pattern B2>
..
.. <pattern B3>
<pattern B>
while ignoring particular occurrences of <pattern B> elsewhere in the file.
Is there an elegant way to do this with sed?
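For reference, sed's range addresses ("/start/,/end/") are the built-in way to select everything between two patterns without touching the newlines. The following is a minimal sketch, not a complete answer: the marker lines "A" and "B" are hypothetical stand-ins for <pattern A> and <pattern B>, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of sed range addresses. "A" and "B" are hypothetical stand-ins
# for <pattern A> and <pattern B>; the function names are illustrative only.

# Everything between two A lines, the A lines included:
# the range opens at an A line and closes at the NEXT A line.
between_a() { sed -n '/^A$/,/^A$/p' "$1"; }

# The complement: everything outside the A ... A ranges.
outside_a() { sed '/^A$/,/^A$/d' "$1"; }

# Only the lines strictly inside each B ... B block (marker lines dropped).
inside_b() { sed -n '/^B$/,/^B$/{/^B$/!p;}' "$1"; }

sample=$(mktemp)
printf '%s\n' before A in1 in2 A mid B b1 b2 B after > "$sample"
between_a "$sample"   # → A, in1, in2, A (one per line)
outside_a "$sample"   # → before, mid, B, b1, b2, B, after
inside_b "$sample"    # → b1, b2
rm -f "$sample"
```

A range closes at the next line matching the end pattern; if no such line follows, it runs to the end of the file. That is why an unpaired <pattern B>, like the one right after the second <pattern A> in the layout above, would pull in everything after it.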
Concrete Examples
1.
From the file
<html>
<div>
1st div
</div>
<div>
2nd div
</div>
..
<div>
10th div
</div>
</html>
how can I extract
<div>
3rd div
.
.
7th div
</div>
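Counting occurrences is awkward in plain sed, so this sketch swaps in awk, which can keep a counter while it reads the file. It assumes each <div> and </div> sits on its own line, that divs are not nested, and that "from the 3rd div to the 7th" includes the divs in between; the function name div_range and the sample text are hypothetical.

```shell
#!/bin/sh
# Sketch (awk, not sed): print from the opening of the first-th <div>
# through the closing of the last-th </div>.
# Assumes one tag per line and no nested divs; div_range is a hypothetical name.
div_range() {
  awk -v first="$2" -v last="$3" '
    /<div>/   { n++; inside = 1 }                  # count every opening tag
    inside && n >= first && n <= last { print }    # inside a selected div
    /<\/div>/ { inside = 0 }                       # leaving the current div
  ' "$1"
}

# Build a sample page with ten divs, then select the 3rd through 7th.
sample=$(mktemp)
{
  echo '<html>'
  for i in 1 2 3 4 5 6 7 8 9 10; do
    printf '<div>\ndiv %s\n</div>\n' "$i"
  done
  echo '</html>'
} > "$sample"
div_range "$sample" 3 7   # prints 15 lines: <div>, div 3, </div> ... <div>, div 7, </div>
rm -f "$sample"
```

Changing the two numeric arguments selects a different run of divs; nested divs would need a depth counter instead of the simple inside flag.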
2.
From the file
<html>
.
.
<ol> # the first <ol> in the whole file
.
.
</ol> # the last </ol> in the whole file
.
how can I extract
<ol> # the first <ol> in the whole file
.
.
</ol> # the last </ol> in the whole file
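One way to get from the first <ol> to the last </ol> in a single sed pass, with the newlines left alone, is to buffer lines in the hold space and print the buffer each time a </ol> is seen; whatever follows the final </ol> stays buffered and is never printed. This is a sketch assuming the tags appear literally as <ol> and </ol> (no attributes); first_to_last_ol is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print from the first line containing <ol> through the last line
# containing </ol>. Assumes literal <ol>/</ol> tags without attributes;
# first_to_last_ol is a hypothetical name.
first_to_last_ol() {
  sed -n '/<ol>/,${
    H
    /<\/ol>/{
      s/.*//
      x
      s/^\n//
      p
    }
  }' "$1"
}
# How it works: from the first <ol> on, H appends every line to the hold
# space. On a </ol> line the pattern space is emptied and swapped with the
# hold space (x), so the buffer is printed and the hold space starts empty
# again. Lines after the last </ol> are appended but never printed.

sample=$(mktemp)
printf '%s\n' '<html>' '<p>intro</p>' '<ol>' '<li>one</li>' '</ol>' \
  '<p>between</p>' '<ol>' '<li>two</li>' '</ol>' '<p>outro</p>' '</html>' > "$sample"
first_to_last_ol "$sample"   # prints both lists and <p>between</p>, not intro/outro
rm -f "$sample"
```

The trade-off is memory: everything after a </ol> is held until the next </ol> or the end of the file, which is harmless for typical HTML pages.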
What I've tried
My current solution is ugly and fragile. I delete all newlines, turning the whole file into a single line, and then apply a lot of ugly sed magic. Fortunately, in my case I can usually put the newlines back afterwards, but this is definitely not the right way.
Please let me know if I should provide more information. I know the question is somewhat vague, but that vagueness is exactly what I am asking about. Can sed detect patterns across the whole file like this? Thanks in advance for your help!
` is as easy as `ruby -rnokogiri -e 'puts Nokogiri::HTML(STDIN).at_css("ol").to_xml' < test.html` in Ruby (with Nokogiri gem).
– Amadan Jul 05 '19 at 01:37