Extract section of file between two constants

Question

ksh on Solaris 10.

I have a large text file as below

Cell 011
458754544 5.91
459923124 100.00

Cell 055
123456789 0.99
123454787 0.55

Cell 094
18759844 5.44
13549986 
<end of file>

I want to extract just the rows in the Cell 055 section.

I've done it for the Cell 094 section as below

sed -n '/Cell 094 :/,$p' $INFILE | grep \\. | sed 's/^  //g' | sed 's/ \{1,\}/,/g'

I've forgotten how sed works in this context, and I cannot work out how to extract just up to the 'Cell 094' text.

can you add expected output for clarity? I think what you are looking for is `awk -v RS= '/Cell 055/' $INFILE` — Sundeep, Oct 02 '17 at 15:58
see also https://stackoverflow.com/questions/38972736/how-to-select-lines-between-two-patterns — Sundeep, Oct 02 '17 at 16:01
Is there a way to mark unresolved and close? I'm going to have to do it manually - none of the solutions have worked. — Ben Hamilton, Oct 09 '17 at 09:17
I can't see any reason why you would say solutions are not working, it is up to you to give proper details — Sundeep, Oct 09 '17 at 09:35

score 4 · Answer 1 · answered Oct 02 '17 at 16:35

It's not exactly clear what your expected output is, but sed can easily extract a range of lines via range addressing, where each address can be a line number or a regular expression.

For example, to get the complete block that starts with Cell 055 and ends with a blank line:

$ sed -n '/Cell 055/,/^$/p' file
Cell 055
123456789 0.99
123454787 0.55

Alternatively, to get only the meat, without the range start and end lines:

$ sed -n '/Cell 055/,/^$/{//!p}' file
123456789 0.99
123454787 0.55
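
Some sed implementations are stricter than GNU sed about one-line `{...}` groups; the most portable form puts each command and the closing `}` on its own line. The following is a minimal sketch of the same range extraction written that way, with the boundary regexes spelled out instead of relying on the empty regex `//`. The file name `sample.txt` and the function name `cell_body` are made up for illustration, and this has not been verified on Solaris 10.

```shell
# Build a small sample file mirroring the question's layout (hypothetical name).
cat > sample.txt <<'EOF'
Cell 011
458754544 5.91
459923124 100.00

Cell 055
123456789 0.99
123454787 0.55

Cell 094
18759844 5.44
EOF

# cell_body FILE: print the lines between "Cell 055" and the next blank line,
# excluding both boundary lines. Inside the range, the header and the blank
# line are deleted explicitly; everything else is printed.
cell_body() {
    sed -n '/Cell 055/,/^$/{
/Cell 055/d
/^$/d
p
}' "$1"
}

cell_body sample.txt
```

If the "blank" lines in the real file contain spaces, `/^$/` will not match them and the range will run on past the section; that is worth checking before blaming the sed version.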

randomir

I am getting command garbled error with those seds? Solaris 10. – Ben Hamilton Oct 06 '17 at 12:52
Can you post the "garbled" output? Both commands should work for POSIX `sed`. – randomir Oct 06 '17 at 13:01
@BenHamilton, it looks like the default `sed` on your Solaris (found in `/usr/bin/sed`) is a rather old, non-POSIX version. You should try the POSIX-compliant version found in `/usr/xpg4/bin/sed` (see [this thread](https://groups.google.com/forum/#!topic/comp.unix.solaris/Zb-K6P0UPDg)). – randomir Oct 06 '17 at 13:08
Using /usr/xpg4/bin/sed , I get error 'sed: command garbled: /Cell 055/,/^$/{//!p}' – Ben Hamilton Oct 06 '17 at 13:47
Maybe `//` isn't supported in your `sed` (empty regex should repeat the last regex match). You can try a safer alternative: `sed -n '/Cell 055/,/^$/{/Cell 055/!{/^$/!p}}' file`. I assume the first example works? – randomir Oct 06 '17 at 14:03
This is one of the **many** reasons I keep saying "sed is for `s/old/new/`, that is all" because there's a million sed variants out there and even the simplest operations beyond `s/old/new/` usually venture into constructs that aren't supported across all of them and just look at how complicated the script gets very quickly! – Ed Morton Oct 06 '17 at 19:28
@EdMorton, I'm sorry to say that, but I'm afraid that's the story of all software tools, more or less (especially old ones). There will always be some incompatible versions out there. That's why I prefer the GNU set of tools which are fairly back-compatible and widely available. – randomir Oct 07 '17 at 17:27
I agree in general but sed, for whatever reason, is particularly affected. There seem to be a lot of different variants out there, all with very different syntaxes for anything other than s/old/new/. Then when you throw in that awk is almost always clearer, simpler, more efficient, more robust, and easier to maintain for anything other than s/old/new/ it just seems pointless to even consider sed for anything else. – Ed Morton Oct 07 '17 at 19:01
@EdMorton, if we continue that line of reasoning, we might say python is even more readable, cleaner, robust, etc. than awk. :) But I agree, sed programs/expressions become very cryptic and very unreadable, very quickly. – randomir Oct 08 '17 at 10:58
@randomir we might say that but we'd be wrong to do so :-). Glad we agree about sed though. – Ed Morton Oct 08 '17 at 12:11
None of these sed examples work. I get `sed: command garbled: /Cell 055/,/^$/{/Cell 055/!{/^$/!p}}`. This code runs but doesn't produce the required output `sed -n '/Cell 055/,/^$/p' $INFILE > sed_test.txt` – Ben Hamilton Oct 09 '17 at 08:41

score 2 · Answer 2 · answered Oct 02 '17 at 16:13

sed is for s/old/new, that is all. That's not what you're trying to do so you shouldn't be considering using sed. Just use awk:

$ awk -v RS= '/^Cell 055/' file
Cell 055
123456789 0.99
123454787 0.55

You didn't show us the expected output, and the sed+grep pipeline you posted produces no output, so I don't know whether the above is what you wanted; it's just a guess. Whatever it is you want, though, the right tool for it is awk, not sed.

Ed Morton

I get 'awk: syntax error near line 1' when using this. – Ben Hamilton Oct 06 '17 at 12:57
Was that the WHOLE error message? I suspect not and it had a second line of `awk: bailing out near line 1` which together mean you're using old, broken awk (/bin/awk on Solaris). If that's the case and you're on Solaris then use /usr/xpg4/bin/awk and never use /bin/awk because it's, well, old and broken. If that's not the case then do tell.... – Ed Morton Oct 06 '17 at 13:02
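
To avoid hard-coding which awk gets used, a script can pick the XPG4 one when it is present and fall back otherwise. This is only a sketch: the variable name `AWK` and the inline sample data are made up for illustration, and it has not been run on Solaris 10.

```shell
# Prefer the XPG4 awk where it exists (Solaris); otherwise fall back to
# whatever "awk" is first on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
else
    AWK=awk
fi

# Paragraph mode (RS=) on a tiny inline sample: prints only the block
# whose first line starts with "Cell 055".
printf 'Cell 055\n1 2\n\nCell 094\n3 4\n' | "$AWK" -v RS= '/^Cell 055/'
```

Note that paragraph mode only splits on truly empty lines, and `^Cell 055` only matches when the header has no leading whitespace, so either detail in the real file could explain a command that runs but prints nothing useful.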

RavinderSingh13 · Answer 3 · 2017-10-02T16:04:55.647

On a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk. Please try the following awk commands and let me know whether they help.

Solution 1: To print the lines that follow the Cell 055 header, up to the next Cell header and including the blank line:

awk '/Cell/ && !/Cell 055/{flag="";next} /Cell 055/{flag=1;next} flag'  Input_file

Solution 2: To skip the blank lines in the Cell 055 section:

awk '/Cell/ && !/Cell 055/{flag="";next} /Cell 055/{flag=1;next} flag && NF'  Input_file

Solution 3: To print the Cell 055 header line as well:

awk '/Cell/ && !/Cell 055/{flag="";next} /Cell 055/{flag=1;print;next} flag' Input_file
OR
awk '/Cell/ && !/Cell 055/{flag="";next} /Cell 055/{flag=1;print;next} flag && NF' Input_file
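
The question's own pipeline for Cell 094 also strips leading spaces and turns runs of spaces into commas. If that is the desired output for Cell 055 too (an assumption, since no expected output was posted), the flag approach can do both in one pass. In this sketch the function name `cell_csv`, the file name `sample.txt`, and the indented data lines are all made up for illustration; on Solaris, replace `awk` with `/usr/xpg4/bin/awk` or `nawk` as noted above.

```shell
# Sample file with indented data lines (the indentation is an assumption).
cat > sample.txt <<'EOF'
Cell 011
  458754544 5.91
  459923124 100.00

Cell 055
  123456789 0.99
  123454787 0.55

Cell 094
  18759844 5.44
EOF

# cell_csv LABEL FILE: print the data rows of the section whose header
# contains LABEL, with fields joined by commas.
cell_csv() {
    awk -v cell="$1" -v OFS=, '
        $1 == "Cell" { inside = (index($0, cell) > 0); next }  # any header: start or stop
        inside && NF { $1 = $1; print }                        # rebuild the row using OFS
    ' "$2"
}

cell_csv "Cell 055" sample.txt
```

Assigning `$1 = $1` makes awk rebuild the record from its fields, which drops leading whitespace and joins the fields with `OFS`; the `NF` test skips blank and whitespace-only lines.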

score 0 · Answer 4 · answered Oct 02 '17 at 16:19

If your file format always has exactly 2 lines after each Cell header, then you can use grep too:

grep -A2 "Cell 055" file

or use awk, as mentioned by @Ed-Morton in his answer.
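
Note that `-A` is a grep extension rather than a POSIX option, so it may not be available in the stock grep on Solaris 10. A rough awk equivalent of "the matching line plus the next 2 lines" is sketched here; the function name `header_plus_two` and the file name `sample.txt` are made up for illustration, and on Solaris `awk` would need to be `/usr/xpg4/bin/awk` or `nawk` for `-v` to work.

```shell
# Small sample file mirroring the question's layout (hypothetical name).
cat > sample.txt <<'EOF'
Cell 011
458754544 5.91
459923124 100.00

Cell 055
123456789 0.99
123454787 0.55

Cell 094
18759844 5.44
EOF

# header_plus_two TEXT FILE: print each line containing TEXT and the
# 2 lines that follow it. The counter n is reset to 3 on every match
# and counts down as lines are printed.
header_plus_two() {
    awk -v pat="$1" 'index($0, pat) { n = 3 } n > 0 { print; n-- }' "$2"
}

header_plus_two "Cell 055" sample.txt
```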

Rahul Verma
