
I have a sed command that successfully prints the range of lines between two patterns:

 sed -n '/PAGE 2/,/\x0c/p' filename.txt

What I haven't figured out is how to make it print all the lines from the first token up to, but not including, the second token. The \x0c token is a record separator in a big flat file, and I need to keep THAT line intact.

In between the two tokens, the data is completely variable, and I do not have a reliable anchor to work with.

[CLARIFICATION] Right now it prints all the lines between /PAGE 2/ and /\x0c/, inclusive. I want it to print from /PAGE 2/ up to, but not including, the next /\x0c/ in the record.

[test data] The \x0c will be at the start of the first line and at the beginning of the last line of this record.

I need to delete from the first line of the record through the line just before the beginning of the next record.

^L20-SEP-2006 01:54:08 PM         Foobars College                          PAGE 2
TERM: 200610               Student Billing Statement                     SUMDATA
99999

Foo bar                                                              R0000000
999 Geese Rural Drive                                           DUE: 15-OCT-2012
Columbus, NE 90210

--------------------------------------------------------------------------------
       Balance equal to or greater than $5000.00    $200.00
       Billing inquiries may be directed to 444/555-1212 or by
       email to bursar@foobar.edu.  Financial Aid inquiries should
       be directed to 444/555-1212 or finaid@foobar.edu.
^L20-SEP-2006 01:54:08 PM         Foobars College                          PAGE 1

[expected result]

 ^L20-SEP-2006 01:54:08 PM         Foobars College                          PAGE 1

There will be multiple such records in the file. I can rely only on the /PAGE 2/ token and the /\x0c/ token.

[solution]:

Following choroba's lead, I edited his command to:

sed '/PAGE [2-9]/,/\x0c/{/\x0c$/!d}'

In choroba's version, the rule in the curly braces matched any line containing a ^L, so every such line was skipped. Anchoring the pattern with `$` makes it match only lines that end with the form feed.
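For comparison, the same deletion can be sketched in awk with a flag instead of a sed range. This is a swapped-in technique, not the command above; it assumes a POSIX awk that understands `\f` (form feed) inside a regex, and the sample lines are made up to mirror the shape of the test data.

```shell
# Hypothetical sample: two records, each header line starting with a
# form feed (\f), like the ^L lines in the test data.
sample() {
  printf '\fHEADER PAGE 2\nabc\ndef\n\fHEADER PAGE 1\nghi\n'
}

# A line containing \f begins a new record, so clear the skip flag there;
# if that same line is the PAGE 2 header, set the flag again.
# Print only the lines that are not being skipped.
out=$(sample | awk '/\f/ { skip = 0 } /PAGE 2/ { skip = 1 } !skip')
printf '%s\n' "$out"
```

Because the flag is cleared on every form-feed line before the PAGE 2 test runs, the header of the next record (PAGE 1 here) survives with its leading form feed intact, which is the shape of the [expected result].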

ST3
avgvstvs
  • I don't understand your question. The range you use should print all the lines between the starting and the ending line. – Nov 01 '12 at 13:09
  • I don't want it to print the ending line. – avgvstvs Nov 01 '12 at 13:11
  • If you want to delete lines (as mentioned in your question), you should be using the `d` command, not `p` with `-n`. – doubleDown Nov 01 '12 at 13:25
  • My ultimate goal is to delete, but I need to get the matching part right first, which is why I'm printing. I'm almost where I need to be and will fix it when finished. – avgvstvs Nov 01 '12 at 13:30
  • Arrghh! Why are you asking us to solve one problem when you really have a different problem? Please post what you're REALLY trying to do, including sample input and expected output. – Ed Morton Nov 01 '12 at 13:47
  • I see you posted some real input; now just post the expected output from that input and we can start trying to help you. The solutions for printing everything except a block of text will be somewhat different from the solutions for printing a block of text, so figuring out the latter and then figuring out how to negate it is NOT a good approach. – Ed Morton Nov 01 '12 at 13:52
  • I'm dealing with financial transactions and I have a security background, so I'm quite paranoid about exposing anything about the application or the transaction data. I've posted the expected result, with the firm constraints. – avgvstvs Nov 01 '12 at 13:54
  • You don't need to post the ACTUAL transaction, just something that represents it. \x0c is the form feed character, control-L, right? And do form feeds separate every record? If so, all you need to do is `awk -v RS='^L' 'NR>1' file`, where "^L" is a literal control-L. If that's not it, can you post something with a few records in it so we get a better idea of what you want? No need for 30 lines of text between delimiters, just a couple of lines of "abc", "def", whatever. – Ed Morton Nov 01 '12 at 14:02
  • I updated my answer to show how to get the output you want from what you've posted so far. If that's not what you want, you'll have to help us out with some more representative input (in format, not necessarily content). – Ed Morton Nov 01 '12 at 14:23

5 Answers


EDIT: New answer for the new question the OP asked (how to delete records):

Given a file with control-Ls delimiting records and a desire to print specific lines from specific records, just set your record separator to control-L and your field separator to "\n" and print whatever you like. For example, the command to get the output the OP says he wants from the input he posted would just be:

awk -v RS='^L' -F'\n' 'NR==3{print $1}' file

The ^L shown here represents a literal control-L, and it's the 3rd record because there's an empty record before the first control-L in the input file.
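The record numbering can be checked on a small made-up sample. This sketch writes `\f` instead of a literal control-L and sets FS with `-v` rather than `-F`; both are assumptions for portability, not part of the original answer.

```shell
# Hypothetical sample: an empty record before the first form feed,
# then a PAGE 2 record and a PAGE 1 record.
sample() {
  printf '\fHEADER PAGE 2\nabc\n\fHEADER PAGE 1\nghi\n'
}

# RS is a single form feed, so each record is the text between form feeds
# (the separator itself is not part of the record). FS is a newline, so $1
# is the first line of a record. Record 1 is the empty text before the first \f.
out=$(sample | awk -v RS='\f' -v FS='\n' 'NR == 3 { print $1 }')
printf '%s\n' "$out"
```

Note that the printed line has no leading control-L: with RS set to the form feed, the separator is consumed and is not part of `$0` or `$1`.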


This is the answer to the original question the OP asked:

You want this:

awk '/PAGE 2/ {f=1} /\x0c/{f=0} f' file

but also try these to see the difference (for the future):

awk '/PAGE 2/ {f=1} f; /\x0c/{f=0}' file
awk 'f; /PAGE 2/ {f=1} /\x0c/{f=0}' file
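The three placements differ only in where `f` is tested relative to where it is set and cleared. A sketch on made-up input, using `\f` rather than the gawk-style `\x0c` (an assumption for portability):

```shell
# Hypothetical input: the start marker on line 2, a form-feed line on line 4.
sample() { printf 'x\nPAGE 2\nb\n\fc\nd\n'; }

# Test f after set and clear: start line kept, end line dropped.
excl_end=$(sample | awk '/PAGE 2/ {f=1} /\f/ {f=0} f')
# Test f between set and clear: both start and end lines kept.
incl_both=$(sample | awk '/PAGE 2/ {f=1} f; /\f/ {f=0}')
# Test f before set and clear: start line dropped, end line kept.
excl_start=$(sample | awk 'f; /PAGE 2/ {f=1} /\f/ {f=0}')

printf '%s\n---\n%s\n---\n%s\n' "$excl_end" "$incl_both" "$excl_start"
```

In the OP's data the PAGE 2 line itself begins with ^L, so in the first form `f` is set and then cleared on that same line and the header is never printed; clearing first and setting second (`/\f/{f=0} /PAGE 2/{f=1} f`) keeps it.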

And finally, FYI, the following idioms describe how to select a range of records given a specific pattern to match:

a) Print all records from some pattern:

awk '/pattern/{f=1}f' file

b) Print all records after some pattern:

awk 'f;/pattern/{f=1}' file

c) Print the Nth record after some pattern:

awk 'c&&!--c;/pattern/{c=N}' file

d) Print every record except the Nth record after some pattern:

awk 'c&&!--c{next}/pattern/{c=N}1' file

e) Print the N records after some pattern:

awk 'c&&c--;/pattern/{c=N}' file

f) Print every record except the N records after some pattern:

awk 'c&&c--{next}/pattern/{c=N}1' file

g) Print the N records from some pattern:

awk '/pattern/{c=N}c&&c--' file

I changed the variable name from "f" for "found" to "c" for "count" where appropriate, as that's more expressive of what the variable actually IS.
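The idioms above can be compared side by side on a tiny made-up input, with the pattern written as `P` and N fixed at 2 (both placeholders):

```shell
# Hypothetical input: the pattern P is on line 2.
sample() { printf 'a\nP\nb\nc\nd\n'; }

from=$(sample | awk '/P/{f=1}f')                     # a) from the pattern
after=$(sample | awk 'f;/P/{f=1}')                   # b) after the pattern
nth=$(sample | awk 'c&&!--c;/P/{c=2}')               # c) the 2nd record after
not_nth=$(sample | awk 'c&&!--c{next}/P/{c=2}1')     # d) all but the 2nd after
n_after=$(sample | awk 'c&&c--;/P/{c=2}')            # e) the 2 records after
not_n_after=$(sample | awk 'c&&c--{next}/P/{c=2}1')  # f) all but the 2 after
n_from=$(sample | awk '/P/{c=2}c&&c--')              # g) the 2 records from

for v in "$from" "$after" "$nth" "$not_nth" "$n_after" "$not_n_after" "$n_from"; do
  printf '%s\n---\n' "$v"
done
```

The counting forms rely on `c` being 0 (false) until the pattern is seen, so `c&&...` short-circuits and never decrements before the match.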

Ed Morton

Tell sed not to print the line containing the character:

sed -n '/PAGE 2/,/\x0c/{/\x0c/!p}' filename.txt
choroba
  • This solution is better, but it also neglects printing the first token's line. I do need to hit that line. – avgvstvs Nov 01 '12 at 13:27
  • Added test data to make this more concise. – avgvstvs Nov 01 '12 at 13:45
  • @avgvstvs, if this neglects printing the first token's line, does that mean the line matching the first token also matches the second token? – doubleDown Nov 01 '12 at 14:00
  • Sorry, with the posted test data it incorrectly deletes the first line, I suspect because the `{/\x0c/!p}` clause looks for ANY form feed character at all, which the first line WILL contain. So the output strips the first and last lines and leaves everything else. The correct command is `sed '/PAGE [2-9]/,/\x0c/{/\x0c$/!d}'`
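The effect described in that last comment can be reproduced on a made-up sample. This sketch puts a literal form feed in a shell variable instead of writing `\x0c`, which is a GNU sed extension; that substitution is an assumption made so the script runs on other seds too.

```shell
# Hypothetical sample in the shape of the OP's data: the PAGE 2 header
# line itself starts with a form feed.
sample() {
  printf '\fHEADER PAGE 2\nabc\ndef\n\fHEADER PAGE 1\nghi\n'
}

# Literal form feed held in a shell variable.
ff=$(printf '\f')

# Same structure as choroba's command: inside the range, print only lines
# that do NOT contain a form feed. The PAGE 2 header contains one, so it is
# skipped along with the closing PAGE 1 header.
out=$(sample | sed -n '/PAGE 2/,/'"$ff"'/{/'"$ff"'/!p;}')
printf '%s\n' "$out"
```

Only the body lines survive, which matches the OP's report that both the first and the last line of the range are stripped.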

I think this would do it:

awk '/PAGE 2/{a=1}/\x0c/{a=0}{if(a)print}'
amaurea

In this line, the second sed deletes (d) the last line ($).

sed -n '/^START$/,/^STOP$/p' in.txt | sed '$d'
WEFX
  • Won't that delete the last line from all of the output rather than the last line from each block of output? You'd need `| sed '/^STOP$/d'` or similar. – Ed Morton Nov 01 '12 at 13:45
  • That deletes the last line of the first `sed`'s output, but not the last line within each matched range. I posted some test data. – avgvstvs Nov 01 '12 at 13:49
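Ed Morton's point about multiple blocks can be checked on a made-up input with two START/STOP blocks; the input is hypothetical and only the two pipelines come from the thread.

```shell
# Hypothetical input with two START/STOP blocks.
sample() { printf 'START\na\nSTOP\nx\nSTART\nb\nSTOP\n'; }

# WEFX's pipeline: '$d' removes only the very last line of the whole stream.
last_only=$(sample | sed -n '/^START$/,/^STOP$/p' | sed '$d')

# The variant from the comment: delete every STOP line, one per block.
every_stop=$(sample | sed -n '/^START$/,/^STOP$/p' | sed '/^STOP$/d')

printf '%s\n---\n%s\n' "$last_only" "$every_stop"
```

The first pipeline still shows the STOP line of the first block; only the second drops the closing line of each block.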

Following choroba's lead, I edited his command to:

sed '/PAGE [2-9]/,/\x0c/{/\x0c$/!d}'

avgvstvs