2

I have a large file of 5000+ lines that contains a repeated pattern like the one shown below:

ABC
111
222
333
XYZ

ABC
444
555
666
777
XYZ

..
..

ABC
777777777
888888888
999999999
222
333
111
XYZ

I would like to extract the contents between each 'ABC' and 'XYZ' (markers included) and write each block to a separate file.

Ex: file1 should have

ABC
111
222
333
XYZ

File2 should have

ABC
444
555
666
777
XYZ

Filen should have

ABC
777777777
888888888
999999999
222
333
111
XYZ

and so on.

How can this be achieved? I read the threads below, but their solutions write everything to a single file, so they didn't help in my case.

How to select lines between two marker patterns which may occur multiple times with awk/sed

Print lines between two patterns to new file

bala

3 Answers

4
awk '/^ABC/{file="file"c++}{print >>file}' a
bian
  • Well done, you could maybe add an ending *session* to avoid empty lines after `XYZ` like `awk '/^ABC/{file="file"c++;w=1}w{print >>file} /^XYZ/{w=0}' a` – NeronLeVelu Oct 21 '15 at 08:37
  • Like this: `awk '/^ABC/{file="file"c++;a=1}a{print >>file}/^XYZ/{a=0}' a` – bian Oct 21 '15 at 08:39
  • Right, you are fast; I was still editing and pasting my reply :-) – NeronLeVelu Oct 21 '15 at 08:40
  • Wow !!! Awk is so powerful...thanks a lot A-Ray ..it worked. Thanks to NeronLeVelu. Thank you to you both for your help. – bala Oct 21 '15 at 08:44
  • Nicely done, but you probably meant to use `>`, not `>>`, as you'll otherwise _append_ to any _preexisting_ files; sounds like the OP wants to start numbering at _1_, which would make it `++c`. – mklement0 Oct 22 '15 at 00:47
  • Also, it's probably worth closing the previous file before opening a new one so as not to keep opening more and more file handles: `if(file){close(file)} file=...` – mklement0 Oct 22 '15 at 00:53
  • Thanks @mklement0, you are right! Using `>>` means more file handles. – bian Oct 22 '15 at 01:22
  • Actually, these are 2 unrelated problems: (a) using `>>` just means that if a given output file already existed before the `awk` command is invoked, it will be _appended_ to; (b) irrespective of whether you use `>` or `>>`, not closing a file explicitly when opening a new one could make you run out of file handles with a large number of output files. – mklement0 Oct 22 '15 at 02:55
  • awk '!a{close(file)}/^ABC/{file="file"c++;a=1}a{print >file}/^XYZ/{a=0}' a – bian Oct 22 '15 at 03:26
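
Pulling the suggestions from this comment thread together, a minimal sketch might look like the following. It assumes the markers occupy whole lines and that numbering should start at 1 (`file1`, `file2`, ...). The input name `bigfile`, the sample data, and the variable names `out`, `n` and `inside` are placeholders chosen for illustration, not part of the original answer.

```shell
# Sample input for the sketch; "bigfile" is a placeholder name.
printf '%s\n' ABC 111 222 333 XYZ '' ABC 444 555 666 777 XYZ > bigfile

awk '
  # Start of a block: close the previous output file (if any), pick the next name.
  $0 == "ABC" { if (out) close(out); out = "file" (++n); inside = 1 }
  # Inside a block: ">" truncates the file when it is first opened, then appends.
  inside      { print > out }
  # End of a block: stop writing until the next ABC line.
  $0 == "XYZ" { inside = 0 }
' bigfile
```

With the sample input, this should leave `file1` and `file2` in the current directory, each running from `ABC` through `XYZ`, with the blank separator line written to neither.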
2

Perl to the rescue!

< bigfile perl -nwe 'print {$OUT} $_
                         if (/ABC/ && do { open $OUT, ">", "file" . ++$i or die $!}
                            ) ... /XYZ/'
  • -n reads the input line by line
  • a line is printed only if it falls between /ABC/ and /XYZ/ (both included)
  • when /ABC/ matches, i.e. we're starting a new section, a new file is opened and associated with the filehandle $OUT; $i is the number of the file.
choroba
  • Thanks a lot, choroba. I'll keep this Perl script and use it later. Thanks for your time and for helping me out. – bala Oct 21 '15 at 08:45
  • Nicely done; I suggest using something other than `${O}` for the file handle, because it is easy to confuse that with `${0}` (zero). With the `-w` flag, you actually do get a warning here about using `$i` only once (Perl v5.18.2); an alternative to omitting `-w` is to prepend `BEGIN{$i}`. Also, given that the opening and closing lines of the range are expected on _different_ lines, it's better to use `...` than `..` - am I correct in assuming that reusing the same file handle implicitly closes the previously open file? – mklement0 Oct 22 '15 at 02:25
1
awk '
  # Set up the output file name (file0, file1, file2, ...), closing the previous file first
  $0 == "ABC" { if (i) close(f); f = "file" i++ }
  # Use an inclusive range match to print everything from ABC through XYZ
  $0 == "ABC", $0 == "XYZ" { print > f }
'
Andreas Louv
  • Thanks for updating, but from what I understand you don't need to truncate the file explicitly - just use `print > f` in lieu of `print >> f`. The redirection operators inside `awk` work differently from the shell's: in an `awk` script, using `>` in every iteration will NOT recreate the file every time; instead, it will implicitly open/truncate the file on first access, and then keep appending until the file is closed (either explicitly, or implicitly upon termination of `awk`). – mklement0 Oct 22 '15 at 02:39
  • @mklement0 Sweet, I didn't know that. I'm fairly new to awk :-) – Andreas Louv Oct 22 '15 at 02:42
  • @mklement0 Sure, but can you elaborate a little on what could happen with an unclosed handle? I mean, wouldn't it be closed when awk terminates? – Andreas Louv Oct 22 '15 at 02:47
  • Yes, they would be closed automatically eventually, but with a large number of output files you could run out of file handles before the script finishes. – mklement0 Oct 22 '15 at 02:48
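
The `>` behaviour described in the comments above can be checked with a small experiment. This is only a sketch; `demo.txt` is a made-up file name used for illustration.

```shell
# A pre-existing file whose content should be replaced.
echo 'old content' > demo.txt

# Three prints to the same file using ">": awk truncates demo.txt once,
# when it first opens the file, and the later prints append to it.
awk 'BEGIN { for (i = 1; i <= 3; i++) print "line " i > "demo.txt" }'

cat demo.txt
```

The file should end up with `line 1`, `line 2` and `line 3` and no trace of `old content`. Using `>>` instead would keep `old content` as the first line, which is why `>` is usually the safer choice when the output files may already exist.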