Use awk to separate text file into multiple files

Question

I've read a couple of other questions about this, but none of them seem to be working. I'm currently trying to split something like file A.txt using the delimiter "STOPHERE".

This is the code:

#!/bin/bash

awk 'BEGIN{
    RS = "STOPHERE"
    file = 0}
{
    file++
    print $0 > ("sepf" file)
}' A.txt

File A:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa      lwdjnuqqfqaaaaaaaaaa   qlknfqek fkgnl       efekfnwegelflfne
ldnwefne f STOPHEREsdfnkjnf nnnnnnnnnnnnnnnnnnnnnnnasd  fefffffffffffffflllo  

aldn3orn    STOPHERE

fknjke bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbowqff STOPHERE i
asfjfenf STOPHERE

I want it split into these:

sepf1:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa      lwdjnuqqfqaaaaaaaaaa   qlknfqek fkgnl       efekfnwegelflfne
ldnwefne f

sepf2:

sdfnkjnf nnnnnnnnnnnnnnnnnnnnnnnasd  fefffffffffffffflllo  

aldn3orn

sepf3:

    #line starts here
fknjke bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbowqff

sepf4:

 i
asfjfenf

So basically, the formatting has to stay exactly the same between the STOPHERE.

But for some reason, this is the kind of output I'm getting in some of the files:

Eg: sepf2

TOPHEREsdfnkjnf nnnnnnnnnnnnnnnnnnnnnnnasd  fefffffffffffffflllo  

aldn3orn

Any ideas as to why the "TOPHERE" remains??

This says only the first char is used as the record separator http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_19.html — Bob, Mar 25 '16 at 01:00
Is there any way I can change this so that it uses the whole word? — Nematode7, Mar 25 '16 at 01:15

score 0 · Answer 1 · answered Nov 12 '16 at 17:01

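GNU awk allows RS to be a regex, so the record separator can be more than one character. An awk that uses only the first character of RS (as the comment above points out) treats RS = "STOPHERE" as RS = "S", which would explain why "TOPHERE" is left at the start of the next record. Your code can also be simplified: an uninitialized awk variable evaluates to 0 in a numeric context, so file does not need to be set in BEGIN. With GNU awk, this writes each record to its own file: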

awk -v RS="STOPHERE" '{print $0 > ("sepf" ++file)}' A.txt
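
If GNU awk is not available, the same split can be done without a multi-character RS. The following is a minimal sketch, not part of the original answer: it reads the file line by line and cuts each line at the marker with index() and substr(). The function name split_on_marker and its three arguments are hypothetical names chosen for illustration, and it assumes the marker contains no backslashes (awk -v interprets escape sequences).

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer):
#   split_on_marker FILE MARKER PREFIX
# Writes the text between markers to PREFIX1, PREFIX2, ...
# Example call: split_on_marker A.txt STOPHERE sepf
split_on_marker() {
    awk -v sep="$2" -v prefix="$3" '
    BEGIN { n = 1; out = prefix n }
    {
        line = $0
        # Handle every occurrence of the marker on this line.
        while ((i = index(line, sep)) > 0) {
            # Text before the marker ends the current piece (no newline added).
            printf "%s", substr(line, 1, i - 1) > out
            close(out)
            n++
            out = prefix n
            # Continue with whatever follows the marker.
            line = substr(line, i + length(sep))
        }
        # The remainder of the line, plus its newline, goes to the current piece.
        print line > out
    }' "$1"
}
```

Because the pieces are written with printf "%s", no extra newline is appended at the marker, whereas print $0 in the one-liner above adds one to every record. In both approaches, the newline after a final STOPHERE ends up in one more output file of its own.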
