split text file in two using bash script

Question

I have a text file with a marker somewhere in the middle:

one
two
three
blah-blah *MARKER* blah-blah
four
five
six
...

I need to split this file into two files: the first containing everything before MARKER, and the second containing everything after it. It seems this can be done in one line with awk or sed, but I can't figure out how.

I tried the easy way, using csplit, but csplit doesn't play well with Unicode text.

score 12 · Answer 1 · answered Jan 17 '11 at 01:51

You can do it easily with awk:

awk -v RS="MARKER" '{ print $0 > (NR ".txt") }' file
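
To make the behaviour concrete, here is a small sketch run against the sample from the question. It assumes an awk that accepts a multi-character `RS` (gawk and mawk do; POSIX only specifies a single-character separator), and the file name `sample.txt` is made up for the demonstration. Note that the cut happens at the word MARKER itself rather than at a line boundary, so the remainder of the marker line ends up split across both files.

```shell
# Run this in an empty scratch directory: it writes 1.txt and 2.txt.
# Recreate the sample input from the question (hypothetical file name).
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > sample.txt

# RS="MARKER" makes awk treat the marker text as the record separator,
# so record 1 is everything before it and record 2 is everything after it.
awk -v RS="MARKER" '{ print $0 > (NR ".txt") }' sample.txt

# 1.txt ends with "blah-blah *" and 2.txt starts with "* blah-blah".
cat 1.txt
cat 2.txt
```

If whole lines should be kept together, a line-based approach is probably a better fit than changing `RS`.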

ghostdog74

+1: Looove it. So concise and elegant. I've been needing this to discard a large portion of garbage out of logs which came from poorly configured build script. – Rekin Jun 14 '11 at 07:30

Leniel Maccaferri · Answer 2 · 2010-09-05T05:09:19.843

5

Try this:

awk 'BEGIN{n=1} /MARKER/{n++} {print > ("out" n ".txt")}' final.txt

It reads input from final.txt and produces out1.txt, out2.txt, etc.
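
If the marker line should not appear in either file, a variant along these lines may work. This is only a sketch: it assumes numbering should start at out1.txt, so `n` is initialized to 1, and it uses `next` to skip the matching line. The sample input from the question is recreated in `final.txt` purely for illustration.

```shell
# Recreate the sample input from the question.
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > final.txt

awk '
  BEGIN    { n = 1 }                      # first output file is out1.txt
  /MARKER/ { n++; next }                  # switch files and drop the marker line
           { print > ("out" n ".txt") }   # parentheses keep the file name unambiguous
' final.txt
```

With the sample input, out1.txt should hold the three lines before the marker and out2.txt the three lines after it.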

edited Sep 05 '10 at 05:09

answered Sep 04 '10 at 22:46

Almost worked. Doesn't screw up UTF-8, but leaves *MARKER* in the second file. – Sergey Kovalev Sep 04 '10 at 22:53
Have you tried the solution shown here: http://www.unix.com/shell-programming-scripting/41060-split-file-into-seperate-files.html? It uses `csplit` and works the way you want, that is, it leaves the marker out of the files. – Leniel Maccaferri Sep 04 '10 at 23:11
Not working as described. Needs "BEGIN{n=1}" Otherwise the initial file will be named "out.txt" and not "out1.txt". Contrary to your note. -- I tried to add this via an edit but it was rejected. – StackzOfZtuff Aug 29 '18 at 12:32

Dennis Williamson · Answer 3 · 2010-09-04T22:57:11.220

3

sed -n '/MARKER/q;p' inputfile > outputfile1
sed -n '/MARKER/{:a;n;p;ba}' inputfile > outputfile2

Or all in one:

sed -n -e '/MARKER/! w outputfile1' -e '/MARKER/{:a;n;w outputfile2' -e 'ba}' inputfile
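
For readers who find the one-liner hard to parse, the same script can be written with one command per line, which also avoids relying on `;` after labels and file names. This is a sketch of the same idea rather than a different method; the output names `before.txt` and `after.txt` are placeholders, and the sample input is recreated only for the demonstration.

```shell
# Recreate the sample input from the question.
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > inputfile

sed -n '
# Lines that do not match the marker go to the first file.
/MARKER/!w before.txt
# On the marker line, enter a loop that copies every following line
# to the second file until the input runs out.
/MARKER/{
:a
n
w after.txt
ba
}
' inputfile
```

The marker line itself is written to neither file, because `n` replaces it with the next input line before the first `w after.txt` runs.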

edited Sep 04 '10 at 22:57

answered Sep 04 '10 at 22:50

score 1 · Answer 4 · answered Sep 04 '10 at 22:50

The split command will almost do what you want:

$ split -p '\*MARKER\*' splitee 
$ cat xaa
one
two
three
$ cat xab
blah-blah *MARKER* blah-blah
four
five
six
$ tail -n+2 xab
four
five
six

Perhaps it's close enough for your needs.

I have no idea if it does any better with Unicode than csplit, though.
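
Where `split -p` is unavailable, a similar result can be approximated with other standard tools. The following is a sketch under a few assumptions: the marker occurs at least once, only its first occurrence matters, and the output names `before.txt` and `after.txt` are made up. It swaps `split` for `grep -n` plus `head` and `tail`, and it drops the marker line instead of leaving it in the second file.

```shell
# Recreate the sample input used above.
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > splitee

# Line number of the first line containing the literal text *MARKER*
# (-F treats the pattern as a fixed string, so the asterisks are not special).
n=$(grep -nF '*MARKER*' splitee | head -n 1 | cut -d: -f1)

# Everything before that line, then everything after it.
head -n "$((n - 1))" splitee > before.txt
tail -n "+$((n + 1))" splitee > after.txt
```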

Marcelo Cantos

That option does not seem to be available in the version of split included in GNU coreutils; I assume you're using a BSD of some flavor. In any case, on GNU-based operating systems like most Linux distros, coreutils includes both split and csplit, so they should have similar Unicode behavior. – Daniel H Apr 17 '13 at 22:33
