
I have a large text file containing 1000 abstracts with an empty line between each abstract. I want to split this file into 1000 text files. My file looks like:

16503654    Three-dimensional structure of neuropeptide k bound to dodecylphosphocholine micelles.      Neuropeptide K (NPK), an N-terminally extended form of neurokinin A (NKA), represents the most potent and longest lasting vasodepressor and cardiomodulatory tachykinin reported thus far.  

16504520    Computer-aided analysis of the interactions of glutamine synthetase with its inhibitors.        Mechanism of inhibition of glutamine synthetase (EC 6.3.1.2; GS) by phosphinothricin and its analogues was studied in some detail using molecular modeling methods. 
shalini
  • I would suggest avoiding creating too many files or directories in one directory. It can definitely slow down the stat(2) calls. A few thousand is not a big issue, but tens of thousands can be. Of course this limit depends on the machine (HDD), operating system and file system you are using. – TrueY Apr 29 '13 at 09:05
  • possible duplicate of [Split a .txt file based on content](http://stackoverflow.com/questions/8544684/split-a-txt-file-based-on-content) – tripleee Jun 28 '13 at 04:35
  • possible duplicate of [Split one file into multiple files based on delimiter](http://stackoverflow.com/questions/11313852/split-one-file-into-multiple-files-based-on-delimiter) – Gilles 'SO- stop being evil' Jul 02 '13 at 12:52

3 Answers


You can use split and set the "NUMBER lines per output file" option to 2. Each output file will then contain one abstract line and one empty line.

split -l 2 file
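For instance, a minimal sketch with a two-abstract sample file (the -d numeric-suffix option is a GNU extension and may not exist on BSD split; the filenames here are invented for illustration):

```shell
# Scratch directory with a sample file: abstract line, blank line, repeated.
rm -rf /tmp/split_demo && mkdir -p /tmp/split_demo && cd /tmp/split_demo
printf '%s\n\n%s\n\n' '16503654 First abstract.' '16504520 Second abstract.' > file

# Two lines per output file: one abstract line plus its trailing blank line.
# -d gives numeric suffixes instead of aa, ab, ...
split -l 2 -d file abstract_

ls abstract_*                 # two files: abstract_00, abstract_01
head -1 abstract_00           # prints: 16503654 First abstract.
```

Note this relies on every abstract (including the last) being followed by exactly one blank line; if the final blank line is missing, the last piece simply has one line.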
Alper

Something like this:

awk 'NF{print > $1;close($1);}' file

This will create 1000 files, each named after its abstract number. The awk code writes each record to a file whose name is taken from the first field ($1) and then closes it. This is done only when the number of fields is greater than zero (NF), i.e. the line is not empty.

Guru
  • Thanks for the quick response. It worked, but it shows `awk: 9276016 makes too many open files input record number 35, file pmid.txt source line number 1`. I tried different files, and for every file it shows the error at the same line number 35. Does it have any limit? – shalini Apr 29 '13 at 07:43
  • I faced another problem. My file has some lines starting with Conclusion or Results under the abstract number, in which case the command you mentioned generates extra files named Conclusion and Results, which I don't want. Please help me out. – shalini May 10 '13 at 13:12
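Both issues raised in the comments can be handled with a small variant of the command. This is a sketch (the numeric filter and the append operator are my additions, not part of the answer; the filename pmid.txt is taken from the error message above): restricting the filename to numeric first fields skips lines such as "Conclusion" or "Results", and close() after each write, combined with >> so reopening appends rather than truncates, keeps at most one file open at a time.

```shell
# Sample input mimicking the problem: a stray non-numeric line between records.
rm -rf /tmp/awk_demo && mkdir -p /tmp/awk_demo && cd /tmp/awk_demo
cat > pmid.txt <<'EOF'
16503654 Three-dimensional structure of neuropeptide K.
Conclusion
16504520 Computer-aided analysis of glutamine synthetase.
EOF

# Only write when the first field is all digits; close() each time so we
# never hit the per-process open-file limit. >> is needed because after
# close(), a plain > would truncate the file on reopen.
awk 'NF && $1 ~ /^[0-9]+$/ { print >> $1; close($1) }' pmid.txt

ls    # files 16503654 and 16504520 are created; Conclusion is skipped
```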

You could always use the csplit command. It is a file splitter, but one based on a regex.

Something along the lines of:

csplit -ks -f /tmp/files INPUTFILENAMEGOESHERE '/^$/'

It is untested and may need a little tweaking, though.
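A quick test of the idea (a sketch with invented filenames; note the trailing '{*}' repeat argument, a GNU csplit feature, without which csplit splits only at the first blank line):

```shell
# Sample input: two chunks separated by an empty line.
rm -rf /tmp/csplit_demo && mkdir -p /tmp/csplit_demo && cd /tmp/csplit_demo
printf '%s\n\n%s\n' 'abstract one' 'abstract two' > input.txt

# -k: keep output files even on error; -s: silent (no byte counts).
# Split before every empty line, repeating as often as possible.
csplit -ks -f part_ input.txt '/^$/' '{*}'

ls part_*    # two pieces: part_00 and part_01
```

Unlike split -l 2, this does not depend on each abstract being exactly one line, so it also works when abstracts span multiple lines.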

CSPLIT

Indent
FreudianSlip
  • I prefer this over the 'awk' solutions. To split one large file (LDIF format) with empty lines separating the chunks, I used the 'repeat pattern' and 'suppress matching line' options: `csplit -m -f /tmp/files INPUTFILE '/^\s*$/' '{*}'` – bovender Apr 16 '15 at 12:16
  • Yeah hooray for csplit. +1. – Steve Kehlet Oct 06 '15 at 23:36