How do I split a file into n parts?

Question

I have a file containing some number of lines. I want to split it into n files with particular names. It doesn't matter how many lines each file contains; I just want a particular number of files (say 5). The problem is that the number of lines in the original file keeps changing, so I need to count the lines and then split the file into 5 parts. If possible, each part should also be sent to a different directory.

Using what? A tool, a programming language, a script ... ? – Stefan Steinegger Jul 07 '10 at 11:48
Windows, Linux? What language(s) do you have available? – Hamish Grubijan Jul 07 '10 at 11:48

score 74 · Answer 1 · edited May 26 '17 at 00:31

In bash, you can use the split command to split a file by a desired number of lines, and the wc command to work out how many lines that should be. Here's wc combined with split in one line.

For example, to split onepiece.log into 5 parts:

    split -l$((`wc -l < onepiece.log`/5)) onepiece.log onepiece.split.log -da 4

This will create files like onepiece.split.log0000 ...

Note: bash division rounds down, so if there is a remainder there will be a 6th part file.
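
To see the remainder effect concretely, here is a small sketch. It assumes GNU split and uses a made-up 12-line sample file in a scratch directory (both are illustration choices, not part of the answer above):

```shell
# Sketch: floor division can leave a remainder file (assumes GNU split).
workdir=$(mktemp -d)             # scratch directory (hypothetical)
seq 12 > "$workdir/sample.log"   # hypothetical 12-line sample file

lines=$(wc -l < "$workdir/sample.log")
per_file=$(( lines / 5 ))        # integer division rounds down: 12 / 5 = 2

split -l "$per_file" -d -a 4 "$workdir/sample.log" "$workdir/sample.split.log"

# Count the pieces: 12 lines at 2 lines per file need more than 5 files.
part_count=$(ls "$workdir"/sample.split.log* | wc -l)
echo "lines per file: $per_file, parts created: $part_count"
```

Rounding the division up instead of down avoids the extra file, at the cost of a shorter last part.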

rogerdpack

answered Dec 16 '13 at 22:26

sketchytechky

27

**split -da 4 -l $((`wc -l < onepiece.log`/5)) onepiece.log part --additional-suffix=".log"** This names the files more intuitively, i.e. part0000.log, part0001.log, etc., instead of split's default naming. -da 4 means we want a numeric suffix of length 4. Read "man split" for more customization options. PS: SO ate the backticks in the comments – grasshopper Jan 06 '14 at 14:04
1

this answer is more concise than the two [higher upvoted questions](http://stackoverflow.com/questions/7764755/unix-how-to-split-a-file-into-equal-parts-without-breaking-individual-lines) on stackoverflow and askubuntu. – Trevor Boyd Smith Nov 15 '16 at 18:12
2

In order to deal with remainders, if anyone like me wants a fixed number of files as output but with a round robin distribution of lines then the above command can be modified to: **split -da 4 -n r/1024 filename filename_split --additional-suffix=".log"**. Replace 1024 with the number of files you want as output. – Vishnu Sep 14 '18 at 07:11
@Vishnu You should make this an answer rather than a comment, it is very helpful and more people will see it as an answer than as a comment. – G. LC Mar 18 '19 at 19:31
1

@G.LC I just added it as an answer. – Vishnu Mar 18 '19 at 20:14
no need to count first, use `split -n l/5` – reachlin Oct 20 '22 at 06:31

score 16 · Answer 2 · edited May 26 '17 at 00:19

On Linux, there is a split command:

    split --lines=1m /path/to/large/file /path/to/output/file/prefix

Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT is -, read standard input.

...

-l, --lines=NUMBER put NUMBER lines per output file

...

You would have to calculate the actual size of the splits beforehand, though.

rogerdpack

answered Jul 07 '10 at 11:48

miku

1

This splits on lines or bytes? – Hamish Grubijan Jul 07 '10 at 11:49
The size of the file also changes daily, so I need a general answer that does not depend on a fixed size or number of lines. – new person Jul 07 '10 at 11:50
I have to write a shell script for this. Can anybody help me with it? – new person Jul 07 '10 at 11:53
What does the syntax `--lines=1m` mean? – Alessandro C Mar 24 '21 at 11:46

Harold · Answer 3 · 2020-08-27T22:10:07.433

split has an option "--number=CHUNKS" that lets you divide a file into a number of chunks. This is from the (trimmed) output of "split --help":

    -n, --number=CHUNKS     generate CHUNKS output files; see explanation below

    ...

    CHUNKS may be:
    N       split into N files based on size of input
    K/N     output Kth of N to stdout
    l/N     split into N files without splitting lines
    l/K/N   output Kth of N to stdout without splitting lines
    r/N     like 'l' but use round robin distribution
    r/K/N   likewise but only output Kth of N to stdout

In the case of splitting it into 5 parts, the command would be:

    split --number=l/5 inputfile outputprefix

This might not result in them having the same number of lines, though.
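
The question also asks for each part to end up in a different directory, which split does not do by itself. A minimal sketch of one way to handle it, assuming GNU split; the scratch directory, the 23-line sample file and the directory names dir1 to dir5 are all made up for illustration:

```shell
# Sketch: split into 5 line-aligned chunks, then move each chunk
# into its own directory (assumes GNU split; all names are hypothetical).
workdir=$(mktemp -d)
seq 23 > "$workdir/inputfile"    # hypothetical sample input

# -d -a 1 gives single-digit numeric suffixes: part0 .. part4
split --number=l/5 -d -a 1 "$workdir/inputfile" "$workdir/part"

for i in 0 1 2 3 4; do
    target="$workdir/dir$((i + 1))"   # dir1 .. dir5
    mkdir -p "$target"
    mv "$workdir/part$i" "$target/"
done

ls "$workdir"/dir*
```

The loop relies on split creating all five chunk files, so the suffix range in the loop has to match the chunk count passed to --number.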

If you want them all to have the same number of lines up until the last one, you can use the following command:

    split -l $(( ($(cat "inputfile" | wc -l) + 5 - 1)/5 )) inputfile outputprefix

Both 5s here can be replaced with any other number (making sure they're the same).

Here's an explanation of this command piece by piece:

$( ) returns the output of whatever command you put into it. cat is used here to make sure wc only returns the number of lines without also outputting the input filename.

$(( )) evaluates whatever you put between the parentheses as a mathematical expression (using only integers) and returns the result.

($(cat "inputfile" | wc -l) + 5 - 1)/5 takes the line count of the input file, adds 5, subtracts 1, and divides the result by 5. The addition and subtraction before the division make sure the result is rounded up, so split produces at most the number of parts you want (5 in this case). It can produce fewer: 11 lines at 3 lines per file fit in 4 files.
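
The rounding-up arithmetic can be checked without touching any files. The sketch below uses two hypothetical helper functions, lines_per_part and parts_created, which are names invented here for illustration:

```shell
# Sketch: ceiling division in shell arithmetic (helper names are hypothetical).

# Lines per output file when rounding up: ceil(lines / parts)
lines_per_part() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Number of files that "split -l <per_file>" creates for a given line count
parts_created() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

parts=5
for lines in 10 11 14 6; do
    per_file=$(lines_per_part "$lines" "$parts")
    files=$(parts_created "$lines" "$per_file")
    echo "lines=$lines per_file=$per_file files=$files"
done
```

Both helpers use the same (a + b - 1) / b trick; the second one just applies it to the lines-per-file value to predict how many files come out.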

You can also use split --number=r/5 to split it into five files, with lines dealt out to them in turn, as in the following example:

inputfile.txt:
1
2
3
4
5
6
7
8
9

outputfile1:
1
6

outputfile2:
2
7

outputfile3:
3
8

outputfile4:
4
9

outputfile5:
5

This doesn't preserve the file order, but it can be useful in cases where that isn't important.
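
The round-robin example above can be reproduced with a short sketch, assuming GNU split. The scratch directory is made up, and because -d numbers files from 0, the outputs are named outputfile0 to outputfile4 rather than outputfile1 to outputfile5:

```shell
# Sketch: round-robin distribution of 9 lines over 5 files (assumes GNU split).
workdir=$(mktemp -d)                 # scratch directory (hypothetical)
seq 9 > "$workdir/inputfile.txt"     # lines 1..9, as in the example above

split --number=r/5 -d -a 1 "$workdir/inputfile.txt" "$workdir/outputfile"

# Print each output file on one line to show which input lines it received
for f in "$workdir"/outputfile?; do
    echo "$(basename "$f"): $(tr '\n' ' ' < "$f")"
done
```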

Adding clarity from a comment in https://stackoverflow.com/a/45704863/394585 . Note that the `number=l/5` has the letter l in the numerator, not the number 1. It stumped me. — svenski, Sep 17 '20 at 10:20

score 6 · Answer 4 · answered Jul 07 '10 at 11:51

Assuming you are processing a text file, use wc -l to determine the total number of lines and split -l to split it into files of a specified number of lines (total / 5 in your case). This works on UNIX/Mac, and on Windows if you have Cygwin installed.

bjg

score 3 · Answer 5 · answered Nov 10 '21 at 12:08

On macOS you can simply do:

    split -n <number_of_parts> <filename>

For example:

    split -n 5 file.txt

This splits the file into 5 pieces of similar size. The split is by size rather than by line count, so a line may be cut across two files.

Luca Di Liello

score 2 · Answer 6 · answered Mar 18 '19 at 20:13

This is building on the original answers given by @sketchytechky and @grasshopper. If you would like to deal with remainders differently and want a fixed number of files as output but with a round robin distribution of lines, then the split command should be written as:

    split -da 4 -n r/1024 filename filename_split --additional-suffix=".log"

Replace 1024 with the number of files you want as output.

score 1 · Answer 7 · answered Aug 08 '21 at 12:28

Here's a one-liner with variables:

    file=onepiece.log; nsplit=5; len=$(wc -l < "$file"); split -l$(($len/$nsplit)) "$file" "$file.split" -da 4

FarisHijazi

From the lazy man inside of me, THANKS! Easiest solution. – ejkitchen Sep 13 '22 at 21:59

tvanfosson · Answer 8 · 2010-07-07T12:06:51.513

I can think of a few ways to do it. Which you would use depends a lot on the data.

1. Lines are fixed length: Find the size of the file by reading its directory entry and divide by the line length to get the number of lines. Use this to determine how many lines per file.
2. The files only need to have approximately the same number of lines: Again read the file size from the directory entry. Read the first N lines (N should be small but some reasonable fraction of the file) to calculate an average line length. Calculate the approximate number of lines based on the file size and predicted average line length. This assumes that the line length follows a normal distribution. If not, adjust your method to randomly sample lines (using seek() or something similar). Rewind the file after you have your average, then split it based on the predicted line length.
3. Read the file twice: The first time, count the number of lines. The second time, split the file into the requisite pieces.

EDIT: Using a shell script (according to your comments), the randomized version of #2 would be hard unless you wrote a small program to do that for you. You should be able to use ls -l to get the file size, wc -l to count the exact number of lines, and head -nNNN | wc -c to calculate the average line length.
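
A rough sketch of the non-randomized version of #2 in shell. The sample file, the scratch directory and the sample size of 100 lines are assumptions made for illustration, and wc -c stands in for reading the size from ls -l:

```shell
# Sketch: estimate the line count from file size and average line length.
workdir=$(mktemp -d)                  # scratch directory (hypothetical)
seq 1000 > "$workdir/data.txt"        # hypothetical sample file

sample_lines=100                      # how many leading lines to sample
file_bytes=$(wc -c < "$workdir/data.txt")
sample_bytes=$(head -n "$sample_lines" "$workdir/data.txt" | wc -c)

# estimated lines = file size / (sample bytes / sample lines),
# rearranged so the integer division happens last
est_lines=$(( file_bytes * sample_lines / sample_bytes ))
actual_lines=$(wc -l < "$workdir/data.txt")

echo "estimated=$est_lines actual=$actual_lines"
echo "estimated lines per part (5 parts): $(( (est_lines + 4) / 5 ))"
```

With this particular sample file the estimate overshoots, because the first 100 lines are shorter than the rest. That is the weakness of sampling only the head of the file that the randomized variant is meant to address.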

score 0 · Answer 9 · answered Oct 20 '22 at 06:32

On Linux:

    split -n l/5 -da 2 test.txt

reachlin
