I have a file containing some number of lines. I want to split it into n files with particular names. It doesn't matter how many lines each file contains; I just want a particular number of files (say 5). The problem is that the number of lines in the original file keeps changing, so I need to count the lines and then split the file into 5 parts. If possible, each part should go into a different directory.
-
Using what? A tool, a programming language, a script ... ? – Stefan Steinegger Jul 07 '10 at 11:48
-
Windows, Linux? What language(s) do you have available? – Hamish Grubijan Jul 07 '10 at 11:48
9 Answers
In bash, you can use the `split` command to split a file based on the number of lines desired, and the `wc` command to work out how many lines that is. Here's `wc` combined with `split` in one line.
For example, to split `onepiece.log` into 5 parts:
split -l$((`wc -l < onepiece.log`/5)) onepiece.log onepiece.split.log -da 4
This will create files like `onepiece.split.log0000` ...
Note: bash division rounds down, so if there is a remainder there will be a 6th part file.
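The remainder effect is easy to check on a throwaway file. The following is only a sketch, assuming GNU `split`; the 12-line sample file, the temporary directory and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Demonstrates that floor division can yield one extra part file.
workdir=$(mktemp -d)                      # scratch directory (illustrative)
seq 1 12 > "$workdir/sample.log"          # 12 lines, not divisible by 5

parts=5
total=$(wc -l < "$workdir/sample.log")    # number of lines in the sample
per_file=$(( total / parts ))             # integer division rounds down

split -l "$per_file" -d -a 4 "$workdir/sample.log" "$workdir/sample.split.log"

# Count the part files that were produced.
count=$(find "$workdir" -name 'sample.split.log*' | wc -l)
echo "lines per file: $per_file, part files: $count"
```

With 12 lines and 5 requested parts, this reports 2 lines per file and 6 part files. Note that a file with fewer lines than requested parts would give 0 lines per file, which `split` rejects.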

-
**split -da 4 -l $((`wc -l < onepiece.log`/5)) onepiece.log part --additional-suffix=".log"** This will name the files in an intuitive way, i.e. part0000.log, part0001.log etc., instead of the default naming of split. -da 4 means we want a numerical suffix of length 4. Read "man split" for more customization options. PS: SO ate the backticks in the comments – grasshopper Jan 06 '14 at 14:04
-
This answer is more concise than the two [higher upvoted questions](http://stackoverflow.com/questions/7764755/unix-how-to-split-a-file-into-equal-parts-without-breaking-individual-lines) on stackoverflow and askubuntu. – Trevor Boyd Smith Nov 15 '16 at 18:12
-
In order to deal with remainders, if anyone like me wants a fixed number of files as output but with a round robin distribution of lines then the above command can be modified to: **split -da 4 -n r/1024 filename filename_split --additional-suffix=".log"**. Replace 1024 with the number of files you want as output. – Vishnu Sep 14 '18 at 07:11
-
@Vishnu You should make this an answer rather than a comment, it is very helpful and more people will see it as an answer than as a comment. – G. LC Mar 18 '19 at 19:31
On Linux, there is a `split` command:
split --lines=1m /path/to/large/file /path/to/output/file/prefix
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT is -, read standard input.
...
-l, --lines=NUMBER put NUMBER lines per output file
...
You would have to calculate the actual size of the splits beforehand, though.

-
The size of the file also changes daily, so I need a general answer that does not rely on a fixed size or number of lines. – new person Jul 07 '10 at 11:50
-
I have to write a shell script for this. Can anybody help me with it? – new person Jul 07 '10 at 11:53
-
split has an option "--number=CHUNKS" that lets you divide a file into a number of chunks. This is from the (trimmed) output of "split --help":
-n, --number=CHUNKS generate CHUNKS output files; see explanation below
...
CHUNKS may be:
N split into N files based on size of input
K/N output Kth of N to stdout
l/N split into N files without splitting lines
l/K/N output Kth of N to stdout without splitting lines
r/N like 'l' but use round robin distribution
r/K/N likewise but only output Kth of N to stdout
In the case of splitting it into 5 parts, the command would be:
split --number=l/5 inputfile outputprefix
This might not result in them having the same number of lines, though.
If you want them all to have the same number of lines up until the last one, you can use the following command:
split -l $(( ($(cat "inputfile" | wc -l) + 5 - 1)/5 )) inputfile outputprefix
Both 5s here can be replaced with any other number (making sure they're the same).
Here's an explanation of this command piece by piece:
`$( )` returns the output of whatever command you put into it. `cat` is used here to make sure `wc` only returns the number of lines without also outputting the input filename.
`$(( ))` evaluates whatever you put between the parentheses as a mathematical expression (using only integers) and returns the result.
`($(cat "inputfile" | wc -l) + 5 - 1)/5` takes the line count of the input file, adds 5, subtracts 1, and divides the result by 5. Adding and subtracting before the division rounds the result up, so you never get more than the number of parts you want (5 in this case); a file with only a few lines can still produce fewer parts.
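To see what rounding up does in practice, here is a small sketch run on three throwaway files. `lines_per_part` is a hypothetical helper name, and the sample sizes (10, 12 and 13 lines) and file names are chosen only for illustration.

```shell
#!/usr/bin/env bash
# Ceiling division: lines per file so that no more than $2 files are produced.
# lines_per_part is a hypothetical helper, not part of split.
lines_per_part() {
    local total=$1 parts=$2
    echo $(( (total + parts - 1) / parts ))
}

workdir=$(mktemp -d)                      # scratch directory (illustrative)
for total in 10 12 13; do
    seq 1 "$total" > "$workdir/in_$total"
    per_file=$(lines_per_part "$total" 5)
    split -l "$per_file" "$workdir/in_$total" "$workdir/out_${total}_"
    count=$(find "$workdir" -name "out_${total}_*" | wc -l)
    echo "$total lines -> $per_file per file -> $count files"
done
```

The 10-line and 13-line files each come out as 5 parts, while the 12-line file comes out as only 4 parts of 3 lines each. Rounding up caps the number of files at 5 but does not force it to be exactly 5; `split --number=l/5` is the option to reach for when the count itself must be fixed.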
You can also use `split --number=r/5` to split it into five files, where the lines are distributed between them in round-robin order, as in the following example:
inputfile.txt:
1
2
3
4
5
6
7
8
9
outputfile1:
1
6
outputfile2:
2
7
outputfile3:
3
8
outputfile4:
4
9
outputfile5:
5
This doesn't preserve the file order, but it can be useful in cases where that isn't important.
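The question also asks about sending each part to a different directory. One possible sketch, assuming GNU `split` (for `-n l/N`) and using made-up names (`part00`..., `dir1`...) inside a temporary directory, is:

```shell
#!/usr/bin/env bash
# Split a file into 5 line-aligned parts and move each part into its own directory.
# Requires GNU split for -n l/N; all file and directory names are illustrative.
workdir=$(mktemp -d)
seq 1 23 > "$workdir/inputfile"           # sample input with 23 lines

parts=5
split -n "l/$parts" -d -a 2 "$workdir/inputfile" "$workdir/part"

i=0
for f in "$workdir"/part[0-9][0-9]; do
    i=$(( i + 1 ))
    mkdir -p "$workdir/dir$i"             # one directory per part
    mv "$f" "$workdir/dir$i/"
done
echo "moved $i parts"
```

Because `-n l/5` fixes the number of output files regardless of how many lines the input has, the loop does not need to know the line count. In a real script the `dir$i` names would be replaced by whatever target directories are required.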

-
Adding clarity from a comment in https://stackoverflow.com/a/45704863/394585 . Note that the `number=l/5` has the letter l in the numerator, not the number 1. It stumped me. – svenski Sep 17 '20 at 10:20
-
On macOS you can simply do:
split -n <number_of_parts> <filename>
For example, you can do
split -n 5 file.txt
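And it will be split into 5 files of similar size. Note that plain `-n 5` divides the file by size rather than by line count, so a line may be cut across two parts.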

This is building on the original answers given by @sketchytechky and @grasshopper. If you would like to deal with remainders differently and want a fixed number of files as output but with a round robin distribution of lines, then the split command should be written as:
split -da 4 -n r/1024 filename filename_split --additional-suffix=".log"
Replace 1024 with the number of files you want as output.

Here's a one-liner with variables:
file=onepiece.log; nsplit=5; len=$(wc -l < "$file"); split -l$((len/nsplit)) "$file" "$file.split" -da 4

I can think of a few ways to do it. Which you would use depends a lot on the data.
1. Lines are fixed length: Find the size of the file by reading its directory entry and divide by the line length to get the number of lines. Use this to determine how many lines per file.
2. The files only need to have approximately the same number of lines: Again read the file size from the directory entry. Read the first N lines (N should be small but some reasonable fraction of the file) to calculate an average line length. Calculate the approximate number of lines based on the file size and predicted average line length. This assumes that the line length follows a normal distribution. If not, adjust your method to randomly sample lines (using seek() or something similar). Rewind the file after you have your average, then split it based on the predicted line length.
3. Read the file twice: The first time, count the number of lines. The second time, split the file into the requisite pieces.
EDIT: Using a shell script (according to your comments), the randomized version of #2 would be hard unless you wrote a small program to do that for you. You should be able to use `ls -l` to get the file size, `wc -l` to count the exact number of lines, and `head -nNNN | wc -c` to calculate the average line length.
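As a rough sketch of the non-randomized estimate, the script below samples the first 100 lines and scales up by the file size. It uses `wc -c` in place of parsing `ls -l` output, and the sample file, the sample size of 100 and all variable names are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Estimate the line count from the file size and a sample of the first lines,
# then compare with the exact count from wc -l.
workdir=$(mktemp -d)
seq 1000 1999 > "$workdir/data.txt"       # 1000 lines of 5 bytes each (illustrative)

sample_lines=100
file_bytes=$(wc -c < "$workdir/data.txt")
sample_bytes=$(head -n "$sample_lines" "$workdir/data.txt" | wc -c)

# estimated lines = file size / average line length of the sample
estimated=$(( file_bytes * sample_lines / sample_bytes ))
exact=$(wc -l < "$workdir/data.txt")

parts=5
per_file=$(( (estimated + parts - 1) / parts ))   # round up
echo "estimated=$estimated exact=$exact lines_per_file=$per_file"
```

Because every line in this sample file has the same length, the estimate matches the exact count. With variable-length lines it is only an approximation, and the sketch assumes the file has at least `sample_lines` lines and is not empty.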
