I was wondering if it was possible to split a file into equal parts (edit: = all equal except for the last), without breaking the line? Using the split command in Unix, lines may be broken in half. Is there a way to, say, split up a file in 5 equal parts, but have it still only consist of whole lines (it's no problem if one of the files is a little larger or smaller)? I know I could just calculate the number of lines, but I have to do this for a lot of files in a bash script. Many thanks!
-
What's your definition of "equal" that allows for unequal file sizes? – Kerrek SB Oct 14 '11 at 08:06
-
All equal, except for one (probably the last one). – Abdel Oct 14 '11 at 08:19
-
cross ref: https://askubuntu.com/questions/54579/how-to-split-larger-files-into-smaller-parts# – Trevor Boyd Smith Nov 15 '16 at 18:10
-
One-liner to split equally by N: (1.) split by lines `split --lines $(( $(wc -l < ${your_filename}) / ${N})) ${your_filename}` (2.) split by bytes `split --bytes $(( $(wc -c < ${your_filename}) / ${N})) ${your_filename}` – Trevor Boyd Smith Nov 15 '16 at 18:15
6 Answers
If you mean an equal number of lines, `split` has an option for this:
split --lines=75
If you need to know what that 75 should really be for N equal parts, it's:
lines_per_part = int((total_lines + N - 1) / N)
where total_lines can be obtained with `wc -l`.
See the following script for an example:
#!/usr/bin/bash
# Configuration stuff
fspec=qq.c
num_files=6
# Work out lines per file.
total_lines=$(wc -l <${fspec})
((lines_per_file = (total_lines + num_files - 1) / num_files))
# Split the actual file, maintaining lines.
split --lines=${lines_per_file} ${fspec} xyzzy.
# Debug information
echo "Total lines = ${total_lines}"
echo "Lines per file = ${lines_per_file}"
wc -l xyzzy.*
This outputs:
Total lines = 70
Lines per file = 12
12 xyzzy.aa
12 xyzzy.ab
12 xyzzy.ac
12 xyzzy.ad
12 xyzzy.ae
10 xyzzy.af
70 total
More recent versions of `split` allow you to specify a number of CHUNKS with the `-n`/`--number` option. You can therefore use something like:
split --number=l/6 ${fspec} xyzzy.
(that's ell-slash-six, meaning lines, not one-slash-six).
That will give you roughly equal files in terms of size, with no mid-line splits.
I mention that last point because it doesn't give you roughly the same number of lines in each file, more the same number of characters.
So, if you have one 20-character line and 19 1-character lines (twenty lines in total) and split to five files, you most likely won't get four lines in every file.
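To see the difference on exactly that input, here is a small sketch. It assumes GNU split (coreutils 8.8 or later) for the `--number` option; the file names `demo.txt`, `by_size.` and `by_lines.` are made up for the demo.

```shell
#!/bin/sh
# Work in a scratch directory so the demo leaves nothing behind elsewhere.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Demo input: one 20-character line followed by 19 one-character lines.
{
    printf '%s\n' 'aaaaaaaaaaaaaaaaaaaa'
    i=0
    while [ "$i" -lt 19 ]; do
        printf 'b\n'
        i=$((i + 1))
    done
} > demo.txt

# Split by size into 5 chunks without breaking lines (GNU split only).
split --number=l/5 demo.txt by_size.

# Split by line count instead: 20 lines / 5 files = 4 lines per file.
split --lines=4 demo.txt by_lines.

# Print line and byte counts for both sets of pieces, for comparison.
wc -lc by_size.* by_lines.*
```

The `by_lines.` pieces have four lines each, while the `by_size.` pieces are balanced by bytes, so their line counts differ.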

-
That would split up my file in pieces of 75 lines.. But I was wondering if there was a split command option, where I could say for example that I want the file to be split up in 5 equal parts (without giving the nr of lines), and that would split up the file in 5 equal parts, each consisting of complete lines only. – Abdel Oct 14 '11 at 08:15
-
One-line command to do a similar thing in OS X... split -l num_of_lines_per_file original_file destination_files. – Tim Dearborn Apr 04 '13 at 13:46
-
For massive files (my test file is 83 million lines long), the above method to count lines is a little slow; you can actually pass the filename itself as an argument to `wc` without having to cat the whole file, e.g. `wc -l filename.txt`. `wc` then outputs the count followed by the filename, so you'd have to pipe the output to awk to grab the count, but it's still significantly faster than `cat`ing the whole file and piping it to `wc`
The script isn't even necessary, split(1) supports the wanted feature out of the box:
split -l 75 auth.log auth.log.
The above command splits the file into chunks of 75 lines apiece and writes files of the form auth.log.aa, auth.log.ab, ...
Running `wc -l` on the original file and the output gives:
321 auth.log
75 auth.log.aa
75 auth.log.ab
75 auth.log.ac
75 auth.log.ad
21 auth.log.ae
642 total
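Beyond comparing line counts, you may want to confirm that the pieces reassemble into the original. A minimal sketch; `check_pieces` is a hypothetical helper name, not part of split:

```shell
#!/bin/sh
# check_pieces ORIGINAL PREFIX  (hypothetical helper)
# Succeeds when the pieces named PREFIXaa, PREFIXab, ... concatenate
# back to ORIGINAL byte for byte.
check_pieces() {
    original=$1
    prefix=$2
    # The suffixes aa, ab, ... sort in creation order, so the glob
    # feeds the pieces to cat in their original sequence.
    cat "$prefix"* | cmp -s - "$original"
}
```

For the example above this would be called as `check_pieces auth.log auth.log. && echo "pieces match"`.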

A simple solution for a simple question:
split -n l/5 your_file.txt
no need for scripting here.
From the man page, CHUNKS may be:
l/N split into N files without splitting lines
Update
Not all Unix distributions include this flag. For example, it will not work with the split that ships with OS X. To use it there, you can consider replacing the Mac OS X utilities with GNU core utilities.
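If the script has to run on systems with and without the flag, one option is to try it and fall back to a line count. A rough sketch; `split_whole_lines` is a made-up helper name, and it assumes that a split without `l/N` support rejects the option before writing any output:

```shell
#!/bin/sh
# split_whole_lines FILE N PREFIX  (hypothetical helper)
split_whole_lines() {
    file=$1
    n=$2
    prefix=$3
    # Try the GNU chunk option first.
    if split -n "l/$n" "$file" "$prefix" 2>/dev/null; then
        return 0
    fi
    # Fallback: lines per piece = total lines / N, rounded up.
    total=$(wc -l < "$file")
    per=$(( (total + n - 1) / n ))
    # split -l rejects 0, which would happen for an empty file.
    [ "$per" -gt 0 ] || per=1
    split -l "$per" "$file" "$prefix"
}
```

Example: `split_whole_lines your_file.txt 5 part.`. Note that the two paths are not identical: the GNU path balances pieces by size, the fallback balances them by line count and can produce fewer than N files (10 lines into 6 parts gives 5 files of 2 lines).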

split was updated in coreutils release 8.8 (announced 22 Dec 2010) with the --number option to generate a specific number of files. The option --number=l/n generates n files without splitting lines.
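Since the question mentions doing this for a lot of files, the option drops into a loop easily. A small sketch, assuming GNU split 8.8 or later; `split_all` and the `.part.` suffix are names chosen for this example:

```shell
#!/bin/sh
# split_all N FILE...  (hypothetical helper)
# Splits every FILE into N pieces of whole lines.
split_all() {
    n=$1
    shift
    for f in "$@"; do
        # Pieces land next to the source: FILE.part.aa, FILE.part.ab, ...
        split --number="l/$n" "$f" "$f.part." || return 1
    done
}
```

Example: `split_all 5 *.log` splits each log file in the current directory into five pieces.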

-
+1 I believe this is the correct response to the OP's question without getting too complicated. Also the answer I needed – lollerskates Dec 07 '16 at 18:38
I made a shell script that, given a number of parts as input, splits a file into that many parts:
#!/bin/sh
parts_total="$2";
input="$1";
parts=$((parts_total))
for i in $(seq 0 $((parts_total-2))); do
# Read from stdin so wc prints only the count, with no filename to strip
lines=$(wc -l < "$input")
# n is lines/parts rounded up: 1.3 to 2, 1.6 to 2, 1 to 1
n=$(awk -v lines="$lines" -v parts="$parts" 'BEGIN {
n = lines/parts;
# int() truncates, so the test below rounds any fraction up
rounded = int(n);
if(n>rounded){
print rounded + 1;
}else{
print rounded;
}
}');
head -n "$n" "$input" > "split${i}"
tail -n "$((lines-n))" "$input" > ".tmp${i}"
input=".tmp${i}"
parts=$((parts-1));
done
mv .tmp$((parts_total-2)) split$((parts_total-1))
rm .tmp*
I used the `head` and `tail` commands and stored the remainder in tmp files to split the file:
#10 means 10 parts
sh mysplitXparts.sh input_file 10
or with awk, where 0.1 is 10% => 10 parts, or 0.334 is 3 parts
awk -v size=$(wc -l < input) -v perc=0.1 '{
# NR starts at 1, so subtract 1 to fill the first file completely
nfile = int((NR-1)/(size*perc));
if(nfile >= 1/perc){
nfile--;
}
print > ("split_" nfile)
}' input

var dict = File.ReadLines("test.txt")
.Where(line => !string.IsNullOrWhiteSpace(line))
.Select(line => line.Split(new char[] { '=' }, 2, StringSplitOptions.None))
.ToDictionary(parts => parts[0], parts => parts[1]);
or
string line = "to=xxx@gmail.com=yyy@yahoo.co.in";
string[] tokens = line.Split(new char[] { '=' }, 2, StringSplitOptions.None);
ans:
tokens[0] = "to"
tokens[1] = "xxx@gmail.com=yyy@yahoo.co.in"
