I have a text file `file.txt` (12 MB) containing:

```
something1
something2
something3
something4
(...)
```

Is there a way to split `file.txt` into 12 `*.txt` files, say `file2.txt`, `file3.txt`, `file4.txt`, etc.?
You can use the Linux coreutils `split` command:

```
split -b 1M -d file.txt file
```

Note that `M` and `MB` are both accepted, but the sizes differ: `MB` is 1000 * 1000 bytes, while `M` is 1024^2 (1,048,576) bytes.

If you want to split by lines instead, you can use the `-l` parameter.
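To see that size difference concretely, here is a small sketch (assuming GNU `split`; `big.bin` is just demo data generated for the example):

```shell
# create a 3,000,000-byte demo file
head -c 3000000 /dev/zero > big.bin

# 1MB chunks are 1000*1000 bytes; 1M chunks are 1024*1024 bytes
split -b 1MB -d big.bin mb_
split -b 1M  -d big.bin m_

wc -c mb_00 m_00   # 1000000 vs 1048576 bytes
```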
UPDATE

To split into 12 roughly equal parts by line count:

```
lines=$(( ($(wc -l < file.txt) + 11) / 12 ))   # round up so we get at most 12 parts
split -l "$lines" -d file.txt file
```
Another solution, as suggested by Kirill, is:

```
split -n l/12 file.txt
```

Note that this is a lowercase letter l (ell), not the digit one. `split -n` has a few chunk formats: `N`, `K/N`, `l/N`, `l/K/N`, `r/N`, and `r/K/N`.
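To illustrate why the `l/` form matters (a sketch assuming GNU `split`; `sample.txt` is demo data): plain `-n 3` cuts at byte offsets and can break a line in half, while `-n l/3` only cuts between lines:

```shell
# ten short demo lines
printf 'line%d\n' 1 2 3 4 5 6 7 8 9 10 > sample.txt

split -n 3   -d sample.txt bytes_   # byte chunks; may split a line across files
split -n l/3 -d sample.txt lines_   # line-aware chunks; every line stays whole

wc -l lines_00 lines_01 lines_02
```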
```
$ split -l 100 input_file output_file
```

where `-l` is the number of lines in each file. This will create `output_fileaa`, `output_fileab`, `output_fileac`, and so on, each with at most 100 lines.
CS Pei's answer won't produce `.txt` files as the OP wants. Use:

```
split -b=1M -d file.txt file --additional-suffix=.txt
```
Using Bash:

```
readarray -t lines < file.txt
count=${#lines[@]}
for i in "${!lines[@]}"; do
    index=$(( i * 12 / count + 1 ))   # map 0-based line index to file 1..12
    echo "${lines[i]}" >> "file${index}.txt"
done
```
Using AWK:

```
awk '{
    a[NR] = $0
}
END {
    for (i = 1; i in a; ++i) {
        x = (i * 12 - 1) / NR + 1
        sub(/\..*$/, "", x)              # truncate x to an integer
        print a[i] > ("file" x ".txt")   # parentheses keep the file name unambiguous
    }
}' file.txt
```
Unlike `split`, this makes sure the number of lines per file is as even as possible.
Regardless of what was said in previous answers, on my Ubuntu 16.04 (Xenial Xerus) I had to do:

```
split -b 10M -d system.log system_split.log
```

Please note the space between `-b` and the value.
My search for how to do this led me here, so I'm posting this here for others too:

To get all of the contents of the file, `split` is the right answer! But for those looking to just extract a piece of a file, as a sample of it, use `head` or `tail`:

```
# extract just the **first** 100000 lines of /var/log/syslog into
# ~/syslog_sample.txt
head -n 100000 /var/log/syslog > ~/syslog_sample.txt

# extract just the **last** 100000 lines of /var/log/syslog into
# ~/syslog_sample.txt
tail -n 100000 /var/log/syslog > ~/syslog_sample.txt
```
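And for a slice out of the middle, which neither `head` nor `tail` alone gives you, `sed` can print a line range; a sketch on demo data (the file name and the 500-600 range are just examples):

```shell
seq 1 1000 > demo.txt   # demo stand-in for a big log file
# print lines 500 through 600; the '600q' makes sed quit early
# instead of scanning the rest of a possibly huge file
sed -n '500,600p;600q' demo.txt > slice.txt
wc -l slice.txt   # 101 lines
```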
Try something like this:

```
awk -v c=1 'NR % 1000000 == 0 { close(c ".txt"); ++c } { print $0 > (c ".txt") }' Datafile.txt
```

The `close()` call keeps awk from running out of open file descriptors, and the parentheses around the redirection target make the concatenation unambiguous. Then, to add a prefix to the resulting files:

```
for filename in *.txt; do mv "$filename" "Prefix_$filename"; done
```
I agree with @CS Pei, however this didn't work for me:

```
split -b=1M -d file.txt file
```

...as the `=` after `-b` threw it off. Instead, I simply deleted it, left no space between the option and the value, and used lowercase "m":

```
split -b1m -d file.txt file
```

And to append ".txt", we use what @schoon said:

```
split -b1m -d file.txt file --additional-suffix=.txt
```

I had a 188.5 MB txt file and I used this command [but with `-b5m` for 5.2 MB files], and it returned 35 split files, all of which were txt files and 5.2 MB, except the last, which was 5.0 MB. Now, since I wanted my lines to stay whole, I wanted to split the main file every 1 million lines, but the `split` command didn't allow me to even do `-100000`, let alone `-1000000`, so splitting on large numbers of lines will not work this way.
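For what it's worth, on a GNU `split` new enough to have `--additional-suffix`, the million-line split works directly with `-l`; a small-scale sketch (`seq` generates demo data, and 5 stands in for 1000000):

```shell
seq 1 12 > file.txt                                   # demo stand-in for the big file
split -l 5 -d file.txt file --additional-suffix=.txt  # use -l 1000000 for the real case
wc -l file0*.txt   # file00.txt: 5, file01.txt: 5, file02.txt: 2
```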
On my Linux system (Red Hat Enterprise 6.9), the `split` command does not have the command-line options for either `-n` or `--additional-suffix`.

Instead, I've used this:

```
split -d -l NUM_LINES really_big_file.txt split_files.txt.
```

where `-d` adds a numeric suffix to the end of the `split_files.txt.` prefix and `-l` specifies the number of lines per file.
For example, suppose I have a really big file like this:

```
$ ls -laF
total 1391952
drwxr-xr-x 2 user.name group         40 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
```

This file has 100,000 lines, and I want to split it into files with at most 30,000 lines. This command will run the split and append an integer at the end of the output file pattern `split_files.txt.`:

```
$ split -d -l 30000 really_big_file.txt split_files.txt.
```

The resulting files are split correctly, with at most 30,000 lines per file:
```
$ ls -laF
total 2783904
drwxr-xr-x 2 user.name group        156 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
-rw-r--r-- 1 user.name group  428604626 Sep 14 15:43 split_files.txt.00
-rw-r--r-- 1 user.name group  427152423 Sep 14 15:43 split_files.txt.01
-rw-r--r-- 1 user.name group  427141443 Sep 14 15:43 split_files.txt.02
-rw-r--r-- 1 user.name group  142454325 Sep 14 15:43 split_files.txt.03

$ wc -l *.txt*
  100000 really_big_file.txt
   30000 split_files.txt.00
   30000 split_files.txt.01
   30000 split_files.txt.02
   10000 split_files.txt.03
  200000 total
```
If you want each part to have the same number of lines, for example 22, here is my solution:

```
split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file
```

And you obtain `file2.txt` with the first 22 lines, `file3.txt` with the next 22 lines, etc.

Thanks to @hamruta-takawale, @dror-s, and @stackoverflowuser2010!