81

I got a text file file.txt (12 MB) containing:

something1
something2
something3
something4
(...)

Is there a way to split file.txt into 12 *.txt files, let’s say file2.txt, file3.txt, file4.txt, etc.?

Peter Mortensen
Kris
  • Related: [Unix & Linux: Create a tar archive split into blocks of a maximum size](https://unix.stackexchange.com/q/61774/114401) – Gabriel Staples Jun 16 '23 at 01:05

10 Answers

89

You can use split from GNU Core Utilities:

split -b 1M -d file.txt file

Note that M and MB are both valid, but their sizes differ: MB means 1000 * 1000 bytes, while M means 1024 * 1024.
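To see the difference concretely, here is a small, hypothetical check (the 2,000,000-byte scratch file and the mb_/m_ prefixes are made up for the demo):

```shell
# Create a 2,000,000-byte scratch file in a temp directory.
tmpdir=$(mktemp -d)
cd "$tmpdir"
head -c 2000000 /dev/zero > file.txt

# MB = 1000 * 1000 bytes, so the first piece is exactly 1,000,000 bytes.
split -b 1MB -d file.txt mb_

# M = 1024 * 1024 bytes, so the first piece is 1,048,576 bytes.
split -b 1M -d file.txt m_

wc -c mb_00 m_00
```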

If you want to split by lines instead, you can use the -l parameter.

UPDATE

lines=$(( $(wc -l < file.txt) / 12 )) ; split -l "$lines" -d file.txt file

Note that integer division rounds down, so any remainder spills into a 13th file; round up (e.g. lines=$(( (total + 11) / 12 ))) if you want at most 12 files.

Another solution as suggested by Kirill, you can do something like the following

split -n l/12 file.txt

Note that it is the letter l (ell), not the number one. split -n accepts several chunk formats: N, K/N, l/N, l/K/N, r/N, and r/K/N.
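As a quick sanity check of the l/12 form on a made-up 100-line file: it should produce exactly 12 pieces whose concatenation reproduces the input, with no line broken across pieces.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
seq 100 > file.txt

# Split into 12 chunks of roughly equal byte size without breaking lines.
split -n l/12 file.txt part_

ls part_* | wc -l            # 12 pieces: part_aa .. part_al
cat part_* | cmp - file.txt  # reassembling the pieces gives the original back
```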

Peter Mortensen
CS Pei
  • Can you please update how it could be made sure that the number of files is even, without splitting the lines in them, just by using `split`? – konsolebox Sep 26 '13 at 15:40
  • You can use `wc -l` to get the total lines and run something like this: `a=($(wc -l yourfile)) ; lines=$(echo "$a/12" | bc) ; split -l $lines -d file.txt file` – CS Pei Sep 26 '13 at 16:39
  • With all those complexities you could have just used awk. And that won't work with non-file input, which won't allow reading the data twice. Just saying that your previous claim that `split` could do it isn't really correct. And as expected you used `wc`. – konsolebox Sep 26 '13 at 17:00
  • Nice update; personally glad you didn't use awk, as line counts can be achieved without a full file read. By this logic, you can also do `a=($(wc -c yourfile)) ; n=12 ; bytes=$(echo "($a - $a % $n) / $n" | bc) ; split -b $bytes -d file.txt file` to split evenly, with the last file holding the trailing bytes if you don't use divisible numbers. Derivatives of your method seem very easy to adjust! – That Realty Programmer Guy Nov 23 '13 at 20:19
  • macOS seems to prefer the size to be in lower case, meaning `-b=1M` should be `-b=1m` for this to work. – gdvalderrama Dec 21 '16 at 10:28
  • Based on the manual page for `split`, the units are K, M, G, T, P, E, Z, Y, etc. – CS Pei Dec 21 '16 at 14:36
  • Or just use `split -n l/12 file.txt` to get 12 files, split by line – Kirill Jan 08 '18 at 14:39
84
$ split -l 100 input_file output_file

where -l is the number of lines in each file. This will create:

  • output_fileaa
  • output_fileab
  • output_fileac
  • output_filead
  • ....
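For instance, on a hypothetical 250-line input, the first two output files get 100 lines each and the last one gets the remaining 50:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
seq 250 > input_file

# Split into files of at most 100 lines each.
split -l 100 input_file output_file

wc -l output_file*   # output_fileaa: 100, output_fileab: 100, output_fileac: 50
```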
slm
amruta takawale
32

CS Pei's answer won't produce .txt files as the OP wants. Use:

split -b 1M -d file.txt file --additional-suffix=.txt
schoon
2

Using Bash:

readarray -t lines < file.txt
count=${#lines[@]}

# Distribute the lines as evenly as possible across file1.txt .. file12.txt
for i in "${!lines[@]}"; do
    index=$(( (i * 12 - 1) / count + 1 ))
    echo "${lines[i]}" >> "file${index}.txt"
done

Using AWK:

awk '{
    a[NR] = $0
}
END {
    for (i = 1; i in a; ++i) {
        x = int((i * 12 - 1) / NR + 1)
        print a[i] > ("file" x ".txt")
    }
}' file.txt

Unlike split, this makes sure that the line counts are as even as possible.
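To illustrate on a made-up 100-line file: 100 does not divide evenly by 12, so the most even split is a mix of 8- and 9-line parts, which is what the script produces. (The demo below uses int() to truncate the part number, an equivalent alternative to stripping the decimals with sub().)

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
seq 100 > file.txt

awk '{ a[NR] = $0 }
END {
    for (i = 1; i in a; ++i) {
        x = int((i * 12 - 1) / NR + 1)   # map line i to a part number 1..12
        print a[i] > ("file" x ".txt")
    }
}' file.txt

wc -l file[0-9]*.txt   # every part has either 8 or 9 lines
```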

konsolebox
  • 1
    `split` can do that too – CS Pei Sep 26 '13 at 15:12
  • @JohnSmith Yeah didn't see the option quickly. – konsolebox Sep 26 '13 at 15:14
  • @JohnSmith I'm taking that back. How do we make sure that lines are even? Without using `wc -l` and calculating it of course or else we could have just use bash itself or awk. And that's actually the reason why I made the script and didn't consider split. – konsolebox Sep 26 '13 at 15:38
1

Regardless of what was said in previous answers, on my Ubuntu 16.04 (Xenial Xerus) I had to do:

split -b 10M -d  system.log system_split.log

Please note the space between -b and the value.

Peter Mortensen
Nicolas D
1

My search of how to do this led me here, so I'm posting this here for others too:

To split all of the contents of a file, split is the right answer! But for those looking to extract just a piece of a file, as a sample of it, use head or tail:

# extract just the **first** 100000 lines of /var/log/syslog into 
# ~/syslog_sample.txt
head -n 100000 /var/log/syslog > ~/syslog_sample.txt

# extract just the **last** 100000 lines of /var/log/syslog into 
# ~/syslog_sample.txt
tail -n 100000 /var/log/syslog > ~/syslog_sample.txt
Gabriel Staples
0

Try something like this:

awk -v c=1 '{print > (c ".txt")} NR%1000000==0{++c}' Datafile.txt

for filename in [0-9]*.txt; do mv "$filename" "Prefix_$filename"; done

Printing before incrementing the counter keeps exactly 1,000,000 lines in each file, the parentheses around the output filename are required by POSIX awk, and the narrower glob avoids renaming Datafile.txt itself.
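Scaled down to 10-line chunks so it can be run by hand (the 25-line Datafile.txt and the Prefix_ name are just illustrative), the approach looks like this:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
seq 25 > Datafile.txt

# Print first, then bump the counter, so 1.txt gets lines 1-10 exactly.
awk -v c=1 '{ print > (c ".txt") } NR % 10 == 0 { ++c }' Datafile.txt

# Rename only the numbered chunks; a bare *.txt would also catch Datafile.txt.
for filename in [0-9]*.txt; do mv "$filename" "Prefix_$filename"; done

wc -l Prefix_*.txt   # 10, 10, and 5 lines
```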
pheeleeppoo
0

I agree with @CS Pei; however, this didn't work for me:

split -b=1M -d file.txt file

...as the = after -b threw it off. Instead, I simply deleted it and left no space between it and the variable, and used lowercase "m":

split -b1m -d file.txt file

And to append ".txt", we use what @schoon said:

split -b1m -d file.txt file --additional-suffix=.txt

I had a 188.5 MB txt file and used this command [but with -b5m for 5.2 MB files], and it returned 35 split files, all of which were txt files and 5.2 MB, except the last, which was 5.0 MB. Now, since I wanted my lines to stay whole, I wanted to split the main file every 1 million lines, but the split command didn't allow me to even do -100000, let alone -1000000, so large numbers of lines to split will not work.

Ryan
0

On my Linux system (Red Hat Enterprise Linux 6.9), the split command does not support the -n or --additional-suffix command-line options.

Instead, I've used this:

split -d -l NUM_LINES really_big_file.txt split_files.txt.

where -d adds a numeric suffix to the end of the split_files.txt. prefix and -l specifies the number of lines per file.

For example, suppose I have a really big file like this:

$ ls -laF
total 1391952
drwxr-xr-x 2 user.name group         40 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt

This file has 100,000 lines, and I want to split it into files with at most 30,000 lines. This command will run the split and append an integer at the end of the output file pattern split_files.txt..

$ split -d -l 30000 really_big_file.txt split_files.txt.

The resulting files are split correctly with at most 30,000 lines per file.

$ ls -laF
total 2783904
drwxr-xr-x 2 user.name group        156 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
-rw-r--r-- 1 user.name group  428604626 Sep 14 15:43 split_files.txt.00
-rw-r--r-- 1 user.name group  427152423 Sep 14 15:43 split_files.txt.01
-rw-r--r-- 1 user.name group  427141443 Sep 14 15:43 split_files.txt.02
-rw-r--r-- 1 user.name group  142454325 Sep 14 15:43 split_files.txt.03


$ wc -l *.txt*
    100000 really_big_file.txt
     30000 split_files.txt.00
     30000 split_files.txt.01
     30000 split_files.txt.02
     10000 split_files.txt.03
    200000 total
stackoverflowuser2010
0

If you want each part to have the same number of lines, for example 22, here is my solution:

split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file

And you obtain file02.txt with the first 22 lines, file03.txt with the next 22 lines, etc. (GNU split pads numeric suffixes to two digits by default; add -a 1 if you really want single-digit names such as file2.txt.)

Thanks to @hamruta-takawale, @dror-s, and @stackoverflowuser2010.
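A small hypothetical run of this command on a 66-line file (note that GNU split pads numeric suffixes to two digits by default, so the names come out as file02.txt, file03.txt, ...):

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
seq 66 > file.txt

# Start the numeric suffix at 2, append .txt, 22 lines per part.
split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file

ls file*.txt           # file.txt plus three 22-line parts
head -n 1 file*2.txt   # first line of the first part
tail -n 1 file*4.txt   # last line of the last part
```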

Peter Mortensen
bcag2