How to split large text file in windows?

Question

I have a log file with size of 2.5 GB. Is there any way to split this file into smaller files using windows command prompt?

possible [duplicate](http://stackoverflow.com/q/23593556/2152082) — Stephan, Aug 03 '15 at 11:44

score 374 · Accepted Answer · edited Jun 20 '20 at 09:12

374

If you have installed Git for Windows, you should have Git Bash installed, since that comes with Git.

Use the split command in Git Bash to split a file:

into files of size 500MB each: split myLargeFile.txt -b 500m
into files with 10000 lines each: split myLargeFile.txt -l 10000

Tips:

If you don't have Git/Git Bash, download at https://git-scm.com/download
If you lost the shortcut to Git Bash, you can run it using C:\Program Files\Git\git-bash.exe

That's it!

I always like examples though...

Example:

You can see in this image that the files generated by split are named xaa, xab, xac, etc.

These names are made up of a prefix and a suffix, which you can specify. Since I didn't specify what I want the prefix or suffix to look like, the prefix defaulted to x, and the suffix defaulted to a two-character alphabetical enumeration.

Another Example:

This example demonstrates

using a filename prefix of MySlice (instead of the default x),
the -d flag for using numerical suffixes (instead of aa, ab, ac, etc...),
and the option -a 5 to tell it I want the suffixes to be 5 digits long:

edited Jun 20 '20 at 09:12

Community

1
1

answered Aug 24 '18 at 15:40

Josh Withee

9,922
3
44
62

4

Nice solution with good example. Works well for me in Windows 10. For me I used `split myLogs.log mylogs_ -b 800m -a 3 -d` to split a 4.5Gb log file. – user3437460 May 12 '19 at 14:57
1

Nice solution by using Git-bash on Windows. For the `split` part, using _filename prefix_ and `-d` options should be on your main answer. Related: https://stackoverflow.com/a/45761990/1519522 – aff Aug 12 '19 at 02:36
Thanks for this, worked great!! Working with Windows 10 and being new to Git bash this might be helpful... the default directory to work in was C:\Users\username (find your current working directory with "dir"). The command to change directory is "cd /c/folder". – fenix Sep 18 '19 at 15:52
ok, but if I want to select *.txt in Git Bash, why is not working? such as: `split *.txt MyNewText -b 5m` – Just Me Oct 25 '19 at 10:15
17

Nice! You can also set the extension on the outputs ... e.g. --additional-suffix=.txt – Nij Apr 17 '20 at 10:47
Now that you can easily get full Linux subsystems inside of Windows 10, you don't need Git for Windows or anything like that - just fire up an Ubuntu terminal and run from there. Really handy for stuff like this. – Joe Enos Jun 02 '20 at 21:57
Can also install `Git Bash` via [Chocolatey's `git`](https://chocolatey.org/packages/git)... `choco install git` – Intrastellar Explorer Sep 23 '20 at 18:55
Mannn.. this is exactly what I was looking for. I wish MS could include this with Windows by default so that I may think about loving Windows again :) – Ajay Kumar Sep 20 '21 at 15:55
Great, It worked for me, but I have to convert the extension to .sql. I would like to share the command for knowledge. split mylargefile.sql -l 10231 --additional-suffix=.sql it will generate files after each 10231 lines and with suffix .sql – Atmiya Kolsawala Oct 05 '21 at 05:43
If you don't have git bash installed: https://github.com/Mitch-Wheat/FileSplitter – Mitch Wheat Dec 17 '21 at 05:14
I am suprised for a quickly splitting 2GB+ text file to 500mb ones in some miliseconds ! Thanks. – Hamit Enes Feb 25 '22 at 20:55
then how to put them together into one file? – Albert Chen Mar 24 '22 at 12:48
1

@AlbertChen use `split myLargeFile.txt slice_ -l 1000` then `cat slice_* > combined.txt` – Josh Withee Mar 24 '22 at 14:02
@JoshWithee is there a guarantee that the order keeps the same? – Albert Chen Mar 27 '22 at 15:27
@Nij, Thanks! The question needs to be edited to add your comment! – SedJ601 Feb 02 '23 at 15:58

score 3 · Answer 2 · answered Oct 16 '20 at 16:35

Below code split file every 500

@echo off
setlocal ENABLEDELAYEDEXPANSION
REM Edit this value to change the name of the file that needs splitting. Include the extension.
SET BFN=upload.txt
REM Edit this value to change the number of lines per file.
SET LPF=15000
REM Edit this value to change the name of each short file. It will be followed by a number indicating where it is in the list.
SET SFN=SplitFile

REM Do not change beyond this line.

SET SFX=%BFN:~-3%

SET /A LineNum=0
SET /A FileNum=1

For /F "delims==" %%l in (%BFN%) Do (
SET /A LineNum+=1

echo %%l >> %SFN%!FileNum!.%SFX%

if !LineNum! EQU !LPF! (
SET /A LineNum=0
SET /A FileNum+=1
)

)
endlocal
Pause

See below: https://forums.techguy.org/threads/solved-split-a-100000-line-csv-into-5000-line-csv-files-with-dos-batch.1023949/

bill · Answer 3 · 2015-08-04T02:43:39.640

Set Arg = WScript.Arguments
set WshShell = createObject("Wscript.Shell")
Set Inp = WScript.Stdin
Set Outp = Wscript.Stdout
    Set rs = CreateObject("ADODB.Recordset")
    With rs
        .Fields.Append "LineNumber", 4 

        .Fields.Append "Txt", 201, 5000 
        .Open
        LineCount = 0
        Do Until Inp.AtEndOfStream
            LineCount = LineCount + 1
            .AddNew
            .Fields("LineNumber").value = LineCount
            .Fields("Txt").value = Inp.readline
            .UpDate
        Loop

        .Sort = "LineNumber ASC"

        If LCase(Arg(1)) = "t" then
            If LCase(Arg(2)) = "i" then
                .filter = "LineNumber < " & LCase(Arg(3)) + 1
            ElseIf LCase(Arg(2)) = "x" then
                .filter = "LineNumber > " & LCase(Arg(3))
            End If
        ElseIf LCase(Arg(1)) = "b" then
            If LCase(Arg(2)) = "i" then
                .filter = "LineNumber > " & LineCount - LCase(Arg(3))
            ElseIf LCase(Arg(2)) = "x" then
                .filter = "LineNumber < " & LineCount - LCase(Arg(3)) + 1
            End If
        End If

        Do While not .EOF
            Outp.writeline .Fields("Txt").Value

            .MoveNext
        Loop
    End With

Cut

filter cut {t|b} {i|x} NumOfLines

Cuts the number of lines from the top or bottom of file.

t - top of the file
b - bottom of the file
i - include n lines
x - exclude n lines

Example

cscript /nologo filter.vbs cut t i 5 < "%systemroot%\win.ini"

Another way This outputs lines 5001+, adapt for your use. This uses almost no memory.

Do Until Inp.AtEndOfStream
         Count = Count + 1
         If count > 5000 then
            OutP.WriteLine Inp.Readline
         End If
Loop

when i try to run the script i got error like filter.vbs(16, 13) Microsoft Cursor Engine: Out of memory. — Albin, Aug 04 '15 at 02:35
Do you have 32 or 64 bit windows. If 64 bit run it from the 64 bit version (`c:\windows\sysnative\cscript etc` - sysnative forces System32 files to be run rather than SysWoW64 files for32 bit processes). If 32 bit we need another technique, which will be specific rather than generic. — bill, Aug 04 '15 at 02:39
So I told you what to do in that case. Sysnative forces 64 bit. — bill, Aug 04 '15 at 03:59

score 2 · Answer 4 · edited Jan 02 '22 at 21:09

2

You must have Git Bash installed, and work inside that terminal/shell.

You can use the command split for this task. For example, this command entered into the command prompt

split YourLogFile.txt -b 500m

creates several files with a size of 500 MByte each. This will take several minutes for a file of your size. You can rename the output files (by default called "xaa", "xab",... and so on) to *.txt to open it in the editor of your choice.

Make sure to check the help file for the command. You can also split the log file by the number of lines or change the name of your output files.

tested on

Windows 7 64 bit
Windows 10 64 bit

edited Jan 02 '22 at 21:09

Maifee Ul Asad

3,992
6
38
86

answered Jul 27 '16 at 08:37

Wintermute

145
2
3

31

"'split' is not recognized as an internal or external command, operable program or batch file." - on Win 7 Ultimate SP1, 64 bit – Christopher J Smith Sep 06 '17 at 21:00
1

I believe you actually used gitbash's split – Eljah Jul 30 '19 at 13:15
ok, but if I want to select *.txt in Git Bash, why is not working? such as: split *.txt MyNewText -b 5m – Just Me Oct 25 '19 at 10:27
I tried on Windows 10 with GNU bash, version 4.4.23(1)-release and it worked as advertised – mico Jul 05 '22 at 09:00
It works just fine from "Git Bash" Terminal inside PyCharm IDE, Windows 10 host. Thanks! – Nikolai Varankine Oct 05 '22 at 14:08
-b 500m gives me tons of small files (500 bytes I think) on windows. I used -n 4 – Pavel Apr 04 '23 at 16:45

Zimba · Answer 5 · 2020-05-04T18:20:57.333

Of course there is! Win CMD can do a lot more than just split text files :)

Split a text file into separate files of 'max' lines each:

Split text file (max lines each):
: Initialize
set input=file.txt
set max=10000

set /a line=1 >nul
set /a file=1 >nul
set out=!file!_%input%
set /a max+=1 >nul

echo Number of lines in %input%:
find /c /v "" < %input%

: Split file
for /f "tokens=* delims=[" %i in ('type "%input%" ^| find /v /n ""') do (

if !line!==%max% (
set /a line=1 >nul
set /a file+=1 >nul
set out=!file!_%input%
echo Writing file: !out!
)

REM Write next file
set a=%i
set a=!a:*]=]!
echo:!a:~1!>>out!
set /a line+=1 >nul
)

If above code hangs or crashes, this example code splits files faster (by writing data to intermediate files instead of keeping everything in memory):

eg. To split a file with 7,600 lines into smaller files of maximum 3000 lines.

Generate regexp string/pattern files with set command to be fed to /g flag of findstr

list1.txt

\[[0-9]\]
\[[0-9][0-9]\]
\[[0-9][0-9][0-9]\]
\[[0-2][0-9][0-9][0-9]\]

list2.txt

\[[3-5][0-9][0-9][0-9]\]

list3.txt

\[[6-9][0-9][0-9][0-9]\]

Split the file into smaller files:

type "%input%" | find /v /n "" | findstr /b /r /g:list1.txt > file1.txt
type "%input%" | find /v /n "" | findstr /b /r /g:list2.txt > file2.txt
type "%input%" | find /v /n "" | findstr /b /r /g:list3.txt > file3.txt

remove prefixed line numbers for each file split:
eg. for the 1st file:

for /f "tokens=* delims=[" %i in ('type "%cd%\file1.txt"') do (
set a=%i
set a=!a:*]=]!
echo:!a:~1!>>file_1.txt)

Notes:
Works with leading whitespace, blank lines & whitespace lines.

Tested on Win 10 x64 CMD, on 4.4GB text file, 5651982 lines.

The test was done on Win 10 x64 command line, on a 4.4GB text file — Zimba, Jan 05 '23 at 17:23

How to split large text file in windows?

5 Answers5

That's it!

Linked