I have a CSV file that I need to split into n files so that no split file exceeds 100 MB. I need to do this in a Windows batch script. I tried the approach below, but it takes a very long time because the unsplit file is several gigabytes in size:

@echo off
setlocal enableextensions enabledelayedexpansion
rem assumed value: matches the output_1.csv file created below
set filename=output
set count=1
set maxbytesize=100000000
set size=1
type NUL > output_1.csv

FOR /F "tokens=*" %%i in (myfile.csv) do (
    rem re-read the size of the current output file for every input line
    FOR /F "usebackq" %%A in ('!filename!_!count!.csv') do (
        set size=%%~zA
    )
    if !size! LSS !maxbytesize! (
        echo %%i>>!filename!_!count!.csv
    ) else (
        set /a count+=1
        echo %%i>>!filename!_!count!.csv
    )
)

Please let me know if there is a better way to achieve this. I can't use any other scripting language because my server runs Windows.

Amnon
chethan
  • Yes, you're checking whether the output file is larger than 100000000 bytes on every iteration, which is probably why this takes so long. Are you opposed to using PowerShell or VB in this case? If so, would you be willing to download third-party software such as 7zip, which would make this easier? – rud3y Mar 08 '13 at 15:10
  • The main bottleneck is the `for` command itself (you can check this by running an empty `FOR /F "tokens=*" %%i in (myfile.csv) do ()` loop), so there is nothing you can do about it. I'd recommend using a higher-level language. – Fr0sT Feb 03 '16 at 07:08

1 Answer

This should do the trick, assuming your lines are roughly the same size.

Its advantage is that it needs only two passes over the file: one to count the lines and one to write them out.

@echo off

@rem usage: batchsplit.bat <file-to-split> <size-limit>
@rem it will generate files named <file-to-split>.part_NNN

setlocal EnableDelayedExpansion

set FILE_TO_SPLIT=%1
set SIZE_LIMIT=%2

for /f %%s in ('dir /b %FILE_TO_SPLIT%') do set SIZE=%%~Zs
for /f %%c in ('type "%FILE_TO_SPLIT%"^|find "" /v /c') do set LINE_COUNT=%%c

@rem note: set /a uses 32-bit signed integers, so this fails if SIZE exceeds 2147483647 bytes
set /a AVG_LINE_SIZE=%SIZE%/%LINE_COUNT%
set /a LINES_PER_PART=%SIZE_LIMIT%/%AVG_LINE_SIZE%

set "cmd=findstr /R /N "^^" %FILE_TO_SPLIT%"

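@rem findstr /N prefixes each line with its line number and a colon; tokens=1*
@rem keeps the rest of the line in one piece, even when it contains colons
for /f "tokens=1* delims=:" %%a in ('!cmd!') do @(
    set /a ccc = %%a / %LINES_PER_PART%
    echo(%%b>> %FILE_TO_SPLIT%.part_!ccc!
)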

Save it as batchsplit.bat and run it using:

batchsplit.bat myfile.csv 100000000
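
To make the sizing arithmetic concrete, here is a small worked example. The numbers are invented for illustration (a hypothetical 1.5 GB file with 10 million lines) and do not come from the question; the variable names mirror the ones used in the script above.

```batch
@echo off
rem Worked example of the sizing arithmetic, using made-up numbers.
rem Hypothetical input: a 1.5 GB file with 10 million lines, 100 MB limit.
set SIZE=1500000000
set LINE_COUNT=10000000
set SIZE_LIMIT=100000000

rem set /a performs integer division, so both results are rounded down.
set /a AVG_LINE_SIZE=SIZE/LINE_COUNT
set /a LINES_PER_PART=SIZE_LIMIT/AVG_LINE_SIZE

echo Average line size: %AVG_LINE_SIZE% bytes
echo Lines per part:    %LINES_PER_PART%
```

With these inputs the average line size is 150 bytes and each part gets 666666 lines, which is just under 100 MB if every line is average. Because the split is by line count and not by bytes, a run of longer-than-average lines can push a part over the limit, so passing a somewhat smaller size limit leaves a safety margin.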
Amnon
  • `set SIZE=%~Z1` will work as well. Anyway, your solution is quite nice (especially the method of determining the line count of the file), but it is slow too. The main bottleneck is the `for` command. – Fr0sT Feb 03 '16 at 07:02
  • Thanks @Fr0sT. Good point about setting the size. I know 'for' is slow, but that was the requirement. – Amnon Feb 03 '16 at 07:27