OK, my first attempt was unbearably slow. Here is a good solution that was able to process a 1.8 GB file in 2 min 48 sec :-)
I used hybrid batch/JScript, so it runs on any Windows machine from XP onward - no 3rd party exe file is needed, nor is any compilation needed.
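I read and write ~1 MB chunks. The logic is actually pretty simple: each pass appends the next chunk to a buffer, converts and writes everything up to the last complete delimiter, and carries the remainder over to the next pass.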
I replace all \r\n with a single space, and #@#@# with \r\n. You can easily change the string values in the code to suit your needs.
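One caveat when changing the strings: the script passes delim and nl straight to new RegExp, so a value containing regex metacharacters (such as . or |) would not be matched literally. Below is a minimal sketch of one way around that. The helper name escapeRegExp and the sample delimiter are hypothetical, not part of fixLines.bat, and I have only written it in plain ES3 style on the assumption that JScript handles it the same way.

```javascript
// Hypothetical helper (not part of fixLines.bat): backslash-escape every
// regex metacharacter so the string is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Example delimiter that contains metacharacters (illustrative value only)
var sampleDelim = 'a.b|c';
var sampleDelimRegex = new RegExp(escapeRegExp(sampleDelim), 'g');

// Replaces only the literal text a.b|c, not things like "aXb" or a lone "c"
var sampleResult = 'x' + sampleDelim + 'yaXb'.replace(sampleDelimRegex, '\r\n');
sampleResult = ('x' + sampleDelim + 'yaXb').replace(sampleDelimRegex, '\r\n');
```

In the script itself that would mean building the regexes as new RegExp(escapeRegExp(delim),"g") and likewise for nl.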
fixLines.bat
@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment
::--- Batch section within JScript comment that calls the internal JScript ----
@echo off
setlocal disableDelayedExpansion
if "%~1" equ "" (
  echo Error: missing input argument
  exit /b 1
)
if "%~2" equ "" (
  set "out=%~f1.new"
) else (
  set "out=%~2"
)
<"%~1" >"%out%" cscript //nologo //E:JScript "%~f0"
if "%~2" equ "" move /y "%out%" "%~1" >nul
exit /b
----- End of JScript comment, beginning of normal JScript ------------------*/
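var delim='#@#@#',        // delimiter to search for
    delimReplace='\r\n',  // written in place of each delimiter
    nl='\r\n',            // line break to search for
    nlReplace=' ',        // written in place of each line break
    pos=0,                // offset of the unprocessed tail within str
    str='';
// delim and nl are used as regex patterns here
var delimRegex=new RegExp(delim,"g"),
    nlRegex=new RegExp(nl,"g");
while( !WScript.StdIn.AtEndOfStream ) {
  // Keep the unprocessed tail of the previous pass and append the next ~1 MB
  str=str.substring(pos)+WScript.StdIn.Read(1000000);
  pos=str.lastIndexOf(delim);
  if (pos>=0) {
    // Convert and write everything through the last complete delimiter
    pos+=delim.length;
    WScript.StdOut.Write(str.substring(0,pos).replace(nlRegex,nlReplace).replace(delimRegex,delimReplace));
  } else {
    pos=0;  // no delimiter yet, so keep accumulating
  }
}
// Write whatever follows the final delimiter
if (str.length>pos) WScript.StdOut.Write(str.substring(pos).replace(nlRegex,nlReplace));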
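To see why the tail after the last delimiter is carried over, here is a rough in-memory sketch of the same buffering loop, reading from a string instead of stdin so that the chunk size can be made tiny. The function names fixLinesChunked and fixLinesWhole and the sample text are hypothetical and exist only for this illustration.

```javascript
// Hypothetical in-memory version of the buffering loop in fixLines.bat.
// input.substr(offset, chunkSize) stands in for WScript.StdIn.Read(1000000).
function fixLinesChunked(input, chunkSize) {
  var delim = '#@#@#', delimReplace = '\r\n', nl = '\r\n', nlReplace = ' ';
  var delimRegex = new RegExp(delim, 'g'), nlRegex = new RegExp(nl, 'g');
  var out = [], str = '', pos = 0, offset = 0;
  while (offset < input.length) {
    // Drop what was already written, keep the tail, append the next chunk
    str = str.substring(pos) + input.substr(offset, chunkSize);
    offset += chunkSize;
    pos = str.lastIndexOf(delim);
    if (pos >= 0) {
      pos += delim.length;
      out.push(str.substring(0, pos).replace(nlRegex, nlReplace).replace(delimRegex, delimReplace));
    } else {
      pos = 0;  // no complete delimiter yet, keep accumulating
    }
  }
  if (str.length > pos) out.push(str.substring(pos).replace(nlRegex, nlReplace));
  return out.join('');
}

// Reference: convert the whole string in one go
function fixLinesWhole(input) {
  return input.replace(/\r\n/g, ' ').replace(/#@#@#/g, '\r\n');
}

// Illustrative sample; chunk sizes small enough to split delimiters and \r\n pairs
var chunkSample = 'line one\r\ncontinued#@#@#line two\r\nmore\r\nstill#@#@#tail\r\nend';
var chunkResults = [];
for (var size = 1; size <= 30; size++) {
  chunkResults.push(fixLinesChunked(chunkSample, size) === fixLinesWhole(chunkSample));
}
```

For this sample every chunk size gives the same output as the one-shot conversion, because a delimiter or \r\n pair that straddles a chunk boundary always sits in the carried-over tail. I have not tried to cover pathological input such as overlapping runs like #@#@#@#@#.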
To fix input.txt and write the output to output.txt:
fixLines input.txt output.txt
To overwrite the original file test.txt:
fixLines test.txt
Just for kicks, I attempted to process the 1.8 GB file using JREPL.BAT. I didn't think it would work because it must load the entire file into memory. It doesn't matter how much memory is installed in the computer - JScript is limited to a 2 GB max string size. And I think there are additional constraints that come into play.
jrepl "\r?\n:#@#@#" " :\r\n" /m /x /t : /f input.txt /o output.txt
It took 5 minutes for the command to fail with an "Out Of Memory" error. And then it took a long time for my computer to recover from the serious abuse of memory.
Below is my original custom batch/JScript solution that reads and writes one character at a time.
slow.bat
@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment
::--- Batch section within JScript comment that calls the internal JScript ----
@echo off
setlocal disableDelayedExpansion
if "%~1" equ "" (
  echo Error: missing input argument
  exit /b 1
)
if "%~2" equ "" (
  set "out=%~f1.new"
) else (
  set "out=%~2"
)
<"%~1" >"%out%" cscript //nologo //E:JScript "%~f0"
if "%~2" equ "" move /y "%out%" "%~1" >nul
exit /b
----- End of JScript comment, beginning of normal JScript ------------------*/
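var delim='#@#@#',
    delimReplace='\r\n',
    nlReplace=' ',
    pos=0,   // number of delimiter characters matched so far
    chr;
while( !WScript.StdIn.AtEndOfStream ) {
  chr=WScript.StdIn.Read(1);
  if (chr==delim.charAt(pos)) {
    // Hold back matching characters until the whole delimiter has been seen
    if (++pos==delim.length) {
      WScript.StdOut.Write(delimReplace);
      pos=0;
    }
  } else {
    if (pos) {
      // Partial match failed: write out the characters that were held back
      WScript.StdOut.Write(delim.substring(0,pos));
      pos=0;
    }
    if (chr==delim.charAt(0)) {
      pos=1;  // the mismatched character may itself start a delimiter
    } else if (chr=='\n') {
      WScript.StdOut.Write(nlReplace);
    } else if (chr!='\r') {
      WScript.StdOut.Write(chr);
    }
  }
}
// Flush a partial delimiter left over at end of input
if (pos) WScript.StdOut.Write(delim.substring(0,pos));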
It worked, but it was a dog. Here is a summary of timing results to process a 155 MB file:
slow.bat      3120 sec (52 min)
jrepl.bat       55 sec
fixLines.bat    15 sec
I verified that all three solutions gave the same result.
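For anyone who wants to sanity-check the character-at-a-time logic without running it through cscript, here is a small string-based sketch of that scan compared against a one-shot replace. The function names and the sample text are hypothetical, and this is not how I compared the real output files. This version re-tests the current character against the start of the delimiter after a failed partial match, so input such as ##@#@# is still recognized.

```javascript
// Hypothetical string-based version of the character-at-a-time scan
function fixLinesCharByChar(input) {
  var delim = '#@#@#', delimReplace = '\r\n', nlReplace = ' ';
  var out = [], pos = 0, chr;
  for (var i = 0; i < input.length; i++) {
    chr = input.charAt(i);
    if (chr == delim.charAt(pos)) {
      // Hold back matching characters until the whole delimiter is seen
      if (++pos == delim.length) {
        out.push(delimReplace);
        pos = 0;
      }
    } else {
      if (pos) {
        // Partial match failed: emit the characters that were held back
        out.push(delim.substring(0, pos));
        pos = 0;
      }
      if (chr == delim.charAt(0)) {
        pos = 1;             // current character may start a new delimiter
      } else if (chr == '\n') {
        out.push(nlReplace);
      } else if (chr != '\r') {
        out.push(chr);
      }
    }
  }
  if (pos) out.push(delim.substring(0, pos));  // partial delimiter at end of input
  return out.join('');
}

// Reference: one-shot conversion of the whole string
function fixLinesOneShot(input) {
  return input.replace(/\r\n/g, ' ').replace(/#@#@#/g, '\r\n');
}

// Illustrative sample with a doubled # and a trailing partial delimiter
var scanSample = 'one\r\ntwo#@#@#three\r\nfour##@#@#five#@';
var scanMatches = fixLinesCharByChar(scanSample) === fixLinesOneShot(scanSample);
```

The two agree on this sample. They are not identical in general: the scan drops a lone \r and converts a lone \n, whereas the one-shot replace only touches \r\n pairs.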