
I have 30 text files in a folder. I need to remove the line breaks/carriage returns in the data for all of them. I'm already using batch for other tasks, so it would be nice to keep using it. Right now the batch file renames all the extensions from .csv to .txt, then launches an Excel file.

I've done a lot of searching and can't find anything quite like what I need. I have only briefly dabbled in batch scripting, so painting things out in crayon so I can understand what is going on would be great.

Charles
  • Not going to be able to do it with a pure batch file. You can do it with VBScript, but it would require that the whole file be loaded into memory. What is the largest file size? – Squashman Feb 10 '17 at 15:25
  • They are around 7 MB each. How does VBScript work? Could I just code everything I'm trying to do with it? I know batch is obsolete so I'm willing to move languages if this is the way to go. – Charles Feb 10 '17 at 15:49
  • Well, I disagree that batch files are obsolete. They do certain things very well. http://ss64.com/vb/ – Squashman Feb 10 '17 at 16:17
  • Understood, so with VBScript could I change it into an executable type file (click to run) like a batch file? – Charles Feb 10 '17 at 16:27
  • @Squashman - it's definitely possible in batch if you allow the use of `certutil`. Sure, it's kind of time-consuming, but it's definitely possible. – SomethingDark Feb 10 '17 at 16:31
  • @SomethingDark, ah yes. I see where you are going with that. That would be an awesome hack. – Squashman Feb 10 '17 at 16:51
  • Yesterday, I had to do exactly what you are trying to do today: remove carriage returns from a bunch of files. FART and FNR didn't do quite what I needed, so I ended up coding my own C# FARTer that I call from batch files. Let me know if you'd like to get your hands on the source code (C#) of my FARTer project, or perhaps just my exe. I also disagree that batch files are obsolete. – blaze_125 Feb 10 '17 at 16:59
  • Alright, so which language is going to be best for doing this and expandable? VBScript, C#, batch, etc. I know it's probably personal preference, but if I'm forced to learn another language to tackle this problem (I'm proficient in HTML, CSS, VBA, SQL, and learning batch scripting now), how do I decide which way to go? I've seen other suggestions such as Perl, Python, and Java to tackle these types of problems as well. – Charles Feb 10 '17 at 17:13
  • You're not forced into learning another language. You are given the opportunity to learn a new language to tackle your problem at hand, or you can fall back on solutions that people have used before. If you're going to pick up a new language though, C# in Visual Studio is what I suggest. The documentation availability combined with the Visual Studio IDE makes it a great contender, and I think this combination is a great launching pad towards other object-oriented languages that may not benefit from such a powerful IDE. – blaze_125 Feb 10 '17 at 18:22
  • Maybe a wrong use of words here. If it is optimal* to learn another language. I will look into C#. I am automating databases, Excel documents, image processing, data feeds, reports, and general business processes currently, so we will see if this can be tied into those functions. Thanks for your time. – Charles Feb 10 '17 at 20:12
  • You are not clear - do you want to remove only Carriage Returns (0x0D, decimal 13) so that your result is in Unix format? Or do you also want to remove Linefeeds (0x0A, decimal 10) so that each file is one very long line? – dbenham Feb 11 '17 at 04:22
  • Either way, [JREPL.BAT](http://www.dostips.com/forum/viewtopic.php?t=6044) provides a convenient solution. To remove carriage returns from all files in the current directory, use `for %%F in (*) do call jrepl "\r" "" /m /f "%%F" /o -`. To also remove line feeds, simply change `"\r"` to `"[\r\n]"` – dbenham Feb 11 '17 at 04:25
  • Sorry for not being completely clear; I am not familiar enough with the different types of line breaks to put this more clearly. The problem is, when I'm loading this file into Excel, there are areas that are being put on separate lines that shouldn't be, and when I open it in Notepad, I see returns. I was not aware until I made this thread that there are different types. – Charles Feb 13 '17 at 14:25
  • I also have had a friend show me how to do this in Python, which seems to be a pretty nice language. I may take a stab at it with that as well and compare the results. – Charles Feb 13 '17 at 18:34
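
For anyone unsure which of dbenham's two options they want, here is a minimal sketch of the difference in Python (the language Charles mentions trying). The function names `to_unix` and `to_single_line` are made up for illustration; nothing here comes from the thread's batch code.

```python
def to_unix(data: bytes) -> bytes:
    """Drop carriage returns (0x0D) only; line feeds stay, so lines are kept."""
    return data.replace(b"\r", b"")


def to_single_line(data: bytes) -> bytes:
    """Drop carriage returns (0x0D) and line feeds (0x0A); everything joins up."""
    return data.replace(b"\r", b"").replace(b"\n", b"")


sample = b"id,note\r\n1,first\r\n"   # two Windows-style (CR+LF) lines
print(to_unix(sample))               # b'id,note\n1,first\n'
print(to_single_line(sample))        # b'id,note1,first'
```

Note that the second option glues the last field of one row to the first field of the next, so for CSV-style data headed into Excel it is probably only safe when the stray breaks sit inside a field rather than between rows.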

1 Answer


You can use the built-in executable certutil to convert the file from ASCII to hexadecimal, process the resulting file with a for /f loop, strip any instances of 0d (which is the carriage return character in hexadecimal) from each line, and rebuild the file from the remaining hex code. I swear it's easier than I'm making it sound.

Please note that there is a maximum input file size limit of 71 MB due to a limitation with certutil, and that files larger than 2 MB can take a while to process, but at least everything is native to Windows so you don't have to install anything or learn a completely new language.
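
To make the idea less abstract, the same round trip (bytes to hex pairs, drop every `0d`, hex pairs back to bytes) can be sketched in a few lines of Python. This is only an illustration of the technique, not a replacement for the batch script, and `strip_cr_via_hex` is a hypothetical name.

```python
def strip_cr_via_hex(data: bytes) -> bytes:
    # One two-character hex pair per byte, like the hex column of a dump
    pairs = [f"{byte:02x}" for byte in data]
    # Drop whole "0d" pairs: that removes carriage returns and nothing else
    kept = [pair for pair in pairs if pair != "0d"]
    # Rebuild the binary data from the remaining hex
    return bytes.fromhex("".join(kept))


print(strip_cr_via_hex(b"a,b\r\nc,d\r\n"))  # b'a,b\nc,d\n'
```

Working on whole pairs matters: neighbouring bytes such as `d0 d1` contain the characters `0` and `d` next to each other only across a byte boundary, so they must not be treated as a carriage return. The hex dump keeps a space between bytes, which gives the batch version the same protection.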

@echo off
setlocal enabledelayedexpansion

:: Ensure a file was passed in
if "%~1"=="" (
    echo Please provide a file to process.
    pause
    exit /b
)

:: Ensure file is under the certutil input limit
if %~z1 GTR 74472684 (
    echo This file exceeds the maximum file size of 74472684 bytes ^(71 MB^)
    echo Please use a smaller file.
    pause
    exit /b
)

:make_rand
:: Generate a random number to reduce the risk of filename collision
set rand=%RANDOM%%RANDOM%
set "temp_file=%~dpf1_%rand%.tmp"
set "hex_file=%~dpf1_%rand%.hex"
:: %~dpn1 is drive+path+name without extension; %~x1 already includes the dot
set "new_file=%~dpn1_new%~x1"
if exist %temp_file% goto :make_rand
if exist %hex_file% goto :make_rand

:: Only prompt (and only test the result of choice) when the output file exists
if exist "%new_file%" (
    choice /c:YN /M "%new_file% already exists. Overwrite? "
    if errorlevel 2 exit /b
    del "%new_file%"
)

certutil -encodehex "%~1" "%temp_file%"

:: The script will break if you have spaces in your file path.
:: This is a feature, not a bug. Name your paths correctly.
for /f "tokens=1,*" %%A in (%temp_file%) do (
    set "line=%%B"
    set "hex_substring=!line:~0,48!"
    set "no_carriage=!hex_substring:0d=!"
    echo !no_carriage! >>%hex_file%
)

certutil -decodehex "%hex_file%" "%new_file%"

:: Temp file cleanup
del /q %hex_file%
del /q %temp_file%
SomethingDark
  • What exactly is `set "hex_file=%~dpf1_%rand%.hex"` doing? I also looked up certutil and can't really figure out what it is used for in general. – Charles Feb 13 '17 at 14:02
  • @Charles - it sets the output file name to the input file name (and puts it in the same directory) but with an underscore and random number at the end. `certutil` in general is used for certificate activities, but can also be used for processing files at the hex level. – SomethingDark Feb 13 '17 at 14:10
  • Oh that is interesting. Where does this loop through all the files? (I also have a lot of spaces in directories, guess I will need to change that, although I'm not sure what the problem with that is) – Charles Feb 13 '17 at 14:22
  • Oh, I totally missed that you have multiple files. On the command line in the directory where the files are, you can type something like `for /f "delims=" %A in ('dir /b *.txt') do process.bat %A` where `process.bat` is whatever you called the script. – SomethingDark Feb 13 '17 at 22:50
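
Since Charles mentioned comparing against Python, here is a rough sketch of the "all files in a folder" step in that language. It is an assumption-laden illustration, not the answer's method: `strip_cr_in_folder` is a hypothetical helper, it reads each file fully into memory (fine at around 7 MB each), it only removes carriage returns, and it writes `name_new.ext` copies rather than overwriting the originals.

```python
from pathlib import Path


def strip_cr_in_folder(folder: str, pattern: str = "*.txt") -> list:
    """Write a carriage-return-free copy of each matching file in `folder`."""
    written = []
    # sorted() takes a snapshot of the listing before any new file is created
    for src in sorted(Path(folder).glob(pattern)):
        if src.stem.endswith("_new"):
            continue  # skip copies left behind by an earlier run
        dst = src.with_name(f"{src.stem}_new{src.suffix}")
        dst.write_bytes(src.read_bytes().replace(b"\r", b""))
        written.append(dst)
    return written
```

Calling `strip_cr_in_folder(".")` from the folder that holds the 30 files would produce 30 `_new` copies. Unlike the batch script, this version should not care about spaces in the path, because nothing is passed through a command line.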