2

I have a BAT file that parses out the header of a CSV file and replaces spaces with underscores, then merges a series of CSV files here. The problem is, my header file is very long and is being truncated at 1024 characters. Does anyone have a suggestion? I’m hoping to not have to go to PowerShell or anything beyond basic batch programming if possible.

The only issue is with the headers.

@ECHO OFF
set Outputfolder=c:\Test

REM Get the header string out of one of the files
for %%I in (%outputFolder%\*_stats.csv)  do set /p HeaderString=< %%I

REM replace the spaces in that header string with underscores
SET HeaderString=%HeaderString: =_%

REM write that header as the first line of the output file
echo.%HeaderString%>%outputFolder%\all_stats_merged.csv

REM append the non-header lines from all the files
>>%outputFolder%\all_stats_merged.csv (
for %%I in (%outputFolder%\*_stats.csv)  do more +1 "%%I"
)
Jae
  • 143
  • 5
  • 2
    Usually, when you hit a limit like this, it's time to wonder if you're still using the right tool for the job. – steenbergh Mar 21 '18 at 12:29
  • Can you post your script? – MisterSmith Mar 21 '18 at 12:29
  • It really is time to go to PowerShell... – Jeff Zeitlin Mar 21 '18 at 12:30
  • The maximum length of a variable, including the variable name itself and the equals character, is either 2047 or 8191 characters (depending on the operating system). I cannot understand why, as stated, it is being truncated at 1024 characters, unless perhaps the character set used in your csv file is something other than the standard OEM codepage used by cmd.exe. Perhaps you could provide us with more details of the file itself and how many characters actually exist in your header and subsequent lines. – Compo Mar 21 '18 at 13:10
  • 2
    @Compo, as far as I know, `SET /P` only handles 1024. – Squashman Mar 21 '18 at 14:43
  • 2
    You other option is to use a `FOR /F` command to read one of the csv files and assign the first line of the file to a variable and then use a GOTO command to break out of the `FOR` command. – Squashman Mar 21 '18 at 14:45
  • 1
    @Squashman - You nailed the problem, though the limit is 1021, not 1024. I was writing my answer as you posted your comments. – dbenham Mar 21 '18 at 15:02
  • 1
    @dbenham, I also warned the user about the `MORE` command limitation in their previous question. My solution for that is to redirect the file into the `FIND` command and use the original header variable as a reverse find. `find /I /V "%oheader%" – Squashman Mar 21 '18 at 15:07
  • @Squashman - Yes, the MORE limitation could be important, and I like your FIND solution. – dbenham Mar 21 '18 at 15:09

2 Answers2

4

Your problem is you are using SET /P to read the header line, which is limited to 1021 characters (not 1024). See https://www.dostips.com/forum/viewtopic.php?f=3&t=2160 as well as https://www.dostips.com/forum/viewtopic.php?f=3&t=2160#p12339 for more info.

This can easily be solved if you switch to FOR /F which can read ~8k

for %%A in (%outputFolder%\*_stats.csv) do for /f "usebackq delims=" %%B in ("%%A") do (
  set "HeaderString=%%B"
  goto :break
)
:break
dbenham
  • 127,446
  • 28
  • 251
  • 390
0

Despite your preference to steer away from PowerShell, (if possible), give the following .ps1 a go and see if it helps change your preference:

$First = $True
GCI 'C:\test\*_stats.csv' | % {$Csv = $_
    $Lines = $Lines = GC $Csv
    $Write = Switch($First) {$True {$Line = $Lines | Select -First 1
            $Line.Replace(' ','_')
            $First = $False}
        $False {$Lines | Select -Skip 1}}
    AC 'C:\test\all_stats_merged.csv' $Write}

For completeness, here's an untested batch file attempt using FindStr which should be okay up to 8191 characters **:

@Echo Off
Set "OutputFolder=C:\Test"
Set "HeaderString="
For /F "Tokens=3* Delims=:" %%A In (
    'FindStr /N "^" "%OutputFolder%\*_stats.csv" 2^>Nul') Do (If "%%A"=="1" (
        If Not Defined HeaderString (Set "HeaderString=%%B"
            Call Echo %%HeaderString: =_%%)) Else Echo %%B
)>>"%OutputFolder%\all_stats_merged.csv"

The above .cmd file was designed with %OutputFolder% containing a drive specification; if its final value is without a drive letter, you'll need to change Tokens=3* to Tokens=2*.


** The 8191 characters will however include the full file path of each file, each lines line number and two colon separator characters.

Compo
  • 36,585
  • 5
  • 27
  • 39