0

I'm currently trying to parse a csv file in batch but am unable to because of extra commas within "------,----" at the beginning. Also some cvs files don't contain this field so I'm unable to just move my tokens. Here is an example of the csv file:

Datasheets,Image,Digi-Key Part Number,Manufacturer Part Number,Manufacturer,Description,Quantity Available,Factory Stock,Unit Price (USD),@ qty,Minimum Quantity,"Packaging","Series","Part Status","Capacitance","Tolerance","Voltage - Rated","Dielectric Material","Number of Capacitors","Circuit Type","Temperature Coefficient","Ratings","Mounting Type","Package / Case","Size / Dimension","Height - Seated (Max)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10582TR-ND,ECJ-RVC1H150K,Panasonic Electronic Components,CAP ARRAY 15PF 50V NP0 1206,0,0,"Obsolete","0","4000","Tape & Reel (TR)","ECJ-R","Obsolete","15pF","±10%","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10582CT-ND,ECJ-RVC1H150K,Panasonic Electronic Components,CAP ARRAY 15PF 50V NP0 1206,1801,0,"0.45000","0","1","Cut Tape (CT)","ECJ-R","Obsolete","15pF","±10%","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10582DKR-ND,ECJ-RVC1H150K,Panasonic Electronic Components,CAP ARRAY 15PF 50V NP0 1206,1801,0,"Digi-Reel","0","1","Digi-Reel®","ECJ-R","Obsolete","15pF","±10%","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10580TR-ND,ECJ-RVC1H100F,Panasonic Electronic Components,CAP ARRAY 10PF 50V NP0 1206,0,0,"Obsolete","0","4000","Tape & Reel (TR)","ECJ-R","Obsolete","10pF","±1pF","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"
"//media.digikey.com/pdf/Data%20Sheets/Panasonic%20Capacitors%20PDFs/ECJ-R,ECJ-T_4-Array.pdf",//media.digikey.com/photos/Panasonic%20Photos/ECJ-R%201206%20SERIES.jpg,P10580CT-ND,ECJ-RVC1H100F,Panasonic Electronic Components,CAP ARRAY 10PF 50V NP0 1206,0,0,"Obsolete","0","1","Cut Tape (CT)","ECJ-R","Obsolete","10pF","±1pF","50V","Ceramic","4","Isolated","C0G, NP0","-","Surface Mount","1206 (3216 Metric)","0.126"" L x 0.063"" W (3.20mm x 1.60mm)","0.037"" (0.95mm)"

Here is an example of my code:

FOR /F "skip=1 tokens=3-6 delims=, " %%A IN (File.csv) DO (

ECHO %%A,%%B,%%D,%%C
)
epickmatt
  • 75
  • 1
  • 6
  • https://stackoverflow.com/a/32969382/1243048 – Maelig Jun 04 '19 at 14:07
  • 6
    Well, the CSV data are 100% correct according to [RFC 4180](https://www.rfc-editor.org/info/rfc4180), very good explained in Wikipedia article about [comma-separated values](https://en.wikipedia.org/wiki/Comma-separated_values). The mistake made here was the decision to parse the CSV file with a batch file processed by Windows __command__ processor `cmd.exe` designed for executing commands and applications and not really for anything else. It would be much better to use VBScript or PowerShell for this task which have both built-in support for parsing CSV files. – Mofi Jun 04 '19 at 14:21
  • 2
    Can confirm, the CSV file is correct. For example, in Python, parsing it takes no more effort than `reader = csv.DictReader(csvfile)`. – Tim Pietzcker Jun 04 '19 at 14:25
  • 1
    So in essence you want to get four columns 3-6 and reverse 5th and 6th on output to get `"Digi-Key Part Number","Manufacturer Part Number","Description","Manufacturer"`. As suggested using a script language which ***understands*** csv is much more appropriate, here powershell: `import-csv .\File.csv -Header (1..6|%{"H$_"})|Select H3,H4,H6,H5|ConvertTo-Csv -Not|select-object -skip 1|set-content newfile.csv` If you really want to drop the header, increase `-skip 2` –  Jun 04 '19 at 17:17

2 Answers2

2

Funny that this question came up. A couple weeks ago I solved a very similar problem where a FOR /F needed to parse a CSV with commas in the values. My answer included a pure batch solution. In that answer I also explained the many issues that make parsing CSV with pure batch difficult.

I've refactored that technique into reusable :processLine and :decodeToken routines below. The routines require delayed expansion to be enabled before the main processing loop. The technique is designed to put each FOR /F token value into a similarly named environment variable. Enclosing quotes are removed, and doubled "" within values (if they exist) are reduced to ".

The outer loop at the top calls the routines, redoubles all ", reorders the fields, and encloses each field within quotes. The outer loop can be easily reformulated to do any manipulation you want. The :processLine and :parseToken routines at the bottom never have to change.

The code below is ~5 times faster than the aschipfl answer. The output is identical, except my code encloses each field within quotes, even those where it is not required. That is perfectly acceptable for CSV.

@echo off
setlocal enableDelayedExpansion
for /f usebackq^ delims^=^ eol^= %%A in ("test.csv") do (
  call :processLine A ln
  for /f "tokens=3-6 delims=," %%A in ("!ln!") do (
    for %%v in (A B C D) do call :decodeToken %%v
    echo "!A:"=""!","!B:"=""!","!D:"=""!","!C:"=""!"
  )
)
exit /b


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: The following routines will work for any CSV as long as no field contains \n
:: and no line approaches the 8191 character limit.

:processLine  forVarCharIn  envVarOut
::
:: Prepares CSV line stored in FOR variable %%forVarIn to be safely parsed by
:: FOR /F with delayed expansion enabled. The result is stored in environment
:: variable envVarOut.
:: 
:: All "" become "
:: All @ become @a
:: All quoted , become @c
:: All ^ become ^^
:: All ! become ^!
:: All fields are enclosed within quotes
:: 
setlocal
setlocal disableDelayedExpansion
for %%. in (.) do set "ln=%%%1"
set "ln=,%ln:"=""%,"
set "ln=%ln:^=^^^^%"
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
set "ln=%ln:!=^^!%"
set "ln=%ln:,=^,^,%"
set ^"ln=%ln:""="%^"
set "ln=%ln:"=""%"
set "ln=%ln:@=@a%"
set "ln=%ln:^,^,=@c%"
endlocal & set "ln=%ln:""="%" !
set "ln=!ln:,,"=,,!"
set "ln=!ln:",,=,,!"
set "ln=!ln:~2,-2!"
set "ln=!ln:^=^^^^!"
endlocal&set "%2=%ln:!=^^^!%"
set "%2=!%2:""="!"
set "%2="!%2:,,=","!"" !
exit /b

:decodeToken  V
::
:: Decodes field in %%V and stores in environment variable V
:: All @c become ,
:: All @a become @
::
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
  set "%1=!%1:@c=,!"
  set "%1=!%1:@a=@!"
)
exit /b

If you are sure that none of your values contain " literals, then the loop at the top can be reduced to:

@echo off
setlocal enableDelayedExpansion
for /f usebackq^ delims^=^ eol^= %%A in ("test.csv") do (
  call :processLine A ln
  for /f "tokens=3-6 delims=," %%A in ("!ln!") do (
    for %%v in (A B C D) do call :decodeToken %%v
    echo "!A!","!B!","!D!","!C!"
  )
)
exit /b

Better yet, since none of the columns you want to preserve contain @ or , or ", then the top loop can be greatly simplified without any need for :parseToken, gaining another factor of 2 in performance (total of 10 times faster than the aschipfl answer):

@echo off
setlocal enableDelayedExpansion
for /f usebackq^ delims^=^ eol^= %%A in (%1) do (
  call :processLine A ln
  for /f "tokens=3-6 delims=," %%A in ("!ln!") do echo %%~A,%%~B,%%~D,%%~C
)
exit /b

These routines will work with any CSV as long as none of the CSV values contain a newline, and none of the processed lines exceed the 8191 character limit imposed by batch.

Also, all simple FOR /F techniques are limited to parsing a maximum of 32 tokens. Over at DosTips I demonstrate how to parse and process hundreds of CSV fields. It takes some sophisticated batch coding, but again the routines are reusable, so the outer loop is easy to manage.

dbenham
  • 127,446
  • 28
  • 251
  • 390
  • Yes this code works great, thank you @dbenham . Is there a way to make it skip the first line of entries? I tried adding skip=1 and it doesn't seem to work for me. – epickmatt Jun 06 '19 at 12:44
  • 1
    @epickmatt - Sure. Change the top most loop to `for /f skip^=1^ usebackq^ delims^=^ eol^= %%A in ...` – dbenham Jun 06 '19 at 13:31
  • @epickmatt - If you know that no line begins with `;`, then the loop options become more traditional - `for /f "skip=1 usebackq delims=" %%A in ...` – dbenham Jun 06 '19 at 13:33
1

Here is a pure approach that allows to extract and reorder specified columns of a CSV file. The column indexes and their order need to be defined in the constant _LIST at the top of the script:

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=%~1"     & rem // (input CSV file; `%~1` is first argument)
set "_LIST=3 4 6 5" & rem // (list of one-based column indexes to return)

rem // Define temporary replacements into pseudo-array `$REPL[]`:
call :SUBSTARR $REPL

rem // Read input CSV file line by line:
for /F "delims=" %%L in ('findstr /N "^" "%_FILE%"') do (
    set "LINE=%%L"
    set /A "INUM=0, LNUM=LINE"
    setlocal EnableDelayedExpansion
    set "LINE=!LINE:*:=!"
    rem // Temporarily substitute standard token delimiters but `,`:
    if defined LINE set "LINE=!LINE:\=\b!"
    call :REPLCHAR LINE LINE "^!" "\m"
    for /F "tokens=2* delims=[=]" %%M in ('set $REPL') do (
        if "%%N" == "" (
            call :REPLCHAR LINE LINE "=" "%%M"
        ) else if "%%N" == "*" (
            call :REPLCHAR LINE LINE "*" "%%M"
        ) else (
            if defined LINE set "LINE=!LINE:%%N=%%M!"
        )
    )
    rem // Split line (row) into comma-separated items (fields, cells):
    for %%I in ('!LINE:^,^='^,'!') do (
        endlocal
        set /A "INUM+=1"
        set "ITEM=%%I"
        setlocal EnableDelayedExpansion
        set "ITEM=!ITEM:','=,!"
        for /F "delims=" %%J in ("$ITEM[!INUM!]=!ITEM:~1,-1!") do (
            endlocal & set "%%J"
            setlocal EnableDelayedExpansion
        )
    )
    rem // Rebuild line (row) as per specified list of column indexes:
    set "LINE=," & for %%I in (%_LIST%) do (
        if %%I gtr 0 if %%I leq !INUM! (
            set "LINE=!LINE!!$ITEM[%%I]!,"
        ) else set "LINE=!LINE!,"
    )
    rem // Revert substitution of standard token delimiters but `,`:
    for /F "tokens=2* delims=[=]" %%M in ('set $REPL') do (
        if "%%N" == "" (
            set "LINE=!LINE:%%M==!"
        ) else (
            set "LINE=!LINE:%%M=%%N!"
        )
    )
    call :REPLCHAR LINE LINE "\m" "^!"
    set "LINE=!LINE:\b=\!"
    rem // Return modified line (row):
    >&2 < nul set /P ="!LNUM!:"
    echo(!LINE:~1^,-1!
    endlocal
)

endlocal
exit /B


:NONPRINT
    rem // Obtain several non-printable characters:
    for /F "tokens=1-8 delims=#" %%S in ('
        forfiles /P "%~dp0." /M "%~nx0" /C ^
            "cmd /C echo/0x08#0x09#0x0B#0x0C#0x1A#0x1B#0x7F#0xFF"
    ') do (
        rem // Get back-space, horizontal & vertical tabulators and form-feed:
        set "_BS=%%S" & set "_HT=%%T" & set "_VT=%%U" & set "_FF=%%V"
        rem // Get substitute (end-of-file), escape, delete and fixed space:
        set "_SS=%%W" & set "_ES=%%X" & set "_DE=%%Y" & set "_XX=%%Z"
    )
    exit /B


:SUBSTARR  <rtn_array>
    rem // Obtain non-printable token delimiters:
    call :NONPRINT
    rem // Define substitutions by a pseudo-array:
    for %%R in (
        "[\i]=;"
        "[\e]=="
        "[\s]= "
        "[\t]=%_HT%"
        "[\v]=%_VT%"
        "[\f]=%_FF%"
        "[\x]=%_XX%"
    ) do set "%~1%%~R"
    rem // Define wildcards as substitutions too:
    set "%~1[\a]=*"
    set "%~1[\q]=?"
    set "%~1[\l]=<"
    set "%~1[\g]=>"
    rem set "%~1[\m]=!"
    rem set "%~1[\b]=\"
    rem set "%~1[\c]=,"
    exit /B


:LENGTH  <rtn_length>  <ref_string>
    rem // Determine length of a string:
    setlocal EnableDelayedExpansion
    set "STR=!%~2!"
    if not defined STR (set /A "LEN=0") else (set /A "LEN=1")
    for %%L in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if defined STR (
            set "INT=!STR:~%%L!"
            if not "!INT!" == "" set /A "LEN+=%%L" & set "STR=!INT!"
        )
    )
    endlocal & set "%~1=%LEN%"
    exit /B


:REPLCHAR  <rtn_string>  <ref_string>  <val_char>  <val_replace>
    rem // Replace given character in a string by another string:
    setlocal
    set "DXF=!"
    setlocal DisableDelayedExpansion
    set "CHR=%~3"
    set "RPL=%~4"
    setlocal EnableDelayedExpansion
    set "STR=!%~2!"
    if defined CHR (
        call :LENGTH LEN STR
        call :LENGTH LCH CHR
        set /A "LEN-=1" & for /L %%P in (!LEN!,-1,0) do (
            for %%O in (!LCH!) do (
                if "!STR:~%%P,%%O!" == "!CHR!" (
                    set /A "INC=%%P+%%O" & for %%Q in (!INC!) do (
                        set "STR=!STR:~,%%P!!RPL!!STR:~%%Q!"
                    )
                )
            )
        )
    )
    if not defined DXF if defined STR set "STR=!STR:"=""!"
    if not defined DXF if defined STR set "STR=!STR:^=^^^^!"
    if not defined DXF if defined STR set "STR=%STR:!=^^^!%" !
    if not defined DXF if defined STR set "STR=!STR:""="!"
    for /F "delims=" %%E in (^""!STR!"^") do (
        endlocal & endlocal & endlocal & set "%~1=%%~E" !
    )
    exit /B

The complicated thing is to handle both unquoted and quoted separators (,) properly; that explains the size of this script.

Given that the script is called reconstruct-csv.bat and the input CSV file is named File.csv, run it with the following command line:

reconstruct-csv.bat "File.csv"

To write the output to another CSV file, say File_NEW.csv, rather than to display it, use this:

reconstruct-csv.bat "File.csv" > "File_NEW.csv"
aschipfl
  • 33,626
  • 12
  • 54
  • 99
  • Thank you @aschipfl , this solution works must better then anything I have tried. It seems to work the majority of the time. An exception would be this csv file https://www.digikey.ca/products/en?keywords=CD74HC154M96 and click download table. – epickmatt Jun 05 '19 at 20:07
  • Oops, the problem was the question mark `?` in the data; fixed now (same for `<` and `>`; any `*` and `~` might still be problematic, but I'll check)... – aschipfl Jun 05 '19 at 20:34
  • Works great, @aschipfl . It hasn't failed a case yet. – epickmatt Jun 05 '19 at 20:54
  • Problems with `*` as well as `!` and `^` fixed, even empty cells are now treated as expected! The `~` was never a problem... – aschipfl Jun 06 '19 at 00:14