1

I have a .csv file (generated from exporting a googleDoc spreadsheet) that I need to extract information from. The information does NOT contain a consistent delimiter.

I am currently using a comma (,) as a delimiter, which works fine when getting information from the first 4 columns.

However, when I want to extract information from column 8, I get incorrect data. This is because some cells contain 2 pieces of information split up by commas.

Cells with 2 pieces of information are given doublequotes (") at the start and end. Providing data like 1,"2,3",4

My splitter cannot recognise the difference between 1,2,3,4 and 1,"2,3",4 so the third value returns 3 for the first set and 3" for the second set, when it should return 4 for the second set (3 for the first set is expected)

Below is an extract of the .csv file I'm using.

A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",BLANK,Blank,,BLANK,
A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
,,,,,,7,Butter,Gold,GOLD,yellow
B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,

This is my current for loop that I'm using to extract the information, the outer forloop checks for empty data to ensure it's always the same column being returned. The inner forloop puts the data values into arrays.

SET originalCol=8
SET newCol=10
SET startRow=2
SET lastRow=45
SET rowsToSkip=1
SET /a i=0
SET /a totalValues=0
SET /a maxLines=%lastRow%-%startRow%
FOR /f "skip=%rowsToSkip% delims=" %%L in (%fileLocation%) DO (
    set "line=%%L,,,,,,,,"
    set "line=#!line:,=,#!"
    FOR /f "tokens=1,%originalCol%,%newCol% delims=," %%F IN ("!line!") DO (
        set "param1=%%F"
        set "param2=%%G"
        set "param3=%%H"
        set "param1=!param1:~1!"
        set "param2=!param2:~1!"
        set "param3=!param3:~1!"
        IF NOT #!param1!# == ## (
            SET /a lineCounter=!i!+%startRow%
            SET /a totalValues=!i!
            SET originalValuesList[!i!]=!param2!
            SET newValuesList[!i!]=!param3!
            IF !i! == %maxLines% (
                goto :copyingCSVDataComplete
            ) ELSE (
                SET /a i+=1
            )
        )
    )
)
echo.  originalValuesList [A] & echo [%originalValuesList[0]%, %originalValuesList[1]%, %originalValuesList[2]%, %originalValuesList[3]%, %originalValuesList[4]%, %originalValuesList[5]%, %originalValuesList[6]%, %originalValuesList[7]%]
echo.
echo.  originalValuesList [B] & echo [%originalValuesList[8]%]
echo.
echo.  newValuesList [A] & echo [%newValuesList[0]%, %newValuesList[1]%, %newValuesList[2]%, %newValuesList[3]%, %newValuesList[4]%, %newValuesList[5]%, %newValuesList[6]%, %newValuesList[7]%]
echo.
echo.  newValuesList [B] & echo [%newValuesList[8]%]

ACUTAL:

  originalValuesList [A]
[GOLD", GOLD2", "GOLD, "ZIRCON,  CHOCOLATE_CHIPS,  BUTTERCREAM",  BAKING_POWDER", ALEXANDRITE"]

  originalValuesList [B]
[ BUTTER"]



  newValuesList [A]
[Blank, FreeSpin, PERIDOT", TOPAZ", "EMERALD, BRONZE, BRONZE, Flour]

  newValuesList [B]
[EMERALD]

EXPECTED:

  originalValuesList [A]
[Blank, FreeSpin, Champagne, Cocoa, GoldLeaf, Sugar, IcingSugar, flour]

  originalValuesList [B]
[ChocolateChips]



  newValuesList [A]
[BLANK, FREE_SPIN, GARNET, LAPIS_LAZULI, GOLD3, EMERALD, JADE, BRONZE]

  newValuesList [B]
[ALEXANDRITE]

So, what I want is to use the same code but instead of splitting on a comma (,) delimiter, I want to split based on a regex. Something like (,"([A-Z]*),") | (,)

Is it possible to use regex in Batch and if so how can I use it to split strings?

jamesD
  • 55
  • 6
  • what happens when you set your variables as `set "param1=%%~F"` and the same for `%%~G` etc? – Gerhard May 23 '19 at 12:17
  • @GerhardBarnard I'm not sure what you mean. Saving the `%%F` to `param1` allows me to manipulate the string to remove the `#`. If you're asking what would happen if I use `%%~F` instead of `%%F` then the answer is there is no change. – jamesD May 23 '19 at 14:36
  • I have now gone through my googleDoc and replaced all commas(,) with dashes(-)and re-exported it to a .csv file which fixed the issue as it no longer breaks between cells. But if anyone has an answer for the above question, it would be really good to know, and may help others in future :) – jamesD May 23 '19 at 14:38
  • Your code has nothing to do with regex. So what is your question? Do you want to know how to fix your code? Or are you looking for an entirely new approach using regex? Note that regex would require inclusion of a third party utility, or PowerShell, or CSCRIPT (JScript or VBS). – dbenham May 23 '19 at 14:51
  • @dbenham Ahh, I thought I had included the regex, I must have forgot, I'll edit the question in a sec. I want to use the same code but instead of splitting on a comma (,) delimiter, I want to split based on a regex. Something like `(,"([A-Z]*),") | (,)` Worth noting, I'm new to regex so that could be very wrong, but I hope you get what I mean haha – jamesD May 23 '19 at 15:02
  • see `for /?` and read about variable expansion. `%%~f` should remove surrounding double quotes from the variable. – Gerhard May 23 '19 at 16:05

1 Answers1

2

First off, PowerShell has the built in ability to parse and manipulate CSV documents, so that would be a better option. But I will stick with batch processing.

Regular Expression solution

Regular expressions are no good to a pure native batch solution for two reasons:

  • It is impossible to alter FOR /F behavior to parse tokens by regular expressions - it is what it is - very limited.
  • To parse your file with FOR /F you would need to manipulate each line prior to parsing. Batch does not have any regex utility that can alter content. It only has FINDSTR which can do very crude regex searches, but it always returns the original matching line. On top of that, the FINDSTR regex is so crippled, I'm not sure you could properly parse a CSV anyway.

You could use custom JScript or VBScript via CSCRIPT to preprocess the file with a regular expression search and replace in such a way that FOR /F could then parse the file. I have already written a hybrid JScript/batch regular expression processing utility called JREPL.BAT that works well for this.

A quoted CSV field can contain quote literals, in which case the quote liberals are doubled. The following regex would match any CSV token (not including the comma delimiter) ("(?:""|[^"])*"|[^,"]*). It looks for a quote followed by any number of non-quote characters and/or doubled quotes, followed by a closing quote or any number of characters not including quote or comma. But your CSV does not contain any doubled quote literals, so the regex can be simplified to ("[^"]*"|[^,"]*).

CSCRIPT has no mechanism to pass quote literals within arguments, so JREPL has an /XSEQ option to enable extended escape sequence support, including \q to represent ". The other option is to use the standard \x22 sequence. JREPL "(\q[^\q]*\q|[^,\q]*)," "$1;" /XSEQ /F "test.csv" will match any token (possibly empty) followed by a comma delimiter, and preserve the token and replace the comma with a semicolon.

But that still leaves empty tokens, and FOR /F does not properly parse empty tokens. So I can throw a bit of JSCRIPT into the replacement term to remove any existing quotes, and then surround each token with quotes (except for the last one, where it isn't needed)
JREPL "(\q[^\q]*\q|[^,\q]*)," "$txt='\q'+$1.replace(/'\q'/,'')+'\q;'" /JQ /XSEQ /F "test.csv"

Here is a demonstration showing how it could be used to parse your CSV:

@echo off
for /f "tokens=1-11 delims=;" %%A in (
  'JREPL "(\q[^\q]*\q|[^,\q]*)," "$txt='\x22'+$1.replace(/\x22/g,'')+'\x22;'" /JQ /XSEQ /F test.csv'
) do (
  echo A=%%~A
  echo B=%%~B
  echo C=%%~C
  echo D=%%~D
  echo E=%%~E
  echo F=%%~F
  echo G=%%~G
  echo H=%%~H
  echo I=%%~I
  echo J=%%~J
  echo K=%%~K
  echo(
)

--OUTPUT--

A=A
B=SCONE
C=Shen ring
D=SHEN_RING
E=FLOUR, BUTTER
F=BRONZE,GOLD
G=blank
H="This
I="BLANK""
J=
K=BLANK

A=A
B=STRAWBERRIES_AND_CREAM
C=Cat1
D=CAT1
E=STRAWBERRY, CREAM
F=OBSIDIAN,GOLD2
G=FS
H=FreeSpin
I=
J=FREE_SPIN
K=

A=A
B=WALNUT_TOFFEE
C=Pyramid
D=PYRAMID
E=BUTTER, SUGAR, WALNUT
F=GOLD,EMERALD,PERIDOT
G=1
H=Champagne
I=Garnet
J=GARNET
K=

A=A
B=RASPBERRY_AND_LIME_JELLY
C=Cuff bracelet
D=CUFF_BRACELET
E=RASPBERRY, JELLY, LIME
F=ZIRCON,BRONZE2,TOPAZ
G=2
H=Cocoa
I=Lapis lazuli
J=LAPIS_LAZULI
K=Blue

A=A
B=CHOCOLATE_CHIP_COOKIES
C=Nekhbet
D=NEKHBET
E=SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT
F=EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER
G=3
H=GoldLeaf
I=gold3
J=GOLD3
K=yellow

A=A
B=BUTTER_CREAM_CUP_CAKE
C=Sobek
D=SOBEK
E=ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM
F=JADE,BRONZE,GOLD,GARNET2
G=4
H=Sugar
I=emerald
J=EMERALD
K=green

A=A
B=PEANUT_BUTTER_COOKIE
C=Sekhmet
D=SEKHMET
E=PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER
F=GARNET1,BRONZE,AMAZONITE,EMERALD
G=5
H=IcingSugar
I=JADE
J=JADE
K=green

A=A
B=CHOCOLATE_MARSHMALLOWS
C=Osiris
D=OSIRIS
E=MARSHMALLOW, CHOCOLATE_CHIPS
F=PLATINUM,ALEXANDRITE
G=6
H=Flour
I=Bronze
J=BRONZE
K=yellow

A=
B=
C=
D=
E=
F=
G=7
H=Butter
I=Gold
J=GOLD
K=yellow

A=B
B=BLUEBERRY_PIE
C=Ankh
D=ANKH
E=BLUEBERRY, SUGAR, FLOUR, BUTTER
F=JADEITE,EMERALD,BRONZE,GOLD
G=8
H=ChocolateChips
I=Alexandrite
J=ALEXANDRITE
K=

But I wouldn't use regluar expressions for this. There are other ways.

Pure native batch solution

Believe it or not, it is not that hard to use nothing but internal batch commands to manipulate each line such that a FOR /F can parse all the tokens.

There are two things that need to happen with your CSV:

1) Unquoted comma delimiters must be transformed into some other character that does not appear in your file, leaving the quoted commas alone. I can use a derivative of a technique that jeb developed to differentiate between quoted and unquoted characters: When variables are expanded with percent expansion, escaped characters like ^, are treated differently depending on whether they are quoted or not. Typically ^, becomes ,, and "^," remains unchanged. But if you use CALL, then "^," becomes "^^,", and ^, remains unchanged. Either way, it is then possible to differentiate between quoted and unquoted characters.

2) FOR /F cannot parse empty tokens, so empty tokens must be enclosed by quotes. It is easiest to simply enclose all tokens in quotes.

@echo off
setlocal enableDelayedExpansion
for /f "usebackq delims=" %%A in ("test.csv") do (

  %= Print out the raw line so we can verify the end result =%
  echo %%A

  %= Preprocess the line so it is safe to parse =%
  set "ln=%%A"           %= Transfer line to environment variable =%

  %= Artifact of CALL - Convert quoted , to ^^; and unquoted , to ^;        =%
  %= Make sure unquoted SET statement does not have any trailing characters =%
  call set ln=%%ln:,=^^;%%

  set "ln=!ln:^^;=,!"    %= Convert quoted ^^; back into ,                         =%
  set "ln=!ln:^;=;!"     %= Convert unquoted ^; to ;                               =%
  set "ln=!ln:"=!"       %= Strip all quotes so we can safely do next step         =%
  set "ln="!ln:;=";"!""  %= Enclose all tokens in quotes to protect empty tokens   =%

  %= The line is now ready to parse with another FOR /F     =%
  %= I simply print the value of all 11 tokens, 1 per line. =%
  %= Adjust the loop as needed to suit your needs.          =%
  for /f "tokens=1-11 delims=;" %%A in ("!ln!") do (
    for %%a in (A B C D E F G H I J K) do call :echoToken %%a
    echo(
  )
)
exit /b

:echoToken  Char
for %%. in (.) do echo %1=%%~%1
exit /b

Here is the same code without all the comments:

@echo off
setlocal enableDelayedExpansion
for /f "usebackq delims=" %%A in ("test.csv") do (
  echo %%A
  set "ln=%%A"
  call set ln=%%ln:,=^^;%%
  set "ln=!ln:^^;=,!"
  set "ln=!ln:^;=;!"
  set "ln=!ln:"=!"
  set "ln="!ln:;=";"!""
  for /f "tokens=1-11 delims=;" %%A in ("!ln!") do (
    for %%a in (A B C D E F G H I J K) do call :echoToken %%a
    echo(
  )
)
exit /b

:echoToken  Char
for %%. in (.) do echo %1=%%~%1
exit /b

-- OUTPUT ---

A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",blank,"This,""BLANK""",,BLANK,
A=A
B=SCONE
C=Shen ring
D=SHEN_RING
E=FLOUR, BUTTER
F=BRONZE,GOLD
G=blank
H=This,BLANK
I=
J=BLANK
K=

A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A=A
B=STRAWBERRIES_AND_CREAM
C=Cat1
D=CAT1
E=STRAWBERRY, CREAM
F=OBSIDIAN,GOLD2
G=FS
H=FreeSpin
I=
J=FREE_SPIN
K=

A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A=A
B=WALNUT_TOFFEE
C=Pyramid
D=PYRAMID
E=BUTTER, SUGAR, WALNUT
F=GOLD,EMERALD,PERIDOT
G=1
H=Champagne
I=Garnet
J=GARNET
K=

A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A=A
B=RASPBERRY_AND_LIME_JELLY
C=Cuff bracelet
D=CUFF_BRACELET
E=RASPBERRY, JELLY, LIME
F=ZIRCON,BRONZE2,TOPAZ
G=2
H=Cocoa
I=Lapis lazuli
J=LAPIS_LAZULI
K=Blue

A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A=A
B=CHOCOLATE_CHIP_COOKIES
C=Nekhbet
D=NEKHBET
E=SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT
F=EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER
G=3
H=GoldLeaf
I=gold3
J=GOLD3
K=yellow

A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A=A
B=BUTTER_CREAM_CUP_CAKE
C=Sobek
D=SOBEK
E=ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM
F=JADE,BRONZE,GOLD,GARNET2
G=4
H=Sugar
I=emerald
J=EMERALD
K=green

A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A=A
B=PEANUT_BUTTER_COOKIE
C=Sekhmet
D=SEKHMET
E=PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER
F=GARNET1,BRONZE,AMAZONITE,EMERALD
G=5
H=IcingSugar
I=JADE
J=JADE
K=green

A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
A=A
B=CHOCOLATE_MARSHMALLOWS
C=Osiris
D=OSIRIS
E=MARSHMALLOW, CHOCOLATE_CHIPS
F=PLATINUM,ALEXANDRITE
G=6
H=Flour
I=Bronze
J=BRONZE
K=yellow

,,,,,,7,Butter,Gold,GOLD,yellow
A=
B=
C=
D=
E=
F=
G=7
H=Butter
I=Gold
J=GOLD
K=yellow

B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,
A=B
B=BLUEBERRY_PIE
C=Ankh
D=ANKH
E=BLUEBERRY, SUGAR, FLOUR, BUTTER
F=JADEITE,EMERALD,BRONZE,GOLD
G=8
H=ChocolateChips
I=Alexandrite
J=ALEXANDRITE
K=

But there are many possible situations that can make parsing CSV much more complicated.

Here is a robust pure batch solution that can parse any CSV as long as there are no newlines within fields, and no line length approaches the 8191 byte batch limit, and you don't need to parse more than 31 tokens. The code is heavily commented in an effort to explain all the steps required.

@echo off
setlocal enableDelayedExpansion

:: Must use arcane FOR /F option syntax to disable both EOL and DELIMS.
for /f usebackq^ delims^=^ eol^= %%A in ("test2.csv") do call :processLine
:: I CALL out of the loop to a :subroutine because a single CALL :subroutine
:: is much faster than many CALL SET statements. It also simplifies the
:: management of delayed expansion.

exit /b


:processLine

:: Must disable delayed expansion so percent expansion does not corrupt ! or ^ literals.
setlocal disableDelayedExpansion

:: FOR variables are global - this extra FOR loop exposes %%A that would otherwise be hidden.
for %%. in (.) do set "ln=%%A"

:: Print out raw line so we can diagnose the result.
set ln

:: "Hide" quotes by doubling, making all characters safe for percent expansion when
:: entire string is quoted. Also enclose line within extra set of , delimiters.
set "ln=,%ln:"=""%,"

:: Escape poison characters so all characters are safe for unquoted percent expansion.
set "ln=%ln:^=^^^^%" %= Double escaped to account for enabled delayed expansion later on. =%
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"

:: Double escape ! so not corrupted by later percent expansion while delayed expansion enabled.
set "ln=%ln:!=^^!%"

:: Double and escape all commas.   , -> ^,^,
set "ln=%ln:,=^,^,%"

:: Undouble quotes and unescape (originally) unquoted strings. Note that outer quotes are escaped.
set ^"ln=%ln:""="%^"

:: At this point quoted comma literals are still ^,^, whereas unquoted comma delimiters are ,,
:: Also, all quoted poison characters are still escaped, but unquoted ones are not.

:: Redouble quotes, all characters safe again for quoted percent expansion.
set "ln=%ln:"=""%"

:: Encode @ as @a and quoted comma literals ^,^, as @c
set "ln=%ln:@=@a%"
set "ln=%ln:^,^,=@c%"

:: Restore delayed expansion and undouble quotes, which unescapes (originally) quoted strings.
:: Note that outer quotes are NOT escaped this time. The ENDLOCAL and SET are on the same
:: line so that the percent expansion value is transferred across the ENDLOCAL barrier.
endlocal & set "ln=%ln:""="%" !   %= Trailing ! is ignored except forces all ^^ to become ^ =%

:: At this point no characters are escaped, and all ! and ^ are unprotected against percent or
:: FOR variable expansion while delayed expansion is enabled.

:: Remove enclosing quotes from tokens that are already quoted so we can later safely enclose
:: all tokens in quotes. This is why the extra enclosing , were added at the beginning.
set "ln=!ln:,,"=,,!"
set "ln=!ln:",,=,,!"

:: Remove outer , delimiters that were added at the beginning.
set "ln=!ln:~2,-2!"

:: Must double escape ! and ^ again to protect against delayed expansion within parsing FOR /F loop.
set "ln=!ln:^=^^^^!"
set "ln=%ln:!=^^^!%"

:: Undouble remaining quotes because quote literals are doubled within original CSV.
set "ln=!ln:""="!"

:: Restore doubled ,, delimiters to , and enclose all tokens within quotes to preserves empty tokens.
set "ln="!ln:,,=","!"" !

:: The line is now safe to parse with FOR /F, though @ and , are encoded as @a and @c

:: Parse line into tokens.
for /f "tokens=1-11 delims=," %%A in ("!ln!") do (

  %= Decode the tokens and store result in environment variables =%
  for %%a in (A B C D E F G H I J K) do call :decodeToken %%a

  %= Your processing goes here. Decoded %%A - %%K are now safely in !A! - !K! =%
  %= I will simply echo all the values, one per line =%
  for %%a in (A B C D E F G H I J K) do echo %%a=!%%a!
  echo(
)
exit /b


:decodeToken  Char
:: Converts @c and @a back into , and @
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
  set "%1=!%1:@c=,!"
  set "%1=!%1:@a=@!"
)
exit /b

Here is the same code without all the comments:

@echo off
setlocal enableDelayedExpansion
for /f usebackq^ delims^=^ eol^= %%A in ("test2.csv") do call :processLine
exit /b

:processLine
setlocal disableDelayedExpansion
for %%. in (.) do set "ln=%%A"
set ln
set "ln=,%ln:"=""%,"
set "ln=%ln:^=^^^^%"
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
set "ln=%ln:!=^^!%"
set "ln=%ln:,=^,^,%"
set ^"ln=%ln:""="%^"
set "ln=%ln:"=""%"
set "ln=%ln:@=@a%"
set "ln=%ln:^,^,=@c%"
endlocal & set "ln=%ln:""="%" !
set "ln=!ln:,,"=,,!"
set "ln=!ln:",,=,,!"
set "ln=!ln:~2,-2!"
set "ln=!ln:^=^^^^!"
set "ln=%ln:!=^^^!%"
set "ln=!ln:""="!"
set "ln="!ln:,,=","!"" !
for /f "tokens=1-11 delims=," %%A in ("!ln!") do (
  for %%a in (A B C D E F G H I J K) do call :decodeToken %%a
  for %%a in (A B C D E F G H I J K) do echo %%a=!%%a!
  echo(
)
exit /b

:decodeToken  Char
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
  set "%1=!%1:@c=,!"
  set "%1=!%1:@a=@!"
)
exit /b

Here is your sample CSV file with an additional line added to test the various complications:

;A!,"B!","C is ""cool""",D @^&|<>,"E @^&|<>","F ,x","G ""@^&|<>""","H ""@^&|<>!""",I,J,K
A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",blank,"This,""BLANK""",,BLANK,
A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
,,,,,,7,Butter,Gold,GOLD,yellow
B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,

And here is the final output:

ln=;A!,"B!","C is ""cool""",D @^&|<>,"E @^&|<>","F ,x","G ""@^&|<>""","H ""@^&|<>!""",I,J,K
A=;A!
B=B!
C=C is "cool"
D=D @^&|<>
E=E @^&|<>
F=F ,x
G=G "@^&|<>"
H=H "@^&|<>!"
I=I
J=J
K=K

ln=A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",blank,"This,""BLANK""",,BLANK,
A=A
B=SCONE
C=Shen ring
D=SHEN_RING
E=FLOUR, BUTTER
F=BRONZE,GOLD
G=blank
H=This,"BLANK"
I=
J=BLANK
K=

ln=A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A=A
B=STRAWBERRIES_AND_CREAM
C=Cat1
D=CAT1
E=STRAWBERRY, CREAM
F=OBSIDIAN,GOLD2
G=FS
H=FreeSpin
I=
J=FREE_SPIN
K=

ln=A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A=A
B=WALNUT_TOFFEE
C=Pyramid
D=PYRAMID
E=BUTTER, SUGAR, WALNUT
F=GOLD,EMERALD,PERIDOT
G=1
H=Champagne
I=Garnet
J=GARNET
K=

ln=A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A=A
B=RASPBERRY_AND_LIME_JELLY
C=Cuff bracelet
D=CUFF_BRACELET
E=RASPBERRY, JELLY, LIME
F=ZIRCON,BRONZE2,TOPAZ
G=2
H=Cocoa
I=Lapis lazuli
J=LAPIS_LAZULI
K=Blue

ln=A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A=A
B=CHOCOLATE_CHIP_COOKIES
C=Nekhbet
D=NEKHBET
E=SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT
F=EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER
G=3
H=GoldLeaf
I=gold3
J=GOLD3
K=yellow

ln=A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A=A
B=BUTTER_CREAM_CUP_CAKE
C=Sobek
D=SOBEK
E=ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM
F=JADE,BRONZE,GOLD,GARNET2
G=4
H=Sugar
I=emerald
J=EMERALD
K=green

ln=A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A=A
B=PEANUT_BUTTER_COOKIE
C=Sekhmet
D=SEKHMET
E=PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER
F=GARNET1,BRONZE,AMAZONITE,EMERALD
G=5
H=IcingSugar
I=JADE
J=JADE
K=green

ln=A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
A=A
B=CHOCOLATE_MARSHMALLOWS
C=Osiris
D=OSIRIS
E=MARSHMALLOW, CHOCOLATE_CHIPS
F=PLATINUM,ALEXANDRITE
G=6
H=Flour
I=Bronze
J=BRONZE
K=yellow

ln=,,,,,,7,Butter,Gold,GOLD,yellow
A=
B=
C=
D=
E=
F=
G=7
H=Butter
I=Gold
J=GOLD
K=yellow

ln=B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,
A=B
B=BLUEBERRY_PIE
C=Ankh
D=ANKH
E=BLUEBERRY, SUGAR, FLOUR, BUTTER
F=JADEITE,EMERALD,BRONZE,GOLD
G=8
H=ChocolateChips
I=Alexandrite
J=ALEXANDRITE
K=

See This DosTips post for a demonstration of how to extend this technique to parse more than 32 fields.

Hybrid JScript/batch parseCSV.bat utility

The pure batch requires a lot of code that is difficult to create on the fly, and it is relatively slow. I have created parseCSV.bat - A hybrid JScript/batch utility that quickly formats nearly any CSV into something that can be easily parsed by FOR /F. It even supports newlines within fields.

Of course parseCSV cannot solve the 8191 line length limit, and parsing more than 32 tokens still takes additional code.

parseCSV.bat does not use regular expressions.

I won't go into details on how it works. Full documentation is built into the utility, which is available by entering parseCSV /? from the command line. The output of the help follows:

parseCSV  [/option]...

  Parse stdin as CSV and write it to stdout in a way that can be safely
  parsed by FOR /F. All columns will be enclosed by quotes so that empty
  columns may be preserved. It also supports delimiters, newlines, and
  escaped quotes within quoted values. Two consecutive quotes within a
  quoted value are converted into one quote by default.

  Available options:

    /I:string = Input delimiter. Default is a comma (,)

    /O:string = Output delimiter. Default is a comma (,)

         The entire option must be quoted if specifying poison character
         or whitespace literals as a delimiters for /I or /O.

         Examples:  pipe = "/I:|"
                   space = "/I: "

         Standard JScript escape sequences can also be used.

         Examples:       tab = /I:\t  or  /I:\x09
                   backslash = /I:\\

    /E = Encode output delimiter literal within value as \D
         Encode newline within value as \N
         Encode backslash within value as \S

    /D = escape exclamation point and caret for Delayed expansion
         ! becomes ^!
         ^ becomes ^^

    /L = treat all input quotes as quote Literals

    /Q:QuoteOutputFormat

       Controls output of Quotes, where QuoteOutputFormat may be any
       one of the following:

         L = all columns quoted, quote Literals output as "   (Default)
         E = all columns quoted, quote literals Escaped as ""
         N = No columns quoted, quote literals output as "

       The /Q:E and /Q:N options are useful for transforming data for
       purposes other than parsing by FOR /F

    /U = Write unix style lines with newline (\n) instead of the default
         Windows style of carriage return and linefeed (\r\n).

parseCSV  /?

  Display this help

parseCSV  /V

  Display the version of parseCSV.bat

parseCSV.bat was written by Dave Benham. Updates are available at the original
posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702

Here is how parseCSV.bat could be used with the test2.csv from above.

@echo off
setlocal enableDelayedExpansion
for /f "tokens=1-11 delims=," %%A in (
  'parseCSV /E /D ^<test2.csv'
) do (
  %= Decode Tokens =%
  for %%a in (A B C D E F G H I J K) do call :decodeToken %%a
  %= Show the results =%
  for %%a in (A B C D E F G H I J K) do echo %%a=!%%a!
  echo(
)
exit /b

:decodeToken
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
  set "%1=!%1:\D=,!"
  set "%1=!%1:\S=\!"
)
exit /b

See This DosTips post for a demonstration how to extend this technique to parse more than 32 fields.

aschipfl
  • 33,626
  • 12
  • 54
  • 99
dbenham
  • 127,446
  • 28
  • 251
  • 390
  • This is great thanks! Very detailed explanation and easy to understand. I used the Pure Native Batch Solution, and it worked great – jamesD May 28 '19 at 09:59