First off, PowerShell has the built in ability to parse and manipulate CSV documents, so that would be a better option. But I will stick with batch processing.
Regular Expression solution
Regular expressions are no good to a pure native batch solution for two reasons:
- It is impossible to alter FOR /F behavior to parse tokens by regular expressions - it is what it is - very limited.
- To parse your file with FOR /F you would need to manipulate each line prior to parsing. Batch does not have any regex utility that can alter content. It only has FINDSTR which can do very crude regex searches, but it always returns the original matching line. On top of that, the FINDSTR regex is so crippled, I'm not sure you could properly parse a CSV anyway.
You could use custom JScript or VBScript via CSCRIPT to preprocess the file with a regular expression search and replace in such a way that FOR /F could then parse the file. I have already written a hybrid JScript/batch regular expression processing utility called JREPL.BAT that works well for this.
A quoted CSV field can contain quote literals, in which case the quote liberals are doubled. The following regex would match any CSV token (not including the comma delimiter) ("(?:""|[^"])*"|[^,"]*)
. It looks for a quote followed by any number of non-quote characters and/or doubled quotes, followed by a closing quote or any number of characters not including quote or comma. But your CSV does not contain any doubled quote literals, so the regex can be simplified to ("[^"]*"|[^,"]*)
.
CSCRIPT has no mechanism to pass quote literals within arguments, so JREPL has an /XSEQ option to enable extended escape sequence support, including \q
to represent "
. The other option is to use the standard \x22
sequence. JREPL "(\q[^\q]*\q|[^,\q]*)," "$1;" /XSEQ /F "test.csv"
will match any token (possibly empty) followed by a comma delimiter, and preserve the token and replace the comma with a semicolon.
But that still leaves empty tokens, and FOR /F does not properly parse empty tokens. So I can throw a bit of JSCRIPT into the replacement term to remove any existing quotes, and then surround each token with quotes (except for the last one, where it isn't needed)
JREPL "(\q[^\q]*\q|[^,\q]*)," "$txt='\q'+$1.replace(/'\q'/,'')+'\q;'" /JQ /XSEQ /F "test.csv"
Here is a demonstration showing how it could be used to parse your CSV:
@echo off
for /f "tokens=1-11 delims=;" %%A in (
'JREPL "(\q[^\q]*\q|[^,\q]*)," "$txt='\x22'+$1.replace(/\x22/g,'')+'\x22;'" /JQ /XSEQ /F test.csv'
) do (
echo A=%%~A
echo B=%%~B
echo C=%%~C
echo D=%%~D
echo E=%%~E
echo F=%%~F
echo G=%%~G
echo H=%%~H
echo I=%%~I
echo J=%%~J
echo K=%%~K
echo(
)
--OUTPUT--
A=A
B=SCONE
C=Shen ring
D=SHEN_RING
E=FLOUR, BUTTER
F=BRONZE,GOLD
G=blank
H="This
I="BLANK""
J=
K=BLANK
A=A
B=STRAWBERRIES_AND_CREAM
C=Cat1
D=CAT1
E=STRAWBERRY, CREAM
F=OBSIDIAN,GOLD2
G=FS
H=FreeSpin
I=
J=FREE_SPIN
K=
A=A
B=WALNUT_TOFFEE
C=Pyramid
D=PYRAMID
E=BUTTER, SUGAR, WALNUT
F=GOLD,EMERALD,PERIDOT
G=1
H=Champagne
I=Garnet
J=GARNET
K=
A=A
B=RASPBERRY_AND_LIME_JELLY
C=Cuff bracelet
D=CUFF_BRACELET
E=RASPBERRY, JELLY, LIME
F=ZIRCON,BRONZE2,TOPAZ
G=2
H=Cocoa
I=Lapis lazuli
J=LAPIS_LAZULI
K=Blue
A=A
B=CHOCOLATE_CHIP_COOKIES
C=Nekhbet
D=NEKHBET
E=SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT
F=EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER
G=3
H=GoldLeaf
I=gold3
J=GOLD3
K=yellow
A=A
B=BUTTER_CREAM_CUP_CAKE
C=Sobek
D=SOBEK
E=ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM
F=JADE,BRONZE,GOLD,GARNET2
G=4
H=Sugar
I=emerald
J=EMERALD
K=green
A=A
B=PEANUT_BUTTER_COOKIE
C=Sekhmet
D=SEKHMET
E=PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER
F=GARNET1,BRONZE,AMAZONITE,EMERALD
G=5
H=IcingSugar
I=JADE
J=JADE
K=green
A=A
B=CHOCOLATE_MARSHMALLOWS
C=Osiris
D=OSIRIS
E=MARSHMALLOW, CHOCOLATE_CHIPS
F=PLATINUM,ALEXANDRITE
G=6
H=Flour
I=Bronze
J=BRONZE
K=yellow
A=
B=
C=
D=
E=
F=
G=7
H=Butter
I=Gold
J=GOLD
K=yellow
A=B
B=BLUEBERRY_PIE
C=Ankh
D=ANKH
E=BLUEBERRY, SUGAR, FLOUR, BUTTER
F=JADEITE,EMERALD,BRONZE,GOLD
G=8
H=ChocolateChips
I=Alexandrite
J=ALEXANDRITE
K=
But I wouldn't use regluar expressions for this. There are other ways.
Pure native batch solution
Believe it or not, it is not that hard to use nothing but internal batch commands to manipulate each line such that a FOR /F can parse all the tokens.
There are two things that need to happen with your CSV:
1) Unquoted comma delimiters must be transformed into some other character that does not appear in your file, leaving the quoted commas alone. I can use a derivative of a technique that jeb developed to differentiate between quoted and unquoted characters: When variables are expanded with percent expansion, escaped characters like ^,
are treated differently depending on whether they are quoted or not. Typically ^,
becomes ,
, and "^,"
remains unchanged. But if you use CALL, then "^,"
becomes "^^,"
, and ^,
remains unchanged. Either way, it is then possible to differentiate between quoted and unquoted characters.
2) FOR /F cannot parse empty tokens, so empty tokens must be enclosed by quotes. It is easiest to simply enclose all tokens in quotes.
@echo off
setlocal enableDelayedExpansion
for /f "usebackq delims=" %%A in ("test.csv") do (
%= Print out the raw line so we can verify the end result =%
echo %%A
%= Preprocess the line so it is safe to parse =%
set "ln=%%A" %= Transfer line to environment variable =%
%= Artifact of CALL - Convert quoted , to ^^; and unquoted , to ^; =%
%= Make sure unquoted SET statement does not have any trailing characters =%
call set ln=%%ln:,=^^;%%
set "ln=!ln:^^;=,!" %= Convert quoted ^^; back into , =%
set "ln=!ln:^;=;!" %= Convert unquoted ^; to ; =%
set "ln=!ln:"=!" %= Strip all quotes so we can safely do next step =%
set "ln="!ln:;=";"!"" %= Enclose all tokens in quotes to protect empty tokens =%
%= The line is now ready to parse with another FOR /F =%
%= I simply print the value of all 11 tokens, 1 per line. =%
%= Adjust the loop as needed to suit your needs. =%
for /f "tokens=1-11 delims=;" %%A in ("!ln!") do (
for %%a in (A B C D E F G H I J K) do call :echoToken %%a
echo(
)
)
exit /b
:echoToken Char
for %%. in (.) do echo %1=%%~%1
exit /b
Here is the same code without all the comments:
@echo off
setlocal enableDelayedExpansion
for /f "usebackq delims=" %%A in ("test.csv") do (
echo %%A
set "ln=%%A"
call set ln=%%ln:,=^^;%%
set "ln=!ln:^^;=,!"
set "ln=!ln:^;=;!"
set "ln=!ln:"=!"
set "ln="!ln:;=";"!""
for /f "tokens=1-11 delims=;" %%A in ("!ln!") do (
for %%a in (A B C D E F G H I J K) do call :echoToken %%a
echo(
)
)
exit /b
:echoToken Char
for %%. in (.) do echo %1=%%~%1
exit /b
-- OUTPUT ---
A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",blank,"This,""BLANK""",,BLANK,
A=A
B=SCONE
C=Shen ring
D=SHEN_RING
E=FLOUR, BUTTER
F=BRONZE,GOLD
G=blank
H=This,BLANK
I=
J=BLANK
K=
A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A=A
B=STRAWBERRIES_AND_CREAM
C=Cat1
D=CAT1
E=STRAWBERRY, CREAM
F=OBSIDIAN,GOLD2
G=FS
H=FreeSpin
I=
J=FREE_SPIN
K=
A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A=A
B=WALNUT_TOFFEE
C=Pyramid
D=PYRAMID
E=BUTTER, SUGAR, WALNUT
F=GOLD,EMERALD,PERIDOT
G=1
H=Champagne
I=Garnet
J=GARNET
K=
A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A=A
B=RASPBERRY_AND_LIME_JELLY
C=Cuff bracelet
D=CUFF_BRACELET
E=RASPBERRY, JELLY, LIME
F=ZIRCON,BRONZE2,TOPAZ
G=2
H=Cocoa
I=Lapis lazuli
J=LAPIS_LAZULI
K=Blue
A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A=A
B=CHOCOLATE_CHIP_COOKIES
C=Nekhbet
D=NEKHBET
E=SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT
F=EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER
G=3
H=GoldLeaf
I=gold3
J=GOLD3
K=yellow
A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A=A
B=BUTTER_CREAM_CUP_CAKE
C=Sobek
D=SOBEK
E=ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM
F=JADE,BRONZE,GOLD,GARNET2
G=4
H=Sugar
I=emerald
J=EMERALD
K=green
A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A=A
B=PEANUT_BUTTER_COOKIE
C=Sekhmet
D=SEKHMET
E=PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER
F=GARNET1,BRONZE,AMAZONITE,EMERALD
G=5
H=IcingSugar
I=JADE
J=JADE
K=green
A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
A=A
B=CHOCOLATE_MARSHMALLOWS
C=Osiris
D=OSIRIS
E=MARSHMALLOW, CHOCOLATE_CHIPS
F=PLATINUM,ALEXANDRITE
G=6
H=Flour
I=Bronze
J=BRONZE
K=yellow
,,,,,,7,Butter,Gold,GOLD,yellow
A=
B=
C=
D=
E=
F=
G=7
H=Butter
I=Gold
J=GOLD
K=yellow
B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,
A=B
B=BLUEBERRY_PIE
C=Ankh
D=ANKH
E=BLUEBERRY, SUGAR, FLOUR, BUTTER
F=JADEITE,EMERALD,BRONZE,GOLD
G=8
H=ChocolateChips
I=Alexandrite
J=ALEXANDRITE
K=
But there are many possible situations that can make parsing CSV much more complicated.
- If delayed expansion is enabled when a FOR variable is expanded, then all unescaped
!
will be corrupted, as will unescaped ^
if !
is present.
- The technique requires percent expansion. But that will fail if poison characters like
&
, |
, >
, <
, ^
are present unless they are quoted or escaped.
- You may not know the data, in which case you can't be sure of any character to use as a delimiter that doesn't appear in your data. So delimiter literals within quoted values must be encoded as something else, and then restored after you parse the tokens.
- Quoted CSV fields may include quote literals, which are doubled. The doubled quotes should be undoubled after parsing.
- CSV also allows newlines in quoted fields. I'm not aware of a pure batch solution that can solve this.
- A single FOR /F cannot parse more than 32 tokens in a line. See this DosTips thread for techniques to exceed this limit, especially the following posts within:
Here is a robust pure batch solution that can parse any CSV as long as there are no newlines within fields, and no line length approaches the 8191 byte batch limit, and you don't need to parse more than 31 tokens. The code is heavily commented in an effort to explain all the steps required.
@echo off
setlocal enableDelayedExpansion
:: Must use arcane FOR /F option syntax to disable both EOL and DELIMS.
for /f usebackq^ delims^=^ eol^= %%A in ("test2.csv") do call :processLine
:: I CALL out of the loop to a :subroutine because a single CALL :subroutine
:: is much faster than many CALL SET statements. It also simplifies the
:: management of delayed expansion.
exit /b
:processLine
:: Must disable delayed expansion so percent expansion does not corrupt ! or ^ literals.
setlocal disableDelayedExpansion
:: FOR variables are global - this extra FOR loop exposes %%A that would otherwise be hidden.
for %%. in (.) do set "ln=%%A"
:: Print out raw line so we can diagnose the result.
set ln
:: "Hide" quotes by doubling, making all characters safe for percent expansion when
:: entire string is quoted. Also enclose line within extra set of , delimiters.
set "ln=,%ln:"=""%,"
:: Escape poison characters so all characters are safe for unquoted percent expansion.
set "ln=%ln:^=^^^^%" %= Double escaped to account for enabled delayed expansion later on. =%
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
:: Double escape ! so not corrupted by later percent expansion while delayed expansion enabled.
set "ln=%ln:!=^^!%"
:: Double and escape all commas. , -> ^,^,
set "ln=%ln:,=^,^,%"
:: Undouble quotes and unescape (originally) unquoted strings. Note that outer quotes are escaped.
set ^"ln=%ln:""="%^"
:: At this point quoted comma literals are still ^,^, whereas unquoted comma delimiters are ,,
:: Also, all quoted poison characters are still escaped, but unquoted ones are not.
:: Redouble quotes, all characters safe again for quoted percent expansion.
set "ln=%ln:"=""%"
:: Encode @ as @a and quoted comma literals ^,^, as @c
set "ln=%ln:@=@a%"
set "ln=%ln:^,^,=@c%"
:: Restore delayed expansion and undouble quotes, which unescapes (originally) quoted strings.
:: Note that outer quotes are NOT escaped this time. The ENDLOCAL and SET are on the same
:: line so that the percent expansion value is transferred across the ENDLOCAL barrier.
endlocal & set "ln=%ln:""="%" ! %= Trailing ! is ignored except forces all ^^ to become ^ =%
:: At this point no characters are escaped, and all ! and ^ are unprotected against percent or
:: FOR variable expansion while delayed expansion is enabled.
:: Remove enclosing quotes from tokens that are already quoted so we can later safely enclose
:: all tokens in quotes. This is why the extra enclosing , were added at the beginning.
set "ln=!ln:,,"=,,!"
set "ln=!ln:",,=,,!"
:: Remove outer , delimiters that were added at the beginning.
set "ln=!ln:~2,-2!"
:: Must double escape ! and ^ again to protect against delayed expansion within parsing FOR /F loop.
set "ln=!ln:^=^^^^!"
set "ln=%ln:!=^^^!%"
:: Undouble remaining quotes because quote literals are doubled within original CSV.
set "ln=!ln:""="!"
:: Restore doubled ,, delimiters to , and enclose all tokens within quotes to preserves empty tokens.
set "ln="!ln:,,=","!"" !
:: The line is now safe to parse with FOR /F, though @ and , are encoded as @a and @c
:: Parse line into tokens.
for /f "tokens=1-11 delims=," %%A in ("!ln!") do (
%= Decode the tokens and store result in environment variables =%
for %%a in (A B C D E F G H I J K) do call :decodeToken %%a
%= Your processing goes here. Decoded %%A - %%K are now safely in !A! - !K! =%
%= I will simply echo all the values, one per line =%
for %%a in (A B C D E F G H I J K) do echo %%a=!%%a!
echo(
)
exit /b
:decodeToken Char
:: Converts @c and @a back into , and @
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
set "%1=!%1:@c=,!"
set "%1=!%1:@a=@!"
)
exit /b
Here is the same code without all the comments:
@echo off
setlocal enableDelayedExpansion
for /f usebackq^ delims^=^ eol^= %%A in ("test2.csv") do call :processLine
exit /b
:processLine
setlocal disableDelayedExpansion
for %%. in (.) do set "ln=%%A"
set ln
set "ln=,%ln:"=""%,"
set "ln=%ln:^=^^^^%"
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
set "ln=%ln:!=^^!%"
set "ln=%ln:,=^,^,%"
set ^"ln=%ln:""="%^"
set "ln=%ln:"=""%"
set "ln=%ln:@=@a%"
set "ln=%ln:^,^,=@c%"
endlocal & set "ln=%ln:""="%" !
set "ln=!ln:,,"=,,!"
set "ln=!ln:",,=,,!"
set "ln=!ln:~2,-2!"
set "ln=!ln:^=^^^^!"
set "ln=%ln:!=^^^!%"
set "ln=!ln:""="!"
set "ln="!ln:,,=","!"" !
for /f "tokens=1-11 delims=," %%A in ("!ln!") do (
for %%a in (A B C D E F G H I J K) do call :decodeToken %%a
for %%a in (A B C D E F G H I J K) do echo %%a=!%%a!
echo(
)
exit /b
:decodeToken Char
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
set "%1=!%1:@c=,!"
set "%1=!%1:@a=@!"
)
exit /b
Here is your sample CSV file with an additional line added to test the various complications:
;A!,"B!","C is ""cool""",D @^&|<>,"E @^&|<>","F ,x","G ""@^&|<>""","H ""@^&|<>!""",I,J,K
A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",blank,"This,""BLANK""",,BLANK,
A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
,,,,,,7,Butter,Gold,GOLD,yellow
B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,
And here is the final output:
ln=;A!,"B!","C is ""cool""",D @^&|<>,"E @^&|<>","F ,x","G ""@^&|<>""","H ""@^&|<>!""",I,J,K
A=;A!
B=B!
C=C is "cool"
D=D @^&|<>
E=E @^&|<>
F=F ,x
G=G "@^&|<>"
H=H "@^&|<>!"
I=I
J=J
K=K
ln=A,SCONE,Shen ring,SHEN_RING,"FLOUR, BUTTER","BRONZE,GOLD",blank,"This,""BLANK""",,BLANK,
A=A
B=SCONE
C=Shen ring
D=SHEN_RING
E=FLOUR, BUTTER
F=BRONZE,GOLD
G=blank
H=This,"BLANK"
I=
J=BLANK
K=
ln=A,STRAWBERRIES_AND_CREAM,Cat1,CAT1,"STRAWBERRY, CREAM","OBSIDIAN,GOLD2",FS,FreeSpin,,FREE_SPIN,
A=A
B=STRAWBERRIES_AND_CREAM
C=Cat1
D=CAT1
E=STRAWBERRY, CREAM
F=OBSIDIAN,GOLD2
G=FS
H=FreeSpin
I=
J=FREE_SPIN
K=
ln=A,WALNUT_TOFFEE,Pyramid,PYRAMID,"BUTTER, SUGAR, WALNUT","GOLD,EMERALD,PERIDOT",1,Champagne,Garnet,GARNET,
A=A
B=WALNUT_TOFFEE
C=Pyramid
D=PYRAMID
E=BUTTER, SUGAR, WALNUT
F=GOLD,EMERALD,PERIDOT
G=1
H=Champagne
I=Garnet
J=GARNET
K=
ln=A,RASPBERRY_AND_LIME_JELLY,Cuff bracelet,CUFF_BRACELET,"RASPBERRY, JELLY, LIME","ZIRCON,BRONZE2,TOPAZ",2,Cocoa,Lapis lazuli,LAPIS_LAZULI,Blue
A=A
B=RASPBERRY_AND_LIME_JELLY
C=Cuff bracelet
D=CUFF_BRACELET
E=RASPBERRY, JELLY, LIME
F=ZIRCON,BRONZE2,TOPAZ
G=2
H=Cocoa
I=Lapis lazuli
J=LAPIS_LAZULI
K=Blue
ln=A,CHOCOLATE_CHIP_COOKIES,Nekhbet,NEKHBET,"SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT","EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER",3,GoldLeaf,gold3,GOLD3,yellow
A=A
B=CHOCOLATE_CHIP_COOKIES
C=Nekhbet
D=NEKHBET
E=SUGAR, FLOUR, BUTTER, CHOCOLATE_CHIPS, SALT
F=EMERALD,BRONZE,GOLD,ALEXANDRITE,SILVER
G=3
H=GoldLeaf
I=gold3
J=GOLD3
K=yellow
ln=A,BUTTER_CREAM_CUP_CAKE,Sobek,SOBEK,"ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM","JADE,BRONZE,GOLD,GARNET2",4,Sugar,emerald,EMERALD,green
A=A
B=BUTTER_CREAM_CUP_CAKE
C=Sobek
D=SOBEK
E=ICING_SUGAR, FLOUR, BUTTER, BUTTERCREAM
F=JADE,BRONZE,GOLD,GARNET2
G=4
H=Sugar
I=emerald
J=EMERALD
K=green
ln=A,PEANUT_BUTTER_COOKIE,Sekhmet,SEKHMET,"PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER","GARNET1,BRONZE,AMAZONITE,EMERALD",5,IcingSugar,JADE,JADE,green
A=A
B=PEANUT_BUTTER_COOKIE
C=Sekhmet
D=SEKHMET
E=PEANUT_BUTTER, FLOUR, SUGAR, BAKING_POWDER
F=GARNET1,BRONZE,AMAZONITE,EMERALD
G=5
H=IcingSugar
I=JADE
J=JADE
K=green
ln=A,CHOCOLATE_MARSHMALLOWS,Osiris,OSIRIS,"MARSHMALLOW, CHOCOLATE_CHIPS","PLATINUM,ALEXANDRITE",6,Flour,Bronze,BRONZE,yellow
A=A
B=CHOCOLATE_MARSHMALLOWS
C=Osiris
D=OSIRIS
E=MARSHMALLOW, CHOCOLATE_CHIPS
F=PLATINUM,ALEXANDRITE
G=6
H=Flour
I=Bronze
J=BRONZE
K=yellow
ln=,,,,,,7,Butter,Gold,GOLD,yellow
A=
B=
C=
D=
E=
F=
G=7
H=Butter
I=Gold
J=GOLD
K=yellow
ln=B,BLUEBERRY_PIE,Ankh,ANKH,"BLUEBERRY, SUGAR, FLOUR, BUTTER","JADEITE,EMERALD,BRONZE,GOLD",8,ChocolateChips,Alexandrite,ALEXANDRITE,
A=B
B=BLUEBERRY_PIE
C=Ankh
D=ANKH
E=BLUEBERRY, SUGAR, FLOUR, BUTTER
F=JADEITE,EMERALD,BRONZE,GOLD
G=8
H=ChocolateChips
I=Alexandrite
J=ALEXANDRITE
K=
See This DosTips post for a demonstration of how to extend this technique to parse more than 32 fields.
Hybrid JScript/batch parseCSV.bat utility
The pure batch requires a lot of code that is difficult to create on the fly, and it is relatively slow. I have created parseCSV.bat - A hybrid JScript/batch utility that quickly formats nearly any CSV into something that can be easily parsed by FOR /F. It even supports newlines within fields.
Of course parseCSV cannot solve the 8191 line length limit, and parsing more than 32 tokens still takes additional code.
parseCSV.bat does not use regular expressions.
I won't go into details on how it works. Full documentation is built into the utility, which is available by entering parseCSV /?
from the command line. The output of the help follows:
parseCSV [/option]...
Parse stdin as CSV and write it to stdout in a way that can be safely
parsed by FOR /F. All columns will be enclosed by quotes so that empty
columns may be preserved. It also supports delimiters, newlines, and
escaped quotes within quoted values. Two consecutive quotes within a
quoted value are converted into one quote by default.
Available options:
/I:string = Input delimiter. Default is a comma (,)
/O:string = Output delimiter. Default is a comma (,)
The entire option must be quoted if specifying poison character
or whitespace literals as a delimiters for /I or /O.
Examples: pipe = "/I:|"
space = "/I: "
Standard JScript escape sequences can also be used.
Examples: tab = /I:\t or /I:\x09
backslash = /I:\\
/E = Encode output delimiter literal within value as \D
Encode newline within value as \N
Encode backslash within value as \S
/D = escape exclamation point and caret for Delayed expansion
! becomes ^!
^ becomes ^^
/L = treat all input quotes as quote Literals
/Q:QuoteOutputFormat
Controls output of Quotes, where QuoteOutputFormat may be any
one of the following:
L = all columns quoted, quote Literals output as " (Default)
E = all columns quoted, quote literals Escaped as ""
N = No columns quoted, quote literals output as "
The /Q:E and /Q:N options are useful for transforming data for
purposes other than parsing by FOR /F
/U = Write unix style lines with newline (\n) instead of the default
Windows style of carriage return and linefeed (\r\n).
parseCSV /?
Display this help
parseCSV /V
Display the version of parseCSV.bat
parseCSV.bat was written by Dave Benham. Updates are available at the original
posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702
Here is how parseCSV.bat could be used with the test2.csv from above.
@echo off
setlocal enableDelayedExpansion
for /f "tokens=1-11 delims=," %%A in (
'parseCSV /E /D ^<test2.csv'
) do (
%= Decode Tokens =%
for %%a in (A B C D E F G H I J K) do call :decodeToken %%a
%= Show the results =%
for %%a in (A B C D E F G H I J K) do echo %%a=!%%a!
echo(
)
exit /b
:decodeToken
for %%. in (.) do set "%1=%%~%1" !
if defined %1 (
set "%1=!%1:\D=,!"
set "%1=!%1:\S=\!"
)
exit /b
See This DosTips post for a demonstration how to extend this technique to parse more than 32 fields.