Regular expressions are wonderful things :) You can use a hybrid JScript/batch utility called REPL.BAT that performs a regex search and replace on stdin and writes the result to stdout.
Assuming REPL.BAT is in your current directory, or better yet, somewhere whithin you path, then the following one liner works directly from the command line:
repl "^(?:[^,]*,){3}(\q[^\q]*\q).*$" "$1\n" mx <input.csv >output.txt
Please note, the above solution assumes every record contains at least 4 columns (3 commas). It also assumes the first 3 columns do not contain any commas in the value, the 4th column must be quoted and must not contain escaped quote ""
, and any column after the 4th does not contain a new line in the value. All hell may break loose if these conditions are not met.
Using a much more complicated regex, I have a solution that should extract the 4th column of any valid csv file. It properly handles both quoted and unquoted values. Quoted values may contain commas, escaped quotes, and/or new lines. The 4th column output will always be quoted, even if the source is not. The only limitation is every record must contain at least 4 columns.
repl "^(?:(?:[^,\q\n]*|\q(?:[^\q]|\q\q)*\q),){3}(?:\q((?:[^\q]|\q\q)*)\q|([^,\q\n]*))(?:,(?:[^,\q\n]*|\q(?:[^\q]|\q\q)*\q))*$" "\q$1$2\q" mx <input.csv >output.txt
Here is a nasty test case csv file:
1,"50,
""ignore""
x","1","Warning! Q1: Value too high [W-0001]",extra," hello,""world""
more!","part A
1,2,3","I don't want this",<last column of record 1>
"2","49","1","Warning! Q1: Value too low [W-0002]",<last column of record 2>
"3","48","1","Warning! Q1: Value changing too fast.
Check for endless loop[W-0003]","<last column of record 3>"
1,2,3,"I want this part 1
and ""this"" part 2",<last column of record 4>
a,b,c,I want this unquoted value,<last column of record 5>
And here is the output:
"Warning! Q1: Value too high [W-0001]"
"Warning! Q1: Value too low [W-0002]"
"Warning! Q1: Value changing too fast.
Check for endless loop[W-0003]"
"I want this part 1
and ""this"" part 2"
"I want this unquoted value"
Use of batch variables helps to document the logic of the regex, and makes it easy to quickly capture any arbitrary set of columns. Here is the same solution using variables:
@echo off
setlocal
:: define a regex that matches a non-captured column value
set "ignore=(?:[^,\q\n]*|\q(?:[^\q]|\q\q)*\q)"
:: define a regex that matches a captured column value
:: quoted value is in $1, unquoted value is in $2
set "capture=(?:\q((?:[^\q]|\q\q)*)\q|([^,\q\n]*))"
call repl "^(?:%ignore%,){3}%capture%(?:,%ignore%)*$" "\q$1$2\q" mx <input.csv >output.txt
type output.txt
And here is a demonstration of a final CALL that captures the 1st and 4th columns
call repl "^%capture%,%ignore%,%ignore%,%capture%(?:,%ignore%)*$" "\q$1$2\q,\q$3$4\q" mx <input.csv >output.csv