1

I have a batch file I created to extract only the columns I need from a csv file. recently, they added 16 more columns and it broke my script essentially.

My script is as follows.

@echo off
setlocal

del /f /q "output.csv"
REM The Distinguished Name of the container the new users will be placed in.

set filename="I9D14B20.csv"

for /F "skip=1 tokens=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38 delims=," %%A in (I9D14B20.csv) do (
 echo %%M,%%N,%%I,%%H,%%K,%%AI>>output.csv
)
type output.csv

endlocal
pause

When it had 22 columns it worked fine, but now that they added a bunch more, I need to pull column 35 out (aka AI)

It "breaks" at the second letter. I also tried %%35 but it only pulls column 3.

I know this has to be easier than I am making it, but I am pretty limited in my knowledge of .bat scripting.

2 Answers2

0

for uses letters as its variables, either upper- or lower-case ones. So you can only parse up to 26 tokens by a single for command.
Therefore, you need to split your for loop into two:

for /F "skip=1 tokens=1-25* delims=," %%A in (I9D14B20.csv) do (
    for /F "tokens=1-13 delims=," %%a in ("%%Z") do (
        echo %%M,%%N,%%I,%%H,%%K,%%j>>output.csv
    )
)

What happens here, is that the outer for loop splits the current line into 26 tokens where %%A up to %%Y are columns 1 to 25 and the last one %%Z is all the rest, so column 26, 27, 28 up to 38, including the delimiters; the inner for loop splits the remaining line (that is, the string "%%Z") into tokens again, so that token 1 is column 26, token 2 is column 27, and so on.
With this trick you can split lines with up to 26 + 25 = 51 columns in total.

aschipfl
  • 33,626
  • 12
  • 54
  • 99
  • 2
    Actually `for` accepts even some special characters like `#` as variable, but this is completely undocumented; therefore I recommend not using such variables. – aschipfl Jul 27 '15 at 18:07
  • 1
    Also, a single FOR /F can parse up to 31 tokens. – dbenham Jul 27 '15 at 18:40
0

First off, you don't need to explicitly specify each token. For example, "delims=1,2,3,4,5" could be written as "delims=1-5". And the requested tokens do not need to be consecutive. For example, "delims=1,3,7,15" is perfectly fine.

Secondly, FOR variables are always a single character. You cannot use a variable like %AI.

But you have a more significant problem - a single FOR /F cannot parse more than 31 tokens, as explained at https://stackoverflow.com/a/8520993/1012053. In order to get the 35th token, FOR /F must parse all the prior tokens as well, even if you don't request all the intermediate values. That exceeds the maximum number of parsed tokens. The solution is to use multiple FOR /F statements.

One other optimiztion: It is more efficient to enclose the entire loop in parentheses and redirect only once. The code runs faster, and you don't have to clear the output file first.

@echo off
>output.csv (
  for /f "skip=1 tokens=8,9,11,13,14,30* delims=," %%A in (I9D14B20.csv) do (
    for /f "tokens=5 delims=," %%H in ("%%G") do echo %%D,%%E,%%B,%%A,%%C,%%H
  )
)
type output.csv
pause

Another option is to use my JREPL.BAT regular expression text processing utility. JREPL.BAT is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward. Full documentation is embedded within the script.

The solution becomes a one liner using JREPL, and it is significantly faster than any pure batch solution (except for really small files).

@call jrepl "^(?:.*?,){7}(.*?),(.*?),.*?,(.*?),.*?,(.*?),(.*?),(?:.*?,){20}(.*?),.*" "$4,$5,$2,$1,$3,$6" /a /jbegln "skip=(ln==1)" /f "test.txt" /o "output.csv"
Community
  • 1
  • 1
dbenham
  • 127,446
  • 28
  • 251
  • 390