2

Very sorry if this has been asked and answered already. I have looked through similarly-titled questions without finding one that seemed applicable to me. Also, this is my first question posted on Stack Overflow.

I have a csv file called sortcriteria.csv that currently consists of 12 columns and 5 rows.

Here is the csv file's content:

Private,ReadWriteInc,TheArts,Preschool,Transition,Staff,Music,Language,MultiYearOrGroups,MiddleYears,PhysicalActivity,StudentLeaders
PLT,RWI,Art,Pre,Tran,Staff,Uke,Language,Group,MiddleYears,PhysicalActivity,StudentLeaders
Personal,",",",",",Ukulele,Japanese,",",PE,"
Private,",",",",",Guitar,German,",",","
11 ,",",",",",Music,",",",","

Here is a picture of the csv in Excel: picture of csv in Excel

If I were to run this through a for /f loop with the delimiter set to comma, I could get 12 tokens, one for each column. For example, the following code would generate 12 variables:

for /f "delims=, tokens=1-12" %%B in (C:\GAMWork\sortcriteria.csv) do ...

For sortcriteria.csv in its current form, this for loop would iterate 5 times (once for each line) with each time generating 12 variables (%%B through to %%M). It will do this because I have told it that there are 12 columns, or tokens, with "tokens=1-12".

However, the csv file may not always have 12 columns. Columns may be added or subtracted in the future. How can I get the batch file to first count the number of columns in the csv file and then use that number to determine how many tokens the for /f loop looks for?

I am hoping for something like the following:

<some code block that counts the number of columns in the csv file>

set /a tknnum=<Result from code block above>

for /f "delims=, tokens=1-%tknnum%" %%B in (C:\GAMWork\sortcriteria.csv) do ...

I'm especially hoping that I can set the number of columns to the tknnum variable, as I also need to use it elsewhere in the batch file for another purpose.

I hope this explanation is clear and understandable.

I do not have any examples of things that I have tried, as I don't really know where to start to attempt a solution. I cannot think of a way to process the csv file using for /f that will count the number of columns. I have read the documentation for the for command from Microsoft's website, as well as from ss64.com. As mentioned, I have also searched other question here on Stack Overflow with similar titles. Of those I could understand, none seemed applicable in my context. The closest I could find is this one: https://stackoverflow.com/a/72903742/, however I got lost trying to understand @Magoo's answer.

EDIT: I realise that using " as a filler for blank cells is a bad idea. Sorry.

  • Important question: Is it possible in your case, that a column is empty? Then it's much more tricky to solve. And what are these columns with one double quote, that looks invalid. Or shall this a comma value for a column? `","` – jeb May 18 '23 at 07:56
  • @jeb I have tried to use a filler character to ensure that there a no empty cells. In this case, I used the solo character " I realise that this is a bad idea, and will look for another, less-troublesome character to use as a filler character for blank cells. – Joshua Howard May 18 '23 at 08:03
  • You should use two double quotes, that is the standard way. Is it possible, that a comma is in the content like `"Ukulele, Guitar"`? Without, it's simple to count, only by measure the length of a line with and without commas – jeb May 18 '23 at 08:06
  • I have added a picture of what the csv file looks like when opened in Excel as I believe it helps clarify the exact content of the csv file. No, there are no commas in the actual data contained in the csv. – Joshua Howard May 18 '23 at 08:09
  • You shouldn't need the quotes in empty fields, just export to CSV with an option like *export all fields in quotes* or something like that (sorry I don't use windows/excel) – jeb May 18 '23 at 08:14
  • Or you can just modify the lines while reading, see [Batch Sctiping Tokens](https://stackoverflow.com/a/13370729/463115) or [Adding quotes to all columns](https://stackoverflow.com/a/42674917/463115) – jeb May 18 '23 at 08:22
  • Sorry, only just saw your comment. I initially had left the empty fields blank, but it was causing issues down the line and the easiest solution for me was to ensure no fields were left blank. – Joshua Howard May 18 '23 at 08:51

3 Answers3

1

Build the difference of the length of a line with and without the commas.

setlocal EnableDelayedExpansion
set /p firstLine= < file.csv
call :strlen firstLine totalLength
set "reduced=!firstLine:,=!"
call :strlen firstLine LengthWithoutCommas

set /a tknnum=totalLength-LengthWithoutCommas+1
echo %tknum%
exit /b

:strlen <stringVar> <resultVar>
(   
    setlocal EnableDelayedExpansion
    (set^ tmp=!%~1!)
    if defined tmp (
        set "len=1"
        for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
            if "!tmp:~%%P,1!" NEQ "" ( 
                set /a "len+=%%P"
                set "tmp=!tmp:~%%P!"
            )
        )
    ) ELSE (
        set len=0
    )
)
( 
    endlocal
    set "%~2=%len%"
    exit /b
)

Link to batch string length function
macro batch string length function

But as @Stephan already mentioned, probably the count doesn't help very much.

This is a solution that builds an array for each line

@echo off
setlocal EnableDelayedExpansion
(set LF=^
%=empty=%
)


set lineNumber=0

for /f "delims=" %%B in (%filename%) DO (
  set /a lineNumber+=1
  set count=0
  set "CSV=%%B"
  echo [!lineNumber!]: !CSV!
  for %%L in ("!LF!") do set "CSV=!CSV:,=%%~L!"
  FOR /F "tokens=* delims=" %%a in ("!CSV!") DO (
    set /a count+=1
    echo   Column[!count!]: "%%a"
    set "content[!line!][!column!]=%%a"
  )
)
jeb
  • 78,592
  • 17
  • 171
  • 225
  • It seems you've called a label :strlen, but I don't see it declared? Or have I misunderstood? – Joshua Howard May 18 '23 at 08:14
  • 1
    @JoshuaHoward I added a link – jeb May 18 '23 at 08:17
  • :strlen looks like a nice solution, but when I copy and paste code from the links you've posted, I'm running into syntax errors. Is it possible for you to add the code to the code block in your answer above? Thanks :) – Joshua Howard May 18 '23 at 08:48
  • 1
    @JoshuaHoward I added and modified the code, the problem was that the linked function uses a different order of arguments ` ` are interchanged – jeb May 18 '23 at 08:55
  • Thanks @jeb for this. What you have provided is dynamic and powerful and I will certainly add it to my knowledge bank. I have marked Stephan's answer as accepted because it accomplishes what I need for this specific task much more simply. I appreciate your help, though. – Joshua Howard May 18 '23 at 23:02
1

To answer your question so far:
get the number of columns (use the first line, which is the header) by just counting the elements (Attention: this uses standard delimiters (not changeable), so if there should be spaces in a column, the value has to be quoted (like "Student Leaders".

@echo off
<sortcriteria.csv set /p header=
set columns=0
for %%a in (%header%) do set /a columns+=1
echo Your file has %columns% columns.
for /f "tokens=1-%columns% delims=," %%a in (sortcriteria.csv) do ...

Still open question: you still don't know which letter is your last token. There might be a better solution. What exactly do you want to do?

Stephan
  • 53,940
  • 10
  • 58
  • 91
  • Being a csv, my source file uses commas as delimiters, rather than the standard SPACE or TAB. Even though there are no spaces in the actual data, there are also no spaces or tabs anywhere in the first line. – Joshua Howard May 18 '23 at 08:44
  • 1
    It's the `for` used for counting the columns, which uses standard delimiters (including spaces, tabs and commas). That has nothing to do with the CSV itself. – Stephan May 18 '23 at 09:40
  • @jeb: you're right. I must have had a typo when testing. Changed. – Stephan May 18 '23 at 09:43
  • I have come back to look at this with fresh eyes after some sleep and see what you mean now. Thanks for this answer, it does exactly what I needed and actually helps me tidy up my code further down the original file. – Joshua Howard May 18 '23 at 23:06
1

This is another, perhaps simpler method:

@echo off

set /P "header=" < sortcriteria.csv
set tokens=1 & set "col=%header:,=" & set /A tokens+=1 & set "col=%"
echo Tokens=%tokens%

This method take as delimiter precisely the specified character (comma in this case, the same as for /F command) so you could insert spaces or tabs in the columns.

If you don't know how many columns have your file, you don't know which letter tokens use. The program below transfer the letter-tokens into array elements, so you can use any array element via its subscript and/or limit the process of elements to the ones with subscript no greater than tokens:

@echo off
setlocal EnableDelayedExpansion

rem Count the tokens in file
set /P "header=" < sortcriteria.csv
set tokens=1 & set "col=%header:,=" & set /A tokens+=1 & set "col=%"
echo Tokens=%tokens%

rem Define the conversion from tokens to array elements
set "letter= abcdefghijklmnopqrstuvwxyz"
set "TokensToArray="
for /L %%i in (1,1,%tokens%) do (
   set "TokensToArray=!TokensToArray!& set "array[%%i]=%%!letter:~%%i,1!" "
)
set "TokensToArray=!TokensToArray:~1!"

rem Process the file
for /F "tokens=1-%tokens% delims=," %%a in (sortcriteria.csv) do (
   %TokensToArray%
   echo -------------------------
   echo Column 1 = "!array[1]!"
   echo Column 7 = "!array[7]!"
)

Output:

Tokens=12
-------------------------
Column 1 = "Private"
Column 7 = "Music"
-------------------------
Column 1 = "PLT"
Column 7 = "Uke"
-------------------------
Column 1 = "Personal"
Column 7 = "Ukulele"
-------------------------
Column 1 = "Private"
Column 7 = "Guitar"
-------------------------
Column 1 = "11 "
Column 7 = "Music"

PS - I reviewed your picture of csv in Excel and it seems to me that your quote character is intended to mean: "Copy the value of field above". If so, you can get the same result if you modify the for /L %%i that define the conversion to this:

set ""=quot"
for /L %%i in (1,1,%tokens%) do (
   set "TokensToArray=!TokensToArray!& (if ^!%%!letter:~%%i,1!^! neq quot set "array[%%i]=%%!letter:~%%i,1!") "
)

Output example with columns 4, 8 and 11:

Tokens=12
-------------------------
Column 4  = "Preschool"
Column 8  = "Language"
Column 11 = "PhysicalActivity"
-------------------------
Column 4  = "Pre"
Column 8  = "Language"
Column 11 = "PhysicalActivity"
-------------------------
Column 4  = "Pre"
Column 8  = "Japanese"
Column 11 = "PE"
-------------------------
Column 4  = "Pre"
Column 8  = "German"
Column 11 = "PE"
-------------------------
Column 4  = "Pre"
Column 8  = "German"
Column 11 = "PE"
Aacini
  • 65,180
  • 12
  • 72
  • 108