-1

I get a record, (line), from a file and need to output the number of fields, (columns), it contains.

So for example, (includes empty fields):

a,b,,d,,,f
-----------
= 7 columns 

I thought about counting the number of field separators, (commas), in the record but field data may also contain commas.

Would regular-expressions be appropriate for this task?

aschipfl
  • 33,626
  • 12
  • 54
  • 99
an007
  • 111
  • 1
  • 9

4 Answers4

2

Given that the CSV cells contain only printable characters, and quotation marks " are used to enclose text containing ,, you could do the following steps:

  • read a line
  • remove problematic wildcard characters (to not disturb for later)
  • remove standard token separators except , (to let for only separate at unquoted , later)
  • enclose each field in quotation marks
  • loop over fields (using for) and count them

The following code could be used for that (the CSV data is expected to reside in the file that is given as the first command line argument):

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Gather TAB character:
for /F "delims=" %%C in ('forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo/0x09"') do set "TAB=%%C"

rem // Read the CSV file line by line:
for /F usebackq^ delims^=^ eol^= %%L in ("%~1") do (
    rem // Store current line string:
    set "LINE=%%L"
    rem // Toggle delayed expansion to avoid trouble with `!`:
    setlocal EnableDelayedExpansion
    rem // Remove problematic wildcard characters `?`, `<`, `>`:
    set "TEST=!LINE:?=!" & set "TEST=!TEST:<=!" & set "TEST=!TEST:>=!"
    rem // Remove standard token separators SPACE, TAB, `;`, but not`,`:
    set "TEST=!TEST: =!" & set "TEST=!TEST:%TAB%=!" & set "TEST=!TEST:;=!"
    rem // Remove special characters `!`, `^`, `*`, `=`:
    call :REMOVE TEST TEST
    rem // Enclose all fields in quotation marks, loop over them and count them:
    set /A "COUNT=0" & for %%I in ("!TEST:,=","!") do set /A "COUNT+=1"
    rem // Return count of fields and line:
    echo !COUNT!: !LINE!
    endlocal
)

endlocal
exit /B


:REMOVE  <ref_output_string>  <ref_input_string>
    setlocal DisableDelayedExpansion
    set "#RET=%~1" & if not defined #RET endlocal & exit /B 2
    set "#STR=%~2" & if not defined #STR set "#STR=%#RET%"
    set "RPL=!^*="
    setlocal EnableDelayedExpansion
    set "BUF=_" & set "TST=!%#STR%!" & set "WRK=!TST!_"
:REMOVE_LOOP
    if not defined TST set "BUF=!BUF:~1,-1!" & goto :REMOVE_NEXT
    for /F "tokens=1* delims=%RPL% eol=%RPL:~,1%" %%S in ("!BUF!!WRK!") do (
        endlocal & set "BUF=%%S" & set "TST=%%T" & set "WRK=%%T" & setlocal EnableDelayedExpansion
    )
    goto :REMOVE_LOOP
:REMOVE_NEXT
    for /F "delims=" %%S in (^""!BUF!"^") do endlocal & endlocal & set "%#RET%=%%~S"
    exit /B

Example input data:

unquoted,"quoted",unquoted space,"quoted space","quoted,comma",unquoted;&|!^,"quoted;&|!^",(unquoted parens),"(quoted parens)",,next empty,,asterisk*,equal=to

Example output text:

14: unquoted,"quoted",unquoted space,"quoted space","quoted,comma",unquoted;&|!^,"quoted;&|!^",(unquoted parens),"(quoted parens)",,next empty,,asterisk*,equal=to
aschipfl
  • 33,626
  • 12
  • 54
  • 99
  • I didn't test, but I'm pretty sure your "enclose each field in quotation marks" step fails. Already quoted fields that contain `,` will be split into multiple fields. – dbenham Nov 25 '19 at 20:15
  • 1
    @dbenham, I also didn't test so far... the idea was that *every* `,` becomes replaced by `","`, and then the whole string becomes enclosed within `""`, so you basically get all fields enclosed in `""`; fields that contain `,` on their own must be quoted, so the result is `""abc","def""`, for instance, hence the comma in the middle is quoted and thus protected from `for`; for this to work I removed every other (printable) standard token separator... – aschipfl Nov 25 '19 at 21:40
  • Simple test case `"a,b"` should evaluate to 1 field. But that becomes `"a","b"` which parses as 2 tokens. So I don't see how that can work. – dbenham Nov 25 '19 at 23:12
  • 1
    No, @dbenham, `"a,b"` becomes `""a","b""`, which is one field as expected... – aschipfl Nov 25 '19 at 23:32
  • Ugh - Sorry I was so dense. I should have simply tested. Of course you are correct. – dbenham Nov 26 '19 at 04:52
  • The script can now even handle CSV data containing `*` and `=`... – aschipfl Nov 26 '19 at 09:29
2

You need the length of the [string] and the length of the [string without commas]. Here is a simple implementation for a start:

@echo off
set "line=a,b,,d,,,f"
>one.tmp echo %line%
>two.tmp echo %line:,=%
for %%a in (one.tmp) do set one=%%~za
for %%a in (two.tmp) do set two=%%~za
set /a commas=one-two
echo %commas% commas

Look here for alternatives to get the length of a string without temporary files.

Edit it seems I didn't check your concerns about commas inside a value last night. A simple for loop takes care of that:

@echo off
setlocal
set "line=1,2,,"4,0",5"
echo original line: %line%
set cols=0
for %%a in (%line:,=X,X%) do set /a cols+=1
echo counted columns: %cols%
Stephan
  • 53,940
  • 10
  • 58
  • 91
  • Stephan, please note that I have edited the question to include a statement about empty fields in the example record. I had previously thought it unnecessary to mention, _(that the example was sufficient)_, but given your latest edit, I have clarified things; Sorry! – Compo Nov 23 '19 at 12:04
  • I guess the coffee wasn't strong enough this morning...Simple change. See my edit. – Stephan Nov 23 '19 at 17:15
  • 1
    Nope, still potential problems if any values contain `*` and/or `?` wildcards – dbenham Nov 25 '19 at 00:27
  • 1
    Also problems if unquoted values contain `&`, `<`, `>` or `|` – dbenham Nov 25 '19 at 00:30
  • 1
    @dbenham also `=` or even spaces. I orientated my code according to the example in the question. – Stephan Nov 25 '19 at 07:36
  • thnx for that. could u plz explain what (%line:,=X,X%) means? Im totally new to batch scripts so I dont understand it. – an007 Nov 25 '19 at 12:06
  • also. I guess I should have given a better example but basically, the cells can contain anything. Any sort of symbols, commas, speechmarks etc. Would this be able to deal with those fine? – an007 Nov 25 '19 at 12:07
  • `cmd` has no idea about `csv`. We can only handle text. So it's difficult to work with `csv` "correctly" in all cases. That means a lot of hard-to-understand and error-prone code. If PowerShell is an option, I strongly suggest using that instead. – Stephan Nov 25 '19 at 12:33
2

Assuming you already have a representative line in a variable, then the following pure batch can reliably determine the number of fields, provided that no field contains a newline character. The Microsoft spec for CSV allows for newlines in fields, but they are rare, and the issue can probably be disregarded.

The code allows for any other character in a field, and accounts for quoted comma literals in fields, as well as doubled quotes representing a quote literal.

The algorithm is a derivative of a technique used by jeb to properly parse paths within the PATH variable. In that case the ; is a delimiter, but quoted paths can contain ; literals.

@echo off
setlocal
set "line=,,<&^|>!,,1,2,,,"4,^<^&^^^|^>!0",5,"a,""b"",c",,"
set line

setlocal enableDelayedExpansion

:: Remove all poison characters
if defined line set "line=!line:^=!"
if defined line set "line=!line:<=!"
if defined line set "line=!line:>=!"
if defined line set "line=!line:|=!"
if defined line set "line=!line:&=!"

:: Remove all !
if defined line set "line=%line:!=%"

:: Convert all true , delimiters to ^, - Note the enclosing quotes cause delimiters to be quoted
:: The , in values are also converted, but they are no longer quoted so they revert back to ,
if defined line set "line=%line:,=^,%"

:: Convert ^, delimiters into newline
for %%N in (^"^
%= This creates a quoted newline character =%
^") do if defined line set "line=!line:^,=%%~N!"

:: Count the number of lines in string and save result
setlocal disableDelayedExpansion
for /f %%N in ('cmd /v on /c echo(!line!^|find /c /v ""') do set "cnt=%%N"

echo %cnt% fields

-- OUTPUT --

line=,,<&^|>!,,1,2,,,"4,<&^|>!0",5,"a,""b"",c",,
13 fields
dbenham
  • 127,446
  • 28
  • 251
  • 390
-2

Use awk script like this:

echo "a,b,,d,,,f" | awk -F"," '{print NF}'
Dudi Boy
  • 4,551
  • 1
  • 15
  • 30