1

This is extension to another question (Loop through CSV file with batch - Space issue)

I have csv file content like this

name,sex,age,description,date
venu,m,16,test mesg,2012-05-01
test,f,22,"He is good guy
and
brilliant",2012-05-01

I am looping this file using this command.

For /F "usebackq tokens=1-3 delims=" %%x in (test.csv) Do (

But since there is line break in second row, I am getting 3 records even though there are two records in the file.

How to fix this? Thanks in advance.

Community
  • 1
  • 1
Venu
  • 7,243
  • 4
  • 39
  • 54
  • 2
    I think you're touching the limits of batch scripting... – marapet May 23 '12 at 07:18
  • hmm may be. I know we can easily do this with other languages but I don't have choice :( – Venu May 23 '12 at 07:25
  • As @marapet said, in batch this is a bit tricky :-) Do you know if the line endings are different? Like _normal_ lines end with `` and your _multilines_ end with only a single ``? Then it's nearly easy to solve with `SET/P` – jeb May 23 '12 at 08:14
  • I am using windows xp. I am not sure how to know that, I am assuming it would be /r/n. How would I know that? – Venu May 23 '12 at 08:25
  • ok I found using notepad++ in extended search mode. Unfortunately, even multilines end with \r\n – Venu May 23 '12 at 08:42
  • Then you need a bit more advanced technic to distinguish between multiline and normal line endings by the odd/even count of quotes. Btw. How are quotes decode in your description field? – jeb May 23 '12 at 09:37
  • Is Powershell a choice? You sure can use it in XP and have greater flexibility to get job done still using command line jobs. – AvkashChauhan May 24 '12 at 01:32

2 Answers2

2

The main problem seems to be to count the quotes in a line.
If the count of quotes is odd then you need to append the next line and count again the quotes.

Counting of characters in a string is a bit tricky, if you won't iterate through all charachters.
I used here the delayed reduction technic, each quote will be effectivly replaced by a +1 and all other characters are removed.
To begin and terminate the line in a proper way there is always one extra +1 at the beginning, which will be compensated by a -1 in front.

The main trick is to replace the complete text from one quote to the next with exactly one +1 by replacing each quote with !!#:#=.
This works as !#:#=...<some text>...! will always be expanded to +1, as the content of the variable # is +1 and so the search pattern # can't be found.
The other replacements are only necessary to avoid problems with exclamation marks and carets int the text.

:::::::::::::::::::::::::::::::::::::::::::
:CountQuotes <stringVar> <result>
setlocal EnableDelayedExpansion
set "line=!%~1!"
set "#=+1"

rem DelayedExpansion: double all quotes
set "line=!line:"=""!"

rem DelayedExpansion: remove all carets ^
set "line=!line:^=!"

rem PercentExpansion: Remove all !
set "line=%line:!=%"

rem PercentExpansion: Replace double quotes to !!#:#=
set "line=-1^!#:#=%line:""=^!^!#:#=%"

for /F "delims=" %%X in ("!line!") do (
    set /a count=%%X!
)

(
    endlocal
    set %~2=%count%
    exit /b
)

And the logic for appending lines and inserting linefeeds

@echo off
setlocal DisableDelayedExpansion
set "lastLine="
set LF=^


rem Two empty lines
for /F "delims=" %%A in (test.csv) do (
    set "line=%%A"
    setlocal EnableDelayedExpansion
    set "line=!line:\=\x!"
    if defined lastLine (
        set "line=!lastLine!\n!line!"
    )

    call :CountQuotes line quoteCnt
    set /a rest=quoteCnt %% 2
    if !rest! == 0 (
        for %%L in ("!LF!") DO set "line=!line:\n=%%~L!"
        set "line=!line:\\=\!"
        echo Complete Row: !Line!
        echo(
        set "lastLine="
    ) ELSE (
        set "lastLine=!line!"
    )

    for /F "delims=" %%X in (""!lastLine!"") DO (
        endlocal
        set "lastLine=%%~X"
    )
)
exit /b

:::::::::::::::::::::::::::::::::::::::::::
:CountQuotes <stringVar> <result>
setlocal EnableDelayedExpansion
set "line=!%~1!"
set "#=+1"

rem DelayedExpansion: double all quotes
set "line=!line:"=""!"

rem DelayedExpansion: remove all carets ^
set "line=!line:^=!"

rem PercentExpansion: Remove all !
set "line=%line:!=%"

rem PercentExpansion: Replace double quotes to !!#:#=
set "line=-1^!#:#=%line:""=^!^!#:#=%"

for /F "delims=" %%X in ("!line!") do (
    set /a count=%%X!
)

(
    endlocal
    set %~2=%count%
    exit /b
)
jeb
  • 78,592
  • 17
  • 171
  • 225
  • +1 for your effort. I had given up and used php. I will check this and will accept it. thanks once again. – Venu Jun 05 '12 at 14:26
1

The Batch file below do what you want:

@echo Off
setlocal EnableDelayedExpansion

call :processFile < test.csv
goto :EOF


:processFile
   set line=
   set /P line=
   if not defined line exit /b
   set "line=!line:,,=,@,!"
   for %%a in (name sex age description mydate) do set %%a=
   for %%a in (!line!) do (
      if not defined name (
         set "name=%%a"
      ) else if not defined sex (
         set "sex=%%a"
      ) else if not defined age (
         set "age=%%a"
      ) else if not defined description (
         set "description=%%a"
      ) else if not defined mydate (
         set "mydate=%%a"
      )
   )
   :checkDate
   if defined mydate goto show
      set /P line=
      for /F "tokens=1* delims=," %%a in ("!line!") do (
         set "description=!description! %%a"
         set "mydate=%%b"
      )
   goto checkDate
   :show
   for %%a in (name sex age description mydate) do set /P "=%%a=!%%a!, " < NUL
   echo/
goto processFile

I added the requirements you requested in your previous topic, that is, the sex may be empty (and is changed by @ character as I explained in my answer to that topic), and the name may include commas. I tested the program with this data file:

name,sex,age,description,date
venu,m,16,"test mesg",2012-05-01
test,,22,"He is good guy
and
brilliant",2012-05-01
"venu,gopal",m,16,"Another
multi-line
description",2012-05-02

And get these results:

name=name, sex=sex, age=age, description=description, mydate=date,
name=venu, sex=m, age=16, description="test mesg", mydate=2012-05-01,
name=test, sex=@, age=22, description="He is good guy and brilliant", mydate=2012-05-01,
name="venu,gopal", sex=m, age=16, description="Another multi-line description", mydate=2012-05-02,

Note that any field that contain commas or spaces must be enclosed in quotes.

Aacini
  • 65,180
  • 12
  • 72
  • 108
  • But this fails for lines like `,m,22,"This description,,fails?",`, as the first and last field are not allowed to be empty. And not all characters are allowed (like `?*@` and double `,,` in description – jeb May 24 '12 at 04:40
  • And also there is no double quote around description. It has only when there is a comma in the description field. – Venu May 24 '12 at 05:41