I have the following CSV file, which contains XML-like data:

  "<Person>
      <Name> ""Test"" </Name> <Surname>""Test1""</Surname>
   </Person>
   <Person>
     <Name>""TestA""</Name>  <Surname>""""</Surname>
   </Person>"

I want to replace """" with "" and "" with ". I found the following batch file, which removes all quotes and leading spaces, but I don't know how to modify the code so that it replaces only specific quotes.

  @echo off
  setlocal EnableDelayedExpansion

  set "FileIn=C:\Users\PC\Documents\test.csv"
  set "FileOut=C:\Users\PC\Documents\TestNew.csv"

  rem Remove every double quote from each line and write the result to FileOut.
  (
    for /F "usebackq tokens=*" %%A in ("%FileIn%") do (
      set "Line=%%A"
      set "Line=!Line:"=!"
      echo.!Line!
    )
  ) > "%FileOut%"

Can anyone help me solve this problem, so that I get a CSV like this:

  <Person>
   <Name> "Test" </Name> <Surname>"Test1"</Surname>
  </Person>
  <Person>
   <Name>"TestA"</Name> <Surname>""</Surname>
  </Person>

Now I have run into a character limit in batch. Could someone post a PowerShell example, please?

pape
  • Yes, but I am converting this file to XML, so this is fine – pape Jul 26 '23 at 17:10
  • Going back to what @Mofi stated, if you have a CSV file with XML data, you probably want to read that data with a CSV reader/parser and write the results as XML to .xml files. Pretending the CSV is just XML with some bad characters that can be removed may lead to problems with the final XML. Maybe if you ask with the PowerShell tag, someone with PS experience can suggest a solution. – Zach Young Jul 26 '23 at 18:00
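
To illustrate the comment above: a CSV parser already undoes the quote doubling, because inside a quoted CSV field a doubled quote stands for one literal quote. Below is a minimal Python sketch, assuming the whole XML fragment is stored as a single quoted CSV field as in the sample. The function name `csv_fields` and the inline sample text are made up for this illustration and are not from the question.

```python
import csv
import io


def csv_fields(text):
    """Return every field of every CSV record in text, in order."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [field for row in reader for field in row]


# Shortened version of the sample data: one quoted field spanning three lines.
sample = (
    '"<Person>\n'
    '    <Name> ""Test"" </Name> <Surname>""Test1""</Surname>\n'
    ' </Person>"\n'
)

fields = csv_fields(sample)
print(fields[0])
```

The parsed field could then be written to an .xml file as is. Whether the result is well-formed XML still depends on the data.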

2 Answers

1

You remove quotes, which is a step in the right direction.
But you have to preserve the doubled quotes.

You can do this by first replacing "" with another character (choose one that definitely won't appear in your data), then deleting each remaining ", and finally converting that special character (§ below, for demo purposes) back into a single ":

For /F "usebackq tokens=*" %%A in ("%FileIn%") do (
  set "Line=%%A"
  set "Line=!Line:""=§!"
  set "Line=!Line:"=!"
  set "Line=!Line:§="!"
  Echo.!Line!
)

(Note: this will still delete empty lines. There are ways to avoid this, but this wasn't your question, so I left it out)
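
The same three steps can be sketched outside of batch, for example in Python. This is only an illustration of the placeholder technique, not a replacement for the batch code. The name `fix_quotes` and the explicit check that the placeholder is absent from the data are assumptions added here.

```python
PLACEHOLDER = "\u00a7"  # the "§" character; assumed never to occur in the data


def fix_quotes(line, placeholder=PLACEHOLDER):
    """Turn each "" into " and delete every other double quote."""
    if placeholder in line:
        raise ValueError("placeholder occurs in the data; choose another one")
    line = line.replace('""', placeholder)  # 1. protect doubled quotes
    line = line.replace('"', "")            # 2. delete the remaining quotes
    return line.replace(placeholder, '"')   # 3. restore one quote per placeholder


print(fix_quotes('  <Name>""TestA""</Name>  <Surname>""""</Surname>'))
```

For the second Person line of the sample this prints the line with "TestA" in single quotes and an empty "" in the Surname element, which is the output requested in the question.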

Stephan
0

The Windows Command Processor cmd.exe is not designed for processing CSV or XML files the way PowerShell or VBScript are. It is designed for running commands and executables. There is not even a Windows command that searches for a string in a file and replaces it with a different string. A batch file is therefore the worst choice for reformatting a CSV file into a partial XML file.

The task can nevertheless be done with a batch file using the following command lines:

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "FileIn=C:\Users\PC\Documents\test.csv"
if not exist "%FileIn%" echo ERROR: Missing file: "%FileIn%"& exit /B 1
for %%I in ("%FileIn%") do set "FileOut=%%~dpnI.xml"

(for /F delims^=^ eol^= %%I in ('%SystemRoot%\System32\findstr.exe /N "^" "%FileIn%"') do (
    set "Line=%%I"
    setlocal EnableDelayedExpansion
    set "Line=!Line:*:=!"
    if defined Line (
        set "Line=!Line:""=#q-u-o-t-e#!"
        set "Line=!Line:"=!"
        echo(!Line:#q-u-o-t-e#="!
    ) else echo(
    endlocal
))>"%FileOut%"
endlocal

This batch file correctly handles any "ANSI" (one byte per character) or UTF-8 encoded CSV file, including empty lines and lines containing one or more exclamation marks. It does not support UTF-16 encoded CSV files because FINDSTR does not support searching in UTF-16 encoded files.

The pure batch file solution is very slow compared to a pure PowerShell or VBScript solution, because for each line it sets up a local execution environment with delayed variable expansion enabled and discards that environment after reformatting the line and appending it to the XML output file. Read this answer for details about the commands SETLOCAL and ENDLOCAL and what is done in the background on each execution of these two commands.

Read also: How to read and print contents of text file line by line? It explains in full detail the FOR command line and set "Line=!Line:*:=!", which are required to also process empty lines in the CSV file.

Batch scripts - cannot display the special character ^ explains in full detail why delims^=^ eol^= is used: it prevents FOR from ignoring or modifying any line output by FINDSTR (with line number and colon at the beginning) before the string is assigned to the loop variable I. In this case just "tokens=*" (the line with leading normal spaces and horizontal tabs removed) or "delims=" (no line splitting because of an empty list of delimiters) could also be used, because each line processed by FOR starts with an ASCII digit.

After the line number and the colon added by FINDSTR have been removed, the line is reformatted by replacing all occurrences of "" in the current line with the string #q-u-o-t-e#, which is expected never to be present in the CSV file. Next, all remaining " are removed from the line. Last, all occurrences of #q-u-o-t-e# are replaced by " before the line is appended to the XML output file.
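
The placeholder string is needed because the batch string substitution can only replace one fixed string at a time. As a hedged side note, a language with regular expressions can do the same reformatting in a single pass without any placeholder, so no assumption about the data is required. The Python sketch below shows that alternative technique, which is not what the batch file does. The name `fix_quotes_single_pass` is made up for this illustration.

```python
import re

# One double quote, optionally followed by a second one.
# Group 1 captures the second quote, or the empty string if there is none.
_QUOTE = re.compile(r'"("?)')


def fix_quotes_single_pass(line):
    """Replace each "" with " and delete each lone " in one left-to-right scan."""
    return _QUOTE.sub(r"\1", line)


print(fix_quotes_single_pass('<Name> ""Test"" </Name> <Surname>""Test1""</Surname>'))
```

Like the batch substitution, the regular expression pairs quotes from left to right, so a run of four quotes becomes two quotes and a lone quote disappears.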

Read also the DosTips forum topic: ECHO. FAILS to give text or blank line - Instead use ECHO/
The command echo( is the only way to output an empty line, or a blank line containing just normal spaces or horizontal tabs, that always works, because unlike echo. or echo/ it does not access the file system at all.

The batch file derives the output file name from the input file name by replacing the file extension of the input file with .xml. For that reason the input file cannot have the file extension .xml, as the input file name would then be identical to the output file name. With this code, that would truncate the input file to 0 bytes before findstr.exe tries to open it, which would fail because the file is already opened by cmd.exe. An IF condition could be added to append something to the output file name left of the file extension .xml if the input file has the file extension .xml, like:

for %%I in ("%FileIn%") do if /I not "%%~xI" == ".xml" (set "FileOut=%%~dpnI.xml") else set "FileOut=%%~dpnI_out.xml"
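
The naming rule in this FOR command line can be sketched in Python as well, for readers who want to check the logic outside of cmd.exe. This is only an illustration. The function name `output_path` is hypothetical, and `PureWindowsPath` is used so that the Windows path is parsed the same way on any operating system.

```python
from pathlib import PureWindowsPath


def output_path(file_in):
    """Derive the .xml output name, avoiding a clash with an .xml input file."""
    path = PureWindowsPath(file_in)
    if path.suffix.lower() == ".xml":
        # Same idea as the ELSE branch: append _out left of the extension.
        return path.with_name(path.stem + "_out.xml")
    # Same idea as %%~dpnI.xml: keep drive, path and name, swap the extension.
    return path.with_suffix(".xml")


print(output_path(r"C:\Users\PC\Documents\test.csv"))
```

The comparison of the extension is case-insensitive, matching the /I option of the IF command.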

To understand the commands used and how they work, open a command prompt window, execute there the following commands, and read the displayed help pages for each command, entirely and carefully.

  • echo /?
  • endlocal /?
  • findstr /?
  • for /?
  • if /?
  • set /?
  • setlocal /?
Mofi