I have a file with the following 4 lines.

A;1;abc;<xml/>;
;2;def;<xml
>hello world</xml>;
;3;ghi;<xml/>;

Using a batch file, I need to combine lines such that if a line doesn't end with a semicolon (;), the next line is appended to the current line.

So the desired output should be

A;1;abc;<xml/>;
;2;def;<xml>hello world</xml>;
;3;ghi;<xml/>;

I am not very familiar with batch scripts. I tried using for /F, but no luck so far.

As I understand it, the logic should be to check the last character of each line and, if it is not a semicolon, read the next line into the current line.

Further to this, I managed to get the last character of the line, but my script only reads a line if it doesn't begin with ; . Any ideas?

@echo off
for /f "tokens=*" %%i in (myfile.txt) do (
  set var=%%i
  echo %%i
  if "%var:~-1%"==";" (
    echo test
  )
)

Note: the above script only reads lines 1 and 3.

Junaid

3 Answers

7

You have a number of problems with your code :)

1) As you have stated, your code ignores lines that begin with ; - This is due to the default FOR /F EOL option. But your code also strips leading spaces from each line because of "TOKENS=*". You need to set both EOL and DELIMS to nothing. The syntax is weird, but it works:

for /f delims^=^ eol^= %%i ...

2) You attempt to set and expand var within a parenthesized block of code. This cannot work because %var% expansion occurs when the line is parsed, and the entire block of code is parsed at once. So the value of %var% is the value that existed prior to the loop executing, which is of course not what you want. The solution is to use delayed expansion. Type SET /? from a command prompt for more information about delayed expansion (about halfway down the help listing).

3) FOR variable content containing ! will be corrupted if it is expanded when delayed expansion is enabled. The solution is to toggle delayed expansion on and off as needed within the loop. But that causes a complication because you need the value of the growing line to be preserved across the ENDLOCAL barrier. I use a FOR /F to transport the value across the barrier.

Here is a complete batch script that should do the job. It is limited in that it cannot process lines that are greater than the max length of ~8191 bytes.

This code has been re-written to fix a significant bug

@echo off
setlocal disableDelayedExpansion
rem ln accumulates the pieces of a logical line that is still incomplete
set "ln="
for /f delims^=^ eol^= %%i in (myfile.txt) do (
  set "var=%%i"
  setlocal enableDelayedExpansion
  rem The inner FOR /F carries ln+var across the ENDLOCAL barrier as %%A
  for /f delims^=^ eol^= %%A in ("!ln!!var!") do (
    if "!var:~-1!"==";" (
      endlocal
      echo %%A
      set "ln="
    ) else (
      endlocal
      set "ln=%%A"
    )
  )
)
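
For readers less familiar with batch syntax, the accumulate-and-flush logic of the script above can be sketched in plain JavaScript. This is only an illustration: the function name `joinContinuedLines` is made up for this sketch, and the final flush of an unterminated tail is an addition that the batch script above does not perform.

```javascript
// Hypothetical helper: join physical lines until one ends with the terminator.
function joinContinuedLines(lines, terminator) {
  var out = [];
  var pending = ""; // plays the role of the batch variable "ln"
  for (var i = 0; i < lines.length; i++) {
    pending += lines[i];
    if (lines[i].slice(-1) === terminator) {
      out.push(pending); // logical line is complete: emit it
      pending = "";
    }
  }
  // Assumption of this sketch: keep an unterminated tail instead of dropping it.
  if (pending !== "") {
    out.push(pending);
  }
  return out;
}

var sample = [
  "A;1;abc;<xml/>;",
  ";2;def;<xml",
  ">hello world</xml>;",
  ";3;ghi;<xml/>;"
];
console.log(joinContinuedLines(sample, ";").join("\n"));
```

Run against the four sample lines from the question, this yields the three desired output lines.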

SET /P solution

There is a much simpler solution that prints each line immediately so that you don't have to worry about transporting a variable across ENDLOCAL. Lines that don't end with ; are printed without newlines using SET /P.

This solution has the following limitations:

1) Lines printed via SET /P will have leading spaces stripped. This limitation is only for Vista and newer versions of Windows. It is not a problem on XP.

2) Thanks to David Ruhmann, I now know that SET /P will fail if the line begins with =. Very unfortunate :(

@echo off
setlocal disableDelayedExpansion
set "ln="
for /f delims^=^ eol^= %%i in (myfile.txt) do (
  set "var=%%i"
  setlocal enableDelayedExpansion
  if "!var:~-1!"==";" (echo !var!) else (<nul set /p ="!var!")
  endlocal
)

hybrid batch/JScript regex solution (bullet proof?)

I've written a hybrid batch/JScript REPL.BAT utility that allows for easy regex search and replace on file contents. It makes the job really easy.

The following command should work on any input, without limitations. It has been updated to support both Windows and Unix style line endings. And it is much faster than a pure batch solution.

findstr "^." myfile.txt|repl "([^;\r])\r?\n" "$1" m >"outFile.txt"
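
To see what the regex in that command does, here is a minimal sketch in plain JavaScript that applies the same pattern and replacement to the sample data. The function name `joinLines` is hypothetical. REPL adds the `g` flag itself; the `m` flag it also passes only changes how `^` and `$` behave, and this pattern uses neither, so it is omitted here.

```javascript
// Remove a line break whenever the character before it is neither ';' nor '\r'.
// The captured character ($1) is written back; the \r?\n that follows is dropped.
function joinLines(text) {
  return text.replace(/([^;\r])\r?\n/g, "$1");
}

var windowsText =
  "A;1;abc;<xml/>;\r\n;2;def;<xml\r\n>hello world</xml>;\r\n;3;ghi;<xml/>;\r\n";
var unixText =
  "A;1;abc;<xml/>;\n;2;def;<xml\n>hello world</xml>;\n;3;ghi;<xml/>;\n";

console.log(JSON.stringify(joinLines(windowsText)));
console.log(JSON.stringify(joinLines(unixText)));
```

In both cases only the break after `<xml` is removed, because every other line break is preceded by a semicolon.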

Here is the REPL.BAT utility. Full documentation is embedded within the script.

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?
:::
:::  Performs a global search and replace operation on each line of input from
:::  stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /? then prints help documentation
:::  to stdout.
:::
:::  Search  - By default this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substitution patterns available to the JScript replace method.
:::            A $ literal can be escaped as $$. An empty replacement string
:::            must be represented as "".
:::
:::            Replace substitution pattern syntax is documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            I - Makes the search case-insensitive.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in Replace are
:::                treated as $ literals.
:::
:::            E - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line. ^ anchors
:::                the beginning of a line and $ anchors the end of a line.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Ascii (Latin 1) character expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Escape sequences are supported even when the L option is used.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string.
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    findstr "^:::" "%~f0" | cscript //E:JScript //nologo "%~f0" "^:::" ""
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 1
  )
)
echo(%~3|findstr /i "[^SMILEX]" >nul && (
  call :err "Invalid option(s)"
  exit /b 1
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var env=WScript.CreateObject("WScript.Shell").Environment("Process");
var args=WScript.Arguments;
var search=args.Item(0);
var replace=args.Item(1);
var options="g";
if (args.length>2) {
  options+=args.Item(2).toLowerCase();
}
var multi=(options.indexOf("m")>=0);
var srcVar=(options.indexOf("s")>=0);
if (srcVar) {
  options=options.replace(/s/g,"");
}
if (options.indexOf("e")>=0) {
  options=options.replace(/e/g,"");
  search=env(search);
  replace=env(replace);
}
if (options.indexOf("l")>=0) {
  options=options.replace(/l/g,"");
  search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
  replace=replace.replace(/\$/g,"$$$$");
}
if (options.indexOf("x")>=0) {
  options=options.replace(/x/g,"");
  replace=replace.replace(/\\\\/g,"\\B");
  replace=replace.replace(/\\b/g,"\b");
  replace=replace.replace(/\\f/g,"\f");
  replace=replace.replace(/\\n/g,"\n");
  replace=replace.replace(/\\r/g,"\r");
  replace=replace.replace(/\\t/g,"\t");
  replace=replace.replace(/\\v/g,"\v");
  replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
    function($0,$1,$2){
      return String.fromCharCode(parseInt("0x"+$0.substring(2)));
    }
  );
  replace=replace.replace(/\\B/g,"\\");
}
var search=new RegExp(search,options);

if (srcVar) {
  WScript.Stdout.Write(env(args.Item(3)).replace(search,replace));
} else {
  while (!WScript.StdIn.AtEndOfStream) {
    if (multi) {
      WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace));
    } else {
      WScript.Stdout.WriteLine(WScript.StdIn.ReadLine().replace(search,replace));
    }
  }
}
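
As a small illustration of the L option handling in the JScript portion above, the two escaping steps can be tried in isolation in plain JavaScript. This is a sketch, not part of REPL.BAT: the wrapper function `literalReplace` is hypothetical, while the two `replace` calls inside it are copied from the script.

```javascript
// Hypothetical wrapper around the two escaping steps REPL applies for option L.
function literalReplace(text, search, replacement) {
  // Escape regex metacharacters so the search string matches literally.
  var pattern = search.replace(/([.^$*+?()[{\\|])/g, "\\$1");
  // Double every $ so the replace method treats it as a literal dollar sign.
  var safeReplacement = replacement.replace(/\$/g, "$$$$");
  return text.replace(new RegExp(pattern, "g"), safeReplacement);
}

// "a.b" matches only the literal text, not "axb"; "$1" is inserted verbatim.
console.log(literalReplace("a.b axb", "a.b", "$1"));
```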
dbenham
  • To add more, the file I am using as input is the list of records in CSV. – Junaid Jan 11 '13 at 15:06
  • @Junaid - You should not be getting the recursion limit. It sounds like your code is missing an ENDLOCAL. I'm not aware of a way to make the 1st solution work with lines longer than 8191 bytes. Try the 2nd solution. Hopefully you are either on XP, or else you don't have to worry about leading spaces. – dbenham Jan 11 '13 at 15:47
  • @Junaid - Oops. You were correct. There was a significant bug in my 1st solution. I re-wrote the code to fix the bug and edited the answer. – dbenham Jan 11 '13 at 16:16
  • @Junaid - I think my final hybrid batch/JScript solution is your best option. – dbenham Jan 11 '13 at 18:42
  • Thanks, but I am not sure how to echo the result of the second solution to a file. Plus I still have an issue with the line size: each of the non-combined lines in my file can reach up to 30000 characters before moving to the next line – Junaid Jan 14 '13 at 09:46
  • @Junaid - There is not any pure native batch solution that can process lines longer than 8191 bytes. (Perhaps something could be coded to piece together long lines, but it would be complicated and probably unacceptably slow) As I indicated in my prior comment, you should try the 3rd solution - the hybrid batch/JScript. I modified it to write the output to a new file. It should work just fine. The only potential issue is it loads the entire file into memory - that could possibly cause problems if the file is *really* huge. – dbenham Jan 14 '13 at 13:09
  • Thanks, I can only use batch file therefore solution 2 works perfect. I have worked around the line size issue. The thing now is that if the file is already in correct format, it merges everything on one single line – Junaid Jan 14 '13 at 13:55
  • Found out that it has to do with the space added at the end of each line when lines are merged. Is it possible to trim the space from the end of each line? – Junaid Jan 14 '13 at 14:11
  • Any idea how long would it take to convert the second solution to UNIX? How difficult would that be? – Junaid Jan 17 '13 at 16:42
4

Without Delayed Expansion

@echo off
setlocal EnableExtensions DisableDelayedExpansion
for /f "tokens=* eol=" %%L in (myfile.txt) do (
    <nul set /p ="%%L" 2>nul                         %= Fixed Limitation 3 =%
    set "xLine=%%L"
    call set "xLine=%%xLine:"=%%"                    %= Fix for Limitation 2 =%
    call :NewLine
)
endlocal
pause >nul
goto :eof

:NewLine
if "%xLine:~-1%"==";" echo.
goto :eof

With Delayed Expansion

@echo off
setlocal EnableExtensions DisableDelayedExpansion
for /f "tokens=* eol=" %%L in (myfile.txt) do (
    <nul set /p ="%%L" 2>nul                         %= Fixed Limitation 3 =%
    setlocal EnableDelayedExpansion
    set "xLine=%%L"
    set "xLine=!xLine:"=!"                           %= Fix for Limitation 2 =%
    if "!xLine:~-1!"==";" echo.
    endlocal
)
endlocal
pause >nul

Limitations: (Same for both versions)

  1. Lines may not begin with the equals = character due to the <nul set /p "=%%L" command.
  2. Lines may not end with the double quotation " character due to the if "<var>"==";" echo. command.
  3. Double quotation " characters at the beginning of the line will be lost due to the <nul set /p "=%%L" command. (solved by dbenham)
  4. Spaces at the beginning of the line will be trimmed due to the "tokens=* eol=" option. The same issue occurs for Windows Vista or newer with the delims^=^ eol^= option due to the set /p command. I chose to implement the tokens method for consistency across all versions of Windows.
  5. The batch line length limit. 8191 bytes. See Line length limit in xp batch file? and http://support.microsoft.com/kb/830473

Note: None of these limitations will crash the script, but instead 1 and 3 will cause those lines to be skipped and 4 will just trim the leading space from the line.

Update

I have found a (display only!) solution for the = equals and space trimming issue caused by the set /p command. However, it requires that a non-display character be entered into the batch script. This must be done by editing the hex data of the script. Place any non-space, non-issue character (illustrated by .) followed by the backspace character (illustrated by 0x08) and only the value of %Var% will display. NOTE: This will not work as a solution for file output since the non-displayed characters will also be output to the file.

set /p =".0x08%Var%"

The reason for this equals issue is that the set command has a problem parsing variable names and does not allow an equals sign to be contained within a variable name.

SET command will not allow an equal sign to be part of the name of a variable.

This issue has always existed, but has been compounded by the leading space trimming issues added in Vista+. Good analysis: http://www.dostips.com/forum/viewtopic.php?f=3&t=4209

David Ruhmann
  • +1, I had forgotten about the enclosing quote issue. That can be solved by moving the quote in the code after the equal: `set /p ="%%L"`. But OMG, that SET /P `=` issue is new to me, and it is nasty :( After moving the code quote after the equal, I find it only causes problems if the line begins with `="`. Lines beginning with `=` that aren't followed by a quote work fine. – dbenham Jan 11 '13 at 18:50
  • Oops - you were correct about SET /P and equal. I fooled myself. It always fails when line begins with `=`. – dbenham Jan 11 '13 at 19:03
  • @dbenham Thanks for the part about placing the quotes after the equals. `set /p ="%%L"` That solves Limitation 3 at least. :) And for the equals `=` issue I have not found a simple resolution yet. – David Ruhmann Jan 11 '13 at 19:07
  • You have a problem with `"eol= delims="`: that sets EOL to a space, so it will ignore any lines that begin with space. See [my answer](http://stackoverflow.com/a/14279663/1012053) for the proper way to disable both EOL and DELIMS – dbenham Jan 11 '13 at 19:36
  • @dbenham I had just noticed that myself and was working on the fix. Thanks. :) – David Ruhmann Jan 11 '13 at 19:41
  • Thanks all, I am not sure how I can echo the result to a file? – Junaid Jan 14 '13 at 08:52
  • Also, does the solution have any limit on the size of a line? Each of the non-combined lines in my file can reach up to 30000 characters before moving to the next line. – Junaid Jan 14 '13 at 09:10
  • @Junaid To echo the results to the file call the batch script with a redirect `file.bat>output.txt` or add redirects `>>output.txt` into the batch script on the lines that output the text. Unfortunately, the limit is 8191 bytes (8191 ASCII or 2047 Unicode) for batch. See http://stackoverflow.com/questions/9575644/line-length-limit-in-xp-batch-file and http://support.microsoft.com/kb/830473 – David Ruhmann Jan 14 '13 at 14:47
0

Here is a solution that avoids the set /P command and the limitations it introduces. The applicable lines are concatenated in a variable and output with echo, which has no such limitations, as soon as a trailing semicolon is encountered. The code contains explanatory remarks:

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "FILE=%~1" & rem // (input file from command line argument)
set "CHAR=;"   & rem // (character that marks the end of line)

rem // Initialise variables:
set "PREV=" & rem // (variable to collect lines to combine)
rem // Iterate through the lines of the given file:
for /F usebackq^ delims^=^ eol^= %%L in ("%FILE%") do (
    set "LINE=%%L"
    rem // Toggle delayed expansion to not lose `!` in text:
    setlocal EnableDelayedExpansion
    rem // Check last character of current line:
    if "!LINE:~-1!"=="%CHAR%" (
        rem /* Last character marks end of line, so output
        rem    collected previous lines and current one: */
        echo !PREV!!LINE!
        rem // Clear Cached previous lines:
        endlocal
        set "PREV="
    ) else (
        rem /* Last character does not mark end of line, so
        rem    do not output it but cache it in a variable;
        rem    the `for /F` loop lets the data pass `endlocal`: */
        for /F delims^=^ eol^= %%K in ("!PREV!!LINE!") do (
            endlocal
            set "PREV=%%K"
        )
    )
)
rem /* Output all remaining cached data in case the last line
rem    is not terminated by an end-of-line marker: */
if defined PREV (
    setlocal EnableDelayedExpansion
    echo !PREV!
    endlocal
)

endlocal
exit /B
aschipfl