
I'm trying to delete certain lines that contain a keyword in a .txt file, using CMD or VBS.

I've read this but it's not the same. I want to delete a range of lines.

Original Text File:

ABCDEFGXXX
ABCD
A
AE
AXXXLKGUSP
0000ASD
ASD

Processed Text File:

0000ASD
ASD

I want to delete the range of lines from the line containing the first instance of 'XXX' through the line containing the second. There are only two instances of 'XXX'. The number of lines between the two instances is random, and there may be a scenario where both instances are on the same line. The four zeroes also appear after the line with the second instance of 'XXX'. Please note that the file may contain characters like the ones below, so a script may choke if you try to process it.
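For reference, the requirement can be pinned down as a small function. This is a minimal JavaScript sketch, chosen only because JavaScript is close to the JScript that cscript runs. It is not a CMD/VBS solution. It assumes a case-sensitive match. It splits lines on "\n", so a trailing "\r" simply stays with its line. It also leaves the text unchanged when fewer than two instances exist. The name deleteKeywordRange is hypothetical.

```javascript
// Minimal sketch: delete the lines from the one holding the first `keyword`
// through the one holding the second (they may be the same line).
// Assumptions: case-sensitive match; lines split on "\n", so a trailing "\r"
// stays with its line; text returned unchanged if there are < 2 instances.
function deleteKeywordRange(text, keyword) {
  const lines = text.split("\n");
  let first = -1; // index of the line with the first instance
  let last = -1;  // index of the line with the second instance
  let seen = 0;   // instances found so far
  for (let i = 0; i < lines.length && seen < 2; i++) {
    // Count every instance on the line, not just the first one.
    let pos = lines[i].indexOf(keyword);
    while (pos !== -1 && seen < 2) {
      seen++;
      if (seen === 1) first = i;
      else last = i;
      pos = lines[i].indexOf(keyword, pos + keyword.length);
    }
  }
  if (seen < 2) return text;
  // Keep the lines before `first` and after `last`.
  return lines.slice(0, first).concat(lines.slice(last + 1)).join("\n");
}
```

On the sample file above, this keeps only the 0000ASD and ASD lines.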

---------------------EDIT 08/04/2015 7:41 PM----------------------------------- "XXX" is all caps, and the text file may contain the characters below. Notepad shows them on a single line, though. I don't know what they are.

PK
    ¨‘G_t¥0  8ˆ     XXXç¸[~-ÄWÀ¨Ì’gÝ
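The "PK" at the start suggests the file may really be binary data, since ZIP archives begin with those two bytes. That would explain why line-oriented tools choke on it. A common way to edit such data without corrupting it is to decode every byte as exactly one character, edit the text, and encode it back the same way. Node.js's "latin1" encoding does this.

The sketch below assumes Node.js and uses a hypothetical helper, deleteKeywordRangeBytes. It removes everything from the start of the line holding the first keyword through the line break after the second keyword, and leaves every other byte untouched.

```javascript
// Sketch: byte-safe removal. "latin1" maps each byte 0x00-0xFF to one
// character and back, so bytes outside the removed range survive unchanged.
function deleteKeywordRangeBytes(buf, keyword) {
  const text = buf.toString("latin1");
  const first = text.indexOf(keyword);
  if (first === -1) return Buffer.from(buf); // no match: return a copy
  const second = text.indexOf(keyword, first + keyword.length);
  if (second === -1) return Buffer.from(buf);
  // Start of the line holding the first match (just after the previous "\n").
  const start = text.lastIndexOf("\n", first) + 1;
  // End of the line holding the second match, including its "\n" if present.
  const nl = text.indexOf("\n", second + keyword.length);
  const end = nl === -1 ? text.length : nl + 1;
  return Buffer.from(text.slice(0, start) + text.slice(end), "latin1");
}
```

Usage would be along the lines of fs.writeFileSync(out, deleteKeywordRangeBytes(fs.readFileSync(inp), "XXX")), where inp and out are your own paths.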
Luigi

4 Answers


Assuming you've mentioned VBS to indicate that a non-pure-batch solution is acceptable, here's a PowerShell one-liner which you can call from a .bat file.

It reads the text file in the default system (ANSI) encoding; the encoding variable sets the encoding of the output file, and other useful values for it are UTF8 and Unicode.
A 100MB file is processed in 2 seconds.

@echo off
set "string=XXX"
set "infile=input file.txt"
set "outfile=output file.txt"
set "encoding=default"

powershell -ExecutionPolicy bypass -c "$txt=(get-content '%infile%' -raw -encoding default); $i=$txt.indexof('%string%'); if($i -ge 0) { $j=$txt.indexof('%string%',$i+'%string%'.length); if($j -ge 0) {$k=$txt.indexof(\"`n\",$j); if($k -ge $j){$txt2=$txt.substring($k)} else {$txt2=''} $txt.substring(0,[math]::max(0,$txt.lastindexof(\"`n\",$i))) + $txt2 | out-file '%outfile%' -encoding %encoding%}}"
pause
  • -ExecutionPolicy bypass is added to allow execution of powershell on a non-admin user account.
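  • PowerShell 3.0 or newer is required (the -raw switch of get-content was introduced in 3.0). It is included by default in Windows 8 and 10; Windows 7 SP1 ships with 2.0 but can be upgraded to 3.0.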
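A JavaScript transliteration of the one-liner's index arithmetic makes its edge cases easier to check. In this port, String indexOf/lastIndexOf/substring stand in for the .NET methods, and the name psStyleRemove is hypothetical.

The function keeps everything before the line break that precedes the first match, plus everything from the line break that follows the second match. If either match is missing it returns null. This mirrors the one-liner, which then writes no output file.

```javascript
// Hypothetical JavaScript transliteration of the PowerShell one-liner.
// Returns the text the one-liner would write, or null when it writes nothing.
function psStyleRemove(txt, str) {
  const i = txt.indexOf(str);                 // first match
  if (i < 0) return null;
  const j = txt.indexOf(str, i + str.length); // second match
  if (j < 0) return null;
  const k = txt.indexOf("\n", j);             // line break after the second match
  // Tail: from that line break to the end (empty if it is the last line).
  const txt2 = k >= j ? txt.substring(k) : "";
  // Head: everything before the line break that precedes the first match.
  const head = txt.substring(0, Math.max(0, txt.lastIndexOf("\n", i)));
  return head + txt2;
}
```

Tracing the port shows one quirk. When no line precedes the first match, as in the question's sample, the result starts with the line break that followed the second match, so the output begins with a line break.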
wOxxOm
  • True. But I did mention CMD and VBS because I intend it to be in CMD and VBS only. This is due to PowerShell requiring admin rights due to its immense power, and also, as shameful as it is, I do not read PowerShell yet. – Luigi Aug 04 '15 at 21:15
  • Found a way to bypass powershell. Batch-sensei! – Luigi Aug 04 '15 at 21:21
  • No, I didn't, based on the presumption that PowerShell requires admin rights (apparently the only thing I know about PowerShell :( ). But I tried adding powershell.exe -ExecutionPolicy bypass, as described in http://stackoverflow.com/questions/13212688/how-do-i-run-powershell-scripts-without-admin-rights – Luigi Aug 04 '15 at 21:25
  • I cannot get this code to run on my work machine. Does it have a version dependency? My Powershell is version 2.0 – dbenham Aug 04 '15 at 21:53
  • @dbenham, yep, the `-raw` switch was introduced in 3.0. For 2.0, use ```$txt=(get-content '%infile%') -join \"`r`n\";``` which will be 5 times slower. Do you think it should be added to the answer? – wOxxOm Aug 04 '15 at 22:00
  • It is always a good idea to list any limitations. You might even add the code that works on earlier versions. – dbenham Aug 04 '15 at 22:03
  • I tested with the PS 2.0 code, and it fails to preserve lines before the first XXX if there exist lines both before the first XXX and after the second XXX. – dbenham Aug 04 '15 at 22:08
  • Well, it works here, so maybe PS2.0 doesn't support `&{}`. I've simplified the code in the answer. There was also an off-by-one index error which didn't cause much damage though except for eating one LF. – wOxxOm Aug 04 '15 at 22:37
    As for the limitations of PS2.0, I'm not sure it's worth special-casing in the answer, as those who still use XP don't use PowerShell because it wasn't too useful or widely used in the old times of XP. I'll add the workaround for a non-admin account though. – wOxxOm Aug 04 '15 at 22:41
  • I'm not sure how XP entered the discussion. I have Win 7 both at work and home, and the default powershell version for both is 2.0 – dbenham Aug 05 '15 at 01:13
  • Oh, my bad. However Windows 7 SP1 includes powershell 3.0 for 5 years and anyway I don't have access to any PC with the outdated PS2.0... Okay, there's only one thing I can do then. I've edited the answer as you suggested. – wOxxOm Aug 05 '15 at 04:53

This is really simple with JREPL.BAT - a pure script-based utility (hybrid JScript/batch) that runs natively on any Windows machine from XP onward. The key feature is that it supports multi-line regular expression search and replace.

The following overwrites the original file:

jrepl "^.*?XXX[\s\S]*XXX.*\n?" "" /m /f "test.txt" /o -

If you want to create a new file, then simply specify a file name instead of - for the /O option:

jrepl "^.*?XXX[\s\S]*XXX.*\n?" "" /m /f "input.txt" /o "output.txt"

Or you could omit the /O option entirely and print the result to the screen (stdout)

jrepl "^.*?XXX[\s\S]*XXX.*\n?" "" /m /f "input.txt"


Use call jrepl ... if you put the command within a batch script.
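The JREPL search can be read as a plain JavaScript regular expression. This sketch assumes /M corresponds to the m flag, so ^ matches at every line start; that is what lets lines before the first XXX survive. It also assumes /I corresponds to the i flag.

The sketch uses [^\n] instead of "." because in standard JavaScript (for example Node.js) "." also refuses to match \r. With ".", the CR LF of the second match's line would be left behind in CRLF files. jreplStyleRemove is a hypothetical name; this is not JREPL's own code.

```javascript
// Hypothetical sketch of the JREPL search as a JavaScript RegExp.
// Assumptions: /M ~ "m" flag, /I ~ "i" flag; [^\n] replaces "." so "\r" is matched.
function jreplStyleRemove(text, keyword, ignoreCase = false) {
  // Escape regex metacharacters so the keyword is matched literally.
  const kw = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(
    "^[^\\n]*?" + kw + // start of the line holding the first keyword
    "[\\s\\S]*" + kw + // anything, across lines, up to the last keyword
    "[^\\n]*\\n?",     // rest of that line plus its line feed, if any
    ignoreCase ? "mi" : "m"
  );
  return text.replace(re, "");
}
```

Because [\s\S]* is greedy, it runs to the last keyword in the file. With exactly two instances, as in the question, that is the second one.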


EDIT: Here is a pure batch solution. I have generally abandoned using batch for text processing because a robust solution requires too much insanity. The code below is fairly robust and optimized, but even so, it still has the following limitations:

  • Lines limited to <8k
  • Searched keyword cannot begin with *, and cannot contain =, !, or "
  • The search ignores case, with no option to make it case sensitive

There may be some more I have missed.

But the code does preserve empty lines, and it does not choke on ! in the content (handling these possibilities is the cause of much of the complexity).

@echo off
setlocal disableDelayedExpansion

set "in=input.txt"
set "out=output.txt"
set "find=XXX"

set "cnt=0"
>"%out%" (
  for /f "delims=" %%A in ('findstr /n "^" "%in%"') do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    set "ln=!ln:*:=!"
    if defined ln if !cnt! equ 0 (
      set "test=!ln:*%find%=!"
      if !test! neq !ln! set "cnt=1"
    ) else set "test=!ln!"
    if !cnt! neq 1 echo(!ln!
    if defined test if !cnt! neq 2 if "!test:%find%=!" neq "!test!" set "cnt=2"
    for %%N in (!cnt!) do endlocal&set "cnt=%%N"
  )
)
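The cnt variable in the batch code above works as a three-state machine: 0 before the first keyword, 1 inside the block, 2 after it. A JavaScript port makes the ordering visible. The name stateMachineRemove is hypothetical, and the function takes an array of lines.

The echo check runs before the switch to 2, which is why the line holding the second keyword is dropped. Searching only the text after the first match is what handles both keywords on one line. Like the batch version it ignores case, approximated here with toLowerCase.

```javascript
// Hypothetical port of the batch cnt state machine (input: array of lines).
// cnt 0 = before the first keyword, 1 = inside the block, 2 = after it.
function stateMachineRemove(lines, find) {
  const needle = find.toLowerCase(); // batch string substitution ignores case
  const out = [];
  let cnt = 0;
  for (const ln of lines) {
    const lower = ln.toLowerCase();
    let test = lower; // text to search for the second keyword
    if (cnt === 0) {
      const p = lower.indexOf(needle);
      if (p !== -1) {
        cnt = 1;
        test = lower.slice(p + needle.length); // only the text after the first match
      }
    }
    if (cnt !== 1) out.push(ln); // echo happens before cnt can become 2
    if (cnt !== 2 && test.includes(needle)) cnt = 2;
  }
  return out;
}
```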
dbenham
  • Error: insufficient arguments – Luigi Aug 04 '15 at 21:54
  • @LuigiMackenzieC.Brito - Ugh - I had an extra quote on all commands as a result of a stupid cut and paste error. All fixed. Believe it or not, I had actually tested extensively - I just copied the commands incorrectly. – dbenham Aug 04 '15 at 21:59
  • It actually works too, but it doesn't account for small xxx or XXX or XxX.. the list goes on – Luigi Aug 04 '15 at 22:24
  • @LuigiMackenzieC.Brito - You didn't state you wanted to ignore case. All you need to do is add the `/I` option to each of the commands. There are a great many options available to JREPL.BAT, and the documentation is built into the script. It is quite sophisticated. – dbenham Aug 04 '15 at 22:28
  • I did try to read the documentation. After seeing the scroll bar I backed down. I also tried creating a pure batch solution but failed. I'm not sure if this is because of my lack of knowledge for cmd or because I have the intellectual capacity of an ape. Let me try again using /I on JREPL. And I'll also try the pure batch. – Luigi Aug 04 '15 at 23:52
  • Hi adding /I produced an error. The batch works though... by any chance, can it handle these kind of characters below... It doesn't need to handle, I just want them removed. I noticed that all solutions choked on them @ÍÖ‚ÉÖ­—ù¾µãågÚ´¿Äƒ¹†n)Uíá—A±Ù1sÉkæñÑ)­´„À½0;QØÕÄ\*A}°µcFe:jÂû>¿nUr¢Š¿S"AÅbä¯SwÁÐþ~ Ò£¬¨Í)×÷Í'£lÈð}óÛxê(yh& – Luigi Aug 05 '15 at 00:29
  • @LuigiMackenzieC.Brito - `jrepl "^.*?XXX[\s\S]*XXX.*\n?" "" /i /m /f "test.txt" /o -` works fine for me (ignores case), and it also works with the characters you posted. Note that both solutions require ANSI encoding (not Unicode) – dbenham Aug 05 '15 at 01:06

This is a pure CMD implementation:

@echo off

rem DEFINITIONS:
set "KEYWD=XXX"
set "INFILE=original.txt"
set "OUTFILE=modified.txt"

setlocal EnableExtensions EnableDelayedExpansion

rem GET_LINE_NUMBERS:
set "NUMONE="
set "NUMTWO="
for /F %%F in ('findstr /N /L /C:"%KEYWD%" "%INFILE%"') do (
  for /F "delims=:" %%N in ("%%F") do (
    if not defined NUMONE (
      set "NUMONE=%%N"
    ) else (
      set "NUMTWO=%%N"
    )
  )
)
if not defined NUMTWO set "NUMTWO=%NUMONE%"

rem RETURN_BEFORE_BLOCK:
set /A "COUNT=0"
rem.> "%OUTFILE%"
for /F "delims=" %%L in ('findstr /N /R "^" "%INFILE%"') do (
  set /A "COUNT+=1"
  if !COUNT! geq !NUMONE! (
    goto :NEXT
  ) else (
    setlocal DisableDelayedExpansion
    set "LINE=%%L"
    setlocal EnableDelayedExpansion
    echo(!LINE:*:=!
    endlocal
    endlocal
  ) >> "%OUTFILE%"
)

rem RETURN_AFTER_BLOCK:
:NEXT
if defined NUMTWO set "SKIP=skip=!NUMTWO!"
for /F "%SKIP% delims=" %%L in ('findstr /N /R "^" "%INFILE%"') do (
  setlocal DisableDelayedExpansion
  set "LINE=%%L"
  setlocal EnableDelayedExpansion
  echo(!LINE:*:=!
  endlocal
  endlocal
) >> "%OUTFILE%"

endlocal

The code consists of four sections (see the rem remarks):

  • DEFINITIONS: here you need to define the search keyword, the input and the output files;
  • GET_LINE_NUMBERS: this section searches for the line numbers of the two occurrences of the given keyword; the resulting line numbers are stored in the variables NUMONE and NUMTWO respectively; if only one line with a match is found (both occurrences on the same line), NUMTWO is set to NUMONE; if no match is found, both variables remain empty;
  • RETURN_BEFORE_BLOCK: here everything up to, but not including, the line with the first keyword match is output; it relies on the fact that goto breaks out of any running for loop;
  • RETURN_AFTER_BLOCK: in this section every line after the second keyword match is returned and appended to the output of the previous section; here the for /F option argument skip is built dynamically;
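The two-pass idea above can be sketched in JavaScript: find the line numbers first, then copy the lines outside that range. The sketch assumes 1-based line numbers, as reported by findstr /N, and a case-sensitive literal match, as with /L /C:. The function name lineNumberRemove is hypothetical. Returning the lines unchanged when nothing matches is a choice made for this sketch.

```javascript
// Hypothetical sketch of the two-pass, line-number-based approach.
function lineNumberRemove(lines, keyword) {
  // GET_LINE_NUMBERS: the first matching line, then the last one after it.
  let numOne = 0; // 0 means "not found"
  let numTwo = 0;
  lines.forEach((line, idx) => {
    if (line.includes(keyword)) {
      if (numOne === 0) numOne = idx + 1;
      else numTwo = idx + 1;
    }
  });
  if (numOne === 0) return lines.slice(); // no match: copy everything
  if (numTwo === 0) numTwo = numOne;      // both matches on one line
  // RETURN_BEFORE_BLOCK + RETURN_AFTER_BLOCK: skip lines numOne..numTwo.
  return lines.slice(0, numOne - 1).concat(lines.slice(numTwo));
}
```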
aschipfl
  • You didn't do nearly enough testing. It only works if XXX appears on two different lines, and only if there are no lines before the first XXX line. – dbenham Aug 04 '15 at 22:21
  • @dbenham, I fixed the issue with the lines before the first occurrence of `XXX`; the other point is an assumption I made (and mentioned in the code description); the question says "the number of lines between are random", so I did not think of less than zero lines in between... (if such a situation could occur the question should be revised accordingly...) zero lines means adjacent lines in my point of view... – aschipfl Aug 04 '15 at 22:38
  • Nope - Still broken, and the OP explicitly stated "*there may be a scenario where the two instances are in the same line*" – dbenham Aug 04 '15 at 23:03
  • @dbenham, I fixed also the issue with -1 lines between the matches (matches in the same line), thanks for the hint... seems I didn't read carefully enough... – aschipfl Aug 04 '15 at 23:05

Here is another hybrid JScript / Batch solution.

Assuming the text file is data.txt, we can invoke a replacer script:

cscript //nologo j.js ".*XXX[\s\S]*?XXX.*(\r?\n)?" "" < data.txt

Where j.js is a supporting script written in Microsoft JScript:

var txt = WScript.StdIn.ReadAll();
var pattern = new RegExp( WScript.Arguments.Item(0), "g" );
var newvalue = WScript.Arguments.Item(1);
txt = txt.replace( pattern, newvalue );
WScript.StdOut.Write( txt );

A golf-optimized version of the j.js script is:

WScript.StdOut.Write(WScript.StdIn.ReadAll().replace(new RegExp(WScript.Arguments.Item(0),"g"),WScript.Arguments.Item(1)));

Generally speaking, the j.js usage is:

cscript //nologo j.js pattern_regular_expression new_value < input.dat > output.dat

Where input.dat refers to an input text/data file. If < input.dat is omitted, input is read from the console. And output.dat refers to an output text/data file. If > output.dat is omitted, the output is shown on the console.
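To try a pattern without cscript, the core of j.js can be written as a plain JavaScript function. The name jsReplace is hypothetical; like the script, it builds a global RegExp from the pattern and replaces every match.

The example pattern is an assumption chosen for the question's requirements. The lazy [\s\S]*? stops at the second XXX, so both instances may share a line. The (\r?\n)? part removes the line break after the second instance whether or not "." matches \r in the engine being used.

```javascript
// Hypothetical JavaScript counterpart of j.js: compile the pattern with the
// "g" flag and replace every match with the new value.
function jsReplace(txt, pattern, newValue) {
  return txt.replace(new RegExp(pattern, "g"), newValue);
}

// Example pattern (an assumption), written as a JavaScript string literal, so
// backslashes are doubled: first-keyword line through second-keyword line.
const BLOCK_PATTERN = ".*XXX[\\s\\S]*?XXX.*(\\r?\\n)?";
```

For example, jsReplace(text, BLOCK_PATTERN, "") applied to the question's sample leaves only the 0000ASD and ASD lines.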

Stephen Quan