I need to extract the text between two delimiters in a file and copy it to a TXT file. The text looks like XML, but instead of the delimiters <string> text... </string>, I have :::SOURCE text .... ::::SOURCE. Note that the first delimiter has three ':' characters and the second has four.

Most importantly, there are multiple lines between these two delimiters.

Example of text:

texttexttexttexttexttexttexttexttext
texttexttexttext
:::SOURCE
just this text
just this text
just this text
just this text
...
just this text
::::SOURCE texttext
texttexttext

Desired output:

just this text
just this text
just this text
just this text
...
just this text
Andy
  • If your goal is to scrape a log file, be advised that batch processing a large log file is inefficient, even if efficient methods are used in the batch script. You'll get much better performance from a stream reader, such as [GNU `awk`](http://gnuwin32.sourceforge.net/packages/gawk.htm). Please have a look at [my past struggles](http://stackoverflow.com/questions/15628017/) so you aren't doomed to repeat them. I'm pretty sure I've been through what you're going through now. – rojo Feb 21 '16 at 17:51
  • @rojo can you please submit a GNU awk version for this example? – Andy Feb 21 '16 at 19:13
  • You could actually do it with a one liner without needing a script. `awk "/^:::SOURCE/{flag=1;next}/^::::SOURCE/{flag=0}flag" txtfile.txt` would do it. ([credit to this post](http://stackoverflow.com/a/17988834/1683264)) – rojo Feb 21 '16 at 20:38
  • Thanks @rojo; Now I need to find some spare time to learn this new language. – Andy Feb 22 '16 at 10:43
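For readers new to `awk`, the one-liner from the comment above can be spelled out rule by rule. This is only a sketch: it assumes a POSIX shell with `awk` available (on Windows, for example through the GNU port linked above), it uses a shortened copy of the sample input, and `extracted.txt` is a made-up output name.

```shell
# Recreate a shortened version of the sample input from the question.
cat > txtfile.txt <<'EOF'
texttexttexttexttexttexttexttexttext
texttexttexttext
:::SOURCE
just this text
just this text
::::SOURCE texttext
texttexttext
EOF

# Same logic as the one-liner, one rule per line.
awk '
  /^:::SOURCE/  { flag = 1; next }   # opening delimiter: start copying, skip this line
  /^::::SOURCE/ { flag = 0 }         # closing delimiter: stop copying
  flag                               # a bare pattern prints the line while flag is non-zero
' txtfile.txt > extracted.txt

cat extracted.txt
```

The `^` anchor matters here: `^:::SOURCE` cannot match a line that starts with four colons, so the two delimiters are never confused with each other.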

1 Answer

Try this:

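@echo off
setlocal enabledelayedexpansion
rem Truncate the output file if it already exists.
if exist srcoutput.txt ( break > srcoutput.txt )
set found=
set markpoint=false
set /a count=0
set /a two=2
for /f "tokens=* delims= " %%a in (source.txt) do (
   rem Stop once both delimiters of the first block have been seen.
   if !count! equ %two% goto :EOF
   rem findstr sets errorlevel 0 when the line contains ":SOURCE".
   echo %%a | findstr /c:":SOURCE" >nul
   if errorlevel 1 (
      set found=false
      rem Ordinary line: copy it only while inside a block.
      if "!markpoint!"=="true" (
         echo %%a >> srcoutput.txt
      )
   ) else (
      set found=true
   )

   if "!found!"=="true" (
      rem An odd delimiter count opens a block, an even count closes it.
      set /a count=count+1
      set /a division=!count!%%%two%
      if !division! equ 0 (
         set markpoint=false
      ) else (
         set markpoint=true
      )
   )
)
:EOF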

For an input file source.txt that contains:

texttexttexttexttexttexttexttexttext
texttexttexttext
:::SOURCE
just this text
just this text
just this text
just this text
...
just this text
::::SOURCE texttext
:::SOURCE
just this text
just this text
just this text
just this text
...
just this text
::::SOURCE texttext
texttexttext
:::SOURCE
just this text
just this text
just this text
just this text
...
just this text
::::SOURCE texttext

The output in srcoutput.txt looks like:

just this text 
just this text 
just this text 
just this text 
... 
just this text 
SomeDude
  • Thanks svasa for your reply. Maybe I'm missing something, but unfortunately this doesn't work with: 'texttexttexttexttexttexttexttext texttexttexttexttext text :::SOURCE text text text text ... text ::::SOURCE text text text' This text spans multiple lines.... – Andy Feb 21 '16 at 16:42
  • I updated 'Example' and added 'desired output' to understand better. – Andy Feb 21 '16 at 17:02
  • You should have done that from the very beginning! Are the `:::SOURCE` and `::::SOURCE` delimiters always placed at the beginning of the line (like in your example)? – Aacini Feb 21 '16 at 17:12
  • also, is there only one "block" of SOURCE? If no, how to handle them? – Stephan Feb 21 '16 at 17:25
  • Well, it should have been mentioned in the original post. It took more effort than anticipated, but please check out the answer; it will parse anything between :::SOURCE and ::::SOURCE – SomeDude Feb 21 '16 at 18:15
  • @Andy please check out the answer – SomeDude Feb 21 '16 at 18:43
  • Many thanks svasa, it works for this example, but I have a problem with extracting my real code and I really don't know why. First of all, it generates an error if the characters "<>" are present in the text. Cheers, Andy. – Andy Feb 21 '16 at 19:12
  • Do you have XML tags in the text? If so, forget the batch file and try Perl. It's a breeze extracting text from XML in Perl. – SomeDude Feb 21 '16 at 19:19
  • It's not XML, it's compiled program files. If the file is big (about 1 MB of text and compiled characters), this code runs forever. I'm just wondering: is it so difficult to extract just the text between these 2 delimiters, and ignore the unreadable compiled text at the beginning of the file, using batch? The delimiters appear only once, at the beginning of the code and at the end. – Andy Feb 21 '16 at 19:41
  • @Andy you should have clearly mentioned your requirements. I didn't know that you have just one pair of delimiters. I modified the batch file; it should run fast now. Please try it. If it works, please upvote the answer too. Thanks. – SomeDude Feb 25 '16 at 17:08
  • @svasa - Cheers mate. It's better now, but unfortunately batch is limited, so I went with VBScript in the end. See [stackoverflow.com/questions/35552792](http://stackoverflow.com/questions/35552792/vb-script-extracting-text-from-file-using-delimiters) – Andy Mar 02 '16 at 10:49
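
For the case described in the comments above (a single pair of delimiters, a large file, and text containing `<` and `>`), a stream tool avoids both the `echo` parsing problem and the need to read the whole file. The following is a rough sketch, not the answerer's method: it assumes a POSIX shell with `awk`, reuses the file names from the answer, and the sample contents are invented for illustration. How a given `awk` build copes with raw binary bytes in the file was not checked here.

```shell
# Small stand-in for the real file; the line with < and > is invented for illustration.
cat > source.txt <<'EOF'
texttexttext
:::SOURCE
if a < b then print "<ok>"
just this text
::::SOURCE texttext
:::SOURCE
a second block that is never reached
::::SOURCE
EOF

awk '
  /^::::SOURCE/ { exit }       # closing delimiter: stop reading the file immediately
  flag                         # inside the block: print the line unchanged
  /^:::SOURCE/  { flag = 1 }   # opening delimiter: start copying from the next line
' source.txt > srcoutput.txt

cat srcoutput.txt
```

Because `exit` fires at the first closing delimiter, nothing after it is read, which is the same early-stop idea as the `goto :EOF` in the batch answer.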