1

I am confused by the way Windows batch processes strings and substrings that contain special characters.

From a script I obtained this variable:

echo "%longString%"

which prints:

"<p style="text-align: center;"><a class="more" href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">Download</a></p><p style="text-align: center;"><a href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">64-Bit Graphical Installer (462 MB)</a></p><p style="text-align: center;"><a href="https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86.exe">32-Bit Graphical Installer (410 MB)</a></p></div>"

The only part I am interested in is the one between a href=" and ">64-Bit Graphical.

I then tried many combinations of the solutions proposed in this similar question (which does not involve special characters), but every time I get unexpected results because of the special characters in my string.

I think a non-working example could be:

@ECHO OFF

:: define the longstring
Call Set "longString=<p style="text-align: center;"><a class="more" href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">Download</a></p><p style="text-align: center;"><a href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">64-Bit Graphical Installer (462 MB)</a></p><p style="text-align: center;"><a href="https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86.exe">32-Bit Graphical Installer (410 MB)</a></p></div>"

:: Define substring tokens
Set "subsA=a href=""
Set "subsB=>64-Bit Graphical"

:: Remove part before subsA
Call set "Result=%%longString:*%subsA%=%%"
:: extract part to remove behind subsB
Call set "Remove=%%Result:*%subsB%=%%"
:: remove part behind subsB
Call set "Result=%%Result:%Remove%=%%"

Echo "%Result%"

So far my best result uses Set "subsA=href" and Set "subsB=64-Bit" (simpler, since these contain no special characters). That gets me through the first assignments of Result and Remove, but because those new variables contain many special characters, the last assignment of Result produces garbage.

I also tried For /F and findstr, but the results were even worse.

So I am eager for any solution and/or explanation.

aschipfl
R. N

2 Answers

2

Well, since you are trying to extract a URL, which usually does not contain quotation marks itself, you could do the following:

  • split off everything up to and including a href (I intentionally left =" out of the search string so that sub-string substitution can be used, because = separates the search and replace strings);
  • split the remaining string at " characters and extract the second part (the first part is =);

Here is a possible solution:

@echo off

Set "longString=<p style="text-align: center;"><a class="more" href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">Download</a></p><p style="text-align: center;"><a href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">64-Bit Graphical Installer (462 MB)</a></p><p style="text-align: center;"><a href="https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86.exe">32-Bit Graphical Installer (410 MB)</a></p></div>"

rem // Use delayed expansion to avoid trouble with `"` and other special characters:
setlocal EnableDelayedExpansion
rem // Split off everything up to and including `a href`, then extract the second token in between `""`:
for /F tokens^=1^,2^ delims^=^"^ eol^=^" %%I in ("!longString:*a href=!") do (
    endlocal
    rem // Check for leading `=`-sign (could be skipped if not needed):
    if not "%%I"=="=" >&2 echo ERROR!& goto :EOF
    rem // Remove leading whitespaces:
    for /F "tokens=* eol= " %%K in ("%%J") do set "partString=%%K"
)
rem // Return extracted URL:
echo/%partString%
aschipfl
1

Don't use call with special characters, it only makes things worse (in this case even CALL works, but that's only luck).

Set "longString=<p style="text-align: center;"><a class="more" href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">Download</a></p><p style="text-align: center;"><a href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">64-Bit Graphical Installer (462 MB)</a></p><p style="text-align: center;"><a href="https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86.exe">32-Bit Graphical Installer (410 MB)</a></p></div>"

It is better to use delayed expansion, because the results of delayed expansion are safe for all characters.
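As a minimal sketch of that point (the variable name sample and its value are made up for illustration, they are not part of the question), a value containing angle brackets, an ampersand and quotes can be printed unchanged with delayed expansion:

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample value containing angle brackets, an ampersand and quotes.
set "sample=<a href="x">A & B</a>"

rem With percent expansion the value would be parsed again as command text,
rem so unquoted special characters could act as redirection or separators.
rem Delayed expansion happens after parsing, so the value is printed verbatim.
echo !sample!
```

The set line works here because every special character happens to sit between a pair of quotes, which is also the case for longString in the question.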

Even the first part fails:

:: Define substring tokens
Set "subsA=a href=""
:: Remove part before subsA
set "Result=!longString:*%subsA%=!"

The problem here is the equal sign in subsA (a href="): the first equal sign is used as the delimiter in the search=replace expression.
It's better to change your search string to just Set "subsA=a href".

Now you have more or less the correct string; the first two characters can simply be removed with set result=!result:~2!
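A short sketch of these two steps on a smaller string may help; the names text and rest and the example.com URL are assumptions made up for this illustration:

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample string, shorter than the one in the question.
set "text=<a href="https://example.com/file.exe">64-Bit</a>"

rem The first equal sign ends the search string, so search for "a href" only.
set "rest=!text:*a href=!"

rem rest now starts with the equal sign and the opening quote, so skip both.
set "rest=!rest:~2!"

echo !rest!
```

The tail after the URL is still attached at this point; removing it is the next step.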

Your idea of removing the tail of the string is good, but it doesn't work in batch: you again run into problems with equal signs in the REMOVE string.

But you could simply count the length of your remove string; that length can then be used to cut it from the result by position.
Note that the length in remove_len alone is too short, because the length of subsB itself (plus the quote in front of it) is missing.

set "remove=!result:*%subsB%=!"
call :strlen remove_len remove
set "result=!result:~0,-%remove_len%!"

echo !result!
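To make the length arithmetic concrete, here is a small sketch with made-up values (text, marker, tail, cut and head are hypothetical names, and the lengths are hard-coded instead of being computed by a strlen function):

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical short example: keep "abc" from "abc-XYZ-tail".
set "text=abc-XYZ-tail"
set "marker=-XYZ-"

rem Part after the marker, which is "tail" with a length of 4.
set "tail=!text:*%marker%=!"

rem Lengths are hard-coded for illustration: 4 for the tail plus 5 for the marker.
set /a cut=4+5

rem A negative length drops that many characters from the end.
set "head=!text:~0,-%cut%!"
echo !head!
```

Cutting only the 4 characters of the tail would leave the marker at the end of head, which is why the marker length has to be added as well.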

To get the string length, you could use a function like the one from the SO question "How do you get the string length in a batch file?".

The resulting code looks like:

@echo off
setlocal
Set "longString=<p style="text-align: center;"><a class="more" href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">Download</a></p><p style="text-align: center;"><a href=" https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86_64.exe">64-Bit Graphical Installer (462 MB)</a></p><p style="text-align: center;"><a href="https://repo.anaconda.com/archive/Anaconda3-2019.10-Windows-x86.exe">32-Bit Graphical Installer (410 MB)</a></p></div>"

setlocal EnableDelayedExpansion

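:: Define substring tokens (no equal sign in subsA, it would split search and replace)
Set "subsA=a href"
Set "subsB=>64-Bit Graphical"

:: Remove everything up to and including subsA
set "Result=!longString:*%subsA%=!"
:: Drop the two leading characters, the equal sign and the opening quote
set "Result=!result:~2!"
:: Debug output
set result
:: Get the tail that follows subsB
set "remove=!result:*%subsB%=!"
:: Debug output
set remove
call :strlen remove_len remove
call :strlen subsB_len subsB
:: Tail length plus length of subsB plus 1 for the quote in front of subsB
set /a remove_len+=subsB_len+1
set "result=!result:~0,-%remove_len%!"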

echo !result!
   exit /b


:strlen <resultVar> <stringVar>
(   
    setlocal EnableDelayedExpansion
    (set^ tmp=!%~2!)
    if defined tmp (
        set "len=1"
        for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
            if "!tmp:~%%P,1!" NEQ "" ( 
                set /a "len+=%%P"
                set "tmp=!tmp:~%%P!"
            )
        )
    ) ELSE (
        set len=0
    )
)
( 
    endlocal
    set "%~1=%len%"
    exit /b
)
jeb
  • I also thought about the index position but not the length. Your solution is efficient. – R. N Dec 06 '19 at 09:01