I am trying to write a Windows batch file that parses a specific HTML file that looks something like this (simplified):

            <input name="pattern" value="*.var" type="text" /><img style="width: 16px; height: 16px; vertical-align:middle; cursor:pointer" onclick="this.parentNode.submit()" class="icon-go-next icon-sm" src="/static/474743c8/images/16x16/go-next.png" /></form></div><table class="fileList"><tr><td><img style="width: 16px; height: 16px; " class="icon-text icon-sm" src="/static/474743c8/images/16x16/text.png" /></td><td><a href="./address.var.varapplication-varapplication-varwebservice-05.05.07-SNAPSHOT.var">address.var.varapplication-varapplication-varwebservice-05.05.07-SNAPSHOT.var</a></td><td class="fileSize">133.49 MB</td><td><a href="./address.var.varapplication-varapplication-varwebservice-05.05.07-SNAPSHOT.var/*fingerprint*/"><img style="width: 16px; height: 16px; " class="icon-fingerprint icon-sm" src="/static/474743c8/images/16x16/fingerprint.png" /></a> <a href="./address.var.varapplication-varapplication-varwebservice-05.05.07-SNAPSHOT.var/*view*/">view</a></td></tr><tr><td style="text-align:right;" colspan="3"><div style="margin-top: 1em;"><a href="./*.var/*zip*/target.zip"><img style="width: 16px; height: 16px; " class="icon-package icon-sm" src="/static/474743c8/images/16x16/package.png" />

and uses the build version (e.g. 05.05.07-SNAPSHOT; next time it will be a different version, but the format will remain the same) as a variable for another batch file. I have tried with findstr, but with no success:

for /F "delims=" %%a in ('findstr /ic "webservice" a.html') do set "line=%%a"
set "line=%line:*webservice=%"
for /F "delims=" %%a in ("%line%") do set string=%%a
for %%b in ("%line%") do @ set "var=%%b"
SET build=%var:~-11,8%      
ECHO. %build%
Deco
  • Welcome to StackOverflow! You are asking your question the right way, including sample data, the code you've tried to parse it, and clearly explaining the output you desire. Well done! – rojo Jul 28 '16 at 15:34

2 Answers

When parsing structured markup, it's better to treat it as a hierarchical object than as flat text. Not only is it easier to navigate as a hierarchy than trying to match strings with tokens or a regexp, but an object-oriented approach is also more resistant to changes in formatting (whether the code is minified, beautified, line breaks are introduced, whatever).

With that in mind, I suggest using querySelectorAll to select anchor tags that are descendants of table elements whose class name is "fileList". Then use a regex to scrape the version info from each anchor tag's href attribute.

@if (@CodeSection == @Batch) @then
@echo off & setlocal

set "html=test.html"

for /f "delims=" %%I in ('cscript /nologo /e:JScript "%~f0" "%html%"') do set "%%I"

echo %build%

goto :EOF
@end // end batch / begin JScript hybrid code

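var htmlfile = WSH.CreateObject('htmlfile'),              // in-memory HTML document
    fso = WSH.CreateObject('Scripting.FileSystemObject'),
    file = fso.OpenTextFile(WSH.Arguments(0), 1),         // 1 = open for reading
    html = file.ReadAll();

file.Close();
// The meta tag requests IE9 document mode, which makes querySelectorAll available.
htmlfile.write('<meta http-equiv="x-ua-compatible" content="IE=9" />' + html);

// Every anchor inside <table class="fileList">.
var anchors = htmlfile.querySelectorAll('table.fileList a');

for (var i = 0; i < anchors.length; i++) {
    // Capture the text between "webservice-" and the trailing ".var".
    if (/webservice-((\d+\.)*\d.+)\.var$/i.test(anchors[i].href)) {
        // Print "build=<version>"; the batch for /f loop hands this line to set.
        WSH.Echo('build=' + RegExp.$1);
        WSH.Quit(0);
    }
}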
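
To see what the regular expression in the loop captures, here is a minimal sketch that applies it to two href values taken from the question's sample HTML. The helper name `extractBuild` is hypothetical and not part of the script above; the code sticks to plain ES3-style JavaScript, so it should behave the same under JScript as under other engines.

```javascript
// Hypothetical helper: returns the build version captured from an href,
// or null when the href does not end in "webservice-<version>.var".
function extractBuild(href) {
    // Same pattern as in the script above.
    var match = /webservice-((\d+\.)*\d.+)\.var$/i.exec(href);
    return match ? match[1] : null;
}

// hrefs copied from the sample HTML in the question
var artifact = './address.var.varapplication-varapplication-varwebservice-05.05.07-SNAPSHOT.var';
var fingerprint = artifact + '/*fingerprint*/';

var build = extractBuild(artifact);       // '05.05.07-SNAPSHOT'
var skipped = extractBuild(fingerprint);  // null: this href does not end in ".var"
```

Because the pattern is anchored with `$`, the fingerprint and view links in the same table row are skipped, and only the link to the artifact itself yields a version.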

What's even cooler: if the HTML file you're scraping is served by a web server, you can use the Microsoft.XMLHTTP object to retrieve the HTML without having to rely on wget, curl, or similar. This only requires a few minor changes to the code above.

@if (@CodeSection == @Batch) @then
@echo off & setlocal

set "URL=http://www.domain.com/file.html"

for /f "delims=" %%I in ('cscript /nologo /e:JScript "%~f0" "%URL%"') do set "%%I"

echo %build%

goto :EOF
@end // end batch / begin JScript hybrid code

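var xhr = WSH.CreateObject('Microsoft.XMLHTTP'),   // HTTP client COM object
    htmlfile = WSH.CreateObject('htmlfile');       // in-memory HTML document

// Request the URL passed as the first script argument (true = asynchronous).
xhr.open('GET', WSH.Arguments(0), true);
xhr.setRequestHeader('User-Agent', 'XMLHTTP/1.0');
xhr.send('');
// Poll until the response has fully arrived (readyState 4 = done).
while (xhr.readyState != 4) WSH.Sleep(50);

// The meta tag requests IE9 document mode, which makes querySelectorAll available.
htmlfile.write('<meta http-equiv="x-ua-compatible" content="IE=9" />' + xhr.responseText);

// Every anchor inside <table class="fileList">.
var anchors = htmlfile.querySelectorAll('table.fileList a');

for (var i = 0; i < anchors.length; i++) {
    // Capture the text between "webservice-" and the trailing ".var".
    if (/webservice-((\d+\.)*\d.+)\.var$/i.test(anchors[i].href)) {
        // Print "build=<version>"; the batch for /f loop hands this line to set.
        WSH.Echo('build=' + RegExp.$1);
        WSH.Quit(0);
    }
}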
rojo
  • The first script works perfectly, but the second, which I'm very interested in, displays the following error: `url.bat(13, 1) Microsoft JScript runtime error: Object doesn't support this property or method` – Deco Jul 29 '16 at 07:12
  • I would appreciate it if you could add some comments so I can understand the code (I don't have any experience with JScript). Thanks! – Deco Jul 29 '16 at 07:37
  • I think JScript stopped working, so I added some code (found on Stack Overflow): `//to trigger the error: throw new FatalError("Something went badly wrong!");` and the error message displayed is: test.bat(13, 1) Microsoft JScript runtime error: Something went badly wrong! Any ideas why this is happening? – Deco Jul 29 '16 at 08:10
  • @Deco LOL I'm sorry. I had a typo on line 13. I said `CreateOjbect`. I'll fix it. I didn't feel like spinning up a web server to test, so let me know if you find any more problems. – rojo Jul 29 '16 at 12:12
  • I'm not sure what happened, because the first script has also stopped working. I restarted Windows, but the problem remains. Could it be a problem with Java? Should I reinstall Java? – Deco Jul 29 '16 at 12:32
  • Will the second script also work with dynamic pages? – Deco Jul 29 '16 at 12:33
  • @Deco Java is not Jscript. Jscript and JavaScript aren't really the same thing either, although they do have a lot in common. I think both scripts should work with dynamic pages. They render the page invisibly using the HTA engine (I think?) in IE9 compatibility mode; and they use a CSS selector and the DOM which mutate along with the web page's mutations. Regarding the first script no longer working, **could it be that the regex needs to be tweaked to match the expected text?** In any case, I did spin up IISExpress to test, and both scripts work with your sample HTML in your question. – rojo Jul 29 '16 at 13:03
  • Sorry for my confusion regarding Java and JavaScript. For the first script: I'm sorry, but I don't understand how the script works, and I'm a bit confused about how it identifies the desired text. What should go instead of `CreateObject`? – Deco Jul 29 '16 at 13:38
  • @Deco `CreateObject` is correct. I had the `b` and the `j` switched. Re: identifying desired text, are you familiar with regular expressions? Try changing `/webservice-((\d+\.)*\d.+)\.var$/i` to `/webservice-(.+)\.var$/i` and see whether that fixes it. That's a little less specific and will match under broader conditions. – rojo Jul 29 '16 at 13:47
  • I reposted the HTML file (I will actually use the script in four different cases, which are very similar but not identical). I modified the script as you suggested: `if (/varwebservice-(.+)\.var$/i.test(anchors[i].href)) {`. The result is: "E:\0>test.bat ECHO is off." – Deco Jul 29 '16 at 14:17
  • Odd. My result is `05.05.07-SNAPSHOT` using your updated html. If you're getting the incorrect result with the second script, can you confirm that the URL is valid? – rojo Jul 29 '16 at 14:40
  • Could it be a JavaScript problem? Maybe it is disabled? About the second script: the URL is actually like http://google.com/ (no page.html at the end) – Deco Jul 29 '16 at 14:48
  • Didn't you say you got the first script to work for you 10 hours ago? Unless you've intentionally disabled JavaScript in your Internet Options, I don't see how that could be an issue. I don't think I'll be able to help you troubleshoot further until you tell me the URL you are checking and let me try my script against the live version. You can delete the comment with the URL after I've got it; or I can give you my email address if there's no other option. – rojo Jul 29 '16 at 17:51
  • Extremely odd: I tested the first script at home and it works every time. It only stopped working on my work laptop :( – Deco Jul 30 '16 at 13:39
  • Following your suggestion, I disabled and re-enabled JavaScript in the IE options, and the first script started to work. Many thanks! – Deco Aug 01 '16 at 07:39
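
The difference between the original pattern and the looser one suggested in the comments can be sketched as follows. This is only an illustration: the helper `capture` is hypothetical, and the second href uses an invented version label (`RC1`) that does not appear anywhere in the question.

```javascript
// Patterns quoted from the answer and from the comment thread.
var strict = /webservice-((\d+\.)*\d.+)\.var$/i;
var loose = /webservice-(.+)\.var$/i;

// Hypothetical helper: first capture group, or null when nothing matches.
function capture(pattern, href) {
    var match = pattern.exec(href);
    return match ? match[1] : null;
}

// href copied from the sample HTML in the question
var sample = './address.var.varapplication-varapplication-varwebservice-05.05.07-SNAPSHOT.var';
// made-up href, for illustration only
var invented = './varwebservice-RC1.var';

var strictSample = capture(strict, sample);      // '05.05.07-SNAPSHOT'
var looseSample = capture(loose, sample);        // '05.05.07-SNAPSHOT'
var strictInvented = capture(strict, invented);  // null: the version must start with a digit
var looseInvented = capture(loose, invented);    // 'RC1'
```

Both patterns return the same version for the sample href, which is consistent with the cause in this thread turning out to be the machine's IE settings rather than the regex.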

Try this:

findstr /ic:"webservice" a.html

user2956477
  • Unfortunately, it displays: "< was unexpected at this time." – Deco Jul 28 '16 at 13:05
  • @user2956477: I modified the script according to your suggestion and also added "{", but there is still no useful result: for /F "delims=" %%a in ('findstr /ic:"webservice" test.html') do set "line=%%a" set "line=%line:*webservice={%" for /F "delims=" %%a in ("%line%") do set string=%%a for %%b in ("%line%") do @ set "var=%%b" SET build=%var:~-11,8% ECHO. %build% – Deco Jul 28 '16 at 13:20