I tried searching but couldn't find anything specific to what I need.

This is an excerpt from my HTML file:

<div id="pair_today">
    <div class="left_block">
        <div class="tpl_box">
            <h1 style="margin-top:5px;color:#ec1b1b;">
            <span style="font-size:15px;color:#000;">1 Australian Dollar =</span><br /> 93.663 Japanese Yen</h1>

                        <span style="display:inline-block; margin-top:10px; text-align:right; align:right; font-size:12px; color:#9c9c9c">rate on Fri, 6 March, 2015 15:58:20 (AEDT)</span>

           <a href="http://fx-rate.net/AUD/JPY/currency-transfer/" title="Currenty Transfer from Australia to Japan" style="float:right" class="btn" onclick="ga('send','event', {'eventCategory': 'CurrencyTransfer', 'eventAction' : 'click','eventLabel':'Today Box'});"><span class="btn-ico btn-ico-go">Get Rate</span></a>
           </span>

I need to parse out the 93.663 value from line 5. This value will be different every time I have to run the script, so I figured regex would be the best way to specifically target this value.

I've been tinkering with for /f loops but I don't know how to implement regex into the script.

Thanks guys!

user3315570
  • What programming language are you using? Have you googled for a tutorial on that language plus the words "regex tutorial" or "regex how to"? What code did you come up with after you went through those tutorials? (Edit your question and add the code here.) If you don't know where to start, how about pseudo-code (and for the bits you don't know, just make them up, i.e. imagine you could make it work however you wanted and write that). – Taryn East Mar 06 '15 at 06:04
  • Hi Taryn, I'm just using a batch file in Windows. I did a fair bit of googling but I can't find out how to do this. Effectively, I need a batch file that will output that value (in this case 93.663) from a local HTML file in the same directory. I'm guessing my best bet is some kind of for /f loop with some RegEx sprinkled in to target that value, I just need some help with the syntax. – user3315570 Mar 06 '15 at 06:13
  • so... show us the code you've got so far :) also - try your hand at the bits you don't know how to do... imagine you had a magic wand and could write code that made sense to you and worked however you wanted it to... what would you write? Then - we can help you change those bits to the way that code actually works. – Taryn East Mar 06 '15 at 06:14
  • Use a language that provides good HTML parsing libraries. Parsing HTML with regex is never recommended – nu11p01n73R Mar 06 '15 at 06:25
  • The client I'm doing this for won't allow us to install anything. I could do this with Ruby pretty easily. Can anyone show me how to parse it out from a batch file + regex? – user3315570 Mar 06 '15 at 06:28
  • I'm assuming Windows, since this is tagged `batch-file`. Don't insist on `batch-file` & `RegEx`, as the Windows CLI does not support `RegEx` (and, moreover, @nu11p01n73R is right: _parsing html with regex is never recommended_). Switch to `VBScript` with `XPath` in `Msxml2.DOMDocument`. [An example here](http://stackoverflow.com/q/24403237/3439404) – JosefZ Mar 06 '15 at 07:15

1 Answer

Use Windows Script Host (VBScript or JScript). Use the htmlfile COM object. Parse the DOM. Then you can massage the innerText as needed with a regexp.

Here you go. Save this as a .bat file, modify the set "htmlfile=test.html" line as needed, and run it. (Derived from this answer. Documentation for the htmlfile COM object in WSH is sparse; but if you would like to learn more about it, follow that bread crumb.)

@if (@CodeSection == @Batch) @then

@echo off
setlocal

set "htmlfile=test.html"

rem // invoke JScript hybrid code and capture its output
for /f %%I in ('cscript /nologo /e:JScript "%~f0" "%htmlfile%"') do set "converted=%%I"

echo %converted%

rem // end main runtime
goto :EOF

@end // end batch / begin JScript chimera

var fso = WSH.CreateObject('scripting.filesystemobject'),
    DOM = WSH.CreateObject('htmlfile'),
    htmlfile = fso.OpenTextFile(WSH.Arguments(0), 1),
    html = htmlfile.ReadAll();

DOM.write(html);
htmlfile.Close();

var scrape = DOM.getElementById('pair_today').getElementsByTagName('h1')[0].innerText;
// index 1 is the captured value after "="; index 0 would be the whole matched text
WSH.Echo(scrape.match(/^.*=\s+(\S+).*$/)[1]);
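
If the regexp at the end is the unclear part, here is a minimal sketch of it in isolation. It assumes the innerText of that h1 comes back roughly as the two lines in `sample` (the real whitespace may differ), and `extractRate` is a made-up helper name for illustration, not something from the script above. It is plain JavaScript so it can be tried outside WSH; `console.log` stands in for `WSH.Echo`.

```javascript
// Assumed sample of what innerText might return for the <h1> in the question.
// The exact line break and spacing are a guess, not captured output.
var sample = '1 Australian Dollar =\r\n93.663 Japanese Yen';

// Hypothetical helper: returns the first run of non-whitespace
// characters after "=", or null if the text does not match.
function extractRate(text) {
    var m = text.match(/^.*=\s+(\S+).*$/);
    // m[0] is the entire matched text; m[1] is the first capture group.
    return m ? m[1] : null;
}

var rate = extractRate(sample);

// console does not exist under WSH, so guard the call;
// inside cscript you would use WSH.Echo(rate) instead.
if (typeof console !== 'undefined') {
    console.log(rate); // prints 93.663
}
```

The null check matters if the page layout ever changes: `match` returns null when nothing matches, and indexing into null would throw.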

You know, as long as you're invoking Windows Script Host anyway, if you're acquiring your html file using wget or similar, you might be able to get rid of that dependency. Unless the page you're downloading uses a convoluted series of cookies and session redirects, you can replace wget with the Microsoft.XMLHTTP COM object and download the page via XHR (or as those with less organized minds would say, Ajax). (Based on fetch.bat.)

@if (@CodeSection == @Batch) @then

@echo off
setlocal

set "from=%~1"
set "to=%~2"
set "URL=http://host.domain/currency?from=%from%&to=%to%"

for /f "delims=" %%I in ('cscript /nologo /e:jscript "%~f0" "%URL%"') do set "conv=%%I"

echo %conv%

rem // end main runtime
goto :EOF

@end // end batch / begin JScript chimera

var x = WSH.CreateObject("Microsoft.XMLHTTP"),
    DOM = WSH.CreateObject('htmlfile');

x.open("GET",WSH.Arguments(0),true);
x.setRequestHeader('User-Agent','XMLHTTP/1.0');
x.send('');
while (x.readyState!=4) {WSH.Sleep(50)};

DOM.Write(x.responseText);

var scrape = DOM.getElementById('pair_today').getElementsByTagName('h1')[0].innerText;
// index 1 is the captured value after "="; index 0 would be the whole matched text
WSH.Echo(scrape.match(/^.*=\s+(\S+).*$/)[1]);
rojo
  • This worked perfectly - could you possibly deconstruct this a little so I understand what's going on? What is the first section doing? How do I switch back to batch file syntax from Jscript in the same file? Thanks again! – user3315570 Mar 09 '15 at 00:39
  • Sure. You know arguments for batch scripts are held in `%1` through `%9`. So if you execute `yourscript.bat arg1`, then `%1` holds the value `arg1`. Well, `%0` holds `yourscript.bat`. By extension, `%~f0` holds `c:\path\to\yourscript.bat`. So here's how the hybrid flows. The first line, the `@if` statement, is evaluated as false in batch and does nothing. `@echo off` is then executed, then `setlocal`, and so on until `cscript` is reached. `cscript` re-launches the script itself (`%~f0`) with the JScript interpreter and blocks the batch execution until JScript is complete. – rojo Mar 09 '15 at 13:13
  • JScript evaluates the first line as false. But since JScript is capable of multi-line commands, it treats everything between `@then` and `@end` as part of the false `@if` statement. Thus, the second command JScript encounters is `var x = WSH...` etc. JScript runs from there till the end, outputs the results with `WSH.Echo`, then control is returned to the batch cmd thread. Batch resumes by capturing the output of `cscript` with its `for /f` loop. And on with `echo %conv%`, batch continues until `goto :EOF` is reached. See https://gist.github.com/davidruhmann/5199433 for more examples. – rojo Mar 09 '15 at 13:16