For a small art project I need to read stock prices from Yahoo Finance. The HTML source is long and complicated, but using an online regex tester I worked out a regular expression that should produce the correct output:
<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart\(10px\) Fz\(24px\)(\"| C\(\$dataGreen\)\"| C\(\$dataRed\)\") data-reactid\=\"35\">([-+]{0,1}\d*\.\d*) \((([-+]{0,1})\d*\.\d*)\%\)<\/span>
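For comparison, here is a sketch of the same pattern written only with POSIX extended regular expression (ERE) syntax, which is what `egrep` (equivalent to `grep -E`) implements. It rests on a few assumptions: `\d` is Perl/JavaScript shorthand rather than POSIX ERE, so it becomes `[0-9]`; `{0,1}` is written as the equivalent `?` (both are valid ERE); and the backslashes before `=`, `"`, `%` and `/` are dropped, because those characters are not special in ERE and POSIX leaves a backslash before an ordinary character undefined. The sample line is the tester match quoted in this question, and `sample.html` is a hypothetical scratch file.

```shell
#!/bin/sh
# POSIX ERE version of the pattern (sketch):
#   \d            -> [0-9]       (\d is not part of POSIX ERE)
#   {0,1}         -> ?           (same meaning)
#   \= \" \% \/   -> = " % /     (not special in ERE, so no backslash)
# Literal ( ) . $ stay escaped because they are special in ERE.
pattern='<span class="Trsdu\(0\.3s\) Fw\(500\) Pstart\(10px\) Fz\(24px\)("| C\(\$dataGreen\)"| C\(\$dataRed\)") data-reactid="35">([-+]?[0-9]*\.[0-9]*) \((([-+]?)[0-9]*\.[0-9]*)%\)</span>'

# Hypothetical scratch file holding the sample line from the online tester.
printf '%s\n' '<span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($dataGreen)" data-reactid="35">+50.32 (+0.17%)</span>' > sample.html

# Prints the line if the pattern matches.
grep -E "$pattern" sample.html
```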
Here is an excerpt of the HTML with the value I want embedded in it:
<svg class="D(n) Cur(p)" width="24" style="fill:#000;stroke:#000;stroke-width:0;vertical-align:bottom;" height="24" viewBox="0 0 24 24" data-icon="search" data-reactid="29"><path d="M9 3C5.686 3 3 5.686 3 9c0 3.313 2.686 6 6 6s6-2.687 6-6c0-3.314-2.686-6-6-6m13.713 19.713c-.387.388-1.016.388-1.404 0l-7.404-7.404C12.55 16.364 10.85 17 9 17c-4.418 0-8-3.582-8-8 0-4.42 3.582-8 8-8s8 3.58 8 8c0 1.85-.634 3.55-1.69 4.905l7.403 7.404c.39.386.39 1.015 0 1.403" data-reactid="30"></path></svg></div></div></div><div class="My(6px) Pos(r) smartphone_Mt(6px)" data-reactid="31"><div class="D(ib) Va(m) Maw(65%) Ov(h)" data-reactid="32"><div class="D(ib) Mend(20px)" data-reactid="33"><span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="34">11,541.87</span><span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($dataRed)" data-reactid="35">-402.83 (-3.37%)</span><div id="quote-market-notice" class="C($tertiaryColor) D(b) Fz(12px) Fw(n) Mstart(0)--mobpsm Mt(6px)--mobpsm" data-reactid="36"><span data-reactid="37">At close: 5:44PM CET</span></div></div><!-- react-empty: 38 --></div></div></div></div></div><script>if (window.performance) {window.performance.mark && window.performance.mark('Lead-3-QuoteHeader');window.performance.measure && window.performance.measure('Lead-3-QuoteHeaderDone','PageStart','Lead-3-QuoteHeader');}</script></div><div data-reactid="29">
My problem is: this regular expression behaves differently in the online tester than in egrep under OpenWrt!
In the online tester, it matches exactly this snippet:
<span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($dataGreen)" data-reactid="35">+50.32 (+0.17%)</span>
(with a few additional capture groups highlighted, because of the extra parentheses in the regex)
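Note that `grep` only reports matching lines; it never returns the individual capture groups the online tester highlights. If the goal is to pull out the change and the percentage, a sketch that reuses the groups as backreferences in `sed` is shown below. It assumes a `sed` that accepts `-E` for extended regular expressions (GNU sed does; some BusyBox builds may need `-r` instead), and it uses the tester's sample line rather than the real `stock.html`.

```shell
#!/bin/sh
# Sample line copied from the question (the online tester's match).
line='<span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($dataGreen)" data-reactid="35">+50.32 (+0.17%)</span>'

# Group 1 = absolute change, group 2 = percentage change (sign kept, "%" dropped).
# -n suppresses default output; the trailing p prints only if the substitution matched.
result=$(printf '%s\n' "$line" |
  sed -n -E 's/.*data-reactid="35">([-+]?[0-9]*\.[0-9]*) \(([-+]?[0-9]*\.[0-9]*)%\)<\/span>.*/\1 \2/p')

printf '%s\n' "$result"
```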
If I use
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart\(10px\) Fz\(24px\)(\"| C\(\$dataGreen\)\"| C\(\$dataRed\)\") data-reactid\=\"35\">([-+]{0,1}\d*\.\d*) \((([-+]{0,1})\d*\.\d*)\%\)<\/span>' stock.html
I get absolutely no results. OK, there must be an error in the regular expression. Let's start small:
egrep '<span class' stock.html
gives me many results.
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\)' stock.html
still returns some matching lines. But
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart' stock.html
gives me nothing! Niente! Nada! Even
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) ' stock.html
(note the trailing space at the end of the regex!) returns nothing either. And I have no idea what the difference between
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\)' stock.html
and
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart' stock.html
is in terms of regular expressions! If the space itself were the problem, I should already get no results at the first space, the one before "Fw". So why does my regex fail at the second space?
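A sketch of two checks that might narrow this down, assuming `stock.html` is the downloaded page. The helper names are hypothetical: `probe_prefixes` repeats the prefix experiment above in one run (written as plain ERE, without the unneeded `\=` and `\"` escapes), and `dump_after_fw500` prints the raw bytes that follow `Fw(500)` with `od -c`, so a tab, a line break, or a multi-byte character such as a UTF-8 non-breaking space becomes visible instead of looking like an ordinary blank. `grep -o` is an extension beyond POSIX; GNU grep provides it, and it is worth checking that the BusyBox build on the router does too.

```shell
#!/bin/sh
# probe_prefixes FILE: print how many lines of FILE match each
# progressively longer prefix of the pattern (POSIX ERE via grep -E -c).
probe_prefixes() {
  for prefix in \
    '<span class' \
    '<span class="Trsdu\(0\.3s\) Fw\(500\)' \
    '<span class="Trsdu\(0\.3s\) Fw\(500\) ' \
    '<span class="Trsdu\(0\.3s\) Fw\(500\) Pstart'
  do
    # grep -c prints 0 (and exits 1) when nothing matches; the status is ignored.
    printf '%s\t%s\n' "$(grep -E -c "$prefix" "$1")" "$prefix"
  done
}

# dump_after_fw500 FILE: show the bytes following "Fw(500)" in the first three
# matches. od -c shows a tab as \t, a newline as \n, and other non-printable
# bytes (e.g. the two bytes of a UTF-8 non-breaking space) as octal numbers.
dump_after_fw500() {
  grep -o 'Fw(500).\{0,12\}' "$1" | head -n 3 | od -c
}
```

After sourcing the snippet, run `probe_prefixes stock.html` and `dump_after_fw500 stock.html`; if the byte after `)` is anything other than an octal `040` / plain space, that would explain why a literal space in the pattern stops matching there.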