
I want to read a value from a website (Facebook Ads) in a bash script that runs daily, and then process it further. Unfortunately, I need to be logged in to see this value.

So far I've figured out how to log into the site in Firefox and save the page as an HTML file, from which the value could in theory be read.

The only unique identifier in this file is the first occurrence of "Gesamtausgaben" (German for "total spending"). Using this, is there any way to cut out everything except "100,10"?

I'd also be happy with a different way of getting this value. And no, I don't have API access.

I'd appreciate any ideas.

Thanks, Patrick


1 Answer


How to Parse HTML (Badly) with PCRE

You can't reliably parse HTML with regular expressions alone, so you'll need an XML/HTML or XPath parser to do this properly. That said, if you have a PCRE-compatible grep, the following will likely work, provided the HTML is minified and the class isn't reused elsewhere on your page.

$ pcregrep -o 'span class="[^"]*_3df[ij][^"]*"[^>]*>\K[^<]+' foo.html
100,10 €
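If you'd rather take the parser route than rely on a regex, here is a minimal sketch using libxml2's xmllint. It assumes xmllint is installed and that the value is the text of the first span whose class list contains the token _3dfi or _3dfj, which is what the pcregrep pattern suggests; the helper name span_text is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: let libxml2's HTML parser locate the span instead of a regex.
# Requires xmllint (part of libxml2; Debian/Ubuntu package libxml2-utils).
# Assumption: the value is the text of the first <span> whose class list
# contains the token _3dfi or _3dfj, as the pcregrep pattern suggests.
# span_text is a hypothetical helper name.
span_text() {
  local file=$1
  # Pad the normalized class list with spaces so " _3dfi " only matches a
  # whole class token (not, say, "_3dfix").
  local xpath='string((//span[
      contains(concat(" ", normalize-space(@class), " "), " _3dfi ") or
      contains(concat(" ", normalize-space(@class), " "), " _3dfj ")
    ])[1])'
  # --html selects the lenient HTML parser; its warnings go to /dev/null.
  xmllint --html --xpath "$xpath" "$file" 2>/dev/null
}
```

For example, value=$(span_text foo.html) should leave the span's text ("100,10 €") in $value if those assumptions hold. Generated class names like _3dfi may change when Facebook updates its front end, so check them against your saved page before relying on this in a daily job.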

If your target HTML spans multiple lines, or if several spans share the same classes, you'll have to refine the regular expression and work out which matches actually matter to you. Context lines or subsequent matches may help, but your mileage will definitely vary.
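Since the question names the first occurrence of "Gesamtausgaben" as the only reliable marker, another option is to anchor on that text rather than on class names. The sketch below joins the file into one line (so it also copes with HTML spread across several lines), keeps everything from the first "Gesamtausgaben" onwards, and prints the first German-style amount it finds. It assumes no other comma-decimal number sits between the label and the value. A second helper converts the result to whole cents, because bash arithmetic is integer-only; it assumes exactly two decimal places. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from any library.

# Print the first German-style amount (e.g. "100,10" or "1.234,56") found
# after the first occurrence of a label such as "Gesamtausgaben".
# Assumptions: the label precedes the value in the HTML source, no other
# "digits,two-digits" number sits between them, and the label contains no
# regex metacharacters (it is used as a basic regex).
first_amount_after_label() {
  local label=$1 file=$2
  tr '\n' ' ' < "$file" |                               # join lines: label and value may be on different lines
    LC_ALL=C grep -o "$label.*" |                       # from the first occurrence of the label to the end
    LC_ALL=C grep -oE '[0-9]+(\.[0-9]{3})*,[0-9]{2}' |  # every German-style amount in that tail
    head -n 1                                           # keep only the first one
}

# Turn an amount like "100,10" into 10010 (cents) for integer arithmetic.
# Assumption: the amount always has exactly two decimal places.
to_cents() {
  local digits
  digits=$(printf '%s' "$1" | tr -cd '0-9')         # "1.234,56 €" -> "123456"
  [ -n "$digits" ] || return 1                      # no digits at all: not an amount
  digits=$(printf '%s' "$digits" | sed 's/^0*//')   # drop leading zeros so "0,50" is not read as octal 050
  echo "${digits:-0}"                               # "0,00" strips to nothing, i.e. 0
}
```

For example, value=$(first_amount_after_label Gesamtausgaben foo.html), then cents=$(to_cents "$value"), after which a test such as (( cents > 10000 )) works with plain bash arithmetic.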

Todd A. Jacobs