I've been working with Nokogiri for a couple of days and I absolutely adore it. Everything was working brilliantly until I got a requirement to scrape a website that uses the data-reactid attribute (the one React adds to the elements it renders). The problem is that Nokogiri seems to get confused by the attribute value format this website uses (several periods, some dollar signs, and some other invalid XML/CSS characters).
An example of what I need to scrape would be:
<td data-reactid=".3.3.1:$contract_23.$=1$dataRow:0.1">94.280</td>
I need the text (94.280) inside the element whose data-reactid attribute has the value ".3.3.1:$contract_23.$=1$dataRow:0.1".
Usually in Nokogiri we would select that with something like:
doc.css("type[attributename=attributeid]")
In my example that would be:
doc.css("td[data-reactid=.3.3.1:$contract_23.$=1$dataRow:0.1]")
But no matter what I do to escape the invalid characters, it keeps telling me there is an unexpected character after my equals sign. The error message for the code above is:
nokogiri-1.4.3.1/lib/nokogiri/css/parser.rb:78:in `on_error': unexpected '.3' after 'equal'
I've tried:
a) Assigning the value to a variable and forcing it into a string
b) Escaping it with backslashes (.3.[...])
c) Prefixing it with a hash (#.3.3[...])
d) Escaping it using cgi escapedString
e) Placing it inside '%{ }', e.g. '%{.3.3[...]}'
No matter what I do, I keep getting the same message (except for option e, which gives me an altogether different error message):
: no .<digit> floating literal anymore; put 0 before dot
Can you guys help me get the right value with such an oddly-named attribute?