Got the right node with Nokogiri, but need to search further

Question

I am using this:

require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open(url))
pic = doc.search "[text()*='hiRes']"

to get this script node:

<script type="text/javascript">
var data = {
'colorImages': { 'initial': 
[{"hiRes":"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UL1500_.jpg","thumb":"http://ecx.images-joes.com/images
/I/41xE2XADIvL._US40_.jpg","large":"http://ecx.images-joes.com/images
/I/41xE2XADIvL.jpg","main":{"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UX395_.jpg":[395,260],"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UX500_.jpg":[500,329],"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UX535_.jpg":[535,352],"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UX575_.jpg":[575,379]}

and the node keeps going from there.

The only thing I need to pull out is the entire URL that contains the string "UL1500", that is, the URL that follows "hiRes":. For example: http://ecx.images-joes.com/images/I/71MBTEP1W9L._UL1500_.jpg

I looked up the class that Nokogiri returns, and it's a Nokogiri::XML::NodeSet.

But I'm not sure how to interact with it to get what I need.

Thanks

score 0 · Answer 1 · edited May 18 '15 at 18:10

Yes. search returns a NodeSet because, in the general case, a selector can match more than one node.

See: http://www.rubydoc.info/github/sparklemotion/nokogiri/master/Nokogiri/XML/NodeSet#children-instance_method

In this case you could try:

pic.children.first.content
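Once you have the script text as a string, a regular expression is one way to pull out the URL stored under "hiRes". The following is only a minimal sketch: the sample text is shortened from the question, hi_res_url is a hypothetical helper name, and it assumes the key appears as "hiRes" in double quotes, followed by a double-quoted value. Any whitespace inside the captured URL (such as the line breaks shown in the question) is stripped.

```ruby
# Sample of the text that pic.children.first.content would return
# (shortened from the question; the line breaks inside the URLs are kept on purpose).
script_text = <<~JS
  var data = {
  'colorImages': { 'initial':
  [{"hiRes":"http://ecx.images-joes.com/images
  /I/71MBTEP1W9L._UL1500_.jpg","thumb":"http://ecx.images-joes.com/images
  /I/41xE2XADIvL._US40_.jpg"}]}
  };
JS

# Hypothetical helper: returns the URL stored under the "hiRes" key,
# or nil when the key is not present in the text.
def hi_res_url(text)
  # Capture everything between the quotes that follow "hiRes":
  match = text.match(/"hiRes"\s*:\s*"([^"]+)"/)
  return nil unless match

  # Remove line breaks or spaces that ended up inside the URL.
  match[1].gsub(/\s+/, '')
end

puts hi_res_url(script_text)
```

With the real page you would pass pic.children.first.content instead of the sample string. If the script contains several "hiRes" entries, text.scan with the same pattern would return all of them rather than only the first.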

edited May 18 '15 at 18:10

the Tin Man

answered May 15 '15 at 16:26

Mircea

score 0 · Accepted Answer · edited May 23 '17 at 11:58

I went from using only Nokogiri to a regular expression, but then found this answer, and it worked like magic:

https://stackoverflow.com/a/5939906/4386626

edited May 23 '17 at 11:58

Community

answered May 15 '15 at 20:34

ToddT
