
I have this piece of HTML:

<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTIwMjYxMTgwOTAPFgIeBGd1aWQFIDI1NmYyOTdkZWZhNjQyODhhYTVmOWI4MGE5MzRjNjlhFgJmD2QWAgIDDxYCHgxhdXRvY29tcGxldGUFA29mZhYCAgEPZBYGZg9kFgJmD2QWAgIFDw8WAh4EVGV4dAUO0JLRi9C50LTQuNGC0LVkZAICD2QWAgICD2QWAgIBDxAPFgYeDURhdGFUZXh0RmllbGQFBU5hendhHg5EYXRhVmFsdWVGaWVsZAUQSURXZXJzamVKZXp5a293ZR4LXyFEYXRhQm91bmRnZBAVAwZQb2xza2EHRW5nbGlzaA7QoNGD0YHRgdC60LDRjxUDATEBMgIxNxQrAwNnZ2cWAQICZAIED2QWAmYPZBYCAgEPZBYCZg9kFgICAQ9kFgICAw9kFgQCAg8PFgQeJk5vQm90X1Jlc3BvbnNlVGltZUtleV9jdGwwMCRjcCROb0JvdElEBnWRyyBNg9BIHiROb0JvdF9TZXNzaW9uS2V5S2V5X2N0bDAwJGNwJE5vQm90SUQFNE5vQm90X1Nlc3Npb25LZXlfY3RsMDAkY3AkTm9Cb3RJRF82MzUxNTE5MTQ3MjUxNzIzNDFkFgICAQ8WAh4PQ2hhbGxlbmdlU2NyaXB0BVl2YXIgZSA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCdjdGwwMF9jcF9Ob0JvdElEX3BjbmInKTsgZS5vZmZzZXRXaWR0aCArIGUub2Zmc2V0SGVpZ2h0O2QCAw9kFgICAQ9kFgJmD2QWBAIDD2QWBGYPZBYCAgEPZBYIAgEPDxYGHgpQbGFpblZhbHVlBQQ0RDVoHg5FbmNyeXB0ZWRWYWx1ZQUgMjQxZjdmZGQ3ODIxNGJhYzgyOGNhNzU3ZDY4NWI3Y2IeB1Zpc2libGVoZGQCAw8PFgIfC2hkZAIFDw8WAh8LaGRkAgcPFgIfC2dkAgMPZBYCZg9kFgICAQ8PFgIfAgUK0JTQsNC70LXQtWRkAgUPZBYKZg9kFgRmD2QWAgIBDw8WAh8CBRPQktC40LQg0YPRgdC70YPQs9C4ZGQCAQ9kFgICAQ8QZGQWAGQCAQ9kFgRmD2QWAgIBDw8WAh8CBR7QnNC10YHRgtC+0L3QsNGF0L7QttC00LXQvdC40LVkZAIBD2QWAgIBDxBkZBYAZAICD2QWBGYPZBYCAgEPDxYCHwIFCNCh0YDQvtC6ZGQCAQ9kFgICAQ8QDxYCHgxBdXRvUG9zdEJhY2toFgIeCG9uQ2hhbmdlBR5jYkR6aWVuR29kemluYV9vbkNoYW5nZSh0aGlzKTtkFgBkAgMPZBYEZg9kFgICAQ8PFgIfAgUG0YfQsNGBZGQCAQ9kFgICAQ8QZGQWAGQCBQ9kFgJmD2QWBAIBDw8WAh8CBSTQl9Cw0YDQtdCz0LjRgdGC0YDQuNGA0L7QstCw0YLRjNGB0Y9kZAIDDw8WAh8CBTbQntGC0YHRg9GC0YHRgtCy0LjQtSDRgdCy0L7QsdC+0LTQvdGL0LUg0LTQsNGC0Ysg0LTQviBkZBgBBR1jdGwwMCRjcCRldmVudE9yZGVyVmFsaWRhdGlvbg8PZDKpAwABAAAA/////wEAAAAAAAAADAIAAABGTVNaX1dXV19LTElFTlQsIFZlcnNpb249Mi4xMi4wLjAsIEN1bHR1cmU9bmV1dHJhbCwgUHVibGljS2V5VG9rZW49bnVsbAUBAAAATU1TWl9XV1dfS0xJRU5ULktvbnRyb2xraS5FdmVudE9yZGVyVmFsaWRhdGlvbitFdmVudE9yZGVyVmFsaWRhdGlvbkNvbnRyb2xEYXRhBAAAAAtFdmVudE51bWJlchNFdmVudEV4cGlyYXRpb25UaW1lDVNlY3VyaXR5VG9rZW4MTGljem5pa1Rva2VuAwMB
AAxTeXN0ZW0uSW50MzJxU3lzdGVtLk51bGxhYmxlYDFbW1N5c3RlbS5EYXRlVGltZSwgbXNjb3JsaWIsIFZlcnNpb249NC4wLjAuMCwgQ3VsdHVyZT1uZXV0cmFsLCBQdWJsaWNLZXlUb2tlbj1iNzdhNWM1NjE5MzRlMDg5XV0IAgAAAAgIAAAAAAoGAwAAACBmNjZlZTgwODEwZWM0ZjcwYThhZjY2ZDcyNDlmNWFjZgEAAAALZGrI7rw4FqPtCexAP1+dCQ7Qps1t">

and I am trying to use this regex:

VIEWSTATE = (/<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" \/>/.match body_text)[1]

but it behaves strangely: it fetches only part of the value, not all of it. Which regex should I use in this case? (Note that the part `__VIEWSTATE" id="__VIEWSTATE" value=` is required in the regex.)

It would be great if you could tell me how.

Andy Lester
Valdis Azamaris
  • You should use Nokogiri... Don't use Regex. – Arup Rakshit Sep 19 '13 at 12:53
  • One problem with your `regex` is that it checks for the HTML to end in ` />` but your example simply ends in `>`. – lurker Sep 19 '13 at 12:56
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags), home of [the definitive resource on this topic](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Hank Gay Sep 19 '13 at 12:59
  • Use a proper parser like Nokogiri. Any problems you have getting it working will be nothing compared to the problems you have trying to parse arbitrary HTML with regexes. – Andy Lester Sep 19 '13 at 14:44
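
To illustrate lurker's point about ` />`, here is a minimal sketch in plain Ruby. The `body_text` sample is made up for illustration (a shortened value with a line break in it, as the excerpt in the question appears to have), and the looser pattern is only one possible variant, not a recommendation over a real parser:

```ruby
# Hypothetical sample input: the tag ends in ">" and the value spans two lines.
body_text = <<~HTML
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUL
  AAxTeXN0">
HTML

# Pattern from the question: it requires ` />` at the end, and `.` does not
# match "\n" unless the /m flag is given.
strict = /<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" \/>/
strict_match = strict.match(body_text) # no match for this input

# Looser variant: [^"]* also matches line breaks and stops at the closing quote.
loose = /name="__VIEWSTATE" id="__VIEWSTATE" value="([^"]*)"/
loose_match = loose.match(body_text)
viewstate = loose_match && loose_match[1]
```

The looser pattern still depends on the exact attribute order and quoting, which is why the comments above suggest a parser instead.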

1 Answer

require 'nokogiri'

# Parse the HTML (the heredoc stands in for the real page body)
doc = Nokogiri::HTML::Document.parse <<-eot
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTIwMjYxMTgwOTAPFgIeB">
eot

# using css
nd = doc.at_css('input#__VIEWSTATE')
nd['name']
# => "__VIEWSTATE"
nd['value']
# => "/wEPDwULLTIwMjYxMTgwOTAPFgIeB"

# using xpath (the element is an <input>, not a <div>)
nd = doc.at('//input[@id="__VIEWSTATE"]')
nd['name']
# => "__VIEWSTATE"

This way you can read any attribute of the input element, including `value`.

Arup Rakshit
  • And to be specific, the OP wants to retrieve the `nd['value']` if `nd['name']` and `nd['id']` are both `__VIEWSTATE` (I think, although the intention isn't absolutely clear). – lurker Sep 19 '13 at 12:58
  • Now what? This value is dynamic and is never the same. Will Nokogiri still find it if the value is "MYMYMYMY"? – Valdis Azamaris Sep 19 '13 at 13:00
  • @ValdisAzamaris What you are trying to do is possible with Nokogiri. You can add any logic you want to build. Just parse your HTML using Nokogiri. – Arup Rakshit Sep 19 '13 at 13:02
  • and what? doc = Nokogiri::HTML::Document.parse <<-eot eot i see empty nd = doc.at_css('input') puts "wakawaka" puts nd['value'] puts it doesn't work! – Valdis Azamaris Sep 19 '13 at 13:06
  • Hello! What do you think? Here is my code: https://dl.dropboxusercontent.com/u/59666091/cap/text1.html Does this work for you? For me it doesn't! – Valdis Azamaris Sep 19 '13 at 13:16
  • I can't test right now, as Nokogiri is not installed on my current machine. But it will work as I wrote it. Once I switch to my PC where Nokogiri is installed, I will test it. But I am sure it will work... :) – Arup Rakshit Sep 19 '13 at 13:19
  • Maybe it is better to fetch by id? – Valdis Azamaris Sep 19 '13 at 13:28
  • Not exactly, maybe something like `doc.at('//div[(@id,"__VIEWSTATE")]').value`? – Valdis Azamaris Sep 19 '13 at 13:32
  • @ValdisAzamaris You wrote the xpath. It should be `doc.at('//div[@id ="__VIEWSTATE"]')['value']` ..That's it. – Arup Rakshit Sep 19 '13 at 13:34
  • all from getting document, than to some variable put this value doc.at('//div[@id ="__VIEWSTATE"]')['value'] – Valdis Azamaris Sep 19 '13 at 13:48