
Possible Duplicate:
Grabbing the href attribute of an A element

I would like to know how to parse event attributes using PHP's DOM extension. For example:

<body onload="javascript:PopWin('http://google.com')">

I need to get the link inside the onload event attribute. Is it possible?

I don't want to use preg_match to parse the entire HTML. With DOMDocument, we can get other attributes such as "src" and "href" using getAttribute('src') or getAttribute('href'). Is there a similar way to get an event attribute? Any link that appears in the "onload" event should be caught.

Thanks.

curious
  • Using DOM PHP? I'm not sure I understand you, but are you basically trying to parse the content of a page and retrieve whatever is between `PopWin('` and the first occurrence of `')"> `? – Sam Janssens Sep 11 '12 at 07:26

1 Answer


There is no method in the PHP DOM API that will give you the URL from the onload attribute, so you have to use a method like the one I suggest below (or similar). But first, get the attribute:

$html = "<body onload=\"javascript:PopWin('http://google.com')\"></body>";

$doc = new DOMDocument();
$doc->loadHTML($html);

// getElementsByTagName() returns a DOMNodeList; item(0) is the first <body> element.
$bodyElements = $doc->getElementsByTagName("body");
$body = $bodyElements->item(0);

$attribute = $body->getAttribute('onload');

echo $attribute; // outputs: javascript:PopWin('http://google.com')

Once you have that, you can use a simple regular expression to extract the URL:

(?:.+?)(https?://[\w\d.&?=]+)(?:.+?)

Like this:

$matches = array();
preg_match('`(?:.+?)(?<url>https?://[\w\d.&?=]+)(?:.+?)`', $attribute, $matches);

echo $matches['url']; // outputs: http://google.com
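
The two steps above can be combined and generalised to every event attribute, not just onload. The following is only a minimal sketch: the function name `extractEventUrls` and the URL pattern (everything after `http://` or `https://` up to whitespace, a quote, or a closing parenthesis) are assumptions made for illustration, not part of the DOM API. Unlike the pattern above, this one also keeps paths and query strings.

```php
<?php
// Hypothetical helper: collect URLs found in any on* event attribute.
function extractEventUrls(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them
    // for imperfect real-world HTML.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $found = [];
    foreach ($doc->getElementsByTagName('*') as $element) {
        foreach ($element->attributes as $attr) {
            // Event attributes (onload, onclick, ...) all start with "on".
            if (stripos($attr->name, 'on') !== 0) {
                continue;
            }
            // Assumed URL pattern: stop at whitespace, a quote, or ")".
            if (preg_match_all('~https?://[^\s\'")]+~i', $attr->value, $m)) {
                foreach ($m[0] as $url) {
                    $found[] = [
                        'tag'   => $element->tagName,
                        'event' => $attr->name,
                        'url'   => $url,
                    ];
                }
            }
        }
    }
    return $found;
}

$html = "<body onload=\"javascript:PopWin('http://google.com')\">"
      . "<a href=\"#\" onclick=\"open('https://example.com/a?b=1')\">x</a></body>";
print_r(extractEventUrls($html));
```

The DOM part only hands back the raw attribute string; the regular expression still does the actual URL extraction, so the pattern may need adjusting for URLs that contain quotes or parentheses.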
Michael
  • No, I don't want to use preg_match to parse the entire HTML. With DOMDocument, we can get other attributes such as "src" and "href" using getAttribute('src') or getAttribute('href'). Is there a similar way to get an event attribute? Any link that appears in the onload event should be caught. – curious Sep 11 '12 at 07:35
  • @curious I've updated the answer now that I know you want to use DOMDocument alone and don't want to parse the _raw_ HTML. Note that the DOM API cannot give you the URL that you want; you have to parse the string yourself, as I suggest or in a similar way. – Michael Sep 11 '12 at 07:51
  • In the link I added you can see several examples of how to extract the `onload` attribute. – Michael Sep 11 '12 at 07:52
  • Sorry if I am being too stupid, but my question itself is how do I get the attribute "onload"? I have tried using getElementsByTagName('body'), but none of its nodes gives onload=... So how do I get the first line you mentioned? $attribute = "javascript:PopWin('https://google.com')"; – curious Sep 11 '12 at 07:59
  • No problem -- updated example – Michael Sep 11 '12 at 08:11
  • It seems to work in codepad (http://codepad.org/x1mxXgdr), which means this is the solution, but for me it fails with the error "Call to undefined method DOMElement::item()". It looks like I need to sort that out myself. Thanks a lot for the fast and furious solution... – curious Sep 11 '12 at 08:39
  • How come it doesn't return the onload attribute in a real page? http://codepad.org/8WIgdsuu – curious Sep 11 '12 at 09:15
  • @curious You've made a slight mistake. loadHTML takes a string of HTML, not a URL. You have to download the document you point to and then process it, or you can use [DOMDocument::load](http://www.php.net/manual/en/domdocument.load.php); check the example by _Jonas Due Vesterheden_. Note: this does not work on codepad because they have disabled that feature. – Michael Sep 11 '12 at 09:47
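
The last two problems in this thread (passing a URL to loadHTML, and calling item() on the wrong object) can be illustrated with a short sketch. The helper name `bodyOnload` is hypothetical, and fetching a remote URL with `file_get_contents` assumes that `allow_url_fopen` is enabled; the demo uses a temporary local file so that it runs without network access.

```php
<?php
// Hypothetical helper: read the onload attribute of <body> from a file path or URL.
function bodyOnload(string $source): ?string
{
    // loadHTML() parses markup, not locations, so fetch the content first.
    $html = @file_get_contents($source);
    if ($html === false || $html === '') {
        return null;
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    // getElementsByTagName() returns a DOMNodeList; item() belongs to that
    // list, not to the DOMElement it yields.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null || !$body->hasAttribute('onload')) {
        return null;
    }
    return $body->getAttribute('onload');
}

// Demo with a temporary local file standing in for a downloaded page.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, "<html><body onload=\"PopWin('http://google.com')\"></body></html>");
var_dump(bodyOnload($path));
unlink($path);
```

Calling `item()` on the element returned by `item(0)` rather than on the list is one way to trigger the "Call to undefined method DOMElement::item()" error mentioned above.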