I have a list of divs like this:

<div align=center><object><embed src='http://www.GamesForWork.com/games/swf/Rodent Tree Jump january 4th 2007.swf' quality='autohigh' wmode='direct' width='640' height='400' name='gameObj' align='middle' allowScriptAccess='always' allowFullScreen='false' type='application/x-shockwave-flash' pluginspage='http://www.adobe.com/go/getflashplayer'/></object><br><font face=verdana size=1><a href='http://www.gamesforwork.com/' target='_blank'>10 daily games at gamesforwork.com</a></font></div>

I am talking about more than 800 divs like that. I want to extract the links to the swf files. For example, from the code above I want to extract this link:

http://www.GamesForWork.com/games/swf/Rodent Tree Jump january 4th 2007.swf

I tried to use strstr and strpos, but without success:

if (strpos($result, "<embed src='") !== false) {
        strstr($result, "<embed src='");
    }

It neither removes the embed part nor gives me the rest of the string that follows it. Sorry for my bad English.

Amanda

2 Answers

I recommend using DOM to parse XML-structured data such as this HTML:

$html ="<div align=center><object><embed src='http://www.GamesForWork.com/games/swf/Rodent Tree Jump january 4th 2007.swf' quality='autohigh' wmode='direct' width='640' height='400' name='gameObj' align='middle' allowScriptAccess='always' allowFullScreen='false' type='application/x-shockwave-flash' pluginspage='http://www.adobe.com/go/getflashplayer'/></object><br><font face=verdana size=1><a href='http://www.gamesforwork.com/' target='_blank'>10 daily games at gamesforwork.com</a></font></div>
";

$dom = new DOMDocument;
@$dom->loadHTML($html);

// get all embed elements 
$links = $dom->getElementsByTagName('embed');

//Iterate over the extracted elements and display their URLs
foreach ($links as $link){
    //Extract and show the "src" attribute.
    echo $link->getAttribute('src'), "\n";
}

You can try it here: https://3v4l.org/V336E
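For a page with hundreds of such divs, the same idea can be wrapped in a small function. This is only a sketch: the name `extractEmbedSrcs` and the example.com URLs are made up for illustration, and it uses `libxml_use_internal_errors()` instead of `@` to keep parser warnings about old-style markup quiet.

```php
<?php

// Hypothetical helper (not part of the answer above): collects the src
// attribute of every <embed> element found in an HTML string.
function extractEmbedSrcs(string $html): array
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($dom->getElementsByTagName('embed') as $embed) {
        $src = $embed->getAttribute('src');
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}

// Example input with two divs; the URLs are placeholders.
$html = "<div><object><embed src='http://example.com/a b.swf'/></object></div>"
      . "<div><object><embed src='http://example.com/c.swf'/></object></div>";

foreach (extractEmbedSrcs($html) as $url) {
    echo $url, "\n";
}
```

Returning an array rather than echoing inside the loop makes it easier to pass the URLs on to a later step.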

vuliad
  • Thank you so much, my friend! Another question: is it possible to download all the links I extracted (I mean the swf files) to my own server and put them in a zip archive? – Amanda Aug 02 '18 at 05:51
  • Yes, you can easily download a file with http://php.net/manual/en/function.file-get-contents.php or http://php.net/manual/en/book.curl.php – vuliad Aug 02 '18 at 06:13
  • For zipping, the easiest way is a shell command, something like `zip swfs.zip 1.swf 2.swf 3.swf`, but if you prefer a pure PHP way you can look at this great library: http://php.net/manual/ru/ziparchive.addfile.php – vuliad Aug 02 '18 at 06:17
  • I tried to use this: https://stackoverflow.com/questions/17708562/zip-all-files-in-directory-and-download-generated-zip It does not download the files... – Amanda Aug 02 '18 at 06:19
  • First you have to download your file locally, with something like `file_put_contents("1.swf", fopen("http://someurl/file.swf", 'r'));`, and only then archive it using the local path "1.swf" – vuliad Aug 02 '18 at 06:33
  • Your code is so helpful, but I have a problem. I managed to download 1 file with it, but when I loop over a list of URLs only 1 file is downloaded. What could be the problem? – Amanda Aug 02 '18 at 06:54
  • I managed to download multiple files, but the files weigh 1KB, which means the swf files that the server downloaded are empty – Amanda Aug 02 '18 at 07:25
  • Show the contents of these files; maybe there is no access? – vuliad Aug 02 '18 at 07:38
  • When I open the direct link to the file, it downloads the full file; when I do it with my foreach loop, it downloads 1KB files @vuliad – Amanda Aug 02 '18 at 07:43
  • Maybe it's protected somehow, for example with cookies. We have to read the full server response: what's in this 1KB? – vuliad Aug 02 '18 at 07:57
  • I have fixed the 1KB problem; it was an error in reading the file URL. Now I have another problem: some links point to files that are not in an embed / iframe / whatever, only a plain link to the file. I added an if statement to check whether the code includes embed, and nothing happens for those; only links that have embed get downloaded. – Amanda Aug 02 '18 at 08:09
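
The download-then-zip steps discussed in these comments could look roughly like the sketch below. The function names `encodeSpaces` and `downloadAndZip` are hypothetical, and two things are assumptions rather than facts from the thread: that the spaces in the URLs need to be percent-encoded before the request, and that the `zip` extension (`ZipArchive`) is installed. It adds each download straight to the archive with `addFromString` instead of writing a local file first.

```php
<?php

// Hypothetical helper: percent-encode spaces so the URL can be requested.
// Assumption: spaces are the only characters in these URLs that need encoding.
function encodeSpaces(string $url): string
{
    return str_replace(' ', '%20', $url);
}

// Hypothetical helper: downloads every URL into one zip archive.
// Returns the list of URLs that could not be downloaded.
function downloadAndZip(array $urls, string $zipPath): array
{
    $zip = new ZipArchive;
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $zipPath");
    }

    $failed = [];
    foreach ($urls as $url) {
        // @ hides the warning on failure; the false return value is checked instead.
        $data = @file_get_contents(encodeSpaces($url));
        if ($data === false) {
            $failed[] = $url;
            continue;
        }
        // Name the entry after the file name in the URL path.
        // Two URLs with the same file name would overwrite each other.
        $name = basename((string) parse_url($url, PHP_URL_PATH));
        $zip->addFromString($name, $data);
    }

    $zip->close();
    return $failed;
}

// Usage (not executed here because it needs network access):
// $failed = downloadAndZip($urls, 'swfs.zip');
```

Collecting the failed URLs instead of stopping at the first error makes it easier to see which of the 800 links need another look.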

You can use regex to filter swf links.

<?php 

$html ="<div align=center><object><embed src='http://www.GamesForWork.com/games/swf/Rodent Tree Jump january 4th 2007.swf' quality='autohigh' wmode='direct' width='640' height='400' name='gameObj' align='middle' allowScriptAccess='always' allowFullScreen='false' type='application/x-shockwave-flash' pluginspage='http://www.adobe.com/go/getflashplayer'/></object><br><font face=verdana size=1><a href='http://www.gamesforwork.com/' target='_blank'>10 daily games at gamesforwork.com</a></font></div><div align=center><object><embed src='http://www.GamesForWork.com/games/swf/Rodent Tree Jump january 4th 2008.swf' quality='autohigh' wmode='direct' width='640' height='400' name='gameObj' align='middle' allowScriptAccess='always' allowFullScreen='false' type='application/x-shockwave-flash' pluginspage='http://www.adobe.com/go/getflashplayer'/></object><br><font face=verdana size=1><a href='http://www.gamesforwork.com/' target='_blank'>10 daily games at gamesforwork.com</a></font></div><div align=center><object><embed src='http://www.GamesForWork.com/games/swf/Rodent Tree Jump january 4th 9999.swf' quality='autohigh' wmode='direct' width='640' height='400' name='gameObj' align='middle' allowScriptAccess='always' allowFullScreen='false' type='application/x-shockwave-flash'></div>
";

$matches = [];

preg_match_all('/embed\s.*?src=[\'\"](.+?\.swf)/',$html,$matches);

print_r($matches[1]); // index 1 holds the first capture group (the URLs); index 0 holds the full matches

OUTPUT

Array
(
    [0] => http://www.GamesForWork.com/games/swf/Rodent Tree Jump january 4th 2007.swf
    [1] => http://www.GamesForWork.com/games/swf/Rodent Tree Jump january 4th 2008.swf
    [2] => http://www.GamesForWork.com/games/swf/Rodent Tree Jump january 4th 9999.swf
)
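
If some .swf links sit in attributes other than an embed src (plain links, for example), a looser pattern that ignores the tag name may help. This is only a sketch: the name `extractSwfUrls` is hypothetical, it assumes every URL is wrapped in matching single or double quotes, and like any regex approach it can be fooled by unusual markup.

```php
<?php

// Hypothetical helper: returns every quoted value ending in .swf,
// whatever tag or attribute holds it.
function extractSwfUrls(string $html): array
{
    // ([\'"]) captures the opening quote; \1 requires the same closing quote.
    // [^\'"]+ keeps the match inside a single quoted value.
    preg_match_all('/([\'"])([^\'"]+\.swf)\1/i', $html, $matches);

    // Group 2 holds the URLs; drop duplicates and reindex.
    return array_values(array_unique($matches[2]));
}

// Example input; the URLs are placeholders.
$html = "<embed src='http://example.com/a b.swf' width='1'>"
      . "<a href=\"http://example.com/c.swf\">c</a>"
      . "<a href='http://example.com/page.html'>p</a>";

print_r(extractSwfUrls($html));
```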
nice_dev
  • Parsing XML with regex? You are doing it wrong. You probably also brush your teeth with a broomstick? – Mike Doe Aug 02 '18 at 06:51
  • But where did he mention XML? – nice_dev Aug 02 '18 at 06:52
  • `You probably also brush your teeth with a broomstick?` wasn't necessary. – nice_dev Aug 02 '18 at 06:54
  • HTML is XML, mate. I wanted to emphasize that you should use the proper tools for the job. @vuliad provided the most solid answer. Anyway, your regex is invalid too, since it won't work if there's anything between the `embed` and `src`. – Mike Doe Aug 02 '18 at 06:54
  • I wasn't aware of that, because XML usually has user-defined tags and HTML doesn't. Anyway, of course there is a better way to deal with it then, but regex isn't completely wrong. It is just not the best way. – nice_dev Aug 02 '18 at 06:57
  • `it won't work if there's anything between the embed and src`, fixed it. – nice_dev Aug 02 '18 at 07:00
  • It won't work with ``. This is why you don't do things like that with regex. – Mike Doe Aug 02 '18 at 07:42
  • I can edit the regex and still make it work. But yes, I see your point. Thanks :) – nice_dev Aug 02 '18 at 07:51