
I would like to use PHP to download a file by its file type, without knowing the file's name or location in advance. Is this possible? The file is opened/included by a separate HTML page that streams it.

The script I want to write should find the URLs of all files of a certain file type that are streamed/opened/included (sorry, I don't know the correct term) by that HTML page.

So the PHP script will have to open the page, determine which files the page is streaming, and select the file with the correct file type. That is the part I do not know how to write, or whether it is even possible.

If this does not explain the problem well enough, I will add more.

hakre
John Johnson
  • I think you need to clarify the problem a bit. Could you lay out a use case? E.g., on `a.html` User A is doing [Action1] while on `b.html` Admin is seeing the result of User A's action. (I think this is kind of what you're getting at?) – garromark May 12 '12 at 17:16
  • Basically, an HTML page streams a certain type of file that I want. I need to get that file, but I do not know the file's name because it is randomly generated and changes. I was thinking there may be a way in PHP to list the files being streamed by a page and then filter them by file type to find the one I want. – John Johnson May 12 '12 at 17:48
  • Do you think "file_get_contents" would work for this? I understand this is a strange use of PHP, but because of the interactivity with the web page I think PHP is the best choice. – John Johnson May 12 '12 at 17:53

2 Answers


If you're looking for a solution that works for one specific HTML page, then you should just process that page's HTML manually and extract what you need.
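For the single-page case, a minimal sketch of "process the HTML and extract what you need" could look like the following. It assumes the DOM extension is available and that the wanted URL appears literally in a `src`, `href`, or `data` attribute; the function name `findUrlsByExtension` and the sample markup are made up for illustration. In practice `$html` might come from `file_get_contents($pageUrl)`.

```php
<?php
// Hypothetical helper: collect URLs found in src/href/data attributes of an
// HTML string and keep those whose path ends in the wanted extension.
function findUrlsByExtension(string $html, string $extension): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $matches = [];
    foreach ($xpath->query('//@src | //@href | //@data') as $attr) {
        // Look only at the path so query strings do not hide the extension.
        $path = parse_url($attr->nodeValue, PHP_URL_PATH);
        if (!is_string($path)) {
            continue;
        }
        if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) === strtolower($extension)) {
            $matches[] = $attr->nodeValue;
        }
    }
    return array_values(array_unique($matches));
}

// Sample markup (invented for this sketch).
$html = '<html><body><audio src="/media/a8f3.mp3"></audio>'
      . '<a href="http://example.com/doc.pdf?x=1">doc</a>'
      . '<img src="logo.png"></body></html>';

print_r(findUrlsByExtension($html, 'mp3'));
```

This only sees URLs that are present in the static markup. URLs assembled by JavaScript or a plugin at run time will not show up this way.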

If you want a general solution that works for the majority of web pages, I think the only really good solution is to hook into the raw network data, for example via a packet sniffer, and have your PHP script tell a web browser installed on your server which web page to load. The reason is that media streaming often involves browser plugins (Java, Flash, Silverlight, etc.) or JavaScript. For the most part, these need to be executed before the desired URL is produced: the URL might be assembled from fragments of data, or it might be located in an external file that has to be downloaded first. The possibilities are numerous.

But the web browser can figure it all out. Just let it start streaming for a second while you watch via the packet sniffer for a MIME type of interest; the URL for it will be in the captured data, already parsed and ready for you.

On second thought, forget the packet sniffer and just write a browser extension. If Firebug can see all the URLs being requested, so can your extension. Firebug is open source too, so you can see how it does this. Have your extension send the URLs to a local file or script.
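On the receiving side, the PHP part can stay small. The sketch below assumes the extension writes one request per line as `URL<TAB>Content-Type`; that log format, the function name `filterLogByMimeType`, and the sample URLs are all invented for illustration, not something an existing extension is known to produce.

```php
<?php
// Assumed (hypothetical) log format: one request per line, "URL<TAB>Content-Type".
function filterLogByMimeType(string $log, string $wantedType): array
{
    $urls = [];
    foreach (preg_split('/\R/', $log, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $parts = explode("\t", $line, 2);
        if (count($parts) !== 2) {
            continue; // skip malformed lines
        }
        [$url, $contentType] = $parts;
        // Drop parameters such as "; charset=utf-8" before comparing.
        $type = strtolower(trim(explode(';', $contentType, 2)[0]));
        if ($type === strtolower($wantedType)) {
            $urls[] = trim($url);
        }
    }
    return $urls;
}

// Sample log contents (invented for this sketch).
$log = "http://example.com/page.html\ttext/html; charset=utf-8\n"
     . "http://example.com/stream/x9k2\taudio/mpeg\n"
     . "http://example.com/style.css\ttext/css\n";

print_r(filterLogByMimeType($log, 'audio/mpeg'));
```

Matching on the reported content type rather than the file name is what makes this work when the name is randomly generated.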

FYI: this is a task for a seasoned programmer, not a beginner.

goat
  • "This is a task for a seasoned programmer, not a beginner." Yeah, that's why I thought that with PHP I could interact with the web page and keep the programming simple. However, I think I will either take your advice and make a browser add-on or try Python, since this is not what PHP is designed for and I need something more flexible. Thanks. – John Johnson Jun 02 '12 at 14:03

The best way to determine the file type is by examining the first few bytes of a file, referred to as magic bytes.

<?php
    $file = "/path/to/file";

    // Both styles need the fileinfo extension (bundled since PHP 5.3).
    // FILEINFO_MIME_TYPE returns only the type; FILEINFO_MIME would also
    // append the charset, e.g. "image/jpeg; charset=binary".

    // Procedural style:
    $fhandle = finfo_open(FILEINFO_MIME_TYPE);
    $mime_type = finfo_file($fhandle, $file); // e.g. gives "image/jpeg"

    // Object-oriented style:
    $file_info = new finfo(FILEINFO_MIME_TYPE);
    $mime_type = $file_info->buffer(file_get_contents($file)); // e.g. gives "image/jpeg"

    switch ($mime_type) {
        case "image/jpeg":
            // your actions go here...
            break;
    }
?>
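To make the magic-bytes idea concrete without the fileinfo extension, here is a minimal hand-rolled sketch. The function name `sniffMimeType` is hypothetical and the signature table covers only a handful of well-known formats; fileinfo recognises far more and should be preferred when it is available.

```php
<?php
// Hypothetical helper: identify a few formats from their leading bytes.
// Returns null when none of the known signatures match.
function sniffMimeType(string $bytes): ?string
{
    $signatures = [
        "\xFF\xD8\xFF"      => 'image/jpeg',
        "\x89PNG\r\n\x1A\n" => 'image/png',
        'GIF87a'            => 'image/gif',
        'GIF89a'            => 'image/gif',
        '%PDF-'             => 'application/pdf',
        "PK\x03\x04"        => 'application/zip',
        'ID3'               => 'audio/mpeg', // MP3 that starts with an ID3v2 tag
    ];
    foreach ($signatures as $magic => $type) {
        // Compare only as many bytes as the signature is long.
        if (strncmp($bytes, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return null;
}

// Only the first few bytes are needed, so a partial read is enough.
var_dump(sniffMimeType("%PDF-1.4\n"));
```

For a remote stream this means the first bytes have to be downloaded before the type can be checked, so it complements rather than replaces finding the URL.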

Here is an array that maps common file extensions to MIME types:

$mime_types = array("323" => "text/h323",
"acx" => "application/internet-property-stream",
"ai" => "application/postscript",
"aif" => "audio/x-aiff",
"aifc" => "audio/x-aiff",
"aiff" => "audio/x-aiff",
"asf" => "video/x-ms-asf",
"asr" => "video/x-ms-asf",
"asx" => "video/x-ms-asf",
"au" => "audio/basic",
"avi" => "video/x-msvideo",
"axs" => "application/olescript",
"bas" => "text/plain",
"bcpio" => "application/x-bcpio",
"bin" => "application/octet-stream",
"bmp" => "image/bmp",
"c" => "text/plain",
"cat" => "application/vnd.ms-pkiseccat",
"cdf" => "application/x-cdf",
"cer" => "application/x-x509-ca-cert",
"class" => "application/octet-stream",
"clp" => "application/x-msclip",
"cmx" => "image/x-cmx",
"cod" => "image/cis-cod",
"cpio" => "application/x-cpio",
"crd" => "application/x-mscardfile",
"crl" => "application/pkix-crl",
"crt" => "application/x-x509-ca-cert",
"csh" => "application/x-csh",
"css" => "text/css",
"dcr" => "application/x-director",
"der" => "application/x-x509-ca-cert",
"dir" => "application/x-director",
"dll" => "application/x-msdownload",
"dms" => "application/octet-stream",
"doc" => "application/msword",
"dot" => "application/msword",
"dvi" => "application/x-dvi",
"dxr" => "application/x-director",
"eps" => "application/postscript",
"etx" => "text/x-setext",
"evy" => "application/envoy",
"exe" => "application/octet-stream",
"fif" => "application/fractals",
"flr" => "x-world/x-vrml",
"gif" => "image/gif",
"gtar" => "application/x-gtar",
"gz" => "application/x-gzip",
"h" => "text/plain",
"hdf" => "application/x-hdf",
"hlp" => "application/winhlp",
"hqx" => "application/mac-binhex40",
"hta" => "application/hta",
"htc" => "text/x-component",
"htm" => "text/html",
"html" => "text/html",
"htt" => "text/webviewhtml",
"ico" => "image/x-icon",
"ief" => "image/ief",
"iii" => "application/x-iphone",
"ins" => "application/x-internet-signup",
"isp" => "application/x-internet-signup",
"jfif" => "image/pipeg",
"jpe" => "image/jpeg",
"jpeg" => "image/jpeg",
"jpg" => "image/jpeg",
"js" => "application/x-javascript",
"latex" => "application/x-latex",
"lha" => "application/octet-stream",
"lsf" => "video/x-la-asf",
"lsx" => "video/x-la-asf",
"lzh" => "application/octet-stream",
"m13" => "application/x-msmediaview",
"m14" => "application/x-msmediaview",
"m3u" => "audio/x-mpegurl",
"man" => "application/x-troff-man",
"mdb" => "application/x-msaccess",
"me" => "application/x-troff-me",
"mht" => "message/rfc822",
"mhtml" => "message/rfc822",
"mid" => "audio/mid",
"mny" => "application/x-msmoney",
"mov" => "video/quicktime",
"movie" => "video/x-sgi-movie",
"mp2" => "video/mpeg",
"mp3" => "audio/mpeg",
"mpa" => "video/mpeg",
"mpe" => "video/mpeg",
"mpeg" => "video/mpeg",
"mpg" => "video/mpeg",
"mpp" => "application/vnd.ms-project",
"mpv2" => "video/mpeg",
"ms" => "application/x-troff-ms",
"mvb" => "application/x-msmediaview",
"nws" => "message/rfc822",
"oda" => "application/oda",
"p10" => "application/pkcs10",
"p12" => "application/x-pkcs12",
"p7b" => "application/x-pkcs7-certificates",
"p7c" => "application/x-pkcs7-mime",
"p7m" => "application/x-pkcs7-mime",
"p7r" => "application/x-pkcs7-certreqresp",
"p7s" => "application/x-pkcs7-signature",
"pbm" => "image/x-portable-bitmap",
"pdf" => "application/pdf",
"pfx" => "application/x-pkcs12",
"pgm" => "image/x-portable-graymap",
"pko" => "application/vnd.ms-pkipko",
"pma" => "application/x-perfmon",
"pmc" => "application/x-perfmon",
"pml" => "application/x-perfmon",
"pmr" => "application/x-perfmon",
"pmw" => "application/x-perfmon",
"pnm" => "image/x-portable-anymap",
"pot" => "application/vnd.ms-powerpoint",
"ppm" => "image/x-portable-pixmap",
"pps" => "application/vnd.ms-powerpoint",
"ppt" => "application/vnd.ms-powerpoint",
"prf" => "application/pics-rules",
"ps" => "application/postscript",
"pub" => "application/x-mspublisher",
"qt" => "video/quicktime",
"ra" => "audio/x-pn-realaudio",
"ram" => "audio/x-pn-realaudio",
"ras" => "image/x-cmu-raster",
"rgb" => "image/x-rgb",
"rmi" => "audio/mid",
"roff" => "application/x-troff",
"rtf" => "application/rtf",
"rtx" => "text/richtext",
"scd" => "application/x-msschedule",
"sct" => "text/scriptlet",
"setpay" => "application/set-payment-initiation",
"setreg" => "application/set-registration-initiation",
"sh" => "application/x-sh",
"shar" => "application/x-shar",
"sit" => "application/x-stuffit",
"snd" => "audio/basic",
"spc" => "application/x-pkcs7-certificates",
"spl" => "application/futuresplash",
"src" => "application/x-wais-source",
"sst" => "application/vnd.ms-pkicertstore",
"stl" => "application/vnd.ms-pkistl",
"stm" => "text/html",
"svg" => "image/svg+xml",
"sv4cpio" => "application/x-sv4cpio",
"sv4crc" => "application/x-sv4crc",
"t" => "application/x-troff",
"tar" => "application/x-tar",
"tcl" => "application/x-tcl",
"tex" => "application/x-tex",
"texi" => "application/x-texinfo",
"texinfo" => "application/x-texinfo",
"tgz" => "application/x-compressed",
"tif" => "image/tiff",
"tiff" => "image/tiff",
"tr" => "application/x-troff",
"trm" => "application/x-msterminal",
"tsv" => "text/tab-separated-values",
"txt" => "text/plain",
"uls" => "text/iuls",
"ustar" => "application/x-ustar",
"vcf" => "text/x-vcard",
"vrml" => "x-world/x-vrml",
"wav" => "audio/x-wav",
"wcm" => "application/vnd.ms-works",
"wdb" => "application/vnd.ms-works",
"wks" => "application/vnd.ms-works",
"wmf" => "application/x-msmetafile",
"wps" => "application/vnd.ms-works",
"wri" => "application/x-mswrite",
"wrl" => "x-world/x-vrml",
"wrz" => "x-world/x-vrml",
"xaf" => "x-world/x-vrml",
"xbm" => "image/x-xbitmap",
"xla" => "application/vnd.ms-excel",
"xlc" => "application/vnd.ms-excel",
"xlm" => "application/vnd.ms-excel",
"xls" => "application/vnd.ms-excel",
"xlt" => "application/vnd.ms-excel",
"xlw" => "application/vnd.ms-excel",
"xof" => "x-world/x-vrml",
"xpm" => "image/x-xpixmap",
"xwd" => "image/x-xwindowdump",
"z" => "application/x-compress",
"zip" => "application/zip");

See also: Best way to recognize a filetype in php
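As a rough illustration of how the extension map above could be used on a URL, here is a short sketch. It repeats a three-entry excerpt of the array so that it runs on its own; the helper name `guessMimeTypeFromUrl` and the sample URL are made up. Keep in mind that an extension is only a hint, since a server can serve any content under any name.

```php
<?php
// Small excerpt of the map above, repeated so the sketch is self-contained.
$mime_types = array(
    "jpg" => "image/jpeg",
    "mp3" => "audio/mpeg",
    "pdf" => "application/pdf",
);

// Hypothetical helper: guess a MIME type from the extension in a URL or path.
function guessMimeTypeFromUrl(string $url, array $map, string $default = "application/octet-stream"): string
{
    // Use only the path component so query strings are ignored.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return $default;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $map[$ext] ?? $default;
}

echo guessMimeTypeFromUrl("http://example.com/media/Track.MP3?token=abc", $mime_types), "\n";
```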

Community
Mohammed Ahmed
  • Thank you, this helps solve part of the problem and is very useful. The remaining question is how to find the files being streamed by a page. – John Johnson May 12 '12 at 17:43