I'm currently working on a project where one PHP script grabs an index file from ftp://ftp.sec.gov and loads all the company information into a database. A second PHP script then grabs the raw text file for a filing from the SEC and saves it locally for processing.
An example of the raw text file can be found here:
ftp://ftp.sec.gov/edgar/data/2488/0000002488-15-000028.txt
An example of what the final result should look like can be found here: http://www.sec.gov/Archives/edgar/data/1084869/000143774915020024/flws20150927_10q.htm
The goal is to present each filing in a formatted way, just like many companies do, but I can't figure out how to do this reliably for every filing. Some filings seem to contain XML, while others seem to contain HTML.
How would I be able to reliably produce the formatted version of the raw text files?
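To make the HTML-versus-XML observation concrete, here is a rough sketch of the kind of check I have in mind. The function name guess_body_kind is made up for illustration, and the rule itself (look for an html tag first, then for an XML declaration) is only an assumption about what the document bodies look like, not something taken from SEC documentation.

```php
<?php
// Hypothetical helper: guess what kind of markup a document body contains.
// This is a heuristic under the assumptions stated above, not a validator.
function guess_body_kind(string $body): string
{
    // Check for HTML first, because XHTML bodies can start with an XML declaration.
    if (stripos($body, '<html') !== false) {
        return 'html';
    }
    // No html tag, but an XML declaration is present: treat the body as XML.
    if (stripos($body, '<?xml') !== false) {
        return 'xml';
    }
    // Anything else is treated as plain text.
    return 'text';
}
```

Something like guess_body_kind($html) could then decide whether a body goes to wkhtmltopdf or needs different handling.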
This is the code I currently have:
<?php
$db_hostname = "localhost";
$db_username = "username";
$db_password = "password";
$db_database = "database";
// The old mysql_* functions were removed in PHP 7, so this uses mysqli instead.
$db_server = mysqli_connect($db_hostname, $db_username, $db_password, $db_database);
if (!$db_server) die("Unable to connect to MySQL: " . mysqli_connect_error());
$query = "SELECT * FROM company WHERE company = '1 800 FLOWERS COM INC' AND date = '2015-08-06'";
$result = mysqli_query($db_server, $query);
if (!$result) die("Query failed: " . mysqli_error($db_server));
$row = mysqli_fetch_row($result);
if (!$row) die("No matching row found in the company table\n");
$file = "ftp://ftp.sec.gov/" . $row[4];
$text = file_get_contents($file);
if ($text === false) {
    echo "error downloading file $row[4]\n";
    // continue is only valid inside a loop; stop the script here instead
    exit(1);
}
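$running = ''; // start empty so the first append is well defined
$tarray = explode('<SEQUENCE>', $text);
for ($i = 1; $i < count($tarray); $i++) {
    // strstr is case-sensitive, so a lowercase <html> tag is not matched here
    $a = strstr($tarray[$i], '<HTML>');
    if ($a === false) continue; // there is no HTML document in this sequence
    $html = strstr($a, '</HTML>', true);
    if ($html === false) continue; // opening tag without a closing tag; skip it
    $running .= $html . "</HTML>";
}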
$temp = "cache.htm";
file_put_contents($temp, $running);
$name = $row[0] . "-" . $row[3] . ".pdf";
$name = str_replace(' ', '_', $name);
//$content = file_get_contents($row[2] . "-" . $row[1] . ".htm");
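// quote both arguments so unusual characters in the company name cannot break the command
exec("D://wkhtmltopdf/bin/wkhtmltopdf.exe " . escapeshellarg($temp) . " " . escapeshellarg($name));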
unlink($temp);
//echo($row[0] . " created");
?>
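As a sketch of a possibly more structured way to pull the individual documents out of the raw file than splitting on <SEQUENCE>: this assumes every document sits in a <DOCUMENT>...</DOCUMENT> block with <TYPE> and <FILENAME> header lines and a <TEXT>...</TEXT> body, which is how the full-submission files appear to be laid out. split_submission is a hypothetical name, and the layout assumption has not been checked against every filing.

```php
<?php
// Hypothetical helper: split a raw submission into its documents.
// Assumes <DOCUMENT>...</DOCUMENT> blocks with <TYPE>, <FILENAME> and <TEXT> tags.
function split_submission(string $raw): array
{
    $docs = [];
    $offset = 0;
    while (($start = strpos($raw, '<DOCUMENT>', $offset)) !== false) {
        $end = strpos($raw, '</DOCUMENT>', $start);
        if ($end === false) {
            break; // unterminated block: stop rather than guess
        }
        $block = substr($raw, $start, $end - $start);
        $offset = $end + strlen('</DOCUMENT>');

        $doc = ['type' => null, 'filename' => null, 'body' => ''];
        // Header tags have no closing tag; the value runs to the end of the line.
        if (preg_match('~<TYPE>([^\r\n]*)~', $block, $m)) {
            $doc['type'] = trim($m[1]);
        }
        if (preg_match('~<FILENAME>([^\r\n]*)~', $block, $m)) {
            $doc['filename'] = trim($m[1]);
        }
        // The body is everything between the first <TEXT> and the last </TEXT>.
        $textStart = strpos($block, '<TEXT>');
        $textEnd = strrpos($block, '</TEXT>');
        if ($textStart !== false && $textEnd !== false && $textEnd > $textStart) {
            $textStart += strlen('<TEXT>');
            $doc['body'] = trim(substr($block, $textStart, $textEnd - $textStart));
        }
        $docs[] = $doc;
    }
    return $docs;
}
```

Each returned entry carries the declared type and file name next to the body, so the main form (for example type 10-Q) could be picked out and rendered on its own instead of concatenating every HTML fragment into one file.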