My code so far is below. Am I on the right track? I was able to read in a 1 MB file, but the 50 MB file gets extremely slow. Perhaps I shouldn't be trying to output the entire data set? But am I at least reading it in correctly? Below are the assignment guidelines and my code, which works on a 1 MB test file but not on the 50 MB pubs.txt file.

Write a PHP program that reads the posted pubs.txt file, parses it, and inserts the 
data into MySql tables. 

pubs.txt contains many publications. Each <pub> … </pub> pair specifies a 
publication, with ID, title, year, journal (<booktitle>), pages, and authors 
information. Some information may be missing. It is your own choice to use a 
default value or NULL for missing fields. Some information looks incorrect but 
you do not have to worry about it. The data in pubs.txt was automatically 
extracted from web resources by computer. 

You have the freedom to design the MySql database, as long as you can answer 
the queries correctly and hopefully efficiently. 
It is your own choice to execute this program from command line or web browser. 
3. Use PHP to write a web interface, which should provide intuitive forms allowing 
users to:

• Insert a publication into the database
• Query all publications by a particular author
• Query all publications in a particular year 

Below is my PHP code. Thanks for any guidance.

<?php

error_reporting(E_ALL);

$mysqli = new mysqli('localhost', 'root', '', 'db1');

if ($mysqli->connect_errno) {
    printf("connect failed: %s\n", $mysqli->connect_error);
    exit();
}

// pubs.txt has no single root element, so wrap it in one before parsing
$header = '<?xml version="1.0" encoding="UTF-8"?>' . "\n<datalist>";
$content = $header . "\n" . file_get_contents("pubs.txt") . "\n</datalist>";
$ob = simplexml_load_string($content);
if ($ob === false) {
    exit("could not parse pubs.txt\n");
}

// prepare once, outside the loop; bind_param() takes variables by reference,
// so bind plain variables and reassign them on every iteration
$stmt = $mysqli->prepare("INSERT INTO pubs VALUES (?, ?, ?, ?, ?, ?)");
$stmt->bind_param('ssssss', $id, $title, $year, $booktitle, $pages, $authors);

// print the table header once, not once per row
echo "<table>
<tr>
<th>ID</th>
<th>title</th>
<th>year</th>
<th>booktitle</th>
<th>pages</th>
<th>authors</th>
</tr>";

$inserted = 0;
foreach ($ob->pub as $pub) { // access all data in loop
    // a missing element casts to an empty string
    $id = (string) $pub->ID;
    $title = (string) $pub->title;
    $year = (string) $pub->year;
    $booktitle = (string) $pub->booktitle;
    $pages = (string) $pub->pages;
    // join all <author> elements; xpath() returns an empty array if there are none
    $authors = implode(",", array_map('strval', $pub->xpath('authors/author')));

    $stmt->execute();
    $inserted += max(0, $stmt->affected_rows); // affected_rows is -1 on error

    echo "<tr>";
    echo "<td>" . htmlspecialchars($id) . "</td>";
    echo "<td>" . htmlspecialchars($title) . "</td>";
    echo "<td>" . htmlspecialchars($year) . "</td>";
    echo "<td>" . htmlspecialchars($booktitle) . "</td>";
    echo "<td>" . htmlspecialchars($pages) . "</td>";
    echo "<td>" . htmlspecialchars($authors) . "</td>";
    echo "</tr>";
}

echo "</table>";
printf("%d rows inserted\n", $inserted);
?>
David Salazar
  • Loading 50 MB into an XML DOM is going to be slow and memory-consuming. BUT there is nothing in the assignment that says you need to load fast or efficiently. So don't sweat it, and concentrate on getting the database and queries correct. If you are really worried about it, use the event-based "XML Parser" extension, but be warned that it works completely differently from the SimpleXML parser. – James Anderson Dec 10 '13 at 03:59
  • How long do you think it should take to load this on a standard Dell laptop? – David Salazar Dec 10 '13 at 04:15
  • 50 MB is not too big; it can fit in memory easily. – Imre L Dec 10 '13 at 06:08

1 Answer

For documents this large you should use a progressive XML parser that doesn't depend on loading and parsing it all at once.
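
A minimal sketch of what that could look like with PHP's event-based XML Parser extension (`xml_parser_create()`, `xml_parse()`). It assumes the element names used in the question's code (`pub`, `ID`, `title`, `year`, `booktitle`, `pages`, `authors`/`author`) and, like the question, wraps the file in a `<datalist>` root because pubs.txt has none. `parsePubs()` and `$onPub` are hypothetical names chosen for illustration. The file is read in 64 KB chunks, so only one publication is held in memory at a time.

```php
<?php
// Sketch only: parsePubs() and $onPub are illustrative names, and the element
// layout (authors/author nested inside pub) is assumed from the question's code.
function parsePubs(string $path, callable $onPub): void
{
    $pub = null;   // publication currently being assembled, or null outside <pub>
    $text = '';    // character data collected for the current element

    $parser = xml_parser_create('UTF-8');
    // keep tag names as written instead of upper-casing them
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$pub, &$text) {
            if ($name === 'pub') {
                $pub = ['authors' => []];
            }
            $text = '';
        },
        function ($p, $name) use (&$pub, &$text, $onPub) {
            if ($pub === null) {
                return; // closing tag of the <datalist> wrapper
            }
            if ($name === 'pub') {
                $onPub($pub);   // hand the finished record to the caller
                $pub = null;
            } elseif ($name === 'author') {
                $pub['authors'][] = trim($text);
            } elseif ($name !== 'authors') {
                $pub[$name] = trim($text);
            }
            $text = '';
        }
    );
    // the parser may deliver one element's text in several pieces
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data;
    });

    $feed = function (string $chunk, bool $final) use ($parser) {
        if (xml_parse($parser, $chunk, $final) !== 1) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    };

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("cannot open $path");
    }
    $feed('<datalist>', false);           // supply the missing root element
    while (!feof($fh)) {
        $chunk = fread($fh, 65536);
        if ($chunk !== false && $chunk !== '') {
            $feed($chunk, false);
        }
    }
    $feed('</datalist>', true);
    fclose($fh);
}
```

The callback is where the database work would go, for example executing one prepared `INSERT` per publication. Missing fields simply do not appear as keys in the array passed to the callback, so the caller can choose between a default value and NULL.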

Niels Keurentjes
  • It is worth noting that in the above link, the user-contributed example at the bottom of the page, xml2array(...), will not properly handle multidimensional objects. – cerd Dec 10 '13 at 03:59
  • It is worth noting that most user contributions on php.net stink, but I was primarily pointing at the 3 samples in the left sidebar. – Niels Keurentjes Dec 10 '13 at 04:00
  • I agree completely, Niels. Just wanted OP to be aware: xml2array sounds a bit too good to be true. The same one also appears as a solution on: http://stackoverflow.com/questions/6167279/converting-a-simplexml-object-to-an-array – cerd Dec 10 '13 at 04:04
  • Like he said, it doesn't work, and as such it's too good to be true. Can't stop you from using buggy code though if you really want to. – Niels Keurentjes Dec 10 '13 at 04:10