There are a couple of things you can do here to speed up your processing. First, you are currently running an XPath query against the entire document for each ID you are looking for, so the work grows with both the size of the document and the number of IDs you are searching for. It would be more efficient to loop through the document once and test the person-name attribute of each unit element to see whether it is in your list of IDs to extract data for. That change alone replaces one full-document query per ID with a single pass, which should give you a noticeable speedup.
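The single-pass idea can be sketched with PHP's DOM extension before switching parsers at all. This is a minimal sketch, not your code: the inline sample document is hypothetical, and its structure (unit elements carrying person-name, with mrk elements inside seg-source and target) is assumed from the element names used in the parser below. It also assumes the document has no default namespace. If yours declares one (XLIFF files usually do), you would need DOMXPath::registerNamespace() and prefixed names in the queries. Flipping the ID list with array_flip() turns each membership check into a hash lookup instead of a scan of the array.

```php
<?php
// Hypothetical sample document; the real structure from the question is assumed
$xml = <<<XML
<root>
  <unit person-name="A001"><seg-source><mrk>one</mrk></seg-source><target><mrk>uno</mrk></target></unit>
  <unit person-name="B002"><seg-source><mrk>two</mrk></seg-source></unit>
  <unit person-name="A695"><seg-source><mrk>three</mrk></seg-source></unit>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$arrayIds = ['A001', 'A695'];
// Flip the ID list so membership checks are hash lookups, not linear scans
$wanted = array_flip($arrayIds);

// One sub-array per ID, so the output stays grouped in ID order
$buffer = array_fill_keys($arrayIds, []);

// One query over the whole document instead of one query per ID
foreach ($xpath->query('//unit') as $unit)
{
    $id = $unit->getAttribute('person-name');
    if (!isset($wanted[$id]))
    {
        continue;
    }
    // Relative query: only mrk elements under this unit's seg-source, not target
    foreach ($xpath->query('seg-source//mrk', $unit) as $mrk)
    {
        $buffer[$id][] = $mrk->textContent;
    }
}

// Flatten the per-ID buffers into one list
$output = array_merge(...array_values($buffer));
print_r($output);
```

This still loads the whole document into memory, which is the cost the streaming approach avoids.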
However, at that point XPath is not really doing much for you, so you might as well use XMLReader to parse the document efficiently without having to load the whole thing into memory. The code is more complex, which makes it more error-prone and harder to understand, but if you need to process large XML documents efficiently, you need to use a streaming parser.
The speed difference between looping mechanisms in PHP is insignificant compared to the difference you could see between your current XPath approach and a streaming parser.
```php
<?php
// Instantiate the XML parser and open our file
$xmlReader = new XMLReader();
if (!$xmlReader->open('test.xml'))
{
    die('Unable to open test.xml');
}

// Array of person-name values we want to extract data for
$arrayIds = ['A001', 'A695'];

/*
 * Buffer for seg-source/mrk values.
 * We want a sub-array for each ID so we can sort the output by ID.
 */
$buffer = [];
foreach ($arrayIds as $currId)
{
    $buffer[$currId] = [];
}

/*
 * Flag to indicate whether or not the parser is in a unit that has
 * a person-name that we are looking for
 */
$validUnit = false;

/*
 * Flag indicating whether or not the parser is in a seg-source element.
 * Since both seg-source and target elements contain mrk elements, we need to
 * know when we are in a seg-source.
 */
$inSegSource = false;

/*
 * We need to keep track of which person we are currently working with
 * so that we can populate the buffer
 */
$curPersonName = null;

// Parse the document, one node at a time
while ($xmlReader->read())
{
    // If we are at an opening element...
    if ($xmlReader->nodeType == XMLReader::ELEMENT)
    {
        switch ($xmlReader->localName)
        {
            case 'unit':
                // Pull the person-name
                $curPersonName = $xmlReader->getAttribute('person-name');
                /*
                 * If the value is in our array of IDs, set the validUnit flag to true;
                 * if not, set the flag to false
                 */
                $validUnit = in_array($curPersonName, $arrayIds, true);
                break;
            case 'seg-source':
                /*
                 * We are opening a seg-source element, so set the flag to true.
                 * A self-closing <seg-source/> produces no END_ELEMENT node,
                 * so leave the flag false in that case.
                 */
                $inSegSource = !$xmlReader->isEmptyElement;
                break;
            case 'mrk':
                /*
                 * If we are in a valid unit AND inside a seg-source element,
                 * extract the element's text and add it to the buffer
                 */
                if ($validUnit && $inSegSource)
                {
                    $buffer[$curPersonName][] = $xmlReader->readString();
                }
                break;
        }
    }
    // If we are at a closing element...
    elseif ($xmlReader->nodeType == XMLReader::END_ELEMENT)
    {
        switch ($xmlReader->localName)
        {
            case 'seg-source':
                // If we are closing a seg-source, set the flag to false
                $inSegSource = false;
                break;
            case 'unit':
                // Leaving the unit, so stop collecting until the next matching one
                $validUnit = false;
                break;
        }
    }
}
$xmlReader->close();

// Flatten the per-ID buffers into a single list, grouped in ID order
$output = [];
foreach ($buffer as $currId => $currData)
{
    $output = array_merge($output, $currData);
}
print_r($output);
```