3

I'm trying to iterate over an array containing multi-parent dependencies in the most efficient way possible for the purpose of building an efficient file-dependency script implementing RequireJS.

I have used the answer to this question Category Hierarchy (PHP/MySQL) successfully for a 1 parent scenario. However there is another constraint in that one child-script may depend on more than one parent-script.

For example a list of scripts and their dependencies which are currently an array in PHP (consider the script name unique and the key):

╔══════════════════╦═════════════════════════════╗
║   Script Name    ║        Dependencies         ║
╠══════════════════╬═════════════════════════════╣
║ jquery           ║ jquery-core, jquery-migrate ║
║ jquery-core      ║ null                        ║
║ jquery-migrate   ║ null                        ║
║ statistics       ║ null                        ║
║ backtotop        ║ jquery                      ║
║ jquery-circle    ║ jquery                      ║
║ interface-slider ║ jquery-circle               ║
║ test1            ║ jquery, statistics          ║
║ test2            ║ test1                       ║
║ test3            ║ null                        ║
╚══════════════════╩═════════════════════════════╝

Would create an array (demonstrated as JSON for clarity) like the one below where a shared dependency such as that of test1 would be nested under ['jquery-core jquery-migrate']['jquery']['statistics'] as opposed to having a new inclusion of ['jquery-core jquery-migrate']['jquery statistics']['test1']

allScripts = [
    {
        "name": "jquery-core jquery-migrate",
        "children":[
            {
                "name":"jquery",
                "children":[
                    {
                        "name":"backtotop",
                        "children":null
                    },
                    {
                        "name":"statistics",
                        "children":[
                            {
                                "name":"test1",
                                "children":[
                                    {
                                        "name":"test2",
                                        "children":null
                                    }
                                ]
                            }
                        ]
                    },
                    {
                        "name":"jquery-circle",
                        "children":[
                            {
                                "name":"interface-slider",
                                "children":null
                            }
                        ]
                    }
                ]
            }
        ]
    },
    {
        "name":"test3",
        "children":null
    }
];

Perhaps this needs a form of lowest-common-ancestor approach?http://www.stoimen.com/blog/2012/08/24/computer-algorithms-finding-the-lowest-common-ancestor/

Any help is much appreciated!

Community
  • 1
  • 1
Rob Schmuecker
  • 8,934
  • 2
  • 18
  • 34

2 Answers2

2

I put together some demo code that loads the dependencies of a script first, then loads the script. There's some sample output to demonstrate what's going on as well. Make sure to read all of the comments as there's too much to explain here. Please let me know if you want any changes.
+1, interesting challenge!


#!/usr/bin/php
<?php
/*
 * -------
 * There's room for improvement, but this is a good start.
 * Let me know if you need any changes.
 * -------
 * 
 * This loads scripts with dependencies being loaded
 * first, with efficiency being key here.
 * Reference counting is also performed, and can be
 * seen in the $loaded array.  A script can be referenced
 * indirectly many times through the loading of various
 * scripts and their dependencies.
 * Circular dependencies are handled by checking if the
 * script has already been loaded.  Since it can only
 * be loaded once, a circular dependency is avoided.
 *
 * Original Sample Data:
 * ╔══════════════════╦═════════════════════════════╗
 * ║   Script Name    ║        Dependencies         ║
 * ╠══════════════════╬═════════════════════════════╣
 * ║ jquery           ║ jquery-core, jquery-migrate ║
 * ║ jquery-core      ║ null                        ║
 * ║ jquery-migrate   ║ null                        ║
 * ║ statistics       ║ null                        ║
 * ║ backtotop        ║ jquery                      ║
 * ║ jquery-circle    ║ jquery                      ║
 * ║ interface-slider ║ jquery-circle               ║
 * ║ test1            ║ jquery, statistics          ║
 * ║ test2            ║ test1                       ║
 * ║ test3            ║ null                        ║
 * ╚══════════════════╩═════════════════════════════╝
 * 
 */

define('NO_DEPENDENCIES_LIST',    TRUE);  //create a list of scripts with no dependencies

//sample data, taken from OP
$scripts = array(
    'jquery'=>array('jquery-core','jquery-migrate',),
    'jquery-core'=>array(),
    'jquery-migrate'=>array(null ),
    'statistics'=>array( ),
    'backtotop'=>array('jquery' ),
    'jquery-circle'=>array('jquery' ),
    'interface-slider'=>array('jquery-circle' ),
    'test1'=>array('jquery','statistics', 'test3'  ),
    'test2'=>array('test1' ),
    'test3'=>array( ),
);

$loaded     = array();  //list of loaded scripts, order is important
$nodepends  = array();  //list of scripts with no dependencies. async load perhaps?


/**
 * Adds a null item to an empty array, strictly for json output
 * as the OP used null instead of empty arrays in the example.
 * @param array $scripts
 */
function process_array(&$scripts) { 
  foreach($scripts as $s=>&$data)
    if (count($data)==0)
      $data = array(null);
}


/**
 * Finds dependents of $scriptName.
 * @param array $scripts script test data
 * @param string $scriptName name of script to search for dependcies for
 * @param boolean $retNames TRUE to return script names, false to return ref count 
 * @return multitype:number unknown 
 */
function find_dependencies(array $scripts, $scriptName, $retNames=FALSE) {
  $ret = array();
  foreach($scripts as $s=>$data)
    foreach($data as $d) {
      if ($d == $scriptName)
        if (!$retNames) {
          $ret[$s] = (isset($ret[$s])?$ret[$s] + 1 : 0);
        } else {
          $ret[$s] = $s;
        }
    }
  return $ret;
}

/**
 * Checks $script to see if it has already been added to the list of
 * loaded scripts.
 * @param string|array $script script name or array of script names
 * @return boolean
 */
function script_loaded($script) {
  global $loaded;
  if (is_array($script)) {
    foreach($script as $s)
      if (!isset($loaded[$s]))
        return FALSE;
  } 
  if (is_string($script)) 
    return isset($loaded[$script]);
  return TRUE;
}

/**
 * Loads a script into the $loaded array, first loading all
 * dependencies, and the dependencies of those, etc., etc.
 * Ensures that scripts are only loaded after all required
 * scripts are loaded.  Remember - order of $loaded is important!
 * Return value is unimportant in this version.
 * 
 * @param array $scripts
 * @param string $script
 * @param array|null $children
 * @param integer $level
 * @return boolean
 */
function load_script(array &$scripts, $script, &$children, $level) {
  global $loaded, $nodepends;
  if ($script == null)
    return FALSE;
  if (script_loaded($script))
    return TRUE;

  if (count($scripts[$script]) > 0 && $scripts[$script][0] != null) {
    for($i=0;$i<count($scripts[$script]);$i++) {
      if (!isset($scripts[$script][$i]))
        break;
      if ($i >= count($scripts[$script]))
        break;
      if (!script_loaded($scripts[$script][$i])) {
        load_script($scripts, $scripts[$script][$i], $scripts[$script], $level+1);
      } 

      if (isset($children[$i]) && script_loaded($children[$i]))
        $children[$i] = null;
    }
  } 
  if ($scripts[$script][0]==null) {
    if (!isset($loaded[$script]))
      $loaded[$script] = $script;
      if (NO_DEPENDENCIES_LIST) 
        $nodepends[$script] = $loaded[$script];
  }


  if (!isset($loaded[$script])) {
    $loaded[$script] = 0;
  } else {
    $loaded[$script] = $loaded[$script] + 1;
    return TRUE;
  }

  $loaded[$script] = $loaded[$script] + 1;

  echo "load_script($script)\n";  
  return TRUE;
}


/**
 * demo main function
 * @param array $scripts - array of scripts and their dependencies: test data
 */
function main(&$scripts, &$loaded, &$nodepends) {
  process_array($scripts);
  foreach($scripts as $s=>$data) {
    load_script($scripts, $s, $data, 0);
  } 


if (NO_DEPENDENCIES_LIST)
  //reverse this to put the less-referenced scripts at the top
  $nodepends = array_reverse($nodepends);

//here we print out a table of the $loaded array.
//it's just here for a visual representation of
//what scripts were loaded and what their dependent scripts
//are.  
//The left column is the loaded script, the right column
//is a list of scripts that tried to load the script.
echo "
 ╔══════════════════════════════════════════════════╗
 ║  Loaded Scripts: with scripts that loaded them   ║
 ╚══════════════════════════════════════════════════╝
 ╔══════════════════╦═══════════════════════════════╗
 ║   Script Name    ║         Loading Scripts       ║
 ╠══════════════════╬═══════════════════════════════╣\n";
 foreach($loaded as $s=>$n) { 
   $d = find_dependencies($scripts, $s, TRUE);
   $n = 16-strlen($s);
   $s2 = implode(",", $d);
   if ($s2=="")
     $s2 = "null";
   $n2 = 29-strlen($s2);
   printf (" ║ %s%s ║ %s%s ║\n", $s, str_repeat(" ", $n), $s2, str_repeat(" ", $n2));
 }
 echo " ╚══════════════════╩═══════════════════════════════╝\n";

//print json of loaded scripts -- just because OP used json
print_r( json_encode($scripts, JSON_PRETTY_PRINT) );

//print array of loaded scripts: the order of this array is important;
//scripts that are depended on by other scripts are loaded first.  If
//a script has no dependencies, its order is not significant.
//--
//this is the actual result that we're looking for: all scripts loaded
//with dependencies loaded first.
print_r( $loaded );

//print array of scripts that have no dependencies.  Since efficiency is
//a requirement, you could async load these first, since it doesn't matter
//what order they load in.
if (NO_DEPENDENCIES_LIST)
  print_r( $nodepends );
}


//run the demo
main($scripts, $loaded, $nodepends);
Trick
  • 636
  • 4
  • 7
  • Thanks @trick, I will take a thorough look and get back with findings! – Rob Schmuecker Jul 04 '14 at 09:04
  • I wrote it as a shell script, so make sure to *chmod +x demo.php && demo.php* -or- *php -f demo.php* (in bash) – Trick Jul 04 '14 at 09:06
  • Looking good in so far that we're getting the correct order of dependencies. However I've gotten to that point on my own previously and would like to take it a step further but grouping by mutual exclusivity to take advantage of the asynchronous loading in requireJS. For example I would then be able to require `backtotop`, `statistics` and `jquery-circle` (then thereafter their dependants) at the same time once `jquery` had loaded as opposed to synchronously as suggested here by the output as I currently have no way of knowing the grouping/reliance. Hence I demo'd a nested JSON/array. – Rob Schmuecker Jul 04 '14 at 09:33
  • Is the JSON you posted the exact format that you need, or was it just an example? – Trick Jul 04 '14 at 10:39
  • It was really just an example of displaying a nested array/structure which I can then process for HTML/Javascript output to get an efficient requireJS chain – Rob Schmuecker Jul 04 '14 at 10:41
0

If I understand your question correctly, you have chosen the wrong data structure to represent dependencies. Whenever multiple parents are allowed in a dependency hierarchy, you have a Directed Acyclic Graph (DAG), not a tree as you have shown. (A tree is a specialization of DAG where every node happens to have one parent. DAGs are strictly more powerful.)

You have shown a quirky kind of tree where a single node (e.g. "jquery-core jquery-migrate",) represents a pair of DAG nodes. This won't work in general. If A is a dependency of (B and C) and D is one of (B and E), then what? If you choose to use two different tree nodes for the (B,C) and (B,E) combinations, what do edges from these combination nodes to their dependencies mean? Moreover, even if you adopt some conservative answer to this question, there are 2^N combinations of N nodes. This approach could easily lead to an enormous tree.

In JSON or PHP, a DAG is easy to represent and search. Store it as an adjacency list: A map taking any library to a list of other libraries that are either its dependencies or dependent on it (choose the direction based on your purpose; to match the tree you showed, it would be the second choice). Search it with any graph search algorithm (e.g. depth first or breadth first).

NB In a general system you must also worry about dependency cycles, which can be accomplished with a search as above.

Gene
  • 46,253
  • 4
  • 58
  • 96
  • Thanks for the idea of a DAG @Gene. Reading your answer, what do you mean by `it would be the second choice` ? – Rob Schmuecker Jul 04 '14 at 09:04
  • @RobSchmuecker I meant choose that the DAG edges run from the dependency to the dependent node rather than the opposite. But note that for many purposes, it's handy to have both "dependency" and "dependent" edges labeled as such you can navigate over the DAG in any way you need. In this case, the data structure is a map from libraries to _two_ lists of libraries. – Gene Jul 04 '14 at 16:33