I am sorting & grouping publication data from an XML file. The methods I am currently using are working fine for the most part, although I feel like there is a more efficient way to do what I am trying to accomplish.

Here is a sample of what the target nodes look like:

<comic>
      <id>117</id>
      <mainsection>
        <series>
          <displayname>My Amazing Adventure</displayname>
          <sortname>My Amazing Adventure</sortname>
        </series>
      </mainsection>
      <issuenr>2</issuenr>
      <seriefirstletter>
        <displayname>M</displayname>
        <sortname>M</sortname>
      </seriefirstletter>
    </comic>

Here are the current steps I am taking.

  • Loading the XML file with SimpleXML
  • Specifying the target node and using iterator_to_array to convert it to an array
  • Using a usort function that compares (strcmp) the seriesname attribute, to sort all of the series alphabetically.
  • I'm using a query string for each page to specify each letter of the alphabet, and an IF statement that compares the query string letter to the seriefirstletter value, so only the applicable nodes are returned.
  • I then begin my foreach statement. Echoing out the data I want, into LI items.
  • Finally, I'm using jQuery to look at the IDs of each LI item and visually group them. I've created a PHP variable that uses the seriesname, with the spaces removed, for the IDs. It inserts an H4 heading with the proper series name above the group and inserts a separating DIV below the group.

The alphabetical sorting is working properly, but I also want the issues within the same series to be sorted numerically. This is not currently working. Right now, the numeric sort order looks something like this: 1, 10, 12, 2, 3.

I would like to get the numerical sorting issue straightened out. I also feel like the grouping that I'm currently doing in jQuery could be done in PHP while I'm going through the loop. Any advice on a better / more efficient way to handle this data would be greatly appreciated.

Batfan
  • For sorting - as already suggested in a previous comment - use natural order. See http://stackoverflow.com/a/8989994/367456 – hakre Oct 06 '12 at 15:28
  • @hakre - Thanks. I did see that. However, from what I can tell, natsort uses the first text it finds in the array item to sort it. As you can see from the sample node above, the Series Name is not the first text in the node. Furthermore, I don't know how to make it grab the Issue Number as well. – Batfan Oct 06 '12 at 15:42

2 Answers

You can use

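$key = "id";                                         // child element to sort by
$iterator = new SimpleXMLIterator($xml);             // $xml is the XML string shown further down
$array = json_decode(json_encode($iterator), true);  // convert the elements to nested arrays
__xsort($array['comic'], $key);                      // sort the comics in place by $key
var_dump($array['comic']);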

Output

array
  0 => 
    array
      'id' => string '1' (length=1)
      'mainsection' => 
        array
          'series' => 
            array
              ...
  1 => 
    array
      'id' => string '2' (length=1)
      'mainsection' => 
        array
          'series' => 
            array
              ...
  2 => 
    array
      'id' => string '3' (length=1)
      'mainsection' => 
        array
          'series' => 
            array
              ...
  3 => 
    array
      'id' => string '10' (length=2)
      'mainsection' => 
        array
          'series' => 
            array
              ...
  4 => 
    array
      'id' => string '12' (length=2)
      'mainsection' => 
        array
          'series' => 
            array
              ... 

XML used

$xml = "<comics>
<comic>
      <id>1</id>
      <mainsection>
        <series>
          <displayname>My Amazing Adventure - 1</displayname>
          <sortname>My Amazing Adventure</sortname>
        </series>
      </mainsection>
    </comic>

<comic>
      <id>10</id>
      <mainsection>
        <series>
          <displayname>My Amazing Adventure - 10</displayname>
          <sortname>My Amazing Adventure</sortname>
        </series>
      </mainsection>
    </comic>

<comic>
      <id>12</id>
      <mainsection>
        <series>
          <displayname>My Amazing Adventure 12</displayname>
          <sortname>My Amazing Adventure</sortname>
        </series>
      </mainsection>
    </comic>

<comic>
      <id>2</id>
      <mainsection>
        <series>
          <displayname>My Amazing Adventure 2</displayname>
          <sortname>My Amazing Adventure</sortname>
        </series>
      </mainsection>
    </comic>


<comic>
      <id>3</id>
      <mainsection>
        <series>
          <displayname>My Amazing Adventure 3</displayname>
          <sortname>My Amazing Adventure</sortname>
        </series>
      </mainsection>
    </comic>

</comics>" ;

__xsort function used: see http://stackoverflow.com/a/12759674/1226894
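
The __xsort function itself is not reproduced in this answer. As a rough illustration of what such a helper does, here is a minimal stand-in. It is not the original __xsort: the name sort_by_child and the sample data are assumptions made for this sketch, and the closure requires PHP 5.3 or later.

```php
<?php
// Hypothetical stand-in for the __xsort helper (not the original code).
// Sorts a list of associative arrays in place by the value stored under $key,
// using natural-order comparison so that "10" comes after "2".
function sort_by_child(array &$rows, $key)
{
    usort($rows, function ($a, $b) use ($key) {
        return strnatcmp((string) $a[$key], (string) $b[$key]);
    });
}

// Sample data shaped like the json_decode(json_encode(...), true) result.
$comics = array(
    array('id' => '1'),
    array('id' => '10'),
    array('id' => '12'),
    array('id' => '2'),
    array('id' => '3'),
);

sort_by_child($comics, 'id');

// Collect the ids to show the resulting order.
$ids = array();
foreach ($comics as $comic) {
    $ids[] = $comic['id'];
}
echo implode(', ', $ids), "\n"; // 1, 2, 3, 10, 12
```

Natural-order comparison is what keeps 10 and 12 from landing between 1 and 2, which a plain strcmp would do.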

Baba
  • Getting an "unexpected T_FUNCTION" error on the second line of the __xsort function. – Batfan Oct 08 '12 at 18:49
  • See the __xsort function here http://stackoverflow.com/a/12759674/1226894 .. don't want to duplicate code – Baba Oct 08 '12 at 18:51
  • Yep, I saw the link and copied that exactly. Still getting an error. – Batfan Oct 08 '12 at 18:57
  • Ah, that might be the issue. My server is running PHP 5.2.17 – Batfan Oct 08 '12 at 19:02
  • Hmm, now it's returning NULL and this error "The argument should be an array" on the usort line. I should note that the XML data I'm pulling is from a gzipped XML file on my server. – Batfan Oct 08 '12 at 19:38
  • Why `json_decode(json_encode($iterator))`? The `iterator_to_array` is not better? – Peter Krauss Jul 13 '13 at 10:09

Let's say you've got all <comic> elements as an iterator already. First of all convert it to an array so we can use the array functions:

$comics = iterator_to_array($comics, 0);

Then you want to sort this array based on some value, here the value of the <issuenr> child. This can be done with usort and the help of a callback function:

$success = usort($comics, function($a, $b) {
    return strnatcmp($a->issuenr, $b->issuenr);
});

The callback function just picks the concrete values you want to compare with each other and passes them along to strnatcmp, which is the natural order comparison I mentioned in my comment above.
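
For PHP 5.2, where anonymous functions are not available, the same idea can be sketched with a named function that is passed to usort by name. The version below also compares the series sort name first, so issues end up numerically ordered within each series. The function name compare_comics and the sample XML are assumptions for illustration only.

```php
<?php
// Named comparison function, usable on PHP 5.2 where closures are unavailable.
// Orders by series sort name first, then by issue number in natural order.
function compare_comics($a, $b)
{
    $bySeries = strcmp(
        (string) $a->mainsection->series->sortname,
        (string) $b->mainsection->series->sortname
    );
    if ($bySeries !== 0) {
        return $bySeries;
    }
    return strnatcmp((string) $a->issuenr, (string) $b->issuenr);
}

// Hypothetical sample data following the node structure from the question.
$xml = simplexml_load_string(
    '<comiclist>'
    . '<comic><mainsection><series><sortname>Thor</sortname></series></mainsection><issuenr>10</issuenr></comic>'
    . '<comic><mainsection><series><sortname>Thor</sortname></series></mainsection><issuenr>2</issuenr></comic>'
    . '<comic><mainsection><series><sortname>Thanos</sortname></series></mainsection><issuenr>1</issuenr></comic>'
    . '</comiclist>'
);

// xpath() returns a plain array of elements, so it can be passed to usort directly.
$comics = $xml->xpath('/comiclist/comic');
usort($comics, 'compare_comics');

foreach ($comics as $comic) {
    echo $comic->mainsection->series->sortname, ' #', $comic->issuenr, "\n";
}
```

With this sample data the loop prints Thanos #1, then Thor #2, then Thor #10.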


The following code-example shows how to list all series that match a specific search letter, natsorted and distinct (no duplicate names, grouped).

The search and the grouping are both done with an xpath query:

$searchval = 'T';

$file = 'compress.zlib://comiclist10-12.xml.gz';

$xml = simplexml_load_file($file);

$series = $xml->xpath(
    "/*/comiclist/comic[./seriefirstletter/displayname = '$searchval']
        /mainsection/series/sortname[
            not(. = ../../../following-sibling::comic/mainsection/series/sortname)
        ]"
);

natsort($series);

foreach($series as $serie)
{
    echo $serie, "\n";
}

This will then output the sorted list:

Tale of the Batman: Gotham by Gaslight, A
Tales of Suspense: Captain America & Iron Man #1 Commemorative Edition
Tales to Astonish, Vol. 1
Teenage Mutant Ninja Turtles
Teenage Mutant Ninja Turtles Micro Series
Teenage Mutant Ninja Turtles Ongoing
Terminator / Robocop: Kill Human
Thanos
Thing, Vol. 1
Thor, Vol. 2
Thor, Vol. 3
Thor: Blood Oath
Thor: For Asgard
Thor: Man of War
Thor: Son of Asgard
Thor Annual
Thor Corps
Thundercats
Thundercats (DC Comics - Wildstorm)
Thundercats: Enemy's Pride
Tomb of Dracula, Vol. 4, The
Torch, The
Toxin
Transformers: Armada
Transformers: Generation One
Transformers: Infiltration
Truth: Red, White & Black
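
To see why the predicate yields distinct names, it may help to run it on a tiny document. A sortname is kept only if no following sibling comic carries the same sortname, so each name survives exactly once (at its last occurrence). The XML below is a made-up miniature, and the letter filter is left out to keep the sketch short.

```php
<?php
// Hypothetical miniature of the comic list; element names follow the question.
$xml = simplexml_load_string(<<<XML
<root>
  <comiclist>
    <comic><mainsection><series><sortname>Thor, Vol. 3</sortname></series></mainsection></comic>
    <comic><mainsection><series><sortname>Thanos</sortname></series></mainsection></comic>
    <comic><mainsection><series><sortname>Thor, Vol. 3</sortname></series></mainsection></comic>
    <comic><mainsection><series><sortname>Thor, Vol. 2</sortname></series></mainsection></comic>
  </comiclist>
</root>
XML
);

// Keep a sortname only when no later comic has an equal sortname.
$series = $xml->xpath(
    "/*/comiclist/comic/mainsection/series/sortname[
        not(. = ../../../following-sibling::comic/mainsection/series/sortname)
    ]"
);

// Convert the elements to plain strings, then sort in natural order.
$names = array_map('strval', $series);
natsort($names);
$names = array_values($names); // natsort keeps keys, so reindex for display

echo implode("\n", $names), "\n";
```

The first "Thor, Vol. 3" is dropped because a later comic has the same name, so the result is Thanos, Thor, Vol. 2 and Thor, Vol. 3, one line each.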

In the next step you want to list all comics in each series. That would be an inner foreach:

foreach ($series as $serie) {
    echo $serie, "\n";

    $string = xpath_string($serie);

    $comics = $serie->xpath("../../../../comic[./mainsection/series/sortname = $string]");

    foreach ($comics as $i => $comic) {
        printf(" %d. id: %s\n", $i+1, $comic->id);
    }
}

This fetches the comics for each series and outputs:

Tale of the Batman: Gotham by Gaslight, A
 1. id: 8832
Tales of Suspense: Captain America & Iron Man #1 Commemorative Edition
 1. id: 3591
Tales to Astonish, Vol. 1
 1. id: 3589
Teenage Mutant Ninja Turtles
 1. id: 117
Teenage Mutant Ninja Turtles Micro Series
 1. id: 13789
Teenage Mutant Ninja Turtles Ongoing
 1. id: 13780
 2. id: 13782
 3. id: 13787
Terminator / Robocop: Kill Human
 1. id: 13775
Thanos
 1. id: 3597
Thing, Vol. 1
 1. id: 3746
Thor, Vol. 2
 1. id: 5873
Thor, Vol. 3
 1. id: 1035
 2. id: 1635
 3. id: 2318
 4. id: 2430
 5. id: 2463
 6. id: 3333
 7. id: 3616
 8. id: 11731
 9. id: 11733
Thor: Blood Oath
 1. id: 3635
 2. id: 3636
Thor: For Asgard
 1. id: 11545
 2. id: 11546
Thor: Man of War
 1. id: 3644
Thor: Son of Asgard
 1. id: 538
 2. id: 3645
Thor Annual
 1. id: 5868
Thor Corps
 1. id: 3640
Thundercats
 1. id: 209
Thundercats (DC Comics - Wildstorm)
 1. id: 3654
Thundercats: Enemy's Pride
 1. id: 3649
Tomb of Dracula, Vol. 4, The
 1. id: 3719
Torch, The
 1. id: 2328
 2. id: 2330
 3. id: 2461
Toxin
 1. id: 3720
Transformers: Armada
 1. id: 3737
Transformers: Generation One
 1. id: 557
Transformers: Infiltration
 1. id: 3729
 2. id: 3731
Truth: Red, White & Black
 1. id: 3750
 2. id: 3751

The code of the xpath_string function can be found in another answer of mine.
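
Since that code is not reproduced here, the following is only a guess at what such a helper has to do, not the original function. XPath 1.0 has no escape character inside string literals, so a value like "Thundercats: Enemy's Pride" cannot simply be wrapped in single quotes. The name xpath_string_sketch is hypothetical.

```php
<?php
// Hypothetical sketch of an XPath string-literal helper; the original may differ.
// Picks a quote style the value does not contain, and falls back to concat()
// when the value contains both single and double quotes.
function xpath_string_sketch($value)
{
    $value = (string) $value;
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Split on single quotes; each piece goes into a single-quoted literal and
    // every single quote is re-inserted as a double-quoted literal.
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

echo xpath_string_sketch("Thundercats: Enemy's Pride"), "\n";
```

For the series name above, the helper returns the name wrapped in double quotes, which can be interpolated into the inner xpath query safely.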

hakre
  • I think I understand that but, how am I applying this to each series. From what I'm seeing, I'd need to grab all nodes within a certain series, then throw those into an array, THEN use this sorting method to numerically sort the issues. Right? – Batfan Oct 08 '12 at 22:28
  • Yes, that's how it works. The sort will be on the array, so will sort everything that's originally offered by the `$comics` iterator. However that code is PHP 5.3, not 5.2. You need to create yourself the function with a name (not anonymous) and then just use the function-name as string with usort, see http://php.net/usort for the general example. – hakre Oct 08 '12 at 22:35
  • What would you suggest for grabbing the series groups? Because I'm guessing I'd have to reference the current node's series in the loop. My first instinct was another foreach but, I was under the impression that I couldn't do a foreach within a foreach loop. – Batfan Oct 08 '12 at 23:01
  • @Batfan: You can, you can put as many foreachs into each other until you can't read your code any longer ;) – hakre Oct 08 '12 at 23:04
  • Good to know! I've been screwing around with it for a couple hours and not having a ton of luck. As I mentioned, I'm trying to add in the group headings too. So, I've created an array of the unique series names, for the current letter. I'm using a foreach to echo each heading out as an h4 and a nested foreach (of all issues) below that to grab the issues in the series. Just starting with the issue number for now. It's only grabbing the first issue number though. Any idea why? See: http://db.tt/cByrooYL – Batfan Oct 09 '12 at 01:16
  • You need to add the XML to the script so that one could just run that single file for a test. – hakre Oct 09 '12 at 01:20
  • Well, it's quite a bit of XML data, 8MB before being gzipped. The node structure above is a good example though. Like we've discussed there are multiple series and some series have multiple issues. The parent node is . – Batfan Oct 09 '12 at 01:32
  • oh well, that's much. but you could upload it on your dropbox and make the test-script remote-include it. that should work. you can do that later I need some sleep now. – hakre Oct 09 '12 at 01:33
  • @Batfan: I dunno if I understood the XML right this way, I added you an example with a foreach inside a foreach. I'm using xpath to search and to group and in the inner foreach to locate the comics of each series. – hakre Oct 10 '12 at 01:57
  • You're a genius. Seriously, I owe you a beer for that. How did you come up with the xpath query for $series? Just so I can understand. Did you happen to notice where I was going wrong with my attempt? – Batfan Oct 10 '12 at 17:42
  • @Batfan: The search is a simple predicate; the distinct part is something like what is outlined in [distinct in Xpath?](http://stackoverflow.com/questions/2812122/distinct-in-xpath), which probably gives more insight. The `xpath_string` function was missing on Stack Overflow for PHP (at least I didn't find one), so I wrote another answer there. As xpath is powerful for querying elements, I went that route straight away. Earlier it was not clear to me that you needed a distinct name. Problems in your code: You should enable warnings and notices, as they would have pointed out several problems. – hakre Oct 10 '12 at 18:01
  • Apart from that, the reason you only got one comic per series is that you compared against the exact comic, not the string of the title. The fix is to cast to string either when you add it to the `$seriesArray` or when you compare: `if ($v2SeriesName == (string) $series) {` <- that was the reason, see the extra `(string)`. If you don't cast, you ask to compare that the comic is *the* comic (simplexml element object), not the string representation (just the title). – hakre Oct 10 '12 at 18:04
  • Ah, okay. Makes sense. I appreciate the explanation very much. Thanks again – Batfan Oct 10 '12 at 18:08
  • to be honest, I started to look into your variant and then I thought, this works better with xpath because of the many nestings. Luckily the code was much smaller so there is not so much space for errors to sneak in. – hakre Oct 10 '12 at 18:09