I have an XML feed with deals, and I need to get the titles and the images.

I use XPath, for example: /deals/deal/deal_title (for the deal title) and /deals/deal/deal_image (for the deal image).

The problem is that some deals don't have a deal image set at all, so when I link each deal title with a deal image I sometimes get the wrong image.

In order to track down the problem I created two separate arrays: one with titles and the other one with images.

The weird thing that causes the problem is that in the images array the empty entries are moved to the end of the array.

For example, if we assume that "deal title2" has no image and "deal title3" has one, the "deal title3" image is used for "deal title2".
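A minimal reproduction of this mismatch, sketched in Python with the standard library rather than the PHP used in the linked script. The three-deal feed and the image file names are made up for illustration, and the ElementTree paths are only the rough equivalent of the two XPath expressions above, evaluated from the root `deals` element:

```python
import xml.etree.ElementTree as ET

# Hypothetical three-deal feed; the second deal has no deal_image element.
FEED = """
<deals>
  <deal><deal_title>deal title1</deal_title><deal_image>img1.jpg</deal_image></deal>
  <deal><deal_title>deal title2</deal_title></deal>
  <deal><deal_title>deal title3</deal_title><deal_image>img3.jpg</deal_image></deal>
</deals>
"""

root = ET.fromstring(FEED)  # root is the <deals> element

# Rough equivalents of /deals/deal/deal_title and /deals/deal/deal_image.
titles = [e.text for e in root.findall("./deal/deal_title")]
images = [e.text for e in root.findall("./deal/deal_image")]

# Pair the two lists by index, using None once the images list runs out.
pairs = [(t, images[i] if i < len(images) else None)
         for i, t in enumerate(titles)]

for title, image in pairs:
    print(title, "->", image)
```

In this sketch the titles list has three entries and the images list only two, so pairing by index gives "deal title2" the image that belongs to "deal title3" and leaves the last title without one.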

Use this link to see the code I made: http://pastebin.com/HEuTJQjZ

The interesting part starts from: $doc = new DOMDocument();

Basically, it executes many XPath queries to get titles, images, prices, etc., and then adds them to the database.

The problem starts when a deal doesn't have a tag set, so the script just uses the next value.

I don't understand how it magically moves all the empty entries to the bottom. XPath isn't supposed to reorder the results, right?

I have even tried to use the [] operator to get a specific image, but it doesn't help since the results are sorted the wrong way.

Example feed: http://www.clickbanner.gr/xml/?xml_type=deals&affiliate_ID=14063

EDIT:

The real problem is that XPath does not return the results in document order, which changes the expected order. Is this a bug, or is there a way to force the results into document order? See also: XPath query result order

Thank you in advance.

ggirtsou
  • So what is the question? I don't see one. – Dimitre Novatchev Jul 28 '12 at 21:07
  • The question is how do I correctly match deal title and deal image when there's a missing image tag on a deal? – ggirtsou Jul 28 '12 at 21:16
  • ggirtsou, All XPath 1.0 implementations I know return the selected nodes in document order (in an object often called `XmlNodeList`). What you actually point out is not that the selected nodes aren't in document order -- but that there is nothing in the selected result that represents a "not-found element". Of course, this is according to the XPath specification -- only selected nodes are represented (returned) in the selection. – Dimitre Novatchev Jul 29 '12 at 00:11
  • Yes, I am aware of `XmlNodeList`. Is there a way I can make it return the result properly sorted? Using `/deals/deal[$k]/deal_title` does not seem to work. When I use `/deals/deal/deal_image[deal_title="the deal title"]` it takes a very long time. – ggirtsou Jul 29 '12 at 01:24
  • ggirtsou, it must not be literally `"$k"` -- it must be the number contained in the variable `$k`. Also, the second expression in your comment should not select any node -- `deal_image` has no `deal_title` children. See my updated answer where (at the end) I added a more detailed example. – Dimitre Novatchev Jul 29 '12 at 01:36
  • In my script the `$k` variable represents a number that increases every time it goes through the loop. – ggirtsou Jul 29 '12 at 02:06

2 Answers

1

Evaluate the following two XPath expressions for each integer value of $k in the interval [1, count(/deals/deal)], substituting the actual number for $k:

/deals/deal[$k]/deal_title

and

/deals/deal[$k]/deal_image

This way you know, for each deal, whether an image was selected or not.

For example, if count(/deals/deal) is 3, then you will evaluate these XPath expressions:

  • /deals/deal[1]/deal_title and /deals/deal[1]/deal_image

  • /deals/deal[2]/deal_title and /deals/deal[2]/deal_image

  • /deals/deal[3]/deal_title and /deals/deal[3]/deal_image
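The per-deal evaluation described above could be sketched as follows. This is a Python illustration using the standard library, not the asker's PHP code; the sample feed and the function name `pair_titles_with_images` are hypothetical, and the ElementTree paths are evaluated from the root `deals` element, so `./deal[k]/deal_title` stands in for `/deals/deal[k]/deal_title`:

```python
import xml.etree.ElementTree as ET

# Hypothetical three-deal feed; the second deal has no deal_image element.
FEED = """
<deals>
  <deal><deal_title>deal title1</deal_title><deal_image>img1.jpg</deal_image></deal>
  <deal><deal_title>deal title2</deal_title></deal>
  <deal><deal_title>deal title3</deal_title><deal_image>img3.jpg</deal_image></deal>
</deals>
"""


def pair_titles_with_images(xml_text):
    """Return one (title, image) tuple per deal; image is None when missing."""
    root = ET.fromstring(xml_text)  # root is the <deals> element
    total_deals = len(root.findall("./deal"))  # same value as count(/deals/deal)

    pairs = []
    for k in range(1, total_deals + 1):  # XPath positions start at 1
        title = root.find(f"./deal[{k}]/deal_title")
        image = root.find(f"./deal[{k}]/deal_image")
        pairs.append((
            title.text if title is not None else None,
            image.text if image is not None else None,
        ))
    return pairs


for title, image in pair_titles_with_images(FEED):
    print(title, "->", image)
```

Because both lookups are restricted to the same deal, a missing image shows up as `None` for that deal instead of shifting the later images. Each positional lookup walks the deal list again, so for a large feed it may be cheaper to loop over the deal elements themselves and look up the title and image relative to each one.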

Dimitre Novatchev
  • I get `Warning DOMXPath::evaluate() domxpath.evaluate: Invalid expression` when I use `[1, count(/deals/deal)]` inside the for loop. – ggirtsou Jul 28 '12 at 21:35
  • @ggirtsou: What I am saying in this answer is that you must first evaluate this expression: `count(/deals/deal)`. Then, in a PHP loop from 1 to the count so obtained, you issue the two XPath expressions, in which `$k` must be substituted with the current value of the loop counter. – Dimitre Novatchev Jul 28 '12 at 21:40
  • Well `count(/deals/deal)` returns 1521 which is available through `$total_deals = $deal_title_tag->length;` as well. – ggirtsou Jul 28 '12 at 21:44
  • Great, then organize the loop and evaluate the two XPath expressions inside its body. I don't know PHP, but in C# this could be something like: `for (int k = 1; k <= totalDeals; k++) { var result1 = doc.SelectSingleNode(string.Format("/deals/deal[{0}]/deal_title", k)); var result2 = doc.SelectSingleNode(string.Format("/deals/deal[{0}]/deal_image", k)); }` – Dimitre Novatchev Jul 28 '12 at 22:04
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/14574/discussion-between-ggirtsou-and-dimitre-novatchev) – ggirtsou Jul 28 '12 at 23:53
  • thank you so much for your help! You helped me get it working! Wish I could upvote 1 million times your answer! – ggirtsou Jul 29 '12 at 03:27
0

I think you should try it this way:

  1. Walk through the /deals/deal/deal_id tag values.
  2. Then search for a pair of tags: /deals/deal[deal_id="$deal_id"]/deal_title and /deals/deal[deal_id="$deal_id"]/deal_image (using the real deal_id in place of $deal_id).

You will get a pair of deal_title and deal_image for each deal, and they will match each other correctly.
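A rough Python sketch of this idea, assuming every deal carries a unique `deal_id` child whose text contains no quote characters. The feed, the id values, and the function name `pairs_by_deal_id` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed in which every deal has a unique deal_id.
FEED_WITH_IDS = """
<deals>
  <deal><deal_id>101</deal_id><deal_title>deal title1</deal_title><deal_image>img1.jpg</deal_image></deal>
  <deal><deal_id>102</deal_id><deal_title>deal title2</deal_title></deal>
  <deal><deal_id>103</deal_id><deal_title>deal title3</deal_title><deal_image>img3.jpg</deal_image></deal>
</deals>
"""


def pairs_by_deal_id(xml_text):
    """Return (deal_id, title, image) per deal, looking both tags up by id."""
    root = ET.fromstring(xml_text)  # root is the <deals> element
    rows = []
    for id_element in root.findall("./deal/deal_id"):
        deal_id = id_element.text
        # The predicate selects the deal whose deal_id child has this text.
        title = root.find(f"./deal[deal_id='{deal_id}']/deal_title")
        image = root.find(f"./deal[deal_id='{deal_id}']/deal_image")
        rows.append((
            deal_id,
            title.text if title is not None else None,
            image.text if image is not None else None,
        ))
    return rows


for row in pairs_by_deal_id(FEED_WITH_IDS):
    print(row)
```

This only works for feeds that actually provide such an id, and each lookup by id scans the deals again, so it is likely to be slow on a large feed.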

dmvrtx
  • Thank you very much for your answer goodguy. The thing is I'm using this approach for other XML Feeds as well and not all feeds have `deal_id` set. – ggirtsou Jul 28 '12 at 21:15