In some cases this works fine; in others, like the one below, it does not.

$xml_url = 'http://campusdining.compass-usa.com/Hofstra/Pages/SignageXML.aspx?location=Student%20Center%20Cafe';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.3a5pre) Gecko/20100526 Firefox/3.7a5pre");
$data = curl_exec($ch);
$ce = curl_error($ch);
curl_close($ch);

// this is how I was doing it prior to today and it worked before
// preg_match_all("/<MealPeriod name=\"(.+?)\">([\w\W\r\n]*?)<\/MealPeriod>/i", $data, $output_array);

// this way doesn't show all the meal periods,
// but I need to know what's in between the MealPeriod tags
// preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array); 

// shows all the meal period names,
// but I need the above to work to store what's in between the MealPeriod tags in $output_array[2]
preg_match_all('/<MealPeriod name="(.*?)">/i', $data, $output_array); 

echo '<pre> '.print_r($output_array[1],1).'</pre>';

I tried this on two live regex testers; one returned what I needed and the other did not:
http://www.phpliveregex.com/ -- did work
https://regex101.com/ -- did not work

The expected output for $output_array[1] is the following:

 Array
(
    [0] => Breakfast
    [1] => Every Day
    [2] => Outtakes
    [3] => Salad Bar
)

But $output_array[2] should also hold what's in between the MealPeriod tags.

Any help would be greatly appreciated

Yohn
  • How is it not working? What do the wrong results look like? For what it's worth, the regex worked fine on Rey: http://rey.gimenez.biz/s/14hjon – Daniel Gimenez Feb 27 '15 at 19:20
  • When I do ``preg_match_all('/(.*?)<\/MealPeriod>/i', $data, $output_array); `` and then ``print_r($output_array[1]);`` I'm only seeing Breakfast show up, when it should show Every Day, Outtakes, and Salad Bar as well. – Yohn Feb 27 '15 at 19:22
  • Well, that's a different regular expression than the one in your question. It still works, though. http://rey.gimenez.biz/s/kc7i6v – Daniel Gimenez Feb 27 '15 at 19:28
  • Actually, in that newest link it's only showing the Breakfast meal period, when it needs to display all 4: Breakfast, Every Day, Outtakes, and Salad Bar – Yohn Feb 27 '15 at 19:33

2 Answers

The code below works; all I did was change the regex and the printing.

The output on screen looks rather odd because the second (.*?), which captures everything between <MealPeriod> and </MealPeriod>, also captures all the XML tags, and the browser treats those as markup instead of displaying them. If you view the page source, you can clearly see this.

I would encourage you to use an XML parser to work with the document. I have certainly used regex to extract portions of XML documents before handing them to a parser to convert into objects, but a parser is far better equipped to work with XML than regex is.

Everything is captured, but wrapping the output in <pre> tags is not enough to display it on screen. If you view the page source, everything is there.
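To see the captured markup on screen rather than only in the page source, one option is to escape the dump before printing it. This is a minimal sketch, not part of the original answer; the sample string and its Item element are hypothetical stand-ins for the fetched feed.

```php
<?php
// Hypothetical sample standing in for the fetched feed; the real element names may differ.
$data = '<MealPeriod name="Breakfast"><Item>Eggs</Item></MealPeriod>';

preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array);

// Escape the dump so the browser shows the captured tags as text instead of parsing them.
$escaped = htmlspecialchars(print_r($output_array, true), ENT_QUOTES, 'UTF-8');
echo '<pre>' . $escaped . '</pre>';
```

With the escaping in place, the captured tags show up as literal text inside the <pre> block.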

<?php
$xml_url = 'http://campusdining.compass-usa.com/Hofstra/Pages/SignageXML.aspx?location=Student%20Center%20Cafe';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.3a5pre) Gecko/20100526 Firefox/3.7a5pre");
$data = curl_exec($ch);
$ce = curl_error($ch);
curl_close($ch);

// this is how I was doing it prior to today and it worked before
// preg_match_all("/<MealPeriod name=\"(.+?)\">([\w\W\r\n]*?)<\/MealPeriod>/i", $data, $output_array);

// this way doesn't show all the meal periods,
// but I need to know what's in between the MealPeriod tags
// preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array); 

// capture the meal period names in $output_array[1]
// and what's in between the MealPeriod tags in $output_array[2]
preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array); 

echo '<pre> '.print_r($output_array,1).'</pre>';
?>
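The XML parser suggestion above could be sketched roughly as follows with PHP's SimpleXML extension. The sample document is hypothetical: the Menu root and the Item children are assumptions, since the real feed's structure is not shown here, and a feed that uses XML namespaces would need extra handling.

```php
<?php
// Hypothetical sample; the real feed's root and child element names are assumptions.
$data = <<<XML
<Menu>
  <MealPeriod name="Breakfast">
    <Item>Eggs</Item>
  </MealPeriod>
  <MealPeriod name="Salad Bar">
    <Item>Greens</Item>
    <Item>Tomatoes</Item>
  </MealPeriod>
</Menu>
XML;

$xml = simplexml_load_string($data);
if ($xml === false) {
    exit('Could not parse the XML');
}

$periods = [];
// xpath() finds every MealPeriod element regardless of nesting depth or line breaks.
foreach ($xml->xpath('//MealPeriod') as $period) {
    $name = (string) $period['name'];
    $items = [];
    foreach ($period->Item as $item) {
        $items[] = (string) $item;
    }
    $periods[$name] = $items;
}

print_r($periods);
```

Keying the result by meal period name keeps each name together with its contents, which is what $output_array[1] and $output_array[2] were being used for.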
Regular Jo

I found the answer thanks to the following Stack Overflow post: php regex or | operator

I needed to change the regex to the following, and I was finally able to return all the meal periods and their contents in the correct array.

'/<MealPeriod name="(.*?)">(.*?)<\/?MealPeriod>/i'

Hence the ? in <\/?Meal, which makes the slash optional.
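A related point that may be worth checking, offered as an assumption since the feed itself is not shown here: by default "." does not match newlines, so a plain (.*?) cannot span a MealPeriod element that is split across lines, whereas the older [\w\W\r\n]*? could. The sketch below uses hypothetical multi-line sample data to compare the pattern with and without the s (PCRE_DOTALL) modifier.

```php
<?php
// Hypothetical multi-line sample; the real feed's layout is an assumption.
$data = "<MealPeriod name=\"Breakfast\"><Item>Eggs</Item></MealPeriod>\n"
      . "<MealPeriod name=\"Every Day\">\n<Item>Coffee</Item>\n</MealPeriod>\n";

$pattern = '<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>';

// Without "s", "." stops at newlines, so the multi-line element is skipped.
preg_match_all('/' . $pattern . '/i', $data, $without);

// With "s" (PCRE_DOTALL), "." also matches newlines.
preg_match_all('/' . $pattern . '/is', $data, $with);

print_r($without[1]); // names matched without the modifier
print_r($with[1]);    // names matched with the modifier
```

On this sample, only Breakfast is matched without the modifier, while both meal periods are matched with it.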

Yohn