
I'm trying to group the results I get by date.

Please refer to my previous question, "How to ignore http link in string and return everything else?"

Right now I get the schedule list, but it doesn't include any dates, so it's hard to tell which event goes live on which date and time. This confuses people, because several events show the same time even though they actually go live on different dates.

From the previous question I got a solution that works perfectly (thanks, Denomales!), but it doesn't include the date.

Here's the solution regex:

```
<font(?=\s|>)(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\scolor=['"]?green['"]?)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>\s*(?:Stream\s*)?((?:(?!<\/font>).)*)<\/font>\s*[^<]*?([^<]+)\s+(\d+.\d+\s*\w{2}\s*-\s*\d+.\d+\s*\w{2})[^<]*?<font(?=\s|>)(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\scolor=['"]?gold['"]?)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>(?:Stream\s*)?((?:(?!\s*https?:|<\/font>).)*)
```
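A quick check of the time-range group in that regex, `(\d+.\d+\s*\w{2}\s*-\s*\d+.\d+\s*\w{2})`: because the `.` between the digits is unescaped, it matches any character, so both the `9.00pm` and `10:00AM` styles in the sample data get through. This is a sketch in Python purely for illustration (the same pattern should behave the same way with PHP's `preg_match`); `split_range` and `normalize_time` are hypothetical helpers, and lowercasing plus turning `.` into `:` is an assumption based on the `9:00pm` style in the expected output.

```python
import re

# Time-range group copied verbatim from the regex above. The "." between the
# digits is unescaped, so it accepts any character, including ":".
TIME_RANGE = re.compile(r"(\d+.\d+\s*\w{2}\s*-\s*\d+.\d+\s*\w{2})")


def normalize_time(t: str) -> str:
    """'9.00pm' / '10:00AM' -> '9:00pm' / '10:00am' (hypothetical helper)."""
    return t.strip().lower().replace(".", ":")


def split_range(text: str):
    """Return (start, end) for the first time range in text, or None."""
    m = TIME_RANGE.search(text)
    if m is None:
        return None
    start, end = m.group(1).split("-", 1)
    return normalize_time(start), normalize_time(end)


print(split_range("dots and more 9.00pm-5.00pm <font"))   # ('9:00pm', '5:00pm')
print(split_range("dots and more 10:00AM-12:00pm <font"))  # ('10:00am', '12:00pm')
```

If only `.` and `:` should be accepted as separators, `[.:]` in place of the bare `.` is the stricter choice.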

And here's the sample data:

```
<font color="black" size="6">---</font><p>
<font color="red" size="6">FRIDAY 6TH SEPTEMBER</font><p>
<font color="gold"> *ENGLISH* </font> Some event with quotes, comma, slashes, dots and more 9.00pm-5.00pm <font color="red">Channel 18</font><p>
<font color="gold"> *ITALIAN* </font> Some event with quotes, comma, slashes, dots and more 9.50pm-10.00pm <font color="red">Channel 02</font><p>
<font color="gold"> *ENGLISH* </font> Some event with quotes, comma, slashes, dots and more 10:00AM-12:00pm <font color="red">Channel 05</font><p>
<font color="gold"> *JAPANESE* </font> Some Event Name 11.20am-1.20pm <font color="red">CHANNEL IP 2 STREAM http://domain.com/abc/channel2.html</font><p>
<font color="black" size="6">---</font><p>
<font color="red" size="6">FRIDAY 7TH SEPTEMBER</font><p>
<font color="gold"> *ENGLISH* </font> Some event with quotes, comma, slashes, dots and more 9.00pm-5.00pm <font color="red">Channel 18</font><p>
<font color="gold"> *ITALIAN* </font> Some event with quotes, comma, slashes, dots and more 9.50pm-10.00pm <font color="red">Channel 02</font><p>
<font color="gold"> *ENGLISH* </font> Some event with quotes, comma, slashes, dots and more 10:00AM-12:00pm <font color="red">Channel 05</font><p>
<font color="gold"> *JAPANESE* </font> Some Event Name 11.20am-1.20pm <font color="red">CHANNEL IP 2 STREAM http://domain.com/abc/channel2.html</font><p>
```

Now I'm trying to get each date (e.g. FRIDAY 6TH SEPTEMBER) in YYYY-MM-DD format, followed by that day's event schedule.
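Turning a header such as `FRIDAY 6TH SEPTEMBER` into `YYYY-MM-DD` mostly means dropping the weekday and the ordinal suffix, then parsing the day and month. The headers carry no year, so one has to be supplied; 2013 below is an assumption taken from the expected output. A minimal Python sketch (`header_to_iso` is a hypothetical name); in PHP, `DateTime::createFromFormat` could play the role of `strptime`.

```python
import re
from datetime import datetime


def header_to_iso(header: str, year: int) -> str:
    """Convert 'FRIDAY 6TH SEPTEMBER' to 'YYYY-MM-DD' (hypothetical helper).

    The headers on the page have no year, so the caller must supply it.
    """
    # Skip the weekday; capture the day number (dropping st/nd/rd/th) and the month name.
    m = re.search(r"(\d{1,2})(?:st|nd|rd|th)?\s+([a-z]+)", header, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised date header: {header!r}")
    day, month = m.group(1), m.group(2).capitalize()
    # %B expects the full English month name (default C locale).
    parsed = datetime.strptime(f"{day} {month} {year}", "%d %B %Y")
    return parsed.strftime("%Y-%m-%d")


print(header_to_iso("FRIDAY 6TH SEPTEMBER", 2013))  # 2013-09-06
```

The weekday is ignored, so this won't flag a header whose weekday doesn't match the date.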

Expected output (example):

```
Array(
  ['2013-09-06'] => Array (
    [0] => Array (
      'language'   => 'ENGLISH',
      'title'      => 'Some event name',
      'startTime'  => '9:00pm',
      'endTime'    => '5:00pm',
      'channel'    => 'channel 18',
      'channelNum' => '18'
    ),
    [1] => Array (
      'language'   => 'ITALIAN',
      'title'      => 'Some event name',
      'startTime'  => '12:00pm',
      'endTime'    => '2:00pm',
      'channel'    => 'Channel IP 2',
      'channelNum' => '2'
    ),
    [2] => Array (
      'language'   => 'ENGLISH',
      'title'      => 'Some event name',
      'startTime'  => '6:00pm',
      'endTime'    => '8:00pm',
      'channel'    => 'channel 20',
      'channelNum' => '20'
    ),
  ),
  ['2013-09-07'] => Array (
    [0] => Array (
      'language'   => 'ENGLISH',
      'title'      => 'Some event name',
      'startTime'  => '9:00pm',
      'endTime'    => '5:00pm',
      'channel'    => 'channel 18',
      'channelNum' => '18'
    ),
    [1] => Array (
      'language'   => 'ITALIAN',
      'title'      => 'Some event name',
      'startTime'  => '12:00pm',
      'endTime'    => '2:00pm',
      'channel'    => 'Channel IP 2',
      'channelNum' => '2'
    ),
    [2] => Array (
      'language'   => 'ENGLISH',
      'title'      => 'Some event name',
      'startTime'  => '6:00pm',
      'endTime'    => '8:00pm',
      'channel'    => 'channel 20',
      'channelNum' => '20'
    ),
  ),
)
```

The example output is just made up to show the structure I want; it isn't real data.
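One regex-only way to get this grouped structure is to split the page on the date headers and then match the events inside each chunk, so every event lands under the most recent header. The sketch below is Python for illustration (PHP's `preg_split` with `PREG_SPLIT_DELIM_CAPTURE` works along similar lines); the `HEADER` and `EVENT` patterns are assumptions written against the sample data in this question, not the live page, `parse_schedule` is a hypothetical name, and the field names simply mirror the expected output. `SAMPLE` is a trimmed copy of the sample data.

```python
import re
from datetime import datetime

# Assumed patterns, written against the sample data in the question.
HEADER = re.compile(
    r'<font color="red" size="6">\s*\w+\s+(\d{1,2})(?:st|nd|rd|th)?\s+(\w+)\s*</font>', re.I)
EVENT = re.compile(
    r'<font color="gold">\s*\*?(?P<lang>[^*<]+?)\*?\s*</font>\s*'
    r'(?P<title>.*?)\s+'
    r'(?P<start>\d{1,2}[.:]\d{2}\s*[ap]m)\s*-\s*(?P<end>\d{1,2}[.:]\d{2}\s*[ap]m)\s*'
    r'<font color="red">(?P<channel>.*?)(?:\s+STREAM)?(?:\s+https?://\S+)?\s*</font>',
    re.I)


def parse_schedule(html: str, year: int) -> dict:
    """Group events under the date header that precedes them (hypothetical helper)."""
    # With two capturing groups, split() yields:
    # [before first header, day, month, chunk, day, month, chunk, ...]
    parts = HEADER.split(html)
    schedule = {}
    for i in range(1, len(parts), 3):
        day, month, chunk = parts[i], parts[i + 1], parts[i + 2]
        # The page has no year, so the caller supplies one.
        date = datetime.strptime(
            f"{day} {month.capitalize()} {year}", "%d %B %Y").strftime("%Y-%m-%d")
        events = schedule.setdefault(date, [])
        for e in EVENT.finditer(chunk):
            channel = e.group("channel").strip()
            num = re.search(r"(\d+)\s*$", channel)  # trailing number, e.g. "02" -> "2"
            events.append({
                "language": e.group("lang").strip(),
                "title": e.group("title").strip(),
                "startTime": e.group("start").replace(".", ":").lower(),
                "endTime": e.group("end").replace(".", ":").lower(),
                "channel": channel,
                "channelNum": str(int(num.group(1))) if num else None,
            })
    return schedule


SAMPLE = """<font color="red" size="6">FRIDAY 6TH SEPTEMBER</font><p>
<font color="gold"> *ENGLISH* </font> Some event 9.00pm-5.00pm <font color="red">Channel 18</font><p>
<font color="gold"> *JAPANESE* </font> Some Event Name 11.20am-1.20pm <font color="red">CHANNEL IP 2 STREAM http://domain.com/abc/channel2.html</font><p>
<font color="black" size="6">---</font><p>
<font color="red" size="6">FRIDAY 7TH SEPTEMBER</font><p>
<font color="gold"> *ITALIAN* </font> Some event 9.50pm-10.00pm <font color="red">Channel 02</font><p>
"""

for date, events in parse_schedule(SAMPLE, 2013).items():
    print(date, [(e["language"], e["startTime"], e["channel"]) for e in events])
# 2013-09-06 [('ENGLISH', '9:00pm', 'Channel 18'), ('JAPANESE', '11:20am', 'CHANNEL IP 2')]
# 2013-09-07 [('ITALIAN', '9:50pm', 'Channel 02')]
```

This assumes each event sits on a single line (`.` in the title group doesn't cross newlines), which holds for the sample but may not for the real page.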

Can anyone help? I would really appreciate it.

Note: I don't want to use an HTML parsing library, so please don't recommend one unless you have a solution that is much better than the regex I have right now.

Asked by D-M
  • Is there a specific reason you don't want to use HTML parsing? It feels like it'd be easier than regex for the given HTML - iterating over font tags would be very easy and you could then analyze them by their contents more easily (unless the page's HTML doesn't validate and breaks the PHP DOM functions). – Collin Grady Sep 06 '13 at 18:52
  • Well, I feel like regex is straightforward, and yes, the HTML doesn't validate; it's totally messy (no closing tags and so on). I did try HTML parsing, but it was making this more complicated (maybe I just didn't understand it well enough). But as I said, if someone has a solution using an HTML parsing library, I can switch to that (as long as it isn't complicated :D). – D-M Sep 06 '13 at 19:29
  • **Don't use regular expressions to parse HTML. Use a proper HTML parsing module.** You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php or [this SO thread](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester Sep 06 '13 at 19:52
  • @AndyLester: But I'm unable to parse the page and get the output I want using an HTML parser :| Do you know how to do it? Please post a solution if you can; it would be really helpful! – D-M Sep 06 '13 at 20:20
  • There are many examples out there about how to do it. I just posted a link to one, but there are many many many others. Look for DOMDocument. Try what you can, and if it doesn't work, post a new question based on that non-working parser. – Andy Lester Sep 06 '13 at 20:43
  • Can you give us the URL where you get this HTML data? This HTML is a disaster, no `` closing etc. – Glavić Sep 07 '13 at 12:43
  • Yep, I know. You can check it out [here](http://bit.ly/PLpc9t). – D-M Sep 07 '13 at 18:29
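Following up on the parser suggestion in the comments: a lenient tokenizer copes with the unclosed `<p>` tags, and the grouping then becomes a walk over `<font>` elements in document order. This sketch uses Python's standard-library `html.parser`, which does not validate, so malformed markup is tolerated (in PHP, `DOMDocument::loadHTML` would be the usual starting point). `FontTokenizer` and `group_by_day` are hypothetical names, and the rules (red with size 6 = date header, gold = language, plain red = channel) are assumptions read off the sample data.

```python
import re
from datetime import datetime
from html.parser import HTMLParser


class FontTokenizer(HTMLParser):
    """Flatten the page into ('font', color, size, text) and ('text', text) tokens.

    html.parser does not validate, so unclosed tags such as <p> are simply ignored.
    Nested <font> tags are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._font = None  # (color, size, text pieces) while inside a <font>

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            a = dict(attrs)
            self._font = ((a.get("color") or "").lower(), a.get("size"), [])

    def handle_endtag(self, tag):
        if tag == "font" and self._font is not None:
            color, size, pieces = self._font
            self.tokens.append(("font", color, size, "".join(pieces).strip()))
            self._font = None

    def handle_data(self, data):
        if self._font is not None:
            self._font[2].append(data)
        elif self.tokens and self.tokens[-1][0] == "text":
            self.tokens[-1] = ("text", self.tokens[-1][1] + data)  # merge split runs
        else:
            self.tokens.append(("text", data))


DAY = re.compile(r"(\d{1,2})(?:st|nd|rd|th)?\s+([a-z]+)", re.I)
TIMES = re.compile(r"(\d{1,2}[.:]\d{2}\s*[ap]m)\s*-\s*(\d{1,2}[.:]\d{2}\s*[ap]m)", re.I)
URL = re.compile(r"\s*(?:STREAM\s*)?https?://\S+", re.I)


def group_by_day(html: str, year: int) -> dict:
    """Walk the tokens: red size-6 font = date, gold = language, red = channel."""
    tokenizer = FontTokenizer()
    tokenizer.feed(html)
    tokenizer.close()
    schedule, date, event = {}, None, None
    for tok in tokenizer.tokens:
        if tok[0] == "text":
            # Loose text between the gold and red fonts holds the title and times.
            m = TIMES.search(tok[1])
            if event is not None and m:
                event["title"] = tok[1][:m.start()].strip()
                event["startTime"] = m.group(1).replace(".", ":").lower()
                event["endTime"] = m.group(2).replace(".", ":").lower()
            continue
        _, color, size, text = tok
        if color == "red" and size == "6":
            m = DAY.search(text)
            date, event = None, None
            if m:
                day = datetime.strptime(f"{m.group(1)} {m.group(2).capitalize()} {year}", "%d %B %Y")
                date = day.strftime("%Y-%m-%d")
                schedule.setdefault(date, [])
        elif color == "gold" and date:
            event = {"language": text.strip("* ")}
        elif color == "red" and event is not None and "title" in event:
            channel = URL.sub("", text).strip()  # drop "STREAM http://..."
            num = re.search(r"(\d+)\s*$", channel)
            event["channel"] = channel
            event["channelNum"] = str(int(num.group(1))) if num else None
            schedule[date].append(event)
            event = None
    return schedule
```

Compared with one big regex, the markup quirks stay in the tokenizer and the grouping rules are plain code, which may be easier to adjust if the page layout shifts.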
