
I want to extract 2012-07-16T21:00:00 from the title attribute of this element:

 <abbr title="2012-07-16T21:00:00" class="dtstart">Monday, July 16th, 2012</abbr>

I am having some difficulties, though. This is what I tried:

preg_match('/<abbr title="(.*)" \/>/i', $file_string, $time);
$time_out = $time[1];
EnexoOnoma
    Please refrain from parsing HTML with RegEx as it will [drive you insane](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) instead. – Madara's Ghost Jul 31 '12 at 15:11

4 Answers


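Use

preg_match('/<abbr title="([^"]*)"/i', $file_string, $time);

so that the match stops at the first " ([^"] means any character except a double quote). Also drop the trailing \/> from your pattern: the element has a class attribute after title and is not self-closing, so that part never matches.

or

preg_match('/<abbr title="([0-9T:-]*)"/i', $file_string, $time);

which is more precise: the group only contains the characters you need to capture (digits, T, colon and hyphen; note that " is excluded).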
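The difference between the greedy (.*) from the question and [^"]* is easy to see on the sample element. This is a small sketch that assumes $file_string holds just that one element; the patterns end right after the closing quote of title, since the real element is not self-closing.

```php
<?php
// Sample markup from the question.
$file_string = '<abbr title="2012-07-16T21:00:00" class="dtstart">Monday, July 16th, 2012</abbr>';

// Greedy (.*) runs on to the LAST double quote that still lets the pattern match.
preg_match('/<abbr title="(.*)"/i', $file_string, $greedy);

// [^"]* cannot cross a double quote, so it stops at the end of the title value.
preg_match('/<abbr title="([^"]*)"/i', $file_string, $negated);

// The narrower class only allows digits, T, colon and hyphen.
preg_match('/<abbr title="([0-9T:-]*)"/i', $file_string, $precise);

echo $greedy[1], "\n";  // 2012-07-16T21:00:00" class="dtstart
echo $negated[1], "\n"; // 2012-07-16T21:00:00
echo $precise[1], "\n"; // 2012-07-16T21:00:00
```

The greedy version swallows the class attribute too, which is exactly the problem the negated class avoids.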

Joseph Silber
Arcadien

While I don't think using a regex for this is the best approach, it might be OK in some circumstances.

If you're using a regex, this is what you need:

preg_match('/<abbr title="([^"]*)"/i', $file_string, $time);

See it here in action: http://viper-7.com/qZu9tj
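One thing the question's code skips is checking whether preg_match() found anything before reading $time[1]. Here is a minimal sketch that wraps the pattern above in a guard; the function name extractAbbrTitle is hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: returns the title value, or null if no <abbr title="..."> is found.
function extractAbbrTitle(string $html): ?string
{
    // preg_match() returns 1 on a match, 0 on no match and false on error.
    if (preg_match('/<abbr title="([^"]*)"/i', $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

$file_string = '<abbr title="2012-07-16T21:00:00" class="dtstart">Monday, July 16th, 2012</abbr>';
var_dump(extractAbbrTitle($file_string));        // string(19) "2012-07-16T21:00:00"
var_dump(extractAbbrTitle('<span>none</span>')); // NULL
```

Returning null on no match avoids the "Undefined offset" notice you would get from reading $time[1] when nothing matched.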

Joseph Silber

Try it this way instead of regex:

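$dom = new DOMDocument;
// loadXML() needs well-formed XML; this assumes the <abbr> is the root element
$dom->loadXML($file_string);

// Get a SimpleXMLElement for the root element
$abbr = simplexml_import_dom($dom);

$time = '';
// Look through the root element's attributes for "title"
foreach ($abbr[0]->attributes() as $key => $value)
{
    if ($key === 'title')
    {
        // Cast the SimpleXMLElement attribute to a plain string
        $time = (string) $value;
        break;
    }
}
echo $time;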

Regex can be a pain for dealing with this sort of thing. Better to use a parser.
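If the string really is well-formed XML with the <abbr> as its root (the same assumption loadXML() makes), SimpleXML can also read the attribute directly with array syntax, without the loop. A sketch under that assumption, using simplexml_load_string() instead of DOMDocument plus simplexml_import_dom():

```php
<?php
$file_string = '<abbr title="2012-07-16T21:00:00" class="dtstart">Monday, July 16th, 2012</abbr>';

// Assumes $file_string is well-formed XML whose root element is the <abbr>.
$abbr = simplexml_load_string($file_string);

// SimpleXML exposes attributes with array syntax; cast to string to get the value.
$time = (string) $abbr['title'];

echo $time, "\n";
```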

Stegrex

The best way would be to use an HTML parser, like PHP's DOM.

<?php

    $html = <<<HTML
<abbr title="2012-07-16T21:00:00" class="dtstart">Monday, July 16th, 2012</abbr>
HTML;

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $abbr  = $dom->getElementsByTagName("abbr")->item(0);
    $title = $abbr->getAttribute("title");

    echo $title;

That will work even if your data doesn't look exactly like that:

  • If there are other attributes before or after title.
  • If there are trailing spaces or other invisible characters.
  • Regardless of quote type (", ', or none).
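The robustness points in the list can be checked with a small sketch. abbrTitle is a hypothetical helper, and the markup variants are made up to exercise attribute order, extra whitespace and quote style; libxml_use_internal_errors() is there only to keep libxml warnings about imperfect markup out of the output.

```php
<?php
// Hypothetical helper: parse a fragment with DOMDocument and read the first <abbr>'s title.
function abbrTitle(string $html): ?string
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $abbr = $dom->getElementsByTagName('abbr')->item(0);
    return $abbr === null ? null : $abbr->getAttribute('title');
}

// Variations: attribute order, extra whitespace, single quotes, no quotes.
$variants = [
    '<abbr title="2012-07-16T21:00:00" class="dtstart">Monday</abbr>',
    '<abbr class="dtstart" title="2012-07-16T21:00:00">Monday</abbr>',
    '<abbr   title="2012-07-16T21:00:00"   class="dtstart" >Monday</abbr>',
    "<abbr title='2012-07-16T21:00:00'>Monday</abbr>",
    '<abbr title=2012-07-16T21:00:00>Monday</abbr>',
];

foreach ($variants as $html) {
    echo abbrTitle($html), "\n";
}
```

Each variant should print the same timestamp, which is where the single regex pattern from the question falls apart.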

So please, don't use RegEx, as it will eventually cause you to lose your mind to Cthulhu. The <center> cannot hold it is too late.

Madara's Ghost