I have the following PHP regex code. I want to extract the stock symbol from some HTML output.

The stock symbol I want to extract follows /q?s= in the link, i.e. /q?s=XXXX, where XXXX (the stock symbol) can be 1 to 5 characters long.

if (preg_match_all('~(?<=q\?s=)[-A-Z.]{1,5}~', $html, $out)) {
    $out[0] = array_unique($out[0]);
} else {
    echo "FAIL";
}

HTML code below (case 1 and case 2 are the inputs I applied this to)

case #1 (does *not* work)
<a href="/q?s=BLCM" symbol="BLCM">Bellicum Pharmaceuticals, Inc.</a>

case #2 (does work correctly)                          
 <a href="/q?s=NYLD">NYLD</a>

I'm looking for suggestions on how I can update my PHP regex code so that it works for both case 1 and case 2. Thanks.

ChicagoDude

2 Answers

Instead of using regex, make effective use of DOM and XPath to do this for you.

$doc = new DOMDocument;
@$doc->loadHTML($html); // load the HTML data

$xpath = new DOMXPath($doc);
$links = $xpath->query('//a[substring(@href, 1, 5) = "/q?s="]');

$results = []; // initialise so print_r() works even when no link matches
foreach ($links as $link) {
   $results[] = str_replace('/q?s=', '', $link->getAttribute('href'));
}

print_r($results);

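As a rough sketch of how the approach above might be wrapped up for reuse, the snippet below runs it against both cases from the question. The function name `extractSymbolsFromLinks` is hypothetical, and swapping `@` for `libxml_use_internal_errors()` and `substring()` for XPath's `starts-with()` are my own substitutions, not something the question requires.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the question):
// returns the unique symbols found in links whose href starts with "/q?s=".
function extractSymbolsFromLinks(string $html): array
{
    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $symbols = [];
    foreach ($xpath->query('//a[starts-with(@href, "/q?s=")]') as $link) {
        // "/q?s=" is 5 characters long, so the symbol starts at offset 5.
        $symbols[] = substr($link->getAttribute('href'), 5);
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($symbols));
}

// The two cases from the question, joined into one HTML fragment.
$html = '<a href="/q?s=BLCM" symbol="BLCM">Bellicum Pharmaceuticals, Inc.</a>'
      . ' <a href="/q?s=NYLD">NYLD</a>';

print_r(extractSymbolsFromLinks($html));
```

This assumes the symbol is the only thing after `/q?s=` in the href; if extra query parameters can follow, the `substr()` call would need adjusting.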

hwnd
  • Or, if the href always starts with the fixed prefix `/q?s=`, use `$results[] = substr($link->getAttribute('href'), 5);` – hwnd May 17 '15 at 03:12
The DOM answer is nice, but it seems like a lot of work and code to maintain, no? A capture group followed by the closing quote is enough:

if (preg_match_all('/q\?s=(\S{1,5})\"/', $html, $match)) {
    $symbols = array_unique($match[1]);
}

Or even shorter, without the length limit: '/q\?s=(\S+)\"/'
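For what it's worth, here is a small sketch that runs the first pattern against both cases from the question. The wrapper function `extractSymbolsWithRegex` is a hypothetical name, and the sketch assumes every href is wrapped in double quotes, as in the two samples.

```php
<?php
// Hypothetical wrapper (the name is illustrative): returns the unique symbols
// captured between "q?s=" and the closing double quote of the href.
function extractSymbolsWithRegex(string $html): array
{
    // \S{1,5} is greedy, but the trailing \" forces it to backtrack so that
    // the capture stops just before the closing quote.
    if (!preg_match_all('/q\?s=(\S{1,5})\"/', $html, $match)) {
        return [];
    }

    // $match[1] holds the captured group; drop duplicates and reindex from 0.
    return array_values(array_unique($match[1]));
}

// The two cases from the question.
$case1 = '<a href="/q?s=BLCM" symbol="BLCM">Bellicum Pharmaceuticals, Inc.</a>';
$case2 = '<a href="/q?s=NYLD">NYLD</a>';

print_r(extractSymbolsWithRegex($case1 . "\n" . $case2));
```

If hrefs may use single quotes or carry extra query parameters, the pattern would need to be adapted, for example by excluding `&` and quote characters from the captured class.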

mike.k