1

I am new to regular expressions.

I have this text:

$text =
'<ul style="list-style:none;">
  <li>
      <a href="files/docs/qwe.xls" target="_blank">Link1</a>
  </li>
  <li>
      <a href="files/docs/ere.xls" target="_blank">Link2</a>
  </li>
  <li>
      <a href="files/docs/123.xls" target="_blank">Link3</a>
  </li>
</ul>';

Using a regular expression, I want to get these arrays:

$filePath[0] = "files/docs/qwe.xls";
$fileName[0] = "Link1";
$filePath[1] = "files/docs/ere.xls";
$fileName[1] = "Link2";
$filePath[2] = "files/docs/123.xls";
$fileName[2] = "Link3";

How can I do it?

Thanks.

Abduhafiz
  • Ideally one should not parse HTML with regular expressions; even the slightest change in the text will tend to break your regexps. Please take a look at this: http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – ffledgling Oct 30 '13 at 10:45
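
For reference, the parser-based route suggested in the comment above might look roughly like this. It is a minimal sketch, assuming PHP's bundled DOM extension is available; the variable names `$filePath` and `$fileName` come from the question, everything else is illustrative.

```php
<?php
// Sample markup from the question.
$text = '<ul style="list-style:none;">
  <li>
      <a href="files/docs/qwe.xls" target="_blank">Link1</a>
  </li>
  <li>
      <a href="files/docs/ere.xls" target="_blank">Link2</a>
  </li>
  <li>
      <a href="files/docs/123.xls" target="_blank">Link3</a>
  </li>
</ul>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($text);

$filePath = array();
$fileName = array();

// Walk every <a> element and read its href attribute and text.
foreach ($doc->getElementsByTagName('a') as $a) {
    $filePath[] = $a->getAttribute('href');
    $fileName[] = trim($a->textContent);
}

print_r($filePath);
print_r($fileName);
```

Because the parser reads attributes by name, this keeps working if the attribute order or the whitespace in the markup changes.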

3 Answers

2

You need a simple regular expression.

Check this code:

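 $match = array();
 // Group 1 is the href value, group 2 is the link text.
 // [^>]* skips any further attributes such as target="_blank".
 preg_match_all('#<a href="([^"]*)"[^>]*>(.*?)</a>#s', $text, $match);
 print_r($match);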

(.*?) matches any characters non-greedily, that is, as few as possible.
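
To end up with the two arrays the question asks for, the capture groups can be read straight out of the result. The following is a minimal sketch, not a definitive implementation: the pattern used here matches the href value with `[^"]*` and skips any remaining attributes with `[^>]*`, on the assumption that the `<a>` tags may carry extra attributes such as `target`.

```php
<?php
// Sample markup from the question.
$text = '<ul style="list-style:none;">
  <li>
      <a href="files/docs/qwe.xls" target="_blank">Link1</a>
  </li>
  <li>
      <a href="files/docs/ere.xls" target="_blank">Link2</a>
  </li>
  <li>
      <a href="files/docs/123.xls" target="_blank">Link3</a>
  </li>
</ul>';

$match = array();
preg_match_all('#<a href="([^"]*)"[^>]*>(.*?)</a>#s', $text, $match);

// With the default PREG_PATTERN_ORDER, $match[0] holds the full matches,
// $match[1] every first group and $match[2] every second group.
$filePath = $match[1];
$fileName = $match[2];

print_r($filePath);
print_r($fileName);
```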

Robert
0

Use

$res = array();
preg_match_all('/href="(.+?)".*?>(.+?)<\/a>/ims', $text, $res);
var_dump($res);
TiMESPLiNTER
  • @Robert: Why define $res first? – TiMESPLiNTER Oct 30 '13 at 10:14
  • Because it will give an undefined-array error; moreover, it's a good habit to declare the variables that you're going to use. Btw, your code will fail with multiline input – Robert Oct 30 '13 at 10:14
  • But $res gets defined by preg_match_all, and it does not throw a PHP error at all, even if I enable E_ALL with error_reporting(). Multiline was not a requirement. – TiMESPLiNTER Oct 30 '13 at 10:19
  • `$res` is not defined by preg_match_all! Check the manual. preg_match_all takes that parameter as a reference to an object. A reference means that the object needs to exist before it's passed to the function. http://php.net/manual/en/function.preg-match-all.php – Robert Oct 31 '13 at 07:26
  • Okay, but then we have to fix the docs and their examples over there at php.net, don't we? – TiMESPLiNTER Oct 31 '13 at 07:32
  • I don't understand you. The declaration of the matches parameter is `array &$matches`, which means a reference to an array. A reference refers to an object that exists. – Robert Oct 31 '13 at 13:07
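
The question debated in these comments can be checked directly. In PHP, a variable passed by reference is created if it does not exist yet, so no notice is raised. The sketch below uses the pattern from this answer and deliberately leaves `$res` undeclared; the error handler that turns every notice or warning into an exception is an illustrative addition, not part of either commenter's code.

```php
<?php
error_reporting(E_ALL);

// Turn any notice or warning into an exception so nothing goes unnoticed.
set_error_handler(function ($severity, $message, $file, $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

$text = '<a href="files/docs/qwe.xls" target="_blank">Link1</a>';

// $res is deliberately NOT declared before the call.
$count = preg_match_all('/href="(.+?)".*?>(.+?)<\/a>/ims', $text, $res);

var_dump($count);          // number of full pattern matches
var_dump(is_array($res));  // whether preg_match_all created the array
print_r($res[1]);          // captured href values
```

Declaring `$res = array();` beforehand is still a reasonable readability habit; it is just not required for the call to work.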
0

Use lookarounds: they let you check whether something comes before or after the string you're looking for. Here is how it works:

/(?<=href=")[^"]*(?=")/

Here is what it means:

/ beginning
(?<=href=") preceded by href="
[^"]* any number of non-" characters
(?=") followed by "
/ end
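
Applied to the text from the question, the lookaround pattern could be used as sketched below. The pattern above only yields the href values, so the second pattern for the link text is an assumption added here for illustration: it looks for text preceded by `>` and followed by `</a>`.

```php
<?php
// Sample markup from the question.
$text = '<ul style="list-style:none;">
  <li>
      <a href="files/docs/qwe.xls" target="_blank">Link1</a>
  </li>
  <li>
      <a href="files/docs/ere.xls" target="_blank">Link2</a>
  </li>
  <li>
      <a href="files/docs/123.xls" target="_blank">Link3</a>
  </li>
</ul>';

// href values: preceded by href=" and followed by ".
preg_match_all('/(?<=href=")[^"]*(?=")/', $text, $paths);

// Link text (illustrative pattern): preceded by > and followed by </a>.
preg_match_all('/(?<=>)[^<>]+(?=<\/a>)/', $text, $names);

// No capture groups are needed: the lookarounds are not part of the
// match, so index 0 already holds just the wanted strings.
$filePath = $paths[0];
$fileName = $names[0];

print_r($filePath);
print_r($fileName);
```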

SteeveDroz