
I wrote this regex to get all the attributes of an "img" tag.

 /<img\s+(?:([a-z_-]+)\s*=\s*"(.*?)"\s*)*\s*\/>/g

But it captures only one attribute per tag: the last one.

How can I get all attributes with regex?

Test String:

 <img src="abc.png" alt="abc" />
 <img alt="def" src="def.png" />
 <img src="abc.png" alt="abc" style="border:none" />
 <img alt="def" src="def.png" style="border:none" />

Result: (with http://www.regex101.com)

 MATCH 1
 1. [19-22] `alt`
 2. [24-27] `abc`

 MATCH 2
 1. [47-50] `src`
 2. [52-59] `def.png`

 MATCH 3
 1. [93-98] `style`
 2. [100-111]   `border:none`

 MATCH 4
 1. [145-150]   `style`
 2. [152-163]   `border:none`
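
The result above can be reproduced outside regex101. The following is a minimal sketch in Python (an assumption; the question does not name a language) that applies the same pattern to the third test line. It suggests that the whole tag is matched, but that a capture group inside a repeated `(?:...)*` keeps only its final iteration.

```python
import re

# The pattern from the question, unchanged apart from Python string syntax.
PATTERN = re.compile(r'<img\s+(?:([a-z_-]+)\s*=\s*"(.*?)"\s*)*\s*/>')

html = '<img src="abc.png" alt="abc" style="border:none" />'

match = PATTERN.search(html)

# The overall match spans the whole tag, so every attribute was consumed ...
print(match.group(0))

# ... but groups 1 and 2 are overwritten on each repetition,
# so only the last attribute survives.
print(match.groups())  # ('style', 'border:none')
```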
WebEngine

2 Answers


I suggest using the \G anchor, which makes each match start where the previous one ended, so the attributes are matched one after another.

(?:<img|(?<!^)\G)\h*([\w-]+)="([^"]*)"(?=.*?\/>)

Get the attribute from group index 1 and get the value from group index 2.


$string = <<<EOT
 <img src="abc.png" alt="abc" />
 <img alt="def" src="def.png" />
 <img src="abc.png" alt="abc" style="border:none" />
 <img alt="def" src="def.png" style="border:none" />
EOT;
preg_match_all('~(?:<img|(?<!^)\G)\h*([\w-]+)="([^"]*)"(?=.*?\/>)~', $string, $match);
print_r($match[1]);
print_r($match[2]);

Output:

Array
(
    [0] => src
    [1] => alt
    [2] => alt
    [3] => src
    [4] => src
    [5] => alt
    [6] => style
    [7] => alt
    [8] => src
    [9] => style
)
Array
(
    [0] => abc.png
    [1] => abc
    [2] => def
    [3] => def.png
    [4] => abc.png
    [5] => abc
    [6] => border:none
    [7] => def
    [8] => def.png
    [9] => border:none
)
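
Not every regex engine supports `\G`; Python's standard `re` module, for example, does not. As a rough alternative (a swapped-in two-pass technique, not the `\G` approach above), the sketch below first finds each `img` tag and then collects the attributes inside it. The names `TAG`, `ATTR` and `img_attributes` are hypothetical, and the tag pattern assumes no attribute value contains `>`.

```python
import re

# Pass 1: find each <img ...> tag (assumes no '>' inside attribute values).
TAG = re.compile(r'<img\b[^>]*>')
# Pass 2: name="value" pairs, using the same groups as the \G pattern.
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def img_attributes(html):
    """Return one list of (name, value) pairs per <img> tag."""
    return [ATTR.findall(tag) for tag in TAG.findall(html)]


html = """
<img src="abc.png" alt="abc" />
<img alt="def" src="def.png" />
<img src="abc.png" alt="abc" style="border:none" />
<img alt="def" src="def.png" style="border:none" />
"""

result = img_attributes(html)
for attrs in result:
    print(attrs)
```

Unlike the flat arrays printed above, this keeps the attributes grouped by tag, which may be more convenient if each image is processed separately.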
Avinash Raj

Try this:

/(\w+)=["']([a-zA-Z0-9_.:'"]+)["']/

Remember that PHP does not support the g modifier; to get every match, use the preg_match_all() function instead.

Try it at: https://regex101.com/r/cQ8jT2/1
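
For a quick check of what this pattern does and does not accept, here is a small sketch in Python (an assumption, chosen only for illustration; `re.findall` plays the role of `preg_match_all()` here). The name `ATTR_SIMPLE` is hypothetical. Note that the value class has no hyphen or space, so some values are skipped.

```python
import re

# The pattern from this answer, unchanged.
ATTR_SIMPLE = re.compile(r'''(\w+)=["']([a-zA-Z0-9_.:'"]+)["']''')

# Works for the test strings in the question.
ok = ATTR_SIMPLE.findall('<img src="abc.png" alt="abc" style="border:none" />')
print(ok)  # [('src', 'abc.png'), ('alt', 'abc'), ('style', 'border:none')]

# A value containing '-' is outside the character class, so nothing matches.
missed = ATTR_SIMPLE.findall('<p style="font-size:15px">')
print(missed)  # []
```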

karmendra
  • As a side note, you might want to use something like `/([\w\-]+)=([^"'>]+|(['"]?)(?:[^\3]|\3+)+?\3)/` which matches `attr=value`, `attr='value'`, `attr=' value "value" '`, `attr="value"`, `attr="regex 'r' us"`, `attr="another ""`, `attr="different delimiters' match until one valid is found[...]` and a few different combinations. But it might not work. It was adapted from my comment (http://stackoverflow.com/questions/27988667/how-to-select-image-src-using-php/27988904?noredirect=1#comment44369639_27988904) on another answer. You can check it working here: https://regex101.com/r/sT9rT8/1 – Ismael Miguel Jan 22 '15 at 10:51
  • Also, you can't use `(\w+)` in the first group since `data-value` won't match. Use `([\w\-]+)` instead. – Ismael Miguel Jan 22 '15 at 10:55
  • Sorry for bothering, I tried to edit the other comment but it is too late. The real working regex is this: `([\w\-]+)=([^"'> ]+|(['"]?)(?:[^\3]|\3+)+?\3)` and you can check here: https://regex101.com/r/sT9rT8/2 – Ismael Miguel Jan 22 '15 at 10:58
  • @IsmaelMiguel In `[a-zA-Z0-9_.':"]` if you add a space it will take care of all following cases you mentioned except `attr=value`. regex `(\w+)=["']([a-zA-Z0-9_.:'" ]+)["']` will match `attr='value', attr=' value "value" ', attr="value", attr="regex 'r' us", attr="another "", attr="different delimiters'` – karmendra Jan 22 '15 at 14:25
  • The regex you provided isn't working properly. You can see the workings here: https://regex101.com/r/fF0dH6/1 which is failing for things like `style="font-size:15px"` and even `style="dorber:none;"`. – Ismael Miguel Jan 22 '15 at 15:15
  • @AvinashRaj The solution I've provided works regardless of the tag. One can extract the `<img>` tags and process them later. – Ismael Miguel Jan 22 '15 at 15:16