I want to use preg_match_all to extract the class and data-id attributes from HTML.

The example below works, but it returns either the class names or the data-id content, never both.

I want the example pattern to find both class and data-id content.

Which regex pattern should I use?

HTML contents:

<!-- I want to: $matches[1] == test_class  | $matches[2] == null -->
<div class="test_class"> 

<!-- I want to: $matches[1] == test_class | $matches[2] == 1 -->
<div class="test_class" data-id="1"> 

<!-- I want to: $matches[1] == test_class | $matches[2] == 1 -->
<div id="test_id" class="test_class" data-id="1">

<!-- I want to: $matches[1] == test_class test_class2 | $matches[2] == 1 -->
<div class="test_class test_class2" id="test_id" data-id="1">

<!-- I want to: $matches[1] == 1 | $matches[2] == test_class test_class2 -->
<div data-id="1" class="test_class test_class2" id="test_id" >

<!-- I want to: $matches[1] == 1 | $matches[2] == test_class test_class2 -->
<div id="test_id" data-id="1" class="test_class test_class2">

<!-- I want to: $matches[1] == test_class | $matches[2] == 1 -->
<div class="test_class" id="test_id" data-id="1">

The regex that does not work as I want:

$pattern = '/<(div|i)\s.*(class|data-id)="([^"]+)"[^>]*>/i';

preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

Thanks in advance.

Phil
Mert Aşan

2 Answers

Why not use a DOM parser instead?

You could use an XPath expression like //div[@class or @data-id] to locate the elements, then extract their attribute values.

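$doc = new DOMDocument();
$doc->loadHTML($html); // $html holds the markup to inspect

$xpath = new DOMXPath($doc);
// Select every <div> that has a class or a data-id attribute
$divs = $xpath->query('//div[@class or @data-id]');
foreach ($divs as $div) {
  // getAttribute() returns an empty string when the attribute is missing
  $matches = [$div->getAttribute('class'), $div->getAttribute('data-id')];
  print_r($matches);
}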

Demo ~ https://eval.in/1046227
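
The question asks for null when an attribute is absent, whereas getAttribute() returns an empty string in that case. A minimal sketch of one way to handle that, assuming a hypothetical helper named extractClassAndDataId (not part of the original answer) and closed <div> tags in the sample input:

```php
<?php
// Hypothetical helper: returns one row per <div>, with null for a missing attribute.
function extractClassAndDataId(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//div[@class or @data-id]') as $div) {
        $rows[] = [
            // hasAttribute() distinguishes "missing" from "present but empty".
            'class'   => $div->hasAttribute('class') ? $div->getAttribute('class') : null,
            'data-id' => $div->hasAttribute('data-id') ? $div->getAttribute('data-id') : null,
        ];
    }
    return $rows;
}

$html = '<div class="test_class"></div>'
      . '<div id="test_id" data-id="1" class="test_class test_class2"></div>';
$rows = extractClassAndDataId($html);
print_r($rows);
```

Keying each row by attribute name, rather than by position, avoids having to know which attribute came first in the markup.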

Phil
  • Thank you very much! Can I also parse CSS content with this method? – Mert Aşan Aug 10 '18 at 04:30
  • @MertA. I'm not really sure what you mean but _probably_ not; you cannot parse CSS with a DOM parser – Phil Aug 10 '18 at 04:33
  • Sorry, let me clarify: I am changing the class names in the HTML with this method, and changing the class names in the CSS file with separate code that uses regex in PHP. I'm wondering whether I could do both with DOM in PHP? – Mert Aşan Aug 10 '18 at 04:38
  • @MertA. No, you cannot read, interpret or modify CSS with a DOM parser. You can change the values in your HTML though. – Phil Aug 10 '18 at 04:40

I second Phil's answer; I think an HTML parser is the way to go. It is safer and can handle much more complicated cases.

Having said that, if you want to try regex in your example, it would be something like this:

<(?:div|i)(?:.*?(?:class|data-id)="([^"]+)")?(?:.*?(?:class|data-id)="([^"]+)")?[^>]*>

Example: https://regex101.com/r/Gb82lF/1/
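
As a rough sketch of how this pattern might be used from PHP, the snippet below runs it with preg_match_all and PREG_SET_ORDER. The sample input and the $results variable are assumptions for illustration. An unmatched trailing group is simply absent from a match array here, so `?? null` is used to fill it in:

```php
<?php
$pattern = '/<(?:div|i)(?:.*?(?:class|data-id)="([^"]+)")?(?:.*?(?:class|data-id)="([^"]+)")?[^>]*>/i';

// One tag per line: without the "s" flag, "." does not cross a newline,
// so each match stays within a single tag in this sample.
$content = implode("\n", [
    '<div class="test_class">',
    '<div id="test_id" data-id="1" class="test_class test_class2">',
]);

preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$results = [];
foreach ($matches as $m) {
    // Group 1 is whichever attribute appears first, group 2 the second one (if any).
    $results[] = [$m[1] ?? null, $m[2] ?? null];
}
print_r($results);
```

The groups follow the order of the attributes in the tag, so the pattern alone does not say which value is the class and which is the data-id.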

Ibrahim
  • Thank you for doing what I've been trying to do for hours. Yes, I think using the DOM is convenient. Thanks, bro. – Mert Aşan Aug 10 '18 at 04:28
  • Hello again @Ibrahim. I need the code you gave me, but I have run into a problem with it. Could you please help? https://regex101.com/r/vSIsac/3 – Mert Aşan Aug 10 '18 at 21:56
  • Hi @MertA., you could replace `.*?` with `[^\<]*?` to stop a match from running into the following tags. Example: https://regex101.com/r/vSIsac/6 – Ibrahim Aug 10 '18 at 22:50
  • Thank you so much!! @Ibrahim I thought I was going to respond late, so I asked a new question. Please post the answer there: https://stackoverflow.com/questions/51794647/i-can-not-find-the-error-in-my-regex-code?noredirect=1#comment90547248_51794647 – Mert Aşan Aug 10 '18 at 22:53
  • @MertA. You're welcome :) Regarding `data-ss is not accepted` in the div, it is because you didn't add `=`. – Ibrahim Aug 10 '18 at 22:55
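
The effect of swapping `.*?` for `[^\<]*?`, as suggested in the comments, can be sketched by running both variants against two tags on the same line. The helper captureGroups and the sample input are hypothetical, and `[^<]` is written without the backslash since the two forms are equivalent inside a character class:

```php
<?php
// Hypothetical helper: returns [group 1, group 2] for every match of $pattern.
function captureGroups(string $pattern, string $content): array
{
    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
    $rows = [];
    foreach ($matches as $m) {
        $rows[] = [$m[1] ?? null, $m[2] ?? null];
    }
    return $rows;
}

$attr   = '(?:class|data-id)="([^"]+)"';
// Original form: ".*?" may run past ">" into the next tag on the same line.
$loose  = '/<(?:div|i)(?:.*?' . $attr . ')?(?:.*?' . $attr . ')?[^>]*>/i';
// Suggested form: "[^<]*?" stops at the next "<", so a match stays inside one tag.
$strict = '/<(?:div|i)(?:[^<]*?' . $attr . ')?(?:[^<]*?' . $attr . ')?[^>]*>/i';

$content = '<div class="a"><i data-id="2">';

$looseResult  = captureGroups($loose, $content);
$strictResult = captureGroups($strict, $content);

print_r($looseResult);  // one match that mixes attributes from both tags
print_r($strictResult); // one match per tag
```

With the original pattern the data-id of the <i> tag is attributed to the <div>; with the stricter class each tag is reported separately.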