
I have a string with links inside divs. What regular expression can I use to extract them?

I need to get an array of these values, like this:

[
"/media/filer_public/b6/49/b6491a4d-5c0d-4a0f-aa9c-b32ea39912c6/category-2.jpg", 
"/media/filer_public/93/65/9365c3bc-8649-4d9d-932e-144f16ed535c/category-3.jpg"
]

Base HTML (example):

               <a href="/napolnye-pokrytiya/" class="category_cart">
                    <div class="category_cart__container">
                        <div style="background-image: url('/media/filer_public/b6/49/b6491a4d-5c0d-4a0f-aa9c-b32ea39912c6/category-2.jpg')" class="category_cart__thumbnail"></div>
                        <div class="category_cart__content">
                            <p class="category_cart__title">Напольные покрытия</p>
                        </div>
                    </div>
                </a>

                <a href="/oboi/" class="category_cart">
                    <div class="category_cart__container">
                        <div style="background-image: url('/media/filer_public/93/65/9365c3bc-8649-4d9d-932e-144f16ed535c/category-3.jpg')" class="category_cart__thumbnail"></div>
                        <div class="category_cart__content">
                            <p class="category_cart__title">Обои</p>
                        </div>
                    </div>
                </a>
yivi
  • Using regular expressions in [HTML has some serious potential problems](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454); I would suggest using something like DOMDocument, which understands the structure and context of the tags. – Nigel Ren Mar 20 '19 at 14:45
  • If the strings you are trying to get are all inside `background-image` styling, it's probably easier to read the style attribute of the element using JS. Something like: https://stackoverflow.com/questions/14013131/javascript-get-background-image-url-of-div – ImClarky Mar 20 '19 at 14:48
  • [This question is being discussed on meta.](https://meta.stackoverflow.com/questions/381783/why-my-question-is-wrong-or-incorrect) – Script47 Mar 25 '19 at 14:26
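
Following the DOMDocument suggestion in the comments, here is a minimal sketch of a parser-based approach. It assumes the URLs always sit in an inline `style` attribute on divs with the class `category_cart__thumbnail`, as in the example; the HTML below is the question's example trimmed to the relevant elements. XPath selects the elements, and a small regex is applied only to each `style` value rather than to the whole document.

```php
<?php
// Sketch: extract background-image URLs with DOMDocument + DOMXPath.
// Assumption: each URL is in an inline style attribute written as
// url('...'), url("...") or url(...), as in the question's HTML.

$html = <<<'HTML'
<a href="/napolnye-pokrytiya/" class="category_cart">
    <div class="category_cart__container">
        <div style="background-image: url('/media/filer_public/b6/49/b6491a4d-5c0d-4a0f-aa9c-b32ea39912c6/category-2.jpg')" class="category_cart__thumbnail"></div>
    </div>
</a>
<a href="/oboi/" class="category_cart">
    <div class="category_cart__container">
        <div style="background-image: url('/media/filer_public/93/65/9365c3bc-8649-4d9d-932e-144f16ed535c/category-3.jpg')" class="category_cart__thumbnail"></div>
    </div>
</a>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings about fragment markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select only thumbnail divs whose style mentions background-image.
$nodes = $xpath->query('//div[contains(@class, "category_cart__thumbnail")][contains(@style, "background-image")]');

$urls = [];
foreach ($nodes as $node) {
    $style = $node->getAttribute('style');
    // Group 1 is the optional quote; \1 requires the same closing quote.
    if (preg_match('/url\(\s*([\'"]?)(.*?)\1\s*\)/', $style, $m)) {
        $urls[] = $m[2];
    }
}

print_r($urls);
```

Note that `DOMDocument::loadHTML` treats input as ISO-8859-1 unless the markup declares a charset, so the Cyrillic titles from the full example may come out garbled; the ASCII URLs are not affected.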

1 Answer


You should use DOMDocument and DOMXPath (or something similar), but if you want to do it with a regular expression, this should work for the HTML you gave:

<?php 

$html_code = 
'<a href="/napolnye-pokrytiya/" class="category_cart">
    <div class="category_cart__container">
        <div style="background-image: url(\'/media/filer_public/b6/49/b6491a4d-5c0d-4a0f-aa9c-b32ea39912c6/category-2.jpg\')" class="category_cart__thumbnail"></div>
            <div class="category_cart__content">
                <p class="category_cart__title">Напольные покрытия</p>
            </div>
        </div>
</a>

<a href="/oboi/" class="category_cart">
    <div class="category_cart__container">
        <div style="background-image: url(\'/media/filer_public/93/65/9365c3bc-8649-4d9d-932e-144f16ed535c/category-3.jpg\')" class="category_cart__thumbnail"></div>
            <div class="category_cart__content">
                <p class="category_cart__title">Обои</p>
            </div>
        </div>
</a>';

// Capture everything between url(' and '); the lazy .*? stops at the first '), so each url() is matched separately
preg_match_all('/url\(\'(.*?)\'\)/', $html_code, $matches_array);
echo '<pre>';
var_dump($matches_array);
echo '</pre>';

// $matches_array[0] holds the full matches including url(' and '),
// $matches_array[1] holds only the captured paths, which is the array you want
$your_array = $matches_array[1];
echo '<pre>';
var_dump($your_array);
echo '</pre>';
?>

Output:

array(2) {
  [0]=>
  array(2) {
    [0]=>
    string(84) "url('/media/filer_public/b6/49/b6491a4d-5c0d-4a0f-aa9c-b32ea39912c6/category-2.jpg')"
    [1]=>
    string(84) "url('/media/filer_public/93/65/9365c3bc-8649-4d9d-932e-144f16ed535c/category-3.jpg')"
  }
  [1]=>
  array(2) {
    [0]=>
    string(77) "/media/filer_public/b6/49/b6491a4d-5c0d-4a0f-aa9c-b32ea39912c6/category-2.jpg"
    [1]=>
    string(77) "/media/filer_public/93/65/9365c3bc-8649-4d9d-932e-144f16ed535c/category-3.jpg"
  }
}
array(2) {
  [0]=>
  string(77) "/media/filer_public/b6/49/b6491a4d-5c0d-4a0f-aa9c-b32ea39912c6/category-2.jpg"
  [1]=>
  string(77) "/media/filer_public/93/65/9365c3bc-8649-4d9d-932e-144f16ed535c/category-3.jpg"
}
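
The pattern above only matches single-quoted `url('...')` values. If the markup might also use `url("...")` or an unquoted `url(...)`, a variant along these lines should cover all three; it uses a backreference so the closing quote must match the opening one (or both must be absent). The three sample paths are hypothetical, chosen only to exercise each form.

```php
<?php
// Sketch: quote-tolerant version of the answer's pattern.
// The sample HTML and its /img/... paths are made up for illustration.

$html_code = '<div style="background-image: url(\'/img/a.jpg\')"></div>'
    . '<div style=\'background-image: url("/img/b.jpg")\'></div>'
    . '<div style="background-image: url(/img/c.jpg)"></div>';

// Group 1: optional opening quote (' or " or nothing).
// Group 2: the URL itself, lazily matched up to the same quote and ).
preg_match_all('/url\(\s*([\'"]?)(.*?)\1\s*\)/', $html_code, $matches);

// $matches[2] holds just the URLs, in document order.
$urls = $matches[2];

print_r($urls);
```

Like any regex applied to HTML, this still assumes the markup looks like the example; an unquoted URL containing `)` would be cut short.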
user11222393