image link problems regular expressions

Question

When I run the following script, the image is not rendered properly. What is the problem? Here is the code (it's a school assignment, and I have to do it with regular expressions):

<?php
    header('Content-Type: text/html; charset=utf-8');
    $url = "http://www.asaphshop.nl/epages/asaphnl.sf/nl_NL/
            ObjectPath=/Shops/asaphnl/Products/80203122";
    $htmlcode = file_get_contents($url);
    $pattern = "/class=\"noscript\"\>(.*)\<\/div\>/isU";
    preg_match_all($pattern, $htmlcode, $matches);
    //print_r ($matches);
    $image = ($matches[0][0]);
    print_r ($image);
?>

This is the part I need to copy (the value of `data-src-l`), with http://www.asaphshop.nl in front of it, so that I have a complete link to the image:

<div id="ProductImages" class="noscript">
    <ul>  
        <li>
            <a href="/WebRoot/products/8020/80203122/bilder/80203122.jpg">
            <img itemprop="image" alt="Jesus Remember Me - Taize Songs (2CD)"
                 src="/WebRoot/AsaphNL/Shops/asaphnl/5422/8F43/62EE/
                     D698/EF8E/4DEB/AED5/3B0E/80203122_xs.jpg"
                 data-src-xs="/WebRoot/AsaphNL/Shops/asaphnl/5422/8F43/62EE/
                     D698/EF8E/4DEB/AED5/3B0E/80203122_xs.jpg"
                 data-src-s="/WebRoot/products/8020/80203122/bilder/80203122_s.jpg"
                 data-src-m="/WebRoot/products/8020/80203122/bilder/80203122_m.jpg"
                 data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg"
            />
            </a>
        </li>
    </ul>
</div>

You match the `<div>`, not the value of the `data-src-l` attribute. For all others: http://stackoverflow.com/a/1732454/2265374 — ThW, Oct 16 '14 at 08:34
replace pattern as `$pattern = "/class=\"noscript\"\>(.?*)\<\/div\>/imU";` — nu11p01n73R, Oct 16 '14 at 08:36
Warning: preg_match_all(): Compilation failed: nothing to repeat at offset 21 in C:\xampp\htdocs\stage\ripper2.php on line 6 — bananaman, Oct 16 '14 at 08:43
@nu11p01n73R I'm quite sure you mean `(.*?)` to get a non-greedy match. Here it's any char 0 or 1 time, and the repeat can't work, which is the message given by @bananaman — Tensibai, Oct 16 '14 at 08:48
@Tensibai, that's right, the error is gone now... but now my array is empty... how do I solve this? — bananaman, Oct 16 '14 at 08:51
@ThW I agree with the link. But this sounds less like parsing HTML and more like 'grepping' values out of it. At this point HTML is like any other text file, and I think there's nothing wrong with using regex for this. — Tensibai, Oct 16 '14 at 09:29
possible duplicate of [image problems with regular expressions](http://stackoverflow.com/questions/26398016/image-problems-with-regular-expressions) — rr-, Oct 21 '14 at 07:10

score 0 · Accepted Answer · answered Oct 16 '14 at 08:53

A pattern like `data-src-l="(.*)"` should do.

See the demo here

The regex matches the literal text `data-src-l="` and then captures anything (`.*`) up to the last double quote `"`.

A better match would be `[^"]*` instead of `.*`, to capture everything except a `"`: `[]` is a character class, starting it with `^` inverts it (everything but the characters that follow), and the `"` is the character that is not wanted.
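
To make the difference concrete, here is a minimal sketch comparing the two patterns. The `$html` string is a made-up one-line sample (not the real page), chosen so that another attribute follows `data-src-l` on the same line:

```php
<?php
// Hypothetical single-line sample; the path and alt text are placeholders.
$html = '<img data-src-l="/img/large.jpg" alt="cover">';

// Greedy: .* runs on to the LAST double quote on the line.
preg_match('/data-src-l="(.*)"/', $html, $greedy);

// Negated class: [^"]* stops at the FIRST closing double quote.
preg_match('/data-src-l="([^"]*)"/', $html, $strict);

echo $greedy[1], "\n"; // /img/large.jpg" alt="cover
echo $strict[1], "\n"; // /img/large.jpg
```

In the markup from the question, `data-src-l` happens to be the last attribute on its line, so both patterns would give the same result there; the negated class simply keeps working if the attribute order or line breaks change.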

On the demo you can play with that and see the explanation on the right panel.
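
Applied to the question, a sketch could look like the following. It assumes the markup has the shape posted above; `$html` is a shortened inline copy used in place of the `file_get_contents()` call, and `$base` and `$imageUrl` are names picked for illustration:

```php
<?php
// Host taken from the question; it is prepended to the relative path.
$base = 'http://www.asaphshop.nl';

// Shortened copy of the markup from the question (stand-in for the fetched page).
$html = <<<'HTML'
<div id="ProductImages" class="noscript">
    <img itemprop="image"
         data-src-m="/WebRoot/products/8020/80203122/bilder/80203122_m.jpg"
         data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg"
    />
</div>
HTML;

// Capture everything between the quotes of data-src-l.
if (preg_match('/data-src-l="([^"]*)"/', $html, $matches)) {
    // $matches[0] is the whole match, $matches[1] is the captured path.
    $imageUrl = $base . $matches[1];
    echo $imageUrl, "\n";
} else {
    echo "No data-src-l attribute found\n";
}
```

Note that the captured value lives in `$matches[1]`. With `preg_match_all()` as in the question, `$matches[0][0]` is the full matched text, while the first captured group of the first match is `$matches[1][0]`.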

answered Oct 16 '14 at 08:53

Tensibai


thanks! but i'm not so good with regular expressions... can you help me a little bit? – bananaman Oct 16 '14 at 09:16
I'm ok to help on a case, edit your question with the full problem for help. I already did try to explain the regex in use here. – Tensibai Oct 16 '14 at 09:25
