Extract page content and image using regex

Question

I am getting page content using:

$data = file_get_contents($url);

Now I want to extract the image and the data (text) part of the page, leaving out the script and HTML code.

This is regex for image I used:

function get_logo($data) 
{
    return preg_match("/<img(.*?)src=(\"|\')(.+?)(gif|jpg|png|bmp)(\"|\')(.*?)(\/)?>(<\/img>)?/", $html, $matches) ? $matches[1] : '';
}

which returns nothing.

Check this question!! http://stackoverflow.com/questions/14939296/extract-image-src-from-a-string Maybe this can help you! :) — PabloWeb18, Oct 29 '13 at 11:27
@Karimkhan You should tag your questions appropriately. If you are using PHP, don't tag your question as javascript unless javascript is also involved. — Jerry, Oct 29 '13 at 11:34

score 2 · Answer 1 · answered Oct 29 '13 at 12:00

Do not use regular expressions to parse HTML!

I would suggest using an HTML DOM parser such as PHP Simple HTML DOM Parser.
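To illustrate the DOM approach, here is a minimal sketch. It uses PHP's built-in DOMDocument class rather than the Simple HTML DOM Parser named above, so it needs no extra library. The helper name extract_images_and_text() and the sample markup are assumptions made up for this example.

```php
<?php
// Hypothetical helper: returns the image URLs and the visible text of a page.
function extract_images_and_text(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove <script> and <style> elements so their contents do not end up in the text.
    foreach (['script', 'style'] as $tag) {
        // Copy to an array first: the node list is live and shrinks as nodes are removed.
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Collect every non-empty src attribute of the <img> elements.
    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $images[] = $src;
        }
    }

    // Take the remaining text and collapse runs of whitespace into single spaces.
    $root = $doc->documentElement;
    $text = $root !== null ? trim(preg_replace('/\s+/', ' ', $root->textContent)) : '';

    return ['images' => $images, 'text' => $text];
}

// Sample markup invented for this example.
$sample = '<html><head><script>var x = 1;</script></head>'
        . '<body><p>Hello <b>world</b></p><img src="logo.png" alt=""></body></html>';

$result = extract_images_and_text($sample);
print_r($result);
```

With the question's code, the call would be extract_images_and_text($data). Note that loadHTML() does not accept an empty string in recent PHP versions, so check the result of file_get_contents() first.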

secelite

dododo · Answer 2 · 2013-10-29T12:03:07.173

1) We can't see the HTML, so it is difficult to understand what you need.

2) preg_match_all("/<img[^>]+src=[\"'](.+?\.(gif|jpg|png|bmp))[\"']/im", $html, $matches) returns all img tags, image names and extensions on the page.

edited Oct 29 '13 at 12:03

answered Oct 29 '13 at 11:56


score 1 · Accepted Answer · answered Oct 29 '13 at 11:58

The following regex will extract image URLs from the $content variable ($data in the question):

preg_match_all('/<img[^>]+src=([\'"])([^"\']+)\1/i', $content, $matches);
var_dump($matches[2]);

The array $matches[2] will contain all the image links found in $content.
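As a quick check, the same pattern can be wrapped in a function and run against a small sample. The function name extract_image_urls() and the sample markup are hypothetical, chosen only for this sketch.

```php
<?php
// Hypothetical wrapper around the pattern from this answer.
function extract_image_urls(string $content): array
{
    // Group 1 captures the opening quote, group 2 the URL; \1 requires the same closing quote.
    preg_match_all('/<img[^>]+src=([\'"])([^"\']+)\1/i', $content, $matches);

    return $matches[2];
}

// Sample markup invented for this example: mixed case and mixed quote styles.
$sample = '<p><img src="a.png"> text <IMG class="x" src=\'img/b.jpg\' /></p>';

$urls = extract_image_urls($sample);
var_dump($urls);
```

The pattern only handles quoted src values, and it would also pick up attributes whose names end in src, such as data-src, so treat it as a quick filter rather than a full HTML parser.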

romik