I am getting page content using:

$data = file_get_contents($url);

Now I want to extract:

  1. the image, and
  2. the text content, leaving out the script and HTML code.

This is the regex I used for the image:

function get_logo($data) 
{
    return preg_match("/<img(.*?)src=(\"|\')(.+?)(gif|jpg|png|bmp)(\"|\')(.*?)(\/)?>(<\/img>)?/", $html, $matches) ? $matches[1] : '';
}

which returns nothing.

putvande
user123

3 Answers

Do not use regular expressions to parse HTML!

I would suggest using an HTML DOM parser such as PHP Simple HTML DOM Parser.
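As a rough illustration of the DOM approach, here is a minimal sketch that uses PHP's built-in DOMDocument class instead of the Simple HTML DOM Parser library named above. The function names extract_image_urls and extract_text are hypothetical, and the code assumes $html is a string that file_get_contents() returned successfully (it returns false on failure).

```php
<?php
// Sketch only: built-in DOM extension, not PHP Simple HTML DOM Parser.
// Function names are hypothetical and chosen for illustration.

function load_document(string $html): ?DOMDocument
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world markup is often malformed; collect parse errors quietly.
    $previous = libxml_use_internal_errors(true);
    $ok = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok ? $doc : null;
}

// Returns the src attribute of every <img> element, in document order.
function extract_image_urls(string $html): array
{
    $doc = load_document($html);
    if ($doc === null) {
        return [];
    }
    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}

// Returns the visible text with <script> and <style> contents removed.
function extract_text(string $html): string
{
    $doc = load_document($html);
    if ($doc === null || $doc->documentElement === null) {
        return '';
    }
    foreach (['script', 'style'] as $tag) {
        $nodes = $doc->getElementsByTagName($tag);
        // The node list is live, so remove from the end backwards.
        for ($i = $nodes->length - 1; $i >= 0; $i--) {
            $node = $nodes->item($i);
            $node->parentNode->removeChild($node);
        }
    }
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $doc->documentElement->textContent));
}
```

With the question's variable, this would be called as extract_image_urls($data) and extract_text($data). Relative image paths are returned as written in the page, so they may still need to be resolved against $url.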

secelite

1) We can't see the HTML, so it is difficult to understand what you need.

2) preg_match_all("/<img[^>]+src=[\"']([^\"']+\.(gif|jpg|png|bmp))[\"']/i", $html, $matches) finds every img tag on the page: $matches[0] holds the matched text, $matches[1] the image URLs and $matches[2] the extensions.

dododo

The following regex will extract image URLs from the $data variable:

preg_match_all('/<img[^>]+src=([\'"])([^"\']+)\1/i', $data, $matches);
var_dump($matches[2]);

The array $matches[2] will contain all image links found in $data.
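Applied to the question's get_logo function, the pattern above could look like the sketch below. It assumes the logo is the first img tag on the page, which the question does not confirm. Note that the original function reads $html, which is undefined inside the function (the parameter is $data), and returns $matches[1], which is the text between "<img" and "src" rather than the URL. The helper get_all_images is a hypothetical name added for illustration.

```php
<?php
// Sketch: returns the src of the first <img> in $data, or '' if none is found.
function get_logo(string $data): string
{
    // Group 1 captures the opening quote, group 2 the URL;
    // \1 requires the closing quote to match the opening one.
    if (preg_match('/<img[^>]+src=([\'"])([^"\']+)\1/i', $data, $matches)) {
        return $matches[2];
    }
    return '';
}

// Hypothetical helper: returns the src of every <img> in $data.
function get_all_images(string $data): array
{
    preg_match_all('/<img[^>]+src=([\'"])([^"\']+)\1/i', $data, $matches);
    return $matches[2];
}
```

This only handles quoted src attributes and will miss unquoted ones, which is one reason a DOM parser tends to be more robust on arbitrary pages.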

romik