I need to save HTML source code in an SQL database (for an Android app). The content must be stored locally, so the images must be saved as well. I think the following approach would be appropriate:

  • On the server (PHP): replace every img src with an img src="data:image..." string (using a regex and loading the image?)
  • Then I can store the HTML code locally in my application

But how can I implement this? Or should I save the images with HTML5? I hope you can help me!

UPDATE:

$search = '(<img.*?src=")([^"]*?(\/[^/]*\.[^"]+))';
$replace = "<img src=\"".data_uri('$2')."\">";
$content = preg_replace($search, $replace, $content);

Could someone correct this code? Thanks!

2nd UPDATE:

Examples:

<img class="alignnone" src="https://lh4.googleuserco (...)
<img src="https://lh4.googleuserco (...)
<img width="400" height="100" src='...' (...)
user1756209

1 Answer

Replace your <img src="image.png" alt="An image"> with <img src="<?php echo data_uri('image.png'); ?>" alt="An image"> and define the following function where appropriate:

function data_uri($filename) {
    $mime = mime_content_type($filename);
    $data = base64_encode(file_get_contents($filename));

    return "data:$mime;base64,$data";
}
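
The example tags in the question point at remote URLs, and a missing or non-image file would make data_uri produce a broken URI. A minimal sketch of a more defensive variant follows. The names data_uri_from_bytes and data_uri_safe are hypothetical (not part of the answer), and detecting the MIME type from the image bytes with getimagesizefromstring is a swapped-in technique, not what data_uri above does. Reading remote URLs with file_get_contents also assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: build a data URI from raw image bytes.
// Returns null when the bytes are not a recognizable image.
function data_uri_from_bytes(string $bytes): ?string
{
    // getimagesizefromstring() inspects the image header and reports its MIME type.
    $info = @getimagesizefromstring($bytes);
    if ($info === false) {
        return null;
    }

    return 'data:' . $info['mime'] . ';base64,' . base64_encode($bytes);
}

// Hypothetical wrapper: load a local path or URL, then delegate.
// Returns null when the source cannot be read.
function data_uri_safe(string $source): ?string
{
    $bytes = @file_get_contents($source);

    return $bytes === false ? null : data_uri_from_bytes($bytes);
}
```

A caller can then keep the original src whenever null comes back instead of embedding a broken image.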

You'll probably end up with huge HTML files, so perhaps storing the image files outside of the database would be better? I'm not familiar with Android, but on iOS you can set the base path of the web view displaying your HTML files.

UPDATE:

I created a file (content.php) containing a couple of img elements and then ran the following on it:

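$content = file_get_contents('content.php');
// Matches img elements whose first attribute is src (single- or double-quoted).
$search = '/(<img\s+src=["\'])([^"\']+)(["\']\s+[^>]+>)/';
// The callback receives the captured groups; only group 2 (the path) is converted.
$content = preg_replace_callback($search, function ($matches) {
    return $matches[1] . data_uri($matches[2]) . $matches[3];
}, $content);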

In the code you posted in your question, your pattern was missing its delimiters (the enclosing slashes). You would also have ended up calling data_uri('$2') literally: the function runs once, before preg_replace does, with the string '$2' as its parameter rather than the matched path. preg_replace_callback instead passes the actual matches to a callback function.

Anyway, the code above replaces the src of every matched image with the value returned by data_uri, and thus builds img elements with data URIs. You might want to improve the pattern a bit: it accepts single or double quotes, but it still assumes that src is the first attribute of the element and that at least one more attribute follows it. This fragility is why XML parsing is generally advised, I think. How severe it is depends on your input data, of course.
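
A quick way to see these limitations is to run the pattern against a few differently shaped tags. The sample tags below are made up for illustration and are not taken from the question.

```php
<?php
$search = '/(<img\s+src=["\'])([^"\']+)(["\']\s+[^>]+>)/';

// Made-up sample tags covering different attribute layouts.
$samples = [
    '<img src="a.png" alt="first">',           // src first, double quotes
    "<img src='a.png' alt='single quotes'>",   // src first, single quotes
    '<img class="alignnone" src="a.png">',     // src is not the first attribute
    '<img src="a.png">',                       // nothing follows src
];

$results = [];
foreach ($samples as $tag) {
    // preg_match() returns 1 when the pattern matches, 0 otherwise.
    $results[] = preg_match($search, $tag) === 1;
}
// $results is [true, true, false, false]
```

The last two tags are left untouched, which matters for content like WordPress posts where attribute order varies.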

UPDATE 2:

A more generic solution would be to split it into two regexes, as per my latest comment. That is, first change your search pattern to $search = '/(<img[^>]+>)/'; and then call preg_replace_callback($search, 'img_handler', $content); having defined your img_handler function as something like this:

function img_handler($matches) {
    // $matches[1] holds one complete img element found by the outer pattern.
    $image_element = $matches[1];

    // Replace only the value of the src attribute; everything else is kept.
    $pattern = '/(src=["\'])([^"\']+)(["\'])/';
    $image_element = preg_replace_callback($pattern, function ($m) {
        return $m[1] . data_uri($m[2]) . $m[3];
    }, $image_element);

    return $image_element;
}
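
To show how the two steps fit together, here is a self-contained sketch of the same idea. The names replace_img_sources and fake_data_uri are hypothetical: fake_data_uri stands in for data_uri so the sketch runs without any image files, and the converter is passed in as a callable instead of being hard-coded. The inner pattern requires whitespace before src, an added assumption so that attributes such as data-src are left alone.

```php
<?php
// Stand-in for data_uri(): encodes the path itself, not file contents (assumption).
function fake_data_uri($src)
{
    return 'data:image/png;base64,' . base64_encode($src);
}

function replace_img_sources($html, callable $toDataUri)
{
    // Step 1: find every img element, whatever its attribute order.
    return preg_replace_callback('/<img[^>]+>/i', function ($tagMatch) use ($toDataUri) {
        // Step 2: inside that one element, rewrite only the src value.
        return preg_replace_callback(
            '/(\ssrc=["\'])([^"\']+)(["\'])/i',
            function ($m) use ($toDataUri) {
                return $m[1] . $toDataUri($m[2]) . $m[3];
            },
            $tagMatch[0]
        );
    }, $html);
}

$html = '<p><img class="alignnone" src="a.png"> and '
      . '<img width="400" height="100" src=\'b.png\' alt="x"></p>';
$result = replace_img_sources($html, 'fake_data_uri');
```

Passing 'data_uri' as the second argument would embed the real image data instead.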

The way this works is that the first regex identifies all img elements (<img ...>) and sends them to the callback function img_handler, which in turn replaces only the src attribute. XML parsing is a bit more complex (but far more generic). I don't have time to put together an example, but it's quite well documented. Check out DOMDocument or SimpleXML, which basically do the same thing.
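
For readers who want to try the parser route, here is a minimal sketch using DOMDocument. It is an illustration under stated assumptions, not the answer's own code: embed_images_dom is a hypothetical name, the converter is passed in as a callable (for example 'data_uri'), and the input is assumed to be a UTF-8 HTML fragment.

```php
<?php
function embed_images_dom(string $html, callable $toDataUri): string
{
    $doc = new DOMDocument();
    // The meta tag only tells libxml to read the fragment as UTF-8;
    // @ silences warnings about sloppy real-world markup.
    @$doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );

    // Attribute order and quoting style no longer matter here.
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $img->setAttribute('src', $toDataUri($img->getAttribute('src')));
        }
    }

    // loadHTML() wraps the fragment in html/body, so emit only the body's children.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }

    return $out;
}
```

Note that the parser normalizes the markup it writes back (for instance, quoting style), so the output is equivalent HTML but not necessarily byte-for-byte identical to the input.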

IN CLOSING:

You have now modified your question twice, and while this is surely needed for clarification at times, I feel that we are drifting further and further away from the initial question. I would suggest keeping your questions concise and focused on a single subject. If the answers or comments raise further questions that they don't answer themselves, it is probably better to start a new thread on that matter (e.g. replacing the src attribute of an img element) or to look for similar questions that have already been asked.

Simon
  • Thanks! But how could I replace ALL image tags (also – user1756209 Oct 23 '12 at 10:41
  • Searching for `(An image)` and replacing with `An image` should work, given that your img elements are as uniform as in the example. Otherwise reading all the files and parsing them as XML might be an alternative. See [this thread](http://stackoverflow.com/questions/8211047/replacing-image-path-with-regular-expression) and [this](http://stackoverflow.com/questions/6744904/replace-images-with-regular-expressions). – Simon Oct 23 '12 at 10:44
  • I like your first answer! I tried to copy some code from your link. Could you correct the code (in the question)? – user1756209 Oct 23 '12 at 10:57
  • You'll have to use `preg_replace_callback` instead. Just a minute, I'll adapt your code to use that instead. – Simon Oct 23 '12 at 11:29
  • Thank you. I need this for WordPress blogs. I could replace the double quotes with single quotes. But how can I make it work when the src attribute can be anywhere? I found a solution (regex) a few months ago, but I can't remember it :( -> XML parsing takes too much code... – user1756209 Oct 23 '12 at 12:39
  • As for single vs. double quotes, I've just updated the pattern, but attribute order is a bit more complex. Basically you'll need to define a sub-pattern for any attribute (including its value) that is *not* the src attribute, and adjust the pattern to allow for 0-*n* of those before *and* after the actual src pattern, with whitespace (`\s+`) in between. But perhaps it would be better to break things down a bit more: first a simple regex to find all img elements, then in the `preg_replace_callback` callback another regex to replace just the src and return everything else. – Simon Oct 23 '12 at 13:06
  • Sorry, I really need a solution that recognizes all img tags. They are all different (random), because I don't know which patterns WordPress users use. Could you provide an example of a general replacement (using XML if needed)? Thank you very much! – user1756209 Oct 23 '12 at 13:23
  • Did you not see my chat messages? Also, my last comment above contains a note on how to divide the regex into two separate search/replace calls to simplify it a little, which will solve your problem. I feel your original question has indeed been answered, and if you have trouble getting the regex right I'd suggest starting a new thread discussing a regex for replacing the src attribute of an img element. – Simon Oct 23 '12 at 15:22
  • Thanks - your answer in the chat is great - but how could I use the handler? preg_replace($search, img_handler(???), $content); – user1756209 Oct 23 '12 at 16:41
  • I've created a new thread -> http://stackoverflow.com/questions/13035668/regex-modify-img-src-tags – user1756209 Oct 23 '12 at 17:06
  • I'm terribly sorry, there was an error in my second reply, it's supposed to be preg_replace_callback on both occasions. Answer updated as well. – Simon Oct 23 '12 at 18:54