
For example, I have text like this:

<p>
Quis vel accusantium libero. Suscipit officiis culpa
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
libero quia ad.
</p>

I want to check whether the string contains any data:image URI and, if it does, truncate only that part to a maximum of 50 characters, so that the result becomes:

<p>
Quis vel accusantium libero. Suscipit officiis culpa
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH...">
libero quia ad.
</p>

I am not sure how exactly to achieve that with preg_replace and a pattern like "data:image.+?".

ctf0
  • If you delete most of a base64 image string, then what's the point of even having the base64 image? It is not going to work after that. – ProEvilz Dec 01 '17 at 10:23
  • @dreftymac it's not exactly the same as there, because he has a pattern to check, but I would use http://php.net/manual/en/function.preg-split.php with the pattern as a regex (not checked, but maybe something like '/data:image(.){50}/') and then take the first part of the result array. – Edwin Dec 01 '17 at 10:28
  • @Edwin good point. I should have elaborated that regex is not essential to solve this problem. – dreftymac Dec 01 '17 at 10:30
  • @ProEvilz Because I want to display it as a string, not as an image. I'm already aware that truncating the URI will render it useless. – ctf0 Dec 01 '17 at 11:00
  • How do you propose to show an image plus its `src`? – ProEvilz Dec 01 '17 at 11:01
  • I am building a diff tool, so all the HTML data is rendered as a string. Keeping the full data: URI makes no difference to the diff; it only delays displaying the string to the end user. – ctf0 Dec 01 '17 at 11:03

2 Answers


Problem: PHP string parse

  • Thanks for clarifying your question in the comments. What you seem to want is a general-purpose HTML parser that can make special-case modifications to the HTML markup.
  • Generally speaking, it is not advisable to use regex to parse HTML.
  • If you want a general-purpose tool (and not a quick-and-dirty approach), SO already has a question about Modifying html attributes with PHP that may be closer to what you want.
  • If all you want is a quick-and-dirty approach that removes long base64-encoded data from the src attribute of img tags, you can tokenize the raw HTML string and then perform regex replacements. That approach will become painful if you later decide to make other modifications: you may end up re-inventing the wheel when you could have used a real HTML parser to begin with.
  • Nevertheless, the approach below does just that: it tokenizes the string, does the replacements, and returns the entire modified string.

Solution using preg_replace (quick-and-dirty)

<?php

$demostring = '
<p>
Quis vel accusantium libero. Suscipit officiis culpa
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
libero quia ad.
</p>
';

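function ctf0_truncate($vinput, $chars = 50){
  // Keep "data:image" plus the next $chars characters; drop the rest of the token.
  return preg_replace('/(data:image.{' . (int) $chars . '})(.*)/', '$1', $vinput);
}

function ctf0_parse($text, $chars = 50) {
  if (strpos($text, 'data:image') !== FALSE){
    // Split on double quotes so each attribute value becomes its own token.
    $tokens = explode('"',$text);
    $tokens = array_map(function ($token) use ($chars) {
      return ctf0_truncate($token, $chars);
    }, $tokens);
    $vout   = implode('"',$tokens);
  } else {
    $vout = $text;
  }
  return $vout;
}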

$myresult = ctf0_parse($demostring);
print($myresult);

Output result

<p>
Quis vel accusantium libero. Suscipit officiis culpa
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALA">
libero quia ad.
</p>

Notes

  • The above solution omits a requested element of the question. Specifically, how to add the '...' ellipsis points. For that part, please see other answers on SO, such as here and here.
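For comparison, here is a rough sketch of the parser-based route mentioned at the top of this answer, with the '...' ellipsis added. It is only an illustration under stated assumptions: the function name truncate_img_data_uris and the $max parameter are made up for this example, $max is taken to mean the total length of the kept URI, and only src attributes on img tags are handled.

```php
<?php
// Hypothetical helper (not part of the original answer): truncate data:image
// URIs in <img src="..."> attributes using DOMDocument instead of a regex.
function truncate_img_data_uris(string $html, int $max = 50): string
{
    $doc = new DOMDocument();

    // Fragments and HTML5 tags can trigger libxml warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so it has a single known root to read back from.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if (strpos($src, 'data:image') === 0 && strlen($src) > $max) {
            // Keep the first $max characters of the URI and mark the cut.
            $img->setAttribute('src', substr($src, 0, $max) . '...');
        }
    }

    // Serialize only the children of the wrapper <div>, not the added html/body.
    $out = '';
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$demo = '<p>Suscipit officiis culpa <img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"> libero quia ad.</p>';
echo truncate_img_data_uris($demo), PHP_EOL;
```

Two caveats: loadHTML assumes ISO-8859-1 unless the input declares an encoding, so non-ASCII text may need extra handling, and a DOM round trip can normalize the markup, which may matter if the output feeds a diff.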
dreftymac
  • Because I am building a diff tool that renders the new vs. old string, displaying the whole data: URI won't make any difference other than delaying the diff rendering for the end user. – ctf0 Dec 01 '17 at 11:02
  • If you know another/better solution without using regex, please add it. – ctf0 Dec 01 '17 at 11:12
  • @ctf0 If my hunch is correct, and you are going to want your application to do more modifications similar to this one, and you do not want to re-invent the wheel with a bunch of custom-made regexes, then any full-fledged HTML parser should fit the bill. See e.g., [this link](https://stackoverflow.com/a/16139844/42223) – dreftymac Dec 01 '17 at 11:16
  • Thanks, will give it a try. By the way, regarding the current example https://3v4l.org/cvifQ, is there a way to add '...' in place of the removed chars? – ctf0 Dec 01 '17 at 11:34
  • @ctf0 I will update my answer to include that as well. – dreftymac Dec 01 '17 at 11:43

You can do that in different ways: with preg_match(_all), preg_split, etc.

With preg_replace, it works like this:

<?php
$text='data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7';
$result=preg_replace('/(?<=data:image.{50}).*/', '', $text);

echo $result;
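
The pattern above is applied to a string that contains only the URI. Run against the full HTML from the question, the trailing `.*` would also swallow the closing `">` on that line. A possible variant for the whole string, sketched here with preg_replace_callback and the requested '...', is shown below. The function name truncate_data_uris and the $max parameter are made up for this example, $max is taken as the total length of the kept URI, and the URI is assumed to end at the first quote, whitespace, or `>`.

```php
<?php
// Hypothetical helper: shorten every data:image URI in $html to $max characters
// and append '...' where something was cut off.
function truncate_data_uris(string $html, int $max = 50): string
{
    $result = preg_replace_callback(
        // Match from "data:image" up to (not including) a quote, whitespace, or ">".
        '/data:image[^"\'\s>]+/',
        function (array $m) use ($max) {
            $uri = $m[0];
            return strlen($uri) > $max ? substr($uri, 0, $max) . '...' : $uri;
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$demo = '<p>
Quis vel accusantium libero. Suscipit officiis culpa
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
libero quia ad.
</p>';

echo truncate_data_uris($demo), PHP_EOL;
// The img line becomes:
// <img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5...">
```

URIs that are already shorter than $max pass through unchanged, so the function can be run over strings that may or may not contain embedded images.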
Edwin