I'm assuming the easiest way to do this will be with regex, but I just can't seem to find clear information on regex. I'm a beginner and all the information I'm finding is confusing.

I need to find every image reference in an HTML file, prepend a folder name to its path, and then overwrite the file. I know how to do everything except the replacement. As I understand it, the code should look something like this:

   preg_replace("^\"(.jpg|.jpeg|.gif|.png)$"....)

But I don't understand where to go from there. I need to keep the original value of whatever is between those things and add something to the beginning of it, so for example "image.jpg" would become "images/image.jpg".

Aaron Miller
user2597300
  • No, rather *parse* the HTML file with DOM, SimpleHTMLDom, etc. See http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – michi Jul 18 '13 at 21:34
  • [No, we shall not manipulate HTML with regexes](http://stackoverflow.com/a/1732454/560648). – Lightness Races in Orbit Jul 18 '13 at 21:34
  • Why can't you just output the desired path to the image in the first place? What's the reason for this? – vee Jul 18 '13 at 21:35
  • `Insert file extension at beginning of image address` does not mean the same thing as `"image.jpg" would become "images/image.jpg"` –  Jul 18 '13 at 21:39
  • @LightnessRacesinOrbit I'd almost argue that it's forgivable in this case, since the HTML itself isn't being manipulated with regexes, but rather attribute values within some tags. A double-quoted string is still a piece of balanced text and using regexes to manipulate it therefore still asks for trouble, but if you can guarantee within the problem domain that there will be no escaped quotes within the strings, you can get away with it when you're in a hurry. – Aaron Miller Jul 18 '13 at 21:44
  • I think this is one of those situations where combining regex and HTML is just OK. The question is not about parsing HTML or matching open/close tags, but just about replacing well-defined patterns that happen to be part of HTML code. – Racso Jul 18 '13 at 21:44

1 Answer

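// Sample markup. The pattern only matches a quoted value that consists entirely of word characters plus an image extension.
$img = '<a href="hello.jpg" /><a href="asdf.png" /><a href="xkcd.gif" />';
// $1 is the captured file name; "images/" is prepended to it inside the quotes.
$img = preg_replace('/"(\w+\.(?:jpg|jpeg|gif|png))"/', '"images/$1"', $img);
echo $img;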

Output: <a href="images/hello.jpg" /><a href="images/asdf.png" /><a href="images/xkcd.gif" />

The regex can be improved using lookarounds, but I think they are overkill (and will make it more complex).
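
For comparison, here is a rough sketch of what a lookaround version might look like. The function name `prefixImagePaths` and the `$folder` parameter are made up for illustration and are not part of the answer above. It also assumes file names may contain hyphens and dots, which `\w+` alone does not match, and that extensions may be upper case.

```php
<?php
// Hypothetical helper, not part of the original answer.
function prefixImagePaths(string $html, string $folder = 'images/'): string
{
    // (?<=") requires a double quote right before the name and (?=") one right after it.
    // Neither quote is consumed, so only the file name itself is replaced.
    // [\w.-]+ excludes "/", so values that already contain a path are left alone.
    $pattern = '/(?<=")[\w.-]+\.(?:jpe?g|gif|png)(?=")/i';

    // A callback avoids any special meaning of "$" or "\" inside $folder.
    return preg_replace_callback($pattern, function (array $m) use ($folder) {
        return $folder . $m[0];
    }, $html);
}

echo prefixImagePaths('<img src="my-photo.JPG"><img src="a.png">');
// prints <img src="images/my-photo.JPG"><img src="images/a.png">
```

Because a value such as `images/a.png` contains a slash, it no longer matches, so running the function twice should not add the prefix twice.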

Racso
  • I don't understand your comment. It works for any [valid] value of `$img`. Is there a mistake? – Racso Jul 18 '13 at 21:53
  • Any known reason why this might work on the string in this example but not on a string of a full HTML file? – user2597300 Jul 19 '13 at 01:15
  • Upload the full string somewhere and show us, so we can see what may be going on. If you are putting the string in PHP source, be careful with the escaping backslashes (`\`) and that kind of thing. – Racso Jul 19 '13 at 04:12
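
For a full HTML file, the DOM route suggested in the question's comments may be more forgiving than a regex, since it looks only at `src` attributes of `img` elements and ignores quoted text elsewhere. The following is a minimal sketch under some assumptions: the function name `prefixImgSrc` and the default folder are hypothetical, PHP's DOM extension is assumed to be installed, and only `src` values without a slash are treated as bare file names.

```php
<?php
// Hypothetical helper; the name and the default folder are assumptions.
function prefixImgSrc(string $html, string $folder = 'images/'): string
{
    // Collect libxml warnings internally so imperfect markup does not emit PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Skip empty values and anything that already contains a path or URL.
        if ($src !== '' && strpos($src, '/') === false) {
            $img->setAttribute('src', $folder . $src);
        }
    }

    // saveHTML() serializes the whole document and may normalize the markup.
    return $doc->saveHTML();
}

$page = '<html><body><p>"not-an-image.png"</p><img src="my-photo.jpg" alt="x"></body></html>';
echo prefixImgSrc($page);
```

To apply this to a file, the string could be read with `file_get_contents()` and written back with `file_put_contents()`. Note that `loadHTML()` makes its own guess about the character encoding, so non-ASCII pages may need extra care.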