I'm using a BufferedReader to go through an HTML file and have to replace the full URL of each img file with a new path.

For example, one file I'm working on has 3 images that need new paths, and I've declared the new paths as final variables:

public static final String x_TAG="https://newsite.com/media/x.jpg";
public static final String y_TAG="https://newsite.com/media/y.jpg";
public static final String z_TAG="https://newsite.com/media/z.jpg";

Now I can read through the file and find where the img tags are with a pattern match:

Pattern imgPattern = Pattern.compile("(<\\s*img\\s*alt\\s*=\\s*\").*?(\"\\s*>)");
Matcher imgMatcher = imgPattern.matcher(replaceAllTags);

while(imgMatcher.find()) {
    System.err.println("match at "+imgMatcher.group());
}

That prints back:

match at <img alt="/oldSite.com/Images?action=AttachFile&amp;do=get&amp;target=Images/x.jpg" src="cc_files/Images_003.jpg" title="/oldSite.com/Images?action=AttachFile&amp;do=get&amp;target=Images/x.jpg" width="600">


match at <img alt="/oldSite.com/Images?action=AttachFile&amp;do=get&amp;target=Images/y.jpg" src="cc_files/Images_004.jpg" title="/oldSite.com/Images?action=AttachFile&amp;do=get&amp;target=Images/y.jpg" width="600">


match at <img alt="/oldSite.com/Images?action=AttachFile&amp;do=get&amp;target=Images/z.jpg" src="cc_files/Images.jpg" title="/oldSite.com/Images?action=AttachFile&amp;do=get&amp;target=Images/z.jpg" width="600">

So what's the best way to find each image and put in its new URL?

nhahtdh
user1406476

1 Answer

Yet another person trying to screen-scrape with regex. :-) I'm not saying it's not possible, but another approach is to use an HTML parser such as jsoup (https://stackoverflow.com/a/6042593/81520) or a similar library to parse the HTML you read in. Then, for each img tag, edit the src attribute.
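If you would rather stay with regex for now, here is a minimal sketch using only java.util.regex. It assumes that the old file name always appears after "target=Images/" in the tag (as in the three matches printed in the question) and that the attribute to overwrite is src. The class name ImgUrlRewriter, the NEW_URLS map and the rewrite method are hypothetical names made up for this illustration. A real HTML parser will cope much better with markup that does not look exactly like the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgUrlRewriter {

    // Hypothetical lookup table: old file name -> new URL (values from the question).
    static final Map<String, String> NEW_URLS = new LinkedHashMap<>();
    static {
        NEW_URLS.put("x.jpg", "https://newsite.com/media/x.jpg");
        NEW_URLS.put("y.jpg", "https://newsite.com/media/y.jpg");
        NEW_URLS.put("z.jpg", "https://newsite.com/media/z.jpg");
    }

    // A whole <img ...> tag (assumes no '>' inside attribute values).
    private static final Pattern IMG_TAG =
            Pattern.compile("<\\s*img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // The old file name, assumed to follow "target=Images/".
    private static final Pattern TARGET =
            Pattern.compile("target=Images/([^\"&\\s]+)");
    // The src attribute, with the opening and closing quotes captured.
    private static final Pattern SRC =
            Pattern.compile("(\\bsrc\\s*=\\s*\")[^\"]*(\")");

    static String rewrite(String html) {
        Matcher tag = IMG_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            String img = tag.group();
            String replacement = img; // tags without a known file name stay unchanged
            Matcher target = TARGET.matcher(img);
            if (target.find()) {
                String newUrl = NEW_URLS.get(target.group(1));
                if (newUrl != null) {
                    replacement = SRC.matcher(img).replaceFirst(
                            "$1" + Matcher.quoteReplacement(newUrl) + "$2");
                }
            }
            // quoteReplacement keeps '$' and '\' in the tag from being read as group references.
            tag.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        tag.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p><img alt=\"/oldSite.com/Images?action=AttachFile&amp;do=get"
                + "&amp;target=Images/x.jpg\" src=\"cc_files/Images_003.jpg\" width=\"600\"></p>";
        System.out.println(rewrite(html));
    }
}
```

In the question's code you would call rewrite(replaceAllTags) once on the full text read from the BufferedReader, rather than line by line, so that a tag split across lines is still matched.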

Community
Peter Perháč