
I have tried many ways to do this, and I am totally new to regular expressions. I want to replace every img src link with another link.

My HTML file looks like this:

<img src="01"></img><img src="02"></img><img src="03"></img>

or it may look like this:

<  img src  =  "01"></img><    img src="02"><    img src = "03"></img>

There might be extra spaces, or the closing "</img>" might be missing,

and I want the result to look like this:

<div><p><DIV class="a"><img src="01"></img></p></div><div><p><DIV class="a"><img src="02"></img></p></div><div><p><DIV class="a"><img src="03"></img></p></div>

I use this to get the img src links:

            Pattern p = null;
            Matcher m = null;
            p = Pattern.compile("<img[^>]*src\\s*=\\s*\"([^\"]*)");
            m = p.matcher(mystr);
            while (m.find()) {
                imgIDList.add(m.group(1));
            }

I made a list of replacement strings: ArrayList imgList4Replace = new ArrayList();

and I use this to execute the replacement:

                mystr.replace(("<img[^>]*src\\s*=\\s*\""+imgListReplaceOriginal.get(nIndex)+"([^\"]*)"), imgList4Replace.get(nIndex)+"$2");

It just doesn't work. I've spent so much time testing it.

I need your help. Thank you very much.

AmyWuGo
  • The [String.replace](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replace%28java.lang.CharSequence,%20java.lang.CharSequence%29) method doesn't apply regex. You should use [replaceAll](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29) or [replaceFirst](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceFirst%28java.lang.String,%20java.lang.String%29) instead. – cubanacan Sep 13 '12 at 09:25
  • As a side note - images shouldn't have a closing tag. They should always be self-closing: `<img src="01" />` – punkrockbuddyholly Oct 16 '12 at 09:33
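
To illustrate cubanacan's comment, here is a minimal sketch. The simplified pattern and the placeholder value `new` are assumptions for the example, not taken from the question. `String.replace` treats its first argument as literal text, while `replaceAll` compiles it as a regex. Note also that Java strings are immutable, so a call like `mystr.replace(...)` on its own line discards the result; it has to be assigned.

```java
class ReplaceVsReplaceAll {
    // Illustrative pattern (an assumption): group 1 is everything up to and
    // including the opening quote of the src value.
    static final String IMG_SRC = "(<\\s*img\\s+src\\s*=\\s*\")[^\"]*";

    // String.replace searches for the pattern text literally, so nothing matches here.
    static String literalReplace(String html) {
        return html.replace(IMG_SRC, "$1new");
    }

    // String.replaceAll compiles its first argument as a regex; $1 re-inserts group 1.
    static String regexReplace(String html) {
        return html.replaceAll(IMG_SRC, "$1new");
    }

    public static void main(String[] args) {
        String html = "<  img src  =  \"01\"></img><img src=\"02\">";
        // The literal replace leaves the string unchanged.
        System.out.println(literalReplace(html).equals(html)); // true
        // The regex replace rewrites both src values.
        System.out.println(regexReplace(html)); // <  img src  =  "new"></img><img src="new">
    }
}
```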

3 Answers


You can't reliably use regexps with HTML/XML. You need an HTML parser, such as the confusingly named JTidy (although it claims to be an HTML pretty-printer, it also gives you a DOM view of your document).

Brian Agnew
  • I just want to build a string, and that string will be added to an HTML file. And this is for an Android WebView, so ..... – AmyWuGo Sep 13 '12 at 08:36
  • I don't think I can be any clearer. You can't reliably use regexps with HTML/XML! – Brian Agnew Sep 13 '12 at 15:50
  • He's right - look at this epic piece and think about it: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags ;) - use XML tools and you're fine – pagid Sep 13 '12 at 22:05
  • Sorry, I think there is some misunderstanding about my question. Forget the HTML; just assume I want to use regexps to change my string in Java. I will try cubanacan's way and let you know the result. – AmyWuGo Sep 14 '12 at 01:29
  • Thank you, I checked the link pagid sent to me. My fault. Sorry. – AmyWuGo Sep 14 '12 at 01:35
  • "You can't reliably use regexps with HTML/XML", why is that? – Patryk Dobrowolski Jan 28 '16 at 10:11
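
As a partial answer to that last comment, here is a small sketch of where the pattern from the question goes wrong. The three input strings are made up for illustration. They are not the only failure cases, only ones that are easy to reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexHtmlPitfalls {
    // The extraction pattern from the question.
    static final Pattern IMG_SRC = Pattern.compile("<img[^>]*src\\s*=\\s*\"([^\"]*)");

    // Collects every src value the pattern finds in the given markup.
    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // A '>' inside an attribute value stops [^>]* early, so the src is missed.
        System.out.println(extract("<img alt=\"a > b\" src=\"01\">"));
        // Single-quoted attributes are valid HTML, but the pattern expects double quotes.
        System.out.println(extract("<img src='02'>"));
        // An image inside a comment is not part of the page, yet it still matches.
        System.out.println(extract("<!-- <img src=\"03\"> -->"));
    }
}
```

The first two calls print an empty list and the third prints `[03]`. A parser tracks quoting and comments, which a single pattern like this one does not.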

Here is code that finds the first img tag and then removes all img tags:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgTest {
    public static void main(String[] args) {
        String s = "This is a sample<img src=\"test.html\" /> text";
        // Matches an opening or closing img tag, e.g. <img ...> or </img>.
        Pattern p = Pattern.compile("[<](/)?img[^>]*[>]");
        Matcher m = p.matcher(s);
        if (m.find()) {
            // Print the first matched tag.
            String src = m.group();
            System.out.println(src);
        }
        // Remove every img tag from the string.
        s = s.replaceAll("[<](/)?img[^>]*[>]", "");
        System.out.println(s);
    }
}
Dave Hogan

Here you are:

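import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Replaces the src value of each img tag, in document order, with the next entry of srcs.
// srcs must contain at least as many entries as there are matching img tags.
private static String replaceSrcs(String str, List<String> srcs) {
    // Group 1: everything up to the opening quote; group 2: the closing quote and '>'.
    Pattern p = Pattern.compile("(<\\s*img\\s*src\\s*=\\s*\").*?(\"\\s*>)");
    Matcher m = p.matcher(str);
    StringBuffer sb = new StringBuffer();
    int i = 0;
    while (m.find()) {
        // quoteReplacement keeps '$' and '\' in the new value from being read as group references.
        m.appendReplacement(sb, "$1" + Matcher.quoteReplacement(srcs.get(i++)) + "$2");
    }
    m.appendTail(sb);
    return sb.toString();
}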

Now you just need to invoke it:

replaceSrcs(mystr, imgList4Replace);

And it returns what you want.
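
For anyone who wants to try this quickly, here is a self-contained sketch that wraps the method in a class and runs it on the second sample from the question. The class name `ReplaceSrcsDemo` and the replacement values `a.png`, `b.png`, `c.png` are hypothetical. The `Matcher.quoteReplacement` call is an addition so that a `$` or `\` in a new value is not read as a group reference.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceSrcsDemo {
    // Swaps the src value of each matching img tag for the next entry of srcs.
    // Assumes srcs has at least as many entries as there are matching tags.
    static String replaceSrcs(String str, List<String> srcs) {
        Pattern p = Pattern.compile("(<\\s*img\\s*src\\s*=\\s*\").*?(\"\\s*>)");
        Matcher m = p.matcher(str);
        StringBuffer sb = new StringBuffer();
        int i = 0;
        while (m.find()) {
            // Keep the original prefix ($1) and suffix ($2), replace only the value between them.
            m.appendReplacement(sb, "$1" + Matcher.quoteReplacement(srcs.get(i++)) + "$2");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String mystr = "<  img src  =  \"01\"></img><    img src=\"02\"><    img src = \"03\"></img>";
        // Hypothetical replacement values, one per img tag.
        List<String> imgList4Replace = Arrays.asList("a.png", "b.png", "c.png");
        System.out.println(replaceSrcs(mystr, imgList4Replace));
    }
}
```

The pattern only matches tags where `src` is the first attribute and the value is double-quoted, so other shapes of img tag pass through unchanged.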

cubanacan
  • Hi cubanacan, thank you for your code. I tried it, and in my case it doesn't seem to work. I plan to report that to my boss and change my approach. – AmyWuGo Sep 14 '12 at 01:45
  • @AmyWuGo The cases you mentioned were successfully tested with this method. Why not upvote it, then? – cubanacan Sep 14 '12 at 07:30