0

What regular expression can be used to extract the value of src attribute in the iframe tag?

user2699073
  • javascript instead of java? – Jeroen Ingelbrecht Aug 25 '13 at 18:33
  • Regex should not be used for parsing something as complex and *weird* as an HTML document. Use a library meant for that kind of task. – dantuch Aug 25 '13 at 18:49
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) –  Aug 26 '13 at 02:27

5 Answers

4

If you really are using Java (not JavaScript) and you only have the iframe, you can try the regular expression:

(?<=src=")[^"]*(?<!")

e.g.:

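import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; it only makes the snippet compile on its own.
public class IframeSrc {

    // The lookbehind anchors the match just after src=" and [^"]* consumes the value.
    private static final Pattern REGEX_PATTERN =
            Pattern.compile("(?<=src=\")[^\"]*(?<!\")");

    public static void main(String[] args) {
        String input = "<iframe name=\"I1\" id=\"I1\" marginwidth=\"1\" marginheight=\"1\" height=\"430px\" width=\"100%\" border=\"0\" frameborder=\"0\" scrolling=\"no\" src=\"report.htm?view=country=us\">";

        // matches() requires the whole input to match the pattern, so it is false here.
        System.out.println(
            REGEX_PATTERN.matcher(input).matches()
        );  // prints "false"

        // find() looks for the pattern anywhere in the input.
        Matcher matcher = REGEX_PATTERN.matcher(input);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}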

Output:

false
report.htm?view=country=us
Paul Vargas
0

I would suggest looking into DOM parsing; from there it is very similar to the JavaScript answer. A DOM parser turns the HTML into a document, from which you can do:

iframe = document.getElementById("I1");
src = iframe.getAttribute("src");
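
As a rough Java sketch of the DOM approach, the JDK's built-in XML parser can be used when the markup happens to be well-formed XML. That is an assumption: real-world HTML usually is not well-formed, and a dedicated HTML parser library is the better fit there. The class name `IframeSrcDom` and the method `firstIframeSrc` are made up for this illustration. The lookup goes through `getElementsByTagName` rather than `getElementById`, because the XML DOM only resolves IDs for attributes that are declared as type ID.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class IframeSrcDom {

    // Returns the src of the first iframe element, or null if there is no iframe.
    // Assumes the markup is well-formed XML (closed tags, quoted attributes).
    static String firstIframeSrc(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document =
                    builder.parse(new InputSource(new StringReader(markup)));
            NodeList iframes = document.getElementsByTagName("iframe");
            if (iframes.getLength() == 0) {
                return null;
            }
            Element iframe = (Element) iframes.item(0);
            return iframe.getAttribute("src");
        } catch (Exception e) {
            // Parsing fails on markup that is not well-formed XML.
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String markup =
                "<iframe id=\"I1\" src=\"report.htm?view=country=us\"></iframe>";
        System.out.println(firstIframeSrc(markup));
    }
}
```

Note that the iframe in this sketch is explicitly closed; the unclosed tag from the question would be rejected by an XML parser.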

0

Regex is a bit more expensive, so do not use it when a simpler solution is available. In Java, try this:

String src="<iframe name='I1' id='I1' marginwidth='1' marginheight='1'" + 
" height='430px' width='100%' border='0' frameborder='0' scrolling='no'" +
" src='report.htm?view=country=us'>";

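// Start of the value: skip the 5 characters of src='
int position1 = src.indexOf("src") + 5;
System.out.println(position1);

// End of the value: the next single quote after the start
int position2 = src.indexOf("\'", position1);
System.out.println(position2);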

System.out.println(src.substring(position1, position2));

Output:

134
160
report.htm?view=country=us
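
The same idea can be wrapped in a small helper that also copes with double quotes and with a missing attribute. This is only a sketch under stated assumptions: the attribute is written as `src='...'` or `src="..."` with no spaces around `=`, and no other attribute name ends in `src`. The class name `SrcByIndexOf` and the method `extractSrc` are hypothetical.

```java
// Hypothetical helper class, not part of any library.
class SrcByIndexOf {

    // Returns the value of the first src attribute, or null if none is found.
    // Assumes src='...' or src="..." with no whitespace around '='.
    static String extractSrc(String tag) {
        int start = -1;
        char quote = 0;
        // Try both quote styles and keep whichever occurs first in the string.
        for (char q : new char[] {'"', '\''}) {
            int i = tag.indexOf("src=" + q);
            if (i >= 0 && (start < 0 || i < start)) {
                start = i;
                quote = q;
            }
        }
        if (start < 0) {
            return null; // no src attribute in the assumed form
        }
        int valueStart = start + 5; // skip the 5 characters of src=' or src="
        int valueEnd = tag.indexOf(quote, valueStart);
        if (valueEnd < 0) {
            return null; // opening quote without a closing quote
        }
        return tag.substring(valueStart, valueEnd);
    }

    public static void main(String[] args) {
        System.out.println(
                extractSrc("<iframe id='I1' src='report.htm?view=country=us'>"));
    }
}
```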
commit
-1

In case you meant JavaScript instead of Java:

var iframe = document.getElementById("I1");
var src = iframe.getAttribute("src");
alert(src); //outputs the value of the src attribute
-1

src="(.*?)"

This regular expression matches the whole of src="report.htm?view=country=us", but the first (and only) capturing group contains just the part between the quotes.

To match src attributes only when they appear in an iframe tag, use:

<iframe.*?src="(.*?)".*?>

However, there are corner cases where this can fail because HTML is inherently not a regular language. See the top answer to "RegEx match open tags except XHTML self-contained tags" for an amusing rant about this problem.
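
In Java, reading the capturing group might look like the following sketch. It uses the iframe pattern from this answer unchanged; the class name `IframeSrcGroup` and the method `extractSrc` are invented for the example, and the same corner cases still apply.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class IframeSrcGroup {

    // The pattern from the answer: group 1 captures the text between the quotes.
    private static final Pattern IFRAME_SRC =
            Pattern.compile("<iframe.*?src=\"(.*?)\".*?>");

    // Returns the src value of the first matching iframe tag, or null if none matches.
    static String extractSrc(String html) {
        Matcher matcher = IFRAME_SRC.matcher(html);
        // find() searches anywhere in the input; group(1) is the captured value.
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<iframe name=\"I1\" id=\"I1\" scrolling=\"no\" "
                + "src=\"report.htm?view=country=us\">";
        System.out.println(extractSrc(input));
    }
}
```

Here `group(0)` would be the entire matched tag, while `group(1)` is only the attribute value.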

Philipp
  • @djechlin I did point out that it can't parse arbitrary HTML, but it will parse the HTML in the question. I also think that your criticism would be more constructive if you offered a better solution instead of just complaining. – Philipp Aug 25 '13 at 20:01
  • Comments that point out problems in solutions are constructive. They tell the reader "this won't work." The better solution is to use an HTML parser. – djechlin Aug 25 '13 at 20:10