What regular expression can be used to extract the value of the src attribute in the iframe tag?

- javascript instead of java? – Jeroen Ingelbrecht Aug 25 '13 at 18:33
- Regex should not be used for parsing something as complex and *weird* as an HTML document. Use a library meant for that kind of task. – dantuch Aug 25 '13 at 18:49
- possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Aug 26 '13 at 02:27
5 Answers
If you really are using Java (not JavaScript) and you only have the iframe, you can try the regular expression:
(?<=src=")[^"]*(?<!")
e.g.:
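import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IframeSrcExample {
    // Lookbehind anchors the match right after src=" and the class [^"]* stops at the closing quote.
    private static final Pattern REGEX_PATTERN =
            Pattern.compile("(?<=src=\")[^\"]*(?<!\")");

    public static void main(String[] args) {
        String input = "<iframe name=\"I1\" id=\"I1\" marginwidth=\"1\" marginheight=\"1\" height=\"430px\" width=\"100%\" border=\"0\" frameborder=\"0\" scrolling=\"no\" src=\"report.htm?view=country=us\">";
        // matches() requires the whole input to match, so it fails here.
        System.out.println(
                REGEX_PATTERN.matcher(input).matches()
        ); // prints "false"
        // find() searches for the pattern anywhere in the input.
        Matcher matcher = REGEX_PATTERN.matcher(input);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}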
Output:
report.htm?view=country=us

I would suggest looking into DOM parsing; from there it is very similar to the JavaScript answer. A DOM parser turns the HTML into a document, from which you can do:
iframe = document.getElementById("I1");
src = iframe.getAttribute("src");
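As a minimal sketch of the DOM approach in Java, the following uses the XML parser that ships with the JDK. It assumes the markup is well-formed XHTML (the iframe tag is closed, attribute values are quoted); real-world HTML usually needs a lenient HTML parser such as jsoup instead. The class and method names are made up for illustration. It looks the element up by tag name rather than by id, because `getElementById` in `org.w3c.dom` only finds attributes that a DTD or schema declares as type ID.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative only.
class IframeSrcDom {
    // Returns the src of the first iframe element, or null if there is no iframe.
    // Assumes well-formed XHTML; throws IllegalArgumentException otherwise.
    static String firstIframeSrc(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document =
                    builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList iframes = document.getElementsByTagName("iframe");
            if (iframes.getLength() == 0) {
                return null;
            }
            Element iframe = (Element) iframes.item(0);
            // getAttribute returns an empty string when the attribute is absent.
            return iframe.getAttribute("src");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String input = "<iframe name=\"I1\" id=\"I1\" scrolling=\"no\""
                + " src=\"report.htm?view=country=us\"></iframe>";
        System.out.println(firstIframeSrc(input));
    }
}
```

Note that the input here closes the iframe tag, unlike the snippet in the question; the strict XML parser rejects an unclosed tag.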

Regex is a bit costlier; avoid it when a simpler solution is available. In Java, try this:
String src="<iframe name='I1' id='I1' marginwidth='1' marginheight='1'" +
" height='430px' width='100%' border='0' frameborder='0' scrolling='no'" +
" src='report.htm?view=country=us'>";
int position1 = src.indexOf("src") + 5;
System.out.println(position1);
int position2 = src.indexOf("\'", position1);
System.out.println(position2);
System.out.println(src.substring(position1, position2));
Output:
134
160
report.htm?view=country=us
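The fixed offset of 5 above relies on the attribute being written exactly as `src='`. A slightly more defensive variant of the same `indexOf` idea might look like the sketch below; the class and method names are invented for illustration. It assumes the attribute is preceded by a single space and has no whitespace around `=`, and it returns null rather than throwing when no quoted src attribute is found.

```java
// Hypothetical helper class; the names are illustrative only.
class IframeSrcIndexOf {
    private static final String ATTR = " src=";

    // Returns the value of the src attribute, or null when no quoted
    // src attribute is found. Accepts single or double quotes.
    static String extractSrc(String tag) {
        int attrStart = tag.indexOf(ATTR);
        if (attrStart < 0) {
            return null; // no src attribute at all
        }
        int quotePos = attrStart + ATTR.length();
        if (quotePos >= tag.length()) {
            return null; // input ends right after "src="
        }
        char quote = tag.charAt(quotePos);
        if (quote != '"' && quote != '\'') {
            return null; // unquoted values are not handled in this sketch
        }
        int valueStart = quotePos + 1;
        int valueEnd = tag.indexOf(quote, valueStart);
        if (valueEnd < 0) {
            return null; // missing closing quote
        }
        return tag.substring(valueStart, valueEnd);
    }

    public static void main(String[] args) {
        String tag = "<iframe name='I1' id='I1' scrolling='no'"
                + " src='report.htm?view=country=us'>";
        System.out.println(extractSrc(tag));
    }
}
```

Searching for `" src="` with the leading space also avoids matching attribute names that merely end in src, such as `data-src`.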

In case you meant JavaScript instead of Java:
var iframe = document.getElementById("I1");
var src = iframe.getAttribute("src");
alert(src); //outputs the value of the src attribute

src="(.*?)"
The regular expression will match src="report.htm?view=country=us", but only the part between the quotes ends up in the first (and only) submatch.
To match src attributes only when they appear in an iframe, use this:
<iframe.*?src="(.*?)".*?>
There are corner cases where this can fail, though, because HTML is inherently not a regular language. See the top answer to [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) for an amusing rant about this problem.
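In Java, the submatch is read with `Matcher.group(1)`. The following is a minimal sketch using the iframe-scoped pattern above; the class and method names are hypothetical, and it assumes double-quoted attribute values on a single line, as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class IframeSrcRegex {
    // Group 1 captures the text between the quotes of the src attribute.
    private static final Pattern IFRAME_SRC =
            Pattern.compile("<iframe.*?src=\"(.*?)\".*?>");

    // Returns the captured src value of the first match, or null if none.
    static String extractSrc(String html) {
        Matcher matcher = IFRAME_SRC.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<img src=\"logo.png\">"
                + "<iframe id=\"I1\" scrolling=\"no\" src=\"report.htm?view=country=us\">";
        // The img tag is skipped because the pattern starts at <iframe.
        System.out.println(extractSrc(input));
    }
}
```

One corner case this sketch does not guard against: `.*?` can run past the end of the iframe tag, so for `<iframe id="a"></iframe><img src="x.png">` it returns the src of the img tag.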
- @djechlin I did point out that it can't parse any HTML, but it will parse the HTML in the question. I also think that your criticism would be more constructive if you offered a better solution instead of just complaining. – Philipp Aug 25 '13 at 20:01
- Comments that point out problems in solutions are constructive. They tell the reader "this won't work." The better solution is to use an HTML parser. – djechlin Aug 25 '13 at 20:10