I get an iframe link (http:\\abc.com?=blahblahiframelink) from a third-party REST service. I want to extract multiple values from the content of that iframe.

Here is a simplified version of the HTML. Please understand that the real HTML is far more complex, with multiple nested divs and tables.

.css stuff

<html>
<div>
 <p> NEED THIS INFO </p> 
   ....
   blah blah

  <img src="NEED THIS INFO" > </img> 
</div> 
</html>

I marked what I want to extract as "NEED THIS INFO" in the code above, to show that I want attribute values as well as element values.

I am thinking of first storing the iframe content in a Java string in my REST service, then using some crazy regex to get the information I want.

Before I attempt that, I want to check whether there is a more efficient way to do this. Is there an HTML parser I can use to get the content in a structured format?

If not, please tell me how to store the iframe content in a Java string.

Please let me know if you need more info.

Watt

1 Answer

For those arriving here, there are a couple of ways to do this. The most efficient is to read the iframe's content into a string using HttpURLConnection or HttpsURLConnection (conn is the connection in the code below). An iframe's content can be fetched directly from its link.

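    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    // conn is an already-opened HttpURLConnection; UTF-8 is assumed here
    StringBuilder html = new StringBuilder();
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = br.readLine()) != null) {
            html.append(line).append("\n");   // keep the original line breaks
        }
    }   // try-with-resources closes the reader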

The most efficient approach is, of course, to limit the number of middlemen, such as Mechanize, and the number of URL calls.

You can do this with Java's java.net or java.nio packages, just by creating an HttpURLConnection (or an HttpsURLConnection from javax.net.ssl) to get your page, the cookies, etc. From there the answer unfolds.
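As a rough sketch of that step, assuming the iframe link can be requested with a plain GET and that the response is UTF-8 (both are assumptions, not something the third-party service guarantees), the fetch could look like this. The class name IframeFetcher, the method names, and the 5-second timeouts are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name is illustrative only.
final class IframeFetcher {

    // Reads a whole stream into a String, one line at a time.
    // UTF-8 is an assumption; the real charset may differ.
    static String readAll(InputStream in) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Requests the iframe link directly and returns the response body.
    static String fetch(String iframeUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(iframeUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // arbitrary timeouts for the sketch
        conn.setReadTimeout(5000);
        try {
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling IframeFetcher.fetch(link) with the link returned by the REST service would then give you the string to parse. Keeping readAll separate makes it possible to try the reading logic on any InputStream without touching the network.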

To parse the page in Java you have several options, with A and B being the better ones I know of:

A. Create an XML document and run an XPath query against it. I am short on time, so I've posted a resource for you. All you need is the page as a string. This fits your needs if you are not looking for something specific. Once you have the page, just extract everything you need.

http://www.mkyong.com/tutorials/java-xml-tutorials/

B. Regex. Look online for a good solution; I am limited to two links. MyRegexTester is also a great free resource for learning and testing regex, which is less daunting than you think, especially in Java. Use wildcards and lookaheads.

C. Better yet, use a parser like Jsoup, with its output set to XML, provided you are not resource constrained (which does not appear to be the case here). Jsoup does the parsing for you and lets you query the result, for example with an XPath.

D. Use HttpUnit or a GUI-less browser like Mechanize in Python (http://www.pythonforbeginners.com/cheatsheet/python-mechanize-cheat-sheet/), Perl, or Ruby. My favorite is Python, since it has more ready-made modules and the speeds are about the same. Python also has a Jsoup plugin.
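Options A and B can be sketched with nothing but the JDK. This is only an illustration under two assumptions: for A, the markup must be well-formed XML (real-world HTML often is not, so the img tag in the usage note is self-closed), and for B, the regexes are tailored to the simplified sample in the question rather than to arbitrary HTML. The class and method names are invented for the example.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; the name is illustrative only.
final class IframeExtractor {

    // Option A: parse the string as XML and evaluate an XPath expression.
    // Throws if the markup is not well-formed XML.
    static String xpath(String xml, String expression) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate(expression, doc).trim();
    }

    // Option B: return capture group 1 of the first match, or null if
    // the pattern does not occur in the input.
    static String firstMatch(String html, String regex) {
        Matcher m = Pattern
                .compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

With a page string such as `<html><div><p> NEED THIS INFO </p><img src="NEED THIS INFO" /></div></html>`, calling xpath(page, "//div/p") yields the element text and xpath(page, "//div/img/@src") yields the attribute value. The regex route would use firstMatch(page, "<p\\b[^>]*>(.*?)</p>") and firstMatch(page, "<img[^>]*\\ssrc=\"([^\"]*)\""). The XPath version survives layout changes better; the regex version tolerates markup that is not valid XML.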

Andrew Scott Evans