I'm using Java to parse HTML from a website, let's say http://google.com for convenience. After parsing the HTML, I want to extract some of that data and show it on a display. After that, the user will enter a search term and press a button, and that button should execute the script behind the site's own "search" button. I want to do this with several sites, so an answer that only works with Google won't help me a lot.
-
So what if the button makes an AJAX call? You'll run into the Same-Origin Policy, and it will break because the page expects to be on domain X but is now proxied into domain Y. – Diodeus - James MacFarlane Mar 29 '12 at 18:55
-
I don't understand the question. A website has HTML; what does Google vs. any other site have to do with the HTML from the website? How does what you display differ from, say, "view source"? – Phil Freihofner Mar 29 '12 at 18:57
-
I think he wants to show screen-scraped pages and have them behave like the original pages. – Diodeus - James MacFarlane Mar 29 '12 at 18:58
-
Yes, like Diodeus said, but I want to be able to use scripts on that page, like the Google search button or the Stack Overflow vote button. For example, pressing a button in my own program would actually click a vote button on this site (by executing the code behind that button). – ZimZim Mar 30 '12 at 08:55
2 Answers
Edit:
Ah, I see. You are asking how to call a remote web page from your code? There are a couple of ways you can do this:
- You can do it "by hand" using the Java URL class.
- You could use the great Apache HttpClient library.
- Another possibility is a tool like HtmlUnit, which can also execute the JavaScript behind a page's buttons (see the sketch below).
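For example, here is a minimal HtmlUnit sketch of driving a search form, assuming an HtmlUnit 2.x release from roughly the era of this thread. The form and element names ("f", "q", "btnG") are assumptions about Google's markup at the time; inspect the real page to confirm them.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class HtmlUnitSearchDemo {
    public static void main(String[] args) throws Exception {
        WebClient client = new WebClient();
        try {
            // Load the page; HtmlUnit runs its JavaScript like a browser would.
            HtmlPage page = client.getPage("http://google.com");

            // "f", "q" and "btnG" are assumed names for the search form,
            // text box and button on Google's page of that era.
            HtmlForm form = page.getFormByName("f");
            HtmlTextInput box = form.getInputByName("q");
            box.setValueAttribute("stack overflow");

            // click() executes whatever script sits behind the button
            // and returns the page that results from it.
            HtmlSubmitInput button = form.getInputByName("btnG");
            HtmlPage results = button.click();
            System.out.println(results.asText());
        } finally {
            client.closeAllWindows();
        }
    }
}
```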
Scraping websites is a difficult problem, and I have rarely found that a single scraper can handle multiple websites. A truly generic scraper is just not possible.
I would recommend writing a Java interface, something like HandleSearchPage. It would contain one method to scrape the page and extract some of the data, and another method to submit a search.
Then you can implement your scrapers for Google, Yahoo, etc. As to how to parse HTML and drive a web page, there are many other questions/answers on that specific topic.
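To make the idea concrete, a sketch of such an interface might look like this. Only the name HandleSearchPage comes from the recommendation above; the method names and signatures are illustrative assumptions:

```java
import java.util.List;

// Sketch of the suggested per-site interface. Only the name
// HandleSearchPage appears in the answer; the method names and
// signatures below are assumptions for illustration.
public interface HandleSearchPage {
    // Parse the raw HTML of a page and extract the pieces of data
    // you want to show on your display.
    List<String> scrapePage(String html);

    // Submit a search term the way the site's own search form would,
    // returning the HTML of the resulting page.
    String submitSearch(String term) throws Exception;
}
```

Each site then gets its own implementation (a hypothetical GoogleSearchPage, YahooSearchPage, and so on), and the rest of your program only talks to the interface.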
Best of luck.
-
Oh nonononono, my bad. What I meant was, I need an explanation that will let me do it for all sites I find, programmatically of course. I absolutely don't expect a single piece of Java code to be able to manipulate all scripts on every website, haha. I just don't want you to give me an explanation that I will only be able to use for google.com. And thanks, I'll look into your answer. EDIT: You gave me an explanation of how to parse HTML. Like I said, I already know how to parse HTML in several ways. What I need to do is EXECUTE scripts on an external website through my own code. – ZimZim Mar 29 '12 at 19:22
Sorry, I am not too sure what the question is. If you want to grab a web page from Java and then strip out the HTML data, that is a task you can fairly easily do, or you can use something like Nutch. If you want to run the JavaScript inside a page from your Java code, then you will need to look at something like Rhino.
Nutch will spider the pages and update a database (usually Solr); you can then issue searches against the database and display the results.
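For the Rhino route, here is a minimal sketch of evaluating JavaScript from Java. This only shows the basic embedding API; wiring a real page's DOM into the script engine is considerably more work:

```java
import org.mozilla.javascript.Context;
import org.mozilla.javascript.Scriptable;

public class RhinoDemo {
    public static void main(String[] args) {
        // Attach a Rhino context to the current thread.
        Context cx = Context.enter();
        try {
            // Create a scope with the standard JavaScript globals.
            Scriptable scope = cx.initStandardObjects();
            // Evaluate a snippet of JavaScript and get the result back in Java.
            Object result = cx.evaluateString(
                    scope, "var x = 6 * 7; x;", "<demo>", 1, null);
            System.out.println(Context.toString(result)); // prints 42
        } finally {
            // Always release the context when done.
            Context.exit();
        }
    }
}
```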

-
A good bit of this should be a comment, dude. In these cases I say something like "I'm not sure you are talking about XXXX." Then my answer. Then "If you were talking about something else, edit your question." – Gray Mar 29 '12 at 19:34
-
Thanks for the comment, Gray. I am a bit new on this site, TBH. How do I add a comment? I see on this thread there is a grey Add Comment link, but there is not one under the OP's post? EDIT: Ahh, I need 50 rep to add a comment. – Symeon Breen Mar 30 '12 at 11:35