
I'm using the Selenium API for a Java program (http://selenium.googlecode.com/svn/trunk/docs/api/java/index.html).

When I call the driver.get(completeUrl); method, Selenium opens a new Firefox window with the site I refer to in "completeUrl".

Now, many websites have videos, music, and other heavy content that I don't want to download while using Selenium with Firefox, because the information I need is contained in the first few KB of the page.

How can I avoid wasting time downloading all this content? Is there a method in the Selenium API that lets me stop Firefox from downloading a web page after a certain amount of time or number of KB? Or can this be done with some Java method?

Please help.

RazorMx
  • Why are you using Selenium for that? I think that's the wrong tech for what you want to do. With Selenium you can simulate user interaction with your website to test its functionality. What exactly do you want to achieve? The source code? There are faster and easier methods for that. – Tarken Apr 03 '12 at 08:23
  • Yes, I want to get the source code from the first bytes of the page. I don't want to download the whole page if the information I need is stored in the first bytes. – RazorMx Apr 03 '12 at 08:36

2 Answers


There is no method in Selenium to stop downloading. Selenium is simply overkill for this sort of work; it is designed to interact with browsers and behave like a human sitting in front of the computer.

If you just want the HTML code, then use the procedures found at How to fetch HTML in Java or How do you Programmatically Download a Webpage in Java.
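Since you only need the first few KB, you can skip the browser entirely, open the URL's stream yourself, and stop reading after a byte budget; everything past that limit is never downloaded. A minimal sketch (the class name FirstBytesFetcher, the helper readFirstBytes, and the 16 KB limit are my own choices for illustration, not part of any library):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class FirstBytesFetcher {

    // Read at most maxBytes from the stream, then stop.
    // Anything beyond the limit is never pulled off the connection.
    public static String readFirstBytes(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int remaining = maxBytes;
        int n;
        while (remaining > 0
                && (n = in.read(buffer, 0, Math.min(buffer.length, remaining))) != -1) {
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = new URL(args[0]).openConnection();
        conn.setConnectTimeout(5000); // give up on slow servers
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            System.out.println(readFirstBytes(in, 16 * 1024)); // first 16 KB only
        }
    }
}
```

Note this assumes the page encoding is UTF-8 and that the information you need really sits in the leading bytes of the raw HTML; if the page builds its content with JavaScript, a plain HTTP fetch won't see it.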

Petr Janeček
  • By the way, I just realized that if that video is a YouTube video, then you can stop the download in the context menu on the video itself... – Petr Janeček Apr 03 '12 at 21:45
  • Some nasty websites disallow crawling them in that way, and the only option is to use Selenium. If you think the question is wrong, use comments. You didn't answer the question. – polkovnikov.ph Apr 09 '17 at 01:50

Try doing it like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class WebsiteReader {

    // Open a buffered reader over the page's byte stream.
    public static BufferedReader read(String url) throws Exception {
        return new BufferedReader(new InputStreamReader(new URL(url).openStream()));
    }

    public static void main(String[] args) throws Exception {
        BufferedReader reader = read(args[0]);
        String line = reader.readLine();

        // Print the page line by line until the stream ends.
        while (line != null) {
            System.out.println(line);
            line = reader.readLine();
        }
        reader.close();
    }
}

You can also take a look at this topic: Get source of website in Java. There should be enough info there to achieve what you want.

Tarken