I am trying to set a proxy so that I can scrape Google News search results.

However, I get the following error:

Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - Erroneous sym type: org.jsoup.Connection.proxy.userAgent.ignoreHttpErrors.followRedirects.timeout.ignoreContentType.get
    at javaapplication27.JavaApplication27.main(JavaApplication27.java:47)

The IDE underlines the call in red and reports:

cannot find symbol
  symbol:   method proxy(Proxy)
  location: interface Connection

on this line:

 Document document = Jsoup.connect(string+"&start="+(j+0)*10)
        .proxy(proxy)
        .userAgent(userAgent)
        .ignoreHttpErrors(true)
        .followRedirects(true)
        .timeout(100000)
        .ignoreContentType(true)
        .get();

Proxy proxy = new Proxy(
    Proxy.Type.HTTP,
    InetSocketAddress.createUnresolved("127.0.0.1", 8080)
);
for (int j = 0; j < 3; j++) {
    Document document = Jsoup.connect(string + "&start=" + (j+0)*10)
        .proxy(proxy)
        .userAgent(userAgent)
        .ignoreHttpErrors(true)
        .followRedirects(true)
        .timeout(100000)
        .ignoreContentType(true)
        .get();
    Elements links = document.select(".r > a");
    ......
}

My imports:

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URLDecoder;
import java.net.URLEncoder;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import static java.util.concurrent.TimeUnit.*;

How can I fix this?

Vito
  • Take a look at this post; it may help you: http://stackoverflow.com/questions/7482748/how-to-add-proxy-support-to-jsoup-html-parser – JHDev Jan 23 '17 at 11:18
  • This sounds like a compilation error, but I cannot reproduce it. – Nicolas Filotto Jan 23 '17 at 11:22
  • @Nicolas Filotto Should I post the full code? – Vito Jan 23 '17 at 11:23
  • At which line and when (compilation or runtime) do you get this error exactly? – Nicolas Filotto Jan 23 '17 at 11:25
  • Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - Erroneous sym type: org.jsoup.Connection.proxy.userAgent.ignoreHttpErrors.followRedirects.timeout.ignoreContentType.get at javaapplication27.JavaApplication27.main(JavaApplication27.java:47) Java Result: 1 – Vito Jan 23 '17 at 11:25
  • `Document document = Jsoup.connect(string+"&start="+(j+0)*10).proxy(proxy).userAgent(userAgent).ignoreHttpErrors(true).followRedirects(true).timeout(100000).ignoreContentType(true).get();` <-- the error is on this line – Vito Jan 23 '17 at 11:25

1 Answer

The .proxy() method was first available in jsoup 1.9.1. What version are you using?
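One way to confirm which jsoup the project actually sees is to ask the classpath directly. The sketch below is only an illustration: the class name `ProxySupportCheck` and the helper `hasMethod` are made up for this example, and it uses plain reflection so that it still compiles when jsoup is old or missing.

```java
import java.net.Proxy;

public class ProxySupportCheck {

    /**
     * Returns true if the named class can be loaded and exposes a public
     * method with the given name and parameter types.
     * (Hypothetical helper, not part of jsoup.)
     */
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class.forName(className).getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            // Class is not on the classpath, or this version lacks the method.
            return false;
        }
    }

    public static void main(String[] args) {
        // org.jsoup.Connection.proxy(java.net.Proxy) is the overload used in the question.
        boolean supported = hasMethod("org.jsoup.Connection", "proxy", Proxy.class);
        System.out.println(supported
                ? "The jsoup on the classpath has Connection.proxy(Proxy)."
                : "No Connection.proxy(Proxy) found: jsoup is missing or older than 1.9.1.");
    }
}
```

If this reports that the method is missing even though a newer jar was downloaded, the IDE project is probably still linked against an older jsoup jar.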

Also, when I copied your code to test it, I found invisible zero-width spaces throughout, which could be causing the syntax errors you're getting (your question mentions both a missing method and syntax errors).
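To find such characters in a source file, a small scanner along these lines may help. It is a rough sketch under a few assumptions: the class name `InvisibleCharScan` is invented here, the file is read as UTF-8, and only a handful of common zero-width code points are checked (U+200B, U+200C, U+200D, U+2060, U+FEFF), so other invisible characters would be missed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class InvisibleCharScan {

    /** True for a small, non-exhaustive set of zero-width code points. */
    static boolean isInvisible(char c) {
        return c == 0x200B   // zero-width space
            || c == 0x200C   // zero-width non-joiner
            || c == 0x200D   // zero-width joiner
            || c == 0x2060   // word joiner
            || c == 0xFEFF;  // byte order mark / zero-width no-break space
    }

    /** Returns one description per invisible character, with 1-based line and column. */
    static List<String> findInvisible(String source) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int col = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\n') {
                line++;
                col = 0;
                continue;
            }
            col++;
            if (isInvisible(c)) {
                hits.add(String.format("line %d, column %d: U+%04X", line, col, (int) c));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java InvisibleCharScan <source-file>");
            return;
        }
        String source = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<String> hits = findInvisible(source);
        if (hits.isEmpty()) {
            System.out.println("No zero-width characters found.");
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

Running it against the source file named in the stack trace (`JavaApplication27.java`) would list each position to delete; retyping the affected lines by hand has the same effect.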

Jonathan Hedley
  • I'm using jsoup 1.10. I have fixed the syntax error, but the 403 error still exists. – Vito Jan 24 '17 at 02:27
  • What 403 error? You hadn't mentioned that. At any rate that's a different problem - a forbidden response from the server you are connecting to. You can check the other existing answers for some suggestions. – Jonathan Hedley Jan 24 '17 at 17:01
  • `Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://ipv4.google.com/sorry/index?continue=http://www.google.com/search%253Fq%253Dstackoverflow%2526tbm%253Dnws%2526tbs%253Dcdr%2525253A1%2525252Ccd_min%2525253A5%2525252F30%2525252F2016%2525252Ccd_max%2525253A6%2525252F30%2525252F2016%2526start%253D0&q=EgTKLTckGKH5hsQFIhkA8aeDS-3IYZmr41q-m4rIMh7Uw7vC3wdLMgNyY24 at ` – Vito Jan 25 '17 at 04:07