
If you enter this URL in a browser's address bar:

https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019

It returns a lot of data. But if I try to capture that data with an InputStreamReader, the only data returned is

{"retHTML":"", "rlist":""}

Here is the program:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.Date; // assuming java.util.Date; the original does not say which Date is meant
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Property and CharlotteCtyFL are the asker's own classes (not shown here).
List<Property> scrapePropertyInfo(List<Date> auctionDates) {
    List<Property> properties = new ArrayList<>();
    String urlStr = "https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019";
    StringBuilder stringBuilder = new StringBuilder();
    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader in = new BufferedReader(new InputStreamReader(new URL(urlStr).openStream()))) {
        String str;
        while ((str = in.readLine()) != null) {
            stringBuilder.append(str);
        }
        System.out.println("Url: " + urlStr);
        System.out.println(stringBuilder);
    } catch (IOException ex) {
        // MalformedURLException is a subclass of IOException, so one catch covers both
        Logger.getLogger(CharlotteCtyFL.class.getName()).log(Level.SEVERE, null, ex);
    }
    return properties;
}

Does anybody know why?

Edit: a little smarter now. Apparently the server needs more than just the URL. Since this is dynamic AJAX data that is only populated when the request comes from the original web page, I need to simulate that in Java.

I discovered how to get that info in Chrome's F12 developer tools. Under Network->XHR->Preview, click on each item until you see the expected data. Then right-click on it and select Copy->Copy Request Headers.

Here is what got copied:

GET /index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563231065712&bypassPage=1&test=1&_=1563231065712 HTTP/1.1
Host: charlotte.realforeclose.com
Connection: keep-alive
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Origin: http://evil.com/
Referer: https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/16/2019
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: cfid=6f228aa1-bb7e-4734-92ff-39eabf23ed9b; cftoken=0; CF_CLIENT_CHARLOTTE_REALFORECLOSE_TC=1563229207612; AWSELB=E7779D5F1C1F6ABE3513A5C5B6B0C754520B66675A407900314ABAC5333A52E93FD1A8D7401D89BC8D5E8B98059C8AAC5507D12A2C6ED07F7E7CB77311BD7FB09B738DB945; _ga=GA1.2.1823487290.1563231012; _gid=GA1.2.1418453663.1563231012; _gat=1; _gcl_au=1.1.273755450.1563231013; __utma=65865852.1823487290.1563231012.1563231014.1563231014.1; __utmc=65865852; __utmz=65865852.1563231014.1.1.utmcsr=realauction.com|utmccn=(referral)|utmcmd=referral|utmcct=/client-sites; __utmt_UA-51657054-1=1; __utmb=65865852.2.10.1563231014; testcookiesenabled=enabled; CF_CLIENT_CHARLOTTE_REALFORECLOSE_LV=1563231067363; CF_CLIENT_CHARLOTTE_REALFORECLOSE_HC=73

Now how do I get those headers into the request from Java? I know how to do it in JavaScript but not in Java.
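One way to attach request headers in plain Java is `HttpURLConnection.setRequestProperty`. The following is only a minimal sketch, not a confirmed fix: the class and method names (`HeaderedFetch`, `browserLikeHeaders`, `fetch`) are made up for illustration, the header values are copied from the captured request above, and it is an assumption that the server looks at any of them. `Host`, `Connection`, and `Origin` are left out because `HttpURLConnection` manages or restricts them by default, and `Accept-Encoding` is left out because the body would otherwise have to be decompressed by hand.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderedFetch {

    // Headers copied from the browser capture. Which of them the server
    // actually checks (if any) is an assumption, not something known.
    static Map<String, String> browserLikeHeaders(String referer) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "application/json, text/javascript, */*; q=0.01");
        headers.put("X-Requested-With", "XMLHttpRequest");
        headers.put("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36");
        headers.put("Referer", referer);
        headers.put("Accept-Language", "en-US,en;q=0.9");
        return headers;
    }

    // Sends a GET with the given headers and returns the response body as text.
    static String fetch(String urlStr, Map<String, String> headers) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(urlStr).toURL().openConnection();
        conn.setRequestMethod("GET");
        for (Map.Entry<String, String> header : headers.entrySet()) {
            // Must be called before the connection is opened (i.e. before getInputStream()).
            conn.setRequestProperty(header.getKey(), header.getValue());
        }
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String referer = "https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/16/2019";
        String url = "https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019";
        try {
            System.out.println(fetch(url, browserLikeHeaders(referer)));
        } catch (IOException ex) {
            System.err.println("Request failed: " + ex);
        }
    }
}
```

A `Cookie` header can be set the same way, but the session cookies in the capture are tied to that browser session and will likely have expired, so pasting them in is unlikely to keep working.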

user3217883
  • When you navigate to a page in a browser, it handles all the header work for you. When you open a page for the first time, the browser receives cookies from the server and sends them back in subsequent requests. Make sure your code handles cookies correctly and supplies the other necessary headers, just as the browser does when you open the page. – omegastripes Jul 15 '19 at 21:11
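
The cookie round trip described in the comment above can be sketched with the JDK's `java.net.CookieManager`, which stores cookies from responses and replays them on later `HttpURLConnection` requests. This is a sketch under assumptions: the class name `CookieSession` and its methods are hypothetical, the two URLs are the preview page and data URL quoted in the question, and whether the server actually requires a session cookie before returning data is not confirmed anywhere in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class CookieSession {

    // Installs a JVM-wide, in-memory cookie store. It has to be installed
    // before the connections that should share cookies are opened.
    static CookieManager installCookieManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Plain GET; cookies are added and stored by the installed CookieManager.
    static String get(String url, String referer) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        if (referer != null) {
            conn.setRequestProperty("Referer", referer);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        CookieManager manager = installCookieManager();
        String base = "https://charlotte.realforeclose.com/index.cfm";
        String page = base + "?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/16/2019";
        String data = base + "?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019";
        try {
            get(page, null); // step 1: load the page a browser would load first
            System.out.println("Cookies stored: " + manager.getCookieStore().getCookies().size());
            System.out.println(get(data, page)); // step 2: request the data with those cookies
        } catch (IOException ex) {
            System.err.println("Request failed: " + ex);
        }
    }
}
```

Letting the cookie manager collect fresh cookies avoids hard-coding values from an old browser session.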

2 Answers


Actually, I opened your URL in the browser and got

{"retHTML":"", "rlist":""}

Then I wrote my own code similar to yours and got the same string in response. So for me, the browser and the Java code fetched the same data. But it is easy to explain why that doesn't have to be the case: a server can detect whether the client sending the request is a browser, which kind it is, and where the request came from, and it can send back a customized response based on those details.
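
To make that concrete, here is a small self-contained demo using the JDK's built-in `com.sun.net.httpserver.HttpServer`. It is purely illustrative: the rule in `bodyFor` (return data only when `X-Requested-With: XMLHttpRequest` is present) is a made-up example of server-side logic, not what realforeclose.com is known to do, and the non-empty body is placeholder text.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class HeaderSensitiveServerDemo {

    // Hypothetical server-side rule: only AJAX-looking requests get data.
    static String bodyFor(String xRequestedWith) {
        return "XMLHttpRequest".equals(xRequestedWith)
                ? "{\"retHTML\":\"(placeholder rows)\", \"rlist\":\"(placeholder)\"}"
                : "{\"retHTML\":\"\", \"rlist\":\"\"}";
    }

    // Client side: same URL, with or without the extra header.
    static String get(String url, boolean asAjax) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        if (asAjax) {
            conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS pick a free port on the loopback interface.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String header = exchange.getRequestHeaders().getFirst("X-Requested-With");
            byte[] body = bodyFor(header).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println("plain request: " + get(url, false));
            System.out.println("ajax request:  " + get(url, true));
        } finally {
            server.stop(0);
        }
    }
}
```

The same URL yields two different bodies depending only on one request header, which is why comparing the full browser request against the Java request is a reasonable first diagnostic step.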

Michael Gantman
  • Interesting. Which browser did you use? I'm still getting data in chrome. But I'm no longer able to see any entries under the Network>XHR in the F12 debugger console. – user3217883 Jul 15 '19 at 21:32
  • I used chrome as well – Michael Gantman Jul 15 '19 at 21:35
  • @user3217883, I got the same response ({"retHTML":"", "rlist":""}) too, in both Firefox and Chrome. Try clearing your browser's cache and trying again. I guess the server broke but your browser is returning a cached value. – Bor Laze Jul 15 '19 at 21:36
  • Ah, yes, that is what was happening. Man, it's hard to diagnose when you're getting cached data! Thank you. So now I have to figure out why the server is not returning data like it used to. Too many attempts? – user3217883 Jul 15 '19 at 22:24
  • Do you still need help on this? – Michael Gantman Jul 16 '19 at 17:29

Try running this – it will fetch that URL and display the output:

curl "https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019"

So the behavior you're seeing isn't something Java is (or isn't) doing.

I suspect that the remote server is looking at the inbound HTTP request and deciding what to return. In your Java code, as with this simple curl example, there are no browser headers, user agent, etc. so the server is probably giving a generic answer because of that.

As another test, you could point your Java code at a different URL and confirm that it fetches content normally:

String urlStr = "http://duckduckgo.com";
Kaan
  • curl also returned {"retHTML":"", "rlist":""} – user3217883 Jul 15 '19 at 21:01
  • Yes, that's why I stated that the behavior you're seeing isn't something Java is (or isn't) doing. ;) – Kaan Jul 15 '19 at 21:04
  • Got it. Good clue. So what is the right way to get data from a java program as if it were the browser making the request? – user3217883 Jul 15 '19 at 21:05
  • It depends on the remote server. Any web server is free to look at inbound request details and handle things one way or another. It might try to determine if the request is from a mobile device vs. computer, or a certain browser type, or certain IP region, or if there's a valid referrer, or any number of things. If your intention is to consume the specific `charlotte.realforeclose.com` website, you could switch to using a console debugger (Firefox and Chrome both have them built in) to see what request headers _they_ send, then try to replicate those in your Java code. – Kaan Jul 15 '19 at 21:09
  • That's exactly how I got the url, from the F12 XHR Header, as someone kindly helped with that here: https://stackoverflow.com/questions/57033212/how-to-scrape-web-page-that-doesnt-show-its-data – user3217883 Jul 15 '19 at 21:12