
How do I connect to a website with authentication in Java?

I have tried the method suggested here: Connecting to remote URL which requires authentication using Java

But I get a login page as the response, as shown below:


  <div class='field'>
    <label for="user_login">Login</label>
    <input autocomplete="off" class="initialFocus" id="user_login" name="user[login]" size="30" tabindex="1" type="text" />
  </div>
  <div class='field'>
    <label for="user_password">Password</label>
    <input autocomplete="off" class="pass_field" id="user_password" name="user[password]" size="30" tabindex="2" type="password" />
  </div>
  <input id="authentication" name="authentication" type="hidden" value="basic" />
  <div class='field submit'>
    <label for="login">&nbsp</label>
    <input class="button" name="commit" tabindex="4" type="submit" value="Log In" />
  </div>

fastcodejava
    What sort of authentication? That link looks like it's talking about header-based HTTP Basic Auth, whereas the form you're showing looks to be an HTML form intended for a user to fill out and submit, probably via a POST request to some other endpoint to start a browsing session. – Silvio Mayolo Sep 24 '22 at 00:59

1 Answer

The short answer: You can't. What you want is impossible.

The longer answer: Well, you sort of can, but it is incredibly complicated and requires continuous maintenance. It could break at any moment. More likely you want to do something quite different: check for an API and use that instead.

Explanation

The word "authentication" means different things, and that has confused you here. The HTTP protocol has a notion of 'authentication' baked right into it. The problem is that hardly any public website uses it for user logins; you may well never have seen it in use. A website that relies on this built-in mechanism makes the browser itself pop up a dialog asking you to log in when you load the page.
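For reference, the baked-in mechanism in its simplest form (the 'Basic' scheme) is just one request header. The sketch below shows how that header value is put together; the class and method names are made up for illustration and the credentials are placeholders. It is only useful against a server that really speaks HTTP authentication, which the login page in the question does not appear to.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthHeader {
    // Builds the value of the Authorization header for the HTTP 'Basic' scheme:
    // the literal "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

You would send the result as the `Authorization` request header, for example with `HttpRequest.newBuilder(uri).header("Authorization", value)`. Note that base64 is an encoding, not encryption, so this only makes sense over HTTPS.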

What you have seen a billion times is just a webpage that happens to have a form with 2 fields: a username field and a password field. Hopefully 3 (also a TOTP or other two-factor field). It's like any other form on the web. Fill it in, click 'submit', and a form submit goes out. Or a bunch of JavaScript runs and does all sorts of ajaxy business to make it work. The other SO question you linked to is specifically about getting HTTP Authentication to work. The snippet of HTML you included is not, however, 'HTTP Authentication'. It's authentication all right, just not the thing described under the header 'Authentication' in the HTTP specification. It's just a web form, like any other.

The problem is that every website invents this stuff over and over again. There is no standard, so there is no simple 'just do this and voila'. The answer depends on the website. Worse, the website can change tomorrow, and the code you crafted specifically to authenticate against that one site no longer works. Websites generally do not promise that the way their pages are internally linked together will never change. Thus, if you go down this route, you have to be vigilant: it could break at any second, and then you have to rewrite it all.

If you insist on doing this, your question boils down to "How do I programmatically submit web forms?", which is easy enough in the simple case. Plenty of tutorials exist. You do whatever the browser does. Hopefully it's a simple form submit. Possibly you need to emulate the JavaScript that gets the data out and applies all sorts of crypto to it client side first. You can go all the way (actually run the JavaScript in your JVM, which is quite complicated), or check what the JavaScript does and rewrite it all in Java, which is also quite complicated. There are no easy answers; that's why I gave you the short answer of: You can't.
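For the simple case (a plain form POST, no client-side JavaScript), a minimal sketch with the JDK's own `java.net.http` client (Java 11+) could look like this. The field names are copied from the HTML in the question; everything else is an assumption. In particular the form's action URL is not visible in that snippet, so it is passed in as a parameter, and many sites also embed a hidden anti-forgery token in the login page that you would have to fetch and send along first.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class FormLogin {
    // Encodes fields as application/x-www-form-urlencoded,
    // the body format a browser sends for a plain form POST.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Field names taken from the login form shown in the question.
    static Map<String, String> loginFields(String login, String password) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user[login]", login);
        fields.put("user[password]", password);
        fields.put("authentication", "basic"); // the hidden input
        fields.put("commit", "Log In");        // the submit button
        return fields;
    }

    // 'action' is hypothetical: use whatever URL the real <form> posts to.
    static HttpRequest loginRequest(URI action, String login, String password) {
        return HttpRequest.newBuilder(action)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(
                        encodeForm(loginFields(login, password))))
                .build();
    }

    // The session normally lives in a cookie, so the client must keep cookies
    // and follow the redirect that typically comes after a successful login.
    static HttpClient newSessionClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }
}
```

You would send the request with `client.send(request, HttpResponse.BodyHandlers.ofString())` and then reuse the same client for later page loads so the session cookie goes along. Whether the site accepts this depends entirely on the site, which is the whole problem.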

The better solution

More generally, writing software that uses HTML as a source just doesn't work. HTML, fundamentally, isn't really designed for consumption by program code. Websites restyle all the time, and rarely put in the effort to make data particularly easy to fetch with e.g. selector queries.

Thus, the best option is to simply not do that. Any website that intends for programs to use it will have an API, with documentation to boot. These API docs will explain how to authenticate from within code. It won't involve 'sending a web form submit'. It'll probably involve OAuth or JWT (JSON Web Tokens), or something like that. And probably also involve API keys.
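To give an idea of how different that looks from a form submit, here is a rough sketch of an API call that carries an access token. The endpoint and the way you obtain the token are hypothetical; the real details come from the API docs of the site in question. The `Authorization: Bearer ...` header is the common convention for OAuth 2.0 access tokens, but some APIs use a custom header or a query parameter instead.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ApiRequest {
    // Builds a GET request that presents an access token as a bearer token.
    // Both arguments are placeholders: the endpoint and the token come from
    // the API documentation and its sign-up / OAuth flow.
    static HttpRequest authorizedGet(URI endpoint, String accessToken) {
        return HttpRequest.newBuilder(endpoint)
                .header("Authorization", "Bearer " + accessToken)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

A call such as `ApiRequest.authorizedGet(URI.create("https://api.example.com/v1/items"), token)` gives you a request you can hand to any `HttpClient`. No cookies, no HTML parsing, and the response is data meant for programs.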

Twitter has such an API. My bank has such an API. I believe even Stack Overflow has such an API. YouTube does, too.

If the website you want to 'authenticate' into has such an API, read the docs and do that. If it doesn't, you first need to come to grips with the notion that they very much don't want you to do what you want to do (which is, use code to 'read' the site) and would probably take legal steps to stop you if they could (especially in e.g. the EU they probably can't, but they might want to try). In any case, any time the site changes its style a bit, your code breaks. Even if the authentication part doesn't change, you're authenticating for some purpose, and the pages that serve that purpose will change. Say you want to log into Twitter to then read the tweets. Even if Twitter's login page does not change, if they add, say, polls, your 'tweet parser' probably crashes upon encountering one.

The solution is to use Twitter's API instead.

rzwitserloot