I am trying to develop an Instagram scraper; this is my code:

 try {
            
            
            System.out.println("search in https://instagram.com/" + txtUsername.getText() + "?__a=1");
            URLConnection connection = new URL("https://instagram.com/" + txtUsername.getText() + "?__a=1").openConnection();
            
            
            
            /*connection
                    .setRequestProperty("User-Agent",
                            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");*/
            connection
                    .setRequestProperty("Cookie",
                            "sessionid=XXXXXXXXXXXXXXXXXXXXX"); //setting cookie
 
            connection.connect();
            
            BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream(),
                    Charset.forName("UTF-8")));
            
            StringBuilder sb = new StringBuilder();
            String line;
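            // Read each line once; calling readLine() in both the condition and the body drops every other line.
            while ((line = r.readLine()) != null) {
                    sb.append(line);
            }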
            System.out.println(sb.toString());
        } catch (MalformedURLException ex) {
            Logger.getLogger(MainFrame.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IOException ex) {
            Logger.getLogger(MainFrame.class.getName()).log(Level.SEVERE, null, ex);
        }

I am trying to set a session cookie to simulate a login, so that I can view a user's page and read its data (followers, following, etc.) from https://www.instagram.com/username/?__a=1. The problem is that the cookie does not seem to be set: the console output is the source code of the Instagram login page, which means either the cookie was not sent or the session is wrong (but I'm sure it's right). How can I solve this problem and set the cookie?

Conta

1 Answer

The web server sets the session id cookie. You can find it in Chrome under F12 -> Application -> Cookies, and it should also appear in the response headers of the home page. You can try two things:

If you want to simulate the login with core Java, use setRequestProperty to set most of the headers your browser sends with its login request (in Chrome: F12 -> Network -> Headers -> Request Headers), including the initial session cookie. This approach might still fail, because a large web application like this one has multiple layers of security. With a simple API or static web pages it would be straightforward.
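A minimal sketch of this first approach, using only the JDK. It sends a few browser-like headers plus the session cookie and turns off automatic redirects, so a bounce to the login page shows up as a 3xx status instead of silently returning the login HTML. The class and method names are hypothetical, the header values are placeholders to be replaced with what your own browser sends, and the `/accounts/login` check is an assumption about where the server redirects. Nothing here guarantees that Instagram will accept the request.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

class SessionRequestSketch {

    // Placeholder headers; copy the real values from your browser's request headers.
    static Map<String, String> browserLikeHeaders(String sessionId) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        headers.put("Accept", "application/json");
        headers.put("Cookie", "sessionid=" + sessionId);
        return headers;
    }

    // Assumption: a rejected session is answered with a 3xx redirect to a login URL.
    static boolean isLoginRedirect(int status, String location) {
        return status >= 300 && status < 400
                && location != null
                && location.contains("/accounts/login");
    }

    // Opens (but does not yet send) a request that will not follow redirects.
    static HttpURLConnection open(String url, Map<String, String> headers) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setInstanceFollowRedirects(false);
        headers.forEach(connection::setRequestProperty);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: SessionRequestSketch <username> <sessionid>");
            return;
        }
        HttpURLConnection connection = open(
                "https://www.instagram.com/" + args[0] + "/?__a=1",
                browserLikeHeaders(args[1]));
        int status = connection.getResponseCode(); // sends the request
        String location = connection.getHeaderField("Location");
        System.out.println(status + " " + location);
        if (isLoginRedirect(status, location)) {
            System.out.println("The server did not accept the session cookie.");
        }
        connection.disconnect();
    }
}
```

Printing the status code and the `Location` header is also a way to check the outcome from any IDE, since it does not depend on browser tooling.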

What has a higher chance of success is a browser automation framework such as Selenium with ChromeDriver (or GeckoDriver for Firefox). You instruct the driver to log in with your user, open the user page, and then parse the page as you wanted.

Keep in mind that both approaches might violate Instagram's policies, and even if you succeed, requests from your IP might end up being blocked or redirected.
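To see which cookies the server actually sets (the point made at the start of this answer), the `Set-Cookie` response headers can be parsed with the JDK's `java.net.HttpCookie`. The following is a sketch only: the class and method names are hypothetical, and the cookie names in the usage note are illustrative, not a statement of what Instagram sends.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SetCookieSketch {

    // Collects every cookie found in the Set-Cookie headers of a response.
    static List<HttpCookie> cookiesFrom(Map<String, List<String>> responseHeaders) {
        List<HttpCookie> cookies = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : responseHeaders.entrySet()) {
            // URLConnection stores the status line under a null key, so guard against it.
            if (entry.getKey() != null && entry.getKey().equalsIgnoreCase("Set-Cookie")) {
                for (String header : entry.getValue()) {
                    cookies.addAll(HttpCookie.parse(header));
                }
            }
        }
        return cookies;
    }

    // Builds the value of a "Cookie" request header: name=value pairs joined by "; ".
    static String toCookieHeader(List<HttpCookie> cookies) {
        StringBuilder sb = new StringBuilder();
        for (HttpCookie cookie : cookies) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(cookie.getName()).append('=').append(cookie.getValue());
        }
        return sb.toString();
    }
}
```

With a live connection, `connection.getHeaderFields()` supplies the map, and the string returned by `toCookieHeader` can be passed to `setRequestProperty("Cookie", ...)` on the next request.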

  • I can't find the headers section in the developer tools – Conta Oct 12 '21 at 16:18
  • If you have Notepad++ or use IntelliJ with regex, you can do something like this (but with all headers): https://i.ibb.co/PmStN3G/npp.png I saw that Instagram encrypts the password, so you could have a look at **ig_web_client_password_encryption** in their JavaScript code. – Narcis Postolache Oct 12 '21 at 18:30
  • This is where you get initial headers and your cookie from (parsed **result**): [Cookie](https://i.ibb.co/fqhf0R8/cookie.png). – Narcis Postolache Oct 12 '21 at 18:51
  • I'm working with NetBeans, so I'm afraid there isn't a way to verify the cookie setting – Conta Oct 13 '21 at 07:04