I'm trying to get the HTML of an Instagram profile page, but when I use the requests library I get the HTML of the loading screen. I want the HTML of the page after it has finished loading. This is my code:

from bs4 import BeautifulSoup
import requests

source = requests.get("https://www.instagram.com/ethieen/").text
soup = BeautifulSoup(source,"lxml")
body = soup.find("body")

print(body.prettify())
martineau
  • I'm guessing the page content is loaded via JavaScript, so you'll need something that understands JavaScript, such as Selenium. – Jun 08 '20 at 18:12
  • Give this a try: https://stackoverflow.com/a/27652558/4985099 – sushanth Jun 08 '20 at 18:24

2 Answers


The site probably uses JavaScript to build the page, so you won't be able to access the rendered content with BeautifulSoup: it only parses the HTML it is given and does not execute JavaScript.

To test this, you can disable JavaScript in your browser and then navigate to that page. Whatever still loads is what you can access via BeautifulSoup.
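The same check can be approximated in code, without touching browser settings. The following is only a minimal sketch: it uses the standard library's `html.parser` to collect the text inside `<body>` that is not inside `<script>` or `<style>`, which is roughly what a client that does not run JavaScript gets to see. The names `VisibleTextParser` and `visible_text` and both sample documents are made up for illustration; in practice you would pass in `requests.get(url).text`.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects body text that is present without running any JavaScript."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._in_body = False
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits in <body> and outside script/style.
        if self._in_body and self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a JavaScript-free client would find in the page body."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up example of a JavaScript "shell" page: the body has no text of its own.
SAMPLE_SHELL = """
<html><head><title>Profile</title><style>body{margin:0}</style></head>
<body><div id="root"></div>
<script>window.__data = {"user": "example_user"};</script></body></html>
"""

# Made-up example of a page whose content is already in the HTML.
SAMPLE_STATIC = "<html><body><h1>example_user</h1><p>Hello</p></body></html>"

print(repr(visible_text(SAMPLE_SHELL)))
print(repr(visible_text(SAMPLE_STATIC)))
```

If the result for a real page is empty or contains only a loading message, the content is most likely filled in by JavaScript after the initial response.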

capek
  • Oh, and do you know of a library that may help with this? – Jayex Designs Jun 08 '20 at 18:26
  • Selenium could do it. – capek Jun 08 '20 at 18:42
  • But do you know if Selenium can work without opening the actual page in a browser? I may have to use it, but I also need the best possible performance, and a browser may not perform that well. – Jayex Designs Jun 08 '20 at 18:49
  • You can start Selenium without showing the browser, but it will still run one in the background, so you will not get "perfect" performance. If you need to download many pages, you could try multithreading (I'm not sure whether Selenium works with it, but I don't see why it wouldn't). – capek Jun 08 '20 at 19:04
  • Alright, thanks. It's not that I need to load multiple pages; the thing is that I use my old laptop as the server for this, so it is slow and can't handle many requests. That's why I need the best performance I can get. If I can't find another way, I'll just try this one. Thanks again – Jayex Designs Jun 08 '20 at 19:13
  • Oh, I thought you meant speed. The only way to find out whether it runs on your laptop is to try it. If it can run Chrome with one tab open, it should be able to run your script. – capek Jun 08 '20 at 19:22
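
The headless Selenium approach discussed in the comments above might look roughly like the sketch below. It assumes Selenium 4 with Chrome and a matching driver available on the machine. The helper names `headless_chrome_args` and `fetch_rendered_html`, the `ready_selector` default of `"main"`, and the timeout are illustrative assumptions rather than anything Instagram guarantees, and the site may still redirect to a login page.

```python
def headless_chrome_args():
    """Command-line switches for running Chrome without a visible window."""
    return ["--headless", "--disable-gpu", "--window-size=1280,800"]


def fetch_rendered_html(url, ready_selector="main", timeout=15):
    """Load url in headless Chrome and return the HTML after scripts have run.

    ready_selector is a hypothetical CSS selector for an element that only
    exists once the page has rendered; adjust it for the page you scrape.
    """
    # Imported here so the helpers can be loaded where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in headless_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the chosen element is present, or raise after the timeout.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ready_selector))
        )
        return driver.page_source
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()
```

The returned string can then replace `source` in the question's code, for example `BeautifulSoup(fetch_rendered_html("https://www.instagram.com/ethieen/"), "lxml")`. A full browser still runs in the background, so memory use on an old laptop will be close to that of Chrome with one tab open.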

This page is loaded by JavaScript (AJAX). You can get the rendered page with Puppeteer:

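const puppeteer = require('puppeteer');

(async () => {
  // Launch headless Chromium and open a new tab.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // 'networkidle2' resolves once at most 2 network connections remain for 500 ms.
  await page.goto('https://www.instagram.com/ethieen', {waitUntil: 'networkidle2'});
  // Serialize the DOM after the page's scripts have run.
  const html = await page.content();
  console.log(html);

  await browser.close();
})();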
Ahmed ElMetwally