I know that you can scrape a web page by first defining a URL, opening it, and reading the response.

For instance, in https://realpython.com/python-web-scraping-practical-introduction/, one of the first steps is to set the URL of the website you want to scrape.

However, I am looking for a way to get information from a window that is already open. Is there a way to screen scrape or web scrape from an open Chrome tab?

vitaliis
Tony Qu
  • I don't think that's how web scraping works. Why can't you open the tab with Selenium, for example? – Celius Stingher May 19 '21 at 23:52
  • @CeliusStingher I want it to work with different URLs, and I also don't know how to get the URL of an open Chrome tab through code. – Tony Qu May 19 '21 at 23:58
  • Check Selenium then, it might be helpful. – Celius Stingher May 20 '21 at 00:06
  • I don't think you can get this, even with Selenium. To me it is rather a job for a browser extension (written in JavaScript). An extension runs in the already opened browser and may have access to all open tabs; it could save the URL to a file or send it to a web page (e.g. one created with Flask) that receives the URL and uses it. – furas May 20 '21 at 10:09
  • Firefox and Chrome use SQLite databases to keep some information (bookmarks, history), and they probably also use them to keep information about open tabs (to restore them when you start the browser again). You could check all databases in the Firefox/Chrome profile folder and search for a table whose values match the tabs you have open. – furas May 20 '21 at 10:12
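
On the point raised in the comments about reading the URL of a tab that is already open: one possible route, sketched here under assumptions rather than taken from the thread, is Chrome's remote debugging HTTP endpoint. The sketch assumes Chrome was started with `--remote-debugging-port=9222`; the port number and the helper names `page_tabs` and `list_open_tabs` are illustrative choices, not part of the original discussion. It uses only the standard library and returns the title and URL of each regular tab.

```python
import json
from urllib.request import urlopen

# Assumption: Chrome was launched with --remote-debugging-port=9222,
# which exposes a JSON list of debugging targets at this address.
DEBUG_ENDPOINT = "http://127.0.0.1:9222/json"


def page_tabs(targets):
    """Keep only regular tabs (type "page") and return (title, url) pairs."""
    return [
        (target.get("title", ""), target.get("url", ""))
        for target in targets
        if target.get("type") == "page"
    ]


def list_open_tabs(endpoint=DEBUG_ENDPOINT):
    """Ask the running Chrome for its targets and return the open tabs."""
    with urlopen(endpoint, timeout=5) as response:
        return page_tabs(json.load(response))


# Usage (needs a running Chrome with the debugging port open):
#     for title, url in list_open_tabs():
#         print(title, url)
```

A URL obtained this way can then be passed to any ordinary scraper. Selenium's Chrome options also have a `debugger_address` setting that is commonly used to attach a driver to such a session, but this only works if the browser was started with the debugging port enabled.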

1 Answer

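With Selenium, the simplest case is:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path='path to chromedriver'))
driver.get("https://realpython.com/python-web-scraping-practical-introduction/")
# Print the URL of the page loaded in the controlled tab
print(driver.current_url)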

The code also depends on what you want to get from the page. Suppose you want to get the table of contents.

For this, you need to wait for it to appear and then read the text of its elements:

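from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path='/snap/bin/chromium.chromedriver'))
driver.get("https://realpython.com/python-web-scraping-practical-introduction/")
print(driver.current_url)

# Wait up to 5 seconds for the table-of-contents entries to become visible
WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".article-body .toc li")))
# find_elements(By.CSS_SELECTOR, ...) replaces the removed find_elements_by_css_selector
toc = driver.find_elements(By.CSS_SELECTOR, ".article-body .toc li")
for el in toc:
    print(el.text)

# quit() closes every window and ends the chromedriver session
driver.quit()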

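The output will be as follows (sub-section titles appear twice because the selector matches the nested li elements as well as their parents, and a parent's text includes the text of its children):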

https://realpython.com/python-web-scraping-practical-introduction/
Scrape and Parse Text From Websites
Your First Web Scraper
Extract Text From HTML With String Methods
A Primer on Regular Expressions
Extract Text From HTML With Regular Expressions
Check Your Understanding
Your First Web Scraper
Extract Text From HTML With String Methods
A Primer on Regular Expressions
Extract Text From HTML With Regular Expressions
Check Your Understanding
Use an HTML Parser for Web Scraping in Python
Install Beautiful Soup
Create a BeautifulSoup Object
Use a BeautifulSoup Object
Check Your Understanding
Install Beautiful Soup
Create a BeautifulSoup Object
Use a BeautifulSoup Object
Check Your Understanding
Interact With HTML Forms
Install MechanicalSoup
Create a Browser Object
Submit a Form With MechanicalSoup
Check Your Understanding
Install MechanicalSoup
Create a Browser Object
Submit a Form With MechanicalSoup
Check Your Understanding
Interact With Websites in Real Time
Conclusion
Additional Resources
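
To make the waiting step more concrete, here is a minimal sketch of the polling idea behind an explicit wait: call a condition repeatedly until it returns something truthy or a timeout expires. This is an illustration only, not Selenium's implementation; the name `wait_until` and its parameters are hypothetical, and it raises Python's built-in `TimeoutError` rather than Selenium's own exception. The 0.5 second default mirrors the default poll frequency of `WebDriverWait`.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return whatever the condition produced, e.g. a list of elements
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

In the answer's code, `WebDriverWait(driver, 5).until(...)` plays this role, with the expected condition checking whether the table-of-contents entries are visible.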
vitaliis