Taking a certain part of the page with selenium

Question

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver import ActionChains
import selenium.webdriver.common.keys
from bs4 import BeautifulSoup
import requests
import time


driver = webdriver.Chrome(executable_path="../drivers/chromedriver.exe")
driver.get("https://www.Here the address of the relevant website ends with aspx.com.aspx")

element=driver.find_element_by_id("ctl00_ContentPlaceHolder1_LB_SEKTOR")
drp=Select(element)

drp.select_by_index(0)

element1=driver.find_element_by_id("ctl00_ContentPlaceHolder1_Lb_Oran")
drp=Select(element1)

drp.select_by_index(41)

element2=driver.find_element_by_id("ctl00_ContentPlaceHolder1_LB_DONEM")
drp=Select(element2)

drp.select_by_index(1)

driver.find_element_by_id("ctl00_ContentPlaceHolder1_ImageButton1").click()
time.sleep(1)
print(driver.page_source)

The last part of this code prints the page source, so I can get the full source of the page. From that source I only need the table below, which is written in JavaScript. How can I extract this section and output it as a CSV table?

Note: I thought of sending CTRL+U to Chrome from Selenium, but that did not work. The web page is interactive, and some interactions are required to get the data I want. That's why I used Selenium.


<span id="ctl00_ContentPlaceHolder1_Label2" class="Georgia_10pt_Red"></span>
    <div id="ctl00_ContentPlaceHolder1_Divtable">
        <div id="table">
            <layer name="table" top="0"><IMG height="2" src="../images/spacer.gif" width="2"><br>
                        <font face="arial" color="#000000" size="2"><b>Tablo Yükleniyor. Lütfen Bekleyiniz...</b></font><br>
                    </layer>
        </div>
    </div>

<script language=JavaScript> var theHlp='/yardim/matris.asp';var theTitle = 'Piya Deg';var theCaption='OtomoT (TL)';var lastmod = '';var h='<a class=hislink href=../Hisse/Hisealiz.aspx?HNO=';var e='<a class=hislink href=../endeks/endeksAnaliz.aspx?HNO=';var d='<center><font face=symbol size=1 color=#FF0000><b>ß</b></font></center>';var u='<center><font face=symbol size=1 color=#008000><b>İ</b></font></center>';var n='<center><font face=symbol size=1 color=#00A000><b>=</b></font></center>';var fr='<font color=#FF0000>';var fg='<font color=#008000>';var theFooter=new Array();var theCols = new Array();theCols[0] = new Array('cksart',4,50);theCols[1] = new Array('2018.12',1,60);theCols[2] = new Array('2019.03',1,60);theCols[3] = new Array('2019.06',1,60);theCols[4] = new Array('2019.09',1,60);theCols[5] = new Array('2019.12',1,60);theCols[6] = new Array('2020.03',1,60);var theRows = new Array();theRows[0] = new Array ('<b>'+h+'42>AHRT</B></a>','519,120,000.00','590,520,000.00','597,240,000.00','789,600,000.00','1,022,280,000.00','710,640,000.00');
theRows[1] = new Array ('<b>'+h+'427>SEEL</B></a>','954,800,000.00','983,400,000.00','1,201,200,000.00','1,716,000,000.00','2,094,400,000.00','-');
theRows[2] = new Array ('<b>'+h+'140>TOFO</B></a>','17,545,500,000.00','17,117,389,800.00','21,931,875,000.00','20,844,054,000.00','24,861,973,500.00','17,292,844,800.00');
theRows[3] = new Array ('<b>'+h+'183>MSO</B></a>','768,000,000.00','900,000,000.00','732,000,000.00','696,000,000.00','1,422,000,000.00','1,134,000,000.00');
theRows[4] = new Array ('<b>'+h+'237>KURT</B></a>','2,118,000,000.00','2,517,600,000.00','2,736,000,000.00','3,240,000,000.00','3,816,000,000.00','2,488,800,000.00');
theRows[5] = new Array ('<b>'+h+'668>GRTY</B></a>','517,500,000.00','500,250,000.00','445,050,000.00','552,000,000.00','737,150,000.00','-');
theRows[6] = new Array ('<b>'+h+'291>MEME</B></a>','8,450,000,000.00','8,555,000,000.00','9,650,000,000.00','10,140,000,000.00','13,430,000,000.00','8,225,000,000.00');
theRows[7] = new Array ('<b>'+h+'292>AMMI</B></a>','-','-','-','-','-','-');
theRows[8] = new Array ('<b>'+h+'426>GOTE</B></a>','1,862,578,100.00','1,638,428,300.00','1,689,662,540.00','2,307,675,560.00','2,956,642,600.00','2,121,951,440.00');
var thetable=new mytable();thetable.tableWidth=650;thetable.shownum=false;thetable.controlaccess=true;thetable.visCols=new Array(true,true,true,true,true);thetable.initsort=new Array(0,-1);thetable.inittable();thetable.refreshTable();</script></form>
                                    <div style="clear: both; margin-top: 10px;">

<div style="background-color: Red; border: 2px solid Green; display: none">
    TABLO-ALT</div>
<div id="Bannerctl00_SiteBannerControl2">
    <div id="_bannerctl00_SiteBannerControl2">
        <div id="Sayfabannerctl00_SiteBannerControl2" class="banner_Codex">
        </div>

I've updated my answer with a very basic BeautifulSoup implementation I've just googled. Please note that I haven't tested it. — DGoiko, May 30 '20 at 15:07
Hey my friend, I don't know enough to understand what you are saying. Please explain it step by step. :( My knowledge of this topic is insufficient. — Hermes2, May 30 '20 at 18:56
I think there is a solution in the link you provided. Thank you mate, I will try this method :) https://stackoverflow.com/questions/13960326/how-can-i-parse-a-website-using-selenium-and-beautifulsoup-in-python — Hermes2, May 30 '20 at 19:23
html=driver.page_source soup=BeautifulSoup(html) for tag in soup.find_all('title'): print(tag.text) This code worked :)) So how can I get the JavaScript part above? — Hermes2, May 30 '20 at 19:29
theRows[1] = new Array (''+h+'427>SEEL','954,800,000.00','983,400,000.00','1,201,200,000.00','1,716,000,000.00','2,094,400,000.00','-'); theRows[2] = new Array (''+h+'140>TOFO','17,545,500,000.00','17,117,389,800.00','21,931,875,000.00','20,844,054,000.00','24,861,973,500.00','17,292,844,800.00'); — Hermes2, May 30 '20 at 19:32

score 0 · Answer 1 · edited Feb 18 '21 at 20:03

Please note that I've only used Selenium in Java, so I'll give you the most generic and language-agnostic answer I can. Keep in mind that Python Selenium MAY provide a method to do this directly.

Steps:

1. Perform all the Selenium interactions, so the WebDriver actually has a VALID version of the page with all your contents loaded.
2. Extract the current contents of the whole page from Selenium.
3. Load them with an HTML parsing library. I use JSoup in Java; I don't know if there's a Python version. From now on, Selenium does not matter.
4. Use CSS selectors on your parser object to get the section you want.
5. Convert that section to a String to print it.
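The steps above can be sketched in Python roughly as follows. This is a minimal, untested-against-the-real-site sketch that assumes BeautifulSoup (`bs4`) is installed; `extract_section` and `sample_html` are hypothetical names introduced here for illustration. With Selenium, the `page_html` argument would be `driver.page_source` taken after all interactions have finished.

```python
from bs4 import BeautifulSoup


def extract_section(page_html, css_selector):
    """Return the first element matching css_selector as a string, or None.

    Hypothetical helper: page_html is the full page source (for example
    driver.page_source), css_selector picks the section to keep.
    """
    soup = BeautifulSoup(page_html, "html.parser")  # step 3: parse the source
    element = soup.select_one(css_selector)         # step 4: CSS selector
    return str(element) if element is not None else None  # step 5: to string


# Stand-in for driver.page_source, shaped like the page in the question
sample_html = (
    '<html><body>'
    '<div id="ctl00_ContentPlaceHolder1_Divtable"><div id="table">Loading</div></div>'
    '</body></html>'
)

section = extract_section(sample_html, "div#ctl00_ContentPlaceHolder1_Divtable")
print(section)
```

Keeping the parsing in a function that takes a plain string means it can be tried on saved page source without starting a browser.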

If performance is a requirement, this approach may be a bit too expensive, as the contents are parsed twice: Selenium does it first, and your HTML parser does it again on the String extracted from Selenium.

ALTERNATIVE: If your "target page" uses AJAX, you may be able to call directly the REST API that the JavaScript accesses to fetch the data. I tend to follow this approach when doing serious web scraping, but sometimes this is not an option, so I use the above approach.

EDIT

Some more details based on questions in the comments:

You can use BeautifulSoup as an HTML parsing library.

To load a page in BeautifulSoup use:

html = "<html><head></head><body><div id=\"events-horizontal\">Hello world</div></body></html>"
soup = BeautifulSoup(html, "html.parser")

Then look at this answer to see how to extract the specific contents from your soup:

your_div = soup.select_one('div#events-horizontal')

That would give you the first div whose id is events-horizontal:

<div id="events-horizontal">Hello world</div>

BeautifulSoup code based on:

How to use CSS selectors to retrieve specific links lying in some class using BeautifulSoup?
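For the table in the question, the values are not in HTML tags but inside a `<script>` block, as `theCols[n] = new Array(...)` and `theRows[n] = new Array(...)` statements, so a CSS selector can reach the script but not the individual cells. One possible way around this, swapping in regular expressions rather than a parser, is sketched below. It assumes the page source keeps exactly the shape shown in the question; `parse_table`, `table_to_csv` and the shortened `page_source` sample are hypothetical names and data for illustration, and this has not been run against the real site.

```python
import csv
import io
import re

# Shortened stand-in for driver.page_source, copied from the question's shape
page_source = """
var theCols = new Array();theCols[0] = new Array('cksart',4,50);theCols[1] = new Array('2018.12',1,60);theCols[2] = new Array('2019.03',1,60);var theRows = new Array();theRows[0] = new Array ('<b>'+h+'42>AHRT</B></a>','519,120,000.00','590,520,000.00');
theRows[1] = new Array ('<b>'+h+'427>SEEL</B></a>','954,800,000.00','-');
"""


def parse_table(source):
    """Return (header, rows) extracted from the theCols/theRows statements."""
    # Column titles: the first quoted value of every theCols[n] entry
    header = re.findall(r"theCols\[\d+\]\s*=\s*new Array\s*\('([^']*)'", source)
    rows = []
    # Everything between "new Array (" and ");" for every theRows[n] entry
    for args in re.findall(r"theRows\[\d+\]\s*=\s*new Array\s*\((.*?)\);", source):
        cells = re.findall(r"'([^']*)'", args)
        # The ticker is the text just before </B>, e.g. "42>AHRT</B>"
        name_match = re.search(r">([^<>]+)</B>", args)
        name = name_match.group(1) if name_match else ""
        # cells[0] and cells[1] are the two pieces of the link markup;
        # the rest are numbers with thousands separators, or "-"
        values = [cell.replace(",", "") for cell in cells[2:]]
        rows.append([name] + values)
    return header, rows


def table_to_csv(source):
    """Render the parsed table as CSV text."""
    header, rows = parse_table(source)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header, rows = parse_table(page_source)
print(table_to_csv(page_source))
```

The thousands separators are stripped here so the numbers load cleanly in a spreadsheet; drop the `replace` call to keep them as shown on the page. To write a file instead of a string, the same `csv.writer` calls can target `open("table.csv", "w", newline="")`.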

I understand the logic of what you said, but I'm new to coding and don't have enough experience or knowledge. I'm trying, though. How do I load a page's source code into a parser library? I think I can only make progress if you help me with this part: "Extract from selenium the current contents of the whole page. Load it with a HTML parsing library." — Hermes2, May 30 '20 at 14:50
@Hermes2 I found a Stack Overflow answer that uses BeautifulSoup to load an object from a String: https://stackoverflow.com/questions/37997702/how-to-convert-a-string-into-a-beautifulsoup-object Tell me if you need more details. — DGoiko, May 30 '20 at 14:59
@Hermes and use this answer https://stackoverflow.com/questions/24801548/how-to-use-css-selectors-to-retrieve-specific-links-lying-in-some-class-using-be to retrieve the parts you need with BeautifulSoup — DGoiko, May 30 '20 at 15:03
Selenium gets the data by selecting drop-down lists and other fields on the web page, so I cannot use the website link directly. The Selenium commands go to the page, select the required fields from the lists, and fetch the information by pressing a button. I can get the entire page source. No problem so far. I need to get a certain portion of the page source before printing. How can I parse it? I cannot access this interactive information if I use the direct website link in bs4. — Hermes2, May 30 '20 at 19:09
