0

I want to scrape all tables from a site. Reaching the tables requires browser automation, so please take that into account. After some research, my attempt is the following:

from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = Firefox(executable_path='/Users/.../PycharmProjects/Sportwinner/geckodriver')
driver.get("https://bskv.sportwinner.de/")
element = driver.find_element(By.ID, "id-button-einstellungen")
actions = ActionChains(driver)
actions.move_to_element(element).perform()
driver.find_element(By.ID, "id-button-einstellungen").click()
element = driver.find_element(By.CSS_SELECTOR, "body")
actions = ActionChains(driver)
actions.move_to_element(element).perform()
driver.find_element(By.ID, "id-klub-name").click()
driver.find_element(By.ID, "id-klub-name").send_keys("Dreieck Schweinfurt")
driver.find_element(By.ID, "id-button-einstellungen-option-ok").click()
time.sleep(1)
driver.find_element(By.ID, "id-dropdown-liga").click()
driver.find_element(By.LINK_TEXT, "Letzte Spielwoche").click()

tableContent = driver.find_elements_by_css_selector("id-table-spiel tr")
for row in tableContent:
    print(row.text)

Since I only heard about Selenium a couple of hours ago, I am a complete beginner. I have no clue whether this works, because I don't see any output. Can anybody help me with my attempt (I guess it's not correct) and tell me how I can see the result? I am running the script from PyCharm.

vinceling

2 Answers

1

The script ran so fast that the table had not loaded yet when the rows were read, so nothing was extracted.

You need to apply an implicit wait or an explicit wait so that the table data is present before you extract it.

# Imports Required
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

...
driver = webdriver.Chrome(executable_path="chromedriver.exe") # Have tried in Chrome
driver.implicitly_wait(20)

# Or apply Explicit wait like below.
wait = WebDriverWait(driver,30)
wait.until(EC.presence_of_element_located((By.XPATH,"//table[@id='id-table-spiel']//tbody/tr")))

tableContent = driver.find_elements(By.XPATH, "//table[@id='id-table-spiel']//tbody/tr//div")
for row in tableContent:
    print(row.get_attribute("innerText")) # row.text works too.

To read the expanded rows as well, click each "+" link and collect the detail rows:

tableContent = driver.find_elements(By.XPATH, "//table[@id='id-table-spiel']//tbody/tr//a")

for i in range(len(tableContent)):
    tableContent[i].click() # Clicks on the "+" icon
    # Find the rows inside the i-th expanded detail view.
    innerrows = driver.find_elements(By.XPATH, "//tr[@class='detail-view'][{}]//tr".format(i+1))
    for inrow in innerrows:
        elements = inrow.find_elements(By.XPATH, ".//div") # Data are in "div" tags
        data = [] # Collect each row in a list
        for j in elements:
            data.append(j.text)
        print(data)

Sample output:

['', '', '1', '', '2', '', '3', '', '4', '', 'Kegel', '', 'SP', '', 'MP', '', '', '', 'MP', '', 'SP', '', 'Kegel', '', '4', '', '3', '', '2', '', '1', '', '', '']
['Krug, Tobias', '141', '141', '136', '86', '141', '152', '124', '131', 'Brandl, Gerald']
['Keller, Ralf', '148', '135', '139', '130', '140', '111', '154', '145', 'Haschke, Jens']
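
To keep the rows for later use, for example in a CSV file, one option is to create a list before the outer loop and append each `data` list to it instead of only printing it. The following is a minimal sketch of that idea, not part of the tested scraper: `save_rows`, `all_rows`, and the file name `spielwoche.csv` are hypothetical names, the two rows are copied from the sample output above as stand-ins for scraped data, and skipping rows without any text is an assumption about what you want in the file.

```python
import csv


def save_rows(rows, path):
    """Write a list of row lists to a CSV file, skipping rows without any text."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for row in rows:
            # Keep a row only if at least one cell contains non-blank text.
            if any(cell.strip() for cell in row):
                writer.writerow(row)


# Stand-in for the scraped rows. In the scraping loop, create `all_rows = []`
# before the outer `for` loop and call `all_rows.append(data)` next to print(data).
all_rows = [
    ['Krug, Tobias', '141', '141', '136', '86', '141', '152', '124', '131', 'Brandl, Gerald'],
    ['Keller, Ralf', '148', '135', '139', '130', '140', '111', '154', '145', 'Haschke, Jens'],
]

save_rows(all_rows, "spielwoche.csv")  # hypothetical output file name
```

The `csv` module quotes cells that contain a comma, so names such as "Krug, Tobias" stay in a single column.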
pmadhu
  • Thanks for your answer. I thought "time.sleep(1)" would solve this but as I said, I'm a beginner. Problem with your solution is, I need the expanded table content and not just the inner Text(?). Do you know how that would be done? – vinceling Sep 27 '21 at 14:24
  • @vinceling - Updated the answer for the same. – pmadhu Sep 27 '21 at 14:53
  • Did it work for you by any chance? The output is the same as before for me. The automation on clicking the + doesn't seem to work I guess - at least the table is not expanding – vinceling Sep 27 '21 at 15:01
  • @vinceling - No, the code clicked on "+" and extracted all the details. See above sample output. – pmadhu Sep 27 '21 at 15:13
  • Yeah that was my bad, I replaced the former tableContent part instead of adding the new one afterwards. Thanks my dude, you helped me a lot! – vinceling Sep 27 '21 at 15:28
  • I have a further question. Is there any possibility to redirect the output of your code to a file such as csv? @pmadhu – vinceling Sep 29 '21 at 09:14
  • @vinceling - Yes we can, and this has already been addressed. [Link1](https://stackoverflow.com/q/36755214/16452840), [Link2](https://stackoverflow.com/q/46026399/16452840), [Link3](https://stackoverflow.com/q/50778481/16452840) – pmadhu Sep 29 '21 at 10:11
  • Yeah, I already did some research on this topic. My problem with this is that "data" is only visible in the print(data) line(?). Respectively, after the for loop data is empty with [] if I redirect the output to a file. – vinceling Sep 29 '21 at 10:35
  • @vinceling - That's because `data` list is being created within the `for` loop. Create a list outside the loop and append `data` (It will become list of list) to that. Then when you print that list, you will have the list of data. – pmadhu Sep 29 '21 at 10:40
1

Once you have reached the desired page by selecting "Letzte Spielwoche" in the drop-down menu "Eine Liga auswählen", the tables are visible.

You can then use this code:

wait = WebDriverWait(driver, 30)
table = wait.until(EC.visibility_of_element_located((By.ID, "id-table-spiel")))
size_of_table = driver.find_elements(By.XPATH, "//table[@id='id-table-spiel']//descendant::tr")
# XPath positions are 1-based, so count from 1.
for j in range(1, len(size_of_table) + 1):
    # find_element (singular) returns one row; find_elements would return a list.
    element = driver.find_element(By.XPATH, f"(//table[@id='id-table-spiel']//descendant::tr)[{j}]")
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    print(element.get_attribute('innerText'))

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
cruisepandey
  • I appreciate your help but for me it's not working so far. The error states: driver.execute_script("arguments[0].scrollIntoView(true);", element) – vinceling Sep 27 '21 at 14:39