Alternative to pandas.read_html where the URL is not unique?

Question

I want to read the HTML table in the "ERGEBNIS" section of a page with Python 3.7. The problem is that the results for each combination of drop-down values are only shown after clicking submit. Submitting does not change the URL, however, so I have no idea how to access the results table after updating the drop-down values.

Here is what I've done so far:


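import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import Select

# Start a browser session (Chrome is only an example; any WebDriver works)
browser = webdriver.Chrome()
browser.get('https://daten.ktbl.de/feldarbeit/entry.html')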

#Fix values of the drop down fields:

fertilizer = Select(browser.find_element_by_name("hgId"))
fertilizer.select_by_value("2") 

fertilizer = Select(browser.find_element_by_name("gId"))
fertilizer.select_by_value("193") 

fertilizer = Select(browser.find_element_by_name("avId"))
fertilizer.select_by_value("383")  

fertilizer = Select(browser.find_element_by_name("hofID"))
fertilizer.select_by_value("2") 

fertilizer = Select(browser.find_element_by_name("flaecheID"))
fertilizer.select_by_value("5") 

fertilizer = Select(browser.find_element_by_name("mengeID"))
fertilizer.select_by_value("60") 


# Submit changes to show the results of this particular combination of values

button = browser.find_element_by_xpath("//*[@type='submit']")
button.click()

Submitting the changes does not change the URL, however, so I don't know how to access the results table (here "ERGEBNIS").

Otherwise my approach would have been to use pd.read_html somehow like this:

...

url = browser.current_url
time.sleep(1)
df_list = pd.read_html(url, match = "Dieselbedarf")

But since the URL isn't unique for each result, this doesn't make sense. The same issue would arise with BeautifulSoup, or at least I don't understand how to do it without a unique URL.

Any ideas how I can access the html table otherwise?

EDIT: The answer by @bink1time solved my problem. The table can be read from the raw HTML string instead of the URL:

html_source = browser.page_source
df_list = pd.read_html(html_source, match = "Dieselbedarf")

Do you need to scrape the table data inside `ERGEBNIS` based on the `AUSWAHL` selection? – Zaraki Kenpachi, Feb 26 '20 at 14:04
Thank you for taking the time to comment; bink1time has already answered my question perfectly. – Jana Keller, Feb 26 '20 at 14:41

score 0 · Accepted Answer · answered Feb 26 '20 at 14:04

You can probably just get the html source:

html_source = browser.page_source

According to the docs (https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.read_html.html), read_html takes a URL, a file-like object, or a raw string containing HTML. In this case you pass the raw string.

html_source = browser.page_source
df_list = pd.read_html(html_source, match = "Dieselbedarf")
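
One caveat for newer pandas versions: as far as I know, passing a literal HTML string is deprecated since pandas 2.1 and triggers a FutureWarning, and wrapping the string in `io.StringIO` is the suggested replacement. Below is a small sketch of that variant. The HTML is a hand-written stand-in for `browser.page_source`; the column name "Arbeitsvorgang" and the cell values are placeholders, not real data from the site.

```python
from io import StringIO

import pandas as pd

# Stand-in for browser.page_source; the table contents are placeholders.
html_source = """
<html><body>
<table>
  <thead><tr><th>Unrelated</th></tr></thead>
  <tbody><tr><td>1</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Arbeitsvorgang</th><th>Dieselbedarf</th></tr></thead>
  <tbody><tr><td>placeholder</td><td>1.5</td></tr></tbody>
</table>
</body></html>
"""

# StringIO turns the string into a file-like object, which read_html accepts.
# match keeps only tables whose text contains "Dieselbedarf".
df_list = pd.read_html(StringIO(html_source), match="Dieselbedarf")
result = df_list[0]
```

With a live session, `html_source` would simply be `browser.page_source`. `read_html` also needs an HTML parser such as lxml installed.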

Just a note: you don't need the time.sleep call.

answered by bink1time

Awesome, thank you very much! I had considered using the HTML source, but I assumed it would just give me the string of the unedited start page, without the changes to the drop-downs. Next time I'll try instead of assuming... – Jana Keller, Feb 26 '20 at 14:26
