
EDIT:
So I found a way to do it by clicking on the country elements, see my answer.
I still have one question that would make this better:
When I call scrollIntoView(true) on a country <li>, it ends up underneath another element (<div class="sportList_subtitle">Desportos</div>) and is not clickable.

Is there a JavaScript or Selenium function like "scrollIntoClickable"?
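Selenium has no built-in "scrollIntoClickable", but the effect can be approximated. A minimal sketch, assuming the covering <div class="sportList_subtitle"> behaves like a sticky header at the top of the scrolling menu: scroll the element to the middle of the viewport with `scrollIntoView({block: 'center'})` instead of aligning it to the top, then ask the browser with `document.elementFromPoint` whether anything else sits over the element's center. `scroll_into_clickable` and `SCROLL_TO_CLICKABLE_JS` are hypothetical names, not part of Selenium.

```python
# JavaScript executed in the page (hypothetical helper): center the element in its
# scroll container(s), then check which element is topmost at the element's center.
SCROLL_TO_CLICKABLE_JS = """
const el = arguments[0];
el.scrollIntoView({block: 'center', inline: 'nearest'});
const rect = el.getBoundingClientRect();
const topmost = document.elementFromPoint(rect.left + rect.width / 2,
                                          rect.top + rect.height / 2);
return topmost !== null && (topmost === el || el.contains(topmost));
"""


def scroll_into_clickable(driver, element):
    """Center `element` in view; return True if no other element covers its center.

    Hypothetical helper: it only checks the center point, which is where a
    WebDriver click lands.
    """
    return bool(driver.execute_script(SCROLL_TO_CLICKABLE_JS, element))
```

Usage would be `if scroll_into_clickable(driver, country): country.click()`. If it returns False the element is still covered, so retrying after a short wait, or clicking a child such as the `.sportList_itemWrapper` div, are options to try.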

ORIGINAL:
I'm trying to scrape information from the Betclic website with Python, BeautifulSoup and Selenium.
The URL for each game has the structure: "domain"/"sports_url"/"competition_url"/"match_url"
Example: https://www.betclic.pt/futebol-s1/liga-dos-campeoes-c8/rennes-chelsea-m2695669
You can try it in your own language: the URL strings are translated, but the structure and IDs are the same. The only thing left is to collect all the different "competition_url" values.
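Because the structure is fixed, a URL can be split into its parts mechanically. A minimal standard-library sketch, assuming (from the single example URL) that each slug ends in a one-letter type marker plus a numeric ID: `s` for sport, `c` for competition, `m` for match. `split_betclic_url` is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit

# Assumption inferred from one example URL: each slug ends in "-<letter><digits>",
# where s = sport, c = competition, m = match.
SLUG_RE = re.compile(r"^(?P<name>.+)-(?P<kind>[scm])(?P<id>\d+)$")
KINDS = {"s": "sport", "c": "competition", "m": "match"}


def split_betclic_url(url):
    """Return {kind: (slug, numeric_id)} for each recognised path segment."""
    parts = {}
    for segment in urlsplit(url).path.strip("/").split("/"):
        match = SLUG_RE.match(segment)
        if match:
            parts[KINDS[match["kind"]]] = (segment, int(match["id"]))
    return parts


if __name__ == "__main__":
    print(split_betclic_url(
        "https://www.betclic.pt/futebol-s1/liga-dos-campeoes-c8/rennes-chelsea-m2695669"
    ))
```

Segments without a recognised suffix are simply skipped, so the same helper also works on sport-only or competition URLs.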

So my question is: starting from the "sports_url" (https://www.betclic.pt/futebol-s1), how can I get all the "competition_url" values under it?
The problem is the "hidden" URLs under each country's name in the left panel. They only appear after you click the arrow next to the country's name, like a drop-down list. The click adds the class "is-active" to that country's <li> and appends a <ul> at the end of the same <li>. That added <ul> holds the list of URLs I'm trying to get.
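Since the click toggles `is-active` and the <ul> is inserted by the page's script, one way to read a single country's links is to check the class first (assumption: a second click on an open country collapses it again, as a drop-down would), then poll inside that <li> until its sub-list shows up. A minimal sketch, assuming `country` is a Selenium WebElement for one `li[id^='rziat']` entry; `expand_and_collect` and `is_expanded` are hypothetical helpers, `"css selector"` is the string value of `By.CSS_SELECTOR`, and the hand-rolled polling loop stands in for Selenium's `WebDriverWait` only to keep the sketch dependent on the standard library alone.

```python
import time

# Selector taken from the markup shown in this question; the site may change it.
SUBLIST_LINKS = "ul.sportList_listLv2 a[href]"


def is_expanded(country):
    """A country <li> is open when the "is-active" class is present."""
    return "is-active" in (country.get_attribute("class") or "").split()


def expand_and_collect(country, timeout=5.0, poll=0.2):
    """Open one country (if closed) and return the hrefs inside its own sub-list.

    The search is scoped to `country`, so links from other countries that are
    already expanded are not collected again.
    """
    if not is_expanded(country):
        country.click()
    deadline = time.monotonic() + timeout
    while True:
        links = country.find_elements("css selector", SUBLIST_LINKS)
        if links or time.monotonic() >= deadline:
            return [link.get_attribute("href") for link in links]
        time.sleep(poll)
```

An empty list after the timeout means the sub-list never appeared, for example because the click was intercepted by another element.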

Code before click:

<!---->
<li class="sportList_item has-children ng-star-inserted" routerlinkactive="active-link" id="rziat-DE">
    <div class="sportList_itemWrapper prebootFreeze">
        <div class="sportlist_icon flagsIconBg is-DE"></div>
        <div class="sportlist_name">Alemanha</div>
    </div>
<!---->
</li>

Code after click (reduced for presentation):

<li class="sportList_item has-children ng-star-inserted is-active" routerlinkactive="active-link" id="rziat-DE">
    <div class="sportList_itemWrapper prebootFreeze">
        <div class="sportlist_icon flagsIconBg is-DE"></div>
        <div class="sportlist_name">Alemanha</div>
    </div>

    <!---->
    <ul class="sportList_listLv2 ng-star-inserted">
    <!---->
        <li class="sportList_item ng-star-inserted" routerlinkactive="active-link">
            <a class="sportList_itemWrapper prebootFreeze" id="competition-link-5" href="/futebol-s1/alemanha-bundesliga-c5">
                <div class="sportlist_icon"></div>
                <div class="sportlist_name">Alemanha - Bundesliga</div>
            </a>
        </li>(...)
        </li>(...)
        </li>(...)
        </li>
    </ul>
</li>

In this example, "/futebol-s1/alemanha-bundesliga-c5" is the value I'm looking for.
Is there a way to get all those URLs, or the "hidden" <ul> itself?
Maybe by simulating the click and parsing the HTML again?
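Once a country is expanded in the browser, the page source (for example `driver.page_source`) can be parsed again. A minimal sketch using the standard-library `html.parser`, swapped in for BeautifulSoup so it runs without extra packages (with BeautifulSoup the equivalent lookup would be `soup.select("a[id^='competition-link']")`). It assumes the link ids keep the `competition-link-` prefix shown above and resolves the relative hrefs against https://www.betclic.pt/; `CompetitionLinkParser` and `competition_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.betclic.pt/"  # assumption: site root used to resolve relative hrefs


class CompetitionLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose id starts with "competition-link"."""

    def __init__(self, base_url=BASE_URL):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        element_id = attributes.get("id") or ""
        href = attributes.get("href")
        if element_id.startswith("competition-link") and href:
            self.links.append(urljoin(self.base_url, href))


def competition_links(html):
    """Return absolute competition URLs found in an HTML string."""
    parser = CompetitionLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    snippet = """
    <ul class="sportList_listLv2 ng-star-inserted">
      <!---->
      <li class="sportList_item ng-star-inserted" routerlinkactive="active-link">
        <a class="sportList_itemWrapper prebootFreeze" id="competition-link-5" href="/futebol-s1/alemanha-bundesliga-c5">
          <div class="sportlist_name">Alemanha - Bundesliga</div>
        </a>
      </li>
    </ul>
    """
    print(competition_links(snippet))
```

The parser only sees what is in the HTML at the moment it is read, so the <ul> must already have been inserted by the click.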

Thanks in advance!

Drew
  • It's not really clear what your issue is. Also, what would an expected output be? – baduker Nov 14 '20 at 14:32
  • I want to get all links like: href="/futebol-s1/alemanha-bundesliga-c5". – Drew Nov 14 '20 at 15:37
  • @Drew the tags are within HTML comments. To extract them, see [my previous answer](https://stackoverflow.com/a/64827796/) – MendelG Nov 15 '20 at 04:55
  • @MendelG The only comment on the homepage is "Production". There are almost 2k comment tags, but they just open and close, like: – Drew Nov 15 '20 at 10:30

1 Answer


So I found a way to do it by clicking on the country elements.
I still have one question that would make this better:
When I call scrollIntoView(true) on a country <li>, it ends up underneath another element (<div class="sportList_subtitle">Desportos</div>) and is not clickable.

Is there a JavaScript or Selenium function like "scrollIntoClickable"?

How I'm doing it now:

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 style: the driver binary is passed through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
url = "https://www.betclic.pt/"
driver.get(url)

link_set = set()

# Every sport entry in the left-hand menu
all_sports = driver.find_element(
    By.CSS_SELECTOR,
    ("body > app-desktop > div.layout > div > app-left-menu > div >"
     " app-sports-nav-bar > div > div:nth-child(2) > ul"),
).find_elements(By.TAG_NAME, "li")

# Dismiss the cookie banner; any Selenium error here is reported, not fatal
try:
    cookies = driver.find_element(By.CSS_SELECTOR, "body > app-desktop > bc-gb-cookie-banner > div > div > button")
    cookies.click()
except WebDriverException:
    print("Cookie error or not found...")

for sport in all_sports:
    sport.click()
    # Visible <app-block-ext>: iterate the country list; otherwise read block links directly
    has_container = driver.find_element(By.TAG_NAME, "app-block-ext").size.get("height") > 0
    if not has_container:
        for competition in driver.find_elements(By.CSS_SELECTOR, "a[id*='block-link-']"):
            link_set.add(competition.get_attribute("href"))
            driver.execute_script("arguments[0].scrollIntoView(true);", competition)
    else:
        driver.execute_script("arguments[0].scrollIntoView(true);", driver.find_element(By.TAG_NAME, "app-block-ext"))
        all_countries = driver.find_elements(By.CSS_SELECTOR, "li[id^='rziat']")
        for country in all_countries:
            # Expanding a country appends its <ul> of competition links
            country.click()
            competitions = driver.find_elements(By.CSS_SELECTOR, "a[id^='competition-link']")
            for element in competitions:
                link_set.add(element.get_attribute("href"))
            driver.execute_script("arguments[0].scrollIntoView(true);", country)

for link in sorted(link_set):
    print(link)

driver.quit()
Drew