I am trying to scrape Ark Invest's CIO commentary pages from 2020 Q1 to 2021 Q4 using Selenium WebDriver.

I want the code to extract the text up to the paragraph that says "To read a summary of ARK’s biggest contributors and detractors, please see below."

I use the following code for this. It worked fine for 2020 Q1 through 2021 Q1.

para = 1
while True:
    comment = driver.find_element_by_xpath('/html/body/div[2]/div[2]/div[2]/div[1]/div/div[1]/div/p[' + str(para) + ']').text
    if comment != "To read a summary of ARK’s biggest contributors and detractors, please see below.":  
        if para != 1:
            globals()[ark_file_name] += comment
            print(para)
            para += 1
        else:
            globals()[ark_file_name] = comment
            print(para)
            para += 1
    else:
        para = 1

However, for 2021 Q2 the following error appears:

NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div[2]/div[2]/div[1]/div/div[1]/div/p[6]"}
(Session info: chrome=99.0.4844.83)

I would appreciate any help!

cruisepandey
  • Have you actually looked at the DOM for that page? It's probably not the same structure. This is why long xpath queries are unreliable. You should use HTML ids or classes instead. – Tim Roberts Mar 22 '22 at 16:59
  • Please mention the steps to get to `2020 Q1` and share the code for that. As Tim suggested, you should use a relative XPath rather than an absolute one. – cruisepandey Mar 22 '22 at 17:02

1 Answer


To scrape Ark Invest's CIO commentary pages from 2020 Q1 to 2021 Q4 up to the paragraph "To read a summary of ARK’s biggest contributors and detractors, please see below." using Selenium, you can collect the paragraph elements with a list comprehension and the following locator strategy:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver.get('https://ark-funds.com/articles/commentary/q2-2021-commentary/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#hs-eu-confirmation-button[role='button']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#close-board-popup"))).click()
print("*****Comments as list items*****")
print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.b-detail__container.js-single-content > div.row div.b-detail__container-left p")[:7]])
print("*****Comments as a paragraph*****")
print(', '.join([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.b-detail__container.js-single-content > div.row div.b-detail__container-left p")[:7]]))

Console Output:

*****Comments as list items*****
['During the second quarter, broad-based global equity indexes – as measured by the S&P 500 and MSCI World – continued to appreciate as many economies began to reopen in response to successful vaccination rollouts. On Capitol Hill, the Biden Administration continued negotiations on what could become a bipartisan infrastructure bill and another source of economic stimulus. Meanwhile, in ARK’s view, the odds of significant capital gains and income tax increases in the US have declined thanks to an agreement signed by the vast majority of 130 countries that, if ratified in October, would impose a 15% minimum tax rate on large global corporations. Now that midterm election campaigns are in early stages, narrow majorities in both Houses of Congress also could lower the probability of onerous tax measures that appear to be unpopular. The US yield curve flattened slightly as the 10-year Treasury bond yield fell to 1.47%, well below the 1.74% peak posted at the end of March. In other words, the bond market does not seem to be corroborating the fears of inflation that have dominated headlines recently.', 'In ARK’s view, inflation will prove temporary thanks both to the base effects caused by price collapses during the coronavirus crisis last year and to supply chain bottlenecks that could be causing double- and triple-ordering of supplies which, in turn, could lead to a significant inventory overhang and a commodity price collapse. With the exception of oil, cracks in the commodity markets already are apparent. From their peaks in the second quarter, lumber prices have dropped more than 57% from $1,686 to $716 per thousand board feet while copper prices have dropped roughly 13% from $4.78 per pound to $4.16. Oil prices probably will not be far behind despite shareholder demands for cutbacks in energy-related capital spending. 
If drivers in the ride-sharing space migrate to electric vehicles and take advantage of the lower total cost of ownership relative to gas-powered vehicles, any decline in oil prices could be exacerbated.', 'In ARK’s view, exacerbating the cyclical deflation will be two secular sources of deflation, one good for economic activity and another deleterious. Innovation is the source of good deflation, as learning curves cut costs and increase productivity. Instead of investing to capitalize on the exponential opportunities associated with the 14 technologies evolving today, however, many companies have catered to short-term oriented shareholders who have demanded results “now”, leveraging their balance sheets to buy back stock, bolster earnings, and increase dividends. As a result, facing the disintermediation and disruption associated with aging products and services, they could be forced to cut prices to clear inventories and service bloated debts, resulting in deflation with a deleterious impact on economic activity.', 'If ARK is correct that the risk to the outlook is deflation, not inflation, then nominal GDP growth is likely to be much lower than expected, suggesting that scarce double-digit growth opportunities will be rewarded accordingly. Growth stocks in general and innovation-driven stocks in particular could be the prime beneficiaries.', 'During the second half of the quarter, value gave way to growth in performance. Through mid-May, as commodity prices soared, the rotation from growth/innovation toward value/cyclical strategies gained momentum. ARK believes that this rotation has broadened and strengthened the bull market, preventing another tech and telecom bubble and likely setting the stage for another leg up in innovation-based strategies. In late May and June, after the reset in growth valuations and the drop in commodity prices like lumber and copper, investors began to return to growth/ innovation at the expense of value/cyclical strategies. 
In ARK’s view, the coronavirus crisis transformed the world significantly and permanently, suggesting that many innovation-driven stocks could be productive holdings during the next five to ten years. Among the largest beneficiaries of the rotation toward cyclicals during the past six to nine months have been two sectors that ARK believes will be disrupted the most by innovation during the next five years: Energy and Financial Services. In ARK’s view, autonomous electric vehicles and digital wallets, including cryptocurrencies and decentralized financial services (DeFi) associated more broadly with blockchain technologies, will disrupt and disintermediate both Energy and Financial Services significantly during the next five years.', 'ARK’s five actively managed thematically focused ETFs and two self-indexed ETFs appreciated but underperformed relative to the S&P 500 and MSCI World Indexes during the second quarter. That said, the ARK Innovation ETF (ARKK), a concentrated portfolio of high conviction names with exposure to all of the ARK’s disruptive innovation themes, outperformed the broad-based global indexes.', 'To read a summary of ARK’s biggest contributors and detractors, please see below.']
*****Comments as a paragraph*****
During the second quarter, broad-based global equity indexes – as measured by the S&P 500 and MSCI World – continued to appreciate as many economies began to reopen in response to successful vaccination rollouts. On Capitol Hill, the Biden Administration continued negotiations on what could become a bipartisan infrastructure bill and another source of economic stimulus. Meanwhile, in ARK’s view, the odds of significant capital gains and income tax increases in the US have declined thanks to an agreement signed by the vast majority of 130 countries that, if ratified in October, would impose a 15% minimum tax rate on large global corporations. Now that midterm election campaigns are in early stages, narrow majorities in both Houses of Congress also could lower the probability of onerous tax measures that appear to be unpopular. The US yield curve flattened slightly as the 10-year Treasury bond yield fell to 1.47%, well below the 1.74% peak posted at the end of March. In other words, the bond market does not seem to be corroborating the fears of inflation that have dominated headlines recently., In ARK’s view, inflation will prove temporary thanks both to the base effects caused by price collapses during the coronavirus crisis last year and to supply chain bottlenecks that could be causing double- and triple-ordering of supplies which, in turn, could lead to a significant inventory overhang and a commodity price collapse. With the exception of oil, cracks in the commodity markets already are apparent. From their peaks in the second quarter, lumber prices have dropped more than 57% from $1,686 to $716 per thousand board feet while copper prices have dropped roughly 13% from $4.78 per pound to $4.16. Oil prices probably will not be far behind despite shareholder demands for cutbacks in energy-related capital spending. 
If drivers in the ride-sharing space migrate to electric vehicles and take advantage of the lower total cost of ownership relative to gas-powered vehicles, any decline in oil prices could be exacerbated., In ARK’s view, exacerbating the cyclical deflation will be two secular sources of deflation, one good for economic activity and another deleterious. Innovation is the source of good deflation, as learning curves cut costs and increase productivity. Instead of investing to capitalize on the exponential opportunities associated with the 14 technologies evolving today, however, many companies have catered to short-term oriented shareholders who have demanded results “now”, leveraging their balance sheets to buy back stock, bolster earnings, and increase dividends. As a result, facing the disintermediation and disruption associated with aging products and services, they could be forced to cut prices to clear inventories and service bloated debts, resulting in deflation with a deleterious impact on economic activity., If ARK is correct that the risk to the outlook is deflation, not inflation, then nominal GDP growth is likely to be much lower than expected, suggesting that scarce double-digit growth opportunities will be rewarded accordingly. Growth stocks in general and innovation-driven stocks in particular could be the prime beneficiaries., During the second half of the quarter, value gave way to growth in performance. Through mid-May, as commodity prices soared, the rotation from growth/innovation toward value/cyclical strategies gained momentum. ARK believes that this rotation has broadened and strengthened the bull market, preventing another tech and telecom bubble and likely setting the stage for another leg up in innovation-based strategies. In late May and June, after the reset in growth valuations and the drop in commodity prices like lumber and copper, investors began to return to growth/ innovation at the expense of value/cyclical strategies. 
In ARK’s view, the coronavirus crisis transformed the world significantly and permanently, suggesting that many innovation-driven stocks could be productive holdings during the next five to ten years. Among the largest beneficiaries of the rotation toward cyclicals during the past six to nine months have been two sectors that ARK believes will be disrupted the most by innovation during the next five years: Energy and Financial Services. In ARK’s view, autonomous electric vehicles and digital wallets, including cryptocurrencies and decentralized financial services (DeFi) associated more broadly with blockchain technologies, will disrupt and disintermediate both Energy and Financial Services significantly during the next five years., ARK’s five actively managed thematically focused ETFs and two self-indexed ETFs appreciated but underperformed relative to the S&P 500 and MSCI World Indexes during the second quarter. That said, the ARK Innovation ETF (ARKK), a concentrated portfolio of high conviction names with exposure to all of the ARK’s disruptive innovation themes, outperformed the broad-based global indexes., To read a summary of ARK’s biggest contributors and detractors, please see below.
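
The `[:7]` slice above fits the Q2 2021 page, but the number of paragraphs may differ from quarter to quarter. As a rough sketch (not tested against every commentary page), you could stop at the closing sentence instead of at a fixed index. `collect_until` and `scrape_commentary` are hypothetical helper names; the CSS selector is the one used above and is only assumed to match the other quarters. Dismissing the cookie and pop-up banners is left out for brevity.

```python
from itertools import takewhile

# Closing sentence from the question; everything before it is kept.
SENTINEL = ("To read a summary of ARK’s biggest contributors and "
            "detractors, please see below.")


def collect_until(paragraphs, sentinel=SENTINEL):
    """Return the paragraph texts that come before the first one equal to `sentinel`."""
    return list(takewhile(lambda text: text.strip() != sentinel, paragraphs))


def scrape_commentary(driver, url):
    """Hypothetical helper: load `url` and return the commentary text before the sentinel."""
    # Imported here so collect_until() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(url)
    # Selector taken from the answer above; assumed to hold for other quarters too.
    elems = driver.find_elements(
        By.CSS_SELECTOR,
        "div.b-detail__container.js-single-content > div.row "
        "div.b-detail__container-left p",
    )
    return "\n".join(collect_until(elem.text for elem in elems))
```

Because `find_elements` returns an empty list instead of raising `NoSuchElementException`, a page with fewer paragraphs than expected just yields a shorter result.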
undetected Selenium