
I'm trying to get articles under the same date from three different tabs ('Corp', 'FIG', 'SSA'). I need to click one, go back, and click the next, but the XPath for the elements is the same on every tab, so I'm wondering: is there some 'smart' way to do that instead of copying the same code again and again?

I also want the browser to go back if there are no articles on the page. Should I use 'try'?

Surprisingly, I get every result twice in the CSV file, like aabb... no idea why.

from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome()  # assuming a local Chrome driver
driver.get('https://www.globalcapital.com/Asia/Bonds')
Corp = driver.find_element_by_link_text("Corp")
Corp.click()
driver.implicitly_wait(10)
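# collect the hrefs and titles of articles dated 28 Jan 2021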
links=[link.get_attribute('href') for link in driver.find_elements_by_xpath("//div[contains(text(),'28 Jan 2021')]/preceding::a[2]")]
titles = [link.text for link in driver.find_elements_by_xpath("//div[contains(text(),'28 Jan 2021')]/preceding-sibling::h3/a")]
for link in links:
    for title in titles:
        dataframe = pd.DataFrame({'col1':title,'col2':link},index=[0])
        dataframe.to_csv('hi.csv',mode='a+',header=False,index=False,encoding='utf-8-sig')
driver.back()
FIG = driver.find_element_by_link_text("FIG")
FIG.click()
driver.implicitly_wait(10)
links=[link.get_attribute('href') for link in driver.find_elements_by_xpath("//div[contains(text(),'28 Jan 2021')]/preceding::a[2]")]
titles = [link.text for link in driver.find_elements_by_xpath("//div[contains(text(),'28 Jan 2021')]/preceding-sibling::h3/a")]
for link in links:
    for title in titles:
        dataframe = pd.DataFrame({'col1':title,'col2':link},index=[0])
        dataframe.to_csv('hi.csv',mode='a+',header=False,index=False,encoding='utf-8-sig')
driver.back()
SSA = driver.find_element_by_link_text("SSA")
SSA.click()
driver.implicitly_wait(10) 
Joyce

1 Answer


You're iterating over all the titles once for each link (nested loops), which is why every row is written to the CSV more than once. You need to iterate over (link, title) pairs instead:

for link, title in zip(links, titles):
    dataframe = pd.DataFrame({'col1':title,'col2':link},index=[0])
    dataframe.to_csv('hi.csv',mode='a+',header=False,index=False,encoding='utf-8-sig')
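As for your other two questions: since the steps are identical for each tab, you can factor them into a function and loop over the tab names instead of copy-pasting. Here is a minimal sketch of that idea (`scrape_tab` is just an illustrative name, and it keeps your `find_element_by_*` style calls); note that when a page has no articles the lists come back empty, so nothing is written and no `try` is needed:

def scrape_tab(driver, tab_name, date_text='28 Jan 2021'):
    # open the tab, collect (link, title) pairs for the given date, then go back
    driver.find_element_by_link_text(tab_name).click()
    driver.implicitly_wait(10)
    links = [a.get_attribute('href') for a in driver.find_elements_by_xpath(
        "//div[contains(text(),'%s')]/preceding::a[2]" % date_text)]
    titles = [a.text for a in driver.find_elements_by_xpath(
        "//div[contains(text(),'%s')]/preceding-sibling::h3/a" % date_text)]
    driver.back()
    return list(zip(links, titles))

for tab in ['Corp', 'FIG', 'SSA']:
    for link, title in scrape_tab(driver, tab):  # empty pages yield no pairs
        dataframe = pd.DataFrame({'col1': title, 'col2': link}, index=[0])
        dataframe.to_csv('hi.csv', mode='a+', header=False,
                         index=False, encoding='utf-8-sig')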
JaSON
  • Thanks Jason, it works! Would it be possible to reduce the repeated steps in this question? – Joyce Feb 03 '21 at 01:06
  • Hi Jason, somehow the href returns `javascript:` instead of the actual hrefs. How does that happen? – Joyce Feb 03 '21 at 07:12
  • @Cathy `@href` might not contain a URL. It can be something like [`javascript:void(0)`](https://stackoverflow.com/questions/1291942/what-does-javascriptvoid0-mean) – JaSON Feb 03 '21 at 08:46
  • I used `driver.get('http://www.chinamoney.com.cn/chinese/zjfxzx/?tbnm=%E6%9C%80%E6%96%B0&tc=null&isNewTab=1')` and `links=[link.get_attribute('href') for link in driver.find_elements_by_xpath("//a[contains(@title,'中期票据') and not(contains(@title,'申购说明')) and not(contains(@title,'公告'))]")]`. I think those links do contain an href, but somehow it returns some javascript – Joyce Feb 03 '21 at 09:15
  • @Cathy I can't check it - I can't find any node with a title that contains `"中期票据"` on the provided page – JaSON Feb 03 '21 at 09:20
  • Let me change it to `links=[link.get_attribute('href') for link in driver.find_elements_by_xpath("//a[contains(@title,'同业存单') and not(contains(@title,'申购说明')) and not(contains(@title,'公告'))]")]` – Joyce Feb 03 '21 at 09:25
  • Er right... is it possible to get the href link at all? Or, if I click into it, would it be possible to return the link? – Joyce Feb 03 '21 at 09:42
  • @Cathy I don't think it's possible. You can only try to click it – JaSON Feb 03 '21 at 09:56
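Following up on the last comment, a minimal sketch of the click-and-read approach (it assumes each click navigates within the same tab, so `driver.current_url` and `driver.back()` apply; if the site opens a new tab instead, you would have to switch window handles):

# when @href is javascript:void(0), click the link and read the resulting URL
xpath = ("//a[contains(@title,'同业存单') and not(contains(@title,'申购说明'))"
         " and not(contains(@title,'公告'))]")
urls = []
for i in range(len(driver.find_elements_by_xpath(xpath))):
    # re-locate on every pass: going back invalidates the old element references
    driver.find_elements_by_xpath(xpath)[i].click()
    urls.append(driver.current_url)  # the URL the javascript: link navigated to
    driver.back()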