
I want to extract the comments from a website. I already tried using Selenium and extracting them with XPath, but it doesn't work.

from selenium import webdriver
import pandas as pd
            
driver = webdriver.Chrome()
driver.get('https://finance.detik.com/berita-ekonomi-bisnis/d-5307853/ri-disebut-punya-risiko-korupsi-yang-tinggi?_ga=2.13736693.357978333.1608782559-293324864.1608782559')
            
userid_element = driver.find_elements_by_xpath('//*[@id="cmt66364625"]/div[1]/div[1]/text()')[0]
userid = userid_element.text

This is the result:


IndexError                                Traceback (most recent call last)
<ipython-input-73-151acf07e320> in <module>
----> 1 userid_element = driver.find_elements_by_xpath('//*[@id="cmt66364625"]/div[1]/div[1]/text()')[0]
      2 userid = userid_element.text

IndexError: list index out of range

I tried removing the list index:

userid_element = driver.find_elements_by_xpath('//*[@id="cmt66364625"]/div[1]/div[1]/text()')
userid = userid_element.text

But then the result is:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-74-890ba28d7494> in <module>
      1 userid_element = driver.find_elements_by_xpath('//*[@id="cmt66364625"]/div[1]/div[1]/text()')
----> 2 userid = userid_element.text

AttributeError: 'list' object has no attribute 'text'
undetected Selenium
Hanif

3 Answers

userid = [i.text for i in userid_element]
print(userid)

find_elements_by_xpath() returns a list, so you have to iterate through it. The code above iterates over the elements, gets the text of each one, and stores the results in a list.
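A minimal sketch of the same idea, wrapped in two small helpers. The names element_texts and first_text are hypothetical, not part of Selenium. The second helper returns None instead of raising IndexError when the locator matched nothing, which is exactly the empty-list case behind the IndexError in the question.

```python
from typing import Iterable, List, Optional, Protocol


class HasText(Protocol):
    """Anything exposing a .text attribute, such as a Selenium WebElement."""
    text: str


def element_texts(elements: Iterable[HasText]) -> List[str]:
    """Collect the .text of every element returned by a find_elements call."""
    return [el.text for el in elements]


def first_text(elements: List[HasText]) -> Optional[str]:
    """Return the first element's text, or None if the list is empty.

    Indexing an empty list with [0] raises IndexError, so check it first.
    """
    return elements[0].text if elements else None
```

For example, comments = element_texts(driver.find_elements_by_xpath(xpath)). On Selenium 4.3 and later the find_elements_by_* helpers no longer exist, so the equivalent call is driver.find_elements(By.XPATH, xpath).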

PDHide

If you want all the comments, you can do it like this:

comment_elements = driver.find_elements_by_xpath("//div[@class='comment__cmt_box_text___3bK3O comment__cmt_dk_komen___1Yzyg']")
comments = [comment.text for comment in comment_elements]
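The XPath above compares the entire class attribute string, so it stops matching if the site reorders the classes or adds another one. Suffixes such as ___3bK3O also look like generated CSS-module hashes that may change between deployments (an assumption; the page does not document them). A common, more tolerant idiom matches a single class token with contains(concat(' ', normalize-space(@class), ' '), ' name '). The hypothetical helper class_token_xpath below only builds that string. Note too that this locator selects the div elements themselves rather than /text(): Selenium's XPath locators must evaluate to elements, and an expression that returns text nodes raises InvalidSelectorException.

```python
def class_token_xpath(class_name: str, tag: str = "div") -> str:
    """Build an XPath matching `tag` elements that carry `class_name` as one of
    their space-separated class tokens, regardless of order or extra classes."""
    # A token must be non-empty, contain no whitespace, and no single quote
    # (which would break the quoted XPath string literal).
    if not class_name or "'" in class_name or any(ch.isspace() for ch in class_name):
        raise ValueError("class_name must be a single class token")
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {class_name} ')]"
    )
```

For example, driver.find_elements_by_xpath(class_token_xpath("comment__cmt_dk_komen___1Yzyg")). If the comments are rendered inside an iframe, you still need to switch into that frame before the lookup.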
marco

The comments on this website are inside an <iframe>, so to scrape them you have to:

  • Use WebDriverWait to wait for the desired frame to become available and switch to it.

  • Use WebDriverWait with visibility_of_all_elements_located() to wait until the comment elements are visible.

  • You can use either of the following Locator Strategies:

    • Using CSS_SELECTOR:

      driver.get('https://finance.detik.com/berita-ekonomi-bisnis/d-5307853/ri-disebut-punya-risiko-korupsi-yang-tinggi?_ga=2.13736693.357978333.1608782559-293324864.1608782559')
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe.xcomponent-component-frame.xcomponent-visible")))
      print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[class^='comment__cmt_'][style]")))])
      
    • Using XPATH:

      driver.get('https://finance.detik.com/berita-ekonomi-bisnis/d-5307853/ri-disebut-punya-risiko-korupsi-yang-tinggi?_ga=2.13736693.357978333.1608782559-293324864.1608782559')
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@class='xcomponent-component-frame xcomponent-visible']")))
      print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@class, 'comment__cmt_')][@style]")))])
      
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    ['buzzer pada kmenaa..giliran muhammdiyah ampe 400an komen..dapseeee\nLaporkan\n0BalasBagikan:  ', 'selama korupsi tidak dihukum mati disanalah korupsi masih liar dan ada kalaupun dibuat hukum mati setidaknya bisa mengurangi angka korupsi itu\nLaporkan\n2BalasBagikan:  ', 'kalo terindikasi korupsi, lalu teriak saya pancasila, biar pd takut\nLaporkan\n0BalasBagikan:  ', '1. Hukuman fisik diperberat. Hukuman sosial diadakan.\nLaporkan\n0BalasBagikan:  ', 'Padahal fokus tegakan hukum dan berantas korupsi otomatis ekonomi terangkat. Hukum tegak ekonomi kuat. Bayangkan setingkat RT aja korupsi. Dan herannya koruptor serasa lebih dihormatin dari pelaku kejahatan lain.\nLaporkan\n0BalasBagikan:  ', 'Bikin UU cashless aja Bu. Transaksi cash maks 1jt. Jadi lebih enak ditracing\nLaporkan\n0BalasBagikan:  ', 'Hukum terlalu lemah, yang pernah korupsi malah masih menjabat pemerintahaan dan malah masih mencalonkan diri sebagai bupati atau walikota dan gubernur setelah melakukan korupsi.\nLaporkan\n0BalasBagikan:  ', 'system birokrasi yg lemah, seharusnya mulai mengandalkan teknologi kontrol online untuk mengurangi kesempatan pejabat yg korupsi\nLaporkan\n0BalasBagikan:  ', 'Bukan cuma resiko, emang udah kejadian kaleeee hahahhahahaha\nLaporkan\n0BalasBagikan:  ', 'ga heran jamannya new orba\nLaporkan\n1BalasBagikan:  ']  
    
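Putting the pieces together, here is a self-contained sketch under a few assumptions: a local Chrome that Selenium can start, the iframe and comment selectors from this answer still matching the page, and each comment's text ending with the widget labels visible in the console output above (Laporkan "Report", the reply count, Balas "Reply", Bagikan: "Share:"). It uses only By-tuple locators, which work in Selenium 4 (the find_element(s)_by_* helpers used elsewhere on this page were removed in Selenium 4.3). The helper strip_widget_labels is a hypothetical name; it simply cuts each string at the first "\nLaporkan". Selenium is imported inside scrape_comments only so the text helper can be used without Selenium installed.

```python
from typing import List

URL = ("https://finance.detik.com/berita-ekonomi-bisnis/d-5307853/"
       "ri-disebut-punya-risiko-korupsi-yang-tinggi"
       "?_ga=2.13736693.357978333.1608782559-293324864.1608782559")

# Label that follows each comment body in the scraped text ("Report").
WIDGET_MARKER = "\nLaporkan"


def strip_widget_labels(raw: str) -> str:
    """Drop the trailing 'Laporkan / <count> / Balas / Bagikan:' widget text."""
    return raw.split(WIDGET_MARKER, 1)[0].strip()


def scrape_comments(url: str = URL, timeout: int = 20) -> List[str]:
    # Imported here so strip_widget_labels() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # The comments are rendered inside an iframe; switch into it first.
        wait.until(EC.frame_to_be_available_and_switch_to_it(
            (By.CSS_SELECTOR, "iframe.xcomponent-component-frame.xcomponent-visible")))
        # until() returns the list of visible elements (or raises TimeoutException).
        elements = wait.until(EC.visibility_of_all_elements_located(
            (By.CSS_SELECTOR, "div[class^='comment__cmt_'][style]")))
        return [strip_widget_labels(el.text) for el in elements]
    finally:
        # Always close the browser, even if a wait times out.
        driver.quit()
```

Calling scrape_comments() returns the comment bodies without the Laporkan/Balas/Bagikan trailer. Splitting on a fixed label is deliberately simple; if the site changes its widget text, the marker has to be updated.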


undetected Selenium