
I am trying to extract the userid, rating, and review from the following site using Selenium, and it is throwing an "invalid selector" error. I think the XPath I defined to get the review text is the cause, but I am unable to resolve the issue. The site link is below:

teslamotor review

The code that I have used is the following:

import pandas as pd
from selenium import webdriver

# Class for review web-scraping from the consumeraffairs.com site
class CarForumCrawler():
    def __init__(self, start_link):
        self.link_to_explore = start_link 
        self.comments = pd.DataFrame(columns = ['rating','user_id','comments'])
        self.driver = webdriver.Chrome(executable_path=r'C:/Users/mumid/Downloads/chromedriver/chromedriver.exe')            
        self.driver.get(self.link_to_explore)
        self.driver.implicitly_wait(5)
        self.extract_data()
        self.save_data_to_file()
   
    def extract_data(self):
        ids = self.driver.find_elements_by_xpath("//*[contains(@id,'review-')]")
        comment_ids = []
        for i in ids:
            comment_ids.append(i.get_attribute('id'))

        for x in comment_ids:
            #Extract dates from for each user on a page
            user_rating = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]/div[1]/div/img')[0]
            rating = user_rating.get_attribute('data-rating')

            #Extract user ids from each user on a page
            userid_element = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]/div[2]/div[2]/strong')[0]
            userid = userid_element.get_attribute('itemprop')

            #Extract Message for each user on a page
            user_message = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]]/div[3]/p[2]/text()')[0]
            comment = user_message.text

            #Adding date, userid and comment for each user in a dataframe
            self.comments.loc[len(self.comments)] = [rating,userid,comment]

    def save_data_to_file(self):
        # save the dataframe content to a CSV file
        self.comments.to_csv('Tesla_rating-6.csv', index=None, header=True)

    def close_spider(self):
        # end the session
        self.driver.quit()

try:
    url = 'https://www.consumeraffairs.com/automotive/tesla_motors.html'
    mycrawler = CarForumCrawler(url)
    mycrawler.close_spider()
except:
    raise

The error that I am getting is as follows:

[screenshot: InvalidSelectorException traceback]

Also, the XPath that I tried to trace is from the following HTML: [screenshot of the review markup]

  • You have `]]` when you wanted `]` – QHarr Nov 29 '21 at 18:27
  • Use '//*[@id="' + x +'"]/div[3]/p[2]/text()')[0] instead of '//*[@id="' + x +'"]]/div[3]/p[2]/text()')[0]; the two closing `]]` brackets do not make valid XPath syntax. – Muzzamil Nov 29 '21 at 19:40
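The bracket mismatch that the comments point out can be caught mechanically before the XPath is ever sent to the driver. A minimal sketch (the `brackets_balanced` helper is hypothetical, written here only to illustrate the check):

```python
def brackets_balanced(xpath: str) -> bool:
    """Return True if every '[' in the XPath string has a matching ']'."""
    depth = 0
    for ch in xpath:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
            if depth < 0:      # a ']' appeared with no open '['
                return False
    return depth == 0

x = "review-12345"  # made-up id for illustration
bad = '//*[@id="' + x + '"]]/div[3]/p[2]/text()'  # extra ']' as in the question
good = '//*[@id="' + x + '"]/div[3]/p[2]'

print(brackets_balanced(bad))   # False
print(brackets_balanced(good))  # True
```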

1 Answer
You are seeing the classic error of...

InvalidSelectorException

as find_elements_by_xpath('//*[@id="' + x +'"]]/div[3]/p[2]/text()')[0] would select a text node rather than an element (and the doubled `]]` is invalid XPath syntax), whereas find_elements_by_xpath() needs an expression that selects elements.

You need to change it to:

user_message = self.driver.find_elements_by_xpath('//*[@id="' + x + '"]/div[3]/p[2]')[0]
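As an illustration of why the original expression fails (using lxml, which follows the same XPath 1.0 semantics as the browser's evaluator; the HTML snippet is made up): an XPath ending in `/text()` yields text nodes, i.e. plain strings, not elements, which is why Selenium's find_elements_* rejects it. Select the element instead and read its `.text`:

```python
from lxml import html

# A stand-in for one review block, shaped like the question's markup.
snippet = ('<div id="review-1"><div></div><div></div>'
           '<div><p>header</p><p>Great car!</p></div></div>')
doc = html.fromstring(snippet)

texts = doc.xpath('//*[@id="review-1"]/div[3]/p[2]/text()')  # text nodes (strings)
elems = doc.xpath('//*[@id="review-1"]/div[3]/p[2]')         # elements

print(texts)          # ['Great car!'] -- already plain strings
print(elems[0].text)  # 'Great car!'  -- element first, then read its text
```

In Selenium the second form is the one that works: locate the `<p>` element with the fixed XPath, then call `.text` on the returned WebElement.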
