
I am in the process of rewriting this old Python script (https://github.com/muvvasandeep/BuGL/blob/master/Scripts/DataExtraction.py), which used an older version of Selenium. The aim of the script is to extract open and closed issues of open source projects from GitHub. I am new to both Python and Selenium, and I am having a hard time rewriting several things inside it. Currently I am struggling to get this working:

repo_closed_url = [link.get_attribute('href') for link in driver.find_elements(By.XPATH,'//div[@aria-label="Issues"]').find_element(By.CLASS_NAME,'h4')]

The line above should get all the closed issue links from a GitHub page and store them in the repo_closed_url list, but I am getting AttributeError: 'list' object has no attribute 'find_element'. Please help.

Mano Haran

2 Answers


I'm not sure this code line ever worked.
It calls driver.find_elements(By.XPATH,'//div[@aria-label="Issues"]') and then tries to apply find_element(By.CLASS_NAME,'h4') to the output of that call.
But driver.find_elements(By.XPATH,'//div[@aria-label="Issues"]') returns a list of web element objects, while find_element(By.CLASS_NAME,'h4') can only be applied to a WebDriver or WebElement object, not to a list of objects.
That code can be refactored as mentioned by dm2:

repo_closed_url = [link.find_element(By.CLASS_NAME,'h4').get_attribute('href') for link in driver.find_elements(By.XPATH,'//div[@aria-label="Issues"]')]

In this case the output of driver.find_elements(By.XPATH,'//div[@aria-label="Issues"]') is a list of web element objects. The comprehension iterates over that list with for link in ..., and then applies .find_element(By.CLASS_NAME,'h4').get_attribute('href') to each link element in the list.
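
For context, here is a minimal, self-contained sketch of that refactored line, assuming a local Chrome driver and using the Moodle closed-PRs page linked in the comments as an example; the 'Issues' and 'h4' locators come from the original script and may no longer match GitHub's current markup:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://github.com/moodle/moodle/pulls?q=is%3Apr+is%3Aclosed")

# find_elements returns a list of WebElements; find_element is then called on
# each individual WebElement inside the comprehension.
repo_closed_url = [
    link.find_element(By.CLASS_NAME, 'h4').get_attribute('href')
    for link in driver.find_elements(By.XPATH, '//div[@aria-label="Issues"]')
]
print(repo_closed_url)

driver.quit()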

Prophet
  • Thank you, man! I am not sure whether the person who wrote the initial script made those mistakes deliberately, but anyway, your suggestion worked! I will ask a few more doubts here in the comment section and it would be really great if you could help me with those as well. Thanks again! – Mano Haran Jan 27 '23 at 15:01
  • The above code gets the first closed issue link every time, but I want to get the links of all the closed issues on each page: https://github.com/moodle/moodle/pulls?q=is%3Apr+is%3Aclosed Is it possible to do that? The array only stores 1 URL. – Mano Haran Jan 27 '23 at 15:10
  • I'm sorry, I have to go now. Will be back tomorrow – Prophet Jan 27 '23 at 15:10
  • Hi! I am still trying to get the links of all the closed issues in this page. Is it possible to do this? – Mano Haran Jan 31 '23 at 17:45
  • I am going page by page, so just fixing the above code itself will be fine. – Mano Haran Jan 31 '23 at 17:46
  • I don't know. Please open a new question for that so all the users will see it in their feed, not just me. We will try to help there – Prophet Jan 31 '23 at 17:54
  • Just opened one: https://stackoverflow.com/questions/75301196/getting-list-of-all-the-urls-in-a-closed-issue-page-in-github-using-selenium – Mano Haran Jan 31 '23 at 17:55
  • Great, I already answered. Please let me know if it worked. Now I have some time; I'm sorry for the previous time when I had to go. – Prophet Jan 31 '23 at 18:06

find_elements returns a list, and a list doesn't have a method find_element (as per your error message).

You could move the find_element call onto each link:

repo_closed_url = [link.find_element(By.CLASS_NAME,'h4').get_attribute('href') for link in driver.find_elements(By.XPATH,'//div[@aria-label="Issues"]')]

This seems closer to what you're trying to do: instead of calling find_element on the list returned by find_elements, you iterate over that list and call find_element on each element.
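
For readability, here is an equivalent expanded form of that comprehension (a sketch only, using the same locators as in the question and assuming driver is an already-initialized WebDriver):

from selenium.webdriver.common.by import By

repo_closed_url = []
for link in driver.find_elements(By.XPATH, '//div[@aria-label="Issues"]'):
    # Each link is a single WebElement, so calling find_element on it is valid.
    h4 = link.find_element(By.CLASS_NAME, 'h4')
    repo_closed_url.append(h4.get_attribute('href'))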

dm2
  • The above code gets the first closed issue link every time, but I want to get the links of all the closed issues on each page: github.com/moodle/moodle/pulls?q=is%3Apr+is%3Aclosed Is it possible to do that? The array only stores 1 URL. – Mano Haran Jan 27 '23 at 15:19