How to scrape ID-less website elements with XPath-only regex patterns

Question

There are several similar questions about using regex in XPath searches. However, some were not very illuminating to me, and the solutions in others failed for my specific problem. Therefore, and for future users who might run into the same issue, I am posting the following question:

Using one call in Python/Selenium, I want to scrape all of the elements below at once. The paths differ only in the index of the `div` step that follows `div[2]/div[2]` (1 to 6 here):

/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[1]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[2]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[3]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[4]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[5]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[6]/div/div[2]/div[1]

Note that the number of matching elements varies between target websites (there can be more than six, but there is always at least one), and that the elements have no `id` attribute, which, as I understand it, rules out many of the solutions explained elsewhere on Stack Overflow.

What I am looking for is something like the following (the `regex = True` argument is hypothetical; Selenium has no such option):

driver.get(URL)  # loads the page; get() itself returns None
html = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.XPATH, "/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[[0-9]{1}]/div/div[2]/div[1]", regex = True)))

What doesn't work is:

driver.get(URL)
# Valid XPath, but matchers['...'] is read as a child element named "matchers", so no div matches
html = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.XPATH, "/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[matchers['[0-9]{1}']]/div/div[2]/div[1]")))
TimeoutException: Message: 
Screenshot: available via screen

How to scrape all website elements without ID whose XPath matches a regex pattern in Python + Selenium?

score 1 · Accepted Answer · answered Jan 08 '18 at 08:13


You don't want a regex for this; you want the predicate `[position()<=6]`.

Michael Kay
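The positional step that replaces the regex can be tried without a browser. The sketch below uses Python's standard-library `xml.etree.ElementTree` on a hypothetical, heavily simplified copy of the page (`TOY_HTML` and `item_texts` are illustrative names, not part of the page or of any API). It shows that a `div` step without an index matches every sibling, however many there are, while `div[k]` picks a single one. ElementTree implements only a small XPath subset (integer and `last()` predicates, but not `position()<=6`), so the full expression still needs the browser's XPath engine via Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup: the container's direct <div>
# children are the repeated items (div[1], div[2], ... in the question's path),
# and each item nests div/div[2]/div[1] like the tail of the original XPath.
TOY_HTML = """
<div id="list">
  <div><div><div>icon</div><div><div>item 1</div><div>meta</div></div></div></div>
  <div><div><div>icon</div><div><div>item 2</div><div>meta</div></div></div></div>
  <div><div><div>icon</div><div><div>item 3</div><div>meta</div></div></div></div>
</div>
"""


def item_texts(root, index=None):
    """Return the target texts; index=None matches every item."""
    # "div" with no predicate matches all item siblings; "div[k]" matches the k-th.
    step = "div" if index is None else f"div[{index}]"
    path = f"{step}/div/div[2]/div[1]"
    return [el.text for el in root.findall(path)]


root = ET.fromstring(TOY_HTML.strip())
print(item_texts(root))     # ['item 1', 'item 2', 'item 3']
print(item_texts(root, 2))  # ['item 2']
```

Adding a fourth item to `TOY_HTML` would make `item_texts(root)` return four texts without any change to the path, which is the property the question needs.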

Thanks for your answer. For the sake of beginners, could you include a minimal example (or a link to one)? – sudonym Jan 08 '18 at 15:34

`/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[position()<=6]/div/div[2]/div[1]` (I fear this is not just for beginners; it is for beginners who haven't done any background reading on XPath. Frankly, if you haven't done any background reading on XPath, you shouldn't be trying to use it.) – Michael Kay Jan 08 '18 at 15:46
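Putting the answer's expression into Selenium, a minimal sketch could look like the following. It is written against the Selenium 4 Python bindings and assumes a `driver` that has already loaded the page with `driver.get(URL)`; `build_item_xpath`, `scrape_item_texts`, `ITEM_PREFIX` and `ITEM_SUFFIX` are hypothetical names. The key change from the question's attempt is `EC.presence_of_all_elements_located`, which returns a list of all matches, whereas `presence_of_element_located` returns only one element. Since the question says there can be more than six items, the default `max_items=None` drops the positional predicate so that every item is selected; `max_items=6` reproduces the answer's XPath exactly.

```python
# The question's absolute path, split around the step whose index varies.
ITEM_PREFIX = "/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div"
ITEM_SUFFIX = "/div/div[2]/div[1]"


def build_item_xpath(max_items=None):
    """Build one XPath for all items, optionally capped with position()."""
    predicate = "" if max_items is None else f"[position()<={max_items}]"
    return ITEM_PREFIX + predicate + ITEM_SUFFIX


def scrape_item_texts(driver, max_items=None, timeout=10):
    """Wait until at least one item is present, then return each match's text."""
    # Imported here so build_item_xpath() can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, build_item_xpath(max_items))
    # presence_of_all_elements_located yields driver.find_elements(*locator);
    # the wait keeps polling while that list is empty.
    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located(locator)
    )
    return [el.text for el in elements]
```

For example, `scrape_item_texts(driver)` returns one string per item, and `WebDriverWait` raises `TimeoutException` if no item appears within `timeout` seconds.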
