How to retrieve a sub-string from a string that changes dynamically with respect to multiple delimiters through Selenium in Python

Question

I wonder if it's possible to remove part of a scraped string, turning:

Wujek Drew / Uncle Drew

into

Uncle Drew

Of course, as it is web scraping, the titles will be different every time, so what can I do here to get the result above?

Update

I forgot to add something that also needs to be removed. Given Wujek Drew / Uncle Drew (2018), I will also need to delete the data at the end of the string.

Do you always want to ignore everything up to a slash character? Or can that change too? – John Gordon, Aug 31 '18 at 18:40
The slash can be used to remove the first part of the sentence, if that's possible. – serengeti, Aug 31 '18 at 18:41

undetected Selenium · Accepted Answer · 2018-08-31T19:47:12.470

To remove the first part of the scraped string, up to and including the / character, you can use the following solution:

from selenium.webdriver.common.by import By
value = driver.find_element(By.XPATH, "element_xpath").get_attribute("innerHTML").split("/")[1].strip()
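The splitting step can be tried on a plain string, without a browser. The following is a minimal sketch; `after_slash` is a hypothetical helper name (not part of Selenium), and it assumes the wanted text follows the first slash.

```python
def after_slash(title: str) -> str:
    """Return the text after the first '/', with surrounding whitespace removed.

    Hypothetical helper for illustration. If the title contains no '/',
    the whole title is returned (stripped) instead of raising IndexError.
    """
    head, sep, tail = title.partition("/")
    return tail.strip() if sep else head.strip()


print(after_slash("Wujek Drew / Uncle Drew"))  # Uncle Drew
```

Using `partition` rather than `split("/")[1]` is a design choice for this sketch: it avoids an `IndexError` on titles that have no slash at all.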

As per your comment update, if you want to extract the sub-string Uncle Drew from the string Wujek Drew / Uncle Drew (2018), you can use the following solution:

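import re
from selenium.webdriver.common.by import By

# Fetch the full innerHTML first, without splitting it
value = driver.find_element(By.XPATH, "element_xpath").get_attribute("innerHTML")
# e.g. value = 'Wujek Drew / Uncle Drew (2018)'
# Split on '/', '(' and ')'; index 1 is the text between '/' and '('
print(re.split('[/()]', value)[1].strip())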
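Taking index `[1]` of the split only works when the slash is present. As a hedged variant (not the answer's own method), the sketch below first removes a trailing parenthesised year and then takes the text after the slash. The helper name `clean_title` is hypothetical, and the assumption that the trailing data is always a four-digit year in parentheses comes only from the single example in the question.

```python
import re

# Assumption: the data at the end is a four-digit year in parentheses, e.g. "(2018)".
_YEAR_SUFFIX = re.compile(r"\s*\(\d{4}\)\s*$")


def clean_title(raw: str) -> str:
    """Drop a trailing '(YYYY)' and anything up to the first '/'. Hypothetical helper."""
    without_year = _YEAR_SUFFIX.sub("", raw)
    _, sep, tail = without_year.partition("/")
    # Fall back to the whole (year-less) string when there is no slash.
    return (tail if sep else without_year).strip()


print(clean_title("Wujek Drew / Uncle Drew (2018)"))  # Uncle Drew
```

With Selenium, the `raw` argument would be the `innerHTML` string fetched in the answer above.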

edited Aug 31 '18 at 19:47

answered Aug 31 '18 at 18:41

undetected Selenium

Thank you. I forgot to add something that also needs to be removed. Given Wujek Drew / Uncle Drew (2018), I will also need to delete the data at the end of the string. I tried to play with your solution but I'm struggling. – serengeti Aug 31 '18 at 18:57
@serengeti Check out my answer update and let me know the status. – undetected Selenium Aug 31 '18 at 19:47
Thanks for the update. I tried it and here is what I got: https://cdn.pbrd.co/images/HBLusR8.png – serengeti Aug 31 '18 at 19:57
@serengeti I have dropped the `split("/")[1]` part in the answer update and handled with `re.split()`. Please cross check. – undetected Selenium Aug 31 '18 at 20:03
Yeah, I already checked that and gave you the result in the screenshot. With this, only the date is printed. – serengeti Aug 31 '18 at 20:07
Oh, my bad, I used it with the earlier function, sorry. Now it looks good. Thanks for the support :) – serengeti Aug 31 '18 at 20:11
Okay :) Cool down and watch carefully. In the first version of my answer I extracted the entire `innerHTML` and invoked `split()` in the same step, whereas in the updated version the first step only extracts the entire `innerHTML` and the `split()` is performed on the next line. There is a difference. – undetected Selenium Aug 31 '18 at 20:12
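The "only the date is printed" result from this exchange can be reproduced on the sample string alone. This is a small sketch with no browser involved; the variable names are illustrative.

```python
import re

raw = "Wujek Drew / Uncle Drew (2018)"  # sample string from the question

# First version of the answer: split("/")[1] is applied straight away.
already_split = raw.split("/")[1]  # ' Uncle Drew (2018)'

# Applying re.split to the already shortened string shifts the indices,
# so index 1 is now the year rather than the title.
mixed = re.split("[/()]", already_split)[1]

# Applying re.split to the full string keeps the title at index 1.
intended = re.split("[/()]", raw)[1].strip()

print(mixed)     # 2018
print(intended)  # Uncle Drew
```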
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/179211/discussion-between-serengeti-and-debanjanb). – serengeti Aug 31 '18 at 22:00
