I wonder if it's possible to remove part of a scraped string, for example turning:

Wujek Drew / Uncle Drew

into

Uncle Drew

Of course, as it is web scraping, the titles will be different every time, so what can I do here to get the result above?


Update

I forgot to mention something else that also needs to be removed. Given Wujek Drew / Uncle Drew (2018), I also need to delete the (2018) part at the end of the string.

undetected Selenium
serengeti

1 Answer

To remove the first part of the scraped string, up to and including the / character, you can use the following solution:

value = driver.find_element_by_xpath("element_xpath").get_attribute("innerHTML").split("/")[1].strip()

Following your update, if you want to extract the substring Uncle Drew from the string Wujek Drew / Uncle Drew (2018), you can use the following solution:

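import re

# Fetch the complete innerHTML first, without splitting it
value = driver.find_element_by_xpath("element_xpath").get_attribute("innerHTML")
# value = 'Wujek Drew / Uncle Drew (2018)'
# Split on '/', '(' and ')', take the second piece and trim the surrounding spaces
print(re.split(r'[/()]', value)[1].strip())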
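If the titles vary a lot, the same idea can be wrapped in a small helper that works on the plain string, so it can be tested without a browser. This is only a sketch: the name `english_title` is made up for illustration, and it assumes that the part you want always follows the last `/` and that the trailing data is a four-digit year in parentheses.

```python
import re


def english_title(raw):
    """Return the text after the last '/' with a trailing '(year)' removed."""
    # Keep only the text after the last slash (the whole string if there is none).
    title = raw.rsplit("/", 1)[-1]
    # Drop a trailing parenthesised four-digit year such as "(2018)".
    title = re.sub(r"\s*\(\d{4}\)\s*$", "", title)
    # Trim the spaces left around the title.
    return title.strip()


print(english_title("Wujek Drew / Uncle Drew (2018)"))  # Uncle Drew
print(english_title("Wujek Drew / Uncle Drew"))         # Uncle Drew
```

You would pass it the string returned by `get_attribute("innerHTML")`. Unlike `re.split('[/()]', ...)`, this version does not cut the title short when the title itself contains parentheses somewhere other than the end.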
undetected Selenium
  • Thank you. I forgot to mention something else that also needs to be removed. Given Wujek Drew / Uncle Drew (2018), I also need to delete the (2018) part at the end of the string. I tried to play with your solution but I am struggling. – serengeti Aug 31 '18 at 18:57
  • @serengeti Checkout my answer update and let me know the status – undetected Selenium Aug 31 '18 at 19:47
  • Thanks for update. Tried it and here is what I got. https://cdn.pbrd.co/images/HBLusR8.png – serengeti Aug 31 '18 at 19:57
  • @serengeti I have dropped the `split("/")[1]` part in the answer update and handled with `re.split()`. Please cross check. – undetected Selenium Aug 31 '18 at 20:03
  • yeah I already checked that and gave you the result on screenshot. so, with this - only the date is printed. – serengeti Aug 31 '18 at 20:07
  • Oh, my bad, I used it together with the earlier function, sorry. Now it looks good. Thanks for the support :) – serengeti Aug 31 '18 at 20:11
  • Okay :) cool down. Watch carefully. In the first version of my answer I extracted the entire `innerHTML` and invoked `split()` in the same step, whereas in the updated version I only extract the entire `innerHTML` in the first step and perform the `split()` on the next line. There is a difference. – undetected Selenium Aug 31 '18 at 20:12
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/179211/discussion-between-serengeti-and-debanjanb). – serengeti Aug 31 '18 at 22:00
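For anyone hitting the same "only the date is printed" result described in the comments, the following sketch probably reproduces it. It uses a hard-coded string instead of a live Selenium element, and it assumes the cause was applying `re.split()` to a value that had already been through `split("/")[1]`.

```python
import re

raw = "Wujek Drew / Uncle Drew (2018)"

# Updated answer: split the full string once on '/', '(' and ')'.
parts_full = re.split(r"[/()]", raw)
# parts_full == ['Wujek Drew ', ' Uncle Drew ', '2018', '']

# Mixing both versions: the string has already lost its first part,
# so every piece moves one position to the left.
already_split = raw.split("/")[1]  # ' Uncle Drew (2018)'
parts_mixed = re.split(r"[/()]", already_split)
# parts_mixed == [' Uncle Drew ', '2018', '']

print(parts_full[1].strip())   # Uncle Drew
print(parts_mixed[1].strip())  # 2018
```

With the mixed version, index `[1]` points at the year rather than the title, which matches what the screenshot comment reports.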