I'm new to Python and Selenium, and I am scraping information from a website. Some items don't have a proper div class, so the script can't handle them normally. I need to separate the output of the XPath expression so that it does not contain the text value from the h2 part.

I've already re-written the loop body and xpath statement.

elif driver.find_element_by_xpath("//span[@class='italic']").text == "Chapter":
            test = driver.find_element_by_xpath("//a[@class='strong']")
            test.click()
            elem4 = driver.find_elements_by_xpath('//div[@class="work_identifiers_type_txt"] | //h2[@class="font18 strong inline"]')[0].text
            elem5 = f"ISBN={{{f'{elem4}'}}}}}"
            driver.back()
            file.write(f'{elem2}, ' + f'{elem5}')
            file.write('\n\n,\n')
            driver.back()
            driver.implicitly_wait(5)

Div with h2:

<div class="col-sm-12">
                <h2 class="font18 strong inline">
                </h2>
                <span class="italic">
</span>
            </div>

I want to write only the text from the first part of the XPath expression (the div) to the variable (and to the file). When only the second part (the h2) matches, the script should still be able to write something to the file, e.g.:

            file.write(f'{elem2}')
            file.write('\n\n,\n')
            driver.back()
            driver.implicitly_wait(5)

For now, the XPath expression writes the div value to the file in some cases and the h2 value in others (when one of the tags is missing).

Emdzej
  • Welcome to SO. Can you share the HTML of the div element that has the h2 in it? I have 2 options to handle it, but want to make sure I share the most appropriate one for your case. – supputuri Apr 29 '19 at 23:09
  • @supputuri div with h2 in it added to post and Thanks for warm welcome. – Emdzej Apr 29 '19 at 23:20
  • Do you want the text from the div alone excluding text in h2 and span? or the text in the div > span? – supputuri Apr 29 '19 at 23:25
  • I need the text from h2 only. I use that h2 like a "dummy" tag to import for eg. information about a book (check if it is on the list). – Emdzej Apr 29 '19 at 23:37
  • Thanks for help! It works. – Emdzej Apr 30 '19 at 23:22

1 Answer

Try getting the h2 text directly with the line below.

h2Text= driver.find_element_by_xpath("//h2[@class='font18 strong inline']").text
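
If the div and the h2 each need to be read on their own, one possible approach is to query them separately instead of combining them with `|`, so the two texts can never be mixed up. The sketch below is only an illustration under a few assumptions: `first_text` and `read_identifier_and_heading` are hypothetical helper names, `driver` is an already created Selenium WebDriver, and the `find_elements(by, value)` form is used because recent Selenium 4 releases no longer provide the `find_element_by_*` helpers. The string `"xpath"` is the value of `By.XPATH`.

```python
# Locator strategy string; this is the same value as
# selenium.webdriver.common.by.By.XPATH.
XPATH = "xpath"

# XPath expressions taken from the question.
DIV_XPATH = '//div[@class="work_identifiers_type_txt"]'
H2_XPATH = '//h2[@class="font18 strong inline"]'


def first_text(driver, xpath):
    """Return the text of the first element matching xpath, or None.

    Hypothetical helper: find_elements returns an empty list when nothing
    matches, so no exception handling is needed for a missing tag.
    """
    matches = driver.find_elements(XPATH, xpath)
    return matches[0].text if matches else None


def read_identifier_and_heading(driver):
    """Hypothetical helper: look up the div and the h2 separately.

    Returns a (div_text, h2_text) tuple; either entry is None when the
    corresponding tag is not on the page.
    """
    return first_text(driver, DIV_XPATH), first_text(driver, H2_XPATH)
```

With this, the calling code could check whether the first value is `None` and write only `elem2` to the file in that case, while the h2 text stays available as a separate value.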
supputuri