Scraping text values using Selenium with Python

Question

For each vendor in an ERP system (total # of vendors = 800+), I am collecting its data and exporting this information as a pdf file. I used Selenium with Python, created a class called Scraper, and defined multiple functions to automate this task. The function, gather_vendors, is responsible for scraping and does this by extracting text values from tag elements.

Every vendor has a section called EFT Manager. EFT Manager has 9 rows I am extracting from:

Rows #2 and #3 both contain string values (confidential information is crossed out), yet #3 returns null. I don't understand why every row from #3 onward returns null when there are text values to extract.

The format of code for each element is the same.

I tried switching frames, but that did not work. I tried scraping from edit mode, and that didn't work either. Has anyone encountered a similar situation? It seems that no matter what I do, I can't scrape certain values. I'd appreciate any advice or insight into how I should proceed. Thank you.

Don't post screenshots of code snippets; instead, include them inline: https://meta.stackexchange.com/editing-help#code — Ian Lesperance, Aug 23 '18 at 15:20
Try removing the wrapping `try/except` blocks. By doing that, you're catching all exceptions, which is likely hiding your problem. — Ian Lesperance, Aug 23 '18 at 15:25
@IanLesperance thanks for the tip. I will write code inline next time. The expected output is the text value. The actual output was empty, in other words, the scraper couldn't detect the xpath. I will try removing the try/except blocks. I will update you on what happens. — ekim420, Aug 23 '18 at 21:02

score 0 · Answer 1 · answered Aug 23 '18 at 15:25

Why not try to use

find_element_by_class_name("panelList").find_elements_by_tag_name('li')

to collect all of the li elements, and then use li.text to retrieve their text values? It's hard to tell what your actual output is, other than you saying it "returns null".
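A minimal sketch of this suggestion, written against Selenium's generic find_element(by, value) and find_elements(by, value) calls. The strings "class name" and "tag name" are the values of By.CLASS_NAME and By.TAG_NAME. The helper names collect_panel_texts and report_empty_rows are hypothetical, and the panelList class name is taken from the line above, not from the asker's page. Returning the raw strings makes it easier to see whether "null" really means an empty string.

```python
def collect_panel_texts(driver, panel_class="panelList"):
    """Return the visible text of every <li> inside the first matching panel.

    `driver` is assumed to be a Selenium WebDriver. The locator strings
    "class name" and "tag name" are the values of By.CLASS_NAME and
    By.TAG_NAME, so no extra import is needed here.
    """
    panel = driver.find_element("class name", panel_class)
    items = panel.find_elements("tag name", "li")
    # WebElement.text only contains text that is rendered as visible,
    # so hidden rows come back as empty strings rather than raising.
    return [li.text for li in items]


def report_empty_rows(texts):
    """Return the indexes of rows whose text came back empty or blank."""
    return [i for i, text in enumerate(texts) if not text.strip()]
```

Printing report_empty_rows(collect_panel_texts(driver)) shows which rows are affected, which is more informative than a single "returns null".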

answered Aug 23 '18 at 15:25 by degenTy

That didn't work and resulted in a lot of empty values. Strange, so I have to think of a different approach. Thanks for your suggestion. – ekim420 Aug 23 '18 at 21:00

score 0 · Answer 2 · answered Aug 23 '18 at 15:27

Try to use visibility_of_element_located instead of presence_of_element_located.
Try to get textContent for the element with JavaScript (see "Given a (python) selenium WebElement can I get the innerText?").

element = driver.find_element_by_id('txtTemp_creditor_agent_bic')
text = driver.execute_script("return arguments[0].textContent", element)
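As a sketch of how the JavaScript route could be combined with the normal one, the helper below tries several ways of reading an element and returns the first non-empty result. The function name read_element_text is hypothetical. The "value" step is an assumption: the thread never shows the page's HTML, so it is only a guess that some rows might be form fields, whose content Selenium exposes through get_attribute("value") rather than through .text. Inside execute_script, the element passed as the second parameter is available to the script as arguments[0].

```python
def read_element_text(driver, element):
    """Return the first non-empty text found for `element`, else ""."""
    # 1. Rendered text: empty for hidden elements and for <input> fields.
    text = element.text
    if text and text.strip():
        return text.strip()

    # 2. Current value of a form field (assumption: the row may be an <input>).
    value = element.get_attribute("value")
    if value and value.strip():
        return value.strip()

    # 3. Raw DOM text read via JavaScript; arguments[0] is the element
    #    passed as the second parameter of execute_script.
    content = driver.execute_script("return arguments[0].textContent;", element)
    return content.strip() if content else ""
```

If step 1 is empty but step 2 or 3 returns something, the element exists and the locator is fine; the issue is how the text is stored or displayed.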

answered Aug 23 '18 at 15:27 by Sers

I tried the first suggestion, but it doesn't work. I will try the next suggestion soon and post an update. – ekim420 Aug 23 '18 at 20:59

score 0 · Accepted Answer · answered Aug 31 '18 at 14:41

The following is what worked for me:

Get rid of the try/except blocks.
Find elements by ID (not by XPath).

That allowed me to extract text from elements I couldn't extract from before.
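A rough sketch of what those two changes might look like together; the original gather_vendors code was only posted as a screenshot, so this is not the asker's actual function. The name read_fields_by_id is hypothetical, and "id" is the value of Selenium's By.ID. There is deliberately no try/except, so a wrong ID fails loudly instead of silently producing an empty value.

```python
def read_fields_by_id(driver, field_ids):
    """Map each element ID to its text, letting lookup errors propagate."""
    values = {}
    for field_id in field_ids:
        # No try/except here on purpose: if an ID is wrong, Selenium's
        # NoSuchElementException surfaces and names the failing locator.
        element = driver.find_element("id", field_id)
        values[field_id] = element.text
    return values
```

For example, read_fields_by_id(driver, ["txtTemp_creditor_agent_bic"]) uses the one element ID that appears in this thread; the remaining IDs would have to be read from the page itself.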

answered Aug 31 '18 at 14:41 by ekim420

score 0 · Answer 4 · edited Aug 13 '20 at 09:10

You should locate the elements on the web page by ID, since each field has its own ID. If you want to use XPaths, then try using a JavaScript function to find the elements.

E.g.

//span[text()='Bank Name']

edited Aug 13 '20 at 09:10 by sɐunıɔןɐqɐp

answered Aug 13 '20 at 08:07 by Jaspreet Kaur
