For each of the 800+ vendors in an ERP system, I am collecting the vendor's data and exporting it as a PDF file. I used Selenium with Python, created a class called Scraper, and defined multiple functions to automate this task. The function gather_vendors is responsible for the scraping, which it does by extracting text values from tag elements.

Every vendor has a section called EFT Manager. EFT Manager has 9 rows I am extracting from:

enter image description here

Both #2 and #3 have string values (confidential info crossed out), but #3 returns null. I don’t understand why #3 onward returns null when there are text values to be extracted.

enter image description here

The format of code for each element is the same.

enter image description here

I tried switching frames, but that did not work. I tried to scrape from edit mode, and that didn’t work either. Has anyone encountered a similar situation? It seems that no matter what I do, I can’t scrape certain values. I’d appreciate any advice or insight into how I should proceed. Thank you.

ekim420
  • Don't post screenshots of code snippets; instead, include them inline: https://meta.stackexchange.com/editing-help#code – Ian Lesperance Aug 23 '18 at 15:20
  • Please provide (1) expected output and (2) actual output. – Ian Lesperance Aug 23 '18 at 15:20
  • Try removing the wrapping `try/except` blocks. By doing that, you're catching all exceptions, which is likely hiding your problem. – Ian Lesperance Aug 23 '18 at 15:25
  • @IanLesperance thanks for the tip. I will write code inline next time. The expected output is the text value. The actual output was empty, in other words, the scraper couldn't detect the xpath. I will try removing the try/except blocks. I will update you on what happens. – ekim420 Aug 23 '18 at 21:02

4 Answers

0

Why not try to use

find_element_by_class_name("panelList").find_elements_by_tag_name('li')

to collect all of the li elements, and then read li.text on each one to retrieve its text value? It's hard to tell what your actual output is, beyond your saying it "returns null".

degenTy
  • That didn't work and resulted in a lot of empty values. Strange, so I have to think of a different approach. Thanks for your suggestion. – ekim420 Aug 23 '18 at 21:00
0
  1. Try to use visibility_of_element_located instead of presence_of_element_located
  2. Try to get textContent with JavaScript for the element (see "Given a (python) selenium WebElement can I get the innerText?"):

    element = driver.find_element_by_id('txtTemp_creditor_agent_bic')
    text = driver.execute_script("return arguments[0].textContent", element)
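A minimal sketch of how the second suggestion could be wrapped in a helper, assuming the element is present in the DOM but Selenium's `.text` comes back empty because the element is not rendered as visible. The name `read_text` is hypothetical; `driver` is expected to be a Selenium WebDriver and `element` a WebElement.

```python
def read_text(driver, element):
    """Return the element's text, falling back to the DOM textContent.

    WebElement.text only returns text that is rendered as visible, so it can
    be empty for hidden nodes. textContent is read straight from the DOM.
    """
    visible = element.text
    if visible and visible.strip():
        return visible.strip()
    # arguments[0] refers to the element passed as the extra parameter.
    content = driver.execute_script("return arguments[0].textContent;", element)
    return (content or "").strip()
```

It would be called as `read_text(driver, driver.find_element_by_id('txtTemp_creditor_agent_bic'))`. Trying `.text` first keeps the normal behaviour for rows that already work.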

Sers
  • I tried the first suggestion but it doesn't work. I will try the next suggestion soon and update what happens. – ekim420 Aug 23 '18 at 20:59
0

The following is what worked for me:

  1. Get rid of the try/except blocks.
  2. Find elements via IDs (not XPath).

That allowed me to extract text from elements I couldn't extract from before.
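The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not the exact code from the Scraper class: the function name `collect_fields` is hypothetical, and the caller supplies the list of element IDs. `driver.find_element("id", ...)` is the generic form of `find_element_by_id`, where `"id"` is the locator string that `By.ID` stands for.

```python
def collect_fields(driver, field_ids):
    """Look up each field by its ID and return {field_id: text}.

    There is deliberately no try/except here: if an element cannot be
    located, Selenium's exception propagates and names the failing locator
    instead of silently producing an empty value.
    """
    values = {}
    for field_id in field_ids:
        # "id" is the locator strategy string that By.ID stands for.
        element = driver.find_element("id", field_id)
        values[field_id] = element.text
    return values
```

For example, `collect_fields(driver, ['txtTemp_creditor_agent_bic'])` would return a one-entry dict, or raise if that ID is not on the page.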

ekim420
0

You should locate the elements on the web page by ID, since each field has its own distinct id. If you want to use XPath, then you should try the JavaScript function to find them.

E.g.

//span[text()='Bank Name']
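
One possible reading of "the JavaScript function" is the browser's `document.evaluate`, which resolves an XPath inside the page. The sketch below assumes that reading; the helper name `find_by_xpath_js` is hypothetical, and `driver` is expected to be a Selenium WebDriver.

```python
# JavaScript run in the page: evaluate the XPath passed as arguments[0]
# and return the first matching node (or null if nothing matches).
XPATH_JS = """
return document.evaluate(
    arguments[0], document, null,
    XPathResult.FIRST_ORDERED_NODE_TYPE, null
).singleNodeValue;
"""


def find_by_xpath_js(driver, xpath):
    """Return the first node matching the XPath, or None if there is none."""
    return driver.execute_script(XPATH_JS, xpath)
```

It would be called as `find_by_xpath_js(driver, "//span[text()='Bank Name']")`. Like any other lookup, it only searches the document of the frame the driver is currently switched to.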
sɐunıɔןɐqɐp