2

I am trying to parse the table from the site https://octopart.com/mcp3304-bi%2Fp-microchip-407390?r=sp#PriceAndStock. I have tried using the XPath of the table with Selenium, but it fetches only the first row. I have also tried parsing the HTML with BeautifulSoup, but I get unstructured text from the table.

Code trials:

driver.get('https://octopart.com/search?q=PMEG120G20ELRX&currency=USD&specs=0')
soup = BeautifulSoup(driver.page_source, 'html.parser')

table=soup.find('table')
for distributor in table.find_all('tbody'):
    rows=distributor.find_all('tr')
    for row in rows:
        data=row.find('td')
        print(data)
undetected Selenium
PyCode

1 Answer

3

To scrape the table, induce WebDriverWait for visibility_of_element_located() so that the table is rendered before you read it, then pass the table's outerHTML to the Pandas read_html() method. You can use the following locator strategy:

Code Block:

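from io import StringIO

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

# Assumption: Chrome is used here; the original snippet did not show how the driver was created.
driver = webdriver.Chrome()
driver.get('https://octopart.com/search?q=PMEG120G20ELRX&currency=USD&specs=0')
# Wait up to 20 seconds for the table to become visible, then grab its full HTML.
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'part')]//table"))).get_attribute("outerHTML")
# read_html() returns a list of DataFrames, one per <table> it finds.
# StringIO avoids the deprecation warning newer pandas versions raise for literal HTML strings.
df = pd.read_html(StringIO(data))
print(df)
driver.quit()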

Console Output:

[   Unnamed: 0          Distributor                       SKU  Stock   MOQ  ...     10    100  1,000  10,000  Updated
0         NaN  Future Electronics3                   4128873    500     1  ...  0.260  0.200  0.182   0.170       1d
1         NaN            Digi-Key3  1727-PMEG120G20ELRXCT-ND    488     1  ...  0.378  0.257  0.145   0.145      <1m
2         NaN                  TTI            PMEG120G20ELRX  18000  3000  ...    NaN    NaN    NaN   0.124       1d
3         NaN               Mouser        771-PMEG120G20ELRX   4461     1  ...  0.378  0.258  0.150   0.149      14m
4         NaN              Verical            PMEG120G20ELRX   6000  3000  ...    NaN    NaN    NaN   0.178      <1m

[5 rows x 13 columns]]
undetected Selenium
  • Awesome. Thank you very much. It worked. Just one question though: the output is a list with length=1. How can I get it in DataFrame format? Eventually, I want to save the table as a .csv file. – PyCode Mar 09 '22 at 13:40
  • _list with length=1_: Where is the list? I passed the entire `<table>` HTML to the `read_html()` method, which is processed by [pandas](https://stackoverflow.com/a/70454194/7429447) – undetected Selenium Mar 09 '22 at 13:51
  • I got it: it was a list with two elements, and the first element was the actual table. df[0] worked. – PyCode Jan 03 '23 at 04:42
  • @PyCode Glad to know your issue got resolved. – undetected Selenium Jan 10 '23 at 23:03
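
For anyone landing here with the same follow-up question: read_html() returns a list of DataFrames, so the table itself has to be picked out of that list before it can be saved. Below is a minimal sketch of that step, not part of the original answer. The helper name first_table_to_csv and the file name octopart_prices.csv are hypothetical, and the sample rows are a trimmed copy of the console output above that stands in for a real read_html() result.

```python
import pandas as pd


def first_table_to_csv(tables, path):
    """Pick the first DataFrame from a read_html()-style list and save it as CSV."""
    if not tables:
        raise ValueError("no tables were parsed")
    table = tables[0]
    # Drop columns that are entirely empty, such as 'Unnamed: 0' in the output above.
    table = table.dropna(axis="columns", how="all")
    # index=False keeps the 0, 1, 2, ... row labels out of the file.
    table.to_csv(path, index=False)
    return table


# Stand-in for the list returned by pd.read_html(); these values are a trimmed
# copy of the console output above, not freshly scraped data.
tables = [
    pd.DataFrame(
        {
            "Unnamed: 0": [float("nan"), float("nan")],
            "Distributor": ["TTI", "Mouser"],
            "SKU": ["PMEG120G20ELRX", "771-PMEG120G20ELRX"],
            "Stock": [18000, 4461],
        }
    )
]

table = first_table_to_csv(tables, "octopart_prices.csv")
print(table)
```

With the Selenium code from the answer, the same helper would be called on the value returned by read_html() instead of the stand-in list. Dropping all-empty columns is a design choice for tidier CSV output; remove that line if the empty column should be preserved.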