0

Following up on this post: I got a solution from @DebanjanB, but I am unable to use it for all of my PRODUCT TYPE values. It seems to work only for Acrylics and Coal Tar. How can I use it for every PRODUCT TYPE?

This is the solution:

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Acrylics']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

But when I use it for Alkyds:

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Alkyds']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

it doesn't work.

Any suggestions on how to make this work for every type?

Thanks

Andre_k
  • I know I mentioned this in my answer on the other post, but all the data is loaded when the page is first loaded. If you just use `requests` you will easily be able to scrape all the data within seconds. I would at least try my code from the last post; you will see that all the data shows up. – antfuentes87 Jun 04 '19 at 16:57
  • @antfuentes87 I get your point about using `requests` to extract every product without `selenium`, but in the end I want to map each product to the **PRODUCT TYPE** it belongs to. That is why I first click an element in the **PRODUCT TYPE** dropdown, which simply acts as a filter. – Andre_k Jun 05 '19 at 04:10
  • Oh ok, I did not get that part. Well, that is super easy. The `li` with the class `topLevel` has a `data-types` attribute which tells you the type. You can just add that right into the dictionary (look at my answer on the other question). It can still be done without `selenium`, using `requests` only. Like I said, *ALL* the data is in the `HTML` returned by the `request`. – antfuentes87 Jun 05 '19 at 04:22
  • Yes, now I'm able to extract that from the link as well. Thanks for getting me to use `requests` rather than `selenium`. – Andre_k Jun 05 '19 at 04:34
  • No problem, glad I could make another person see the light :) – antfuentes87 Jun 05 '19 at 04:36
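
The approach described in the comments above (reading the type straight from the `data-types` attribute of each `li.topLevel`) might look roughly like the sketch below. The inline HTML is a hypothetical sample that only mimics the structure mentioned in this thread, and `products_by_type` is an illustrative name; the real page's markup may differ, and the "Placeholder Product" entry is made up. On the live site the HTML would come from `requests.get(...).text` instead.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Hypothetical sample mimicking the structure described in the comments;
# the real page's markup may differ.
SAMPLE_HTML = """
<ul id="productList">
  <li class="topLevel" data-types="Alkyds">
    <h5 class="name"><a href="/products/product-details/?prod=0801">Carbocoat 115</a></h5>
  </li>
  <li class="topLevel" data-types="Alkyds">
    <h5 class="name"><a href="/products/product-details/?prod=0295">Carbocoat 116</a></h5>
  </li>
  <li class="topLevel" data-types="Acrylics">
    <h5 class="name"><a href="/products/product-details/?prod=XXXX">Placeholder Product</a></h5>
  </li>
</ul>
"""


def products_by_type(html):
    """Group product names by the data-types attribute of their <li>."""
    soup = BeautifulSoup(html, "html.parser")
    grouped = defaultdict(list)
    for li in soup.select("ul#productList li.topLevel"):
        link = li.find("a")
        if link is None:
            continue  # skip list items without a product link
        product_type = li.get("data-types", "unknown")
        grouped[product_type].append(link.get_text(strip=True))
    return dict(grouped)


print(products_by_type(SAMPLE_HTML))
```

Because the type travels with each product, no dropdown clicks are needed to build the product-to-type mapping.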

3 Answers

2

I tried the following code, and it returns the products for the product type you are after.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.carboline.com/products/")
driver.maximize_window()
# Dismiss the cookie banner (the find_element_by_* helpers were removed in Selenium 4.3)
driver.find_element(By.CSS_SELECTOR, 'a.close-privacy-cookie.acceptButton').click()
# Open the PRODUCT TYPE dropdown
element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "h5#Typeh5 span")))
element.click()
# Tick the 'Alkyds' filter
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@aria-labelledby='Typeh5']//ul[@id='Type']//li//label[contains(.,'Alkyds')]"))).click()
# Collect the names of the products listed for that type
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.XPATH, "//ul[@id='productList']//li[@class='topLevel' and @data-types='Alkyds']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

Output:

['Carbocoat 115', 'Carbocoat 115 VOC', 'Carbocoat 116', 'Carbocoat 140', 'Carbocoat 150 Universal Primer', 'Carbocoat 153', 'Carbocoat 2600', 'Carbocoat 2900', 'Carbocoat 2901', 'Carbocoat 30', 'Carbocoat 45 Industrial Enamel', 'Carbocoat 56', 'Carbocoat 70', 'Carbocoat 8215', 'Carbocoat 8215 Non-Skid', 'Carbocoat 8215 VOC', 'Carbocoat 8216 Non-Skid', 'Carbocoat 8225', 'Carbocoat 8229 Non-Lift Primer', 'Carbocoat 8239', 'Carbocoat 8245', 'Carbocoat 8259 WR', 'Carbocoat 8287 WR', 'Carbocoat OEM Universal Primer']
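
The code above handles a single type. To repeat it for every PRODUCT TYPE, the two XPath strings could be parameterised, as in the rough sketch below. The helper names `build_filter_label_xpath` and `build_product_xpath` are hypothetical, and the sketch assumes that each filter label's text matches the `data-types` value, which this answer only demonstrates for Alkyds.

```python
def _check(product_type):
    # XPath 1.0 string literals have no escape syntax, so this sketch
    # simply rejects values containing a single quote.
    if "'" in product_type:
        raise ValueError("product type must not contain a single quote")
    return product_type


def build_filter_label_xpath(product_type):
    """XPath of the filter label to click for one PRODUCT TYPE (assumed layout)."""
    t = _check(product_type)
    return (
        "//div[@aria-labelledby='Typeh5']//ul[@id='Type']"
        f"//li//label[contains(.,'{t}')]"
    )


def build_product_xpath(product_type):
    """XPath of the product links listed under one PRODUCT TYPE."""
    t = _check(product_type)
    return (
        "//ul[@id='productList']"
        f"//li[@class='topLevel' and @data-types='{t}']"
        "//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]"
    )


print(build_product_xpath("Alkyds"))
```

Each string can then be passed as `(By.XPATH, ...)` to the same `WebDriverWait(...).until(...)` calls inside a loop over the type names. Whether a previously ticked filter has to be cleared before selecting the next one is not covered here and would need checking against the page.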
KunduK
  • The **PRODUCT TYPE** is not required. Kindly refer to the link to my previous post: https://stackoverflow.com/questions/56439472/how-to-extract-all-the-texts-from-a-tag-using-selenium-through-python#56439472 – Andre_k Jun 04 '19 at 10:37
  • @deepesh: Updated the code with the relevant click (`Alkyds`). – KunduK Jun 04 '19 at 14:26
1

Does this get what you need?

import pandas as pd
from bs4 import BeautifulSoup
import requests

response = requests.get('http://www.carboline.com/products/')
soup = BeautifulSoup(response.text, 'html.parser')

products = soup.find('ul', {'id':'productList'})
lists = products.find_all('li',{'class':'topLevel'})

# Collect one [text, href] row per product, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for each in lists:
    a = each.find('a')
    rows.append([a.text, a['href']])
results = pd.DataFrame(rows, columns=['product_type', 'href'])

Output:

print(results)
                       product_type                                              href
0                  A/D Firefilm III              /products/product-details/?prod=35AD
1                A/D Firefilm III C              /products/product-details/?prod=48AD
2                  A/D TC-55 SEALER              /products/product-details/?prod=30AD
3                  Accelerator A-20              /products/product-details/?prod=50AD
4                    Acrilast Caulk              /products/product-details/?prod=0177
5         Add-2 Mildewcide Additive              /products/product-details/?prod=0658
6                      Additive 101              /products/product-details/?prod=P262
7                       Additive 47              /products/product-details/?prod=0547
8                     Additive 8504              /products/product-details/?prod=8504
9                     Additive 8505              /products/product-details/?prod=8505
10                    Additive 8506              /products/product-details/?prod=8506
11                    Additive 8509              /products/product-details/?prod=8509
12                Bitumastic 300 LH              /products/product-details/?prod=0168
13                 Bitumastic 300 M  /products/product-details/?prod=0165&global=true
14             Bitumastic 300 M COE              /products/product-details/?prod=0391
15                    Bitumastic 50              /products/product-details/?prod=0025
16                    Carbocoat 115              /products/product-details/?prod=0801
17                Carbocoat 115 VOC              /products/product-details/?prod=206F
18                    Carbocoat 116              /products/product-details/?prod=0295
19                    Carbocoat 140              /products/product-details/?prod=228F
20   Carbocoat 150 Universal Primer  /products/product-details/?prod=0808&global=true
21                    Carbocoat 153              /products/product-details/?prod=0632
22                   Carbocoat 2600              /products/product-details/?prod=0005
23                   Carbocoat 2900              /products/product-details/?prod=0010
24                   Carbocoat 2901              /products/product-details/?prod=0012
25                     Carbocoat 30              /products/product-details/?prod=P483
26   Carbocoat 45 Industrial Enamel              /products/product-details/?prod=0171
27                     Carbocoat 56              /products/product-details/?prod=DM56
28                     Carbocoat 70              /products/product-details/?prod=1519
29                   Carbocoat 8215              /products/product-details/?prod=8215
..                              ...                                               ...
470                       Thinner 2              /products/product-details/?prod=0522
471                      Thinner 21              /products/product-details/?prod=0521
472                     Thinner 213              /products/product-details/?prod=0555
473                     Thinner 214              /products/product-details/?prod=0556
474                     Thinner 215              /products/product-details/?prod=0557
475                     Thinner 221              /products/product-details/?prod=0546
476                     Thinner 224              /products/product-details/?prod=0574
477                   Thinner 225 E              /products/product-details/?prod=0591
478                     Thinner 228              /products/product-details/?prod=0570
479                     Thinner 230              /products/product-details/?prod=0551
480                     Thinner 231              /products/product-details/?prod=0516
481                     Thinner 234              /products/product-details/?prod=0562
482                     Thinner 235              /products/product-details/?prod=0563
483                   Thinner 236 E              /products/product-details/?prod=0564
484                     Thinner 238              /products/product-details/?prod=0566
485                     Thinner 241              /products/product-details/?prod=0374
486                   Thinner 242 E              /products/product-details/?prod=T242
487                   Thinner 243 E              /products/product-details/?prod=T243
488                     Thinner 246              /products/product-details/?prod=T246
489                     Thinner 248              /products/product-details/?prod=215F
490                      Thinner 25              /products/product-details/?prod=0525
491                     Thinner 254              /products/product-details/?prod=0631
492                      Thinner 26              /products/product-details/?prod=0526
493                      Thinner 33              /products/product-details/?prod=0533
494                      Thinner 38              /products/product-details/?prod=TH39
495                      Thinner 45              /products/product-details/?prod=0545
496                      Thinner 72              /products/product-details/?prod=0572
497                      Thinner 76              /products/product-details/?prod=0576
498             Zinc Filler Type II              /products/product-details/?prod=0229
499            Zinc Filler Type III              /products/product-details/?prod=0224

[500 rows x 2 columns]
chitown88
  • I already tried giving him this answer on the other post, but I think he is really keen on using `selenium` only. Which is a shame, because all the data loads when the site first loads. No need for `selenium` when you can just parse the `HTML` that comes back from requesting the website link. – antfuentes87 Jun 04 '19 at 17:03
  • Totally agree. Selenium is nice, but really only ideal as a last resort. – chitown88 Jun 05 '19 at 10:47
1

I would shorten it as follows, using the CSS "starts with" operator (`^=`) to do a substring match on the `href` attribute value:

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

r = requests.get('http://www.carboline.com/products/')
soup = bs(r.content, 'lxml')
df = pd.DataFrame([(item.text, 'http://www.carboline.com' + item['href']) for item in soup.select('[href^="/products/product-details/?prod="]')], columns = ['product', 'link'])
print(df)
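
The `link` column above is built by concatenating the site root with each root-relative `href`. As a possible alternative (not part of the original answer), the standard library's `urllib.parse.urljoin` resolves an `href` against the page URL and also copes with relative or already absolute links. A minimal sketch, where `absolute_link` is an illustrative name:

```python
from urllib.parse import urljoin

PAGE_URL = "http://www.carboline.com/products/"


def absolute_link(href, base=PAGE_URL):
    """Resolve an href taken from the page against the page's own URL."""
    return urljoin(base, href)


print(absolute_link("/products/product-details/?prod=0801"))
```

For the root-relative links shown in this thread the result is the same as plain concatenation; the difference only matters if the page ever mixes in other link forms.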
QHarr