I am trying to pull a table from each URL in a list. When I supply a single URL, the script prints only the first item in the table, and when I add more URLs to the list I get the error 'list' object has no attribute 'timeout'. What is the best way to get the rest of the items and to handle more URLs? Below is the code I am running.

import time, random, csv, bs4, requests, io
import pandas as pd
timeDelay = random.randrange(5, 20)
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_urls = [
"https://www.lonza.com/products-services/bio-research/electrophoresis-of-nucleic-acids-and-proteins/nucleic-acid-electrophoresis/precast-gels-for-dna-and-rna-analysis/truband-gel-anchors.aspx",
"https://www.lonza.com/products-services/bio-research/transfection/nucleofector-kits-for-primary-cells/nucleofector-kits-for-primary-epithelial-cells/nucleofector-kits-for-human-mammary-epithelial-cells-hmec.aspx",
"https://www.lonza.com/products-services/bio-research/transfection/nucleofector-kits-for-primary-cells/nucleofector-kits-for-primary-neural-cells/nucleofector-kits-for-mammalian-glial-cells.aspx",
]
uClient = uReq(my_urls)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll('tbody')


product_name_list =[]
cat_no_list = []
size_list = []
price_list =[]

for container in containers:
    if (len(container) > 0):
    #try:
        title_container = container.findAll('td')
        Product_name = title_container[0].text.strip()
        product_name_list.append(Product_name)

        CatNo_container = container.findAll('td')
        CatNo = CatNo_container[1].text.strip()
        cat_no_list.append(CatNo)

        #Size_container = container.findAll('div',{'class':'col-xs-2 noPadding'})
        #Size = Size_container[0].text.strip()
        #size_list.append(Size)

        Price_container = container.findAll('td')
        Price = Price_container[4].text.strip()
        price_list.append(Price)

        print('Product_name: '+ Product_name)
        print('CatNo: ' + CatNo)
        print('Size: ' + 'N/A')
        print('Price: ' + Price)
        print(" ")
        time.sleep(timeDelay)

1 Answer


You are passing the whole list in uClient = uReq(my_urls), where a single URL string is required.
You need to pass the individual elements of the list, i.e. the strings, one at a time.
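To see where the error message in the question comes from, here is a minimal sketch that passes a list to urlopen and captures the resulting exception. The URLs are placeholders and try_open_list is a hypothetical helper name. No request should be sent, because in CPython the call appears to fail while urlopen is still preparing the request object.

```python
from urllib.request import urlopen

# Placeholder URLs for illustration only; they are never actually requested.
urls = ["https://example.com/a", "https://example.com/b"]


def try_open_list(url_list):
    """Pass a whole list to urlopen and return the error message, if any."""
    try:
        urlopen(url_list)
    except AttributeError as exc:
        # urlopen treats any non-string argument as a Request object and
        # tries to set a timeout attribute on it, which a list does not allow.
        return str(exc)
    return None


message = try_open_list(urls)
print(message)
```

The printed message should mention that a 'list' object has no attribute 'timeout'; the exact wording can vary between Python versions.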

Here is the edited code that works for multiple urls.

UPDATED CODE (to get all items):

import time, random, csv, bs4, requests, io
import pandas as pd
timeDelay = random.randrange(5, 20)
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_urls = [
"https://www.lonza.com/products-services/bio-research/electrophoresis-of-nucleic-acids-and-proteins/nucleic-acid-electrophoresis/precast-gels-for-dna-and-rna-analysis/truband-gel-anchors.aspx",
"https://www.lonza.com/products-services/bio-research/transfection/nucleofector-kits-for-primary-cells/nucleofector-kits-for-primary-epithelial-cells/nucleofector-kits-for-human-mammary-epithelial-cells-hmec.aspx",
"https://www.lonza.com/products-services/bio-research/transfection/nucleofector-kits-for-primary-cells/nucleofector-kits-for-primary-neural-cells/nucleofector-kits-for-mammalian-glial-cells.aspx",
]

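# Initialise the result lists once, before the loop, so that they
# accumulate items from every URL instead of being reset each time.
product_name_list = []
cat_no_list = []
size_list = []
price_list = []

for url in my_urls:
    print("URL using: ", url)
    uClient = uReq(url)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, "html.parser")

    containers = page_soup.findAll('tbody')
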

    for container in containers:
        if (len(container) > 0):
        #try:
            items = container.findAll('tr')
            for item in items:
                item = item.text.split('\n')

                Product_name = item[1]
                product_name_list.append(Product_name)

                CatNo = item[2]
                cat_no_list.append(CatNo)

                #Size_container = container.findAll('div',{'class':'col-xs-2 noPadding'})
                #Size = Size_container[0].text.strip()
                #size_list.append(Size)

                Price = item[6]
                price_list.append(Price)

                print('Product_name: '+ Product_name)
                print('CatNo: ' + CatNo)
                print('Size: ' + 'N/A')
                print('Price: ' + Price)
                print(" ")
            time.sleep(timeDelay)
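
As a smaller illustration of the row-by-row loop in the code above, here is a sketch that runs against made-up HTML rather than the real pages. The sample table, its three-column layout, and the extract_rows helper are assumptions for demonstration only. It reads each td cell directly instead of splitting the row text on newlines, which may be less sensitive to whitespace in the markup.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for one product table; not taken from the real pages.
sample_html = """
<table>
  <tbody>
    <tr><td>Kit A</td><td>CAT-001</td><td>$10.00</td></tr>
    <tr><td>Kit B</td><td>CAT-002</td><td>$20.00</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return one list of stripped cell texts per <tr> inside any <tbody>."""
    page = BeautifulSoup(html, "html.parser")
    rows = []
    for body in page.find_all("tbody"):
        for tr in body.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells:  # skip rows that contain no <td> cells
                rows.append(cells)
    return rows


rows = extract_rows(sample_html)
for name, cat_no, price in rows:
    print(name, cat_no, price)
```

On the real pages the column positions would need to be checked against the actual table, since the layout assumed here is hypothetical.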
hsnsd
  • Would you happen to know why it is only printing the first item of the table and not the entire table? – user9269112 Jun 29 '18 at 17:35
  • @user9269112 I guess you didn't correctly understand the tags that contain the data. Inside <tbody>, every item is within a <tr> tag. You need to iterate over the items again to get what you want. I have updated the code to do what I just said. – hsnsd Jun 29 '18 at 18:54