
I am currently trying to write code that iterates through a list of Google searches, scrapes the relevant URL links from each search's results with BeautifulSoup, and then uses pandas to extract a table from each URL.

The last part seems to work fine when I run one Google search at a time, but when I loop through a list of keywords, the following attribute error appears:

ResultSet object has no attribute 'findAll'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
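For context, that message is BeautifulSoup's hint that `find_all()` was called on a `ResultSet` — the list-like object that `find_all()` itself returns — rather than on the parsed document or a single tag. A minimal standalone illustration (static HTML in place of a live Google results page, assuming bs4 is installed):

```python
from bs4 import BeautifulSoup

# A tiny static page instead of a live search result
html = "<div><a href='https://example.com/a'>a</a><a href='https://example.com/b'>b</a></div>"
doc = BeautifulSoup(html, 'html.parser')

links = doc.find_all('a', href=True)  # a ResultSet: list-like, holds 2 tags
print(len(links))  # 2

# Calling links.find_all('a') here would raise the AttributeError above;
# iterate over the ResultSet instead and work with one element at a time:
hrefs = [a.get('href') for a in links]
print(hrefs)  # ['https://example.com/a', 'https://example.com/b']
```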

The code gets through the first iteration but fails in the inner for loop on the second iteration.

Any ideas on how to solve it?

import time
import pandas as pd
import requests
#from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# Request headers (defined here because the snippet below uses them;
# a User-Agent is needed so Google serves a normal results page)
headers = {'User-Agent': 'Mozilla/5.0'}

mylist = [205085000, 205520000, 205545190, 205566030, 205622000]

filename = 'ShipInfo.csv'

TableHeader = 'Vessel Name; IMO number; Vessel Type; Year Built\n'
f = open(filename, 'w')
f.write(TableHeader)

for mmsi in mylist:
    
    print(mmsi)
    # Make two strings with default google search URL
    # 'https://google.com/search?q=' and
    # our customized search keyword.
    # Concatenate them
    mmsi_number = str(mmsi)
    
    web_page= "vesselfinder+"
    url = 'https://google.com/search?q=' + web_page + mmsi_number
    print(url)  
    html = requests.get(url, headers=headers).text
    
    #html parsing
    soup = soup(html, 'html.parser')
    
    #extract google search results containing "www.vesselfinders.com" 
    links = []
    for link in soup.findAll('a', href=True):
        
        if 'www.vesselfinder.com' in link.get('href'):
            links.append(link.get('href'))
    print('Total number of links from google are: ', len(links))
    
    #Request the url from the first google result       
    r = requests.get(links[0],  headers=headers)
    
    #use pandas to extract tables from the web page, and create a summary dictionary
    df = pd.read_html(r.text)
    ship = pd.concat([df[1], df[2]], ignore_index=True).set_index(0).to_dict()[1]
    
    print('The Ship was built in: ', ship['Year of Built'])
    

    
    f.write(ship['Vessel Name'] + '; ' + ship['IMO number'] + '; ' + ship['Ship type'] + '; ' +  ship['Year of Built'] + '\n')
    
f.close() 
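One detail worth flagging in the snippet above: `soup = soup(html, 'html.parser')` rebinds the imported name `soup`. On the first iteration `soup` is still the BeautifulSoup class; afterwards it is the parsed page, and calling a parsed page is a (deprecated) alias for `find_all`, which returns a `ResultSet` — the very object the error then complains about at the next `findAll`. A sketch of the parsing section with the import and the variable kept distinct (`page_soup` and `extract_links` are names made up for illustration; static HTML stands in for the live search page):

```python
from bs4 import BeautifulSoup  # keep the class under its own name

def extract_links(html):
    # Parse into a separate variable so the class name is never shadowed
    page_soup = BeautifulSoup(html, 'html.parser')
    links = []
    for link in page_soup.find_all('a', href=True):
        if 'www.vesselfinder.com' in link.get('href'):
            links.append(link.get('href'))
    return links

html = ("<a href='https://www.vesselfinder.com/vessels/123'>hit</a>"
        "<a href='https://other.example'>miss</a>")
print(extract_links(html))  # ['https://www.vesselfinder.com/vessels/123']
print(extract_links(html))  # the second call succeeds too, since the class was never rebound
```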
Arthur Morris
OlK
    Welcome to SO - Please improve your question / code; it causes another error first. – HedgeHog Dec 10 '21 at 07:43
  • It causes _several_ errors first. This question may help: https://stackoverflow.com/questions/2870667/how-to-convert-an-html-table-to-an-array-in-python – Arthur Morris Dec 10 '21 at 09:37

0 Answers