
I have a list with lots of links and I want to scrape them with BeautifulSoup in Python 3

links is my list and it contains hundreds of URLs. I have tried this code to scrape them all, but it's not working for some reason:

links= ['http://www.nuforc.org/webreports/ndxe201904.html',
'http://www.nuforc.org/webreports/ndxe201903.html',
'http://www.nuforc.org/webreports/ndxe201902.html',
'http://www.nuforc.org/webreports/ndxe201901.html',
'http://www.nuforc.org/webreports/ndxe201812.html',
'http://www.nuforc.org/webreports/ndxe201811.html',...]

raw = urlopen(i in links).read()
ufos_doc = BeautifulSoup(raw, "html.parser")

2 Answers


raw should be a list containing the data of each web page. For each entry in raw, parse it and create a soup object. You can store each soup object in a list (I called it soups):

from urllib.request import urlopen
from bs4 import BeautifulSoup

links= ['http://www.nuforc.org/webreports/ndxe201904.html',
'http://www.nuforc.org/webreports/ndxe201903.html',
'http://www.nuforc.org/webreports/ndxe201902.html',
'http://www.nuforc.org/webreports/ndxe201901.html',
'http://www.nuforc.org/webreports/ndxe201812.html',
'http://www.nuforc.org/webreports/ndxe201811.html']

# fetch the raw HTML of every page, then parse each one into its own soup object
raw = [urlopen(i).read() for i in links]
soups = []
for page in raw:
    soups.append(BeautifulSoup(page, 'html.parser'))

You can then access, e.g., the soup object for the first link with soups[0].
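
For example, a minimal sketch of pulling text out of the first page might look like this (the table structure is an assumption about the NUFORC markup, not something verified here):

# assumes the page contains at least one <table> of reports; adjust to the real markup
table = soups[0].find('table')
if table is not None:
    for row in table.find_all('tr'):
        cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
        print(cells)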

Also, for fetching the response of each URL, consider using the requests module instead of urllib.

glhr

You need a loop over the list links. If you have a lot of these to do, as mentioned in the other answer, consider requests. With requests you can create a Session object, which will allow you to re-use the connection and thereby scrape more efficiently:

import requests
from bs4 import BeautifulSoup as bs

links= ['http://www.nuforc.org/webreports/ndxe201904.html',
'http://www.nuforc.org/webreports/ndxe201903.html',
'http://www.nuforc.org/webreports/ndxe201902.html',
'http://www.nuforc.org/webreports/ndxe201901.html',
'http://www.nuforc.org/webreports/ndxe201812.html',
'http://www.nuforc.org/webreports/ndxe201811.html']

# Session() must be instantiated; the with block closes the connection pool when done
with requests.Session() as s:
    for link in links:
        r = s.get(link)
        soup = bs(r.content, 'lxml')  # 'lxml' requires the lxml package; 'html.parser' also works
        #do something
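
As an illustration only, the #do something part could collect the text of every table row on each page into one list. This is a sketch that assumes each index page contains a <table> of report rows, which has not been checked against the actual markup:

all_rows = []
with requests.Session() as s:
    for link in links:
        r = s.get(link)
        soup = bs(r.content, 'lxml')
        # grab the text of every cell in every table row on the page (assumed markup)
        for row in soup.select('table tr'):
            all_rows.append([cell.get_text(strip=True) for cell in row.find_all('td')])
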
QHarr