I have this code that gets all child URLs within a page.

How do I run multiple URLs through this code?

from bs4 import BeautifulSoup
import requests

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}
source = requests.get("https://www.oddsportal.com/soccer/england/efl-cup/results/", headers=headers)

soup = BeautifulSoup(source.text, 'html.parser')
main_div = soup.find("div", class_="main-menu2 main-menu-gray")
a_tag = main_div.find_all("a")
for i in a_tag:
    print(i['href'])

How do I modify it to run for multiple URLs, when my URL list is stored in a DataFrame like this:

df:

|    | URL                                                                 |
|----|---------------------------------------------------------------------|
|  0 | https://www.oddsportal.com/soccer/nigeria/npfl-pre-season/results/  |
|  1 | https://www.oddsportal.com/soccer/england/efl-cup/results/          |
|  2 | https://www.oddsportal.com/soccer/europe/guadiana-cup/results/      |
|  3 | https://www.oddsportal.com/soccer/world/kings-cup-thailand/results/ |
|  4 | https://www.oddsportal.com/soccer/poland/division-2-east/results/   |

I tried parsing it this way:

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}
for url in df:
    source = requests.get(df['URL'], headers=headers)

    soup = BeautifulSoup(source.text, 'html.parser')
    main_div = soup.find("div", class_="main-menu2 main-menu-gray")
    a_tag = main_div.find_all("a")
    for i in a_tag:
        print(i['href'])

However, I am getting this error:

line 742, in get_adapter
    raise InvalidSchema("No connection adapters were found for {!r}".format(url))

How can I modify the same to parse multiple URLs?

1 Answer

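Iterating over `df` yields its column labels, not its rows, and `requests.get(df['URL'], ...)` passes the whole column instead of a single URL string, which is why `requests` raises `InvalidSchema`. Change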

for url in df:
    source = requests.get(df['URL'], headers=headers)

to

for url in df['URL']:
    source = requests.get(url, headers=headers)
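Put together, the corrected loop might look like the sketch below. The helper names `extract_menu_links` and `collect_links`, the `fetch` parameter, and the 10-second timeout are illustrative assumptions, not part of the original code; the `None` check is added because `soup.find` returns `None` when a page has no matching `div`.

```python
from bs4 import BeautifulSoup
import requests

HEADERS = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}


def extract_menu_links(html):
    """Return the href of every link inside the menu div, or [] if the div is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    main_div = soup.find("div", class_="main-menu2 main-menu-gray")
    if main_div is None:
        # The page has no such menu; avoid calling find_all on None.
        return []
    # href=True skips <a> tags that have no href attribute.
    return [a['href'] for a in main_div.find_all("a", href=True)]


def collect_links(urls, fetch=None):
    """Map each URL to the list of menu links found on its page.

    `fetch` is a hypothetical hook: any callable that takes a URL and
    returns HTML text. By default it downloads the page with requests.
    """
    if fetch is None:
        def fetch(url):
            response = requests.get(url, headers=HEADERS, timeout=10)
            response.raise_for_status()
            return response.text
    # Iterate over the individual URL strings, one request per URL.
    return {url: extract_menu_links(fetch(url)) for url in urls}


# Usage with the DataFrame from the question (performs real HTTP requests):
# for url, hrefs in collect_links(df['URL']).items():
#     for href in hrefs:
#         print(href)
```

Passing `df['URL']` works because iterating over a single column yields its values one at a time, so each call to `requests.get` receives a plain string.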
  • This works. I am just not as good at developer logic as I had thought, haha. Apologies for the noob question. How do I save the output of `print(i['href'])` to a CSV? –  Jul 05 '21 at 09:36
  • @PyNoob_N You're welcome. Since you are new to the Stack Overflow community: answers are limited to the question being asked, so moving on to another question within the same post after receiving an answer is not allowed. You have to create a new post for that. Also, kindly check How to Ask and provide a minimal reproducible example. – αԋɱҽԃ αмєяιcαη Jul 05 '21 at 09:40
  • You are correct! However, before asking, I will try it on my own, and if I don't get an answer, I will ask a new question. Thank you again for the guidance. –  Jul 05 '21 at 09:42