How do I modify code to parse multiple URLs?

Question

I have this code that gets all child URLs within a page.

How do I parse multiple URLs through this code?

from bs4 import BeautifulSoup
import requests

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}
source = requests.get("https://www.oddsportal.com/soccer/england/efl-cup/results/", headers=headers)

soup = BeautifulSoup(source.text, 'html.parser')
main_div = soup.find("div", class_="main-menu2 main-menu-gray")
a_tag = main_div.find_all("a")
for i in a_tag:
    print(i['href'])

How do I modify it to run for multiple URLs when my URL list is stored in a DataFrame `df` like this?

|    | URL                                                                 |
|----|---------------------------------------------------------------------|
|  0 | https://www.oddsportal.com/soccer/nigeria/npfl-pre-season/results/  |
|  1 | https://www.oddsportal.com/soccer/england/efl-cup/results/          |
|  2 | https://www.oddsportal.com/soccer/europe/guadiana-cup/results/      |
|  3 | https://www.oddsportal.com/soccer/world/kings-cup-thailand/results/ |
|  4 | https://www.oddsportal.com/soccer/poland/division-2-east/results/   |

I tried parsing it this way:

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}
for url in df:
    source = requests.get(df['URL'], headers=headers)

    soup = BeautifulSoup(source.text, 'html.parser')
    main_div = soup.find("div", class_="main-menu2 main-menu-gray")
    a_tag = main_div.find_all("a")
    for i in a_tag:
        print(i['href'])

However, I am getting this error:

line 742, in get_adapter
    raise InvalidSchema("No connection adapters were found for {!r}".format(url))

How can I modify this code to parse multiple URLs?

You can create a function that takes a URL, then loop through the URLs in the list and call that function for each one. — MEdwin, Jul 05 '21 at 09:20
@NizamMohamed Sorry, I have corrected the body to define `df` — , Jul 05 '21 at 09:22
@PyNoob_N You are iterating over the `df` DataFrame itself; you have to iterate over the column values, which are `df['URL']`. — αԋɱҽԃ αмєяιcαη, Jul 05 '21 at 09:22
@MEdwin I am trying this [question and its solution from here](https://stackoverflow.com/questions/40629457/scrape-multiple-urls-using-beautiful-soup). However, I am unable to progress. — , Jul 05 '21 at 09:23
@αԋɱҽԃαмєяιcαη Yes, isn't that what I have done here? How should I progress? Thank you! — , Jul 05 '21 at 09:24
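
The suggestion in the comments above, wrapping the per-page logic in a function and calling it once per URL, could be sketched roughly as follows. The function names `extract_menu_links`, `fetch_menu_links`, and `print_links_for`, the `timeout` value, and the handling of a missing menu div are assumptions added for illustration; they are not part of the original code.

```python
from bs4 import BeautifulSoup
import requests

HEADERS = {
    "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"
}


def extract_menu_links(html):
    """Return the href of every <a> inside the menu div, or [] if the div is absent."""
    soup = BeautifulSoup(html, "html.parser")
    main_div = soup.find("div", class_="main-menu2 main-menu-gray")
    if main_div is None:  # assumption: treat a page without the menu as "no links"
        return []
    # href=True skips anchors that have no href attribute.
    return [a["href"] for a in main_div.find_all("a", href=True)]


def fetch_menu_links(url, timeout=10):
    """Download one page and return its menu links (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_menu_links(response.text)


def print_links_for(urls):
    """Print the menu links of each page; `urls` is any iterable of URL strings."""
    for url in urls:
        for href in fetch_menu_links(url):
            print(href)
```

With the DataFrame from the question, this would be called as `print_links_for(df['URL'])`, so that each iteration receives a single URL string. Keeping the parsing step separate from the download step also makes it possible to try the parser on a saved HTML string without hitting the site.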

score 0 · Accepted Answer · answered Jul 05 '21 at 09:29

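Iterating over a DataFrame yields its column labels, not its rows, so `for url in df` runs once with `url` set to `'URL'`. Inside the loop, `requests.get(df['URL'], ...)` is then handed the whole column rather than a single URL string, which requests cannot treat as a URL; hence the `InvalidSchema` error. Change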

for url in df:
    source = requests.get(df['URL'], headers=headers)

to

for url in df['URL']:
    source = requests.get(url, headers=headers)
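
To see what each loop variant actually iterates over, here is a minimal sketch that makes no network requests. It assumes `df` is a pandas DataFrame with a single `URL` column, as in the question; the names `column_labels` and `urls` exist only for this illustration.

```python
import pandas as pd

# Two of the URLs from the question, in a DataFrame shaped like the asker's `df`.
df = pd.DataFrame({
    "URL": [
        "https://www.oddsportal.com/soccer/nigeria/npfl-pre-season/results/",
        "https://www.oddsportal.com/soccer/england/efl-cup/results/",
    ]
})

# Iterating over the DataFrame itself yields column labels, not rows.
column_labels = [item for item in df]

# Iterating over the column yields one URL string per row.
urls = [url for url in df["URL"]]

print(column_labels)  # ['URL']
for url in urls:
    print(url)        # one plain string per row, suitable for requests.get(url, ...)
```

If the column were first converted with `df['URL'].tolist()`, the loop would behave the same way; iterating over the Series directly is simply shorter.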

αԋɱҽԃ αмєяιcαη

This works. I am just not as good at developer logic as I had thought, haha. Apologies for this noob question. Now, how do I save the output of `print(i['href'])` to a CSV? – Jul 05 '21 at 09:36

@PyNoob_N You're welcome. Since you are new to the Stack Overflow community: answers are limited to the question being asked, so moving on to another question within the same post after receiving an answer is not allowed. You'll have to create another post for that. Also, kindly check [ask] and provide an [mcve]. – αԋɱҽԃ αмєяιcαη Jul 05 '21 at 09:40

You are correct! However, before asking, I will try it on my own, and if I don't get an answer, I will ask a new question. Thank you again for the guidance. – Jul 05 '21 at 09:42
