3

How can I gather the links behind "View More Campaigns" using Python 3? I wish to gather all 260604 links from this page: https://www.gofundme.com/mvc.php?route=category&term=sport

Martin Evans
una

2 Answers

2

When clicking on the View More Campaigns button, the browser requests the following URL:

https://www.gofundme.com/mvc.php?route=category/loadMoreTiles&page=2&term=sport&country=GB&initialTerm=
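To make the structure of that URL explicit, here is a minimal sketch that rebuilds it from its query parameters with the standard library. The helper name `build_page_url` and its defaults are my own (hypothetical); the parameter names and values are simply the ones visible in the URL above, and only `page` is assumed to change between requests.

```python
from urllib.parse import urlencode

BASE = "https://www.gofundme.com/mvc.php"

def build_page_url(page, term="sport", country="GB"):
    """Rebuild the observed "load more" URL for a given page number.

    Hypothetical helper: the parameters mirror the URL seen in the
    browser, they are not taken from any documented API.
    """
    params = {
        "route": "category/loadMoreTiles",
        "page": page,
        "term": term,
        "country": country,
        "initialTerm": "",
    }
    # safe="/" leaves the slash in the route value unescaped,
    # so the result matches the URL the browser requested.
    return BASE + "?" + urlencode(params, safe="/")

print(build_page_url(2))
```

Calling `build_page_url(3)`, `build_page_url(4)` and so on yields the URLs for the following pages.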

This could be used to request further pages as follows:

from bs4 import BeautifulSoup
import requests

page = 1
links = set()   # a set silently discards duplicate links
length = 0      # number of unique links seen after the previous page

while True:
    print("Page {}".format(page))
    gofundme = requests.get('https://www.gofundme.com/mvc.php?route=category/loadMoreTiles&page={}&term=sport&country=GB&initialTerm='.format(page))
    soup = BeautifulSoup(gofundme.content, "html.parser")
    links.update([a['href'] for a in soup.find_all('a', href=True)])

    # Stop when no new links are found
    if len(links) == length:
        break

    length = len(links)
    page += 1

for link in sorted(links):
    print(link)

This produces output starting with:

https://www.gofundme.com/100-round-kumite-rundraiser
https://www.gofundme.com/10k-challenge-for-disabled-sports
https://www.gofundme.com/1yeti0
https://www.gofundme.com/2-marathons-1-month
https://www.gofundme.com/23yq67t4
https://www.gofundme.com/2fwyuwvg

Some of the returned links are duplicates, so a set is used to keep only the unique ones. The script keeps requesting pages until a page adds no new links, which appears to happen at around 18 pages.
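The stopping rule can be tried in isolation, without touching the network. The sketch below is an illustration only: `collect_links`, `fake_fetch` and `FAKE_PAGES` are names I made up, and the fake data stands in for the links a real page request would return.

```python
def collect_links(fetch_links, max_pages=100):
    """Collect unique links page by page until a page adds nothing new.

    fetch_links is any callable taking a page number and returning an
    iterable of links (hypothetical interface, for illustration only).
    max_pages is a safety cap that the script above does not have.
    """
    links = set()
    for page in range(1, max_pages + 1):
        before = len(links)
        links.update(fetch_links(page))
        # Stop when this page contributed no new links
        if len(links) == before:
            break
    return links

# Made-up stand-in for the real responses
FAKE_PAGES = {
    1: ["/a", "/b"],
    2: ["/b", "/c"],  # "/b" is a duplicate, "/c" is new
    3: ["/c"],        # nothing new, so the loop stops here
    4: ["/d"],        # never requested
}

requested = []

def fake_fetch(page):
    requested.append(page)
    return FAKE_PAGES.get(page, [])

result = collect_links(fake_fetch)
print(sorted(result))
print(requested)
```

Note that this rule stops at the first page that contributes nothing new, so any later page that did contain new links would be missed.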

Martin Evans
1

Adapted from the question "retrieve links from web page using python and BeautifulSoup":

import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('https://www.gofundme.com/mvc.php?route=category&term=sport')

# Parse only the <a> tags; parse_only is the bs4 name for the older parseOnlyThese argument
soup = BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a'))
for link in soup.find_all('a', href=True):
    print(link['href'])
whackamadoodle3000
  • This won't gather all the fundraising campaign links the OP wants, only the campaigns that are initially on the page. – hoefling Nov 22 '17 at 20:49