
How do I iterate through each sub-link (fighter) to get the data I need, go back to the page that lists all the fighters' names, move on to the next fighter (link), collect all the data on that fighter, and keep doing that until I reach the end of the list on that page?

import requests
from bs4 import BeautifulSoup

records = []
r = requests.get('http://www.espn.com/mma/fighters')
soup = BeautifulSoup(r.text, 'html.parser')
# each fighter is a table row with class 'oddrow' or 'evenrow'
data = soup.find_all('tr', attrs={'class': ['oddrow', 'evenrow']})
for d in data:
    try:
        name = d.find('a').text
    except AttributeError:
        name = ""
    try:
        country = d.find('td').findNext('td').text
    except AttributeError:
        country = ""

    records.append([name, country])

The above code handles the page where all the fighters' names are located. I am able to iterate over each row to collect the fighter's name and country.

import re

# pull the profile links out of each fighter row (data is a ResultSet,
# so find_all has to be called on the individual rows, not on data itself);
# the trailing [1] picks out the second fighter's link
links = [f"http://www.espn.com{i['href']}" for d in data for i in d.find_all('a') if re.findall('^/mma/', i['href'])][1]
r1 = requests.get(links)
data1 = BeautifulSoup(r1.text, 'html.parser')
bio = data1.find('div', attrs={'class': 'mod-content'})

weightClass = data1.find('li', attrs={'class': 'first'}).text
trainingCenter = data1.find('li', attrs={'class': 'last'}).text
wins = data1.find('table', attrs={'class': 'header-stats'})('td')[0].text
loses = data1.find('table', attrs={'class': 'header-stats'})('td')[1].text
draws = data1.find('table', attrs={'class': 'header-stats'})('td')[2].text
tkos = data1.find_all('table', attrs={'class': 'header-stats'})[1]('td')[0].text
subs = data1.find_all('table', attrs={'class': 'header-stats'})[1]('td')[1].text

The above code currently enters the second fighter's page and collects all the data for that specific fighter (link).

records = []
r = requests.get('http://www.espn.com/mma/fighters')
soup = BeautifulSoup(r.text, 'html.parser')
data = soup.find_all('tr', attrs={'class': ['oddrow', 'evenrow']})
# this line raises the error quoted below: data is a ResultSet, not a single tag
links = [f"http://www.espn.com{i['href']}" for i in data.find_all('a') if re.findall('^/mma/', i['href'])]

for d in data:
    try:
        name = d.find('a').text
    except AttributeError:
        name = ""
    try:
        country = d.find('td').findNext('td').text
    except AttributeError:
        country = ""

    for l in links:
        r1 = requests.get(l)
        data1 = BeautifulSoup(r1.text, 'html.parser')
        bio = data1.find('div', attrs={'class': 'mod-content'})
        for b in bio:
            try:
                weightClass = data1.find('li', attrs={'class': 'first'}).text
            except AttributeError:
                weightClass = ""
            try:
                trainingCenter = data1.find('li', attrs={'class': 'last'}).text
            except AttributeError:
                trainingCenter = ""
            try:
                wins = data1.find('table', attrs={'class': 'header-stats'})('td')[0].text
            except AttributeError:
                wins = ""
            try:
                loses = data1.find('table', attrs={'class': 'header-stats'})('td')[1].text
            except AttributeError:
                loses = ""
            try:
                draws = data1.find('table', attrs={'class': 'header-stats'})('td')[2].text
            except AttributeError:
                draws = ""
            try:
                tkos = data1.find_all('table', attrs={'class': 'header-stats'})[1]('td')[0].text
            except AttributeError:
                tkos = ""
            try:
                subs = data1.find_all('table', attrs={'class': 'header-stats'})[1]('td')[1].text
            except AttributeError:
                subs = ""

    records.append([name, country, weightClass])

The above code is what I am trying, but I am getting this error message: "ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?"
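
For context, find_all() returns a ResultSet, which is essentially a list of Tag objects; the ResultSet itself has no find_all() method, so it has to be called on soup or on each individual tag. A minimal sketch of the distinction:

from bs4 import BeautifulSoup

html = "<table><tr class='oddrow'><td><a href='/mma/x'>A</a></td></tr></table>"
soup = BeautifulSoup(html, 'html.parser')

rows = soup.find_all('tr')  # ResultSet: a list of Tag objects
# rows.find_all('a')        # AttributeError: 'ResultSet' object has no attribute 'find_all'
links = [a for row in rows for a in row.find_all('a')]  # call find_all on each Tag instead
print([a['href'] for a in links])  # ['/mma/x']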

How do I add this to my initial code so I can collect each fighter's name and country on the original page, then follow that fighter's link, gather the data shown above, and repeat for every fighter on the page?

Ezzy
  • Does this answer your question? [Beautiful Soup: 'ResultSet' object has no attribute 'find_all'?](https://stackoverflow.com/questions/24108507/beautiful-soup-resultset-object-has-no-attribute-find-all) – AMC Mar 22 '20 at 22:39

2 Answers


Check out this solution. I don't have much time at the moment, but I will check back as soon as I'm free. You can do the main operation using the following code; the only thing you need to do is get the data from the target page. The script below fetches all the links from each listing page, going through the alphabetical pagination (a to z), and then collects the fighter names from each target page.

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "http://www.espn.com/mma/fighters?search={}"

for linknum in [chr(i) for i in range(ord('a'),ord('z')+1)]:
    r = requests.get(url.format(linknum))
    soup = BeautifulSoup(r.text,'html.parser')
    for links in soup.select(".tablehead a[href*='id']"):
        res = requests.get(urljoin(url,links.get("href")))
        sauce = BeautifulSoup(res.text,"lxml")
        title = sauce.select_one(".player-bio h1").text
        print(title)
SIM
import requests, re
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import pandas as pd

url = "http://www.espn.com/mma/fighters?search={}"

titleList = []
countryList = []
stanceList = []
reachList = []
ageList = []
weightClassList = []
trainingCenterList = []
winsList = []
losesList =[]
drawsList = []
tkosList = []
subsList = []

# step through the alphabetical search pages, a through z
for linknum in [chr(i) for i in range(ord('a'),ord('z')+1)]:
    r = requests.get(url.format(linknum))
    soup = BeautifulSoup(r.text,'html.parser')
    # a[href*='id'] selects every anchor whose href attribute contains 'id'
    for links in soup.select(".tablehead a[href*='id']"):
        # urljoin combines the base url with the anchor's relative href to form the full profile url
        res = requests.get(urljoin(url,links.get("href")))
        sauce = BeautifulSoup(res.text,"lxml")
        try:
            title = sauce.select_one(".player-bio h1").text
        except AttributeError: title = ""
        try:
            country = sauce.find('span',text='Country').next_sibling
        except AttributeError: country = ""
        try:
            stance = sauce.find('span',text='Stance').next_sibling
        except AttributeError: stance = ""
        try:
            reach = sauce.find('span',text='Reach').next_sibling
        except AttributeError: reach = ""
        try:
            age = sauce.find('span',text='Birth Date').next_sibling[-3:-1]
        except AttributeError: age = ""
        try:
            weightClass = sauce.find('li',attrs={'class':'first'}).text
        except AttributeError: weightClass = ""
        try:
            trainingCenter = sauce.find('li',attrs={'class':'last'}).text
        except AttributeError: trainingCenter = ""
        try:
            wins = sauce.find('table',attrs={'class':'header-stats'})('td')[0].text
        except AttributeError: wins = ""
        try:
            loses = sauce.find('table',attrs={'class':'header-stats'})('td')[1].text
        except AttributeError: loses = ""
        try:
            draws = sauce.find('table',attrs={'class':'header-stats'})('td')[2].text
        except AttributeError: draws = ""
        try:
            tkos = sauce.find_all('table',attrs={'class':'header-stats'})[1]('td')[0].text
        except AttributeError: tkos = ""
        try:
            subs = sauce.find_all('table',attrs={'class':'header-stats'})[1]('td')[1].text
        except AttributeError: subs = ""

        titleList.append(title)
        countryList.append(country)
        stanceList.append(stance)
        reachList.append(reach)
        ageList.append(age)
        weightClassList.append(weightClass)
        trainingCenterList.append(trainingCenter)
        winsList.append(wins)
        losesList.append(loses)
        drawsList.append(draws)
        tkosList.append(tkos)
        subsList.append(subs)

df = pd.DataFrame()
df['title'] = titleList
df['country'] = countryList
df['stance'] = stanceList
df['reach'] = reachList
df['age'] = ageList
df['weightClass'] = weightClassList
df['trainingCenter']= trainingCenterList
df['wins'] = winsList
df['loses'] = losesList
df['draws'] = drawsList
df['tkos'] = tkosList
df['subs'] = subsList

df.to_csv('MMA Fighters.csv', encoding='utf-8')
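
As a design note (a sketch, not part of the original answer): the twelve parallel lists can be replaced by appending one dict per fighter and building the DataFrame in a single call, which keeps the columns aligned even if one append is accidentally skipped. The field values below are placeholders:

import pandas as pd

rows = []
# inside the scraping loop, append one dict per fighter instead of
# twelve separate list.append calls (placeholder values shown):
rows.append({'title': '', 'country': '', 'stance': '', 'reach': '', 'age': '',
             'weightClass': '', 'trainingCenter': '', 'wins': '', 'loses': '',
             'draws': '', 'tkos': '', 'subs': ''})

# after the loop, one constructor call replaces the twelve column assignments
df = pd.DataFrame(rows)
df.to_csv('MMA Fighters.csv', encoding='utf-8', index=False)  # index=False skips the row-number column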
Ezzy
  • @SIM does this look about right? I am running it right now and it seems to be working, but I am not 100% sure whether it will get stuck in a loop. Also, the only part of what you sent that I don't understand is: for linknum in [chr(i) for i in range(ord('a'),ord('z')+1)]: – Ezzy Jun 25 '18 at 17:07
  • Check out [this link](https://stackoverflow.com/questions/16060899/alphabet-range-python). It's for controlling alphabetical pagination. – SIM Jun 25 '18 at 18:05
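
For reference, that list comprehension just builds the letters a through z from their character code points. A minimal demonstration:

letters = [chr(i) for i in range(ord('a'), ord('z') + 1)]  # ord() gives the code point, chr() converts back
print(letters)  # ['a', 'b', 'c', ..., 'z']

import string
print(list(string.ascii_lowercase))  # equivalent one-liner from the standard library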