
How can I get the text "Lionel Messi" from this HTML code?

<a href="/20/player/44079/lionel-messi" class="player_name_players_table">Lionel Messi</a>

This is my code so far:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

page = requests.get('https://www.futbin.com/players')
soup = BeautifulSoup(page.content, 'lxml')

pool = soup.find(id='repTb')


player_names = pool.find_all(class_='player_name_players_table')


print(player_names[0])

When I print player_names[0] I get this result:

/Users/ejps/PycharmProjects/scraper_players/venv/bin/python /Users/ejps/PycharmProjects/scraper_players/scraper.py
<a class="player_name_players_table" href="/20/player/44079/lionel-messi">Lionel Messi</a>

Process finished with exit code 0

But what code would I have to put in to get only the text of it?

I want to scrape all player names from that page. But first I think I need a way to extract that text.

Unfortunately, I can't find a way to make it work.

I am new to Python and am trying some projects to learn it.

EDIT:

With the help from comments I was able to get the text I need.

I only have one more question here.

Is it possible to find class_ by partial text only?

Like this:

prating = pool.find_all(class_='form rating ut20')

The full class would be

class="form rating ut20 toty gold rare"

but it changes. The part that is always the same is "form rating ut20", so I thought maybe there is some kind of placeholder that lets me search for all class names including "form rating ut20".

Could you maybe help me with this as well?

exec85
  • Does this answer your question? [BeautifulSoup getText from between <p>, not picking up subsequent paragraphs](https://stackoverflow.com/questions/12451997/beautifulsoup-gettext-from-between-p-not-picking-up-subsequent-paragraphs), https://stackoverflow.com/questions/38133759/how-to-get-text-from-span-tag-in-beautifulsoup – Rakesh Jan 16 '20 at 10:40
  • You should split this into two different questions. – Jack Fleeting Jan 16 '20 at 13:45

2 Answers


To match on part of a class attribute, you can use either a regular expression or, if you have bs4 version 4.7.1 or above installed, a CSS selector.

The regular expression returns a list of elements.

import re
prating = pool.find_all(class_=re.compile("form rating ut20"))

Or use a CSS selector, which also returns a list of elements. The first selector means "contains" and the second means "starts with".

prating = pool.select('[class*="form rating ut20"]')

OR

prating = pool.select('[class^="form rating ut20"]')
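The snippets above can be tried without hitting the site. The following is a minimal sketch that assumes some made-up markup modelled on the class strings quoted in the question (the `span` tags and the rating numbers are hypothetical) and uses the built-in `html.parser` so that lxml is not required. It also shows one further option not mentioned above, the chained class selector `.form.rating.ut20`, which does not depend on the order in which the classes are written in the HTML. The regex and attribute-selector variants match against the class string as it appears in the page, so they only work when the three classes appear in that order.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; only the class strings are taken from the question.
html = """
<table id="repTb">
  <tr><td><span class="form rating ut20 toty gold rare">99</span></td></tr>
  <tr><td><span class="form rating ut20 gold rare">94</span></td></tr>
  <tr><td><span class="rare gold ut20 rating form">90</span></td></tr>
  <tr><td><span class="form rating ut19 gold rare">88</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
pool = soup.find(id="repTb")

# Regex: no single class matches the pattern, so bs4 falls back to
# searching the whole class string as written in the HTML.
by_regex = pool.find_all(class_=re.compile("form rating ut20"))

# CSS attribute selectors: substring (*=) and prefix (^=) of the class string.
by_contains = pool.select('[class*="form rating ut20"]')
by_prefix = pool.select('[class^="form rating ut20"]')

# CSS class selectors: all three classes must be present, in any order.
by_classes = pool.select(".form.rating.ut20")

print([tag.get_text() for tag in by_regex])
print([tag.get_text() for tag in by_contains])
print([tag.get_text() for tag in by_prefix])
print([tag.get_text() for tag in by_classes])
```

In this sketch the third row has the same classes in a different order, so only the chained class selector picks it up.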
KunduK

Get the text using the getText() method (an alias of get_text()), called on a single element.

 player_names[0].getText()
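As a minimal sketch of the same idea applied to every match, assuming inline markup instead of the live page: the first link is the one from the question, while the second link and its href are made up for illustration. `html.parser` is used here only to avoid depending on lxml. `find_all()` returns a list-like ResultSet, so the text has to be taken from each element in turn.

```python
from bs4 import BeautifulSoup

# First row is the tag from the question; the second row is hypothetical.
html = """
<table id="repTb">
  <tr><td><a href="/20/player/44079/lionel-messi" class="player_name_players_table">Lionel Messi</a></td></tr>
  <tr><td><a href="/20/player/0/example-player" class="player_name_players_table"> Example Player </a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
pool = soup.find(id="repTb")
player_names = pool.find_all(class_="player_name_players_table")

# Text of one element: index into the ResultSet first.
first_name = player_names[0].get_text()

# Text of every element; strip=True trims surrounding whitespace.
all_names = [tag.get_text(strip=True) for tag in player_names]

print(first_name)
print(all_names)
```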
Asad
  • giving me this error: AttributeError: ResultSet object has no attribute 'getText'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()? Process finished with exit code 1 – exec85 Jan 16 '20 at 10:51
  • You are trying to call the getText() on a list. Make sure you call getText() on a single item from the list. – Asad Jan 16 '20 at 10:57
  • import requests
    import urllib.request
    import time
    from bs4 import BeautifulSoup

    page = requests.get('https://www.futbin.com/players')
    soup = BeautifulSoup(page.content, 'lxml')
    pool = soup.find(id='repTb')
    player_names = pool.find_all(class_='player_name_players_table')
    print(player_names[0].getText())
    – Asad Jan 16 '20 at 10:58
  • If you want to get the list of all player names: [name.getText() for name in player_names] – Asad Jan 16 '20 at 11:17
  • Also, if you found this answer useful please upvote it. Thanks – Asad Jan 16 '20 at 11:18
  • Thanks, that helped! Do you also have an idea for my EDIT, maybe? – exec85 Jan 16 '20 at 12:45
  • Yes, you can use regex – Asad Jan 16 '20 at 12:59