
I am making a simple scraping program.

First, the user types the name of a footballer, and I build a transfermarkt.com search link from it. Then I would like to open the first result and scrape data from the footballer's profile. Unfortunately, I have a problem with Selenium. How do I open a website programmatically and scrape data from it?

Here is my code:

from urllib.request import urlopen
import bs4
from bs4 import BeautifulSoup
from selenium import webdriver

data = input('Enter name: ')
data = data.replace(" ", "+")
print(data)
link = 'https://www.transfermarkt.pl/schnellsuche/ergebnis/schnellsuche?query='
search = link + data + '&x=0&y=0'
print(search)
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
driver.find_element_by_css_selector('.spielprofil_tooltip tooltipstered').click()
name_box = soup.find('h1', attrs={'class': 'dataValue'})
print(name_box)

It works only up to the line print(search), but then I'm lost. The browser opens, but there is only data:, in the address bar.

noname
  • One suggestion: why don't you retrieve the content of a `url` with `requests.get().text` and then pass that to `BeautifulSoup`? This will eliminate the need for Selenium. I can whip up a quick solution for you if you like :) – Cryptoharf84 Feb 04 '20 at 19:35
  • You don't need to use BeautifulSoup with Selenium; also, you don't need to use Selenium here :) Use Selenium when you can't use requests or when you just want to write your code fast. – Ekrem Dinçel Feb 04 '20 at 19:37
  • You appear to be mixing/confusing multiple different approaches/libraries. I don't know how much we can do for now. – AMC Feb 04 '20 at 19:51
  • @Cryptoharf84 Hello, thank you for your response. I heard Beautiful Soup is better for scraping, but also that I can't do the rest of the job with it; that's why I decided to use Selenium. If it is possible to do it without Selenium, I would appreciate your help :) – noname Feb 04 '20 at 19:55
  • @noname Saying that BeautifulSoup is _better_ than Selenium, or vice-versa, would make little sense, since they serve very different purposes, they're not competitors. – AMC Feb 04 '20 at 21:32
  • OK, thank you, but can I open a website via Beautiful Soup knowing only the name of the class that contains my href? – noname Feb 09 '20 at 19:58
  • FYI it’s __scrape__ not scrap – DisappointedByUnaccountableMod May 03 '21 at 20:12

2 Answers


For a headless browser, this is all you need:

from selenium import webdriver
# Options lets you pass command-line flags such as --headless to Chrome
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu') 
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", options=options)

But as I said, you don't need to use Selenium here. Use Selenium when you can't use requests or when you just want to write your code fast.
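A minimal sketch of the requests + BeautifulSoup route mentioned here, not a definitive implementation. The search URL and the `spielprofil_tooltip` class come from the question; the helper names (`build_search_url`, `first_profile_url`, `fetch`, `main`), the `User-Agent` header, and the assumption that the first matching anchor is the wanted player are illustrative. The site's markup may also differ from what the question describes.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://www.transfermarkt.pl"
SEARCH_PATH = "/schnellsuche/ergebnis/schnellsuche"


def build_search_url(name):
    """Return the quick-search URL for a player name, properly URL-encoded."""
    return BASE_URL + SEARCH_PATH + "?" + urlencode({"query": name, "x": 0, "y": 0})


def first_profile_url(search_html):
    """Return the absolute URL of the first profile link, or None if absent."""
    # Third-party (pip install beautifulsoup4); imported here so the URL
    # helper above works without it.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(search_html, "html.parser")
    # Assumption: result links carry the class "spielprofil_tooltip".
    link = soup.select_one("a.spielprofil_tooltip[href]")
    return urljoin(BASE_URL, link["href"]) if link else None


def fetch(url):
    """Download a page and return its HTML text."""
    import requests  # third-party: pip install requests

    # Assumption: a browser-like User-Agent is sent in case the site
    # rejects the default one.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return response.text


def main():
    """Ask for a name, find the first profile link, and print the page's <h1>."""
    from bs4 import BeautifulSoup

    name = input("Enter name: ")
    profile_url = first_profile_url(fetch(build_search_url(name)))
    if profile_url is None:
        print("No profile link found")
        return
    heading = BeautifulSoup(fetch(profile_url), "html.parser").find("h1")
    print(profile_url)
    print(heading.get_text(strip=True) if heading else "No <h1> on the profile page")
```

Calling `main()` runs the whole flow. `urlencode` also percent-encodes non-ASCII characters, which a plain `replace(" ", "+")` does not.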

Browser is open, but there is only data:, in the address bar.

Because you never loaded the URL in the browser:

driver.get(search)
Ekrem Dinçel
  • Thank you for your explanation. Do you have an idea how to change the line 'driver.find_element_by_css_selector('.spielprofil_tooltip tooltipstered').click()' if I don't use Selenium? – noname Feb 04 '20 at 20:23
  • Well, I said "or when you just want to write your code fast." because it is a bit more work with requests. You need to find that anchor (link) and get the ``href`` attribute of it (via BeautifulSoup). This attribute is the address you go to when you click the anchor. – Ekrem Dinçel Feb 04 '20 at 20:28
  • Try to achieve this, if you cant, I can help you. – Ekrem Dinçel Feb 04 '20 at 20:30
  • I don't know if you mean this: for example, it will be "href="/paulo-dybala/profil/spieler/206050"". But at the beginning I don't know which player the user will search for, so this href is unknown and I can't use it. – noname Feb 04 '20 at 20:38
  • You can find it via BeautifulSoup by using its class selector ('.spielprofil_tooltip tooltipstered') – Ekrem Dinçel Feb 04 '20 at 20:39
  • OK, so I assume we now have something like this: 'soup.find('h1', attrs={'class': 'spielprofil_tooltip tooltipstered'})'? – noname Feb 04 '20 at 20:54
  • Yes, something like it. – Ekrem Dinçel Feb 04 '20 at 20:56
  • Good, thank you :) Do you know why I get the message 'name soup is not defined'? Should I define something? I tried 'import soup', but it says there is no module named soup, even though I ran 'pip3 install soup' beforehand and also imported the other BeautifulSoup stuff – noname Feb 04 '20 at 21:01
  • Hmm. Did you find your code on the internet? You can [see here](https://pypi.org/project/beautifulsoup4/) how you should use BeautifulSoup. – Ekrem Dinçel Feb 04 '20 at 21:04
  • I read the docs, but I still have a problem with this app. I tried to use requests.get to open a new website via the class name, but it does not work – noname Feb 09 '20 at 19:55
  • Okay, so write your problems in order, and I will help you. If you want, you can add your code too. And [let us continue in chat](https://chat.stackoverflow.com/rooms/207507/myroom) – Ekrem Dinçel Feb 09 '20 at 20:40
  • Thank you. So now I want to scrape data from a website, but I don't know the address of this website; I only know the class name. First I tried 'name_box = soup.find('h1', attrs={'class': 'spielprofil_tooltip tooltipstered'})', but I get an error saying I should use an HTTP client like requests. I didn't find any way to use requests with a class name; this line: 'name_box = requests.get('h1', attrs={'class': 'spielprofil_tooltip tooltipstered'})' is not working for me. I know requests doesn't use attrs, but I don't know how to replace it with a working command – noname Feb 09 '20 at 21:27
  • It is better to continue [in a chat room](https://chat.stackoverflow.com/rooms/207507/myroom), can you join? – Ekrem Dinçel Feb 10 '20 at 14:09
  • I would like to join, but I don't have enough reputation to talk there – noname Feb 10 '20 at 18:07
  • Oh okay :D I don't know why there is a limit for chat. I wrote some code for you and added comments to it. It finds the link address of the person. You can find the code here: https://gist.github.com/EkremDincel/a2fdaf919a8050f3dac51a2504872afd – Ekrem Dinçel Feb 10 '20 at 19:08
  • Oh, thank you very much! It works very well! :) I will read it carefully. I also have a question: suppose the a element does not have a class but is contained in, for example, an h3 that has a class name. What should I change in the code? I think changing 'a' to 'h3' in line 33 and writing this class name is not enough, because it will not work. – noname Feb 10 '20 at 19:22
  • You are welcome :D If you want to find child elements like that, you should have a look at CSS selectors: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors – Ekrem Dinçel Feb 10 '20 at 19:34
  • Thank you very much again, you helped me the most :D I used the 'select' method and a few lines to extract the URL, and it worked :) Now I can try to scrape data :D – noname Feb 12 '20 at 17:59
  • How beautiful for you! – Ekrem Dinçel Feb 17 '20 at 13:50
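The two lookups discussed in these comments can be sketched roughly as follows. This is an illustration under assumptions, not the code from the linked gist: the sample HTML, the `result-title` class, and the `extract_links` helper are hypothetical. Note that `soup` is not a module to import; it is simply the object returned by `BeautifulSoup(html, ...)`, and the HTML itself has to be downloaded first (for example with `requests.get(url).text`).

```python
# Hypothetical markup: one link that has the class itself, and one link
# without a class that sits inside an <h3> with a class.
SAMPLE_HTML = """
<a class="spielprofil_tooltip" href="/paulo-dybala/profil/spieler/206050">Paulo Dybala</a>
<h3 class="result-title"><a href="/some/other/page">Other</a></h3>
"""


def extract_links(html):
    """Return (hrefs of classed links, hrefs of links nested in a classed <h3>)."""
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    # "soup" is just a variable holding the parsed document.
    soup = BeautifulSoup(html, "html.parser")
    # Anchor elements that carry the class directly.
    by_class = [a["href"] for a in soup.select("a.spielprofil_tooltip[href]")]
    # Anchor elements anywhere inside an <h3 class="result-title">.
    nested = [a["href"] for a in soup.select("h3.result-title a[href]")]
    return by_class, nested
```

In a CSS selector, a space means "descendant of", so `h3.result-title a` reads as "an `a` inside that `h3`", while two classes on the same element are written without a space, as in `.spielprofil_tooltip.tooltipstered`.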

It seems you were close. The script works only up to the line print(search) because, although you have constructed the desired URL as search, you never invoked get() with it. So you need to pass the URL as follows:

  • Code Block:

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    data = input('Enter name: ')
    data = data.replace(" ", "+")
    print(data)
    link = 'https://www.transfermarkt.pl/schnellsuche/ergebnis/schnellsuche?query='
    search = link + data + '&x=0&y=0'
    print(search)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get(search)
    
  • Console Output:

    Enter name: Kylian Mbappé
    Kylian+Mbappé
    https://www.transfermarkt.pl/schnellsuche/ergebnis/schnellsuche?query=Kylian+Mbappé&x=0&y=0
    

Now, there can be various reasons for seeing the text data:, in the address bar. The error stack trace would have helped us debug the issue. However, in the majority of cases it is due to one of the following issues:

  • The browser (here, Chrome) is not installed at the expected default location.
  • Incompatibility between the version of the binaries you are using.
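One rough way to check the second point is to compare the major version numbers of the browser and the driver. The sketch below is only illustrative: the helper names are hypothetical, and the capability keys `browserVersion` and `chrome` / `chromedriverVersion` are what ChromeDriver reports in the sessions I am aware of, so treat them as an assumption for your setup.

```python
def major_version(version):
    """Return the leading integer of a version string like '80.0.3987.16 (abc)'."""
    return int(version.split(".", 1)[0])


def versions_match(browser_version, driver_version):
    """True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


def report_versions(driver):
    """Print the versions reported by a running Selenium Chrome session."""
    caps = driver.capabilities
    browser = caps["browserVersion"]  # assumed capability key
    chromedriver = caps["chrome"]["chromedriverVersion"]  # assumed capability key
    print("browser:", browser)
    print("chromedriver:", chromedriver)
    print("same major version:", versions_match(browser, chromedriver))
```

If the session does not start at all, running the driver binary with `--version` on the command line and comparing it with the browser's version is an alternative.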

undetected Selenium
  • Thank you for your answer :) I will remember that, but with your solution the website still appears in my web browser. I would just like it to open inside the program so I can do the next steps; I don't want it to be displayed. – noname Feb 04 '20 at 20:47
  • @noname Checkout the updated answer and let me know the status. – undetected Selenium Feb 04 '20 at 21:08
  • Unfortunately it still doesn't show this footballer profile, it only shows searching results – noname Feb 09 '20 at 20:02
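For the remaining step, going from the search results to the profile, a hedged Selenium sketch could look like this. It uses the Selenium 4 locator style (`find_element(By.CSS_SELECTOR, ...)`); the helper names, the 10-second timeout, and the assumptions that the first matching link is the wanted player and that the profile page has an `<h1>` are all illustrative. It also assumes a matching chromedriver can be found by Selenium.

```python
def class_selector(class_attr):
    """Turn a class attribute value ('a b') into a CSS selector ('.a.b')."""
    return "".join("." + name for name in class_attr.split())


def open_first_profile(search_url, class_attr="spielprofil_tooltip"):
    """Load the search page headlessly, follow the first result, return its <h1> text."""
    # Third-party: pip install selenium (version 4 API).
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(search_url)  # without this call the address bar stays at data:,
        link = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, class_selector(class_attr)))
        )
        # Navigate to the link target directly instead of clicking, so the
        # next lookup is guaranteed to run on the profile page.
        driver.get(link.get_attribute("href"))
        return driver.find_element(By.TAG_NAME, "h1").text
    finally:
        driver.quit()
```

The selector in the question, `'.spielprofil_tooltip tooltipstered'`, contains a space, which CSS reads as "a `tooltipstered` element inside `.spielprofil_tooltip`"; `class_selector("spielprofil_tooltip tooltipstered")` gives `.spielprofil_tooltip.tooltipstered`, which matches one element carrying both classes.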