I'm stuck on a problem with Python 2.7.12, using BeautifulSoup to scrape some webpage data. I can't figure out how to extract the value of the `title` attribute from an `<a href="...">` link.

So far I get output with this code:

    import urllib2
    from bs4 import BeautifulSoup

    hdr = {'Accept': 'text/html,application/xhtml+xml,*/*',"user-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"}
    url = 'REMOVED'

    # Fetch the page once, with the custom headers attached to the request
    req = urllib2.Request(url, headers=hdr)
    html = urllib2.urlopen(req).read()
    soup = BeautifulSoup(html, "html5lib")

    # Every table cell that holds a player name
    players = soup.find_all("td", {"data-title": "Navn"})

    saveFile = open('player_data.txt', 'w')

    for item in players:
        # contents[0] is the first child of the cell, here the <a> tag
        player_data = item.contents[0].encode("utf-8")
        print player_data
        saveFile.write(player_data)

    saveFile.close()

I get lines of data in this format:

    <a href="/da/player/123/lionel-messi/" title="Lionel Messi">Lionel Messi</a>

Could anyone please help me get just the name from the `title` attribute? I can't seem to get it working.

Thanks in advance :)

BulletEyeDK
  • Oh well, I'm sorry if you think it's a duplicate. I'm pretty new to Python programming and have been stuck on this issue for 2 days now. Believe me, I've read and tried out numerous approaches to this, including ones from similar questions on Stack Overflow, but I haven't seen another question that matches my problem. Please link me to that original question, thanks ;) I reckon this is probably pretty easy to overcome with years of Python experience, which I don't have :) – BulletEyeDK Jul 28 '16 at 20:45

1 Answer

To get the `title` attribute from the link:

    players = soup.find('a')['title']

Output:

Lionel Messi

What is soup.find('a')['title']?

  • .find('a') returns the first <a> tag it finds
  • ['title'] reads the title attribute from that tag
  • Thanks for your comment. Somehow I still can't figure out how to get it working. I'm sorry, but I'm pretty new to Python. Can I solve this as a one-liner together with my existing line of code, players = soup.find_all("td", {"data-title": "Navn"})? That line needs to stay, otherwise I don't have the data to start with. – BulletEyeDK Jul 28 '16 at 20:05
  • Running that exact line of code gives me this error: TypeError: 'NoneType' object is not iterable – BulletEyeDK Jul 28 '16 at 20:26
  • Thanks for the useful information, I got it working with `player_data = item.contents[0]['title'].encode("utf-8")`. Thanks ;) – BulletEyeDK Jul 31 '16 at 09:07
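
For reference, the approach from the answer and the final comment can be combined into one loop over the `td` cells. This is only a minimal sketch: `SAMPLE_HTML` and `extract_titles` are hypothetical names made up for illustration, the sample markup stands in for the real page (whose URL was removed from the question), and it uses the built-in `html.parser` instead of `html5lib` so it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page was removed from the question.
SAMPLE_HTML = """
<table>
  <tr><td data-title="Navn"><a href="/da/player/123/lionel-messi/" title="Lionel Messi">Lionel Messi</a></td></tr>
  <tr><td data-title="Navn">No link in this cell</td></tr>
</table>
"""


def extract_titles(html):
    """Return the title attribute of the first <a> in each 'Navn' cell."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for cell in soup.find_all("td", {"data-title": "Navn"}):
        link = cell.find("a")  # None when the cell contains no <a> tag
        if link is not None and link.has_attr("title"):
            titles.append(link["title"])
    return titles


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

Looking up the link with `cell.find("a")` rather than `item.contents[0]` avoids depending on the `<a>` tag being the very first child of the cell, and the `None` check skips cells without a link instead of raising an error. On Python 2, each title would still need `.encode("utf-8")` before being written to the file, as in the question's code.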