I am working on an app that uses BeautifulSoup, Python, requests and Django. I've more or less grasped how to use BeautifulSoup, but drilling down to specific elements is confusing at times. I created a function, admittedly not the best, that scrapes links from posts and uses those links to go to each post's detail page. From that page it scrapes the script data that contains the video URL and the image associated with it. This is the code from my scraper.py:
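import requests
from bs4 import BeautifulSoup

# `headers` (the request headers dict) is defined elsewhere in scraper.py


def panties():
    pan_url = 'http://www.panvideos.com'
    html = requests.get(pan_url, headers=headers)
    soup = BeautifulSoup(html.text, 'html5lib')
    # every post on the front page sits in a <div class="video">
    video_row = soup.find_all('div', {'class': 'video'})

    def youtube_link(url):
        # fetch the post's detail page and grab the player container
        youtube_page = requests.get(url, headers=headers)
        soupdata = BeautifulSoup(youtube_page.text, 'html5lib')
        video_row = soupdata.find('div', {'class': 'video-player'})
        # the child at index 3 of the player div is the <script> tag
        entries = [{'text': str(div),
                    } for div in video_row][3]
        return entries

    # only the first three posts are used
    entries = [{'text': div.h4.text,
                'href': div.a.get('href'),
                'tube': youtube_link(div.a.get('href')),
                } for div in video_row][:3]
    return entries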
From my views.py:
pan = panties()
context = {
    'pan': pan,
}
return render(request, 'index.html', context)
And in my template:
{% for p in pan %}
    Title: {{p.text}}<br>
    Href: {{p.href}}<br>
    Tube: {{p.tube}}<hr>
{% endfor %}
And here's what it returns:
Title: Juanka - Esperando por ti (Official Video)
Href: http://www.videos.com/video/2962/juanka-esperando-por-ti-official-video-/
Tube: {'text': '<script type="text/javascript">jwplayer("video-setup").setup({file:"http://www.youtube.com/watch?v=QL4JFUHd71o",image:"http://i1.ytimg.com/vi/QL4JFUHd71o/maxresdefault.jpg",primary:"html5",stretching:"fill","controlbar":"bottom",width:"100%",aspectratio:"16:9",autostart:"true",logo:{file:"http://www.panvideos.com/uploads/gopcds-png5787dbcd53a72.png",position:"bottom-right",link:"http://www.panvideos.com/"},sharing:{link:"http://www.panvideos.com/video/2962/juanka-esperando-por-ti-official-video-/","sites":["facebook","twitter","linkedin","pinterest","tumblr","googleplus","reddit"]}});</script>'}
The thing is, I only want
http://www.youtube.com/watch?v=QL4JFUHd71o
and
http://i1.ytimg.com/vi/QL4JFUHd71o/maxresdefault.jpg
which are the video and image respectively. How can I accomplish this? My code is not set in stone and I don't mind changing it to make it work. Thanks in advance for any advice.
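
To make the goal concrete, here is a minimal sketch of the kind of extraction I am after. It assumes the script text always contains `file:"..."` and `image:"..."` entries as in the output above, and that the first `file:` entry is the video (the logo's `file:` comes later). The helper name `extract_media` is hypothetical, and the sample string is shortened from the real output.

```python
import re

# Sample script text, shortened from the output shown above.
script_text = (
    '<script type="text/javascript">jwplayer("video-setup").setup({'
    'file:"http://www.youtube.com/watch?v=QL4JFUHd71o",'
    'image:"http://i1.ytimg.com/vi/QL4JFUHd71o/maxresdefault.jpg",'
    'primary:"html5",'
    'logo:{file:"http://www.panvideos.com/uploads/gopcds-png5787dbcd53a72.png"}'
    '});</script>'
)


def extract_media(text):
    """Hypothetical helper: pull the first file and image URLs out of the script text."""
    # re.search returns the first match, so the video `file:` wins over the logo `file:`
    file_match = re.search(r'file:"([^"]+)"', text)
    image_match = re.search(r'image:"([^"]+)"', text)
    return {
        'video': file_match.group(1) if file_match else None,
        'image': image_match.group(1) if image_match else None,
    }


media = extract_media(script_text)
print(media['video'])
print(media['image'])
```

If something like this is reasonable, `youtube_link` could presumably return `extract_media(str(div))` instead of the raw script text, so the template could use `{{p.tube.video}}` and `{{p.tube.image}}`. A regular expression is only one option here and would break if the site changes how the player setup is written.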