I'm trying to parse web-site and get 4 URLs for video files. Links example: https://cs510400.vk.me/3/u381845574/videos/e8f1419d5b.720.mp4
First i grub HTML code and find tag witch contains my links. And find current line with my links.
My code:
# coding: utf-8
import requests
from bs4 import BeautifulSoup
import re
r = requests.get('https://vk.com/video-63758929_456249306')
soup = BeautifulSoup(r.content,'lxml')
scripts = soup.find_all('script')
current_tag = scripts[-1].string
links = re.findall('^.*source.*$',current_tag,re.MULTILINE)
current_line = []
for x in links:
current_line.append(x)
print(current_line)
I got this result:
[u'ajax.preload(\'al_video.php\', {"act":"show","video":"-63758929_456249306","module":"direct"}, ["\u041d\u0435\u043c\u043d\u043e\u0433\u043e \u043f\u043e\u0442\u0430\u0441\u043a\u0443\u0445\u0430","<div id=\\"video_box_wrap-63758929_456249306\\" class=\\"video_box_wrap\\">\\n <video id=\\"video_player\\" poster=\\"https:\\/\\/pp.vk.me\\/c836534\\/v836534929\\/16e40\\/DWpFw6tiZDQ.jpg\\" preload=\\"none\\" controls onplaying=\\"cur.incViews && cur.incViews()\\">\\n <source src=\\"https:\\/\\/cs510400.vk.me\\/3\\/u381845574\\/videos\\/e8f1419d5b.720.mp4?extra=hX6mAywSWUELjj_xFMO6YyRMX8rqNcm193BCOJzcZxIFlwHMD5ApeTI1DW9euordOFWDVq3m4ii7OAsbPKO8y901BjBcWz7nv5U-dKOz6i69zJNmaAeQiNEezDylB3s\\" type=\\"video\\/mp4\\"><\\/source><source src=\\"https:\\/\\/cs510400.vk.me\\/3\\/u381845574\\/videos\\/e8f1419d5b.480.mp4?extra=hX6mAywSWUELjj_xFMO6YyRMX8rqNcm193BCOJzcZxIFlwHMD5ApeTI1DW9euordOFWDVq3m4ii7OAsbPKO8y901BjBcWz7nv5U-dKOz6i69zJNmaAeQiNEezDylB3s\\" type=\\"video\\/mp4\\"><\\/source><source src=\\"https:\\/\\/cs510603.vk.me\\/3\\/u381845574\\/videos\\/e8f1419d5b.360.mp4?extra=hX6mAywSWUELjj_xFMO6YyRMX8rqNcm193BCOJzcZxIFlwHMD5ApeTI1DW9euordOFWDVq3m4ii7OAsbPKO8y901BjBcWz7nv5U-dKOz6i69zJNmaAeQiNEezDylB3s\\" type=\\"video\\/mp4\\"><\\/source><source src=\\"https:\\/\\/cs510603.vk.me\\/3\\/u381845574\\/videos\\/e8f1419d5b.240.mp4?extra=hX6mAywSWUELjj_xFMO6YyRMX8rqNcm193BCOJzcZxIFlwHMD5ApeTI1DW9euordOFWDVq3m4ii7OAsbPKO8y901BjBcWz7nv5U-dKOz6i69zJNmaAeQiNEezDylB3s\\" type=\\"video\\/mp4\\"><\\/source>\\n <div class=\\"video_box_background\\" style=\\"background-image:url(https:\\/\\/pp.vk.me\\/c836534\\/v836534929\\/16e40\\/DWpFw6tiZDQ.jpg);\\"><\\/div>\\n <div class=\\"video_box_cant_play\\">\u0414\u0430\u043d\u043d\u043e\u0435 \u0432\u0438\u0434\u0435\u043e \u043d\u0435 \u043c\u043e\u0436\u0435\u0442 \u0431\u044b\u0442\u044c \u043f\u0440\u043e\u0438\u0433\u0440\u0430\u043d\u043e \u043d\u0430 \u044d\u0442\u043e\u043c \u0443\u0441\u0442\u0440\u043e\u0439\u0441\u0442\u0432\u0435<\\/div>\\n <\\/video>\\n<\\/div>","\\naddTemplates({\\"_\\":\\"_\\",\\"audio_row\\":\\"<div class=\\\\\\"audio_row _audio_row _audio_row_%1%_%0% %cls% clear_fix\\\\\\" onclick=\\\\\\"return getAudioPlayer().toggleAudio(this, event)\\\\\\" data-audio=\\\\\\"%serialized%\\\\\\" data-full-id=\\\\\\"%1%_%0%\\\\\\" id=\\\\\\"audio_%1%_%0%\\\\\\">\\\\n <div class=\\\\\\"audio_play_wrap\\\\\\" data-nodrag=\\\\\\"1\\\\\\"><button class=\\\\\\"audio_play _audio_play\\\\\\" id=\\\\\\"play_%1%_%0%\\\\\\" aria-label=\\\\\\"\\\\\\"><\\\\\\/button><\\\\\\/div>\\\\n <div class=\\\\\\"audio_info\\\\\\">\\\\n <div class=\\\\\\"audio_duration_wrap _audio_duration_wrap\\\\\\">\\\\n <div class=\\\\\\"audio_hq_label\\\\\\"><\\\\\\/div>\\\\n <div class=\\\\\\"audio_duration _audio_duration\\\\\\">%duration%<\\\\\\/div>\\\\n <div class=\\\\\\"audio_acts\\\\\\">\\\\n <div class=\\\\\\"audio_act\\\\\\" id=\\\\\\"recom\\\\\\" onmouseover=\\\\\\"audioShowActionTooltip(this, \'%1%_%0%\')\\\\\\" onclick=\\\\\\"AudioPage(this).showRecoms(this, \'%1%_%0%\', event)\\\\\\"><div><\\\\\\/div><\\\\\\/d
...
But i need only my 4 links. What i'm doing wrong? How to get only links from this large tag?