I'm trying to generate an array of [url_audioenci, url_caratula, titulo_cancion, nombre_artista] entries so I can download a list of music from http://los40.com.ar/lista40/. I know how to download media with the Requests library, but I can't extract the links from the page.

from bs4 import BeautifulSoup
import requests
# import re
url = 'http://los40.com.ar/m/lista40/'
videos = []
response = requests.get(url)
bs = BeautifulSoup(response.text, 'html.parser')
for i in range(1, 41):
    # attempt to look up each datos_cancion_N variable (this finds nothing)
    videos.append(bs.find_all('datos_cancion_' + str(i)))
# responses = bs.find_all('script', language="javascript", type="text/javascript")

print(videos)

The page source contains one block like this per song:

<h3>LISTA DEL 08/06/2019</h3>
<script language="javascript" type="text/javascript">
  var datos_cancion_1 = Array();
  datos_cancion_1['url_audioenci']         = 'https://recursosweb.prisaradio.com/audios/dest/570005645440.mp4';
  datos_cancion_1['url_muzu']         = '';
  datos_cancion_1['url_youtube']      = 'https://www.youtube.com/watch?v=XsX3ATc3FbA';
  datos_cancion_1['url_itunes']       = '';
  datos_cancion_1['posicion']         = '1';
  datos_cancion_1['url_caratula']     = 'https://recursosweb.prisaradio.com/fotos/dest/570005645461.jpg';
  datos_cancion_1['titulo_cancion']   = 'Boy with luv';
  datos_cancion_1['nombre_artista']   = 'BTS;Halsey';
  datos_cancion_1['idYes']            = 'BTS';
  datos_cancion_1['VidAu']            = 0;
</script>

I expect the result to look like this:

videos=[['https://recursosweb.prisaradio.com/audios/dest/570005645440.mp4','https://recursosweb.prisaradio.com/fotos/dest/570005645461.jpg','Boy with luv','BTS;Halsey'].....]

1 Answer

My attempt at filtering the data:

from bs4 import BeautifulSoup
import requests

url = 'http://los40.com.ar/m/lista40/'
response = requests.get(url)
bs = BeautifulSoup(response.text, features="html5lib")

# The last 40 matching <script> blocks hold the chart entries
scripts = bs.find_all('script', language='javascript', type='text/javascript')
data = [str(script) for script in scripts[-40:]]

print(data[0])

Output:

<script language="javascript" type="text/javascript">
  var datos_cancion_1 = Array();
  datos_cancion_1['url_audioenci']         = 'https://recursosweb.prisaradio.com/audios/dest/570005645440.mp4';
  datos_cancion_1['url_muzu']         = '';
  datos_cancion_1['url_youtube']      = 'https://www.youtube.com/watch?v=XsX3ATc3FbA';
  datos_cancion_1['url_itunes']       = '';
  datos_cancion_1['posicion']         = '1';
  datos_cancion_1['url_caratula']     = 'https://recursosweb.prisaradio.com/fotos/dest/570005645461.jpg';
  datos_cancion_1['titulo_cancion']   = 'Boy with luv';
  datos_cancion_1['nombre_artista']   = 'BTS;Halsey';
  datos_cancion_1['idYes']            = 'BTS';
  datos_cancion_1['VidAu']            = 0;
</script>

data[0:40] contains the top 40 and all the relevant data as strings, but I'm not sure how to extract the information from the strings.

There are some suggestions in this thread that use import json or import re. I tried fiddling with them, but I couldn't get them to work.
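
For what it's worth, here is a rough sketch of how the re route might look, run against the sample block shown in the output rather than the live page. The names FIELD_RE, WANTED and parse_song are made up for this illustration, and the pattern assumes every value is a single-quoted string with no escaped quotes inside it, which may not hold for every song title.

```python
import re

# Matches lines such as: datos_cancion_1['titulo_cancion']   = 'Boy with luv';
# Group 1 is the key, group 2 the quoted value. Unquoted values (e.g. 0) are skipped.
# Assumption: values never contain an escaped single quote.
FIELD_RE = re.compile(r"datos_cancion_\d+\['(\w+)'\]\s*=\s*'([^']*)'")

# The four fields requested in the question, in the requested order.
WANTED = ['url_audioenci', 'url_caratula', 'titulo_cancion', 'nombre_artista']


def parse_song(script_text):
    """Return the wanted fields of one <script> block as a list of strings."""
    fields = dict(FIELD_RE.findall(script_text))
    # Missing fields become empty strings instead of raising KeyError.
    return [fields.get(key, '') for key in WANTED]


# Sample copied from the output above, so the sketch runs without network access.
sample = """<script language="javascript" type="text/javascript">
  var datos_cancion_1 = Array();
  datos_cancion_1['url_audioenci']         = 'https://recursosweb.prisaradio.com/audios/dest/570005645440.mp4';
  datos_cancion_1['url_muzu']         = '';
  datos_cancion_1['posicion']         = '1';
  datos_cancion_1['url_caratula']     = 'https://recursosweb.prisaradio.com/fotos/dest/570005645461.jpg';
  datos_cancion_1['titulo_cancion']   = 'Boy with luv';
  datos_cancion_1['nombre_artista']   = 'BTS;Halsey';
  datos_cancion_1['VidAu']            = 0;
</script>"""

videos = [parse_song(sample)]
print(videos)
```

With the real page, the last two statements would presumably become videos = [parse_song(s) for s in data], using the data list built earlier.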