
I'm writing a script to download multiple FLACs from a website. I'm using Beautiful Soup to get the FLAC links and urlopen to download them.

I want BS to search for a link that ends in .flac. I don't know the file names in advance, only the extension (e.g. one file is XXX.flac, another is YYY.flac).

Here is the HTML that contains the FLAC link:

<b><a class=location href="/soundtracks/index.php">Soundtracks</a><font class=location> &raquo </font><a href="/soundtracks/highquality/index.php">High Quality Game 
Soundtracks [FLAC]</a><font class=location> &raquo </font><a href="/soundtracks/highquality/Metal_Gear_20th_Anniversary/72">Metal Gear 20th Anniversary</a><font class=location> &raquo 01 Metal Gear 20 Years History -Past, Present, Future- Download</font></b><h1>Metal Gear 20th Anniversary Download Links:</h1><a style="font-size: 16px; font-weight:bold;" href="http://50.7.161.234/bks/94/245/Music/[029] MG 20th Anniversary [FLAC]/01 Metal Gear 20 Years History -Past, Present, Future-.flac">Metal Gear 20th Anniversary - 01 Metal Gear 20 Years History -Past, Present, Future-</a> <font face="Verdana" style="font-size: 16px;">Format: FLAC, Size: 76M</font><br> <font face="Verdana" style="font-size: 10px;"><b>Note: If the file starts playing in your browser window, try right-clicking and "Save Target As"</b></font><br>

I have tried searching by id with t = soup.find(id="flac"), but I don't get any relevant results. I'm stuck and don't know how to solve this.

How would I get BS to find the file link and then assign that link to a variable?

import mechanize
import urllib, urllib2, re
from bs4 import BeautifulSoup
####MECHANIZE####
br = mechanize.Browser()
res = br.open("http://www.emuparadise.me/soundtracks/highquality/Metal_Gear_20th_Anniversary/72")
a = 2 #COUNTER FOR LOOP
br.follow_link(text_regex='Download', nr=a)
b = br.geturl() #GETS THE URL
print b


page = urllib2.urlopen(b).read()
soup = BeautifulSoup(page)
soup.prettify()
t = soup.find(id="flac") # no tag in this page has id="flac", so t is None
print t
RN_
  • I am very new to Beautiful Soup and don't know how to solve this. I spent an hour researching but couldn't find anything relevant. I am sorry. – RN_ Sep 26 '12 at 23:03
  • You realise that there isn't an element with the `id="flac"` in the html that you show? – Marcin Sep 26 '12 at 23:11
  • As I said, I am a novice. How would I get it to search for the class "location"? – RN_ Sep 26 '12 at 23:14
  • Further to @Marcin's comment, "If you pass in a value for an argument called `id`, Beautiful Soup will filter against each tag’s `id` attribute:" (http://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-keyword-arguments) – Kev Sep 26 '12 at 23:15
  • See: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class – Kev Sep 26 '12 at 23:16

1 Answer


Your code is trying to match on an id attribute that doesn't exist on the anchor tags linking to those FLACs.

Instead, use a regex to match hrefs that end in .flac:

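t = soup.find_all(href=re.compile(r"\.flac$")) # escape the dot so it matches a literal "."
flac_url = t[0]["href"] # find_all returns a list of tags; take the first tag's href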
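When you pass a compiled regex as an attribute filter, Beautiful Soup tests each tag's `href` with the pattern's `search()` method. That means plain `re` can reproduce the matching without fetching the page. The minimal sketch below is written for Python 3, unlike the Python 2 code in the question. It applies the pattern to hrefs copied from the question's HTML, plus one hypothetical href (`/hypothetical/notes_flac`) that shows why the dot should be escaped: an unescaped `.` matches any character, so `_flac` would also count as a match. The real href contains spaces and square brackets, and urlopen may reject or mishandle raw spaces. For that reason, the sketch also percent-encodes the URL before it would be downloaded.

```python
import re
from urllib.parse import quote

# Escaped dot: matches only hrefs that literally end in ".flac".
FLAC_RE = re.compile(r"\.flac$")
# Unescaped dot: "." matches any character, so "_flac" also matches.
LOOSE_RE = re.compile(".flac$")

# hrefs taken from the question's HTML, plus one hypothetical entry.
hrefs = [
    "/soundtracks/index.php",
    "/soundtracks/highquality/Metal_Gear_20th_Anniversary/72",
    "http://50.7.161.234/bks/94/245/Music/[029] MG 20th Anniversary [FLAC]/"
    "01 Metal Gear 20 Years History -Past, Present, Future-.flac",
    "/hypothetical/notes_flac",  # hypothetical href without a .flac extension
]

# Mirror what find_all(href=pattern) does: keep hrefs where search() succeeds.
flac_links = [h for h in hrefs if FLAC_RE.search(h)]
loose_links = [h for h in hrefs if LOOSE_RE.search(h)]

# Percent-encode spaces, brackets and commas; keep ":" and "/" as-is.
download_url = quote(flac_links[0], safe=":/")

print(len(flac_links), len(loose_links))
print(download_url)
```

The escaped pattern keeps only the real FLAC link, while the loose pattern also picks up the hypothetical `notes_flac` href. Passing `safe=":/"` to `quote` leaves the scheme separator and path slashes intact and encodes everything else that is not URL-safe.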
Kev