Extracting song length and size from HTML using Python

Question

I am making a simple MP3 downloader for a website. I got stuck while parsing the duration and size of the audio from this markup:

<div class="mp3-info">
    1.69 mins
<br/>
    2.33 mb
</div>

Now I need to parse 1.69 mins and 2.33 mb from the HTML above. I am using Python 3.4.

What do you mean by "parse" exactly? – ljk321 Apr 28 '15 at 11:13
@skyline75489 to get the value of time and size – 0x6773 Apr 28 '15 at 11:42
Are the answers below meeting your needs? – ljk321 Apr 28 '15 at 11:59
@skyline75489 Not at all, but I got the answer. – 0x6773 Apr 28 '15 at 12:19

RvdK · Accepted Answer · 2015-04-28T11:28:16.027

1

I would use BeautifulSoup4 to parse your HTML. See the BeautifulSoup documentation.

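from bs4 import BeautifulSoup

soup = BeautifulSoup(your_html_string, "html.parser")
# find_all returns a list, since several elements may share the class
for div in soup.find_all("div", {"class": "mp3-info"}):
    # Now extract the text: get_text() skips the <br/> tag itself
    print(div.get_text().split())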

Also, because mp3-info is a class, there may be several such elements on the page, so expect a list of matches rather than a single element.
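
If installing a third-party parser is not an option, the same idea (collect the text of every mp3-info div, however many there are) can be sketched with the standard library's html.parser. This is a swapped-in alternative, not what the answer above uses, and the names Mp3InfoParser and extract_mp3_info are made up for illustration:

```python
from html.parser import HTMLParser


class Mp3InfoParser(HTMLParser):
    """Collect the text of every <div class="mp3-info"> block (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # one list of text fragments per matching div
        self._depth = 0    # > 0 while we are inside a matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # a div nested inside a matching one
        elif "mp3-info" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.blocks.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits inside a matching div.
        if self._depth and data.strip():
            self.blocks[-1].append(data.strip())


def extract_mp3_info(html):
    """Return a list of text-fragment lists, one per mp3-info div."""
    parser = Mp3InfoParser()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling extract_mp3_info on the page source should give one entry per song, each holding the duration string and the size string in document order.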

edited Apr 28 '15 at 11:28

answered Apr 28 '15 at 11:22

RvdK

score 0 · Answer 2 · edited May 23 '17 at 12:06

0

You can extract text from HTML using the lxml library.

Once you have the length and size out as text, split it into pieces. E.g.

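 import lxml.html

 # Parse the HTML and take the text of the first mp3-info div
 tree = lxml.html.fromstring(your_html_string)
 text = tree.xpath('//div[@class="mp3-info"]')[0].text_content()
 minutes, min_suffix, megabytes, mega_suffix = text.split()

 seconds = float(minutes) * 60.0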
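
The splitting step can be wrapped in a small function that also converts the size. This is only a sketch: parse_mp3_info and the two unit tables are hypothetical, the suffixes other than "mins" and "mb" are guesses about what the site might print, and treating 1 mb as 1024 * 1024 bytes is an assumption (the site may mean 10**6).

```python
# Hypothetical unit tables; only "mins" and "mb" appear in the question.
SECONDS_PER_UNIT = {"secs": 1.0, "mins": 60.0, "hrs": 3600.0}
# Assumption: binary units (1 mb = 1024 * 1024 bytes).
BYTES_PER_UNIT = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_mp3_info(text):
    """Return (seconds, size_in_bytes) from text like '1.69 mins 2.33 mb'."""
    parts = text.split()
    if len(parts) != 4:
        raise ValueError("expected 'value unit value unit', got %r" % text)
    length, length_unit, size, size_unit = parts
    # float() raises ValueError on a non-numeric value;
    # an unknown suffix raises KeyError from the table lookup.
    seconds = float(length) * SECONDS_PER_UNIT[length_unit.lower()]
    size_in_bytes = float(size) * BYTES_PER_UNIT[size_unit.lower()]
    return seconds, size_in_bytes
```

Since split() with no arguments splits on any whitespace, the newlines and indentation around the values in the original div do not need to be stripped first.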

edited May 23 '17 at 12:06

Community

answered Apr 28 '15 at 11:18

Mikko Ohtamaa
