1

I am writing a simple MP3 downloader for a website. I got stuck while parsing the duration and size of the audio:

<div class="mp3-info">
    1.69 mins
<br/>
    2.33 mb
</div>

Now I need to parse 1.69 mins and 2.33 mb from the HTML above. I am using Python 3.4.

Mikko Ohtamaa
0x6773

2 Answers

1

I would use BeautifulSoup4 to parse your HTML; see its documentation for details.

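from bs4 import BeautifulSoup  # BeautifulSoup4 is imported from the bs4 package

soup = BeautifulSoup(your_html_string, "html.parser")
# find_all returns a list of every <div class="mp3-info"> element
divs = soup.find_all("div", {"class": "mp3-info"})
# Now extract the text, e.g. with divs[0].get_text()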

Also, because it's a class, there could be multiple matching elements on the page.
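
As a rough sketch of the remaining "extract the text" step, the following loops over every matching div and collects its two text fragments. It assumes BeautifulSoup4 is installed and that each div holds exactly a duration and a size separated by a <br/>. The function name extract_mp3_info and the second sample div are made up for illustration.

```python
from bs4 import BeautifulSoup

# First div is copied from the question; the second is invented sample data
# to mimic a page that lists more than one track.
html = """
<div class="mp3-info">
    1.69 mins
<br/>
    2.33 mb
</div>
<div class="mp3-info">
    3.10 mins
<br/>
    4.25 mb
</div>
"""


def extract_mp3_info(markup):
    """Return a list of (duration_text, size_text) pairs, one per mp3-info div."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for div in soup.find_all("div", class_="mp3-info"):
        # stripped_strings yields each text node with surrounding whitespace removed
        parts = list(div.stripped_strings)
        if len(parts) == 2:  # skip divs that do not match the assumed layout
            results.append((parts[0], parts[1]))
    return results


info = extract_mp3_info(html)
print(info)
```

Divs that do not contain exactly two text fragments are skipped here; depending on the real page, raising an error instead may be the safer choice.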

RvdK
0

You can extract text from HTML using the lxml library.

Here is a related StackOverflow answer: https://stackoverflow.com/a/4624146/315168

After you get the length and size out as text, split it into pieces. E.g.:

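text = "1.69 mins 2.33 mb"  # example value; in practice, the element text extracted using lxml
# split() with no arguments splits on any run of whitespace, including newlines
minutes, min_suffix, megabytes, mega_suffix = text.split()

seconds = float(minutes) * 60.0  # treats "1.69 mins" as decimal minutes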
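
The split shown above could be wrapped in a small helper that also checks the unit suffixes. This is only a sketch: the name parse_mp3_info is hypothetical, and it assumes the text always has the form "<number> mins <number> mb" and that the duration is in decimal minutes.

```python
def parse_mp3_info(text):
    """Convert text such as '1.69 mins 2.33 mb' into (seconds, megabytes)."""
    tokens = text.split()  # splits on any whitespace, so embedded newlines are fine
    if len(tokens) != 4:
        raise ValueError("unexpected mp3 info text: {!r}".format(text))
    minutes, min_suffix, megabytes, mega_suffix = tokens
    # Guard against a changed page layout or different units
    if min_suffix.lower() != "mins" or mega_suffix.lower() != "mb":
        raise ValueError("unexpected units in: {!r}".format(text))
    return float(minutes) * 60.0, float(megabytes)


# Raw element text as it appears in the question's HTML, whitespace included
seconds, megabytes = parse_mp3_info("\n    1.69 mins\n\n    2.33 mb\n")
print(seconds, megabytes)
```

Checking the suffixes means a page that switches to, say, kb or secs fails with an error instead of silently producing wrong numbers.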
Mikko Ohtamaa