
I am using BeautifulSoup4 to build a script that does financial calculations. I have successfully extracted the data into a list, but I only need the float values from the elements.

For example:

Volume = soup.find_all('td', {'class':'text-success'})

print(Volume)

This gives me the list output of:

[<td class="text-success">+1.3 LTC</td>, <td class="text-success">+5.49<span class="muteds">340788</span> LTC</td>, <td class="text-success">+1.3 LTC</td>]

I want it to become:

[1.3, 5.49, 1.3]

How can I do this?

Thank you so much for reading my post; I greatly appreciate any help I can get.

alecxe
  • Possible duplicate of http://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string-in-python – Jack Evans Sep 11 '16 at 14:59
  • The list is obviously not a valid python list. Do you mean `["+1.3", ...]`? – linusg Sep 11 '16 at 15:00
  • @linusg not a valid Python list, but this is what the string representation of BeautifulSoup's `ResultSet` looks like. – alecxe Sep 11 '16 at 15:04

2 Answers


You can find the first text node inside every `td`, split it on the first space, take the first item, and convert it with `float()`; the leading `+` is handled automatically:

from bs4 import BeautifulSoup

data = """
<table>
    <tr>
        <td class="text-success">+1.3 LTC</td>
        <td class="text-success">+5.49<span class="muteds">340788</span> LTC</td>
        <td class="text-success">+1.3 LTC</td>
    </tr>
</table>"""

soup = BeautifulSoup(data, "html.parser")

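# For each cell: take its first text node (e.g. "+5.49"), keep the part
# before the first space, and parse it as a float
print([
    float(td.find(text=True).split(" ", 1)[0])
    for td in soup.find_all('td', {'class':'text-success'})
])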

Prints [1.3, 5.49, 1.3].

Note how `find(text=True)` returns only the first text node of each cell, which avoids extracting the `340788` inside the `<span>` of the second `td`.
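To see why taking only the first text node matters, here is a small sketch comparing `get_text()` with `find(string=True)` on the second cell from the question. It uses `string=True`, the name newer BeautifulSoup releases use for the `text=True` argument; the `html` and `td` variables are just for illustration.

```python
from bs4 import BeautifulSoup

# The second cell from the question, where a <span> follows the number
html = '<td class="text-success">+5.49<span class="muteds">340788</span> LTC</td>'
td = BeautifulSoup(html, "html.parser").td

# get_text() concatenates every text node, so the span's digits run into the number
print(td.get_text())         # +5.49340788 LTC

# find(string=True) returns only the first text node, which ends before the <span>
print(td.find(string=True))  # +5.49
```

Calling `float()` on the `get_text()` result (after splitting on the space) would silently give `5.49340788`, which is why the answer reads only the first text node.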

alecxe

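You can use a regular expression, where `yourString` is the text to search (for example, `str(Volume)`):

>>> import re
>>> re.findall(r"\d+\.\d+", yourString)
['1.3', '5.49', '1.3']
>>> 

Then, to convert them to floats:

>>> [float(x) for x in re.findall(r"\d+\.\d+", yourString)]
[1.3, 5.49, 1.3]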
>>> 
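The pattern `\d+\.\d+` requires a decimal point, so an amount such as `2 LTC` would be skipped, and a leading minus sign would be dropped. A minimal sketch of a more permissive pattern, assuming you already have each cell's first text node as a string; the `cells` list, including the `-2 LTC` entry, is hypothetical sample data:

```python
import re

# Hypothetical input: the first text node of each <td>
cells = ["+1.3 LTC", "+5.49", "+1.3 LTC", "-2 LTC"]

# Optional sign, digits, then an optional fractional part
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")

values = []
for text in cells:
    # match() only looks at the start of the string, so later digits are ignored
    match = NUMBER.match(text)
    if match:
        values.append(float(match.group()))

print(values)  # [1.3, 5.49, 1.3, -2.0]
```

Cells that do not start with a number are simply skipped rather than raising `ValueError`.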
Jack Evans