
My code:

mean_distances = [td.text.strip() for td in rows[3].find_all('td') ]

Output:

mean_distances: ['Mean distance from the Sun', 'km   AU', '57,909,175   0.38709893', '108,208,930   0.72333199', '149,597,890   1.00000011', '227,936,640   1.52366231', '778,412,010   5.20336301', '1,426,725,400   9.53707032', '2,870,972,200   19.19126393', '4,498,252,900   30.06896348']

I want to extract only the numbers in km (ignoring the second number, which gives the distance in AU) and convert the strings to float values. How can I do it?

quamrana
n53

4 Answers

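# collect the km values as floats
km_values = []

# loop over the items in mean_distances
# skip the first two items, which are the caption and column headers
for item in mean_distances[2:]:

    # split this item into km and au (split() with no argument handles any run of whitespace)
    km, au = item.split()

    # remove the thousands separators from km
    km = km.replace(",", "")

    # convert km to float and store it
    km_values.append(float(km))

print(km_values)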
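The loop above unpacks the AU value but does not use it. If you later need both columns, a minimal sketch keeps each row as a (km, au) pair of floats. The helper name `parse_row` is hypothetical, and the sample list is copied from the question's output:

```python
# Sample data copied from the question's output
mean_distances = [
    'Mean distance from the Sun', 'km   AU',
    '57,909,175   0.38709893', '108,208,930   0.72333199',
    '149,597,890   1.00000011', '227,936,640   1.52366231',
    '778,412,010   5.20336301', '1,426,725,400   9.53707032',
    '2,870,972,200   19.19126393', '4,498,252,900   30.06896348',
]

def parse_row(item):
    """Split one 'km   AU' cell and return both values as floats (hypothetical helper)."""
    km, au = item.split()  # raises ValueError unless there are exactly two fields
    return float(km.replace(",", "")), float(au)

# Skip the caption and header cells, then parse every data row
pairs = [parse_row(item) for item in mean_distances[2:]]
km_values = [km for km, _ in pairs]

print(pairs[0])  # (57909175.0, 0.38709893)
```

Unpacking into exactly two names doubles as a cheap sanity check: a cell with more or fewer whitespace-separated fields raises `ValueError` instead of silently producing a wrong number.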
John Gordon
float_value = [float(item.split()[0].replace(",", "")) for item in mean_distances[2:]]
MendelG
Jean S

If you know that items like 'Mean distance from the Sun' and 'km   AU' will always occupy exactly the first two positions, you can slice them off, use the str.split() method to keep everything before the first space, remove the commas with str.replace(), and convert the result with float():

temp = []
for distance in mean_distances[2:]:
    temp.append(float(distance.split(" ")[0].replace(",", "")))
mean_distances = temp

This gives an output of [57909175.0, 108208930.0, 149597890.0, 227936640.0, 778412010.0, 1426725400.0, 2870972200.0, 4498252900.0].
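Slicing with `[2:]` relies on the caption and header always being the first two cells. If that is not guaranteed, a sketch can select cells by pattern instead. It assumes every data cell starts with the km figure, as in the question's output; `extract_km` and `KM_RE` are hypothetical names:

```python
import re

# A leading number made of digits and thousands-separator commas, e.g. "1,426,725,400".
# re.match anchors at the start of the string, so header cells that start with letters fail.
KM_RE = re.compile(r"\s*(\d[\d,]*)")

def extract_km(cells):
    """Return the leading km value of every cell that starts with a number."""
    values = []
    for cell in cells:
        m = KM_RE.match(cell)
        if m:  # cells such as 'km   AU' do not match and are skipped
            values.append(float(m.group(1).replace(",", "")))
    return values

# Sample data copied from the question's output
mean_distances = [
    'Mean distance from the Sun', 'km   AU',
    '57,909,175   0.38709893', '108,208,930   0.72333199',
    '149,597,890   1.00000011', '227,936,640   1.52366231',
    '778,412,010   5.20336301', '1,426,725,400   9.53707032',
    '2,870,972,200   19.19126393', '4,498,252,900   30.06896348',
]

print(extract_km(mean_distances))
```

Because the pattern only requires a leading digit, it also accepts km values written without separators, and the header cells can appear anywhere in the list.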

Denendaden

Try this:

m = []
for i in range(2, len(mean_distances)):
    m.append(mean_distances[i].split()[0])
res = [float(s.replace(',', '')) for s in m]

print(res)

[57909175.0, 108208930.0, 149597890.0, 227936640.0, 778412010.0, 1426725400.0, 2870972200.0, 4498252900.0]

If you want to keep ',' as thousands separators for readability, see this question for some solutions:

How to print number with commas as thousands separators?
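For display only, Python's format-specification mini-language can put the separators back without changing the stored floats. A minimal sketch, using a few of the km values from the question:

```python
km_values = [57909175.0, 108208930.0, 1426725400.0]

# ',' inserts thousands separators; '.0f' drops the fractional part,
# since these distances are whole kilometres
formatted = [f"{km:,.0f}" for km in km_values]

print(formatted)  # ['57,909,175', '108,208,930', '1,426,725,400']
```

Keeping the numbers as floats and formatting only when printing means arithmetic still works on the list.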

IoaTzimas