
I am scraping numbers from a webpage and appending them to a Python list. The scraped strings take the following forms.

Millions:

  • 1,000,000
  • 1,000,000.9
  • 1,000,000.99
  • 1,000,000.999

Hundreds of thousands (same applies for tens of thousands and thousands):

  • 100,000
  • 100,000.9
  • 100,000.99
  • 100,000.999

In other words, trailing zeros in the decimal places are not displayed.

My list has the following composition:

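list = ['1,000,000', '1,000,000.9', '1,000,000.99', '1,000,000.999', '100,000', '100,000.9', '100,000.99', '100,000.999']  # all examples above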

I want to convert every string that has a decimal part into a float that keeps its decimal places, and every string without one into an integer (or a float ending in .0), interpreting the comma and period separators correctly.

My current process is simply to eliminate all non-numeric characters:

import re

list = [re.sub(r"[^0-9]", "", i) for i in list]  # remove non-numeric characters
list = [int(i) for i in list]  # turn strings into integers

I don't know what to do next because I don't know how to account for the different formatting within a single list.
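To make the problem concrete, here is a minimal illustration using two hypothetical sample strings in the scraped format (the name `samples` is only for this sketch). After stripping every non-digit character, a decimal value and a much larger integer become indistinguishable:

```python
import re

# Hypothetical sample strings in the same format as the scraped values
samples = ["100,000.9", "1,000,009"]

# Same substitution as in the current process: keep digits only
stripped = [re.sub(r"[^0-9]", "", s) for s in samples]

print(stripped)  # ['1000009', '1000009']
```

Once the period is gone, nothing in the string records where the decimal point was.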

Luiz Scheuer
  • What you want to do is remove everything that is not a numeric character or a ., IIUC. – Dani Mesejo Sep 19 '21 at 14:51
  • As seen above, I already removed all non-numeric characters, but I don't know how to format them according to their number of decimal places. Because if you remove all non-numeric characters, python doesn't know if 70001 is 700,001 or 700.001. – Luiz Scheuer Sep 19 '21 at 14:52
  • Remove the commas but not the dots. Then convert to a number using `Decimal`. – BoarGules Sep 19 '21 at 15:16
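The suggestion in the comments could be sketched roughly as follows. This is only a sketch, and it assumes that the comma is always a thousands separator and the period is always the decimal point, as in the examples in the question. The helper name `parse_number` and the variable `scraped` are made up for illustration.

```python
from decimal import Decimal


def parse_number(text):
    """Parse a string such as '1,000,000.99' into an int or a Decimal."""
    # Assumption: commas are thousands separators, so only they are removed
    cleaned = text.strip().replace(",", "")
    # Keep the decimal places exactly as scraped; whole numbers become int
    return Decimal(cleaned) if "." in cleaned else int(cleaned)


scraped = ["1,000,000", "1,000,000.9", "100,000.99", "100,000.999"]
numbers = [parse_number(s) for s in scraped]
print(numbers)
# [1000000, Decimal('1000000.9'), Decimal('100000.99'), Decimal('100000.999')]

# The ',' format option restores the thousands separators for display
formatted = [f"{n:,}" for n in numbers]
print(formatted)
# ['1,000,000', '1,000,000.9', '100,000.99', '100,000.999']
```

`Decimal` is used instead of `float` here so that the number of decimal places is preserved exactly; `float(cleaned)` would also work if binary floating-point rounding is acceptable.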
