
I am parsing a file in Python that contains words and numbers. I am only interested in the numbers, i.e. only the characters 0 to 9, dot (.) and comma (,). I want to keep both dot and comma because some files use American-style numbers, e.g. 3.14159, while others use the European (German) style, e.g. 3,14159.

I would like a simple solution, i.e. without for loops, generators, yields or complicated functions. Using the regular-expression (re) library is completely fine, but it would be great if you could explain what the re function call does, so that we understand how to call it differently later if needed.

My input is a string of mixed numbers and other characters. Two consecutive numbers are always separated by one or more characters other than digits, dots and commas. The desired output is a list of strings, one string per extracted number. The following example contains three numbers to be extracted: 3.14, 3,14 and 85.2.

Example input:

This Is3.14ATes t3,14 85.2

Desired Output:

['3.14', '3,14', '85.2']
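Since consecutive numbers are always separated by at least one character that is not a digit, dot or comma, every run of such characters can be treated as a separator. A minimal sketch, assuming that guarantee holds; `extract_numbers` is a hypothetical helper name, and `filter(None, ...)` drops empty strings without an explicit for loop:

```python
import re

def extract_numbers(text):
    # re.split cuts the string at every run of one or more characters that are
    # not a digit, '.' or ','; whatever remains between the cuts is a number.
    pieces = re.split(r"[^0-9.,]+", text)
    # A separator at the very start or end yields an empty string;
    # filter(None, ...) removes those empty strings.
    return list(filter(None, pieces))

print(extract_numbers("This Is3.14ATes t3,14 85.2"))  # ['3.14', '3,14', '85.2']
```

This approach trusts the input: a dot or comma inside the surrounding words (for example a sentence-ending period) comes back as its own item, so `extract_numbers("Ends. 2")` gives `['.', '2']`.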

My apologies if another post here already addresses exactly this problem. I have searched a lot for similar questions, but the closest I could find was this one: Regular expression numbers with dots and commas, which does not really address my problem because of the format of the input and the desired output. Thanks in advance for your help.

Andy
  • E.g. `\d+[,.]\d+`. There are hundreds of variations though. – Wiktor Stribiżew Jun 06 '19 at 08:03
  • Perhaps better `r'\d+(?:[,.]\d+)*'` for non-decimals too? – yatu Jun 06 '19 at 08:04
  • i.e. `re.findall(r'\d+[.,]\d+', "This Is3.14ATes t3,14 85.2")` – furas Jun 06 '19 at 08:04
  • `new_strings = re.sub(r'[^\d.,]', ' ', my_string).split()` – Frida Schenker Jun 06 '19 at 08:12
  • Even better: `re.findall(r'(?<![.,])\d+[,.]{0,1}\d*', s)` – yatu Jun 06 '19 at 08:19
  • Why did @WiktorStribiżew have to mark it as a duplicate? Why not simply answer the actual question? Live and let live. Thanks. –  Jun 06 '19 at 08:21
  • It is natural to mark it as a duplicate. For you it is only one question, but we see the same (or similar) questions a few times a day, and it is a waste of time to write the same answers again and again when a duplicate question with many more examples in its answers already exists. – furas Jun 06 '19 at 08:45
  • The answer given in the answer linked as a duplicate works straight out of the box: `re.findall(r'\d{1,2}[\,\.]{1}\d{1,2}', s)` gives `['3.14', '3,14', '85.2']`. – MatsLindh Jun 06 '19 at 12:47
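The patterns suggested in the comments differ mainly in how they treat integers and long decimals. The sketch below runs four of them on a hypothetical input that adds a plain integer (`42`) and a longer decimal (`2.71828`); MatsLindh's pattern is written with the equivalent, simpler class `[,.]` instead of `[\,\.]{1}`:

```python
import re

text = "This Is3.14ATes t3,14 85.2 and 42 or 2.71828"

# A separator is required, so the integer 42 is skipped.
strict = re.findall(r"\d+[,.]\d+", text)
# ['3.14', '3,14', '85.2', '2.71828']

# The (?:...) group is optional and repeatable, so 42 is matched too.
# It must be non-capturing: with a capturing group, findall would return
# only the last repetition of the group instead of the whole number.
loose = re.findall(r"\d+(?:[,.]\d+)*", text)
# ['3.14', '3,14', '85.2', '42', '2.71828']

# At most two digits on each side, so 2.71828 is cut down to 2.71.
limited = re.findall(r"\d{1,2}[,.]\d{1,2}", text)
# ['3.14', '3,14', '85.2', '2.71']

# Replace every character that is not a digit, '.' or ',' with a space,
# then let str.split() break the result on whitespace.
via_sub = re.sub(r"[^\d.,]", " ", text).split()
# ['3.14', '3,14', '85.2', '42', '2.71828']
```

Which one fits depends on whether integers without a separator should count as numbers in your files.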

1 Answer

You can use a regex like the following:

import re

input_string = 'This Is3.14ATes t3,14 85.2'

match = re.findall("([0-9]+[,.]+[0-9]+)", input_string)

This will find anything in the following format:

(number)(, or .)(number)
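As for what `re` is doing here: `re.findall(pattern, string)` scans the string from left to right and returns a list of all non-overlapping matches. If the pattern contains exactly one capturing group, as the parentheses in the answer create, the list holds that group's text instead of the whole match; because the group covers the whole pattern here, the result is the same. The sketch below is the same pattern without the group, compiled with `re.VERBOSE` so each part can carry a comment:

```python
import re

# Same pattern as the answer, minus the redundant capturing parentheses.
# re.VERBOSE ignores whitespace in the pattern and allows # comments.
NUMBER = re.compile(
    r"""
    [0-9]+    # one or more digits
    [,.]+     # one or more commas or dots (so '3..14' would also match)
    [0-9]+    # one or more digits
    """,
    re.VERBOSE,
)

matches = NUMBER.findall("This Is3.14ATes t3,14 85.2")
print(matches)  # ['3.14', '3,14', '85.2']
```

Dropping the `+` after `[,.]` would restrict the separator to exactly one character, which matches the (number)(, or .)(number) description more strictly.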
Thibault Bacqueyrisses
metafizicx