
I have the following group of numbers:

SalesCost% Margin
2,836,433.182,201,355.6422.39

Expected Result:

I want to separate this and extract the numbers such that I get the result as shown below:

2,836,433.18
2,201,355.64
22.39

Attempt

I tried the (\d+)(?:\.(\d{1,2}))? regex but this only extracts the number until the first decimal, i.e. I only get 2,836,433.18.

Question

Is there a way to extract the numbers using a regex (or, alternatively, in some other way with Python) to get the results shown above?

Wiktor Stribiżew
Dpak

1 Answer


You can use

re.findall(r'\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?', text)
re.findall(r'(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{1,2})?', text)

See the regex demo #1 and regex demo #2.

Details:

  • \d{1,3} - one, two or three digits
  • (?:,\d{3})* - zero or more occurrences of a comma and three digits
  • (?:\.\d{1,2})? - an optional sequence of . and one or two digits.

The (?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{1,2})? variation also supports numbers like 123456.12, i.e. numbers whose integer part contains no digit grouping symbol. Its first alternative requires at least one comma group (+ instead of *); with * it would match just 123 in 123456.12, and the \d+ alternative would never be tried.
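As a rough illustration, the first pattern can be run against the string from the question as sketched below. The names `GROUPED`, `GROUPED_OR_PLAIN`, `extract_numbers` and `to_float` are hypothetical helpers made up for this example. The second pattern here is this sketch's own variant: it uses `+` in the comma branch so that an ungrouped integer part falls through to `\d+`.

```python
import re

# Input string from the question: three numbers run together.
text = "2,836,433.182,201,355.6422.39"

# Comma-grouped integer part with an optional 1-2 digit fraction.
GROUPED = re.compile(r'\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?')

# Variant assumed for this sketch: "+" makes the grouped branch require
# at least one comma group, so plain integers reach the \d+ branch.
GROUPED_OR_PLAIN = re.compile(r'(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{1,2})?')


def extract_numbers(s, pattern=GROUPED):
    """Return every match of `pattern` in `s` as a string."""
    return pattern.findall(s)


def to_float(number):
    """Convert a matched string such as '2,836,433.18' to a float."""
    return float(number.replace(",", ""))


matches = extract_numbers(text)
print(matches)  # ['2,836,433.18', '2,201,355.64', '22.39']
print([to_float(m) for m in matches])
print(extract_numbers("123456.12", GROUPED_OR_PLAIN))  # ['123456.12']
```

Note that this only works because every fraction in the input has exactly two digits; run-together text with variable-length fractions would be ambiguous for any pattern.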

Wiktor Stribiżew
  • Dear Wiktor, may I ask what the purpose of using non-capturing groups is here? – Anoushiravan R Dec 15 '21 at 21:20
  • @AnoushiravanR When a regex pattern contains capturing groups, `re.findall` returns the captured substrings (a list of strings for one group, a list of tuples for several) instead of the whole matches, so non-capturing groups are a common way to work around this. See [re.findall behaves weird](https://stackoverflow.com/a/31915134/3832970). You may also be interested in [R's equivalent of Python's re.findall](https://stackoverflow.com/a/43401685/3832970). – Wiktor Stribiżew Dec 15 '21 at 22:08
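To make the point in the comment above concrete, here is a small sketch comparing the same pattern written with capturing and with non-capturing groups. The variable names are illustrative only; `re.finditer` with `group(0)` is shown as one alternative when capturing groups must stay in the pattern.

```python
import re

text = "2,836,433.182,201,355.6422.39"

# Same pattern twice: once with capturing groups, once without.
capturing = r'\d{1,3}(,\d{3})*(\.\d{1,2})?'
non_capturing = r'\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?'

# With capturing groups, findall returns one tuple of group contents
# per match; a group that did not take part shows up as ''.
with_groups = re.findall(capturing, text)
print(with_groups)  # [(',433', '.18'), (',355', '.64'), ('', '.39')]

# Without capturing groups, findall returns the whole matches.
whole = re.findall(non_capturing, text)
print(whole)  # ['2,836,433.18', '2,201,355.64', '22.39']

# finditer exposes match objects, so group(0) gives the whole match
# even when the pattern contains capturing groups.
via_finditer = [m.group(0) for m in re.finditer(capturing, text)]
print(via_finditer)  # ['2,836,433.18', '2,201,355.64', '22.39']
```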