-1

I am a newbie to regular expressions. I want to extract the amounts from a given text. Here is my code:

import pandas as pd
import re
msg='He was paid USD 2,000.00 & USD 500 on 19-02-2018 at 08:15:24.'

pattern = re.compile(r'USD\s+(\d+)')

matches = pattern.finditer(msg)


for match in matches:
    print(match)

I want the output to be 2000 and 500, but currently I am getting USD 2 as output. Please help. Note: the original message is very long, but every amount is preceded by USD.

mujahir
  • See https://ideone.com/L0sKWR – Wiktor Stribiżew Sep 09 '18 at 10:54
  • If you are waiting for a solution with a single line of code to extract `2000` from `2,000.00`, bear in mind you cannot match discontinuous texts within one match operation. – Wiktor Stribiżew Sep 09 '18 at 11:58
  • `(?<= )(?:\d+[.,]?)+(?= )`? That is, `re.findall(r'(?<= )(?:\d+[.,]?)+(?= )',msg)`, but this will also capture other numbers like 02, 17,87.87,87 etc. To anchor it to the amounts when the currency is known, it is necessary to do `re.findall(r'(?<=USD )(?:\d+[.,]?)+(?= )',msg)` – Onyambu Sep 09 '18 at 17:37

2 Answers

1

This will be the correct pattern: r'USD\s+([\d,\.]+)'

>>> pattern = re.compile(r'USD\s+([\d,\.]+)')
>>> matches = pattern.finditer(msg)
>>> for match in matches:
...     print(match)
...
<re.Match object; span=(12, 24), match='USD 2,000.00'>
<re.Match object; span=(27, 34), match='USD 500'>

You need to include the commas (,) and dots (.) in your regex. \d matches only digits, so the original pattern stops at the first comma.

Once you strip the commas from the captured group (and drop the decimal part if you do not need it), your job is done.
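
A minimal sketch of that cleanup step, assuming the amounts always use a comma as the thousands separator and a dot as the decimal point. The pattern here is a slightly tighter variation of the one above, so that a full stop ending a sentence is not captured, and the helper name `extract_amounts` is made up for illustration:

```python
import re

msg = 'He was paid USD 2,000.00 & USD 500 on 19-02-2018 at 08:15:24.'

# Digits with optional comma-grouped thousands and an optional decimal part.
# Assumption: a variation on the answer's pattern, not the pattern itself.
pattern = re.compile(r'USD\s+(\d[\d,]*(?:\.\d+)?)')


def extract_amounts(text):
    """Return every USD amount in text as a float (hypothetical helper)."""
    amounts = []
    for match in pattern.finditer(text):
        raw = match.group(1)                  # e.g. '2,000.00'
        amounts.append(float(raw.replace(',', '')))
    return amounts


print(extract_amounts(msg))  # [2000.0, 500.0]
```

Wrapping each value in `int(...)` would give 2000 and 500 if the decimal part is not needed.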

Bharel
  • What if I only want to display the first amount in a line, i.e. only USD 2,000.00? I will be reading a file line by line. – mujahir Sep 09 '18 at 13:17
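
One possible way to get only the first amount per line is `pattern.search`, which returns the first match (or `None`) instead of iterating over all of them. A rough sketch, where the list `lines` stands in for the file and its second and third entries are invented for the example:

```python
import re

pattern = re.compile(r'USD\s+([\d,\.]+)')

# Stand-in for the lines of a file; the last two lines are made up.
lines = [
    'He was paid USD 2,000.00 & USD 500 on 19-02-2018 at 08:15:24.',
    'No amount on this line.',
    'Refund of USD 75 and USD 1,250.50 issued',
]

first_amounts = []
for line in lines:
    match = pattern.search(line)   # first match in the line, or None
    if match:
        first_amounts.append(match.group(0))

print(first_amounts)  # ['USD 2,000.00', 'USD 75']
```

With a real file, the same loop should work over the open file object, since iterating over it yields one line at a time.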
0

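Try this one. It matches the whole-number part of both amounts in the example: the digits land in groups 1 and 3, with the comma in group 2. Note that it needs at least two digits after USD and ignores the decimal part.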

USD\s+(\d+)(,*)(\d+)
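
A short sketch of how this pattern could be used to get plain integers, assuming amounts have at most one thousands separator (a value like 1,234,567 would be cut short):

```python
import re

msg = 'He was paid USD 2,000.00 & USD 500 on 19-02-2018 at 08:15:24.'

pattern = re.compile(r'USD\s+(\d+)(,*)(\d+)')

# Join the digit groups (1 and 3), skipping the comma captured in group 2.
amounts = [int(m.group(1) + m.group(3)) for m in pattern.finditer(msg)]

print(amounts)  # [2000, 500]
```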
Paolo
Imran S