
I'm trying to extract unique invoice ids from strings like these:

1) Payment of invoice nr.2021-3-5450
2) Invoice 2021 3 27 has been paid

The surrounding words can change, but the invoice id format is always:

 - YEAR-MONTH-CUSTOMER_ID, or
 - YEAR MONTH CUSTOMER_ID

CUSTOMER_ID ranges from 1 to 9999.
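The two formats use either dashes throughout or spaces throughout, so they can be expressed as one pattern in which the second separator must repeat the first, using a backreference. This is a minimal sketch that assumes mixed separators such as `2021-3 27` should be rejected. The names `INVOICE_ID` and `find_invoice_id` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: group 2 captures the first separator ("-" or " "),
# and \2 forces the second separator to be the same character.
INVOICE_ID = re.compile(r"\b(\d{4})([- ])(\d{1,2})\2(\d{1,4})\b")


def find_invoice_id(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (year, month, customer_id) for the first consistent match, else None."""
    m = INVOICE_ID.search(text)
    if m is None:
        return None
    year, _sep, month, customer_id = m.groups()
    return int(year), int(month), int(customer_id)


print(find_invoice_id("Payment of invoice nr.2021-3-5450"))  # (2021, 3, 5450)
print(find_invoice_id("Invoice 2021 3 27 has been paid"))    # (2021, 3, 27)
print(find_invoice_id("Invoice 2021-3 27 has been paid"))    # None (mixed separators)
```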

I have tried this:

m = re.search(r"\d+", s)

But it only matches 2021. Is there a way to capture all the numbers in the formats above?
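`re.search` stops at the first match, so `m.group()` is just `'2021'`. `re.findall` instead returns every non-overlapping match as a list of strings. A quick sketch comparing the two on the sample strings:

```python
import re

samples = [
    "Payment of invoice nr.2021-3-5450",
    "Invoice 2021 3 27 has been paid",
]

for s in samples:
    first = re.search(r"\d+", s)   # only the first run of digits
    every = re.findall(r"\d+", s)  # all runs of digits, as strings
    print(first.group(), every)

# 2021 ['2021', '3', '5450']
# 2021 ['2021', '3', '27']
```

Note that this does not check the YEAR MONTH CUSTOMER_ID structure: any other number in the text would be picked up as well.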

Mako212
bll
  • Please update to show the work you have attempted so far (or at least the minimal code you have tried) – rv.kvetch Sep 20 '21 at 17:14
  • Thanks for the suggestion @rv.kvetch, I have updated the question. – bll Sep 20 '21 at 17:29
  • `re.findall(r'\d+', s)` works for you. – Ryszard Czech Sep 20 '21 at 20:48
  • Yes, that's a good point; something that simple should work. However, it wasn't clear from the OP whether the regex also needs to validate the values, for example checking that customer_id does not exceed the maximum (9999). If not, then this approach does indeed work perfectly. – rv.kvetch Sep 20 '21 at 20:55
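If the ranges do need to be enforced, one option is to keep the regex loose and check the numbers as integers after matching, which is easier to read than encoding numeric ranges as regex alternations. This sketch assumes month must be 1 to 12 and customer id 1 to 9999 (the bound stated in the question); `LOOSE_ID` and `parse_checked` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Loose structural match: 4-digit year, 1-2 digit month, 1-4 digit customer id.
LOOSE_ID = re.compile(r"(\d{4})[- ](\d{1,2})[- ](\d{1,4})\b")


def parse_checked(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (year, month, customer_id) if the first match is in range, else None."""
    m = LOOSE_ID.search(text)
    if m is None:
        return None
    year, month, customer_id = map(int, m.groups())
    # Assumed validation rules: month 1-12, customer id 1-9999.
    if not 1 <= month <= 12 or not 1 <= customer_id <= 9999:
        return None
    return year, month, customer_id


print(parse_checked("Payment of invoice nr.2021-3-5450"))  # (2021, 3, 5450)
print(parse_checked("Invoice 2021 13 27 has been paid"))   # None (month out of range)
print(parse_checked("Invoice 2021 3 0 has been paid"))     # None (customer id 0)
```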

1 Answer

Try this out in the Regex playground: Link

Regex: (\d{4})[- ](0?[1-9]|1[0-2])[- ](\d{1,4})\b

Explanation: Matches the year as a 4-digit number, the month as an integer from 1 to 12 (optionally with a leading zero, e.g. 03), and the customer id as a 1- to 4-digit integer (0 to 9999); the parts can be separated by either dashes or spaces. The groups are captured as (year, month, customer_id), in that order. Note that \d{1,4} also accepts 0, and the two separators are not required to match.
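The same pattern can be laid out with `re.VERBOSE` so that each part carries its own comment. Behavior is unchanged: verbose mode ignores whitespace and `#` comments in the pattern, except inside a character class, so the literal space in `[- ]` still matches a space. `INVOICE_VERBOSE` is a hypothetical name.

```python
import re

INVOICE_VERBOSE = re.compile(
    r"""
    (\d{4})              # year: exactly four digits
    [- ]                 # separator: dash or space (space kept inside the class)
    (0?[1-9]|1[0-2])     # month: 1-12, optional leading zero
    [- ]                 # separator again
    (\d{1,4})\b          # customer id: one to four digits, then a word boundary
    """,
    re.VERBOSE,
)

m = INVOICE_VERBOSE.search("Invoice 2021 3 27 has been paid")
print(m.groups())  # ('2021', '3', '27')
```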


Python demo:

import re
from typing import Optional, NamedTuple


invoice_re = re.compile(r'(\d{4})[- ](0?[1-9]|1[0-2])[- ](\d{1,4})\b')

# NamedTuple that holds the parsed invoice data
class Invoice(NamedTuple):
    year: int
    month: int
    customer_id: int


def parse_invoice(invoice: str) -> Optional[Invoice]:
    """Parse an invoice string; return an Invoice(year, month, customer_id), or None if no id is found."""
    result = invoice_re.search(invoice)
    return Invoice(*map(int, result.groups())) if result else None


s1 = 'Payment of invoice nr.2021-3-5450'
s2 = 'Invoice 2021 3 27 has been paid'

print(parse_invoice(s1))  # Invoice(year=2021, month=3, customer_id=5450)
print(parse_invoice(s2))  # Invoice(year=2021, month=3, customer_id=27)
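The question asks for *unique* invoice ids. One way to build on the answer's pattern, assuming a string may contain several ids and that `2021-03-5450` and `2021-3-5450` should count as the same id once converted to integers, is to use `finditer` and collect the results in a dict, which drops duplicates while keeping first-seen order. `unique_invoice_ids` is a hypothetical name.

```python
import re
from typing import Iterable, List, Tuple

# Same pattern as in the answer.
invoice_re = re.compile(r'(\d{4})[- ](0?[1-9]|1[0-2])[- ](\d{1,4})\b')


def unique_invoice_ids(texts: Iterable[str]) -> List[Tuple[int, ...]]:
    """Collect every (year, month, customer_id) across texts, without duplicates."""
    seen = {}  # dicts keep insertion order (Python 3.7+), so first-seen order is preserved
    for text in texts:
        for m in invoice_re.finditer(text):  # every match in the text, not just the first
            key = tuple(map(int, m.groups()))
            seen.setdefault(key, None)
    return list(seen)


messages = [
    'Payment of invoice nr.2021-3-5450',
    'Invoice 2021 3 27 has been paid',
    'Reminder: invoice 2021-03-5450 was paid twice',
]
print(unique_invoice_ids(messages))
# [(2021, 3, 5450), (2021, 3, 27)]
```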
rv.kvetch