1

I have string like below

string = "your invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967"

I want to get invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967 into list using regex with this pattern

       result = re.findall(r'INV[/]\d{8}[/](M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))[/](M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))[/]\d{7,9}',string)  

but the result is

[('XVII',  '', '','',  '', '',  '',  '', 'X',  'VII', '',  '', '',  'V','','','',  '', '',  '', '',  '', '',  '', '',  'V')]

I tried this pattern in http://regexr.com/, the result is appropriately but in python not

PWS
  • 195
  • 3
  • 11

4 Answers4

0
string = "your invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967"
results = []
matches = re.finditer(regexpattern, string)
for matchNum, match in enumerate(matches):
    results.append(match.group())
PWS
  • 195
  • 3
  • 11
0

You need to add ?: before all the groups so that you can use non-capturing groups

Try with this regex:

IVR[/]\d{8}[/](?:M{0,4}(?:CM|CD|D?C{0,3})|(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))[/](?:M{0,4}(?:CM|CD|D?C{0,3})|(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))[/]\d{8}

Basically you need to add ?: for each group.

0

You should modify your pattern, add normal brackets around whole regular expression, and afterwards access that text with first back-reference. You can read more about back-references here.

invoices = []
# Your pattern was slightly incorrect
pattern = re.compile(r'IVR[/]\d{8}[/](M{1,4}(CM|CD|D?C{0,3})|(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))[/](M{1,4}(CM|CD|D?C{0,3})|(XC|XL|L?X{0,3})|(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))[/]\d{7,9}') 

# For each invoice pattern you find in string, append it to list
for invoice in pattern.finditer(string):
    invoices.append(invoice.group(1))

Note:

You should also use pattern.finditter() because that way you can iterate trough all pattern findings in text you called string. From re.finditer documentation:

re.finditer(pattern, string, flags=0) Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

Aleksandar Makragić
  • 1,957
  • 17
  • 32
0

You can try this one to retrieve number, roman, roman and number values:

IVR\/(\d{8})\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(\d{7,9})

Demo

Snippet

import re

string = "your invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967"

pattern = r"IVR\/(\d{8})\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(\d{7,9})"

for match in re.findall(pattern, string):
    print(match)

Run online

Stephane Janicaud
  • 3,531
  • 1
  • 12
  • 18