Regex Grouping and extraction of particular group based on regex look behind

Question

I want to extract the digits that follow an occurrence of 'ID' in the text below. This is how I currently get them:

import re

txt="Recharge done on 28-12-2017 04:57PM,MRP:Rs9.00,GST 18% payable by Company/Distributor/Retailer:Rs1.37, ID 147894886."

# 'ID' must be present as a mandatory group
regex = r'(id)(.*?)(\d+)'

# IGNORECASE lets 'id' match 'ID'; DOTALL lets '.' also match newlines
rg = re.compile(regex, re.IGNORECASE | re.DOTALL)
m = rg.search(txt)
if m:
    print(m.group(3))

When I run the code above, it prints

147894886

Here is the problem.

If txt becomes like this

txt="Recharge done on 28-12-2017 04:57PM,MRP:Rs9.00,GST 18% payable by Company/Distributor/Retailer:Rs1.37, TransID 147894886."

and the word "Trans" appears before "ID", then I don't want to extract the digits. How can I do that in a regex, i.e. extract the digits only when a standalone "ID" precedes them, and not when "TransID" does?

Are you specifically looking for `trans`, or do you want to ensure that `id` is a whole word? If the latter, see https://stackoverflow.com/questions/1751301/regex-match-entire-words-only – Sebastian Proske, Feb 08 '18 at 13:16
I want to ensure that the characters before ID are not any of '(trans|trx|transc)', etc. – Ankur Choraywal, Feb 08 '18 at 13:28

Pandraghon · Accepted Answer · 2018-02-08T13:27:30.980

2

You can use a negative lookbehind:

(?<!trans)(id)(.*?)(\d+)
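
As a quick check of the lookbehind pattern, here is a minimal Python 3 sketch that runs it against both messages from the question. The helper name `extract_id` is made up for illustration, and the flags mirror the ones used in the question.

```python
import re

# Both sample messages come from the question.
PLAIN = ("Recharge done on 28-12-2017 04:57PM,MRP:Rs9.00,GST 18% payable by "
         "Company/Distributor/Retailer:Rs1.37, ID 147894886.")
PREFIXED = PLAIN.replace(", ID ", ", TransID ")

# (?<!trans) rejects an "id" that is immediately preceded by "trans".
LOOKBEHIND = re.compile(r"(?<!trans)(id)(.*?)(\d+)", re.IGNORECASE | re.DOTALL)


def extract_id(text):
    """Hypothetical helper: return the digits after a bare 'ID', else None."""
    match = LOOKBEHIND.search(text)
    return match.group(3) if match else None


print(extract_id(PLAIN))     # 147894886
print(extract_id(PREFIXED))  # None
```

The lookbehind only inspects the five characters directly before `id`, so other prefixes such as `Trx` would still get through with this pattern alone.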

Or, as Sebastian Proske suggests, you can use a word boundary:

\b(id)(.*?)(\d+)
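
A rough sketch of how the word-boundary version behaves, again in Python 3. The helper `digits_after_id` and the shortened sample strings are invented for illustration. Note that `\b` rejects any word character before `id`, not only `trans`, but it still accepts words that merely start with `id`.

```python
import re

BOUNDARY = re.compile(r"\b(id)(.*?)(\d+)", re.IGNORECASE | re.DOTALL)


def digits_after_id(text):
    """Hypothetical helper: group 3 of the first match, or None."""
    match = BOUNDARY.search(text)
    return match.group(3) if match else None


print(digits_after_id("Retailer:Rs1.37, ID 147894886."))       # 147894886
print(digits_after_id("Retailer:Rs1.37, TransID 147894886."))  # None
print(digits_after_id("Retailer:Rs1.37, TrxID 147894886."))    # None
# Caveat: "idle" starts at a word boundary too, so this still matches.
print(digits_after_id("idle for 30 minutes"))                  # 30
```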

edited Feb 08 '18 at 13:27

answered Feb 08 '18 at 13:15

What if I want to use this regex: (?<!trans|transaction|trx|tranc|transc)(id|account|landline|fixedline|towards|dsl|a\/c|ca|no|number)(.*?)(\d{6,12}) ? It fails with "error: look-behind requires fixed-width pattern". – Ankur Choraywal Feb 08 '18 at 13:26
If there are several possibilities for the negative lookbehind, e.g. transacid, transid or trxid, then none of them should match and return the group. – Ankur Choraywal Feb 08 '18 at 13:30
You can separate the excluded strings into several negative lookbehind groups: [like here](https://regex101.com/r/n8q5X1/3) – Pandraghon Feb 08 '18 at 13:30
Worked like a charm. I didn't know it could be done like that. Thanks a lot! Cheers :) – Ankur Choraywal Feb 08 '18 at 13:32
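
The chained-lookbehind idea from the comments above can be sketched in Python 3 as follows. This is an assumption about what the linked demo does, not a copy of it: the excluded prefixes (`transac`, `trans`, `trx`) are taken from the examples in the comments, and the pattern is cut down to the `id` keyword to keep the sketch short.

```python
import re

# Python's re module rejects a lookbehind whose alternatives differ in length.
try:
    re.compile(r"(?<!trans|transac|trx)(id)")
    alternation_error = None
except re.error as exc:
    alternation_error = str(exc)
print(alternation_error)

# One fixed-width lookbehind per excluded prefix compiles fine.
CHAINED = re.compile(
    r"(?<!transac)(?<!trans)(?<!trx)(id)(.*?)(\d{6,12})",
    re.IGNORECASE | re.DOTALL,
)

# Map each label to the extracted digits (None when the label is excluded).
results = {}
for label in ("ID", "TransID", "TransacID", "TrxID"):
    match = CHAINED.search("Retailer:Rs1.37, %s 147894886." % label)
    results[label] = match.group(3) if match else None
print(results)
```

Each lookbehind is checked at the same position, so the match is rejected as soon as any one of the prefixes is found directly before `id`.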

score 0 · Answer 2 · answered Feb 08 '18 at 13:40

You can use a word boundary (\b) to make sure that ID is a full word.

\b(id)(.*?)(\d+)

It might also help to make your pattern less general. If ID is always followed by a single space and then 9 digits, you can use this regex:

\b(id)([ ])(\d{9})
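
To illustrate the trade-off, here is a small Python 3 sketch of the stricter pattern. The helper `strict_id` and the last two sample strings are made up for illustration; they show that a tighter pattern also rejects formats it was not written for.

```python
import re

STRICT = re.compile(r"\b(id)([ ])(\d{9})", re.IGNORECASE)


def strict_id(text):
    """Hypothetical helper: the 9 digits after 'ID ', or None."""
    match = STRICT.search(text)
    return match.group(3) if match else None


print(strict_id("Retailer:Rs1.37, ID 147894886."))       # 147894886
print(strict_id("Retailer:Rs1.37, TransID 147894886."))  # None
# A colon after ID, or fewer than 9 digits, no longer matches.
print(strict_id("Retailer:Rs1.37, ID: 147894886."))      # None
print(strict_id("Retailer:Rs1.37, ID 1478."))            # None
```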
