I want to extract the digits that follow an occurrence of 'ID' in the text below. This is how I currently do it:

import re

txt="Recharge done on 28-12-2017 04:57PM,MRP:Rs9.00,GST 18% payable by Company/Distributor/Retailer:Rs1.37, ID 147894886."

# 'ID' must be present as a mandatory group
regex = r'(id)(.*?)(\d+)'

rg = re.compile(regex, re.IGNORECASE | re.DOTALL)
m = rg.search(txt)
if m:
    # group(3) holds the digits that follow "ID"
    print(m.group(3))

When I run the code above, it prints:

147894886

Here is the problem.

If txt looks like this instead:

txt="Recharge done on 28-12-2017 04:57PM,MRP:Rs9.00,GST 18% payable by Company/Distributor/Retailer:Rs1.37, TransID 147894886."

and the word "Trans" appears directly before "ID", then I don't want to extract the digits. How can I do that with a regex, i.e. extract the digits only when "ID" stands on its own, and not when it is part of "TransID"?

  • Are you specifically looking for `trans`, or do you want to ensure that `id` is a whole word? If the latter, see https://stackoverflow.com/questions/1751301/regex-match-entire-words-only – Sebastian Proske Feb 08 '18 at 13:16
  • I want to ensure that the characters before ID are not any of '(trans|trx|transc)' etc. – Ankur Choraywal Feb 08 '18 at 13:28

2 Answers

You can use a negative lookbehind, which rejects `id` when it is immediately preceded by `trans`:

(?<!trans)(id)(.*?)(\d+)

Or, as Sebastian Proske suggests, you can use a word boundary, so that `id` only matches at the start of a word:

\b(id)(.*?)(\d+)

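
To compare the two patterns above side by side, here is a minimal sketch. The helper name `extract_digits` and the shortened sample messages are assumptions made for illustration (the "TrxID" case is hypothetical and comes from the prefixes mentioned in the comments, not from the question's data).

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# Negative lookbehind: "id" must not be immediately preceded by "trans".
lookbehind = re.compile(r'(?<!trans)(id)(.*?)(\d+)', FLAGS)

# Word boundary: "id" must start a word, so any letter or digit prefix blocks it.
boundary = re.compile(r'\b(id)(.*?)(\d+)', FLAGS)


def extract_digits(pattern, text):
    """Return the digits captured in group 3, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(3) if m else None


# Shortened versions of the question's messages, plus a hypothetical "TrxID" case.
samples = [
    "Rs1.37, ID 147894886.",
    "Rs1.37, TransID 147894886.",
    "Rs1.37, TrxID 147894886.",
]

for text in samples:
    print(text, extract_digits(lookbehind, text), extract_digits(boundary, text))
```

Both patterns accept the plain "ID" message and reject "TransID". They differ on "TrxID": the lookbehind only knows about `trans`, so it still extracts the digits there, while the word boundary rejects every prefix made of word characters.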

Pandraghon
  • What if I want to use this regex: `(?<!trans|transaction|trx|tranc|transc)(id|account|landline|fixedline|towards|dsl|a\/c|ca|no|number)(.*?)(\d{6,12})`? It fails with "error: look-behind requires fixed-width pattern". – Ankur Choraywal Feb 08 '18 at 13:26
  • If there are several possible prefixes for the negative lookbehind, e.g. transacid, transid or trxid, then it should not match for any of them either. – Ankur Choraywal Feb 08 '18 at 13:30
  • You can split the excluded strings into several negative lookbehind groups, [like here](https://regex101.com/r/n8q5X1/3) – Pandraghon Feb 08 '18 at 13:30
  • Worked like a charm. I didn't know that it could be done like that. Thanks a lot! Cheers :) – Ankur Choraywal Feb 08 '18 at 13:32
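
The approach from the comments, one negative lookbehind per excluded prefix, can be sketched as follows. Python's `re` module needs each lookbehind to have a fixed width, which is why a single alternation of prefixes with different lengths is rejected. The names `blocked_prefixes` and `extract_digits` are assumptions for illustration, and the sketch keeps only the `id` keyword rather than the longer keyword list from the comment.

```python
import re

# Prefixes taken from the comment thread; extend the list as needed.
blocked_prefixes = ["trans", "transaction", "trx", "tranc", "transc"]

# Build one fixed-width negative lookbehind per prefix and chain them.
# All of them must succeed, so "id" is rejected if any prefix precedes it.
lookbehinds = "".join("(?<!{})".format(re.escape(p)) for p in blocked_prefixes)

pattern = re.compile(lookbehinds + r"(id)(.*?)(\d{6,12})", re.IGNORECASE | re.DOTALL)


def extract_digits(text):
    """Return the 6 to 12 digits that follow a standalone "id", or None."""
    m = pattern.search(text)
    return m.group(3) if m else None


for text in ["ID 147894886", "TransID 147894886", "trxID 147894886", "transcID 147894886"]:
    print(text, extract_digits(text))
```

Building the lookbehinds from a list keeps the excluded prefixes in one place, so adding another one does not require editing the pattern by hand.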

You can use a word boundary (\b) to make sure that ID is a full word.

\b(id)(.*?)(\d+)

It might also help to make your pattern more specific. If ID is always followed by a single space and then exactly 9 digits, you can use this regex:

\b(id)([ ])(\d{9})

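
As a quick check of the stricter pattern, here is a minimal sketch. The helper name `extract_strict` and the sample strings other than the question's own messages are assumptions for illustration.

```python
import re

# "id" as a whole word, exactly one space, then nine digits.
strict = re.compile(r'\b(id)([ ])(\d{9})', re.IGNORECASE)


def extract_strict(text):
    """Return the nine digits after a standalone "ID ", or None."""
    m = strict.search(text)
    return m.group(3) if m else None


samples = [
    "Rs1.37, ID 147894886.",       # standalone ID, one space, nine digits
    "Rs1.37, TransID 147894886.",  # "ID" is not at a word boundary
    "Rs1.37, ID: 147894886.",      # a colon instead of a single space
    "Rs1.37, ID 12345.",           # fewer than nine digits
]

for text in samples:
    print(text, extract_strict(text))
```

Only the first sample yields digits. Note that `\d{9}` on its own would still match the first nine digits of a longer number, so a trailing `\b` may be worth adding if longer numbers can occur in your data.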

Darrick Herwehe