I need to extract a string from a document using a regex in Python. The string always starts with either "AK" or "BK", followed by letters, digits, "-" or "/" in any order. It can appear anywhere in the document.

document_text="""
This is the organization..this is the address. 
some information
AK3418CPMP
lot of other information down
"""
I have written the following code:
import re

pattern = r"(?:AK|BK)[A-Za-z0-9-/]+"
res_list = re.findall(pattern, document_text)

but the resulting list also contains other words that start with AK or BK, something like this:

res_list = ['AKBN', 'BKCPU', 'AK3418CPMP']
It also matches the words "AKBN" and "BKCPU" along with the required "AK3418CPMP". I want to extract only the one string "AK3418CPMP", under the following conditions:
1. The string should start with AK or BK.
2. It should be followed by a mix of letters and numbers, in either order.
3. It can contain "-" or "/".

How can I do that?

TLanni
  • Use a non-capturing group `(?:AK|BK)[A-Za-z0-9-/]+` or shorten it to `[AB]K[A-Za-z0-9-/]+` – The fourth bird Aug 02 '21 at 10:53
  • @Thefourthbird I think it's `[AB]K`, not `A[BK]`. – fsimonjetz Aug 02 '21 at 10:56
  • Thanks, but doing that also matches the words "AKBN" and "BKCPU" along with the required "AK3418CPMP". I want the conditions to be the following: 1. The string should start with AK or BK. 2. It should be followed by letters and numbers, or numbers and letters. 3. It can contain "-" or "/". I would really appreciate the help. – TLanni Aug 02 '21 at 11:02
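
One possible way to express the three conditions restated above is with lookaheads. This is only a sketch under an assumption: that "letters and numbers" means the part after AK/BK must contain at least one digit and at least one letter. The names `CODE_PATTERN` and `extract_codes`, and the sample text, are made up for illustration. If digit-only tails such as "AK3418" should also count, the second lookahead can be dropped.

```python
import re

# Assumed reading of the requirements (not confirmed by the question):
#   \b                      start at a word boundary, so "XAK12A" is skipped
#   [AB]K                   prefix AK or BK
#   (?=[A-Za-z/-]*[0-9])    the tail must contain at least one digit
#   (?=[0-9/-]*[A-Za-z])    the tail must contain at least one letter
#   [A-Za-z0-9/-]+          the tail itself: letters, digits, "-" or "/"
CODE_PATTERN = re.compile(
    r"\b[AB]K(?=[A-Za-z/-]*[0-9])(?=[0-9/-]*[A-Za-z])[A-Za-z0-9/-]+"
)


def extract_codes(text):
    """Return every AK/BK code in text that satisfies the assumed rules."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return CODE_PATTERN.findall(text)


# Hypothetical sample containing the unwanted words from the question.
sample_text = """
This is the organization AKBN..this is the address.
some information about BKCPU
AK3418CPMP
lot of other information down
"""

print(extract_codes(sample_text))  # ['AK3418CPMP']
```

The lookaheads only scan characters allowed in the tail, so a digit or letter in a later word cannot satisfy them. "AKBN" and "BKCPU" are rejected because no digit follows the prefix within the same token.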

0 Answers