
I have a string that I need to match using a regex. It works fine when there is a single occurrence on a line; however, when there are multiple occurrences of the same kind of string on a single line, I'm not getting any matches. Can you please help?

Sample strings:

MS17010314 MS00030208 IL00171198 IH09850115 IH99400409 IH99410409
IL01771010 IL01791002 IL01930907 IL02360907 CM00010904 IH09520115
MS00201285 MS19050708 MS00370489 MS19011285T

Regex that I tried:

(([A-Z]{2}[0-9]{8,9}[A-Z]{1})|([A-Z]{2}[0-9]{8,9}))
Veltzer Doron
Venkatesh
  • Can you show your regex code? It could be that it's just matching the first occurrence. Which programming language are you using? – rawwar May 21 '18 at 12:21
  • Kalyan, I am using Python. I am testing this in regexonline.com to verify prior to adding the code in my program. – Venkatesh May 21 '18 at 12:22
  • If you're using Python, use findall/finditer or search instead of match. re.match only matches at the start of the string, so it will not find later occurrences; see https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match – Veltzer Doron May 21 '18 at 12:23
  • @Venkatesh, I just answered it. Check it out. – rawwar May 21 '18 at 12:23
  • Ah, thank you! I found the problem. I was joining the patterns with the | (alternation) operator, and a missing closing parenthesis was causing the error. Thank you all! – Venkatesh May 21 '18 at 12:34
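
To illustrate the difference between match, search and findall mentioned in the comments, here is a minimal sketch. The shortened sample line and the simplified pattern (an optional trailing letter instead of two alternatives) are assumptions made for the example, not the asker's actual code.

```python
import re

# Shortened sample line and simplified pattern, both chosen for illustration.
line = "MS17010314 MS00030208 IL00171198"
pattern = r"[A-Z]{2}[0-9]{8,9}[A-Z]?"

at_start = re.match(pattern, line)          # only tries at position 0
anywhere = re.search(pattern, line)         # first match anywhere in the string
all_matches = re.findall(pattern, line)     # every non-overlapping match

# With a leading space, match() fails because the code no longer starts at position 0.
shifted = re.match(pattern, " " + line)

print(at_start.group())
print(anywhere.group())
print(all_matches)
print(shifted)
```

In this sketch match and search both return only the first code, while findall returns all three; that is usually the function to reach for when a line can contain several occurrences.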

2 Answers


I tried this in Python, and the following code worked:

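import re

s = '''MS17010314 MS00030208 IL00171198 IH09850115 IH99400409 IH99410409
IL01771010 IL01791002 IL01930907 IL02360907 CM00010904 IH09520115
MS00201285 MS19050708 MS00370489 MS19011285T'''
# The two alternatives from the question, each wrapped in a non-capturing
# group so that findall returns plain strings rather than tuples.
lst_of_regex = [r'(?:[A-Z]{2}[0-9]{8,9}[A-Z])', r'(?:[A-Z]{2}[0-9]{8,9})']
pattern = '|'.join(lst_of_regex)  # join the alternatives with the regex | operator
print(re.findall(pattern, s))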
rawwar
  • Thanks Kalyan. The problem is that I have at least 20 other strings that I need to match. So I am creating a list of all the regular expressions and joining them with the | operator before running re.findall(pattern) over the string. – Venkatesh May 21 '18 at 12:25
  • I just modified it, take a look. – rawwar May 21 '18 at 12:27
  • One note: keep each of your regex patterns in (). – rawwar May 21 '18 at 12:29
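
For a longer list of patterns, the joining step can be wrapped in a small helper. The sketch below is one possible way to do it; `combine_patterns` is a hypothetical name, and the shortened `text` is chosen for illustration. It also shows that the order of the alternatives matters, because the regex engine takes the first alternative that matches at a given position.

```python
import re

def combine_patterns(patterns):
    """Hypothetical helper: wrap each pattern in a non-capturing group and join with |."""
    return re.compile("|".join(f"(?:{p})" for p in patterns))

with_suffix = r"[A-Z]{2}[0-9]{8,9}[A-Z]"
without_suffix = r"[A-Z]{2}[0-9]{8,9}"

text = "MS00370489 MS19011285T"

# Longer alternative first: the trailing letter is kept.
longer_first = combine_patterns([with_suffix, without_suffix]).findall(text)
# Shorter alternative first: the trailing letter is silently dropped.
shorter_first = combine_patterns([without_suffix, with_suffix]).findall(text)

print(longer_first)
print(shorter_first)
```

Listing the more specific pattern first avoids losing the trailing T on codes such as MS19011285T.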

This seems to work fine:

a = '''MS17010314 MS00030208 IL00171198 IH09850115 IH99400409 IH99410409
IL01771010 IL01791002 IL01930907 IL02360907 CM00010904 IH09520115
MS00201285 MS19050708 MS00370489 MS19011285T'''

import re

patterns = ['[A-Z]{2}[0-9]{8,9}[A-Z]{1}','[A-Z]{2}[0-9]{8,9}']
pattern = '({})'.format(')|('.join(patterns))

matches = re.findall(pattern, a)

print([match for sub in matches for match in sub if match])
#['MS17010314', 'MS00030208', 'IL00171198', 'IH09850115', 'IH99400409',
# 'IH99410409', 'IL01771010', 'IL01791002', 'IL01930907', 'IL02360907',
# 'CM00010904', 'IH09520115', 'MS00201285', 'MS19050708', 'MS00370489',
# 'MS19011285T']

I've added a way to combine all patterns.
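
If it is also useful to know which pattern produced each match, a variation on the same idea is to give every alternative a named group and read `lastgroup` from each match object. This is only a sketch: the group names `with_suffix` and `plain` and the shortened `text` are made up for the example.

```python
import re

# Hypothetical labels for the two alternatives from the question.
named_patterns = {
    "with_suffix": r"[A-Z]{2}[0-9]{8,9}[A-Z]",
    "plain": r"[A-Z]{2}[0-9]{8,9}",
}

# Build (?P<with_suffix>...)|(?P<plain>...) from the dictionary.
pattern = re.compile(
    "|".join(f"(?P<{name}>{p})" for name, p in named_patterns.items())
)

text = "MS00370489 MS19011285T"

# lastgroup is the name of the group that matched; group() is the matched text.
labelled = [(m.lastgroup, m.group()) for m in pattern.finditer(text)]
print(labelled)
```

Using finditer here avoids the tuple flattening step, since each match object already carries both the text and the name of the alternative that matched.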

zipa