I have a problem with case-insensitive searching using regular expressions. Here is part of the code that I wrote:

engType = 'XM665'

The value of engType was extracted from other files. Based on engType, I want to find the line in another text file that contains this string and extract the description from that line. The description is the part between the engType string and 'Serial'.

For instance:

lines = ['xxxxxxxxxxx','mmmmmmmmmmm','jjjjj','xM665 Module 01 Serial (10-11)']
pat = re.compile(engType+'(.*?)[Ss][Ee][Rr][Ii][Aa][Ll]')
for line in lines:
    des = pat.search(line).strip()
    if des:
        break;
print des.group(1).strip()

I know this will fail, since the case of my engType string differs from the case in 'xM665 Module 01 Serial (10-11)'. I understand that I can use [Ss] for case-insensitive comparison, as I have done in the last part of pat. However, since engType is a variable, I cannot apply that approach to it. I know I could search in lower case instead:

lines = ['xxxxxxxxxxx','mmmmmmmmmmm','jjjjj','xM665 Module 01 Serial (10-11)']
pat = re.compile(engType.lower()+'(.*?)serial')
for line in lines:
    des = pat.search(line.lower()).strip()
    if des:
        break; 
print des.group(1).strip()

result:

module 01

The case is now different from the original Module 01. If I want to keep the original case, how can I do this? Thank you!

fyr91

2 Answers

re.IGNORECASE is the flag you're looking for.

pat = re.compile(engType+'(.*?)[Ss][Ee][Rr][Ii][Aa][Ll]',re.IGNORECASE)

Or, more simply, re.compile(engType+'(.*?)serial',re.IGNORECASE), since the flag makes the [Ss]-style character classes redundant.

Also, there is a bug in this line:

des = pat.search(line.lower()).strip()

Remove the .strip(). pat.search() returns either a match object or None, and neither has a .strip() method, so this call raises an AttributeError.

roippi

Check out re.IGNORECASE in http://docs.python.org/3/library/re.html

I believe it'll look like:

pat = re.compile(engType.lower()+'(.*?)serial', re.IGNORECASE)
dstromberg
  • Yes, this works, thank you! I think re.compile(engType+'(.*?)serial', re.IGNORECASE) will do; there is no need to lowercase engType. – fyr91 Nov 22 '13 at 05:32
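
Putting the suggestions above together, a minimal sketch of the whole loop might look like the following. It reuses engType and lines from the question and drops the stray .strip() on the search result. The names match and description are illustrative, and the re.escape() call is an addition not mentioned in either answer: it is included on the assumption that engType could contain regex metacharacters, and can be omitted if that never happens.

```python
import re

engType = 'XM665'
lines = ['xxxxxxxxxxx', 'mmmmmmmmmmm', 'jjjjj', 'xM665 Module 01 Serial (10-11)']

# re.IGNORECASE makes the whole pattern case-insensitive, so neither
# engType nor the searched line needs to be lowercased.
# re.escape() is an assumption-driven precaution (not from the answers):
# it treats engType as literal text rather than as regex syntax.
pat = re.compile(re.escape(engType) + '(.*?)serial', re.IGNORECASE)

description = None
for line in lines:
    match = pat.search(line)  # a match object, or None if nothing matched
    if match:
        # group(1) is taken from the original line, so its case is preserved
        description = match.group(1).strip()
        break

print(description)  # prints: Module 01
```

Because the search runs against the original line rather than line.lower(), the captured text keeps its original case. If no line matches, description stays None instead of raising an error.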