
I need to match a letter in the a-c range, but only if it is followed by another character.

For example, "hello a" should not match 'a' as it is the last character in the string.

    import re 
      
    my_txt = "An investment in knowledge pays the best interest."
    
    def LetterCompiler(txt):
        result = re.findall(r'([a-c]).+?', txt)
        return result
    
    print(LetterCompiler(my_txt))

The problem with this code is that consecutive characters are not matched.

For example, in the string "abc", only 'a' is matched, but not the letter 'b', even though it fits the criteria.

I could use the regular expression r"[a-c]" to get all instances, but it would also match the character when it is at the end of the string.

1 Answer

Rather than matching the following character(s), use a lookahead, which asserts that a character follows without consuming it:

    import re

    my_txt = "An investment in knowledge pays the best interest."

    def LetterCompiler(txt):
        result = re.findall(r'[a-c](?=.)', txt)
        return result

    print(LetterCompiler(my_txt)) # ['a', 'b']
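
As a quick check against the two cases from the question, here is a minimal sketch comparing the original consuming pattern with the lookahead. The variable names are only illustrative:

```python
import re

# Pattern from the question: '.+?' consumes the character after the letter,
# so the scan resumes past 'b' and never tests it.
consuming = re.findall(r'([a-c]).+?', "abc")

# Lookahead version: '(?=.)' only asserts that a character follows.
lookahead = re.findall(r'[a-c](?=.)', "abc")

# A letter at the very end of the string has nothing after it.
at_end = re.findall(r'[a-c](?=.)', "hello a")

print(consuming)  # ['a']
print(lookahead)  # ['a', 'b']
print(at_end)     # []
```

Because the lookahead is zero-width, the next search starts immediately after the matched letter, which is why consecutive letters are all found.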

If you also want to match characters at the end of a line, just not at the end of the string, then use a negative lookahead for $ instead, or keep (?=.) and pass the re.DOTALL flag so that . also matches a newline:

    import re

    my_txt = """An investment in knowledge pays the best interest. c
    foo"""

    def LetterCompiler(txt):
        result = re.findall(r'[a-c](?!$)', txt)
        return result

    print(LetterCompiler(my_txt)) # ['a', 'b', 'c']
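
For comparison, a small sketch of the DOTALL variant, plus one edge case worth knowing: without re.MULTILINE, $ also matches just before a single trailing newline at the end of the string, whereas \Z matches only at the very end. The sample strings and variable names below are made up for illustration:

```python
import re

text = "pays the best interest. c\nfoo"

# By default '.' does not match '\n', so the 'c' before the newline is skipped.
default_dot = re.findall(r'[a-c](?=.)', text)

# With re.DOTALL, '.' matches '\n' too, so that 'c' is kept.
dotall = re.findall(r'[a-c](?=.)', text, flags=re.DOTALL)

# Edge case: a string that ends with a newline.
trailing = "abc\n"
with_dollar = re.findall(r'[a-c](?!$)', trailing)   # '$' matches before the final '\n'
with_big_z = re.findall(r'[a-c](?!\Z)', trailing)   # '\Z' matches only at the true end

print(default_dot)  # ['a', 'b']
print(dotall)       # ['a', 'b', 'c']
print(with_dollar)  # ['a', 'b']
print(with_big_z)   # ['a', 'b', 'c']
```

If the input may end with a newline and the letter before it should still count, \Z or the DOTALL form is probably the safer choice.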
CertainPerformance