
I'm trying to match a key-value pair chunk by chunk rather than as a substring. Say I'm trying to match from=email@email.com: I'm using from=email@email.com(?!\S) so that I don't hit on substring matches, and according to https://regex101.com/r/ehuXFY/1 it works. But here are my unit tests, and the case where the match is at the end of the string doesn't seem to work:

import unittest
import re


class MyRegexFuTestCases(unittest.TestCase):
    def test_something(self):
        lines = [
            'from=test_email@email.com\talias= <test_email@email.com>\trcptlist=test_recipient@email.com\trip=8.8.8.8\tdate=1486528190\tsubject= Test Subject\treply_to=test_email@email.com\treport=leoisafatcat\tattach_3=New List.xls']
        whitelisted_pairs = ['attach_3=New List.xls']
        lines = list(filter(lambda line: any(
            map(lambda pair: not re.match(r'%s(?!\S)' % pair, line),
                whitelisted_pairs)), lines))
        self.assertEqual(len(lines), 0)

    def test_another_case(self):
        lines = [
            'from=test_email@email.com\talias= <test_email@email.com>\trcptlist=test_recipient@email.com\trip=8.8.8.8\tdate=1486528190\tsubject= Test Subject\treply_to=test_email@email.com\treport=leoisafatcat\tattach_3=New List.xls']
        whitelisted_pairs = ['from=test_email@email.com']
        lines = list(filter(lambda line: any(
            map(lambda pair: not re.match(r'%s(?!\S)' % pair, line),
                whitelisted_pairs)), lines))
        self.assertEqual(len(lines), 0)

    def test_no_match(self):
        lines = [
            'from=test_email@email.com\talias= <test_email@email.com>\trcptlist=test_recipient@email.com\trip=8.8.8.8\tdate=1486528190\tsubject= Test Subject\treply_to=test_email@email.com\treport=leoisafatcat\tattach_3=New List.xls']
        whitelisted_pairs = ['from=test_email@email.co']
        lines = list(filter(lambda line: any(
            map(lambda pair: not re.match(r'%s(?!\S)' % pair, line),
                whitelisted_pairs)), lines))
        self.assertEqual(len(lines), 1)

if __name__ == '__main__':
    unittest.main()



..F
======================================================================
FAIL: test_something (__main__.MyRegexFuTestCases)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/stupidfatcat/PycharmProjects/adhoc/so_help.py", line 13, in test_something
    self.assertEqual(len(lines), 0)
AssertionError: 1 != 0

----------------------------------------------------------------------
Ran 3 tests in 0.001s

FAILED (failures=1)
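The lookahead on its own does treat the end of the string as a valid boundary. A minimal check, using illustrative strings where the pair sits at the very start of each string (and with the . escaped so it matches a literal dot), suggests (?!\S) is not the part that breaks:

```python
import re

# (?!\S) is a negative lookahead: it succeeds if the next character is
# whitespace, or if there is no next character at all (end of string).
pattern = r"from=email@email\.com(?!\S)"

print(re.match(pattern, "from=email@email.com") is not None)         # True: end of string
print(re.match(pattern, "from=email@email.com\tx=1") is not None)    # True: followed by a tab
print(re.match(pattern, "from=email@email.community") is not None)   # False: only a substring
```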
Stupid.Fat.Cat

1 Answer


You're using re.match when you should be using re.search.

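re.match only attempts to match the regex at the beginning of the string (in your code, each string is one line). That's why test_another_case passes: from=test_email@email.com sits at the very start of the line. test_no_match passes only because it expects no match at all, and test_something fails because attach_3=New List.xls is at the end of the line, not the start.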
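A quick illustration on a hypothetical, shortened version of the tab-separated line: re.match only tries position 0, so the same style of pattern succeeds for the first pair and fails for the last one.

```python
import re

# Hypothetical shortened version of the question's tab-separated line.
line = "from=a@b.com\tattach_3=New List.xls"

# re.match only tries to match at index 0 of the string.
print(re.match(r"from=a@b\.com(?!\S)", line) is not None)           # True: pair is at the start
print(re.match(r"attach_3=New List\.xls(?!\S)", line) is not None)  # False: pair is at the end
```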

re.search, on the other hand, has the behavior you expected: it scans through the string and can match the regex at any position in the line.
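Here is a minimal sketch of the question's filter rewritten with re.search. Two details are my own additions rather than part of the answer: re.escape, so that the . in each pair matches a literal dot instead of any character, and list(...) around filter, so that len() also works on Python 3. contains_pair and filter_lines are hypothetical helper names.

```python
import re

LINE = ('from=test_email@email.com\talias= <test_email@email.com>\t'
        'rcptlist=test_recipient@email.com\trip=8.8.8.8\tdate=1486528190\t'
        'subject= Test Subject\treply_to=test_email@email.com\t'
        'report=leoisafatcat\tattach_3=New List.xls')


def contains_pair(pair, line):
    # re.escape makes '.' (and any other metacharacter) in the pair literal;
    # (?!\S) requires whitespace or the end of the string right after it.
    # re.search tries every position, not just the start of the string.
    return re.search(re.escape(pair) + r'(?!\S)', line) is not None


def filter_lines(lines, whitelisted_pairs):
    # Same logic as the question: keep a line if any whitelisted pair is
    # missing from it. list() makes len() work on Python 3 as well.
    return list(filter(
        lambda line: any(not contains_pair(p, line) for p in whitelisted_pairs),
        lines))


print(len(filter_lines([LINE], ['attach_3=New List.xls'])))      # 0
print(len(filter_lines([LINE], ['from=test_email@email.com'])))  # 0
print(len(filter_lines([LINE], ['from=test_email@email.co'])))   # 1
```

The three calls mirror test_something, test_another_case and test_no_match; with re.search all three produce the counts the tests expect.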

pchaigno
  • `re.match` does not attempt to match the regex "against the whole line". It only matches at the start of the string. It is not Java, where `.matches()` requires a full-string match. – Wiktor Stribiżew Aug 15 '17 at 19:27
  • Oh, right. I'm learning something :-) Never done regex in Java though. I edited my answer. – pchaigno Aug 15 '17 at 19:29
  • `re.match` is not aware of *lines*; it will always look for a match at the start of the *whole string*, even with the `re.MULTILINE` flag. – Wiktor Stribiżew Aug 15 '17 at 19:30
  • I'm using the word `line` here because, in his code, each string against which the regex matches is a line. – pchaigno Aug 15 '17 at 19:31
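To illustrate the points raised in these comments (a sketch with made-up strings): re.MULTILINE only changes what ^ and $ mean, so it does not make re.match look at line starts, and re.fullmatch (Python 3.4+) is the closest Python analogue to Java's .matches().

```python
import re

text = "first line\nfrom=a@b.com"

# re.match anchors at the start of the whole string, MULTILINE or not.
print(re.match(r"from=", text, re.MULTILINE))            # None
# To find a pair at the start of any line, use re.search with ^ and MULTILINE.
print(re.search(r"^from=", text, re.MULTILINE).start())  # 11
# re.fullmatch requires the pattern to cover the entire string.
print(re.fullmatch(r"from=a@b\.com", "from=a@b.com") is not None)  # True
print(re.fullmatch(r"from=a@b\.com", text) is not None)            # False
```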