0
>>> line = '\xc2d sdsdfdslkfsdkfjdsf'
>>> pattern_strings = ['\xc2d', '\xe9']
>>> pattern = '|'.join(pattern_strings)
>>> pattern
'\xc2d|\xe9'
>>> import re
>>> re.findall(pattern, line)
['\xc2d']

When I put line in a file and try to do the same regex, it doen't show up anything

def find_pattern(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            pattern_strings = ['\xc2d', '\xe9'] # or using ['\\xc2d', '\xe9'] doesn't help
            pattern = '|'.join(pattern_strings)
            print re.findall(pattern, line)

where path is a file looked as following
\xc2d sdsdfdslkfsdkfjdsf

I see

\xc2d
[]
d\xa0
[]
\xe7
[]
\xc3\ufffdd
[]
\xc3\ufffdd
[]
\xc2\xa0
[]
\xc3\xa7
[]
\xa0\xa0
[]
'619d813\xa03697'
[]
daydreamer
  • 87,243
  • 191
  • 450
  • 722

2 Answers2

2

line = "\xc2d bla" is a Python string where `"\xc2d" is a substring with 2 characters in it.

Your file sounds like it has the literal string "\xc2d" in it, which wouldn't match that pattern.

If you'd like to match the literal string, you'd need to match each of its characters (so, escape the slash).

pattern = r"\\xc2d" 
Julian
  • 3,375
  • 16
  • 27
1

You would need to read file in binary mode f = open("myfile", "rb") to prevent \x conversion, as in Python \xhh represents hex escape characters.

Non-binary reading will fail - check here.

Ωmega
  • 42,614
  • 34
  • 134
  • 203