I'm using Python to process a text file line by line with a regular expression. The first line of the file contains the pattern I want to match, but for some reason Python isn't matching it. If I add a blank line above it and make no other changes, the line matches.

Any thoughts on why this might be happening?

Here's the relevant code:

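import re

# filename is defined elsewhere
with open(filename, 'r') as infile:
    fulltext = infile.readlines()

pattern = r'{LO[^{]*}(.)\s(.*)'
regex = re.compile(pattern)

for line in fulltext:
    # match() only succeeds if the pattern starts at position 0 of the line
    match = regex.match(line)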

Here's that first line. Again, it matches when it's not the first line, so I don't see what the issue is.

{LO 1.1a}m Plain text here.
  • The first line includes BOM, right? Use `re.search` to match text not at the start of the string. Or [read the file in with no BOM](https://stackoverflow.com/questions/13590749/reading-unicode-file-data-with-bom-chars-in-python). – Wiktor Stribiżew Jun 26 '18 at 14:14
  • That is the first line, right there. Sorry I'm new(ish) and have never seen "BOM" before. – Eric Karnowski Jun 26 '18 at 14:21
  • You were right, using `re.search` worked. I'll see if I can find someone to explain this to me, because I don't understand those other answers. Thank you! – Eric Karnowski Jun 26 '18 at 14:25
  • [Here](https://stackoverflow.com/a/180993/3832970), everything is explained. `re.search` matches anywhere inside the string and `re.match` only matches at the very beginning of the string. – Wiktor Stribiżew Jun 26 '18 at 15:30
  • thanks again... the issue was, as far as I can tell, that *is* the very beginning of the string, so I thought it should work; but apparently the BOM isn't showing up in any of the ways I'm using to view it, so I couldn't see it. – Eric Karnowski Jun 26 '18 at 17:50
  • So, the problem is probably with some invisible whitespace. In most cases, you may either match it with `\s*` or `\W*` (any 0+ non-word chars). – Wiktor Stribiżew Jun 26 '18 at 18:04
  • Good to know, thank you. The re.search solution was fine, and the invisible character (if it's being retained in what I'm doing) isn't causing issues in the final product. But I'll keep these in mind if I come across it again. You've been wonderfully helpful, thank you! – Eric Karnowski Jun 27 '18 at 18:44
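
For anyone who lands here with the same symptom, here is a minimal sketch of what the comments describe. It assumes the file is UTF-8 with a byte order mark (BOM), which the thread suggests but never confirms. The temporary file and the names `LO_PATTERN` and `read_lines` are made up for the illustration. Reading with `encoding='utf-8'` leaves an invisible `\ufeff` at the start of the first line, so `match` fails there while `search` still finds the pattern; reading with `encoding='utf-8-sig'` strips the BOM, so `match` works.

```python
import os
import re
import tempfile

# Same pattern as in the question
LO_PATTERN = re.compile(r'{LO[^{]*}(.)\s(.*)')


def read_lines(path, encoding):
    """Read all lines of a text file using the given encoding."""
    with open(path, 'r', encoding=encoding) as infile:
        return infile.readlines()


# Hypothetical sample file: writing with 'utf-8-sig' prepends a BOM,
# which is assumed here to be how the original file was saved.
with tempfile.NamedTemporaryFile('w', encoding='utf-8-sig',
                                 suffix='.txt', delete=False) as tmp:
    tmp.write('{LO 1.1a}m Plain text here.\n')
    sample_path = tmp.name

with_bom = read_lines(sample_path, 'utf-8')[0]         # BOM kept as '\ufeff'
without_bom = read_lines(sample_path, 'utf-8-sig')[0]  # BOM stripped on read
os.remove(sample_path)

# repr() makes the otherwise invisible first character visible
print(repr(with_bom[:4]))

bom_match = LO_PATTERN.match(with_bom)       # fails: line starts with the BOM
bom_search = LO_PATTERN.search(with_bom)     # succeeds: pattern found at index 1
clean_match = LO_PATTERN.match(without_bom)  # succeeds: nothing before '{LO'
```

If the BOM really is the cause, opening the file with `encoding='utf-8-sig'` removes the stray character at the source, whereas `re.search` only works around it and leaves it in the data.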
