
So I am fairly new to Python but familiar with other scripting languages. How do you correctly include a semicolon in a search string in Python? Whenever I do, I assume Python is interpreting it as the start of a new code block and is therefore not returning the proper results. See the sample below:

Sample text file:

<value> I; want; this; line; </value>
<value> And; this; line; </value>
<value> I dont want this line </value>

Code:

import os
import re

find = "<value>*;*"
filename = "C:\\temp\\Sample.txt"

with open(filename, 'r') as infile:
    for line in infile:
        if re.match(find, line):
            print(line)

It is returning all lines rather than just the first and second lines. I have tried several different approaches to this (including this method), but nothing seems to work. There has to be a simple way to do this, or is Python really just this annoying to work with?

wjandrea
  • Thank you! I knew it was something simple I was missing but yes I got it confused with other languages. – Brock0003 Jun 10 '20 at 17:02

2 Answers


It seems like you're confusing regex with another wildcard language (e.g. globbing). * means zero or more of the preceding expression, not zero or more of anything. You need to use . to represent anything.

find = "<value>.*;.*"

To be clear, the problem doesn't really have anything to do with Python.
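
As a rough illustration of the difference, here is a minimal sketch that runs both patterns against the three sample lines from the question. The lines are held in a list instead of being read from C:\temp\Sample.txt, and the names sample_lines, glob_style and regex_style are made up for this example.

```python
import re

# The three sample lines from the question, kept in memory for the demo.
sample_lines = [
    "<value> I; want; this; line; </value>",
    "<value> And; this; line; </value>",
    "<value> I dont want this line </value>",
]

# Pattern written as if * were a glob wildcard (the question's version).
glob_style = re.compile(r"<value>*;*")
# Pattern using .* to mean "any run of characters".
regex_style = re.compile(r"<value>.*;.*")

glob_hits = [line for line in sample_lines if glob_style.match(line)]
regex_hits = [line for line in sample_lines if regex_style.match(line)]

print(len(glob_hits))   # 3: every line matches
print(len(regex_hits))  # 2: only the lines containing a semicolon
```

The same two patterns would behave identically inside the original file-reading loop; only the source of the lines differs here.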

Check out the Regular Expression HOWTO for more details about using regex.

wjandrea

You're using a wildcard pattern rather than a regexp. The regexp <value>*;* matches <value followed by zero or more > followed by zero or more ;. Every line matches because they all begin with <value.
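
To make that reading of the pattern concrete, the sketch below prints exactly which text the original pattern consumes for a few strings. The test strings are invented for the demonstration and are not from the question's file.

```python
import re

# The pattern from the question, read as a regular expression:
# "<value", then zero or more ">", then zero or more ";".
broken = re.compile(r"<value>*;*")

# Hypothetical inputs chosen to show each part of the pattern.
inputs = ["<value> no semicolon here", "<value", "<value>>>;;; tail", "value>;"]

matched = {}
for text in inputs:
    m = broken.match(text)
    # group(0) is the exact substring the pattern consumed, if any.
    matched[text] = m.group(0) if m else None
    print(repr(text), "->", repr(matched[text]))
```

The first three strings match because they start with <value, even with no semicolon at all. The last one fails only because re.match requires the match to begin at the start of the string.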

The correct regexp is

find = "<value>.*;"

. matches any character (except a newline, by default), and * means to match zero or more of them, so .* matches any run of characters. The pattern then requires a literal ;.
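
A small sketch of what this pattern consumes on the question's sample lines, assuming the lines are supplied as plain strings; first, third, hit and miss are illustrative names only.

```python
import re

fixed = re.compile(r"<value>.*;")

first = "<value> I; want; this; line; </value>"
third = "<value> I dont want this line </value>"

hit = fixed.match(first)
miss = fixed.match(third)

# .* is greedy, so the match runs up to the last semicolon on the line.
print(hit.group(0))
# The third line has no semicolon, so there is no match object.
print(miss)
```

Because the loop in the question only checks whether a match exists, it makes no difference there that the match stops at the last semicolon instead of covering the whole line.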

I suggest you read the tutorial at www.regular-expressions.info.

Barmar