I'm parsing a file and want to find wherever either <color = orange> or <color> starts in my file.

Then I want to pull out the value orange.

How would I do this with regular expressions?

So far I have this (which isn't sufficient since it doesn't look for the case where color has a value):

def main():
    basefile = open ("base.txt")
    libfile = open ("file.txt")
    lines = []
    while 1:
        line = libfile.readline()
        lines.append("%s" % libfile.readline())
        if not line:
            break
    inlibrary = 0 
    newlibrary = []
    for line in lines:
        if "<color>" in line:
VoronoiPotato
user1328021

2 Answers

If all you need is a regex that matches both forms, here is one, with the optional value captured in a group:

>>> import re
>>> m = re.match(r"<color(?:\s*=\s*(.*?))?>", "<color>asdfsdaf")
>>> m, m.groups()
(<_sre.SRE_Match object at 0x7fb0579467b0>, (None,))
>>> m = re.match(r"<color(?:\s*=\s*(.*?))?>", "<color=fuschia>asdfsdaf")
>>> m, m.groups()
(<_sre.SRE_Match object at 0x7fb057946738>, ('fuschia',))
>>> m = re.match(r"<color(?:\s*=\s*(.*?))?>", "foobarbaz")
>>> m #None
>>>

But you should really use an XML parser for this job.
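
Note that re.match only matches at the very start of the string, so it will miss a tag that appears later in a line. As a minimal sketch of finding every tag and pulling out its value, the following uses re.finditer instead. The helper name find_colors, the COLOR_TAG pattern, and the sample string are illustrative assumptions, not part of the answer above.

```python
import re

# Assumed pattern (a variation on the one above): matches <color> or
# <color = value>, with optional whitespace around "=" and before ">".
COLOR_TAG = re.compile(r"<color(?:\s*=\s*([^>]*?))?\s*>")

def find_colors(text):
    """Hypothetical helper: return (start_offset, value) for each tag.

    value is None for a bare <color> tag.
    """
    return [(m.start(), m.group(1)) for m in COLOR_TAG.finditer(text)]

sample = "intro <color = orange>text<color> more <color=fuschia>"
print(find_colors(sample))
# [(6, 'orange'), (26, None), (39, 'fuschia')]
```

To run it over a file, something like find_colors(open("file.txt").read()) should work, assuming the file fits comfortably in memory.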

utdemir

If you don't want to use a full-blown XML parser, this should do:

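import re

lines_with_color = []
with open("file.txt") as libfile:
    for line in libfile:
        # search() finds the tag anywhere in the line, not only at the start;
        # \s* allows spaces around "=", as in <color = orange>
        if re.search(r"<color(\s*=\s*[^>]+)?>", line):
            lines_with_color.append(line)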
Manuel Ebert