I'm looking to match many files against some common templates, and extract the differences. I'd like suggestions on the best way to do this. For example:

Template A:

<1000 text lines that have to match>
a=?
b=2
c=3
d=?
e=5
f=6
<more text>

Template B:

<1000 different text lines that have to match>
h=20
i=21
j=?
<more text>
k=22
l=?
m=24
<more text>

If I passed in file C:

<1000 text lines that match A>
a=500
b=2
c=3
d=600
e=5
f=6
<more text>

I'd like an easy way to say this matches template A, and extract 'a=500', 'd=600'.

I could match these with a regex, but the files are rather large, and building that regex would be a pain.

I've also tried difflib, but parsing the opcodes and extracting the differences doesn't seem optimal.

Anyone have a better suggestion?

Peter Hofmann

2 Answers


You may have to tweak it a little to handle the additional text, as I don't know the exact format, but it shouldn't be too difficult.

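with open('templ.txt') as templ, open('in.txt') as f:
    # Keys whose value is a '?' placeholder in the template
    items = [i.strip().split('=')[0] for i in templ if '=?' in i]
    # Parse only key=value lines; plain text lines would break dict()
    d = dict(i.strip().split('=', 1) for i in f if '=' in i)
    print([(i, d[i]) for i in items if i in d])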

out:

[('a', '500'), ('d', '600')]  # With template A
[]                            # With template B

Or, if the template and the input file are aligned line for line:

from itertools import compress
with open('templ.txt') as templ, open('in.txt') as f:
    # Keep the input lines whose template line holds a '=?' placeholder
    print([line.strip() for line in compress(f, ('=?' in t for t in templ))])

out:

['a=500', 'd=600']
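
To also pick out which template a file matches, one option is to compare the two line by line and treat only the '=?' lines as wildcards. The following is a minimal sketch, not a tested solution for large files: the names `match_template` and `find_template` and the sample data are made up for illustration, and it assumes the file and template are line-aligned and that every placeholder line ends in '=?'.

```python
def match_template(template_lines, file_lines):
    """Return the lines that fill '=?' placeholders, or None if no match.

    Assumption: template and file are line-aligned, and a placeholder
    line in the template looks like 'name=?'.
    """
    if len(template_lines) != len(file_lines):
        return None
    extracted = []
    for t, line in zip(template_lines, file_lines):
        t, line = t.rstrip('\n'), line.rstrip('\n')
        if t.endswith('=?'):
            prefix = t[:-1]  # keep 'name=' and drop the '?'
            if not line.startswith(prefix):
                return None
            extracted.append(line)
        elif t != line:
            # Static lines (text, comments, fixed values) must match exactly
            return None
    return extracted


def find_template(templates, file_lines):
    """Try each template in turn; return (name, extracted) for the first match."""
    for name, template_lines in templates.items():
        extracted = match_template(template_lines, file_lines)
        if extracted is not None:
            return name, extracted
    return None, []


# Tiny stand-ins for the real templates and input file (hypothetical data)
templates = {
    'A': ['# header', 'a=?', 'b=2', 'c=3', 'd=?', 'e=5', 'f=6'],
    'B': ['# other header', 'h=20', 'i=21', 'j=?', 'k=22', 'l=?', 'm=24'],
}
file_c = ['# header', 'a=500', 'b=2', 'c=3', 'd=600', 'e=5', 'f=6']

result = find_template(templates, file_c)
print(result)  # ('A', ['a=500', 'd=600'])
```

With real files, the lists could come from `f.readlines()`. Because every non-placeholder line is compared verbatim, a file that differs from the template anywhere else is rejected rather than silently accepted.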
root
  • First, thank you; this is a way to extract the data that I hadn't thought of. It doesn't seem to help find the matching template, though. If I run it with templ.txt set to Template B, I get ['c=3', 'e=5']. I'm looking for a way to iterate over the templates, find the matching one, and then extract the data. – Peter Hofmann Jan 23 '13 at 16:48
  • @PeterHofmann -- added one that should work with any template. – root Jan 23 '13 at 16:53

Leaving performance aside:

  1. Load everything into a dictionary, so that you have e.g. A = {'a': '?', 'b': 2, ...}, B = {'h': 20, 'i': 21, ...}, C = {'a': 500, 'b': 2, ...}

  2. If A.keys() == C.keys() you know that C matched A.

  3. Then simply diff both dictionaries.

Improve as needed.
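
The three steps above could be sketched roughly as follows. The helper names `parse_pairs` and `diff_against` are invented for this example, values are kept as strings, and any line without an '=' is simply skipped, so this only checks the key=value lines and nothing else in the file.

```python
def parse_pairs(lines):
    """Collect 'key=value' lines into a dict, skipping everything else."""
    pairs = {}
    for line in lines:
        key, sep, value = line.strip().partition('=')
        if sep:  # only lines that actually contain '='
            pairs[key] = value
    return pairs


def diff_against(template, candidate):
    """Return the values filling '?' slots, or None if the dicts don't match."""
    if template.keys() != candidate.keys():
        return None
    diff = {}
    for key, expected in template.items():
        if expected == '?':
            diff[key] = candidate[key]
        elif expected != candidate[key]:
            return None  # a fixed value differs, so this is not a match
    return diff


# Hypothetical sample data mirroring the question's A, B and C
A = parse_pairs(['a=?', 'b=2', 'c=3', 'd=?', 'e=5', 'f=6'])
B = parse_pairs(['h=20', 'i=21', 'j=?', 'k=22', 'l=?', 'm=24'])
C = parse_pairs(['a=500', 'b=2', 'c=3', 'd=600', 'e=5', 'f=6'])

print(diff_against(A, C))  # {'a': '500', 'd': '600'}
print(diff_against(B, C))  # None
```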

E.Z.
  • Sorry, I should have made my example clearer. There's more than just values in the file: there are static lines, comments, and lots of other stuff that should match as well. – Peter Hofmann Jan 23 '13 at 16:49