I'm reading a file and I need to replace certain empty tags ([[Image:]]).

The problem is every replacement has to be unique.

Here's the code:

import re
import codecs

re_imagematch = re.compile(r'(\[\[Image:([^\]]+)?\]\])')

wf = codecs.open('converted.wiki', "r", "utf-8")
wikilines = wf.readlines()
wf.close()

imgidx = 0
for i in range(len(wikilines)):
    if re_imagematch.search(wikilines[i]):
        print('MATCH #######################################################')
        print(wikilines[i])
        wikilines[i] = re_imagematch.sub('[[Image:%s_%s.%s]]' % ('outname', imgidx, 'extension'), wikilines[i])
        print(wikilines[i])
        imgidx += 1

This does not work when a line contains more than one tag, because every tag on that line gets the same index.

Here's the input file:

[[Image:]][[Image:]]
[[Image:]]

This is what the output should look like:

[[Image:outname_0.extension]][[Image:outname_1.extension]]
[[Image:outname_2.extension]]

This is what it currently looks like:

[[Image:outname_0.extension]][[Image:outname_0.extension]]
[[Image:outname_1.extension]]

I tried using a replacement function, but it seems to be called only once per line when used with re.sub.

Marki

2 Answers

You can use itertools.count here and rely on the fact that default arguments are evaluated only once, when the function is defined, so a stateful default such as a counter keeps its state between calls.

from itertools import count

def rep(m, cnt=count()):
    return '[[Image:%s_%s.%s]]' % ('outname', next(cnt) , 'extension')

This function will be invoked for each match found and it'll use a new value for each replacement.

So, you simply need to change this line in your code:

wikilines[i] = re_imagematch.sub(rep, wikilines[i])
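
For reference, here is a minimal end-to-end sketch of the same idea. It uses a closure instead of the default-argument trick, so each call to the factory starts from a fresh counter. The helper name make_replacer and the in-memory wikilines list are assumptions for illustration; the pattern is the one from the question.

```python
import re
from itertools import count

# Same pattern as in the question, written as a raw string.
re_imagematch = re.compile(r'\[\[Image:([^\]]+)?\]\]')

def make_replacer(outname, extension):
    """Return a replacement callback that owns a private counter (hypothetical helper)."""
    counter = count()

    def rep(match):
        # re.sub calls this once per match, so every tag gets the next number.
        return '[[Image:%s_%d.%s]]' % (outname, next(counter), extension)

    return rep

# Stand-in for the lines read from converted.wiki.
wikilines = ['[[Image:]][[Image:]]\n', '[[Image:]]\n']

rep = make_replacer('outname', 'extension')
converted = [re_imagematch.sub(rep, line) for line in wikilines]
print(''.join(converted))
```

Because the counter lives in the callback rather than in the loop, it keeps counting across lines as long as the same rep object is reused.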

Demo:

>>> def rep(m, cnt=count()):
...     return str(next(cnt))
...
>>> re.sub(r'a', rep, 'aaa')
'012'

To get the most recently used counter value without advancing the counter:

>>> from copy import copy
>>> next(copy(rep.__defaults__[0])) - 1
2
Ashwini Chaudhary
  • @AshwiniChaudhary although the current counter value works, it might be easier to wrap in a class that exposes a property of the previously yielded value... Although - it's quite a bit more work :p – Jon Clements Aug 10 '14 at 18:04
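
The class-based variant suggested in the comment could look roughly like this. ImageRenamer and last_index are hypothetical names, not part of re or itertools; the sketch only relies on re.sub accepting any callable as the replacement.

```python
import re

class ImageRenamer(object):
    """Callable replacement that remembers the last index it handed out."""

    def __init__(self, outname, extension):
        self.outname = outname
        self.extension = extension
        self.next_index = 0

    @property
    def last_index(self):
        # None until the first replacement has happened.
        return self.next_index - 1 if self.next_index else None

    def __call__(self, match):
        # Invoked by re.sub once per match.
        index = self.next_index
        self.next_index += 1
        return '[[Image:%s_%d.%s]]' % (self.outname, index, self.extension)

renamer = ImageRenamer('outname', 'extension')
result = re.sub(r'\[\[Image:\]\]', renamer, '[[Image:]][[Image:]]\n[[Image:]]')
print(result)
print(renamer.last_index)
```

This avoids reaching into rep.__defaults__ and copying the counter, at the cost of a few more lines.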

I'd use a simple string replacement wrapped in a while loop:

s = '[[Image:]][[Image:]]\n[[Image:]]'
pattern = '[[Image:]]'
i = 0
# Replace only the first remaining empty tag on each pass, so every tag gets its own index.
while pattern in s:
    s = s.replace(pattern, '[[Image:outname_%d.extension]]' % i, 1)
    i += 1
print(s)
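
The loop rescans the string from the start on every pass, which may get slow for large files. A possible single-pass variant is sketched below, assuming the tags to replace are always the literal, empty [[Image:]]; number_empty_tags is a hypothetical helper name.

```python
def number_empty_tags(text, pattern='[[Image:]]',
                      outname='outname', extension='extension'):
    """Replace each literal occurrence of pattern with a uniquely numbered tag."""
    pieces = text.split(pattern)
    out = [pieces[0]]
    # Every gap between two pieces is one occurrence of the pattern.
    for i, piece in enumerate(pieces[1:]):
        out.append('[[Image:%s_%d.%s]]' % (outname, i, extension))
        out.append(piece)
    return ''.join(out)

print(number_empty_tags('[[Image:]][[Image:]]\n[[Image:]]'))
```

Like the loop, this stays free of regular expressions, and text without any empty tag is returned unchanged.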
Falko
  • Since I'm not a big expert in Python, and neither are the people here that have to understand this too, I am accepting your answer. Thanks. – Marki Aug 11 '14 at 15:44