Python substitute with numbering inside patterns

Question

I'm trying to come up with a python script to automatically number footnotes in pandoc markdown. Given input like this:

This is a testing document for testing purposes only.[^0] This is a testing document for testing purposes only. This is a testing document for testing purposes only.[^121][^5] This is a testing document for testing purposes only.

[^0]: Footnote contents.

[^0]: Footnote contents.

[^0]: Footnote contents.

It should produce output like this:

This is a testing document for testing purposes only.[^1] This is a testing document for testing purposes only. This is a testing document for testing purposes only.[^2][^3] This is a testing document for testing purposes only.

[^1]: Footnote contents.

[^2]: Footnote contents.

[^3]: Footnote contents.

I've been able to make the basic functionality work, but I'm stuck on how to cover the case of two footnotes on one line. Perhaps the loops should not be line-based? Or should I opt for some sort of nested loop, replacing the nth occurrence of a pattern (which, as I understand from this question, is not trivial)?

And since I'm trying to learn as much as possible from this, feel free to drop any comments or pointers for further improvements. Thanks!

Here is the script I have so far:

import re
from sys import argv

script, first = argv

i=0
n=0
buff = ''

# open the file specified through the first argument
f = open(first, 'rU')

for line in f:
    if re.search(r'\[\^[0-9]+\]:', line):
        i += 1
        line2 = re.sub(r'\[\^[0-9]+\]:', '[^' + str(i) + ']:', line)
        buff += line2

    elif re.search(r'\[\^[0-9]+\]', line):
        n += 1
        line3 = re.sub(r'\[\^[0-9]+\]', '[^' + str(n) + ']', line)
        buff += line3

    else:
        buff += line

print buff

f.close()
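
As a small illustration of why the two-footnotes-on-one-line case fails in a script like the one above: when `re.sub` is given a plain string as the replacement, it applies that same string to every match in the line. The sketch below uses a made-up sample line and a hard-coded number; `count=1` is shown only to contrast the behaviour, not as a recommended fix.

```python
import re

# Hypothetical sample line with two footnote markers next to each other.
line = "Some text.[^121][^5] More text."

# A plain string replacement is applied to every match on the line,
# so both markers receive the same number.
same_number = re.sub(r'\[\^[0-9]+\]', '[^2]', line)

# count=1 limits the substitution to the first match only.
first_only = re.sub(r'\[\^[0-9]+\]', '[^2]', line, count=1)

print(same_number)  # Some text.[^2][^2] More text.
print(first_only)   # Some text.[^2][^5] More text.
```

Passing a function as the replacement instead of a string lets each match get its own value, which avoids the problem without nested loops.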

Joran Beasley · Accepted Answer · 2015-09-15T17:12:26.543


my_text="""This is a testing document for testing purposes only.[^0] This is a testing document for testing purposes only. This is a testing document for testing purposes only.[^121][^5] This is a testing document for testing purposes only.

[^0]: Footnote contents.

[^0]: Footnote contents.

[^0]: Footnote contents."""


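import re

# Count every footnote marker: references and definitions together.
num_notes = len(re.findall(r"\[\^\d+\]", my_text))
i = -1
def do_sub(m):
    global i
    i += 1
    # Restart the count at the halfway point, where the definitions begin; +1 makes it 1-based.
    return "[^%d]" % ((i if i < num_notes//2 else i - num_notes//2) + 1)

# re.sub returns a new string, so keep the result.
result = re.sub(r"\[\^\d+\]", do_sub, my_text)
print(result)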

I think this will do what you want.
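
A possible variation, sketched here under some assumptions: instead of one global counter and a halfway split, keep two separate counters and decide per match whether it is a reference or a definition. The names `FOOTNOTE` and `renumber_footnotes` are made up for this example, and it assumes that a marker followed directly by a colon is always a definition, which would misclassify a reference that happens to be followed by a colon in running text.

```python
import re

# Matches a footnote marker and captures an optional trailing colon,
# which is what separates a definition ("[^3]:") from a reference ("[^3]").
FOOTNOTE = re.compile(r'\[\^\d+\](:?)')

def renumber_footnotes(text):
    """Number references and definitions independently, each starting at 1."""
    counts = {'ref': 0, 'def': 0}

    def replace(match):
        # A non-empty group 1 means the marker was followed by a colon.
        kind = 'def' if match.group(1) else 'ref'
        counts[kind] += 1
        return '[^%d]%s' % (counts[kind], match.group(1))

    return FOOTNOTE.sub(replace, text)

sample = "One.[^0] Two.[^121][^5]\n\n[^0]: A.\n\n[^0]: B.\n\n[^0]: C."
result = renumber_footnotes(sample)
print(result)
```

Because the counters live inside the function, it can be called on the whole contents of a file (for example the result of `f.read()`) without any `global` state, and it does not depend on references and definitions being equal in number.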

edited Sep 15 '15 at 17:12

answered Sep 14 '15 at 16:45

Joran Beasley


That doesn't seem to work. Tried to `print my_text` after the last line or assigned the last line to a variable and then print it, but it always prints the original (non-numbered) contents. And I have trouble understanding the `return` line (the part in parentheses) so I'm not really able to figure out what's wrong. – Tom Karger Sep 15 '15 at 04:14
That's because `re.sub` returns a new string – Joran Beasley Sep 15 '15 at 04:27
Okay, but when I do `print re.sub("\[^\d+\]",do_sub,my_text)` (analogously to how it's done [here](http://stackoverflow.com/questions/13748674/how-to-use-re-sub), for example), I still get the original unnumbered input. Why is that? – Tom Karger Sep 15 '15 at 10:13
Just tried it, but there is no difference. I would paste exactly what I did, but that exceeds the allowed length of a comment. Maybe I should mention that I use Python 2.7.9 ? (the default one on Ubuntu 15.04) – Tom Karger Sep 15 '15 at 15:14
oh dang you are totally right ... I forgot to escape the `^` it should be `\^` ... see this ideone sketch http://ideone.com/9R8c3m ..editing answer to correct it now – Joran Beasley Sep 15 '15 at 17:12
Oh, I also missed that. Now it's working! Would you care to explain the `return` line? Pretty pretty please :) – Tom Karger Sep 16 '15 at 06:47
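
On the `return` line asked about above, one way to read the expression in parentheses, `i if i < num_notes//2 else i - num_notes//2`, is as a counter that restarts halfway through the matches. It appears to rely on two assumptions: every reference comes before every definition, and there are exactly as many references as definitions. The sketch below traces it for the six markers in the sample text; `half` and `offsets` are names introduced only for this illustration.

```python
num_notes = 6          # three references plus three definitions in the sample
half = num_notes // 2  # 3

# i is the zero-based position of each match in the whole text.
# The first half (references) keeps i; the second half (definitions)
# is shifted back by half, so counting starts over.
offsets = [(i if i < half else i - half) for i in range(num_notes)]

print(offsets)  # [0, 1, 2, 0, 1, 2]
```

Adding 1 to each value gives the 1-based numbering shown in the question's expected output. If the two assumptions do not hold, for instance with an unreferenced definition, the numbering from this expression would no longer line up.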
