How to remove a footnote from markdown with Python

Question

I'm a regex novice. I have some strings in Python like this: ^b^[word](#30b) from markdown text. I would like to strip the footnote to get just the word.

I have the following working:

import re
pattern = r"\[([\w]+)\]"
s = "^b^[word](#32b)"
m = re.search(pattern, s)
print(m.group(1))

That snippet extracts "word". But what if there are multiple words inside the brackets, like ^c^[every word](#12c), and I want to extract all of them? Thanks!

Laurel · Answer 1 · 2016-04-27T01:40:33.980

You can use this: \^[^^]+\^\[([^\]]+)\]\([^)]+\)

The code will be like:

import re
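# r'' works in both Python 2 and 3; the ur'' prefix is a syntax error in Python 3
p = re.compile(r'\^[^^]+\^\[([^\]]+)\]\([^)]+\)')
test_str = u"^b^[word another words](#30b)"

for words in p.findall(test_str):
    # words is the text captured inside the brackets; split() gives the individual words
    print(words.split())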

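The regex isn't very complicated; it just involves a lot of escaping.

\^[^^]+\^ matches the marker between the carets: [^^]+ is one or more characters that aren't ^
\[([^\]]+)\] captures everything inside the square brackets
\([^)]+\) matches the link in parentheses: [^)]+ is one or more characters that aren't )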
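
If the escaping makes the pattern hard to read, the same regex can be spelled out with re.VERBOSE so each piece carries its own comment. This is only a sketch for Python 3; the name FOOTNOTE is made up for illustration.

```python
import re

# Same pattern as above, written with re.VERBOSE so whitespace and
# comments inside the pattern are ignored by the regex engine.
FOOTNOTE = re.compile(r"""
    \^ [^^]+ \^      # the marker between carets, e.g. ^b^
    \[ ([^\]]+) \]   # group 1: the text inside the square brackets
    \( [^)]+ \)      # the link target in parentheses, e.g. (#30b)
""", re.VERBOSE)

m = FOOTNOTE.search("^c^[every word](#12c)")
print(m.group(1))  # every word
```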

I have only provided a simple split for the words.
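
If the goal is to strip the footnote markup out of a longer piece of text rather than pull the words out, re.sub with a backreference to the captured group should do it. A minimal sketch, assuming every footnote has the ^x^[text](#ref) shape from the question; strip_footnotes, FOOTNOTE and sample are hypothetical names, not part of any library.

```python
import re

# Same pattern as in the answer; group 1 is the text inside the brackets.
FOOTNOTE = re.compile(r'\^[^^]+\^\[([^\]]+)\]\([^)]+\)')

def strip_footnotes(text):
    # Replace each whole footnote with just its bracketed text (group 1).
    return FOOTNOTE.sub(r'\1', text)

sample = "Not ^c^[every word](#12c) has a ^b^[note](#30b)."
print(strip_footnotes(sample))  # Not every word has a note.
```

Text that contains no footnotes comes back unchanged, since re.sub only touches the matches.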

You can find more complex solutions here.

edited Apr 27 '16 at 01:40

answered Apr 25 '16 at 22:50


@TJB You gave too little data for me to know that not all footnotes start with `^b^`. I have fixed this, and added some instructions for how to get an array of words. Please don't change the question, and instead ask a new one in the future. – Laurel Apr 26 '16 at 15:47
Sorry about that! ;-) I very much appreciate your help. When I run your code I get errors that the regex is invalid syntax. What am I missing? Is it a Python 3 vs Python 2 thing? – TJB Apr 27 '16 at 01:25
@TJB I don't know much Python, but it's running when I test it. What are the errors? – Laurel Apr 27 '16 at 01:32
