
I'm a regex novice. I have some Python strings, taken from Markdown text, that look like this: ^b^[word](#30b). I would like to strip the footnote markup and keep just the word.

I have the following working:

import re
pattern = r"\[([\w]+)\]"
s = "^b^[word](#32b)"
m = re.search(pattern, s)
print(m.group(1))

That snippet extracts the single word "word". But what if there are multiple words inside the brackets, as in ^c^[every word](#12c), and I want to extract all of them? Thanks!

TJB

1 Answer

You can use this: \^[^^]+\^\[([^\]]+)\]\([^)]+\)

The code will be like:

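import re
# A plain raw string works in Python 2 and 3; the ur'' prefix is a syntax error in Python 3.
p = re.compile(r'\^[^^]+\^\[([^\]]+)\]\([^)]+\)')
test_str = "^b^[word another words](#30b)"

for words in p.findall(test_str):
    # words holds the text captured inside the brackets
    print(words.split())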

The regex isn't very complicated; it just involves a lot of escaping.

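  • [^^]+ matches one or more characters that aren't ^ (the footnote marker between the two carets)

  • ([^\]]+) captures everything inside the square brackets, up to the closing ]

  • [^)]+ matches one or more characters that aren't ) (the link target inside the parentheses)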
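
To make the three pieces easier to see, here is the same pattern written with re.VERBOSE so each part can carry a comment. This is only a sketch for Python 3; the names FOOTNOTE and bracket_text are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode, whitespace outside character classes is ignored
# and "#" starts a comment.
FOOTNOTE = re.compile(r"""
    \^ [^^]+ \^      # footnote marker between carets, such as ^b^
    \[ ([^\]]+) \]   # group 1: the text inside the square brackets
    \( [^)]+ \)      # the link target in parentheses
""", re.VERBOSE)

match = FOOTNOTE.search("^c^[every word](#12c)")
bracket_text = match.group(1)
print(bracket_text)  # every word
```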


I have only provided a simple split for the words.
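
If a string contains several footnotes, the same pattern can be reused in two ways: collecting the words per footnote, or replacing each footnote with its bracketed text. The following is a minimal Python 3 sketch; the helper names footnote_words and strip_footnotes and the sample string are hypothetical, chosen just for this example.

```python
import re

FOOTNOTE = re.compile(r"\^[^^]+\^\[([^\]]+)\]\([^)]+\)")

def footnote_words(text):
    """Return one list of words per footnote link found in text."""
    return [inner.split() for inner in FOOTNOTE.findall(text)]

def strip_footnotes(text):
    """Replace each footnote link with just its bracketed text."""
    return FOOTNOTE.sub(r"\1", text)

sample = "See ^b^[word](#30b) and ^c^[every word](#12c)."
print(footnote_words(sample))   # [['word'], ['every', 'word']]
print(strip_footnotes(sample))  # See word and every word.
```

Note that split() only breaks on whitespace, so punctuation inside the brackets stays attached to its word.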

You can find more complex solutions here.

Laurel
  • @TJB You gave too little data for me to know that not all footnotes start with `^b^`. I have fixed this and added some instructions for how to get an array of words. Please don't change the question; ask a new one in the future instead. – Laurel Apr 26 '16 at 15:47
  • Sorry 'bout that! ;-) I very much appreciate your help. When I run your code I get errors saying the regex is invalid syntax. What am I missing? Is it a Python3 vs Python2 thing? – TJB Apr 27 '16 at 01:25
  • @TJB I don't know much Python, but it's running when I test it. What are the errors? – Laurel Apr 27 '16 at 01:32