How can I find multiple of the same format in Python?

Question

For some context, I'm trying to make a markup language that compiles to HTML/CSS. I plan to format links like this: @(link mask)[(link url)], and I want to find every occurrence of this and get both the link mask and the link URL.

I tried using this code for it:

re.search("@(.*)\[(.*)\]", string)

But it started at the beginning of the first instance, and ended at the end of the last instance of a link. Any ideas how I can have it find all of them, in a list or something?

kindall · Answer 1 · 2016-11-25T05:17:26.083

* is greedy: it matches as many characters as it can, e.g. up to the last ] in your document. (After all, . means "any character", and ] is "any character" as much as any other character.)

You need the non-greedy version of *, which is *?. (Actually, you should probably use +?, since zero-length matches would not be very useful.)
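To make the difference concrete, here is a small sketch comparing the greedy pattern from the question with a non-greedy one. The sample text is made up, and it assumes the parentheses in @(link mask)[(link url)] are placeholders, so a real link looks like @mask[url].

```python
import re

# Hypothetical sample text; assumes links are written as @mask[url].
text = "See @Google[https://google.com] and @Python[https://python.org]."

# Greedy: each .* grabs as much as it can, so the match runs from the
# first @ all the way to the last ] in the text.
greedy = re.search(r"@(.*)\[(.*)\]", text)

# Non-greedy: each .+? stops at the earliest point that lets the rest
# of the pattern match, so only the first link is matched.
lazy = re.search(r"@(.+?)\[(.+?)\]", text)

print(greedy.groups())  # ('Google[https://google.com] and @Python', 'https://python.org')
print(lazy.groups())    # ('Google', 'https://google.com')
```

Note that re.search still returns only the first match in both cases; the non-greedy version just keeps that match to a single link.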

edited Nov 25 '16 at 05:17

answered Nov 23 '16 at 19:23

Borealid · Accepted Answer · 2016-11-23T21:25:21.230

The default behavior of a regular expression is "greedy matching". This means each .* will match as many characters as it can.

You want each one to match as few characters as possible instead. To do that, change each .* into .*?. The trailing question mark makes the quantifier non-greedy. Because your pattern is still anchored to a ] character, it will still match/consume each whole link correctly.
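Since the question asks for all links in a list, one way to get that is to pair the non-greedy pattern with re.findall, which returns every non-overlapping match (re.search stops after the first). This is only a sketch: find_links and the sample text are hypothetical names, and it assumes links look like @mask[url].

```python
import re

# Non-greedy groups: group 1 is the link mask, group 2 is the link URL.
# Assumes the @mask[url] form; adjust if the parentheses are literal.
LINK = re.compile(r"@(.+?)\[(.+?)\]")

def find_links(text):
    """Return a list of (mask, url) tuples, one per link found in text."""
    # With two capture groups, findall yields one tuple per match.
    return LINK.findall(text)

sample = "See @Google[https://google.com] and @Python[https://python.org]."
links = find_links(sample)
print(links)  # [('Google', 'https://google.com'), ('Python', 'https://python.org')]
```

If you also need match positions (for example, to replace each link with an HTML anchor), re.finditer yields match objects instead of tuples. A stricter variant would use [^\[\]]+ in place of .+? so that a group can never run across a bracket.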
