
For context, I'm trying to make a markup language that compiles to HTML/CSS. I plan on formatting links like this: @(link mask)[(link url)], and I want to find every occurrence of this and extract both the link mask and the link URL.

I tried using this code for it:

re.search("@(.*)\[(.*)\]", string)

But the match started at the beginning of the first link and ended at the end of the last one. Any ideas on how I can find all of them, in a list or something?

MiseroMCS

2 Answers


* is greedy: it matches as many characters as it can, e.g. up to the last right bracket in your document. (After all, . means "any character", and ] is "any character" as much as any other character.)

You need the non-greedy version of *, which is *?. (Actually, you should probably use +?, as I don't think zero-length matches would be very useful.)
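
To make the difference concrete, here is a minimal sketch comparing the greedy and non-greedy patterns with re.search. The sample text is made up, and it assumes the parentheses in the question's format are placeholders, so a real link looks like @mask[url].

```python
import re

# Hypothetical sample text with two links (assumed @mask[url] syntax).
text = "See @docs[https://example.com/docs] and @home[https://example.com]."

# Greedy: each .* grabs as much as it can, so the single match runs from
# the first "@" to the last "]" in the text.
greedy = re.search(r"@(.*)\[(.*)\]", text)

# Non-greedy: each .+? stops at the first place where the rest of the
# pattern can match, so only the first link is captured.
lazy = re.search(r"@(.+?)\[(.+?)\]", text)

print(greedy.groups())
# ('docs[https://example.com/docs] and @home', 'https://example.com')
print(lazy.groups())
# ('docs', 'https://example.com/docs')
```

Note that re.search only ever returns the first match, so the non-greedy pattern alone fixes the span of the match but not the "find all of them" part.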

kindall

By default, the quantifiers in a regular expression use "greedy matching". This means each .* will match as many characters as it can.

You want them to match the minimal possible number of characters instead. To do that, change each .* into a .*?. The trailing question mark makes the quantifier match as few characters as possible. Because your pattern is anchored to a ] character, it will still match/consume the whole link correctly.
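
As a rough sketch of how this could answer the "in a list" part of the question, the non-greedy pattern can be passed to re.findall, which returns one tuple of groups per match. The names LINK, find_links and sample are hypothetical, and the sample again assumes links are written as @mask[url].

```python
import re

# Non-greedy version of the pattern from the question.
LINK = re.compile(r"@(.*?)\[(.*?)\]")

def find_links(source):
    """Return a list of (mask, url) tuples, one per link found in source."""
    # With two capture groups, findall yields a 2-tuple for every match.
    return LINK.findall(source)

# Hypothetical input text for illustration.
sample = "Read @the docs[https://example.com/docs] or go @home[https://example.com]."
links = find_links(sample)
print(links)
# [('the docs', 'https://example.com/docs'), ('home', 'https://example.com')]
```

If the position of each link is also needed, for example to replace it with an HTML tag later, re.finditer yields match objects with the same groups plus their spans.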

Borealid