Stop at first character match?

Question

I want to extract certain HTML nodes from a large HTML text, but something in my regex is wrong.

I want to fetch all URLs that look like this:

<a href="ftp://mysite.com"> some stuff </a>

I am trying to do:

/<a href="ftp:(.+)">/

Sometimes it works, but sometimes it grabs everything up to a much later closing > instead of the first one.

Is there a way to rewrite this regex so it will stop at the first >?
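A small reproduction shows why the pattern only sometimes works. This is a sketch using Python's `re` module. The question does not name a language, so Python is an assumption, and the sample strings are made up:

```python
import re

# The pattern from the question: `.+` is greedy.
GREEDY = re.compile(r'<a href="ftp:(.+)">')

# One link on the line: `.+` runs to the end of the string, then
# backtracks to the only `">` available, so the capture is correct.
one_link = '<a href="ftp://mysite.com"> some stuff </a>'

# Two links on the same line: backtracking stops at the LAST `">`,
# so the capture swallows the first link's text and the second tag.
two_links = ('<a href="ftp://mysite.com">one</a> '
             '<a href="ftp://other.com">two</a>')

print(GREEDY.search(one_link).group(1))   # //mysite.com
print(GREEDY.search(two_links).group(1))  # //mysite.com">one</a> <a href="ftp://other.com
```

The "sometimes" depends on how many `">` sequences follow the match on the same line (`.` does not match a newline by default).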

`+` is greedy by nature. You need to make it non-greedy by adding `?` after the `+` quantifier, so your regex would be `<a href="ftp:(.+?)">`. — Avinash Raj, Sep 21 '14 at 10:06
Just use `[^"]*` if you want it to match everything until the `"`, instead of `.+`. — Unihedron, Sep 21 '14 at 10:11

score 1 · Answer 1 · answered Sep 21 '14 at 10:06


Make your regex ungreedy:

/<a href="ftp:(.+?)">/
//        here __^
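A minimal Python sketch of the ungreedy pattern (the sample HTML is made up), using `re.findall` so every link on the line is collected:

```python
import re

# `+?` is the lazy form of `+`: it stops at the first `">` that lets
# the rest of the pattern match.
LAZY = re.compile(r'<a href="ftp:(.+?)">')

html = ('<a href="ftp://mysite.com">one</a> '
        '<a href="ftp://other.com">two</a>')

# With exactly one capture group, findall returns a list of that group's text.
urls = LAZY.findall(html)
print(urls)  # ['//mysite.com', '//other.com']
```

The capture starts after `ftp:` because that prefix sits outside the group, so prepend it if you need the full URL.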

or:

/<a href="ftp:([^>"]+)">/

But it's better to use an HTML parser.
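As a sketch of the parser route, Python's standard-library `html.parser.HTMLParser` can collect `href` values from `<a>` tags regardless of attribute order, quote style, or letter case. The class name `FtpLinkCollector` and the sample HTML are hypothetical:

```python
from html.parser import HTMLParser


class FtpLinkCollector(HTMLParser):
    """Collects href values of <a> tags that use the ftp: scheme."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and value is None for valueless attributes.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith("ftp:"):
                self.links.append(value)


html = ('<a href="ftp://mysite.com"> some stuff </a>'
        "<A class='x' HREF='ftp://other.com'>two</A>"
        '<a href="http://web.example">three</a>')

collector = FtpLinkCollector()
collector.feed(html)
collector.close()
print(collector.links)  # ['ftp://mysite.com', 'ftp://other.com']
```

The second tag would be missed by every regex on this page, since it puts `class` first, uses single quotes and uppercase names.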

Toto

score 1 · Answer 2 · edited May 23 '17 at 10:33


* and + are greedy (they match as much as possible). By appending ? after them, you can make them non-greedy.

/<a href="ftp:(.+?)">/

Or you can exclude " by using a negated character class ([^...]):

/<a href="ftp:([^"]+)">/
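The two patterns are not always interchangeable. A Python sketch (the sample tag is made up) showing how each behaves when `href` is followed by another attribute:

```python
import re

LAZY = re.compile(r'<a href="ftp:(.+?)">')
NEGATED = re.compile(r'<a href="ftp:([^"]+)">')

# A tag with a second attribute after href.
tag = '<a href="ftp://mysite.com" class="dl">files</a>'

# The lazy version keeps expanding past the closing quote until it
# reaches the first `">`, so the class attribute leaks into the capture.
print(LAZY.search(tag).group(1))  # //mysite.com" class="dl

# The negated class cannot cross `"`, so the pattern fails to match
# instead of returning a wrong value.
print(NEGATED.search(tag))  # None
```

Neither pattern extracts the URL from such a tag, but the negated class fails visibly rather than silently returning garbage.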

BTW, it's not a good idea to use regular expressions to parse HTML.

edited by Community

answered Sep 21 '14 at 10:07

falsetru


score 1 · Accepted Answer · answered Sep 21 '14 at 10:15


+ is a greedy operator, meaning it matches as much as it possibly can while still allowing the rest of the regex to match. For this, I recommend using a negated character class, meaning any character except ", one or more times:

/<a href="ftp:([^"]+)">/

hwnd
