
I have a table with names and URLs like this:

<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
  <td>www.url2.com</td> </tr>

I want to select all the table cells that contain URLs. I tried:

<td>w{3,3}.*(</td>){1,1}

But this expression doesn't "stop" at the first </td>. I get:

<td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
  <td>www.url2.com</td>

as the result. Where is my mistake?
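For anyone who wants to reproduce this outside Notepad++, here is a minimal Python sketch. It assumes Notepad++'s ". matches newline" option is switched on (emulated here with `re.DOTALL`), since otherwise `.*` could not run across line breaks at all. The `html` string is simply the table above.

```python
import re

# The table from the question as one string.
html = """<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
  <td>www.url2.com</td> </tr>"""

# Assumption: "." matches newlines, as with Notepad++'s ". matches newline"
# option; re.DOTALL emulates that setting.
greedy = re.compile(r"<td>w{3,3}.*(</td>){1,1}", re.DOTALL)

# The greedy .* runs to the end of the text, then backtracks only as far
# as the LAST </td>, so one match swallows both rows.
match = greedy.search(html)
print(match.group(0))
```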

Brian Tompsett - 汤莱恩

2 Answers


There are several ways to match a URL. I'll go with the simplest fix for your needs: just correcting your regex. You can use this one instead:

<td>w{3}.*?</td>

Explanation:

<td>          # this part is ok
w{3,3}        # the notation {3} is simpler here and has the same effect
.*            # the main problem: use .*? to make .* non-greedy, that is,
              # to make it match as little as possible
(</td>){1,1}  # same idea as w{3,3}: {1,1} means "exactly once", which is
              # the default, so neither the quantifier nor the group is needed
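A quick Python check of the corrected pattern on the question's table. This is only a sketch: `re.DOTALL` again stands in for the assumed ". matches newline" setting, and it makes no difference to the lazy result here because each match ends at the first `</td>` on the same line.

```python
import re

# The table from the question as one string.
html = """<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
  <td>www.url2.com</td> </tr>"""

# Lazy .*? stops at the FIRST </td> after each "<td>www".
lazy = re.compile(r"<td>w{3}.*?</td>", re.DOTALL)

cells = lazy.findall(html)
print(cells)  # ['<td>www.url.com</td>', '<td>www.url2.com</td>']
```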
acdcjunior
Note: If you just want to **match just the URL part** (without the `td`s), you can use a lookbehind and a lookahead: `(?<=<td>)w{3}.*?(?=</td>)` – acdcjunior Jun 29 '13 at 11:14
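The lookaround variant from the note can be checked the same way in Python, whose `re` module accepts fixed-width lookbehinds such as `(?<=<td>)`. A minimal sketch on the question's table:

```python
import re

# The table from the question as one string.
html = """<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
  <td>www.url2.com</td> </tr>"""

# (?<=<td>) and (?=</td>) must be present around the match but are not
# part of it, so only the URL text is returned.
url_only = re.compile(r"(?<=<td>)w{3}.*?(?=</td>)")

urls = url_only.findall(html)
print(urls)  # ['www.url.com', 'www.url2.com']
```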

Your regex could be:

\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]
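One thing worth checking with the pattern above: it requires a scheme (`http://`, `https://`, `ftp://` or `file://`), so it finds nothing in cells like `www.url.com` from the question's table. A small Python sketch; the two sample strings are made up for illustration.

```python
import re

# The scheme-based pattern above, unchanged, as a Python raw string.
url_re = re.compile(
    r"\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]"
)

# Hypothetical samples: one cell with a scheme, one without.
samples = ["<td>http://www.url.com</td>", "<td>www.url.com</td>"]
for s in samples:
    m = url_re.search(s)
    print(s, "->", m.group(0) if m else None)
```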

or, written as a Java string literal (hence the doubled backslashes):

"((((ht{2}ps?://)?)((w{3}\\.)?))?)[^.&&[a-zA-Z0-9]][a-zA-Z0-9.-]+[^.&&[a-zA-Z0-9]](\\.[a-zA-Z]{2,3})"

See the question "What is the best regular expression to check if a string is a valid URL?"; it has many answers.

Paritosh
Thanks for your fast reply. I already tried this regex, but Notepad++ says it can't find it. What can I do? – user2494904 Jun 29 '13 at 11:01