Select URL in HTML table with regular expression

Question

I have a table with names and URLs like this:

<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
<td>www.url2.com</td> </tr>

I want to select every table cell (`<td>`) that contains a URL. I tried:

<td>w{3,3}.*(</td>){1,1}

But this expression doesn't "stop" at the first </td>. I get:

<td>www.url.com</td> </tr>
    <tr>
    <td>name2</td>
    <td>www.url2.com</td>

as the result. Where is my mistake?

score 1 · Accepted Answer · answered Jun 29 '13 at 11:07

There are several ways to match a URL. I'll go with the simplest one for your needs: correcting your regex. You can use this one instead:

<td>w{3}.*?</td>

Explanation:

<td>          # this part is OK
w{3,3}        # {3} is simpler here and has the same effect
.*            # the main problem: .* is greedy; use .*? to make it non-greedy,
                that is, to make it match as little as possible
(</td>){1,1}  # same as the second line: since the count is 1, the {1,1}
                is not needed
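
The difference between the greedy and the non-greedy quantifier can also be checked outside the editor. The following is a minimal sketch using Python's `re` module, which is not the tool used in the question. The `re.DOTALL` flag is an assumption made here so that `.` also matches line breaks, which is what the multi-line result in the question suggests.

```python
import re

# Sample table from the question.
html = """<tr>
  <td>name1</td>
  <td>www.url.com</td> </tr>
<tr>
  <td>name2</td>
<td>www.url2.com</td> </tr>
"""

# Greedy: .* runs on to the LAST </td> in the text
# (DOTALL lets . cross line breaks).
greedy = re.findall(r"<td>w{3}.*</td>", html, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST </td> after each match start.
lazy = re.findall(r"<td>w{3}.*?</td>", html, flags=re.DOTALL)

print(len(greedy))  # 1 (a single match spanning both rows)
print(lazy)         # ['<td>www.url.com</td>', '<td>www.url2.com</td>']
```

The greedy version returns one long match that swallows the `name2` cell, while the non-greedy version returns one match per URL cell.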

Note: If you want to **match just the URL part** (without the `td`s), you can use a lookbehind and a lookahead: `(?<=<td>)w{3}.*?(?=</td>)` – acdcjunior, Jun 29 '13 at 11:14
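
As a rough illustration of the lookaround idea, here is a sketch in Python's `re` module, with the `<td>` and `</td>` tags placed inside the lookbehind and lookahead. The one-line sample string is made up for this example, and lookbehind support differs between regex engines, so this may not carry over unchanged to every editor.

```python
import re

# Hypothetical one-line version of the table from the question.
html = (
    "<tr><td>name1</td><td>www.url.com</td></tr>"
    "<tr><td>name2</td><td>www.url2.com</td></tr>"
)

# The lookbehind and lookahead assert the surrounding tags
# without including them in the match.
urls = re.findall(r"(?<=<td>)w{3}.*?(?=</td>)", html)
print(urls)  # ['www.url.com', 'www.url2.com']

# Alternative without lookarounds: a capturing group around the URL part.
urls_group = re.findall(r"<td>(w{3}.*?)</td>", html)
print(urls_group)  # ['www.url.com', 'www.url2.com']
```

The capturing-group variant is often easier to port, since it relies only on features that nearly every engine supports.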

score 0 · Answer 2 · edited May 23 '17 at 11:56


You could use a general-purpose URL regex such as

\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]

or, written as a Java string literal (hence the doubled backslashes),

"((((ht{2}ps?://)?)((w{3}\\.)?))?)[^.&&[a-zA-Z0-9]][a-zA-Z0-9.-]+[^.&&[a-zA-Z0-9]](\\.[a-zA-Z]{2,3})"

See the question "What is the best regular expression to check if a string is a valid URL?", which has many answers.
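
One thing worth checking before applying the first pattern to the table from the question: it requires a scheme such as `http://`, while the cells in the question contain bare `www.` host names. A minimal sketch in Python's `re` module (used for illustration only; the two sample strings are made up for this check):

```python
import re

# First pattern from this answer, unchanged.
URL_WITH_SCHEME = re.compile(
    r"\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]"
)

bare = "<td>www.url.com</td>"         # cell as written in the question
full = "<td>http://www.url.com</td>"  # hypothetical cell with a scheme

print(URL_WITH_SCHEME.search(bare))          # None: there is no scheme to match
print(URL_WITH_SCHEME.search(full).group())  # http://www.url.com
```

If the cells really hold bare host names, this pattern will find nothing in them.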

answered Jun 29 '13 at 10:57 · Paritosh

Thanks for your fast reply. I already tried this regex. Notepad++ said that it can't find this regex. What can I do? – user2494904 Jun 29 '13 at 11:01
