Extract multiple urls from a string

Question

I have a string containing HTML source code. The source contains many URLs, but I find it hard to separate them from the rest of the string. I've been trying to find a way to get all the text between "http:" and ".jpg", but have not found one, at least not one that finds multiple URLs. As you have probably guessed, I haven't been using C# for long. Any help will be appreciated.

Sample from the source I'm trying to extract the urls from:

<td class="rad">
    <input type="hidden" name="filenames[]" value="1270000_12_2.jpg">
    <a href="http://xxxxxxxxx/files/orders/120000/127200/12700000/Originals/1200000_12_2.jpg" target="_blank">
        <img src="http://xxxxxxxxxxxx/files/orders/120000/127200/120000/Originals/127000_12_2_thumb.jpg" border="0">
    </a>
    <br/>
    120000_12_2.jpg
</td>
<td class="rad" width="300" valign="top">
    <label>Enter comment to photographer:</label>
    <br/>
    <textarea rows="7" cols="35" name="comment[]"></textarea>
</td>
<td class="rad" width="300" valign="top">
    <label for="comment_from_editor">Comment from editor</label>
    <br/>
    <textarea rows="4" cols="35" name="comment_from_editor[]" id="comment_from_editor">
    </textarea>
    <br/>
</td>

[`Regex`](http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex%28v=vs.110%29.aspx) to the rescue! (also, https://xkcd.com/208/) — Drew McGowen, Aug 06 '14 at 20:02
@DrewMcGowen Yes.. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags :) — L.B, Aug 06 '14 at 20:03

score 4 · Answer 1 · edited May 23 '17 at 11:44

Use an HTML parser such as CsQuery or Html Agility Pack to get the <a> elements and their href attributes.
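As a rough illustration of the parser approach, the sketch below uses Html Agility Pack to collect the href value of every <a> element. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HtmlLinkExtractor and the method name GetHrefs are hypothetical and not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

static class HtmlLinkExtractor
{
    // Hypothetical helper: returns the href value of every <a> element in the markup.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses the string; tolerant of malformed HTML

        var urls = new List<string>();
        foreach (HtmlNode link in doc.DocumentNode.Descendants("a"))
        {
            // The second argument is the default returned when the attribute is missing.
            string href = link.GetAttributeValue("href", "");
            if (href.Length > 0)
                urls.Add(href);
        }
        return urls;
    }
}
```

To pick up the thumbnail URLs as well, the same loop could be repeated over Descendants("img") reading the src attribute; filtering the results with EndsWith(".jpg") would then narrow the list to images.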

D̻̻̤̜̪̜ơ͔ no͏̳̙t̸̳̤̭͓͍͍͈ ̵̬͚̤͔ú̟̜̹͈̰̞͇s̥͜e̴ ͚̹r̛̻͔̘̫̭̼é͚̼̹͎̞̯ge̢̤x.

answered Aug 06 '14 at 20:02

Athari

++ For the zalgo and the link. (Oh, and also for your actual answer) :) – Jashaszun Aug 06 '14 at 20:06

Globius · Accepted Answer · 2014-08-06T20:39:17.857

In C#

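using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkParser
{
    // Returns every http/https URL in str whose file extension is "jpg".
    public static string[] ParseLinkToJpg(string str)
    {
        // Group 1 is the whole URL; group 2 is the extension after its last dot.
        // The lookahead requires the URL to end at a quote, whitespace,
        // an angle bracket, or the end of the input.
        Regex regex = new Regex(@"(https?:[^\s""'<>]*\.([A-Za-z0-9]+))(?=[""'\s<>]|$)");
        Match match = regex.Match(str);
        List<string> result = new List<string>();
        while (match.Success)
        {
            if (match.Groups[2].Value == "jpg")
                result.Add(match.Groups[1].Value);
            match = match.NextMatch();
        }
        return result.ToArray();
    }
}

This function returns an array of links to .jpg images.

You can change the regular expression (https?:[^\s"'<>]*\.([A-Za-z0-9]+))(?=["'\s<>]|$) to suit your needs: group 1 captures the whole URL and group 2 captures its extension.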

https://www.debuggex.com/ is an excellent service for testing regular expressions.
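The answer notes that the pattern can be adapted. One possible adaptation, sketched here under assumptions, builds the pattern from a list of extensions so the same code can find .jpg, .png, or other links. The names UrlExtractor and ExtractUrls are hypothetical, and the pattern treats quotes, whitespace, and angle brackets as the end of a URL, which suits HTML attribute values but is not a general URL grammar.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class UrlExtractor
{
    // Hypothetical helper: returns every http/https URL in text that ends
    // with one of the given extensions (passed without the leading dot).
    public static List<string> ExtractUrls(string text, params string[] extensions)
    {
        var urls = new List<string>();
        if (extensions.Length == 0)
            return urls; // nothing to look for

        // Escape each extension so regex metacharacters are matched literally.
        var escaped = new List<string>();
        foreach (string ext in extensions)
            escaped.Add(Regex.Escape(ext));

        // URL body: characters other than whitespace, quotes, and angle brackets.
        // The lookahead checks that the extension is followed by a delimiter or the end of input.
        string pattern = @"https?://[^\s""'<>]+\.(?:" + string.Join("|", escaped)
                       + @")(?=[""'\s<>]|$)";

        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
            urls.Add(m.Value);
        return urls;
    }
}
```

Calling UrlExtractor.ExtractUrls(html, "jpg") on the sample from the question should yield the two image URLs and skip the bare file name in the hidden input, since that value does not start with http.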
