-3

I have a string containing HTML source code. The source contains many URLs, but I find it hard to separate them from the rest of the string. I've been trying to get all the text between "http:" and ".jpg", but I haven't found a way that works, at least not one that finds multiple URLs. As you have probably guessed, I haven't been using C# for long. Any help will be appreciated.

Sample from the source I'm trying to extract the URLs from:

<td class="rad">
    <input type="hidden" name="filenames[]" value="1270000_12_2.jpg">
    <a href="http://xxxxxxxxx/files/orders/120000/127200/12700000/Originals/1200000_12_2.jpg" target="_blank">
        <img src="http://xxxxxxxxxxxx/files/orders/120000/127200/120000/Originals/127000_12_2_thumb.jpg" border="0">
    </a>
    <br/>
    120000_12_2.jpg
</td>
<td class="rad" width="300" valign="top">
    <label>Enter comment to photographer:</label>
    <br/>
    <textarea rows="7" cols="35" name="comment[]"></textarea>
</td>
<td class="rad" width="300" valign="top">
    <label for="comment_from_editor">Comment from editor</label>
    <br/>
    <textarea rows="4" cols="35" name="comment_from_editor[]" id="comment_from_editor">
    </textarea>
    <br/>
</td>
Rakesh
MrHaga
  • [`Regex`](http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex%28v=vs.110%29.aspx) to the rescue! (also, https://xkcd.com/208/) – Drew McGowen Aug 06 '14 at 20:02
  • @DrewMcGowen Yes.. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags :) – L.B Aug 06 '14 at 20:03

2 Answers

4

Use an HTML parser such as CsQuery or Html Agility Pack to get the A elements and read their HREF attributes.

D̻̻̤̜̪̜ơ͔ no͏̳̙t̸̳̤̭͓͍͍͈ ̵̬͚̤͔ú̟̜̹͈̰̞͇s̥͜e̴ ͚̹r̛̻͔̘̫̭̼é͚̼̹͎̞̯ge̢̤x.
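As a rough illustration of the parser approach, here is a minimal sketch using Html Agility Pack (it assumes the `HtmlAgilityPack` NuGet package is referenced). The class name `LinkExtractor` and the method name `GetJpgLinks` are made up for this example, and the `.jpg` filter is an assumption taken from the question rather than something the parser requires.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

static class LinkExtractor
{
    // Hypothetical helper: returns the href of every <a> element whose
    // href ends in ".jpg" (case-insensitive).
    public static List<string> GetJpgLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            string href = anchor.GetAttributeValue("href", "");
            if (href.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase))
            {
                links.Add(href);
            }
        }
        return links;
    }
}
```

Calling `LinkExtractor.GetJpgLinks(source)` on the sample from the question should pick up the link in the `<a>` element. To collect the thumbnail URLs as well, the same loop could presumably be run over `//img[@src]` with the `src` attribute.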

Athari
0

In C#:

using System.Collections.Generic;
using System.Text.RegularExpressions;

    static string[] ParseLinkToJpg(string str)
    {
        // Match "http:" followed by any run of characters other than
        // whitespace, quotes, and angle brackets, ending in ".jpg".
        // Dots in the host name do not end the match early.
        Regex regex = new Regex(@"http:[^\s""'<>]*\.jpg", RegexOptions.IgnoreCase);
        List<string> result = new List<string>();
        foreach (Match match in regex.Matches(str))
        {
            result.Add(match.Value);
        }
        return result.ToArray();
    }

This function returns an array of links to images.

You can change the regular expression http:[^\s"'<>]*\.jpg to suit your needs.

https://www.debuggex.com/ is an excellent service for testing regular expressions.
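If the pattern needs to be narrowed, one possible variation is to only accept URLs that appear as the value of an `href` or `src` attribute and return the capture group. The sketch below is an assumption-laden illustration, not part of the answer above: the names `AttributeJpgFinder` and `Find` are hypothetical, and it assumes attribute values are wrapped in double quotes, as in the question's sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AttributeJpgFinder
{
    // Hypothetical pattern: href="..." or src="..." whose value starts with
    // "http:" and ends with ".jpg". Group 1 captures the URL itself.
    // Inside a verbatim string, "" stands for one literal double quote.
    static readonly Regex Pattern = new Regex(
        @"(?:href|src)\s*=\s*""(http:[^""]*\.jpg)""",
        RegexOptions.IgnoreCase);

    public static List<string> Find(string html)
    {
        var urls = new List<string>();
        foreach (Match match in Pattern.Matches(html))
        {
            urls.Add(match.Groups[1].Value);
        }
        return urls;
    }
}
```

With this variation, the bare file name in the hidden `value="..."` input is skipped because it is neither an `href` nor a `src` and does not start with `http:`. URLs with a query string after `.jpg` are also skipped, so the pattern would need adjusting if those matter.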

Globius