I want to get a webpage's HTML source and then, in my WPF application, use an ItemsControl to show just the links from that HTML document. For example, given the URL www.google.com/search?q=stackoverflow, I want to take all the main links from its HTML code. I'm a beginner in C#, and I couldn't understand what I found while searching on this, so I would really appreciate a detailed explanation. Thanks.

Mazarin
  • You should check out HtmlAgilityPack. It will help you extract links from HTML content (http://htmlagilitypack.codeplex.com/) – AirL Nov 01 '13 at 10:03
  • I think you can find an answer here: http://stackoverflow.com/a/11773005/2931307 – Aleksey Nov 01 '13 at 13:16

1 Answer

You should check out the HtmlAgilityPack library; it will help you get and filter links from an HTML document:

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML.

Once you have retrieved all the links from the HTML document with HtmlAgilityPack, you can simply bind the returned collection of links (or transform it into something that fits your needs) to your control's ItemsSource and display them the way you want.
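As one possible "transform" step, href values taken from a page are often relative (/about), duplicated, or not web links at all (mailto:, javascript:). The sketch below is only an illustration: LinkNormalizer and ToAbsoluteHttpLinks are hypothetical names, not part of HtmlAgilityPack, and the base URL is assumed to be the address the page was downloaded from. It uses only System.Uri from the .NET base class library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class LinkNormalizer
{
    // Resolves each href against baseUrl, keeps only http/https links,
    // and drops duplicates while preserving the original order.
    public static List<string> ToAbsoluteHttpLinks(string baseUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(baseUrl);
        var seen = new HashSet<string>();
        var result = new List<string>();

        foreach (string href in hrefs)
        {
            Uri absolute;
            // TryCreate returns false for hrefs that cannot be parsed at all.
            if (!Uri.TryCreate(baseUri, href, out absolute))
            {
                continue;
            }

            // Skip mailto:, javascript: and other non-web schemes.
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
            {
                continue;
            }

            // HashSet.Add returns false when the link was already seen.
            if (seen.Add(absolute.AbsoluteUri))
            {
                result.Add(absolute.AbsoluteUri);
            }
        }

        return result;
    }
}
```

The returned List<string> can then be assigned to whatever property your ItemsSource is bound to.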

You'll find a lot of different tutorials on the web; here are two of them (don't forget to install HtmlAgilityPack and to add the proper using directives to your .cs file; the best way to install it is through NuGet, as recommended on the CodePlex project page):

Here's an example that you could use to put all the link URLs into a single ListBox, assuming that everything is placed in your Window's code-behind (we are focusing on HtmlAgilityPack and WPF here, not on architectural or design matters ^^).

In your MainWindow.xaml.cs (the code-behind file):

  • First of all, add this using directive at the top of your .cs file: using HtmlAgilityPack;
  • Declare a List<string> dependency property; this is the list that holds all the displayed links and is bound to your ListBox's ItemsSource
  • Then declare the Click event handler for the button that triggers the HTML parsing
  • Then declare the GetLinks method that is called from that handler

Here's the full code :

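using System.Collections.Generic;
using System.Windows;
using HtmlAgilityPack;

// Place this class inside the namespace named by x:Class in the XAML (WpfApplication2 here).
public partial class MainWindow : Window
{
    // The link URLs displayed by the ListBox (bound through ItemsSource).
    public List<string> Links
    {
        get { return (List<string>)GetValue(LinksProperty); }
        set { SetValue(LinksProperty, value); }
    }

    // Using a DependencyProperty as the backing store for Links.  This enables animation, styling, binding, etc...
    // The default value must match the property type, so it is null here rather than 0.
    public static readonly DependencyProperty LinksProperty =
        DependencyProperty.Register("Links", typeof(List<string>), typeof(MainWindow), new PropertyMetadata(null));

    public MainWindow()
    {
        InitializeComponent();
        // Bind against this window so that {Binding Path=Links} resolves to the property above.
        DataContext = this;
    }

    // Parses a local HTML file and returns the href value of every <a> element.
    private List<string> GetLinks()
    {
        var links = new List<string>();
        HtmlDocument doc = new HtmlDocument();
        doc.Load("YourHtmlFileInHere"); // path to your HTML file

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode link in anchors)
        {
            HtmlAttribute attribute = link.Attributes["href"];
            if (attribute != null)
            {
                links.Add(attribute.Value);
            }
        }
        return links;
    }

    // Click handler for the "Get links" button: assigning a new list refreshes the binding.
    private void Button_Click(object sender, RoutedEventArgs e)
    {
        this.Links = this.GetLinks();
    }
}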
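GetLinks reads a local file, while the question starts from a URL. As a minimal sketch of that variant, and assuming the HtmlWeb class that ships with HtmlAgilityPack, the page can be downloaded and parsed in one step. LinkExtractor, ExtractLinks and GetLinksFromUrl are hypothetical names chosen for this illustration; the parsing part is split out so it can also be run on an HTML string without network access.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, named for this illustration only.
public static class LinkExtractor
{
    // Parses an HTML string and returns the href value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return ExtractLinks(doc);
    }

    // Downloads the page at the given URL (blocking call) and returns its links.
    public static List<string> GetLinksFromUrl(string url)
    {
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load(url);
        return ExtractLinks(doc);
    }

    private static List<string> ExtractLinks(HtmlDocument doc)
    {
        var links = new List<string>();

        // SelectNodes returns null when no node matches the XPath expression.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

In the button handler this could be used as this.Links = LinkExtractor.GetLinksFromUrl("http://www.google.com/search?q=stackoverflow");. Note that the download blocks the UI thread while it runs, and that a site may serve different HTML to a program than to a browser, so the links you get are not guaranteed to match what you see on screen.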

Finally, you can create a ListBox and a Button in your main Window to display the list of links:

<Window x:Class="WpfApplication2.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="350" Width="525">
    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="*"></RowDefinition>
            <RowDefinition Height="Auto"></RowDefinition>
        </Grid.RowDefinitions>
        <ListBox Grid.Row="0"  ItemsSource="{Binding Path=Links}"></ListBox>
        <Button Grid.Row="1" Content="Get links" Click="Button_Click"></Button>
    </Grid>
</Window>

Of course, this is a very basic example: the content of the Links list cannot be updated in place (the UI only refreshes when a whole new list is assigned), and using the code-behind this way is not the best thing to do. But still, this is a start!
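If you later want the list to update in place, one common option is ObservableCollection<string>, which notifies the ListBox whenever items are added or removed. The sketch below is only an assumption about how that could look; LinksViewModel and ReplaceLinks are hypothetical names, not something from HtmlAgilityPack or WPF.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical view model, named for this illustration only.
public class LinksViewModel
{
    // The collection instance never changes; only its content does,
    // so a binding on Links keeps working after every refresh.
    public ObservableCollection<string> Links { get; } = new ObservableCollection<string>();

    // Replaces the current content with a new set of links.
    public void ReplaceLinks(IEnumerable<string> newLinks)
    {
        Links.Clear();
        foreach (string link in newLinks)
        {
            Links.Add(link);
        }
    }
}
```

With DataContext set to an instance of this class, the ItemsSource="{Binding Path=Links}" binding stays the same, and the button handler only has to call ReplaceLinks with the freshly parsed links.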

AirL
  • Thanks for that. But can you help me with another thing? How can I output the result in my window, I mean in XAML with an ItemsControl tag? Where is the result returned after that code, and what type is it? How can I print it, for example, in a TextBox? – Mazarin Nov 01 '13 at 12:35
  • Besides that, in that code I have a red line under the words LINK, DOCUMENTELEMENT, and FIXLINK. What is the reason? – Mazarin Nov 01 '13 at 12:39
  • I have updated my answer with more details and a concrete sample. – AirL Nov 01 '13 at 13:24
  • Thanks, but there are some errors again. First of all, your function GetLinks() is void, and yet there is a return statement at the end. Another one is that doc.DocumentElement has a red line because that property is not defined in HtmlAgilityPack, so it needs an alternative. And what I want to do is to put that operation in the button's handler. I'm sorry for bothering you so much, but I really need help :( I must finish that project before 12:00 o'clock and it is now 18:23. Sorry and thanks again. – Mazarin Nov 01 '13 at 14:23
  • I tried to post even more code and took your button scenario into account. You're right, the GetLinks return type wasn't valid and DocumentElement no longer exists; the official sample is not up to date :/ – AirL Nov 01 '13 at 14:57
  • I'm glad I could help! If you feel this answer solved the problem, please mark it as 'accepted' by clicking the green check mark. This will help future readers find what they need (and trust it) and will help other users keep the focus on unanswered questions ;) – AirL Nov 01 '13 at 19:59