I need to download the HTML of a web page and then find this element in it:

<span class='uccResultAmount'>0,896903</span>

I have tried regular expressions. I have also tried reading the response stream and storing the whole HTML in a string. However, the HTML is very large for a string, which seems to make this impossible, because the amount 0,896903 I am searching for does not appear in the string.

Is there any way to read only a small block of the stream?

Part of the method:

// Requires: using System.IO; using System.Net; using System.Text;
public static string getValue()
        {
            string data = "not found";
            string urlAddress = "http://www.xe.com/es/currencyconverter/convert/?Amount=1&From=USD&To=EUR";

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);

            // using blocks dispose the response and the reader even if an exception is thrown
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                if (response.StatusCode == HttpStatusCode.OK)
                {
                    // Fall back to UTF-8 when the server reports no character set
                    Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
                        ? Encoding.UTF8
                        : Encoding.GetEncoding(response.CharacterSet);

                    using (Stream receiveStream = response.GetResponseStream())
                    using (StreamReader readStream = new StreamReader(receiveStream, encoding))
                    {
                        data = readStream.ReadToEnd(); // the string in which I should search for the amount
                    }
                }
            }

            // ... the search for the amount is not shown in this excerpt ...
            return data;
        }

If you know an easier way to solve my problem, please let me know.

Oscar Martinez

1 Answer

I would use HtmlAgilityPack and XPath:

var web = new HtmlAgilityPack.HtmlWeb();
var doc = web.Load("http://www.xe.com/es/currencyconverter/convert/?Amount=1&From=USD&To=EUR");
var value = doc.DocumentNode.SelectSingleNode("//span[@class='uccResultAmount']")
               .InnerText;

A LINQ version is also possible:

var value = doc.DocumentNode.Descendants("span")
            .Where(s => s.Attributes["class"] != null && s.Attributes["class"].Value == "uccResultAmount")
            .First()
            .InnerText;

Don't use the following; it is only here to show that the claim "But the problem is that this html code does not fit in a single string" is not correct:

string html = new WebClient().DownloadString("http://www.xe.com/es/currencyconverter/convert/?Amount=1&From=USD&To=EUR");
var val = Regex.Match(html, @"<span[^>]+?class='uccResultAmount'>(.+?)</span>")
               .Groups[1]
               .Value;
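
On the original question of reading only a small block of the stream: a minimal sketch, assuming the whole span sits on one line of the markup (not verified against the real page), is to read line by line and stop at the first match. The class name AmountFinder and the method FindAmount are hypothetical names introduced for illustration, and the sample HTML in Main is made up. The same caveat about using Regex on HTML applies here.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class AmountFinder
{
    // Hypothetical helper: scans the reader one line at a time and returns the
    // inner text of the first <span class='uccResultAmount'> it finds, or null
    // if no line contains one. Assumes the span does not wrap across lines.
    public static string FindAmount(TextReader reader)
    {
        var pattern = new Regex(@"<span[^>]+?class='uccResultAmount'>(.+?)</span>");
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match match = pattern.Match(line);
            if (match.Success)
            {
                // Returning here means the rest of the input is never read.
                return match.Groups[1].Value;
            }
        }
        return null;
    }

    public static void Main()
    {
        // Made-up sample input standing in for the downloaded page.
        string sample = "<html>\n<body>\n<span class='uccResultAmount'>0,896903</span>\n</body>\n</html>";
        using (var reader = new StringReader(sample))
        {
            Console.WriteLine(FindAmount(reader));
        }
    }
}
```

Because FindAmount takes a TextReader, it could be given a StreamReader wrapped around response.GetResponseStream() in place of the StringReader used here.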
L.B
  • Is there a way to do this without using HtmlAgilityPack? – Oscar Martinez Oct 06 '16 at 20:57
  • @OscarM You need a tool to parse HTML. You cannot use Regex: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – L.B Oct 06 '16 at 21:00
  • But the problem is that this HTML code does not fit in a single string, so I cannot parse something that doesn't contain the substring I need. – Oscar Martinez Oct 06 '16 at 21:09
  • @OscarM That is not correct. It fits in a string. How do you think HtmlAgilityPack handles it? See my edit. – L.B Oct 06 '16 at 21:12
  • Yes, sorry, I have just exported the string to a file and the whole data was there. – Oscar Martinez Oct 06 '16 at 21:30
  • @OscarM In case you missed that link: http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work :) – L.B Oct 06 '16 at 21:32
  • @OscarM See the link I posted previously: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – L.B Oct 06 '16 at 21:43
  • I did. Regex does not work with context-free languages, but sometimes, when the HTML is well nested, it works. – Oscar Martinez Oct 06 '16 at 21:53