Here is the regular expression I use to get the version number from the Play Store HTML content:

var content = responseMsg.Content == null 
                  ? null 
                  : await responseMsg.Content.ReadAsStringAsync();

var versionMatch = Regex.Match(
    content, 
    "<div[^>]*>Current Version</div><span[^>]*><div><span[^>]*>(.*?)<").Groups[1];

if (versionMatch.Success)
{
    version = versionMatch.Value.Trim();
}

With this, versionMatch ends up as "{}" (an empty value).

How can I get the proper version, for example versionMatch = "1.9"?

The HTML content is very large, so here is only the relevant excerpt:

<div class="hAyfc">
<div class="BgcNfc">Current Version</div>
<span class="htlgb">
<div class="IQ1z0d">
<span class="htlgb">1.9</span>
</div>
Vidhya
  • Just because the string is displayed in the web browser developer tools DOM view, doesn't mean it is also this way in your actual HTML source code coming from the server. – Uwe Keim Jan 22 '19 at 06:28
  • BTW: [Using Regex to parse HTML is really bad](https://stackoverflow.com/a/1732454/107625). – Uwe Keim Jan 22 '19 at 06:28
  • so can you give me a proper solution to get this version number? – Vidhya Jan 22 '19 at 06:29
  • @swe Please [stop adding tags to titles](https://meta.stackexchange.com/a/130208/133056). – Uwe Keim Jan 22 '19 at 06:33
  • What about the text between `Current Version` and the `<span>` where the version number is? Your regex does not match this. – Klaus Gütter Jan 22 '19 at 06:36
  • Try removing \r\n before parsing it, you have "div>\r\n – Sufyan Jabr Jan 22 '19 at 06:37
  • This regex was working fine, but recently I noticed that it is no longer working. – Vidhya Jan 22 '19 at 06:37
  • It would be better to post the HTML from "Right click >> View source" rather than from Inspect, so we can see exactly what the HTML looks like. – Sufyan Jabr Jan 22 '19 at 06:40
  • I updated my code. Please take a look. – Vidhya Jan 22 '19 at 06:46
  • @UweKeim I did not add a tag to the title. I corrected a typo and added the "regex" tag to the tag list. Why do you think I did? Ah, reviewing the history makes it clear: it looks as if I edited the title. That was accidental, probably because of a concurrent edit, since I did not in fact change it. I'm sorry. – swe Jan 22 '19 at 06:56

2 Answers

To skip over the intermediate text between Current Version</div> and the <span> that contains the version number, you can use a non-greedy .*?. The dot matches the \n in \r\n only if RegexOptions.Singleline is given. To land on the correct span, specify its content as "digits and dots" ([\d\.]+) instead of "anything" (.*?).

var content = @"<div class=""hAyfc"">
<div class=""BgcNfc"">Current Version</div>
<span class=""htlgb"">
<div class=""IQ1z0d"">
<span class=""htlgb"">1.9</span>
</div>";

var versionMatch = Regex.Match(
    content, 
    @"<div[^>]*>Current Version</div>.*?<span[^>]*>([\d\.]+)<", RegexOptions.Singleline).Groups[1];

versionMatch.Value is then "1.9"
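
As a minimal sketch, the same pattern can be wrapped in a small helper that also guards against a null response body, since the question's code passes content to Regex.Match even when it is null. The class and method names (PlayStoreVersion, Extract) are hypothetical, and the pattern only fits markup shaped like the excerpt in the question, so it may need adjusting if the page changes again.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlayStoreVersion
{
    // Pattern taken from the answer above. Singleline lets '.' cross line breaks.
    private static readonly Regex CurrentVersion = new Regex(
        @"<div[^>]*>Current Version</div>.*?<span[^>]*>([\d\.]+)<",
        RegexOptions.Singleline);

    // Hypothetical helper: returns the version string, or null when
    // the content is null or the pattern does not match.
    public static string Extract(string content)
    {
        if (content == null)
        {
            return null; // Regex.Match would throw on a null input
        }

        Match match = CurrentVersion.Match(content);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```

Calling PlayStoreVersion.Extract(content) right after ReadAsStringAsync() would then replace both the Regex.Match call and the Success check from the question.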

Klaus Gütter

You could try HtmlAgilityPack together with Fizzler.Systems.HtmlAgilityPack, which lets you do something like this:

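// Requires the HtmlAgilityPack and Fizzler.Systems.HtmlAgilityPack NuGet packages.
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

var web = new HtmlWeb();
var html = web.Load(uri);
var documentNode = html.DocumentNode;
// QuerySelector returns only the first match; narrow the selector if .htlgb appears more than once.
var version = documentNode.QuerySelector(".htlgb").InnerHtml;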

That way you don't have to worry about the regex.

Jorge