I need to capture a value between angle brackets. I load an HTML page into a string (I can't use an external library, so I have to treat the HTML as a plain string). There are two divs whose content I need. I know their ids and I'm trying to capture the content with a regex, but I can't get it to work.

var div_tags = Regex.Match(json, "<div id=(.*)</div>").Groups[0];

That returns all three divs that have an id, but I only need the two whose id contains the word "mobile". So I tried another regex suggested by a coworker, but I think it isn't compatible with the .NET regex engine.

string titolo = Regex.Replace(json, "<div id=[.*]mobile[.*]>(.*)</div>");

Here is an example of one of the divs. The only value I need is Message. The ids of the two divs are mobileBody and mobileTitle.

<div id='mobileBody' style='display:none;'>Message</div>

What is wrong with my regex that keeps it from capturing the correct text?

osharko

1 Answer

You can try this:
<[a-z\s]+id=[\'\"]mobile[\w]+[\'\"][\sa-zA-Z\d\'\=\;\:]*>([a-zA-Z\d\s]+)<[\/a-z\s]+>
Note, however, that it will not match special characters or symbols in the div content.
You can test and optimize it here: https://regex101.com/r/fnYQ1o/10
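
As a minimal sketch of one way to lift that limitation, the content group can use `[^<]*` instead of a list of allowed characters. This assumes the div content is plain text with no nested tags. The names `MobileDivDemo` and `ExtractMobileDivs`, the named groups and the sample HTML are hypothetical and only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MobileDivDemo
{
    // Assumptions: the id starts with "mobile", the id attribute comes first
    // and is quoted with ' or ", and the div content has no nested tags.
    private const string Pattern =
        @"<div\s+id=['""](?<id>mobile\w+)['""][^>]*>(?<text>[^<]*)</div>";

    // Returns a map from div id to the text inside that div.
    public static Dictionary<string, string> ExtractMobileDivs(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            result[m.Groups["id"].Value] = m.Groups["text"].Value;
        }
        return result;
    }

    public static void Main()
    {
        string html = "<div id='cort'></div>"
            + "<div id='mobileTitle' style='display:none;'>Titolo: prova!</div>"
            + "<div id='mobileBody' style='display:none;'>Corpo (messaggio) prova.</div>";

        foreach (var pair in ExtractMobileDivs(html))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Because `[^<]*` stops at the first `<`, punctuation and accented letters are captured, but a div that contains other tags would not match with this pattern.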

EDIT - Code example
This could be the portion of code to extract the messages:

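 /* Namespaces needed by this snippet */
 using System;
 using System.Linq;
 using System.Text.RegularExpressions;

 var rgx = @"<[a-z\s]+id=[\']mobile[\w]+[\'][\sa-zA-Z\d\s\'\=\;\:]*>([a-zA-Z\d\s]+)<[\/a-z\s]+>";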
 var txt = "<!DOCTYPE html><html lang='it' xml:lang='it'><!-- <![endif]--><head><meta http-equiv='Content-Type' content='text/html; charset=UTF-8'><title>Banca Mediolanum S.p.A. | Accesso clienti</title><meta name='description' content='Banca Mediolanum S.p.A. | Accesso clienti'><meta name='keywords' content='Banca Mediolanum S.p.A. | Accesso clienti'><meta name='title' content='Banca Mediolanum S.p.A. | Accesso clienti'><meta name='author' content='Banca Mediolanum S.p.A.'><meta name='robots' content='index, follow'><meta name='viewport' content='width=1439,user-scalable=no'><link rel='shortcut icon' href='./images/favicon.ico' type='image/x-icon'><style>#cort {background-image: url(bmedonline_10set.png);background-repeat: no-repeat;background-position-x: center;height: 850px;width: auto;/*background-size: 100%;*/}@media only screen and (max-width: 768px) and (min-width: 641px) section.contactus-area.chat {}body {border: 0 none;margin: 0;padding: 0}</style></head><body class=' '><!-- Google Tag Manager --><script>(function (w, d, s, l, i) {w[l] = w[l] || [];w[l].push({'gtm.start': new Date().getTime(),event: 'gtm.js'});var f = d.getElementsByTagName(s)[0],j = d.createElement(s),dl = l != 'dataLayer' ? '&l=' + l : '';j.async = true;j.src ='//www.googletagmanager.com/gtm.js?id=' + i + dl;f.parentNode.insertBefore(j, f);})(window, document, 'script', 'dataLayer', 'GTM-KGSP');</script><!-- End Google Tag Manager --><div id='cort'></div><div id='mobileTitle' style='display:none;'>Titolo prova</div><div id='mobileBody' style='display:none;'>Corpo messaggio prova</div></body></html>";

 /* Using Matches and aggregation */
 var matches = Regex.Matches(txt, rgx).Cast<Match>()
     /* Keep only the matches whose group 1 (the text inside the tag) is not empty */
     .Where(x => !String.IsNullOrEmpty(x.Groups[1].Value))
     .ToList();
 /* Aggregation without using foreach */
 if (matches.Count > 0)
 {
     var exitString = matches.Select(x => x.Groups[1].Value).Aggregate((x, y) => x + "-" + y);
     Console.WriteLine("Match and aggregation");
     Console.WriteLine(exitString);
 }

  /* using replace with regex: .*<div id='mobileTitle'[\s\w\W]*>([\s\w]*)<\/div>[\s\r\n]*<div id='mobileBody'[\s\w\W]*>([\s\w]*)<\/div>.* */
  Console.WriteLine();
  Console.WriteLine(@"Replace with another regex");
  Console.WriteLine(Regex.Replace(txt, @".*<div id='mobileTitle'[\s\w\W]*>([\s\w]*)<\/div>[\s\r\n]*<div id='mobileBody'[\s\w\W]*>([\s\w]*)<\/div>.*", "$1-$2"));

  Console.ReadLine();
Daniele
  • Hi, thanks for replying. When I use Regex.Match(string, "your_regex"); I get the error "unrecognized escape sequence". If I put an @ before your regex, the error is still there. How can I get rid of it? – osharko Oct 10 '17 at 20:42
  • I don't know why, but it still doesn't work. I did the following: var s = Regex.Matches(json, "
    – osharko Oct 11 '17 at 07:11
  • I added a working example to the edited post. It returns only the text inside the tags, and only when group 1 ([a-zA-Z\d\s]+) is not empty. – Daniele Oct 11 '17 at 07:21
  • Oh, thank you. This code got stuck in a loop; I think it's because of the LINQ or something like that. I tried it on this site: http://rextester.com/ But when the loop is stopped, it gives me the correct values. Do you know where the errors are? – osharko Oct 11 '17 at 07:33
  • I've just pasted the code into Visual Studio, and it works fine. Maybe it uses too many resources for the online compiler. By the way, could you explain why you call the Cast method with empty ()? Isn't that the same as not calling it at all? – osharko Oct 11 '17 at 07:53
  • Because the .Cast<Match>() method doesn't accept any input arguments. It is an extension method that simply "Casts the elements of an IEnumerable to the specified type", as you can see in this Microsoft reference: https://msdn.microsoft.com/en-us//library/bb341406(v=vs.110).aspx – Daniele Oct 11 '17 at 09:03
  • By the way, I just noticed that an online regex tester for C# gives different results from the same regex in my C# code. I wrote the regex I want, but when I run it, it doesn't give me the correct result. I posted the regex at http://regexstorm.net/tester and put "$1-$2" as the replacement, which gave me the two strings separated by a '-'. How can I reproduce that in C#? Using Replace I get a different result. – osharko Oct 11 '17 at 09:41
  • Is your code like this? var rgx = new Regex(strRgx); rgx.Replace(input, "$1-$2"); – Daniele Oct 11 '17 at 10:55
  • Yes, it's var result = Regex.Replace(json, @"
    ([\s\w]*)
    ([\s\w]*)
    ","$1-$2"); The json variable holds the HTML string. Am I doing something wrong?
    – osharko Oct 11 '17 at 11:01
  • Maybe the problem is that the regex will not match if there is a carriage return, a newline or a blank between those two divs. Without the "json" value I can't say precisely. – Daniele Oct 11 '17 at 11:23
  • https://pastebin.com/Nr6fPE22 Here is the "json" value. I know it's not JSON, but JSON is usually stored in it, so the variable is named json. – osharko Oct 11 '17 at 11:43
  • The problem is simply that the regex didn't match the whole string, so only the matched part was replaced with the two groups: https://regex101.com/r/tVQFUZ/1. If you don't want to use the Matches method, you can add .* at the start and at the end of the regex: https://regex101.com/r/M3Cg0z/1. I will edit the answer with an example. – Daniele Oct 11 '17 at 11:52
  • Ohhh, finally. ".*" is what I was looking for. Now it does exactly what I want. Thank you very much :) – osharko Oct 11 '17 at 12:29
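
The difference discussed in the last comments can be sketched as follows. Regex.Replace only substitutes the part of the input that the pattern matches, so the surrounding HTML survives unless `.*` on both sides makes the match cover the whole string. This is an illustrative sketch, not the code from the thread: the class `ReplaceDemo`, its two methods, the simplified `[^>]*` and `[^<]*` pattern and the sample HTML are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Simplified pattern (an assumption): both divs in this order, plain-text content.
    private const string Core =
        @"<div id='mobileTitle'[^>]*>([^<]*)</div>\s*<div id='mobileBody'[^>]*>([^<]*)</div>";

    // Replaces only the two matched divs; everything around them is kept.
    public static string ReplaceMatchOnly(string html)
    {
        return Regex.Replace(html, Core, "$1-$2");
    }

    // Wraps the pattern in .* so the match spans the whole input.
    // RegexOptions.Singleline lets '.' match newlines too.
    public static string ReplaceWholeInput(string html)
    {
        return Regex.Replace(html, ".*" + Core + ".*", "$1-$2", RegexOptions.Singleline);
    }

    public static void Main()
    {
        string html = "<html><body>"
            + "<div id='mobileTitle' style='display:none;'>Titolo prova</div>\n"
            + "<div id='mobileBody' style='display:none;'>Corpo messaggio prova</div>"
            + "</body></html>";

        // prints "<html><body>Titolo prova-Corpo messaggio prova</body></html>"
        Console.WriteLine(ReplaceMatchOnly(html));
        // prints "Titolo prova-Corpo messaggio prova"
        Console.WriteLine(ReplaceWholeInput(html));
    }
}
```

If the pattern does not match at all, Regex.Replace returns the input unchanged, so it may be worth checking Regex.IsMatch first when the page layout can vary.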