I am trying to get the text between two strings in the HTML source of an email, using C#. The relevant part of the HTML is:

<td width="200" align="right" valign="top" style="line-height:22px; font-size:20px; font-family: Arial, sans-serif; color:#636363; text-decoration:none;">

9/7/2018

</td>

I need the date 9/7/2018. I have tried this regex:

color:#636363; text-decoration:none;">(.*?)</td>

This should capture the string between `color:#636363; text-decoration:none;">` and `</td>`, but it's not working. I think the newline and blank characters are causing the problem.
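A minimal C# sketch that reproduces the behaviour, assuming the email HTML is already in a string. In .NET, `.` does not match `\n` unless `RegexOptions.Singleline` is set, so the lazy `(.*?)` cannot get past the first line break and never reaches `</td>`. It also shows how the `"` fits into a C# verbatim string (it is written doubled, as `""`). The class `QuestionPattern`, its members and the `[\s\S]` variant are illustrative names, not part of the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class QuestionPattern
{
    // The pattern from the question as a verbatim string: a literal quote inside @"..." is written "".
    public const string Pattern = @"color:#636363; text-decoration:none;"">(.*?)</td>";

    // Variant that needs no option: [\s\S] matches any character, newlines included.
    public const string AnyCharPattern = @"color:#636363; text-decoration:none;"">([\s\S]*?)</td>";

    // Returns the trimmed group 1, or null when the pattern does not match.
    public static string Extract(string html, string pattern, RegexOptions options = RegexOptions.None)
    {
        Match m = Regex.Match(html, pattern, options);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}
```

With the snippet from the question in a string `html`, `Extract(html, Pattern)` returns `null`, while `Extract(html, Pattern, RegexOptions.Singleline)` and `Extract(html, AnyCharPattern)` both return `"9/7/2018"`; `Trim()` removes the blank lines around the date.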

Wiktor Stribiżew
Rana Jawad
  • Wouldn't it be better to use an XPath selector for that? – Lucas Wieloch Sep 13 '18 at 16:20
  • [You can't parse \[X\]HTML with regex](https://stackoverflow.com/a/1732454/107625)! – Uwe Keim Sep 13 '18 at 16:22
  • Try `color:#636363; text-decoration:none;">([\s\S]*?)</td>` to get rid of newline issues – Rubens Farias Sep 13 '18 at 16:23
  • Use [HtmlAgilityPack](http://html-agility-pack.net/). You cannot parse HTML with Regex – maccettura Sep 13 '18 at 16:32
  • @RubensFarias this seems to work, but the `"` in the pattern causes an error: https://imgur.com/a/jCScNpF – Rana Jawad Sep 13 '18 at 16:45
  • You can make `.` match newlines with `RegexOptions.Singleline` - [`(?s)color:#636363; text-decoration:none;">(.*?)</td>`](http://regexstorm.net/tester?p=color%3a%23636363%3b+text-decoration%3anone%3b%22%3e%28.*%3f%29%3c%2ftd%3e&i=rif%3b+color%3a%23636363%3b+text-decoration%3anone%3b%22%3e%0d%0a%0d%0a9%2f7%2f2018%0d%0a%0d%0a%3c%2ftd%3e&o=s), but you should parse HTML with a dedicated parser. – Wiktor Stribiżew Sep 13 '18 at 16:47
  • Here is also the old-fashioned way, without regex: `var lPos = myString.IndexOf("</td>", 0); int fPos = 0; if (lPos > -1) fPos = myString.LastIndexOf('>', lPos); var result = myString.Substring(fPos + 1, lPos - 1 - fPos); Console.WriteLine(result.Trim());` – T.S. Sep 13 '18 at 16:56
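The IndexOf/LastIndexOf idea from the last comment, wrapped in a hypothetical helper (`IndexOfExtractor.ExtractFirstCell` is an illustrative name). It is a sketch, not a parser: it returns the contents of the *first* `</td>` in the string, so in a full email with several cells it may land on the wrong one. Unlike the one-liner, it returns `null` instead of throwing when there is no `</td>` at all.

```csharp
using System;

// Hypothetical helper class for illustration only.
public static class IndexOfExtractor
{
    // Finds the first "</td>", then walks back to the '>' that closes the opening <td ...> tag.
    // Returns null when the input contains no "</td>".
    public static string ExtractFirstCell(string html)
    {
        int closeTag = html.IndexOf("</td>", StringComparison.Ordinal);
        if (closeTag < 0)
        {
            return null;
        }

        // LastIndexOf searches backwards from closeTag (which holds '<'), so it finds the previous '>'.
        int openTagEnd = html.LastIndexOf('>', closeTag);

        // Text strictly between that '>' and "</td>", with surrounding whitespace and line breaks trimmed.
        return html.Substring(openTagEnd + 1, closeTag - openTagEnd - 1).Trim();
    }
}
```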

2 Answers

Okay, so you want this? You need to match the newline characters around the date. (Escaping the / in </td> as \/ does no harm, although .NET does not require it.) This should do nicely:

color:#636363; text-decoration:none;\">[\r\n]+(.*?)[\r\n]+<\/td>
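[\r\n]+">
A sketch of using this pattern from C#, assuming the HTML is already in a string. As a verbatim string, the regex-tester form `\"` becomes `""`; `\/` can stay because .NET reads it as a literal `/`. Note that `[\r\n]+` requires at least one line break on each side of the date, so it would not match if the date sat on the same line as the tags. `AnswerOnePattern` and `Extract` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class AnswerOnePattern
{
    // The answer's pattern as a C# verbatim string: \" became "" and \/ is kept
    // (.NET treats it as a literal '/'). [\r\n] is passed through to the regex engine.
    public const string Pattern =
        @"color:#636363; text-decoration:none;"">[\r\n]+(.*?)[\r\n]+<\/td>";

    // Returns group 1 (the text between the line breaks), or null if there is no match.
    public static string Extract(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```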
Ajaypayne

Try the pattern below:

<td[^>]*>(.*?)</td>

It will ignore all attributes.
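On the question's HTML this pattern runs into the same newline issue: without `RegexOptions.Singleline`, `(.*?)` cannot cross the line breaks around the date. It also matches every `<td>` in the email, not only the date cell. A minimal sketch, using the hypothetical helper name `AnyTdPattern.CellTexts`, that collects the trimmed text of each cell:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class AnyTdPattern
{
    // The answer's pattern, unchanged: [^>]* skips whatever attributes the <td> carries.
    public const string Pattern = @"<td[^>]*>(.*?)</td>";

    // Returns the trimmed text of every <td>...</td> in the input.
    // Singleline lets '.' match '\n', which the date in the question needs.
    public static List<string> CellTexts(string html)
    {
        var cells = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.Singleline))
        {
            cells.Add(m.Groups[1].Value.Trim());
        }
        return cells;
    }
}
```

The caller would still have to pick the date out of the returned list, for example by position or by checking its format.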

Arun Kumar