
I'm trying to fetch some data from a market website. After inspecting the page, I found the part I'm interested in:

"<td>03/04/19</td> <td>2814.37</td> <td>2816.88</td> <td>2767.66</td> <td>2792.81</td> </tr> <tr> <td>03/01/19</td> <td>2798.22</td> <td>2808.02</td> <td>2787.38</td> <td>2803.69</td>"

I wrote this code to collect the data:

MatchCollection m1 = Regex.Matches(html, @"<td>(.+?)</td>", RegexOptions.Singleline);

It works, but each value comes out on its own line.

I want to get something like this:

03/04/19 2814.37 2816.88 2767.66 2792.81
03/01/19 2798.22 2808.02 2787.38 2803.69
… and so on…

How can I get it?

Thanks in advance.

Wiktor Stribiżew
  • What do you mean by "each data in a line"? BTW, since it is HTML, it is recommended to parse it with an HTML parser, such as HtmlAgilityPack or similar. – Wiktor Stribiżew Mar 05 '19 at 16:16
  • Is the order of the data always the same? If so, you'll want to use "named capture groups". – D-Inventor Mar 05 '19 at 16:20
  • FYI: the purpose of `RegexOptions.Singleline` is quite likely different from what you believe it is. I leave it to you to look up the documentation for `RegexOptions.Singleline` to see what it does (and what it doesn't do)... –  Mar 05 '19 at 16:30
  • Please, please don't try to parse HTML with Regex. [Seriously](https://stackoverflow.com/a/1732454/4416750). – Lews Therin Mar 05 '19 at 17:58
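To illustrate the "named capture groups" suggestion from the comments, here is a minimal sketch, assuming every row always holds exactly five `<td>` cells in the same order and that the cells contain no nested markup. The class name `RowParser`, the method `ParseRows`, and the group names (`date`, `open`, `high`, `low`, `close`) are hypothetical; the column meanings are only a guess based on the sample values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowParser
{
    // One match = five consecutive <td> cells; \s* tolerates whitespace between cells.
    // Group names are illustrative guesses at what each column means.
    private static readonly Regex RowPattern = new Regex(
        @"<td>(?<date>[^<]+)</td>\s*" +
        @"<td>(?<open>[^<]+)</td>\s*" +
        @"<td>(?<high>[^<]+)</td>\s*" +
        @"<td>(?<low>[^<]+)</td>\s*" +
        @"<td>(?<close>[^<]+)</td>");

    // Returns one space-separated line per matched row.
    public static List<string> ParseRows(string html)
    {
        var lines = new List<string>();
        foreach (Match m in RowPattern.Matches(html))
        {
            lines.Add(string.Join(" ",
                m.Groups["date"].Value,
                m.Groups["open"].Value,
                m.Groups["high"].Value,
                m.Groups["low"].Value,
                m.Groups["close"].Value));
        }
        return lines;
    }
}
```

Calling `RowParser.ParseRows(html)` on the fragment from the question should give two lines, one per table row. Naming the groups also makes it easy to read a single column (for example `m.Groups["close"].Value`) without counting positions.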

1 Answer


Your regex looks fine; you just need to group the matched results. Try this code:

// Requires: using System.Linq; using System.Text.RegularExpressions;
var input = "<td>03/04/19</td> <td>2814.37</td> <td>2816.88</td> <td>2767.66</td> <td>2792.81</td> </tr> <tr> <td>03/01/19</td> <td>2798.22</td> <td>2808.02</td> <td>2787.38</td> <td>2803.69</td>";
var result = Regex.Matches(input, "<td>(.+?)</td>")
    .Cast<Match>() // to enable Linq
    .Select((m, i) => new {m, part = i / 5}) // here "5" is size of a group
    .GroupBy(x => x.part, x => x.m)
    .Select(x => string.Join(" ", x.Select(m => m.Groups[1].Value))) // create a single line from five matches
    .ToArray();

Now, if you print the result to the console:

foreach (var line in result)
    Console.WriteLine(line);

you will get:

03/04/19 2814.37 2816.88 2767.66 2792.81
03/01/19 2798.22 2808.02 2787.38 2803.69
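The grouping above relies on a fixed group size of 5. If the number of columns might vary, one possible variation is to split the HTML at each `</tr>` first and then collect the cells of each piece separately. This is only a sketch, assuming non-empty cells without nested tags; `RowSplitter` and `ToLines` are hypothetical names, not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RowSplitter
{
    // Splits the markup at each closing </tr>, then joins the <td> values of every piece.
    public static string[] ToLines(string html)
    {
        return Regex.Split(html, "</tr>", RegexOptions.IgnoreCase)
            .Select(row => Regex.Matches(row, "<td>(.+?)</td>")
                .Cast<Match>()                            // to enable Linq
                .Select(m => m.Groups[1].Value.Trim()))   // cell text only
            .Select(cells => string.Join(" ", cells))     // one line per row
            .Where(line => line.Length > 0)               // skip pieces without cells
            .ToArray();
    }
}
```

For the sample input, `RowSplitter.ToLines(input)` should produce the same two lines as shown above, but without hardcoding how many values belong to a row.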

Aleks Andreev