I want to get the "Partite" table from this site:
http://it.soccerway.com/national/italy/serie-a/20142015/regular-season/r27139/

So I wrote this code:

Dim HTML As String = New WebClient().DownloadString(URLs(MetroComboBox2.SelectedIndex))
Dim URL_Params As String = "&callback_params=" & Regex.Match(HTML, "'block_competition_matches_summary', ({[\w\s"",:]+})").Groups(1).ToString
Dim Base_URL As String = "http://it.soccerway.com/a/block_competition_matches_summary?block_id=page_competition_1_block_competition_matches_summary_6"
Dim Giornata_URL As String = Base_URL & URL_Params & "&action=changeView&params={""view""%3A1}"

With the HTML variable I download the page I linked above; in URL_Params I'm trying to match the div class "block_competition_matches_summary".

But apparently the regex doesn't catch the element. I then assemble all the variables into Giornata_URL. What am I doing wrong?

  • Where's the obligatory link? Okay then. [I'd recommend using a HTML parser instead.](http://stackoverflow.com/a/1732454/1222951) – Aran-Fey Jan 10 '15 at 13:50
  • Base_URL is the obligatory link – EspertoDiProgrammazione Jan 10 '15 at 13:52
  • Show the result you expect and the result you get instead to be clear. – J0e3gan Jan 10 '15 at 15:08
  • I expect that if the user chooses that link the end result is this: http://it.soccerway.com/a/block_competition_matches_summary?block_id=page_competition_1_block_competition_matches_summary_6&callback_params=%7B%22page%22%3A0%2C%22bookmaker_urls%22%3A%7B%2213%22%3A%5B%7B%22link%22%3A%22http%3A%2F%2Fwww.bet365.com%2Fhome%2F%3Faffiliate%3D365_308121%22%2C%22name%22%3A%22Bet%20365%22%7D%5D%7D%2C%22block_service_id%22%3A%22competition_summary_block_competitionmatchessummary%22%2C%22round_id%22%3A27139%2C%22outgroup%22%3Afalse%2C%22view%22%3A2%7D&action=changeView&params=%7B%22view%22%3A1%7D – EspertoDiProgrammazione Jan 10 '15 at 15:39
  • See the final part of the URL: this final result allows me to grab the specific values for each championship. – EspertoDiProgrammazione Jan 10 '15 at 15:39

1 Answer

I suppose you are trying to match this part of the web page?

'block_competition_matches_summary', {"page":0,"bookmaker_urls":{"13":[{"link":"http:\/\/www.bet365.com\/home\/?affiliate=365_308136","name":"Bet 365"}]},"block_service_id":"competition_summary_block_competitionmatchessummary","round_id":27139,"outgroup":false,"view":2}

That will never be matched by this regular expression:

'block_competition_matches_summary', ({[\w\s",:]+})

The data structure contains nested braces, along with characters such as [, ], /, \, ?, = and . in the embedded URL, none of which are covered by the character class [\w\s",:].

Matching nested braces is not easy with a regular expression. Which closing brace should close the match?

An easy alternative is to anchor the end of the match to the ); that ends the line. The following regular expression works:

'block_competition_matches_summary', (\{.*?\})\);\n

Explanation:

  • ( - start of the capturing subpattern
  • \{ - please escape braces because they have special meaning in regex syntax
  • .*? - any number of characters, non-greedy (this is essential here)
  • \} - again, escape braces
  • ) - end of the capturing subpattern
  • \) - literal character: closing parenthesis
  • ; - literal character: semicolon
  • \n - linebreak

I advise you to use this in combination with RegexOptions.Singleline, which makes . match line breaks as well, in case there is a line break inside the expression you are trying to match.
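To see the pattern in isolation, here is a minimal sketch that runs it against a shortened, made-up stand-in for the line in the page source. The module name, the ExtractCallbackParams helper, and the sample string are assumptions for illustration only; the real page content is longer and may change.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexSketch
    ' Hypothetical helper: returns the captured JSON-like argument,
    ' or an empty string when the pattern does not match.
    Function ExtractCallbackParams(html As String) As String
        Dim pattern As String = "'block_competition_matches_summary', (\{.*?\})\);\n"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.Singleline)
        Return If(m.Success, m.Groups(1).Value, String.Empty)
    End Function

    Sub Main()
        ' Shortened, made-up sample; note the nested braces and the trailing ");" plus line feed.
        Dim sample As String = "foo('block_competition_matches_summary', {""page"":0,""bookmaker_urls"":{""13"":[]},""view"":2});" & vbLf
        Console.WriteLine(ExtractCallbackParams(sample))
    End Sub
End Module
```

Because the quantifier is non-greedy and the pattern demands ); and a line feed right after the closing brace, the inner } characters are skipped and the capture ends at the outermost one. If the page uses Windows line endings (\r\n), the trailing \n in the pattern would need to become \r?\n.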

Final comment: please URL-encode the resulting string before you add it to URL_Params. This makes the complete statement:

Dim URL_Params As String = "&callback_params=" & WebUtility.UrlEncode(Regex.Match(HTML, "'block_competition_matches_summary', (\{.*?\})\);\n", RegexOptions.Singleline).Groups(1).Value)
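
The same encoding can be applied to the params value when the full URL is assembled. The sketch below is one possible way to do that; BuildGiornataUrl is a hypothetical helper, and the callback parameters passed in Main are a shortened, made-up example rather than the real page data.

```vb
Imports System
Imports System.Net

Module UrlSketch
    ' Hypothetical helper: appends both URL-encoded parameters to the base URL.
    Function BuildGiornataUrl(baseUrl As String, callbackParams As String) As String
        Dim viewParams As String = "{""view"":1}"
        Return baseUrl &
               "&callback_params=" & WebUtility.UrlEncode(callbackParams) &
               "&action=changeView&params=" & WebUtility.UrlEncode(viewParams)
    End Function

    Sub Main()
        Dim baseUrl As String = "http://it.soccerway.com/a/block_competition_matches_summary?block_id=page_competition_1_block_competition_matches_summary_6"
        ' Shortened, made-up callback parameters for illustration.
        Console.WriteLine(BuildGiornataUrl(baseUrl, "{""round_id"":27139,""view"":2}"))
    End Sub
End Module
```

Note that WebUtility.UrlEncode encodes a space as +, whereas the expected URL quoted in the comments uses %20 (in "Bet%20365"). If the server turns out to require %20, Uri.EscapeDataString is an alternative worth trying.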
Ruud Helderman