
I am working in C# and using HtmlAgilityPack to load a web page, and I am trying to parse the content and extract certain elements I need.

<tr height=20 onMouseOver="this.bgColor = '#C0C0C0'" onMouseOut="this.bgColor = 'whitesmoke'" bgcolor=whitesmoke>
  <td>9</td>
  <td align=left><a href="link to page" style="color:blue; text-decoration:none">PlayerFirstName&nbsp;Surname</a></td>
  <td>Position</td>
  <td width=30 align=right bgcolor=dcdcdc>3</td>
  <td width=30 align=right>6</td>
  <td width=30 align=right>4</td>
  <td width=30 align=right>2</td>
  <td width=30 align=right>0</td>
  <td width=30 align=right>0</td>
</tr>

Above is a snippet of the HTML I want to parse from a particular web page. I am only interested in the text inside the <td> cells, nine values in total: the shirt number (9), the player name, the position, and the six numbers in the remaining six cells (3, 6, 4, 2, 0, 0).

Unfortunately, my attempt is not working as intended. Here is the C# code I am using to try to put each value into its own variable.

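HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(teamUrl);
foreach (HtmlNode node in doc.DocumentNode.Descendants("tr"))
{
    // Split the row's combined text on line breaks, dropping blank entries.
    string[] lines = node.InnerText
        .Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None)
        .Where(x => !string.IsNullOrWhiteSpace(x))
        .ToArray();
    // Expects nine entries (lines[0]..lines[8]), but only four come back.
    listOfPlayers.Add(new Player(int.Parse(lines[0]), lines[1], lines[2],
        int.Parse(lines[3]), int.Parse(lines[4]), int.Parse(lines[5]),
        int.Parse(lines[6]), int.Parse(lines[7]), int.Parse(lines[8])));
}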

The string array "lines" above ends up with a Count() of only 4, containing the following entries, so my call that passes the elements to listOfPlayers doesn't work:

2
FirstName Surname
Midfielder/Forward
121100

The remaining six values are joined together into one string, and I can't figure out how to separate them.
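The joining happens because InnerText concatenates the text of all descendant nodes without inserting any separator, so splitting on line breaks only works where the page source happens to contain them. The output above suggests (an assumption, since the live page isn't shown) that the six numeric cells are written back to back on one line. The sketch below reproduces the effect with a hypothetical row; the regex in StripTags is only a crude stand-in for InnerText, not how HtmlAgilityPack works internally, and the class and member names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class InnerTextDemo
{
    // Hypothetical row modelled on the output above: the first three cells sit on
    // their own lines, but the six numeric cells are written back to back.
    public const string SampleRow =
        "<tr>\n<td>2</td>\n<td>FirstName Surname</td>\n<td>Midfielder/Forward</td>\n" +
        "<td>1</td><td>2</td><td>1</td><td>1</td><td>0</td><td>0</td>\n</tr>";

    // Crude stand-in for InnerText: remove every tag and keep only the text.
    // (Illustration only; HtmlAgilityPack builds a DOM instead of using a regex.)
    public static string StripTags(string html) => Regex.Replace(html, "<[^>]*>", "");

    // The same split the question uses: break on line endings, drop blank entries.
    public static string[] SplitOnLineBreaks(string text) =>
        text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None)
            .Where(x => !string.IsNullOrWhiteSpace(x))
            .ToArray();
}
```

Calling InnerTextDemo.SplitOnLineBreaks(InnerTextDemo.StripTags(InnerTextDemo.SampleRow)) gives four entries, the last being "121100": there is simply no line break between the numeric cells for Split to find, which is why the cells have to be read one by one instead.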

It seems like this should be (and probably is) an easy task, but I am not at all familiar with HTML and am only just getting back into C# programming, so any help at all would be amazing. Thanks.

Edit:

Player object:

class Player
{
    public int ShirtNumber { get; private set; }
    public string Name { get; private set; }
    public string Position { get; private set; }
    public int GamesPlayed { get; private set; }
    public int Points { get; private set; }
    public int GoalsScored { get; private set; }
    public int Assists { get; private set; }
    public int YellowCards { get; private set; }
    public int RedCards { get; private set; }

    public Player(int shirtNumber, string name, string position,
        int gamesPlayed, int points, int goalsScored, int assists, int yellowCards, int redCards)
    {
        ShirtNumber = shirtNumber;
        Name = name;
        Position = position;
        GamesPlayed = gamesPlayed;
        Points = points;
        GoalsScored = goalsScored;
        Assists = assists;
        YellowCards = yellowCards;
        RedCards = redCards;
    }
}
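Since the Player constructor takes ints for every column except the name and position, each row's nine cell strings also need converting before the constructor can be called. A minimal sketch, assuming the cells arrive as raw strings that may still contain entities such as &nbsp;; PlayerRowParser and its tuple shape are hypothetical names, and System.Net.WebUtility.HtmlDecode is used for entity decoding.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net;

static class PlayerRowParser
{
    // Hypothetical helper: converts the nine cell texts of one table row into the
    // typed values the Player constructor expects. Returns false if the row has the
    // wrong number of cells or a numeric cell does not parse.
    public static bool TryParse(
        IReadOnlyList<string> cells,
        out (int ShirtNumber, string Name, string Position, int[] Stats) row)
    {
        row = default;
        if (cells == null || cells.Count != 9)
            return false;

        if (!int.TryParse(cells[0].Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out int shirt))
            return false;

        // Cells 3..8 hold the six numeric columns, in table order.
        var stats = new int[6];
        for (int i = 0; i < 6; i++)
        {
            if (!int.TryParse(cells[i + 3].Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out stats[i]))
                return false;
        }

        // Decode entities such as "&nbsp;" and turn the non-breaking space into a normal one.
        string name = WebUtility.HtmlDecode(cells[1]).Replace('\u00A0', ' ').Trim();
        string position = WebUtility.HtmlDecode(cells[2]).Trim();

        row = (shirt, name, position, stats);
        return true;
    }
}
```

On success, the object can be built as new Player(r.ShirtNumber, r.Name, r.Position, r.Stats[0], r.Stats[1], r.Stats[2], r.Stats[3], r.Stats[4], r.Stats[5]). This assumes the six numeric columns appear in the same order as the constructor's gamesPlayed through redCards parameters, which matches how the question passes lines[3] to lines[8]. A four-entry array like the one in the output above is rejected instead of throwing an IndexOutOfRangeException.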
  • Can you share the Player object? – Sriman Saswat Suvankar Feb 08 '18 at 16:01
  • @SrimanSaswatSuvankar Sure, I'll update my post. However, I realize I wasn't very clear when I talked about the output. What I meant to say was that the string array "lines" has a Count() of only 4, so the last six numbers end up as a single value and I can't actually pass all the data to my Player object. – pseudonquixote Feb 08 '18 at 16:07
  • You really should not be getting the inner text and then splitting it back into nodes... The duplicate https://stackoverflow.com/questions/36968474/c-sharp-get-value-of-tr-in-html shows a much better way of parsing tables. (If your question is not related to HTML parsing but rather to printing an array, you may want to edit the post to clarify that.) – Alexei Levenkov Feb 08 '18 at 16:08
  • @AlexeiLevenkov apologies for not being able to find the answer and wasting your time. All the same, you are amazing. Thank you! Saved my silly head from exploding. Big Love <3 – pseudonquixote Feb 08 '18 at 16:18
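Following the suggestion in the comments, here is a minimal sketch that reads each <td> individually instead of splitting the row's InnerText. It relies on HtmlAgilityPack's HtmlNode.Descendants, HtmlNode.Elements, InnerText and HtmlEntity.DeEntitize; the class name PlayerTableScraper and the "exactly nine cells" filter for skipping header rows are hypothetical choices for this particular table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class PlayerTableScraper
{
    // Returns the text of every <td> cell for each <tr> that has exactly
    // expectedCells cells (9 for the player table in the question).
    // Rows with a different shape, such as header rows built from <th>, are skipped.
    public static List<string[]> ExtractRows(HtmlDocument doc, int expectedCells = 9)
    {
        var rows = new List<string[]>();
        foreach (HtmlNode tr in doc.DocumentNode.Descendants("tr"))
        {
            // Read each direct <td> child separately, so adjacent cells never merge.
            string[] cells = tr.Elements("td")
                .Select(td => HtmlEntity.DeEntitize(td.InnerText) // "&nbsp;" -> U+00A0
                    .Replace('\u00A0', ' ')                         // -> ordinary space
                    .Trim())
                .ToArray();

            if (cells.Length == expectedCells)
                rows.Add(cells);
        }
        return rows;
    }
}
```

In the question's setting this would be called as PlayerTableScraper.ExtractRows(new HtmlWeb().Load(teamUrl)); each returned array then holds the shirt number, name, position and six numbers as separate strings, and cells[0] and cells[3] to cells[8] still need int.Parse (or int.TryParse) before being passed to the Player constructor.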
