For starters I am by no means an expert in VBA. Just know enough to be dangerous 8).
I started out by doing a search on how to extract a table from a web page and saw many many people have asked the same question. Unfortunately most of what I was reading was over my head. One article I read pointed me to this detailed article by Siddharth Rout, but alas I could not follow what was going on other than there are two methods internet explorer or some other methods. Since I only have IE11 installed and MS Office I would prefer to go the IE route.
I have encountered this problem several times in the past and have always dropped the project or done things manually. Today I thought I would try and learn how to do this and make my future life hopefully a little easier. As such I am going to use data from a gaming website since it mimics other things I have encountered in the past.
So today's (this week's..no this month's..I am an optimist!) project is to build a list of every team involved in a tournament and copy their results into excel. This would be akin to pulling cricket, hockey, baseball, soccer, or football stats. I tried using Excel's built in Get Data From Web process, but it did not identify the table on the web page.
The address for the web page is: http://worldoftanks.com/en/tournaments/1000000017/
and is in the image below
So the basics and my starting point is to simply pull the list of teams from 1 group and paste it in an excel page with no formatting. Basically the area in yellow in the image above. The image could not fit the whole page but there are actually 10 teams in this group. However I would like to make it variable as sometimes you may have more or less than 10 teams in a group. I am going to assume the number of rows is a minor issue at this point.
Once I get that part figured out I am hoping it will be relatively easy to switch to the next group, grab that list of teams and results and add them to the end of the list I am building in excel. On the web page this would be done by selecting the blue areas.
Now once I have those two things figured out I would need to build the list again from scratch based on the stage of the tournament areas in green and put that list on a new page. I have some ideas how to achieve this but it will really depend on what the previous two steps look like.
I have a bonus task for myself too which is to pull the schedule for each team in a group to see how they did against various other teams. Who beat who type deal. I am hoping I can figure that part out based on the information learned from the task above.
So I am pretty sure there are other languages/prgs that are better suited for the task at hand, but I would like to stick with what I have...and the little I know so far. So I tried a wee bit of VBA code and commented on what I need to achieve. So far I think I have opened the webpage! and built a bit a thought process in comments on how to do some of the things.
Sub GetTeamData()
Dim IE As Object
Dim roundcounter As Integer
Dim groupcounter As Integer
Dim TeamList As Variant
Dim WebAddress As String
Dim Number_of_rounds as Integer
Dim Number_of_Groups as Integer
'set webaddress of site to link to
WebAddress = "http://worldoftanks.com/en/tournaments/1000000017/"
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate (WebAddress)
End With
'What does this chunk of code do? Wait for webpage to finish loading?
While IE.readyState <> 4
DoEvents
Wend
'set initial parameters for loops. I am ok with hardcoding this for now.
Number_of_groups = 125
Number_of_rounds = 5
'start pulling teamdata
'For roundcounter = 1 To number_of_rounds
'select roundcounter on webpage
'for groupcounter = 1 to number_of_groups
'select groupcounter on webpage
'grab table of 6-10 teams (position, team name, battles, wins, losses, ties, and points)
'add table to TeamList
'next groupcounter
'paste TeamList to sheet roundcounter cell A1
'clear TeamList
'next roundcounter
'Next task
'based on results on how to pull group table date, pull individual team schedule results to build matrix result
Set IE = Nothing
End Sub
One thing I was thinking about was that instead of using for next loops with a counter is if it would be easier to set it up to do a loop until an error had occurred like exceeding the number of groups or rounds. Now I am rambling.
Anyhow if someone would be so kind to get me started on how to pull the yellow area from the image above that would be much appreciated! Please be gentle! I do realize that this question has been asked many a time... I just did not understand what I was reading. Also if this is not possible or extremely difficult to do please let me know. Thank you in advance for your assistance in educating me.
UPDATE 16/03/19 0900
So I tried the Get Data From Web process again this morning with a bit more luck...but not much.
after 1 error window which I click yes to I get the web page to load
I got the little yellow arrow to show up once on the page in the very top left corner. So I tried it and it did pull in information.
but I did notice there were no yellow boxes next to the table I want which makes me wonder if it is not a table.
When I did pull in information, it was not the information I was looking for. When I scanned through the results, I could see where the data I am looking for should be, but all the results are missing, just the table column headers show up in about Row 263 or so.
So then I tried doing a copy and paste method from the web page using select all for the copy on the web page. For the paste I tried different methods. keeping source formatting resulted in nothing. keep destination formatting brought in information. I tried paste special (html, Unicode and text) HTML made things look pretty and the other two put everything into a single column. More importantly the results were in the table.
Now if I only needed round 1 group 1 team list and results I could work with this. Simply delete all the rows above and below the table and voila! however since the web address is the same for every group and every round I have no idea how to "click" on the blue or green areas to update the info. If I knew this I could automate the process by copying and pasting each page, then editing the results to just the table, and moving the table to another sheet just below the last results.
To me there seems like there should be a better method.
16/03/19 1600
<!-- ko if: visibleBracketType() === ROUND_ROBIN -->
<table class="tournament-table tournament-table__indent" cellpadding="0" cellspacing="0">
<tr class="tournament-table_tr">
<th class="tournament-table_th tournament-table_th__numb">#</th>
<th class="tournament-table_th">
<div class="tournament-table_ico-holder">
<span class="ico-team">Team</span>
</div>
<div class="tournament-table_heading-text">
Team
</div>
</th>
<th class="tournament-table_th">
<div class="tournament-table_ico-holder">
<span class="ico-battles">Battles</span>
</div>
<div class="tournament-table_heading-text">
Battles
</div>
</th>
<th class="tournament-table_th">
<div class="tournament-table_ico-holder">
<span class="ico-victory">Victories</span>
</div>
<div class="tournament-table_heading-text">
Victories
</div>
</th>
<th class="tournament-table_th tournament-table_th__mobile-hide">
<div class="tournament-table_ico-holder">
<span class="ico-flag">Defeats</span>
</div>
<div class="tournament-table_heading-text">
Defeats
</div>
</th>
<th class="tournament-table_th tournament-table_th__mobile-hide">
<div class="tournament-table_ico-holder">
<span class="ico-division">Draws</span>
</div>
<div class="tournament-table_heading-text">
Draws
</div>
</th>
<th class="tournament-table_th">
<div class="tournament-table_ico-holder">
<span class="ico-points">Points</span>
</div>
<div class="tournament-table_heading-text">
Points
</div>
</th>
</tr>
<!-- ko foreach: {data: rrBrackets().teams, as: 'team' } -->
<tr class="tournament-table_tr" data-bind="css: {'tournament-table_tr__my-team': team.team_id === $root.currentUserTeamIdInCurrentGroup()}">
<td class="tournament-table_td" data-bind="text: team.position"></td>
<td class="tournament-table_td" data-bind="css: {'tournament-table_td__my-team': team.team_id === $root.currentUserTeamIdInCurrentGroup()}">
<a class="tournament-table_team tournament-table_team__big" target="_blank" data-bind="text: team.team_title, attr: {href: $root.getTournamentTeamUrl(team.team_id)}"></a>
</td>
<td class="tournament-table_td" data-bind="text: team.battle_played"></td>
<td class="tournament-table_td" data-bind="text: team.wins"></td>
<td class="tournament-table_td tournament-table_td__mobile-hide" data-bind="text: team.losses"></td>
<td class="tournament-table_td tournament-table_td__mobile-hide" data-bind="text: team.draws"></td>
<td class="tournament-table_td" data-bind="text: team.extra_statistics.points"></td>
</tr>
<!-- /ko -->
</table>
ok, from what I am gathering from the various posts I have been reading and videos I have been watching, I need to find some critical "Tag" in the coding of the web page and from that I can eventually start pulling data. I hit F12 on IE to view the code, and then in the code area I did a search on some of the display text in the area I was looking and found the above chunk of "code". With a lot of GUESSING I am hoping I grabbed the right chunk. Now to figure out what that critical tag is and how to use it. By the way, what code is that web page in?