
I want to scrape this page (the URL is in the `readHTMLTable` call below) using the R XML package. The problem is that the data are not in the static HTML; they are generated dynamically. The table is built from this piece of HTML:

```html
<table width="1280px" id="maintable">
<tr id="tabletoggles">
<td>&nbsp;</td>
<td id="tablelabel">&nbsp;</td>
<td id="abovestats" class="abovestats" align="right">
&nbsp;&nbsp;&nbsp;<span class="revscore likelink"></span>
&nbsp;&nbsp;&nbsp;<b>Stats:</b>&nbsp;
<span class="statso stattab">Serve</span>&nbsp;|&nbsp;<span class="statsr stattab likelink">Return</span>&nbsp;|&nbsp;<span class="statsw stattab likelink">Raw</span>
</td></tr>
<tr>
<td id="footer" class="footer">&nbsp;</td>
<td colspan="2" id="stats" class="stats"><table id="matches"></table></td>
</tr>
<tr>
<td id="belowmenus">&nbsp;<br/>&nbsp;<br/>&nbsp;<br/>&nbsp;<br/>&nbsp;</td>
<td colspan="2" id="belowmatches">&nbsp;</td>
</tr>
</table></div>
</div>
```

When I call `readHTMLTable` from the XML package on this page, I only get nonsensical values:

```r
readHTMLTable("http://www.tennisabstract.com/cgi-bin/player.cgi?p=NovakDjokovic&f=ACareerqq", which = 3)
```

```
V1         V2
1 Â
2 Â Â Â Â Â   Â
```

How can I retrieve the underlying URL (the "full link") that contains all the data? I can find it manually for each page with Firebug, but I'd like a solution that can handle multiple URLs at once.

  • that's because all the data you want is [here](http://www.minorleaguesplits.com/tennisabstract/cgi-bin/jsmatches/NovakDjokovicCareer.js) – hrbrmstr Sep 22 '15 at 20:02
  • (got pulled away) Every page is dynamically generated. You either have to use Selenium, or fetch the JavaScript data file associated with the page and either clean it up for `fromJSON` or evaluate it with `V8` to get it into a readable form. – hrbrmstr Sep 22 '15 at 20:13
  • just missed the data, thank you both – 1053Inator Sep 23 '15 at 18:19
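Following hrbrmstr's suggestion, here is a minimal R sketch for pulling the JavaScript data file for several players at once. It assumes, without having verified it, that other players' files follow the same `<Player>Career.js` pattern as the `NovakDjokovicCareer.js` link above, and that the host still serves them. The helper names `js_url`, `fetch_js` and `read_js_var` are hypothetical. Because the variable name defined inside the `.js` file is not known here, `read_js_var` takes it as a parameter; it relies on the `V8` package's `v8()` context and its `$eval()` and `$get()` methods.

```r
# Base URL taken from the comment above; other player slugs are an assumption.
js_base <- "http://www.minorleaguesplits.com/tennisabstract/cgi-bin/jsmatches"

# Build the .js URL for one or more player slugs (sprintf is vectorised).
js_url <- function(player) {
  sprintf("%s/%sCareer.js", js_base, player)
}

# Download the raw JavaScript source for one player as a single string.
fetch_js <- function(player) {
  paste(readLines(js_url(player), warn = FALSE, encoding = "UTF-8"),
        collapse = "\n")
}

# Evaluate the JavaScript in a V8 context and return one variable as an R
# object. `var_name` must be whatever the .js file actually defines.
read_js_var <- function(js_text, var_name) {
  ctx <- V8::v8()
  ctx$eval(js_text)
  ctx$get(var_name)
}

# Usage (requires network access):
# players <- c("NovakDjokovic", "AnotherPlayer")  # second slug is hypothetical
# raw_js  <- lapply(players, fetch_js)
# names(raw_js) <- players
```

Looping over slugs with `lapply` keeps each player's raw source separate, so you can inspect one file before deciding whether to parse it with `V8` or to strip the JavaScript wrapper and use `fromJSON`.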

1 Answer


I believe this is caused by a missing UTF-8 encoding declaration.

What language are you using to get this data?

If you are using PHP to fetch the data, I recommend adding

```php
header('Content-Type: text/html; charset=utf-8');
```

before the rest of your code.
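The `Â` characters in the question's output are consistent with an encoding mix-up: `&nbsp;` is U+00A0, which UTF-8 stores as the two bytes `C2 A0`, and if those bytes are read as Latin-1 (or Windows-1252) they display as `Â` followed by a non-breaking space. The base-R sketch below reproduces that effect; the variable names are illustrative only. Note that decoding correctly would only turn those cells back into blank spaces, since the `matches` table in the question's HTML is empty; the match data itself still has to come from the JavaScript file mentioned in the comments.

```r
# A non-breaking space (&nbsp;, U+00A0) encoded as UTF-8 is two bytes.
nbsp_bytes <- charToRaw(enc2utf8("\u00a0"))
print(nbsp_bytes)  # c2 a0

# Reinterpreting those bytes as Latin-1 yields "Â" plus a non-breaking space.
misread <- iconv(rawToChar(nbsp_bytes), from = "latin1", to = "UTF-8")

# Declaring the bytes as UTF-8 recovers the single non-breaking space.
correct <- rawToChar(nbsp_bytes)
Encoding(correct) <- "UTF-8"
```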