
What's the best way to parse this data? Should I use regex or something else? The data is HTML that I copied from a website, and I will be parsing this string and only this string. (Note: the full string is much longer, with over 1,300 rows; only two are shown below.) I use PHP and jQuery for most web programming.

I only need to extract the anchor text inside the player td (the third cell in each row). In the first row, that is: Best, Jahvid DET RB

And I only need to run this loop one time.

<tr class="oddtablerow"><td class="rank">1.</td><td class="rank">1.</td><td class="player"><a href="http://football22.myfantasyleague.com/2010/player?L=34793&amp;P=9839"  title="Salary: $2250000, Year: 3, Status: 3, Info: Drafted 10 1:04 Team, Week 3: at Vikings Sun 1:00 p.m. ET" class="position_rb">Best, Jahvid DET RB</a> (R) </td><td class="points tot">53.90</td><td class="points avg">26.950</td><td class="points"><a href="detailed?L=34793&amp;W=1&amp;P=9839&amp;YEAR=2010">17.55</a></td> 
<td class="points"><a href="detailed?L=34793&amp;W=2&amp;P=9839&amp;YEAR=2010">36.35</a></td> 
<td class="status"><a title="Owner: William Gold"  class="franchise_0009" href="http://football22.myfantasyleague.com/2010/options?L=34793&amp;F=0009&amp;O=01">Team Name</a> - <a href="options?L=34793&amp;O=05&amp;FRANCHISE=0013,0009&amp;PLAYER=9839,">Trade</a></td><td class="week">7</td><td class="salary">$2250000</td></tr> 
<tr class="eventablerow myfranchise "><td class="rank">2.</td><td class="rank">2.</td><td class="player"><a href="http://football22.myfantasyleague.com/2010/player?L=34793&amp;P=3291"  title="Salary: $7400000, Year: 3, Status: 3, Info: , Week 3: at Broncos Sun 4:15 p.m. ET" class="position_qb">Manning, Peyton IND QB</a></td><td class="points tot">49.61</td><td class="points avg">24.805</td><td class="points"><a href="detailed?L=34793&amp;W=1&amp;P=3291&amp;YEAR=2010">26.66</a></td> 
<td class="points"><a href="detailed?L=34793&amp;W=2&amp;P=3291&amp;YEAR=2010">22.95</a></td> 
<td class="status"><a title="Owner: Robert M. Cavezza "  class="myfranchise franchise_0013" href="http://football22.myfantasyleague.com/2010/options?L=34793&amp;F=0013&amp;O=01">The Bullies</a></td><td class="week">7</td><td class="salary">$7400000</td></tr> 

Edit: What happened to the jQuery answer? I was about to implement it, but it disappeared.

Bob Cavezza
  • Regex and HTML? You're on the right website :) – miku Sep 22 '10 at 01:44
  • This may be the most [upvoted answer](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) on the site. You're probably better off using a proper [html parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) and pulling information out of that. – Roman Sep 22 '10 at 01:54
  • Is there a way to view old answers that were deleted either by admins or by the author? – Bob Cavezza Sep 22 '10 at 02:57
  • It was deleted by the author. jQuery is not related to PHP at all, which is probably why it was deleted. – HoLyVieR Sep 22 '10 at 02:58
  • Crap - no way to find the cached answer? – Bob Cavezza Sep 22 '10 at 02:59

1 Answer

If you are looking for the fastest execution speed, XMLReader is one of the fastest XML parsers. It is a bit harder to use than other solutions such as DOM, but since you want to parse a lot of entries, performance is probably important.

Otherwise, DOM is pretty simple to use. You can find a simple example of how to use it in this answer I gave on another question.

If you want to load your content from a string, here's how you do it:

XMLReader

$foo = new XMLReader();
$foo->xml($yourStringHere);
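
As a rough sketch of how the XMLReader route might continue from here: the loop below walks the nodes, remembers whether the current td has class "player", and collects the text of the first anchor inside it. The sample string is a shortened, hypothetical version of the rows in the question, and the function name readPlayerNames is made up for illustration. XMLReader expects well-formed XML with a single root element, so this assumes the rows are wrapped in a table element and that the real markup is clean enough to parse as XML.

```php
<?php
// Shortened, hypothetical sample in the shape of the question's rows.
// The <table> wrapper gives XMLReader the single root element it requires.
$xml = '<table>'
    . '<tr class="oddtablerow"><td class="rank">1.</td><td class="rank">1.</td>'
    . '<td class="player"><a href="player?L=34793&amp;P=9839" class="position_rb">Best, Jahvid DET RB</a> (R) </td>'
    . '<td class="points"><a href="detailed?L=34793&amp;W=1">17.55</a></td></tr>'
    . '<tr class="eventablerow myfranchise "><td class="rank">2.</td><td class="rank">2.</td>'
    . '<td class="player"><a href="player?L=34793&amp;P=3291" class="position_qb">Manning, Peyton IND QB</a></td>'
    . '<td class="points"><a href="detailed?L=34793&amp;W=1">26.66</a></td></tr>'
    . '</table>';

// Hypothetical helper: returns the anchor text found in each td.player cell.
function readPlayerNames(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $names = [];
    $inPlayerCell = false;

    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        if ($reader->name === 'td') {
            // Track whether the cell we just entered is the player cell.
            $inPlayerCell = ($reader->getAttribute('class') === 'player');
        } elseif ($reader->name === 'a' && $inPlayerCell) {
            // readString() returns the text content of the current element.
            $names[] = trim($reader->readString());
            $inPlayerCell = false;
        }
    }

    $reader->close();
    return $names;
}

$names = readPlayerNames($xml);
echo implode("\n", $names), "\n";
```

If the real page is not valid XML (unclosed tags, unescaped ampersands), this approach would likely fail, and the more lenient DOMDocument::loadHTML is the safer choice.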

DOMDocument

$foo = new DOMDocument();
$foo->loadHTML($yourStringHere);
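
To get from the loaded DOMDocument to the anchor text the question asks for, one option is an XPath query. The following is a minimal sketch, not a definitive implementation: the sample string is a shortened, hypothetical version of the rows in the question, the function name extractPlayerNames is made up for illustration, and the query assumes the class attribute of the target cell is exactly "player".

```php
<?php
// Shortened, hypothetical sample in the shape of the question's rows.
$html = '<table>'
    . '<tr class="oddtablerow"><td class="rank">1.</td><td class="rank">1.</td>'
    . '<td class="player"><a href="player?L=34793&amp;P=9839" class="position_rb">Best, Jahvid DET RB</a> (R) </td>'
    . '<td class="points"><a href="detailed?L=34793&amp;W=1">17.55</a></td></tr>'
    . '<tr class="eventablerow myfranchise "><td class="rank">2.</td><td class="rank">2.</td>'
    . '<td class="player"><a href="player?L=34793&amp;P=3291" class="position_qb">Manning, Peyton IND QB</a></td>'
    . '<td class="points"><a href="detailed?L=34793&amp;W=1">26.66</a></td></tr>'
    . '</table>';

// Hypothetical helper: returns the anchor text found in each td.player cell.
function extractPlayerNames(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of printing them;
    // scraped HTML is often not perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $names = [];

    // Select anchors that are direct children of <td class="player">.
    foreach ($xpath->query('//td[@class="player"]/a') as $anchor) {
        $names[] = trim($anchor->textContent);
    }

    return $names;
}

$names = extractPlayerNames($html);
echo implode("\n", $names), "\n";
```

Since the loop only has to run once over roughly 1,300 rows, the extra memory DOM uses compared with XMLReader is unlikely to matter here.
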
HoLyVieR
  • To use these XML readers, do I need to convert this data into an XML object and then parse it, or can I parse it directly from a PHP string? – Bob Cavezza Sep 22 '10 at 03:08
  • I copied and pasted the HTML from a website into a string. Is the string considered XML, or do I have to build a DOMDocument? In your example, you use DOMDocument with an HTML file. I will be using an HTML string, so should I put the quoted text where the file_get_contents call is in that script? – Bob Cavezza Sep 22 '10 at 03:13
  • @Bob See my edit for how you can load up your data as a string into both XMLReader and DOMDocument. – HoLyVieR Sep 22 '10 at 03:20