I have a piece of software that exports race results as a htm file.
I currently open this file in excel and reformat the results tables into a format suitable to be uploaded into a mysql databse, where lots of number crunching takes place using php.
I want to automate the re-format/parse/scrape, and searches on the web keep suggesting python 2.7 with beuatifulsoup4 which I started to play around with, but I have not used either before...
This is a look at the original table structure
<h1>CLASS 1</h1>
<table class="results">
<tr><th nowrap="1">Name</th><th nowrap="1">Town</th><th nowrap="1">Bike</th><th nowrap="1">Penalty</th></tr>
<tr><td class="rider">RIDER 1</td><td></td><td></td><td>01:14:20</td></tr>
<tr><td colspan="7"><table class="laps"><tr><td>00:26:36</td><td>00:19:51</td><td>00:27:54</td></tr></table></td></tr>
<tr><td class="rider">RIDER 2</td><td></td><td></td><td>00:41:06</td></tr>
<tr><td colspan="7"><table class="laps"><tr><td>00:19:10</td><td>00:21:57</td></tr></table></td></tr>
<tr><td class="rider">RIDER 3</td><td></td><td></td><td>00:36:59</td></tr>
<tr><td colspan="7"><table class="laps"><tr><td>00:37:00</td></tr></table></td></tr>
<tr><td class="rider">RIDER 4</td><td></td><td></td><td>01:26:41</td></tr>
<tr><td colspan="7"><table class="laps"><tr><td>01:26:42</td></tr></table></td></tr>
</table>
<h1>CLASS 2</h1>
I would like the export to be one table for each class with all the info for eash rider on a single row, like this...
NAME1 02:26:4 200:12:42 00:13:04 00:13:25 00:13:19 00:13:22 00:13:29 00:13:44
NAME2: 02:41:06 00:13:17 00:14:10 00:13:40 00:13:38 00:13:47 00:13:12 00:13:24
Playing around in python I got so far as to reading file using beautifulsoup.
from bs4 import BeautifulSoup
with open(r'test.htm', "r") as f:
pagebuffer = f.read()
soup = BeautifulSoup(pagebuffer, "lxml")
After inspecting the html I was able to search the soup for the relevant class names.
riders = soup.find_all(class_="rider")
for item in riders:
print item.text
NAME 1
NAME 2
NAME 3
NAME 4
NAME 5
NAME 6
NAME 7
NAME 8
NAME 9
NAME 10
NAME 11
NAME 12
NAME 13
NAME 14
NAME 15
NAME 16
NAME 17
NAME 18
NAME 19
laps = soup.find_all(class_="laps")
for item in laps:
print item.text
00:12:4200:13:0400:13:2500:13:1900:13:2200:13:2900:13:4400:13:3000:13:2000:13:3800:13:10
00:12:2600:13:1700:14:1000:13:4000:13:3800:13:4700:13:1200:13:2400:13:2500:13:4700:13:43
00:12:3100:13:1300:13:2200:13:5200:13:5500:14:0800:13:2500:13:4500:13:5300:13:4400:13:25
00:14:2300:14:2600:15:0100:14:5300:14:5800:14:3100:14:4400:15:3300:14:1900:14:14
00:13:5700:13:4800:14:1900:14:3200:14:5100:15:0300:14:3600:17:5700:14:4200:14:39
00:14:1100:14:3200:14:4300:14:2300:14:5900:14:4600:15:1000:15:0500:15:1400:16:13
00:13:4100:13:3200:14:0000:14:0100:14:3200:14:1000:14:3600:14:2100:28:5500:14:17
00:13:3000:13:3900:14:00
02:36:4900:13:2800:13:3700:13:5600:13:5700:14:4600:14:1700:14:2700:15:1800:14:3800:14:1100:14:15
02:36:5800:13:5900:13:4900:14:1900:14:1100:14:2300:14:2700:14:2700:14:2400:14:2600:14:1300:14:21
02:27:0100:14:2300:14:2600:15:0100:14:5300:14:5800:14:3100:14:4400:15:3300:14:1900:14:14
02:28:2300:13:5700:13:4800:14:1900:14:3200:14:5100:15:0300:14:3600:17:5700:14:4200:14:39
02:29:1500:14:1100:14:3200:14:4300:14:2300:14:5900:14:4600:15:1000:15:0500:15:1400:16:13
02:36:0400:13:4100:13:3200:14:0000:14:0100:14:3200:14:1000:14:3600:14:2100:28:5500:14:17
00:41:0800:13:3000:13:3900:14:00
This is where I'm stuck...
1. How do I combine those two searches (riders, laps)together?
2. The total time is not defined by a class name, how can I search to the 3rd [td] tag after the class rider [td] tag?
3. Will I be able to make this executable on machines that dont have python and bs4 installed or should i look at other coding methods?
here is a link to a typical htm file: http://www.kr3w.co.uk/downloads/test.htm