I really couldn't think of a decent title to give a overview of what I'm trying to do, but the examples I have should explain it nicely, my company provides a schedule online, but they don't have any APIs or anything to extract it, so I'm using the Python framework Scrapy to scrape the data, and then adding it to my Google Calendar
A girl gave me a Regex line to handle the data because it was kicking my butt for days and she was feeling nice, but I've since realized that it doesn't handle split shifts (most likely because I was not scheduled for any so she didn't see the possibility of one)
My regex is
re.findall("""dow1'>(\w+)<\S+?>(\w+ \d+)</td>\s*<td class.*?tlHours'>(\d+).*?span>\s*(\d+)<span.*?ment'>(.*?)</spa.*?Meal: (.*?)</sp.*?start'>(\S+?)</spa.*?end'>(\S+?)<""", response.body)
Example data:
This is a normal 8 hour day with a meal break, which is handled fine:
<tr>
<td class='dt'>
<span class='dow1'>Sunday</span>Dec 09
</td>
<td class='ScheduledDetails'valign='top'>
<div style="position:relative;">
<span class='tlHours'>8<span class='spart'> hrs</span> 0<span class='spart'> mins</span></span><span class='department'>Cashier</span><span class='meal'>Meal: 2pm - 3pm</span>
</div>
</td>
<td>
</td>
<td class='Schedunderlay'>
<div class='Sched'>
<div class='schedbar' style='left: 143px; width: 234px;'>
<div class='schedbar_l'></div>
<div class='schedbar_m' style='width: 226px;'>
<span class='start'>10am</span><span class='end'>7pm</span>
</div>
<div class='schedbar_r'></div>
</div>
<div class='availbar' style='left: 9px; width: 498px; display: none;'>
<div class='schedbar_l'></div>
<div class='schedbar_m' style='width: 490px;'>
<span class='start'><img src='/Images/Schedule/arrowLeft.gif' alt='' style='margin-left:5px; margin-top:2px;' /></span>
<div class='OTtext' align='center'>All Day</div>
<span class='end'></span>
</div>
<div class='schedbar_r'></div>
</div>
<div class='availbar' style='left: 508px; width: 216px; display: none;'>
<div class='schedbar_l_on'></div>
<div class='schedbar_m_on' style='width: 208px;'><span class='start'></span>
<div class='OTtext' align='center'>All Day</div>
<span class='end'><img src='/Images/Schedule/arrowRight.gif' alt='' style='margin-left:5px; margin-top:2px;' /></span>
</div>
<div class='schedbar_r_on'></div>
</div>
</div>
</td>
<td> </td>
<td class='rightColDetails'>
<div class='AvailDetails' align='left' style='display: table-cell;'>
<span class='iefix'><b>Avail - All Day</b></span><br/>
<span style='font-size: 11px;'>Pref - All Day</span>
</div>
</td>
</tr>
And this is a split shift, two four hour shifts separated by a empty 1 hour slot (they do this to cheat the scoring system, two covered shifts instead of one):
<tr>
<td class='dt'>
<span class='dow1'>Thursday</span>Dec 13
</td>
<td class='ScheduledDetails' valign='top'>
<div style="position:relative;">
<span class='tlHours'>8<span class='spart'> hrs</span> 0<span class='spart'> mins</span></span><span class='department'>Cashier</span><span class='meal'>Meal: None</span>
</div>
</td>
<td> </td>
<td class='Schedunderlay'>
<div class='Sched'>
<div class='schedbar' style='left: 247px; width: 104px;'>
<div class='schedbar_l'></div>
<div class='schedbar_m' style='width: 96px;'>
<span class='start'>2pm</span><span class='end'>6pm</span>
</div><div class='schedbar_r'></div>
</div>
<div class='schedbar' style='left: 377px; width: 104px;'>
<div class='schedbar_l'></div>
<div class='schedbar_m' style='width: 96px;'>
<span class='start'>7pm</span> <span class='end'>11pm</span>
</div>
<div class='schedbar_r'></div>
</div>
<div class='availbar' style='left: 9px; width: 498px; display: none;'>
<div class='schedbar_l'></div><div class='schedbar_m' style='width: 490px;'>
<span class='start'><img src='/Images/Schedule/arrowLeft.gif' alt='' style='margin-left:5px; margin-top:2px;' /></span>
<div class='OTtext' align='center'>All Day</div>
<span class='end'></span>
</div>
<div class='schedbar_r'></div>
</div>
<div class='availbar' style='left: 508px; width: 216px; display: none;'>
<div class='schedbar_l_on'></div>
<div class='schedbar_m_on' style='width: 208px;'>
<span class='start'></span>
<div class='OTtext' align='center'>All Day</div>
<span class='end'><img src='/Images/Schedule/arrowRight.gif' alt='' style='margin-left:5px; margin-top:2px;' /></span>
</div>
<div class='schedbar_r_on'></div>
</div>
</div>
</td>
<td> </td>
<td class='rightColDetails'>
<div class='AvailDetails' align='left' style='display: table-cell;'>
<span class='iefix'><b>Avail - All Day</b></span><br/><span style='font-size: 11px;'>Pref - All Day</span>
</div>
</td>
</tr>
The important difference is on the regular shift there's one start and one end time, with the split shift there's a start, and end, and start, and end....
I've been pounding my head against this for about five hours now... and making no headway, I suppose I'd have more luck if I understood Regex.. any help at all would be greatly appreciated...