I was using Python and regular expressions to find things in an HTML document and, contrary to what most people say, it was working perfectly, even though things could go wrong. Anyway, I decided Beautiful Soup would be faster and easier, but I don't really know how to make it do what I did with regex, which was fairly easy but messy.
I am using this page's HTML:
http://www.locationary.com/places/duplicates.jsp?inPID=1000000001
EDIT:
Here is the HTML for the main place:
<tr>
<td class="Large Bold" nowrap="nowrap">Riverside Tower Hotel </td>
<td class="Large Bold" width="100%">80 Riverside Drive, New York, New York, United States</td>
<td class="Large Bold" nowrap="nowrap" width="55"> <input name="selectCheckBox" type="checkbox" checked="checked" disabled="disabled" />Yes
</td>
</tr>
Example of the first similar place:
<td class="" nowrap="nowrap"><a href="http://www.locationary.com/place/en/US/New_York/New_York/54_Riverside_Dr_Owners_Corp-p1009633680.jsp" target="_blank">54 Riverside Dr Owners Corp</a></td>
<td width="100%"> 54 Riverside Dr, New York, New York, United States</td>
<td nowrap="nowrap" width="55">
When my program gets it and I use Beautiful Soup to make it more readable, the HTML comes out a little different from Firefox's "view source". I don't know why.
These were my regular expressions:
PlaceName = re.findall(r'"nowrap">(.*) </td>', main)
PlaceAddress = re.findall(r'width="100%">(.*)</td>\n<td class="Large Bold"', main)
cNames = re.findall(r'target="_blank">(.*)</a></td>\n<td width="100%"> ', main)
cAddresses = re.findall(r'<td width="100%"> (.*)</td>\n<td nowrap="nowrap" width="55">', main)
cURLs = re.findall(r'<td class="" nowrap="nowrap"><a href="(.*)" target="_blank">', main)
The first two are for the main place and address. The rest are for the information of the rest of the places. After I made these, I decided I only wanted the first 5 results for cNames, cAddresses, and cURLs, because I don't need 91 or whatever it was.
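A minimal sketch of how that limit of five could be applied to the regex results with list slicing. The page source is assumed to be in the string `main`, as above; the seven sample rows, the example.com URLs, and the `LIMIT` name are made up for illustration and only mimic the layout of the "similar place" rows.

```python
import re

# Hypothetical stand-in for the downloaded page source (`main` above):
# seven made-up rows laid out like the "similar place" rows.
row = (
    '<td class="" nowrap="nowrap"><a href="http://example.com/p{0}.jsp" '
    'target="_blank">Place {0}</a></td>\n'
    '<td width="100%"> {0} Example St, New York</td>\n'
    '<td nowrap="nowrap" width="55">\n'
)
main = ''.join(row.format(i) for i in range(7))

LIMIT = 5  # hypothetical name: keep only the first few matches

# Same patterns as above; re.findall returns a list, so slicing caps it.
cNames = re.findall(r'target="_blank">(.*)</a></td>\n<td width="100%"> ', main)[:LIMIT]
cAddresses = re.findall(r'<td width="100%"> (.*)</td>\n<td nowrap="nowrap" width="55">', main)[:LIMIT]
cURLs = re.findall(r'<td class="" nowrap="nowrap"><a href="(.*)" target="_blank">', main)[:LIMIT]

# Walk the three lists in parallel, one similar place per line.
for name, address, url in zip(cNames, cAddresses, cURLs):
    print(name, '|', address, '|', url)
```

Slicing past the end of a list is safe in Python, so the same code works if the page has fewer than five similar places.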
I don't know how to find this kind of information with BS. All I can do with BS is find specific tags and do things with them. This HTML is kind of complicated because all of the info I want is in tables, and the table tags are kind of a mess too...
How do you get that info, and limit it only to the first 5 results or so?
Thanks.