Is there a way, using XPath and R (not PHP), to pick out only one piece (the city) from a longer address string?
Here is the relevant portion of the content of the following webpage:
http://www.kentmcbride.com/offices/
<table id="offices" cellspacing="8" width="700" height="100" border="0">
<tbody>
<tr>
<td valign="top">
<h2>
<img width="122" height="22" src="/_common/sub_philadelphia.png">
</h2>
<p>
1617 JFK Boulevard
<br>
Suite 1200
<br>
Philadelphia, PA 19103
</p>
</td>
<td valign="top">
<td valign="top">
</tr>
After parsing the content and applying an XPath expression, R returns the entire address string (remainder omitted), but I only want the city (and I do not know the city until I look at the returned content).
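library(XML)
# Parse the page into an internal document so XPath queries can be run against it
doc <- htmlTreeParse('http://www.kentmcbride.com/offices/', useInternalNodes = TRUE)
# Return the trimmed text content of every <p> inside the offices table
xpathSApply(doc, "//table[@id = 'offices']//p", xmlValue, trim = TRUE)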
[1] "1617 JFK Boulevard\n Suite 1200\n Philadelphia, PA 19103"
[2] "1040 Kings Highway North\n Suite 600\n Cherry Hill, NJ 08034"
[3] "824 North Market Street\n Suite 805 \n Wilmington, DE 19801"
A previous question, "XPath - How to extract specific part of the text from one text node", assumes I know the city name; I don't.
Is there a way to obtain only the city?
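To make the desired result concrete, here is a minimal sketch of one possible post-processing step in base R, applied to the three strings shown above rather than to the live page. It assumes the city always sits on the last line of each address and is followed by a comma; `extract_city` is a hypothetical helper name, not part of the XML package.

```r
# Sample strings copied from the output above, so no network access is needed
addresses <- c(
  "1617 JFK Boulevard\n Suite 1200\n Philadelphia, PA 19103",
  "1040 Kings Highway North\n Suite 600\n Cherry Hill, NJ 08034",
  "824 North Market Street\n Suite 805 \n Wilmington, DE 19801"
)

# Hypothetical helper: assumes the last line has the form "City, ST ZIP"
extract_city <- function(x) {
  # Split each address into its lines
  lines <- strsplit(x, "\n", fixed = TRUE)
  # Keep only the final line of each address
  last_line <- vapply(lines, function(parts) parts[length(parts)], character(1))
  # Drop everything from the first comma onward, then trim surrounding whitespace
  trimws(sub(",.*$", "", last_line))
}

cities <- extract_city(addresses)
print(cities)
# [1] "Philadelphia" "Cherry Hill"  "Wilmington"
```

This would misfire on an address whose last line has no comma or is not the city line. An XPath-only narrowing, such as selecting `text()[last()]` within each `<p>`, might remove the need for the line split, but I have not tested that against the page.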
` element before it. I guess you're saying you don't want to interpret a one-line address as a city, but the 2nd line of a two-line address, you do? – LarsH Sep 08 '14 at 15:35