I've extracted some addresses from Google Maps and they are in an XML file. In my XML file I have some XElements like

<location>, <place_id>, <adr_address>, etc

The 'adr_address' element has different classes, and each class contains a value such as city, street, or country. How do I get each value from the 'adr_address' XElement?

<adr_address>&lt;span class="street-address"&gt;1805 Geary Boulevard&lt;/span&gt;, &lt;span class="locality"&gt;San Francisco&lt;/span&gt;, &lt;span class="region"&gt;CA&lt;/span&gt; &lt;span class="postal-code"&gt;94115&lt;/span&gt;, &lt;span class="country-name"&gt;United States&lt;/span&gt;</adr_address>

I'm putting the adr_address XElement into an object here, but I'm not sure what to do after that to get the value of each class.

XElement firstOrDefault = xElement.Descendants("adr_address").FirstOrDefault();
John Saunders
chuckd
  • I don't know why, but Google Maps has returned you some HTML inside of the `adr_address` element. It is not XML at all. Among other things, these are "classes" in the HTML sense, not in a programming language sense. If you don't know HTML, then you likely don't understand what "class" means in this case. – John Saunders May 04 '15 at 21:12

4 Answers

It seems strange to me that you get values like the address and zip code in this form. Normally Google Maps should give you these values already parsed.

Anyway, what you can do is unescape the special characters like this:

firstOrDefault.Value.Replace("&lt;", "<").Replace("&gt;", ">");  

and then use this regular expression to extract the values:

// Requires: using System.Text.RegularExpressions;
var str = "&lt;span class=\"street-address\"&gt;1805 Geary Boulevard&lt;/span&gt;, &lt;span class=\"locality\"&gt;San Francisco&lt;/span&gt;, &lt;span class=\"region\"&gt;CA&lt;/span&gt; &lt;span class=\"postal-code\"&gt;94115&lt;/span&gt;, &lt;span class=\"country-name\"&gt;United States&lt;/span&gt;".Replace("&lt;", "<").Replace("&gt;", ">");

// One lazy capture group per span, in the order the spans appear.
Regex regex = new Regex("<span class=\"street-address\">(.*?)</span>, <span class=\"locality\">(.*?)</span>, <span class=\"region\">(.*?)</span> <span class=\"postal-code\">(.*?)</span>, <span class=\"country-name\">(.*?)</span>");
Match match = regex.Match(str);

if (match.Success)
{
    string address = match.Groups[1].Value;
    string locality = match.Groups[2].Value;
    string region = match.Groups[3].Value;
    string zip = match.Groups[4].Value;
    string country = match.Groups[5].Value;
}
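
If the regex feels too brittle, another option is to let LINQ to XML parse the span fragment itself. The following is only a minimal sketch: `AdrAddressParser` and `Parse` are hypothetical names, and it assumes the fragment is well-formed XML once wrapped in a dummy root element (balanced spans, no HTML-only entities such as `&nbsp;`). Note that `XElement.Value` already returns the text with `&lt;` and `&gt;` decoded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AdrAddressParser
{
    // Hypothetical helper: maps each span's class attribute to its text content.
    public static Dictionary<string, string> Parse(XElement adrAddress)
    {
        // Value holds the decoded text, i.e. the raw "<span ...>" fragment.
        string fragment = adrAddress.Value;

        // Assumption: the fragment is well-formed once it has a single root.
        XElement root = XElement.Parse("<root>" + fragment + "</root>");

        // ToDictionary throws if two spans share the same class value.
        return root.Elements("span")
                   .Where(s => s.Attribute("class") != null)
                   .ToDictionary(s => (string)s.Attribute("class"), s => s.Value);
    }

    public static void Main()
    {
        string xml = "<result><adr_address>&lt;span class=\"street-address\"&gt;1805 Geary Boulevard&lt;/span&gt;, &lt;span class=\"locality\"&gt;San Francisco&lt;/span&gt;</adr_address></result>";
        XElement adr = XElement.Parse(xml).Descendants("adr_address").First();

        foreach (var pair in Parse(adr))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Looking values up by class name means the code does not depend on the order of the spans or on the separators between them, which is where the regex is most fragile.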
Fabian
  • Bad, bad, bad. Suggesting Regex to parse HTML without even _noting_ to the OP that Google has returned him some HTML and not XML. – John Saunders May 04 '15 at 21:12
  • I did notice that it is HTML. Did you try to write the code to parse the HTML posted here? It is a series of spans separated by commas, which is not clean in my opinion. A simple regex seems much easier to me for interpreting this. I agree it might be less stable in the long run, but it all depends on where the data comes from and how it changes. I'll let you post your code to compare :) – Fabian May 05 '15 at 06:24
  • The best approach would be to store the data properly in the first place, or to find out why it arrives in that form. I'm pretty sure it doesn't come directly from the Google Maps API – Fabian May 05 '15 at 06:42
  • In general, regex does not work on HTML. And if the OP is calling Google Maps and getting this info back, then it's Google Maps sending it back for one reason or another. Now, yes, maybe the OP actually _sent_ that info to Google Maps to begin with (I don't know their API), but in any case, it's HTML and should not be parsed with regex. – John Saunders May 05 '15 at 07:13
  • Fabian, please see the question "[RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags)" and its accepted answer. – John Saunders May 05 '15 at 07:42
  • OK, thanks. But as they said in the other post, it is a bad idea for large datasets. For small datasets (which seems to be the case here) it can be perfectly reasonable and work in production, but it all depends on where the data comes from and how it evolves. – Fabian May 05 '15 at 11:07
  • The other post didn't say "for large datasets". It said that HTML is not a "[regular language](http://en.wikipedia.org/wiki/Regular_language)", so in general, cannot be parsed or matched by a regular expression. – John Saunders May 05 '15 at 12:05

The accepted answer is wrong: adr_address is not documented, so we cannot rely on it. Use address_components instead; it's an array with all that information already split up and tagged with type identifiers (here is a list of them):

var addrComponents = xElement.Descendants("address_component");
foreach (var component in addrComponents)
{
    // A component can carry several <type> children, so check all of them.
    if (component.Descendants("type").Any(t => t.Value == "country"))
        country = component.Element("long_name").Value;
    else if (....)
        ....
}

As each component may have more than one type, you have to search among all of its types; that is why I'm using Any.

Sorry if this does not compile because I wrote this directly here, but this is the main idea.

Luizgrs

this works (tried and tested) :)

// Requires: using System.IO; using System.Linq; using System.Xml; using System.Xml.Linq;
// Load the XML string from the web response into LINQ to XML.
var elements = XElement.Load(XmlReader.Create(new StringReader(xml)));

// Get all the address_component elements in the XML.
var addrComponents = elements.Descendants("address_component");

// Keep the components that have a <type>country</type> child, take the first
// match, and read its long_name value. Use short_name instead of long_name
// to get the country code.
var country = addrComponents
    .Where(d => d.Descendants("type").Any(t => t.Value == "country"))
    .First().Element("long_name").Value;

all done :)

jurasans

If you don't mind using jQuery, this works great for me:

var street_address = $("<p>" + place.adr_address + "</p>").find(".street-address").html()
ogoldberg