I am in need of help from some mad regex wizard. Or maybe someone who is not as noobish as me. It would really be appreciated...

I have an xml string with a list of countries that has some nesting that I cannot seem to deal with.

In every country there is:

  1. A list of names for said country in different languages;
  2. A list of cities which all also have:
    • A list of names for said city in different languages;
  3. A list of states which all also have:
    1. A list of names for said state in different languages;
    2. A list of cities which all also have:
      • A list of names for said city in different languages.

I'm trying to parse that XML and turn the data into objects that I can use in my code. I've more or less figured out how to do that, but I cannot find a way to separate the country names from the names of the states and cities. Does anybody have an idea how to match only the names that belong directly to the country?

What the XML looks like:

     <xsd:country xsd1:id="..." xsd1:name="..." xsd:isoCode="..." xmlns:xsd1="...">
        <xsd:state xsd1:id="..." xsd1:name="..." xsd:isoCode="...">
           <xsd:city xsd1:id="..." xsd1:name="..." xsd:isoCode="...">
              <xsd:name xsd1:lang="..." xsd1:name="..."/>
              <xsd:name xsd1:lang="..." xsd1:name="..."/>
              ...
           </xsd:city>
           <xsd:city xsd1:id="..." xsd1:name="..." xsd:isoCode="...">
              ...
           </xsd:city>
           <xsd:name xsd1:lang="..." xsd1:name="..."/>
           <xsd:name xsd1:lang="..." xsd1:name="..."/>
           ...
        </xsd:state>
        <xsd:state xsd1:id="..." xsd1:name="..." xsd:isoCode="...">
           ...
        </xsd:state>

        <xsd:city xsd1:id="..." xsd1:name="..." xsd:isoCode="...">
           <xsd:name xsd1:lang="..." xsd1:name="..."/>
           <xsd:name xsd1:lang="..." xsd1:name="..."/>
           ...
        </xsd:city>
        <xsd:city xsd1:id="..." xsd1:name="..." xsd:isoCode="...">
           ...
        </xsd:city>

        <xsd:name xsd1:lang="..." xsd1:name="..."/>
        <xsd:name xsd1:lang="..." xsd1:name="..."/>
        ...
     </xsd:country>

The simplest regex I've come up with is the following:

(?<countryNames><xsd:name.*?xsd1:lang="(?<language>.*?)?".*?xsd1:name="(?<name>.*?)?".*?\/>)

...that however catches all names.

PS. I'm using C# to process the xml string although I don't think that matters much.

See demo with xml HERE

  • If you reconsider your approach there are also a lot of questions on parsing XML with proper XDocument/XmlDocument APIs... Otherwise Standard "parse XHTML with RegEx" covers all. – Alexei Levenkov May 12 '20 at 22:05
  • The country names are enclosed in the country open/close tags. Delete the city and state tags and their contents, and you are left with the country and its names. –  May 12 '20 at 22:23
  • Country name is in the name attribute of country open tag. –  May 12 '20 at 22:32
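
To illustrate the XDocument route suggested in the first comment, here is a minimal sketch of one way it might look. It relies on `Elements(...)` returning only direct children, so names nested inside `xsd:state` or `xsd:city` are never visited. The namespace URIs (`urn:example:xsd`, `urn:example:xsd1`), the sample country, state and city names, and the `CountryNameSketch` / `ExtractCountryNames` identifiers are all made up for the example, since the real values are elided in the question. It also assumes the `xsd` prefix is declared somewhere in the real document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CountryNameSketch
{
    // Hypothetical namespace URIs; the real ones are elided ("...") in the question.
    static readonly XNamespace Xsd = "urn:example:xsd";
    static readonly XNamespace Xsd1 = "urn:example:xsd1";

    // Made-up sample data that mirrors the nesting shown in the question.
    public const string SampleXml = @"
<xsd:country xmlns:xsd=""urn:example:xsd"" xmlns:xsd1=""urn:example:xsd1"" xsd1:id=""1"" xsd1:name=""Germany"">
  <xsd:state xsd1:id=""2"" xsd1:name=""Bavaria"">
    <xsd:city xsd1:id=""3"" xsd1:name=""Munich"">
      <xsd:name xsd1:lang=""fr"" xsd1:name=""Munich""/>
    </xsd:city>
    <xsd:name xsd1:lang=""de"" xsd1:name=""Bayern""/>
  </xsd:state>
  <xsd:city xsd1:id=""4"" xsd1:name=""Berlin"">
    <xsd:name xsd1:lang=""de"" xsd1:name=""Berlin""/>
  </xsd:city>
  <xsd:name xsd1:lang=""de"" xsd1:name=""Deutschland""/>
  <xsd:name xsd1:lang=""fr"" xsd1:name=""Allemagne""/>
</xsd:country>";

    // Returns (language, name) pairs for the country itself only.
    public static List<(string Lang, string Name)> ExtractCountryNames(string xml)
    {
        XElement country = XElement.Parse(xml);

        // Elements(name) yields direct children only; Descendants(name)
        // would also return the state and city names.
        return country.Elements(Xsd + "name")
            .Select(n => ((string)n.Attribute(Xsd1 + "lang"),
                          (string)n.Attribute(Xsd1 + "name")))
            .ToList();
    }

    public static void Main()
    {
        foreach (var (lang, name) in ExtractCountryNames(SampleXml))
            Console.WriteLine($"{lang}: {name}");
    }
}
```

With the sample data above, this prints `de: Deutschland` and `fr: Allemagne`. The same pattern would presumably extend to states and cities by calling `Elements(Xsd + "state")` or `Elements(Xsd + "city")` on the country and then `Elements(Xsd + "name")` on each result.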
