I'm following up on an earlier question, since the question has changed:

Finding the regex for /<region>/<city>/<category>?

The answer that works is /(?:[^/]+)/?([^/]*)/?([^/]*) and it outputs the city in $1 and the category in $2. I also want it to output $0 (the region), so can you help me modify it a little to achieve this?

It seems I'll finally be able to do what I want with this regex, and it also applies to two earlier questions I asked:

Can I use a python regex for letters, dashes and underscores?

How to represent geographical locations

My plan is to implement the functionality with multitenancy, so that the same software can serve large cities like Sao Paulo and Delhi at the same time with the same code. It therefore has to be very general and handle all locations with the same expression, i.e. //

Then there is the problem of what we mean when we say e.g. "Search New York": the region or the city? One piece of information on this is the output from Google Maps for "New York", where "region" corresponds to "administrative area":

{
  "name": "New York",
  "Status": {
    "code": 200,
    "request": "geocode"
  },
  "Placemark": [ {
    "id": "p1",
    "address": "New York, NY, USA",
    "AddressDetails": {
   "Accuracy" : 4,
   "Country" : {
      "AdministrativeArea" : {
         "AdministrativeAreaName" : "NY",
         "SubAdministrativeArea" : {
            "Locality" : {
               "LocalityName" : "New York"
            },
            "SubAdministrativeAreaName" : "New York"
         }
      },
      "CountryName" : "USA",
      "CountryNameCode" : "US"
   }
},
    "ExtendedData": {
      "LatLonBox": {
        "north": 40.8495342,
        "south": 40.5788964,
        "east": -73.7498543,
        "west": -74.2620919
      }
    },
    "Point": {
      "coordinates": [ -74.0059731, 40.7143528, 0 ]
    }
  }, {
    "id": "p2",
    "address": "Manhattan, New York, NY, USA",
    "AddressDetails": {
   "Accuracy" : 4,
   "Country" : {
      "AdministrativeArea" : {
         "AdministrativeAreaName" : "NY",
         "SubAdministrativeArea" : {
            "Locality" : {
               "DependentLocality" : {
                  "DependentLocalityName" : "Manhattan"
               },
               "LocalityName" : "New York"
            },
            "SubAdministrativeAreaName" : "New York"
         }
      },
      "CountryName" : "USA",
      "CountryNameCode" : "US"
   }
},
    "ExtendedData": {
      "LatLonBox": {
        "north": 40.8200450,
        "south": 40.6980780,
        "east": -73.9033130,
        "west": -74.0351490
      }
    },
    "Point": {
      "coordinates": [ -73.9662495, 40.7834345, 0 ]
    }
  } ]
}
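
To make the region/city distinction concrete, here is a minimal sketch that reads the region, city and district out of a placemark shaped like the ones above. The function name describe_placemark is hypothetical, and the code assumes only the nesting visible in this response; other responses may nest these fields differently.

```python
def describe_placemark(placemark):
    """Return (region, city, district) for one placemark dict.

    Follows only the nesting visible in the response above; a missing
    level yields None instead of raising KeyError.
    """
    country = placemark.get("AddressDetails", {}).get("Country", {})
    area = country.get("AdministrativeArea", {})
    locality = area.get("SubAdministrativeArea", {}).get("Locality", {})
    district = locality.get("DependentLocality", {})
    return (
        area.get("AdministrativeAreaName"),
        locality.get("LocalityName"),
        district.get("DependentLocalityName"),
    )


# Trimmed copies of the two placemarks shown above.
p1 = {
    "address": "New York, NY, USA",
    "AddressDetails": {"Country": {"AdministrativeArea": {
        "AdministrativeAreaName": "NY",
        "SubAdministrativeArea": {
            "Locality": {"LocalityName": "New York"},
            "SubAdministrativeAreaName": "New York",
        },
    }}},
}
p2 = {
    "address": "Manhattan, New York, NY, USA",
    "AddressDetails": {"Country": {"AdministrativeArea": {
        "AdministrativeAreaName": "NY",
        "SubAdministrativeArea": {
            "Locality": {
                "DependentLocality": {"DependentLocalityName": "Manhattan"},
                "LocalityName": "New York",
            },
            "SubAdministrativeAreaName": "New York",
        },
    }}},
}

print(describe_placemark(p1))  # ('NY', 'New York', None)
print(describe_placemark(p2))  # ('NY', 'New York', 'Manhattan')
```

In this response the region is "NY" and the city is "New York", so the two can be told apart by which key the name appears under.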

However, I don't think all the code has to be in the same file. I could have one file per region or some similar structure, since the total number of regions ("states") in the whole world is not much larger than the total number of countries, whereas the total number of cities in the world is very large. Including one file per country in the project seems an easy and good way to organize it.

Many thanks

Update

The regex I found useful is

application = webapp.WSGIApplication([('/([^/]+)/?([^/]*)/?([^/]*)',Handler),],debug=True)

Niklas Rosencrantz
  • 3
    What's your question? You seem to be asking several different ones, without qualifying what it is you actually want to know. – Nick Johnson Sep 25 '11 at 04:57
  • Your question has the scent of being written by someone with lethally high levels of regex exposure. They drive everyone a bit mad, that's why it's called love :-)... or you can push it to the limits, perhaps by looking here: http://stackoverflow.com/questions/827557/how-do-you-validate-a-url-with-a-regular-expression-in-python – Morten Bergfall Sep 25 '11 at 08:02
  • It works now. Shuge Lee answered perfectly. Thanks for keeping up with all my stupid questions.. – Niklas Rosencrantz Sep 25 '11 at 09:24
  • 2
    I'm not sure what purpose all the additional stuff other than the initial question was supposed to serve, then? – Nick Johnson Sep 25 '11 at 09:57
  • The pattern was found so I'm very glad for this. I beg your pardon since I mixed in irrelevant stuff. I find regex among the most difficult problems since a very small change in specification can change the regex a lot. – Niklas Rosencrantz Sep 25 '11 at 10:31
  • @Niklas R So you mean that the great improvement from pattern ``/(?:[^/]+)/?([^/]*)/?([^/]*)`` to this one ``/([^/]+)/?([^/]*)/?([^/]*)`` is the elimination of ``?:`` !? You claim to use regexes without knowing the core basics, that's astounding. - The reflections written by Nick Johnson prove that I'm not the only person to find the way you treat and expose your problems a little special. - I wonder what quality of code is produced in the end by all this rough and messy way to develop code. – eyquem Sep 25 '11 at 20:08
  • Guys, are you saying that it is easy to develop a regex? You're lying. – Niklas Rosencrantz Sep 26 '11 at 09:55
  • And the difficult part may be not the regex but what you mean when you say for instance "New York" - the region or the city? Same with Sao Paulo - the state or the city? How could you know? It's not just making a regex; you have to know the basic `/` – Niklas Rosencrantz Sep 26 '11 at 10:03
  • You guys are free to go to chat to have a protracted discussion, comments are really for clarifying the _question_ at hand. – Tim Post Sep 27 '11 at 15:56
  • @Tim Post Part of my question is what is meant when my boss says "New York" "Stockholm" or "Sao Paulo" - the state or the city? If you knew. – Niklas Rosencrantz Sep 29 '11 at 08:27

1 Answer

3

(?:...) A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

http://docs.python.org/library/re.html

?: means: match it, but don't save it in a group. Remove it and try again.
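
As a minimal sketch of the difference, using Python's re module directly rather than the webapp routing from the question, the two patterns can be compared side by side. The names NON_CAPTURING, CAPTURING and split_path, and the sample path, are made up for illustration.

```python
import re

# Pattern from the question: the first path segment is matched but not saved.
NON_CAPTURING = re.compile(r"/(?:[^/]+)/?([^/]*)/?([^/]*)")
# Same pattern with ?: removed: the first segment becomes group 1.
CAPTURING = re.compile(r"/([^/]+)/?([^/]*)/?([^/]*)")


def split_path(pattern, path):
    """Return the captured groups for path, or None if it does not match."""
    match = pattern.match(path)
    return match.groups() if match else None


path = "/ny/new_york/cars"  # hypothetical /<region>/<city>/<category> URL
print(split_path(NON_CAPTURING, path))  # ('new_york', 'cars')
print(split_path(CAPTURING, path))      # ('ny', 'new_york', 'cars')
print(split_path(CAPTURING, "/ny"))     # ('ny', '', '')
```

Note that in Python group(0) is always the whole match, so with the capturing pattern the region ends up in group 1, the city in group 2 and the category in group 3.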

Morty