
This was posted to me

<*@google.com> wrote:

Hi Niklas,

If you just want to map /region/city/category/, assuming the only valid characters are [a-zA-Z0-9_], you can do the following in main.py:

application = webapp.WSGIApplication([('/([-\w]+)/([-\w]+)/([-\w]+)', Handler)], debug=True)

and in your handler:

class Handler(webapp.RequestHandler):
    def get(self, region, city, category):
        # Use those variables in the method
        pass

Hope this helps!

My webapp is supposed to handle URIs like <region>/<city>/<category>, with optional city and category, e.g.

/rio_de_janeiro/grande_rio_de_janeiro/casas? #<region>/<city>/<category>
/gujarat/ahmedabad/vehicles-for_sale #<region>/<city>/<category>
/rio_de_janeiro/grande_rio_de_janeiro/ #<region>/<city>
/delhi #<region>

And so on. Now I want a request handler that can take optional arguments separated by /. If I use the regex '/(.*)' with two variables in the request handler, the first variable becomes a/b and the second variable becomes b. That is nearly what I want, but I need a and b as two separate variables. The regex I tried for the request handler is

application = webapp.WSGIApplication([('/(.*)',MyPage),

And the function head of my request handler is

class MyPage(RequestHandler):
    def get(self, location='frankfurt', category='electronics'):

This should handle HTTP requests such as /frankfurt, /frankfurt/, /frankfurt/electronics, /madrid/apartments, /newyork, etc., allowing all possible combinations. Can you advise me on a regex that achieves what I want? I want functionality like mod_rewrite, but for GAE.

Thanks

Clarification

It's just a question of "make the directory a variable", so to clarify, here are some examples of how it should behave:

'/frankfurt' - put 'frankfurt' in variable 1
'/frankfurt/' - put 'frankfurt' in variable 1
'/frankfurt/electronics' - put 'frankfurt' in variable 1 and 'electronics' in variable 2
'/frankfurt/electronics/' - same as above
'/eu/frankfurt/electronics' - same as above, i.e. only the last 2 groups count
'/eu/frankfurt/electronics/' - same as above
'toronto/lightnings' - doesn't start with / so it shouldn't match
'toronto/lightnings/' - as above
'lima/cars/old' - as above
'lima/cars/old/' - as above

The typical case I want to handle is /region/city/category. Applied to Brazil, for example, it could be /rio_de_janeiro/grande_rio_de_janeiro/casas? for /region/city/category; for India it could be /delhi/delhi/for_sale or /gujarat/ahmedabad/vehicles-for_sale

Solution

As far as I can tell the solution from the answer works for my purposes:

/(?:[^/]+)/?([^/]*)/?([^/]*)

Niklas Rosencrantz
  • "make the directory a variable" - why is it uncomprehensible? What's weird is that there is no good mo_rewrite for this environment and no easy way to do something easy. – Niklas Rosencrantz Sep 24 '11 at 18:09
  • It was more uncomprehensible before you added some explanations. But even with these, it remains rather confused for me. – eyquem Sep 25 '11 at 20:01
  • "make the directory a variable" means nothing for my brain. 1/ I don't understand what you call the directory 2/ 'variable' is a confusioning word in Python. It may mean 'chunk of memory whose content can change'. If used in this sense in your expression: first, it should in fact be "put the directory into the variable"; secondly, unfortunately there are no variables in this sense, in Python. – eyquem Sep 25 '11 at 20:02
  • 3/ 'variable' may also mean 'identifier', in Python. When this word appears (rarely) in the official docs of Python, it is employed in this sense. However I don't think that you wanted to express that the directory should be made an identifier. – eyquem Sep 25 '11 at 20:03
  • In the first part of your question, I am disconcerted by the fact you affirm that the regex pattern ``/(.*)`` can catch **a/b** in a first variable and **b** in a second variable (I suppose that 'variables' incorrectly designates the groups defined by the regex) while there is only one group defined in this pattern and that ``.*`` is a greedy expression (a non greedy one would be ``.*?`` but there should also be some non-optional part after it). – eyquem Sep 25 '11 at 20:03
  • In the "clarification" part, you first present exemples of strings to be matched: '/frankfurt', '/frankfurt/electronics', '/eu/frankfurt/electronics'. But just after you pretend that typical cases are '/region/city/category' and in the end, you declare to be satisfied with the regex pattern ``/(?:[^/]+)/?([^/]*)/?([^/]*)`` that works only for strings like '/region/city/category' and not for '/frankfurt' (result : ``('', '')`` ) and not for '/frankfurt/electronics' (result : ``('electronics', '')`` ). That's not clarification, it's mess. – eyquem Sep 25 '11 at 20:05
  • All that is sufficient to make your question appearing uncomprehensible, don't you see ? – eyquem Sep 25 '11 at 20:06
  • I beg your pardon and I promise to make myself more clear. – Niklas Rosencrantz Sep 26 '11 at 09:12
  • I didn't feel offended. I don't consider me as a solemn coder that must be begged a pardon with deference. It's not relatively to me that I reacted, but on the intellectual level and about the way of reasoning. I'm kind of annoyed when I read something unclear and I can't stop me to react relatively to the obscurity. Excuse me if I hurted you; but things must be said, sometimes. – eyquem Sep 26 '11 at 09:39
  • It is uncomprehensible what is a region and a city when all someone says is "Sao Paulo" - it's both a state and a city – Niklas Rosencrantz Sep 26 '11 at 09:42
  • Niklas, everybody knows that natural languages are extremely complex and can give ambiguities when the speaker doesn't have a high rigourness of expression. What does it prove concerning the manner to explain a programming problem and the way to develop an algorithm ? All the coders that want to make a program having to do with natural language have the same problems, you are not the first. – eyquem Sep 26 '11 at 10:30
  • @eyquem Do you agree that we speak about development more now than about programming when we mix in communication between us and other people? Programming is programming and development is another job: More about communication. Do you agree? – Niklas Rosencrantz Sep 29 '11 at 06:34

4 Answers


Now that you have given more details, I can propose another regex pattern:

import re

reg = re.compile(r'(?:/[^/]+(?=/[^/]+/[^/]+/?\Z)'  # skips the first of 3
                                                   # segments, if there are 3
                 r'|'                # OR
                 r')'                # nothing is skipped
                 r'/([^/]+)'         # the group that always captures something
                 r'(?:/([^/]+)?)?'   # the optional second captured segment
                 r'/?\Z')            # optional trailing slash, then the end

for ss in ('/frankfurt', '/frankfurt/',
           '/frankfurt/electronics', '/frankfurt/electronics/',
           '/eu/frankfurt/electronics', '/eu/frankfurt/electronics/',
           'toronto/lightnings', 'toronto/lightnings/',
           'lima/cars/old', 'lima/cars/old/',
           '/rio_de_janeiro/grande_rio_de_janeiro/casas/Magdalena'):
    mat = reg.match(ss)
    # groups('') yields '' instead of None for a group that did not participate
    print ss,'\n',mat.groups('') if mat else '- No matching -','\n'

result

/frankfurt 
('frankfurt', '') 
/frankfurt/ 
('frankfurt', '') 
/frankfurt/electronics 
('frankfurt', 'electronics') 
/eu/frankfurt/electronics/ 
('frankfurt', 'electronics') 
toronto/lightnings 
- No matching - 
lima/cars/old/ 
- No matching -
/rio_de_janeiro/grande_rio_de_janeiro/casas/Magdalena 
- No matching -

But, you know, using a regex isn't absolutely necessary to solve your problem:

for ss in ('/frankfurt', '/frankfurt/',
           '/frankfurt/electronics', '/frankfurt/electronics/',
           '/eu/frankfurt/electronics', '/eu/frankfurt/electronics/',
           'toronto/lightnings', 'toronto/lightnings/',
           'lima/cars/old', 'lima/cars/old/',
           '/rio_de_janeiro/grande_rio_de_janeiro/casas/Magdalena'):
    if ss.startswith('/'):  # also safe for an empty string
        splitted = ss.rstrip('/').split('/')
        if len(splitted)==2:
            grps = splitted[::-1]
        elif len(splitted) in (3,4):
            grps = splitted[-2:]
        else:
            grps = None
    else:
        grps = None
    print ss,'\n',grps if grps else '- Incorrect string -','\n'

The results are the same as above, except that the groups are printed as lists rather than tuples.
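The split-based logic can also be packaged as a helper that a handler calls with the request path. This is only a minimal sketch and not part of the original answer: the name `parse_path` is hypothetical, the defaults are borrowed from the handler signature in the question, and treating a bare '/' as "use both defaults" is an assumption.

```python
def parse_path(path, location='frankfurt', category='electronics'):
    """Return (location, category) for a path, or None if it is malformed.

    Hypothetical helper; the defaults come from the question's handler.
    """
    if not path.startswith('/'):
        return None
    # Drop the leading slash and any trailing slashes.
    trimmed = path[1:].rstrip('/')
    if not trimmed:
        return (location, category)  # assumption: bare '/' means both defaults
    parts = trimmed.split('/')
    if len(parts) > 3 or '' in parts:
        return None  # too many segments, or an empty segment such as '/a//b'
    if len(parts) == 3:
        parts = parts[1:]  # only the last two segments count
    if len(parts) == 2:
        return (parts[0], parts[1])
    return (parts[0], category)


print(parse_path('/frankfurt'))
print(parse_path('/eu/frankfurt/electronics/'))
print(parse_path('toronto/lightnings'))
```

Returning None for a malformed path leaves it to the caller to decide how to respond, for example with a 404.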

agf
eyquem

You can try

/(?:[^/]+)/?([^/]*)/?([^/]*)

which will put 'a/b' in variable 1, 'a' in variable 2 and 'b' in variable 3. Not sure if that is what you want.
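To see what this pattern actually captures, it can be run against a few of the paths from the question. This is only a sketch: it assumes the route is matched against the whole path (emulated here with `^` and `$`), and the helper name `captures` is made up for illustration.

```python
import re

# Assumption: the route must match the whole path, so anchor both ends.
route = re.compile(r'^/(?:[^/]+)/?([^/]*)/?([^/]*)$')


def captures(path):
    """Return the captured groups for path, or None if the route does not match."""
    m = route.match(path)
    return m.groups() if m else None


for path in ('/frankfurt', '/frankfurt/electronics', '/eu/frankfurt/electronics'):
    print(path + ' -> ' + repr(captures(path)))
```

With three segments the two groups hold the city and the category. With one or two segments the first segment is consumed by the non-capturing group, so the groups come back as `('', '')` and `('electronics', '')` respectively.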

Narendra Yadala
  • Nearly. What I really want is to put 'frankfurt' in variable 1 and 'electronics' in variable 2 from a query like `/eu/frankfurt/electronics/` while still allowing for optionally shorter length i.e. `/eu` and `eu/frankfurt/` should also be allowed – Niklas Rosencrantz Sep 24 '11 at 16:34
  • From the info you gave it looks like variable 1 always contains the entire match. Only from variable 2 onwards the matching groups are assigned. If that is the case, you can just ignore the first variable. And if you dont want any of the matching subgroups you can use the non-capturing group using ?: e.g. /(?:[^/])*/([^/])* to escape matching eu in /eu/frankfurt – Narendra Yadala Sep 24 '11 at 16:50
  • Basically the case I'm trying to handle is /region/city/category with optional city and category – Niklas Rosencrantz Sep 24 '11 at 18:00
  • This is the simplest re i can think of /(?:[^/]+)/?([^/]*)/?([^/]*) I am not sure which group will go into which variable. I am just testing it out on my chrome console and it outputs city in $1, category in $2. The answer posted by @eyquem is more rigorous and strict. – Narendra Yadala Sep 24 '11 at 18:44
  • The solution seems to work for me: `('/(?:[^/]+)/?([^/]*)/?([^/]*)',GroupHandler)` mapping the regex to a grouphandler. I also need other regexes so I put this last or even can modify it a little if needed. Thanks a lot! – Niklas Rosencrantz Sep 25 '11 at 02:39

A solution that may work for you, though you may find it too hardcoded:

Structure the routes for your app like this:

routes = [
    ('/foo/([a-zA-Z]+)/?', TestHandler),
    ('/foo/([a-zA-Z]+)/([a-zA-Z]+)/?', TestHandler),
    ('/foo/([a-zA-Z]+)/([a-zA-Z]+)/([a-zA-Z]+)/?', TestHandler)
]

And in your handler you'd check len(args), something like:

class TestHandler(webapp.RequestHandler):
    def get(self, *args):
        if len(args):  # assign defaults, perhaps?
            pass
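The "assign defaults" step could look something like the following. This is a sketch that does not depend on webapp: `with_defaults` and the default values are hypothetical, chosen only to illustrate padding a variable-length `args` tuple.

```python
def with_defaults(args, defaults=('any_region', 'any_city', 'any_category')):
    """Pad captured URL segments with defaults for the ones that are missing.

    Hypothetical helper; the default values are placeholders.
    """
    args = tuple(args)
    # Keep what was captured, then fill the remaining slots from defaults.
    return args + tuple(defaults[len(args):])


region, city, category = with_defaults(('gujarat', 'ahmedabad'))
print(region + ' ' + city + ' ' + category)
```

Inside `get(self, *args)` the same call would be `region, city, category = with_defaults(args)`.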
maligree
  • Nice solution. I'll try it but the one Narenda posted also works. `('/(?:[^/]+)/?([^/]*)/?([^/]*)',GroupHandler)` mapping the regex to a grouphandler. – Niklas Rosencrantz Sep 25 '11 at 02:40

If you just want to map (region)/(city)/(category)/, assuming the only valid characters are [a-zA-Z0-9_], you can do the following in main.py:

application = webapp.WSGIApplication([
                    ('/([-\w]+)/([-\w]+)/([-\w]+)', Handler)
],debug=True)

and in your handler:

class Handler(webapp.RequestHandler):
    def get(self, region, city, category):
        # Use those variables in the method
        pass

Hope this helps!
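Note that this pattern requires all three segments. One way to make city and category optional, as the question asks, is to wrap them in optional non-capturing groups. This is a sketch under the same character assumptions; it is exercised here with plain `re` rather than inside webapp, and `OPTIONAL_ROUTE` is an illustrative name.

```python
import re

# Region is required; city and category are optional, as is a trailing slash.
OPTIONAL_ROUTE = r'/([-\w]+)(?:/([-\w]+))?(?:/([-\w]+))?/?'

# Anchored at both ends to emulate matching against the whole path.
pattern = re.compile('^' + OPTIONAL_ROUTE + '$')

for path in ('/delhi', '/rio_de_janeiro/grande_rio_de_janeiro/',
             '/gujarat/ahmedabad/vehicles-for_sale'):
    print(pattern.match(path).groups())
```

Groups that do not participate come back as None, so a handler using such a route would presumably declare `city=None, category=None` and substitute its own defaults.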

  • that's working great thanks a lot! Now I have many working alternatives so how prove which solution is "best"...when there are several solution that work. – Niklas Rosencrantz Sep 29 '11 at 06:36