GET search in multilanguage site

Question

I've added a search form to my web2py application that submits via GET, producing URLs of the following form:

myapp/controller/search?query=myquery

However, for security reasons web2py automatically replaces spaces and non-alphanumeric characters with underscores, which is okay for English-only sites but an impediment for languages that use accent marks. For example, searching for "áéíóú" returns five underscores.

This could be solved by using POST instead of GET for the search form, but then the users wouldn't be able to bookmark the results.

Is there any option to solve this?

Thanks in advance.

score 1 · Accepted Answer · answered May 17 '16 at 02:29

Here's an idea that I've used in the past:

1. Use POST to submit the query.
2. Generate a unique string (e.g. like the video ID in a YouTube URL: https://www.youtube.com/watch?v=jX3DuS2Ak3g).
3. Associate the query with that string and store the two as a key/value pair in the session, app state, or database (depending on how long you want it to live).
4. Redirect the user to a URL containing that string.
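
The steps above could be sketched roughly as follows. This is a framework-agnostic illustration written for Python 3, not web2py code: the in-memory dict `_QUERIES`, the helper names `save_query` and `load_query`, and the URL shape with the token appended are all assumptions made for the example. A real app would keep the mapping in the session or a database table.

```python
import secrets

# Hypothetical in-memory store; a real app would use the session or a DB table.
_QUERIES = {}

def save_query(query):
    """Store the raw query under a random token and return the token."""
    token = secrets.token_hex(8)  # 16 hex characters, i.e. only 0-9 and a-f
    _QUERIES[token] = query
    return token

def load_query(token):
    """Return the query stored for this token, or None if it is unknown."""
    return _QUERIES.get(token)

# Steps 1-3: the POST handler receives the query and stores it.
token = save_query("áéíóú")

# Step 4: redirect to a bookmarkable URL (hypothetical URL shape).
redirect_url = "/myapp/controller/search/" + token

# Later, the GET handler recovers the original, unmodified query.
original_query = load_query(token)
```

Because the token contains only alphanumeric characters, nothing in the URL is left for the framework to replace with underscores.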

If you don't want to use the extra memory or storage, since these mappings can grow a lot in some cases, you can replace steps 2-3 with encrypting the query into a string that you can decrypt afterwards. You can do this in a middleware class so that it's transparent to your app's logic.
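
As a simpler stand-in for the encrypt/decrypt variant, the query can be made reversible without any server-side storage by hex-encoding its UTF-8 bytes, so the URL carries only the characters 0-9 and a-f. Note that this is plain encoding rather than encryption (anyone can decode it), and the function names below are made up for this Python 3 sketch.

```python
def encode_query(query):
    """Turn a Unicode query into a purely alphanumeric hex string."""
    return query.encode("utf-8").hex()

def decode_query(token):
    """Reverse encode_query; raises ValueError if token is not valid hex."""
    return bytes.fromhex(token).decode("utf-8")

encoded = encode_query("áéíóú")
print(encoded)                # c3a1c3a9c3adc3b3c3ba
print(decode_query(encoded))  # áéíóú
```

The trade-off is URL length: each byte of the query becomes two characters, so an accented letter takes four.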

Hi thanks for the answer. I had a similar idea, but I'm expecting something more web2py-friendly. Let me wait for other alternatives and in case this is my best choice I'll go ahead and accept it. — cdonts, May 17 '16 at 02:37

score 1 · Answer 2 · edited May 23 '17 at 12:22

This is a general problem people face when handling URLs. You can use the quote/quote_plus functions in urllib to percent-encode the strings.

For example, using a string like the one you suggested:

>>> import urllib
>>> print urllib.quote('éíóú')
%C3%A9%C3%AD%C3%B3%C3%BA
>>> print urllib.unquote('%C3%A9%C3%AD%C3%B3%C3%BA')
éíóú

You will have to unquote the value when you retrieve it from the request on the backend.
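
The interpreter session above is Python 2. On Python 3 the same functions live in `urllib.parse` and operate on `str` directly, using UTF-8 by default. A minimal sketch of the round trip follows; the sample query passed to `quote_plus` is made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Percent-encode the UTF-8 bytes of the query.
encoded = quote("éíóú")
print(encoded)            # %C3%A9%C3%AD%C3%B3%C3%BA

# Decode it again on the backend.
decoded = unquote(encoded)
print(decoded)            # éíóú

# quote_plus additionally turns spaces into '+', as HTML forms do.
encoded_plus = quote_plus("mi búsqueda")
print(encoded_plus)       # mi+b%C3%BAsqueda
decoded_plus = unquote_plus(encoded_plus)
```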

There are also some other posts which might be helpful - urlencode implementation and unicode ready urls

answered May 22 '16 at 16:15

minocha

I think this will be enough. Thanks! – cdonts May 22 '16 at 17:00
@cdonts no problem :) – minocha May 22 '16 at 17:03
@cdonts I'm not sure how you are performing the search or indexing, but for consistency I would suggest using the same quote/unquote step and decoding the output to UTF-8 when you index the strings for the search too; that way you won't run into trouble while indexing/searching – minocha May 22 '16 at 17:06
Well, unfortunately this doesn't work with web2py, since it unquotes the query (and replaces non-alphanumeric characters with underscores) before I can do any processing. Thanks anyway. – cdonts May 24 '16 at 21:39
