3

I want to know which symbol I can use to match any UTF-8 encoded character in an nginx rewrite rule. I have tried:

rewrite ^/.$ /new-location.html break;

but it seems that "." only matches ASCII characters. When I tried http://example.com/汉 (a Chinese character), it did not work.

This also does not work:

rewrite ^/([\x00-\xff])$ /new-location.html break;
yang

2 Answers

5

From the PCRE documentation:

However, UTF-8 and Unicode support has to be explicitly enabled; it is not the default. The Unicode tables correspond to Unicode release 6.0.0.

So you have to enable UTF-8 mode for this to work:

"(*UTF8)^yourregex$"
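
Applied to the rule from the question, that might look like the sketch below. It assumes nginx is linked against a PCRE build that has UTF-8 support and recognizes the (*UTF8) prefix; older builds, such as the 7.8 mentioned in the comments below, reject it. The target /new-location.html is taken from the question.

```nginx
# With (*UTF8) at the very start of the pattern, "." should match one
# whole character (e.g. 汉) rather than a single byte.
rewrite "(*UTF8)^/.$" /new-location.html break;
```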
FailedDev
  • Thank you for your reply, but I got this error message: [emerg]: `pcre_compile() failed: (*VERB) not recognized in "^(*UTF8)/(.)$" at "8)/(.)$" in xxx `. – yang Oct 31 '11 at 06:58
  • Hm, OK. There are two directives you may want to put into your config file: `charset utf8; source_charset utf8;` – FailedDev Oct 31 '11 at 07:03
  • I have already done that. Maybe my PCRE (version 7.8 2008-09-05) does not support the (*UTF8) argument. I am trying to upgrade it. On Ubuntu, will `apt-get upgrade libpcre3 libpcre3-dev` upgrade my PCRE? – yang Oct 31 '11 at 07:52
  • I am not an Ubuntu expert, so I can't help you here. :) However, if you specify the charset, then the . should match all characters, including UTF-8 ones. – FailedDev Oct 31 '11 at 07:54
  • I did it. I upgraded my PCRE and (*UTF8) works now. If you read Chinese, this article explains how to upgrade PCRE for nginx: http://www.cslog.cn/Content/nginx-pcre-utf8-rewrite/ – yang Nov 04 '11 at 03:48
3

The instruction above tells you to use...

"(*UTF8)^yourregex$"

but your error message reveals you're using something different...

"^(*UTF8)/(.)$"

I'm no expert, but it looks like you were advised to prefix your regex with (*UTF8), and instead you inserted it after the opening ^ of your regex.
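
To make the difference concrete, here is a sketch of the two placements using the pattern from the error message. The rewrite target is borrowed from the question, and the second form is assumed to work only on a PCRE build that supports (*UTF8).

```nginx
# Placement from the error message: the verb follows the ^ anchor,
# and pcre_compile() reported "(*VERB) not recognized".
# rewrite "^(*UTF8)/(.)$" /new-location.html break;

# Placement from the first answer: the verb is the very first thing
# in the pattern, before the ^ anchor.
rewrite "(*UTF8)^/(.)$" /new-location.html break;
```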

Brian Lowe