I’m trying to detect URLs containing Chinese symbols in the query string.
I tested with a single symbol (also with (*UTF8)
) but it didn’t work:
# Doesn't work
if ( $args ~ '址' ) {
return 404;
}
It turned out that it’s because nginx "sees" it as URL-encoded:
# Works
if ( $args ~ '%E5%9D%80' ) {
return 404;
}
The above works but it matches only this specific character. I URL-encoded a bunch of other symbols and the only pattern I saw is that they are all composed of three %-encoded chars with the first one being %E
followed by a digit. But that’s a very lose pattern and it probably matches non-Chinese characters as well.
Is there some clean way to do this without generating a monstruous regex from the URL-encoding of all Chinese characters?