I'm using Django v1.10 and Postgres.

There's a data field which may contain a mixture of symbols (such as \|?), numbers, alphabetic letters, and Asian language characters.

The user says the maximum length of this field should be 15 characters.

How do I enforce this using Django, with Postgres as the database? In Postgres, we use UTF-8 encoding.

One character may be a digit, a Chinese character, or an English alphabetic letter.

I know that PHP has a function called mb_strlen, and that in Python the equivalent would be to use unicode strings.

What's the best way to enforce a maximum string length the Django way?

Kim Stacks

1 Answer

To begin with, you have to define what you mean by characters. You mentioned Asian language characters; Korean is one of the languages that many string length functions miscount.

Multiple Unicode characters (code points) may be used to describe a single grapheme (user-perceived character), such as:

>>> import unicodedata
>>> len(unicodedata.normalize("NFD", u"한"))  # decomposed into three jamo
3

Using unicode strings makes it easy to count the number of Unicode characters, but that is not the same as the number of user-perceived characters. I would recommend reading this article on Python text length.

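If you do wish to count Unicode characters instead of graphemes, then it's simple. Just use a CharField with a max_length argument (on your model and your forms). Postgres measures the resulting varchar(15) column in characters, not bytes, so the UTF-8 encoding does not change the limit.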
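As a rough illustration of what that length check counts, here is a minimal sketch in plain Python 3, where every str is a Unicode string. The helper name within_limit is made up for this example and is not part of Django; it only mirrors the idea of comparing len() of the value against the limit.

```python
# Sketch: len() on a Python 3 str counts Unicode code points, not bytes.
MAX_CHARS = 15  # the limit from the question


def within_limit(value, limit=MAX_CHARS):
    """Hypothetical helper: True if value has at most `limit` code points."""
    return len(value) <= limit


samples = ["abc123", "汉" * 15, "汉" * 16]
for text in samples:
    # Print code point count, UTF-8 byte count, and whether it fits the limit.
    print(len(text), len(text.encode("utf-8")), within_limit(text))
```

Fifteen Chinese characters occupy 45 bytes in UTF-8 but still count as 15 characters, so a digit, an English letter, and a Chinese character each use up one slot of the limit.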

If, however, you wish to limit the field to a maximum of 15 graphemes, you have to let the database field contain more characters than that and write some custom validation for your forms.

A helpful library for such a validator might be grapheme, which can calculate the number of graphemes in a string.
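To give an idea of what such a validator could look like, here is a sketch that uses only the standard library. It is an approximation and an assumption of mine, not how the grapheme library works: it normalises to NFC and skips combining marks, which handles decomposed Hangul and accented letters but not every case in the Unicode segmentation rules (emoji sequences, for instance). The function names are made up for this example, and in a Django form you would raise django.core.exceptions.ValidationError instead of ValueError.

```python
import unicodedata

MAX_GRAPHEMES = 15  # the limit from the question


def approx_grapheme_length(value):
    """Approximate the number of user-perceived characters.

    Composes the text to NFC, then counts every code point that is not a
    combining mark. This is a rough stand-in for real grapheme segmentation.
    """
    composed = unicodedata.normalize("NFC", value)
    return sum(1 for ch in composed if unicodedata.combining(ch) == 0)


def validate_max_graphemes(value, limit=MAX_GRAPHEMES):
    """Hypothetical validator: reject values longer than `limit` graphemes."""
    count = approx_grapheme_length(value)
    if count > limit:
        raise ValueError(
            "Ensure this value has at most %d characters (it has %d)."
            % (limit, count)
        )
    return value


# A decomposed Hangul syllable is three code points but one grapheme.
decomposed = unicodedata.normalize("NFD", "한")
print(len(decomposed), approx_grapheme_length(decomposed))
```

Because one grapheme can take several code points, the database column behind such a field needs a max_length well above 15.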

Alvin Lindstam
  • I have defined character as digit, letter, or Chinese character – Kim Stacks Sep 11 '17 at 14:16
  • A letter is not really a definition either. For example, the sign 한 used above is composed of three Unicode characters (Hangul jamo) in its decomposed form: ㅎ, ㅏ and ㄴ. Do you want to count the length of 한 as 1 or 3? – Alvin Lindstam Sep 11 '17 at 15:37
  • I eliminated Korean and Japanese from consideration – Kim Stacks Sep 11 '17 at 15:54
  • Ok, so it seems you just want to count Unicode characters, not graphemes. Do you have a Django model? Where do you want it enforced? As I wrote, the most common way is a CharField on the model with `max_length=15`, which would create a 15-character database column and validate the length in generated ModelForms. – Alvin Lindstam Sep 12 '17 at 08:09