I am using jemdoc+mathjax (http://www.mit.edu/~wsshin/jemdoc+mathjax.html) to build my website, but I ran into an error when compiling. Compiling with just jemdoc.py home works fine. However, when I compile with the default mysite.conf as follows

jemdoc.py -c mysite.conf home

it fails with the following traceback:

Traceback (most recent call last):
  File "C:\homepage\jemdoc.py", line 1646, in <module>
    main()
  File "C:\homepage\jemdoc.py", line 1642, in main
    procfile(f)
  File "C:\homepage\jemdoc.py", line 1390, in procfile
    out(f.outf, f.conf['bodystart'])
  File "C:\homepage\jemdoc.py", line 380, in out
    f.write(s)
UnicodeEncodeError: 'gbk' codec can't encode character '\u2630' in position 747: illegal multibyte sequence

My system is Windows 10 with the language set to Chinese, but my home.jemdoc contains no Chinese characters. The same error occurs with both Python 2 and Python 3.

Does anyone know how to solve it? Thanks a lot!

bc a
  • `\u2630` character is `☰` (U+2630, *Trigram For Heaven*). I'd check its presence in `mysite.conf`… – JosefZ Sep 06 '22 at 14:14
  • @JosefZ Thanks a lot! I found that there is a `☰` in the sentence ``. After deleting this character, the program ran correctly and the resulting HTML looks fine. I hope removing this character from the original `mysite.conf` has no side effects. – bc a Sep 06 '22 at 14:34

1 Answer

Replace the character `☰` (U+2630, Trigram For Heaven) with a similar glyph that the 'gbk' codec can encode, e.g. `≡` (U+2261, Identical To).

The 'gbk' codec encodes that replacement character as

'\u2261'.encode('gbk')    # b'\xa1\xd4'
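Before editing `mysite.conf`, it can help to check which candidate glyphs the codec accepts. The following is a minimal sketch, assuming Python 3; the helper name `can_encode` is hypothetical and not part of jemdoc. It prints only code points, because printing `☰` itself to a GBK console could raise the same error.

```python
def can_encode(text, codec="gbk"):
    """Return True if every character in text is representable in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Candidate glyphs: the original trigram and the replacements discussed here.
candidates = ["\u2630", "\u2261", "\u2506", "\u2507"]

for ch in candidates:
    # Show the code point rather than the glyph to stay console-safe.
    print("U+%04X encodable in gbk: %s" % (ord(ch), can_encode(ch)))
```
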

Other similar glyphs are \u2506 and \u2507:

  • `┆` (U+2506, Box Drawings Light Triple Dash Vertical)
  • `┇` (U+2507, Box Drawings Heavy Triple Dash Vertical)

In Python:

'┆ ┇'.encode('gbk')       # b'\xa9\xaa \xa9\xab'
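If the original `☰` glyph should stay on the page, another option may be to write it in `mysite.conf` as a numeric HTML character reference, which is pure ASCII and therefore encodable in any codec. This is only a sketch: it assumes jemdoc copies the conf text into the HTML output unchanged, which I have not verified, and both the helper `to_html_refs` and the sample fragment are hypothetical.

```python
def to_html_refs(text):
    """Replace every non-ASCII character with a numeric HTML reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical fragment; the real line in mysite.conf will differ.
snippet = u"<span>\u2630 Menu</span>"

# The result is plain ASCII, so writing it through a GBK file handle is safe.
print(to_html_refs(snippet))   # <span>&#9776; Menu</span>
```

A browser renders `&#9776;` as `☰`, so the page would look the same as with the literal character.
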
JosefZ