1

I need an alphabet of characters for a graphical project, so the characters must be "printable", specifically in a canvas element (I need to call ctx.fillText(char, ...)).
Currently I'm just doing:

const alphabeth = [
    'A','B','C','D','E','F','G','H','I','L','M','N','O','P','Q','R','S','T','U','V','Z',
    'a','b','c','d','e','f','g','h','i','l','m','n','o','p','q','r','s','t','u','v','z',
    '1','2','3','4','5','6','7','8','9',
    '-','+','@','?','^','!','(',')','&','#','%','$','|','<','>'
];

But this is limited to the characters I insert manually. Is there a way to get all printable characters (or does a list of them exist)?

adiga
Alberto Sinigaglia
  • 1
    Do you have any rigorous definition of "printable"? The same for "all printable chars" (like maaaaany unicode characters should be printable)? – Jan Stránský Aug 31 '20 at 13:23
  • You forgot `0`... – Mr. Polywhirl Aug 31 '20 at 13:25
  • You mean the range from ASCII 32-127? (If you want to count DELETE as printable) http://facweb.cs.depaul.edu/sjost/it212/documents/ascii-pr.htm – General Grievance Aug 31 '20 at 13:27
  • 3
    Oh boy, you'll find out that text rendering is a ton more complicated than you think it is. There are some "non-printable" characters that are absolutely necessary for some text and then there are characters that make a lot of sense after specific others but none to very little on their own. The short of it: your notion of "printable characters" falls flat pretty quickly. Could you tell us what situation you want to **avoid** so that we can suggest alternate approaches? – Joachim Sauer Aug 31 '20 at 13:32
  • @JoachimSauer what do you mean by _avoid_? I mean, if I print a `\n` or a `\t` on a canvas, nothing will be drawn, since they have no graphical representation, so there you have a possible definition of "printable" – Alberto Sinigaglia Aug 31 '20 at 16:05
  • @JanStránský a char that, when printed using `fillText`, produces non-blank output (unlike `\n` or `\t`) – Alberto Sinigaglia Aug 31 '20 at 16:06
  • 1
    @Berto99 so to test one character, just print it with `fillText` and check if the result on the canvas is blank or not :-) – Jan Stránský Aug 31 '20 at 16:12
  • 1
    @Berto99 concerning "all chars", what is "all"? All Unicode? All ASCII? Some "all" subset of one or the other? Are you aware of the [String.fromCharCode](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCharCode) function? – Jan Stránský Aug 31 '20 at 16:14
  • @JanStránský _all chars that produce an output when using `fillText`_. I mean, it's pretty clear what I mean... if canvas supports only ASCII, then only ASCII; if it supports UTF, then use UTF, and so on... – Alberto Sinigaglia Aug 31 '20 at 16:16

2 Answers

2

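A DOMString is a sequence of UTF-16 code units. JavaScript strings expose those 16-bit units individually, which is why they are often loosely described as UCS-2, although characters outside the BMP can still be represented as surrogate pairs.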

See this answer for converting between UCS-2 and UTF-16 codepoints. The suggested library is Punycode.

You can use a regular expression to remove unprintable characters, as seen here. Just build your ranges. Instead of writing a regular expression, you could map out all your ranges and filter by code point value before encoding each character. You can use a character literal or a Unicode escape when forming a range.

const CharClassRanges = [
  '0-9',  // Numeric
  'a-z',  // Latin
  'α-ω',  // Greek
  '一-龯', // Japanese -- https://gist.github.com/terrancesnyder/1345094
  '\uFB1D-\uFB4F', // Hebrew (a few in range are unprintable)
  '!"#$%&\'()*+,.\/:;<=>?@\\[\\] ^_`{|}~-' // Special characters
];
const PrintableUnicode = new RegExp(`^[${CharClassRanges.join('')}]*$`, 'i');

console.log(PrintableUnicode)

/**
 * Generate every BMP character (U+0000 to U+FFFE) that matches PrintableUnicode.
 * @see http://www.fileformat.info/info/charset/UTF-16/list.htm
 */
function* generatePrintableUTF16() {
  for (let i = 0x0000; i < 0xFFFF; i++) {
    const value = punycode.ucs2.encode([i]);
    if (PrintableUnicode.test(value)) {
      yield value;
    }
  }
}

console.log([...generatePrintableUTF16()].join('\n')); // Scroll to see all
.as-console-wrapper { top: 0; max-height: 100% !important; }
<script src="https://cdnjs.cloudflare.com/ajax/libs/punycode/1.4.1/punycode.min.js"></script>
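The "map out your ranges and filter by value" alternative mentioned above can be sketched without a regular expression or Punycode. This is only a minimal sketch: `codePointRanges` and `charsFromRanges` are hypothetical names, the ranges are a small illustrative selection rather than an exhaustive list, and whether each character actually draws on a canvas still depends on the fonts available.

```javascript
// Hypothetical range table: inclusive [first, last] code points.
// Assumption: these four ranges are just examples, extend as needed.
const codePointRanges = [
  [0x30, 0x39],   // digits 0-9
  [0x41, 0x5A],   // Latin capital letters A-Z
  [0x61, 0x7A],   // Latin small letters a-z
  [0x3B1, 0x3C9], // Greek small letters α-ω
];

// Expand every range into an array of one-character strings.
function charsFromRanges(ranges) {
  const chars = [];
  for (const [first, last] of ranges) {
    for (let cp = first; cp <= last; cp++) {
      chars.push(String.fromCodePoint(cp));
    }
  }
  return chars;
}

const alphabet = charsFromRanges(codePointRanges);
console.log(alphabet.length, alphabet.join(''));
```

Keeping the ranges as numbers makes it easy to add or drop a block of characters without worrying about regex escaping.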

The example below generates all 65,535 UTF-16 code units from U+0000 to U+FFFE, printable or not. Let's just say that you shouldn't really need to validate the text before the call.

/**
 * Generate every UTF-16 code unit from U+0000 to U+FFFE.
 * @see http://www.fileformat.info/info/charset/UTF-16/list.htm
 */
const generateUTF16 = () => {
  const result = [];
  for (let i = 0x0000; i < 0xFFFF; i++) {
    result.push(punycode.ucs2.encode([i]));
  }
  return result;
};

console.log(generateUTF16().join('\n')); // Scroll to see all 
.as-console-wrapper { top: 0; max-height: 100% !important; }
<script src="https://cdnjs.cloudflare.com/ajax/libs/punycode/1.4.1/punycode.min.js"></script>
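If hand-built ranges feel too limited, one different technique (not part of the code above) is to filter the same BMP sweep with Unicode property escapes, which JavaScript regular expressions support with the `u` flag. The sketch below assumes that "printable" roughly means a letter, number, punctuation mark, or symbol; `visibleCategory` and `visibleBmpChars` are hypothetical names. A character in one of these categories can still render blank or as a placeholder box if the font has no glyph for it.

```javascript
// Assumption: "printable" is approximated by the Unicode general categories
// Letter (L), Number (N), Punctuation (P), and Symbol (S).
const visibleCategory = /^[\p{L}\p{N}\p{P}\p{S}]$/u;

// Yield every BMP character whose general category matches the pattern.
function* visibleBmpChars() {
  for (let cp = 0x0000; cp <= 0xFFFF; cp++) {
    // Surrogate code units are not characters on their own, so skip them.
    if (cp >= 0xD800 && cp <= 0xDFFF) continue;
    const ch = String.fromCharCode(cp);
    if (visibleCategory.test(ch)) yield ch;
  }
}

const visible = [...visibleBmpChars()];
console.log(visible.length);
```

Control characters such as `\n` and `\t`, the space character, and unassigned code points fall outside these categories, so they are filtered out.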

The main Latin (printable ASCII) characters range from U+0021 to U+007E.

/**
 * Generate the printable ASCII characters from U+0021 to U+007E.
 * @see http://www.fileformat.info/info/charset/UTF-16/list.htm
 */
const generateLatinUTF16 = () => {
  const result = [];
  for (let i = 0x21; i < 0x7F; i++) {
    result.push(punycode.ucs2.encode([i]));
  }
  return result;
};

console.log(generateLatinUTF16().join('\n')); // Scroll to see all
.as-console-wrapper { top: 0; max-height: 100% !important; }
<script src="https://cdnjs.cloudflare.com/ajax/libs/punycode/1.4.1/punycode.min.js"></script>
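For this Latin range, Punycode is not strictly needed, since every value fits in a single UTF-16 code unit. A dependency-free sketch using only the built-in `String.fromCharCode` (the name `printableAscii` is just illustrative):

```javascript
// Build the characters U+0021 ('!') through U+007E ('~') inclusive.
const FIRST = 0x21;
const LAST = 0x7E;

const printableAscii = Array.from(
  { length: LAST - FIRST + 1 },
  (_, i) => String.fromCharCode(FIRST + i)
);

console.log(printableAscii.join(''));
```

The resulting array can be used directly as the alphabet passed to `ctx.fillText`, one character at a time.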
Mr. Polywhirl
  • I think simplifying that whole article to"JavaScript uses UCS-2" just because it exposes the individual surrogates is problematic. Since "using UCS-2" would imply that only text within the BMP can be represented which is obviously wrong. – Joachim Sauer Aug 31 '20 at 15:38
  • some of those produce no output, at least on my MacBook... I probably haven't been as clear as I should have been with the definition of printable... I mean characters that produce a "visible" output (e.g. `\n` is not included) – Alberto Sinigaglia Aug 31 '20 at 16:10
  • @Berto99 So do Greek, Chinese, Japanese, etc characters count? If so, you will need an exhaustive list. Your question is very open-ended. – Mr. Polywhirl Aug 31 '20 at 16:15
  • @Mr.Polywhirl what i'm wishing is to have a "very exhaustive" list of printable chars – Alberto Sinigaglia Aug 31 '20 at 17:36
-2

Assuming you want all the characters from the ASCII table, you can do this:

Edit: I added a ban list of characters that aren't printable.

const caracters = [];
const ban = [0, 9, 12, 13, 32]; // ban list of characters that are not printable
for (let i = 0; i < 127; i++) {
    // Keep every code that is not on the ban list.
    if (!ban.includes(i)) {
        caracters.push(String.fromCharCode(i));
    }
}

console.log(caracters);
eltha
  • Not necessary. Only characters from 0-31 are not printable. (MS Excel uses this definition.) BTW, 32 is space, so it's printable. – General Grievance Aug 31 '20 at 13:44
  • He needs characters that are printable in JavaScript, so Excel's definition doesn't apply. Yes, you're right that 32 is printable, but I assume he doesn't want the space, as you can see in his array – eltha Aug 31 '20 at 13:49