Can we convert Unicode to ASCII in Javascript? charCodeAt() is only for Unicode?

Question

We have to write a small program for our teacher that gets the ASCII code of any character in JavaScript.

I have searched extensively, but there seems to be no method for this. I have only found:

charCodeAt()

http://www.hacksparrow.com/get-ascii-value-of-character-convert-ascii-to-character-in-javascript.html

That returns the Unicode value, not the ASCII value.

I have read here that, for characters that have an ASCII value, the ASCII value is the same as the Unicode value:

Are Unicode and Ascii characters the same?

But that does not seem to hold for every character, for example the extended ASCII characters:

```javascript
var myCaracter = "├";

var n = myCaracter.charCodeAt(0);

document.write(n);
```

The ASCII value of that character is 195, but the program returns 226 (the Unicode value).

I can't find a pattern for converting one into the other, so:

Can we obtain the ASCII value from the Unicode value, or should I look for another approach?

Thanks!

http://www.differencebetween.net/technology/web-applications/difference-between-ansi-and-ascii/ — Blorgbeard, Oct 12 '16 at 21:47
This question was possibly already answered here [Efficiently replace all accented characters in a string?](http://stackoverflow.com/questions/286921/efficiently-replace-all-accented-characters-in-a-string) — Canilho, Oct 12 '16 at 21:52
"The" ASCII value is not a definite given for ... well, what comes down to *non*-ASCII characters. I'm betting `charCodeAt` was what your teacher was after. — Jongware, Oct 12 '16 at 21:52
ASCII is a 7-bit code, so there's no character 195. *Extended* ASCII is a name for a group of many 8-bit codes. There is no single accepted 8-bit "ASCII" code. — Pointy, Oct 12 '16 at 21:54
@Rad Lexus thanks, but she said that charCodeAt() is not the answer, because it returns the Unicode value, not the ASCII value... — Oct 12 '16 at 21:57
If you are asking for "*extended ASCII characters*", then you need to meticulously describe *which* extension you mean. After all, Unicode is just another extension of ASCII. — Bergi, Oct 12 '16 at 21:58
@Pointy I will try to understand what you are saying. All these basic things about formats and codes are pretty new to me (as is computing in general). — Oct 12 '16 at 21:58
For an ASCII character (in the range 0-127), there's no difference between Unicode code point and ASCII value or even UTF-8 encoding. If your teacher tells you that `├` is an ASCII character, then she's terribly wrong. — jcaron, Oct 12 '16 at 21:59
Okay. The alternative is this: you are supposed to be able to find out how to use a *dictionary* to automatically translate between `charCodeAt`'s Unicode values and a dict that contains a mapping to [codepage 437](https://en.wikipedia.org/wiki/Code_page_437). Is your class that advanced, at this moment? If it is, [this list](http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT) is the official, Unicode.org's own reference. — Jongware, Oct 12 '16 at 22:04
@jcaron I don't know. She just told us to prompt for a string and then show the ASCII value of the character at a given position in that string, but she said charCodeAt() is not valid because it returns the Unicode value, not the ASCII value. She also told us that ANY character is valid, so I don't know what to do when the user enters a character from the extended ASCII set. Maybe I should ask her to explain more clearly what she wants from us. She just said "look on the internet to find the answer", but I have been looking for two days and still can't find it. — Oct 12 '16 at 22:05
@Rad Lexus I don't think so. We have never heard of mappings, dictionaries, code pages and the like. We are only learning the basic JavaScript methods, and she told us to do this exercise. I can do everything except the ASCII result. — Oct 12 '16 at 22:09
Ask your teacher to come here and tell us what is the difference between the ASCII code and the Unicode code point for an ASCII character. `charCodeAt` definitely returns the ASCII code for any ASCII character. No doubt about it. You can actually try it with any ASCII character, you'll find the list here: https://en.wikipedia.org/wiki/ASCII — jcaron, Oct 12 '16 at 22:12
@FranP well `├` is simply not an ASCII character at all; there is no ASCII code for that symbol. It just does not exist in the character set. — Pointy, Oct 12 '16 at 22:12
@Pointy thanks, but then why can I type that character with Alt+195? And why is it in this table? http://www.theasciicode.com.ar/ — Oct 12 '16 at 22:19
@jcaron Then I will ask her what the difference is. Maybe I misunderstood her, but many people in class have the same doubts and problems, so I don't think it is just my misunderstanding. — Oct 12 '16 at 22:20
Alt+ produces characters based on the Code Page 437 character set. Anything beyond 127 is not ASCII, it's Code Page 437. If she wants to get the Code Page 437 code for a given character (which is completely obsolete), she needs to say so explicitly. Not many people use that character set any more, the most common 8-bit character sets nowadays are ISO-8859-*, but Unicode is definitely the way to go. — jcaron, Oct 12 '16 at 22:26
Simply put, some people say "ASCII code" when they mean "character code", despite the confusion it causes. In the context of JavaScript (also HTML, XML, Java, .NET, …), "character code" is a UTF-16 code unit. UTF-16 is one encoding for Unicode. "Extended ASCII" is even more ambiguous. If someone says "ASCII", ask for the relevant specification. — Tom Blodget, Oct 12 '16 at 23:08
@jcaron I (me and MS-DOS and Windows) have used CP437 for 35 years. It's the default encoding for command prompts (for English installations at least). (Go `chcp`) It's ASCII that isn't used except in very specialized contexts. — Tom Blodget, Oct 12 '16 at 23:16
@FranP The text for that web page says that it is CP437. They could have been clearer by not mentioning ASCII at all. — Tom Blodget, Oct 12 '16 at 23:17
lol @Rad Lexus, it's because I am new and didn't know the best way to post the solution; in the end I thought it was better to put it in the question itself so everyone could see it more easily. I will post it as an answer again and take the introductory tour so I do things as well as possible. Thanks :) — Oct 15 '16 at 12:32
No problem - you may be confusing Stack Overflow with a *forum*. You can undelete your answer, and roll back your post edit - that's all. You can even [Accept](http://stackoverflow.com/help/accepted-answer) your own answer if you feel it's indeed the best answer (as a new user, you probably have to wait a bit). — Jongware, Oct 15 '16 at 12:35
Thanks again @Rad Lexus! No, I don't think mine is the best answer! I am too new to programming to believe that! Maybe someone has a better, smarter solution, so to be honest, I can't "accept" my own answer as the best one :) jcaron gave a very helpful answer that helped me a lot, so I think his answer is the best one. — Oct 15 '16 at 14:26

jcaron · Accepted Answer · 2016-10-12T22:09:48.183

score 4

ASCII characters only use 7 bits, with values from 0 to 127 (00 to 7F hex). They include:

- control characters (0 to 31, as well as 127)
- digits (0 to 9, encoded as 48 to 57)
- uppercase letters (65 to 90)
- lowercase letters (97 to 122)
- a limited number of punctuation marks and other symbols.
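For ASCII characters, `charCodeAt()` already returns the ASCII code, so the ranges above can be checked directly on its result. A minimal sketch; the helper name `describeAsciiCode` is hypothetical and only mirrors the list above:

```javascript
// Hypothetical helper: classify a character code according to the ASCII ranges above.
function describeAsciiCode(code) {
  if (code < 0 || code > 127) return "not ASCII";
  if (code <= 31 || code === 127) return "control character";
  if (code >= 48 && code <= 57) return "digit";
  if (code >= 65 && code <= 90) return "uppercase letter";
  if (code >= 97 && code <= 122) return "lowercase letter";
  return "punctuation, space or other symbol";
}

// The first five are ASCII; "├" is included as a non-ASCII contrast.
for (const ch of ["A", "z", "7", "?", "\n", "├"]) {
  const code = ch.charCodeAt(0);
  console.log(JSON.stringify(ch), code, describeAsciiCode(code));
}
```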

ASCII characters are a subset of Unicode (the "C0 Controls and Basic Latin" block), and they are encoded exactly the same way in UTF-8. The ASCII code of "A" (65 or 0x41) is the same as the Unicode code point for "A" (U+0041).
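To check the UTF-8 claim yourself, you can compare `charCodeAt()` with the bytes that `TextEncoder` (built into modern browsers and Node.js) produces. A minimal sketch; the helper name `utf8Bytes` is made up for this example:

```javascript
// Encode a string as UTF-8 and return the bytes as a plain array of numbers.
function utf8Bytes(str) {
  return Array.from(new TextEncoder().encode(str));
}

const ch = "A";
console.log(ch.charCodeAt(0));              // 65: ASCII code and Unicode code point U+0041
console.log(utf8Bytes(ch));                 // a single UTF-8 byte with the same value, 65
console.log(ch.charCodeAt(0).toString(16)); // 41
```

The same holds for every code from 0 to 127: each ASCII character becomes exactly one UTF-8 byte with the same value.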

The character (├) you're considering is not ASCII. It's part of many different character sets / code pages, where it may have different numerical values / encodings, but it's definitely not ASCII.

That character is not even defined in the most common 8-bit extensions of ASCII, the ISO-8859-* family. It is part of code page 437 (used on MS-DOS), where its numerical code is 0xC3 (195). But that's definitely not ASCII.
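If what is really wanted is the code page 437 number (the value Alt+number produces), a lookup table from Unicode code points to CP437 bytes is needed. A minimal sketch under that assumption: the table name `UNICODE_TO_CP437` and the helper `toCp437Code` are hypothetical, and the table is deliberately partial (the complete mapping is the CP437.TXT file published by Unicode.org):

```javascript
// Hypothetical, partial table: Unicode code point -> code page 437 byte.
const UNICODE_TO_CP437 = new Map([
  [0x251c, 0xc3], // ├ box drawings light vertical and right
  [0x2500, 0xc4], // ─ box drawings light horizontal
  [0x2502, 0xb3], // │ box drawings light vertical
  [0x00f1, 0xa4], // ñ
  [0x00f6, 0x94], // ö
]);

// Returns the CP437 code of one character, or null if this partial table lacks it.
function toCp437Code(ch) {
  const codePoint = ch.codePointAt(0);
  if (codePoint < 128) return codePoint; // CP437 uses the same codes as ASCII for 0-127
  return UNICODE_TO_CP437.has(codePoint) ? UNICODE_TO_CP437.get(codePoint) : null;
}

console.log(toCp437Code("├")); // 195
console.log(toCp437Code("A")); // 65
```

The result is a CP437 code, not an ASCII code; another 8-bit character set would give different numbers for the same character.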

The Unicode code point for that character is U+251C (9500 decimal), which is the return value of charCodeAt for this character, not 226.
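This is easy to verify in a browser console or in Node.js. The sketch writes the character as the escape `\u251C`, so that the encoding of the file or page cannot change the result:

```javascript
const boxChar = "\u251C"; // ├

console.log(boxChar.charCodeAt(0));              // 9500
console.log(boxChar.charCodeAt(0).toString(16)); // 251c
// true: U+251C fits in a single UTF-16 code unit
console.log(boxChar.codePointAt(0) === boxChar.charCodeAt(0));
```

`charCodeAt()` returns a UTF-16 code unit; for characters outside the Basic Multilingual Plane (emoji, for example) it returns only half of a surrogate pair, and `codePointAt()` is the method to use instead.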

You're probably getting 226 because a UTF-8 string is being interpreted without being recognised as UTF-8.
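One plausible way to arrive at 226, assuming the page's UTF-8 bytes were read as a single-byte encoding such as ISO-8859-1: in UTF-8, ├ takes three bytes, and the first one is 0xE2 = 226. The sketch below imitates that mis-decoding by turning each byte into its own character with `String.fromCharCode`:

```javascript
const boxChar = "\u251C"; // ├

// Correct UTF-8 encoding: three bytes, 226 148 156 (0xE2 0x94 0x9C).
const bytes = Array.from(new TextEncoder().encode(boxChar));
console.log(bytes);

// Simulate text whose UTF-8 bytes were read one byte per character:
// each byte becomes a separate character with the same numeric value.
const misread = String.fromCharCode(...bytes);
console.log(misread.length);        // 3 instead of 1
console.log(misread.charCodeAt(0)); // 226, the value reported in the question

// Decoding the same bytes as UTF-8 restores the original character.
const decoded = new TextDecoder("utf-8").decode(new Uint8Array(bytes));
console.log(decoded.charCodeAt(0)); // 9500
```

In a web page, the usual remedy is to save the file as UTF-8 and declare it with `<meta charset="utf-8">`, so the browser decodes the bytes correctly.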

edited Oct 12 '16 at 22:09

answered Oct 12 '16 at 21:54

jcaron


Thanks a lot for your answer @jcaron, but if ├ (Alt+195) is not an ASCII character, why is it included in this table with all the "extended ASCII characters"? – Oct 12 '16 at 22:14
The problem is that we know nothing about UTF, Unicode and so on; we have only learned a little about ASCII. So I think your post will help me a lot :) – Oct 12 '16 at 22:16
@FranP "Eventually, ISO released this standard as ISO 8859 describing its own set of eight-bit ASCII extensions." https://en.wikipedia.org/wiki/Extended_ASCII --- it's a lot of interesting read there. – zerkms Oct 12 '16 at 22:17

Those are not ASCII characters. They're [Code page 437](https://en.wikipedia.org/wiki/Code_page_437) characters, which is one of the many supersets of ASCII (it uses the same characters for codes 0-127, with additional characters for 128-255), but it's not ASCII. There are many other supersets of ASCII, including the ISO-8859-* character sets, the windows-1252 character set, the MacRoman character set, and of course Unicode (which goes beyond 0-255 to be able to include many, many more characters). – jcaron Oct 12 '16 at 22:19
Note that many people say "ASCII code" to describe the numerical value associated with a character, but that depends on the character set used. Without specifying the character set, you can't define what numerical value any character could have, and Unicode is just as good as any (or actually, it's much better, as it includes all characters defined in each of the many different character sets, which only included limited subsets each). – jcaron Oct 12 '16 at 22:22
OK, a classmate has confirmed to me that she said "The Unicode value and the ASCII value are not the same, and you must look for the solution". So let's see what she tells me. – Oct 12 '16 at 22:38
Other people in class are just looking for a pattern, like I did, so let's see what she wants, or how she explains it. – Oct 12 '16 at 22:40

score 2 · Answer 2 · answered Oct 14 '16 at 21:53

Today my teacher apologized: it may have been her mistake to tell us that charCodeAt() is the wrong way to obtain the ASCII code. She actually wanted us to use that method, as @Rad Lexus suggested.

So it is not necessary for my exercise, but as practice, and to help anyone who might need it, I added a small validation to the code that prevents the user from entering "extended ASCII" characters, that is, characters whose code is greater than or equal to 128. That is where the problems with charCodeAt() seem to start, because those characters are not ASCII at all.

Maybe it is not a smart solution, and it was certainly not necessary for my exercise; it also forbids characters that other languages need (ö in German or ñ in Spanish, for example). But I think it is good to post the code and let everyone who uses it decide whether or not to use this validation.

Thanks to everyone who helped me.

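Defining the function:

```javascript
// Returns true only for a non-empty string whose characters all have codes 0-127 (ASCII).
function validate(text)
{
    // prompt() returns null when the user presses Cancel.
    if (text == null || text.length === 0)
    {
        return false;
    }

    // Reject the string as soon as one character falls outside the ASCII range.
    for (var i = 0; i < text.length; ++i)
    {
        if (text.charCodeAt(i) >= 128)
        {
            return false;
        }
    }

    return true;
}
```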
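A more compact variant of the same check, assuming only the range 0-127 should be accepted: a regular expression whose character class runs from `\x00` to `\x7F` matches strings made only of ASCII characters. The name `isAsciiText` is hypothetical:

```javascript
// Hypothetical alternative to validate(): a non-empty string containing only U+0000-U+007F.
function isAsciiText(text) {
  return typeof text === "string" && /^[\x00-\x7F]+$/.test(text);
}

console.log(isAsciiText("Hello, world!")); // true
console.log(isAsciiText("España"));        // false: ñ is U+00F1 (241)
console.log(isAsciiText(""));              // false: empty input
console.log(isAsciiText(null));            // false: what prompt() returns on Cancel
```

The loop version stops at the first offending character too; the regex simply expresses the same rule in one line.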

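Using the function:

```javascript
var text;
var isValid = false;

var position = 0;

// Keep asking until the user enters a non-empty, ASCII-only string.
while (!isValid)
{
    text = prompt("Enter your text");

    isValid = validate(text);
}
```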
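The snippet above only validates the input; the exercise itself (show the code of the character at a given position) could then be finished roughly as follows. This is a sketch, not the author's code: it replaces `prompt`/`document.write` with a plain function so it also runs outside a browser, and `asciiCodeAt` is a hypothetical name:

```javascript
// Hypothetical helper: ASCII code of the character at `position`, or null when
// the position is out of range or the character is not ASCII (code >= 128).
function asciiCodeAt(text, position) {
  if (position < 0 || position >= text.length) return null;
  const code = text.charCodeAt(position);
  return code < 128 ? code : null;
}

console.log(asciiCodeAt("Hello", 0)); // 72 ("H")
console.log(asciiCodeAt("Hello", 4)); // 111 ("o")
console.log(asciiCodeAt("Año", 1));   // null: ñ is not ASCII
console.log(asciiCodeAt("Hello", 9)); // null: no character at that position
```

Without the range check, `charCodeAt()` would return `NaN` for an out-of-range position, which is why the sketch tests it first.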
