25

See the following output on my system:

[STEP 101] # python3 -c 'import sys; print(sys.stdout.encoding)'
ANSI_X3.4-1968
[STEP 102] #
[STEP 103] # locale
LANG=C
LANGUAGE=en_US:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
[STEP 104] #

I Googled it but found very little information. Even The Python Library Reference (v3.5.2) does not mention it. Is it defined by an international standard?


(Authoritative reference, copied from a comment on the accepted answer: IANA Character Sets, https://www.iana.org/assignments/character-sets/character-sets.xhtml)

sophros
pynexj

2 Answers

23

This is another name for USAS X3.4-1968, a revision of ASCII that is distinguished by being:

  • the first revision to allow a linefeed (LF) to occur on its own (i.e. not preceded by or followed by a carriage return (CR)).

  • the revision that introduced the common name of (US-)ASCII.

This is basically ASCII as we think of it, although there were two minor revisions that followed it.

donkopotamus
  • 5
    Is `ANSI_X3.4-1968` an official name, or is it used only in Python? – pynexj Feb 12 '18 at 09:43
  • 5
    Yes, it is an official name, and it has nothing to do with Python; see e.g. https://www.iana.org/assignments/character-sets/character-sets.xhtml – donkopotamus Feb 12 '18 at 09:45
  • 1
    I also wonder why Python uses this name but does not mention it in the docs, which mention only `ascii` and `us-ascii`. This causes unnecessary confusion. – pynexj Feb 12 '18 at 09:51
  • 2
    @pynexj I was curious as well, so I dug into the implementation; see my answer below :) – anthony sottile Feb 22 '18 at 06:25
9

If you're curious where it comes from in CPython: the value is obtained through the _locale module, which asks the C library via langinfo (nl_langinfo(CODESET)).

Here's a tiny C program which demonstrates how the _locale module determines this information:

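#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

int main(void) {
    /* Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG). */
    setlocale(LC_ALL, "");
    /* Print the character encoding name of the active locale. */
    printf("%s\n", nl_langinfo(CODESET));
    return 0;
}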

And some sample output:

$ LANG= ./a.out 
ANSI_X3.4-1968
$ LANG=en_US.UTF-8 ./a.out 
UTF-8
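
For comparison, roughly the same query can be made from Python itself. This is a minimal sketch, not CPython's actual start-up code: `current_codeset` is a hypothetical helper name, and `locale.nl_langinfo` is only available on POSIX-like platforms. The result depends on the environment, so no particular output is assumed here.

```python
import locale


def current_codeset() -> str:
    """Return the codeset name reported by nl_langinfo(CODESET)."""
    try:
        # Adopt the locale from the environment, like setlocale(LC_ALL, "") in C.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # keep whatever locale is currently active.
        pass
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(current_codeset())
```

Running it with different `LANG` or `LC_ALL` values should mirror what the C program prints on the same machine.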

Python's codec lookup normalizes the ANSI name to ascii (also known as US-ASCII).
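
The alias can be checked with the standard `codecs` module. A small sketch, where `canonical_codec_name` is a hypothetical helper introduced only for illustration:

```python
import codecs


def canonical_codec_name(label: str) -> str:
    """Resolve an encoding label to the name of the codec Python will use."""
    return codecs.lookup(label).name


if __name__ == "__main__":
    for label in ("ANSI_X3.4-1968", "US-ASCII", "ascii"):
        # All three labels resolve to the same built-in codec.
        print(label, "->", canonical_codec_name(label))

    # The ANSI label can be passed anywhere an encoding name is accepted.
    print("abc".encode("ANSI_X3.4-1968"))
```

So although `sys.stdout.encoding` reports the name exactly as the C library spells it, encoding and decoding go through the ordinary `ascii` codec.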

anthony sottile