How can one define a language which does not fit in the Chomsky Hierarchy?

Question

I'm asking this question because I've stumbled across the accepted answer of Chomsky Language Types

This quote is referring to Type-0 Grammars:

This means that if you have a language that is more expressive than this type (e.g. English), you cannot write an algorithm that can list each an every (and only these) words of the language

As far as I know:

There is no mathematical description for what English is so it is meaningless to argue about where it lands in the hierarchy of formal languages.
If there was, then English would certainly be recognizable by some Type-0 Grammar by virtue of it being defined by a finite amount of reasoning - where it be axioms, a grammar, anything. (If not - how could've someone define it if not by a finite amount of steps?)

Hence:

We can't start talking about how 'expressive' a grammar needs to be to generate precisely an unknown mathematical object

Therefore my problem:

How can one define a language which does not fit in the Chomsky Hierarchy?
If (?) it takes a finite amount of steps for mathematicians to define sets with cardinalities that do not make them recursively enumerable - then grammars must exist which are more expressive than Type-0 since they (mathematicians) have followed a finite amount of rules (production rules if you will) to produce a non-RE set. Where are they?

rici · Accepted Answer · 2020-07-13T01:17:52.900

3

A language is a possibly-infinite set of finite words written with some finite alphabet. Since the alphabet is finite and the length of each word is finite, the words of any language are enumerable, in the sense that there exists an enumeration. In other words, the size of any language is at most countably infinite.

However, since any subset of the Kleene closure of the alphabet is a language, the number of languages is not countably infinite. Hence, there is no enumeration of languages.

The Chomsky hierarchy is based on a formalism which can be expressed as a finite sentence with a finite alphabet (the same alphabet as the language being described, plus a couple of extra symbols). [Note 1] So the number of possible Type 0 grammars is countably infinite, and there cannot be a correspondence between the set of grammars and the set of languages.

However. The existence of languages (i.e. sets) for which no generative grammar exists does not necessarily mean that there is some other way of describing these languages which is "more expressive" than generative grammars. Any description which can be written as a finite string using a finite alphabet can only describe a countable infinity of sets. Whether or not it is the same countable infinity will depend on the formalisms, and in general there will be no algorithm which can demonstrate homomorphism. But some equivalences are known (such as the equivalence with Turing machines, which is a particularly interesting equivalence).

So, we have an interesting little conundrum, which is (of course) related to Gödel's Incompleteness Theorems. That is, there are more languages than ways of describing a language, no matter what system we use to describe a language. So the question "How do we describe a language for which no description is available?" does not have a good answer (and if we answer it, by calling some set "Sue", then there will still be an uncountable infinitude of possible sets for which no name exists).

While all this foraging into infinitudes is interesting, it has a few issues:

It has very little (if anything) to do with programming, so it's questionable whether it's on topic for StackOverflow.
Kurt Gödel and Georg Cantor, the two mathematicians responsible for most of the concepts in this answer, both suffered from severe depression. Just saying.

Notes

Although at first glance it might appear that the alphabet for a Type 0 grammar might be arbitrarily larger than the alphabet of the language being described, that is not actually the case. The grammar's alphabet consists of the target alphabet plus a finite set of non-terminals plus an → symbol; the non-terminals can be written using numbers in any convenient base, say binary. So only three additional symbols are required (and you could reduce that to two by arbitrarily designating one of the non-terminal numbers to be the arrow). (It might seem like you need a third symbol to delimit the names of non-terminals, but you can use a fibonacci encoding to produce codes which always start with a 1 and never include two 1s, so that you can use an extra 1 at the beginning to unambiguously mark the start of the symbol.)

edited Jul 13 '20 at 01:17

answered Jul 12 '20 at 23:39

rici

234,347
28
237
341

Every single answer you've ever given me has been extremely insightful. "However, since any subset of the powerset of the alphabet is a language, the number of languages is not countably infinite." So now we've moved from a language as a set of strings to a language as a set of sets of strings? (subset of a powerset is a set of sets, unless you meant element of a powerset?) Furthermore, if the alphabet itself is finite, then the powerset is finite, and hence any subset of the powerset (or element of the powerset) is finite. Could you elaborate on this quote? Thank you very much. – Novicegrammer Jul 12 '20 at 23:55
1

@Novicegrammer: sigh. My brain is failing... I meant "any subset of the Kleene closure of the alphabet" (edited) but it would probably have been better to have said "any element of the power set of the Kleene closure of the alphabet". Anyway, the cardinality of the set of languages is 2 to the power of Aleph0, which would be easier to write legibly on [cs.se] which has MathJax. – rici Jul 13 '20 at 01:35
I've given your answer some thought. Since "Any description which can be written as a finite string using a finite alphabet can only describe a countable infinity of sets" - and mathematicians use finite strings to talk about other finite strings (I've yet to meet a mathematician that truly deals with infinite statements) - would it be fair to say that all of the concepts in Mathematics discovered so far are expressible by a single (or several) Type-0 Grammars? – Novicegrammer Jul 13 '20 at 11:52
Example: Vector spaces - Plug in some [strings](https://www.math.ucla.edu/~tao/resource/general/121.1.00s/vector_axioms.html) as axiomatic truths, plug in some production rules, start pumping out syntactic derivations (proofs). EDIT: I suppose one caveat in my reasoning is that the "alphabet" of mathematics is ever-expanding - but still finite. I wonder how that works out. – Novicegrammer Jul 13 '20 at 11:53
@Novicegrammer: Sure, as I said, that's the essence of Gödel's Incompleteness Theorems, which can be brutally summarised as: (1) Everything mathematical proof can be written down, and (2) Any countably infinite set of proofs fails to include at least one statement which is incontrovertibly true. (That's the incompleteness part.) To really understand the historical impact, though, you need a time machine; you need to imagine a world without even the concept of computer programs, in which Russell and Whitehead had recently published their extraordinary *Principia Mathematica*... – rici Jul 13 '20 at 14:18
In that context, things which today might not seem so radical fell from the intellectual debates like meteors, blasting great craters in the understanding of mathematics. However, a century later we've more or less integrated these ways of thinking (which is not to say that there are not periodic outbreaks of misguided breathless enthusiasm which, also like the meteors, generate a lot more heat than light. I'd put *Godel, Escher, Bach* in this category; if you want something amusing and insightful, find one of Raymond Smullyan's books.)... – rici Jul 13 '20 at 14:23
Mmm, I have some issues with classifying a set of proofs as countably anything... Looks like circular reasoning or coherentism (which I like) - how can you understand what is a countable set when you're still devising the proof system for say, ZFC itself? I know the GIT as (perhaps incorrectly, I admit): Given a 1. Formal Grammar for well defined statements, 2. Finite set of Axioms, 3. A metalanguage that defines a proof system, then : Some strings generated by the grammar have a truth value undiscoverable by the proof system provided by the metalanguage. – Novicegrammer Jul 13 '20 at 14:25
But I digress. Gödel showed that you can enumerate mathematical statements; a little while later, Turing showed that you can construct a (theoretical) machine which can produce them all. (The "Universal Turing Machine" -- a machine which interprets a written description of any Turing Machine -- is an important precursor of the modern computer. But it also serves to unify the concept of TMs.) Generative grammars are not much different from turing machines, so it is not really surprising that they can be proven equivalent. – rici Jul 13 '20 at 14:26
Thus, the idea that all of mathematics can be expressed as a generative grammar is not different from the idea that all of mathematics can be output by a turing machine (or a computer, in some theoretical sense); this is the Turing-Church thesis. And it's simultaneously the best we have and provably incomplete. – rici Jul 13 '20 at 14:28
But I don't think that was the point I wanted to make. The point is that you can read way too much into the concept of a grammar. We tend to think of "grammar" as giving structure to an sentence, but that's not true of Type 0 grammars. Since an unrestricted grammar cannot be used to parse a sentence, no structure is provided. Even if we enumerate sentences until we find the target sentence (and the enumeration includes the derivation sequence, which is reasonable) we really have not discovered anything useful. . – rici Jul 13 '20 at 14:32
The grammar has become a computer program, whirling gears and flashing lights which generates an unending series of statements which are true but devoid of context. Since the statements can be enumerated, they can be numbered and that number (written in binary, say) is a possible encoding of the statement. The set of binary numbers is not grammatically interesting. But the enumeration means that it is (one of) the languages which encodes all of (some set of) mathematical axioms. – rici Jul 13 '20 at 14:34
That's all worth thinking about (and in a form less superficial than a SO comment thread). But it does not offer what it seems to offer: A "grammar" (in the traditional sense) of mathematics. The countability of finite statements is important, but the algorithm used to count them less so. I think I'll leave it at that for now. – rici Jul 13 '20 at 14:36
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/217829/discussion-between-novicegrammer-and-rici). – Novicegrammer Jul 14 '20 at 16:42

How can one define a language which does not fit in the Chomsky Hierarchy?

1 Answers1

Notes