Extension to CFG, what is it?

Question

Consider the following extension to context-free grammars that permits rules to have in the left-hand side, one (or more) terminal on the right side of the non-terminal. That is, rules of the form:

A b -> ...

The right-hand side may be anything, like in context-free grammars. In particular, it is not required, that the right-hand side will have exactly the same terminal symbol at the end. In that case, this extension would be context-sensitive. But the terminal is not just a context. Sometimes, this terminal is called "pushback".

Clearly, this is no longer CFG (type-2). It includes type-1. But what is it? Really type-0 already?

This particular extension is permitted in Definite Clause Grammars dcg in Prolog. (To avoid misunderstandings, I do not consider here Prolog's full extensions. I.e. I assume terminals to come from a finite alphabet and not being arbitrary terms, also I do not consider Prolog's additional arguments that are permitted in DCGs, which also go into type-0 already.)

Edit: Here is a simpler way to describe the extension: Add to a CFG rules of the form

A b -> <epsilon>

@CookieMonster: That's what Colmerauer's formalism is about. — false, Aug 29 '12 at 16:29
It doesn't make sense to call "b" in your question a terminal, if it can be rewritten to something. A "terminal", as the name implies, can't be rewritten into something else. Cf. http://cstheory.stackexchange.com/questions/14006/name-for-terminals-on-the-left-hand-side-of-grammar-rules — imz -- Ivan Zakharyaschev, Oct 28 '12 at 23:18
@imz: Calling it a terminal, even on the left-hand side is [common terminology](http://en.wikipedia.org/wiki/Context-sensitive_grammar#Formal_definition): "α and β are strings of nonterminals and terminals". There **are** names for this used since decades in DCGs: context, and pushback list. Both names are not ideal. Currently I favour semi-context. — false, Oct 28 '12 at 23:59
@false You gave a link to the definition of context-sensitive grammars, and there the terminals that appear on the LHS stay untouched on the RHS. So I see no contradiction with the idea that we call a terminal something that is not rewritten by a rule. But your example breaks this pattern. Your example is like a general rewriting system, in particular, a [string rewriting system](http://en.wikipedia.org/wiki/Rewriting_system#String_rewriting_systems), whose definition doesn't employ the notion of "terminals"; it deals just with strings over an alphabet of symbols. — imz -- Ivan Zakharyaschev, Oct 29 '12 at 12:41
@false I see, when we talk about [unrestricted grammars](http://en.wikipedia.org/wiki/Unrestricted_grammar), there's really no essentail destinction between terminals and non-terminals. The distinction is there for comparison with the other types of grammars in the Chomsky hierarchy. I can't yet quite understand whther your restriction of b to terminals is essential for the expressiveness (or other properties) of your formalism. — imz -- Ivan Zakharyaschev, Oct 29 '12 at 12:58
@imz: There is still a distinction between terminal and non-terminals for me: The left-hand side consists always of (in this order:) a non-terminal and 0, one, or more terminals. — false, Oct 29 '12 at 15:42
@false Please give a complete example of such a grammar, which would demonstrate the specific usefulness of such kind of grammars. Then we all could discuss whether it makes sense to distinguish termials and non-terminals on some ground. — imz -- Ivan Zakharyaschev, Oct 29 '12 at 20:31
@imz: These grammars are in use since about 1975. See [for some material](http://www.complang.tuwien.ac.at/ulrich/iso-prolog/#dcg) about it. If you want to discuss things further, you really need to ask a new question. It's the way things happen on SO. E.g. ask for examples... — false, Oct 29 '12 at 20:35

score 5 · Answer 1 · edited Jun 20 '20 at 09:12

Here's a partial answer:

The grammar is within type 0. A context-sensitive grammar (type-1) has rules of the form wAx -> wBx where w and x are strings of terminals and non-terminals, and B is not empty. Note that since B is not empty, |wAx| <= |wBx|. Your grammar has rules of the form Ax -> z where z is a string of terminals and non-terminals and can be empty, and x can be removed. This violates two principles of CSGs.

Unsatisfyingly, I did not answer two things:

Is the language generated by your grammar type-1? The grammar is type-0, but that does not mean the language cannot be type-1. For example, regular languages (type-3) can be described by CFGs (type-2).
Is the language recursive? This is important since, if so, the language is decidable (does not suffer from the halting problem).

I'm currently attempting a proof for the second point, but it's probably beyond my ability.

Small typo in your post, probably should read |wAx| <= |wBx|? — , Aug 22 '12 at 19:30
Thanks, updated again...I should not be doing formal grammars with little sleep, too error prone. — DPenner1, Aug 23 '12 at 04:12

false · Accepted Answer · 2014-06-12T14:56:18.507

3

To answer my question with respect to Prolog's DCG formalism, this extension is now called a semicontext. See N253 DIN Draft for DCGs 2014-04-08 - ISO/IEC WDTR 13211-3:2014-04-08

Given

a1, [b] --> ... .

a2, [b,b] --> ... .

The terminal-sequence [b] is now a semicontext, as well as the terminal-sequence [b,b].

Would the same terminal sequence now appear at the end of the rule, we would have a context:

a3, [b,b] --> ..., [b,b].

So "semi" means here "half" - similar to a semigroup where half of the algebraic properties of a group hold.

edited Jun 12 '14 at 14:56

answered Jun 12 '14 at 12:08

false

10,264
13
101
209

Do you know any reference to use semicontext for CSG? It seems your approach only works for right-context Aβ --> γβ, when I read your last example correctly. Formal definition of right context sensitive at end of article here: https://en.wikipedia.org/wiki/Context-sensitive_grammar#Formal_definition – Feb 08 '19 at 14:09

score 1 · Answer 3 · answered Aug 22 '12 at 16:36

1

Yup it's type 0 I think. Principle for first 3 types (3, 2 and 1) is that those can't perform reduction. Those are generative only types.

Here you can transform a terminal into epsilon so it's type 0.

answered Aug 22 '12 at 16:36

m09

7,490
3
31
58

Extension to CFG, what is it?

3 Answers3

Linked