PostScript mark token

Question

In PostScript if you have

[4 5 6]

you have the following tokens:

mark integer integer integer mark

The stack goes like this:

| mark |
| mark | integer |
| mark | integer | integer |
| mark | integer | integer | integer |
| array |

Now my question: Is the ]-mark operator a literal object or an executable object?

Am I correct that the [-mark is a literal object (just data) and that the ]-mark is an executable object (because you always need to create an array when you see this ]-mark operator)?

PostScript Language Reference Manual section 3.3.2 gives me:

The [ and ] operators, when executed, produce a literal array object with the enclosed objects as elements. Likewise, << and >> (LanguageLevel 2) produce a literal dictionary object.

It is not clear to me from this whether both the [ and ] operators are executable or only the ] operator.

luser droog · Accepted Answer · 2023-04-04T06:46:13.993

Summary.

All of these special tokens, [, ], <<, >>, come out of the scanner as executable names. [ and << are defined to yield a marktype object (so they are not operators per se, but they are executable names defined in systemdict where all the operators live). ] and >> are defined as procedures or operators which are executed just like any other procedure or operator. These use the counttomark operator to find the opening bracket. But all of these tokens are treated specially by the scanner, which recognizes them without surrounding whitespace since they are part of its delimiter set.

Details.

It all depends on when you look at it. Let's trace through what the interpreter does with these tokens. I'm going to illustrate this with a string, but it works just the same with a file.

So if you have an input string

([4 5 6]) cvx exec

cvx makes a literal object executable. The program stream is a file object also labeled executable. exec pushes an object on the Execution Stack, where it is encountered by the interpreter on the next iteration of the inner interpreter processing loop. When executing the program stream, the executable file object is topmost on the Execution Stack.

The interpreter uses token to call the scanner. The scanner skips initial whitespace, then reads all non-whitespace characters up to the next delimiter, then attempts to interpret the string as a number, and failing that it becomes an executable name. The brackets are part of the set of delimiters, and so are termed 'self-delimiting'. So the scanner reads the one bracket character, stops reading because it's a delimiter, discovers it cannot be a number, so it yields an executable name.

Top of Exec Stack | Operand Stack
(4 5 6]) [        |
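This first scanner step can be checked by hand by applying token to the string directly. The following is a small sketch, not part of the original trace. For a string operand, token leaves the unread remainder, the scanned object, and a boolean on the stack:

```postscript
([4 5 6]) token   % stack: (4 5 6]) [ true
pop               % discard the success flag
dup type =        % prints nametype
xcheck =          % prints true (the name is executable)
=                 % prints the unread remainder: 4 5 6]
```

The remainder shows that the scanner consumed only the single bracket character, even though no whitespace follows it.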

Next, the interpreter loop executes anything executable (unless it's an array). Executing a name means loading it from the dictionary, and then executing the definition if it's executable. [ is defined as a -mark- object, same as the name mark is defined. It's not technically an operator or a procedure, it's just a definition. Automatic loading happens because the name comes out of the scanner with the executable flag set.

(4 5 6])  | -mark-
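A quick way to see that [ and mark leave the same kind of object is to ask for the type of what they push. This is only a sketch. How [ is stored in systemdict is an implementation detail, and some interpreters may define it as an operator that pushes a mark instead of as the mark object itself, so no single result is claimed for the last line:

```postscript
[ type =             % prints marktype
mark type =          % prints marktype
([) cvn load type =  % marktype or operatortype, depending on the interpreter
```

The name is built with cvn from a string here so that the example does not depend on how the scanner treats a slash followed by a delimiter.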

The scanner then yields 4, 5, and 6 which are numbers and get pushed straight to the operand stack. 6 is delimited by the ] which is pushed back on the stream.

(])  | -mark- 4 5 6

The interpreter doesn't execute the numbers since they are not executable, but it would be just the same if it did. The action for executing a number is simply to push it on the stack.

Then, finally the scanner encounters the right bracket ]. And that's where the magic happens. Self-delimited, it doesn't need to be followed by any whitespace. The scanner yields the executable name ] and the interpreter executes it by loading and it finds ...

{ counttomark array astore exch pop }

Or maybe an actual operator that does this. But, yeah. counttomark yields the number of elements. array creates an array of that size. astore fills an array with elements from the stack. And exch pop to discard that pesky mark once and for all.
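To try the procedure without touching the real ], it can be bound to another name. In this sketch myclose is a hypothetical name chosen for illustration; it is not defined by any interpreter:

```postscript
% myclose is a hypothetical stand-in for ]
/myclose { counttomark array astore exch pop } def

mark 4 5 6 myclose   % leaves a single 3-element array on the stack
dup length =         % prints 3
aload pop add add =  % unpacks the array and prints 15
```

Because myclose is an ordinary name and not a delimiter, it needs whitespace around it, which is exactly the convenience the scanner gives to the bracket characters.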

For dictionaries, << is exactly the same as [. It drops a mark. Then you line up some key-value pairs, and >> is a procedure that does something to the effect of ...

{ counttomark dup dict begin 2 idiv { def } repeat pop currentdict end }

Make a dictionary. Define all the pairs. Pop the mark. Yield the dictionary. This version of the procedure tries to create a fast dictionary by making it double-sized. Move the 2 idiv to before dup to make a small dictionary.
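The small-dictionary variant described above, with 2 idiv moved before dup, might look like the following sketch. The name mydictclose is hypothetical and stands in for >>:

```postscript
% mydictclose is a hypothetical stand-in for >>
/mydictclose {
    counttomark 2 idiv   % number of key-value pairs
    dup dict begin       % dictionary sized for exactly that many pairs
    { def } repeat       % define each pair, last pair first
    pop                  % discard the mark
    currentdict end      % leave the new dictionary on the operand stack
} def

mark /a 1 /b 2 mydictclose
dup /a get =   % prints 1
/b get =       % prints 2
```

One limitation of this approach: the new dictionary sits on top of the dictionary stack while the pairs are being defined, so a key such as /def or /end would shadow the operator that the procedure still needs to look up.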

So, to get philosophical, counttomark is the operator you're using. And it requires a special object-type that isn't used for anything else, the marktype object, -mark-. The rest is just syntactical sugar to let you access this stack-counting ability to create linear data-structures.

Appendix

Here's a procedure that models the interpreter loop reading from currentfile.

{currentfile token not {exit} if dup type /arraytype ne {exec} if }loop

exec is responsible for loading (and further executing) any executable names. You can see from this that token really is the name of the scanner; and that procedures (arrays) directly encountered by the interpreter loop are not executed (type /arraytype ne {exec} if).

Using token on strings lets you do really cool stuff, however. For example, you can dynamically construct procedure bodies with substituted names. This is very much like a lisp macro.

/makeadder { % n  .  { n add }
    1 dict begin
    /n exch def
    ({//n add}) token % () {n add} true
    pop exch pop % {n add}
    end
} def

token reads the entire procedure from the string, substituting the immediately evaluated name //n with its currently defined value. Notice here that the scanner reads an executable array all at once, effectively executing [ ... ] cvx internally before returning. (In certain interpreters, like my own xpost, this allows you to bypass the stack-size limits to build an array, because the array is built in separate memory. But Level 2 garbage collection makes this largely irrelevant.)
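A usage sketch for makeadder follows. It assumes a Level 2 interpreter, since immediately evaluated names (//n) are a Level 2 feature. The definition is repeated so that the block runs on its own, and add5 is a hypothetical name for the generated procedure:

```postscript
/makeadder {            % n  makeadder  {n add}
    1 dict begin
    /n exch def
    ({//n add}) token   % leaves: remainder-string {n add} true
    pop exch pop        % keep only the procedure
    end
} def

/add5 5 makeadder def   % add5 is a hypothetical name
3 add5 =                % prints 8
5 makeadder 0 get =     % prints 5: the value itself is stored, not the name n
```

The last line shows the substitution: the first element of the generated procedure is the integer, so the temporary dictionary holding n is no longer needed once makeadder returns.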

There is also the bind operator, which modifies a procedure by replacing operator names with the operator objects themselves. These tricks help you to factor out name lookups in speed-critical procedures (like inner loops).
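The effect of bind can be observed by inspecting the first element of a procedure before and after. This is a sketch that assumes dup is implemented as a true operator in systemdict, as it is in typical interpreters:

```postscript
{ dup mul } dup 0 get type =   % prints nametype
bind 0 get type =              % prints operatortype
```

After bind, executing the procedure no longer searches the dictionary stack for dup or mul, which also means that later redefinitions of those names do not affect it.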

@juFo No, thank *you*. Good question. It's an interesting little corner. — luser droog, Feb 11 '13 at 08:26
but about the ([4 5 6]) will that result just in a string [4 5 6] or will that actually result in an empty string (because it is executed as an array?) because i never thought of it that way (inside a string) — juFo, Feb 11 '13 at 08:32
In general, executing from a string (`(...)cvx exec`) works the same as executing a file (`(...)run`). It's roughly the same as executing a procedure (array), except that the procedure is *pre-parsed*. Using the string to illustrate the scanning lets me show how much of the input has been consumed. — luser droog, Feb 11 '13 at 08:39
@juFo I've updated with a little more detail on the interpreter loop and the scanner. If you're interested, I've got c-code for a scanner in [this answer](http://stackoverflow.com/a/6758622/733077). — luser droog, Feb 13 '13 at 04:11
Thanks luser droog!! I'm currently trying to write my own parser in C# but still having some basic problems. also i'm not sure how the GraphicsState and CTM work exactly at this moment hèhè :-) — juFo, Feb 13 '13 at 13:25

score 3 · Answer 2 · answered Feb 10 '13 at 20:32

Both [ and ] are executable tokens. [ produces a mark object; ] creates an array of the objects back to the last mark.

KenS

the left bracket seems a bit of a philosophical question. It is listed as a command in the manual therefore it is. I don't think anyone would call an integer an executable that pushes its value onto the stack, but you'd need to get into implementation issues to see how the left bracket is any different. – agentp Feb 10 '13 at 21:03
exactly george. Thanks for your comment – juFo Feb 10 '13 at 21:24
