The Perl INTERCAL compiler

... Parsers

CLC-INTERCAL 1.-94 no longer includes a parser. Instead, it contains a parser generator. The source language for the parser generator is a includes the CREATE statement and the ability to assign to special registers (to control the runtime operating mode). This document describes the syntax of a CREATE statement, and shows an example of a simple parser for a subset of INTERCAL-72

Syntax

The CREATE statement has the form:
    CREATE grammar class template AS code
The grammar is optional, and can only be used when compiling a compiler. Normal program can only modify their own compiler by leaving grammar to the default value (_1); compilers can modify other compilers, including but not limited to the grammar used to compile the program (compiler modifying itself) and the one used to compile the compiler itself. This advanced use of the CREATE statement is discussed elsewhere.

The class specifies a syntactic class (some other languages might call it a nonterminal). Usually, this takes the form of a what ("?") followed by some alphanumerics, although anything which evaluates to a number will do. Please note that in CLC-INTERCAL the what does not introduce a unary logical operator as in C-INTERCAL, and it always produces a named constant (of sorts).

The template is composed of a sequence of terminals and nonterminals which vaguely resemble the syntax you are trying to define. Nonterminals are specified as a special type of constant, usually introduced by the "what" discussed before. Terminals are specified as "array slices", that is sequences of numbers enclosed in tails or hybrids and representing a 16- or 32-bit array. The text which would be produced by READing OUT these arrays specifies the syntax to define. For example, consider the following line from 1972.iacc:

    DO CREATE _2 ?VERB ?LVALUE ,#91 + #91 + #79, ,#95 + #91 + #67,
       ?EXPRESSION AS STO + ?EXPRESSION #1 + ?LVALUE #1
Ignore for just now the bit after the AS. We are creating something in class ?VERB of grammar _2 (which by default contains the compiler - the compiler compiler is in grammar _1). There is a nonterminal (?LVALUE), which presumably will match a register and anything else you can assign to; then there are two terminals. To see what these are, you can run the following program:
    PLEASE ,1 <- #3
    DO ,1 SUB #1 <- #91
    DO ,1 SUB #2 <- #91
    DO ,1 SUB #3 <- #72
    PLEASE ,2 <- #3
    DO ,2 SUB #1 <- #95
    DO ,2 SUB #2 <- #91
    DO ,2 SUB #3 <- #67
    PLEASE READ OUT ,1 + ,2
    DO GIVE UP
We assign the values to 16-bit arrays because the slices are encloses in tails. If they had been hybrids, we'd use 32-bit arrays. Running the program, we get "<-". This is not a coincidence: the above statement CREATEs the assignment. It should not surprise anybody that the next nonterminal is called ?EXPRESSION.

ICBM

A suitable sequence of CREATE statements can define the syntax for all the known variants of INTERCAL. The semantics is defined by the code part of the CREATE statements. Before detailing how the semantics is specified, we must take a detour.

The INTERCAL Common Bytecode Machine (ICBM) is an assembler-like language which is used by the CLC-INTERCAL compiler as an intermediate language. It can be interpreted directly by the bytecode interpreter, or run directly on any computer with the INTERCAL Operating System; alternatively, a second level compiler will be made available to convert ICBM programs to other languages, for example assembler, Perl, or DD/SH. Please note that it is currently not possible to create a code generator for C, FORTRAL, COBOL or Pascal, but this may change in future - however you will need to have the C, FORTRAN, COBOL or Pascal compiler available at runtime if the program happens to use CREATE in a way which cannot be optimised away (all the examples in this documentation can, so you won't need a compiler at runtime to run them).

ICBM assumes a memory model with multiple stashes: any value can be stashed and retrieved on any of the stashes, which function independently of each other. There is no limit to the size of an element on a stash, or to the number of elements in each stash. Memory allocated to a stash element is automatically freed when the element is removed from the stash.

ICBM programs are inherently multi-threaded. Memory between the threads is shared unless otherwise specified. When a thread terminates, it remains in memory until it is absolutely clear that the thread cannot came back to life. This is important, since threads in quantum programs can have their behaviour modified after they have completed execution.

ICBM programs also contain a number of compilers. Usually, there will be a "program compiler" and a "compiler compiler", provided by the system. When compiling a compiler, however, just the "compiler compiler" is provided: the "program compiler" is built during the execution. A compiler compiler is a special type of compiler, which replaces the original compiler compiler after (re-)building itself. In other words, the compiler compiler coincides with the compiler compiler compiler; this is necessary to avoid infinite regression.

Each block of ICBM code in a program has an associated source code and a compiler used to compile it. This allows to avoid to recompile a block if the compiler has not changed since the last time it has been compiled. A separate document describes this "just too late" compiler in some more detail.

ICBM code is described in detail in the documentation provided with the Perl module Language::INTERCAL::ByteCode. Use the perldoc program to access it, or man if your version of Perl installs prebuilt manpages from POD.

Semantics

The semantics of a compiled object is specified in the CREATE statement after the AS keyword. It consists of a list of mnemonics (separated by intersection symbols). If the template refers to n other templates, the codes generated by these templates will be available in the code by repeating the template identifier (with its own "what" or whatever was used to introduce it) followed by an expression. If there is only one instance of each different template, the expression must be #1, otherwise it indicates which instance to use. For example, considering again the assignment statement from 1972.iacc:
    DO CREATE _2 ?VERB ?LVALUE ,#91 + #91 + #79, ,#95 + #91 + #67,
       ?EXPRESSION AS STO + ?EXPRESSION #1 + ?LVALUE #1
We see that we refer to both the ?EXPRESSION and the ?LVALUE: since there is one of each, these references are followed by #1. Suppose the statement to execute is DO .1 <- #2. Then the ?LVALUE will generate SPO + #1 and the ?EXPRESSION generates #2. So, replacing what needs to be replaced, the assignment produces:
    STO + #2 + SPO + #1
Executing it, we first check whether calculation is to be ABSTAINed FROM; if it is not, we STOre #2 to register SPOt #1. Just as we required.

Example

See the various compiler parsers provided with the distribution for plenty of examples.

These examples should be self-explanatory. If not, reread this document.


Back