This file contains an overview of the design of the compiler.

OUTLINE

The main job of the compiler is to translate Mercury into C, although it can also translate (subsets of) Mercury to some other languages: Goedel, bytecode (for a planned bytecode interpreter), and RL (the Aditi Relational Language).

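The top level of the compiler is in the file mercury_compile.m. The basic design is that compilation is broken into the following stages:

  1. parsing
  2. semantic analysis and error checking
  3. high-level transformations
  4. code generation
  5. low-level optimizations
  6. output C code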

Note that in reality the separation is not quite as simple as that. Although parsing is listed as step 1 and semantic analysis is listed as step 2, the last stage of parsing actually includes some semantic checks. And although optimization is listed as steps 3 and 5, it also occurs in steps 2, 4, and 6. For example, elimination of assignments to dead variables is done in mode analysis; middle-recursion optimization and the use of static constants for ground terms is done in code generation; and a few low-level optimizations are done in llds_out.m as we are spitting out the C code.

In addition, the compiler is actually a multi-targeted compiler with several different back-ends. When you take the different back-ends into account, the structure looks like this:


DETAILED DESIGN

This section describes the role of each module in the compiler. For more information about the design of a particular module, see the documentation at the start of that module's source code.


The action is co-ordinated from mercury_compile.m.

Option handling

The command-line options are defined in the module options.m. mercury_compile.m calls library/getopt.m, passing the predicates defined in options.m as arguments, to parse them. It then invokes handle_options.m to postprocess the option set. The results are stored in the io__state, using the type globals defined in globals.m.
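
The three steps above (parse the options, postprocess them, store the result where later passes can look it up) can be sketched as follows. This is only a Python illustration of the flow, not the compiler's own Mercury code: the `Globals` class, the `generate_code` entry and the implication rule in `postprocess` are assumptions made up for the example.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Globals:
    """Hypothetical stand-in for the `globals' type: a read-only option table."""
    options: dict


def parse_options(argv):
    # Step 1: the option table is declared in one place (cf. options.m)
    # and handed to a generic parser (cf. library/getopt.m).
    parser = argparse.ArgumentParser(prog="mc-sketch")
    parser.add_argument("--halt-at-warn", action="store_true")
    parser.add_argument("--error-check-only", action="store_true")
    return vars(parser.parse_args(argv))


def postprocess(options):
    # Step 2: resolve implications between options (cf. handle_options.m).
    # The rule below is invented purely for illustration.
    options = dict(options)
    options["generate_code"] = not options["error_check_only"]
    return options


def setup_globals(argv):
    # Step 3: wrap the result so that later passes only look options up.
    return Globals(postprocess(parse_options(argv)))
```

For example, `setup_globals(["--error-check-only"]).options["generate_code"]` is `False` under the made-up rule above.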


FRONT END

1. Parsing

The result at this stage is the High Level Data Structure, which is defined in four files:

  1. hlds_data.m defines the parts of the HLDS concerned with function symbols, types, insts, modes and determinisms;
  2. hlds_goal.m defines the part of the HLDS concerned with the structure of goals, including the annotations on goals;
  3. hlds_pred.m defines the part of the HLDS concerning predicates and procedures;
  4. hlds_module.m defines the top-level parts of the HLDS, including the type module_info.

The module hlds_out.m contains predicates to dump the HLDS to a file. The module goal_util.m contains predicates for renaming variables in an HLDS goal.
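
To give a feel for what renaming variables in a goal involves, here is a minimal Python sketch. The nested-tuple goal representation is an assumption made for the example; the real HLDS goal type in hlds_goal.m is far richer and carries annotations on every goal.

```python
def rename_vars(goal, renaming):
    """Return a copy of `goal` with variables replaced according to `renaming`.

    Goals are hypothetical nested tuples:
      ("unify", X, Y), ("call", name, [args]),
      ("conj", [goals]), ("disj", [goals]), ("not", goal)
    Variables are strings; variables missing from `renaming` are kept.
    """
    def rn(var):
        return renaming.get(var, var)

    kind = goal[0]
    if kind == "unify":
        return ("unify", rn(goal[1]), rn(goal[2]))
    if kind == "call":
        return ("call", goal[1], [rn(v) for v in goal[2]])
    if kind in ("conj", "disj"):
        return (kind, [rename_vars(g, renaming) for g in goal[1]])
    if kind == "not":
        return ("not", rename_vars(goal[1], renaming))
    raise ValueError(f"unknown goal kind: {kind}")
```

The traversal rebuilds the goal rather than updating it in place, which mirrors the way a pass written in a declarative language would have to do it.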

2. Semantic analysis and error checking

Any pass which can report errors or warnings must be part of this stage, so that the compiler does the right thing for options such as `--halt-at-warn' (which turns warnings into errors) and `--error-check-only' (which makes the compiler only compile up to this stage).

implicit quantification
quantification.m handles implicit quantification and computes the set of non-local variables for each sub-goal. It also expands away bi-implication (unlike the expansion of implication and universal quantification, this expansion cannot be done until after quantification). This pass is called from the `transform' predicate in make_hlds.m.
checking typeclass instances (check_typeclass.m)
check_typeclass.m both checks that instance declarations satisfy all the appropriate superclass constraints and performs a source-to-source transformation on the methods from the instance declarations. The transformed code is checked for type, mode, uniqueness, purity and determinism correctness by the later passes, which has the effect of checking the correctness of the instance methods themselves (i.e. that the instance methods match those expected by the typeclass declaration). During the transformation, pred_ids and proc_ids are assigned to the methods for each instance. In addition, while checking that the superclasses of a class are satisfied by the instance declaration, a set of constraint_proofs is built up for the superclass constraints. These are used by polymorphism.m when generating the base_typeclass_info for the instance.
type checking
assertions
assertion.m is the abstract interface to the assertion table. Currently all the compiler does is type check the assertions and record, for each predicate that is used in an assertion, which assertion it is used in. The setup of the assertion table occurs in post_typecheck__finish_assertion.
purity analysis
purity.m is responsible for purity checking, as well as defining the purity type and a few public operations on it. It also calls post_typecheck.m to complete the handling of predicate overloading for cases which typecheck.m is unable to handle, and to check for unbound type variables. Elimination of double negation is also done here; that needs to be done after quantification analysis and before mode analysis.
polymorphism transformation
polymorphism.m handles introduction of type_info arguments for polymorphic predicates and introduction of typeclass_info arguments for typeclass-constrained predicates. This phase needs to come before mode analysis so that mode analysis can properly reorder code involving existential types. (It also needs to come before simplification so that simplify.m's optimization of goals with no output variables doesn't do the wrong thing for goals whose only output is the type_info for an existentially quantified type parameter.)

This phase also converts higher-order predicate terms into lambda expressions, and copies the clauses to the proc_infos in preparation for mode analysis.

The polymorphism.m module also exports some utility routines that are used by other modules. These include some routines for generating code to create type_infos, which are used by simplify.m and magic.m when those modules introduce new calls to polymorphic procedures.

mode analysis
indexing and determinism analysis
checking of unique modes (unique_modes.m)
unique_modes.m checks that non-backtrackable unique modes were not used in a context which might require backtracking. Note that what unique_modes.m does is quite similar to what modes.m does, and unique_modes calls lots of predicates defined in modes.m to do it.
simplification (simplify.m)
simplify.m finds and exploits opportunities for simplifying the internal form of the program, both to optimize the code and to massage the code into a form the code generator will accept. It also warns the programmer about any constructs that are so simple that they should not have been included in the program in the first place. (That's why this pass needs to be part of semantic analysis: because it can report warnings.) simplify.m converts complicated unifications into procedure calls. simplify.m calls common.m which looks for (a) construction unifications that construct a term that is the same as one that already exists, or (b) repeated calls to a predicate with the same inputs, and replaces them with assignment unifications. simplify.m also attempts to partially evaluate calls to builtin procedures if the inputs are all constants (see const_prop.m).
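
Case (a) of what common.m does, replacing a repeated construction with an assignment, can be sketched in a few lines of Python. The tuple representation of goals is an assumption made for the example, and the sketch only handles a flat conjunction in which each variable is bound once; it says nothing about how common.m deals with branches or with repeated calls.

```python
def eliminate_common_structs(goals):
    """Replace repeated constructions with assignments.

    Each goal is a hypothetical tuple:
      ("construct", var, functor, (arg_vars...))  # var = functor(args)
      ("assign", var, other_var)                  # var = other_var
    Any other goal is passed through unchanged.
    """
    seen = {}      # (functor, args) -> variable already holding that term
    result = []
    for goal in goals:
        if goal[0] == "construct":
            _, var, functor, args = goal
            key = (functor, tuple(args))
            if key in seen:
                # An identical term already exists: reuse it.
                result.append(("assign", var, seen[key]))
                continue
            seen[key] = var
        result.append(goal)
    return result
```

Given constructions of `A = f(X, Y)`, `B = g(X)` and `C = f(X, Y)`, the third goal becomes the assignment `C = A`.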

3. High-level transformations

The first pass of this stage does tabling transformations (table_gen.m). This involves the insertion of several calls to tabling predicates defined in mercury_builtin.m and the addition of some scaffolding structure.
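
As a rough analogy for the shape of a tabled procedure, the Python sketch below wraps a procedure with a lookup before the original body and a save after it. The names `table_lookup` and `table_save` are hypothetical stand-ins, not the names of the actual tabling predicates, and the sketch only covers a deterministic procedure with hashable inputs.

```python
def table(proc):
    """Wrap `proc` so that answers are looked up in, and saved to, a table.

    `table_lookup` and `table_save` are hypothetical stand-ins for the
    tabling predicates that the real transformation inserts calls to.
    """
    answers = {}

    def table_lookup(args):
        return args in answers, answers.get(args)

    def table_save(args, result):
        answers[args] = result

    def tabled(*args):
        found, result = table_lookup(args)   # inserted call
        if not found:
            result = proc(*args)             # original procedure body
            table_save(args, result)         # inserted call
        return result

    return tabled


calls = []   # records each time the original body actually runs


@table
def fib(n):
    calls.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

Because the recursive calls go through the wrapper too, the original body of `fib` runs only once for each distinct input.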

The next pass of this stage is a code simplification, namely removal of lambda expressions (lambda.m):

(Is there any good reason why lambda.m comes after table_gen.m?)

The next pass is termination analysis. The various modules involved are:

Most of the remaining HLDS-to-HLDS transformations are optimizations:

The module transform.m contains stuff that is supposed to be useful for high-level optimizations (but which is not yet used).


a. LLDS BACK-END

4a. Code generation.

pre-passes to annotate the HLDS
Before code generation there are a few more passes which annotate the HLDS with information used for code generation:
choosing registers for procedure arguments (arg_info.m)
Currently uses one of two simple algorithms, but we may add other algorithms later.
annotation of goals with liveness information (liveness.m)
This records the birth and death of each variable in the HLDS goal_info.
allocation of stack slots
This is done by live_vars.m, which works out which variables need to be saved on the stack, and when (trace.m determines which variables are needed for debugging purposes). It then uses graph_colour.m to determine a good allocation of variables to stack slots.
migration of builtins following branched structures
This transformation, which is performed by follow_code.m, improves the results of follow_vars.
allocating the follow vars (follow_vars.m)
Traverses backwards over the HLDS, annotating some goals with information about what locations variables will be needed in next. This allows us to generate more efficient code by putting variables in the right spot directly. This module is not called from mercury_compile.m; it is called from store_alloc.m.
allocating the store map (store_alloc.m)
Annotates each branched goal with variable location information so that we can generate correct code by putting variables in the same spot at the end of each branch.
computing goal paths (goal_path.m)
The goal path of a goal defines its position in the procedure body. This transformation attaches its goal path to every goal, for use by the debugger.
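
The stack slot allocation described above can be viewed as a graph colouring problem: two variables that must be on the stack at the same time interfere and need different slots, while variables that never coexist can share one. The Python sketch below uses a simple greedy colouring to show the idea; it is an illustration under that assumption, and the algorithm actually used by graph_colour.m may well differ.

```python
def allocate_stack_slots(live_sets):
    """Assign stack slots so that variables live together get different slots.

    `live_sets` is a list of sets; each set holds the variables that must be
    on the stack at the same point (for example, across one call).
    Returns a dict mapping each variable to a slot number starting at 1.
    """
    # Build the interference graph: an edge joins two variables that
    # appear together in some live set.
    neighbours = {}
    for live in live_sets:
        for var in live:
            neighbours.setdefault(var, set()).update(live - {var})

    # Greedy colouring: give each variable the lowest slot not used by an
    # already-coloured neighbour. Sorting makes the result deterministic.
    slots = {}
    for var in sorted(neighbours):
        taken = {slots[n] for n in neighbours[var] if n in slots}
        slot = 1
        while slot in taken:
            slot += 1
        slots[var] = slot
    return slots
```

With live sets `{A, B}`, `{B, C}` and `{D}`, only two slots are needed: `A`, `C` and `D` can share a slot because none of them is live at the same time as another.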
code generation
For code generation itself, the main module is code_gen.m. It handles conjunctions and negations, but calls sub-modules to do most of the other work:

It also calls middle_rec.m to do middle recursion optimization.

The code generation modules make use of

code_info.m
The main data structure for the code generator.
code_exprn.m
This defines the exprn_info type, which is a sub-component of the code_info data structure which holds the information about the contents of registers and the values/locations of variables.
exprn_aux.m
Various preds which use exprn_info.
code_util.m
Some miscellaneous preds used for code generation.
code_aux.m
Some miscellaneous preds which, unlike those in code_util, use code_info.
continuation_info.m
For accurate garbage collection, collects information about each live value after calls, and saves information about procedures.
trace.m
Inserts calls to the runtime debugger.
code generation for `pragma export' declarations (export.m)
This is handled separately from the other parts of code generation. mercury_compile.m calls the procedures `export__produce_header_file' and `export__get_pragma_exported_procs' to produce C code fragments which declare/define the C functions which are the interface stubs for procedures exported to C.

The result of code generation is the Low Level Data Structure (llds.m). The code for each procedure is generated as a tree of code fragments which is then flattened (tree.m).
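
One plausible reason for generating code as a tree is that joining two fragments is then a constant-time operation, with the cost of building the final instruction list paid once at the end. The Python sketch below shows the flattening step; the tree representation (None, a list, or a tuple of subtrees) and the instruction strings are assumptions made for the example, not the types defined in tree.m or llds.m.

```python
def flatten(tree):
    """Flatten a tree of code fragments into one instruction list.

    A tree is hypothetical here: either None (empty), a list of
    instructions (a leaf), or a tuple of subtrees (a node).
    """
    if tree is None:
        return []
    if isinstance(tree, list):
        return list(tree)
    instrs = []
    for sub in tree:
        # Subtrees are emitted left to right, preserving code order.
        instrs.extend(flatten(sub))
    return instrs
```

A code generator built this way can return `(prologue, body, epilogue)` from each routine without copying any instructions until `flatten` is called.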

5a. Low-level optimization (LLDS).

The various LLDS-to-LLDS optimizations are invoked from optimize.m. They are:

Depending on which optimization flags are enabled, optimize.m may invoke many of these passes multiple times.

Some of the low-level optimization passes use basic_block.m, which defines predicates for converting sequences of instructions to basic block format and back, as well as opt_util.m, which contains miscellaneous predicates for LLDS-to-LLDS optimization.
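
Converting to basic block format and back can be sketched as follows. The instruction tuples are hypothetical, and the rule used here (a block starts at a label and ends after a jump) is the textbook definition rather than a description of exactly what basic_block.m does.

```python
def to_basic_blocks(instrs):
    """Split a flat instruction list into basic blocks.

    Instructions are hypothetical tuples such as ("label", "L1"),
    ("goto", "L1"), ("if_goto", "L1") or ("assign", "r1", "r2").
    A block starts at a label (or the first instruction) and ends after
    a jump or just before the next label.
    """
    blocks, current = [], []
    for instr in instrs:
        if instr[0] == "label" and current:
            # A label starts a new block: close the one in progress.
            blocks.append(current)
            current = []
        current.append(instr)
        if instr[0] in ("goto", "if_goto"):
            # A jump ends the current block.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def from_basic_blocks(blocks):
    """Concatenate blocks back into a flat instruction list."""
    return [instr for block in blocks for instr in block]
```

The two functions are inverses on any instruction list, so a pass can convert to blocks, rewrite individual blocks, and convert back.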

6a. Output C code


b. MLDS BACK-END

The original LLDS code generator generates very low-level code, since the LLDS was designed to map easily to RISC architectures. We're currently developing a new back-end that generates much higher-level code, suitable for generating Java, high-level C, etc. This back-end uses the Medium Level Data Structure (mlds.m) as its intermediate representation.

4b. MLDS code generation

5b. MLDS transformations

6b. MLDS output

mlds_to_c.m converts MLDS to C/C++ code.


c. Aditi-RL BACK-END

4c. Aditi-RL generation

5c. Aditi-RL optimization

6c. Output Aditi-RL code


d. BYTECODE BACK-END

The Mercury compiler can translate Mercury programs into bytecode for interpretation by a bytecode interpreter. The intent of this is to achieve faster turn-around time during development. However, the bytecode interpreter has not yet been written.


MISCELLANEOUS

builtin_ops:
This module defines the types unary_op and binary_op which are used by several of the different back-ends: bytecode.m, llds.m, and mlds.m.
c_util:
This module defines utility routines useful for generating C code. It is used by both llds_out.m and mlds_to_c.m.
det_util:
This module contains utility predicates needed by the parts of the semantic analyzer and optimizer concerned with determinism.
special_pred.m, unify_proc.m:
These modules contain stuff for handling the special compiler-generated predicates which are generated for each type: unify/2, compare/3, and index/1 (used in the implementation of compare/3).
dependency_graph.m:
This contains predicates to compute the call graph for a module, and to print it out to a file. (The call graph file is used by the profiler.) The call graph may eventually also be used by det_analysis.m, inlining.m, and other parts of the compiler which could benefit from traversing the predicates in a module in a bottom-up or top-down fashion with respect to the call graph.
passes_aux.m:
Contains code to write progress messages, and higher-order code to traverse all the predicates defined in the current module and do something with each one.
opt_debug.m:
Utility routines for debugging the LLDS-to-LLDS optimizations.
error_util.m:
Utility routines for printing nicely formatted error messages.
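
For dependency_graph.m, a bottom-up traversal with respect to the call graph means visiting each group of mutually recursive predicates after everything it calls. The Python sketch below computes such an order with Tarjan's strongly connected components algorithm. It is an illustration only: the dictionary representation of the call graph is an assumption, and nothing here is taken from the module's actual code.

```python
def bottom_up_sccs(call_graph):
    """Return the strongly connected components of `call_graph`,
    callees before callers (Tarjan's algorithm).

    `call_graph` maps each predicate name to the list of predicates it calls.
    """
    index = {}       # discovery order of each visited predicate
    lowlink = {}     # smallest index reachable from the predicate
    stack, on_stack = [], set()
    sccs = []

    def visit(pred):
        index[pred] = lowlink[pred] = len(index)
        stack.append(pred)
        on_stack.add(pred)
        for callee in call_graph.get(pred, []):
            if callee not in index:
                visit(callee)
                lowlink[pred] = min(lowlink[pred], lowlink[callee])
            elif callee in on_stack:
                lowlink[pred] = min(lowlink[pred], index[callee])
        if lowlink[pred] == index[pred]:
            # `pred` is the root of a component: pop its members.
            scc = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                scc.add(member)
                if member == pred:
                    break
            sccs.append(scc)

    for pred in call_graph:
        if pred not in index:
            visit(pred)
    return sccs
```

An analysis that needs results for callees before it can process callers would walk the returned list front to back; a top-down pass would walk it in reverse.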


CURRENTLY USELESS

The following modules do not serve any function at the moment. Some of them are obsolete; others are work-in-progress. (For some of them it's hard to say which!)

excess.m:
This eliminates assignments that merely introduce another name for an already existing variable. The functionality of this module has been included in simplify.m; however, at some point in the future it may be necessary to provide a version which maintains superhomogeneous form.
lco.m:
This finds predicates whose implementations would benefit from last call optimization modulo constructor application. It does not apply the optimization and will not until the mode system is capable of expressing definite aliasing.
mercury_to_goedel.m:
This converts from item_list to Goedel source code. It works for simple programs, but doesn't handle various Mercury constructs such as lambda expressions, higher-order predicates, and functor overloading.
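
The transformation described for excess.m, removing assignments that merely introduce another name for an existing variable, can be sketched in Python as below. The goal tuples and the `keep` parameter are assumptions made for the example. The sketch handles only a flat conjunction in which each assignment comes before the uses of the variable it binds, and it assumes predicate names never coincide with variable names.

```python
def eliminate_excess_assignments(goals, keep):
    """Remove assignments that only rename an existing variable.

    `goals` is a flat conjunction of hypothetical tuples; ("assign", X, Y)
    means X = Y. Variables in `keep` (for example, the procedure's
    arguments) are visible outside, so assignments to them must stay.
    """
    subst = {}     # removed variable -> the variable it was an alias of

    def resolve(var):
        return subst.get(var, var)

    result = []
    for goal in goals:
        if goal[0] == "assign" and goal[1] not in keep:
            # X is just another name for Y: drop the goal, remember X -> Y.
            subst[goal[1]] = resolve(goal[2])
            continue
        # Rewrite variables in every other goal (the first field is the kind).
        result.append((goal[0],) + tuple(resolve(f) for f in goal[1:]))
    return result
```

For the conjunction `p(A), B = A, q(B, C), Out = C` with `Out` kept, the result is `p(A), q(A, C), Out = C`.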


Last update was $Date: 1999/12/02 05:48:24 $ by $Author: fjh $@cs.mu.oz.au.