Guide to the Interpreter Source Code

The Fortress interpreter is written in Java.

It can be run from ant, from the command line, or as junit tests. The plan is to, as much as possible, allow Fortress programs to run as if they were interpreted (without explicit compile, link, and run steps that a user can do wrong), and perform compiler-like operations "behind the curtain". This will change over time as the implementation acquires platform-specific and application-specific optimizations -- the caches will become more like the "Fortresses" in the original language plan, and will acquire operations more interesting than just get, put, and is-up-to-date. Explain in more detail

AST nodes

The Fortress AST nodes are automatically generated by the ASTGen tool from the AST node hierarchy description. Among other language constructs, Fortress expressions are represented as subclasses of the abstract class Expr. The Expr node hierarchy is particularly interesting because most of the Fortress type checking deals with the subclasses of Expr. Some of the nodes are created internally and some of the nodes are desugared away after a particular phase. Detailed information about the Expr node hierarchy is available.

Phase structure

Preparing to run

The main entry point of the interpreter is Shell. This is normally invoked from a shell script fortress ( fortress.bat) that chooses sensible defaults for classpaths and VM options.

The front-end uses repository-like caches to avoid the cost of reparsing, re-syntax-expanding, and re-checking source files over and over again. The caches themselves manage rebuilding, as needed. Once all components and APIs are discovered and up to date, the Shell invokes Driver to manage the various phases of the interpreter.

Directories full of Fortress programs can be compiled for testing purposes. Wrappers (see SystemJUTests or DemoTests) invoke FileTests to accomplish this. When FileTests runs a Fortress program, success is defined as the absence of Java exceptions from the interpreter, and the absence of "FAIL" in the program output, unless the name of the test begins with "XXX" in which case failure and success are swapped (that is, "XXXFoo.fss" is expected to fail in some way.). It is a convention within the ant script build.xml that any class whose name ends in JUTest or JUTests is run as a target of "ant test"; therefore the SystemJUTests are part of a normal test run, but the DemoTests are not.

Syntax Expansion

Syntax expansion can be applied to components, and depends on syntax definitions found in APIs. The syntax expansion package is called to obtain component ASTs. It performs a partial parse of the source to determine the imports, and then consults the imports (which are expected to contain checked and disambiguated AST) for any applicable syntax expansions. If there are any, then it generates a modified grammar, runs Rats, compiles the resulting parser, and then loads the new parsing classes for the full parse of the source code.

more here

Parsing

The interpreter uses a packrat parser generator called Rats! developed by Robert Grimm. Although Rats! provides AST utilities, we are not using those, and are instead using our own. All AST nodes include line+column begin and end locations, and we try hard to preserve those and use them appropriately when reporting errors.

We chose a packrat parser generator because it works better than all the others that we tried, which included LL(k) (JavaCC), LALR(1) (Javacup), hand-written recursive descent, and GLR(1) (Elkhound). Elkhound was capable of expressing the Fortress language, but sometimes was very slow.

The Fortress grammar implemented in Rats is very close to the Fortress grammar described in the Fortress language specification.

Checking and disambiguation

We are moving more and more semantics checks into a static pass, aiming for an organization more like what would be found in a conventional compiler. The static analysis also performs a first disambiguating rewrite of the input AST, determining which instances of a.b.c are API member references, and which are object member references.

At the moment, our current progress on the type-checker is described on the TypeCheckerProgress page.

The expanded, parsed, checked, disambiguated version of the AST is what is written into the cache.

AST rewriting

Before the interpreter can run, each component's AST is processed ("desugared") into a simpler form, the component's initial environment is populated with names, and those things that the component exports are copied into the environments of the exported APIs. Desugaring is important because it rewrites some AST nodes into simpler forms, and also makes name references across object scopes more explicit.

Initializing top-level environments

Top-level (component and API) environments are initialized in "four and a half" passes.

  1. The first pass populates the environment with bindings from names to uninitialized things (values, mutables, types, and functions). The entities bound to types and functions require further initialization in later passes; immutable cells are initialized with lazy thunks; mutable cells are initialized with empty references.
    The first-and-a-half-pass gathers names form imported APIs and injects them into the importing component's environment.
  2. The second pass initializes types.
  3. The third pass initializes functions, which refer to types.
  4. The fourth pass ensures that all mutable and immutable variables are initialized. Immutable cells' thunks are forced and their link is snapped; mutable cells have their initial values computed in order.

All components' environments run pass N before any environment runs pass N+1.

Once an environment is populated, no more names may be added to it (this is not yet enforced because of legacy bugs from an older design, but that is the plan), but the environment may be copied and extended. Environments are implemented with applicative trees, so that copying is a cheap constant-time operation, and environment extension is a log(number-of-names) operation. This data structure was chosen because (1) it supports scalable unsynchronized reading, and (2) when benchmarked on a uniprocessor (where locking is cheap) it was faster than an obvious implementation build from stacks of hash tables.

It would be possible to avoid this strict structuring into four passes if the various Fortress entities were initialized-on-demand, but we plan to eventually write a compiler, and think that this will help us understand the compiler better. We think it also helps with error reporting, but this might not really be so.

Evaluation

Driver looks up the specified main entry point in the initialized environment of the main component, and invokes it.

Run time entities

These are roughly divided into types and values. Values in turn comprise objects and functions. Functions and types may be generic, though generic types are not exactly types and generic functions are not exactly functions. This distinction is muddied somewhat by the lack of type inference, which forces us to sometimes treat generic functions as if they were values.

The "Fcn" hierarchy

Suggested change: rename PartiallyDefinedMethod to TraitMethod

Other values

Environment builders

Types

The types are somewhat in flux, as more and more of the "primitive" types are replaced by instances of traits, except where the constraints of practical implementation forbid it. For example, FTypeArray will eventually vanish because arrays are implemented in libraries.

TO DO

There's no discussion of the evaluator visitor at all.

There's no discussion of structure of phases as visitors over ASTs.

We need a description of each package and what's contained in it, as well as other important directories. Separate page

We should discuss the various Ant targets. Probably on a separate page

There's no discussion of parallel execution and how we do it.

There's no discussion of what libraries we rely on (dstm2, etc.)

We should mention FortressLibrary.fss and how it hooks in (as well as other libraries).

We could add some discussion of how to create your own visitor.

If there are abstract facilities in place for transforming ASTs, we should mention them.