Compiler Design Notes
Long-term plan is a series of compilers, as we learn more about Fortress and as potential targets evolve. Here, we describe short/medium-term plans to build the first Fortress compiler.
Assumptions
We may not be able to get rid of the interpreter completely.
We may not be able to compile 100% of the code ahead of time.
Plan to generate Java, that we then compile to bytecodes. Don't depend on fancy bytecode/classloader hacks until we see a clear need (source code is easier to read, debug, and instrument). Nope, we're using (and will continue to use) ASM.
In all interpreter-compiler interactions, we make the choice that maximizes the performance of compiled code. We may need to generate extra, slower, code for the interpreter's general-purpose view of the world.
The advantages of this approach are that we will not take a backwards step in language support, and we will have an ample testing framework for the compiler as it is developed. It also reduces the risk of being unable to implement the entire language if the interpreter is necessary.
The disadvantage of this approach is that we will pay an "interpreter tax" in our work.
Preliminary work
There's a portion of the "interpreter" that is really a static analysis pass. This needs to be made into a true static analysis pass (this will improve startup time, too, even for the interpreter). There may be some entanglements into the later parts of the interpreter; that will become meta-data, saved to disk. (done; information is cached, recomputed only if necessary, using "modern" visitors).
That static analysis pass in the interpreter can "see" the difference between top level references, local references, and member references, but it throws that information away. We need to encode that information in rewriting into subtypes of VarRef so that the compiler will know where to look (the interpreter's environments are unstructured).
All references to BetterEnv should be replaced with references to an appropriate interface. (done)
Augment the interface so that it allows expression of access-intent -- get from the top level, get from the Nth-deep scope, that sort of thing. The interface will probably need to include factory methods for creating nested environments of various types.
Clean up environment management within Objects and ObjectExpressions (right now, Objects have an excess environment that they do not need). IN PROGRESS -- ObjectExpressions are being closure-converted to the top level
There are some AdditionalInterpreterEnvironmentIssues that need to be resolved.
Front end / phase structure
(Historical and debugging notes) The driver for the Fortress compiler is complicated, mostly because of changing requirements. One goal has been to make it possible to run Fortress programs as if it were a scripting language -- that is, to do demand-driven parsing, analysis, and code generation. However, it complicates things to expose this demand-driven behavior to all the analyses, so the front-end handles it by doing a closure over imports. Incorporating syntax extensions into this made everything much more delicate.
Fortress-the-language includes the notion of repositories, which are intended to formalize linking and make it possible to work with multiple versions of components. These are not yet implemented, but it is a goal. The current system includes a step in that direction, but it is not yet done.
Interface to native code
See NativeInterfaceCodeGeneration. This has turned into a higher-priority task because
- it has relatively few prerequisites
- it will quickly increase the usefulness of Fortress
- we'll figure things out that can be used in other code generation steps.
Code-generate environments
For various environments (top-level, trait, object, lexical) create a customized struct-like class whose members are the names defined in the environment. The class will implement the environment interface, so that it can be a drop-in replacement for a populated environment. Notice that object "constructors" will need to reflectively invoke the Java class constructor for the corresponding environment object.
Implementation of the environment interface is currently accomplish (for the top-level environment) using a code-generated binary search on the sorted hashCodes of the string names of the fields. For objects, the environment needs to be slightly extended; because Java lacks the notion of an "environment pointer", the environment interface needs additional methods to permit access of the the form "call the method named 'foo', using this parameter list".
Traits
For each generic-erased trait, there may be:
- a corresponding Java interface (this can be used to simplify dispatch and type tests)
- a corresponding Java class and Java object (in which information relevant to the trait can be stored)
- (for generics) a set of Java objects, one for each instantiation of the generic.
Code-generate objects
Objects are almost "just environments", perhaps this will not be too hard. Objects will implement Environment (for interpretive name lookup, should it occur) and will also implement FObject (which needs to be converted into an interface, instead of an abstract class).
The tricky cases are object expressions, generic objects, singleton objects, and singleton generic objects. Object expressions will be dealt with via closure conversion.
Code-generate functions
This has to work for both functions and for generic functions; too much of our library is generic. The best representation for a top-level, first-order function should be a Java static method. A closure (and hence, a function value) has to be an object. Any such closure will have interpreter-path and compiler-path apply methods, where the compiler path may incorporate unboxing optimizations.
Dispatch
There are several ways to do dispatch:
- current: the interpreter takes care of it, using matching and type comparison, finally choosing a closure and sticking it in a cache.
- If-tree encoded: A tree of instanceof, generic parameter recovery.
- Java-dispatch-encoded: This is too small to describe precisely here, but in much the same way that the multiple dispatch of the visitor pattern can be encoded into a series of single dispatches, Fortress's multiple dispatch can be encoded into a series of single dispatches. Initially, this presents a "slow path" to the compiled code, since it will use some combination of boxed values and type representations to perform the dispatch.
- Lookup-and-call: This uses a series of type representations to lookup the appropriate closure to call, and the closure is returned. Optionally, the closure is cast to a type that takes unboxed representations in its apply method. The caller then invokes the closure. This is perhaps more friendly to optimization (hoisting of invariant dispatch decisions, for example) but in the short run may not perform as well as leveraging Java's dispatch.
