Compiler Design Notes

Long-term plan is a series of compilers, as we learn more about Fortress and as potential targets evolve. Here, we describe short/medium-term plans to build the first Fortress compiler.

Assumptions

We may not be able to get rid of the interpreter completely.

We may not be able to compile 100% of the code ahead of time.

Plan to generate Java, that we then compile to bytecodes. Don't depend on fancy bytecode/classloader hacks until we see a clear need (source code is easier to read, debug, and instrument).

In all interpreter-compiler interactions, we make the choice that maximizes the performance of compiled code. We may need to generate extra, slower, code for the interpreter's general-purpose view of the world.

The advantages of this approach are that we will not take a backwards step in language support, and we will have an ample testing framework for the compiler as it is developed. It also reduces the risk of being unable to implement the entire language if the interpreter is necessary.

The disadvantage of this approach is that we will pay an "interpreter tax" in our work.

Preliminary work

There's a portion of the "interpreter" that is really a static analysis pass. This needs to be made into a true static analysis pass (this will improve startup time, too, even for the interpreter). There may be some entanglements into the later parts of the interpreter; that will become meta-data, saved to disk.

That static analysis pass in the interpreter can "see" the difference between top level references, local references, and member references, but it throws that information away. We need to encode that information in rewriting into subtypes of VarRef so that the compiler will know where to look (the interpreter's environments are unstructured).

All references to BetterEnv should be replaced with references to an appropriate interface. (done)

Augment the interface so that it allows expression of access-intent -- get from the top level, get from the Nth-deep scope, that sort of thing. The interface will probably need to include factory methods for creating nested environments of various types.

Clean up environment management within Objects and ObjectExpressions (right now, Objects have an excess environment that they do not need).

There are some AdditionalInterpreterEnvironmentIssues that need to be resolved.

Code-generate environments

For various environments (top-level, trait, object, lexical) create a customized class whose members are the names defined in the environment. The class will implement the environment interface, so that it can be a drop-in replacement for a populated environment. Notice that object "constructors" will need to reflectively invoke the Java class constructor for the corresponding environment object.

Traits

For each generic-erased trait, there may be:

  • a corresponding Java interface (this can be used to simplify dispatch and type tests)
  • a corresponding Java class and Java object (in which information relevant to the trait can be stored)
  • (for generics) a set of Java objects, one for each instantiation of the generic.

Code-generate objects

Objects are almost "just environments", perhaps this will not be too hard. The tricky cases are object expressions, generic objects, singleton objects, and singleton generic objects.

Code-generate functions

This has to work for both functions and for generic functions; too much of our library is generic. The best representation for a top-level, first-order function should be a Java static method. A closure (and hence, a function value) has to be an object. Any such closure will have interpreter-path and compiler-path apply methods, where the compiler path may incorporate unboxing optimizations.

Dispatch

There are several ways to do dispatch:

  • current: the interpreter takes care of it, using matching and type comparison, finally choosing a closure and sticking it in a cache.
  • If-tree encoded: A tree of instanceof, generic parameter recovery.
  • Java-dispatch-encoded: This is too small to describe precisely here, but in much the same way that the multiple dispatch of the visitor pattern can be encoded into a series of single dispatches, Fortress's multiple dispatch can be encoded into a series of single dispatches. Initially, this presents a "slow path" to the compiled code, since it will use some combination of boxed values and type representations to perform the dispatch.
  • Lookup-and-call: This uses a series of type representations to lookup the appropriate closure to call, and the closure is returned. Optionally, the closure is cast to a type that takes unboxed representations in its apply method. The caller then invokes the closure. This is perhaps more friendly to optimization (hoisting of invariant dispatch decisions, for example) but in the short run may not perform as well as leveraging Java's dispatch.

Unboxing