= Compiler Design Notes = Long-term plan is a series of compilers, as we learn more about Fortress and as potential targets evolve. Here, we describe short/medium-term plans to build the first Fortress compiler. == Assumptions == ~~We may not be able to get rid of the interpreter completely.~~ The interpreter does not interoperate with compiled code. We may not be able to compile 100% of the code ahead of time. This has proven to be the case; for closures, and various arrow types, we generate code on the fly. We're using (and will continue to use) [http://asm.objectweb.org/index.html ASM]. == Preliminary work == ~~There's a portion of the "interpreter" that is really a static analysis pass. This needs to be made into a true static analysis pass (this will improve startup time, too, even for the interpreter). There may be some entanglements into the later parts of the interpreter; that will become meta-data, saved to disk.~~ (done; information is cached, recomputed only if necessary, using "modern" visitors). == Front end / phase structure == (Historical and debugging notes) The driver for the Fortress compiler is complicated, mostly because of changing requirements. One goal has been to make it possible to run Fortress programs as if it were a scripting language -- that is, to do demand-driven parsing, analysis, and code generation. However, it complicates things to expose this demand-driven behavior to all the analyses, so the front-end handles it by doing a closure over imports. Incorporating syntax extensions into this made everything much more delicate. Fortress-the-language includes the notion of repositories, which are intended to formalize linking and make it possible to work with multiple versions of components. These are not yet implemented, but it is a goal. The current system includes a step in that direction, but it is not yet done. FortressRepositoryInternals == Interface to native code == See NativeInterfaceCodeGeneration. This has turned into a higher-priority task because * it has relatively few prerequisites * it will quickly increase the usefulness of Fortress * we'll figure things out that can be used in other code generation steps. == Traits == For each generic-erased trait, there is: * a corresponding Java interface (this can be used to simplify dispatch and type tests) * a corresponding Java class and Java object (in which information relevant to the trait can be stored) * (for generics) a set of Java objects, one for each instantiation of the generic. == Code-generate objects == Objects are almost "just environments", perhaps this will not be too hard. Objects will implement Environment (for interpretive name lookup, should it occur) and will also implement FObject (which needs to be converted into an interface, instead of an abstract class). The tricky cases are ~~object expressions~~, generic objects, singleton objects, and singleton generic objects. Object expressions will be dealt with via closure conversion. == Code-generate functions == This has to work for both functions and for generic functions; too much of our library is generic. The best representation for a top-level, first-order function should be a Java static method. A closure (and hence, a function value) has to be an object. Any such closure will have an apply method. More details: closures of top-level functions are generated on-demand by reference to an encoded class name. The encoding is ''API'' "." mangle(''name'' "✉" ''type'') where ''API'' is the api name, ''name'' is the top-level function or opr name, and ''type'' is the fortress type of the closure (an arrow type, Arrow⟦''domain'',''range''⟧). Ond-demand generation is used to cut down on the creation of unused classes (each closure results in a class). For generic functions, each instantiated top-level generic function will result in a class, so it might as well do double-duty as a closure. ''Question: is it really API.whatever, or is it API$whatever (i.e., subpackage, or inner class)?'' The proposed encoding of a generic function is a class named ''API'' "." mangle("⚙" ''name'' ''parameters'' "✉" ''type'') where ''parameters'' is a sequence of tokens representing whether the parameter is a type (T) , opr (O) , nat (N) , int (I) , or bool (B) . An instantiated reference to a generic function replaces the parameter tokens with their instantiated values, and also rewrites the closure type as necessary. That class will have an instance method "apply" (because it is a closure) and a static method "staticApply" containing the body of the top-level function. Within the body of the generic, reference to static parameters will use the token and parameter index, for example ... == Dispatch == There are several ways to do dispatch: * current: the interpreter takes care of it, using matching and type comparison, finally choosing a closure and sticking it in a cache. * If-tree encoded: A tree of instanceof, generic parameter recovery. This is slightly complicated by Arrow and Tuple types; because of contravariance, there is no compile-time bound on the number/depth of supertypes of an Arrow type, but encoding into Java types requires exactly such a bound. Tuple types have a bound, but the number of (transitive) super types is potentially exponential in number of members of the tuple. For Arrows, the test is done by parts, using contravariance to test; for Tuples, the test is done iteratively by elements. * Java-dispatch-encoded: This is too small to describe precisely here, but in much the same way that the multiple dispatch of the visitor pattern can be encoded into a series of single dispatches, Fortress's multiple dispatch can be encoded into a series of single dispatches. Initially, this presents a "slow path" to the compiled code, since it will use some combination of boxed values and type representations to perform the dispatch. * Lookup-and-call: This uses a series of type representations to lookup the appropriate closure to call, and the closure is returned. Optionally, the closure is cast to a type that takes unboxed representations in its apply method. The caller then invokes the closure. This is perhaps more friendly to optimization (hoisting of invariant dispatch decisions, for example) but in the short run may not perform as well as leveraging Java's dispatch. The current implementation uses an if-tree encoding, and some of this will always be necessary for those cases where Java instanceof (which has a dispatch equivalent) is not the correct test. APIs present several interesting problems for overloading: * because an overloaded function may have the same signature as one of the functions that it overloads, there has to be a way to mangle one of the names out of the way; * because an overloaded implementation is not necessarily evident at the API level (the type of the overloaded function could be a simple Fortress type), (exported) overloaded functions within components; * overload dispatch can occur twice; once within the component, and again within the API (and if the chosen overloaded function happens to be a functional method, one more time after that). Flattening it into a single dispatch is not trivial; because component-level overloadings behind the API may not be visible, it is possible that a naively combined overloading would not satisfy the meet rule; that is, there would be ambiguities, and dispatch order would matter. ASIF dispatching is an interesting special case. It separates into several simpler cases: * ASIF of named function -- compile this to a reference to an encoded name (as we do for closures) that will trigger subsequent code generation. One way to encode the necessary information for the tailored dispatch, is simply to use the input bytecodes, assuming that we can recognize a reference to an overloaded function. * ASIF of method -- these are named, use a similar trick? * ASIF of function value -- these are not named; therefore, anytime a closure is generated, if it is a closure for an overloaded function (how do we know this at runtime?) it must include a value/type pairs entry point. == Unboxing ==