Implicit Parallelism in Fortress
In Fortress, (potential) parallelism is the default. Expressions within a tuple, or multiple operands to an operator, may be evaluated in parallel. Generators for data structures, and hence, loops, reductions, and summations, may be evaluated in parallel.
This parallelism is potential, not required; if only one processor is available, then evaluation will be serial. This potential parallelism is managed using work-stealing, which was developed for the Cilk programming language at MIT. Informally, whenever a thread encounters potential parallelism, it splits it roughly in half, sets one half aside on a stack/queue of future work, and starts work on the other half. Any thread that runs out of work to do may "steal" work that was set aside by another. The crucial trick is that work is stolen from the LIFO (queue) end of the work/stealing queue; this both reduces the need for synchronization betweeen threads, and ensures that the largest, oldest, and stalest tasks are what is stolen. This same trick also tends to be beneficial for memory locality. Arora, Blumofe, and Plaxton wrote the seminal paper describing the work-stealing queue and its use. As part of our work, we've extended this data structure (pdf) to have no particular bound on its size.
