Phased Logic (PL) is a self-timed design methodology that automatically translates a clocked system, expressed as D flip-flops and combinational gates, into a self-timed netlist of PL gates. The only global net in the self-timed netlist is a reset signal. The PL netlist is a micropipelined system with two-phase control. Two implementation technologies are supported: fine-grain and coarse-grain.
The fine-grain approach maps each gate in the clocked system one-to-one onto a PL gate that uses a 4-input lookup table (LUT4) as its logic element, with delay-insensitive dual-rail routing between gates. This technology could form the basis of a self-timed FPGA. Because all routing between gates is delay-insensitive, no timing mechanism outside a PL gate can cause a timing failure.
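To make the fine-grain gate more concrete, here is a minimal Python sketch of one PL gate built around a LUT4. It assumes level-encoded dual-rail (LEDR) signaling, in which each signal uses a value wire and a timing wire and the token's phase is their XOR; the names `LEDR`, `encode`, `lut4`, and `PLGate` are hypothetical. The model shows only the two-phase firing rule (wait until every input carries the expected phase, compute, emit, flip phase) and omits any feedback or acknowledgement paths and all electrical timing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LEDR:
    """Assumed LEDR encoding: a data rail plus a timing rail."""
    value: int   # data rail (0 or 1)
    timing: int  # timing rail (0 or 1)

    @property
    def phase(self) -> int:
        # Token phase is value XOR timing: 0 = even, 1 = odd.
        return self.value ^ self.timing


def encode(value: int, phase: int) -> LEDR:
    """Build the LEDR pair that carries `value` in the given phase."""
    return LEDR(value, value ^ phase)


def lut4(mask: int, a: int, b: int, c: int, d: int) -> int:
    """4-input lookup table: bit i of the 16-bit mask is the output
    for input index i = a + 2b + 4c + 8d."""
    return (mask >> (a | (b << 1) | (c << 2) | (d << 3))) & 1


class PLGate:
    """Hypothetical PL gate model: fires only when all four inputs carry
    the phase it is waiting for, then emits one token and flips phase."""

    def __init__(self, mask: int, phase: int = 0):
        self.mask = mask
        self.phase = phase  # phase of the next input set the gate expects

    def try_fire(self, inputs: Sequence[LEDR]) -> Optional[LEDR]:
        if any(x.phase != self.phase for x in inputs):
            return None  # some input token has not arrived yet; keep waiting
        out = encode(lut4(self.mask, *(x.value for x in inputs)), self.phase)
        self.phase ^= 1  # two-phase control: next tokens use the other phase
        return out
```

For example, `PLGate(0x8000)` models a 4-input AND: it fires once all four inputs are in phase 0, then ignores those same tokens until a new set arrives in phase 1. Under LEDR, successive tokens on a wire differ in exactly one rail, which is what lets the receiving gate detect an arrival without any timing assumption about the route.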
The coarse-grain approach maps groups of gates in the clocked netlist to the combinational compute function of a PL block, with bundled-data signaling between blocks. The compute function of a coarse-grain PL block can be implemented with a traditional standard cell library, making the coarse-grain technology an ASIC approach to implementing PL systems. All timing concerns in a coarse-grain implementation are block-to-block; there are no global mechanisms that can cause a timing failure.
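The bundled-data requirement between two coarse-grain blocks is that the request (control) signal must not arrive before the data produced by the compute function has settled, so the matched delay on the control path must cover the worst-case data delay. The Python sketch below is hypothetical: `BlockLink`, `bundling_ok`, `violations`, the nanosecond delay fields, and the fractional `margin` parameter are assumptions for illustration, not part of any PL tool. Each link is checked in isolation, which is the sense in which coarse-grain timing concerns are block-to-block.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BlockLink:
    """One block-to-block connection in a hypothetical coarse-grain netlist."""
    name: str
    data_delay_ns: float     # worst-case delay through the compute function
    matched_delay_ns: float  # delay of the matched request (control) path


def bundling_ok(link: BlockLink, margin: float) -> bool:
    """Bundled-data check: the request must lag the data by at least the
    worst-case data delay scaled by (1 + margin); margin=0.2 means 20% slack."""
    return link.matched_delay_ns >= link.data_delay_ns * (1.0 + margin)


def violations(links: Sequence[BlockLink], margin: float) -> List[str]:
    """Names of links whose matched delay is too short. No global
    constraint is involved: every check looks at one link only."""
    return [link.name for link in links if not bundling_ok(link, margin)]
```

A larger `margin` makes the design more tolerant of delay variation at the cost of a slower control path, which is the trade-off behind using a generous margin on the matched control/data delays.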
Both fine-grain and coarse-grain approaches support a speedup mechanism called early evaluation, which can allow the PL system to outperform the clocked system. All micropipeline approaches suffer a performance penalty relative to clocked systems because the output latch latency of each micropipeline block lies on the critical path; early evaluation allows PL systems to overcome this penalty. Simulations of fine-grain and coarse-grain netlists of a MIPS-compatible five-stage pipelined CPU mapped to a LUT4 technology show a speedup of over 35% relative to the clocked system, with no timing-margin assumptions in the coarse-grain version. The same CPU mapped through the coarse-grain approach to a commercial standard cell library in a 0.13 µm process shows about a 10% speedup, using a generous timing margin for the matched control/data delays. A fine-grain netlist of the Sun Microsystems picoJava-II Floating Point Unit (FPU) shows a 20% speedup over the clocked implementation. Design efforts ongoing as of Fall 2003 include a coarse-grain version of the picoJava-II FPU and coarse-grain and fine-grain mappings of the LEON2 CPU (a SPARC-compatible CPU).
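The core idea of early evaluation is that a gate's output is sometimes already determined by the inputs that have arrived, so it need not wait for the slowest one. The Python sketch below is a simplified model under that assumption only: `lut4` and `early_value` are hypothetical names, and the sketch merely checks whether a subset of known inputs fixes a LUT4 output. How an actual PL gate selects its early-evaluation inputs and circuits is not modeled here.

```python
from itertools import product
from typing import Dict, Optional, Sequence


def lut4(mask: int, bits: Sequence[int]) -> int:
    """Evaluate a 4-input LUT; bit i of the 16-bit mask is the output
    for input index i = b0 + 2*b1 + 4*b2 + 8*b3."""
    index = bits[0] | (bits[1] << 1) | (bits[2] << 2) | (bits[3] << 3)
    return (mask >> index) & 1


def early_value(mask: int, known: Dict[int, int]) -> Optional[int]:
    """Return the LUT4 output if the inputs that have already arrived
    (`known` maps input position 0-3 to its bit) determine it for every
    value of the missing inputs; otherwise return None (keep waiting)."""
    missing = [i for i in range(4) if i not in known]
    seen = set()
    # Try every assignment of the not-yet-arrived inputs.
    for combo in product((0, 1), repeat=len(missing)):
        bits = [0, 0, 0, 0]
        for i, v in known.items():
            bits[i] = v
        for i, v in zip(missing, combo):
            bits[i] = v
        seen.add(lut4(mask, bits))
        if len(seen) > 1:
            return None  # output still depends on a missing input
    return seen.pop()
```

For a 4-input AND (mask `0x8000`), a single arrived 0 already fixes the output at 0, so the gate could fire without waiting for the other three inputs, whereas an arrived 1 fixes nothing. A real gate still has to consume the late-arriving tokens before it can fire again; that bookkeeping is not modeled here.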
The PL approach does not require designers to learn a new design methodology, because traditional synthesis tool flows for clocked systems can be used to create the netlist that is the starting point for the clocked-to-clockless transformation.