Phased Logic Animations



Your browser must be equipped with a Macromedia Flash Player to view these animations (see the Macromedia Download Site). The step-forward and step-backward controls on the animations may not work in all browsers.

  1. Basic Token Flow. Token flow in a small netlist. The green net is a feedback wire that had to be added to ensure safety and liveness in the new netlist. The old netlist had 2 DFFs and 2 combinational gates.

  2. Loop Delay Averaging. An unbalanced clocked pipeline with a clock cycle equal to 4 gate delays (DFF delay + DFF setup = 1 gate delay). The goal in a clocked pipeline is to balance the delays of the pipeline stages, but that is not always possible. The same pipeline in Phased Logic: note that the average throughput of this pipeline is 3 gate delays per token, the same as if the clocked pipeline had been perfectly balanced. This illustrates the loop averaging aspect of self-timed loops, in that the average performance of a PL system can be better than the worst case. In PL systems, designers therefore do not have to worry as much about balancing path delays, because loop averaging can help overall performance.

  3. Token Buffering for Performance. This unbuffered two-stage pipeline averages 4 gate delays per token. Its performance is limited by inadequate buffering on the lower barrier gate, which fans out to two gates in the top pipeline stage. This problem is fixed in this buffered two-stage pipeline. Note that a gate was added on the output of the lower barrier gate -- this is just a buffer function and adds no new functionality. It is added only to improve throughput.

  4. Seven computation waves through a ripple structure with 1 input stage, 1 output stage. This illustrates bit-level dataflow in a PL system. PL systems only have to be word synchronized at the input/output boundaries -- within PL systems data flows at a bit or nybble level.

  5. Seven computation waves through a ripple structure with multiple input/output stages. Notice that input/output stages bracketing the ripple structure can improve throughput.

  6. Six computation waves through a netlist. The average throughput of this system is (6+2)/2 = 4 gate delays.

  7. Six computation waves through the same netlist with one extra gate added to the shortest delay path. The average throughput of this system is (4+2+2)/3 ≈ 2.7 gate delays. Bit-level data flow takes advantage of whatever parallelism is available.

  8. Control for a Coarse-Grained PL System. PL 'wrappers' can also be placed around large blocks of logic to form a coarse-grained system. This shows the control required for a coarse-grained system that implements a [1x4][4x4]=[1x4] matrix multiply. Each block of gates has two tokens inside it because each block actually forms two gates as far as the PL control is concerned. Because each block of gates has DFFs in it, the block is a 'barrier' gate in terms of PL control. There cannot be feedback signals between barrier gates, so the required splitter gates that separate the barrier gates are integrated into the PL 'wrapper' that is placed around the block of gates.

  9. Non-gated PL Operation versus Gated PL Operation. This illustrates recent work on halting token circulation within portions of a PL netlist to reduce power consumption and improve performance. Special interface gates must be placed on the boundaries of the gated portion of the netlist in order to halt token circulation. If the halted portion of the netlist is on the critical path, then this can improve system performance.
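
The average-throughput figures quoted in items 6 and 7 can be reproduced with a short calculation. The sketch below is only an illustration and assumes that each term in the quoted sums is the number of gate delays between successive tokens in one repeating pattern; the captions do not state this explicitly. The function name average_delay_per_token is hypothetical and is not part of any PL tool.

```python
from fractions import Fraction


def average_delay_per_token(intervals):
    """Return the mean spacing, in gate delays, between successive tokens.

    `intervals` lists the gate delays between tokens over one repeating
    pattern (an assumed reading of the sums quoted in items 6 and 7).
    """
    if not intervals:
        raise ValueError("need at least one interval")
    return Fraction(sum(intervals), len(intervals))


# Item 6: the quoted sum is (6+2)/2.
item6 = average_delay_per_token([6, 2])

# Item 7: the quoted sum is (4+2+2)/3, after one gate is added
# to the shortest delay path.
item7 = average_delay_per_token([4, 2, 2])

print(float(item6))            # 4.0
print(round(float(item7), 1))  # 2.7
```

Under this reading, the extra gate in item 7 shortens the longest interval from 6 to 4 gate delays, so the average falls even though the netlist has one more gate.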