Phased Logic Animations
Your browser must have the Macromedia Flash Player installed to
view these animations (see the Macromedia Download Site). The
step-forward and step-backward controls on the animations may not
work in all browsers.
- Basic Token Flow. Token flow in a small netlist. The
green net is a feedback wire that had to be added to ensure
safety and liveness in the new netlist. The old netlist had 2
DFFs and 2 combinational gates.
- Loop Delay Averaging. An unbalanced clocked pipeline
with a clock cycle equal to 4 gate delays (DFF delay + DFF setup
= 1 gate delay). The goal in a clocked pipeline is to balance the
delays of the pipeline stages, but that is not always possible.
The same pipeline in Phased Logic: note that the average
throughput of this pipeline is 3 gate delays per token, the same
as if the clocked pipeline had been perfectly balanced. This
illustrates the loop averaging aspect of self-timed loops: the
average performance of a PL system can be better than the worst
case. As a result, designers of PL systems do not have to worry
as much about balancing path delays, because loop averaging can
help overall performance.
- Token Buffering for Performance.
This unbuffered two-stage pipeline
averages 4 gate delays per token. Its performance is limited by
inadequate buffering on the lower barrier gate, which fans out
to two gates in the top pipeline stage. This problem is fixed in
this buffered two-stage pipeline. Note that a gate was added on
the output of the lower barrier gate. It is just a buffer: it
adds no new functionality and is there only to improve
throughput.
- Seven computation waves through a ripple
structure with 1 input stage, 1 output stage. This
illustrates bit-level dataflow in a PL system. PL systems only
have to be word-synchronized at the input/output boundaries;
within a PL system, data flows at the bit or nybble level.
- Seven computation waves through a ripple
structure with multiple input/output stages. Notice that
input/output stages bracketing the ripple structure can improve
throughput.
- Six computation waves through a
netlist.
The average throughput of this
system is (6+2)/2 = 4 gate delays.
- Six computation waves through the same netlist
with one extra gate added to the shortest delay path.
The average throughput of this system is (4+2+2)/3 ≈ 2.7 gate
delays. Bit-level data flow takes advantage of whatever
parallelism is available.
- Control for a Coarse-Grained PL System.
PL 'wrappers' can also be placed around large blocks of
logic to form a coarse-grained system. This shows the control
required for a coarse-grained system that implements a
[1x4][4x4]=[1x4] matrix multiply. Each block of gates has two
tokens inside of it because each block actually forms two gates
as far as the PL control is concerned. Because each block of gates
has DFFs in it, the block is a 'barrier' gate in terms of
PL control. There cannot be feedback signals between barrier gates, so
the required splitter gates that separate the barrier gates are
integrated into the PL 'wrapper' that is placed around the block
of gates.
- Non-gated PL Operation versus
Gated PL Operation. This
illustrates recent work on halting token circulation within
portions of a PL netlist to reduce power consumption and improve
performance. Special interface gates must be placed on the
boundaries of the gated portion of the netlist in order to halt
token circulation. If the halted portion of the netlist is on
the critical path, then this can improve system performance.
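
The throughput figures quoted for the two six-wave netlists above
can be reproduced with a few lines of Python. This is only a sketch
of the arithmetic, not part of the animations. It assumes that the
numbers in each numerator (6 and 2, then 4, 2 and 2) are the
gate-delay intervals between successive tokens over one repeating
pattern; the function name average_delay_per_token and the variable
names are hypothetical.

```python
from fractions import Fraction


def average_delay_per_token(intervals):
    """Return the mean of the gate-delay intervals between tokens.

    `intervals` is assumed to hold one repeating pattern of
    token-to-token delays, measured in gate delays.
    """
    return Fraction(sum(intervals), len(intervals))


# Intervals taken from the two six-wave examples in the list above.
original = [6, 2]            # (6+2)/2
with_extra_gate = [4, 2, 2]  # (4+2+2)/3

for name, pattern in [("original", original),
                      ("extra gate", with_extra_gate)]:
    avg = average_delay_per_token(pattern)
    # The worst-case interval is shown only for comparison with the average.
    print(f"{name}: average {float(avg):.1f}, worst case {max(pattern)}")
```

Under that assumption the averages come out at 4.0 and about 2.7
gate delays, matching the figures in the list, while the worst-case
intervals are 6 and 4 gate delays. The gap between the two is the
same averaging effect described under Loop Delay Averaging.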