). This model was designed using System Generator 14.7 and MATLAB R2012b.
Figure 1: System Generator reference example
This model essentially measures the power of a complex input signal over 1000 samples. The implemented equation is provided here for reference (with N = 1000):
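Based on the description of the accumulation stage below, the measurement can be written as follows. This is a reconstruction under stated assumptions: $x[n] = I[n] + jQ[n]$ is taken to be the complex sample at the output of the down-sampling filter, and the symbol names are illustrative rather than taken from the original figure.

$$
P = \frac{1}{N} \sum_{n=0}^{N-1} \left| x[n] \right|^2 = \frac{1}{N} \sum_{n=0}^{N-1} \left( I[n]^2 + Q[n]^2 \right), \qquad N = 1000
$$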
The first processing stage is a down-sampling by 2 filter. It limits the signal to the wanted bandwidth and then down-samples the filtered signal by two.
The second processing stage is the power measurement itself. The accumulator is first reset to zero at the start of each valid signal (rising edge). Then, for a period of time determined by the valid signal, the squares of the samples are accumulated. Finally, the result is divided by 1000 (multiplied by 1/1000) to provide the power measurement. A valid output signal is generated when the power has been calculated.
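The accumulate-and-scale behaviour described above can be sketched in software. This is only a behavioural model in Python, not the System Generator design: the function name `measure_power` is hypothetical, the filter stage is left out, and it assumes the result is emitted when the valid signal falls (or when the input ends).

```python
def measure_power(samples, valid, n=1000):
    """Behavioural sketch of the power measurement stage.

    samples: sequence of complex values (after the down-sampling filter)
    valid:   sequence of booleans, one per sample
    Returns one power value per completed valid window.
    Assumption: the output is produced when valid falls or the input ends.
    """
    results = []
    acc = 0.0
    prev_valid = False
    for x, v in zip(samples, valid):
        if v and not prev_valid:
            acc = 0.0  # rising edge of valid: reset the accumulator
        if v:
            acc += x.real ** 2 + x.imag ** 2  # accumulate |x|^2
        if prev_valid and not v:
            results.append(acc * (1.0 / n))  # scale by 1/N instead of dividing
        prev_valid = v
    if prev_valid:
        # Input ended while valid was still high: emit the last window.
        results.append(acc * (1.0 / n))
    return results
```

Feeding it 1000 samples of `1 + 1j` with the valid signal high gives a power close to 2, as expected for a signal of constant squared magnitude 2.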
Our model highlights an important aspect of System Generator (more specifically, of MATLAB Simulink): the sample time. For our convenience, let’s define the sample time of a block as the period at which the block runs. Please refer to  for more information on the sample time. In our example, the logic following the down-sampling by 2 (in green) runs two times slower than the logic before the down-sampling (in red).
We previously stated that the timing constraints come from the fact that the propagation delay must be smaller than the clock period. This is true, unless you tell the synthesis tool otherwise. When a block runs at a longer sample time, its constraints are replaced by multi-cycle path constraints. In our example, a multi-cycle path constraint of two cycles will be generated for the green logic. This is important to note: logic with a longer sample time has more time to complete and can therefore meet the timing requirements more easily.
Identify the failing paths
The first step to resolve timing issues is to identify the failing paths. Figure 2 presents a piece of the timing summary for the first failing path (i.e. the path with the longest delay).
Figure 2: Timing constraint first failing path
Some observations can be made from this report:
- The failing requirement is 10 ns (2*5 ns), so the multi-cycle path constraint of two is correctly applied.
- The propagation delay (data path delay) is 16.841 ns and is composed of 22 logic elements.
- The failing path is identified from its source to its destination. It is from the FIR output to the register labelled Delay2.
- The clock path skew and the clock uncertainty are negligible.
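The arithmetic behind these observations can be sketched in a few lines of Python. The function name `timing_summary` is hypothetical, and the register estimate is an idealized lower bound: it assumes the path can be split into equal pieces and ignores register setup time, clock skew and clock uncertainty.

```python
import math

def timing_summary(clock_period_ns, multicycle_factor, data_path_delay_ns):
    """Return (requirement, slack, min_registers) for a single path.

    requirement   = clock period times the multi-cycle factor
    slack         = requirement minus data path delay (negative means failing)
    min_registers = idealized lower bound on registers needed to cut the path
    """
    requirement = clock_period_ns * multicycle_factor
    slack = requirement - data_path_delay_ns
    # Each segment between registers must fit within the requirement.
    segments = math.ceil(data_path_delay_ns / requirement)
    min_registers = max(0, segments - 1)
    return requirement, slack, min_registers

requirement, slack, min_registers = timing_summary(5.0, 2, 16.841)
```

With the figures from the report (5 ns clock, factor of two, 16.841 ns delay), this gives a 10 ns requirement, a slack of about -6.841 ns, and a lower bound of one register. In practice more registers are usually needed, because the logic rarely splits evenly.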
Add registers to cut the critical paths
Most of the timing issues can be resolved with a simple trick: adding registers (delay blocks) to cut the critical paths (longest paths). This is simple and sufficient most of the time. In the previous blog post, we described critical paths in terms of their basic elements. Each logic block brings its own delay and there is an additional delay to reach the next logic block. We have 22 successive logic elements in our example! By adding registers we can significantly reduce this number. It is also a good design practice to register all the input and output signals of each module. This limits potentially long processing chains when connecting different modules together.
Figure 3 shows the new design with the delay insertions. Note that additional delays must also be added on the non-critical paths so that all paths keep the same latency. You can observe the matching delays on the valid signal.
Figure 3: Model with inserted delays
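The need for matching delays on the valid signal can be illustrated with a small Python sketch. The helper names `delay` and `gated` are hypothetical, and the two-cycle delay is an arbitrary value chosen for illustration, not the one used in the model.

```python
def delay(signal, n, fill=0):
    """Model an n-cycle register delay: the output lags the input by n samples."""
    signal = list(signal)
    return ([fill] * n + signal)[:len(signal)]

def gated(data, valid):
    """Return the data samples seen while valid is high."""
    return [d for d, v in zip(data, valid) if v]

data = [3, 1, 4, 1, 5, 9, 2]
valid = [0, 1, 1, 1, 0, 0, 0]

reference = gated(data, valid)            # samples selected in the original design
data_d = delay(data, 2)                   # two registers inserted on the data path
unmatched = gated(data_d, valid)          # valid not delayed: wrong samples selected
matched = gated(data_d, delay(valid, 2))  # valid delayed by the same two cycles
```

Here `matched` equals `reference`, while `unmatched` picks up samples from the wrong cycles. This is the misalignment that the extra delays on the non-critical paths prevent.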
After rebuilding our model with the new design, we now get the following timing errors:
Figure 4: Timing errors
Most of the previously failing paths have been resolved, but some remain. The longest of them goes from Delay7 to Delay10 through the multiplications.
Set Xilinx block optimization options
Adding registers to cut the critical paths works the same way for most of the Xilinx blocks. While most of the Xilinx blocks have at least a delay parameter, some have more sophisticated options for timing. For example, you can choose either speed or area for the multiplication optimization parameter. The resolution (bit width) of the operations also plays a significant role as it directly affects the logic (and often the required number of consecutive logic blocks).
In our example, we added a delay of two within the multiplications and a delay of one for the constant multiplication. Now, we no longer get timing issues.
Figure 5: Model with delays within the multiplications
Resolving timing issues summary
You can resolve most System Generator timing issues by following two simple rules:
- Add registers at the inputs and outputs of all modules and wherever the data path chains too many consecutive logic elements.
- Set the Xilinx block timing options.
Unfortunately, sometimes these tricks are not enough. In our next blog post, we’ll look at some more advanced tricks.