In this series:
- The soft processor in the Perseus reference designs – Part 1: Benefits
- The soft processor in the Perseus reference designs – Part 2: Resource usage
- The soft processor in the Perseus reference designs – Part 4: No MicroBlaze at all
As discussed in the previous blog posts, processors (or soft processors in our case) bring a lot of advantages to FPGA designs, especially for handling sequential, less-demanding tasks. Unfortunately, including a soft processor is not free: its main drawback is its resource usage. In this blog post I discuss different trade-offs and optimizations that can be made without too much effort to reduce the FPGA logic consumed by the soft processor sub-system.
First, I look at the distribution of FPGA resources, using the Perseus reference design presented in my previous post as a reference. Knowing how the logic is distributed is useful when a specific type of FPGA resource needs to be reduced. Next, I look at different design-level and MicroBlaze-level optimizations.
Figure 1 shows the usage of each FPGA resource described in the previous blog post as a pie chart. The resources of interest are the block RAMs (BRAMs), the slices, and the DSP48s. Note that even though the MicroBlaze uses only 6 DSP48s, they account for 100% of the DSP48s consumed by the reference soft processor design. Another important observation is that most of the BRAMs and slices are consumed by the MicroBlaze and the Ethernet logic. So, to reduce the logic considerably, we must concentrate our efforts on these modules.
Figure 1: MicroBlaze sub-system resources distribution
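To reproduce this kind of breakdown for your own design, a short script can turn per-module utilization counts into percentage shares. The following is only a sketch: the module names and every count except the 6 DSP48s are hypothetical placeholders, not figures from the Perseus reference design, and should be replaced with the numbers from your own implementation reports.

```python
# Hypothetical per-module utilization. Only the 6 DSP48s for the MicroBlaze
# come from this post; all other numbers are placeholders.
usage = {
    "MicroBlaze": {"slices": 1200, "bram": 20, "dsp48": 6},
    "Ethernet":   {"slices": 1500, "bram": 10, "dsp48": 0},
    "Other":      {"slices": 800,  "bram": 4,  "dsp48": 0},
}


def shares(usage, resource):
    """Return each module's percentage of the total for one resource type."""
    total = sum(module[resource] for module in usage.values())
    if total == 0:
        return {name: 0.0 for name in usage}
    return {name: 100.0 * module[resource] / total
            for name, module in usage.items()}


if __name__ == "__main__":
    for resource in ("slices", "bram", "dsp48"):
        # Largest consumer first, so the modules worth optimizing stand out.
        ranked = sorted(shares(usage, resource).items(),
                        key=lambda item: -item[1])
        print(resource + ": " +
              ", ".join(f"{name} {pct:.1f}%" for name, pct in ranked))
```

Sorting each resource by share makes it easy to see which module to attack first for a given resource type.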
The Perseus BSDK reference design is an open Xilinx EDK project, meaning that users can use it as a starting point and modify it to better fit their requirements.
Design-level optimizations are modifications that change the architecture of the overall design (affecting more than one module). The following are a few optimizations that might be good trade-offs, depending on the user’s requirements.
- No Ethernet (only serial port or PCIe) – Ethernet connectivity is used for two main reasons: to exchange data, and to control and monitor the FPGA logic and surrounding devices. For applications already using the PCIe link, all Ethernet activities can be redirected through the PCIe link, assuming the hardware architecture allows it. This is fully supported by the BSDK itself. For applications where no data needs to be exchanged, control and monitoring could be done via the serial port. In this case, users would write their own MicroBlaze application to support the serial link.
- No OS (DDR3 and Flash controllers removed) – The current solution supports PetaLinux. For fully embedded, standalone applications, significant pieces of the design can be removed. A boot loader and Linux are stored in external Flash. Without Linux, the Flash controller might be removed from the design without side effects. The boot loader loads Linux into a dedicated SDRAM (DDR3). For small to medium program sizes, this external memory might not be required, as the FPGA's embedded BRAM may be sufficient.
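Whether a standalone program fits in BRAM can be estimated from its section sizes (for example, as reported by the toolchain's `size` utility). The sketch below is a rough sizing aid, not a definitive rule: the function names are hypothetical, the sizes in the usage example are made up, and the default of 4096 data bytes per BRAM is an assumption that depends on the FPGA family and on how the memory is configured.

```python
import math


def brams_needed(text, data, bss, stack_heap, bytes_per_bram=4096):
    """Estimate how many BRAMs a standalone program occupies.

    All sizes are in bytes. bytes_per_bram is an assumption; check the
    block RAM capacity of your FPGA family before relying on it.
    """
    total = text + data + bss + stack_heap
    return math.ceil(total / bytes_per_bram)


def fits_in_bram(text, data, bss, stack_heap, available_brams,
                 bytes_per_bram=4096):
    """Return True if the program fits in the BRAMs you can spare."""
    needed = brams_needed(text, data, bss, stack_heap, bytes_per_bram)
    return needed <= available_brams


if __name__ == "__main__":
    # Made-up section sizes, for illustration only.
    print(brams_needed(text=40000, data=2000, bss=3000, stack_heap=4096))
    print(fits_in_bram(40000, 2000, 3000, 4096, available_brams=16))
```

Local memory sizes are usually rounded up to a convenient size in practice, so treat the result as a lower bound.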
If no design-level optimizations can be performed, or if they do not free up enough resources, MicroBlaze-level optimizations may be a good option. The overall objective is to scale down the MicroBlaze as much as possible without falling below the required performance. The constraints might be, for example, an initialization time limit or a maximum delay to respond to a particular event. When we think about optimization, clock frequency often comes to mind first: reducing it decreases performance, but it also eases timing issues and lowers power consumption. Here, however, we are talking about changing the MicroBlaze architecture itself!
The Xilinx MicroBlaze soft core processor is highly configurable, letting you select a specific set of features required by your design.
The fixed feature set of the processor includes:
- Thirty-two 32-bit general purpose registers
- 32-bit instruction word with three operands and two addressing modes
- 32-bit address bus
- Single issue pipeline
The configurable parameters for performance and area trade-offs include:
- Arithmetic/logic unit (ALU) options (divider, barrel shifter, multiplier, pattern compare, etc.)
- Cache and cache line sizes of the instruction and data caches
- Floating point unit (FPU)
- Memory management unit (MMU)
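In an EDK project, these options are set as parameters on the MicroBlaze instance. As an illustration, the sketch below builds a stripped-down parameter set and prints it as `PARAMETER` lines. The parameter names are given as I recall them from the MicroBlaze documentation and the line format is an assumption about the project file syntax, so verify both against the reference guide for your tool version. The helper `to_parameter_lines` is hypothetical.

```python
# A stripped-down MicroBlaze configuration: every optional unit disabled.
# Parameter names are assumptions to be checked against the MicroBlaze
# reference guide for your tool version.
minimal_config = {
    "C_USE_BARREL": 0,      # barrel shifter
    "C_USE_DIV": 0,         # hardware divider
    "C_USE_HW_MUL": 0,      # hardware multiplier
    "C_USE_PCMP_INSTR": 0,  # pattern compare instructions
    "C_USE_FPU": 0,         # floating point unit
    "C_USE_MMU": 0,         # memory management unit
    "C_USE_ICACHE": 0,      # instruction cache
    "C_USE_DCACHE": 0,      # data cache
}


def to_parameter_lines(params):
    """Format a parameter dict as one 'PARAMETER name = value' line each."""
    return [f" PARAMETER {name} = {value}" for name, value in params.items()]


if __name__ == "__main__":
    print("\n".join(to_parameter_lines(minimal_config)))
```

Such a configuration only makes sense if the application can do without the disabled units, which is why knowledge of the application matters.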
Generally, playing with the MicroBlaze configurable parameters requires detailed knowledge of the application that it runs. There are different optimization approaches. One is to start from a basic MicroBlaze configuration. Then look to see if it handles the job. If it doesn’t, figure out why and iteratively modify the program or the MicroBlaze architecture to meet the requirements.
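The iterative approach can be sketched as a simple loop: start from the basic configuration, check it against the requirements, and enable one feature at a time until the check passes. This is a minimal sketch under stated assumptions: `find_smallest_config` is a hypothetical helper, the parameter names should be verified against the MicroBlaze documentation, and `meets_requirements` stands in for what is really a rebuild followed by a measurement on hardware.

```python
def find_smallest_config(base, upgrades, meets_requirements):
    """Enable upgrades in order until the requirement check passes.

    base: dict of starting parameters.
    upgrades: list of (label, parameter changes) pairs, cheapest first.
    meets_requirements: callable taking a config and returning True/False.
    Returns (config, applied labels), or (None, applied labels) on failure.
    """
    config = dict(base)
    applied = []
    if meets_requirements(config):
        return config, applied
    for label, change in upgrades:
        config.update(change)
        applied.append(label)
        if meets_requirements(config):
            return config, applied
    return None, applied


if __name__ == "__main__":
    base = {"C_USE_BARREL": 0, "C_USE_HW_MUL": 0, "C_USE_ICACHE": 0}
    upgrades = [
        ("barrel shifter", {"C_USE_BARREL": 1}),
        ("hardware multiplier", {"C_USE_HW_MUL": 1}),
        ("instruction cache", {"C_USE_ICACHE": 1}),
    ]

    # Stand-in check: pretend the deadline is met once the hardware
    # multiplier is present. A real check would build and time the design.
    def check(cfg):
        return cfg["C_USE_HW_MUL"] == 1

    config, applied = find_smallest_config(base, upgrades, check)
    print(applied)
```

Because features are only ever added, the result may include a feature that was not actually needed; a follow-up pass that tries removing each applied feature can trim it further.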
And what if the MicroBlaze is not worth its cost, despite all the benefits?
In some cases, a designer may want to target the smallest possible FPGA for a product and needs to reduce the logic surrounding the main core application as much as possible. The reasons are obvious: board real estate, price, and power consumption. In these particular cases, the effort to remove the sequential tasks performed by the soft core processor, or to move them elsewhere, might be worth it.
In the next blog post in this series, I present different solutions and observations that need to be taken into account when thinking about completely removing the soft processor from the Perseus reference design.