In this post we present ‘Accelerating Massive MIMO Wireless Waveform Deployment Using GNU Radio’. The content of this blog is taken directly from our paper which you can download here.

Hi, you’re here with Tristan, field application engineer with Nutaq. Today I’m doing a short video to introduce Nutaq’s Massive MIMO Waveform Development Radio platform.

As you can see from the figure on the screen, each MicroTCA Chassis has 10 Perseus601x FPGA boards. Those FPGA boards are on the AMC form factor. Each Perseus has a Radio420M FMC card with two radio transmitters, and two radio receivers.

One chassis does 20×20. The MCH has a gigabit ethernet switch and a PCI express switch for data exchange between FPGA boards, but also for interfacing with an external data switch that leads to the processing node.

The processing node has a quad core i7 embedded PC that I’m using today to run my GNU Radio model.

What you have on the screen is the GNU Radio Companion environment that I’ve used to build my Massive MIMO application for the Nutaq platform. We’re in a 20×20 Massive MIMO system, but the Nutaq architecture is entirely scalable up to 100×100 channels or beyond by combining 20-channel racks together over gigabit ethernet or PCI express on copper or fiber optics.
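
The scaling arithmetic behind those numbers can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the constants come from the figures quoted in the video (two channels per Perseus601x, ten boards per chassis), while the function name `chassis_needed` is made up for this example and is not part of any Nutaq tool.

```python
import math

CHANNELS_PER_BOARD = 2    # one Radio420M per Perseus601x: 2 Tx and 2 Rx
BOARDS_PER_CHASSIS = 10   # ten AMC FPGA boards per MicroTCA chassis


def chassis_needed(target_channels: int) -> int:
    """Number of 20-channel chassis needed for an N x N system (hypothetical helper)."""
    per_chassis = CHANNELS_PER_BOARD * BOARDS_PER_CHASSIS
    return math.ceil(target_channels / per_chassis)


print(chassis_needed(20))    # one chassis covers 20x20
print(chassis_needed(100))   # 100x100 needs five chassis
```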

On the left-hand side, I have a “Carrier Perseus” block for each FPGA in the system. Each of the 10 blocks contains its FPGA IP address. Unfortunately, the system I am using today only has eight radio transceivers populated instead of 20. For that reason, I have only instantiated eight Radio420 Rx blocks and eight Radio420 Tx blocks.

The image on the screen shows the physical Perseus601x AMC FPGA board along with the Radio420M. Stacking two cards together on a single FMC side enables 2×2 MIMO on a single FPGA, hence increasing the channel density for the overall system.

Accordingly, I have disabled six of the 10 FPGA boards in the system. Each Radio420 block allows me to configure each of the radio parameters, such as the carrier frequency, the IQ ADCs and DACs sampling rate, the gains of amplifiers in the receiver chain, the low pass, anti-aliasing filters and the band pass pre-selection RF filters.

In this particular application I am going to use two signal sources from the GNU Radio default library to generate two tones sent to the FPGA using the Nutaq “RTDEx Sync” blocks. The block diagram shown on the screen summarizes the different functionalities of the system. RTDEx is a low latency, high data rate streaming engine between the PC and FPGAs in the system. It can achieve up to 750 MBps. The “RAM Record” core allows using the onboard DDR3 RAM for recording up to 4 GB of baseband IQ samples. Its interface maximum throughput is 5.7 GB per second. The “RAM Playback” core can be used with the same DDR3 for the transmission of pre-generated test vectors. Time stamping as well as triggering capabilities are provided. The remainder of the FPGA user logic, which represents more than 80% of the available resources, can be programmed to do whatever signal processing you would like. We’ll cover in the next videos how to program this user logic. For the current demonstration, only the default design provided by Nutaq was instantiated in the FPGA.
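
To get a feel for what those figures mean in practice, here is a rough budget calculation in Python. It is a sketch under stated assumptions, not a description of the Nutaq cores: the 750 MBps and 4 GB values come from the video, but the sample size (16-bit I plus 16-bit Q, so 4 bytes per IQ sample), the decimal reading of "GB" and "MB", and the 20 MS/s example rate are assumptions made for illustration.

```python
RTDEX_MAX_BYTES_PER_S = 750e6   # "up to 750 MBps" (decimal megabytes assumed)
RAM_CAPACITY_BYTES = 4e9        # "up to 4 GB" of recording (decimal gigabytes assumed)
BYTES_PER_IQ_SAMPLE = 4         # ASSUMPTION: 16-bit I + 16-bit Q per sample


def max_streaming_rate(channels, bytes_per_sample=BYTES_PER_IQ_SAMPLE):
    """Highest per-channel sample rate (samples/s) that fits in the RTDEx budget."""
    return RTDEX_MAX_BYTES_PER_S / (channels * bytes_per_sample)


def record_seconds(sample_rate, channels, bytes_per_sample=BYTES_PER_IQ_SAMPLE):
    """How long the recording memory lasts at a given per-channel sample rate."""
    return RAM_CAPACITY_BYTES / (sample_rate * channels * bytes_per_sample)


# Two receive channels on one FPGA board:
print(max_streaming_rate(channels=2) / 1e6, "MS/s per channel over RTDEx")
print(record_seconds(sample_rate=20e6, channels=2), "s of recording at 20 MS/s")
```

Under these assumptions, two channels could stream at up to 93.75 MS/s each, and the 4 GB of RAM would hold 25 seconds of two-channel data at 20 MS/s.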

All of those functionalities are controlled externally from a host computer, in this case, from a GNU Radio application. Those signals are going to be passed directly to the transmitters. In this example, no processing is done in the FPGA.
In the receiver chain I am going to use the eight receivers. To do so I have instantiated four Nutaq RTDEx source blocks that allow me to bring the data received by the eight radios on the four FPGAs back into GNU Radio, and I’m using a sum to combine those signals and visualize them in a single FFT scope.

The RTDEx streaming engine by Nutaq is configurable. It can operate over a standard gigabit ethernet link or a PCI express link. Today we are going to use ethernet.

In our example, in order to distinguish the eight signals received by the eight individual radios, I have put an offset in the carrier frequency of each of those radios. So my radio receiver 1 operates at the carrier frequency labeled “carrier_freq”, my radio 2 at “carrier_freq” plus an offset, radio 3 at “carrier_freq” plus 2 times the offset, then three times the offset, and so forth. This is radio number eight, at “carrier_freq” plus 7 times the offset.
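
Written out in Python, the frequency plan looks like the sketch below. The function name `rx_carrier_plan` is hypothetical; in the actual flowgraph these values are set through the parameters of each Radio420 block. The example values (751 MHz carrier, 250 kHz offset) are the ones used in this demonstration.

```python
def rx_carrier_plan(carrier_freq, offset, n_radios=8):
    """Carrier frequency of each receiver: carrier_freq + k * offset, k = 0..n_radios-1."""
    return [carrier_freq + k * offset for k in range(n_radios)]


plan = rx_carrier_plan(carrier_freq=751e6, offset=250e3)
for radio, freq in enumerate(plan, start=1):
    print(f"radio {radio}: {freq / 1e6:.2f} MHz")
```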

I put my carrier frequency at 751 MHz. This is entirely configurable on the Nutaq Radio420, from 300 MHz up to 3.8 GHz. The solution is based on the telecom industry standard MicroTCA form factor. So the radio cards are FMC cards and they are modular and can be replaced or upgraded in the future by cards that have a broader carrier coverage.

The offset I’m using between each of my radios is 250 kHz. If I run my application, I can see the two tones received by each of the 8 radio receivers. I’ve put sliders at the top to control the frequency of the tones. By sliding the first slider, I change the frequency of my first tone. By sliding the second slider, I change the frequency of the second one. We can see the two tones received by the eight radios with a 250 kHz offset between each.
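
The effect seen on the FFT scope can be reproduced offline with a small NumPy simulation. This is an idealized sketch, not the GNU Radio flowgraph itself: it assumes noiseless complex tones, assumes every transmitter stays at the nominal carrier while only the receivers are shifted, and uses a made-up sample rate and made-up tone frequencies. Under those assumptions, receiver k sees each tone moved down by k times the offset, so the summed spectrum shows sixteen separate peaks.

```python
import numpy as np

FS = 8e6                  # ASSUMPTION: baseband sample rate in Hz
N = 8000                  # FFT length, chosen so every tone lands on a 1 kHz bin
OFFSET = 250e3            # carrier offset between receivers (from the demo)
TONES = (100e3, 150e3)    # ASSUMPTION: baseband frequencies of the two tones
N_RADIOS = 8

t = np.arange(N) / FS


def received(k):
    """Baseband signal at receiver k, whose carrier sits k * OFFSET above nominal."""
    return sum(np.exp(2j * np.pi * (f - k * OFFSET) * t) for f in TONES)


# Sum the eight receiver outputs and look at a single spectrum of the result.
combined = sum(received(k) for k in range(N_RADIOS))
spectrum = np.abs(np.fft.fft(combined)) / N
freqs = np.fft.fftfreq(N, d=1 / FS)

peaks = np.sort(freqs[spectrum > 0.5])
print(len(peaks), "peaks at (kHz):", np.round(peaks / 1e3).astype(int))
```

Moving a tone, as the sliders do in the demo, shifts its eight copies together while the 250 kHz spacing between receivers stays the same.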

This was Tristan, Field Application Engineer at Nutaq, demonstrating Nutaq’s Massive MIMO software defined radio platform. I hope you enjoyed it. For further information you can contact me by email. Thanks for listening.


Development and implementation of narrowband and SISO wireless waveforms can be performed easily by feeding RF to today’s high end processors, such as the new Quad Core™ i7 family of processors from Intel®.

But when it comes to wideband, multi-user, or MIMO waveforms, processors rapidly struggle to achieve real-time implementation, eventually requiring parallel processor computing. An example of this would be high-performance computing (HPC) systems using a large matrix of general purpose processors (GPPs) to share the processing load.

Including an FPGA between the RF module and the computer drastically reduces the load on the CPU by offloading high-speed, highly parallel PHY-related algorithms, and it also reduces power consumption.

These are two important considerations when deploying a waveform on either a portable device (where increased power consumption reduces battery life, and a lower power CPU might not be able to cope with the high processing demands) or an infrastructure device (where increased power consumption and processing demands result in increased overall cost of the system).

Where software is concerned, waveform developers need to focus on adding value to the system rather than wasting time re-implementing standard algorithms (such as FFTs). On the communications side, customers don’t want to spend time on low-value tasks such as programming FPGA interfaces, adjusting FPGA constraints, debugging drivers, and so on.

Given these constraints, how can we accelerate development and deployment of an advanced wireless waveform to a mixed PC-FPGA hardware architecture?

Benefits of targeting a mixed PC-FPGA hardware architecture

Using an FPGA between the computer and the RF module enables very efficient MIMO radio development. FPGA processors are good at parallelism and high speed logical operations.
In the diagram above, which shows an example of a MIMO RX PHY processing chain, such as MIMO OFDM application, the MIMO processing chain of each antenna can easily be replicated within an FPGA architecture. Of course, a bigger FPGA is required to replicate all the necessary logic, but the performance of the system and latency would not be impacted (as opposed to an implementation done entirely within the computer).
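
The idea of replicating the same chain per antenna can be illustrated in NumPy. The sketch below assumes a deliberately minimal OFDM receive chain (cyclic prefix removal followed by an FFT) and hypothetical sizes; the diagram's actual chain is not reproduced here. The loop stands in for a CPU handling antennas one after another, while the batched call stands in for identical logic instantiated once per antenna.

```python
import numpy as np

N_ANT = 4      # hypothetical number of receive antennas
N_FFT = 64     # hypothetical OFDM FFT size
CP_LEN = 16    # hypothetical cyclic prefix length


def per_antenna_chain(rx_symbol):
    """One antenna's simplified chain: drop the cyclic prefix, then FFT."""
    return np.fft.fft(rx_symbol[CP_LEN:]) / np.sqrt(N_FFT)


# Build one BPSK OFDM symbol per antenna over an ideal channel, just to have test data.
rng = np.random.default_rng(0)
tx_freq = rng.choice([-1.0, 1.0], size=(N_ANT, N_FFT)) + 0j
time_domain = np.fft.ifft(tx_freq, axis=1) * np.sqrt(N_FFT)
rx = np.concatenate([time_domain[:, -CP_LEN:], time_domain], axis=1)

# Sequential: the antennas are processed one after another.
sequential = np.stack([per_antenna_chain(rx[a]) for a in range(N_ANT)])

# Replicated: the same chain applied to all antennas at once.
batched = np.fft.fft(rx[:, CP_LEN:], axis=1) / np.sqrt(N_FFT)

print(np.allclose(sequential, batched), np.allclose(batched, tx_freq))
```

Both paths give the same result. The difference is that replicated logic in an FPGA lets the chains run at the same time, so adding antennas costs more logic rather than more latency.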

Furthermore, using an FPGA reduces the computer CPU usage, freeing it up to process the upper waveform layer protocols (L2-L3) and more. These upper layer protocols are better suited for the RISC processor architecture found on many modern personal computers and tablets.

Another benefit to migrating PHY-related and parallel computing algorithms to an FPGA is that this approach can ease and validate the transition to real fabric. In real life applications, MIMO radios that sell in lower volumes can take advantage of the newly introduced low-power and low-cost FPGAs (for example, the Artix™-7 and Zynq® processors from Xilinx®) while high-volume radios could make the switch to ASIC PHY external chips.


Another important point to consider when targeting a mixed PC-FPGA architecture is the quality of the link between the two processors, in terms of bandwidth and latency. Communication interfaces between the FPGA and PC may use various industry standards, each with advantages or limitations depending on the waveform application being developed. Let’s compare two of the main interfaces used in the industry:

Tools to accelerate waveform development

Accelerating the development of radio processing using a mixed PC-FPGA architecture is not trivial. There really isn’t any one tool that can leverage the advantages of each processor in this type of architecture, given that these processors require very different programming languages. Instead, what we can use are individual tools that take advantage of each processor’s characteristics as they apply to radio waveform development.

When considering tools, we need to keep in mind the following feature requirements for supporting accelerated development:

  • Reusability of existing resources, for example, IP cores or other code from user communities or open source libraries
  • Model-based design or high level system design approach
  • Automatic code generation
  • Debug/simulation capabilities

