I have completed the framework for the MCLR5 Quad-issue Supersclar RISC v core. It supports most of the RV32I Base Instruction Set with exceptions for the timers and shifters. I have targeted the core to the Xilinx Ultrascale+ and the Intel Stratix-10 series FPGAs.
Each of the MCLR5 RISC V cores is implemented as a combinational, single-cycle ALU. Up to four register updates can occur per clock cycle and results are are forwarded to the appropriate cores when appropriate within the same clock cycle. This just means that if core0 updates r4 and core1 uses r4 as an input, the updated value will be passed to core1.
Loads and stores are handled by core0 and are treated initially as JUMP opcodes. The MCLR5 performs a JUMP to the address containing the LOAD/STORE instruction which aligns it with core0. Instructions following the LOAD/STORE are blocked until the LOAD/STORE is completed.
The User’s Program ROM is four-opcodes wide and dual-ported so any instruction alignment is supported. Only one clock is consumed for JUMP opcodes. Because the ALU is single-cycle, no other pipelining penalties are incurred.
The Quad-issue Superscalar MCLR5 achieves nearly 90Mhz when targeted to Stratix-10 or Ultrascale+. A Single MCLR5 can reach over 250Mhz. The relatively slow timing of the quad-issue core is due to the very long combinaional paths through all four cores which is not surprising. I am actually impressed that these clock frequencies were reached at all. 🙂
It is hard to come up with a picture to illustrate a quad-issue CPU core, so I will just post a screen-shot of a sixteen ADD r1 , r1 , 1 instructions with a SW stuck in the middle.