I thought I would finish up this project with a few pictures of some games using the MCL65+ in cycle-accurate mode as a drop-in replacement for the Apple II+’s 6502.
Another flashy title, but again probably true! The MCL65+, when running in accelerated mode, is by my estimate more than ten times faster than a stock 1MHz Apple II+! This was accomplished by emulating all of the computer’s ROM and RAM in the 600MHz microcontroller’s memory. Only the I/O and video memory ranges were left as regular 6502 bus accesses to the motherboard, which runs at 1MHz.
The MCL65+ is a 6502 accelerator card which uses a 600MHz Arduino Teensy 4.1 microcontroller to emulate a 6502 microprocessor as well as its bus interface signals. It was designed to be a drop-in replacement for the original 6502 processor found in computers like the VIC-20, the early Apple computers, and others.
I took some videos of two BASIC programs I made to measure the system’s performance before and after the acceleration. One is the classic x=x+1, print x, goto 10 program and the other prints an array of characters. Both are very simple, but the speed increase under acceleration is dramatic... the text just flies by!
I was surprised that the video and keyboard worked so well under acceleration! The next thing I need to try is booting the computer from either the 5.25″ diskette drive, or a compact flash drive emulator…
Here are some videos of it running with acceleration enabled and disabled. Please note that these programs print a lot to the screen, and every write to video memory goes over the 1MHz 6502 bus, which slows the test down. If less is printed to the screen the speedup is even greater…
Cycle accurate mode: https://www.youtube.com/watch?v=UuSb7mrw3xg&feature=youtu.be
Accelerated mode: https://www.youtube.com/watch?v=rvJsCMR0qbo&feature=youtu.be
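The effect of the screen writes on the overall speedup can be sketched with a simple weighted average. This is a rough model and not a measurement: it assumes every access on a stock machine costs one 1MHz bus cycle, and the internal access time of 0.01µs and the name `effective_speedup` are made up for illustration.

```python
def effective_speedup(bus_fraction, bus_access_us=1.0, internal_access_us=0.01):
    """Estimate the speedup over a machine where every access costs bus_access_us.

    bus_fraction is the share of memory accesses that still go out to the
    1MHz motherboard bus (I/O and video memory). internal_access_us is an
    assumed figure, not a measured one.
    """
    if not 0.0 <= bus_fraction <= 1.0:
        raise ValueError("bus_fraction must be between 0 and 1")
    # Average cost per access when only part of the traffic uses the slow bus.
    accelerated_us = (bus_fraction * bus_access_us
                      + (1.0 - bus_fraction) * internal_access_us)
    return bus_access_us / accelerated_us


if __name__ == "__main__":
    for fraction in (0.20, 0.10, 0.05):
        print(f"{fraction:.0%} bus accesses -> about {effective_speedup(fraction):.1f}x")
```

Under this model the slow bus accesses dominate very quickly, which is consistent with programs that print less running faster.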
I thought I would give this post a catchy title, and I believe it is probably true! The MCL65+ runs an emulated 6502 on a 600MHz microcontroller, so when it is not running cycle-accurate it is quite a bit faster than the original 1MHz 6502 in a VIC-20.
Here are some of the details:
The MCL65+ can emulate the complete 64KB of the 6502’s address range at 600MHz, so I was able to move memory ranges into the emulator one at a time to see what worked and what didn’t. It turns out that the VIC-20 BASIC was not tolerant of much acceleration… With only the zero page and stack ranges accelerated, the performance boost was about 15%. I could not go further because when I tried to accelerate the BIOS and video regions, the VIC-20 video would no longer work. I guess there are timing dependencies in the BIOS that must not be exceeded.
I had better luck with some of the cartridge games. Some of them actually ran better when they were accelerated because, at the normal clock speed, they were slow and less responsive! When accelerating the game and VIC-20 memory ranges they ran much faster which was more enjoyable. Donkey Kong, Pac-Man, and Jungle Hunt all ran well at the accelerated speed. Defender was a little too fast to control!
I will post some videos of the accelerated games so you can behold the World’s Fastest VIC-20! 🙂
I was able to “max out” my VIC-20’s memory by using the Teensy’s internal array memory to supplement the 5K on the motherboard. I believe 28159 bytes is the maximum amount of RAM the VIC-20’s BASIC will recognize.
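The 28159 figure can be reproduced with a little arithmetic. This sketch assumes the commonly documented memory layout for a fully expanded VIC-20, where BASIC program space starts at 0x1201 and usable RAM ends just below 0x8000. Those two addresses are my assumption and are not taken from the MCL65+ code.

```python
# Assumed layout for a fully expanded VIC-20 (not taken from the MCL65+ source).
BASIC_START = 0x1201   # first byte available to BASIC programs
RAM_TOP = 0x8000       # first address above the contiguous RAM

bytes_free = RAM_TOP - BASIC_START

if __name__ == "__main__":
    print(bytes_free, "bytes free")
```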
I was also able to load a number of cartridge games into the emulated memory. They range from 4K to 16K, and the larger games span two address ranges starting at 0x6000 and 0xA000.
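Loading a cartridge into emulated memory amounts to copying the ROM image into the right place in a 64KB array. Here is a minimal sketch, assuming a 16K game split into two 8K halves. The helper name `load_image` and the dummy image contents are hypothetical, and this is not the actual MCL65+ loader.

```python
def load_image(memory, base, image):
    """Copy a ROM image into the emulated 64KB address space at base."""
    end = base + len(image)
    if base < 0 or end > len(memory):
        raise ValueError("image does not fit in the address space")
    memory[base:end] = image
    return range(base, end)


if __name__ == "__main__":
    memory = bytearray(0x10000)          # emulated 64KB address space
    low_half = bytes([0xAA]) * 0x2000    # placeholder for the first 8K of a game
    high_half = bytes([0xBB]) * 0x2000   # placeholder for the second 8K
    load_image(memory, 0x6000, low_half)
    load_image(memory, 0xA000, high_half)
    print(hex(memory[0x6000]), hex(memory[0xA000]))
```

A 4K or 8K game would only need a single call for whichever range it occupies.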
I am trying to run a number of applications to see if the MCL65+’s 6502 emulated core is functioning correctly. The core implements all of the legal opcodes as well as most of the undocumented ones which some applications depend on.
Here are a few of the games I tried.
The MCL65+ boards came in last week and the parts just arrived today, so I soldered one of them together and swapped it for the 6502 in my VIC-20… I had some luck as you can see below! 🙂
The next step will be to use some of the Teensy’s RAM to expand the VIC-20’s memory, then try loading one of the VIC-20’s cartridges on-chip as well. I actually don’t have any VIC-20 cartridges, tapes, or disks, so once I can load ROMs into memory it will be fun to be able to run any of them!
The emulated 6502 has the memory interface abstracted from the CPU, so in theory I can accelerate the core by serving some memory ranges from internal memory rather than going out to the 1MHz bus interface. In this accelerated mode the 6502 would run at 600MHz! 🙂
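The idea of an abstracted memory interface can be sketched as a small dispatcher that decides, per address, whether to use an internal array or the external bus. This is only an illustration in Python, not the Teensy code. The class name, the callbacks, and the example range covering the zero page and stack are all assumptions.

```python
class MemoryInterface:
    """Route each access to internal memory or to the external 1MHz bus."""

    def __init__(self, internal_ranges, bus_read, bus_write):
        self.internal_ranges = list(internal_ranges)  # inclusive (low, high) pairs
        self.internal = bytearray(0x10000)            # emulated 64KB space
        self.bus_read = bus_read                      # slow path: real bus cycle
        self.bus_write = bus_write

    def is_internal(self, addr):
        return any(low <= addr <= high for low, high in self.internal_ranges)

    def read(self, addr):
        addr &= 0xFFFF
        if self.is_internal(addr):
            return self.internal[addr]    # fast path, no bus cycle needed
        return self.bus_read(addr)

    def write(self, addr, value):
        addr &= 0xFFFF
        value &= 0xFF
        if self.is_internal(addr):
            self.internal[addr] = value
        else:
            self.bus_write(addr, value)


if __name__ == "__main__":
    motherboard = {}                      # stand-in for the real hardware
    mem = MemoryInterface(
        internal_ranges=[(0x0000, 0x01FF)],   # hypothetical: zero page and stack
        bus_read=lambda a: motherboard.get(a, 0xFF),
        bus_write=motherboard.__setitem__,
    )
    mem.write(0x0010, 0x42)   # stays inside the microcontroller
    mem.write(0x9000, 0x07)   # goes out to the motherboard
    print(mem.read(0x0010), motherboard)
```

Because the CPU core only ever calls `read` and `write`, switching a range between cycle-accurate and accelerated behavior is a matter of changing the range list.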
I used a logic analyzer to observe the 6502 bus timing that I was able to achieve, and it appears to have under 300ns clock-to-out for the address bus, which is within spec. It is difficult to implement parallel GPIO inputs and outputs with the Teensy 4.1, so I decided to use parallel inputs for the received data[7:0] and sequential writes for each bit of the address bus. I will try to perform parallel writes to GPIO6 directly, but I believe there could be issues with unwanted bits being updated…
I thought it would be fun to implement a MOS 6502 inside of a Teensy 4.1 and build a board to allow drop-in replacement of the original CPU! The Teensy is a 600MHz microcontroller board which uses the Arduino IDE and should provide enough speed to implement both the 6502 instruction set and the bus interface… at least at 1MHz. With 1MB of memory, it could also emulate the system’s memory and run it at 600MHz!
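One common way to avoid disturbing unrelated bits during a parallel write is a masked read-modify-write, where only the bits covered by a mask are replaced. The sketch below shows the bit arithmetic on plain integers. The mask value and function name are hypothetical and do not reflect the actual pin assignment on the MCL65+ board.

```python
REGISTER_WIDTH_MASK = 0xFFFFFFFF   # model a 32-bit port register


def masked_write(register, mask, value):
    """Return the new register contents with only the masked bits replaced."""
    kept = register & ~mask & REGISTER_WIDTH_MASK   # bits owned by other signals
    updated = value & mask                          # bits we are allowed to change
    return kept | updated


if __name__ == "__main__":
    ADDRESS_MASK = 0x0000FF00      # hypothetical: eight address pins on bits 8..15
    port = 0xFFFF0000              # other pins currently driven high
    port = masked_write(port, ADDRESS_MASK, 0x1234)
    print(hex(port))
```

The cost is an extra read before each write, which has to fit inside the bus timing budget.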
The initial steps were to write the 6502 emulator, test the correctness, port it to the Teensy, and check the bus timing. I believe the core is cycle accurate, maybe even cycle exact, to the original 6502. The bus timing also seems to be within the data sheet requirements…
The next step was to build a small PCB which performs the 5V to 3.3V translation (Teensy is not 5V tolerant). I was able to tweak the pinouts to allow the use of an inexpensive ($5 for quantity 10) two-sided PCB with solid ground plane and 100% through-hole components. This way anyone can solder together a board at home! I used two transparent latches and one hex-inverter.
The board and parts should arrive in a few weeks, after which I will drop it into my first test machine, the Commodore VIC-20.
The plan is to first run the system at normal speed, meaning the MCL65+ will be cycle accurate and will access all memory ranges through the external bus which should run exactly the same as the real MOS 6502.
The next goal is to move RAM and ROM into the Teensy and then run those address ranges in “accelerated” mode, which is at 600MHz. This would probably make this the World’s Fastest VIC-20! 🙂
All of the code and design files will be uploaded to GitHub. Stay tuned…
I have finished tweaking the performance of this card and I’m happy to say that it increases the PCjr’s speed by nearly 6X, making it over 4X faster than an IBM PC/XT and about as fast as the 8MHz IBM PC AT! I took some screenshots below of the latest performance numbers.
I uploaded the FPGA and board files to GitHub:
The FPGA is a Xilinx Spartan-6 which contains the MCL86 microsequencer-based 8088 core and two SRAM controllers for the 512KB of SRAM on the board. The slow controller runs at the bus speed of the PCjr so that in cycle-accurate mode it performs the same as the regular physical DRAM on the PCjr’s motherboard or side-card. When the MCL86jr is put into “unlocked” or non-cycle-accurate mode, it uses a fast SRAM controller which accesses the SRAM at the fastest possible speed of around 50ns! It is the combination of this fast controller and the CPU running in non-cycle-accurate mode that allows the MCL86jr to boost the PCjr’s speed by almost 6X!
The schematics and layout were done using KiCad and the board was manufactured and assembled by PCBWay with a total cost per board of around $50.
I was able to successfully test a number of software packages and tools, which are listed at the bottom. A few issues arose when using the MCL86jr Accelerator board in the PCjr. One is that the PCjr’s internal diagnostics, which are started by pressing Ctrl-Ins-Del, do not all work. Another is that the PCjr’s BASIC cartridge does not work, but the BASIC included with any DOS version does work. So does the PCjr’s internal cassette BASIC. I believe both of these issues are due to speed differences which IBM’s code is checking. Perhaps it is a way to ensure this version of BASIC will only run on a PCjr and not any other machine…
The MCL86jr will boot into cycle-accurate mode with an additional 512KB of RAM which brings the total memory to the maximum of 640KB. To run in accelerated mode the user can run a small DOS program called MCL_FAST.COM. The program MCL_SLOW.COM will bring the CPU back into cycle-accurate mode.
Software verified to work:
ITT DOS 2.11
BASIC ** Disk version with PCjr cartridge BASIC not installed
IBM PCjr Sampler
Exploring the IBM PCjr
IBM Macro Assembler 1.0
Turbo Pascal 2.08 — Runs obviously faster in “unlocked” mode
Norton Utilities 3.0
King’s Quest I
Microsoft Flight Simulator II
I made some feature and speed improvements to the MCL86jr board which set a new speed record for the World’s Fastest PCjr!
The MCL86jr board brings the PCjr’s total memory to 640KB and supports 8088 cycle-accurate mode as well as an “unlocked” mode which makes it around 4X faster than the IBM PC and about as fast as an IBM PC-AT!
I added two controllers to access the 512KB of SRAM. One of them runs at the 8088 bus speed, which is just as fast as the memory you would plug into the side of the PCjr and takes about 1µs per access. The other controller, when selected, runs at the speed of the SRAM, which is 50ns. The slower controller is used in 8088 cycle-accurate mode and performs identically to the PCjr’s physical memory. The fast controller is used in “unlocked” mode and runs many times faster than the original hardware.
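As a quick check on “many times faster”, the two access times quoted above can be compared directly. This only looks at raw memory access time. The overall system speedup also depends on the CPU core and the rest of the PCjr, so it should not be read as a benchmark.

```python
SLOW_ACCESS_NS = 1000   # about 1 microsecond per access at 8088 bus speed
FAST_ACCESS_NS = 50     # SRAM speed used by the fast controller

memory_access_ratio = SLOW_ACCESS_NS // FAST_ACCESS_NS

if __name__ == "__main__":
    print(f"fast controller: about {memory_access_ratio}x shorter access time")
```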
I also reduced some register pipeline delays in the FPGA which increased the performance to a new record.