Turning The Arduino Uno Into An Apple ][


Emulated Apple ][ Running On a Stock Arduino Uno.

I’ve always been fascinated by the early days of the computer revolution. Today we take tremendously powerful machines for granted but it was not always that way. As a personal project I decided to implement an early eighties era microcomputer on the Arduino Uno to demonstrate just how powerful even the most basic of our microcontrollers are today.

My microcomputer of choice was the Apple II, the computer responsible for making Apple Computer a household name. With over five million units sold, it was one of the most popular microcomputers of the era.

The Apple II was originally designed in 1977 by Steve Wozniak. In order to reduce costs and bring the computer into the mass consumer market, Steve made many unique design decisions that reduced the complexity of the machine. One of his goals was to drastically reduce the chip count.

The first Apple II machines featured 4 kilobytes of RAM that was shared with the video frame buffer. For CPU it featured a MOS 6502 clocked at 1 MHz. It was capable of generating text video at a resolution of 40 columns and 24 rows, and it featured two graphics modes capable of indexed colour video at up to 280x192 pixels.

Original Apple II
Original Apple II Microcomputer.

The MOS 6502 CPU was a rather revolutionary device in its own right. The 6502 was designed by the fledgling semiconductor manufacturer MOS Technology in 1975. The MOS Technology CPU project was headed by Chuck Peddle and three other ex-Motorola employees. They sought to produce low cost CPU designs for the broader consumer market, a revolutionary idea at the time, and an idea that led to them leaving Motorola.

When the 6502 was first released it was priced at $25 USD. At the time this was unheard of, being up to six times cheaper than the nearest competitors. Some people even thought that the low price had to be some form of scam. Ultimately the 6502 was no scam and went on to power many of the early microcomputers, including the Apple I/II and Commodore 64. The 6502 has been referred to as “the original RISC processor”. Its elegant instruction set and historical significance have led to the processor remaining relevant and revered even forty years on.

The first step in emulating an Apple II was to emulate the 6502 processor, which is in itself a significant undertaking. Several of the emulator design decisions were easy to make: it had to be written in plain C, it had to be memory/CPU efficient, it didn’t have to be cycle accurate, and it did not need to support BCD arithmetic (as Steve Wozniak never used it in his BASIC code).

The 6502 instruction set is remarkably simple: each opcode is a fixed 8 bits in length, and there are 56 different instructions and 13 addressing modes. Most instructions work with most addressing modes. This simple (instruction/addressing mode) relationship is referred to as instruction set orthogonality. It leads to a much simpler emulation strategy, because the majority of the memory fetching and instruction decoding code can be reused.

The MOS 6502 Instruction Set.

Before starting the task of writing my 6502 emulator I took a slightly atypical approach. For programming tasks where you have a predefined input / output relationship it can be handy to use test driven development. In this case test driven development involves first writing a test in 6502 assembly that you expect to perform some action and then running it and observing the output. If the output matches what you expect (from data sheets etc) then your code for that action is correct and functional.

In this case I wrote an exhaustive test for each of the 6502’s opcodes and connected this to my emulator code. By running the test routines I could ensure my code was 6502 compatible throughout the development process. It also helped highlight unimplemented functionality.

I originally wrote the emulation code as a small C application on my OSX box. Running it locally allowed me to quickly test and make changes. My final design used a simple switch statement to decode instructions and a collection of operand decoding utilities. The goal was to keep it as simple as practical.

I decided to use a switch statement for instruction decoding because it is easy for the C compiler to optimise into an efficient jump table. I could instead have used an array of function pointers, which would likely have been optimised as well (and possibly been more readable). However, this would have required some navigating around the AVR memory model and risked the overhead of call/return instructions.
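To make the switch-based approach concrete, here is a minimal sketch of how a handful of opcodes can share the same addressing helpers. It is not the emulator's actual source: the type and function names (mini_cpu_t, mini_step and so on) are made up for illustration, only six opcodes are handled, and the flat 64 KB memory array assumes a desktop build rather than the Uno.

```c
#include <stdint.h>

/* Hypothetical, cut-down CPU state for a desktop build. */
typedef struct {
    uint8_t a, x, status;
    uint16_t pc;
    uint8_t mem[65536];
} mini_cpu_t;

enum { FLAG_Z = 0x02, FLAG_N = 0x80 };

static uint8_t fetch8(mini_cpu_t *c) { return c->mem[c->pc++]; }

static uint16_t fetch16(mini_cpu_t *c) {
    uint16_t lo = fetch8(c);
    uint16_t hi = fetch8(c);
    return (uint16_t)(lo | (hi << 8));          /* 6502 is little endian */
}

/* Addressing modes: each one returns an effective address. */
static uint16_t addr_imm(mini_cpu_t *c) { return c->pc++; }
static uint16_t addr_zp(mini_cpu_t *c)  { return fetch8(c); }
static uint16_t addr_abs(mini_cpu_t *c) { return fetch16(c); }

static void set_nz(mini_cpu_t *c, uint8_t v) {
    c->status &= (uint8_t)~(FLAG_Z | FLAG_N);
    if (v == 0)   c->status |= FLAG_Z;
    if (v & 0x80) c->status |= FLAG_N;
}

/* Instructions: written once, reused for every addressing mode. */
static void op_lda(mini_cpu_t *c, uint16_t ea) { c->a = c->mem[ea]; set_nz(c, c->a); }
static void op_sta(mini_cpu_t *c, uint16_t ea) { c->mem[ea] = c->a; }

/* Executes one instruction; returns -1 for an opcode this sketch lacks. */
int mini_step(mini_cpu_t *c) {
    switch (fetch8(c)) {
    case 0xA9: op_lda(c, addr_imm(c)); break;   /* LDA #imm */
    case 0xA5: op_lda(c, addr_zp(c));  break;   /* LDA zp   */
    case 0xAD: op_lda(c, addr_abs(c)); break;   /* LDA abs  */
    case 0x85: op_sta(c, addr_zp(c));  break;   /* STA zp   */
    case 0x8D: op_sta(c, addr_abs(c)); break;   /* STA abs  */
    case 0xE8: c->x++; set_nz(c, c->x); break;  /* INX      */
    default:   return -1;
    }
    return 0;
}
```

Each new instruction costs one small function plus one case line per addressing mode, which is where the orthogonality of the instruction set pays off.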

Completing the emulator was a case of reading the 6502 programming guide and consulting the plentiful resources on 6502 emulation. After completing my first compatible build I decided to test the emulator against some of the original Apple firmware. Sadly I was plagued with strange bugs that made no sense. In a pinch I found the source code of another emulator and decided to wire it into my processor unit tests. It turned out I hadn’t properly understood the x-indexed, indirect addressing mode and my unit tests were broken. It was a seriously frustrating one line fix.
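For anyone else tripped up by this mode, the sketch below shows how the two indirect modes of the 6502 compute their effective address. The function names are illustrative, and this is not a reconstruction of the actual one line fix, which isn't shown here. The easy mistakes are adding X after the pointer is read instead of before, and forgetting that the pointer wraps around inside the zero page.

```c
#include <stdint.h>

/* X-indexed, indirect: (zp,X).
 * X is added to the operand first, wrapping within the zero page,
 * and the 16-bit pointer is then read from that location. */
uint16_t ea_x_indexed_indirect(const uint8_t *mem, uint8_t operand, uint8_t x) {
    uint8_t ptr = (uint8_t)(operand + x);       /* wraps at $FF */
    uint16_t lo = mem[ptr];
    uint16_t hi = mem[(uint8_t)(ptr + 1)];      /* high byte also wraps */
    return (uint16_t)(lo | (hi << 8));
}

/* Indirect, Y-indexed: (zp),Y.
 * The pointer is read from the operand location first,
 * and Y is added to the resulting 16-bit address. */
uint16_t ea_indirect_y_indexed(const uint8_t *mem, uint8_t operand, uint8_t y) {
    uint16_t lo = mem[operand];
    uint16_t hi = mem[(uint8_t)(operand + 1)];
    return (uint16_t)((lo | (hi << 8)) + y);
}
```

With X = 4 and the bytes $74, $20 stored at $24/$25, the operand $20 resolves to $2074 in the first mode.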

The original Apple II firmware, totalling 12 kilobytes, was stored on six 2 kilobyte socketed ROMs. These memory chips were mapped to the program address space between $D000 - $FFFF. Originally only four of the ROM sockets were populated.

Apple II Firmware:

The system monitor program functions as a sort of simple shell. It includes functionality that allows you to manipulate memory contents, trace/debug programs, and execute memory locations. It also includes a large number of hardware routines, such as initialising memory, reading characters from the keyboard, displaying characters on the screen, and saving/loading programs. When the Apple II first starts it loads a reset vector from $FFFC which points to the beginning of the system monitor program.

Integer BASIC was a handwritten BASIC interpreter written by Steve Wozniak. It was syntax compatible with HP BASIC and used 16-bit signed numbers for math operations. Steve had originally intended to implement floating point math; however, in order to save several weeks of development time, he released it in integer-only form. This was the primary software users encountered when using their Apple II. In fact later models booted directly into BASIC.
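The ROM mapping and reset vector described above can be sketched as a small bus layer. This is only an illustration under a few assumptions: the names bus_read8, bus_write8 and reset_vector are invented here, the 1.5 kilobyte RAM size is a placeholder, and on the AVR the ROM image would sit in flash rather than in a plain array. Video memory and I/O addresses are not modelled.

```c
#include <stdint.h>

#define RAM_SIZE 0x0600u   /* placeholder: 1.5 KB of emulated RAM */
#define ROM_BASE 0xD000u   /* firmware occupies $D000 - $FFFF     */
#define ROM_SIZE 0x3000u   /* 12 KB                               */

uint8_t ram[RAM_SIZE];
uint8_t rom[ROM_SIZE];     /* would be a flash-resident table on the AVR */

/* Route a 6502 read to RAM, ROM, or nothing at all. */
uint8_t bus_read8(uint16_t addr) {
    if (addr < RAM_SIZE)  return ram[addr];
    if (addr >= ROM_BASE) return rom[addr - ROM_BASE];
    return 0;              /* unmapped in this sketch */
}

/* Writes only land in RAM; writes to ROM are silently dropped. */
void bus_write8(uint16_t addr, uint8_t val) {
    if (addr < RAM_SIZE) ram[addr] = val;
}

/* The 6502 starts executing at the little-endian address at $FFFC/$FFFD. */
uint16_t reset_vector(void) {
    return (uint16_t)(bus_read8(0xFFFCu) | (bus_read8(0xFFFDu) << 8));
}
```

On reset the emulated program counter would simply be loaded with the value returned by reset_vector().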

The original Apple II shipped with 4 kilobytes of DRAM, of which approximately 1 kilobyte was dedicated to the text video frame buffer. This left 3 kilobytes of general purpose memory. The first 768 bytes of this memory were shared between system monitor variables in the first page of memory and the processor stack / input buffer.

Apple II RAM Memory Map:

The Apple II used a rather novel approach for video generation. There are a couple of popular methods for storing frames in memory: rows can either be stored sequentially or interleaved. Interleaving provides advantages when generating NTSC video. The Apple II took this approach one step further, using an 8:1 interleaving scheme, in which the first line is followed in memory by the ninth line. This peculiar approach allowed Steve Wozniak to simplify the video generation circuitry. A very smart hack!

Video Memory Layout
Apple II Video Memory Layout Demonstrating the 8:1 Interleaving Scheme.

As shown in my previous post on the GhettoVGA project, I designed a video interface for the Arduino Uno that uses the secondary USB interface IC to store/generate video. In the Arduino code it made no sense to keep the interleaved video mode, so instead my emulator decodes screen addresses and converts them into sequential memory locations. In order to save memory on the Arduino, storing the frame buffer is left to the secondary processor. This frees between 512 and 1024 bytes of memory.
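A rough sketch of that address conversion is shown below. It assumes the standard text page starting at $0400 with 40 columns and 24 rows; the function name and the choice to return -1 for the unused bytes in each 128-byte block are illustrative, and the real emulator code may be organised differently.

```c
#include <stdint.h>

#define TEXT_BASE 0x0400u   /* start of text page 1 */
#define TEXT_COLS 40u

/* Convert an interleaved Apple II text address into a sequential
 * index (row * 40 + column, 0..959). Returns -1 when the address is
 * outside the text page or falls in one of the unused 8-byte gaps. */
int text_addr_to_linear(uint16_t addr) {
    if (addr < TEXT_BASE || addr >= TEXT_BASE + 0x400u) return -1;

    uint16_t off   = (uint16_t)(addr - TEXT_BASE);
    uint16_t block = off / 0x80u;       /* 0..7: row within a group of eight */
    uint16_t rem   = off % 0x80u;       /* position inside the 128-byte block */
    uint16_t third = rem / TEXT_COLS;   /* 0..2: which group; 3 = unused gap  */
    if (third > 2u) return -1;

    uint16_t col = rem % TEXT_COLS;
    uint16_t row = (uint16_t)(third * 8u + block);
    return (int)(row * TEXT_COLS + col);
}
```

For example, $0400 is row 0, $0428 is row 8 and $0480 is row 1, which is exactly the first-line-then-ninth-line ordering of the 8:1 scheme.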

Apple II Video Character Set
Apple II Video Character Set Including Inverse And Flashing Modes.

The original Apple II supports a custom video character set that is loosely based on ASCII; however, it adds two custom modes, flashing and inverse text. In the original hardware this is implemented with discrete digital logic that essentially inverts the output of the video generator IC using an exclusive-or gate. The signal for the flashing text is generated from a simple clock divider and flashes at approximately 2 Hz.

Apple II Video Interface
Block Diagram Of The Apple II Video Interface Showing XOR On Output.

When I first presented my GhettoVGA project it was focused primarily on the ASCII character set. In order to improve efficiency and simplicity I converted it over to the Apple character set. Within the tight timing constraints of my AVR VGA generator it was not possible to implement the full inverse mode. However it proved possible to generate flashing characters.

This was achieved by exclusive-or'ing the character lookup address with $80, which had the effect of toggling character lookups above and below the $80 boundary. By keeping remapped normal mode characters constant, but toggling between inverse and non-inverse for flashing, I was able to achieve flashing text with very little CPU time. The clock for the flashing text came from dividing the frame counter.

unsigned char character_lookup[256];
volatile unsigned char bitmask = 0x00;

// Runs alongside main(): toggles the MSB of bitmask once a second
void asynchronous_thread() {
    for(;;) {
        bitmask ^= 0x80;
        delay(1000);
    }
}

// Main character drawing routine
void main() {
    character_lookup[0x01] = NONINVERTEDCHAR;
    character_lookup[0x81] = INVERTEDCHAR;
    for(;;) {
        unsigned char character = 0x01;
        // Perform the exclusive-or: this selects the lower or upper
        // half of the lookup table depending on the flash phase
        character ^= bitmask;
        putchar(character_lookup[character]);
    }
}

Pseudocode Showing The Software Implementation Of XOR Flashing.

Historically keyboard input hasn’t been subject to the same degree of standardisation as ASCII and thus the Apple II uses a custom keyboard protocol. The keyboard itself is based on a modified QWERTY layout. As for sourcing a keyboard for this project I decided to use an old PS/2 device.

PS/2 is extremely easy to interface with an Arduino: it uses 5 volt TTL logic and a synchronous serial protocol. PS/2 outputs are open collector, which means you must use a pull-up resistor. However, the Arduino has internal pull-ups on many pins, which can easily be used.

PS/2 Keyboard Timing Diagram.

PS/2 packets are 11 bits in length, consisting of a fixed start bit (LOW), 8 data bits, a parity bit, and a fixed stop bit (HIGH). Data is transferred Least Significant Bit first. The parity bit is used to ensure transmission occurred properly. In my project I chose not to implement parity checking. Listening for data bits relies on monitoring the clock line. One could poll for state changes; however, the Arduino provides interrupt on change capability on several pins, which is vastly superior.
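As an illustration of the frame layout, the sketch below unpacks an 11-bit frame that has already been shifted in, with the first bit received stored in bit 0. Unlike my project it also checks the parity bit (PS/2 uses odd parity), simply to show where that check would go. The function name and the packed uint16_t representation are assumptions made for this example.

```c
#include <stdint.h>

/* Decode one PS/2 frame.
 * Bit 0 = start bit, bits 1..8 = data (LSB first),
 * bit 9 = parity, bit 10 = stop bit.
 * Returns the data byte (0..255), or -1 on a framing or parity error. */
int ps2_decode_frame(uint16_t frame) {
    if (frame & 0x001u)    return -1;   /* start bit must be LOW */
    if (!(frame & 0x400u)) return -1;   /* stop bit must be HIGH */

    uint8_t data   = (uint8_t)((frame >> 1) & 0xFFu);
    uint8_t parity = (uint8_t)((frame >> 9) & 1u);

    /* Odd parity: data bits plus parity bit contain an odd number of ones. */
    uint8_t ones = parity;
    for (uint8_t v = data; v != 0; v >>= 1) ones += (uint8_t)(v & 1u);
    if ((ones & 1u) == 0) return -1;

    return data;
}
```

In an interrupt driven design, each falling edge of the clock line would shift one data-line sample into the frame, and this function would be called once all 11 bits have arrived.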

PS/2 keyboards use an interesting protocol for communicating key presses/releases. When a key is pressed, the keyboard sends a scan code corresponding to the key press. When a key is released, the keyboard first sends the byte $F0 and then the scan code value. It’s up to the host to track modifier keys, etc. For extra fun, the scan codes themselves are mostly random, being a product of the key matrix.

PS/2 Keyboard Scan Codes
Default PS/2 Keyboard Scan Codes.

The only realistic way of mapping PS/2 scan codes to Apple keyboard codes is through the use of a lookup table. The Apple II lacks a key up command, and modifiers are processed within the keyboard hardware. This is much simpler: the Apple II acknowledges key reads by clearing the uppermost bit of the keyboard register. The keyboard handling code for my project is shown below. You can modify the scan code lookup table for easy ASCII decoding.

Arduino PS/2 Keyboard Decoder: keyboard.c
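For readers who just want the gist without opening the file, here is a heavily reduced sketch of the idea. It is not the contents of keyboard.c: the names are invented, the table only covers a handful of keys from the default PS/2 scan code set, and modifier keys are ignored entirely.

```c
#include <stdint.h>

/* Tiny excerpt of a scan code lookup table (default PS/2 scan codes). */
static const uint8_t scan_to_ascii[128] = {
    [0x1C] = 'A', [0x32] = 'B', [0x21] = 'C',
    [0x29] = ' ', [0x5A] = '\r',
};

typedef struct {
    uint8_t releasing;     /* set after $F0 until the next byte arrives   */
    uint8_t key_register;  /* Apple style: bit 7 set = unread key waiting */
} kbd_state_t;

/* Feed one byte received from the PS/2 keyboard. */
void kbd_feed(kbd_state_t *k, uint8_t byte) {
    if (byte == 0xF0) { k->releasing = 1; return; }   /* key release prefix */
    if (k->releasing) { k->releasing = 0; return; }   /* ignore releases    */
    if (byte < 128 && scan_to_ascii[byte] != 0)
        k->key_register = (uint8_t)(scan_to_ascii[byte] | 0x80u);
}

/* What the emulated 6502 sees when it reads the keyboard register. */
uint8_t kbd_read(const kbd_state_t *k) { return k->key_register; }

/* Acknowledge the key: clears the uppermost bit. */
void kbd_acknowledge(kbd_state_t *k) { k->key_register &= 0x7Fu; }
```

Because the Apple II has no notion of key up, the release sequence ($F0 followed by the scan code) is simply swallowed.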

For nonvolatile storage the Apple II originally shipped with a cassette interface. The idea was that you would plug the cassette interface into the earphone and recording jacks of a standard cassette player. Data was stored using a simple Frequency Shift Keying scheme at approximately 1500 baud.

Apple II Cassette, Frequency Shift Keying
Captured Audio Demonstrating Apple II FSK Modulation.

The idle state of the cassette interface was a 770 Hz square wave. Data began with a 200 µs sync signal followed by a series of bits, each encoded as either one full cycle of a 1 kHz square wave (HIGH, a 1 bit) or one full cycle of a 2 kHz square wave (LOW, a 0 bit). It was the responsibility of the host to keep track of the number of bytes shifted in and terminate the read when appropriate. For most encoding schemes the Apple II saved two records of data to tape. The first record contained a two byte data length indicator and the second record included the data itself.
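The timing behind these tones is simple arithmetic: a square wave of frequency f spends 1/(2f) seconds in each half cycle. The small helpers below (names invented for this example) work that out in microseconds and show how a measured half cycle can be turned back into a bit. The 300 µs threshold sits between the 250 µs and 500 µs half cycles and is the same value my decoding code uses further down.

```c
#include <stdint.h>

/* Half-cycle duration in microseconds for a square wave of freq_hz. */
uint32_t half_period_us(uint32_t freq_hz) {
    return 1000000UL / (2UL * freq_hz);
}

/* Classify one full cycle from its two measured half-cycle durations.
 * Returns 1 for the slow 1 kHz cycle, 0 for the fast 2 kHz cycle. */
int bit_from_half_periods(uint16_t first_us, uint16_t second_us) {
    uint16_t average = (uint16_t)((first_us + second_us) / 2u);
    return average > 300u;
}
```

This gives roughly 649 µs for the 770 Hz idle tone, 500 µs for a 1 bit and 250 µs for a 0 bit.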

Apple II Cassette, Zero Crossing Detector
Schematic Of Zero Crossing Detector Used In Apple II’s Cassette Interface.

Tape data is detected using an incredibly simple circuit utilising a single 741 op amp. The headphone jack of the cassette player is passed through an inverting zero crossing detector with approximately 100 mV of hysteresis.

The circuit acts as a sort of comparator: when the input signal is less than -100 mV the op-amp’s output is driven high, and when the input signal is greater than 100 mV the output is driven low (-4 V). R29 limits the maximum output current of the op-amp and the input clamping diode clamps the signal to approximately TTL levels (0 - 4 V).

Zero Crossing Detector Behaviour
Effect Of A Zero Crossing Detector On A Sinusoidal Input Signal.

The output of the zero crossing detector is made available as a software register. By using a carefully timed loop and looking for the pin toggling one can detect the incoming frequency and hence extract data. It’s an incredibly elegant approach.

Flip Flop Audio Generation
Schematic Of Flip Flop Based Audio Generators Used In Apple II.

Tape data is also written out using an incredibly simple approach: the Apple II uses a 74LS74 flip flop to generate tape and audio signals. By writing to a register address you can cause the flip flop to change state, essentially toggling its output. By using a carefully timed loop you can toggle the flip flop at the desired frequency and generate an audio signal. R18 and R19 act as a voltage divider to limit the output.

The speaker uses the same approach; however, in order to drive the low impedance load of a speaker, a Darlington transistor was used to provide high current gain.
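In an emulator, this toggle-on-access behaviour is easy to model. The sketch below assumes the usual Apple II I/O addresses, $C020 - $C02F for the cassette output and $C030 - $C03F for the speaker; these come from the standard memory map rather than from anything shown in this post, and the type and function names are made up.

```c
#include <stdint.h>

typedef struct {
    uint8_t cassette_out;  /* current state of the cassette flip flop */
    uint8_t speaker;       /* current state of the speaker flip flop  */
} soft_switches_t;

/* Call on every emulated memory access. Any access to a toggle
 * address flips the matching output. Returns 1 if addr was a toggle. */
int soft_switch_access(soft_switches_t *s, uint16_t addr) {
    if ((addr & 0xFFF0u) == 0xC020u) { s->cassette_out ^= 1u; return 1; }
    if ((addr & 0xFFF0u) == 0xC030u) { s->speaker ^= 1u;      return 1; }
    return 0;
}
```

On real hardware the new state would then be copied to an output pin; the pitch of the resulting tone depends entirely on how often the 6502 program touches the address.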

Early on in the design process I chose not to implement a cycle accurate 6502 CPU: it added extra complexity, and on the AVR any extra speed I could get was a huge bonus. However, the lack of cycle accurate instructions makes keeping tight timing loops for signal generation / decoding impossible.

In order to avoid these complexities I decided to implement the tape encoding/decoding in the native AVR instruction set and implement hooks that would interrupt the 6502’s execution of the system monitor routines. This gave me a huge amount of flexibility in my decoding approach.

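void cassette_header(unsigned short periods) {
  // Header tone: 650 us half cycles, roughly 770 Hz.
  // The count is held in 32 bits because a 16-bit AVR int
  // cannot hold periods*128 for larger values of periods.
  unsigned long cycles = (unsigned long)periods * 128;
  for(unsigned long i = 0; i < cycles; ++i) {
    digitalWrite(SPEAKER_PIN, HIGH);
    delayMicroseconds(650);
    digitalWrite(SPEAKER_PIN, LOW);
    delayMicroseconds(650);
  }
  // Sync pulse: one half cycle at 2500 Hz (200 us), then one at 2000 Hz (250 us)
  digitalWrite(SPEAKER_PIN, HIGH);
  delayMicroseconds(200);
  digitalWrite(SPEAKER_PIN, LOW);
  delayMicroseconds(250);
}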

void cassette_write_byte(unsigned char val) {
  // Most significant bit first: 500 us half cycles for a 1, 250 us for a 0
  for(unsigned char i = 8; i != 0; --i) {
    digitalWrite(SPEAKER_PIN, HIGH);
    delayMicroseconds((val&_BV(i-1)) ? 500 : 250);
    digitalWrite(SPEAKER_PIN, LOW);
    delayMicroseconds((val&_BV(i-1)) ? 500 : 250);
  }
}

void cassette_write_block(unsigned short A1, unsigned short A2) {
  unsigned char checksum = 0xFF, val = 0;
  for(unsigned short addr = A1; addr <= A2; ++addr) {
    val = read8(addr);
    cassette_write_byte(val);
    checksum ^= val;
  }
  cassette_write_byte(checksum);
  digitalWrite(SPEAKER_PIN, HIGH);
  delay(10);
  digitalWrite(SPEAKER_PIN, LOW);
}

float cassette_center_voltage = 512; // running estimate of the signal midpoint, in ADC counts
boolean cassette_read_state() { // software zero crossing detector
  static boolean zerocross_state = false;
  short adc = (analogRead(CASSETTE_READ_PIN) - (short)cassette_center_voltage); // offset from midpoint
  cassette_center_voltage += adc*0.05f;  // slowly track bias drift
  // hysteresis of 7 ADC counts (~7 mV with the 1.1 V reference)
  if(zerocross_state && adc < -7) zerocross_state = false;
  else if(!zerocross_state && adc > 7) zerocross_state = true;
  return zerocross_state;
}

short cassette_read_transition() {
  unsigned long start_time;
  static boolean last = false;
  boolean cur = last;
  // loop until state transition
  for(start_time = micros();cur == last;) cur = cassette_read_state();
  last = cur;
  //return duration of transition us
  return micros() - start_time;
}

boolean cassette_read_block(unsigned short A1, unsigned short A2) {
  short bitperiod;
  unsigned char val, checksum = 0xFF, datachecksum = 0x00;
  for(short i = 0; i < 10000; ++i) cassette_read_state();
  cassette_read_transition(); //tape in edge
  cassette_read_transition();
  delay(500); //settling delay
  while(cassette_read_transition() > 300); //find sync
  cassette_read_transition(); //skip second cycle sync
  for(unsigned short addr = A1; addr <= A2; ++addr) {
    val = 0;
    for(unsigned char i = 8; i != 0; --i) {
      bitperiod = (cassette_read_transition() + cassette_read_transition()) / 2;
      if(bitperiod > 300) val |= _BV(i-1);
    }
    write8(addr, val); // write byte
    checksum ^= val; //checksum
  }
  for(unsigned char i = 8; i != 0; --i) { //read checksum
    bitperiod = (cassette_read_transition() + cassette_read_transition()) / 2;
    if(bitperiod > 300) datachecksum |= _BV(i-1);
  }
  return (datachecksum == checksum);
}

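#ifndef sbi // set/clear one bit of an I/O register; define them if the toolchain does not
#define sbi(sfr, bit) (_SFR_BYTE(sfr) |= _BV(bit))
#define cbi(sfr, bit) (_SFR_BYTE(sfr) &= ~_BV(bit))
#endif

void cassette_begin() {
  // ADC prescaler of 16: 1 MHz ADC clock, roughly 77 kHz sample rate
  sbi(ADCSRA,ADPS2);
  cbi(ADCSRA,ADPS1);
  cbi(ADCSRA,ADPS0);
  digitalWrite(CASSETTE_READ_PIN, HIGH); // internal pull-up
  analogReference(INTERNAL); // 1.1 V reference
}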

Apple II Cassette Demodulation Using Arduino.

The above code generates and decodes Apple II formatted cassette data. It uses a very similar algorithm to Steve Wozniak’s original approach, but it allows me to use the handy delay and time routines. This code could potentially be used for other projects. Instead of using tape, I recorded audio data to my mobile phone.
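The hooks mentioned earlier, which hand the monitor's tape routines over to native code, can be sketched roughly as follows. Treat every detail as an assumption: the entry addresses $FECD (WRITE) and $FEFD (READ) and the A1/A2 pointers at $3C - $3F are taken from the published monitor listing rather than from my emulator source, and the names are invented for this example. The idea is to check the program counter before each instruction, and when it lands on a hooked routine, do the work natively and then fake the RTS.

```c
#include <stdint.h>

#define HOOK_WRITE_ADDR 0xFECDu   /* assumed monitor tape WRITE entry */
#define HOOK_READ_ADDR  0xFEFDu   /* assumed monitor tape READ entry  */

enum { TAPE_HOOK_NONE = 0, TAPE_HOOK_WRITE = 1, TAPE_HOOK_READ = 2 };

typedef struct {
    uint16_t pc;
    uint8_t sp;
    uint8_t mem[65536];   /* flat memory, desktop build only */
} tape_cpu_t;

static uint8_t pull8(tape_cpu_t *c) {
    c->sp++;
    return c->mem[0x0100u + c->sp];
}

/* If pc sits on a hooked routine: report which one, fetch the block
 * range from A1/A2, and return to the caller as RTS would. */
int tape_hook_check(tape_cpu_t *c, uint16_t *a1, uint16_t *a2) {
    int which;
    if (c->pc == HOOK_WRITE_ADDR)     which = TAPE_HOOK_WRITE;
    else if (c->pc == HOOK_READ_ADDR) which = TAPE_HOOK_READ;
    else return TAPE_HOOK_NONE;

    *a1 = (uint16_t)(c->mem[0x3C] | (c->mem[0x3D] << 8));
    *a2 = (uint16_t)(c->mem[0x3E] | (c->mem[0x3F] << 8));

    /* Emulated RTS: JSR pushed (return address - 1), low byte on top. */
    uint16_t lo = pull8(c);
    uint16_t hi = pull8(c);
    c->pc = (uint16_t)((lo | (hi << 8)) + 1u);
    return which;
}
```

The main loop would call tape_hook_check() before each instruction and, on a hit, pass a1 and a2 to cassette_write_block() or cassette_read_block().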

Analog Input Circuitry For Cassette Port
Analog Input Circuitry For Arduino.

The analog audio interface on the Arduino is equally simple: all that is needed is a 4.7k resistor and a 10 µF capacitor. In order to increase the input sensitivity of the Arduino’s ADC I configured the Arduino to use its internal 1.1 volt reference. I then enabled the internal pull-up on the ADC pin, which allowed me to use a single biasing resistor and capacitor.

At this point I had everything I needed in order to boot up an emulated Apple II. At first I configured the emulator code to write characters to the serial port. This proved highly successful and paved the way for further experiments.

Performance Measurement

Device     Instructions      Microseconds
Real       Approx. 10,000    ~30,000
Emulated   10,000            192,000

Approximately 5 - 8x slower than the MOS 6502 clocked at 1 MHz.

The ATmega328P on the Arduino Uno comes with 2 kilobytes of RAM, significantly less than the 4 kilobytes of the original Apple II. As established earlier, however, not all of that RAM was available for general use. Through some smart design the Arduino emulator provides 1.5 kilobytes of general purpose memory, leaving nearly 1 kilobyte for BASIC programs. This has proved sufficient to handle fairly complex programs, as demonstrated below.

Completed Apple II Emulator
Completed Apple II Emulator Showing The Complete Hardware.

Now that I had completed the hardware for the Apple II it was time to write some software. I needed a demo to run and prove the device was functional. After seeing a recent article on calculating the Mandelbrot set on early mainframe computers I decided I would attempt to replicate the project using Integer BASIC.

The algorithm I chose is known as the escape time algorithm. It repeats a calculation for each point and counts the number of iterations required before the equation begins to diverge. Points whose iteration count reaches a threshold are considered to belong to the Mandelbrot set, and points that diverge earlier are not. It’s brute force but it’s very simple and memory efficient.

1 DIM LINE$(31)
2 FOR PY=1 TO 15
3 FOR PX=1 TO 31
4 X=0
5 XT=0
6 Y=0
7 FOR I=0 TO 11
8 XT = (X*X)/10 - (Y*Y)/10 + (PX-23)
9 Y = (X*Y)/5 + (10*PY - 75)/8
10 X = XT
11 IF (X/10)*X + (Y/10)*Y >= 400 THEN GOTO 15
12 NEXT I
13 LINE$(PX)="*" 
14 GOTO 16
15 LINE$(PX)=" " 
16 NEXT PX
17 PRINT LINE$
18 NEXT PY
19 END

Integer BASIC Program To Calculate The Mandelbrot Fractal.

Calculating the Mandelbrot set takes a few minutes on the emulated hardware. The end result is fairly neat considering it was done with 16-bit fixed point math on an Arduino Uno.
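For comparison, the same fixed point arithmetic can be written in C. This is a direct transliteration of the BASIC listing rather than a tuned implementation: values are scaled by 10, the loop runs 12 times, and the escape test uses the same expression. The function names are invented here. One caveat is that C integer division truncates toward zero, and I haven't checked whether Integer BASIC treats negative quotients identically, so pixels right on the boundary could differ from the emulated output.

```c
#define MAX_ITER 12   /* FOR I=0 TO 11 */

/* Returns 1 if pixel (px, py) never escapes within MAX_ITER iterations. */
int mandel_in_set(int px, int py) {
    int cx = px - 23;                 /* real part, scaled by 10      */
    int cy = (10 * py - 75) / 8;      /* imaginary part, scaled by 10 */
    int x = 0, y = 0;
    for (int i = 0; i < MAX_ITER; ++i) {
        int xt = (x * x) / 10 - (y * y) / 10 + cx;
        y = (x * y) / 5 + cy;         /* uses the old x, as in line 9 */
        x = xt;
        if ((x / 10) * x + (y / 10) * y >= 400) return 0;
    }
    return 1;
}

/* Fill one 31-character row, '*' for points inside the set. */
void mandel_render_row(int py, char out[32]) {
    for (int px = 1; px <= 31; ++px)
        out[px - 1] = mandel_in_set(px, py) ? '*' : ' ';
    out[31] = '\0';
}
```

Calling mandel_render_row() for py = 1 to 15 and printing each row mirrors the structure of the BASIC program.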

Completed Mandelbrot Fractal
Displaying The Mandelbrot Fractal.

There are a couple of small bugs and possible improvements I’ve noticed. The keyboard code doesn’t reset the bit counter appropriately, so very occasionally (quite rarely) it’s possible when resetting the machine to get the keyboard out of sync; I need to implement a timeout. Also I think there are some small bugs with the flashing character functionality. I swear I once saw a back to front flashing “R”, I have absolutely no idea how that happened!

I’d love to add some of the graphics modes functionality but I’ll need more memory for that! Linked below is the source code to the emulator and the video display firmware.

Arduino Apple ][ Sketch: APPLEII.zip

Video Generator Source/Hex: VGAApple.s / VGAApple.hex

Unit Tests For 6502 Processor: 6502tests.zip

As always feel free to Reach Out (LinkedIn) / Follow Me (Twitter). Always interested in people and opportunities!

 
Subscribe to Damian Peckett
