ADAPTATION OF A FAULT–TOLERANT FPGA–BASED LAUNCH SEQUENCER AS A CUBESAT PAYLOAD PROCESSOR

by

Jordan K. Goff

June 2014


Approved for public release; distribution is unlimited
The purpose of this thesis is to design and test a fault-tolerant reduced instruction set computer processor running a subset of the multiprocessor without interlocked pipelined stages instruction set. This processor is implemented on a field programmable gate array (FPGA) and will be used as the foundation for a payload processor on a cube satellite developed at the Naval Postgraduate School.

This thesis begins by considering the radiation effects present in the space environment and the various fault-tolerant designs used to guard against specific types of particle events. The internal triple modular redundancy method is selected and implemented at each pipeline stage of the processor. Next, a target FPGA is selected based on the performance requirements of the processor. The Virtex-5 (registered trademark of Xilinx, Inc.) is selected over the ProASIC3 (registered trademark of Microsemi, Inc.) due to its enhanced capabilities and potential to support expansion for future applications.

The hardware design is presented as a hybrid Verilog and schematic based design. The system consists of the processor and a universal asynchronous receiver/transmitter that reads and writes data received from a generic serial interface. The device is simulated to ensure proper logic functionality. Conclusions and future work are discussed.
ADAPTATION OF A FAULT–TOLERANT FPGA–BASED LAUNCH SEQUENCER AS A CUBESAT PAYLOAD PROCESSOR

Jordan K. Goff
Lieutenant, United States Navy
B.S., Rensselaer Polytechnic Institute, 2007

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

from the

NAVAL POSTGRADUATE SCHOOL
June 2014

Author: Jordan K. Goff

Approved by: Herschel H. Loomis, Jr.
Thesis Co-Advisor

James H. Newman
Thesis Co-Advisor

Clark Robertson
Chair, Department of Electrical and Computer Engineering
THIS PAGE INTENTIONALLY LEFT BLANK
ABSTRACT

The purpose of this thesis is to design and test a fault–tolerant reduced instruction set computer processor running a subset of the multiprocessor without interlocked pipelined stages instruction set. This processor is implemented on a field programmable gate array (FPGA) and will be used as the foundation for a payload processor on a cube satellite developed at the Naval Postgraduate School.

This thesis begins by considering the radiation effects present in the space environment and the various fault–tolerant designs used to guard against specific types of particle events. The internal triple modular redundancy method is selected and implemented at each pipeline stage of the processor. Next, a target FPGA is selected based on the performance requirements of the processor. The Virtex–5 (registered trademark of Xilinx, Inc.) is selected over the ProASIC3 (registered trademark of Microsemi, Inc.) due to its enhanced capabilities and potential to support expansion for future applications.

The hardware design is presented as a hybrid Verilog and schematic based design. The system consists of the processor and a universal asynchronous receiver/transmitter that reads and writes data received from a generic serial interface. The device is simulated to ensure proper logic functionality. Conclusions and future work are discussed.
# TABLE OF CONTENTS

I. INTRODUCTION AND OBJECTIVES
   A. PURPOSE
   B. PREVIOUS WORK
      1. NPSCuL Development
      2. Launch Sequencer
      3. Fault–tolerant, Pipelined Processor
   C. DESIGN METHODOLOGY AND THESIS ORGANIZATION

II. RADIATION HARDENING AND FAULT–TOLERANCE
   A. RADIATION HARDENING
      1. Single–Event Effect
      2. Single–Event Upset
      3. Single–Event Transient
      4. Single–Event Latch–up
      5. Single–Event Burnout
   B. FAULT–TOLERANCE METHODS
      1. Triple Modular Redundancy
         a. Block TMR
         b. Local TMR
         c. Global TMR
         d. Internal TMR
      2. Other Fault–tolerance Approaches
      3. Selection of a Fault–Tolerant Scheme for the Payload Processor
   C. CHAPTER SUMMARY

III. DESIGN REQUIREMENTS AND SELECTION OF AN FPGA DEVICE
   A. INTRODUCTION
   B. PERFORMANCE REQUIREMENTS
      1. CLB Capability
      2. RADHARD Requirements
      3. Memory Requirements
      4. Clocking Requirements
      5. I/O Requirements
      6. SWAP Requirements
   C. CANDIDATE FPGA DEVICES
      1. Actel ProASIC3
         a. Architecture
         b. Radiation Tolerance
         c. Memory Capabilities
         d. I/O Capabilities
         e. Clock Management
         f. Power Consumption and Distribution
LIST OF FIGURES

Figure 1. Standard 1U CubeSat ................................................................. 4
Figure 2. Eight P-PODs within a NPSCuL box ................................. 5
Figure 3. BTMR circuit structure with majority voter ..................... 12
Figure 4. Local TMR circuit structure with one majority voter ......... 13
Figure 5. Global TMR circuit structure with three clocks and three majority voters .......... 14
Figure 6. Internal TMR structure with a single clock and three majority voters ......... 15
Figure 7. Generic internal structure of an FPGA featuring CLBs, switching interconnects, and input/output buffers (IOBs), from [10] .................. 20
Figure 8. Two Actel ProASIC3 boards developed for flight testing by Parobek........ 26
Figure 9. Actel ProASIC3 development board .................................... 27
Figure 10. Digilent Genesys Virtex–5 development board ............. 27
Figure 11. Standard layout of a Military ProASIC3EL FPGA, from [20] .......... 28
Figure 12. Standard VersaTile on the Actel ProASIC3EL FPGA, from [21] ..... 29
Figure 13. Xilinx Virtex–5 SLICEL CLB, from [24] ......................... 33
Figure 14. Xilinx Virtex–5 SLICEM CLB, from [24] ....................... 33
Figure 15. Standard CLB format of a Virtex–5 FPGA, from [24] ............ 34
Figure 16. Organization of I/O banks within the Virtex–5 FPGA, from [24] ...... 36
Figure 17. Payload processor pipeline architecture, from [5] ............. 39
Figure 18. Three primary MIPS instruction formats supported by the payload processor ......................................................... 40
Figure 19. Implementation of ITMR in the payload processor pipeline ....... 41
Figure 20. Schematic view of the payload processor IF stage modules ....... 42
Figure 21. Schematic view of the payload processor ID stage modules ...... 43
Figure 22. Schematic view of the register file internal organization .......... 45
Figure 23. Schematic view of the payload processor EX stage modules ....... 47
Figure 24. Schematic view of the data memory module in the MEM stage .......... 50
Figure 25. Schematic view of the branch multiplexer and address selector modules in the MEM stage .......................................................... 51
Figure 26. Schematic view of the PC selector module and its connection to the PC register ................................................................. 52
Figure 27. Schematic view of the WB stage multiplexer ........................ 53
Figure 28. Schematic view of the UART’s connection to the payload processor .... 54
Figure 29. Waveform outputs of the triplicated PC register ................. 60
Figure 30. ID stage outputs for the I–format instructions ANDI, and ORI ........ 63
Figure 31. EX stage outputs for the I–format instructions ANDI and ORI .......... 64
Figure 32. WB stage outputs for the I–format instructions ANDI and ORI .......... 65
Figure 33. ID stage inputs and outputs for the R–format instructions AND, OR, XOR, and NOR ................................................................. 66
Figure 34. EX stage outputs for the R–format instructions AND, OR, XOR, and NOR ................................................................. 67
Figure 35. WB stage outputs for the R–format instructions AND, OR, XOR, and NOR ................................................................. 68
<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>36</td>
<td>ID stage outputs for the load word and store word instructions.</td>
</tr>
<tr>
<td>37</td>
<td>EX stage outputs for the load word and store word instructions.</td>
</tr>
<tr>
<td>38</td>
<td>MEM stage outputs for the load word and store word instructions.</td>
</tr>
<tr>
<td>39</td>
<td>ISim object panel display of the first eight bytes of data memory after simulation.</td>
</tr>
<tr>
<td>40</td>
<td>WB stage outputs for the load word and store word instructions.</td>
</tr>
<tr>
<td>41</td>
<td>ID stage outputs for the BEQ instruction.</td>
</tr>
<tr>
<td>42</td>
<td>EX stage signals while executing the BEQ instruction.</td>
</tr>
<tr>
<td>43</td>
<td>MEM stage inputs and outputs of the branch instruction.</td>
</tr>
<tr>
<td>44</td>
<td>PC selector module and PC register transitions upon receipt of a branch address.</td>
</tr>
<tr>
<td>45</td>
<td>IF stage outputs of the jump instruction.</td>
</tr>
<tr>
<td>46</td>
<td>MEM stage and PC selector module outputs of the jump instruction.</td>
</tr>
<tr>
<td>47</td>
<td>Inputs and outputs of the UART receive system during testing.</td>
</tr>
<tr>
<td>48</td>
<td>IRQ processing performed by the UART and PC selector module.</td>
</tr>
<tr>
<td>49</td>
<td>MEM stage inputs and outputs during a UART receive operation.</td>
</tr>
<tr>
<td>50</td>
<td>Load word instruction referencing the UART transmit address space.</td>
</tr>
<tr>
<td>51</td>
<td>Data memory module inputs and outputs during the UART transmission process.</td>
</tr>
<tr>
<td>52</td>
<td>UART signal transitions during a transmission.</td>
</tr>
</tbody>
</table>
LIST OF TABLES

Table 1. Minimum required I/O standards to be supported by a FPGA supporting the CubeSat payload processor .................................................................24
Table 2. I/O standards, types, voltages, and operating frequencies for the ProASIC3L, after [19]. ....................................................................................30
Table 3. I/O standards, types, voltages, and operating frequencies for the Virtex–5, after [25]...............................................................................................36
Table 4. Control module flag effects when asserted and de–asserted, after [29] .........44
Table 5. Listing of the ALU operations necessary to implement the subset of MIPS core instructions used by the payload processor.........................................48
Table 6. EX stage comparison flags and their assertion criteria..........................48
Table 7. Gray code sequence of inputs to the pipeline registers..........................60
Table 8. Assembly program used for processor testing.....................................61
# LIST OF ACRONYMS AND ABBREVIATIONS

<table>
<thead>
<tr>
<th>Acronym</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALU</td>
<td>arithmetic and logic unit</td>
</tr>
<tr>
<td>BTMR</td>
<td>block TMR</td>
</tr>
<tr>
<td>CLB</td>
<td>configurable logic block</td>
</tr>
<tr>
<td>COTS</td>
<td>commercial off-the-shelf</td>
</tr>
<tr>
<td>CubeSat</td>
<td>cube satellite</td>
</tr>
<tr>
<td>DOD</td>
<td>Department of Defense</td>
</tr>
<tr>
<td>ECC</td>
<td>error correcting code</td>
</tr>
<tr>
<td>EELV</td>
<td>evolved expandable launch vehicle</td>
</tr>
<tr>
<td>ESPA</td>
<td>EELV secondary payload adapter</td>
</tr>
<tr>
<td>EX</td>
<td>execution (pipeline stage)</td>
</tr>
<tr>
<td>FF</td>
<td>flip-flop</td>
</tr>
<tr>
<td>FPGA</td>
<td>field programmable gate array</td>
</tr>
<tr>
<td>GTL</td>
<td>Gunning transistor logic</td>
</tr>
<tr>
<td>GTLP</td>
<td>Gunning transistor logic plus</td>
</tr>
<tr>
<td>GTMR</td>
<td>global TMR</td>
</tr>
<tr>
<td>HDL</td>
<td>hardware description language</td>
</tr>
<tr>
<td>HSTL</td>
<td>high-speed transceiver logic</td>
</tr>
<tr>
<td>ID</td>
<td>instruction decode</td>
</tr>
<tr>
<td>IF</td>
<td>instruction fetch</td>
</tr>
<tr>
<td>I/O</td>
<td>input/output</td>
</tr>
<tr>
<td>ISR</td>
<td>interrupt service routine</td>
</tr>
<tr>
<td>ITMR</td>
<td>internal TMR</td>
</tr>
<tr>
<td>JTAG</td>
<td>joint test action group</td>
</tr>
<tr>
<td>LED</td>
<td>light-emitting diode</td>
</tr>
<tr>
<td>LET</td>
<td>linear energy transfer</td>
</tr>
<tr>
<td>LTMR</td>
<td>local TMR</td>
</tr>
<tr>
<td>LUT</td>
<td>look-up table</td>
</tr>
<tr>
<td>LVDS</td>
<td>low-voltage differential signaling</td>
</tr>
<tr>
<td>LVPECL</td>
<td>low-voltage positive emitter-coupled logic</td>
</tr>
<tr>
<td>MBU</td>
<td>multiple bit upset</td>
</tr>
</tbody>
</table>
MEM memory (pipeline stage)
MIPS multiprocessor without interlocked pipelined stages
NPS Naval Postgraduate School
NPSCuL NPS CubeSat launcher
PAR place and route
PCB printed circuit board
PCI peripheral component interconnect
P–POD poly–picosatellite orbital deployer
QFDR quadruple force decide redundancy
RADHARD radiation hardened
RAM random–access memory
RISC reduced instruction set computing
ROM read–only memory
SBU single bit upset
SEB single–event burnout
SEE single–event effect
SEL single–event latch–up
SET single–event transient
SEU single–event upset
SDRAM synchronous dynamic RAM
SRAM static RAM
SSTL stub series terminated logic
SWAP size, weight, area, and power
TID total ionizing dose
TIR triplicated interwoven redundancy
TMR triple modular redundancy
UART universal asynchronous receiver/transmitter
USB universal serial bus
WB write–back
EXECUTIVE SUMMARY

The purpose of this research is to design a fault–tolerant, 32–bit, reduced instruction set computing (RISC) processor that interfaces with a universal asynchronous receiver/transmitter (UART) for a field programmable gate array (FPGA). This system serves as the first step towards developing a complete payload processor for a cube satellite (CubeSat) that will successfully interface with a sensor payload device attached via a serial connection. Development of the payload processor is a critical step in the advancement of CubeSat technology because it will provide the interface to new functionalities such as attitude control with star–trackers, imagery, and on–orbit data processing.

Outer space provides a challenging environment in which to deploy and operate electronics. A CubeSat operating in low–Earth orbit is susceptible to numerous high energy particle collisions resulting from solar wind, galactic cosmic rays, and exposure to the inner Van Allen radiation belts. Collisions between the spacecraft and these particles resulting in a disruption of electronic signals within the processor are referred to as single–event effects (SEEs). Certain types of SEEs can be prevented through the implementation of fault–tolerant logic designs. The fault–tolerant design methods considered in this thesis are capable of preventing single–event upsets, a type of SEE affecting the state of a single bit in the processor. It was determined that internal triple modular redundancy (ITMR) would be the most suitable fault–tolerant method to handle errors within the processor pipeline. This is due primarily to its ability to detect and correct single bit errors with no interruption to the processor.

The processor design features a standard five–stage pipeline with fetch, decode, execute, memory read/write, and write–back stages. Each pipeline stage is a collection of combinatorial logic components taking input from a pipeline register and providing output to the next pipeline register. The pipeline registers implement the aforementioned ITMR fault–tolerance, which is a set of three voter circuits for each data element and control signal passed between stages. The processor contains an arithmetic and logic unit capable of eleven mathematical operations. Instruction and data memory are triplicated in 

xvii
the design, constituting three identical blocks of each. Both memories are byte–
addressable. This was due in part to the simplicity of using the fixed–length
microprocessor without interlocked pipeline stages (MIPS) instruction set architecture
(ISA) and to enable the data memory to write single bytes it receives from the UART.

The UART is implemented as a memory–mapped input/output (I/O) device that
writes single–byte parallel data to a specified block within data memory and reads
single–byte parallel data from a specified address in data memory. An interrupt service
routine (ISR) is also included within the instruction memory for data to be written by the
UART. When the UART has queued data it is ready to write into processor memory, it
signals the interrupt forcing the processor to save its next program counter address and
jump to the ISR. Once complete, the ISR jumps back to the previously saved address and
pipeline execution can continue as before the interrupt. This implementation is somewhat
elementary and currently only supports one I/O interface; however, the ISR is only
executed to write data from the UART to data memory and does not require context
preservation of the registers.

Testing and verification of the payload processor was accomplished using
behavioral simulation in two stages. First, the processor had to demonstrate it correctly
handled each instruction within its ISA. This was verified by running a short program in
which the processor attempts to execute at least one of each type of instruction. Using the
Xilinx ISE\textsuperscript{1} Simulator (ISim), we verified the output of each pipeline stage at each clock
cycle. The second testing stage involved the integration of the UART and ISR as part of
the system. Again, ISim was used to verify the processor entered and exited the ISR and
wrote the data into the correct memory location. The ISA supported on the processor
features 24 of the most common MIPS instructions capable of performing most functions
desired on a RISC processor. Advanced multiply, divide, floating point, and other
pseudo–instructions are not supported in this evolution; however, the processor can be
further scaled to support their implementation.

\textsuperscript{1}ISE\textsuperscript{®} is a registered trademark of Xilinx, Inc.
A basic framework for the payload processor on which advanced payloads will eventually be supported for cube satellites was presented in this thesis. Each payload must be capable of passing data to a RISC processor that can perform processing and storage functions on its data. Future work should seek to develop a full–featured, fully tested, fault–tolerant processor supporting a wide variety of payloads. Continued development of the payload processor technology for CubeSats will offer the DOD a low–cost solution for space sensors and communications and ensure dominance in the space environment.
ACKNOWLEDGMENTS

I would like to thank God for all of the opportunities I continue to receive that enhance my career and education. I would also like to thank my family for supporting and encouraging me to work hard and finish my research, often from thousands of miles away. Prof. Herschel Loomis and Prof. Jim Newman have been fantastic advisors throughout this journey. Their patience and willingness to allow me to make mistakes was greatly appreciated and admired. Finally, I must thank the entire faculty and staff who contributed to my education while at NPS. They are a team of professionals who are truly dedicated to helping me achieve my goals, and I could not have completed my degree without them.
THIS PAGE INTENTIONALLY LEFT BLANK
I. INTRODUCTION AND OBJECTIVES

A. PURPOSE

Operation of electronic devices in the space environment requires increased redundancy in component design due to the increased probability of high–energy particle collisions. The sources of these particles are often the solar wind or more violent solar events such as galactic cosmic rays (GCRs), coronal mass ejections (CMEs), and solar flares. It is also possible for particles within the Earth’s magnetosphere and those arriving from more distant stars to interfere with electronic devices on–orbit. Collisions with these particles can have damaging effects on spacecraft electronics. One solution to this issue is to construct electronic devices with radiation–hardened materials; however, this process raises the cost of the component significantly. Another solution is to design onboard spacecraft electronics with added logic redundancy, known as fault–tolerance. Development of fault–tolerant machines is of primary interest to the Naval Postgraduate School Cube Satellite (CubeSat) Program.

Previous research performed by the Naval Postgraduate School CubeSat Launcher (NPSCuL) program has focused on the design of a fault–tolerant launch sequencer that precisely determines the correct time and order in which to launch satellites once on orbit [1], [2]. Similar design concepts employed to create the launch sequencer can also be used to develop a payload processor, which will eventually provide an interface to a sensor device on any satellite. Methods and architectures previously used to develop the launch sequencer technology are applied towards developing a payload processor that interfaces with a serial–to–parallel I/O interface in this thesis.

Field programmable gate arrays (FPGAs) can be purchased in several different varieties, with each one offering a unique combination of logic functionality and reliability. Models generally fall within the categories of commercial, industrial, military, and radiation–hardened (RADHARD). Commercial and industrial FPGAs are suited for use in common electrical applications. Military–grade FPGAs are typically manufactured with higher quality packaging and thermal characteristics. A RADHARD model
generally refers to a FPGA that has electronic and structural features providing resistance to all types of particle events including long–term total dose effects. CubeSat research at NPS is primarily interested in the application of commercial–grade FPGAs [1], [2]. The benefits of producing a payload processor using fault–tolerant techniques embedded in a commercial off–the–shelf (COTS) FPGA are cost reduction, re–configurability, and ownership of proprietary rights to the processor design. These benefits are explained in greater detail in the following paragraphs.

Implementing fault–tolerance on a COTS FPGA is substantially cheaper than purchasing a RADHARD FPGA. A RADHARD model can cost ten to one–hundred times a COTS model. While the inherent reliability of a RADHARD FPGA is lost by using a COTS device, the possibility of more COTS devices being purchased and installed on CubeSats increases. Lowering cost and increasing reproducibility are primary motivating factors of CubeSat development. The gains made in cost and sacrifices made in hardware reliability must be met with the appropriate fault–tolerant design scheme to ensure the satellite is capable of operating in the space environment.

The primary advantage of using a re–programmable logic device, particularly an FPGA, is its ability to reconfigure and test its logic functionality throughout development and while on orbit. This introduces the possibility of resetting the processor in the event of a hard logic fault or re–defining the processor capabilities on orbit to support new missions. Partial reconfiguration of the device is also possible and would allow updates to occur while the existing configuration is still operating. The ability to reconfigure a CubeSat’s processor while on orbit was discussed by Parobek in his thesis [1] but is still outside the scope of this thesis; however, establishing an initial design for a payload processor represents the next step towards implementing this capability.

Finally, by implementing a custom payload processor design on a FPGA, the Department of Defense (DOD) retains proprietary rights over the associated software files and the ability to make modifications that exclusively enhance its own mission. It might become desirable to perform specific digital signal processing (DSP) algorithms on imagery or tune an antenna to receive frequencies within a very narrow band. If the payload processor is developed from base logic components using DOD intellectual
property, these tasks can be considered as future design requirements or configuration revisions. This would provide the DOD with substantial control over the implementation of its CubeSat assets.

The goals of this thesis are as follows:

- Determine the best fault–tolerant architecture to use in the payload processor.
- Analyze two types of FPGAs and determine which device is most suitable to support the payload processor.
- Produce a software design of the fault–tolerant processor using a software package that supports schematics and HDL code.
- Interface the processor with a generic serial–to–parallel device (also designed in schematic or HDL code).
- Test the processor design using logic simulation software.

Complete design of the processor and its implementation on the selected FPGA cannot be realized at this early stage of development. A significant amount of further testing and development will be necessary to implement the processor on a FPGA. The focus of this thesis is on laying a basic foundation for the processor on which more advanced methods and technologies can be tested.

B. PREVIOUS WORK

1. NPSCuL Development

NPSCuL supports the development and testing of DOD nano–satellite technologies by deploying CubeSats as secondary payloads on DOD launch vehicles. Successful demonstrations of CubeSats with sensory payloads could result in important surveillance and communications capabilities to be gained by the DOD. CubeSats are constructed in three varieties based on their size. The smallest is a single (1U) satellite that is 100.0 mm in transverse width and 113.5 mm in height [3]. The height of the CubeSat can be elongated to make double (2U) and triple (3U) CubeSats with heights of 227.0 mm and 340.5 mm, respectively [3]. A typical 1U CubeSat similar to those developed at NPS is displayed in Figure 1.
After a CubeSat is constructed it must be appropriately mounted on the launch vehicle for deployment. The eight modules shown inside of the NPSCuL box in Figure 2 are poly–picosatellite orbital deployers (P–PODs) and are responsible for housing CubeSats until they are ready to be deployed. The P–PODs are typically grouped up to a set of eight inside of an NPSCuL box connected to an Evolved Expandable Launch Vehicle (EELV) Secondary Payload Adapter (ESPA). The ESPA provides the mechanical coupling of the P–POD to a variety of launch vehicles. Once the appropriate orbit is achieved, the launch sequencer opens the individual P–POD doors and deploys the CubeSats, which are ejected by a spring.
2. **Launch Sequencer**

The task of reliably sequencing deployment of CubeSats from a P–POD was the motivation for the launch sequencer developed by Parobek and Brandt in their theses [1], [2]. From this work, it was determined that the Actel ProASIC3 FPGA was best suited to perform the launch sequencer function of the CubeSat. This was due primarily to the resistance of its flash configuration memory to single–event upsets (SEUs); however, it was noted by Parobek that for a more general payload processor, the Xilinx Virtex–5 model FPGA would be a better choice because of its increased computing capabilities [1]. Both products are reconsidered, and which FPGA model is most capable of implementing a payload processor is determined in this thesis.

---

1 ProASIC3® is a registered trademark of Microsemi, Inc.
2 Virtex-5® is a registered trademark of Xilinx, Inc.
3. Fault–tolerant, Pipelined Processor

Substantial influence in the design of the payload processor was generated through the thesis work of Majewicz [4]. His thesis culminated in the design of the “Pix” triple modular redundancy (TMR) processor. The “Pix” is a 32-bit, pipelined microprocessor without interlocked pipeline stages (MIPS) design that uses internal TMR (ITMR) vice an interrupt service routine (ISR) to correct SEUs encountered in its pipeline. This rapid correction of SEUs combined with the adaptability of a MIPS architecture provided a solid framework for the payload processor; however, the pre-packaged “Pix” will not automatically function as a payload processor because it does not include a method to handle I/O interrupts. It is first necessary to update the design for a new hardware description language (HDL) and version of the Xilinx ISE Webpack software suite. Additionally, an I/O interrupt handling method was implemented that determined how the processor would read and write I/O data.

C. DESIGN METHODOLOGY AND THESIS ORGANIZATION

The first step in designing the payload processor was to determine the fault–tolerance method that would best support its desired function. A consideration of several variations of both triplicated and quadruplicated logic redundancies in addition to the potential single–event effects (SEEs) is discussed in Chapter II. The ideal fault–tolerance method offers sufficient redundancy against an SEU with minimal impact to logic design resources and signal propagation time through the circuit.

A comparison of the basic performance requirements to the available resources on FPGAs, and a selection on which device and development software to implement the payload processor design is discussed in Chapter III. Generic CubeSats typically have loosely–defined design criteria. Quantification of the design parameters and requirements often is not finalized until the mission of a particular satellite is known; thus, the focus of device selection was dedicated towards determining the most capable FPGA with potential for expansion to support future missions.

Following selection of the target device, we discuss the design of the logic components and organization of the payload processor in Chapter IV. The design closely
follows a typical five–stage pipeline proposed by Patterson and Hennessy [5]. This design does not support all of the associated instructions in the MIPS core instruction set architecture (ISA). The instructions currently supported by the payload processor are detailed in Appendix C. These instructions represent a baseline set that allow the processor to perform most functions available in the core ISA. The selected instructions focus on memory loads and stores, register manipulation, and program execution.

In addition to the pipeline processor, a universal asynchronous receiver/transmitter (UART) was constructed to serve as an input/output (I/O) interface with the processor. Generically, a UART is representative of several I/O interfaces that exist on a modern personal computer including universal serial bus (USB), RS–232C, and Ethernet. The UART employed in this thesis is not robust enough to connect the payload processor to a standardized serial interface; however, the functionality included in the control signals and data closely match the RS–232C format and were used to simulate the processor’s ability to handle interrupts from an I/O device.

A summary of the testing procedures used to determine that the design was fully functional is presented in Chapter V. The first testing phase was to verify the processor could perform all instructions included in the instruction set. This was accomplished by loading the instruction memory with a short program exercising its entire ISA. This method allows troubleshooting to occur at each pipeline stage by viewing the output of that stage’s combinatorial logic. Secondly, the TMR capabilities of the processor had to be tested with known errors to ensure an SEU does not propagate through more than one pipeline stage of the processor. Finally, testing of the interrupt service routine associated with the I/O interface was necessary to ensure the processor can communicate with a sensor device through a serial interface.
II. RADIATION HARDENING AND FAULT–TOLERANCE

A. RADIATION HARDENING

In the space environment, integrated circuits are not as protected from high energy particle events and radiation as those operating below the ionosphere. Even satellites traveling in low–Earth orbit (LEO) are exposed to hazardous solar events which can induce havoc within their circuitry. The impact of long–term radiation on a computer processor is generally referred to as total dose effects and is elaborated upon in Chapter III. Immediate events caused by this radiation are referred to as SEEs; however, subdivisions of these phenomena have also been classified according to their cause and severity [6]. Particle events significant to CubeSats are detailed in the following paragraphs.

1. Single–Event Effect

SEE is the general term pertaining to damage incumbent on an electronic component resulting from a high–energy particle event. SEEs can be classified into two categories based on the type of errors they produce. Soft error SEEs cause a temporary disruption in the operation of the processor. Recovery from a soft error SEE can be accomplished by clearing the processor to a safe state before resuming operation. Hard error SEEs permanently affect the operation of one or more components in the processor. These errors cannot be overcome while on–orbit by means other than physical repair of the damaged satellite component. Hard error SEEs are rare but can cause partial loss of function or total loss of a spacecraft. The fault–tolerance methods explored in this thesis are not capable of combating hard error SEEs. FPGAs generally must be designed with additional shielding and smaller feature size among other methods to significantly reduce the probability of a hard error SEE.

2. Single–Event Upset

SEUs are soft errors that cause unintended logic signal inversions to occur in memory modules associated with the processor. SEUs can manifest as single–bit upsets
(SBUs) or multiple–bit upsets (MBUs). SBUs are the most common errors encountered in spaceflight [6] but can generally be overcome by using an error correction code (ECC) for memory read operations or fault–tolerant detection for registers. MBUs become increasingly likely to occur as the device size decreases due to the increased density of transistors [7]. This occurs because the number of transistors potentially affected by a single high–energy particle per unit area will increase. Since processor technology generally follows Moore’s Law, MBUs will present more issues as devices are reduced in scale and increased in complexity. The fault–tolerant methods considered in this thesis are only guaranteed to correct SBUs, since two or three incorrect inputs to a three or four–input voter will result in an incorrect output.

3. Single–Event Transient

Single–event transients (SETs) are similar to SEUs except that they affect combinatorial logic instead of memory [6]. An error present in the combinatorial logic of the processor can propagate throughout the circuit. If a SET propagates far enough in the processor to become an input to a flip–flop (FF), it effectively becomes an SEU. SETs are dependent on the time and location of the particle strike in addition to the waveform it creates within the circuit following impact [8].

4. Single–Event Latch–up

A single–event latch–up (SEL) is a hard error that occurs when a high–energy particle causes a transistor to remain frozen in a certain state [6]. SELs generally pertain to transistors temporarily being stuck in a single state; however, SELs can progress to become a permanent single–event burnout (SEB) if corrective action is not taken. Recycling power to the device is a common method for SEL recovery [6].

5. Single–Event Burnout

A SEB is a hard error that occurs when a high–energy particle creates a short between the transistor gate and ground. SEBs may, but are not required to, result from an unattended SEL. The effects of a SEB are permanent, and the power cycling recovery
method used for SELs will not eradicate a SEB. If a CubeSat suffers a SEB, the entire mission could potentially be lost.

B. FAULT–TOLERANCE METHODS

Several fault–tolerant logic implementations are available to mitigate SEUs. Initially, techniques involving variations of TMR are explored. Then more alternative techniques including quadruple force decide redundancy (QFDR) and quadded logic are investigated. The culmination of this subsection provides resolution on the best fault–tolerance method for the payload processor.

1. Triple Modular Redundancy

TMR is accomplished by triplicating logic memory components (exclusively flip–flops and random access memory in the payload processor) and using majority voting logic to determine the correct output. Several variations of triplication are possible within a logic device. These variations typically differ by their level of granularity within the device. Finer grain TMR implementations perform voting at a lower level within the design. This implementation is typically more complex and requires more resources but provides quicker masking and correction of errors. Additionally, as transistor sizes become smaller, finer grain mitigation techniques are better suited to resist SEUs [7]. Coarser grain implementations perform logic voting at a higher level. This capitalizes on simplicity but sacrifices the localization of error handling. Four of the most common TMR implementations, block, local, global, and internal TMR, are considered for the payload processor.

a. Block TMR

Block TMR (BTMR) is implemented by using the outputs of three high–level logic devices as inputs to a voter circuit. The voter then chooses the majority of the three logic device outputs. A graphic representation of a BTMR design for the payload processor is displayed in Figure 3. The primary advantage of this approach is its simplicity. A BTMR payload processor can be realized simply by triplicating a 32–bit processor template and attaching a voter. The primary disadvantage of BTMR as applied
to the payload processor is its inability to provide reliable error correction to the system. This is because a 32–bit processor does not produce a single logic output but updates numerous registers and memory modules on each clock cycle. Additionally, one processor failure in the set results in complete loss of error correction capabilities and potentially the entire mission [7]. A more robust form of TMR is preferable to meet the high reliability requirements of the space environment.

Figure 3. BTMR circuit structure with majority voter.

\textbf{b. Local TMR}

Local TMR (LTMR) implements redundancy at the flip–flop (FF) level of logic structures, as displayed in Figure 4. Each FF in the payload processor acts as part of a triple set feeding their outputs to a voter circuit. The voter circuit provides feedback to the individual FFs and a single data path to the next processing stage. LTMR has the advantage of finer grain error correction within the processor data path, making it better suited than BTMR to withstand a complete failure of error correction logic. The primary disadvantage of LTMR is its single voter, which represents a single point of failure within the circuit [7]. Error correction using a feedback path on the FF is also complex and requires the use of additional resources when implemented on the FPGA. Thus, LTMR is a more suitable solution than BTMR due to its finer grain implementation but
still contains vulnerabilities that make it too unstable to implement in the payload processor.

![Diagram of Local TMR circuit structure with one majority voter.](image)

**Figure 4.** Local TMR circuit structure with one majority voter.

c. **Global TMR**

Global TMR (GTMR) effectively uses three individual processors running in parallel that output to three voting circuits, as shown in Figure 5. GTMR offers excellent resistance against SEUs since each processor operates on its own memory. Like LTMR, error correction through a feedback path is replaced by error masking through the use of three voting circuits. The greatest disadvantage of GTMR is the potential clock skew between the three processing domains [7]. GTMR also requires more logic resources, but the additional voting resources enable it to correct for SETs in a voter. The robustness of GTMR is a distinct advantage over LTMR and BTMR; however, the issue of clock skew generally outweighs the redundancy of using three individual clocks within the device.
The most practical triplicated voter fault-tolerance for the payload processor is a GTMR approach with a single clock signal synchronizing all three circuits.

Figure 5. Global TMR circuit structure with three clocks and three majority voters.

d. Internal TMR

The concept of ITMR was proposed and implemented by Majewicz in his design of the “Pix” TMR processor [4] and is illustrated in Figure 6. It closely follows the GTMR implementation but only uses one clock for all three processors. This greatly reduces the potential for clock race conditions within the design. Like GTMR, ITMR requires additional FPGA resources for voting but effectively eliminates SEUs and SETs in the processor. Majewicz noted the vulnerability to this approach lies in the register file, where no internal voting occurs [4]; however, SEUs present in the register file are voted out after passing through a single pipeline stage. The ITMR architecture is clearly the best compromise amongst TMR variants for the payload processor.
2. **Other Fault–tolerance Approaches**

There are several other methods to implement fault–tolerance in addition to the TMR variants proposed. Parobek’s research into the launch sequencer considered quadded logic, QFDR, and triplicated interwoven redundancy (TIR) [1]. Quadded logic enforces fault–tolerance by quadrupling each logic gate and its inputs and appropriately interchanging connections of the four output signals at the next sequential set of logic gates. QFDR is similar to quadded logic but decides between four inputs to a FF and look–up table (LUT) combination. This allows QFDR to be implemented at a higher logic level than quadded logic. Finally, TIR triplicates logic functions and randomly interconnects output signals between consecutive logic blocks.

Quadded logic, QFDR, and TIR do not present themselves as viable options for the payload processor. Quadded logic must be implemented at the gate level and has no built–in protection against SETs because it cannot be implemented in sequential logic.
Conversely, QFDR protects against SETs because of the presence of FFs in its design. This implementation is achievable but more complex to implement in logic design software than most of the TMR schemes considered. TIR offers no distinct advantage over traditional TMR methods and is significantly more complex to troubleshoot when designing.

3. Selection of a Fault–Tolerant Scheme for the Payload Processor

Among all the fault–tolerant schemes presented, ITMR is considered to be the best selection for the payload processor. It has a distinct advantage over the other TMR variations because it eliminates SEUs in the voter circuits as well as the combinatorial logic circuits of the processor. Majewicz noted a vulnerability of the TMR architecture in a pipelined processor was the possibility of an SEU occurring inside the register file [4]. While this temporarily increases the probability of an MBU occurring in the register file, the processor operates without issue since the voting logic ensures the error does not propagate beyond that particular pipeline stage. Ultimately, ITMR offers the best solution to eradicate SEUs in the payload processor.

C. CHAPTER SUMMARY

The motivation for fault–tolerance in space computing applications by categorizing the single–event effects that can occur while operating in the space environment was first discussed in this chapter. It was determined that the fault–tolerance method to be employed on the payload processor would provide increased resistance to SEUs, the most common SEE experienced by CubeSats. It was also noted that the selected fault–tolerance method could not provide additional protection against hard–error SEEs (i.e., SELs and SEBs), which can only be reduced by using FPGAs with better RADHARD characteristics.

Each of the candidate fault–tolerance methods was then presented for consideration. Four variations of TMR (BTMR, LTMR, GTMR, and ITMR) were investigated. ITMR was determined to be the most suitable of these schemes because of its enhanced voter logic and single clock signal. Three additional fault–tolerance schemes from Parobek’s thesis were revisited (quadded logic, QFDR, and TIR); however, ITMR
was selected for implementation in the payload processor because of its simplicity and ability to mask errors in each stage of sequential logic. Thus, ITMR was selected as the best fault–tolerance scheme to implement in the payload processor.
III. DESIGN REQUIREMENTS AND SELECTION OF AN FPGA DEVICE

A. INTRODUCTION

The payload processor serves as the interface between the spacecraft data bus and a sensor device the satellite uses to obtain data. It can buffer and process data received from the sensor, store data into memory, and provide data on-demand to the satellite control processor thereby allowing access to users at a ground station. The increased capabilities required of a payload processor over the launch sequencer necessitate additional performance requirements of the target FPGA. Logic cell capability, inherent radiation hardness, processor speed, memory size and configuration, and size, weight, area, and power (SWAP) are all necessary considerations for implementing a successful design.

FPGAs are organized into a large two-dimensional grid of configurable logic blocks (CLBs) interconnected by switching circuitry [9]. An example of internal FPGA structure and organization is shown in Figure 7. Each of these cells represents a certain amount of programmable logic functionality through devices such as LUTs, random access memory (RAM), logic gates, and multiplexers. The complexity and scale of the payload processor design determines the percentage of these resources required for implementation.
Figure 7. Generic internal structure of an FPGA featuring CLBs, switching interconnects, and input/output buffers (IOBs), from [10].

B. PERFORMANCE REQUIREMENTS

Performance requirements must be identified prior to selecting a FPGA for development. During this early phase of the payload processor technology, selection focused primarily on resource optimization vice meeting specific safety and testing requirements. These requirements become more applicable once the processor is being fitted for a specific satellite and mission. More detail on the performance requirements with regard to the payload processor concept of operations are provided in the following paragraphs.

1. CLB Capability

Each vendor offers many families and models that feature a variable quantity of CLBs providing a range of functionality. There is a continuous trade–off between the
type and quantity of these blocks with the speed, power consumption, and cost of the
device. The payload processor does not impose any requirements on the capability of
individual CLBs on a FPGA. Software optimizes the majority of logic functions while
the place–and–route (PAR) process is being performed. Though advanced CLB
capability is preferred, it is less significant compared to the macroscopic features the
device has to offer.

2. RADHARD Requirements

Although device selection is limited to commercial (non–RADHARD) FPGAs, it
is logical to consider any available data on a commercial FPGA’s inherent radiation
hardness during the selection process. Selecting a device with an above average
resistance to hard error SEEs ensures the best protection against unrecoverable events the
ITMR architecture cannot prevent. Each commercial model features a variety of
semiconductor technologies which have variable levels of resistance to radiation. Data
describing the inherent radiation hardness of non–RADHARD FPGAs from
manufacturers is often unavailable. The majority of this data comes from published
academic or industry research. Consequently, the methods used to obtain this data for the
various FPGA models are not standardized [11]; therefore, it is necessary to gain some
perspective on the characteristics that affect a FPGA’s inherent radiation hardness.

In order to obtain data regarding the radiation received by a semiconductor
device, it is necessary to consider the concentration and energy of incident particles.
Fluence describes the number of particles passing through a unit area. Flux is defined as
the amount of fluence exhibited in a period of time. A radiation absorbed dose (rad)
describes the total amount of radiation absorbed by a material (i.e., semiconductor
material on an FPGA). A rad is equivalent to 100 ergs per gram of energy absorbed by
the material under consideration [12]. This quantity is specific to both the type of
radiation under consideration and the absorbing material. Linear energy transfer (LET)
expresses the amount of energy a particle transfers to the semiconductor material per unit
length in units of MeV–cm²/mg [12]. The combination of a specific LET and fluence are
often used to describe the environment in which a device is operating [13]. Total ionizing
dose (TID) is a measurement of the total radiation exposure of a device during a given time period and is generally expressed in units of rad. TID is used to quantify the effects of prolonged radiation exposure a semiconductor device experiences during its lifetime. FPGAs that exhibit low LET and TID values are preferable since they are less likely to experience SEEs while on orbit.

In addition to LET and/or TID levels, transistor feature size, configuration memory type, and operating clock frequency are all relevant factors when considering which device is best suited to handle the space environment. Smaller transistors require less energy to switch and, consequently, are more susceptible to SEEs [14]. FPGAs featuring static RAM (SRAM) configuration memory are inherently more susceptible to SEEs than their flash memory counterparts [15]. Finally, increased operating clock frequency on FPGAs using FFs is associated with increased potential for SEEs [11].

There are a multitude of factors to consider regarding the radiation hardness of a particular FPGA, but device selection should not automatically favor a FPGA that possesses better inherent radiation hardness. It is also important to bear in mind that each FPGA reacts differently based on the specific model, logic design, and type of radiation incident on the device. For the payload processor, a device featuring a TID of at least 20 krad(Si) in an environment with a LET of at least 10 MeV–cm²/mg and a fluence greater than 1.6x10⁷ ions/cm² is sufficient for a CubeSat mission.

3. Memory Requirements

Device memory selection needs to take capacity and latency into account. Configuration memory type was accounted for as part of the RADHARD performance requirements of the FPGA. Capacity requirements of the payload processor substantially outweigh those of the launch sequencer. This is primarily due to the simplicity of the launch sequencer’s design as a finite–state machine (FSM) with only 13 distinct states [2]; however, the payload processor is a complete pipelined design that runs continuously on the contents of its instruction memory and performs processing on the data it receives from its various I/O interfaces. Furthermore, the total block RAM available on the FPGA is divided into two separate memory segments for instruction and data storage. It is also
important to consider a 32–bit word (4 bytes) is the smallest usable segment of instruction memory for this processor. At a minimum, the selected device must be able to store 1 kilobyte of both instructions and data.

Memory latency was also considered as part of the selection process. This initial design of the payload processor does not maximize use of memory read and write operations; however, the purpose of the payload processor development is to eventually support more advanced sensor technologies for future evolutions of CubeSats. The possibility exists to support star trackers performing attitude control, signal monitoring antennas, and cameras to perform high–resolution imagery. These technologies frequently interact with memory and require the lowest–latency capabilities available. During this phase of development, latency was considered secondary to memory capacity and type but gains in importance as new payloads are added. Since instruction throughput is limited by instruction memory latency, the selected device should have a memory interface clocking frequency that is at least half of its maximum clock frequency. If the selected device has a memory interface clock frequency less than half of its maximum frequency, the use of caches for instruction and data memories is considered.

4. Clocking Requirements

Clock speed was considered the most critical performance requirement regarding each device’s clock management capabilities. The fastest clock available was preferred for the payload processor to reduce the time required to execute its instructions. No minimum clock speed can be established at this time since the payload processor is limited to executing a small instruction set from a small instruction memory. As payloads become more advanced, a minimum processing speed may need to be established based on the associated algorithm. A metric of interest during the design process of this thesis is clock speed when the design is placed on the device relative to the device’s maximum possible clock speed. This gives an indication of the processor’s combinatorial logic delay.
5. **I/O Requirements**

The payload processor must give greater consideration to the I/O capabilities of a FPGA compared to the launch sequencer. Given the size of a CubeSat, only a limited number of physical payloads can be attached prior to launch. Even the smallest payloads will likely limit a FPGA to no greater than 20 I/O interfaces. Most FPGAs easily exceed this requirement; however, the robustness of the I/O standards supported on the available interfaces a more important factor in device selection. Maximizing the number of I/O standards and placement options allows the greatest flexibility in mission planning and satellite capabilities. The selected FPGA must support the I/O standards included in left column of Table 1, which are used on some of the most common physical I/O interfaces listed in the right–hand column.

<table>
<thead>
<tr>
<th>I/O Standard</th>
<th>Commonly Associated Interfaces</th>
</tr>
</thead>
<tbody>
<tr>
<td>Low Voltage Differential Signaling (LVDS)</td>
<td>SCSI, Serial ATA, PCI Express, RapidIO, FireWire, SpaceWire</td>
</tr>
<tr>
<td>Stub Series Terminated Logic (SSTL)</td>
<td>DDR SDRAM, PCI Express</td>
</tr>
<tr>
<td>Peripheral Component Interconnect (PCI)</td>
<td>USB/Serial, Disk Drives, Network Cards</td>
</tr>
</tbody>
</table>

6. **SWAP Requirements**

All spacecraft are subject to limitations on SWAP resources. Power was the primary SWAP performance requirement considered for this thesis since the payload processor aims to greatly expand its computing capabilities over the launch sequencer. There is no strict requirement on the power ratings of batteries or other power sources carried onboard a CubeSat [3]. Ideally, the payload processor should consume a small fraction of the power used by its associated payload. FPGA manufacturers do not often publish power consumption explicitly because it is highly dependent on the number and type of I/O applications being used among other factors. The devices considered in Section C of this chapter are evaluated based on the maximum and normal operating voltage and current characteristics instead.
Size, area, and weight limitations are primarily dictated by the type of satellite. The FPGA and associated hardware components are required to fit on a printed circuit board (PCB) to be placed within the CubeSat rails. The interior length and width dimensions of a CubeSat are not directly specified by [3]; however, scale models in the NPS CubeSat laboratory suggest a PCB length and width of approximately 80 mm fits well within the CubeSat rails. The weight of the FPGA is relatively small even compared to the CubeSat. Once development reaches a phase where a complete PCB design and associated payload are required, size and weight become significant factors.

C. **CANDIDATE FPGA DEVICES**

Two candidate devices for initial implementation and testing of the payload processor are considered in this thesis. The first of these devices was the Actel (now Microsemi, Inc.) ProASIC3 model FPGA, which had been previously selected for implementation of the launch sequencer. Parobek chose this device primarily because its flash–based memory was more resistant to SEUs and SELs, and it provides a memory state immediately after power was applied [1]. The ProASIC3 logic architecture, though small, was sufficient to support the launch sequencer state machine. The second device considered was the Xilinx Virtex–5 model FPGA. This device is significantly more powerful in processor and I/O capabilities than the ProASIC3 but was considered a less qualified device for the launch sequencer because of its SRAM configuration memory. The utility of these devices in the context of a complete payload processor must be reconsidered given new design criteria. This analysis should not serve as the final selection of a device for space–flight but rather to help determine the type and quantity of future applications the payload processor is able to support.

Multiple variants of each FPGA family are available on the market. The ProASIC3 family features the ProASIC3E, ProASIC3 nano, ProASIC3L, and radiation tolerant ProASIC3 variants. Xilinx manufactures the generic Virtex–5, Virtex–5Q (defense–grade), and Virtex–5QV (space–grade) devices [16]. The generic Virtex–5 features the LX (high–performance general logic), LXT (high–performance logic with advanced serial connectivity), SXT (high–performance signal processing with advanced
serial connectivity), TXT (high-performance systems with double density advanced serial connectivity), and FXT (high-performance embedded systems with advanced serial connectivity) [17]. Further elaboration of both the ProASIC3 and Virtex–5 devices is made in the following sections with respect to the performance requirements discussed in Section B of this chapter.

As of this writing, the NPS CubeSat laboratory possesses four FPGA development boards. Three of these boards feature ProASIC3 family devices, while only one features the Virtex–5. Two of the ProASIC3 boards are identical and were custom developed as flight test models by Parobek [1]. These boards are depicted in Figure 8 and feature the “A3PN250” model of the ProASIC3 family, eight light-emitting diodes (LEDs), power adapter, and joint test action group (JTAG) chain. The third ProASIC3 development board, shown in Figure 9, was a development board product purchased from Actel. It contains the A3P1000 model FPGA, eight LEDs, and eight switches. The Virtex–5 is featured in the Digilent Genesys development board, displayed in Figure 10. This board features eight LEDs, eight switches, three buttons, display screen, video output, USB–A/B, RS–232C, Ethernet, JTAG, and mini–USB ports. The mini–USB ports are used exclusively to program the device directly via the Xilinx iMPACT or Digilent Adept software programs.

Figure 8. Two Actel ProASIC3 boards developed for flight testing by Parobek.
1. Actel ProASIC3

The pertinent performance characteristics for the Actel ProASIC3 family of devices are summarized in the following sections.
a. Architecture

The ProASIC3 FPGA is optimized for cost and minimum power consumption while maintaining competitive processing speed [18]. A physical layout of the ProASIC3EL device is displayed in Figure 11. The hardware architecture is accomplished through the use of proprietary CLBs called VersaTiles. The ProASIC3 FPGA contains 384 – 24,576 VersaTiles depending on the device model [19]. The standard layout of a VersaTile is depicted in Figure 12. Each VersaTile accepts four input signals and can represent one of four logic design functions:

- A three–input logic function
- Latch with clear or set
- D–flip–flop with clear or set
- Enable D–flip–flop with clear or set

![Figure 11. Standard layout of a Military ProASIC3EL FPGA, from [20].](image-url)
Radiation testing performed by Poivey et al. on the ProASIC A3P3000L model produced data describing the response of the device to heavy ion and proton induced SEEs for multiple shift register, memory, and clock configurations [22]. It was determined the ProASIC3 model under consideration can resist the occurrence of SELs and configuration memory SEUs from a LET of at least 55 MeV–cm$^2$/mg and 1x10$^7$ ions/cm$^2$ of fluence [22]. The TID of the device tested was 65 krad(Si). Signal propagation times through combinatorial logic increased by 50% throughout the FPGA as the device approached its TID [22]. As expected, the flash–based ProASIC3 exhibited excellent TID tolerance and resistance to hard–error SEEs during performance testing.

Memory Capabilities

The ProASIC3 family of FPGAs features both a configuration flash memory and SRAM data memory on the device. The flash configuration memory functions as a read–only memory (ROM) and maintains its contents after power is no longer applied to the
device. The data memory contained in the ProASIC3 family ranges from 36,864 – 516,096 bits of SRAM divided into blocks of 4,608 bits each. This capacity translates to at most 16,128 32–bit words that can be stored in SRAM. This memory operates only in synchronous mode, using separate read and write clocks operating at any desired frequency up to 250 MHz. Multiple configurations of width and depth size can be specified via the designated write and read width pins.

\( d \). I/O Capabilities

The ProASIC3 family of FPGAs supports two, four, or eight I/O banks depending on the model selected, ultimately enabling the device to support 68–620 I/O interfaces [19]. Each of the I/O banks contain several mini–banks that provide 8–18 individual I/O pins [19]. Each of the I/O devices on a single mini–bank share a common reference voltage designated by configuring one I/O pin as the controller for that mini–bank [19]. General I/O banks on the ProASIC3 can support differential voltages of 1.2 V, 1.5 V, 1.8 V, 2.5 V, and 3.3 V [19]. The robust “mini–bank” structure and array of supported voltages allow the ProASIC3 FPGAs to support the I/O standards displayed in Table 2.

<table>
<thead>
<tr>
<th>Standard</th>
<th>Type</th>
<th>Voltage(s) (V)</th>
<th>Frequency (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>LVTTL</td>
<td>Single–Ended</td>
<td>3.3</td>
<td>&lt; 200</td>
</tr>
<tr>
<td>LVCMOS</td>
<td>Single–Ended</td>
<td>1.5/1.8/2.5/3.3</td>
<td>&lt; 200</td>
</tr>
<tr>
<td>PCI</td>
<td>Single–Ended</td>
<td>3.3</td>
<td>33/66</td>
</tr>
<tr>
<td>PCI–X</td>
<td>Single–Ended</td>
<td>3.3</td>
<td>33/66/133</td>
</tr>
<tr>
<td>HSTL (Class I and II)</td>
<td>Voltage Referenced (0.75 V)</td>
<td>1.5</td>
<td>&lt; 400</td>
</tr>
<tr>
<td>SSTL (Class I and II)</td>
<td>Voltage Referenced (1.25 V/1.5 V)</td>
<td>2.5/3.3</td>
<td>Not Listed</td>
</tr>
<tr>
<td>GTL</td>
<td>Voltage Referenced (0.8 V)</td>
<td>2.5/3.3</td>
<td>20 – 40</td>
</tr>
<tr>
<td>GTLP</td>
<td>Voltage Referenced (1.0 V)</td>
<td>2.5/3.3</td>
<td>20 – 40</td>
</tr>
<tr>
<td>LVPECL</td>
<td>Differential (+/– 850 mV)</td>
<td>3.3</td>
<td>Not Listed</td>
</tr>
<tr>
<td>LVDS</td>
<td>Differential (+/– 350 mV)</td>
<td>2.5</td>
<td>&lt; 200</td>
</tr>
</tbody>
</table>
e. Clock Management

The ProASIC3 family FPGAs features a total of six global clocks available for routing on across the chip [20]. An additional four clock signals are available for each quadrant in the device [20]. These signals are controlled by clock conditioning circuitry, each of which contains a phase–locked loop (PLL). In the normal power operating mode ($V_{cc} = 1.5 \, \text{V}$), each clock frequency can be set between 0.75 – 350 MHz with a duty cycle between 48.5 – 51.5% [19]. The low power operating mode ($V_{cc} = 1.2 \, \text{V}$) on the ProASIC3EL can produce clock signals between 0.75 – 250 MHz with a duty cycle between 48.5 – 51.5% [23].

f. Power Consumption and Distribution

Each device in the ProASIC3 family supports a 1.5 V mode of operation [19]. The ProASIC3EL variant also features a 1.2 V mode of operation [23]. The recommended operating ranges for core input voltage are 1.425 – 1.575 V for the 1.5 V normal operating mode [19] and 1.14 – 1.575 V for the 1.2 V mode on the ProASIC3EL [23]. Each device will handle maximum voltage limits from –0.3 – 1.65 V for short periods.

ProASIC3 devices achieve the goal of delivering low–power computing capabilities through two features of the device. First, power–on and configuration loading is accomplished using non–volatile flash memory. Since flash memory stores charge to retain its state, it is available immediately following power–on [21]. SRAM initializes to an indeterminate state which can cause “inrush” currents to draw additional power during the power–on stage [21].

The ProASIC3 FPGA’s second power management feature is the low power modes that can be implemented once power–on and normal operating conditions are achieved. There are four low power modes able to be implemented on the ProASIC3 model FPGA [21]:

- Static (Idle) Mode
- User Low Static (Idle) Mode
- Sleep Mode
- Shutdown Mode
Both static mode and user low static mode cease clock function but retain the current states of all SRAM, I/Os, and registers [21]. These devices draw minimum current to maintain their state while in either mode [21]. The ProASIC3EL features two additional modes within the static mode of operation known as flash freeze modes, which retain the states of memory devices like static mode but do not stop clocking [21]. Sleep Mode shuts off voltage to the entire FPGA core and requires logic states to be saved to nonvolatile memory in order to restart in the same state [21].

2. **Xilinx Virtex–5**

The pertinent performance characteristics for the Xilinx Virtex–5 family of devices are summarized in the following sections. The information summarized focuses primarily on the VLX50T variant found on the Digilent Genesys development board. For information regarding other models and features, the reader should refer to [17], [24], and [25].

**a. Architecture**

The Xilinx Virtex–5 FPGA features a two–dimensional array of CLBs, each containing two sub–elements called logic slices. There are two types of logic slices in the Virtex–5: SLICEL and SLICEM. Each SLICEL element can be programmed to represent one of four standard logic functions [24]:

- Look–up Tables
- Storage Elements
- Wide–function Multiplexers
- Carry Logic

The SLICEM element implements these same four logic functions but also contain 256 bits of distributed RAM and a 128–bit shift register [24]. The SLICEL CLB is displayed in Figure 13 and the SLICEM CLB in Figure 14.
Figure 13. Xilinx Virtex–5 SLICEL CLB, from [24].

Figure 14. Xilinx Virtex–5 SLICEM CLB, from [24].
The row and column arrangement of CLBs and their associated slices is displayed in Figure 15. Each CLB is connected to a switching matrix, which is further connected to the general routing matrix of the FPGA [24]. SLICEM CLBs are featured in alternating columns moving across the FPGA [24]. A total of 7,200 logic slices are available in the VLX50T model, 1,800 of which are SLICEM variants [17].

![Figure 15. Standard CLB format of a Virtex–5 FPGA, from [24].](image)

**b. Radiation Tolerance**

Radiation testing for the Virtex–5 was performed by Hiemstra et al. to determine the characterization of SEUs on an SRAM based FPGA [26]. This experiment was performed on the same FPGA model featured on the Genesys development board in the NPS CubeSat laboratory. The Virtex–5 was tested to an equivalent of 53 years of heavy ion fluence using a LET of 10 MeV–cm²/mg [26]. This resulted in a TID of 23.8 krad(Si) exhibiting no SELs or other hard-error effects [26]. The length and intensity of the Virtex–5 testing is noticeably shorter than the tests performed on the ProASIC3 [22]; however, the Virtex–5 performed well during these tests and satisfied the desired performance criteria for radiation hardness.
c. **Memory Capabilities**

The Virtex–5 features SRAM configuration memory, which loses its memory state when power is removed from the device. Data storage is implemented using 36 kbit SRAM blocks that can support a multitude of word size and depth combinations. The total number of memory blocks available depends on the specific device chosen. The VLX50T variant features 60, 36 kbit RAM blocks, resulting in 2.16 Mbits of total memory [24]; however, other models within the Virtex–5 family are capable of supporting up to 18.5 Mbits of block RAM [17]. The capacity of the VLX50T translates to 67,500 total words that can be stored in block RAM on the specified FPGA. The Virtex–5 block RAM operates at the same frequency (450 – 550 MHz depending on the speed grade) as the primary clock for the device [25].

d. **I/O Capabilities**

The VLX50T device features 15 I/O banks supporting 480 user I/O interfaces [17]. Organization of the I/O banks is depicted in Figure 16 for a VLX30 device. The I/O banks each typically contain 40 I/O inputs for the outer columns [24]. The organization of the inner column of the FPGA varies based on the total number of banks and interfaces featured on the device [24]. A reference voltage pin is automatically allocated for one of every 20 pins if a single–ended I/O standard requiring a differential input buffer is used [24]. Each low–voltage I/O standard implemented on the device must use an I/O bank for its associated output drive voltage [24]. The I/O standards supported by the Virtex–5 are listed in Table 3.
Figure 16. Organization of I/O banks within the Virtex–5 FPGA, from [24].

Table 3. I/O standards, types, voltages, and operating frequencies for the Virtex–5, after [25].

<table>
<thead>
<tr>
<th>Standard</th>
<th>Type</th>
<th>Voltage(s) (V)</th>
<th>Frequency (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>LVTTL</td>
<td>Single–Ended</td>
<td>3.3</td>
<td>&lt; 200</td>
</tr>
<tr>
<td>LVCMOS</td>
<td>Single–Ended</td>
<td>1.2/1.5/1.8/2.5/3.3</td>
<td>&lt; 200</td>
</tr>
<tr>
<td>PCI</td>
<td>Single–Ended</td>
<td>3.3</td>
<td>33/66</td>
</tr>
<tr>
<td>PCI–X</td>
<td>Single–Ended</td>
<td>3.3</td>
<td>66/133</td>
</tr>
<tr>
<td>HSTL (Class I/II/III/IV)</td>
<td>Voltage Referenced (0.75 V)</td>
<td>1.5/1.8</td>
<td>&lt; 400</td>
</tr>
<tr>
<td>HSTL (Class I)</td>
<td>Voltage Referenced</td>
<td>1.2</td>
<td>Not Listed</td>
</tr>
<tr>
<td>SSTL (Class I and II)</td>
<td>Voltage Referenced (1.25 V)</td>
<td>1.8/2.5</td>
<td>Not Listed</td>
</tr>
<tr>
<td>GTL</td>
<td>Voltage Referenced</td>
<td>2.5/3.3</td>
<td>20 – 40</td>
</tr>
<tr>
<td>GTLP</td>
<td>Voltage Referenced</td>
<td>2.5/3.3</td>
<td>20 – 40</td>
</tr>
<tr>
<td>LVPECL</td>
<td>Differential (+/– 850 mV)</td>
<td>3.3</td>
<td>Not Listed</td>
</tr>
<tr>
<td>LVDS</td>
<td>Differential (+/– 350 mV)</td>
<td>2.5</td>
<td>&lt; 200</td>
</tr>
<tr>
<td>Differential HSTL (Class I/II)</td>
<td>Differential</td>
<td>1.5/1.8</td>
<td>Not Listed</td>
</tr>
<tr>
<td>Differential SSTL (Class I/II)</td>
<td>Differential</td>
<td>1.8/2.5</td>
<td>Not Listed</td>
</tr>
<tr>
<td>RSDS</td>
<td>Differential</td>
<td>2.5</td>
<td>Not Listed</td>
</tr>
</tbody>
</table>
e. Clock Management

The Virtex–5 is capable of implementing up to 32 global clock signals [24]. Each global clock signal is generated through a global clock buffer [24]. On the physical FPGA, these clock buffer components are driven by six clock management tiles (CMTs) built into the FPGA [24]. The Virtex–5 datasheet does not offer a single clock speed for reference like the ProASIC3 [25]. Rather, it lists numerous clock speeds based on the speed grade and applicable logic construct being used. Numerous combinatorial and sequential logic modules are necessary to create the payload processor; however, most 32–bit components can be implemented at a clock speed of 450 MHz for the speed grade of the VLX50T available at the NPS CubeSat laboratory [25]. Higher speed grades in the Virtex–5 family feature clock speeds up to 550 MHz for many 32–bit logic structures [25].

f. Power Consumption and Distribution

The typical operating mode of the VLX50T model FPGA requires a supply voltage between 0.95 – 1.05 V and 2 mA of current [25]. Increased speed grades and processing rates for various Virtex–5 models require more current [25]. The internal core cannot handle voltages outside the range of –0.5 – 1.1 V, and no greater than +/-100 mA of current may flow through any pin on the device [25]. The Virtex–5 does not feature any special modes comparable to those of the ProASIC3 family to minimize power consumption.

D. FPGA SELECTION

The two FPGAs considered are representative of the variety of reconfigurable devices manufactured to meet the needs of several different industries. The ProASIC3 device represents a low–power, low–cost solution with impressive inherent RADHARD qualities. The Virtex–5 has substantially more logic resources, a faster clock, and supports more I/O standards than the ProASIC3. It is worth noting each of the FPGAs met the initial performance requirements; therefore, it is possible the payload processor can potentially be implemented and tested on either model. It is the author’s view that the Virtex–5 should be the initial development platform for the payload processor, primarily
because it offers the greatest potential for expansion as the technology continues to develop.

E. CHAPTER SUMMARY

The performance requirements for a FPGA that can implement the payload processor were first discussed in this chapter. The categories of CLB capability, memory, I/O capability, clock management, and SWAP each played some role in the selection of a device. The performance characteristics of two FPGAs considered potential candidates to support the processor were then summarized. It was determined the Virtex–5 would serve as a better model for development of the payload processor because its logic resources, clock speed, and I/O standards offered greater potential for future expansion of the technology.
IV. DESIGN OF THE PAYLOAD PROCESSOR

A. PIPELINED PROCESSOR COMPONENTS AND ORGANIZATION

The payload processor was designed as a standard five-stage pipeline using a subset of the MIPS core ISA proposed by Patterson and Hennessy [5], shown in Figure 17. This processor features instruction fetch (IF), instruction decode (ID), execution (EX), memory (MEM), and write–back (WB) stages, each separated by a pipeline register. The payload processor is effectively three of these processors running in parallel on a common clock, with ITMR voter circuits at each of the inputs to the triplicated pipeline registers. The goal of the ITMR implementation is to provide sufficient redundancy to correct SEUs encountered at any individual stage logic or any pipeline register at the stage input.

![Payload processor pipeline architecture, from [5].](image-url)
The standard MIPS instruction formats executed by the processor are shown in Figure 18. The MIPS ISA features three primary instruction variants: I–format, R–format, and jump format. Every MIPS instruction is 32 bits in length. The six most significant bits always contain an opcode which provides a more specific identification of the function. I–format instructions take two register arguments and a 16–bit offset. Generally, the remaining register \( rt \) is used as a destination for the result of the operation. However, branch I–format instructions compare the \( rs \) and \( rt \) registers and determine a new address using the 16–bit offset. The R–format instruction takes two register addresses \( rs \) and \( rt \) and places the result of the ALU operation in a third register \( rd \).

![Figure 18. Three primary MIPS instruction formats supported by the payload processor.](image)

**B. PAYLOAD PROCESSOR DESIGN**

The standard pipeline design and ITMR architecture had to be meshed into a single design for the payload processor. A conceptual structure for the payload processor is displayed in Figure 19. The pipeline stages feature combinatorial logic and reside on either side of a pipeline register. Each of the triplicated processors passes their outputs to a set of registers, where the values are held until the next positive edge of the clock signal. During the next positive clock edge, each of the three sets of registers pass their values to three voters. The voters produce the same result because they all have the same inputs. This ensures each of the triplicated processors has the majority voted signal during the next stage of combinatorial logic, and any SEUs have effectively been eradicated.
Numerous logic components were necessary to build a pipelined processor. The combinatorial logic components were grouped into five pipeline stages. Each pipeline stage took inputs from and passed outputs to a pipeline register. This process begins with the program counter (PC) register passing an instruction address to the IF stage. The instruction address points to an instruction located in memory, and the instruction data is then passed to the ID stage. The instruction is decoded in the ID stage, and the register operands are obtained. The EX phase performs mathematical operations using the ALU and calculates a possible branch address. The MEM stage can then read or write to data memory and determine if a branch or jump from the next PC address is appropriate. Finally, the WB stage writes new data from memory or the ALU back to the register file. Each of these stages and their associated logic components are discussed in greater detail in the following sections.

1. **IF Stage**

   During the IF stage, a PC address is translated into an actual MIPS instruction. The instruction memory acts as a ROM and forwards the instruction located at the index specified by the PC to the IF/ID register. Additionally, the PC plus four bytes address must be calculated and sent back to the PC selector module. Since the payload processor does not yet implement advanced features such as data forwarding and hazard detection, the programmer must ensure at least four no-operation (NOP) instructions follow a
branch or jump instruction. This prevents the processor from operating on instructions prior to the feedback address of a branch or jump instruction being received at the PC.

The schematic organization of the IF stage is displayed in Figure 20. The input signals \( pc_a \), \( pc_b \), and \( pc_c \) are the triplicated PC addresses. These inputs are sent to the \texttt{inst_mem_trip} module which houses the byte–addressable instruction memory and \texttt{npc_addr_adder_trip} module which calculates the PC plus four address. The output of the \texttt{inst_mem_trip} module is then forwarded to the ID stage. The jump address is also calculated during the IF stage. Per standard MIPS jump instruction format, the jump address is a concatenation of the four most significant bits of the input PC address, the 26 least significant bits of the jump instruction, and two zeroes [28]. The latter ensures the jump address references the first byte of a word–length instruction. This address is then forwarded through each of the pipeline stages and pipeline registers until a jump feedback signal is generated in the MEM stage.

![Figure 20. Schematic view of the payload processor IF stage modules.](image)

2. **ID Stage**

The purpose of the ID stage is to decode an instruction into its various subcomponents and trigger the appropriate control signal flags. A schematic layout of the ID stage is displayed in Figure 21. The first major component featured in the ID stage is
the `control_tmr3` module, which contains three control modules that use the opcode, `funct`, and `rt` register fields of an instruction to determine the appropriate ALU operation code and flag settings. The effect of each of these individual flag settings are described in Table 4.

Figure 21. Schematic view of the payload processor ID stage modules.
Table 4. Control module flag effects when asserted and de–asserted, after [29].

<table>
<thead>
<tr>
<th>Signal</th>
<th>Effect when De–asserted</th>
<th>Effect when Asserted</th>
</tr>
</thead>
<tbody>
<tr>
<td>regdst</td>
<td>The write–back register address is contained within the rt register field (inst. bits 20 – 16).</td>
<td>The write–back register address is contained within the rd register field (instruction bits 15 – 11).</td>
</tr>
<tr>
<td>regwrite</td>
<td>None</td>
<td>The write–back register specified by the instruction will be written with new data from the ALU or data memory during the WB stage.</td>
</tr>
<tr>
<td>alusrc</td>
<td>The second ALU operand comes from the second register file output.</td>
<td>The second ALU operand comes from the immediate field (lowest order 16 bits of the instruction)</td>
</tr>
<tr>
<td>memread</td>
<td>None</td>
<td>Instruction is a load word. Signal is passed as a read enable to data memory. This triggers the memory to read a specified word to write–back to the register file.</td>
</tr>
<tr>
<td>memwrite</td>
<td>None</td>
<td>Instruction is a store word. Signal is passed as a write enable to data memory. This triggers the memory to write a word from the register file.</td>
</tr>
<tr>
<td>memtoreg</td>
<td>The value used to write–back to register comes from ALU.</td>
<td>The value used to write–back to the register comes from data memory.</td>
</tr>
<tr>
<td>beq</td>
<td>None</td>
<td>Triggers the next PC address to be the calculated branch address if the ALU inputs are equal. Used only with the beq instruction.</td>
</tr>
<tr>
<td>bne</td>
<td>None</td>
<td>Triggers the next PC address to be the calculated branch address if the ALU inputs are not equal. Used only with the bne instruction.</td>
</tr>
<tr>
<td>blz</td>
<td>None</td>
<td>Triggers the next PC address to be the calculated branch address if the first ALU input is less than zero. Used only with the blz instruction.</td>
</tr>
<tr>
<td>bgz</td>
<td>None</td>
<td>Triggers the next PC address to be the calculated branch address if the first ALU input is greater than zero. Used only with the bgz instruction.</td>
</tr>
<tr>
<td>blez</td>
<td>None</td>
<td>Triggers the next PC address to be the calculated branch address if the first ALU input is less than or equal to zero. Used only with the blez instruction.</td>
</tr>
<tr>
<td>bgez</td>
<td>None</td>
<td>Triggers the next PC address to be the calculated branch address if the first ALU input is greater than or equal to zero. Used only with the bgez instruction.</td>
</tr>
</tbody>
</table>
A second major component in the ID stage is the register_file module, which holds each of the 32, 32-bit registers used by the processor. A more detailed schematic view of the register file’s internal organization is displayed in Figure 22. The register file is unique amongst all combinatorial logic components featured in the processor in that it must use a clock to perform its operation. During the design process, we determined the register file should be capable of reading from two registers and writing to one register on each clock cycle. This eliminates one additional clock cycle delay that is otherwise required to update data during the WB stage.

Read and write operations within the register file each operate on different logic components. The register file receives three parallel addresses and data inputs to write to registers. The write register address is expressed as a 5-bit value and is decoded to one of 32 individual write enable signal by the reg_addr_decoder_bhv module. These signals are then distributed to the various registers along with the write data, but only one register receives the write enable signal per clock cycle. This register rewrites its old data with the
write input data while the clock cycle is high. When the clock signal is low, two sets of three parallel read address inputs are forwarded to one of six \texttt{reg\_array\_multiplexer} modules. These modules read each of the 32 register states (including the register which was just written on the previous half of the clock cycle) and select the appropriate register as an output based on the decoded input read address. The three \texttt{reg\_array\_multiplexer} modules in the upper–half of Figure 22 operate in parallel for the first register operand input to the ALU. The modules in the lower–half operate in parallel for the second register operand input to the ALU.

3. EX Stage

The EX stage performs mathematical and logical operations on data extracted from the register file and immediate (IMM) field of the instruction. The results of these operations are forwarded to the MEM and WB stages for storage in data memory or the register file, respectively. A schematic layout of the EX stage components is shown in Figure 23. The EX stage heavily relies on the \texttt{addr\_adder\_trip} and \texttt{ALU\_trip} modules to accomplish its functions. The EX stage performs branch address calculation using the \texttt{addr\_adder\_trip} module. It calculates the branch address by adding the offset specified in the IMM field to the PC plus four address calculated in the IF stage. This result is forwarded to a PC selector module and onto the PC register if a branch is taken.

The \texttt{ALU\_trip} module performs all of the necessary mathematical functions required to implement the processor’s ISA. The first of two operands is received from the register file. The second ALU operand is received from the \texttt{multiplex\_2to1\_nbit\_trip} module which selects between the second value forwarded from the register file and the IMM field. The selector bit for the \texttt{multiplex\_2to1\_nbit\_trip} module is the \texttt{alusrc} flag. Eleven ALU operations are required to implement the payload processor. These operations and their corresponding ALU codes are displayed in Table 5. Many other mathematical and logical operations are possible in a MIPS architecture, but a simple ISA was desired for this initial iteration of the payload processor.
Figure 23. Schematic view of the payload processor EX stage modules.
Table 5. Listing of the ALU operations necessary to implement the subset of MIPS core instructions used by the payload processor.

<table>
<thead>
<tr>
<th>Control Code</th>
<th>Operation</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Addition</td>
</tr>
<tr>
<td>1</td>
<td>Subtraction</td>
</tr>
<tr>
<td>2</td>
<td>Logical AND</td>
</tr>
<tr>
<td>3</td>
<td>Logical OR</td>
</tr>
<tr>
<td>4</td>
<td>Logical XOR</td>
</tr>
<tr>
<td>5</td>
<td>Logical NOR</td>
</tr>
<tr>
<td>6</td>
<td>Shift Left Logical</td>
</tr>
<tr>
<td>7</td>
<td>Shift Left Variable</td>
</tr>
<tr>
<td>8</td>
<td>Shift Left Logical</td>
</tr>
<tr>
<td>9</td>
<td>Shift Left Variable</td>
</tr>
<tr>
<td>10</td>
<td>Set on Less Than</td>
</tr>
</tbody>
</table>

In addition to the mathematical operations featured in Table 5, six comparison flags are set for each pair of inputs to the ALU. These comparison flags and their assertion criteria are summarized in Table 6 and match one-to-one with the branch flags generated in the ID stage. These comparison bits and branch flags are evaluated during the MEM stage to determine if the next PC address should be the resulting calculation of the `addr_adder_trip` module.

Table 6. EX stage comparison flags and their assertion criteria.

<table>
<thead>
<tr>
<th>Comparison Flag</th>
<th>Assertion</th>
</tr>
</thead>
<tbody>
<tr>
<td>seq</td>
<td>Both ALU operands are equal</td>
</tr>
<tr>
<td>sne</td>
<td>The ALU operands are not equal.</td>
</tr>
<tr>
<td>slz</td>
<td>The first ALU operand is less than zero.</td>
</tr>
<tr>
<td>sgz</td>
<td>The first ALU operand is greater than zero.</td>
</tr>
<tr>
<td>slez</td>
<td>The first ALU operand is less than or equal to zero.</td>
</tr>
<tr>
<td>sgez</td>
<td>The first ALU operand is greater than or equal to zero.</td>
</tr>
</tbody>
</table>

4. MEM Stage

The MEM stage serves two primary functions in the payload processor. First, it contains the triplicated data memory where words can be loaded into or stored from the register file. Secondly, it is where the processor determines if the conditions for a branch or jump instruction have been met and what the next appropriate address is for the PC.
register. A detailed explanation of the implementation of these functions is the focus of this section.

The payload processor data memory module is contained within the data_mem_trip component, displayed in Figure 24. This memory is divided into three equal segments of 45 kilobytes (kB) each. The memory is byte-addressable, but only one word may be read or written during each clock cycle. This is because the load word and store word instructions are currently the only instructions that can manipulate data memory. Each of these operations requires a complete clock cycle per pipeline stage.

The data memory component of the MEM stage can accept inputs from and read outputs to either the register file or UART. The functionality of the UART read and write operations are explained in Section C of this chapter, but perform a very similar operation on a separate address space in the memory. For standard read operations, the address fields addr_a, addr_b, and addr_c receive the forwarded output of the ALU_trip module which represents a data memory address. Likewise, during a store word instruction, the ID stage forwards data from the register specified in the rt field of the instruction. Both read and write operations must be accompanied by read and write enable signals which are linked to the memread and memwrite flags generated in the ID stage. The ALU_trip module result is also forwarded past the data memory module to the WB stage via the buf_32bit_tmr module for instructions other than load or store which writes the result to the register file.

Determination of a branch or jump instruction is performed by the branch_multiplexer_trip and addr_sel_module_trip modules displayed in Figure 25. The overall output of these modules is forwarded to pc_select_module which makes the final selection of the next PC address by taking pending interrupt requests into account. The branch_multiplexer_trip module receives the branch flags generated in the ID stage and the comparison bits from the EX stage as inputs. If a branch flag and its corresponding comparison flag are both set, the conditions for a branch instruction have been met, and the branch_jump_sel_out output pin of the multiplexer is set high. Next, the branch address calculated in the EX stage and jump address calculated in the IF stage are received as inputs to the address selector module along with the branch and jump
flags at the address selector module. If a branch or jump flag is set high, the module selects the corresponding address to forward back to the PC selector module and sets the **br_jump_sel_out** family of outputs to logic high. If neither flag is set high, the module does not forward a new address or set the **br_jump_sel_out** outputs high. This results in the PC selector module choosing a different PC address on the following clock cycle.

Figure 24. Schematic view of the data memory module in the MEM stage.
After the branch multiplexer and address selector have completed execution, the result is forwarded to the `pc_select_module_trip` component, displayed along with the PC register in Figure 26. The purpose of the `pc_select_module_trip` is to choose whether the next PC address is that of the current PC plus four, the forwarded branch or jump address, or the start of the ISR if an interrupt is present. The module gives highest priority to an interrupt request (IRQ). If an IRQ from the UART is present, it forces the processor to execute the ISR. This mode is discussed in greater detail in Section C of this chapter. If an IRQ is not present but the `br_jump_sel_out` flag is set high, the module uses the forwarded branch or jump address as the next PC. Finally, if neither an IRQ nor the branch/jump flag is present, the module selects the PC plus four address calculated at the IF stage. Refer to Subsection 1 of this section. The PC plus four address present at `pc_select_module_trip` following a branch is not the PC address of the instruction.
immediately following a branch instruction. Four nop instructions must be inserted after a branch to ensure instructions are not executed unintentionally while the processor is making a determination on the branch criteria.

Figure 26. Schematic view of the PC selector module and its connection to the PC register.

5. WB Stage

The WB stage essentially marks the final step in the execution of an instruction, and consists of only one multiplexer, displayed in Figure 27. Every instruction in the payload processor’s ISA, with the exception of store word and jump, writes back new data to the register file. This write–back data comes from either the data memory module (if the instruction is a load word) or the ALU result. The selector bit for this multiplexer is the memtoreg flag generated in the ID stage. The triplicated ALU result is connected to pins input1_a, input1_b, and input1_c in Figure 27 while the triplicated data memory output is connected to the input2_a, input2_b, and input2_c pins. Per Table 4, if the memtoreg flag is set low, the multiplexer forwards the ALU result into the register file. If the memtoreg flag is set high, the multiplexer forwards the data memory output
instead. The new data and register address reach the register file and rewrite the register during the same upper-half of the clock cycle.

Figure 27. Schematic view of the WB stage multiplexer.

C. UART DESIGN

1. Concept

The UART is a generic serial-to-parallel interface the processor pipeline uses to perform data transactions with an I/O device. A single UART module is currently the only I/O interface included on the payload processor as shown in Figure 28. Consequently, it alone generates the IRQ to the processor when it receives data to transfer to memory. This action is completed using an interrupt-driven I/O scheme which calls an ISR built into instruction memory and requires no action by the programmer. The ISR first clears the pipeline, allowing instructions in progress to update the appropriate registers. It then stores the received byte into a data memory address that has been mapped to the UART. This address is stored as data in addresses 22496 – 22499 of data memory. It can be read from memory and tracked by the programmer to determine if new
data has been written to the UART memory space. As additional bytes are written to data memory, the current write address is incremented and replaced in its reserved memory space. Once the UART writes to address 44995 (the end of UART memory space), the address wraps around to address 22500 (the beginning of UART memory space) where it begins overwriting old UART data with new receptions.

The UART also transmits byte–wide parallel data on a serial interface with another I/O device by calling a load word function referencing memory that is mapped to the UART interface. The programmer performs this action by using a store word instruction to write to a data memory address allocated for UART transmission. A load word instruction referencing this memory address then triggers data memory to forward the byte at the same address to the UART. No specific protocol is currently supported by the UART; however, the UART code adapted from [30] was oriented towards a RS–232C interface. The UART’s internal operation and interaction with the processor pipeline are described in the following paragraphs.

![Figure 28. Schematic view of the UART’s connection to the payload processor.](image)

2. **Receive Logic**

The receive logic of the UART is initiated by passing a receive enable signal, indicating the UART should be prepared to receive data immediately. The UART sets the
receive busy signal high if it is not currently receiving data to indicate it is expecting incoming data. Next, the UART initiates a 4–bit sampling counter that is incremented on each cycle of the receive clock. The purpose of this counter is to determine when the UART should sample the input signal to obtain data. Sampling typically occurs near the middle of the input signal time interval to ensure distortion from signal transitions does not cause a bit error [31]. The UART on the payload processor is sampled on the eighth of 16 sample counter increments. Once the signal has been sampled, the resulting bit is appended to a receive register, and a receive counter is incremented. The receive counter samples ten values (bits 0 – 9), after which it sets the receive busy signal to low. The UART uses the first and last receive counter increments to indicate the start or termination of a transaction. Increments 1 – 8 are where the data bits are collected. Upon completion of the ten increments, the data is available to be transferred from the UART to the processor data memory.

3. Transmit Logic

The transmit logic of the UART is initiated by setting the load transmit data signal high. This indicates there is data to be transmitted by the UART that must first be copied into the transmit register. Once data transfer to the transmit register is complete, transmission commences when the transmit enable bit is set high. If the transmit enable bit is set low, the data waits inside the register. The parallel data is transmitted with a zero bit as the preamble, eight data bits, and is finished with a one bit on the tenth clock cycle.

4. Interface with the Payload Processor

The UART transmit and receive functionality must be capable of interfacing with the payload processor. When the UART receives data to pass to memory, it first transmits a set of three IRQ signals to pc_select_module_trip as shown in Figure 28. The pc_select_module_trip component sees these flags during the following clock cycle and automatically switches the new PC address to the IRQ start address since an IRQ has the highest priority. It then saves either the branch/jump address or the PC plus four address into a holding register depending on whether the br_jump_sel family of flags are set
high or low, respectively. This maintains the processor’s place in instruction memory for when the ISR terminates. The PC selector then responds to the UART with an acknowledgement signal, causing the UART to unload its received data as input to the data_mem_trip module in the MEM stage. The data is not immediately written into memory after the unload signal, as the pc_select_module_trip first changes the PC address to the first instruction of the ISR. Four NOP instructions are used to clear the pipeline, followed by a store word instruction. The arguments of the store word instruction are zeroes as the interrupt request processing flag generated by pc_select_module_trip provides indication to the data memory on which input data to write to memory.

After the NOP instructions have cleared the pipeline, the interrupt request flag propagates to the MEM stage where it is received by data memory. Like instruction memory, data memory is divided into three equal segments of 45 kB each; however, data memory is further subdivided into two separate address spaces within each of the three 45 kB segments. One address space is for standard processor data manipulations. The other is dedicated to transactions involving the UART. Byte addresses 0 to 22,499 are used for standard read and write operations. Byte addresses 22,500 to 44,999 are reserved for the UART. The UART writes only a single byte as opposed to a word during an ISR because that is the most data that can be received during a single transaction with the interface. The data received by the UART is already present at the data_mem_trip module because the pc_select_module_trip acknowledged receipt of the original IRQ which triggered the UART to unload its data. The IRQ processing flag triggers the data memory module to take input from the UART data input interface vice the normal input address interface. The data memory module then references an internal register that stores the next appropriate byte address to which the UART data should be written. The UART input data is written to this location and the address register is incremented for the next receive transaction.

Transmit operations require significantly less overhead from the processor. The last index in the UART reserved address space (44,999) is dedicated for transmit operations. This address is the only location in the reserved address space to which the
processor writes data outside of an ISR. A load word instruction sends this data to the UART as well as writes its contents as the eight least significant bits of the register specified in the instruction. The data_mem_trip module also sets the transmit enable flag to high when the data is sent.

D. CHAPTER SUMMARY

The overall design and functionality of the payload processor was discussed in this chapter. The processor pipeline was considered first. Following conventional MIPS design, we saw that it consists of five stages. These five stages contain combinatorial logic modules that operate on smaller components of the 32–bit MIPS instructions. The outputs of these modules are passed onto pipeline registers where they are held until the following clock cycle and ITMR voting is performed. The IF stage translates a PC memory address into an instruction. The ID stage decodes the individual components of this instruction and sets the appropriate flags to control the pipeline modules. The EX stage contains the ALU and calculates the branch address. The MEM stage holds the data memory module and determines if a new branch or jump address should be sent to the PC selector module. Finally, the WB stage selects between the ALU result and data memory output to forward back to the register file.

The UART design and functionality was also considered in this chapter. It is capable of receiving or transmitting a single byte per transaction. The receive logic is interrupt–driven and uses an ISR to clear the pipeline and write the data to a reserved memory space. The transmit logic does not generate an IRQ but requires data to be written a memory address space that is specifically mapped for UART transmissions. A load word instruction then passes the memory contents to the UART for transmission.
THIS PAGE INTENTIONALLY LEFT BLANK
V. TESTING AND ANALYSIS OF THE PAYLOAD PROCESSOR AND UART

The focus of this chapter is on testing the processor at three levels of its logic design. First, the pipeline register and voter functionality is verified by ensuring each of the pipeline registers successfully votes the majority signal. Next, each major category of instruction within the ISA is individually run through the processor to ensure it performs the correct operations, triggers the correct flags, and operates on the correct data. Finally, the UART is integrated into the system, and the receipt and transmission of data on the interface is verified. Each of these simulations was performed using Xilinx’s ISim³ software.

A. PIPELINE REGISTERS

Functionality of the pipeline registers were tested first to ensure the processor is capable of voting out simulated errors. Test cases were introduced to the registers in a Gray code sequence displayed in Table 7. This sequence exercises all eight combinations of the majority voters in the circuit while allowing only one transition per time interval amongst the three members of the set. Four of these combinations contain majority zeroes, and four contain majority ones. Output waveforms of the PC pipeline register are displayed in Figure 29. The output waveforms of all the pipeline registers can be found in Appendix B. The three rows above the clock signal in Figure 29 (npc_a, npc_b, and npc_c) represent the output of each member of the triplicated set of registers. The three rows below the clock signal (pc_a, pc_b, and pc_c) represent input from each of the members of the triplicated set of registers. It is important to note that for each combination of inputs, the register module outputs the majority signal on all three registers during the low to high transition of the clock signal. This confirms the correct operation of the pipeline registers and allows consideration of only one of the three parallel processors during the next stage of testing.

³ ISim® is a registered trademark of Xilinx, Inc.
Table 7. Gray code sequence of inputs to the pipeline registers

<table>
<thead>
<tr>
<th>Sequence</th>
<th>Bit A</th>
<th>Bit B</th>
<th>Bit C</th>
<th>Majority</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>5</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>6</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>7</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

Figure 29. Waveform outputs of the triplicated PC register.

B. TEST PROGRAM EXECUTION

The purpose of this testing was to ensure all 24 instructions currently supported by the ISA are executed correctly. This was accomplished by developing an assembly code program that was placed in memory. By providing only the clocking signal and default UART inputs, the processor must run independently. The instruction sequence used to complete this testing is displayed in Table 8. Since instruction memory is byte-addressable, and each new instruction consists of four bytes, instruction address start on multiples of four. Several NOP instructions were required in succession to ensure data hazards were not encountered while register and memory locations were being read and written.
Table 8. Assembly program used for processor testing.

<table>
<thead>
<tr>
<th>Memory Address</th>
<th>Instruction (Assembly Code)</th>
<th>Instruction (Machine Hexadecimal)</th>
<th>Instruction Format</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>ADDI $0, $1, 0×000A</td>
<td>0×2001000A</td>
<td>I</td>
</tr>
<tr>
<td>4</td>
<td>ADDI $0, $2, 0×FF00</td>
<td>0×2002FF00</td>
<td>I</td>
</tr>
<tr>
<td>8 – 12</td>
<td>NOP (x2)</td>
<td>0×00000000</td>
<td>N/A</td>
</tr>
<tr>
<td>16</td>
<td>AND $1, $2, $3</td>
<td>0×00221824</td>
<td>R</td>
</tr>
<tr>
<td>20</td>
<td>OR $1, $2, $4</td>
<td>0×00222025</td>
<td>R</td>
</tr>
<tr>
<td>24</td>
<td>NOR $1, $2, $5</td>
<td>0×00222827</td>
<td>R</td>
</tr>
<tr>
<td>28</td>
<td>XOR $1, $2, $6</td>
<td>0×00223026</td>
<td>R</td>
</tr>
<tr>
<td>32</td>
<td>ADDI $0, $7, 0×0006</td>
<td>0×20070006</td>
<td>I</td>
</tr>
<tr>
<td>36</td>
<td>SLL $0, $7, $8</td>
<td>0×00074080</td>
<td>R</td>
</tr>
<tr>
<td>40</td>
<td>SRL $0, $8, $7</td>
<td>0×00083902</td>
<td>R</td>
</tr>
<tr>
<td>44</td>
<td>SLL $0, $7, $8</td>
<td>0×00074004</td>
<td>R</td>
</tr>
<tr>
<td>48</td>
<td>SRL $0, $8, $7</td>
<td>0×00083806</td>
<td>R</td>
</tr>
<tr>
<td>52</td>
<td>ADD $1, $2, $9</td>
<td>0×00224820</td>
<td>R</td>
</tr>
<tr>
<td>56</td>
<td>SUB $1, $2, $10</td>
<td>0×00225022</td>
<td>R</td>
</tr>
<tr>
<td>60</td>
<td>ANDI $2, $11, 0×FFFF</td>
<td>0×304BFFFF</td>
<td>I</td>
</tr>
<tr>
<td>64</td>
<td>ORI $2, $12, 0×0000</td>
<td>0×344C0000</td>
<td>I</td>
</tr>
<tr>
<td>68</td>
<td>SLT $9, $10, $13</td>
<td>0×012A682A</td>
<td>R</td>
</tr>
<tr>
<td>72</td>
<td>SLT $10, $9, $14</td>
<td>0×0149702A</td>
<td>R</td>
</tr>
<tr>
<td>76</td>
<td>SLTI $9, $15, 0×0004</td>
<td>0×292F0004</td>
<td>I</td>
</tr>
<tr>
<td>80</td>
<td>SLTI $10, $16, 0×0004</td>
<td>0×29500004</td>
<td>I</td>
</tr>
<tr>
<td>84</td>
<td>LW $0, $17, 0×0000</td>
<td>0×8C110000</td>
<td>I</td>
</tr>
<tr>
<td>88</td>
<td>SW $0, $11, 0×0000</td>
<td>0×AC0B0004</td>
<td>I</td>
</tr>
<tr>
<td>92</td>
<td>BEQ $0, $0, 0×000A</td>
<td>0×1000000A</td>
<td>I</td>
</tr>
<tr>
<td>96 – 132</td>
<td>NOP (x10)</td>
<td>0×00000000</td>
<td>N/A</td>
</tr>
<tr>
<td>136</td>
<td>BNE $1, $2, 0×0000</td>
<td>0×1422000A</td>
<td>I</td>
</tr>
<tr>
<td>140 – 176</td>
<td>NOP (x10)</td>
<td>0×00000000</td>
<td>N/A</td>
</tr>
<tr>
<td>180</td>
<td>BLEZ $17, 0×000A</td>
<td>0×1A20000A</td>
<td>I</td>
</tr>
<tr>
<td>184 – 220</td>
<td>NOP (x10)</td>
<td>0×00000000</td>
<td>N/A</td>
</tr>
<tr>
<td>224</td>
<td>BLZ $5, 0×000A</td>
<td>0×04A0000A</td>
<td>I</td>
</tr>
<tr>
<td>228 – 264</td>
<td>NOP (x10)</td>
<td>0×00000000</td>
<td>N/A</td>
</tr>
<tr>
<td>268</td>
<td>BGEZ $2, 0×000A</td>
<td>0×0441000A</td>
<td>I</td>
</tr>
<tr>
<td>272 – 308</td>
<td>NOP (x10)</td>
<td>0×00000000</td>
<td>N/A</td>
</tr>
<tr>
<td>312</td>
<td>BGZ $4, 0×000A</td>
<td>0×1C80000A</td>
<td>I</td>
</tr>
<tr>
<td>316 – 352</td>
<td>NOP (x10)</td>
<td>0×00000000</td>
<td>N/A</td>
</tr>
<tr>
<td>356</td>
<td>LW $0, $0, 0×AFC4</td>
<td>0×8C00AFC4</td>
<td>I</td>
</tr>
<tr>
<td>360</td>
<td>LW $0, $18, 0×57E4</td>
<td>0×8C1257E4</td>
<td>I</td>
</tr>
<tr>
<td>364 – 372</td>
<td>NOP (x3)</td>
<td>0×00000000</td>
<td>N/A</td>
</tr>
<tr>
<td>376</td>
<td>J 0×000000</td>
<td>0×08000000</td>
<td>J</td>
</tr>
<tr>
<td>380 – 388</td>
<td>NOP (x3)</td>
<td>0×00000000</td>
<td>N/A</td>
</tr>
</tbody>
</table>
Due to the number of signals present in the processor and duration of the simulation, the outputs of the ISim timing diagram are summarized in this section. A complete ISim timing diagram for the test program is included in Appendix B. Additionally, since the fault–tolerant capabilities of the pipeline registers have already been verified in Section A of this chapter, only one of the three parallel processors (processor A) is considered. For each instruction, it is critical the control flags are appropriately set during the ID stage. This ensures activation of the appropriate combinatorial logic modules by the remaining stages proceed as expected. I–format instructions must select the IMM operand over a second register operand, obtain the correct ALU result, and write–back to the register. R–format instructions must choose the correct write–back register, obtain the correct ALU result, and write–back the result to the register during the appropriate clock cycle. Load word and store word instructions must reference the appropriate register file and memory addresses and show that these locations were written correctly. Finally, the branch and jump instructions must exhibit successful change of the next PC address at the PC selector module in the MEM stage. Additionally, branch instructions must exhibit the correct comparison between the branch flags from the ID stage and the comparison bits from the EX stage.

1. I–Format Instructions

The ANDI and ORI instructions (memory addresses 60 and 64, respectively) and their propagation through the pipeline are evaluated during this section. First, the ID stage outputs of these instructions are displayed in Figure 30. The instruction being operated on at any point by the ID stage is denoted in the inst_a row of the timing diagram and can be cross–referenced with the hexadecimal format in Table 8. The ANDI instruction is present in the ID stage from 32 – 34 ns and the ORI instruction from 34 – 36 ns as shown in Figure 30. The faluop_a is initially set to two and transitions to three at 34 ns, which correlates with the ALU operations listed in Table 5. The IMM field of each instruction is sign–extended and forwarded to a multiplexer in the EX stage. Only the falusrc_a and fregwrite_a flags are set for I–format instructions. The falusrc_a flag triggers the EX stage multiplexer to select the sign–extended IMM value over the second
register operand. The `fregwrite_a` flag serves as a write enable signal when the result is written back to the register file during the WB stage.

Figure 30. ID stage outputs for the I–format instructions ANDI, and ORI.

The EX phase performs three critical functions for I–format instructions. First, the ALU input multiplexer must select the sign–extended IMM field as the second ALU operand. Second, the ALU must produce the correct mathematical result. Finally, the write–back address multiplexer must determine the correct register to which the result is forwarded. Each of these tasks is verified in Figure 31. The second operand to the ALU is 0x0000FFFF for the ANDI instruction and 0x00000000 for the ORI instruction. Both of these operations produce the result 0x0000FFF0 displayed in the `alu_result_a` row. The `wb_reg_addr_a` row indicates the result of the operation is written to registers $11$ and $12$. 
In the WB stage, the WB multiplexer selects the correct data source to write to the register file. This data is written to the register file during the same clock cycle. This function is displayed in Figure 32. The WB multiplexer data changes at 38 ns of simulation time to the ALU result of 0x0000FFF0, previously established as the correct result of the ANDI and ORI instructions. The data_out11 row displays the output of the register $11 and transitions from 0x00000000 to 0x0000FFF0 at 38 ns. Likewise, register $12 makes the same transition at 40 ns, one complete clock cycle after the write–back of the ANDI instruction is complete. The undefined red signal shown in the input2_a row of Figure 32 originates from the data memory module. This signal obtains a defined value when a data memory read operation via a load word instruction is processed.
2. R–Format Instructions

Propagation of the AND, OR, NOR, and XOR instructions (memory addresses 16, 20, 24, and 28, respectively) are considered in this section. The ID stage outputs for this subset of instructions are shown in Figure 33. From 10 – 18 ns the `faulty_a` signal cycles sequentially through values two, three, five, and four. Referencing Table 5, we see that these ALU operations correspond to the AND, OR, NOR, and XOR sequence desired. The two flags set high are `fregdst_a` and `fregwrite_a`. The `fregdst_a` flag indicates the destination register is contained in the `rd` register field, which is the correct setting for R–format instructions. The `fregwrite_a` flag has the same functionality displayed during the WB stage of the I–format instructions discussed in the Subsection 2 of this section.
In the EX stage, each of these instructions takes the data contained in registers $1$ (0x0000000A) and $2$ (0x0000FFF0) as operands. The result of each logical operation is shown by the **alu_result_a** row in Figure 34. Only the **regwrite_a** control flag is forwarded from this stage (displayed as **fregwrite_a** in Figure 34). The **regdst_a** flag is terminated in the EX stage when it is used as a selector bit to the write–back register address multiplexer. It is also important to note the comparison bits **sne_a**, **sgz_a**, and **sgez_a** are set high following the output of all four R–format instructions because the register values for each of them are the same. Since no branch flags are set during the ID stage, the processor does not trigger a branch/jump selector flag during the MEM stage.
Finally, the WB stage of the R–format instruction is evaluated. This stage of execution is shown in Figure 35. Identical to an I–format instruction, the WB multiplexer first chooses between the ALU result and data memory output to be written into the register file. The $\text{sel}_a$ input, controlled by the $\text{memtoreg}_a$ flag is set low, which ensures the write–back data originates from the ALU. During the positive clock edges occurring from 16 – 24 ns, the WB multiplexer output, $\text{data}_{out}\_a$, matches the data received as input to the register file. The $\text{input2}_a$ signal is still undefined at this point in the simulation because the first load word instruction has not yet entered the MEM stage.

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\text{tbeq}_a$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{tlz}_a$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{t instructions}$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{tlb}_a$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{tbra}_a$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{tj}_a$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{fmemrcd}_a$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{fmemreg}_a$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{fmemwrite}_a$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{fregwrite}_a$</td>
<td>0</td>
</tr>
<tr>
<td>$\text{treq}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{specz}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{spz}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{sleq}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{sleq}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{sw}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{sw}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{sb}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{sw}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{sne}_a$</td>
<td>1</td>
</tr>
<tr>
<td>$\text{wb}_{reg_add}$</td>
<td>3</td>
</tr>
<tr>
<td>$\text{wr_mem_dst}$</td>
<td>0000FFFF</td>
</tr>
<tr>
<td>$\text{jump_add}$</td>
<td>00000000</td>
</tr>
<tr>
<td>$\text{alu_result_al}$</td>
<td>00000000</td>
</tr>
<tr>
<td>$\text{br_addr_out}$</td>
<td>00000000</td>
</tr>
</tbody>
</table>

---

Figure 34. EX stage outputs for the R–format instructions AND, OR, XOR, and NOR.
3. Load Word and Store Word Instructions

The load word and store word instructions are currently the only instructions in the ISA that interact with data memory. They essentially have the same structure as I–format instructions, but the control flags they trigger force different outputs, particularly during the MEM and WB stages. There is one of each load word and store word instruction in the testing program located at addresses 84 and 88, respectively. The ID stage outputs for each of these instructions are displayed in Figure 36.

The load word instruction is present in the ID stage from 43 – 45 ns of simulation time. Decoding this instruction produces the registers $0 in the $reg_1a row and $17 in the $reg_2a row, a zero in the $aluop_a row indicating addition, and an IMM field with the offset value 0x00000000 in the $half_word_ext_a row. Additionally, the $fregwrite_a, $falusrc_a, $fmemread_a, and $fmemtoreg_a control flag rows are set high. The $fmemread_a flag serves as a read enable signal to the data memory module in the MEM stage. The $memtoreg_a flag is forwarded to the WB multiplexer where it selects the output of the data memory module over the result of the ALU to write to the register file.

The store word instruction is present in the ID stage from 45–47 ns of simulation time. This instruction decodes $reg_1a as $11 and $reg_2a as $0, zero as the $aluop_a
output, and 0x00000004 for the offset. The store word instruction sets the falusrc_a and fmemwrite_a flags from the control module. The fmemwrite_a flag is forwarded to the MEM stage where it functions as a write enable bit for the data memory module.

In the EX stage, the sign–extended IMM operand is added to the first register operand to obtain the data memory address which is read or written. This action is shown in Figure 37 where opa_a and opb_a are the base register and sign–extended offset, respectively. These results are added to obtain the data memory address 0x00000000 and 0x00000004 displayed in the alu_result_a row. This simulation uses data memory address 0x00000000 for the load word instruction and address 0x00000004 for the store word instruction. Prior to the start of the simulation, each byte in data memory was initialized to zeroes with the exception of the first four byte addresses which contain ones; thus, a read operation performed on the first byte of data memory writes a value of 0xFFFFFFFF back to the register file.
Figure 37. EX stage outputs for the load word and store word instructions.

The MEM stage is where the load word and store word instructions perform the unique operation of interacting with data memory. The MEM stage timing diagram for the load word and store word instructions is displayed in Figure 38. First, the load word instruction enters the MEM stage at 48 ns in simulation time. The \texttt{wr\_en} signal is set high because it is linked to the \texttt{memread\_a} control flag set in the ID stage. The \texttt{data\_out} row of the data memory module transitions from an undefined output to the aforementioned value \texttt{0xFFFFFFFF} expected at the memory location \texttt{0x00000000}.

The store word instruction enters the MEM stage at 50 ns in simulation time. Just as the \texttt{re\_en} signal was linked to the \texttt{memread\_a} control flag, the \texttt{wr\_en} signal receives the forwarded \texttt{memwrite\_a} control flag as input. This causes the inversion of each of the \texttt{re\_en} and \texttt{wr\_en} signals on the load word and store word instructions. Data memory is too large to display in the ISim software; however, since the memory location was only written once during the simulation, a successful write can be confirmed by checking the value column of the objects panel in ISim. This panel displays the first 64 byte addresses in data memory and their value at the end of simulation time. The status of this panel at the end of the simulation is displayed in Figure 39 and shows byte addresses four through seven written with the data \texttt{0x00000000}.

70
The WB stage of the processor performs the same basic operation for the load word instruction as for the I–format and R–format instructions. For the load word instruction, data sent to the register file comes from the data memory module output vice the ALU. The WB stage timing diagram for the load word instruction in the processor test program is displayed in Figure 40. When the simulation reaches 50 ns, the \texttt{sel\_a} bit in the WB multiplexer is set high, and the \texttt{data\_out} row carries the data read from memory. During this same clock cycle, near the bottom of Figure 40, a transition occurs in \texttt{data\_out17} of the register file indicating register\ $17$ was written with the data from memory location 0x00000000.
4. Branch Instructions

Branch instructions, like load word and store word, essentially follow the same structure as I–format instructions. During the EX stage, branch instructions compare the \texttt{rs} and \texttt{rt} registers, but the ALU result is insignificant. The comparison bits from the ALU and branch flags from the ID stage are then compared in the MEM stage to determine if the requirements for a branch instruction have been met. If the branch criteria are met, the address and a branch flag are forwarded to the PC selector module. The PC selector module is then responsible for changing the PC to the branch address if the flag is set high. The branch–on–equal (BEQ) instruction located at address 92 of instruction memory is evaluated to show the functionality of branch instructions.
In the ID stage, branch instructions trigger at least one of the six possible branch control flags. The BEQ instruction triggers the `fbeq` flag at 160 ns into the simulation shown in Figure 41. No other control flags are set by branch instructions during the ID stage.

![Figure 41. ID stage outputs for the BEQ instruction.](image)

The ALU result calculated in the EX stage is not used in data memory or the WB stage. The ALU result is by–passed by setting the control flags `fmemread_a`, `fmemwrite_a`, and `fregwrite_a` low from 162 – 164 ns shown in Figure 42. In the BEQ instruction, both ALU operands are compared to determine the branch conditions have been met. These outputs are depicted in Figure 42 as `seq_a`, `sne_a`, `sgez_a`, `sgz_a`, `slez_a`,

73
and \texttt{slz\_a}. Since both register operands have a value of zero, the \texttt{seq\_a}, \texttt{sgez\_a}, and \texttt{slez\_a} bits are set high. These flags and comparison bits are further analyzed by the branch multiplexer and address selector modules during the MEM stage.

![Figure 42. EX stage signals while executing the BEQ instruction.](image)

The EX stage also calculates the branch address that is forwarded to the PC selector module. The memory offset included in the IMM field essentially represents the number of instructions that are skipped if the branch criteria are met. This value is first converted to a byte–addressable memory offset performed by logical left–shifting the
sign–extended IMM field by two bits, effectively multiplying the offset by four. This function is shown by the `addr_in`, `increment`, and `addr_out` rows in Figure 42. The offset from the IMM field, 0x0000000A, is converted from 10 words to 40 bytes in the increment row. This result is added to the `addr_in` row to obtain a branch address location of 136 which is forwarded to the MEM stage via the `addr_out` output.

The MEM stage is particularly important for branch instructions because of the operations performed by the address selector and branch multiplexer modules. Inputs and outputs of these modules are displayed in Figure 43. At 164 ns of simulation time the branch control flags and comparison bits from the EX stage enter the MEM stage. The branch multiplexer performs comparisons between each branch control flag and its corresponding comparison result from the ALU. For the BEQ instruction under consideration, the `beq` control flag and `seq` comparison bit are both set high during this clock cycle. This results in the branch output signal being set high and forwarded to the address selector module.

The address selector module takes the branch address obtained in the EX stage and branch output of the branch multiplexer as inputs. The branch input is represented in the `br_sel` row of Figure 43. A comparison is then made between the `br_sel` and `jump_sel` inputs that determines the appropriate address to forward to the PC selector module. The output `br_jump_sel_out` is essentially a XOR of the two bits since it is not possible to have a branch and jump flag in the same clock cycle. For the BEQ instruction under consideration, the `br_jump_sel_out` bit is set high, which causes address 136 to be placed on the `addr_out` connection to the PC selector module.
Finally, the PC selector module detects the `br_jump_sel_out` signal and changes the PC address. The branch PC address is received at the `br_jump_addr` input shown in Figure 44. The `npc` address in the PC Selector grouping changes from 104 to 136 at 164 ns, indicating the branch address has been selected over the PC plus four address location. It is important to note the `br_jump_addr` and `br_jump_sel` flag inputs arrive at the PC selector module on the clock cycle spanning 164 – 166 ns. This indicates the result of the branch instruction is obtained immediately following any combinatorial logic delay experienced through the branch multiplexer and address selector modules. It is also shown in Figure 44 that two complete clock cycles must execute before the branch address is available in the PC register; thus, two NOP instructions must follow a branch to ensure pipeline hazards do not occur.
5. Jump Instructions

The jump instruction has a unique format but executes in a very similar manner to the branch instruction. The only jump instruction featured in the test program occurs at address 376, which is performed to restart execution of the test program again at address 0 in instruction memory. The majority of work associated with jump functions occurs in the IF stage. A jump address is formed by concatenating the four most significant bits of the PC plus four address, the 26 least significant bits of the jump instruction, and two zero bits. This operation is verified in Figure 45, which displays the 26–bit inst_in and 4–bit npc signals as inputs. The complete 32–bit output address is displayed in the inst_out row of Figure 45.
In the ID stage, the jump control flag is the only flag bit set for the jump instruction. The jump address and control flag are then forwarded to the address selector module in the MEM stage. Similar to the branch instruction, the address selector uses the jump control flag and jump address to forward the correct output back to the PC selector module. This output is verified in Figure 46. The `jump_addr` value is zero, and `jump_sel` flag bit is set high from 110 – 112 ns of simulation time. This produces an `addr_out` output of zero and sets the `br_jump_sel_out` flag set high. During the same clock cycle, the zero address is received at the PC selector module indicated by the `br_jump_sel_a` and `br_jump_addr_a` signals in Figure 46.
C. UART TESTING

The UART is the last critical component of the payload processor requiring functional testing. The UART passes data received on its serial transmission connection to the pipeline via the PC selector module as discussed in Chapter IV; therefore, a majority of the testing for the receive function focuses on the PC selector module. The test program verifies successful placement of received UART data in memory using a load word instruction since the ISim software does not display more than the first 63 data memory addresses in its wave configuration panel. UART transmit verification was simpler to perform since the load word instruction referencing mapped memory initiates the transmit process. The load word instructions associated with transmitting and receiving UART data are located at addresses 356 and 360 of the test program, respectively.

1. UART Receive Testing

The UART is required to send an IRQ to the processor pipeline upon successful receipt of a single byte of data. Once acknowledgment is received from the PC selector module of the processor, the data is forwarded to memory where it remains until the interrupt processing flag arrives at the MEM stage. In order to verify this task, simulated data was sent on the serial connection line of the UART via a test bench file. Once the UART and processor perform their aforementioned tasks, the test program eventually executes the load word instruction at address 360, which places the received data into a register that can be viewed from the ISim software package.

The receipt of data from the serial interface begins when the rx_enable bit is set high. This action is performed at 2 ns into the simulation depicted by Figure 47, when the reset bit transitions from high to low. Signal sampling on the serial connection begins immediately following the low to high transition of the rx_enable_a signal shown in Figure 47. The receive and processor clocks were run at the same rate during this simulation. Typically, the receive clock operates at a much slower rate than the processor clock, but the two were matched to better view the simulation results. The receive clock required 16 clock cycles (32, 1 ns clock transitions) to obtain one bit. Ten total bits were
received, including the preamble zero and terminating one which remains on the rx_in line following receipt of the bits.

Figure 47. Inputs and outputs of the UART receive system during testing.

The signal is completely received by the UART following its final sample at 312 ns into the simulation as shown in Figure 48. Immediately following the tenth sample, the rx_irq_a interrupt flag is raised and sent to the PC selector module. The PC selector module recognizes the IRQ as the highest priority operation and immediately sends an acknowledgement via the ack_a signal back to the UART. The ack_a signal then triggers the uld_rx_data_a bit in the UART causing it to forward the received data to the data memory module. The PC selector module then places the processor in an IRQ processing state by setting the irq_processing flag bit high. The npc_a value is changed to the first address of the ISR in instruction memory, and the curr_pc register holds the second ISR address. This configuration of the npc_a and curr_pc values causes the processor to continue its ISR during the next clock cycle.
During the clock cycles from 312 – 316 ns, the PC selector module updates the $pc4_{-}addr_{-}a$, $curr_{-}pc$, and $npc_{-}a$ values using the same method as a standard PC plus four increment. From 316 – 318 ns, while the ISR is using NOP instructions to clear the pipeline, the branch instruction at address 356 generates a new branch address and raises the $br_{-}jump_{-}sel_{-}a$ flag. Since a programmer does not know when the processor is performing an ISR, the PC selector module must update the value of the $irq_{-}return_{-}addr$ register to the $br_{-}jump_{-}addr_{-}a$ value if the $br_{-}jump_{-}sel_{-}a$ flag is raised while the ISR is executing. When the ISR is performing its final instruction at address 44996, the PC selector module sets the $curr_{-}pc$ and $npc_{-}a$ values to address 356, which was the last address stored during the ISR. The $irqp_{-}a$ flag is also raised to accompany the store word instruction and propagates through the processor to the MEM phase.

When the $irqp_{-}a$ signal reaches the data memory module, it provides an indication to read input data from the UART interface vice the standard input interface. It is shown in Figure 49 at 328 ns the $irqp_{-}a$ flag bit enters the MEM stage where the UART data has been present since the PC selector module acknowledged the initial IRQ. During this clock cycle the data is written to address 22500, which is the first memory
address mapped to the UART. Upon completion of the memory write, the `uart_wr_addr` register is incremented to 22501 on the following clock cycle. If another UART write occurs during the simulation, it is written to this address. A load word instruction that reads the data in address 22500 immediately follows the UART write operation in data memory. The `data_out` signal indicates the read operation from this memory produces the data written by the UART on the previous clock cycle.

![Figure 49. MEM stage inputs and outputs during a UART receive operation.](image)

2. **UART Transmit Testing**

The testing featured in this subsection focuses on the UART transmit functionality. Transmission on the UART serial interface is accomplished by passing the data written at a specific memory location (address 44999) to the UART output. It was stated in Chapter IV that the data written to this address can only be sent to the UART when a load word instruction references this memory address. This approach has the advantage of avoiding interrupt procedures which delay the processor from completing additional operations on unaffected data.

The transmission data was first placed into the data memory initialization file at address 44999 prior to the start of the simulation. During normal operation of the processor this action is typically performed by the store word instruction which was verified in Section B of this chapter. An additional load word instruction was added to the test program at address 356 of instruction memory and is decoded as 0x8C00AFC4 in
Figure 50. Using the instruction set of Appendix C, we can determine when this instruction attempts to load a word with base register $0$ and offset 44996 and store the result back into register $0$.

During the ID and EX stages this particular instruction propagates in the same manner as would any other load word instruction. When the MEM stage is reached, the read address location of 44996 causes the data memory module to interpret this load word instruction as a signal to transmit the data in address 44999 to the UART. The data memory module is programmed to accept any read address input from 44996 – 44999 as referencing the memory mapped to the UART. This activity is shown in Figure 51, where the combination of the addr and re_en inputs prompts the transition of uart_data_out and the ld_tx_data_flag bits at 100 ns.
The **ld_tx_data_flag** is set high when the UART mapped memory is read to provide indication of incoming data to the UART. This data is saved into the **tx_data_a** register, shown in Figure 52, which holds data prior to the **tx_enable_a** flag being set high. This transition occurs in the **tx_data_a** row of Figure 52 at 100 ns. In the next clock cycle beginning at 102 ns, the **tx_empty_a** bit is set low and the **tx_enable_a** bit is set high which indicates data is received in the receive register and transmission begins. Simultaneously, the **tx_out** output of Figure 52, representing the serial transmission medium, transitions from high to low signaling the preamble bit of the UART transmission. This is followed by eight data bits (alternating ones and zeroes) from 102 – 120 ns. At 120 ns of simulation time the **tx_enable_a** bit transitions from high to low and the **tx_empty_a** flag from low to high, indicating the transmission is complete. The **tx_out** bit ends its transmission by transitioning from low to high where it remains until indication of a new transmission is initiated by the **ld_tx_data_flag**.
D. TROUBLESHOOTING

The majority of components contained within the processor module performed simple tasks and were easy to design using Verilog behavioral modeling. During the testing and verification process, however, some components exhibited unexpected behavior when inserted into the complete processor unit. The most difficult issues encountered in the troubleshooting process were related to the PC selector module. The PC selector module is undoubtedly the most complex component of the processor pipeline since it both interfaces with the external UART and accepts a feedback input from the IF stage. Some of the methods and design principles used to overcome errors when testing the processor are discussed in this section.

Initially, the PC selector module design changes its IRQ return address following the high to low transition of the branch/jump flag and at each update of the PC plus four address while performing ISR instructions. Reassignment of the IRQ return address should occur no more than once during an ISR; during same clock cycle the branch/jump flag transitions from low–to–high. In the ISim software, it was determined during this high–to–low transition of the branch/jump flag that its asserted value was one. This is contrary to the other observed transitions in the simulation, which typically take the value of the final state of the signal. Asserting a value of one during the high–to–low transition of the branch/jump flag combined with the low–to–high transition of the clock resulted in the conditions being met for IRQ return address reassignment. The reason for this unexpected assertion of the branch/jump signal has not yet been determined; however, the issue was resolved by removing assignment of the IRQ return address from the sensitivity list of the primary “always” statement in the Verilog module. A new “always”
statement was constructed that assigns the IRQ return address to the same value of the branch/jump address only at the positive edge of the branch/jump signal.

Another major issue encountered while designing the PC selector module was the assignment of the next PC address during transition from the ISR back to the test program. The PC plus four address, which is received as a feedback input from the IF stage, points to address 45,000 of instruction memory from 320 – 322 ns which is when the ISR to test program transition is imminent. This value is outside the boundaries of instruction memory address space. The PC selector module was designed to recognize this invalid address, reassign the next PC to the IRQ return address, and add four to the current PC holder instead; however, this caused the PC plus four address to have the same value as the current PC register on the following clock cycle. An additional manual reassignment of the current PC register was included for when the processor exits interrupt processing mode and attempts to assign the next PC value. After the second increment of the new PC holder, the values are again synchronized and reassignment can be performed to the PC plus four address received from the IF stage.

The branch–on–less–than–zero (BLZ) and branch–on–less–than–or–equal–to–zero (BLEZ) instructions initially did not work because the ALU recognized these values as unsigned integers. This became apparent in the simulation when the appropriate branch flags were set by the control module in the ID stage, but the comparison bits set by the ALU in the EX stage indicated a value greater than or equal to zero. Thus, the branch multiplexer module did not raise the branch/jump selector flag, and the PC selector module passed the PC plus four address vice the branch/jump address to the current PC register. This issue was mitigated by testing the most significant bit of the first operand to the ALU vice comparing the entire operand to zero. Since negative numbers always have a one as the most significant bit in the two’s complement representation of integers, this provides an equivalent indication of a negative number. Verilog features an integer register declaration to construct registers that are treated as signed vice unsigned values, however, this method was unable to be successfully employed in the payload processor. Several declarations were attempted according to the format provided in [31] but each produced a syntax error in the ISE Webpack compiler. The most significant bit
comparison is adequate for testing operands, but implementing the solution with established Verilog declarations is preferred for future versions of the processor. This prevents the need for additional logic code in the ALU which is required to handle comparisons between registers and perform more complex mathematical operations.

E. CHAPTER SUMMARY

Testing and verification of the payload processor to ensure the correct execution of its entire ISA and interface with the UART was presented in this chapter. First, the pipeline registers were tested to ensure they passed the correct data to the voter circuits. These demonstrations also exhibited the voting logic’s ability to clear SEUs that might occur by selecting the majority of input bits as output. Next, the proper function of the 24 instructions supported in the ISA was summarized by organizing the signals into five groups and tracking their propagation through the pipeline stages. Finally, the processor’s ability to interface with the UART was verified by performing simulated receive and transmit operations.
VI. CONCLUSIONS AND RECOMMENDATIONS FOR FUTURE WORK

The various fault–tolerant methods and hardware on which a CubeSat payload processor could be implemented were investigated, and a design capable of supporting a payload attached by a serial communication link was proposed and tested. While a solid framework for the payload processor was established, a significant amount of testing and development must be accomplished before the device is ready to fly an actual mission. Additionally, various concepts in logic programming development were found to be more beneficial at different levels of processor design. The conclusions section speaks primarily to the logic design methodology, while the future work section outlines several major steps necessary to launch the payload processor as part of a CubeSat mission. Ultimately, the successful development of a payload processor for CubeSat missions can greatly expand the capabilities of the DOD’s satellite constellation.

A. CONCLUSIONS

The initial goals proposed in Chapter I of this thesis were successfully accomplished. First, the ITMR fault–tolerant architecture was selected as the best fault–tolerance implementation method for the processor. The Xilinx Virtex–5 FPGA was determined to be the best implementation platform for the processor due to its increased logic resources and performance capabilities over the Actel ProASIC3. A hybrid HDL/schematic design of the processor was then produced along with a UART serving as the serial–to–parallel communication device. Finally, verification of the operation and interaction of these devices was performed using the Xilinx ISim logic simulation software.

The payload processor was a complex and detailed design which required tremendous attention to detail. While building the ITMR pipeline registers a substantial amount of iteration was used in the HDL code. Devices 32 bits or greater in width are efficiently constructed using the Verilog “generate” keyword. This syntax allows the user to create serialized instances of elements using a “for” loop vice declaring and wiring
each individual module in Verilog code or schematics. When the individual pipeline registers were combined into ITMR modules, each of the triplicated members was declared without use of the “generate” keyword. This allowed for better organization of the modules within the ITMR registers, which proved helpful when debugging the processor. Top level synthesis of the complete processor pipeline and combinatorial logic modules within the pipeline stages was performed using schematic files. The schematic layouts allow graphic representation of dataflow within the pipeline and grouping of module input and output pins with regard to their association in the ITMR architecture.

Using a hybrid I/O scheme for the processor offers a flexible approach to memory management. By placing control logic in data memory, the processor avoids an IRQ when sending data to the UART. Expanding the number of flag bits in the processor control module and employing more advanced memory writing techniques can further reduce IRQs. Additionally, the hybrid I/O scheme offers a window into what re-configurability may look like on orbit. While the Virtex–5 block RAM memory primitives in [32] are more restrictive than the simulated RAM used in this thesis, the logic controlling data memory effectively allows re-sizing and re-addressing. This control logic may require an increased level of detail to interface with a Virtex–5 library primitive, however, placement of the memory control logic external to the block RAM will enable the same functionality achieved in this simulation.

B. FUTURE WORK

The payload processor designed in this thesis represents an initial step in the advancement of CubeSat technology. It recycles an effective fault–tolerance method and uses a simple UART interface to communicate with an I/O device. A significant amount of testing and development is required to produce a payload processor able to support a CubeSat mission. The following sections are semi–sequentially ordered to provide a way forward for development.

1. Hazard Detection and Exception Handling

The design proposed in this thesis is not yet capable of handling data and control hazards or exceptions [33] [34]. To take full advantage of the speed and efficiency
pipelined processor offers, hazard detection units, data forwarding modules, and exception handling modules should be developed. Implementation of these modules is relatively complex since they must not only operate in conjunction with the processor, but cooperate with IRQs generated by the UART.

2. **Assembler, Compiler, and ISA Expansion**

CubeSat designers must be able to use high–level programming languages to instruct the processor on how to interact with and manage its payload. This requires development of an assembler and compiler that are tailored to the processor’s ISA. The compiler should support either C or Python as the high–level programming language. Development of the compiler and assembler should be performed concurrently with expansion of the processor ISA to support all instructions in the MIPS core ISA. Further expansion beyond the MIPS core ISA may be required to support onboard processing of data using advanced algorithms.

3. **Memory Management**

The data and instruction memory currently implemented in the payload processor is small and simplified. The triplicated data memory segments can support a single file no greater than 22.5 kB, and instruction memory limits a program to 11,250 instructions. Additionally, memory management is entirely contained within the data memory module and features no ECCs or caches. Components that handle FPGA memory as a cache or virtual memory system will likely need to be developed allowing for interaction with a secondary, off–chip storage memory. Secondary storage will likely require several gigabytes (GB) of memory for the processor to perform useful operations. The type of secondary memory and ECCs that must be implemented to protect stored bits from SEEs should also be considered in this research.

4. **Implementation and Testing**

Once the processor hardware design and simulation is finalized, the associated Verilog and schematic files should be implemented and tested on the Virtex–5 as a complete system. Testing will likely require the creation of additional modules that allow
the tester to artificially insert SEUs into the system. Similarly, the desired outputs must be identified and routed to a logic analyzer. The PAR results performed by the logic design software during this stage will also provide insight into the speed of the implemented processor and the quantity of logic resources it consumes on the FPGA. This research should seek to verify the ISim results obtained for the HDL models and define performance parameters for the hardware with the implemented design.

5. **Hardware Production**

After all of the major hardware components have been designed and successfully tested under a variety of inputs, a PCB with the FPGA and associated I/O connections should be manufactured. This hardware would serve as a flight test model similar to the SADv3 test boards prepared by Parobek for the launch sequencer [1]. The design of a standalone PCB would best be performed in an advanced software environment such as Altium Designer. The completed PCB must be appropriately wired to support all of the desired I/O interfaces, JTAG pins, and power inputs. This PCB would serve the purpose of prototype testing for future CubeSat missions and for radiation testing discussed in the next section.

6. **Radiation Testing**

As discussed in Chapter III, data pertaining to the radiation hardness of non–RADHARD FPGAs is relatively sparse. It was established in [11] and [35] that as the embedded device under test (DUT) becomes more complex, control over its testing diminishes, and delays resulting from alterations to the original design may overwhelm the project. Since the payload processor is a very complex logic design, comprehensive measurements of the individual logic modules will likely produce the best estimates for overall device radiation hardness. The data obtained in this experiment would provide insight regarding which modules and locations of the FPGA are most susceptible to SEUs. This research would involve coordination with a facility that possesses a cyclotron or other controlled radiation source. Based on the experimental methodology performed in [35], this research should be performed concurrently with that proposed in Subsections 3 and 4 of this section.
C. CLOSING REMARKS

In conclusion, an initial design for the payload processor of a CubeSat featuring a fault-tolerant architecture that prevents specific SEEs from disrupting its operation was proposed and investigated in this thesis. The design contains an ISA of 24 instructions that implement the most fundamental operations of a MIPS processor. Additionally, a generic UART device was proposed and tested concurrently with the processor to ensure the complete system was capable of receiving and transmitting instructions on a serial connection. Though this processor design requires substantial development before it is ready for a mission, it establishes a foundation on which future CubeSat payloads can be developed. Continued research into payload processor design methods and implementation hardware is critical to ensuring the DOD retains a competitive edge in CubeSat technology.
APPENDIX A. VERILOG CODE AND SCHEMATICS

This appendix provides the complete code and schematic files required to build the payload processor model tested in this thesis. Each section consists of a top level pipeline stage or register displayed in a schematic. The components within the pipeline stage schematics are triplicated and consist of a wrapper schematic or Verilog file. Base level components that implement the actual logic functions are primarily Verilog files with some exceptions. The complete payload processor system with the pipeline UART modules followed by the pipeline layout is presented in Section A. Schematics and Verilog files that comprise each stage and pipeline register beginning with the PC register and working sequentially towards the WB stage are displayed in Sections B – K.
A. PAYLOAD PROCESSOR SYSTEM, PIPELINE, AND UART
B. PC REGISTER

module pc_reg(clk, pc, irqp, npc, firqp);

  parameter n = 32; // Processor width

  input [n-1:0] pc; // Input PC value
  input irqp, clk; // Input interrupt request processing flag bit and clock signal
  output reg [n-1:0] npc; // Output PC value
  output reg firqp; // Output interrupt request processing flag bit

  // Swap outputs on the positive edge of the clock
  always @(posedge clk)
    begin
      npc <= pc;
      firqp <= irqp;
    end

  // Initialize values
  initial
    begin
      firqp <= 0;
      npc <= 32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxx;
    end

endmodule
C. IF STAGE

1. New PC Adder Module

```
npc_addr_adder

npc_addr_in a(31:0) → addr_in(31:0) → addr_out(31:0) → npc_addr_out a(31:0)
                jump_bits(3:0) → jump_bits a(3:0)

npc_addr_adder

npc_addr_in b(31:0) → addr_in(31:0) → addr_out(31:0) → npc_addr_out b(31:0)
                jump_bits(3:0) → jump_bits b(3:0)

npc_addr_adder

npc_addr_in c(31:0) → addr_in(31:0) → addr_out(31:0) → npc_addr_out c(31:0)
                jump_bits(3:0) → jump_bits c(3:0)
```
module npc_addr_adder(addr_in, addr_out, jump_bits);

    input [31:0] addr_in; // Input address
    output reg [31:0] addr_out; // Output address
    output reg [3:0] jump_bits; // Most significant bits as part of the jump address

    always @(addr_in)
        addr_out = addr_in + 4; // Calculates the PC + 4 address

    always @(addr_in)
        jump_bits = addr_in [31:28]; // Forwards the jump address bits

endmodule

2. Instruction Memory

```
inst_mem

addr_a(31:0) --> addr(31:0) --> addr_out(31:0) --> inst_out_a(31:0)

inst_mem

addr_b(31:0) --> addr(31:0) --> addr_out(31:0) --> inst_out_b(31:0)

inst_mem

addr_c(31:0) --> addr(31:0) --> addr_out(31:0) --> inst_out_c(31:0)
```
module inst_mem(addr, inst_out);

    input wire [31:0] addr; // Input address
    output reg [31:0] inst_out; // Output instruction
    reg [7:0] b0; // Byte 0
    reg [7:0] b1; // Byte 1
    reg [7:0] b2; // Byte 2
    reg [7:0] b3; // Byte 3

    reg [7:0] inst_memory_module [0:44999]; // Instruction memory data structure

    // Perform an address to instruction translation each time the input address
    // changes.
    always @(addr)
        begin
            // Each byte is assigned to an offset value based on the start address
            // of the instruction. Each MIPS instruction is 4 bytes. Once translated
            // from a memory address to an instruction, the bytes are concatenated.
            b0 = inst_memory_module [addr];
            b1 = inst_memory_module [addr + 1];
            b2 = inst_memory_module [addr + 2];
            b3 = inst_memory_module [addr + 3];
            inst_out = {b0, b1, b2, b3};
        end

    // An initial file is provided with the instructions already translated into
    // machine language.
    initial
        begin
            $readmemh("payload_processor_inst_memory_init5.txt", inst_memory_module);
            inst_out = 0;
        end

endmodule
module shift_left2_jump(npc, inst_in, inst_out);

    input [25:0] inst_in; // Number of words to jump in instruction memory
    input [3:0] npc; // Four most significant bits from PC
    output reg [31:0] inst_out; // Output instruction address

    initial
        inst_out = 0;

    // At each transition in the instruction address or PC bits, concatenate the
    // PC bits, jump offset, and two zeroes. The appended zeroes ensure the next
    // register address will be the start of a word.
    always @(inst_in or npc)
        inst_out = {npc, inst_in, 2'b00};

endmodule
D. IF/ID REGISTER

```vhdl
module IF_ID_reg_tmtr(clk, instruction_a, pc4_addr_a, jump_addr_a, irqp_a,
        finstruction_a, fp4_addr_a, fjump_addr_a, firqp_a, instruction_b, pc4_addr_b,
        jump_addr_b, irqp_b, finstruction_b, fp4_addr_b, fjump_addr_b, firqp_b,
        instruction_c, pc4_addr_c, jump_addr_c, irqp_c, finstruction_c, fp4_addr_c,
        fjump_addr_c, firqp_c);

    // Input:
    input clk; // Clock signal
    input irqp_a, irqp_b, irqp_c; // Interrupt request processing bits
    input [31:0] instruction_a, instruction_b, instruction_c; // Instructions
    input [31:0] pc4_addr_a, pc4_addr_b, pc4_addr_c; // PC + 4 addresses
    input [31:0] jump_addr_a, jump_addr_b, jump_addr_c; // Jump addresses

    // Wires connecting the registers to voters
    wire irqp_a, irqp_b, irqp_c;
    wire [31:0] finstruction_a, finstruction_b, finstruction_c;
    wire [31:0] npc4_addr_a, npc4_addr_b,npc4_addr_c;
    wire [31:0] njump_addr_a, njump_addr_b, njump_addr_c;

    // Output
    output firqp_a, firqp_b, firqp_c;
    output [31:0] finstruction_a, finstruction_b, finstruction_c;
    output [31:0] fp4_addr_a, fp4_addr_b, fp4_addr_c;
    output [31:0] fjump_addr_a, fjump_addr_b, fjump_addr_c;

    // TRN voters
    trn_voter_32bit_trip VoterO (.a(finstruction_a), .b(finstruction_b), .c(finstruction_c),
                        .data_out_a(finstruction_a), .data_out_b(finstruction_b), .data_out_c(finstruction_c));
    trn_voter_32bit_trip voter1 (.a(npc4_addr_a), .b(npc4_addr_b), .c(npc4_addr_c),
                        .data_out_a(npc4_addr_a), .data_out_b(npc4_addr_b), .data_out_c(npc4_addr_c));
    trn_voter_32bit_trip Voter2 (.a(njump_addr_a), .b(njump_addr_b), .c(njump_addr_c),
                        .data_out_a(njump_addr_a), .data_out_b(njump_addr_b), .data_out_c(njump_addr_c));
    trn_voter_trip Voter3 (.a(irqp_a), .b(irqp_b), .c(irqp_c), .data_out_a(irqp_a),
                        .data_out_b(irqp_b), .data_out_c(irqp_c));

    // Pipeline registers
    IF_ID_reg reg_a (.clk(clk), .instruction(instruction_a), .pc4_addr(pc4_addr_a),
                        .jump_addr(jump_addr_a), .irqp(irqp_a), .finstruction(finstruction_a),
                        .fp4_addr(fp4_addr_a), .fjump_addr(fjump_addr_a), .firqp(firqp_a));
    IF_ID_reg reg_b (.clk(clk), .instruction(instruction_b), .pc4_addr(pc4_addr_b),
                        .jump_addr(jump_addr_b), .irqp(irqp_b), .finstruction(finstruction_b),
                        .fp4_addr(fp4_addr_b), .fjump_addr(fjump_addr_b), .firqp(firqp_b));
    IF_ID_reg reg_c (.clk(clk), .instruction(instruction_c), .pc4_addr(pc4_addr_c),
                        .jump_addr(jump_addr_c), .irqp(irqp_c), .finstruction(finstruction_c),
                        .fp4_addr(fp4_addr_c), .fjump_addr(fjump_addr_c), .firqp(firqp_c));

endmodule
```
E. ID STAGE
module control_mr3 (opcode_a, funct_a, rt_a, regdst_a, alusrc_a, memtoreg_a,
   rewrite_a, memread_a, memwrite_a, beq_a, bne_a, bge_a, bgt_a, blez_a, bzt_a,
   jump_a, aluop_a, opcode_b, funct_b, rt_b, regdst_b, alusrc_b, memtoreg_b,
   rewrite_b, memread_b, memwrite_b, beq_b, bne_b, bge_b, bgt_b, blez_b, bzt_b,
   jump_b, aluop_b, opcode_c, funct_c, rt_c, regdst_c, alusrc_c, memtoreg_c,
   rewrite_c, memread_c, memwrite_c, beq_c, bne_c, bge_c, bgt_c, blez_c, bzt_c,
   jump_c, aluop_c);

input [5:0] opcode_a, opcode_b, opcode_c;
input [5:0] funct_a, funct_b, funct_c;
input [4:0] rt_a, rt_b, rt_c;
wire nregdst_a, nregdst_b, nregdst_c;
wire nalusrc_a, nalusrc_b, nalusrc_c;
wire nnmemtoreg_a, nnmemtoreg_b, nnmemtoreg_c;
wire nnrewrite_a, nnrewrite_b, nnrewrite_c;
wire nnmemread_a, nnmemread_b, nnmemread_c;
wire nnmemwrite_a, nnmemwrite_b, nnmemwrite_c;
wire nbeq_a, nbeq_b, nbeq_c;
wire nbnz_a, nbnz_b, nbnz_c;
wire nbnez_a, nbnez_b, nbnez_c;
wire nbge_a, nbge_b, nbge_c;
wire nbgt_a, nbgt_b, nbgt_c;
wire nblez_a, nblez_b, nblez_c;
wire nbzr_a, nbzr_b, nbzr_c;
wire njump_a, njump_b, njump_c;
wire [3:0] naluop_a, naluop_b, naluop_c;

output regdst_a, regdst_b, regdst_c;
output alusrc_a, alusrc_b, alusrc_c;
output memtoreg_a, memtoreg_b, memtoreg_c;
output rewrite_a, rewrite_b, rewrite_c;
output memread_a, memread_b, memread_c;
output memwrite_a, memwrite_b, memwrite_c;
output beq_a, beq_b, beq_c;
output bne_a, bne_b, bne_c;
output bge_a, bge_b, bge_c;
output bgt_a, bgt_b, bgt_c;
output blez_a, blez_b, blez_c;
output bgez_a, bgez_b, bgez_c;
output bgtz_a, bgtz_b, bgtz_c;
output jump_a, jump_b, jump_c;
output [3:0] aluop_a, aluop_b, aluop_c;

control cnt_a (.opcode(opcode_a), .funct(funct_a), .rt(rt_a), .regdst(regdst_a),
   .alusrc(alusrc_a), .memtoreg(memtoreg_a), .rewrite(rewrite_a), .memread(memread_a),
   .memwrite(memwrite_a), .beq(beq_a), .bne(bne_a), .bge(bge_a), .bgt(bgt_a),
   .blez(blez_a), .bgez(bgez_a), .jump(jump_a), .aluop(aluop_a));

control cnt_b (.opcode(opcode_b), .funct(funct_b), .rt(rt_b), .regdst(regdst_b),
   .alusrc(alusrc_b), .memtoreg(memtoreg_b), .rewrite(rewrite_b), .memread(memread_b),
   .memwrite(memwrite_b), .beq(beq_b), .bne(bne_b), .bge(bge_b), .bgt(bgt_b),
   .blez(blez_b), .bgez(bgez_b), .jump(jump_b), .aluop(aluop_b));

control cnt_c (.opcode(opcode_c), .funct(funct_c), .rt(rt_c), .regdst(regdst_c),
   .alusrc(alusrc_c), .memtoreg(memtoreg_c), .rewrite(rewrite_c), .memread(memread_c),
   .memwrite(memwrite_c), .beq(beq_c), .bne(bne_c), .bge(bge_c), .bgt(bgt_c),
   .blez(blez_c), .bgez(bgez_c), .jump(jump_c), .aluop(aluop_c));

endmodule
1. Control Module

module control(opcode, funct, rt, regist, alusrc, memtoreg, regwrite, memread, memwrite, 
    beq, bne, blz, bge, bgez, bnez, jump, aluop);

    input [5:0] opcode;
    input [5:0] funct;
    input [4:0] rt;

    output reg regist, alusrc, memtoreg, regwrite, memread, memwrite;
    output reg beq, bne, blz, bge, bgez, bnez, jump;
    output reg [3:0] aluop;

    initial
    begin
        regist <= 0;
        alusrc <= 0;
        memtoreg <= 0;
        regwrite <= 0;
        memread <= 0;
        memwrite <= 0;
        beq <= 0;
        bne <= 0;
        blz <= 0;
        bge <= 0;
        bgez <= 0;
        bnez <= 0;
        jump <= 0;
        aluop <= 0;
    end

(continued on next page)
always $ (opcode or funct or rt)
begin
if (opcode == 0)
begin
// ADD, ADDU (Add, Add Unsigned)
// R-Format
if (funct == 32 | funct == 33)
begin
regdst = 1'b1;
alusrc = 1'b0;
memtoreg = 1'b0;
rewrite = 1'b1;
memread = 1'b0;
memwrite = 1'b0;
jump = 1'b0;
beq = 1'b0;
bne = 1'b0;
blz = 1'b0;
bgz = 1'b0;
blez = 1'b0;
bgez = 1'b0;
elucz = 0;
end
// SUB, SUBU (Subtract, Subtract Unsigned)
else if (funct == 34 | funct == 35)
begin
regdst = 1'b1;
alusrc = 1'b0;
memtoreg = 1'b0;
rewrite = 1'b1;
memread = 1'b0;
memwrite = 1'b0;
jump = 1'b0;
beq = 1'b0;
bne = 1'b0;
blz = 1'b0;
bgz = 1'b0;
blez = 1'b0;
bgez = 1'b0;
elucz = 1;
end
// AND (Logical)
else if (funct == 36)
begin
regdst = 1'b1;
alusrc = 1'b0;
memtoreg = 1'b0;
rewrite = 1'b1;
memread = 1'b0;
memwrite = 1'b0;
jump = 1'b0;
beq = 1'b0;
bne = 1'b0;
blz = 1'b0;
bgz = 1'b0;
blez = 1'b0;
bgez = 1'b0;
elucz = 2;
end
(continued on next page)
// OR (Logical)
else if (funct == 37)
begin
  regdst = 1'b1;
  alusrc = 1'b0;
  memtoreq = 1'b0;
  regwrite = 1'b1;
  memread = 1'b0;
  memwrite = 1'b0;
  jump = 1'b0;
  beq = 1'b0;
  bne = 1'b0;
  bhi = 1'b0;
  bgez = 1'b0;
  bgtz = 1'b0;
  aluop = 3;
end

// XOR (Logical)
else if (funct == 38)
begin
  regdst = 1'b1;
  alusrc = 1'b0;
  memtoreq = 1'b0;
  regwrite = 1'b1;
  memread = 1'b0;
  memwrite = 1'b0;
  jump = 1'b0;
  beq = 1'b0;
  bne = 1'b0;
  bhi = 1'b0;
  bgez = 1'b0;
  bgtz = 1'b0;
  aluop = 4;
end

// XOR (Logical)
else if (funct == 39)
begin
  regdst = 1'b1;
  alusrc = 1'b0;
  memtoreq = 1'b0;
  regwrite = 1'b1;
  memread = 1'b0;
  memwrite = 1'b0;
  jump = 1'b0;
  beq = 1'b0;
  bne = 1'b0;
  bhi = 1'b0;
  bgez = 1'b0;
  bgtz = 1'b0;
  aluop = 5;
end

(continued on next page)
// SLT (Set on Less than)
else if (funct == 42)
begin
    regdest = 1'b1;
    alusrc = 1'b0;
    memtoreg = 1'b0;
    regwrite = 1'b1;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    beq = 1'b0;
    bne = 1'b0;
    blz = 1'b0;
    bgez = 1'b0;
    blez = 1'b0;
    bgtz = 1'b0;
    aluop = 10;
end

// SLLV (Shift Left Logical Variable)
else if (funct == 4)
begin
    regdest = 1'b1;
    alusrc = 1'b0;
    memtoreg = 1'b0;
    regwrite = 1'b1;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    beq = 1'b0;
    bne = 1'b0;
    blz = 1'b0;
    bgez = 1'b0;
    blez = 1'b0;
    bgtz = 1'b0;
    aluop = 6;
end

// SLL (Shift Left Logical)
else if (funct == 0)
begin
    regdest = 1'b1;
    alusrc = 1'b0;
    memtoreg = 1'b0;
    regwrite = 1'b1;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    beq = 1'b0;
    bne = 1'b0;
    blz = 1'b0;
    bgez = 1'b0;
    blez = 1'b0;
    bgtz = 1'b0;
    aluop = 7;
end

(continued on next page)
// SRLV (Shift Right Logical Variable)
else if (funct == 6)
begin
    regist = 1'b1;
    alusrc = 1'b0;
    memtoreg = 1'b0;
    regwrite = 1'b1;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    beq = 1'b0;
    bne = 1'b0;
    blt = 1'b0;
    bge = 1'b0;
    bltz = 1'b0;
    bgez = 1'b0;
    aluop = 8;
end

// SRL (Shift Right Logical)
else if (funct == 2)
begin
    regist = 1'b1;
    alusrc = 1'b0;
    memtoreg = 1'b0;
    regwrite = 1'b1;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    beq = 1'b0;
    bne = 1'b0;
    blt = 1'b0;
    bge = 1'b0;
    bltz = 1'b0;
    bgez = 1'b0;
    aluop = 9;
end

// ADDI, ADDIU (Add Immediate, Unsigned)
else if (opcode == 8 | opcode == 9)
begin
    regist = 1'b0;
    alusrc = 1'b1;
    memtoreg = 1'b0;
    regwrite = 1'b1;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    beq = 1'b0;
    bne = 1'b0;
    blt = 1'b0;
    bge = 1'b0;
    bltz = 1'b0;
    bgez = 1'b0;
    aluop = 0;
end

(continued on next page)
// ANDI (Logical And Immediate)
else if (opcode == 12)
    begin
        regdst = 1'b0;
        alusrc = 1'b1;
        memstore = 1'b0;
        regwrite = 1'b1;
        memread = 1'b0;
        memwrite = 1'b0;
        jump = 1'b0;
        beq = 1'b0;
        bne = 1'b0;
        bls = 1'b0;
        bgs = 1'b0;
        blez = 1'b0;
        bgez = 1'b0;
        aluop = 2;
    end

// ORI (Logical Or Immediate)
else if (opcode == 13)
    begin
        regdst = 1'b0;
        alusrc = 1'b1;
        memstore = 1'b0;
        regwrite = 1'b1;
        memread = 1'b0;
        memwrite = 1'b0;
        jump = 1'b0;
        beq = 1'b0;
        bne = 1'b0;
        bls = 1'b0;
        bgs = 1'b0;
        blez = 1'b0;
        bgez = 1'b0;
        aluop = 3;
    end

// SLTI (Set on Less than Immediate)
else if (opcode == 10)
    begin
        regdst = 1'b0;
        alusrc = 1'b1;
        memstore = 1'b0;
        regwrite = 1'b1;
        memread = 1'b0;
        memwrite = 1'b0;
        jump = 1'b0;
        beq = 1'b0;
        bne = 1'b0;
        bnc = 1'b0;
        bhi = 1'b0;
        bgs = 1'b0;
        blez = 1'b0;
        bgez = 1'b0;
        aluop = 10;
    end

(continued on next page)
// SEQ (Branch on Equal)
else if (opcode == 4)
begin
    regdst = 1'b0;
    alusrc = 1'b0;
    memtoreg = 1'b0;
    rewriter = 1'b0;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    bge = 1'b1;
    bne = 1'b0;
    bhi = 1'b0;
    bgez = 1'b0;
    bnez = 1'b0;
    aluop = 0;
end

// BNE (Branch on Not Equal)
else if (opcode == 5)
begin
    regdst = 1'b0;
    alusrc = 1'b0;
    memtoreg = 1'b0;
    rewriter = 1'b0;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    bge = 1'b0;
    bne = 1'b1;
    bhi = 1'b0;
    bgez = 1'b0;
    bnez = 1'b0;
    aluop = 0;
end

// BGEZ (Branch on Greater than or Equal to Zero)
else if (opcode == 1 & rt == 1)
begin
    regdst = 1'b0;
    alusrc = 1'b0;
    memtoreg = 1'b0;
    rewriter = 1'b0;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    bge = 1'b0;
    bne = 1'b0;
    bhi = 1'b0;
    bgez = 1'b0;
    bnez = 1'b0;
    aluop = 0;
end

(continued on next page)
// BLTZ (Branch on Less than or Equal to Zero)
else if (opcode == 6 & rt == 0)
  begin
    regist = 1'b0;
    alusrc = 1'b0;
    memtoref = 1'b0;
    require = 1'b0;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    beq = 1'b0;
    bne = 1'b0;
    blr = 1'b0;
    bsr = 1'b0;
    blez = 1'b0;
    bgtz = 1'b0;
    aluop = 0;
  end

// BTSZ (Branch on Greater than Zero)
else if (opcode == 7 & rt == 0)
  begin
    regist = 1'b0;
    alusrc = 1'b0;
    memtoref = 1'b0;
    require = 1'b0;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    beq = 1'b0;
    bne = 1'b0;
    blr = 1'b0;
    bsr = 1'b0;
    blez = 1'b0;
    bgtz = 1'b0;
    aluop = 0;
  end

// BLTZ (Branch on Less than Zero)
else if (opcode == 1 & rt == 0)
  begin
    regist = 1'b0;
    alusrc = 1'b0;
    memtoref = 1'b0;
    require = 1'b0;
    memread = 1'b0;
    memwrite = 1'b0;
    jump = 1'b0;
    beq = 1'b0;
    bne = 1'b0;
    blr = 1'b0;
    bsr = 1'b0;
    blez = 1'b0;
    bgtz = 1'b0;
    aluop = 0;
  end

(continued on next page)
// J (Jump)
else if (opcode == 2)
begin
  regdst = 1'b0;
  alusrc = 1'b0;
  memstore = 1'b0;
  regwrite = 1'b0;
  memread = 1'b0;
  memwrite = 1'b0;
  jump = 1'b0;
  bne = 1'b0;
  bne = 1'b0;
  bsz = 1'b0;
  bnez = 1'b0;
  bgez = 1'b0;
  aluop = 0;
end

// LW (Load Word)
else if (opcode == 35)
begin
  regdst = 1'b0;
  alusrc = 1'b1;
  memstore = 1'b1;
  regwrite = 1'b1;
  memread = 1'b1;
  memwrite = 1'b0;
  jump = 1'b0;
  bne = 1'b0;
  bne = 1'b0;
  bsz = 1'b0;
  bnez = 1'b0;
  bgez = 1'b0;
  aluop = 0;
end

// SW (Store Word)
else if (opcode == 43)
begin
  regdst = 1'b0;
  alusrc = 1'b1;
  memstore = 1'b0;
  regwrite = 1'b0;
  memread = 1'b0;
  memwrite = 1'b1;
  jump = 1'b0;
  bne = 1'b0;
  bne = 1'b0;
  bsz = 1'b0;
  bnez = 1'b0;
  bgez = 1'b0;
  aluop = 0;
end
endmodule
2. Register File

module reg_addr_decoder_bhv(wr_addr_in, wr_addr_out);
  input [4:0] wr_addr_in; // Input write address
  output reg [51:0] wr_addr_out; // Output write address

  // Initialize values
  initial
    wr_addr_out = 0;

  // For each transition of the input write address, decode the 5-bit write address
  // into 32 individual write enable signals. Each of the 32 bits is routed to
  // one register in the register file as a write enable signal
  always @(wr_addr_in)
    begin
      if(wr_addr_in == 0)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_00000000_00000000;
      else if(wr_addr_in == 1)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_00000010;
      else if(wr_addr_in == 2)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_00000010;
      else if(wr_addr_in == 3)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_00010000;
      else if(wr_addr_in == 4)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_00100000;
      else if(wr_addr_in == 5)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 6)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 7)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 8)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 9)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 10)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 11)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 12)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 13)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 14)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 15)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 16)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 17)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 18)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 19)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 20)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 21)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 22)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
      else if(wr_addr_in == 23)
        wr_addr_out = 32'b00000000_00000000_00000000_00000000_01000000;
    end

(continued on next page)


```verilog
endmodule

module reg_array_bhv (wr_reg, wr_en, wr_data, clk, data_out0, data_out1, data_out2,
                        data_out3, data_out4, data_out5, data_out6, data_out7, data_out8,
                        data_out9, data_out10, data_out11, data_out12, data_out13, data_out14,
                        data_out15, data_out16, data_out17, data_out18, data_out19, data_out20,
                        data_out21, data_out22, data_out23, data_out24, data_out25, data_out26,
                        data_out27, data_out28, data_out29, data_out30, data_out31);

input [31:0] wr_data, wr_reg; // Write data and write enable signals
input clk, wr_en; // wr_en is the forwarded register flag from the control module
// Output wires for 32 registers
output wire [31:0] data_out0, data_out1, data_out2, data_out3, data_out4,
output wire [31:0] data_out5, data_out6, data_out7, data_out8, data_out9,
output wire [31:0] data_out10, data_out11, data_out12, data_out13, data_out14,
output wire [31:0] data_out15, data_out16, data_out17, data_out18, data_out19,
output wire [31:0] data_out20, data_out21, data_out22, data_out23, data_out24,
output wire [31:0] data_out25, data_out26, data_out27, data_out28, data_out29,
output wire [31:0] data_out30, data_out31;

// 32 behavioral registers, each receiving the data, write enable, and clock signals
// in addition to its decoded address bit.
reg_bhv reg0 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[0]), .clk(clk), .Q(data_out0));
reg_bhv reg1 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[1]), .clk(clk), .Q(data_out1));
reg_bhv reg2 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[2]), .clk(clk), .Q(data_out2));
reg_bhv reg3 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[3]), .clk(clk), .Q(data_out3));
reg_bhv reg4 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[4]), .clk(clk), .Q(data_out4));
reg_bhv reg5 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[5]), .clk(clk), .Q(data_out5));
reg_bhv reg6 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[6]), .clk(clk), .Q(data_out6));
reg_bhv reg7 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[7]), .clk(clk), .Q(data_out7));
reg_bhv reg8 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[8]), .clk(clk), .Q(data_out8));
reg_bhv reg9 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[9]), .clk(clk), .Q(data_out9));
reg_bhv reg10 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[10]), .clk(clk), .Q(data_out10));
reg_bhv reg11 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[11]), .clk(clk), .Q(data_out11));
reg_bhv reg12 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[12]), .clk(clk), .Q(data_out12));
reg_bhv reg13 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[13]), .clk(clk), .Q(data_out13));
reg_bhv reg14 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[14]), .clk(clk), .Q(data_out14));
reg_bhv reg15 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[15]), .clk(clk), .Q(data_out15));
reg_bhv reg16 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[16]), .clk(clk), .Q(data_out16));
reg_bhv reg17 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[17]), .clk(clk), .Q(data_out17));
reg_bhv reg18 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[18]), .clk(clk), .Q(data_out18));
reg_bhv reg19 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[19]), .clk(clk), .Q(data_out19));
reg_bhv reg20 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[20]), .clk(clk), .Q(data_out20));
reg_bhv reg21 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[21]), .clk(clk), .Q(data_out21));
reg_bhv reg22 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[22]), .clk(clk), .Q(data_out22));
reg_bhv reg23 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[23]), .clk(clk), .Q(data_out23));
reg_bhv reg24 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[24]), .clk(clk), .Q(data_out24));
reg_bhv reg25 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[25]), .clk(clk), .Q(data_out25));
reg_bhv reg26 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[26]), .clk(clk), .Q(data_out26));
reg_bhv reg27 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[27]), .clk(clk), .Q(data_out27));
reg_bhv reg28 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[28]), .clk(clk), .Q(data_out28));
reg_bhv reg29 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[29]), .clk(clk), .Q(data_out29));
reg_bhv reg30 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[30]), .clk(clk), .Q(data_out30));
reg_bhv reg31 (.D(wr_data), .wr_en(wr_en), .wr_reg(wr_reg[31]), .clk(clk), .Q(data_out31));

endmodule
```
module reg_array_multiplexer(read_addr, read_data, clk, data_in0, data_in1, data_in2, data_in3, data_in4, data_in5, data_in6, data_in7, data_in8, data_in9, data_in10, data_in11, data_in12, data_in13, data_in14, data_in15, data_in16, data_in17, data_in18, data_in19, data_in20, data_in21, data_in22, data_in23, data_in24, data_in25, data_in26, data_in27, data_in28, data_in29, data_in30, data_in31);

input clk;
input [4:0] read_addr; // Register address to read
input [31:0] data_in0, data_in1, data_in2, data_in3, data_in4, data_in5, data_in6, data_in7, data_in8, data_in9, data_in10, data_in11, data_in12, data_in13, data_in14, data_in15, data_in16, data_in17, data_in18, data_in19, data_in20, data_in21, data_in22, data_in23, data_in24, data_in25, data_in26, data_in27, data_in28, data_in29, data_in30, data_in31;
output reg [31:0] read_data;

initial
read_data = 0;

// The read multiplexer receives data from each of the 32 registers in the register
// file. However, only one register's data is sent from the ID stage as output on
// the lower-half of the clock cycle.
always @(negedge clk)
begin
if(read_addr == 0)
    read_data = data_in0;
else if(read_addr == 1)
    read_data = data_in1;
else if(read_addr == 2)
    read_data = data_in2;
else if(read_addr == 3)
    read_data = data_in3;
else if(read_addr == 4)
    read_data = data_in4;
else if(read_addr == 5)
    read_data = data_in5;
else if(read_addr == 6)
    read_data = data_in6;
else if(read_addr == 7)
    read_data = data_in7;
else if(read_addr == 8)
    read_data = data_in8;
else if(read_addr == 9)
    read_data = data_in9;
else if(read_addr == 10)
    read_data = data_in10;
else if(read_addr == 11)
    read_data = data_in11;
else if(read_addr == 12)
    read_data = data_in12;
else if(read_addr == 13)
    read_data = data_in13;
else if(read_addr == 14)
    read_data = data_in14;
else if(read_addr == 15)
    read_data = data_in15;
else if(read_addr == 16)
    read_data = data_in16;
else if(read_addr == 17)
    read_data = data_in17;
else if(read_addr == 18)
    read_data = data_in18;
else if(read_addr == 19)
    read_data = data_in19;
else if(read_addr == 20)
    read_data = data_in20;
else if(read_addr == 21)
    read_data = data_in21;
else if(read_addr == 22)
    read_data = data_in22;
else if(read_addr == 23)
    read_data = data_in23;
(continued on next page)
3. Sign Extension Module

```verilog
module sign_extend_16to32bit(half_word, full_word);
    input [15:0] half_word;
    output reg [31:0] full_word;
    initial
        full_word = 0;
    // 16 zero bits are concatenated with the lower 16 bits of the instruction
    always @(half_word)
        full_word = '{16'b00000000_00000000, half_word};
endmodule
```

F. ID/EX REGISTER

```verilog
module ID_EX_reg_tm2clk, regdst_a, regwrite_a, alusrc_a, memread_a, memwrite_a,
    mementr_a, beq_a, bne_a, blz_a, bgez_a, bltz_a, jumpl_a, icrl_a, aluop_a,
    regl_a, reg2_a, pc4_addr_a, sign_ext_a, jump_addr_a, wb_reg_addr1_a,
    wb_reg_addr2_a, shamt_a, fredat_a, frewrite_a, faluord_a, fmemread_a,
    fmemwrite_a, fmemtoreg_a, fbeq_a, fbeqz_a, fbltz_a, fblez_a, fbltz_a,
    fjump_a, firegp_a, falegp_a, fregl_a, freq2_a, fpc4_addr_a, fsign_ext_a,
    fjump_addr_a, fwb_reg_addr1_a, fwb_reg_addr2_a, fshamt_a,
    memreg_b, beq_b, bne_b, blz_b, bgez_b, bltz_b, jumpl_b, icrl_b, aluop_b,
    regl_b, reg2_b, pc4_addr_b, sign_ext_b, jump_addr_b, wb_reg_addr1_b,
    wb_reg_addr2_b, shamt_b, fredat_b, frewrite_b, faluord_b, fmemread_b,
    fmemwrite_b, fmemtoreg_b, fbeq_b, fbeqz_b, fbltz_b, fblez_b, fbltz_b,
    fjump_b, firegp_b, falegp_b, fregl_b, freq2_b, fpc4_addr_b, fsign_ext_b,
    fjump_addr_b, fwb_reg_addr1_b, fwb_reg_addr2_b, fshamt_b,
    regdst_c, regwrite_c, alusrc_c, memread_c, memwrite_c,
    mementr_c, beq_c, bne_c, blz_c, bgez_c, bltz_c, jumpl_c, icrl_c, aluop_c,
    regl_c, reg2_c, pc4_addr_c, sign_ext_c, jump_addr_c, wb_reg_addr1_c,
    wb_reg_addr2_c, shamt_c, fredat_c, frewrite_c, faluord_c, fmemread_c,
    fmemwrite_c, fmemtoreg_c, fbeq_c, fbeqz_c, fbltz_c, fblez_c, fbltz_c,
    fjump_c, firegp_c, falegp_c, fregl_c, freq2_c, fpc4_addr_c, fsign_ext_c,
    fjump_addr_c, fwb_reg_addr1_c, fwb_reg_addr2_c, fshamt_c);
```

(continued on next page)
// All of the control flags decoded in the ID stage are forwarded to the EX stage
input clk;
input rregst_a, rregst_b, rregst_c;
input regwrite_a, regwrite_b, regwrite_c;
input alusrc_a, alusrc_b, alusrc_c;
input memread_a, memread_b, memread_c;
input memwrite_a, memwrite_b, memwrite_c;
input memtoeq_a, memtoeq_b, memtoeq_c;
input bneq_a, bneq_b, bneq_c;
input bltz_a, bltz_b, bltz_c;
input bgteq_a, bgteq_b, bgteq_c;
input bgez_a, bgez_b, bgez_c;
input jump_a, jump_b, jump_c;
input irqeq_a, irqeq_b, irqeq_c;
input [3:0] aluopc_a, aluopc_b, aluopc_c;
input [4:0] wb_reg addra1_a, wb_reg addra1_b, wb_reg addra1_c;
input [4:0] wb_reg addra2_a, wb_reg addra2_b, wb_reg addra2_c;
input [4:0] shamt_a, shamt_b, shamt_c;
input [31:0] regt_a, regt_b, regt_c;
input [31:0] rs2_a, rs2_b, rs2_c;
input [31:0] jump addr_a, jump addr_b, jump addr_c;
input [31:0] pc4 addr_a, pc4 addr_b, pc4 addr_c;
input [31:0] sign ext_a, sign ext_b, sign ext_c;

// Wires connecting pipeline registers to voter
wire nregst_a, nregst_b, nregst_c;
wire nregwrite_a, nregwrite_b, nregwrite_c;
wire nalusrc_a, nalusrc_b, nalusrc_c;
wire nnmemread_a, nnmemread_b, nnmemread_c;
wire nnmemwrite_a, nnmemwrite_b, nnmemwrite_c;
wire nnmemtoeq_a, nnmemtoeq_b, nnmemtoeq_c;
wire nbneq_a, nbneq_b, nbneq_c;
wire nnlitz_a, nnlitz_b, nnlitz_c;
wire nbltz_a, nbltz_b, nbltz_c;
wire nbgteq_a, nbgteq_b, nbgteq_c;
wire nnjump_a, nnjump_b, nnjump_c;
wire nnirseq_a, nnirseq_b, nnirseq_c;
wire [3:0] naluopc_a, naluopc_b, naluopc_c;
wire [4:0] nbw_reg addra1_a, nbw_reg addra1_b, nbw_reg addra1_c;
wire [4:0] nbw_reg addra2_a, nbw_reg addra2_b, nbw_reg addra2_c;
wire [4:0] nshamt_a, nshamt_b, nshamt_c;
wire [31:0] nregt_a, nregt_b, nregt_c;
wire [31:0] nbneq_a, nbneq_b, nbneq_c;
wire [31:0] njump addr_a, njump_addr b, njump addr_c;
wire [31:0] npc4 addr_a, npc4 addr_b, npc4 addr_c;
wire [31:0] nsign ext_a, nsign ext_b, nsign ext_c;

(continued on next page)
// Output of the voter circuits
output freg2_a, freg2_b, freg2_c;
output fregds1_a, fregds1_b, fregds1_c;
output tmr_voter1_a, tmr_voter1_b, tmr_voter1_c;
output tmr_voter2_a, tmr_voter2_b, tmr_voter2_c;
output tmr_voter3_a, tmr_voter3_b, tmr_voter3_c;
output tmr_voter4_a, tmr_voter4_b, tmr_voter4_c;
output tmr_voter5_a, tmr_voter5_b, tmr_voter5_c;
output tmr_voter6_a, tmr_voter6_b, tmr_voter6_c;
output tmr_voter7_a, tmr_voter7_b, tmr_voter7_c;
output tmr_voter8_a, tmr_voter8_b, tmr_voter8_c;
output tmr_voter9_a, tmr_voter9_b, tmr_voter9_c;
output tmr_voter10_a, tmr_voter10_b, tmr_voter10_c;
output tmr_voter11_a, tmr_voter11_b, tmr_voter11_c;
output tmr_voter12_a, tmr_voter12_b, tmr_voter12_c;
output tmr_voter13_a, tmr_voter13_b, tmr_voter13_c;
output tmr_voter14_a, tmr_voter14_b, tmr_voter14_c;
output tmr_voter15_a, tmr_voter15_b, tmr_voter15_c;
output tmr_voter16_a, tmr_voter16_b, tmr_voter16_c;
output tmr_voter17_a, tmr_voter17_b, tmr_voter17_c;
output tmr_voter18_a, tmr_voter18_b, tmr_voter18_c;
output tmr_voter19_a, tmr_voter19_b, tmr_voter19_c;
output tmr_voter20_a, tmr_voter20_b, tmr_voter20_c;
output tmr_voter21_a, tmr_voter21_b, tmr_voter21_c;
output tmr_voter22_a, tmr_voter22_b, tmr_voter22_c;
output tmr_voter23_a, tmr_voter23_b, tmr_voter23_c;
output tmr_voter24_a, tmr_voter24_b, tmr_voter24_c;
output tmr_voter25_a, tmr_voter25_b, tmr_voter25_c;
output tmr_voter26_a, tmr_voter26_b, tmr_voter26_c;
output tmr_voter27_a, tmr_voter27_b, tmr_voter27_c;
output tmr_voter28_a, tmr_voter28_b, tmr_voter28_c;
output tmr_voter29_a, tmr_voter29_b, tmr_voter29_c;
output tmr_voter30_a, tmr_voter30_b, tmr_voter30_c;
output tmr_voter31_a, tmr_voter31_b, tmr_voter31_c;
output tmr_voter32_a, tmr_voter32_b, tmr_voter32_c;
output tmr_voter33_a, tmr_voter33_b, tmr_voter33_c;
output tmr_voter34_a, tmr_voter34_b, tmr_voter34_c;
output tmr_voter35_a, tmr_voter35_b, tmr_voter35_c;
output tmr_voter36_a, tmr_voter36_b, tmr_voter36_c;
output tmr_voter37_a, tmr_voter37_b, tmr_voter37_c;
output tmr_voter38_a, tmr_voter38_b, tmr_voter38_c;
output tmr_voter39_a, tmr_voter39_b, tmr_voter39_c;
output tmr_voter40_a, tmr_voter40_b, tmr_voter40_c;
output tmr_voter41_a, tmr_voter41_b, tmr_voter41_c;
output tmr_voter42_a, tmr_voter42_b, tmr_voter42_c;

// Voter declaration

// Voter declaration

tmr_voter_trip voter0 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter1 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter2 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter3 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter4 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter5 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter6 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter7 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter8 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter9 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter10 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter11 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter12 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter13 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter14 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter15 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter16 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter17 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter18 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter19 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter20 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );
tmr_voter_trip voter21 ( .a(freg2_a) , .b(freg2_b) , .c(freg2_c) );

// Output of the voter circuits
output reg2_a, reg2_b, reg2_c;
output reg3_a, reg3_b, reg3_c;
output reg4_a, reg4_b, reg4_c;
output tmr_voter1_a, tmr_voter1_b, tmr_voter1_c;
output tmr_voter2_a, tmr_voter2_b, tmr_voter2_c;
output tmr_voter3_a, tmr_voter3_b, tmr_voter3_c;
output tmr_voter4_a, tmr_voter4_b, tmr_voter4_c;
output tmr_voter5_a, tmr_voter5_b, tmr_voter5_c;
output tmr_voter6_a, tmr_voter6_b, tmr_voter6_c;
output tmr_voter7_a, tmr_voter7_b, tmr_voter7_c;
output tmr_voter8_a, tmr_voter8_b, tmr_voter8_c;
output tmr_voter9_a, tmr_voter9_b, tmr_voter9_c;
output tmr_voter10_a, tmr_voter10_b, tmr_voter10_c;
output tmr_voter11_a, tmr_voter11_b, tmr_voter11_c;
output tmr_voter12_a, tmr_voter12_b, tmr_voter12_c;
output tmr_voter13_a, tmr_voter13_b, tmr_voter13_c;
output tmr_voter14_a, tmr_voter14_b, tmr_voter14_c;
output tmr_voter15_a, tmr_voter15_b, tmr_voter15_c;
output tmr_voter16_a, tmr_voter16_b, tmr_voter16_c;
output tmr_voter17_a, tmr_voter17_b, tmr_voter17_c;
output tmr_voter18_a, tmr_voter18_b, tmr_voter18_c;
output tmr_voter19_a, tmr_voter19_b, tmr_voter19_c;
output tmr_voter20_a, tmr_voter20_b, tmr_voter20_c;
// Pipeline register declaration
ID_EX_reg reg_a (.clk(clk), .regst(regdst_a), .regwrite(regwrite_a),
.aluop(aluop_a), .memread(memread_a), .memwrite(memwrite_a), .memtoreg(memtoreg_a),
.pc4_adr(pc4_adr_a), .sign(sign_a), .jump(jump_a), .ireq(ireq_a), .ireg(ireg_a),
.reg2(reg2_a), .reg2ad(reg2ad_a), .reg_adr(reg_adr_a), .fpc4_adr(fpc4_adr_a),
.reg3(reg3_a), .reg3ad(reg3ad_a), .reg_adr2(reg_adr2_a), .shamt(shamt_a),
.regdst(regdst_a), .regwrite(regwrite_a), .aluop(reg_a), .aluop_adr(aluop_adr_a),
.memread(memread_a), .memwrite(memwrite_a), .memtoreg(memtoreg_a),
.signed(sign_a), .jump(jump_a), .ireq(ireq_a), .ireg(ireg_a),
.reg2(reg2_a), .reg2ad(reg2ad_a), .reg_adr(reg_adr_a), .fpc4_adr(fpc4_adr_a),
.reg3(reg3_a), .reg3ad(reg3ad_a), .reg_adr2(reg_adr2_a), .shamt(shamt_a),
.regdst(regdst_a), .regwrite(regwrite_a), .aluop(reg_a), .aluop_adr(aluop_adr_a),
.memread(memread_a), .memwrite(memwrite_a), .memtoreg(memtoreg_a),
.signed(sign_a), .jump(jump_a), .ireq(ireq_a), .ireg(ireg_a),
.reg2(reg2_a), .reg2ad(reg2ad_a), .reg_adr(reg_adr_a), .fpc4_adr(fpc4_adr_a),
.reg3(reg3_a), .reg3ad(reg3ad_a), .reg_adr2(reg_adr2_a), .shamt(shamt_a),
.regdst(regdst_a), .regwrite(regwrite_a), .aluop(reg_a), .aluop_adr(aluop_adr_a),
.memread(memread_a), .memwrite(memwrite_a), .memtoreg(memtoreg_a),
.signed(sign_a), .jump(jump_a), .ireq(ireq_a), .ireg(ireg_a),
.reg2(reg2_a), .reg2ad(reg2ad_a), .reg_adr(reg_adr_a), .fpc4_adr(fpc4_adr_a),
.reg3(reg3_a), .reg3ad(reg3ad_a), .reg_adr2(reg_adr2_a), .shamt(shamt_a),
.endmodule

ID_EX_reg reg_b (.clk(clk), .regst(regdst_b), .regwrite(regwrite_b),
.aluop(aluop_b), .memread(memread_b), .memwrite(memwrite_b), .memtoreg(memtoreg_b),
.pc4_adr(pc4_adr_b), .sign(sign_b), .jump(jump_b), .ireq(ireq_b), .ireg(ireg_b),
.reg2(reg2_b), .reg2ad(reg2ad_b), .reg_adr(reg_adr_b), .fpc4_adr(fpc4_adr_b),
.reg3(reg3_b), .reg3ad(reg3ad_b), .reg_adr2(reg_adr2_b), .shamt(shamt_b),
.regdst(regdst_b), .regwrite(regwrite_b), .aluop(reg_b), .aluop_adr(aluop_adr_b),
.memread(memread_b), .memwrite(memwrite_b), .memtoreg(memtoreg_b),
.signed(sign_b), .jump(jump_b), .ireq(ireq_b), .ireg(ireg_b),
.reg2(reg2_b), .reg2ad(reg2ad_b), .reg_adr(reg_adr_b), .fpc4_adr(fpc4_adr_b),
.reg3(reg3_b), .reg3ad(reg3ad_b), .reg_adr2(reg_adr2_b), .shamt(shamt_b),
.regdst(regdst_b), .regwrite(regwrite_b), .aluop(reg_b), .aluop_adr(aluop_adr_b),
.memread(memread_b), .memwrite(memwrite_b), .memtoreg(memtoreg_b),
.signed(sign_b), .jump(jump_b), .ireq(ireq_b), .ireg(ireg_b),
.reg2(reg2_b), .reg2ad(reg2ad_b), .reg_adr(reg_adr_b), .fpc4_adr(fpc4_adr_b),
.reg3(reg3_b), .reg3ad(reg3ad_b), .reg_adr2(reg_adr2_b), .shamt(shamt_b),
.endmodule

ID_EX_reg reg_c (.clk(clk), .regst(regdst_c), .regwrite(regwrite_c),
.aluop(aluop_c), .memread(memread_c), .memwrite(memwrite_c), .memtoreg(memtoreg_c),
.pc4_adr(pc4_adr_c), .sign(sign_c), .jump(jump_c), .ireq(ireq_c), .ireg(ireg_c),
.reg2(reg2_c), .reg2ad(reg2ad_c), .reg_adr(reg_adr_c), .fpc4_adr(fpc4_adr_c),
.reg3(reg3_c), .reg3ad(reg3ad_c), .reg_adr2(reg_adr2_c), .shamt(shamt_c),
.regdst(regdst_c), .regwrite(regwrite_c), .aluop(reg_c), .aluop_adr(aluop_adr_c),
.memread(memread_c), .memwrite(memwrite_c), .memtoreg(memtoreg_c),
.signed(sign_c), .jump(jump_c), .ireq(ireq_c), .ireg(ireg_c),
.reg2(reg2_c), .reg2ad(reg2ad_c), .reg_adr(reg_adr_c), .fpc4_adr(fpc4_adr_c),
.reg3(reg3_c), .reg3ad(reg3ad_c), .reg_adr2(reg_adr2_c), .shamt(shamt_c),
.regdst(regdst_c), .regwrite(regwrite_c), .aluop(reg_c), .aluop_adr(aluop_adr_c),
.memread(memread_c), .memwrite(memwrite_c), .memtoreg(memtoreg_c),
.signed(sign_c), .jump(jump_c), .ireq(ireq_c), .ireg(ireg_c),
.reg2(reg2_c), .reg2ad(reg2ad_c), .reg_adr(reg_adr_c), .fpc4_adr(fpc4_adr_c),
.reg3(reg3_c), .reg3ad(reg3ad_c), .reg_adr2(reg_adr2_c), .shamt(shamt_c),
.endmodule
G. EX STAGE

[Diagram of EX STAGE]
1. **ALU (ALU\_bhv)**

```verilog
module ALU\_bhv(a, b, shamt, cnt, result, seq, sne, slz, slze, sgz, sgze);

input [31:0] a, b; // ALU operands
input [4:0] shamt; // Number of spaces to shift in SLV and SHR instruction
output [3:0] cnt; // Indicates the type of ALU operation to be performed
output reg seq, sne, slz, slze, sgz, sgze; // Comparison bits

initial begin
    result <= 0;
    seq <= 0;
    sne <= 0;
    slz <= 0;
    slze <= 0;
    sgz <= 0;
    sgze <= 0;
end

    // Set ALU comparison bits seq and sne
    always @ (a or b or cnt)
    begin
        if (a == b)
            begin
                seq = 1'b1;
                sne = 1'b0;
            end
        else
            begin
                seq = 1'b0;
                sne = 1'b1;
            end
    end

    // Set ALU comparison bits sgz, sgze, slz, and slze
    always @ (a or b or cnt)
    begin
        if (a[31] == 0)
            begin
                if (a == 0)
                    begin
                        sgz = 0;
                        sgze = 1;
                        slz = 0;
                        slze = 1;
                    end
                else
                    begin
                        sgz = 1;
                        sgze = 1;
                        slz = 0;
                        slze = 0;
                    end
            end
        else
            begin
                if (a[31] == 1)
                    begin
                        sgz = 0;
                        sgze = 0;
                        slz = 1;
                        slze = 1;
                    end
            end
    end

endmodule
```

(continued on next page)
// ALU Operations
always @(a or b or cnt)
begin
  if (cnt == 0)
    result = a + b; // Addition
  else if (cnt == 1)
    result = a - b; // Subtraction
  else if (cnt == 2)
    result = a & b; // AND
  else if (cnt == 3)
    result = a | b; // OR
  else if (cnt == 4)
    result = a ^ b; // XOR
  else if (cnt == 5)
    result = ~(a | b); // NOR
  else if (cnt == 6)
    result = a << b; // SLL
  else if (cnt == 7)
    result = a << shamt; // SLL
  else if (cnt == 8)
    result = a >> b; // SRL
  else if (cnt == 9)
    result = a >> shamt; // SRL
  else if (cnt == 10) // SLT
    begin
      if (a < b)
        result = 1;
      else
        result = 0;
    end
end
endmodule
2. Address Adder (addr_adder)

```verilog
testbench
module addr_adder(addr_in, increment, addr_out);
    input [31:0] addr_in, increment;
    output reg [31:0] addr_out;

    initial
        addr_out = 0;

    // The offset is added to the PC + 4 address to obtain the potential branch
    // address result. This result will only be forwarded if the criterion for a
    // branch instruction are met in the MUX stage.
    always @(addr_in or increment)
        addr_out = addr_in + increment;

endmodule
```

3. Second ALU Operand Multiplexer (multiplex_2to1_nbit)

```verilog
testbench
module multiplex_2to1_nbit(reg1, reg2, sel, sel_reg);
    input [31:0] reg1, reg2;
    input sel;
    output reg [31:0] sel_reg;

    initial
        sel_reg = 0;

    // Select either the second register file output value or the sign extended
    // immediate value to use as an operand to the ALU.
    always @(reg1 or reg2 or sel)
        begin
            if (sel == 1'b0)
                sel_reg = reg1;
            else
                sel_reg = reg2;
        end

endmodule
```

4. Immediate Offset Shifting Module (shift_left2)

```verilog
testbench
module shift_left2(inst_in, inst_out);
    input [31:0] inst_in;
    output reg [31:0] inst_out;

    initial
        inst_out = 0;

    // Shifts the immediate field of the instruction left two bits to translate it
    // from a word offset to a byte offset.
    always @(inst_in)
        inst_out = inst_in << 2;

endmodule
```
5. Register Address Multiplexer (multiplex_2to1_reg_addr)

module multiplex_2to1_reg_addr(addr1, addr2, s, wb_addr);
input [4:0] addr1;
input [4:0] addr2;
input s;
output reg [4:0] wb_addr;

initial
  wb_addr = 0;
always @ (addr1 or addr2 or s)
  begin
    if (s == '1'b0)
      wb_addr = addr1;
    else
      wb_addr = addr2;
  end
endmodule

H. EX/MEM REGISTER

module EX_MEM_reg_rmr2(memread_a, memwrite_a, memtoreg_a, regwrite_a, beq_a, bne_a, bgez_a, bltz_a, bgtz_a, jumpp_a, jeqz_a, nge_a, slt_a, sltu_a, sgez_a, sgtz_a, jumpt_a, br_addr_a, jump_addr_a, alu_result_a, wr_data_a, wb_reg_addr_a, fmemread_a, fmemwrite_a, fmemtoreg_a, fregwrite_a, fbeq_a, fbr_addr_a, fbltz_a, fbgz_a, fbgez_a, fbltz_a, fbgez_a, fjumpp_a, fjeqz_a, fnte_a, fsle_a, fsleb_a, fsslh_a, fsub_a, fsubu_a, fadd_a, fw_result_a, fwb_reg_addr_a, memread_b, memwrite_b, memtoreg_b, regwrite_b, beq_b, bne_b, bgez_b, bltz_b, bgtz_b, jumpp_b, jeqz_b, nge_b, slt_b, sltu_b, sgez_b, sgtz_b, jumpt_b, br_addr_b, jump_addr_b, alu_result_b, wr_data_b, wb_reg_addr_b, fmemread_b, fmemwrite_b, fmemtoreg_b, fregwrite_b, fbeq_b, fbr_addr_b, fbltz_b, fbgez_b, fbgez_b, fjumpp_b, fjeqz_b, fnte_b, fsle_b, fsleb_b, fsslh_b, fsub_b, fsubu_b, fadd_b, fw_result_b, fwb_reg_addr_b, memread_c, memwrite_c, memtoreg_c, regwrite_c, beq_c, bne_c, bgez_c, bltz_c, bgtz_c, jumpp_c, jeqz_c, nge_c, slt_c, sltu_c, sgez_c, sgtz_c, jumpt_c, br_addr_c, jump_addr_c, alu_result_c, wr_data_c, wb_reg_addr_c, fmemread_c, fmemwrite_c, fmemtoreg_c, fregwrite_c, fbeq_c, fbr_addr_c, fbltz_c, fbgez_c, fbgez_c, fjumpp_c, fjeqz_c, fnte_c, fsle_c, fsleb_c, fsslh_c, fsub_c, fsubu_c, fadd_c, fw_result_c, fwb_reg_addr_c, clk);

(continued on next page)
// Inputs
input clk;
input memread_a, memread_b, memread_c;
input memwrite_a, memwrite_b, memwrite_c;
input memtoreg_a, memtoreg_b, memtoreg_c;
input regwrite_a, regwrite_b, regwrite_c;
input beq_a, beq_b, beq_c;
input bne_a, bne_b, bne_c;
input blzl_a, blzl_b, blzl_c;
input bgez_a, bgez_b, bgez_c;
input bgez_a, bgez_b, bgez_c;
input jump_a, jump_b, jump_c;
input sce_0, sce_1, sce_c;
input aeq_a, aeq_b, aeq_c;
input sltz_a, sltz_b, sltz_c;
input sreg_a, sreg_b, sreg_c;
input slez_a, slez_b, slez_c;
input svez_a, svez_b, svez_c;
input lreg_a, lreg_b, lreg_c;
input [31:0] br_addr_a, br_addr_b, br_addr_c;
input [31:0] jump_addr_a, jump_addr_b, jump_addr_c;
input [31:0] alu_result_a, alu_result_b, alu_result_c;
input [31:0] wr_data_a, wr_data_b, wr_data_c;
input [4:0] wb_reg_addr_a, wb_reg_addr_b, wb_reg_addr_c;

// Wires connecting pipeline registers to voters
wire memread_a, memread_b, memread_c;
wire memwrite_a, memwrite_b, memwrite_c;
wire memtoreg_a, memtoreg_b, memtoreg_c;
wire regwrite_a, regwrite_b, regwrite_c;
wire beq_a, beq_b, beq_c;
wire bne_a, bne_b, bne_c;
wire blzl_a, blzl_b, blzl_c;
wire bgez_a, bgez_b, bgez_c;
wire jump_a, jump_b, jump_c;
wire aeq_a, aeq_b, aeq_c;
wire sltz_a, sltz_b, sltz_c;
wire slez_a, slez_b, slez_c;
wire svez_a, svez_b, svez_c;
wire lreg_a, lreg_b, lreg_c;
wire [31:0] br_addr_a, br_addr_b, br_addr_c;
wire [31:0] jump_addr_a, jump_addr_b, jump_addr_c;
wire [31:0] alu_result_a, alu_result_b, alu_result_c;
wire [31:0] wr_data_a, wr_data_b, wr_data_c;
wire [4:0] wb_reg_addr_a, wb_reg_addr_b, wb_reg_addr_c;

(continued on next page)
// Outputs
output mmread a, mmread b, mmread c;
output mmwrite a, mmwrite b, mmwrite c;
output fmreg a, fmreg b, fmreg c;
output frewrite a, frewrite b, frewrite c;
output fbeq a, fbeq b, fbeq c;
output fina a, fina b, fina c;
output fols a, fols b, fols c;
output fbeq a, fbeq b, fbeq c;
output fjump a, fjump b, fjump c;
output fseq a, fseq b, fseq c;
output ffalse a, ffalse b, ffalse c;
output ftrues a, ftrues b, ftrues c;
output firq a, firq b, firq c;
output [31:0] fbr_addr a, fbr_addr b, fbr_addr c;
output [31:0] fjump_addr a, fjump_addr b, fjump_addr c;
output [31:0] falu_result a, falu_result b, falu_result c;
output [31:0] fwr_data a, fwr_data b, fwr_data c;
output [4:0] fwb_reg_addr a, fwb_reg_addr b, fwb_reg_addr c;

// Voter declaration
voter_trip voter0 (.a(mmread_a), .b(mmread_b), .c(mmread_c), .data_out_a(fmreg_a), .data_out_b(fbeq_b), .data_out_c(fbeq_c));
voter_trip voter1 (.a(mmwrite_a), .b(mmwrite_b), .c(mmwrite_c), .data_out_b(fmmwrite_b), .data_out_c(fmmwrite_c));
voter_trip voter2 (.a(fmreg_a), .b(fmreg_b), .c(fmreg_c), .data_out_c(fmreg_c));
voter_trip voter3 (.a(fmmwrite_a), .b(fmmwrite_b), .c(fmmwrite_c), .data_out_c(fmmwrite_c));
voter_trip voter4 (.a(fnreg_a), .b(fnreg_b), .c(fnreg_c), .data_out_c(fnreg_c));
voter_trip voter5 (.a(fbr_addr_a), .b(fbr_addr_b), .c(fbr_addr_c), .data_out_c(fbr_addr_c));
voter_trip voter6 (.a(fn2a), .b(fn2b), .c(fn2c), .data_out_c(fn2c));
voter_trip voter7 (.a(fjump_addr_a), .b(fjump_addr_b), .c(fjump_addr_c), .data_out_c(fjump_addr_c));
voter_trip voter8 (.a(fbr_addr_a), .b(fbr_addr_b), .c(fbr_addr_c), .data_out_c(fbr_addr_c));
voter_trip voter9 (.a(fnreg_a), .b(fnreg_b), .c(fnreg_c), .data_out_c(fnreg_c));
voter_trip voter10 (.a(mmwrite_a), .b(mmwrite_b), .c(mmwrite_c), .data_out_a(fmmwrite_a), .data_out_b(fmmwrite_b), .data_out_c(fmmwrite_c));
voter_trip voter11 (.a(fnreg_a), .b(fnreg_b), .c(fnreg_c), .data_out_a(fnreg_a), .data_out_b(fnreg_b), .data_out_c(fnreg_c));
voter_trip voter12 (.a(fn2a), .b(fn2b), .c(fn2c), .data_out_a(fn2a), .data_out_b(fn2b), .data_out_c(fn2c));
voter_trip voter13 (.a(fnreg_a), .b(fnreg_b), .c(fnreg_c), .data_out_a(fnreg_a), .data_out_b(fnreg_b), .data_out_c(fnreg_c));
voter_trip voter14 (.a(fn2a), .b(fn2b), .c(fn2c), .data_out_a(fn2a), .data_out_b(fn2b), .data_out_c(fn2c));
voter_trip voter15 (.a(fnreg_a), .b(fnreg_b), .c(fnreg_c), .data_out_a(fnreg_a), .data_out_b(fnreg_b), .data_out_c(fnreg_c));
voter_trip voter16 (.a(fbr_addr_a), .b(fbr_addr_b), .c(fbr_addr_c), .data_out_a(fbr_addr_a), .data_out_b(fbr_addr_b), .data_out_c(fbr_addr_c));
voter_trip voter17 (.a(fjump_addr_a), .b(fjump_addr_b), .c(fjump_addr_c), .data_out_a(fjump_addr_a), .data_out_b(fjump_addr_b), .data_out_c(fjump_addr_c));
voter_trip voter18 (.a(fnreg_a), .b(fnreg_b), .c(fnreg_c), .data_out_a(fnreg_a), .data_out_b(fnreg_b), .data_out_c(fnreg_c));
voter_trip voter19 (.a(fbr_addr_a), .b(fbr_addr_b), .c(fbr_addr_c), .data_out_a(fbr_addr_a), .data_out_b(fbr_addr_b), .data_out_c(fbr_addr_c));
voter_trip voter20 (.a(fjump_addr_a), .b(fjump_addr_b), .c(fjump_addr_c), .data_out_a(fjump_addr_a), .data_out_b(fjump_addr_b), .data_out_c(fjump_addr_c));
voter_trip voter21 (.a(fnreg_a), .b(fnreg_b), .c(fnreg_c), .data_out_a(fnreg_a), .data_out_b(fnreg_b), .data_out_c(fnreg_c));

(continued on next page)
// Pipeline register declaration
EX_MEM_reg reg_a (.clk(clk), .memread(memread_a), .memwrite(memwrite_a),
.memreg(memoreg_a), .regwrite(regwrite_a), .beq(beq_a), .bnez(one_a),
.blz(blz_a), .bgtz(bgtz_a), .blez(blez_a), .bnez(bnez_a), .sext(sext_a),
.slt(slt_a), .sltz(sltz_a), .sgt(sgtz_a), .sgtz(sgtz_a), .ireg(ireg_a),
.jump(jump_a), .b.add(br_addr_a), .jump.addr(jump_addr_a),
.alu_result(alu_result_a), .wr_data(wr_data_a), .wb_reg.addr(wb_reg_addr_a),
.memreg(memread_a), .memwrite(memwrite_a), .memreg(memoreg_a),
.regwrite(regwrite_a), .fbeq(fbeq_a), .fjumpl(fjumpl_a),
.blz(fblz_a), .fblez(fblez_a), .fsgt(fsgtz_a), .fsgtz(fsgtz_a), .fsext(fsext_a),
.fireq(fireq_a), .fjump(fjump_a), .fjumpl(fjumpl_a), .faddr(faddr_a),
.alu_result(falu_result_a), .fwr_data(fwr_data_a), .fw_reg.addr(fwb_reg_addr_a));

EX_MEM_reg reg_b (.clk(clk), .memread(memread_b), .memwrite(memwrite_b),
.memreg(memoreg_b), .regwrite(regwrite_b), .beq(beq_b), .bnez(one_b),
.blz(blz_b), .bgtz(bgtz_b), .blez(blez_b), .bnez(bnez_b), .ireg(ireg_b),
.jump(jump_b), .b.add(br_addr_b), .jump.addr(jump_addr_b),
.alu_result(alu_result_b), .wr_data(wr_data_b), .wb_reg.addr(wb_reg_addr_b),
.memreg(memread_b), .memwrite(memwrite_b), .memreg(memoreg_b),
.regwrite(regwrite_b), .fbeq(fbeq_b), .fjumpl(fjumpl_b),
.blz(fblz_b), .fblez(fblez_b), .fsgt(fsgtz_b), .fsgtz(fsgtz_b), .fsext(fsext_b),
.fireq(fireq_b), .fjump(fjump_b), .fjumpl(fjumpl_b), .faddr(faddr_b),
.alu_result(falu_result_b), .fwr_data(fwr_data_b), .fw_reg.addr(fwb_reg_addr_b));

EX_MEM_reg reg_c (.clk(clk), .memread(memread_c), .memwrite(memwrite_c),
.memreg(memoreg_c), .regwrite(regwrite_c), .beq(beq_c), .bnez(one_c),
.blz(blz_c), .bgtz(bgtz_c), .blez(blez_c), .bnez(bnez_c), .ireg(ireg_c),
.jump(jump_c), .b.add(br_addr_c), .jump.addr(jump_addr_c),
.alu_result(alu_result_c), .wr_data(wr_data_c), .wb_reg.addr(wb_reg_addr_c),
.memreg(memread_c), .memwrite(memwrite_c), .memreg(memoreg_c),
.regwrite(regwrite_c), .fbeq(fbeq_c), .fjumpl(fjumpl_c),
.blz(fblz_c), .fblez(fblez_c), .fsgt(fsgtz_c), .fsgtz(fsgtz_c), .fsext(fsext_c),
.fireq(fireq_c), .fjump(fjump_c), .fjumpl(fjumpl_c), .faddr(faddr_c),
.alu_result(falu_result_c), .fwr_data(fwr_data_c), .fw_reg.addr(fwb_reg_addr_c));

//endmodule

endmodule
I. MEM STAGE
1. **Branch Multiplexer (branch_multiplexer)**

```verilog
module branch_multiplexer(beq, bne, blz, bgs, blez, bgez, seq, snc, slz, sgs, slez, sgez, branch);

      input beq, bne, blz, bgs, blez, bgez, seq, snc, slz, sgs, slez, sgez;
      output reg branch;

      initial
      
branch = 0;

      // Branch conditions are met when a pair of branch flags and comparison flags
      // are both high. This only triggers the branch/jump selector hit for the
      // address selector module, which will forward the result of this module and the
      // branch address calculated in the EX stage.
      always @(eq or bne or blz or bgs or blez or bgez or seq or snc or slz or sgs or
      slez or sgez)
        branch = (beq & eq) | (bne & bne) | (blz & blz) | (bgs & bgs) | (blez & blez) | (bgez & bgez);

endmodule
```
2. Address Selector Module (addr_sel_module)

```verilog
class addr_sel_module (br_addr, jump_addr, br_sel, jump_sel, addr_out, br_jump_sel_out)

module addr_sel_module (br_addr, jump_addr, br_sel, jump_sel, addr_out, br_jump_sel_out);

input br_sel, jump_sel; // Branch and jump selector bits set during the ID stage.
input [31:0] br_addr, jump_addr; // Branch and jump addresses set during the IF
// and EX stages, respectively.
output reg [31:0] addr_out; // Output address
output reg br_jump_sel_out; // Output flag

initial
begin
    addr_out = 0;
    br_jump_sel_out = 0;
end

always @(br_addr or jump_addr or br_sel or jump_sel)
begin
    if (br_sel & ~jump_sel) // Branch criterion met
    begin
        addr_out = br_addr; // Forward branch address to the PC selector module
        br_jump_sel_out = 1; // Raise the branch/jump selector flag
    end
    else if (~br_sel & jump_sel) // Jump criterion met
    begin
        addr_out = jump_addr; // Forward jump address to the PC selector module
        br_jump_sel_out = 1; // Raise the branch/jump selector flag
    end
    else
    begin
        br_jump_sel_out = 0; // No branch or jump criterion have been met, the next
        // instruction address should be the normal PC + 1.
    end
endmodule
```

3. Data Memory (data_mem)

```verilog
module data_mem(addr, data_in, uart_data_in, wr_en, re_en, irq, tx_enable, tx_empty, ld_tx_data_flag, data_out, uart_data_out);

input [31:0] addr, data_in; // Input data and address
input [7:0] uart_data_in; // UART write address
input wr_en, re_en, irq, tx_enable, tx_empty; // Flags and control bits
output reg [31:0] data_out; // Data read from memory
output reg [7:0] uart_data_out; // Data read from special UART address
output reg ld_tx_data_flag; // Flag to initiate UART transmit

reg [7:0] data_byte0, data_byte1, data_byte2, data_byte3; // Byte components
reg [7:0] data_memory_module [0:4999]; // Memory data structure
reg [31:0] uart_wr_addr; // Current UART write address
reg [31:0] data_assemble; // Assembly or byte-wide components for load word instructions

always @(addr or data_in or uart_data_in or wr_en or re_en or irq or tx_enable or
tx_empty)
begin
    ld_tx_data_flag = 0; // Ensure the UART transmit flag is low if there is no irq
    if (~irq)
    begin
        if (wr_en & ~re_en) // Memory write operation
        begin
            // If the write address is in the UART transmit block, ensure the entire
            // word is written into the last four addresses.
```

(continued on next page)
if (addr == 44996 | addr == 44997 | addr == 44998 | addr == 44999)
begin
    data_memory_module [44996] = data_in [31:24];
data_memory_module [44997] = data_in [23:16];
data_memory_module [44998] = data_in [15:8];
data_memory_module [44999] = data_in [7:0];
end
// The remaining if/else statements prevent data from being written
// into the block of address (22496 - 22499) where data memory stores
// the current address for UART writes. If an instruction is
// attempting to write outside the bounds of its address space, the
// most significant bits that fit in the allocated space will be written.
else if (addr == 22495)
data_memory_module [addr] = data_in [31:24];
else if (addr == 22494)
begin
data_memory_module [addr] = data_in [31:24];
data_memory_module [addr + 1] = data_in [23:16];
end
else if (addr == 22493)
begin
data_memory_module [addr] = data_in [31:24];
data_memory_module [addr + 1] = data_in [23:16];
data_memory_module [addr + 2] = data_in [15:8];
end
else if (addr < 22493)
begin
data_memory_module [addr] = data_in [31:24];
data_memory_module [addr + 1] = data_in [23:16];
data_memory_module [addr + 2] = data_in [15:8];
data_memory_module [addr + 3] = data_in [7:0];
end
if (w_en & re_en) // Memory read operation
begin
    // If a read operation occurs where the address specified is in the
    // UART transmit block, only a single byte of data from the very last
    // address in memory will be sent to the UART for transmission. The
    // memory addresses will then be sent to the register specified in
    // the load instruction, with the least significant byte being the
    // one transmitted by the UART.
    if (addr == 44996 | addr == 44997 | addr == 44998 | addr == 44999)
begin
        uart_data_out = data_memory_module [44998]; // Send the least
        // significant byte to
        // the UART.
        ld_tx_data_flag = 1; // Flag to trigger UART transmission
        // Standard operation to write an entire word to a register as
        // part of the load word instruction.
data_byte0 = data_memory_module [44994];
data_byte1 = data_memory_module [44997];
data_byte2 = data_memory_module [44999];
data_byte3 = data_memory_module [44999];
data_out = {data_byte0, data_byte1, data_byte2, data_byte3};
end
    // Memory in the address space, 22499 < addr < 44996, is reserved for the
    // UART and can only be written in single bytes. Any load word instruction
    // to this memory space will return data as the least significant byte of
    // the register.
else if (addr > 22499 & addr < 44996)
begin
    data_byte0 = $`h00000000;
data_byte1 = $`h00000000;
data_byte2 = $`h00000000;
data_byte3 = data_memory_module [addr];
data_out = {data_byte0, data_byte1, data_byte2, data_byte3};
end
(continued on next page)
end

else if (addr >= 0 & addr < 22497)
begin
    data_byte0 = data.memory_module[addr];
    data_byte1 = data.memory_module[addr + 1];
    data_byte2 = data.memory_module[addr + 2];
    data_byte3 = data.memory_module[addr + 3];
    data_out = (data_byte0, data_byte1, data_byte2, data_byte3);
end
else if (addr == 22497)
begin
    data_byte0 = $b00000000;
    data_byte1 = data.memory_module[addr];
    data_byte2 = data.memory_module[addr + 1];
    data_byte3 = data.memory_module[addr + 2];
    data_out = (data_byte0, data_byte1, data_byte2, data_byte3);
end
else if (addr == 22498)
begin
    data_byte0 = $b00000000;
    data_byte1 = $b00000000;
    data_byte2 = data.memory_module[addr];
    data_byte3 = data.memory_module[addr + 1];
    data_out = (data_byte0, data_byte1, data_byte2, data_byte3);
end
else if (addr == 22499)
begin
    data_byte0 = $b00000000;
    data_byte1 = data.memory_module[addr];
    data_byte2 = data.memory_module[addr + 1];
    data_byte3 = data.memory_module[addr + 2];
    data_out = (data_byte0, data_byte1, data_byte2, data_byte3);
end
end

else if (irq & tx_enable & tx_empty) // Interrupt request present
begin
    // Temporary bytes are assigned to the current write address in UART memory.
    // The temporary bytes are then concatenated to obtain a complete memory
    // address that is then used as a reference in the memory assignment.
    data_byte0 = data.memory_module[22496];
    data_byte1 = data.memory_module[22497];
    data_byte2 = data.memory_module[22498];
    data_byte3 = data.memory_module[22499];
    uart_wr_addr = (data_byte0, data_byte1, data_byte2, data_byte3);
    data.memory_module[uart_wr_addr] = uart_data_in;
    // In the event the UART reaches its last write address, the address counter
    // wraps around to the first address and updates the placeholder in memory.
    if (uart_wr_addr == 44995)
    begin
        uart_wr_addr = 22500;
        data_byte0 = uart_wr_addr[31:24];
        data_byte1 = uart_wr_addr[23:16];
        data_byte2 = uart_wr_addr[15:8];
        data_byte3 = uart_wr_addr[7:0];
        data.memory_module[22496] = data_byte0;
        data.memory_module[22497] = data_byte1;
        data.memory_module[22498] = data_byte2;
        data.memory_module[22499] = data_byte3;
    end
    // If the UART has not yet reached the end of data memory, the address is
    // incremented by one, and the placeholder address is updated.
    else
    begin
        uart_wr_addr = uart_wr_addr + 1;
        data_byte0 = uart_wr_addr[31:24];
        data_byte1 = uart_wr_addr[23:16];
        data_byte2 = uart_wr_addr[15:8];
        data_byte3 = uart_wr_addr[7:0];
    end
end
}
data_memory_module[2496] = data_byte0;
data_memory_module[2497] = data_byte1;
data_memory_module[2498] = data_byte2;
data_memory_module[2499] = data_byte3;
end

initial // Data memory is initialized from a file.
begin
  $readmemh("payload_processor_data_memory_init.txt", data_memory_module);
end
endmodule
4. PC Selector Module

module pc_selector_module(br_jump_sel, irq, ack, br_jump_addr, po4_addr, irqp, npo);

parameter isr_start_addr = 44980;
parameter isr_end_addr = 44996;

input wire br_jump_sel, irq;
input wire [31:0] br_jump_addr, po4_addr;
output reg [31:0] npo;
output reg ack, irqp;
reg irq_processing;
reg [31:0] irq_return_addr;
reg [31:0] curr_pc;

initial
begin
  npo = 4;
  ack = 0;
  irq = 0;
  irq_processing = 0;
  irq_return_addr = 44999;
  curr_pc = 0;
end

always @(posedge br_jump_sel)
begin
  if (irq_processing)
    irq_return_addr = br_jump_addr;
end

always @(br_jump_sel or irq or br_jump_addr or po4_addr)
begin
  // An interrupt request has occurred, and the system is not currently
  // servicing any interrupts.
  if (irq)
    begin
      ack = 1; // Acknowledges IRQ of UART and flags the UART to place
                     // data into the appropriate MEM stage inputs.
      irqp = 0;
      irq_processing = 1; // Places processor in an interrupt state
                     // If the branch/jump address selector is low, there is no branch
                     // or jump instruction about to be processed. The return address
                     // should be PC + 4.
      if (~br_jump_sel)
        begin
          irq_return_addr = po4_addr;
          curr_pc = 1sr_start_addr;
          npo = curr_pc; // Set the new PC for the ISR
          curr_pc = curr_pc + 4; // Set the ISR PC to the
                               // first ISR address.
        end
      else
        begin
          irq_return_addr = br_jump_addr;
          curr_pc = 1sr_start_addr;
          npo = curr_pc; // Set the new PC for the ISR
          curr_pc = curr_pc + 4; // Set the ISR PC to the
                               // first ISR address.
        end
    end

end

(continued on next page)
else begin
    if (irq_processing)
      begin
        ack = 0;
        // If the next PC value is still less than the end address of the
        // ISR, the system is still busy performing the ISR on its
        // current UART data. The next PC address should be the
        // instruction immediately following the current one.
        if (pc4_addr <= isr_end_addr)
          begin
            curr_pc = pc4_addr;
            npc = curr_pc; // Update to the next PC address
            // Trigger the IRQP flag bit during the final ISR instruction
            if (pc4_addr == isr_end_addr)
              irqp = 1;
          end
        // If the next PC value is greater than the end address of the ISR,
        // the system has completed its ISR and the next PC address should
        // be the previously saved return address.
        else begin
          npc = irq_return_addr;
          curr_pc = irq_return_addr;
          irq_processing = 0;
          irqp = 0;
        end
      end // If no interrupt requests are present, and the system is not servicing
      // an interrupt, the next PC value is selected based on the state of the
      // br_jump_sel signal.
    else begin
      ack = 0;
      // The br_jump_sel flag is set high, meaning there is a branch or
      // jump address that must serve as the next PC address.
      if (br_jump_sel)
        begin
          curr_pc = br_jump_addr;
          npc = curr_pc;
          curr_pc = curr_pc + 4;
        end
      // The br_jump_sel is set low, meaning there is no branch or jump
      // address, and the next PC value should be FC + 4.
    else begin
      // Perform special processing upon return from the
      if (curr_pc == irq_return_addr)
        begin
          curr_pc = curr_pc + 4;
          npc = curr_pc;
          irq_return_addr = 44999; // Ensures the if statement is
          // not re-entered during the
          // next clock cycle
          irqp = 0;
        end
      // Standard assignment of the next PC address
    else begin
      curr_pc = pc4_addr;
      npc = curr_pc;
      irqp = 0;
    end
    end // irq_processing = 0;
end
module multiplex_2to1_32bit_tripv2(input_a, input_b, input_c, input2_a, input2_b, input2_c, sel_a, sel_b, sel_c, data_out_a, data_out_b, data_out_c);
    input [31:0] input_a, input_b, input_c, input2_a, input2_b, input2_c;
    input sel_a, sel_b, sel_c;
    output [31:0] data_out_a, data_out_b, data_out_c;

    multiplex_2to1_32bitv2 mux1 (.a(input_a), .b(input2_a), .sel(sel_a), .data_out(data_out_a));
    multiplex_2to1_32bitv2 mux2 (.a(input_b), .b(input2_b), .sel(sel_b), .data_out(data_out_b));
    multiplex_2to1_32bitv2 mux3 (.a(input_c), .b(input2_c), .sel(sel_c), .data_out(data_out_c));
endmodule
1. Write–Back Multiplexer

module multiplex_2to1_32bitv2(a, b, sel, data_out);

    input [31:0] a, b;
    input sel;

    output reg [31:0] data_out;

    always @(a or b or sel)
        begin
            if (~sel)
                data_out = a;
            else
                data_out = b;
        end
endmodule
APPENDIX B. TEST BENCHES AND WAVEFORMS

This appendix contains the simulation results for the test program with UART integration. In Section A, all of the pipeline registers and voters are tested to ensure they are capable of correcting SBUs. Similar to the methods proposed in Chapter V, operational verification of the pipeline registers and voters allows the test program to be viewed through the lens of a single member of the triplicated set of processors (processor A).

In Sections B – M of this appendix, the test program is verified using the inputs and outputs of each pipeline register and stage. Each section displays a single pipeline stage, pipeline register, or UART input and output signals. Within these sections, the inputs and outputs for that module are displayed consecutively from 0 – 336 ns in increments of 16 ns. These simulation results include the receipt of a byte of data at the UART receive interface that transitions to the ISR and the transmission of a data byte from memory through the UART transmit interface. The test bench used to initiate the simulation is included in Section N.
A. PIPELINE REGISTERS AND VOTING LOGIC

1. PC Register

2. IF/ID Register

3. ID/EX Register
4. EX/MEM Register
5. MEM/WB Register

B. PC SELECTOR MODULE
C. PC REGISTER
D. **IF STAGE**

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
<th>0ns</th>
<th>1ns</th>
<th>2ns</th>
<th>3ns</th>
<th>4ns</th>
<th>5ns</th>
<th>6ns</th>
<th>7ns</th>
<th>8ns</th>
<th>9ns</th>
<th>10ns</th>
<th>11ns</th>
<th>12ns</th>
<th>13ns</th>
<th>14ns</th>
<th>15ns</th>
<th>16ns</th>
<th>17ns</th>
</tr>
</thead>
<tbody>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>IF Stage</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input A</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Input B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

162
<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P Stage</td>
<td></td>
</tr>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Output</td>
<td></td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>pc addr</td>
<td>0</td>
</tr>
<tr>
<td>oprd addr</td>
<td>0</td>
</tr>
<tr>
<td>Name</td>
<td>Value</td>
</tr>
<tr>
<td>---------------</td>
<td>----------------</td>
</tr>
<tr>
<td></td>
<td>128 ns</td>
</tr>
<tr>
<td></td>
<td>304</td>
</tr>
<tr>
<td></td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
<tr>
<td></td>
<td>0.0125P0</td>
</tr>
</tbody>
</table>
E. IF/ID REGISTER
<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>X-AX register</td>
<td>0</td>
</tr>
<tr>
<td>inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_a</td>
<td>0</td>
</tr>
<tr>
<td>freg_b</td>
<td>0</td>
</tr>
<tr>
<td>freg_c</td>
<td>0</td>
</tr>
<tr>
<td>instruction_a</td>
<td>0</td>
</tr>
<tr>
<td>instruction_b</td>
<td>0</td>
</tr>
<tr>
<td>instruction_c</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_c</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_b</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Y-AX register</td>
<td>0</td>
</tr>
<tr>
<td>inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_a</td>
<td>0</td>
</tr>
<tr>
<td>freg_b</td>
<td>0</td>
</tr>
<tr>
<td>freg_c</td>
<td>0</td>
</tr>
<tr>
<td>instruction_a</td>
<td>0</td>
</tr>
<tr>
<td>instruction_b</td>
<td>0</td>
</tr>
<tr>
<td>instruction_c</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_c</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_b</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Z-AX register</td>
<td>0</td>
</tr>
<tr>
<td>inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_a</td>
<td>0</td>
</tr>
<tr>
<td>freg_b</td>
<td>0</td>
</tr>
<tr>
<td>freg_c</td>
<td>0</td>
</tr>
<tr>
<td>instruction_a</td>
<td>0</td>
</tr>
<tr>
<td>instruction_b</td>
<td>0</td>
</tr>
<tr>
<td>instruction_c</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_c</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_b</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>X-AX register</td>
<td>0</td>
</tr>
<tr>
<td>inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_a</td>
<td>0</td>
</tr>
<tr>
<td>freg_b</td>
<td>0</td>
</tr>
<tr>
<td>freg_c</td>
<td>0</td>
</tr>
<tr>
<td>instruction_a</td>
<td>0</td>
</tr>
<tr>
<td>instruction_b</td>
<td>0</td>
</tr>
<tr>
<td>instruction_c</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_c</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_b</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Y-AX register</td>
<td>0</td>
</tr>
<tr>
<td>inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_a</td>
<td>0</td>
</tr>
<tr>
<td>freg_b</td>
<td>0</td>
</tr>
<tr>
<td>freg_c</td>
<td>0</td>
</tr>
<tr>
<td>instruction_a</td>
<td>0</td>
</tr>
<tr>
<td>instruction_b</td>
<td>0</td>
</tr>
<tr>
<td>instruction_c</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_c</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_b</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Z-AX register</td>
<td>0</td>
</tr>
<tr>
<td>inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_a</td>
<td>0</td>
</tr>
<tr>
<td>freg_b</td>
<td>0</td>
</tr>
<tr>
<td>freg_c</td>
<td>0</td>
</tr>
<tr>
<td>instruction_a</td>
<td>0</td>
</tr>
<tr>
<td>instruction_b</td>
<td>0</td>
</tr>
<tr>
<td>instruction_c</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_c</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_b</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>X-AX register</td>
<td>0</td>
</tr>
<tr>
<td>inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_a</td>
<td>0</td>
</tr>
<tr>
<td>freg_b</td>
<td>0</td>
</tr>
<tr>
<td>freg_c</td>
<td>0</td>
</tr>
<tr>
<td>instruction_a</td>
<td>0</td>
</tr>
<tr>
<td>instruction_b</td>
<td>0</td>
</tr>
<tr>
<td>instruction_c</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_c</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_b</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Y-AX register</td>
<td>0</td>
</tr>
<tr>
<td>inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_a</td>
<td>0</td>
</tr>
<tr>
<td>freg_b</td>
<td>0</td>
</tr>
<tr>
<td>freg_c</td>
<td>0</td>
</tr>
<tr>
<td>instruction_a</td>
<td>0</td>
</tr>
<tr>
<td>instruction_b</td>
<td>0</td>
</tr>
<tr>
<td>instruction_c</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_c</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_b</td>
<td>0</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Z-AX register</td>
<td>0</td>
</tr>
<tr>
<td>inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_a</td>
<td>0</td>
</tr>
<tr>
<td>freg_b</td>
<td>0</td>
</tr>
<tr>
<td>freg_c</td>
<td>0</td>
</tr>
<tr>
<td>instruction_a</td>
<td>0</td>
</tr>
<tr>
<td>instruction_b</td>
<td>0</td>
</tr>
<tr>
<td>instruction_c</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>pick_addr_c</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_a</td>
<td>0</td>
</tr>
<tr>
<td>jump_addr_b</td>
<td>0</td>
</tr>
<tr>
<td>Name</td>
<td>Value</td>
</tr>
<tr>
<td>--------</td>
<td>-------</td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td>Name</td>
<td>Value</td>
</tr>
<tr>
<td>------</td>
<td>-------</td>
</tr>
<tr>
<td>Input</td>
<td></td>
</tr>
<tr>
<td>Name</td>
<td>Value</td>
</tr>
<tr>
<td>---------------</td>
<td>-------</td>
</tr>
<tr>
<td>3#D/1D register</td>
<td>0</td>
</tr>
<tr>
<td>3#D/1D register</td>
<td>0</td>
</tr>
<tr>
<td>3#D/1D register</td>
<td>0</td>
</tr>
<tr>
<td>3#D/1D register</td>
<td>0</td>
</tr>
<tr>
<td>3#D/1D register</td>
<td>0</td>
</tr>
<tr>
<td>3#D/1D register</td>
<td>0</td>
</tr>
<tr>
<td>3#D/1D register</td>
<td>0</td>
</tr>
<tr>
<td>3#D/1D register</td>
<td>0</td>
</tr>
</tbody>
</table>
F. ID STAGE
### Table

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
</tr>
</tbody>
</table>

### Diagram

The diagram shows a series of stages with inputs and outputs. The specific values for inputs and outputs are not clearly visible due to the resolution and quality of the image.
<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Stage</td>
<td>2</td>
</tr>
<tr>
<td>Inputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_x</td>
<td>0</td>
</tr>
<tr>
<td>freg_y</td>
<td>0</td>
</tr>
<tr>
<td>freg_z</td>
<td>0</td>
</tr>
<tr>
<td>freg_t</td>
<td>0</td>
</tr>
<tr>
<td>freg_s</td>
<td>0</td>
</tr>
<tr>
<td>freg_d</td>
<td>0</td>
</tr>
<tr>
<td>time_data_x</td>
<td>0</td>
</tr>
<tr>
<td>time_data_y</td>
<td>0</td>
</tr>
<tr>
<td>time_data_z</td>
<td>0</td>
</tr>
<tr>
<td>Outputs</td>
<td>0</td>
</tr>
<tr>
<td>freg_x</td>
<td>0</td>
</tr>
<tr>
<td>freg_y</td>
<td>0</td>
</tr>
<tr>
<td>freg_z</td>
<td>0</td>
</tr>
<tr>
<td>freg_t</td>
<td>0</td>
</tr>
<tr>
<td>freg_s</td>
<td>0</td>
</tr>
<tr>
<td>freg_d</td>
<td>0</td>
</tr>
<tr>
<td>time_data_x</td>
<td>0</td>
</tr>
<tr>
<td>time_data_y</td>
<td>0</td>
</tr>
<tr>
<td>time_data_z</td>
<td>0</td>
</tr>
</tbody>
</table>

```
<table>
<thead>
<tr>
<th>Address</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x00000000</td>
<td>00000000</td>
</tr>
<tr>
<td>0x00000001</td>
<td>00000000</td>
</tr>
<tr>
<td>0x00000002</td>
<td>00000000</td>
</tr>
<tr>
<td>0x00000003</td>
<td>00000000</td>
</tr>
<tr>
<td>0x00000004</td>
<td>00000000</td>
</tr>
<tr>
<td>0x00000005</td>
<td>00000000</td>
</tr>
<tr>
<td>0x00000006</td>
<td>00000000</td>
</tr>
<tr>
<td>0x00000007</td>
<td>00000000</td>
</tr>
<tr>
<td>0x00000008</td>
<td>00000000</td>
</tr>
<tr>
<td>0x00000009</td>
<td>00000000</td>
</tr>
<tr>
<td>0x0000000A</td>
<td>00000000</td>
</tr>
<tr>
<td>0x0000000B</td>
<td>00000000</td>
</tr>
<tr>
<td>0x0000000C</td>
<td>00000000</td>
</tr>
<tr>
<td>0x0000000D</td>
<td>00000000</td>
</tr>
<tr>
<td>0x0000000E</td>
<td>00000000</td>
</tr>
<tr>
<td>0x0000000F</td>
<td>00000000</td>
</tr>
</tbody>
</table>
```
## G. ID/EX REGISTER

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>CL</td>
<td>0</td>
</tr>
<tr>
<td>Inputs</td>
<td>0</td>
</tr>
<tr>
<td>Outputs</td>
<td>0</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>

### Diagram

![Diagram of ID/EX Register](image-url)
H. EX STAGE
## Outputs

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_addr</td>
<td>000000</td>
</tr>
<tr>
<td>ax_reg_data</td>
<td>000000</td>
</tr>
</tbody>
</table>

## Inputs

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
<tr>
<td>ax_in</td>
<td>000000</td>
</tr>
<tr>
<td>bx_in</td>
<td>000000</td>
</tr>
<tr>
<td>cx_in</td>
<td>000000</td>
</tr>
<tr>
<td>dx_in</td>
<td>000000</td>
</tr>
</tbody>
</table>
I. EX/MEM REGISTER
J. MEM STAGE
K. MEM/WB REGISTER
L. **WB STAGE**
<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>inputs</td>
<td></td>
</tr>
<tr>
<td>sel_a</td>
<td></td>
</tr>
<tr>
<td>input1</td>
<td>a[31:0]</td>
</tr>
<tr>
<td>input2</td>
<td>a[31:0]</td>
</tr>
<tr>
<td>outputs</td>
<td></td>
</tr>
<tr>
<td>data_out</td>
<td>a[31:0]</td>
</tr>
</tbody>
</table>

248
M. UART
N. TEST BENCH

`timescale 1ns / 1ps

module payload_processor_system_payload_processor_system_sch_tb();

// Inputs
reg p_clk;
reg reset;
reg tx_clk;
reg rx_clk;
reg tx_in;
reg rx_enable_a;
reg rx_enable_b;
reg rx_enable_c;

// Output
wire tx_out;
wire rx_empty_a;
wire rx_empty_b;
wire rx_empty_c;

// Bids

// Instantiate the UUT
payload_processor_system UUT (  
  .p_clk(p_clk),
  .reset(reset),
  .tx_clk(tx_clk),
  .rx_clk(rx_clk),
  .tx_in(tx_in),
  .rx_enable_a(rx_enable_a),
  .rx_enable_b(rx_enable_b),
  .rx_enable_c(rx_enable_c),
  .rx_empty_a(rx_empty_a),
  .rx_empty_b(rx_empty_b),
  .rx_empty_c(rx_empty_c)
);

// Initialize Inputs
initial begin
  $0 p_clk = 0;
  reset = 0;
  tx_clk = 0;
  rx_clk = 0;
  tx_in = 0;
  rx_enable_a = 0;
  rx_enable_b = 0;
  rx_enable_c = 0;
  reset = 1;
  $2 rx_in = 0;
  rx_enable_a = 1;
  rx_enable_b = 1;
  rx_enable_c = 1;
  reset = 0;
  $32 rx_in = 1;
  $32 rx_in = 0;
  $32 rx_in = 1;
  $32 rx_in = 0;
  $32 rx_in = 1;

(continued on next page)
#32 rx_in = 0;
#32 rx_in = 1;
#32 rx_in = 0;
#32 rx_in = 1;
end

initial
begin
#2 p_clk = 1;
tx_clk = 1;
rx_clk = 1;
forever
begin
#1 p_clk = ~p_clk;
tx_clk = ~tx_clk;
rx_clk = ~rx_clk;
end
end

initial
#336 done;
endmodule
## APPENDIX C. PAYLOAD PROCESSOR INSTRUCTION SET

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Format</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Add</strong></td>
<td>add rd, rs, rt</td>
<td>Add rs to rt and place the result in rd</td>
</tr>
<tr>
<td><strong>Subtract</strong></td>
<td>sub rd, rs, rt</td>
<td>Subtract rt from rs and place result in rd</td>
</tr>
<tr>
<td><strong>AND</strong></td>
<td>and rd, rs, rt</td>
<td>Logical AND of rs and rt, place the result in rd</td>
</tr>
<tr>
<td><strong>OR</strong></td>
<td>or rd, rs, rt</td>
<td>Logical OR of rs and rt, place the result in rd</td>
</tr>
<tr>
<td><strong>NOR</strong></td>
<td>nor rd, rs, rt</td>
<td>Logical NOR of rs and rt, place the result in rd</td>
</tr>
<tr>
<td><strong>XOR</strong></td>
<td>xor rd, rs, rt</td>
<td>Logical XOR of rs and rt, place the result in rd</td>
</tr>
<tr>
<td><strong>Shift Left Logical</strong></td>
<td>sll rd, rs, rt</td>
<td>Shift rs left by the number of bits expressed in shamt, place the result in rd.</td>
</tr>
<tr>
<td><strong>Shift Right Logical</strong></td>
<td>srl rd, rs, rt</td>
<td>Shift rs right by the number of bits expressed in shamt, place the result in rd.</td>
</tr>
<tr>
<td><strong>Shift Left Logical Var.</strong></td>
<td>sllv rd, rs, rt</td>
<td>Shift rs left by the number of bits expressed in rt, place the result in rd.</td>
</tr>
<tr>
<td><strong>Shift Right Logical Var.</strong></td>
<td>srlv rd, rs, rt</td>
<td>Shift rs right by the number of bits expressed in rt, place the result in rd.</td>
</tr>
<tr>
<td><strong>Set on Less than</strong></td>
<td>slt rd, rs, rt</td>
<td>Set register rd to 1 if rs is less than rt. 0 if rs is greater than or equal to rt</td>
</tr>
<tr>
<td><strong>Set on Less than Imm.</strong></td>
<td>slti rt, rs, imm</td>
<td>Set register rt to 1 if rs is less than imm. 0 if rs is greater than or equal to imm.</td>
</tr>
<tr>
<td><strong>AND Immediate</strong></td>
<td>andi rt, rs, imm</td>
<td>Logical AND of rs and zero-extended immediate, place result in rt</td>
</tr>
<tr>
<td><strong>OR Immediate</strong></td>
<td>ori rt, rs, imm</td>
<td>Logical OR of rs and zero-extended immediate, place result in rt</td>
</tr>
<tr>
<td><strong>Add Immediate</strong></td>
<td>addi rt, rs, imm</td>
<td>Add rs to imm and place the result in rt</td>
</tr>
<tr>
<td><strong>Branch on Equal</strong></td>
<td>beq rs, rt, offset</td>
<td>Branch the number of instructions specified by offset if rs equals rt</td>
</tr>
<tr>
<td>Instruction</td>
<td>Format</td>
<td>Description</td>
</tr>
<tr>
<td>-------------------------</td>
<td>--------------</td>
<td>--------------------------------------------------</td>
</tr>
<tr>
<td>Branch on Not Equal</td>
<td><code>bne rs, rt, offset</code></td>
<td>Branch the number of instructions specified by offset if rs does not equal rt</td>
</tr>
<tr>
<td>Branch on Less than Zero</td>
<td><code>blz rs, rt, offset</code></td>
<td>Branch the number of instructions specified by offset if rs is less than zero</td>
</tr>
<tr>
<td>Branch on Less than or Equal to</td>
<td><code>blez rs, rt, offset</code></td>
<td>Branch the number of instructions specified by offset if rs is less than or equal to zero</td>
</tr>
<tr>
<td>Branch on Greater than</td>
<td><code>bgz rs, rt, offset</code></td>
<td>Branch the number of instructions specified by offset if rs equals rt</td>
</tr>
<tr>
<td>Branch on Greater than or</td>
<td><code>bgez rs, rt, offset</code></td>
<td>Branch the number of instructions specified by offset if rs is greater than or equal to zero</td>
</tr>
<tr>
<td>Load Word</td>
<td><code>lw rt, address</code></td>
<td>Load the word at address into register rt.</td>
</tr>
<tr>
<td>Store Word</td>
<td><code>sw rt, address</code></td>
<td>Store the word in rt to address.</td>
</tr>
<tr>
<td>Jump</td>
<td><code>j target</code></td>
<td>Unconditionally jump to the instruction located at target</td>
</tr>
</tbody>
</table>
LIST OF REFERENCES


INITIAL DISTRIBUTION LIST

1. Defense Technical Information Center
   Ft. Belvoir, Virginia

2. Dudley Knox Library
   Naval Postgraduate School
   Monterey, California