SUPERCONDUCTIVE SIGNAL-PROCESSING CIRCUITS

University of California at Berkeley

Sponsored by
Ballistic Missile Defense Organization

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Ballistic Missile Defense Organization or the U.S. Government.

Rome Laboratory
Air Force Materiel Command
Griffiss Air Force Base, New York

This report has been reviewed by the Rome Laboratory Public Affairs Office (PA) and is releasable to the National Technical Information Service (NTIS). At NTIS it will be releasable to the general public, including foreign nations.

RL-TR-94-140 has been reviewed and is approved for publication.

APPROVED:

ZACHARY O. WHITE
Project Engineer

FOR THE COMMANDER:

ROBERT V. McGAHAN
Acting Director
Electromagnetics & Reliability Directorate

If your address has changed or if you wish to be removed from the Rome Laboratory mailing list, or if the addressee is no longer employed by your organization, please notify RL (ERAA) Hanscom AFB MA 01731. This will assist us in maintaining a current mailing list.

Do not return copies of this report unless contractual obligations or notices on a specific document require that it be returned.
SUPERCONDUCTIVE SIGNAL-PROCESSING CIRCUITS

Theodore Van Duzer

Contractor: University of California at Berkeley
Contract Number: F19628-90-K-0037
Effective Date of Contract: 15 August 1990
Contract Expiration Date: 15 October 1993
Short Title of Work: Superconducting Signal-Processing Circuits

Period of Work Covered: Aug 90 - Oct 93

Principal Investigator: Theodore Van Duzer
Phone: (510) 642-3306

RL Project Engineer: Zachary White
Phone: (617) 377-3191

Approved for public release; distribution unlimited.

This research was supported by the Ballistic Missile Defense Organization of the Department of Defense and was monitored by Zachary O. White, RL (ERAA), 31 Grenier Street, Hanscom AFB MA 01731-3010.
This work addresses new signal processing circuits using the special features of superconductivity. A novel flash-type analog-to-digital converter based on a comparator invented in the preceding contract period was demonstrated. The comparator was shown to be useful as a logic gate and an encoder was designed with it. A high-resolution delta-sigma analog-to-digital converter was devised with superconductive components in spite of the lack of an analog integrator in this technology. Positive theoretical results are being followed up experimentally. A simple flux-shuttle single-flux-quantum shift register was devised and several different readout schemes were studied. A six-bit-long version was successfully tested at 1 GHz. A decoder that takes in a five-bit word to select one of 32 output lines was completed. The design involved very tight limitations on current and power. The decoder was combined with a serial-to-parallel converter and operated at 2 GHz. A study of the appropriate architectures for various types of superconductive digital circuits was made and published. Another computer-aided-design tool for Josephson digital technology was developed: an inductance-extraction program.
TABLE OF CONTENTS

1. Introduction
2. Flash-Type Analog-to-Digital Converter
3. Delta-Sigma Analog-to-Digital Converter
4. Shift Register
5. Bit-Serial Decoder
6. Architecture Study
7. Computer-Aided-Design (CAD) Tools
APPENDIX: Published Papers
# LIST OF FIGURES

<table>
<thead>
<tr>
<th>Fig.</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Circuit diagram for the comparator of the flash-type A/D converter.</td>
</tr>
<tr>
<td>2</td>
<td>Implementation of a complete three-bit A/D converter.</td>
</tr>
<tr>
<td>3</td>
<td>Truth table for three-bit encoder and low-speed measurements for (a) the first four input patterns shown in the truth table and (b) the last four input patterns shown in the truth table. From top to bottom are the three clocks, the three inputs, and the three outputs, respectively.</td>
</tr>
<tr>
<td>4</td>
<td>Structure of a semiconductor delta-sigma converter.</td>
</tr>
<tr>
<td>5</td>
<td>Modulator for a superconducting delta-sigma A/D converter employing a low-pass filter.</td>
</tr>
<tr>
<td>6</td>
<td>Basic form of a direct-injection flux-shuttle shift register, not showing reading circuits.</td>
</tr>
<tr>
<td>7</td>
<td>Oscilloscope photograph showing test data and table of margins for a 6-bit long flux-shuttle shift register with operation at 1 GHz. Top trace is readout of first bit and lower trace is that of fourth bit.</td>
</tr>
<tr>
<td>8</td>
<td>Basic logic structure of the parallel decoder comprises a three-input NOR gate feeding a two-input multi-output NOR gate.</td>
</tr>
<tr>
<td>9</td>
<td>Complete parallel decoder.</td>
</tr>
<tr>
<td>10</td>
<td>Layout of complete bit-serial decoder.</td>
</tr>
<tr>
<td>11</td>
<td>Example of use of INDEX. Layout of two-junction SQUID and schematic diagram showing the extracted components.</td>
</tr>
<tr>
<td>12</td>
<td>Josephson junctions fabricated in the UCB Microfabrication Laboratory.</td>
</tr>
</tbody>
</table>
1. Introduction

During the period of this contract, we have been able to demonstrate very high speed operation of several different superconducting signal processing circuits. The circuits include two different kinds of A/D converters, flash type for the highest speeds and a delta-sigma circuit for high resolution. Out of the work on a flash-type A/D converter grew a new logic family which has potential for very fast operation. As a part of the work on the flash A/D converter, a scheme was proposed for using CMOS circuits built into the substrate to calibrate the input stage of the A/D converter in order to increase its dynamic range. We have designed a tightly specified serial-to-parallel decoder which has been shown to operate correctly with 2 Gbit/s input data. Also, additional evaluation of a current-steering shift register was done and a flux-shuttle shift register was designed and tested.

The project has included architecture and software studies. We have devoted some of our effort to issues of appropriate signal processing architectures for different logic families and have evaluated some of the emerging logic families. We have developed additional computer-aided-design tools to add to our earlier work, which produced the JSPICE and JSIM circuit simulators that are widely used in this field. The new tools include a program which allows the determination of the dc superconducting state of a circuit and a program for extraction of inductances in superconducting integrated circuits.

Our niobium integrated circuit process has been developed further under this contract and we have been able to demonstrate junction fabrication with small spreads of critical currents and successful fabrication of signal processing circuits. We have also introduced innovations in insulator formation for niobium circuit processing.

2. Flash-Type Analog-to-Digital Converter

During the period of this contract we extended the evaluation of an idea developed during the preceding period of Air Force support for a fully parallel flash-type A/D converter. The input comparator circuit consists of a one-junction SQUID, the inductor of which is the control line of a two-junction SQUID latching output stage. The initial idea and initial demonstration in both simulation and experiment were done by E.S. Fang and appeared in publications and in his Ph.D. dissertation [1,2]. Simulations predict that a 4-bit A/D converter with this comparator circuit could be sampled at 20 gigasamples/s and have an analog bandwidth of 10 GHz.

This work is being continued by another student, H. Luong, who has modified the design and optimized the parameters to increase the circuit margins. The present configuration of the comparator circuit is shown in Fig. 1. [3] A two-phase clock is used, with the first phase applied to the one-junction sampling SQUID and the second phase applied to the latching read-out two-junction SQUID. In order to achieve the desired bandwidth-resolution product, it is necessary to have a very short aperture time. In this design the small aperture time is achieved by adding a sharp pulse to the bias and signal applied to the input one-junction SQUID. The pulse is derived from the phase-one clock which also is a bias. The junction $J_6$ was added to improve margins. The clock junction in the output stage acts as a regulator. The correct operation of the comparator stage has been verified both at low speed and with inputs up to 3 GHz.

Fig. 1 Circuit diagram for the comparator of the flash-type A/D converter.

Fang proposed using a modification of the comparator circuit as a logic gate for the encoder to take the thermometer code from the output of the converter stage and convert it to binary. An advantage of this kind of logic is that inverters are easily realized. Luong subsequently revised and simplified the design of the encoder. [3] A 3-bit encoder has been fabricated and shown to function correctly at 2 Gbits/s. Further work will combine the comparator and encoder as in Fig. 2 and will extend the size of the complete converter to four bits. Figure 3 shows the truth table and the results of the low-speed demonstration of the 3-bit encoder. The follow-on work is being conducted with support of the multi-agency University Research Initiative.
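The encoder's logical function (thermometer code in, binary word out) can be illustrated with a short behavioral sketch. This is not the Josephson gate-level design; the function name `thermometer_to_binary` and the convention that the lowest-threshold comparator comes first are assumptions made only for illustration.

```python
def thermometer_to_binary(thermo, n_bits):
    """Convert a thermometer code to a binary word (MSB first).

    thermo: list of 2**n_bits - 1 comparator outputs (0 or 1),
            lowest-threshold comparator first (assumed ordering).
    """
    if len(thermo) != 2 ** n_bits - 1:
        raise ValueError("an n-bit converter has 2**n - 1 comparators")
    # A valid thermometer code is a run of ones followed by a run of zeros.
    if sorted(thermo, reverse=True) != list(thermo):
        raise ValueError("not a valid thermometer code")
    level = sum(thermo)  # number of comparators that switched
    return [(level >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]


# Three of the seven comparators of a 3-bit converter have switched.
print(thermometer_to_binary([1, 1, 1, 0, 0, 0, 0], 3))
```

Counting the ones is the simplest correct reference model; the hardware encoder reaches the same result with logic gates built from the comparator circuit.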

We evaluated the possibility of using Fang's comparator/logic gate in a flux-transfer configuration in which the junctions are nonhysteretic and hence do not latch into a voltage state. [4] A shift register with ±20% margins was simulated at 50 GHz. Other circuits simulated include a buffer, an XOR gate, an OR/AND gate, an inverting gate, and a fan-out gate. Because the junctions used were nonhysteretic, this logic family is potentially useful for high-temperature superconductors, for which only nonhysteretic junctions may be available.

Fig. 2 Implementation of a complete three-bit A/D converter.

Fig. 3 Truth table for three-bit encoder and low-speed measurements for (a) the first four input patterns shown in the truth table and (b) the last four input patterns shown in the truth table. From top to bottom are the three clocks, the three inputs, and the three outputs, respectively.

A further aspect of the A/D converter that was proposed by Fang is the use of CMOS circuits to do a self-calibration of the comparator stage. [1] Because of variations of parameters inherent in the fabrication process, it is difficult to extend beyond four bits of resolution. With five bits, for example, the steps in the digital staircase of references for 31 comparators each are about 3% of full scale. Variations, say, of critical currents of the junctions would be greater than 3% so the staircase would not be monotonic. The solution is to adjust the biases on the comparators to make up for the variations of the circuit parameters. A CMOS circuit was proposed that would measure the switching point for each comparator and adjust the bias to achieve the ideal value. Each comparator would be adjusted in turn upon initial excitation of the circuit. The CMOS circuit turns itself off when the calibration is completed and the A/D conversion involves only the Josephson components. Another PhD student is following up this suggestion with the support of the University Research Initiative.
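The arithmetic behind the five-bit example can be checked with a small sketch. The helper names and the way threshold errors are expressed (in percent of full scale, one value per comparator) are assumptions made for illustration; they are not part of the proposed CMOS calibration circuit.

```python
def comparator_count(n_bits):
    """Number of comparators in an n-bit flash quantizer."""
    return 2 ** n_bits - 1


def step_percent(n_bits):
    """Ideal spacing of the reference staircase, in percent of full scale."""
    return 100.0 / 2 ** n_bits


def staircase_is_monotonic(n_bits, errors_percent):
    """True if the perturbed thresholds are still in increasing order.

    errors_percent: one threshold error per comparator, in percent of
    full scale (hypothetical representation of fabrication spread).
    """
    step = step_percent(n_bits)
    thresholds = [(k + 1) * step + err for k, err in enumerate(errors_percent)]
    return all(hi > lo for lo, hi in zip(thresholds, thresholds[1:]))


print(comparator_count(5), step_percent(5))  # 31 comparators, 3.125 % steps

# Two neighbouring comparators off by +2 % and -2 % already swap order.
errors = [0.0] * 31
errors[10], errors[11] = 2.0, -2.0
print(staircase_is_monotonic(5, errors))
```

The example shows why threshold errors comparable to the 3% step spoil monotonicity, which is the problem the bias adjustment is meant to remove.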

3. Delta-Sigma Analog-to-Digital Converter

Some applications for A/D converters require high resolution and efforts have been directed toward the use of the high speed of Josephson electronics for this purpose. Several projects in superconductor technology have employed the counting architecture in which a series of pulses is generated by an input SQUID as the analog signal varies and the number of flux quanta in the SQUID changes. A series of counting SQUIDs follows and counts the pulses over a given interval. The result is a binary representation of the average signal amplitude during that timing interval. P.H. Xiao in our group has taken a different approach to try to avoid some of the problems of the counting A/D converter; we are seeking to emulate the high-resolution delta-sigma A/D converter popular in semiconductor technology, for which the block diagram is shown in Fig. 4.

A key component of the delta-sigma circuit is an integrator, which requires an amplifier. No suitable amplifier exists in superconductive technology, but a low-pass filter was recognized as having the same frequency characteristics. The lack of an amplifier can be made up for by having a very sensitive comparator to use in the one-bit quantizer which follows the filter. Two kinds of simulations have been done to evaluate the performance. The result for an oversampling ratio of 128 is a signal-to-noise ratio of 70 dB, which corresponds to 11 bits of resolution. Xiao has devised a modulator stage (Fig. 5) which accepts the analog input signal and a clock and provides at its output a density-modulated train of single bits at 1 Gbit/s. [5] The functioning of the components of the modulator has been verified at 1 Gbit/s; subsequent testing will show operation of the entire modulator.
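The correspondence between a 70 dB signal-to-noise ratio and about 11 bits can be checked with the usual relation for an ideal quantizer driven by a full-scale sine wave, SNR = 6.02 N + 1.76 dB. Using this relation here is an assumption; the report does not say which conversion was applied.

```python
def effective_bits(snr_db):
    """Effective number of bits for a given SNR in dB.

    Uses the standard ideal-quantizer relation SNR = 6.02*N + 1.76 dB
    (full-scale sinusoidal input), assumed here for illustration.
    """
    return (snr_db - 1.76) / 6.02


print(round(effective_bits(70.0), 2))
```

With 70 dB the result falls between 11 and 12 bits, consistent with the 11 bits of resolution quoted above.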

Fig. 4 Structure of a semiconductor delta-sigma converter.

Fig. 5 Modulator for a superconducting delta-sigma A/D converter employing a low-pass filter.

To complete the conversion, the modulator is followed by a decimation filter which suppresses the out-of-band high-frequency quantization noise, prevents the aliasing of the out-of-band signal into the passband, maintains the passband ripple to within specifications, and down-samples the output signal. These functions can be accomplished by a cascade of a linear phase sinc FIR filter and an IIR low-pass filter. Both of these filters can be realized in superconductor technology but only the FIR filter is being planned for now. Since the output of the FIR filter is at a low data rate, external test equipment will be used to replace the IIR filter's function. This continuation of the project will be supported by the University Research Initiative.
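The modulator-plus-decimator chain can be sketched behaviorally as follows. The sketch assumes an ideal integrator, as in the semiconductor structure of Fig. 4, rather than the low-pass filter of the superconducting modulator, and it uses a plain boxcar average (a first-order sinc filter) as the decimator. Both function names are hypothetical.

```python
def first_order_modulator(samples):
    """First-order delta-sigma modulator with an ideal integrator.

    samples: input values in the range -1..1 at the oversampled rate.
    Returns a density-modulated stream of +1/-1 bits.
    """
    state = 0.0
    bits = []
    for x in samples:
        y = 1 if state >= 0 else -1  # one-bit quantizer
        bits.append(y)
        state += x - y               # integrate input minus feedback
    return bits


def boxcar_decimate(bits, osr):
    """Average each block of `osr` bits and keep one output per block."""
    return [sum(bits[i:i + osr]) / osr
            for i in range(0, len(bits) - osr + 1, osr)]


osr = 128                                    # oversampling ratio from the text
stream = first_order_modulator([0.3] * (10 * osr))
print(boxcar_decimate(stream, osr))          # each value is close to 0.3
```

A first-order loop with a boxcar filter is far from the 70 dB performance discussed above; the sketch only shows how the bit density carries the input value and how decimation recovers it at the lower rate.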

4. Shift Register

The previous Air Force contract supported a study of a dc-powered shift register in which current steering between the legs of a superconducting loop was used to represent "0"s and "1"s. [6] Early work during this contract period continued the evaluation. [7] It became clear that, although the dc powering is an advantage, there were some serious drawbacks, including size and complexity. We decided to look further at an early suggestion for a shift register, the flux shuttle, which comprises a parallel connection of two-junction SQUIDs with three-phase powering to shuttle flux quanta (representing stored bits) along with the clock. [8] This shift register has power dissipation only when transferring data (except for the current sources) and is compatible with our A/D converter design, which will allow future combination of the two into a signal-acquisition subsystem. In the structure chosen (Fig. 6), the three-phase power is directly coupled rather than magnetically coupled because the former has larger margins. This type of circuit uses nonhysteretic junctions and is therefore compatible with high-temperature superconductors.
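The three-phase shuttling can be pictured with a purely logical bookkeeping model. It is not a circuit simulation: the assumption that each stage consists of three cells, that each clock phase advances a flux quantum by one cell, and the name `run_flux_shuttle` are all illustrative choices and are not taken from the design.

```python
def run_flux_shuttle(input_bits, n_stages=6):
    """Shift bits through a toy three-phase flux shuttle.

    One bit is written per clock cycle; each of the three phases moves
    the flux quanta sitting in "its" cells forward by one cell, so a bit
    advances one stage (three cells) per cycle.
    Returns the bit leaving the last cell in each cycle.
    """
    cells = [0] * (3 * n_stages)
    outputs = []
    for bit in input_bits:
        cells[0] = bit                      # write at the input cell
        out = 0
        for phase in range(3):
            # Walk from the far end so no quantum is moved twice.
            for c in range(len(cells) - 1, -1, -1):
                if c % 3 == phase and cells[c]:
                    cells[c] = 0
                    if c + 1 < len(cells):
                        cells[c + 1] = 1
                    else:
                        out = 1             # quantum leaves the register
        outputs.append(out)
    return outputs


data = [1, 0, 1, 1, 0, 0]
print(run_flux_shuttle(data + [0] * 6))
```

In this model a bit written in one cycle emerges from a six-stage register five cycles later, and the order of the bits is preserved.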

Fig. 6 Basic form of the direct-injection flux-shuttle shift register, not showing reading circuits.

The structure was thoroughly analyzed and much emphasis was devoted to the method of readout. For some applications, such as demultiplexing or in filters, correlators, or convolvers, it is necessary to read at every stage so the effect of the readout on margins and speed is important. Several different readout circuits were studied and evaluated experimentally. Shift registers of various lengths were fabricated and tested. The test data and margins for a 6-bit version are shown in Fig. 7 for clocking at 1 GHz. The limitation of speed to 1 GHz was due to the test facilities. Work on the shift register to extend its length, increase testing speed, and to combine it with an A/D converter will be carried out under the follow-on University Research Initiative.

5. Bit-Serial Decoder

One of the proposed applications of superconductive electronics is a crossbar switch which would interconnect 128 semiconductor processors with an equal number of memories. When a processor would attempt to send data to memory, it would send a 2 Gbit/s train of bits containing the address and the data. In order to make the desired connection, decoders are needed. The chosen architecture employs a set of four decoders, each capable of selecting one of 32 lines with a 5-bit address, so the set of four decoders can choose one of 128 lines. The entire address is in the first seven bits received from the processor. There are two parts of the decoder. The first is to convert the serial address bits into parallel and the second is to take five binary coded bits and use them to choose one of 32 lines. This project by D. A. Feld involved mainly the second part but some work was also done on the first and we did the final demonstration of the combination.

Margins of the 6-Bit Shift Register with Series Junction Read-Out at 1 GHz

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Nominal</th>
<th>Margin (dB)</th>
<th>Margin (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>$I_{dc}$</td>
<td>350 µA</td>
<td>N/A</td>
<td>+29 / -31</td>
</tr>
<tr>
<td>$V_{Write}$</td>
<td>300 mV</td>
<td>N/A</td>
<td>+53 / &gt;-17</td>
</tr>
<tr>
<td>$V_{Read Bias}$</td>
<td>41 mV</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>$\Phi_1$</td>
<td>138 mV</td>
<td>+/- 2</td>
<td>+26 / -21</td>
</tr>
<tr>
<td>$\Phi_2$</td>
<td>112 mV</td>
<td>+/- 2.5</td>
<td>+34 / -25</td>
</tr>
<tr>
<td>$\Phi_3$</td>
<td>127 mV</td>
<td>+/- 2.5</td>
<td>+34 / -25</td>
</tr>
</tbody>
</table>

Fig. 7 Oscilloscope photograph showing test data and table of margins for 6-bit long flux-shuttle shift register with operation at 1 GHz. Top trace is readout of first bit and lower trace is that of the fourth bit.

The emphasis in Feld's work was to invent circuits that can meet a tight set of constraints for the parallel part of the decoder and to demonstrate that one of 32 lines can be chosen at a 2 Gbit/s rate. [9,10] For system considerations such as the need to limit total crossbar current and power and to fit the entire crossbar switch on a 1-cm chip, the parallel decoder current was limited to 6 mA and the power to 250 pW. The size of the circuit was required to be no more than 0.4 mm by 1.5 mm. It was also required that the margins be large. In order to meet the limitation on current and still keep the gate currents high enough to avoid noise switching, it was necessary to devise multi-input logic circuits. The basic logic structure is shown in Fig. 8; it is seen that the inputs are the five bits designated A-E. The entire parallel decoder contains 32 of these units to decode the 32 combinations of A-E, and is shown in Fig. 9. All of the specifications were met.

We worked with Hypres, Inc. to eliminate some problems from the serial-to-parallel converter and to combine the two parts of the decoder. The final structure is shown in Fig. 10; it contains 144 junctions. Feld performed tests of the decoder and showed that it functioned completely and correctly with data input of 2 Gbits/s.
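The addressing scheme of this section (a 7-bit serial address, four decoders, 32 lines each) can be sketched as follows. The bit order (most significant bit first) and the choice of the two leading bits to pick the decoder are assumptions; the report does not specify them. The function names are hypothetical.

```python
def bits_to_int(bits):
    """Serial-to-parallel step: pack bits (MSB first, assumed) into an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def select_line(serial_bits):
    """Decode the first seven bits of a serial stream.

    Returns (decoder index 0..3, line index 0..31, one-hot list of 32 outputs).
    Assumes the two leading bits choose the decoder and the next five
    bits choose the line within it.
    """
    address = bits_to_int(serial_bits[:7])
    decoder = address >> 5          # top two bits
    line = address & 0b11111        # low five bits
    one_hot = [1 if i == line else 0 for i in range(32)]
    return decoder, line, one_hot


decoder, line, outputs = select_line([1, 0, 0, 0, 0, 1, 1, 1, 0, 1])
print(decoder, line, outputs.index(1))
```

Exactly one of the 32 outputs of the selected decoder is active, so the four decoders together address 4 x 32 = 128 lines.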

6. Architecture Study

Fig. 8 Basic logic structure of the parallel decoder comprises a three-input NOR gate feeding a two-input multi-output NOR gate.

Fig. 9 Complete parallel decoder.

Fig. 10 Layout of complete bit-serial decoder.

Very few workers in Josephson digital circuit technology are knowledgeable in computer and signal processing architecture. We took advantage of expertise (J. Fleischman) within our group to do a study of architectures appropriate to the various logic families. [8,11] Some of the general issues addressed included statistics on the use of various kinds of instructions in general purpose computers, speed improvement by use of cache memories, pipelining and some of the attendant problems such as the formation of "bubbles" of inactivity in the pipelines, parallelism, and synchronous vs. asynchronous operation.

There are three main categories of superconductive logic gate as defined by the type of clocking, and these define special issues in the architecture. (1) The quantum flux parametron (QFP) and the new Fang (Sec. 2) logic are fully synchronous at the gate level. One logic operation is done on each clock cycle, so the clock must run at a high frequency for high-speed logic. The system is pipelined at gate level. The deep pipeline and high clock speed make full utilization difficult in most digital systems. (2) Most voltage-state logic allows ripple-through on one clock phase, that is, locally asynchronous operation, with the data being picked up in another set of gates during the next phase of the clock or, alternatively, held in a latch during a transition of a single-phase clock. The ripple-through logic capability makes possible low latency but does not provide high throughput. (3) The rapid single flux quantum (RSFQ) logic can be fully asynchronous, with timing set by internally generated pulses. Asynchronous circuits implemented in RSFQ logic potentially have very high throughput. A complete computing system using RSFQ would only require synchronization at block level.

General purpose computing is unlikely in the near future for systems composed entirely of superconductive components due to RAM and cache requirements, though these may be alleviated by the proposed hybrid Josephson-CMOS memory structures.

In digital signal processing, random-access memory can be replaced by more dedicated memory structures. In addition, a data flow architecture with high levels of pipelining can be selected to maximize throughput. With these choices, digital signal processing is the most likely prospect for implementation with superconductive digital electronics.

7. Computer-Aided-Design (CAD) Tools

The nonlinear behavior of Josephson circuits demands computer tools for their simulation; furthermore, useful digital circuits such as filters, multipliers, etc. involve large numbers of logic gates (typically over 1000) and cannot be optimally designed by hand. In work under earlier Air Force support, we developed simulation tools that can be used to evaluate the dynamic performance of circuits involving small numbers of gates. We first modified the SPICE simulator to include a model for the Josephson junction and called the modified program JSPICE. In Air Force sponsored work just preceding the contract period of this report, we devised a simulator with a computational algorithm specialized to Josephson peculiarities, called JSIM. It can do simulations about an order of magnitude faster than JSPICE and is widely used.

Two new programs were developed under this contract; one provides a way of finding operating points and dc transfer characteristic curves of Josephson circuits in the superconducting state [12] and the other is for extraction of inductance parameters from a circuit layout [13].

For the program to find operating points, E. S. Fang used a mixed-mode method; this combines source stepping and time-domain calculations. Josephson circuit equations are often multivalued, which implies the existence of multiple solutions. When the paths taken by the independent sources are specified, only one of the many possible solutions can be physical. The mixed mode algorithm follows the paths of the independent sources, detects ill-conditioned points, and converges to stable points on the characteristic curves of the simulated circuit. The algorithm was implemented and case studies were done. The method and techniques are suitable for implementing in a general circuit simulator.
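The idea of following the path of a source through a multivalued characteristic can be illustrated on the simplest example, a one-junction SQUID with the normalized equation φ + β sin φ = x, where x is the applied source. The sketch below is a stand-in, not the mixed-mode algorithm of the program: it steps the source and, at each step, relaxes a fictitious first-order time-domain equation until it settles on a stable solution. All names and parameter values are assumptions.

```python
import math


def relax_to_stable_point(phi, x, beta, dt=0.1, tol=1e-9, max_iter=200_000):
    """Relax d(phi)/dt = -(phi + beta*sin(phi) - x) to a stable solution."""
    for _ in range(max_iter):
        residual = phi + beta * math.sin(phi) - x
        if abs(residual) < tol:
            return phi
        phi -= dt * residual        # explicit time step toward equilibrium
    raise RuntimeError("relaxation did not converge")


def sweep(beta, x_values, phi_start=0.0):
    """Step the source through x_values, starting each solve from the last state."""
    phi, path = phi_start, []
    for x in x_values:
        phi = relax_to_stable_point(phi, x, beta)
        path.append(phi)
    return path


def up_down_sweep(beta, x_max=10.0, n=100):
    """Sweep the source up and then back down; results are indexed by x."""
    xs = [x_max * i / n for i in range(n + 1)]
    up = sweep(beta, xs)
    down = sweep(beta, xs[::-1], phi_start=up[-1])[::-1]
    return xs, up, down


xs, up, down = up_down_sweep(beta=3.0)
print(xs[30], round(up[30], 3), round(down[30], 3))
```

For β = 3 the equation has several solutions at x = 3, and the up and down sweeps settle on different ones, which shows how the history of the source selects the physical solution. For β below 1 the characteristic is single-valued and both sweeps coincide.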

In the inductance extraction program (INDEX) by P. H. Xiao, inductances are calculated on the basis of two-dimensional modeling of sections of the layout. The inductances are modeled by simple analytic expressions to keep the computation time within acceptable limits. INDEX is designed to work with the MAGIC layout system. MAGIC has interfaces with intermediate layout formats such as CIF and Calma and has a corner-stitch data structure that makes the extraction simple. In MAGIC, polygons are represented by rectangles called tiles. Each tile has four pointers to its four neighbors, which makes neighbor-related operations easy to implement. A two-junction SQUID and its extracted representation are shown in Fig. 11. The main aim of the circuit extraction is to find and evaluate the parasitic inductances.

Fig. 11 Example of use of INDEX. Layout of two-junction SQUID and the schematic diagram showing the extracted components.

Several improvements are under consideration. Many tiles are too short in the current flow direction for two-dimensional modeling to be accurate, so we will consider ways of implementing three-dimensional modeling. Another important help for the designer would be an automatic generation of the schematic including all parasitic components rather than the presently used netlist.
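A tile-by-tile analytic estimate of the kind described above can be sketched with the textbook expression for a wide superconducting strip over a superconducting ground plane, L = μ0 (l/w) [h + λ1 coth(t1/λ1) + λ2 coth(t2/λ2)], with fringing fields neglected. Whether INDEX uses this particular expression is not stated in the report, so the formula choice, the function names, and the numerical values below are assumptions for illustration only.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space, H/m


def tile_inductance(length, width, h, lam1, t1, lam2, t2):
    """Inductance (H) of one rectangular tile of superconducting microstrip.

    length, width: tile dimensions along and across the current (m)
    h: insulator thickness (m)
    lam1, t1: penetration depth and thickness of the strip (m)
    lam2, t2: penetration depth and thickness of the ground plane (m)
    Fringing is ignored (width much larger than h is assumed).
    """
    def coth(x):
        return 1.0 / math.tanh(x)

    per_square = MU0 * (h + lam1 * coth(t1 / lam1) + lam2 * coth(t2 / lam2))
    return per_square * length / width


def series_inductance(tiles):
    """Sum the inductances of tiles that carry the same current in series."""
    return sum(tile_inductance(**tile) for tile in tiles)


# Illustrative numbers only: 90 nm penetration depth, 200 nm insulator.
film = dict(h=200e-9, lam1=90e-9, t1=300e-9, lam2=90e-9, t2=300e-9)
tiles = [dict(length=10e-6, width=5e-6, **film),
         dict(length=4e-6, width=2e-6, **film)]
print(series_inductance(tiles) * 1e12, "pH")
```

Because the expression depends on a tile only through its length-to-width ratio, it is cheap to evaluate for every tile, but it loses accuracy for tiles that are short in the current direction, which is the limitation noted above.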

The more general problem of designing large circuits and their layouts starting from logic descriptions, as is done for semiconductor circuits, will be addressed in follow-on work under the University Research Initiative.

8. Niobium Integrated-Circuit Process

D. F. Hebert has developed a process in our Microfabrication Facility for fabricating niobium superconductive integrated circuits with good parameter control. The process is capable of producing excellent quality Nb/AlOx/Nb Josephson junctions as small as 1.6 µm × 1.6 µm with critical current densities as high as 3600 A/cm². Some examples of I-V characteristics are shown in Fig. 12. The process features molybdenum resistors in which sheet resistance is controlled to within a few percent of design value at cryogenic temperature by use of an in-situ resistance measurement during deposition.
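For orientation, the critical current implied by these two limits follows from the product of current density and junction area. The sketch assumes a uniform current density over a square junction of the drawn size; the function name is hypothetical.

```python
def critical_current(jc_a_per_cm2, side_um):
    """Critical current (A) of a square junction with uniform current density."""
    side_cm = side_um * 1e-4            # 1 µm = 1e-4 cm
    return jc_a_per_cm2 * side_cm ** 2


ic = critical_current(3600.0, 1.6)
print(round(ic * 1e6, 1), "µA")          # about 92 µA
```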

Fig. 12 Josephson junctions fabricated in the UCB Microfabrication Laboratory.

The innovative use of VLSI quality oxides is being incorporated to make possible high-density circuits. We have replaced the previously used SiO by SiO₂ and pioneered the use of PECVD oxide for layers of insulator below the junctions. Insulators above the junctions should be deposited at a lower temperature than PECVD since the Nb/AlOx/Nb junctions are known to degrade at temperatures above 150 °C. We have implemented a low temperature (90 °C) reactively sputtered silicon process to form SiO₂ for this purpose. We are evaluating LPCVD oxide which is used in semiconductor technology to form excellent insulating layers. In the follow-on work under the University Research Initiative, the development of improved insulators will continue, as will other developments of the niobium process.

REFERENCES


A MULTI-GIGAHERTZ JOSEPHSON FLASH A/D CONVERTER WITH A PIPELINED ENCODER USING LARGE-DYNAMIC-RANGE CURRENT-LATCH COMPARATORS

Emerson S. Fang, David Hebert and Theodore Van Duzer
Department of Electrical Engineering and Computer Science
University of California at Berkeley
Berkeley, California 94720

Abstract

We present the design of a multi-gigahertz 4-bit A/D converter with a pipelined encoder. A wideband and large dynamic range comparator serves as basic building block for both the quantizer and the encoder, which simplifies the design. We will show the design of the comparator and the building of the quantizer and the encoder with the comparator circuits. Simulation results are presented, and the possibility of adapting the design to high-\(T_c\) circuit is also discussed.

Introduction

A single-stage flash-type Josephson A/D converter consists of a quantizer and an encoder as shown in Fig. 1. The quantizer is a string of \(2^n-1\) comparators in parallel for an \(n\)-bit converter. It generates a thermometer code of the analog input. The encoder converts the thermometer code to a binary output. The speed of an A/D converter is expressed by two factors, clock or conversion rate and bandwidth. The bandwidth of an A/D converter is defined as the maximum frequency of a sinusoidal analog signal it can convert without aliasing. The maximum clock rate is determined by the switching speed of the comparators in the quantizer and the logic gates in the encoder. In the absence of a sample-and-hold, the maximum bandwidth is determined by the aperture time of the comparators in the quantizer, where the aperture time \(t_a \leq 1/(2\pi f_B)\) [1,2]. The reader can refer to references 1 and 2 for a more detailed discussion of speed-limiting factors in flash-type Josephson A/D converters.
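A behavioral sketch of the quantizer and of the aperture-time bound as printed above may help fix ideas. Uniformly spaced thresholds, the normalization to a full scale of 1, and the function names are assumptions; the sketch says nothing about how the Josephson comparators realize the thresholds.

```python
import math


def quantize_to_thermometer(x, n_bits, full_scale=1.0):
    """Outputs of the 2**n - 1 parallel comparators for input x.

    Comparator k (k = 1 .. 2**n - 1) is assumed to switch when
    x >= k * full_scale / 2**n.
    """
    levels = 2 ** n_bits
    return [1 if x >= k * full_scale / levels else 0 for k in range(1, levels)]


def max_aperture_time(f_b_hz):
    """Upper bound on aperture time from the relation quoted in the text."""
    return 1.0 / (2.0 * math.pi * f_b_hz)


print(quantize_to_thermometer(0.40, 2))          # thresholds at 0.25, 0.5, 0.75
print(len(quantize_to_thermometer(0.40, 4)))     # 15 comparators for 4 bits
print(max_aperture_time(10e9) * 1e12, "ps")      # bound for a 10 GHz bandwidth
```

For a bandwidth of 10 GHz the quoted relation gives an aperture time of roughly 16 ps, which indicates why pulse widths of a few picoseconds are of interest.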

The Basic Current-Latch Comparator

A schematic diagram of the current-latch comparator is shown in Fig. 2. It is an improved version of the circuit we reported earlier [3]. Figures 3 and 4 are the characteristic curves of the one-junction SQUID \(S_2\) and the threshold curve of the symmetric two-junction SQUID \(S_3\), respectively. The one-junction SQUID \(S_1\) is a pulser that generates a positive pulse on the rising edge of clock 1 and a negative pulse on the falling edge. The one-junction SQUID \(S_2\) (comprising \(J_2\), \(L_2\), and \(L'_2\)) is the sampling SQUID operating in a current-latching mode. Figure 3 illustrates the basic operation of the current latch. The dc bias current \(I_{bias}\) establishes the threshold input current for the latch. If \(I_n + I_{mod}\) is above the threshold upon arrival of \(I_p\), the operating point of sampling SQUID \(S_2\) will jump one step on the

---

Manuscript received September 24, 1990.

This work was sponsored by AF Contract 19628-86-K-0033 and F19628-90-K-0037 with RAD funded by SDIO IST.

© 1991 IEEE
opposite will happen if before in Fig. 4. On the rising edge of clock 2, SQUID 2 will switch before $J_2$ and prevent $J_1$ from going into the voltage state. The opposite will happen if $I_m$ is less than the threshold current. The resulting current would be given by the negative peak of $I_{peak}$.

Fig. 4. Threshold curve of the read SQUID $S_3$. $I_{high}$ and $I_{low}$ are the high and low current levels in $L_1$ respectively.

There are several advantages of this design. First, the aperture time of the comparator is determined by the pulse width of the pulser SQUID, which can be a few picoseconds or less. This makes a bandwidth in the multi-gigahertz range achievable for a 4-bit converter. Second, the symmetric read SQUID $S_3$ provides isolation for the sampling SQUID $S_2$ during the aperture time. Any signal or noise fed back from the output will split equally between the two branches of $S_3$, and the couplings back to the sampling SQUID $S_2$ cancel each other, which makes the comparator unidirectional to first order. This improves the sensitivity of the comparator. Finally, the low impedance of the input node makes biasing and superposition of signals relatively easy. It is also a desirable load for an analog signal current source because it minimizes signal attenuation, as can be verified by using the Norton equivalent for the signal source.

Dynamic Range of the Comparator

In our previous design [3], the positive pulse $I_p$ is used for sampling, and the negative pulse $I_n$ is used for resetting. Resetting will occur only if $I_p + I_n$ is greater than $I_{high} - I_{low}$ and the range of the input signal $I_{in}$ is less than $I_p + I_n$. The current pulse amplitudes $I_p$ and $I_n$ are limited since they are generated by the one-junction SQUID $S_1$. The dynamic range of the input signal is therefore limited by the pulse amplitudes. On the other hand, for an n-bit single-stage flash A/D converter, the comparators need a dynamic range of $2^n I_{LSB}$, where $I_{LSB}$ is the current corresponding to one least significant bit. In our previous design, the comparator has sufficient dynamic range for a 4-bit converter if $I_{LSB}$ is less than 10 μA, and the junction critical current density is above 2500 A/cm². A larger dynamic range is desirable. With the modulating signal, which can be sinusoidal and is at the clock frequency, the dynamic range of the analog signal is then limited by $\Phi_0/2(L_2 + L'_2)$ and the amplitude of the modulating signal. This makes a many-fold improvement in the dynamic range over the previous design. The additional requirement is that the positive pulse should arrive near the peak of the modulating signal, if a sinusoid is used. The required timing of the pulse can be achieved with a delay line, and by adjusting the bias to the one-junction SQUID pulser $S_1$.

The Three-Phase Pipelined Encoder

In a pipelined encoder, the encoding function is done in a series of stages. If we have an encoder for a 2-bit converter, a 3-bit encoder can easily be constructed in two stages using two 2-bit encoders and some additional logic as shown in Fig. 5. A 4-bit encoder can be built from two 3-bit encoders and an additional stage of logic functions. The extension is identical to that shown in Fig. 5.

The circuit shown in Fig. 6 is very similar to the comparator, except that the positions of junction $J_3$ and one-junction SQUID $S_3$ are interchanged; hence, the output is inverted. The bias current can be adjusted to give two kinds of threshold current $I_{thresh}$. If we adjust $I_{thresh}$ so that it exceeds both $I_1$ and $I_2$ individually but is less than their sum $I_1 + I_2$, a NAND gate results, where $I_1$ and $I_2$ are the logic "1" current levels for the A and B inputs, respectively. On the other hand, if we adjust $I_{thresh}$ to be less than either $I_1$ or $I_2$, a NOR gate results. Changing $J_2$ and $S_1$ back to the positions in Fig. 2, we can form AND and OR gates. The basic logic gates that can be implemented with the comparator design are therefore NAND, NOR, AND and OR.

The block diagram of the 2-bit encoder along with clock phases is shown in Fig. 7. All clock signals are assumed to be sinusoidal and at the same frequency. Figure 7 also shows the gate-level implementation of the 2-bit encoder. To build a 4-bit encoder as shown in Fig. 5, we also need 2-input multiplexers. Figure 8 shows a design for the multiplexer. The 4-bit encoder will have four 2-bit encoders, seven 2-input multiplexers and various buffers and inverters, and a pipe latency of 5 1/3 clock cycles.

Fig. 5. A pipelined 3-bit encoder.

Fig. 6. Basic NAND or NOR gate ($R_{L1} + R_A = R_{L2} + R_B = 300$).

Fig. 7. A 2-bit encoder and its gate-level implementation.

Design and Simulation

The schematic of a complete design of the comparator is shown in Fig. 2. Clock 1 and Clock 2 are sinusoidal, and Clock 2 has a phase lag of 120°. Junction $J_{eq}$ is a wave-shaping junction. It is not essential for correct operation of the circuit; however, inclusion of the wave-shaping junction improves the tolerance on phase-lag error between Clock 1 and Clock 2. The $LI_c$ product of the coupling SQUID $S_1$ should be about $\Phi_0$ for the best operating margin. The loop inductance of the read SQUID $S_2$ should be about half of $L_2$ in SQUID $S_2$ to get maximum sensitivity. The design in Fig. 2 is for a junction critical current density of 600 A/cm$^2$. The clock rate for this circuit can reach 2 GHz in simulation. At this speed, the delay line is not needed. The adjustment in the bias current $I_{bias}$ for the SQUID pulser $S_1$ is sufficient to give an effective pulse height, as defined in Fig. 3, of more than 20 μA.

As was pointed out at the beginning, the speed of an A/D converter is expressed by clock rate and bandwidth. Bandwidth is defined as the maximum frequency of a sinusoidal signal that the A/D converter can convert without aliasing. The bandwidth is limited by the sampling theorem to one half of the clock rate, but the actual bandwidth can be even lower due to nonidealities in the circuit. The limit on clock rate for the comparator is attributed to the punchthrough effect, and the limit on bandwidth is the result of the finite pulse width from the pulser. At low junction current density, the punchthrough effect is the dominant limit on speed. For this design, the analog current value above which reset cannot occur is 200 μA. In the A/D converter application, the comparator can take an input peak-to-peak sinusoid up to 400 μA. This is because the sampling theorem limits the bandwidth to half of the sampling rate, and between sample and reset, which is half of the clock period, the input sinusoid at the band-limiting frequency cannot change by more than half of its full range. One point should be noted: when the input current is above 340 μA, the sampling SQUID $S_1$ will jump two steps on its characteristic curve. This is not a problem as long as the operating point jumps down at least one step during reset, which will happen as long as the analog current is less than 500 μA. If, for any reason, the input current range must exceed 400 μA, the A/D converter will still operate correctly if the signal bandwidth is limited to substantially below the Nyquist rate, or if a current limiter similar to the one proposed by Petersen [4] is used in front of the least significant comparator in the quantizer.

The RMS noise current at the input node is estimated from white noise analysis to be less than 3 μA. This gives a comparator dynamic range of about 133:1 (the 400 μA input range over the 3 μA noise current), corresponding to 6 bits. None of the present processes can achieve a junction critical current uniformity better than 1 percent, which would be needed to allow a quantization step of 3 μA. For the present design, a quantization step of 20 μA is used.

Simulations of the basic circuits are performed with JSIM [5,6]. Figure 9 shows the simulation result of a comparator clocked at 2 GHz with a 1 GHz sinusoidal input. At the time of the first sampling pulse, the input exceeds the comparator threshold current of 20 μA. During the second sampling pulse, the input is less than 20 μA. The simulation result indicates correct operation of the comparator. Figure 10 shows the simulation result for an Exclusive-OR gate constructed from 3 NAND gates, which were discussed previously. From the simulation result, we can verify that the logic function performed by the circuit is an Exclusive-OR.

Test Results

Initial low-speed test runs were done to verify the functionality of the design and to extract design parameters. A process run without resistors was made. In this run, the basic comparator was laid out without junction $J_0$, the pulser SQUID $S_1$, and the wave-shaping junction $J_{eq}$. The layout is shown in Fig. 11. Damping for the sampling SQUID was provided externally. The clock signal to the read SQUID is at 800 Hz, and the input to the sampling SQUID is at 100 Hz. The output is shown in Fig. 12. In each photo, the bottom waveform is the input signal and the top waveform is the voltage output of the read SQUID. Ground level shift is visible in the photos; this does not affect the circuit since it is due to finite resistance in the sample holder leads. The result of the test verifies correct operation of the comparator.

Scaling to Higher Junction Current Density

For a process with higher junction critical current density $J_c$, the design essentially remains unchanged, except for the damping resistors. The damping resistance for a SQUID is proportional to $\sqrt{L/C}$, where $L$ is the loop inductance and $C$ is the junction capacitance. The junction capacitance is inversely proportional to $J_c$.

Fig. 8. Implementation of a 2-input multiplexer.

Fig. 9. (a) Input and pulse signals to the comparator, (b) simulation result of output current in the 10 Ω resistive load.

Fig. 10. Simulation result of output voltage across the 10 Ω resistive load of an Exclusive-OR gate corresponding to inputs of "00", "01", "10" and "11". The logic "1" input current is at 40 μA.

Fig. 11. Layout of the sampling SQUID and the read SQUID.

Fig. 12. (a) Waveform of a logic “1” input (bottom) and the corresponding comparator output (top). (b) waveform of a logic “0” input (bottom) and the corresponding output (top).

Therefore, the damping resistance should be increased in proportion to $\sqrt{J_c}$, where $J_c$ is the junction critical current density, for a given $L$. Figure 13 shows the same simulation as Fig. 9, except with $J_c$ at 2400 A/cm²; the clock rate is 8 GHz and the input is at 4 GHz. At this junction current density, the punchthrough and the finite pulse width contribute about equally to the speed limit. Below this current density, the bandwidth of the A/D converter is 1/2 of the clock rate. For higher current density, the bandwidth is determined by the pulse width of the pulser.
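The scaling rule can be illustrated numerically. This is a sketch under the assumptions stated in the text (fixed loop inductance, junction capacitance inversely proportional to the critical current density); the function name is ours, and the two densities are the 600 A/cm² and 2400 A/cm² values used for Figs. 9 and 13.

```python
import math

def damping_resistance_scale(j_new: float, j_old: float) -> float:
    """Factor by which the damping resistance changes when the current density changes.

    With R proportional to sqrt(L / C) and C proportional to 1 / J at fixed L,
    R scales as sqrt(J).
    """
    return math.sqrt(j_new / j_old)

scale = damping_resistance_scale(2400.0, 600.0)
print(f"600 -> 2400 A/cm^2: damping resistance scaled by {scale:.1f}x")
```

A fourfold increase in current density therefore calls for doubling the damping resistors, with the rest of the design left as it is.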

Adapting to High-$T_c$ Process

It is expected that junctions made with the higher-temperature oxide superconductors will be nonhysteretic for some time into the future. This can pose a significant problem for circuits that require latching operation of the junctions. The current-latching operation of the comparator does not require a hysteretic junction; however, the readout circuit requires some modification. To keep the same design configuration as in Fig. 2, extra capacitance can be added to junction $J_2$ and SQUID $S_2$ to make them hysteretic, because the functions of junction $J_1$ and SQUID $S_1$ in Fig. 2 cannot be realized with nonhysteretic junctions.

Adding a large capacitance to make nonhysteretic junctions hysteretic can significantly slow down an A/D converter. A modification that requires junctions with much less hysteresis can be achieved by removing $J_3$ in Fig. 2. Extra shunt capacitance may still be needed in $S_3$ to provide enough current-drive capability. The required current drive is less than 100 μA to a resistive load. Since the junction $J_1$ in Fig. 2 is not present, the amplitude of Clock 2, which now serves as a clocked bias, must be controlled very precisely. There is an alternative to the clocked bias. The junctions in SQUID $S_3$ are not hysteretic provided they are not shunted with large capacitance; they are then self-resetting, and Clock 2 can be changed to a dc bias.

Conclusion

We have shown the design of a large-dynamic-range comparator and the design of a complete 4-bit A/D converter with pipelined encoder using the basic comparator circuit as a building block. Simulation results show multi-gigahertz operation of the A/D converter. The initial low-speed test results have confirmed the functionality of the comparator. The current-latching characteristic of the comparator allows adaptation of the circuit to high-$T_c$ superconductors with some modifications.

References


FULLY PARALLEL SUPERCONDUCTING ANALOG-TO-DIGITAL CONVERTER

Howard Luong, David Hebert, and Theodore Van Duzer

Department of Electrical Engineering and Computer Sciences
Electronics Research Laboratory
University of California, Berkeley
Berkeley, CA 94720

Abstract—This paper presents measurements that follow up on Fang's design of a three-bit wideband analog-to-digital converter reported earlier [1]. The original design has been modified, and some circuit parameters have been changed to optimize the margins. Based on this modified design, we have fabricated and been able to demonstrate the functionality not only of simple logic gates, including inverters, AND, OR, NAND, NOR, and XOR, but also of much more complicated combinations, including a complete two-bit analog-to-digital converter and a complete three-bit binary encoder. After a brief description of the design and modifications, low-speed tests of these circuits will be presented and discussed.

I. INTRODUCTION

In Josephson technology, the periodic threshold characteristics of two-junction SQUIDs allow a unique way to implement an N-bit flash-type analog-to-digital converter (ADC) with only N comparators [2, 3, 4]. However, this type of converter suffers from limited bandwidth due to the dynamics of SQUID loops. To achieve wider bandwidth, the conventional flash-type architecture, in which 2^N-1 comparators are used, has been attempted [1, 5]. Fang has reported his design of such a converter, which employs a wideband and large-dynamic-range current-latch comparator as the building block for both the quantizer and the binary encoder [1]. Following up on his design, we have made some design modifications and have changed some circuit parameters to maximize the circuit margins. A two-bit ADC and a three-bit binary encoder based on the modified design have been fabricated, and their functionalities have been successfully verified. In this paper, we will review the design, describe the modifications, and present the experimental results.

II. CIRCUIT DESCRIPTION AND PERFORMANCE

A. Design Overview

As in the conventional flash-type architecture, this design requires a quantizer to sample and assign each sampled analog input to one of the possible output levels and a binary encoder to convert the output of the quantizer from a thermometer code to a useful binary representation. In order to achieve an N-bit resolution, we use a bank of 2^N-1 identical comparators to realize the quantizer and pipelined logic gates to form the encoder. A unique and advantageous feature of this design is that the same comparator circuit used for the quantizer can readily be reconfigured to implement all the logic gates needed for the binary encoder.

Comparator: Fig. 1 shows the schematic diagram of the comparator building block. A hysteretic one-junction SQUID (composed of Jp, Lp, and R) is used as a comparator to sample the analog input. A two-junction SQUID (J2 and J3) in series with a single junction (Jp) functions not only as a readout device but also as a buffer isolating the output from the input. To minimize the aperture time and to widen the bandwidth of the comparator, another one-junction SQUID (Jp, R, and Rp) acting as a pulser is connected to the input. Finally, to reduce the sensitivity of the circuit to the amplitude of the second clock CLK2, the readout SQUID is biased by a clock junction (J2). As will be discussed later, junction Jp is included in the modified version to increase the circuit margins.

The bias current Ibias, together with the critical current of the junction J1 and the pulser output Ip, sets the threshold level for the comparator. If the net input current is less than this threshold level, no current is transferred to the inductors L1 and L2. When the second-phase clock CLK2 rises, junction J2, which has a smaller critical current than that of the two-junction SQUID, switches to the voltage state first and thus prevents the two-junction SQUID from switching. As a result, the output is low. On the other hand, if the net input current is larger than the threshold level, current is transferred to $L_1$ and $L_2$. As the control current for the two-junction SQUID, it reduces the SQUID critical current below that of the single junction $J_4$. Consequently, when the clock CLK2 rises, the SQUID switches to the voltage state, and the output goes high.

Fig. 1 Circuit diagram for the comparator. Junction Jp is included in the modified version.

Logic Gates: The comparator configuration described above can be used to implement logic gates as well. Replacing $I_{c(J_3)}$ with another input current, and setting the bias so that the threshold level is larger than either input but smaller than their sum, an AND gate is obtained. Likewise, if the bias is adjusted so that the threshold level is smaller than either input, an OR gate is obtained. Inversion functions, including inverters, NAND, and NOR, can be easily obtained by exchanging the positions of the single junction $J_5$ and the two-junction SQUID.
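The threshold rules in this paragraph can be captured in a small behavioral model. The sketch below treats each gate as an ideal comparator acting on the sum of two input currents; the 40 μA logic "1" level and the 0.5x and 1.5x threshold placements are hypothetical values chosen only to satisfy the stated inequalities, and no junction dynamics are modeled.

```python
from itertools import product

I_ONE_UA = 40.0  # hypothetical logic "1" input current, for illustration only

def threshold_gate(a: int, b: int, kind: str) -> int:
    """Model a gate as a comparator acting on the sum of two input currents."""
    net_input = I_ONE_UA * (a + b)
    if kind in ("AND", "NAND"):
        threshold = 1.5 * I_ONE_UA   # above either input, below their sum
    elif kind in ("OR", "NOR"):
        threshold = 0.5 * I_ONE_UA   # below either input
    else:
        raise ValueError(f"unknown gate kind: {kind}")
    fired = int(net_input > threshold)
    # Exchanging the single junction and the readout SQUID inverts the output.
    return 1 - fired if kind in ("NAND", "NOR") else fired

for kind in ("AND", "OR", "NAND", "NOR"):
    table = [threshold_gate(a, b, kind) for a, b in product((0, 1), repeat=2)]
    print(kind, table)
```

Each printed list gives the outputs for inputs 00, 01, 10 and 11 in that order. Only the threshold placement and the output polarity differ between the four gates, which is the sense in which a single circuit configuration covers all of them.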

B. Design Modifications

Comparator: Even though the readout circuit is very insensitive to the clock bias CLK2, the comparator designed by Fang [1] suffers from small margins, especially in the critical currents of the junctions. The main reason is that the threshold level of the comparator is directly dependent on the critical current of the junction $J_1$ and the clock amplitude CLK1. Any variation in either of these can reduce the margins significantly.

To improve the margins, another single junction $J_6$ is added in series with the sampling junction $J_1$, as can be seen in Fig. 1. Effectively, this addition creates a "race" between the two junctions $J_1$ and $J_6$ just like that between the single junction $J_4$ and the readout SQUID. As long as the bias is in an appropriate range, one and only one junction, whichever has its critical current exceeded first, will switch to the voltage state.

The modified version of the comparator was simulated extensively with JSIM [6], and the circuit parameters were changed to maximize the margins. The final circuit parameters, with which a margin of $\pm 37\%$ for the junction critical current has been achieved, are listed in Table 1. The original parameters are also included for comparison.

Encoder: Fang suggested using different combinations of NAND, NOR, and OR gates to implement two-bit encoders, and then using these two-bit encoders together with MUXes to construct a three-bit binary encoder for the converter [1]. However, since the input to the encoder is in thermometer code, the design can be much simplified. Taking advantage of the unique and special pattern of such a thermometer-coded input, we have modified the architecture and have been able to implement a complete three-bit binary encoder using only buffers and three-input XOR gates.

C. Circuit Implementation

Fig. 2 shows the gate-level implementation of a two-bit encoder, which basically consists of a two-stage buffer and a three-input XOR gate. Fig. 3 shows the block diagram of a complete three-bit analog-to-digital converter, including a three-bit quantizer, a buffer-and-inverter stage, and a three-bit binary encoder. The comparators in the quantizer are the modified version shown in Fig. 1. The single-stage buffers and inverters are just AND and NAND gates reconfigured from the same comparator circuit with the two inputs connected together. Finally, the three-bit binary encoder is realized with the two-stage buffers and three-input XOR gates that are used for a two-bit encoder.

As illustrated in Fig. 2, the three-input XOR gates are actually "quasi" in the sense that they function correctly only when the inputs are thermometer-coded. However, it requires only three two-input NAND gates to implement this quasi-XOR gate, instead of the twelve needed to implement a pipelined standard three-input XOR gate. By using these quasi-XOR gates, the circuit complexity is greatly reduced, especially in terms of the junction count. The circuit is further simplified by using only NAND gates to implement all the logic blocks. The whole design could be equivalently designed with only NOR gates.
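To make the "quasi" property concrete, the sketch below builds a three-NAND gate that agrees with a true three-input XOR on thermometer-coded inputs and disagrees elsewhere. The gate arrangement is our own illustrative choice, not the netlist of Fig. 2 (which is not reproduced here), and it assumes that the complement of the middle input is already available, for example from an inverter stage.

```python
from itertools import product

def nand(a: int, b: int) -> int:
    return 1 - (a & b)

def quasi_xor3(t0: int, t1: int, t2: int) -> int:
    """Three-NAND quasi-XOR; t0 has the lowest threshold, t2 the highest.

    Assumes the complement of t1 is already available (e.g. from an
    inverter stage); that assumption is ours, not taken from Fig. 2.
    """
    not_t1 = 1 - t1
    g1 = nand(t0, not_t1)   # low only when t0 is set and t1 is not
    g2 = nand(t2, t2)       # NAND with tied inputs acts as an inverter
    return nand(g1, g2)     # equals (t0 AND NOT t1) OR t2

thermometer_codes = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
for code in thermometer_codes:
    t0, t1, t2 = code
    d1, d0 = t1, quasi_xor3(t0, t1, t2)   # two-bit encoder: buffer plus quasi-XOR
    print(code, "->", d1, d0)

mismatches = [c for c in product((0, 1), repeat=3)
              if quasi_xor3(*c) != (c[0] ^ c[1] ^ c[2])]
print("inputs where quasi-XOR differs from true XOR:", mismatches)
```

The loop also shows the two-bit encoder described above: the high bit is simply the buffered middle comparator output, and the low bit is the quasi-XOR of all three.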

D. Low-Speed Performance

The complete three-bit fully parallel analog-to-digital converter shown in Fig. 3 has been designed and fabricated for low-speed measurements. A chip photograph is shown in Fig. 4, mapping one-to-one every block shown in Fig. 3. The total number of junctions used in the quantizer, the encoder, and the whole ADC is 50, 200, and 320, respectively.

Functionality of the circuit described in Fig. 1 both as a comparator and as basic logic gates has been successfully demonstrated. As expected, all single-stage logic gates, including inverters, AND, OR, NAND, and NOR, work with a margin as large as ±30%. More complicated logic gates that require a cascade of many such simple gates, such as NAND driving an inverter, two-stage buffers, three-input quasi-XOR gates, etc., have also been tested and verified to function correctly, even though the margins become somewhat lower than those of single gates.

We have also performed functionality tests on a two-bit quantizer and a two-bit encoder separately. The experimental results of the quantizer are shown in Fig. 5. The first two traces are the two clocks indicated in Fig. 1. The next three traces are three inputs which were added to create a rising step analog signal. With the threshold levels of the three comparators set 100 µA apart, this choice of inputs covers all the possible combinations. The outputs, shown as the last three traces in the figure, are in the correct thermometer code.

We have also succeeded in verifying the correct operation of the three-bit thermometer-to-binary encoder. Figures 6a and 6b show the outputs of the encoder corresponding to all possible combinations of the inputs, as illustrated in a truth table (Table 2). In each of the figures, the first three traces are the three-phase clock signals, and the last three traces are the three outputs, D2, D1, and D0, respectively. The outputs in Fig. 6a are obtained for the first four patterns in Table 2; the three lowest-level inputs (I2, I1, and I0) are shown as the middle three traces and the other inputs (I6 - I3) are all low. The outputs in Fig. 6b are obtained for the last four patterns in Table 2; the three highest-level inputs (I6, I5, and I4) are shown as the three middle traces and the other inputs \( I_0 - I_3 \) are all high. There is a latency of \( 2/3 \) clock cycles due to the pipeline, and all the outputs are correct.

Table 2: Truth table for a three-bit encoder

<table>
<thead>
<tr><th>\( I_6 \)</th><th>\( I_5 \)</th><th>\( I_4 \)</th><th>\( I_3 \)</th><th>\( I_2 \)</th><th>\( I_1 \)</th><th>\( I_0 \)</th><th>\( D_2 \)</th><th>\( D_1 \)</th><th>\( D_0 \)</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>1</td></tr>
<tr><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>1</td><td>0</td></tr>
<tr><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>1</td><td>0</td><td>1</td><td>1</td></tr>
<tr><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td></tr>
<tr><td>0</td><td>0</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>0</td><td>1</td></tr>
<tr><td>0</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>0</td></tr>
<tr><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td></tr>
</tbody>
</table>
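The thermometer-to-binary mapping tested here can be regenerated programmatically. The sketch below produces the eight valid input patterns and encodes them using only a buffer and XOR operations; the particular XOR groupings are an illustrative decomposition of ours and are not necessarily the wiring of Fig. 3.

```python
def thermometer(count: int, width: int = 7) -> list[int]:
    """Return [I0, ..., I6]: the lowest `count` comparators read 1."""
    return [1 if i < count else 0 for i in range(width)]

def encode(inputs: list[int]) -> tuple[int, int, int]:
    """Thermometer-to-binary using only a buffer and XORs (illustrative grouping)."""
    i0, i1, i2, i3, i4, i5, i6 = inputs
    d2 = i3                                  # buffered middle comparator
    d1 = i1 ^ i3 ^ i5
    d0 = i0 ^ i1 ^ i2 ^ i3 ^ i4 ^ i5 ^ i6    # parity of the number of ones
    return d2, d1, d0

print("I6..I0   D2 D1 D0")
for count in range(8):
    inputs = thermometer(count)
    d2, d1, d0 = encode(inputs)
    print("".join(str(b) for b in reversed(inputs)), " ", d2, d1, d0)
```

Because the input is thermometer-coded, the binary output is just the number of comparators that fired, which is why so little logic is needed.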

We were able to show that subcircuits of the converter, including the three-bit binary encoder, functioned correctly. To date, we have not been able to verify the correct operation of the complete three-bit ADC; possible reasons include flux trapping, circuit defects, and clock distribution. Simulations with JSIM [6] have indicated that the converter with a current density of 1000 A/cm² can function at a clock frequency as high as 5 GHz. The current density should be increased to maximize the circuit bandwidth and margins.

III. SUMMARY

Both comparators in a three-bit quantizer and logic gates in a three-bit binary encoder have been designed using the same circuit configuration. The circuits have been fabricated, and we have been able to demonstrate experimentally the functionality of the comparators and of logic gates at various levels of complexity. We have successfully verified the correct operation of a complete two-bit analog-to-digital converter and of a complete three-bit binary encoder. Simulations have shown that the complete three-bit ADC can work up to 5 GHz.

IV. REFERENCES


Superconducting Delta-Sigma Oversampling A/D Converter

P. H. Xiao and T. Van Duzer
Department of Electrical Engineering and Computer Sciences
and the Electronics Research Laboratory
University of California
Berkeley, CA 94720

Abstract—A superconducting delta-sigma A/D converter is presented in this paper. The converter uses a low-pass filter instead of the integrator found in the usual delta-sigma architecture. The converter is analyzed by a behavior-level simulation package as well as the circuit simulator JSIM. Its performance is compared to the standard first-order delta-sigma converter. The simulation shows that this converter can achieve a signal-to-noise ratio (S/(N+D)) of 70 dB with an oversampling ratio of 128. This corresponds to 11-bit resolution.

I. INTRODUCTION

Delta-sigma A/D converters have been receiving much attention lately due to advances of modern VLSI technology. They inherently possess some characteristics which naturally lend themselves to VLSI high-level integration. First, only a small amount of analog modulator circuitry is required in the design and the circuits have a high tolerance to component mismatching. This means that component trimming is not required to achieve high-resolution A/D conversion, in contrast to the strict component matching requirement for the other high-resolution A/D converters. In addition, the resolution of delta-sigma converters can be scaled directly with the signal conversion rate through the digital signal processing in later stages. The resolution can be increased by increasing the sampling rate. Furthermore, delta-sigma converters' oversampling technique greatly relaxes constraints on the anti-aliasing filter at the front end; in many cases, a passive RC filter will suffice to replace the usual complex and expensive high-order analog filters to filter out high frequency noise.

It is a natural extension to implement the delta-sigma A/D converter in high-speed and low-power superconducting integrated circuit technology. The ultra-high speed sampling capability in superconducting circuits can be exploited to achieve higher resolution. And their low power consumption may be of critical importance in applications, such as infra-red image processing, where the power limitation obviates other technologies.

This paper begins with an introduction to the principle of delta-sigma conversion. Then the implementation of the delta-sigma converter in superconducting technology is analyzed. We replace the integrator in the usual delta-sigma converter by a low-pass filter and compare its performance to the integrator converter. Next, a superconducting circuit based on this modified architecture is presented and simulation results are given. This is followed by a discussion of the implementation of the superconductive digital filter which shows that it is achievable within the current superconducting technology.

II. DELTA-SIGMA CONVERSION

A delta-sigma converter consists of two parts (Fig. 1): an analog modulator and a digital decimation filter system. The modulator of the delta-sigma converter has its digital output latched and fed back to subtract from the analog input signal. Thus its 1-bit stream output digital signal y(nT) (n is the sequence number, T is the sampling period) tracks the change of the input analog signal; when the analog signal increases, d(t) increases and the modulator produces positive pulses, which subtract from the analog signal to make d(t) smaller and make it tend toward producing negative pulses. The density of the output pulses is proportional to the input amplitude; and after more processing in the digital decimation filter, the analog input signal can be reconstructed in a digital form.

Fig. 1 Structure of a delta-sigma converter.

There are two criteria for the delta-sigma modulator to operate correctly. First, the D/A feedback signal has to be larger than the maximum input analog signal. Otherwise, the information about how large the peak analog signal is will be lost. Second, the sampling frequency has to be much larger than the signal bandwidth, instead of just being twice the bandwidth as in other converters. Due to the tracking and averaging nature of the delta-sigma converter, the output will be more accurate if the input signal does not change much during many sampling periods. Thus the delta-sigma converter is often referred to as an oversampling converter.
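The tracking behavior and the first criterion can be seen in a minimal behavioral model. The sketch below is a generic discrete-time first-order loop with an ideal accumulator and a sign quantizer; it is not a model of any specific circuit in this paper, and the input values are arbitrary.

```python
def modulate(samples, feedback=1.0):
    """First-order loop: accumulate (input - fed-back output), then take the sign."""
    accumulator = 0.0
    y = 0.0                      # no feedback before the first decision
    bits = []
    for x in samples:
        accumulator += x - y
        y = feedback if accumulator >= 0.0 else -feedback
        bits.append(y)
    return bits

n = 4096
inside = modulate([0.25] * n)    # input within the feedback range
outside = modulate([1.5] * n)    # input larger than the feedback level

print("mean output, input 0.25:", sum(inside) / n)
print("mean output, input 1.50:", sum(outside) / n)
```

For the in-range input, the average of the 1-bit stream settles close to the input value. For the out-of-range input, the output saturates at the feedback level, so the peak amplitude can no longer be recovered.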

In an A/D converter, noise is introduced upon quantizing the analog signal into a digital signal. In the delta-sigma converter case, assuming the quantizer generates white noise whose rms value is \( \varepsilon = \Delta / \sqrt{12} \), where \( \Delta \) is the quantization step size, the total noise in the base band \( 0 \leq f \leq f_0 \) is given by

\[
N_v = \frac{\pi \Delta}{6} \left( \frac{2f_0}{f_s} \right)^{3/2} \tag{1}
\]

where \( f_s \) is the sampling rate and \( f_0 \) is the signal bandwidth [1]. From Eq. 1 we see that with an increase of sampling frequency \( f_s \), the net quantization noise is reduced and resolution therefore increases. Quantitatively, with every doubling of \( f_s \), the signal-to-noise ratio (S/N) increases by 9 dB, which corresponds to 1.5 bits of resolution.
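The 9 dB and 1.5 bit figures follow directly from the 3/2 exponent in Eq. 1. The sketch below evaluates the formula at two sampling rates; the step size and frequencies are illustrative placeholders, since only their ratio matters.

```python
import math

def inband_noise(delta: float, f0: float, fs: float) -> float:
    """Eq. 1: rms quantization noise left in the band 0..f0."""
    return (math.pi * delta / 6.0) * (2.0 * f0 / fs) ** 1.5

delta, f0, fs = 1.0, 1.0e6, 128.0e6   # illustrative values only
gain_db = 20.0 * math.log10(inband_noise(delta, f0, fs) / inband_noise(delta, f0, 2.0 * fs))
bits_gained = gain_db / (20.0 * math.log10(2.0))   # about 6.02 dB per bit
print(f"{gain_db:.2f} dB per doubling of fs, i.e. {bits_gained:.2f} bits")
```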

III. DELTA-SIGMA CONVERTER IN SUPERCONDUCTING TECHNOLOGY

A unique feature in implementing a delta-sigma converter in superconducting technology is the lack of a high performance analog integrator, which requires a wideband operational amplifier. The bandwidth of the amplifier must be at least as high as the sampling frequency in this application [2]. But the Josephson junction, the active element in superconducting technology, is a two-terminal, low-gain device. Despite many efforts, there is still no suitable wideband amplifier based on the Josephson junction; so some ways must be sought to replace the integrator.

A low-pass filter has a frequency response similar to that of an integrator. In Fig. 2, the frequency responses of a first-order filter and a practical integrator are compared. The transfer function of an ideal integrator has a pole at zero and a 6 dB/octave roll-off. But due to problems in the implementation, e.g., finite gain and slew rate of the amplifier and capacitor leakage, the transfer function always saturates at low frequency. The low-pass filter has a similar characteristic. The difference between the two is gain; the integrator has a much higher gain. But the significance of this difference is lessened by the presence of a quantizer in the following stage. The quantizer only tracks the sign of the signal, not its magnitude. Therefore, an ideal quantizer will not recognize the difference, and a low-pass filter modulator would give the same digital output as an integrator modulator. Nonidealities in the quantizer, such as its hysteresis, will affect the output [2], but due to the averaging nature of the delta-sigma converter, random switching will be averaged out.
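The gain-insensitivity argument can be checked with a small behavioral model. The sketch below places a discrete-time leaky accumulator (a first-order low-pass filter) in front of an ideal sign quantizer and runs the loop at two very different filter gains. The leak factor, gains, and test signal are arbitrary choices of ours, and the model ignores hysteresis and all circuit-level effects.

```python
import math

def lowpass_modulator(samples, leak, gain):
    """Sign quantizer preceded by a leaky accumulator (leak=1.0 is an ideal accumulator)."""
    state = 0.0
    y = 0.0
    bits = []
    for x in samples:
        state = leak * state + gain * (x - y)
        y = 1.0 if state >= 0.0 else -1.0
        bits.append(y)
    return bits

signal = [0.5 * math.sin(2.0 * math.pi * n / 256.0) for n in range(2048)]

high_gain = lowpass_modulator(signal, leak=0.98, gain=1.0)
low_gain = lowpass_modulator(signal, leak=0.98, gain=1.0 / 64.0)
print("bit streams identical:", high_gain == low_gain)
```

Scaling the filter gain scales the filter state by the same factor without changing its sign, so an ideal quantizer produces the same decisions. The gain ratio is a power of two here so that the scaling is exact in floating point.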

The above principle is checked by using a behavioral delta-sigma simulation package, SDSIM [3]. The package analyzes the performance of a delta-sigma converter by modeling the behavior of different modulator components: integrator, low-pass filter, quantizer, and D/A latch. It can also take into account practical parameters, such as dithering of the sampling clock and the hysteresis of the quantizer. SDSIM has the advantage of speed, so the user can quickly find the relation between performance and circuit parameters. The use of device-level simulators, such as JSPICE or JSIM, to simulate a delta-sigma converter is very slow because each simulation involves tens of thousands of samples and thus millions of simulation time steps. Hence, tools like JSIM are only used here for confirming the superconducting circuit design.

The performances of the modulators with a first-order low-pass filter and a single integrator are compared in Fig. 3. In these simulations, the input is a sine wave. The oversampling ratio is 128 and the filter 3 dB frequency \( f_c \) is \( 1/(2\pi \times 6) \).
A superconducting circuit based on the above principles is shown in Fig. 5. The system consists of input transformer, feedback transformer, low-pass filter, QFP [4] comparator, readout SQUIDs, and HUFFLE [5] feedback D/A converter. The input signal is coupled into the low-pass filter by the input transformer. The output of the low-pass filter goes into the quantum flux parametron (QFP) comparator. The comparator gives a positive output when the input is positive, so the readout SQUID $S_{q1}$ will be switched into the voltage state. This causes a current to flow into the control line of the $S_{q1}$ SQUID in the HUFFLE circuit. Thus, the current in the HUFFLE's output inductor $L_h$ is from left to right, and this is coupled back into the low-pass filter to cancel the effect of the positive input current to the QFP comparator. If the input signal to the comparator is negative, SQUID $S_{q2}$ is switched into the voltage state and the feedback signal again cancels the input signal. The QFP's output switches between +1 and -1. The digital signal can be read out from resistors $R_{q1}$ or $R_{q2}$.

The low-pass filter is a very critical component; a large $f_c$ will result in more noise. As expected, the simulation showed that when $f_c$ is larger than $f_s/(2\pi \cdot 30)$, where $f_s$ is the sampling frequency, the $S/(N+D)$ decreases sharply, and when $f_c$ is less than $f_s/(2\pi \cdot 30)$, the $S/(N+D)$ remains high and is insensitive to $f_c$. So, the trade-off is that a large $f_c$ will increase the signal level after the low-pass filter and ease the comparator design, but decrease the dynamic range and $S/(N+D)$. Here we chose $f_c$ to be $1/(2\pi \cdot 64)$ of the sampling frequency.
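The role of the low-pass filter in the loop can be illustrated with a minimal discrete-time behavioral sketch. It is neither SDSIM nor the authors' model: the filter is approximated here as a leaky accumulator with an assumed pole $a = e^{-2\pi f_c/f_s}$, the comparator as an ideal sign function, and the feedback as an ideal level of +/-1. The function name and the normalization are hypothetical.

```python
import math

def leaky_delta_sigma(samples, fc_over_fs=1.0 / (2.0 * math.pi * 64.0)):
    """First-order delta-sigma loop with a leaky integrator (behavioral sketch).

    samples    : input values, normalized so the feedback level is +/-1
    fc_over_fs : filter 3 dB frequency as a fraction of the sampling frequency
    Returns the one-bit output stream as a list of +1/-1 values.
    """
    a = math.exp(-2.0 * math.pi * fc_over_fs)  # assumed pole of the low-pass filter
    state = 0.0
    bits = []
    for x in samples:
        v = 1 if state >= 0.0 else -1   # ideal comparator decision
        bits.append(v)
        state = a * state + (x - v)     # filter the input minus the feedback
    return bits

# A DC input of half the feedback level: the bit-stream average tracks it,
# apart from a small error caused by the finite DC gain of the leaky filter.
stream = leaky_delta_sigma([0.5] * 4096)
average = sum(stream) / len(stream)
```

Setting `fc_over_fs` to 0 gives `a = 1`, which recovers the ideal integrator of the conventional modulator.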

A QFP is used here not only for its capability of distinguishing bipolar signals and its ultra high speed, but also for its extremely high sensitivity, because the signal after the low-pass filter is quite small. A QFP can resolve signals down to a few microamperes. The HUFFLE circuit employed here is also very critical to the circuit performance. The converter's peak output signal is limited to the HUFFLE circuit output level; this and the level of the noise floor determine the maximum dynamic range of the converter, and it cannot be improved by faster sampling. In the simulation presented above, only the quantization noise is taken into account.

![Fig. 5](image5.png)

**Superconducting circuit of a delta-sigma low-pass modulator**

![Fig. 6](image6.png)

**The output digital signal after the filter operation compared to the input sine wave.**
The circuit was simulated with JSIM. In the simulation, the sampling frequency was 1 GHz and the oversampling ratio was 128, so the signal bandwidth was limited to about 4 MHz. The amplitude of the input signal was 20 dB down from the HUFFLE circuit feedback current, and the analog input frequency was 1 MHz. The output signal after the decimation filter is compared to the input sine wave in Fig. 6 to illustrate the correct operation in the time domain. The minimum-sinusoidal-error analysis shows that it reaches an $S/(N+D)$ of 53 dB and an $S/N$ of 55 dB, which is very close to the value obtained from the SDSIM program at the same input signal level (Fig. 3).
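The bandwidth figure quoted above follows from the sampling rate and the oversampling ratio. The short sketch below repeats that arithmetic and also converts the reported $S/(N+D)$ to an effective number of bits using the standard sine-wave relation $S/(N+D) = 6.02N + 1.76$ dB; that conversion is an added illustration, not a calculation taken from the source, and the function names are hypothetical.

```python
def signal_bandwidth(f_sample_hz, oversampling_ratio):
    """Bandwidth when f_sample = oversampling_ratio * (2 * bandwidth)."""
    return f_sample_hz / (2.0 * oversampling_ratio)

def effective_bits(sinad_db):
    """Standard sine-wave relation: SINAD = 6.02 * N + 1.76 dB."""
    return (sinad_db - 1.76) / 6.02

bandwidth = signal_bandwidth(1e9, 128)   # about 3.9 MHz, quoted as 4 MHz
bits_at_53_db = effective_bits(53.0)     # about 8.5 bits at this input level
```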

IV. DIGITAL DECIMATION FILTER IN SUPERCONDUCTOR TECHNOLOGY

The decimation filter is a very important part of the delta-sigma converter. It serves four purposes: suppressing the out-of-band high-frequency quantization noise; preventing the aliasing of the out-of-band signal into the passband; maintaining the passband ripple within requirements; and down-sampling the output signal. Depending on the application and the structure of the analog front end, many different filter implementations are applicable. For most applications, a cascade of a linear-phase sinc FIR [6] filter and an IIR [6] low-pass filter can achieve the specifications. The delta-sigma converter became feasible only after integrated circuit technology was mature enough to support the complex design of the digital filter. In this section, we will show that current superconducting integrated circuit technology can also support a decimation filter implementation.

An FIR filter is used at the first stage mainly because of its hardware simplicity. The output of the modulator is a one-bit signal, so the multipliers in the FIR filter can be simplified to AND gates. For a first-order delta-sigma converter, a second-order sinc filter is sufficient. It down-samples the signal to twice the Nyquist frequency and leaves the following sharper IIR filter to finish the rest of the decimation. Due to the decimation and the simplicity of a second-order sinc filter, the hardware can be multiplexed to further reduce its complexity, and a very simple circuit implementation is available [7]. For that circuit design, with 12-bit-wide filter coefficients and a decimation ratio of 64 (128 taps), we estimate that it will take about 2000 MVTI gates.
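A second-order sinc filter with a one-bit input can be sketched as follows. This is a direct-form illustration only, not the multiplexed circuit of [7]: the taps are obtained by convolving two length-$M$ boxcars, which yields $2M - 1 = 127$ nonzero taps for $M = 64$ (one fewer than the 128 quoted above; how the source counts taps is not stated). The function names are hypothetical.

```python
def sinc2_taps(m):
    """Taps of a second-order sinc filter: two length-m boxcars convolved."""
    box = [1] * m
    taps = [0] * (2 * m - 1)
    for i, b1 in enumerate(box):
        for j, b2 in enumerate(box):
            taps[i + j] += b1 * b2
    return taps

def decimate_one_bit(bits, m):
    """Filter a 0/1 bit stream with sinc2_taps(m) and keep every m-th output.

    Because each input is a single bit, 'multiplying' by a coefficient only
    gates it on or off, which is what an AND gate does in hardware.
    """
    taps = sinc2_taps(m)
    gain = sum(taps)  # equals m * m
    outputs = []
    for start in range(0, len(bits) - len(taps) + 1, m):
        acc = 0
        for k, coeff in enumerate(taps):
            if bits[start + k]:      # AND of the coefficient with the input bit
                acc += coeff
        outputs.append(acc / gain)
    return outputs

taps = sinc2_taps(64)
# An alternating 1, 0 pattern is a half-scale input in this 0/1 encoding.
half_scale = decimate_one_bit([1, 0] * 256, 64)
```

The alternating pattern sits at the highest input frequency, so the filter removes it and every decimated output equals the half-scale average.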

For an application where the phase linearity of the output signal is not important, an IIR low-pass filter is applied as a second stage to remove the remaining out-of-band noise and further down-sample the signal. Because the elliptic filter has the narrowest transition band among all filters of the same order, a fourth-order elliptic filter is used in our simulation. Since the filter coefficients are predetermined, an area-efficient architecture, called bit-serial implementation, is adopted [8]. In this architecture, all additions and multiplications are serially implemented in a bit-by-bit operation. Though it is slower than the parallel implementation, the signal rate is already reduced by the FIR and the bit-serial architecture can be used here to save area and circuit complexity. After considering the coefficient quantization noise effect, 16-bit wide coefficients are enough for a 12-bit converter. We estimate that a fourth-order elliptic filter will consist of 2000 MVTI gates. Thus, the total filter requires 4000 gates and approximately 12,000 junctions, which is within the limit of current Josephson technology.
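The bit-by-bit operation mentioned above can be illustrated with a generic bit-serial adder. This is a textbook sketch, not the architecture of [8]: one full adder is reused for every bit position, with words presented least-significant bit first. The helper names and the 16-bit word length used in the example are assumptions for illustration.

```python
def bit_serial_add(a_bits, b_bits):
    """Add two equal-length words presented least-significant bit first.

    One full adder is reused for every bit position; the only state carried
    from one clock period to the next is the single carry bit.
    """
    carry = 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry

def to_bits(value, width):
    """Unsigned integer to a list of bits, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))

total_bits, carry_out = bit_serial_add(to_bits(1234, 16), to_bits(4321, 16))
total = from_bits(total_bits)
```

The hardware cost is one adder cell regardless of word length, at the price of one clock period per bit, which is the area-for-speed trade described in the text.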

V. CONCLUSION

A modified delta-sigma converter architecture is presented wherein the integrator in the analog modulator is replaced by a first-order low-pass filter. This enables the delta-sigma architecture to be implemented in superconducting circuit technology. A superconducting delta-sigma analog modulator was presented. Simulation shows that the converter can achieve an 11-bit resolution (70 dB in peak $S/(N+D)$) with an oversampling ratio of 128. The signal bandwidth is 4 MHz if the sampling rate is 1 GHz.

VI. ACKNOWLEDGEMENTS

Many thanks go to Prof. P. R. Gray and M. F. Mar for discussions on delta-sigma converters and the introduction to using SDSIM.

VII. REFERENCE


A 5-32 BIT DECODER FOR APPLICATION IN A CROSSBAR SWITCH

David A. Feld, David F. Hebert, Theodore Van Duzer
Department of Electrical Engineering and Computer Sciences
and the Electronics Research Laboratory
University of California, Berkeley CA 94720

Abstract--A new voltage state multiple input NOR gate has been designed and tested for use as the basic gate in a 5-32 bit parallel-input decoder. Two versions of this NOR gate are presented, one with a single output and one with a selectable output. The combination of the two types of NOR gate makes it possible to construct a 5-32 bit decoder with considerably less gate current than would be required if it were constructed in other logic families. Since only a single gate current is required by each NOR gate, and because only 12 NOR gates are needed to build the full decoder, a clock with a peak current level of only 6 mA is sufficient to power all of the decoder's 72 constituent SQUIDs. The decoder also occupies a small area compared with other designs. In this paper we review critical design issues of the NOR gates. We also present low-speed and high-speed results of sub-blocks of the full 5-32 bit decoder.

I. INTRODUCTION

In an application such as a superconductive crossbar switch (a massive switching network), about 100 superconductive decoders must receive ac clock power, each from a separate transmission line. Each of these lines originates in a room-temperature environment and then descends into a 4 K dewar to the decoders. To minimize the heat flow from room temperature into the 4 K dewar, each transmission line must have a small cross section. This constraint, together with the technological limit on how thin the transmission line conductors and dielectrics can be made, makes it difficult to form transmission lines with low characteristic impedances. Thus, room-temperature voltage drivers can drive only a limited amount of ac (1 GHz) current through each transmission line. Consequently, the amount of allowable gate current for each decoder must be kept small. We present a voltage-latching decoder architecture which consumes far less gate current than if it were designed in other voltage-latching logic families. We accomplish this task with two new special-purpose circuits: a NOR gate, and a NOR gate with a selectable current output. Because of the simplicity of our decoder design, only a small chip area is required (<400 μm x 1500 μm) per decoder. This compact design is particularly important in a crossbar application where we must fit 32 5-32 bit decoders on a 1 cm² chip.

II. DESIGN OF THE 3-INPUT NOR GATE

A. Description of the 3-input NOR gate

The 3-input NOR gate, which is used eight times in our decoder, is schematically represented by the stack of four SQUIDs shown on the left-hand side of Fig. 1. Note that the four SQUIDs are stacked together in series. The control line of each SQUID is connected to a different one of the inputs A, B, C, and SET. Figure 2 contains three plots which describe the operation of this gate. Figure 2a shows a SQUID threshold curve corresponding to the SQUID with the A input. Figure 2b shows the I-V characteristic of that SQUID together with the resistive load line $R_1$. Figure 2c shows the threshold curve of the SQUID corresponding to the SET input.

Fig. 1 Basic block of the 5-32 bit decoder. Left stack of four SQUIDs is 3-Input NOR gate. Right stack of 10 SQUIDs is 2-Input NOR gate with 8 channels for current select output.

Fig. 2 Operation of the 3-Input NOR gate of Fig. 1 for the case A=1, B=0, C=0. (a) Threshold curve for SQUID corresponding to A input. (b) I-V characteristic of same SQUID with load resistor R1. (c) Threshold characteristic for SQUID with SET input.

We will explain how the NOR gate works for the input case $A=1, B=0, C=0$. First, a gate current $I_{gate1}$ less than the critical currents of any of the SQUIDs is applied to the stack of four SQUIDs; this is represented by point $\alpha$ in each of the three plots in Fig. 2. Next, the inputs are applied. In this case, input current is applied to input A only. As a consequence, the SQUID corresponding to the input A enters the voltage state, as can be seen in Fig. 2a and 2b at point $\beta$. Note that at this time, most of the gate current $I_{gate1}$ is switched out into the resistor $R_1$. Thus, only a small amount of current flows into the gate of the bottom SQUID of the stack so that later, when the SET control line is applied, that SQUID remains in the zero-voltage state and no current flows into $R_2$. Point $\gamma$ of Fig. 2c shows that the SET signal application can at most force the bottom SQUID to do a vortex-to-vortex transition, but the SQUID cannot do a vortex-to-voltage state transition as long as the gate current to the SQUID is kept small [1]. The absence of current through $R_2$ represents a logical zero at the output of the gate. The gate is reset by turning off the gate current $I_{gate1}$.

Now consider the input case $A=0, B=0, C=0$. Figure 3 shows (a) the SQUID threshold curve and (b) I-V characteristic for the bottom SQUID of the stack. Again when the gate current $I_{gate1}$ is applied, operation is at point $\alpha$ shown in Fig. 3.

Since none of the three inputs is applied, the gate current $I_{gate1}$ is not diverted into $R_1$, and operation continues to reside at point $\alpha$. Next, the SET current is applied, and the bottom SQUID is forced into the voltage state, as shown by point $\beta$. Since the SQUID is loaded by both of the resistors $R_1$ and $R_2$ in parallel, the load line intersects the SQUID I-V characteristic deep into the subgap. Thus, gate current $I_{gate1}$ flows partly through $R_1$ and partly through $R_2$. The current which passes through $R_2$ represents a logical "1" at the output. From these two examples of inputs, it is clear that the gate performs the NOR function: $\overline{A + B + C}$.
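The two input cases walked through above generalize to the full truth table. The following logic-level sketch captures only the switching sequence described in the text (inputs first, SET afterwards); it does not model currents, thresholds, or resonances, and the function name is hypothetical.

```python
from itertools import product

def nor3_gate(a, b, c):
    """Behavioral model of the stacked 3-input NOR gate (logic level only).

    If any input SQUID switches, the gate current is diverted into R1 and
    the later SET signal cannot switch the bottom SQUID: output 0.
    Otherwise SET switches the bottom SQUID and current flows in R2: output 1.
    """
    gate_current_diverted = bool(a or b or c)
    set_applied = True  # in this sketch SET always arrives after the inputs
    bottom_squid_switches = set_applied and not gate_current_diverted
    return int(bottom_squid_switches)

truth_table = {inputs: nor3_gate(*inputs) for inputs in product((0, 1), repeat=3)}
```

Only the all-zero input combination produces a "1", as expected for a NOR gate.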

**B. Choice of values for $R_1$ and $R_2$**

As was previously mentioned, a SQUID I-V characteristic which is representative of the SQUID corresponding to the input A is shown in Fig. 2b. Note that the same I-V characteristic applies to the two SQUIDs in the middle of the stack should the B or C inputs be applied. If any of the inputs A, B, or C is a "1", then $R_1$ must be small enough to guarantee that most (~80%) of the gate current is shunted away from the bottom SQUID so that the latter will not enter the voltage state upon application of the SET control current. The question remains as to how $R_2$ should be chosen. If A, B, and C are all "0", then upon the application of the SET signal, at least half of the gate current should leave through $R_2$, to ensure that it can drive the input of the next gate. Thus, $R_2$ should be chosen to be less than or equal to $R_1$. This choice of $R_2$ makes the parallel resistance $R_1 \parallel R_2 \leq R_1/2$, and it is the reason that the bottom SQUID latches deep into the subgap portion of the I-V characteristic as shown in Fig. 2b. To assure a latching current output for the resistor $R_2$, $R_1 \parallel R_2$ must not be so small that the bottom SQUID resets; a sensible choice is $R_1 = R_2$.
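The resistor conditions above can be checked with an ideal current divider. The sketch assumes that, once the bottom SQUID is in the voltage state, the gate current splits between $R_1$ and $R_2$ only (subgap current through the SQUID is neglected); the resistor values are arbitrary placeholders, not design values from the source.

```python
def parallel(r1, r2):
    """Parallel combination of R1 and R2."""
    return r1 * r2 / (r1 + r2)

def output_fraction(r1, r2):
    """Fraction of the gate current leaving through R2 (ideal divider)."""
    return r1 / (r1 + r2)

r1 = 2.0                                    # ohms, hypothetical value
equal_split = output_fraction(r1, r1)       # R2 = R1 gives exactly one half
smaller_r2 = output_fraction(r1, r1 / 2.0)  # R2 < R1 sends more current to the output
load_equal = parallel(r1, r1)               # R1 in parallel with R1 is R1 / 2
```

Choosing $R_2 \leq R_1$ therefore satisfies both statements in the text at once: at least half of the gate current reaches the output, and the combined load is at most $R_1/2$.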

**C. The need for additional flux gain**

It was mentioned that at least half of the gate current must leave through the output of the NOR gate to "drive" the input of the next gate. In theory, even a tiny amount of input current can switch a SQUID. In practice, however, a large amount of output current is needed to guarantee that the SQUID will switch over a wide range of applied gate currents. One of the advantages of the stacked NOR gate design is that each SQUID has a single input. This makes it feasible to use transformer coupling with a turns ratio of 2:1 for each SQUID. The 2:1 transformer amplifies the external flux input to each SQUID by nearly a factor of two, which is equivalent to doubling the drive current from the previous stage.

**III. FUNCTION OF BASIC DECODER BLOCK**

As was mentioned earlier, the 5-32 bit decoder consists of two kinds of NOR gates. The single-output NOR gate was discussed in Section II. It consists of four SQUIDs and is shown on the left side of Fig. 1. The second kind of NOR gate consists of a stack of ten SQUIDs and is shown on the right side of Fig. 1. This NOR gate differs from the previously described gate only in that its output can be selected through any one of the eight output resistors labelled $R_{out1}$ through $R_{out8}$. Figure 1 shows the interconnection of these two kinds of NOR gates. We call this interconnection the basic decoder block since the functionality of this block is essential to the functionality of the full 5-32 bit decoder. The basic decoder block operates as follows: First, current is applied to $I_{gate1}$ and $I_{gate2}$. Next, any of the five inputs A, B, C, D, and E is applied. As was mentioned earlier, if none of the inputs A, B, or C is applied, then upon application of a SET signal, current is steered through $R_2$ into the input of the SQUID in the input stack corresponding to $R_{out4}$. If neither the D nor the E input is applied, then this SQUID switches into the voltage state and the gate current $I_{gate2}$ is diverted around the SQUID into its corresponding output resistor $R_{out4}$. Current continues to flow through the gates of the other seven SQUIDs so only $R_{out4}$... the selected output. Consequently, the basic decoder block performs the function: $R_{OUT} = \overline{A + B + C + D + E}$.

IV. LOW-SPEED EXPERIMENTAL RESULTS

A. The resonance problem

In testing the single input NOR gate it was noticed that the output (bottom) SQUID could latch into the voltage state at a voltage which was considerably smaller than expected. The problem was found to occur for small values of the gate current. Figure 4 illustrates two modes of operation of the NOR gate: one in which the bottom SQUID latches into the subgap as shown by point B (proper operation) and one in which the SQUID latches into a resonance as shown by point A (failure). Resonances in SQUIDs are caused when an oscillation is set up between the capacitance of the junctions and the loop inductance [2]. These oscillations can only occur when a SQUID loop contains an amount of flux other than an integral multiple of a flux quantum $\Phi_0$. In Fig. 4, we apply an amount of external flux through the bottom SQUID equal to about $\Phi_0/2$, an amount of flux at which the resonance is most pronounced. The resonance peak of the bottom SQUID of the NOR gate is shown as a curved line in Fig. 4 next to point A. The load line must avoid the resonance peak to ensure proper operation of the gate. Load line #2 of Fig. 4 intersects that resonance peak since the initial gate current is not large enough. However, load line #1 has a sufficiently large initial gate current to bypass the resonant peak. The initially applied gate current, for successful operation of the NOR gate, must be larger than the gate current corresponding to point C and must be less than the maximum critical current of the SQUID (point D). This difference should be as large as possible to maximize the gate-current margin. Also, $R_1$ and $R_2$ of the NOR gate should be as large as possible so that the load line $R_1 \parallel R_2$ will be less likely to intersect the resonance peak. In any case, $R_1$ and $R_2$ must meet the conditions discussed in section II.

B. Solution to the resonance problem

One way to ameliorate the resonance problem is to suppress the height of the resonance peak. This is typically done by placing a damping resistor across the SQUID's loop inductance. This damping resistor helps to reduce the height of the peak, but cannot eliminate it. Typically, the minimum height of the resonant peak is approximately 40% of the SQUID's maximum critical current with damping. This peak limits the gate margin of the NOR gate.

We present a circuit solution to avoid getting caught in a resonance: drive the SET line of the basic gate with a pulse of current instead of a step current. The idea stems from the fact that a resonance can exist only as long as external flux is present in the SQUID. By sending a pulse of current into the SQUID, the resonance can only live for a short time, and eventually the SQUID must latch into the subgap. In our basic decoder block, the output current from the resistor $R_2$ latches, thus generating a sustained external flux in the SQUID corresponding to $R_{OUT}$ which can cause that SQUID to latch into a resonance. This is avoided by preventing the current in $R_2$ from latching by choosing $R_2$ to be small. We denote this version of the decoder block with $R_2$ small as the "nonlatching" version.

C. The basic decoder block

A version of the basic decoder block shown in Fig. 1 with fewer output SQUIDs was first fabricated and demonstrated in our laboratory. Subsequently, the full decoder block of Fig. 1 was fabricated by Hypres Inc. Here we present results on the nonlatching version of the basic block, as discussed in the previous section, although the latching version was also fabricated and tested. Figure 5 shows oscilloscope traces of the input and output waveforms for the test of the nonlatching version. Signals were applied to the A, D, and E inputs, which can be seen on lines 3, 4, and 5, respectively, from the top of the photograph. The gate currents $I_{gate1}$ and $I_{gate2}$ were applied simultaneously (sixth line of Fig. 5). The second line shows the SET pulses, which are intentionally delayed to arrive after the inputs. The output at $R_{out}$ can be seen on the first line. Note that this test of the basic block was done for eight different sets of inputs. The basic block was successfully demonstrated in that the output $R_{out}$ was a "1" only when all three of the inputs were "0". Experimentally, it was found that the gate currents $I_{gate1}$ and $I_{gate2}$ could be simultaneously varied by about +/- 35%. Simulations of the circuit suggest that the margins should be about +/- 30%. Non-uniformities of +/- 7% in the maximum critical current of the SQUIDs are responsible for this discrepancy. These nonuniformities determine the maximum current $I_{gate1}$ and $I_{gate2}$ which can be applied to the NOR gates and thus limit the upper end of the gate margin. The minimum allowed current $I_{gate1}$ and $I_{gate2}$ is determined by the size of the pulse which leaves $R_2$. If this pulse is not large enough, it will not be able to drive the next stage. The experimentally observed minimum allowed gate current has been shown to be in good agreement with simulation, provided that the simulation takes into account the transmission line connecting the output of $R_2$ to the input of the next stage.

The gate margin of the latching version of our decoder block was experimentally observed to be +/- 18%. In these observations it was found that below a certain gate current value the output resistor $R_{OUT4}$ would latch into a smaller voltage than was expected. Simulation showed that our heavily shunted output SQUID corresponding to $R_{OUT4}$ was actually latching into a resonance instead of into the subgap, as was explained in Section IV.A. In this simulation, parasitic inductance associated with the damping resistors of the SQUIDs is essential. As we had expected, resonances at the output resistor $R_{OUT4}$ of the nonlatching version were not observed. The gate margins could be increased by increasing the critical current of the single-output NOR gate and clocking $I_{gate1}$ at a higher level than $I_{gate2}$. This would increase the size of the current pulse which leaves $R_2$. However, it would also increase the decoder's clock current.

D. Full 5-32 bit decoder

Figure 6 shows a circuit schematic of the entire 5-32 bit decoder. The basic decoder sub-block (in Fig. 1) is outlined by the two dotted-line boxes. The full decoder is constructed from the interconnection of eight of the single-output NOR gates with four of the selectable-output NOR gates as shown in the figure. The 8 SET inputs corresponding to each of the single-output NOR gates are connected in series. Note that there are 3 inputs for each of these 8 gates, corresponding to a total of 24 inputs. These 24 inputs are subdivided into six sets of 4 serially connected inputs. Each of these six sets of inputs is driven by a decoder address line. There are 10 decoder address lines consisting of the 5 decoder addresses and their inverses. Similarly, the 8 inputs corresponding to the 4 multiple-output NOR gates are connected in 4 sets of 2 serially connected inputs. Each of these 4 sets is driven by one of the 4 remaining decoder address lines. Connections are made such that there exists a unique output (current through one of the 32 output resistors) for each of the 32 possible input addresses. The output is generated upon the application of the SET signal.
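The address-line bookkeeping above can be checked with a logic-level sketch of the full decoder. The wiring used here is an assumption, since Fig. 6 is not reproduced: single-output gate $j$ receives, for each of A, B, and C, either the address line or its inverse so that only gate $j$ sees all zeros for address $j$, and each single-output gate is taken to drive the same channel in all four selectable-output gates. The function names and the output ordering are hypothetical.

```python
from itertools import product

def nor(*inputs):
    """Logical NOR of any number of 0/1 inputs."""
    return int(not any(inputs))

def decode_5_to_32(a, b, c, d, e):
    """Logic-level sketch of the 5-32 decoder; returns a list of 32 outputs."""
    # Eight single-output NOR gates see A, B, C or their inverses.
    # Gate j receives the line that is 0 exactly when the address bit
    # equals the corresponding bit of j, so only one gate fires.
    first_stage = []
    for j in range(8):
        j_bits = ((j >> 2) & 1, (j >> 1) & 1, j & 1)
        lines = [x ^ jb for x, jb in zip((a, b, c), j_bits)]
        first_stage.append(nor(*lines))
    # Four selectable-output NOR gates see D, E or their inverses.
    outputs = []
    for k in range(4):
        k_bits = ((k >> 1) & 1, k & 1)
        enabled = nor(*[x ^ kb for x, kb in zip((d, e), k_bits)])
        # Channel j of gate k carries current only if the gate current was
        # not diverted by D/E and first-stage gate j drives that channel.
        outputs.extend(enabled & first_stage[j] for j in range(8))
    return outputs

selected = {}
for address in product((0, 1), repeat=5):
    out = decode_5_to_32(*address)
    assert sum(out) == 1          # exactly one output resistor carries current
    selected[address] = out.index(1)
```

Under these assumptions each of the six A/B/C lines feeds four gate inputs and each of the four D/E lines feeds two, matching the 24 + 8 inputs counted in the text.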

As was mentioned earlier, it is of critical importance that the decoder consume a small ac clock current. The 72 SQUIDs of the decoder are contained in 12 gates. If each SQUID has a maximum critical current of 500 $\mu$A, then the full decoder consumes about 6 mA of ac clock current. Thus the decoder is very efficient in its use of current biases. A full decoder was fabricated by Hypres Inc. and tested at Berkeley. Due to wiring errors, only 16 of the 32 outputs functioned correctly.
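The SQUID count and clock current quoted above follow from the gate inventory. The sketch below repeats that arithmetic, taking the gate current of each stack equal to the 500 $\mu$A maximum critical current as an upper bound; the variable names are hypothetical.

```python
SQUIDS_PER_SINGLE_NOR = 4       # inputs A, B, C plus SET
SQUIDS_PER_SELECTABLE_NOR = 10  # inputs D, E plus 8 output channels
N_SINGLE, N_SELECTABLE = 8, 4

total_squids = (N_SINGLE * SQUIDS_PER_SINGLE_NOR
                + N_SELECTABLE * SQUIDS_PER_SELECTABLE_NOR)
total_gates = N_SINGLE + N_SELECTABLE

critical_current_a = 500e-6     # maximum critical current per SQUID (from the text)
# SQUIDs in one stack share a single gate current, so the clock supplies
# one gate current per gate; taking it equal to the critical current gives
# an upper bound on the clock current.
clock_current_a = total_gates * critical_current_a
```

Because the series stack reuses one gate current for all of its SQUIDs, the clock current scales with the 12 gates rather than with the 72 SQUIDs.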

Fig. 6 Circuit schematic of a full 5-32 bit decoder. Basic decoder block (in Fig. 1) is represented here by two NOR gates enclosed by dotted-line boxes.

V. HIGH SPEED EXPERIMENTAL RESULTS

A test of the nonlatching version of our basic block, similar to that of Section IV.C, was conducted at high speed. The gate current, inputs, and SET signal were applied as current steps separated by approximately 2 ns. The gate margin for the block decreased to about +/- 15% at high speed. Due to limitations in our test setup, the gates could be reset only every 40 ns, so the resetting speed of the gates could not be checked.

VI. CONCLUSIONS

We have presented a novel decoder to be used in a crossbar switch. Resonances were shown to be a critical issue in our design. We propose a nonlatching version of our decoder to avoid resonances. A 2:1 transformer coupling for each SQUID was found to increase the gate margins. The basic decoder block was shown to work at both low and high speeds (with the exclusion of a test of the resetting speed).

VII. ACKNOWLEDGEMENTS

The authors thank P. Bradley, P. Yuh and S. Kaplan for their invaluable suggestions. Thanks also to Hypres Inc. for circuit fabrication and to Hewlett-Packard for a donated computer in which most of the simulations were done.

VIII. REFERENCES

Abstract--We report simulation studies on several novel concepts in superconductive signal processing circuits, including an advanced shift register design and innovative serial decoder concepts. The shift register is a flux-shuttle with very large operating margins. The decoder implements a new serial approach using shift registers and a single comparator for each output.

1. Introduction

Multi-gigahertz digital signal processing requires very high-speed analog-to-digital converters, storage elements, and logic. Superconductive circuits can provide the high speed necessary to implement these components. In this paper, we describe a compact high-speed flux-shuttle shift register as a storage element, and a serial decoder. In addition, these circuits have the advantage of being compatible with nonhysteretic junctions, making them good prospects for use with high-$T_c$ superconductors.

2. Flux-Shuttle Shift Register

A high-speed, low-power, miniature shift register is desirable for a variety of digital signal processing applications. One application might employ an A/D converter-shift register combination to be used as a multi-gigahertz sample-and-hold circuit. Our design goals are wide operating margins, shifting speeds in excess of 20 GHz, compatibility with current superconductive A/D converter designs, and high-speed testability. These design studies are restricted to magnetic flux storage circuits to maximize speed, to allow compatibility with high-$T_c$ superconductors, and to minimize power consumption. Various designs have been examined [1][2][3][4] with most attention focused on the flux shuttle type [3][4] and one utilizing Rapid Single Flux Quantum (RSFQ) logic [2].

An RSFQ shift register has many advantages. Perhaps the biggest of these is the simple clocking scheme that can be used - a traveling Single Flux Quantum (SFQ) pulse. The simple architecture of a shift register makes this circuit a prime candidate for such a clock distribution, whereas a more complex circuit would have to deal with multiple paths and dimensions. These clocks can be provided at high speed, and the number of pulses can be accurately controlled. This allows very high speed test capability without high speed signal lines from external circuits, since the clock can be generated on-chip. Read-out can easily be accomplished through a simple two-junction SQUID and can be either voltage or current state. The margins for this circuit are also very good.

The disadvantage of the RSFQ shift register is its incompatibility with the A/D converter chosen for our system. Although A/D converter circuits have been proposed which are compatible with this type of shift register [2], we are integrating the shift register with a flash-type A/D converter developed in our laboratory [5]. This A/D converter requires the use of a three-phase sinusoidal overlapping clock. An SFQ pulse generator can be used to convert the sinusoidal clock into the pulses required by the RSFQ circuit, but in the RSFQ shift register, the clock and data move in opposite directions. Since the data must be synchronized with the sinusoidally clocked A/D converter, there is a data flow dependency problem with this configuration.

The flux-shuttle shift register proposed by Beha et al. [4][6] uses a three-phase sinusoidal clock (Fig. 1a, 1b). If a flux quantum is stored in the first cell, a clockwise circulating current exists through the storage inductance ($L_1$) and two adjacent junctions ($J_1$ and $J_2$). When the clock in the next cell ($C_2$) becomes active, current is coupled into $J_2$, exceeding its critical current and injecting flux into the two adjacent storage loops. On the left side, this injected flux quantum cancels the one stored in $L_1$, and on the right side, the flux quantum is stored in $L_2$. This effectively transfers the flux quantum from one cell to the next. The clocking scheme presented by Beha uses a clipped, nonoverlapping sinusoid. Although the circuit was simulated at 50 GHz, this type of signal is very difficult to provide externally. We have investigated the use of a low-temperature Schottky diode on-chip to clip a three-phase overlapping pure sinusoid. However, for extensive clock distribution, this circuit poses a difficult microwave transmission problem due to the high frequencies associated with a clipped sinusoid. An alternative is to use the unclipped clock directly in the shift register. Although this reduces the operating margins, the circuit still performs well.

A potential advantage of the flux shuttle is the very low power used in the serially-connected, inductively-coupled clocking shown in Fig. 1a. This presents a problem, however, since the inductance used for coupling to the storage loop ($L_2$) leaves less inductance available to couple to the read-out SQUID ($L_3$). This limitation stems from the requirement that $\beta_L = 2\pi L I_c/\Phi_0$, where $L = L_3 + L_n$ and $I_c$ is the junction critical current, be on the order of $2\pi$ for correct circuit operation. The inductively-coupled clock lines also produce undesirable oscillations in the current $I_m$ used to read out the state of the cell.

Various low-inductance methods of reading out the stored state of the cell were investigated, and a reverse-coupled dummy shift register was used to cancel the effect of the clock lines on the read-out SQUID. Although the oscillations were eliminated, the low-inductance read-out schemes still had insufficient coupling to achieve large operating margins.

An alternative to the inductively supplied clock is to use direct injection (Fig. 1b). This circuit operates like the previous one, except that the clock bias current is injected directly into the junctions. This form of clocking frees all of the storage loop inductance for coupling to a read-out SQUID and greatly reduces the oscillation of the read-out control lines, thus improving the read-out margins. However, it also presents a more difficult clock distribution problem and additional power dissipation. Figure 2 shows a method of clocking this circuit which guarantees correct output for either latching or nonlatching read-out. It uses only two storage loops per bit and drives the read-out SQUID bias with one of the three phases of the clock. The second storage loop is used to couple to the read-out SQUID. This use of the clock phases maintains the correct logic value in the cell throughout the read operation (2/3 of the clock cycle), increasing the read-out portion of the clock cycle.

We have simulated the flux-shuttle shift register with direct-injection sinusoidal clocking using JSIM [7]. These simulations indicate that the circuit will operate correctly at speeds in excess of 40 GHz, and at 20 GHz with voltage-state read-out "on the fly" using heavy resistive shunting on the read-out SQUID (Fig. 3). Although the low-resistance shunt reduces the output voltage, it prevents the read-out SQUID from staying in the latched voltage state (punchthrough). Operating margins were also checked using the program PSCAN [8], which can be used to produce a two-dimensional plot of circuit performance as a function of two circuit parameters. Simulations indicated ±50% margins on the storage loop inductance, ±60% margins on damping resistance, and ±15% margins on the clock bias. In each case, the margins are defined holding all other parameters at their midpoint values. The critical-current margins allow shifts of as much as ±10% from the design value without adjusting the clock bias. In addition, the circuit tolerates critical currents that differ by 30% in adjacent shift register cells.

---

Fig. 2. Flux-shuttle shift register with a two-phase ($C_1$, $C_3$) directly-injected clock. Read-out is done via a two-junction SQUID biased by the $C_3$ clock phase.

Fig. 3. Voltage-state read-out of the flux-shuttle shift register operating at 20 GHz. The two signals are produced by shifting a single "1" value through two adjacent shift register cells.
3. Serial Decoder

We will describe a decoder scheme which is well suited for flux-based logic. As will be shown, our design can be implemented with shift registers using XOR read-out gates. Thus the serial decoder presented is only slightly more complex than a flux-based shift register. This progression in circuit complexity provides a means to determine whether flux-mode shift registers can function properly when integrated with other circuit elements.

The fundamental decoding elements in our design include a closed-loop shift register, an XOR gate and a latch. Figure 4 shows a block diagram of a portion of our decoder. When the first bit (least significant) of a five-bit input address to the decoder becomes valid, it is simultaneously compared (XORed) with all five cells of a shift register. If the result of a given comparison is a match, then the XOR gate generates no output and fails to set its corresponding latch. In the case of a mismatch, the XOR gate produces a "1", setting its latch. The shift register then shifts its data so the bit contained in the fifth cell is transferred to the first cell, the bit in the first cell is transferred to the second, and so forth. Again, the five XORs are performed in parallel and each XOR cell has a chance to set its corresponding latch. We repeat this procedure five times. At this point, the shift register has returned to its initial state, and an "empty" latch represents a five-bit match. A summary of the circuit output follows:

An input address of 00001 will prevent latch #1 from becoming set.

An input address of 00010 will prevent latch #2 from becoming set.

An input address of 10000 will prevent latch #5 from becoming set.

Thus, we have provided for five of the 32 possible input patterns. To generate logic for 30 of these patterns we need six shift registers, each with five XOR gates and five latches, performing the above operation. We must fill the shift registers as follows:

1st shift register - 10000
2nd shift register - 01111
3rd shift register - 11000
4th shift register - 11100
5th shift register - 11010
6th shift register - 00101
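
The procedure above can be sketched in a few lines of Python. This is a behavioral model only, not a circuit simulation. The names `decode`, `REGISTERS`, and `covered_addresses`, and the string conventions (register contents listed cell 1 first, address written most-significant bit first), are assumptions made for illustration.

```python
# The six register fills listed above, cell 1 first (assumed ordering).
REGISTERS = ["10000", "01111", "11000", "11100", "11010", "00101"]


def decode(register, address):
    """Return the 1-based numbers of the latches left unset after the decode.

    register: cell contents, cell 1 first (e.g. "10000")
    address:  input address, most-significant bit first (e.g. "00001")
    """
    cells = [int(c) for c in register]
    n = len(cells)
    latches = [False] * n
    # The address arrives serially, least-significant bit first.
    for bit in (int(b) for b in reversed(address)):
        for i in range(n):
            if cells[i] ^ bit:       # mismatch: the XOR gate outputs a "1"
                latches[i] = True    # the latch is set and stays set
        # Closed-loop shift: cell 5 -> cell 1, cell 1 -> cell 2, and so forth.
        cells = [cells[-1]] + cells[:-1]
    return [i + 1 for i, is_set in enumerate(latches) if not is_set]


def covered_addresses(registers=REGISTERS, n_bits=5):
    """Addresses for which at least one latch in some register stays unset."""
    covered = set()
    for value in range(2 ** n_bits):
        address = format(value, "0{}b".format(n_bits))
        if any(decode(r, address) for r in registers):
            covered.add(address)
    return covered
```

Under these assumptions the model reproduces the summary above (a register holding 10000 leaves latch #1 unset for address 00001), and the six fills together leave a latch unset for 30 distinct addresses. Only 00000 and 11111 are not covered.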

The last two of the 32 decoder cases can be covered with two additional XOR gates. Each gate has one input connected to the input address line and its other input connected to "1" or "0". Figure 5 shows a block diagram of the complete decoder. It should be noted that the number of shift registers can be reduced to three if we use inverting shift registers. This decoder has the following advantages:

1) Only two lines have to be compared (fan-in = fan-out = 2) independent of the number of bits in the serial decoder. This maintains logic margins even for flux-based logic where on-chip current-driving capabilities are limited.

2) Flux-based logic typically requires a clock for each gate (e.g. quantum flux parametron [9]). Thus, decoder designs requiring multiple levels of logic would require additional clocking to produce an output. Although our design is implemented in a single level of logic, it still uses multiple clock cycles to do the decode function. However, since this is done in parallel with the input of the serial address, additional clocking is not necessary.

Fig. 5. Block diagram of the decoder. Here we show six five-bit shift registers loaded with the appropriate data. Two XOR gates have been provided to cover the cases of address = 11111 and address = 00000.

Figure 6 shows one possible implementation of the shift register and XOR gate using the flux-shuttle shift register proposed earlier in this paper. The XOR gate is simply the shift register read-out SQUID with an additional control line (the decoder address line). The latch is formed by the loop containing inductor \( L_{\text{latch}} \) (see RSFQ storage loop in Fig. 6) [2]. Clocks \( C_1 \), \( C_2 \), and \( C_3 \) are used to operate the shift register. Clock \( C_2 \) is also used to activate the XOR gate. After five clock cycles, the storage latches can be read out via \( I_{\text{read-out}} \), which also resets them for the next decode operation.

Fig. 6. Three-phase version of flux-shuttle shift register. Here we add an additional control line to the read-out SQUID of the shift register, creating an XOR gate. An RSFQ latch is employed for read-out. \( I_{\text{read-out}} \) is the read-out bias line. \( I_{\text{Address}} \) is the serial input address line.

4. Conclusion

We have presented simulation studies of key elements from a high performance multi-gigahertz digital signal processing system. Simulations of a flux-shuttle shift register with three-phase sinusoidal clocking show very large operating margins for shifting and read-out. The design provides for read-out "on the fly" for frequencies up to 20 GHz. A serial decoder implemented with a single comparator per decoded output and shift registers has been presented. The circuit can be implemented in either voltage or current (flux transfer) logic. The compatibility with flux transfer logic in both these circuits provides for their implementation with high-\( T_c \) superconductors.

Acknowledgements

This work was sponsored by AF Contract F19628-90-K-0037 with RADC funded by SDIO IST.

References

Abstract—The prospect of picosecond gate delays, combined with the peculiarities of superconductive digital circuits, pushes system architecture design for superconductive microprocessors onto new ground. Several groups have proposed possible architectures, including systems for the quantum flux parametron and for modified variable threshold circuits.

I. INTRODUCTION

There are many types of superconductive digital logic circuits available for use in the design of high-speed microprocessors. However, this is not sufficient to create a faster computing system. In parallel with the development of the circuit technology, it is necessary for system architecture to be developed as well. In this paper, we present several computer architecture issues which have been explored for semiconductor digital circuits, and then examine how they affect microprocessors implemented in superconductor technologies.

We will first present results on instruction use in general purpose computers and some techniques used in conventional high-performance computer architectures. In Section III, we discuss synchronous and asynchronous computer architectures. Section IV describes three representative superconductor digital circuit technologies and the architecture proposed for each of them. Finally, we examine superconductive digital systems in Section V.

II. BACKGROUND

The optimization of digital systems generally includes cost and performance. For our purposes, we will consider cost as being the number of gates or area available for the chip. To get the maximum performance for a given cost, careful study of the tasks to be performed by the system must be done. These well known concepts form the basis for the design of RISC (Reduced Instruction Set Computer) microprocessors. This section will provide a brief overview of the techniques used to improve performance in computer architecture today. It is not intended to be complete, but rather to establish an informed base from which we can determine appropriate forms for superconductor microprocessor architectures.

A. Instruction Use

Many studies on instruction usage have been done for general purpose computers. Table 1 shows the frequency of occurrence and time spent in execution for five classes of instructions. The statistics are from a multi-user load on a VAX 11/780 [1]. The time spent in execution depends on both the implementation and the architecture of the machine, and would be different for other computers. However, the frequencies of occurrence are representative for these types of tasks. This will be rather uniform for all computers, given the same suite of tasks. There are also specific benchmarking programs that test the performance of a variety of task types including compilers, simulators (e.g., SPICE), and linear system solvers. Therefore, Table 1 is not complete for design purposes, but it does show the general distribution of instruction types, and is quite accurate even for very different tasks.

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Frequency (%)</th>
<th>Time (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Move</td>
<td>31.7</td>
<td>43</td>
</tr>
<tr>
<td>Branch</td>
<td>28.7</td>
<td>18</td>
</tr>
<tr>
<td>Simple ALU</td>
<td>19.8</td>
<td>5</td>
</tr>
<tr>
<td>Floating Point</td>
<td>10.9</td>
<td>11</td>
</tr>
<tr>
<td>Call/Return</td>
<td>8.9</td>
<td>23</td>
</tr>
</tbody>
</table>

The most striking feature of Table 1 is that the move is the most frequent operation and takes the largest share of the execution time. In fact, complex arithmetic represents only about 11% of the instructions to be executed. Nearly 70% of the instructions depend on high-speed memory access, which, for this computer, accounts for 84% of the execution time. In addition, there are typically fewer than nine instructions between branches. Thus, memory access is the limiting element in the design of high performance computers. These findings have prompted innovative solutions in conventional computer architecture which improve performance dramatically.
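
The 70% and 84% figures can be reproduced from Table 1 if the Move, Branch, and Call/Return classes are taken as the ones that depend on memory access. That grouping is an assumption made here to match the quoted totals, and the names `TABLE_1` and `MEMORY_DEPENDENT` are illustrative.

```python
# (frequency %, execution time %) for each instruction class in Table 1
TABLE_1 = {
    "Move": (31.7, 43),
    "Branch": (28.7, 18),
    "Simple ALU": (19.8, 5),
    "Floating Point": (10.9, 11),
    "Call/Return": (8.9, 23),
}

# Assumed grouping: classes whose speed is set by memory access.
MEMORY_DEPENDENT = ("Move", "Branch", "Call/Return")

memory_frequency = sum(TABLE_1[name][0] for name in MEMORY_DEPENDENT)
memory_time = sum(TABLE_1[name][1] for name in MEMORY_DEPENDENT)

print("frequency: %.1f%%  time: %d%%" % (memory_frequency, memory_time))
```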

B. Caching

To improve memory access time, the most common solution is to use a cache memory [1]. A cache memory is a high speed memory used to store a copy of the most frequently accessed memory locations. When a memory access is requested from a location that is cached, the data is obtained from the cache memory instead of main memory. Since cache memory is much faster than main memory, the operation does not take as long as it would have without the cache.
The speed improvement gained by using a cache depends on how often a memory access request is in a cached location (the hit ratio) and the access time of both the cache and main memory. The hit ratio of a cache depends on the task, the size of the cache, and the size of the main memory. The success of this technique hinges upon the fact that most memory accesses are local to a subset of the total memory.
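
The dependence on hit ratio and access times can be made concrete with the usual weighted-average model. This is a textbook simplification, assuming that a hit costs one cache access and a miss costs one main-memory access; the function names and any numbers passed to them are illustrative.

```python
def average_access_time(hit_ratio, t_cache, t_main):
    """Mean access time when a fraction hit_ratio of requests hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_main


def speedup(hit_ratio, t_cache, t_main):
    """Improvement relative to a system that uses main memory alone."""
    return t_main / average_access_time(hit_ratio, t_cache, t_main)
```

In this model the benefit falls off quickly as the hit ratio drops, which is why locality of reference is essential to the technique.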

C. Pipelining

Another popular method of improving computer performance is the use of pipelining [1]. Pipelining allows increased utilization of hardware resources by the partial execution of more than one instruction at the same time. It is like an automobile assembly line, where each worker (hardware resource) performs a task on the car (instruction), completing only part of the total task. Then, while the car is being worked on at the next station, the worker starts on the next car, instead of waiting for the entire car assembly to be completed by all the workers before continuing. This technique can cause an increase in latency, but can provide much higher throughput (latency is the time required to execute a single instruction, while throughput is the number of instructions that can be executed in a given time period). One of the most common uses of pipelining is to fetch the next instruction from memory while executing the current one.
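
The trade between latency and throughput can be illustrated with an idealized model. The sketch assumes that the logic divides into equal stages, that each pipeline register adds a fixed delay `t_reg`, and that there are no branches or stalls; the function name and parameters are hypothetical.

```python
def pipeline_time(n_instr, n_stages, t_logic, t_reg=0.0):
    """Return (latency, total time) for an idealized pipeline.

    n_instr:  number of instructions executed back to back
    n_stages: number of equal pipeline stages
    t_logic:  total logic delay of one instruction
    t_reg:    delay added by each pipeline register
    """
    t_cycle = t_logic / n_stages + t_reg          # clock period
    latency = n_stages * t_cycle                  # one instruction, start to end
    total = (n_stages + n_instr - 1) * t_cycle    # fill the pipe, then one per cycle
    return latency, total
```

With 100 instructions and 10 units of logic delay, a single stage with no register overhead gives a latency of 10 units and a total time of 1000 units. Five stages with 0.5 units of register delay raise the latency to 12.5 units but cut the total time to 260 units.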

The degree of pipelining is determined by the increased cost of additional pipeline registers, clocking frequency, and diminishing returns due to conditional branch instructions. Conditional branches require that the operation upon which the branch depends be completed before the branch target instructions can be fetched. When these instructions are adjacent, a "bubble" is created in the pipeline until the instruction is completed and the new instruction stream can be started in the pipeline. To reduce the effect of conditional branching in pipelined architectures, various branch prediction or delayed branching techniques have been investigated [1].

D. Parallelism

Multiple execution units can also be used to improve performance. This technique uses parallelism to execute multiple instructions concurrently. The use of multiple execution units requires special care to avoid the hazards associated with more than one instruction operating on the same data. The Tomasulo algorithm and scoreboarding are two schemes that accomplish this, allowing out-of-order execution and maximum utilization of resources. Simpler schemes restrict the degree of parallelism and do not allow out-of-order execution.

All of these techniques are general and do not depend on technology. This makes them applicable for both semiconductor and superconductor microprocessors. However, the effectiveness of each scheme is technology dependent.

III. SYNCHRONOUS AND ASYNCHRONOUS ARCHITECTURES

Although most computer designs are synchronous, asynchronous designs have been proposed and small systems have been made using this approach [2]. Here we discuss the differences between these two architectures. How they relate to superconductive circuit technologies is explained in Sections IV and V.

A. Synchronous Architectures

Synchronous circuits are by far the most common in digital systems. Ideally, all circuits in a synchronous design receive a common input simultaneously. This input signal is designated as a clock. In synchronous design rules, no circuit is allowed to change the behavior of the clock. This architecture, therefore, assumes that all inputs and outputs are stable at the appropriate times. There is no handshaking circuitry; all delays must be calculated by the designer to be less than the clock period, with allowances made for clock skew and circuit margins.

High-speed clock distribution is a critical element in synchronous circuits. Phase and frequency information must be transmitted to all circuits in the system simultaneously. At high frequencies, the distance a signal travels in a period of the clock becomes comparable to the dimensions of the circuit (λ = 7.5 mm for f = 20 GHz). To meet the distribution requirements, complex clocking schemes must be devised, such as load-balanced H-trees. For high frequencies, the clock lines must be treated as transmission lines as well, creating a serious impedance matching problem, since the clock has a huge fanout. Distributing phase information is especially difficult in systems with multi-phase clocking.
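
The quoted wavelength can be checked from \( \lambda = v/f \). In free space, 20 GHz corresponds to about 15 mm, so the 7.5 mm figure is consistent with a propagation velocity near half the speed of light. The velocity factor of 0.5 used below is an assumption chosen to match that figure, not a value stated in the text, and the function name is illustrative.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_mm(freq_hz, velocity_factor=1.0):
    """Wavelength in millimetres for a wave travelling at velocity_factor * c."""
    return 1e3 * velocity_factor * C0 / freq_hz


free_space = wavelength_mm(20e9)        # about 15 mm
on_chip = wavelength_mm(20e9, 0.5)      # about 7.5 mm with the assumed factor
```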

B. Asynchronous Architectures

In an asynchronous circuit, a change of the inputs directly causes a change of the outputs. Thus, all combinatorial circuits are by definition asynchronous. These circuits have no clock; timing is provided by the logic circuits themselves through the use of handshake signals. To maintain data integrity, the handshake signals must guarantee that the input data are stable while being used and that the outputs remain stable until they are no longer needed. A complete signal can be incorporated into the handshake signals to ensure correct operation. The complete signal is generated by combinational logic circuits and is asserted after the output data are valid.

Complex bit-level asynchronous systems are not practical, since they would triple bus widths. For microprocessors which use 64-bit data paths and run up to 15 busses, this is not acceptable. Thus, practical complex asynchronous systems synchronize all data bits with respect to each other.

The structure of asynchronous systems can be much like that of synchronous ones: combinatorial logic bounded by registers. The difference is that instead of using a global external clock, the timing signals are derived from the handshake signals in the circuits themselves. The handshake protocol used by the interconnection blocks determines the degree of concurrency that can be accomplished. For example, a full-handshake allows both computation blocks on either side of the interconnection block to operate on different data simultaneously. A half-handshake, on the other hand, allows only every other logic block to operate on data concurrently.

A combination of asynchronous circuits can be used to implement a synchronous system. All conventional computing systems use this approach. However, a synchronous system cannot be used to implement an asynchronous one, unless the clock frequency used is much higher than the speed at which the asynchronous circuit is to be operated (e.g., asynchronous data transmission between different computing systems as done via modem).

IV. THREE REPRESENTATIVE TECHNOLOGIES

A. Quantum Flux Parametron

The Quantum Flux Parametron (QFP) is a current-latching logic that was developed by Goto et al. [3]. This logic family, which includes a so-called D-gate, requires an external clock, which also provides the power to the circuit. The output drive of the circuit is insufficient to drive the clock, so QFP systems must be synchronous.

B. Modified Variable Threshold Logic

Modified Variable Threshold Logic (MVTL) [4] was developed at Fujitsu Laboratories, Ltd. and is a voltage-latching logic. Like the QFP, it requires externally clocked power, and can therefore only implement synchronous systems. However, unlike the previous logic family, the inputs do not have to be stable before the clock input. Instead, they are allowed only "0" to "1" transitions after the clock has become active. This allows a conventional "dynamic-CMOS-like" design. Inversion can only take place at a clock edge, making dual-rail signals necessary for ripple logic. Since this logic family is voltage-latching with high impedance loads, RC time constants dominate and punchthrough can occur with high clock frequencies.

The similarity of MVTL to dynamic-CMOS allows the use of conventional computer architectures with minor variations. Since ripple logic is allowed, very high frequency clocks are not necessary, and in fact cannot be used because of punchthrough. This reduces the microwave power distribution problem, but also reduces the potential throughput of the system by reducing the maximum degree of pipelining.

C. Rapid Single Flux Quantum Logic

Rapid Single Flux Quantum Logic (RSFQ) developed by Likharev et al. [5] is a pulse-based logic. Timing signals are used to create a timing window in which the arrival of a flux quantum, \( \Phi_0 \), is interpreted as a logic "1", and no arrival as a "0". All external biases are dc. Timing signals are also single flux quanta, allowing the circuits to drive their own clocks. This allows RSFQ circuits to implement both synchronous and asynchronous systems.

Signal propagation is accomplished through biased Josephson transmission lines (JTL) and by microstrip superconducting transmission lines. JTL's provide isolation and amplification, but require more area and power. Microstrip transmission lines have high impedance compared to a Josephson junction, and pose a termination problem. Since the signals of interest are picosecond pulses of only about \( 10^{-18} \) J, maximum energy transfer is crucial for large margins, and reflections both reduce the transferred energy and, without an isolation buffer, can cause errors in the driving circuit. Also, transmission lines with sharp corners can act as radiating antennas, potentially losing the picosecond pulse signal entirely, and causing severe cross-talk.

The proposed RSFQ handshake circuit does not depend on the data being transferred, and does not produce a complete signal. Therefore, although the protocol is time-independent, its implementation is not hazard-free: the input data may not be valid when the timing pulse arrives at the next logic block. Therefore, while the timing windows proposed in RSFQ may be sufficient for some forms of asynchronous circuits, they do not guarantee correct operation.

A more robust implementation is necessary for a general interconnection circuit. The key is to provide a complete signal. This may be accomplished by redefining the "0" value. One solution is to use a data encoding scheme, wherein the logic levels "0" and "1" are encoded onto two separate lines. Dual-rail logic is sufficient for this purpose. This encoding can also represent the timing signal request by a simple OR of the two data signals. The OR gate can be implemented with a simple confluence buffer in RSFQ, since it does not require additional timing signals. In the case where there are many data lines in a bus, inverters can be used to create both polarities of the output for generation of the complete signal, while only one polarity of the data is actually transmitted to reduce the number of interconnections.
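
The dual-rail idea for a single data line can be sketched as follows. This is a logical model only, with no timing; the names `IDLE`, `encode`, `complete`, and `decode_rails` are hypothetical.

```python
IDLE = (0, 0)  # neither rail has fired: no data has arrived yet


def encode(bit):
    """Map a logic value onto the pair (true_rail, false_rail)."""
    return (1, 0) if bit else (0, 1)


def complete(rails):
    """Complete signal: a simple OR of the two rails."""
    true_rail, false_rail = rails
    return true_rail | false_rail


def decode_rails(rails):
    """Recover the logic value once the complete signal is asserted."""
    if not complete(rails):
        raise ValueError("data not yet valid")
    return rails[0]
```

Because a "0" now produces a pulse on its own rail, the absence of any pulse means only that the data have not yet arrived, and the OR of the rails can serve as the timing request.
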
V. SUPERCONDUCTIVE DIGITAL SYSTEMS

General purpose computers depend heavily on random access memory (RAM), as shown in the instruction usage studies. A fast cache memory must be available for data, instructions, and some branch prediction algorithms. High speed memory is also necessary for register files, often containing up to 32 integer registers of 32 bits each in addition to 16 floating point registers of 64 bits each.

Superconductive memories have typically been inadequate to the demands of general purpose computing. However, the proposed Josephson junction/CMOS hybrid memories are promising for a high-speed low-power large main RAM. It is likely, however, that faster caches will still be necessary to provide the required memory performance and fully utilize a superconducting central processing unit in a computer system.

Hybrid architectures can also be used. The system may be synchronous at the highest levels, but include asynchronous blocks of logic to improve performance of certain operations.

Digital signal processing (DSP) covers a wide range of digital systems. DSP can require flow control, arithmetic, memory, and programmability. Thus, it has all the components of a general purpose computer. In fact, many DSP algorithms are run on general purpose computers. However, DSP can also be done on data-flow architectures and systolic arrays, providing a full range of architectures from the most programmable (general purpose computers) to fixed data flow.

DSP microprocessors usually operate on data streams. Thus, RAM can be replaced with a more dedicated memory structure, such as a shift register. Memory in DSP can also often be traded off with I/O bandwidth. This makes DSP systems attractive for implementation in superconductive electronics, which has poor memory capability.

Most current DSP architectures support multiply/accumulate operations, although multiply/(max, min) operations are useful as well. Applications where the multiply/accumulate operation is used are convolution and finite impulse response (FIR) and infinite impulse response (IIR) digital filtering. The algorithms can be adjusted to place the delays used in the convolution sum in different parts of the hardware. Thus, the algorithms can be tailored to fit the superconductor technology.
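
As a minimal sketch of the multiply/accumulate operation, the following direct-form FIR filter keeps its delayed samples in a shift-register-like structure rather than in random access memory. The function name `fir` and the use of `collections.deque` as the delay line are choices made for illustration.

```python
from collections import deque


def fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    # Delay line behaves like a shift register: newest sample at the front.
    delay = deque([0.0] * len(coeffs), maxlen=len(coeffs))
    output = []
    for x in samples:
        delay.appendleft(x)          # shift in; the oldest sample drops off
        acc = 0.0
        for c, d in zip(coeffs, delay):
            acc += c * d             # one multiply/accumulate per tap
        output.append(acc)
    return output
```

Feeding a unit impulse through the filter returns the coefficients in order, which is a quick check that the delay line shifts in the intended direction.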

VI. SUMMARY

We have presented a brief overview of computer architecture issues and considered three different superconducting technologies and the computer architectures which are suitable to each.

The QFP logic family requires a very high-speed external clock, and must be synchronous. The system is also pipelined at the gate level. The deep pipeline and high clock speed necessary make full utilization of this logic family difficult for most digital systems. However, the QFP is a very sensitive comparator, and may play an important role in a hybrid system.

MVTL is a mature logic family with large margins, but requires a synchronous system. Since very high-speed synchronous systems are limited by clock distribution, the additional constraint imposed by punchthrough for MVTL may not be the deciding factor in clock speed determination. Ripple logic capability allows minimum latency regardless of the clock frequency. While MVTL will be limited in the depth of its pipeline (as will all clock-skew-limited synchronous systems), for general purpose computing, this will not be a major issue due to the diminishing returns associated with deeper pipelines.

Asynchronous computer architecture has the most potential for high performance. When clock distribution makes high-speed synchronous systems impossible, an asynchronous solution is the logical choice. RSFQ logic is compatible with both synchronous and asynchronous architectures, but more suited to the latter. Asynchronous circuits implemented in RSFQ logic have potentially very high throughput. When designing asynchronous systems in RSFQ, care must be taken with handshaking signals and pulse transmission.

General purpose computing is unlikely in the near future for systems composed entirely of superconductive components due to RAM and cache requirements. Hybrid technologies of Josephson junctions and CMOS may alleviate some of the memory problems.

In digital signal processing, random access memory can be replaced by more dedicated memory structures. In addition, a data flow architecture with high levels of pipelining can be selected to maximize throughput. With these choices, DSP is the most likely prospect for implementation with superconducting digital electronics.

ACKNOWLEDGEMENTS

This paper includes the authors' interpretation of some of the conclusions reached at the Workshop on Architectures for Josephson Signal Processors and Computers held at the University of California at Berkeley on July 29, 1992.

REFERENCES

An Efficient Method for Finding dc Solutions for Josephson Circuits

Emerson S. Fang and Theodore Van Duzer, Fellow, IEEE

Abstract—A dc solution program is very useful for finding operating points and dc transfer characteristic curves of Josephson circuits in the superconducting state. In this paper, we will discuss the formulation of Josephson circuit equations in the dc state and propose a mixed-mode approach that combines the nonlinear solution method of source-stepping and time-domain method of numerical integration. Josephson circuit equations are often multivalued, which implies the existence of multiple solutions. When the paths taken by the independent sources are specified, only one of the many possible solutions can be physical. The mixed-mode algorithm follows the paths of the independent sources, detects ill-conditioned points, and converges to stable points on the characteristic curves of the simulated circuit. The algorithm has been implemented, and case studies are presented. The method and techniques presented are suitable for implementing dc analysis options in a general circuit simulator.

I. INTRODUCTION

DC ANALYSIS is an integral part of circuit simulation. Although any dc analysis problem can be treated as a single or multiple transient problem, the cost can be very high. For this reason, computer-aided design (CAD) programs, like the SPICE program [1], have dc analysis options such as dc operating point and dc transfer curve as essential features. The work presented in this article is a continuation of the effort on the Josephson circuit simulator (JSIM) [2]. The goal of JSIM is efficient and fast circuit simulation for Josephson circuit applications, especially for large circuits. JSIM has achieved an order of magnitude speed improvement over JSPICE2 [3] in simulation of medium-sized circuits. All existing Josephson circuit simulators, including JSIM, can only simulate transient circuit behavior; dc analysis is not an allowable option in these programs. The SPICE program is often the basic platform for Josephson circuit simulation programs [3]-[5], but unfortunately the dc analysis methods in SPICE cannot be adapted to Josephson circuits.

SPICE uses the Newton–Raphson iteration method to solve systems of nonlinear equations. Every iteration method requires an initial guess of the solution to start the iteration process. The guessed solution has to be close enough to the true solution for the iteration to converge. In semiconductor circuits, the main nonlinearity is the exponential relation of voltage to current. A typical example is the diode equation in which the diode current is an exponential function of voltage. The main difficulty encountered in solving equations of exponential functions is numerical overflow. Two methods, limiting and source stepping, have been used to overcome this problem. The simple limiting method used in SPICE restricts the voltage change across a diode-like device from one iteration to the next. The source-stepping method [6] is a more general approach to finding a starting point but has a greater computational cost than the limiting method.

In the source-stepping method, we want to find the unknown vector \( x \) such that \( F(x, y) = 0 \) is satisfied. The variable \( y \) is the source vector, the value of which is known. We can parametrize \( x \) and \( y \) by letting \( y = y(s) \), then \( x = x(s) \). We then discretize \( s \) and solve for \( x \) at each discrete point using an iteration method such as the Newton method. The initial guess of \( x \) can be extrapolated from previous points; the simplest extrapolation would be using the value of \( x \) at the last point. This parametrization permits solution of path-dependent problems, that is, problems in which the solution depends on the path taken by the source vector.

In Josephson superconductive circuits, the nonlinearity is nearly sinusoidal. The circuit equations are often multivalued, and therefore, the solution is path dependent. The source-stepping method is a natural approach but is not fail-safe. As pointed out above, the convergence of an iteration method depends on the difference between the initial guess and the true solution. Using the source-stepping method, the initial guess can be arbitrarily close to the true solution by using an arbitrarily small grid in \( s \), provided that the solution vector \( x \) with respect to the source vector \( y \) is continuous. In equations involving sinusoidal functions, this continuity requirement is often not satisfied, and source stepping may not give the correct solution beyond the discontinuity or ill-conditioned point. In addition, the source vector may enter a region where no dc solution exists (referred to as the voltage state in Josephson circuits); a dc solution method must be able to detect the crossing of such a boundary. In this paper, we discuss the Josephson circuit equation formulation for dc solutions, and present an efficient method for dc analysis, namely dc operating point and dc transfer curve calculations. Also, techniques will be presented to treat ill-conditioned points, and to detect the crossing into the voltage state.

II. THE SQUID THRESHOLD PROBLEM AND GENERAL dc ANALYSIS

Most work on dc analysis of Josephson circuits has concentrated on finding threshold curves of multi-junction superconductive quantum interference devices (SQUID's). The threshold curve is the locus of ill-conditioned points and voltage-state boundary points, which are extremum points. The ill-conditioned points are local maxima, and voltage-state boundary points are global maxima. Circuit equations are based on the Kirchhoff current law (KCL), and constraint equations are derived to determine the extremum points. Tsang and Van Duzer [7], Landman [8], and Peterson and Hamilton [9] use the Lagrangian multiplier method for determining the extremum points, while Schulz-DuBois and Wolf [10] use the Gibbs free energy. To find the threshold curve, Tsang and Van Duzer, as well as Peterson and Hamilton, search in phase space to find solutions. Landman and Schulz-DuBois employ curve-tracing techniques by first finding a starting point on one lobe of the threshold curve and then tracing out the rest of the lobe. The curve-tracing technique is very similar to the source-stepping method; they are both based on the mathematical technique of parametrization and differentiation of a smooth continuous function. The starting points are found by searching in phase space and using prior knowledge of the SQUID under calculation.

The topic of SQUID behavior is important in Josephson circuits. The work mentioned above has provided adequate ways to understand multi-junction SQUID behavior. The SQUID threshold curve problem is not the primary concern of this paper. Determination of dc operating points and dc transfer curves requires us to calculate the values of branch currents and junction phases with a given set of source vectors for the circuit under simulation, which may be very large. This is a different problem from determination of SQUID thresholds. Furthermore, the techniques of searching in phase space for possible solutions cannot be applied to large circuits, because the computational cost typically increases exponentially with the dimension of the space to be searched, which is proportional to the number of junctions.

The dc state of a Josephson circuit is completely determined if all circuit branch currents and junction phases are known; therefore, a dc solution method for Josephson circuits must be able to solve the problems of ill-conditioned points and crossing into the voltage state. The method we present here can be adapted to find SQUID or other SQUID-like device threshold curves. A brief discussion on the adaptation will be given.

III. JOSEPHSON DEVICES IN THE DC STATE

The Josephson junction is modeled by the Josephson element in parallel with a capacitor and a nonlinear resistor as shown in Fig. 1. The Josephson element is described by the Josephson equations.

\[ I = I_c \sin \phi \]  \hspace{1cm} (3.1)
\[ \frac{d\phi}{dt} = \frac{2\pi}{\Phi_0} V \]  \hspace{1cm} (3.2)

where \( \phi \) is the element's phase, and \( \Phi_0 = 2.07 \times 10^{-15} \text{ Wb} \) is called the flux quantum. The quasi-static I-V curve of a typical tunnel junction is shown in Fig. 2. When the critical current \( I_c \) of a junction is exceeded, the voltage across the junction becomes nonzero; the time-average voltage is the intersection point of the load line with the I-V curve. When the junction is in the voltage state, there is no dc solution since the variables are oscillating. However, there can be a dc zero-voltage state solution if the supply current is less than the junction critical current. Furthermore, if the junction is shunted with an inductor to form a one-junction SQUID, the steady-state voltage across the junction will always be zero. In dc analysis, the junction capacitance and nonlinear resistance are not included, since they affect only transient behavior.

IV. EQUATION FORMULATION USING NODAL ANALYSIS METHOD

In Josephson technology, there is a class of zero-voltage state logic circuits that consist solely of junctions and inductors. Knowing the dc operating point and dc transfer curve of such a circuit is essential in the design process. If we use nodal analysis and write down KCL equations at each node, we get, in general, a system of nonlinear equations of the form

\[ F(x) = y \]  \hspace{1cm} (4.1)

where \( x = [\cdots x_i, \cdots]^T \) and \( y = [\cdots y_i, \cdots]^T \). The variable \( x_i \) is the unknown variable at node \( i \), and \( y_i \) is the source variable at node \( i \). In order to use the Newton method to find the solution, (4.1) must be linearized to

\[ J(x^k) x^{k+1} = y - F(x^k) + J(x^k) x^k \]  \hspace{1cm} (4.2)
where $x^k$ is the value of $x$ at the $k$th iteration step, and $J(x^k)$, where $J_{ij} = \partial F_i / \partial x_j$, is the Jacobian of $F$ evaluated at the $k$th step. To find $x^{k+1}$, we only need to solve a system of simultaneous equations of the form

$$Ax^{k+1} = b \quad (4.3)$$

with $A = J(x^k)$ and $b = y - F(x^k) + J(x^k)x^k$.
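The iteration defined by (4.2) and (4.3) can be sketched in a few lines. The following is a minimal illustration rather than the implementation used in this work: the helper names `solve_linear` and `newton` are hypothetical, the linear system is solved by dense Gaussian elimination instead of a sparse solver, and the one-node test function with a sinusoidal nonlinearity is an assumed example in normalized units.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ZeroDivisionError("Jacobian is singular or ill-conditioned")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def newton(F, J, y, x0, tol=1e-12, max_iter=50):
    """Repeat J(x^k) x^{k+1} = y - F(x^k) + J(x^k) x^k until the update is small."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        Fx, A = F(x), J(x)
        b = [y[i] - Fx[i] + sum(A[i][j] * x[j] for j in range(n))
             for i in range(n)]
        x_new = solve_linear(A, b)
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Newton iteration did not converge")

# Assumed one-node example in normalized units: F(x) = sin(2*pi*x) + x.
F = lambda x: [math.sin(2 * math.pi * x[0]) + x[0]]
J = lambda x: [[2 * math.pi * math.cos(2 * math.pi * x[0]) + 1.0]]
x_sol = newton(F, J, y=[0.5], x0=[0.0])
```

Starting from `x0 = [0.0]`, the iteration stays on the branch through the origin because the initial guess is close to that solution; a poorer initial guess may converge elsewhere or not at all.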

For our equation formulation, we assume there are only four types of devices: junctions, inductors (coupled or uncoupled), independent current sources, and flux-controlled current sources. If we let $e_i$ be the node voltage referenced to ground at node $i$ and define $x_i = \int e_i \, dt / \Phi_0$, which makes $x_i$ a normalized flux variable, we can write down the entries to (4.1) and (4.3) by inspection using the nodal analysis templates given in the Appendix as a reference. The inspection method is identical to that discussed in standard CAD textbooks (see, for example, Chua and Lin [11]), with resistors replaced by inductors and voltage replaced by normalized flux. Since there are no mutually coupled resistors, the case of mutually coupled inductors must be treated here.

Shown in Fig. 3(a) is a pair of mutually coupled inductors with primary inductance $L_p$, secondary inductance $L_s$, and mutual inductance $M$. The relation between flux and current is given by

$$\Phi_0 \delta x_p = L_p \delta i_p + M \delta i_s \quad (4.4)$$

$$\Phi_0 \delta x_s = M \delta i_p + L_s \delta i_s \quad (4.5)$$

To write the current variables in terms of the flux variables, (4.4) and (4.5) must be inverted. This is not easy if we want to be able to write down the KCL equations by inspection, especially if the mutual coupling involves more than two inductors. Furthermore, for an ideally coupled pair (i.e., $L_pL_s = M^2$), the relation is singular and not invertible. Instead, the coupled pair is replaced by an equivalent two-port network of uncoupled inductors and flux-controlled current sources, as shown in Fig. 3(b). Derivation of the parameter values is straightforward, and they are given by

$$L_{pa} = \frac{L_p}{1 + \alpha_1}, \quad L_{pb} = \frac{\alpha_1 L_p}{1 + \alpha_1}$$

$$G = \frac{(1 + \alpha_1)(1 + \alpha_2)}{\alpha_1 \alpha_2} M \frac{\Phi_0}{L_p L_s}$$

$$L_{sa} = \frac{L_s}{1 + \alpha_2}, \quad L_{sb} = \frac{\alpha_2 L_s}{1 + \alpha_2}$$

The values of $\alpha_1$ and $\alpha_2$ can be chosen arbitrarily; the simplest choice is $\alpha_1 = \alpha_2 = 1$. Now the templates for the uncoupled inductor and flux-controlled current source can be used to write down entries to the KCL equations. The extension to coupled triplets can be done easily.
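The difficulty with inverting (4.4) and (4.5) directly can be illustrated with a short sketch. The function name `currents_from_flux` and the relative tolerance are assumptions made for illustration; the flux arguments stand for $\Phi_0 \delta x_p$ and $\Phi_0 \delta x_s$.

```python
def currents_from_flux(Lp, Ls, M, dphi_p, dphi_s):
    """Invert dphi_p = Lp*di_p + M*di_s and dphi_s = M*di_p + Ls*di_s.

    dphi_p and dphi_s are the flux changes Phi_0*dx_p and Phi_0*dx_s.
    """
    det = Lp * Ls - M * M
    if abs(det) <= 1e-12 * Lp * Ls:
        # Ideal coupling (Lp*Ls == M**2): the 2x2 relation is singular.
        raise ZeroDivisionError("ideally coupled pair: relation is not invertible")
    di_p = (Ls * dphi_p - M * dphi_s) / det
    di_s = (Lp * dphi_s - M * dphi_p) / det
    return di_p, di_s

# Example with arbitrary consistent units.
di_p, di_s = currents_from_flux(Lp=2.0, Ls=3.0, M=1.0, dphi_p=1.0, dphi_s=0.0)
```

For a loosely coupled pair the inversion is well defined, but the determinant shrinks toward zero as $M^2$ approaches $L_p L_s$, which is one motivation for the equivalent network with flux-controlled sources.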

V. THE ONE-JUNCTION SQUID PROBLEM

The circuit shown in Fig. 4 is a one-junction SQUID. Using the formulation described earlier, the nodal equation can be written as

$$I_c \sin 2 \pi x_1 + \frac{\Phi_0}{L} x_1 = I_s \quad (5.1)$$

Fig. 5 is a plot of the one-junction SQUID characteristic curve with $I_c L / \Phi_0 = 1$. Starting at the origin $O$, we raise $I_s$ past $I_A$ to $I_0$; the correct operating point should be $S_1$, even though points $S_2$ and $S_3$ also satisfy (5.1). The point $S_2$ is not a possible operating point because it is on the unstable CD section of the characteristic curve. To get to $S_3$, $I_s$ must be raised past $C$ and then lowered back to $I_0$. If the source-stepping method is used in conjunction with Newton iteration, we can expect to encounter two difficulties. The first one is nonconvergence; that is, the stepping cannot pass point $A$, and one possible scenario is illustrated in Fig. 6, which shows a sequence of iterations. Point $a$ on the curve is the initial guess; the iterated solution lands on point $d$ after three iterations. The iterations oscillate around the hump. The iterated solution cannot pass much beyond the hump if the iteration is carried on further, while the true solution lies far away. The second problem is convergence to a wrong solution. The slope of the tangent line near $A$ in Fig. 5 is almost horizontal, so the intercept of the tangent with the $I_s = I_0$ line can occur at almost any place along the line regardless of the distance between $I_0$ and $I_s$. If the method does converge to a solution, it could be any one of the three points. This is a worse problem than nonconvergence because, with the source-stepping method, we cannot easily distinguish the correct solution from incorrect ones. The
FANG AND VAN DUZER: AN EFFICIENT METHOD

some control over effective time constant, which we will use to our advantage later.
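The multiplicity of solutions of (5.1) can be checked numerically. The sketch below works in normalized units, dividing (5.1) by $I_c$ with $I_c L / \Phi_0 = 1$; the source value 0.6, the scan range, and the function names are assumptions chosen for illustration. Solutions on a section of the curve with negative slope are labeled unstable.

```python
import math

def f(x):
    """Left side of (5.1) divided by I_c, with I_c*L/Phi_0 = 1."""
    return math.sin(2 * math.pi * x) + x

def slope(x):
    """Derivative of f with respect to the normalized flux x."""
    return 2 * math.pi * math.cos(2 * math.pi * x) + 1.0

def bisect(g, lo, hi, tol=1e-12):
    """Plain bisection; assumes g(lo) and g(hi) have opposite signs."""
    g_lo_positive = g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == g_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dc_solutions(i_s, x_min=-2.0, x_max=3.0, n=5000):
    """Scan a grid for sign changes of f(x) - i_s and refine each one."""
    g = lambda x: f(x) - i_s
    step = (x_max - x_min) / n
    roots = []
    for k in range(n):
        a, b = x_min + k * step, x_min + (k + 1) * step
        if g(a) * g(b) < 0:
            roots.append(bisect(g, a, b))
    return roots

roots = dc_solutions(0.6)
labels = ["stable" if slope(x) > 0 else "unstable" for x in roots]
```

A scan of this kind is practical only for one or two unknowns; it is used here purely to visualize the situation, not as a solution method.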

VI. THE PSEUDO-TIME-DOMAIN METHOD

We observe from the one-junction SQUID example that traversing past an ill-conditioned point requires a pseudo-time-domain analysis. We call it a pseudo-time-domain method because the relation of flux and phase to time is irrelevant. The method can be extended to the general case of (5.1) by introducing a resistor to ground at every node. The general KCL equation becomes

$$F(x(t)) + G \frac{dx}{dt} = y(t) \quad (6.1)$$

where $G$ is a normalized diagonal conductance matrix, $\text{diag} \{G\} = \{\cdots \Phi_0/R_i, \cdots \}$ and $R_i$ is the resistance from node $i$ to ground. Any A-stable numerical integration method, such as backward Euler or trapezoidal method, can be used to solve the equation. Assuming the backward Euler method is used, (6.1) at time step $n + 1$ with a time increment of $h_{n+1}$ becomes

$$F(x[n + 1]) + G \frac{x[n + 1] - x[n]}{h_{n+1}} = y[n + 1]. \quad (6.2)$$

The effective incremental time constants of (6.1) are the eigenvalues of the matrix $GJ^{-1}(x)$, which are time varying. The differential equation is, in general, stiff, and therefore an implicit integration method, such as backward Euler, is required [12]. The dc solution is assumed to have been reached when the magnitude of $\dot{x}$ is less than some predefined small value.
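A minimal sketch of the backward Euler scheme applied to the one-junction SQUID is given below. It assumes normalized units ($I_c = \Phi_0 / L = 1$), a single damping conductance `g`, a fixed time step `h`, and a source value of 1.3 just above the first hump of the characteristic curve; all names and numerical values are illustrative choices.

```python
import math

TWO_PI = 2.0 * math.pi

def F(x):
    return math.sin(TWO_PI * x) + x          # one-junction SQUID, normalized

def dF(x):
    return TWO_PI * math.cos(TWO_PI * x) + 1.0

def backward_euler_dc(y, x0, g=1.0, h=0.05, eps=1e-10, max_steps=100000):
    """Integrate F(x) + g*dx/dt = y with backward Euler until |dx/dt| < eps."""
    x_prev = x0
    for _ in range(max_steps):
        x = x_prev                           # Newton start: previous time point
        for _ in range(50):
            a = dF(x) + g / h                # Jacobian plus damping term
            b = y + g * x_prev / h - F(x) + dF(x) * x
            x_next = b / a
            converged = abs(x_next - x) < 1e-14
            x = x_next
            if converged:
                break
        xdot = (x - x_prev) / h
        x_prev = x
        if abs(xdot) < eps:
            return x
    raise RuntimeError("transient did not settle")

# Start below the hump and apply a source slightly above it.
x_final = backward_euler_dc(y=1.3, x0=0.2)
```

With `g / h` larger than the most negative value of the slope `dF`, the scalar `a` stays positive, so each Newton step is well conditioned even where the dc Jacobian alone would vanish.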

The IBM ASTAP program [13] also uses a time-domain approach, called pseudo-dc scheme, for dc analysis of semiconductor circuits. To find the dc solution, ASTAP inserts an inductor in series with every voltage source and a capacitor in parallel with every current source, and a transient analysis is performed similar to that described above. It is called a pseudo-dc method because the numerical integration error is not controlled and the time-step is taken as large as possible, consistent with the convergence of the Newton iteration. Convergence to a dc solution is guaranteed if the integration method is stable.

One difference between our time-domain method and that of ASTAP is that the matrix $G$, representing the additional circuit elements introduced, is always diagonal. This will never increase the size of nor significantly decrease the sparsity of the matrix $A$, where $A = J + G/h_{n+1}$ and $J$ is the Jacobian of $F$. Rather, $A$ can be made diagonally dominant and hence improve numerical stability when solving $Ax = b$. Another difference is that the integration error has to be controlled to a certain degree in our situation because, while the ASTAP pseudo-dc method guarantees convergence to a solution, it may not be the correct solution.

In the case of the one-junction SQUID example discussed above, there are three possible solutions. Only one of them is the correct one, whereas any one of the three may be reached using the pseudo-dc scheme. In transient and frequency analysis, this is commonly referred to as numerical aliasing. In the transient simulator JSIM [2], this problem was solved by limiting the phase change of each Josephson junction from one step to the next. For a junction between nodes \( i+ \) and \( i- \), the phase is simply given by \( \phi_i = 2 \pi [x_{i+} - x_{i-}] \), so if \( |\phi_i [n + 1] - \phi_i [n]| \) is more than \( \pi \), the transient time step has to be reduced. The computational cost of the pseudo-time-domain method is, in general, higher than that of the source-stepping technique. If this method were used to find \( N \) points on a dc transfer curve, \( N \) transient analyses would have to be performed, and the cost may be very high for large \( N \).

VII. THE MIXED-MODE METHOD

We describe here a cost-minimizing procedure that combines the source-stepping and transient analysis methods. It was observed above that the source-stepping method works well provided that the solution \( x \) is a continuous function of the source \( y \). There is no reason to use a transient analysis except near an ill-conditioned point. The mixed-mode method uses source stepping until a possibly ill-conditioned point is detected; then a transient analysis is performed by adding resistance to ground at each node. Upon completion of the time-domain calculation, the program reverts to source stepping until the next possibly ill-conditioned point is detected. Suppose after \( j \) steps of the source-stepping process, we get \( x = x^j \) for \( y = y^j \). At the \( (j + 1) \)th step, (4.1) is solved by iteration using (4.2); both are restated here for convenience:

\[
F(x^{j+1}) = y^{j+1} \tag{7.1}
\]

\[
J(x^{j+1,k}) x^{j+1,k+1} = y^{j+1} - F(x^{j+1,k}) + J(x^{j+1,k}) x^{j+1,k}. \tag{7.2}
\]

During the iteration process, we may find either of two occurrences that indicate possibly ill-conditioned points. In one case, after solving (7.2) some prescribed number of times, the iteration process still cannot be terminated. This would indicate possible nonconvergence, which usually occurs near ill-conditioned points. Another case is where \( |x^{j+1,k+1} - x^{j+1,k}| \), which is the change from one iteration step to the next, is large. When either situation exists, we switch to solving

\[
F(x^{j+1}(t)) + G\dot{x}^{j+1}(t) = y^{j+1} \tag{7.3}
\]

with initial condition \( x^{j+1}(0) = x^j \); hence \( \dot{x}^{j+1}(0) = G^{-1}[y^{j+1} - y^j] \). Assuming that Backward Euler integration and Newton iteration are used to solve (7.3), we get

\[
A x^{j+1,k+1}[n + 1] = b \tag{7.4}
\]

where

\[
A = J(x^{j+1,k}[n + 1]) + \frac{1}{h_{n+1}} G
\]

\[
b = y^{j+1} + \frac{1}{h_{n+1}} G x^{j+1}[n] - F(x^{j+1,k}[n + 1]) + J(x^{j+1,k}[n + 1]) x^{j+1,k}[n + 1]
\]

and \( x[n] = x(t_n) \). If we pick \( h_{n+1} \) small enough, the matrix \( A \) is guaranteed to be nonsingular, and can even be diagonally dominant. The transient analysis is stopped when \( \dot{x}^{j+1} \) has decayed to less than \( \epsilon \), where \( |y^{j+1} - G \dot{x}| \) is within the error tolerance.
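The switching logic of the mixed-mode method can be sketched for the one-junction SQUID as follows. This is an illustrative sketch in normalized units ($I_c = \Phi_0 / L = 1$), not the program described in this paper: the function names, the iteration limit of 20, the update threshold of 0.1, and the fixed time step are all assumed values.

```python
import math

TWO_PI = 2.0 * math.pi

def F(x):
    return math.sin(TWO_PI * x) + x          # normalized left side of (5.1)

def dF(x):
    return TWO_PI * math.cos(TWO_PI * x) + 1.0

def newton_dc(y, x_start, max_iter=20, max_change=0.1, tol=1e-12):
    """Newton iteration for F(x) = y. Returns None when too many iterations
    are needed or an update is large (a possibly ill-conditioned point)."""
    x = x_start
    for _ in range(max_iter):
        s = dF(x)
        if s == 0.0:
            return None
        x_new = x + (y - F(x)) / s
        if abs(x_new - x) > max_change:
            return None
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return None

def transient_dc(y, x_start, g=1.0, h=0.05, eps=1e-10, max_steps=100000):
    """Backward Euler on F(x) + g*dx/dt = y, started from the last dc point."""
    x_prev = x_start
    for _ in range(max_steps):
        x = x_prev
        for _ in range(50):
            a = dF(x) + g / h
            b = y + g * x_prev / h - F(x) + dF(x) * x
            x_new = b / a
            done = abs(x_new - x) < 1e-14
            x = x_new
            if done:
                break
        xdot = (x - x_prev) / h
        x_prev = x
        if abs(xdot) < eps:
            return x
    raise RuntimeError("no dc solution reached (possible voltage state)")

def mixed_mode_trace(sources, x0=0.0):
    """Return a list of (y, x, mode) along the stepped source values."""
    trace, x = [], x0
    for y in sources:
        x_try = newton_dc(y, x)
        if x_try is None:
            x, mode = transient_dc(y, x), "transient"
        else:
            x, mode = x_try, "source-step"
        trace.append((y, x, mode))
    return trace

trace = mixed_mode_trace([0.05 * k for k in range(31)])
```

In this run the transient mode is entered only once, at the first source value above the hump of the characteristic curve; every other point is obtained by plain source stepping.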

VIII. ACCELERATING THE DECAY OF \( \dot{x} \)

It has been shown that the source-stepping method may fail at points where the solution \( x \) is discontinuous with respect to the source \( y \). The mixed-mode method then uses a time-domain calculation to side-step the ill-conditioned point. It may take many time steps for the transient signals to decay to within error tolerance if the circuit’s time constants cover a wide range of values. The time constants are determined by \( L_i(t)/R_i \) where \( L_i \) is the effective incremental inductance seen by the damping resistor \( R_i \) introduced at node \( i \). Since we have the freedom to choose \( R_i \), we can make \( R_i \) time varying, so that \( |L_i(t)|/R_i(t) \) is the same at each node. The absolute value is needed because the effective incremental inductance can be negative with active devices in the circuit. It is nontrivial to find \( L_i(t) \), but an estimate can be made with minimal additional computation.

We can use a simple model of the decay processes for node \( i \) at time \( t_0 \)

\[
g_i(t_0) \dot{x}_i(t_0) = f(x_i(t_0)) \tag{8.1}
\]

where \( g_i = \Phi_0/R_i \). The model assumes that the damping conductance \( g_i \) sees a dynamic inductance, which is a good approximation for a Josephson circuit. It is an approximation because the damping conductances at other nodes are neglected. Keeping \( g_i \) constant from \( t = t_0 \) to \( t = t_0 + \Delta t \), we have

\[
g_i(t_0) \dot{x}_i(t_0 + \Delta t) = f(x_i(t_0 + \Delta t)). \tag{8.2}
\]

Subtracting (8.1) from (8.2),

\[
g_i(t_0) [\dot{x}_i(t_0 + \Delta t) - \dot{x}_i(t_0)] = \frac{df}{dx}(x_i(t_0))[x_i(t_0 + \Delta t) - x_i(t_0)]. \tag{8.3}
\]

Replacing \( t_0 \) with \( t_n \) and \( t_0 + \Delta t \) with \( t_{n+1} \), then the incremental inductance seen at node \( i \) during this period can be estimated by

\[
\frac{\Phi_0}{L_i[n + 1]} = \frac{df}{dx}(x_i(t_n)) = g_i[n] \frac{\dot{x}_i[n + 1] - \dot{x}_i[n]}{x_i[n + 1] - x_i[n]} \tag{8.4}
\]

The estimated time constant on the interval \( t_n \) to \( t_{n+1} \) is \( \tau_i[n + 1] \), where

\[
\tau_i[n + 1] = g_i[n] L_i[n + 1] / \Phi_0. \tag{8.5}
\]

To get an effective desired time constant of \( \tau_d \), we simply set \( g_i[n + 1] = \Phi_0 \tau_d / |L_i[n + 1]| \), which follows from (8.5), and the new \( g_i \) is used for the calculation of \( x_i[n + 2] \). Equation (7.4) is then modified by replacing \( G \) with \( G[n] \). The additional arithmetic operations are insignificant compared to the number of operations required to solve (7.4).
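The per-node update can be sketched as a small function. The name `update_damping`, its argument order, and the handling of a stationary node are assumptions; the inductance estimate uses the difference of the time derivatives at two successive time points, as in (8.3), and the new conductance is chosen so that (8.5) evaluates to the desired time constant.

```python
def update_damping(g_n, x_n, x_n1, xdot_n, xdot_n1, tau_d, phi0=1.0):
    """Return (L_est, tau_est, g_next) for one node from two time points.

    g_n      damping conductance Phi_0/R_i used on the interval
    x_*      normalized flux at t_n and t_{n+1}
    xdot_*   time derivative of x at t_n and t_{n+1}
    tau_d    desired effective time constant
    """
    dx = x_n1 - x_n
    if dx == 0.0:
        return None, None, g_n               # node did not move: keep damping
    inv_L = g_n * (xdot_n1 - xdot_n) / dx / phi0     # estimate of 1/L_i[n+1]
    if inv_L == 0.0:
        return None, None, g_n
    L_est = 1.0 / inv_L                      # sign follows the convention of f
    tau_est = g_n * abs(L_est) / phi0        # (8.5), using |L| as in the text
    g_next = phi0 * tau_d / abs(L_est)       # makes |L|/R equal to tau_d
    return L_est, tau_est, g_next

# Assumed linear test model: g*xdot = -k*(x - 1) with k = 4 and g = 2.
L_est, tau_est, g_next = update_damping(
    g_n=2.0, x_n=0.0, x_n1=0.5, xdot_n=2.0, xdot_n1=1.0, tau_d=0.1)
```

For the linear test model the true time constant is $g/k = 0.5$, which the estimate reproduces, and the updated conductance gives $g/k = 0.1$ as requested.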

IX. DETECTION OF DEVIATION FROM ZERO-VOLTAGE STATE

A criterion is needed to indicate when a computation is leaving the regime of source values for which dc solutions exist. A simple case is when a single junction is driven by a current exceeding the critical current $I_c$; there cannot be a dc solution. Therefore, the Newton iteration in source stepping, after the source has been stepped above $I_c$, will not converge, and the transient mode is entered. The criterion used in this procedure is to monitor the variable $\dot{x}$. If it does not decay to less than a preselected value $\varepsilon$ within some multiple $N$ of the largest of the average time constants of the circuit, it is assumed that the source vector $y$ has left the regime of dc solutions. Thus in the transient mode, we can keep track of the average time constant $\bar{\tau}_i = [\sum_j \tau_i (\eta_j)]/T$ and let $\tau_{\text{max}} = \max_{i} \bar{\tau}_i$. When the integration time exceeds $N\tau_{\text{max}}$, if $\dot{x}$ is still much larger than $\varepsilon$, we conclude that the circuit is in the voltage state. The constant $N$ is picked on the basis of the desired resolution.
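The single-junction case can be used to illustrate the criterion. The sketch below is a simplification made for illustration: instead of tracking the average of the estimated time constants, it uses one fixed reference time constant (the small-signal value at zero flux), and the names and numerical values are assumptions.

```python
import math

TWO_PI = 2.0 * math.pi

def settles_to_dc(y, i_c=1.0, g=1.0, h=0.02, eps=1e-8, n_tau=50.0):
    """Damped single junction: i_c*sin(2*pi*x) + g*dx/dt = y.

    Returns True if |dx/dt| decays below eps within n_tau reference time
    constants, and False otherwise (taken as the voltage state).
    """
    tau_ref = g / (TWO_PI * i_c)             # assumed reference time constant
    t, x_prev = 0.0, 0.0
    while t < n_tau * tau_ref:
        x = x_prev
        for _ in range(50):                  # Newton on the backward Euler step
            jac = TWO_PI * i_c * math.cos(TWO_PI * x)
            a = jac + g / h
            b = y + g * x_prev / h - i_c * math.sin(TWO_PI * x) + jac * x
            x_new = b / a
            done = abs(x_new - x) < 1e-14
            x = x_new
            if done:
                break
        xdot = (x - x_prev) / h
        x_prev, t = x, t + h
        if abs(xdot) < eps:
            return True
    return False

below = settles_to_dc(0.5)    # source below the critical current
above = settles_to_dc(1.2)    # source above the critical current
```

A source value only slightly below the critical current decays slowly and can be misclassified unless `n_tau` is increased, which is the resolution trade-off governed by the constant $N$.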

X. ADAPTING TO THRESHOLD CURVE CALCULATION

As described at the beginning, the threshold curve is the locus of ill-conditioned points and voltage-state boundary points, representing local and global maximum points, respectively. Ill-conditioned and voltage-state boundary points are detected when the source-stepping cannot proceed further. An adaptive technique of picking source step-sizes should be used; the increment and decrement steps taken by the sources should be reduced when a possible ill-conditioned point is approached. The accuracy of the calculation will be determined by the smallest source step, which should be set by the user. The sources should be stepped in a systematic way to stay near the threshold for efficient tracing of the threshold curve. The mixed-mode method has an advantage over the Schulz-DuBois continuation method [10] in that it can cross ill-conditioned points. This allows the operating point to move from one threshold lobe to a neighboring lobe, and makes a systematic search for all lobes much easier. A difficulty in applying the threshold calculation technique to a large circuit is the determination of when and if the entire threshold curve is found, but this is true for all threshold-curve calculation methods.

XI. CASE STUDIES

The mixed-mode method with adaptive resistive damping has been implemented. We present two simple cases here, so that the reader can easily implement the methods discussed and verify the result. The first case is the one-junction SQUID circuit discussed in previous sections; the computer calculation of its characteristic curve is shown in Fig. 7. The curve is traced out with the source first raised from 0 to 1.5 mA, then lowered past 0 to -0.4 mA, and finally brought back to zero. The computer-calculated curve matches the characteristic curve shown in Fig. 5 without the unstable part A-B. For the one-junction SQUID, the unstable part, which has a negative slope, can be determined by inspection of the curve, but this is not true for many other cases, one of which is presented next. The quantum flux parametron (QFP) circuit proposed by Harada et al. [14] is shown in Fig. 8. Half of a full period of the characteristic curve of this QFP, calculated using our method, is shown in Fig. 9. The other half of the period is just a symmetric reflection about the unstable part of the characteristic curve. Our mixed-mode method finds only the stable solutions, which are the physically possible dc operating points.

XII. SUMMARY

We have discussed a formulation for obtaining dc solutions of Josephson circuits. An efficient method using the combination of source stepping and transient calculation with resistive damping is presented. Adaptive resistive damping is used to equalize the time constant at each node and reduce the computation cost. The calculation of time constants also provides a way to differentiate between the zero and nonzero voltage states. An adaptation of the method to SQUID threshold curve calculation is also discussed. The techniques are presented in sufficient detail that a reader may implement them as part of a general simulation program such as JSIM or SPICE.

APPENDIX

DEVICE TEMPLATES

In our formulation, current is always assumed to flow from the positive node to the negative node, and the direction of leaving a node is considered the positive current direction for that node.

Template for Uncoupled Inductor: For an uncoupled inductor with inductance $L$ between node $i+$ and node $i-$, the contribution of current at the two nodes is given by

left side:
$$\begin{bmatrix} \Phi_0/L & -\Phi_0/L \\ -\Phi_0/L & \Phi_0/L \end{bmatrix} \begin{bmatrix} x_{i+} \\ x_{i-} \end{bmatrix}.$$

Fig. 8. A QFP design by Harada et al. [14].

Template for Independent Current Source: For an independent current source with value \( I_s \) between node \( i + \) and \( i - \):

left side:
\[
\begin{bmatrix}
0 & 0 \\
0 & 0
\end{bmatrix}
\begin{bmatrix}
x_{i+} \\
x_{i-}
\end{bmatrix}
\]
right side:
\[
\begin{bmatrix}
-I_s \\
I_s
\end{bmatrix}
\]

Template for Flux-Controlled Current Source: For a flux-controlled current source with control nodes at \( i^c + \) and \( i^c - \), controlled source between node \( i + \) and \( i - \), and a flux transconductance \( G \):

left side:
\[
\begin{bmatrix}
0 & 0 & G & -G \\
0 & 0 & -G & G
\end{bmatrix}
\begin{bmatrix}
x_{i+} \\
x_{i-} \\
x_{i^c+} \\
x_{i^c-}
\end{bmatrix}
\]

Template for Josephson Junction: For a Josephson junction between nodes \( i + \) and \( i - \) with critical current \( I_c \):

left side:
\[
\begin{bmatrix}
I_c \sin 2\pi \delta x^k \\
-I_c \sin 2\pi \delta x^k
\end{bmatrix}
\]

where the definition \( \delta x^k = x^k_+ - x^k_- \) has been used. The template for the linearized equation in the form of \( A x^{k+1} = b \) is:

left side:
\[
\begin{bmatrix}
2\pi I_c \cos 2\pi \delta x^k & -2\pi I_c \cos 2\pi \delta x^k \\
-2\pi I_c \cos 2\pi \delta x^k & 2\pi I_c \cos 2\pi \delta x^k
\end{bmatrix}
\begin{bmatrix}
x^{k+1}_+ \\
x^{k+1}_-
\end{bmatrix}
\]

right side:
\[
\begin{bmatrix}
-I_c \sin 2\pi \delta x^k + 2\pi \delta x^k I_c \cos 2\pi \delta x^k \\
I_c \sin 2\pi \delta x^k - 2\pi \delta x^k I_c \cos 2\pi \delta x^k
\end{bmatrix}
\]

Example: Using the templates, we can write down the KCL equations for the circuit shown in Fig. 10 by inspection.

At node 1:
\[
\frac{\Phi_0}{L} (x_1 - x_2) + I_{c1} \sin 2\pi x_1 = -I_s
\]

At node 2:
\[
\frac{\Phi_0}{L} (x_2 - x_1) + I_{c1} \sin 2\pi x_2 = 0
\]

The entries to matrices \( A \) and \( b \) of (4.3) are
\[
A = \begin{bmatrix}
\frac{\Phi_0}{L} + 2\pi I_{c1} \cos 2\pi x_1^k & -\frac{\Phi_0}{L} \\
-\frac{\Phi_0}{L} & \frac{\Phi_0}{L} + 2\pi I_{c1} \cos 2\pi x_2^k
\end{bmatrix}
\]

and
\[
b = \begin{bmatrix}
-I_s - I_{c1} \sin 2\pi x_1^k + 2\pi I_{c1} x_1^k \cos 2\pi x_1^k \\
-I_{c1} \sin 2\pi x_2^k + 2\pi I_{c1} x_2^k \cos 2\pi x_2^k
\end{bmatrix}
\]
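Writing entries by inspection amounts to adding each device's template into $A$ and $b$. The sketch below illustrates this for the two-node example; the function names are hypothetical, ground is represented by `None`, and the circuit topology (an inductor between the two nodes, one junction from each node to ground, and a source leaving node 1) is inferred from the KCL equations above rather than taken from Fig. 10.

```python
import math

TWO_PI = 2.0 * math.pi
PAIRS = ((0, 0, 1.0), (0, 1, -1.0), (1, 0, -1.0), (1, 1, 1.0))

def stamp_two_terminal(A, nodes, value):
    """Add [[v, -v], [-v, v]] at rows and columns (i+, i-), skipping ground."""
    for r, c, sign in PAIRS:
        if nodes[r] is not None and nodes[c] is not None:
            A[nodes[r]][nodes[c]] += sign * value

def stamp_inductor(A, plus, minus, phi0_over_L):
    stamp_two_terminal(A, (plus, minus), phi0_over_L)

def stamp_current_source(b, plus, minus, i_s):
    """Source current flows from node plus to node minus."""
    if plus is not None:
        b[plus] -= i_s
    if minus is not None:
        b[minus] += i_s

def stamp_junction(A, b, plus, minus, i_c, x):
    """Linearized junction template around the current iterate x."""
    x_p = x[plus] if plus is not None else 0.0
    x_m = x[minus] if minus is not None else 0.0
    dx = x_p - x_m
    slope = TWO_PI * i_c * math.cos(TWO_PI * dx)
    rhs = -i_c * math.sin(TWO_PI * dx) + slope * dx
    stamp_two_terminal(A, (plus, minus), slope)
    if plus is not None:
        b[plus] += rhs
    if minus is not None:
        b[minus] -= rhs

def assemble_example(x, phi0_over_L, i_c1, i_c2, i_s):
    """Build A and b of (4.3) for the assumed two-node circuit."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    stamp_inductor(A, 0, 1, phi0_over_L)
    stamp_junction(A, b, 0, None, i_c1, x)
    stamp_junction(A, b, 1, None, i_c2, x)
    stamp_current_source(b, 0, None, i_s)
    return A, b

A, b = assemble_example(x=[0.1, 0.05], phi0_over_L=2.0,
                        i_c1=1.0, i_c2=1.0, i_s=0.3)
```

Because each device touches only the rows and columns of its own nodes, the assembled matrix stays sparse for large circuits.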

REFERENCES

[7] W. T. Tsang and T. Van Duzer, "DC analysis of parallel arrays of


INDEX: An Inductance Extractor for Superconducting Circuits

P. H. Xiao, E. Charbon, A. Sangiovanni-Vincentelli, T. Van Duzer and S. R. Whiteley

Department of Electrical Engineering and Computer Sciences
and the Electronics Research Laboratory
University of California
Berkeley Ca 94720

Abstract--With the continuous increase of complexity of superconducting integrated circuits, the demand for computer-aided-design tools is rising. Circuit extraction from layout to simulation is an important phase of an IC design. It verifies the circuit design by identifying circuit devices, checking connectivity, and calculating design parameters. This paper presents an extractor INDEX, designed to extract superconducting circuits from layout. The inductances of the superconducting lines are calculated by a set of analytical models. These self- and mutual-inductance models are generated from a series of numerical simulations and a linear programming curve-fitting. INDEX is based on the MAGIC layout system.

I. INTRODUCTION

A circuit extractor has two phases. The first is called netlist extraction, in which devices like transistors and Josephson junctions are identified and connected regions (called nets) are determined. The second phase is called parameter extraction, which involves calculating electrical parameters of devices and nets, such as the width-to-length ratio of a transistor gate, Josephson junction critical currents and capacitance, inductance, and resistance of the net. What is unique to a superconducting IC is that the interconnection has no resistance and is modeled as an inductor in the circuit. The inductance of the interconnection affects not only the circuit's normal operation but also its operating margins. Thus, it is important that the superconducting extractor accurately extract the interconnection inductance.

There are three ways to calculate inductance, resistance and capacitance in an extractor: numerical calculation by solving Laplace's equation [1], lumped-model approximation [2] and analytical modeling [3]. The numerical solution is the most accurate one, but it is impractical for a large circuit because of its high computational cost. The lumped approximation method is the fastest one but it is the least accurate because it neglects the geometrical detail and fringing effects. Analytical modeling provides a good trade-off between speed and accuracy and this is the method we adopted in INDEX. In this method, nets are first decomposed into simple rectangles and according to their positions, different inductance models are applied.

In this paper, we will first present our method of generating analytical models for various inductor configurations based on numerical simulation. Then we discuss issues involved in the inductance extraction: current direction identification, self- and mutual-inductance calculation, network simplification, and extraction results and speed. Finally, we point out possible future improvements.

II. INDUCTANCE MODELING

A. Analytical Modeling

There are two practical approaches to build models of inductance in layout. One is to construct analytical models for inductors in different geometric configurations from experimental measurements or numerical calculations. The other is to form a look-up table. In the look-up table approach, the memory requirements grow very rapidly with the increase of the number of parameters describing a given inductor configuration and of the range of interest for each parameter, even though sophisticated interpolation methods can be used to reduce the memory storage space. A more attractive way is to generate analytical models. The electromagnetic world is continuous so inductance values should vary smoothly with the change of inductor geometries; hence, a compact analytical formula can be fitted to a wide range of numerical or experimental data. Furthermore, analytical models are easy to interpret. Circuit designers can gain insight into the change of model value with the change of circuit parameters.

B. Numerical Simulation

We base our inductance models on numerical simulation because it is convenient to generate a large range of data for various geometrical configurations and it is considerably faster than doing experiments. There are a few different numerical algorithms available to calculate the inductance of superconducting lines. A fairly efficient approach is to use the Lagrangian variational method [4]. Even though it is only based on a two-dimensional numerical algorithm, its results are found to be rather close to the experimental data [5] and it is reasonably fast because it exports the inductance matrix without calculating the current distribution. Older FORTRAN programs based on this algorithm exist, but in order to make the simulation compatible with the inductor model generation system, we implemented it in the C language and improved its memory allocation.

C. Model Fitting and Model Generation

From the above simulation results, we generate inductance values for a set of design parameters. We considered the following configurations: (1) single line over the ground plane, (2) two coupled lines on the same level, (3) two coupled lines on different levels. In case (3), depending on their relative positions, they are further divided into subconfigurations. (See Fig. 1).

In each configuration, the form of the analytical model is guided by physical considerations. For example, for a single line over a ground plane, the inductance decreases as the line width increases. Let us assume that the calculated values are \( s(D_k) \) for a set of parameters \( D_k \) \((k = 1, \ldots, M)\), where \( M \) is the number of fitting data points. The fitting model is

\[
m(D) = \sum_{i=1}^{N} a_i f_i(D)
\]

where the \( a_i \) are the fitting coefficients, \( f_i \) is a chosen function of the parameters \( D \), and \( N \) is the number of coefficients. The problem of optimizing the model formula can then be stated as minimizing the maximum relative error \( e_{\text{max}} \), which is defined by

\[
|m(D_k) - s(D_k)| / s(D_k) \leq e_{\text{max}}, \quad k = 1, \ldots, M
\]

Since \( m(D) \) is a linear function of the coefficients \( \{a_i\} \), this can be treated as a linear programming problem in the variables \( \{a_i\} \) and \( e_{\text{max}} \), with \( e_{\text{max}} \) as the quantity to be minimized. A standard linear programming package based on the simplex method is used [6].
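One way to write the linear program explicitly is sketched below. It assumes that every simulated value \( s(D_k) \) is positive, so that the absolute value can be split into two linear inequalities; the exact form passed to the package of [6] is not specified here.

\[
\begin{aligned}
\text{minimize} \quad & e_{\text{max}} \\
\text{subject to} \quad & \sum_{i=1}^{N} a_i f_i(D_k) - s(D_k)\, e_{\text{max}} \leq s(D_k), \\
& -\sum_{i=1}^{N} a_i f_i(D_k) - s(D_k)\, e_{\text{max}} \leq -s(D_k), \qquad k = 1, \ldots, M
\end{aligned}
\]

The unknowns are the \( N \) coefficients \( a_i \) together with \( e_{\text{max}} \), so the program has \( N + 1 \) variables and \( 2M \) inequality constraints.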

We combined all the above programs into a program called INDMOD to automatically generate various inductance models. INDMOD reads in a process description file which specifies the process parameters, such as the number of metal layers, their thicknesses and the separations of the layers, and outputs the self-inductance and mutual-inductance model for each layer. Various models for the Berkeley niobium process have been developed but are not included here due to limited space. All the models are fitted within 10% of the simulation value and many are more accurate than this.

III. SUPERCONDUCTING CIRCUIT EXTRACTION

A. MAGIC layout system

MAGIC is an appropriate layout tool in which to implement our inductance extraction because it is a widely used system which has interfaces with other intermediate layout formats, such as cif and calma. MAGIC's corner-stitch data structure also makes the extraction much simpler. In MAGIC, polygons are represented by rectangles, called tiles. Each tile has four pointers to four neighbors. (See Fig. 2) This makes neighbor-related operations easy to implement. For example, it supports continuous design-rule checking and incremental and hierarchical extraction [2].

MAGIC's extractor can extract transistor properties, and the capacitance and resistance of the nets. These are usually enough to cover the needs of semiconductor circuit design. But the original extractor only extracts the lumped capacitance and resistance of a net. This is not sufficient for a superconducting circuit netlist. An improvement was included in the newest release of MAGIC which can extract a detailed RC network of a net [7], but it requires a preprocessing of the original extractor. In our approach, we implemented INDEX without requiring a lumped model pre-extraction. The following few sections highlight the main procedures of INDEX.

B. Circuit Flattening and Resolving Contacts and Junctions

Before the circuit extraction starts, some preprocessing is done. In the current version of INDEX, a hierarchical circuit is flattened out. All the subcells in a parent cell are pushed up to their parent cell level recursively and the overlaps are merged. This avoids the complexity of calculating mutual inductance and coupling among a parent cell and its subcells. The flattened cell is deleted at the end of the extraction.

---

Fig. 1 Inductor configurations analyzed. The shaded inductors are the ground planes.

Fig. 2 The corner-stitch data structure used in MAGIC. Each tile has four pointers pointing to its neighbors.

Next, contacts and Josephson junctions are identified, because a contact or a Josephson junction on a layer acts as a current source or drain for that layer and can change the current flow dramatically. Each tile that carries a contact or junction is marked with a point recording the position of that contact or junction; we call that point a break point. The tiles representing contacts are then removed, and the complex superconducting-layer tiles around the contacts are merged.

C. Rectangle Decomposition

Before we can call any model formula to calculate an inductance value, we need the current direction and the width and length of the superconducting line. The basic algorithm breaks a complex net into rectangles, calculates the inductance of each rectangle individually, and then adds the values up.

The MAGIC system represents objects as maximally horizontally merged tiles (see Fig. 3a). To find the current direction, the tiles have to be rearranged. First, narrow horizontal tiles are merged into their neighbors (Fig. 3b). A second pass then cuts every tile that has a neighbor on its longer side. This yields the arrangement in Fig. 3c, where the current direction in a tile can best be determined from the positions of its neighbors and of its break points:

1. If a tile has no neighbor or break point, then it is a floating node. Nothing is done to it.

2. If a tile has one neighbor or break point, it is also a floating node. But in some cases, it is an external terminal, so it is processed assuming that the current is coming from the side farthest away from the break point or the neighbor position.

3. If a tile has two neighbors, two break points, or one of each, then the current must be flowing between them. Depending on their relative positions, the current direction is categorized as horizontal, vertical, or mixed. For example, if one neighbor is on the left side and one is on the right side, the direction is horizontal. If one is above and one is on the left, the direction is mixed. For a mixed inductor, we assume its inductance is a constant fraction of the inductance it would have if it were vertical or horizontal. This constant is chosen as 0.7 based on experience.

4. If a tile has more than two neighbors or break points, or a combination of them, the inductance problem is, in fact, three-dimensional. The model we developed from two-dimensional simulation will be inaccurate, so crude approximations are made here. The extraction is more accurate if these complex tiles contribute only a small fraction of the total inductance. We approximate by first assuming a current direction based on the positions of the tile's break points and neighbors. Their positions are then sorted either horizontally or vertically, and the inductance rectangles are assigned to pairs of adjacent break points.
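
The four cases above can be summarized in a short sketch. It is a simplification for illustration, not the INDEX code: each neighbor or break point is reduced to the side of the tile on which it lies, the function and label names are hypothetical, and any two-connection case that is not a pair of opposite sides is treated as mixed, which is an assumption the text does not spell out.

```python
MIXED_FACTOR = 0.7  # constant fraction quoted in the text for mixed-direction tiles


def classify(sides):
    """Classify a tile from the sides ('left', 'right', 'top', 'bottom')
    on which its neighbors and break points lie."""
    sides = list(sides)
    if len(sides) == 0:
        return "floating"      # case 1: nothing is done
    if len(sides) == 1:
        return "terminal"      # case 2: floating node or external terminal
    if len(sides) > 2:
        return "complex"       # case 4: crude approximation needed
    pair = set(sides)          # case 3: exactly two connections
    if pair == {"left", "right"}:
        return "horizontal"
    if pair == {"top", "bottom"}:
        return "vertical"
    return "mixed"


def scaled_inductance(direction, straight_value):
    """Apply the 0.7 factor to mixed tiles; straight tiles keep the model value."""
    if direction == "mixed":
        return MIXED_FACTOR * straight_value
    if direction in ("horizontal", "vertical"):
        return straight_value
    raise ValueError("no single-rectangle model for direction %r" % direction)
```

Here `straight_value` stands for whatever the fitted model returns for the same rectangle carrying a straight current; how that value is chosen for a mixed tile is left open, as in the text.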

Because we cannot determine a tile's inductance without finding its mutual coupling with its neighbors, in this pass each inductor is only identified and associated with a rectangular area; the calculation of its self-inductance is postponed to the mutual coupling module.

D. Self- and Mutual-Inductance Calculation and Network Simplification

The tiles, which now carry the current directions, dimensions, and positions of their inductors, are checked for mutual coupling. Inductors on different superconducting layers as well as on the same layer are searched, but if the separation of two inductors exceeds a certain limit, the coupling is neglected. Coupling is also neglected if two inductors are perpendicular to each other.
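
A minimal sketch of such a pairwise search is given below. The `Inductor` record, the lateral-gap measure, and the `max_gap` parameter are assumptions made for illustration; the text does not state how INDEX measures separation or what limit it uses, and the vertical distance between layers is ignored here.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Inductor:
    name: str
    layer: str
    direction: str  # "horizontal" or "vertical"
    x0: float
    y0: float
    x1: float
    y1: float


def lateral_gap(a, b):
    """Shortest in-plane distance between two rectangles (0 if they overlap)."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return math.hypot(dx, dy)


def coupled_pairs(inductors, max_gap):
    """Return name pairs that are parallel and closer than `max_gap`."""
    pairs = []
    for a, b in combinations(inductors, 2):
        if a.direction != b.direction:
            continue  # perpendicular inductors: coupling neglected
        if lateral_gap(a, b) > max_gap:
            continue  # too far apart: coupling neglected
        pairs.append((a.name, b.name))
    return pairs


# Hypothetical inductors on two layers.
l1 = Inductor("L1", "M1", "horizontal", 0, 0, 10, 2)
l2 = Inductor("L2", "M2", "horizontal", 0, 0, 10, 2)    # directly above L1
l3 = Inductor("L3", "M1", "vertical", 12, 0, 14, 10)    # perpendicular
l4 = Inductor("L4", "M1", "horizontal", 0, 50, 10, 52)  # far away
pairs = coupled_pairs([l1, l2, l3, l4], max_gap=5.0)
```

The all-pairs loop is quadratic in the number of inductors; a tile-based neighbor search would avoid that, but the filtering criteria stay the same.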

After this mutual coupling search, all the inductors, or parts of them, are categorized into the different coupling models shown in Fig. 1. If one inductor has two types of coupling, the effects are added.

The inductors obtained at this stage form a very complex circuit network because of the rectangle decomposition procedure described above. The network is greatly simplified in this module: all series and parallel inductors are merged, and Δ-shaped networks are transformed into Y-shaped networks.
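
For reference, the three reductions can be written down directly when mutual coupling between the merged inductors is ignored, which is an assumption of this sketch and not a statement about how INDEX treats coupled branches. Under that assumption inductances combine like resistances, because each impedance is proportional to its inductance. The function names are hypothetical.

```python
def series(l1, l2):
    """Two uncoupled inductors in series."""
    return l1 + l2


def parallel(l1, l2):
    """Two uncoupled inductors in parallel."""
    return l1 * l2 / (l1 + l2)


def delta_to_y(l_ab, l_bc, l_ca):
    """Convert a delta between nodes a, b, c into the equivalent Y.

    Returns (l_a, l_b, l_c), the inductances from each node to the new
    center node.
    """
    total = l_ab + l_bc + l_ca
    l_a = l_ab * l_ca / total
    l_b = l_ab * l_bc / total
    l_c = l_bc * l_ca / total
    return l_a, l_b, l_c


# Example values in pH, chosen arbitrarily.
l_a, l_b, l_c = delta_to_y(2.0, 3.0, 5.0)
```

A quick consistency check is that the inductance seen between any two nodes is unchanged: between a and b the Y gives `l_a + l_b`, and the Δ gives `l_ab` in parallel with `l_bc + l_ca`.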

IV. CASE TESTS AND RESULTS

INDEX has been tested on a number of cases with good results. Like the original extraction tool in MAGIC, INDEX first produces an extracted file from the layout containing information about the Josephson junctions, inductors, and resistors. After this, a tool called ext2jsim is used to transform the circuit from the extraction format to the JSIM format. After the user puts in the junction model and the external current or voltage sources, JSIM can be run to check the design. In order to help the user to interpret the extracted circuit, the extracted inductors with their names are shown on the layout screen.

A two-junction SQUID (Fig. 4) is used as one test case. Its JSIM output deck is shown in Fig. 5, and its schematic diagram is shown in Fig. 6. INDEX extracts not only the loop inductances of the SQUID but also the parasitics. The hand-calculated values of the total loop inductance, control line inductance, and mutual inductance are 7.0 pH, 8.0 pH, and 4.2 pH, respectively, which are very close to the extracted values of 6.7 pH, 7.9 pH, and 3.9 pH.
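
To quantify "very close", the relative deviations of the extracted values from the hand calculations can be computed from the numbers quoted above. The short script below is only a worked check; the dictionary keys are labels introduced here.

```python
# Values in pH, taken from the SQUID test case in the text.
hand = {"loop": 7.0, "control": 8.0, "mutual": 4.2}
extracted = {"loop": 6.7, "control": 7.9, "mutual": 3.9}


def relative_deviation(reference, value):
    """Magnitude of the difference as a fraction of the reference."""
    return abs(value - reference) / reference


deviations = {key: relative_deviation(hand[key], extracted[key]) for key in hand}

for key, dev in deviations.items():
    print(f"{key:8s} {100.0 * dev:.1f} %")
```

The deviations are roughly 4% for the loop inductance, 1% for the control line, and 7% for the mutual inductance.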

INDEX was also tested on a part of a 5-to-32-bit serial decoder [8]. It extracted 38 junctions, 2507 inductors before network simplification (345 after simplification), and 81 resistors in 5 seconds. A large portion of this time is actually spent drawing the inductors on the layout screen. This shows that INDEX is sufficiently fast for the current level of superconducting circuit design.

V. CONCLUSION

In this paper, we present a superconducting circuit extraction tool based on the MAGIC database. It can extract a superconducting netlist along with the areas of Josephson junctions, self- and mutual-inductances, and resistances. The extracted inductances are calculated from analytical models developed through numerical simulation.

A number of improvements can be made to INDEX. First, to calculate parasitic inductances accurately, a 3D inductance simulation tool is needed. Second, hierarchical extraction can be incorporated into INDEX to speed up the extraction and preserve the modularity of a large hierarchical circuit. Last, to help the designer interpret the extracted circuit, an automatic schematic generation tool or a netlist comparator is also desirable.

VI. REFERENCE

## DISTRIBUTION LIST

<table>
<thead>
<tr>
<th>Addresses</th>
<th>Number of Copies</th>
</tr>
</thead>
<tbody>
<tr>
<td>ZACHARY O. WHITE</td>
<td>10</td>
</tr>
<tr>
<td>ROME LAB/ERAA</td>
<td></td>
</tr>
<tr>
<td>31 GRENIER STREET HANSCOM AFB MA 01731-3010</td>
<td></td>
</tr>
<tr>
<td>UNIVERSITY OF CALIFORNIA AT BERKELEY</td>
<td>5</td>
</tr>
<tr>
<td>DEPT OF ELECTRICAL ENGINEERING AND</td>
<td></td>
</tr>
<tr>
<td>COMPUTER SCIENCE BERKELEY CA 94720</td>
<td></td>
</tr>
<tr>
<td>RL/SUL</td>
<td>1</td>
</tr>
<tr>
<td>TECHNICAL LIBRARY</td>
<td></td>
</tr>
<tr>
<td>26 ELECTRONIC PKY GRIFFISS AFB NY 13441-6514</td>
<td></td>
</tr>
<tr>
<td>ADMINISTRATOR</td>
<td>2</td>
</tr>
<tr>
<td>DEFENSE TECHNICAL INFO CENTER</td>
<td></td>
</tr>
<tr>
<td>DTIC-FOAC</td>
<td></td>
</tr>
<tr>
<td>CAMERON STATION BUILDING 5 ALEXANDRIA VA 22304-6145</td>
<td></td>
</tr>
<tr>
<td>BALLISTIC MISSILE DEFENSE ORGANIZATION</td>
<td>2</td>
</tr>
<tr>
<td>7100 DEFENSE PENTAGON WASH DC 20301-7100</td>
<td></td>
</tr>
<tr>
<td>NAVAL WARFARE ASSESSMENT CENTER</td>
<td>1</td>
</tr>
<tr>
<td>GIDEP OPERATIONS CENTER/ CODE QA-50</td>
<td></td>
</tr>
<tr>
<td>ATTN: E RICHARDS CORONA CA 91718-5000</td>
<td></td>
</tr>
<tr>
<td>WEAPONS LABORATORY/NTAAB</td>
<td>1</td>
</tr>
<tr>
<td>ATTN: DR. CARL E. BAUM KIRTLAND AFB NM 87117-6008</td>
<td></td>
</tr>
<tr>
<td>ASC/ENEMS</td>
<td>1</td>
</tr>
<tr>
<td>WRIGHT-PATTERSON AFB OH 45433-6503</td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Department/Agency</th>
<th>Address</th>
</tr>
</thead>
<tbody>
<tr>
<td>CDR, U.S. ARMY MISSILE COMMAND</td>
<td>REDSTONE SCIENTIFIC INFO CENTER</td>
</tr>
<tr>
<td></td>
<td>ANSMI-RO-CS-R/ILL DOCUMENTS</td>
</tr>
<tr>
<td></td>
<td>REDSTONE ARSENAL AL 35898-5241</td>
</tr>
<tr>
<td>ADVISORY GROUP ON ELECTRON DEVICES</td>
<td>ATTN: DOCUMENTS</td>
</tr>
<tr>
<td></td>
<td>2011 CRYSTAL DRIVE, SUITE 307</td>
</tr>
<tr>
<td></td>
<td>ARLINGTON VA 22202</td>
</tr>
<tr>
<td>LOS ALAMOS NATIONAL LABORATORY</td>
<td>REPORT LIBRARY</td>
</tr>
<tr>
<td></td>
<td>MS 5000</td>
</tr>
<tr>
<td></td>
<td>LOS ALAMOS NM 87544</td>
</tr>
<tr>
<td>COMMANDER/USAISC</td>
<td>ATTN: ASOP-00-TL</td>
</tr>
<tr>
<td></td>
<td>BLDG 61801</td>
</tr>
<tr>
<td></td>
<td>FT HUACHUCA AZ 85613-5000</td>
</tr>
<tr>
<td>1839 EIG/EIX</td>
<td>KEESLER AFB MS 39534-6348</td>
</tr>
<tr>
<td>AIR WEATHER SERVICE TECHNICAL LIB</td>
<td>FL 4414</td>
</tr>
<tr>
<td></td>
<td>SCOTT AFB IL 62225-5458</td>
</tr>
<tr>
<td>AFIWC/MSO</td>
<td>102 HALL BLVD STE 315</td>
</tr>
<tr>
<td></td>
<td>SAN ANTONIO TX 78243-7016</td>
</tr>
<tr>
<td>DIRECTOR NSA/CSS</td>
<td>9800 SAVAGE ROAD</td>
</tr>
<tr>
<td></td>
<td>FORT MEADE MD 21055-6000</td>
</tr>
<tr>
<td>NSA</td>
<td>E323/HC</td>
</tr>
<tr>
<td></td>
<td>SAB2 ROOM 22</td>
</tr>
<tr>
<td></td>
<td>FORT MEADE MD 21055-6000</td>
</tr>
</tbody>
</table>

DR. ROBIN HARVEY
HUGHES RESEARCH LABORATORY
3011 S. MALIBU CANYON RD.
MALIBU, CA 90265

DR. FERNAND O. BEDARD
NSA ATTN. R53
9800 SAVAGE RD.
FORT GEORGE G. MEADE, MD 20755-6000

DEPARTMENT OF DEFENSE
SDIO/TNI
WASHINGTON, DC 20301-7100

DR. BRUCE MURDOCK
SUPERCONDUCTIVE SYSTEMS
TEKTRONIX LABORATORIES
P.O. BOX 500, MAIL STA. 50-324
BEAVERTON, OR 97077

DR. JAMES W. MINK, DIRECTOR
ELECTRONICS DIVISION
U.S ARMY RESEARCH OFFICE
P.O. BOX 12211
RESEARCH TRIANGLE PARK, NC 27709

PROF. GABRIEL REBEIZ
ELECT. ENG. AND COMPUTER SCIENCE
DEPARTMENT
UNIVERSITY OF MICHIGAN
ANN ARBOR, MI 48102-2122

PROF. MICHAEL WENGLER
DEPARTMENT OF ELECTRICAL ENG.
UNIVERSITY OF ROCHESTER
ROCHESTER, NY 14627

DR. RICHARD WITHERS
CONDUCTUS, INC.
969 WEST MAUDE AVE
SUNNYVALE, CA 94056

DR. CHARLES E. BYVIK
W. J. SCHAFER ASSOC.
1901 N. FORT MYER DRIVE
SUITE 800
ARLINGTON, VA 22209

MISSION OF ROME LABORATORY

Mission. The mission of Rome Laboratory is to advance the science and technologies of command, control, communications and intelligence and to transition them into systems to meet customer needs. To achieve this, Rome Lab:

a. Conducts vigorous research, development and test programs in all applicable technologies;

b. Transitions technology to current and future systems to improve operational capability, readiness, and supportability;

c. Provides a full range of technical support to Air Force Materiel Command product centers and other Air Force organizations;

d. Promotes transfer of technology to the private sector;

e. Maintains leading-edge technological expertise in the areas of surveillance, communications, command and control, intelligence, reliability science, electromagnetic technology, photonics, signal processing, and computational science.

The thrust areas of technical competence include: Surveillance, Communications, Command and Control, Intelligence, Signal Processing, Computer Science and Technology, Electromagnetic Technology, Photonics and Reliability Sciences.