CCD CORNER-TURNING MEMORY

Texas Instruments Incorporated

Robert J. Kansy
This report has been reviewed by the RADC Public Affairs Office (PA) and is releasable to the National Technical Information Service (NTIS). At NTIS it will be releasable to the general public, including foreign nations.

RADC-TR-80-152 has been reviewed and is approved for publication.

APPROVED: Virgil E. Vickers

VIRGIL E. VICKERS
Project Engineer

APPROVED: Clarence D. Turner

CLARENCE D. TURNER, Acting Director
Solid State Sciences Division

FOR THE COMMANDER: John P. Huss

JOHN P. HUSS
Acting Chief, Plans Office

SUBJECT TO EXPORT CONTROL LAWS

This document contains information for manufacturing or using munitions of war. Export of the information contained herein, or release to foreign nationals within the United States, without first obtaining an export license, is a violation of the International Traffic in Arms Regulations. Such violation is subject to a penalty of up to 2 years imprisonment and a fine of $100,000 under 22 U.S.C. 2778.

Include this notice with any reproduced portion of this document.

Two analog CCD reformatting memories were designed and fabricated using an n-channel, double-level, self-aligned polysilicon gate process. These unique CCD structures employ two-dimensional charge transfer cells in a square memory array (32 x 32 and 64 x 64 elements) which is accessed by means of integrated CCD demultiplexer and multiplexer structures, resulting in greater dynamic range than observed in previous line-addressed designs. Details related to the design of this complex device and the...

Results of experimental tests are presented.

A secondary objective involved the development of a process-compatible bipolar output device for high speed applications. An analysis of this bipolar output circuit is presented, and experimental results are discussed.
# TABLE OF CONTENTS

<table>
<thead>
<tr>
<th>SECTION</th>
<th>CONTENTS</th>
<th>PAGE</th>
</tr>
</thead>
<tbody>
<tr>
<td>I</td>
<td>INTRODUCTION</td>
<td>1</td>
</tr>
<tr>
<td>II</td>
<td>32 x 32 ELEMENT CTM DESIGN</td>
<td>16</td>
</tr>
<tr>
<td>A.</td>
<td>Two-Dimensional Charge Transfer Cell</td>
<td>16</td>
</tr>
<tr>
<td>B.</td>
<td>Input Demultiplexer</td>
<td>21</td>
</tr>
<tr>
<td>C.</td>
<td>Output Multiplexer</td>
<td>26</td>
</tr>
<tr>
<td>D.</td>
<td>32 x 32 CTM Chip Organization</td>
<td>29</td>
</tr>
<tr>
<td>III</td>
<td>32 x 32 CTM EVALUATION</td>
<td>31</td>
</tr>
<tr>
<td>A.</td>
<td>Lot 501</td>
<td>31</td>
</tr>
<tr>
<td>B.</td>
<td>Lot 602</td>
<td>43</td>
</tr>
<tr>
<td>C.</td>
<td>Lot 620</td>
<td>43</td>
</tr>
<tr>
<td>IV</td>
<td>64 x 64 CTM DESIGN</td>
<td>55</td>
</tr>
<tr>
<td>V</td>
<td>64 x 64 CTM EVALUATION</td>
<td>58</td>
</tr>
<tr>
<td>VI</td>
<td>BIPOLAR TEST BAR DESIGN</td>
<td>69</td>
</tr>
<tr>
<td>A.</td>
<td>High-Speed CCD Output Circuits</td>
<td>69</td>
</tr>
<tr>
<td>B.</td>
<td>Bipolar CCD Output Circuit</td>
<td>81</td>
</tr>
<tr>
<td>C.</td>
<td>Bipolar Output Concept and Design Consideration</td>
<td>86</td>
</tr>
<tr>
<td>D.</td>
<td>Test Bar Design</td>
<td>96</td>
</tr>
<tr>
<td>VII</td>
<td>BIPOLAR TEST BAR EVALUATION</td>
<td>103</td>
</tr>
<tr>
<td>A.</td>
<td>Bipolar Device Evaluation</td>
<td>103</td>
</tr>
<tr>
<td>B.</td>
<td>PNP/NMOS Buffer Evaluation</td>
<td>103</td>
</tr>
<tr>
<td>C.</td>
<td>Bipolar CCD Output Circuit Evaluation</td>
<td>108</td>
</tr>
<tr>
<td>D.</td>
<td>32 x 32 CTM With Bipolar Output</td>
<td>115</td>
</tr>
<tr>
<td>VIII</td>
<td>CONCLUSIONS</td>
<td>117</td>
</tr>
<tr>
<td>APPENDIX A</td>
<td>CTM TIMING CIRCUITS</td>
<td>121</td>
</tr>
<tr>
<td>APPENDIX B</td>
<td>DERIVATION OF CCD OUTPUT CIRCUIT TRANSFER FUNCTION</td>
<td>138</td>
</tr>
<tr>
<td>APPENDIX C</td>
<td>A CCD TWO DIMENSIONAL TRANSFORM</td>
<td>145</td>
</tr>
<tr>
<td>REFERENCES</td>
<td></td>
<td>155</td>
</tr>
</tbody>
</table>
# LIST OF ILLUSTRATIONS

<table>
<thead>
<tr>
<th>FIGURE</th>
<th>DESCRIPTION</th>
<th>PAGE</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Video Reformatting in Radar Doppler Processor</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>CCD Analog Doppler Processor Architectures</td>
<td>4</td>
</tr>
<tr>
<td>3</td>
<td>Line Addressed Reformatting Memory Organization</td>
<td>6</td>
</tr>
<tr>
<td>4</td>
<td>Block Diagram of Sequentially Line Addressable Memory Chip</td>
<td>8</td>
</tr>
<tr>
<td>5</td>
<td>Sequentially Line Addressable Memory Chip</td>
<td>11</td>
</tr>
<tr>
<td>6</td>
<td>Sequentially Line Addressable CTM Operation at 12.5 MHz</td>
<td>12</td>
</tr>
<tr>
<td>7</td>
<td>Organization of the Two-Dimensional Reformatting Memory</td>
<td>14</td>
</tr>
<tr>
<td>8</td>
<td>Topology and Clocking Scheme for the Four-Phase Transfer Cell</td>
<td>17</td>
</tr>
<tr>
<td>9</td>
<td>Topology and Clocking Scheme for the Three-Phase Transfer Cell</td>
<td>19</td>
</tr>
<tr>
<td>10</td>
<td>Channel Stop, Gate and Contact Levels for the Three-Phase Transfer Cell Showing Critical Dimensions</td>
<td>20</td>
</tr>
<tr>
<td>11</td>
<td>Photomicrograph of the Two-Dimensional Transfer Cell</td>
<td>22</td>
</tr>
<tr>
<td>12</td>
<td>Channel and Gate Levels of the Demultiplexer Structure</td>
<td>24</td>
</tr>
<tr>
<td>13</td>
<td>Photomicrograph of the Demultiplexer - Memory Array Interface</td>
<td>25</td>
</tr>
<tr>
<td>14</td>
<td>Channel and Gate Levels of the Multiplexer</td>
<td>27</td>
</tr>
<tr>
<td>15</td>
<td>Photomicrograph of the Multiplexer - Memory Array Interface</td>
<td>28</td>
</tr>
<tr>
<td>16</td>
<td>Photomicrograph of the 32 x 32 Element Reformatting Memory Chip</td>
<td>30</td>
</tr>
<tr>
<td>17</td>
<td>Memory Operation in the Corner-Turning Mode at a 5 MHz Clock Rate</td>
<td>32</td>
</tr>
<tr>
<td>18</td>
<td>(a) Memory Operation in the SPS Mode at a 5 MHz Clock Rate and (b) Pulse Response for Determination of CTE</td>
<td>34</td>
</tr>
<tr>
<td>19</td>
<td>Diagram of ac-Coupled/dc-Restored Amplifier Used to Inhibit Clock Pulse Feedthrough</td>
<td>35</td>
</tr>
<tr>
<td>20</td>
<td>Fixed Pattern Noise Observed in Both CTM and SPS Modes</td>
<td>37</td>
</tr>
<tr>
<td>21</td>
<td>Fixed Pattern Noise Observed for Chip #11</td>
<td>38</td>
</tr>
<tr>
<td>22</td>
<td>Fixed Pattern Noise Observed for Chip #9</td>
<td>39</td>
</tr>
<tr>
<td>23</td>
<td>Demultiplexer Fixed Pattern Noise Due to Substrate Inhomogeneities</td>
<td>41</td>
</tr>
<tr>
<td>24</td>
<td>100% Duty Cycle CTM Operation</td>
<td>44</td>
</tr>
<tr>
<td>25</td>
<td>CTM Output</td>
<td>44</td>
</tr>
<tr>
<td>26</td>
<td>5 MHz 32 x 32 CTM Operation</td>
<td>45</td>
</tr>
<tr>
<td>27</td>
<td>SPS Operation of the Memory at 1 MHz</td>
<td>47</td>
</tr>
<tr>
<td>28</td>
<td>Step Response Used to Determine CTE</td>
<td>48</td>
</tr>
<tr>
<td>29</td>
<td>Reformatting Operation of the Memory at 1 MHz</td>
<td>49</td>
</tr>
<tr>
<td>30</td>
<td>DC Response in Reformatting Mode Showing Fixed Pattern Noise Components</td>
<td>50</td>
</tr>
<tr>
<td>31</td>
<td>Input Signal Spectrum</td>
<td>52</td>
</tr>
<tr>
<td>32</td>
<td>Spectrum of Reformatted Data</td>
<td>52</td>
</tr>
<tr>
<td>33</td>
<td>Spectral Components of Fixed Pattern Noise for a 32 x 32 CTM (Leftmost Spike is Zero Hz Marker)</td>
<td>53</td>
</tr>
<tr>
<td>34</td>
<td>Spectral Components of Clock Feedthrough at Output of 32 x 32 CTM (Leftmost Spike is Zero Hz Marker)</td>
<td>53</td>
</tr>
<tr>
<td>35</td>
<td>Schematic Diagram of the High-Speed Output Circuit</td>
<td>56</td>
</tr>
<tr>
<td>36</td>
<td>Photomicrograph of the Completed 64 x 64 CTM Chip</td>
<td>57</td>
</tr>
<tr>
<td>37</td>
<td>SPS Operation of the 64 x 64 CTM at 1 MHz</td>
<td>59</td>
</tr>
<tr>
<td>38</td>
<td>100% Duty Cycle Reformatting Operation of the 64 x 64 CTM at 1 MHz</td>
<td>60</td>
</tr>
<tr>
<td>39</td>
<td>Observed Fixed Pattern Noise Levels in the 64 x 64 CTM</td>
<td>61</td>
</tr>
<tr>
<td>40</td>
<td>(a) Step Response in SPS Mode at a 5 MHz Clock Rate and (b) Swept Spectrum of SPS Delay Line Transfer Function From Which CTE = 0.9999 Is Estimated</td>
<td>62</td>
</tr>
<tr>
<td>41</td>
<td>64 x 64 CTM Output at 25 MHz Rate</td>
<td>64</td>
</tr>
<tr>
<td>42</td>
<td>64 x 64 CTM Output Fixed Pattern Noise at 25 MHz Rate</td>
<td>65</td>
</tr>
<tr>
<td>43</td>
<td>Output Waveforms for the 64 x 64 CTM in the SPS Mode at a 40 MHz Clock Rate</td>
<td>66</td>
</tr>
<tr>
<td>44</td>
<td>Output Amplifier Frequency Response</td>
<td>67</td>
</tr>
<tr>
<td>45</td>
<td>Operation of Conventional CCD Output Circuits</td>
<td>70</td>
</tr>
<tr>
<td>46</td>
<td>Frequency Response of the Reset Amplifier</td>
<td>72</td>
</tr>
<tr>
<td>47</td>
<td>Equivalent Circuit for the Reset Amplifier</td>
<td>74</td>
</tr>
<tr>
<td>48</td>
<td>Circuit Schematic of the High-Speed Buffer Amplifier</td>
<td>79</td>
</tr>
<tr>
<td>49</td>
<td>Measured and Predicted Frequency Response of the High Speed Output Buffer</td>
<td>80</td>
</tr>
<tr>
<td>50</td>
<td>Schematic Diagram and Output Waveform</td>
<td>82</td>
</tr>
<tr>
<td>51</td>
<td>The Output Circuit Transfer Function</td>
<td>83</td>
</tr>
<tr>
<td>52</td>
<td>Schematic Diagram of the Bipolar Output Device</td>
<td>85</td>
</tr>
<tr>
<td>53</td>
<td>The Structure and Concept of the Bipolar Output for CCDs</td>
<td>87</td>
</tr>
<tr>
<td>54(a)</td>
<td>Emitter Current Response of the Bipolar Output for a Charge Size of 1.6 fC</td>
<td>91</td>
</tr>
<tr>
<td>54(b)</td>
<td>SPICE Simulation of the Bipolar Output</td>
<td>92</td>
</tr>
<tr>
<td>55</td>
<td>Doping Distribution Due to the Base and Emitter Ion Implants Forming the PNP Bipolar</td>
<td>93</td>
</tr>
<tr>
<td>56</td>
<td>β vs I<sub>C</sub> Characteristics of the Output Bipolar Device</td>
<td>95</td>
</tr>
<tr>
<td>57</td>
<td>PNP/NMOS Buffers</td>
<td>98</td>
</tr>
<tr>
<td>58</td>
<td>Bipolar Test Device Chip From Bipolar Test Bar</td>
<td>99</td>
</tr>
<tr>
<td>59</td>
<td>CCD Test Chip From Bipolar Test Bar</td>
<td>100</td>
</tr>
<tr>
<td>60</td>
<td>32 x 32 CTM With Bipolar Output</td>
<td>101</td>
</tr>
<tr>
<td>61</td>
<td>Equivalent Circuit of the PNP Device Test Structure</td>
<td>104</td>
</tr>
<tr>
<td>62</td>
<td>DC Transfer Curves for the PNP Test Devices</td>
<td>105</td>
</tr>
<tr>
<td>63</td>
<td>Pulse Response of the Single Stage PNP/NMOS Buffer</td>
<td>107</td>
</tr>
<tr>
<td>64</td>
<td>Response of the Dual Channel CCD Test Structure at a 10 MHz Clock Rate</td>
<td>109</td>
</tr>
<tr>
<td>65</td>
<td>Output Waveform and Frequency Response Due to the Reset Amplifier</td>
<td>110</td>
</tr>
<tr>
<td>66</td>
<td>Equivalent Circuit of Bipolar Output Circuit</td>
<td>111</td>
</tr>
<tr>
<td>67</td>
<td>Output Waveform and Frequency Response Due to Bipolar Output Stage With 470 Ω Load</td>
<td>114</td>
</tr>
<tr>
<td>68</td>
<td>Output Waveform and Frequency Response Due to Bipolar Output Stage With 1 kΩ Load</td>
<td>116</td>
</tr>
</tbody>
</table>
# LIST OF TABLES

<table>
<thead>
<tr>
<th>TABLE</th>
<th>DESCRIPTION</th>
<th>PAGE</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Bipolar Test Bar Implant Parameters</td>
<td>102</td>
</tr>
<tr>
<td>2</td>
<td>High Frequency Bipolar Device Test Results</td>
<td>106</td>
</tr>
<tr>
<td>3</td>
<td>Parts List for Figure 70</td>
<td>126</td>
</tr>
<tr>
<td>4</td>
<td>Parts List for Figure 71</td>
<td>129</td>
</tr>
<tr>
<td>5</td>
<td>Parts List for the High Speed Clock Generator</td>
<td>136</td>
</tr>
</tbody>
</table>
This report is the Final Report on the contract. It covers research done on CCD corner-turning memories during the 25 1/2 month period 15 September 1977 to 31 October 1979. The objective of the research is the development of 2-dimensional CCD corner-turning memories with integral high-speed demultiplexer/multiplexers to provide analog data reformatting with wide dynamic range, for such purposes as radar doppler processors, matched filtering of low data rate covert spread spectrum communication, and high resolution spectral analysis of long time records. Such devices were designed, fabricated, and demonstrated, and working samples were delivered with test fixtures.

The above work is of value since it provides a key device needed for accomplishing signal processing in USAF command, control, communication, and intelligence systems. Related TPO is R5D.

VIRGIL E. VICKERS
Project Engineer
# SECTION I. INTRODUCTION

Advancements in charge coupled device technology have resulted in the addition of a number of new components for analog signal processing applications. The most promising developments in this field are the integrated chirp Z-transform (CZT)\(^1\) and the digital-analog correlator,\(^2\) both of which capitalize on the tremendous effective computational power of analog transversal filters to accomplish complex signal processing functions with modest volume, weight, and power dissipation. These devices can be thought of as linear operators that transform a fixed length input record \(V\) into an output record \(U\) of identical length according to \(U = LV\). In the case of the chirp Z-transform \(L\) corresponds to the discrete Fourier transform operator via the chirp Z-algorithm, while in the digital-analog correlator, \(L\) corresponds to a programmable scalar product operator.

Unfortunately, in a number of potentially interesting applications the input data sequence does not occur in the form of consecutive, fixed length input records. A particularly illustrative example is the case in which it is desired to utilize the CZT processor to accomplish doppler resolution of the returns in a pulsed radar system. The situation is illustrated in Figure 1(a), which shows the radar video in each transmitter pulse repetition interval (PRI) subdivided into range bins that are dictated by the range resolution of the radar circuitry. To accomplish the doppler processing, the returns in each range bin must be analyzed over a number of PRIs dictated by the number of points in the transform. Use of an integrated CZT processor in this application requires that the input record be sequential in PRIs, whereas the radar video is sequential in range bins. Figures 1(b) through 1(e) illustrate the problem.

Figure 1 Video Reformatting in Radar Doppler Processor

The radar video shown in Figure 1(b) is low-pass filtered and sampled at a rate commensurate with the range resolution, resulting in Figure 1(c). For illustrative purposes a constant amplitude return (stationary target) is shown in range bin 1, and a time-varying return (moving target) is shown in range bin 3. The serial video consists of four records, each of length 8, which are grouped in a $4 \times 8$ matrix format in Figure 1(d). The rows correspond to pulse intervals and the columns to range bins within each interval. If we consider Figure 1(d) as representing the contents of a $4 \times 8$ memory array into which the sampled video was loaded on a row-by-row basis, then it can be seen that the desired reformatting operation consists of reading the contents of the memory column-by-column. This results in eight records, each four samples in length and each sequential in pulse interval, as required for doppler processing and as shown in Figure 1(e).
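The row-in, column-out reordering described above can be sketched in a few lines of Python. This is only an illustration of the data movement, not a model of the CCD hardware; the function name `corner_turn` and the `(pri, bin)` sample labels are assumptions introduced here.

```python
def corner_turn(samples, num_pris, num_bins):
    """Reorder a serial stream that is sequential in range bins
    (one record per PRI) into records that are sequential in PRIs
    (one record per range bin)."""
    if len(samples) != num_pris * num_bins:
        raise ValueError("stream length must equal num_pris * num_bins")
    # Load the memory row-by-row: one row per pulse repetition interval.
    memory = [samples[r * num_bins:(r + 1) * num_bins] for r in range(num_pris)]
    # Read the memory column-by-column: one output record per range bin.
    return [[memory[r][c] for r in range(num_pris)] for c in range(num_bins)]


# The 4 x 8 example of Figure 1: label every sample with (PRI, range bin).
video = [(pri, rbin) for pri in range(4) for rbin in range(8)]
records = corner_turn(video, num_pris=4, num_bins=8)
print(len(records), "records of length", len(records[0]))
print("range bin 3 over successive PRIs:", records[3])
```

Each output record holds the returns of one range bin over successive PRIs, which is the ordering a transform processor would need.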

It can be shown that the minimum number of storage locations required to accomplish the reformatting operation on $M$ records of length $N$ is $(N - 1) (M - 1)$. However, the access and transfer control requirements for a CCD memory of this type are quite complex. Considerable simplification results from the use of an $M \times N$ array (illustrated in Figure 1) in that all access and transfer operations can be made to occur either sequentially or in parallel. Because of the nature of the reformatting operation, this type of memory is referred to as a "corner turning memory" (CTM).
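As a quick numerical illustration of the storage trade-off, the sketch below compares the stated minimum of $(N - 1)(M - 1)$ locations with the $M \times N$ array. The helper names are hypothetical, and the array sizes are simply the formats mentioned in this report.

```python
def min_storage(m, n):
    """Minimum number of storage locations quoted in the text."""
    return (n - 1) * (m - 1)


def full_array_storage(m, n):
    """Storage used by the simpler M x N array organization."""
    return m * n


for m, n in [(4, 8), (32, 32), (64, 64)]:
    extra = full_array_storage(m, n) - min_storage(m, n)
    print(f"M={m}, N={n}: minimum {min_storage(m, n)}, "
          f"full array {full_array_storage(m, n)}, extra {extra}")
```

The extra storage is $M + N - 1$ locations, a small fraction of the total for the larger arrays.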

Figure 2  CCD Analog Doppler Processor Architectures

The configuration of the hypothetical doppler processor subsystem is shown in Figure 2. A dual channel analog reformatting memory allows in-phase (I) and quadrature (Q) video components from a coherent radar to be used in conjunction with a complex CZT processor to preserve the sense of the doppler information (approaching vs receding targets). One particularly crucial aspect in the complex doppler processor involves gain tracking between the I and Q reformatting memories. To ensure that the doppler image response (having doppler sense opposite that of the target) is sufficiently attenuated, the gain between I and Q channels must continually track to within limits given by

\[ 20 \log_{10} \left( \frac{A_I + A_Q}{|A_I - A_Q|} \right) > DR \]

where \(A_I\) and \(A_Q\) are the gains of the I and Q channels and DR is the desired processor dynamic range in dB.
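To give a feel for how tight this tracking requirement is, the following sketch evaluates the left-hand side of the inequality for the two channel gains and inverts it for a target dynamic range. The function names and the mismatch values are illustrative assumptions, not measurements from this program.

```python
import math


def image_rejection_db(gain_i, gain_q):
    """Left-hand side of the tracking inequality, in dB."""
    if gain_i == gain_q:
        return math.inf  # perfectly matched channels
    return 20.0 * math.log10((gain_i + gain_q) / abs(gain_i - gain_q))


def max_mismatch_for_dr(dr_db):
    """Largest fractional mismatch eps (gain_i = (1 + eps) * gain_q)
    that still satisfies the inequality for a dynamic range dr_db.
    From (2 + eps) / eps = 10**(dr_db / 20)."""
    k = 10.0 ** (dr_db / 20.0)
    return 2.0 / (k - 1.0)


for mismatch in (0.01, 0.03, 0.05):  # illustrative values only
    print(f"{mismatch:.0%} mismatch -> "
          f"{image_rejection_db(1.0 + mismatch, 1.0):.1f} dB")
print(f"60 dB requires mismatch below {max_mismatch_for_dr(60.0):.4%}")
```

Under this relation a few percent of gain mismatch limits the image suppression to roughly 30 to 40 dB.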

Analog CCD reformatting memories have been recently applied in high-speed doppler processors in conjunction with surface acoustic wave device chirp transform processors. These memories are predominantly line-addressed CCD structures utilizing N delay lines each having M stages. A typical configuration is shown in Figure 3, again using the 4 x 8 format as an example. The individual CCD input and output circuits are commutated, usually by means of on-chip MOS circuitry.

The memory is loaded in a row-by-row fashion by commutating the input circuit at the desired sampling rate. After each row has been entered, it is transferred one stage toward the output. When the memory is filled, delay line 1 contains the samples from range bin 1, delay line 2 contains those from range bin 2, etc. Reformatting is accomplished by emptying the array column-by-column using the output commutator. Since data cannot generally be entered during this output phase, the duty cycle of this configuration is limited to 50%.

Figure 3 Line Addressed Reformatting Memory Organization

Note that the inverse operation is possible in square arrays in which data can be entered column-by-column during the output phase of one reformatting operation, then emptied row-by-row during the input phase of the next. This specific case results in 100% duty cycle; however, the key limitation in the CCD delay line approach is the fixed pattern gain and offset variations that occur among the delay lines due to MOS threshold voltage variations and, to a lesser extent, CTE variations. In the doppler processor application these result in gain and offset variations among the length M output records in the column-output mode and among samples in each record in the row-output mode. The former may be tolerated in some applications; however, the latter usually results in an intolerable signal-to-noise degradation.
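The difference between the two output modes can be illustrated with a simple behavioral model in which each delay line applies its own gain and offset to every sample it carries. The sketch below is an assumption-laden toy: the per-line gains and offsets are hypothetical numbers, and charge transfer inefficiency is ignored.

```python
def column_output_mode(frame, gains, offsets):
    """Row-by-row load, column-by-column read. Delay line c carries
    range bin c, so every sample of output record c sees the same line."""
    rows, cols = len(frame), len(frame[0])
    return [[gains[c] * frame[r][c] + offsets[c] for r in range(rows)]
            for c in range(cols)]


def row_output_mode(frame, gains, offsets):
    """Column-by-column load, row-by-row read (square arrays). Delay line r
    carries input record r, so each output record mixes all the lines."""
    rows, cols = len(frame), len(frame[0])
    return [[gains[r] * frame[r][c] + offsets[r] for r in range(rows)]
            for c in range(cols)]


def spread(record):
    """Peak-to-peak variation within one output record."""
    return max(record) - min(record)


# Hypothetical per-line gain and offset errors for a 4 x 4 array.
GAINS = [1.00, 1.03, 0.97, 1.05]
OFFSETS = [0.00, 0.02, -0.01, 0.01]
flat_frame = [[1.0] * 4 for _ in range(4)]  # constant (dc) input

col_records = column_output_mode(flat_frame, GAINS, OFFSETS)
row_records = row_output_mode(flat_frame, GAINS, OFFSETS)
print("column-output spread per record:", [round(spread(r), 3) for r in col_records])
print("row-output spread per record:   ", [round(spread(r), 3) for r in row_records])
```

With a constant input, the column-output records are each internally flat but differ from one another, while every row-output record carries the full line-to-line variation as a pattern within the record.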

Based on these architectural considerations, a processor brassboard was developed under a previous contract to evaluate the configuration illustrated in Figure 2 using a SAW chirp transform module. The I and Q components of the radar video are to be derived from a synchronous detector in the radar receiver and stored in separate memory (range store) modules. Each module consists of a number of 66 x 66 cell CCD memory blocks required to accommodate the range swath of interest. For the brassboard two blocks per module were employed, although this is expandable to allow the desired coverage of the radar pulse interval. Thus, the range stores in the brassboard can accommodate 132 range bins (I and Q) collected over 66 PRIs in the search mode and 66 range bins (I and Q for both Σ and Δ channels) over 66 PRIs in the track mode.

The range stores are emptied at a 12.5 MHz rate into I and Q up converters to generate single sideband doppler information at a 318 MHz IF for subsequent spectral analysis by the 64-point SAW CZT module. (The use of 66 cells in the range stores allows two "burn-pulses" for settling of transients before spectral analysis is initiated.) The SAW CZT module consists of two complex CZT channels that are commutated by processor timing to allow continuous system operation. The use of the complex CZT realization is required to maintain the desired phase information.

The organization of the CCD memory chip is illustrated by the block diagram in Figure 4. The configuration shown consists of sixty-six 66-stage CCD delay lines and the circuitry required to address individual input and output terminals on a line-by-line basis. The corner turning operation is performed by sequentially switching between a parallel mode in which consecutive samples are read into and out of consecutive delay lines and a serial mode in which consecutive samples are read into or out of one particular delay line.

Figure 4  Block Diagram of Sequentially Line Addressable Memory Chip

The design approach used in the range stores is based on the two-phase CCD structure, which can be realized with the double-level polysilicon gate process developed at TI. These delay lines are operated in the so-called "phase and one-half" mode in which one set of gate electrodes is clocked, while the other is maintained at an intermediate dc level. By properly designing the output circuit, the charge transfer and sensing operations can be controlled with a single clock pulse. When the CCD clock is off, the signal charge is stored in ion-implanted potential wells beneath the nonclocked electrodes.

Referring to Figure 4, a logic "1" is entered at the input of the 66-stage MOS shift register and propagates through the structure sequentially addressing MOS switch transistors, which couple individual CCD clock lines to the CCD clock buses. Serial or parallel operation is determined by the shift register clock rate. Signal charge is transferred down the CCD delay lines and detected at individual output circuits. The commutating shift register is a standard dynamic MOS ratio-less configuration with an additional transfer switch connected to the CCD clock switch transistor that operates in a bootstrap mode.
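The addressing action of the commutating shift register can be pictured as a single logic "1" stepping through the stages. The sketch below is a behavioral illustration only; the function name is hypothetical, and it assumes a single token that is entered once and not recirculated.

```python
def address_sequence(num_stages, num_clocks):
    """Return, for each shift-register clock, the index of the stage
    holding the logic '1' (the delay line whose clock switch is enabled),
    or None once the token has left the register."""
    register = [0] * num_stages
    sequence = []
    for tick in range(num_clocks):
        token_in = 1 if tick == 0 else 0  # enter a single logic "1"
        register = [token_in] + register[:-1]  # shift one stage per clock
        enabled = [i for i, bit in enumerate(register) if bit]
        sequence.append(enabled[0] if enabled else None)
    return sequence


# A 66-stage register addresses delay lines 0 through 65 in turn.
lines = address_sequence(num_stages=66, num_clocks=66)
print(lines[:5], "...", lines[-1])
```

In this picture the shift register clock rate alone sets how often the addressed line changes, which is consistent with the statement that serial or parallel operation is selected by that rate.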

The CCD delay lines are each 66-stage buried channel devices having 6 mil (152.4 μm) channel widths and 0.6 mil (15.24 μm) gate lengths. The input circuits are dual-gate surface channel configurations and can be operated in either the "diode cutoff" or "potential equilibration" mode. The buried channel is coincident with the edge of the implanted well under the first clocked electrode.

To preserve dynamic range at the clock rates required, it was determined that each delay line should have an individual source-follower output transistor. Furthermore, power dissipation requirements dictated that each output transistor should be turned off when not in use. This was accomplished with an additional reset transistor at the output of each delay line and the use of external current summing in the output circuit. As indicated in Figure 5, all output nodes are tied in parallel and connected to the emitter of a PNP common base current summing amplifier. The value of $V_B$ sets the voltage at the emitter and hence, the common output node voltage. This is adjusted such that the preset level at the output diode of the currently addressed delay line is sufficient to turn on its respective source follower.

The collector current of the summing amplifier then follows the drain current of the output follower and is sensed by $R_L$. Coincident with the operation of the shift register, a reset pulse is applied to the gate of the additional reset transistor. This pulls the output diode voltage at all delay lines down to $V_{\text{Reset}}$, which is adjusted to a level below the emitter voltage of the summing amplifier ($V_{\text{Reset}} = V_B$ is sufficient), thereby turning off all output source-followers. The next preset operation turns on the output circuit of the next delay line, and the output voltage follows the source-follower drain current until the next reset pulse occurs.

Figure 5 is a photomicrograph of the completed memory chip. Due to size limitations, the structure is composed of thirty-three 66-stage shift register addressed delay lines. An integral source-follower is attached to the last stage of the shift register to couple it to a second bar, thereby forming the full 66 x 66 memory array. The dimensions of the chip are 156 mils x 293 mils (3.96 mm x 7.44 mm).

Figure 6 illustrates the operation of two interconnected memory chips performing time compression of an analog test signal at a 12.5 MHz clock rate. The top two traces are, respectively, the test signal and memory output (before sample-and-hold) on the same time scale. The test signal is sampled during the parallel mode, which can be differentiated from the serial mode by level variations in the latter due to threshold voltage variations. The amplitude of this fixed pattern noise component is seen to be of the same order as the peak-to-peak signal swing. In the doppler processor output these variations appear in the "0" doppler cell and can be externally inhibited.

Figure 6 Sequentially Line Addressable CTM Operation at 12.5 MHz

The primary performance limitation observed in the system tests resulted from gain variations between I and Q channels, which are related to the threshold voltage variations through the transconductance \((g_m)\) of the individual output source-followers. These gain variations were observed to be in the range of 3 to 5% and resulted in poor suppression of the doppler image response.

The limitations imposed by MOS threshold variations are inherent in line-addressed analog memories and stimulated the development of a unique approach to the problem. Since the reformatting operation consists of interchanging row and column transfer of the stored data sequence, the most direct realization of a CCD reformatting memory involves a structure in which charge can be transferred either vertically (row-by-row) or horizontally (column-by-column) in the array. The structures controlling the two-dimensional charge transfer operation are complicated, and the potential advantages of this direct realization are lost in the higher risks associated with the sophisticated design. In addition, there is no performance advantage over line-addressed structures if individual input and output cells are employed at the edge of the array.

However, the two-dimensional transfer approach offers the potential for substantially improved performance with the integration of a CCD multiplexer and demultiplexer to provide serial interfaces with the memory array. A block diagram of the structure appears in Figure 7. The input demultiplexer performs a serial-to-parallel conversion and loads the memory array row-by-row. During the load cycle, charge is transferred vertically in the array. When all rows have been loaded, the array is switched to a horizontal transfer mode, and the output multiplexer performs the required parallel-to-serial conversion of the data stored in each column. The resulting CCD structure has a single input port (voltage-charge conversion) and a single output port (charge-voltage conversion). All intermediate data manipulations involve CCD-type charge transfers. Thus, the sources of the fixed pattern variations that cause degraded dynamic range in the line-addressed structures have been eliminated, and full dynamic range is expected over the entire reformatted data array (MN samples).
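The load and unload sequence just described can be mimicked with shift-register style list operations. The sketch below is a behavioral illustration under stated assumptions: the demultiplexer is taken to sit along the top edge with its input at the left, the multiplexer along the right edge with its output at the bottom, and all transfers are ideal. These orientations and all names are assumptions made for the example.

```python
def shift_in(register, packet):
    """One serial transfer: packet enters stage 0, the last stage exits."""
    return [packet] + register[:-1], register[-1]


def ctm_reformat(stream, num_rows, num_cols):
    """Load num_rows records of num_cols samples through a serial
    demultiplexer (vertical transfer), then unload through a serial
    multiplexer (horizontal transfer)."""
    if len(stream) != num_rows * num_cols:
        raise ValueError("stream length must equal num_rows * num_cols")
    array = [[None] * num_cols for _ in range(num_rows)]

    # Load cycle: serial-to-parallel conversion, then one vertical transfer.
    for r in range(num_rows):
        demux = [None] * num_cols
        for packet in stream[r * num_cols:(r + 1) * num_cols]:
            demux, _ = shift_in(demux, packet)
        array = [demux] + array[:-1]  # all rows move down one stage

    # Unload cycle: one horizontal transfer, then parallel-to-serial readout.
    output = []
    for _ in range(num_cols):
        mux = [row[-1] for row in array]              # right-hand column exits
        array = [[None] + row[:-1] for row in array]  # all columns move right
        for _ in range(num_rows):
            mux, packet = shift_in(mux, None)
            output.append(packet)
    return output


stream = [(pri, rbin) for pri in range(4) for rbin in range(8)]
reformatted = ctm_reformat(stream, num_rows=4, num_cols=8)
print(reformatted[:4])
```

Every sample passes through the same single input and the same single output, so in this model there is no per-line element that could impose a fixed pattern on the data.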

Figure 7 Organization of the Two-Dimensional Reformatting Memory

In square arrays \((M = N)\) an additional demultiplexer can be added at the left of the memory array in Figure 7 and an additional multiplexer added at the bottom to allow simultaneous loading and unloading. Full duty cycle is thereby achieved; in addition, the array can be operated as a serial-parallel-serial (SPS) delay line. The latter feature greatly simplifies device evaluation.
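A behavioral sketch of the square-array case follows. It alternates the transfer direction from frame to frame so that one frame is unloaded while the next is loaded, and it keeps the direction fixed to obtain SPS delay-line behavior. The readout orientations (first sample of a row ends at the right, first sample of a column ends at the bottom) and all names are assumptions made for this illustration.

```python
def run_square_ctm(frames, n, corner_turn=True):
    """Pass successive n x n frames (lists of n records of n samples)
    through a square array. With corner_turn=True the transfer direction
    alternates every frame; with corner_turn=False the array acts as a
    serial-parallel-serial (SPS) delay line. Returns the output frames."""
    array = [[None] * n for _ in range(n)]
    outputs = []
    vertical = True
    for frame in frames + [None]:  # one extra pass flushes the last frame
        out_records = []
        for k in range(n):
            record = frame[k] if frame is not None else [None] * n
            if vertical:
                leaving = array[-1]                    # bottom row exits
                array = [record[::-1]] + array[:-1]    # new row enters at top
            else:
                leaving = [row[-1] for row in array]   # right column exits
                array = [[record[n - 1 - i]] + array[i][:-1] for i in range(n)]
            out_records.append(leaving[::-1])          # serial readout order
        outputs.append(out_records)
        if corner_turn:
            vertical = not vertical
    return outputs[1:]  # the first pass only empties the initially blank array


n = 3
frames = [[[(f, p, b) for b in range(n)] for p in range(n)] for f in range(3)]
turned = run_square_ctm(frames, n, corner_turn=True)
delayed = run_square_ctm(frames, n, corner_turn=False)
print(turned[0][0])   # range bin 0 of frame 0 over successive records
print(delayed[0][0])  # record 0 of frame 0, unchanged
```

In this model a new frame enters during every pass, which corresponds to the full duty cycle mentioned above, and the SPS case returns each frame unchanged after a one-frame delay.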

The primary objective of the program executed under Contract No. F19628-77-C-0234 is the development of a prototype analog CCD reformatting memory using two-dimensional charge transfer cells and integral CCD multiplexers and demultiplexers. A secondary objective is the development of CCD process-compatible bipolar transistors for use in high-speed interface circuits. Three chip designs were executed during the program:

- 32 x 32 element prototype CTM
- 64 x 64 element prototype high-speed CTM
- Bipolar circuit test chip.

The 32 x 32 design was accomplished to evaluate the basic memory structure. Design and evaluation are discussed in Sections II and III, respectively. The 64 x 64 element design was improved to accommodate higher operating speeds and is discussed in Sections IV and V. Design and testing of the bipolar chip are discussed in Sections VI and VII.

# SECTION II. 32 x 32 ELEMENT CTM DESIGN

## A. Two-Dimensional Charge Transfer Cell

All basic electrode arrangements for two-dimensional charge transfer arrays employing two-, three-, or four-phase clocking schemes were described by Sequin in 1974. The subject seems to have been only of academic interest, since at the onset of the program the construction and operation of a true two-dimensional charge transfer array had not been described in the literature. The first design task, therefore, involved selection of a topology that was consistent with available fabrication processes.

Consideration of the number of independent clock phases required for two-dimensional charge transfer immediately suggested the use of the "two-phase" CCD process that was developed at Texas Instruments for production of high density CCD digital memories. The salient features of this process include:

- two polysilicon gate levels
- self-aligned CCD well implants
- self-aligned source/drain diffusion
- buried channel CCD capability.

The first 2-D transfer scheme considered was a four-phase approach using diagonally interconnected electrodes. The topology and operation of this structure are illustrated in Figure 8, which shows horizontal transfer. Vertical charge transfer is accomplished by interchanging the $\phi_1$ and $\phi'_1$ clock waveforms. A four-phase clock system is required, and the array can (in principle) be realized with no intralevel interconnections in the charge transfer region. Unfortunately, inclusion of multiplexers and demultiplexers at the edges of the array obviates this apparent advantage, since the diagonal interconnections cannot be extended beyond the edges of the array.

Figure 8  Topology and Clocking Scheme for the Four-Phase Transfer Cell

The primary disadvantage of this approach is the complexity required at the multiplexer and demultiplexer interfaces. Note that charge is entered at the left edge of the array at three different times during one clock period. Charge packets from a CCD demultiplexer will, however, appear simultaneously at the interface. Thus, an extra buffer stage is required between the demultiplexer and the memory array to accommodate the timing differences. Similar problems exist at the array-multiplexer interface.

To simplify these critical interfaces, a more straightforward two-dimensional cell design was adopted that relies on interlevel interconnects within the array. The structure is similar to a three-gate level design discussed by Sequin [Reference 7, Figure 5(b)] and is shown pictorially in Figure 9. The cell consists of two interleaved two-phase CCDs that share a common phase. The V and H transfer gates are both first-level polysilicon electrodes. The H gates are continuous polysilicon in the vertical direction, and columns are interconnected with metal. The V gates rely solely on metal interconnects. The common phase is second-level polysilicon that completely covers the array except for the openings required for contact to the first-level gates. The hatched areas indicate the barrier regions in the two-phase transfer stages. Note that horizontal and vertical transfers are accomplished simply by enabling the corresponding clock waveform. Further, all charge packets enter or leave the array simultaneously, thereby allowing the interface to be accomplished with a single transfer gate between demultiplexer and array and another between array and multiplexer.

Since both topologies require intralevel interconnections within the array, the three-phase transfer cell was identified as exhibiting the lowest overall technical risk. The remainder of this section describes the design considerations applied to the realization of the 32 x 32 CTM using the three-phase transfer cell.

Figure 10 shows the channel stop, gate, and contact levels for the three-phase transfer cell designed to be compatible with the design rules for the "two-phase" CCD process. Critical feature dimensions are labeled A through E and are discussed below. The horizontal (H), vertical (V), and common phases are indicated for comparison with Figure 9.

Figure 9  Topology and Clocking Scheme for the Three-Phase Transfer Cell

C = Common Phase

Figure 10  Channel Stop, Gate and Contact Levels for the Three-Phase Transfer Cell Showing Critical Dimensions

The minimum center-to-center spacing (pitch) of the transfer cells is dictated by the process design rules that are set by photomask geometry and alignment tolerances. This distance can be determined from Figure 10 and is given by \( P = B + E + 2(A + C + D) \), where

- \( A \) = minimum intralevel poly separation, 0.3 mil (7.6 \( \mu \)m)
- \( B \) = minimum poly line width, 0.3 mil (7.6 \( \mu \)m)
- \( C \) = minimum poly-channel stop overlap, 0.2 mil (5.1 \( \mu \)m)
- \( D \) = minimum contact-poly separation, 0.2 mil (5.1 \( \mu \)m)
- \( E \) = minimum contact dimension, 0.3 mil (7.6 \( \mu \)m)

The resulting pitch determined by the "two-phase" design rules is \( P = 2.0 \) mils (50.8 \( \mu \)m). Smaller values of \( P \) could be achieved at the expense of reduced yield; however, for the purpose of designing this prototype structure the design rules were strictly observed.
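The pitch arithmetic can be checked with a short script. This is only an illustrative sketch: the dictionary and function names are invented here, and the dimensions are the design-rule values listed above.

```python
MIL_TO_UM = 25.4  # 1 mil = 0.001 inch = 25.4 micrometers

# Design-rule dimensions in mils, as listed in the text.
rules_mil = {
    "A": 0.3,  # minimum intralevel poly separation
    "B": 0.3,  # minimum poly line width
    "C": 0.2,  # minimum poly-channel stop overlap
    "D": 0.2,  # minimum contact-poly separation
    "E": 0.3,  # minimum contact dimension
}

def cell_pitch_mil(r):
    """Minimum cell pitch P = B + E + 2(A + C + D), in mils."""
    return r["B"] + r["E"] + 2 * (r["A"] + r["C"] + r["D"])

pitch_mil = cell_pitch_mil(rules_mil)
pitch_um = pitch_mil * MIL_TO_UM
print(f"P = {pitch_mil:.1f} mil = {pitch_um:.1f} um")
```

Relaxing any single rule in `rules_mil` shows how much pitch could be recovered, at the yield cost noted above.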

Figure 11 is a photomicrograph of the completed chip showing the details of the two-dimensional charge transfer cell. The orientation is the same as in Figure 10, and the metal interconnections among the individual V transfer gates and the H gate column can be seen. Cell interconnects are redundant to provide a measure of protection against a few open metal lines.

B. Input Demultiplexer

The input demultiplexer accomplishes the serial-to-parallel conversion required to load the signal charge packets into the memory array. In its simplest form a CCD demultiplexer consists of a serial shift register having an output tap on each stage. The input demultiplexer for the CTM is more complex due to constraints imposed by the desired charge-coupled interface with the memory array.
Figure 11  Photomicrograph of the Two-Dimensional Transfer Cell
The pitch of the shift register must match that of the two-dimensional transfer cells. This requirement is inconsistent with maintaining good charge transfer efficiency (CTE) in the shift register and requires the use of a four-phase delay stage having 0.5 mil gate length per phase. To further improve the CTE capabilities of the shift register, each phase includes a depletion well implant that results in a 0.2 mil barrier length. In addition, the entire channel is implanted to operate in a buried channel mode.

The charge-coupled interface requires that the parallel transfer be accomplished by means of contiguous polysilicon transfer gates. This complicates the layout due to intralevel polysilicon separation considerations.

Figure 12 shows the channel and gate levels of the resulting demultiplexer design. Parallel transfer is accomplished by inhibiting the \( \phi_3 \) clock and enabling the XFR gate. Charge stored beneath the \( \phi_2 \) electrode is then transferred to the XFR gate and subsequently to the second-level gate in the two-dimensional transfer cell. The interface between the \( \phi_2 \) and XFR electrodes is slanted to accommodate intralevel gate separation requirements, to provide increased gate width, and to minimize charge trapping in the demultiplexer, which could degrade CTE during the serial load operation. A photomicrograph of the completed chip showing the demultiplexer-memory array interface appears in Figure 13. Note that incomplete charge transfer in the parallel transfer operation results only in an apparent attenuation of the signal, so long as the charge remaining in the demultiplexer is completely cleared during the subsequent load operation.

Figure 12 Channel and Gate Levels of the Demultiplexer Structure

Figure 13  Photomicrograph of the Demultiplexer - Memory Array Interface

The serial input circuit consists of an input diode and two surface channel gates. A standard CCD output circuit consisting of a reset amplifier and a source-follower output buffer is included in each demultiplexer to fully characterize the performance of the structure.

C. Output Multiplexer

The function of the output multiplexer is to perform a parallel-to-serial conversion on the 32 signal charge packets that are simultaneously shifted out of the memory array. In addition to the interface considerations applied to the demultiplexer design, the multiplexer interface must include a means for clearing any residual charge following the parallel transfer. This function is inherent in the operation of the demultiplexer, and its neglect in the multiplexer can lead to a severe degradation of the apparent CTE of the memory array. The shift register portion of the multiplexer structure employs the same four-phase buried channel scheme used in the demultiplexer.

The channel and gate levels of the multiplexer structure are illustrated in Figure 14. The parallel transfer is accomplished between an extended second-level memory array electrode and the $\phi_2$ multiplexer gate by means of a short first-level transfer gate. As in the demultiplexer, the interface is slanted to accommodate intralevel gate spacing and to provide maximum channel width. An individual drain diode is included in each multiplexer cell and is connected to the extended output electrode by means of a short tab added to the adjacent first-level memory array gate. The diode is reverse-biased with a positive dc voltage, and any residual charge under the extended second-level electrode is drained when the adjacent first-level electrode is clocked to a high level prior to each parallel transfer. A photomicrograph of the completed chip showing the output multiplexer-memory array interface appears in Figure 15.

Each multiplexer includes a serial "fat zero" input circuit, which is also used to characterize the serial shift register. The two multiplexer channels are joined in a common output diode, and charge sensing is accomplished with a single reset amplifier.
Figure 14 Channel and Gate Levels of the Multiplexer
Figure 15  Photomicrograph of the Multiplexer - Memory Array Interface
D. 32 x 32 CTM Chip Organization

Figure 16 is a photomicrograph of the completed 32 x 32 CTM chip. The leads required to make connection to the memory array clocks, transfer gates, and diode drain were routed between the input and output circuits of the multiplexer and demultiplexers. Due to these interconnection requirements, it was not possible to incorporate a single input circuit for use with both demultiplexers. The dc offset between the individual demultiplexer input circuits (due to threshold voltage variations) is corrected using external dc-level adjustments. The die dimensions are 112.5 mils x 114 mils (2.86 mm x 2.90 mm), and the die includes 47 bond pads. In practice, the multiplexer clocks and \( \phi_1, \phi_2, \) and \( \phi_4 \) demultiplexer clocks are bonded in parallel across the chip, thereby allowing the use of a standard 40-pin package.

Experimental results are presented in the following section.
SECTION III
32 x 32 CTM EVALUATION

Three lots (12 slices/lot) of 32 x 32 CTMs were processed during the program. Significant experimental results from each lot are discussed in this section. Device evaluation was accomplished using an exerciser that was designed and constructed to allow parallel clocking of two CTM chips, thereby allowing dual channel (I and Q) operation. Timing diagrams and circuit schematics for the exerciser are included in Appendix A. The clock generation circuits were constructed using Schottky TTL, and the clock driver circuits employed integrated TTL-to-MOS converters. The maximum clock rate is limited to about 7 MHz by capacitive loading of the clock drivers.

A. Lot 501

Completed slices of the first lot were received in March 1978, and several test chips containing the three-phase CTM structure were bonded in 40-pin packages. Initial tests indicated a high incidence of intralevel shorts on the second polysilicon gate level. No completely operational chips were found, although it was apparent that the CCD multiplexers and demultiplexers were operational on a number of devices.

Further investigation revealed that, due to a mixup in process specifications, only half the lot (Slices 7-12) was fabricated using a new polysilicon etch technique developed to eliminate intralevel shorts. It was originally intended that this technique would be used for the entire lot. Subsequently, devices from Slices 9 and 10 were bonded and tested, and a number of operational units were observed. Successful operation was demonstrated at a program review on 14 April 1978.

In the course of further testing it was found that one CCD demultiplexer (at the top of the structure in Figure 16) was not operational as a result of a minor design error. Figure 17 illustrates 5 MHz operation of the 32 x 32 memory in the corner-turning mode, in which the memory is filled column-by-column by the demultiplexer at the left, then emptied row-by-row through the multiplexer at the bottom of the array. Figure 17(a) shows the triangular test waveform and CTM output on the same time-scale and illustrates a 50% duty cycle in the memory due to operation with a single demultiplexer. Figure 17(b) shows an expanded view of the CTM output and illustrates the 32X time compression of the test waveform due to the corner-turning operation. The fine structure in Figure 17(b) occurs primarily because of the feedthrough of various clock pulses; however, a significant fixed pattern noise component was observed and is discussed later in this report. The clock rate in the multiplexer and demultiplexer is 5 MHz, and the signal attenuation observed is due partly to a mismatch in input and output node capacitances and partly to the low voltage gain of the source-follower output amplifier.

Figure 17 Memory Operation in the Corner-Turning Mode at a 5 MHz Clock Rate
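The corner-turning operation described above amounts to a matrix transpose. The following minimal sketch models it with an idealized, lossless array. The function name and the index ordering (which end of a column is loaded first, and the order in which rows emerge) are assumptions made for illustration, not details taken from the device.

```python
N = 32  # array dimension of the 32 x 32 CTM

def corner_turn(samples, n=N):
    """Write n*n samples column-by-column, then read them back row-by-row."""
    assert len(samples) == n * n
    array = [[None] * n for _ in range(n)]  # indexed as array[row][col]
    for k, s in enumerate(samples):
        # Input sample k lands in column k // n, row k % n.
        array[k % n][k // n] = s
    # Read out one row at a time.
    return [array[r][c] for r in range(n) for c in range(n)]

frame = list(range(N * N))   # sample indices 0..1023 stand in for a slow ramp
out = corner_turn(frame)
first_line = out[:N]         # first 32-sample output burst
print(first_line[:4])
```

Each 32-sample output line contains input samples spaced 32 apart, so one line spans the whole input frame. This is the 32X time compression seen in Figure 17(b).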

The memory was also tested in an SPS mode in which operation is the same as a 1024-stage delay line. In this case, however, each charge packet experiences only 192 transfers. Results of the SPS mode tests are shown in Figure 18. The 1024-stage delay (204.8 μs at the 5 MHz clock rate) is illustrated in Figure 18(a), while Figure 18(b) shows the response to a square pulse that is commonly used to evaluate CTE. In this case, however, the CTE measurement is complicated by the fixed pattern noise component, and a worst-case value of 0.9998 is based on the noise level observed. To isolate the fixed pattern noise component, it is first necessary to eliminate feedthrough of the low duty cycle clock pulses. Two techniques were employed to accomplish this goal.

Figure 18  (a) Memory Operation in the SPS Mode at a 1 MHz Clock Rate and (b) Pulse Response for Determination of CTE

The first technique employed was an ac-coupled/dc-restored amplifier (Figure 19) at the memory output. As indicated in the figure, the restore switch is closed briefly during the period immediately following the multiplexer output node precharge operation, and the sample switch is similarly pulsed following transfer of the signal charge packet. Thus, level translations at the output node that are present during both restore and sample operations (such as those due to stray coupling of clock pulses) are inhibited; however, stray transitions that occur between application of the restore and sample pulses are not affected. The use of this circuit made it possible to inhibit the feedthrough of the low rate, two-dimensional memory clock pulses, but not the feedthrough due to the multiplexer and demultiplexer parallel transfer pulses.

Figure 19 Diagram of ac-Coupled/dc-Restored Amplifier Used to Inhibit Clock Pulse Feedthrough. (Signal charge is transferred to the CCD output node on the falling edge of $\phi_C$.)
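The behavior of the restore-and-sample scheme can be illustrated with a simple difference model. This is a sketch under idealized assumptions (perfect switches, no droop), not a model of the actual amplifier. The function name and all voltage values are hypothetical.

```python
def restore_and_sample(level_at_restore, level_at_sample):
    """Idealized ac-coupled/dc-restored stage: the output is the change in
    node level between the restore instant and the sample instant."""
    return level_at_sample - level_at_restore

# Hypothetical values, chosen only to illustrate the two cases in the text.
signal = -0.100        # signal step at the output node (V)
static_offset = 0.250  # level shift present at both restore and sample (V)
late_glitch = 0.030    # transition occurring between restore and sample (V)

# A level translation present during both operations cancels.
clean = restore_and_sample(static_offset, static_offset + signal)

# A transition that occurs between the two pulses passes through.
corrupted = restore_and_sample(static_offset,
                               static_offset + signal + late_glitch)

print(f"clean = {clean:.3f} V, corrupted = {corrupted:.3f} V")
```

The second case corresponds to the parallel transfer pulses, which this circuit could not suppress.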

The resulting output waveforms for both the CTM and SPS modes are illustrated in Figure 20. Comparison of Figure 20(a) (one complete CTM cycle, 1024 samples) and Figure 20(b) (one SPS output burst, 32 samples) indicates a high degree of correlation in fixed pattern noise components due to the parallel demultiplexer transfer. (The slant on the CTM response is due to dark current integration at the relatively low clock rate employed.) Figure 20(c) shows the individual 32-sample bursts in the CTM mode and indicates that there is much less fixed pattern noise associated with the parallel multiplexer transfer. The large spurious samples in Figure 20(c) are due to coupling of the multiplexer transfer pulse, as mentioned earlier. The largest component of fixed pattern noise is clearly generated in the parallel demultiplexer transfer.

To completely eliminate the possibility of clock pulse feedthrough, a new timing circuit was designed and constructed. Timing diagrams and circuit schematics are included in Appendix A. By lowering the duty cycle of the memory to 25%, the control pulses can be made to occur between 32-sample data strings, rather than simultaneously with them. Resulting fixed pattern noise components for two CTM chips operating at two clock rates are shown in Figures 21 and 22. In this case post-amplification was employed to achieve unity gain through the memory. The maximum signal excursion is approximately 2 V peak-to-peak. The fixed pattern noise components observed in the figures (neglecting dark current integration) result in dynamic range figures of approximately 36 dB on a frame basis (1024 samples) and better than 40 dB on a line basis (32 samples) for chip No. 11. Note that the fixed pattern noise distributions are not correlated between chips and, except for dark current, are insensitive to clock rate. These results eliminate dark current spikes in the memory array and charge redistribution time constants in the serial-parallel and parallel-serial interfaces as possible sources of the fixed pattern component. Both would result in a clock rate dependence of the noise.

Figure 20 Fixed Pattern Noise Observed in Both CTM and SPS Modes

Figure 21 Fixed Pattern Noise Observed for Chip #11 (51.9). Clock rates are 100 kHz for (a) and (b); 500 kHz for (c) and (d).

Figure 22 Fixed Pattern Noise Observed for Chip #9 (51.9). Clock rates are 100 kHz for (a) and (b); 500 kHz for (c) and (d).

Similar demultiplexer-related effects have been observed in digital CCD memories employing SPS structures. A recent explanation is based on the reduction of the channel width of the receiving well due to the effect of two-dimensional fringing fields. This is not a reasonable explanation for the effect observed in the CTM because both the serial register gate and the receiving well gate are independently clocked. No dependence of fixed pattern noise on clock amplitude (in excess of a minimum value required to initiate charge transfer) has been observed in the CTM.

A more reasonable explanation for the effect is based on two-dimensional variations in surface potential under the serial-register gate that have dimensions comparable with the gate width and larger than its length. These variations can be caused by localized variations in substrate doping or surface state density. Appreciable variations in both quantities are known to occur and are, in fact, the cause of typical MOS transistor threshold voltage variations. Threshold variations between adjacent overlapping gates (less than one mil center-to-center) in CCD multiplexers give rise to fixed pattern channel-to-channel offsets typically from 10 to 25 mV in amplitude. Figure 23 shows a simplified view of the surface potential at the serial-parallel interface in the demultiplexer. Although most of the signal charge is transferred from the demultiplexer $\phi_1$ electrode to the transfer electrode, a fraction remains trapped in a surface potential minimum under the $\phi_1$ electrode. When the demultiplexer $\phi_2$ electrode is clocked to a high level, this trapped charge (or the fraction trapped in minima having dimensions larger than the gate length L) will be transferred from the $\phi_1$ electrode to the $\phi_2$ electrode and hence, to the demultiplexer output circuit during the next serial input cycle. The minima are thereby cleared during each read-in cycle and are "refilled" at each parallel transfer.
Figure 23 Demultiplexer Fixed Pattern Error Due to Substrate Inhomogeneities. Panel labels: charge trapped in surface potential minimum; charge transferred into memory array; trapped charge completely transferred to $\phi_2$ electrode.

Note that the same conditions do not exist at the memory array-multiplexer interface (see Figure 14). The effective channel width of the diode drain is much narrower (0.3 mils) and probably does not result in complete clearing of the minima that must also exist in the parallel-serial transfer stage. These minima remain "filled" at all times and thus do not affect the signal charge packets. The fixed pattern components introduced in the multiplexer should, therefore, be smaller than those introduced in the demultiplexer, as is observed in Figures 20, 21, and 22.

An approximate value for the amount of charge trapped under the demultiplexer $\phi_1$ electrode can be calculated by assuming a linear surface potential gradient sloping away from the parallel transfer gate (as in Figure 23) and having a peak-to-peak variation of $\Delta V = 25$ mV. The area of the well region under the $\phi_1$ electrode is about 0.6 mil$^2$. The gate oxide capacitance is approximately 0.22 pF/mil$^2$, so the gate capacitance over the well is $C_{\text{gate}} \approx 0.13$ pF, and the trapped charge is approximately $n_T = C_{\text{gate}} \frac{\Delta V}{2q} \approx 10^4$ electrons. The peak charge capacity is determined by the area and "depth" of the surface potential well in the charge storage region of the two-dimensional transfer cell. The resulting calculation gives a peak charge capacity, $n_{\text{max}}$, of about $10^6$ electrons. The ratio $n_T/n_{\text{max}}$ is on the order of 1%, resulting in a dynamic range (peak signal to peak noise) of about 40 dB, which is very close to the observed value.
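The estimate above can be reproduced numerically. The sketch below uses only the values quoted in the text. The variable names are illustrative, and the electron charge is the standard physical constant.

```python
import math

Q_E = 1.602e-19            # electron charge (C)
C_OX_PER_MIL2 = 0.22e-12   # gate oxide capacitance (F per square mil)
WELL_AREA_MIL2 = 0.6       # well area under the phi_1 electrode (square mils)
DELTA_V = 25e-3            # peak-to-peak surface potential variation (V)
N_MAX = 1e6                # peak charge capacity quoted in the text (electrons)

c_gate = C_OX_PER_MIL2 * WELL_AREA_MIL2      # total gate capacitance (F)
n_trapped = c_gate * DELTA_V / (2 * Q_E)     # n_T = C_gate * dV / (2q)
ratio = n_trapped / N_MAX
dynamic_range_db = 20 * math.log10(1 / ratio)

print(f"C_gate = {c_gate * 1e12:.3f} pF")
print(f"n_T = {n_trapped:.3g} electrons")
print(f"dynamic range = {dynamic_range_db:.1f} dB")
```

Because $n_T$ scales linearly with $\Delta V$, a tenfold reduction in threshold variation would add 20 dB under this model.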

Reduction of the fixed pattern offsets introduced by this mechanism will directly result from reduction in the spatial variations in substrate doping and surface state density. Recent evidence indicates that substrate doping variations are the dominant cause of MOS threshold variations and that special processing of the starting materials can reduce the threshold variations in proximate devices to 1 to 3 mV.

The offsets can also be reduced by decreasing the width of the demultiplexer channel. This directly reduces the quantity of trapped charge but also increases the effect of the lateral fringing fields due to the parallel transfer gate.
B. Lot 602

The buried channel and metal etch masks for the 32 x 32 CTM were reprocessed in order to:

- Correct the error in the demultiplexer clock line to achieve 100% duty cycle operation
- Correct the buried channel implant mask in the multiplexer serial input circuit
- Add buried channel implant to the reset transistor in the multiplexer output circuit to permit higher operating rates.

Figure 24 illustrates the input/output relationship of the CTM operating with 100% duty cycle at a 200 kHz sample rate. Figure 25 shows the CTM output on an expanded time base. In both photographs, the precharge levels as well as signal are seen in the output waveform.

Operation of the corrected CTM at 5 MHz is illustrated in Figure 26. Operation at this data rate is significantly better than operation of the original lot of CTMs. This is attributed to the addition of a buried channel implant to the output precharge circuit. The upper trace in Figure 26 shows the 5 MHz CTM output, while the lower trace shows the output of subsequent sample-and-hold circuitry, which eliminates the precharge level from the waveform. The observed fixed pattern noise levels were the same as seen for the devices from the previous lot.

C. Lot 620

The third lot of 32 x 32 CTMs was fabricated using high quality 3 in. diameter silicon slices known to have a more uniform oxygen distribution than the typical 2 in. slices used in the previous lots. This new material has significantly improved the performance of CCD imagers by yielding a more uniform dark current distribution. It was anticipated that the better oxygen uniformity would be accompanied by a more uniform substrate doping density resulting in smaller fixed pattern offsets.

Figure 24 100% Duty Cycle CTM Operation

Figure 25 CTM Output

Figure 26 5 MHz 32 x 32 CTM Operation. Horizontal = 5 μs/div

Figure 27 shows input and output waveforms for memory operation at 1 MHz in the SPS mode, corresponding to a 1024-stage shift register. Charge transfer efficiency (CTE) has been computed from the observed step response in this mode shown in Figure 28 (assuming that it is dominated by demultiplexer and multiplexer transfers) and is greater than 0.9999 at 1 MHz.

Operation in the reformatting mode with 100% duty cycle at a 1 MHz rate is illustrated in Figure 29. Figure 29(a) shows the test waveform, which is triggered once per frame (1024 cycles). Figure 29(b) shows memory output over two successive frames and demonstrates the full duty cycle capabilities. Figure 29(c) and (d) are expanded views of the output during the first frame and confirm the reformatting operation which, in this case, corresponds to a 32x time compression of the test waveform.

Figure 30 shows the memory output on a more sensitive scale with a dc input signal and illustrates that fixed pattern noise has been reduced by 6 dB compared with the previous lots. Figure 30(a) corresponds with Figure 29(b) and shows a line-to-line fixed pattern offset distribution having an rms value approximately 43 dB below the maximum peak-to-peak signal illustrated in Figure 29.

Figure 30(b) corresponds with Figure 29(b) and indicates excellent fixed pattern noise performance within each line (32 samples). A slight upward tilting of the waveform can be discerned, which is due to a leakage current component. The tilt is caused by the fact that each consecutive sample in a line was stored in the two-dimensional array for one additional cycle.

Figure 27  SPS Operation of the Memory at 1 MHz. (a) Input waveforms, 0.5 V/div. (b) Memory output, 0.05 V/div. Horizontal scale 200 μs/div.

Figure 28 Step Response Used to Determine CTE. Vertical 20 mV/div, Horizontal 1 μs/div

Figure 29  Reformatting Operation of the Memory at 1 MHz. (a) Input waveform, 0.5 V/div. (b) Memory output, 0.05 V/div. Horizontal scale 200 μs/div. (c) Memory output, 50 μs/div. (d) 5 μs/div.

Figure 30  DC Response in Reformatting Mode Showing Fixed Pattern Noise Components. (a) Frame to Frame, 5 mV/div., 500 μs/div. (b) Line to Line, 5 mV/div., 5 μs/div.

An investigation of spectral distortion due to nonlinearity of the input sampling scheme and the addition of fixed pattern noise to the signal was conducted using these devices. A microprocessor-generated 32-byte sine wave with an accuracy of eight bits was phase-locked to the CTM timing to produce a signal source with low harmonic content, as shown in Figure 31. Ideally, the CTM will reformat the phase-locked signal to reproduce the 32-byte sine wave shifted in frequency by five octaves; i.e., the output signal fundamental occurs at the line rate of the device. Harmonic distortion and fixed pattern contributions at the output can then be evaluated by comparing the resultant spectral components with the signal fundamental.
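The five-octave shift follows directly from the reformatting operation, as this idealized sketch shows. Several details are assumptions made for illustration: each table entry is held for one 32-sample line time (so one sine period fills one 1024-sample frame), the table is coded as offset binary, and the array index ordering is simplified. None of the names come from the original test setup.

```python
import math

N = 32  # array dimension and sine table length

def corner_turn(samples, n=N):
    """Write n*n samples column-by-column, then read them back row-by-row."""
    array = [[None] * n for _ in range(n)]  # indexed as array[row][col]
    for k, s in enumerate(samples):
        array[k % n][k // n] = s
    return [array[r][c] for r in range(n) for c in range(n)]

# 32-entry, 8-bit sine table (assumed offset-binary coding, 0..255).
table = [round(127.5 + 127.5 * math.sin(2 * math.pi * i / N)) for i in range(N)]

# Assumption: each table entry is held for N consecutive input samples.
frame_in = [table[k // N] for k in range(N * N)]
frame_out = corner_turn(frame_in)
lines = [frame_out[i * N:(i + 1) * N] for i in range(N)]

def output_fundamental_hz(sample_rate_hz, n=N):
    """Line rate: one full sine period per n-sample output line."""
    return sample_rate_hz / n

octaves = math.log2(N)  # ratio of frame period to line period, in octaves
print(output_fundamental_hz(250e3), octaves)
```

Under these assumptions every output line reproduces the full sine table, so the fundamental moves from the frame rate to the line rate. At a 250 kHz sample rate this is 7.8125 kHz.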

Figures 32, 33, and 34 show the results obtained with an output signal magnitude of 100 mV peak-to-peak at a 250 kHz sample rate. The output fundamental is at a frequency of 250 kHz ÷ 32 = 7.81 kHz. Figure 32 shows that the second harmonic distortion component has a magnitude less than -30 dB relative to the fundamental, with the magnitude of all other harmonics less than -40 dB. The large second harmonic contribution to the total distortion is characteristic of the "fill and spill" input scheme used in this test. The linearity of the fill-and-spill input technique depends on maintaining a minimum interval between "fill" and transfer into the CCD. This interval is fixed at 25% of one clock period by the exerciser circuitry and leads to higher second harmonic distortion than is ultimately achievable using this input technique. Figure 33 shows the spectral distribution of the intraline fixed pattern noise, which contains components at multiples of the line rate (7.81 kHz). By removing the fat zero charge in the device, the contribution of clock feedthrough to the output noise can be measured as shown in Figure 34. If the clock feedthrough contributions of Figure 34 are "subtracted" from the total fixed pattern spectrum shown in Figure 33, the remaining noise components represent the contributions due to threshold variations that would not be removed from the output signal by subsequent filtering. Care must be taken in the evaluation of the separate contributions that add to the total noise spectrum because the phase relationship between these components is unknown. However, reasonable estimates of the individual contributions can be made. From Figures 33 and 34 it appears that the worst spectral components of fixed pattern noise that can be attributed to threshold variations are at least 48 dB below the signal fundamental shown in Figure 32. For comparison, the rms thermal noise level at the CTM output (dominated by kT/C noise generated at the output node) is calculated to be approximately 60 dB below the peak output signal level. The need for further reduction in the fixed pattern noise introduced in the input demultiplexer clearly requires further effort in both materials preparation and device design.

Figure 31 Input Signal Spectrum. Trace labels: Input Signal, Aliased Images. Horizontal: 1 kHz/div; Vertical: 10 dB/div; Bandwidth: 30 Hz.

Figure 32 Spectrum of Reformatted Data. Trace labels: Reformatted Signal, 2nd Harmonic. Horizontal: 20 kHz/div; Vertical: 10 dB/div; Bandwidth: 300 Hz.

Figure 33 Spectral Components of Fixed Pattern Noise for a 32 x 32 CTM (Leftmost Spike is Zero Hz Marker)

Figure 34 Spectral Components of Clock Feedthrough at Output of 32 x 32 CTM (Leftmost Spike is Zero Hz Marker)
SECTION IV

64 x 64 CTM DESIGN

Having established the viability of the two-dimensional charge transfer concept for the CTM device, a redesign was accomplished to achieve a configuration more closely aligned with the original program objectives (100 x 100 array, 50 MHz operation). Consideration of potential performance versus technical risk led to the decision to expand the CTM structure to a 64 x 64 element array. Although the same basic demultiplexer, multiplexer, and two-dimensional transfer cell designs were employed, a few minor modifications were incorporated for improved performance.

The demultiplexer channel width was reduced from 1.6 mils (40.6 μm) to 0.8 mils (20.3 μm) to improve fixed pattern noise performance. A new output amplifier circuit was included to enhance high frequency performance. A schematic diagram is shown in Figure 35. Specific modifications included:

- increasing the aspect ratio of the buried channel reset transistor from 3/1 in the 32 x 32 design to 6.7/1
- incorporation of two-stage source-follower output buffer amplifier circuit.

With a 500 Ω external load resistor and 5 pF load capacitance, the -3 dB bandwidth of the output stage is predicted to be greater than 100 MHz and the on-chip power dissipation is less than 100 mW.

Figure 36 shows a photomicrograph of the completed chip. The die dimensions are 180 mils by 220 mils (4.57 mm x 5.59 mm). Experimental results are discussed in the following section.
Figure 35 Schematic Diagram of the High-Speed Output Circuit
SECTION V
64 x 64 CTM EVALUATION

The lot of 64 x 64 CTMs was completed in April 1979. Initial evaluation indicated a very low yield due to a high incidence of interlevel shorts between the polysilicon gate levels. The remaining devices were manually probed to eliminate those exhibiting interlevel shorts among the memory array clock phases. A number of operational units were obtained from the presorted chips.

Figure 37 shows the response of a 64 x 64 CTM operated in the SPS mode at a 1 MHz clock rate and shows the resulting 4.096 ms delay. Figure 38 shows 100% duty cycle reformatting operation at a 1 MHz clock rate. The dynamic range of the device is comparable with that observed for the 32 x 32 element CTM on high quality starting material. Figure 39 shows the observed fixed pattern noise in the reformatting mode. Figure 39(a) shows several output fields and indicates that the line-to-line variations are less than 1 mV peak-to-peak. Figure 39(b) shows several lines of the output waveform and indicates that the primary fixed pattern component is due to dark current integration. The amplitudes of the higher frequency components are substantially smaller than 1 mV.
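The SPS delay quoted above is simply the number of array elements divided by the clock rate. The helper below is an illustrative sketch and its name is invented here. The same relation gives the 102.4 μs delay cited later in this section for 40 MHz operation.

```python
def sps_delay_s(n_rows, n_cols, clock_hz):
    """Delay of an n_rows x n_cols memory operated as a serial delay line."""
    return n_rows * n_cols / clock_hz

delay_1mhz = sps_delay_s(64, 64, 1e6)    # 64 x 64 CTM at 1 MHz
delay_40mhz = sps_delay_s(64, 64, 40e6)  # 64 x 64 CTM at 40 MHz

print(f"{delay_1mhz * 1e3:.3f} ms at 1 MHz")
print(f"{delay_40mhz * 1e6:.1f} us at 40 MHz")
```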

Figure 40(a) shows the step response of the 64 x 64 CTM in the SPS mode at a 5 MHz clock rate and is indicative of excellent CTE. Figure 40(b) shows the swept spectrum of the transfer function of the SPS delay line. An estimate of the CTE can be obtained from the departure of the spectrum from the ideal \( \sin(\pi f/f_c)/(|\pi f/f_c|) \) spectrum. The maximum error occurs at the Nyquist rate \( f_c/2 \) and a conservative estimate of CTE = 0.9999 results.

Figure 37  SPS Operation of the 64 x 64 CTM at 1 MHz

Figure 38 100% Duty Cycle Reformatting Operation of the 64 x 64 CTM at 1 MHz

Figure 39  Observed Fixed Pattern Noise Levels in the 64 x 64 CTM. (a) Line-to-line and (b) Sample-to-sample in one line.

Figure 40  (a) Step Response in SPS Mode at a 5 MHz Clock Rate. (b) Swept Spectrum of SPS Delay Line Transfer Function from Which CTE = 0.9999 is Estimated.

These devices were evaluated using a timing circuit that was designed for high speed operation using ECL components. Discrete clock driver circuits were designed using Schottky diode clamped VHF power transistors. The timing and driver circuits are included in Appendix A. The ECL circuitry was capable of generating the necessary waveforms for operation in excess of 50 MHz (200 MHz master clock rate). However, variations in propagation delay in the ECL-TTL converters used in the driver circuits limited high frequency operation to a maximum clock rate of 40 MHz.

Figure 41 shows operation in the reformatting mode at a 25 MHz clock rate. Note that a number of clock "glitches" are discernible on the output waveform.

Fixed pattern noise performance at 25 MHz is shown in Figure 42. Figure 42(a) shows one line of the output waveform and indicates that the dominant noise component is due to capacitive coupling of the memory and transfer clock waveforms to the output node. Figure 42(b) shows one frame of the output waveform and indicates a spurious response during the first line of each field. This response has been traced to coupling of the demultiplexer transfer pulse to the demultiplexer input node. The remaining fixed pattern noise appears to be 2 to 3 mV peak-to-peak, which is 36 to 40 dB below the peak signal level. As anticipated, the dynamic range is slightly reduced at this higher clock rate.

Figure 43(a) shows the response of the 64 x 64 CTM to a triangular test waveform in the SPS mode at a 40 MHz clock rate. The test waveform was triggered at the frame rate so that the 102.4 μs delay is indicated by the interval between the output waveforms. Figure 43(b) shows more detail of the output waveform during the flat portion of the output in Figure 43(a) and illustrates the complexity of the waveform due to coupling of the multiplexer clock pulses to the output node. Note that further sampling of this waveform will require an extremely narrow sampling aperture to avoid errors caused by timing circuit phase noise and the steep slope of the output waveform during the signal interval.

Figure 44 shows the results of an independent measurement of the bandwidth of the two-stage source follower output amplifier. The top trace is
Figure 41 64 x 64 CTM Output at 25 MHz Rate
Figure 42 64 x 64 CTM Output Fixed Pattern Noise at 25 MHz Rate
Figure 43 Output Waveforms for the 64 x 64 CTM in the SPS Mode at a 40 MHz Clock Rate
Figure 44  Output Amplifier Frequency Response. Top Trace: tracking generator output level. Bottom Trace: amplifier output.
the output of a tracking generator, which is ac-coupled to the $V_{\text{reset}}$ node of the reset switch transistor. The dc operating point was established through a 50 Ω resistor and the gate of the switch transistor was maintained at +15 V to ensure operation in the linear region. The external load attached to the output node was a 1000 Ω resistor in parallel with a load capacitance of approximately 6 pF. The lower trace in Figure 44 shows the response measured with a FET probe at the output pin. The voltage gain of the amplifier is 0.45 (-7 dB) and the -3 dB bandwidth is approximately 50 MHz. The bandwidth can be increased by reducing the value of the load resistor and increasing the $V_{\text{reset}}$ level to increase the drain current in the output stage.

All of the operational 64 x 64 CTM devices eventually failed during the high frequency tests. The last operational device was used to obtain the photographs shown in Figure 43 and failed before the reformatting mode could be tested. The problem appears to be related to the higher peak-to-peak levels of the clock waveforms that were required for successful high-speed operation and to the questionable integrity of the interlevel oxide as suggested by the low initial yield. It is suspected that the higher clock levels are required to offset voltage drops across the substrate, which can become appreciable at high clock rates.

Another set of devices (30) was bonded during the preparation of this report. One "semioperational" device was identified, and it failed during initial evaluation.

Because of the excellent yield obtained for the 32 x 32 CTM lots, it is apparent that the low yield observed for the 64 x 64 CTM is due to a processing error. Inspection of the processing history of this lot failed to uncover an obvious error. It is anticipated that fabrication of a second lot (with particular attention paid to the deposition of the interlevel oxide) will be required to fully evaluate the 64 x 64 CTM device.
SECTION VI
BIPOLAR TEST BAR DESIGN

At the design review held in April 1978 it was decided that the MESFET amplifier development discussed in the proposal would be replaced with the development of a process-compatible bipolar device. The results of an exploratory development program indicated the following potential advantages of a bipolar emitter-follower CCD output stage:

- Lower CCD output node capacitance, resulting in significant voltage gain as compared to present output techniques.

- Lower amplifier output impedances, resulting in potentially higher output circuit bandwidths.

- Elimination of the precharge operation (in some applications), a considerable simplification for high speed operation.

These advantages are discussed in more detail in the remainder of this section, following a review of the high-speed capabilities of conventional CCD output circuits.

A. High-Speed CCD Output Circuits

Figure 45 illustrates two typical circuits used to accomplish the charge/voltage conversion and buffering in CCD output circuits at low data rates. Figure 45(a) shows a reset amplifier in which signal charge is transferred to an output diode that is reset once per clock cycle to erase the collected charge. Figure 45(b) shows the so-called "floating gate" output circuit in which charge is sensed via capacitive coupling to the surface potential under the sense electrode. The signal charge is effectively erased when it is transferred under the φ2 electrode and subsequently to a reverse-biased diode drain. Since there is no dc term associated with the
Figure 45 Operation of Conventional CCD Output Circuits
(a) Reset Amplifier, (b) Floating Gate Amplifier
charge sensing operation, the reset transistor need only be operated at low
duty cycles to obviate the effects of leakage current, which can result in a
slight time dependence of the \( V_{\text{ref}} \) level (shown exaggerated in the figure).

Both circuits result in a sample-and-hold waveform that can be
approximated by the rectangular waveform shown in Figure 46(a) for the case of
a dc ("fat zero") input signal. The period of the waveform is \( T = 1/f_c \) where
\( f_c \) is the clock frequency and the duration of the signal interval is \( \tau \), which
is taken to be \( 3T/4 \) in the example. Neglecting the \( V_{\text{ref}} \) offset, the
resulting output spectrum is shown in Figure 46(b) and consists of a
distribution of \( \delta \)-functions occurring at integer multiples of the clock
frequency \( f_c \) and weighted by the aperture function \( \sin (\pi f \tau)/(\pi f \tau) \).

Figure 46(c) shows the situation when signal samples \( g(nT) \) are added in
the output. The continuous function \( g(t) \) has a band-limited spectrum \( G(f) \)
shown in Figure 46(d). Neglecting the fat zero component and assuming that
the output charge transfer can be modelled by a \( \delta \)-function current source
\( I_0(t) \) having a constant Fourier transform \( I_\omega \), the analysis in Appendix B
shows that the spectrum of the output voltage \( V_{\text{out}}(f) \) for the reset
amplifier is given by

\[
V_{\text{out}}(f) = \frac{\tau}{T} \frac{I_\omega}{C_{\text{out}}} \frac{\sin \pi f \tau}{\pi f \tau} \sum_{n=-\infty}^{\infty} G(f + nf_c)
\]

where \( C_{\text{out}} \) is the total capacitance of the output node and the infinite sum
is indicative of aliasing of the spectrum \( G(f) \) about multiples of \( f_c \). The
resulting output spectrum appears in Figure 46(e). Note that in the ideal
case \( \tau = T \), and the \( \sin x/x \) function optimally filters the clock "noise" due to
the fat zero term.
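
The filtering property of the aperture function can be illustrated numerically. The sketch below writes τ for the duration of the signal interval and evaluates sin(πfτ)/(πfτ) at the first few clock harmonics and at the Nyquist frequency f_c/2. The clock frequency is normalized to 1 and the function name `aperture` is hypothetical.

```python
import math

def aperture(f: float, tau: float) -> float:
    """Aperture weighting sin(pi f tau) / (pi f tau)."""
    x = math.pi * f * tau
    return 1.0 if x == 0.0 else math.sin(x) / x

f_c = 1.0        # normalized clock frequency (assumption: only ratios matter)
T = 1.0 / f_c    # clock period

for label, tau in (("tau = 3T/4", 0.75 * T), ("tau = T", T)):
    # Weighting applied to the clock harmonics n * f_c, n = 1, 2, 3.
    weights = [abs(aperture(n * f_c, tau)) for n in (1, 2, 3)]
    # Signal attenuation at the upper edge of the Nyquist interval.
    nyquist_db = 20.0 * math.log10(aperture(f_c / 2.0, tau))
    print(label, [f"{w:.3f}" for w in weights], f"{nyquist_db:.2f} dB at f_c/2")
```

For τ = T the nulls of the aperture function fall on the clock harmonics, which is the sense in which the fat zero clock components are filtered. The cost is a larger roll-off at f_c/2 than for τ = 3T/4.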

Potential device-related clock rate limitations include operation of the
reset switch in the reset amplifier and the bandwidth of the output buffer for
both configurations. A practical limitation of the floating gate approach is
due to capacitive coupling of the clock waveforms applied to the electrodes
Figure 46 Frequency Response of the Reset Amplifier. (a) Output waveform and (b) spectrum for "fat-zero" component. (c) Output waveform for input signal having the spectrum shown in (d). The resulting output spectrum is shown in (e).
adjacent to the sense electrode. This effect can be reduced by maintaining the adjacent electrodes at appropriate dc levels to act as electrostatic shields. This results in an unavoidable increase in the effective gate length of the sense electrode, causing an increase in the transfer time constant, which itself limits the maximum operating rate. For these reasons the floating gate amplifier is seldom used where high clock rates are desired, and further discussion is limited to the reset amplifier configuration.

Figure 47 shows an equivalent circuit of the reset amplifier. Charge transfer is modelled by the current source $I_s$ which provides current pulses having amplitude $-I_s$ and duration $\tau_s$ so that the total charge transferred is $Q_s = -I_s \tau_s$. The total output node capacitance is

$$C_{out} = C_{gs1} + C_D + C_{gd2} + C_{gs2} (1 - A)$$

where $C_{gs1}$ = gate-source capacitance of M1

$C_D$ = reverse bias capacitance of output diode

$C_{gd2}$ = gate-drain capacitance of M2

$C_{gs2}$ = gate-source capacitance of M2

$A$ = voltage gain of the output stage (M2, $G_L$)

The transient associated with the reset operation is determined by the drain-source conductance of M1 and capacitance $C_{out}$. Assuming that the amplitude of the reset pulse $V_{reset}$ is larger than $V_{ref}$, then M1 operates in the linear region, since the peak drain-source voltage corresponds to the CCD output signal, which is typically less than one volt.

The drain-source conductance of M1 is

$$G_{DS} = 2 K V_{GS0} W/L$$

where $K = \mu \epsilon_{ox}/t_{ox} \approx 10^{-5}$ for typical devices having 800 to 1000 Å oxide thickness. $V_{GS0}$ is the gate-source voltage during the reset, and $W$ and $L$
Figure 47 Equivalent Circuit for the Reset Amplifier
are the width and length of the transistor gate region. The reset-time constant is

$$\tau_{\text{reset}} = \frac{C_{\text{out}}}{(2K V_{\text{GS0}} W/L)}$$

Typical values of the parameters are $C_{\text{out}} \sim 0.25\, \text{pF}$, $V_{\text{GS0}} \sim 2\, \text{V}$, and $W/L \sim 3$ giving $\tau_{\text{reset}} \sim 2\, \text{ns}$ and resulting in a $-3\, \text{dB}$ response frequency of about 80 MHz. Since a low duty cycle is desirable for the reset operation, this response time is marginal for device operation in excess of 40 MHz.

The value of $\tau_{\text{reset}}$ can be substantially reduced by employing a depletion transistor for $M_1$, thereby increasing $V_{\text{GS0}}$. A buried channel device exhibits a threshold voltage of about $-6\, \text{V}$ at the bias levels employed in the output circuit, and results in a typical $V_{\text{GS0}}$ of 8 V. Thus $\tau_{\text{reset}}$ can be reduced to approximately 0.5 ns which is sufficient for operation at a 50 MHz clock rate with a 25% reset pulse duty cycle.
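
The reset time constants quoted above follow directly from the expression for $\tau_{\text{reset}}$. The sketch below is a minimal check using the typical values given in the text; the function and variable names are hypothetical.

```python
import math

def reset_time_constant(c_out: float, k: float, v_gs0: float, w_over_l: float) -> float:
    """tau_reset = C_out / (2 K V_GS0 W/L), with M1 in the linear region."""
    g_ds = 2.0 * k * v_gs0 * w_over_l   # drain-source conductance of M1
    return c_out / g_ds

C_OUT = 0.25e-12   # output node capacitance, F
K = 1.0e-5         # device constant from the text (approximate)
W_OVER_L = 3.0

tau_enh = reset_time_constant(C_OUT, K, 2.0, W_OVER_L)   # enhancement M1, V_GS0 = 2 V
tau_dep = reset_time_constant(C_OUT, K, 8.0, W_OVER_L)   # depletion M1, V_GS0 = 8 V
f_3db_enh = 1.0 / (2.0 * math.pi * tau_enh)

# Reset pulse width at a 50 MHz clock rate with a 25% duty cycle.
pulse_width = 0.25 / 50e6
n_tau = pulse_width / tau_dep

print(f"tau (2 V) = {tau_enh * 1e9:.2f} ns, -3 dB at {f_3db_enh / 1e6:.0f} MHz")
print(f"tau (8 V) = {tau_dep * 1e9:.2f} ns, pulse spans {n_tau:.1f} time constants")
```

With the depletion device the 5 ns reset pulse spans roughly ten time constants, which is consistent with the statement that this value is sufficient for 50 MHz operation.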

The function of the source-follower output amplifier is to provide a buffer between the low capacitance CCD output node $C_{\text{out}}$, and the output load capacitance $C_L$ due to package parasitics and external circuitry. Typical values of $C_L$ are 5 to 10 pF, and, while it is desirable to keep $C_{\text{out}}$ as small as possible, typical values range from 0.1 to 0.25 pF. Thus, a worst-case capacitance ratio of 100 must be taken into account in the amplifier design, while maintaining the desired bandwidth and power dissipation parameters.

The capacitance buffering capabilities of a single MOS source-follower stage can be calculated from small signal device parameters at the established bias levels. The following analysis is based on typical device parameters for 0.3 mil gate lengths fabricated using the self-aligned gate NMOS process.
The input capacitance of a source-follower stage is

\[ C_{in} = C_{gd} + C_{gs} (1-A) \]

\( C_{gd} \) is the gate-drain capacitance in saturation

\[ C_{gd} = W C_{edge} + W L' C_{gate}/2 \]

where \( C_{edge} \) is the fixed capacitance due to the small gate-drain overlap, \( C_{gate} \) is taken as the gate oxide capacitance, and \( L' \) is the effective gate length that takes into account lateral diffusion of the drain and source regions (\( L' = 0.22 \) mils). \( C_{gs} \) is the gate-source capacitance and is equal to \( C_{gd} \) in saturation and \( A \) is the voltage gain of the source-follower stage given by

\[ A = \frac{g_m + j \omega C_{gs}}{g_m + G_L + j \omega (C_L + C_{gs})} \]

Assuming that \( C_{gs} \ll C_L \), the -3 dB cutoff frequency is given by

\[ \omega_C = (g_m + G_L)/C_L \]

The value of the \( G_L \) is based on power dissipation requirements

\[ G_L = 2 \frac{P_{DMAX}}{V_{DD}^2} \]

where \( P_{DMAX} \) is the maximum desirable power dissipation and \( V_{DD} \) is the stage power supply voltage. It is assumed that \( G_L \) will be furnished off-chip so that \( P_{DMAX} \) is taken to be 200 mW with \( V_{DD} = 15 \) V. The resulting value of \( G_L \) is 1.78 mmhos.
The required $g_m$ is then dictated by

$$g_m = \omega_C C_L - G_L$$

and is related to device parameters by

$$g_m^2 = 2K I_D \frac{W}{L}$$

where $K \approx 10^{-5}$. The gate width is then

$$W = \frac{g_m^2 L}{2K I_D}$$

$$= \frac{g_m^2 L}{K V_{DD} G_L},$$

where it has been assumed that the voltage drops across $G_L$ and the transistor are equal. The input capacitance is then calculable.

Taking $C_L = 10$ pF and $\omega_C = 2 \pi \times 100$ MHz, the required $g_m$ is $4.5$ mmhos and $A = 0.72$. The resulting gate width is $W = 23.4$ mils. With $C_{\text{edge}} = 0.009$ pF/mil and $C_{\text{gate}} = 0.22$ pF/mil$^2$ the input capacitance is $C_{\text{in}} = 1.15$ pF, which is too large to be applied to the CCD output node. Further decrease in the input capacitance can only be achieved at the expense of higher power dissipation or inclusion of a second source-follower stage to provide additional buffering.

Repeating the design process, the second stage power dissipation is ratioed by the capacitance reduction of the output stage:

$$P_D = P_{\text{DMAX}} \frac{C_{\text{in}}}{C_L} = 23 \text{ mW}.$$ 

It is assumed that the same $V_{DD}$ supply is used, thus $G_{L2} = 204$ $\mu$mhos, the required $g_m$ is 518 $\mu$mhos and $A = 0.72$. The gate width is $W_2 = 2.6$ mils and the resulting input capacitance is $C_{\text{in}} = 0.16$ pF, which is a reasonable
value for application to the CCD output circuit. The overall voltage gain of the two-stage amplifier is 0.52 and the -3 dB bandwidth is 64 MHz.
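
The paper design above can be retraced numerically. The sketch below is a minimal reconstruction under stated assumptions and is not the design procedure actually used. It takes the low-frequency gain as $g_m/(g_m + G_L)$, uses the first-stage input capacitance of 1.15 pF as quoted rather than recomputing it, assumes a gate length of 0.3 mil, and models the cascade as two identical single-pole stages. All function and variable names are hypothetical.

```python
import math

K = 1.0e-5          # device constant quoted in the text (approximate)
V_DD = 15.0         # stage supply voltage, V
F_STAGE = 100e6     # per-stage -3 dB target, Hz
L_GATE_MILS = 0.3   # gate length in mils (assumption, see lead-in)

def design_stage(c_load: float, p_max: float):
    """Return (G_L, g_m, A, W in mils) for one source-follower stage."""
    omega_c = 2.0 * math.pi * F_STAGE
    g_load = 2.0 * p_max / V_DD ** 2        # G_L = 2 P_DMAX / V_DD^2
    g_m = omega_c * c_load - g_load         # from omega_C = (g_m + G_L) / C_L
    gain = g_m / (g_m + g_load)             # low-frequency voltage gain
    width = g_m ** 2 * L_GATE_MILS / (K * V_DD * g_load)
    return g_load, g_m, gain, width

C_L = 10e-12             # external load capacitance, F
P_DMAX = 0.200           # output stage power budget, W
C_IN_STAGE1 = 1.15e-12   # input capacitance of stage 1, taken from the text

g_l1, g_m1, a1, w1 = design_stage(C_L, P_DMAX)

# Second stage: power scaled by the capacitance reduction of the output stage.
p_d2 = P_DMAX * C_IN_STAGE1 / C_L
g_l2, g_m2, a2, w2 = design_stage(C_IN_STAGE1, p_d2)

overall_gain = a1 * a2
# Two identical single-pole stages: |H|^2 = 1/2 when (f/f0)^2 = sqrt(2) - 1.
overall_bw = F_STAGE * math.sqrt(math.sqrt(2.0) - 1.0)

print(f"stage 1: G_L = {g_l1 * 1e3:.2f} mmho, g_m = {g_m1 * 1e3:.2f} mmho, A = {a1:.2f}, W = {w1:.1f} mil")
print(f"stage 2: G_L = {g_l2 * 1e6:.0f} umho, g_m = {g_m2 * 1e6:.0f} umho, A = {a2:.2f}, W = {w2:.1f} mil")
print(f"overall: gain = {overall_gain:.2f}, bandwidth = {overall_bw / 1e6:.0f} MHz")
```

Under these assumptions the conductances, gains, second-stage width, overall gain, and bandwidth agree with the quoted values. The first-stage width comes out slightly below the quoted 23.4 mils, which may reflect rounding of $K$ or $L$.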

Although the paper design presented above is not optimized, it is indicative of the performance available from two-stage NMOS source-follower amplifiers. The primary tradeoff is (as usual in MOS designs) bandwidth versus power dissipation for a fixed load capacitance. Note that the results are independent of oxide thickness, since that parameter affects W and Cgate in a complementary manner. However, the results are approximately proportional to the square of the minimum attainable gate length (L), since that parameter affects W and the expressions for Cgs and Cgd in a linear manner.

It is concluded that conventional CCD output circuits can be employed at clock rates near 50 MHz. This may be expanded to the 80 to 90 MHz region with shorter gate lengths. The on-chip power dissipation is high and may be prohibitive in multichannel circuits requiring several independent output amplifiers. In addition, generation of the short duty cycle reset pulse waveform at MOS drive levels is a significant problem area.

Figure 48 shows a schematic diagram of a high-speed MOS buffer amplifier that was included on the 32 x 32 CTM test bar. This circuit employs three source-follower stages and a source-coupled voltage gain stage to compensate for the low gain of the source-followers. SPICE circuit simulation predicted a voltage gain of 1.3 (2.1 dB), a -3 dB bandwidth of 80 MHz, and a power dissipation of 125 mW when driving an external load of 1 kΩ in parallel with 10 pF.

Figure 49 shows the predicted and measured frequency response for the high-speed amplifier. Although the measured bandwidth (50 MHz) is lower than predicted, the circuit exhibits slightly more gain (6 dB) than expected. Thus, the measured gain-bandwidth product is in good agreement with the predicted value. Further analysis indicated that the primary sources of the
Figure 49  Measured and Predicted Frequency Response of the High Speed Output Buffer
unexpected bandlimiting were an underestimation of the device capacitances and excessive load impedance in the gain stage due to unexplained shifts in the threshold voltage of the buried channel load transistors.

B. Bipolar CCD Output Circuit

Based on consideration of the difficulties associated with operating a conventional CCD output circuit at high clock rates, an earlier output scheme was reconsidered. Figure 50 shows a schematic diagram that illustrates the approach. The reset switch is replaced with a resistor connected to the Vref supply. The reset operation can be considered to be continuous, rather than occurring at a specified time during the clock cycle. The RCout time constant of the output node results in zeroth order integration.

An analysis of the response of this circuit using the same assumptions applied previously to the reset amplifier is included in Appendix B and shows that the spectrum of the output waveform is given by

\[ V_{\text{out}}(f) = \frac{RC_{\text{out}}}{T} \frac{I_\omega}{C_{\text{out}}} \left[ 1 + (2 \pi f RC_{\text{out}})^2 \right]^{-1/2} \sum_{n=-\infty}^{\infty} G(f + nf_c) \]

The output circuit transfer function \( H_{\text{out}}(f) = V_{\text{out}}(f)/G(f) \) is shown in Figure 51 for the two circuit configurations with \( 0 < f < f_c/2 \). \( \tau/T \) is used as a parameter for the reset amplifier, while \( RC_{\text{out}}/T \) is the parameter for the present case. \( C_{\text{out}} \) is assumed to be the same for both cases.

It is seen that the response for \( RC_{\text{out}}/T = 0.25 \) is approximately the same as for \( \tau/T = 0.5 \) except for an additional 8.5 dB attenuation. Similarly, the response for \( RC_{\text{out}}/T = 0.125 \) is nearly the same as that for \( \tau/T = 0.25 \), except for an additional 12 dB attenuation. Note, however, that whereas the noise introduced in the reset amplifier is effectively sampled and thereby completely aliased into the Nyquist interval \( (0 < f < f_c/2) \), the noise
Figure 50 Schematic Diagram and Output Waveforms for the RC Output Circuit
generated in the RC output circuit can be further bandlimited in external circuitry, leading to substantially lower noise.

Thus, if $C_{out}$ is reduced by a factor of 3 to 4, the performance of the RC output circuit very closely approaches that of the reset amplifier, and it is not necessary to generate the high speed reset pulse. This is the rationale that led to the application of the bipolar output stage to high speed devices, since the effective output capacitance can be dramatically reduced. The bipolar device is integrated in the CCD output circuit. The structure is realized by adding one mask and two extra ion implantation steps near the end of the CCD process to fabricate a nondepleted n-base region and a self-aligned $p^+$ emitter. This combination forms a vertical PNP transistor structure, with the substrate serving as the collector, that operates as an emitter follower. The emitter can be biased with an external resistor to provide a low output impedance for direct connection to external circuitry. The dominant output node capacitance is the base-collector capacitance of the bipolar device, since the load capacitance is buffered by the $\beta$ of the device. Figure 52 shows an equivalent circuit of the bipolar output amplifier.
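
The "factor of 3 to 4" can be related to the 8.5 dB and 12 dB attenuation figures quoted earlier. The sketch below assumes that, with $RC_{out}/T$ held fixed, the output amplitude scales as $1/C_{out}$, so that an attenuation of a given number of decibels is recovered by reducing $C_{out}$ by the corresponding voltage ratio. The function name `capacitance_reduction` is hypothetical.

```python
def capacitance_reduction(attenuation_db: float) -> float:
    """Factor by which C_out must shrink to recover attenuation_db of amplitude,
    assuming the output amplitude scales as 1 / C_out."""
    return 10 ** (attenuation_db / 20.0)

for att_db in (8.5, 12.0):
    factor = capacitance_reduction(att_db)
    print(f"{att_db:4.1f} dB -> reduce C_out by a factor of {factor:.2f}")
```

Under this assumption the two attenuation figures correspond to reduction factors of roughly 2.7 and 4.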

When a charge packet flows into the base region of the PNP transistor, it turns on and draws emitter current that is $\beta$ times the base current equivalent of the signal charge. This current gain allows a much larger drive capability. When the charge packet is annihilated by recombination in the base, the PNP transistor switches off and is automatically reset for the next charge packet. Contrary to floating node schemes, there is no precharging necessary and no clock feedthrough at the output when no charge packets are sensed in this mode of operation of the bipolar output detector. This feature is also expected to result in reduced fixed pattern noise in multichannel CCD circuits.
Figure 52  Schematic Diagram of the Bipolar Output Device
C. Bipolar Output Concept And Design Consideration

Figure 53 shows the cross section of a CCD channel that incorporates a bipolar charge detection element at the output. This detection element is shown for a buried channel CCD structure. It can, in principle, be used with both surface and buried channel CCDs. The two-phase coplanar electrode \textsuperscript{10,11} CCD shown in Figure 53 is representative of the CCD structure that was used to demonstrate the concept. The concept is by no means limited to this CCD structure and may be used with other two-, three-, or four-phase versions as well.

Figure 53 also shows an integrated load device and source-follower buffer stage, which allow operation of the circuit in one of several modes to be discussed later.

The bipolar element is fabricated at the end of the CCD process by opening a window in the polysilicon output gate electrode. The n-type base region implant is performed through the same opening. In principle, the lateral straggle of the implants \textsuperscript{12} should be adequate to ensure that no emitter-collector shorts occur at the edge of the window. To guarantee this, however, a low temperature heat step (\( \leq 200^\circ\text{C} \)) may be used to flow the resist into the window to provide additional margin.

The detection scheme presented here is designed to operate at very low current levels. The effective base current into this device may be computed from the dynamics of a charge packet being transferred into the base region through the output gate. It has been shown\textsuperscript{13} that the charge transfer in a CCD is approximately exponential and the electrode potential has a time constant \( \tau_f \), which may be described by

\[
\frac{1}{\tau_f} = C_1 \frac{\pi^2 D}{4L^2} + \frac{(\mu E_y)^2}{4D}, \quad (1)
\]
Figure 53 The Structure and Concept of the Bipolar Output for CCDs. The top figure shows the cross section of the output and the bottom shows the operation of this output scheme.
where

\[ D = \text{electron diffusivity} \]
\[ \mu = \text{electron mobility} \]
\[ L = \text{length of the gate electrode} \]
\[ E_y = \text{lateral fringing field} \]
\[ C_1 = \text{a parameter that varies from 1 to 2 as the normalized fringing field varies from 0 to } \infty. \]

The charge \( Q_B(t) \) flowing into the base is represented as

\[ Q_B(t) = Q_0 A_{\text{well}} (1 - e^{-t/\tau_f}) \quad (2) \]

In this expression \( Q_0 \) is the charge capacity per unit area and \( A_{\text{well}} \) is the area of the CCD storage well.

The base current in the bipolar device will then be given by

\[ I_B = \frac{dQ_B(t)}{dt} = A_{\text{well}} \frac{Q_0}{\tau_f} e^{-t/\tau_f} \quad (3) \]

This exponentially decaying pulse of \( I_B \) is fed into the bipolar device which is normally off.

The transient response due to this current may be estimated from the charge control analysis of the bipolar transistor. The charge packet will supply the base recombination and charge the emitter and collector junction capacitance as well as the base. The differential equation describing this process may be written as

\[ \frac{d}{dt} (Q_B + Q_{Te} + Q_{Tc}) + \frac{Q_B}{\tau_B} + \frac{d}{dt} (C_g V_{be}) = -I_B \quad (4) \]
where

\[ Q_B = \text{base minority charge} \]
\[ Q_{Te}, Q_{Tc} = \text{junction transition charge} \]
\[ C_g = \text{output gate capacitance} \]
\[ V_{be} = \text{base-emitter voltage.} \]

Following the standard charge control treatment\(^1\) Equation (4) may be written for our configuration as

\[ i_B = \left( \frac{1}{\omega_T} + \frac{1}{\omega_g} + \frac{C_L R_L}{\beta} \right) \frac{di}{dt} + \frac{i}{\beta} \quad (5) \]

where \( \omega_g = g_m/C_g \) for the output gate and \( \omega_T = 2\pi f_T \) where \( f_T \) is the short-circuit current gain bandwidth product. The collector resistance is assumed to be zero, and \( C_L R_L \) represents the emitter load.

We may now solve Equations (5) and (3) to obtain the emitter current as

\[ i_e(t) = A_0 A_w e^{-t/\tau_f} \left[ \frac{1}{\tau_f + k} - \frac{(1 - \beta)^{1/\beta}}{\beta^{1/\beta} (\tau_f + k)} e^{-t/\tau_f} \right] \]  \hspace{1cm} (6)

where \( k = \left( \frac{1}{\omega_T} + \frac{1}{\omega_g} + \frac{C_L R_L}{\beta} \right) \)

It is obvious from (6) that the emitter current provides a factor of \( \beta \) increase in the charge available to drive the next stage. Alternatively, the load capacitance is now reflected to the base node as \( 1/\beta \) times what it would be in a simple precharge circuit. Base widths of \( \sim 1500 \) Å can be fabricated and lead to an \( \omega_T \) of much greater than 1 GHz. So, in typical layouts, the total effective node capacitance of this output device is dominated by the enlarged output gate needed to enclose the base structure. Hence, frequency response of this output structure would be limited by the output gate.
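
The factor of \( \beta \) charge gain can be checked from Equations (3) and (5) directly. The sketch below is an independent illustration, not a transcription of Equation (6). It solves Equation (5) for the base current pulse of Equation (3), assuming zero initial emitter current, constant \( \beta \) and \( k \), and \( \tau_f \neq \beta k \), and then integrates both currents numerically. The values of \( \tau_f \) and \( k \) are hypothetical and are not those of Figure 54; all names are hypothetical.

```python
import math

def base_current(t: float, q: float, tau_f: float) -> float:
    """Eq. (3): exponentially decaying base current pulse carrying charge q."""
    return (q / tau_f) * math.exp(-t / tau_f)

def emitter_current(t: float, q: float, tau_f: float, beta: float, k: float) -> float:
    """Solution of i_B = k di/dt + i/beta driven by Eq. (3), with i(0) = 0.
    Assumes tau_f != beta * k (the degenerate case needs a separate form)."""
    tau_e = beta * k
    amplitude = q * beta / (tau_f - tau_e)
    return amplitude * (math.exp(-t / tau_f) - math.exp(-t / tau_e))

def trapezoid(fn, t_end: float, steps: int) -> float:
    """Composite trapezoidal integral of fn over [0, t_end]."""
    dt = t_end / steps
    total = 0.5 * (fn(0.0) + fn(t_end))
    for n in range(1, steps):
        total += fn(n * dt)
    return total * dt

Q = 1.6e-15        # charge packet, C (packet size used for Figure 54(a))
TAU_F = 2.0e-9     # transfer time constant, s (hypothetical)
BETA = 100.0       # current gain
K_CONST = 0.05e-9  # k of Eq. (5), s (hypothetical)

T_END, STEPS = 200e-9, 20000
base_charge = trapezoid(lambda t: base_current(t, Q, TAU_F), T_END, STEPS)
emitter_charge = trapezoid(
    lambda t: emitter_current(t, Q, TAU_F, BETA, K_CONST), T_END, STEPS)
charge_gain = emitter_charge / base_charge

print(f"emitter charge / base charge = {charge_gain:.1f}")
```

In this model the integrated emitter charge is \( \beta \) times the signal charge regardless of the particular values of \( \tau_f \) and \( k \); those parameters only set the shape and duration of the emitter current pulse.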
Figure 54(a) shows a theoretical evaluation of the emitter current waveform based on Equation (6) for a charge packet size of $\sim 1.6 \, \text{fC}$. The assumed parameter values are shown in the figure. The simple theory presented here is also compared to a numerical simulation using the SPICE circuit simulator and typical MOS parameters (Figure 54(b)). For the SPICE simulation we have used a simplified current pulse as the input and plotted the emitter voltage as the output. The emitter voltage follows the emitter current prediction of Figure 54(a) based on the charge control model. It has been assumed in this calculation that a minimum geometry output gate node can be built. In practice the output gate capacitance is larger since it surrounds the bipolar. This will limit the turn-off characteristics of the bipolar device.

This application of the bipolar transistor as the output of the CCD shift register requires that the $\beta$ fall-off at low currents be minimized. It is well known that the low current $\beta$ fall-off is dominated primarily by injection efficiency considerations. The generation component of current in the emitter-base depletion region and the surface recombination component play a very important role in the determination of the low current fall-off characteristics of the bipolar transistor. The incorporation of a bipolar device in a CCD process has some very significant advantages. CCDs are built on (100) silicon to obtain low surface state density. Gettering sequences designed into the process allow very high bulk lifetimes to be achieved in processed devices\textsuperscript{15,16}. Bulk lifetimes of $\sim 1$ ms and surface recombination velocities of $< 1 \, \text{cm/s}$ have been measured in our devices. In addition, Hansell and Fonstad\textsuperscript{17} have shown that the $\beta$ linearity at low current is affected most significantly by decreasing emitter doping to take advantage of the increased mobility. They show that $\beta$ linearity at very low currents is obtained at the expense of the absolute $\beta$ value.

In an exploratory development program bipolar structures with a $\beta$ of $\sim 800$ were fabricated as well as those with $\beta \sim 100-150$. The variations in $\beta$ were achieved by varying the base implant dose. The typical implant distribution calculated for bipolar devices with $\beta \sim 100-150$ is shown in Figure 55. The
Figure 54(a) Emitter Current Response of the Bipolar Output for a Charge Size of 1.6 fC. The maximum emitter current is $I_E = 2.3 \mu A$. The calculations are based on $\beta = 100$ and $k = 0.09 \mu s$. 

[Figure 54(a) plot: normalized base charge \( Q_B(t)/Q_0 A_{well} \) and emitter current $I_E$ versus time (ns).]
Figure 54(b) SPICE Simulation of the Bipolar Output. The response to a current pulse $I_B$ is shown. This simulation is more optimistic than the charge control theory because default parameters for the bipolar are used.
Figure 55  Doping Distribution Due to the Base and Emitter Ion Implants Forming the PNP Bipolar. The collector distribution is non-uniform because of the channel implants required to avoid short channel effects and punch through.
estimated $f_T$ of this pnp transistor from theoretical considerations is $> 1$ GHz. However, the frequency response of the bipolar device by itself cannot be measured since it is integrated with an FET device at the base. The $\beta$ vs $I_C$ characteristics of these devices are shown in Figure 56. For a typical charge packet size of $\sim 4 \times 10^{-15}$ C and a gate length of $\sim 4 \mu$m the effective base current pulse is about 26 nA. Thus for $\beta \sim 100$ the typical operating region is at $I_C \sim 25$ $\mu$A. It is obvious from Figure 56 that the design for $\beta \sim 100$ has adequate low current $\beta$ linearity.

The bipolar device concept and design discussed in the previous section may be operated in three distinct modes at the output of a CCD shift register. In the simplest mode the bipolar device is operated with a fixed emitter bias obtained through the emitter load. In this mode the charge packet turns on the bipolar as was described in the previous section, and a large current gain is obtained. However, there is no voltage gain available because the emitter voltage follows the base voltage. The effective charge to voltage ratio, of course, depends on the ratio of the new output node capacitance to the capacitance of a storage well. The use of the bipolar device provides an impedance transformation so that the load capacitance on the emitter node is reflected as a $\beta$ times smaller capacitance at the base. Thus, for a given load an effective increase in the net voltage swing may be obtained for a given charge packet size. This increase is partly offset, however, by the increased capacitance components due to base emitter capacitance of the bipolar device. The operation in this mode has the advantage of very low coupling of clocks and precharge pulses compared to a normal precharged output. A large current drive capability is obtained, but the voltage swing is not significantly increased.

The second mode of operation uses dynamic emitter biasing, or precharging, of the emitter node before the charge packet flows into the base. In this mode the emitter is floating when the charge flows into the base and the transient emitter current proportional to $(\beta + 1) I_B$ produces a transient emitter voltage spike that may be much larger than $\Delta V_{BE}$. Thus, a large transient voltage swing is available on the base, which may be used as a latching signal in digital applications.

The bipolar output may also be operated in a semi-nondestructive mode. To achieve this, the device is built into one of the electrodes of the shift register. The charge packet flows into the base and remains there for the clock period. During this time the bipolar is turned on. The next phase clocks the charge out of the base into the next storage location. Thus, the signal charge may be propagated after the read operation using the bipolar device. The signal charge used to provide the output signal corresponds to the fraction of the charge packet which recombines in the base during the read cycle. Since all base charge does not have to recombine, this mode of clocking may provide an extremely high speed output.

D. Test Bar Design

To evaluate the capabilities of the bipolar output device in high-speed analog applications, a test bar was designed and fabricated that included the following test structures:

- 2φ CCD with bipolar output (surface and buried channel)
- 2φ CCD with MOS output (surface and buried channel)
- 4φ CCD with bipolar output
- 4φ CCD with MOS output
- Single-stage NMOS/PNP buffer amplifier
- Dual-stage NMOS/PNP buffer amplifier
- Bipolar transistor test arrays for characterization of $f_T$ and $\beta$
- MOS transistor array located in proximity to the bipolar array to test for threshold voltage variations due to ohmic drop across the substrate caused by bipolar collector current
- Bipolar test array for measurement of $V_{BE}$ variations.
Schematic diagrams of the single and dual stage NMOS/PNP buffers are shown in Figure 57. The combined use of enhancement and buried channel emitter loads is designed to provide an active "pull-up" via the enhancement device, resulting in a pseudocomplementary output stage.

In addition, a modified version of the 32 x 32 CTM is included that incorporates the bipolar output device. The CTM geometry was reduced by 80% (linear) to evaluate the effect of smaller gate geometries.

Figures 58, 59, and 60 are photomicrographs of the completed test bar showing the location of the important test structures.

Initial ion implantation parameters for the base and emitter implants were extrapolated from those used in the previous exploratory development effort. The parameters used for the bipolar test bar lot were distributed across the lot and bracketed the initial values. The energy and dose parameters for each slice are listed in Table 1.
Figure 57 PNP/NMOS Buffers: Single Stage (500 μA, 50 Ω) and Dual Stage (2 mA)
Figure 58  Bipolar Test Device Chip From Bipolar Test Bar
Table 1

Bipolar Test Bar Implant Parameters

<table>
<thead>
<tr>
<th>Slice No.</th>
<th>Base Implant</th>
<th>Emitter Implant</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>$1 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$2 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>2</td>
<td>$1 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$3 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>3</td>
<td>$1 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$4 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>4</td>
<td>$2 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$2 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>5</td>
<td>$2 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$3 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>6</td>
<td>$2 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$4 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>7</td>
<td>$3 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$2 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>8</td>
<td>$3 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$3 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>9</td>
<td>$3 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$4 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>10</td>
<td>$4 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$2 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>11</td>
<td>$4 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$3 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
<tr>
<td>12</td>
<td>$4 \times 10^{13}$/cm$^2$, $^{31}$P, 350 keV</td>
<td>$4 \times 10^{14}$/cm$^2$, $^{11}$B, 50 keV</td>
</tr>
</tbody>
</table>
SECTION VII
BIPOLAR TEST BAR EVALUATION

A. Bipolar Device Evaluation

PNP test transistors on the bipolar device test chip from slices 2, 5, 8, and 11 were used for the initial device evaluation. A schematic diagram showing the equivalent circuit for these test devices appears in Figure 61. The gate of the buried channel MOS transistor was connected to the emitter, thereby ensuring a low resistance base connection. DC characteristics were obtained using a curve tracer, and typical results are shown in Figure 62. The base-emitter window for these devices had dimensions of 0.4 x 0.8 mils (10.2 μm x 20.3 μm). Betas ranged from 6 to 40 and increased with reduced base doping density, which results in a wider base-collector depletion region and a narrower base width. A parasitic collector resistance of about 4 kΩ is evident in the figure and is due to the high resistivity of the substrate (~25 Ω-cm).

A limited number of devices have been tested using a high frequency network analyzer. The results of these tests are shown in Table 2. The dc betas correspond well with the earlier tests, and the minimum geometry devices exhibit cutoff frequencies in excess of the 1 GHz measurement limit.

B. PNP/NMOS Buffer Evaluation

The single- and dual-stage output buffer circuits shown in Figure 57 were included as test structures on the device test chip. Only the single-stage circuit has been evaluated. Figure 63(a) shows the pulse response of the single-stage buffer driving a 50 pF load. The pull-up transient is due to the NMOS devices and exhibits the characteristic $1 - (1 + t/\tau)^{-1}$ transient with $\tau = C_L/g_{m0} \approx 18$ ns, where $g_{m0}$ is the device transconductance at $t = 0^+$. Figure 63(b) shows the pull-down transient due to the bipolar device, which exhibits a fall time of about 3 ns. Since this is of the same order as the fall time of the input pulse, it is not indicative of the ultimate performance of the bipolar device.
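As a rough numerical illustration of the NMOS pull-up transient, the sketch below assumes the normalized response has the hyperbolic form $1 - (1 + t/\tau)^{-1}$ with the quoted $\tau \approx 18$ ns. The function names are illustrative only and do not come from the report.

```python
TAU_NS = 18.0  # tau = C_L / g_m0 quoted in the text, in nanoseconds


def pullup_fraction(t_ns, tau_ns=TAU_NS):
    """Normalized pull-up response, assuming the form 1 - (1 + t/tau)^-1."""
    return 1.0 - 1.0 / (1.0 + t_ns / tau_ns)


def time_to_fraction(fraction, tau_ns=TAU_NS):
    """Time in ns at which the assumed response reaches the given fraction."""
    return tau_ns * fraction / (1.0 - fraction)


if __name__ == "__main__":
    for fraction in (0.1, 0.5, 0.9):
        t_ns = time_to_fraction(fraction)
        print(f"{fraction:.0%} of final value after {t_ns:.1f} ns")
```

Under this assumed form the output reaches half of its final value at $t = \tau$ but approaches the final value only slowly, which is consistent with the pull-up being much slower than the 3 ns pull-down.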
Figure 61  Equivalent Circuit of the PNP Device Test Structure
Figure 62  DC Transfer Curves for the PNP Test Devices. Vertical: collector current 200 μA/div. Horizontal: collector-emitter voltage 2 V/div
Table 2
<table>
<thead>
<tr>
<th>Slice No.</th>
<th>Base/Emitter Geometry</th>
<th>$I_E$</th>
<th>$\beta_{dc}$</th>
<th>$f_T$</th>
</tr>
</thead>
<tbody>
<tr>
<td>5</td>
<td>0.4 x 0.4 mils</td>
<td>0.25 mA</td>
<td>21</td>
<td>&gt; 1 GHz</td>
</tr>
<tr>
<td>5</td>
<td>2 x 2 mils</td>
<td>1 mA</td>
<td>18.5</td>
<td>300 MHz</td>
</tr>
<tr>
<td>8</td>
<td>0.4 x 0.4 mils</td>
<td>0.25 mA</td>
<td>12.8</td>
<td>&gt; 1 GHz</td>
</tr>
<tr>
<td>8</td>
<td>2 x 2 mils</td>
<td>1 mA</td>
<td>12.4</td>
<td>200 MHz</td>
</tr>
</tbody>
</table>
Figure 63  Pulse Response of the Single Stage PNP/NMOS Buffer
C. Bipolar CCD Output Circuit Evaluation

Evaluation of the bipolar device in CCD output circuit applications was accomplished using the dual-channel 99-stage two-phase buried channel CCD test structure on the CCD test chip. One channel of this device employs a conventional reset amplifier in the output circuit, while the second channel employs a bipolar device with additional peripheral circuitry to allow operation in any of the modes described in the previous section.

Each CCD channel is 5 mils (127 µm) wide, and the transfer gate length is 0.5 mils (12.7 µm). The emitter/base window for the bipolar device is 0.4 x 0.4 mils (10.2 x 10.2 µm) and is placed within the well region of an extended first-level gate. In the tests described below the bipolar device was utilized as an emitter follower with emitter bias established with an external load resistor.

Figure 64 shows the input test waveform and output response for both CCD channels at a 10 MHz clock rate. The external load resistor for the bipolar device was 470 Ω. Note that the output traces were photographed at different sensitivities, and the peak-to-peak values of the output signals are approximately equal.

Figure 65(a) shows an expanded trace of the reset amplifier output waveform. The reset pulse was about 5 ns in duration, and its timing relative to the output clock phase resulted in a duty cycle of about 70%. Figure 65(b) shows the swept frequency response of the delay line, which exhibits the expected sin x/x spectrum. The response above 10 MHz is degraded by direct feedthrough of the signal from the tracking generator due to the experimental setup.

The transient response of the bipolar output circuit can be obtained from the equivalent circuit shown in Figure 66. The transistor is modeled with a pi-equivalent circuit and the charge transfer with the current source. The differential equations describing $V_B(t)$ and $V_E(t)$ are
Figure 64  Response of the Dual Channel CCD Test Structure at a 10 MHz Clock Rate. (a) Reset amplifier and (b) Bipolar output with external 470 Ω Load.
Figure 65 Output Waveform and Frequency Response Due to the Reset Amplifier. (a) Output waveform detail and (b) Delay line frequency response.
Figure 66  Equivalent Circuit of Bipolar Output Circuit
\[ V''_B + \left[ \frac{\beta + 1}{r_e C_L} + \frac{1}{R_L C_L} + \frac{1}{r_e C_B} \right] V'_B + \frac{V_B}{r_e R_L C_B C_L} = \frac{i_0}{C_B} \left[ \frac{\beta + 1}{r_e C_L} + \frac{1}{R_L C_L} \right] + \frac{i'_0}{C_B} \]

(7)

\[ V_E = V_B + r_e C_B V'_B - i_0 r_e \]

(8)

The complementary solution of (7) is found from the homogeneous equation

\[ V''_B + (a + b + c) V'_B + bc V_B = 0 \]

(9)

where \( a = (\beta + 1)/r_e C_L \), \( b = 1/R_L C_L \), and \( c = 1/r_e C_B \). Assuming solutions of the form \( e^{-qt} \), it is found that \( q \) must satisfy

\[ q^2 - (a + b + c)q + bc = 0 \]

(10)

for which the solutions are:

\[ q_1 = \frac{a + b + c}{2} + \frac{\left[ (a + b + c)^2 - 4bc \right]^{1/2}}{2} \]

(11)

\[ q_2 = \frac{a + b + c}{2} - \frac{\left[ (a + b + c)^2 - 4bc \right]^{1/2}}{2} \]

Further assuming \( b \ll a, c \), these roots lead to two characteristic time constants given approximately by:

\[ \tau_1 \sim r_e (C_B + C_L/\beta) \]

\[ \tau_2 \sim R_L C_L + (r_e + \beta R_L) C_B \]
where $\tau_1$ characterizes the turn-on transient and $\tau_2$ characterizes the recovery transient.

Figure 67(a) shows the output waveform for the bipolar output with a 470 $\Omega$ load resistor. The observed turn-on transient is estimated to correspond to a time constant of $\tau_1 \approx 2.3$ ns, while the recovery time constant is approximately $\tau_2 \approx 27$ ns ($\tau/T \approx 0.27$). The small signal emitter resistance can be estimated from $r_e = kT/qI_E$ and exhibits an initial value of about 200 $\Omega$. The base capacitance $C_B$ is estimated from geometrical considerations to be about 0.4 pF. Assuming $\beta = 10$ and $C_L \approx 6.5$ pF, the calculated values of the time constants are $\tau_1 \approx 0.2$ ns and $\tau_2 \approx 5$ ns, both considerably smaller than observed.
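The calculated time constants quoted above can be reproduced with a few lines of arithmetic. The sketch below evaluates the approximate expressions for $\tau_1$ and $\tau_2$ with the stated parameter values, taking the coefficient of $R_L$ in $\tau_2$ to be $\beta$ (an assumption about the printed expression). The function names are illustrative.

```python
R_E = 200.0    # small-signal emitter resistance r_e, ohms (initial value from the text)
C_B = 0.4e-12  # base capacitance, farads
C_L = 6.5e-12  # load capacitance, farads
BETA = 10.0    # assumed current gain
R_L = 470.0    # external load resistor, ohms


def tau_turn_on(r_e, c_b, c_l, beta):
    """Approximate turn-on time constant: r_e * (C_B + C_L / beta)."""
    return r_e * (c_b + c_l / beta)


def tau_recovery(r_e, r_l, c_b, c_l, beta):
    """Approximate recovery time constant: R_L*C_L + (r_e + beta*R_L)*C_B."""
    return r_l * c_l + (r_e + beta * r_l) * c_b


if __name__ == "__main__":
    t1 = tau_turn_on(R_E, C_B, C_L, BETA)
    t2 = tau_recovery(R_E, R_L, C_B, C_L, BETA)
    print(f"tau_1 = {t1 * 1e9:.2f} ns, tau_2 = {t2 * 1e9:.2f} ns")
```

These evaluate to roughly 0.2 ns and 5 ns, matching the calculated values given in the text and well below the observed 2.3 ns and 27 ns.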

Note the presence of a positive "glitch" on the falling edge of the turn-on transient. This transient is attributed to the parasitic collector resistance that causes a positive excursion of the substrate (collector) voltage during the initial current transient and modulates $I_B$ through the Early effect. It is felt that this glitch masks the high-speed turn-on transient.

The simplified transient analysis presented above neglected the dependence of the small signal emitter resistance $r_e$ on the emitter current, $r_e = kT/qI_E$. This dependence results in a positive feedback effect that reduces the base current, especially when low values of $R_L$ are employed. This mechanism is probably responsible for the extended recovery transient.
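To give a sense of how strongly $r_e$ varies with emitter current, the following sketch evaluates $r_e = kT/qI_E$ at a few currents. A temperature of 300 K is assumed here; the report does not state the measurement temperature, and the helper names are illustrative.

```python
K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def emitter_resistance(i_e_amps, temp_k=300.0):
    """Small-signal emitter resistance r_e = kT / (q * I_E), in ohms."""
    return K_B * temp_k / (Q_E * i_e_amps)


def current_for_resistance(r_e_ohms, temp_k=300.0):
    """Emitter current (A) that gives the requested r_e at temp_k."""
    return K_B * temp_k / (Q_E * r_e_ohms)


if __name__ == "__main__":
    for i_ma in (0.05, 0.13, 0.25, 1.0):
        r_e = emitter_resistance(i_ma * 1e-3)
        print(f"I_E = {i_ma:.2f} mA -> r_e = {r_e:.0f} ohms")
```

At the assumed temperature, the initial value of about 200 Ω corresponds to an emitter current near 0.13 mA, and $r_e$ rises in inverse proportion as the emitter current decays.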

Figure 67(b) shows the frequency response of the delay line with the bipolar output. The 0 dB reference is the same as in Figure 65(b), and the signal energy contained in the first Nyquist interval for the bipolar output is comparable with that for the reset amplifier. The frequency response is down by about 3 dB at the Nyquist rate, corresponding closely to the $\tau = T/4$ case presented in Figure 51 of the previous section.
Figure 67  Output Waveform and Frequency Response Due to Bipolar Output Stage With 470 $\Omega$ Load. (a) Bipolar output waveform with 470 $\Omega$ load. (b) Delay line frequency response.
Figure 68 shows the output waveform and frequency response for the bipolar output with a 1000 Ω load resistor. Note that the low frequency response in Figure 68(b) is about 9 dB above that in Figure 67(b), whereas only a 6 dB increase is theoretically expected. This improvement is attributed to the $r_e$ term, which increases by about a factor of 2 in this case. The response at the Nyquist rate is down by 6 dB and corresponds to $\tau \sim 0.7$ T. Note that the average energy in the first Nyquist interval is nearly 10 dB greater than observed for the reset amplifier in Figure 65(b).
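The expected low frequency increase can be checked directly, assuming the low frequency output amplitude scales linearly with the load resistance. This is a simplified reading of the output circuit, and the function name is illustrative.

```python
import math


def load_gain_db(r_new_ohms, r_ref_ohms):
    """Change in low frequency response (dB) if amplitude scales with R_L."""
    return 20.0 * math.log10(r_new_ohms / r_ref_ohms)


if __name__ == "__main__":
    expected = load_gain_db(1000.0, 470.0)
    observed = 9.0  # approximate value read from Figures 67(b) and 68(b)
    print(f"expected {expected:.2f} dB, excess over expected {observed - expected:.2f} dB")
```

The resistor ratio alone accounts for roughly 6.6 dB, leaving about 2.4 dB of the observed increase to be explained by other effects such as the $r_e$ term.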

D. 32 x 32 CTM With Bipolar Output

As in the initial lot of 32 x 32 CTMs, the bipolar test lot exhibited a high incidence of intralevel shorts on the second polysilicon gate level. A number of 32 x 32 CTM/bipolar output chips were prescreened by manual probing and were packaged. None of these devices were found to be operational.

The cause of the intralevel shorts is unexplained, since the "zero-undercut" etch technique was employed in processing the bipolar lot.
Figure 68 Output Waveform and Frequency Response Due to Bipolar Output Stage With 1 kΩ Load. (a) Bipolar output waveform with 1 kΩ load. (b) Delay line frequency response.
The primary objective of this program was the development of an analog CCD reformatting memory for high speed signal processing applications. Based on previous experience with line-addressed structures, a new approach was proposed that utilized a two-dimensional charge transfer cell in a square memory array with integral CCD multiplexers and demultiplexers to provide the interface. In principle this approach offered higher performance because of its potential for lower fixed pattern noise and higher operating rates. However, it was recognized that the complexity of this unique charge-transfer device resulted in a moderately high technical risk.

During the course of this program, five lots of devices were fabricated. The first lot consisted of a test bar that included a 32 x 32 element CTM and numerous CCD test devices. Although half of the lot failed due to an oversight in specifying the process parameters, operational CTMs were obtained, and their performance was observed to be nearly as anticipated. Fixed pattern noise performance across an entire reformatted frame was observed to be at least an order of magnitude better than in line-addressed designs. Because of a layout error, it was not possible to operate these devices in a 100% duty cycle mode.

Two photomask levels were subsequently modified, and two additional lots were processed: the first on standard two-inch diameter starting material and the second on three-inch material known to have more uniform oxygen distribution. It was anticipated that the devices fabricated using the latter would exhibit even better fixed pattern noise performance due to a more uniform impurity distribution. This was found to be the case, and 100% duty cycle operation was observed with about a 6 dB improvement in fixed pattern noise. These devices were subsequently utilized in signal processor subsystem concept demonstrations including a radar doppler processor and a two-dimensional discrete Fourier transform. The former was documented in the interim report prepared during this program; the latter was described in a paper at the 1978 CCD Applications Conference, and a reprint is included as Appendix C.

The CTM structure was redesigned with minor modifications to realize a 64 x 64 element array. This device was also processed on three-inch material, and, although the yield was low due to a high incidence of interlevel shorts, a number of operational units were obtained, and their performance documented. These devices have been operated at a 40 MHz clock rate with the primary performance limitation due to parasitic coupling of the clock pulses to the output node. Although this noise component can be removed using a double sampling approach, this solution is considered undesirable at high clock rates. Further study should be directed towards on-chip reduction of the parasitic coupling mechanism.

The fifth lot of CTM devices fabricated during this program was a 32 x 32 test device that was included on the bipolar test bar. This device was a "shrink" version of the previous 32 x 32 structure, had a 1.6 mil cell spacing, and included a bipolar output device. Unfortunately, the bipolar test bar has exhibited a high incidence of intralevel shorts (similar to the problem observed in the first lot of 32 x 32 devices), and operational units have not been found.

The secondary objective of this program was the development of a high-speed device for use in CCD output circuits. The results of an internal exploratory development program led to the decision to develop a CCD process compatible PNP bipolar device for this application. The bandwidth capabilities of this device are well in excess of those attainable with NMOS devices. In addition, a particularly efficient output circuit configuration results that is compatible with further continuous-time (as opposed to sampled) signal processing, e.g., as is the case for surface acoustic wave (SAW) device signal processing.
A bipolar device test bar was designed and fabricated and preliminary evaluation shows that the transistors exhibit a cutoff frequency in excess of 1 GHz. Analysis of the operation of an experimental bipolar CCD output circuit shows the frequency response of the output circuit to be in good agreement with theoretical predictions.

In summary, the results of this program have been excellent. The two-dimensional charge transfer CTM has been successfully demonstrated and shown to be a powerful addition to the analog signal processing repertoire. The bipolar output technique has been successfully demonstrated, although a more detailed analysis is needed. Further efforts should be directed toward reducing parasitic coupling of the CCD clocks to the output node and more precise identification of the source of the remaining fixed pattern noise observed in the CTMs.
APPENDIX A
CTM TIMING CIRCUITS

Operation of the two-dimensional charge transfer CTM with 100% duty cycle requires the generation of 15 individual clock waveforms. These waveforms are listed below, identifying the abbreviations used in the diagrams that appear in this appendix.

- $\phi_1$, $\phi_2$, $\phi_4$: Common high-speed multiplexer and demultiplexer clock phases
- $\phi_{3\text{MUX}}$: $\phi_3$ waveform employed in the multiplexer
- $\phi_{3\text{DMXH}}$, $\phi_{3\text{DMXV}}$: $\phi_3$ waveforms employed in the demultiplexer
- IPG: Demultiplexer input gate (sampling) pulse
- PRE: Multiplexer output circuit reset pulse
- $\phi_{\text{M1}}$: Common memory array clock phase
- $\phi_{\text{M2H}}$: Memory array clock phase controlling horizontal transfers
- $\phi_{\text{M2V}}$: Memory array clock phase controlling vertical transfers
- MUX_H: Horizontal multiplexer: array transfer pulse
- MUX_V: Vertical multiplexer: array transfer pulse
- DMX_H: Horizontal demultiplexer: array transfer pulse
- DMX_V: Vertical demultiplexer: array transfer pulse

A block diagram showing the basic organization of all the timing circuits is shown in Figure 69. Four-phase clock waveforms are derived from a high-speed master clock and directly applied to a binary counter having a modulus equal to the number of elements per field (1024 for the 32 x 32, 4096 for the 64 x 64). The carry pulse from the counter toggles a flip-flop whose output state determines whether array charge transfer is horizontal or vertical. This flip-flop is disabled in the SPS mode. The remaining clock waveforms are derived from the four-phase, counter and H/V gating pulses.
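The field counter and H/V flip-flop described above can be modeled behaviorally. The sketch below is a simplified software analogy, not the circuit of Figure 69: it counts clock periods modulo the number of array elements and toggles the transfer direction on each carry. The starting direction and the function name are assumptions made for illustration.

```python
def simulate_field_timing(n, num_fields, sps_mode=False):
    """Return the transfer direction ('H' or 'V') used during each field.

    n          -- array dimension (32 or 64 for the devices described)
    num_fields -- number of complete fields to simulate
    sps_mode   -- if True, the H/V flip-flop is disabled and never toggles
    """
    modulus = n * n        # counter modulus = elements per field
    horizontal = True      # assumed initial flip-flop state
    count = 0
    directions = []
    for _ in range(num_fields * modulus):
        count = (count + 1) % modulus
        if count == 0:     # counter carry marks the end of a field
            directions.append("H" if horizontal else "V")
            if not sps_mode:
                horizontal = not horizontal
    return directions


if __name__ == "__main__":
    print(simulate_field_timing(32, 4))
    print(simulate_field_timing(32, 4, sps_mode=True))
```

With the flip-flop enabled, successive fields alternate between horizontal and vertical transfer; with it disabled, every field uses the same direction.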

Figure 70 shows the clock generator circuit used for initial evaluation of the 32 x 32 CTM. Schottky TTL was employed in this circuit, and the part numbers are identified in Table 3. TTL to MOS level translation was accomplished using SN75363 integrated driver circuits for the high-speed clocks and SN75361 drivers for the memory array clocks.

Figure 71 shows the clock generator circuit that eliminates clock pulse coupling to the output circuit by reducing the duty cycle to 25%. Thus, only a single DMX_H transfer pulse and MUX_V transfer pulse is generated. The T2 waveform is furnished externally and allows synchronization of the timing circuit with other signal processors. A parts list appears in Table 4.

Figure 72 shows the high-speed clock generator circuit for the 64 x 64 CTM. The design is basically the same as shown in Figure 70, except that the field counter is expanded and ECL logic components are employed. Figure 73 illustrates the high-speed pulse generation capabilities of this circuit. Figure 73(a) shows φ_1 and φ_2 waveforms that are derived from a 200 MHz master
Figure 70(b) Circuit Diagram for the 32 x 32 Clock Generator Circuit
Table 3
<table>
<thead>
<tr>
<th>Designation</th>
<th>Part Number</th>
</tr>
</thead>
<tbody>
<tr>
<td>A1</td>
<td>SN74S30</td>
</tr>
<tr>
<td>A2</td>
<td>SN74S161</td>
</tr>
<tr>
<td>A3</td>
<td>SN74S161</td>
</tr>
<tr>
<td>A4</td>
<td>SN74S161</td>
</tr>
<tr>
<td>A5</td>
<td>SN74S112</td>
</tr>
<tr>
<td>A6</td>
<td>SN74S112</td>
</tr>
<tr>
<td>A7</td>
<td>SN74S112</td>
</tr>
<tr>
<td>B1</td>
<td>SN74S04</td>
</tr>
<tr>
<td>B2</td>
<td>SN74S74</td>
</tr>
<tr>
<td>B3</td>
<td>SN74S00</td>
</tr>
<tr>
<td>B4</td>
<td>SN74S74</td>
</tr>
<tr>
<td>B5</td>
<td>SN74S10</td>
</tr>
<tr>
<td>B6</td>
<td>SN74S10</td>
</tr>
<tr>
<td>B7</td>
<td>SN74S74</td>
</tr>
<tr>
<td>C</td>
<td>SN75363 driver</td>
</tr>
</tbody>
</table>
Figure 71(b) Circuit Diagram for the 50% Duty Cycle 32 x 32 CTM Clock Generator
Table 4
<table>
<thead>
<tr>
<th>Designation</th>
<th>Part Number</th>
</tr>
</thead>
<tbody>
<tr>
<td>A1</td>
<td>SN74S112</td>
</tr>
<tr>
<td>A2</td>
<td>SN74S112</td>
</tr>
<tr>
<td>A3</td>
<td>SN74S112</td>
</tr>
<tr>
<td>A4</td>
<td>SN74S10</td>
</tr>
<tr>
<td>A5</td>
<td>SN74S74</td>
</tr>
<tr>
<td>A6</td>
<td>SN74S163</td>
</tr>
<tr>
<td>A7</td>
<td>SN74S163</td>
</tr>
<tr>
<td>A8</td>
<td>SN74S74</td>
</tr>
<tr>
<td>A9</td>
<td>SN74S00</td>
</tr>
<tr>
<td>A10</td>
<td>SN74S04</td>
</tr>
<tr>
<td>A11</td>
<td>SN74S02</td>
</tr>
<tr>
<td>A12</td>
<td>SN74S10</td>
</tr>
<tr>
<td>A13</td>
<td>SN74S74</td>
</tr>
<tr>
<td>A14</td>
<td>SN74S74</td>
</tr>
<tr>
<td>A15</td>
<td>SN74S32</td>
</tr>
</tbody>
</table>
Figure 72(a) Circuit Diagram for the 64 x 64 Clock Generator
Figure 73  50 MHz CTM Timing Circuit Operation
clock. Figure 73(b) shows the $\phi_{3\text{DMXH}}$ clock waveform and the DMXH transfer pulse, and shows how the $\phi_3$ waveform is inhibited during the parallel transfer to ensure complete charge transfer.

Figure 74 shows circuit diagrams of the clock driver circuits employed in operation of the 64 x 64 CTM. Clock driver circuit 1 was used to buffer the $\phi_1$, $\phi_2$, $\phi_3$, $\phi_{3\text{DMXH}}$, $\phi_{3\text{DMXV}}$ and $\phi_4$ clock waveforms. Clock driver circuit 2 was employed for the remaining high-speed transfer pulses, the input sampling pulse, and output circuit reset pulse. Figure 75 shows the DMXH waveform (driver circuit 2) and the $\phi_{3\text{DMXH}}$ waveform (driver circuit 1) at a 25 MHz clock rate. At clock rates in excess of 40 MHz, variations in propagation delays through the ECL-TTL convertors result in improper phasing of the clock waveforms.

Table 5 is a parts list for the high-speed clock generator circuit.
Driver Circuit 1

Driver Circuit 2

Figure 74 64 x 64 Clock Driver Circuits
Figure 75 64 x 64 CTM Clock Drivers
Vertical: 5 V/div
Horizontal: 20 ns/div
Table 5
Parts List for the High Speed Clock Generator

<table>
<thead>
<tr>
<th>Designation</th>
<th>Part Number</th>
</tr>
</thead>
<tbody>
<tr>
<td>A1</td>
<td>MC10135</td>
</tr>
<tr>
<td>A2</td>
<td>MC10135</td>
</tr>
<tr>
<td>A3</td>
<td>MC10135</td>
</tr>
<tr>
<td>A4</td>
<td>MC10135</td>
</tr>
<tr>
<td>A5</td>
<td>MC10136</td>
</tr>
<tr>
<td>A6</td>
<td>MC10136</td>
</tr>
<tr>
<td>A7</td>
<td>MC10136</td>
</tr>
<tr>
<td>A8</td>
<td>MC10109</td>
</tr>
<tr>
<td>A9</td>
<td>MC10101</td>
</tr>
<tr>
<td>B1</td>
<td>MC10102</td>
</tr>
<tr>
<td>B2</td>
<td>MC10106</td>
</tr>
<tr>
<td>B3</td>
<td>MC10131</td>
</tr>
<tr>
<td>B4</td>
<td>MC10104</td>
</tr>
<tr>
<td>B5</td>
<td>MC10104</td>
</tr>
<tr>
<td>B6</td>
<td>MC10131</td>
</tr>
<tr>
<td>B7</td>
<td>MC10102</td>
</tr>
<tr>
<td>B8</td>
<td>MC10131</td>
</tr>
</tbody>
</table>
APPENDIX B

DERIVATION OF CCD OUTPUT CIRCUIT TRANSFER FUNCTION

1. Reset Amplifier

The output circuit is modeled as shown in Figure 76(a). The output charge transfer is modeled by a current source $I_s(t)$ given by

$$I_s(t) = \sum_{n=-\infty}^{\infty} s(t - nT) I_0 (t - nT)$$  \hspace{1cm} (12)

where $s(t)$ is the signal waveform and $I_0(t)$ describes the current transient due to charge transfer. The reset switch is closed during the period $nT + \tau_s < t < (n + 1)T$. $V_{out}(t)$ at the output diode is to be calculated. The effect of an output buffer amplifier is, therefore, ignored. $V_{out}(t)$ is given by

$$V_{out}(t) = \begin{cases} V_{ref} - \dfrac{1}{C} \displaystyle\int_{\alpha=nT}^{t} I_s(\alpha)\,d\alpha, & nT < t < nT + \tau_s \\ V_{ref}, & nT + \tau_s < t < (n + 1)T \end{cases}$$  \hspace{1cm} (13)

The spectrum of $V_{out}(t)$ is given by its Fourier transform:

$$V_{out}(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} V_{out}(t) e^{-j\omega t} dt$$

$$= \sum_{n=-\infty}^{\infty} \int_{nT}^{nT+\tau_s} \left[ V_{ref} - \frac{1}{C} \int_{\alpha=nT}^{t} I_s(\alpha)\,d\alpha \right] e^{-j\omega t} dt$$

$$+ \sum_{n=-\infty}^{\infty} \int_{nT+\tau_s}^{(n+1)T} V_{ref} e^{-j\omega t} dt$$  \hspace{1cm} (14)
Figure 76 Circuit Models Employed for Calculation of Output Circuit Transfer Functions
which is simplified to yield

\[ V_{\text{out}}(\omega) = \int_{-\infty}^{\infty} V_{\text{ref}} e^{-j\omega t} \, dt - \frac{1}{C} \sum_{n=-\infty}^{\infty} \int_{t=nT}^{nT+\tau_s} \left[ \int_{\alpha=nT}^{t} I_s(\alpha) \, d\alpha \right] e^{-j\omega t} \, dt \]

(15)

With the substitution \( x = t - nT \), (15) becomes

\[ V_{\text{out}}(\omega) = V_{\text{ref}} \delta(\omega) - \frac{1}{C} \sum_{n=-\infty}^{\infty} e^{-j\omega nT} \int_{x=0}^{\tau_s} \left[ \int_{\alpha=nT}^{x+nT} I_s(\alpha) \, d\alpha \right] e^{-j\omega x} \, dx \]

(16)

With \( I_0(t) = I_0 \delta(t) \), the integral in \( \alpha \) reduces to \( I_0 s(nT) \). Thus,

\[ V_{\text{out}}(\omega) = V_{\text{ref}} \delta(\omega) - \frac{I_0}{C} \sum_{n=-\infty}^{\infty} s(nT) e^{-j\omega nT} \int_{x=0}^{\tau_s} e^{-j\omega x} \, dx \]

(17)

\[ = V_{\text{ref}} \delta(\omega) - \frac{I_0}{C} \tau_s e^{-j\omega \tau_s/2} \frac{\sin (\omega \tau_s/2)}{\omega \tau_s/2} \sum_{n=-\infty}^{\infty} s(nT) e^{-j\omega nT} \]

(18)

Now apply Poisson's sum formula

\[
\sum_{n=-\infty}^{\infty} \phi(nT) e^{-j\omega nT} = \frac{1}{T} \sum_{n=-\infty}^{\infty} \hat{\phi}\left(\omega + \frac{2\pi n}{T}\right)
\]

(19)

where \(\hat{\phi}(\omega)\) is the Fourier transform of \(\phi(t)\). Then (18) becomes

\[
V_{\text{out}}(\omega) = V_{\text{ref}} \delta(\omega) - \frac{I_0}{C} \frac{\tau_s}{T} e^{-j\omega \tau_s/2} \left( \frac{\sin (\omega \tau_s/2)}{\omega \tau_s/2} \right) \sum_{n=-\infty}^{\infty} S\left(\omega + \frac{2\pi n}{T}\right)
\]

(20)

where \(S(\omega)\) is the Fourier transform of \(s(t)\).
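Poisson's sum formula, as used in this step, can be verified numerically for a test function whose transform is known in closed form. The sketch below uses a Gaussian $\phi(t) = e^{-t^2/2}$ and assumes the unnormalized transform convention $\hat{\phi}(\omega) = \int \phi(t) e^{-j\omega t} dt = \sqrt{2\pi}\, e^{-\omega^2/2}$. The test function and helper names are chosen for illustration and do not appear in the report.

```python
import cmath
import math


def sampled_sum(omega, period, terms=50):
    """Left side: sum over n of phi(nT) * exp(-j*omega*n*T) for a Gaussian phi."""
    total = 0j
    for n in range(-terms, terms + 1):
        t = n * period
        total += math.exp(-0.5 * t * t) * cmath.exp(-1j * omega * t)
    return total


def replicated_spectrum(omega, period, terms=50):
    """Right side: (1/T) * sum over n of phi_hat(omega + 2*pi*n/T)."""
    total = 0.0
    for n in range(-terms, terms + 1):
        w = omega + 2.0 * math.pi * n / period
        total += math.sqrt(2.0 * math.pi) * math.exp(-0.5 * w * w)
    return total / period


if __name__ == "__main__":
    lhs = sampled_sum(0.7, 0.5)
    rhs = replicated_spectrum(0.7, 0.5)
    print(abs(lhs - rhs))
```

The two sides agree to within numerical precision, illustrating that sampling in time replicates the spectrum at multiples of $2\pi/T$.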

2. **RC Output Circuit**

The output circuit is modeled as shown in Figure 76(b). The output spectrum is given by:

\[
V_{\text{out}}(\omega) = V_{\text{ref}} \delta(\omega) - I_s(\omega) \left( \frac{R}{1 + j\omega RC} \right)
\]

(21)

\(I_s(t)\) has the form used previously.

Thus,

\[
I_s(\omega) = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} s(t - nT) I_0(t - nT) e^{-j\omega t} dt
\]

\[
= \sum_{n=-\infty}^{\infty} s(nT) \int_{-\infty}^{\infty} I_0(t - nT) e^{-j\omega t} dt
\]

\[
= \sum_{n=-\infty}^{\infty} s(nT) e^{-j\omega nT} I_0(\omega)
\]

(22)
Again using $I_0(t) = I_0 \delta(t)$ and Poisson's sum formula, (22) can be expressed as:

$$I_s(\omega) = \frac{I_0}{T} \sum_{n=-\infty}^{\infty} S\left(\omega + \frac{2\pi n}{T}\right)$$  \hspace{1cm} (23)

Thus,

$$V_{\text{out}}(\omega) = V_{\text{ref}} \delta(\omega) - \frac{I_0}{T} \left( \frac{R}{1 + j\omega RC} \right) \sum_{n=-\infty}^{\infty} S\left(\omega + \frac{2\pi n}{T}\right)$$  \hspace{1cm} (24)

and multiplying numerator and denominator by $C$ yields

$$V_{\text{out}}(\omega) = V_{\text{ref}} \delta(\omega) - \frac{RC}{T} \left[ \frac{I_0}{C} \frac{1}{1 + j\omega RC} \right] \sum_{n=-\infty}^{\infty} S\left(\omega + \frac{2\pi n}{T}\right)$$  \hspace{1cm} (25)

which allows direct comparison with the spectrum of the reset amplifier given in (20).
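The comparison between the two output circuits can be made concrete by evaluating the magnitude of each transfer factor. The sketch below assumes the reset amplifier factor is $(\tau_s/T)\,|\sin(\omega\tau_s/2)/(\omega\tau_s/2)|$ and the RC factor is $(RC/T)/\sqrt{1 + (\omega RC)^2}$, both normalized to $I_0/C$. The parameter values ($\tau_s = 0.7T$, $RC = 0.25T$) and function names are illustrative choices, not measurements.

```python
import math


def reset_amp_mag(omega, tau_s, period):
    """Magnitude of the reset amplifier factor, normalized to I_0/C."""
    x = omega * tau_s / 2.0
    sinc = 1.0 if x == 0.0 else math.sin(x) / x
    return (tau_s / period) * abs(sinc)


def rc_mag(omega, rc, period):
    """Magnitude of the RC output circuit factor, normalized to I_0/C."""
    return (rc / period) / math.sqrt(1.0 + (omega * rc) ** 2)


def nyquist_rolloff_db(mag_fn, time_param, period):
    """Response at the Nyquist frequency (omega = pi/T) relative to dc, in dB."""
    nyquist = math.pi / period
    ratio = mag_fn(nyquist, time_param, period) / mag_fn(0.0, time_param, period)
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    T = 1.0  # clock period, arbitrary units
    print(f"reset amp, tau_s = 0.7 T: {nyquist_rolloff_db(reset_amp_mag, 0.7 * T, T):.2f} dB")
    print(f"RC circuit, RC = 0.25 T:  {nyquist_rolloff_db(rc_mag, 0.25 * T, T):.2f} dB")
```

In both cases the dc level scales with the ratio of the circuit time parameter to the clock period, while the roll-off toward the Nyquist frequency follows a sin x/x shape for the reset amplifier and a single-pole shape for the RC circuit.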
APPENDIX C

A CCD TWO DIMENSIONAL TRANSFORM

W.L. Eversole, D.J. Mayer, and R.J. Kansy
Texas Instruments Incorporated
Dallas, Texas 75222

ABSTRACT. The two-dimensional transform is recognized as a valuable tool for processing two dimensional signals. It has applications in bandwidth reduction systems, image enhancement systems, correlation trackers, seismic array processors, and radar and sonar systems. A 32 x 32 point discrete Fourier transform has been built using two low power CCD integrated circuits - a 32 point DFT IC and a 32 x 32 point reformatting CCD memory. The detailed design and operation of this two dimensional transform breadboard are described.

I. INTRODUCTION

Since the introduction of the charge-coupled device (CCD) in 1970$^{1}$, the CCD has become an important element for implementing many signal processing functions, one of which is spectral analysis via the chirp Z transform (CZT) algorithm. This algorithm for obtaining the discrete Fourier transform (DFT) is ideally suited to CCD implementation as the bulk of the computation may be performed in CCD transversal filters. In 1977, a monolithic 32 point CCD DFT IC which implements the chirp Z transform algorithm was designed and fabricated. With this chip a 32 point one-dimensional DFT may be obtained in 64 μs with a power dissipation of 600 mW. Although the one-dimensional DFT has many applications, there are a variety of applications which require the processing of two dimensional signals such as those obtained from images, seismic arrays, and sonar and radar systems. The two-dimensional DFT may be obtained via one dimensional DFTs by first transforming the rows of a two dimensional array and then transforming the columns of the resulting two dimensional array. To implement this, the data output of the first transformation must be reformatted to become the input data for the second transformation. In 1978 a 32 x 32 point corner turning memory (CTM) using a novel two dimensional charge transfer structure was designed and fabricated. This CCD CTM is ideally suited to perform the reformatting operation.

This paper describes the implementation of a 32 x 32 point two dimensional discrete Fourier transform using the 32 point CZT IC and the 32 x 32 point corner turning memory. Section II reviews how the two dimensional DFT may be implemented with one dimensional DFTs. Sections III and IV describe in detail the 32 point CCD CZT IC and the 32 x 32 point CCD corner turning memory. Section V presents the results of implementing the two dimensional DFT with these two CCD ICs.

II. THE TWO DIMENSIONAL DFT

The DFT of a finite area sequence $f(m,n)$ is

$$F(k,l) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n) e^{-j2\pi \left( \frac{km}{M} + \frac{ln}{N} \right)}$$  \hspace{1cm} (1)

The two dimensional DFT can be interpreted in terms of the one dimensional DFT by expressing Equation (1) as

$$F(k,l) = \sum_{m=0}^{M-1} e^{-j2\pi \frac{km}{M}} G(m,l)$$  \hspace{1cm} (2)

where

$$G(m,l) = \sum_{n=0}^{N-1} f(m,n) e^{-j2\pi \frac{ln}{N}}$$  \hspace{1cm} (3)

The function $G(m,l)$ corresponds to an $N$-point one dimensional DFT for each value of $m$, i.e., it consists of $M$ one dimensional transforms, one for each row of $f(m,n)$. The two dimensional DFT $F(k,l)$ is then obtained by performing $N$ one dimensional transforms, one for each column of the sequence $G(m,l)$. Thus, the two dimensional transform can be implemented as shown in Figure 1.

![Figure 1. Implementation of Two Dimensional Discrete Fourier Transform.](image)

A one dimensional transform is performed on the rows of the signal using a CCD CZT IC. The resulting complex Fourier coefficients are stored row by row in two CCD corner turning memories. Next these coefficients are read out column by column into a second CCD CZT IC which completes the calculations for the two dimensional transform.
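The row transform, corner turn, and column transform sequence can be sketched in software. In the example below a matrix transpose stands in for the corner turning memory, and a direct one dimensional DFT stands in for the CZT IC; the function names are illustrative, and the result is compared against the two dimensional definition evaluated directly.

```python
import cmath


def dft(x):
    """Direct one dimensional DFT of a sequence."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
            for k in range(n)]


def corner_turn(a):
    """Transpose: data written row by row is read back column by column."""
    return [list(col) for col in zip(*a)]


def dft2_row_column(f):
    """Two dimensional DFT via row DFTs, a corner turn, and column DFTs."""
    g = [dft(row) for row in f]           # G(m, l): one transform per row
    g_turned = corner_turn(g)             # each row is now a column of G
    f_turned = [dft(col) for col in g_turned]  # transform over m, stored as [l][k]
    return corner_turn(f_turned)          # reorder to [k][l] for comparison


def dft2_direct(f):
    """Two dimensional DFT evaluated directly from the double sum."""
    big_m, big_n = len(f), len(f[0])
    return [[sum(f[m][n] * cmath.exp(-2j * cmath.pi * (k * m / big_m + l * n / big_n))
                 for m in range(big_m) for n in range(big_n))
             for l in range(big_n)]
            for k in range(big_m)]


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [0, -1, 2], [3, 3, 1]]
    a, b = dft2_row_column(data), dft2_direct(data)
    err = max(abs(a[k][l] - b[k][l]) for k in range(4) for l in range(3))
    print(f"max difference: {err:.2e}")
```

The final transpose in `dft2_row_column` is only there to put the result in the same index order as the direct sum; in the hardware arrangement the second transform simply emits the coefficients column by column.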

### III. 32 POINT CCD CZT IC

A monolithic 32 point DFT using the chirp Z transform algorithm has been designed and fabricated using an $N$ channel, two level polysilicon coplanar electrode process. Goals of the design included the elimination of all external support components and operation of the transform IC at a 1 MHz data rate. Small size, low weight, low power, and high speed are achieved with total integration.

The chirp Z transform algorithm is derived by starting with the definition of the discrete Fourier transform:

$$F_k = \sum_{n=0}^{N-1} f_n e^{-j \frac{2\pi kn}{N}}$$  \hspace{1cm} (4)

and making the substitution:

$$2nk = n^2 + k^2 - (n-k)^2$$  \hspace{1cm} (5)

The chirp Z transform equation results:

$$F_k = e^{-j \frac{\pi k^2}{N}} \sum_{n=0}^{N-1} \left( f_n e^{-j \frac{\pi n^2}{N}} \right) e^{j \frac{\pi (k-n)^2}{N}}$$  \hspace{1cm} (6)

Equation 6 has been factored to emphasize the three operations which make up the CZT algorithm: (1) pre-multiplying the time signal with a chirp (linear FM) waveform, (2) filtering in a chirp convolution filter, and (3) postmultiplying the Fourier output by a chirp waveform. This is illustrated in Figure 2.

![Figure 2. Schematic of Chirp Z Transform Algorithm.](image)
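The three CZT operations can be checked numerically against a direct DFT. The sketch below applies the substitution $2nk = n^2 + k^2 - (n-k)^2$ to the DFT exponent, which gives chirps with phase $\pi m^2/N$ and a convolution kernel that is the complex conjugate of the multiplying chirp. It is a floating point illustration with illustrative function names, not a model of the CCD filters.

```python
import cmath


def dft(f):
    """Direct DFT used as the reference."""
    n_pts = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def chirp(m, n_pts):
    """Chirp (linear FM) sample exp(-j*pi*m^2/N)."""
    return cmath.exp(-1j * cmath.pi * m * m / n_pts)


def czt_dft(f):
    """DFT computed as premultiply, chirp convolution, postmultiply."""
    n_pts = len(f)
    # (1) premultiply the input by a chirp
    pre = [f[n] * chirp(n, n_pts) for n in range(n_pts)]
    out = []
    for k in range(n_pts):
        # (2) convolve with the conjugate chirp exp(+j*pi*(k-n)^2/N)
        conv = sum(pre[n] * chirp(k - n, n_pts).conjugate() for n in range(n_pts))
        # (3) postmultiply by a chirp
        out.append(chirp(k, n_pts) * conv)
    return out


if __name__ == "__main__":
    samples = [1, 2, 0, -1, 0.5, 3, -2, 1]
    ref, czt = dft(samples), czt_dft(samples)
    print(max(abs(r - c) for r, c in zip(ref, czt)))
```

The convolution index $k - n$ spans $-(N-1)$ to $N-1$, which is why a filter of $2N - 1$ taps (63 for $N = 32$) is needed.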
The chirp Z transform IC is divided into two sections. The first section contains the multipliers, which are implemented using multiplying digital-to-analog converters (MDACs). Analog input signals applied to the MDAC reference terminals are multiplied by binary coded chirp waveforms stored in a ROM. External digital inputs are provided to bypass the ROM for multiplication by other waveforms. Differential MDAC analog inputs and uncommitted outputs allow maximum flexibility in total system configuration.

The second portion of the chip performs the chirp filtering operation. Four 63 stage transversal filters are needed to implement the complex chirp convolution required by the CZT algorithm. The weighting coefficients for the sine and cosine chirp filters are:

$$
\begin{aligned}
h_k^{\cos} &= \cos \frac{\pi k^2}{32}, \quad k = 0, 1, \ldots, 62 \\
h_k^{\sin} &= \sin \frac{\pi k^2}{32}, \quad k = 0, 1, \ldots, 62
\end{aligned}
$$

The coefficients are realized using the split-electrode technique. The CCD filters are two phase, coplanar electrode structures with ion implant wells. Operational amplifiers provide differential inputs and outputs for the CCD filters, again for the purpose of versatility. In practice, the filters are loaded with 32 time domain data samples, and then the input is blanked while the convolution is performed. Thus, a single chip yields a DFT output with a 50% duty cycle. In this mode of operation, the four MDACs provided can be multiplexed to do both pre- and postmultiplication. If 100% duty cycle output is required, two chips are used, and the eight MDACs included are sufficient for the needed multiplications.
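
To make the role of the four real filters concrete, here is a hedged Python sketch of a complex chirp convolution built from four real transversal filters. It assumes tap weights of the form cos(πk²/N) and sin(πk²/N) for k = 0 to 62 with N = 32; the names `transversal_filter`, `complex_chirp_filter`, and `reference` are illustrative only and say nothing about the chip's internal circuit details.

```python
import cmath
import math

N = 32             # transform length
TAPS = 2 * N - 1   # 63 filter stages

# Assumed tap weights for the cosine and sine chirp filters
h_cos = [math.cos(math.pi * k * k / N) for k in range(TAPS)]
h_sin = [math.sin(math.pi * k * k / N) for k in range(TAPS)]

def transversal_filter(x, h):
    """Tapped delay line: y[t] = sum_m h[m] * x[t - m], x = 0 outside its length."""
    out = []
    for t in range(len(x) + len(h) - 1):
        acc = 0.0
        for m, w in enumerate(h):
            if 0 <= t - m < len(x):
                acc += w * x[t - m]
        out.append(acc)
    return out

def complex_chirp_filter(x_re, x_im):
    """Four real filters realize (x_re + j*x_im) convolved with (h_cos + j*h_sin)."""
    re_cos = transversal_filter(x_re, h_cos)
    re_sin = transversal_filter(x_re, h_sin)
    im_cos = transversal_filter(x_im, h_cos)
    im_sin = transversal_filter(x_im, h_sin)
    y_re = [a - b for a, b in zip(re_cos, im_sin)]
    y_im = [a + b for a, b in zip(re_sin, im_cos)]
    return y_re, y_im

def reference(x_re, x_im, k):
    """Direct sum over n of x[n] * exp(+j*pi*(k - n)^2 / N)."""
    return sum(complex(x_re[n], x_im[n]) * cmath.exp(1j * math.pi * (k - n) ** 2 / N)
               for n in range(N))
```

With 32 samples loaded and the input then blanked, the outputs at time steps 31 through 62 are the ones for which every sample sits under a tap. In this sketch the output at step t matches `reference(x_re, x_im, t % N)`, because the chirp exp(jπm²/N) repeats with period N when N is even.
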

All clock waveforms are generated on chip from a two phase master clock, and the system timing is such that it is possible to cascade CZT's for the purpose of multi-dimensional transforms, or for performing correlation by multiplication in the frequency domain.

A block diagram of the CZT chip is shown in Figure 3, and a photomicrograph of the IC is shown in Figure 4. The IC measures 6.04 x 5.69 mm² (238 x 224 mil²). Details of the performance of the 32 point CZT IC have been described elsewhere.

Analog CCD reformatting memories have recently been applied in high speed processors for pulse doppler radar in conjunction with surface acoustic wave device chirp transform units. These memories are predominantly line-addressable CCD delay line structures which exhibit line-to-line offset and gain variations due to fluctuations in MOS transistor threshold voltages at the multiple input and output ports. Since the doppler information is typically used on a line-by-line basis, these variations can be tolerated to some extent. However, in the application described here, the information content of an entire frame must be preserved, and the line-to-line variations constitute a serious dynamic range limitation.

The reformatting or "corner turning" operation can be visualized as loading a square memory array in a row-by-row fashion, then reading it column-by-column. The most direct realization involves the use of a memory structure in which charge can be transferred horizontally or vertically. This realization is further enhanced with the addition of a CCD multiplexer and demultiplexer which results in a single input port and output port, thereby minimizing the number of voltage-charge and charge-voltage conversions, and their troublesome threshold dependence.

A block diagram illustrating the operation of this two dimensional CCD memory appears in Figure 5. The input demultiplexer performs a serial to parallel conversion and loads the memory array row-by-row. During the load cycle charge is transferred vertically in the memory array. When all rows have been loaded, the array is switched to a horizontal transfer mode, and the output multiplexer performs the requisite parallel to serial conversion of the data stored in each column.
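
The load and read sequence amounts to a matrix transpose of the serial data stream. A minimal Python sketch follows, treating the array as ideal (no charge transfer loss or fixed pattern noise); the function name `corner_turn` is an assumption made for illustration.

```python
def corner_turn(samples, n):
    """Reorder n*n serial samples from row-by-row order to column-by-column order."""
    if len(samples) != n * n:
        raise ValueError("expected n*n samples")
    # Load cycle: the demultiplexer gathers n serial samples into one row,
    # and each completed row is shifted into the array.
    array = [list(samples[r * n:(r + 1) * n]) for r in range(n)]
    # Read cycle: the array is switched to the other transfer direction,
    # and the multiplexer serializes one column at a time.
    out = []
    for c in range(n):
        out.extend(array[r][c] for r in range(n))
    return out

if __name__ == "__main__":
    print(corner_turn(list(range(16)), 4))
```

Applying `corner_turn` twice returns the original sequence, as expected of a transpose.
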

A 32 x 32 element CCD memory based on this architecture has been designed and fabricated using the same process utilized for the 32 point CZT IC. A photomicrograph of the chip appears in Figure 6. In order to permit 100% duty cycle in other applications, an additional multiplexer and demultiplexer were included at the bottom and right of the memory array. The memory array is composed of 1024 three-phase two dimensional charge transfer cells placed on 2.0 mil centers. It operates as two interleaved two-phase structures with one common phase. The multiplexer and demultiplexer are four phase structures in order to match the pitch of the memory array and maintain short transfer lengths for good CTE. The structure is buried channel, with the exception of the demultiplexer input circuitry.

Figure 6. Photomicrograph of Corner Turning Memory (CTM)

The chip has been successfully operated at data rates in excess of 5 MHz. Dynamic range is still dictated by threshold variations in the multiplexer and demultiplexer channels which result in "fixed pattern" noise components. However, nearly 40 dB dynamic range (peak signal to peak noise) is achieved across the entire frame as compared to less than 20 dB observed in a previous line addressable design.\(^{(10)}\)

### V. EVALUATION OF CCD TWO DIMENSIONAL TRANSFORM IMPLEMENTATION

A detailed block diagram of the CCD two dimensional transform is shown in Figure 7. A one dimensional DFT is performed on the rows of a real input signal, \(f(m,n)\), using the 32 point CZT IC. The real and imaginary Fourier coefficients, \(G_r(m,j)\) and \(G_i(m,j)\), are reformatted in two CCD CTMs, resulting in transposed matrices \(G_r^T(m,j)\) and \(G_i^T(m,j)\). A second one dimensional transform is performed on the transposed data from the CTMs. The final postmultiply operation has been replaced by a magnitude operation.

All timing pulses needed for the CTM operation are generated externally using standard TTL circuits and TTL-MOS drivers. This control circuitry also provides the two phase clocks and synchronization pulses needed for the CCD CZT ICs. Due to a photomask error, the tap weights of the CCD filters on one IC were shifted one stage, thereby shifting the output sequence of the Fourier coefficients by one position. This also precludes use of the on-chip MDACs for postmultiplication operations. Therefore, the MDACs of another CZT IC, using a delayed sync pulse, were used to perform the postmultiplication needed in the first one dimensional transform.

In order to facilitate interchanging the analog circuitry and to simplify system timing and synchronization, evaluation of this demonstration unit was performed at a 100 kHz data rate.

Experimental Results

A simple two dimensional input signal was generated by multiplying two analog signals together in a 4 quadrant analog multiplier. One signal provides the variation in the horizontal direction and the other provides the variation in the vertical direction.


Figure 7. Implementation of the Two Dimensional Discrete Fourier Transform

Figure 8. Input to two-dimensional DFT

(a) 1 frame (2 ms/cm)
(b) 1 row (50 μs/cm)

Figure 9. Output of first one-dimensional transform, \(G(m,j)\)

(a) 1 frame (2 ms/cm)
(b) 1 row (50 μs/cm)

Figure 10. Output of corner turning process, \(G^T(m,j)\)

(a) 1 frame (2 ms/cm)
(b) 1 row (50 μs/cm)

Figure 11. Output of second one-dimensional transform

The real and imaginary Fourier coefficients, \(G_r(m,j)\) and \(G_i(m,j)\), from the first one dimensional transform for one frame consist of 32 groups of 32 Fourier coefficients, one group for each row of the input signal \(f(m,n)\). The Fourier coefficients in each group appear in the order 32, 1, 2, 3, ..., 31 due to the photomask error previously mentioned. The horizontal input signal consists of six cycles of a sinusoid. Therefore the transform of a row results in nonzero values for the 7th and 27th Fourier coefficients, as shown in Figure 9b. The real and imaginary outputs are 90° out of phase; thus one is at its peak value when the other is zero. Also seen in Figure 9b is a nonzero value for the first Fourier coefficient of the real output due to the DC offset of the input signal. Note that, with the exception of the DC coefficient, the nonzero Fourier coefficients are modulated by the vertical signal, as seen in Figure 9a.

Figure 10 illustrates the output of the corner turning memory, \(G^T(m,j)\). The corner turning operation transposes the previously stored coefficients \(G(m,j)\); thus each consecutive group of 32 points from the CTM consists of the data in one column of \(G(m,j)\), i.e., the 32 values of a single Fourier coefficient previously obtained by transforming the rows of \(f(m,n)\). This matrix transpose operation is confirmed by comparing the CTM outputs in Figure 10 to the CTM inputs in Figure 9. Figure 10a shows a complete frame of transposed data. Figure 10b is an expanded view of the 8th group of 32 points from the CTM. This corresponds to the 32 values of the 7th Fourier coefficient, which is the 8th column of \(G(m,j)\).

The final output of the two dimensional transform unit is the magnitude \(|F(k,j)|\), which is seen in Figure 11. This is obtained by generating the one dimensional DFT of each of the 32 groups of 32 points from the CTM. This corresponds to obtaining the transform of each column of the matrix \(G(m,j)\). From Figure 10a it can be seen that only the 2nd, 8th, and 28th groups from the CTM contain nonzero coefficients. These are due to the input DC offset and the horizontal sinusoid. Thus the second DFT yields nonzero coefficients only in the 1st, 7th, and 27th rows of \(F(k,j)\). The second group from the CTM represents the unmodulated DC offset. The DFT of this group yields only a nonzero DC coefficient \(F(1,1)\). The 8th and 28th groups have been modulated by the vertical input signal, which consists of two cycles of a sinusoid. The DFT of each of these groups yields nonzero values for the 3rd and 31st Fourier coefficients in rows 7 and 27 of \(F(k,j)\), i.e., \(F(7,3), F(7,31), F(27,3),\) and \(F(27,31)\). Figure 11b is an expanded view of the DFT of the 8th group from the CTM showing \(|F(7,3)|\) and \(|F(7,31)|\).
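
The expected locations of the nonzero coefficients can be reproduced with an idealized row-column transform. The Python sketch below is an illustration under stated assumptions: unit-amplitude sine waves, a DC offset of 0.5, ideal arithmetic, and no one-position shift from the photomask error. The names `dft2_magnitude`, `test_input`, and `nonzero_positions` are hypothetical. Positions are reported with the 1-based numbering used in the text.

```python
import cmath
import math

N = 32

def dft(x):
    """Direct one dimensional DFT of a sequence."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def transpose(a):
    """Ideal corner turn: rows become columns."""
    return [list(col) for col in zip(*a)]

def dft2_magnitude(f):
    """Row transform, corner turn, second transform, then magnitude."""
    g = [dft(row) for row in f]          # first transform on the rows of f(m,n)
    g_t = transpose(g)                   # corner turning memory
    big_f = [dft(row) for row in g_t]    # second transform on the columns of G
    return [[abs(v) for v in row] for row in big_f]

def test_input(h_cycles=6, v_cycles=2, dc=0.5):
    """Product of a horizontal and a vertical sinusoid plus a DC offset (assumed values)."""
    return [[math.sin(2 * math.pi * h_cycles * n / N)
             * math.sin(2 * math.pi * v_cycles * m / N) + dc
             for n in range(N)] for m in range(N)]

def nonzero_positions(mag, tol=1e-6):
    """1-based (k, j) positions whose magnitude exceeds tol."""
    return sorted((k + 1, j + 1)
                  for k, row in enumerate(mag)
                  for j, v in enumerate(row) if v > tol)

if __name__ == "__main__":
    print(nonzero_positions(dft2_magnitude(test_input())))
```

Under these assumptions only the DC term and the four terms from the sinusoid product survive, in rows 1, 7, and 27.
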

In order to conveniently demonstrate the operation of the two dimensional transform, a display circuit was employed to apply the desired signal to the Z axis of an oscilloscope. The X axis sweep was triggered by a row synchronization pulse, and a frame synchronized staircase waveform was applied to the Y axis. A short pulse added to \(|F(k,j)|\) before each coefficient output displays a two dimensional grid. Each value of \(|F(k,j)|\) appears as a line after its corresponding grid point. Figure 12 shows the display of the input signal used in Figures 8 through 11 and its two dimensional transform. The input signal is the product of six cycles of a sinusoid in the horizontal direction and two cycles of a sinusoid in the vertical direction, with a DC offset added to the product. The display of the two dimensional transform magnitude shows the DC coefficient \(F(1,1)\) in the upper left corner at position (2,2) of the 32 x 32 array. The Fourier coefficient \(F(7,3)\) due to the input product can be seen at position (8,4) of the array. (Remember that all outputs are shifted by one position in both directions due to the CZT photomask error.) The three aliased outputs \(F(7,31)\), \(F(27,3)\), and \(F(27,31)\) can also be seen.
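
The relation between a coefficient and its place on the display can be written as a one-position circular shift in each direction. The helper below is a hypothetical illustration using the text's 1-based numbering; it assumes the shift wraps around, which is consistent with each group appearing in the order 32, 1, 2, ..., 31.

```python
def display_position(k, j, n=32, shift=1):
    """Display position (row, column) of F(k, j), all indices 1-based."""
    row = (k - 1 + shift) % n + 1
    col = (j - 1 + shift) % n + 1
    return row, col

if __name__ == "__main__":
    for coeff in [(1, 1), (7, 3), (7, 31), (27, 3), (27, 31)]:
        print(coeff, "->", display_position(*coeff))
```

For example, `display_position(1, 1)` gives (2, 2) and `display_position(7, 3)` gives (8, 4), matching the positions quoted above.
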

![Figure 12](image1.png)

Figure 12. Two Dimensional Display of the DFT Output for Sinusoidal Inputs.

Figure 13 shows the operation of the two dimensional DFT with a square wave product input. The horizontal input seen in Figure 13a is two cycles of a square wave and the vertical signal is one cycle. A DC offset has been added. The DFT output seen in Figure 13b clearly shows the fundamental and odd harmonic components due to each input. The aliased fundamental from the vertical square wave appears in the first column of the display due to the CZT photomask error. The DC components from the 2nd CZT seen in column two of the display are due to the -36 dB fixed pattern noise in the CTM. Figure 14 shows \(|F(k,j)|\) for k=3, i.e., the 4th row of the display. An imbalance in the output magnitude circuit caused the nonuniformity seen in the harmonics.

![Figure 13](image2.png)

Figure 13. Two Dimensional Display of the DFT Output for Square Wave Inputs.
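
The presence of only the fundamental and odd harmonics can be checked with a one dimensional DFT of an ideal square wave. This Python sketch assumes a symmetric ±1 square wave with no DC offset and a cycle count that divides 32; the names `square_wave` and `nonzero_bins` are illustrative. Bin numbers are 0-based frequency indices, so bin 2 is the fundamental of a two-cycle input.

```python
import cmath
import math

def square_wave(cycles, n_pts=32):
    """Ideal +/-1 square wave with the given number of cycles (must divide n_pts)."""
    period = n_pts // cycles
    return [1.0 if (n % period) < period // 2 else -1.0 for n in range(n_pts)]

def nonzero_bins(x, tol=1e-9):
    """0-based DFT bins whose magnitude exceeds tol."""
    n_pts = len(x)
    bins = []
    for k in range(n_pts):
        v = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        if abs(v) > tol:
            bins.append(k)
    return bins

if __name__ == "__main__":
    print("two cycles:", nonzero_bins(square_wave(2)))
    print("one cycle: ", nonzero_bins(square_wave(1)))
```

The upper half of each list holds the aliased counterparts of the fundamental and odd harmonics in the lower half. Adding a DC offset to the input would add bin 0.
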

The dynamic range of the two dimensional transform is presently limited by the corner turning memory, which exhibits 36 dB dynamic range due to the fixed pattern noise component discussed earlier. The dynamic range of the CCD CZT IC has been demonstrated to be 50 dB. The power required for the complete two dimensional transform breadboard was 10 watts. This includes all timing circuitry, external amplifiers, drivers, MDACs, etc. Improvements to the CZT and CTM ICs and further integration can be expected to reduce the power requirements and increase dynamic range.
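
For a sense of scale, the quoted decibel figures can be converted to signal-to-noise ratios. The sketch below assumes the figures are amplitude ratios (20 log10), in keeping with the peak signal to peak noise definition used for the CTM; the function names are hypothetical.

```python
import math

def db_to_amplitude_ratio(db):
    """Convert decibels to an amplitude ratio, assuming dB = 20*log10(ratio)."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(ratio)

if __name__ == "__main__":
    for label, db in [("line addressable design", 20),
                      ("CTM in transform breadboard", 36),
                      ("CTM, entire frame", 40),
                      ("CZT IC", 50)]:
        print(f"{label}: {db} dB is about {db_to_amplitude_ratio(db):.1f} to 1")
```

On this assumption, 36 dB corresponds to roughly 63 to 1 and 50 dB to roughly 316 to 1, so the CTM rather than the CZT IC sets the limit.
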

CONCLUSIONS

Preliminary results from a two dimensional DFT breadboard demonstrated the potential in applying analog CCD technology to the implementation of low power, high speed, complex two dimensional signal processing algorithms. Although 100 kHz is not an aggressive data rate for CCD circuits, it results in computation of the 32 x 32 point two dimensional DFT (1024 total points) in 0.4 msec. A digital FFT processor would require an effective multiply-accumulate time of 80 nsec to achieve the same rate. It is anticipated that CCD two dimensional transforms can be performed at clock rates of several megahertz with a moderate increase in system complexity.

ACKNOWLEDGEMENT

Development of the 32 point CCD CZT was supported by NASA-Langley, Contract NASA-L-14290 (Harry Benz, contract monitor) and the CCD CTM was supported by AFGL under contract F19628-77-C-0214 (Freeman Sheppard, contract monitor).
The authors would like to acknowledge Dennis Buss, Tom Cheek, Bob Hewes, Dwaine Hurta, and Jim Salzman for their technical advice and for their assistance in the instrumentation and measurements associated with this work.

REFERENCES


15. H. S. Fu and A. F. Tasch, Jr., unpublished.


MISSION

of

Rome Air Development Center

RADC plans and executes research, development, test and selected acquisition programs in support of Command, Control, Communications and Intelligence (C3I) activities. Technical and engineering support within areas of technical competence is provided to ESD Program Offices (POs) and other ESD elements. The principal technical mission areas are communications, electromagnetic guidance and control, surveillance of ground and aerospace objects, intelligence data collection and handling, information system technology, ionospheric propagation, solid state sciences, microwave physics and electronic reliability, maintainability and compatibility.