Memory design for electrically addressed spatial light modulators

Robert S. Caprari


APPROVED FOR PUBLIC RELEASE

© Commonwealth of Australia

DEPARTMENT OF DEFENCE
DEFENCE SCIENCE AND TECHNOLOGY ORGANISATION

Land, Space and Optoelectronics Division
Electronics and Surveillance Research Laboratory

DSTO-RR-0094

ABSTRACT

This report examines the provision of a high performance memory system for an electrically addressed spatial light modulator (SLM), destined for use in an optical correlator. Two distinct fast memory system designs are proposed. One design is advanced to the stage of specification of major elements of its architecture and timing signals. The other design is developed to the conceptual stage. Appendices review recent high performance semiconductor memory technology.

EXECUTIVE SUMMARY

Real time automatic target recognition (ATR) is an operation that is of importance in the detection and recognition of military platforms of various types. In particular, this technology could facilitate the recognition of land vehicle targets in the northern Australian environment, a capability that has been identified by the ADF as an important element of focal area reconnaissance (reconnaissance within a 100 km radius around sites of strategic importance in a military conflict). The research reported here was conducted under the sponsorship of Directorate Combat Force Development (Land), and examines one specific technology of use in some types of ATR systems that operate on optical imagery.

A spatial light modulator (SLM) is a specialised type of two dimensional liquid crystal display (as used in portable televisions, digital watches and laptop computers). SLMs are the critical component of optical correlators, which are instruments for detecting the occurrence of specific small targets within a larger image, which is the critical operation of ATR. There exists a spatial filter pattern tuned to any conceivable target, for which the optical correlator experiences an acute response when that spatial filter pattern is displayed on its SLM. An optical correlator maintains a large ensemble of spatial filters in memory, which enables it to detect an equally large variety of target views in the input image. For every input image, the optical correlator must cycle through all of its stored target filters, retrieving each one from memory and writing it to the filter SLM.

For real time operation there is a time limit within which the filter cycling must occur. The shorter the cycle time per filter, the more filters that can be tried, and the more robust the target recognition capability. This report examines the design of high speed memory systems for SLMs, such an endeavour being necessary for the attainment of better performing optical correlator based ATR systems. Two alternative functional designs for high speed SLM memory systems are proposed and conceptually developed as far as appropriate for the aims of this task. Appendices review characteristics of the latest high performance memory technology, from the perspective of SLM memory requirements.

The outcomes of this investigation are twofold. Most concrete is the presentation of two design concepts for high performance SLM memory systems. Such advanced memory architectures will be very beneficial to the performance of optical correlators that may be of interest to defence forces, yet at present these sophisticated memories are not available in any commercial optical correlators. Another outcome is the accumulation by DSTO researchers of a degree of technical knowledge about SLM memory systems that will be of assistance if DSTO is called upon to objectively assess proposals for the use of optical correlators by the ADF.
Authors

Robert S. Caprari
*Land, Space and Optoelectronics Division*

Robert Caprari did his undergraduate study at Adelaide University, obtaining the degrees of Bachelor of Engineering (Honours) and Bachelor of Science, the former majoring in electrical and electronic engineering, and the latter majoring in physics. He then joined Optoelectronics Division of DSTO as a Professional Officer 1, and did research in image processing and imaging system characterisation, before joining Electronic Warfare Division as a Professional Officer 2, and doing research in mathematical and acoustooptic techniques of radio frequency signal detection and identification. On obtaining DSTO sponsorship as a Cadet Research Scientist, he conducted research in experimental and theoretical condensed matter and electron scattering physics at Flinders University, obtaining the degree of Doctor of Philosophy. Subsequently, he joined Land, Space and Optoelectronics Division in his current position as a Research Scientist, undertaking research into the physics of imaging, and statistical signal theory and computation.
Contents

1 Introduction
2 Synchronous DRAM
3 A synchronous DRAM SLM memory
4 Improving performance by a cache memory on the SLM
5 Conclusion
Acknowledgements

Appendix A DRAMs for digital television
Appendix B Survey of fast DRAMs
References
Glossary
Distribution
1 Introduction

SLMs\(^1\) are indispensable components of optical correlators. The input image within which a pattern is sought is displayed on one SLM, and spatial filters tuned to individual patterns are sequentially scanned on another SLM. The ensemble of spatial filters would typically be quite large, so for real time pattern recognition it is imperative that filters can be written onto the SLM very quickly. The achievable rate of SLM updating is determined by either the filter memory system data throughput, or the physical limits of the SLM response to input data transitions. This report examines the former factor.

Only electrically addressed SLMs will be considered in this report, so the SLM memory system is a digital electronic one. The majority of the report is devoted to the presentation and discussion of two very different propositions for future high speed SLM memories. A functional design will be provided for a synchronous DRAM memory system, together with an explanation of its operation. The other memory system, based on an SRAM cache memory on the SLM chip, will be considered only at the conceptual level. In formulating the more general aspects of the discussion of semiconductor memory, I referenced the authoritative book by Prince [1], the series of special reports edited by Comerford and Watson [2], and the tutorial by Prince [3].

Although the memory designs developed in this report are in principle compatible with any active backplane SLM, for definiteness I will be specifically assuming FLC SLMs. An exposition of a variety of SLM technologies, including FLC SLMs, is contained in the compilation edited by Efron [4].

Of SRAM and DRAM, precedent dictates that SLM main memory in optical correlators should be constructed of DRAM components, just as it is in most other large digital systems. This is because SLM main memory must be very large to store the requisite number of filters. The lower power consumption, higher density, lower price and better availability of DRAMs compared with SRAMs outweigh the advantages that SRAMs have over DRAMs.

Before embarking upon a consideration of the DRAM technology that I believe has the best prospects for high performance SLM memory, the reader may wish to review the appendices to obtain a broader perspective of DRAM technology in the context of SLM memory usage. Appendix A alerts readers to the emerging ‘killer application’ for DRAMs in consumer electronics, where, propitiously, the requirements on memory performance are quite similar to those for SLM memories. Appendix B briefly reviews many types of high performance DRAMs, including appraisals of their suitability for use in SLM memories. Some of the memory terminology used in the appendices is defined in the footnotes of the main body of the report. Reading of the appendices may be forgone without loss of continuity.

\(^1\)See the Glossary on page 19 for the expansion of all acronyms used in this report, together with brief explanations.
2 Synchronous DRAM

Conventional DRAMs operate asynchronously, that is, the integrated circuit is not paced by an externally supplied system clock (DRAMs may have their own chip level internal clock—it is the absence of synchronisation with the global system clock that renders them asynchronous). Superficially this seems to be the modus operandi best suited to achieving the physical limits of circuit speed, since the component can respond immediately to input logic transitions, without waiting for the next system clock transition to occur. However, this reasoning ignores the finite time taken for the circuit to complete its response, and be available to be prompted for another response. The response time is a latent period in the operation of the circuit, because any input during this period is ignored and lost, or at best, ignored and stored for later response. Most importantly, this response time usually is much longer than the time period needed for a new signal to be issued by the circuit module that is interacting with the memory. The two interacting modules are not synchronised with one another, so that time will always elapse between one memory response being completed and the next response being prompted. Therefore, in practice asynchronous operation of digital systems does not approach the physical limits of circuit speed.

Accordingly, it is possible that electronic systems containing modules with very different response times may operate more quickly if they are globally synchronous, that is, they are paced by a universal system clock. The argument advanced is that synchronous systems operate with complete certainty about the clock cycle in which response signals become available, so that no time need be wasted in polling and waiting for incoming signals. The time that an asynchronous system would spend on polling and waiting, a synchronous system would utilise to execute useful operations. Also, a synchronous system always will read an output within one clock cycle of it becoming available, whereas there is no upper time limit within which an asynchronous system must read outputs. This reasoning motivated the development of synchronous SRAMs quite some time ago, and it has become conventional wisdom that synchronous SRAMs are superior to asynchronous SRAMs in fast applications.

More recently, synchronous DRAMs have emerged from fabricators such as Texas Instruments, Micron Technology, Samsung, Hitachi, Toshiba, Mitsubishi, NEC, Fujitsu and Oki. Synchronous DRAM is a nonproprietary concept specified by the Electronic Industries Association/Joint Electron Devices Engineering Council (EIA/JEDEC) JC 42.3 DRAM Standards Committee. The semiconductor industry in general seems to regard synchronous DRAM as the most prospective contender for the future standard for high performance DRAMs. This fact reduces the technology risks in basing the SLM memory design on a synchronous DRAM architecture. Current generation SLM backplanes already operate synchronously, so there should not be any logic design problems in interfacing SLMs to synchronous DRAM systems. If anything, the ‘glue’ logic actually should be simpler, because of the absence of the need to accommodate asynchronous ‘artifacts,’ such as wait states.

The synchronous DRAM design is optimised for sequential access\(^2\) rather than random access\(^3\), which is entirely consistent with usage of the SLM memory in optical correlators, in which spatial filters stored in memory are sequentially transferred to the SLM without any manipulation. Some of the design techniques used to maximise the speed of operation of synchronous DRAMs are well established. One is address pipelining, in which processing of the new address is begun before completion of the old address processing, thus reducing the effective cycle time\(^4\) by overlapping cycles. Another is memory bank\(^5\) interleaving on the DRAM chip, in which one bank may be prepared for access while another bank is undergoing access. By overlapping cycles in this way, the effective cycle time is reduced for sequential access, because consecutive rows will always reside in separate memory banks on the memory chip (this characteristic is what distinguishes memory interleaving from simple partitioning of memory into blocks). Memory interleaving is most effective for the sequential access used in SLM memories.

Synchronous DRAMs output a serial bit stream at the clock frequency when operating in burst mode\(^6\). The sequence of events culminating in this output begins with the specification of the address of a row in the memory array. Once a single access time\(^7\) period has elapsed, all of the data in that row has been written into the output shift register, where it can be clocked out as a serial bit stream. If the memory input/output bus is several bits wide, then contiguous rows of the memory are simultaneously accessed in this fashion. Each serial line has its own shift register, which is synchronised with all of the other shift registers. The number of bits in a chip page\(^8\) is typically within the range 512 to 2048 bits.

Increased synchronous DRAM functionality is provided by the 'nibbled page' architecture of Toshiba Corporation (Numata et al [6]), in which nonoverlapping 8 bit sequences of data from a specific row are able to be accessed at random, at the same data rate as strictly sequential accesses of chip pages. This facility is useful for high speed memory systems that require random access capabilities, but for SLM memories which strictly adhere to sequential access operation, it is superfluous.

The synchronous DRAM standard includes a clock disable facility, which puts the memory chip into a low power standby mode. In this mode, the system clock is intermittently allowed to propagate cycles to refresh the DRAM cells, but only often enough to maintain data integrity. The expectation is that eventually synchronous DRAMs will have a 'self refresh' capability (already available on premium asynchronous DRAMs), in which memory chip circuitry controls the refreshing of memory cells, without assistance from external circuitry.

---

\(^2\)Sequential Access: Successive bytes accessed from/directed to consecutive storage cells on the chip. Analogous to page or burst mode.

\(^3\)Random Access: Successive bytes accessed from/directed to arbitrary storage cells on the chip. Analogous to byte mode.

\(^4\)Cycle Time: Period of the memory access cycle, that is, the minimum possible time between initiation of one memory access and initiation of the next.

\(^5\)Memory Bank: An independently functional subarray of memory cells.

\(^6\)Burst Mode: Synonym for page mode. See Footnote 9.

\(^7\)Access Time: Elapsed time between initiation of memory access and completion of the byte transfer phase of the memory access cycle. This excludes the post-transfer phase of the memory access cycle, in which the memory resets itself to become available for the next access. The cycle time is inclusive of both phases of the memory cycle, hence it is longer than the access time.

\(^8\)Chip Page: A row of memory cells in the memory array.

Present generation synchronous DRAMs operate at clock frequencies of 100 MHz, with unsubstantiated claims of the potential of attaining up to 500 MHz performance in the future. Input/output bus widths are 8 bits, giving a total data rate of 800 Mb/s. The intrinsic access time for a chip page is about 60 ns, although this can be hidden by interleaving memory chips, as demonstrated in Section 3. Memory capacity is presently 16 Mb per chip.
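As a rough check of these figures, the following Python sketch recomputes the per-chip data rate and compares the duration of a page burst with the quoted 60 ns access time. The page lengths used (512 and 2048 bits per data line) are the typical range quoted earlier in this section; the variable and function names are illustrative only.

```python
CLOCK_MHZ = 100   # present generation clock frequency (from the text)
BUS_BITS = 8      # input/output bus width per chip (from the text)
ACCESS_NS = 60    # intrinsic chip page access time (from the text)

chip_rate_mbps = CLOCK_MHZ * BUS_BITS   # one bit per data line per clock
clock_ns = 1000 / CLOCK_MHZ             # clock period in nanoseconds


def burst_ns(page_bits):
    """Time to clock one chip page out of a single data line."""
    return page_bits * clock_ns


# How many access times fit inside one page burst, for short and long pages.
ratios = {n: burst_ns(n) / ACCESS_NS for n in (512, 2048)}

print(chip_rate_mbps)        # 800
print(burst_ns(512))         # 5120.0
print(ratios)
```

Since a page burst lasts many tens of access times, a second chip has ample time to complete its page access while the first is still transmitting, which is the basis of the interleaving used in Section 3.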

Synchronous DRAM communicates at high serial bit rates (100 Mb/s). At these frequencies signal propagation exhibits guided wave effects such as reflection, mutual coupling, pulse dispersion, and transmission line behaviour in general. Specialised packaging, interconnection strategies and interface circuitry are needed to accommodate such signal behaviour. Printed circuit board tracks have to be treated as transmission lines and antennas. Parasitic capacitance and inductance of interface electronics, device packaging and waveguide discontinuities materially affect signal quality, and so must be minimised and any residual taken into account in the design. Low voltage swing interfaces must be used to increase signalling rate and reduce power consumption. Signal transmitters must be able to impress signals on low characteristic impedance (~100Ω) interface buses. As the clock period approaches the wave propagation time over the system, special care must be taken to keep the signal and clock phases aligned at all locations in the system.

To ensure effective operation at their high clock frequencies, synchronous DRAM interfaces are low voltage swing types, such as GTL or CTT. The components with which the synchronous DRAMs communicate, such as the SLM, must have a compatible interface. Since electrically addressed SLMs have never been designed with low voltage swing interfaces, this represents an area of technological risk in the adoption of synchronous DRAM memories for SLMs. Packaging that is suitable for high frequency operation, namely miniature housings that reduce the effective length of wiring and leads, is necessary for synchronous DRAMs. Present synchronous DRAMs are available as TSOPs.

3 A synchronous DRAM SLM memory

In this section I shall propose a synchronous DRAM based SLM memory architecture that is suitable for use in an optical correlator. The proposed memory system is realisable using current generation synchronous DRAMs, as discussed in Section 2. However, present electrically addressed SLMs use TTL or CMOS interface levels. To interface with a synchronous DRAM constituted memory, an SLM that operates with a GTL or CTT interface would need to be fabricated.

The specifications of the memory are, to a large extent, dictated by the specifications of the SLM and optical correlator operating algorithm. It will be assumed that the SLM has 512 x 512 pixels. Two cases of degree of modulation will be concurrently examined. One is binary modulation, and the other is hexadecimal quantised modulation. Binary modulation is determined by a single bit; hexadecimal modulation by four bits. The operating algorithm will be assumed to have a suite of 1024 spatial filters available. The capacity required of the SLM memory is thus 256 Mb for binary modulation, and 1 Gb for hexadecimal modulation.
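The capacity figures follow directly from the stated assumptions, as the short Python sketch below shows. It assumes the binary convention 1 Mb = 2^20 bits, which is what makes the totals come out as exactly 256 Mb and 1 Gb; the function name is illustrative.

```python
MEGABIT = 2**20  # binary megabit (assumed convention)


def slm_memory_bits(rows=512, cols=512, bits_per_pixel=1, n_filters=1024):
    """Total filter storage, in bits, for a suite of spatial filters."""
    return rows * cols * bits_per_pixel * n_filters


binary_mb = slm_memory_bits(bits_per_pixel=1) // MEGABIT  # binary modulation
hex_mb = slm_memory_bits(bits_per_pixel=4) // MEGABIT     # hexadecimal modulation

print(binary_mb)  # 256
print(hex_mb)     # 1024, i.e. 1 Gb
```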

Figure 1 (page 8) displays the functional block diagram of a 256 Mb SLM memory; the expansion to 1 Gb is achieved by replacing each memory chip by four of the same chip. The synchronous DRAM components are presently commercially available. Capacity of the memory chips is 16 Mb, configured as 2 M bytes of 8 bits each. There is one data line for each bit in the bytes, thus bytes are communicated in parallel across 8 bit data buses. A pair of memory chips shares each data bus in an interleaved arrangement, whereby chips are accessed alternately, with sufficient overlap to account for the access time latency. Each chip access reads a complete chip page. The serial data rate on each data line is 100 Mb/s, and this is sustainable for the duration of the filter load into the SLM, because the chip interleaving allows the page access time for one chip to occur simultaneously with the finish of the other chip’s page output. Effectively, there is no page access latency. This seamless operation is demonstrated in the timing diagrams of Figure 3, which are discussed below. There are eight chip pairs, served by eight 8-bit data buses, giving a total bus width of 64 bits. This architecture is consistent with the one advocated by Salters for HDTV, as described in Appendix A.
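The chip count and bus width of the 256 Mb configuration can be reproduced with the sketch below. The aggregate data rate it prints is not quoted in the text; it is simply the 64 data lines multiplied by the 100 Mb/s line rate, on the assumption that only one chip of each interleaved pair drives its bus at any instant.

```python
CHIP_CAPACITY_MB = 16   # Mb per synchronous DRAM chip (from the text)
CHIP_BUS_BITS = 8       # data lines per chip (from the text)
CHIPS_PER_BUS = 2       # interleaved pair sharing one data bus (from the text)
LINE_RATE_MBPS = 100    # serial rate per data line (from the text)
TOTAL_MB = 256          # binary modulation memory capacity

chips = TOTAL_MB // CHIP_CAPACITY_MB
buses = chips // CHIPS_PER_BUS
bus_width_bits = buses * CHIP_BUS_BITS
# Derived figure, not stated in the report: all data lines running together.
aggregate_mbps = bus_width_bits * LINE_RATE_MBPS

print(chips, buses, bus_width_bits, aggregate_mbps)  # 16 8 64 6400
```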

Since the memory system is synchronous, the 64 data outputs will switch simultaneously. On occasions when the net balance of high logic states at the output of one memory chip changes as a result of the logic transitions, transient currents will flow in the power supply and ground lines. The transient currents may induce voltage spikes in these lines, and voltage ringing in the output lines. As a precaution, printed circuit board design techniques to mitigate this ‘ground bounce,’ without appreciably slowing signal propagation, may need to be used. This endeavour is assisted by the low inductance of TSOP pins; one good reason why synchronous DRAMs are packaged in TSOPs.

A mechanism by which incoming data is distributed to the pixel array of the SLM is indicated by Figure 2 (page 9), for the case of binary modulation. Each of the 64 bit lines is connected to an 8 bit shift register. The serial data fills the shift register in eight clock cycles, at which instant a 1-in-8 clock parallel latches the data into eight consecutive column buffers connected to the SLM pixel array. Column buffer output data remains fixed for the next eight clock cycles, which must be long enough for the data signals to be impressed upon their corresponding SLM pixels. Data latching into the column buffers occurs concurrently with the last serial entry into the shift register, so there should be no interruption to the serial data flow, which is entirely consistent with the operation of the synchronous DRAM in the fast burst mode. For the currently dominant synchronous DRAM clock period of 10 ns, the time allocated to charging the pixel capacitors (note that the FLC is just the dielectric medium between the capacitor electrodes) in each row of SLM pixels is 80 ns, which translates into write time allocation for the whole SLM array of 40.96 µs. For comparison, current generation SLMs with 512 pixel rows, are physically limited to row charging times of 43 ns, using polysilicon row lines driven from both ends; while anticipated future SLMs using metal row lines are expected to achieve row charging times of about 1 ns (Serati [7]).
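The behaviour of one serial-to-parallel conversion circuit can be illustrated with a small behavioural model. This is a sketch only, not a description of the actual backplane logic: the function name and the test bit stream are invented for illustration, and the model ignores signal levels and propagation delays.

```python
def serial_to_parallel(bits, width=8):
    """Model one SLM input line: shift bits in serially and latch the
    register contents into the column buffers on every `width`-th clock."""
    shift_register = []
    latched_words = []  # successive contents of the column buffers
    for clock, bit in enumerate(bits, start=1):
        shift_register.append(bit)
        if clock % width == 0:
            # The 1-in-`width` clock pulse coincides with the last serial entry.
            latched_words.append(tuple(shift_register))
            # Modelling shortcut: hardware would shift old bits out, not clear.
            shift_register = []
    return latched_words


stream = [1, 0, 1, 1, 0, 0, 1, 0,
          0, 1, 1, 1, 0, 1, 0, 0]
words = serial_to_parallel(stream)
print(words)
```

Each latched word persists in the column buffers for the following eight clock cycles, which is the 80 ns row charging allocation at a 10 ns clock period.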

A similar circuit to that shown in Figure 2 would be suitable in the case of hexadecimal modulation, in which four bits of data would be supplied to each SLM pixel. Each shift register would still service eight consecutive columns of pixels, therefore the size of the shift register would need to be increased to 32 bits. It would take 32 clock cycles to serially load the shift register, so latch enabling of the 32 column buffers would need to be controlled by a 1-in-32 clock. The time allocated to charging the pixel capacitors in each row of SLM pixels is now 320 ns, giving a time allocation of 163.84 μs for the whole SLM array.
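The row and whole-array write time allocations for both modulation cases follow from the clock period, the number of columns served by each data line, and the number of rows. A minimal Python sketch of that arithmetic, with illustrative names:

```python
CLOCK_NS = 10          # synchronous DRAM clock period (from the text)
ROWS = 512             # rows in the SLM pixel array
COLUMNS_PER_LINE = 8   # columns served by each serial data line


def write_times(bits_per_pixel):
    """Return (clocks per row, row time in ns, whole-array time in us)."""
    clocks_per_row = COLUMNS_PER_LINE * bits_per_pixel
    row_ns = clocks_per_row * CLOCK_NS
    frame_us = row_ns * ROWS / 1000
    return clocks_per_row, row_ns, frame_us


binary = write_times(1)
hexadecimal = write_times(4)
print(binary)        # (8, 80, 40.96)
print(hexadecimal)   # (32, 320, 163.84)
```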

Not shown in Figure 2 are the digital to analogue converters (D/A) that must be interposed between the data distribution digital circuitry and each column line, in the case of hexadecimal modulation. To the extent that the D/A response time is much less than the persistence time of data at the D/A input, the inclusion of a D/A stage in the signal path should not have a material impact on the attainable data transfer rate. 8 bit D/As presently being incorporated on SLM backplanes have response times of about 25 ns (Serati [7]), which compares favourably with the 4 bit input data persistence time of 320 ns. So the presence of column D/As in the case of hexadecimal modulation, although adding complexity, does not add significant delay.

To elucidate the data transfer process during the loading of the SLM, timing diagrams are displayed in Figure 3 (page 10), for the case of binary modulation. Periodic inputs to the synchronous DRAM components include Clock, Row Access Strobe (RAS: a 1-in-N clock, where N is the number of bits in a chip page) and Column Access Strobe (CAS: RAS delayed by four periods of Clock). Periodic inputs to the synchronous SLM include Clock, 1-in-8 Clock and SLM RAS (Clock÷8).

The serial entrance of data into the SLM shift register is demonstrated by the waveform ‘Shift Register Cell 7 Data.’ Entrance of the final data bit is accompanied by a 1-in-8 Clock pulse, which latches the data into the SLM column buffers, as demonstrated in waveform ‘Column 7 Pixel Data.’

SLM RAS is a waveform with the same period and phase as 1-in-8 Clock, but with a 50% duty cycle. This signal is applied to the gates of the pass FETs that connect the column lines to one of the SLM pixel capacitor electrodes, for every pixel along the selected row. Only while the pass transistor gate voltage is high (for n-channel FETs) will the column line voltage be imposed on the pixel capacitor electrode. Both the row and column lines are relatively long, traversing the complete width and height, respectively, of the SLM pixel array. And both of these lines are heavily capacitively loaded, each driving 512 identical devices (FET gate for the row line; pixel capacitor for the column line). In combination, these two factors significantly delay signal propagation to the ends of the lines.

To ensure that a high state on the pass FET gate coincides with the new data voltage at the source of the pass FET (the source being connected to the column line), both the column data and the SLM RAS must persist for a considerable length of time. The 1-in-8 Clock pulse may not persist long enough to fulfill the role of SLM RAS, but the Clock÷8 pulse does, so the latter is used as the SLM RAS. Note that the SLM RAS pulse falling edge must have propagated to the end of the row of pass transistors before the data on the column lines is changed, or else the new data may overwrite the old data (even if only

The timing diagrams of Figure 3 correspond to a time interval during which a page of data from Bank #1 (i.e. one of the memory chips connected to a particular data bus) finishes, and a page of data from Bank #2 (i.e. the other memory chip connected to the same data bus) starts. The interleaving of the two memory banks allows the Bank #2 Row Access and Column Access Strobes to be issued while Bank #1 is still transmitting data. This gives Bank #2 enough time to respond to the data request, and start transmitting data as soon as Bank #1 has stopped, with no interruption in the data flow to the SLM. Such continuous data flow is not possible without the external synchronous DRAM interleaving adopted in the memory system architecture of Figure 1.

Similar timing diagrams to those of Figure 3 would also apply for the case of hexadecimal modulation, but with a 1-in-32 clock replacing the 1-in-8 Clock, and the SLM RAS being a Clock÷32 waveform.
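The effect of chip interleaving on the continuity of the data stream can be sketched with a simple clock-level model. The model is an assumption made for illustration: it treats the 60 ns page access time of Section 2 as a fixed latency of six clock periods, uses a page length of 512 bits per data line, and does not reproduce the individual RAS and CAS waveforms of Figure 3.

```python
CLOCK_NS = 10
ACCESS_NS = 60                          # page access time quoted in Section 2
LATENCY_CLOCKS = ACCESS_NS // CLOCK_NS  # assumed fixed latency, in clocks


def output_clocks(n_pages, page_bits, interleaved):
    """Clock indices at which successive bits appear on one data line."""
    clocks = []
    t = LATENCY_CLOCKS  # the first page always pays the access latency
    for _ in range(n_pages):
        clocks.extend(range(t, t + page_bits))
        t += page_bits
        if not interleaved:
            # Single chip: the next access starts only after the page ends.
            t += LATENCY_CLOCKS
        # Interleaved: the other chip was strobed LATENCY_CLOCKS early,
        # so its first bit follows immediately.
    return clocks


def idle_clocks(clocks):
    """Number of clock periods inside the transfer with no data bit."""
    return (clocks[-1] - clocks[0] + 1) - len(clocks)


gaps_interleaved = idle_clocks(output_clocks(4, 512, interleaved=True))
gaps_single = idle_clocks(output_clocks(4, 512, interleaved=False))
print(gaps_interleaved, gaps_single)  # 0 18
```

In this model the interleaved pair delivers an unbroken bit stream, whereas a single chip leaves a six clock gap at every page boundary.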

4 Improving performance by a cache memory on the SLM

If extremely high memory throughput does not need to be sustained over an extended time interval, then it may be feasible to use slower conventional DRAMs for main memory, and interpose a fast cache memory capable of storing one filter between the main memory and the SLM backplane. While the liquid crystal is responding to a newly applied electric field, followed by the correlator's CCD detector array being read, and then the correlation signal being interpreted by a processor, a new filter may be transferred from main memory to the cache memory, ready for fast access when the correlator is ready. There is only value in this memory scheme if the spatial filters to be impressed upon the SLM are accessed in a predetermined order. Unlike the synchronous DRAM system of Section 3, the performance of SLM memory using a cache deteriorates the more often the next filter is chosen on the basis of the correlation results of the present filter.
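The dependence of cache performance on a predetermined filter order can be illustrated with a deliberately simple timing model. Everything in this sketch is an assumption made for illustration: the model, the function name, and the three example durations are hypothetical and are not taken from the report. It supposes that a cache load can overlap the correlator processing interval only when the next filter is known in advance.

```python
def filter_period_us(t_process_us, t_load_us, t_burst_us, predetermined=True):
    """Time per filter under a hypothetical single-filter cache model.

    t_process_us: liquid crystal response + CCD readout + interpretation
    t_load_us:    main memory to cache transfer
    t_burst_us:   cache to SLM backplane transfer
    """
    if predetermined:
        # The next filter is prefetched while the correlator is busy.
        return t_burst_us + max(t_process_us, t_load_us)
    # The next filter depends on the present result, so loading must wait.
    return t_burst_us + t_process_us + t_load_us


# Hypothetical example durations, in microseconds.
sequential = filter_period_us(1000, 656, 5, predetermined=True)
data_dependent = filter_period_us(1000, 656, 5, predetermined=False)
print(sequential, data_dependent)  # 1005 1661
```

Under these assumed numbers the main memory transfer is completely hidden when the filter order is known, and fully exposed when it is not.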

Filter transfer from main memory to cache memory can be slow; it is the transfer from cache memory to the SLM backplane that should be as fast as possible. This characteristic is best achieved by having an SRAM cache memory fabricated on the same integrated circuit as the SLM backplane, with a very wide data bus (preferably equal to the number of columns in the SLM array) connecting the cache memory with the SLM array.

A 512×512 pixel SLM fabricated with smallest feature sizes of 2 μm (which is fairly coarse resolution by the standards of presently achievable photolithography) would essentially completely fill a standard integrated circuit die of 200 mm² area, taking into account the peripheral circuitry associated with the SLM array (Handschy et al [8]). However, by the Year 2000 it is predicted that the smallest attainable feature sizes from lithography will have shrunk to about 0.2 μm, while standard die sizes will have grown to over 400 mm² in area (Geppert [9]). A 1 Mb (i.e. 512×512 pixels × 4 bits/pixel) SRAM memory array fabricated with 0.8 μm minimum feature sizes occupies about 45 mm² area, ignoring peripheral circuitry (Prince [1, p. 432]). What this mishmash of figures emphatically demonstrates is that imminent integrated circuit fabrication technology will enable a one filter sized SRAM cache to be fabricated on the same chip as the SLM.

Figure 1: Functional block diagram of an interleaved synchronous DRAM SLM memory, suitable for use in an optical correlator. Individual data buses are 8 bits wide. Total memory capacity is 256 Mb.

Figure 2: One of the 64 identical serial-to-parallel data conversion circuits on the SLM chip, for the case of binary modulation. Not shown is the low voltage swing signal receiver, that detects the incoming serial bit stream, and converts it to CMOS voltage levels.

Figure 3: Timing diagrams for the data transfer process from interleaved synchronous DRAM chips to the SLM, for the case of binary modulation. The time interval that is chosen includes the uninterrupted transition of data flow from Bank #1 to Bank #2.

CMOS based SRAMs have random access times as short as 8 ns at present. Combined with a very wide data path from the SRAM cache to the SLM pixel array, the speed potential of SRAM ensures that filter transfer from cache to SLM can be made much quicker than the already impressively quick synchronous DRAM design of Section 3. The value of the filter transfer time is strongly dependent on the design characteristics of the SRAM cache memory. Specific designs will not be considered here, although general guiding principles will be addressed.

Data is written to SRAM cache from DRAM main memory operating in page mode\(^9\) with a cycle time as low as 40 ns for asynchronous DRAM. This cycle time is not sustainable across pages, since at the start of each new page, the applicable DRAM access time is that for byte mode, which is longer than that for page mode. An indication of the quickening of memory access in page mode as opposed to byte mode is conveyed by a comparison of performance in the two modes for a 1 Mb DRAM from Motorola (Prince [1, p. 62]). Access time: 25 ns for page access and 85 ns for byte access; cycle time: 50 ns for page access and 165 ns for byte access.

For a typical DRAM page length of 512 bits, and a currently maximum DRAM data bus width of 16 bits, a 512×512 pixel binary filter will require at least 656 \(\mu\)s to be transferred from DRAM main memory to cache; the corresponding figure for a hexadecimal filter is 2625 \(\mu\)s. However, these cache memory write times reduce inversely with increasing data bus width. And 32 bit DRAM data buses will be commercially available by the Year 2000, with 64 bit widths available by 2003. Furthermore, several DRAM chips can be mounted and connected in parallel to behave as a single composite DRAM component having a wide data bus. This technique is already well established in the ubiquitous single in line memory modules (SIMMs) used in personal computer main memories. It is probably possible to use a data bus sufficiently wide to allow loading of the cache within the time span of a single correlation.
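The lower bounds on cache loading time can be approximated by dividing the filter size by the bus width and multiplying by the 40 ns page mode cycle time. The sketch below does this while ignoring the extra latency at the start of each page, so it gives 655.36 µs and 2621.44 µs, close to the rounded 656 µs and 2625 µs quoted in the text. The function name and the wider bus widths tried are illustrative.

```python
def cache_load_us(bits_per_pixel, bus_bits, cycle_ns=40, rows=512, cols=512):
    """Lower bound on main memory to cache transfer time, in microseconds.

    Assumes every transfer takes one page mode cycle (no page start penalty).
    """
    transfers = rows * cols * bits_per_pixel / bus_bits
    return transfers * cycle_ns / 1000


binary_16 = cache_load_us(1, 16)
hex_16 = cache_load_us(4, 16)
print(binary_16, hex_16)

# The load time falls in inverse proportion to the data bus width.
for width in (16, 32, 64):
    print(width, cache_load_us(1, width))
```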

Although the DRAM main memory system could have a two bank interleaved configuration, thus eliminating the read latency at the beginning of each page, this is of marginal benefit for typical chip page lengths. Unless the page length of the DRAM is unusually small, or implementing memory interleaving has no complexity penalties, it seems best not to incorporate memory interleaving.
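The marginal benefit of interleaving in this setting can be gauged by estimating the fraction of time lost to the slower first access of each page. The sketch below is a simple model under stated assumptions: it borrows the cycle times quoted above for the Motorola 1 Mb DRAM (50 ns in page mode, 165 ns in byte mode), and assumes that only the first access of each page incurs the byte mode cycle time.

```python
def page_start_overhead(page_len, page_cycle_ns=50, byte_cycle_ns=165):
    """Fractional time penalty per page, relative to ideal interleaving."""
    ideal = page_len * page_cycle_ns                         # no start latency
    actual = byte_cycle_ns + (page_len - 1) * page_cycle_ns  # slow first access
    return (actual - ideal) / ideal


typical = page_start_overhead(512)   # typical chip page length
short = page_start_overhead(16)      # hypothetical, unusually short page
print(round(typical * 100, 2), round(short * 100, 2))  # percent
```

Under these assumptions the penalty is well under one percent for a 512 bit page, but grows to roughly fourteen percent for a 16 bit page, which is consistent with interleaving being worthwhile only for unusually short pages.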

Since filter pixels are transferred in an invariant and known sequence, both the DRAM main memory and SRAM cache can be serial access memories, to reduce complexity and cost. Performance will not be compromised in any way by the reduced flexibility of the memory, because serial access always is the fastest mode of memory access, even when the memory does have random access capabilities. The serial access architecture is essentially one very long shift register for every parallel output line, and is used in the frame buffers mentioned in Appendix B.

\(^6\)Page Mode: A fast method of consecutively accessing a whole chip page of memory. The page address is specified only at the beginning, and is retained for the duration of the operation, instead of being respecified for each byte, as in the slower byte mode.

Data transfer to the SLM by the synchronous DRAM design of Section 3 occurs at very fast bit rates for brief time periods separated by extended periods of no communication. In contrast, the present on-SLM cache memory concept transfers the same data at a lower bit rate over extended time intervals, separated by inactive intervals of shorter length than the synchronous DRAM design. Unlike the synchronous DRAM design, the intercomponent bit rate in the on-SLM cache memory concept is low enough to be accommodated by conventional interface, packaging and printed circuit board technologies. Detailed design of the on-SLM cache memory system is thus simplified in some important respects.

In the case of an exceptionally responsive SLM from the perspective of pixel charging times, it may be tempting to consider fabricating a bipolar SRAM cache. Certainly, BJT memories, in particular ECL, have very fast speeds; but they also have high power consumption and low device density. More complicated and expensive semiconductor processing is required to fabricate both BJTs (for the cache) and FETs (for the SLM backplane) on the same silicon wafer. However, combining BJTs and FETs on the same chip is a well established process, being the foundation of biCMOS technology. A 64 kb ECL SRAM with access times of \( \sim 3 \) ns, or a 5 kb ECL SRAM with access times of \( \leq 1 \) ns, represent the present capabilities of bipolar semiconductor memories.

Extra semiconductor processing steps may be needed for integrating an SRAM cache onto the SLM backplane chip. For an nMOS SRAM, an implantation step will have to be introduced to fabricate the depletion mode transistors. For a CMOS SRAM, an n-well diffusion step has to be introduced into the fabrication process, followed by different diffusion steps for the source/drain regions of the n-channel and p-channel MOSFETs. None of these steps are needed for the SLM pixel array. However, the peripheral circuitry that would be present regardless of any cache memory may need these steps, so the extent to which inclusion of cache SRAM circuitry increases fabrication complexity is uncertain.

5 Conclusion

Figures quoted in Section 3 indicate that a well designed present generation SLM backplane is capable of handling, with reserve, a notional high performance synchronous DRAM memory system. And it is inevitable that SLM backplanes with metal row lines, and much increased writing speeds, eventually will become available, thus making feasible SLM memory systems seem even more inadequate. Such reasoning probably exaggerates the speed potential of the SLM, because it ignores the FLC switching time in response to a transition of the pixel capacitor voltage. Typical FLC switching times in current SLMs are about 50 \( \mu s \) (McKnight et al [10]), although FLC materials that allow much faster switching are already available.

On balance, it seems that a state of the art active backplane FLC SLM will outperform any evolutionary advance in DRAM memory systems. The favourable aspect of this circumstance is that any effort expended in increasing memory speed will be rewarded by an almost proportionate increase in SLM system speed. I have proposed two enhanced performance SLM memory architectures in this report, and based upon the evidence that I have presented, I expect that the performance improvements of these memories translate into worthwhile improvements in SLM filter update rate. Faster memories certainly are conceivable, but these are somehow exotic (in architecture, devices, materials, signalling or packaging), and probably not serious candidates for SLM memories.

Acknowledgements

Dr Paul Miller of LSOD alerted me to the severe limitations of present generation SLM memory designs. I am also grateful to Dr Miller for his constructive criticism of preliminary drafts of the report. Mr Warwick Holen converted my hand drawn sketches into the computer drafted figures adorning this report. A preliminary draft of this report was critically appraised by Mr Steve Serati of Boulder Nonlinear Systems, and I acknowledge with gratitude his thoughtful and insightful comments, which I have endeavoured to take into account in this final version.
Appendix A

DRAMs for digital television

Because of the highly specialised nature of optical correlators, they will probably never be a driving influence on memory technology. Instead, they will probably derive their memory technology from mass market applications.

High definition television (HDTV), once it becomes a commercial reality, is anticipated to be the largest consumer of DRAM—larger than all types of computing and digital telecommunications. And HDTV will be a demanding driver of memory technology, since it will require higher memory throughput than most computers and their displays. Consider the example of converting the field refresh rate on conventional colour televisions from 50 Hz to 100 Hz, to reduce screen flicker. The combined input and output data throughput is 372 Mb/s. Based upon the proposed standards for HDTV, HDTV will require a combined input and output data throughput of 1950 Mb/s. The amount of memory required will be over 4 MB.

For elementary processing of HDTV images, memory is only accessed sequentially, just as it is in optical correlators. Accordingly, it is reasonable to expect that any specialised memory components developed for HDTV also will be well suited for use as the SLM memory in optical correlators. Salters [5] opines that the HDTV memory requirement will be fulfilled by using multiple memory chips in parallel, with a total bus width of 64 bits. Individual memory chips would need to be accessed in page mode to achieve the necessary data rate. The same design principles would need to be adopted for SLM memory in optical correlators, as exemplified by the memory design in Section 3.

In summary, HDTV is a prospective mainstream application that will direct the future of semiconductor memory technology, and it is fortunate that the memory requirements of HDTV are similar to those of SLMs in optical correlators. It should prove beneficial for optical correlator designers to monitor closely the ongoing development of HDTV memory, with the intention of adapting it to their particular application.
Appendix B

Survey of fast DRAMs

Many different types of high speed DRAM are available, distinguished by their fabrication technology, chip architecture, or functionality as system components. The performance improvement offered by fast DRAMs should be assessed against the best random access speed available from standard asynchronous DRAMs, namely 40 ns access time and 80 ns cycle time.

Hitachi has achieved 17 ns access time from a DRAM fabricated using a biCMOS process, in which the memory cell array is fabricated in CMOS, but the peripheral circuitry on the chip is fabricated in ECL. biCMOS is more complex and expensive than standard CMOS. This circuit, which has a conventional DRAM architecture, is representative of the fabrication technology approach to performance improvement, an approach more common in SRAMs than DRAMs. Most schemes of DRAM performance improvement concentrate on circuit design innovation.

One established high speed DRAM architecture is the video DRAM, which is used for controlling computer displays. Video DRAMs write the contents of a complete row of DRAM cells in parallel into the cells of a shift register. A serial data stream is then clocked out of the shift register. This operation is well suited to an SLM memory. A random access write can be simultaneously made to the memory array. This latter facility is of no benefit to an optical correlator, in which the filters are invariant. Video DRAM circuitry occupies up to 50% more area on the silicon die than conventional DRAMs of the same bit capacity. The extra functionality and performance of video DRAM comes at the expense of increased package size and pin count, higher power dissipation, and greater cost.
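The video DRAM operation described above can be sketched as a toy model: a complete row is copied in parallel into a shift register, bits are then clocked out serially, and a random access write to the array can proceed without disturbing the serial stream. The class and method names are hypothetical, and the model omits timing, refresh and all electrical detail.

```python
from collections import deque


class VideoDramSketch:
    """Toy model of a video DRAM serial port (illustrative only)."""

    def __init__(self, rows):
        # Each row is a list of bits held in the DRAM cell array.
        self.rows = [list(row) for row in rows]
        self.shift_register = deque()

    def load_row(self, row_index):
        """Copy one complete row into the shift register in parallel."""
        self.shift_register = deque(self.rows[row_index])

    def clock_out(self):
        """Shift one bit out of the serial port."""
        return self.shift_register.popleft()

    def write_bit(self, row_index, column, bit):
        """Random access write to the array; the shift register is untouched."""
        self.rows[row_index][column] = bit


vram = VideoDramSketch([[1, 0, 1, 1], [0, 0, 1, 0]])
vram.load_row(0)
vram.write_bit(0, 0, 0)  # array changes while the serial port is busy
stream = [vram.clock_out() for _ in range(4)]
print(stream)  # [1, 0, 1, 1]
```

The serial stream reflects the row as it was when loaded, which is the only behaviour an SLM memory with invariant filters would exercise.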

Video DRAMs are faster than conventional DRAMs, but not as fast as emerging generation high performance DRAMs. The present frontier of video DRAM capability is exemplified by a Texas Instruments video DRAM, that has a capacity of 1 Mb, and outputs eight serial bit streams, each uninterrupted at 70 Mb/s. Uninterrupted operation at this rate is achieved by pipelining operations, and on-chip memory bank interleaving.

Frame buffer DRAMs are specialty memories developed for video display applications, in particular digital television. They have simplified chip interfaces that only allow serial input and output. Frame buffers have serial output performance similar to that of video DRAMs, but at a lower cost, because of the absence of random access capability. The filter access requirements of an optical correlator are only serial, so that frame buffers would be just as suitable for the SLM memory as video or conventional DRAMs. A particularly capable frame buffer is a 16 Mb component from Toshiba, which simultaneously outputs four serial bit streams of length 2048 bits each, at a rate of 100 Mb/s. This type of component merits serious consideration for use in the SLM memory of an optical correlator.
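To put the serial output rates of these components in the context of an SLM memory, the time to stream one 512×512 binary filter can be estimated. The sketch below assumes that all serial streams run uninterrupted, that the filter is spread evenly across them, and that Mb/s uses the glossary definition of a megabit (\(2^{20}\) bits). The function name and parameters are illustrative.

```python
MEGABIT = 2 ** 20  # glossary definition; use 10 ** 6 for decimal megabits


def filter_read_time_us(streams, rate_mbps, pixels_per_side=512,
                        bits_per_pixel=1, megabit=MEGABIT):
    """Time to stream one filter out of a serial-output DRAM, in microseconds.

    Assumes all streams run uninterrupted and the filter is spread
    evenly across them (an idealisation).
    """
    total_bits = pixels_per_side * pixels_per_side * bits_per_pixel
    aggregate_bits_per_s = streams * rate_mbps * megabit
    return 1e6 * total_bits / aggregate_bits_per_s


video_dram_us = filter_read_time_us(streams=8, rate_mbps=70)
frame_buffer_us = filter_read_time_us(streams=4, rate_mbps=100)
print(f"video DRAM (8 x 70 Mb/s):    {video_dram_us:.1f} us")
print(f"frame buffer (4 x 100 Mb/s): {frame_buffer_us:.1f} us")
```

Under these assumptions a single component of either type delivers a binary filter in roughly half a millisecond (about 446 \(\mu\)s and 625 \(\mu\)s respectively). Operating several components in parallel would reduce this time in proportion.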

Emerging high performance DRAM architectures include the Enhanced DRAM from Ramtron, and Cache DRAM from Mitsubishi Electric. Both of these architectures are based around the presence of a fast SRAM cache on the DRAM main memory chip, with a very wide bus connecting main and cache memories. Since the bus is entirely on the integrated circuit, it allows very high speed signalling, due to its very low RC time constant.

The two memories are quite distinct in their performance and operational characteristics. Present Enhanced DRAMs have burst mode cycle times of 15 ns on a sequence of 2048 bits, and interface signals are CMOS/TTL compatible. They operate asynchronously, just like conventional DRAMs, and are entirely suitable for direct substitution in memory systems designed for conventional DRAM. Present Cache DRAMs have 10 ns cycle times on sequences of 64 bits extracted from cache storage, and have CMOS/TTL compatible interfaces. They operate synchronously, unlike conventional DRAMs, and they require memory system designs that are very different from those for conventional DRAMs. The synchronisation protocol of the proprietary Cache DRAM is not compatible with that for the industry standard synchronous DRAM discussed in Section 2.

The sophistication of the caches of the Enhanced DRAM and Cache DRAM is beneficial for fast random access of the memory, but superfluous for the sequential access undertaken by SLM memory. Neither of these emergent DRAM technologies seems particularly suitable for use in SLM memories.

The Rambus DRAM is a system wide architecture developed and patented by Rambus. As well as specifying operational characteristics of the DRAM circuitry, it also defines the interface electrical characteristics, bus structure, communication protocol and control hierarchy. The technology has been licensed by Toshiba, Fujitsu and NEC. Rambus DRAM operates synchronously, but not according to the protocol of the industry standard synchronous DRAMs considered in Section 2.

Rambus DRAM sustains an impressive cycle time of 2 ns on sequences of 256 bytes. However, the benefits of the rapid serial data rate of Rambus are somewhat diminished by the fact that the Rambus bus is specified to be 9 bits wide, and there is no scope for increasing data throughput by using multiple buses. Further limiting the effective data throughput is the fact that each transfer of a block of 256 bytes is preceded by the transfer of header bytes. Furthermore, the bus is shared by data, addresses and control signals, causing a 'staccato' data flow. Consequently, the potential data throughput of Rambus DRAM systems is not as impressive as the serial data rate would superficially suggest is possible.

Interface levels in Rambus memory systems are low voltage swing (0.6 V) signals on a terminated bus, to accommodate the high speed signalling. Present Rambus DRAMs have 4.5 Mb storage capacities. The Rambus architecture has provision for several blocks of DRAM, but just one master may fit on the bus, and this master must control the Rambus system and act as the gateway to the remainder of the system. In the context of SLM memory, the SLM would be external to the Rambus system, so it will receive its data via a circuitous route, with the potential for a communication bottleneck as the Rambus master distributes its data to the SLM. The Rambus architecture is not well suited to the role of SLM memory, primarily because of the inflexibility in bus width.

RamLink hardware consists of DRAMs with special interfaces connected in a ring topology using LVDS links. There is an expectation among some experts that rings will eventually displace buses as the communication methodology in future computers. RamLink uses a packet based protocol in distributing data around the ring. Although RamLink components are not yet commercially available, the RamLink concept seems to have credibility, because of the support of the IEEE Computer Society for its accession to the status of universal standard.

The initial version of RamLink is expected to have a 2 ns cycle time for sequences (packets) of 64 8-bit bytes, with the packet delimited by a 6 byte header and 2 byte tail. DRAM rings are connected to a controller, which also acts as the gateway to the remainder of the system, which in the present context includes the SLM. Each DRAM in the ring introduces a delay in the data flow of at least one cycle time. Apart from the replacement of a bus topology by a ring topology, RamLink is similar to Rambus from a system viewpoint, and neither seems to be a propitious solution for SLM memory.
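The cost of the packet framing can be estimated from the figures above. The sketch below computes the fraction of each RamLink packet that carries data, and a corresponding payload rate. The payload rate assumes that one byte is transferred per 2 ns cycle, which is an assumption, because the link width is not stated here. The function names are illustrative.

```python
def packet_efficiency(payload_bytes=64, header_bytes=6, tail_bytes=2):
    """Fraction of transferred bytes that carry filter data."""
    return payload_bytes / (payload_bytes + header_bytes + tail_bytes)


def payload_rate_mbytes_per_s(cycle_time_ns=2.0, bytes_per_cycle=1,
                              efficiency=None):
    """Payload rate in units of 10**6 bytes per second.

    bytes_per_cycle = 1 is an assumption; the link width is not
    stated in the text.
    """
    if efficiency is None:
        efficiency = packet_efficiency()
    raw = bytes_per_cycle / (cycle_time_ns * 1e-9)  # bytes per second
    return raw * efficiency / 1e6


print(f"efficiency: {packet_efficiency():.3f}")
print(f"payload rate: {payload_rate_mbytes_per_s():.0f} x 10^6 bytes/s")
```

About 89% of each packet is payload, so under the stated assumption the framing alone reduces the usable rate from 500 to about 444 million bytes per second, before any ring delay or gateway bottleneck is considered.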
References


Glossary

BJT Bipolar Junction Transistor
The 'original' transistor structure, consisting of a layer of n-type semiconductor (the emitter), forming a junction with a thin lightly p-doped layer (the base), the other side of which forms a junction with another n-type layer (the collector). This describes an npn transistor; pnp transistors are also possible. BJTs have three distinct modes of operation: cut-off mode, in which negligible current flows through all three electrodes; active mode, in which the large charge flow from the emitter to the collector is controlled by the emitter-base voltage, but in which the base electrode current is relatively insignificant (BJTs in linear circuits operate in active mode); and saturation mode, in which significant current flows through all three electrodes, and the base region becomes saturated with minority carriers (electrons in npn transistors).

CCD Charge Coupled Device
A one dimensional array of devices formed by appropriate patterning of surface and buried electrodes on a semiconductor die. By applying certain voltages to the electrodes, an array of potential wells is formed, in which charge from photovoltaic detectors accumulates. On correct cycling of the electrode potentials, charge is transferred between adjacent wells. The analogue charge 'spilling' out of the end of the CCD line array may be sensed and converted into a digital representation of the photodetector signal. CCDs are loaded with data in parallel from the photodetectors, but the data only can be read out serially. Planar CCD arrays usually are fabricated as several line arrays stacked side by side.

CMOS Complementary Metal Oxide Semiconductor
An integrated circuit technology in which the only transistors are n-channel and p-channel enhancement MOSFETs, usually appearing in pairs. CMOS logic gates only consume power when switching state, have wider noise margins than nMOS, and a better tolerance to voltage and temperature variation than nMOS. These properties usually render CMOS the favoured integrated circuit technology.

CTT Centre-tap terminated
An adaptable, fast, low voltage swing interface standard. For unterminated lines, CTT has the same characteristics as standard CMOS interfaces, so it is downward compatible with low voltage CMOS and TTL levels. For terminated lines, the CTT interface automatically adjusts to 0.8 V low voltage swing operation.

DRAM Dynamic Random Access Memory
Semiconductor memory with an extremely small cell size. Usually, a bit storage cell is a capacitor that stores charge in one logic state, and a transistor switch that connects the cell to the periphery of the memory array. The charge on the capacitor leaks away over time, hence it must be periodically refreshed by specialised circuitry. Accessing DRAM is inherently slower than accessing SRAM. In the case of writing, the DRAM cell loads the line driver with its large cell capacitor, while the equivalent load in an SRAM cell is the small FET gate capacitance. In the case of reading, the DRAM cell capacitor must partially discharge with a long \( RC \) time constant, while the corresponding time constant for an SRAM cell is determined by the small FET capacitance components. Additionally, DRAMs must be periodically refreshed, and the cell capacitor charge restored after reading, neither of which is needed by SRAMs. These factors combine to give SRAM an intrinsic speed advantage over DRAM.

ECL Emitter Coupled Logic
Digital integrated circuit technology using BJTs connected as differential pairs to form logic gates. Transistors switch between cut-off and active modes in logic transitions. By avoiding saturation, ECL achieves the ultimate switching speed possible from its devices. ECL has niche applications in the highest speed, low density integrated circuit domain. ECL has lower density and higher power dissipation than TTL and especially MOS technologies. The very low output impedance of ECL gates gives them particularly good output drive.

FLC Ferroelectric Liquid Crystal
FLCs are formed when chiral liquid crystal molecules aggregate in the smectic C* phase. These materials are ferroelectric, that is, they have a finite spontaneous polarisation. They are also birefringent because of the long range orientational order of the long molecules. In equilibrium the polarisation vector tends to align along the external electric field direction (minimum potential energy state), and this determines the orientation of the molecules, which in turn determines the orientation of the optical axes of the material anisotropy, which in turn determines the optical path length for light propagating through the material, which determines the phase of the light ray at the output surface of the FLC layer. By this mechanism FLCs fulfil the role of the light modulating layer of SLMs. In practical SLMs, the only possible electric field direction changes are reversals, so there are usually only two accessible FLC states (hysteresis in state transitions prevents the FLC from entering other states without deliberate forcing), and FLC SLMs are inherently binary light modulators.

Gb gigabit
1 gigabit \( \equiv 2^{30} \) bits \( = 1073741824 \) bits.

GTL Gunning Transceiver Logic
A fast, low power, low voltage swing interface standard with a 0.8 V voltage swing. GTL drivers typically dissipate about 10 mW, which compares favourably with fast ECL drivers that consume about 125 mW. The exceptionally low power consumption of GTL makes it feasible to incorporate hundreds of GTL drivers and receivers on the one integrated circuit.

LVDS Low Voltage Differential Signalling
A fast, low power, low noise, low voltage swing interface standard, with a 0.25 V voltage swing. Signals propagate along a pair of closely coupled conductors, as in conventional transmission lines, and this provides enhanced noise immunity. The differential signal is defined by the potential difference between the two conductors, not a single conductor and a space and time variant 'ground' potential. Up to 1 V of common mode noise on the two conductors can be tolerated without degradation of the signal. LVDS requires a pair of pins for each interface port.

Mb megabit
1 megabit \( \equiv 2^{20} \) bits \( = 1048576 \) bits.

MB megabyte
1 megabyte \( \equiv 2^{20} \) bytes \( = 1048576 \) bytes.

MOSFET Metal Oxide Semiconductor Field Effect Transistor
A transistor in which the current flow between the source and the drain is influenced (i.e. switched) by the voltage applied to the interposed gate electrode. Varieties are n-channel, in which the current is composed of only electrons; and p-channel, in which the current is composed of only holes. MOSFETs may also be classified as enhancement type, where a gate voltage of opposite polarity to the current carriers must be applied to make the FET conductive; and depletion type, which are conductive until a gate voltage of the same polarity as the current carriers is applied.

nMOS n-channel Metal Oxide Semiconductor
An integrated circuit technology in which the only transistors are n-channel enhancement and depletion MOSFETs. nMOS logic gates persistently consume power when the output state is low. nMOS technology attained maturity before CMOS technology, but now it seems that nMOS is inexorably being replaced by CMOS.

SLM Spatial Light Modulator
A planar device whose output is a spatially coherent light wavefront. A specific spatial pattern of amplitude or phase variation is imposed on the extended wavefront by the SLM. In an optical correlator this pattern represents either the image (1st SLM) or the filter (2nd SLM). Optically addressed SLMs have a photosensitive layer superimposed on the light modulating layer, and the photosensitive layer is illuminated by an image, which is coupled through to the light modulating layer as the required pattern. They usually have continuous surface properties. In principle the degree of modulation is continuously variable, but in practice the modulating mechanism may be quasi-bistable, yielding effectively binary modulation. Addressing is parallel, since the pattern emerges at all locations in unison. Electrically addressed SLMs have electronic circuitry in intimate contact with the light modulating layer. This structure is achieved by either depositing the modulating layer on a monolithic integrated circuit, or depositing thin film devices and circuitry on the modulating layer. The electronic circuit essentially is an array of memory cells that each impose a localised voltage on the light modulating layer. Spatial variation of the applied voltages defines the pattern. The surface state is necessarily pixelated, according to the size and separation of the memory cells. Addressing is partially serial, as not all memory cells can be accessed at the same time. Usually each memory cell has a one bit capacity, giving intrinsically binary modulation properties.

SRAM Static Random Access Memory
Semiconductor memory in which each cell is a bistable circuit, such as two inverters connected in a ring (known as a flipflop). In nMOS SRAMs, bit storage cells have device counts of at least six: four transistors for the flipflop, and two transistors for the access switches to the complementary outputs of the flipflop. SRAM cells are typically four times larger than DRAM cells, which accounts for the fourfold density advantage that DRAM has over SRAM. The logic state in the SRAM cell is retained permanently while the power is on; there is no need for refreshing. Thus, SRAMs are always available for accessing, unlike DRAMs, which cannot be accessed during the periodic refresh cycles. In nMOS SRAMs, there is always one inverter in the flipflop that is conducting (the one with high input and low output). Accordingly, all SRAM cells consume significant power at all times, unlike DRAM cells, which only dissipate power as a result of the slow leakage of charge from their capacitor.

TSOP Thin Small Outline Package
A type of package for integrated circuits, which is conducive to high speed operation. The small size of TSOPs allows short connecting leads between package pins and circuit pads, which results in small pin inductances, which in turn reduces the propensity for voltage spikes and ringing on fast signal transitions.

TTL Transistor-Transistor Logic
Digital integrated circuit technology using BJTs as active devices. Transistors switch between cut-off and saturation modes in logic transitions. The considerable base charging time in the transition to saturation mode is the limiting factor in switching speed. TTL voltage levels for logic states have become a common interface voltage standard for digital systems, even when the component integrated circuits use a technology other than TTL.