ANNUAL REPORT
Joint Services Electronics Program
DAAG29-84-K-0024
January 1, 1986 - December 31, 1986

TWO-DIMENSIONAL SIGNAL PROCESSING AND
STORAGE AND THEORY AND APPLICATIONS
OF ELECTROMAGNETIC MEASUREMENTS

DISTRIBUTION STATEMENT A
Approved for public release
Distribution Unlimited

JANUARY 1987

GEORGIA INSTITUTE OF TECHNOLOGY
A UNIT OF THE UNIVERSITY SYSTEM OF GEORGIA
SCHOOL OF ELECTRICAL ENGINEERING
ATLANTA, GEORGIA 30332
This is an annual report on research conducted under the auspices of the Joint Services Electronics Program. Specific topics covered are: digital signal processing, parallel processing architectures, two-dimensional optical storage and processing, hybrid optical/digital signal processing, electromagnetic measurements in the time domain, and automatic radiation measurements for near-field and far-field transformations.
ANNUAL REPORT

Joint Services Electronics Program
Contract DAAG29-84-K-0024
January 1, 1986 - December 31, 1986

TWO-DIMENSIONAL SIGNAL PROCESSING AND STORAGE
AND
THEORY AND APPLICATIONS OF ELECTROMAGNETIC
MEASUREMENTS

January, 1987
School of Electrical Engineering
Georgia Institute of Technology
Atlanta, Georgia 30332

Approved for public release.
Distribution unlimited.
# TABLE OF CONTENTS

| I. OVERVIEW .................................................. | 1 |
| II. SIGNIFICANT RESEARCH ACCOMPLISHMENTS ....................... | 1 |
| III. WORK UNITS: TWO-DIMENSIONAL SIGNAL PROCESSING AND STORAGE | |
| Number 1: Multidimensional Digital Signal Processing R. W. Schafer .................. | 3 |
| Number 2: Multiprocessor Architectures for Digital Signal Processing T. P. Barnwell, III .................. | 9 |
| Number 3: Two-Dimensional Optical Storage and Processing T. K. Gaylord .................. | 23 |
| Number 4: Two-Dimensional Optical/Electronic Signal Processing W. T. Rhodes .................. | 28 |
| IV. WORK UNITS: THEORY AND APPLICATION OF ELECTROMAGNETIC MEASUREMENTS | |
| Number 6: Electromagnetic Measurements in the Time- and Frequency-Domains G. S. Smith .................. | 37 |
| Number 7: Automated Radiation Measurements for Near- and Far-Field Transformations E. B. Joy .................. | 40 |
I. OVERVIEW

This annual report covers the second year of research carried out under Contract DAAG29-84-K-0024. The research is part of the Joint Services Electronics Program and is administered by the U. S. Army Research Office. Research activities are concentrated in two areas: (1) Two-Dimensional Signal Processing and Storage, and (2) Theory and Application of Electromagnetic Measurements.

The research in two-dimensional signal processing and storage is concentrated in five major areas. These areas overlap and the research activities interact and reinforce one another. Research in Work Unit Number 1, *Multidimensional Digital Signal Processing*, is concerned with the theory, design, and implementation of digital signal representations and digital signal processing algorithms and systems. Work Unit Number 2, *Multiprocessor Architectures for Digital Signal Processing*, focuses upon hardware and software problems in the use of multiport memories and multiple processors for high-speed implementations of digital signal processing algorithms. The research in Work Unit Number 3, *Two-Dimensional Optical Storage and Processing*, is concerned with problems of using holographic information storage as the basis for multiport memories and for optical computation. Work Unit Number 4, *Two-Dimensional Optical/Electronic Signal Processing*, is concerned with the theory, implementation, and application of combined optical and electronic digital signal processing techniques. Work Unit Number 5 is directed toward problems in the design of VLSI implementations of digital signal processing systems.

The other two work units in the JSEP program are concerned with electromagnetic measurements. In Work Unit Number 6, *Electromagnetic Measurements in the Time- and Frequency-Domains*, research is concerned with both theoretical and experimental investigations of the use of time-domain and frequency-domain methods for measuring the characteristics of materials and electromagnetic systems. Work Unit Number 7, *Automated Measurements for Near- and Far-Field Transformations*, is concerned with assessing the accuracy of computed fields on the surface of lossy radomes and with compensating for probe effects when near-field measurements are made on spherical and arbitrary surfaces.

The report begins with the Lab Director's summary of the accomplishments during the period January 1, 1986 to December 31, 1986. Following this are brief reports on the individual work units. These reports list personnel supported and discuss in general terms the research that was carried out during the reporting period. Also included in each work unit report is a complete list of publications on the research during this period. Complete copies of these publications are available in the Annual Report Appendix.

II. SIGNIFICANT RESEARCH ACCOMPLISHMENTS

The following accomplishments are, in the judgement of the laboratory directors, of particular significance and potential and are therefore worthy of special mention.

2.1 Experimental Demonstration of Parallel Logic Operations

A major achievement of the continuing research in Work Unit Number 3 was the experimental demonstration of both parallel EXCLUSIVE OR and parallel NAND logic operations to achieve content addressability and thus digital parallel truth-table look-up processing. This work was published in the invited paper in *Optical Engineering*. 
This result provides practical experimental demonstrations of optical digital parallel processing using array logic. It is highly significant since it demonstrates that large scale parallel digital processors can be implemented optically. Potential applications include remote sensing, air traffic control, synthetic aperture radar imaging, missile guidance, and adaptive antenna array beamforming.

2.2 A Unified Theory of Translation-Invariant Image Processing Systems

A major achievement of the research in Work Unit Number 1 is the development of a new theory for representing images and image processing systems. Maragos, in his Ph.D. thesis, has developed a new theory of translation-invariant systems in which both signals and systems are fundamentally represented by sets rather than by functions. This leads to a theory of signals and systems in which geometric structure is prominent. The theory has already been applied to gain new insight into the properties and implementation of many common image transformations and it potentially can serve as the basis for the design and implementation of new image transformations specifically directed toward enhancing, detecting, and coding of geometric structure in images.

2.3 Monostatic Near-Field Radar Cross-section Measurement

A monostatic near-field radar cross-section measurement of a simple target (a square flat plate) was performed on a planar surface near-field measurement system. The far-field monostatic radar cross-section was correctly determined from the near-field measurements. This demonstration was an important step in the process of developing the monostatic near-field radar cross-section technique. Should this technique become fully developed, the monostatic scattering of large (full scale) targets could be accurately measured. The near-field, intermediate field and far-field scattering properties are determined in one measurement set. Radar anomalies such as glint can be accurately predicted from these measurements.
WORK UNIT 1

TITLE:
Multidimensional Digital Signal Processing

SENIOR PRINCIPAL INVESTIGATOR:
R. W. Schafer, Regents' Professor

SCIENTIFIC PERSONNEL:
R. M. Mersereau, Professor
M. H. Hayes, Associate Professor
C. Au Yeung, Graduate Research Assistant (Ph.D. Candidate)
J. E. Bevington, Graduate Research Assistant (Ph.D. Candidate)
D. Y. Suh, Graduate Research Assistant (Ph.D. Candidate)
L. Hertz, Graduate Research Assistant (Ph.D. Candidate)
C. H. Richardson, Graduate Research Assistant (Ph.D. Candidate)

SCIENTIFIC OBJECTIVE:
The long term scientific objective of this research is to understand the means by which multidimensional signals such as images should be modelled and represented to facilitate the encoding, enhancement and automatic extraction of information from such signals, and to develop, analyze and extend computer algorithms for these purposes.

RESEARCH ACCOMPLISHMENTS:

A. Image Segmentation by Texture (Bevington, Mersereau)

This work has evolved from its initial exploration of the use of linear prediction coefficients as features for texture characterization to the development of a program for general purpose image segmentation. A split/merge algorithm has been developed which makes use of the maximum likelihood edge detector developed earlier along with simpler boundary detectors. This work has led to a statistical characterization of a number of edge detectors and an analysis of their performance on a variety of different edge types - points, grey level discontinuities, texture boundaries, cracks, ridges, and valleys. This study is continuing. The image segmentation program is being applied to the problem of low bit-rate image coding.

B. Constrained Signal Estimation

This project represents a continuation of a long standing project on iterative procedures for deconvolution of blurred images. This particular phase of the research is concerned with the development of techniques for iterative deconvolution using a maximum entropy criterion. Theoretical extensions from the power spectrum estimation problem have been developed which allow the results of that problem to be applied to deblurring. Four procedures for finding the maximum entropy solution have been developed and have been applied to the removal of lowpass blurs and also to the problem of reconstructing a multidimensional signal from its projections. Conditions on the existence of a feasible solution have been proven and the theory is currently being advanced to make the procedure
more robust in the presence of measurement errors. Current efforts are also being directed toward the issue of reducing the complexity of the resulting algorithms.

C. Low Bit-Rate Encoding of Binary Video Signals

Research has begun on a project to efficiently encode two-level (binary) image sequences. Currently this work has not gone beyond the stage of recreating previously available results but it is expected that this project will make extensive use of ideas from mathematical morphology which were developed earlier.

D. Quadratically Convergent Deconvolution Algorithms

Iterative algorithms based on the method of successive approximations have become very popular for signal deconvolution due to the flexibility that they allow for the incorporation of signal constraints into the restoration. One of the limitations of these iterative algorithms is that they only achieve a linear rate of convergence. With a slight modification, however, it was shown that it is possible to achieve a quadratically convergent deconvolution algorithm. In effect, this modification corresponds to a reinitialization of the deconvolution algorithm with a new observation equation at each iteration. It was shown that the corresponding distortion operator \( h \) converges quadratically to an impulse and, as a result, the restoration \( x \) converges quadratically to \( x \). Therefore, when the standard iteration requires \( k \) iterations to achieve a given reduction in the mean square error, the modified iteration will require only \( \log_2(k) \) iterations.

E. Signal Modeling and Power Spectrum Estimation

Power spectrum estimation is a special form of signal reconstruction problem where one wants to determine the autocorrelation function or its Fourier transform from a finite time observation of a time series or from a noisy and truncated autocorrelation sequence. We have considered two problems related to signal modeling and the application of these models to spectrum estimation. The first problem is concerned with the modeling of a signal as the sum of sinusoids in white noise where the sinusoidal frequencies are varying as a function of time. Typically, with such a model an adaptive version of the Pisarenko harmonic decomposition would be applied to the data to extract the model parameters. By exploiting some properties of the Levinson/Durbin recursion, however, a new algorithm was developed for estimating the model parameters. In particular, an iterative procedure for finding the white noise power was proposed which uses a bisection search algorithm which terminates when the set of reflection coefficients corresponding a set of correlation values satisfy a specific set of conditions. Once the white noise power has been determined, the sinusoid frequencies are extracted by rooting the polynomial associated with the reflection coefficients. For nonstationary data, this algorithm was made into an adaptive algorithm where both the white noise power and the minimum eigenvector (eigenfilter) are recursively updated in time.

The second modeling problem considered was concerned with developing an autoregressive moving average lattice filter model for a linear time-varying system. As a result of this work a new ARMA lattice filter structure was developed which is consistent with the characteristics of the well-known autoregressive and moving average lattice filters. In particular, this ARMA lattice is realized in terms of a fully orthogonal lattice set of basis vectors and it evaluates all optimal lattice ARMA filters of lower order. In addition the ARMA lattice structure, a fast recursive least squares algorithm for the evaluation of the lattice filter coefficients was developed [18].
F. Theory and Application of Mathematical Morphology

A major accomplishment of our research during the present contract is the development of a general theory of translation-invariant systems. This general theory significantly extends the theory of mathematical morphology and it unifies a wide range of commonly used image processing operations under a common framework. Specifically, we showed that any translation-invariant, increasing, system can be represented as a set union of erosions by all structuring elements belonging to a characteristic set of elements called the kernel of the system. Although the kernel of a system may generally contain an infinite number of elements, Maragos showed that a basis or smaller sufficient set of elements exists for a broad and interesting class of translation-invariant, increasing, systems. It was also shown that this theory of minimal elements applies to morphological filters, median filters, order-statistics filters, edge detectors, shape recognition transformations, and an interesting class of linear shift-invariant systems. The theory has already lead to new insights into the properties of such systems and also to new approaches to the implementation of such systems.

The general theory has potential to be applied in the design and synthesis of image transformations. Since any translation-invariant increasing system can be represented as a union of erosions by its kernel elements, new systems can be defined by specifying the elements of the kernel basis. It is not known how to choose the kernel basis so as to create a system with prescribed desired properties. In order to learn how to do this, we have begun to study some of the systems that we already know have a finite kernel basis, such as morphological opening and closing, median filters, order-statistics filters, and certain shape recognition transformations.

With this knowledge as a base, it may be possible to determine techniques for designing nonlinear image processing systems to perform specific processing functions such as noise removal, edge detection, and shape detection.

So far, our most impressive results have been for two-level (binary) images. However, most images are multi-level (greytone) images. Greytone images can be represented by a collection of sets which specify image pixels that exceed different thresholds. Indeed, if enough threshold sets are used, the greytone image can be represented with high accuracy. In many cases, it is desirable to obtain thresholded versions of an image with only a few different thresholds. An example is in representing images of integrated circuits or printed circuit boards where the image consists of regularly shaped geometric structures of only a few different types of material. Obtaining thresholded images that retain information on the basic geometric structures is a nontrivial task due to uneven lighting and focussing problems.

A new approach to multi-level thresholding has been developed. In this method, the threshold is adapted so as to insure that edges in the thresholded image are aligned with the edges of the original greytone image. In addition, thresholds are adapted spatially to account for non-uniform lighting, shadows, etc.

Research is now being directed at combining of contextual and other symbolic knowledge with the morphological transformations to obtain automatic algorithms for a variety of image analysis problems.
PUBLICATIONS:

Books or Chapters in Books


Journal Articles (published or accepted)


2. E. Karlsson and M.H. Hayes, "ARMA modeling of linear time-varying systems: Lattice filter structures and fast RLS algorithms", Accepted for publication in *IEEE Trans. on Acoust., Speech, and Sig. Proc.*


4. M.H. Hayes, "Inverse problems: An overview", to appear in *J. of Soc. of Inst. and Control Engineers, JAPAN (invited).*


Papers in Conference Proceedings


**Papers Submitted (for Journal Articles or Conference Proceedings)**


Null
WORK UNIT NUMBER 2

TITLE:
Multiprocessor Architectures for Digital Signal Processing

SENIOR PRINCIPAL INVESTIGATOR
T. P. Barnwell, III, Professor

SCIENTIFIC PERSONNEL
C. J. M. Hodges, Research Engineer
D. L. Smith, Research Engineer
S. H. Lee, (Ph.D. candidate)
V. Owei (Ph.D. candidate)

SCIENTIFIC OBJECTIVES

The primary objective of this proposed work unit is to develop systematic techniques for the automatic generation of provably optimal multiprocessor implementations for a broad class of digital signal processing (DSP) algorithms and for a broad class of multiprocessors systems. Stated in another fashion, the goal of this research is to develop DSP “compilers,” where the input is an algorithm specification and the output is a complete, optimal multiprocessor implementation.

A basic philosophy of this research has always been to perform the theoretical developments in the context of an actual multiprocessor system. The first system used for this purpose, an eight processor LSI-11/2 system [7-8], is now obsolete. For the future, the proposed research will be tested using a personal computer (PC) based multiprocessor utilizing AT&T DSP-32 processors.

BACKGROUND:

Over the past decade, the rapid development of digital integrated circuit techniques has caused increased interest in the area of parallel architectures for digital signal processing. Nowhere is this more evident than in multidimensional DSP algorithms, where the computational dimensions of the algorithms make serial processing unreasonable for virtually all real-time applications.

In any DSP oriented implementation, there will inevitably be a mix of both intrinsically serial and intrinsically parallel tasks which must be realized. Digital signal processing algorithms, as a group, are unique in that they are typically both highly computationally intense and also have a high degree of internal structure. For the purposes of this research, a DSP implementation is considered to be a realization which is dominated by its signal processing functions, and in which its non-signal-processing tasks (control, user interface, decision making, etc.) play a relatively minor role.

A. Algorithm Descriptions

For most of the past research from this work unit, the DSP algorithms have been specified as *shift-invariant flow graphs* (SIFG) [10]. A SIFG is defined as a directed...
graph in which all operations are specified at the nodes, and in which the branches are directed paths which specify the flow of data between nodes. The shift-invariant constraint requires that shifting the (set of) input sequences results only in a corresponding shift in the (set of) output sequences.

SIFG’s are capable of representing a very large class of interesting DSP algorithms. Of course, they can easily represent all those systems which are representable by signal flow graphs. Hence, they can represent a large class of linear and non-linear as well as time-varying systems. In particular, they are capable of representing digital filters, DFT structures, FFT structures (indeed all fast discrete transform algorithms), correlation, convolution, homomorphic analysis, matched filtering, linear predictive analysis, LMS adaptive filter structures, direct form recursive least squares, and lattice form least squares, to name just a few. In addition, SIFG can also represent many algorithms which are not typically DSP algorithms, including a very large class of matrix operations as well as algorithms specified in terms of low level logic operations (e.g. a digital multiplier structure). In brief, SIFG’s can represent all algorithms which do not include any data dependent branch operations. Clearly, the vast majority of DSP algorithms, as well as other equivalent algorithms, belong to this class.

A fully specified flow graph (FSFG) is a SIFG in which the nodal operations are additionally constrained to be the atomic (kernel) operations of the underlying constituent processors which are to be used in the realization. Thus the atomic operations represent the smallest granularity at which possible parallelism may be exploited. The name generic flow graph is used to distinguish between those SIFG’s that are also FSFG’s and those SIFG’s whose nodes are not all atomic operations.

B. Fully Specified Flow Graph Bounds

The distinguishing characteristic of the previous research is the underlying use of bounds on the optimal performance of an algorithm in the generation of the multiprocessor implementations. These bounds are characteristics of the graph defining the algorithm, not of the mechanism or type of realization. In all, three different bounds are used: the iteration period bound (previously called the sample period bound [10-12]), which is the minimum possible time between iterations of the algorithm; the delay bound, which is the minimum time between the availability of an input (set) and the availability of the corresponding output (set); and the processor bound, which is the minimum number of processors required to achieve the specified iteration period. Implementations which achieve the iteration period bound are said to be rate-optimal. Implementations which achieve the delay bound are called delay-optimal. Implementations which use the minimum possible number of processors for a specified iteration period are said to be processor-optimal.

The concept of bounds on the multiprocessor realization of flow graphs was first introduced in the context of signal flow graphs by Fettweis [13] and later extended by Renfors and Neuvo [14] and Schwartz and Barnwell [10-12]. The basic assumptions are that the algorithm to be implemented is represented as a FSFG, and that the algorithm is to be implemented on a synchronous multiprocessor. It is also assumed that the computation times for all the nodal operations are known. This condition is easily met in a homogeneous multiprocessor in which all the processing elements are the same. Nothing is assumed about the communications structure, I/O constraints, or the details of the realization. Hence, the resulting bounds reflect the absolute limits on computation rate and delay based on the atomic structure of the constituent processors.
Of course, many things besides the structure of the algorithm and the fundamental operational capability of the processors may limit DSP implementations. Clearly, things such as I/O bandwidths, external resource availability, the number of available processors, and the communications architecture may impact the achievable rate, delay, and processor efficiency of an algorithm. But in their own way, each of these aspects can be addressed and corrected. For a particular multiprocessor system and a particular FSFG, the above bounds are fundamental. Hence, if total implementations can be developed which achieve these bounds, then it is clear that no other implementations exist which can operate at a higher rate, with less delay, or with higher efficiency.

C. Systolic Processors

The area of systolic processors is currently attracting the attention of many researchers in parallel processing for scientific and signal processing tasks. The term systolic is assumed to mean a regular iterated array of computational elements with strict nearest neighbor communications. There is a global clock, and on every clock cycle each processor inputs data, operates on the data and outputs data at the end of the cycle. No information can flow further than one processor in one system clock cycle. Systolic processors are a special case of a static pipeline. Until recently the area was dominated by systolic solutions to specific problems, presented without verification or derivation. However, a number of systematic and rigorous approaches to systolic implementations have now appeared, including one from this research program [10-11].

Systolic processors are attractive because they appear to be well matched to many of the constraints implied by VLSI implementations, and more importantly because solutions are relatively easy to find. The problem is that systolic implementations are often over-constrained (global clock tick, static pipeline configuration, etc.), resulting in solutions which have low processor utilization and which cannot be rate-optimal or delay-optimal. Much of the research already performed as part of this work unit can be viewed as searching for implementations which overcome the fundamental limitations of systolic processors.

D. SSIMD Implementations

Historically, Skewed Single Instruction Multiple Data (SSIMD) implementations were the first class of solutions which could overcome the systolic constraints and consistently achieve rate-optimal, processor-optimal implementations with nearest neighbor communications for a large class of interesting algorithms [1,15-19]. In SSIMD, exactly the same program is executed on each of the processors in the multiprocessor, and that program realizes exactly one iteration of the flow graph. In an SSIMD program, all of the arithmetic operations appear as explicit instructions, but the delay nodes are transformed into input-output pairs. In this way, the delay structure in the flow graph becomes the communications structure in the SSIMD realization [1,15-19].

For any given program and any given constituent processor, it is possible to compute a sampling period bound for the SSIMD realization [1]. This SSIMD bound for programs is equivalent to the iteration period bound for fully specified flow graphs. Hence, if a program can be generated such that the SSIMD bound is equal to the iteration period bound, then the SSIMD realization is rate-optimal.

The SSIMD approach to flow graph realizations is very attractive for many reasons. First, for all SSIMD realizations in which the number of processors is less than the processor bound, the implementations are perfectly efficient and the use of N processors
always increases the throughput by a factor of exactly N. Second, when the SSIMD iteration period bound is equal to the iteration period bound, as is the case for many recursive digital filter structures, then there exists no multiprocessor solution using the same constituent processor which is faster or more efficient. Third, although the sample period bound concept is not involved, SSIMD realizations work equally well for non-recursive structures. A significant and unique aspect of SSIMD solutions is that the program for a one processor solution is identical to the program for a P processor solution. The only difference is in the interprocessor communications connections. Finally, and most importantly, the all-important communications architecture for the final implementation is completely specified by the delay node structure of the flow graph. In particular, by constraining all the delay nodes to be first order, all single-time-index (one-dimensional) SSIMD solutions can be realized with a nearest-neighbor unidirectional ring. A similar result applies to two-dimensional flow graphs, [1,15-19]. However, if a more complex communications mechanism is available, then the flow graph can be defined to take advantage of it [1].

At the beginning of the current proposal period (1984), an SSIMD compiler had been demonstrated [5] which generates full multiprocessor implementations for the laboratory multiprocessor [7-8]. This compiler finds a rate-optimal SSIMD implementation, if it exists, and the best SSIMD implementation, if it does not. SSIMD implementations are always processor-optimal and communications-optimal (if they contain only first order delays). Two important points should be made concerning this SSIMD compiler. First, its application is by no means limited to the laboratory multiprocessor around which it was developed, and it can quite easily be used in top-down design systems using microprocessors, signal processing chips, or VLSI realizations. Second, and more important, is the result that if a rate-optimal SSIMD solution exists, it is very easy to find. The information available from the computation of the flow graph bounds defines so precisely the character of a rate-optimal solution that it is very simple to test whether an optimal SSIMD solution exists and to find it if it does. In contrast, finding the best sub-optimal solution is much more computationally intense. Hence we have the paradox that the most desirable optimal solutions are the easiest to find, but they may not always exist.

E. Hardware Support

From the very beginning, this work unit has maintained a philosophy of combining the theoretical developments with actual multiprocessor hardware for validation and innovation. Up until recently, the hardware system used was the laboratory multiprocessor system based on ten DEC LSI-11/2 microprocessors [7-8]. This system has proved very useful in the past, but is now really obsolete.

Based primarily on research results from this work unit, DARPA, in 1985, awarded Georgia Tech a two year project to build a small scale prototype of a DSP supercomputer. The system is called the Optimal Synchronous Cyclo-static Array, or OSCAR, and a sixteen processor prototype system was scheduled for completion in August, 1987. The OSCAR system was being built under the direction of C. J. M. Hodges, the same person who built the previous laboratory microprocessor system. Unfortunately, due to internal funding problems at DARPA, this project has been discontinued, and the OSCAR will not be completed.

Another component of the hardware environment which impacts this work unit is the development of DSP microprocessors like the TI TMS320 family, and the AT&T DSP family. At the current time, the Digital Signal Processing Laboratory at Georgia Tech is making heavy use of the TI TMS32010 processor using a board which plugs into an IBM PC [20]. Since
multiple boards can be plugged into the same system, fairly powerful multiprocessors can be configured, and these have been used for some applications [21] and in a graduate level course in multiprocessing. Although the existing boards do not have the synchronous features necessary to support the output of the multiprocessor compilers, they can be easily modified to include these features. In addition, the design and construction of new boards in this class is a relatively simple task. Because of the simplicity of constructing multiprocessors in this way, and because of the extremely good cost/performance of these machines, such small scale multiprocessors must become increasingly important in the future.

RESEARCH ACCOMPLISHMENTS:

Despite the catastrophic impact of the cancellation of the OSCAR project by DARPA, the research results from this work unit during the year have been quite good.

Work on the area of multiprocessor architectures for DSP since the last major proposal has centered in six areas: the generation of cyclo-static implementations for fully specified flow graphs; the generation of systolic implementations for fully specified flow graphs; efficient SSIMD compilation procedures; the generation of fully specified flow graphs from serial specifications; the use of blocked filters for increased multiprocessor throughput; and analysis/reconstruction filter bank theory. These topics are largely included in three theses [4,22,23], two of which are complete [4,22] and one of which is near completion [23].

A.1 Cyclo-static Implementations

By far the most important research result of this period was the creation of a cyclo-static compiler for fully specified flow graphs [3-4]. This work is part of the thesis research of David Schwartz [4].

The development of the cyclo-static compiler really grew out of an attempt to understand why SSIMD and PSSIMD implementations could be rate-optimal, processor-optimal, and communications-optimal while the systolic implementations for the same algorithms could not. There are really two separate reasons for the shortcomings of systolic arrays. The first is the fact that systolic processors are static pipelines. This means that any particular operation in an algorithm is assigned to a particular processor in the systolic array, and that operation is performed by that processor on every iteration. Hence, the operations are static and only the data moves through the multiprocessor. In contrast, SSIMD, PSSIMD, and cyclo-static processors are dynamic pipelines in which both the operations and the data move through the multiprocessor. The second reason is the effect of the global transfer clock. Indeed, this global transfer clock was the basic characteristic for which systolic arrays were named, giving the whole system its "pumping" action. There is no fundamental requirement that all the pipeline registers in the system be clocked simultaneously. There is also no reason that each processor must perform I/O on every clock cycle. In contrast, the input-output operations in SSIMD, PSSIMD, and cyclo-static implementations move in parallel, non-overlapping wavefronts (with a periodic pattern to the spacing of successive wavefronts).
Cyclo-static implementations are a new class of multiprocessor solutions which can be used for the optimal realization of iterative or recursive algorithms on synchronous multiprocessors. A processor which realizes cyclo-static implementations can be considered to be a generalization of systolic processors, wavefront array processors and SSIMD processors. Cyclo-static realizations differ in that they are provably optimal with respect to multiple criteria.

Cyclo-static implementations are a broad class of deterministically scheduled synchronous MIMD schedules which include systolic and SSIMD implementations as special cases. As previously noted, the primary feature of a cyclo-static implementation which distinguishes it from a systolic implementation is that, viewed from the reference point of a single iteration of the algorithm, a systolic implementation is static whereas a cyclo-static implementation is dynamic. In overly simplistic terms, if a systolic implementation can be considered to be an array of processors in which the instructions are fixed in space and the data travels through space, then a cyclo-static implementation is one in which both the instructions and data travel through space. This extra degree of freedom in a cyclo-static implementation is very important in generating optimal realizations. As its name implies, if viewed from points in space-time separated by an appropriate period, cyclo-static implementations can be considered static.

Cyclo-static solutions can be effectively found that achieve a subset of the following optimality criteria: rate optimal (maximally parallel, minimum iteration period), processor optimal (maximum processor efficiency), delay optimal (minimum throughput delay) and adjacent communications optimal (adjacent processor communications only). The procedure for finding solutions is a combinatorial optimization method which is efficient for typical realizations that are rate optimal. This problem was previously considered computationally intractable. In particular, previous researchers who considered optimal deterministic scheduling of flow graphs took the approach of transforming the original directed cyclic (containing loops) flow graph to a directed acyclic graph (loop free). Unfortunately the optimal solution to the acyclic graph can only fully exploit the parallelism of the original graph for a few special cases. The original aspects of this work are the direct utilization of the original cyclical graph, the concept of applying graph bounds as part of the multiprocessor compilations procedures and the introduction of a new approach to periodic scheduling.

A.2. The Generation of Systolic Implementations from FSFGs

Although systolic algorithms are not the primary topic of this research, many of the same formalisms and techniques which can be used to automatically generate cyclo-static implementations may also be used to generate systolic implementations. One of the important outputs of this research was a general method for the transformation of algorithms describable by SIFGs into equivalent systolic realizations [10-11]. Over the past several years, there has been considerable interest in systolic implementations, and a fairly large number of systolic algorithms have been published. For the most part, these algorithms have not been derived using any formal method, but rather each has been developed and presented separately, sometimes without extensive verification. The primary virtue of the method developed in this research is that it represents a rigorous and systematic procedure which, if correctly applied, always results in an error-free solution. The research showed that many of the previously published algorithms and many new algorithms can be generated by this single set of procedures.

The class of algorithms addressed by the systolic transformation procedures are all those procedures which can be described by shift-invariant flow graphs. As noted
previously, this is a very broad class of algorithms which includes not only all recursive and non-recursive linear shift invariant algorithms describable by signal flow graphs, but also matrix algorithms, systems involving decimation and interpolation, bi-linear systems, and many more.

The basis for the systolic transformation method are all graph theoretic techniques. The method is generally applied in three steps. First, an interleaving (up-sampling) transformation is applied if necessary. The transformation is necessary in most recursive systems to supply the delay elements which constrain data/information flow to a maximum distance of one adjacent processor per global clock tick (non-broadcasting). Second, a "static-pipeline iteration period bound" is computed for the transformed graph. This is equivalent to the iteration period bound with the further constraint that the implementation must be static. Finally, using a knowledge of the static-pipeline iteration period bound, a set of delay transformations are performed using either a cut-set or a single source maximum (delay) path spanning tree approach. The final resulting graph is directly realizable as a pipelined/systolic realization. Very similar results have just been reported by Jover [36].

Many well known systolic algorithms have odd numbered processors operating on odd clock cycles and even numbered processors operating on even clock cycles. For these algorithms it is possible to interleave two independent data streams and process both simultaneously. Using these techniques, it is easy to show that these properties are a consequence of the two-way interleave transformation required by step one for most recursive and minimum delay solutions.

The data interleaving is a consequence of the communications constraints. For many systems there is only one data stream to be processed. For these systems, the efficiency of systolic solutions is bounded above by the reciprocal of the interleaving factor. This inefficiency can be overcome by the new formal technique for merging operations resulting in a near systolic realization that is referred to as merged systolic [4].

Using these graph theoretic techniques, it is very simple to derive systolic solutions. Examples of FIR filters, IIR filters of direct form I and II, lattice filters, matrix-vector multipliers, and lower triangular systems solver have been presented [11]. The matrix-vector multiply is an extension of an FIR filter and the triangular system solver is an extension of an IIR filter.

A.3. Efficient SSIMD Compilation Procedures

As previously noted in the background section, one of the primary concerns at the beginning of this research period was the relatively slow computation times when the SSIMD compiler was unable to find rate optimal implementations. The basic reason for this effect was that when optimal implementations existed, the optimality constraints which could be applied generally resulted in an efficient compilation. The problem was that after the compiler proved that no optimal solution existed, there were no new constraints which could be applied while searching for a sub-optimal solution.

As part of the Ph.D. research of Sae Hun Lee, a new SSIMD compiler has been implemented which can efficiently compile both optimal and sub-optimal SSIMD implementations [6]. Like the previous compiler, the new SSIMD compiler takes FSFGs as input, and gives the best possible (fastest) SSIMD implementations as output. However unlike the previous compiler, the new compiler uses linked-list structures to represent and manipulate the structure of the FSFGs, rather than matrix structures.
The new compiler operates by ranking the loops of a recursive graph in order of their criticality, with the most critical loops first. The loops are then sequentially included in the partial solution using a restartable topological sorting technique. At each stage, the partial solution is tested for rate-optimality. If at any stage the procedure leads to a partial solution which is sub-optimal, then this constitutes a proof that no optimal solution exists. At this point, the compiler backs up and finds the lowest iteration period which is consistent with the existing partial solution. This new iteration period is used as the basis for a new optimality definition, with a corresponding set of new 'pseudo' critical loops, and the search continues. The dynamically defined pseudo critical loops and SSIMD iteration period bounds provide the constraints necessary to limit the search space, and reduce the complexity of the compiler.

In comparison tests between the new and the old compiler, the new compiler was found to operate at about twice the rate of the old compiler on problems which have a rate-optimal SSIMD solution, and about twenty times as fast on problems which do not. Of course, these numbers are only examples because the running times are highly problem dependent.

A.4. Multiprocessor Compilation from Serial Specifications

Another set of research results, largely included in the Ph.D. thesis of work Sae Hun Lee, is the presentation of a set of techniques for the generation of optimal multiprocessor realizations from non-parallel (serial) algorithm representations. These techniques, when they are fully implemented, will form the basis of a system which takes as input a serial presentation of the algorithm to be implemented, and gives as output an optimal SSIMD, static, or balanced PSSIMD solution [6,23]. If no optimal implementation of any of these classes exists, then the full cyclo-static compiler described above can be used to find an optimal solution.

The entire compilation procedure can be divided into four separate parts: the generation of a generic flow graph (GFG) from a serial specification; the generation of a fully specified flow graph from a GFG; the generation of the simplest multiprocessor schedule which attains rate-optimality; and the generation/specification of the communications solution. Although these four separate parts were developed to be combined into a single compiler, each is of interest in its own right.

The first part of this work deals with the problem of finding the best GFG which corresponds to a particular serially specified algorithm. The serial representation is in the form of a set of FORTRAN arithmetic statements and DO statements. The macro nodes in the GFG are in the form of the combination of any number of the same operation. The best GFG is defined as the one with the smallest number of nodes. The basic procedure is to identify the infinite loop which defines the algorithm, and to "unwind" all of the inside DO loops to form the graph.

Once a GFG is generated, the second part of the compiler converts it to a FSFG using a tree balancing technique [6,23]. The basic procedure here is to define a generic flow graph iteration period bound, which is a lower bound on the iteration period bounds of all FSFGs which can be generated from the GFG. Using the bound as a guide, a FSFG with the minimum possible iteration period bound is constructed.
The research has taken two separate, but related, approaches to the last two parts of the compiler. In the first approach, the compiler uses an extended version of the SSIMD compiler discussed in section A.3 to locate the simplest optimal schedule. The compiler first tries to find a rate-optimal SSIMD implementation. If the compiler proves that no such solution exists, instead of trying to find the best sub-optimal SSIMD solution, as before, it tries to find a rate-optimal static solution. If no such solution exists, it tries to find balanced PSSIMD solutions of increasing order until a rate-optimal solution is found. If no such solution exists, a full cyclo-static solution must be used [3-4]. Then, after the rate-optimal multiprocessor schedule is identified, the compiler finds the minimum pipelined and non-pipelined communications architecture which can support the solution.

In the second approach, which is still being investigated, the communications constraints are combined with the optimality constraints, and the last two parts of the compiler are combined.

A.5. Increasing the Parallelism of Filters

Recurrence relations, such as recursive filters, specified by fully specified flow graphs (FSFG), have a maximum parallelism that is constrained by one or more critical loops. In a processor optimal implementation, adding additional processors beyond the processor bound can not improve performance. However it is possible to increase the parallelism of the problem by modifying the FSFG. Two methods to increase the parallelism of recursive filters have been developed [4] [12]. The first approach is a blocking method which is based on the block state variable form for digital filters. For the block state variable form, an upper and lower bound on the minimum possible iteration period (iteration period bound) of a realization was developed. In addition, the associated number of processors required to achieve a given iteration period was determined. It was also shown that for many problems the blocked form has lower computational requirements and decreased finite word effects, even when implemented on a typical sequential uniprocessor. Similar, and related, results have been reported by Hui-Hung and Messerschmitt [24-25].

The second method, the "extended Moyer's method," [26] [4] is based on transforming the transfer function of a filter to a non-minimal form. Since the method is applied to the transfer function, the method is independent of the form of the filter. An equivalent, non-minimal, transfer function can be constructed that contains only delays of order L, \((z^{-L})\), in the loops. If the minimal form realization had a critical loop that contained \(k\) delays, the non-minimal form contains \(kL\) delays, and therefore the iteration period bound is a factor of \(L\) smaller. This method was suggested by Bellanger [26] for cases where \(L = 2^M\), and extended to the general integer case by Moyer. [26]. The results were further extended by Schwartz [4] to reformulate the solution in terms of real coefficients and analyzed in the rate bound framework.

Both methods exhibit a cost of \(O(L^2)\) processors to achieve the maximum processing rate resulting from transforming by order \(L\). The block state variable approach asymptotically approaches ideal speedup, while the extended Moyer's method always achieves ideal speedup. However the block state variable form has better numerical properties than the untransformed system, while the extended Moyer's method has poor numerical properties with increasing block size. While neither method is ideal, depending on the problem, both are useful for small transformation orders.
A.6. Time-Frequency Representations

The final area of research was analysis/reconstruction systems based on time-frequency representations. Such systems are fundamental components of many signal analysis and signal processing systems. This work is largely from the Ph.D. thesis of Mark Smith [22,27-30]. Mark Smith himself was supported on State research funds, and not directly on this contract. However, his thesis advisor, T. P. Barnwell III, was supported by JSEP for his part in this work.

There are two main contributions which came out of this portion of the research. The first was the invention of the conjugate quadrature filter (often referred to as Smith-Barnwell filters by other authors). These filters form the basis of the first ever maximally decimated two-band analysis/reconstruction systems [28,30] in which filters of arbitrarily good quality could be easily designed. These filters have been widely recognized as the solution to a long standing fundamental problem in digital signal processing. Multiband exactly reconstructing analysis/synthesis systems can be easily constructed using tree structures of conjugate quadrature filters.

The second major contribution in this area is the introduction of the alias-component matrix (AC-matrix) [28,31] formulation for the description of maximally decimated analysis/reconstruction systems. This formulation is based on the general filter bank transform (GFBT), which is a time-frequency representation which is similar to, but more general than, the short-time Fourier transform (STFT). The primary virtue of the AC-matrix formulation is that it allows most of the fundamental issues in maximally decimated time-frequency systems (filter quality, system complexity, frequency distortion, phase distortion, and aliasing) to be addressed separately in the design process. The effect of the AC-matrix formulation is to bring all previous maximally decimated analysis/reconstruction systems and many new systems under the umbrella of a single formalism. As such, all these related systems can be easily compared, and new systems can be easily designed.

During the last half of 1986, this work has been expanded by a new Ph.D. student named Kambis Nayebi [37]. One of the central unsolved problems in this area is to find techniques for design of high quality filter banks for analysis/reconstruction systems based on large numbers of channels. Nayebi [37] has been able to develop a set of orthogonality constraints between the coefficients of the filters in the filter banks. This represents a powerful new tool for the realization of such systems.
REFERENCES


PUBLICATIONS:

Theses


Journal Articles (published or accepted)


Papers in Conference Proceedings


Papers Submitted (for Journal Articles or Conference Proceedings)


WORK UNIT NUMBER 3

TITLE:

Two-Dimensional Optical Storage and Processing

SENIOR PRINCIPAL INVESTIGATOR:

T. K. Gaylord, Regents' Professor

SCIENTIFIC PERSONNEL:

E. I. Verriest, Associate Professor
J. A. Buck, Assistant Professor
E. N. Glytsis, Graduate Research Assistant (Ph.D. candidate)
A. Knoesen, Graduate Research Assistant (Ph.D. candidate)
T. A. Maldonado, A.R.O. Fellow (Ph.D. candidate)
R. S. Weis, Graduate Research Assistant (Ph.D. candidate)

SCIENTIFIC OBJECTIVE:

The long-term scientific objective of this research is to develop broadly-based, theoretical and experimental knowledge of two-dimensional optical information processing including algorithms, architectures, systems, and devices. This would bring together a range of concepts from basic physics to information processing in its most generalized form. Optical systems based on content-addressable memory processing, associative processing, Givens rotations, and hyperbolic rotations are being analyzed starting from basic physical principles and extending through experimental systems performance.

RESEARCH ACCOMPLISHMENTS:

A. Complexity of Residue Number System Processing

An analytic expression for the lower bound on the complexity of residue addition and multiplication was developed. Significant reduction of the the required stored logic in a content-addressable memory is noted. Errors in these results as published by C. A. Papachristou were corrected.

This research was published in IEEE Transactions on Computers.

B. Multi-Level Coded Residue-Based Content-Addressable-Memory Optical Computing

The extension of truth-table look-up processing beyond primitive operations (such as addition) to higher-level operations (such as discrete matched filtering) was presented. Use of the residue system and logical minimization techniques to reduce the required number of reference patterns stored in a content-addressable memory was illustrated for
16-bit full-precision addition. Multilevel coding of the numbers was introduced as a method to achieve further truth-table reduction. The required number of reference patterns for implementing the residue addition and multiplication operations were provided for all moduli from 2 through 32 with 2-, 3-, and 5-level coding. An optical holographic implementation of a system that processes multilevel coded numbers was presented.

This research was published in *Applied Optics*.

C. Truth-Table Look-Up Processing

The need for ultra-high-speed computing for a variety of modern processing problems has generated new interest in using truth-table look-up techniques. Further, due to the frequently parallel nature of these processing problems, optical systems appear to be promising for these applications. The basic principles of truth-table look-up processing were reviewed. The issues of number representation, multilevel coding, and logical minimization were developed. Example fixed-radix and residue number representations were given with and without multilevel coding. Logical reduction techniques were discussed with examples. A comparison of the number of truth-table entries needed for 16-bit full precision addition and multiplication were given, illustrating the advantage of the multilevel coded residue number representation.

These results were published in an invited paper in *Optical Engineering*.

D. Integrated-Optical Givens Rotation Device

The Givens rotation operation occupies a central role in linear algebraic signal processing. An integrated-optical coherent implementation of an elementary rotation matrix device, based on thick grating diffraction, to perform this operation was suggested. It was shown that existing electrooptic phase shifting and grating diffraction devices can be combined to produce a very fast Givens rotation device.

This research was published in *Applied Optics*.

E. Logical Minimization of Multilevel-Coded Functions

Discrete numerical values in digital processing systems may be encoded in two-level (binary) or higher-level (multilevel) representations. Multilevel coding can produce smaller and more efficient processors. In truth-table look-up processing, the number of entries (reference patterns) can be reduced using multilevel coding. Since parallel-input/parallel output optical truth-table look-up processors can be constructed based on holographic content-addressable memories, it is essential to know the minimum storage required to implement various functions. A new simple method for reducing multivalued functions was developed. This method is based on an extension of the Quine-McCluskey minimization method used for binary logic functions. This minimization method was then applied to the truth tables representing (1) modified signed-digit addition, (2) residue addition, and (3) residue multiplication. A programmable logic array gate configuration for the modified signed-digit adder was presented.

This research was published in *Applied Optics*.
F. Grating Diffraction

A rigorous coupled-wave analysis for metallic surface-relief gratings was presented. This approach allows an arbitrary complex permittivity to be used for the material and thus avoids the infinite conductivity (perfect conductor) approximation. Both TE and TM polarizations and arbitrary angles of incidence are treated. Diffraction characteristics for rectangular-groove gold gratings with equal groove and ridge widths were presented for freespace wavelengths of 0.5, 1.0, and 10.0 microns for all diffracted orders as a function of period, groove depth, polarization, and angle of incidence. Results included the following: (1) TM polarization diffraction characteristics vary more rapidly than do those for TE polarization, (2) 95% first-order diffraction efficiency occurs for TM polarization at 10.0 microns, (3) <0.1% zero-order specular reflectivity occurs for both TE and TM polarizations, (4) >50% absorption of incident power occurs at 0.5 micron, and (5) the perfect-conductor approximation is not valid for TM polarization at any of the wavelengths and is not valid for TE polarization at 0.5 micron.

This research was published in Journal of the Optical Society of America.

G. Analysis Interdigitated Electrodes on Electro-Optic Waveguides

Integrated optical interdigitated-electrode devices are used to diffract light and to launch acoustic waves. The electric field and the permittivity and strain tensors induced by a voltage applied to periodic interdigitated electrodes of finite thickness on the surface of an anisotropic electro-optic crystalline waveguide were calculated rigorously. The extremely important existence of a buffer layer between the electrodes and the waveguide occurring in practical devices was included in the analysis.

This work has been accepted for publication in Journal of Lightwave Technology.

H. Zero-Reflectivity Dielectric Surface Relief Gratings

Using rigorous coupled-wave analysis, high spatial-frequency rectangular-groove surface-relief phase gratings were shown to be capable of exhibiting zero reflectivity. Thus these corrugated surfaces may act as antireflection coatings in a variety of applications. The diffraction characteristics of rectangular-groove surface-relief gratings were presented for several ratios of incident wavelength to grating period as a function of filling factor, groove depth, angle of incidence, and polarization. The conditions for zero reflectivity were identified. Results are compared to single-homogeneous-layer approximate theory results. In the limit of long wavelengths for an electromagnetic wave in a dielectric of refractive index \( n \) normally incident upon a dielectric of index \( n' \), it was determined that for antireflection behavior, the grating groove depth should be \( \lambda/4(n' n)^{1/2} \) and the filling factor should be \( n/(n + n') \) or \( n^2/(n + n')^2 \) for the electric field perpendicular or parallel to the grating vector, respectively. The spectral and angular responses of these gratings are like those of single-homogeneous-layer antireflection coatings. These gratings also exhibit birefringent retardation.

This research has been accepted for publication in Applied Optics.
PUBLICATIONS:

Journal Articles (Published or Accepted):


Interactions and Technology Transfer

Analysis of allowed hybrid modes in anisotropic waveguides and interdigitated-electrode produced electric fields in integrated optical anisotropic waveguides is being performed for the US Armament Engineering and Development Center in Dover, New Jersey.
Honors and Awards

Thomas K. Gaylord received "Outstanding Contribution in Research" award for the paper "Analysis and applications of optical diffraction by gratings" given by the Southeastern Section of the ASEE, 1986.

Thomas K. Gaylord received "Best Research Paper in Engineering" award for paper "Analysis and applications of optical diffraction by gratings" from Sigma Xi, 1986.
WORK UNIT NUMBER 4

TITLE:
Two-Dimensional Optical/Electronic Signal Processing

SENIOR PRINCIPAL INVESTIGATOR:
W.T. Rhodes, Professor

SCIENTIFIC PERSONNEL:
J.N. Hereford, Graduate Research Assistant (Ph.D. candidate)
R.W. Stroud, Graduate Research Assistant (Ph.D. candidate)

SCIENTIFIC OBJECTIVE:

The long term scientific objective of this research is to gain a good understanding of the capabilities and limitations of hybrid optical/electronic methods for high throughput processing of 2-D signal information and to develop new and widely applicable techniques based on such methods. Emphasis is placed on establishing the capabilities of systems that mate well with digital signal processing systems.

RESEARCH ACCOMPLISHMENTS:

A. Bipolar Incoherent Spatial Filtering

Our original objective in this area was to develop effective methods for bipolar spatial filtering using incoherent optical systems that are simple to implement and efficient with respect to light utilization. That objective was augmented with the additional goal of maximizing overall system dynamic range in the case where optical and digital subsystems are combined with a scanning operation in between.

Work in 1985 saw completion of an elegant unifying theory of pupil function specification for the two kinds of two-pupil hybrid optical/electronic spatial filtering methods (one method involving interference of light from the two pupils, the other method not). Further, a two-step algorithm was developed for designing pupil functions that are optimal or nearly so in the sense of reducing noise and enhancing contrast. During the current year, three journal articles on the research have been published, a manuscript for another paper has been submitted for publication, and a paper on the subject was also presented at a conference.

Currently a new doctoral student is looking at a scheme for realizing pupil function syntheses of this kind using birefringent materials in common-path-interferometer-based imaging systems. This approach promises to eliminate many of the practical difficulties characteristic of other interferometer-based systems.
B. Optical Implementation of Morphological Transformations

In late 1985 we began a new study of highspeed opto-electronic methods for implementing morphological transformations (e.g., erosions, dilations, openings, closings) on binary images and nonlinear filtering operations (e.g., median filtering) on gray-scale images. The morphological transformations, which involve the interaction of the shape under study with a structuring element, are being used more and more in pattern recognition and artificial intelligence for robotic vision. Some of these operations are quickly performed using binary digital logic circuitry. Others, however (e.g., skeleton decomposition, where the pattern under study is reduced to primitive structural components), are sufficiently complex that TV-frame-rate processing is difficult to achieve. The schemes investigated perform the necessary operations in parallel. Real-time operation is contingent on finding a device that, much like high contrast film, will perform a hard limiting operation on an image, but essentially instantaneously. Such devices are currently under development (e.g., at GTE laboratories), and the methods being investigated appear to have good potential for success in a real-time environment.

In the processing operation, a binary input pattern is presented to a shift-invariant imaging system and a blurring operation performed. The blur function, which plays the role of the structuring element in the transformation, is often a disk, though other shapes are easily accommodated. The resulting blurred image is then hard limited. If the threshold is set for a high intensity, an erosion results; with a low intensity threshold, a dilation results. Setting the threshold at the 50% level results (with binary input and blur functions) in a median operation. Other threshold levels can be used to obtain other rank-order processing operations, assuming the input and blur functions are binary. These simple operations have been successfully demonstrated, and results have been presented at a conference. A journal publication is in preparation.

Continuing work in this area centers on further development of these opto-electronic methods and their extension to (a) skeleton decomposition of binary images and (b) nonlinear filtering (e.g., median filtering) of gray scale images. We have worked out conceptually a method for skeleton decomposition that involves integrating the results of a number of erosions followed by implementation of a Laplacian or similar operation and a hard limiting. All operations can be performed opto-electronically in parallel. A method for nonlinear processing of gray scale images has also been worked out and is under experimental investigation.

C. Other Image Processing Research

Quasi Space-Invariant Imaging. Contrary to widespread belief, a single-lens imaging system with limiting aperture in the lens plane is not space-invariant for coherent imaging and cannot, therefore, be described by a convolutional input-output relationship. Recently we conducted an experimental verification of a new principle of quasi space-invariance that applies to the intensity transmittance of diffuse, coherently illuminated objects imaged with what would normally be a space-variant single-lens imaging system. A brief manuscript presenting the principle is in preparation. The quasi space-invariance arises because of the whitening of the object spectrum introduced by the diffuser. Because of this whitening, all regions of the object "see" the same effective transfer function rather than substantially different ones. The principle is primarily of academic interest because of the speckle that is introduced in the image.
Coherent Optical Processing of Incoherently Illuminated Objects. Normally it is not possible to perform classical coherent spatial filtering operations like phase-contrast and dark-field imaging on objects that are illuminated incoherently (e.g., by light from a spatially extended incoherent source). We have recently discovered a new method by which, under a limited set of conditions, it is indeed possible to do coherent spatial filtering on incoherently-illuminated objects. The idea was first considered in the context of phase-contrast imaging of thermal plumes from submarines illuminated by noise from ocean waves. Our extension is to the optical case. A key restriction is that the object be only weakly diffracting, such as a weak phase object. We are preparing to verify the concept experimentally.

Falling Raster/Folded Spectrum Relationships. We, along with a small number of other researchers (e.g., William Stoner, Harper Whitehouse, Terry Turpin), have known for some time that alternatives exist to the conventionally-employed falling raster format for performing 2-D spectrum analysis on 1-D signal information. However, these alternatives have never been modeled analytically. During the past quarter a new doctoral student has begun the task of detailing the basic relationships between the non-conventional recording formats and the corresponding 2-D "folding" of the signal spectral distributions. It is not yet clear what kind of payoff we can expect from this study, though the resulting increased flexibility in recording format may be of use in some high-speed spectral analysis applications.

Low-Signal-to-Bias-Ratio Optical Signal Processing. Work that began under this program in connection with a Fourier transform scanning hybrid image processing system has evolved to center on the general problem area of bias buildup, signal-to-bias ratio, and signal-to-noise ratio in time-integration optical processing. This is a generic problem that affects several important classes of optical signal processing, including time-integration acousto-optic processing of radar and communication signals and incoherent holography.

We have begun a conceptual consolidation, with respect to bias problems, of previous work in the areas of time-integration optical signal processing, incoherent holography, and multiple-exposure holography and are developing an analytical formulation of the general problem applicable to these three areas. A method has been found and reported for optimizing small signal-to-bias-ratio exposures for time-integration optical processing where bleached silver-halide recording materials are used as the detector. Analytical details were reported at a recent conference. We are currently conducting experimental verification. Assuming our analytical results are confirmed, we shall submit a manuscript for Journal publication.
PUBLICATIONS:

Journal Articles (published or accepted)


Papers in Conference Proceedings


Papers Submitted (for Journal Articles or Conference Proceedings)


Interactions with DoD Labs

Met with Dr. Al Ellinithorpe of DARPA/STO on 30 May 1986.

Met with Dr. John Lee at the Naval Research Laboratory, 13 August 1986.

Met with P. Denzil Stilwell and Dr. John Lee at the Naval Research Laboratory, 23-25 September 1986.

Met with Dr. Jacques Ludman of RADC/Hanscom Field on 18 December 1986.
WORK UNIT NUMBER 5

TITLE:
Optimal Multiprocessor Structures for the Implementation of Digital Signal Processing algorithms on High Density Integrated Circuits

SENIOR PRINCIPAL INVESTIGATORS:
D. A. Schwartz, Assistant Professor and J. H. Schlag, Professor

SCIENTIFIC PERSONNEL:
H. Forren, (Ph.D. candidate)

SCIENTIFIC OBJECTIVE:
To develop techniques for the automatic generation of optimal or highly efficient implementations of digital signal processing algorithms for synchronous multiprocessor VLSI architectures.

RESEARCH ACCOMPLISHMENTS:

A. Generalization/Extension of Cyclo-Static Scheduling

Work performed in conjunction with H. Forren, a research assistant, has yielded generalizations and extensions to cyclo-static scheduling as well a rudimentary proposal for a basic VLSI processor node.

The method of cyclo-static scheduling has been generalized or extended in the following areas: 1) inhomogeneous processors; 2) pipelined processing elements; 3) hierarchical nodes; 5) a communications bound; and 6) Gantt chart permutations.

Previously only homogeneous processors were considered. For many DSP problems such as filtering, only the atomic operations of multiplication and addition are needed. When the processing elements are restricted to homogeneous processors even though processor utilization may be optimal, the system efficiency is lower since the processor must be capable of both multiplying and adding. This usually implies that the utilization within the processor node is sub-optimal. This is particularly important when considering algorithmically specialized (custom) processors in the VLSI context. For efficient utilization of silicon area a design based on simple multiplier and adder elements is more effective than a general purpose processor that must include complex control and storage and ALU. The previously reported lower bound on the number of processors can be directly extended by partitioning operations into a set of operations associated with each processor type. This can also be used to model inhomogeneous communications delay since the model normally binds communication delay to the producing operator.

Pipelined processing elements are attractive at both the discrete and VLSI element level. This is because pipelining usually yields the lowest cost approach to concurrency. Pipelined processor elements can also be handled by a minor extension to the previously reported bounds. The iteration period bound and the delay bound are now computed in
terms of the pipeline latency instead of the operational latency. Other expressions such as those for the processor lower bound and for the processor modulo constraint must use the stage latency. While the extensions are very simple, they are also very powerful and, in addition, have been successfully applied as a tool for writing optimal code for pipelined array processors and pipelined DSP chips.

Hierarchical graphs can be used to reduce the apparent complexity of the graph to the point where the optimization tools are effective. Further modifications of previous expressions must be made in order to allow the use of hierarchical nodes. Instead of associating a single latency with the time from the arrival of the last input to the appearance of the single output, an independent latency must be associated with each input or output (where multiple outputs must be considered as well).

A new bound on communications has been determined. This communications bound reflects the local characteristics of a homogeneous communications network required to implement a signal flowgraph. The number of neighbors with which a processor can directly communicate (including itself) is an important measure of network complexity. By observing the worst case fan-in or worst case fan-out of a signal flowgraph, as well as the pad time allowed along the respective branches, the direct neighbor lower bound can be established.

In prior work, only adjacent processor communications were considered in scheduling and mapping a solution onto a target. For an architecture that supports parallel communications and operations it is practical to pass data through more than one processor when the schedule has sufficient slack time between the availability of the data and the need to use it. From a valid schedule, the exact communication requirements of that schedule may be found directly. Simple formulas indicate the number of communication steps available for data to travel from one processor to another. The distance between these processors in terms of a network model is similarly found. If all chords of a network (if any) are shorter than this specification, the data can not travel far enough in the time allowed. Defining the local radius of the network to be the greatest distance that data can travel in one communications step simplifies the process of determining a valid (if it exists) processor mapping given a schedule. These preliminary results may form the basis of an effective scheduler that can determine the minimum possible communications support for a given FSFG/algorithm.

The Gantt Chart has been presented as a useful representation of a schedule or program. Numerous manipulations to the Gantt Chart allow one schedule or program to be transformed into another trivially different one. In the case of the program, a dramatic difference in the communications network may result. A brief list of manipulations is presented here.

Columns may be arbitrarily rotated, effecting only a relative time shift.
Rows may be arbitrarily switched, effecting which processor does which row.
Row fragments may be sliced and spliced so long as there is no time shift.
Any two (or all) cycles may be spliced into a single cycle.

An important result of manipulating Gantt Charts is the ability to transform any schedule or program into SSIMD form with homogeneous processors. The homogeneous processor requirement is obvious for SSIMD. If inhomogeneous processors are required, then cycles may be spliced until a single cycle exists for each type of operation or processor. If the number of rows in each cycle is the same, then the result is a PSSIMD solution. If
the number of rows in any two cycles differ, then the result is a multiple cyclo-static instruction streams, multiple data streams (MCIMD) solution. [MCIMD is an extension of cyclo-static to allow multiple lattice vectors, which are often necessary for inhomogeneous solutions].

The subject of the following section is the result of work performed in conjunction with P. W. Hutto, a graduate student. Hutto was supported by with funds from DARPA and the State of Georgia (E-Funds). His advisor, D. A. Schwartz was supported in this research by JSEP. Progress and preliminary results has been made in the following areas.

B. Optimal Data Routing

Based on the theoretical properties of cyclo-static realizations, DARPA sponsored the design of a small scale model of a super computer for digital signal processing which is referred to as OSCAR (Optimal Synchronous Cyclo-static ARray). OSCAR features a generalized/extended ILLIAC-IV type interconnection network. The ILLIAC-IV interconnection network was a simple SIMD machine that required each node to execute the identical routing instruction. OSCAR allows independent, arbitrary, five port interconnection at each node. In addition the interconnection network operates in parallel with processor node computations. This powerful interconnection network is the backbone of OSCAR's fine grain parallelism. However, for those algorithms of interest that do not meet the model (cyclo-static) of the existing compiler tools (e.g. FFT), the problem of optimal data routing for implementations where data must be passed through one or more processor nodes to reach its destination is largely unsolved. Recent research has only addressed hypercube routing and restricted cases of ILLIAC-IV type meshes.

Spurred by problems in determining an optimal routing algorithm for a parallel FFT, a software testbed was developed to investigate optimal routing for generalized many-to-many data permutations. An optimization technique based on simulated annealing was developed to determine optimal/near optimal routes. The method is based on assigning minimum length paths to all data paths, then annealing the paths by introducing delays and path permutations so as to eliminate all communications conflicts.

Preliminary results indicate that the method is very efficient at producing optimal/near optimal data routes. The viability of the the simulated annealing technique was suggested by the success of the method for optimizing layout routes for VLSI.

C. Differential Equation Compiler

The numerical solution to differential equations is a very common, CPU intensive application of scientific computing. In particular there has been great interest in the real time solution of the equations of motion for robot arms. It was noted that the solution to initial value problems that do not use data-adaptive integration techniques are precisely the class of problems that can be optimally solved with cyclo-static techniques. However the resulting FSFG's are complex and irregular. To provide a more effective environment a simple compiler was developed to demonstrate the feasibility of using a high level, FORTRAN-like, language to specify the systems of differential equations. The output of the compiler is a list that specifies the resulting FSFG. The current implementation is rather frail and needs to be strengthened to solve a larger class of equations, and to use more advanced solution techniques. This was only intended as a feasibility study and it is not planned to pursue this area further at this time.
D. Off Chip Interface Issues

As a result of developing the OSCAR architecture several results were discovered relating to communications design and off-chip interfaces in fine grain parallel architectures. It is unlikely that it will be possible to implement a high performance multiprocessor system on a single chip in the near future. Therefore the off chip interface is crucial in the design process. There is always a large speed penalty for going off chip. In the cyclo-static multiprocessor model it had been assumed that the interchip delay was negligible compared to the intrachip computation time (for an atomic operation). During the course of the past three years this has changed significantly.

Two methods have been developed to handle this problem. First, if the interchip delay is comparable and less than (or equal to) the operational delay, then the interchip communications time can be effectively pipelined by using an interleaved design. This allows all communications to be treated homogeneously and maximizes the system throughput. The penalties are that this requires partitioning (or cooperation) of system resources between the tasks, the existence of orthogonal tasks, increased throughput delay, and potential problems with synchronizing tasks.

The second method is to treat the interchip vs. intrachip communications as an inhomogeneous communications medium. In the cyclo-static system model it may be possible to assign all operations in a critical loop to a single chip and to split non-recursive sections and non-critical loops with sufficient slack time across chip boundaries by spending the slack time on the communications delay. Simple examples have demonstrated the feasibility of this approach, however a formal method to handle inhomogeneous communications is in the early stages of development and will require future work.

Both methods seem to indicate that for technologies such as GaAs where the ratio of interchip to intrachip delay is so large that fine grain parallelism across chip boundaries is impractical. This suggests a hierarchical approach of fine grain parallelism for intrachip scheduling and medium to coarse grain parallelism for interchip scheduling.

A second issue, closely related to the previous issue is communication bandwidth. To achieve fine grain parallelism a very large communications bandwidth is required. Consider the case of all atomic operations being simple binary operators (two inputs, one output). For each operation, potentially, two operands must be fetched from other processor elements. If the communications time is to be negligible compared to the computation time this implies approximately an order of magnitude more communications bandwidth than used in conventional design wisdom. In order to make it practical to find optimal/good schedules, at present it is necessary to insure that the communications is non-blocking. This can only be achieved by supporting an even higher communications bandwidth (which is a function of the algorithm and the target communications network). This does not imply that fine grain parallelism is flawed. What is does imply is that for a system to be able to solve a class of problems efficiently with large parallelism that there is a need for large communications bandwidth.

In short, powerful, high bandwidth communications is fundamental to fine grain parallelism. Much more work needs to done on the fundamental architectural implications of communications requirements on the off chip interface. It is also obvious that fine grain parallelism carries with it architectural complexities that may be better addressed by applying the fine grain approach and theory on a coarser grain (herein lies current applicability).
PUBLICATIONS:

Books or Chapters in Books


Papers in Conference Proceedings


WORK UNIT NUMBER 6

TITLE:
Electromagnetic Measurements in the Time and Frequency Domains

SENIOR PRINCIPAL INVESTIGATOR:
G. S. Smith, Professor

SCIENTIFIC PERSONNEL:
W. R. Scott, Jr., Assistant Professor
M. Gouker, Graduate Research Assistant (Ph.D. Candidate)

SCIENTIFIC OBJECTIVE:

The broad objective of this research is to develop new methodology for making electromagnetic measurements directly in the time domain or over a wide bandwidth in the frequency domain. This research includes the development of the theoretical analyses necessary to support the measurement techniques. One aspect of the research is the systematic study of radiating structures placed near or embedded in material bodies. In a practical situation, the radiator might serve as a diagnostic tool for determining the geometry, composition or electrical constitutive parameters of the body.

RESEARCH ACCOMPLISHMENTS:

During the last year, research was initiated on the following new topics.

A. Pulse Excited Antennas Near a Material Interface

Ground penetrating radar systems have been proposed for many applications; these include the detection of mines and buried unexploded ordinance, the location of buried utilities and the mapping of subsurface geological structure. The systems generally make use of a temporarily short, wide bandwidth, pulse. The pulse is transmitted and received by one or more antennas located above the surface of the earth.

The characterization and design of antennas for this application are complicated by two factors: the broad-band requirements and the effect of the air/earth interface on the performance of the antenna. Two types of antennas have generally been used for systems of this kind: dipole type antennas located very close to the air/earth interface and TEM horn antennas located at a larger distance from the air/earth interface.

We have initiated a research program to study new antennas for this application. Experimental facilities were constructed for measuring the performance of antennas over an interface between air and wet sand. Sensors and targets embedded in the sand are used to evaluate the field of the antenna.
Scale model antennas are used with this system; full size antennas would generally require larger test facilities than can be accommodated in our laboratory. Techniques have been developed for correcting errors in the automated time-domain instrumentation used with this system.

B. Materials for Electromagnetic Scale Models

Electromagnetic scale models are of value in the design and testing of electromagnetic systems that are on a scale too large or too small for routine laboratory use. For example, the performance of antennas over or buried in the earth can be evaluated in the laboratory using reduced size scale models operated at frequencies higher than those used for the actual antennas. For this model, a material is needed that will model the electrical properties of the earth. Scale models with increased size are also used. A microscopic antenna on a substrate may be modeled by a larger antenna operated at a frequency lower than that for the actual antenna. A material is then needed that can model the electrical properties of the substrate.

Clearly it is desirable to have a series of materials with a range of electrical properties or mixtures of materials with adjustable electrical properties for use in scale models. We are beginning a study of the electrical properties of emulsions (mixtures of oil and water) as binary solutions with adjustable electrical properties. The constituents of the emulsion are relatively inert and satisfy all of the requirements as to toxicity, flammability, etc. We started a systematic investigation of the electrical properties of several emulsions, with the objective of obtaining information on:

a. the range of electrical parameters that can be realized,

b. The frequency dependence of the electrical properties,

c. the accuracy of mixing formulas for predicting the electrical properties,

d. the stability of the emulsions.
PUBLICATIONS:

Journal Articles (published or accepted)


Papers in Conference Proceedings


Interaction with DoD Labs

During the period of the contract, a study entitled "Hardened Antenna Technology Assessment" was performed for the Air Force (RADC, Griffis AFB). This study made use of information developed on the Joint Services Electronics Program.
TITLE:
Automated Radiation Measurements for Near- and Far-Field Transformations

SENIOR PRINCIPAL INVESTIGATOR:
E. B. Joy, Professor

SCIENTIFIC PERSONNEL:
- R. E. Wilson, Graduate Research Assistant (Ph.D. Candidate)
- B. Keith Rainer (Ph.D. Candidate)
- Will D. Caraway (M.S. Candidate)
- John R. Thomas (M.S. Candidate)
- Darryll G. Wright (M.S. Candidate)
- Mike G. Guler (M.S. Candidate)
- Sherill J. Edwards (B.S. Candidate)

SCIENTIFIC OBJECTIVE:
The long term objective of this research is to understand the near field and far field coupling between antennas in the presence of scatterers. Special emphasis is placed on determination of limits of accuracy in the measurement of the fields radiated or scattered by an antenna-under-test by a second antenna and to develop techniques and computer algorithms for compensation of such measurements due to known geometrical or electromagnetic anomalies.

Three application areas are pursued: a) antenna measurements, where the effects of scatterers are suppressed or compensated; b) scattering measurements, where the effects of scatterers are enhanced and c) radome measurement, where the effects of the scatterer (the radome) are of equal importance to the antenna measurement.

RESEARCH ACCOMPLISHMENTS:
A. Spectral Evaluation of Reflector Surfaces Used for Compact Antenna Ranges

The test zone field quality of a compact antenna range is dependent on primarily three factors: the surface accuracy and smoothness of the range reflector, the scattering and diffraction from the edge of the reflector and the complex vector far field pattern of the feed antenna. It was shown that the effect of the surface accuracy and smoothness can be quantified using a newly developed spectral analysis technique. The surface error is expressed as a two-dimensional summation of sinusoids. The illuminating field is expressed as a two-dimensional spectrum of plane waves incident on the reflector. It is shown that the effect of the reflector surface error is to phase modulate each plane wave with a modulation index proportional to the magnitude of the sinusoidal components of the surface error. The compact range geometry was shown to be a low-pass filter, allowing only those plane waves with spatial period of approximately three wavelengths and below to pass to the quiet zone. Thus it was learned which reflector surface accuracy were
important for compact range use and the relationship between surface accuracy and test zone field performance. These results were presented at the 1986 Antenna Measurement Techniques Association Workshop on Compact Ranges and presented at the 1986 Antenna Measurement Techniques Association Meeting.

B. Near-Field Measurement of Radome Performance

Initial work has been carried out to develop a technique of near-field measurements of radome performance. Spherical surface near-field measurements have been made of a fused silica, tangent ogive, missile radome at 11.75 GHz. A special antenna was designed and constructed and an existing radome was used for the measurement. A "backward transform" algorithm is concurrently under development to determine the fields on the outer surface of the radome from the measured near-field data. Backward transforming using a spherical wave expansion is both non-unique and unstable. The higher order modes, which must be allowed in the solution, increase rapidly as the radius of the back projection decreases. Preliminary results show that sufficient information is contained in the radiating modes and little is lost in setting the higher order mode amplitudes to zero. Papers describing the near-field measurements made with and without a radome and the results of the initial work on radome performance assessment using this new measurement technique were presented at the 18th Symposium on Electromagnetic Windows and the 1986 Antenna Measurement Techniques Association Meeting.

This work, plus other publications and technology transfers of the principal investigator are summarized in the following:
PUBLICATIONS:

Books or Chapters in Books


Journal Articles (published or accepted)


Papers in Conference Proceedings


Technology Transfer

1. U. S. Navy: M. I. T. Lincoln Laboratory is developing a high performance surveillance radar system which incorporates a shipboard mounted ultra-low sidelobe planar phased array antenna. The array has a 5-meter by 10-meter aperture and a -55 dB rms sidelobe levels. M.I.T. Lincoln Laboratory will install a multi-million dollar plane-polar near-field measurement system to test and align the phased array. To achieve the desired measurement and alignment accuracies, the K-correction probe
position error compensation technique and algorithm developed under JSEP sponsorship will be transferred to M.I.T. Lincoln Laboratory. (The request for proposal on the facility specifies use of this technique.)

2. U. S. Army Electronic Proving Ground, Fort Huachuca, Arizona: The spectral analysis technique, for the evaluation of reflector surfaces developed under JSEP sponsorship is currently being used in the design of a large outdoor compact antenna measurement range at Fort Huachuca.
END
7-81
Dtic