Final Technical Report to the
Advanced Research Projects Agency
for Grant MDA972-93-1-0028

Methods and Components for
Optical Contention Resolution
in High Speed Networks

DISTRIBUTION STATEMENT A
Approved for public release
Distribution Unlimited

SPACE, TELECOMMUNICATIONS and RADIO SCIENCE LABORATORY

DEPARTMENT OF ELECTRICAL ENGINEERING, STARLAB / SEL
STANFORD UNIVERSITY • STANFORD, CALIFORNIA 94305-4055

DTIC QUALITY INSPECTED 2
DISCLAIMER NOTICE

THIS DOCUMENT IS BEST QUALITY AVAILABLE. THE COPY FURNISHED TO DTIC CONTAINED A SIGNIFICANT NUMBER OF PAGES WHICH DO NOT REPRODUCE LEGIBLY.

August 15, 1993 - March 31, 1996

Submitted by:

Prof. Leonid G. Kazovsky, Principal Investigator
Department of Electrical Engineering
Durand 202, MC-9515
Stanford University
Stanford, CA 94305-9515
Tel: (415) 725-3818
Fax: (415) 723-9251
E-mail: leonid@cher.stanford.edu

1. Executive Summary

This report covers the work performed by Stanford University for the CORD (COntention Resolution with Delay-lines) project under Advanced Research Projects Agency contract number MDA972-93-1-0028. The project ran from August 15, 1993 to March 31, 1996. The CORD project participants are a consortium of GTE Laboratories, Stanford University, and the University of Massachusetts.

The purpose of the CORD project was to develop and demonstrate components and methods to resolve the problem of packet contention in very high speed networks, using an all-optical approach. The approach demonstrated in the project is based on a concept in which arriving packets are switched and stored in optical delay lines so as to interleave empty and occupied time slots, increasing significantly the network throughput and reducing management complexity.

The CORD consortium effort has three main goals: investigating networking issues involved in optical contention resolution (University of Massachusetts), constructing the experimental Contention Resolution Optics (CRO) (GTE Laboratories) and building a packet-switched optical network prototype employing the CRO and novel signaling/synchronization techniques (Stanford University). This report contains background information and description of the CRO but is primarily focused on the contributions made by Stanford University.

To demonstrate the performance of the CRO, we built a two-node wavelength division multiplexed (WDM) testbed. A signaling scheme was developed to transmit ATM packets at a per-node transmission rate of 2.5 Gb/s, with routing information transmitted in parallel at 80 Mb/s. Optical transceivers were designed and built, as were electronics to generate and receive traffic and to measure the performance of the testbed. Novel synchronization techniques were developed to achieve global packet synchronization and ultra high-speed clock recovery, allowing all-optical packet switching of ATM (53 byte) packets at 2.5 Gb/s. Further custom electronics were developed to synchronize, control, and monitor the operation of the CRO. The CRO was characterized in detail and significant changes were made to improve its performance. We also investigated alternate CRO configurations to further improve performance and to scale to more than thirty delay-line stages.

The principles and technologies developed by the CORD consortium are flexible and are expected to find future application, together or separately, in a variety of topologies and network architectures. The CRO concept is directly applicable to other key problems in optical networks, such as all-optical time-slot synchronization, all-optical bridging, and packet/circuit integration in optical networks. The multiple subcarrier header encoding-decoding and the slot and clock synchronization techniques are expected to yield a substantial contribution towards the solution of the critical problems of signaling and ultra-fast synchronization in all-optical packet-switched networks.
2. Introduction

The COntention Resolution by Delay-lines (CORD) project is an ATM optical packet-switched LAN consortium effort of GTE Laboratories, Stanford University, and the University of Massachusetts. The implementation of CORD, as with other optical packet-switched networks, requires that the problems of resource contention, signaling, and local and global synchronization be resolved. CORD uses an optical solution to resource contention, based on optical switches and delay lines. Signaling is achieved with subcarrier multiplexing of packet headers. Synchronization issues are resolved by means of clock tone multiplexing techniques, digital processing for ultra-fast clock recovery, and distributed techniques for global packet-slot alignment.

The CORD consortium has three main goals: investigating networking issues involved in optical contention resolution (University of Massachusetts), constructing the experimental Contention Resolution Optics (CRO) (GTE Laboratories) and building a packet-switched optical network prototype employing the CRO and novel signaling/synchronization techniques (Stanford University).

The objectives of the CORD Consortium (COntention Resolution by Delay Lines) are to reliably demonstrate the feasibility and effectiveness of the CRO principle and of its enabling devices, as well as to develop strategic technologies for the implementation of packet-switched all-optical networks, such as subcarrier-based packet header encoding-decoding, global packet-slot synchronization, and ultra-fast clock recovery. A network prototype will combine the protocol, system, and device technologies developed by the Consortium in order to assess their viability and effectiveness. By developing and demonstrating these key technologies, the CORD project hopes to pave the way towards WDM all-optical packet-switched network solutions that take full advantage of the potential capacity of the optical medium.

The main project tasks are:

- Investigate networking issues and develop optimal CRO control strategies (University of Massachusetts);
- Develop the key devices and construct an experimental CRO fabric based on integrated semiconductor technologies (GTE);
- Build an optical packet-switched network prototype, utilizing the CRO, subcarrier-based packet header encoding-decoding, global packet-slot synchronization and ultra-fast clock recovery (Stanford University).

More specifically, the tasks of the University of Massachusetts are the following: to conduct a general investigation of the applicability, effectiveness, and scalability of the concept of optical contention resolution in the context of optical networking; to design and evaluate an optimal CRO structure and control strategies, optical channel access management, and network resource allocation and traffic management; and to evaluate and optimize global network performance in communication systems using CRO-based nodes. In particular, the objectives of the CRO control strategy are to minimize residual contentions and the time spent by a packet in the CRO.

The task of GTE is the development of key devices for a rugged and reliable implementation of a CRO. GTE has specifically developed a fast semiconductor optical switch and will achieve its integration with an optical amplifier on an InP substrate. Dielectric waveguide interconnection technology based on the concept of mode transformation in uniform waveguides is used to provide the necessary low-loss fiber-to-switch and switch-to-fiber pigtailing. The optical amplifiers are included in the CRO module to offset the device insertion losses and to relax the switch and interconnection tolerances, which might otherwise impair its practicality.

Stanford University is building a two-node network prototype which includes the CRO fabric provided by GTE. The specific tasks of Stanford University include system analysis and the design and implementation of: high-speed optical digital transmitters and receivers; subcarrier-based packet header encoders and decoders; high-speed node and CRO control electronics; global time-slotting circuitry; ultra-fast clock recovery for header and payload bits; ATM-size packet generation and reception logic at 2.488 Gb/s; and error checking and network performance measurement circuitry. Network performance evaluation tests will be conducted to assess the effectiveness of the CRO approach.

The principles and technologies developed by the CORD consortium are flexible and are expected to find future application, together or separately, in a variety of topologies and network architectures. The CRO concept is directly applicable to other key problems in optical networks, such as all-optical time-slot synchronization, all-optical bridging, and packet/circuit integration in optical networks. The multiple subcarrier header encoding-decoding and the slot and clock synchronization techniques are expected to yield a substantial contribution towards the solution of the critical problems of signaling and ultra-fast synchronization in all-optical packet-switched networks.

2.1 Background and Previous Work

Both circuit and packet switching have been proposed to provide the user with the flexibility required for efficient access to optical networks. Several optical circuit switching solutions have been proposed. More recently, a trend has developed that places the emphasis on packet switching solutions, which provide added flexibility for supporting bulk data transfer, multimedia, and other advanced applications [1 - 3]. These solutions appear to be preferable in the fields of supercomputing and LAN/MAN backboning, and are potentially more flexible and efficient in allocating bandwidth to multiple users with different transmission needs, especially at high transmission speeds. In addition, the packet switching approach is fostered by the worldwide emergence of the ATM standard.

In this context, all-optical solutions are attracting considerable interest. In these solutions, some or all of the switching and routing operations are done without conversion of packets between the optical and electronic domains within the network. In this way, the potentially huge bandwidth available in optical fibers can be better exploited, without incurring electronic bottlenecks.

Even in all-optical packet networks, the fundamental problem of resource contention still arises. When multiple packets, arriving on different fibers or wavelengths, need to simultaneously access one or more of the optical network resources (for instance receivers, switches or wavelengths), the resulting contention must be resolved. In an optical network a solution maintaining the packets in the optical domain is desirable.

One possible way to tackle the contention problem in packet-switched systems with WDM-TDM access techniques is the use of Switched optical Delay Lines (SDL). An SDL unit is an optical fabric based on a combination of optical switching matrices and fiber delay lines. It allows packets contending for the same resource to be redistributed over time and space while remaining in optical form. In this way, resource access by the contending packets can be rescheduled and contention reduced.

SDLs for optical packet-switched networks were originally proposed in [4] as a means to deal with receiver contentions in a WDM star network. The technique, termed Quadro, was later proposed for rings and multi-hop optical Manhattan Street Networks (MSNs) [4 - 6]. Since then, several research groups have started to explore the potential of this principle, in various contexts and for different contention problems, in different optical packet-switched network architectures. The use of delay lines to reduce packet deflection penalties in MSNs and ShuffleNets was analyzed in [6] and, with soliton transmission, in [5]. In [6-7], all-optical NxN switches have been proposed for WANs to optically resolve output link contentions. An approach for simultaneously dealing with resource contentions as well as time-misaligned frames was proposed in [8].

---

1 In this report we will term "all-optical" any approach in which the optical packet is not converted to the electronic domain prior to reaching the final destination node.

In Europe, the RACE-2039 project (ATMOS) has demonstrated the feasibility of an ATM photonic switching fabric based on fiber delay lines. In Japan, a recent experiment has realized FDM output buffering in an ATM switch architecture using an SDL routing-type time-division interconnection network called FRONTIERNET [24]. In Australia, recirculating fiber delay lines have been used in self-routing photonic packet switches to resolve output contention for asynchronous photonic switching applications.

The actual practicality of the SDL approach, however, still needs to be proved. For instance, the traditional LiNbO$_3$ interferometric switching devices, on which virtually all prototypes have been based so far, are very difficult to control and are polarization sensitive. A key step toward actual applications is the fabrication of a polarization-insensitive device that is easy to control and adjust and that features integrated switches and possibly integrated amplifiers.

Another challenge with optical packet-switched networks is that of signaling. A routing or receiving node must be able to retrieve header information from simultaneously incoming WDM-multiplexed packets on the fly. To accomplish this task, a number of signaling techniques have been investigated, including subcarrier multiplexing [8 - 11], combined modulation [12 - 14], and dedicated control wavelengths [1, 15]. We proposed to use subcarrier multiplexing (SCM) of packet headers for the signaling function in CORD and named it multi-channel subcarrier multiplexed (MSCM) signaling. This technique was successfully tested in conjunction with fully operational transmission and synchronization subsystems. Several implementation issues, such as crosstalk between baseband payload and headers, are investigated.

Finally, synchronization, both at the packet level and at the bit level, is a critical aspect of all-optical packet networks. Packet slot synchronization is required so that packets arrive at the network routing and receiving nodes with their boundaries aligned in time. Bit recovery must be performed on a packet-by-packet basis, since packets in adjacent timeslots may be coming from different transmitters. Conventional clock recovery techniques, such as phase-locked loops, fail in a packet-switched environment. Ultra-fast digital techniques, potentially capable of recovering the clock in a few bits, have recently been proposed [16 - 18]. Alternatively, a clock tone can be sent along with the packet payload data [19, 20].

This report describes our effort to build a prototype of a WDM all-optical packet-switched star network where receiver contentions are substantially alleviated by means of an SDL receiver front-end, called Contention Resolution Optics (CRO). Subcarrier multiplexing of packet headers is used to encode and decode header information. A novel global packet-slot synchronization technique is used. A fully digital clock recovery technique is used to decode header bits, whereas for packet data bits the clock is directly extracted from a multiplexed tone.

This report is divided into ten sections. Section 1 is the executive summary. Section 2 gives the introduction, background, and previous work on all-optical packet-switched networks, and describes the CORD network testbed that has been built at Stanford University. Section 3 gives a detailed description and experimental results of the testbed and, in particular, of the CRO. Section 4 explains the multi-channel subcarrier multiplexed (MSCM) signaling technique we developed for CORD and reports its performance. Section 5 details the synchronization techniques we developed and their performance. Section 6 describes the logic of the packet generators and of the error detectors we developed for the CORD project. Section 7 summarizes and concludes the work done on CORD and its impact on all-optical packet-switched networking. Section 8 lists the references made in this report. Section 9 lists Stanford's CORD project publications. Finally, Section 10 lists the contributors to the CORD project from Stanford University.

2.2 CORD Testbed Topology

CORD can support an arbitrary network topology, including star, ring, and mesh. For experimental implementation, a WDM passive star topology has been selected because of its widespread use for LAN and MAN applications. Our testbed contains two nodes, each transmitting on a unique wavelength: \( \lambda_A = 1310 \) nm and \( \lambda_B = 1320 \) nm (Figure 2.1). Each node transmits ATM packets (53 bytes) at 2.5 Gb/s, addressed both to the other node and to itself, thus originating possible receiver contentions. At each receiver, a WDM demultiplexer separates the two wavelengths. Contentions may arise when two packets are arriving at a receiver at the same time on different wavelengths. These contentions can be dealt with using the CRO (CRO stands for Contention Resolution Optics - see Section 3).

![CORD Testbed Diagram](image)

*Figure 2.1 CORD testbed.*

When contentions occur, the CRO places one packet into an optical delay line, equal in length to the transmission time of the packet. Meanwhile, the other packet is received (Figure 2.2). When the queued packet exits the delay line, it is routed to the receiver, which has by then finished receiving the other packet. In this case both packets are eventually received and no packets are lost. If, however, additional packets destined for the same node arrive before the reception of both previous packets has been completed, another contention will result.

![Diagram of packet arrival and delay lines](image)

*Figure 2.2 Contention Resolution Optics (CRO).*

If the total arriving throughput exceeds the capacity of the CRO-equipped receiver, the queue will fill up with surplus packets, and some packets will have to be dropped. The capacity of the queue is determined by the number of delay line stages and the length of each fiber delay stage. As the queue capacity is increased, the CRO's ability to resolve contentions increases. The capacity of the queue is limited by the optical power budget. In the CORD testbed, a two stage delay line CRO is used. One node (Node 1) is equipped with the CRO; the other (Node 2, for comparison) is not.
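The overflow behavior described above can be illustrated with a small slotted-time simulation. This is only a sketch under simplifying assumptions that are not taken from the report: each of the two wavelengths independently carries a packet addressed to this receiver with probability `p` per slot, the receiver accepts one packet per slot, and the CRO is idealized as a FIFO buffer holding at most `depth` delayed packets. The function and parameter names are hypothetical, and the model does not reproduce the actual CRO control strategy.

```python
import random

def simulate_receiver(p, depth, slots=100_000, seed=1):
    """Return the fraction of arriving packets that are dropped.

    p     -- assumed per-slot probability that each wavelength carries
             a packet addressed to this receiver
    depth -- assumed number of packets the delay lines can hold
             (0 models a receiver without a CRO)
    """
    rng = random.Random(seed)
    queued = 0                 # packets currently stored in delay lines
    arrived = dropped = 0
    for _ in range(slots):
        new = sum(rng.random() < p for _ in range(2))  # two wavelengths
        arrived += new
        backlog = queued + new
        if backlog > 0:
            backlog -= 1       # the receiver takes one packet this slot
        # Whatever remains must fit in the delay lines; the rest is lost.
        lost = max(0, backlog - depth)
        dropped += lost
        queued = backlog - lost
    return dropped / arrived if arrived else 0.0

for depth in (0, 1, 2):
    print(depth, round(simulate_receiver(0.3, depth), 4))
```

In this toy model the loss fraction falls as `depth` grows, which is the qualitative trend the text describes; the numerical values depend entirely on the assumed traffic model.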

The CORD node receivers are equipped with header detectors. Header detectors employ 10/90 splitters that tap off 10% of the light arriving at the node and feed it to a multichannel subcarrier multiplexed (MSCM) header receiver. No wavelength-sensitive or tunable devices are needed. If two packets arrive simultaneously on the two wavelengths, the ATM payload data overlaps at baseband (in the header detector photodiode current) and cannot be decoded. However, the headers do not interfere with each other because they arrive on two distinct subcarriers, and can be decoded in parallel. This technique is scalable by assigning a unique subcarrier frequency to each node.

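The idea that two headers on distinct subcarriers can be separated even though the payloads collide at baseband can be sketched numerically. The example below is a toy model, not the CORD receiver: the subcarrier frequencies (expressed in cycles per header bit), the on-off keying of the tones, the correlation detector, and the slow payload waveform are all illustrative assumptions chosen so that the signals are exactly orthogonal in the simulation.

```python
import math
import random

SAMPLES_PER_BIT = 64                  # simulation resolution (assumed)
CYCLES_PER_BIT = {"A": 16, "B": 24}   # hypothetical subcarriers, cycles per header bit

def tone(node, n):
    """Sample n of the subcarrier assigned to a node."""
    return math.cos(2 * math.pi * CYCLES_PER_BIT[node] * n / SAMPLES_PER_BIT)

def photocurrent(headers, rng):
    """Two colliding baseband payloads plus one keyed tone per node."""
    samples = []
    for b in range(len(headers["A"])):
        for n in range(SAMPLES_PER_BIT):
            if n % 8 == 0:
                # New payload chip: both payloads add up at baseband.
                payload = rng.randint(0, 1) + rng.randint(0, 1)
            s = payload
            for node in headers:
                s += headers[node][b] * tone(node, n)
            samples.append(s)
    return samples

def decode(samples, node):
    """Recover one node's header bits by correlating with its subcarrier."""
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        corr = sum(samples[start + n] * tone(node, n)
                   for n in range(SAMPLES_PER_BIT))
        # A keyed-on tone correlates to 0.5; everything else to about 0.
        bits.append(1 if corr / SAMPLES_PER_BIT > 0.25 else 0)
    return bits

rng = random.Random(0)
sent = {node: [rng.randint(0, 1) for _ in range(20)] for node in ("A", "B")}
received = photocurrent(sent, rng)
print(decode(received, "A") == sent["A"], decode(received, "B") == sent["B"])
```

Both 20-bit headers are recovered from the same photocurrent even though the summed payload takes three levels and could not be decoded on its own. A real receiver would use filtering rather than the exact orthogonality relied on here.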
CORD headers have been chosen to be 20 bits long, and are transmitted over a full packet time-slot. The duration of the slot is set at 250 ns, to comfortably fit an ATM cell (whose duration is 170 ns at 2.5 Gb/s) and allow enough time for guard bands and clock recovery. The header bits tell the receiving node whether an incoming packet is addressed to that node. The subcarrier frequency on which the header is encoded tells the receiving node which transmitting node sent the packet; it also identifies the wavelength on which the packet is traveling. Using this information, the receiving node sets up its receiver and the CRO if so equipped. An optical delay line is inserted after the header detector to allow time for processing of the headers and CRO set-up.
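The timing figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (53-byte cells, 2.5 Gb/s payload rate, 250 ns slot, 20-bit header); the variable names are illustrative.

```python
CELL_BYTES = 53       # ATM cell size
SLOT_NS = 250.0       # packet time-slot duration
HEADER_BITS = 20      # header length, sent over one full slot

def cell_duration_ns(bit_rate_gbps):
    """Transmission time of one ATM cell; bits / (Gb/s) gives ns."""
    return CELL_BYTES * 8 / bit_rate_gbps

cell_ns = cell_duration_ns(2.5)
margin_ns = SLOT_NS - cell_ns              # left for guard bands and clock recovery
header_rate_mbps = HEADER_BITS / SLOT_NS * 1e3

print(f"cell {cell_ns:.1f} ns, margin {margin_ns:.1f} ns, "
      f"header rate {header_rate_mbps:.0f} Mb/s")
```

The cell occupies about 170 ns of the slot, leaving roughly 80 ns of margin, and a 20-bit header spread over a 250 ns slot corresponds to the 80 Mb/s header rate mentioned in the executive summary.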

Packets have to arrive at the nodes aligned in time. The nodes have to learn about the presence of incoming packets and read their addresses in order to set up the CRO or receiver switch. Finally, bit clock recovery for both the address information and the ATM payload data must be performed on a packet-by-packet basis. The techniques we developed to achieve these goals and the testbed implementation are described in detail in the following sections.
3. CRO - Contention Resolution Optics

Contention resolution using delay lines is the key concept used by this Consortium to deal optically with fundamental resource contentions in optical (WDM or TDM) packet-switched networks without resorting to electronic conversion of the transmitted data, thus avoiding electronic bottlenecks. To illustrate the problem and the principle of the solution, consider packets arriving at a node on multiple channels (space-separated fibers and/or WDM channels). The packets are statistically distributed in time, and are either destined to the node or need to be switched to outgoing channels. Multiple simultaneously arriving packets give rise to contentions when two or more packets compete for the same resource. This resource can be a receiver (when packets are destined to the node) or an outgoing channel (when packets destined to another network node need to be switched through); packets may also require the same switching element or temporary storage regardless of their final destination.

It is important to observe that in some topologies only certain types of contentions will exist. For instance, in a WDM star topology in which the number of receivers is smaller than the number of wavelengths, only receiver contentions need to be resolved at the destination node. In a multihop topology, on the other hand, in addition to possible receiver contentions between packets destined to the node, packets passing through the node will contend for common outgoing links.

The principle of optical contention resolution considered by the Consortium is based on the use of Switched Delay Lines, which shift contending packets in space and in time to resolve the packet overlap, i.e., the contention, at all relevant optical resources. Thus the same solution, using a single hardware mechanism, can be applied not only to different types of contentions, but also to a combination of different contention types occurring simultaneously at a given node.

The presented approach utilizes optical 2x2 switches, reconfigurable on a packet-by-packet basis, and delay lines for optically storing the contending packets during the switching process. The net result is a switched delay line strategy that interleaves the arriving packets onto one or more contention-resolved streams with greatly reduced packet loss. The interleaving operation is controlled by local node intelligence, driven by in-band or out-of-band signaling.

3.1 Preprototype CRO

The Contention Resolution Optics (CRO) preprototype was developed at GTE Laboratories and delivered to Stanford University on March 6, 1995. We have been characterizing the CRO and its individual components as well as trying to integrate it into the CORD network testbed. The CRO is the optical front-end in which packet contentions are resolved optically by routing contending packets to one or two delay-lines in order to stagger their arrival at the receiver. GTE Laboratories' primary role in the CORD consortium is to develop polarization insensitive optical switches with integrated semiconductor optical amplifiers to be used in a compact, robust CRO. To aid in the development of the CORD network testbed and to provide valuable performance data to be used in the development of the integrated CRO, GTE Laboratories has developed the CRO preprototype, a discrete version of the CRO.

The preprototype CRO contains LiNbO$_3$ switches as the switching elements. These devices have switching times as short as 1 ns but have the disadvantage of being polarization sensitive. Because of the switch dependence on polarization, the CRO preprototype contains polarization controllers, polarization monitors, and polarization maintaining fiber. Also included in the CRO are an optical power splitter and a photodetector used to recover the packet headers at all wavelengths. A single mode fiber delay line is used to delay the packet arrival while the control information is received and the CRO switches are configured. In summary, the components of the single-stage CRO as received from GTE included:

1. 10/90 power splitter with integrated header detector;
2. 100 m single mode fiber (SMF) delay line;
3. Wavelength Demultiplexer;
4. Two manual polarization controllers (MPC), one for each wavelength;
5. Two polarization monitors (PM), one for each wavelength;
6. Three LiNbO3 optical switches;
7. Three semiconductor optical amplifiers (SOA);
8. Two 50 m polarization maintaining fiber (PMF) delay lines.

A detailed diagram of the CRO preprototype, herein referred to simply as the CRO, is shown in Figure 3.1. A photograph of the CRO is shown in Figure 3.2. The remainder of this section contains the results of our characterization measurements of each component in the CRO preprototype, especially in regard to polarization instability or misalignment, and a discussion of the actions taken to correct these problems.

*Figure 3.1 Block diagram of contention resolution optics preprototype.*

3.1.1 Integrated Header Detector

The Integrated Header Detector (IHD) is the first device in the CRO. Ideally, it is made up of a 10/90 power splitter and a conventional photodetector. We characterized the IHD built at GTE and compared it with a Discrete Header Detector (DHD), that we assembled using discrete parts.

3.1.2 Single Mode Fiber Delay Line

A 100 meter Single Mode Fiber (SMF) delay line is used to delay the payload data signals by 500 ns while the packet header is received, the CRO configuration is calculated, and the switches are configured. The SMF delay line is positioned before the polarization controllers, so the absolute polarization of the signal in the SMF delay is not important. However, the stability of the polarization of the signal is very important, since any polarization drift would be transferred directly to the input of the first CRO switch, deteriorating its performance. While characterizing the optical switches, we noticed a problem with the stability of the incoming signal polarization and traced one source to the SMF delay line. We measured and characterized this polarization instability.
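The relation between fiber length and delay quoted above can be checked with a short calculation. The sketch below assumes a group index of 1.468, a typical textbook value for silica fiber that is not given in the report; the function names are illustrative.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum
GROUP_INDEX = 1.468         # assumed typical value for silica fiber (not from the report)

def fiber_delay_ns(length_m, group_index=GROUP_INDEX):
    """Propagation delay through a fiber of the given length."""
    return length_m * group_index / C_M_PER_S * 1e9

def fiber_length_m(delay_ns, group_index=GROUP_INDEX):
    """Fiber length needed to obtain a given delay."""
    return delay_ns * 1e-9 * C_M_PER_S / group_index

print(f"100 m of fiber: {fiber_delay_ns(100):.0f} ns")
print(f"250 ns of delay: {fiber_length_m(250):.1f} m")
```

Under this assumption, 100 m of fiber gives roughly 490 ns, consistent with the nominal 500 ns figure, and a 250 ns packet slot corresponds to about 51 m, comparable to the 50 m delay lines listed among the CRO components. The exact values depend on the group index of the fiber actually used.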

The measurement was taken as follows. We used a DFB laser, a polarization controller, a section of single polarization (SP) fiber, the SMF delay line, and an optical power meter. The experimental setup is shown in Figure 3.3. To measure the polarization drift with time, the polarization controller was first adjusted for maximum power at the power meter, corresponding to a linear polarization aligned to the major axis of the SP fiber. Then the controller was adjusted for minimum received power, and the residual power level was measured over time with the polarization controller in a fixed position. The ratio of the residual power to the total power is plotted as the extinction ratio in Figure 3.4.

![Experimental setup to measure polarization drift in SMF delay.](image)

*Figure 3.3 Experimental setup to measure polarization drift in SMF delay.*

There are a total of 9 sets of data plotted in Figure 3.4. The 6 curves grouped in the lower left side of the graph correspond to data taken with the 100 m SMF delay. The 3 remaining curves correspond to the identical experiment without the 100 m SMF delay. Without the delay line, the polarization extinction ratio stays above 30 dB for more than 5 hours (a 30 dB extinction ratio corresponds to a polarization angle of less than 2 degrees). With the 100 m SMF delay, the polarization changed very rapidly and the extinction ratio degraded by more than 20 dB within 10 minutes. This level of polarization drift is not acceptable because the performance of the optical switches would degrade by approximately the same amount.

---

2 SP fiber is a special kind of optical fiber that guides only one of the two orthogonal linear polarizations. SP fibers therefore bleed off one polarization completely and effectively behave as fiber polarizers. From our experience with the CRO, we became convinced that poor switch extinction ratios are often due to poor polarization alignment. It would probably be advisable to pigtail polarization-sensitive devices with SP fiber rather than PM fiber, because this greatly reduces the negative impact of polarization misalignments in the setup.

We used an HP8509B polarization analyzer to measure directly the polarization instability of the delay line that we had previously measured indirectly. Figure 3.5 shows the Poincaré sphere trajectory spanned by the output state of polarization of the single-mode fiber delay line over 30 minutes.

![The Extinction Ratio of Polarizing Fiber Changing with Time](image)

*Figure 3.4 Extinction ratio of polarizer after SMF delay vs. time*

*Figure 3.5 Plot of the trajectory spanned by the state of polarization of the light at the output of the single-mode delay-line over 30 minutes.*

From this direct characterization it is quite evident that the drift due to the single-mode delay line is indeed responsible for the polarization instability that was found in the previous extinction-ratio experiment.

We used Single-Polarization (SP) fiber, which has an extinction ratio better than 35 dB. As a result, stability is fully ensured and at the same time launch alignment requirements are greatly eased. A 15-degree launch misalignment only produces a power penalty of 0.3 dB in the output light from the SP fiber, whereas the polarization state of the output stays aligned with the SP fiber axis. Therefore, no degradation of the switch extinction ratio is incurred. For comparison, if the SP fiber were not used, a 15-degree polarization misalignment would cause the switch extinction ratio to degrade to less than 12 dB.

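The angle and decibel figures quoted in this section are consistent with a simple ideal-polarizer model, in which a linear polarization rotated by an angle θ from the desired axis couples a fraction cos²θ of its power into that axis and sin²θ into the orthogonal one. The sketch below applies that model; it is an assumption made for illustration and ignores the finite extinction of real devices. The function names are hypothetical.

```python
import math

def extinction_db(misalignment_deg):
    """Ratio of total power to power in the unwanted axis, in dB (ideal model)."""
    leak = math.sin(math.radians(misalignment_deg)) ** 2
    return -10 * math.log10(leak)

def penalty_db(misalignment_deg):
    """Power lost through an ideal polarizer for a rotated linear input."""
    passed = math.cos(math.radians(misalignment_deg)) ** 2
    return -10 * math.log10(passed)

def angle_for_extinction_deg(target_db):
    """Misalignment angle that yields a given extinction ratio."""
    return math.degrees(math.asin(10 ** (-target_db / 20)))

print(f"30 dB extinction -> {angle_for_extinction_deg(30):.2f} degrees")
print(f"15 degrees -> penalty {penalty_db(15):.2f} dB, "
      f"extinction {extinction_db(15):.1f} dB")
```

With this model a 30 dB extinction ratio corresponds to about 1.8 degrees, and a 15-degree misalignment gives a penalty of about 0.3 dB through the polarizer or an extinction ratio of about 11.7 dB without it, matching the values stated in the text.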
In practice, the use of SP fiber immediately after the single-mode delay line converts polarization fluctuations into power fluctuations, in a very benign way. To check whether these residual power fluctuations might be of concern, we conducted the following test. Using the same setup as in Figure 3.3, we aligned the polarization for maximum power transfer across the SP fiber and measured the power fluctuations over 2 hours. The result is shown in Figure 3.6.

As can be seen, power fluctuations are small and very smooth in time. Such fluctuations would not impair reception at all. With reference to the schematic of Figure 3.1, applying this concept to the CRO requires two SP fibers, placed at the outputs of the two polarization controllers. The reason is that the two wavelengths separated in the WDM demux will not in general have the same state of polarization, so they first have to be individually polarization-aligned.

![Graph showing power fluctuations over time](image)

*Figure 3.6 Plot of the power fluctuations over time at the output of the SP fiber when the system is initially aligned for maximum power output.*

3.1.3 Wavelength Demultiplexer, Manual Polarization Controllers, and Polarization Monitors

We measured the extinction ratio of the WDM demultiplexer to be greater than 20 dB and the fiber-to-fiber loss to be 0.7 dB at 1320 nm and 1.1 dB at 1310 nm. The WDM demux is polarization insensitive.

Manual polarization controllers (MPCs) are used in the CRO at each output of the wavelength demultiplexer in order to align the optical signal for the LiNbO$_3$ switches. In the original CRO, the MPCs were fusion spliced between the wavelength demultiplexer and the polarization monitors, so we were not able to measure the loss through the component directly. With the two polarization monitors now removed, we can measure the cascade of the WDM demux and the polarization controllers directly. The attenuation turns out to be 1 dB on the 1320 nm branch and 2 dB on the 1310 nm branch. The attenuation of the polarization controllers alone is therefore likely to be around 1 dB.

The semiconductor optical amplifiers (SOAs) were packaged with single mode fiber pigtails and therefore do not reliably maintain the polarization of the signal through the devices. Hence, we have inserted two additional MPCs in the CRO immediately following the SOAs to adjust the polarization before the second LiNbO$_3$ switch.

The polarization monitors were located after each of the two MPCs that are directly connected to the wavelength demultiplexer, as indicated in Figure 3.1. The purpose of the polarization monitors was to aid in aligning the optical signal polarization to the polarization maintaining fiber core of the first switch pigtails. However, since we employed SP fiber to automatically guarantee that the polarization state launched into the switch pigtails is correct, the polarization monitors were no longer useful and were replaced by the SP fibers.
3.1.4 Polarization Maintaining Delay-Line

In the CRO there are two 50m PM delay lines. We will call them "first" and "second" PM delay lines, in the order in which they appeared in the original CRO setup of Figure 3.1.

PM fiber preserves the input state of polarization only as long as the input light is perfectly aligned with one of the fiber principal axes. In case this requirement is not satisfied, then PM fiber is even less polarization-preserving than regular single-mode fiber, due to its intrinsic high birefringence.

We set up an experiment identical to the one shown in Figure 3.3, where instead of the 100m single-mode delay line we inserted the second 50m PM delay line. We then adjusted the MPC at the input so that maximum extinction ratio was achieved. This would theoretically ensure that the launched light is aligned with one of the two principal axes of the PM fiber, namely the one that is perfectly orthogonal to the analyzing SP fiber transmission axis, that we use as a reference. Then we recorded the evolution of the extinction ratio with time. The results are shown in Figure 3.7.

![Graph of output power over time](image)

Figure 3.7 Plot of the power fluctuations over time at the output of the PM fiber delay-line in the experiment, when the system is initially aligned for maximum extinction.

The initial extinction value of over 30 dB is maintained for less than a minute. In less than 10 minutes the extinction ratio goes down to only 10 dB, and then it fluctuates between these extreme values. The quasi-periodic fluctuation seen in the figure is typical of PM fibers when the light is launched at an angle with respect to perfect alignment. The PM fiber misalignment that can be estimated from the plot in Figure 3.7 amounts to over 16.5 degrees. As a result, even assuming that the reference was off by 4.5 degrees, a minimum of 12 degrees of misalignment is still attributable to the PM fiber delay line alone.

When such a fiber is connected to a switch, its performance would lead to a best-case extinction ratio of -13.7 dB, which would most likely degrade even further because of switch imperfections. This result is unacceptable for CRO operation, so we decided to have this PM delay line re-connectorized.
After the re-connectorization, the worst result is shown in Figure 3.8. It is a blown-up picture of the trajectory spanned over the Poincaré sphere when stress is applied to the PM fiber. The diameter of the circle encompassing the whole trajectory is directly related to the angular alignment error. The plot shows a trajectory that is consistent with an angular error of 7 degrees, which would lead to an extinction ratio of at least 24.20 dB. The other installed connector performed even better, yielding an expected extinction ratio of 27.7 dB.

![Image](image)

**Figure 3.8** Trajectory over the Poincaré sphere of the state of polarization at the output of the PM delay line connectorized by Wave Optics, when stress is applied. Angular error and extinction ratio are shown on the left side of the window.

### 3.1.5 LiNbO₃ Optical Switches

As mentioned earlier, the LiNbO₃ switches are polarization sensitive. Therefore, the switching characteristics depend on the polarization of the optical signal. If the polarization of the signal is not correctly aligned, then crosstalk through the switch will be large. The output power of the switch as a function of the control voltage is shown in Figure 3.9 for TE polarization for the 1310 nm switch input. When the switching voltage is zero, the switch is in the 'bar' state, and the 'cross' state occurs at a voltage of 3.3 V. It is seen in Figure 3.9 that the switching voltage must be accurate to within 0.3 V to achieve a 20 dB extinction ratio, which is typically required to avoid interference from the other channel.

![Graph showing output power vs switching voltage for 1310 nm](image)

*Figure 3.9 Switch output power vs. control voltage.*

The output power as a function of control voltage for the 1320 nm output of the switch is shown in Figure 3.10. It can be seen that for this output the 'cross' state occurs at a slightly higher voltage, about 3.9 V. This is unfortunate because the switching voltage cannot be controlled individually for each output. Therefore, the mid voltage, 3.6 V, must be used for best operation of the switch, but this corresponds to an extinction ratio of approximately 15 dB, not 20 dB. The effect of the ripple in power on the 1320 nm path can also be seen in Figure 3.10. The ripple was verified to be caused by coherent crosstalk between the WDM demux and the first switch. Nevertheless, an extinction ratio near 20 dB is still achievable with this switch.

Figure 3.10 Output power of switch vs. control voltage.

A very accurate characterization of the first switch of the CRO showed that for a unified switching voltage of 3.7 volts for both the 1310nm and 1320nm input ports, no more than 11 dB of maximum extinction ratio could in fact be obtained, and that this performance degraded with time because of the polarization instability of the single-mode delay line.

In order to fix these problems, we decided to insert short spans of SP fiber between the MPC and the first switch. In so doing, we removed the polarization monitors to avoid the heavy loss they brought about. The insertion of the SP fiber was meant to resolve the polarization instability problem. We also re-connectorized the switches. All connectors were requested as angle-polished and polarization-aligned. Polarization alignment with the connector key ensures that the light launched from the SP fiber is correctly aligned with the PM switch pigtails. Angle polishing keeps reflections low.

With all the new connectors in place, we were then able to run a full test of the whole first section of the re-assembled CRO, as depicted in Figure 3.11.

![Block diagram of the CRO section that was tested.](image)

The results at the two output ports of the switch are shown in Figure 3.12, for a unified switching voltage of 3.7 volts. It can be clearly seen that the performance of this section of the CRO is now good: at least 17 dB extinction ratio is obtained at both output ports (up from 11 dB), and this result holds stable throughout the experiment, which was run for over 1.5 hours. The small power fluctuation is due to the polarization instability of the single-mode delay line at the input of the system. Thanks to the use of SP fiber, this polarization change is converted into harmless small and slow power fluctuations, as was shown in the stand-alone delay-line test. The extinction ratio is unaffected.
Figure 3.12 Measured switch extinction ratio at both output ports, versus time.
3.1.6 Semiconductor Optical Amplifiers (SOA)

The semiconductor optical amplifiers (SOAs) are used in the CRO after the first switch to compensate for the signal loss through the CRO. We measured the fiber-to-fiber gain of the SOAs at both 1310 nm and 1320 nm. The gain is between 10 and 11 dB at 1310 nm and approximately 9 dB at 1320 nm. The optical spectra at the output of the SOA are shown in Figure 3.13 for 1310 nm and in Figure 3.14 for 1320 nm. To verify the gain measurements made with the optical spectrum analyzer, we measured the electrical amplitude of a received modulated signal before and after the SOA. The gain measured with this method agreed with the spectrum analyzer method.

![Spectrum Chart]

Figure 3.13 Optical spectrum of output of SOA for 1310 nm.
Figure 3.14 Optical spectrum of output of SOA for 1320 nm.

The SOAs used in the CRO have similar performance for both polarizations and are therefore packaged with single mode fiber pigtails. Unfortunately, this causes some drift in the polarization of the signal through the device. Therefore, polarization controllers are necessary after the SOAs.

We used the HP8509B polarization analyzer to verify that the amplifiers, which are polarization-independent and fitted with conventional single-mode fiber, do not cause polarization-state instability in the light that they amplify.
Our measurements, made by replacing the single-mode delay line with a semiconductor amplifier, show that very good stability was achieved (see Figure 3.15). The polarization extinction ratio stays better than 30 dB over a 30-minute measurement.

![Image](image)

*Figure 3.15 Measured polarization stability through a semiconductor amplifier.*

3.2 Preprototype CRO Performance

Based on the results described above, regarding some of the components of the CRO after re-connectorization and the introduction of SP fibers in the system, we are able to better estimate the impact of the residual crosstalk that we expect to see across the CRO. A layout of the new version of the CRO preprototype is shown in Figure 3.16. It makes use of GEC-Marconi LiNbO3 switches and HP semiconductor optical amplifiers (SOA). The WDM demux was custom made by JDS-FITEL. Because the LiNbO3 switches are polarization-dependent, most of the fiber used in the CRO is polarization maintaining, including the delay-lines. Optical switches have 5-6 dB insertion loss, which is compensated by the SOAs (10-12 dB fiber-to-fiber gain at 1310 nm). The CRO SOAs have some polarization dependence (3 dB) and are pigtailed with conventional single-mode fiber.

![Schematic Diagram](image here)

*Figure 3.16: Schematic of the discrete CRO preprototype. S1, S2, and S3 are LiNbO3 switches. SOA: semiconductor optical amplifier. MPC: manual polarization controller. WDM: WDM demultiplexer. SMF: single mode fiber. PMF: polarization maintaining fiber. SPF: single polarization fiber.*

We analyzed the effect of crosstalk interference theoretically. The residual crosstalk from the switches can give rise to two kinds of phenomena: *incoherent crosstalk* and *coherent crosstalk*.

The incoherent kind of crosstalk takes place when the useful signal is mixed with a leakage signal having a different wavelength, or having the same wavelength but orthogonal polarization. In this case the leakage contribution is photodetected at the receiver independently of the useful signal. Crosstalk at the electrical output of the receiver occurs as the sum of the photocurrents generated by the useful signal alone plus the photocurrent generated by the leakage signal alone.

Since the useful signal gets crosstalk contributions from two consecutive switches, these add up. In the worst case, if the two crosstalk contributions at the first and second switch are coherent between themselves, the total amount of crosstalk power is four times the individual crosstalk contribution. Note that we are still talking about incoherent crosstalk. We are assuming coherence only between the two interfering signals, but not between the interfering signals and the useful signal.

For the assumed 16 dB extinction ratio at both switches, the maximum amount of disturbance at the electrical level is going to give a 10% eye closure. In other words, if the ideal (noiseless) eye aperture of the received photocurrent is assumed to be 1, the eye aperture in the presence of the maximum amount of incoherent crosstalk would be 0.9. This in turn corresponds to a power penalty of approximately 0.5 dB (optical).
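The arithmetic behind the incoherent-crosstalk figures can be sketched as follows. This is a minimal illustration, not the report's own calculation. It takes the worst case stated above (two equal leakage contributions adding coherently with each other, hence four times the single-switch leakage power), sets the eye closure equal to the relative crosstalk power, and takes the optical penalty as the reduction in eye aperture. The function names are hypothetical.

```python
import math

def worst_case_eye_closure(extinction_db):
    """Relative eye closure from two equal leakage terms adding in phase.

    Assumption: each switch leaks 10^(-ER/10) of the interfering power, and
    two in-phase fields of equal amplitude give 4x the single-term power.
    """
    single_leak = 10.0 ** (-extinction_db / 10.0)
    return 4.0 * single_leak

def optical_penalty_db(eye_closure):
    """Optical power penalty (dB) for a given relative eye closure."""
    if eye_closure >= 1.0:
        raise ValueError("eye is fully closed; penalty is unbounded")
    return -10.0 * math.log10(1.0 - eye_closure)

if __name__ == "__main__":
    closure = worst_case_eye_closure(16.0)
    print(round(closure, 3))                      # about 0.10
    print(round(optical_penalty_db(closure), 2))  # about 0.46 dB
```

For a 16 dB extinction ratio this gives an eye closure of about 10% and a penalty a little under 0.5 dB, consistent with the values quoted in the text.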

For the crosstalk to be coherent, the leakage signal must have the same wavelength, polarization and phase as the useful signal it interferes with. Whether the phases coincide is a stochastic matter, and we will just assume the worst case, i.e. perfect phase alignment. For the other two conditions to be met, the only possible signal path is: in through the same input of the first switch; out through the two outputs of the first switch (one contains the useful signal and the other the crosstalk signal); in through the two inputs of the second switch; out through the same output of the second switch. Along this path the interfering signal experiences the 16 dB extinction ratio twice, so that when it rejoins the useful signal its power level is at least 32 dB lower. However, since it interacts coherently with the useful signal, it is "amplified" through homodyning in the photodetector. The final result is that coherent crosstalk too is going to cause about 0.5 dB penalty.

The two effects together add up, and the total optical power penalty is then 1 dB, in a two-switch CRO. However, we made several worst-case assumptions as to the coherent recombination of the various crosstalk contributions, so that on average a lower penalty can be expected.

If a 13 dB switch extinction ratio is assumed, then the worst-case optical penalty amounts to approximately 2.3 dB. For a three-switch CRO, the penalty increases. With 16 dB extinction ratio at each switch, a 2.8 dB optical penalty is incurred. With 13 dB extinction ratio, however, the system breaks down. If an averaged decision threshold is used (as was assumed in all of the above calculations), the optical power penalty jumps to 13 dB and the system clearly cannot operate anymore. If instead the decision threshold is optimized, then the penalty is 7.5 dB. Again, these are worst-case penalties, and most of the time the special coherency conditions assumed here will not occur. However, it must be understood that the system would be in very critical conditions and countermeasures should be taken.

It is therefore clear that, even though a two-switch CRO can tolerate relatively high amounts of crosstalk, a three-switch CRO is much more demanding and the absolute minimum amount of extinction ratio for all switches at any time should not be below 15-16 dB.

From these calculations it is apparent that the performance initially delivered by the switches, as they arrived at Stanford (11 dB extinction ratio), was insufficient to support system operation. This indirect evidence is also complemented by the direct BER measurements presented in the previous progress report. The remedies devised to improve switch performance were absolutely needed for a multi-stage CRO to operate.

Figure 3.17 shows the CRO optical power penalty vs. number of CRO stages. The dashed line is the theoretical power penalty using the measured device’s parameters [25]; solid lines are the experimental measurements. The fiber-to-fiber gain of the SOA is 10-12 dB, about twice the loss of the optical switches (5-6 dB). Therefore, the SOA is inserted between every other optical switch, resulting in the zigzag theoretical curve. The measured Node 2 (no CRO) receiver sensitivity, when 2.5 Gb/s ATM cells are received, is -17.4 dBm (BER = 10⁻⁹); it is limited by the receiver thermal noise. We have shown that the overall performance and scalability can be improved by using APDs [26] and/or by converting to 1.55 μm technology and replacing SOAs with EDFAs [27]. However, APDs with sufficient bandwidth were not available at the inception of the CORD project and GTE's semiconductor optical switch process was limited to the 1.3 μm window.

![Figure 3.17: Optical power penalty vs. number of CRO stages.](image)

For the single-stage CRO (two optical switches and one fiber delay line), the optical power penalties are 4.1 dB and 6.3 dB for the 1310-nm and 1320-nm wavelengths, respectively. The additional 2.2 dB penalty for the 1320-nm wavelength results from the smaller optical gain of the SOA at the longer wavelength (the gain peak of the SOA is at 1295 nm). Because of the achieved 17-19 dB extinction ratio, no observable change in BER performance was seen when the interfering channel was suppressed to remove incoherent crosstalk and/or when the fiber in the CRO branch that would carry coherent crosstalk was disconnected. For the two-stage CRO (three optical switches and two fiber delay lines), the optical power penalty is 7.1 dB for the 1310-nm wavelength. Because the current CRO (including both optical switches and SOAs) is optimized for the 1310-nm wavelength, the power penalty for the 1320-nm wavelength is more than 10 dB with the two-stage CRO. We are currently working on different CRO configurations which address this problem, as explained in Section 3.3.

3.3 CRO Scalability and Alternative Configuration

The current CRO configuration reduces the hardware complexity by sharing components (fiber delay lines, SOAs, and optical switches) between several wavelengths. However, it has severe limitations in scaling beyond 10 stages. More than 10 CRO stages are essential for large-scale switching applications such as optical ATM switches. The main difficulties of scaling are: (1) the wavelength-dependent gain of the SOAs (2.2 dB difference per stage); (2) different switching voltages in the 2x2 optical switches; (3) coherent and incoherent crosstalk caused by the finite extinction ratio of the optical switches; and (4) ASE noise accumulation that can saturate subsequent SOAs. Most of these difficulties can be greatly reduced by configuring the CRO to keep wavelengths spatially separate, and optimizing each individual path. However, this comes at the expense of more hardware.

One CRO configuration in which wavelengths are kept separate is shown in Figure 3.18. We will refer to this configuration as Configuration 2 and to the original configuration, shown in Figure 3.16, as Configuration 1. The different wavelength channels in Configuration 2 are separated into different paths after the WDM demux. Each CRO stage (per wavelength) consists of a 1x2 optical switch, a fiber delay-line buffer, and a 3-dB coupler, instead of the 2x2 optical switch of Configuration 1. Configuration 2 is particularly favorable to semiconductor optical amplifier switches since good 1x2 switches are much easier to fabricate than 2x2 switches. The gain difference of the SOAs for different wavelengths can be compensated by adjusting the bias currents in each individual wavelength path.

---

3 The switching voltage difference for two input ports makes the switch extinction ratio drop from better than 20 dB (each individual port) to less than 20 dB.
Figure 3.18: Alternative CRO configuration.

Configuration 2 does not have the different switching voltage problem because only 1x2 switches are used. 25-dB extinction ratio can be achieved when the LiNbO$_3$ switch is optimized for one input port; this greatly reduces the crosstalk interference. In addition to a better extinction ratio of the 1x2 optical switch, the crosstalk between different wavelengths is greatly alleviated in Configuration 2 due to the separate wavelength paths. For crosstalk-stringent applications, the extinction ratio per stage can be increased (nearly doubled) by replacing the 3-dB coupler with a 2x1 optical switch.

One limitation on the number of CRO stages in Configuration 1 is imposed by the SOA saturation resulting from the accumulated ASE noise. Because SOAs in Configuration 1 are shared by different wavelengths, no (narrowband) optical filter can be inserted to remove the accumulated ASE noise. In Configuration 2, a narrowband optical filter can be inserted to suppress the accumulated ASE noise threatening to saturate SOAs.

Figure 3.19 depicts the total optical power (including both signal and noise) at the output of the last SOA for both configurations. Because the strong ASE noise accumulates quickly in Configuration 1, SOAs saturate much faster than in Configuration 2 when a moderate bandwidth (2 nm) optical filter is inserted periodically. For a typical SOA with 0-5 dBm output saturation power, Configuration 1 is limited to less than 10 CRO stages (5 SOAs) while Configuration 2 allows more than 30 CRO stages (15 SOAs). Figure 3.20 shows the optical power penalty for both configurations. The power penalty of Configuration 2 is about 4 dB smaller than that of Configuration 1 because the ASE-ASE beat noise is suppressed by the optical filter.

**Figure 3.19:** Total output power of the SOA vs. number of CRO stages.

**Figure 3.20:** Power penalty vs. number of CRO stages for both Configurations 1 and 2.
4. Signaling

The packet destination address, or "header", must be read before the packet enters the CRO, so that the CRO can be properly configured. This operation must be performed simultaneously on all wavelengths without decoding the high-speed ATM payload data.

In CORD, this function is accomplished using subcarrier-multiplexing of the headers. Each node directly modulates the laser with a signal that encodes the high-speed ATM cell data at baseband, while the header is simultaneously encoded on a subcarrier. The subcarrier frequency is higher than the tails of the data spectrum, and each node uses a different, fixed subcarrier frequency (Figure 4.1).

![Baseband Signal at Node 1](image)

![Optical Signal Transmitted by Node 1](image)

![Baseband Signal at Node 2](image)

![Optical Signal Transmitted by Node 2](image)

Figure 4.1: Spectrum of MSCM signaling.
4.1 CORD Transmitter

The transmitting portion of CORD nodes is nearly identical for both nodes, except that the subcarrier frequencies are different (3 GHz for Node 1 and 3.5 GHz for Node 2). Figure 4.2 shows the transmitter block diagram including both payload data and subcarrier signaling channels. The payload data logic generates 2.5 Gb/s ATM packets (53 bytes in length, NRZ) and a 2.5 GHz clock signal synchronized with the data for the fast payload data synchronization (discussed in Section 5). A low-pass prefilter is used to remove the higher-order sidelobes of the payload data spectrum above 2.5 GHz; this is essential to reduce the crosstalk between the payload data and the subcarrier signaling channels [28]. The VCOs generate a subcarrier frequency at 3 GHz for Node 1 and 3.5 GHz for Node 2. The 80 Mb/s control signal logic can modulate the amplitude (ASK), frequency (CPFSK), or phase (PSK/DPSK) of the subcarrier. Because both the pilot tone and the subcarrier header are narrowband signals, a microwave power combiner is used to combine them. It is more difficult to combine the 2.5 Gb/s payload data with the microwave signals because a common microwave combiner has a low-frequency cutoff around tens of MHz, which causes problems when a long stream of 1's or 0's is transmitted. Another approach is to use a resistive coupler. Resistive couplers have no low-frequency cutoff, but they suffer from a 6-dB power loss and poor isolation (6 dB) between the two input ports, leading to crosstalk penalties.
**Figure 4.2: CORD node transmitters.**

For the CORD project, we used a directional coupler made by two cross-coupled microstrip lines as shown in Figure 4.3. With the directional coupler, the data enters at the direct port input, and passes the coupler directly without any low-frequency cutoff. The clock tone enters at the coupling port, and can be coupled to the direct port output if the coupling coefficient matches the clock frequency. A directional coupler has advantages of broad bandwidth (DC to tens of GHz) and high isolation (larger than 40 dB) between the two input ports for essentially no crosstalk. However, because of the symmetry between the direct port and the coupling port, the high frequency component of the data will actually couple to the coupling port, which causes waveform distortion as shown in Figure 4.4 (b). To minimize this distortion, we built a simple RC equalizer. The low frequency signal is attenuated by the resistor of the RC equalizer while the high frequency signal passes the capacitor without attenuation. By properly choosing the corner frequency (i.e. $1/(2\pi RC)$) the RC equalizer can compensate for the distortion caused by the directional coupler.
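To illustrate how the corner frequency shapes the equalizer response, the sketch below models one common realization: a resistor R in parallel with a capacitor C, placed in series between an ideal source and a resistive load. The report gives neither the circuit topology nor the component values, so the topology, the 50 Ω load, and the R and C values used here are assumptions chosen only for illustration.

```python
import math

def corner_frequency_hz(r_ohm, c_farad):
    """Corner frequency 1/(2*pi*R*C) of the RC section."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def equalizer_gain(f_hz, r_ohm, c_farad, load_ohm=50.0):
    """Voltage gain magnitude of a parallel-RC section in series with a load.

    Assumed topology (not from the report): ideal source -> (R || C) -> load.
    Low frequencies see the resistive divider; high frequencies bypass R via C.
    """
    z_series = r_ohm / (1.0 + 1j * 2.0 * math.pi * f_hz * r_ohm * c_farad)
    return abs(load_ohm / (load_ohm + z_series))

if __name__ == "__main__":
    R, C = 25.0, 5e-12  # hypothetical component values
    print(f"corner frequency: {corner_frequency_hz(R, C) / 1e9:.2f} GHz")
    for f in (0.0, 0.5e9, 1.25e9, 2.5e9, 10e9):
        print(f"{f / 1e9:5.2f} GHz  gain = {equalizer_gain(f, R, C):.3f}")
```

In this model the gain rises from R_load/(R + R_load) at DC toward unity at high frequency, which is the kind of high-frequency emphasis needed to offset the coupler's loss of high-frequency data components.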
In Figure 4.4, BER is plotted vs. received optical power for the cases of (a) neither the directional coupler nor the equalizer at the transmitter; (b) the directional coupler but no equalizer; and (c) both the directional coupler and the equalizer. The corresponding eye diagrams are included to illustrate how the frequency response of the directional coupler is compensated for by the equalizer. Trace (a) shows the BER vs. received optical power with neither the directional coupler nor the equalizer at the transmitter. The receiver sensitivity is -17.4 dBm (a $2^{23}$-1 PN sequence is used). Trace (b) shows that the sensitivity degrades by 2 dB when the directional coupler is in place. The cause of the 2 dB penalty is apparent in the corresponding eye diagram: the waveform distortion seen there is imposed by the directional coupler. Trace (c) and the corresponding eye diagram show that the equalizer successfully compensates for the waveform distortion caused by the directional coupler. With the 'equalized coupler' there is almost no penalty relative to the case when no coupler is used.
Figure 4.4 BER vs. received optical power for (a) Neither coupler nor equalizer at transmitter; (b) With coupler but without equalizer; and (c) With both coupler and equalizer.

4.2 CORD Receiver

The block diagram of the CORD receiver is shown in Figure 4.5. It consists of two sub-receivers: one for the subcarrier control channels, and the other for the payload data channel. The received optical signal contains both wavelengths (1310 nm and 1320 nm), which carry payload data and control information from both nodes. An optical splitter is used to split the incoming light into two branches, which go to the control channel receiver and the payload data receiver, respectively.
4.2.1 Payload Data Receiver

The payload data receiver is a direct detection receiver. The photodetector converts the received optical signal into an electrical signal. A preamplifier boosts the signal power before splitting to ensure a good signal-to-noise ratio for both data and pilot-tone receiver. A resistive power splitter is used to divide the signal into two branches. The upper branch is for the payload data. It includes a baseband amplifier and a low-pass filter (LPF). Theoretically, a matched filter with a notch at the pilot-tone frequency will give the best performance [29]. We built a two-pole microstrip-line LPF to suppress the out-of-band noise and the pilot-tone. The 3-dB bandwidth of the LPF is 1.8 GHz with 14-dB attenuation at the pilot-tone frequency (2.5 GHz).

The lower branch is for the pilot-tone. It consists of a narrowband amplifier and a bandpass filter (BPF). The bandwidth of the BPF is chosen to be 60 MHz, a trade-off between the clock recovery time and optical power penalty [29].

4.2.2 Subcarrier Header Receiver

In the header receiver, the received optical signal is converted to an electrical signal with a single photodiode. The electrical signal is then amplified and split. Each signal is then filtered with a 200 MHz bandpass filter with the passband centered for each subcarrier channel (3 GHz and 3.5 GHz). Demodulation is performed asynchronously by a delay-and-multiply mixer followed by a LPF. The ASK, DPSK, and CPFSK demodulators are identical except for the length of the time delay. The time delay (τ) for the demodulators is: zero for ASK; one bit for DPSK; and between zero and one bit, depending on the frequency discriminating range (95 MHz in our setup, corresponding to τ = 5.25 ns), for CPFSK demodulation.
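The relation between the delay and the discriminating range quoted for CPFSK can be checked with a short calculation. The sketch assumes the usual delay-line discriminator behaviour, in which the low-pass-filtered mixer output varies as cos(2πfτ) and is therefore monotonic over a frequency span of 1/(2τ). This formula is our assumption rather than something stated in the report, although it reproduces the quoted 95 MHz for τ = 5.25 ns. The ±20 MHz offsets are hypothetical test values.

```python
import math

def discriminating_range_hz(tau_s):
    """Frequency span over which cos(2*pi*f*tau) is monotonic: 1/(2*tau)."""
    return 1.0 / (2.0 * tau_s)

def discriminator_output(f_hz, tau_s):
    """Normalized low-pass output of an ideal delay-and-multiply demodulator."""
    return math.cos(2.0 * math.pi * f_hz * tau_s)

if __name__ == "__main__":
    tau = 5.25e-9  # CPFSK delay quoted in the text
    print(f"range = {discriminating_range_hz(tau) / 1e6:.1f} MHz")
    # Hypothetical tones 20 MHz either side of the 3 GHz subcarrier
    for f in (2.98e9, 3.00e9, 3.02e9):
        print(f"{f / 1e9:.2f} GHz -> {discriminator_output(f, tau):+.3f}")
```

With this delay the 3 GHz subcarrier sits at a zero crossing of the cosine response, so tones slightly above and below it produce outputs of opposite sign, which is what allows the two CPFSK frequencies to be told apart without carrier recovery.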
4.3 Modulation Format Comparison

To implement simple and robust MSCM signaling for a WDM optical network, we use off-the-shelf VCOs (voltage controlled oscillators) to generate subcarriers at the transmitters. An 80 Mb/s control signal modulates the amplitude (ASK); the frequency (CPFSK); or the phase (PSK/DPSK) of the subcarrier. For CPFSK modulation, the 80 Mb/s signaling modulates the VCO directly. The ASK and DPSK modulators are slightly more complicated than the CPFSK modulator. A mixer is used to upconvert the 80 Mb/s NRZ signal to the subcarrier frequency generated by the VCO. A power amplifier is used to boost the subcarrier power because the mixer has a conversion loss of 6 to 8 dB. At the subcarrier receiver, the subcarrier signaling is demodulated asynchronously, which avoids the difficulty of carrier recovery.

A main concern with implementing MSCM signaling using asynchronous demodulation is the lack of active frequency-tracking capabilities (e.g. PLL). Typical VCO frequency drift is 10 to 30 MHz (1% of the oscillating frequency [30]) at multi-GHz oscillating frequencies within normal operating temperatures. This drift can lead to performance degradation. An additional consideration with MSCM signaling systems is bandwidth efficiency. The modulation bandwidth of the laser is limited; thus, the narrower the subcarrier signaling bandwidth, the more signaling channels can be accommodated.

The theoretical analysis of MSCM signaling in WDM networks is presented in [31]. Figure 4.6 shows the experimental subcarrier receiver sensitivity vs. subcarrier frequency for ASK, CPFSK, and DPSK. DPSK signaling has the best receiver sensitivity, -29 dBm. However, the 1-dB power penalty bandwidth of DPSK signaling is only 16 MHz. The receiver sensitivity of CPFSK signaling is -28.1 dBm and the 1-dB power penalty bandwidth is 50 MHz. The receiver sensitivity of ASK signaling is -24.2 dBm. The experimental 1-dB power penalty bandwidth of ASK signaling is about 200 MHz, mainly limited by the filter bandwidth. Figure 4.7 shows the power spectral densities of ASK, CPFSK, and DPSK (the same as PSK) signaling, respectively. The ASK spectrum is similar to the DPSK one, except for the spike caused by the DC component in the NRZ signal. Note that the nearest sidelobes in the ASK and DPSK spectra are only 12 dB lower than the mainlobe. In order to avoid adjacent channel interference in a multichannel signaling system, a wider channel spacing is therefore needed between subcarrier frequencies. Figure 4.7 shows that the CPFSK spectrum is better confined, which yields better spectrum efficiency. The channel spacing of ASK and DPSK signaling would need to be twice that of CPFSK signaling in order to ensure negligible performance degradation from adjacent channel interference [31]. Table 4-1 summarizes the performance of the different subcarrier modulation formats. Based on Table 4-1, we decided to use CPFSK signaling in CORD.
Figure 4.6: Optical sensitivity vs. subcarrier frequency for different subcarrier modulation formats.

<table>
<thead>
<tr>
<th></th>
<th>Delay (τ)</th>
<th>1-dB penalty bandwidth</th>
<th>Receiver sensitivity</th>
<th>Channel spacing</th>
</tr>
</thead>
<tbody>
<tr>
<td>ASK</td>
<td>0</td>
<td>200 MHz</td>
<td>-24.2 dBm</td>
<td>300 MHz</td>
</tr>
<tr>
<td>CPFSK</td>
<td>5.25 ns</td>
<td>50 MHz</td>
<td>-28.1 dBm</td>
<td>150 MHz</td>
</tr>
<tr>
<td>DPSK</td>
<td>13.6 ns</td>
<td>16 MHz</td>
<td>-29 dBm</td>
<td>300 MHz</td>
</tr>
</tbody>
</table>

Table 4-1 Performance Comparisons of Different Subcarrier Modulation Formats.
Figure 4.7 Power spectrum densities of ASK, CPFSK, and PSK/DPSK.
4.4 Power Splitting Ratio and Modulation Depth Optimization

The payload data and subcarrier control header share the power of one laser. Hence, two important parameters, the control channel modulation depth at the transmitter and the optical coupler splitting ratio at the receiver, need to be optimized to obtain the best system performance. We demonstrated an experimental method to optimize these two parameters; it can be used with any signaling technique that shares optical power with the payload data, such as MSCM [25 - 28] or combined modulation signaling [13 - 14].

We investigated the performance of the system for different receiver power splitting ratios by using 10/90, 30/70, and 50/50 optical couplers, respectively. Figure 4.8 shows the receivers' sensitivity versus the subcarrier control channel modulation depth when the payload data and control channel are split at the receiver with a 50/50 ratio. The payload data modulation depth is one minus the control channel modulation depth. The sensitivity is defined as the optical power before the optical power splitter to achieve BER equal to $10^{-9}$ for payload data or control channel receivers. The sensitivity of the payload data receiver with no control channel transmitted is -14.5 dBm. The 3 dB difference from -17.5 dBm is caused by the 50/50 coupler at the receiver. When the control channel modulation depth equals zero, all the power is allocated for the payload data at the transmitter and no optical power is allocated for the control channel; the sensitivity of the control channel for this modulation depth is infinite. When the modulation depth of the control channel is increased, more power is allocated for the control channel at the transmitter; therefore, the sensitivity for the control channel improves, but the sensitivity for the payload data channel degrades. When the modulation depth equals one, all the power is allocated for the control channel at the transmitter. The sensitivity of the control channel with no payload data transmitted is measured to be -25 dBm, which is also 3 dB away from the previously measured -28 dBm. The solid lines in Figure 4.8 are the predicted curves under the assumption that there is no crosstalk between the control channel and the payload data channel. The experimental measurements are marked by 'x' and 'o' for comparison.
To guarantee reasonable performance for both the payload data receiver and the control channel receiver, the intersection of the two curves in Figure 4.8 should be selected as the system operating point. The modulation depths of the payload data and control channel at this point provide the best sensitivities for both receivers. The theoretically calculated intersection point corresponds to a receiver sensitivity of -13.8 dBm; the experimentally measured result is -12.8 dBm. The 1 dB difference between theory and experiment is most likely due to the crosstalk between the payload data and subcarrier control channels. The corresponding control channel modulation depth is 0.08; i.e., only 8% of the total optical power is allocated to the control channel at the transmitter.

![Graph showing Receiver sensitivity optimization in MSCM signaling using 50/50 coupler.](image)

*Figure 4.8: Receiver sensitivity optimization in MSCM signaling using 50/50 coupler.*

It is possible to improve the system performance by changing the splitting ratio at the receiver coupler so that the power at the payload data receiver is increased; this can reduce the power loss for the payload data channel at the receiver, but at the price of allocating more power for the control channel at the transmitter. This is favored to some extent since the control channel receiver has better sensitivity than the payload data receiver because of the control channel's lower bit rate.
The best performance was achieved when a 10/90 coupler was used. Figure 4.9 shows the sensitivity versus control channel modulation depth when the payload data and control channel are split with a 10/90 ratio at the receiver: 90% of the total received optical power goes to the payload data receiver and 10% of the power goes to the control channel receiver. When the control channel modulation depth equals zero, all the power is allocated for the payload data (the sensitivity is -17 dBm, with a 0.5 dB penalty from the 10/90 splitter) and no optical power is allocated for the control channel (the sensitivity of the control channel for this modulation depth is infinite). When the modulation depth of the control channel is increased, more power is allocated for the control channel at the transmitter; therefore, the sensitivity for the control channel improves, but the sensitivity for the payload data channel degrades. When the modulation depth equals one, all the power is allocated for the control channel at the transmitter. The sensitivity of the control channel with no payload data transmitted is measured to be -18 dBm, which indicates a 10 dB penalty to the control channel receiver from the 10/90 split. The intersection of the two curves in Figure 4.9 provides the optimal system operating point. The subcarrier channel modulation depth at this point provides the best sensitivities for both receivers. The optimal sensitivity is -14 dBm when \( m = 0.45 \), i.e., 45% of the power is allocated for the subcarrier control channel and 55% of the power for the payload data channel at the transmitter.

The 10/90 receiver split ratio not only provides a better receiver sensitivity, but also provides a better operating point. In the 50/50 split case, the optimal power for the control channel at the transmitter is only 8% of the total optical power, which is much smaller than the power allocated to the payload data. This large disparity may cause practical design problems, such as stringent dynamic range requirements for the amplifier, laser, and photodetector. The dynamic range of these devices must be large enough to prevent crosstalk generated by the component nonlinearities; otherwise, the control channel's performance will degrade. Also, if the control channel modulation depth is small, another design problem is the stringent requirement on the low pass prefilter needed at the transmitter to suppress the sidelobes generated by the payload data.
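
The trade-off between modulation depth and splitting ratio can be illustrated with a simple no-crosstalk model. The sketch below is an assumption made for illustration, not the model behind the predicted curves in Figures 4.8 and 4.9: each receiver's sensitivity, referred to the input of the receiver coupler, is taken as its stand-alone sensitivity (-17.5 dBm for payload data, -28 dBm for the control channel, as quoted above) plus the coupler loss plus the loss from sharing the transmitter power. The function names are hypothetical.

```python
import math


def split_loss_db(fraction: float) -> float:
    """Loss in dB when only `fraction` of the optical power is used."""
    return -10 * math.log10(fraction)


def sensitivities_dbm(m, data_fraction, data_sens_dbm=-17.5, ctrl_sens_dbm=-28.0):
    """Payload and control sensitivities at the coupler input (no crosstalk assumed).

    m             -- control channel modulation depth, 0 < m < 1
    data_fraction -- share of received power sent to the payload receiver
    """
    payload = data_sens_dbm + split_loss_db(data_fraction) + split_loss_db(1 - m)
    control = ctrl_sens_dbm + split_loss_db(1 - data_fraction) + split_loss_db(m)
    return payload, control


def crossover_depth(data_fraction, data_sens_dbm=-17.5, ctrl_sens_dbm=-28.0):
    """Modulation depth at which the two sensitivity curves intersect."""
    delta_db = (ctrl_sens_dbm + split_loss_db(1 - data_fraction)) - (
        data_sens_dbm + split_loss_db(data_fraction)
    )
    x = 10 ** (delta_db / 10)  # equals m / (1 - m) at the intersection
    return x / (1 + x)


for data_fraction in (0.5, 0.9):
    m = crossover_depth(data_fraction)
    payload, control = sensitivities_dbm(m, data_fraction)
    print(f"data share {data_fraction:.1f}: m = {m:.3f}, sensitivity = {payload:.1f} dBm")
```

Under these assumptions the crossover depths come out close to the 0.08 (50/50 coupler) and 0.45 (10/90 coupler) quoted above. The crossover sensitivities are only approximate and should not be expected to match the reported values exactly, since crosstalk and implementation penalties are ignored.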
4.5 Scalability of MSCM Signaling

MSCM is transparent to the data format, which gives more flexibility in terms of network control. MSCM requires only one transmitting laser per node. In addition, the microwave components for implementing the control channel are well-developed and readily available. Therefore, MSCM provides a cost-effective way of implementing control channels in a packet-switched WDM network. Figure 4.10 shows a generic block diagram of an N-node WDM passive star optical network employing MSCM signaling. Each node transmits on a unique wavelength and receives on any one wavelength, selected either by a tunable optical filter or CRO-type device.

The passive star coupler evenly distributes the power coming from the transmitters among the receivers. Therefore, the total loss at each wavelength consists of the splitting loss of the passive star coupler and the fiber attenuation. For the system to work, the following condition must be satisfied:

\[ P_t - 10\log N - 2\alpha L \geq P_r \]

*Equation 4-1*
where $P_t$ is the transmitting power, $N$ is the number of nodes, $\alpha$ is the fiber attenuation constant, $L$ is the average distance from each node to the passive star (network radius), and $P_r$ is the receiver optical power (sensitivity) required to achieve a bit-error-rate (BER) of $10^{-9}$ for both control and payload data channels.

$P_t - P_r$ is the power budget of the system; it can be used to evaluate the maximum number of network nodes given the distance $L$. To maximize the power budget of the system, we can either improve (lower) the receiver sensitivity $P_r$ by using an APD/PIN receiver or increase the transmitting power, $P_t$, by using an optical amplifier. We will compare the power budget and network scalability of the plain receiver (PIN/PIN, as discussed in Section 4.4) with those of the APD/PIN receiver (Section 4.5.1) and the SOA-preamplified receiver (Section 4.5.2).

![Block diagram of MSCM transceiver.](image)

**Figure 4.10:** Block diagram of MSCM transceiver.

### 4.5.1 APD/PIN receiver

There are two photodetectors at each node, as shown in Figure 4.10: one for the payload data receiver and the other for the subcarrier control channel receiver. In [34] we theoretically analyzed the performance of three different receiver structures for the payload data/subcarrier control receivers: PIN/PIN (PINs for both the payload data receiver and the control channel receiver), APD/PIN (an APD for the payload data receiver and a PIN for the control channel receiver), and APD/APD (APDs for both receivers). Although the APD/APD receiver structure achieved the best performance, it requires a high speed APD running at the subcarrier frequencies (multi-GHz), which costs much more than a high speed PIN. Moreover, the performance of the APD (control channel) receiver degrades when the number of nodes is increased [34] because of the rapidly increasing shot noise. It can also be shown that an APD/PIN receiver achieves performance comparable to that of an APD/APD receiver for systems with 50 or more nodes [34]. For these reasons, we chose an APD/PIN receiver combination.

The experiment parameters are summarized in Table 4-2. We compare below the performance of the PIN/PIN receiver to that of the APD/PIN receiver. All sensitivities quoted correspond to a BER=10⁻⁹.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Laser optical power</td>
<td>7.8 dBm</td>
<td>Amp. noise factor</td>
<td>3 dB</td>
</tr>
<tr>
<td>APD mult. factor</td>
<td>12</td>
<td>Receiver load resist.</td>
<td>50 Ω</td>
</tr>
<tr>
<td>APD excess noise factor</td>
<td>9</td>
<td>Responsivity</td>
<td>0.6 A/W</td>
</tr>
<tr>
<td>Data bandwidth B_data</td>
<td>1.8 GHz</td>
<td>Subcar. bandwidth B_sub</td>
<td>60 MHz</td>
</tr>
</tbody>
</table>

*Table 4-2 APD/PIN receiver experiment parameters.*

The measured sensitivity of the 80 Mb/s subcarrier control channel receiver is -28 dBm, since the PIN receiver used was the same as the one used for the PIN/PIN receiver. The measured payload data receiver sensitivity without a subcarrier control channel receiver is -27 dBm. The 10 dB sensitivity improvement over the PIN/PIN receiver is due to the APD gain. Figure 4.11 shows the receivers' sensitivity versus the subcarrier control channel modulation depth when the receiver coupler splitting ratio r is optimized at 0.5. The sensitivity of the payload data receiver when m=0 is -24 dBm. The 3 dB difference from -27 dBm is caused by the 50/50 coupler. Similarly, the sensitivity of the control channel receiver when m=1 is -25.1 dBm. The 3.1 dB difference from -28 dBm also results from the 50/50 coupler. The optimal sensitivity of the APD/PIN receiver is -20.9 dBm when \( m = 0.49 \). The APD/PIN receiver improves the sensitivity by about 7 dB compared with the PIN/PIN receiver.

![Graph: 50/50 coupler for data (APD)/control channel receiver.](image)

*Figure 4.11: Optical sensitivity vs. the subcarrier modulation depth for APD/PIN receiver.*

### 4.5.2 SOA-Preamplified Receiver

There are two possible positions for the SOA preamplifier at the receiver. They are denoted as point A and point B in Figure 4.10. We will refer to them as configuration A and configuration B, respectively. In configuration B, both the payload data and subcarrier receiver sensitivities benefit from the optical amplification of the SOA. In addition, the tunable optical filter used for selecting the desired payload data channel removes the out-of-band amplified spontaneous emission (ASE) noise generated by the SOA, which further improves the payload data subreceiver sensitivity. However, when \( N \) is large, the SOA may saturate and generate undesired crosstalk, thereby degrading the system performance. Configuration A eliminates this multi-wavelength amplification problem because the SOA amplifies only the desired channel selected by the tunable optical filter. The experiment parameters are shown in Table 4-3.
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Laser output power</td>
<td>7.8 dBm</td>
<td>Responsivity</td>
<td>0.6 A/W</td>
</tr>
<tr>
<td>SOA small signal gain</td>
<td>14 dB</td>
<td>Receiver load resistance</td>
<td>50 Ω</td>
</tr>
<tr>
<td>SOA noise figure</td>
<td>8 dB</td>
<td>Amplifier noise figure for data channel</td>
<td>3 dB</td>
</tr>
<tr>
<td>SOA output 3dB-gain compression power</td>
<td>2 dBm</td>
<td>Amplifier noise figure for subcarrier channel</td>
<td>1 dB</td>
</tr>
<tr>
<td>Optical filter bandwidth</td>
<td>1.5 nm</td>
<td>Loss of tunable filter</td>
<td>2 dB</td>
</tr>
</tbody>
</table>

*Table 4-3 SOA-preamplifier receiver experiment parameters.*

**Configuration A receiver:** Figure 4.12 shows the receiver sensitivity versus the subcarrier control channel modulation depth when \( r \) is optimized as 0.5. The optimal sensitivity of this receiver is -20 dBm when \( m=0.35 \). Configuration A improves the sensitivity by 6 dB compared with the plain receiver.

![Figure 4.12 Optical sensitivity vs. the subcarrier modulation depth for SOA-preamplifier Configuration A receiver.](image)

**Configuration B receiver:** The tunable optical filter removes the excess ASE noise and hence improves the receiver sensitivity further with respect to configuration A. Figure 4.13 shows the receiver sensitivity versus the subcarrier control channel modulation depth when \( r \) is optimized as 0.3. The optimal sensitivity of this receiver is -24.5 dBm when \( m=0.42 \). Configuration B improves the sensitivity by 10.5 dB compared with the plain receiver.

![70/30 Coupler for Data/Control Channel Receiver](image)

**Figure 4.13** Optical sensitivity vs. the subcarrier modulation depth for SOA-preamplifier Configuration B receiver.

The power budgets of the plain PIN/PIN, APD/PIN, and SOA-preamplified (configuration A and configuration B) receivers are summarized in Table 4-4, using Equation 4-1. Table 4-4 (col. 2) shows that configuration B has the largest power budget of all the receivers. However, one potential problem of configuration B is that the SOA may saturate and generate crosstalk degradation, which imposes a maximum number of nodes for the network. Thus configuration B is better suited to networks with fewer nodes but longer distances. The receiver sensitivities of configuration A and APD receivers are comparable and almost independent of the node number. Hence, these receivers are better for larger network applications. If we assume a 12 km network radius (as in IBM Rainbow-Net [3]) at the 1.3 \( \mu \)m wavelength, the total number of nodes this power budget can support is shown in Table 4-4 (col. 4). Although the experiment was demonstrated at the 1.3 \( \mu \)m wavelength, similar lasers and SOAs are available at the 1.5 \( \mu \)m wavelength. We have calculated the expected maximum node number and distance using the same power budget assuming 0.2 dB/km fiber loss at 1.5 μm. The results are also shown in Table 4-4 (cols. 5 and 6).

<table>
<thead>
<tr>
<th>Receiver</th>
<th>Power budget</th>
<th>Max. dist. for 12-node network (1.3 μm)*</th>
<th>Max. nodes for 12-km network (1.3 μm)*</th>
<th>Max. radius for 12-node network (1.5 μm)†</th>
<th>Max. nodes for 12-km network (1.5 μm)†</th>
</tr>
</thead>
<tbody>
<tr>
<td>Plain PIN/PIN</td>
<td>20.7 dB</td>
<td>12 km</td>
<td></td>
<td>24 km</td>
<td>38</td>
</tr>
<tr>
<td>Configuration A</td>
<td>27.8 dB</td>
<td>21 km</td>
<td></td>
<td>42 km</td>
<td>200</td>
</tr>
<tr>
<td>Configuration B</td>
<td>32.3 dB</td>
<td>26 km</td>
<td>NA</td>
<td>53 km</td>
<td></td>
</tr>
<tr>
<td>APD/PIN</td>
<td>28.7 dB</td>
<td>22 km</td>
<td></td>
<td>44 km</td>
<td>245</td>
</tr>
</tbody>
</table>

* The fiber attenuation constant is assumed to be 0.4 dB/km at 1.3 μm.

† The fiber attenuation constant is assumed to be 0.2 dB/km at 1.5 μm.

_Table 4-4 MSCM signaling scalability summary._
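
The distance and node-count entries in Table 4-4 follow from rearranging Equation 4-1 for $L$ or for $N$. The short sketch below shows that arithmetic using the power budgets and attenuation constants listed with the table. The function names are hypothetical, and rounding down to whole kilometers and whole nodes is an assumption, so a result may differ from a tabulated entry by one unit.

```python
import math


def max_radius_km(budget_db: float, nodes: int, alpha_db_per_km: float) -> float:
    """Largest L that satisfies P_t - 10 log N - 2 alpha L >= P_r."""
    return (budget_db - 10 * math.log10(nodes)) / (2 * alpha_db_per_km)


def max_nodes(budget_db: float, radius_km: float, alpha_db_per_km: float) -> int:
    """Largest integer N that satisfies the same inequality for a given radius."""
    splitting_margin_db = budget_db - 2 * alpha_db_per_km * radius_km
    return math.floor(10 ** (splitting_margin_db / 10))


# Power budgets (P_t - P_r) in dB, taken from Table 4-4.
budgets_db = {
    "Plain PIN/PIN": 20.7,
    "Configuration A": 27.8,
    "Configuration B": 32.3,
    "APD/PIN": 28.7,
}

for name, budget in budgets_db.items():
    radius_13 = math.floor(max_radius_km(budget, nodes=12, alpha_db_per_km=0.4))
    radius_15 = math.floor(max_radius_km(budget, nodes=12, alpha_db_per_km=0.2))
    nodes_15 = max_nodes(budget, radius_km=12, alpha_db_per_km=0.2)
    print(f"{name}: {radius_13} km (1.3 um), {radius_15} km (1.5 um), {nodes_15} nodes (1.5 um)")
```

The node count computed this way for configuration B is not meaningful in practice, because SOA saturation rather than the power budget limits the number of nodes for that receiver.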
5. Synchronization and Timing Budget

A key challenge in optical packet switched networks is synchronization. Both network-wide synchronization of packets and bit-level synchronization, or clock recovery, must be performed. For the CORD network testbed, we have developed a novel Distributed Slot Synchronization (DSS) technique [32] to address the former, and two techniques to address the latter: Embedded Clock Transport (ECT) [29] for the 2.5 Gb/s payload data channel, and Delay-line Phase Alignment (DPA) [18] for the 80 Mb/s control channel. The performance of each synchronization technique must be considered when designing the timing budget of the network, which in turn affects the utilization of channel capacity in the network. The synchronization techniques developed for the CORD testbed and the overall timing budget are described in the following sections.

5.1 Distributed Slot Synchronization

In the CORD testbed, as with virtually all proposed WDM packet networks, each data packet travels through the network within the bounds of an allotted time span, called a time-slot. When different packets, possibly traveling on different wavelengths, arrive at an end node, or transit through a routing device (like an optical switch), their time-slots generally need to be aligned in time for proper processing to take place. To achieve network-wide time-slot alignment, the Distributed Slot Synchronization (DSS) technique has been developed. The DSS method is robust and scalable to an arbitrary number of nodes over a star network and may be adapted to ring WDM packet networks as well, with minor changes.

With the DSS technique, one network node takes on the role of the ‘master’ node. It generates a 3-bit long (37.5 ns) slot marker called a ‘ping’ which is transmitted on the 3 GHz subcarrier that the master node also uses for header transmission. The ping is sent at the beginning of each time slot. The other nodes, called ‘slave’ nodes, also send their own pings and headers every time slot, over their own unique subcarrier frequencies (Figure 5.1). Each slave node then listens to both the master ping and its own returning ping, and tries to make them arrive at its receiver simultaneously by properly altering the time of its own ping transmission. Once the ping slots are aligned at the receiver, the pings are also aligned at the star location, since no time-skew between them can take place along the fiber connecting the star to the slave node. Since all slave nodes align their own pings to the same master ping, all slave pings are aligned to one another at the star. Therefore they all reach every node aligned in time.

![Diagram](image.png)

**Figure 5.1: Distributed slot synchronization.**

This technique is scalable to any number of WDM nodes in an optical network. All slave nodes operate independently, since they only listen to the master ping and their own. Also, any slave node can take on the role of the master node. As a result, this technique is fully distributed and fault-tolerant. In addition, the DSS technique works with any slot length and is independent of the packet payload data rate. Its flexibility and reliability make it very attractive for a wide range of optical as well as wireless packet networking applications.

A critical implementation aspect of DSS is the locking circuitry between the master ping and the returning slave ping. In principle, it could be done with a PLL deriving its error signal from a comparison of the arrival times of the two pings. However, this solution is inadequate because the delay between slave ping transmission and reception can be too long when the node is distant from the star. The delay may cause the PLL to oscillate and never properly lock. Therefore, we sought a fully digital implementation, which is conceptually similar to the SONET bit stuffing and dropping techniques.
In our implementation, the difference between the arrival times of the master and slave pings is digitally compared. When it exceeds a certain amount $T_m$, the subsequent slave ping is emitted either $T_m$ seconds early or $T_m$ seconds late, as needed to realign it with the master ping. Ideally, this ensures ping alignment with a maximum jitter of exactly $\pm T_m$ seconds. In CORD, the ping arrival-time comparison logic as well as the time-slot shrinking/widening circuitry were implemented using fast FPGAs (Field Programmable Gate Arrays) clocked at 80 MHz.

We built two versions of the DSS circuitry, one with $T_m$ equal to 1 header bit (12.5 ns) and the other with $T_m$ equal to half a header bit (6.25 ns). The expected alignment jitter is $\pm 12.5$ ns and $\pm 6.25$ ns, respectively, for the two systems.

When an adjustment is made, the corresponding time-slot is made either $T_m$ seconds longer or $T_m$ seconds shorter. As shown in Figure 5.2, margins for shortening the slots were amply provided for both header transmission (the “elasticity bits”) and baseband ATM cell transmission (44 ns).

![Diagram of 250 ns slot with components labeled: 250 ns slot, Guard Band (2 x 22ns), ATM Packet - 53 bytes (170 ns), Clock/Cell Boundary Recovery (36ns), Elasticity Bits, Destination Address, Ping Preamble Start Bit, and Transmitted at baseband: 2.5 Gbps, Transmitted at subcarrier: 80 Mbps.]

*Figure 5.2: Composition of 250ns slot.*
The slave ping round-trip delay problem that made it impossible to use PLLs is easily solved in this digital implementation. Every time an adjustment is made, further corrections are inhibited for the amount of time needed for the Tm-shifted pings to come back to the slave node. This blanking period prevents the onset of oscillations in the digital control loop.

A fundamental limitation to the maximum round-trip delay that can be handled with this method is established by the nominal frequency tolerance of the free-running clocks that trigger ping emission. For the system to work, the amount of free-running time skew accumulated over a star round-trip between the master ping and any slave ping cannot exceed Tm. Otherwise, during the blanking period after an adjustment, more skew is accumulated than can be corrected after the blanking period is over, and the system simply does not lock.

In Figure 5.3 we show a plot of the maximum distance from the star as a function of the maximum clock tolerance in the network. Note that this limitation is independent of the slot size. In CORD we bought off-the-shelf inexpensive quartz oscillators that proved to be within 20 ppm of each other. The resulting maximum network size, for Tm equal to 1 and 0.5 header bits, is 24 and 12 km, respectively.

Finally, a simple state machine was added to the control logic ensuring that adjustments take place only when at least 5 slots in a row are either late or early in excess of Tm. This was done to prevent adjustments from being triggered by isolated noise spikes causing a single ping to be erroneously detected off-synch.
Figure 5.3: Maximum node distance from star vs. clock tolerance.
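
The digital locking rule described above (a fixed $\pm T_m$ step, a blanking period after each adjustment, and a requirement of five consecutive out-of-sync slots) can be summarized in a small behavioral sketch. This is an illustrative model only and not the FPGA logic used in CORD; the class name, the sign convention for the timing error, and the `blanking_slots` parameter are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class DssController:
    """Behavioral model of one slave node's slot-alignment loop (hypothetical)."""

    tm_ns: float              # adjustment step and threshold T_m
    confirm_slots: int = 5    # consecutive out-of-sync slots needed to act
    blanking_slots: int = 0   # slots to wait for the shifted ping to return
    _streak: int = field(default=0, init=False)
    _sign: int = field(default=0, init=False)
    _blank: int = field(default=0, init=False)

    def update(self, error_ns: float) -> float:
        """Process one slot and return the shift (ns) for the next slave ping.

        error_ns is assumed to be (slave ping arrival - master ping arrival);
        a negative return value means the next ping is emitted earlier.
        """
        if self._blank > 0:               # still waiting for the shifted ping
            self._blank -= 1
            return 0.0
        if error_ns > self.tm_ns:
            sign = 1                      # slave ping arrives late
        elif error_ns < -self.tm_ns:
            sign = -1                     # slave ping arrives early
        else:
            sign = 0                      # within tolerance
        if sign == 0 or sign != self._sign:
            self._sign = sign             # start (or reset) the streak
            self._streak = 1 if sign else 0
            return 0.0
        self._streak += 1
        if self._streak < self.confirm_slots:
            return 0.0
        self._streak = 0                  # adjustment confirmed
        self._sign = 0
        self._blank = self.blanking_slots
        return -sign * self.tm_ns


controller = DssController(tm_ns=12.5, blanking_slots=3)
print([controller.update(20.0) for _ in range(8)])
```

In this model a single noisy measurement resets the streak counter and never triggers an adjustment, which mirrors the purpose of the five-slot state machine. The blanking length would in practice be set from the round-trip delay to the star.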

We conducted experiments with the slave node connected to the star through 12 km of fiber, equivalent to a node positioned 6 km from the star for a 12 km round-trip (Figure 5.4). The subcarrier power impinging on the header receiver photodiode was set at -26 dBm, corresponding to a $10^{-9}$ BER for the header bits that are encoded on the same subcarrier (see Section 2).
Figure 5.4: Experimental setup for ping jitter measurements.

The slot synchronization results for Tm equal to one header bit are shown in Figure 5.5. The top trace is the transmitted master ping, which triggered the oscilloscope; the bottom trace is the transmitted slave ping. The pings are not aligned to one another at the transmitter; they are aligned at the receivers after traveling through different paths to the star coupler. The plots were accumulated over several minutes and show the maximum slave ping peak-to-peak jitter (with respect to the master ping) of ±12.5 ns.
Figure 5.5: Jitter of slave ping with respect to master ping with $T_m = 12.5\text{ns}$.

This synchronization technique proved extremely reliable in terms of keeping the lock even at a low received optical power. In Figure 5.6, we show that the jitter, measured over 3-minute runs, gracefully degrades as optical power is reduced. At -30 dBm, the header bit error probability is $10^{-4}$, but the time-slot alignment jitter has only increased to $\pm 29\ \text{ns}$. Lock is lost only at an extremely low received power (-34 dBm). This remarkable performance ensures that vital network functions, like synchronization, are maintained even in the case of degradation of some network elements resulting in a very low received power, to the great benefit of network reliability and recoverability.
Figure 5.6: Maximum jitter vs. received optical power for Tm = 1 bit.

For the circuitry with Tm equal to half a bit, we changed the phase of the slave transmitter clock by $180^\circ$ to stretch the slot by half a bit. To drop a half bit, the clock phase was changed and a bit was dropped from the slot at the same time, producing the desired net result. A 100 MHz low pass filter was added to the clock input to remove higher frequency terms during clock phase changes. The experimental results over 12 km of fiber are shown in Figure 5.7 for Tm equal to half a header bit (6.25 ns). The plot was accumulated over several minutes and shows the maximum slave ping peak-to-peak jitter (with respect to the master ping) of $\pm 6.5$ ns, 0.25 ns larger than expected. This can be attributed to the combined drift in frequencies of the master and slave oscillators and noise in the clock signals of the master and slave transmitters.
Figure 5.7: Jitter of slave ping with respect to master ping for $T_m = 6.25$ ns.

5.2 Delay-line Phase Alignment

In CORD, packet headers are transmitted immediately after the ping signal on the node’s unique subcarrier frequency at a bit rate of 80 Mb/s. A header is sent during each time slot: if no ATM cell is transmitted on the payload data channel during the slot, then a dummy header address indicating an empty slot is sent.

Since the header receiver is continuously monitoring all subcarrier channels, clock recovery of the header data stream could be performed using conventional PLL techniques. However, PLLs cannot solve the problem of synchronizing the receiving node's internal 80 MHz clock with the recovered 80 MHz clock. Higher internal clock rates or special asynchronous circuitry would be needed to perform this further task. In addition, PLLs are too slow for slot-by-slot clock recovery, and therefore this approach fails to address the more general problem of fast clock synchronization in optical packet networks, in which packets from different nodes may arrive at a receiver in adjacent time slots with no clock frequency or phase consistency between them.

For the CORD project, a novel ultra-fast clock recovery technique was developed which enables header clock recovery on a slot-by-slot basis; we named our technique Delay-line Phase Alignment (DPA) [18]. With DPA, the incoming bit stream is delayed to align it to the receiver clock. Multi-tap delay lines with 2ns tap spacing are used to allow digital control of the length of delay. A block diagram of the DPA circuit is shown in Figure 5.8. The DPA concept assumes that the remote node transmitter clock and the receiving node clock do not drift substantially with respect to each other over the time-length of a single slot, a requirement that is satisfied even by the most inexpensive quartz oscillators. Therefore, only the relative phase of the transmission and reception clocks needs to be aligned.

![Figure 5.8: Delay-line Phase Alignment (DPA).](image)

Immediately after the ping, a four-bit preamble sequence (0101) is transmitted. The DPA circuit detects the bit transitions (i.e., zero-crossings) in the preamble. The best sampling time for header reception is half a bit after the bit transition time and the DPA circuit selects the tap which time-shifts the header for best reception.
The DPA uses multi-tap digital delay lines with 2 ns tap-to-tap delay. This is a very short delay, and few commercial devices meet this specification. We purchased two different models of such delay lines from two different manufacturers and, while characterizing them, found a high variance in the tap-to-tap delay both between devices and between taps of the same device. The result of this variance is non-uniform parallel oversampling. The outputs of a typical delay line are shown in Figure 5.9.

Figure 5.9: Output of the taps of a multi-tap delay line.

In addition to the variance in delays, the delay lines distort the waveform of the signals at the tap outputs, stretching the 'HIGH' phase and shrinking the 'LOW'. An example of this is shown in Figure 5.10.

Figure 5.10: Waveform distortion on the outputs of a delay line.

This is equivalent to a jitter in the position of the preamble '1-to-0' and '0-to-1' transitions, resulting in early '0-to-1' transitions and late '1-to-0' transitions. These two anomalies in the delay lines can potentially degrade the performance of the DPA, in terms of bit-error-rate. To minimize the effect of the non-uniform parallel oversampling, we characterized the delay lines that we purchased and selected the ones with the lowest variance of tap-to-tap delay.

The effect of the jitter of the transition positions, caused by the distortion of the waveform on the delay-line tap outputs, is automatically minimized by the DPA technique. During the preamble, the circuit averages the positions of the transitions over an even number of transitions, so the stretching and shrinking of the waveform cancel each other and the result of the synchronization process is the optimum sampling point. Figure 5.11 shows the averaging result.
Figure 5.11: Averaging transitions in a distorted waveform.

Although the averaging yields the optimal sampling point, the waveform distortion results in unbalanced margin for time jitter in the following header bits to be sampled, as shown in Figure 5.12, therefore degrading the performance.

Figure 5.12: Different sampling margins in distorted waveform.

A critical aspect of this technique is the algorithm which is used during the preamble to estimate where in time the bit transitions occur. In theory, a 1-bit preamble, i.e., observing a single transition, would be sufficient. However, due to noise, transitions can be substantially jittery so that making the estimate on only one transition may lead to substantial errors. The current CORD implementation allows the DPA circuit to use a one, two or four-bit preamble and therefore it can look at up to four consecutive transitions to estimate the average zero-crossing time. Averaging effectively reduces the jitter. Provisions were also made to reject possible glitches and inconsistencies in the received preamble pattern. Another important design parameter is how finely spaced the delay-line taps are, which sets the resolution for the transition time estimate or effectively the rate of oversampling. Currently, the spacing is 2 ns, i.e., six taps per bit, which was selected by trading off the added precision of additional taps against the increased circuit complexity.
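
The tap-selection step can be illustrated with a short sketch. It is a simplified model written for this discussion, not the CORD circuit: transition positions are represented as continuous times (in ns) measured from the receiver clock's sampling edge, whereas the hardware locates them by oversampling across the taps. The function name and arguments are hypothetical, and the plain arithmetic mean assumes the measured positions do not straddle a bit boundary.

```python
def select_tap(transition_phases_ns, bit_period_ns=12.5, tap_spacing_ns=2.0, num_taps=6):
    """Pick the delay-line tap that puts the sampling edge mid-bit.

    transition_phases_ns -- positions of the preamble transitions, each measured
                            from the receiver clock's sampling edge and reduced
                            modulo the bit period
    Returns the index of the tap (0 = no delay) whose delay is closest to the
    delay that moves the averaged transition half a bit away from the sampling edge.
    """
    # Averaging over the preamble transitions reduces jitter in the estimate.
    mean_phase = sum(transition_phases_ns) / len(transition_phases_ns)
    # After delaying by d, the transition sits at mean_phase + d; it should
    # land half a bit period away from the sampling edge.
    wanted_delay = (bit_period_ns / 2 - mean_phase) % bit_period_ns

    def circular_error(tap):
        diff = abs(tap * tap_spacing_ns - wanted_delay) % bit_period_ns
        return min(diff, bit_period_ns - diff)

    return min(range(num_taps), key=circular_error)


# Two preamble transitions observed 3 ns and 5 ns after the sampling edge.
print(select_tap([3.0, 5.0]))
```

With six taps spaced 2 ns apart, the residual sampling-time error in this idealized model is bounded by the tap spacing rather than by the bit period, which is the resolution trade-off discussed above.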

We performed a theoretical analysis of an optical packet-switched receiver employing our DPA. Figure 5.13 shows the BER performance of the DPA with six sampling taps for different lengths of the preamble sequence. The ideal curve assumes there is no penalty in clock recovery and represents the baseline performance. A longer preamble sequence helps to reduce the impact of noise and waveform distortion. The SNR penalty with respect to a perfectly synchronized receiver is less than 3 dB if four preamble bits are used. The curves seem to indicate that averaging over two to four detected transitions is a good compromise between system performance and circuit complexity. When the optical receiver is thermal-noise-limited, the optical power penalty for 2- and 4-bit preambles is 2 dB and 1 dB, respectively (a 1 dB optical power penalty corresponds to a 2 dB SNR penalty).
Figure 5.13: Theoretical performance of the DPA using different preamble lengths.

We investigated the performance of the DPA experimentally in the CORD testbed by transmitting the packets through a laser, optical fiber, photodetector and demodulator. The BER of the DPA-recovered channel is shown in Figure 5.14. The ideal curve was measured by connecting the clock directly from the transmitter to the receiver, while the DPA BER was measured with 1-, 2- and 4-bit preamble lengths. Longer preambles were not considered because of the increased complexity of the required logic and the small improvement over the 4-bit preamble (Figure 5.13).

Our theoretical analysis and experimental measurements suggest the following conclusions: (1) the prototype performance agrees very well with the theoretical predictions (compare Figure 5.14 with Figure 5.13); (2) the 4-bit DPA has only a 1 dB penalty versus the ideal receiver, and the 2-bit DPA has only a 0.8 dB additional penalty; (3) the DPA performs a clock acquisition from scratch in 50 ns (4 preamble bits) or 25 ns (2 preamble bits), outperforming all other clock recovery devices. The better performance of the 4-bit DPA versus the 2-bit DPA is due to the fact that it averages over 4 preamble bits instead of 2, thereby reducing the effect of noise and jitter.

In summary, our DPA provides ultra-fast clock alignment and has been proved to be effective and reliable. These features make it attractive for use in CORD and in other all-optical network architectures. The speed of the DPA circuitry is currently limited to 80 Mb/s, due to its FPGA-based implementation. Stanford University is now attempting to integrate the DPA in a custom CMOS chip, with the objective of upgrading its speed to 2.5 Gb/s [17].

Figure 5.14: Experimental performance of the DPA with different preamble lengths.

5.3 Embedded Clock Transport

The CORD slot-time payload signal is an ATM cell transmitted at 2.5 Gb/s. For payload reception, two circumstances make slot-by-slot clock recovery necessary: (a) there is no synchronization among the transmission clocks of different nodes; (b) packets are detected after possibly going through the CRO, which alters the phase relationship even of cells coming from the same transmitter. PLL-based clock recovery techniques are therefore inadequate because they are too slow.

In the CORD testbed, a 2.5 GHz clock synchronized with the payload data is transmitted along with the payload data; in the receiver, payload clock recovery is performed by extracting a clock tone. The clock tone falls exactly at a notch of the payload transmission spectrum; therefore, very small interference from the payload signal is incurred.

A number of trade-offs need to be carefully balanced to optimize the acquisition time and the optical power penalty [29]. The narrower the filter, the longer it takes for the pilot-tone to be recovered satisfactorily at the receiver. On the other hand, a wide filter bandwidth would allow for fast clock recovery. However, a wide filter would also increase the amount of data signal interference and noise affecting the recovered clock. This in turn would cause extra clock jitter and therefore a higher BER. To overcome the extra jitter, more optical power would have to be devoted to the clock tone transmission, at the expense of some system penalty. We selected a 60 MHz-wide BPF for CORD for fast clock recovery with a small system penalty. With the 60 MHz BPF, the 2.5 Gb/s clock is recovered in just 40 bits (16 ns) [33]. Before the decision circuit, the payload signal is passed through a band-reject filter to suppress the clock tone. The recovered clock output of the bandpass filter is used to clock the payload data receiver decision circuit.
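
The timing figures quoted in this chapter follow from simple unit conversions, sketched below in Python. The helper names are illustrative, and treating the quoted 60 MHz filter width as the bandwidth that enters the normalized product B·T (bandwidth times bit duration, used in the analysis that follows) is an assumption.

```python
PAYLOAD_RATE_BPS = 2.5e9   # payload bit rate quoted in the text
HEADER_RATE_BPS = 80e6     # header (DPA) bit rate quoted in the text


def bits_to_ns(n_bits, bit_rate_bps):
    """Duration of n_bits at the given bit rate, in nanoseconds."""
    return n_bits / bit_rate_bps * 1e9


def normalized_bandwidth(filter_bw_hz, bit_rate_bps):
    """Filter bandwidth multiplied by the bit duration T = 1 / bit rate."""
    return filter_bw_hz / bit_rate_bps


print(bits_to_ns(40, PAYLOAD_RATE_BPS))              # payload clock acquisition, 40 bits
print(bits_to_ns(4, HEADER_RATE_BPS))                # DPA acquisition with a 4-bit preamble
print(normalized_bandwidth(60e6, PAYLOAD_RATE_BPS))  # 60 MHz filter at 2.5 Gb/s
```

The first two values reproduce the 16 ns and 50 ns acquisition times stated in the text; the third gives a normalized bandwidth of about 0.024 for the 60 MHz filter.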

5.3.1 Performance Analysis

The block diagram of the transmitter and receiver structures for the embedded clock transport is shown in Figure 5.15. At the transmitter side, the clock generator drives the data logic to generate the baseband payload data packet. The same clock signal is combined with the payload by a power combiner (usually a resistive coupler or a hybrid coupler). To guarantee the correct phase relation between the payload data and the clock signal, a phase adjuster (delay line) can be used to adjust the clock phase. The combined payload data and clock signal is amplified and sent to the laser diode. The laser diode is biased above threshold to avoid the crosstalk generated by nonlinear clipping.

Figure 5.15: Block diagram of the transmitter and receiver.

At the receiver, only one photodetector is needed to convert both the payload data and the clock signal into an electrical current and then into a voltage signal. The voltage signal is split into two branches. The lower branch goes to a narrow bandpass filter for clock recovery. The output clock is no longer a perfect sinusoid because of noise from the channel and the receiver circuitry as well as crosstalk from the payload data. The noise is converted into phase jitter during the decision process. In the upper branch, the signal is sent through a lowpass filter, which passes the payload data while suppressing the noise and the clock signal. At the decision circuit, the jittered clock signal is used to sample the payload data. The ultimate performance is determined by two factors: the phase jitter of the clock signal and the amplitude noise of the payload data.
5.3.1.1 The Theory

The modulating current of the laser at the transmitter can be represented as:

\[ I_t(t) = I_d p(t) + I_c [1 + \cos(2\pi f_o t)] + I_{\text{threshold}} \]  \hspace{1cm} (1)

where \( I_{\text{threshold}} \) is the threshold current of the laser and \( f_o = 1/T \) (\( T \) is the bit duration) is the frequency of the embedded clock to be transmitted. For simplicity, we model the payload data as a square waveform:

\[ p(t) = \begin{cases} 0, & (n-1)T \leq t \leq nT, \text{ when a "0" is transmitted} \\ 1, & (n-1)T \leq t \leq nT, \text{ when a "1" is transmitted} \end{cases} \]  \hspace{1cm} (2)

At the receiver side, the average signal power (not including noise) can be expressed as:

\[ P_{r-signal} = \frac{I_d + 2I_c}{2} \eta_{\text{laser}} L \]  \hspace{1cm} (3)

where \( \eta_{\text{laser}} \) is the slope of the I-P curve of the laser, which characterizes its efficiency, and \( L \) is a lumped factor including the combined loss of the fiber, splices and splitters and the gain of the optical amplifier.

At the receiver, the photodetector will convert the optical signal into electrical current as:

\[ I_r(t) = \left( I_d p(t) + I_c [1 + \cos(2\pi f_o t)] \right) \eta_{\text{laser}} L R_{sp} + N(t) \]  \hspace{1cm} (4)

where \( R_{sp} \) is the responsivity of the photodetector. \( N(t) \) is the combined noise which includes the noise of optical amplifier, shot noise, thermal noise, circuit noise, and dark current noise of the photodetector.

In general, it is reasonable to model the overall noise as white Gaussian noise. The current signal is converted into a voltage signal, typically by a resistor or a transimpedance amplifier with conversion gain \( r \). If we assume that the power spectral density of the current noise \( N(t) \) is \( N_0 / (2r^2) \), the power spectral density of the voltage noise is normalized to \( N_0 / 2 \). The voltage signal is then split into two branches, feeding the clock filter and the data filter, respectively.

5.3.1.2 Clock Filter

A narrow band-pass filter is used to extract the embedded clock tone. Here we assume that no signal is reflected from the filter to cause an extra penalty. In a real system, a broadband splitter (e.g., a resistive splitter) and an isolator are usually needed for this purpose. The clock filter is centered at $f_o$ and has an equivalent noise bandwidth $B_{\text{eff}}$. We denote the impulse response of the band-pass filter by $h_{\text{BPF}}(t)$. The output signal is the convolution of the input signal with the impulse response of the band-pass filter:

$$ r(t) = r I_r(t) * h_{\text{BPF}}(t) = A_c \cos(2\pi f_o t) + r N(t) * h_{\text{BPF}}(t) + A_d \, p(t) * h_{\text{BPF}}(t) $$

(5)

where $A_c = I_c r$ and $A_d = I_d r$ are the voltage amplitudes of the clock and the payload data.

The first term of eqn (5) is the desired clock signal, the second term is the filtered noise, and the third term is the crosstalk generated by the payload data. The contribution of the crosstalk term is usually negligible when the bandwidth of the filter is narrow, because the power spectral density of the payload data has a notch at $f_o$. To a first-order approximation, we can therefore neglect the crosstalk term if the bandpass filter is narrow. The effect of the crosstalk term is included in Section 5.3.1.3.3. Neglecting the crosstalk term for now, we can approximate eqn (5) as:

$$ r(t) = [A_c + n_c(t)] \cos(2\pi f_o t) - n_q(t) \sin(2\pi f_o t) = a(t) \cos(2\pi f_o t + \phi(t)) $$

(6)

where $n_c(t)$ and $n_q(t)$ are the in-phase and quadrature components of the filtered noise, $a(t)$ is the amplitude of the output signal from the bandpass filter, and $\phi(t)$ is its phase. The probability density function of the phase is:

$$ P_\phi(\phi) = \frac{1}{2\pi} \exp[-SNR_c] + \frac{1}{2} \sqrt{\frac{SNR_c}{\pi}} \cos(\phi) \exp[-SNR_c \sin^2(\phi)] \, \text{erfc}(-\sqrt{SNR_c} \cos(\phi)) $$

for $-\pi \leq \phi \leq \pi$

(7)

where

$$ SNR_c = \frac{A^2_c}{2\sigma^2_{\text{noise}}} = \frac{A^2_c}{2N_0 B_{\text{eff}}} $$

(8)

is defined as the signal-to-noise ratio of the clock signal.
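
As a numerical check, the sketch below evaluates the phase density in Python, assuming that eqn (7) is the standard phase density of a sinusoid in narrowband Gaussian noise with $SNR_c$ defined as in eqn (8). The function names and the midpoint-rule integrator are illustrative choices, not part of the original analysis.

```python
import math


def snr_clock(a_c, n0, b_eff):
    """Eqn (8): clock SNR for amplitude a_c, noise density N0/2 and bandwidth b_eff."""
    return a_c ** 2 / (2.0 * n0 * b_eff)


def phase_pdf_exact(phi, snr_c):
    """Phase density of a sinusoid plus narrowband Gaussian noise (assumed form of eqn (7))."""
    c = math.cos(phi)
    s = math.sin(phi)
    return (math.exp(-snr_c) / (2.0 * math.pi)
            + 0.5 * math.sqrt(snr_c / math.pi) * c
            * math.exp(-snr_c * s * s) * math.erfc(-math.sqrt(snr_c) * c))


def integrate(f, a, b, n=20000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


snr_c = 10 ** (10 / 10)  # 10 dB
area = integrate(lambda p: phase_pdf_exact(p, snr_c), -math.pi, math.pi)
print(f"area under the phase PDF at 10 dB: {area:.6f}")
```

The area under the density over $[-\pi, \pi]$ should come out as one, which is a quick sanity check on the normalization of the expression.
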
The phase jitter can be approximated by a Gaussian distribution. The in-phase noise corresponds to the amplitude variation and the quadrature-phase noise corresponds to the phase variation. If $SNR_c$ is high enough, the small phase variation $\phi(t)$ can be approximated by $\frac{n_q(t)}{A_c}$. Since $n_q(t)$ is Gaussian noise, $\frac{n_q(t)}{A_c}$ is also Gaussian, with standard deviation $\sigma_\phi = \sqrt{\frac{1}{2SNR_c}}$. The probability density function of $\phi(t)$ can therefore be approximated as:

$$P_\phi(\phi) = \sqrt{\frac{SNR_c}{\pi}} \exp(-SNR_c\phi^2) \quad \text{for} \quad -\infty \leq \phi \leq \infty$$

(9)

Figure 5.16: The standard deviation of phase jitter vs. SNR.

Figure 5.16 shows the standard deviation versus $SNR_c$ for the exact phase distribution and the Gaussian distribution. It suggests that the Gaussian distribution gives a good estimate of the standard deviation of the phase jitter when $SNR_c$ is larger than 10 dB. (The error is about 3% when $SNR_c$ is 10 dB.) Figure 5.17 shows the probability density functions of both the Gaussian distribution and the exact phase distribution when $SNR_c$ equals 10 dB and 13 dB. The two distributions match very well (in fact they overlap) over most of the region. It might therefore seem that the Gaussian distribution can be used to approximate the exact phase distribution in system performance calculations (e.g., BER). This is, however, incorrect, since most error events occur when $\phi$ is large, as discussed in Section 5.3.1.3.1. In other words, the tail of the distribution is what matters most for the calculation of BER. The enlarged plot in Figure 5.17 shows the tails of the Gaussian and the exact phase distributions for $SNR_c$ equal to 13 dB. It shows that the Gaussian distribution is too optimistic compared with the exact phase distribution.
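
The comparison between the two distributions can be reproduced with a short Python sketch. It assumes the standard phase density of a sinusoid in narrowband Gaussian noise for the exact case and a zero-mean Gaussian with variance $1/(2SNR_c)$ for the approximation; the function names and the evaluation point of 1 rad for the tail are illustrative.

```python
import math


def phase_pdf_exact(phi, snr_c):
    """Exact phase density (assumed standard form for a sinusoid in Gaussian noise)."""
    c, s = math.cos(phi), math.sin(phi)
    return (math.exp(-snr_c) / (2.0 * math.pi)
            + 0.5 * math.sqrt(snr_c / math.pi) * c
            * math.exp(-snr_c * s * s) * math.erfc(-math.sqrt(snr_c) * c))


def integrate(f, a, b, n=20000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def exact_std(snr_c):
    """Standard deviation of the exact phase distribution (its mean is zero by symmetry)."""
    var = integrate(lambda p: p * p * phase_pdf_exact(p, snr_c), -math.pi, math.pi)
    return math.sqrt(var)


def gauss_std(snr_c):
    """Standard deviation of the Gaussian approximation."""
    return math.sqrt(1.0 / (2.0 * snr_c))


def exact_tail(x, snr_c):
    """P(|phi| > x) under the exact distribution."""
    return 2.0 * integrate(lambda p: phase_pdf_exact(p, snr_c), x, math.pi)


def gauss_tail(x, snr_c):
    """P(|phi| > x) under the Gaussian approximation."""
    return math.erfc(x * math.sqrt(snr_c))


snr_10db, snr_13db = 10 ** 1.0, 10 ** 1.3
print("std ratio (exact / Gaussian) at 10 dB:", exact_std(snr_10db) / gauss_std(snr_10db))
print("tail beyond 1 rad at 13 dB, exact:", exact_tail(1.0, snr_13db))
print("tail beyond 1 rad at 13 dB, Gaussian:", gauss_tail(1.0, snr_13db))
```

The standard deviations differ by only a few percent at 10 dB, while the exact tail probability is far larger than the Gaussian one, which is why the Gaussian model is optimistic for BER purposes.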

Figure 5.17: The P.D.F. of the phase distribution of the clock signal.
In short, after the clock filter the clock signal is recovered, but the noise is converted into phase jitter of the clock signal. This jitter degrades the decision process and causes more errors, as discussed in Section 5.3.1.3.1.

5.3.1.2.1 Data Filter

At the payload data filter, a low-pass filter (LPF) is used to remove the clock tone and the excess noise. Here we assume the impulse response of the LPF is given by:

$$h_{LPF} (t) = \begin{cases} 
1 & 0 \leq t \leq T \\
0 & \text{otherwise} 
\end{cases}$$  \hspace{1cm} (10)

This choice of $h_{LPF} (t)$ has two motivations: 1) It is a matched filter for the payload data and therefore gives the maximum signal-to-noise ratio at the decision circuit. 2) The frequency-domain spectrum of $h_{LPF}(t)$ is a sinc function with a notch at $f_o$, so it completely removes the clock signal. This can also be seen in the time domain: the convolution of the clock signal with $h_{LPF} (t)$ is zero, since it integrates the clock over exactly one period $T$. Therefore, after the low-pass filter we can express the output signal as:

$$s(t) = r I_r(t) * h_{LPF} (t) = A_d \, p(t) * h_{LPF} (t) + rN(t) * h_{LPF} (t)$$  \hspace{1cm} (11)

If a perfect clock signal (without offset and jittering) is available to sample the signal, the optimum BER is:

$$BER = Q \left( \sqrt{SNR_d} \right)$$  \hspace{1cm} (12)

where $SNR_d = \frac{A^2_d T}{2N_0}$ defines the signal to noise ratio of the payload data.
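
Two properties of the data filter can be checked numerically with the Python sketch below: the rectangular filter of eqn (10) nulls a clock tone of period $T$ for any clock phase, and eqn (12) gives the ideal BER. The function names are illustrative, and the time axis is normalized so that $T = 1$.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ideal_ber(snr_d):
    """Eqn (12): BER of the matched-filter receiver with a perfect clock."""
    return q_function(math.sqrt(snr_d))


def clock_leak(phase, n=1000):
    """Midpoint-rule integral of cos(2*pi*t/T + phase) over one bit, with T = 1."""
    h = 1.0 / n
    return h * sum(math.cos(2.0 * math.pi * (i + 0.5) * h + phase) for i in range(n))


print("clock leakage through the data filter:", clock_leak(0.7))
print("ideal BER at SNR_d = 36:", ideal_ber(36.0))
print("ideal BER at SNR_d = 49:", ideal_ber(49.0))
```

The leakage is zero to within rounding error regardless of the phase argument, and $SNR_d$ values of 36 and 49 correspond to BERs on the order of $10^{-9}$ and $10^{-12}$, respectively.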

5.3.1.2.2 Decision Circuit Without Phase Offset

At the decision circuit, the clock signal with some phase jitter from the clock filter is used to sample the payload data (with noise) from the data filter. Assume there is no phase offset in the phase jitter (i.e. the mean phase jitter is zero); then the bit error rate can be expressed as:

\[ BER = \frac{1}{2} \int_{-\pi}^{\pi} Q\left(\sqrt{SNR_d} \left(1 - \frac{|\phi|}{\pi}\right)\right) p_\phi(\phi) \, d\phi + \frac{1}{2} Q(\sqrt{SNR_d}) \]  \hspace{1cm} (13)

where \( p_\phi(\phi) \) is the phase distribution of eqn (7).

When \( SNR_c \) increases, the phase jitter decreases and the probability density function of the phase approaches a delta function. The BER should then reduce to the matched-filter case. This can be easily verified as:

\[ BER = \frac{1}{2} \int_{-\pi}^{\pi} Q\left(\sqrt{SNR_d} \left(1 - \frac{|\phi|}{\pi}\right)\right) \delta(\phi) \, d\phi + \frac{1}{2} Q(\sqrt{SNR_d}) = Q(\sqrt{SNR_d}) \]

when \( SNR_c \to \infty \) (14)

5.3.1.2.3 Decision Circuit With Phase Offset

The analysis of the previous section assumed no constant phase offset at the decision circuit, that is, the mean of the phase jitter is zero \( (\mathbb{E}\phi = 0) \). When a constant phase offset exists, the BER formula can be obtained by changing the mean of the probability density function of the phase distribution. Therefore, it can be written as:

\[ BER = \frac{1}{2} \int_{-\pi}^{\pi} Q\left(\sqrt{SNR_d} \left(1 - \frac{|\phi|}{\pi}\right)\right) p_\phi(\phi - \theta) \, d\phi + \frac{1}{2} Q(\sqrt{SNR_d}) \]  \hspace{1cm} (15)

where \( \mathbb{E}\phi = \theta \) is the mean of the phase distribution.

When \( SNR_c \) is large, the formula can be approximated by:

\[ BER = \frac{1}{2} Q\left(\sqrt{SNR_d} \left(1 - \frac{|\theta|}{\pi}\right)\right) + \frac{1}{2} Q(\sqrt{SNR_d}) \]

when \( SNR_c \to \infty \) (16)

Therefore, some penalty exists compared with the optimal receiver due to the imperfect sampling position.
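
The effect of clock jitter and phase offset on the BER can be explored with the Python sketch below. It rests on an assumed form of the BER integral, stated in the docstring: half of the bits see a sampling margin reduced by the factor $(1 - |\phi|/\pi)$ and the other half are unaffected. The phase density is the standard one for a sinusoid in narrowband Gaussian noise, and all function names are illustrative.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phase_pdf_exact(phi, snr_c):
    """Exact phase density; periodic in phi, so shifted arguments are handled."""
    c, s = math.cos(phi), math.sin(phi)
    return (math.exp(-snr_c) / (2.0 * math.pi)
            + 0.5 * math.sqrt(snr_c / math.pi) * c
            * math.exp(-snr_c * s * s) * math.erfc(-math.sqrt(snr_c) * c))


def ber_with_jitter(snr_d, snr_c, offset=0.0, n=20000):
    """Assumed model:
    BER = 0.5 * int_{-pi}^{pi} Q(sqrt(snr_d) * (1 - |phi|/pi)) p(phi - offset) dphi
          + 0.5 * Q(sqrt(snr_d)),
    evaluated with the midpoint rule.
    """
    root = math.sqrt(snr_d)
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = -math.pi + (i + 0.5) * h
        margin = 1.0 - abs(phi) / math.pi
        total += q_function(root * margin) * phase_pdf_exact(phi - offset, snr_c)
    return 0.5 * h * total + 0.5 * q_function(root)


snr_d = 10 ** 1.5  # 15 dB
print("ideal BER:", q_function(math.sqrt(snr_d)))
for snr_c_db in (10, 13, 40):
    print(f"SNR_c = {snr_c_db} dB:", ber_with_jitter(snr_d, 10 ** (snr_c_db / 10)))
print("13 dB with 0.3 rad offset:", ber_with_jitter(snr_d, 10 ** 1.3, offset=0.3))
```

Under these assumptions the BER approaches the ideal value as $SNR_c$ grows and worsens when either the jitter or the constant offset increases.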

5.3.1.3 Performance Analysis and Design Consideration

5.3.1.3.1 Bit Error Rate (BER)

The ultimate performance measure of a digital system is the bit-error-rate (BER). Figure 5.18 shows numerical results of BER versus $SNR_d$ for different values of $SNR_c$, computed with both the Gaussian distribution and the exact phase distribution. When $SNR_c$ is large, both distributions converge to the jitter-free optimal receiver of eqn (14). We will use this system as a baseline for the power penalty calculation in the next section. When $SNR_c$ decreases, the BER increases because of the phase jitter in the recovered clock signal. The Gaussian distribution is seen to underestimate the BER when $SNR_c$ is small. This is due to the difference in the tails of the Gaussian distribution and the exact phase distribution explained in Section 5.3.1.2. The most interesting result is the existence of a BER floor when $SNR_c$ is finite. That is, no matter how much the power of the payload data ($SNR_d$) is increased, the bit error rate cannot be improved further. The BER floor is caused by the phase jitter and can be explained by looking at eqn (7) and eqn (13). Eqn (7) shows that, if $SNR_c$ is finite, there is a finite probability that the phase $\phi$ lies near $\pi$ or $-\pi$. Eqn (13) shows that when $\phi$ approaches $\pi$ or $-\pi$, the BER cannot improve much. Although a BER floor exists for any finite $SNR_c$, the floor can be lowered by increasing $SNR_c$. For example, with $SNR_c$ equal to 10 dB, the BER floor is about $10^{-7}$. When $SNR_c$ increases to 13 dB, the BER floor decreases to $10^{-12}$. For a system designer, the design issue is therefore to use the minimum $SNR_c$ that achieves a BER floor below the specification, rather than to avoid the BER floor.
Figure 5.18: BER vs. SNRd for different SNRc.

5.3.1.3.2 Power Penalty

The power penalty is defined as the extra power needed to achieve the same bit error rate as the optimal system. In our case, the optimal system is defined as a system with a perfect clock signal (no phase offset or jitter). The BER of the optimal system is then the same as the curve in Figure 5.18 when $SNR_c$ approaches infinity. The power penalty can be calculated numerically by using eqn (7) and eqn (13).

Now we investigate the optical power penalty of the embedded clock transport system. The optical power penalty is defined as:

$$\text{optical power penalty} = 10 \log \frac{P'_r}{P_{r\text{-signal}}} = 10 \log \frac{I'_d + 2I'_c}{I_d} = 10 \log \frac{A'_d + 2A'_c}{A_d}$$ (17)
The $A_d$ ($I_d$) in the denominator represents the optical power needed for the payload data in the optimal system (we assume that no power is needed for the clock). The $A'_d$ ($I'_d$) and $A'_c$ ($I'_c$) represent the optical power for the payload data and the clock signal, respectively, in the embedded clock transport system.

Eqn (17) can be reduced to:

$$\text{optical power penalty} = 10 \log_{10} \frac{\sqrt{98 \times 10^{\alpha/10}} + 2 \sqrt{2 \beta B_{eff} T}}{\sqrt{98}}$$ (18)

where $\beta = SNR'_c$ and $\alpha$ is the power penalty (in dB) due to finite $SNR'_c$.

The first term in the numerator stands for the optical power increase ($\alpha > 0$) of the payload data needed to compensate the power penalty due to the finite $SNR'_c$ in the embedded clock transport system. The second term stands for the optical power needed for the clock signal to achieve the corresponding $SNR'_c$. Increasing $\beta$ ($SNR'_c$) will reduce $\alpha$ and vice versa; thus, there is a trade-off relationship between $\alpha$ (the power penalty) and $\beta$ ($SNR'_c$). For each $B_{eff} T$ (normalized clock filter bandwidth), we can therefore minimize the optical power penalty by optimizing the tradeoff between $\alpha$ and $\beta$.

Another way of looking at eqn (18) is to fix $\beta$ ($SNR'_c$) and plot the optical power penalty versus $B_{eff} T$ (normalized clock filter bandwidth). Fixing $\beta$ also means fixing the phase distribution according to eqn (7). Figure 5.19 shows the optical power penalty versus normalized clock filter bandwidth when $SNR'_c = 13.08$ dB and $SNR'_c = 22$ dB. Since $SNR'_c$ is fixed, when $B_{eff} T$ decreases, less optical power is needed for the clock signal to keep the same $SNR'_c$. The optical power penalty caused by the clock signal is therefore reduced. When $B_{eff} T$ approaches zero, the only optical power penalty is the increased optical power of the payload data needed to compensate the electrical power penalty ($\alpha$) due to the finite $SNR'_c$, which is:

$$10 \log_{10} \frac{\sqrt{98 \times 10^{\alpha/10}} + 2 \sqrt{2 \beta B_{eff} T}}{\sqrt{98}} \rightarrow \frac{\alpha}{2} \text{ (dB)} \quad \text{when} \ B_{eff} T \rightarrow 0$$ (19)

When $SNR_c'$ is very small, the dominant optical power penalty comes from $\alpha/2$ (the first term in eqn (18)). Therefore, a high-$SNR_c'$ design (low phase jitter) gives better performance. However, when the clock filter bandwidth increases, a high-$SNR_c'$ design requires a large optical power for the clock signal to combat the increasing noise, and this often becomes the dominant power penalty. In that case, a lower $SNR_c'$ (i.e., larger phase jitter) reduces the optical power needed for the clock signal. The price is a higher $\alpha$: more optical power is needed for the payload data to combat the increased phase jitter. As long as the optical power saved on the clock signal is larger than the optical power added to the payload data, the total optical power penalty decreases. Figure 5.19 shows that the optical power penalty is minimized by operating on the lower envelope of the two curves. The optimal optical power penalty can be obtained by plotting the curves for all possible $(\alpha,\beta)$ and taking their lower envelope. The result is shown in Figure 5.20. It shows that the power penalty is less than 1.5 dB when $B_{eff}T$ is less than 0.05.
Figure 5.19: Power penalty vs. normalized clock filter bandwidth.
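
The trade-off between $\alpha$ and $\beta$ can be explored with a small Python sketch of the penalty expression. It assumes the form $10\log_{10}\{[\sqrt{98 \times 10^{\alpha/10}} + 2\sqrt{2\beta B_{eff}T}]/\sqrt{98}\}$, and it interprets the constant 98 as twice the data SNR of 49 needed for a BER of about $10^{-12}$; that interpretation, the function name and the example value of $\alpha$ are assumptions.

```python
import math


def optical_power_penalty_db(alpha_db, beta, beff_t, two_snr_d=98.0):
    """Optical power penalty in dB (assumed form of the penalty expression).

    alpha_db  : electrical penalty due to finite clock SNR, in dB
    beta      : clock SNR (linear, not dB)
    beff_t    : normalized clock filter bandwidth B_eff * T
    two_snr_d : constant 98 from the text, assumed to equal 2 * SNR_d
    """
    data_term = math.sqrt(two_snr_d * 10 ** (alpha_db / 10.0))
    clock_term = 2.0 * math.sqrt(2.0 * beta * beff_t)
    return 10.0 * math.log10((data_term + clock_term) / math.sqrt(two_snr_d))


beta = 10 ** 1.3   # clock SNR of 13 dB
alpha_db = 1.0     # hypothetical electrical penalty, for illustration only
for beff_t in (0.0, 0.01, 0.05, 0.1):
    print(beff_t, optical_power_penalty_db(alpha_db, beta, beff_t))
```

At zero bandwidth the function returns $\alpha/2$, and the clock-power term then grows with the square root of $B_{eff}T$.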

Figure 5.20 seems to suggest that a narrow-bandwidth filter is better, since the optical power penalty is small; in particular, the penalty goes to zero when $B_{\text{eff}}T$ goes to zero. This is not true in a real system, for two reasons. First, the assumption of a perfect clock source is not valid in the real world. The clock usually contains some intrinsic phase noise due to the finite Q factor of the resonator in the clock generator, so the clock has a finite linewidth. The clock filter's bandwidth must be several times larger than this linewidth to avoid a substantial penalty. Besides the phase noise, all clock generators exhibit some frequency drift due to temperature variations and other factors, and the bandwidth of the clock filter must be wide enough to accommodate it. It is also known from filter design that, because of manufacturing limitations, the narrower the bandwidth, the larger the insertion loss, which also limits how narrow the clock filter can be. The second, and in a packet-switching network perhaps more important, reason is that the clock recovery time is inversely proportional to the bandwidth of the filter. For faster clock recovery, we need to choose a wider clock filter bandwidth. The relationship between the clock recovery time and the bandwidth of the clock filter is discussed in Section 5.3.1.4.

5.3.1.3.3 Crosstalk Estimation

Up to now we have not considered the crosstalk from the payload data into the clock receiver (the last term of eqn (5)). This is a reasonable approximation as long as the clock filter bandwidth is small enough. In this section, we estimate the effect of this crosstalk. As an approximation, we assume the crosstalk can be modeled as an equivalent noise. The new optical power penalty, including the payload data crosstalk, can therefore be written as:

$$
\text{optical power penalty} = 10 \log_{10} \left[ \frac{A_{d}' + 2A_{c}'}{A_{d}} \right] = 10 \log_{10} \left[ \frac{\sqrt{98 \times 10^{\alpha/10}} + 2 \sqrt{2 \beta B_{eff} T + \frac{98 \beta}{3} \times 10^{\alpha/10} (B_{eff} T)^3}}{\sqrt{98}} \right]
$$

for \( B_{eff} T < 0.2 \) \hspace{1cm} (20)

The first term in the numerator of eqn (20) is still the optical power of the payload data needed to compensate the penalty caused by the finite \( SNR'_c \). The second term is the power needed for the transmission of the clock signal. Compared with eqn (18), however, it now contains two items under the square root: the first is the optical power needed to combat the noise, and the second is the optical power needed to combat the crosstalk from the payload data. Figure 5.20 shows the optical power penalty versus \( B_{eff} T \) after the optimization discussed in the previous section. The results both with payload data crosstalk (eqn (20)) and without crosstalk (eqn (18)) are plotted. It can be seen that the effect of the payload data crosstalk is less than 1 dB for \( B_{eff} T \) less than 0.2. It is almost negligible when \( B_{eff} T \) is less than 0.05, which is the normal operating range.
Figure 5.20: The optical power penalty vs. normalized clock filter bandwidth.
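
The growth of the crosstalk contribution with bandwidth can be illustrated with the Python sketch below. It assumes that crosstalk adds a term proportional to $10^{\alpha/10}(B_{eff}T)^3$ under the clock square root. The coefficient is deliberately left as a parameter, `k_xt`, so the sketch does not depend on one particular value; the example value `98 * beta / 3` is one reading of the coefficient in eqn (20) and should be treated as an assumption, as should the value of $\alpha$.

```python
import math


def penalty_with_crosstalk_db(alpha_db, beta, beff_t, k_xt, two_snr_d=98.0):
    """Optical power penalty in dB with a cubic crosstalk term (assumed form).

    k_xt is the crosstalk coefficient multiplying 10**(alpha/10) * (B_eff*T)**3;
    k_xt = 0 recovers the crosstalk-free penalty.
    """
    gain = 10 ** (alpha_db / 10.0)
    data_term = math.sqrt(two_snr_d * gain)
    clock_term = 2.0 * math.sqrt(2.0 * beta * beff_t + k_xt * gain * beff_t ** 3)
    return 10.0 * math.log10((data_term + clock_term) / math.sqrt(two_snr_d))


beta = 20.0                  # clock SNR of about 13 dB
alpha_db = 1.0               # hypothetical electrical penalty
k_xt = 98.0 * beta / 3.0     # assumed coefficient, see lead-in
for beff_t in (0.05, 0.1, 0.2):
    with_xt = penalty_with_crosstalk_db(alpha_db, beta, beff_t, k_xt)
    without_xt = penalty_with_crosstalk_db(alpha_db, beta, beff_t, 0.0)
    print(beff_t, with_xt - without_xt)
```

Because the crosstalk term scales with the cube of $B_{eff}T$ while the noise term scales linearly, the extra penalty is small at narrow bandwidths and grows quickly as the filter is widened.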

5.3.1.4 The Clock Recovery Time Analysis

One potential advantage of using embedded clock transport in a packet-switching network is the short clock recovery time. In the previous power penalty analysis, we assumed that the clock signal has reached the steady state and ignored the transient behavior caused by the finite packet duration $T_p$. In this section, we focus on the transient behavior, which determines the clock recovery time. When a packetized clock waveform enters a bandpass filter, the output waveform is the convolution of the input waveform with the impulse response of the bandpass filter. The clock signal cannot be used for the decision until it reaches the steady state. Therefore, the clock recovery time depends on the impulse response of the filter as well as on the definition of the steady state. Figure 5.21 shows the input and output packetized clock waveforms for a second-order Butterworth bandpass filter.

Assume the bandpass clock filter (centered at $f_o$) is given by

$$h_{BPF}(t) = 2h_{eq}(t) \cos(2\pi f_o t)$$

(21)

where $h_{eq}(t)$ is the equivalent low-pass filter. The convolution of the packetized clock signal and the bandpass filter can therefore be approximated by

$$h_{BPF}(t) * \left[ A_c \cos(2\pi f_o t) \, \Pi \left( \frac{t}{T_p} \right) \right] = [2h_{eq}(t) \cos(2\pi f_o t)] * \left[ A_c \cos(2\pi f_o t) \, \Pi \left( \frac{t}{T_p} \right) \right]$$

$$\approx \left[ A_c h_{eq}(t) * \Pi \left( \frac{t}{T_p} \right) \right] \cos(2\pi f_o t)$$

(22)

Eqn (22) shows how to calculate the transient behavior at the output of the clock filter. The first term (in brackets) represents the envelope of the output clock signal and the second term is the actual clock signal. The envelope term determines how long it takes for the clock signal to reach the steady state. Thus, the clock filter adds its own contribution to the output waveform. The narrower the bandwidth of the filter, the longer the time needed to reach the steady state, because of the reciprocal relation between the time and frequency domains. We define the clock recovery time as the time needed for the output waveform (amplitude) to grow from 1% of the final value to 99% of the final value. This is quite a strict definition, since a 1% amplitude deviation corresponds to a power deviation of -40 dB from the steady-state value. Notice that only the rising edge is used in the definition of the clock recovery time. Although exactly the same time is needed for the falling edge, in principle the falling edge can overlap with the rising edge of the following packet, since the clock filter is a linear system. This is shown in Figure 5.21.
Figure 5.21: Clock waveforms at the input and output of the clock filter.

We assume that a Butterworth filter is used because of its wide availability in microwave filter design. Given the filter response and the definition of steady state, we can calculate the clock recovery time (in bits, i.e., normalized to the bit duration) versus the normalized bandwidth of the clock filter. Figure 5.22 shows some of the results for different orders of Butterworth filter. The clock recovery time is inversely proportional to the bandwidth of the clock filter. Higher-order filters give slower recovery since the high-frequency components are attenuated more. Higher-order filters also have a larger group delay, which means that extra delay must be added to the payload data path so that the payload data and the clock signal are synchronized at the decision circuit. Therefore, high-order filters are less desirable for the clock filter.
Figure 5.22: Clock recovery time vs. normalized clock filter bandwidth.

Figure 5.20 and Figure 5.22 give the overall picture of the tradeoff between the power penalty and the clock recovery time. With a second-order Butterworth filter, clock recovery takes only 22 bits when $B_{\mathrm{eff}}T$ equals 0.05, which is much faster than any PLL. According to Figure 5.20, this corresponds to a 1.5 dB power penalty. For $B_{\mathrm{eff}}T$ equal to 0.03, 40 bits are needed for clock recovery, but the power penalty reduces to about 1.2 dB.
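
The recovery time of the second-order case can be estimated with the Python sketch below. It is a sketch under stated assumptions, not the calculation used for Figure 5.22: the envelope is taken as the unit-step response of the equivalent low-pass filter (eqn (22)), that filter is a second-order Butterworth low-pass, and $B_{eff}$ is interpreted as the noise-equivalent bandwidth of the bandpass filter, i.e., twice the noise bandwidth of the low-pass equivalent. The function names are illustrative.

```python
import math


def step_response_bw2(x):
    """Unit-step response of a 2nd-order Butterworth low-pass, with x = w_c * t / sqrt(2)."""
    return 1.0 - math.exp(-x) * (math.cos(x) + math.sin(x))


def first_crossing(level, dx=1e-4, x_max=20.0):
    """Smallest x (on a grid of step dx) at which the step response reaches `level`."""
    x = 0.0
    while x < x_max:
        if step_response_bw2(x) >= level:
            return x
        x += dx
    raise ValueError("level not reached")


def recovery_time_bits(beff_t):
    """1%-to-99% envelope rise time in bit periods for normalized bandwidth B_eff * T."""
    # Noise bandwidth of a 2nd-order Butterworth low-pass relative to its cutoff f_c.
    noise_factor = (math.pi / 4.0) / math.sin(math.pi / 4.0)
    fc_t = beff_t / (2.0 * noise_factor)                 # f_c * T of the low-pass equivalent
    dx = first_crossing(0.99) - first_crossing(0.01)
    wc_t_rise = math.sqrt(2.0) * dx                      # w_c * t_rise
    return wc_t_rise / (2.0 * math.pi * fc_t)            # t_rise / T


for beff_t in (0.03, 0.05, 0.1):
    print(beff_t, recovery_time_bits(beff_t))
```

Under these assumptions the estimate at $B_{eff}T = 0.05$ is close to the 22 bits quoted above, and it scales inversely with the bandwidth.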

5.3.2 Experiment Results: Embedded Clock Transport

The design and implementation of the clock transceiver, together with the payload data and control header, have been described in the previous progress report. We successfully demonstrated the simultaneous transmission and detection of the 2.5 Gb/s payload data, clock signal, and 80 Mb/s control header over the fiber channel. The main experimental effort in this progress report is to confirm the theoretical predictions of the previous section and to optimize the system's performance when the payload data and control headers are transmitted simultaneously.

5.3.2.1 Experimental Results: Jittering Measurements

The experimental setup is shown in Figure 5.23. The clock signal is generated by the BER tester clock and then combined with the payload data to modulate the laser. At the receiver, the clock filter is used to extract the clock signal. The filter used in the experiment has a noise-equivalent bandwidth of about 75 MHz. It is a 5th-order Chebyshev bandpass filter with 0.05 dB ripple, chosen because of its availability. The clock signal is fed into a spectrum analyzer to observe the spectrum of the clock tone and the noise floor, from which the signal-to-noise ratio of the clock tone is measured. The noisy clock signal is also fed into a high-speed digital oscilloscope for the jitter measurement. Figure 5.24 (a) shows the jitter measurement results with a relatively high SNRc = 30 dB; the resulting jitter is about 2.5 ps (2.3°). The histogram is shown in the bottom part of the graph. Figure 5.24 (b) shows the jitter measurement results with SNRc = 18 dB; the resulting jitter is about 6.8 ps (6.1°). The histogram is also shown in the bottom part of the graph. Figure 5.25 shows the measured clock phase jitter versus SNRc. The solid line is the result calculated according to eqn (7). Since the clock at the transmitter has a fixed jitter of about 2.1 ps, the calculated results in Figure 5.25 are modified to account for the source jitter. The experimental results match the theoretical results very well in the high SNRc region. In the low SNRc region, the error is still within 1 dB.
Figure 5.23: Experimental setup for clock phase jittering measurement.

Figure 5.24: The histograms of the filtered clock waveforms.
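
The jitter figures can be cross-checked with a short Python sketch. It converts picoseconds to degrees at the 2.5 GHz clock frequency and predicts the RMS jitter from $SNR_c$. Two simplifications are assumed here: the Gaussian approximation $\sigma_\phi = \sqrt{1/(2SNR_c)}$ is used instead of the exact distribution of eqn (7), and the 2.1 ps source jitter is taken to be independent and combined in quadrature. The function names are illustrative.

```python
import math

CLOCK_HZ = 2.5e9  # embedded clock frequency


def ps_to_degrees(jitter_ps, clock_hz=CLOCK_HZ):
    """Convert a timing jitter in picoseconds to degrees of clock phase."""
    return jitter_ps * 1e-12 * clock_hz * 360.0


def predicted_jitter_ps(snr_c_db, source_jitter_ps=2.1, clock_hz=CLOCK_HZ):
    """RMS jitter from the Gaussian approximation, combined with the source jitter."""
    snr_c = 10 ** (snr_c_db / 10.0)
    sigma_rad = math.sqrt(1.0 / (2.0 * snr_c))
    noise_ps = sigma_rad / (2.0 * math.pi * clock_hz) * 1e12
    return math.hypot(noise_ps, source_jitter_ps)


print(ps_to_degrees(2.5), ps_to_degrees(6.8))
print(predicted_jitter_ps(30.0), predicted_jitter_ps(18.0))
```

Under these assumptions the prediction at 30 dB is close to the measured 2.5 ps, and the prediction at 18 dB falls somewhat below the measured 6.8 ps, in line with the larger discrepancy reported for the low-SNRc region.
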
5.3.2.2 Experimental Results: Clock Recovery Time Measurements

The experimental setup for measuring the clock recovery time is shown in Figure 5.26. The clock tone is generated by the VCO (voltage-controlled oscillator) used in the CORD experiment to drive the payload logic. The 2.5 GHz clock signal is fed into an electrically controlled switch, which truncates the continuous waveform into a packetized clock waveform. The switch is controlled by a low-speed external square wave, which is also used as the trigger signal for the oscilloscope. The packetized clock waveform is then fed into the oscilloscope. Because the trigger waveform and the 2.5 GHz clock are not synchronized, the oscilloscope can only observe the envelope of the packetized clock. This is sufficient for the measurement of the clock recovery time. Figure 5.27 and Figure 5.28 show the input and output waveforms of the clock filter when the packet length is 250 ns (the actual packet length used in CORD). It can be seen that the clock is recovered relatively quickly after passing through the clock filter. To get a more accurate estimate of the clock recovery time, we sent a shorter 70 ns packet. The results are shown in Figure 5.29 and Figure 5.30. Figure 5.29 is the input packetized clock waveform with a duration of 70 ns. Figure 5.30 is the output clock waveform after the filter. It shows that the clock recovery time is about 16 ns to 20 ns, which is equivalent to about 40 to 50 bits. Since the clock filter is a 5th-order filter with an equivalent noise bandwidth of 75 MHz, the product $B_f T$ is 0.03 ($1/B_f T$ = 33.3). Comparing this result with the theoretical results shown in Section 5.3.1.4, we can see that it matches very well.
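
As a quick check of the figures in this paragraph, the sketch below converts the measured recovery time into bit periods and computes the ratio of the filter noise bandwidth to the bit rate. It assumes the 2.488 Gb/s payload rate from Section 6; the names are illustrative only.

```python
BIT_RATE_BPS = 2.488e9       # payload bit rate (assumed from Section 6)
FILTER_NOISE_BW_HZ = 75e6    # equivalent noise bandwidth of the clock filter


def ns_to_bits(duration_ns: float, bit_rate_bps: float = BIT_RATE_BPS) -> float:
    """Number of bit periods contained in a duration given in nanoseconds."""
    return duration_ns * 1e-9 * bit_rate_bps


# Filter bandwidth multiplied by the bit period, i.e. bandwidth / bit rate.
bandwidth_bit_period_product = FILTER_NOISE_BW_HZ / BIT_RATE_BPS

for recovery_ns in (16, 20):
    print(f"{recovery_ns} ns is about {ns_to_bits(recovery_ns):.0f} bits")
print(f"bandwidth x bit period = {bandwidth_bit_period_product:.3f}")
```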

![Diagram](image)

*Figure 5.26: Experiment setup for measuring clock recovery time.*

![Waveform](image)

*Figure 5.27: Clock at the input of the clock filter with 250 ns duration.*
Figure 5.28: Clock at the output of the clock filter with 250 ns duration.

Figure 5.29: Clock at the input of the clock filter with 70 ns duration.
5.3.3 Penalty of Pilot Clock Tone on Payload Data

For CORD, 89% of the optical power is devoted to data transmission and 11% to the clock tone, yielding a 2 dB SNR penalty with respect to the ideal case when all the optical power is assigned to the payload. This result corresponds to a direct-detection optical penalty of 1 dB. In Figure 5.31, the receiver sensitivity (corresponding to BER=10^-9) is plotted for different pilot-tone modulation depths, m. The optimal receiver sensitivity is at about -16.5 dBm when the pilot-tone modulation depth is about 11%.

The optimal sensitivity is a balance between two effects: the clock jitter caused by noise and interference, and the optical power allocated to the pilot-tone transmission. When the modulation depth decreases below 11%, the power penalty increases dramatically and bursts of errors start to appear because the clock jitter becomes too large to sample the data properly. The system fails when the modulation depth drops below 9% due to excessive clock jitter. When the modulation depth is larger than 10%, the power penalty is dominated by the power allocated to the pilot-tone transmission. When the modulation depth increases to 50%, the power penalty is 3.3 dB; 3 dB is due to only 50% of the power being allocated to the data, and 0.3 dB is caused by the crosstalk between the incompletely suppressed pilot-tone and the data channel. When the modulation depth increases beyond 50%, the penalty grows faster because the incompletely suppressed pilot-tone seriously distorts the data.

In summary, our embedded clock transport is fairly simple and robust, and is capable of recovering the clock in a few tens of bits with an optical power penalty of only 1 dB.

![Graph](image)

*Figure 5.31: Receiver sensitivity penalty due to clock tone.*

### 5.4 Timing Budget

In optical packet switched networks, the utilization of the channel capacity can be greatly reduced if a large amount of time, relative to a packet time-length, is allocated to clock recovery and allowed for switching time and slot jitter. In the CORD testbed, we have optimized our synchronization methods and achieved a maximum channel utilization of 68%.

In addition to the slot jitter and payload clock recovery time addressed in earlier sections, the switching time of the optical switches, data framing synchronization, and delay-line variations must also be considered in the CORD timing budget. Framing synchronization is required because the continuity of the signal is disrupted when packets are switched. For frame recovery, we are using an STS-3 SONET frame recovery circuit which requires a minimum of 6 bytes for recovery. The switching time of the LiNbO$_3$ optical switches used in CORD is 4 ns.

The components of the 250 ns CORD slot are summarized in Figure 5.32. The ATM cell itself, 53 bytes, takes up 170 ns of the slot. The remaining 80 ns are necessary for: a) a 4 ns buffer for the switch transitions; b) 20 ns for frame recovery; c) 16 ns for clock recovery; and d) 13 ns for slot jitter. The additional 27 ns are further padding to account for variance between delay-line lengths and to provide an additional safeguard. Thus, the switching time of the switch is only one of many elements reducing the utilization of the channel.

![Figure 5.32: Components of 250 ns slot.](image-url)
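
The slot budget above can be verified with simple arithmetic. The sketch below is only an illustrative check: the dictionary keys and function name are not from the report, and the 2.488 Gb/s line rate is taken from Section 6.

```python
BIT_RATE_BPS = 2.488e9   # payload bit rate (assumed from Section 6)
SLOT_NS = 250


def duration_ns(n_bytes: int) -> float:
    """Transmission time of n_bytes at the payload bit rate, in nanoseconds."""
    return n_bytes * 8 / BIT_RATE_BPS * 1e9


# Slot components as listed in the text, in nanoseconds.
slot_budget_ns = {
    "ATM cell (53 bytes)": 170,
    "switch transition": 4,
    "frame recovery": 20,
    "clock recovery": 16,
    "slot jitter": 13,
    "padding": 27,
}

utilization = slot_budget_ns["ATM cell (53 bytes)"] / SLOT_NS

print(f"53-byte cell: {duration_ns(53):.1f} ns")
print(f"6-byte frame recovery minimum: {duration_ns(6):.1f} ns")
print(f"budget total: {sum(slot_budget_ns.values())} ns")
print(f"channel utilization: {utilization:.0%}")
```

The 53-byte cell occupies about 170 ns, the 6-byte framing minimum fits within the 20 ns allowance, and the components sum to the 250 ns slot, giving the 68% utilization quoted above.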
6. Packet Generator and Tester

The fundamental purpose of the data packet generation and recovery logic of the CORD testbed is to control the CRO and to measure the performance of the network. The traffic generation and recovery logic performs the generation and recovery of both data and control channel traffic. High-level block diagrams of the transmitter and receiver electronic logic are shown in Figures 6.1 and 6.5, respectively. A description of each block's function, implementation, and current status is given in the following subsections. The data packet generation and recovery logic operates at 2.488 Gb/s. The 2.488 Gb/s serial bit stream is converted from/to a parallel format so that the generation and recovery logic can run at a clock rate of 311 MHz. Since the control channel traffic has a bit rate of 80 Mb/s, its generation and recovery logic runs directly from an 80 MHz clock. A digital oversampling technique has been designed to allow the control channel recovery logic to recover the header data phase within a few bits. This topic and the packet slot synchronization were covered in the synchronization section of this report.
Figure 6.1: Transmitter electronic logic block diagram.

6.1 Transmitter Logic Subsystem

The eleven functional blocks of the transmitter logic subsystem, shown in Figure 6.1, are divided into three groups: (1) control; (2) payload data; and (3) common. Those associated with the 80 Mb/s control channel portion are: (1) Destination Selector; (2) Statistics Collector and Display; (3) Control Channel Transmitter; (4) Frequency Shift Key (FSK) Modulator; and (5) 80 MHz clock source. The blocks associated with the 2.488 Gb/s payload data channel are: (1) Payload Data Generator; (2) 64-to-8 Multiplexer; (3) 8-to-1 Multiplexer; and (4) 2.488 GHz clock source. The two remaining blocks are: (1) Packet Generator and Ping Transmitter; and (2) Signal Combiner. These are used for the transmission of both the control and payload data channels. An explanation of each functional block is given in the following sub-sections.

6.1.1 Packet Generator and Ping Transmitter

The operation of the CORD testbed relies on the synchronization of slots between nodes. Slot synchronization is achieved with the transmission of a PING signal on the control channel at the beginning of each 250 ns slot. The transmitted PING signal is generated by the Control Channel Transmitter; but it is the Packet Generator and Ping Transmitter that controls when the PING is transmitted. For the master node of the CORD testbed, the PING transmission is dependent only on the local 4 MHz slot oscillator. The slave node of the testbed adjusts its 4 MHz slot oscillator to match the master node's. This frequency and phase tracking is controlled by the PING PLL in the receiver logic subsystem.

Although both CORD nodes transmit a PING at the start of every slot to simulate actual traffic, not every slot will contain payload data. It is the packet generator that determines which slots will contain payload data and which will not. The packet generator we built can operate in one of two possible modes. In the first mode, payload data is sent in every slot. In the second mode, a pseudo-random generator is used to provide a 1/3 probability that a slot will not contain payload data. Empty and full slots occur with an even distribution.

The logic circuits for the packet generator and PING transmitter are implemented in an AMD MACH 210 programmable device which is driven by an 80 MHz clock. The 80 MHz clock also drives the destination selector, control channel transmitter, and statistics collector and display modules so that each of these modules is easily synchronized. The payload data generator is driven at 77.75 MHz, which is derived by dividing the 2.488 GHz clock by 32. Although this allows the packet generator and payload data generator to be up to 13 ns out of phase, it ensures synchronization between the payload data generator and the payload multiplexer modules.
6.1.2 Signal Combiner

The addition of the payload data, 2.488 GHz clock tone, and control channel signals is an involved task, requiring highly selective microwave filters, attenuators, and couplers. A detailed description and analysis of the signal combiner module, as well as experimental results from it, have been given in earlier sections of this report.

6.1.3 Control Transmitter Modules

6.1.3.1 Destination Selector

The function of the destination selector module of the transmitter electronic logic is similar to that of the packet generator. While the packet generator determines which slots should contain payload data, the destination selector determines the node to which each slot carrying payload data should be sent. Since CORD is a broadcast-and-select local area network, every slot is transmitted to all nodes. Therefore, the payload data generator and transmitter do not require input from the destination selector module. Our implementation of the destination selector can be set to operate in one of three possible modes. In the first mode, the destination is selected randomly between nodes A and B, with equal probability. In the second mode, the destination is fixed to node A. Similarly, in the third mode, all payload data packets are destined for node B. The algorithm for determining destinations of packets in the first mode is the one most commonly used in network simulations.

6.1.3.2 Transmitter Statistics Collector and Display

To keep track of the number of payload packets transmitted, three counters are displayed at each transmitter. The first two counters correspond to the number of packets transmitted to node A and node B, respectively. The third counter corresponds to the number of empty slots (that is, the number of slots in which no payload data was transmitted). Each counter is 32 bits long, which gives a maximum count of about 4.29 billion. At the slot rate of 4 MHz, a counter therefore takes more than 17 minutes to wrap around.
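
The wrap-around time follows directly from the counter width and the slot rate. The sketch below reproduces the arithmetic for the worst case in which a single counter is incremented in every slot; the constant names are illustrative.

```python
COUNTER_BITS = 32
SLOT_RATE_HZ = 4e6   # one 250 ns slot every 1/4 MHz

max_count = 2 ** COUNTER_BITS                # 4,294,967,296
wrap_seconds = max_count / SLOT_RATE_HZ      # fastest possible wrap-around
wrap_minutes = wrap_seconds / 60

print(f"maximum count: {max_count:,}")
print(f"minimum wrap-around time: {wrap_minutes:.1f} minutes")
```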

6.1.3.3 Control Channel Transmitter

The control channel transmitter combines the PING signal, a four bit preamble pattern used by the DPA at the receiver for clock recovery, and the destination address. It then transmits the entire signal during one 250 ns slot. The format of the control channel information has been described in a previous section and is summarized here in Figure 6.2. We have two control channel transmitters operating, one for each CORD testbed node.

![Figure 6.2: Format of the control channel slot.]

6.1.3.4 Frequency Shift Key (FSK) Modulator

To frequency-modulate the control channel signal onto its subcarrier (3 GHz for node A, 3.5 GHz for node B), a voltage controlled oscillator (VCO) is used. We are pleased with the frequency stability and SNR we have achieved with the FSK modulator as discussed in earlier sections.

6.1.3.5 80 MHz Clock

A crystal oscillator is used to generate the 80 MHz clock for the control transmitter logic circuits. As explained earlier, the 80 MHz clock drives the packet generator and PING transmitter, destination selector, control channel transmitter, and statistics collector and display modules. Therefore, each of these modules is synchronized and can be integrated onto a single PCB.
6.1.4 Payload Data Transmitter Modules

Encoding of the serial data is necessary to prevent long sequences of "ones" or "zeros". A long sequence of the same value causes the coupling capacitor to charge or discharge and thus reduces the decision threshold. Coding schemes are often used so that the data sequence meets the data transition density required by the system hardware. For CORD, the transmitter uses the ANSI X3T9.3 8B/10B encoding scheme (as used in the TAXI chipset). This scheme assures that, in the long run, just as many "zeros" have been transmitted as "ones". In our implementation, a table is used for generating AC balanced blocks for transmission.

6.1.4.1 Randomness of the Test Data

The major task of the data tester is to characterize the performance of the transmission link. To measure the BER of the link, random data patterns are needed. For CORD, N different pseudo-random sequences are used for transmission. Each sequence consists of blocks of 64-bit words. Each packet carries a packet ID, and the payload data generator transmits one of the N sequences depending on that ID. Each sequence represents a different permutation of the blocks stored in the generator; a list of random numbers is generated using the ID to select the corresponding blocks for the packet. As explained in section 6.2.2.3, this strategy allows for a simpler design of the bit error check module.

6.1.4.2 Payload Data Generator

The payload data generator is driven at a 78 MHz clock frequency. This clock is generated by dividing the 2.488 GHz clock used for the serial payload data transmission. The payload data generator consists of CMOS programmable logic devices (PLD) and electrically erasable programmable read-only memory (EEPROM). The block diagram of the payload data generator is shown in Figure 6.3.
The data generator consists of a Tx Controller and EEPROMs.

1. Tx Controller: This is implemented in a MACH210. Its functions are to:
   
   (a) receive the signal TX_DATA from the packet generator module and start the transmission of a packet;
   
   (b) generate a random ID for each packet;
   
   (c) for each ID, generate the correct address & control signals so that the corresponding packet is transmitted from the EEPROMs to the 64-to-8 multiplexer module.

If no payload data is to be transmitted during a slot, then the Tx Controller loads the idle pattern to the 64-to-8 multiplexer module. At the start of each slot with payload data, a start-of-packet pattern is first sent before the payload data. This start-of-packet pattern is used by the receiving node for byte alignment and to determine the beginning of the incoming payload data.

Although the payload data generator is driven at 78 MHz, the payload data itself is latched to the 64-to-8 multiplexer module at a rate of 39 MHz. That is, one 64-bit word is transmitted in each 39 MHz period. The purpose of operating the payload data generator twice as fast as the payload data is transmitted is twofold. First, it reduces the maximum delay between a PING signal and the start of the payload data within the slot by a factor of two, making it less than 13 ns. Secondly, the extra clock cycle between each transmitted 64-bit word drastically reduces the complexity of the logic required to generate the EEPROM addresses from the random ID.

2. EEPROMs: They are implemented with Cypress CY7C258, 2K×16 Reprogrammable State Machine PROMs. They are used to store blocks of pseudo-random bit sequences.

6.1.4.3 64-to-8 Multiplexer

The high-level block diagram of the 64-to-8 multiplexer module is shown in Figure 6.4. Critical to the 64-to-8 multiplexer module is synchronization to the 8-to-1 multiplexer module. The 311 MHz clock is divided by 4 and by 8 in order to provide 78 MHz and 39 MHz clocks, respectively, to the payload data transmitter module. By providing the 78 MHz clock to the payload data transmitter module, we reduce the jitter between the packet generator signal and the beginning of the payload data transmission from 25 ns to 12.5 ns.

Figure 6.4: Block diagram of 64-to-8 multiplexer.
6.1.4.4 8-to-1 Multiplexer

The 8-to-1 multiplexer consists of a Vitesse GaAs SONET multiplex chip, mounted on a special purpose PCB. The connections between the 8-to-1 multiplexer module and the 64-to-8 multiplexer are through coaxial cables with SMA connectors.

The area of primary concern is the phase between the 311 MHz clock produced by the 8-to-1 multiplexer's on-chip divide-by-8 clock generator and the clock generated for the 64-to-8 multiplexer module. By integrating digitally controlled variable delays into the clock distribution of the 64-to-8 multiplexer module, we are able to adjust the phase between the clocks manually. However, phase drift between the two clocks cannot be compensated. We have measured the rate of drift and found that the probability of significant drift occurring within an ATM packet is extremely low.

6.1.4.5 2.488 GHz clock source

To generate the 2.488 GHz clock for clock transmission and for the payload data transmitter, we are using a high performance VCO. We have measured the frequency stability of the VCO and found it to be sufficient for the CORD testbed. The 2.488 GHz clock is divided by 8 by the 8-to-1 multiplexer to generate a 311 MHz clock. This clock is then used by the 64-to-8 multiplexer where it is divided by 4 for use as a 77.75 MHz clock for the payload data generator module.
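
The clock division chain described in this section can be summarized numerically. The sketch below is illustrative only; the variable names are not from the report, and the final divide-by-8 stage is the 39 MHz word clock mentioned in the 64-to-8 multiplexer description.

```python
PAYLOAD_CLOCK_HZ = 2.488e9

byte_clock_hz = PAYLOAD_CLOCK_HZ / 8    # 311 MHz, from the 8-to-1 multiplexer
generator_clock_hz = byte_clock_hz / 4  # 77.75 MHz, payload data generator
word_clock_hz = byte_clock_hz / 8       # 38.875 MHz, one 64-bit word per cycle

for name, freq_hz in [
    ("byte clock", byte_clock_hz),
    ("generator clock", generator_clock_hz),
    ("word clock", word_clock_hz),
]:
    print(f"{name}: {freq_hz / 1e6:.3f} MHz, period {1e9 / freq_hz:.2f} ns")
```

The generator and word clock periods come out at about 12.9 ns and 25.7 ns, consistent with the rounded 12.5 ns and 25 ns jitter figures and the "less than 13 ns" bound quoted in the text.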

6.2 Receiver Logic Subsystems

The twelve functional blocks of the receiver logic subsystem, shown in Figure 6.5, are divided into two groups: (1) control; and (2) payload data. Those associated with the 80 Mb/s control channel portion are: (1) PING detector; (2) PING Phase Lock Loop (PLL); (3) Delay-Line Phase Alignment (DPA); (4) Comparator; (5) CRO Controller; and (6) Control Channel Receiver Statistics Collector and Display. The blocks associated with the 2.488 Gb/s payload data channel are: (1) 1-to-8 Demultiplexer; (2) 8-to-64 Demultiplexer; (3) Bit Error Checker; (4) Controller; (5) Payload Receiver Statistics Collector and Display; and (6) 38.875 MHz Clock Source. An explanation of each functional block is given in the following sub-sections.

![Block Diagram]

All Control Channel Rx. Logic is driven at 80 MHz from local clock source.

*Figure 6.5: Receiver electronic logic block diagram.*

### 6.2.1 Control Receiver Modules

#### 6.2.1.1 PING detector

The PING detector has been implemented using the multi-tap delay line outputs from the DPA module, a resistor network, and a threshold detector. We purchased custom-made multi-tap delay lines with a much smaller delay variance than those of standard commercial multi-tap delay lines.
6.2.1.2 PING Phase Lock Loop

The PING PLL was explained in the earlier section on distributed slot synchronization. The outputs of the PING detectors are fed to a MACH 210 which contains the logic of the PING alignment controller. The PING alignment controller selects the appropriate clock for the slave header transmitter and instructs the transmitter to add or drop a bit from a slot when required.

6.2.1.3 Delay-Line Phase Alignment

The clock recovery scheme of the control channel receiver, DPA, has been discussed in detail in the earlier synchronization section of this report. With DPA, the output of the control channel discriminator is quantized to TTL levels and applied to the input of a multitap delay-line. The outputs of the multitap delay-line provide several phase-shifted copies of the received signal. By sampling all multitap delay-line outputs in parallel, the DPA controller is able to determine the phase of the received bit stream. The DPA controller monitors a 4-bit 1010 preamble pattern, after which it selects the multitap delay-line output whose phase is closest to that of the receiver local oscillator. If a bit error is received during the 4-bit preamble, the DPA controller will not select an output tap for reception. When this occurs, a lock failure signal is generated by the DPA and the received header is discarded.

The DPA provides accurate clock recovery in 4 bits with control logic running at the received signal bit rate (80 MHz). Experimental results of the performance of the DPA were reported in the earlier section.
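
To illustrate the idea of selecting a delay-line tap from parallel samples, the following sketch models one possible selection rule. It is not the report's implementation: the number of taps (8), the tap spacing of one eighth of a bit, the use of a single clock edge, and the rule "choose the tap half a bit away from the observed transition" are all assumptions made for the example. Preamble bit errors and the lock failure signal are not modelled.

```python
import math

BIT_NS = 12.5               # bit period of the 80 Mb/s control channel
N_TAPS = 8                  # hypothetical number of delay-line taps
TAP_NS = BIT_NS / N_TAPS    # hypothetical tap spacing
PREAMBLE = [1, 0, 1, 0]


def tap_time(tap: int, phase_ns: float, clock_edge: int = 2) -> float:
    """Time into the preamble seen on `tap` at a local clock edge."""
    return clock_edge * BIT_NS - tap * TAP_NS - phase_ns


def sample_taps(phase_ns: float) -> list:
    """Sample every tap in parallel at one local clock edge."""
    return [PREAMBLE[math.floor(tap_time(tap, phase_ns) / BIT_NS)]
            for tap in range(N_TAPS)]


def select_tap(samples: list) -> int:
    """Pick a tap roughly half a bit away from the observed bit transition."""
    edge = N_TAPS - 1  # no disagreement seen: transition lies past the last tap
    for tap in range(N_TAPS - 1):
        if samples[tap] != samples[tap + 1]:
            edge = tap
            break
    return (edge + 1 + N_TAPS // 2) % N_TAPS


def offset_in_bit(tap: int, phase_ns: float) -> float:
    """Sampling position of `tap` within the bit period (0 = bit start)."""
    return tap_time(tap, phase_ns) % BIT_NS


for phase_ns in (0.0, 1.0, 3.0, 7.3, 12.0):
    chosen = select_tap(sample_taps(phase_ns))
    print(f"phase {phase_ns:4.1f} ns -> tap {chosen}, "
          f"sampling at {offset_in_bit(chosen, phase_ns):.2f} ns into the bit")
```

Under these assumptions the selected tap always samples within one tap delay of the bit center, whatever the arrival phase of the header.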

6.2.1.4 Comparator

We have built, in CMOS programmable logic, a circuit which compares the received header with all valid patterns. The valid patterns include: (1) empty slot - no payload packet; (2) payload packet destined for node A; and (3) payload packet destined for node B. If the received header does not match a valid pattern, the comparator generates a packet error signal. It is a very close approximation to consider an invalid pattern as a single bit error. For a BER of $10^{-9}$, the probability that an invalid pattern contains more than one errored bit is $3 \times 10^{-17}$.

6.2.1.5 CRO Controller

The CRO controller determines the state of each switch in the CRO, based on inputs provided by the comparator module. Due to a defect in one of the CRO pre-prototype switches, GTE initially supplied a CRO pre-prototype with a single delay line stage between two switches, as shown in Figure 6.6. The inputs to the CRO contain payload data packets. A '00' input corresponds to no payload data packets; a '01' represents a payload packet at the first switch input (1310 nm wavelength); a '10' represents a payload packet at the second switch input (1320 nm wavelength); and a '11' represents payload packets at both switch inputs.

![Fiber delay line diagram](image)

*Figure 6.6: Single delay-line CRO.*

The CRO controller determines the desirable switch configuration based on the following information: (1) which switch input(s) contain a payload data packet for reception; (2) whether a payload data packet is currently in the delay line; and (3) which switch input should be given priority in the event of a contention. By alternating which switch input is given priority, the CRO remains unbiased. A state diagram and state table for the CRO controller are given in Figure 6.7 and Table 6-1. The outputs of the CRO controller control the state of the switches. A "1" output corresponds to setting the switch to the cross state. A "0" output corresponds to setting the switch to the bar state, and an "X" output corresponds to "don't care", so the switch may be set to either the cross or the bar state.

Figure 6.7: State diagram of CRO controller.

<table>
<thead>
<tr>
<th rowspan="2">State</th>
<th rowspan="2">Output</th>
<th colspan="4">Next state for inputs =</th>
</tr>
<tr>
<th>00</th>
<th>01</th>
<th>10</th>
<th>11</th>
</tr>
</thead>
<tbody>
<tr><td>0</td><td>0 0</td><td>0</td><td>0</td><td>1</td><td>4</td></tr>
<tr><td>1</td><td>1 0</td><td>1</td><td>0</td><td>1</td><td>5</td></tr>
<tr><td>2</td><td>0 0</td><td>6</td><td>3</td><td>2</td><td>3</td></tr>
<tr><td>3</td><td>1 1</td><td>7</td><td>3</td><td>2</td><td>2</td></tr>
<tr><td>4</td><td>0 1</td><td>6</td><td>3</td><td>2</td><td>2</td></tr>
<tr><td>5</td><td>X 1</td><td>7</td><td>3</td><td>2</td><td>3</td></tr>
<tr><td>6</td><td>1 0</td><td>1</td><td>0</td><td>1</td><td>5</td></tr>
<tr><td>7</td><td>X 1</td><td>0</td><td>0</td><td>1</td><td>4</td></tr>
</tbody>
</table>

Table 6-1: State table of CRO controller.
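
The state table lends itself to a simple lookup-based model. The sketch below transcribes Table 6-1 as printed and steps through a sequence of slot inputs. It is an illustration rather than the controller's actual logic: the starting state (0) and the choice to report the output of the state just entered are assumptions, and the don't-care entries are written as "X".

```python
# state -> (switch outputs, next state for inputs 00, 01, 10, 11)
CRO_TABLE = {
    0: ("00", (0, 0, 1, 4)),
    1: ("10", (1, 0, 1, 5)),
    2: ("00", (6, 3, 2, 3)),
    3: ("11", (7, 3, 2, 2)),
    4: ("01", (6, 3, 2, 2)),
    5: ("X1", (7, 3, 2, 3)),
    6: ("10", (1, 0, 1, 5)),
    7: ("X1", (0, 0, 1, 4)),
}


def run_cro_controller(slot_inputs, state=0):
    """Step through the table; return (state, outputs) after each slot."""
    trace = []
    for slot_input in slot_inputs:
        # "00", "01", "10", "11" index the four next-state columns.
        state = CRO_TABLE[state][1][int(slot_input, 2)]
        trace.append((state, CRO_TABLE[state][0]))
    return trace


# A contention ("11") followed by an empty slot ("00").
for state, outputs in run_cro_controller(["11", "00"]):
    print(f"state {state}, switch outputs {outputs}")
```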
6.2.1.6 Control Channel Receiver Statistics Collector and Display

The control channel receiver statistics collector and display module is used to keep track of the number of packets which arrive at the node and are destined for it. Keeping count of packets at the transmitter and receiver provides confirmation that the network is not losing packets. Similarly, both control channel headers received and payload data packets received are kept track of. At the control channel receiver, six counters are displayed: (1) a counter for the number of packet headers received from node A; (2) a counter for the number of packet headers received from node B; (3) a counter to display the number of contentions resolved by the CRO (which is also equal to the number of packets that are routed through the delay-line); (4) a counter to display the number of contentions that could not be resolved (that is, the number of payload packets that were dropped); (5) a counter to display the number of bit errors received on the control channel; and (6) a counter to display the number of lock errors from the DPA.

6.2.2 Payload Data Receiver Modules

6.2.2.1 1-to-8 Demultiplexer

The 1-to-8 demultiplexer module converts the 2.488 Gb/s serial payload data signal to an 8-bit parallel complementary ECL signal with a bit rate of 311 Mb/s per parallel output. The 1-to-8 demultiplexer provides the synchronization of the received payload and has built-in start-of-packet recognition circuitry. We have tested the performance of the start-of-packet circuitry and measured the delay between reception of the start-of-packet data pattern and the assertion of the start-of-packet signal, FP, generated by the demultiplexer. From our tests, we found that the FP signal is asserted between 1 and 2 byte clock periods after the start-of-packet pattern is demultiplexed. A sample measurement is shown in Figure 6.8. Because the variance of the FP signal delay is less than one byte clock period, it will not cause a jitter problem in the 8-to-64 demultiplexer module.

Figure 6.8: Timing of start of packet recognition circuitry.
6.2.2.2 8-to-64 Demultiplexer

To further improve signal integrity on the 8-to-64 demultiplexer printed circuit board, it was necessary to terminate all complement outputs of every ECL component with a balanced load. This reduces the ground bounce caused by 64-bit transitions. From experimental measurements of the 1-to-8 demultiplexer module performance, we discovered that the FP signal arrives two 311 MHz clock periods earlier than expected with respect to the start-of-packet A1-A2 transition. The frame synchronizer was modified accordingly. Additional handshake signals between the CMOS bit error checker and the 1-to-8 demultiplexer module were added to the 8-to-64 demultiplexer design. A block diagram of the 8-to-64 demultiplexer is shown in Figure 6.9 and a picture of the PCB is shown in Figure 6.10.
Figure 6.9: Block diagram of 8-to-64 demultiplexer.

Figure 6.10: 2.5 Gb/s 8-to-64 bit demultiplexer
6.2.2.3 Bit Error Checker

Checking bit errors in a packet is usually done by using an error detection code. The most popular coding scheme is the cyclic redundancy check (CRC). However, it is difficult to implement CRC generating circuitry for short sequences of 64-bit words, which is the format of our generated payload data. An alternative to CRC is a cross parity code, which is somewhat easier to implement. However, with both CRC and cross parity codes, it is not possible to determine the number of errored bits. Rather, only the detection of one or more bit errors in the entire payload packet is possible. For this reason, bit-by-bit error checking is utilized in the CORD testbed.

To perform the bit-by-bit error checking, the incoming payload data is first demultiplexed by the 1-to-8 and 8-to-64 demultiplexer modules. Once in 64-bit parallel form, the received payload data is compared with 64-bit patterns stored in electrically erasable programmable read-only memory (EEPROM) at the receiver. The comparison is performed by programmable CMOS logic (4 AMD MACH210s). A controller is used to select the correct 64-bit pattern from the EEPROM against which the received payload data is compared. Because the receiver EEPROM contains the same 64-bit words as the payload data generator module at the transmitter, it is just a question of determining which 64-bit pattern was transmitted. This information is sent in the payload packet ID, which is transmitted at the beginning of the payload packet. From the packet ID, the bit error checker controller determines which 64-bit patterns to load to the comparator and in which order to load them. By using the packet ID, the controller does not need to keep track of which node transmitted the payload packet.

Of course if a bit error occurs during the transmission of the packet ID, then the comparator may compare the wrong 64 bit patterns which would cause a large number of bit errors to be counted even if the payload data did not contain any errors. To prevent this from occurring, the packet ID is encoded with a redundant Hamming code, similar to the one used for control channel destination addresses. The probability of error in the packet ID is then reduced to less than \(1 \times 10^{-15}\) for a BER=10^-9. Because the comparator logic operates on the payload data in 64 bit parallel format, the logic is driven at 39 MHz.
The block diagram of the payload data bit error checker is shown in Figure 6.11. The LED display in the figure corresponds to a portion of the payload receiver statistics collector and display module.

![Block Diagram of Payload Data Bit Error Checker](image)

*Figure 6.11: Block diagram of payload data bit error checker.*
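
The bit-by-bit comparison described in this section amounts to an XOR of each received 64-bit word with the expected word, followed by a count of the set bits. The sketch below illustrates this together with ID-driven pattern selection. The block store size, the number of words per packet, and the use of a seeded software random generator to map a packet ID to a block order are all hypothetical stand-ins for the EEPROM contents and controller logic.

```python
import random

WORD_BITS = 64
WORDS_PER_PACKET = 7   # hypothetical: a 53-byte cell rounded up to 64-bit words
N_BLOCKS = 16          # hypothetical number of stored 64-bit blocks


def make_block_store(seed: int = 0) -> list:
    """Stand-in for the identical EEPROM contents at transmitter and receiver."""
    rng = random.Random(seed)
    return [rng.getrandbits(WORD_BITS) for _ in range(N_BLOCKS)]


def words_for_packet(packet_id: int, store: list) -> list:
    """Select the block sequence for a packet ID (hypothetical mapping)."""
    rng = random.Random(packet_id)
    return [store[rng.randrange(N_BLOCKS)] for _ in range(WORDS_PER_PACKET)]


def count_bit_errors(received: list, expected: list) -> int:
    """Count differing bits between received and expected 64-bit words."""
    return sum(bin(r ^ e).count("1") for r, e in zip(received, expected))


store = make_block_store()
sent = words_for_packet(5, store)

# Corrupt three bits in transit: two in word 2, one in word 6.
received = list(sent)
received[2] ^= 0b101
received[6] ^= 1 << 63

# The receiver regenerates the expected words from the packet ID alone.
errors = count_bit_errors(received, words_for_packet(5, store))
print(f"bit errors counted: {errors}")
```

Because the expected words are regenerated from the packet ID, the checker reports the exact number of errored bits rather than only the presence of an error.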

### 6.2.2.4 Controller

The payload receiver controller coordinates the synchronization of payload packet arrivals between the 1-to-8 demultiplexer, bit error checker, and statistics collector and display modules. Once signaled by the PING detector of the control receiver, the payload receiver controller enables the 1-to-8 demultiplexer to search for the start-of-packet data pattern. The controller disables the 1-to-8 demultiplexer search after receiving confirmation from the bit error checker that the payload packet has been received.
Although the payload receiver controller operates at 39 MHz, its output control signals to the 1-to-8 demultiplexer must be at ECL levels. For this reason, the controller has to be implemented as a separate module.

6.2.2.5 Payload Receiver Statistics Collector and Display

The statistics collected at the payload receiver include: (1) the number of bit errors; (2) the number of payload packets received; and (3) the number of invalid packet IDs received. As with the control receiver and transmitter statistics collector and display modules, counters with hex LED displays are utilized.

6.2.2.6 38.875 MHz Clock Source

Unlike the payload data generator at the transmitter, the bit error rate check logic is not synchronized to the ECL and GaAs logic. While it was possible to divide the 2.488 GHz clock used for the GaAs logic at the transmitter, the receiver GaAs logic is driven by the received clock, which is not continuous. Therefore, if the receiver 2.488 GHz clock were used for the bit error rate check logic, the logic would not run continuously and would not be reliable. To avoid this, the receiver CMOS logic is driven by the local transmit 38.875 MHz clock oscillator. Because of the asynchronous nature of the arriving payload data, it is not necessary to synchronize the clocks of the CMOS and ECL logic at the receiver. Instead, FIFO buffers are used at the interface to allow asynchronous data transfer between the 8-to-64 demultiplexer and bit error check logic modules.
7. Project Conclusions

In this report, we have presented results of an ongoing effort at Stanford University aimed at demonstrating the feasibility of all-optical solutions to deal with the problems of resource contention, signaling, and synchronization in optical packet switched networks. With the CORD (Contention Resolution by Delay Lines) project, we have shown that packet contentions can be resolved in the optical domain with an optical switch and delay-line device, Contention Resolution Optics (CRO).

We have built a two-node WDM testbed to demonstrate and test CRO configurations. We integrated single polarizing (SP) fiber to minimize the cross-talk penalty of LiNbO$_3$ optical switches and semiconductor optical amplifiers (SOA) to compensate for switch power losses in the CRO. We implemented and tested the performance of a two-stage CRO in the shared configuration. We propose and are currently experimenting with an alternative CRO configuration which can scale to more than ten stages and is more easily implemented with integrated optical technologies than the shared configuration.

In the CORD testbed, a slower bit-rate (80 Mb/s) signaling channel is transmitted in parallel with high-speed (2.5 Gb/s) payload data for packet routing and network-wide synchronization information. Multichannel Subcarrier Multiplexed (MSCM) signaling is used to transmit both payload data and signaling channels with a single laser per node with minimal penalty. With MSCM, only one photodiode is needed to receive multiple channels on multiple wavelengths simultaneously.

The payload data and signaling channels are slotted into 250 ns slots and nodes are synchronized so that slots arrive at nodes aligned. We developed a novel distributed slot synchronization technique which is robust and scalable and performs with a maximum slot jitter of ±6.5 ns. To achieve per-packet bit synchronization of the signaling channel, we developed an ultra-fast delay-line phase alignment technique which recovers data within 4 bits. For per-packet bit synchronization of the payload data, a requirement of optical packet-switched networks, we transmit an explicit pilot-tone which is filtered at the receiver to recover the 2.5 GHz clock within 16 ns.
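To put the timing figures above in proportion, the following short calculation converts them into bit periods and fractions of a slot. It is only an illustrative sketch: the constants are the values quoted in the text, while the function name, the variable names, and the reading of "within 4 bits" as four bit periods of the 80 Mb/s signaling channel are assumptions made here for illustration.

```python
# Figures quoted in the text.
PAYLOAD_RATE_BPS = 2_500_000_000   # 2.5 Gb/s payload data
SIGNALING_RATE_BPS = 80_000_000    # 80 Mb/s signaling channel
SLOT_NS = 250                      # slot duration in ns
JITTER_NS = 6.5                    # maximum slot jitter (+/-) in ns
CLOCK_RECOVERY_NS = 16             # payload clock recovery time in ns
SIGNALING_LOCK_BITS = 4            # assumed to be signaling-channel bits


def bits_in(duration_ns, rate_bps):
    """Number of bit periods spanned by duration_ns at rate_bps."""
    return duration_ns * rate_bps / 1e9


# Bit periods per slot on each channel.
payload_bits_per_slot = bits_in(SLOT_NS, PAYLOAD_RATE_BPS)
signaling_bits_per_slot = bits_in(SLOT_NS, SIGNALING_RATE_BPS)

# Payload bit periods spent while the 2.5 GHz clock is being recovered.
clock_recovery_bits = bits_in(CLOCK_RECOVERY_NS, PAYLOAD_RATE_BPS)
clock_recovery_fraction = CLOCK_RECOVERY_NS / SLOT_NS

# Time taken by the signaling phase alignment (4 signaling bit periods).
signaling_lock_ns = SIGNALING_LOCK_BITS * 1e9 / SIGNALING_RATE_BPS

# Peak-to-peak slot misalignment implied by a +/- jitter bound.
jitter_window_ns = 2 * JITTER_NS
jitter_fraction = jitter_window_ns / SLOT_NS

if __name__ == "__main__":
    print(f"payload bit periods per slot:   {payload_bits_per_slot:.0f}")
    print(f"signaling bit periods per slot: {signaling_bits_per_slot:.0f}")
    print(f"clock recovery: {clock_recovery_bits:.0f} payload bit periods "
          f"({clock_recovery_fraction:.1%} of a slot)")
    print(f"signaling phase alignment: {signaling_lock_ns:.0f} ns")
    print(f"jitter window: {jitter_window_ns:.0f} ns "
          f"({jitter_fraction:.1%} of a slot)")
```

Under these assumptions, a slot spans 625 payload bit periods and 20 signaling bit periods, clock recovery occupies 40 payload bit periods, and the peak-to-peak jitter window is 13 ns.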

The foregoing principles and technologies are flexible and are expected to find applications, together or separately, in a variety of topologies and/or network architectures. The CRO concept is directly applicable to other key problems in optical networks, such as all-optical time-slot synchronization, all-optical bridging, and packet/circuit integration in optical networks. The MSCM header encoding-decoding and the slot and clock synchronization techniques provide solutions to the critical problems of signaling and ultra-fast synchronization in all-optical packet-switched networks. The innovative concepts in components and subsystems developed in CORD are expected to yield a substantial contribution toward the practical exploitation of all-optical packet-switched networks.
8. References


9. Publications

9.1 Refereed Journal Publications


9.2 Conference Papers


10. Contributors

10.1 Principal Investigator

Dr. Leonid G. Kazovsky was born in Leningrad, USSR, in 1947. He received his M.Sc. and Ph.D. degrees from the Leningrad Electrotechnical Institute of Communications, Leningrad, USSR, in 1969 and 1972, respectively, both in Electrical Engineering.

He moved to Israel in 1973. From 1974 to 1984 (with a one-year interruption for active military service), Dr. Kazovsky was teaching and doing research at Israeli and U.S. universities. From 1984 to 1990 he was with Bellcore, Red Bank, NJ, doing research on high-speed WDM optical fiber communication systems. In 1990, Dr. Kazovsky joined Stanford University as Professor of Electrical Engineering.

Dr. Kazovsky has published in the areas of optical communications, high-speed networks, applied optics, and signal processing. He is the author or co-author of more than one hundred technical journal papers, of numerous conference papers, and of a book published by Wiley. Dr. Kazovsky has acted as a reviewer for various IEEE and IEE Transactions, Proceedings and Journals, as well as for funding agencies (National Science Foundation, Energy Research Council, etc.) and publishers (John Wiley & Sons, Macmillan, etc.). He serves or has served on Technical Program Committees of OFC, CLEO, SPIE and GLOBECOM, and is an Associate Editor of IEEE Transactions on Communications, of IEEE Photonics Technology Letters and of Wireless Networks. Dr. Kazovsky is a Fellow of the IEEE and a Fellow of OSA.

10.2 Visiting Scholars

Mauro Cerisola was born in Torino, Italy, in 1966. He received the M.S. and Ph.D. degrees from the Politecnico di Torino, Italy, both in electrical engineering, in 1990 and 1994, respectively.
In 1994, he worked with the Optical Communication Research Laboratory at Stanford University, Stanford, CA, on analog and high-speed digital techniques for synchronization and clock recovery in WDM networks, within the CORD project. In 1995, he was a consultant for the Optical Communication Group at Politecnico di Torino, working on polarization modulation. He is currently with CSELT, Torino, Italy, working on systems for all-optical synchronization and switching. His research interests have been in the area of high-speed networks, high-speed digital design, and analog and digital techniques for synchronization and clock recovery in optical networks.

Pierluigi Poggiolini was born in Torino, Italy, in 1963. He received the M.S. (summa cum laude) and the Ph.D. degrees from Politecnico di Torino, Italy, in 1988 and 1993, respectively.

From 1988 to 1989, he was with the Italian State Telephone Company Research Center CSELT, working on performance analysis and computer simulation of lightwave transmission systems and optoelectronic devices. He also worked in the field of polarization scrambling-spreading techniques, where he holds an international patent. From 1990 to 1992, he was with the Optical Communications Research Laboratory at Stanford University, where he worked on polarization modulation and was involved in the STARNET broadband optical network project. He is currently a Research Assistant at Politecnico di Torino, where he works on both theoretical analysis and experimental implementation of polarization modulated optical systems. He is also a half-time Postdoctoral Fellow at Stanford University, where he works on all-optical high-speed packet networks and the CORD project.

10.3 Graduate Students

Thomas K. Fong received the B.E. and M.Eng.Sc. degrees from the University of New South Wales, Australia, in 1989 and 1991, respectively, and the Ph.D. degree from Stanford University, Stanford, CA, in 1995, all in electrical engineering.
From 1991 to 1995, he was a teaching assistant and research assistant at the Optical Communications Research Laboratory at Stanford University. His technical activities there were primarily in analog optical links and high-speed optical networks. Since September 1995, he has been with the Broadband Access Research Department at AT&T Bell Laboratories, Crawford Hill, NJ, and has worked on multipath channel characterization, fading channel modeling, and frequency reuse techniques. His research interests are in digital transmission, wireless access, and high-speed optical networks.

R. Theodore Hofmeister received the B.S. degrees from Columbia University, NY, and Bates College, Lewiston, ME, majoring in electrical engineering and physics, respectively, in 1990. He received the M.S. degree in electrical engineering from Stanford University, in 1995, where he is currently pursuing the Ph.D. degree.

From 1990 to 1993, he worked for GTE in several communications-related positions with: GTE Telephone Operations; GTE Spacenet; GTE Laboratories; and GTE Government Systems. He is a member of the Optical Communications Research Laboratory at Stanford. His research interests include high-speed WDM networks and hybrid optical and electronic switching. Mr. Hofmeister is a member of Eta Kappa Nu, Tau Beta Pi, and IEEE.

Chung-Li Lu received the B.S. degree from National Taiwan University, Taipei, R.O.C., in 1990. He received the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1993, where he is currently pursuing the Ph.D. degree.

From 1990 to 1992, he was an instructor in communication and radar systems with the Chinese Navy. He is a member of the Optical Communications Research Laboratory at Stanford. His research interests include high-speed optical networks, RF photonics, and broadband access technologies.

Adisak Mekkittikul received the B.S. degree from KMITL, Bangkok, Thailand, in 1985, and the M.S. degree from Wichita State University, in 1987. He is currently working toward the Ph.D. degree in the Department of Electrical Engineering, Stanford University.

His research interests include high-performance packet switching architectures and stability and scheduling algorithms of input queued switches.

Delfin Jay M. Sabido IX was born in Manila, Philippines, in 1967. He received the B.S. degree (summa cum laude) in electrical engineering from the University of the Philippines, in 1989, and the M.S. and Ph.D. degrees from Stanford University, Stanford, CA, both in electrical engineering, in 1991 and 1996, respectively.

From 1989 to 1990, he was an instructor in Electrical Engineering at the University of the Philippines, and founder and head of its PCB laboratory. At the same time, he served as a consultant and curriculum adviser of the National Engineering Center, Philippines. At present, he is a Product Development Engineer at Wave Optics, Inc. in Palo Alto, CA. His main research interests are in the area of fiber-optic communication systems and devices, analog optical links, and optical networks. Dr. Sabido is a member of OSA and Phi Kappa Phi.