# SHARED MEMORY SWITCHES

#### HIGH PERFORMANCE SWITCHES AND ROUTERS Wiley H. JONATHAN CHAO and BIN LIU Instructor: Mansour Rousta Zadeh

### **SHARED MEMORY SWITCHES**

#### Introduction

- A common shared memory for all inputs and outputs
- Every time slot:
  - Input ports store incoming cells
  - Output ports retrieve outgoing cells
- Work as output buffered switch
  - Optimal throughput
  - Optimal delay performance
- A shared buffer → Centralized memory management → Switch size limitation.

Cell Slot cell length memory access cycle  $\leq \frac{1}{2 \cdot N \cdot \text{link speed}}$ 

#### Logical queues in a shared-memory switches



Basic structure of a linked-list-based shared-memory switches





Cell Buffer Memory

#### **Insertion/Deletion**

• Two different ways:



(c) Delete HOL cell from the logical queue in (b)

(c) Delete HOL cell from the logical queue in (b)



#### **Using Dedicated FIFOs**



Mux: Multiplexer Demux: Demultiplexer IAF: Idle address FIFO

#### **Using CAM**

- RAM stores the cell
- CAM stores a tag

#### Unique tag for each cell: (i,s)

- i: output port number
- s: sequence number

#### The switch architecture:



Tag: (output address, sequence number), e.g., (i,s) MCI: Multicast connection identifier

#### For a write:

- 1. Read the write sequence number WS[i] from the write sequence RAM (WSRAM) (corresponding to the destination port i), and use this value (s) for the cell's tag {i, s}.
- 2. Search the tag CAM for the first empty location, emp.
- 3. Write the cell into the buffer B[emp]=cell, and {i, s} into the associated tag.
- 4. Increment the sequence number s by one, and update WS[i] with s+1.

#### For a read:

- 1. Read the read sequence number RS[j] from the read sequence RAM (RSRAM) (corresponding to the destination port j), say t.
- 2. Search for the tag with the value {j, t}.
- 3. Read the cell in the buffer associated with the tag with the value {j, t}.
- 4. Increment the sequence number t by one and and update RS[i] with t+1.

#### Comparison with linked list

| Bits                                                | Linked List                                                                | CAM Access                                                                             |
|-----------------------------------------------------|----------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| Cell storage<br>(decode/encode)                     | RAM<br>(decode)<br>256 × 424 = 108,544                                     | CAM/RAM<br>(neither)<br>256 × 424 = 108,544                                            |
| lookup                                              | Link: RAM $256 \times 8 = 2048$                                            | Tag:CAM $256 \times (4 + 7) = 2816$                                                    |
| Write and read reference<br>(queue length checking) | Address registers<br>(additional counters)<br>$2 \times 16 \times 8 = 256$ | Sequence number registers<br>(compare W and R numbers)<br>$2 \times 16 \times 7 = 224$ |
| Idle address storage<br>(additional overhead)       | IAF<br>(pointer maintenance,<br>extra memory block)                        | CAM valid bit<br>(none)                                                                |
|                                                     | $256 \times 8 = 2048$                                                      | $256 \times 1 = 256$                                                                   |
| Total                                               | 112,896                                                                    | 111,840                                                                                |

### **SPACE-TIME-SPACE APPROACH**

# Space-time-space approach

- Shared memory is partitioned into separate memories (SBMs)
- Two crosspoint space division switches used to distribute access to SBMs
- Crosspoint switching instead of time division MUX → Speedup



#### **SPACE-TIME-SPACE APPROACH**

# No blocking at inputs while SBMs are not full



Small shared memory modules +

#### Multistage network

#### Examples

- Washington university gigabit switch (WUGS) [16]
- Concentrator based growable switch [4]
- Multinet switch [8]
- Siemens switch [5]
- Alcatel switch [2]

#### Washington university gigabit switch (WUGS)

- Consists of 3 parts
  - Input port processors (IPP)
  - Central switching network
  - Output port processors (OPP)
- IPP tasks
  - Buffering input cells
  - Virtual-path-circuit translation
- OPP tasks
  - Resequencing cells
  - Recycling multicast cells to corresponding IPP
- Central switching network •
  - Benes topology
  - Good scalability due to recursive expansion
  - Good load balancing



#### **Concentrator based growable switch**

#### From left to right:

- Front-end broadcast network
- Address filters
- N\*8 concentrators
- 8\*8 shared memory ATM swithes



#### **Multicast shared memory switches**

- 3 methods
  - Multicast logical queue
  - Cell copy
  - Address copy

#### MULTISTAGE SHARED MEMORY SWITCHES Multicast logical queue

- An additional logical queue for multicast cells
- Advantages
  - Simplicity
  - Minimized queue update operations per time slot
- Service policy
  - Strict priority for multicast cells
  - Round robin
  - Weighted round robin
- Disadvantage
  - HOL blocking for multicast cells when strict priority is not used



#### **Cell copy method**

- Multicast cells are replicated
- Disadvantages
  - Replication storage of O(N<sup>2</sup>)
  - Need for cell copy circuit



Inputs

Outputs

#### Address copy method

- Address in SBM is replicated instead of the cell itself
- Less memory usage than cell copy method
- The need for multicast cell counters (MCC)



# **MULTISTAGE SHARED MEMORY**

