Computer Architecture
The Anatomy of Modern Processors


Memory

Memory Technologies

Two different technologies can be used to store bits in semiconductor random access memory (RAM): static RAM and dynamic RAM.

Static RAM

Static RAM cells use 4-6 transistors to store a single bit of data. This provides faster access times at the expense of lower bit densities. A processor's internal memory (registers and cache) will be fabricated in static RAM. Because the industry has focussed on mass-producing dynamic RAM in ever-increasing densities, static RAM is usually considerably more expensive than dynamic RAM, due both to its lower density and to the smaller demand (lower production volumes lead to higher costs!).

Static RAM is used extensively for second level cache memory, where its speed is needed and a relatively small memory will lead to a significant increase in performance. A high-performance 1998 processor will generally have 512 kbytes to 4 Mbytes of L2 cache.

Since it doesn't need refresh, static RAM's power consumption is much less than dynamic RAM's, so SRAMs will be found in battery-powered systems. The absence of refresh circuitry leads to slightly simpler systems, so SRAM will also be found in very small systems, where the simplicity of the circuitry compensates for the cost of the memory devices themselves.

Dynamic RAM

The bulk of a modern processor's memory is composed of dynamic RAM (DRAM) chips. One of the reasons that memory access times have not reduced as dramatically as processor speeds have increased is probably that the memory manufacturers appear to be involved in a race to produce higher and higher capacity chips. It seems there is considerable kudos in being first to market with the next generation of chips. Thus density increases have been similar to processor speed increases.

A DRAM memory cell uses a single transistor and a capacitor to store a bit of data. Devices which provide 256 Mbits of storage in a single device are reported to be in limited production. In the same period, CPUs with 10 million transistors in them are considered state-of-the-art. Regularity is certainly a major contributor to this apparent discrepancy: a DRAM is about as regular as it is possible to imagine any device could be, a massive 2-D array of bit storage cells. In contrast, a CPU has a large amount of irregular control logic.
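The size of the discrepancy can be checked with a little arithmetic. The sketch below is only illustrative: it counts the transistors in the storage array alone, assumes one Mbit is 2^20 bits, and uses a hypothetical helper name (`array_transistors`) that does not come from the original notes.

```python
MBIT = 2 ** 20  # bits per Mbit (assumption: binary convention for memory parts)

def array_transistors(capacity_mbits: int, transistors_per_cell: int) -> int:
    """Transistors in the storage array only; decoders and sense amplifiers are ignored."""
    return capacity_mbits * MBIT * transistors_per_cell

dram = array_transistors(256, 1)   # one transistor per DRAM cell
sram = array_transistors(256, 6)   # upper end of the 4-6 transistor SRAM cell
cpu = 10_000_000                   # state-of-the-art CPU figure quoted in the text

print(dram)                  # 268435456
print(round(dram / cpu, 1))  # 26.8
print(sram // dram)          # 6
```

Under these assumptions the DRAM array holds roughly 27 times as many transistors as the CPU, and the same capacity built from 6-transistor SRAM cells would need six times as many again.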


A typical DRAM cell with a single MOSFET and a storage capacitor

Access modes
Almost all DRAMs require the address applied to the device to be asserted in two parts: a row address and a column address. This has a deleterious effect on the access time, but enables devices with large numbers of bits to be fabricated with fewer pins (enabling higher densities): the row and column addresses are applied to the same pins in row- and column-address phases of the access.
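The two-phase addressing can be sketched as a simple bit split. This is a minimal illustration, assuming the upper address bits form the row and the lower bits the column; the function name `split_address` and the 12 + 12 bit device are hypothetical choices, not taken from any particular data sheet.

```python
def split_address(addr: int, row_bits: int, col_bits: int):
    """Split a flat cell address into (row, column) for a multiplexed-address DRAM."""
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range for this device")
    row = addr >> col_bits               # upper bits: applied first, latched by RAS
    col = addr & ((1 << col_bits) - 1)   # lower bits: applied second, latched by CAS
    return row, col

# Hypothetical 16 Mbit x 1 device: 12 row bits + 12 column bits = 24 address bits
row, col = split_address(0xABCDEF, row_bits=12, col_bits=12)
print(hex(row), hex(col))  # 0xabc 0xdef

address_pins_multiplexed = max(12, 12)  # the same pins carry row, then column
address_pins_flat = 12 + 12             # pins needed if the address were applied at once
print(address_pins_multiplexed, address_pins_flat)  # 12 24
```

In this example the multiplexed device needs half the address pins, at the cost of a second address phase in every access.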


Read access for a DRAM device, showing application of the row and column addresses in conjunction with the row-address strobe (RAS) and column-address strobe (CAS).

The Access Time Myth
The performance of commercial DRAMs is commonly quoted in terms of the "access time" (tRAC in the figure), the time from the assertion of RAS to availability of the data. A more relevant figure when considering total system throughput is the cycle time (tRC in the figure) which is usually about twice as long. The cycle time is the minimum time between successive accesses to the same device and is thus the factor which determines data throughput.
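To see why the cycle time matters more than the access time, consider the sustained bandwidth each figure would imply. The numbers below (a 60 ns access time, a cycle time twice as long, and a 64-bit wide memory) are assumptions chosen for illustration, and `peak_bandwidth_mbytes_per_s` is a hypothetical helper.

```python
def peak_bandwidth_mbytes_per_s(time_per_access_ns: float, width_bits: int) -> float:
    """Bandwidth if one access of `width_bits` completes every `time_per_access_ns`."""
    accesses_per_second = 1e9 / time_per_access_ns
    return accesses_per_second * (width_bits / 8) / 1e6

t_rac_ns = 60.0          # assumed quoted "access time"
t_rc_ns = 2 * t_rac_ns   # cycle time, about twice as long according to the text

naive = peak_bandwidth_mbytes_per_s(t_rac_ns, 64)      # what tRAC alone would suggest
sustained = peak_bandwidth_mbytes_per_s(t_rc_ns, 64)   # what tRC actually permits

print(round(naive, 1), round(sustained, 1))  # 133.3 66.7
```

With these assumed figures, back-to-back accesses to the same device deliver only half the bandwidth that the quoted access time would suggest.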

Refresh
Charge leaks slowly from the storage capacitor in a DRAM cell, so each cell needs to be periodically refreshed: refresh times are in the ms region. When a DRAM is being refreshed, other accesses must be "held off". This increases the complexity of DRAM controllers (and causes SRAM to be the memory of choice in small systems, where the cost of the refresh circuitry would outweigh the extra cost of the SRAM chip itself) and has a small (several per cent) effect on the effective bandwidth, as the memory is effectively "off-line" for a short time every few milliseconds.
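The bandwidth lost to refresh can be estimated as the fraction of time the device spends refreshing rows. The sketch below assumes that every row is refreshed once per refresh interval and that each row refresh occupies the device for one cycle time; the parameter values (4096 rows, 16 ms, 120 ns) are hypothetical and real devices vary.

```python
def refresh_overhead(rows: int, refresh_interval_ms: float, row_refresh_ns: float) -> float:
    """Fraction of time the DRAM is unavailable because it is refreshing."""
    busy_ns = rows * row_refresh_ns            # time spent refreshing every row once
    interval_ns = refresh_interval_ms * 1e6    # 1 ms = 1e6 ns
    return busy_ns / interval_ns

overhead = refresh_overhead(rows=4096, refresh_interval_ms=16.0, row_refresh_ns=120.0)
print(f"{overhead:.2%}")  # 3.07%
```

Under these assumptions about 3% of the available memory time goes to refresh, which is of the same order as the "several per cent" figure quoted above.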

Page mode
The bandwidth of DRAM chips can be increased by operating them in page mode. Several column addresses are applied for each row address.


Read access for a DRAM device operating in page mode.

Thus the overhead of asserting the address in two phases is reduced and throughput is increased. Locality of reference makes this an effective strategy: once one location in a page is accessed, there is a high probability that other locations in the same page will also be accessed. Page mode also helps when filling cache lines, which span several consecutive words in a modern processor.
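The benefit of page mode can be illustrated with a very simple timing model. It assumes that an access to a new row costs a full cycle time and that a further access to the row that is already open costs only a shorter page-mode cycle; the 120 ns and 40 ns figures, the 10 column bits, and the function name are assumptions made for this sketch.

```python
def access_time_ns(addresses, col_bits: int, t_rc_ns: float, t_pc_ns: float) -> float:
    """Total time for a sequence of accesses under a simple page-mode model."""
    total = 0.0
    open_row = None
    for addr in addresses:
        row = addr >> col_bits       # row (page) that this address falls in
        if row == open_row:
            total += t_pc_ns         # same page: only a new column address is needed
        else:
            total += t_rc_ns         # new page: full row + column access
            open_row = row
    return total

# Four consecutive words (e.g. a cache line fill) all fall in one page
print(access_time_ns([0, 1, 2, 3], 10, 120.0, 40.0))            # 240.0
# Four words in four different pages gain nothing from page mode
print(access_time_ns([0, 1024, 2048, 3072], 10, 120.0, 40.0))   # 480.0
```

The first trace shows the locality-of-reference case, where page mode halves the total time in this model; the second shows that scattered accesses pay the full cycle time each time.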

Processor-Memory Interconnect

Bus

Split Address and Data Buses

Splitting the address and data buses allows a processor to overlap the data phase of a bus transaction with the address phase of a following transaction - achieving faster throughput.
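The gain from overlapping can be sketched by treating the split bus as a two-stage pipeline. This is a simplified model, not a description of any particular bus: the function names and the cycle counts (1 address cycle, 2 data cycles) are assumptions.

```python
def multiplexed_bus_cycles(n: int, addr_cycles: int, data_cycles: int) -> int:
    """Shared address/data lines: every transaction runs its two phases back to back."""
    return n * (addr_cycles + data_cycles)

def split_bus_cycles(n: int, addr_cycles: int, data_cycles: int) -> int:
    """Separate buses: the address phase of transaction i+1 overlaps the data phase of i."""
    if n == 0:
        return 0
    # The first transaction pays for both phases; each later one is limited
    # by the slower of the two phases.
    return addr_cycles + data_cycles + (n - 1) * max(addr_cycles, data_cycles)

print(multiplexed_bus_cycles(4, 1, 2))  # 12
print(split_bus_cycles(4, 1, 2))        # 9
```

In this model the saving grows with the number of back-to-back transactions, since only the first address phase is not hidden behind a data phase.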

Interleaved Systems

By arranging memory in banks, data throughput can be increased. Successive words of a multi-word burst are fetched from different memory banks: this means that the access latency for a memory word to be fetched from memory is incurred only for the first word of the burst. Subsequent words are fetched in parallel from different banks and are ready at the same time as the first word of the burst, so can be placed on the bus in succeeding bus cycles with no additional penalty.
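A minimal sketch of interleaving follows, assuming low-order interleaving (consecutive words map to consecutive banks) and a burst no longer than the number of banks. The function names and the 5-cycle latency are hypothetical; `latency_cycles` here means the number of bus cycles until the first word has been delivered.

```python
def bank_of(word_address: int, n_banks: int) -> int:
    """Low-order interleaving: consecutive words live in consecutive banks."""
    return word_address % n_banks

def burst_cycles_interleaved(n_words: int, latency_cycles: int, n_banks: int) -> int:
    """All banks are accessed in parallel, then words go out one per bus cycle."""
    if n_words > n_banks:
        raise ValueError("this simple model assumes the burst fits in one pass over the banks")
    return latency_cycles + (n_words - 1)

def burst_cycles_single_bank(n_words: int, latency_cycles: int) -> int:
    """Without interleaving, every word pays the full access latency."""
    return n_words * latency_cycles

print([bank_of(a, 4) for a in range(8, 12)])   # [0, 1, 2, 3]
print(burst_cycles_interleaved(4, 5, 4))       # 8
print(burst_cycles_single_bank(4, 5))          # 20
```

With these assumed figures a four-word burst takes 8 bus cycles with four banks instead of 20 with one, because the latency is paid only once.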

Cross-bar switches

Bus processor-memory interconnects represent such a severe bottleneck in multiple processor systems that high-performance multi-processor clusters now tend to provide cross-bar switch interconnections between processors and memory. One advantage of a cross-bar is that it provides multiple point-to-point connections between processors and banks of memory. Not only are there more links between processors and memory (increasing aggregate bandwidth) but the point-to-point links can be faster.
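The difference in aggregate bandwidth can be sketched with a toy model. It assumes that each processor issues one request, that every transfer takes one cycle, that a bus carries one transfer per cycle, and that a cross-bar serves requests in parallel as long as they target different memory banks. The function names are hypothetical and the model ignores arbitration and link speed.

```python
from collections import Counter

def bus_cycles(requests) -> int:
    """A shared bus serialises every request, whichever bank it targets."""
    return len(requests)

def crossbar_cycles(requests) -> int:
    """A cross-bar only serialises requests that collide on the same bank."""
    per_bank = Counter(bank for _processor, bank in requests)
    return max(per_bank.values(), default=0)

# (processor, bank) pairs: four processors, each targeting a different bank
no_conflict = [(0, 0), (1, 1), (2, 2), (3, 3)]
# Two processors collide on bank 0 and two on bank 1
conflicts = [(0, 0), (1, 0), (2, 1), (3, 1)]

print(bus_cycles(no_conflict), crossbar_cycles(no_conflict))  # 4 1
print(bus_cycles(conflicts), crossbar_cycles(conflicts))      # 4 2
```

In this model the cross-bar's advantage shrinks as more processors contend for the same bank, which is why the memory is divided into banks in the first place.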

We'll look into this further in the general context of interconnection systems in parallel processors.

Error Detection and Correction

Parity

Error Correcting Memory

Magnetic Memory
© John Morris, 1998

e-REdING. Library of the Escuela Superior de Ingenieros de Sevilla.


VHDL IMPLEMENTATION OF THE ARM9 MICROPROCESSOR

: Jurado Carmona, Francisco Javier
: Telecommunication Engineering