Computer Architecture The Anatomy of Modern Processors
Memory
Memory Technologies
Two different technologies can be used to store bits
in semiconductor random access memory (RAM):
static RAM and dynamic RAM.
Static RAM
Static RAM cells use 4-6 transistors to store a single bit of
data. This provides faster access times at the expense of lower bit densities. A
processor's internal memory (registers and cache) will be fabricated in static
RAM. Because the industry has focussed on mass-producing dynamic RAM in
ever-increasing densities, static RAM is usually considerably more expensive
than dynamic RAM, owing both to its lower density and to the smaller demand (lower
production volumes lead to higher costs!).
Static RAM is used extensively for second level cache memory, where its speed
is needed and a relatively small memory will lead to a significant increase in
performance. A high-performance 1998 processor will generally have 512kB to
4Mbyte of L2 cache.
Since it doesn't need refresh, static RAM's power consumption is much less
than that of dynamic RAM, so SRAMs will be found in battery-powered systems. The absence
of refresh circuitry leads to slightly simpler systems, so SRAM will also be
found in very small systems, where the simplicity of the circuitry compensates
for the cost of the memory devices themselves.
Dynamic RAM
The bulk of a modern processor's memory is composed of dynamic RAM
(DRAM) chips. One of the reasons that memory access times have not reduced
as dramatically as processor speeds have increased is probably that the
memory manufacturers appear to be involved in a race to produce higher and
higher capacity chips. It seems there is considerable kudos in being first
to market with the next generation of chips. Thus density increases have
been similar to processor speed increases.
A DRAM memory cell uses a single transistor and a capacitor to store a
bit of data. Devices providing 256 Mbits of storage in a single chip are
reported to be in limited production. In the same period, CPUs
with 10 million transistors are considered state-of-the-art.
Regularity is certainly a major contributor to this apparent discrepancy:
a DRAM is about as regular as it is possible to imagine any device
could be, a massive 2-D array of bit storage cells. In contrast, a CPU has
a large amount of irregular control logic.
A typical DRAM cell with a single MOSFET and a storage capacitor
Access modes
Almost all DRAMs require the address applied to
the device to be asserted in two parts: a row address and a column address. This
has a deleterious effect on the access time, but enables devices with large
numbers of bits to be fabricated with fewer pins (enabling higher densities):
the row and column addresses are applied to the same pins in the row- and
column-address phases of the access.

Read access for a DRAM device, showing application of the row and column
addresses in conjunction with the row-address strobe (RAS) and column-address
strobe (CAS).
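As a rough illustration of the two-phase addressing described above, the following Python sketch splits a flat cell address into the row and column halves that would be presented on the shared pins. The function names, the 10-bit row and column widths and the use of the high-order bits as the row address are assumptions made for the example; they are not taken from any particular device.

```python
def split_address(addr, row_bits, col_bits):
    """Split a flat cell address into (row, column) parts.

    Assumption for this sketch: the high-order bits form the row address.
    """
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range for this device")
    row = addr >> col_bits               # high-order bits
    col = addr & ((1 << col_bits) - 1)   # low-order bits
    return row, col


def address_pins_needed(row_bits, col_bits):
    """Pins needed when row and column share the same pins."""
    return max(row_bits, col_bits)


def read_sequence(addr, row_bits=10, col_bits=10):
    """Order in which the two halves are strobed into the device."""
    row, col = split_address(addr, row_bits, col_bits)
    return [("RAS", row), ("CAS", col)]


# A hypothetical 1 Mbit device: 20 address bits, but only 10 address pins.
print(read_sequence((768 << 10) | 5))
print(address_pins_needed(10, 10))
```

With these assumed widths, a 20-bit address needs only 10 address pins, at the price of presenting the address in two steps.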
The Access Time Myth
The performance of commercial DRAMs is commonly
quoted in terms of the "access time" (tRAC in the figure), the time from the
assertion of RAS to availability of the data.
A more relevant figure when considering total system throughput is the cycle time (tRC in the figure) which is usually
about twice as long. The cycle time is the minimum time between successive
accesses to the same device and is thus the factor which determines data
throughput.
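The difference between the two figures can be made concrete with a small calculation. The sketch below is only illustrative: the 60 ns access time, the 120 ns cycle time (chosen to be twice the access time, as suggested above) and the 8-byte data width are assumed values, not measurements of a real device.

```python
def bandwidth_mb_per_s(bytes_per_access, interval_ns):
    """Sustained bandwidth if one access completes every interval_ns."""
    # bytes per nanosecond equals 10^9 bytes/s; scale to 10^6 bytes/s
    return bytes_per_access / interval_ns * 1000.0


T_RAC_NS = 60.0    # assumed access time
T_RC_NS = 120.0    # assumed cycle time, about twice the access time
WIDTH_BYTES = 8    # assumed data path width

# Figure suggested by the quoted access time
optimistic = bandwidth_mb_per_s(WIDTH_BYTES, T_RAC_NS)
# Figure actually sustainable, limited by the cycle time
sustained = bandwidth_mb_per_s(WIDTH_BYTES, T_RC_NS)

print(f"from access time: {optimistic:.1f} MB/s")
print(f"from cycle time:  {sustained:.1f} MB/s")
```

Under these assumptions the sustainable throughput is half of what the quoted access time alone would suggest.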
Refresh
Charge leaks slowly from the storage capacitor in a DRAM cell
and needs to be periodically refreshed: refresh times are in the ms region. When
a DRAM is being refreshed, other accesses must be "held off". This increases the
complexity of DRAM controllers (and causes SRAM to be the memory of choice in
small systems, where the cost of the refresh circuitry would outweigh the extra
cost of the SRAM chip itself) and has a small (several per cent) effect
on the effective bandwidth as the memory is effectively "off-line" for a short
time every few milliseconds.
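The size of the refresh penalty can be estimated with a simple model in which every row must be refreshed once per refresh period and each row refresh occupies the device for one cycle time. The row count, refresh period and cycle time below are assumed figures chosen for illustration only; real devices should be checked against their data sheets.

```python
def refresh_overhead(rows, t_row_refresh_ns, refresh_period_ms):
    """Fraction of time the device is unavailable because of refresh.

    Simple model: each of `rows` rows is refreshed once per period,
    and each refresh blocks the device for t_row_refresh_ns.
    """
    busy_ns = rows * t_row_refresh_ns
    period_ns = refresh_period_ms * 1_000_000
    return busy_ns / period_ns


# Assumed figures: 1024 rows, 120 ns per row refresh, 4 ms refresh period
overhead = refresh_overhead(rows=1024, t_row_refresh_ns=120, refresh_period_ms=4)
print(f"refresh overhead: {overhead:.2%}")
```

With these assumed numbers the memory is unavailable for about 3% of the time, which is in line with the "several per cent" quoted above.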
Page mode
The bandwidth of DRAM chips can be increased by operating them
in page mode. Several column addresses are
applied for each row address.

Read access for a DRAM device operating in page mode.
Thus the overhead of asserting the address in two phases is reduced and
throughput increased. Locality of reference makes this an effective
strategy: once one location in a page is accessed, there is a high probability
that other locations in the same page will also be accessed. It also helps
when filling cache lines, which will span several consecutive words in a modern
processor.
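The gain from page mode can be estimated by comparing the time to read several words from the same row with and without it. The sketch below uses a deliberately simplified model: the first word costs a full cycle time and each further word in the same page costs a shorter page-mode cycle. The names and the 120 ns and 40 ns figures are assumptions for illustration.

```python
def normal_mode_time_ns(n_words, t_rc_ns):
    """Every word pays the full row + column cycle."""
    return n_words * t_rc_ns


def page_mode_time_ns(n_words, t_rc_ns, t_pc_ns):
    """First word pays the full cycle; later words in the same row
    pay only the (shorter) page-mode cycle."""
    if n_words <= 0:
        return 0
    return t_rc_ns + (n_words - 1) * t_pc_ns


# Assumed timings: 120 ns full cycle, 40 ns page-mode cycle
words = 4   # e.g. one cache line of four words
slow = normal_mode_time_ns(words, 120)
fast = page_mode_time_ns(words, 120, 40)
print(slow, fast, slow / fast)
```

Under these assumptions a four-word cache line fill takes 240 ns in page mode instead of 480 ns.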
Processor-Memory Interconnect
Bus
Split Address and Data Buses
Splitting the address and data buses allows
a processor to overlap the data phase of a bus transaction with the address
phase of the following transaction, achieving higher throughput.
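The effect of the overlap can be sketched with a simple cycle count. The model below is an assumption made for illustration: each transaction has an address phase and a data phase of fixed length, memory latency is ignored, and on the split bus the next address phase may start as soon as the previous one has finished.

```python
def shared_bus_cycles(n, addr_cycles, data_cycles):
    """Address and data share one bus: the phases are strictly serial."""
    return n * (addr_cycles + data_cycles)


def split_bus_cycles(n, addr_cycles, data_cycles):
    """Separate buses: the address phase of transaction i+1 overlaps
    the data phase of transaction i."""
    if n <= 0:
        return 0
    # The first address phase and the last data phase cannot be hidden;
    # in between, the longer of the two phases sets the pace.
    return addr_cycles + (n - 1) * max(addr_cycles, data_cycles) + data_cycles


# Assumed example: 10 transactions, 1 address cycle, 4 data cycles each
print(shared_bus_cycles(10, 1, 4))
print(split_bus_cycles(10, 1, 4))
```

In this example the split arrangement needs 41 bus cycles where the shared bus needs 50, because all but the first address phase are hidden behind data transfers.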
Interleaved Systems
By arranging memory in banks, data throughput
can be increased. Successive words of a multi-word burst are fetched from
different memory banks, so the access latency
is incurred only for the first word of the burst.
Subsequent words are fetched in parallel from different banks and are ready at
the same time as the first word of the burst, so can be placed on the bus in
succeeding bus cycles with no additional penalty.
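A minimal sketch of interleaving follows. It assumes low-order interleaving (consecutive word addresses map to consecutive banks) and a burst no longer than the number of banks; the function names and the cycle counts are illustrative assumptions.

```python
def bank_of(word_addr, n_banks):
    """Low-order interleaving: consecutive words fall in consecutive banks."""
    return word_addr % n_banks


def single_bank_burst_cycles(n_words, latency, bus_cycle=1):
    """One bank: every word pays the access latency before its bus cycle."""
    return n_words * (latency + bus_cycle)


def interleaved_burst_cycles(n_words, latency, n_banks, bus_cycle=1):
    """All banks start together, so the latency is paid once; the words
    then go onto the bus in successive bus cycles."""
    if n_words > n_banks:
        raise ValueError("this sketch assumes the burst fits in the banks")
    return latency + n_words * bus_cycle


# Assumed example: 4 banks, 6-cycle access latency, 4-word burst
print([bank_of(a, 4) for a in range(8, 12)])
print(single_bank_burst_cycles(4, 6))
print(interleaved_burst_cycles(4, 6, 4))
```

With these assumed figures the four-word burst completes in 10 cycles instead of 28.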
Cross-bar switches
Bus processor-memory interconnects represent such a
severe bottleneck in multiple processor systems that high-performance
multi-processor clusters now tend to provide cross-bar switch
interconnections between processors and memory. One advantage of a cross-bar is
that it provides multiple point-to-point connections between processors and
banks of memory. Not only are there more
links between processors and memory (increasing aggregate bandwidth) but the
point-to-point links can be faster.
We'll look into this further in the general context of interconnection
systems in parallel processors.
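The bandwidth advantage of a cross-bar can be illustrated by counting how many memory requests can proceed in the same cycle. The sketch below assumes a very simple first-come, first-served arbitration in which a request is granted if neither its processor nor its memory bank is already in use; this arbitration policy and the function names are assumptions for the example, not a description of any real switch.

```python
def bus_transfers(requests):
    """A single shared bus carries at most one transfer per cycle."""
    return min(1, len(requests))


def crossbar_transfers(requests):
    """A cross-bar can carry one transfer per processor and per bank,
    so only requests to the same bank conflict."""
    busy_procs, busy_banks = set(), set()
    granted = 0
    for proc, bank in requests:   # assumed first-come, first-served order
        if proc not in busy_procs and bank not in busy_banks:
            busy_procs.add(proc)
            busy_banks.add(bank)
            granted += 1
    return granted


# Four processors issue (processor, bank) requests in the same cycle;
# processors 0 and 2 both want bank 2.
requests = [(0, 2), (1, 0), (2, 2), (3, 1)]
print(bus_transfers(requests))
print(crossbar_transfers(requests))
```

Here the bus serves one request while the cross-bar serves three; the fourth is delayed only because two processors want the same bank.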
Error Detection and Correction
Parity
Error Correcting Memory
© John Morris, 1998