Concepts in Computing
CS4 - Winter 2007
Instructor: Fabio Pellacini

# Lecture 15: Architecture

## Background

We've been talking about how modern computers are actually constructed, and so far, we have learned about:

• Binary representation of data -- numbers, strings, etc.
• Boolean logic and digital circuits
• How to design circuits that do computations

Recall, too, that when we talked about algorithms, one key idea was that the operations must be "effectively computable". Where we're going now is to show how the digital circuits we learned about in the last few lectures can be put together to make a computer that is capable of taking instructions and executing them.

At each level, notice that we are adding a layer of abstraction:

• Gates: Collections of wires and switches.
• Circuits: Collections of gates.

This is probably the most important idea in computer science (or, indeed, any science): taking a problem you've solved, and packaging it up as a "black box", so you can use it to solve other problems, without worrying about the details of how it works. (We saw the same idea with APIs, networking, modular software design, ....)

## The von Neumann Machine

John von Neumann proposed this model of computing in 1946. Prior to this point, the program was generally "hard-wired" into the computer. Von Neumann's first insight was that you could use some of the input data to tell the computer what to do. The computer would know how to do certain pre-chosen instructions, and the program would become part of the input.

The architecture consists of four major components -- memory, the arithmetic/logic unit (ALU), the control unit, and input/output (I/O) -- connected together by a bus, which is a bundle of wires allowing data and control signals to flow from one unit to another.

We'll look at each of these four components and how it basically works.

### Memory

Memory is where the computer stores instructions and data. It consists of an array of numbered "cells", each of which contains a block of bits. Different computers use different cell sizes, but a common value is 8 bits, often called one byte or octet. (Over the years, the word "byte" has been used for different sizes, although now it's fairly conventional.)

The number of a cell is called its address. This is a fixed-size unsigned integer. If an address on your computer has n bits, your computer can refer to 2^n possible distinct addresses. This is the address space of the computer.

Address space is distinct from memory width, which is the number of bits per cell. If a cell has w bits, it can store 2^w values. E.g., if w = 8, then values 0 - 255 (2^8 = 256 values) can be stored.

If a memory has an n-bit address, it can store 2^n w-bit cells. So, a memory with w = 8, n = 4 can store 2^4 = 16 cells, each holding one of 2^8 = 256 possible values.
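To see the arithmetic concretely, here is a quick sketch in Python (the function name `memory_sizes` is just for illustration, not any real hardware interface):

```python
# For an n-bit address and w-bit cells, compute the sizes discussed above.
def memory_sizes(n, w):
    num_cells = 2 ** n        # size of the address space
    values_per_cell = 2 ** w  # distinct values one cell can hold
    return num_cells, values_per_cell

# The example from the text: n = 4, w = 8.
cells, values = memory_sizes(4, 8)
print(cells, values)  # 16 cells, each holding one of 256 values
```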

For shorthand, we abuse SI notation a bit:

| bits | values | label | roughly |
|------|--------|-------|---------|
| 10 | 2^10 = 1024 | kilo- (1KB = 1024 bytes) | a typed page |
| 20 | 2^20 = 1,048,576 | mega- (1MB = 1024KB) | a couple of novels |
| 30 | 2^30 = 1,073,741,824 | giga- (1GB = 1024MB) | a personal library |
| 40 | 2^40 = 1,099,511,627,776 | tera- (1TB = 1024GB) | a university library |
| 50 | 2^50 = 1,125,899,906,842,624 | peta- (1PB = 1024TB) | all libraries in North America |

There are two key operations on memory.

Fetch reads a copy of the value at the given address, without changing the value stored at that address.
Store writes a new value into the cell at the given address, replacing the previous value.

These operations use a memory address register (MAR) to indicate which address they want, and a memory data register (MDR) to get/put the value at that address.
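We can model fetch and store in a few lines of Python. This is a toy sketch (the `Memory` class is invented for illustration); in real hardware the MAR and MDR are registers wired to the memory unit, not variables:

```python
# A toy memory with fetch/store through a MAR and MDR.
class Memory:
    def __init__(self, n_bits, w_bits=8):
        self.cells = [0] * (2 ** n_bits)  # 2^n cells of w bits each
        self.mar = 0                      # memory address register
        self.mdr = 0                      # memory data register
        self.max_value = 2 ** w_bits

    def fetch(self):
        # Copy the value at address MAR into MDR; the cell is unchanged.
        self.mdr = self.cells[self.mar]

    def store(self):
        # Write MDR into the cell at address MAR, replacing the old value.
        self.cells[self.mar] = self.mdr % self.max_value

mem = Memory(n_bits=4)       # 16 cells
mem.mar, mem.mdr = 3, 42
mem.store()                  # cell 3 now holds 42
mem.mdr = 0
mem.fetch()                  # read cell 3 back into the MDR
print(mem.mdr)               # 42
```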

Memory is random-access, meaning that you can access any value of the array at any time (vs. sequential access, like on a tape). Such memories are called RAM (random-access memory).

We can build a memory from circuits. It's a little tricky, but the basic requirement is a decoder circuit to select a memory location from an address. (If you're interested in how to build it, see the textbook, Sec. 4.5.) The decoder takes as input n bits. The bits represent the address, in binary. The decoder then has 2^n lines as output. It will set exactly one of those lines to 1 (according to the number represented in the input), and the rest to 0. The lines select the one specified location in memory to be accessed.
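The decoder's behavior is easy to simulate (a sketch of the logic only; the actual circuit is built from AND gates and inverters as in the textbook):

```python
# A decoder: n input bits select exactly one of 2**n output lines.
def decode(address_bits):
    n = len(address_bits)
    # Interpret the bits as an unsigned binary number.
    index = int("".join(str(b) for b in address_bits), 2)
    # Exactly one output line is 1; the rest are 0.
    return [1 if i == index else 0 for i in range(2 ** n)]

print(decode([1, 0]))  # 2-bit address "10" selects line 2: [0, 0, 1, 0]
```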

While the memory above was drawn as a one-dimensional array, that simple layout doesn't scale well. A typical address space has 32 bits, thus requiring a 32-bit decoder with more than 4 billion outputs! For real memories, you do a two-dimensional trick: put your cells in a square grid, then one half of your address tells you the row, and the other half tells you the column. E.g., a 4-bit address gives 16 addresses, arranged in a 4 x 4 grid.
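Splitting the address is just slicing its bits in half (a sketch; the helper name `row_col` is made up for this example):

```python
# Split a 4-bit address into a 2-bit row and a 2-bit column,
# as in the 4 x 4 grid example.
def row_col(address, half_bits=2):
    row = address >> half_bits               # high-order bits
    col = address & ((1 << half_bits) - 1)   # low-order bits
    return row, col

print(row_col(0b1101))  # address 13 -> row 3, column 1
```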

You may have heard of cache memory. The basic idea is that fetch/store operations in a typical memory take longer than do processing operations. Thus we keep a "snapshot" of some of the memory in a faster (but smaller) memory, called a cache. What do we keep in this snapshot? It turns out that once an address is used, it is likely to be used again, as will be nearby addresses (the "principle of locality"). Part of the design of a computer architecture is figuring out what to cache when. These days, there are even multiple levels of cache (smaller, faster caches in front of larger, slower caches, in front of the main memory). To expand upon the book's analogy, your refrigerator is a cache for the store (so you don't have to leave the house to fetch common items), but if you're a true couch potato, you'll have a mini-fridge next to the couch as a cache for the main refrigerator.
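Here is a toy sketch of the idea (a "direct-mapped" cache, one of the simplest designs; the `Cache` class and slot scheme are invented for illustration, and a real cache is hardware, not Python):

```python
# A sketch of a small cache sitting in front of a slow backing memory.
class Cache:
    def __init__(self, memory, num_slots=4):
        self.memory = memory   # the slow backing store (a list here)
        self.slots = {}        # slot index -> (address, value)
        self.num_slots = num_slots
        self.hits = self.misses = 0

    def read(self, address):
        slot = address % self.num_slots      # each address maps to one slot
        entry = self.slots.get(slot)
        if entry and entry[0] == address:
            self.hits += 1                   # found in the fast cache
            return entry[1]
        self.misses += 1                     # go to slow memory...
        value = self.memory[address]
        self.slots[slot] = (address, value)  # ...and remember the result
        return value

mem = list(range(100, 116))
cache = Cache(mem)
cache.read(5)              # miss: first use of address 5
cache.read(5)              # hit: locality pays off
cache.read(6)              # miss
print(cache.hits, cache.misses)  # 1 2
```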

### Arithmetic/Logic Unit (ALU)

The ALU is responsible for doing the work of computation: Adding, subtracting, multiplying, comparing values, etc. An ALU consists of:

• A small set of special memory locations called registers, which are used for "working space".
• A collection of circuits to compute the various functions you want: addition, comparison, etc.

Typical layout: two inputs (from any register, or the MDR, to use what was read from memory), one output.

Inside the ALU, both inputs go to all computing circuits simultaneously, and a "multiplexor" circuit selects one value to actually take as the output. The ALU thus also needs input lines to tell it which function to compute.
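We can sketch that select-from-all-results behavior in Python (the set of functions here is invented for illustration; a real ALU computes everything in parallel circuitry and the multiplexor is just wires and gates):

```python
# Sketch of an ALU: every function "circuit" computes on both inputs,
# then a multiplexor picks one result based on the function-select lines.
def alu(a, b, select):
    results = {
        "ADD": a + b,
        "SUB": a - b,
        "AND": a & b,
        "EQ":  int(a == b),
    }
    # The select input plays the role of the multiplexor control lines.
    return results[select]

print(alu(5, 3, "ADD"))  # 8
print(alu(5, 3, "SUB"))  # 2
```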

### Control

The preceding leaves the question of how to coordinate the actions of memory and the ALU. In other words, how does the computer choose, at any given moment in time, what is going to happen?

The basic operation of a von Neumann computer works like this. We have to have a way of encoding instructions that tell the computer what to do. The program to be executed (i.e., the ordered sequence of instructions) is stored in the computer's memory (don't worry about how it got there for now, we'll come back to that). Then the following steps take place:

• There is a register, called the program counter (PC), which gives the address of the next instruction.
• The instruction is fetched (read from memory), and the PC is advanced to point at the next instruction.
• The instruction is decoded (to figure out what it means).
• The instruction is executed (the ALU computes the required values and the results are stored in memory or in an ALU register).

This "fetch-decode-execute" cycle is the core of what your computer is doing as it runs. Making all this happen in the correct sequence is the role of the control unit.
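The whole cycle fits in a few lines of Python. This is a toy machine with an invented two-opcode-plus-HALT instruction set, just to show the loop's shape; real instruction encodings are binary and far richer:

```python
# A toy fetch-decode-execute loop.
def run(program):
    memory = list(program)  # the program lives in memory (von Neumann style)
    acc = 0                 # a single ALU register (an "accumulator")
    pc = 0                  # program counter
    while True:
        opcode, operand = memory[pc]  # fetch
        pc += 1                       # advance PC to the next instruction
        if opcode == "LOAD":          # decode + execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            return acc

print(run([("LOAD", 2), ("ADD", 3), ("HALT", 0)]))  # 5
```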

So what do the instructions look like? We need to specify both the identity of the operation (the opcode), as well as what it operates on (the operands). For regularity, each instruction occupies some number of bits, and lists first the opcode and then the operands. There are four general classes of computer instructions:

• Arithmetic: ADD, SUBTRACT, INCREMENT, DECREMENT
• Comparison: COMPARE
A typical ALU has "condition registers" (e.g., GT, EQ, and LT) that are set by the compare instruction, depending on the two values:
• COMPARE 5, 3 → GT = 1, EQ = 0, LT = 0
• COMPARE 7, 7 → GT = 0, EQ = 1, LT = 0
• COMPARE -3, 5 → GT = 0, EQ = 0, LT = 1
• Branches: JUMP, JUMPGT, JUMPEQ, HALT
Usually, you execute instructions in sequence, one after the other. But a program also needs to be able to skip around, so that it can do things like loops, "if" statements, etc. Also, we'd like to be able to stop in a finite amount of time!
• Data transfer: LOAD, STORE, MOVE
Move data back and forth between Memory, ALU registers.
MEM to ALU, ALU to MEM, MEM to MEM, ALU to ALU
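The condition registers and a conditional jump can be sketched together (a toy model; in hardware the flags are single bits set by the compare circuit, and JUMPGT simply gates a new value into the PC):

```python
# COMPARE sets the condition registers, as in the examples above.
def compare(a, b):
    return {"GT": int(a > b), "EQ": int(a == b), "LT": int(a < b)}

flags = compare(5, 3)
print(flags)  # {'GT': 1, 'EQ': 0, 'LT': 0}

# JUMPGT then tests the GT flag to choose the next PC value.
def jumpgt(flags, target, pc):
    return target if flags["GT"] else pc + 1

print(jumpgt(flags, target=10, pc=4))  # jumps: 10
```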

The control unit consists of a program counter (PC) plus an instruction register (IR), which holds the current instruction being executed. The opcode is fed into an instruction decoder that generates a group of signals that drive the rest of the units:

• Fetching and storing from memory
• Choosing operands for the ALU
• Choosing operations for the ALU
• Storing results into registers or memory
• Choosing the next instruction

### Input and Output (I/O)

There are many different types of devices: Hard disks, tapes, network cards, printers, displays, mice, keyboards, etc. Some are persistent storage (e.g., disks, tapes); these let you save things when the power is off. Memory, by contrast, goes dead and forgets when the power is cut. Others are transient, like displays, mice, keyboards, etc.

The unifying characteristic that puts all these I/O devices where they are in the von Neumann architecture is that they are typically hundreds or thousands of times slower than the processor itself.

They all have different access characteristics: some are random access (like disks), others are sequential access (like tapes), some are read-only (like CD-ROMs), and still others are "stream" devices like network cards. For this reason, instead of having the processor communicate directly with the hardware, it's typical to have a special-purpose computer called an "I/O controller" for each device. Manufacturers agree upon a set of standards for I/O controllers: how they connect to the processor bus (number and type of wires), how commands are sent to the I/O controller by the main processor, and how (and whether) data are read back.

Because I/O devices are so slow, you don't want the main processor to have to wait. So in many devices, the processor sends a command to the I/O controller along the bus, then continues its work. The I/O controller does its thing, and when it's done, it "interrupts" the CPU to let it know the operation is complete.
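The interaction can be sketched like this (a toy model with an invented `DiskController` class; real interrupts are electrical signals handled by the control unit, not callbacks):

```python
# A sketch of interrupt-driven I/O: the CPU issues a command, keeps
# working, and the controller later "interrupts" with the result.
class DiskController:
    def __init__(self):
        self.pending = []

    def start_read(self, block, on_done):
        # Queue the slow operation instead of making the CPU wait.
        self.pending.append((block, on_done))

    def tick(self):
        # Simulate the device finishing and raising its interrupt.
        for block, on_done in self.pending:
            on_done(f"data from block {block}")
        self.pending.clear()

log = []
disk = DiskController()
disk.start_read(7, on_done=log.append)  # CPU issues the command...
log.append("CPU did other work")        # ...and keeps computing meanwhile
disk.tick()                             # interrupt: the read completed
print(log)
```

Note how "CPU did other work" lands in the log before the disk data arrives: the processor never stood idle waiting for the slow device.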