Concepts in Computing
CS4 - Winter 2007
Instructor: Fabio Pellacini

Lecture 13: Binary

Overview

Up to this point, we have spoken about how computers compute in fairly abstract terms. We've assumed computers can deal with such concepts as numbers and strings of characters, without worrying at all about how these things are actually represented inside the guts of the machine.

For the next few lectures, we are going to be diving down into the inner world of modern computers, to get a snorkeler's view of how they actually work. So, beginning today, we are no longer happy with the view that computers deal with numbers, strings, arrays, objects, etc. -- unless we can figure out how they are represented inside the machine.

Looking Again at Decimal Numbers

Since the main reason computing machines were invented in the first place was to simplify arithmetic, the first thing we need to know is how to represent numbers. We humans represent numbers in a variety of different ways, but at this point in our history, almost everyone uses what's called "positional notation" for numbers:

  • Integers: 0, +5332, -4, etc.
  • Fractions: 1.5, -6.67, 1971.354, etc.

The basic idea is that there is some collection of digits (in our case, ten of them: 0, 1, 2, 3, ..., 9), and the value of a digit depends upon not only its own intrinsic value, but also its position in the number. Example, 6174:

pos   value   digit   value * digit
 0        1     4                 4
 1       10     7                70
 2      100     1               100
 3     1000     6              6000

The sum of the values of all the digits is the number represented:
6000 + 100 + 70 + 4 = 6174.

Note that the value of each position is some power of 10. Since there are ten digits, we call this a base 10 or "decimal" numbering system.

pos   value
 0        1   (10^0)
 1       10   (10^1)
 2      100   (10^2)
 3     1000   (10^3)
 4    10000   (10^4)
etc.

In principle, there is no reason why we couldn't represent numbers in the computer using decimal numbering. But in practice, that turns out to be hard: Building an electronic circuit that can reliably distinguish between ten different pressure (voltage) values is not an easy task. Since reliability and accuracy are the whole point of building a computer in the first place, that's a problem.

On the other hand, it's quite easy to build circuits that can manage to tell the difference between "low pressure" (low voltage) and "high pressure" (high voltage). For this reason, computers are built using a number system with two digit values: 0, and 1. Such a system is called binary.

Introduction to Binary

Binary numbers work just like decimal numbers -- it's a positional numbering system -- but now there are only 2 digits instead of 10. So now our positions have values that are powers of 2, instead of powers of 10:

pos   value
 0      1   (2^0)
 1      2   (2^1)
 2      4   (2^2)
 3      8   (2^3)
 4     16   (2^4)
etc.

... and 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, ...

"Binary digits" are usually called "bits" (a contraction of BInary digiTS). Just as a decimal number with (say) four digits is called a 4-digit number, a binary number with four bits is called a 4-bit number.

You read a binary number with the same technique you use for decimal numbers. Example, 1011:

pos   value   digit   value * digit
 0       1      1                1
 1       2      1                2
 2       4      0                0
 3       8      1                8

The sum of the values of all the digits is the number represented:
8 + 0 + 2 + 1 = 11
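
If you want to check this mechanically, here is a quick sketch in Python (my choice of language for examples in these notes; any would do):

  # Read the binary string "1011" left to right: each new bit
  # doubles what we have so far and then adds the bit itself.
  value = 0
  for bit in "1011":
      value = value * 2 + int(bit)
  print(value)           # 11

  # Python can also do the conversion directly:
  print(int("1011", 2))  # 11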

Note: Sometimes you have to be careful to specify which base you are writing in. 1011 is a valid binary number, but it is also a valid decimal number! Here in the notes, I will use the notation #b(10110) to mean "10110 in binary" and #d(10110) to mean "10110 in decimal", in case there is any ambiguity.

What we just did is convert a binary number to a decimal number. Similarly, you can convert decimal numbers to binary. There are a number of ways to do this, but here is one:

to convertToBinary given v
1. If v is zero, just write "0".  You're done.

2. Find the largest integer p such that 2^p ≤ v

3. While p ≥ 0, do
4.   If 2^p ≤ v
5.     Write a "1"
6.     Subtract 2^p from v
7.   Else
8.     Write a "0"
9.   Subtract 1 from p
10. end loop.

Try this with v = 38:

  v   p   2^p   output
 38   5    32   1
  6   4    16   10
  6   3     8   100
  6   2     4   1001
  2   1     2   10011
  0   0     1   100110
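
Here is the same procedure sketched in Python (the function name is mine, just for illustration):

  def convert_to_binary(v):
      # Write a non-negative integer v in binary by repeatedly
      # subtracting the largest power of 2 that still fits.
      if v == 0:
          return "0"
      p = 0
      while 2 ** (p + 1) <= v:   # find the largest p with 2^p <= v
          p += 1
      output = ""
      while p >= 0:
          if 2 ** p <= v:
              output += "1"
              v -= 2 ** p
          else:
              output += "0"
          p -= 1
      return output

  print(convert_to_binary(38))   # 100110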

Another way to do it is as follows:

1.  Repeat:
2.    If v is even, then
3.      Write a "0" at the front of the output
4.    Else
5.      Write a "1" at the front of the output
6.    Divide v by 2, discarding fractions
7.  Until v is zero
8.  Halt and return output
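
A Python sketch of this second method might look like this:

  def convert_to_binary_by_halving(v):
      # Build the binary string right to left by repeated
      # division by 2; the remainder at each step is one bit.
      output = ""
      while True:
          output = ("0" if v % 2 == 0 else "1") + output
          v = v // 2          # divide by 2, discarding fractions
          if v == 0:
              return output

  print(convert_to_binary_by_halving(38))   # 100110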

Handling Negatives

These methods work as long as your values are zero or positive. What if your values can be negative? In human notation, we just write a "-" in front to indicate the sign of the number. (We sometimes also write a "+" for positives, although usually we just assume it.)

The same trick can be used to write negatives in binary, except now instead of writing "+" or "-" for the sign, we can write "0" to mean "positive" and "1" to mean "negative". You can then write the rest of the value as the absolute value of your number.

This is called signed magnitude representation:

  -12  =  1 1100
          | ||||
          | value of twelve in binary
          |
          sign (0 = zero/positive, 1 = negative)

By contrast, positive 12 would be written 01100. This basically just mirrors how we write numbers on paper.

Question: What does #b(110011) represent?

Answer: It depends: Is this a regular unsigned binary value? Or is it signed magnitude?
Unsigned: 32 + 16 + 2 + 1 = 51
Signed Mag: (negative) 16 + 2 + 1 = -19

Key idea: when interpreting a binary number, in addition to knowing the bits, you also have to know what representation to interpret it against.
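
To see this concretely, here is a small Python sketch that reads the same six bits both ways:

  bits = "110011"

  # Unsigned: plain base-2 value.
  unsigned = int(bits, 2)                      # 51

  # Signed magnitude: first bit is the sign, the rest the magnitude.
  sign = -1 if bits[0] == "1" else 1
  signed_magnitude = sign * int(bits[1:], 2)   # -19

  print(unsigned, signed_magnitude)            # 51 -19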

Representing Values with Fractions

With what we have discussed so far, we can represent unsigned natural numbers, and signed integers (using signed magnitude). But what about fractional values?

Humans write fractional values like this: +553.884001

Let's look at what we've got here:

  • A sign, saying whether the number is positive or negative
  • A whole (integer) part, before the decimal point
  • A fractional part, after the decimal point

But there is another way to look at it, which is more useful for computing: We need to know

  • What the sign is
  • What the digits are
  • Where the decimal point goes

This is the idea behind scientific notation. Instead of giving the whole part and the fractional part separately, you represent all the digits together, and then separately write down where the decimal point belongs. In scientific notation, +553.884001 is usually written as +5.53884001 x 10^2. This representation includes a sign, a mantissa (the digits, 5.53884001), and an exponent (2). By convention, we usually assume that the decimal point starts out between the first and second digits of the value. The exponent tells us how many positions to move it: to the left if the exponent is negative, or to the right if it is positive.

The same trick works with binary:

  • We can encode the mantissa as a regular unsigned integer.
  • We can encode the exponent as a signed integer.
  • We can encode the sign of the whole value as a single bit.

Here's an example:

  
  |-- sign
  0 11011100  000011
  - --------  ------
    mantissa  exponent
   (unsigned) (signed magnitude)

The mantissa is "1.1011100". The exponent is #b(11) = #d(3), so this corresponds to 1.1011100 x 2^3 = 1101.1100

We interpret the values after the binary point as we do in decimal, but with negative powers of 2 instead of negative powers of 10:

1101 . 1100
----   ||
 13    |+-- 1 x 2^-2  = 1/4 = 0.25
       +--- 1 x 2^-1  = 1/2 = 0.5

The total is 13.75.
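
As a quick check, here is the same computation in Python:

  # Value of the binary string 1101.1100:
  value = 0
  for i, bit in enumerate("1101"):    # whole part: 2^3 down to 2^0
      value += int(bit) * 2 ** (3 - i)
  for i, bit in enumerate("1100"):    # fraction: 2^-1 down to 2^-4
      value += int(bit) * 2 ** -(i + 1)
  print(value)                        # 13.75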

This is called floating point representation. In order to make it work, we have to choose:

  • How many bits to represent the mantissa
  • How many bits to represent the exponent

Once we have chosen this, we know how to read a number in floating point format: If we are given the bits 010111000000011 and we are told "this is a floating-point number with a sign bit, an 8-bit mantissa, and a 6-bit signed-magnitude exponent", we know to break it up as follows:

0          sign of number
10111000   mantissa
0          sign of exponent
00011      magnitude of exponent

Converting a decimal value to floating point is not too hard. Let's do -5.625 as an example, assuming an 8-bit mantissa and 6-bit exponent:

  1. Convert the sign to 0/1.
    This value is negative, so we write 1 for the sign.
  2. Convert the integer part to unsigned:
    #d(5) = #b(101)
  3. To convert the fractional part, consider each negative power of 2 in turn:
    power   2^-1   2^-2    2^-3    2^-4
    value   0.5    0.25    0.125   0.0625
    bit     1      0       1       0
    The sum is 0.5 + 0 + 0.125 = 0.625
  4. Put them together with a binary point between:
    101.101
  5. Move the decimal point so that it is immediately to the right of the first bit of the mantissa. To do this, you will need to adjust the exponent accordingly (increase to move left, decrease to move right):
    101.101 x 2^0 =
    10.1101 x 2^1 =
    1.01101 x 2^2
  6. Convert the exponent to signed magnitude:
    0 00010
  7. Glue the whole thing together, padding the mantissa with extra zeroes to fill it up to 8 bits total:
    1 10110100 000010
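
Here is a sketch of a decoder for this toy format in Python (again, this is our classroom format, not a standard one):

  def decode_toy_float(bits):
      # 15 bits: 1 sign bit, an 8-bit mantissa (binary point after
      # its first bit), and a 6-bit signed-magnitude exponent.
      sign = -1 if bits[0] == "1" else 1
      mantissa = int(bits[1:9], 2) / 2 ** 7    # 1.xxxxxxx as a value
      exp_sign = -1 if bits[9] == "1" else 1
      exponent = exp_sign * int(bits[10:], 2)
      return sign * mantissa * 2 ** exponent

  print(decode_toy_float("110110100000010"))   # -5.625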

Note: this is just one representation of floating point numbers. There are others. For example, the textbook moves the decimal point all the way to the left (rather than leaving one digit to the left of the decimal point). The IEEE standard floating point representation is another. As mentioned above, then, a bunch of bits by itself is meaningless -- you must indicate how they are to be interpreted.

Floating point representations have many problems that we won't be able to cover here, but I did want to warn you about them. For example, you're familiar with how 1/3 has a repeating decimal representation as 0.333333.... Of course, we have a finite number of digits in a computer, so that's already a problem. A related problem is that even some numbers that have a small number of digits (in decimal) require an infinite number of bits; e.g. #d(0.1) = #b(0.000110011001100110011...). By truncating this to a fixed number of bits, we've introduced some error. The error can show up if, for example, we start adding floating point numbers together. Adding 0.1 to a running total 10 times might not yield exactly 1!
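
You can watch this happen in any language that uses binary floating point; in Python, for example:

  total = 0.0
  for _ in range(10):
      total += 0.1      # each 0.1 carries a tiny rounding error

  print(total)          # 0.9999999999999999 (not 1.0!)
  print(total == 1.0)   # False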

Representing Characters and Strings

Although numbers are a computer's primary concern, we also want computers to be able to understand letters, punctuation, etc. Rather than invent something new, we'll just assign each character a number, and the computer will use those numbers to represent the characters. The most common encoding in use today is the American Standard Code for Information Interchange (ASCII). In ASCII, each character is encoded as an 8-bit binary number. For instance:

binary     decimal   character
...
00001001       9     (tab)
...
00100000      32     (space)
...
00101110      46     . (period)
...
00110000      48     0 (zero)
00110001      49     1 (one)
...
01000001      65     A (capital A)
01000010      66     B
01000011      67     C
...
01100001      97     a (lowercase a)
01100010      98     b
01100011      99     c
...

Treating characters as numbers internally has some big advantages. For example, if you want to compare two characters to see which one is "first" in dictionary order, you can just compare their ASCII values. The ASCII values were carefully chosen so that all the letters and digits are in the correct order; all the digits are before any of the letters, and all the capital letters are before the lower case letters.
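
In Python, for instance, comparing characters is just comparing their code values:

  print(ord("A"), ord("B"))   # 65 66
  print("A" < "B")            # True: 65 < 66
  print("3" < "A")            # True: digits come before letters
  print("Z" < "a")            # True: capitals come before lowercase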

Unfortunately, ASCII was designed without taking languages other than English into account. So, there are no ASCII encodings for accented characters, other alphabets like Greek and Cyrillic, or any of the non-alphabetic writing systems like Hanzi/Kanji/Hangul, Hiragana and Katakana, Indic alphabets, etc. For this reason, a new encoding standard called "Unicode" has been developed to systematically encode every character used by every language spoken throughout the world (including some of the artificial ones!).

Unicode values are generally longer than 8 bits, but the principle behind Unicode is the same as ASCII: Each character is assigned a unique unsigned integer value, and those values are what the computer carries around in its memory.
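
In Python, ord gives the code value of any character, ASCII or not:

  print(ord("A"))    # 65, the same as in ASCII
  print(ord("é"))    # 233: fits in 8 bits, but is beyond ASCII
  print(ord("猫"))   # 29483: needs more than 8 bits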

A string can then be encoded as just a list of characters. To mark where the string ends, one can put a special character at the end. Alternatively, one can start the string with a number that says how many characters follow.
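
Here is a sketch of both schemes in Python, using a zero byte as a hypothetical end marker:

  text = "Hi!"
  codes = bytes(ord(c) for c in text)    # the character codes

  terminated = codes + b"\x00"           # end-marker scheme
  prefixed = bytes([len(text)]) + codes  # count-first scheme

  print(list(terminated))   # [72, 105, 33, 0]
  print(list(prefixed))     # [3, 72, 105, 33]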

Other Kinds of Data

Once you have the idea of encoding numbers, you can encode virtually anything you want in binary -- it simply becomes a matter of choosing a simple rule for how to convert back and forth between a binary representation and whatever conventional human representation you want to use. We saw how to encode a color by writing its red, green, and blue values as unsigned integers in some range. We can then encode a picture by encoding the color of each of its dots (pixels), laid out as a grid. If the picture has m rows and n columns, you can write each row as n RGB values. Repeat that for each of the m rows, for a total of m * n RGB values. Since the image size can vary, we can write the dimensions of the image at the front as unsigned integers.
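
As a final sketch, here is one made-up layout for such an image in Python; the exact format is just for illustration:

  # A 2 x 2 image: dimensions first, then one (R, G, B)
  # triple per pixel, row by row.
  height, width = 2, 2
  pixels = [
      (255, 0, 0), (0, 255, 0),      # first row: red, green
      (0, 0, 255), (255, 255, 255),  # second row: blue, white
  ]

  encoded = bytes([height, width]) + bytes(
      value for pixel in pixels for value in pixel
  )
  print(list(encoded))
  # [2, 2, 255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255]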