# Lecture 13: Binary

## Overview

Up to this point, we have spoken about how computers compute in fairly abstract terms. We've assumed computers can deal with such concepts as numbers and strings of characters, without worrying at all about how these things are actually represented inside the guts of the machine.

For the next few lectures, we are going to be diving down into the inner world of modern computers, to get a snorkeler's view of how they actually work. So, beginning today, we are no longer happy with the view that computers deal with numbers, strings, arrays, objects, etc. -- unless we can figure out how they are represented inside the machine.

## Looking Again at Decimal Numbers

Since the main reason computing machines were invented in the first place was to simplify arithmetic, the first thing we need to know is how to represent numbers. We humans represent numbers in a variety of different ways, but at this point in our history, almost everyone uses what's called "positional notation" for numbers:

- Integers: 0, +5332, -4, etc.
- Fractions: 1.5, -6.67, 1971.354, etc.

The basic idea is that there is some collection of digits (in our case, ten of them: 0, 1, 2, 3, ..., 9), and the value of a digit depends upon not only its own intrinsic value, but also its position in the number. Example, 6174:

| pos | value | digit | digit value |
|-----|-------|-------|-------------|
| 0   | 1     | 4     | 4           |
| 1   | 10    | 7     | 70          |
| 2   | 100   | 1     | 100         |
| 3   | 1000  | 6     | 6000        |

The sum of the values of all the digits is the number
represented:

6000 + 100 + 70 + 4 = 6174.

Note that the value of each position is some power of 10. Since there are ten digits, we call this a base 10 or "decimal" numbering system.

| pos  | value | power    |
|------|-------|----------|
| 0    | 1     | (10^{0}) |
| 1    | 10    | (10^{1}) |
| 2    | 100   | (10^{2}) |
| 3    | 1000  | (10^{3}) |
| 4    | 10000 | (10^{4}) |
| etc. |       |          |

In principle, there is no reason why we couldn't represent numbers in the computer using decimal numbering. But in practice, that turns out to be hard: Building an electronic circuit that can reliably distinguish between ten different pressure (voltage) values is not an easy task. Since reliability and accuracy are the whole point of building a computer in the first place, that's a problem.

On the other hand, it's quite easy to build circuits that can
manage to tell the difference between "low pressure" (low voltage) and
"high pressure" (high voltage). For this reason, computers are built
using a number system with **two** digit values: 0, and
1. Such a system is called **binary**.

## Introduction to Binary

Binary numbers work just like decimal numbers -- it's a positional numbering system -- but now there are only 2 digits instead of 10. So now our positions have values that are powers of 2, instead of powers of 10:

| pos  | value | power   |
|------|-------|---------|
| 0    | 1     | (2^{0}) |
| 1    | 2     | (2^{1}) |
| 2    | 4     | (2^{2}) |
| 3    | 8     | (2^{3}) |
| 4    | 16    | (2^{4}) |
| etc. |       |         |

... and 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, ...

"Binary digits" are usually called "bits" (a contraction of BInary digiTS). Just as a decimal number with (say) four digits is called a 4-digit number, a binary number with four bits is called a 4-bit number.

You read a binary number with the same technique you use for decimal numbers. Example, 1011:

| pos | value | digit | digit value |
|-----|-------|-------|-------------|
| 0   | 1     | 1     | 1           |
| 1   | 2     | 1     | 2           |
| 2   | 4     | 0     | 0           |
| 3   | 8     | 1     | 8           |

The sum of the values of all the digits is the number
represented:

8 + 0 + 2 + 1 = 11

Note: Sometimes you have to be careful to specify which base you are writing in. 1011 is a valid binary number, but it is also a valid decimal number! Here in the notes, I will use the notation #b(10110) to mean "10110 in binary" and #d(10110) to mean "10110 in decimal", in case there is any ambiguity.
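The positional sum we just computed is easy to express in code. Here is a short Python sketch (the function name is my own, chosen for illustration):

```python
def binary_to_decimal(bits):
    """Sum digit * 2**position, reading positions from right to left."""
    value = 0
    for position, digit in enumerate(reversed(bits)):
        value += int(digit) * 2 ** position
    return value

print(binary_to_decimal("1011"))  # → 11, matching 8 + 0 + 2 + 1
```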

What we just did is convert a binary number to a decimal number. Similarly, you can convert decimal numbers to binary. There are a number of ways to do this, but here is one:

to convertToBinary, given v:

1. If v is zero, just write "0". You're done.
2. Find the largest integer p such that 2^p ≤ v.
3. While p ≥ 0, do:
   1. If 2^p ≤ v, write a "1" and subtract 2^p from v.
   2. Otherwise, write a "0".
   3. Subtract 1 from p.
4. End loop.

Try this with *v* = 38:

| v  | p | 2^{p} | output |
|----|---|-------|--------|
| 38 | 5 | 32    | 1      |
| 6  | 4 | 16    | 10     |
| 6  | 3 | 8     | 100    |
| 6  | 2 | 4     | 1001   |
| 2  | 1 | 2     | 10011  |
| 0  | 0 | 1     | 100110 |
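The subtract-the-largest-power algorithm can be sketched in Python like this (the function name is illustrative, not standard):

```python
def to_binary_subtract(v):
    """Decimal → binary via the subtract-largest-power-of-2 method."""
    if v == 0:
        return "0"
    p = 0
    while 2 ** (p + 1) <= v:   # find the largest p with 2**p <= v
        p += 1
    output = ""
    while p >= 0:
        if 2 ** p <= v:
            output += "1"
            v -= 2 ** p
        else:
            output += "0"
        p -= 1
    return output

print(to_binary_subtract(38))  # → 100110
```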

Another way to do it is as follows:

1. While true:
   1. If v is even, write a "0" at the front of the output.
   2. Otherwise, write a "1" at the front of the output.
   3. If v is zero, halt and return the output.
   4. Otherwise, divide v by 2, discarding any fraction.
2. End loop.
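Here is a Python sketch of this repeated-halving method (simplified slightly so that it does not emit the extra leading zero the literal pseudocode would produce):

```python
def to_binary_halving(v):
    """Decimal → binary by repeatedly halving and collecting remainders."""
    if v == 0:
        return "0"
    output = ""
    while v > 0:
        # The last binary digit is 0 if v is even, 1 if v is odd
        output = ("1" if v % 2 else "0") + output
        v //= 2          # divide by 2, discarding the fraction
    return output

print(to_binary_halving(38))  # → 100110
```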

## Handling Negatives

These methods work as long as your values are zero or positive.
What about if your values can be negative? In human notation, we just
write a "-" in front, to indicate the *sign* of the number.
(We do sometimes also write a "+" for positives, although usually we
just assume it).

The same trick can be used to write negatives in binary, except now
instead of writing "+" or "-" for the sign, we can write "0" to mean
"positive" and "1" to mean "negative". You can then write the rest of
the value as the *absolute value* of your number.

This is called *signed magnitude* representation:

    -12 = 1 1100
          |  \
          |   value of twelve in binary
          sign (0 = zero/positive, 1 = negative)

By contrast, positive 12 would be written 01100. This basically just mirrors how we write numbers on paper.

Question: What does #b(110011) represent?

Answer: It depends: Is this a regular unsigned binary value? Or is
it signed magnitude?

Unsigned: 32 + 16 + 2 + 1 = 51

Signed Mag: (negative) 16 + 2 + 1 = -19

Key idea: when interpreting a binary number, in addition to knowing
the bits, you also have to know *what representation* to
interpret it against.
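This key idea can be demonstrated directly by decoding the same bit string under both representations (the helper names below are mine):

```python
def as_unsigned(bits):
    """Interpret a bit string as a plain unsigned binary number."""
    return int(bits, 2)

def as_signed_magnitude(bits):
    """Interpret the first bit as the sign, the rest as the magnitude."""
    sign = -1 if bits[0] == "1" else 1
    return sign * int(bits[1:], 2)

print(as_unsigned("110011"))          # → 51
print(as_signed_magnitude("110011"))  # → -19
```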

## Representing Values with Fractions

With what we have discussed so far, we can represent unsigned natural numbers, and signed integers (using signed magnitude). But what about fractional values?

Humans write fractional values like this: +553.884001

Let's look at what we've got here:

- A sign, saying whether the number is positive or negative
- A whole (integer) part, before the decimal point
- A fractional part, after the decimal point

But there is another way to look at it, which is more useful for computing: We need to know

- What the sign is
- What the digits are
- Where the decimal point goes

This is the idea behind scientific notation. Instead of giving the
whole part and the fractional part separately, you represent all the
digits together, and then separately write down where the decimal
point belongs. In SN, +553.884001 is usually written as +5.53884001 x
10^{2}. This representation includes a *sign*, a
*mantissa* (the digits, 5.53884001), and an *exponent*
(2). By convention, we usually assume that the decimal point starts
out between the first and second digits of the value. The exponent
tells us how many positions to the left or right to move it, depending
whether the exponent is negative (left) or positive (right).

The same trick works with binary:

- We can encode the mantissa as a regular unsigned integer.
- We can encode the exponent as a signed integer.
- We can encode the sign of the whole value as a single bit.

Here's an example:

    0      11011100     000011
    ----   --------     ------
    sign   mantissa     exponent
           (unsigned)   (signed magnitude)

The mantissa is "1.1011100". The exponent is #b(11) = #d(3), so
this corresponds to 1.1011100 x 2^{3} = 1101.1100

We interpret the values after the binary point as we do in decimal, but with negative powers of 2 instead of negative powers of 10:

1101 . 1100

- The whole part: 1101 = 13
- First fractional bit: 1 x 2^{-1} = 1/2 = 0.5
- Second fractional bit: 1 x 2^{-2} = 1/4 = 0.25
- The remaining bits are 0.

The total is 13.75.

This is called *floating point* representation. In order to make it
work, we have to choose:

- How many bits to represent the mantissa
- How many bits to represent the exponent

Once we have chosen this, we know how to read a number in floating point format: If we are given the bits 010111000000011 and we are told "this is a floating-point number with a sign bit, an 8-bit mantissa, and a 6-bit signed-magnitude exponent", we know to break it up as follows:

| bits     | meaning               |
|----------|-----------------------|
| 0        | sign of number        |
| 10111000 | mantissa              |
| 0        | sign of exponent      |
| 00011    | magnitude of exponent |
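A decoder for this particular layout (1 sign bit, 8-bit mantissa, 6-bit signed-magnitude exponent, binary point assumed after the first mantissa bit) can be sketched in Python; the function name and parameters are mine:

```python
def decode_float(bits, mant_bits=8, exp_bits=6):
    """Decode sign | mantissa | signed-magnitude exponent."""
    sign = -1 if bits[0] == "1" else 1
    mant = bits[1:1 + mant_bits]
    exp = bits[1 + mant_bits:]
    # Binary point sits after the first mantissa bit,
    # so divide by 2**(mant_bits - 1)
    mantissa = int(mant, 2) / 2 ** (mant_bits - 1)
    exp_sign = -1 if exp[0] == "1" else 1
    exponent = exp_sign * int(exp[1:], 2)
    return sign * mantissa * 2 ** exponent

print(decode_float("010111000000011"))  # 1.0111000 x 2^3
```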

Converting a decimal value to floating point is not too hard. Let's do -5.625 as an example, assuming an 8-bit mantissa and 6-bit exponent:

- Convert the sign to 0/1. This value is negative, so we write 1 for the sign.

- Convert the integer part to unsigned: #d(5) = #b(101)

- To convert the fractional part, consider each negative power of 2 in turn (#d(0.625) = 0.5 + 0.125):

  | 2^{-1} | 2^{-2} | 2^{-3} | 2^{-4} |
  |--------|--------|--------|--------|
  | 0.5    | 0.25   | 0.125  | 0.0625 |
  | 1      | 0      | 1      | 0      |

- Put them together with a binary point between: 101.101

- Move the binary point so that it is immediately to the right of the first bit of the mantissa. To do this, you will need to adjust the exponent accordingly (increase to move left, decrease to move right):

  101.101 x 2^{0} = 10.1101 x 2^{1} = 1.01101 x 2^{2}

- Convert the exponent to signed magnitude: 0 00010

- Glue the whole thing together, padding the mantissa with extra zeroes to fill it up to 8 bits total:

  1 10110100 000010
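The encoding steps above can also be sketched in Python (illustrative, not a standard routine; it rounds the value to fit the 8-bit mantissa and assumes it is nonzero when normalizing):

```python
def encode_float(x, mant_bits=8, exp_bits=6):
    """Encode x as sign | mantissa | signed-magnitude exponent."""
    sign = "1" if x < 0 else "0"
    x = abs(x)
    exponent = 0
    # Normalize so exactly one bit sits left of the binary point
    while x >= 2:
        x /= 2
        exponent += 1
    while 0 < x < 1:
        x *= 2
        exponent -= 1
    # Mantissa: the leading bit plus (mant_bits - 1) fractional bits
    mant = int(round(x * 2 ** (mant_bits - 1)))
    exp_sign = "1" if exponent < 0 else "0"
    return (sign
            + format(mant, f"0{mant_bits}b")
            + exp_sign
            + format(abs(exponent), f"0{exp_bits - 1}b"))

print(encode_float(-5.625))  # → 110110100000010
```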

Note: this is one representation of floating point numbers. There are others. For example, the textbook moves the binary point all the way to the left (rather than leaving one digit to the left of the point). The IEEE standard floating point representation is another. As mentioned above, then, a bunch of bits by itself is meaningless -- you must indicate how they are to be interpreted.

Floating point representations have many problems that we won't be able to cover here, but I did want to warn you about. For example, you're familiar with how 1/3 has a repeating decimal representation as 0.333333.... Of course, we have a finite number of digits in a computer, so that's already a problem. A related problem is that even some numbers that have a small number of digits (in decimal) require an infinite number of bits; e.g. #d(0.1) = #b(0.00011000110001100011...). By truncating this to only a fixed number of bits, we've introduced some error. The error can show up if, for example, we start adding floating point numbers together. Adding 0.1 to a running total 10 times might not yield exactly 1!
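You can see this error directly in any language with hardware floating point. In Python, for example:

```python
# Add 0.1 to a running total ten times; the binary
# representation of 0.1 is truncated, so error accumulates.
total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1.0)  # → False
print(total)         # close to 1, but not exactly 1
```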

## Representing Characters and Strings

Although numbers are a computer's primary concern, we also want computers to be able to understand letters, punctuation, etc. Rather than invent something new, we'll just assign each character a number, and the computer will use those numbers to represent the characters. The most common encoding in use today is the American Standard Code for Information Interchange (ASCII). In ASCII, each character is encoded as an 8-bit binary number. For instance:

| binary   | decimal | character     |
|----------|---------|---------------|
| ...      |         |               |
| 00001001 | 9       | (tab)         |
| ...      |         |               |
| 00100000 | 32      | (space)       |
| ...      |         |               |
| 00101110 | 46      | . (period)    |
| ...      |         |               |
| 00110000 | 48      | 0 (zero)      |
| 00110001 | 49      | 1 (one)       |
| ...      |         |               |
| 01000001 | 65      | A (capital A) |
| 01000010 | 66      | B             |
| 01000011 | 67      | C             |
| ...      |         |               |
| 01100001 | 97      | a (lowercase a) |
| 01100010 | 98      | b             |
| 01100011 | 99      | c             |
| ...      |         |               |

Treating characters as numbers internally has some big advantages. For example, if you want to compare two characters to see which one is "first" in dictionary order, you can just compare their ASCII values. The ASCII values were carefully chosen so that all the letters and digits are in the correct order; all the digits are before any of the letters, and all the capital letters are before the lower case letters.
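Python exposes these character codes through the built-in `ord`, and string comparison uses them directly:

```python
# The numeric codes behind a few characters
print(ord("A"), ord("a"), ord("0"))  # → 65 97 48

# Lexicographic comparison works because the codes are ordered
print("apple" < "banana")  # → True

# All capitals come before all lowercase letters
print("Zebra" < "apple")   # → True, since 90 < 97
```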

Unfortunately, ASCII was designed without taking languages other than English into account. So, there are no ASCII encodings for accented characters, other alphabets like Greek and Cyrillic, or any of the non-alphabetic writing systems like Hanzi/Kanji/Hangul, Hiragana and Katakana, Indic alphabets, etc. For this reason, a new encoding standard called "Unicode" has been developed to systematically encode every character used by every language spoken throughout the world (including some of the artificial ones!)

Unicode values are generally longer than 8 bits, but the principle behind Unicode is the same as ASCII: Each character is assigned a unique unsigned integer value, and those values are what the computer carries around in its memory.

A string can then be encoded as just a list of characters. To indicate how many characters there are, one can place a special character at the end, marking where the string stops. Alternatively, one can start the string with a number that says how many characters follow.
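Both conventions are easy to sketch in Python (function names are mine; the length-prefixed version assumes strings shorter than 256 characters so the length fits in one byte):

```python
def length_prefixed(s):
    """One length byte, then the ASCII codes of the characters."""
    return bytes([len(s)]) + s.encode("ascii")

def null_terminated(s):
    """ASCII codes followed by a terminating zero byte (C-style)."""
    return s.encode("ascii") + b"\x00"

print(list(length_prefixed("Hi")))   # → [2, 72, 105]
print(list(null_terminated("Hi")))   # → [72, 105, 0]
```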

## Other Kinds of Data

Once you have the idea for encoding numbers, you can encode
virtually anything you want in binary -- it simply becomes a matter of
choosing a simple rule for how to convert back and forth between a
binary representation and whatever conventional human representation
you want to use. We saw how to encode a color by writing its red,
green, and blue values as unsigned integers in some range. Then we
can encode a picture by encoding the color of each of the dots
(*pixels*) in the picture, as a grid. If the picture has
*m* rows and *n* columns, you can write each row as *n*
RGB values. Repeat that for each of the *m* rows, for a total of
*m* * *n* RGB values. Since the image size can vary, we can
write the dimensions of the image at the front as unsigned
integers.
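This scheme can be sketched in Python (an illustrative encoding of my own, assuming the dimensions and color components each fit in a single unsigned byte):

```python
def encode_image(pixels):
    """Rows and columns as unsigned bytes, then R, G, B for
    each pixel, row by row.  `pixels` is a list of rows; each
    pixel is an (r, g, b) tuple with values 0-255."""
    rows, cols = len(pixels), len(pixels[0])
    data = bytes([rows, cols])
    for row in pixels:
        for (r, g, b) in row:
            data += bytes([r, g, b])
    return data

image = [[(255, 0, 0), (0, 255, 0)],   # red, green
         [(0, 0, 255), (0, 0, 0)]]     # blue, black
print(len(encode_image(image)))  # 2 header bytes + 4 pixels * 3 = 14
```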