# Lecture 7: Arrays and Searching Algorithms

## Overview

- Review: algorithms
- Storing a set of related variables: arrays
- Searching through arrays, whether or not their values are in sorted order
- Other things arrays are good for

## Review

**Algorithm**: *An ordered collection of
unambiguous and effectively computable operations that, when followed,
produces an observable result and completes (halts) in a finite
amount of time.*

A couple of examples: making a sandwich, computing sums.

Instructions a computer understands:

- Definition of an algorithm: `function(arguments) { instructions }`
- Basic operations:
  - calculation: `+, -, *, /`
  - setting values: `=`
  - output: `print`
  - halt: `return value`
- Conditional: `if (condition) { instructions } else { instructions }`
- Iteration (looping): `while (condition) { instructions }`

The point is to define a set of simple operations that we agree are both "unambiguous" and "effectively computable", so that we can write algorithms with them.
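As a small illustration, the "computing sums" example can be written with only the operations listed above. This is a sketch in JavaScript; the function name `sumUpTo` and the use of `console.log` in place of `print` are assumptions made for this example, not part of the lecture's notation.

```javascript
// Sum the integers 1, 2, ..., n using only assignment, arithmetic,
// a while loop, and return.
function sumUpTo(n) {
  var total = 0;
  var i = 1;
  while (i <= n) {
    total = total + i; // calculation and setting a value
    i = i + 1;
  }
  return total; // halt and hand back the result
}

console.log(sumUpTo(4)); // 1 + 2 + 3 + 4 = 10
```

Each step is unambiguous, and the loop halts because `i` grows by 1 on every pass until it exceeds `n`.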

## Arrays

Suppose you're given a telephone directory, which is a list of names and corresponding phone numbers. How do you look up "Smith, John" in the directory? How do you write a program to do that?

First off, we need a way of describing a telephone directory. In the
real world, it's a book, on whose pages are printed a (very long) list
of names and telephone numbers. The important idea is that we need to
talk about an *ordered sequence* of values.

A typical way to represent an ordered sequence is called an
*array*. Like an individual value, an array is stored in the
computer's memory, but the array holds multiple values. You can think
of an ordinary variable as being a named "box" containing a value; an
array can be thought of as a "row of boxes", containing a list of
values. To refer to an array in an algorithm, we'll give it a name,
like a variable. To indicate that a variable is an array, we will
write `var A = new array()`.

Since an array is *ordered*, we need some way to tell which
value in the array is the first, which is the second, etc. To do
this, we give each box in the array a unique number: 0, 1, 2, etc. To
refer to some element of an array, we write the number of that element
in square brackets after the array name:

- `A[0]`: the first element of the array (the one numbered 0)
- `A[1]`: the second element of the array (the one numbered 1)
- etc.

The elements of an array are just like variables, in that they can contain values. In terms of our language of algorithms, this means we can write things like:

`A[0] = 25`

`foo = A[7] - 15`

`if (A[1] > A[2])`

...

We'll assume that we know how many elements are in an array (i.e.,
how many positions are available). We can write `A.length`
to mean "the number of elements in `A`".

Note: If an array has *n* elements, the positions are 0, 1,
..., *n*-1. The textbook uses positions 1, 2, ..., *n*
(ignoring position 0).
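The indexing rules above can be tried out directly. The sketch below uses JavaScript's array literal `[...]` to fill the array in one line, which is an assumption of this example (the lecture writes `new array()`), and `console.log` stands in for `print`.

```javascript
// Build a small array and read or change its elements by position.
var A = [25, 7, 13];          // three boxes, numbered 0, 1, 2

console.log(A[0]);            // first element: 25
console.log(A.length);        // number of elements: 3

A[1] = A[0] - 15;             // store 10 into the second box
console.log(A[1]);            // 10

// The last valid position is A.length - 1, not A.length.
console.log(A[A.length - 1]); // 13
```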

### A scrambled telephone book, using arrays

With arrays available, one way we can represent our telephone directory is with two arrays:

- `var N = new array()` -- an array of names (the people in the directory)
- `var T = new array()` -- an array of telephone numbers

Each of these arrays has the same number of entries, and for any
given entry in array `N`, the corresponding entry in `T` gives the
telephone number for that person. For now, we'll assume that the
directory is in some random order; we'll see later how to take
advantage of alphabetical order, and next time how to put it in
alphabetical order. For example:

| `i` | `N[i]` | `T[i]` |
|---|---|---|
| 0 | Bob | 555-9876 |
| 1 | Evan | 555-1234 |
| 2 | Alice | 555-1234 |
| 3 | Dan | 555-0000 |
| 4 | Carla | 555-5555 |

Example: greeting everyone in the phone book.
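One way to sketch the greeting example is to walk a counter from 0 up to `N.length - 1` and handle one name per pass. The helper name `greetings` and the wording of the greeting are assumptions made for illustration.

```javascript
var N = ["Bob", "Evan", "Alice", "Dan", "Carla"];

// Hypothetical helper: build one greeting per name, in directory order.
function greetings(names) {
  var out = [];
  var counter = 0;
  while (counter < names.length) {
    out[counter] = "Hello, " + names[counter] + "!";
    counter = counter + 1;
  }
  return out;
}

// Print the greetings one per line.
var lines = greetings(N);
var i = 0;
while (i < lines.length) {
  console.log(lines[i]);
  i = i + 1;
}
```

The loop condition `counter < names.length` is what keeps the counter inside the valid positions 0 through *n*-1.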

## Sequential search

We want to design an algorithm `Search` that takes as
input arrays `N` and `T` representing a
telephone directory, along with a name `who` to look up.
If `who` is in the directory, the algorithm should print
their telephone number; otherwise, it should print "not found".

How do we do this? The basic idea is as follows:

- Keep a counter starting at 0, which indicates which entry in the directory we will look at next.
- Keep a variable `found`, which is `true` if we have found the name we are looking for, and `false` if not.

```
function Search(N, T, who) {
 1.  var counter = 0;
 2.  var found = false;
     // && indicates "and"
 3.  while ((found == false) && (counter < N.length)) {
 4.    if (who == N[counter]) {
 5.      found = true;
 6.    } else {
 7.      counter = counter + 1;
       }
     }
 8.  if (found == true) {
 9.    print(T[counter]); // print out the phone number
10.  } else {
11.    print("not found");
     }
}
```

Example: Using the directory given above, what happens when you Search for "Alice" in the directory?

| steps | counter | found | `N[counter]` |
|---|---|---|---|
| 1,2 | 0 | False | "Bob" |
| 3,4,7 | 1 | False | "Evan" |
| 3,4,7 | 2 | False | "Alice" |
| 3,4,5 | 2 | True | "Alice" |
| 3,8,9 | 2 | True | "Alice" |

It prints `T[counter]`, which is `T[2]`, which
is "555-1234".

How about searching for "Barbara"?

| steps | counter | found | `N[counter]` |
|---|---|---|---|
| 1,2 | 0 | False | "Bob" |
| 3,4,7 | 1 | False | "Evan" |
| 3,4,7 | 2 | False | "Alice" |
| 3,4,7 | 3 | False | "Dan" |
| 3,4,7 | 4 | False | "Carla" |
| 3,4,7 | 5 | False | (no entry): counter is now equal to `N.length` |
| 3,10,11 | | | |

It prints "not found" because `found` is False.
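The two traces above can be reproduced with a runnable sketch. To make the result easy to check, this version returns the phone number (or "not found") instead of printing it; that change, the lowercase name `search`, and `console.log` in place of `print` are assumptions of this example.

```javascript
var N = ["Bob", "Evan", "Alice", "Dan", "Carla"];
var T = ["555-9876", "555-1234", "555-1234", "555-0000", "555-5555"];

// Sequential search: look at entries 0, 1, 2, ... until the name is
// found or the counter runs off the end of the array.
function search(names, numbers, who) {
  var counter = 0;
  var found = false;
  while (found === false && counter < names.length) {
    if (who === names[counter]) {
      found = true;
    } else {
      counter = counter + 1;
    }
  }
  if (found) {
    return numbers[counter];
  }
  return "not found";
}

console.log(search(N, T, "Alice"));   // 555-1234
console.log(search(N, T, "Barbara")); // not found
```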

## Other array processing algorithms

Now we know how to find a particular entry in an array. There are a lot of algorithms that have a very similar structure -- march through an array and "do something" -- find the largest element (or the smallest one), sum up the elements, count the number of elements satisfying some test, .... The book shows how to find the largest; see if you can think through the others.
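As a rough sketch of the "march through an array and do something" pattern, here are three variations. The function names, the threshold test in `countAbove`, and the sample data are all assumptions chosen for illustration; they are not taken from the textbook.

```javascript
// Each function marches through the array once, keeping a running result.

function sum(A) {
  var total = 0;
  var counter = 0;
  while (counter < A.length) {
    total = total + A[counter];
    counter = counter + 1;
  }
  return total;
}

function largest(A) {
  // Assumes A has at least one element.
  var best = A[0];
  var counter = 1;
  while (counter < A.length) {
    if (A[counter] > best) {
      best = A[counter];
    }
    counter = counter + 1;
  }
  return best;
}

function countAbove(A, limit) {
  // Count the elements that satisfy the test "greater than limit".
  var count = 0;
  var counter = 0;
  while (counter < A.length) {
    if (A[counter] > limit) {
      count = count + 1;
    }
    counter = counter + 1;
  }
  return count;
}

var scores = [70, 95, 82, 60];
console.log(sum(scores));            // 307
console.log(largest(scores));        // 95
console.log(countAbove(scores, 75)); // 2
```

Only the "do something" line inside the loop changes from one function to the next; the counter and the loop condition stay the same.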

It is also possible to store values *into* an array, in
addition to reading them out. For example, we could select a
"sub-directory" of phone numbers for people whose names begin with
a particular letter.

```
function Select(N, T, letter) {
  var SomeN = new array();
  var SomeT = new array();
  var counter = 0;
  var some = 0;
  while (counter < N.length) {
    if (startWith(N[counter], letter)) {
      SomeN[some] = N[counter];
      SomeT[some] = T[counter];
      some = some + 1;
    }
    counter = counter + 1;
  }
  return (SomeN, SomeT); // "not formal" way of returning many things, see later
}
```

The digit-wise addition of numbers in the textbook likewise stores the summed digits into an output array.
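A runnable sketch of the sub-directory idea follows. It makes two assumptions that are not in the lecture's version: the test "begins with a letter" uses JavaScript's built-in string method `startsWith`, and the two output arrays are returned together inside one object with the hypothetical field names `names` and `numbers`.

```javascript
var N = ["Bob", "Evan", "Alice", "Dan", "Carla"];
var T = ["555-9876", "555-1234", "555-1234", "555-0000", "555-5555"];

// Copy the entries whose name begins with `letter` into two new arrays.
function select(names, numbers, letter) {
  var someN = [];
  var someT = [];
  var counter = 0; // position being read in the input arrays
  var some = 0;    // next free position in the output arrays
  while (counter < names.length) {
    if (names[counter].startsWith(letter)) {
      someN[some] = names[counter];
      someT[some] = numbers[counter];
      some = some + 1;
    }
    counter = counter + 1;
  }
  return { names: someN, numbers: someT };
}

var sub = select(N, T, "C");
console.log(sub.names[0], sub.numbers[0]); // Carla 555-5555
```

Note the two separate counters: `counter` advances on every pass, while `some` advances only when an entry is copied.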

## Searching a sorted array

We have a search algorithm that works, regardless of how the names are ordered. But we can do better, since telephone directories are sorted in alphabetical order (how they get that way is a question for next time).

| `i` | `N[i]` | `T[i]` |
|---|---|---|
| 0 | Alice | 555-1234 |
| 1 | Bob | 555-9876 |
| 2 | Carla | 555-5555 |
| 3 | Dan | 555-0000 |
| 4 | Evan | 555-1234 |

So, if you are looking for "Barbara" and you see "Bob", you know the entry for "Barbara" either had to be before "Bob", or it's not in the directory! We can take advantage of this with only a very small change to the original Search algorithm, by adding the following check:

```
function SortedSearch(N, T, who) {
  var counter = 0;
  var found = false;
  // && indicates "and"
  while ((found == false) && (counter < N.length)) {
    if (who == N[counter]) {
      found = true;
    } else {
      if (N[counter] > who) {
        counter = N.length; // we are past where who would be, so give up
      } else {
        counter = counter + 1;
      }
    }
  }
  if (found == true) {
    print(T[counter]); // print out the phone number
  } else {
    print("not found");
  }
}
```

Intuitively, this captures the logical idea: "If you're searching from start to end, you haven't found the name you want, and you see a name that is after the name you want, you can give up, because the entries are in alphabetical order."
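To see the early exit at work, the sketch below also counts how many entries were examined. The returned object with fields `number` and `examined` is an assumption of this example, as is relying on JavaScript's `>` operator on strings, which compares character codes and so matches alphabetical order only when names are capitalized consistently.

```javascript
var N = ["Alice", "Bob", "Carla", "Dan", "Evan"];
var T = ["555-1234", "555-9876", "555-5555", "555-0000", "555-1234"];

// Sequential search over a sorted directory, giving up as soon as we
// see a name that comes after the one we want.
function sortedSearch(names, numbers, who) {
  var counter = 0;
  var examined = 0;
  while (counter < names.length) {
    examined = examined + 1;
    if (names[counter] === who) {
      return { number: numbers[counter], examined: examined };
    }
    if (names[counter] > who) {
      // Everything from here on is even later in the alphabet.
      return { number: "not found", examined: examined };
    }
    counter = counter + 1;
  }
  return { number: "not found", examined: examined };
}

var r = sortedSearch(N, T, "Barbara");
console.log(r.number, r.examined); // not found 2
```

The search for "Barbara" stops at "Bob" after two entries, where the unsorted search had to look at all five.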

## Binary search

The algorithm for Search works, and is easy to understand, but it takes a lot of steps! Each time we go back to Step (3), the beginning of the While loop, we've progressed one step further into the list. We might end up taking as many steps as there are names. But since the list is in alphabetical order, we can do better:

- Instead of starting at the beginning of the list, start in the middle.
- If the middle entry is the one you want, stop! Otherwise, ask yourself: is the entry I want *before* this one, or *after* it?
- If it's before, we can ignore all the entries in the second half of the directory; if it's after, we can ignore all the entries in the first half. Either way, we can ignore half the entries!

This idea leads to a very clever algorithm called "binary search".

```
function BinarySearch(N, T, who) {
  var low = 0;
  var high = N.length - 1;
  var found = false;
  var mid = 0;
  while ((found == false) && (low <= high)) {
    mid = (low + high) / 2; // discards fractions
    if (N[mid] == who) {
      found = true;
    } else {
      if (who > N[mid]) {
        low = mid + 1;
      } else {
        high = mid - 1;
      }
    }
  }
  if (found == true) {
    print(T[mid]);
  } else {
    print("not found");
  }
}
```

Now, observe a very important property of this algorithm: each time
we test a name and it does *not* match the name we are looking
for (who), we discard (approximately) *half* the remaining
values that are to be searched. This is also the same reason that "20
questions" works -- each question helps you discard about half the
possible answers.
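Here is a runnable sketch of the same idea that also counts how many entries it tests. Two details are assumptions of this example: in JavaScript the division `(low + high) / 2` keeps fractions, so `Math.floor` is used to discard them, and the result is returned as an object with the hypothetical fields `number` and `probes` rather than printed.

```javascript
var N = ["Alice", "Bob", "Carla", "Dan", "Evan"];
var T = ["555-1234", "555-9876", "555-5555", "555-0000", "555-1234"];

// Binary search over a sorted directory: test the middle entry of the
// range low..high, then discard the half that cannot contain `who`.
function binarySearch(names, numbers, who) {
  var low = 0;
  var high = names.length - 1;
  var probes = 0;
  while (low <= high) {
    var mid = Math.floor((low + high) / 2); // discard the fraction
    probes = probes + 1;
    if (names[mid] === who) {
      return { number: numbers[mid], probes: probes };
    }
    if (who > names[mid]) {
      low = mid + 1;  // who can only be in the second half
    } else {
      high = mid - 1; // who can only be in the first half
    }
  }
  return { number: "not found", probes: probes };
}

var dan = binarySearch(N, T, "Dan");
console.log(dan.number, dan.probes);   // 555-0000 2
var miss = binarySearch(N, T, "Barbara");
console.log(miss.number, miss.probes); // not found 3
```

Looking up "Dan" tests "Carla" and then "Dan"; looking up "Barbara" tests "Carla", "Alice", and "Bob" before the range becomes empty.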

How many times can we do this? If you keep dividing by 2, and
discarding fractions, you will eventually get down to 1. The number
of times you can do this is about the base-2 logarithm of the number of
entries you started with. $\log_2(x)$ is smaller than $x$ for
all values of $x > 1$. This means that, in the worst case, the number of
times you will go through the While loop in this binary search algorithm
is smaller than in the original algorithm. In fact, as the number of
entries gets bigger, the difference will become huge!
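To put numbers on that difference, the sketch below counts worst-case passes through the loop for a directory of *n* entries. It is an illustration under stated assumptions: sequential search is taken to need *n* passes when the name is absent, and the hypothetical function `binaryWorstCase` simulates a binary search whose target always lies after the middle entry, which is the larger half when fractions are discarded.

```javascript
// Worst case for sequential search: every entry is examined.
function sequentialWorstCase(n) {
  return n;
}

// Worst case for binary search: keep halving the range low..high,
// always continuing into the half after the middle entry.
function binaryWorstCase(n) {
  var low = 0;
  var high = n - 1;
  var steps = 0;
  while (low <= high) {
    var mid = Math.floor((low + high) / 2);
    steps = steps + 1;
    low = mid + 1;
  }
  return steps;
}

var sizes = [5, 1000, 1000000];
var i = 0;
while (i < sizes.length) {
  var n = sizes[i];
  console.log(n + " entries: sequential " + sequentialWorstCase(n) +
              ", binary " + binaryWorstCase(n));
  i = i + 1;
}
// 5 entries: sequential 5, binary 3
// 1000 entries: sequential 1000, binary 10
// 1000000 entries: sequential 1000000, binary 20
```

Going from a thousand entries to a million multiplies the sequential count by a thousand but adds only ten passes to the binary count.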