Concepts in Computing
CS4 - Winter 2007
Instructor: Fabio Pellacini

Lecture 7: Arrays and Searching Algorithms

Overview

  • Review: algorithms
  • Storing a set of related variables: arrays
  • Searching through arrays, whether or not their values are in sorted order
  • Other things arrays are good for

Review

Algorithm: An ordered collection of unambiguous and effectively computable operations that, when followed, produces an observable result and completes (halts) in a finite amount of time.

A couple of examples: making a sandwich, computing sums.

Instructions a computer understands:

  • Definition of an algorithm: function(arguments) { instructions }
  • Basic operations:
    • calculation: +,-,*,/
    • setting values: =
    • output: print
    • halt: return value
  • Conditional: if(condition) { instructions } else { instructions }
  • Iteration (looping): while(condition) { instructions }

The point is to define a set of simple operations that we agree are both "unambiguous" and "effectively computable", so that we can write algorithms with them.
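As a minimal sketch of how these operations combine, here is the "computing sums" example written in JavaScript. The choice of JavaScript, the function name sumTo, and the use of console.log in place of print are assumptions made for illustration; the notes themselves use an informal algorithm language.

```javascript
// Adds up the whole numbers 1, 2, ..., n (hypothetical example function).
function sumTo(n) {
    var total = 0;   // setting values
    var i = 1;

    // Conditional: nothing to add if n is less than 1.
    if (n < 1) {
        return 0;    // halt
    }

    // Iteration: repeat until every number up to n has been added.
    while (i <= n) {
        total = total + i;   // calculation
        i = i + 1;
    }

    return total;    // halt, handing back the result
}

console.log(sumTo(10));   // output: 55
```

The loop is guaranteed to halt because i grows by 1 on every pass and n never changes.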

Arrays

Suppose you're given a telephone directory, which is a list of names and corresponding phone numbers. How do you look up "Smith, John" in the directory? How do you write a program to do that?

First off, we need a way of describing a telephone directory. In the real world, it's a book, on whose pages are printed a (very long) list of names and telephone numbers. The important idea is that we need to talk about an ordered sequence of values.

A typical way to represent an ordered sequence is called an array. Like an individual value, an array is stored in the computer's memory, but the array holds multiple values. You can think of an ordinary variable as being a named "box" containing a value; an array can be thought of as a "row of boxes", containing a list of values. To refer to an array in an algorithm, we'll give it a name, just as we do for a variable. To indicate that a variable is an array, we will write var A = new array().

Since an array is ordered, we need some way to tell which value in the array is the first, which is the second, etc. To do this, we give each box in the array a unique number: 0, 1, 2, etc. To refer to some element of an array, we write the number of that element in square brackets after the array name:

A[0]--the first element of the array (the one numbered 0)
A[1]--the second element of the array (the one numbered 1)
etc.

The elements of an array are just like variables, in that they can contain values. In terms of our language of algorithms, this means we can write things like:

  • A[0] = 25
  • foo = A[7] - 15
  • if (A[1] > A[2]) ...

We'll assume that we know how many elements are in an array (i.e., how many positions are available). We can write A.length to mean "The number of elements in A".

Note: If an array has n elements, the positions are 0, 1, ..., n-1. The textbook uses positions 1, 2, ..., n (ignoring position 0).
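The array operations described above can be tried out with a short JavaScript sketch. This assumes JavaScript as the concrete language, where the array is created with new Array() and console.log plays the role of print; the values stored are made up for illustration.

```javascript
// Create an array and fill its first three "boxes".
var A = new Array();
A[0] = 25;
A[1] = 40;
A[2] = 10;

// Elements can be read and compared just like ordinary variables.
var foo = A[1] - 15;              // 40 - 15
var firstIsBigger = A[0] > A[2];  // is 25 greater than 10?

// A.length is the number of elements; valid positions run from 0 to A.length - 1.
console.log(A.length);            // 3
console.log(A[A.length - 1]);     // the last element, 10
```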

A scrambled telephone book, using arrays

With arrays available, one way we can represent our telephone directory is with two arrays:

  • var N = new array() -- an array of names (the people in the directory)
  • var T = new array() -- an array of telephone numbers

Each of these arrays has the same number of entries, and for any given entry in array N, the corresponding entry in T gives the telephone number for that person. For now, we'll assume that the directory is in some random order; we'll see later how to take advantage of alphabetical order, and next time how to put it in alphabetical order. For example:

i   N[i]    T[i]
0   Bob     555-9876
1   Evan    555-1234
2   Alice   555-1234
3   Dan     555-0000
4   Carla   555-5555

Example: greeting everyone in the phone book.
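One possible version of this example, sketched in JavaScript: march through the array of names with a counter and produce one greeting per entry. The function name greetEveryone and the wording of the greeting are assumptions for illustration.

```javascript
// Builds one greeting for each name in the directory (hypothetical helper).
function greetEveryone(N) {
    var greetings = [];
    var counter = 0;

    // Visit positions 0, 1, ..., N.length - 1 in order.
    while (counter < N.length) {
        greetings.push("Hello, " + N[counter] + "!");
        counter = counter + 1;
    }
    return greetings;
}

var N = ["Bob", "Evan", "Alice", "Dan", "Carla"];
var lines = greetEveryone(N);
for (var i = 0; i < lines.length; i = i + 1) {
    console.log(lines[i]);
}
```

The same "start a counter at 0 and stop at N.length" pattern is the backbone of the search algorithms that follow.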

Sequential search

We want to design an algorithm Search, that takes as input arrays N and T representing a telephone directory, along with a name who to look up. If who is in the directory, the algorithm should print their telephone number; otherwise, it should print "not found".

How do we do this? The basic idea is as follows:

  • Keep a counter starting at 0, which indicates which entry in the directory we will look at next.
  • Keep a variable found which is true if we have found the name we are looking for, and false if not.

Here is the pseudocode for our algorithm; the numbers label the instructions so that we can refer to them as steps.
function Search(N, T, who) {
 1.  var counter = 0;
 2.  var found = false;

     // && indicates "and"
 3.  while ((found == false) && (counter < N.length)) {
 4.    if(who == N[counter]) {
 5.      found = true;
 6.    } else {
 7.      counter = counter + 1;
       }
     }

 8.  if (found == true) {
 9.     print(T[counter]);    // print out the phone number
10.  } else {
11.     print("not found");
     }
}

Example: Using the directory given above, what happens when you Search for "Alice" in the directory?

steps  counter  found   N[counter]
1,2      0      False   "Bob"
3,4,7    1      False   "Evan"
3,4,7    2      False   "Alice"
3,4,5    2      True    "Alice"
3,8,9    2      True    "Alice"

It prints T[counter] which is T[2] which is "555-1234".

How about searching for "Barbara"?

steps  counter  found   N[counter]
1,2      0      False   "Bob"
3,4,7    1      False   "Evan"
3,4,7    2      False   "Alice"
3,4,7    3      False   "Dan"
3,4,7    4      False   "Carla"
3,4,7    5      False   (no entry)   -- counter is now equal to N.length
3,10,11

It prints "not found" because found is False.
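To check the two traces above, here is a runnable JavaScript sketch of the same sequential search. As an assumption made for easier testing, this version returns the phone number (or null when the name is absent) instead of printing it; the name search and the use of console.log are likewise illustrative choices.

```javascript
// Sequential search: returns the phone number for who, or null if not found.
function search(N, T, who) {
    var counter = 0;
    var found = false;

    // Stop as soon as the name is found or the directory is exhausted.
    while (found === false && counter < N.length) {
        if (who === N[counter]) {
            found = true;
        } else {
            counter = counter + 1;
        }
    }

    if (found === true) {
        return T[counter];
    } else {
        return null;
    }
}

var N = ["Bob", "Evan", "Alice", "Dan", "Carla"];
var T = ["555-9876", "555-1234", "555-1234", "555-0000", "555-5555"];

console.log(search(N, T, "Alice"));    // 555-1234
console.log(search(N, T, "Barbara"));  // null
```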

Other array processing algorithms

Now we know how to find a particular entry in an array. There are a lot of algorithms that have a very similar structure -- march through an array and "do something" -- find the largest element (or the smallest one), sum up the elements, count the number of elements satisfying some test, .... The book shows how to find the largest; see if you can think through the others.
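As a hedged sketch of the "march through and do something" pattern, here are three of the algorithms mentioned, written in JavaScript. The function names largest, sum, and countIf are assumptions, and largest assumes the array has at least one element.

```javascript
// Largest element: remember the best value seen so far.
function largest(A) {
    var best = A[0];          // assumes A is not empty
    var counter = 1;
    while (counter < A.length) {
        if (A[counter] > best) {
            best = A[counter];
        }
        counter = counter + 1;
    }
    return best;
}

// Sum: keep a running total.
function sum(A) {
    var total = 0;
    var counter = 0;
    while (counter < A.length) {
        total = total + A[counter];
        counter = counter + 1;
    }
    return total;
}

// Count: tally the elements for which test(element) is true.
function countIf(A, test) {
    var count = 0;
    var counter = 0;
    while (counter < A.length) {
        if (test(A[counter])) {
            count = count + 1;
        }
        counter = counter + 1;
    }
    return count;
}

var values = [3, 9, 4, 9, 1];
console.log(largest(values));                                    // 9
console.log(sum(values));                                        // 26
console.log(countIf(values, function (x) { return x > 3; }));    // 3
```

All three share the same loop; only the "do something" step in the middle changes.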

It is also possible to store values into an array, in addition to reading them out. For example, we could select a "sub-directory" of phone numbers for people whose names begin with a particular letter.

function Select(N, T, letter) {
    var SomeN = new array();
    var SomeT = new array();
    var counter = 0;
    var some = 0;

    while (counter < N.length) {
        if( startWith(N[counter],letter) ) {
            SomeN[some] = N[counter];
            SomeT[some] = T[counter];
            some = some + 1;
        }
        counter = counter + 1;
    }
    
    return (SomeN,SomeT); // "not formal" way of returning many things, see later
}

The digit-wise addition algorithm in the textbook likewise stores its results (the summed digits) into an output array.
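The Select algorithm above can be sketched in runnable JavaScript as follows. Two assumptions are made here: the built-in string method startsWith stands in for the startWith test in the pseudocode, and the two result arrays are returned together in one object with the hypothetical field names names and numbers.

```javascript
// Copies the entries whose name begins with letter into two new arrays.
function select(N, T, letter) {
    var someN = [];
    var someT = [];
    var counter = 0;   // position being read in N and T
    var some = 0;      // next free position in someN and someT

    while (counter < N.length) {
        if (N[counter].startsWith(letter)) {
            someN[some] = N[counter];
            someT[some] = T[counter];
            some = some + 1;
        }
        counter = counter + 1;
    }

    return { names: someN, numbers: someT };
}

var N = ["Bob", "Evan", "Alice", "Dan", "Carla"];
var T = ["555-9876", "555-1234", "555-1234", "555-0000", "555-5555"];
var result = select(N, T, "C");
console.log(result.names);     // [ 'Carla' ]
console.log(result.numbers);   // [ '555-5555' ]
```

Note the two separate counters: counter advances on every pass, while some advances only when an entry is copied.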

Searching a sorted array

We have a search algorithm that works, regardless of how the names are ordered. But we can do better, since telephone directories are sorted in alphabetical order (how they get that way is a question for next time).

i   N[i]    T[i]
0   Alice   555-1234
1   Bob     555-9876
2   Carla   555-5555
3   Dan     555-0000
4   Evan    555-1234

So, if you are looking for "Barbara" and you see "Bob", you know the entry for "Barbara" either had to be before "Bob", or it's not in the directory! We can take advantage of this with only a very small change to the original Search algorithm, by adding the following check:

function SortedSearch(N, T, who) {
    var counter = 0;
    var found = false;

    // && indicates "and"
    while ((found == false) && (counter < N.length)) {
       if(who == N[counter]) {
           found = true;
       } else {
           if (N[counter] > who) {
               // we have passed the place where who would be: give up
               counter = N.length;
           } else {
               counter = counter + 1;
           }
       }
    }

    if (found == true) {
        print(T[counter]);    // print out the phone number
    } else {
       print("not found");
    }
}

Intuitively, this captures the logical idea: "If you're searching from start to end, you haven't found the name you want, and you see a name that is after the name you want, you can give up, because the entries are in alphabetical order."
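Here is a runnable JavaScript sketch of this "give up early" idea. As assumptions for illustration, it returns an object (with hypothetical fields number and examined) rather than printing, so that we can see how many entries were looked at. It also relies on JavaScript's > comparison for strings, which matches alphabetical order only when the names use consistent capitalization.

```javascript
// Sequential search in a sorted directory, giving up once we pass who's spot.
function sortedSearch(N, T, who) {
    var counter = 0;
    var found = false;
    var examined = 0;   // how many entries we looked at (for illustration)

    while (found === false && counter < N.length) {
        examined = examined + 1;
        if (who === N[counter]) {
            found = true;
        } else if (N[counter] > who) {
            counter = N.length;      // every later name is even further past who
        } else {
            counter = counter + 1;
        }
    }

    return { number: found ? T[counter] : null, examined: examined };
}

var N = ["Alice", "Bob", "Carla", "Dan", "Evan"];
var T = ["555-1234", "555-9876", "555-5555", "555-0000", "555-1234"];

console.log(sortedSearch(N, T, "Barbara"));  // not found after examining 2 entries
console.log(sortedSearch(N, T, "Evan"));     // found after examining all 5 entries
```

The early exit helps when the name is missing, but a name near the end of the directory still costs about as many steps as there are entries.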

Binary search

The algorithm for Search works, and is easy to understand, but it takes a lot of steps! Each time we go back to Step (3), the beginning of the While loop, we've progressed one step further into the list. We might end up taking as many steps as there are names. But since the list is in alphabetical order, we can do better:

  • Instead of starting at the beginning of the list, start in the middle.
  • If the middle entry is the one you want, stop! Otherwise, ask yourself: is the entry I want before this one, or after it?
  • If it's before, we can ignore all the entries in the second half of the directory; if it's after, we can ignore all the entries in the first half. Either way -- we can ignore half the entries!

This idea leads to a very clever algorithm called "binary search".

function BinarySearch(N, T, who) {
    var low = 0;
    var high = N.length - 1;
    var found = false;
    var mid = 0;

    while ((found == false) && (low <= high)) {
        mid = (low+high) / 2; // discards fractions
        if (N[mid] == who) {
            found = true;
        } else {
            if (who > N[mid]) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
    }

    if (found == true) {
        print(T[mid]);
    } else {
        print("not found");
    }
}

Now, observe a very important property of this algorithm: each time we test a name and it does not match the name we are looking for (who), we discard (approximately) half the remaining values that are to be searched. This is also the same reason that "20 questions" works -- each question helps you discard about half the possible answers.
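A runnable JavaScript sketch of binary search follows. The assumptions: Math.floor is used to "discard fractions" (plain division in JavaScript keeps them), the function returns the phone number or null instead of printing, and names are compared with JavaScript's built-in string ordering, so the directory must be sorted consistently with it.

```javascript
// Binary search in a sorted directory: returns the number for who, or null.
function binarySearch(N, T, who) {
    var low = 0;
    var high = N.length - 1;
    var found = false;
    var mid = 0;

    // The entry, if present, is always somewhere in positions low..high.
    while (found === false && low <= high) {
        mid = Math.floor((low + high) / 2);   // discards fractions
        if (N[mid] === who) {
            found = true;
        } else if (who > N[mid]) {
            low = mid + 1;     // ignore the first half
        } else {
            high = mid - 1;    // ignore the second half
        }
    }

    return found ? T[mid] : null;
}

var N = ["Alice", "Bob", "Carla", "Dan", "Evan"];
var T = ["555-1234", "555-9876", "555-5555", "555-0000", "555-1234"];

console.log(binarySearch(N, T, "Dan"));      // 555-0000
console.log(binarySearch(N, T, "Barbara"));  // null
```

For "Barbara" the loop looks at Carla, then Alice, then Bob, at which point low is greater than high and the search stops.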

How many times can we do this? If you keep dividing by 2 and discarding fractions, you will eventually get down to 1. The number of times you can do this is roughly the base-2 logarithm of the number of entries you started with. log2(x) is smaller than x for all values of x > 1, so in the worst case binary search goes through the While loop fewer times than the original algorithm does. In fact, as the number of entries gets bigger, the difference becomes huge: at most 10 passes for 1,000 entries, and at most 20 for 1,000,000.
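The "keep dividing by 2" argument can be checked directly. This JavaScript sketch (the function name halvings is an assumption) counts how many times a number can be halved, discarding fractions, before it reaches 1.

```javascript
// Counts the halvings (discarding fractions) needed to bring n down to 1.
function halvings(n) {
    var count = 0;
    while (n > 1) {
        n = Math.floor(n / 2);
        count = count + 1;
    }
    return count;
}

console.log(halvings(10));        // 3   (10, 5, 2, 1)
console.log(halvings(100));       // 6
console.log(halvings(1000));      // 9
console.log(halvings(1000000));   // 19
```

Going from 1,000 to 1,000,000 entries multiplies the work of a sequential search by a thousand, but adds only 10 halvings.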

[Figure: Log 1-10]

[Figure: Log 1-100]