Concepts in Computing
CS4 - Winter 2007
Instructor: Fabio Pellacini

Lecture 18: Artificial Intelligence

Overview

Earlier this summer, there was a big celebration (AI50) of the 50th anniversary of the 1956 "Dartmouth Summer Research Project on Artificial Intelligence". The term "artificial intelligence" was coined for that 1956 meeting, and the workshop brought together the founders of the field, to lay out research agendas for the creation of machines that display human-like intelligence.

What is intelligence? Is it conceivable that a sufficiently complex computer could be made that would be capable of human-like thought? What (if anything) would make you say that a machine is intelligent? (What makes you think that people are intelligent?)

Turing Test

Perhaps it's easier to define intelligence "operationally" -- what are some tasks that require intelligence? That was the approach taken by Alan Turing in his 1950 paper, "Computing Machinery and Intelligence", published in the journal Mind.

Turing started with a thought experiment: "the imitation game". The first player (A) is male, the second player (B) is female, and the third player (C) is the tester, trying to determine which is male and which is female. (Assume they communicate through a chat program or something similar, so that physical appearance and handwriting are not considered.) C is allowed to ask any questions of the two players. A's goal is to fool C (and thus A gives misleading information), while B's goal is to help C (and thus B gives useful information).

Now, suppose we replace player A in this game with a computer, and ask C to decide which is a human and which is a computer. Turing proposed that this is how we will answer the question of whether machines can think. This is often referred to as a "Turing Test".

There is actually a cash prize for someone who can write a program that "passes" a Turing Test. One example of a contender: ALICE. There are other interesting Turing test variants; e.g., CAPTCHAs (the little puzzles you have to solve to sign up for an account). Chess playing was perhaps seen as a type of Turing test (at least until Deep Blue). How about driving a car? David Cope did some tests to see if people could determine which pieces were composed by his program vs. by human composers (and his composer friends were upset when their compositions were judged to be the machine-generated ones).

Eliza and friends

Joseph Weizenbaum developed Eliza (named after Eliza Doolittle, the character from "My Fair Lady" and "Pygmalion") in 1966. Eliza imitates a classical Rogerian psychotherapist, who acts like a mirror. Weizenbaum meant Eliza as a parody, and was disappointed to find people interacting with Eliza in earnest.

Eliza (and lots of her chatterbot descendants) is based on a very simple "pattern matching" approach (e.g., see Peter Norvig's implementation):

You say                Eliza responds
... I remember [y].    Do you often think of [y]?
                       Does thinking of [y] bring anything else to mind?
                       Why do you recall [y] right now?
... are you [y]?       Why are you interested in whether I am [y] or not?
                       Would you prefer if I weren't [y]?
                       Perhaps I am [y] in your fantasies.

There is some basic processing to change "you" to "I", get the tense right, etc. This approach can generate some psychotherapist-like conversations.
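The pattern-matching idea can be sketched in a few lines of Python. This is only a minimal illustration, not Weizenbaum's or Norvig's actual code: the names RULES, SWAPS, swap_person, and respond are made up for this example, the rule table holds just the two patterns shown above, and the "you"/"I" swapping is deliberately crude.

```python
import random
import re

# Hypothetical rule table: each entry pairs a pattern with possible replies.
# The captured group plays the role of [y] in the table above.
RULES = [
    (re.compile(r".*\bI remember (.+)", re.IGNORECASE),
     ["Do you often think of {y}?",
      "Does thinking of {y} bring anything else to mind?",
      "Why do you recall {y} right now?"]),
    (re.compile(r".*\bare you (.+)", re.IGNORECASE),
     ["Why are you interested in whether I am {y} or not?",
      "Would you prefer if I weren't {y}?",
      "Perhaps I am {y} in your fantasies."]),
]

# Crude first/second person swap (an assumption; real Eliza does more).
SWAPS = {"i": "you", "me": "you", "my": "your",
         "you": "I", "your": "my", "am": "are"}


def swap_person(text):
    """Replace each word that has an entry in SWAPS; keep the rest."""
    return " ".join(SWAPS.get(word.lower(), word) for word in text.split())


def respond(sentence, choose=random.choice):
    """Return a reply built from the first rule whose pattern matches."""
    cleaned = sentence.strip().rstrip(".!?")
    for pattern, templates in RULES:
        match = pattern.match(cleaned)
        if match:
            return choose(templates).format(y=swap_person(match.group(1)))
    return "Please go on."  # fallback when no pattern applies


if __name__ == "__main__":
    print(respond("I remember my first computer."))
    print(respond("Are you a machine?"))
```

The choose parameter exists only so that the random choice can be replaced with a fixed one when experimenting.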

Modern-day natural language processing, machine translation, etc. work quite differently from this. (In fact, they're closer to the spam filter idea discussed below.) The focus on simply trying to converse seems too directed toward a surface-level interpretation of the Turing test (can we fool someone) and not the intent (in order to reasonably converse, the machine must have broad knowledge, must learn, must interpret sentences, etc.).

Game Playing

At the heart of computer game playing algorithms is the notion of game-tree search. The basic idea is to look at all available moves and see which one is "best". But which one is best depends upon how your opponent will respond. So look ahead -- if you do this, then your opponent could do that, but then you could do this. This generates a tree of possible moves, with paths leading to various board configurations. Example with TicTacToe:
[Figure: TicTacToe game tree]

With TicTacToe, this isn't too bad -- how many possible games are there? At most 9 * 8 * ... * 2 * 1 = 362,880 sequences of moves (fewer in practice, since many games end before the board is full). On a fast computer, searching all of them takes only a few seconds.
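A full game-tree search for TicTacToe can be sketched as follows. This is a minimal sketch of the standard minimax idea (X picks the move with the highest value, O the lowest), under assumptions chosen for this example: the board is a 9-character string read row by row, and the names winner, minimax, best_move, and count_games are illustrative.

```python
from functools import lru_cache

# The eight winning lines, as index triples into the 9-character board string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play(board, i, player):
    """Return the board that results when player marks cell i."""
    return board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def minimax(board, player):
    """Value of the board for X (+1 win, 0 draw, -1 loss), player to move."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    other = "O" if player == "X" else "X"
    values = [minimax(play(board, i, player), other)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if player == "X" else min(values)


def best_move(board, player):
    """Pick the empty cell whose resulting board is best for player."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]

    def score(i):
        return minimax(play(board, i, player), other)

    return max(moves, key=score) if player == "X" else min(moves, key=score)


def count_games(board=" " * 9, player="X"):
    """Count complete games (paths from this board to a finished board)."""
    if winner(board) is not None or " " not in board:
        return 1
    other = "O" if player == "X" else "X"
    return sum(count_games(play(board, i, player), other)
               for i, cell in enumerate(board) if cell == " ")


if __name__ == "__main__":
    print(minimax(" " * 9, "X"))       # 0: perfect play ends in a draw
    print(best_move("XX OO    ", "X"))  # 2: completes the top row
    print(count_games())                # 255168 complete games
```

The count of complete games comes out below 9 * 8 * ... * 1 because many games end before the board fills up.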

This approach applies to other games as well (e.g., chess), but chess is a much "bigger" game -- there are many more possible moves at any given moment in time. There are also other complications, such as castling, queening, etc., and repeated states (not possible with TTT). Chess is estimated to have approximately 10^50 possible configurations! In general, if there are m possible moves per turn and t turns per game, then there are approximately m^t possible configurations (the number will be less, since the possible moves get more restricted, etc.).

So, we can't search through the whole tree of a chess game. Instead, we search as far as possible in the time allowed, then take the path that looks best. We can evaluate any given board for quality, using the same kind of ideas that chess experts do (what's the count of number/type of pieces, how well are the pieces defended, etc.). IBM's Deep Blue, a special-purpose, highly parallel chess-playing computer, evaluated about 200 million positions per second. We can also incorporate "expert advice" -- opening moves and endgames, etc.
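The "search as far as possible, then evaluate" idea can be sketched with a depth limit. To keep the example small and runnable, it uses a toy game instead of chess (an assumption made purely for illustration): players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins. The function names and the trivial evaluation function are hypothetical; in chess, the evaluation would be something like a weighted piece count.

```python
def depth_limited(state, depth, moves, apply_move, evaluate):
    """Look `depth` moves ahead; return (score, move) for the player to move.

    Scores are from the point of view of the player to move, so a child's
    score is negated when passed back up (the "negamax" form of minimax).
    """
    legal = moves(state)
    if depth == 0 or not legal:
        # Out of time (depth) or game over: fall back on the board evaluation.
        return evaluate(state), None
    best_score, best_move = float("-inf"), None
    for move in legal:
        child_score, _ = depth_limited(apply_move(state, move), depth - 1,
                                       moves, apply_move, evaluate)
        score = -child_score
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move


# Toy game (assumption for illustration): state is the number of stones left.
def nim_moves(stones):
    return [n for n in (1, 2, 3) if n <= stones]


def nim_apply(stones, taken):
    return stones - taken


def nim_evaluate(stones):
    # No stones left means the player to move has already lost.
    # Anything else is scored 0: "can't tell yet" at the search horizon.
    return -1 if stones == 0 else 0


if __name__ == "__main__":
    # With 5 stones and a 3-move lookahead, taking 1 stone is a forced win.
    print(depth_limited(5, 3, nim_moves, nim_apply, nim_evaluate))  # (1, 1)
    # With only a 1-move lookahead from 4 stones, the search sees nothing bad.
    print(depth_limited(4, 1, nim_moves, nim_apply, nim_evaluate))  # (0, 1)
```

The second call shows the cost of a shallow search: 4 stones is actually a lost position, but a one-move lookahead with this evaluation cannot see it. A better evaluation function or a deeper search would.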

Spam filtering

We often think of learning as a hallmark of intelligence. Computers learn in a variety of applications, though (e.g., handwriting and voice recognition). The clustering algorithms that we talked about last time are in fact a form of learning (called "unsupervised") -- they're trying to learn groups of similar items. More commonly, "supervised" learning is given some training data for which the right answer is given, and tries to learn what are the important features in the training data.

Let's consider supervised learning for spam filtering (see a nice essay by Paul Graham). As we discussed last time, a document can be represented as an array of word counts. Let's assume we start off with a whole set of good emails and a whole set of spam emails. How do the word counts differ between them (i.e., what words tend to show up in spam and not in good email)? (Interestingly, Graham notes that his program discovered that the "word" "ff0000", the HTML code for bright red, tends to be highly indicative of spam.) Now, given a new email, do its word counts look more like those of spam or good email?
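One way to turn "do its word counts look more like spam or good email?" into a number is sketched below. This is a simple naive-Bayes-style score with add-one smoothing, in the spirit of the approach but not Graham's exact formula. The function names and the tiny training sets are invented for illustration, and the sketch assumes spam and good email are equally likely beforehand.

```python
import math
from collections import Counter


def word_counts(text):
    """Represent a document as counts of its lowercase words."""
    return Counter(text.lower().split())


def train(emails):
    """Add up the word counts over a whole set of training emails."""
    total = Counter()
    for email in emails:
        total.update(word_counts(email))
    return total


def spam_score(email, spam_counts, good_counts):
    """Sum of log-ratios; positive means the words look more like spam."""
    vocab = set(spam_counts) | set(good_counts)
    # Add-one smoothing, so a word seen in only one set doesn't give 0.
    spam_total = sum(spam_counts.values()) + len(vocab)
    good_total = sum(good_counts.values()) + len(vocab)
    score = 0.0
    for word, n in word_counts(email).items():
        if word not in vocab:
            continue  # never seen in training: no evidence either way
        p_spam = (spam_counts[word] + 1) / spam_total
        p_good = (good_counts[word] + 1) / good_total
        score += n * math.log(p_spam / p_good)
    return score


if __name__ == "__main__":
    # Made-up training data, far too small for real use.
    spam = train(["buy cheap pills now", "cheap pills cheap offer"])
    good = train(["lecture notes for class", "meeting notes for monday"])
    print(spam_score("cheap pills", spam, good) > 0)  # True: looks like spam
    print(spam_score("class notes", spam, good) < 0)  # True: looks like good
```

A real filter would train on thousands of messages and would also use tokens from headers and HTML, which is how something like "ff0000" can end up as a feature.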

The same basic idea -- statistical learning -- works when the training data are samples of handwriting (here are some examples of the way I write the letter "A", or a bunch of zip codes written by lots of different people); speech (say each of the following words 5 times); music (the entire corpus of Beethoven's sonatas); translation (parallel texts in English and French); etc. Identify a set of "features", and let the algorithms determine what's common about sets of examples and distinguishes them from others.