Lecture 1A | MIT 6.001 Structure and Interpretation, 1986

[MUSIC PLAYING]
PROFESSOR: I'd like to welcome you to this course on computer science. Actually, that's a terrible way to start. Computer science is a terrible name for this business. First of all, it's not a science. It might be engineering or it might be art, but we'll actually see that computer so-called science actually has a lot in common with magic, and we'll see that in this course. So it's not a science. It's also not really very much about computers.

And it's not about computers in the same sense that physics is not really about particle accelerators, and biology is not really about microscopes and petri dishes. And it's not about computers in the same sense that geometry is not really about using surveying instruments.

In fact, there's a lot of commonality between computer science and geometry. Geometry, first of all, is another subject with a lousy name. The name comes from Gaia, meaning the Earth, and metron, meaning to measure. Geometry originally meant measuring the Earth or surveying. And the reason for that was that, thousands of years ago, the Egyptian priesthood developed the rudiments of geometry in order to figure out how to restore the boundaries of fields that were destroyed in the annual flooding of the Nile. And to the Egyptians who did that, geometry really was the use of surveying instruments.

Now, the reason that we think computer science is about computers is pretty much the same reason that the Egyptians thought geometry was about surveying instruments. And that is, when some field is just getting started and you don't really understand it very well, it's very easy to confuse the essence of what you're doing with the tools that you use. And indeed, on some absolute scale of things, we probably know less about the essence of computer science than the ancient Egyptians really knew about geometry. Well, what do I mean by the essence of computer science? What do I mean by the essence of geometry?
See, it's certainly true that these Egyptians went off and used surveying instruments, but when we look back on them after a couple of thousand years, we say, gee, what they were doing, the important stuff they were doing, was to begin to formalize notions about space and time, to start a way of talking about mathematical truths formally. That led to the axiomatic method. That led to sort of all of modern mathematics, figuring out a way to talk precisely about so-called declarative knowledge, what is true.

Well, similarly, I think in the future people will look back and say, yes, those primitives in the 20th century were fiddling around with these gadgets called computers, but really what they were doing is starting to learn how to formalize intuitions about process, how to do things, starting to develop a way to talk precisely about how-to knowledge, as opposed to geometry that talks about what is true.
Let me give you an example of that. Let's take a look. Here is a piece of mathematics that says what a square root is. The square root of X is the number Y, such that Y squared is equal to X and Y is greater than 0. Now, that's a fine piece of mathematics, but just telling you what a square root is doesn't really say anything about how you might go out and find one.

So let's contrast that with a piece of imperative knowledge, how you might go out and find a square root. This, in fact, also comes from Egypt, not ancient, ancient Egypt. This is an algorithm due to Heron of Alexandria, called how to find a square root by successive averaging. And what it says is that, in order to find a square root, you make a guess, you improve that guess-- and the way you improve the guess is to average the guess and X over the guess, and we'll talk a little bit later about why that's a reasonable thing-- and you keep improving the guess until it's good enough. That's a method. That's how to do something as opposed to declarative knowledge that says what you're looking for. That's a process.
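In the Scheme dialect of Lisp used in this course, Heron's successive-averaging method can be sketched like this. The procedure names and the 0.001 tolerance are illustrative assumptions, not part of the method itself:

```scheme
; Heron's method: guess, then improve the guess by averaging it
; with x/guess, until the guess is good enough.
(define (average a b) (/ (+ a b) 2))

(define (improve guess x)          ; average the guess with x over the guess
  (average guess (/ x guess)))

(define (good-enough? guess x)     ; "good enough": the square is close to x
  (< (abs (- (* guess guess) x)) 0.001))

(define (try guess x)
  (if (good-enough? guess x)
      guess
      (try (improve guess x) x)))

(define (sqrt x) (try 1.0 x))      ; start from an initial guess of 1
```

So `(sqrt 36)` repeatedly averages its way toward 6, which is exactly the guess-improve-repeat process just described.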
Well, what's a process in general? It's kind of hard to say. You can think of it as like a magical spirit that sort of lives in the computer and does something. And the thing that directs a process is a pattern of rules called a procedure. So procedures are the spells, if you like, that control these magical spirits that are the processes.

I guess you know everyone needs a magical language, and sorcerers, real sorcerers, use ancient Akkadian or Sumerian or Babylonian or whatever. We're going to conjure our spirits in a magical language called Lisp, which is a language designed for talking about, for casting the spells that are procedures to direct the processes.

Now, it's very easy to learn Lisp. In fact, in a few minutes, I'm going to teach you, essentially, all of Lisp. I'm going to teach you, essentially, all of the rules. And you shouldn't find that particularly surprising. That's sort of like saying it's very easy to learn the rules of chess. And indeed, in a few minutes, you can tell somebody the rules of chess. But of course, that's very different from saying you understand the implications of those rules and how to use those rules to become a masterful chess player. Well, Lisp is the same way. We're going to state the rules in a few minutes, and it'll be very easy to see. But what's really hard is going to be the implications of those rules, how you exploit those rules to be a master programmer. And the implications of those rules are going to take us the, well, the whole rest of the subject and, of course, way beyond.

OK, so in computer science, we're in the business of formalizing this sort of how-to imperative knowledge, how to do stuff. And the real issues of computer science are, of course, not telling people how to do square roots. Because if that was all it was, it wouldn't be any big deal. The real problems come when we try to build very, very large systems, computer programs that are thousands of pages long, so long that nobody can really hold them in their heads all at once.
And the only reason that that's possible is because there are techniques for controlling the complexity of these large systems. And these techniques for controlling complexity are what this course is really about. And in some sense, that's really what computer science is about.

Now, that may seem like a very strange thing to say. Because after all, a lot of people besides computer scientists deal with controlling complexity. A large airliner is an extremely complex system, and the aeronautical engineers who design that are dealing with immense complexity. But there's a difference between that kind of complexity and what we deal with in computer science. And that is that computer science, in some sense, isn't real.

You see, when an engineer is designing a physical system, that's made out of real parts. The engineers who worry about that have to address problems of tolerance and approximation and noise in the system. So for example, as an electrical engineer, I can go off and easily build a one-stage amplifier or a two-stage amplifier, and I can imagine cascading a lot of them to build a million-stage amplifier. But it's ridiculous to build such a thing, because long before the millionth stage, the thermal noise in those components way at the beginning is going to get amplified and make the whole thing meaningless.

Computer science deals with idealized components. We know as much as we want about these little program and data pieces that we're fitting together. We don't have to worry about tolerance. And that means that, in building a large program, there's not all that much difference between what I can build and what I can imagine, because the parts are these abstract entities that I know as much as I want about. I know about them as precisely as I'd like. So as opposed to other kinds of engineering, where the constraints on what you can build are the constraints of physical systems, the constraints of physics and noise and approximation, the constraints imposed in building large software systems are the limitations of our own minds. So in that sense, computer science is like an abstract form of engineering. It's the kind of engineering where you ignore the constraints that are imposed by reality.
Well, what are some of these techniques? They're not special to computer science. The first technique, which is used in all of engineering, is a kind of abstraction called black-box abstraction. Take something and build a box about it. Let's see, for example, if we looked at that square root method, I might want to take that and build a box. That sort of says, to find the square root of X. And that might be a whole complicated set of rules. And that might end up being a kind of thing where I can put in, say, 36 and say, what's the square root of 36? And out comes six.

And the important thing is that I'd like to design that so that if George comes along and would like to compute, say, the square root of A plus the square root of B, he can take this thing and use it as a module without having to look inside and build something that looks like this, like an A and a B and a square root box and another square root box and then something that adds, that would put out the answer.
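In Lisp notation, George's program is just a combination that treats the square-root box as a module; here `a` and `b` are assumed to already be defined:

```scheme
; George's point of view: use the sqrt box without looking inside it.
(+ (sqrt a) (sqrt b))
```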
And you can see, just from the fact that I want to do that, that from George's point of view, the internals of what's in here should not be important. So for instance, it shouldn't matter that, when I wrote this, I said I want to find the square root of X. I could have said the square root of Y, or the square root of A, or anything at all. That's the fundamental notion of putting something in a box, using black-box abstraction to suppress detail. And the reason for that is you want to go off and build bigger boxes.

Now, there's another reason for doing black-box abstraction other than that you want to suppress detail for building bigger boxes. Sometimes you want to say that your way of doing something, your how-to method, is an instance of a more general thing, and you'd like your language to be able to express that generality. Let me show you another example, sticking with square roots. Let's go back and take another look at that slide with the square root algorithm on it. Remember what that says. That says, in order to do something, I make a guess, and I improve that guess, and I sort of keep improving that guess. So there's the general strategy of, I'm looking for something, and the way I find it is that I keep improving it. Now, that's a particular case of another kind of strategy for finding a fixed point of something.

So you have a fixed point of a function. A fixed point of a function F is a value Y, such that F of Y equals Y. And the way I might find one is to start with a guess. And then, if I want something that doesn't change when I keep applying F, I'll keep applying F over and over until the result doesn't change very much. So there's a general strategy. And then, for example, to compute the square root of X, I can try and find a fixed point of the function which takes Y to the average of Y and X/Y. And the idea there is that if I really had Y equal to the square root of X, then Y and X/Y would be the same value. They'd both be the square root of X, because X over the square root of X is the square root of X. And so if Y were equal to the square root of X, then the average wouldn't change. So the square root of X is a fixed point of that particular function.
Now, what I'd like to have, I'd like to express the general strategy for finding fixed points. So what I might imagine doing is to be able to use my language to define a box that says "fixed point," just like I could make a box that says "square root." And I'd like to be able to express this in my language. So I'd like to express not only the imperative how-to knowledge of a particular thing like square root, but I'd like to be able to express the imperative knowledge of how to do a general thing like how to find a fixed point.

And in fact, let's go back and look at that slide again. See, not only is this a piece of imperative knowledge, how to find a fixed point, but over here on the bottom, there's another piece of imperative knowledge which says, one way to compute square roots is to apply this general fixed point method. So I'd like to also be able to express that imperative knowledge. What would that look like? That would say, this fixed point box is such that if I input to it the function that takes Y to the average of Y and X/Y, then what should come out of that fixed point box is a method for finding square roots.
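A hedged sketch of what that fixed-point box and its square-root use might look like in Scheme; the internal names and the `close-enough?` tolerance are illustrative assumptions:

```scheme
; General strategy: to find a fixed point of f, keep applying f
; to the guess until the result stops changing very much.
(define (fixed-point f start)
  (define (close-enough? a b) (< (abs (- a b)) 0.0001))
  (define (iterate old new)
    (if (close-enough? old new)
        new
        (iterate new (f new))))
  (iterate start (f start)))

; Square root as a fixed point of the function y -> average of y and x/y.
(define (sqrt x)
  (fixed-point (lambda (y) (/ (+ y (/ x y)) 2))
               1.0))
```

Note that the input to the fixed-point box is itself a function, written with `lambda`: exactly the idea of feeding a function into a box and getting a method back out.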
So in these boxes we're building, we're not only building boxes that you input numbers to and that output numbers, we're going to be building boxes that, in effect, compute methods, like a method for finding square roots. And what they take as their inputs are functions, like Y goes to the average of Y and X/Y. The reason we want to do that-- and this, as we'll see, will end up being a procedure whose value is another procedure-- the reason we want to do that is because procedures are going to be our ways of talking about imperative knowledge. And the way to make that very powerful is to be able to talk about other kinds of knowledge. So here is a procedure that, in effect, talks about another procedure, a general strategy that itself talks about general strategies.
Well, our first topic in this course-- there'll be three major topics-- will be black-box abstraction. Let's look at that in a little bit more detail. What we're going to do is we will start out talking about how Lisp is built up out of primitive objects. What does the language supply us with? And we'll see that there are primitive procedures and primitive data. Then we're going to see, how do you take those primitives and combine them to make more complicated things-- means of combination? And what we'll see is that there are ways of putting things together, putting primitive procedures together to make more complicated procedures. And we'll see how to put primitive data together to make compound data.

Then we'll say, well, having made those compound things, how do you abstract them? How do you put those black boxes around them so you can use them as components in more complex things? And we'll see that's done by defining procedures and by a technique for dealing with compound data called data abstraction. And then, what's maybe the most important thing, is going from just the rules to how does an expert work? How do you express common patterns of doing things, like saying, well, there's a general method of fixed point, and square root is a particular case of that? And we're going to use-- I've already hinted at it-- something called higher-order procedures, namely procedures whose inputs and outputs are themselves procedures. And then we'll also see something very interesting. We'll see, as we go further and further on and become more abstract, the line between what we consider to be data and what we consider to be procedures is going to blur at an incredible rate.
Well, that's our first subject, black-box abstraction. Let's look at the second topic. I can introduce it like this. See, suppose I want to express the idea-- remember, we're talking about ideas-- suppose I want to express the idea that I can take something and multiply it by the sum of two other things. So for example, I might say, if I had one and three and multiply that by two, I get eight. But I'm talking about the general idea of what's called linear combination, that you can add two things and multiply them by something else. It's very easy when I think about it for numbers, but suppose I also want to use that same idea to think about, I could add two vectors, a1 and a2, and then scale them by some factor x and get another vector. Or I might say, I want to think about a1 and a2 as being polynomials, and I might want to add those two polynomials and then multiply them by two to get a more complicated one. Or a1 and a2 might be electrical signals, and I might want to think about summing those two electrical signals and then putting the whole thing through an amplifier, multiplying it by some factor of two or something. The idea is I want to think about the general notion of that.

Now, if our language is going to be a good language for expressing those kinds of general ideas, if I really, really can do that, I'd like to be able to say I'm going to multiply by x the sum of a1 and a2, and I'd like that to express the general idea of all the different kinds of things that a1 and a2 could be. Now, if you think about that, there's a problem, because after all, the actual primitive operations that go on in the machine are obviously going to be different if I'm adding two numbers than if I'm adding two polynomials, or if I'm adding the representations of two electrical signals or waveforms. Somewhere, there has to be the knowledge of the kinds of various things that you can add and the ways of adding them. Now, to construct such a system, the question is, where do I put that knowledge? How do I think about the different kinds of choices I have? And if tomorrow George comes up with a new kind of object that might be added and multiplied, how do I add George's new object to the system without screwing up everything that was already there?
Well, that's going to be the second big topic, the way of controlling that kind of complexity. And the way you do that is by establishing conventional interfaces, agreed-upon ways of plugging things together. Just like in electrical engineering, people have standard impedances for connectors, and then you know if you build something with one of those standard impedances, you can plug it together with something else.

So that's going to be our second large topic, conventional interfaces. What we're going to see is, first, we're going to talk about the problem of generic operations, which is the one I alluded to, things like "plus" that have to work with all different kinds of data. So we talk about generic operations. Then we're going to talk about really large-scale structures. How do you put together very large programs that model the kinds of complex systems in the real world that you'd like to model? And what we're going to see is that there are two very important metaphors for putting together such systems. One is called object-oriented programming, where you sort of think of your system as a kind of society full of little things that interact by sending information between them. And then the second one is operations on aggregates, called streams, where you think of a large system put together kind of like a signal processing engineer puts together a large electrical system. That's going to be our second topic.
Now, the third thing we're going to come to, the third basic technique for controlling complexity, is making new languages. Because sometimes, when you're sort of overwhelmed by the complexity of a design, the way that you control that complexity is to pick a new design language. And the purpose of the new design language will be to highlight different aspects of the system. It will suppress some kinds of details and emphasize other kinds of details.

This is going to be the most magical part of the course. We're going to start out by actually looking at the technology for building new computer languages. The first thing we're going to do is actually build in Lisp. We're going to express in Lisp the process of interpreting Lisp itself. And that's going to be a very sort of self-circular thing. There's a little mystical symbol that has to do with that. The process of interpreting Lisp is sort of a giant wheel of two processes, apply and eval, which sort of constantly reduce expressions to each other. Then we're going to see all sorts of other magical things. Here's another magical symbol. This is sort of the Y operator, which is, in some sense, the expression of infinity inside our procedural language. We'll take a look at that.

In any case, this section of the course is called Metalinguistic Abstraction, abstracting by talking about how you construct new languages. As I said, we're going to start out by looking at the process of interpretation. We're going to look at this apply-eval loop, and build Lisp. Then, just to show you that this is very general, we're going to use exactly the same technology to build a very different kind of language, a so-called logic programming language, where you don't really talk about procedures at all that have inputs and outputs. What you do is talk about relations between things. And then finally, we're going to talk about how you implement these things very concretely on the very simplest kind of machines. We'll see something like this. This is a picture of a chip, which is the Lisp interpreter that we will be talking about then in hardware.

Well, there's an outline of the course, three big topics: black-box abstraction, conventional interfaces, metalinguistic abstraction. Now, let's take a break now and then we'll get started.

[MUSIC PLAYING]
Let's actually start in learning Lisp now. Actually, we'll start out by learning something much more important, maybe the very most important thing in this course, which is not Lisp, in particular, of course, but rather a general framework for thinking about languages that I already alluded to. When somebody tells you they're going to show you a language, what you should say is, what I'd like you to tell me is, what are the primitive elements? What does the language come with? Then, what are the ways you put those together? What are the means of combination? What are the things that allow you to take these primitive elements and build bigger things out of them? What are the ways of putting things together? And then, what are the means of abstraction? How do we take those complicated things and draw those boxes around them? How do we name them so that we can now use them as if they were primitive elements in making still more complex things? And so on, and so on, and so on.

So when someone says to you, gee, I have a great new computer language, you don't say, how many characters does it take to invert a matrix? It's irrelevant. What you say is, if the language did not come with matrices built in or with something else built in, how could I then build that thing? What are the means of combination which would allow me to do that? And then, what are the means of abstraction which allow me then to use those as elements in making more complicated things yet?

Well, we're going to see that Lisp has some primitive data and some primitive procedures.
In fact, let's really start. And here's a piece of primitive data in Lisp, the number three. Actually, if I'm being very pedantic, that's not the number three. That's some symbol that represents Plato's concept of the number three. And here's another. Here's some more primitive data in Lisp, 17.4. Or actually, some representation of 17.4. And here's another one, five. Here's another primitive object that's built into Lisp, addition. Actually, to use the same kind of pedantic language-- this is a name for the primitive method of adding things. Just like this is a name for Plato's number three, this is a name for Plato's concept of how you add things. So those are some primitive elements.

I can put them together. I can say, gee, what's the sum of three and 17.4 and five? And the way I do that is to say, let's apply the sum operator to these three numbers. And I should get, what? 25.4. So I should be able to ask Lisp what the value of this is, and it will return 25.4.
Let's introduce some names. This thing that I typed is called a combination. And a combination consists, in general, of applying an operator-- so this is an operator-- to some operands. These are the operands. And of course, I can make more complex things. The reason I can get complexity out of this is because the operands themselves, in general, can be combinations. So for instance, I could say, what is the sum of three and the product of five and six and eight and two? And I should get-- let's see-- 30, 40, 43. So Lisp should tell me that that's 43.
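Typed into a Lisp interpreter, that combination looks like this:

```scheme
(+ 3 (* 5 6) 8 2)    ; the second operand, (* 5 6), is itself a combination
; Lisp returns 43
```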
Forming combinations is the basic means of combination that we'll be looking at. And then, well, you see some syntax here. Lisp uses what's called prefix notation, which means that the operator is written to the left of the operands. It's just a convention. And notice, it's fully parenthesized. And the parentheses make it completely unambiguous. So by looking at this, I can see that there's the operator, and there are one, two, three, four operands. And I can see that the second operand here is itself some combination that has one operator and two operands.

Parentheses in Lisp are a little bit, or are very, unlike parentheses in conventional mathematics. In mathematics, we sort of use them to mean grouping, and it sort of doesn't hurt if sometimes you leave out parentheses if people understand that that's a group. And in general, it doesn't hurt if you put in extra parentheses, because that maybe makes the grouping more distinct. Lisp is not like that. In Lisp, you cannot leave out parentheses, and you cannot put in extra parentheses, because putting in parentheses always means, exactly and precisely, this is a combination which has meaning, applying operators to operands. And if I left those parentheses out, it would mean something else.

In fact, the way to think about this is that really what I'm doing when I write something like this is writing a tree. So this combination is a tree that has a plus and then a three and then a something else and an eight and a two. And then this something else here is itself a little subtree that has a star and a five and a six. And the way to think of that is, really, what's going on is we're writing these trees, and parentheses are just a way to write this two-dimensional structure as a linear character string. Because at least when Lisp first started and people had teletypes or punch cards or whatever, this was more convenient. Maybe if Lisp started today, the syntax of Lisp would look like that.
Well, let's look at what that actually looks like on the computer. Here I have a Lisp interaction set up. There's an editor. And on the top, I'm going to type some values and ask Lisp what they are. So for instance, I can say to Lisp, what's the value of that symbol? That's three. And I ask Lisp to evaluate it. And there you see Lisp has returned on the bottom, and said, oh yeah, that's three. Or I can say, what's the sum of three and four and eight? What's that combination? And ask Lisp to evaluate it. That's 15.

Or I can type in something more complicated. I can say, what's the sum of the product of three and the sum of seven and 19.5? And you'll notice here that Lisp has something built in that helps me keep track of all these parentheses. Watch as I type the next close parenthesis, which is going to close the combination starting with the star. The opening one will flash. Here, I'll rub those out and do it again. Type close, and you see that closes the plus. Close again, that closes the star. Now I'm back to the sum, and maybe I'm going to add that all to four. That closes the plus. Now I have a complete combination, and I can ask Lisp for the value of that. That kind of paren balancing is something that's built into a lot of Lisp systems to help you keep track, because it is kind of hard just by hand doing all these parentheses.
There's another kind of convention for keeping track of parentheses. Let me write another complicated combination. Let's take the sum of the product of three and five and add that to something. And now what I'm going to do is I'm going to indent so that the operands are written vertically: the sum of that and the product of 47 and-- let's say the product of 47 with the difference of 20 and 6.8. That means subtract 6.8 from 20. And then you see the parentheses close. Close the minus. Close the star. And now let's get another operator. You see the Lisp editor here is indenting to the right position automatically to help me keep track. I'll do that again. I'll close that last parenthesis again. You see it balances the plus. Now I can say, what's the value of that?

So those two things, indenting to the right level, which is called pretty printing, and flashing parentheses, are two things that a lot of Lisp systems have built in to help you keep track. And you should learn how to use them.
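The combination just typed, written with the operands of each operator aligned vertically, looks like this when pretty-printed:

```scheme
(+ (* 3 5)
   (* 47
      (- 20 6.8)))
; Lisp returns 635.4
```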
Well, those are the primitives. There's a means of combination. Now let's go up to the means of abstraction. I'd like to be able to take the idea that I do some combination like this, and abstract it and give it a simple name, so I can use that as an element. And I do that in Lisp with "define." So I can say, for example, define A to be the product of five and five. And now I could say, for example, to Lisp, what is the product of A and A? And this should be 25, and this should be 625. And then, the crucial thing: I can now use A-- here I've used it in a combination-- but I could use that in other more complicated things that I name in turn. So I could say, define B to be the sum of, we'll say, A and the product of five and A. And then close the plus.

Let's take a look at that on the computer and see how that looks. So I'll just type what I wrote on the board. I can say, define A to be the product of five and five. And I'll tell that to Lisp. And notice what Lisp responded with there was an A at the bottom. In general, when you type in a definition in Lisp, it responds with the symbol being defined. Now I can say to Lisp, what is the product of A and A? And it says that's 625. I can define B to be the sum of A and the product of five and A. Close a paren closes the star. Close the plus. Close the "define." Lisp says, OK, B, there at the bottom. And now I can say to Lisp, what's the value of B? And I can say something more complicated, like what's the sum of A and the quotient of B and five? That slash is divide, another primitive operator. I've divided B by five, added it to A. Lisp says, OK, that's 55.
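The interaction just shown, as it would be typed:

```scheme
(define a (* 5 5))         ; Lisp responds with the symbol: a
(* a a)                    ; Lisp returns 625
(define b (+ a (* 5 a)))   ; Lisp responds: b
b                          ; Lisp returns 150
(+ a (/ b 5))              ; 25 + 30; Lisp returns 55
```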
So there's what it looks like. There's the basic means of defining something. It's the simplest kind of naming, but it's not really0:41:47
very powerful. See, what I'd really like to name-- remember, we're talking about general methods-- I'd like to name, oh, the general idea that, for0:41:56
example, I could multiply five by five, or six by six, or0:42:10
1,001 by 1,001, or 1,001.7 by 1,001.7. I'd like to be able to name the general idea of0:42:22
multiplying something by itself. Well, you know what that is. That's called squaring.0:42:31
And the way I can do that in Lisp is I can say, define to0:42:43
square something x, multiply x by itself.0:42:57
And then having done that, I could say to Lisp, for example, what's the square of 10?0:43:06
And Lisp will say 100. So now let's actually look at that a little more closely. Right, there's the definition of square.0:43:17
To square something, multiply it by itself. You see this x here.0:43:26
That x is kind of a pronoun, which is the something that I'm going to square. And what I do with it is I multiply x, I0:43:35
multiply it by itself.0:43:44
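In Scheme, the definition just described looks like this:

```scheme
;; To square something, multiply it by itself.
;; x is the "pronoun": the formal parameter naming the thing being squared.
(define (square x) (* x x))

(square 10)  ; 100
```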
OK. So there's the notation for defining a procedure. Actually, this is a little bit confusing, because this is sort of how I might use square.0:43:53
And I say square of x or square of 10, but it's not making it very clear that I'm actually naming something.0:44:02
So let me write this definition in another way that makes it a little bit more clear that I'm naming something. I'll say, "define" square to be lambda of x times x x.0:44:36
Here, I'm naming something square, just like over here, I'm naming something A. The thing that I'm naming square-- here, the thing I named A was the value of this combination.0:44:49
Here, the thing that I'm naming square is this thing that begins with lambda, and lambda is Lisp's way of saying make a procedure.0:45:00
Let's look at that more closely on the slide. The way I read that definition is to say, I define square to be make a procedure--0:45:12
that's what the lambda is-- make a procedure with an argument named x. And what it does is return the results of0:45:22
multiplying x by itself. Now, in general, we're going to be using this top form of0:45:32
defining, just because it's a little bit more convenient. But don't lose sight of the fact that it's really this. In fact, as far as the Lisp interpreter's concerned,0:45:41
there's no difference between typing this to it and typing this to it. And there's a word for that, sort of syntactic sugar.0:45:54
What syntactic sugar means, it's having somewhat more convenient surface forms for typing something. So this is just really syntactic sugar for this0:46:04
underlying Greek thing with the lambda. And the reason you should remember that is don't forget that, when I write something like this, I'm really naming something.0:46:14
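Here are the two ways of writing the definition side by side; as far as the Lisp interpreter is concerned, they are the same thing:

```scheme
;; the syntactic sugar:
(define (square x) (* x x))

;; what it really means: the name "square" bound to a made procedure
(define square
  (lambda (x) (* x x)))
```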
I'm naming something square, and the something that I'm naming square is a procedure that's getting constructed. Well, let's look at that on the computer, too.0:46:24
So I'll come and I'll say, define square of x to be times x x.0:46:49
Now I'll tell Lisp that. It says "square." See, I've named something "square." Now, having done that, I can ask Lisp for, what's0:47:00
the square of 1,001? Or in general, I could say, what's the square of the sum0:47:14
of five and seven? The square of 12's 144.0:47:25
Or I can use square itself as an element in some combination. I can say, what's the sum of the square of three and the0:47:36
square of four? Nine and 16 is 25. Or I can use square as an element in some much more0:47:49
complicated thing. I can say, what's the square of, the square of, the square of 1,001?0:48:07
And there's the square of the square of the square of 1,001. Or I can say to Lisp, what is square itself? What's the value of that?0:48:17
And Lisp returns some conventional way of telling me that that's a procedure. It says, "compound procedure square." Remember, the value of square is this procedure, and the thing with the stars0:48:30
and the brackets are just Lisp's conventional way of describing that. Let's look at two more examples of defining.0:48:45
Here are two more procedures. I can define the average of x and y to be the sum of x and y divided by two.0:48:54
Or having had average and square, I can use those to talk about the mean square of0:49:03
something, which is the average of the square of x and the square of y. So for example, having done that, I could say, what's the0:49:13
mean square of two and three?0:49:24
And I should get the average of four and nine, which is 6.5. The key thing here is that, having defined square, I can0:49:37
use it as if it were primitive. So if we look here on the slide, if I look at mean square, the person defining mean square doesn't have to0:49:50
know, at this point, whether square was something built into the language or whether it was a procedure that was defined.0:49:59
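The two procedures from the slide, written out in Scheme (with square as defined earlier):

```scheme
(define (square x) (* x x))

;; the average of x and y is the sum of x and y divided by two
(define (average x y)
  (/ (+ x y) 2))

;; mean-square uses square as if it were primitive
(define (mean-square x y)
  (average (square x) (square y)))

(mean-square 2 3)  ; the average of 4 and 9: 13/2, that is, 6.5
```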
And that's a key thing in Lisp, that you do not make arbitrary distinctions between things that happen to be0:50:08
primitive in the language and things that happen to be compound. A person using them shouldn't even have to know. So the things you construct get used with all the power0:50:17
and flexibility as if they were primitives. In fact, you can drive that home by looking on the computer one more time. We talked about plus.0:50:26
And in fact, if I come here on the computer screen and say, what is the value of plus? Notice what Lisp types out.0:50:36
On the bottom there, it typed out, "compound procedure plus." Because, in this system, it turns out that the addition operator is itself a compound procedure.0:50:45
And if I didn't just type that in, you'd never know that, and it wouldn't make any difference anyway. We don't care. It's below the level of the abstraction that we're dealing with.0:50:54
So the key thing is you cannot tell, should not be able to tell, in general, the difference between things that are built in and things that are compound.0:51:03
Why is that? Because the things that are compound have an abstraction wrapper wrapped around them. We've seen almost all the elements of Lisp now.0:51:12
There's only one more we have to look at, and that is how to make a case analysis. Let me show you what I mean. We might want to think about the mathematical definition of0:51:22
the absolute value function. I might say the absolute value of x is the function which has the property that it's the negative of x0:51:35
for x less than zero, it's zero for x equal to zero, and it's x for x greater than zero.0:51:49
And Lisp has a way of making case analyses. Let me define for you absolute value. Say define the absolute value of x is conditional.0:52:03
This means case analysis, COND. If x is less than zero, the answer is negate x.0:52:22
What I've written here is a clause. This whole thing is a conditional clause,0:52:33
and it has two parts. This part here is a predicate or a condition.0:52:44
That's a condition. And the condition is expressed by something called a predicate, and a predicate in Lisp is some sort of thing that returns either true or false.0:52:53
And you see Lisp has a primitive procedure, less-than, that tests whether something is true or false. And the other part of a clause is an action or a thing to do,0:53:06
in the case where that's true. And here, what I'm doing is negating x. The negation operator, the minus sign in Lisp is a little bit funny.0:53:17
If there are two arguments, it subtracts the second one from the first, and we saw that. And if there's one argument, it negates it. So this corresponds to that.0:53:27
And then there's another COND clause. It says, in the case where x is equal to zero, the answer is zero.0:53:37
And in the case where x is greater than zero, the answer is x. Close that clause.0:53:46
Close the COND. Close the definition. And there's the definition of absolute value. And you see it's the case analysis that looks very much like the case analysis you use in mathematics.0:53:58
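The COND definition of absolute value, as written on the board:

```scheme
(define (abs x)
  (cond ((< x 0) (- x))   ; clause: predicate (< x 0), action (- x)
        ((= x 0) 0)       ; clause: if x is zero, the answer is zero
        ((> x 0) x)))     ; clause: if x is positive, the answer is x
```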
There's a somewhat different way of writing a restricted case analysis. Often, you have a case analysis where you only have one case, where you test something, and then depending0:54:08
on whether it's true or false, you do something. And here's another definition of absolute value which looks almost the same, which says, if x is less than zero, the0:54:21
result is negate x. Otherwise, the answer is x. And we'll be using "if" a lot. But again, the thing to remember is that this form of0:54:30
absolute value that you're looking at here, and then this one over here that I wrote on the board, are essentially the same.0:54:39
And "if" and COND are-- well, whichever way you like it. You can think of COND as syntactic sugar for "if," or you can think of "if" as syntactic sugar for COND, and it doesn't make any difference.0:54:48
The person implementing a Lisp system will pick one and implement the other in terms of that. And it doesn't matter which one you pick.0:55:02
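The restricted, one-case form of absolute value using "if":

```scheme
(define (abs x)
  (if (< x 0)  ; predicate
      (- x)    ; consequent: negate x
      x))      ; alternative: x itself
```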
Why don't we break now, and then take some questions. How come sometimes when I write define, I put an open0:55:11
paren here and say, define open paren something or other, and sometimes when I write this, I don't put an open paren?0:55:22
The answer is, this particular form of "define," where you say define some expression, is this very special thing for defining procedures.0:55:33
But again, what it really means is I'm defining this symbol, square, to be that. So the way you should think about it is what "define" does0:55:44
is you write "define," and the second thing you write is the symbol here-- no open paren-- the symbol you're defining and what you're defining it to be.0:55:54
That's like here and like here. That's sort of the basic way you use "define." And then, there's this special syntactic trick which allows you to0:56:05
define procedures that look like this. So the difference is, it's whether or not you're defining a procedure. [MUSIC PLAYING]0:56:38
Well, believe it or not, you actually now know enough Lisp to write essentially any numerical procedure that you'd write in a language like FORTRAN or Basic or whatever,0:56:49
or, essentially, any other language. And you're probably saying, that's not believable, because you know that these languages have things like "for statements," and "do until while" or something.0:57:00
But we don't really need any of that. In fact, we're not going to use any of that in this course. Let me show you.0:57:10
Again, looking back at square root, let's go back to this square root algorithm of Heron of Alexandria. Remember what that said.0:57:20
It said, to find an approximation to the square root of X, you make a guess, you improve that guess by averaging the guess and X over the guess.0:57:32
You keep improving that until the guess is good enough. I already alluded to the idea. The idea is that, if the initial guess that you took0:57:44
was actually equal to the square root of X, then G here would be equal to X/G. So if you hit the square root, averaging them0:57:54
wouldn't change it. If the G that you picked was larger than the square root of X, then X/G will be smaller than the square root of X, so0:58:03
that when you average G and X/G, you get something in between. So if you pick a G that's too small, your answer will be too large.0:58:13
If you pick a G that's too large-- if your G is larger than the square root of X-- then X/G will be smaller than the square root of X. So averaging always gives you something in between.0:58:24
And then, it's not quite trivial, but it's possible to show that, in fact, if G misses the square root of X by a little bit, the average of G and X/G will actually keep0:58:34
getting closer to the square root of X. So if you keep doing this enough, you'll eventually get as close as you want. And then there's another fact, that you can always start out0:58:44
this process by using 1 as an initial guess. And it'll always converge to the square root of X. So that's this method of successive averaging due to0:58:55
Heron of Alexandria. Let's write it in Lisp. Well, the central idea is, what does it mean to try a0:59:05
guess for the square root of X? Let's write that. So we'll say, define to try a guess for the square root of0:59:24
X, what do we do? We'll say, if the guess is good enough to be a guess for0:59:44
the square root of X, then, as an answer, we'll take the guess. Otherwise, we will try the improved guess.0:59:58
We'll improve that guess for the square root of X, and we'll try that as a guess for the square root of X. Close1:00:09
the "try." Close the "if." Close the "define." So that's how we try a guess. And then, the next part of the process said, in order to1:00:18
compute square roots, we'll say, define to compute the1:00:28
square root of X, we will try one as a guess for the square root of X. Well, we have to define a couple more things.1:00:40
We have to say, how is a guess good enough? And how do we improve a guess? So let's look at that. The algorithm to improve a guess for the square root of1:00:53
X, we average-- that was the algorithm-- we average the guess with the quotient of dividing X by the guess.1:01:03
That's how we improve a guess. And to tell whether a guess is good enough, well, we have to decide something. This is supposed to be a guess for the square root of X, so one possible thing you can do is say, when you take that1:01:14
guess and square it, do you get something very close to X? So one way to say that is to say, I square the guess, subtract X from that, and see if the absolute value of that1:01:26
whole thing is less than some small number, which depends on my purposes.1:01:35
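Putting the pieces together, here is the whole square-root program as just described. The predicate name good-enough? and the tolerance .001 are one reasonable choice; the lecture leaves the "small number" up to your purposes:

```scheme
(define (square x) (* x x))
(define (average x y) (/ (+ x y) 2))

;; to try a guess for the square root of x: if it's good enough, take it;
;; otherwise try the improved guess
(define (try guess x)
  (if (good-enough? guess x)
      guess
      (try (improve guess x) x)))

;; to compute the square root of x, try 1 as a guess
(define (sqrt x) (try 1 x))

;; improve a guess by averaging it with x divided by the guess
(define (improve guess x)
  (average guess (/ x guess)))

;; a guess is good enough if its square is very close to x
(define (good-enough? guess x)
  (< (abs (- (square guess) x)) .001))
```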
So there's a complete procedure for how to compute the square root of X. Let's look at the structure of that a little bit.1:01:47
I have the whole thing. I have the notion of how to compute a square root. That's some kind of module.1:01:56
That's some kind of black box. It's defined in terms of how to try a guess for the square1:02:07
root of X. "Try" is defined in terms of, well, telling whether something is good enough and telling1:02:16
how to improve something. So good enough. "Try" is defined in terms of "good enough" and "improve."1:02:30
And let's see what else I fill in. Well, I'll go down this tree. "Good enough" was defined in terms of absolute value, and square.1:02:40
And improve was defined in terms of something called averaging and then some other primitive operator. Square root's defined in terms of "try." "Try" is defined in1:02:49
terms of "good enough" and "improve," but also "try" itself. So "try" is also defined in terms of how to try itself.1:03:02
Well, that may give you some problems. Your high school geometry teacher probably told you that it's naughty to try and define things in terms of themselves, because it doesn't1:03:13
make sense. But that's false. Sometimes it makes perfect sense to define things in terms of themselves. And this is the case.1:03:22
And we can look at that. We could write down what this means, and say, suppose I asked Lisp what the square root of two is.1:03:32
What's the square root of two mean? Well, that means I try one as a guess for the1:03:42
square root of two. Now I look. I say, gee, is one a good enough guess for the square root of two?1:03:51
And that depends on the test that "good enough" does. And in this case, "good enough" will say, no, one is not a good enough guess for the square root of two. So that will reduce to saying, I have to try an improved--1:04:10
improve one as a guess for the square root of two, and try that as a guess for the square root of two.1:04:19
Improving one as a guess for the square root of two means I average one and two divided by one. So this is going to be average.1:04:29
This piece here will be the average of one and the quotient of two by one.1:04:40
That's this piece here. And this is 1.5.1:04:49
So this square root of two reduces to trying one for the square root of two, which reduces to trying 1.5 as a1:05:03
guess for the square root of two. So that makes sense. Let's look at the rest of the process. If I try 1.5, that reduces.1:05:14
1.5 turns out to be not good enough as a guess for the square root of two. So that reduces to trying the average of 1.5 and two divided1:05:23
by 1.5 as a guess for the square root of two. That average turns out to be 1.333. So this whole thing reduces to trying 1.333 as a guess for1:05:34
the square root of two. And then so on. That reduces to another call to "good enough," 1.4 something or other. And then it keeps going until the process finally stops with1:05:45
something that "good enough" thinks is good enough, which, in this case, is 1.4142 something or other. So the process makes perfect sense.1:05:59
This, by the way, is called a recursive definition.1:06:14
And the ability to make recursive definitions is a source of incredible power. And as you can already see I've hinted at, it's the thing1:06:24
that effectively allows you to do these infinite computations that go on until something is true, without having any other constructs other than the ability to call a procedure.1:06:35
Well, let's see, there's one more thing. Let me show you a variant of this definition of square root here on the slide.1:06:46
Here's sort of the same thing. What I've done here is packaged the definitions of "improve" and "good enough" and "try" inside "square1:06:55
root." So, in effect, what I've done is I've built a square root box. So I've built a box that's the square root procedure that1:07:07
someone can use. They might put in 36 and get out six. And then, packaged inside this box are the definitions of "try" and "good enough" and "improve."1:07:26
So they're hidden inside this box. And the reason for doing that is that, if someone's using this square root, if George is using this square root, George probably doesn't care very much that, when I implemented1:07:39
square root, I had things inside there called "try" and "good enough" and "improve." And in fact, Harry might have1:07:48
a cube root procedure that has "try" and "good enough" and "improve." And in order to not get the whole system confused, it'd be good for Harry to package his internal procedures inside his cube root procedure.1:07:58
Well, this is called block structure, this particular way of packaging internals inside of a definition.1:08:09
And let's go back and look at the slide again. The way to read this kind of procedure is to say, to define "square root," well, inside that definition, I'll have the1:08:23
definition of an "improve" and the definition of "good enough" and the definition of "try." And then, subject to those definitions, the way I do square root is to try one.1:08:36
And notice here, I don't have to say one as a guess for the square root of X, because since it's all inside the square root, it sort of has this X known.1:08:54
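The block-structured version from the slide, with the helpers packaged inside the square-root box; since x is known everywhere inside the body, the internal procedures don't need it as an argument:

```scheme
(define (square x) (* x x))
(define (average x y) (/ (+ x y) 2))

(define (sqrt x)
  ;; "improve," "good-enough?," and "try" are hidden inside sqrt,
  ;; so they can't get confused with anyone else's internal procedures
  (define (improve guess)
    (average guess (/ x guess)))
  (define (good-enough? guess)
    (< (abs (- (square guess) x)) .001))
  (define (try guess)
    (if (good-enough? guess)
        guess
        (try (improve guess))))
  ;; subject to those definitions, the way to do square root is to try 1
  (try 1))
```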
Let me summarize. We started out with the idea that what we're going to be doing is expressing imperative knowledge.1:09:04
And in fact, here's a slide that summarizes the way we looked at Lisp. We started out by looking at some primitive elements in1:09:13
addition and multiplication, some predicates for testing whether something is less-than or something's equal. And in fact, we saw really sneakily in the system we're1:09:22
actually using, these aren't actually primitives, but it doesn't matter. What matters is we're going to use them as if they're primitives. We're not going to look inside. We also have some primitive data and some numbers.1:09:34
We saw some means of composition, means of combination, the basic one being composing functions and building combinations with operators and operands.1:09:44
And there were some other things, like COND and "if" and "define." But the main thing about "define," in particular,1:09:53
was that it was the means of abstraction. It was the way that we name things. You can also see from this slide not only where we've been, but holes we have to fill in. At some point, we'll have to talk about how you combine1:10:03
primitive data to get compound data, and how you abstract data so you can use large globs of data as if they were primitive.1:10:13
So that's where we're going. But before we do that, for the next couple of lectures we're going to be talking about, first of all, how it is that1:10:25
you make a link between these procedures we write and the processes that happen in the machine. And then, how it is that you start using the power of Lisp1:10:36
to talk not only about these individual little computations, but about general conventional methods of doing things.1:10:45
OK, are there any questions? AUDIENCE: Yes. If we defined A using parentheses instead of as we did, what would be the difference? PROFESSOR: If I wrote this, if I wrote that, what I would be1:10:58
doing is defining a procedure named A. In this case, a procedure of no arguments, which, when I ran it, would1:11:07
give me back five times five. AUDIENCE: Right. I mean, you come up with the same thing, except for you really got a different-- PROFESSOR: Right. And the difference would be, in the old one--1:11:16
Let me be a little bit clearer here. Let's call this A, like here. And pretend here, just for contrast, I wrote, define D to1:11:35
be the product of five and five. And the difference between those, let's think about interactions with the Lisp interpreter.1:11:45
I could type in A and Lisp would return 25. I could type in D, if I just typed in D, Lisp would return1:12:01
compound procedure D, because that's what it is. It's a procedure. I could run D. I could say, what's the value of running D?1:12:12
Here is a combination with no operands. I see there are no operands. I didn't put any after D. And it would say, oh, that's 25.1:12:22
Or I could say, just for completeness, if I typed in, what's the value of running A? I get an error.1:12:31
The error would be the same one as over there. The error would say, sorry, 25, which is the value1:12:40
of A, is not an operator that I can apply to something.0:00:00
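The contrast from the question, sketched out in Scheme (using a and d for the two definitions on the board):

```scheme
(define a (* 5 5))    ; names the value 25

(define (d) (* 5 5))  ; names a procedure of no arguments

a     ; 25
d     ; the compound procedure itself
(d)   ; running d, a combination with no operands: 25
;; (a) would be an error: 25 is not an operator that can be applied
```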
Lecture 1B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING BY J.S. BACH]0:00:14
PROFESSOR: Hi. You've seen that the job of a programmer is to design processes that accomplish particular goals, such as0:00:24
finding the square roots of numbers or other sorts of things you might want to do. We haven't introduced anything else yet. Of course, the way in which a programmer does this is by0:00:34
constructing spells, which are constructed out of procedures and expressions. And these spells somehow direct a process to0:00:46
accomplish the goal that was intended by the programmer. In order for the programmer to do this effectively, he has to understand the relationship between the particular things that he writes, these particular spells, and the0:00:56
behavior of the process that he's attempting to control. So what we're doing this lecture is attempting to establish that connection in as clear a way as possible.0:01:07
What we will particularly do is understand how particular patterns of procedures and expressions cause particular patterns of execution, particular0:01:17
behaviors from the processes. Let's get down to that. I'm going to start with a very simple program.0:01:28
This is a program to compute the sum of the squares of two numbers. And we'll define the sum of the squares of x and y to be0:01:45
the sum of the square of x-- I'm going to write it that way-- and the square of y where the square of x is the0:02:08
product of x and x. Now, supposing I were to say something to this, like, to0:02:17
the system after having defined these things, of the form, the sum of the squares of three and four, I am hoping0:02:26
that I will get out a 25. Because the square of three is nine, and the square of four is 16, and 25 is the sum of those. But how does that happen?0:02:36
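The definitions just given, and the expression being evaluated:

```scheme
(define (square x) (* x x))

(define (sum-of-squares x y)
  (+ (square x) (square y)))

(sum-of-squares 3 4)  ; 25
```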
If we're going to understand processes and how we control them, then we have to have a mapping from the mechanisms of this procedure into the way in which these processes behave.0:02:49
What we're going to have is a formal, or semi-formal, mechanical model whereby you understand how a machine could, in fact, in principle, do this. Whether or not the actual machine really does what I'm0:03:00
about to tell you is completely irrelevant at this moment. In fact, this is an engineering model in the same way that, for an electrical resistor, we write down a model v equals0:03:09
i r. It's approximately true. It's not really true. If I put enough current through the resistor, it goes boom. So the voltage is not always proportional to the current,0:03:20
but for some purposes the model is appropriate. In particular, the model we're going to describe right now, which I call the substitution model, is the simplest model0:03:29
that we have for understanding how procedures work and how processes work. How procedures yield processes. And that substitution model will be accurate for most of0:03:39
the things we'll be dealing with in the next few days. But eventually, it will become impossible to sustain the illusion that that's the way the machine works, and we'll go to other more specific and particular models that will0:03:50
show more detail. OK, well, the first thing, of course, is we say, what are the things we have here?0:03:59
We have some cryptic symbols. And these cryptic symbols are made out of pieces. There are kinds of expressions. So let's write down here the kinds of expressions there are.0:04:17
And we have-- and so far I see things like numbers. I see things like symbols like that.0:04:32
We have seen things before like lambda expressions, but they're not here. I'm going to leave them out. Lambda expressions, we'll worry about them later.0:04:44
Things like definitions. Things like conditionals.0:04:58
And finally, things like combinations.0:05:07
These kinds of expressions are-- I'll worry about later-- these are special forms. There are particular rules for each of these.0:05:17
I'm going to tell you, however, the rules for doing a general case. How does one evaluate a combination? Because, in fact, over here, all I really have are combinations and some symbols and numbers.0:05:29
And the simple things like a number, well, it will evaluate to itself. In the model I will have for you, the symbols will disappear. They won't be there at the time when you need them, when0:05:40
you need to get at them. So the only thing I really have to explain to you is, how do we evaluate combinations? OK, let's see.0:05:50
So first I want to get the first slide. Here is the rule for evaluating an application.0:06:01
What we have is a rule that says, to evaluate a combination, there are three parts to the rule. The combination has several parts.0:06:12
It has an operator and it has operands. The operator evaluates to a procedure. If we evaluate the operator, we will get a procedure.0:06:22
And you saw, for example, how I'll type at the machine and out came compound procedure something or other. And the operands produce arguments.0:06:31
Once we've evaluated the operator to get a procedure, and evaluated the operands to get arguments, we apply the procedure to these arguments by copying the0:06:43
body of the procedure, which is the expression that the procedure is defined in terms of. What is it supposed to do? Substituting the argument supplied for the formal0:06:53
parameters of the procedure, the formal parameters being the names defined by the declaration of the procedure. Then we evaluate the resulting new body, the body resulting0:07:02
from copying the old body with the substitutions made. It's a very simple rule, and we're going to do it very formally for a little while.0:07:12
Because for the next few lectures, what I want you to do is to say, if I don't understand something, be very mechanical and do this.0:07:23
So let's see. Let's consider a particular evaluation, the one we were talking about before. The sum of the squares of three and four.0:07:35
What does that mean? It says, take-- well, I could find out what's on the square-- it's some procedure, and I'm not going to worry about the representation, and I'm not going to write it on the0:07:44
blackboard for you. And I have that three represents some number, but I can't really tell you the number itself. The number itself is some abstract thing.0:07:54
There's a numeral which represents it, which I'll call three, and I'll use that in my substitution. And four is also a number. I'm going to substitute three for x and four for y in the0:08:06
body of this procedure that you see over here. Here's the body of the procedure. It corresponds to this combination, which is an addition.0:08:17
So what that reduces to, as a reduction step, we call it, is the sum of the square of three and the square of four.0:08:30
Now, what's the next step I have to do here? I say, well, I have to evaluate this. According to my rule, which you just saw on that overhead0:08:40
or slide, what we had was that we have to evaluate the operands-- and here are the operands, here's one and here's the next operand--0:08:49
and we have to evaluate the operator to get the procedure. The order doesn't matter. And then we're going to apply the procedure, which is plus, and magically somehow that's going to produce the answer.0:08:59
I'm not going to open up plus and look inside of it. However, in order to evaluate the operands, let's pick some arbitrary order and do them. I'm going to go from right to left.0:09:08
Well, in order to evaluate this operand, I have to evaluate the parts of it by the same rule. And the parts are I have to find out what square is-- it's some procedure, which has a formal parameter x.0:09:19
And also, I have an operand which is four, which I have to substitute for x in the body of square.0:09:28
So the next step is basically to say that this is the sum of the square of three and the product of four and four.0:09:40
Of course, I could open up asterisk if I liked-- the multiplication operation-- but I'm not going to do that. I'm going to consider that primitive.0:09:50
And, of course, at any level of detail, if you look inside this machine, you're going to find that there's multiple levels below that that you don't know about. But one of the things we have to learn how to0:09:59
do is ignore details. The key to understanding complicated things is to know what not to look at and what not to compute and what not to think.0:10:09
So we're going to stop this one here and say, oh, yes, this is the product of two things. We're going to do it now. So this is nothing more than the sum of the square0:10:19
of three and 16. And now I have another thing I have to evaluate, but that square of three, well, it's the same thing.0:10:29
That's the sum of the product of three and three and 16, which is the sum of nine and 16, which is 25.0:10:44
So now you see the basic method of doing substitutions. And I warn you that this is not a perfect description of0:10:54
what the computer does. But it's a good enough description for the problems that we're going to have in the next few lectures that you0:11:03
should think about this religiously. And this is how the machine works for now. Later we'll get more detailed.0:11:12
Now, of course, I made a specific choice of the order of evaluation here. There are other possibilities. If we go back to the telestrator here and look at0:11:21
the substitution rule, we see that I evaluated the operator to get the procedures, and I evaluated the operands to get the arguments first, before I do the application.0:11:31
It's entirely possible, and there are alternate rules called normal order evaluation whereby you can do the substitution of the expressions which are the0:11:41
operands for the formal parameters inside the body first. And you'll get also the same answer. But right now, for concreteness, and because this0:11:50
is the way our machine really does it, I'm going to give you this rule, which has a particular order. But that order is to some extent arbitrary, too.0:12:01
In the long run, there are some reasons why you might pick one order or another, and we'll get to that later in the subject.0:12:12
OK, well, now the only other thing I have to tell you about, just to understand what's going on, is the rule for conditionals. Conditionals are very simple, and I'd like to examine this.0:12:27
A conditional is something that is if-- there's also cond, of course-- but I'm going to give names to the parts of the expression. There's a predicate, which is a thing that is0:12:39
either true or false. And there's a consequent, which is the thing you do if the predicate is true.0:12:48
And there's an alternative, which is the thing you do if the predicate is false. It's important, by the way, to get names0:13:00
for the parts of things, the parts of expressions. One of the things that every sorcerer will tell you is if you have the name of a spirit, you have power over it.0:13:10
So you have to learn these names so that we can discuss these things. So here we have a predicate, a consequent, and an alternative. And, using such words, we see that for an if expression,0:13:21
you first evaluate the predicate expression; if that yields true, then you go on to evaluate the consequent. Otherwise, you evaluate the alternative expression.0:13:34
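As a sketch of this evaluation rule in code (in Python rather than the lecture's Lisp, and with a tuple encoding of expressions that is my own illustrative device, not something from the lecture), here is a toy evaluator that evaluates the predicate first and then exactly one of the two branches:

```python
# A toy evaluator for conditionals, illustrating the rule above:
# evaluate the predicate first, then evaluate exactly one of the
# two branches -- never both.

def eval_expr(expr):
    # Numbers evaluate to themselves.
    if isinstance(expr, (int, float)):
        return expr
    tag = expr[0]
    if tag == 'if':
        _, predicate, consequent, alternative = expr
        if eval_expr(predicate):          # evaluate the predicate...
            return eval_expr(consequent)  # ...then only the consequent,
        return eval_expr(alternative)     # ...or only the alternative.
    if tag == '=':
        return eval_expr(expr[1]) == eval_expr(expr[2])
    raise ValueError(f"unknown expression: {expr!r}")

# (if (= 3 0) 4 99) evaluates to 99, since the predicate is false.
print(eval_expr(('if', ('=', 3, 0), 4, 99)))
```

Note that the consequent and the alternative are never both evaluated; that is what distinguishes a conditional from an ordinary procedure application.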
So I'd like to illustrate that now in the context of a particular little program.0:13:43
Going to write down a program which we're going to see many times. This is the sum of x and y done by what's called Peano0:13:58
arithmetic, in which all we're doing is incrementing and decrementing. And we're going to see this for a little bit. It's a very important program. If x equals zero, then the result is y.0:14:12
Otherwise, this is the sum of the decrement of x and the increment of y.0:14:23
We're going to look at this a lot more in the future. Let's look at the overhead. So here we have this procedure, and we're going to look at how we do the substitutions, the sequence of0:14:33
substitutions. Well, I'm going to try and add together three and four. Well, using the first rule that I showed you, we substitute three for x and four for y in the body of0:14:45
this procedure. The body of the procedure is the thing that begins with if and finishes over here. So what we get is, of course, if three is zero, then the0:14:54
result is four. Otherwise, it's the sum of the decrement of three and the increment of four. But I'm not going to worry about these yet because three0:15:04
is not zero. So the answer is not four. Therefore, this if reduces to an evaluation of the expression, the sum of the decrement of three and the0:15:14
increment of four. Continuing with my evaluation, the increment I presume to be primitive, and so I get a five there.0:15:23
OK, and then the decrement is also primitive, and I get a two. And so I change the problem into a simpler problem. Instead of adding three to four, I'm adding two to five.0:15:33
The reason why this is a simpler problem is because I'm counting down on x, and eventually, then, x will be zero.0:15:43
So, so much for the substitution rule. In general, I'm not going to write down intermediate steps when using substitutions having to do with ifs, because0:15:52
they just expand things to become complicated. What we will be doing is saying, oh, yes, the sum of three and four0:16:01
reduces to the sum of two and five, which, in fact, reduces to the sum of one and six, which reduces to the sum of zero and seven over here, which reduces to a seven.0:16:14
That's what we're going to be seeing. Are there any questions for the first segment yet? Yes? STUDENT: You're using one plus and minus one plus.0:16:24
Are those primitive operations? PROFESSOR: Yes. One of the things you're going to be seeing in this subject is I'm going to, without thinking about it, introduce0:16:33
more and more primitive operations. There's presumably some large library of primitive operations somewhere. But it doesn't matter that they're primitive-- there may be some manual that lists them all.0:16:43
If I tell you what they do, you say, oh, yes, I know what they do. So one of them is the decrement-- minus one plus-- and the other operation is the increment, which is one plus.0:16:53
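The whole first segment can be sketched in Python (the lecture's code is in Lisp; dec and inc here stand in for the minus-one-plus and one-plus primitives just mentioned):

```python
def dec(x):  # the "minus one plus" primitive: decrement
    return x - 1

def inc(y):  # the "one plus" primitive: increment
    return y + 1

def add(x, y):
    # Peano addition: if x is zero, the result is y; otherwise it is
    # the sum of the decrement of x and the increment of y.
    if x == 0:
        return y
    return add(dec(x), inc(y))

# The reduction chain from the lecture:
#   (+ 3 4) -> (+ 2 5) -> (+ 1 6) -> (+ 0 7) -> 7
print(add(3, 4))  # prints 7
```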
Thank you. That's the end of the first segment. [MUSIC PLAYING BY J.S. BACH]0:17:19
PROFESSOR: Now that we have a reasonably mechanical way of understanding how a program made out of procedures and0:17:28
expressions evolves a process, I'd like to develop some intuition about how particular programs evolve particular processes, what the shapes of programs have to be in order0:17:39
to get particular shaped processes. This is a question about, really, pre-visualizing. That's a word from photography.0:17:49
I used to be interested in photography a lot, and one of the things you discover when you start trying to learn about photography is that you say, gee, I'd like to be a creative photographer.0:17:58
Now, I know the rules, I push buttons, and I adjust the aperture and things like that. But the key to being a creative person, partly, is to be able to do analysis at some level.0:18:09
To say, how do I know what it is that I'm going to get on the film before I push the button. Can I imagine in my mind the resulting image very precisely0:18:23
and clearly as a consequence of the particular framing, of the aperture I choose, of the focus, and things like that?0:18:32
That's part of the art of doing this sort of thing. And learning a lot of that involves things like test strips. You take very simple images that have varying degrees of0:18:44
density in them, for example, and examine what those look like on a piece of paper when you print them out. You find out what is the range of contrasts that you can0:18:54
actually see. And what, in a real scene, would correspond to the various levels and zones that you have of density in an image.0:19:05
Well, today I want to look at some very particular test strips, and I suppose one of them I see here is up on the telestrator, so we should switch to that.0:19:14
There's a very important, very important pair of programs for understanding what's going on in the evolution of a process0:19:24
by the execution of a program. What we have here are two procedures that are almost identical. Almost no difference between them at all.0:19:35
It's a few characters that distinguish them. These are two ways of adding numbers together. The first one, which you see here, says the sum0:19:48
of two numbers-- just what we did before-- is, if the first one is zero, the second one. Otherwise, it's the sum of the decrement of the first and the increment of the second.0:19:57
And you may think of that as having two piles. And the way I'm adding these numbers together to make a0:20:06
third pile is by moving marbles from one to the other. Nothing more than that. And eventually, when I run out of one, then the other is the sum.0:20:15
However, the second procedure here doesn't do it that way. It says if the first number is zero, then the answer is the second.0:20:24
Otherwise, it's the increment of the sum of the decrement of the first number and the second. So what this says is add together the decrement of the0:20:35
first number and the second-- a simpler problem, no doubt-- and then change that result to increment it. And so this means that if you think about this in terms of0:20:45
piles, it means I'm holding in my hand the things to be added later. And then I'm going to add them in. As I slowly decrease one pile to zero, I've got what's left0:20:57
here, and then I'm going to add them back. Two different ways of adding. The nice thing about these two programs is that they're almost identical.0:21:06
The only thing is where I put the increment. A couple of characters moved around. Now I want to understand the kind of behavior we're going0:21:15
to get from each of these programs. Just to get them firmly in your mind-- I usually don't want to be this careful-- but just to get them firmly in your mind, I'm going to write0:21:24
the programs again on the blackboard, and then I'm going to evolve a process. And you're going to see what happens. We're going to look at the shape of the process as a consequence of the program.0:21:34
So the program we started with is this: the sum of x and y0:21:44
says if x is zero, then the result is y. Otherwise, it's the sum of the decrement of x and the0:21:56
increment of y. Now, supposing we wish to do this addition of three and0:22:05
four, the sum of three and four, well, what is that? It says that I have to substitute the arguments for0:22:14
the formal parameters in the body. I'm doing that in my mind. And I say, oh, yes, three is substituted for x, but three is not zero, so I'm going to go directly to this part and0:22:28
write down the simplified consequent here. Because I'm really interested in the behavior of addition. Well, what is that? That therefore turns into the sum of two and five.0:22:38
In other words, I've reduced this problem to this problem. Then I reduce this problem to the sum of one and six, and0:22:47
then, going around again once, I get the sum of zero and seven. And that's one where x equals zero so the result is y, and0:22:56
so I write down here a seven. So this is the behavior of the process evolved by trying to add together three and four with this program.0:23:07
For the other program, which is over here, I will define0:23:20
the sum of x and y. And what is it? If x is zero, then the result is y-- almost the same--0:23:32
otherwise the increment of the sum of the decrement of x and y.0:23:47
No. I don't have my balancer in front of me.0:23:56
OK, well, let's do it now. The sum of three and four. Well, this is actually a little more interesting. Of course, three is not zero as before, so that results in0:24:07
the increment of the sum of the decrement of x, which is two and four, which is the increment of0:24:19
the sum of one and-- whoops: the increment of the increment. What I have to do now is compute what this means.0:24:30
I have to evaluate this. Or what that is, the result of substituting two and four for x and y here. But that is the increment of the sum of one0:24:40
and four, which is-- well, now I have to expand this. Ah, but that's the increment of the increment of the0:24:52
increment of the sum of zero and four. Ah, but now I'm beginning to find things I can do.0:25:03
The increment of the increment of the increment of-- well, the sum of zero and four is four.0:25:12
The increment of four is five. So this is the increment of the increment of five, which is the increment of six, which is seven.0:25:26
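Both procedures can be sketched in Python for comparison (the lecture writes them in Lisp; note also that Python does not do the tail-call optimization a real Lisp system would, so the sketch is only about the shape of the process, not about actual machine resources):

```python
def add_iterative(x, y):
    # Shape: (+ 3 4) -> (+ 2 5) -> (+ 1 6) -> (+ 0 7) -> 7.
    # All the state is in x and y; nothing is deferred.
    if x == 0:
        return y
    return add_iterative(x - 1, y + 1)

def add_recursive(x, y):
    # Shape: (+ 3 4) -> (inc (+ 2 4)) -> (inc (inc (+ 1 4))) -> ...
    # Each step defers an increment to be done on the way back out.
    if x == 0:
        return y
    return 1 + add_recursive(x - 1, y)

print(add_iterative(3, 4), add_recursive(3, 4))  # both give 7
```

The only textual difference is where the increment happens, yet the first evolves an iteration and the second a linear recursion.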
Two different ways of computing sums. Now, let's see. These processes have very different shapes. I want you to feel these shapes.0:25:36
It's the feeling for the shapes that matters. What's some things we can see about this? Well, somehow this is sort of straight.0:25:45
It goes this way-- straight. This right edge doesn't vary particularly in size.0:25:54
Whereas this one, I see that this thing gets bigger and then it gets smaller. So I don't know what that means yet,0:26:03
but what are we seeing? We're seeing here that somehow these increments are expanding out and then contracting back.0:26:13
I'm building up a bunch of them to do later. I can't do them now. There's things to be deferred. Well, let's see.0:26:23
I can imagine an abstract machine. There's some physical machine, perhaps, that could be built to do it, which, in fact, executes these programs exactly as I tell you, substituting character strings in like this.0:26:34
Such a machine, the number of such steps is an approximation of the amount of time it takes. So this way is time.0:26:45
And the width of the thing is how much I have to remember in order to continue the process. And this much is space. And what we see here is a process that takes a time0:26:58
which is proportional to the argument x. Because if I made x larger by one, then I'd have an extra line.0:27:08
So this is a process which is space-- sorry-- time. The time of this process is what we say order of x.0:27:20
That means it is proportional to x by some constant of proportionality, and I'm not particularly interested in what the constant is. The other thing we see here is that the amount of space this0:27:31
takes up is constant, it's proportional to one. So the space complexity of this is order of one.0:27:42
We have a name for such a process. Such a process is called an iteration.0:27:51
And what matters here is not that some particular machine I designed here and talked to you about and called a substitution machine or whatever--0:28:00
substitution model-- managed to do this in constant space. What really matters is this tells us a bound. Any machine could do this in constant space.0:28:09
This algorithm represented by this procedure is executable in constant space. Now, of course, the model is ignoring some things, standard0:28:18
sorts of things. Like numbers that are bigger take up more space and so on. But that's a level of abstraction at which I'm cutting off. How do you represent numbers? I'm considering every number to be the same size.0:28:28
And the amount of space numbers take up grows slowly with their size. Now, this algorithm is different in its complexity.0:28:38
As we can see here, this algorithm has a time complexity which is also proportional to the input0:28:48
argument x. That's because if I were to add one to three, if I made a larger problem, which is larger by one here, then I'd add a line at the top and I'd add a line at the bottom.0:29:00
And the fact that it's a constant amount, like this is twice as many lines as that, is not interesting at the level of detail I'm talking about right now. So this is a time complexity order of the input argument x.0:29:13
And space complexity, well, this is more interesting. I happen to have some overhead, which you see over here, which is constant approximately.0:29:23
Constant overhead. But then I have something which increases and decreases and is proportional to the input argument x. The input argument x is three. That's why there are three deferred increments sitting0:29:34
around here. See? So the space complexity here is also order x. And this kind of process, named for the kind of process,0:29:44
this is a recursion. A linear recursion, I will call it, because of the fact0:29:56
that it's proportional to the input argument in both time and space. This could have been a linear iteration.0:30:13
So then what's the essence of this matter? This matter isn't so obvious. Maybe there are other models by which we can describe the differences between iterative and recursive processes.0:30:23
Because this is hard now. Remember, what we're seeing there are both recursive definitions,0:30:32
definitions that refer to the thing being defined in the definition. But they lead to different shape processes. There's nothing special about the fact that the definition0:30:42
is recursive that leads to a recursive process. OK. Let's think of another model. I'm going to talk to you about bureaucracy.0:30:52
Bureaucracy is sort of interesting. Here we see on a slide an iteration. An iteration is sort of a fun kind of process.0:31:04
Imagine that there's a fellow called GJS-- that stands for me-- and he's got a problem: he wants to add together three and four.0:31:13
This fella here wants to add together three and four. Well, the way he's going to do it-- he's lazy-- is he's going to find somebody else to help him do it. The way he does it is0:31:22
he finds someone else to help him and says, well, give me the answer to three and four and return the result to me. He makes a little piece of paper and says, here, here's a0:31:32
piece of paper-- you go ahead and solve this problem and give the result back to me. And this guy, of course, is lazy, too. He doesn't want to see this piece of paper again.0:31:41
He says, oh, yes, produce a new problem, which is the sum of two and five, and return the result back to GJS.0:31:50
I don't want to see it again. This guy does not want to see this piece of paper. And then this fellow makes a new problem, which is the0:32:01
addition of the sum of one and six, and he gives it to this fella and says, produce that answer and return it to GJS. And that produces a problem, which is to add together zero0:32:11
and seven, and give the result to GJS. This fella finally just says, oh, yeah, the answer is seven, and sends it back to GJS. That's what an iteration is.0:32:20
By contrast, a recursion is a slightly different kind of process. This one involves more bureaucracy. It keeps more people busy.0:32:30
It keeps more people employed. Perhaps it's better for that reason. But here it is: I want the answer to the problem three and four. So I make a piece of paper that says, give the result0:32:40
back to me. Give it to this fella. This fellow says, oh, yes, I will remember that I have to add one later, and I want to get the answer to the problem two0:32:51
plus four, give that one to Harry, and have the results sent back to me-- I'm Joe. When the answer comes back from Harry, which is a six, I0:33:01
will then do the increment and give that seven back to GJS. So there are more pieces of paper outstanding in the0:33:10
recursive process than the iteration. There's another way to think about what an iteration is and0:33:19
the difference between an iteration and a recursion. You see, the question is, how much stuff is under the table? If I were to stop--0:33:28
supposing I were to kill this computer right now, OK? And at this point I lose the state of affairs, well, I0:33:37
could continue the computation from this point but everything I need to continue the computation is in the variables that were defined in the procedure that the0:33:48
programmer wrote for me. An iteration is a system that has all of its state in explicit variables. Whereas the recursion is not quite the same.0:34:01
If I were to lose this pile of junk over here, and all I was left with was the sum of one and four, that's not enough information to continue the process of computing out the0:34:10
seven from the original problem of adding together three and four. Besides the information that's in the variables of the formal0:34:20
parameters of the program, there is also information under the table belonging to the computer, which is what things have been deferred for later.0:34:30
And, of course, there's a physical analogy to this, which is in differential equations, for example, when we talk about something like drawing a circle.0:34:42
If you try to draw a circle, you make that out of a differential equation which gives the change in my state as a function of0:34:51
my current state. So if my current state corresponds to particular values of y and x, then I can compute from them a derivative0:35:00
which says how the state must change. And, in fact, you can see this was a circle because if I0:35:09
happen to be, say, at this place over here, at one, zero, for example, on this graph, then it means that the0:35:20
derivative of y is x, which we see over here. That's one, so I'm going up. And the derivative of x is minus y, which0:35:29
means I'm going backwards. I'm actually doing nothing at this point, then I start going backwards as y increases. So that's how you make a circle.0:35:40
And the interesting thing to see is a little program that will draw a circle by this method. Actually, this won't draw a circle because it's a forward Euler integrator and will eventually0:35:49
spiral out and all that. But it'll draw a circle for a while before it starts spiraling. However, what we see here is two state variables, x and y.0:35:58
And there's an iteration that says, in order to circle, given an x and y, what I want is to circle with the next values of x and y being the old value of x decremented by y0:36:08
times dt, where dt is the time step, and the old value of y being incremented by x times dt, giving me the new values0:36:17
of x and y. So now you have a feeling for at least two different kinds of processes that can be evolved by0:36:28
almost the same program. And with a little bit of perturbation analysis like this, how you change a program a little bit and see how the0:36:37
process changes, that's how we get some intuition. Pretty soon we're going to use that intuition to build big, hairy, complicated systems.0:36:46
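The circle-drawing iteration described above can be sketched in Python (the lecture's version is in Lisp; the state variables x and y, the time step dt, and the forward Euler update are as described, while collecting the points in a list is my own addition for illustration):

```python
def circle(x, y, dt, steps):
    # Iterate the system dx/dt = -y, dy/dt = x by forward Euler.
    # The tuple assignment evaluates the right-hand side first, so
    # both updates use the OLD values of x and y, as in the lecture.
    points = []
    for _ in range(steps):
        x, y = x - y * dt, y + x * dt
        points.append((x, y))
    return points

# Starting at (1, 0): dy/dt = 1 (going up), dx/dt = 0, then the point
# curls backwards -- approximately a circle, slowly spiraling outward
# because of the forward Euler integration error.
pts = circle(1.0, 0.0, 0.01, 700)
```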
Thank you. [MUSIC PLAYING BY J.S. BACH]0:37:06
PROFESSOR: Well, you've just seen a simple perturbational analysis of some programs. I took a program that was very similar to another program and looked at them both and saw how they evolved processes.0:37:18
I want to show you some variety by showing you some other processes and shapes they may have. Again, we're going to take very simple things, programs that you wouldn't want to ever write.0:37:29
They would be probably the worst way of computing some of the things we're going to compute. But I'm just going to show you these things for the purpose of feeling out how a program represents itself as the rule0:37:42
for the evolution of a process. So let's consider a fun thing, the Fibonacci numbers. You probably know about the Fibonacci numbers.0:37:53
Somebody, I can't remember who, was interested in the growth of piles of rabbits. And for some reason or other, the piles of rabbits tend to0:38:03
grow exponentially, as we know. And we have a nice model for this process, is that we start with two numbers, zero and one.0:38:13
And then every number after this is the sum of the two previous. So we have here a one. Then the sum of these two is two.0:38:22
The sum of those two is three. The sum of those two is five. The sum of those two is eight. The sum of those two is 13.0:38:31
This is 21. 34. 55. Et cetera.0:38:40
If we start numbering these numbers, say this is the zeroth one, the first one, the second one, the third one, the fourth one, et cetera. This is the 10th one, the 10th Fibonacci number.0:38:51
These numbers grow very fast. Just like rabbits. Why rabbits grow this way I'm not going to hazard a guess. Now, I'm going to try to write for you the very simplest0:39:02
program that computes Fibonacci numbers. What I want is a program that, given an n, will produce for0:39:13
me Fibonacci of n. OK? I'll write it right here.0:39:28
I want the Fibonacci of n, which means the-- this is the n, and this is Fibonacci of n. And here's the story.0:39:38
If n is less than two, then the result is n. Because that's what these are.0:39:47
That's how you start it up. Otherwise, the result is the sum of Fib of n minus one and0:39:58
the Fibonacci number, n minus two.0:40:10
So this is a very simple, direct specification of the description of Fibonacci numbers that I gave you when I introduced those numbers. It represents the recurrence relation in the simplest0:40:21
possible way. Now, how do we use such a thing? Let's draw this process. Let's figure out what this does. Let's consider something very simple by computing0:40:31
Fibonacci of four. To compute Fibonacci of four, what do I do? Well, it says I have--0:40:41
it's not less than two. Therefore it's the sum of two things. Well, in order to compute that I have to compute, then, Fibonacci of three and Fibonacci of two.0:40:57
In order to compute Fibonacci of three, I have to compute Fibonacci of two and Fibonacci of one.0:41:08
In order to compute Fibonacci of two, I have to compute Fibonacci of one and Fibonacci of zero. In order to compute Fibonacci of one, well,0:41:18
the answer is one. That's from the base case of this recursion. And in order to compute Fibonacci of zero, well, that0:41:28
answer is zero, from the same base case. And here is a one. And Fibonacci of two is really the sum of Fibonacci of one0:41:38
and Fib of zero; in order to compute that, I get a one, and here I've got a zero.0:41:47
I've built a tree. Now, we can observe some things about this tree. We can see why this is an extremely bad way to compute0:41:56
Fibonacci numbers. Because in order to compute Fibonacci of four, I had to compute Fibonacci of two's sub-tree twice.0:42:07
In fact, in order to add one more, supposing I want to do Fibonacci of five, what I really have to do then is compute Fibonacci of four plus Fibonacci of three.0:42:18
But Fibonacci of three's sub-tree has already been built. This is a prescription for a process that's0:42:27
exponential in time. To add one, I have to multiply by something because I take a proportion of the existing thing and add it to itself to0:42:38
add one more step. So this is a thing whose time complexity is order of--0:42:48
actually, it turns out to be Fibonacci-- of n. There's a thing that grows exactly as the Fibonacci numbers.0:43:01
It's a horrible thing. You wouldn't want to do it. The reason why the time has to grow that way is because we're presuming in the model-- the substitution model that I gave you, which I'm not doing formally here, I sort of now spit it out in a simple way--0:43:14
but presuming that everything is done sequentially. That every one of these nodes in this tree has to be examined.0:43:24
And so since the number of nodes in this tree grows exponentially, because I add a proportion of the existing nodes to the nodes I already have to add one, then I know0:43:35
I've got an exponential explosion here. Now, let's see if we can think of how much space this takes up.0:43:44
Well, it's not so bad. It depends on how much we have to remember in order to continue this thing running. Well, that's not so hard. It says, gee, in order to know where I am in this tree, I0:43:54
have to have a path back to the root. In other words, in order to-- let's consider the path I would have to execute this. I'd say, oh, yes, I'm going to go down here.0:44:03
I don't care which direction I go. I have to do this. I have to then do this. I have to traverse this tree in a sort of funny way.0:44:12
I'm going to walk this nice little path. I come back to here. Well, I've got to remember where I'm going to be next. I've got to keep that in mind. So I have to know what I've done.0:44:21
I have to know what's left. In order to compute Fibonacci of four, at some point I'm going to have to be down here. And I have to remember that I have to go back and then go0:44:32
back to here to do an addition. And then go back to here to do an addition to something I haven't touched yet. The amount of space that takes up is the path, the longest path.0:44:42
How long it is. And that grows as n. So the space-- because that's the length of the deepest0:44:53
line through the tree-- the space is order of n. It's a pretty bad process.0:45:09
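The Fibonacci procedure can be sketched in Python (the lecture writes it in Lisp; count_calls is my own helper, not from the lecture, counting the nodes of the call tree to exhibit the exponential time, while the depth of the tree -- the longest path -- is what gives the order-n space):

```python
def fib(n):
    # The simplest -- and worst -- way: a direct transcription of
    # the recurrence, recomputing whole sub-trees.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def count_calls(n):
    # Number of nodes in the tree of calls made by fib(n).
    # This count itself grows like the Fibonacci numbers, i.e.
    # exponentially in n, while the depth of the tree is only n.
    if n < 2:
        return 1
    return 1 + count_calls(n - 1) + count_calls(n - 2)

print(fib(10))          # 55, the 10th Fibonacci number
print(count_calls(10))  # far more calls than the answer is worth
```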
Now, one thing I want to see from this is a feeling of what's going on here. Why are there-- how is this program related to this process?0:45:20
Well, what are we seeing here? There really are only two sorts of things this program does. This program consists of two rules, if you will.0:45:29
One rule that says Fibonacci of n is this sum that you see over here, which is a node that's shaped like this.0:45:42
It says that I break up something into two parts. Under some condition over here that n is greater than two,0:45:52
then the node breaks up into two parts. Less than two. No. Greater than two. Yes.0:46:01
The other possibility is that I have a reduction that looks like this. And that's this case.0:46:10
If it's less than two, the answer is n itself. So what we're seeing here is that the process that got built locally at every place is an instance of this rule.0:46:22
Here's one instance of the rule. Here is another instance of the rule. And the reason why people think of programming as being hard, of course, is because you're writing down a general0:46:32
rule, which is going to be used for lots of instances, that a particular instance-- it's going to control each particular instance for you.0:46:43
You've got to write down something that's a general in terms of variables, and you have to think of all the things that could possibly fit in those variables, and all those have to lead to the process you want to work.0:46:53
Locally, you have to break up your process into things that can be represented in terms of these very specific local rules.0:47:03
Well, let's see. Fibonaccis are, of course, not much fun. Yes, they are. You get something called the golden ratio, and we may even0:47:12
see a lot of that some time. Well, let's talk about another thing. There's a famous game called the Towers of Hanoi, because I want to teach you how to think about these recursively.0:47:24
The problem is this one: I have a bunch of disks, I have a bunch of spikes, and it's rumored that somewhere in the0:47:34
Orient there is a 64-high tower, and the job of various monks or something is to move these disks among the spikes in some complicated pattern so that eventually0:47:43
I move all of the disks from one spike to the other. And if it's 64 high, and it's going to take two to the 64th0:47:54
moves, then it's a long time. They claim that the universe ends when this is done.0:48:03
Well, let's see. The way in which you would construct a recursive process is by wishful thinking. You have to believe.0:48:14
So, the idea. Supposing I want to move this pile from here to here, from spike one to spike two, well, that's not so hard.0:48:25
See, supposing somehow, by some magic-- because I've got a simpler problem-- I move a three-high pile to here-- I can only move one disk at a time, so never mind how I did it. But supposing I could do that, well, then I could just pick0:48:37
up this disk and move it here. And now I have a simple problem. I have to move a three-high tower to here, which is no problem.0:48:46
So by two moves of a three high tower plus one move of a single object, I can move the tower from here to here.0:48:55
Now, whether or not-- this is not obvious in any deep way that this works. And why?0:49:04
Now, why is it the case that I can presume, maybe, that I can move the three-high tower? Well, the answer is because I'm always counting down, and0:49:14
eventually I get down to zero-high tower, and a zero-high tower requires no moves. So let's write the algorithm for that.0:49:24
Very easy. I'm going to label these towers with numbers, but it doesn't matter what they're labelled with. And the problem is to move an n-high tower from a spike0:49:35
called From to a spike called To with a particular spike called Spare. That's what we're going to do.0:49:50
Using the algorithm I informally described to you, move an n-high tower from From to To with a Spare.0:50:06
Well, I've got two cases, and this is a case analysis, just like it is in all the other things we've done.0:50:20
If n is zero, then-- I'm going to put out some answers-- Done, we'll say. I don't know what that means.0:50:29
Because we'll never use that answer for anything. We're going to do these moves. Else. I'm going to do a move.0:50:40
Move a tower of height one less than n-- the decrement of n. Now, I'm going to move it to the Spare tower.0:50:51
The whole idea now is to move this from here to here, to the Spare tower-- so from From to Spare--0:51:03
using To as a spare tower. Later, somewhere later, I'm going to move that same n-high0:51:14
tower, after I've done this. Going to move that same n minus one-high tower from the Spare tower to the To tower using the0:51:24
From tower as my spare. So the Spare tower to the To tower using0:51:40
the From as the spare. All I have to do now is when I've gotten it in this0:51:51
condition, between these two moves of a whole tower-- I've got it into that condition-- now I just have to move one disk.0:52:03
So I'm going to say that something prints a move-- I don't care how it works-- from From to To.0:52:17
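The Towers of Hanoi procedure can be sketched in Python (the lecture's version is in Lisp and prints each move; collecting the moves in a list instead is my own variation for illustration):

```python
def move(n, source, dest, spare, moves=None):
    # Move an n-high tower from `source` to `dest`, using `spare`.
    if moves is None:
        moves = []
    if n == 0:
        return moves  # "Done": a zero-high tower requires no moves.
    move(n - 1, source, spare, dest, moves)  # set the n-1 tower aside
    moves.append((source, dest))             # move the bottom disk
    move(n - 1, spare, dest, source, moves)  # put the n-1 tower back on top
    return moves

# The lecture's example: a four-high tower from spike 1 to spike 2,
# using spike 3 as the spare -- 2^4 - 1 = 15 moves in all.
for src, dst in move(4, 1, 2, 3):
    print(f"{src} -> {dst}")
```

Like the Fibonacci procedure, this evolves an exponential tree process: moving an n-high tower takes two to the n minus one single-disk moves.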
Now, you see the reason why I'm bringing this up at this moment is this is an almost identical program to this one in some sense.0:52:26
It's not computing the same mathematical quantity, it's not exactly the same tree, but it's going to produce a tree. The general way of making these moves is going to lead0:52:38
to an exponential tree. Well, let's do this four-high. I have my little crib sheet here otherwise I get confused.0:52:54
Well, what I'm going to put in is the question of moving a tower of height four from spike one to spike two using spike three0:53:10
as a spare. That's all I'm really going to do. You know, let's just do it. I'm not going to worry about writing out the trace of this. You can do that yourself because it's very simple.0:53:21
I'm going to move a disk from spike one to spike three. And how do I get to move that disk from one to three? How do I know that? Well, I suppose I have to look at the trace a little bit.0:53:32
What am I doing here? Well, n is not zero. So I'm going to look down here. This is going to require doing two moves.0:53:41
I'm only going to look at the first one. It's going to require moving-- why do I have move tower? It makes it harder for me to move.0:53:52
I'm going to move a three-high tower from the from place, which is four, to the spare, which is two,0:54:04
using three as my-- no, using from--0:54:15
STUDENT: [INAUDIBLE PHRASE]. PROFESSOR: Yes. I'm sorry. From two-- from one to three using two as my spare.0:54:26
That's right. And then there's another move over here afterwards. So now I say, oh, yes, that requires me moving a two-high0:54:37
tower from one to two using three as a spare. And so on, the same way-- that's going to require me moving a one-high tower from one to three0:54:52
using two as a spare. Well, and then there's lots of other things to be done.0:55:03
So I move my one-high tower from one to three using two as a spare, which I didn't do anything with. Well, this thing just proceeds very simply.0:55:15
I move this from one to two. And I move this disk from three to two. And I don't really want to do it, but I move from one to three.0:55:24
Then I move two to one. Then I move two to three. Then one to three.0:55:36
One to two. Three to two. Three to one. This all got worked out beforehand, of course.0:55:46
Two to one. Three to two. One to three. STUDENT: [INAUDIBLE PHRASE]. PROFESSOR: Oh, one to three.0:55:55
Excuse me. Thank you. One to two. And then three to two. Whew.0:56:04
Now what I'd like you to think about, you just saw a recursive algorithm for doing this, and it takes exponential time, of course. Now, there's no algorithm that doesn't take exponential time-- it has to.0:56:14
Since I'm doing one operation at a time-- I can only move one disk at a time-- there's no algorithm that's not going to take exponential time. But can you write an iterative algorithm rather than a0:56:24
recursive algorithm for doing this? One of the sort of little things I like to think about.0:56:33
Can you write one that, in fact, doesn't break this problem into two sub-problems the way I described, but rather proceeds a step at a time using a more local rule?0:56:48
That might be fun. Thank you so much for the third segment. Are there questions?0:56:57
STUDENT: [INAUDIBLE] a way to reduce a tree recursion problem, how do you save the intermediate work you have done0:57:06
in computing the Fibonacci number? PROFESSOR: Oh, well, in fact, one of the ways to do it is what you just said. You said, I save the intermediate work.0:57:16
OK? Well, let me tell you-- this, again, we'll see later-- but suppose it's the case that anytime I compute anything, any one of these Fibonacci numbers, I remember it in a table0:57:28
so that it takes only a quick lookup to get the answer. Then if I ever see it again, instead of expanding the exponential tree, I look it up.0:57:37
I've just transformed my problem into a problem that's much simpler. Now, of course, there are other ways to do this, as well. That one's called memoization, and you'll see it sometime0:57:47
later in this term. But there's also a very simple linear-time, and, in fact, iterative method for computing Fibonaccis, and0:57:57
that's another thing you should sit down and work out. That's important. It's important to see how to do this. I want you to practice.0:00:00
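The linear-time, iterative Fibonacci the professor suggests working out might look like this sketch: carry the last two values forward on every step instead of re-expanding the tree.

```scheme
;; An iterative, linear-time Fibonacci: A and B hold successive
;; Fibonacci numbers, and COUNT counts down to zero.
(define (fib n)
  (define (iter a b count)
    (if (= count 0)
        a
        (iter b (+ a b) (- count 1))))
  (iter 0 1 n))
```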
Lecture 2A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:25
PROFESSOR: Well, yesterday was easy. You learned all of the rules of programming and lived. Almost all of them.0:00:34
And so at this point, you're now certified programmers-- it says. However, I suppose what we did is we, ahh, sort of lulled you0:00:48
a little bit into an easy state. Here, you still believe it's possible that this might be programming in BASIC or Pascal with just a funny syntax.0:00:59
Today, that illusion ends-- you can no longer support that belief. What we're going to do today is going to completely smash that.0:01:08
So let's start out by writing a few programs on the blackboard that have a lot in common with each other. What we're going to do is try to capture them in abstractions that0:01:19
are not easy to make in most languages. Let's start with some very simple ones that you can make in most languages.0:01:28
Supposing I want to write the mathematical expression which adds up a bunch of integers. So if I wanted to write down and say the sum from i0:01:38
equal a to b on i. Now, you know that that's an easy thing to compute in a closed form for it, and I'm not interested in that. But I'm going to write a program that0:01:47
adds up those integers. Well, that's rather easy to do to say I want to define the0:01:57
sum of the integers from a to b to be--0:02:08
well, it's the following two possibilities. If a is greater than b, well, then there's nothing to be0:02:17
done and the answer is zero. This is how you're going to have to think recursively. You're going to say if I have an easy case that I know the answer to, just write it down.0:02:26
Otherwise, I'm going to try to reduce this problem to a simpler problem. And maybe in this case, I'm going to make a subproblem of the simpler problem and then do something to the result.0:02:35
So the easiest way to do this is say that I'm going to add the index, which in this case is a, to the result of adding0:02:46
up the integers from a plus 1 to b.0:03:02
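Written out, the blackboard program reads as a sketch like this:

```scheme
;; The first blackboard program: add up the integers from A to B.
(define (sum-int a b)
  (if (> a b)
      0                             ; easy case: empty range, answer is zero
      (+ a (sum-int (+ a 1) b))))  ; A plus the sum of the integers from A+1 to B
```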
Now, at this point, you should have no trouble looking at such a definition. Indeed, coming up with such a thing might be a little hard in synthesis, but being able to read it at this point0:03:12
should be easy. And what it says to you is, well, here is the subproblem I'm going to solve. I'm going to try to add up the integers, one fewer integer0:03:24
than I added up for the whole problem. I'm adding up one fewer, and that subproblem, once I've solved it, I'm going to add a to that, and that will0:03:35
be the answer to this problem. And the simplest case, I don't have to do any work. Now, I'm also going to write down another simple one just0:03:44
like this, which is the mathematical expression, the sum of the square from i equal a to b.0:03:55
And again, it's a very simple program.0:04:11
And indeed, it starts the same way. If a is greater than b, then the answer is zero.0:04:21
And, of course, we're beginning to see that there's something wrong with me writing this down again. It's the same program. It's the sum of the square of a and the sum of the squares from0:04:42
the increment of a to b. Now, if you look at these things, these programs are0:04:54
almost identical. There's not much to distinguish them. They have the same first clause of the conditional and0:05:03
the same predicate and the same consequence, and the alternatives are very similar, too. They only differ by the fact that where here I have a,0:05:15
here, I have the square of a. The only other difference, though this one's sort of inessential, is that the name of this procedure is sum int, whereas0:05:25
the name of that procedure is sum square. So the things that vary between these two are very small. Now, wherever you see yourself writing the same thing down0:05:36
more than once, there's something wrong, and you shouldn't be doing it. And the reason is not because it's a waste of time to write something down more than once.0:05:45
It's because there's some idea here, a very simple idea, which has to do with the sigma notation--0:05:54
this much-- not depending upon what it is I'm adding up. And I would like to be able to--0:06:03
always, whenever trying to make complicated systems and understand them, it's crucial to divide the things up into as many pieces as I can, each of which I understand separately.0:06:13
I would like to understand the way of adding things up independently of what it is I'm adding up so I can do that having debugged it once and understood it once and having0:06:24
been able to share that among many different uses of it. Here, we have another example. This is Leibniz's formula for finding pi over 8.0:06:40
It's a funny, ugly mess. What is it? It's something like 1 over 1 times 3 plus 1 over 5 times 70:06:50
plus 1 over 9 times 11 plus-- and for some reason, things like this tend to have0:06:59
interesting values like pi over 8. But what do we see here? It's the same program or almost the same program. It's a sum.0:07:09
So we're seeing the sigma notation, although over here, we're dealing with incrementing by 4, so it's a slightly different problem, which means that over here, I0:07:20
have to change a by 4, as you see right over here. It's not by 1. The other thing, of course, is that the thing that's0:07:31
represented by square in the previous sum of squares, or a when adding up the integers. Well, here, I have a different thing I'm adding up, a different term, which is 1 over a times a plus 2.0:07:44
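Transcribed, the other two blackboard programs make the shared pattern plain; only the term and the way of stepping the index differ. A sketch:

```scheme
(define (square x) (* x x))

;; Sum of the squares of the integers from A to B.
(define (sum-sq a b)
  (if (> a b)
      0
      (+ (square a) (sum-sq (+ a 1) b))))

;; Leibniz's series for pi/8: 1/(1*3) + 1/(5*7) + 1/(9*11) + ...
;; The term is 1/(a*(a+2)), and the index steps by 4.
(define (pi-sum a b)
  (if (> a b)
      0
      (+ (/ 1 (* a (+ a 2))) (pi-sum (+ a 4) b))))
```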
But the rest of this program is identical. Well, any time we have a bunch of things like this that are identical, we're going to have to come up with some sort of0:07:53
abstraction to cover them. If you think about this, what you've learned so far is the rules of some language, some primitive, some means of0:08:03
combination, almost all of them, the means of abstraction, almost all of them. But what you haven't learned is common patterns of usage.0:08:13
Now, most of the time, you learn idioms when learning a language, which are common patterns that mean things that are useful to know in a flash. And if you build up a great number of them, if you're a0:08:22
FORTRAN programmer, of course, everybody knows how to-- what do you do, for example, to get an integer which is the biggest integer in something.0:08:31
It's a classic thing. Every FORTRAN programmer knows how to do that. And if you don't know that, you're in real hot water because it takes a long time to think it out. However, one of the things you can do in this language that0:08:41
we're showing you is not only do you know something like that, but you give the knowledge of that a name. And so that's what we're going to be going after right now.0:08:53
OK, well, let's see what these things have in common. Right over here we have what appears to be a general0:09:02
pattern, a general pattern which covers all of the cases we've seen so far. There is a sum procedure, which is being defined.0:09:15
It has two arguments, which are a lower bound and an upper bound. The lower bound is tested to be greater than the upper bound, and if it is greater, then the result is zero.0:09:27
Otherwise, we're going to do something to the lower bound, which is the index of the summation, and add that result to the result of following the procedure0:09:40
recursively on our lower bound incremented by some next operation with the same upper bound as I had before.0:09:53
So this is a general pattern, and what I'd like to do is be able to name this general pattern a bit.0:10:03
Well, that's sort of easy, because one of the things I'm going to do right now is-- there's nothing very special about numbers. Numbers are just one kind of data.0:10:14
It seems to me perfectly reasonable to give all sorts of names to all kinds of data, for example, procedures.0:10:23
And now, many languages allow you to have procedural arguments, and right now, we're going to talk about procedural arguments. They're very easy to deal with. And shortly, we'll do some remarkable things that are not0:10:33
like procedural arguments. So here, we'll define our sigma notation.0:10:43
This is called sum and it takes a term, an A, a next0:10:55
term, and B as arguments. So it takes four arguments, and there was nothing particularly special about me writing this in lowercase.0:11:06
I hope that it doesn't confuse you, so I'll write it in uppercase right now. The machine doesn't care. But these two arguments are different.0:11:17
These are not numbers. These are going to be procedures for computing something given a number. Term will be a procedure which, when given an index,0:11:26
will produce the value of the term for that index. Next will be given an index, which will produce the next index. This will be for counting.0:11:36
And it's very simple. It's exactly what you see. If A is greater than B, then the result is 0.0:11:52
Otherwise, it's the sum of term applied to A and the sum0:12:04
of term, the next index, next, and B.0:12:14
Let me write it this way.0:12:29
Now, I'd like you to see something, first of all. I was writing here, and I ran out of space. What I did is I start indenting according to the0:12:38
Pretty-printing rule, which says that I align all of the arguments of the procedure so I can see which ones go together.0:12:47
And this is just something I do automatically, and I want you to learn how to do that, too, so your programs can be read and understood. However, what do we have here?0:12:57
We have four arguments: the procedure, the lower index-- lower bound index-- the way to get the next index, and the upper bound.0:13:09
What's passed along on the recursive call is indeed the same procedure because I'm going to need it again, the0:13:18
next index, which is using the next procedure to compute it, the procedure for computing next, which I also have to have separately, and that's different. The procedure for computing next is different from the0:13:27
next index, which is the result of using next on the last index. And I also have to pass along the upper bound.0:13:37
So this captures both of these and the other nice program that we are playing with.0:13:47
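The general pattern, transcribed as a sketch: the varying parts are passed in as procedures. TERM computes each addend from an index, and NEXT computes the following index.

```scheme
;; The sigma abstraction: TERM and NEXT are procedural arguments.
(define (sum term a next b)
  (if (> a b)
      0
      (+ (term a)                        ; do something to the lower bound
         (sum term (next a) next b))))   ; recur on the incremented index
```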
So using this, we can write down the original programs as instances of sum very simply.0:14:08
Sum int of A and B. Well, I'm going to need an identity procedure0:14:17
here because, ahh, the sum of the integers requires me to, in0:14:29
this case compute a term for every integer, but the term procedure doesn't want to do anything to that integer. So the identity procedure on A is A or X or whatever, and I0:14:41
want to say the sum, using identity as the term procedure0:14:52
and A as the initial index and the incrementer being the way to get the next index and B being the high0:15:05
bound, the upper bound. This procedure does exactly the same as the sum of the integers over here, computes the same answer.0:15:17
Now, one thing you should see, of course, is that there's nothing very special over here about what I used as the formal parameter. I could have, for example, written this0:15:27
X. It doesn't matter. I just wanted you to see that this name does not conflict with this one at all. It's an internal name.0:15:37
For the second procedure here, the sum of the squares, it's even a little bit easier.0:15:53
And what do we have to do? Nothing more than add up the squares, this is the procedure0:16:02
that each index will be given, will be given each-- yes. Each index will have this done to it to get the term. That's the thing that maps against term over here.0:16:13
Then I have A as the lower bound, the incrementer as the next term method, and B as the upper bound.0:16:26
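In full, the two definitions might read as follows; the SUM procedure from the blackboard is repeated here so the sketch is self-contained, and the name 1+ for the incrementer follows the lecture's usage.

```scheme
;; The general pattern, as on the blackboard.
(define (sum term a next b)
  (if (> a b)
      0
      (+ (term a) (sum term (next a) next b))))

(define (identity x) x)
(define (1+ n) (+ n 1))       ; the incrementer
(define (square x) (* x x))

;; Sum of the integers: identity is the term procedure.
(define (sum-int a b)
  (sum identity a 1+ b))

;; Sum of the squares: square is the term procedure.
(define (sum-sq a b)
  (sum square a 1+ b))
```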
And finally, just for the thing that we did about pi sums, pi sums are sort of-- well, it's even easier to think about them this way0:16:35
because I don't have to think. What I'm doing is separating the thing I'm adding up from the method of doing the addition. And so we have here, for example, pi sum of A and B0:16:57
as the sum of things. I'm going to write the term procedure here explicitly without giving it a name. This is done anonymously.0:17:07
I don't necessarily have to give a name to something if I just want to use it once. And, of course, I can write sort of an expression that0:17:18
produces a procedure. I'm going to write the Greek lambda letter here instead of L-A-M-B-D-A in general to avoid taking up a lot of space on blackboards.0:17:27
But unfortunately, we don't have lambda keys on our keyboards. Maybe we can convince our friends in the computer industry that this is important. Lambda of i is the quotient of 1 and the product of i and the0:17:43
sum of i and 2, starting at a with the way of incrementing being0:17:58
that procedure of an index i, which adds i to 4, and b being0:18:08
the upper bound. So you can see that this notation, the invention of the0:18:17
procedure that takes a procedural argument, allows us to compress a lot of these procedures into one thing.0:18:26
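The pi sum with its anonymous procedures might read like this sketch (SUM is repeated so it stands on its own):

```scheme
(define (sum term a next b)
  (if (> a b)
      0
      (+ (term a) (sum term (next a) next b))))

;; Both the term and the stepper written anonymously with lambda.
(define (pi-sum a b)
  (sum (lambda (i) (/ 1 (* i (+ i 2))))  ; term: 1/(i*(i+2))
       a
       (lambda (i) (+ i 4))              ; next: step the index by 4
       b))
```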
This procedure, sums, covers a whole bunch of ideas. Now, just why is this important? I tried to say before that it helps us divide a problem into0:18:37
two pieces, and indeed, it does, for example, if someone came up with a different way of implementing this, which,0:18:46
of course, one might. Here, for example, is an iterative implementation of sum.0:18:55
An iterative implementation for some reason might be better than the recursive implementation. But the important thing is that it's different.0:19:06
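Such an iterative implementation might be sketched like this; it has the same interface, so callers like the pi sum need not change at all:

```scheme
;; An iterative SUM: accumulate the answer in ANS while stepping
;; the index J from A up past B. Same interface, different process.
(define (sum term a next b)
  (define (iter j ans)
    (if (> j b)
        ans
        (iter (next j) (+ (term j) ans))))
  (iter a 0))
```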
Now, supposing I had written my program this way that you see on the blackboard on the left. That's correct, the left.0:19:17
Well, then if I want to change the method of addition, then I'd have to change each of these. Whereas if I write them like this that you see here, then0:19:30
the method by which I did the addition is encapsulated in the procedure sum. That decomposition allows me to independently change one part of the program and improve it perhaps without changing0:19:43
the other part that was written for some of the other cases. Thank you. Are there any questions?0:19:52
Yes, sir. AUDIENCE: Would you go over next A and next again on-- PROFESSOR: Yes. It's the same problem. I'm sure you're going to-- you're going to have to work on this. This is hard the first time you've ever seen0:20:01
something like this. What I have here is a-- procedures can be named by variables.0:20:10
Procedures are not special. Actually, sum square is a variable, which has gotten a value, which is a procedure. This is define sum square to be0:20:20
lambda of A and B something. So the procedure can be named. Therefore, they can be passed from one to another, one procedure to another, as arguments.0:20:31
Well, what we're doing here is we're passing the procedure term as an argument to sum, just passing it around in the next recursive call.0:20:41
Here, we're passing the procedure next as an argument also. However, here we're using the procedure next.0:20:50
That's what the parentheses mean. We're applying next to A to get the next value of A. If you look at what next is mapped against, remember that0:20:59
the way you think about this is that you substitute the arguments for the formal parameters in the body. If you're ever confused, think of the thing that way.0:21:10
Well, over here, with sum of the integers, I substitute identity for term and 1 plus, the0:21:21
incrementer for next in the body. Well, the identity procedure on A is what I get here.0:21:30
Identity is being passed along, and here, I have increment 1 plus being applied to A and 1 plus is being0:21:41
passed along. Does that clarify the situation? AUDIENCE: We could also define explicitly those two functions, then pass them.0:21:51
PROFESSOR: Sure. What we can do is we could have given names to them, just like I did here. In fact, I gave you various ways so you could see it, a variety. Here, I define the thing which I passed the name of.0:22:05
I referenced it by its name. But the thing is, in fact, that procedure of one argument X, which is X. And the identity procedure is just0:22:14
lambda of X X. And that's what you're seeing here. Here, I happened to just write its canonical name there for0:22:26
you to see. Is it OK if we take our five-minute break?0:23:15
As I said, computers are to make people happy, not people to make computers happy. And for the most part, the reason why we introduce all this abstraction stuff is to make it so that programs can0:23:26
be more easily written and more easily read. Let's try to understand what's the most complicated program we've seen so far using a little bit of0:23:36
this abstraction stuff. If you look at the slide, this is Heron of Alexandria's method of computing square roots that we saw yesterday.0:23:51
And let's see. Well, in any case, this program is a little0:24:00
complicated. And at the current state of your thinking, you just can't look at that and say, oh, this obviously means something very clear.0:24:10
It's not obvious from looking at the program what it's computing. There's some loop here inside try, and a loop does something0:24:21
about trying the improvement of y. There's something called improve, which does some0:24:30
averaging and quotienting and things like that. But what's the real idea? Can we make it clear what the idea is? Well, I think we can.0:24:41
I think we can use abstraction that we have learned about so far to clarify what's going on. Now, what we have mathematically is a procedure0:24:54
for improving a guess for square roots. And if y is a guess for a square root, then what we want to get we'll call a function f.0:25:04
This is the means of improvement. I want to get y plus x/y over 2, so the average of y and x0:25:17
divided by y as the improved value for the square root of x such that-- one thing you can notice about this function f0:25:27
is that f of the square root of x is in fact the0:25:36
square root of x. In other words, if I take the square root of x and substitute it for y here, I see the square root of x plus x divided by the square root of x, all over 2.0:25:47
That's 2 times the square root of x divided by 2, which is the square root of x. So, in fact, what we're really looking for is a fixed point, a fixed point of the function f.0:26:17
A fixed point is a place which has the property that if you put it into the function, you get the same value out.0:26:27
Now, I suppose if I were giving some nice, boring lecture, and you happened to have in front of you an HP-35 desk calculator like I used to have when I0:26:36
went to boring lectures. And if you think it was really boring, you put it into radians mode, and you hit cosine, and you hit cosine, and you hit cosine.0:26:45
And eventually, you end up with 0.734 or something like that. 0.743, I don't remember what exactly, and it gets closer and closer to that.0:26:54
Some functions have the property that you can find their fixed point by iterating the function, and that's0:27:03
essentially what's happening in the square root program by Heron's method. So let's see if we can write that down, that idea.0:27:14
Now, I'm not going to say how I compute fixed points yet. There might be more than one way. But the first thing to do is I'm going to say what I just said.0:27:24
I'm going to say it specifically, the square root. The square root of x is the fixed point of that procedure0:27:48
which takes an argument y and averages of x0:27:59
divided by y with y. And we're going to start up with the initial guess for the0:28:08
fixed point of 1. It doesn't matter much where it starts-- that's a theorem having to do with square roots.0:28:18
So what you're seeing here is I'm just trying to write out by wishful thinking. I don't know how I'm going to make fixed point happen. We'll worry about that later. But if somehow I had a way of finding the fixed point of the0:28:29
function computed by this procedure, then I would have-- that would be the square root that I'm looking for.0:28:39
OK, well, now let's see how we're going to write-- how we're going to come up with fixed points. Well, it's very simple, actually. I'm going to write an abbreviated version here just so we understand it.0:29:00
I'm going to find the fixed point of a function f-- actually, the fixed point of the function computed by the procedure whose name will be f in this procedure.0:29:09
How's that? A long sentence-- starting with a particular starting value.0:29:19
Well, I'm going to have a little loop inside here, which is going to push the button on the calculator repeatedly, hoping that it will eventually converge.0:29:28
And we will say here internal loops are written by defining internal procedures.0:29:39
Well, one thing I'm going to have to do is I'm going to have to say whether I'm done. And the way I'm going to decide when I'm done is when the old value and the new value are close enough so I can't distinguish them anymore.0:29:50
That's the standard thing you do on the calculator unless you look at more precision, and eventually, you run out of precision. So the old value and new value, and I'm going to say0:30:06
here: if I can't distinguish them-- if they're close enough, and we'll have to worry about what that is soon--0:30:20
if the old value and the new value are close enough to each other, let's pick the new value as the answer. Otherwise, I'm going to iterate around again with the0:30:33
next value of old being the current value of new and the next value of new being the result of calling f on new.0:30:54
And so this is my iteration loop that pushes the button on the calculator. I basically think of it as having two registers on the calculator: old and new. And in each step, new becomes old, and new gets F of new.0:31:09
So this is the thing where I'm getting the next value. And now, I'm going to start this thing up0:31:20
by giving two values. I wrote down on the blackboard to be slow0:31:30
so you can see this. This is the first time you've seen something quite this complicated, I think. However, we might want to see the whole thing over here in0:31:44
this transparency or slide or whatever. What we have is all of the details that are required to0:31:57
make this thing work. I have a way of getting a tolerance for a close enough procedure, which we see here. The close enough procedure, it tests whether u and v are0:32:06
close enough by seeing if the absolute value of the difference in u and v is less than the given tolerance, OK? And here is the iteration loop that I just wrote on the blackboard and the initialization for it, which0:32:17
is right there. It's very simple.0:32:34
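Putting the pieces together, the whole thing on the slide reads roughly like this sketch; the tolerance value and the spelling close-enuf? are assumptions.

```scheme
(define tolerance 0.00001)  ; assumed tolerance

;; Can we still distinguish U and V at this precision?
(define (close-enuf? u v)
  (< (abs (- u v)) tolerance))

(define (average a b) (/ (+ a b) 2))

;; Push the button on the calculator repeatedly: two registers,
;; OLD and NEW; each step, NEW becomes OLD and NEW gets (F NEW).
(define (fixed-point f start)
  (define (iter old new)
    (if (close-enuf? old new)
        new
        (iter new (f new))))
  (iter start (f start)))

;; Square root as the fixed point of y -> average(x/y, y).
(define (sqrt x)
  (fixed-point (lambda (y) (average (/ x y) y))
               1.0))
```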
But let's see. I haven't told you enough. It's actually easier than this. There is more structure to this problem than I've already told you.0:32:43
Like why should this work? Why should it converge? There's a hairy theorem in mathematics tied up in what I've written here.0:32:52
Why is it that I should assume that by iterating averaging the quotient of x and y and y that I should get the right answer? It isn't so obvious.0:33:03
Surely there are other things, other procedures, which compute functions whose fixed points would also be the square root.0:33:12
For example, the obvious one will be a new function g, which maps y to x/y.0:33:27
That's even simpler. The fixed point of g is surely the square root also, and it's a simpler procedure.0:33:37
Why am I not using it? Well, I suppose you know. Supposing x is 2 and I start out with 1, and if I divide 1 into 2, I get 2.0:33:47
And then if I divide 2 into 2, I get 1. If I divide 1 into 2, I get 2, and 2 into 2, I get 1, and I never get any closer to the square root. It just oscillates.0:33:59
So what we have is a signal processing system, an electrical circuit which is oscillating, and I want to damp out these oscillations.0:34:10
Well, I can do that. See, what I'm really doing here when I'm taking my average, the average is averaging the last two values of something which oscillates, getting something in between.0:34:21
That's the classic way of damping out oscillations in a signal processing system. So why don't we write down the strategy that I just said in a0:34:31
more clear way? Well, that's easy enough. I'm going to define the square root of x to be a fixed point0:34:53
of the procedure resulting from average damping. So I have a procedure resulting from average damp of0:35:10
the procedure, that procedure of y, which divides x by y0:35:24
starting out at 1. Ah, but average damp is a special procedure that's going0:35:33
to take a procedure as its argument and return a procedure as its value. It's a generalization that says given a procedure, it's0:35:42
the thing which produces a procedure which averages the values before and after running the procedure.0:35:51
You can use it for anything if you want to damp out oscillations. So let's write that down. It's very easy.0:36:00
And stylistically here, I'm going to use lambda notation because it's much easier to think, when you're dealing with procedures that manipulate procedures, to understand that the procedures are the objects I'm dealing with, so I'm going0:36:11
to use lambda notation here. Not always. I don't always use it, but very specifically here to expand on that idea, to elucidate it.0:36:28
Well, average damp is a procedure, which takes a procedure as its argument, which we will call f.0:36:37
And what does it produce? It produces as its value-- the body of this procedure is a thing which produces a procedure-- the constructor of the procedure is right here-- of0:36:47
one argument x, which averages f of x with x.0:37:10
This is a very special thing. I think for the first time you're seeing a procedure which produces a procedure as its value.0:37:21
This procedure takes the procedure f and does something to it to produce a new procedure of one argument x, which averages f--0:37:31
this f-- applied to x and x itself. Using the context here, I apply average damping to the0:37:40
procedure, which just divides x by y. It's a division. And I'm finding the fixed point of that, and that's a clearer0:37:51
way of writing down what I wrote down over here, wherever it was, because it tells why I am writing it down.0:38:07
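Average damp itself is tiny. A sketch in the lambda notation used on the blackboard: it takes a procedure F and produces the procedure of one argument X that averages F of X with X.

```scheme
(define (average a b) (/ (+ a b) 2))

;; A procedure that takes a procedure as its argument
;; and produces a procedure as its value.
(define average-damp
  (lambda (f)
    (lambda (x) (average (f x) x))))
```

So, for instance, ((average-damp (lambda (y) (/ 2 y))) 1) averages 2 with 1 rather than oscillating between them.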
I suppose this to some extent really clarifies what Heron of Alexandria was up to. I suppose I'll stop now. Are there any questions?0:38:18
AUDIENCE: So when you define average damp, don't you need to have a variable on f? PROFESSOR: Ah, the question was, and here we're having--0:38:28
again, you've got to learn about the syntax. The question was when defining average damp, don't you have to have a variable defined with f?0:38:38
What you are asking about is the formal parameter of f? AUDIENCE: Yeah. PROFESSOR: OK. The formal parameter of f is here. The formal parameter of f--0:38:47
AUDIENCE: The formal parameter of average damp. PROFESSOR: F is being used to apply it to an argument, right? It's indeed true that f must have a formal parameter.0:38:57
Let's find out what f's formal parameter is. AUDIENCE: The formal parameter of average damp. PROFESSOR: Oh, f is the formal parameter of average damp. I'm sorry. You're just confusing a syntactic thing.0:39:07
I could have written this the other way. Actually, I didn't understand your question. Of course, I could have written it this other way.0:39:19
Those are identical notations. This is a different way of writing this.0:39:31
You're going to have to get used to lambda notation because I'm going to use it. What it says here, I'm defining the name average damp0:39:40
to name the procedure of one argument f. That's the formal parameter of the procedure average damp.0:39:49
What define does is it says give this name a value. Here is the value for it.0:40:01
That there happens to be a funny syntax to make that easier in some cases is purely convenience.0:40:10
But the reason why I wrote it this way here is to emphasize that I'm dealing with a procedure that takes a procedure as its argument and produces a procedure as its value.0:40:23
AUDIENCE: I don't understand why you use lambda twice. Can you just use one lambda and take two arguments f and x? PROFESSOR: No. AUDIENCE: You can't? PROFESSOR: No, that would be a different thing.0:40:32
If I were to write the procedure lambda of f and x, the average of f of x and x, that would not be something which would be allowed to take a procedure as an argument and0:40:42
produce a procedure as its value. That would be a thing that takes a procedure and a number as its arguments and produces a new number. But what I'm producing here is a procedure to fit in the0:40:53
procedure slot over here, which is going to be used over here. So the number has to come from here. This is the thing that's going to eventually end up in the x.0:41:04
And if you're confused, you should do some substitution and see for yourself. Yes? AUDIENCE: Will you please show the definition for average0:41:15
damp without using lambda notation in both cases. PROFESSOR: I can't make a very simple one like that. Let me do it for you, though. I can get rid of this lambda easily.0:41:26
I don't want to be-- actually, I'm lying to you. I don't want to do what you want because I think it's more0:41:37
confusing than you think. I'm not going to write what you want.0:41:55
So we'll have to make up a name: define FOO of x to be the average of F of x and x, and return as a value FOO.0:42:17
This is equivalent, but I've had to make an arbitrary name up. This is equivalent to this without any lambdas.0:42:26
Lambda is very convenient for naming anonymous procedures. It's the anonymous name of something. Now, if you really want to know a cute way of doing this,0:42:39
we'll talk about it later. We're going to have to define the anonymous procedure. Any other questions?0:42:49
And so we go for our break again.0:43:31
So now we've seen how to use what are called higher-order procedures. That's procedures that take procedural arguments and produce procedural values to help us clarify and abstract0:43:43
some otherwise complicated processes. I suppose what I'd like to do now is have a bit of fun with that and sort of a little practice as well.0:43:54
So let's play with this square root thing even more. Let's elaborate it and understand what's going on and make use of this kind of programming style.0:44:04
One thing that you might know is that there is a general method called Newton's method the purpose of which is to find the roots--0:44:15
that's the zeroes-- of functions. So, for example, to find a y such that f of y equals 0, we0:44:38
start with some guess. This is Newton's method.0:44:51
And the guess we start with we'll call y0, and then we will iterate the following expression.0:45:01
y n plus 1-- this is a difference equation-- is yn minus f of yn over the derivative with respect to y0:45:17
of f evaluated at y equal yn; that is, y(n+1) = y(n) - f(y(n)) / f'(y(n)). Very strange notation.0:45:26
I must say ugh. The derivative of f with respect to y is a function.0:45:35
I'm having a little bit of unhappiness with that, but that's all right. It turns out in the programming language world, the notation is much clearer. Now, what is this?0:45:45
People call it Newton's method. It's a method for finding the roots of the function f.0:45:54
And it, of course, sometimes converges, and when it does, it does so very fast. And sometimes, it doesn't converge, and, oh well, we have to do something else.0:46:03
But let's talk about square root by Newton's method. Well, that's rather interesting. Let's do exactly the same thing we did last time: a bit of wishful thinking.0:46:13
We will apply Newton's method, assuming we knew how to do it. You don't know how to do it yet. Well, let's go.0:46:25
What do I have here? The square root of x. It's Newton's method applied to a procedure which will0:46:37
represent that function of y, which computes that function of y. Well, that procedure is that procedure of y, which is the0:46:48
difference between x and the square of y.0:47:00
Indeed, if I had a value of y for which this was zero, then y would be the square root of x.0:47:13
See that? OK, I'm going to start this out searching at 1. Again, completely arbitrary; it's a property of square roots that0:47:23
I can do that. Now, how am I going to compute Newton's method? Well, this is the method.0:47:32
I have it right here. In fact, what I'm doing is looking for a fixed point of some procedure.0:47:41
This procedure involves some complicated expressions in terms of other complicated things. Well, I'm trying to find the fixed point of this. I want to find the values of y, which if I put y in here, I0:47:54
get the same value out here up to some degree of accuracy. Well, I already have a fixed point process around to do that.0:48:05
And so, let's just define Newton's method over here.0:48:19
It takes a procedure which computes a function, and a guess, an initial guess. Now, I'm going to have to do something here.0:48:28
I'm going to need the derivative of the function. I'm going to need a procedure which computes the derivative of the function computed by the given procedure f.0:48:42
I'm trying to be very careful about what I'm saying. I don't want to mix up the word procedure and function. Function is a mathematical word. It says I'm mapping from values to other values, a set0:48:52
of ordered pairs. But sometimes, I'll accidentally mix those up. Procedures compute functions.0:49:07
So I'm going to define the derivative of f to be by wishful thinking again. I don't know how I'm going to do it. Let's worry about that later--0:49:18
of F. So if F is a procedure, which happens to be this one over here for a square root, then DF will be the derivative0:49:31
of it, which is also the derivative of the function computed by that procedure. DF will be a procedure that computes the derivative of the function computed by the procedure F. And then given0:49:42
that, I will just go looking for a fixed point.0:49:51
What is the fixed point I'm looking for? It's the one for that procedure of one argument x, which I compute by subtracting from x--0:50:00
that's the old y, the yn here-- the quotient of f of x and df of x, starting out with the0:50:21
original guess. That's all very simple.0:50:32
Now, I have one part left that I haven't written, and I want you to see the process by which I write these things, because this is really true. I start out with some mathematical idea, perhaps.0:50:43
By wishful thinking, I assume that by some magic I can do something that I have a name for. I'm not going to worry about how I do it yet.0:50:54
Then I go walking down here and say, well, by some magic, I'm somehow going to figure how to do that, but I'm going to write my program anyway.0:51:04
Wishful thinking, essential to good engineering, and certainly essential to good computer science. So anyway, how many of you wished that your0:51:15
computer ran faster? Well, the derivative isn't so bad either. Sort of like average damping.0:51:28
The derivative is a procedure that takes a procedure that computes a function as its argument, and it produces a0:51:38
procedure that computes a function, which needs one argument x. Well, you all know this definition. It's f of x plus delta x minus f of x over delta x, right?0:51:49
For some small delta x. So that's the quotient of the difference of f of the sum of0:51:59
x and dx, minus f of x, divided by dx.0:52:18
I think the thing was lining up correctly when I balanced the parentheses. Now, I want you to look at this.0:52:27
Just look. I suppose I haven't told you what dx is. Somewhere in the world I'm going to have to write down0:52:44
something like that. I'm not interested. This is a procedure which takes a procedure and produces an approximation, a procedure that computes an approximation0:52:55
of the derivative of the function computed by the procedure given by the standard methods that you all know and love.0:53:04
Now, it may not be the case that doing this operation is such a good way of approximating a derivative. Numerical analysts here should jump on me and0:53:14
say don't do that. Computing derivatives produces noisy answers, which is true. However, this again is for the sake of understanding.0:53:24
Look what we've got. We started out with what is apparently a mathematically complex thing, and in a few blackboards full, we managed to decompose the0:53:35
problem of computing square roots by the way you were taught in your college calculus class-- Newton's method-- so that it can be understood.0:53:45
It's clear. Let's look at the structure of what it is we've got. Let's look at this slide.0:53:54
This is a diagram of the machine described by the0:54:03
program on the blackboard. There's a machine described here. And what have I got? Over here is the Newton's method function f that we have0:54:17
on the left-most blackboard. It's the thing that takes an argument called y and puts out the difference between x and the square of y, where x is0:54:32
some sort of free variable that comes in from the outside by some magic. So the square root routine picks up an x, and builds this0:54:43
procedure, which I have the x rolled up in it by substitution. Now, this procedure in the cloud is fed in as the f into0:54:58
the Newton's method which is here, this box. The f is fanned out.0:55:08
Part of it goes into something else, and the other part of it goes through a derivative process into something else to produce a procedure, which computes the function which is0:55:20
the iteration function of Newton's method when we use the fixed point method. So this procedure, which contains it by substitution--0:55:33
remember, Newton's method over here, Newton's method builds this procedure, and Newton's method has in it defined f and0:55:43
df, so those are captured over here: f and df. Starting with this procedure, I can now feed this to the fixed point process with an initial guess coming in from0:55:55
the outside from square root to produce the square root of x. So what we've built is a very powerful engine, which allows0:56:07
us to make nice things like this. Now, I want to end this with basically an idea of Chris0:56:19
Strachey, one of the grandfathers of computer science. He's a logician who lived in the-- I suppose about 10 years ago or 15 years ago, he died.0:56:30
I don't remember exactly when. He's one of the inventors of something called denotational semantics. He was a great advocate of making procedures or functions0:56:40
first-class citizens in a programming language. So here's the rights and privileges of first-class citizens in a programming language.0:56:50
It allows you to make any abstraction you like if you have functions as first-class citizens. The first-class citizens must be able0:56:59
to be named by variables. And you're seeing me doing that all the time. Here's a nice variable which names a procedure which computes something.0:57:13
They have to be passed as arguments to procedures. We've certainly seen that. We have to be able to return them as values from procedures.0:57:23
And I suppose we've seen that. We haven't yet seen anything about data structures. We will soon, but it's also the case that in order to have a first-class citizen in a programming language, the0:57:33
object has to be allowed to be part of a data structure. We're going to see that soon. So I just want to close with this and say having things0:57:43
like procedures as first-class data structures, first-class data, allows one to make powerful abstractions, which encode general methods like Newton's method0:57:53
in a very clear way. Are there any questions? Yes. AUDIENCE: Could you put derivative instead of df directly in the fixed point?0:58:02
PROFESSOR: Oh, sure. Yes, I could have put deriv of f right here, no question.0:58:11
Any time you see something defined, you can put the thing it's defined to be there, because you get the same result.0:58:21
In fact, what that would look like, it's interesting. AUDIENCE: Lambda. PROFESSOR: Huh? AUDIENCE: You could put the lambda expression in there. PROFESSOR: I could also put derivative of f here. It would look interesting because of the open paren,0:58:32
open paren, deriv of f, closed paren on an x. Now, that would have the bad property of computing the derivative many times, because every time I would run this0:58:43
procedure, I would compute the derivative again. However, the two open parens here both would be meaningful.0:58:52
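The two open parens the professor mentions, ((deriv f) x) in Scheme, correspond to deriv(f)(x) here. A small Python sketch of why that is syntactically sensible; deriv is the same finite-difference sketch as before, and cube is just a made-up example function.

```python
dx = 1e-6  # arbitrary small delta x

def deriv(f):
    # Returns a procedure approximating the derivative of f.
    return lambda x: (f(x + dx) - f(x)) / dx

def cube(x):
    return x * x * x

# The operator position of a combination may itself be a combination:
# deriv(cube) evaluates first, producing a procedure, which is then
# applied to 2.0. The slope of x^3 at 2 is 12.
slope = deriv(cube)(2.0)
```

Note the bad property mentioned in the lecture: written this way inside a loop, the derivative procedure would be rebuilt on every call, which is why it was bound to df once instead.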
I want you to understand syntactically that that's a sensible thing. Because if I was to rewrite this program-- and I should do it right here just so you see because that's a good question--0:59:11
I can define Newton's method of F and guess to be fixed point of that procedure of one0:59:25
argument x, which subtracts from x the quotient of F0:59:34
applied to x and the deriv of F applied to x.0:59:53
This is guess. This is a perfectly legitimate program,1:00:02
because what I have here-- remember the evaluation rule. The evaluation rule is evaluate all of the parts of the combination: the operator and the operands.1:00:12
This is the operator of this combination. Evaluating this operator will, of course, produce the1:00:21
derivative of F. AUDIENCE: To get it one step further, you could put the1:00:30
lambda expression there, too. PROFESSOR: Oh, of course. Any time I take something which is defined, I can put the thing it's defined to be in the place where the thing1:00:40
defined is. I can't remember which is definiens and which is definiendum. When I'm trying to figure out how to do a lecture about this1:00:50
in a freshman class, I use such words and tell everybody it's fun to tell their friends.1:00:59
OK, I think that's it.0:00:00
Lecture 2B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:21
PROFESSOR: Well, so far in this course we've been talking about procedures, and then just to remind you of this framework that we introduced for talking about languages,0:00:31
we talked about the primitive things that are built into the system. We mentioned some means of combination by which you take the primitive things0:00:40
and you make more complicated things. And then we talked about the means of abstraction, how you can take those complicated things and name them so you can use them as simple building blocks.0:00:49
And then last time you saw we went even beyond that. We saw that by using higher order procedures, you can actually express general methods for computing things.0:00:58
Like the method of doing something by fixed points, or Newton's method, and so the incredible expressive power you can get just by combining these means of abstraction.0:01:08
And the crucial idea in all of this is the one that we build a layered system. So for instance, if we're writing the square root0:01:17
procedure, somewhere the square root procedure uses a procedure called good-enough,0:01:31
and between those there is some sort of abstraction boundary. It's almost as if we go out and in writing square root,0:01:41
we go and make a contract with George, and tell George that his job is to write good-enough, and so long as good-enough works,0:01:50
we don't care what it does. We don't care exactly how it's implemented. There are levels of detail here that are George's concern and not ours.0:02:00
So for instance, George might use an absolute value procedure that's written by Harry, and we don't much care about that or even know that, maybe, Harry exists.0:02:13
So the crucial idea is that when we're building things, we divorce the task of building things from the task of implementing the parts.0:02:27
And in a large system, of course, we have abstraction barriers like this at lots, and lots, and lots of levels. And that's the idea that we've been using so far over and over0:02:36
in implementing procedures. Well, now what we're going to do is look at the same issues for data. We're going to see that the system has primitive data.0:02:46
In fact, we've already seen that. We've talked about numbers as primitive data. And then we're going to see their means of combination for data. There's glue that allows you to put primitive data together0:02:55
to make more complicated, kind of compound data. And then we're going to see a methodology for abstraction0:03:04
that's a very good thing to use when you start building up data in terms of simpler data. And again, the key idea is that you're going to build the system in layers0:03:13
and set up abstraction barriers that isolate the details at the lower layers from the thing that's going on at the upper layers. The details at the lower layers, the ideas, they won't matter.0:03:25
They're going to be George's concern because he signed this contract with us for how the stuff that he implements behaves, and how he implements the thing is his problem.0:03:36
All right, well let's look at an example. And the example I'm going to talk about is a system that does arithmetic on rational numbers. And what I have in mind is that we should have something0:03:46
in the computer that allows us to ask it, like, what's the sum of 1/2 and 1/4, and somehow the system0:03:56
should say, yeah, that's 3/4. Or we should be able to say what's 3/4 times 2/3,0:04:11
and the system should be able to say, yeah, that's 1/2. Right? And you know what I have in mind. And you also know how to do this from, I don't know,0:04:20
fifth grade or sixth grade. There are these formulas that say if I have some fraction which is a numerator over a denominator, and I want to add that to some other fraction which0:04:31
is another numerator over another denominator, then the answer is the numerator of the first times the denominator of the second, plus the numerator0:04:43
of the second times the denominator of the first. That's the numerator of the answer, and the denominator is the product0:04:52
of the two denominators. Right? So there's something from fifth or sixth grade fraction arithmetic. And then similarly, if I want to multiply two things, n1 over d1 multiplied by n2 over d20:05:05
is the product of the numerators over the product of the denominators.0:05:14
So it's no problem at all, but it's absolutely no problem to think about what computation you want to make in adding and multiplying these fractions.0:05:23
But as soon as we go to implement it, we run up across something. We don't have what a rational number is.0:05:33
So we said that the system gives us individual numbers, so we can have 5 and 3, but somehow we0:05:42
don't have a way of saying there's a thing that has both a 3 and a 4 in it, or both a 2 and a 3. It's almost as if we'd like to imagine that somehow there0:05:54
are these clouds, and a cloud somehow has both a numerator and a denominator in it, and that's what we'd like to work in terms of.0:06:06
Well, how are we going to solve that problem? We're going to solve that problem by using this incredibly powerful design strategy that you've already seen us use over and over.0:06:16
And that's the strategy of wishful thinking.0:06:25
Just like before when we didn't have a procedure, we said, well, let's imagine that that procedure already exists. We'll say, well, let's imagine that we have these clouds.0:06:36
Now more precisely what I mean is let's imagine that we have three procedures, one called make-RAT.0:06:47
make-RAT is going to take as arguments two numbers, so I'll call them numerator and denominator,0:06:57
and it'll return for us a cloud-- one of these clouds. I don't really know what a cloud is.0:07:07
It's whatever make-RAT returns, that's its business. And then we're going to say, suppose we've got one of these clouds, we have a procedure called numer, which takes in a cloud that has an n and a d in it,0:07:20
whatever a cloud is, and I don't know what it is, and returns for us the numerator part. And then we'll assume we have a procedure denom,0:07:31
which again takes in a cloud, whatever a cloud is, and returns for us the denominator. This is just like before, when, if we're0:07:40
building a square root, we assume that we have good enough. Right? And what we'll say is, we'll go find George, and we'll say to George, well, it's your business0:07:49
to make us these procedures. And how you choose to implement these clouds, that's your problem. We don't want to know.0:07:58
Well, having pushed this task off onto George, then it's pretty easy to do the other part. Once we've got the clouds, it's pretty easy0:08:07
to write the thing that does say addition of rational numbers. You can just say define, well, let's say +RAT.0:08:21
Define +RAT, which will take in two rational numbers, x and y. x and y are each these clouds.0:08:31
And what does it do? Well, it's going to return for us a rational number.0:08:40
What rational number is it? Well, we've got the formulas there. The numerator of it is the sum of the product of the numerator0:08:52
of x and the denominator of y.0:09:02
It's one thing in the sum. And the other thing in the numerator is the product of the numerator of y and the denominator of x.0:09:19
The star, close the plus. Right, that's the first argument to make-RAT, which is the numerator of the thing I'm constructing. And then the rest of the thing goes0:09:28
into make-RAT is the denominator of the answer, which is the product of the denominator of x0:09:37
and the denominator of y. Like that.0:09:46
OK? So there is the analog of doing rational number addition. And it's no problem at all, assuming that we have these clouds.0:09:59
And of course, we can do multiplication in the same way. Define how to get the product of two rational numbers,0:10:11
call it *RAT. Takes in two of these clouds, x and y, it returns0:10:20
a rational number, make-RAT, whose numerator is the product of the numerators-- numerator of x0:10:32
times the numerator of y. And the denominator of the thing it's going to return0:10:41
is the product of the denominators.0:10:57
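The two operations on the blackboard can be sketched like this. This is Python rather than the lecture's Scheme; add_rat and mul_rat are my translations of +RAT and *RAT, and the tuple inside make_rat is only a stand-in for the cloud, since how the cloud is really built is George's business.

```python
# Wished-for constructor and selectors. The representation here is a
# placeholder; the arithmetic below uses only these three procedures.
def make_rat(n, d):
    return (n, d)

def numer(x):
    return x[0]

def denom(x):
    return x[1]

def add_rat(x, y):
    # n1*d2 + n2*d1 over d1*d2, straight from the fifth-grade formula.
    return make_rat(numer(x) * denom(y) + numer(y) * denom(x),
                    denom(x) * denom(y))

def mul_rat(x, y):
    # Product of the numerators over the product of the denominators.
    return make_rat(numer(x) * numer(y), denom(x) * denom(y))
```

With this naive constructor, adding 1/2 and 1/4 yields numerator 6 and denominator 8, exactly the unreduced answer the lecture runs into a bit further on.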
Well, except that I haven't told you what these clouds are, that's all there is to it. See, what did I do? I assumed by wishful thinking that I0:11:08
had a new kind of data object. And in particular, I assumed I had ways of creating these data objects. Make-RAT creates one of these things.0:11:18
This is called a constructor. All right, I have a thing that constructs such data objects.0:11:29
And then I assume I have things that, having made these things, I have ways of getting the parts out. Those are called selectors.0:11:42
And so formally, what I said is I assumed I had procedures that are constructors and selectors for these data objects, and then I went off and used them.0:11:52
That's no different in kind from saying I assume I have a procedure good-enough, and I go use it to implement square root. OK, well before we go on, let's ask0:12:05
the question of why do we want to do this in the first place? See, why do we want a procedure like +RAT that takes in two0:12:16
rational numbers and returns a rational number? See, another way to think about this is, well, here's this formula.0:12:25
And I've also got to implement something that adds rational numbers. One other way to think about it is, well, there's this thing, and I type in four numbers, an n1, and a d1,0:12:34
and an n2, and a d2. And it sets some registers in the machine to this numerator and this denominator. So I might say, well, why don't I0:12:43
just add rational numbers by typing in four numbers, numerators and denominators, and get out two numbers, which is a numerator and a denominator. Why are we worrying about building things0:12:54
like this anyway? Well, the answer is, suppose you want to think about expressing something like this,0:13:06
suppose I'd like to express the idea of taking two rational numbers, x plus y, say,0:13:15
and multiplying that by the sum of two other rational numbers. Well, the way I do it, having things like +RAT and *RAT,0:13:28
is I'd say, oh yeah, what that is is just the product. That's *RAT of the sum of x and y and the sum of s and t.0:13:51
So except for syntax, I get an expression that looks like the way I want to think about it mathematically. I want to say there are two numbers.0:14:02
There's a thing which is the sum of them, and there's a thing which is the sum of these two. That's this and this. And then I multiply them.0:14:12
So I get an expression that matches this expression. If I did the other thing, if I said, well, the way I want to think about this is I type into my machine four numbers, which are the numerators and the denominators of x and y,0:14:24
and then four more numbers, which are the numerators and denominators of s and t. And then what I'd be sitting with is, well, what would I do? I'd add these, and somehow I'd have0:14:33
to have two temporary variables, which are the numerators and denominators of this sum, and I'd go off and store them someplace.0:14:42
And then I'd go over here, I'd type in four more numbers, I'd get two more temporary variables, which are the numerators and denominators of s and t. And then finally, I put those together by multiplying them.0:14:54
You see, what's starting to happen, there are all these temporary variables, which are sort of the guts of the internals of these rational numbers that start hanging0:15:04
out all over the system. And of course, if I had more and more complicated expressions, there'd be more and more guts hanging out that confuse my programming.0:15:13
And those of you who sort of programmed things like that, where you're just adding numbers in assembly language, you sort of see you have to suddenly be concerned with these temporary variables.0:15:23
But more importantly than confusing my programming, they're going to confuse my mind. Because the whole name of this game0:15:33
is that we'd like the programming language to express the concepts that we have in our heads, like rational numbers are things that you can add and then take0:15:43
that result and multiply them. Let's break for questions.0:15:59
Yeah? AUDIENCE: I don't quite see the need- when we had make-RAT with the numerator and denominator, we had to have the numerator and denominator to pass as parameters to create the cloud,0:16:08
and then we extracted to get back what we had to have originally. PROFESSOR: That's right. So the question is, I sort of have the numerator and the denominator,0:16:17
why am I worrying about having the cloud given that I have to get the pieces out? That's sort of what I tried to say at the end, but let me try and say it again, because that's really0:16:27
the crucial question. The point is, I want to carry this numerator and denominator around together all the time.0:16:36
And it's almost as if I want to know, yeah, there's a numerator and denominator in there, but also, I would like to say, fine, but from another point0:16:47
of view, that's x. And I carry x around, and I name it as x, and I hold it. And I can say things like, the sum of x and y, rather than just have-- see, it's not so bad when I only0:16:58
think about x, but if I have a system with 10 rational numbers, suddenly I have 20 numerators and denominators, which are not necessarily-- if I don't link them, then it's just 20 arbitrary numbers that are not0:17:09
linked in any particular way. It's a lot like saying, well, I have these instructions that are the body of the procedures, why do I want to package them and say it's the procedure? It's exactly the same idea.0:17:31
No? OK. Let's break, let's just stretch and get somebody-- [INAUDIBLE] [MUSIC PLAYING]0:18:27
OK, well, we've been working on this rational number arithmetic system, and then what we did, the important thing about what we did, is we thought about the problem0:18:37
by breaking it into two pieces. We said, assume there is this contract with George, and George has figured out the way to how to construct these clouds,0:18:47
provided us procedures make-RAT, which was a constructor, and selectors, which are numerator and denominator. And then in terms of that, we went off0:18:56
and implemented addition and multiplication of rational numbers. Well, now let's go look at George's problem. How can we go and package together0:19:05
a numerator and a denominator and actually make one of these clouds? See, what we need is a kind of glue, a glue for data objects0:19:15
that allows us to put things together. And Lisp provides such a glue, and that glue is called list structure.0:19:30
List structure is a way of gluing things together, and more precisely, Lisp provides a way of constructing things called pairs.0:19:44
There's a primitive operator in Lisp called cons. We can take a look at it.0:19:54
There's a thing called cons. Cons is an operator which takes in two arguments called0:20:03
x and y, and it returns for us a thing called a pair. All right, so a thing called a pair that has a first part0:20:17
and a second part. So cons takes two objects. There's a thing called a pair.0:20:26
The first part of the cons is x, and the second part of the cons is y. And that's what it builds. And then we also assume we have ways of getting things out.0:20:36
If you're given a pair, there's a thing called car, and car of a pair, p, gives you out the first part of the pair, p.0:20:46
And there's a thing called cdr, and cdr of the pair, p, gives you the second part of the pair, p. OK, so that's how we construct things.0:20:56
There's also a conventional way of drawing pictures of these things. Just like we write down that as the conventional way of writing0:21:10
Plato's idea of two, the way we could draw a diagram to represent cons of two and three is like this.0:21:21
We draw a little box. And so here's the box we're talking about, and this box has two arrows coming out of it.0:21:30
And say the first part of this pair is 2, and the second part of this pair is 3. And this notation has a name, it's0:21:40
called box and pointer notation.0:21:55
By the way, let me say right now that a lot of people get confused that there's some significance to the geometric way I drew these pointers, the directions. Like some people think it'd be different0:22:05
if I took this pointer and turned it up here, and put the 3 out here. That has no significance. All right? It's merely you have a bunch of arrows, these pointers, and the boxes.0:22:15
The only issue is how they're connected, not the geometric arrangement of whether I write the pointer across, or up, or down. Now it's completely un-obvious, probably,0:22:26
why that's called list structure. We're not actually going to talk about that today. We'll see that next time.0:22:37
So those are pairs, there's cons that constructs them. And what I'm going to know about cons, and car, and cdr, is precisely that if I have any x and y, all right,0:22:51
if I have any things x and y, and I use cons to construct a pair, then the car of that pair0:23:01
is going to be x, the thing I put in, and the cdr of that pair is going to be y. That's the behavior of these operators, cons, car, and cdr.0:23:12
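The contract stated here can itself be satisfied with nothing but procedures, which is one way to see what the axiom demands. This Python sketch is not how Lisp actually builds pairs; it is just one representation that obeys car(cons(x, y)) = x and cdr(cons(x, y)) = y.

```python
def cons(x, y):
    # The pair is represented by a procedure that remembers x and y
    # and hands back one or the other on request.
    def pair(which):
        return x if which == 0 else y
    return pair

def car(p):
    # First part of the pair p.
    return p(0)

def cdr(p):
    # Second part of the pair p.
    return p(1)
```

For example, car(cons(2, 3)) is 2 and cdr(cons(2, 3)) is 3, which is everything the lecture says we are entitled to know about cons, car, and cdr.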
Given them, it's pretty clear how George can go off and construct his rational numbers. After all, all he has to do-- remember George's problem was to implement make-RAT, numerator,0:23:22
and denom. So all George has to do is say define make-RAT of some n and a d-- so all I have to do is cons them.0:23:40
That's cons of n and d. And then if I want to get the numerator out, I would say define the numerator, numer,0:23:57
of some rational number, x. If the rational number's implemented as a pair, then all I have to do is get out the car of x.0:24:06
And then similarly, define the denom is going to be the cdr,0:24:19
the other thing I put into the pair. Well, now we're in business.0:24:28
That's a complete implementation of rational numbers. Let's use it. Suppose I want to say, so I want to think about how to add 1/2 plus 1/4 and watch the system work.0:24:43
Well, the way I'd use that is I'd say, well, maybe define a. I have to make a 1/2.0:24:53
Well, that's a rational number with numerator 1 and denominator 2, so a will be make-RAT of 1 and 2.0:25:05
And then I'll construct the 1/4. I'll say define b to be make-RAT of 1 and 4.0:25:23
And if I'd like to look at the answer-- well, assuming I don't have a special thing that prints rational numbers, or I could make one-- I could say, for instance, define the answer to be +RAT of a and b, and now I can say,0:25:46
what's the answer? What are the numerators and denominators of the answer? So if I'm adding 1/2 and 1/4, I'll say, what is the numerator of the answer?0:26:04
And the system is going to type out, well, 6. Bad news.0:26:13
And if I say what's the denominator of the answer,0:26:22
the system's going to type out 8. So instead of what I would really like, which is for it to say that 1/2 and 1/4 is 3/4,0:26:35
this foolish machine is going to say, no, it's 6/8. Well, that's sort of bad news. Where's the bug?0:26:47
Why does it do that, after all? Well, it's the way that we just had +RAT. +RAT just took the-- it said you add the numerator times0:26:56
the denominator, you add that to the numerator times the denominator, and put that over the product of the two denominators, and that's why you get 6/8.0:27:05
So what was wrong with our implementation of +RAT? What's wrong with that rational number arithmetic stuff that we did before the break?0:27:15
Well, the answer is one way to look at it is absolutely nothing's wrong. That's a perfectly good implementation. It follows the sixth grade, fifth grade mathematics0:27:25
for adding fractions. One thing we can say is, well, that's George's problem. Like, boy, wasn't George dumb to say0:27:36
that he can make a rational number simply by sticking together the numerator and the denominator? Wouldn't it be better for George,0:27:45
when he made a rational number, to reduce the stuff to lowest terms? And what I mean is, wouldn't it be better for George,0:27:55
instead of using this version of make-RAT, to use this one on the slide? Or instead of just saying cons together n and d, what you do0:28:09
is compute the greatest common divisor of n and d, and gcd is the procedure which, well, for all we care is a primitive, which computes the greatest common divisor of two numbers.0:28:20
So the way I can construct a rational number is get the greatest common divisor of the two numbers, and I'm going to call that g, and then0:28:30
instead of consing together n and d, I'll divide them through. I'll cons together the quotient of n by the gcd and the quotient of d by the gcd.0:28:40
And that will reduce the rational number to lowest terms. So when I do this addition, when +RAT calls make-RAT--0:28:54
and for the definition of +RAT it had a make-RAT in there-- just by the fact that it's constructing that, the thing will get reduced to lowest terms automatically.0:29:09
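The reducing constructor from the slide might look like this in Python; gcd is taken from the standard library rather than treated as a primitive:

```python
from math import gcd

# make_rat now divides both parts through by their greatest
# common divisor, so every constructed rational is in lowest terms.
def make_rat(n, d):
    g = gcd(n, d)
    return (n // g, d // g)

def numer(x):
    return x[0]

def denom(x):
    return x[1]

def add_rat(x, y):
    return make_rat(numer(x) * denom(y) + numer(y) * denom(x),
                    denom(x) * denom(y))

ans = add_rat(make_rat(1, 2), make_rat(1, 4))
print(numer(ans), denom(ans))  # 3 4, not 6 8
```

Note that add_rat did not change at all; only the constructor did, and the reduction happens automatically whenever +RAT constructs its result.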
OK, that is a complete system for rational number arithmetic. Let's look at what we've done.0:29:19
All right, we said we want to build rational number arithmetic, and we had a thing called +RAT. We implemented that.0:29:29
And I showed you multiplying rational numbers, and although I didn't put them up there, presumably we'd like to have something that subtracts rational numbers, and I don't know,0:29:39
all sorts of things. Things that test equality in division, and maybe things that print rational numbers in some particular way. And we implemented those in terms of pairs.0:29:52
These pairs, cons, car, and cdr that are built into Lisp. But the important thing is that between these and these,0:30:05
we set up an abstraction barrier. We set up a layer of abstraction.0:30:17
And what was that layer of abstraction? That layer of abstraction was precisely the constructor and the selectors. This layer was make-RAT, and numer, and denom.0:30:38
This methodology, another way to say what it's doing, is that we are separating the way something is used,0:30:53
separating the use of data objects, from the representation of data objects.0:31:07
So up here, we have the way that rational numbers are used, do arithmetic on them. Down here, we have the way that they're represented, and they're separated by this boundary.0:31:17
The boundary is the constructors and selectors. And this methodology has a name. This is called data abstraction.0:31:35
Data abstraction is sort of the programming methodology of setting up data objects by postulating constructors and selectors to isolate use from representation.0:31:47
Well, so why? I mean, after all, we didn't have to do it this way. It's perfectly possible to do rational number addition without having any compound data objects, and here on the slide0:31:58
is one example. We certainly could have defined +RAT, which takes in things x and y, and we'll say, well what are these rational numbers really?0:32:10
So really, they're just pairs, and the numerator's the car and the denominator's the cdr. So what we'll do is we'll take the car of x times the cdr of y, multiply them.0:32:23
Take the car of y times the cdr of x, multiply them. Add them. Take the cdr of x and the cdr of y, multiply them, and then cons them together.0:32:35
Well, that sort of does the same thing. But this ignores the problem of reducing things to lowest terms, but let's not worry about that for a minute.0:32:47
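The slide's abstraction-free version might be sketched like this in Python, with rationals as bare tuples and the cars and cdrs written out as subscripts:

```python
# +RAT with no data abstraction: the representation (a bare pair)
# leaks directly into the arithmetic itself.
def add_rat(x, y):
    return (x[0] * y[1] + y[0] * x[1],   # cars and cdrs, cross-multiplied
            x[1] * y[1])                  # product of the denominators

print(add_rat((1, 2), (1, 4)))  # (6, 8) -- and nothing reduces it
```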
But so what? Why don't we do it that way? Right? After all, there are sort of fewer procedures to define, and it's a lot more straightforward.0:32:57
It saves all this self-righteous BS about talking about data abstraction. We just sort of do it. I mean, who knows, maybe it's even marginally more efficient depending on whatever compiler we're using for this.0:33:07
What's the point of isolating the use from the representation? Well, it goes back to this notion of naming.0:33:17
Remember, one of the most important principles in programming is the same as one of the most important principles in sorcery, all right? That's if you have the name of the spirit,0:33:27
you get control over it. And if you go back and look at the slide, you see what's in there is we have this thing +RAT,0:33:36
but nowhere in the system, if I have a +RAT and a -RAT and a *RAT, and things that look like that, nowhere in the system do I have a thing that I can point0:33:46
at which is a rational number. I don't have, in a system like that,0:33:57
the idea of rational number as a conceptual entity. Well, what's the advantage of that? What's the advantage of isolating the idea of rational numbers as a conceptual entity,0:34:08
and really naming it with make-RAT, numerator, and denominator. Well, one advantage is you might want to have0:34:18
alternative representations. See, before I showed you that one way George can solve this things-not-reduced-to-lowest-terms problem, is when you build a rational number,0:34:29
you divide out by the greatest common divisor. Another way to do that is shown over here. I can have an alternative representation0:34:38
for rational numbers where when you make a rational number, you just cons them. However, when you go to select out the numerator, at that point you compute the gcd of the stuff0:34:50
that's sitting in that pair, and divide out by the gcd. And similarly, when I get the denominator,0:35:01
at that point when I go to get the denominator, I'll divide out by the gcd. So the difference would be in the old representation, when ans was constructed here, say what's 6 and 8,0:35:13
in the first way, the 6 and 8 would have got reduced when they got stuck into that pair, numerator would select out 3. And in the way I just showed you, well, ans would get 6 and 8 put in,0:35:25
and then at the point where I said numerator, some computation would get done to put out 3 instead of 6. So those are two different ways I might do it.0:35:34
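The deferred-reduction alternative can be sketched the same way: cons at construction time, and divide out the gcd only when a selector runs:

```python
from math import gcd

def make_rat(n, d):
    return (n, d)                    # no work at construction

def numer(x):
    return x[0] // gcd(x[0], x[1])   # reduce on access

def denom(x):
    return x[1] // gcd(x[0], x[1])

ans = make_rat(6, 8)        # stored as 6 and 8...
print(numer(ans))           # ...but the selector puts out 3
print(denom(ans))           # 4
```

Code above the abstraction layer cannot tell these two representations apart; only the timing of the gcd work differs.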
Which one's better? Well, it depends, right? If I'm making a system where I am mostly constructing rational numbers and hardly ever looking at them, then it's probably better0:35:44
not to do that gcd computation when I construct them. If I'm doing a system where I look at things a lot more than I construct them, then it's probably better0:35:53
to do the work when I construct them. So there's a choice there. But the real issue is that you might not be able to decide at the moment you're worrying0:36:05
about these rational numbers. See, in general, as systems designers, you're faced with the necessity to make decisions0:36:15
about how you're going to do things, and in general, the way you'd like to retain flexibility is to never make up your mind about anything until you're forced to do it.0:36:26
The problem is, there's a very, very narrow line between deferring decisions and outright procrastination.0:36:38
So you'd like to make progress, but also at the same time, never be bound by the consequences of your decisions.0:36:48
Data abstraction's one way of doing this. What we did is we used wishful thinking. See, we gave a name to the decision.0:36:57
We said, make-RAT, numerator, and denominator will stand for however it's going to be done, and however it's going to be done is George's problem. But really, what that was doing is giving a name0:37:06
to the decision of how we're going to do it, and then continuing as if we made the decision. And then eventually, when we really wanted it to work,0:37:17
coming back and facing what we really had to do. And in fact, we'll see a couple times from now that you may never have to choose any particular representation, ever, ever.0:37:27
Anyway, that's a very powerful design technique. It's the key to the reason people use data abstraction. And we're going to see that idea again and again.0:37:37
Let's stop for questions. AUDIENCE: What does this decision making through abstraction layers do to the axiom of do all your design0:37:47
before any of your code? PROFESSOR: Well, that's someone's axiom, and I bet that's the axiom of someone who hasn't implemented very large computer systems very much.0:38:01
I said that computer science is a lot like magic, and it's sort of good that it's like magic. There's a bad part of computer science that's a lot like religion. And in general, I think people who0:38:12
really believe that you design everything before you implement it basically are people who haven't designed very many things.0:38:21
The real power is that you can pretend that you've made the decision and then later on figure out which one is right, which decision you ought to have made.0:38:30
And when you can do that, you have the best of both worlds. AUDIENCE: Can you explain the difference between let and define?0:38:40
PROFESSOR: Oh, OK. Let is a way to establish local names.0:38:55
Let me give you sort of the half answer. And I'll say, later on we can talk about the whole very complicated thing. But the big difference for now is that, see,0:39:05
when you're typing at Lisp, you're typing in this environment where you're making definitions. And when you say define a to be 5, if I say define a to be 5,0:39:20
then from then on the thing will remember that a is 5. Let is a way to set up a local context where0:39:29
there's a definition. So if I type something like, saying let a-- no, I shouldn't say a-- if I said let z0:39:43
be 10, and within that context, tell me what the sum of z and z0:39:53
is. So if I typed in this expression to Lisp, and then this would put out 20.0:40:02
However, then if I said what's z, the computer would say that's an unbound variable. So let is a way of setting up a context where0:40:13
you can make definitions. But those definitions are local to this context. And of course, if I'd said a in here, I'd still get 20.0:40:27
But this a would not interfere at all with this one. So if I type this, and then type this, and then say what's a?0:40:36
a will still be 5. So there's some other subtle differences between let and define, but that's the most important one.0:41:20
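A rough Python analog of that distinction, purely illustrative: define is like a top-level assignment that persists, while let is like a binding local to a single expression, imitated here with a function parameter:

```python
a = 5                            # like (define a 5): persists from now on

# like (let ((z 10)) (+ z z)): z exists only inside the expression
result = (lambda z: z + z)(10)
print(result)                    # 20

# z is unbound out here, just as Scheme reports after the let:
try:
    z
except NameError:
    print("z is unbound")
```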
All right, well, we've looked at implementing this little system for doing arithmetic on rational numbers as an example of this methodology of data abstraction.0:41:31
And that's a way of controlling complexity in large systems. But, see, like procedure definition, and like all the ways we're going0:41:40
to talk about for controlling complexity, the real power of these things show up not when you sort of do these things in themselves, like it's not such a great thing0:41:49
that we've done rational number arithmetic, it's that you can use these as building blocks for making more complicated things.0:42:00
So it's no wonderful idea that you can just put two numbers together to form a pair. If that's all you ever wanted to do, there are tons of ways that you can do that. The real issue is can you do that in such a way0:42:11
so that the things that you build become building blocks for doing something even more complex? So whenever someone shows you a method for controlling complexity, you should say, yeah, that's great,0:42:20
but what can I build with it? So for example, let me just run through another thing that's0:42:30
a lot like the rational number one. Suppose we would like to represent points in the plane. You sort of say, well, there's a point, and we're going to call that point p.0:42:40
And that point might have coordinates, like this might be the point 1 comma 2.0:42:50
The x-coordinate might be 1, and its y-coordinate might be 2. And we'll make a little system for manipulating points in the plane.0:43:00
And again, we can do that-- here's a little example of that. It can represent vectors, the same as points in the plane,0:43:10
and we'll say, yep, there's a constructor called make-vector, make-vector's going to take two coordinates,0:43:21
and here we can implement them if we like as pairs, but the important thing is that there's a constructor. And then given some vector, p, we can find its x-coordinate,0:43:31
or we can get its y-coordinate. So there's a constructor and selectors for points in the plane. Well, given points in the plane, we0:43:40
might want to use them to build something. So for instance, we might want to talk about, we might have a point, p, and a point, q, and p might be the point 1, 2, and q might be the point 2, 3.0:43:54
And we might want to talk about the line segment that starts at p and ends at q. And that might be the segment s.0:44:05
So we might want to build points or vectors in terms of numbers, and segments in terms of vectors.0:44:16
So we can represent line segments in exactly the same way. All right, so the line segment from p to q, we'll say there's a constructor, make-segment.0:44:27
And make up names for the selectors, the starting point of the segment and the ending point of the segment. And again, we can implement a segment using cons as a pair of points, and car and cdr get out the two points0:44:38
that we put together to get the segment. Well, now having done that, we can0:44:48
have some operations on them. Like we could say, what's the midpoint of a line segment?0:44:57
So here's the midpoint of a line segment, that's going to be the points whose coordinates are the averages of the coordinates of the endpoints.0:45:07
OK, there's the midpoint. So to get the midpoint of a line segment, s, we'll just say grab the starting point of the segment,0:45:17
grab the ending point of the segment, and now make a vector-- make a point whose coordinates are the average of the x-coordinate of the first point0:45:27
and the x-coordinate of the second point, and whose y-coordinate is the average of the y-coordinates. So there's an implementation of midpoint.0:45:37
And then similarly, we can build something like the length of the segment. The length of the segment is a thing0:45:46
whose-- use Pythagoras's rule, the length of the segment is the square root of dx squared plus dy squared.0:45:57
We'll say to get the length of a line segment, we'll let dx be the difference of the x-coordinate of one0:46:06
endpoint and the x-coordinate of the other endpoint, and we'll let dy be the difference of the y-coordinates.0:46:16
And then we'll take the square root of the sum of the squares of dx and dy, that's what this says. All right, so there's an implementation of length.0:46:26
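The whole layered system can be sketched in Python. The names make_vector, xcor, ycor, make_segment, seg_start, and seg_end stand in for the constructors and selectors on the board; midpoint and length are written only in terms of the selectors:

```python
from math import sqrt

# The vector layer: a constructor and two selectors over a pair.
def make_vector(x, y): return (x, y)
def xcor(p): return p[0]
def ycor(p): return p[1]

# The segment layer, built on top of the vector layer.
def make_segment(p, q): return (p, q)
def seg_start(s): return s[0]
def seg_end(s): return s[1]

def average(a, b):
    return (a + b) / 2

# Midpoint: the point whose coordinates are the averages
# of the coordinates of the endpoints.
def midpoint(s):
    a, b = seg_start(s), seg_end(s)
    return make_vector(average(xcor(a), xcor(b)),
                       average(ycor(a), ycor(b)))

# Length: Pythagoras's rule, sqrt(dx^2 + dy^2).
def length(s):
    dx = xcor(seg_end(s)) - xcor(seg_start(s))
    dy = ycor(seg_end(s)) - ycor(seg_start(s))
    return sqrt(dx * dx + dy * dy)

s = make_segment(make_vector(1, 2), make_vector(4, 6))
print(midpoint(s))   # (2.5, 4.0)
print(length(s))     # 5.0
```

Neither midpoint nor length mentions a subscript or a tuple; each layer talks only to the layer directly beneath it.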
And again, what we built is a layered system.0:46:35
We built a system which has, well, say up here there's segments.0:46:47
And then there's an abstraction barrier. The abstraction barrier separates the implementation0:46:56
of segments from the implementation of vectors and points, and what that abstraction barrier is are the constructors and selectors. It's make-segment, and segment-start, and segment-end.0:47:18
And then there are vectors. And vectors in turn are built on top of pairs and numbers. So I'll say pairs and numbers.0:47:29
And that has its own abstraction barrier, which is make-vector, and x-coordinate, and y-coordinate.0:47:46
So we have, again, a layered system. You're starting to see that there are layers here. I ought to mention, there is a very important thing0:47:57
that I kind of took for granted. And it's sort of so natural, but on the other hand it's a very important thing.0:48:07
Notice that in order to represent this segment s, I said this segment is a pair of points.0:48:16
And a point is a pair of numbers. And if I were going to draw the box and pointers structure for that, I would say, oh, the segment0:48:25
is, given those particular representations that I showed you, I'd say this segment s is a pair,0:48:34
and the first thing in the pair is a vector, and the vector is a pair of numbers.0:48:45
And that's this, that's p. And the other thing in the segment is q, which is itself a pair of numbers.0:49:00
So I almost took it for granted when I said that cons allows you to put things together. But it's very easy to not appreciate0:49:12
that, because notice, some of the things I can put together can themselves be pairs. And let me introduce a word that I'll talk about more next time,0:49:24
it's one of my favorite words, called closure. And by closure I mean that the means of combination0:49:34
in your system are such that when you put things together using them, like we make a pair, you can then put those together0:49:43
with the same means of combination. So I can have not only a pair of numbers, but I can have a pair of pairs. So for instance, making arrays in a language like Fortran0:49:57
is not a closed means of combination, because I can make an array of numbers, but I can't make an array of arrays. And one of the things that you should ask, one of your tests0:50:09
of quality for a means of combination that someone shows you, is gee, are the things you make closed under that means of combination?0:50:18
So pairs would not be nearly so interesting if all I could do was make a pair of numbers. I couldn't build very much structure at all. OK, well, we'll come back to that.0:50:28
I just wanted to mention it now. You'll hear a lot about closure later on. You can also see the potential for losing control0:50:38
of complexity as you have a layered system if you don't use data abstraction. Let's go back and look at this slide for length.0:50:48
Length works and is a simple thing because I can say, when I want to get this value, I can say, oh, that is the x-coordinate of the first endpoint of the segment.0:51:02
And each of these things, each of these selectors, x-coordinate and endpoint, stand for a decision choice whose details I don't have to look at.0:51:12
So I could perfectly well, again, just like rational numbers I did before, I could say, oh well, gee, a segment really is a pair of pairs.0:51:21
And the x-coordinate of the first endpoint of the segment really is the-- well, what is it? It's the car of the car of the segment.0:51:33
So I could perfectly well go and redefine length. I could say, define the length of some segment s.0:51:48
And I could start off writing something like, well, we'll let dx be-- well, what's it have to be? It's got to be the difference of the two coordinates,0:51:58
so that's the difference of the car of the car of s, subtracted0:52:08
from the car of the other half of it, the car of the cdr of s.0:52:21
All right, and then dy would be-- well, let's see, I'd get the y-coordinate, so it'd be the difference of the cdr of the car of s,0:52:33
and the cdr of the cdr of s, sort of go on.0:52:44
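Written against the raw pairs, a Python sketch of that fragile length looks like this; every subscript is a buried representation decision:

```python
from math import sqrt

# length with no abstraction layer: a segment is a pair of pairs,
# and the code hard-wires car-of-car, car-of-cdr, and so on.
def length(s):
    dx = s[1][0] - s[0][0]   # x of end minus x of start
    dy = s[1][1] - s[0][1]   # y of end minus y of start
    return sqrt(dx * dx + dy * dy)

print(length(((1, 2), (4, 6))))  # 5.0
# If George swaps the coordinate order or the endpoint order,
# every one of these subscripts has to be found and flipped.
```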
You can see that's much harder to read than the program I had before. But worse than that, suppose you'd gone and implemented length?0:52:56
And then the next day, George comes to you and says, I'm sorry, I changed my mind. I want to write points with the x-coordinate first. So you come back you stare at this code0:53:06
and say, oh gee, what was that? That was the car, so I have to change this to cdr, and this is cdr, and this now has to be car.0:53:20
And this has to be car. And you sort of do that, and then the next day George comes back and says, sorry, the guys designing the display0:53:31
would like lines to be painted in the opposite direction, so I have to write the endpoint first in the order. And then you come back and you stare at this code, and say,0:53:40
gee, what was it talking about? Oh yeah, well I've got to change this one to cdr, and this one becomes car, this one becomes car,0:53:49
and this becomes cdr. And you go up and do that, and then the next day, George comes back and says, I'm sorry, what I really meant is that the segments always have to be painted from left to right on the screen.0:53:59
And then you sort of, it's clear, you just go and punch George in the mouth at that point. But you see, as soon as we have a 10-layer system,0:54:09
you see how that complexity immediately builds up to the point where even something like this gets out of control. So again, the way we've gotten out of that0:54:19
is we've named that spirit. We built a system where there is a thing, which is the representation choice for how you're0:54:29
going to talk about vectors. And choices about that representation are localized right there. They don't have their guts spilling over into things like how you compute the length0:54:38
and how you compute the midpoint. And that's the real power of this system. OK, we're explicit about them, so0:54:48
that we have control over them. All right, questions? AUDIENCE: What happens in the case where you don't want to be treating objects in terms of pairs? For instance, in three-dimensional space,0:55:00
you'd have three coordinates. Or even in the case where you have n-dimensional space, what happens? PROFESSOR: Right, OK. Well, this is a preview of what I'll say tomorrow. But the point is, once you have two things,0:55:14
you have as many things as you want. All right? Because if I want to make three things, I could start making things like a pair whose first thing is0:55:25
1, and whose second thing is another pair that, say, has 2 and 3 in it.0:55:34
And so on, a hundred things. I can nest them out of pairs. I made a pretty arbitrary decision about how to do it, and you can immediately see there are lots of ways to do that. What we'll start talking about next time0:55:44
are conventions for how to do things like that. But notice that what this really depends on is I can make pairs of pairs. If all I could do was make pairs of numbers, I'd be stuck.0:56:07
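Nesting pairs to hold three things might look like this, one arbitrary choice among many, as the lecture says:

```python
# Getting three things out of pairs by nesting: a pair whose
# second element is itself a pair.
def cons(x, y): return (x, y)
def car(p): return p[0]
def cdr(p): return p[1]

triple = cons(1, cons(2, 3))
print(car(triple))        # 1
print(car(cdr(triple)))   # 2
print(cdr(cdr(triple)))   # 3
```

This works only because pairs are closed: the thing cons puts together can itself be a pair.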
OK. Let's break. [MUSIC PLAYING]0:56:55
All right, well, we've just gone off and done a couple of simple examples of data abstraction. Now I want to do something more complicated.0:57:05
We're going to talk about what it means. And this will be harder, because it's always much harder in computer programming to talk about what something means than to go off and do it.0:57:16
But let's go back to almost the very beginning. Let's go back to the point where I said,0:57:25
we just assumed that there were procedures, make-RAT, and numer, and denom.0:57:38
Let's go back to where we had this, at the very beginning, constructors and selectors, and went off and defined the rational number arithmetic.0:57:47
And remember, I said at that point we were sort of done, except for George. Well, what is it that we'd actually done at that point? What was it that was done?0:57:59
Well, what I want to say is, what was done after we'd implemented the operations and terms of these, was that we had defined a rational number0:58:08
representation in terms of abstract data.0:58:17
What do I mean by abstract data? Well, the idea is that at that point, when we had our +RAT and our *RAT,0:58:28
that any implementation of make-RAT, and numerator, and denominator that George supplied us with,0:58:38
could be the basis for a rational number representation. Like, it wasn't our concern where you divided through to get the greatest common divisor, or any of that.0:58:48
So the idea is that what we built is a rational arithmetic system that would sit on top of any representation.0:58:57
What do I mean by any representation? I mean, certainly it can't be the case that all I mean is George can reach in a bag and pull out three arbitrary procedures and say, well, fine,0:59:09
now that's the implementation. That can't be what I mean. What I've got to mean is that there's0:59:18
some way of saying whether three procedures are going to be suitable as a basis for rational number representation. If we think about it, what suitable0:59:29
might mean is if I have to assume something like this, I have to say that if x is the result of say,0:59:39
doing make-RAT of n and d, then the numerator of x divided0:59:59
by the denominator of x is equal to n over d.1:00:09
See, what that is is that's George's contract. What we mean by writing a contract for rational numbers, if you think about it, this is the right thing.1:00:18
And the two ones we showed do the right thing. See, if I'm taking out greatest common divisors, it doesn't matter whether I take them out or not,1:00:27
or the place where I take them, because the idea is I'm going to divide through. But see, this is George's contract. So what we really say to George is your business is to go off and find us1:00:39
three procedures, make-RAT, and numerator, and denominator, that fulfill this contract for any choice of n and d. And that's what we mean by we can use that1:00:50
as the basis for a rational number representation. And other than that, it fulfills this contract. We don't care how he does it.1:00:59
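George's contract can even be checked mechanically. This sketch uses Python's Fraction purely as a referee for the equality of n/d, and never looks below the abstraction layer:

```python
from fractions import Fraction
from math import gcd

# One of George's candidate implementations (the reducing one).
def make_rat(n, d):
    g = gcd(n, d)
    return (n // g, d // g)

def numer(x): return x[0]
def denom(x): return x[1]

# The contract: if x = make_rat(n, d), then
# numer(x) / denom(x) must equal n / d.
for n, d in [(1, 2), (6, 8), (10, 4)]:
    x = make_rat(n, d)
    assert Fraction(numer(x), denom(x)) == Fraction(n, d)

print("contract holds")
```

Both representations shown in the lecture pass this check, which is exactly why either is a valid basis for the rational number system.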
It's not our business. It's below the layer of abstraction. In fact, if we want to say, what is a rational number really?1:01:13
See, what's it really, without having to talk about going below the layer of abstraction, what we're forced into saying is a rational number really is sort of this axiom, is three procedures,1:01:27
make-RAT, numerator, and denominator, that satisfy this axiom. In some sense, abstractly, that's what a rational number is really.1:01:41
That's sort of easy words to listen to, because what you have in your head, of course, is well, for all this thing about saying that's what a rational number is really,1:01:50
you actually just saw that we built rational numbers. See, what we really did is we built rational numbers1:02:03
on top of pairs. So for all I'm saying abstractly, we can say a rational number really is just this axiom.1:02:15
You can listen to that comfortably, because you're saying, well, yeah, but really it's actually pairs, and I'm just annoying you by trying to be abstract.1:02:24
Well, let me, as an antidote for that, let me do something that I think is really going to terrify you. I mean, it's really going to bring1:02:33
you face to face with the sort of existential reality of this abstraction that we're talking about. And what I'm going to talk about is, what are pairs really?1:02:45
See, what did I tell you about pairs? I tricked you, right? I said that Lisp has this primitive called cons that builds pairs. But what did I really tell you about?1:02:56
If you go back and said, let's look on this slide, all I really told you about pairs is that there happens to be this property, these properties1:03:05
of cons, car, and cdr. And all I really said about pairs is that there's a thing called cons, and a thing called car, and a thing called cdr.1:03:14
And it is the case that if I build cons of x, y and take car of it, I get x. And if I build cons of x, y and get cdr of it, I get y.1:03:25
And even though I lulled you into thinking that there's something in Lisp that does that, so you pretended you knew1:03:34
what it was, in fact, I didn't tell you any more about pairs than this tells you about rational numbers. It's just some axiom for pairs.1:03:44
Well, to drive that home, let me really scare you, and show you what we might build pairs in terms of.1:03:56
And what you're going to see is that we can build rational numbers, and line segments, and vectors, and all of this stuff in terms of pairs, and we're going to see below here that pairs can1:04:06
be built out of nothing at all. Pure abstraction. So let me show you on this slide an implementation1:04:17
of cons, car, and cdr. And we'll look at it again in a second, but notice that their procedure definitions of cons, car,1:04:26
and cdr, you don't see any data in there, what you see is a lambda. So cons here is going to return--1:04:38
is a procedure that returns a procedure, just like average-damp. Cons of a and b returns a procedure of an argument1:04:49
called pick, and it says, if pick is equal to 1, I'm going to return a, and if pick is equal to 2,1:04:58
I'm going to return b, and that's what cons is going to be. Car of a thing x, car of a pair x,1:05:10
is going to be x applied to 1. And notice that makes sense. You might not understand why or how I'm doing such a thing, but at least it makes sense, because the thing constructed1:05:19
by cons is a procedure, and car applies that to 1. And similarly, cdr applies that thing to 2.1:05:29
OK, now I claimed that this is a representation of cons, car, and cdr, and notice there's no data in it. All right, it's built out of air. It's just procedures.1:05:39
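The slide's implementation carries over to Python almost word for word: cons returns a procedure of one argument, and car and cdr simply apply it:

```python
# Pairs built out of "air": no data objects, only procedures.
def cons(a, b):
    def pair(pick):
        if pick == 1:
            return a
        if pick == 2:
            return b
    return pair

def car(x):
    return x(1)   # apply the pair-procedure to 1

def cdr(x):
    return x(2)   # apply it to 2

print(car(cons(37, 49)))  # 37
print(cdr(cons(37, 49)))  # 49
```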
There's no data objects at all in that representation. Well, what could that possibly mean?1:05:49
Well, if you really believe this stuff, then you have to believe that in order to show that that's a representation for cons, car,1:05:59
and cdr, all I have to do is show that it satisfies the axiom. See, all I should have to convince you of is, for example, that gee, that car of cons of 37 and 491:06:22
is 37 for arbitrary values of 37 and 49. And cdr the same way.1:06:32
See, if I really can demonstrate to you that that weird procedure definition, in terms of air, has the property that it satisfies this,1:06:41
then you just have to grant me that that is a possible implementation of cons, car, and cdr, on which I can build everything else. Well, let's look at that.1:06:50
And this will be practice in the substitution model.1:06:59
How could we check this? We sort of know how to do that. It's just the same substitution model. Let's look. We start out, and we say, what's car of cons of 37 and 49?1:07:11
What do we do? Cons is some procedure. Its value is cons was a procedure of a and b. The thing returned by cons is its procedure body1:07:23
with 37 and 49 substituted for the parameters. It'll be 37 substituted for a and 49 substituted for b.1:07:32
So this expression has the same meaning as this expression. It's car of, and the body of cons was this thing that started with lambda.1:07:43
And it says, so if pick is equal to 1, where pick is this other argument, if pick is equal to 1, it's 37, that's where a was, and if pick is equal to 2, it's 49.1:07:55
So that's the first step. I'm just going through mechanical substitution. And remember, at this point in the course, if you're confused about what things mean, go mechanically through the substitution model.1:08:05
Well, what is this reduced to? Car said, take your argument, which in this case is this,1:08:15
and apply it to 1. That was the definition of car. So if I look at car, if I do that, the answer is, well, it's that argument, this was the argument to car,1:08:25
applied to 1. Well, what does that mean? I take 1, and I substitute it in the body here for this value of pick, which1:08:36
is the name of the argument, what do I get? Well, I get the thing that says if 1 equals 1 it's 37, and if 1 equals 2 it's 49, so the answer's 37.1:08:46
And similarly, if I'd taken cdr, that would apply it to 2, and I'd get 49. So you see, what I've demonstrated is that that completely weird implementation of cons, car,1:08:57
and cdr, satisfies the axioms. So it's a perfectly valid way of building, in fact, all of the data objects we're going to see in Lisp. So they all, if you like, can be built1:09:07
on sort of existential nothing. And as far as you know, that's how it works. You couldn't tell. If all you're ever going to do with pairs1:09:17
is construct them with cons and look at them with car and cdr, you couldn't possibly tell how this thing works. Now, it might give you a sort of warm feeling inside if I say,1:09:26
well, yeah, in fact, for various reasons there happen to be primitives called cons, car, and cdr, and if it's too scary, if this kind of stuff is too scary, you don't have to look inside of it.1:09:36
So that might make you feel better, but the point is, it really could work this way, and it wouldn't make any difference to the system at all.1:09:46
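The claim that pairs can be built from nothing but procedures works in any language with first-class functions. Here is a rough translation of the blackboard definitions into Python; the translation itself, and the error check for a bad argument, are mine, not part of the lecture:

```python
def cons(a, b):
    # The "pair" is just a procedure that remembers a and b
    # and hands one of them back when asked.
    def pick(i):
        if i == 1:
            return a
        if i == 2:
            return b
        raise ValueError("pick must be 1 or 2")
    return pick

def car(p):
    return p(1)  # ask the pair-procedure for its first part

def cdr(p):
    return p(2)  # ask it for its second part

# The axioms checked by the substitution-model argument:
assert car(cons(37, 49)) == 37
assert cdr(cons(37, 49)) == 49
```

Each call to cons returns a distinct procedure object carrying its own a and b, which is why two different pairs never get confused.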
So in some sense, we don't need data at all to build these data abstractions. We can do everything in terms of procedures. OK, well, why did I terrify you in this way?1:09:57
First, I really want to reinforce this idea of abstraction, that you really can do these things abstractly.1:10:06
Secondly, I want to introduce an idea we're going to see more and more of in this course, which is we're going to blur the line between what's data1:10:17
and what's a procedure. See, in this funny implementation it turned out that cons of something happened to be represented in terms of a procedure,1:10:27
even though we think of it as data. While here that's sort of a mathematical trick, but one of the things we'll see is1:10:36
that a lot of the very important programming techniques that we're going to get to sort of depend very crucially on blurring this traditional line between what you consider a procedure1:10:47
and what you consider data. We're going to see more and more of that, especially next time. OK, questions? AUDIENCE: If you asked the system1:10:56
to print a, what would happen? PROFESSOR: The question is, what would happen if I asked the system to print a.1:11:05
Given this representation, you already know the answer. The answer is compound procedure a, just like last time.1:11:21
It'd say compound procedure. It might say a little bit more. It might say compound procedure lambda or something or other, depending on details of how I named it.1:11:31
But it's a procedure. And the only reason for that is I haven't told the system anything special about how to print such things.1:11:40
Now, it's in fact true that with the actual implementation of cons that happens to be built into the system, it would print something else. It would print, say, this is a pair.1:11:53
AUDIENCE: When you define cons, and then you pass it into values, how does it know where to look for the cons, because you can use cons1:12:05
over and over again? How does it know where to look to know which a and b it's supposed to pull back out? I don't know if I'm expressing that quite right.1:12:17
Where is it stored? PROFESSOR: OK, the question is, I sort of have a cons with a 37 and a 49, and I might make another cons1:12:27
with a 1 and a 2, and I might have one called a, and I might have one called b. And the question is, how does it know? And why don't they get confused? And that's a very good question.1:12:40
See, you have to really believe that the procedures are objects. It's sort of like saying-- let's try another simpler example.1:12:49
Suppose I ask for the square root of 5,1:12:58
and then I ask for the square root of 20. You're probably not the least bit1:13:07
bothered that I can take square root and apply it to 5, and then I can take square root and apply it to 20. And there's sort of no issue, gee,1:13:16
doesn't it get confused about whether it's working on 5 or 20? There's no issue about that because you're thinking of a procedure which goes off and does something.1:13:26
Now, in some sense you're asking me the same question. But it's really bothering you, and it's bothering you for a really good reason. Because when I write that, you're saying gee, this is,1:13:36
I know, sort of a procedure. But it's not a procedure that's just running. It's just sort of a procedure sitting there. And how can it be that sometimes this procedure has 37 and 49,1:13:46
and there might be another one which has 5 and 6 in there, and why don't they get confused? So there's something very, very important that's bothering you.1:13:58
And it's really crucial to what's going on. We're suddenly saying that procedures are not just the act of doing something.1:14:08
Procedures are conceptual entities, objects, and if I built cons of 37 and 49, that's a particular procedure that sits there.1:14:18
And it's different from cons of 3 and 4. That's another procedure that sits there. AUDIENCE: Both of them exist independently. PROFESSOR: And exists independently. AUDIENCE: And they both can be referenced by car and cdr.1:14:28
PROFESSOR: And they both would be referenced by car and cdr. Just like I could increment this, and I could increment that.1:14:38
They're objects. And that's sort of where we're going. See, the fact that you're asking the question shows that you're really starting to think about the implications of what's going on.1:14:47
It's the difference between saying a procedure is just the act of doing something. And a procedure is a real object that has existence.1:14:56
AUDIENCE: So when the procedure gets built, the actual values are now substituted for a and b-- PROFESSOR: That's right. AUDIENCE: And then that procedure exists as lambda, and pick is what's actually passed in.1:15:07
PROFESSOR: Yes, when cons gets called, and the result of cons is a new procedure that's constructed, that new procedure has an argument that's called pick.1:15:17
AUDIENCE: But it no longer has an a and b. The a and b are the actual values that are passed through. PROFESSOR: And it has-- right, according to the substitution model, what it now has is not those arbitrary names a and b,1:15:26
it somehow has that 37 and 49 in there. But you're right, that's a hard thing to think about it, and it's different from the way you've1:15:35
been thinking about procedures. AUDIENCE: And if I have again cons of 37 and 49, it's a different object? PROFESSOR: And if you make another cons of 37 and 49,1:15:51
you're into a wonderful philosophical problem, which is going to be what the lecture about halfway through this course is about.1:16:00
Which is, if I cons 37 and 49, and I do it again, is that the same thing, or is it a different thing? And how could you tell? And when could it possibly matter?1:16:10
And that's sort of like saying, is that the same thing as this?1:16:21
Or is this the same thing as that? It's the same kind of question. And that's a very, very deep question. And I can't answer in less than an hour.1:16:30
But we will.0:00:00
Lecture 3A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:21
PROFESSOR: Well, last time we talked about compound data, and there were two main points to that business. First of all, there was a methodology of data0:00:31
abstraction, and the point of that was that you could isolate the way that data objects are used from the way0:00:40
that they're represented: this idea that there's this guy, George, and you go out and make a contract with him; and it's his business to represent the data objects; and at the moment you are using them, you don't think0:00:49
about George's problem. And then secondly, there was this particular way that Lisp has of gluing together things to form objects called pairs,0:01:00
and that's done with cons, car and cdr. And the way that cons, car and cdr are implemented is basically irrelevant. That's sort of George's problem of how0:01:09
to build those things. It could be done as primitives. It could be done using procedures in some weird way, but we're not going to worry about that. And as an example, we looked at rational number arithmetic.0:01:20
We looked at vectors, and here's just a review of vectors. Here's an operation that takes the sum of two vectors, so we want to add this vector, v1, and this vector, v2, and0:01:32
we get the sum. And the sum is the vector whose coordinates are the sum of the coordinates of the pieces you're adding.0:01:41
So I can say, to define vector addition, right: to add two vectors I make a vector, whose x coordinate is the sum of the0:01:50
two x coordinates, and whose y coordinate is the sum of the two y coordinates. And then similarly, we could have an operation that scales0:02:03
vectors, so here's a procedure scale that multiplies a vector, v, by some number, s.0:02:13
So here's v, v goes from there to there and I scale v, and I get a vector in the same direction that's longer. And again, to scale a vector, I multiply the successive0:02:23
coordinates. So I make a vector, whose x coordinate is the scale factor times the x coordinate and whose y coordinate is the scale factor times the y coordinate.0:02:34
So those are two operations that are implemented using the representation of vectors. And the representation of vectors, for instance, is something that we can build in terms of pairs.0:02:45
So George has gone out and implemented for us make-vector and x coordinate and y coordinate, and this could be done, for instance, using cons, car and cdr; and notice0:03:04
here, I wrote this in a slightly different way. The procedures we've seen before, I've said something like say, make-vector of x and y: cons of x and y.0:03:16
And here I just wrote make-vector cons. And that means something slightly different. Previously we'd say, define make-vector to be a procedure that takes two arguments, x and y, and does0:03:26
cons of x and y. And here I am saying define make-vector to be the thing that cons is, and that's almost the same as the other0:03:38
way we've been writing things. And I just want you to get used to the idea that procedures can be objects, and that you can name them.0:03:48
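The two vector operations, and George's pair-based representation underneath them, can be sketched like this in Python. Tuples stand in for cons pairs; the names make_vector, xcor, and ycor follow the lecture, and the rest of the translation is mine:

```python
# George's side of the contract: a vector is a pair.
def make_vector(x, y):
    return (x, y)

def xcor(v):
    return v[0]

def ycor(v):
    return v[1]

# The user's side: operations written only in terms of the
# constructor and selectors, never the representation.
def add_vect(v1, v2):
    return make_vector(xcor(v1) + xcor(v2), ycor(v1) + ycor(v2))

def scale_vect(s, v):
    return make_vector(s * xcor(v), s * ycor(v))
```

If George later changes the representation, only make_vector, xcor, and ycor need to change; add_vect and scale_vect don't.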
OK, well there's vector representation, and again, if that was all there was to it, this would all be pretty boring. And the point is, remember, that you can use cons to glue0:04:00
together not just numbers to form pairs, but to glue together arbitrary things. So for instance, if we'd like to represent a line segment,0:04:11
say the line segment that goes from a certain vector: say, the segment from the vector 2,3 to the point represented0:04:27
by the vector 5,1. If we want to represent that line segment, then we can build that as a pair of pairs.0:04:41
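That pair-of-pairs construction is one more application of the same glue. A sketch in Python, again with tuples playing the role of pairs; the constructor and selector names are mine, modeled on the lecture's constructor/selector pattern:

```python
def make_vector(x, y):
    return (x, y)

# A segment is a pair whose car and cdr are themselves pairs.
def make_segment(start, end):
    return (start, end)

def seg_start(s):
    return s[0]

def seg_end(s):
    return s[1]

# The segment from the vector (2, 3) to the vector (5, 1):
s = make_segment(make_vector(2, 3), make_vector(5, 1))
```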
So again, we can represent line segments. We can make a constructor that makes a segment using cons, selects out the start of a segment, selects out the end0:04:50
point of the segment; and then if we actually look at that, if we peel away the abstraction layers, and say0:05:00
what's that really is a pair of pairs, we'd say well that's a pair. Here's the segment.0:05:10
It's car, right, it's car pointer is a pair, and it's cdr is also a pair, and then what the car is-- here's the0:05:21
car, that itself is a pair of 2 and 3. And similarly the cdr is a pair of 5 and 1. And let me remind you again, that a lot of people have some0:05:30
idea that if I'd taken this arrow and somehow written it to point down, that would mean something else. That's irrelevant. It's only how these are connected and not whether this0:05:40
arrow happens to go vertically or horizontally. And again just to remind you, there was0:05:49
this notion of closure. See, closure was the thing that allowed us to start0:06:02
building up complexity, that didn't trap us in pairs. Particularly what I mean is the things that we make,0:06:12
having combined things using cons to get a pair, those things themselves can be combined using cons to make0:06:21
more complicated things. Or as a mathematician might say, the set of data objects in Lisp is closed under the operation of forming pairs.0:06:34
That's the thing that allows us to build complexity. And that seems obvious, but remember, a lot of the things in the computer languages that people use are not closed. So for example, forming arrays in BASIC and Fortran is not a0:06:47
closed operation, because you can make an array of numbers or character strings or something, but you can't make an array of arrays. And when you look at means of combination, you should be0:06:59
asking yourself whether things are closed under that means of combination. Well in any case, because we can form pairs of pairs, we0:07:09
can start using pairs to glue things together in all sorts of different ways. So for instance if I'd like to glue together the four things, 1, 2, 3 and 4, there are a lot of ways I can do it.0:07:20
I could, for example, like we did with that line segment, I could make a pair that had a 1 and a 2 and0:07:32
a 3 and a 4, right? Or if I liked, I could do something like this. I could make a pair, whose first thing is a pair, whose0:07:46
car is 1, and its cdr is itself a pair that has the 2 and the 3, and then I could put the 4 up here.0:07:56
So you see, there are a lot of different ways that I can start using pairs to glue things together, and so it'll be a good idea to establish some kind of conventions,0:08:07
right, that allow us to deal with this thing in some conventional way, so we're not constantly making an ad hoc choice.0:08:16
And Lisp has a particular convention for representing a sequence of things as, essentially, a chain of pairs,0:08:26
and that's called a list. And what a list is is essentially just a convention0:08:39
for representing a sequence. I would represent the sequence 1, 2, 3 and 4 by a sequence of pairs.0:08:48
I'd put 1 here and then the cdr of this would point to another pair whose car was the next thing in the sequence,0:09:01
and the cdr would point to another pair whose car was the next thing in the sequence-- so there's 3-- and then another one. So for each item in the sequence, I'll get a pair.0:09:15
And now there are no more, so I put a special marker that means there's nothing more in the list. OK, so that's a0:09:28
conventional way to glue things together if you want to represent a sequence, right. And what it is is a bunch of pairs, the successive cars of0:09:42
each pair are the items that you want to glue together, and the cdr pointer points to the next pair. Now if I actually wanted to construct that, what I would0:09:52
type into Lisp is this: I'd actually construct that as saying, well this thing is the cons of 1 onto the cons of 20:10:07
onto the cons of 3 onto the cons of 4 onto, well, this thing nil. And what nil is is a name for the end-of-list marker.0:10:21
It's a special name, which means this is the end of the list. OK, so that's how I would actually construct that.0:10:37
Of course, it's a terrible drag to constantly have to write something like the cons of 1 onto the cons of 2 onto the cons of 3, whenever you want to make this thing. So Lisp has an operation that's called list, and list0:10:54
is just an abbreviation for this nest of conses. So I could say, I could construct that by saying that is the list of 1, 2, 3 and 4.0:11:08
And all this is is another way, a piece of syntactic sugar, a more convenient way for writing that chain of conses-- cons of cons of cons of cons onto nil.0:11:18
So for example, I could build this thing and say, I'll define 1-TO-4 to be the list of 1, 2, 3 and 4.0:11:48
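In a language without Lisp's built-in pairs, the same chain-of-pairs convention is easy to mimic. In this Python sketch, None stands in for nil and make_list plays the role of the list abbreviation; both choices are mine, not part of the lecture:

```python
nil = None  # stand-in for the end-of-list marker

def cons(a, b):
    return (a, b)

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def make_list(*items):
    # list of 1, 2, 3, 4 is just an abbreviation for
    # cons of 1 onto cons of 2 onto cons of 3 onto cons of 4 onto nil.
    result = nil
    for item in reversed(items):
        result = cons(item, result)
    return result

one_to_four = make_list(1, 2, 3, 4)
assert one_to_four == cons(1, cons(2, cons(3, cons(4, nil))))
# car of the cdr of the cdr picks out the third element:
assert car(cdr(cdr(one_to_four))) == 3
```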
OK, well notice some of the consequences of using this convention. First of all if I have this list, this 1, 2, 3 and 4, the0:11:57
car of the whole thing is the first element in the list, right. How do I get 2? Well, 2 would be the car of the cdr of this thing 1-TO-4,0:12:21
it would be 2, right. I take this thing, I take the cdr of it, which is this much,0:12:30
and the car of that is 2, and then similarly, the car of the cdr of the cdr of 1-TO-4, cdr, cdr, car--0:12:48
would give me 3, and so on. Let's take a look at that on the computer screen for a second.0:12:57
I could come up to Lisp, and I could type define 1-TO-4 to be0:13:07
the list of 1, 2, 3 and 4, right. And I'll tell that to Lisp, and it says, fine, that's the0:13:19
definition of 1-TO-4. And I could say, for instance, what's the car of the cdr of0:13:28
the cdr of 1-TO-4, close paren, close paren.0:13:38
Right, so the car of the cdr of the cdr would be 3. Right, or I could say, what's 1-TO-4 itself.0:13:51
And you see what Lisp typed out is 1, 2, 3, 4, enclosed in parentheses, and this notation, typing the elements of the list enclosed in parentheses is Lisp's0:14:02
conventional way for printing back this chain of pairs that represents a sequence. So for example, if I said, what's the cdr of 1-TO-4,0:14:19
that's going to be the rest of the list. That's the thing pointed to by the first pair, which is, again, a sequence that starts off with 2.0:14:28
Or for example, I go off and say, what's the cdr of the cdr of 1-TO-4; then that's 3,4.0:14:44
Or if I say, what's the cdr of the cdr of the cdr of the cdr0:14:58
of 1-TO-4, and I'm down there looking at the end-of-list0:15:07
pointer itself, and Lisp prints that as just open paren, close paren. You can think of that as a list with nothing in there. All right, see at the end what I did there was I looked at0:15:16
the cdr of the cdr of the cdr of the cdr of 1-TO-4, and I'm just left with the end-of-list pointer itself. And that gets printed as open close.0:15:34
All right, well that's a conventional way you can see for working down a list by taking successive cdrs of things.0:15:43
It's called cdr-ing down a list. And of course it's pretty much of a drag to type all those cdrs by hand. You don't do that. You write procedures that do that.0:15:53
And in fact one very, very common thing to do in Lisp is to write procedures that, sort of, take a list of things and0:16:02
do something to every element in the list, and return you a list of the results. So what I mean for example, is I might write a procedure called Scale-List, and with Scale-List I might say I want0:16:18
to scale by 10 the entire list 1-TO-4, and that would return0:16:27
for me the list 10, 20, 30, 40.0:16:36
[UNINTELLIGIBLE PHRASE] Right, it returns a list, and well you can see that there's0:16:46
going to be some kind of recursive strategy for doing it. How would I actually write that procedure? The idea would be, well if you'd like to build up a list0:16:56
where you've multiplied every element by 10, what you'd say is well you imagine that you'd taken the rest of the list--0:17:06
right, the thing represented by the cdr of the list, and suppose I'd already built a list where each of these was multiplied by 10--0:17:16
that would be Scale-List of the cdr of the list. And then all I have to do is multiply the car of the list by 10, and0:17:25
then cons that onto the rest, and I'll get a list. Right and then similarly, to have scaled the cdr of the list, I'll scale the cdr of that and cons onto that 20:17:35
multiplied by 10. And finally when I get all the way down to the end, and I only have this end-of-list pointer. All right, this thing whose name is nil-- well I just return an end-of-list pointer.0:17:45
So there's a recursive strategy for doing that. Here's the actual procedure that does that. Right, this is an example of the general strategy of cdr-ing down a list and so-called cons-ing0:17:56
up the result, right. So to scale a list l by some scale factor s, what do I do?0:18:06
Well there's a test, and Lisp has the predicate called null. Null means is this thing the end-of-list pointer, or another way to think of that is are there any elements in0:18:16
this list, right. But in any case if I'm looking at the end-of-list pointer, then I just return the end-of-list pointer. I just return nil, otherwise I cons together the result of0:18:32
doing what I'm going to do to the first element in the list, namely taking the car of l and multiplying it by s, and I cons that onto recursively scaling the rest of the list.0:18:50
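That recursive strategy, cdr-ing down the list and cons-ing up the result, looks like this in Python. Tuples as pairs and None as the end-of-list marker are conventions of this sketch, not of the lecture:

```python
nil = None

def cons(a, b):
    return (a, b)

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def null(p):
    return p is nil  # are we looking at the end-of-list pointer?

def scale_list(s, l):
    # At the end of the list, return the end-of-list pointer;
    # otherwise scale the car and cons it onto the scaled cdr.
    if null(l):
        return nil
    return cons(s * car(l), scale_list(s, cdr(l)))
```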
OK, so again, the general idea is that you recursively do something to the rest of the list, to the cdr of the list, and then you cons that onto actually doing something to0:18:59
the first element of the list. When you get down to the end here, you return the end-of-list pointer, and that's a general pattern for doing something to a list. Well of0:19:16
course you should know by now that the very fact that there's a general pattern there means I shouldn't be writing this procedure at all. What I should do is write a procedure that's the general0:19:25
pattern itself that says, do something to everything in the list and define this thing in terms of that. Right, make some higher order procedure, and here's the higher order procedure that does that.0:19:34
It's called MAP, and what MAP does is it takes a list, takes a list l, and it takes a procedure p, and it returns0:19:45
the list of the elements gotten by applying p to each successive element in the list. All right, so p of e1, p of e2, up to p of en.0:19:56
Right, so I think of taking this list and transforming it by applying p to each element. And you see all this procedure is is exactly the general0:20:06
strategy I said. Instead of multiply by 10, it's do the procedure. If the list is empty, return nil. Otherwise, apply p to the first element of the list.0:20:17
Right, apply p to car of l, and cons that onto the result of applying p to everything in the cdr of the list, so that's0:20:26
a general procedure called MAP. And I could define Scale-List in terms of MAP.0:20:39
Let me show you that first. But I could say another way to define Scale-List is just to MAP along the list with the procedure which takes an item0:20:53
and multiplies it by s. Right, so this is really the way I should think about scaling the list: build that actual recursion into the0:21:04
general strategy, not into every particular procedure I write. And of course, one of the values of doing this is that you start to see commonality. Right, again you're capturing general patterns of usage.0:21:16
For instance, if I said MAP, the square procedure, down this list 1-TO-4, then I'd end up with 1, 4, 9 and 16.0:21:32
Right, or if I said MAP down this list, lambda of x plus0:21:42
x 10, if I MAP that down 1-TO-4, then I'd get the list0:21:51
where everything had 10 added to it: right, so I'd get 11, 12, 13, 14.0:22:00
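Capturing the general pattern as MAP, and then getting Scale-List, squaring, and adding 10 for free, might look like this. Same sketch conventions as the earlier Python fragments (tuples as pairs, None as nil); lisp_map is named that way only to avoid shadowing Python's built-in map:

```python
nil = None

def cons(a, b):
    return (a, b)

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def lisp_map(p, l):
    # Apply p to every element, consing up a list of the results.
    if l is nil:
        return nil
    return cons(p(car(l)), lisp_map(p, cdr(l)))

def scale_list(s, l):
    return lisp_map(lambda x: s * x, l)

one_to_four = cons(1, cons(2, cons(3, cons(4, nil))))
assert scale_list(10, one_to_four) == cons(10, cons(20, cons(30, cons(40, nil))))
assert lisp_map(lambda x: x * x, one_to_four) == cons(1, cons(4, cons(9, cons(16, nil))))
assert lisp_map(lambda x: x + 10, one_to_four) == cons(11, cons(12, cons(13, cons(14, nil))))
```

Once scaling is written as a MAP, you stop thinking about the control structure at all; the recursion lives in one place.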
And you can see that's going to be a very, very common idea: doing something to every element in the list. One thing you might think about is writing MAP in an0:22:11
iterative style. The one I wrote happens to evolve a recursive process, but we could just as easily have made one that evolves an iterative process. But see the interesting thing about it is that once you0:22:21
start thinking in terms of MAP-- see, once you say scale is just MAP, you stop thinking about whether it's iterative or recursive, and you just say, well there's this aggregate, there's this list,0:22:32
and what I do is transform every item in the list, and I stop thinking about the particular control structure in order. That's a very, very important idea, and it, I guess it0:22:45
really comes out of APL. It's, sort of, the really important idea in APL that you stop thinking about control structures, and you start thinking about operations on aggregates, and then about0:22:55
halfway through this course, we'll see when we talk about something called stream processing, how that view of the world really comes into its glory. This is just, sort of, a cute idea.0:23:05
But we'll see much more applications of that later on. Well let me mention that there's something that's very similar to MAP that's also a useful idea, and that's--0:23:17
see, MAP says I take a list, I apply something to each item, and I return a list of the successive values.0:23:26
There's another thing I might do, which is very, very similar, which is take a list and some action you want to do and then do it to each item in the list in sequence.0:23:36
Don't make a list of the values, just do this particular action, and that's something that's very much like MAP.0:23:45
It's called for-each, and for-each takes a procedure and a list, and what it's going to do is do something to every item in the list. So basically what it does: it says if the0:23:56
list is not empty, right, if the list is not null, then what I do is, I apply my procedure to the first item in0:24:05
the list, and then I do this thing to the rest of the list. I apply for-each to the cdr of the list.0:24:15
All right, so I do it to the first of the list, do it to the rest of the list, and of course, when I call it recursively, that's going to do it to the rest of the rest of the list and so on.0:24:24
And finally, when I get done, I have to just do something to say I'm done, so we'll return the message "done." So that's very, very similar to MAP. It's mostly different in what it returns.0:24:35
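For-each in the same Python sketch. The action here appends to a plain Python list just so we can see that it ran; returning the "done" message follows the lecture:

```python
nil = None

def cons(a, b):
    return (a, b)

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def for_each(proc, l):
    # Do proc to the first item, then do the whole thing to the
    # rest of the list; no list of values is consed up.
    if l is nil:
        return "done"
    proc(car(l))
    return for_each(proc, cdr(l))

seen = []
result = for_each(seen.append, cons(1, cons(2, cons(3, nil))))
assert seen == [1, 2, 3]
assert result == "done"
```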
And so for example, if I had some procedure that printed things on the screen, if I wanted to print everything in the list, I could say for-each, print this list. Or0:24:47
if I had a list of figures, and I wanted to draw them on the display, I could say for-each, display on the screen this figure.0:24:57
Let's take questions. AUDIENCE: Does it create a new copy with something done to it, unless you explicitly tell it to do that?0:25:06
Is that correct? PROFESSOR: Right. Yeah, that's right. For-each does not create a list. It just sort of does something.0:25:15
So if you have a bunch of things you want to do and you're not worried about values like printing something, or drawing something on the screen, or ringing the bell on the terminal, or something,0:25:24
you can say for-each, you know, do this for-each of those things in the list, whereas MAP actually builds you this new collection of values that you might want to use. It's just a subtle difference between them.0:25:34
AUDIENCE: Could you write MAP using for-each, so that you did some sort of cons or something to build the list back up? PROFESSOR: Well, sort of. I mean, I probably could.0:25:44
I can't think of how to do it right offhand, but yeah, I could arrange something. AUDIENCE: The vital difference between MAP and for-each is one is recursive and the other is not in the sense you0:25:57
defined early yesterday, I believe. PROFESSOR: Yeah, about MAP and for-each and recursion. Yeah, that's a good point.0:26:09
For the MAP procedure I wrote, that happens to be a recursive process. And the reason for that is that when you've done this thing to the rest of the list, you're waiting for that value0:26:19
so that you can stick it on to the beginning of the list, whereas for-each doesn't really have any values to wait for. So that turns out to be an iterative process. That's not fundamental. I could have defined MAP so that it evolves an0:26:30
iterative process. I just didn't happen to. AUDIENCE: If you were to call for-each with a list that had embedded lists, I imagine it would work, right?0:26:43
It would give you the internal elements of each of those internal lists? PROFESSOR: OK, the question is if I [UNINTELLIGIBLE] for-each or MAP, for that matter, with a list that had0:26:54
lists in it-- although we haven't really looked at that yet-- would that work. The answer is yes in the sense I mean work and no in the0:27:04
sense that you mean work, because all that-- see if I give you a list, where hanging off here is, you0:27:16
know, is something that's not a number, maybe another list or you know, another cons or something, for-each just says do something to each item in this list. It goes down0:27:25
successively looking at the cdrs. AUDIENCE: OK. PROFESSOR: And as far as it's concerned, the first item in this list is whatever is hanging off here. AUDIENCE: Mhm. PROFESSOR: That might or might not be the right thing. AUDIENCE: So it wouldn't go down into the--0:27:35
PROFESSOR: Absolutely not. I could certainly write something else. What you're looking for is a common pattern of usage called tree recursion, where you take a list, and you actually go all the way down to what's0:27:46
called the leaves of the tree. And you could write such a thing, but that's not for-each and it's not MAP. Remember, these things are really being very simple minded.0:27:55
OK, no more questions? All right, let's break. [MUSIC PLAYING]0:28:42
PROFESSOR: What I'd like to do now is spend the rest of this time talking about one example, and this example, I think, pretty much summarizes everything that we've done up0:28:53
until now: all right, and that's list structure and issues of abstraction, and representation and capturing0:29:02
commonality with higher order procedures, and also is going to introduce something we haven't really talked about a lot yet-- what I said is the major third theme in this0:29:13
course: meta-linguistic abstraction, which is the idea that one of the ways of tackling complexity in engineering design is to build a suitable powerful language.0:29:27
You might recall what I said was pretty much the very most important thing that we're going to tell you in this course is that when you think about a language, you think0:29:39
about it in terms of what are the primitives; what are the means of combination--0:29:49
right, what are the things that allow you to build bigger things; and then what are the means of abstraction.0:30:01
How do you take those bigger things that you've built and put black boxes around them and use them as elements in making something even more complicated?0:30:12
Now the particular language I'm going to talk about is an example that was made up by a friend of ours0:30:21
called Peter Henderson. Peter Henderson is at the University of Stirling in Scotland.0:30:32
And what this language is about is making figures that sort of look like this.0:30:42
This is a woodcut by Escher called "Square Limit." You, sort of, see it has this complicated, kind of,0:30:52
recursive figure, where there's this fish pattern in the middle and things sort of0:31:02
bleed out smaller and smaller in self similar ways. Anyway, Peter Henderson's language was for describing0:31:11
figures that look like that and designing new ones that look like that and drawing them on a display screen.0:31:20
There's another theme that we'll see illustrated by this example, and that's the issue of what Gerry and I have0:31:31
already mentioned a lot: that there's no real difference, in some sense, between procedures and data. And anyway I hope by the end of this morning0:31:41
you will be completely confused about what the difference between procedures and data is, if you're not confused about that already.0:31:51
Well in any case, let's start describing Peter's language. I should start by telling you what the primitives are. This language is very simple because there's only one primitive.0:32:03
There's only one primitive, called a picture, and a picture is not quite what you think it is.0:32:12
Here's an example. This is a picture of George. The idea is that a picture in this language is going to be0:32:23
something that draws a figure scaled to fit a rectangle that you specify.0:32:33
So here you see in [? Saint ?] [? Lawrence's ?] outline of a rectangle, that's not really part of the picture, but the picture--0:32:43
you'll give it a rectangle, and it will draw this figure scaled to fit the rectangle. So for example, there's George, and here, this is also George.0:32:52
It's the same picture, right, just scaled to fit a different rectangle. Here's George as a fat kid.0:33:02
That's the same George. It's all the same figure. All of these three things are the same picture in this language. I'm just giving it different rectangles to scale itself in.0:33:16
OK, those are the primitives. That is the primitive. Now let's start talking about the means of combination and the operations.0:33:25
There is, for example, an operation called Rotate. And what Rotate does is, if I have a picture, say a picture0:33:35
that draws an "A" in some rectangle that I give it, the Rotate of that-- say the Rotate by 90 degrees would, if I give it a0:33:47
rectangle, draw the same image, but again, scaled to fit that rectangle.0:33:56
So that's Rotate by 90 degrees. There's another operation called Flip that can flip something, either horizontally or vertically. All right, so those are, sort of, operations, or you can0:34:06
think of those as means of combination of one element. I can put things together. There's a means of combination called Beside, and what Beside0:34:17
does: it'll take two pictures, let's say A and B--0:34:29
and by picture I mean something that's going to draw an image in a specified rectangle-- and what Beside will do--0:34:38
I have to say Beside of A and B-- Beside of two pictures-- and some number, s. And s will be a number between zero and one.0:34:50
And Beside will draw a picture that looks like this. It will take the rectangle you give it and scale its base by s. Say s is 0.5.0:35:00
And then over here it will draw-- it'll put the first picture, and over here it'll put the0:35:12
second picture. Or for instance if I gave it a different value of s, if I said Beside with a 0.25, it would do the same thing,0:35:27
except the A would be much skinnier. So it would draw something like that.0:35:38
So there's a means of combination Beside, and similarly there's an Above, which does the same thing except it puts them vertically instead of horizontally.0:35:47
Well let's look at that. All right, there's George and his kid brother, which is,0:35:58
right, constructed by taking George and putting him Beside0:36:10
the Above-- taking the empty picture, and there's a thing called the empty picture, which does the obvious thing-- putting the empty picture above a copy of George, and0:36:19
then putting that whole thing Beside George.0:36:28
Here's something called P which is, again, George Beside0:36:38
Flipping George, I think, horizontally in this case, and then Rotating the whole result 180 degrees and putting them Beside one another with the basic rectangle divided at0:36:50
0.5, right, and I can call that P. And then I can take P,0:36:59
and put it above the Flipped copy of itself, and I can call that Q.0:37:09
Notice how rapidly we've built up complexity, just in, you know, 15 seconds, you've gotten from George to that0:37:18
thing Q. Why is that? How were we able to do that so fast? The answer is the closure property.0:37:28
See, it's the fact that when I take a picture and put it Beside another picture, that's then, again, a picture that I can go and Rotate and Flip or put Above something else.0:37:39
Right, and when I take that element P, which is the Beside or the Flip or the Rotate of something, that's, again, a picture. Right, the world of pictures is closed under those means of0:37:49
combination. So whenever I have something, I can turn right around and use that as an element in something else. So maybe better than lists and segments, that just gives you0:37:59
an image for how fast you can build up complexity, because the operations are closed. OK, well before we go on with building more things, let's0:38:12
talk about how this language is actually implemented. The basic element that sits under the table here is a0:38:23
thing called a rectangle, and what a rectangle is going to be: it's a thing that's specified by an origin that's0:38:36
going to be some vector that says where the rectangle starts. And then there's going to be some other vector that I'm going to call the horizontal part of the rectangle, and0:38:49
another vector called the vertical part of the rectangle.0:39:00
And those three pieces are the elements: where the lower vertex is, how you get to the next vertex over here, and how you get to the vertex over there.0:39:09
The three vectors specify a rectangle. Now to actually build rectangles, what I'll assume0:39:18
is that we have a constructor called "make rectangle," or "make-rect," and selectors for horiz and vert and origin that0:39:37
get out the pieces of that rectangle. And well, you know a lot of ways you can do this now. You can do it by using pairs in some way or other, standard0:39:47
lists or not. But in any case, the implementation of these things, that's George's problem. It's just a data representation problem. So let's assume we have these rectangles to work with.0:39:58
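To make the rectangle abstraction concrete, here is a minimal sketch in Python (a modern translation, not the course's Scheme). The names make-rect, origin, horiz, and vert follow the lecture; representing everything as tuples is one arbitrary choice of representation -- that part is, as the lecture says, George's problem.

```python
# One possible data representation: a vector is an (x, y) tuple, and a
# rectangle is a triple of vectors (origin, horizontal part, vertical part).

def make_vect(x, y):
    return (x, y)

def make_rect(origin_v, horiz_v, vert_v):
    return (origin_v, horiz_v, vert_v)

# Selectors that get the pieces back out of a rectangle.
def origin(rect):
    return rect[0]

def horiz(rect):
    return rect[1]

def vert(rect):
    return rect[2]
```

Any other representation with the same constructor/selector contract would do just as well; nothing above this level cares.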
OK. Now the idea of this, remember what's got to happen. Somehow we have to worry about taking the figure and scaling0:40:10
it to fit some rectangle that you give it, that's the basic thing you have to arrange, that these pictures can do.0:40:22
How do we think about that? Well, one way to think about that is that any time I give you a rectangle, that defines, in some sense, a0:40:40
transformation from the standard square into that rectangle. Let me say what I mean. By the standard square, I'll mean something, which is a0:40:49
square whose coordinates are 0,0, and 1,0, and 0,1 and 1,1.0:41:01
And there's some sort of the obvious scaling transformation, which maps this to that and this to that,0:41:10
and sort of, stretches everything uniformly. So we take a line segment like this and end up mapping it to0:41:22
a line segment like that, so some point xy goes to some0:41:31
other point up there. And although it's not important, with a little vector algebra, you could write that formula. The thing that xy goes to, the point that xy goes to is0:41:43
gotten by taking the origin of the rectangle and then adding that as a vector to-- well, take x, the x coordinate, which is something0:41:54
between zero and one, multiply that by the horizontal vector of the rectangle; and take the y coordinate, which is also0:42:09
something between zero and one and multiply that by the vertical vector of the rectangle. That's just a little linear algebra.0:42:19
Anyway, that's the formula, which is the obvious transformation that takes things in the unit square into the interior of that rectangle.0:42:31
OK well, let's actually look at that as a procedure. So what we want is the thing which tells us that particular transformation that a rectangle defines.0:42:44
So here's the procedure. I'll call it coordinate-map. Coordinate-map is the thing that takes as its argument a rectangle and returns for you a procedure on points.0:43:00
Right, so for each rectangle you get a way of transforming a point xy into that rectangle. And how do you get it? Well I just-- writing in Lisp what I wrote there on the blackboard--0:43:10
I add to the origin of the rectangle the result of adding--0:43:20
I take the horizontal part of the rectangle; I scale that by the x coordinate of the point.0:43:29
I take the vertical vector of the rectangle. I scale that by the y coordinate of the point, and then add all those three things up.0:43:40
That's the procedure. That is the procedure that I'm going to apply to a point. And this whole thing is generated for each rectangle.0:43:53
So any rectangle defines a coordinate map, which is a procedure on points. OK.0:44:06
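As a sketch, that coordinate map can be written in Python (a translation of the Scheme idea, with vectors as tuples and a rectangle as an (origin, horiz, vert) triple; the helper names here are assumptions, not the course's code):

```python
def add_vect(a, b):
    return (a[0] + b[0], a[1] + b[1])

def scale_vect(s, v):
    return (s * v[0], s * v[1])

def coord_map(rect):
    """Given a rectangle (origin, horiz, vert), return the procedure on
    points that maps (x, y) in the unit square to
    origin + x*horiz + y*vert inside that rectangle."""
    origin_v, horiz_v, vert_v = rect
    def the_map(point):
        x, y = point
        return add_vect(origin_v,
                        add_vect(scale_vect(x, horiz_v),
                                 scale_vect(y, vert_v)))
    return the_map
```

For instance, coord_map(((1, 1), (2, 0), (0, 2))) sends (0, 0) to the origin (1, 1) and (1, 1) to the opposite corner (3, 3).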
All right, so for example, George here, my original George, might have been something that I specified by segments in the unit square, and then for each rectangle I0:44:20
give this thing, I'm going to draw those segments inside that rectangle. How actually do I do that?0:44:30
Well I take each segment in my original reference George that was specified, and to each of the end points of those0:44:40
segments, I apply the coordinate map of the particular rectangle I want to draw it in. So for example, this lower rectangle, this George-as-a-fat-kid rectangle, has its coordinate map.0:44:51
And if I want to draw this image, what I do is for each segment here, say for this segment, I transform that0:45:01
point by the coordinate map, transform that point by the coordinate map. That will give me this point and that point, and draw the segment between them.0:45:10
Right, that's the idea. Right, and if I give it a different rectangle like this one, that's a different coordinate map, so I get a different image of those line segments.0:45:19
Well how do we actually get a picture to start with? I can build a picture to start with out of a list of line segments initially. Here's a procedure that builds what I'll call a primitive0:45:31
picture, meaning one I, sort of, got that didn't come out of Beside or Rotate or something. It starts with a list of line segments, and now0:45:43
it does what I said. What's a picture have to be? First of all it's a procedure that's defined on rectangles.0:45:52
What does it do? It says for each-- this is going to be a list of line segments-- for each s, which is a segment in this0:46:02
list of segments, well it draws a line. What line does it draw? It gets the start point of that segment, transforms that0:46:16
by the coordinate map of the rectangle. That's the first new point it wants. Then it takes the endpoint of the segment, transforms that by the coordinate map of the rectangle, and then draws a0:46:27
line between. Let's assume drawline is some primitive that's built into the system that actually draws a line on the display. All right, so it transforms the endpoints by the coordinate map of the rectangle, draws a line0:46:37
between them, does that for each s in this list of segments.0:46:46
And now remember again, a picture is a procedure that takes a rectangle as argument. So when you hand it a rectangle, this is what it does: draws those lines.0:46:57
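A Python sketch of make-picture, using the same tuple representation as before; since there is no real drawline primitive here, this version returns the transformed segments, which a display routine could then draw:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def coord_map(rect):
    o, h, v = rect
    return lambda p: add_vect(o, add_vect(scale_vect(p[0], h),
                                          scale_vect(p[1], v)))

def make_picture(segments):
    """A picture is a procedure on rectangles: hand it a rectangle, and
    for each segment it transforms both endpoints by the rectangle's
    coordinate map. Here it collects the lines rather than drawing."""
    def the_picture(rect):
        m = coord_map(rect)
        return [(m(start), m(end)) for (start, end) in segments]
    return the_picture
```

So g = make_picture([((0, 0), (1, 1))]) is a picture: calling g(rect) yields that diagonal scaled into rect.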
All right, so there's-- how would I actually use this thing? Let's make it a little bit more concrete. Right, I would say for instance, define R to be0:47:21
make-rectangle of some stuff, and I'd have to specify some vectors here using make-vector.0:47:30
And then I could say, define say, G to be make-picture, and0:47:45
then some stuff. And what I'd have to specify here is a list of line segments, right, using make-segment.0:47:55
Make-segment might be made out of vectors, and vectors might be made out of points. And then if I actually wanted to see the image of G inside a rectangle, well a picture is a procedure that takes a0:48:10
rectangle as argument. So if I then called G with an input of R, that would cause whatever image G is worrying about to be drawn inside the0:48:22
rectangle R. Right, so that's how you'd use that. [MUSIC PLAYING]0:49:08
PROFESSOR: Well why is it that I say this example is nice? You probably don't think it's nice. You probably think it's more weird than nice. Right, representing these pictures as procedures, which0:49:18
do complicated things with rectangles. So why is it nice? The reason it's nice is that once you've implemented the0:49:29
primitives in this way, the means of combination just fall out by implementing procedures. Let me show you what I mean. Suppose we want to implement Beside.0:49:41
So I'd like to-- suppose I've got a picture. Let's call it P1. P1 is going to be-- and now remember what a picture really is.0:49:50
It's a thing that if you can hand it some rectangle, it will cause an image to be drawn in whatever rectangle0:50:00
you hand it. And suppose P2 is some other picture, and you hand that a rectangle.0:50:09
And whatever rectangle you hand it, it draws some picture. And now if I'd like to implement Beside of P1 and P20:50:25
with a scale factor A, well what does that have to be? That's got to be a picture. It's got to be a thing that you hand it a rectangle, and it draws something in that rectangle.0:50:34
So if I hand Beside this rectangle-- let's hand it a rectangle. Well what's it going to do? It's going to take this rectangle and split it into0:50:45
two at a ratio of A and one minus A. And it will say, oh sure, now I've got two rectangles.0:51:02
And now it goes off to P1 and says P1, well draw yourself in this rectangle, and goes off to P2, and says, P2, fine, draw yourself in this rectangle.0:51:13
The only computation it has to do is figure out what these rectangles are. Remember a rectangle is specified by an origin and a horizontal vector and a vertical vector, so it's got0:51:24
to figure out what these things are. So for this first rectangle, the origin turns out to be the origin of the original rectangle, and the vertical0:51:34
vector is the same as the vertical vector of the original rectangle. The horizontal vector is the horizontal vector of the0:51:43
original rectangle scaled by A. And that's the first rectangle. The second rectangle, the origin is the original origin0:51:55
plus that horizontal vector scaled by A. The horizontal vector of the second rectangle is the rest of the horizontal0:52:05
vector of the first one, which is 1 minus A times the original H, and the vertical vector is still v. But0:52:15
basically it goes and constructs these two rectangles, and the important point is having constructed the rectangles, it says OK, p1, you draw yourself in there, and p2, you draw yourself in there, and that's0:52:25
all Beside has to do. All right, let's look at that piece of code.0:52:34
Beside of a picture and another picture with some0:52:45
scaling ratio is first of all, since it's a picture, a procedure that's going to take a rectangle as argument.0:52:55
What's it going to do? It says, p1 draw yourself in some rectangle and p2 draw yourself in some other rectangle. And now what are those rectangles?0:53:04
Well here's the computation. It makes a rectangle, and this is the algebra I just did on the board: the origin, something; the horizontal vector, something; and the vertical vector, something.0:53:13
And for p2, the rectangle it wants has some other origin and horizontal vector and vertical vector. But the important point is that all it's saying is, p1,0:53:23
go do your thing in one rectangle, and p2, go do your thing in another rectangle. That's all the Beside has to do. OK, similarly Rotate--0:53:37
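That Beside computation can be sketched in Python (tuple vectors and (origin, horiz, vert) rectangles as before; in this sketch a picture returns the segments it would draw, so Beside can simply concatenate the two results):

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def beside(p1, p2, a):
    """Split the rectangle at ratio a and tell each picture to draw
    itself in its own piece: that's all Beside has to do."""
    def the_picture(rect):
        o, h, v = rect
        left  = (o, scale_vect(a, h), v)            # origin o, horiz a*h, vert v
        right = (add_vect(o, scale_vect(a, h)),     # origin o + a*h
                 scale_vect(1 - a, h), v)           # horiz (1-a)*h, vert v
        return p1(left) + p2(right)
    return the_picture
```

Notice that beside never asks what p1 and p2 are; it only hands each one a rectangle.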
see if I have this picture A, and I want to look at say rotating A by 90 degrees, what that should mean is, well take0:53:51
this rectangle, which is origin and horizontal vector and vertical vector, and now pretend that it's really the0:54:01
rectangle that looks like this, which has an origin and a horizontal vector up here, and a vertical vector there, and now draw yourself with respect to that rectangle.0:54:13
Let me show you that as a procedure. All right, so Rotate-90 of a picture is, again, a procedure on a rectangle, which says, OK picture, draw0:54:24
yourself in some rectangle; and then this algebra is the transformation on the rectangle. It's the one which makes it look like the rectangle is0:54:33
sideways: the origin is someplace else, the horizontal vector is someplace else, and the vertical vector is someplace else. OK?0:54:43
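Rotate-90 in the same Python sketch: it hands the picture a tilted rectangle whose origin is the far corner of the original, so drawing with respect to it comes out rotated. The particular vector algebra here is my reading of the description above, so treat it as an assumption to check:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def sub_vect(a, b): return (a[0] - b[0], a[1] - b[1])

def rotate90(p):
    """Pretend the rectangle is the rotated one: the new origin is
    origin + horiz, the new horiz is the old vert, and the new vert
    is the old horiz negated."""
    def the_picture(rect):
        o, h, v = rect
        return p((add_vect(o, h), v, sub_vect((0, 0), h)))
    return the_picture
```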
OK. OK, again notice, the crucial thing that's going on here is you're using the representation of pictures as0:54:57
procedures to automatically get the closure property, because what happens is, Beside just has this thing p1. Beside doesn't care if that's a primitive picture or it's0:55:08
line segments or if p1 is, itself, the result of doing Aboves or Besides or Rotates. All Beside has to know about, say, p1 is that if you hand p10:55:17
a rectangle, it will cause something to be drawn. And above that level, Beside just doesn't-- it's none of its business how p1 accomplishes that drawing.0:55:27
All right, so you're using the procedural representation to ensure this closure. OK. So implementing pictures as procedures makes these means0:55:40
of combination, you know, both pretty simple and also, I think, elegant. But that's not the real punchline.0:55:49
The real punchline comes when you look at the means of abstraction in this language. Because what have we done? We've implemented the means of combination themselves as0:56:02
procedures. And what that means is that when we go to abstract in this language, everything that Lisp supplies us for manipulating0:56:14
procedures is automatically available to do things in this picture language. The technical term I want to say is not only is this0:56:25
language implemented in Lisp, obviously it is, but the language is nicely embedded in Lisp. What I mean is, by0:56:39
embedding the language in this way, all the power of Lisp is automatically available as an extension to whatever you want to do.0:56:49
And what do I mean by that? Example: say, suppose I want to make a thing that takes four pictures A, B, C and D, and makes a configuration that0:57:06
looks like this. Well you might call that, you know, four pictures or something, four-pict configuration.0:57:17
How do I do that? Well I can obviously do that. I just write a procedure that takes B above D and A above C0:57:26
and puts those things beside each other. So I automatically have Lisp's ability to do procedure composition. And I didn't have to make that specifically in the picture language.0:57:35
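A sketch of that four-picture combinator in Python, on top of Beside and an analogous Above with the same conventions as before (here above puts its first argument in the lower portion; the exact convention is an assumption). The point is just procedure composition:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def beside(p1, p2, a):
    def pic(rect):
        o, h, v = rect
        return (p1((o, scale_vect(a, h), v)) +
                p2((add_vect(o, scale_vect(a, h)), scale_vect(1 - a, h), v)))
    return pic

def above(p1, p2, a):
    # Same trick as beside, splitting the rectangle vertically instead.
    def pic(rect):
        o, h, v = rect
        return (p1((o, h, scale_vect(a, v))) +
                p2((add_vect(o, scale_vect(a, v)), h, scale_vect(1 - a, v))))
    return pic

def four_pict(a, b, c, d):
    """B above D beside A above C, each pair splitting its half evenly."""
    return beside(above(b, d, 0.5), above(a, c, 0.5), 0.5)
```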
It's automatic from the fact that the means of combination are themselves procedures. Or suppose I wanted to do something a little bit more complicated.0:57:44
I wanted to put in a parameter so that for each of these, I could independently specify a rotation by 90 degrees. That's just putting a parameter in the procedure.0:57:53
It's automatically there. Right, it automatically comes from the embedding. Or even more, suppose I wanted to, you know, use recursion.0:58:04
Let's look at a recursive means of combination on pictures. I could say define-- let's see if you can figure out what this one is-- suppose0:58:14
I say define what it means to right-push a picture, right-push a picture and some integer N and some scale0:58:28
factor A. I'll define this to say if N equals 0, then the0:58:40
answer is the picture. Otherwise I'm going to put--0:58:49
oops, name change: P. Otherwise, I'm going to take P0:58:59
and put it beside the results of recursively right-pushing P0:59:09
with N minus 1 and A and use a scale factor of A. OK, so if0:59:25
N is 0, it's P. Otherwise I put P, with a scale factor of A-- I'm sorry I didn't align this right-- recursively beside the result of right-pushing P, N minus 10:59:37
times with a scale factor of A. There's a recursive means of combination. What's that look like? Well, here's what it looks like.0:59:46
There's George right-pushed against himself twice with a scale factor of 0.75.0:59:59
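That recursive right-push, sketched in Python on top of the same Beside as above:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def beside(p1, p2, a):
    def pic(rect):
        o, h, v = rect
        return (p1((o, scale_vect(a, h), v)) +
                p2((add_vect(o, scale_vect(a, h)), scale_vect(1 - a, h), v)))
    return pic

def right_push(p, n, a):
    """If n is 0 it's just p; otherwise p beside the result of
    right-pushing p n-1 times, with scale factor a."""
    if n == 0:
        return p
    return beside(p, right_push(p, n - 1, a), a)
```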
OK. Where'd that come from? How did I get all this fancy recursion? And the answer is just automatic, absolutely automatic. Since these are procedures, the embedding says, well sure,1:00:08
I can define recursive procedures. I didn't have to arrange that. And of course, we can do more complicated things of the same sort. I could make something that does an up-push.1:00:18
Right, that sort of goes like this, by recursively putting something above. Or I could make something that, sort of, was this scheme. I might start out with a picture and then, sort of,1:00:33
recursively both push it aside and above, and that might put something there. And then up here I put the same recursive thing, and I1:00:42
might end up with something like this. Right, so there's a procedure that's a little bit more complicated than right-push but not much.1:00:53
I just do an Above and a Beside, rather than just a Beside. Now if I take that and apply that with the idea of putting1:01:05
four pictures together, which I can surely do; and I go and I apply that to Q, which we defined before, right, what I1:01:16
end up with is this thing, which is, sort of, the square limit of Q, done twice.1:01:27
Right, and then we can compare that with Escher's "Square Limit." And you see, it's sort of the same idea. Escher's is, of course, much, much prettier.1:01:37
If we go back and look at George, right, if we go look at George here-- see, I started with a fairly arbitrary design, this picture1:01:47
of George and did things with it. Right, whereas if we go look at the Escher picture, right, the Escher picture is not an arbitrary design.1:01:56
It's this very, very clever thing, so that when you take this fish body and Rotate it and shrink it down, it bleeds into the next one really nicely.1:02:07
And of course with George, I didn't really do anything like that. So if we look at George, right, there's a little bit of1:02:16
match up, but not very nice, and it's pretty arbitrary. One very nice project, by the way, would be to write a procedure that could take some basic figure like this George1:02:27
thing and start moving the ends of the lines around, so you got a really nice one when you went and did that "Square Limit" process. That'd be a really nice thing to think about.1:02:38
Well so, we can combine things. We can write recursive procedures. We can do all kinds of things, and that's all automatic. Right, the important point, the difference between merely1:02:47
implementing something in a language and embedding something in the language, is that you don't lose the original power of the language, and that's what Lisp is great at. See, Lisp is a lousy language for doing any1:02:56
particular problem. What it's good for is figuring out the right language that you want and embedding that in Lisp. That's the real power of this approach to design.1:03:05
Of course, we can go further. See, you saw the other thing that we can do in Lisp is capture general methods of doing things as higher order1:03:16
procedures. And you probably just from me drawing it got the idea that right-push and the analogous thing where you push something1:03:25
up and up and up and up and this corner push thing are all generalizations of a common kind of idea.1:03:34
So just to illustrate and give you practice in looking at a fairly convoluted use of higher order procedures, let me show you the general idea of pushing some means of1:03:45
combination to recursively repeat it. So here's a good one to puzzle out. We'll define what it means to push using a means of1:03:59
combination. Comb is going to be something like the Beside or the Above. Well, what's that going to be? That's going to be a procedure; remember what1:04:10
Beside actually was, right. It took two pictures and a scale factor. Using that, I produced something that took a level1:04:21
number and a picture and a scale factor, that I called right-push. So this is going to be something that takes a picture, a level number and a scale factor, and1:04:32
it's going to say-- I'm going to do some repeated operation. I'm going to repeatedly apply the procedure which takes a1:04:46
picture and applies the means of combination to the picture and the original picture and the one I took in here and the1:04:58
scale factor, and I do the thing which repeats this procedure N times, and I apply that whole thing to my1:05:15
original picture. Repeated here, in case you haven't seen it, is another higher order procedure that takes a procedure and a number1:05:29
and returns for you another procedure that applies this procedure N times. And I think some of you have already written repeated as an1:05:38
exercise, but if you haven't, it's a very good exercise in thinking about higher order procedures. But in any case, the result of this repeated is what I apply to picture.1:05:49
And having done that, that's going to capture-- that is the thing, the way I got from the idea of Beside to the idea of right-push. So having done that, I could say1:06:00
define right-push to be push of Beside.1:06:17
Or if I say, define up-push to be push of Above, I'd get the analogous thing, or define corner-push to be push of some appropriate thing that did both the Beside and the Above, or I could push anything.1:06:28
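Here's a sketch of push and repeated in Python, again on top of the Beside convention used above; the lambda captures the picture, the combinator, and the scale factor just as in the blackboard version:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def beside(p1, p2, a):
    def pic(rect):
        o, h, v = rect
        return (p1((o, scale_vect(a, h), v)) +
                p2((add_vect(o, scale_vect(a, h)), scale_vect(1 - a, h), v)))
    return pic

def repeated(f, n):
    """Higher-order procedure: returns the procedure that applies f
    n times (zero times is the identity)."""
    def g(x):
        for _ in range(n):
            x = f(x)
        return x
    return g

def push(comb):
    """Generalize right-push: given a means of combination like beside,
    return the thing that takes a picture, a level count, and a scale."""
    def pusher(pic, n, a):
        return repeated(lambda p: comb(pic, p, a), n)(pic)
    return pusher

right_push = push(beside)
```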
Anyway this is, if you're having trouble with lambdas, this is an excellent exercise in figuring out what this means. OK, well there's a lot to learn from this example.1:06:42
The main point I've been dwelling on is the notion of nicely embedding a language inside another language. Right, so that all the power of the surrounding language, like Lisp,1:06:54
is still accessible to you and appears as a natural extension of the language that you built. That's one thing that this example shows very well.1:07:06
OK. Another thing is, if you go back and think about that, what's procedures and what's data. You know, by the time we get up to here, my God,1:07:15
what's going on. I mean, this is some procedure, and it takes a picture as an argument, and what's a picture? Well, a picture itself, as you remember, was a procedure, and that took a rectangle. And a rectangle is some abstraction.1:07:26
And I hope that by now you're completely lost as to the question of what in the system is procedure and what's data. You see, there isn't any difference.1:07:35
There really isn't. And you might think of a picture sometimes as a procedure and sometimes as data, but that's just, sort of, you know, making you feel comfortable.1:07:44
It's really both in some sense or neither in some sense. OK, there's a more general point about the structure of1:07:56
the system as creating a language, viewing the engineering design process as one of creating language or1:08:08
rather one of creating a sort of sequence of layers of language. You see, there's this methodology, or maybe I should1:08:18
say mythology, that's, sort of, charitably called software, quote, engineering. All right, and what does it say, it's says well, you go1:08:27
and you figure out your task, and you figure out exactly what you want to do. And once you figure out exactly what you want to do, you find out that it breaks out into three sub-tasks, and you go and you start working on-- and you work on this1:08:36
sub-task, and you figure out exactly what that is. And you find out that that breaks down into three sub-tasks, and you specify them completely, and you go and you work on those two, and you work on this sub-one, and1:08:45
you specify that exactly. And then finally when you're done, you come back way up here, and you work on your second sub-task, and specify that out and work it out. And then you end up with--1:08:55
you end up at the end with this beautiful edifice. Right, you end up with a marvelous tree, where you've broken your task into sub-tasks and broken each of1:09:05
these into sub-tasks and broken those into sub-tasks, right. And each of these nodes is exactly and precisely defined1:09:15
to do the wonderful, beautiful task to make it fit into the whole edifice, right. That's this mythology. See only a computer scientist could possibly believe that you build a complex system like that, right.1:09:28
Contrast that with this Henderson example. It didn't work like that. What happened was that there was a sequence1:09:37
of layers of language. What happened? There was a layer of a thing that allowed us to build1:09:47
primitive pictures. There's primitive pictures and that was a language.1:09:56
I didn't say much about it. We talked about how to construct George, but that was a language where you talked about vectors and line segments and points and where they sat in the unit square.1:10:06
And then on top of that, right, on top of that-- so this is the language of primitive pictures.1:10:17
Right, talking about line segments in particular pictures in the unit square. On top of that was a whole language. There was a language of geometric combinators, a1:10:33
language of geometric positions, which talks about things like Above and Beside and right-push and Rotate.1:10:48
And those things, sort of, happened with reference to the things that are talked about in this language.1:10:58
And then if we like, we saw that above that there was sort of a language of schemes of combination.1:11:21
For example, push, which talked about repeatedly doing something over with a scale factor. And the things that were being discussed in that language1:11:31
were, sort of, the things that happened down here. So what you have is, at each level, the objects that are1:11:41
being talked about are the things that were erected at the previous level. What's the difference between this thing and this thing?1:11:53
The answer is that over here in the tree, each node, and in fact, each decomposition down here, is being designed to do1:12:03
a specific task, whereas in the other scheme, what you have is a full range of linguistic1:12:13
power at each level. See what's happening there, at any level, it's not being set up to do a particular task.1:12:23
It's being set up to talk about a whole range of things. The consequence of that for design is that something that's designed in that method is likely to be more robust,1:12:36
where by robust, I mean that if you go and make some change in your description, it's more likely to be captured by a1:12:46
corresponding change, in the way that the language is implemented at the next level up, right, because you've made1:12:55
these levels full. So you're not talking about a particular thing like Beside. You've given yourself a whole vocabulary to express things of that sort, so if you go and change your specifications a1:13:06
little bit, it's more likely that your methodology will be able to adapt to capture that change, whereas a design like this is not going to be robust, because if I go and1:13:15
change something that's in here, that might affect the entire way that I decomposed everything down, further down the tree. Right, so very big difference in outlook in decomposition,1:13:26
levels of language rather than, sort of, a strict hierarchy. Not only that, but when you have levels of language, you've given yourself different vocabularies for talking about1:13:37
the design at different levels. So if we go back and look at George one last time, if I wanted to change this picture George, see suddenly I have a1:13:46
whole range of different ways of describing the change. Like for example, I may want to go to the basic primitive design and move the endpoint of some vector.1:13:57
That's a change that I would discuss at the lowest level. I would say the endpoint is somewhere else. Or I might come up and say, well the next thing I wanted to do, this little replicated element, I might want to do by1:14:10
something else. I might want to put a scale factor in that Beside. That's a change that I would discuss at the next level of design, the level of combinators.1:14:19
Or I might want to say, I might want to change the basic way that I took this pattern and made some recursive decomposition, maybe not bleeding out toward the1:14:29
corners or something else. That would be a change that I would discuss at the highest level. And because I've structured the system to be this way, I have all these vocabularies for talking about change in1:14:39
different ways and a lot of flexibility to decide which one's appropriate. OK, well that's sort of a big point about the difference in1:14:48
software methodology that comes out of Lisp, and it all comes, again, out of the notion that really, the design process is not so much implementing programs as1:14:58
implementing languages. And that's really the power of Lisp. OK, thank you. Let's take a break.0:00:00
Lecture 3B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:19
PROFESSOR: Well, Hal just told us how you build robust systems. The key idea was-- I'm sure that many of you haven't really assimilated that0:00:30
yet-- but the key idea is that in order to make a system that's robust, it has to be insensitive to small changes; that is, a small change in the problem should lead to only a0:00:39
small change in the solution. There ought to be a continuity. The space of solutions ought to be continuous in this space of problems. The way he was explaining how to do that was, instead of0:00:50
solving a particular problem, at every level of decomposition of the problem into subproblems, you solve the class of problems which are a neighborhood of the particular problem that you're trying to solve.0:01:01
The way you do that is by producing a language at that level of detail in which the solutions to that class of problems is representable in that language.0:01:11
Therefore when you make small changes to the problem you're trying to solve, you generally have to make only small local changes to the solution you've constructed, because at the0:01:20
level of detail you're working, there's a language where you can express the various solutions to alternate problems of the same type.0:01:30
Well that's the beginning of a very important idea, perhaps the most important idea that makes computer science more powerful than most of the other kinds of engineering0:01:40
disciplines we know about. What we've seen so far is sort of how to use embedding of languages.0:01:49
And, of course, the power of embedding languages partly comes from procedures like this one that I showed you yesterday. What you see here is the derivative program that we0:02:01
described yesterday. It's a procedure that takes a procedure as an argument and returns a procedure as a value. And using such things is very nice.0:02:12
You can make things like push combinators and all that sort of wonderful thing that you saw last time. However, now I'm going to really muddy the waters.0:02:21
See this confuses the issue of what's the procedure and what is data, but not very badly. What we really want to do is confuse it very badly.0:02:31
And the best way to do that is to get involved with the manipulation of the algebraic expressions that the procedures themselves are expressed in. So at this point, I want to talk about instead of things0:02:43
like on this slide, the derivative procedure being a thing that manipulates a procedure-- this is a numerical method you see here. And what you're seeing is a representation of the0:02:56
numerical approximation to the derivative. That's what's here. In fact what I'd like to talk about is instead things that look like this.0:03:06
And what we have here are rules from a calculus book. These are rules for finding the derivatives of the0:03:15
expressions that one might write in some algebraic language. It says things like a derivative of a constant is 0.0:03:24
The derivative of the variable with respect to which you are taking the derivative is 1. The derivative of a constant times the function is the constant times the derivative of the function,0:03:34
and things like that. These are exact expressions. These are not numerical approximations.0:03:43
Can we make programs? And, in fact, it's very easy to make programs that manipulate these expressions.0:03:56
Well let's see. Let's look at these rules in some detail. You all have seen these rules in your elementary calculus class at one time or another.0:04:06
And you know from calculus that it's easy to produce derivatives of arbitrary expressions. You also know from your elementary calculus that it's hard to produce integrals.0:04:17
Yet integrals and derivatives are opposites of each other. They're inverse operations. And they have the same rules. What is special about these rules that makes it possible0:04:29
for one to produce derivatives easily, while integrals are so hard to produce? Let's think about that very simply. Look at these rules.0:04:39
Every one of these rules, when used in the direction for taking derivatives, which is in the direction of this arrow, the left side is matched against your0:04:48
expression, and the right side is the thing which is the derivative of that expression. The arrow is going that way.0:04:58
In each of these rules, the expressions on the right-hand side of the rule that are contained within derivatives are subexpressions, are proper subexpressions, of the0:05:08
expression on the left-hand side. So here we see that the derivative of the sum, which is the expression on the left-hand side, is the sum of the0:05:17
derivatives of the pieces. So the rules, moving to the right, are reduction rules. The problem becomes easier.0:05:28
I turn a big complicated problem into lots of smaller problems and then combine the results, a perfect place for recursion to work. If I'm going in the other direction like this, if I'm0:05:42
trying to produce integrals, well there are several problems you see here. First of all, if I try to integrate an expression like a sum, more than one rule matches. Here's one that matches.0:05:52
Here's one that matches. I don't know which one to take. And they may be different. I may get to explore different things. Also, the expressions become larger in that direction.0:06:04
And when the expressions become larger, then there's no guarantee that any particular path I choose will terminate, because we will only terminate by accidental cancellation.0:06:14
So that's why integrals are complicated searches and hard to do. Right now I don't want to do anything as hard as that. Let's work on derivatives for a while.0:06:24
Well, these rules are ones you know, for the most part, hopefully. So let's see if we can write a program which is these rules. And that should be very easy.0:06:34
Just write the program. See, because what I showed you is that each is a reduction rule, it's something appropriate for a recursion.0:06:43
And, of course, what we have for each of these rules is we have a case in some case analysis. So I'm just going to write this program down.0:06:53
Now, of course, I'm going to be saying something you have to believe. Right? What you have to believe is I can represent these algebraic expressions, that I can grab their parts, that I can put0:07:03
them together. We've invented list structures so that you can do that. But you don't want to worry about that now. Right now I'm going to write the program that encapsulates these rules independent of the representation of the0:07:14
algebraic expressions. You have a derivative of an expression with0:07:27
respect to a variable. This is a different thing than the derivative of a function. That's what we saw last time, that numerical approximation.0:07:39
You can't open up a function; it's just the answers. The derivative of an expression is about the way it's written. And therefore it's a syntactic phenomenon.0:07:48
And so a lot of what we're going to be doing today is worrying about syntax, syntax of expressions and things like that. Well, there's a case analysis.0:07:57
Anytime we do anything complicated, like a recursion, we presumably need a case analysis. It's the essential way to begin. And that's usually a conditional0:08:06
of some large kind. Well, what are the possibilities? The first rule that you saw is: is this thing a constant?0:08:16
And what I'm asking is, is the expression a constant with respect to the variable given? If so, the result is 0, because the derivative0:08:28
represents the rate of change of something. If, however, the expression that I'm taking the derivative0:08:38
of is the variable I'm varying with respect to-- the expression is the same as var-- then the rate of change of the0:08:52
expression with respect to the variable is 1. It's the same one. Well, now there are a couple of other possibilities.0:09:01
It could, for example, be a sum. Well, I don't know how I'm going to express sums yet. Actually I do. But I haven't told you yet.0:09:10
But is it a sum? I'm imagining that there's some way of telling. I'm doing a dispatch on the type of the expression here,0:09:20
absolutely essential in building languages. Languages are made out of different expressions. And soon we're going to see that in our more powerful methods of building languages on languages.0:09:32
Is an expression a sum? If it's a sum, well, we know the rule for derivative of the sum is the sum of the derivatives of the parts.0:09:42
One of them is called the addend and the other is the augend. But I don't have enough space on the blackboard for such long names. So I'll call them A1 and A2. I want to make a sum.0:09:53
Do you remember which is which-- the addend or the augend? Or was it the dividend and the divisor or something like that? Make a sum of the derivative of the A1, I'll call it,0:10:08
the addend of the expression, with respect to the variable, and the derivative of the A2 of the expression--0:10:23
those being the two arguments of the addition-- with respect to the variable.0:10:32
And another rule that we know is the product rule, which applies if the expression is a product.0:10:43
By the way, it's a good idea when you're defining things, when you're defining predicates, to give them a name that ends in a question mark. This question mark doesn't mean anything.0:10:53
It's for us as an agreement. It's a conventional interface between humans so you can read my programs more easily. So I want you to, when you write programs, if you define0:11:02
a predicate procedure, that's something that returns true or false, it should have a name which ends in a question mark. Lisp doesn't care. I care.0:11:11
I want to make a sum. Because the derivative of a product is the sum of the first times the derivative of the second plus the second times the derivative of the first. Make a sum of two0:11:26
things, a product of, well, I'm going to say the M1 of the0:11:37
expression, and the derivative of the M2 of the expression0:11:47
with respect to the variable, and the product of the0:12:01
derivative of M1, the multiplier of the expression,0:12:10
with respect to the variable. It's the product of that and the multiplicand, M2, of the expression.0:12:21
Make that product. Make the sum. Close that case. And, of course, I could add as many cases as I like here for a complete set of rules you might find in a calculus book.0:12:34
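Collected from the blackboard, the dictation above comes out as one short recursive procedure. This is only a sketch of what the professor writes; the representation procedures it names (constant?, same-var?, sum?, product?, make-sum, make-product, a1, a2, m1, m2) are the ones he defines later in the lecture:

```scheme
;; The derivative rules as a case analysis, one clause per rule.
;; The representation procedures are defined later in the lecture.
(define (deriv exp var)
  (cond ((constant? exp var) 0)
        ((same-var? exp var) 1)
        ((sum? exp)
         (make-sum (deriv (a1 exp) var)
                   (deriv (a2 exp) var)))
        ((product? exp)
         (make-sum
          (make-product (m1 exp)
                        (deriv (m2 exp) var))
          (make-product (deriv (m1 exp) var)
                        (m2 exp))))))
```

Each clause is exactly one reduction rule from the calculus book, which is why the recursion terminates: every recursive call is on a proper subexpression.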
So this is what it takes to encapsulate those rules. And you see, you have to realize there's a lot of0:12:43
wishful thinking here. I haven't told you anything about how I'm going to make these representations. Now, once I've decided that this is my set of rules, I0:12:52
think it's time to play with the representation. Let's attack that. Well, first of all, I'm going to play a pun.0:13:01
It's an important pun. It's a key to a sort of powerful idea. If I want to represent sums, and products, and differences,0:13:12
and quotients, and things like that, why not use the same language as I'm writing my program in? I write my program in algebraic expressions that0:13:23
look like the sum of the product of a and the product of x and x, and things like that,0:13:34
and the product of b and x, and c-- whatever-- make that a sum of the products. Right now I don't want to have procedures with unknown0:13:43
numbers of arguments, so I write these combinations two arguments at a time. This is list structure.0:13:54
And the reason why this is nice, is because any one of these objects has a property. I know where the car is. The car is the operator.0:14:04
And the operands are the successive cars of the cdrs of the list that this is. It makes it very convenient.0:14:14
I don't have to parse it. It's been done for me. I'm using the embedding in Lisp to advantage. So, for example, let's start using list structure to write0:14:29
down the representation that I'm implicitly assuming here. Well I have to define various things that are implied in this representation.0:14:38
Like, I have to define how you test for a constant and how you test for the same variable. Let's do those first. That's easy enough. Now I'm going to be introducing lots of primitives0:14:47
here, because these are the primitives that come with list structure. OK, you define a constant.0:15:02
And what I mean by a constant-- an expression that's constant with respect to a variable-- is that the expression is something simple.0:15:11
I can't break it up into pieces, and yet it isn't that variable. That does not mean that there may not be other expressions that0:15:22
are more complicated that are constants. It's just that I'm going to look at the primitive constants in this way. So what this is, is it says that it's the and--0:15:34
I can combine predicate expressions which return true or false with and-- of: the expression is atomic, meaning0:15:45
it cannot be broken into parts-- it doesn't have a car and a cdr, it's not a list; atom is a special test built into the system--0:15:54
and it's not identically equal to that variable.0:16:06
I'm representing my variable by things that are symbols which cannot be broken into pieces, things like x, and y,0:16:16
things like this. Whereas, of course, something like this can be broken up into pieces. And the same variable of an expression with respect to a0:16:40
variable is, in fact, an atomic expression. I want to have an atomic0:16:50
expression which is identically equal to that variable.0:17:08
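Written out, the two predicates just dictated look roughly like this-- a sketch, using the built-in atom and identity tests mentioned above:

```scheme
;; An expression is a constant with respect to var if it is
;; atomic (has no car and cdr) and is not the variable itself.
(define (constant? exp var)
  (and (atom? exp)
       (not (eq? exp var))))

;; ...and it is "the same variable" if it is atomic and
;; is identically that variable.
(define (same-var? exp var)
  (and (atom? exp)
       (eq? exp var)))
```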
I don't want to look inside this stuff anymore. These are primitive maybe. But it doesn't matter.0:17:18
I'm using things that are given to me with the language. I'm not terribly interested in them. Now how do we deal with sums? Ah, something very interesting will happen.0:17:29
A sum is something which is not atomic and begins with the plus symbol. That's what it means. So here, I will define.0:17:45
An expression is a sum if it's not atomic and its head,0:18:08
its beginning, the car of the expression, is the symbol plus.0:18:19
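As code, the test just described is roughly:

```scheme
;; A sum is a non-atomic expression whose car is the symbol +.
;; The quote on + is the subject of the discussion that follows.
(define (sum? exp)
  (and (not (atom? exp))
       (eq? (car exp) '+)))
```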
Now you're about to see something you haven't seen before, this quotation. Why do I have that quotation there?0:18:29
Say your name, AUDIENCE: Susanna. PROFESSOR: Louder. AUDIENCE: Susanna PROFESSOR: Say your name. AUDIENCE: Your name. PROFESSOR: Louder. AUDIENCE: Your name. PROFESSOR: OK.0:18:39
What I'm showing you here is that the words of English are ambiguous. I was saying, say your name.0:18:52
I was also possibly saying say, your name. But that cannot be distinguished in speech.0:19:04
However, we do have a notation in writing, which is quotation for distinguishing these two possible meanings.0:19:14
In particular, over here in Lisp, we have a notation for distinguishing these meanings. If I were to just write a plus here, a plus symbol, I would0:19:24
be asking, is the first element of the expression, is the operator position of the expression, the addition operator?0:19:34
I don't know. I would have to have written the addition operator there, which I can't write. However, this way I'm asking, is this the symbolic object0:19:45
plus, which normally stands for the addition operator? That's what I want. That's the question I want to ask. Now before I go any further, I want to point out the0:19:55
quotation is a very complex concept, and adding it to a language causes a great deal of trouble. Consider the next slide.0:20:06
Here's a deduction which we should all agree with. We have, Alyssa is smart and Alyssa is George's mother.0:20:17
This is an equality, is. From those two, we can deduce that George's mother is smart.0:20:27
Because we can always substitute equals for equals in expressions. Or can we?0:20:36
Here's a case where we have "Chicago" has seven letters. The quotation means that I'm discussing the word Chicago,0:20:45
not what the word represents. Here I have that Chicago is the biggest city in Illinois.0:20:54
As a consequence of this, I would like to deduce that the biggest city in Illinois has seven letters. But that's manifestly false.0:21:05
So it doesn't work. OK, so once we have things like that, our language gets much more complicated.0:21:14
Because it's no longer true that things we tend to like to do with languages, like substituting equals for equals and getting right answers, are going to work without being very careful.0:21:24
We can't substitute into what's called referentially opaque contexts, of which a quotation is the prototypical type of referentially opaque context.0:21:33
If you know what that means, you can consult a philosopher. Presumably there is one in the room. In any case, let's continue now, now that we at least have0:21:42
an operational understanding of a 2000-year-old issue that has to do with name, and mention, and all sorts of things like that.0:21:52
I have to define what I mean, how to make a sum of two things, an a1 and a2.0:22:02
And I'm going to do this very simply. It's a list of the symbol plus, and a1, and a2.0:22:13
And I can determine the first element. Define a1 to be cadr. I've just0:22:34
introduced another primitive. This is the car of the cdr of something. You might want to know why car and cdr are names of these0:22:43
primitives, and why they've survived, even though they're much better ideas like left and right. We could have called them things like that. Well, first of all, the names come from the fact that in the0:22:54
great past, when Lisp was invented-- I suppose in '58 or something-- it was on a 704 or something like that, a machine that had an address register and a0:23:04
decrement register. And these were the contents of the address register and the decrement register. So it's an historical accident. Now why have these names survived? It's because Lisp programmers like to talk to each other0:23:14
over the phone. And if you want to have a long sequence of cars and cdrs you might say, cadaddr, which can be understood. But left of right of right of left is not so clear if you0:23:26
get good at it. So that's why we have these words. All of them up to four deep are defined typically in a Lisp system.0:23:38
Define A2 to be caddr. And, of course, you can see that if I look at one of these expressions, like the sum of 3 and 5, what that is is a0:23:54
list containing the symbol plus, and a number 3,0:24:06
and a number 5. Then the car is the symbol plus.0:24:16
The car of the cdr. Well I take the cdr and then I take the car. And that's how I get to the 3. That's the first argument. And the car of the cdr of the cdr gets me to this one, the 5.0:24:28
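Pulling the sum representation together as dictated:

```scheme
;; Constructor: a sum is a list of the symbol + and the two addends.
(define (make-sum a1 a2)
  (list '+ a1 a2))

;; Selectors: in (+ 3 5) the car is the symbol +,
;; the cadr is 3, and the caddr is 5.
(define a1 cadr)
(define a2 caddr)
```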
And similarly, of course, I can define what's going on with products. Let's do that very quickly.0:24:48
Is the expression a product? Yes, if and only if it's not atomic and0:25:01
its car is eq to quote, the asterisk symbol, which is the operator0:25:13
for multiplication. Make-product of an M1 and an M2 is to be the list of quote, the0:25:35
asterisk operation, and M1, and M2. And I define M1 to be cadr and M2 to be caddr. You get to be0:26:00
a good Lisp programmer because you start talking that way. I cdr down lists and cons them up and so on. Now, now that we have essentially a complete program0:26:09
for finding derivatives, you can add more rules if you like. What kind of behavior do we get out of it? I'll have to clear that x. Well, supposing I define foo here to be the sum of the0:26:28
product of a and x squared, plus bx, plus c. That's the same thing we see here as the algebraic expression written in the more conventional notation over there.0:26:37
Well, the derivative of foo with respect to x, which we can see over here, is this horrible, horrendous mess.0:26:46
I would like it to be 2ax plus b. But it's not. It's equivalent to it. What is it?0:26:56
I have here, what do I have? I have the derivative of the product of x and x. Over here is, of course, the sum of x times0:27:09
1 and 1 times x. Now, well, it's the first times the derivative of the second plus the second times the derivative of the first. It's right. That's 2x of course.0:27:20
a times 2x is 2ax; plus 0 times x squared, which doesn't count; plus b over here; plus a bunch of 0's.0:27:29
Well, the answer is right. But people would take off points for it on an exam, sadly enough. Let's worry about that in the next segment. Are there any questions?0:27:42
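The session on the screen corresponds to something like the following-- a sketch, with the printed result indicated only schematically, since its exact shape depends on the order the program builds it in:

```scheme
;; ax^2 + bx + c, written in our nested two-argument representation.
(define foo '(+ (* a (* x x))
                (+ (* b x) c)))

(deriv foo 'x)
;; => a large unsimplified expression, equivalent to 2ax + b,
;;    full of subexpressions like (* x 1), (* 0 x), and sums with 0.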
Yes? AUDIENCE: If you had left the quote when you put the plus, then would that be referring to the procedure plus and0:27:51
could you do a comparison between that procedure and some other procedure if you wanted to? PROFESSOR: Yes. Good question. If I had left this quotation off at this point, if I had0:28:05
left that quotation off at that point, then I would be referring here to the procedure which is the thing that plus is defined to be.0:28:15
And indeed, I could compare some procedures with each other for identity.0:28:25
Now what that means is not clear right now. I don't like to think about it. Because I don't know exactly what it would need to compare procedures. There are reasons why that may make no sense at all.0:28:35
However, the symbols, we understand. And so that's why I put that quote in. I want to talk about the symbol that's apparent on the page.0:28:46
Any other questions? OK. Thank you. Let's take a break. [MUSIC PLAYING]0:29:30
PROFESSOR: Well, let's see. We've just developed a fairly plausible program for computing the derivatives of algebraic expressions. It's an incomplete program, if you would0:29:40
like to add more rules. And perhaps you might extend it to deal with uses of addition with any number of arguments and multiplication with any number of arguments.0:29:49
And that's all rather easy. However, there was a little fly in that ointment. We go back to this slide.0:30:02
We see that the expressions that we get are rather bad. This is a rather bad expression.0:30:11
How do we get such an expression? Why do we have that expression? Let's look at this expression in some detail. Let's find out where all the pieces come from.0:30:21
As we see here, we have a sum-- just what I showed you at the end of the last time-- of X times 1 plus 1 times X. That is the0:30:30
derivative of this product. The product of a times that, where a does not depend upon x, and therefore is constant with respect to x, is this0:30:40
sum, which goes from here all the way through here and through here. Because it is the first thing times the derivative of the second plus the derivative of the first times the second as0:30:54
the program we wrote on the blackboard indicated we should do. And, of course, the product of bx over here manifests itself0:31:06
as B times 1 plus 0 times X because we see that B does not0:31:15
depend upon X. And so the derivative of B is this 0, and the derivative of X with respect to itself is the 1. And, of course, the derivatives of the sums over here turn0:31:26
into these two sums of the derivatives of the parts. So what we're seeing here is exactly the thing I was trying to tell you about with Fibonacci numbers a while ago,0:31:37
that the form of the process is expanded from the local rules that you see in the procedure, that the procedure0:31:48
represents a set of local rules for the expansion of this process. And here, the process left behind some stuff, which is0:31:59
the answer. And it was constructed by the walk it takes of the tree structure, which is the expression.0:32:08
So every part in the answer we see here derives from some part of the problem. Now, we can look at, for example, the derivative of0:32:17
foo, which is ax square plus bx plus c, with respect to other things. Here, for example, we can see the derivative of foo with respect to a.0:32:27
And it's very similar. It's, in fact, the identical algebraic expression, except for the fact that these 0's and 1's are in different places. Because the only degree of freedom we have in this tree0:32:38
walk is what's constant with respect to the variable we're taking the derivative with respect to, and what's the same variable.0:32:48
In other words, if we go back to this blackboard and we look, we have no choice what to do when we take the derivative of the sum or a product.0:32:58
The only interesting place here is, is the expression the variable, or is the expression a constant with respect to0:33:07
that variable, for the very smallest expressions. In which case we get various 1's and 0's. If we go back to this slide, we can see, for example,0:33:17
this 1 over here in the derivative of foo with respect to a, which gets us an x square, because that 1 gets the product of x and x multiplied into the answer. That 1 is a 0 over0:33:32
here, where we're taking the derivative of foo with respect to c. But the shapes of these expressions are the same. See, all those shapes.0:33:42
They're the same. Well is there anything wrong with our rules?0:33:53
No. They're the right rules. We've been through this one before. One of the things you're going to begin to discover is that0:34:02
there aren't too many good ideas. When we were looking at rational numbers yesterday,0:34:12
the problem was that we got 6/8 rather than 3/4. The answer was unsimplified. The problem, of course, is very similar.0:34:21
There are things I'd like to be identical by simplification that don't become identical. And yet the rules for doing addition and multiplication of0:34:30
rational numbers were correct. So the way we might solve this problem is to do the thing we did last time, which always works. If something worked last time, it ought to work again.0:34:40
It's to change the representation. Perhaps in the representation we could put in a simplification step that produces a simplified representation.0:34:50
This may not always work, of course. I'm not trying to say that it always works. But it's one of the pieces of artillery we have in our war0:34:59
against complexity. You see, because we solved our problem very carefully. What we've done is we've divided the world into several parts. There are derivative rules and general rules for algebra0:35:12
of some sort at this level of detail. And I have an abstraction barrier.0:35:21
And I have the representation of the algebraic expressions,0:35:32
list structure. And in this barrier, I have the interface procedures.0:35:43
I have constant, and things like same-var.0:35:54
I have things like sum, make-sum. I have A1, A2.0:36:06
I have products and things like that, all the other things I might need for various kinds of algebraic expressions. Making this barrier allows me to arbitrarily change the0:36:18
representation without changing the rules that are written in terms of that representation. So if I can make the problem go away by changing0:36:28
representation, the decomposition of the problem into these two parts has helped me a great deal. So let's take a very simple case of this.0:36:38
What was one of the problems? Let's go back to this transparency again. And we see here, oh yes, there's horrible things like0:36:48
here is the sum of an expression and 0. Well, that's no reason to think of it as anything other than the expression itself.0:36:57
Why should the summation operation have made up this addition? It can be smarter than that. Or here, for example, is a multiplication of0:37:09
something by 1. It's another thing like that. Or here is a product of something with 0, which is certainly 0. So we won't have to make this construction.0:37:21
So why don't we just do that? We need to change the way the representation works, right here.0:37:37
Define make-sum to be-- well, now it's not something so simple. I'm not going to make a list containing the symbol plus and0:37:48
things unless I need to. Well, what are the possibilities?0:37:57
I have some sort of cases here. If I have numbers-- if A1 is a number--0:38:09
and here's another primitive I've just introduced; it's possible to tell whether something's a number-- and if A2 is a number, meaning they're not symbolic0:38:23
expressions, then why not do the addition now? The result is just a plus of A1 and A2.0:38:32
I'm not asking if these represent numbers. Of course all of these symbols represent numbers. I'm talking about whether the one I've got is the number 3 right now.0:38:43
And, for example, supposing A1 is a number, and it's equal to0:38:59
0, well then the answer is just A2. There is no reason to make anything up.0:39:10
And if A2 is a number, and equal to 0, then0:39:27
the result is A1. And only if I can't figure out something better to do with this situation, well, I can start a list. Otherwise I want0:39:41
the representation to be the list containing the quoted symbol plus, and A1, and A2.0:39:58
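Collected, the smarter constructor just dictated comes out roughly like this (a sketch; number? is the primitive just introduced):

```scheme
;; make-sum does the addition now if both arguments are numbers,
;; drops an addend that is the number 0, and only otherwise
;; constructs the list (+ a1 a2).
(define (make-sum a1 a2)
  (cond ((and (number? a1) (number? a2))
         (+ a1 a2))
        ((and (number? a1) (= a1 0))
         a2)
        ((and (number? a2) (= a2 0))
         a1)
        (else
         (list '+ a1 a2))))
```

Note that nothing in the deriv procedure itself changes; only the constructor behind the abstraction barrier does.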
And, of course, a very similar thing can be done for products. And I think I'll avoid boring you with them. I was going to write it on the blackboard.0:40:07
I don't think it's necessary. You know what to do. It's very simple. But now, let's just see the kind of results we get out of0:40:17
changing our program in this way. Well, here's the derivatives after having just changed the constructors for expressions.0:40:28
The same foo, aX square plus bX plus c, and what I get is nothing more than the derivative of that is 2aX plus0:40:40
B. Well, it's not completely simplified. I would like to collect common terms in sums. Well, that's more work. And, of course, programs to do this sort of thing are huge0:40:51
and complicated. Algebraic simplification, it's a very complicated mess. There's a very famous program you may have heard of called Maxima developed at MIT in the past, which is 5,000 pages of0:41:02
Lisp code, mostly the algebraic simplification operations. There we see the derivative of foo.0:41:12
In fact, it's at the point where I wouldn't take off more than 1 point for it in an elementary calculus class. And the derivative of foo with respect to a, well it's gone down to X times X, which isn't so bad.0:41:24
And the derivative of foo with respect to b is just X itself. And the derivative of foo with respect to c comes out 1. So I'm pretty pleased with this.0:41:34
What you've seen is, of course, a little bit contrived, carefully organized example to show you how we can manipulate algebraic expressions, how we do that0:41:43
abstractly in terms of abstract syntax rather than concrete syntax and how we can use the abstraction to control0:41:53
what goes on in building these expressions. But the real story isn't just such a simple thing as that. The real story is, in fact, that I'm manipulating these0:42:03
expressions. And the expressions are the same expressions-- going back to the slide-- as the ones that are Lisp expressions.0:42:12
There's a pun here. I've chosen my representation to be the same as the representation in my language of similar things.0:42:22
By doing so, I've invoked a necessity. I created the necessity to have things like quotation because of the fact that my language is capable of writing0:42:35
expressions that talk about expressions of the language. I need to have something that says, this is an expression I'm talking about rather than this expression is talking0:42:45
about something, and I want to talk about that. So quotation stops and says, I'm talking about this0:42:54
expression itself. Now, given that power, if I can manipulate expressions of0:43:03
the language, I can begin to build even much more powerful layers upon layers of languages. Because I can write languages that not only are embedded in0:43:12
Lisp or whatever language you start with, but languages that are completely different, that are just, if we say, interpreted in Lisp or something like that.0:43:23
We'll get to understand those words more in the future. But right now I just want to leave you with the fact that we've crossed a line which gives us tremendous power.0:43:36
At this point we've bought a sledgehammer. We have to be careful of what flies when we apply it. Thank you. [MUSIC PLAYING]0:00:00
Lecture 4A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:24
PROFESSOR: Well, yesterday we learned a bit about symbolic manipulation, and we wrote a rather stylized program to0:00:35
implement a pile of calculus rules from a calculus book. Here on the transparencies, we see a bunch of calculus rules0:00:47
from such a book. And, of course, what we did is sort of translate these rules into the language of the computer.0:00:56
But, of course, that's a sort of funny strategy. Why should we have to translate these rules into the language of the computer? And what do I really mean by that?0:01:07
The program we wrote yesterday was very stylized. It was a conditional, a dispatch on the type of the expression as observed by the rules.0:01:19
What we see here are rules that say if the object the derivative is being taken of-- if that expression is a constant-- then do one thing.0:01:29
If it's a variable, do another thing. If it's a product of a constant times a variable, do something, and so on. There's sort of a dispatch there on a type.0:01:41
Well, since it has such a stylized behavior and structure, is there some other way of writing this program that's more clear?0:01:50
Well, what's a rule, first of all? What are these rules? Let's think about that. Rules have parts. If you look at these rules in detail, what you see, for0:02:04
example, is the rule has a left-hand side and a right-hand side. Each of these rules has a left-hand side and the0:02:13
right-hand side. The left-hand side is somehow compared with the expression you're trying to take the derivative of. The right-hand side is the replacement for that0:02:24
expression. So all rules on this page are something like this.0:02:35
I have patterns, and somehow, I have to produce, given a0:02:45
pattern, a skeleton. This is a rule.0:02:55
A pattern is something that matches, and a skeleton is something you substitute into in order to get a new expression.0:03:06
So what that means is that the pattern is matched against the expression, which is the source expression.0:03:23
And the result of the application of the rule is to produce a new expression, which I'll call a target, by0:03:38
instantiation of a skeleton. That's called instantiation.0:03:50
So that is the process by which these rules are described. What I'd like to do today is build a language and a means0:04:02
of interpreting that language, a means of executing that language, where that language allows us to directly express these rules. And what we're going to do is instead of bringing the rules0:04:14
to the level of the computer by writing a program that is those rules in the computer's language-- at the moment, in a Lisp-- we're going to bring the computer to the level of us by0:04:25
writing a way by which the computer can understand rules of this sort. This is slightly emphasizing the idea that we had last time0:04:35
that we're trying to make a solution to a class of problems rather than a particular one. The problem is if I want to write rules for a different0:04:45
piece of mathematics, say, to do simple algebraic simplification or something like that, or manipulation of0:04:54
trigonometric functions, I would have to write a different program using yesterday's method. Whereas I would like to encapsulate all of the things0:05:03
that are common to both of those programs, meaning the idea of matching, instantiation, the control structure, which turns out to be very complicated for such a0:05:12
thing, I'd like to encapsulate that separately from the rules themselves. So let's look at, first of all, a representation.0:05:22
I'd like to use the overhead here. I'd like-- there it is. I'd like to look at a representation of the rules of calculus for derivatives in a sort of simple language that0:05:36
I'm writing right here. Now, I'm going to avoid worrying about syntax. We can easily pretty this up, and I'm not interested in making--0:05:48
this is indeed ugly. This doesn't look like the beautiful text set dx by dt or something that I'd like to write, but that's not essential.0:05:58
That's sort of an accidental phenomenon. Here, we're just worrying about the fact that the structure of the rules is that there is a left-hand side0:06:07
here, which represents the thing I want to match against the derivative expression. This, I'm going to say, is the representation for the derivative of a constant, which we will call c with0:06:18
respect to the variable we will call v. And what we will get on the right-hand side is 0. So this represents a rule.0:06:29
The next rule will be the derivative of a variable, which we will call v with respect to the same variable v, and we get a 1.0:06:38
However, if we have the derivative of a variable called u with respect to a different variable v, we will get 0.0:06:47
I just want you to look at these rules a little bit and see how they fit together. For example, over here, we're going to have the derivative0:06:56
of the sum of an expression called x1 and an expression called x2. These things that begin with question marks are called pattern variables in the language that we're inventing,0:07:08
and you see we're just making it up, so pattern variables for matching. And so in this-- here we have the derivative of the sum of the expression0:07:19
which we will call x1. And the expression we will call x2 with respect to the variable we call v will be-- here is the right-hand side: the sum of the derivative of that expression x1 with0:07:29
respect to v-- the right-hand side is the skeleton-- and the derivative of x2 with respect to v. Colons here will0:07:38
stand for substitution objects. They're--we'll call them skeleton evaluations.0:07:48
So let me put up here on the blackboard for a second some syntax so we'll know what's going on for this rule language. First of all, we're going to have to worry about the0:07:58
pattern matching. We're going to have things like a symbol like foo matches0:08:11
exactly itself.0:08:23
The expression f of a and b will be used to match any list0:08:35
whose first element is f, whose second element is a, and0:08:51
whose third element is b. Also, another thing we might have in a pattern is that--0:09:03
a question mark with some variable like x. And what that means, it says matches anything, which we0:09:17
will call x. Question mark c x will match only constants.0:09:30
So this is something which matches a constant colon x.0:09:44
And question mark v x will match a variable,0:09:55
which we call x. This is sort of the language we're making up now.0:10:04
If I match two things against each other, then they are compared element by element. But elements in the pattern may contain these syntactic0:10:13
variables, pattern variables, which will be used to match arbitrary objects.0:10:22
And we'll get that object as the value in the name x here, for example.0:10:31
Now, when we make skeletons for instantiation, we have things like this.0:10:42
foo, a symbol, instantiates to itself.0:10:55
Something which is a list like f of a and b instantiates to--0:11:06
well, it instantiates to a 3-list, a list of three elements, okay, which are the results of instantiating each0:11:27
of f, a, and b.0:11:36
And colon x-- well, we instantiate to the value of x as in the0:11:53
matched pattern.0:12:02
So going back to the overhead here, we see all of those kinds of objects. We see here a pattern variable0:12:14
which matches a constant, a pattern variable which matches a variable, a pattern variable which will match anything. And if we have two instances of the same name, like this is0:12:25
the derivative of an expression which is a variable, whose name we will call v, with respect to some arbitrary0:12:34
expression which we will also call v-- since this v appears twice, we're going to want that to mean they have to be the same. The only consistent match is one where those are the same.0:12:45
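The rules just walked through can be written down as data. Here is a sketch in Python list structure rather than the lecture's Lisp -- the tags `["?", name]`, `["?c", name]`, `["?v", name]`, and `[":", name]` are stand-ins of my own for the question-mark and colon forms on the overhead:

```python
# A sketch (not the lecture's Scheme) of the first few derivative rules
# as data: each rule is a pattern paired with a skeleton.
deriv_rules = [
    # (dd (?c c) (? v)) -> 0 : derivative of a constant is 0
    [["dd", ["?c", "c"], ["?", "v"]], 0],
    # (dd (?v v) (? v)) -> 1 : derivative of a variable w.r.t. itself is 1
    [["dd", ["?v", "v"], ["?", "v"]], 1],
    # (dd (+ (? x1) (? x2)) (? v)) -> (+ (dd (: x1) (: v)) (dd (: x2) (: v)))
    [["dd", ["+", ["?", "x1"], ["?", "x2"]], ["?", "v"]],
     ["+", ["dd", [":", "x1"], [":", "v"]],
           ["dd", [":", "x2"], [":", "v"]]]],
]
```

The left-hand side of each pair is the pattern to match; the right-hand side is the skeleton to instantiate.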
So here, we're making up a language. And in fact, that's a very nice thing to be doing. It's so much fun to make up a language. And you do this all the time.0:12:54
And the really most powerful design things you ever do are sort of making up a language to solve problems like this. Now, here we go back here and look at some of these rules.0:13:05
Well, there's a whole set of them. I mean, there's one for addition and one for multiplication, just like we had before. The derivative of the product of x1 and x2 with respect to v0:13:16
is the sum of the product of x1 and the derivative x2 with respect to v and the product of the derivative of x1 and x2.0:13:27
And here we have exponentiation. And, of course, we run off the end down here. We get as many as we like. But the whole thing over here, I'm giving this list of0:13:36
rules the name "derivative rules." What would we do with such a thing once we have it?0:13:45
Well, one of the nicest ideas, first of all, is I'm going to write for you, and we're going to play with it all day. What I'm going to write for you is a program called0:13:56
simplifier, the general-purpose simplifier. And we're going to say something like define dsimp to0:14:09
be a simplifier of the derivative rules.0:14:23
And what simplifier is going to do is, given a set of rules, it will produce for me a procedure which will simplify expressions containing the things that are0:14:33
referred to by these rules. So here will be a procedure constructed for your purposes0:14:42
to simplify things with derivatives in them such that, after that, if we're typing at some Lisp system, and we get a prompt, and we say dsimp, for example, of the derivative of0:14:58
the sum of x and y with respect to x-- note the quote here because I'm talking about the0:15:08
expression which is the derivative-- then I will get back as a result plus 1 0.0:15:19
Because the derivative of x plus y is the derivative of x plus derivative y. The derivative of x with respect to x is 1. The derivative of y with respect to x is 0.0:15:29
That's what we're going to get-- not 1-- because I haven't put any simplification at that level-- algebraic simplification-- in yet. Of course, once we have such a thing, then we0:15:39
can look at other rules. So, for example, we can, if we go to the slide, OK?0:15:49
Here, for example, are other rules that we might have, algebraic manipulation rules, ones that would be used for simplifying algebraic expressions.0:15:58
For example, just looking at some of these, the left-hand side says any operator applied to a constant e1 and a0:16:08
constant e2 is the result of evaluating that operator on the constants e1 and e2. Or an operator applied to any expression e1 and a0:16:20
constant e2, is going to move the constant forward. So that'll turn into the operator with e2 followed by e1. Why I did that, I don't know.0:16:30
It wouldn't work if I had division, for example. So there's a bug in the rules, if you like. So the sum of 0 and e is e.0:16:42
The product of 1 and any expression e is e. The product of 0 and any expression e is 0. Just looking at some more of these rules, we could have0:16:51
arbitrarily complicated ones. We could have things like the product of a constant e1 and the product of a constant e2 with e3 is the result of multiplying the0:17:04
constants e1 and e2 together and putting e3 there.0:17:13
So it says combine the constants: if I had a product of e1 and the product of e2 and e3, and e1 and e2 are both constants, multiply them together.0:17:23
And you can make up the rules as you like. There are lots of them here. There are things as complicated, for example, as-- oh, I suppose down here some distributive law, you see.0:17:33
The product of any object c and the sum of d and e gives the result as the same as the sum of the product of c and d0:17:42
and the product of c and e. Now, what exactly these rules are doesn't very much interest me. We're going to be writing the language that will allow us to0:17:51
interpret these rules so that we can, in fact, make up whatever rules we like, another whole language of programming.0:18:03
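A few of those algebraic rules, sketched as data in the same made-up notation (Python list structure standing in for the Lisp; the tag conventions are my own):

```python
# A sketch of some of the algebraic simplification rules from the slide,
# each one a pattern/skeleton pair.
algebra_rules = [
    [["+", 0, ["?", "e"]], [":", "e"]],    # (+ 0 e) -> e
    [["*", 1, ["?", "e"]], [":", "e"]],    # (* 1 e) -> e
    [["*", 0, ["?", "e"]], 0],             # (* 0 e) -> 0
    # the distributive law: (* c (+ d e)) -> (+ (* c d) (* c e))
    [["*", ["?", "c"], ["+", ["?", "d"], ["?", "e"]]],
     ["+", ["*", [":", "c"], [":", "d"]],
           ["*", [":", "c"], [":", "e"]]]],
]
```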
Well, let's see. I haven't told you how we're going to do this. And, of course, for a while, we're going to work on that. But there's a real question of what I'm going to do0:18:13
at all at a large scale? How do these rules work? How is the simplifier program going to manipulate these rules with your expression to produce a reasonable answer?0:18:26
Well, first, I'd like to think about these rules as being some sort of deck of them. So here I have a whole bunch of rules, right?0:18:42
Each rule-- here's a rule-- has a pattern and a skeleton. I'm trying to make up a control structure for this.0:18:53
Now, what I have is a matcher, and I have something which is0:19:02
an instantiater. And I'm going to pass from the matcher to the instantiater0:19:13
some set of meaning for the pattern variables, a dictionary, I'll call it. A dictionary, which will say x was matched against the0:19:26
following subexpression and y was matched against another following subexpression. And from the instantiater, I will be making expressions,0:19:35
and they will go into the matcher. They will be expressions.0:19:44
And the patterns of the rules will be fed into the matcher, and the skeletons from the same rule will be fed into the0:19:53
instantiater. Now, this is a little complicated because when you have something like an algebraic expression, the rules are intended to allow you to0:20:02
substitute equal for equal. These are equal transformation rules. So all subexpressions of the expression should be looked at.0:20:11
You give it an expression, this thing, and the rules should be cycled around. First of all, for every subexpression of the expression you feed in, all of the rules must be0:20:21
tried and looked at. And if any rule matches, then this process occurs. The dictionary is to have some values in it.0:20:30
The instantiater makes a new expression, which basically replaces that part of your original expression that was matched.0:20:40
And then, of course, we're going to recheck that-- go around these rules again, seeing if that could be simplified further.0:20:49
And then we're going to do that for every subexpression until the thing no longer changes. You can think of this as sort of an organic process. You've got some sort of stew, right?0:21:00
You've got bacteria or something, or enzymes in some gooey mess. And these enzymes change things.0:21:10
They attach to your expression, change it, and then they go away. And they have to match. The key-in-lock phenomenon. They match, they change it, they go away.0:21:19
You can imagine it as a parallel process of some sort. So you stick an expression into this mess, and after a while, you take it out, and it's been simplified.0:21:29
And it just keeps changing until it no longer can be changed. But these enzymes can attach to any part of the expression.0:21:39
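The control structure just described -- try the rules at every subexpression, replace, and go around again until nothing changes -- can be sketched by itself, independent of any particular matcher. This is a Python illustration rather than the lecture's code, and `plus_zero_rule`, a single hand-coded rewrite, stands in for the whole deck of rules:

```python
# A sketch of the control structure only: apply a rewriter to every
# subexpression, and repeat until the expression stops changing.
# `rewrite_one` stands in for "try all the rules at this node" -- here it
# is just the single hand-coded rule (+ 0 e) -> e, for illustration.

def simplify(exp, rewrite_one):
    def walk(e):
        if isinstance(e, list):
            e = [walk(sub) for sub in e]   # simplify every subexpression first
        return rewrite_one(e)              # then try the rules at this node
    while True:
        new = walk(exp)
        if new == exp:                     # fixed point: nothing changed
            return exp
        exp = new

def plus_zero_rule(e):
    # (+ 0 e) -> e, one of the algebraic rules from the slide, hand-coded
    if isinstance(e, list) and len(e) == 3 and e[0] == "+" and e[1] == 0:
        return e[2]
    return e
```

Feeding in `["+", 0, ["+", 0, "x"]]` grinds down to `"x"`: the inner sum is rewritten first, then the outer one, and a final pass confirms nothing more changes.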
OK, at this point, I'd like to stop and ask for questions. Yes. AUDIENCE: This implies that the matching program and the0:21:48
instantiation program are separate programs; is that right? Or is that-- they are. PROFESSOR: They're separate little pieces. They fit together in a larger structure.0:21:57
AUDIENCE: So I'm going through and matching and passing the information about what I matched to an instantiater, which makes the changes. And then I pass that back to the matcher?0:22:06
PROFESSOR: It won't make a change. It will make a new expression, which has substituted the values of the pattern variables that were matched on the left-hand side for the variables that are0:22:17
mentioned, the skeleton variables or evaluation variables or whatever I called them, on the right-hand side. AUDIENCE: And then that's passed back into the matcher?0:22:27
PROFESSOR: Then this is going to go around again. This is going to go through this mess until it no longer changes. AUDIENCE: And it seems that there would be a danger of getting into a recursive loop.0:22:37
PROFESSOR: Yes. Yes, if you do not write your rules nicely, you are-- indeed, in any programming language you invent, if it's sufficiently powerful to do anything, you can write0:22:46
programs that will go into infinite loops. And indeed, writing a program for doing algebraic manipulation for long will produce infinite loops.0:23:00
Go ahead. AUDIENCE: Some language designers feel that this feature is so important that it should become part of the basic language, for example, scheme in this case.0:23:12
What are your thoughts on-- PROFESSOR: Which language feature? AUDIENCE: The pairs matching. It's all application of such rules should be--0:23:21
PROFESSOR: Oh, you mean like Prolog? AUDIENCE: Like Prolog, but it becomes a more general-- PROFESSOR: It's possible. OK, I think my feeling about that is that I would like to0:23:33
teach you how to do it so you don't depend upon some language designer. AUDIENCE: OK. PROFESSOR: You make it yourself. You can roll your own.0:23:44
Thank you.0:24:14
Well, let's see. Now we have to tell you how it works. It conveniently breaks up into various pieces.0:24:24
I'd like to look now at the matcher. The matcher has the following basic structure. It's a box that takes as its input an expression and a0:24:44
pattern, and a dictionary.0:25:01
A dictionary, remember, is a mapping of pattern variables to the values that were found by matching, and it puts out another dictionary, which is the result of augmenting this0:25:20
dictionary by what was found in matching this expression against this pattern. So that's the matcher.0:25:33
Now, this is a rather complicated program, and we can look at it on the overhead over here and see, ha, ha,0:25:42
it's very complicated. I just want you to look at the shape of it. It's too complicated to look at except in pieces.0:25:51
However, it's a fairly large, complicated program with a lot of sort of indented structure.0:26:00
At the largest scale-- you don't try to read those characters, but at the largest scale, you see that there is a case analysis, which is all0:26:09
these cases lined up. What we're now going to do is look at this in a bit more detail, attempting to understand how it works.0:26:19
Let's go now to the first slide, showing some of the structure of the matcher at a large scale.0:26:28
And we see that the matcher, the matcher takes as its input a pattern, an expression, and a dictionary.0:26:38
And there is a case analysis here, which is made out of several cases, some of which have been left out over here, and the general case, which I'd like you to see.0:26:50
Let's consider this general case. It's a very important pattern. The problem is that we have to examine two trees0:27:00
simultaneously. One of the trees is the tree of the expression, and the other is the tree of the pattern. We have to compare them with each other so that the0:27:12
subexpressions of the expression are matched against subexpressions of the pattern. Looking at that in a bit more detail, suppose I had a0:27:21
pattern, which was the sum of the product of a thing which we will call x and a thing which we will call y,0:27:38
and the sum of that, and the same thing we call y. So we're looking for a sum of a product whose0:27:49
second argument is the same as the second argument of the sum. That's a thing you might be looking for.0:27:59
Well, that, as a pattern, looks like this. There is a tree, which consists of a sum, and a0:28:09
product with a pattern variable question mark x and question mark y, the other pattern variable, and question0:28:21
mark y, just looking at the same, just writing down the list structure in a different way. Now, suppose we were matching that against an expression0:28:31
which matches it, the sum of, say, the product of 3 and x and, say, x.0:28:42
That's another tree. It's the sum of the product of 3 and x and of x.0:28:59
So what I want to do is traverse these two trees simultaneously. And what I'd like to do is walk them like this.0:29:08
I'm going to say are these the same? This is a complicated object. Let's look at the left branches.0:29:17
Well, that could be the car. How does that look? Oh yes, the plus looks just fine. But the next thing here is a complicated thing. Let's look at that. Oh yes, that's pretty fine, too.0:29:26
They're both asterisks. Now, whoops! My pattern variable, it matches against the 3. Remember, x equals 3 now.0:29:36
That's in my dictionary, and the dictionary's going to follow along with me: x equals three. Ah yes, x equals 3 and y equals x, different x.0:29:46
The pattern x is the expression x, the pattern y. Oh yes, the pattern variable y, I've already0:29:56
got a value for it. It's x. Is this an x? Oh yeah, sure it is. That's fine. Yep, done. I now have a dictionary, which I've accumulated0:30:07
by making this walk. Well, now let's look at this general case here and see how that works. Here we have it.0:30:17
I take in a pattern, an expression, and a dictionary. And now I'm going to do a complicated thing here, which0:30:26
is the general case. The expression is made out of two parts: a left and a right half, in general.0:30:35
Anything that's complicated is made out of two pieces in a Lisp system. Well, now what do we have here? I'm going to match the car's of the two expressions against0:30:45
each other with respect to the dictionary I already have, producing a dictionary as its value, which I will then use0:30:55
for matching the cdr's against each other. So that's how the dictionary travels, threads the entire structure. And then the result of that is the dictionary for the match0:31:06
of the car and the cdr, and that's what's going to be returned as a value. Now, at any point, a match might fail.0:31:16
It may be the case, for example, if we go back and look at an expression that doesn't quite match, like supposing this was a 4.0:31:29
Well, now these two don't match any more, because the y that had to be x here-- this0:31:38
y has to be 4. But x and 4 are not the same object syntactically. So this wouldn't match, and that would be rejected0:31:47
sometimes, so matches may fail. Now, of course, because this matcher takes the dictionary from the previous match as input, it must be able to0:31:57
propagate the failures. And so that's what the first clause of this conditional does. It's also true that if it turned out that the pattern0:32:07
was not atomic-- see, if the pattern was atomic, I'd go into this stuff, which we haven't looked at yet. But if the pattern is not atomic and the0:32:16
expression is atomic-- it's not made out of pieces-- then that must be a failure, and so we go over here. If the pattern is not atomic and the pattern is not a0:32:26
pattern variable-- I have to remind myself of that-- then we go over here. So that way, failures may occur.0:32:35
OK, so now let's look at the insides of this thing. Well, the first place to look is what happens if I have an atomic pattern? That's very simple. A pattern that's not made out of any pieces: foo.0:32:46
That's a nice atomic pattern. Well, here's what we see. If the pattern is atomic, then if the expression is atomic,0:32:56
then if they are the same thing, then the dictionary I get is the same one as I had before. Nothing's changed. It's just that I matched plus against plus, asterisk against0:33:09
asterisk, x against x. That's all fine. However, if the pattern is not the one which is the expression, if I have two separate atomic objects, then0:33:19
it's like matching plus against asterisk, in which case I fail. Or if it turns out that the pattern is atomic but the0:33:29
expression is complicated, it's not atomic, then I get a failure. That's very simple.0:33:38
Now, what about the various kinds of pattern variables? We had three kinds. I give them the names.0:33:47
They're arbitrary constants, arbitrary variables, and arbitrary expressions. A question mark x is an arbitrary expression.0:34:01
A question mark cx is an arbitrary constant, and a question mark vx is an arbitrary variable. Well, what do we do here?0:34:10
Looking at this, we see that if I have an arbitrary constant, if the pattern is an arbitrary constant, then it had better be the case that the expression0:34:19
is a constant. If the expression is not a constant, then that match fails. If it is a constant, however, then I wish to extend the dictionary. I wish to extend the dictionary with that pattern0:34:32
being remembered to be that expression using the old dictionary as a starting point.0:34:41
Similarly, for arbitrary variables, I have to check first whether the expression is a variable. If so, it's worth extending the dictionary so that the0:34:50
pattern is remembered to be matched against that expression, given the original dictionary, and this makes a new dictionary. Now, it has to check.0:35:00
There's a sort of failure inside extend-dictionary, which is that if one of these pattern variables already has a value0:35:09
and I'm trying to match the thing against something else which is not equivalent to the one that I've already matched it against once, then a failure will come flying out of here, too.0:35:20
And I will see that some time. And finally, an arbitrary expression does not have to check anything syntactic about the expression that's being0:35:29
matched, so all it does is it's an extension of the dictionary. So you've just seen a complete, very simple matcher.0:35:39
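The whole matcher, as just described, can be sketched in Python (the lecture's version is Scheme; representing constants as numbers and variables as strings is an assumption of this sketch, as is walking same-length lists with zip instead of car/cdr recursion):

```python
# A Python sketch of the matcher just described. Patterns are nested
# lists; ("?", x) matches anything, ("?c", x) only constants (numbers
# here), ("?v", x) only variables (strings here). The dictionary threads
# through the walk, and "failed" propagates through every call.

FAILED = "failed"

def pattern_var(pat):
    return isinstance(pat, list) and len(pat) == 2 and pat[0] in ("?", "?c", "?v")

def extend_dict(pat, exp, dic):
    name = pat[1]
    if name in dic:                      # already bound: must be consistent
        return dic if dic[name] == exp else FAILED
    new = dict(dic)
    new[name] = exp
    return new

def match(pat, exp, dic):
    if dic == FAILED:                    # propagate failure
        return FAILED
    if not isinstance(pat, list):        # atomic pattern: must be identical
        return dic if pat == exp else FAILED
    if pattern_var(pat):
        if pat[0] == "?c" and not isinstance(exp, (int, float)):
            return FAILED                # arbitrary constant: needs a constant
        if pat[0] == "?v" and not isinstance(exp, str):
            return FAILED                # arbitrary variable: needs a variable
        return extend_dict(pat, exp, dic)
    if not isinstance(exp, list) or len(pat) != len(exp):
        return FAILED                    # compound pattern, atomic expression
    # general case: thread the dictionary through the pieces in order
    for p, e in zip(pat, exp):
        dic = match(p, e, dic)
    return dic
```

Matching the board's pattern against (+ (* 3 x) x) yields the dictionary binding x to 3 and y to the symbol x; against (+ (* 3 x) 4) it fails, because y cannot be both x and 4.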
Now, one of the things that's rather remarkable about this is people pay an awful lot of money these days for someone to make a, quote, AI expert system that has nothing more0:35:49
in it than a matcher and maybe an instantiater like this. But it's very easy to do, and now, of course, you can start up a little start-up company and make a couple of megabucks0:35:59
in the next week taking some people for a ride. 20 years ago, this was remarkable, this kind of program.0:36:09
But now, this is sort of easy. You can teach it to freshmen. Well, now there's an instantiater as well.0:36:19
The problem is they're all going off and making more money than I do. But that's always been true of universities. The purpose of the instantiater is to make0:36:33
expressions given a dictionary and a skeleton.0:36:44
And that's not very hard at all. We'll see that very simply in the next, the next slide here.0:36:53
To instantiate a skeleton, given a particular dictionary-- oh, this is easy. We're going to do a recursive tree walk over the skeleton.0:37:04
And for everything which is a skeleton variable-- I don't know, call it a skeleton evaluation. That's the name and the abstract syntax that I give it in this program: a skeleton evaluation, a thing beginning0:37:13
with a colon in the rules. For anything of that case, I'm going to look up the answer in the dictionary, and we'll worry about that in a second.0:37:24
Let's look at this as a whole. Here, I have-- I'm going to instantiate a skeleton, given a dictionary. Well, I'm going to define some internal loop right there, and0:37:38
it's going to do something very simple. Either the skeleton is simple and atomic, in which case it's nothing more than giving the skeleton back as an answer, or in the general case, it's0:37:51
complicated, in which case I'm going to make up the expression which is the result of instantiating-- calling this loop recursively--0:38:01
instantiating the car of the skeleton and the cdr. So here is a recursive tree walk. However, if it turns out to be a skeleton evaluation, a colon0:38:12
expression in the skeleton, then what I'm going to do is find the expression that's in the colon--0:38:21
the CADR in this case. It's a piece of abstract syntax here, so I can change my representation of rules. I'm going to evaluate that relative to this dictionary,0:38:31
whatever evaluation means. We'll find out a lot about that sometime. And the result of that is my answer. So I start up this loop-- here's my initialization--0:38:42
by calling it with the whole skeleton, and this will just do a recursive decomposition into pieces. Now, one more little bit of detail is what0:38:55
happens inside evaluate? I can't tell you that in great detail. I'll tell you a little bit of it. Later, we're going to see--look into this in much more detail.0:39:04
To evaluate some form, some expression with respect to a dictionary, if the expression is an atomic object, well, I'm0:39:15
going to go look it up. Nothing very exciting there. Otherwise, I'm going to do something complicated here, which is I'm going to apply a procedure which is the result0:39:26
of looking up the operator part in something that we're going to find out about someday. I want you to realize you're seeing magic now. This magic will become clear very soon, but not today.0:39:40
Then I'm looking up all the pieces, all the arguments to that in the dictionary. So I don't want you to look at this in detail.0:39:51
I want you to say that there's more going on here, and we're going to see more about this. But it's-- the magic is going to stop.0:40:02
This part has to do with Lisp, and it's the end of that. OK, so now we know about matching and instantiation.0:40:15
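The instantiater just described is only a few lines. Here is a Python sketch, with the `[":", name]` tag as my stand-in for the colon skeleton evaluations, and a plain dictionary lookup standing in for the fuller evaluate that the lecture defers:

```python
# A sketch of the instantiater: a recursive tree walk over the skeleton.
# (":", name) is a skeleton evaluation -- handled here as a simple
# dictionary lookup, in place of the lecture's more general `evaluate`.

def instantiate(skeleton, dic):
    def loop(s):
        if isinstance(s, list):
            if len(s) == 2 and s[0] == ":":   # skeleton evaluation
                return dic[s[1]]              # look up the matched value
            return [loop(sub) for sub in s]   # general case: rebuild the list
        return s                              # atomic skeleton: itself
    return loop(skeleton)
```

Instantiating the right-hand side of the sum rule with x1 bound to x, x2 to y, and v to x rebuilds the tree with those values dropped in where the colons were.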
Are there any questions for this segment?0:40:27
AUDIENCE: I have a question. PROFESSOR: Yes. AUDIENCE: Is it possible to bring up a previous slide? It's about this define match pattern.0:40:36
PROFESSOR: Yes. You'd like to see the overall slide define match pattern. Can somebody put up the-- no, the overhead. That's the biggest scale one.0:40:45
What part would you like to see? AUDIENCE: Well, the top would be fine. Any of the parts where you're passing failed.0:40:54
PROFESSOR: Yes. AUDIENCE: The idea is to pass failed back to the dictionary; is that right? PROFESSOR: The dictionary is the answer to a match, right?0:41:05
And it is either some mapping or there's no match. It doesn't match.0:41:14
AUDIENCE: Right. PROFESSOR: So what you're seeing over here is, in fact, due to the fact that a match may have another match pass it the dictionary, as you see in the general case down here.0:41:24
Here's the general case where a match passes another match to the dictionary. When I match the cdr's, I match them in the dictionary that is resulting from matching the car's.0:41:36
OK, that's what I have here. So because of that, if the match of the car's fails, then it may be necessary that the match of the cdr's propagates that failure, and that's what the first line is.0:41:48
AUDIENCE: OK, well, I'm still unclear what matches-- what comes out of one instance of the match? PROFESSOR: One of two possibilities. Either the symbol failed, which means there is no match.0:41:59
AUDIENCE: Right. PROFESSOR: Or some mapping, which is an abstract thing right now-- you shouldn't know about the structure of it-- which relates the pattern variables to their values as0:42:13
picked up in the match. AUDIENCE: OK, so it is-- PROFESSOR: That's constructed by extend dictionary. AUDIENCE: So the recursive nature brings about the fact0:42:22
that if ever a failed gets passed out of any calling of match, then the first condition will pick it up-- PROFESSOR: And just propagate it along without any further0:42:32
ado, right. AUDIENCE: Oh, right. OK. PROFESSOR: That's just the fastest way to get that failure out of there.0:42:43
Yes. AUDIENCE: If I don't fail, that means that I've matched a pattern, and I run the procedure extend dict and then pass in the pattern in the expression.0:42:55
But the substitution will not be made at that point; is that right? I'm just-- PROFESSOR: No, no. There's no substitution being done there because there's no skeleton to be substituted into. AUDIENCE: Right. So what-- PROFESSOR: All you've got there is we're making up the0:43:04
dictionary for later substitution. AUDIENCE: And what would the dictionary look like? Is it ordered pairs?0:43:13
PROFESSOR: That's not told to you. We're being abstract. AUDIENCE: OK. PROFESSOR: Why do you want to know? What it is, it's a function. It's a function. AUDIENCE: Well, the reason I want to know is--0:43:22
PROFESSOR: A function abstractly is a set of ordered pairs. It could be implemented as a set of list pairs. It could be implemented as some fancy table mechanism.0:43:32
It could be implemented as a function. And somehow, I'm building up a function. But I'm not telling you. That's up to George, who's going to build that later.0:43:49
I know you really badly want to write concrete things. I'm not going to let you do that. AUDIENCE: Well, let me at least ask, what is the important information there that's being passed to extend dict?0:43:59
I want to pass the pattern I found-- PROFESSOR: Yes. The pattern that's matched against the expression. You want to have the pattern, which happens to be in those cases pattern variables, right?0:44:09
All of those three cases for extend dict are pattern variables. AUDIENCE: Right. PROFESSOR: So you have a pattern variable that is to be given a value in a dictionary.0:44:18
AUDIENCE: Mm-hmm. PROFESSOR: The value is the expression that it matched against. The dictionary is the set of things I've already0:44:27
figured out that I have memorized or learned. And I am going to make a new dictionary, which is extended from the original one by having that pattern variable0:44:36
have a value with the new dictionary. AUDIENCE: I guess what I don't understand is why can't the substitution be made right as soon as you find-- PROFESSOR: How do I know what I'm going to substitute? I don't know anything about this skeleton.0:44:47
This pattern, this matcher is an independent unit. AUDIENCE: Oh, I see. OK. PROFESSOR: Right? AUDIENCE: Yeah. PROFESSOR: I take the matcher. I apply the matcher. If it matches, then it was worth doing instantiation.0:44:57
AUDIENCE: OK, good. Yeah. PROFESSOR: OK? AUDIENCE: Can you just do that answer again using that example on the board? You know, what you just passed back to the matcher.0:45:06
PROFESSOR: Oh yes. OK, yes. You're looking at this example. At this point when I'm traversing this structure, I get to here: x.0:45:16
I have some dictionary, presumably an empty dictionary at this point if this is the whole expression. So I have an empty dictionary, and I've matched x against 3.0:45:26
So now, after this point, the dictionary contains x is 3, OK? Now, I continue walking along here.0:45:35
I see y. Now, this is a particular x, a pattern x. I see y, a pattern y. The dictionary says, oh yes, the pattern y is the symbol x0:45:48
because I've got a match there. So the dictionary now contains at this point two entries. The pattern x is 3, and the pattern y is the expression x.0:46:02
Now, I get that, I can walk along further. I say, oh, pattern y also wants to be 4. But that isn't possible, producing a failure.0:46:14
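The walkthrough above can be sketched in code. The lecture's program is in Scheme; the following is a Python translation, and the encoding of pattern variables as `('?', name)` tuples, plus the names `match`, `extend_dict`, and the `FAILED` symbol, are assumptions of this sketch rather than the lecture's exact code.

```python
FAILED = 'failed'

def match(pat, expr, dic):
    """Match pat against expr, threading a dictionary of bindings."""
    if dic == FAILED:
        return FAILED                        # propagate failure without further ado
    if isinstance(pat, tuple) and pat[0] == '?':
        return extend_dict(pat, expr, dic)   # pattern variable: try to bind it
    if isinstance(pat, list):
        if not (isinstance(expr, list) and len(expr) == len(pat)):
            return FAILED
        for p, e in zip(pat, expr):          # match the car, then the cdr
            dic = match(p, e, dic)
        return dic
    return dic if pat == expr else FAILED    # atoms must be equal

def extend_dict(pat, datum, dic):
    """Bind the pattern variable in pat to datum, failing on a conflict."""
    name = pat[1]
    if name in dic:
        # already bound: the new value had better equal the stored one
        return dic if dic[name] == datum else FAILED
    new = dict(dic)                          # a new dictionary, extended
    new[name] = datum
    return new
```

Matching the pattern made of variables x, y, y against the expression 3, x, 4 binds the pattern x to 3 and the pattern y to the symbol x, and then fails when y also wants to be 4, just as in the walkthrough.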
Thank you. Let's take a break.0:47:02
OK, you're seeing your first very big and hairy program. Now, of course, one of the goals of this subject is to get you to be able to read something like this and not be0:47:12
afraid of it. This one's only about four pages of code. By the end of the subject, I hope a 50-page program will not look particularly frightening.0:47:22
But I don't expect-- and I don't want you to think that I expect you to be getting it as it's coming out. You're supposed to feel the flavor of this, OK?0:47:31
And then you're supposed to think about it because it is a big program. There's a lot of stuff inside this program.0:47:40
Now, I've told you about the language we're implementing, the pattern match substitution language. I showed you some rules. And I've told you about matching and instantiation,0:47:51
which are the two halves of how a rule works. Now we have to understand the control structure by which the rules are applied to the expressions so as to do0:48:03
algebraic simplification. Now, that's also a big complicated mess.0:48:12
The problem is that there is a variety of interlocking, interwoven loops, if you will, involved in this. For one thing, I have to apply--0:48:22
I have to examine every subexpression of my expression that I'm trying to simplify. That we know how to do. It's a car cdr recursion of some sort, or something like0:48:34
that, and some sort of tree walk. And that's going to be happening. Now, for every such place, every node that I get to in0:48:43
doing my traversal of the expression I'm trying to simplify, I want to apply all of the rules.0:48:53
Every rule is going to look at every node. I'm going to rotate the rules around. Now, either a rule will or will not match.0:49:07
If the rule does not match, then it's not very interesting. If the rule does match, then I'm going to replace that node0:49:16
in the expression by an alternate expression. I'm actually going to make a new expression, which contains that new value, the result of0:49:26
substituting into the skeleton, instantiating the skeleton for that rule at this level. But no one knows whether that thing that I instantiated0:49:35
there is in simplified form. So we're going to have to simplify that, somehow to call the simplifier on the thing that I just constructed.0:49:45
And then when that's done, then I sort of can build that into the expression I want as my answer. Now, there is a basic idea here, which I will call a0:49:55
garbage-in, garbage-out simplifier. It's a kind of recursive simplifier. And what happens is the way you simplify something is that0:50:06
simple objects like variables are simple. Compound objects, well, I don't know. What I'm going to do is I'm going to build up from simple0:50:16
objects, trying to make simple things by assuming that the pieces they're made out of are simple. That's what's happening here.0:50:27
Well, now, if we look at the first slide-- no, overhead, overhead. If we look at the overhead, we see a very complicated program like we saw before for the matcher, so complicated that0:50:38
you can't read it like that. I just want you to get the feel of the shape of it, and the shape of it is that this program has various0:50:48
subprograms in it. One of them--this part is the part for traversing the0:50:57
expression, and this part is the part for trying rules. Now, of course, we can look at that in some more detail.0:51:06
Let's look at--let's look at the first transparency, right? The simplifier is made out of several parts.0:51:17
Now, remember at the very beginning, the simplifier is the thing which takes a set of rules and produces a program which will simplify an expression relative to them.0:51:29
So here we have our simplifier. It takes a rule set. And in the context where that rule set is defined, there are0:51:39
various other definitions that are done here. And then the result of this simplifier procedure is, in fact, one of the procedures that was defined.0:51:50
Simplify-exp. What I'm returning as the value of calling the simplifier on a set of rules is a procedure, the simplify-exp0:52:01
procedure, which is defined in that context, which is a simplification procedure appropriate for using that set of rules.0:52:14
That's what I have there. Now, the first two of these procedures, this one and this one, are together going to be the recursive traversal of an0:52:25
expression. This one is the general simplification for any expression, and this is the thing which simplifies a list of parts of an expression.0:52:35
Nothing more. For each of those, we're going to do something complicated, which involves trying the rules. Now, we should look at the various parts.0:52:45
Well let's look first at the recursive traversal of an expression. And this is done in a sort of simple way.0:52:54
This is a little nest of recursive procedures. And what we have here are two procedures-- one for simplifying an expression, and one for0:53:06
simplifying parts of an expression. And the way this works is very simple. If the expression I'm trying to simplify is a compound0:53:16
expression, I'm going to simplify all the parts of it. And that's calling--that procedure, simplify parts, is going to make up a new expression with all the parts0:53:25
simplified, which I'm then going to try the rules on over here. If it turns out that the expression is not compound, if it's simple, like just a symbol or something like pi,0:53:37
then in any case, I'm going to try the rules on it because it might be that I want in my set of rules to expand pi to 3.14159265358979, dot, dot, dot.0:53:48
But I may not. But there is no reason not to do it. Now, if I want to simplify the parts, well, that's easy too.0:53:59
Either the expression is an empty one, there's no more parts, in which case I have the empty expression. Otherwise, I'm going to make a new expression by cons, which0:54:11
is the result of simplifying the first part of the expression, the car, and simplifying the rest of the expression, which is the cdr.0:54:21
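The two mutually recursive procedures just described might look like this in a Python rendering (the lecture's code is Scheme; `try_rules` is taken as given here, and the name `make_simplifier` is this sketch's, not the lecture's):

```python
def make_simplifier(try_rules):
    """Return a simplifier built around a given try_rules procedure."""
    def simplify_exp(exp):
        if isinstance(exp, list):            # compound: simplify all the parts
            return try_rules(simplify_parts(exp))
        return try_rules(exp)                # simple: try the rules anyway

    def simplify_parts(exp):
        if not exp:                          # empty expression: no more parts
            return []
        # cons of the simplified car onto the simplified cdr
        return [simplify_exp(exp[0])] + simplify_parts(exp[1:])

    return simplify_exp
```

The map idiom the professor writes on the blackboard next collapses `simplify_parts` into the single line `[simplify_exp(p) for p in exp]`.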
Now, the reason why I'm showing you this sort of stuff this way is because I want you to get the feeling for the various patterns that are very important when writing programs. And this could be written a different way.0:54:33
There's another way to write simplify-exp so that there would be only one of them. There would only be one little procedure here. Let me just write that on the blackboard to give you a feeling for that.0:54:49
This in another idiom, if you will.0:54:58
To simplify an expression called x, what am I going to do? I'm going to try the rules on the following situation.0:55:11
If-- on the following expression-- compound, just like we had before.0:55:21
If the expression is compound, well, what am I going to do? I'm going to simplify all the parts. But I already have a cdr recursion, a common pattern of0:55:30
usage, which has been captured as a higher-order procedure. It's called map. So I'll just write that here. Map simplify the expression, all the parts of the0:55:47
expression. This says apply the simplification operation, which is this one, to every part of the expression, and then that conses those up into a list, one for every element of0:56:02
the list which the expression is assumed to be made out of, and otherwise, I have the expression. So I don't need the helper procedure, simplify parts,0:56:12
because that's really this. So sometimes, you just write it this way. It doesn't matter very much. Well, now let's take a look at--0:56:24
let's just look at how you try rules. If you look at this slide, we see this is a complicated mess also.0:56:33
I'm trying rules on an expression. It turns out the expression I'm trying it on is some subexpression now of the expression I started with. Because the thing I just arranged allowed us to try0:56:43
every subexpression. So now here we're taking in a subexpression of the expression we started with. That's what this is.0:56:52
And what we're going to define here is a procedure called scan, which is going to try every rule. And we're going to start it up on the whole set of rules.0:57:01
This is going to go cdr-ing down the rules, if you will, looking for a rule to apply. And when it finds one, it'll do the job.0:57:14
Well, let's take a look at how try rules works. It's very simple: it scans the rules. Scan is the way of scanning. Well, is it so simple?0:57:23
It's a big program, of course. We take a bunch of rules, which is a sublist of the list of rules. We've tried some of them already, and they've not been0:57:33
appropriate, so we get to some here. We get to move to the next one. If there are no more rules, well then, there's nothing I can do with this expression, and it's simplified.0:57:42
However, if it turns out that there are still rules to be done, then let's match the pattern of the first rule0:57:52
against the expression using the empty dictionary to start with and use that as the dictionary. If that happens to be a failure, try0:58:02
the rest of the rules. That's all it says here. It says discard that rule.0:58:11
Otherwise, well, I'm going to get the skeleton of the first rule, instantiate that relative to the dictionary, and simplify the result, and that's the expression I want.0:58:24
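The scan just described might be sketched like this, again as a Python translation; the matcher, instantiator, and simplifier are passed in as parameters here, rather than closed over in the surrounding definitions as in the lecture's Scheme.

```python
FAILED = 'failed'

def try_rules(exp, rules, match, instantiate, simplify):
    """Scan down the rule list looking for a rule whose pattern matches exp."""
    def scan(remaining):
        if not remaining:
            return exp                      # no more rules: exp is simplified
        pattern, skeleton = remaining[0]
        dic = match(pattern, exp, {})       # start with the empty dictionary
        if dic == FAILED:
            return scan(remaining[1:])      # discard that rule, try the rest
        # instantiate the skeleton relative to the dictionary and simplify
        return simplify(instantiate(skeleton, dic))
    return scan(rules)
```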
So although that was a complicated program, every complicated program is made out of a lot of simple pieces. Now, the pattern of recursions here is very complicated.0:58:34
And one of the most important things is not to think about that. If you try to think about the actual pattern by which this does something, you're going to get very confused.0:58:45
I would. This is not something you can do just with practice. These patterns are hard. But you don't have to think about it.0:58:55
The key to this-- it's very good programming and very good design-- is to know what not to think about. The fact is, going back to this slide, I don't have to0:59:07
think about it because I have specifications in my mind for what simplify-exp does. I don't have to know how it does it.0:59:16
And it may, in fact, call scan somehow through try rules, which it does. And somehow, I've got another recursion going on here. But since I know that simplify-exp is assumed by wishful0:59:28
thinking to produce the simplified result, then I don't have to think about it anymore. I've used it. I've used it in a reasonable way. I will get a reasonable answer.0:59:39
And you have to learn how to program that way-- with abandon. Well, there's very little left of this thing.0:59:50
All there is left is a few details associated with what a dictionary is. And those of you who've been itching to know what a dictionary is, well, I will flip it up and not tell you1:00:01
anything about it. Dictionaries are easy. It's represented in terms of something else called an a-list, an association list, which is a particular pattern of usage for making1:00:14
tables in lists. They're easy. They're made out of pairs, as was asked a bit ago. And there are special procedures for dealing with1:00:23
such things called assq, and you can find them in manuals. I'm not terribly excited about it. The only interesting thing here in extend dictionary is I have to extend the dictionary with a pattern, a datum, and a1:00:36
dictionary. This pattern is, in fact, at this point a pattern variable. And what do I want to do? I want to pull out the name of that pattern variable, the1:00:48
pattern variable name, and I'm going to look up in the dictionary and see if it already has a value. If not, I'm going to add a new one in.1:00:57
If it does have one, if it has a value, then it had better be equal to the one that was already stored away. And if that's the case, the dictionary is what I expected it to be.1:01:06
Otherwise, I fail. So that's easy, too. If you open up any program, you're going to find inside of1:01:15
it lots of little pieces, all of which are easy. So at this point, I suppose, I've just told you some million-dollar valuable information.1:01:27
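The extend-dictionary logic just described, with the dictionary represented as an a-list of pairs and an assq-style lookup, might look like this in a Python rendering (the names mirror the lecture's Scheme; the pair encoding is this sketch's assumption):

```python
FAILED = 'failed'

def assq(key, alist):
    """Linear lookup in an association list of (name, value) pairs."""
    for pair in alist:
        if pair[0] == key:
            return pair
    return None

def extend_dictionary(pat, datum, dictionary):
    """Give the pattern variable in pat the value datum, or fail."""
    name = pat[1]                            # e.g. pat is ('?', 'x')
    entry = assq(name, dictionary)
    if entry is None:
        return [(name, datum)] + dictionary  # add a new entry in
    if entry[1] == datum:
        return dictionary                    # equal to what was stored away
    return FAILED                            # conflicting value: fail
```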
And I suppose at this point we're pretty much done with this program. I'd like to ask about questions. AUDIENCE: Yes, can you give me the words that describe the specification for simplify-exp?1:01:38
PROFESSOR: Sure. Simplify-exp takes an expression and produces a simplified expression. That's it, OK?1:01:48
How it does it is very easy. In compound expressions, all the pieces are simplified, and then the rules are tried on the result. And for simple expressions, you just try all the rules.1:01:59
AUDIENCE: So an expression is simplified by virtue of the rules? PROFESSOR: That's, of course, true. AUDIENCE: Right. PROFESSOR: And the way this works is that simplify-exp, as you see here, what it does is it breaks the1:02:10
expression down into the smallest pieces, simplifies building up from the bottom using the rules to be the simplifier, to do the manipulations, and constructs1:02:21
a new expression as the result. Eventually, one of the things you see is that the rules themselves, through try rules, call simplify-exp1:02:30
on the results when it changes something, the results of a match. I'm sorry, the results of instantiation of a skeleton1:02:39
for a rule that has matched. So the spec of simplify-exp is that any expression you put into it comes out simplified according to those rules.1:02:49
Thank you. Let's take a break.0:00:00
Lecture 4B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:20
PROFESSOR: So far in this course we've been talking a lot about data abstraction. And remember the idea is that we build systems that have these horizontal barriers in them, these abstraction0:00:31
barriers that separate use, the way you might use some data object, from the way you might represent it.0:00:48
Or another way to think of that is up here you have the boss who's going to be using some sort of data object.0:00:57
And down here is George who's implemented it. Now this notion of separating use from representation so you can think about these two problems separately is a very,0:01:10
very powerful programming methodology, data abstraction. On the other hand, it's not really sufficient for really0:01:21
complex systems. And the problem with this is George. Or actually, the problem is that there0:01:32
are a lot of Georges. Let's be concrete. Let's suppose there is George, and there's also Martha.0:01:41
OK, now George and Martha are both working on this system, both designing representations, and absolutely are incompatible.0:01:51
They wouldn't cooperate on a representation under any circumstances. And the problem is you would like to have some system where0:02:00
both George and Martha are designing representations, and yet, if you're above this abstraction barrier you don't0:02:09
want to have to worry about that, whether something is done by George or by Martha. And you don't want George and Martha to interfere with each other. Somehow in designing a system, you not only want these0:02:20
horizontal barriers, but you also want some kind of vertical barrier to keep George and Martha separate.0:02:32
Let me be a little bit more concrete. Imagine that you're thinking about personnel records for a0:02:42
large company with a lot of loosely linked divisions that don't cooperate very well either. And imagine even that this company is formed by merging a0:02:57
whole bunch of companies that already have their personnel record system set up. And imagine that once these divisions are all linked in0:03:06
some kind of very sophisticated satellite network, and all these databases are put together. And what you'd like to do is, from any place in the company,0:03:17
to be able to say things like, oh, what's the name in a personnel record?0:03:26
Or, what's the job description in a personnel record? And not have to worry about the fact that each division obviously is going to have completely separate0:03:36
conventions for how you might implement these records. From this point you don't want to know about that. Well how could you possibly do that?0:03:48
One way, of course, is to send down an edict from somewhere that everybody has to change their format to some fixed compatible thing.0:03:58
That's what people often try, and of course it never works. Another thing that you might want to do is somehow arrange0:04:07
it so you can have these vertical barriers. So that when you ask for the name of a personnel record, somehow, whatever format it happens to be, name will0:04:17
figure out how to do the right thing. We want name to be, so-called, a generic operator.0:04:26
Generic operator means what it sort of precisely does depends on the kind of data that it's looking at. More than that, you'd like to design the system so that the0:04:37
next time a new division comes into the company they don't have to make any big changes in what they're already doing to link into this system, and the rest of the company0:04:50
doesn't have to make any big changes to admit their stuff to the system. So that's the problem you should be thinking about. Like it's sort of just your work.0:05:00
You want to be able to include new things by making minimal changes. OK, well that's the problem that we'll be talking about today.0:05:09
And you should have this sort of distributed personnel record system in your mind. But actually the one I'll be talking about is a problem that's a little bit more self-contained than that.0:05:18
But it'll bring up the issues, I think, more clearly. That's the problem of doing a system that does arithmetic on complex numbers.0:05:27
So let's take a look here. Just as a little review, there are things called complex numbers. Complex number you can think of as a point in0:05:36
the plane, z. And you can represent a point either by its real-part and0:05:46
its imaginary-part. So if this is z and its real-part is this much, and its imaginary-part is that much, and you write z equals x plus iy.0:05:59
Or another way to represent a complex number is by saying, what's the distance from the origin, and what's the angle?0:06:10
So that represents a complex number as its radius times e to the i times the angle, z equals re^(iA).0:06:19
The original one's called rectangular form, the rectangular representation in terms of real- and imaginary-part; the other's called the polar representation.0:06:28
Magnitude and angle-- and if you know the real- and imaginary-part, you can figure out the magnitude and angle. If you know x and y, you can get r by this formula.0:06:37
Square root of sum of the squares, and you can get the angle as an arctangent. Or conversely, if you knew r and A you could figure out x and y. x is r times the cosine of A, and y is r times the sine of0:06:49
A. All right, so there's these two. They're complex numbers. You can think of them either in polar form or rectangular form. What we would like to do is make a system that does0:06:59
arithmetic on complex numbers. In other words, what we'd like-- just like the rational number example-- is to have some operations plus c, which is going to take0:07:11
two complex numbers and add them, subtract them, and multiply them, and divide them.0:07:20
OK, well there's little bit of mathematics behind it. What are the actual formulas for manipulating such things?0:07:29
And it's sort of not important where they come from, but just as an implementer let's see-- if you want to add two complex numbers it's pretty easy to0:07:40
get its real-part and its imaginary-part. The real-part of the sum of two complex numbers, the real-part of the z1 plus z2 is the real-part of z1 plus the0:07:53
real-part of z2. And the imaginary-part of z1 plus z2 is the imaginary part0:08:02
of z1 plus the imaginary part of z2. So it's pretty easy to add complex numbers. You just add the corresponding parts and make a new complex0:08:12
number with those parts. If you want to multiply them, it's kind of nice to do it in polar form. Because if you have two complex numbers, the magnitude0:08:21
of their product is here, the product of the magnitudes. And the angle of the product is the sum of the angles.0:08:35
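The formulas on the board translate directly into code. Here is a small Python check of the rectangular/polar conversions and of the multiply-the-magnitudes, add-the-angles product rule; the function names are this sketch's, not the lecture's.

```python
import math

def rect_to_polar(x, y):
    r = math.sqrt(x * x + y * y)   # square root of the sum of the squares
    A = math.atan2(y, x)           # the arctangent of y and x
    return r, A

def polar_to_rect(r, A):
    return r * math.cos(A), r * math.sin(A)   # x = r cos A, y = r sin A

def mul_polar(z1, z2):
    """Product rule: multiply the magnitudes, add the angles."""
    (r1, a1), (r2, a2) = z1, z2
    return r1 * r2, a1 + a2
```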
So that's sort of mathematics that allows you to do arithmetic on complex numbers. Let's actually think about the implementation. Well we do it just like rational numbers.0:08:49
We come down, we assume we have some constructors and selectors. What would we like? Well let's assume that we make a data object cloud, which is0:08:58
a complex number that has some stuff in it, and that we can get out from a complex number the real-part, or the imaginary-part, or the magnitude, or the angle.0:09:12
We want some ways of making complex numbers-- not only selectors, but constructors. So we'll assume we have a thing called make-rectangular. What make-rectangular is going to do is take a real-part and0:09:24
an imaginary-part and construct a complex number with those parts. Similarly, we can have make-polar which will take a0:09:35
magnitude and an angle, and construct a complex number which has that magnitude and angle.0:09:44
So here's a system. We'll have two constructors and four selectors. And now, just like before, in terms of that abstract data0:09:55
we'll go ahead and implement our complex number operations. And here you can see translated into Lisp code just the arithmetic formulas I put down before.0:10:08
If I want to add two complex numbers I will make a complex number out of its real- and imaginary-parts. The real part of the complex number I'm going to make is0:10:19
the sum of the real-parts. The imaginary part of the complex number I'm going to make is the sum of the imaginary-parts.0:10:30
I put those together, make a complex number. That's how I implement complex number addition. Subtraction is essentially the same.0:10:39
All I do is subtract the parts rather than add them. To multiply two complex numbers, I use the other formula.0:10:49
I'll make a complex number out of a magnitude and angle. The magnitude is going to be the product of the magnitudes0:10:58
of the two complex numbers I'm multiplying. And the angle is going to be the sum of the angles of the two complex numbers I'm multiplying.0:11:09
So there's multiplication. And then division, division is almost the same. Here I divide the magnitudes and subtract the angles.0:11:28
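The four operations just described can be sketched using only the abstract selectors and constructors the lecture assumes. This is a Python translation of the Scheme; the rectangular-pair definitions at the top are just one possible George-style backing for the sketch, not part of the abstract interface.

```python
import math

# One possible backing representation (a rectangular pair);
# the arithmetic below uses only the abstract selectors and constructors.
def make_rectangular(x, y): return (x, y)
def make_polar(r, a): return (r * math.cos(a), r * math.sin(a))
def real_part(z): return z[0]
def imag_part(z): return z[1]
def magnitude(z): return math.sqrt(z[0] ** 2 + z[1] ** 2)
def angle(z): return math.atan2(z[1], z[0])

def add_c(z1, z2):
    # add the corresponding real- and imaginary-parts
    return make_rectangular(real_part(z1) + real_part(z2),
                            imag_part(z1) + imag_part(z2))

def sub_c(z1, z2):
    # subtract the parts rather than add them
    return make_rectangular(real_part(z1) - real_part(z2),
                            imag_part(z1) - imag_part(z2))

def mul_c(z1, z2):
    # multiply the magnitudes, add the angles
    return make_polar(magnitude(z1) * magnitude(z2),
                      angle(z1) + angle(z2))

def div_c(z1, z2):
    # divide the magnitudes, subtract the angles
    return make_polar(magnitude(z1) / magnitude(z2),
                      angle(z1) - angle(z2))
```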
Now I've implemented the operations. And what do we do? We call on George. We've done the use, let's worry about the0:11:38
representation. We'll call on George and say to George, go ahead and build us a complex number representation. Well that's fine.0:11:47
George can say, we'll implement a complex number simply as a pair that has the real-part and the0:11:56
imaginary-part. So if I want to make a complex number with a certain real-part and an imaginary-part, I'll just use cons to form a pair, and that will-- that's George's0:12:06
representation of a complex number. So if I want to get out the real-part of something, I just extract the car, the first part. If I want to get the imaginary-part, I extract the0:12:16
cdr. How do I deal with the magnitude and angle? Well if I want to extract the magnitude of one of these0:12:25
things, I get the square root of the sum of the square of the car plus the square of the cdr. If I want to get the0:12:34
angle, I compute the arctangent of the cdr and the car. This is a Lisp procedure for computing arctangent.0:12:44
And if somebody hands me a magnitude and an angle and says, make me a complex number, well I compute the real-part and the imaginary-part, r cosine0:12:54
of A and r sine of A, and stick them together into a pair. OK so we're done. In fact, what I just did, conceptually, is absolutely no0:13:07
different from the rational number representation that we looked at last time. It's the same sort of idea. You implement the operators, you pick a representation.0:13:18
Nothing different. Now let's worry about Martha. See, Martha has a different idea. She doesn't want to represent a complex number as a pair of0:13:29
a real-part and an imaginary-part. What she would like to do is represent a complex number as a pair of a magnitude and an angle.0:13:39
So if instead of calling up George we ask Martha to design our representation, we get something like this. We get make-polar. Sure, if I give you a magnitude and an angle we're0:13:50
just going to form a pair that has magnitude and angle. If you want to extract the magnitude, that's easy. You just pull out the car of the pair.0:13:59
If you want to extract the angle, sure, that's easy. You just pull out the cdr. If you want to look for real-parts and imaginary-parts, well then you have to do some work.0:14:08
If you want the real-part, you have to get r cosine a. In other words, r, the car of the pair, times the cosine of0:14:19
the cdr of the pair. So this is r times the cosine of a, and that's the real-part.0:14:28
If you want to get the imaginary-part, it's r times the sine of a. And if I hand you a real-part and an imaginary-part and say,0:14:37
make me a complex number with that real-part and imaginary-part, well I figure out what the magnitude and angle should be. The magnitude's the square root of the sum of the squares0:14:48
and the angle's the arctangent. I put those together to make a pair. So there's Martha's idea. Well which is better?0:14:59
Well if you're doing a lot of additions, probably George's is better, because you're doing a lot of real-parts and imaginary-parts. If mostly you're going to be doing multiplications and divisions, then maybe Martha's idea is better.0:15:11
Or maybe, and this is the real point, you can't decide. Or maybe you just have to let them both hang around, for0:15:21
personality reasons. Maybe you just really can't ever decide what you would like. And again, what we would really like is a system that0:15:31
looks like this. That somehow there's George over here, who has built rectangular complex numbers.0:15:41
And Martha, who has polar complex numbers. And somehow we have operations that can add, and subtract,0:15:54
and multiply, and divide, and it shouldn't matter that there are two incompatible representations of complex numbers floating around this system.0:16:04
In other words, not only like an abstraction barrier here that has things in it like a real-part, and an0:16:15
imaginary-part, and magnitude, and angle. So not only is there an abstraction barrier that hides0:16:26
the actual representation from us, but also there's some kind of vertical barrier here that allows both of these representations to exist without0:16:36
interfering with each other. The idea is that the things in here-- real-part, imaginary-part, magnitude, and angle-- will be generic operators.0:16:47
If you ask for the real-part, it will worry about what representation it's looking at. OK, well how can we do that?0:16:56
There's actually a really obvious idea, if you're used to thinking about compound data.0:17:06
See, suppose you could just tell by looking at a complex number whether it was constructed by George or Martha.0:17:15
In other words, so it's not that what's floating around here are ordinary, just complex numbers, right? They're fancy, designer complex numbers.0:17:24
So you look at a complex number: it's not just a complex number, it's got a label on it that says, this one is by Martha. Or, this is a complex number by George.0:17:34
Right? They're signed. See, and then whenever we looked at a complex number we could just read the label, and then we'd know how you expect0:17:45
to operate on that. In other words, what we want is not just ordinary data objects. We want to introduce the notion of what's called typed data.0:17:59
Typed data means, again, there's some sort of cloud. And what it's got in it is an ordinary data object like0:18:08
we've been thinking about. We call that the contents, sort of the actual data.0:18:19
But also a thing called a type, but it's signed by either George or Martha. So we're going to go from regular data to type data.0:18:31
How do we build that? Well that's easy. We know how to build clouds. We build them out of pairs. So here's a little representation that supports0:18:41
typed data. There's a thing, attach-type, that takes a type and attaches it to a piece of contents, and we just use cons.0:18:51
And if we have a piece of typed data, we can look at the type, which is the car. We can look at the contents, which is the cdr. Now along0:19:00
with that, the way we use our typed data is to test, when we're given a piece of data, what type it is. So we have some type predicates.0:19:10
For example, to see whether a complex number is one of George's, whether it's rectangular, we just check to see if the type of that is the symbol rectangular, right?0:19:23
The symbol rectangular. And to check whether a complex number is one of Martha's, we check to see whether the type is the symbol polar.0:19:36
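Typed data as just described can be sketched in Python (`type` is a Python builtin, so the selector is called `type_of` here; otherwise the names follow the lecture's attach-type, contents, and the two type predicates):

```python
def attach_type(type_tag, contents):
    """Sign a piece of contents with a type -- just cons them into a pair."""
    return (type_tag, contents)

def type_of(datum):
    return datum[0]           # the type is the car

def contents(datum):
    return datum[1]           # the contents is the cdr

def is_rectangular(z):
    return type_of(z) == 'rectangular'   # one of George's

def is_polar(z):
    return type_of(z) == 'polar'         # one of Martha's
```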
So that's a way to test what kind of number we're looking at. Now let's think about how we can use that to build the system. So let's suppose that George and Martha were off working0:19:46
separately, and each of them had designed their complex number representation packages. What do they have to do to become part of the system, to0:19:58
exist compatibly? Well it's really pretty easy. Remember, George had this package. Here's George's original package, or half of it.0:20:08
And underlined in red are the changes he has to make. So before, when George made a complex number out of an x and y, he just put them together to make a pair.0:20:20
And the only difference is that now he signs them. He attaches the type, which is the symbol rectangular to that pair.0:20:30
Everything else George does is the same, except that-- see, George and Martha both have procedures named real-part and imaginary-part. So to allow them both to exist in the same Lisp environment,0:20:44
George had changed the names of his procedures. So we'll say, this is George's real-part procedure. It's the real-part rectangular procedure, the imaginary-part rectangular procedure.0:20:55
And then here's the rest of George's package. He'd had magnitude and angle, and he just renames them magnitude-rectangular and angle-rectangular.0:21:05
And Martha has to do basically the same thing. Martha previously, when she made a complex number out of a0:21:15
magnitude and angle, she just cons them. Now she attaches the type polar, and she changes the0:21:25
name so her real-part procedure won't conflict in name with George's. It's real-part-polar, imaginary-part-polar,0:21:34
magnitude-polar, and angle-polar.0:21:45
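Both renamed packages might look like this in Python (a hypothetical sketch mirroring the lecture's Scheme names; each package works on the untyped pair, and only the constructor signs it):

```python
import math

def attach_type(tag, contents):
    return (tag, contents)

# George's rectangular package: pairs (x, y), signed 'rectangular'.
def make_rectangular(x, y):
    return attach_type('rectangular', (x, y))

def real_part_rectangular(z): return z[0]
def imag_part_rectangular(z): return z[1]
def magnitude_rectangular(z): return math.hypot(z[0], z[1])
def angle_rectangular(z):     return math.atan2(z[1], z[0])

# Martha's polar package: pairs (magnitude, angle), signed 'polar'.
def make_polar(mag, ang):
    return attach_type('polar', (mag, ang))

def real_part_polar(z): return z[0] * math.cos(z[1])
def imag_part_polar(z): return z[0] * math.sin(z[1])
def magnitude_polar(z): return z[0]
def angle_polar(z):     return z[1]

print(make_rectangular(1, 2))  # ('rectangular', (1, 2))
print(make_polar(1, 2))        # ('polar', (1, 2))
```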
Now we have the system. Right there's George and Martha. And now we've got to get some kind of manager to look at these types.0:21:55
How are these things actually going to work now that George and Martha have supplied us with typed data? Well what we have are a bunch of generic selectors.0:22:05
Generic selectors for complex numbers real-part, imaginary-part, magnitude, and angle.0:22:14
Let's look at them more closely. What does a real-part do? If I ask for the real part of a complex number,0:22:24
well I look at it. I look at its type. I say, is it rectangular? If so, I apply George's real part procedure to the contents0:22:36
of that complex number. This is a number that has a type on it. I strip off the type using contents and0:22:46
apply George's procedure. Or is this a polar complex number? If I want the real part, I apply Martha's real part0:22:56
procedure to the contents of that number. So that's how real part works. And then similarly there's imaginary-part, which is almost the same.0:23:06
It looks at the number and if it's rectangular, uses George's imaginary-part procedure. If it's polar, uses Martha's. And then there's a magnitude and an angle.0:23:19
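The manager's dispatch for real-part might be sketched like this in Python (hypothetical rendering; imaginary-part, magnitude, and angle would be analogous):

```python
import math

def attach_type(tag, c): return (tag, c)
def type_of(d): return d[0]
def contents(d): return d[1]

# George's and Martha's procedures, working on untyped pairs.
def real_part_rectangular(z): return z[0]
def real_part_polar(z):       return z[0] * math.cos(z[1])

# The manager: check the type, strip it off, dispatch to the right person.
def real_part(z):
    if type_of(z) == 'rectangular':
        return real_part_rectangular(contents(z))
    elif type_of(z) == 'polar':
        return real_part_polar(contents(z))
    else:
        raise TypeError('unknown type of complex number')

print(real_part(attach_type('rectangular', (1, 2))))  # 1
```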
So there's a system. Has three parts. There's sort of George, and Martha, and the manager. And that's how you get generic operators implemented.0:23:28
Let's look at just a simple example, just to pin down exactly how this is going to work. Suppose you're going0:23:40
to be looking at the complex number whose real-part is one, and whose imaginary-part is two. So that would be one plus 2i.0:23:50
What would happen is up here, up here above where the operations have to happen, that number would be represented as a pair of 1 and 2 together with a type.0:24:10
The pair would be the contents. And the whole data object would be that thing with the symbol rectangular attached onto it. And that's the way that complex number would exist in0:24:20
the system. When you went to take the real-part, the manager would look at this and say, oh it's one of George's.0:24:30
He'll strip off the type and hand down to George the pair 1, 2. And that's the kind of data that George developed his0:24:41
system to use. So it gets stripped down. Later on, if you ask George to construct a complex number,0:24:51
George would construct some complex number as a pair, and before he passes it back up through the manager would attach the type rectangular.0:25:03
So you see what happens. There's no confusion in this system. It doesn't matter in the least that the pair 1, 2 means0:25:13
something completely different in Martha's world. In Martha's world this pair means the complex number whose magnitude is 1 and whose angle is 2. And there's no confusion, because by the time any pair0:25:23
like this gets handed back through the manager to the main system it's going to have the type polar attached. Whereas this one would have the type rectangular attached.0:25:36
OK, let's take a break. [MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:26:20
We just looked at a strategy for implementing generic operators. That strategy has a name: it's called dispatch on type.0:26:34
And the idea is that you break your system into a bunch of pieces. There's George and Martha, who are making representations,0:26:43
and then there's the manager. Looks at the types on the data and then dispatches them to the right person. Well what criticisms can we make of that as a system0:26:55
organization? Well first of all there was this little, annoying problem that George and Martha had to change the names of their procedures.0:27:04
George originally had a real-part procedure, and he had to go name it real-part-rectangular so it wouldn't interfere with Martha's real-part procedure, which is now named real-part-polar, so it wouldn't interfere with the0:27:14
manager's real-part procedure, which is now named real-part. That's kind of an annoying problem. But I'm not going to talk about that one now. We'll see later on when we think about the structure of0:27:24
Lisp names and environments that there really are ways to package all those so-called name spaces separately so they don't interfere with each other. Not going to think about that problem now.0:27:35
The problem that I actually want to focus on is what happens when you bring somebody new into the system.0:27:44
What has to happen? Well George and Martha don't care. George is sitting there in his rectangular world, has his procedures and his types.0:27:54
Martha sits in her polar world. She doesn't care. But let's look at the manager. What's the manager have to do?0:28:03
The manager comes through and has these operations. There was a test for rectangular and a test for polar. If Harry comes in with some new kind of complex number,0:28:17
and Harry has a new type, Harry type complex number, the manager has to go in and change all those procedures. So the inflexibility in the system, the place where work0:28:28
has to happen to accommodate change, is in the manager. That's pretty annoying. It's even more annoying when you realize the manager's not0:28:40
doing anything. The manager is just being a paper pusher. Let's look again at these programs. What are they doing?0:28:51
What does real-part do? Real-part says, oh, is it the kind of complex number that George can handle? If so, send it off to George. Is it the kind of complex number that Martha can handle?0:29:01
If so, send it off to Martha. So it's really annoying that the bottleneck in this system, the thing that's preventing flexibility and change, is0:29:13
completely in the bureaucracy. It's not in anybody who's doing any of the work. Not an uncommon situation, unfortunately.0:29:23
See, what's really going on-- abstractly in the system, there's a table. So what's really happening is somewhere there's a table.0:29:32
There're types. There's polar and rectangular.0:29:41
And Harry's may be over here. And there are operators. There's an operator like real-part.0:29:55
Or imaginary-part. Or a magnitude and angle.0:30:05
And sitting in this table are the right procedures.0:30:19
So sitting here for the type polar and real-part is Martha's procedure real-part-polar.0:30:30
And over here in the table is George's procedure real-part-rectangular. And over here would be, say, Martha's procedure0:30:40
magnitude-polar, and George's procedure magnitude-rectangular, right, and so on.0:30:49
The rest of this table's filled in. And that's really what's going on. So in some sense, all the manager is doing is acting as0:31:03
this table. Well how do we fix our system?0:31:12
How do you fix bureaucracies a lot of the time? What you do is you get rid of the manager. We just take the manager and replace him by a computer. We're going to automate him out of existence.0:31:23
Namely, instead of having the manager who basically consults this table, we'll have our system use the table directly. What do I mean by that?0:31:32
Let's assume, again using data abstraction, that we have some kind of data structure that's a table. And we have ways of sticking things in and ways of getting0:31:43
things out. And to be explicit, let me assume that there's an operation called "put." And put is going to take, in this0:31:52
case, two things I'll call "keys," key1 and key2, and a value.0:32:06
And that stores the value in the table under key1 and key2. And then we'll assume there's a thing called "get," such0:32:15
that if later on I say, get me what's in the table stored under key1 and key2, it'll retrieve whatever value was0:32:25
stored there. And let's not worry about how tables are implemented. That's yet another data abstraction, George's problem. And maybe we'll see later--0:32:34
talk about how you might actually build tables in Lisp. Well given this organization, what did George and Martha0:32:44
have to do? Well when they build their system, they each have the responsibility to set up their appropriate column in the table.0:32:55
So what George does, for example, when he defines his procedures, all he has to do is go off and put into the0:33:04
table under the type rectangular. And the name of the operation is real-part, his procedure0:33:14
real-part-rectangular. So notice what's going into this table. The two keys here are symbols, rectangular and real-part. That's what the quote means.0:33:24
And what's going into the table is the actual procedure that he wrote, real-part-rectangular. And then he puts imaginary-part into the table, filed0:33:35
under the keys rectangular and imaginary-part, and magnitude under the keys rectangular and magnitude, angle0:33:44
under rectangular and angle. So that's what George has to do to be part of this system.0:33:54
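A table with put and get, plus George's side of filling it in, might be sketched like this (a Python dict keyed on the pair of keys stands in for the lecture's unspecified table abstraction):

```python
# The table: two keys index a stored value.
_table = {}

def put(key1, key2, value):
    _table[(key1, key2)] = value

def get(key1, key2):
    return _table.get((key1, key2))   # None plays the role of "nothing stored"

# George's procedures on untyped (x, y) pairs.
def real_part_rectangular(z): return z[0]
def imag_part_rectangular(z): return z[1]

# George sets up his column: filed under the type and the operation name.
put('rectangular', 'real-part', real_part_rectangular)
put('rectangular', 'imaginary-part', imag_part_rectangular)

print(get('rectangular', 'real-part')((1, 2)))  # 1
```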
Martha similarly sets up the column in the table under polar. Under polar and real-part is the procedure real-part-polar.0:34:04
And imaginary-part, and magnitude, and angle. So this is what Martha has to do to be part of the system. Everyone who makes a representation has the0:34:13
responsibility for setting up a column in the table. And what does Harry do when Harry comes in with his brilliant idea for implementing complex numbers? Well he makes whatever procedure he wants and builds0:34:25
a new column in this table. OK, well what happened to the manager? The manager has been automated out of existence and is0:34:34
replaced by a procedure called operate. And this is the key procedure in the whole system. Let's say define operate.0:34:51
Operate is going to take an operation that you want to do, the name of an operation, and an object that you would like0:35:01
to apply that operation to. So for example, the real-part of some particular complex number, what does it do? Well the first thing it does, it looks in the table.0:35:12
Goes into the table and tries to find a procedure that's stored in the table.0:35:23
So it gets from the table, using as keys the type of the object and the operator. It looks in the table and sees0:35:40
what's stored under the type of the object and the operator, sees if anything's stored. Let's assume that get is implemented so that if nothing is stored there, it'll return the empty list.0:35:52
So it says, if there's actually something stored there, if the procedure here is not nil, then it'll take the0:36:04
procedure that it found in the table and apply it to the contents of the object.0:36:18
And otherwise if there was nothing stored there, it'll-- well we can decide. In this case let's have it put out an error message saying, undefined operator.0:36:28
No operator for this type. Or some appropriate error message.0:36:39
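Operate itself might be sketched like this in Python (a hypothetical rendering of the lecture's Scheme; None plays the role of the empty list that get returns when nothing is stored):

```python
_table = {}
def put(k1, k2, v): _table[(k1, k2)] = v
def get(k1, k2):    return _table.get((k1, k2))

def attach_type(tag, c): return (tag, c)
def type_of(d): return d[0]
def contents(d): return d[1]

put('rectangular', 'real-part', lambda z: z[0])   # George's table entry

def operate(op, obj):
    proc = get(type_of(obj), op)          # look in the table
    if proc is not None:
        return proc(contents(obj))        # strip the type and apply
    raise LookupError('undefined operator %s for type %s'
                      % (op, type_of(obj)))

# The generic selector is then defined in terms of operate.
def real_part(obj):
    return operate('real-part', obj)

print(real_part(attach_type('rectangular', (1, 2))))  # 1
```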
OK? And that replaces the manager. How do we really use it? Well what we say is we'll go off and define our generic0:36:48
selectors using operate. We'll say that the real-part of an object is found by0:36:57
operating on the object with the name of the operation being real-part.0:37:08
And then similarly, imaginary-part is operate using the name imaginary-part and magnitude and angle. That's our implementation.0:37:17
That plus the table plus the operate procedure. And the table effectively replaces what the manager used to do. Let's just go through that slowly to show you0:37:27
what's going on. Suppose I have one of Martha's complex numbers. It's got magnitude 1 and angle 2.0:37:39
And it's one of Martha's. So it's labeled here, polar. Let's call that z.0:37:48
Suppose that's z. And suppose with this implementation someone comes up and asks for the real-part of z.0:38:04
Well real-part now is defined in terms of operate. So that's equivalent to saying operate with the name of the0:38:18
operator being real-part, the symbol real-part on z.0:38:27
And now operate comes. It's going to look in the table, and it's going to try and find something stored under--0:38:38
the operation it's going to apply is found by looking in the table under the type of the object. And the type of z is polar.0:38:48
So it's going to look and say, can I get, using polar and the operation name, which was real-part?0:39:05
It's going to look in there and apply that to the contents of z.0:39:14
And that? If everything was set up correctly, this thing is the procedure that Martha put there. This is real-part-polar.0:39:30
And this is z without its type. The thing that Martha originally designed those procedures to work on, which is 1, 2.0:39:43
And so operate sort of does uniformly what the manager used to do sort of all over the system. It finds the right thing, looks in the table, strips off0:39:52
the type, and passes it down to the person who handles it. This is another, and, as you can see, for most0:40:04
purposes more flexible, way of implementing generic operators. And it's called data-directed programming.0:40:20
And the idea of that is in some sense the data objects themselves, those little complex numbers that are floating around the system, are carrying with them the0:40:30
information about how you should operate on them. Let's break for questions.0:40:41
Yes. AUDIENCE: What do you have stored in that data object? You have the data itself, you have its type, and you have the operations for that type? Or where are the operations that you found?0:40:53
PROFESSOR: OK, let me-- yeah, that's a good question. Because it raises other possibilities of how you might do it. And of course there are a lot of possibilities.0:41:04
In this particular implementation, what's sitting in this data object, for example, is the data itself-- which in this case is a pair of 1 and 2--0:41:14
and also a symbol. This is the symbol, the word P-O-L-A-R, and that's what's sitting in this data object.0:41:24
Where are the operations themselves? The operations are sitting in the table. So in this table, the rows and columns of the table are0:41:35
labeled by symbols. So when I store something in this table, the key might be the symbol polar and the symbol magnitude.0:41:48
And I think by writing it this way I've been very confusing. Because what's really sitting here isn't the name-- when I wrote magnitude-polar, what I mean is the procedure0:41:58
magnitude-polar. And probably what I really should have written-- except it's too small for me to write in this little space-- is something like lambda of z, the thing that0:42:11
Martha wrote to implement it. And then you can see from that, there's another way that I alluded to of solving this name conflict problem, which0:42:20
is that George and Martha never have to name their procedures at all. They can just stick the anonymous things generated by lambda directly into the table. There's also another thing that your question raises, is0:42:32
the possibility that maybe what I would like somehow is to store in this data object not the symbol P-O-L-A-R but maybe actually all the operations themselves.0:42:43
And that's another way to organize the system, called message passing. So there are a lot of ways you can do it.0:42:54
AUDIENCE: Therefore if Martha and George had used the same procedure names, it would be OK because it wouldn't look [UNINTELLIGIBLE]. PROFESSOR: That's right.0:43:03
That's right. See, they wouldn't even have to name their procedures at all. What George could have written instead of saying put in the0:43:12
table under rectangular and real-part, the procedure real-part-rectangular, George could have written put under rectangular and real-part, lambda of z, such and such,0:43:23
and such and such. And the system would work completely the same. AUDIENCE: My question is, Martha could have put key1 key2 real-part, and George could have put key1 key20:43:37
real-part, and as long as they defined them differently they wouldn't have had any conflicts, right? PROFESSOR: Yes, that would all be OK except for the fact that if you imagine George and Martha typing at the same0:43:47
console with the same meanings for all their names, it would get confused by real-part. But there are ways to arrange that, too. And in principle you're absolutely right. If their names didn't conflict--0:43:56
it's the objects that go in the table, not the names.0:44:08
OK, let's take a break. [MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:45:12
All right, well we just looked at data-directed programming as a way of implementing a system that does arithmetic on0:45:21
complex numbers. So I had these operations in it called plus C and minus C,0:45:32
and multiply, and divide, and maybe some others. And that sat on top of-- and this is the key point-- sat on0:45:46
top of two different representations. A rectangular package here, and a polar package.0:45:58
And maybe some more. And we saw that the whole idea is that maybe some more are now very easy to add. But that doesn't really show the power of this methodology.0:46:08
Shows you what's going on. The power of the methodology only becomes apparent when you start embedding this in some more complex system.0:46:17
What I'm going to do now is embed this in some more complex system. Let's assume that what we really have is a general kind of arithmetic system. So called generic arithmetic system.0:46:27
And at the top level here, somebody can say add two things, or subtract two things, or multiply two0:46:38
things, or divide two things. And underneath that there's an abstraction barrier.0:46:47
And underneath this barrier, is, say, a complex arithmetic package. And you can say, add two complex numbers. Or you might also have-- remember we did a rational0:46:57
number package-- you might have that sitting there. And there might be a rational thing. And the rational number package, well, has the things0:47:07
we implemented. Plus rat, and times rat, and so on. Or you might have ordinary Lisp numbers.0:47:17
You might say add three and four. So we might have ordinary numbers, in which case we have0:47:29
the Lisp supplied plus, and minus, and times, and slash. OK, so we might imagine this complex number system sitting0:47:39
in a more complicated generic operator structure at the next level up. Well how can we make that?0:47:49
We already have the idea, we're just going to do it again. We've implemented a rational number package. Let's look at how it has to be changed.0:48:01
In fact, at this level it doesn't have to be changed at all. This is exactly the code that we wrote last time. To add two rational numbers, remember0:48:10
there was this formula. You make a rational number whose numerator-- the numerator of the first times the denominator of the second, plus the denominator of the first times the0:48:20
numerator of the second. And whose denominator is the product of the denominators. And minus rat, and star rat, and slash rat.0:48:30
And this is exactly the rational number package that we made before. We're ignoring the GCD problem, but let's not worry about that.0:48:40
As implementers of this rational number package, how do we install it in the generic arithmetic system? Well that's easy. There's only one thing we have to do differently.0:48:51
Whereas previously we said that to make a rational number you built a pair of the numerator and denominator,0:49:00
here we'll not only build the pair, but we'll sign it. We'll attach the type rational. That's the only thing we have to do differently: make it a typed data object.0:49:12
And now we'll stick our operations in the table. We'll put under the symbol rational and the operation add our procedure, plus rat.0:49:21
And, again, note this is a symbol. Right? Quote, unquote, but the actual thing we're putting in the table is the procedure.0:49:30
And for how to subtract, well you subtract rationals with minus rat. And multiply, and divide.0:49:41
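Installing the rational package might be sketched like this in Python (hypothetical names mirroring the lecture; GCD reduction is ignored, as in the lecture):

```python
_table = {}
def put(key1, key2, value): _table[(key1, key2)] = value
def get(key1, key2):        return _table.get((key1, key2))

def attach_type(tag, contents): return (tag, contents)

# The one change: make-rat signs the pair with the type rational.
def make_rat(n, d):
    return attach_type('rational', (n, d))

def numer(r): return r[0]   # selectors see the untyped contents
def denom(r): return r[1]

# Exactly the formula from before (GCD reduction ignored).
def plus_rat(x, y):
    return make_rat(numer(x) * denom(y) + denom(x) * numer(y),
                    denom(x) * denom(y))

# Install the 'rational' column of the table.
put('rational', 'add', plus_rat)

print(plus_rat((1, 2), (1, 3)))  # 1/2 + 1/3 = ('rational', (5, 6))
```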
And that is exactly and precisely what we have to do to fit inside this generic arithmetic system. Well how does the whole thing work?0:49:51
See, what we want to do is have some generic operators.0:50:00
Have add and sub and [UNINTELLIGIBLE] be generic operators. So we're going to define add and say, to add x and y, that0:50:18
will be operate-- we're going to call it operate-2.0:50:27
This is our operator procedure, but set up for two arguments using add on x and y.0:50:37
And so this is the analog to operate. Let's look at the code for that. It's almost like operate.0:50:46
To operate with some operator on an argument 1 and an argument 2, well the first thing we're going to do is0:50:56
check and see if the two arguments have the same type. So we'll say, is the type of the first argument the same as0:51:06
the type of the second argument? And if they're not, we'll go off and complain, and say,0:51:15
that's an error. We don't know how to do that. If they do have the same type, we'll do exactly what we did before. We'll go look in the table, filed under the type of the argument--0:51:26
arg 1 and arg 2 have the same type, so it doesn't matter which. So we'll look in the table, find the procedure. If there is a procedure there, then we'll apply it to the0:51:38
contents of the argument 1 and the contents of arg 2. And otherwise we'll say, error. Undefined operator. And so there's operate-2.0:51:51
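Operate-2 might look like this in Python (a hypothetical rendering; a rational entry is installed just so the generic add has something to dispatch to):

```python
_table = {}
def put(k1, k2, v): _table[(k1, k2)] = v
def get(k1, k2):    return _table.get((k1, k2))

def attach_type(tag, c): return (tag, c)
def type_of(d): return d[0]
def contents(d): return d[1]

def operate_2(op, arg1, arg2):
    if type_of(arg1) != type_of(arg2):
        raise TypeError('arguments are not of the same type')
    proc = get(type_of(arg1), op)     # filed under the (shared) type
    if proc is None:
        raise LookupError('undefined operator')
    return proc(contents(arg1), contents(arg2))

def add(x, y):
    return operate_2('add', x, y)

# Minimal rational entry so add has something to dispatch to.
def plus_rat(x, y):
    return attach_type('rational',
                       (x[0] * y[1] + x[1] * y[0], x[1] * y[1]))
put('rational', 'add', plus_rat)

half  = attach_type('rational', (1, 2))
third = attach_type('rational', (1, 3))
print(add(half, third))  # ('rational', (5, 6))
```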
And that's all we have to do. We just built the complex number package before. How do we embed that complex number package in0:52:00
this generic system? Almost the same. We make a procedure called make-complex that takes0:52:11
whatever George and Martha hand to us and add the type-complex. And then we say, to add complex numbers, plus complex,0:52:25
we use our internal procedure, plus c, and attach a type, make that a complex number.0:52:37
So our original package had names plus c and minus c that we're using to communicate with George and Martha. And then to communicate with the outside world, we have a0:52:47
thing called plus-complex and minus-complex. And so on.0:52:56
And the only difference is that these return values that are typed. So they can be looked at up here. And these are internal operations.0:53:09
Let's go look at that slide again. There's one more thing we do. After defining plus-complex, we put under the type complex0:53:19
and the symbol add, that procedure plus complex. And then similarly for subtracting complex numbers, and multiplying them, and dividing them.0:53:31
OK, how do we install ordinary numbers? Exactly the same way. Come off and say, well we'll make a thing called0:53:40
make-number. Make-number takes a number and attaches a type, which is the symbol number.0:53:50
We build a procedure called plus-number, which is simply, add the two things using the ordinary addition, because in0:53:59
this case we're talking about ordinary numbers, and attach a type to it and make that a number. And then we put into the table under the symbol number and0:54:08
the operation add, this procedure plus-number, and then the same thing for subtracting, and multiplying, and dividing.0:54:22
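Ordinary numbers install the same way, sketched here in Python under the same hypothetical table and operate-2 as before:

```python
_table = {}
def put(k1, k2, v): _table[(k1, k2)] = v
def get(k1, k2):    return _table.get((k1, k2))
def attach_type(tag, c): return (tag, c)
def type_of(d): return d[0]
def contents(d): return d[1]

def make_number(n):
    return attach_type('number', n)    # attach the type 'number'

def plus_number(x, y):
    return make_number(x + y)          # ordinary addition, then re-sign

put('number', 'add', plus_number)

def operate_2(op, a1, a2):
    if type_of(a1) != type_of(a2):
        raise TypeError('arguments are not of the same type')
    proc = get(type_of(a1), op)
    if proc is None:
        raise LookupError('undefined operator')
    return proc(contents(a1), contents(a2))

def add(x, y):
    return operate_2('add', x, y)

print(add(make_number(3), make_number(4)))  # ('number', 7)
```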
Let's look at an example, just to make it clear. Suppose, for instance, I'm going0:54:32
to perform an operation. So I sit up here and I'm going to perform the operation which looks like multiplying two complex numbers. So I would multiply, say, 3 plus 4i and 2 plus 6i.0:54:49
And that's something that I might want to hand to mul. I'll write mul as my generic operator here. How's that going to work?0:54:58
Well 3 plus 4i, say, sits in the system at this level as something that looks like this. Let's say it was one of George's.0:55:08
So it would have a 3 and a 4.0:55:18
And attached to that would be George's type, which would say rectangular, it came from George.0:55:29
And attached to that-- and this itself would be the data viewed from the next level up-- so that itself would be a typed data object which would0:55:41
say complex. So that's what this object would look like up here at the very highest level, where the really super-generic0:55:52
operations are looking at it. Now what happens is, mul eventually is going to come along and say, oh, what's its type? Its type is complex.0:56:04
Go through to operate-2 and say, oh, what I want to do is apply what's in the table, which is going to be the procedure star complex, on this thing with the type0:56:17
stripped off. So it's going to strip off the type, take that much, and send that down into the complex world.0:56:26
The complex world looks at its operations and says, oh, I have to apply star c. Star c might say, oh, at some point I want to look at the magnitude of this object that it's got.0:56:39
And they'll say, oh, it's rectangular, it's one of George's. So it'll then strip off the next version of type, and hand that down to George to take the magnitude of.0:56:52
So you see what's going on is that there are these chains of types. And the length of the chain is sort of the number of levels0:57:01
that you're going to be going up in this table. And what a type tells you, every time you have a vertical barrier in this table, where there's some ambiguity about0:57:12
where you should go down to the next level, the type is telling you where to go. And then everybody at the bottom, as they construct data and filter it up, they stick their type back on.0:57:25
So that's the general structure of the system. OK.0:57:34
Now that we've got this, let's go and make this thing even more complex. Let's talk about adding to the system not only these kinds of0:57:46
numbers, but it's also meaningful to start talking about adding polynomials. Might do arithmetic on polynomials. Like we could have x to the fifteenth plus 2x to the0:57:57
seventh plus 5. That might be some polynomial.0:58:06
And if we have two such gadgets we can add them or multiply them. Let's not worry about dividing them. Just add them, multiply them, then we'll subtract them.0:58:15
What do we have to do? Well let's think about how we might represent a polynomial. It's going to be some typed data object.0:58:24
So let's say a polynomial to this system might look like a thing that starts with the type polynomial. And then maybe it says the next thing is what0:58:33
variable it's in. So it might say, I'm a polynomial in the variable x. And then it'll have some information about what the terms are.0:58:42
And there're just tons of ways to do this, but one way is to say we're going to have a thing called a term-list. And0:58:51
a term-list-- well, in our case we'll use something that looks like this. We'll make it a bunch of pairs which have an order and a coefficient. So this polynomial would be represented by this term-list.0:59:09
And what that means is that this polynomial starts off with a term of order 15 and coefficient 1.0:59:23
And the next thing in it is a term of order 7 and coefficient 2, and a term of order 0, which is the constant, with coefficient 5. And there are lots and lots of ways, and lots and lots of0:59:35
trade-offs when you really think about making algebraic manipulation packages about exactly how you should represent these things. But this is a fairly standard one.0:59:44
It's useful in a lot of contexts. OK, well how do we implement our polynomial arithmetic?0:59:54
Let's start out. First we'll have a way to make polynomials.1:00:05
We're going to make a polynomial out of a variable like x and a term-list. And all that does is package them together some way.1:00:14
We'll put the variable together with the term-list using cons, and then attach to that the type polynomial.1:00:26
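Make-polynomial might be sketched like this in Python (hypothetical rendering; a tuple stands in for the cons of variable and term-list):

```python
def attach_type(tag, contents):
    return (tag, contents)

def make_polynomial(var, terms):
    # cons the variable onto the term-list, then sign it 'polynomial'
    return attach_type('polynomial', (var, terms))

def variable(p):  return p[0]   # selectors on the untyped contents
def term_list(p): return p[1]

# x^15 + 2x^7 + 5 as a term-list of (order, coefficient) pairs
p = make_polynomial('x', [(15, 1), (7, 2), (0, 5)])
print(p)  # ('polynomial', ('x', [(15, 1), (7, 2), (0, 5)]))
```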
OK, how do we add two polynomials? To add two polynomials, p1 and p2, just for simplicity let's say we will only add1:00:36
things in the same variable. So if they have the same variable, and same-variable here is going to be some selector we write, whose details we don't care about.1:00:45
If the two polynomials have the same variable, then we'll do something. If they don't have the same variable, we'll give an error, polynomials not in the same variable.1:00:55
And if they do have the same variable, what we'll do is we'll make a polynomial whose variable is whatever that variable is, and whose term-list is something we'll1:01:05
call plus-terms. Plus-terms will add the two term-lists. So we'll add the two term-lists. That'll give us a term-list, and we'll say the answer is a1:01:16
polynomial in the variable with that term-list. That's plus-poly. And then we're going to put in our table, under the type1:01:26
polynomial and the operation add, the procedure plus-poly. And of course we really haven't done much. What we've really done is pushed all the work onto this thing, plus-terms, which is supposed to add term-lists.1:01:38
Let's look at that. Here's an overview of how we might add two term-lists.1:01:48
So L1 and L2 are going to be two term-lists. And a term-list is a bunch of pairs of order and coefficient. And it's a big case analysis.1:01:59
And the first thing we'll check is whether there are any terms left. We're going to recursively work down these term-lists, so eventually we'll get to a place where1:02:09
either L1 or L2 might be empty. And if either one is empty, our answer will be the other one. So if L1 is empty we'll return L2, and if L2 is empty1:02:20
we'll return L1. Otherwise there are sort of three interesting cases. What we're going to do is grab the first term in each of1:02:30
those lists, called t1 and t2. And we're going to look at three cases, depending on1:02:43
whether the order of t1 is greater than the order of t2, or less than the order of t2, or the same.1:02:53
Those are the three cases we're going to look at. Let's look at this case. If the order of t1 is greater than the order of t2, then1:03:03
what that means is that our answer is going to start with this term of the order of t1. Because it won't combine with any lower order terms. So what1:03:14
we do is add the lower order terms. We recursively add together all the terms in the rest of the term-list in L1 and L2.1:03:26
That's going to be the lower order terms of the answer. And then we're going to adjoin to that the highest order term. And I'm using here a whole bunch of procedures I haven't1:03:35
defined, like adjoin-term, and rest-terms, and selectors that get the order. But you can imagine what those are.1:03:44
So if the first term-list has a higher order than the second, we recursively add all the lower terms and then stick on that last term.1:03:55
The other case, the same way. If the first term has a smaller order, well then we1:04:05
add the first term-list and the rest of the terms in the second one, and adjoin on this highest order term.1:04:14
So so far nothing much has happened, we've just sort of pushed this thing off into adding lower order terms. The last case, where you actually get to coefficients that you have to add, will be the case where1:04:24
the orders are equal. What we do is, well, again recursively add the lower order terms. But now we have to really combine something.1:04:33
What we do is we make a term whose order is the order of the term we're looking at. By now t1 and t2 have the same order.1:04:44
That's its order. And its coefficient is gotten by adding the coefficient of t1 and the coefficient of t2.1:04:56
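The whole case analysis might be sketched like this in Python, with term-lists as lists of (order, coefficient) pairs, highest order first; here plain + stands in for the generic add, which is the one interesting hook:

```python
def first_term(L): return L[0]
def rest_terms(L): return L[1:]
def adjoin_term(t, L): return [t] + L
def order(t): return t[0]
def coeff(t): return t[1]

def add(a, b):
    # Stand-in for the generic add; plain + limits us to ordinary
    # numbers here, but the real generic add takes any coefficients.
    return a + b

def plus_terms(L1, L2):
    if not L1: return L2                  # either list empty:
    if not L2: return L1                  # the answer is the other one
    t1, t2 = first_term(L1), first_term(L2)
    if order(t1) > order(t2):             # t1 can't combine: keep it
        return adjoin_term(t1, plus_terms(rest_terms(L1), L2))
    elif order(t1) < order(t2):           # symmetric case
        return adjoin_term(t2, plus_terms(L1, rest_terms(L2)))
    else:                                 # equal orders: add coefficients
        return adjoin_term((order(t1), add(coeff(t1), coeff(t2))),
                           plus_terms(rest_terms(L1), rest_terms(L2)))

# (x^15 + 2x^7 + 5) + (3x^7 + 4x)
print(plus_terms([(15, 1), (7, 2), (0, 5)], [(7, 3), (1, 4)]))
# [(15, 1), (7, 5), (1, 4), (0, 5)]
```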
This is a big recursive working down of terms, but really there's only one interesting symbol in this procedure, only one interesting idea.1:05:05
The interesting idea is this add. And the reason that's interesting is because1:05:15
something completely wonderful just happened. We reduced adding polynomials, not to sort of plus, but to1:05:25
the generic add. In other words, by implementing it that way, not only do we have our system where we can have rational1:05:37
numbers, or complex numbers, or ordinary numbers, we've just added on polynomials.1:05:48
But the coefficients of the polynomials can be anything that the system can add. So these could be polynomials whose coefficients are1:05:57
rational numbers or complex numbers, which in turn could be either rectangular, or polar, or ordinary numbers.1:06:19
So what I mean precisely is our system right now automatically can handle things like adding together1:06:30
polynomials like this one: 2/3 x squared plus 5/17 x plus 11/4.1:06:40
Or automatically handle polynomials that look like 3 plus 2i times x to the fifth plus 4 plus 7i, or something.1:06:54
You can automatically handle those things. Why is that? That's merely because, or profoundly because we reduced1:07:03
adding polynomials to adding their coefficients. And adding coefficients was done by the generic add operator, which said, I don't care what your types are as1:07:12
long as I know how to add you. So automatically for free we get the ability to handle that. What's even better than that, because remember one of the1:07:24
things we did is we put into the table that the way you add polynomials is using plus-poly.1:07:34
That means that polynomials themselves are things that can be added. So for instance let me write one here.1:07:45
Here's a polynomial. So this gadget here I'm writing up, this is a1:07:55
polynomial in y whose coefficients are polynomials in x.1:08:08
So you see, simply by saying, polynomials are themselves things that can be added, we can go off and say, well not only can we deal with rationals, or complex, or1:08:19
ordinary numbers, but we can deal with polynomials whose coefficients are rationals, or complex, or ordinary numbers, or polynomials whose coefficients are rationals, or1:08:31
complex, rectangular, polar, or ordinary numbers, or polynomials whose coefficients are rationals, complex, or1:08:42
ordinary numbers. And so on, and so on, and so on. So this is sort of an infinite or maybe a recursive tower of types that we've built up.1:08:53
And it's all exactly from that one little symbol. A-D-D. Writing "add" instead of "plus" in the polynomial thing.1:09:02
Slightly different way to think about it is that polynomials are a constructor for types. Namely you give it a type, like integer, and it returns1:09:12
for you polynomials in x whose coefficients are integers. And the important thing about that is that the operations on polynomials reduce to the operations on the1:09:22
coefficients. And there are a lot of things like that. So for example, let's go back to rational numbers. We thought about rational numbers as an integer over an1:09:32
integer, but there's the general notion of a rational object. Like we might think about 3x plus 7 over x squared plus 1.1:09:43
That's a general rational object whose numerator and denominator are polynomials. And to add two of them we use the same formula, numerator1:09:52
times denominator plus denominator times numerator over product of denominators. How could we install that in our system? Well here's our original rational1:10:01
number arithmetic package. And all we have to do in order to make the entire system continue working with general rational objects, is replace1:10:12
these particular pluses and stars by the generic operator. So if we simply change that procedure to this one, here we've changed plus and star to add and mul, those are1:10:23
absolutely the only change, then suddenly our entire system can start talking about objects that look like this.1:10:34
So for example, here is a rational object whose numerator is a polynomial in x whose coefficients are1:10:44
rational numbers. Or here is a rational object whose numerator is a polynomial1:10:53
in x whose coefficients are rational objects constructed out of complex numbers.1:11:03
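The lecture's package is in Scheme; here is a hedged Python sketch of the one-line idea: inside the rational package, replace plain plus and star with the generic add and mul, and rationals whose parts are themselves tagged objects come along for free. The tagging scheme and names are my own, not the book's.

```python
# A sketch of the generic rational package. Replacing the internal
# plus/star with generic add/mul is the only change; then numerators
# and denominators need not be ordinary numbers.

def make_rat(n, d):
    return ('rational', n, d)

def add(x, y):
    if isinstance(x, tuple) and x[0] == 'rational':
        return add_rat(x, y)
    return x + y              # ordinary numbers

def mul(x, y):
    if isinstance(x, tuple) and x[0] == 'rational':
        return mul_rat(x, y)
    return x * y

def add_rat(x, y):
    # n1*d2 + n2*d1 over d1*d2, but via the generic add and mul
    # (no GCD reduction, just as in the lecture)
    _, nx, dx = x
    _, ny, dy = y
    return make_rat(add(mul(nx, dy), mul(ny, dx)), mul(dx, dy))

def mul_rat(x, y):
    _, nx, dx = x
    _, ny, dy = y
    return make_rat(mul(nx, ny), mul(dx, dy))
```

With this, a rational object whose numerator and denominator are themselves rational objects adds with exactly the same code, unchanged.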
And then there are a lot of other things like that. See, whenever you have a thing where the operations reduce to operations on the pieces, another example would be two by two matrices.1:11:12
I have the idea, there might be a matrix here of general things that I don't care about. But if I add two of them, the answer over here is gotten by1:11:25
adding this one and that one, however they like to add. So I can implement that the same way. And if I do that, then again suddenly my system can start handling things like this.1:11:35
So here's a matrix whose elements happen to be-- we'll say this element here is a rational object whose numerator and denominators are polynomials.1:11:47
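As a small sketch of the matrix case (the representation is my assumption, not the lecture's): a two by two matrix as a nested list, added element-wise through a generic add, so the elements may themselves be matrices or anything else the system can add.

```python
# A sketch: 2x2 matrices as nested lists, added element-wise with a
# generic add, letting each element add "however it likes to add".

def add(x, y):
    if isinstance(x, list):   # a matrix: dispatch to matrix addition
        return add_matrix(x, y)
    return x + y              # ordinary numbers

def add_matrix(m1, m2):
    # add corresponding elements, however they like to be added
    return [[add(a, b) for a, b in zip(row1, row2)]
            for row1, row2 in zip(m1, m2)]
```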
And all that comes for free. What's really going on here? What's really going on is getting rid of this manager1:11:58
who's sitting there poking his nose into everybody's business. We built a system that has decentralized control.1:12:14
So when you come in, no one's poking around saying, gee, are you in the official list of people who can be added? Rather you say, well, go off and add yourself however your1:12:24
parts like to be added. And the result of that is you can get this very, very, very complex hierarchy where a lot of things just get done and1:12:33
routed to the right place automatically. Let's stop for questions. AUDIENCE: You say you get this for free.1:12:43
One thing that strikes me is that now you've lost kind of the cleanness of the break between what's on top and what's underneath. In other words, now you're defining some of the1:12:52
lower-level procedures in terms of things above their own line. Isn't that dangerous? Or, if nothing else, a little less structured?1:13:05
PROFESSOR: No, I-- the question is whether that's less structured. Depends on what you mean by structure. All this is doing is recursion. See, it's saying that the way you add these1:13:15
guys is to use that. And that's not less structured, it's just a recursive structure. So I don't think it's particularly any less clean.1:13:24
AUDIENCE: Now when you want to change the multiplier or the add operator, suddenly you've got tremendous consequences underneath that you're not even sure the extent of.1:13:34
PROFESSOR: That's right, but it depends what you mean. See, this goes both ways. What would be a good example?1:13:44
I ignored greatest common divisor, for instance. I ignored that problem just to keep the example simple. But if I suddenly decided that plus rat here should do a GCD1:13:59
computation and install that, then that immediately becomes available to all of these, to that guy, and that guy, and1:14:08
that guy, and all the way down. So it depends what you mean by the coherence of your system. It's certainly true that you might want to have a special1:14:17
different one that didn't filter down through the coefficients, but the nice thing about this particular example is that mostly you do. AUDIENCE: Isn't that the problem, I think, that you're1:14:27
getting too tied in with the fact that the structuring, the recursiveness of that structuring there is actually1:14:36
in execution as opposed to just definition of the actual types themselves? PROFESSOR: I think I understand the question.1:14:46
The point is that these types evolve and get more and more complex as the thing's actually running. Is that what-- AUDIENCE: Yes. As it's running. PROFESSOR: --what you're saying? Yes, the point is-- AUDIENCE: As opposed to the basic definitions. PROFESSOR: Right. The type structure is sort of recursive.1:14:57
It's not that you can make this finite list of the actual things they might look like before the system runs. It's something that evolves.1:15:06
So if you want to specify that system, you have to do in some other way than by this finite list. You have to do it by a recursive structure. AUDIENCE: Because the basic structure of the types is1:15:16
pretty clean and simple. PROFESSOR: Right. Yes? AUDIENCE: I have a question. I understand once you have your data structure set up,1:15:25
how it pulls off complex and passes that down, and then pulls off rect, passes that down. But if you're just a user and you don't know anything about rect or polar or whatever, how do you initially set up that1:15:35
data structure so that everything goes to the right spot? If I just have the equation over there on the left and I just want to add, multiply complex numbers-- PROFESSOR: Well that's the wonderful thing. If you're just a user you say "mul."1:15:47
AUDIENCE: And it figures out that I mean complex numbers? Or how do I tell it that I want-- PROFESSOR: Well you're going to have in your hands complex numbers. See what you would have at some level, as a real user, is1:15:56
a constructor for complex numbers. AUDIENCE: So then I have to make complex numbers? PROFESSOR: So you have to make them. What you would probably have as a user is some little thing in the reader loop, which would give you some plausible1:16:07
way to type in a complex number, in whatever format you like. Or it might be that you're never typing them in. Someone's just handing you a complex number.1:16:16
AUDIENCE: OK, so if I had a complex number that had a polynomial in it, I'd have to make my polynomial and then make my complex number. PROFESSOR: Right if you wanted it constructed from scratch. At some point you construct them from scratch.1:16:25
But what you don't have to know of that is when you have the object you can just say "mul." And it'll multiply. Yeah? AUDIENCE: I think the question that was being posed here is,1:16:36
say if I want to change my representation of complexes, or some operation on complexes, how much real code will I have to1:16:46
deal with, or change, to change one specific operation? PROFESSOR: [UNINTELLIGIBLE] what you have to change. And the point is that you only have to change what you're changing.1:16:56
See if Martha decides that she would rather-- let's see something silly-- like change the order in the pair. Like angle and magnitude in the other order, she just1:17:09
makes that change locally. And the whole thing will propagate through the system in the right way. Or if suddenly you said, gee, I have another representation1:17:18
for rationals. And I'm going to stick it here, by filing those operations in the table. Then suddenly all of these polynomials whose coefficients1:17:27
are coefficients of coefficients, or whatever, also can automatically have available that representation. That's the power of this particular one. AUDIENCE: I'm not sure if I can even pose an intelligent1:17:37
sounding question. But somehow this whole thing went really nicely to this beautiful finish where all the things seemed to fall into place.1:17:47
Sort of seemed a little contrived. That's all for the sake, I'm sure, of teaching. I doubt that the guys who first did this-- and I could be wrong--1:17:56
figured it all out so that when they just all put it all together, you could all of the sudden, blam, do any kind of arithmetic on any kind of object. It seems like maybe they had to play with it for a while1:18:07
and had to bash it and rework it. And it seems like that's the kind of problem we're really faced with when we start trying to design a really complex1:18:16
system, is having lots of different kinds of parts and not even knowing what kinds of operations we're going to want to do on those parts. How to organize the operations in this nice way so that no1:18:27
matter what you do, when you start putting them together everything starts falling out for free. PROFESSOR: OK, well that's certainly a very intelligent question.1:18:37
One part is that this is a very good methodology that people have discovered, a lot of it coming from symbolic algebra. Because there are a lot of complications.1:18:47
It allows you to implement these things before you decide what you want all the operations to be, and all of that. So in some sense it's an answer that people have discovered by wading through this stuff.1:18:58
In another sense, it is a very contrived example. AUDIENCE: It seems like to be able to do this you do have to wade through it for a certain amount of time before you can1:19:08
become good at it. PROFESSOR: Let me show you how terribly contrived this is. So you can write all these wonderful things. But the system that I wrote here, and if we had another1:19:17
half an hour to give this lecture I would have given this part of it, which says, notice that it breaks down if I tell it to do something as foolish as add 3 plus 7/2.1:19:30
Because what will happen is you'll get to operate-2, and operate-2 will say, oh this is type number, and that's type rational. I don't know how to add them.1:19:41
So you'd like the system at least to be able to say something like, gee, before you do that change that to 3/1.1:19:50
Turn it into a rational number, hand that to the rational package. That's the thing I didn't talk about in this lecture. It's a little bit in the book, which talks about the problem1:20:00
of what's called coercion. Where you wanted-- see, having so carefully set up all of these types as distinct objects, a lot of times you want to also put in1:20:11
knowledge about how to view an ordinary number as a kind of rational. Or view an ordinary number as a kind of complex.1:20:21
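The lecture doesn't show the coercion code, so this is only a hedged sketch under my own naming: a table keyed by (from-type, to-type) that knows, for instance, how to view the ordinary number 3 as the rational 3/1 before handing the pair to the rational package.

```python
# A sketch of coercion (names and representation are mine, not the
# book's): on a type mismatch, consult a coercion table before
# dispatching to the package for the common type.

coercions = {
    ('number', 'rational'): lambda n: ('rational', n, 1),   # 3 -> 3/1
}

def type_tag(x):
    return x[0] if isinstance(x, tuple) else 'number'

def add_rat(x, y):
    _, nx, dx = x
    _, ny, dy = y
    return ('rational', nx * dy + ny * dx, dx * dy)

def generic_add(x, y):
    tx, ty = type_tag(x), type_tag(y)
    if tx != ty:                       # mismatched types: try to coerce
        if (tx, ty) in coercions:
            x, tx = coercions[(tx, ty)](x), ty
        elif (ty, tx) in coercions:
            y, ty = coercions[(ty, tx)](y), tx
    if tx == 'rational':
        return add_rat(x, y)
    return x + y
```

Where this knowledge lives — in the rational package, in the complex package, or in the generic operator itself, as sketched here — is exactly the design question the lecture raises.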
That's where the complexity in the system really starts happening, where you talk about, see, where do I put that knowledge? Should rational know that ordinary numbers might be1:20:30
pieces of [UNINTELLIGIBLE] of them? Or there are terrible, terrible examples, like I might want to add a complex number to a rational number.1:20:50
Bad example. 5/7. Then somebody's got to know that I have to convert these to another type, which is complex numbers whose parts1:20:59
might be rationals. And who worries about that? Does complex worry about that? Does rational worry about that? Does plus worry about that? That's where the real complexity comes in.1:21:08
And that's where it's not very well sorted out. And a lot of, in fact, all of this message-passing stuff was motivated by problems like this.1:21:18
And when you really push it, people are-- somehow the algebraic manipulation problem seems to be so complex that the people who are always at the edge of it are exactly in1:21:27
the state you said. They're wading through this thing, mucking around, seeing what they use, trying to distill stuff. AUDIENCE: I just want to come back to this issue of1:21:36
complexity once more. It certainly seems to be true that you have a great deal of flexibility in altering the lower level kinds of things.1:21:49
But it is true that you are, in a sense, freezing higher level operations. Or at least if you change them you don't know where all of1:21:58
the changes are going to show up, or how they are. PROFESSOR: OK, that's an extremely good question. What I have to do is, if I decide there's a new general1:22:10
operation called equality test, then all of these people have to decide whether or not they would like to have an1:22:19
equality test by looking in the table. There're ways to decentralize it even more. That's what I sort of hinted at last time, where I said you1:22:31
could not only have this type as a symbol, but you actually might store in each object the operations that it knows about.1:22:40
So you might have things like greatest common divisor, which is a thing here which is defined only for integers, and not in general for rational numbers.1:22:51
So it might be a very, very fragmented system. And then depending on where you want your flexibility, there's a whole spectrum of places that you can build that in. But you're pointing at the place where this starts being1:23:02
weak, that there has to be some agreement on top here about these general operations. Or at least people have to think about them. Or you might decide, you might have a table that's very sparse, that only has a few things in it.1:23:14
But there are lot of ways to play that game. OK, thank you.1:23:23
[MUSIC: "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:00
Lecture 5A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:16
PROFESSOR: Well, so far we've invented enough programming to do some very complicated things. And you surely learned a lot about0:00:28
programming at this point. You've learned almost all the most important tricks that usually don't get taught to people until they have had a lot of experience. For example, data directed programming is a major trick,0:00:40
and yesterday you also saw an interpreted language. We did this all in a computer language, at this point, where0:00:50
there was no assignment statement. And presumably, for those of you who've seen your Basic or Pascal or whatever, that's usually considered the most0:01:00
important thing. Well today, we're going to do some thing horrible. We're going to add an assignment statement. And since we can do all these wonderful things without it,0:01:09
why should we add it? An important thing to understand is that today we're going to, first of all, have a rule, which is going to always be obeyed, which is the only reason we ever add a feature0:01:19
to our language is because there is a good reason. And the good reason is going to boil down to this: you now get the ability to break a problem into pieces0:01:30
that are a different set of pieces than you could have broken it into without it. It gives you another means of decomposition. However, let's just start.0:01:39
Let me quickly begin by reviewing the kind of language that we have now.0:01:48
We've been writing what's called functional programs. And functional programs are a kind of encoding of mathematical truths.0:01:58
For example, when we look at the factorial procedure that you see on the slide here, it's basically two clauses.0:02:07
If n is one, the result is one, otherwise n times factorial n minus one. That's factorial of n. Well, that is factorial of n. And written down in some other obscure notation that you0:02:17
might have learned in calculus classes, mathematical logic, what you see there is: if n equals one, the result,0:02:28
n factorial, is one; otherwise, for n greater than one, n factorial is n times n-minus-one factorial. True statements, that's the kind of language we've been using.0:02:37
And whenever we have true statements of that sort, there is a kind of, a way of understanding how they work0:02:47
which is that such processes can be evolved by substitution. And so we see on the second slide here, that the way we0:02:56
understand the execution implied by those statements, arranged in that order, is that you do successive0:03:05
substitutions of arguments for formal parameters in the body of a procedure. This is basically a sequence of equalities.0:03:14
Factorial four is four times factorial three. That is four times three times factorial of two and so on. We're always preserving truth.0:03:26
Even though we're talking about true statements, there might be more than one organization of these true statements to describe the computation of a particular function, the computation of the value of0:03:37
a particular function. So, for example, looking at the next one here. Here is a way of looking at the sum of n and m.0:03:49
And we did this one by a recursive process. It's the increment of the sum of the decrement of n and m.0:04:00
And, of course, there is some piece of mathematical logic here that describes that. It's the increment of the sum of the decrement of n and m,0:04:11
just like that. So there's nothing particularly magic about that. And, of course, we can also look at an iterative process for the same, a program that evolves an iterative process,0:04:22
for the same function. These are two things that compute the same answer. And we have equivalent mathematical truths that are0:04:34
arranged there. And just the way you arrange those truths determines the particular process. The way you choose and arrange them determines the process that's evolved.0:04:44
So we have the flexibility of talking about both the function to be computed, and the method by which it's computed. So it's not clear we need more.0:04:53
However, today I'm going to do this awful thing. I'm going to introduce this assignment operation. Now, what is this?0:05:02
Well, first of all, there is going to be another kind of statement, if you will, in a programming language, called set!0:05:13
Things that do things like assignment, I'm going to put exclamation points after. We'll talk about what that means in a second. The exclamation point, again like question mark, is an0:05:23
arbitrary thing we attach to the symbol which is the name, has no significance to the system. The only significance is to me and you to alert you that this is an assignment of some sort.0:05:35
But we're going to set a variable to a value. And what that's going to mean is that there is a time at0:05:47
which something happens. Here's a time. If I have time going this way, it's a time axis. Time progresses by walking down the page.0:05:58
Then an assignment is the first thing we have that produces the difference between a before and an after. All the other programs that we've written, that have no0:06:09
assignments in them, the order in which they were evaluated didn't matter. But assignment is special, it produces a moment in time. So there is a moment before the set occurs and after, such0:06:27
that after this moment in time, the variable has the0:06:39
value, value.0:06:49
Independent of what value it had before, set! changes the value of the variable. Until this moment, we had nothing that changed.0:07:03
So, for example, one of the things we can think of is that the procedures we write for something like factorial are in fact pretty much identical to the function factorial.0:07:13
Factorial of four, if I write fact4, independent of what context it's in, and independent of how many times I write it, I always get the same answer.0:07:23
It's always 24. It's a unique map from the argument to the answer. And all the programs we've written so far are like that.0:07:33
However, once I have assignment, that isn't true. So, for example, if I were to define count to be one.0:07:50
And then I'm going to define also a procedure, a simple procedure called demo, which takes argument x and does the0:08:02
following operations. It first sets x to x plus one. My gosh, this looks just like FORTRAN, right--0:08:13
in a funny syntax. And then add to x count, Oh, I just made a mistake.0:08:24
I want to say, set! count to one plus count. It's this thing defined here.0:08:34
And then plus x count. Then I can try this procedure. Let's run it.0:08:43
So, suppose I get a prompt and I say, demo three.0:08:52
Well, what happens here? The first thing that happens is count is currently one. Currently, there is a time. We're talking about time. x gets three.0:09:02
At this moment, I say, oh yes, count is incremented, so count is two. two plus three is five. So the answer I get out is five.0:09:14
Then I say, demo of say, three again.0:09:23
What do I get? Well, now count is two, it's not one anymore, because I have incremented it. But now I go through this process, three goes into x,0:09:35
count becomes one plus count, so that's three now. The sum of those two is six, so the answer is six. And what we see is the same expression leads to two0:09:45
different answers, depending upon time. So demo is not a function, does not compute a0:09:55
mathematical function. In fact, you could also see why now, of course, this is the first place where the substitution model0:10:05
isn't going to work. This kills the substitution model dead. You know, with quotation there were some little problems that0:10:14
a philosopher might notice with the substitutions, because you have to worry about what deductions you can make when you substitute into quotes, if you're allowed to0:10:23
do that at all. But here the substitution model is dead, can't do anything at all. Because, supposing I wanted to use a substitution model to0:10:34
consider substituting for count? Well, my gosh, if I substitute for here and here, they're different ones.0:10:44
It's not the same count any more. I get the wrong answer. The substitution model is a static phenomenon that describes things that are true and not things that change.0:10:55
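The board's demo, rendered in Python for concreteness (in Scheme it is roughly `(define count 1)` and `(define (demo x) (set! count (1+ count)) (+ x count))`). The same expression, demo of three, gives different answers at different times, which is exactly why demo is not a mathematical function.

```python
# The lecture's demo: the same call gives different answers
# depending on when it happens, because of assignment.

count = 1

def demo(x):
    global count
    count = 1 + count     # set! count (1+ count)
    return x + count

first = demo(3)           # count becomes 2, so 3 + 2 = 5
second = demo(3)          # count becomes 3, so 3 + 3 = 6
```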
Here, we have truths that change. OK, Well, before I give you any understanding of this,0:11:06
this is very bad. Now, we've lost our model of computation. Pretty soon, I'm going to have to build you a new model of computation.0:11:15
But let's play with this, just now, in an informal sense. Of course, what you already see is that when I have something like assignment, the model that we're going to need0:11:24
is different from the model that we had before in that the variables, those symbols like count, or x are no longer going to refer to the values they have, but rather to some0:11:35
sort of place where the value is stored. We're going to have to think that way for a while. And it's going to be a very bad thing and cause a lot of trouble.0:11:44
And so, as I said, the very fact that we're inventing this bad thing, means that there had better be a good reason for it, otherwise, just a waste of time and a lot of effort.0:11:53
Let's just look at some of it just to play. Supposing we write down the functional version, functional meaning in the old style, of factorial by0:12:02
an iterative process. Factorial of n, we're going to iterate on m and i, which says0:12:26
if i is greater than n, then the result is m, otherwise,0:12:40
the result of iterating the product of i and m. So m is going to be the product that I'm accumulating.0:12:51
m is the product. And the count I'm going to increase by one.0:13:04
Plus, ITER, ELSE, COND, define. I'm going to start this up.0:13:17
And these days, you should have no trouble reading something like this. What I have here is a product there being accumulated and a counter.0:13:26
I start them up both at one. I'm going to buzz the counter up, i goes to i plus one every time around. But that's only our putting a time on the process, each of0:13:38
these is just a set of truths, true rules. And m is going to get a new value, i times m,0:13:47
each time around, and eventually i is going to be bigger than n, in which case, the answer's going to be m. Now, I'm speaking to you using time in this. That's just because I know how the computer works.0:13:58
But I didn't have to. This could be a purely mathematical description at this point, because substitution will work for this. But let's set right down a similar sort of program, using0:14:08
the same algorithm, but with assignments. So this is called the functional version.0:14:23
I want to write down an imperative version.0:14:34
Factorial of n. I'm going to create my two variables. Let i initialize itself to one, and m be initialized to0:14:48
one, similar. We'll create a loop which has a COND, and if i0:15:05
is greater than n, we're done. And the result is m, the product I'm accumulating. Otherwise, I'm going to write down three things to do.0:15:19
I'm going to set! m to the product of i and m, set! i to the sum of i and0:15:34
one, and go around the loop again. Looks very familiar to you FORTRAN programmers.0:15:44
ELSE, COND, define, funny syntax though. Start the loop up, and that's the program.0:15:59
Now, this program, how do we think about it? Well, let's just say what we're seeing here. There are two local variables, i and m, that have been initialized to one.0:16:10
Every time around the loop, I test to see if i is greater than n, which is the input argument, and if so, the result is the product being accumulated in m.0:16:19
However, if it's not the end of the loop, if I'm not done, then what I'm going to do is change the product to be the result of multiplying i times the current product.0:16:29
Which is sort of what we were doing here. Except here I wasn't changing. I was making another copy, because the substitution model0:16:38
says, you copy the body of the procedure with the arguments substituted for the formal parameters. Here I'm not worried about copying, here I've changed the0:16:49
value of m. I also then change the value of i to i plus one, and go buzzing around.0:16:58
Seems like essentially the same program, but there are some ways of making errors here that didn't exist until today. For example, if I were to do the horrible thing of not0:17:10
being careful in writing my program and interchange those two assignments, the program wouldn't compute the same function.0:17:20
I get a timing error because there's a dependency that m depends upon having the last value of i. If I change i first, then I've got the wrong value of i when0:17:32
I multiply by m. It's a bug that wasn't available until this moment, until we introduced something that had time in it.0:17:43
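The imperative version, rendered in Python alongside the timing bug just described: i and m are now places whose contents change, and interchanging the two assignments in the loop body silently computes the wrong function.

```python
# The imperative version: i and m are mutable places.

def fact_imperative(n):
    i, m = 1, 1           # (let ((i 1) (m 1)) ...)
    while i <= n:
        m = i * m         # (set! m (* i m))  -- must come first
        i = i + 1         # (set! i (+ i 1))
    return m

def fact_swapped(n):
    # the new kind of bug: same statements, wrong order
    i, m = 1, 1
    while i <= n:
        i = i + 1         # i is updated first...
        m = i * m         # ...so m multiplies by the wrong i
    return m
```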
So, as I said, first we need a new model of computation, and second, there had better be a damn good reason for doing this kind of ugly thing.0:17:52
Are there any questions? Speak loudly, David. AUDIENCE: I'm confused about, we've introduced set now, but0:18:04
we had let before and define before. I'm confused about the difference between the three. Wouldn't define work in the same situation as set if you0:18:14
introduced it a bit? PROFESSOR: No, define is intended for setting something once the first time, for making it. You've never seen me write on a blackboard two defines in a0:18:26
row whose intention was to change the old value of some variable to a new one. AUDIENCE: Is that by convention or-- PROFESSOR: No, it's intention.0:18:38
The answer is that, for example, internal to a procedure, two defines in a row are illegal, two defines0:18:47
in a row of the same variable. x can't be defined twice. Whether or not a system catches that error is a different question, but I legislate to you that define0:18:58
happens once on anything. Now, indeed, in interactive debugging, we intend that you interacting with your computer will redefine things, and so0:19:08
there's a special exception made for interactive debugging. But define is intended to mean to set up something which will0:19:18
be forever that value after that point. It's as if all the defines were done at the beginning. In fact, the only legal place to put a define in Scheme,0:19:29
internal to a procedure, is just at the beginning of a lambda expression, the beginning of the body of a procedure.0:19:41
Now, let of course does nothing like either of that. I mean, if you look at what's happening with a let, this0:19:50
happens again exactly once. It sets up a context where i and m are values one and one. That context exists throughout this scope, this0:20:01
region of the program. However, you don't think of that let as setting i again.0:20:11
It doesn't change it. i never changes because of the let. i gets created because of let. In fact, the let is a very simple idea.0:20:22
Let does nothing more than this; I'll write it down a little bit more neatly. Let's0:20:37
write: let var one have the value of expression e1, and var two have the value of expression e2, in an0:20:48
expression e3, is the same thing as a procedure of var0:21:00
one and var two, the formal parameters, and e3 being the body, where var one is bound to the value of e1, and var0:21:15
two gets the value of e2. So this is, in fact, a perfectly understandable thing from a substitution point of view.0:21:24
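The board's equivalence, that (let ((v1 e1) (v2 e2)) e3) is the same thing as ((lambda (v1 v2) e3) e1 e2), has a direct analogue in Python: a let is just an immediately applied lambda. This is only an illustration of the rewriting, not Scheme's actual implementation.

```python
# "let x = 3 and y = 4 in x*x + y*y", written as an
# immediately applied lambda, exactly as on the board:
result = (lambda x, y: x * x + y * y)(3, 4)
```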
This is really the same expression written in two different ways. In fact, the way the actual system works is this gets0:21:34
translated into this before anything happens. AUDIENCE: OK, I'm still unclear then as to what makes the difference between a let and a define. They could-- PROFESSOR: A define is syntactic sugar, whereby,0:21:45
essentially a bunch of variables get created by lets and then set up once.0:21:57
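The equivalence just described can be sketched directly in Scheme; the names with-let and with-lambda below are illustrative, not from the lecture:

```scheme
; A let expression creates a context where a and b have values...
(define with-let
  (let ((a 1) (b 2))
    (+ a b)))

; ...and is the same thing as applying a procedure whose formal
; parameters are the let variables to the initializing expressions.
(define with-lambda
  ((lambda (a b) (+ a b)) 1 2))

; Both evaluate to 3; the let form is translated into the
; lambda application before anything happens.
```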
OK, time for the first break, I think. Thank you. [MUSIC PLAYING]0:23:04
Well let's see. I now have to rebuild the model of computation, so you understand how some such mechanical mechanism could0:23:13
work that can do what we've just talked about. I just recently destroyed your substitution model.0:23:22
Unfortunately, this model is significantly more complicated than the substitution model. It's called the environment model. And I'm going to have to introduce some terminology,0:23:32
which is very good terminology for you to know anyway. It's about names. And we're going to give names to the kinds of names things have and the way those names are used.0:23:42
So this is a meta-description, if you will. Anyway, there is a pile of an unfortunate terminology here, but we're going to need this to understand what's called0:23:52
the environment model. We're about to do a little bit of boring, dog-work here. Let's look at the first transparency.0:24:02
And we see a description of a word called bound. And we're going to say that a variable, v, is bound in an0:24:11
expression, e, if the meaning of e is unchanged by the uniform replacement of a variable w, not occurring in0:24:22
e, for every occurrence of v in e. Now that's a long sentence, so, I think, I'm going to have to say a little bit about that before we even fool0:24:31
around at all here. We're talking about bound variables here.0:24:44
And you've seen lots of them. You may not know that you've seen lots of them. Well, I suppose in your logic you saw logical variables like, for every x there exists a y such that p is true of x0:24:58
and y from your calculus class. This variable, x, and this variable, y, are bound, because the meaning of this expression does not depend0:25:10
upon the particular letters I used to describe x and y. If I were to substitute w for x, and then said, for every w there0:25:21
exists a y such that p is true of w and y, it would be the same sentence. That's what it means.0:25:30
Or another case of this that you've seen is the integral, say, from 0 to one of dx over one plus x squared.0:25:46
Well that's something you see all the time. And this x is a bound variable. If I change that to a t, the expression is0:25:55
still the same thing. That's pi over four, the arctan of one, or something like that.0:26:04
Yes, that's the arctan of one. So bound variables are actually fairly common, for those of you who have played a bit with mathematics.0:26:13
Well, let's go into the programming world. Instead of the quantifier being something like, for0:26:22
every, or there exists, or integral, a quantifier is a symbol that binds a variable. And we are going to use the quantifier lambda as being the essential thing that binds variables.0:26:33
And so we have some nice examples here like that procedure of one argument y which does0:26:43
the following thing. It calls the procedure of one argument x, which multiplies x by y, and applies that to three.0:26:58
That procedure has the property there of two bound variables in it, x and y. This quantifier, lambda here, binds this y, and this0:27:08
quantifier, lambda, binds that x. Because, if I were to take an arbitrary symbol that does not occur in this expression, like w, and replace all y's with w's0:27:20
in this expression, the expression is still the same, the same procedure. And this is an important idea. The reason why we have things like that is a kind of0:27:30
modularity. If two people are writing programs, and they work together, it shouldn't matter what names they use internal to their own little machines that they're building.0:27:42
And so, what I'm really telling you there, is that, for example, this is equivalent to that procedure of one argument y which uses that procedure of one argument0:27:54
z which multiplies z by y. Because nobody cares what I used in here.0:28:06
It's a nice example. On the other hand, I have some variables that are not bound.0:28:15
For example, that procedure of one argument x which multiplies x by y.0:28:27
In this case, y is not bound. Supposing y had the value three, and z had the value0:28:36
four, then this procedure would be the thing that multiplies its argument by three. If I were to replace every instance of y with z, I would0:28:47
have a different procedure which multiplies every argument that's given by four. And, in fact, we have a name for such a variable.0:28:57
Here, we say that a variable, v, is free in the expression, e, if the meaning of the expression, e, is changed by0:29:06
the uniform replacement of a variable, w, not occurring in e, for every occurrence of v in e. So that's why this variable over here,0:29:20
y, is a free variable.0:29:29
And so there are free variables in this expression. Another example of that is that procedure of one argument0:29:38
y, which is just what we had before, which uses that procedure of one argument x that multiplies x by y--0:29:51
use that on three. This procedure has a free variable0:30:00
in it, which is asterisk. See, because, if that has the normal meaning of multiplication, then if I were to replace uniformly all0:30:11
asterisks with pluses, then the meaning of this expression would change. That's what you mean by a free variable.0:30:22
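The bound/free distinction can be seen in a minimal Scheme sketch (the names f, g, and h here are illustrative):

```scheme
; y is bound: renaming it uniformly to w leaves the meaning unchanged.
(define f (lambda (y) ((lambda (x) (* x y)) 3)))
(define g (lambda (w) ((lambda (x) (* x w)) 3)))
; (f 4) and (g 4) are both 12 -- the same procedure, different letters.

; y is free here: the meaning depends on what y is in the
; surrounding environment, so renaming it would change the procedure.
(define y 3)
(define h (lambda (x) (* x y)))
; (h 5) is 15; if the surrounding y were 4 instead, (h 5) would be 20.
```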
So, so far you've learned some logician words which describe the way names are used. Now, we have to do a little bit more playing around here,0:30:32
a little bit more. I want to tell you about the regions over which variables are defined.0:30:42
You see, we've been very informal about this up till now, and, of course, many of you have probably understood very clearly or most of you, that the x that's being0:30:51
declared here is defined only in here. This x is defined only in here, and this y is defined0:31:03
only in here. We have a name for such an idea. It's called a scope. And let me give you another piece of terminology.0:31:14
It's a long story. If x is a bound variable in e, then there is a lambda expression where it is bound. So the only way you can get a bound variable ultimately is0:31:23
by a lambda expression. Then you may worry: isn't define an exception to this? And it turns out, we could always arrange things so you don't need any defines.0:31:33
And we'll see that in a while. It's a very magical thing. So define really can go away. Really, the only thing that makes names is lambda.0:31:42
That's its job. And what's so amazing about a lot of things is you can compute with only lambda. But, in any case, a lambda expression has a place where0:31:53
it declares a variable. We call it the formal parameter list or the bound variable list. We say that the lambda expression binds--0:32:03
so it's a verb-- binds the variables declared in its bound variable list. In addition, those parts of the expression where the variable is defined, which was declared by some declaration,0:32:15
is called the scope of that variable. So these are scopes. This is the scope of y.0:32:27
And this is the scope of x-- that sort of thing.0:32:41
OK, well, now we have enough terminology to begin to understand how to make a new model for computation, because0:32:52
the key thing going on here is that we destroyed the substitution model, and we now have to have a model that represents the names as referring to places.0:33:03
Because if we are going to change something, then we need a place where it's stored. You see, if a name only refers to a value, and if I tried to0:33:14
change the name's meaning, well, that's not clear. There's nothing that is the place that that0:33:23
name referred to. What am I really saying? There is nothing shared among all of the instances of that name. And what we really mean, by a name, is that we0:33:32
fan something out. We've given something a name, and you have it, and you have it, because I've given you a reference to it, and I've given you a reference to it.0:33:41
And we'll see a lot about that. So let me tell you about environments. I need the overhead projection machine, thank you.0:33:52
And so here is a bunch of environment structures.0:34:01
An environment is a way of doing substitutions virtually. It represents a place where something is stored: the substitutions that you haven't done yet.0:34:14
It's a place where everything accumulates, where the names of the variables are associated with the values they have, such that when you say, what does this name mean,0:34:26
you look it up in an environment. So an environment is a function, or a table, or something like that. But it's a structured sort of table.0:34:35
It's made out of things called frames. Frames are pieces of environment, and they are0:34:45
chained together, in some nice ways, by what's called parent links or something like that. So here, we have an environment structure0:34:57
consisting of three environments, basically, a, b, and c. d is also an environment, but it's the same one, they share.0:35:11
And that's the essence of assignment. If I change a variable, the value of a variable that lives here, like that one, it should be visible from all places0:35:21
that you're looking at it from. Take this one, x. If I change the x to four, it's visible from other places.0:35:30
But I'm not going to worry about that right now. We're going to talk a lot about that in a little while. What do we have here? Well, these are called frames. Here is a frame, here's a frame, and here's a frame.0:35:43
a is an environment which consists of the table which is frame two, followed by the table labeled frame one.0:35:52
And, in this environment, in say this environment, frame two, x and y are bound.0:36:04
They have values. Sorry, in frame one-- In frame two, z is bound, and x is bound, and y is bound,0:36:15
but the value of x that we see, looking from this point of view, is this x. It's seven, rather than this one, which is three.0:36:24
We say that this x shadows this x. From environment three--0:36:33
from frame three, from environment b, which refers to frame three, we have variables n and y bound and also x.0:36:44
This y shadows this one. So the value, looking from this point of view, of y is two.0:36:53
The value, looking from this point of view, of m is one. And the value, looking from this point of view, of x is three.0:37:02
So there we have a very simple environment structure made out of frames. These correspond to the applications of procedures. And we'll see that in a second.0:37:14
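The shadowing just described can be mimicked with nested scopes; this small sketch (the name lookup is illustrative) shows an inner x hiding an outer one without changing it:

```scheme
; Like frame one: x = 3 in the enclosing environment.
(define x 3)

; Like frame two: this let makes a new frame where x = 7,
; which shadows the outer x when we look up x from inside.
(define (lookup)
  (let ((x 7))
    x))

; (lookup) sees the inner x, 7; the outer x is still 3.
; The inner binding shadows -- it does not change -- the outer one.
```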
So now I have to make you some other nice little structure that we build. Next slide, we see an object, which I'm going to draw0:37:25
procedures. This is a procedure. A procedure is made out of two parts. It's sort of like a cons.0:37:37
However, it has two parts. The first part refers to some code, something that can be0:37:46
executed, a set of instructions, if you will. You can think of it that way. And the second part is the environment. The procedure is the whole thing.0:37:57
And we're going to have to use this to capture the values of the free variables that occur in the procedure.0:38:06
If a variable occurs in the procedure it's either bound in that procedure or free. If it's bound, then the value will somehow be easy to find.0:38:16
It will be in some easy environment to get at. If it's free, we're going to have to have something that goes with the procedure that says where we'll go look for its value.0:38:27
And the reasons why are not obvious yet, but will be soon. So here's a procedure object. It's a composite object consisting of a piece of code0:38:40
and an environment structure. Now I will tell you the new rules, the complete new rules, for evaluation.0:38:50
The first rule is-- there's only two of them. These correspond to the substitution model rules. And the first one has to do with how do you apply a0:39:00
procedure to its arguments? And a procedural object is applied to a set of arguments by constructing a new frame.0:39:11
That frame will contain the mapping of the formal parameters to the actual parameters, the arguments that were supplied in the call.0:39:21
As you know, when we make up a call to a procedure like lambda x times x y, and we call that with the argument three, then we're going to need some0:39:31
mapping of x to three. It's the same thing as later substituting, if you will, the three for the x in the old model.0:39:41
So I'm going to build a frame which contains x equals three as the information in that frame. Now, the body of the procedure will then have to be evaluated0:39:52
which is this. It will be evaluated in an environment which is0:40:04
constructed by adjoining the new frame that we just made to the environment which was part of the procedure that we applied.0:40:13
So I'm going to make a little example of that here. Supposing I have some environment.0:40:25
Here's a frame which represents it. And some procedure-- which I'm going to draw with circles here because it's easier than little triangles-- Sorry, those are rhombuses, rhomboidal little pieces of0:40:38
fruit jelly or something. So here's a procedure which takes this environment. And the procedure has a piece of code, which is a lambda0:40:48
expression, which binds x and y and then executes an expression, e.0:40:58
And this is the procedure. We'll call it p. I wish to apply that procedure to three and four. So I want to do p of three and four.0:41:09
What I'm going to do, of course, is make a new frame. I build a frame which contains x equals three,0:41:18
and y equals four. I'm going to connect that frame to this frame over here.0:41:27
And then this environment, which I will call b, is the environment in which I will evaluate the body, e.0:41:39
Now, e may contain references to x and y and other things. x and y will have values right here.0:41:50
Other things will have their values here. How do we get this frame? That we do by the construction of procedures which is the0:42:00
other rule. And I think that's the next slide. Rule two, when a lambda expression is evaluated,0:42:10
relative to a particular environment-- See, the way I get a procedure is by evaluating the lambda expression. Here's a lambda expression.0:42:20
By evaluating it, I get a procedure which I can apply to three. Now this lambda expression is evaluated in an environment where y is defined.0:42:31
And I want the body of this, which contains a free occurrence of y. y is free in here; it's bound over the whole thing, but it's0:42:41
free over here. I want that y to be this one. I evaluate this body of this procedure in the environment0:42:53
where y was created. That's this kind of thing, because that was done by application. Now, if I ever want to look up the value of y, I have to know0:43:03
where it is. Therefore, when this procedure was created, the creation of the procedure which is the result of evaluating that lambda expression had better capture a pointer to, or remember, the0:43:14
frame in which y was bound. So that's what this rule is telling us. So, for example, if I happen to be evaluating a lambda0:43:28
expression, lambda expression in e, lambda of say, x and y,0:43:37
let's call it g in e, evaluating that. Well, all that means is I now construct a procedure object.0:43:47
e is some environment. e is something which has a pointer to it. I construct a procedure object that points up to that0:43:56
environment, where the code of that is a lambda expression or whatever that translates into.0:44:06
And this is the procedure. So this produces for me-- this object here, this environment0:44:17
pointer, captures the place where this lambda expression was evaluated, where the definition was used to make the0:44:26
procedure. So it picks up the environment from the place where that0:44:35
procedure was defined, stores it in the procedure itself, and then when the procedure is used, the environment where it was defined is extended with the new frame.0:44:48
So this gives us a locus for putting where a variable has a value. And, for example, if there are lots of guys pointing in at that environment, then they share that place.0:45:01
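Rule two can be seen in a small example; make-scaler is an illustrative name, not from the lecture:

```scheme
; Evaluating the inner lambda expression captures the environment
; where it was evaluated: the frame in which y was bound by the
; outer application.
(define make-scaler
  (lambda (y)
    (lambda (x) (* x y))))   ; y is free here; its value is found
                             ; in the captured frame

(define double (make-scaler 2))   ; captures a frame where y = 2
(define triple (make-scaler 3))   ; a different frame where y = 3
; (double 5) is 10 and (triple 5) is 15: the same lambda text,
; evaluated in two environments, producing two different procedures.
```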
And we'll see more of that shortly. Well, now you have a new model for understanding the execution of programs. I suppose I'll take questions0:45:12
now, and then we'll go on and use that for something. AUDIENCE: Is it right to say then, the environment is that0:45:21
linked chain of frames-- PROFESSOR: That's right. AUDIENCE: starting with-- working all the way back? PROFESSOR: Yes, the environment is a sequence of frames linked together.0:45:32
And the way I like to think about it, it's the pointer to the first one, because once you've got that you've got them all.0:45:44
Anybody else? AUDIENCE: Is it possible to evaluate a procedure or to define a procedure in two different environments such that it will behave differently, and have pointers to both-- PROFESSOR: Oh, yes.0:45:53
The same procedure is not going to have two different environments. The same code, the same lambda expression can be evaluated in two environments producing two different procedures.0:46:06
Each procedure-- AUDIENCE: Their definition has the same name. Their operation-- PROFESSOR: The definition is written the same, with the same characters. I can evaluate that set of characters, whatever, that0:46:16
list structure of that define, that is, the textual representation. I can evaluate that in two different environments, producing two different procedures.0:46:25
Each of those procedures has its own local sets of variables, and we'll see that right now.0:46:36
Anybody else? OK, thank you. Let's take a break.0:46:48
[MUSIC PLAYING]0:47:22
Well, now I've done this terrible thing to you. I've introduced a very complicated thing, assignment,0:47:34
which destroys most of the interesting mathematical properties of our programs. Why should I have done this?0:47:43
What possible good could this do? Clearly not a nice thing, so I better have a good excuse.0:47:52
Well, let's do a little bit of playing, first of all, with some very interesting programs that have assignment. Understand something special about them that makes them0:48:02
somewhat valuable. Start with a very simple program which I'm going to call make-counter. I'm going to define make-counter to be a procedure0:48:26
of one argument n which returns as its value a procedure of no arguments-- a procedure that produces a procedure--0:48:36
which sets n to the increment of n and returns0:48:48
that value of n. Now we're going to investigate the behavior of this.0:48:57
It's a sort of interesting thing. In order to investigate the behavior, I have to make an environment model, because we can't understand this any other way.0:49:08
So let's just do that. We start out with some sort of-- let's say there is a global environment that the machine is born with. Global we'll call it.0:49:19
And it's going to have in it a bunch of initial things. We all know what it's got. It's got things in it like say, plus, and times, and0:49:32
quotient, and difference, and CAR, and et cetera, lots of things.0:49:42
I don't know what they are, some various squiggles that are the things the machine is born with.0:49:51
And by doing the definition here, what I plan to do-- Well, what am I doing? I'm doing this relative to the global environment. So here's my environment pointer.0:50:03
In order to do that I have to evaluate this lambda expression. That means I make a procedure object. So I'm going to make a procedure object here.0:50:17
And the procedure object has, as the place it's defined, the global environment. The procedure object contains some code that represents a0:50:29
procedure of one argument n which returns a procedure of no arguments which does something.0:50:38
And the define is a way of changing this environment, so that I now add to it make-counter-- there's a special rule0:50:53
for the special thing define. But what that is, is it gives me that pointer to that procedure.0:51:03
So now the global environment contains make-counter as well. Now, we're going to do some operations. I'm going to use this to make some counters.0:51:14
We'll see what a counter is. So let's define c1 to be a counter beginning at 0.0:51:35
Well, we know how to do this now, according to the model. I have to evaluate the expression make-counter in the global environment, make-counter of 0.0:51:47
Well, I look up make-counter and see that it's a procedure. I'm going to have to apply that procedure.0:51:56
The way I apply the procedure is by constructing a frame. So I construct a frame which has a value for n in it which0:52:12
is 0, and the parent environment is the one which is the environment of definition of make-counter.0:52:23
So I've made an environment by applying make-counter to 0. Now, I have to evaluate the body of make-counter, which is0:52:34
this lambda expression, in that environment. Well evaluating this body, this body is a lambda0:52:43
expression. Evaluate a lambda expression means make a procedure object. So I'm going to make a procedure object.0:52:56
And that procedure object has the environment it was defined in being that, where n was defined to be 0.0:53:07
And it has some code, which is the procedure of no arguments which does something, that sets something, and returns n.0:53:17
And this thing is going to be the object, which in the global environment, will have the name c1.0:53:26
So we construct a name here, c1, and say that equals that.0:53:35
Now, let's also make another counter, c2, to be make-counter0:53:50
say, starting with 10. Then I do essentially the same thing. I apply the make-counter procedure, which I got from0:53:59
here, to make another frame with n being 10. That frame has the global environment as its parent.0:54:10
I then construct a procedure which has that as its frame of definition.0:54:20
The code of it is the procedure of no arguments which does something. And it does a set, and so on. And n comes out.0:54:31
And c2 is this. Well, you're already beginning to see something fairly interesting.0:54:40
There are two n's here. They are not one n. Each time I called make-counter, I made another0:54:49
instance of n. These are distinct and separate from each other. Now, let's do some execution, use those counters.0:55:00
I'm going to use those counters. Well, what happens if I say, c1 at this point?0:55:15
Well, I go over here, and I say, oh yes, c1 is a procedure. I'm going to call this procedure on no arguments, but it has no parameters.0:55:25
That's right. What's its body? Well, I have to look over here, because I didn't write it down. It said, set n to one plus n and return n, increment n.0:55:39
Well, the n it sees is this one. So I increment that n. That becomes one, and I return the value one.0:55:53
Supposing I then called c2. Well, what do I do? I say c2 is this procedure which does the same thing, but0:56:03
here's the n. It becomes 11. And so I have an 11 which is the value.0:56:15
I then can say, let's try c1 again. c1 is this, that's two, so the answer is two.0:56:29
And c2 gives me a 12 by the same method, by walking down here looking at that and saying, here's the n, I'm0:56:38
incrementing. So what I have are computational objects. There are two counters, each with its own0:56:49
independent local state. Let's talk about this a little. This is a strange thing.0:57:01
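The counter procedure walked through above can be written out in Scheme (the increment is written here as (+ n 1)):

```scheme
; Each call to make-counter evaluates the inner lambda in a fresh
; frame where n is bound, so each counter gets its own private n.
(define make-counter
  (lambda (n)
    (lambda ()            ; a procedure of no arguments...
      (set! n (+ n 1))    ; ...which sets n to the increment of n
      n)))                ; and returns that value of n

(define c1 (make-counter 0))
(define c2 (make-counter 10))
; (c1) => 1, (c2) => 11, (c1) => 2, (c2) => 12:
; two counters, each with its own independent local state.
```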
What's an object? It's not at all obvious what an object is. We like to think about objects, because it's0:57:11
economical to think that way. It's an intellectual economy. I am an object.0:57:21
You are an object. We are not the same object. I can divide the world into two parts, me and you, and0:57:32
there's other things as well, such that most of the things I might want to discuss about my workings do not involve you,0:57:41
and most of the things I want to discuss about your workings don't involve me. I have a blood pressure, a temperature, a respiration0:57:50
rate, a certain amount of sugar in my blood, and numerous, thousands, of state variables-- millions actually,0:57:59
or I don't know how many-- huge numbers of state variables in the physical sense which represent the state of me as a particle, and0:58:09
you have gazillions of them as well. And most of mine are uncoupled to most of yours. So we can compute the properties of me without0:58:21
worrying too much about the properties of you. If we had to worry about both of us together, then the number of states that we have to consider is the product of the number of states you have and the number of states I have. But this way it's almost a sum.0:58:32
Now, indeed there are forces that couple us. I'm talking to you and your state changes. I'm looking at you and my state changes.0:58:41
Some of my state variables, a very few of them, therefore, are coupled to yours. If you were to suddenly yell very loud, my blood pressure would go up.0:58:54
However, it may not always be appropriate to think about the world as being made out of independent states and independent particles. Lots of the bugs that occur in things like quantum mechanics,0:59:05
or the bugs in our minds that occur when we think about things like quantum mechanics, are due to the fact that we are trying to think about things being broken up into independent pieces, when in fact there's more coupling0:59:15
than we see on the surface, or that we want to believe in, because we want to compute efficiently and effectively. We've been trained to think that way.0:59:29
Well, let's see. How would we know if we had objects at all? How can we tell if we have objects? Consider some possible optical illusions.0:59:41
This could be done. These pieces of chalk are not particularly identical, but supposing you couldn't tell the difference between them by looking at them.0:59:52
Well, there's a possibility that this is all a game I'm playing with mirrors. It's really the same piece of chalk, but you're seeing two of them.1:00:01
How would you know if you're seeing one or two? Well, there's only one way I know. You grab one of them and change it and see if the other1:00:10
one changed. And it didn't, so there's two of them.1:00:19
And, on the other hand, there are some other screwy properties of things like that. Like, how do we know if something changed? We have to look at it before and after the change.1:00:28
The change is an assignment, it's a moment in time. But that means we have to know it was the same one that we're looking at. So some very strange, and unusual, and obscure, and--1:00:39
I don't understand the problems associated with assignment, and change, and objects. These could get very, very bad.1:00:51
For example, here I am, I am a particular person, a particular object. Now, I can take out my knife, and cut my fingernail.1:01:02
A piece of my fingernail has fallen off onto the table. I believe I am the same person I was a second ago, but I'm1:01:11
not physically the same in the slightest. I have changed. Why am I the same? What is the identity of me?1:01:21
I don't know. Except for the fact that I have some sort of identity. And so, I think by introducing assignment and objects, we1:01:34
have opened ourselves up to all the horrible questions of philosophy that have been plaguing philosophers for some thousands of years about this sort of thing.1:01:43
It's why mathematics is a lot cleaner. Let's look at the best things I know to say about actions and identity.1:01:52
We say that an action, a, had an effect on an object, x, or equivalently, that x was changed by a, if some property, p, which was true of x before a, became1:02:02
false of x after a. Notice, it still means I have to have the x before and after. Or, the other way of saying this is, we say that two1:02:13
objects, x and y, are the same if any action which has an effect on x has the same effect on y. However, objects are very useful, as I said, for1:02:22
intellectual economy. One of the things that's incredibly useful about them is that the world, we like to think, is made out of1:02:32
independent objects with independent local state. We like to think that way, although it isn't completely true. When we want to make very complicated programs that deal1:02:42
with such a world, if we want those programs to be understandable by us and also to be changeable, so that if we change the world we change the program only a little bit,1:02:51
then we want there to be connections, isomorphism, between the objects in the world and the objects in our mental model. The modularity of the world can give us the modularity in1:03:00
our programming. So we invent things called object-oriented programming and things like that to provide us with that power.1:03:09
But it's even easier. Let's play a little game. I want to play a little game, show you an even easier example of where modularity can be enhanced by using an1:03:19
assignment statement, judiciously. One thing I want to enforce and impress on you is: don't use assignment statements the way you use them in FORTRAN or1:03:28
BASIC or Pascal, to do the things you don't have to do with them. It's not the right way to think for most things.1:03:37
Sometimes it's essential, or maybe it's essential. We'll see more about that too. OK, let me show you a fun game here.1:03:47
There was a mathematician by the name of Cesaro-- or Cesaro, Cesaro I suppose it is-- who figured out a clever way of computing pi.1:03:58
It turns out that if I take two random numbers, two integers at random, and compute the greatest common divisor, their1:04:11
greatest common divisor is either one or it's not one. If it's one, then they have no common divisors. If their greatest common divisor is one--1:04:21
the probability that two random numbers, two numbers chosen at random, have greatest common divisor one is related to pi. In fact--1:04:31
yes, it's very strange-- of course there are other ways of computing pi, like dropping pins on flags, and things like that, and sort of the same kind of thing.1:04:40
So the probability that the GCD of number one and number two, two random numbers chosen, is one, is 6 over pi squared.1:04:55
I'm not going to try to prove that. It's actually not too hard and sort of fun. How would we estimate such probability? Well, the way we do that, the way we estimate probabilities,1:05:07
is by doing lots of experiments, and then computing the ratios of the ones that come out one way to the total number of experiments we do.1:05:16
It's called Monte Carlo, and it's useful in other contexts for doing things like integrals where you have lots and lots of variables-- the space which is limiting the dimensions you are doing your integral in.1:05:26
But, going back to here, let's look at this slide. We can use Cesaro's method for estimating pi with n trials by taking the1:05:40
square root of six over the result of a Monte Carlo experiment with n trials, using Cesaro's experiment,1:05:51
where Cesaro's experiment is the test of whether the GCD of two random numbers-- And you can see that I've already got some assignments1:06:01
in here, just by what I wrote. The fact that this word rand, in parentheses-- therefore, a procedure call-- yields a different value than this one,1:06:11
at least that's what I'm assuming by writing this this way, indicates that this is not a function, that there's internal state in it which is changing.1:06:25
If the GCD of those two random numbers is equal to one, that's the experiment. So here I have an experimental method for estimating the1:06:34
value of pi. Where, I can easily divide this problem into two parts. One is the specific Monte Carlo experiment of Cesaro,1:06:43
which you just saw, and the other is the general technique of doing Monte Carlo experiments. And that's what this is. If I want to do Monte Carlo experiments with n trials, a1:06:55
certain number of trials, and a particular experiment, the way I do that is I make a little iterative procedure which has as variables the number of trials remaining and the1:07:05
number of trials that have passed, that have come out true. And if the number remaining is 0, then the answer is the number passed divided by the whole number of trials, which is1:07:16
the estimate of the probability. And if it's not, if I have more trials to do, then let's do one. We do an experiment. We call the procedure which is experiment on no arguments.1:07:27
We do the experiment and then, if that turned out to be true, we go around the loop decrementing the number of experiments we have to do by one and incrementing the1:07:36
number that were passed. And if the experiment was false, we just go around the loop decrementing the number of experiments remaining and keeping the number passed the same.1:07:48
We start this up iterating over the total number of trials with 0 experiments passed. A very elegant little program.1:07:57
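The procedures just described might be sketched like this -- a reconstruction of what's on the slide, assuming `rand` is the stateful random-number generator discussed in a moment:

```scheme
;; Estimate pi from Cesaro's theorem: Prob(gcd(n1,n2) = 1) = 6/pi^2.
(define (estimate-pi n)
  (sqrt (/ 6 (monte-carlo n cesaro))))

;; Cesaro's experiment: are two random numbers relatively prime?
;; RAND is assumed to be a generator with hidden internal state.
(define (cesaro)
  (= (gcd (rand) (rand)) 1))

;; The general Monte Carlo technique: run EXPERIMENT TRIALS times
;; and return the fraction of trials that came out true.
(define (monte-carlo trials experiment)
  (define (iter remaining passed)
    (cond ((= remaining 0) (/ passed trials))
          ((experiment) (iter (- remaining 1) (+ passed 1)))
          (else (iter (- remaining 1) passed))))
  (iter trials 0))
```

Note how the specific experiment (cesaro) and the general technique (monte-carlo) are cleanly separated, exactly as described.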
And I don't have to just do this with Cesaro's experiment, it could be lots of Monte Carlo experiments I might do. Of course, this depends upon the existence of some sort of random number generator.1:08:07
And random number generators generally look something like this. There is a random number generator--1:08:17
is in fact a procedure which is going to do something just like the counter. It's going to update an x to the result of applying some1:08:30
function to x, where this function is some screwy kind of function that you might find out in Knuth's books on the details of programming.1:08:41
He does these wonderful books that are full of the details of programming, because I can't remember how to make a random number generator, but I can look it up there, and I1:08:50
can find out. And then, eventually, I return the value of x which is the state variable internal to the random number generator. That state variable is initialized1:09:00
somehow, and has a value. And this procedure is defined in the context where that variable is bound.1:09:10
So this is a hidden piece of local state that you see here. And this procedure is defined in that context.1:09:21
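Such a generator might be sketched like this. Here `random-init` and `rand-update` are assumptions: the seed and the Knuth-style update function you would look up in his books.

```scheme
;; RANDOM-INIT and RAND-UPDATE are assumed to be supplied elsewhere.
(define rand
  (let ((x random-init))        ; hidden local state, bound in this context
    (lambda ()
      (set! x (rand-update x))  ; update the internal state variable...
      x)))                      ; ...and return its new value
```

The variable x is visible only inside the lambda; each call to (rand) advances the state and returns a fresh number.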
Now, that's a very simple thing to do. And it's very nice. Supposing I didn't want to use assignments. Supposing I wanted to write this program without1:09:30
assignments. What problems would I have? Well, let's see. I'd like to use the overhead machine here, thank you.1:09:44
First of all, let's look at the whole thing. It's a big story, unfortunately, which tells you there is something wrong. It's at least that big, and it's monolithic.1:09:57
You don't have to understand or look at the text there right now to see that it's monolithic. It isn't a thing which is Cesaro's experiment. It's not pulled out from the Monte Carlo process.1:10:10
It's not separated. Let's look why. Remember, the constraint here is that every procedure return1:10:19
the same value for the same arguments. Every procedure represents a function. That's a different kind of constraint.1:10:28
Because when I have assignments, I can change some internal state variable. So let's see how that causes things to go wrong. Well, start at the beginning.1:10:38
The estimate of pi looks sort of the same. What I'm doing is I take the square root of six over the1:10:47
random GCD test applied to n, whereas that's what this is. But here, we are beginning to see something funny. The random GCD test of a certain number of trials is1:10:58
just like we had before, an iteration on the number of trials remaining, the number of trials that have been passed, and another variable x.1:11:10
What's that x? That x is the state of the random number generator. And it is now going to be used here.1:11:21
The same random update function that I have over here is the one I would have used in a random number generator if I were building it the other way, the one I get out of Knuth's books.1:11:31
x is going to get transformed into x1--I need two random numbers. And x1 is going to get transformed into x2--now I have two random numbers. I then have to do exactly what I did before.1:11:42
I take the GCD of x1 x2. If that's one, then I go around the loop with x2 being the next value of x.1:11:54
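The assignment-free version just described might be sketched like this: the generator's state x is now an explicit argument threaded through the loop.

```scheme
;; Functional version: no SET!, so the random-generator state X
;; must be passed around explicitly.  RANDOM-INIT and RAND-UPDATE
;; are the same assumed seed and update function as before.
(define (estimate-pi n)
  (sqrt (/ 6 (random-gcd-test n random-init))))

(define (random-gcd-test trials initial-x)
  (define (iter remaining passed x)
    (let* ((x1 (rand-update x))     ; first random number
           (x2 (rand-update x1)))   ; second random number
      (cond ((= remaining 0) (/ passed trials))
            ((= (gcd x1 x2) 1)
             (iter (- remaining 1) (+ passed 1) x2))
            (else (iter (- remaining 1) passed x2)))))
  (iter trials 0 initial-x))
```

Notice that x has leaked out of the generator into the Monte Carlo loop, and the loop now knows that each experiment needs exactly two random numbers.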
You see what's happened here is that the state of the random number generator is no longer confined to the insides of the random number generator. It has leaked out.1:12:03
It has leaked out into my procedure that does the Monte Carlo experiment. But what's worse than that, is it's also, because it was1:12:13
contained inside my experiment itself, Cesaro, it leaked out of that too. Because Cesaro, called twice, has to have a different value each time, if I'm going to have a legitimate experimental1:12:24
test. So Cesaro can't be a function either, unless I pass it the seed of the random number generator that is going1:12:34
to go wandering around. So unfortunately, the seed of the random number generator has leaked out of the random number generator into Cesaro, and from there into the Monte Carlo experiment.1:12:45
And, unfortunately, my Monte Carlo experiment here is no longer general. The Monte Carlo experiment here knows how many random numbers I need to do the experiment.1:12:58
That's sort of horrible. I lost the ability to decompose the problem into pieces, because I wasn't willing to accept the little loop of information,1:13:10
the feedback process, that happens inside the random number generator, which before was made by having an assignment to a state variable that was confined to the1:13:20
random number generator. So the fact that the random number generator is an object with an internal state variable--affected by1:13:29
nothing outside, but it'll give you something, it will apply its force to you--that is what we're missing now.1:13:38
OK, well I think we've seen enough reason for doing this, and it all sort of looks very wonderful. Wouldn't it be nice if assignment was a good thing--1:13:51
and maybe it's worth it. But I'm not sure. As Gilbert and Sullivan said, things are seldom what they seem, skim milk masquerades as cream.1:14:01
Are there any questions?1:14:17
Are there any philosophers here? Anybody want to argue about objects? You're just floored, right?1:14:29
And you haven't done your homework yet. You haven't come up with a good question. Oh, well.1:14:40
Sure, thank you. Let's take the long break now.0:00:00
Lecture 5B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:21
PROFESSOR: Well, now that we've given you some power to make independent local state and to model objects, I thought we'd do a bit of programming of a very0:00:31
complicated kind, just to illustrate what you can do with this sort of thing.0:00:40
I suppose, as I said, we were motivated by physical systems and the ways we like to think about physical systems, which is that there are these things that the world is made out of.0:00:52
And each of these things has particular independent local state, and therefore it is a thing. That's what makes it a thing.0:01:01
And then we're going to say that in the model in the world--we have a world and a model in our minds and in the computer of that world.0:01:10
And what I want to make is a correspondence between the objects in the world and the objects in the computer, the relationships between the objects in the world and the relationships between those same obj...--the model objects0:01:21
in the computer, and the functions that relate things in the world to the functions that relate things in the computer.0:01:30
This buys us modularity. If we really believe the world is like that, that it's made out of these little pieces, and of course we could arrange0:01:40
our world to be like that, we could only model those things that are like that, then we can inherit the modularity in the world into our programming.0:01:50
That's why we would invent some of this object-oriented programming. Well, let's take the best kind of objects I know. They're completely--they're completely wonderful:0:02:03
electrical systems. Electrical systems really are the physicist's best, best objects.0:02:14
You see over here I have some piece of machinery. Right here's a piece of machinery. And it's got an electrical wire connecting one part of0:02:24
the machinery with another part of the machinery. And one of the wonderful properties of the electrical world is that I can say this is an object, and this is an0:02:34
object, and they're-- the connection between them is clear. In principle, there is no connection that I didn't describe with these wires.0:02:44
Let's say if I have light bulbs, a light bulb and a power supply that's plugged into the outlet. Then the connection is perfectly clear.0:02:53
There's no other connections that we know of. If I were to tie a knot in the wire that connects the light bulb to the power supply, the light remains lit up.0:03:04
It doesn't care. The way the physics is arranged is such that the connection can be made abstract, at least for low0:03:13
frequencies and things like that. So in fact, we have captured all of the connections there really are.0:03:22
Well, you can go one step further and talk about the most abstract types of electrical systems we have, digital circuits. And here there are certain kinds of objects.0:03:34
For example, in digital circuits we have things like inverters. We have things like and-gates.0:03:43
We have things like or-gates. We connect them together by sort-of wires which represent0:03:53
abstract signals. We don't really care as physical variables whether these are voltages or currents or some combination or anything like that, or water, water pressure.0:04:05
These abstract variables represent certain signals. And we build systems by wiring these things together with wires.0:04:14
So today what I'm going to show you, right now, we're going to build up an invented language in Lisp, embedded in the same sense that Henderson's picture language0:04:24
was embedded, which is not the same sense as the language of pattern match and substitution was done yesterday. The pattern match/substitution language was interpreted by a0:04:35
Lisp program. But the embedding of Henderson's program is that we just build up more and more procedures that encapsulate the structure we want.0:04:45
So for example here, I'm going to have some various primitive kinds of objects, as you see, that one and that one. I'm going to use wires to combine them.0:04:55
The way I represent attaching-- I can make wires. So let's say A is a wire. And B is a wire. And C is a wire. And D is a wire.0:05:04
And E is a wire. And S is a wire. Well, an or-gate that has two inputs, the inputs being A and B, and the output being D, you notate like this.0:05:17
An and-gate, which has inputs A and B and output C, we notate like that. By making such a sequence of declarations, like this, I can0:05:29
wire together an arbitrary circuit. So I've just told you a set of primitives and means of combination for building digital circuits. Now I need0:05:40
more in a real language--abstraction. And so for example, here I have--here I have a half adder.0:05:52
It's something you all know if you've done any digital design. It's used for adding numbers together on A and B and putting out a sum and a carry.0:06:03
And in fact, the wiring diagram is exactly what I told you. A half adder with things that come out of the box-- you see the box, the boundary, the abstraction is always a box.0:06:14
And there are things that come out of it, A, B, S, and C. Those are the declared variables--declared variables0:06:24
of a lambda expression, which is the one that defines half adder. And internal to that, I make up some more wires, D and E,0:06:36
which I'm going to use for the interconnect-- here E is this one and D is this wire, the interconnect that doesn't come through the walls of the box--0:06:45
and wire things together as you just saw. And the nice thing about this that I've just shown you is this language is hierarchical in the right way. If a language isn't hierarchical in the right way,0:06:55
if it turns out that a compound object doesn't look like a primitive, there's something wrong with the language-- at least the way I feel about that.0:07:06
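The wiring just described, and the half-adder box built from it, might be sketched like this (a reconstruction of what's on the slides):

```scheme
;; Declaring wires and wiring up the gates directly:
(define a (make-wire))
(define b (make-wire))
(define c (make-wire))
(define d (make-wire))
(define e (make-wire))
(define s (make-wire))

(or-gate a b d)
(and-gate a b c)
(inverter c e)
(and-gate d e s)

;; The same wiring, boxed up as an abstraction.  A, B, S, and C
;; come through the walls of the box; D and E are the internal
;; interconnect made up inside.
(define (half-adder a b s c)
  (let ((d (make-wire)) (e (make-wire)))
    (or-gate a b d)
    (and-gate a b c)
    (inverter c e)
    (and-gate d e s)
    'ok))
```

The compound half-adder is used exactly like a primitive gate, which is what makes the language hierarchical in the right way.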
So here we have--here, instead of starting with mathematical functions, or things that compute mathematical functions, which is what we've been doing up until now, instead of starting with things that look like0:07:15
mathematical functions, or compute such things, we are starting with things that are electrical objects and we build up more electrical objects. And the glue we're using is basically the0:07:26
Lisp structure: lambdas. Lambda is the ultimate glue, if you will. And of course, half adder itself can be used in a more0:07:39
complicated abstraction called a full adder, which in fact involves two half adders, as you see here, hooked together with some extra wires, that you see here, S, C1, and C2,0:07:50
and an or-gate, to manufacture a full adder, which takes an input number, another input number, a carry in, and0:08:01
produces output, a sum and a carry out. And out of full adders, you can make real adder chains and big adders.0:08:12
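The full adder just described, sketched in the same style:

```scheme
;; A full adder built from two half adders and an or-gate,
;; with internal wires S, C1, and C2.
(define (full-adder a b c-in sum c-out)
  (let ((s (make-wire)) (c1 (make-wire)) (c2 (make-wire)))
    (half-adder b c-in s c1)
    (half-adder a s sum c2)
    (or-gate c1 c2 c-out)
    'ok))
```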
So we have here a language so far that has primitives, means of combination, and means of abstraction--a real language.0:08:22
Now, how are we going to implement this? Well, let's do it easily. Let's look at the primitives. The only problem is we have to implement the primitives.0:08:31
Nothing else has to be implemented, because we're picking up the means of combination and abstraction from Lisp, inheriting them in the embedding.0:08:43
OK, so let's look at a particular primitive. An inverter is a nice one. Now, inverter has two wires coming in, an in and an out.0:08:57
And somehow, it's going to have to know what to do when a signal comes in. So somehow it's going to have to tell its input wire--0:09:07
and now we're going to talk about objects and we're going to see this in a little more detail soon-- but it's going to have to tell its input wire that when you0:09:16
change, tell me. So this object, the object which is the inverter has to tell the object which is the input wire,0:09:25
hi, my name is George. And my job is to do something when you change. So when you get a change, tell me about it.0:09:34
Because I've got to do something with that. Well, that's done down here by adding an action on the input wire called invert-in, where invert-in is defined over here0:09:47
to be a procedure of no arguments, which gets the logical not of the signal on the input wire.0:09:56
And after some delay, which is the inverter delay, all these electrical objects have delays, we'll do the following thing-- set the signal on the output wire to the new value.0:10:10
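That inverter, as a sketch of the code on the slide:

```scheme
(define (inverter input output)
  (define (invert-in)              ; called whenever INPUT changes
    (let ((new-value (logical-not (get-signal input))))
      (after-delay inverter-delay  ; all electrical objects have delays
                   (lambda ()
                     (set-signal! output new-value)))))
  (add-action! input invert-in)    ; "when you change, tell me"
  'ok)

;; Logical not, for the particular representation 1 / 0:
(define (logical-not s)
  (cond ((= s 0) 1)
        ((= s 1) 0)
        (else (error "Invalid signal" s))))
```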
A very simple program. Now, you have to imagine that the output wire has to be sensitive and know that when its signal changes, it may0:10:19
have to tell other guys, hey, wake up. My value has changed. So when you hook together inverter with an and-gate or0:10:29
something like that, there has to be a lot of communication going on in order to make sure that the signal propagates right. And down here is nothing very exciting.0:10:38
This is just the definition of logical not for some particular representations of the logical values-- 1, 0 in this case. And we can look at things more complicated like and-gates.0:10:49
And-gates take two inputs, A1 and A2, we'll call them, and produce an output. But the structure of the and-gate is identical to the0:10:59
one we just saw. There's one called an and-action procedure that's defined, which is the thing that gets called when an input0:11:08
is changed. And what it does, of course, is nothing more than compute the logical and of the signals on the inputs. And after some delay, called the and-gate delay, calls this0:11:20
procedure, which sets a signal on the output to a new value. Now, how I implement these things is all wishful thinking. As you see here, I have an assignment operation.0:11:32
It's not set. It's a derived assignment operation in the same way we had functions that were derived from CAR and CDR. So0:11:41
I, by convention, label that with an exclamation point. And over here, you see there's an action, which is to inform0:11:50
the wire, called A1 locally in this and-gate, to call the and-action procedure when it gets changed, and the wire A20:12:00
to call the and-action procedure when it gets changed. All very simple.0:12:09
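The and-gate, whose structure is identical to the inverter's (the helper `logical-and` is my assumption for "compute the logical and of the signals"):

```scheme
(define (and-gate a1 a2 output)
  (define (and-action-procedure)   ; called when either input changes
    (let ((new-value
           (logical-and (get-signal a1) (get-signal a2))))
      (after-delay and-gate-delay
                   (lambda ()
                     (set-signal! output new-value)))))
  (add-action! a1 and-action-procedure)  ; inform wire A1
  (add-action! a2 and-action-procedure)  ; inform wire A2
  'ok)

;; Assumed helper: logical and on the 1 / 0 representation.
(define (logical-and s1 s2)
  (if (and (= s1 1) (= s2 1)) 1 0))
```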
Well, let's talk a little bit about this communication that must occur between these various parts.0:12:18
Suppose, for example, I have a very simple circuit which contains an and with wires A and B. And that connects0:12:34
through a wire called C to an inverter which has a wire output called D. What are the comput...--here's0:12:46
the physical world. It's an abstraction of the physical world. Now, I can build this out of little pieces that you get at Radio Shack for a few cents. And there are boxes that act like this, which have little0:12:57
numbers on them like LS04 or something. Now supposing I were to try to say what's the0:13:06
computational model. What is the thing that corresponds to that, that part of reality in the mind of us and in the computer?0:13:15
Well, I have to assign for every object in the world an object in the computer, and for every relationship in the world between them a relationship in the computer.0:13:25
That's my goal. So let's do that. Well, I have some sort of thing called the signal, A.0:13:35
This is A. It's a signal. It's a cloudy thing like that. And I have another one down here which I'm going to call B. It's another signal.0:13:49
Now this signal--these two signals are somehow going to have to hook together into a box, let's call it this, which is the and-gate, action procedure.0:14:00
That's the and-gate's action procedure. And it's going to produce--well, it's going to0:14:09
interact with a signal object, which we call C--a wire0:14:18
object, excuse me, we call C. And then the-- this is going to put out again, or connect to, another action procedure which is one associated with the inverter0:14:28
in the world, not. And I'm going to have another--another wire, which0:14:39
we'll call D. So here's my layout of stuff. Now we have to say what's inside them and what they have to know to compute.0:14:51
Well, every--every one of these wires has to know what the value of the signal that's on that wire is. So there's going to be some variable inside here, we'll call it signal.0:15:02
And he owns a value. So there must be some environment associated with this. And for each one of these, there must be an environment that binds signal.0:15:15
And there must be a signal here, therefore. And presumably, signal is a value that's either 1 or 0.0:15:28
Now, we also have to have some list of people to inform if the signal here changes. We're going to have to inform this.0:15:39
So I've got that list. We'll call it the Action Procedures, AP. And it's presumably a list. But the first thing on the list, in this case, is this guy.0:15:50
And the action procedures of this one happens to have some list of stuff. There might be other people who are sharing A, who are looking at it.0:15:59
So there might be other guys on this list, like somebody over there that we don't know about. It's the other guy attached to A. And the action procedure here also has to point to that, the0:16:11
list of action procedures. And of course, that means this one, its action procedures has to point up to here. This is the things-- the people it has to inform.0:16:21
And this guy has some too. But I don't know what they are because I didn't draw it in my diagram. It's the things connected to D.0:16:30
Now, it's also the case that when the and-action procedure is awakened--when one of the people you've0:16:41
told, wake me up if your signal changes, does so--you have to go look and ask them what their signal is, so you can do the and, and produce a0:16:51
signal for this one. So there has to be, for example, information here saying A1, my A1 is this guy, and my A2 is this guy.0:17:08
And not only that, when I do my and, I'm going to have to tell this guy something. So I need an output--0:17:19
being this guy. And similarly, this guy's going to have a thing called0:17:29
the input that he interrogates to find out what the value of the signal on the input is, when the signal wakes up and0:17:39
says, I've changed, and sends a message this way saying, I've changed. This guy says, OK, what's your value now? When he gets that value, then he's going to have to say, OK,0:17:50
output changes this guy, changes this guy.0:18:00
And so on. And so I have to have at least that much connected-ness. Now, let's go back and look, for example, at the and-gate.0:18:10
Here we are back on this slide. And we can see some of these parts. For any particular and-gate, there is an A1, there is an A2, and the output.0:18:21
And those produce a frame, an environment that was created at the time and-gate was0:18:30
called--a frame where A1, A2, and output have as their values, are bound to, the wires which0:18:41
were passed in. In that environment, I constructed a procedure--0:18:50
this one right there. And-action procedure was constructed in that environment. That was the result of evaluating a lambda0:19:00
expression. So it hangs onto the frame where these were defined. Local--part of its local state is that.0:19:11
The and-action procedure, therefore, has access to A1, A2, and output as we see here. A1, A2, and output.0:19:22
Now, we haven't looked inside of a wire yet. That's all that remains. Let's look at a wire.0:19:33
Like the overhead, very good. Well, the wire, again, is a, is a somewhat complicated mess.0:19:43
Ooh, wrong one. It's a big complicated mess, like that. But let's look at it in detail and see what's going on.0:19:54
Well, the wire is one of these. And it has to have two things that are part of it, that it's state.0:20:05
One of them is the signal we see here. In other words, when we call make-wire to make a wire, then the first thing we do is we create some variables which0:20:15
are the signal and the action procedures for this wire. And in that context, we define various functions--or0:20:26
procedures, excuse me, procedures. One of them is called set-my-signal to a new value. And what that does is takes a new value in.0:20:37
If that's equal to my current value of my signal, I'm done. Otherwise, I set the signal to the new value and call each of the action procedures that I've been, that I've0:20:47
been--what's the right word?-- introduced to. I get introduced when the and-gate was applied to me.0:21:04
I add an action procedure at the bottom. Also, I have to define a way of accepting an action procedure-- which is what you see here-- which increments my action procedures using set! to the0:21:18
result of CONSing up a new procedure, which is passed to me, onto my action-procedures list. And for technical reasons, I have to call that procedure once.0:21:27
So I'm not going to tell you anything about that, that has to do with event-driven simulations and getting them started, which takes a little bit of thinking.0:21:36
And finally, I'm going to define a thing called the dispatcher, which is a way of passing a message to a wire,0:21:45
which is going to be used to extract from it various information, like what is the current signal value? What is the method of setting your signal?0:21:57
I want to get that out of it. How do I--how do I add another action procedure? And I'm going to return that dispatch, that0:22:08
procedure as a value. So the wire that I've constructed is a message-accepting object which accepts a message like, what's your method of adding action procedures?0:22:19
In fact, it'll give me a procedure, which is the add-action procedure, which I can then apply to an action procedure to install another action procedure in the wire.0:22:31
So that's a permission. So it's given me permission to change your action procedures. And in fact, you can see that over here.0:22:41
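Putting the pieces of the wire together, a sketch along the lines of the slide:

```scheme
(define (make-wire)
  (let ((signal 0)                   ; the wire's current value
        (action-procedures '()))     ; whom to inform on a change
    (define (set-my-signal! new-value)
      (if (= signal new-value)
          'done
          (begin (set! signal new-value)
                 (call-each action-procedures))))
    (define (accept-action-procedure! proc)
      (set! action-procedures (cons proc action-procedures))
      (proc))                        ; call it once, to get things started
    (define (dispatch m)             ; message-accepting object
      (cond ((eq? m 'get-signal) signal)
            ((eq? m 'set-signal!) set-my-signal!)
            ((eq? m 'add-action!) accept-action-procedure!)
            (else (error "Unknown operation -- WIRE" m))))
    dispatch))
```

The value returned is the dispatch procedure itself, so a "wire" is just a procedure you send messages to.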
Next slide. Ah. This is nothing very interesting. Calling each of the action procedures is just CDRing0:22:52
down a list. And I'm not going to even talk about that anymore. We're too advanced for that. However, if I want to get a signal from a wire, I ask the wire--0:23:02
which is, what is the wire? The wire is the dispatch returned by creating the wire. It's a procedure. I call that dispatch on the message get-signal.0:23:12
And what I should expect to get is a method of getting a signal. Or actually, I get the signal. If I want to set a signal, I want to change a signal, then0:23:25
what I'm going to do is take a wire as an argument and a new value for the signal, I'm going to ask the wire for permission to set its signal and use that permission, which0:23:35
is a procedure, on the new value. And if we go back to the overhead here, thank you, if0:23:44
we go back to the overhead here, we see that the method-- if I ask for the method of setting the signal, that's over here, it's set-my-signal, a procedure that's defined0:23:54
inside the wire, which if we look over here is the thing that says set my internal value called the signal, my internal variable, which is the signal, to the new value,0:24:08
which is passed to me as an argument, and then call each of the action procedures waking them up. Very simple.0:24:19
Going back to that slide, we also have the one last thing-- which I suppose now you can easily work out for yourself-- is the way you add an action.0:24:30
You take a wire--a wire and an action procedure. And I ask the wire for permission to add an action.0:24:40
Getting that permission, I use that permission to give it an action procedure. So that's a real object. There's a few more details about this.0:24:52
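Those permissions, written as procedures (with the uninteresting call-each included for completeness):

```scheme
;; Ask the wire for a method (a permission), then use it.
(define (get-signal wire)
  (wire 'get-signal))

(define (set-signal! wire new-value)
  ((wire 'set-signal!) new-value))

(define (add-action! wire action-procedure)
  ((wire 'add-action!) action-procedure))

;; Calling each action procedure is just CDRing down a list.
(define (call-each procedures)
  (if (null? procedures)
      'done
      (begin ((car procedures))
             (call-each (cdr procedures)))))
```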
For example, how am I going to control this thing? How do I do these delays?0:25:01
Let's look at that for a second. The next one here. Let's see. We know when we looked at the and-gate or the not-gate that0:25:15
when a signal changed on the input, there was a delay. And then it was going to call the procedure, which was going to change the output.0:25:26
Well, how are we going to do this? We're going to make up some mechanism, a fairly complicated mechanism at that, which we're going to have to be very careful about. But after a delay, we're going to do an action.0:25:37
A delay is a number, and an action is a procedure. What that's going to mean is there's going to be a special structure called an agenda, which is a thing that0:25:47
organizes time and actions. And we're going to see that in a while. I don't want to get into that right now. But the agenda has a moment at which--at which something happens.0:25:59
We're setting up for later at some moment, which is the sum of the time, which is the delay time plus the current time, which the agenda thinks is now.0:26:08
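That setup step, as a sketch (assuming a global agenda object, here called the-agenda):

```scheme
;; Schedule ACTION to happen DELAY time units after the agenda's
;; current notion of now.
(define (after-delay delay action)
  (add-to-agenda! (+ delay (current-time the-agenda))
                  action
                  the-agenda))
```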
We're going to set up to do this action, and add that to the agenda. And the way this machine will now run is very simple.0:26:18
We have a thing called propagate, which is the way things run. If the agenda is empty, we're done--if there's nothing more to be done.0:26:27
Otherwise, we're going to take the first item off the agenda, and that's a procedure of no arguments. So we're going to see extra parentheses here.0:26:36
We call that on no arguments. That takes the action. Then we remove that first item from the agenda, and we go0:26:45
around the propagation loop. So that's the overall structure of this thing. Now, there's a few other things we can look at.0:26:57
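That propagation loop, in code:

```scheme
(define (propagate)
  (if (empty-agenda? the-agenda)
      'done                                  ; nothing more to be done
      (begin
        ((first-agenda-item the-agenda))     ; extra parens: run the action
        (remove-first-agenda-item! the-agenda)
        (propagate))))
```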
And then we're going to look into the agenda a little while from now. Now the overhead again. Well, in order to set this thing going, I just want to show you some behavior out of this simulator.0:27:07
By the way, you may think this simulator is very simple, and probably too simple to be useful. The fact of the matter is that this simulator has been used to manufacture a fairly large computer.0:27:18
So this is a real live example. Actually, not exactly this simulator, because I'll tell you the difference. The difference is that there were many more different kinds0:27:28
of primitives. There's not just the word inverter or and-gate. There were things like edge-triggered, flip-flops,0:27:37
and latches, transparent latches, and adders, and things like that. And the difficulty with that is that there's pages and0:27:48
pages of the definitions of all these primitives with numbers like LS04. And then there's many more parameters for them. It's not just one delay.0:27:58
There's things like set up times and hold times and all that. But with the exception of that part of the complexity, the structure of the simulator that we use for building a0:28:07
real computer, that works is exactly what you're seeing here. Well in any case, what we have here is a few simple things.0:28:19
Like, there's inverter delays being set up and making a new agenda. And then we can make some inputs. There's input-1, input-2, a sum and a0:28:28
carry, which are wires. I'm going to put a special kind of object called a probe onto, onto some of the wires, onto sum and onto carry.0:28:37
A probe is an object that has the property that when you change a wire it's attached to, it types out a message.0:28:46
It's an easy thing to do. And then once we have that, of course, the way you put the probe on, the first thing it does, it says, the current value of the sum at time 0 is 0 because I just noticed it.0:28:59
And the value of the carry at time 0, this is the time, is 0. And then we go off and we build some structure.0:29:09
Like, we can build a structure here that says you have a half-adder on input-1, input-2, sum, and carry.0:29:18
And we're going to set the signal on input-1 to 1. We do some propagation. At time 8, which you could see going through this thing if you wanted to, the new value of sum became 1.0:29:29
And the thing says I'm done. That wasn't very interesting. But we can send it some more signals. Like, we set-signal on input-2 to be one. And at that time if we propagate, then it carried at0:29:39
11, the carry becomes 1, and at 16, the sum's new value becomes 0. And you might want to work out that, if you like, about the0:29:48
digital circuitry. It's true, and it works. And it's not very interesting. But that's the kind of behavior we get out of this thing.0:30:01
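A session like the one on the overhead might look roughly like this. The particular delay values and the probe interface are my assumptions about what the slide shows:

```scheme
;; Set up the simulation world.
(define the-agenda (make-agenda))
(define inverter-delay 2)
(define and-gate-delay 3)
(define or-gate-delay 5)

;; Make some inputs: input-1, input-2, a sum, and a carry.
(define input-1 (make-wire))
(define input-2 (make-wire))
(define sum (make-wire))
(define carry (make-wire))

;; Attach probes; each reports immediately that its wire is 0 at time 0.
(probe 'sum sum)
(probe 'carry carry)

;; Build some structure and drive it.
(half-adder input-1 input-2 sum carry)
(set-signal! input-1 1)
(propagate)    ; the probe reports sum becoming 1 at time 8
```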
So what I've shown you right now is a large-scale picture, how you, at a big scale, implement an0:30:10
event-driven simulation of some sort. And how you might organize it to have nice hierarchical structure allowing you to build abstract boxes that you0:30:20
can instantiate. But I haven't told you any of the details about how this agenda and things like that work. That we'll do next. And that's going to involve change and mutation of data0:30:32
and things like that. Are there any questions now, before I go on?0:30:47
Thank you. Let's take a break.0:31:28
Well, we've been making a simulation. And the simulation is an event-driven simulation where0:31:39
the objects in the world are the objects in the computer. And the changes of state that are happening in the world in time are organized to be time in the computer, so that if0:31:53
something happens after something else in the world, then the corresponding events happen in the same order in the computer.0:32:04
That's where we have assignments, when we make that alignment. Right now I want to show you a way of organizing time, which is an agenda or priority queue, it's sometimes called.0:32:16
We'll do some--we'll do a little bit of just understanding what are the things we need to be able to do to make agendas.0:32:28
And so we're going to have--and so right now over here, I'm going to write down a bunch of primitive operations for manipulating agendas. I'm not going to show you the code for them because they're0:32:38
all very simple, and you've got listings of all that anyway. So what do we have? We have things like make-agenda which produces a0:32:52
new agenda. We can ask--we get the current-time of an agenda,0:33:10
which gives me a number, a time. We can get--we can ask whether an agenda is empty,0:33:20
empty-agenda.0:33:30
And that produces either a true or a false.0:33:42
We can add an object to an agenda.0:33:52
Actually, what we add to an agenda is an operation--an action to be done. And that takes a time, the action itself, and the agenda0:34:03
I want to add it to. That inserts it in the appropriate place in the agenda. I can get the first item off an agenda, the first thing I0:34:14
have to do, which is going to give me an action.0:34:26
And I can remove the first item from an agenda. That's what I have to be able to do with agendas. That is a big complicated mess.0:34:42
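Those six primitives form the whole agenda interface. As a rough sketch of how they might fit together, in Python rather than the course's Scheme, with the representation and all names being my own choices and not the course code:

```python
import heapq
from collections import deque

class Agenda:
    """A sketch of the agenda: pending actions grouped by time."""

    def __init__(self):                 # make-agenda
        self.current_time = 0           # current-time of the agenda
        self._segments = {}             # time -> queue of actions at that time
        self._times = []                # heap of pending segment times

    def is_empty(self):                 # the empty test: true or false
        return not self._times

    def add(self, time, action):        # add an action, at a time, to the agenda
        if time not in self._segments:
            self._segments[time] = deque()
            heapq.heappush(self._times, time)
        self._segments[time].append(action)

    def first_item(self):               # first thing to do: gives an action
        t = self._times[0]
        self.current_time = t           # the clock advances to that segment
        return self._segments[t][0]

    def remove_first_item(self):        # remove the first item from the agenda
        t = self._times[0]
        queue = self._segments[t]
        queue.popleft()
        if not queue:                   # segment exhausted, drop it
            del self._segments[t]
            heapq.heappop(self._times)
```

A propagate loop would then just take the first item, run it, remove it, and repeat until the agenda is empty.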
Well, let's see how we can organize this thing as a data structure a bit.0:34:52
Well, an agenda is going to be some kind of list. And it's going to be a list that I'm going to have to be able to modify.0:35:01
So we have to talk about modifying lists, because I'm going to add things to it, and delete things from it, and things like that.0:35:11
It's organized by time. It's probably good to keep it in sorted order. But sometimes there are lots of things that happen at the0:35:22
same time--approximately the same time. What I have to do is group things by the time at which they're supposed to happen. So I'm going to make an agenda as a list of segments.0:35:32
And so I'm going to draw you a data structure for an agenda, a perfectly reasonable one. Here's an agenda.0:35:41
It's a thing that begins with a name. I'm going to do it right now out of list structure.0:35:52
It's got a header. There's a reason for the header. We're going to see the reason soon. And it will have a segment.0:36:03
It will be a list of segments. Supposing this agenda has two segments, they're the0:36:13
successive CARs of this list. Each segment is going to have a time--0:36:24
say for example, 10-- that says that the things that happen in this segment are at time 10.0:36:33
And what I'm going to have in here is another data structure which I'm not going to describe, which is a queue of things to do at time 10.0:36:42
It's a queue. And we'll talk about that in a second. But abstractly, the queue is just a list of things to do at a particular time. And I can add things to a queue.0:36:53
This is a queue. There's a time, there's a segment.0:37:02
Now, I may have another segment in this agenda. Supposing this is stuff that happens at time 30.0:37:13
It has, of course, another queue of things that are queued up to be done at time 30.0:37:23
Well, there are various things I have to be able to do to an agenda. Supposing I want to add to an agenda another thing to be done at time 10.0:37:33
Well, that's not very hard. I'm going to walk down here, looking for the segment of time 10. It is possible that there is no segment of time 10.0:37:42
We'll cover that case in a second. But if I find a segment of time 10, then if I want to add another thing to be done at time 10, I just0:37:51
increase that queue-- "just increase" isn't such an obvious idea. But I increase the things to be done at that time.0:38:01
Now, supposing I want to add something to be done at time 20. There is no segment for time 20. I'm going to have to create a new segment.0:38:11
I want my time 20 segment to exist between time 10 and time 30. Well, that takes a little work.0:38:20
I'm going to have to do a CONS. I'm going to have to make a new element of the agenda list--list of segments.0:38:33
I'm going to have to change. Here's change. I'm going to have to change the CDR of the CDR of the0:38:42
agenda to point at a new CONS of the new segment and the CDR of the CDR of the CDR of the agenda, the CDDDR.0:38:56
And this is going to have a new segment now of time 20 with its own queue, which now has one element in it.0:39:10
If I wanted to add something at the end, I'm going to have to replace the CDR of this list with something.0:39:20
We're going to have to change that piece of data structure. So I'm going to need new primitives for doing this. But I'm just showing you why I need them.0:39:29
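The CDR-rewiring just described--walking down the segments and splicing a new time-20 segment in between 10 and 30, with the header cell giving a place to mutate even at the front--can be sketched like this, with Python two-element lists standing in for mutable pairs (a hypothetical transcription, not the course code):

```python
def cons(a, d): return [a, d]     # a mutable two-part cell
def car(p): return p[0]
def cdr(p): return p[1]
def set_cdr(p, v): p[1] = v       # like set-cdr!

def make_segment(time):
    return cons(time, [])         # a (time . queue) pair; queue starts empty

def insert_segment(agenda, time):
    """Splice a new segment for a new time into the sorted segment list.
    The agenda starts with a header cell, so inserting at the very front
    is the same CDR change as inserting anywhere else."""
    prev = agenda                 # the header gives us a place to change
    while cdr(prev) is not None and car(car(cdr(prev))) < time:
        prev = cdr(prev)          # walk down, looking for where time fits
    segment = make_segment(time)
    # the CONS plus the CDR change: splice the segment in
    set_cdr(prev, cons(segment, cdr(prev)))
    return segment
```

With segments at times 10 and 30 already in place, inserting at 20 lands between them, and inserting at 5 lands at the front, via the header.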
And finally, if I wanted to add a thing to be done at time 5, I'm going to have to change this one, because I'm going to0:39:41
have to add it in over here, which is why I planned ahead and had a header cell. If I'm going to change things, I have to have0:39:50
a place to make the change. If I remove things from the agenda, that's not so hard.0:40:02
Removing them from the beginning is pretty easy, which is the only case I have. I can go looking for the first segment.0:40:11
I see if it has a non-empty queue. If it has a non-empty queue, well, I'm going to delete one element from the queue, like that.0:40:20
If the queue ever becomes empty, then I have to delete the whole segment. And then this changes to point to here. So it's quite a complicated data structure manipulation0:40:30
going on, the details of which are not really very exciting. Now, let's talk about queues. They're similar.0:40:41
Because each of these agendas has a queue in it. What's a queue? A queue is going to have the following primitive0:40:51
operations. To make a queue, this gives me a new queue.0:41:07
I'm going to have to be able to insert into a queue a new item.0:41:24
I'm going to have to be able to delete from a queue the first item in the queue.0:41:39
And I want to be able to get the first thing in the queue0:41:51
from some queue. I also have to be able to test whether a queue is empty.0:42:07
And when you invent things like this, I want you to be very careful to use the kinds of conventions I use for naming things. Notice that I'm careful to say these change something and0:42:18
that tests it. And presumably, I did the same thing over here. OK, and there should be an empty test over here.0:42:29
OK, well, how would I make a queue? A queue wants to be something I can add to at the end of, and pick up the thing at the beginning of. I should be able to delete from the beginning0:42:39
and add to the end. Well, I'm going to show you a very simple structure for that. We can make this out of CONSes as well. Here's a queue.0:42:49
It has a queue header, which contains two parts-- a front pointer and a rear pointer.0:43:02
And here I have a queue with two items in it. The first item, I don't know, it's perhaps a 1.0:43:12
And the second item, I don't know, let's give it a 2.0:43:21
The reason why I want two pointers in here, a front pointer and a rear pointer, is so I can add to the end without having to chase down from the beginning.0:43:31
So for example, if I wanted to add one more item to this queue, if I want to add on another item to be worried0:43:40
about later, all I have to do is make a CONS, which contains that item, say a 3. That's for inserting 3 into the queue.0:43:51
Then I have to change this pointer here to here.0:44:00
And I have to change this one to point to the new rear.0:44:09
If I wish to take the first element of the queue, the first item, I just go chasing down the front pointer until I find the first one and pick it up.0:44:18
If I wish to delete the first item from the queue, delete-queue, all I do is move the front pointer along this way.0:44:27
The new front of the queue is now this. So queues are very simple too. So what you see now is that I need a certain number of new0:44:39
primitive operations. And I'm going to give them some names. And then we're going to look into how they work, and how they're used. We can set the CAR of some pair, or a thing produced by0:44:56
CONSing, to a new value. And set the CDR of a pair to a new value.0:45:12
I needed setting the CAR over here to delete the first element of the queue. This is the CAR, and I had to set it.0:45:23
I had to be able to set the CDR to be able to move the rear pointer, or to be able to increment the queue here. All of the operations I did were made out of those that I0:45:33
just showed you on the last blackboard. Good. Let's pause now, and take a little break then.0:46:38
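Before moving on: the queue with a front pointer and a rear pointer, built out of nothing but pairs and those two mutators, might be sketched as follows (Python lists standing in for pairs; the names are my own, not necessarily SICP's):

```python
def cons(a, d): return [a, d]     # a mutable two-part cell
def car(p): return p[0]
def cdr(p): return p[1]
def set_car(p, v): p[0] = v       # like set-car!
def set_cdr(p, v): p[1] = v       # like set-cdr!

def make_queue():
    return cons(None, None)       # header: (front-pointer . rear-pointer)

def empty_queue(q):
    return car(q) is None         # empty when the front pointer is empty

def insert_queue(q, item):
    """Add at the rear without chasing down from the beginning."""
    cell = cons(item, None)
    if empty_queue(q):
        set_car(q, cell)          # front pointer -> the new cell
        set_cdr(q, cell)          # rear pointer -> the new cell
    else:
        set_cdr(cdr(q), cell)     # old rear's CDR -> the new cell
        set_cdr(q, cell)          # rear pointer -> the new rear

def front_queue(q):
    return car(car(q))            # the item the front pointer points at

def delete_queue(q):
    set_car(q, cdr(car(q)))       # just move the front pointer along
```

Insertion at the rear never walks the list, and deletion just moves the front pointer, exactly as described.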
When we originally introduced pairs made out of CONS, made by CONS, we only said a few axioms about them, which were0:46:48
of the form-- what were they-- for all X and Y, the CAR of the CONS of X and Y is X and0:47:06
the CDR of the CONS of X and Y is Y. Now, these say nothing0:47:15
about whether a CONS has an identity like a person. In fact, all they say is something sort of abstract,0:47:25
that a CONS is the parts it's made out of. And of course, if two things are made out of the same parts, they're the same, at least from the point of view of0:47:34
these axioms. But by introducing assignment-- in fact, mutable data is a kind of assignment, we have a0:47:43
set CAR and a set CDR-- by introducing those, these axioms no longer tell the whole story. And they're still true if written exactly like this.0:47:53
But they don't tell the whole story. Because if I'm going to set a particular CAR in a particular CONS, the questions are, well, is that setting all CARs and0:48:05
all CONSes of the same two things or not? If we use CONSes to make up things like rational numbers, things like 3 over 4, supposing I had two0:48:19
three-fourths. Are they the same one-- or are they different? Well, in the case of numbers, it doesn't matter. Because there's no meaning to changing the0:48:29
denominator of a number. What you could do is make a number which has a different denominator. But the concept of changing a number so that it has a0:48:38
different denominator is very weird, and sort of not supported by what you think of as mathematics. However, when these CONSes represent things in the physical world, then changing something like the CAR is like0:48:50
removing a piece of the fingernail. And so CONSes have an identity. Let me show you what I mean about identity, first of all.0:49:01
Let's do a little example here. Supposing I define A to be the CONS of 1 and 2.0:49:18
Well, what that means, first of all, is that somewhere in some environment I've made a symbol A to have a value which0:49:27
is a pair consisting of a pointer to a 1 and a pointer to a 2, just like that.0:49:38
Now, supposing I also say define B to be the CONS--0:49:53
it doesn't matter, but I like it better, it's prettier-- of A and A.0:50:03
Well, first of all, I'm using the name A twice. At this moment, I'm going to think of CONSes as having identity. This is the same one.0:50:13
And so what that means is I make another pair, which I'm going to call B. And it contains two pointers to A. At0:50:29
this point, I have three names for this object. A is its name. The CAR of B is its name. And the CDR of B is its name.0:50:39
It has several aliases, they're called. Now, supposing I do something like set-car!-- set the CAR of0:51:01
the CAR of B to 3.0:51:12
What that means is I find the CAR of B, that's this. I set the CAR of that to be 3, changing this.0:51:24
I've changed A. If I were to ask what's the CAR of A now?0:51:35
I would get out 3, even though here we see that A was the CONS of 1 and 2.0:51:45
I caused A to change by changing B. There is sharing here. That's sometimes what we want.0:51:54
Surely in the queues and things like that, that's exactly what we organized our data structures to facilitate-- sharing.0:52:04
But inadvertent sharing, unanticipated interactions between objects, is the source of most of the bugs that occur in complicated programs. So by introducing this possibility0:52:17
of things having identity and sharing and having multiple names for the same thing, we get a lot of power. But we're going to pay for it with lots of0:52:27
complexity and bugs. So, for example, just to drive that home, look at the CADR of B, which apparently has nothing to do0:52:43
with even the CAR of B. The CADR of B, what's that? Take the CDR of B and now take the CAR of that.0:52:53
Oh, that's 3 also. So I can have non-local interactions by sharing. And I have to be very careful of that.0:53:06
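That whole A-and-B episode can be replayed with the same pair-as-mutable-cell sketch (Python lists standing in for pairs; a translation of the blackboard example, not the course code):

```python
def cons(a, d): return [a, d]   # a mutable two-part cell
def car(p): return p[0]
def cdr(p): return p[1]
def set_car(p, v): p[0] = v     # like set-car!

a = cons(1, 2)
b = cons(a, a)                  # the CAR of b and the CDR of b are both a

set_car(car(b), 3)              # change through one of the aliases...

assert car(a) == 3              # ...and A has changed
assert car(cdr(b)) == 3         # the CADR of b is 3 as well, by sharing
```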
Well, so far, of course, it seems I've introduced several different assignment operators-- set, set CAR, set CDR. Well, maybe I should just get rid of0:53:19
set CAR and set CDR. Maybe they're not worthwhile. Well, the answer is that once you let the camel's nose into the tent, the rest of him follows.0:53:30
All I have to have is set, and I can make all of the bad things happen. Let's play with that a little bit.0:53:40
A couple of days ago, when we introduced compound data, you saw Hal show you a definition of CONS in terms0:53:49
of a message acceptor. I'm going to show you even a more horrible thing, a definition of CONS in terms of nothing but air, hot air.0:54:04
What is the definition of CONS, of the old functional kind, in terms of nothing but lambda expressions,0:54:13
procedures? Because I'm going to then modify this definition to get assignment to be only one kind of assignment, to get rid of0:54:25
the set CAR and set CDR in terms of set. So what if I define CONS of X and Y to be a procedure of one0:54:41
argument called a message M, which calls that message on X and Y?0:54:51
This idea was invented by Alonzo Church, who was the greatest programmer of the 20th century, although he never saw a computer. It was done in the 1930s. He was a logician, I suppose at Princeton at the time.0:55:08
Define CAR of X to be the result of applying X to that procedure of two arguments, A and D, which selects A. I will0:55:24
define CDR of X to be that procedure, to be the result of0:55:36
applying X to that procedure of A and D, which selects D.0:55:46
Now, you may not recognize this as CAR, CDR, and CONS. But I'm going to demonstrate to you that it satisfies the original axioms, just once.0:55:55
And then we're going to do some playing of games. Consider the problem CAR of CONS of, say, 35 and 47.0:56:09
Well, what is that? It is the result of taking car of the result of substituting 35 and 47 for X and Y in the body of this.0:56:19
Well, that's easy enough. That's CAR of the result of substituting into lambda of M, M of 35 and 47.0:56:35
Well, what this is, is the result of substituting this object for X in the body of that. So that's just lambda of M--0:56:48
that's substituted, because this object is being substituted for X, which is the beginning of a list, lambda of M--0:56:57
M of 35 and 47, applied to that procedure of A and D,0:57:07
which gives me A. Well, that's the result of substituting this for M here. So that's the same thing as lambda of A, D, A,0:57:22
applied to 35 and 47. Oh, well, that's 35. That's substituting 35 for A and 47 for D in A. So I0:57:36
don't need any data at all, not even numbers. This is Alonzo Church's hack.0:57:52
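Church's functional CONS translates almost word for word into any language with lambda. A minimal Python sketch of the same hack:

```python
def cons(x, y):
    # a "pair" is a procedure of one argument m, which calls m on x and y
    return lambda m: m(x, y)

def car(p):
    return p(lambda a, d: a)    # apply the pair to a selector of the first

def cdr(p):
    return p(lambda a, d: d)    # ...or a selector of the second

# the axioms hold: CAR of CONS of 35 and 47 is 35
print(car(cons(35, 47)))        # -> 35
print(cdr(cons(35, 47)))        # -> 47
```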
Well, now we're going to do something nasty to him. Being a logician, he wouldn't like this. But as programmers, let's look at the overhead.0:58:03
And here we go. I'm going to change the definition of CONS. It's almost the same as Alonzo Church's, but not quite.0:58:14
What do we have here? The CONS of two arguments, X and Y, is going to be that procedure of one argument M, which applies M to X and Y as0:58:25
before, but also to two permissions, the permission to set X to N and the permission to set Y to N, given that I0:58:35
have an N. So besides the things that I had here in Church's0:58:44
definition, what I have is that the thing that CONS returns will apply its argument to not just the0:58:55
values of the X and Y that the CONS is made of, but also permissions to set X and Y to new values.0:59:06
Now, of course, just as before, CAR is exactly the same. The CAR of X is nothing more than applying X, as in Church's definition, to a procedure, in this case, of0:59:18
four arguments, which selects out the first one. And just as we did before, that will be the value of X0:59:28
that was contained in the procedure which is the result of evaluating this lambda expression in the environment where X and Y are defined over here.0:59:41
That's the value of CONS. Now, however, the exciting part. CDR, of course, is the same. The exciting part, set CAR and set CDR. Well, they're nothing0:59:54
very complicated anymore. Set CAR of a CONS X to a new value Y is nothing more than applying that CONS, which is the1:00:06
procedure of one argument which applies its argument to four things, to a procedure of four arguments--1:00:15
the value of X, the value of Y, the permission to set X, the permission to set Y-- and using that permission to set X to the new value.1:00:31
And similarly, set-cdr is the same thing. So what you've just seen is that I didn't introduce any new primitives at all.1:00:40
Whether or not I want to implement it this way is a matter of engineering. And the answer is of course I don't implement it this way for reasons that have to do with engineering.1:00:51
However in principle, logically, once I introduced one assignment operator, I've assigned--I've introduced them all.1:01:05
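The modified CONS, with its two permissions, also translates directly; in a Python sketch, the permissions are closures that reassign the enclosed X and Y (via nonlocal). Names are my own, not the course code:

```python
def cons(x, y):
    # permissions: closures that may reassign the enclosed x and y
    def set_x(n):
        nonlocal x
        x = n
    def set_y(n):
        nonlocal y
        y = n
    # the pair applies its message to the values *and* the two permissions
    return lambda m: m(x, y, set_x, set_y)

def car(p):
    return p(lambda a, d, sa, sd: a)    # select the first value, as before

def cdr(p):
    return p(lambda a, d, sa, sd: d)

def set_car(p, n):
    p(lambda a, d, sa, sd: sa(n))       # use the permission to set x

def set_cdr(p, n):
    p(lambda a, d, sa, sd: sd(n))       # use the permission to set y
```

Once one assignment operator (here, plain reassignment of a closed-over variable) is in the language, the pair mutators come for free.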
Are there any questions? Yes, David. AUDIENCE: I can follow you up until you get--I can follow1:01:14
all of that. But when we bring in the permissions, defining CONS in terms of the lambda N, I don't follow where N gets passed.1:01:24
PROFESSOR: Oh, I'm sorry. I'll show you. Let's follow it. Of course, we could do it on the blackboard. It's not so hard. But it's also easy here. Supposing I wish to set-cdr of X to Y. See that right there.1:01:38
set-cdr of X to Y. X is presumably a CONS, a thing resulting from evaluating CONS. Therefore X comes from a place over here, that that X is of1:01:54
the result of evaluating this lambda expression. Right? That when I evaluated that lambda expression, I evaluated1:02:04
it in an environment where the arguments to CONS were defined. That means that as free variables in this lambda1:02:14
expression, there is the--there are in the frame, which is the parent frame of this lambda expression, the1:02:23
procedure resulting from this lambda expression, X and Y have places. And it's possible to set them. I set them to an N, which is the argument of the1:02:35
permission. The permission is a procedure which is passed to M, which is the argument that the CONS object gets passed.1:02:47
Now, let's go back here to the set-cdr. The CONS object, which is the first argument of set-cdr,1:02:56
gets passed an argument. That's a procedure of four things, indeed, because that's the same thing as this M over here, which is applied1:03:05
to four objects. The object over here, SD, is, in fact, this permission.1:03:15
When I use SD, I apply it to Y, right there. So that comes from this.1:03:25
AUDIENCE: So what do you-- PROFESSOR: So to finish that, the N that was here is the Y which is here.1:03:34
How's that? AUDIENCE: Right, OK. Now, when you do a set-cdr, X is the value the CDR is going to become. PROFESSOR: The X over here.1:03:44
I'm sorry, that's not true. The X is--set-cdr has two arguments-- The CONS I'm changing and the value I'm changing it to.1:03:56
So you have them backwards, that's all. Are there any other questions?1:04:07
Well, thank you. It's time for lunch.0:00:00
Lecture 6A | MIT 6.001 Structure and Interpretation, 1986
PROFESSOR: Well, last time Gerry really let the cat out of the bag. He introduced the idea of assignment. Assignment and state.0:00:37
And as we started to see, the implications of introducing assignment and state into the language are absolutely frightening. First of all, the substitution model of0:00:47
evaluation breaks down. And we have to use this much more complicated environment model and this very mechanistic thing with diagrams, even to say what statements in the programming0:00:56
language mean. And that's not a mere technical point. See, it's not that we had this particular substitution model and, well, it doesn't quite work, so we have to do0:01:05
something else. It's that nothing like the substitution model can work. Because suddenly, a variable is not just something that0:01:15
stands for a value. A variable now has to somehow specify a place that holds a value. And the value that's in that place can change.0:01:30
Or for instance, an expression like f of x might have a side0:01:39
effect in it. So if we say f of x and it has some value, and then later we say f of x again, we might get a different value0:01:48
depending on the order. So suddenly, we have to think not only about values but about time.0:01:57
And then things like pairs are no longer just their CARs and their CDRs. A pair now is not quite its CAR and its CDR. It's rather0:02:06
its identity. So a pair has identity. It's an object.0:02:21
And two pairs that have the same CAR and CDR might be the same or different, because suddenly we have to worry about sharing.0:02:34
So all of these things enter as soon as we introduce assignment. See, this is a really far cry from where we started with0:02:43
substitution. It's a technically harder way of looking at things because we have to think more mechanistically about our0:02:52
programming language. We can't just think about it as mathematics. It's philosophically harder, because suddenly there are all these funny issues about what does it mean that something0:03:02
changes or that two things are the same. And also, it's programming harder, because as Gerry showed last time, there are all these bugs having to do with bad sequencing and aliasing that just don't exist0:03:14
in a language where we don't worry about objects. Well, how'd we get into this mess?0:03:23
Remember what we did, the reason we got into this is because we were looking to build modular systems. We0:03:35
wanted to build systems that fall apart into chunks that seem natural. So for instance, we want to take a random number generator0:03:46
and package up the state of that random number generator inside of it so that we can separate the idea of picking random numbers from the general Monte Carlo strategy0:03:56
of estimating something and separate that from the particular way that you work with random numbers in that formula developed by Cesaro for pi.0:04:06
And similarly, when we go off and construct some models of things, if we go off and model a system that we see in the0:04:15
real world, we'd like our program to break into natural pieces, pieces that mirror the parts of the system that we see in the real world.0:04:24
So for example, if we look at a digital circuit, we say, gee, there's a circuit and it has a piece and0:04:33
it has another piece. And these different pieces sort of have identity.0:04:43
They have state. And the state sits on these wires. And we think of this piece as an object that's different from that as an object.0:04:52
And when we watch the system change, we think about a signal coming in here and changing a state that might be here and going here and interacting with a state that might be stored there, and so on and so on.0:05:06
So what we'd like is we'd like to build in the computer systems that fall into pieces that mirror our view of0:05:17
reality, of the way that the actual systems we're modeling seem to fall into pieces. Well, maybe the reason that building systems like this0:05:28
seems to introduce such technical complications has nothing to do with computers. See, maybe the real reason that we pay such a price to0:05:37
write programs that mirror our view of reality is that we have the wrong view of reality. See, maybe time is just an illusion, and0:05:47
nothing ever changes. See, for example, if I take this chalk, and we say, gee, this is an object and it has a state. At each moment it has a position and a velocity.0:05:59
And if we do something, that state can change. But if you studied any relativity, for instance, you know that you don't think of the path of that chalk as0:06:09
something that goes on instant by instant. It's more insightful to think of that whole chalk's existence as a path in space-time that's all splayed out.0:06:18
There aren't individual positions and velocities. There's just its unchanging existence in space-time. Similarly, if we look at this electrical system, if we0:06:28
imagine this electrical system is implementing some sort of signal processing system, the signal processing engineer who put that thing together doesn't think of it as, well,0:06:39
at each instance there's a voltage coming in. And that translates into something. And that affects the state over here, which changes the state over here. Nobody putting together a signal processing system0:06:49
thinks about it like that. Instead, you say there's this signal that's splayed out over time.0:06:58
And if this is acting as a filter, this whole thing transforms this whole thing for some sort of other output.0:07:09
You don't think of it as what's happening instant by instant as the state of these things. And somehow you think of this box as a whole thing, not as little pieces sending messages of state to each other at0:07:20
particular instants. Well, today we're going to look at another way to0:07:30
decompose systems that's more like the signal processing engineer's view of the world than it is like thinking about objects that communicate sending messages.0:07:41
That's called stream processing.0:07:54
And we're going to start by showing how we can make our programs more uniform and see a lot more commonality if we0:08:08
throw out of these programs what you might say is an inordinate concern with worrying about time.0:08:17
Let me start by comparing two procedures. The first one does this. We imagine that there's a tree.0:08:30
Say there's a tree of integers. It's a binary tree.0:08:39
So it looks like this. And there's integers in each of the nodes. And what we would like to compute is for each odd number0:08:51
sitting here, we'd like to find the square and then sum up all those squares. Well, that should be a familiar kind of thing. There's a recursive strategy for doing it.0:09:02
We look at each leaf, and either it's going to contribute the square of the number if it's odd or 0 if it's even. And then recursively, we can say at each tree, the sum of0:09:13
all of them is the sum coming from the right branch and the left branch, and recursively down through the nodes. And that's a familiar way of thinking about programming. Let's actually look at that on the slide.0:09:23
We say to sum the odd squares in a tree, well, there's a test. Either it's a leaf node, and we're going to check to see if it's an integer, and then either it's odd, in which case0:09:34
we take the square, or else it's 0. And then the sum of the whole thing is the sum coming from the left branch and the right branch.0:09:46
OK, well, let me contrast that with a second problem. Suppose I give you an integer n, and then some function to0:09:55
compute of each of the first n integers, 1 through n. And then I want to collect together in a list all those function values that satisfy some property.0:10:05
That's a general kind of thing. Let's say to be specific, let's imagine that for each integer, k, we're going to compute the k-th Fibonacci number.0:10:14
And then we'll see which of those are odd and assemble those into a list. So here's a procedure that does that.0:10:23
Find the odd Fibonacci numbers among the first n. And here is a standard loop the way we've been writing it. This is a recursion. It's a loop on k, and says if k is bigger than n, it's the0:10:33
empty list. Otherwise we compute the k-th Fibonacci number, call that f. If it's odd, we CONS it on to the list starting0:10:45
with the next one. And otherwise, we just take the next one. And this is the standard way we've been writing iterative loops. And we start off calling that loop with 1.0:10:57
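For concreteness, here is a rough Python transcription of the two procedures--binary trees as pairs of subtrees, leaves as integers, and an iterative fib; the details are my own, not the lecture's Scheme:

```python
def fib(n):
    """The n-th Fibonacci number, counting fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def sum_odd_squares(tree):
    """Leaf test, odd test, square; the recursion does the enumerating."""
    if isinstance(tree, int):                     # leaf node
        return tree * tree if tree % 2 == 1 else 0
    left, right = tree                            # a binary tree node
    return sum_odd_squares(left) + sum_odd_squares(right)

def odd_fibs(n):
    """Loop on k from 1 to n, keeping the odd Fibonacci numbers."""
    result = []
    for k in range(1, n + 1):
        f = fib(k)
        if f % 2 == 1:
            result.append(f)
    return result
```

The two look structurally very different, even though, as the signal-flow view below argues, they are doing much the same thing.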
OK, so there are two procedures. Those procedures look very different. They have very different structures. Yet from a certain point of view, those procedures are0:11:07
really doing very much the same thing. So if I was talking like a signal processing engineer, what I might say is that the first procedure enumerates the0:11:25
leaves of a tree. And then we can think of a signal coming out of that, which is all the leaves.0:11:35
We'll filter them to see which ones are odd, put them through some kind of filter.0:11:45
We'll then put them through a kind of transducer. And for each one of those things, we'll take the square.0:11:54
And then we'll accumulate all of those. We'll accumulate them by sticking them together with addition starting from 0.0:12:07
That's the first program. The second program, I can describe in a very, very similar way. I'll say, we'll enumerate the numbers on this interval, for0:12:17
the interval 1 through n. We'll, for each one, compute the Fibonacci number, put them0:12:28
through a transducer. We'll then take the result of that, and we'll filter it for oddness. And then we'll take those and put them into an accumulator.0:12:39
This time we'll build up a list, so we'll accumulate with CONS starting from the empty list. So this way of looking at the program makes the two seem0:12:50
very, very similar. The problem is that that commonality is completely obscured when we look at the procedures we wrote. Let's go back and look at sum-odd-squares again, and say0:13:02
things like, where's the enumerator? Where's the enumerator in this program? Well, it's not in one place.0:13:11
It's a little bit in this leaf-node test, which is going to stop. It's a little bit in the recursive structure of the thing itself.0:13:23
Where's the accumulator? The accumulator isn't in one place either. It's partly in this 0 and partly in this plus.0:13:32
It's not there as a thing that we can look at. Similarly, if we look at odd Fibs, that's also, in some sense, an enumerator and an accumulator, but0:13:42
it looks very different. Because partly, the enumerator is here in this greater than sign in the test. And partly it's in this whole recursive0:13:52
structure in the loop, and the way that we call it. And then similarly, that's also mixed up in there with the accumulator, which is partly over there and partly0:14:01
over there. So these very, very natural pieces, these very natural boxes here don't appear in our programs. Because they're kind0:14:13
of mixed up. The programs don't chop things up in the right way. Going back to this fundamental principle of computer science0:14:22
that in order to control something, you need the name of it, we don't really have control over thinking about things this way because we don't have our hands in them explicitly.0:14:31
We don't have a good language for talking about them. Well, let's invent an appropriate language in which0:14:42
we can build these pieces. The key to the language is these guys, is what is these things I called signals? What are these things that are flying on the0:14:52
arrows between the boxes? Well, those things are going to be data structures called0:15:02
streams. That's going to be the key to inventing this language. What's a stream? Well, a stream is, like anything else, a data abstraction.0:15:12
So I should tell you what its selectors and constructors are. For a stream, we're going to have one constructor that's called CONS-stream.0:15:25
CONS-stream is going to put two things together to form a thing called a stream. And then to extract things from the stream, we're going0:15:34
to have a selector called the head of the stream. So if I have a stream, I can take its head or I can take its tail.0:15:44
And remember, I have to tell you George's contract here to tell you what the axioms are that relate these.0:15:53
And it's going to be for any x and y, if I form the0:16:04
CONS-stream and take the head, the head of CONS-stream of x and y is going to be x and the tail of CONS-stream of x and y0:16:26
is going to be y. So those are the constructor, two selectors for streams, and an axiom. There's something fishy here.0:16:36
So you might notice that these are exactly the axioms for CONS, CAR, and CDR. If instead of writing CONS-stream I wrote0:16:46
CONS and I said head was the CAR and tail was the CDR, those are exactly the axioms for pairs. And in fact, there's another thing here.0:16:55
We're going to have a thing called the-empty-stream, which is like the-empty-list.0:17:08
So why am I introducing this terminology? Why don't I just keep talking about pairs and lists? Well, we'll see. For now, if you like, why don't you just pretend that0:17:18
streams really are just a terminology for lists. And we'll see in a little while why we want to keep this extra abstraction layer and not just call them lists.0:17:32
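Taking the lecturer's suggestion to pretend, for now, that streams are just pairs, the contract can be sketched in a few lines. This is Python rather than the lecture's Scheme (a translation made here for illustration; the names mirror CONS-stream, head, and tail):

```python
# Streams as a data abstraction: one constructor, two selectors, and axioms.
# For now, as the lecture suggests, a stream is just an ordinary pair,
# so these behave exactly like CONS, CAR, and CDR.

the_empty_stream = ()          # plays the role of the-empty-stream

def cons_stream(x, y):
    """Constructor: glue x and y together into a stream."""
    return (x, y)

def head(s):
    """Selector: the first part of the pair (like CAR)."""
    return s[0]

def tail(s):
    """Selector: the rest (like CDR)."""
    return s[1]

# The axioms: for any x and y,
#   head(cons_stream(x, y)) is x
#   tail(cons_stream(x, y)) is y
```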
OK, now that we have streams, we can start constructing the pieces of the language to operate on streams. And there are a whole bunch of very useful things that we could0:17:41
start making. For instance, we'll make our map box to take a stream, s,0:17:54
and a procedure, and to generate a new stream which has as its elements the procedure applied to all the0:18:03
successive elements of s. In fact, we've seen this before. This is the procedure map that we did with lists. And you see it's exactly map, except we're testing for0:18:14
empty-stream. Oh, I forgot to mention that. Empty-stream is like the null test. So if it's empty, we generate the empty stream. Otherwise, we form a new stream whose first element is0:18:24
the procedure applied to the head of the stream, and whose rest is gotten by mapping along with the procedure down the tail of the stream.0:18:33
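That map over streams can be sketched as follows, again in Python standing in for the lecture's Scheme, with streams as plain pairs per the pretend-they're-lists view from earlier:

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def map_stream(proc, s):
    """A new stream whose elements are proc applied to each element of s."""
    if is_empty_stream(s):
        return the_empty_stream
    # new head: proc of the head; new rest: mapping proc down the tail
    return cons_stream(proc(head(s)),
                       map_stream(proc, tail(s)))
```

For instance, mapping squaring down the stream 1, 2, 3 gives the stream 1, 4, 9.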
So that looks exactly like the map procedure we looked at before. Here's another useful thing. Filter, this is our filter box. We're going to have a predicate and a stream.0:18:43
We're going to make a new stream that consists of all the elements of the original one that satisfy the predicate. That's case analysis. When there's nothing in the stream, we0:18:53
return the empty stream. We test the predicate on the head of the stream. And if it's true, we add the head of the stream onto the0:19:03
result of filtering the tail of the stream. And otherwise, if that predicate was false, we just filter the tail of the stream.0:19:13
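The filter just described, as the same kind of Python sketch (streams as plain pairs for now):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def filter_stream(pred, s):
    """The stream of elements of s that satisfy pred, by case analysis."""
    if is_empty_stream(s):
        return the_empty_stream
    if pred(head(s)):
        # keep the head, add it onto the filtered tail
        return cons_stream(head(s), filter_stream(pred, tail(s)))
    # otherwise just filter the tail
    return filter_stream(pred, tail(s))
```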
Right, so there's filter. Let me run through a couple more rather quickly. They're all in the book and you can look at them. Let me just flash through.0:19:22
Here's accumulate. Accumulate takes a way of combining things, an initial value, and a stream, and sticks them all together.0:19:31
If the stream's empty, it's just the initial value. Otherwise, we combine the head of the stream with the result of accumulating the tail of the stream starting from the initial value.0:19:40
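Accumulate, in the same Python sketch (streams as plain pairs for now):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def accumulate(combiner, init, s):
    """Combine all the elements of s, starting from init."""
    if is_empty_stream(s):
        return init
    # combine the head with the result of accumulating the tail
    return combiner(head(s),
                    accumulate(combiner, init, tail(s)))
```

So to add up everything in a stream, accumulate with addition starting from 0.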
So that's what I'd use to add up everything in the stream. I'd accumulate with plus. How would I enumerate the leaves of a tree? Well, if the tree is just a leaf itself, I make something0:19:54
which only has that node in it. Otherwise, I append together the stuff of enumerating the left branch and the right branch.0:20:04
And then append here is like the ordinary append on lists.0:20:13
You can look at that. That's analogous to the ordinary procedure for appending two lists. How would I enumerate an interval? This will take two integers, low and high, and generate a0:20:24
stream of the integers going from low to high. And we can make a whole bunch of pieces. So that's a little language of talking about streams. Once we0:20:34
have streams, we can build things for manipulating them. Again, we're making a language. And now we can start expressing things in this language.0:20:43
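The enumerators and append can be sketched the same way. Here a tree is represented as a nested Python list, an assumption made for illustration, not the lecture's representation:

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def append_streams(s1, s2):
    """Analogous to the ordinary procedure for appending two lists."""
    if is_empty_stream(s1):
        return s2
    return cons_stream(head(s1), append_streams(tail(s1), s2))

def enumerate_tree(tree):
    """The stream of leaves of a tree (here, a nested Python list)."""
    if not isinstance(tree, list):
        # a leaf: a stream with only that node in it
        return cons_stream(tree, the_empty_stream)
    # otherwise append together the enumerations of the branches
    result = the_empty_stream
    for branch in reversed(tree):
        result = append_streams(enumerate_tree(branch), result)
    return result

def enumerate_interval(low, high):
    """The stream of the integers going from low to high."""
    if low > high:
        return the_empty_stream
    return cons_stream(low, enumerate_interval(low + 1, high))
```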
Here's our original procedure for summing the odd squares in a tree. And you'll notice it looks exactly now like the block0:20:52
diagram, like the signal processing block diagram. So to sum the odd squares in a tree, we enumerate the leaves of the tree.0:21:01
We filter that for oddness. We map that for squareness. And we accumulate the result of that using addition,0:21:12
starting from 0. So we can see the pieces that we wanted. Similarly, the Fibonacci one, how do we get the odd Fibs?0:21:22
Well, we enumerate the interval from 1 to n, we map along that, computing the Fibonacci of each one. We filter the result of those for oddness.0:21:34
And we accumulate all of that stuff using CONS starting from the empty-list.0:21:43
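Putting the pieces together, the two programs just described can be sketched end to end (Python, streams as plain pairs; fib here is an ordinary iterative Fibonacci supplied for the example):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def map_stream(proc, s):
    if is_empty_stream(s): return the_empty_stream
    return cons_stream(proc(head(s)), map_stream(proc, tail(s)))

def filter_stream(pred, s):
    if is_empty_stream(s): return the_empty_stream
    if pred(head(s)):
        return cons_stream(head(s), filter_stream(pred, tail(s)))
    return filter_stream(pred, tail(s))

def accumulate(combiner, init, s):
    if is_empty_stream(s): return init
    return combiner(head(s), accumulate(combiner, init, tail(s)))

def append_streams(s1, s2):
    if is_empty_stream(s1): return s2
    return cons_stream(head(s1), append_streams(tail(s1), s2))

def enumerate_tree(tree):
    if not isinstance(tree, list):
        return cons_stream(tree, the_empty_stream)
    result = the_empty_stream
    for branch in reversed(tree):
        result = append_streams(enumerate_tree(branch), result)
    return result

def enumerate_interval(low, high):
    if low > high: return the_empty_stream
    return cons_stream(low, enumerate_interval(low + 1, high))

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def sum_odd_squares(tree):
    # enumerate the leaves, filter for oddness, map squaring, accumulate with +
    return accumulate(lambda a, b: a + b, 0,
                      map_stream(lambda x: x * x,
                                 filter_stream(lambda x: x % 2 == 1,
                                               enumerate_tree(tree))))

def odd_fibs(n):
    # enumerate 1..n, map fib, filter for oddness, accumulate with cons
    return accumulate(cons_stream, the_empty_stream,
                      filter_stream(lambda x: x % 2 == 1,
                                    map_stream(fib, enumerate_interval(1, n))))
```

Each procedure now reads exactly like its signal-processing block diagram: an enumerator feeding a filter feeding a map feeding an accumulator.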
OK, what's the advantage of this? Well, for one thing, we now have pieces that we can start mixing and matching. So for instance, if I wanted to change this, if I wanted to0:21:58
compute the squares of the integers and then filter them, all I need to do is pick up a standard piece like this in that square and put it in. Or if we wanted to do this whole Fibonacci computation on0:22:10
the leaves of a tree rather than a sequence, all I need to do is replace this enumerator with that one. See, the advantage of this stream processing is that0:22:20
we're establishing-- this is one of the big themes of the course-- we're establishing conventional interfaces that0:22:35
allow us to glue things together. Things like map and filter are a standard set of components that we can start using for pasting together programs in all sorts of ways.0:22:45
It allows us to see the commonality of programs. I just ought to mention, I've only showed you two procedures. But let me emphasize that this way of putting things together0:22:57
with maps, filters, and accumulators is very, very general. It's the generate and test paradigm for programs. And as0:23:08
an example of that, Richard Waters, who was at MIT when he was a graduate student, as part of his thesis research went and analyzed a large chunk of the IBM scientific0:23:17
subroutine library, and discovered that about 60% of the programs in it could be expressed exactly,0:23:26
using no more than what we've put here-- map, filter, and accumulate. All right, let's take a break.0:23:36
Questions? AUDIENCE: It seems like the essence of this whole thing is just that you have a very uniform, simple data structure0:23:45
to work with, the stream. PROFESSOR: Right. The essence is that you, again, it's this sense of conventional interfaces. So you can start putting a lot of things together.0:23:55
And the stream is as you say, the uniform data structure that supports that. This is very much like APL, by the way. APL is very much the same idea, except in APL, instead0:24:06
of this stream, you have arrays and vectors. And a lot of the power of APL comes from exactly the same place as the power of this.0:24:19
OK, thank you. Let's take a break.0:24:57
All right. We've been looking at ways of organizing computations using streams. What I want to do now is just show you two somewhat0:25:07
more complicated examples of that. Let's start by thinking about the following kind of utility procedure that will come in useful.0:25:16
Suppose I've got a stream. And the elements of this stream are themselves streams. So the first thing might be 1, 2, 3.0:25:32
So I've got a stream. And each element of the stream is itself a stream. And what I'd like to do is build a stream that collects0:25:45
together all of the elements, pulls all of the elements out of these sub-streams and strings them all together in one thing. So just to show you the use of this language, how easy it is,0:25:56
call that flatten. And I can define flatten of this stream of streams. Well,0:26:13
what is that? That's just an accumulation. I want to accumulate using append, by0:26:25
successively appending. So I accumulate using append-streams, starting with0:26:36
the-empty-stream down that stream of streams.0:26:54
OK, so there's an example of how you can start using these higher order things to do some interesting operations. In fact, there's another useful thing0:27:04
that I want to do. I want to define a procedure called flat-map, flat map of0:27:18
some function and a stream. And what this is going to do is this: s will be a stream of elements, and f is going to be a function that for each element in the0:27:28
stream produces another stream. And what I want to do is take all of the elements and all of those streams and combine them together. So that's just going to be the flatten of map f down s.0:27:51
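Flatten and flat-map, as the same kind of Python sketch (streams as plain pairs for now):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def map_stream(proc, s):
    if is_empty_stream(s): return the_empty_stream
    return cons_stream(proc(head(s)), map_stream(proc, tail(s)))

def accumulate(combiner, init, s):
    if is_empty_stream(s): return init
    return combiner(head(s), accumulate(combiner, init, tail(s)))

def append_streams(s1, s2):
    if is_empty_stream(s1): return s2
    return cons_stream(head(s1), append_streams(tail(s1), s2))

def enumerate_interval(low, high):
    if low > high: return the_empty_stream
    return cons_stream(low, enumerate_interval(low + 1, high))

def flatten(stream_of_streams):
    """String the sub-streams together: just an accumulation with append."""
    return accumulate(append_streams, the_empty_stream, stream_of_streams)

def flat_map(f, s):
    """f maps each element of s to a stream; combine all those streams."""
    return flatten(map_stream(f, s))
```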
Each time I apply f to an element of s, I get a stream. If I map it all the way down, I get a stream of streams, and I'll flatten that. Well, I want to use that to show you a new way to do a0:28:04
familiar kind of problem. The problem's going to be like a lot of problems you've seen, although maybe not this particular one.0:28:14
I'm going to give you an integer, n. And the problem is going to be find all pairs of integers i0:28:31
and j, with j less than i, and i running up to n, such0:28:42
that i plus j is prime.0:28:55
So for example, if n equals 6, let's make a little table here, i and j and i plus j.0:29:09
So for, say, i equals 2 and j equals 1, I'd get 3. And for i equals 3, I could have j equals 2, and that0:29:18
would be 5. And 4 and 1 would be 5 and so on, up until i goes to 6.0:29:28
And what I'd like to return is to produce a stream of all the triples like this, let's say i, j, and i plus j.0:29:37
So for each n, I want to generate this stream. OK, well, that's easy. Let's build it up.0:29:47
We start like this. We're going to say for each i, we're going to generate a stream.0:29:56
For each i in the interval 1 through n, we're going to generate a stream. What's that stream going to be? We're going to start by generating all the pairs. So for each i, we're going to generate, for each j in the0:30:11
interval 1 to i minus 1, we'll generate the pair, or the list with two elements i and j.0:30:23
So we map along the interval, generating the pairs. And for each i, that generates a stream of pairs.0:30:33
And we flatmap it. Now we have all the pairs i and j, such that j is less than i. So that builds that. Now we've got to test them.0:30:42
Well, we take that thing we just built, the flatmap, and we filter it to see whether the i-- see, we had an i and a j.0:30:51
i was the first thing in the list, j was the second thing in the list. So we have a predicate which says in that list of two elements is the sum of the0:31:00
CAR and the CADR prime. And we filter that collection of pairs we just built. So those are the pairs we want.0:31:09
Now we go ahead and we take the result of that filter and we map along it, generating the list i and j and i plus j.0:31:19
And that's our procedure prime-sum-pairs. And then just to flash it up, here's the whole procedure. A map, a filter, a flatmap.0:31:34
There's the whole thing, even though this isn't particularly readable. It's just expanding that flatmap. So there's an example which illustrates the general point0:31:45
that nested loops in this procedure start looking like compositions of flatmaps of flatmaps of flatmaps of maps and things.0:31:54
So not only can we enumerate individual things, but by using flatmaps, we can do what would correspond to nested loops in most other languages.0:32:03
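A sketch of the whole prime-sum-pairs pipeline in the same Python rendering (is_prime is a naive trial-division test supplied for illustration):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def map_stream(proc, s):
    if is_empty_stream(s): return the_empty_stream
    return cons_stream(proc(head(s)), map_stream(proc, tail(s)))

def filter_stream(pred, s):
    if is_empty_stream(s): return the_empty_stream
    if pred(head(s)):
        return cons_stream(head(s), filter_stream(pred, tail(s)))
    return filter_stream(pred, tail(s))

def accumulate(combiner, init, s):
    if is_empty_stream(s): return init
    return combiner(head(s), accumulate(combiner, init, tail(s)))

def append_streams(s1, s2):
    if is_empty_stream(s1): return s2
    return cons_stream(head(s1), append_streams(tail(s1), s2))

def enumerate_interval(low, high):
    if low > high: return the_empty_stream
    return cons_stream(low, enumerate_interval(low + 1, high))

def flatten(sos): return accumulate(append_streams, the_empty_stream, sos)
def flat_map(f, s): return flatten(map_stream(f, s))

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_sum_pairs(n):
    # the nested loop becomes a flatmap of a map: all pairs [i, j] with j < i
    pairs = flat_map(
        lambda i: map_stream(lambda j: [i, j], enumerate_interval(1, i - 1)),
        enumerate_interval(1, n))
    # keep the pairs whose sum is prime
    prime_pairs = filter_stream(lambda p: is_prime(p[0] + p[1]), pairs)
    # generate the triples i, j, i + j
    return map_stream(lambda p: [p[0], p[1], p[0] + p[1]], prime_pairs)
```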
Of course, it's pretty awful to keep writing these flatmaps of flatmaps of flatmaps. Prime-sum-pairs you saw looked fairly complicated, even0:32:13
though the individual pieces were easy. So what you can do, if you like, is introduce some syntactic sugar that's called collect. And collect is just an abbreviation for that nest of0:32:23
flatmaps and filters arranged in that particular way. Here's prime-sum-pairs again, written using collect. It says to find all those pairs, I'm going to collect0:32:32
together a result, which is the list i, j, and i plus j, that's going to be generated as i runs through the interval0:32:44
from 1 to n and as j runs through the interval from 1 to i minus 1, such that i plus j is prime.0:32:58
So I'm not going to say what collect does in general. You can look at that by looking at it in the book. But pretty much, you can see that the pieces of this are the pieces of that original procedure I wrote.0:33:08
And this collect is just some syntactic sugar for automatically generating that nest of flatmaps and filters. OK, well, let me do one more example that shows you the0:33:21
same kind of thing. Here's a very famous problem that's used to illustrate a lot of so-called backtracking computer algorithms. This is the eight queens problem.0:33:30
This is a chess board. And the eight queens problem says, find a way to put down eight queens on a chess board so that no two are attacking each other. And here's a particular solution to the0:33:39
eight queens problem. So I have to make sure to put down queens so that no two are in the same row or the same column or sit0:33:48
along the same diagonal. Now, there's sort of a standard way of doing that.0:33:59
Well, first we need to do is below the surface, at George's level. We have to find some way to represent a board, and represent positions.0:34:08
And we'll not worry about that. But let's assume that there's a predicate called safe. And what safe is going to do is going to say given that I0:34:19
have a bunch of queens down on the chess board, is it OK to put a queen in this particular spot? So safe is going to take a row and a column.0:34:32
That's going to be a place where I'm going to try and put down the next queen, and the rest of positions.0:34:45
And what safe will say is given that I already have queens down in these positions, is it safe to put another queen down in that row and that column?0:34:58
And let's not worry about that. That's George's problem. And it's not hard to write. You just have to check whether this thing contains any things on that row or that column or in that diagonal.0:35:10
Now, how would you organize the program given that? And there's sort of a traditional way to organize it called backtracking.0:35:20
And it says, well, let's think about all the ways of putting the first queen down in the first column.0:35:31
There are eight ways. Well, let's say try the first column. Try column 1, row 1. These branches are going to represent the possibilities at0:35:41
each level. So I'll try and put a queen down in the first column. And now given that it's in the first column, I'll try and put the next queen down in the first column.0:35:53
I'll try and put the first queen, the one in the first column, down in the first row. I'm sorry. And then given that, we'll put the next queen down in the first row. And that's no good.0:36:02
So I'll back up to here. And I'll say, oh, can I put the first queen down in the second row? Well, that's no good. Oh, can I put it down in the third row? Well, that's good.0:36:12
Well, now can I put the next queen down in the first column? Well, I can't visualize this chess board anymore, but I think that's right. And I try the next one. And at each place, I go as far down this tree as I can.0:36:24
And I back up. If I get down to here and find no possibilities below there, I back all the way up to here, and now start again generating this sub-tree.0:36:33
And I sort of walk around. And finally, if I ever manage to get all the way down, I've found a solution. So that's a typical sort of paradigm that's used a lot in0:36:45
AI programming. It's called backtracking search.0:36:57
And it's really unnecessary. You saw me get confused when I was visualizing this thing.0:37:06
And you see the complication. This is a complicated thing to say. Why is it complicated? It's because somehow this program is inordinately0:37:16
concerned with time. It's too much-- I try this one, and I try this one, and I go back to the last possibility. And that's a complicated thing. If I stop worrying about time so much, then there's a much0:37:28
simpler way to describe this. It says, let's imagine that I have in my hands the tree down0:37:40
to k minus 1 levels. See, suppose I had in my hands all possible ways to put down0:37:50
queens in the first k minus 1 columns. Suppose I just had that. Let's not worry about how we get it. Well, then, how do I extend that?0:37:59
How do I find all possible ways to put down queens in the next column? It's really easy. For each of these positions I have, I think about putting0:38:12
down a queen in each row to make the next thing. And then for each one I put down, I filter those by the ones that are safe.0:38:22
So instead of thinking about this tree as generated step by step, suppose I had it all there. And to extend it from level k minus 1 to level k, I just0:38:32
need to extend each thing in all possible ways and only keep the ones that are safe. And that will give me the tree to level k. And that's a recursive strategy for solving the eight0:38:41
queens problem. All right, well, let's look at it.0:38:50
To solve the eight queens problem on a board of some specified size, we write a sub-procedure called0:39:00
fill-columns. Fill-columns is going to put down queens up through column k. And here's the pattern of the recursion. I'm going to call fill-columns with the size eventually.0:39:12
So fill-columns says how to put down queens safely in the first k columns of this chess board with a size number of rows in it. If k is equal to 0, well, then I don't have to0:39:22
put anything down. So my solution is just an empty chess board. Otherwise, I'm going to do some stuff. And I'm going to use collect. And here's the collect.0:39:34
I find all ways to put down queens in the first k minus 1 columns. And this was just what I set for.0:39:43
Imagine I have this tree down to k minus 1 levels. And then I find all ways of trying a row, that's just each0:39:53
of the possible rows. They're size rows, so that's enumerate interval. And now what I do is I collect together the new row I'm going0:40:03
to try in column k with the rest of the queens. I adjoin a position. This is George's problem. Adjoin-position is like safe.0:40:13
It's a thing that takes a row and a column and the rest of the positions and makes a new position collection. So I adjoin a position of a new row and a new column to0:40:26
the rest of the queens, where the rest of the queens runs through all possible ways of solving the problem in k minus 1 columns. And the new row runs through all possible rows such that it0:40:39
was safe to put one there. And that's the whole program. There's the whole procedure.0:40:49
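The recursive strategy can be sketched as follows. This is Python, with list comprehensions standing in for collect, and with one representation choice made here, since safe and adjoin-position are left to George in the lecture: a partial solution is a list of row numbers, one per column.

```python
def queens(board_size):
    """All solutions to the n-queens problem, in the lecture's recursive style.

    A solution is represented (an illustrative choice, not the lecture's) as a
    list of row numbers: element i is the row of the queen in column i + 1.
    """

    def safe(new_row, new_col, rest_of_queens):
        # George's problem: no shared row and no shared diagonal.
        # (No shared column is guaranteed by construction: one row per column.)
        for col, row in enumerate(rest_of_queens, start=1):
            if row == new_row or abs(row - new_row) == abs(col - new_col):
                return False
        return True

    def fill_columns(k):
        if k == 0:
            return [[]]          # one way to fill zero columns: the empty board
        # "collect": for each way of solving k - 1 columns, and each row to
        # try in column k, keep the extensions that are safe
        return [rest_of_queens + [new_row]
                for rest_of_queens in fill_columns(k - 1)
                for new_row in range(1, board_size + 1)
                if safe(new_row, k, rest_of_queens)]

    return fill_columns(board_size)
```

When it is done, the result holds every solution at once, not just the first one found.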
Not only that, that doesn't just solve the eight queens problem, it gives you all solutions to the eight queens problem. When you're done, you have a stream.0:40:58
And the elements of that stream are all possible ways of solving that problem. Why is that simpler? Well, we threw away the whole idea that this is some process0:41:10
that happens in time with state. And we just said it's a whole collection of stuff. And that's why it's simpler. We've changed our view.0:41:20
Remember, that's where we started today. We've changed our view of what it is we're trying to model. We stop modeling things that evolve in time and have steps0:41:30
and have state. And instead, we're trying to model this global thing like the whole flight of the chalk, rather than its state at each instant.0:41:40
Any questions? AUDIENCE: It looks to me like backtracking would be searching for the first solution it can find, whereas0:41:49
this recursive search would be looking for all solutions. And it seems that if you have a large enough area to search,0:41:58
that the second is going to become impossible. PROFESSOR: OK, the answer to that question is the whole0:42:07
rest of this lecture. It's exactly the right question. And without trying to anticipate the lecture too much, you should start being suspicious at this point, and0:42:19
exactly those kinds of suspicions. It's wonderful, but isn't it so terribly inefficient? That's where we're going.0:42:28
So I won't answer now, but I'll answer later. OK, let's take a break.0:43:29
Well, by now you should be starting to get suspicious. See, I've shown you this simple, elegant way of putting0:43:41
programs together, very unlike these other traditional programs that sum the odd squares or compute the odd0:43:50
Fibonacci numbers. Very unlike these programs that mix up the enumerator and the filter and the accumulator.0:44:00
And by mixing it up, we don't have all of these wonderful conceptual advantages of these streams pieces, these wonderful mix and match components for putting0:44:09
together lots and lots of programs. On the other hand, most of the programs you've seen look like these ugly ones.0:44:18
Why's that? Can it possibly be that computer scientists are so obtuse that they don't notice that if you merely did this0:44:28
thing, then you can get this great programming elegance? There's got to be a catch. And it's actually pretty easy to see what the catch is.0:44:39
Let's think about the following problem. Suppose I tell you to find the second prime between 10,000 and 1 million, or if your computer's larger, say between0:44:51
10,000 and 100 billion, or something. And you say, oh, that's easy. I can do that with a stream. All I do is I enumerate the interval0:45:01
from 10,000 to 1 million. So I get all those integers from 10,000 to 1 million. I filter them for prime-ness, so test all of them and see if0:45:10
they're prime. And I take the second element. That's the head of the tail. Well, that's clearly pretty ridiculous.0:45:21
We wouldn't even have room in the machine to store the integers in the first place, much less to test them. And then I only want the second one. See, the power of this traditional programming style0:45:36
is exactly its weakness, that we're mixing up the enumerating and the testing and the accumulating.0:45:45
So we don't do it all. So the very thing that makes it conceptually ugly is the very thing that makes it efficient.0:45:55
It's this mixing up. So it seems that all I've done this morning so far is just confuse you. I showed you this wonderful way that programming might work, except that it doesn't.0:46:05
Well, here's where the wonderful thing happens. It turns out in this game that we really can have our cake and eat it too.0:46:14
And what I mean by that is that we really can write stream programs exactly like the ones I wrote and arrange0:46:24
things so that when the machine actually runs, it's as efficient as running this traditional programming style that mixes up the generation and the test.0:46:36
Well, that sounds pretty magic. The key to this is that streams are not lists.0:46:48
We'll see this carefully in a second, but for now, let's take a look at that slide again. The image you should have here of this signal processing system is that what's going to happen is there's this box0:47:00
that has the integers sitting in it. And there's this filter that's connected to it and it's tugging on them.0:47:10
And then there's someone who's tugging on this stuff saying what comes out of the filter. And the image you should have is that someone says, well,0:47:19
what's the first prime, and tugs on this filter. And the filter tugs on the integers.0:47:28
And you look only at that much, and then say, oh, I really wanted the second one. What's the second prime? And no computation gets done except when you tug on0:47:37
these things. Let me try that again. This is a little device. This is a little stream machine invented by Eric0:47:46
Grimson who's been teaching this course at MIT. And the image is here's a stream of stuff, like a whole bunch of the integers. And here's some processing elements.0:47:58
And if, say, it's filter of filter of map, or something. And if I really tried to implement that with streams as0:48:08
lists, what I'd say is, well, I've got this list of things, and now I do the first filter. So do all this processing. And I take this and I process and I process and I process0:48:18
and I process. And now I've got this new stream. Now I take that result in my hand someplace. And I put that through the second one. And I process the whole thing.0:48:28
And there's this new stream. And then I take the result and I put it all the way through this one the same way. That's what would happen to these stream programs if0:48:41
streams were just lists. But in fact, streams aren't lists, they're streams. And the image you should have is something a little bit more like this.0:48:50
I've got these gadgets connected up by this data that's flowing out of them.0:48:59
And here's my original source of the streams. It might be starting to generate the integers. And now, what happens if I want a result? I tug on the end here.0:49:10
And this element says, gee, I need some more data. So this one comes here and tugs on that one. And it says, gee, I need some more data. And this one tugs on this thing, which might be a0:49:19
filter, and says, gee, I need some more data. And only as much of this thing at the end here gets generated as I tugged. And only as much of this stuff goes through the processing0:49:28
units as I'm pulling on the end I need. That's the image you should have of the difference between implementing what we're actually going to do and if streams were lists.0:49:40
Well, how do we make this thing? I hope you have the image. The trick is how to make it. We want to arrange for a stream to be a data structure0:49:52
that computes itself incrementally, an on-demand data structure. And the basic idea is, again, one of the very basic ideas0:50:02
that we're seeing throughout the whole course. And that is that there's not a firm distinction between programs and data. So what a stream is going to be is simultaneously this data0:50:12
structure that you think of, like the stream of the leaves of this tree. But at the same time, it's going to be a very clever procedure that has the method of computing in it.0:50:23
Well, let me try this. It's going to turn out that we don't need any more mechanism. We already have everything we need simply from the fact that we know how to handle procedures0:50:32
as first-class objects. Well, let's go back to the key. The key is, remember, we had these operations. CONS-stream and head and tail.0:50:48
When I started, I said you can think about this as CONS and think about this as CAR and think about that as CDR, but it's not. Now, let's look at what they really are.0:50:57
Well, CONS-stream of x and y is going to be an abbreviation0:51:09
for the following thing.0:51:19
CONS, ordinary CONS, forms a pair of x and a thing called delay of y.0:51:31
And before I explain that, let me go and write the rest. The head of a stream is going to be just the CAR.0:51:42
And the tail of a stream is going to be a thing called force applied to the CDR of the stream.0:51:56
Now let me explain this. Delay is going to be a special magic thing. What delay does is take an expression and produce a0:52:06
promise to compute that expression when you ask for it. It doesn't do any computation here. It just gives you a rain check. It produces a promise.0:52:17
And CONS-stream says I'm going to put together in a pair x and a promise to compute y.0:52:28
Now, if I want the head, that's just the CAR that I put in the pair. And the key is that the tail is going to be-- force calls in that promise.0:52:39
Tail says, well, take that promise and now call in that promise. And then we compute that thing. That's how this is going to work.0:52:48
That's what CONS-stream, head, and tail really are. Now, let's see how this works. And we'll go through this fairly carefully.0:52:58
We're going to see how this works in this example of computing the second prime between 10,000 and a million.0:53:08
OK, so we start off and we have this expression. The second prime-- the head of the tail of the result of0:53:20
filtering for primality the integers between 10,000 and 1 million. Now, what is that? What that is, that interval between 10,000 and 1 million,0:53:35
well, if you trace through enumerate-interval, it builds a CONS-stream. And the CONS-stream is the CONS of 10,000 to a promise to0:53:45
compute the integers between 10,001 and 1 million.0:53:54
So that's what this expression is. Here I'm using the substitution model. And we can use the substitution model because we don't have side effects and state.0:54:04
So I have CONS of 10,000 to a promise to compute the rest of the integers. So only one integer, so far, got enumerated.0:54:14
Well, I'm going to filter that thing for primality. Again, you go back and look at the filter code. What the filter will first do is test the head.0:54:25
So in this case, the filter will test 10,000 and say, oh, 10,000's not prime. Therefore, what I have to do recursively0:54:36
is filter the tail. And what's the tail of it, well, that's the tail of this pair with a promise in it.0:54:46
Tail now comes in and says, well, I'm going to force that. I'm going to force that promise, which means now I'm going to compute the integers between 10,001 and 1 million.0:55:00
OK, so this filter now is looking at that. That enumerate itself, well, now we're back in the original0:55:10
enumerate situation. The enumerate is the CONS of the first thing, 10,001, onto a promise to compute the rest.0:55:19
So now the primality filter is going to go look at 10,001. It's going to decide if it likes that or not. It turns out 10,001 isn't prime. So it'll force it again and again and again.0:55:32
And finally, the first prime it hits is 10,007. And at that point, it'll stop. And that will be the first prime, and then eventually,0:55:42
it'll need the second prime. So at that point, it will go again. So you see what happens is that no more gets generated0:55:51
than you actually need. That enumerator is not going to generate any more integers0:56:00
than the filter asks it for as it's pulling in things to check for primality. And the filter is not going to generate any more stuff than you ask it for, which is the head of the tail.0:56:11
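To make the tugging concrete, here is a sketch of the second-prime computation with a lazy stream, where a promise is modeled as a zero-argument function (Python; the instrumentation counter is an addition, there to show how few integers actually get enumerated):

```python
# Lazy streams: the tail is a promise (a zero-argument function),
# forced only when someone tugs on it.

the_empty_stream = None

def cons_stream(x, delayed_tail):
    return (x, delayed_tail)

def head(s):
    return s[0]

def tail(s):
    return s[1]()              # force: call in the promise

count = {"enumerated": 0}      # instrumentation: how many integers get built

def enumerate_interval(low, high):
    if low > high:
        return the_empty_stream
    count["enumerated"] += 1
    return cons_stream(low, lambda: enumerate_interval(low + 1, high))

def filter_stream(pred, s):
    if s is the_empty_stream:
        return the_empty_stream
    if pred(head(s)):
        return cons_stream(head(s), lambda: filter_stream(pred, tail(s)))
    return filter_stream(pred, tail(s))

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# head of the tail of the filtered enumeration: the second prime
primes = filter_stream(is_prime, enumerate_interval(10000, 1000000))
second_prime = head(tail(primes))
```

Out of the million-long interval, only the integers 10,000 through 10,009 ever get enumerated: exactly as many as the filter tugged for.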
You see, what's happened is we've put that mixing of generation and test into what actually happens in the0:56:20
computer, even though that's not apparently what's happening from looking at our programs. OK, well, that seemed easy.0:56:30
All of this mechanism got put into this magic delay. So you're saying, gee, that must be where the magic is. But see there's no magic there either.0:56:39
You know what delay is. Delay on some expression is just an abbreviation for--0:56:53
well, what's a promise to compute an expression? Lambda of nil, procedure of no arguments, which is that expression.0:57:03
That's what a procedure is. It says I'm going to compute an expression. What's force? How do I take up a promise? Well, force of some procedure, a promise, is just run it.0:57:18
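So the whole mechanism fits in a few lines. A Python sketch, writing by hand the zero-argument procedure that Scheme's delay syntax would wrap for you, including an infinite stream that would be impossible if streams were lists:

```python
def force(promise):
    """Take up a promise: just run the procedure of no arguments."""
    return promise()

# delay has no run-time content of its own: delaying an expression just means
# wrapping it as lambda: expression, which we write by hand below.
# (In Scheme, delay is special syntax that does this wrapping for you.)

def cons_stream(x, delayed_y):
    return (x, delayed_y)      # an ordinary pair of x and a promise

def head(s):
    return s[0]                # just the CAR

def tail(s):
    return force(s[1])         # force the promise that's in the CDR

def integers_from(n):
    # an infinite stream of integers; fine, because the tail is only a promise
    return cons_stream(n, lambda: integers_from(n + 1))
```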
Done. So there's no magic there at all. Well, what have we done? We said the old style, traditional style of0:57:29
programming is more efficient. And the stream thing is more perspicuous. And we managed to make the stream procedures run like the0:57:40
other procedures by using delay. And the thing that delay did for us was to de-couple the apparent order of events in our programs from the actual0:57:52
order of events that happened in the machine. That's really what delay is doing. That's exactly the whole point. We've given up the idea that our procedures, as they run,0:58:04
or as we look at them, mirror some clear notion of time. And by giving that up, we give delay the freedom to arrange the order of events in the computation the way it likes.0:58:16
That's the whole idea. We de-couple the apparent order of events in our programs from the actual order of events in the computer. OK, well there's one more detail.0:58:25
It's just a technical detail, but it's actually an important one. As you run through these recursive programs unwinding, you'll see a lot of things that look like tail of the0:58:35
tail of the tail. That's the kind of thing that would happen as I go CDRing down a stream all the way. And if each time I'm doing that, each time to compute a0:58:47
tail, I evaluate a procedure which then has to go re-compute its tail, and re-compute its tail and recompute its tail each time, you can see that's very0:58:56
inefficient compared to just having a list where the elements are all there, and I don't have to re-compute each tail every time I get the next tail.0:59:05
So there's one little hack to slightly change what delay is,0:59:15
and make it a thing which is-- I'll write it this way. The actual implementation, delay is an abbreviation for0:59:27
this thing, memo-proc of a procedure. Memo-proc is a special thing that transforms a procedure. What it does is it takes a procedure of no arguments and0:59:39
it transforms it into a procedure that'll only have to do its computation once. And what I mean by that is, you give it a procedure.0:59:48
The result of memo-proc will be a new procedure, which the first time you call it, will run the original procedure, remember what result it got, and then from ever on after,1:00:00
when you call it, it just won't have to do the computation. It will have cached that result someplace. And here's an implementation of memo-proc.1:00:11
Once you have the idea, it's easy to implement. Memo-proc is this little thing that has two little flags in there. It says, have I already been run?1:00:20
And initially it says, no, I haven't already been run. And what was the result I got the last time I was run?1:00:29
So memo-proc takes a procedure called proc, and it returns a new procedure of no arguments. Proc is supposed to be a procedure of no arguments.1:00:38
And it says, oh, if I'm not already run, then I'm going to do a sequence of things. I'm going to compute proc, I'm going to save that.1:00:48
I'm going to stash that in the variable result. I'm going to make a note to myself that I've already been run, and then I'll return the result. So that's if you compute it if it's not already run.1:00:59
If you call it and it's already been run, it just returns the result. So that's a little clever hack called memoization.1:01:08
And in this case, it short circuits having to re-compute the tail of the tail of the tail of the tail of the tail. So there isn't even that kind of inefficiency.1:01:17
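Here is a sketch of memo-proc in Python (names translated from the blackboard Scheme; the demo counter is added for illustration):

```python
def memo_proc(proc):
    """Transform a procedure of no arguments into one that does its
    computation at most once, caching the result (the lecture's memo-proc)."""
    already_run = False   # "have I already been run?"
    result = None         # "what result did I get when I was run?"

    def memoized():
        nonlocal already_run, result
        if not already_run:
            result = proc()       # compute it and stash the result
            already_run = True    # make a note that I've been run
        return result

    return memoized

# Demo: the expensive computation happens only on the first call.
calls = [0]

def expensive():
    calls[0] += 1
    return 42

promise = memo_proc(expensive)
print(promise(), promise(), calls[0])   # 42 42 1
```

Calling the memoized promise a second time returns the cached result without re-running the computation, which is exactly what short-circuits the repeated tail-of-the-tail work.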
And in fact, the streams will run with pretty much the same efficiency as the other programs. And remember, again, the whole idea of this is that we've1:01:28
used the fact that there's no really good dividing line between procedures and data. We've written data structures that, in fact, are sort of like procedures.1:01:38
And what that's allowed us to do is take an example of a common control structure, in this case iteration.0:01:49
And we've built a data structure which, since itself is a procedure, kind of has this iteration control structure in it. And that's really what streams are.1:01:58
OK, questions? AUDIENCE: Your description of tail-tail-tail, if I understand it correctly, force is actually execution of a1:02:10
procedure, if it's done without this memo-proc thing. And you implied that memo-proc gets around that problem. Doesn't it only get around it if tail-tail-tail is always1:02:20
executing exactly the same-- PROFESSOR: Oh, that's-- sure. AUDIENCE: I guess I missed that point. PROFESSOR: Oh, sure. I mean the point is--1:02:31
yeah. I mean I have to do a computation to get the answer. But the point is, once I've found the tail of the stream, to get the tail of the tail, I shouldn't have had to re-compute the first tail.1:02:42
See, and if I didn't use memo-proc, that re-computation would have been done. AUDIENCE: I understand now. AUDIENCE: In one of your examples, you mentioned that1:02:52
we were able to use the substitution model because there are no side effects. What if we had a single processing unit--1:03:01
if we had a side effect, if we had a state? Could we still practically build the stream model? PROFESSOR: Maybe. That's a hard question.1:03:10
I'm going to talk a little bit later about the places where substitution and side effects don't really mix very well. But in general, I think the answer is unless you're very1:03:21
careful, any amount of side effect is going to mess up everything.1:03:35
AUDIENCE: Sorry, I didn't quite understand the memo-proc operation. When do you execute the lambda? In other words, when memo-proc is executed, just this lambda1:03:46
expression is being generated. But it's not clear to me when it's executed. PROFESSOR: Right. What memo-proc does-- remember, the thing that's going into memo-proc, the thing proc, is a procedure of1:03:57
no arguments. And someday, you're going to call it. Memo-proc translates that procedure into another procedure of no arguments, which someday you're going to call.1:04:06
That's that lambda. So here, where I initially built as my tail of the1:04:17
stream, say, this procedure of no arguments, which someday I'll call. Instead, I'm going to have the tail of the stream be1:04:27
memo-proc of it, which someday I'll call. So that lambda of nil, that gets called when you call the memo-proc, when you call the result of that memo-proc,1:04:40
which would be ordinarily when you would have called the original thing that you set it. AUDIENCE: OK, the reason I ask is I had a feeling that when1:04:49
you call memo-proc, you just return this lambda. PROFESSOR: That's right. When you call memo-proc, you return the lambda.1:04:58
You never evaluate the expression at all, until the first time that you would have evaluated it.1:05:07
AUDIENCE: Do I understand it right that you actually have to build the list up, but the elements of the list don't get evaluated? The expressions don't get evaluated? But at each stage, you actually are building a list.1:05:18
PROFESSOR: That's-- I really should have said this. That's a really good point. No, it's not quite right. Because what happens is this. Let me draw this as pairs. Suppose I'm going to make a big stream, like enumerate1:05:29
interval, 1 through 1 billion. What that is, is a pair with a 1 and a promise.1:05:46
That's exactly what it is. Nothing got built up. When I go and force this, and say, what happens?1:05:56
Well, this thing is now also recursively a CONS. So that this promise now is the next thing, which is a 21:06:07
and a promise to do more. And so on and so on and so on. So nothing gets built up until you walk down the stream.1:06:18
Because what's sitting here is not the list, but a promise to generate the list. And by promise, technically I mean procedure.1:06:28
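That answer can be sketched in Python (a hypothetical rendering, with a stream as a pair of a head and a thunk for the rest):

```python
# Sketch of the point above: a "billion-element" stream is really just
# a pair of its first element and a promise (a procedure) for the rest.

def enumerate_interval(low, high):
    if low > high:
        return None
    return (low, lambda: enumerate_interval(low + 1, high))

s = enumerate_interval(1, 10**9)   # returns immediately
print(s[0])            # 1
print(callable(s[1]))  # True -- the rest is only a promise

rest = s[1]()          # forcing the promise builds exactly one more pair
print(rest[0])         # 2
```

Nothing gets built up until you walk down the stream; each force constructs one more pair.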
So it doesn't get built up. Yeah, I should have said that before this point. OK. Thank you. Let's take a break.0:00:00
Lecture 6B | MIT 6.001 Structure and Interpretation, 1986
PROFESSOR: OK, well, we've been looking at streams, this signal processing way of putting systems together. And remember, the key idea is that we decouple the apparent0:00:35
order of events in our programs from the actual order of events in the computer. And that means that we can start dealing with very long streams and only having to generate0:00:46
the elements on demand. That sort of on-demand computation is built into the stream's data structure. So if we have a very long stream, we only0:00:55
compute what we need. The things only get computed when we actually ask for them. Well, what are examples of actually asking for them?0:01:04
For instance, we might ask for the n-th element of a stream.0:01:16
Here's a procedure that computes the n-th element of a stream. An integer n, the n-th element of some stream s, and we just recursively walk down the stream.0:01:25
And when n is 0, we compute the head. Otherwise, it's the (n minus 1)-st element of the tail of the stream.0:01:34
That's just like n-th for lists, but the difference is those elements aren't going to get computed until we walk down, taking successive n-ths. So that's one way that the stream0:01:43
elements might get forced. And another way, here's a little procedure that prints a stream. We say print a stream, so to print a stream s.0:01:54
Well, what do we do? We print the head of the stream, and that will cause the head to be computed. And then we recursively print stream the tail of the stream.0:02:04
And when we're done, maybe we return some message like done. OK, and then if you make a stream, you could say here's the stream, this very long stream.0:02:14
And then you say print the stream, and the elements of the stream will get computed successively as that print calls them. They won't get all computed initially.0:02:24
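The two procedures might look like this in Python (a sketch; the pair-plus-thunk representation and the helper names are assumptions, standing in for the lecture's Scheme):

```python
def cons_stream(h, tail_thunk):
    # In Scheme cons-stream is a special form that delays its second
    # argument; here we pass the thunk explicitly.
    return (h, tail_thunk)

def head(s): return s[0]
def tail(s): return s[1]()   # forcing the promise computes the tail

def nth_stream(n, s):
    """The n-th element of s, counting from 0 -- just like n-th on lists,
    except the elements are computed only as we walk down."""
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

def print_stream(s):
    """Print each element; each head is computed only when print asks."""
    while s is not None:
        print(head(s))
        s = tail(s)
    return "done"

# A tiny finite stream: 1, 2
s = cons_stream(1, lambda: cons_stream(2, lambda: None))
print(nth_stream(1, s))   # 2
```

Each call to tail forces one promise, so printing a stream computes its elements successively, never all at once.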
So in this way, we can deal with some very long streams. Well, how long can a stream be?0:02:33
Well, it can be infinitely long. Let's look at an example here on the computer. I could walk up to this computer, and I could say--0:02:43
how about we'll define the stream of integers starting0:02:52
with some number N, the stream of positive integers starting with some number n. And that's cons-stream of n onto the0:03:12
integers from one more.0:03:24
So there are the integers. Then I could say let's get all the integers.0:03:34
define the stream of integers to be the integers0:03:43
starting with 1. And now if I say something like what's the0:03:54
20th integer. So it's 21 because we start counting at 0.0:04:07
Or I can do more complicated things. Let me define a little predicate here. How about define no-seven.0:04:19
It's going to test an integer, and it's going to say that when0:04:28
I take the remainder of x by 7, I don't get 0.0:04:41
And then I could say define the integers with no sevens to0:04:50
be, take all the integers and filter them to have no sevens.0:05:11
So now I've got the stream of all the integers that are not divisible by seven. So if I say what's the 100th integer in the list not0:05:25
divisible by seven, I get 117. Or if I'd like to say well, gee, what are all of them?0:05:35
So I could say print stream all these integers with no seven, it goes off printing.0:05:45
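The blackboard demo can be sketched in Python (streams as (head, thunk) pairs; all helper names are illustrative, and nth counts from 0 as in the lecture):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

def integers_from(n):
    # The integers starting with n: n consed onto the integers from n+1.
    return cons_stream(n, lambda: integers_from(n + 1))

integers = integers_from(1)

def no_seven(x):
    # True when the remainder of x by 7 is not 0.
    return x % 7 != 0

def filter_stream(pred, s):
    while not pred(head(s)):   # skip elements that fail the test
        s = tail(s)
    return cons_stream(head(s), lambda: filter_stream(pred, tail(s)))

no_sevens = filter_stream(no_seven, integers)

print(nth_stream(20, integers))    # 21, since we start counting at 0
print(nth_stream(100, no_sevens))  # 117, as in the lecture
```

Only the elements actually walked over get computed; asking for the 100th no-seven integer forces just enough of the infinite stream to find it.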
You may have to wait a very long time to see them all. Well, you can start asking, gee, is it really true that0:05:56
this data structure with the integers is really all the integers? And let me draw a picture of that program I just wrote.0:06:08
Here's the definition of the integers again that I just typed in. Right, it's a cons of the first integer onto the integers starting with the next one. Now, we can make a0:06:18
picture of that and see what it looks like. Conceptually, what I have is a box that's the integer starting with n.0:06:27
It takes in some number n, and it's going to return a stream of-- this infinite stream of all integers starting with n.0:06:37
And what do I do? Well, this is an integers from box. What's it got in it? Well, it takes in this n, and it increments it.0:06:58
And then it puts the result into recursively another integer's from box. It takes the result of that and the original n and puts0:07:10
those together with a cons and forms a stream. So that's a picture of that program I wrote. Let's see. These kind of diagrams we first saw drawn by Peter0:07:21
Henderson, the same guy who did the Escher language. We call them Henderson diagrams. And the convention here is that you put these things together. And the solid lines are things coming out are streams, and0:07:33
dotted lines are initial values going in. So this one has the shape of-- it takes in some integer, some initial value, and outputs a stream.0:07:46
Again, you can ask. Is that data structure integers really all the integers? Or is it is something that's cleverly arranged so that0:07:55
whenever you look for an integer you find it there? That's sort of a philosophical question, right? If something is there whenever you look, is it really there or not?0:08:04
It's sort of the same sense in which the money in your savings account is in the bank. Well, let me do another example.0:08:19
Gee, we started the course with an algorithm from Alexandria, which was Heron of Alexandria's algorithm for computing the square root.0:08:28
Let's take a look at another Alexandrian algorithm. This one is Eratosthenes' method for computing all of0:08:37
the primes. It is called the Sieve of Eratosthenes. And what you do is you start out, and you list all the0:08:51
integers, say, starting with 2. And then you take the first integer, and you say, oh, that's prime. And then you go look at the rest, and you cross out all the things divisible by 2.0:09:01
So I cross out this and this and this. This takes a long time because I have to do it for all of the integers.0:09:11
So I go through the entire list of integers, crossing the ones divisible by 2.0:09:22
And now when I finish with all of the integers, I go back and look and say what am I left with? Well, the first thing that starts there is 3. So 3 is a prime. And now I go back through what I'm left with, and I cross out0:09:33
all the things divisible by 3. So let's see, 9 and 15 and 21 and 27 and 33 and so on.0:09:44
I won't finish. Then I see what I'm left with. And the next one I have is 5. Now I go through the rest, and I find the ones0:09:53
that are divisible by 5. I cross out from the remainder all the ones that are divisible by 5. And I do that, and then I go through and find 7. Go through all the rest, cross out things divisible by 7, and I0:10:04
keep doing that forever. And when I'm done, what I'm left with is a list of all the primes. So that's the Sieve of Eratosthenes.0:10:15
Let's look at it as a computer program. It's a procedure called sieve.0:10:27
Now, I just write what I did. I'll say to sieve some stream s.0:10:38
I'm going to build a stream whose first element is the head of this. Remember, I always found the first thing I was left with, and the rest of it is the result of taking the tail of0:10:48
this, filtering it to throw away all the things that are divisible by the head of this, and now sieving the result.0:10:59
That's just what I did. And now to get the infinite stream of primes, we just sieve all the integers starting from 2.0:11:14
Let's try that. We can actually do it. I typed in the definition of sieve before, I hope, so I can0:11:23
say something like define the primes to be the result of0:11:35
sieving the integers starting with 2.0:11:46
So now I've got this list of primes. That's all of the primes, right? So, if for example, what's the 20th prime in that list?0:12:01
73. See, and that little pause-- it was only at the point when I started asking for the 20th prime that it started computing.0:12:10
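The Sieve of Eratosthenes as a stream program can be sketched in Python (streams as (head, thunk) pairs; the names are illustrative, and nth counts from 0):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def integers_from(n):
    return cons_stream(n, lambda: integers_from(n + 1))

def filter_stream(pred, s):
    while not pred(head(s)):   # skip elements that fail the test
        s = tail(s)
    return cons_stream(head(s), lambda: filter_stream(pred, tail(s)))

def sieve(s):
    # The first thing I'm left with is prime; sieve the rest after
    # throwing away everything divisible by it.
    p = head(s)
    return cons_stream(
        p, lambda: sieve(filter_stream(lambda x: x % p != 0, tail(s))))

primes = sieve(integers_from(2))

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

print(nth_stream(20, primes))   # 73, as on the screen
```

Each prime pulled out installs one more filter inside the recursive sieve, which is exactly the infinitely nested sieve-box picture drawn next.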
Or I can say here let's look at all of the primes.0:12:22
And there it goes computing all of the primes. Of course, it will take a while again if I want to look at all of them, so let's stop it.0:12:32
Let me draw you a picture of that. Well, I've got a picture of that. What's that program really look like? Again, some practice with these diagrams, I have a sieve box.0:12:42
How does sieve work? It takes in a stream. It splits off the head from the tail. And the first thing that's going to come out of the sieve0:12:53
is the head of the original stream. Then it also takes the head and uses that.0:13:02
It takes the stream. It filters the tail and uses the head to filter for nondivisibility. It takes the result of nondivisibility and puts it0:13:11
through another sieve box and puts the result together. So you can think of this sieve as a filter, but notice that it's an infinitely recursive filter. Because inside the sieve box is another sieve box, and0:13:23
inside that is another sieve box and another sieve box. So you see we start getting some very powerful things. We're starting to mix this signal processing view of the0:13:32
world with things like recursion that come from computation. And there are all sorts of interesting things you can do that are like this. All right, any questions?0:13:48
OK, let's take a break.0:14:28
Well, we've been looking at a couple of examples of stream programming. All the stream procedures that we've looked at so far have0:14:39
the same kind of character. We've been writing these recursive procedures that kind of generate these stream elements one at a time and put them together in cons-streams. So we've been thinking a lot0:14:50
about generators. There's another way to think about stream processing, and that's to focus not on programs that sort of process these elements as you walk down the stream, but on things0:15:00
that kind of process the streams all at once. To show you what I mean, let me start by defining two0:15:09
procedures that will come in handy. The first one's called add streams. Add streams takes two streams: s1 and s2.0:15:22
It's going to produce a stream whose elements are the corresponding sums. We just sort of add them element-wise.0:15:32
If either stream is empty, we just return the other one. Otherwise, we're going to make a new stream whose head is the0:15:42
sum of the two heads and whose tail is the result of recursively adding the tails. So that will produce the element-wise sum of two0:15:52
streams. And then another useful thing to have around is scale stream. Scale stream takes some constant number and a stream s0:16:04
and is going to produce the stream of elements of s multiplied by this constant. And that's easy, that's just a map of the function of an0:16:14
element that multiplies it by the constant, and we map that down the stream. So given those two, let me show you what I mean by0:16:23
programs that operate on streams all at once. Let's look at this. Suppose I write this. I say define--0:16:36
I'll call it ones-- to be cons-stream of 1 onto ones.0:16:54
What's that? That's going to be an infinite stream of ones because the first thing is 1.0:17:03
And the tail of it is a thing whose first thing is 1 and whose tail is a thing whose first thing is 1 and so on and so on and so on. So that's an infinite stream of ones.0:17:15
And now using that, let me give you another definition of the integers. We can define the integers to be--0:17:28
well, the first integer we'll take to be 1, this cons-stream of 1 onto the element-wise sum onto add streams of the0:17:42
integers to ones.0:17:54
The integers are a thing whose first element is 1, and the rest of them you get by taking those integers and0:18:04
incrementing each one by one. So the second element of the integers is the first element of the integers incremented by one.0:18:13
And the rest of that is the next one, and the third element of that is the same as the first element of the tail of the integers incremented by one, which is the same as the0:18:25
first element of the original integers incremented by one and incremented by one again and so on.0:18:35
That looks pretty suspicious. See, notice that it works because of delay. See, this looks like-- let's take a look at ones. This looks like it couldn't even be processed because it's0:18:46
suddenly saying in order to know what ones is, I say it's cons-stream of something onto ones. The reason that works is because of that very sneaky hidden delay in there.0:18:55
Because what this really is, remember, cons-stream is just an abbreviation. This really is cons of 1 onto delay of ones.0:19:12
So how does that work? You say I'm going to define ones. First I see what ones is supposed to be defined as. Well, ones is supposed to be defined as a cons whose first0:19:27
part is 1 and whose second part is, well, it's a promise to compute something that I don't worry about yet. So it doesn't bother me that at the point I do this definition, ones isn't defined.0:19:37
Having run the definition now, ones is defined. So that when I go and look at the tail of it, it's defined. It's very sneaky.0:19:46
And an integer is the same way. I can refer to integers here because hidden way down-- because of this cons-stream. It's the cons-stream of 1 onto something that I0:19:56
don't worry about yet. So I don't look at it, and I don't notice that integers isn't defined at the point where I try and run the definition.0:20:06
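The same sneaky trick works in Python, because a lambda's body isn't evaluated until it is called; a sketch (streams as (head, thunk) pairs, helper names assumed for illustration):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def add_streams(s1, s2):
    # Element-wise sum of two (here infinite) streams.
    return cons_stream(head(s1) + head(s2),
                       lambda: add_streams(tail(s1), tail(s2)))

# The hidden delay: the lambda doesn't look up the name `ones` until
# it is called, by which time the definition has finished running.
ones = cons_stream(1, lambda: ones)

# The integers: the first element is 1, and the rest is the integers
# themselves with each element incremented by one.
integers = cons_stream(1, lambda: add_streams(ones, integers))

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

print([nth_stream(i, integers) for i in range(6)])   # [1, 2, 3, 4, 5, 6]
```

The self-reference is legal only because the tail is a promise; evaluate the tail eagerly and both definitions would blow up.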
OK, let me draw a picture of that integers thing because it still maybe seems a little bit shaky. What do I do?0:20:15
I've got the stream of ones, and that sort of comes in and goes into an adder that's going to be0:20:25
this add streams thing. And that goes in-- that's going to put out the integers.0:20:40
And the other thing that goes into the adder here is the integer, so there's a little feedback loop. And all I need to start it off is someplace I've got a stick0:20:51
that initial 1. In a real signal processing thing, this might be a delay0:21:00
element that was initialized to 1. But there's a picture of that integers program. And in fact, that looks a lot like--0:21:09
if you've seen real signal block diagram things, that looks a lot like accumulators, finite state accumulators. And in fact, we can modify this a little bit to change0:21:21
this into something that integrates a stream or a finite state accumulator, however you like to think about it. So instead of the ones coming in and getting out the0:21:30
integers, what we'll do is say there's a stream s coming in, and we're going to get out the integral of this, successive0:21:43
values of that, and it looks almost the same. The only thing we're going to do is when s comes in here, before we just add it in we're going to multiply it0:21:53
by some number dt. And now what we have here, this is exactly the same thing. We have a box, which is an integrator.0:22:09
And it takes in a stream s, and instead of 1 here, we can put the initial value for the integral.0:22:19
And that one looks very much like a signal processing block diagram program. In fact, here's the procedure that looks exactly like that.0:22:31
Find the integral of a stream. So an integral's going to take a stream and produce a new stream, and it takes in an initial value and some time constant.0:22:42
And what do we do? Well, we internally define this thing int, and we make this internal name so we can feed it back, loop it around itself. And int is defined to be something that starts out at0:22:52
the initial value, and the rest of it is gotten by adding together.0:23:01
We take our input stream, scale it by dt, and add that to int. And now we'll return from all that the value of integral is this thing int.0:23:10
And we use this internal definition syntax so we could write a little internal definition that refers to itself.0:23:21
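The integrator can be sketched in Python the same way, with the internal name referring to itself inside a delayed tail (all names illustrative, translated from the blackboard Scheme):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def add_streams(s1, s2):
    return cons_stream(head(s1) + head(s2),
                       lambda: add_streams(tail(s1), tail(s2)))

def map_stream(f, s):
    return cons_stream(f(head(s)), lambda: map_stream(f, tail(s)))

def scale_stream(c, s):
    # Multiply every element of s by the constant c -- just a map.
    return map_stream(lambda x: c * x, s)

def integral(s, initial_value, dt):
    # integ starts at the initial value; the rest is (dt * s) added
    # back into integ itself -- the feedback loop in the diagram.
    integ = cons_stream(initial_value,
                        lambda: add_streams(scale_stream(dt, s), integ))
    return integ

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

ones = cons_stream(1, lambda: ones)
accum = integral(ones, 0, 1)    # with dt = 1 this is a running sum
print([nth_stream(i, accum) for i in range(5)])   # [0, 1, 2, 3, 4]
```

Feeding in the stream of ones with dt = 1 makes the integrator behave as a finite-state accumulator, matching the block diagram.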
Well, there are all sorts of things we can do. Let's try this one. How about the Fibonacci numbers? You can say define fibs.0:23:36
Well, what are the Fibonacci numbers? They're something that starts out with 0, and0:23:48
the next one is 1. And the rest of the Fibonacci numbers are gotten by adding0:24:06
the Fibonacci numbers to their own tail.0:24:17
There's a definition of the Fibonacci numbers. How does that work? Well, we start off, and someone says compute for us the Fibonacci numbers, and we're going to tell you it0:24:30
starts out with 0 and 1. And everything after the 0 and 1 is gotten by summing two0:24:40
streams. One is the fibs themselves, and the other one is the tail of the fibs. So if I know that these start out with 0 and 1, I know that0:24:52
the fibs now start out with 0 and 1, and the tail of the fibs start out with 1. So as soon as I know that, I know that the next one here is 0 plus 1 is 1, and that tells me that the next one here is 10:25:04
and the next one here is 1. And as soon as I know that, I know that the next one is 2. So the next one here is 2 and the next one here is 2. And this is 3.0:25:14
This one goes to 3, and this is 5. So it's a perfectly sensible definition. It's a one-line definition. And again, I could walk over to the computer and type that0:25:25
in, exactly that, and then say print stream the Fibonacci numbers, and they all come flying out. See, this is a lot like learning0:25:34
about recursion again. Instead of thinking about recursive procedures, we have recursively defined data objects.0:25:45
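The one-line definition translates directly (a Python sketch with streams as (head, thunk) pairs; without the memoization from the last lecture it recomputes tails, which is fine for small prefixes):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def add_streams(s1, s2):
    return cons_stream(head(s1) + head(s2),
                       lambda: add_streams(tail(s1), tail(s2)))

# The one-line definition: fibs starts 0, 1, and everything after
# comes from adding fibs to its own tail.
fibs = cons_stream(
    0, lambda: cons_stream(1, lambda: add_streams(fibs, tail(fibs))))

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

print([nth_stream(i, fibs) for i in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]
```

This is the recursively defined data object the lecture describes: the stream is defined in terms of a shifted copy of itself, and the hidden delays make it sensible.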
But that shouldn't surprise you at all, because by now, you should be coming to really believe that there's no difference really between procedures and data. In fact, in some sense, the underlying streams are0:25:55
procedures sitting there, although we don't think of them that way. So the fact that we have recursive procedures, well, then it should be natural that we have recursive data, too.0:26:07
OK, well, this is all pretty neat. Unfortunately, there are problems that streams aren't going to solve. Let me show you one of them.0:26:17
See, in the same way, let's imagine that we're building an analog computer to solve some differential equation like,0:26:26
say, we want to solve the equation y prime-- dy/dt-- is y squared, and I'm going to give you some initial value.0:26:36
I'll tell you y of 0 equals 1. Let's say dt is equal to something.0:26:46
Now, in the old days, people built analog computers to solve these kinds of things. And the way you do that is really simple. You get yourself an integrator, like that one, an0:27:01
integrator box. And we put in the initial value y of 0 is 1. And now if we feed something in and get something out,0:27:10
we'll say, gee, what we're getting out is the answer. And what we're going to feed in is the derivative, and the derivative is supposed to be the square of the answer.0:27:21
So if we take these values and map using square, and if I0:27:31
feed this around, that's how I build a block diagram for an analog computer that solves this differential equation.0:27:42
Now, what we'd like to do is write a stream program that looks exactly like that. And what do I mean exactly like that? Well, I'd say define y to be the integral of dy starting at0:28:08
1 with 0.001 as a time step. And I'd like to say that says this. And then I'd like to say, well, dy is gotten by mapping0:28:19
the square along y. So define dy to be map square along y.0:28:33
So there's a stream description of this analog computer, and unfortunately, it doesn't work. And you can see why it doesn't work because when I come in0:28:43
and say define y to be the integral of dy, it says, oh, the integral of dy-- huh, dy? Oh, that's undefined.0:28:53
So I can't write this definition before I've written this one. On the other hand, if I try and write this one first, it says, oh, I define y to be the map of square along y?0:29:03
Oh, that's not defined yet. So I can't write this one first, and I can't write that one first. So I can't quite play this game.0:29:17
Well, is there a way out? See, we can do that with ones. See, over here, we did this thing ones, and we were able0:29:27
to define ones in terms of ones because of this delay that was built inside because cons-stream had a delay. Now, why's it sensible?0:29:36
Why's it sensible for cons-stream to be built with this delay? The reason is that cons-stream can do a useful thing without looking at its tail.0:29:45
See, if I say this is cons-stream of 1 onto something without knowing anything about something, I know that the stream starts off with 1.0:29:54
That's why it was sensible to build something like cons-stream. So we put a delay in there, and that allows us to have this sort of self-referential definition.0:30:06
Well, integral is a little bit the same way. See, notice for an integral, I can-- let's go back and look at integral for a second.0:30:17
See, notice integral, it makes sense to say what's the first thing in the integral without knowing the stream that you're0:30:27
integrating. Because the first thing in the integral is always going to be the initial value that you're handed. So integral could be a procedure like cons-stream.0:30:37
You could define it, and then even before it knows what it's supposed to be integrating, it knows enough to say what its initial value is.0:30:46
So we can make a smarter integral, which is aha, you're going to give me a stream to integrate and an initial value, but I really don't have to look at that stream that I'm supposed to integrate until you ask me to work down0:30:56
the stream. In other words, integral can be like cons-stream, and you can expect that there's going to be a delay around its integrand. And we can write that.0:31:05
Here's a procedure that does that. Another version of integral, and this is almost like the previous one, except the stream it's going to get is expected to be a delayed object.0:31:17
And how does this integral work? Well, the little thing it's going to define inside of itself says on the cons-stream, the initial value is the initial value, but only inside of that cons-stream,0:31:29
and remember, there's going to be a hidden delay inside here. Only inside of that cons-stream will I start0:31:38
looking at what the actual delayed object is. So my answer is the first thing's the initial value. If anybody now asks me for my tail, at that point, I'm going0:31:50
to force that delayed object-- and I'll call that s-- and I do the add streams. So this is an integral which is sort of like cons-stream.0:31:59
It's not going to actually try and see what you handed it as the thing to integrate until you look past the first element.0:32:10
And if we do that and we can make this work, all we have to do here is say define y to be the integral of0:32:24
delay of dy. So y is going to be the integral of delay of dy0:32:33
starting at 1, and now this will work. Because I type in the definition of y, and that says, oh, I'm supposed to use the integral of something I don't care about right now because it's a delay.0:32:44
And these things, now you define dy. Now, y is defined. So when I define dy, it can see that definition for y. Everything is now started up. Both streams have their first element.0:32:54
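The whole game can be sketched in Python, where passing an explicit thunk plays the role of delay (streams as (head, thunk) pairs; helper names are assumptions, and dt = 0.001 as in the lecture):

```python
# Solving y' = y^2, y(0) = 1, with an integral that, like cons-stream,
# expects a *delayed* integrand and doesn't force it until asked.

def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def add_streams(s1, s2):
    return cons_stream(head(s1) + head(s2),
                       lambda: add_streams(tail(s1), tail(s2)))

def map_stream(f, s):
    return cons_stream(f(head(s)), lambda: map_stream(f, tail(s)))

def scale_stream(c, s):
    return map_stream(lambda x: c * x, s)

def integral(delayed_s, initial_value, dt):
    # The first element is always the initial value; the integrand is
    # forced only when somebody asks for the tail.
    integ = cons_stream(
        initial_value,
        lambda: add_streams(scale_stream(dt, delayed_s()), integ))
    return integ

dt = 0.001
y = integral(lambda: dy, 1, dt)   # "delay of dy": dy isn't defined yet!
dy = map_stream(lambda v: v * v, y)

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

print(nth_stream(0, y))   # 1
print(nth_stream(1, y))   # 1 + dt * 1^2 = 1.001
```

When y is defined, the thunk `lambda: dy` is not called, so it doesn't matter that dy isn't bound yet; by the time anyone walks past the first element, both definitions have run.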
And then when I start mapping down, looking at successive elements, both y and dy are defined. So there's a little game you can play that goes a little bit beyond just using the delay that's hidden inside0:33:06
streams. Questions? OK, let's take a break.0:34:07
Well, just before the break, I'm not sure if you noticed it, but something nasty started to happen. We've been going along with the streams and divorcing time0:34:21
in the programs from time in the computers, and all that divorcing got hidden inside the streams. And then at the very end, we saw that sometimes in order to really0:34:30
take advantage of this method, you have to pull out other delays. You have to write some explicit delays that are not hidden inside that cons-stream.0:34:39
And I did a very simple example with differential equations, but if you have some very complicated system with all kinds of self-loops, it becomes very, very difficult to see where you need those delays.0:34:49
And if you leave them out by mistake, it becomes very, very difficult to see why the thing maybe isn't working. So that's kind of a mess, that by getting this power and0:35:00
allowing us to use delay, we end up with some very complicated programming sometimes, because it can't all be hidden inside the streams. Well, is there a way out of that?0:35:11
Yeah, there is a way out of that. We could change the language so that all procedures acted like cons-stream, so that every procedure automatically0:35:22
has an implicit delay around its arguments. And what would that mean? That would mean when you call a procedure, the arguments wouldn't get evaluated.0:35:32
Instead, they'd only be evaluated when you need them, so they might be passed off to some other procedure, which wouldn't evaluate them either. So all these procedures would be passing promises around.0:35:42
And then finally maybe when you finally got down to having to look at the value of something that was handed to a primitive operator would you actually start calling in all those promises.0:35:52
If we did that, since everything would have a uniform delay, then you wouldn't have to write any explicit delays, because it would be automatically built into the way the language works.0:36:02
Or another way to say that: technically, what I'm describing is what's called-- if we did that, our language would be a so-called0:36:12
normal-order evaluation language, versus what we've0:36:22
actually been working with, which is called0:36:31
applicative-order evaluation. And remember the substitution model for applicative order. It says when you go and evaluate a combination, you0:36:40
find the values of all the pieces. You evaluate the arguments and then you substitute them in the body of the procedure. Normal order says no, don't do that.0:36:49
What you do is effectively substitute in the body of the procedure, but instead of evaluating the arguments, you just put a promise to compute them there.0:36:58
Or another way to say that is you take the expressions for the arguments, if you like, and substitute them in the body of the procedure and go on, and never really simplify anything until you get down to a primitive operator.0:37:09
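That substitution of promises for arguments can be sketched in Python rather than the lecture's Scheme (the names `delay`, `cons`, `car`, and `cdr` here are illustrative, modeling promises as memoized zero-argument functions):

```python
# A minimal sketch of normal-order promises, assuming a promise is a
# zero-argument function (thunk) that is called in at most once.
def delay(thunk):
    cache = []                      # memoize: never recompute a forced promise
    def force():
        if not cache:
            cache.append(thunk())
        return cache[0]
    return force

# A cons whose parts are both delayed behaves like cons-stream in every slot.
def cons(a_thunk, b_thunk):
    return (delay(a_thunk), delay(b_thunk))

def car(pair):
    return pair[0]()                # forcing happens only at the primitive access

def cdr(pair):
    return pair[1]()

# The second argument would blow up under applicative order, but here it is
# never evaluated, because nothing ever looks at it.
p = cons(lambda: 1 + 1, lambda: 1 // 0)
assert car(p) == 2
```

Only the primitive accessors `car` and `cdr` ever call promises in; everything upstream just passes them along.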
So that would be a normal-order language. Well, why don't we do that? Because if we did, we'd get all the advantages of delayed evaluation with none of the mess.0:37:18
In fact, if we did that and cons was just a delayed procedure, that would make cons the same as cons-stream. We wouldn't need streams at all because lists would0:37:27
automatically be streams. That's how lists would behave, and data structures would behave that way. Everything would behave that way, right? You'd never really do any computation until you actually0:37:38
needed the answer. You wouldn't have to worry about all these explicit annoying delays. Well, why don't we do that?0:37:47
First of all, I should say people do do that. There are some very beautiful languages. One of the very nicest is a language called Miranda, which0:37:56
was developed by David Turner at the University of Kent. And that's how this language works. It's a normal-order language, and its data structures, which0:38:06
look like lists, are actually streams. And you write ordinary procedures in Miranda, and they do these prime things and eight queens things, just without anything special. It's all built in there.0:38:17
But there's a price. Remember how we got here. We're decoupling time in the programs0:38:26
from time in the machines. And if we put delay, that sort of decouples it everywhere, not just in streams. Remember what we're trying to do. We're trying to think about programming as a way to0:38:36
specify processes. And if we give up too much time, our language becomes more elegant, but it becomes a little bit less expressive.0:38:47
There are certain distinctions that we can't draw. One of them, for instance, is iteration. Remember this old procedure, iterative factorial, that we0:38:58
looked at quite a long time ago. Iterative factorial had a thing, and it said there was an internal procedure, and there was a state which was a product and a counter, and we iterate that0:39:09
going around the loop. And we said that was an iterative procedure because it didn't build up state. And the reason it didn't build up state is because this iter0:39:19
that's called is just passing these things around to itself. Or in the substitution model, you could see in the substitution model that Jerry did, that in an iterative0:39:29
procedure, that state doesn't have to grow. And in fact, we said it doesn't, so this is an iteration. But now think about this exact same text if we had a normal-order language.0:39:41
What would happen is this would no longer be an iterative process. And if you really think about the details of the substitution model, which I'm not going to do here, this0:39:51
expression would grow. Why would it grow? It's because when iter calls itself, it calls itself with this product. If it's a normal-order language, that multiplication0:40:00
is not going to get done. That's going to say I'm going to call myself with a promise to compute this product. And now iter goes around again.0:40:09
And I'm going to call myself with a promise to compute this product where now one of the factors is a promise.0:40:18
And I call myself again. And if you write out the substitution model for that iterative process, you'll see exactly the same growth in state, all those promises that are getting remembered that0:40:29
have to get called in at the very end. So one of the disadvantages is that you can't really express iteration. Maybe that's a little theoretical reason why not,0:40:39
but in fact, people who are trying to write real operating systems in these languages are running into exactly these types of problems. Like it's perfectly possible to0:40:51
implement a text editor in languages like these. But after you work a while, you suddenly have 3 megabytes of stuff, which is--0:41:01
I guess the people who are looking at these call it the dragging tail problem-- promises that sort of haven't been called in because you couldn't quite express an iteration.0:41:10
And one of the research questions in these kinds of languages is figuring out the right compiler technology to get rid of the so-called dragging tails.0:41:20
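A Python sketch of that growth (illustrative names, thunks standing in for normal-order promises): the loop's text looks iterative, but each pass wraps the pending product in yet another promise, so the chain of unforced promises grows with the counter.

```python
def iter_fact_lazy(n):
    # product arrives as a thunk, the way a normal-order language would
    # pass it; the multiplication is never done on the way in.
    def loop(product, counter):
        if counter > n:
            return product()     # only now are all the promises called in
        # each pass wraps another promise around the old one --
        # this growing chain is the "dragging tail"
        return loop(lambda: counter * product(), counter + 1)
    return loop(lambda: 1, 1)

assert iter_fact_lazy(10) == 3628800
```

Forcing the final promise unwinds `n` nested thunks, which is exactly the state growth the substitution model would show.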
It's not simple. But there's another kind of more striking issue about why you just don't go ahead and make your0:41:30
language normal order. And the reason is that normal-order evaluation and side effects just don't mix.0:41:42
They just don't go together very well. Somehow, you can't-- it's sort of you can't simultaneously go around0:41:51
trying to model objects with local state and change and at the same time do these normal-order tricks of de-coupling time.0:42:00
Let me just show you a really simple example, very, very simple. Suppose we had a normal-order language. And I'm going to start out in this language.0:42:09
This is now normal order. I'm going to define x to be 0. It's just some variable I'll initialize. And now I'm going to define this little funny function,0:42:18
which is an identity function. And what it does, it keeps track of the last time you called it using x.0:42:31
So the identity of n just returns n, but it sets x to be n. And now I'll define a little increment function, which is a0:42:40
very little, simple scenario. Now, imagine I'm interacting with this in the normal-order language, and I type the following. I say define y to be increment the identity function of 3, so0:42:52
y is going to be 4. Now, I say what's x? Well, x should have been the value that was remembered last0:43:02
when I called the identity function. So you'd expect to say, well, x is 3 at this point, but it's not. Because when I defined y here, what I really defined y to be0:43:13
increment of a promise to do this thing. So I didn't look at y, so that identity function didn't get run. So if I type in this definition and look at x, I'm0:43:24
going to get 0. Now, if I go look at y and say what's y, it'll say y is 4; looking0:43:33
at y, that very act of looking at y, caused the identity function to be run. And now x will get remembered as 3. So here x will be 0.0:43:42
Here, x will be 3. That's a tiny, little, simple scenario, but you can see what kind of a mess that's going to make for debugging interactive0:43:52
programs when you have normal-order evaluation. It's very confusing. But it's very confusing for a very deep reason, which is0:44:03
that the whole idea of putting in delays is that you throw away time. That's why we can have these infinite processes. Since we've thrown away time, we don't have to wait for them0:44:13
to run, right? We decouple the order of events in the computer from what we write in our programs. But when we talk about state0:44:23
and set and change, that's exactly what we do want control of. So it's almost as if there's this fundamental contradiction0:44:32
in what you want. And that brings us back to these sort of philosophical mutterings about what is it that you're trying to model and how do you look at the world.0:44:42
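The blackboard scenario above can be sketched in Python (hypothetical thunk-passing to mimic normal order; the names are illustrative):

```python
x = 0

def identity(n_thunk):
    # In normal order the argument arrives as a promise; the side
    # effect on x happens only when the promise is called in.
    def promise():
        global x
        value = n_thunk()
        x = value                # remember the last argument seen
        return value
    return promise

def increment(m_thunk):
    return lambda: m_thunk() + 1

y = increment(identity(lambda: 3))   # define y: nothing has run yet
assert x == 0                        # look at x first: identity never ran
assert y() == 4                      # looking at y forces the promises...
assert x == 3                        # ...and only now has x been set
```

Whether x is 0 or 3 depends on whether you happened to look at y first, which is exactly the debugging confusion being described.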
Or sometimes this is called the debate over functional programming.0:44:53
A so-called purely functional language is one that just doesn't have any side effects. Since you have no side effects, there's no assignment0:45:02
operator, so there are no terrible consequences of it. You can use a substitution-like thing. Programs really are like mathematics and not like0:45:11
models in the real world, not like objects in the real world. There are a lot of wonderful things about functional languages. Since there's no time, you never have any synchronization problems. And if you want to put something into a parallel0:45:23
algorithm, you can run the pieces of that parallel processing any way you want. There's just never any synchronization to worry about, and it's a very congenial environment for doing this.0:45:33
The price is you give up assignment. So an advocate of a functional language would say, gee, that's just a tiny price to pay.0:45:44
You probably shouldn't use assignment most of the time anyway. And if you just give up assignment, you can be in this much, much nicer world than this place with objects.0:45:54
Well, what's the rejoinder to that? Remember how we got into this mess. We started trying to model things that had local state.0:46:04
So remember Jerry's random number generator. There was this random number generator that had some little state in it to compute the next random number and the next random number and the next random number.0:46:14
And we wanted to hide that state away from the Cesaro compute-pi process, and that's why we needed set. We wanted to package that state modularly.0:46:24
Well, a functional programming person would say, well, you're just all wet. I mean, you can write a perfectly good modular program. It's just you're thinking about modularity wrong.0:46:33
You're hung up in this next random number and the next random number and the next random number. Why don't you just say let's write a program. Let's write an enumerator which just generates an0:46:42
infinite stream of random numbers. We can sort of have that stream all at once, and that's0:46:52
going to be our source of random numbers. And then if you like, you can put that through some sort of processor, which is-- I don't know-- a Cesaro test, and that can do what it wants.0:47:06
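That enumerator-and-processor picture might be sketched with Python generators (illustrative names; the Cesaro test from the earlier lecture uses the fact that the probability two random integers share no common factor is 6/π²):

```python
import math
import random
from itertools import islice

def random_pairs(seed=1):
    # the enumerator: an infinite stream of pairs of random integers
    rng = random.Random(seed)
    while True:
        yield rng.randrange(1, 10**6), rng.randrange(1, 10**6)

def cesaro_pi(pairs):
    # the processor: turn the stream of pairs into a stream of
    # successive approximations to pi, since P(gcd(a, b) = 1) = 6/pi^2
    hits, trials = 0, 0
    for a, b in pairs:
        trials += 1
        if math.gcd(a, b) == 1:
            hits += 1
        if hits:
            yield math.sqrt(6 * trials / hits)

# Looking further down the stream tugs more random numbers out of the
# enumerator and gives a better approximation.
approximations = list(islice(cesaro_pi(random_pairs()), 20000))
assert abs(approximations[-1] - math.pi) < 0.1
```

Nothing here has local state visible from outside: the "next random number" bookkeeping lives entirely inside the enumerator stream.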
And what would come out of there would be a stream of0:47:16
successive approximations to pi.0:47:28
So as we looked further down this stream, we'd tug on this Cesaro thing, and it would pull out more and more random numbers. And the further and further we look down the stream, the0:47:37
better an approximation we'd get to pi. And it would do exactly the same as the other computation, except we're thinking about the modularity differently. We're saying imagine we had all those infinite streams of0:47:46
random numbers all at once. You can see the details of this procedure in the book. Similarly, there are other things that we tend to get0:47:56
locked into on this one and that one and the next one and the next one, which don't have to be that way. Like you might think about like a banking system, which0:48:07
is a very simple idea. Imagine we have a program that sort of represents a bank account.0:48:18
The bank account might have in it-- if we looked at this in a sort of message-passing view of the world, we'd say a bank account is an object that has some0:48:29
local state in there, which is the balance, say. And a user using this system comes and sends a transaction request. So the user sends a transaction request, like0:48:41
deposit some money, and the bank account maybe-- let's say the bank account always responds with what the current balance is. The user says let's deposit some money, and the bank0:48:50
account sends back a message which is the balance. And the user says deposit some more, and the bank account sends back a message.0:48:59
And just like the random number generator, you'd say, gee, we would like to use set. We'd like to have balance be a piece of local state inside this bank account because we want to separate the state of0:49:08
the user from the state of the bank account. Well, that's the message-processing view. There's a stream view of that thing, which does the0:49:20
same thing without any set or side effects. And the idea is again we don't think about anything having0:49:29
local state. We think about the bank account as something that's going to process a stream of transaction requests.0:49:38
So think about this bank account not as something that goes message by message, but something that takes in a stream of transaction requests like maybe successive deposit amounts.0:49:49
1, 2, 2, 4, those might be successive amounts to deposit. And then coming out of it is the successive0:49:58
balances 1, 3, 5, 9. So we think of the bank account not as something that has state, but something that acts sort of on the infinite0:50:09
stream of requests. But remember, we've thrown away time. So what we can do is if the user's here, we can have this infinite stream of requests being generated one at a time0:50:21
coming from the user and this transaction stream coming back on a printer being printed one at a time.0:50:30
And if we drew a little line here, right there to the user, the user couldn't tell that this system doesn't have state.0:50:39
It looks just like the other one, but there's no state in there. And by the way, just to show you, here's an actual0:50:48
implementation of this-- we'll call it make-deposit-account because you can only deposit. It takes an initial balance and then a stream of deposits0:50:57
you might make. And what is it? Well, it's just cons-stream of the balance onto make a new account stream whose initial balance is the old balance0:51:08
plus the first thing in the deposit stream, and make-deposit-account works on the rest, which is the tail of the deposit stream.0:51:18
So there's sort of a very typical message-passing, object-oriented thing that's done without side effects at all.0:51:28
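That procedure might be transliterated into Python as a recursive generator (a sketch of the cons-stream version: the initial balance comes out first, then each successive balance, with no assignment anywhere):

```python
from itertools import islice

def make_deposit_account(balance, deposit_stream):
    # cons-stream of the balance onto a new account stream whose
    # initial balance is the old balance plus the first deposit
    yield balance
    deposits = iter(deposit_stream)
    try:
        first = next(deposits)
    except StopIteration:
        return                       # no more transaction requests
    yield from make_deposit_account(balance + first, deposits)

# deposits 1, 2, 2, 4 produce successive balances 1, 3, 5, 9
balances = list(islice(make_deposit_account(0, [1, 2, 2, 4]), 5))
assert balances == [0, 1, 3, 5, 9]
```

No variable is ever mutated: each "state change" is just a fresh recursive call carrying the new balance.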
There are very many things you can do this way. Well, can you do everything without assignment? Can everybody go over to purely functional languages?0:51:40
Well, we don't know, but there seem to be places where purely functional programming breaks down. Where it starts hurting is when you have things like0:51:50
this, but you also mix it up with the other things that we had to worry about, which are objects and sharing and two independent agents being the same. So here's a typical one: suppose you want to extend0:52:00
this bank account. So here's a bank account.0:52:12
Bank accounts take in a stream of transaction requests and put out streams of, say, balances or responses to that. But suppose you want to model the fact that this is a joint0:52:21
bank account between two independent people. So suppose there are two people, say, Bill and Dave,0:52:31
who have a joint bank account. How would you model this? Well, Bill puts out a stream of transaction requests, and0:52:40
Dave puts out a stream of transaction requests, and somehow, they have to merge into this bank account. So what you might do is write a little stream processing thing called merge, which sort of takes these, merges them0:52:58
together, produces a single stream for the bank account. Now they're both talking to the same bank account. That's all great, but how do you write merge? What's this procedure merge?0:53:09
You want to do something that's reasonable. Your first guess might be to say, well, we'll take alternate requests from Bill and Dave. But what happens if0:53:20
suddenly in the middle of this thing, Dave goes away on vacation for two years? Then Bill's sort of stuck. So what you want to do is-- well, it's hard to describe.0:53:29
What you want to do is what people call fair merge.0:53:38
The idea of fair merge is it sort of should do them alternately, but if there's nothing waiting here, it should take one twice. Notice I can't even say that without talking about time.0:53:51
So one of the other active research areas in functional languages is inventing little things like fair merge and0:54:00
maybe some others, which will take the places where I used to need side effects and objects and sort of hide them away in some very well-defined modules of the system so that0:54:11
all the problems of assignment don't sort of leak out all over the system but are captured in some fairly well-understood things.0:54:20
More generally, I think what you're seeing is that we're running across what I think is a very basic problem in computer science, which is how to define languages that0:54:29
somehow can talk about delayed evaluation, but also be able to reflect this view that there are objects in the world.0:54:38
How do we somehow get both? And I think that's a very hard problem. And it may be that it's a very hard problem that has almost nothing to do with computer science, that it really is a0:54:49
problem having to do with two very incompatible ways of looking at the world. OK, questions?0:55:17
AUDIENCE: You mentioned earlier that once you introduce assignment, the general rule for using the substitution model is you can't. Unless you're very careful, you can't.0:55:27
PROFESSOR: Right. AUDIENCE: Is there a set of techniques or a set of guidelines for localizing the effects of assignment so that0:55:37
the very careful becomes defined? PROFESSOR: I don't know. Let me think. Well, certainly, there was an assignment inside memo-proc,0:55:50
but that was sort of hidden away. It ended up not making any difference. Part of the reason for that is once this thing triggered that it had run and gotten an answer, that answer will never change.0:56:00
So that was sort of a one-time assignment. So one very general thing you can do is if you only do what's called a one-time assignment and never change anything, then you can do better.0:56:11
One of the problems in this merge thing, people have-- let me see if this is right. I think it's true that with fair merge, with just fair0:56:22
merge, you can begin effectively simulating assignment in the rest of the language. It seems like anything you do to go outside--0:56:33
I'm not quite sure that's true for fair merge, but it's true of a little bit more general things that people have been doing. So it might be that any little bit you put in, suddenly if0:56:42
they allow you to build arbitrary stuff, it's almost as bad as having assignment altogether. But that's an area that people are thinking about now.0:56:51
AUDIENCE: I guess I don't see the problem here with merge if I call Bill, if Bill is a procedure, then Bill is going0:57:00
to increment the bank account or build the list that's going to put in the next element. If I call Dave twice in a row, that will do that. I'm not sure where fair merge has to be involved.0:57:09
PROFESSOR: The problem is imagine these really as people. See, here I have the user who's interacting with this bank account. Put in a request, get an answer. Put in a request, get an answer. AUDIENCE: Right.0:57:18
PROFESSOR: But if the only way I can process request is to alternate them from two people-- AUDIENCE: Well, why would you alternate them? PROFESSOR: Why don't I? AUDIENCE: Yes. Why do you? PROFESSOR: Think of them as real people, right?0:57:27
This guy might go away for a year. And you're sitting here at the bank account window, and you can't put in two requests because it's waiting for this guy. AUDIENCE: Why does it have to be waiting for one?0:57:37
PROFESSOR: Because it's trying to compute a function. I have to define a function. Another way to say that is the answer to what comes out of this merge box is not a function of what goes in.0:57:51
Because, see, what would the function be? Suppose he puts in 1, 1, 1, 1, and he puts in 2, 2, 2, 2.0:58:03
What's the answer supposed to be? It's not good enough to say it's 1, 2, 1, 2, 1, 2. AUDIENCE: I understand. But when Bill puts in 1, 1 goes in. When Dave puts in 2 twice, 2 goes in twice.0:58:13
When Bill puts in-- PROFESSOR: Right. AUDIENCE: Why can't it be hooked to the time of the input-- the actual procedural-- PROFESSOR: Because I don't have time.0:58:23
See, all I can say is I'm going to define a function. I don't have time.0:58:32
There's no concept if it's going to alternate, except if nobody's there, it's going to wait a while for him. It's just going to say I have the stream of requests, the0:58:41
timeless infinite streams of all the requests that Dave would have made, right? And the timeless infinite stream of all the requests Bill would have made, and I want to operate on them.0:58:51
See, that's how this bank account is working. And the problem is that these poor people who are sitting at the bank account windows have the0:59:02
misfortune to exist in time. They don't see their infinite stream of all the requests they would have ever made. They're waiting now, and they want an answer.0:59:14
So if you're sitting there-- if this is the screen operation on some time-sharing system and it's working functionally, you want an answer then when you type the character.0:59:25
You don't want it to have to wait for everybody in the whole system to have typed one character before it can get around to service you. So that's the problem. I mean, the fact that people live in time, apparently.0:59:36
If they didn't, it wouldn't be a problem.0:59:49
AUDIENCE: I'm afraid I miss the point of having no time in this banking transaction. Isn't time very important? For instance, the sequence of events.1:00:00
If Dave takes out $100, then the timing sequence should be important. How do you treat transactions as streams?1:00:11
PROFESSOR: Well, that's the thing I'm saying. This is an example where you can't. You can't. The point is what comes out of here is simply not a function1:00:21
of the stream going in here and the stream going in here. It's a function of the stream going in here and the stream going in here and some kind of information about time, which is precisely what a normal-order language won't1:00:31
let you say. AUDIENCE: In order to bring this back into a more functional perspective, could we just explicitly time stamp1:00:40
all the inputs from Bill and Dave and define fair merge to just be the sort on those time stamps?1:00:49
PROFESSOR: Yeah, you can do that. You can do that sort of thing. Another thing you could say is imagine that really what this function is, is that it does a read every microsecond, and1:00:59
then if there's none there, that's considered an empty one. That's about equivalent to what you said. And yes, you can do that, but that's a kludge. So it's not only implementation1:01:09
we're worried about. We're worried about expressive power in the language, and what we're running across is a real mismatch between what we can say easily and what we'd like to say.1:01:18
AUDIENCE: It sounds like where we're getting hung up with that is the fact it expects one input from both Bill and Dave at the same time. PROFESSOR: It's not quite one, but it's anything you define.1:01:28
So you can say Dave can go twice as often, but if anything you predefine, it's not the right thing. You can't decide at some particular function of their1:01:39
input requests. Worse yet, I mean, worse yet, there are things that even merge can't do. One thing you might want to do that's even more general is1:01:49
suddenly you add somebody else to this bank account system. You go and you add John to this bank account system. And now there's yet another stream that's going to come1:01:58
into the picture at some time which we haven't prespecified. So that's something even fair merge can't do, and there are things called-- I forget--1:02:07
natagers or something. That's a generalization of fair merge to allow that. There's a whole sort of research discipline saying how far can you push this functional perspective by1:02:16
adding more and more mechanism? And how far does that go before the whole thing breaks down and you might as well have been using set anyway.1:02:25
AUDIENCE: You need to set him up on automatic deposit. [LAUGHTER]1:02:39
PROFESSOR: OK, thank you.0:00:00
Lecture 7A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:15
PROFESSOR: Well today we're going to learn about something quite amazing. We're going to understand what we mean by a program a little bit more profoundly than we have up till now.0:00:26
Up till now, we've been thinking of programs as describing machines. So for example, looking at this still store, we see here0:00:38
is a program for factorial. And what it is, is a character string description, if you will, of the wiring diagram of a0:00:49
potentially infinite machine. And we can look at that a little bit and just see the idea. That this is a sort of compact notation which says, if n is0:00:58
0, the result is one. Well here comes n coming into this machine, and if it's 0, then I control this switch in such a way that the switch allows the output to be one.0:01:09
Otherwise, it's n times factorial of n minus one. Well, I'm computing factorial of n minus one and multiplying that by n, and, in the case that it's not 0, this switch0:01:19
makes the output come from there. Of course, this is a machine with a potentially infinite number of parts, because factorial occurs within factorial, so we don't know how deep it has to be.0:01:31
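The program being narrated is the usual recursive factorial; as a character string rather than a wiring diagram (written here in Python for illustration):

```python
def factorial(n):
    # if n is 0, the result is 1; otherwise it's n times factorial of n - 1
    return 1 if n == 0 else n * factorial(n - 1)

assert factorial(6) == 720
```

The recursion is what makes the "machine" potentially infinite: factorial occurs within factorial, so the diagram's depth depends on the input.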
But that's basically what our notation for programs really means to us at this point. It's a character string description, if you will, of a0:01:41
wiring diagram that could also be drawn some other way. And, in fact, many people have proposed to me programming languages that look graphical like this. I'm not sure I believe there are many advantages.0:01:51
The major disadvantage, of course, is that it takes up more space on a page, and, therefore, it's harder to pack into a listing or to edit very well.0:02:01
But in any case, there's something very remarkable that can happen in the computation world, which is that you can have something called a universal machine.0:02:10
If we look at the second slide, what we see is a special machine called eval.0:02:21
There is a machine called eval, and I'm going to show it to you today. It's very simple. What is remarkable is that it will fit on the blackboard.0:02:33
However, eval is a machine which takes as input a description of another machine. It could take the wiring diagram of a0:02:42
factorial machine as input. Having done so, it becomes a simulator for the factorial0:02:52
machine such that, if you put a six in, out comes a 720. That's a very remarkable sort of machine.0:03:02
And the most amazing part of it is that it fits on a blackboard. By contrast, one could imagine in the analog electronics world a very different machine, a machine which also0:03:17
was, in some sense, universal, where you gave a circuit diagram as one of the inputs, for example, of this little low-pass filter, one-pole low-pass filter.0:03:28
And you can imagine that you could, for example, scan this out-- the scan lines are the signal that's describing what this0:03:37
machine is to simulate-- then the analog of that which is made out of electrical circuits, should configure itself into a filter that has the frequency response specified0:03:47
by the circuit diagram. That's a very hard machine to make, and, surely, there's no chance that I could put it on a blackboard. So we're going to see an amazing thing today.0:03:58
We're going to see, on the blackboard, the universal machine. And we'll see that among other things, it's extremely simple. Now, we're getting very close to the real spirit in the0:04:10
computer at this point. So I have to show a certain amount of reverence and respect, so I'm going to wear a suit jacket for the only time that you'll ever see me wear a suit jacket here.0:04:20
And I think I'm also going to put on an appropriate hat for the occasion. Now, this is a lecture which, I have to warn you--0:04:34
let's see, normally, people under 40 and who don't have several children are advised to be careful. If they're really worried, they should leave. Because0:04:44
there's a certain amount of mysticism that will appear here which may be disturbing and cause trouble in your minds. Well in any case, let's see, I wish to write for you the0:04:57
evaluator for Lisp. Now the evaluator isn't very complicated. It's very much like all the programs we've seen already.0:05:08
That's the amazing part of it. It's going to be-- and I'm going to write it right here-- it's a program called eval.0:05:22
And it's a procedure of two arguments in expression of an environment.0:05:31
And like every interesting procedure, it's a case analysis.0:05:40
But before I start on this, I want to tell you some things. The program we're going to write on the blackboard is ugly, dirty, disgusting, not the way I would write this as0:05:52
a professional. It is written with concrete syntax, meaning you've really got to use lots of CARs and CDRs, which is exactly what I told you not to do.0:06:02
That's on purpose in this case, because I want it to be small, compact, fit on the blackboard so you can get the0:06:11
whole thing. So I don't want to use long names like I normally use. I want to use CAR-CDR because it's short. Now, that's a trade-off.0:06:20
I don't want you writing programs like this. This is purely for an effect. Now, you're going to have to work a little harder to read it, but I'm going to try to make it clear0:06:29
as I'm writing it. I'm also-- this is a pretty much complete interpreter, but there's going to be room for putting in more things-- I'm going to leave out definition and assignment,0:06:39
just because they are not essential, for a mathematical reason I'll show you later and also they take up more space.0:06:51
But, in any case, what do we have to do? We have to do a dispatch which breaks the types of expressions up into particular classes.0:07:02
So that's what we're going to have here. Well, what expressions are there? Let's look at the kinds of expressions. We can have things like the numeral three. What do I want that to do?0:07:12
I can make choices, but I think right now, I want it to be a three. That's what I want. So that's easy enough. That means that if the thing is a number, the0:07:27
expression, then I want the expression itself as the answer. Now the next possibility is things that we0:07:37
represent as symbols. Examples of symbols are things like x, n, eval, number, x.0:07:47
What do I mean them to be? Those are things that stand for other things. Those are the variables of our language. And so I want to be able to say, for example, that x, for0:07:58
example, transforms to it's value which might be three. Or I might ask something like car.0:08:07
I want to have as its value-- be something like some procedure, which I don't know0:08:17
what is inside there, perhaps a machine language code or something like that. So, well, that's easy enough. I'm going to push that off on someone else.0:08:27
If something is a symbol, if the expression is a symbol, then I want the answer to be the result of looking up the0:08:38
expression in the environment. Now the environment is a dictionary which maps the0:08:52
symbol names to their values. And that's all it is. How it's done? Well, we'll see that later. It's very easy.0:09:01
It's easy to make data structures that are tables of various sorts. But it's only a table, and this is the access routine for some table.0:09:10
Well, the next thing, another kind of expression-- you have things that describe constants that are not numbers, like 'foo.0:09:20
Well, for my convenience, I want to syntactically transform that into a list structure which is, quote foo.0:09:35
A quoted object, whatever it is, is going to be actually an abbreviation, which is not part of the evaluator but happens somewhere else, an abbreviation for an expression0:09:46
that looks like this. This way, I can test for the type of the expression as being a quotation by examining the car of the expression.0:09:58
So I'm not going to worry about that in the evaluator. It's happening somewhere earlier in the reader or something. If the CAR of the expression is quote, then what0:10:18
I want, I want quote foo to itself evaluate to foo. It's a constant.0:10:27
This is just a way of saying that this evaluates to itself. What is that? That's the second of the list. It's the second element of the0:10:37
list. The second element of the list is its CADR. So I'm just going to write here, CADR.0:10:51
What else do we have here? We have lambda expressions, for example, lambda of x plus x y.0:11:04
Well, I'm going to have to have some representation for the procedure which is the value of a lambda expression. The procedure here is not the expression lambda x.0:11:13
That's the description of it, the textual description. However, what I'm going to expect to see here is something which contains an environment as one of its parts if I'm implementing a lexical language.0:11:27
And so what I'd like to see is some type flags. I'm going to have to be able to distinguish procedures later, procedures which were produced by lambdas, from ones0:11:37
that may be primitive. And so I'm going to have some flag, which I'll just arbitrarily call closure, just for historical reasons.0:11:47
Now, as to what parts of this are important: I'm going to need to know the bound variable list and the body. Well, that's the CDR of this, so it's going to be x and plus0:12:00
x y and some environment. Now this is not something that users should ever see, this is0:12:13
purely a representation, internally, for a procedure object. It contains a bound variable list, a body, and an0:12:22
environment, and some type tag saying, I am a procedure. I'm going to make one now. So if the CAR of the expression is quote lambda,0:12:43
then what I'm going to put here is-- I'm going to make a list of closure, the CDR of the0:12:58
procedure description, which is everything except the lambda,0:13:07
and the current environment. This implements the rule for environments in the environment model. It has to do with construction of procedures from lambda0:13:17
expressions. The environment that was around at the time the evaluator encountered the lambda expression is the environment where the resulting procedure interprets0:13:30
its free variables. So that's part of that. And so we have to capture that environment as part of the procedure object.0:13:39
And we'll see how that gets used later. There are also conditional expressions of things like COND of say, p one, e one, p two, e two.0:13:54
Where this is a predicate, a predicate is a thing that is either true or false, and the expression to be evaluated if the predicate is true.0:14:03
A set of clauses, if you will, that's the name for such a thing. So I'm going to put that somewhere else. We're going to worry about that in another piece of code.0:14:12
So EQ-- if the CAR of the expression is COND, then I'm going to do0:14:24
nothing more than EVCOND of the CDR of the expression.0:14:34
That's all the clauses in the environment that I'm given. Well, there's one more case, an arbitrary thing like the sum0:14:46
of x and three, where this is an operator applied to operands, and there's nothing special about it.0:14:56
It's not one of the special cases, the special forms. These are the special forms.0:15:09
And if I were writing here a professional program, again, I would somehow make this data directed. So there wouldn't be a sequence of conditionals here, there'd be a dispatch on some bits if I were trying to do0:15:20
this in a more professional way. So that, in fact, I can add to the thing without changing my program much. Also, for example, it would run fast, but I'm not worried0:15:29
about that. Here we're trying to look at this in its entirety. So it's else. Well, what do we do?0:15:38
In this case, I have to somehow do an addition. Well, I could find out what the plus is. I have to find out what the x and the three are.0:15:50
And then I have to apply the result of finding what the plus is to the result of finding out what the x and the three are. We'll have a name for that.0:15:59
So I'm going to apply the result of evaluating the CAR0:16:11
of the expression-- the car of the expression is the operator-- in the environment given.0:16:20
So evaluating the operator gets me the procedure. Now I have to evaluate all the operands to get the arguments. I'll call that EVLIST, the CDR of the operands, of the0:16:34
expression, with respect to the environment. EVLIST will come up later--0:16:43
EVLIST, apply, COND pair, COND, lambda, define. So that what you are seeing here now is pretty much all0:16:53
there is in the evaluator itself. It's the case dispatch on the type of the expression with the default being a general application or a combination.0:17:17
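The dispatch just described can be collected into one piece of Scheme. This is a reconstruction of the blackboard code, not a verbatim transcription; the helpers LOOKUP, EVCOND, EVLIST, and APPLY are written out later in the lecture, and eval and apply deliberately shadow the built-in ones, as on the board.

```scheme
;; Reconstructed blackboard EVAL: a case dispatch on the type of the
;; expression, with the default being a general application.
(define (eval exp env)
  (cond ((number? exp) exp)                      ; numbers evaluate to themselves
        ((symbol? exp) (lookup exp env))         ; variables: look up in the environment
        ((eq? (car exp) 'quote) (cadr exp))      ; (quote foo) evaluates to foo
        ((eq? (car exp) 'lambda)                 ; make a procedure object,
         (list 'closure (cdr exp) env))          ;   capturing the current environment
        ((eq? (car exp) 'cond)
         (evcond (cdr exp) env))                 ; conditionals handled elsewhere
        (else
         (apply (eval (car exp) env)             ; operator -> procedure
                (evlist (cdr exp) env)))))       ; operands -> arguments
```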
Now there are lots of things we haven't defined yet. Let's just look at them and see what they are. We're going to have to do this later, evcond. We have to write apply.0:17:27
We're going to have to write EVLIST. We're going to write LOOKUP. I think that's everything, isn't it? Everything else is something which is simple, or primitive, or something like that.0:17:38
And, of course, we could add many more special forms here, but that would be a bad idea in general in a language. You make a language very complicated by putting a lot of things in there.0:17:47
The number of reserved words that should exist in a language should be no more than a person could remember on his fingers and toes. And I get very upset with languages which have hundreds0:17:56
of reserved words. But that's where the reserved words go. Well, now let's get to the next part of0:18:06
this, the kernel, apply. What else is this doing? Well, apply's job is to take a procedure and apply it to its0:18:17
arguments after both have been evaluated to come up with a procedure and the arguments, rather than the operator symbols and the operand symbols, whatever they are-- symbolic expressions.0:18:33
So we will define apply to be a procedure of two arguments, a procedure and arguments.0:18:47
And what does it do? It does nothing very complicated. It's got two cases. Either the procedure is primitive--0:19:02
And I don't know exactly how that is done. It's possible there's some type information just like we made closure for, here, being the description of the type of0:19:14
a compound thing-- probably so. But it is not essential how that works, and, in fact, it turns out, as you probably know or have deduced, that you0:19:24
don't need any primitives anyway. You can compute anything without them, because of some of the lambda things that I've been playing with.0:19:33
But it's nice to have them. So here we're going to do some magic which I'm not going to explain. Go to machine language, apply primop.0:19:42
Here's how it adds. Execute an add instruction. However, the interesting part of a language is the glue by0:19:52
which the primitives are glued together. So let's look at that. Well, the other possibility is that this is a compound made0:20:01
up by executing a lambda expression, this is a compound procedure. Well, we'll check its type.0:20:10
If it is closure, if it's one of those, then I have to do an0:20:23
eval of the body. The way I do this, the way I deal with this at all, is the way I evaluate the application of a procedure to its arguments, is by evaluating the body of the procedure in0:20:34
the environment resulting from extending the environment of the procedure with the bindings of the formal parameters of the procedure to the arguments that0:20:43
were passed to it. That was a long sentence. Well that's easy enough.0:20:52
Now there's going to be a lot of CAR-CDRing. I have to get the body of the procedure. Where's the body of the procedure in here?0:21:02
Well, here's the CAR, and here's the CDR, the whole rest of this. So here's the CADR. And so I see, what I have here is that the body is the second element of the second0:21:11
element of the procedure. So it's the CADR of the CADR or the CADADR. It's the C-A-D-A-D-R, CADADR of the procedure.0:21:30
To evaluate the body in the result of binding-- that's making up more environment-- well, I need the formal0:21:39
parameters of the procedure. What is that? That's the CAR of the CADR. It's horrible, isn't it?0:21:52
--of the procedure. Bind that to the arguments that were passed in the environment, which is passed also as part of the procedure.0:22:04
Well, that's the CAR of the CDR of the CDR of this, the CADDR, of the procedure.0:22:20
Bind, eval, pair, COND, lambda, define-- Now, of course, if I were being really a neat character,0:22:29
and I was being very careful, I would actually put an extra case here for checking for certain errors like, did you try to apply one to an argument?0:22:39
You get an undefined procedure type. So I may as well do that anyway. --else, some sort of error, like that.0:22:57
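Collected, the APPLY being described looks like this. Again this is a reconstruction; primitive? and apply-primop stand for the unexplained "magic" that drops to machine language.

```scheme
;; Reconstructed blackboard APPLY.
(define (apply proc args)
  (cond ((primitive? proc)
         (apply-primop proc args))               ; magic: e.g. execute an add instruction
        ((eq? (car proc) 'closure)
         ;; proc looks like (closure (bound-vars body) env)
         (eval (cadadr proc)                     ; the body
               (bind (caadr proc)                ; the formal parameters
                     args
                     (caddr proc))))             ; the captured environment
        (else (error "Undefined procedure type" proc))))
```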
Now, of course, again, in some sort of more real system, written for professional reasons, this would be written0:23:06
with a case analysis done by some sort of dispatch. Over here, I would probably have other cases like, is this compiled code?0:23:16
It's very important. I might have distinguished the kind of code that's produced by a directly evaluating a lambda in interpretation from code that was produced by somebody's compiler or0:23:25
something like that. And we'll talk about that later. Or is this a piece of Fortran program I have to go off and execute? It's a perfectly possible thing, at this point, to do that. In fact, in this concrete syntax evaluator I'm writing0:23:36
here, there's an assumption built in that this is Lisp, because I'm using CARs and CDRs. CAR means the operator, and CDR means the operand.0:23:46
In the text, there is an abstract syntax evaluator for which these could be-- these are given abstract names like operator, and operand, and all these other things are like that.0:23:56
And, in that case, you could reprogram it to be ALGOL with no problem. Well, here we have added another couple of things that0:24:07
we haven't defined. I don't think I'll worry about these at all, however, this one will be interesting later.0:24:17
Let's just proceed through this and get it done. There's only two more blackboards so it can't be very long.0:24:27
It's carefully tailored to exactly fit. Well, what do we have left? We have to define EVLIST, which is over here. And EVLIST is nothing more than a map down a bunch of0:24:40
operands producing arguments. But I'm going to write it out. And one of the reasons I'm going to write this out is for a mystical reason, which is I want to make this evaluator so0:24:51
simple that it can understand itself. I'm going to really worry about that a little bit.0:25:00
So let's write it out completely. See, I don't want to worry about whether or not the thing can pass functional arguments. The evaluator is not going to use them. The evaluator is not going to produce functional values.0:25:10
So even if there were a different, alternative language that were very close to this, this evaluates a complex language like Scheme which does allow procedural0:25:19
arguments, procedural values, and procedural data. But even if I were evaluating ALGOL, which doesn't allow0:25:28
procedural values, I could use this evaluator. And this evaluator is not making any assumptions about that. And, in fact, if this evaluator were to be restricted to not being able to do that, it wouldn't matter, because it0:25:37
doesn't use any of those clever things. So that's why I'm arranging this to be super simple. This is sort of the kernel of all possible language evaluators.0:25:47
How about that? Evlist-- well, what is it? It's the procedure of two arguments, l and an0:25:56
environment, where l is a list such that if the list of0:26:06
arguments is the empty list, then the result is the empty list. Otherwise, I want to cons up the result of0:26:21
evaluating the CAR of the list of operands in the0:26:31
environment. So I want the first operand evaluated, and I'm going to make a list of the results by CONSing that onto the result0:26:40
of this EVLISTing as a CDR recursion, the CDR of the list relative to the same environment.0:26:53
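Written out, EVLIST is just the map the professor describes, done as an explicit CDR recursion (reconstructed from the board):

```scheme
;; Reconstructed EVLIST: evaluate each operand, CONSing up the arguments.
(define (evlist l env)
  (cond ((eq? l '()) '())
        (else
         (cons (eval (car l) env)
               (evlist (cdr l) env)))))
```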
Evlist, cons, else, COND, lambda, define-- And I have one more that I want to put on the blackboard.0:27:03
It's the essence of this whole thing. And there's some sort of next layer down.0:27:14
Conditionals-- conditionals are the only thing left that are sort of substantial. Then below that, we have to worry about things like lookup and bind, and we'll look at that in a second.0:27:25
But of the substantial stuff at this level of detail, next important thing is how you deal with conditionals. Well, how do we have a conditional thing?0:27:37
It's a procedure of a set of clauses and an environment.0:27:47
And what does it do? It says, if I've no more clauses, well, I have to give0:28:03
this a value. It could be that it was an error. Supposing it runs off the end of a conditional-- it's pretty arbitrary. It's up to me as programmer to choose what I want to happen.0:28:13
It's convenient for me, right now, to write down that this has a value which is the empty list; it doesn't matter. For error checking, some people might prefer something else.0:28:23
But the interesting things are the following ones. If I've got an else clause-- You see, if I have a list of clauses, then each clause is a0:28:34
list. And so the predicate part is the CAAR of the clauses.0:28:43
It's the CAR, which is the first part of the first clause in the list of clauses. If it's an else, then it means I want my result of the0:28:55
conditional to be the result of evaluating the matching expression. So I eval the CADR. So this is the first clause, the second0:29:10
element of it, CADAR-- CADAR of a CAR-- of the clauses, with respect to the environment.0:29:26
Now the next possibility is more interesting. If the first predicate in the predicate list is not an else-- if it's not the0:29:38
word else-- let's write down what happens if it's a false thing. If the result of evaluating the first0:29:49
predicate, the clauses-- respect the environment, if that evaluation yields false,0:30:01
then it means I want to look at the next clause. So I want to discard the first one. So we just go around the loop: evcond of the CDR of the clauses0:30:15
relative to that environment. And otherwise, I had a true clause, in which case, what I0:30:27
want is to evaluate the CADAR of the clauses relative to0:30:40
that environment. Boy, it's almost done.0:30:51
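Putting the clause-walking together (a reconstruction; false? is assumed to test for the false value):

```scheme
;; Reconstructed EVCOND: walk the clauses of a conditional.
(define (evcond clauses env)
  (cond ((eq? clauses '()) '())                  ; ran off the end: arbitrary choice
        ((eq? (caar clauses) 'else)              ; else clause: take its expression
         (eval (cadar clauses) env))
        ((false? (eval (caar clauses) env))      ; false predicate: discard the clause
         (evcond (cdr clauses) env))
        (else                                    ; true predicate: take its expression
         (eval (cadar clauses) env))))
```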
It's quite close to done. I think we're going to finish this part off. So just buzzing through this evaluator, but so far you're seeing almost everything.0:31:01
Let's look at the next transparency here. Here is bind.0:31:11
Bind is for making more table. And what we are going to do here is make a-- we're going to make a new frame for an environment structure.0:31:22
The environment structure is going to be represented as a list of frames. So given an existing environment structure, I'm going to make a new environment structure by0:31:32
consing a new frame onto the existing environment structure, where the new frame consists of the result of pairing up the variables, which are the bound variables0:31:41
of the procedure I'm applying, to the values which are the arguments that were passed to that procedure. This is just making a list, adding a new element to our0:31:53
list of frames, which is an environment structure, to make a new environment. Where pair-up is very simple. Pair-up is nothing more than if I have a list of variables0:32:04
and a list of values, well, if I run out of variables and if I run out of values, everything's OK. Otherwise, I've given too many arguments. If I've not run out of variables, but I've run out of0:32:15
values, then I have too few arguments. And in the general case, where I don't have any errors, and I'm not done, then I really am just adding a new pair of the0:32:26
first variable with the first argument, the first value, onto a list resulting from pairing-up the rest of the0:32:37
variables with the rest of the values. Lookup is of course equally simple.0:32:46
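The BIND and PAIR-UP just described, written out (a reconstruction of the transparency):

```scheme
;; BIND makes a new environment by consing a new frame onto an old
;; environment structure, which is a list of frames.
(define (bind vars vals env)
  (cons (pair-up vars vals) env))

;; PAIR-UP pairs each bound variable with the corresponding argument.
(define (pair-up vars vals)
  (cond ((eq? vars '())
         (cond ((eq? vals '()) '())
               (else (error "Too many arguments"))))
        ((eq? vals '()) (error "Too few arguments"))
        (else
         (cons (cons (car vars) (car vals))
               (pair-up (cdr vars) (cdr vals))))))
```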
If I have to look up a symbol in an environment, well, if the environment is empty, then I've got an unbound variable. Otherwise, what I'm going to do is use a special pair list0:32:59
lookup procedure, which we'll have very shortly, of the symbol in the first frame of the environment. Since I know the environment is not empty, it must have a first frame.0:33:09
So I lookup the symbol in the first frame. That becomes the value cell here. And then, if the value cell is empty, if there is no such0:33:19
value cell, then I have to continue and look at the rest of the frames. It means there was nothing found there. So a property of ASSQ is that it returns emptiness if it0:33:29
doesn't find something. But if it did find something, then I'm going to use the CDR of the value cell here, which is the thing that was the pair0:33:38
consisting of the variable and the value. So the CDR of it is the value part. Finally, ASSQ is something you've probably seen already.0:33:47
ASSQ takes a symbol and a list of pairs, and if the list is empty, it's empty. If the symbol is the first thing in the list--0:33:57
That's an error. That should be CAAR, C-A-A-R. Everybody note that.0:34:07
Right there, OK? And in any case, if the symbol is the CAAR of the A list,0:34:17
then I want the first, the first pair, in the A list. So, in other words, if this is the key matching the right entry,0:34:26
otherwise, I want to look up that symbol in the rest. Sorry for producing a bug, bugs appear.0:34:35
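LOOKUP and ASSQ, with the on-the-board bug already repaired (CAAR, not CAR); again a reconstruction:

```scheme
;; LOOKUP searches the frames of an environment in order.
(define (lookup sym env)
  (cond ((eq? env '()) (error "Unbound variable" sym))
        (else
         ((lambda (vcell)
            (cond ((eq? vcell '())
                   (lookup sym (cdr env)))       ; nothing here: try the next frame
                  (else (cdr vcell))))           ; the CDR of the pair is the value
          (assq sym (car env))))))

;; ASSQ searches one frame, a list of (variable . value) pairs,
;; returning emptiness if it finds nothing.
(define (assq sym alist)
  (cond ((eq? alist '()) '())
        ((eq? sym (caar alist)) (car alist))
        (else (assq sym (cdr alist)))))
```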
Well, in any case, you're pretty much seeing the whole thing now. It's a very beautiful thing, even though it's written in an0:34:45
ugly style, being the kernel of every language. I suggest that we just-- let's look at it for a while.0:34:56
[MUSIC PLAYING]0:35:49
Are there any questions?0:36:01
Alright, I suppose it's time to take a small break then. [MUSIC PLAYING]0:36:56
OK, now we're just going to do a little bit of practice understanding what it is we've just shown you. What we're going to do is go through, in detail, an0:37:05
evaluation by informally substituting through the interpreter. And since we have no assignments or definitions in0:37:14
this interpreter, we have no possible side effects, and so we can do substitution with impunity and not worry0:37:23
about results. So the particular problem I'd like to look at is an interesting one. It's the evaluation of quote, open, open, open, lambda of x,0:37:41
lambda of y plus x y, lambda, lambda, applied to three,0:37:55
applied to four, in some global environment which I'll call e0.0:38:04
So what we have here is a procedure of one argument x, which produces as its value a procedure of one argument y, which adds x to y.0:38:14
We are applying the procedure of one argument x to three. So x should become three. And the result of that should be procedure of one argument0:38:23
y, which will then apply to 4. And there is a very simple case, they will then add those results.0:38:34
And now in order to do that, I want to make a very simple environment model. And at this point, you should already have in your mind the environments that this produces.0:38:44
But we're going to start out with a global environment, which I'll call e0, which is that.0:38:56
And it's going to have in it things, definitions for plus, and times, and--0:39:07
using Greek letters, isn't that interesting, for the objects-- and minus, and quotient, and CAR, and CDR, and CONS, and0:39:27
EQ, and everything else you might imagine in a global environment. It's got something there for each of those things, something the machine is born with, that's e0.0:39:39
Now what does it mean to do this evaluation? Well, we go through the set of special forms. First of all, this is not a number.0:39:48
This is not a symbol. Gee, it's not a quoted expression. This is a quoted expression, but that's not what I'm0:40:00
interested in. The question is whether or not the thing which is quoted is a quoted expression. I'm evaluating an expression. This just says it's this particular expression.0:40:11
This is not a quoted expression. It's not a thing that begins with lambda. It's not a thing that begins with COND.0:40:22
Therefore, it's an application of an operator to operands. It's a combination. The combination thus has this as the operator and this as0:40:35
the operands. Well, that means that what I'm going to do is transform this into apply of eval, of quote, open, open lambda of0:40:54
x, lambda of y-- I'm evaluating the operator-- plus x y, in the environment, also e0, with the operands0:41:13
that I'm going to apply this to, the arguments being the result of EVLIST, the list containing four, in e0.0:41:29
I'm using this funny notation here for e0 because this should be that environment. I haven't a name for it, because I have no environment0:41:38
to name it in. So this is just a representation of what would be a quoted expression, if you will.0:41:47
The data structure, which is the environment, goes there. Well, that's what we're seeing here. Well in order to do this, I have to do this, and0:41:57
I have to do that. Well this one's easy, so why don't we do that one first. This turns into apply of eval-- just0:42:07
copying something now. Most of the substitution rule is copying.0:42:18
So I'm going to not say the words when I copy, because it's faster. And then the EVLIST is going to turn into a cons, of eval,0:42:34
of four, in e0-- because it was not an empty list-- onto the result of EVLISTing, on the empty list, in e0.0:42:52
And I'm going to start leaving out steps soon, because it's going to get boring. But this is basically the same thing as apply, of eval--0:43:07
I'm going to keep doing this-- the lambda of x, the lambda of y, plus xy, 3, close, e0.0:43:20
I'm a pretty good machine. Well, eval of four-- that meets the question, is it a number? So that's cons, cons of 4.0:43:35
And EVLIST of the empty list is the empty list, so that's this. And that's very simple to understand, because that means0:43:46
the list containing four itself. So this is nothing more than apply of eval, quote, open,0:43:56
open, lambda of x, lambda of y, plus x y, three applied to,0:44:06
e0, applied to the list four-- bang. So that's that step.0:44:18
Now let's look at the next, more interesting thing. What do I do to evaluate that? Evaluating this means I have to evaluate--0:44:27
Well, it's not. It's nothing but an application. It's not one of the special things. It's the application of this operator, which we see here--0:44:37
here's the operator-- applied to this operands, that combination.0:44:46
But we know how to do that, because that's the last case of the conditional. So substituting in for this evaluation, it's apply of eval0:44:56
of the operator in the EVLIST of the operands. Well, it's apply, of apply, of eval, of quote, open, lambda0:45:12
of x, lambda of y, plus x y, lambda, lambda,0:45:23
in environment e0. I'm going to short-circuit the evaluation of the operands,0:45:32
because they're the same as they were before. I got a list containing three, apply that, and apply that to four.0:45:42
Well let's see. Eval of a lambda expression produces a procedure object.0:45:52
So this is apply, of apply, of the procedure object closure,0:46:04
which contains the body of the procedure, x, which is lambda-- which binds x [UNINTELLIGIBLE] the internals of the body, it returns the procedure of one0:46:17
argument y, which adds x to y. Environment e0 is now captured in it, because this was0:46:27
evaluated with respect to e0. e0 is part now of the closure object. Apply that to open, three, close, apply, to open, 4,0:46:40
close, apply. So going from this step to this step meant that I made up0:46:50
a procedure object which captured in it e0 as part of the procedure object. Now, we're going to pass those to apply. We have to apply this procedure0:47:00
to that set of arguments. Well, but that procedure is not primitive. It's, in fact, a thing which has got the tag closure, and,0:47:10
therefore, what we have to do is do a bind. We have to bind. A new environment is made at this point, which has as its0:47:21
parent environment the one over here, e0, that environment.0:47:30
And we'll call this one, e1. Now what's bound in there? x is bound to three. So I have x equal three.0:47:41
That's what's in there. And we'll call that e1. So what this transforms into is an eval of the body of0:47:51
this, which is this, the body of that procedure, in the environment that you just saw.0:48:00
So that's an apply, of eval, quote, open, lambda of y, plus0:48:11
x y-- the body-- in e1.0:48:20
And apply the result of that to four, open, close, 4-- list of arguments. Well, that's sensible enough because evaluating a lambda, I0:48:31
know what to do. That means I apply, the procedure which is closure,0:48:43
binds one argument y, adds x to y, with e1 captured in it.0:48:55
And you should really see this. I somehow manufactured a closure. I should've put this here. There was one over here too.0:49:06
Well, there's one here now. I've captured e1, and this is the procedure of one argument y, whatever this is.0:49:17
That's what that is there, that closure. I'm going to apply that to four.0:49:30
Well, that's easy enough. That means I have to make a new environment by copying0:49:39
this pointer, which was the pointer of the procedure, which binds y equal 4 with that environment.0:49:49
And here's my new environment, which I'll call e2. And, of course, this application then is evaluate0:49:58
the body in e2. So this is eval, the body, which is plus x y, in the0:50:10
environment e2. But this is an application, so this is the apply, of eval,0:50:22
plus in e2, an EVLIST, quote, open, x y, in e2.0:50:44
Well, but let's see. That is apply, the object which is a result of that and plus.0:50:54
So here we are in e2. Plus is not here, it's not here-- oh yes, but it's here as some primitive operator. So it's the primitive operator for addition.0:51:08
Apply that to the result of evaluating x and y in e2. But we can see that x is three and y is four.0:51:18
So that's a three and four, here. And that magically produces for me a seven.0:51:30
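The whole trace just performed is the evaluation of a single expression. In ordinary Scheme notation, with e0 the global environment assumed to supply +, it reads:

```scheme
;; x is bound to 3 in e1; the inner lambda captures e1; y is bound to 4
;; in e2; (+ x y) is finally evaluated in e2.
(eval '(((lambda (x) (lambda (y) (+ x y))) 3) 4) e0)   ; => 7
```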
I wanted to go through this so you would see, essentially, one important ingredient, which is what's being passed around, and who owns what, and what his job is.0:51:40
So what do we have here? We have eval, and we have apply, the two main players.0:51:49
And there is a big loop that goes around like this, which is: eval produces a procedure and0:52:00
arguments for apply. Now some things eval could do by itself.0:52:09
Those are little self things here. They're not interesting. Also eval evaluates all of the arguments, one after another. That's not very interesting. Apply can apply some procedures like plus, not very0:52:21
interesting. However, if apply can't apply a procedure like plus, it produces an expression and environment for eval.0:52:35
The procedure and arguments wrap up essentially the state of a computation, and so, certainly, do the expression and environment. And so what we're actually passing around is not the0:52:45
complete state, because it doesn't say who wants the answers. But what we're going to see-- it's always got something like an expression and environment, or a procedure and arguments, as0:52:56
the main loop that we're going around. There are minor little sub-loops like eval through EVLIST, or eval through evcond, or apply through a0:53:11
primitive apply. But they're not the essential things. So that's what I wanted you to see.0:53:21
Are there any questions? Yes. AUDIENCE: I'm trying to understand how x got down to0:53:32
three instead of four. At the early part of the-- PROFESSOR: Here.0:53:41
You want to know how x got down to three? AUDIENCE: Because x is the outer procedure, and x and y are the inner procedure.0:53:51
PROFESSOR: Fine. Well, I was very careful and mechanical. First of all, I should write those procedures again for you, pretty printed.0:54:00
First order of business, because you're probably not reading them well. So I have here that procedure of-- was it x over there--0:54:11
which is-- value of that procedure of y, which adds x to y, lambda,0:54:20
lambda, applied that to three, takes the result of that, and applied that to four. Is that not what I wrote? Now, you should immediately see that here is an0:54:34
application-- let me get a white piece of chalk-- here is an application, a combination.0:54:44
That combination has this as the operator and this as the operand. The three is going in for the x here.0:54:54
The result of this is a procedure of one argument y, which gets applied to four. So you just weren't reading the expression right.0:55:04
The way you see that over here is that here I have the actual procedure object, x.0:55:13
It's getting applied to three, the list containing three. What I'm left over with is something which gets applied to four.0:55:24
Are there any other questions? Time for our next small break then. Thank you.0:55:33
[MUSIC PLAYING]0:56:08
Let's see, at this point, you should be getting the feeling, what's this nonsense this Sussman character is feeding me?0:56:20
There's an awful lot of strange nonsense here. After all, he purported to explain to me Lisp, and he wrote me a Lisp program on the blackboard.0:56:30
The Lisp program was intended to be the interpreter for Lisp, but you need a Lisp interpreter in order to understand that program. How could that program have told me anything there is to0:56:41
be known about Lisp? How is that not completely vacuous? It's a very strange thing.0:56:50
Does it tell me anything at all? Well, you see, the whole thing is sort of like these Escher's0:56:59
hands that we see on this slide. Yes, eval and apply each sort of draw each other and0:57:11
construct the real thing, which can sit out and draw itself. Escher was a very brilliant man, he just didn't know the names of these spirits.0:57:23
Well, what I'm going to do now is try to convince you that this all means something, and, as an aside,0:57:33
I'm going to show you why you don't need definitions. Just turns out that that sort of falls out, why definitions are not essential in a mathematical sense for doing0:57:42
all the things we need to do for computing. Well, let's see here. Consider the following small program, what does it mean?0:57:54
This is a program for computing exponentials.0:58:07
The exponential of x to the nth power is if--0:58:16
n is zero, then the result is one. Otherwise, I want the product of x and the result of0:58:29
exponentiating x to the n minus one power.0:58:42
I think I got it right. Now this is a recursive definition. It's a definition of the exponentiation procedure in0:58:53
terms of itself. And, as it has been mentioned before, your high school geometry teacher probably gave you a hard time0:59:03
about things like that. Was that justified? Why does this self referential definition make any sense?0:59:13
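The self-referential definition in question, written as an ordinary Scheme procedure:

```scheme
;; Exponentiation defined in terms of itself.
(define (expt x n)
  (cond ((= n 0) 1)
        (else (* x (expt x (- n 1))))))
```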
Well, first of all, I'm going to convince you that your high school geometry teacher was telling you nonsense. Consider the following set of definitions here.0:59:24
x plus y equals three, and x minus y equal one.0:59:33
Well, gee, this tells you x in terms of y, and this one tells you y in terms of x, presumably. And yet this happens to have a unique solution in x and y.0:59:55
However, I could also write two x plus two y is six.1:00:06
These two equations have an infinite number of solutions.1:00:15
And I could write you, for example, x minus y equal 2, and these two equations have no solutions.1:00:29
Well, I have here three sets of simultaneous linear equations, this set, this set, and this set.1:00:39
But they have different numbers of solutions. The number of solutions is not in the form of the equations. All three sets have the same form.1:00:48
The number of solutions is in the content. I can't tell by looking at the form of a definition whether it makes sense, only by its detailed content.1:00:59
What are the coefficients, for example, in the case of linear equations? So I shouldn't expect to be able to tell looking at something like this, from some simple things like, oh yes,1:01:11
EXPT is the solution of this recursion equation. Expt is the procedure which if substituted in here,1:01:22
gives me EXPT back. I can't tell, looking at this form, whether or not there's a single, unique solution for EXPT, an infinite number of1:01:33
solutions, or no solutions. It's got to do with how it counts and things like that, the details. And it's harder in programming than in linear algebra.1:01:42
There aren't too many theorems about it in programming. Well, I want to rewrite these equations a little bit, these over here.1:01:53
Because what we're investigating is equations like this. But I want to play a little with equations like this that we understand, just so we get some insight into1:02:02
this kind of question. We could rewrite our equations here, say these two, the ones that are interesting, as x equals three minus y, and y1:02:17
equals x minus one. What do we call this transformation? This is a linear transformation, t.1:02:29
Then what we're getting here is an equation x y equals t of x y.1:02:42
What am I looking for? I'm looking for a fixed point of t. The solution is a fixed point of t.1:03:01
So the methods we should have for looking for solutions to equations, if I can do it by fixed points, might be applicable.1:03:10
If I have a means of finding a solution to an equation by fixed points-- just, might not work-- but it might be applicable to investigating solutions of1:03:21
equations like this. But what I want you to feel is that this is an equation.1:03:30
It's an expression with several instances of various names which puts a constraint on the name, saying what that1:03:39
name could have as its value, rather than some sort of mechanical process of substitution right now. This is an equation which I'm going to try to solve.1:03:51
Well, let's play around and solve it. First of all, I want to write down the function which corresponds to t.1:04:00
First I want to write down the function which corresponds to t whose fixed point is the answer to this question.1:04:11
Well, let's consider the following procedure f. I claim it computes that function. f is that procedure of one argument g, which is that1:04:26
procedure of two arguments x and n, which has the property that if n is zero, then the result1:04:42
is one, otherwise, the result is the product of x and g,1:04:56
applied to x and n minus one. g, times, else, cond, lambda, lambda--1:05:11
Here f is a procedure, which if I had a solution to that equation, if I had a good exponentiation procedure, and1:05:23
I applied f to that procedure, then the result would be a good exponentiation procedure.1:05:37
Because, what does it do? Well, supposing g were a good exponentiation procedure, well then this would produce, as its value, a1:05:48
procedure of two arguments x and n, such that if n were 0, the result would be one, which is certainly true of exponentiation. Otherwise, it will be the result of multiplying x by the1:05:57
exponentiation procedure given to me with x and n minus one as arguments. So if this computed the correct exponentiation for n minus one, then this would be the correct exponentiation for1:06:10
exponent n, so this would have been the right exponentiation procedure. So what I really want to say here is E-X-P-T is a fixed1:06:26
point of f.1:06:37
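To make the fixed-point claim concrete, here is f in Python, with Python's built-in exponentiation standing in for a hypothetical "good" procedure; applying f to it gives back a procedure that agrees with it:

```python
def f(g):
    # the functional on the board: given any approximation g to
    # exponentiation, produce a (one-step-better) approximation
    def improved(x, n):
        return 1 if n == 0 else x * g(x, n - 1)
    return improved

good = lambda x, n: x ** n   # a correct exponentiation procedure
# f(good) agrees with good: good behaves like a fixed point of f
f(good)(2, 10)               # 1024, same as good(2, 10)
```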
Now our problem is there might be more than one fixed point. There might be no fixed points. I have to go hunting for the fixed points.1:06:48
Got to solve this equation. Well there are various ways to hunt for fixed points. Of course, the one we played with at the beginning of this1:06:58
term worked for cosine. Go into radians mode on your calculator and push cosine,1:07:09
and just keep doing it, and you get to some number which is about 0.73 or 0.74. I can't remember which.1:07:22
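That calculator experiment, replayed in Python:

```python
import math

x = 1.0                 # any starting point in radians will do
for _ in range(100):
    x = math.cos(x)     # keep pushing the cosine button
# x settles near 0.739, the fixed point where cos(x) == x
```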
By iterating a function, whose fixed point I'm searching for, it is sometimes the case that that function will converge in1:07:32
producing the fixed point. I think we luck out in this case, so let's look for it. Let's look at this slide.1:07:48
Consider the following sequence of procedures. e0 over here is the procedure which does nothing at all.1:08:02
It's the procedure which produces an error for any arguments you give it. It's basically useless.1:08:14
Well, however, I can make an approximation. Let's consider it the worst possible approximation to exponentiation, because it does nothing.1:08:26
Well, supposing I substituted e0 for g by calling f, as you see over here on e0.1:08:37
So you see over here, have e0 there. Then gee, what's e1? e1 is a procedure which exponentiates things to the 0th1:08:47
power, with no trouble. It gets the right answer, anything to the zero is one, and it makes an error on anything else.1:08:57
Well, now what if I take e1 and I substitute it for g by1:09:06
calling f on e1? Oh gosh, I have here a procedure of two arguments.1:09:15
Now remember e1 was appropriate for taking exponentiations of 0, for raising to the 0 exponent.1:09:24
So here, if n is 0, the result is one, so this guy is good for that too. However, I can use something for raising to the 0th power to multiply it by x to raise something to the first power.1:09:35
So e2 is good for both power 0 and one. And e3 is constructed from e2 in the same way.1:09:47
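Running the chain of approximations out in Python (a self-contained sketch: f is the improver from the board, e0 the know-nothing procedure; the names follow the lecture):

```python
def f(g):
    def improved(x, n):
        return 1 if n == 0 else x * g(x, n - 1)
    return improved

def e0(x, n):
    raise RuntimeError("e0 is good for nothing")

e1 = f(e0)   # correct for n = 0, errors otherwise
e2 = f(e1)   # correct for n = 0 and 1
e3 = f(e2)   # correct for n = 0, 1, and 2
e3(2, 2)     # 4 -- but e3(2, 3) would still blow up inside e0
```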
And e3, of course, by the same argument is good for powers 0, one, and two. And so I will assert for you, without proof, because the1:10:00
proof is horribly difficult. And that's the sort of thing that people called denotational semanticists do. This great idea was invented by Scott and Strachey.1:10:14
They're very famous mathematician types who invented the interpretation for these programs that we have that I'm talking to you about right now.1:10:24
And they proved, by topology, that there is such a fixed point in the cases that we want. But the assertion is E-X-P-T is the limit as n goes1:10:41
to infinity of e-n. And we've constructed this in the following way.1:10:50
Well, it's f of, f of, f of, f of, f of-- f applied to anything at all.1:11:01
It didn't matter what that was, because, in fact, this always produces an error. Applied to this--1:11:12
That's by infinite nesting of f's. So now my problem is to make some infinite things.1:11:22
We need some infinite things. How am I going to nest up an f an infinite number of times? I'd better construct this.1:11:32
Well, I don't know. How would I make an infinite loop at all? Let's take a very simple infinite loop, the simplest infinite loop imaginable.1:11:43
If I were to take that procedure of one argument x which applies x to x and apply that to the procedure of one1:11:57
argument x which applies x to x, then this is an infinite loop.1:12:07
The reason why this is an infinite loop is as follows. The way I understand this is I substitute the argument for the formal parameter in the body.1:12:18
But if I do that, I take for each of these x's, I substitute one of these, making a copy of the original expression I just started with, the1:12:28
simplest infinite loop. Now I want to tell you about a particular operator which is1:12:40
constructed by a perturbation from this infinite loop. I'll call it y.1:12:52
This is called Curry's paradoxical combinator Y, after a fellow by the name of Curry, who was a logician of the 1930s also.1:13:04
And if I have a procedure of one argument f, what's it going to have in it? It's going to have a kind of infinite loop in it, which is1:13:13
that procedure of one argument x which applies f to x of x, applied to that procedure of one argument x, which applies1:13:25
f to x of x. Now what does this do?1:13:34
Suppose we apply y to F. Well, that's easy enough. That's this capital F over here.1:13:46
Well, the easiest thing to say there is, I substitute F for f here.1:13:55
So that's going to give me, basically-- because then I'm going to substitute this for x in here.1:14:08
Let me actually do it in steps, so you can see it completely. I'm going to be very careful. This is open, open, lambda of x, capital F, x, x, applied1:14:27
to itself, F of x of x.1:14:37
Substituting this for this in here, this is F applied to-- what is it--1:14:47
substituting this in here, open, open, lambda of x, F, of x and x, applied to lambda of x, F of x of x, F, lambda,1:15:08
pair, F. Oh, but what is this? This thing over here that I just computed, is1:15:17
this thing over here. But I just wrapped another F around it. So by applying y to F, I make an infinite series of F's.1:15:27
If I just let this run forever, I'll just keep making more and more F's outside. I ran an infinite loop which is useless, but it doesn't matter that the inside is useless.1:15:40
So y of F is F applied to y of F. So y is a magical thing1:15:53
which, when applied to some function, produces the object which is the fixed point of that function, if it exists,1:16:03
and if this all works. Because, indeed, if I take y of F and put it into F, I get y of F out.1:16:16
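One caution for anyone trying this in an applicative-order language like Scheme or Python: the Y written on the board loops forever before F is ever consulted, because the self-application is evaluated eagerly. The standard remedy is the eta-expanded variant, often called Z, which delays the self-application behind a lambda. A Python sketch:

```python
def Z(f):
    # applicative-order fixed-point combinator:
    # (lambda (x) (f (lambda args (apply (x x) args)))) applied to itself
    return (lambda x: f(lambda *args: x(x)(*args)))(
            lambda x: f(lambda *args: x(x)(*args)))

# exponentiation with no define and no self-reference by name:
expt = Z(lambda g: lambda x, n: 1 if n == 0 else x * g(x, n - 1))
expt(2, 5)   # 32
```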
Now I want you to think this in terms of the eval-apply interpreter for a bit. I wrote down a whole bunch of recursion equations out there.1:16:28
They're simultaneous in the same way these are simultaneous equations. Exponentiation was not a simultaneous equation. It was only one variable I was looking for a meaning for.1:16:38
But what Lisp is is the fixed point of the process which says, if I knew what Lisp was and substituted it in for eval, and apply, and so on, on the right hand sides of all1:16:47
those recursion equations, then if it was a real good Lisp, a real one, then the left-hand side would also be Lisp.1:16:58
So I made sense of that definition. Now whether or not there's an answer isn't so obvious. I can't attack that.1:17:07
Now these arguments that I'm giving you now are quite dangerous. Let's look over here. These are limit arguments. We're talking about limits, and it's really calculus, or1:17:17
topology, or something like that, a kind of analysis. Now here's an argument that you all believe. And I want to make sure you realize that I could be1:17:27
bullshitting you. What is this? u is the sum of one, 1/2, 1/4, 1/8, and so on, the sum of a1:17:40
geometric series. And, of course, I could play a game here. u minus one is 1/2, plus 1/4, plus 1/8, and so on.1:17:53
What I could do here-- oops. There is a parentheses error here. But I can put here two times u minus one is one plus 1/2,1:18:02
plus 1/4, plus 1/8. Can I fix that?1:18:14
Yes, well. But that gives me back two times u minus one is u,1:18:27
therefore, we conclude that u is two. And this actually is true. There's no problem like that. But supposing I did something different.1:18:38
Supposing I start up with something which manifestly has no sum. v is one, plus two, plus four, plus 8, plus dot, dot, dot.1:18:47
Well, v minus one is surely two, plus four, plus eight, plus dot, dot, dot. v minus one over two, gee, that looks like v again.1:18:57
From that I should be able to conclude that-- that's also wrong, apparently. v equals minus one.1:19:12
That should be a minus one. And that's certainly a false conclusion.1:19:22
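The difference between the two manipulations is convergence, and partial sums make it visible. A quick numerical check in Python:

```python
# partial sums of u = 1 + 1/2 + 1/4 + ... approach 2: the limit argument is safe
u, term = 0.0, 1.0
for _ in range(60):
    u += term
    term /= 2
# u is now 2.0 to within floating-point precision

# partial sums of v = 1 + 2 + 4 + ... just blow up: there is no limit,
# so manipulating "v" as if it named a number proves nonsense like v == -1
v, term = 0, 1
for _ in range(60):
    v += term
    term *= 2
# v is now 2**60 - 1, and still growing
```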
So when you play with limits, arguments that work in one case may not work in some other case. You have to be very careful.1:19:32
The arguments have to be well formed. And I don't know, in general, what the story is about arguments like this.1:19:43
We can read a pile of topology and find out. But, surely, at least you understand now why there might be some meaning to the things we've been writing on the1:19:52
blackboard. And you understand what that might mean. So, I suppose, it's almost about time for you to merit1:20:02
being made a member of the grand recursive order of lambda calculus hackers. This is the badge. Because you now understand, for example, what it says at1:20:14
the very top, Y F equals F of Y F. Thank you. Are there any questions?1:20:24
Yes, Lev. AUDIENCE: With this, it seems that then there's no need to define, as you imply, to just remember a value, to apply it later.1:20:34
Defines were kind of a side-effect it seemed in the language. [INTERPOSING] are order dependent. Does this eliminate the side-effect from the [INTERPOSING]1:20:43
PROFESSOR: The answer is, this is not the way these things were implemented. Define, indeed is implemented as an operation that actually1:20:53
modifies an environment structure, changes the frame that the define is executed in.1:21:03
And there are many reasons for that, but a lot of this has to do with making an interactive system. What this is saying is that if you've made a system, and you1:21:14
know you're not going to do any debugging or anything like that, and you know everything there is all at once, and you want to say, what is the meaning of a final set of equations?1:21:24
This gives you a meaning for it. But in order to make an interactive system, where you can change the meaning of one thing without changing everything else, incrementally, you can't do1:21:33
that by implementing it this way. Yes. AUDIENCE: Another question on your danger slide.1:21:44
It seemed that the two examples that you gave had to do with convergence and non-convergence? And that may or may not have something to do with function1:21:53
theory in a way which would lead you to think of it in terms of linear systems, or non-linear systems. How does this convergence relate to being able to see a priori1:22:03
what properties of that might be violated? PROFESSOR: I don't know. The answer is, I don't know under what circumstances. I don't know how to translate that into less than an1:22:13
hour of talk more. What are the conditions under which, for which we know that these things converge?1:22:22
And v-- all that was telling you is that arguments that are based on convergence are flaky if you don't know the convergence beforehand.1:22:32
You can make wrong arguments. You can make deductions, as if you know the answer, and not be stopped somewhere by some obvious contradiction. AUDIENCE: So can we say then that if F is a convergent1:22:43
mathematical expression, then the recursion property can be-- PROFESSOR: Well, I think there's a technical kind of F,1:22:52
there is a technical description of those F's that have the property that when you iteratively apply them like this, you converge.1:23:03
Things that are monotonic, and continuous, and I forgot what else. There is a whole bunch of little conditions like that1:23:12
which have this property. Now the real problem is deducing, from looking at the F, its definition here, whether or not it has those properties, and that's very hard.1:23:22
The properties are easy. You can write them down. You can look in a book by Joe Stoy. It's a great book-- Stoy.1:23:31
It's called The Scott-Strachey Method of Denotational Semantics, and it's by Joe Stoy, MIT Press.1:23:47
And he works out all this in great detail, enough to horrify you. But it really is readable.1:24:09
OK, well, thank you. Time for the bigger break, I suppose.0:00:00
Lecture 7B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:16
PROFESSOR: Well, let's see. What we did so far was a lot of fun, was it useful for anything?0:00:26
I suppose the answer is going to be yes. These metacircular interpreters are a valuable thing to play with.0:00:38
Well, there have been times I've spent 50% of my time, over a year, trying various design alternatives by experimenting with them with metacircular interpreters--0:00:49
metacircular interpreters like the sort you just saw. They're called metacircular because they are defined in terms of themselves in such a way that the language they interpret0:00:58
contains itself. Such interpreters are a convenient medium for exploring language issues. If you want to try adding a new feature, it's sort of a0:01:11
snap, it's easy, you just do it and see what happens. You play with that language for a while, you say, gee, I didn't like that, and you throw it away.0:01:21
Or you might want to see what the difference is if you made a slight change in the binding strategy, or some0:01:30
more complicated things that might occur. In fact, these metacircular interpreters are an excellent medium for people exchanging ideas about language design,0:01:44
because they're pretty easy to understand, and they're short, and compact, and simple. If I have some idea that I want somebody to criticize0:01:54
like say, Dan Friedman at Indiana, I'd write a little metacircular interpreter and send him some network mail0:02:04
with this interpreter in it. He could whip it up on his machine and play with it and say, that's no good. And then send it back to me and say, well, why don't you0:02:13
try this one, it's a little better. So I want to show you some of that technology. See, because, really, it's the essential, simple technology0:02:24
for getting started in designing your own languages for particular purposes. Let's start by adding a very simple feature to a Lisp.0:02:40
Now, one thing I want to tell you about is features, before I start.0:02:49
There are many languages that have made a mess of themselves by adding huge numbers of features. Computer scientists have a joke about bugs that get transformed0:03:00
into features all the time. But I like to think of it this way: many systems suffer from0:03:10
what's called creeping featurism. Which is that George has a pet feature he'd like in the system, so he adds it.0:03:20
And then Harry says, gee, this system is no longer what exactly I like, so I'm going to add my favorite feature. And then Jim adds his favorite feature.0:03:30
And, after a while, the thing has a manual 500 pages long that no one can understand. And sometimes it's the same person who writes all of these0:03:40
features and produces this terribly complicated thing. In some cases, like editors, it's sort of reasonable to have lots of features, because there are a lot of things you0:03:51
want to be able to do and many of them arbitrary. But in computer languages, I think it's a disaster to have0:04:00
too much stuff in them. The other alternative you get into is something called feeping creaturism, which is where you have a box which has0:04:12
a display, a fancy display, and a mouse, and there is all sorts of complexity associated with all this fancy IO.0:04:21
And your computer language becomes a dismal, little, tiny thing that barely works because of all the swapping, and disk twitching, and so on, caused by your Windows system.0:04:30
And every time you go near the computer, the mouse process wakes up and says, gee do you have something for me to do, and then it goes back to sleep. And if you accidentally push mouse with you elbow, a big0:04:40
puff of smoke comes out of your computer and things like that. So there are two ways to disastrously destroy a system by adding features. But let's try right now to add a little, simple feature.0:04:52
This actually is a good one, and in fact, real Lisps have it. As you've seen, there are procedures like plus and times0:05:03
that take any number of arguments. So we can write things like the sum of the product of a and x and x, and the product of b and x and c.0:05:17
As you can see here, addition takes three arguments or two arguments, multiplication takes two arguments or three arguments-- taking any number of arguments, all of which are to0:05:27
be treated in the same way. This is a valuable thing, indefinite numbers of arguments. Yet the particular Lisp system that I showed you is one where0:05:40
the number of arguments is fixed, because I had to match the arguments against the formal parameters in the binder, where there's a pairup.0:05:50
Well, I'd like to be able to define new procedures like this that can have any number of arguments. Well there's several parts to this problem.0:06:01
The first part is coming up with the syntactic specification, some way of notating the additional0:06:10
arguments, of which you don't know how many there are. And then there's the other thing, which is once we've notated it, how are we going to interpret that notation so0:06:21
as to do the right thing, whatever the right thing is? So let's consider an example of a sort of thing we might want to be able to do.0:06:33
So an example might be, that I might want to be able to define a procedure which is a procedure of one required argument x and a bunch of arguments, I don't know how0:06:45
many there are, called y. So x is required, and there are many y's, many arguments--0:07:04
y will be the list of them.0:07:14
Now, with such a thing, we might be able to say something like, map-- I'm going to do something to every one-- of that procedure of one argument u, which multiplies x0:07:30
by u, and we'll apply that to y. I've used a dot here to indicate that the thing after0:07:41
this is a list of all the rest of the arguments. I'm making a syntactic specification.0:07:53
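Python happens to have this exact feature built in as *args, so the board example can be sketched directly (names x and y as on the board; the function name is mine):

```python
def scale_each(x, *y):
    # x is required; y picks up the list of all the rest of the arguments,
    # like the dotted tail in (lambda (x . y) (map (lambda (u) (* x u)) y))
    return [x * u for u in y]

scale_each(10, 1, 2, 3)   # [10, 20, 30]
```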
Now, what this depends upon, the reason why this is sort of a reasonable thing to do, is because this happens to be a syntax that's used in the Lisp reader for0:08:04
representing conses. We've never introduced that before. You may have seen when playing with the system that if you0:08:13
cons two things together, you get the first, space, dot, the second, space-- the first, space, dot, space, the second with parentheses0:08:23
around the whole thing. So that, for example, this x dot y corresponds to a pair,0:08:36
which has got an x in it and a y in it. The other notations that you've seen so far are things0:08:45
like a procedure of arguments x and y and z which do things,0:08:55
and that looks like-- Just looking at the bound variable list, it looks like0:09:04
this, x, y, z, and the empty thing.0:09:18
If I have a list of arguments I wish to match this against-- supposing I have a list of arguments one, two, three, I want to match these against. So I might have here a list of0:09:36
three things, one, two, three.0:09:48
And I want to match x, y, z against one, two, three. Well, it's clear that the one matches the x, because I can just sort of follow the structure, and the two matches0:10:00
the y, and the three matches the z. But now, supposing I were to compare this x dot y--0:10:09
this is x dot y-- supposing I compare that with a list of three arguments, one, two, three.0:10:18
Let's look at that again.0:10:28
One, two, three-- Well, I can walk along here and say, oh yes, x matches the one, the y matches the list, which is two and three.0:10:43
So the notation I'm choosing here is one that's very natural for Lisp system.0:10:52
But I'm going to choose this as a notation for representing a bunch of arguments. Now, there's an alternative possibility. If I don't want to take one special out, or two special0:11:03
ones out or something like that, if I don't want to do that, if I want to talk about just the list of all the arguments like in addition, well then the argument list0:11:16
I'm going to choose to be that procedure of all the arguments x which does something with x.0:11:25
And which, for example, if I take the procedure, which takes all the arguments x and returns the list of them,0:11:35
that's list. That's the procedure list.0:11:45
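The second case, where the whole bound-variable list is one bare symbol, is Python's def f(*x); a sketch of (lambda x x), the procedure list (the Python name is mine):

```python
def my_list(*x):
    # (lambda x x): the single name x is matched against the whole
    # list of arguments, and we just return that list
    return list(x)

my_list(1, 2, 3)   # [1, 2, 3]
```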
How does this work? Well, indeed what I had as the bound variable list in this case, whatever it is, is being matched against a list of arguments.0:11:55
This symbol now is all of the arguments. And so this is the choice I'm making for a particular syntactic specification, for the description of procedures0:12:08
which take indefinite numbers of arguments. There are two cases of it, this one and this one.0:12:18
When you make syntactic specifications, it's important that it's unambiguous, that neither of these can be confused with a representation we already have, this one.0:12:33
I can always tell whether I have a fixed number of explicitly named arguments made by these formal parameters, or a fixed number of named formal parameters0:12:45
followed by a thing which picks up all the rest of them, or a list of all the arguments which will be matched against0:12:54
this particular formal parameter called x, because these are syntactically distinguishable. Many languages make terrible errors in that form where0:13:05
whole segments of interpretation are cut off, because there are syntactic ambiguities in the language.0:13:14
There are the traditional problems with ALGOL-like languages having to do with the nesting of ifs in the predicate part.0:13:25
In any case, now, so I've told you about the syntax, now, what are we going to do about the semantics of this?0:13:35
How do we interpret it? Well this is just super easy. I'm going to modify the metacircular interpreter to do it. And that's a one liner.0:13:46
There it is. I'm changing the way you pair things up.0:13:56
Here's the procedure that pairs the variables, the0:14:06
formal parameters, with the arguments that were passed from the last description of the metacircular interpreter.0:14:18
And here's some things that are the same as they were before. In other words, if the list of variables is empty, then if the list of values is empty, then I have an empty list.0:14:31
Otherwise, I have too many arguments, that is, if I have empty variables but not empty values.0:14:41
If I have empty values, but the variables are not empty, I have too few arguments.0:14:50
If the variables are a symbol-- the interesting case-- then what I should do is say, oh yes, this is the special0:15:04
case that I have a symbolic tail. I have here a thing just like we looked over here.0:15:14
This is a tail which is a symbol, y. It's not a nil. It's not the empty list. Here's a symbolic tail that is0:15:24
just the very beginning of the tail. There is nothing else. In that case, I wish to match that variable with all the0:15:36
values and add that to the pairing that I'm making. Otherwise, I go through the normal arrangement of making0:15:47
up the whole pairing. I suppose that's very simple. And that's all there is to it.0:15:57
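The extended pairing logic can be sketched in Python over Lisp-style pairs (tuples for conses, None for the empty list). The case analysis mirrors the slide; the helper names and data representation are mine:

```python
def pair_up(vars_, vals):
    # vars_/vals are Lisp-style lists: (car, cdr) tuples, None for nil.
    # A bare string in variable position is a symbolic tail: it gets
    # bound to the entire remaining list of values.
    if isinstance(vars_, str):
        return ((vars_, vals), None)
    if vars_ is None:
        if vals is None:
            return None
        raise TypeError("too many arguments")
    if vals is None:
        raise TypeError("too few arguments")
    return ((vars_[0], vals[0]), pair_up(vars_[1], vals[1]))

# (x . y) matched against (1 2 3): x gets 1, y gets the list (2 3)
pair_up(('x', 'y'), (1, (2, (3, None))))
```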
And now I'll answer some questions. The first one-- Are there any questions?0:16:06
Yes? AUDIENCE: Could you explain that third form? PROFESSOR: This one? Well, maybe we should look at the thing as a0:16:15
piece of list structure. This is a procedure which contains a lambda.0:16:25
I'm just looking at the list structure which represents this. Here's x. These are our symbols.0:16:37
And then the body is nothing but x. If I were looking for the bound variable list part of0:16:48
this procedure, I would go looking at the CADR, and I'd find a symbol. So, naturally, this pairup thing I just showed you is going to be matching a symbolic object0:17:01
against a list of arguments that were passed. And it will bind that symbol to the list of arguments.0:17:13
In this case, if I'm looking for it, the match will be against this in the bound variable list position.0:17:24
Now, what this does is it gets the list of arguments and returns it, that's list. That's what the procedure is.0:17:34
Oh well, thank you. Let's take a break. [MUSIC PLAYING]0:18:20
PROFESSOR: Well let's see. Now, I'm going to tell you about a rather more substantial variation, one that's a famous variation that0:18:32
many early Lisps had. It's called dynamic binding of variables.0:18:41
And we'll investigate a little bit about that right now. I'm going to first introduce this by showing you the sort of thing that would make someone want this idea.0:18:53
I'm not going to tell what it is yet, I'm going to show you why you might want it. Suppose, for example, we looked at the sum procedure0:19:02
again for summing up a bunch of things. To be that procedure, of a term, lower bound, method of0:19:15
computing the next index, and upper bound, such that, if a0:19:25
is greater than b then the result is 0, otherwise, it's0:19:34
the sum of the term procedure applied to a, and the result of adding up terms, with the new a being0:19:51
next of a, the next procedure passed along, and the upper0:20:06
bound being passed along. Blink, blink, blink--0:20:18
Now, when I use this sum procedure, I can use it, for example, like this. We can define the sum of the powers to be, for example, sum0:20:38
of a bunch of powers x to the n, to be that procedure of a, b, and n-- lower bound, the upper bound, and n--0:20:48
which is sum, of lambda of x, the procedure of one argument x, which exponentiates x to the n, with the a, the0:21:05
incrementer, and b, being passed along. So we're adding up x to n, given an x.0:21:16
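The sum procedure and sum-of-powers, transcribed into Python (same shape as the board; note that here the lambda closes over sum_powers' own n, so nothing is free yet):

```python
def sum_(term, a, next_, b):
    # add up term(x) for x = a, next_(a), ... up through b
    if a > b:
        return 0
    return term(a) + sum_(term, next_(a), next_, b)

def sum_powers(a, b, n):
    # the lambda is written inside sum_powers, so its n is n's binding here
    return sum_(lambda x: x ** n, a, lambda x: x + 1, b)

sum_powers(1, 3, 2)   # 1 + 4 + 9 = 14
```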
x takes on values from a to b, incrementing by one. I can also write the--0:21:27
That's right. Product, excuse me. The product of a bunch of powers.0:21:38
It's a strange name. I'm going to leave it there. Weird-- I write up what I have. I'm sure that's right.0:21:50
And if I want the product of a bunch of powers-- That was 12 brain cells, that double-take.0:22:03
I can for example use the procedure which is like sum, which is for making products, but it's similar to that, that you've seen before. There's a procedure of three arguments again.0:22:16
Which is the product of terms that are constructed, or factors in this case, constructed from0:22:26
exponentiating x to the n, where I start with a, I0:22:35
increment, and I go to b. Now, there's some sort of thing here that should disturb0:22:48
you immediately. These look the same. Why am I writing this code so many times? Here I am, in the same boat I've been in before.0:23:01
Wouldn't it be nice to make an abstraction here? What's an example of a good abstraction to make? Well, I see some codes that's identical. Here's one, and here's another.0:23:14
And so maybe I should be able to pull that out. I should be able to say, oh yes, the sum of the powers could be written in terms of something called0:23:23
the nth power procedure. Imagine somebody wanted to write a slightly different procedure that looks like this.0:23:37
The sum powers to be a procedure of a, b, and n, as0:23:49
the result of summing up the nth power. We're going to give a name to that idea, for starting at a,0:23:59
going by one, and ending at b. And similarly, I might want to write the product powers this0:24:12
way, abstracting out this idea. I might want this.0:24:22
Product powers, to be a procedure of a, b, and n,0:24:35
which is the product of the nth power operation on a with0:24:47
the incrementation and b being my arguments for the0:24:56
analogous-thing product. And I'd like to be able to define, I'd like to be able to define nth power-- I'll put it over here.0:25:11
I'll put it at the top.0:25:25
--to be, in fact, my procedure of one argument x which is the result of exponentiating x to the n.0:25:35
But I have a problem. My environment model, that is my means of interpretation for0:25:44
the language that we've defined so far, does not give me a meaning for this n. Because, as you know, this n is free in this procedure.0:26:06
The environment model tells us that the meaning of a free variable is determined in the environment in which this procedure is defined.0:26:16
In a way I have written it, assuming these things are defined on the blackboard as is, this is defined in the global environment, where there is no end.0:26:25
Therefore, n is an unbound variable. But it's perfectly clear, to most of us, that we would like it to be this n and this n.0:26:38
On the other hand, it would be nice. Certainly we've got to be careful here of keeping this to be this, and this one over here, wherever it0:26:51
is to be this one. Well, the desire to make this work has led to0:27:01
a very famous bug. I'll tell you about the famous bug. Look at this slide.0:27:10
This is an idea called dynamic binding. Where, instead of the free variable being interpreted in the environment of definition of a procedure, the free0:27:22
variable is interpreted as having its value in the environment of the caller of the procedure.0:27:31
So what you have is a system where you search up the chain of callers of a particular procedure, and, of course, in0:27:41
this case, since nth power is called from inside product whatever it is-- I had to write our own sum which is the analogous procedure--0:27:50
and product is presumably called from product-powers, as you see over here, then, since product-powers binds the variable n, nth-power's n would be derived0:28:03
through that chain. Similarly, this n, the n in nth-power in this case, would0:28:12
come through nth power here being called from inside sum. You can see it being called from inside sum here. It's called term here.0:28:22
But sum was called from inside of sum-powers, which bound n. Therefore, there would be an n available for that n to get0:28:35
its value from. What we have below this white line, plus over here, is what's called a dynamic binding view of the world.0:28:46
If that works, that's a dynamic binding view. Now, let's take a look, for example, at just what it takes0:28:55
to implement that. That's real easy. In fact, the very first Lisps that had any interpretations of the free variables at all, had dynamic binding0:29:04
interpretations for the free variables. APL has dynamic binding interpretation for the free variables, not lexical or static binding.0:29:15
So, of course, the change is in eval. And it's really in two places. First of all, one thing we see is that things become a0:29:27
little simpler. If I don't have to have the environment be the environment of definition for the procedure, the procedure need not capture0:29:38
the environment at the time it's defined. And so if we look here at this slide, we see that the clause0:29:47
for a lambda expression, which is the way a procedure is defined, does not make up a thing which has a type closure0:29:57
and an attached environment structure. It's just the expression itself. And we'll decompose that some other way somewhere else.0:30:06
The other thing we see is the applicator must be able to get the environment of the caller. The caller of a procedure is right here.0:30:19
If the expression we're evaluating is an application or a combination, then we're going to call a procedure which is the value of the operator.0:30:29
The environment of the caller is the environment we have right here, available now. So all I have to do is pass that environment to the0:30:38
applicator, to apply. And if we look at that here, the only change we have to make is that fellow takes that environment and uses that0:30:49
environment for the purpose of extending that environment when binding the formal parameters of the procedure to0:31:00
the arguments that were passed, not an environment that was captured in the procedure. The reason why the first Lisps were implemented this way is0:31:09
that it's sort of the obvious, accidental implementation. And, of course, as usual, people got used to it and liked it. And there were some people who said, this is0:31:18
the way to do it. Unfortunately that causes some serious problems. The most important, serious problem in using dynamic binding is0:31:31
there's a modularity crisis involved in it. If two people are working together on some big system, then an important thing to want is that the names used by0:31:41
each one don't interfere with the names of the other. It's important that when I invent some segment of code0:31:51
that no one can make my code stop working by using, internal to his code, the names that I use internal to my code. However, dynamic binding violates that particular0:32:03
modularity constraint in a clear way. Consider, for example, what happens over here.0:32:12
Suppose it was the case that I decided to change the word next. Supposing somebody is writing sum, and somebody else is0:32:25
going to use sum. The writer of sum has a choice of what names he may use. Let's say, I'm that writer.0:32:36
Well, by gosh, just happens I didn't want to call this next. I called it n. So all places where you see next, I called it n.0:32:48
Whoops. I changed nothing about the specifications of this program, but this program stops working. Not only that, unfortunately, this one does too.0:32:59
Why do these programs stop working? Well, it's sort of clear. Instead of chasing out the value of the n that occurs in0:33:09
nth power over here or over here, through the environment of definition, where this one is always linked to this one,0:33:19
if it was through the environment of definition, because here is the definition. This lambda expression was executed in the environment where that n was defined.0:33:30
If instead of doing that, I have to chase through the call chain, then look what horrible thing happens. Well, this was called from inside sum as term, term a.0:33:44
I'm looking for a value of n. Instead of getting this one, I get that one. So by changing the insides of this program, this program0:33:53
stops working. So I no longer have a quantifier, as I described before.0:34:02
The lambda symbol is supposed to be a quantifier. A thing which has the property that the names that are bound by it are unimportant, that I can uniformly substitute any0:34:14
names for these throughout this thing, so long as they don't occur in here, the new names, and the meaning of this expression should remain unchanged.0:34:24
I've just changed the meaning of the expression by changing one of the names. So lambda is no longer a well-defined idea. It's a very serious problem.0:34:34
So for that reason, I and my buddies have given up this particular kind of abstraction, which I would0:34:43
like to have, in favor of a modularity principle. But this is the kind of experiment you can do if you0:34:52
want to play with these interpreters. You can try them out this way, that way, and the other way. You see what makes a nicer language.0:35:02
So that's a very important thing to be able to do. Now, I would like to give you a feeling for what I think the right thing to do is here. How are you going to get this kind of power in a0:35:14
lexical system? And the answer is, of course, what I really want is something that makes up for me an exponentiator for a particular n.0:35:23
Given an n, it will make me an exponentiator. Oh, but that's easy too. In other words, I can write my program this way.0:35:35
I'm going to define a thing called PGEN, which is a procedure of n which produces for me an exponentiator.0:35:50
--x to the n. Given that I have that, then I can capture the abstraction I0:36:00
wanted even better, because now it's encapsulated in a way where it can't be destroyed by a change of names. I can define sum-powers to be a procedure, again, of a, b, and0:36:20
n which is the sum of the term function generated by using this generator, PGEN, n, with a, incrementer, and b.0:36:42
And I can define product-powers to be a procedure of0:36:57
a, b, and n which is the product PGEN, n, with a,0:37:09
increment, and b. Now, of course, this is a very simple example where this object that I'm trying to abstract over is small. But it could be 100 lines of code.0:37:20
And so, the purpose of this is, of course, to make it simple. I'd give a name to it, it's just that here it's a parameterized name. It's a name that depends upon, explicitly, the lexically0:37:31
apparent value of n. So you can think of this as a long name.0:37:40
And here, I've solved my problem by naming the term-generation procedures with an n in them.0:37:55
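The blackboard's PGEN solution carries over directly to any lexically scoped language with first-class procedures. A hedged Python rendering (the lecture's code is Scheme; `sum_terms` and `product_terms` stand in for the higher-order sum and product procedures assumed on the board):

```python
def pgen(n):
    # Given n, manufacture an exponentiator; n is captured lexically,
    # so no renaming inside sum or product can disturb it.
    return lambda x: x ** n

def sum_terms(term, a, nxt, b):
    # Sum of term(x) for x = a, nxt(a), ... up through b.
    total = 0
    while a <= b:
        total += term(a)
        a = nxt(a)
    return total

def product_terms(term, a, nxt, b):
    total = 1
    while a <= b:
        total *= term(a)
        a = nxt(a)
    return total

def inc(x):
    return x + 1

def sum_powers(a, b, n):
    return sum_terms(pgen(n), a, inc, b)

def product_powers(a, b, n):
    return product_terms(pgen(n), a, inc, b)
```

Here `pgen(2)` acts as the "parameterized name" for a squaring procedure, so `sum_powers(1, 3, 2)` is 1 + 4 + 9 = 14.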
Are there any questions? Oh, yes, David. AUDIENCE: Is the only solution to the problem you raise to0:38:04
create another procedure? In other words, can this only work in languages that are capable of defining objects as procedures? PROFESSOR: Oh, I see.0:38:16
My solution to making this abstraction, when I didn't want to include the procedure inside the body, depends upon my ability to return a procedure or export one.0:38:28
And that's right. If I don't have that, then I just don't have this ability to make an abstraction in a way where I don't have0:38:39
possibilities of symbol conflicts that were unanticipated. That's right. I consider being able to return the procedural value0:38:52
and, therefore, to sort of have first class procedures, in general, as being essential to doing very good modular0:39:01
programming. Now, indeed there are many other ways to skin this cat. What you can do is, for each of the bad things that0:39:10
you have to worry about, make a special feature that covers that thing. You can make a package system. You can make a module system as in Ada, et cetera.0:39:22
And all of those work, or they cover little regions of it. The thing is that returning procedures as values covers all of those problems. And so it's the simplest mechanism that0:39:35
gives you the best modularity, gives you all of the known modularity mechanisms.0:39:45
Well, I suppose it's time for the next break, thank you. [MUSIC PLAYING]0:40:41
PROFESSOR: Well, yesterday when you learned about streams, Hal worried to you about the order of evaluation0:40:52
and delayed arguments to procedures. The way we played with streams yesterday, it was the responsibility of the caller and the callee to both agree0:41:07
that an argument was delayed, and the callee must force the argument if it needs the answer. So there had to be a lot of handshaking between the0:41:18
designer of a procedure and the user of it over delayedness. That turns out, of course, to be a fairly bad thing. It0:41:29
works all right with streams. But as a general thing, what you want is for an idea to have a locus-- a decision, a design decision in general, to have a place where it's made,0:41:40
explicitly, and notated in a clear way. And so it's not a very good idea to have to have an agreement, between the person who writes a procedure and the0:41:52
person who calls it, about such details as, maybe, the order of evaluation of the arguments. Although, that's not so bad. I mean, we have other such agreements, like0:42:02
the input's a number. But it would be nice if only one of these guys could take responsibility, completely.0:42:11
Now this is not a new idea. ALGOL 60 had two different ways of calling a procedure.0:42:22
The arguments could be passed by name or by value. And what that meant was that a name argument was delayed.0:42:31
That when you passed an argument by name, that its value would only be obtained if you accessed that argument.0:42:42
So what I'd like to do now is show you, first of all, a little bit about, again, we're going to make a modification to a language. In this case, we're going to add a feature.0:42:53
We're going to add the feature of, by name parameters, if you will, or delayed parameters. Because, in fact, the default in our Lisp system is by the0:43:05
value of a pointer. A pointer is copied, but the data structure it points at is not. But what I'd like to, in fact, show you is how you add name0:43:17
arguments as well. Now again, why would we need such a thing? Well, supposing we wanted to invent certain kinds of what0:43:26
otherwise would be special forms, reserved words? But I'd rather not take up reserved words. I want procedures that can do things like if.0:43:36
If is special, or cond, or whatever it is. It's the same thing. It's special in that it determines whether or not to evaluate the consequent or the alternative based on the value0:43:48
of the predicate part of an expression. So taking the value of one thing determines whether or not to do something else.0:43:57
Whereas all the procedures like plus, the ones that we can define right now, evaluate all of their arguments before application.0:44:08
So, for example, supposing I wish to be able to define something like the reverse of if in terms of if.0:44:19
Call it unless. We have a predicate, a consequent, and an alternative.0:44:28
Now what I would like to sort of be able to do is say-- oh, I'll do it in terms of cond. Cond, if not the predicate, then take the consequent,0:44:41
otherwise, take the alternative.0:44:51
Now, what I'd like this to mean is, supposing I do something like this. I'd like this: unless one equals 0, then the answer0:45:05
is two; otherwise, the quotient of one and 0.0:45:15
What I'd like that to mean is the result of substituting equal one, 0, and two, and the quotient of one, 0 for p, c, and a.0:45:25
I'd like that to mean, and this is funny, I'd like it to transform into or mean cond not equal one, 0, then the0:45:40
result is two, otherwise I want it to be the quotient one and 0.0:45:54
Now, you know that if I were to type this into Lisp, I'd get a two. There's no problem with that. However, if I were to type this into Lisp, because all0:46:05
the arguments are evaluated before I start, then I'm going to get an error out of this. So that if the substitutions work at all, of course, I0:46:16
would get the right answer. But here's a case where the substitutions don't work. I don't get the wrong answer. I get no answer. I get an error.0:46:28
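The same failure shows up in any applicative-order language, and delaying the consequent and alternative by hand already cures it. A hedged sketch in Python rather than the lecture's Scheme (the names `unless_eager` and `unless_by_name` are mine, for illustration):

```python
def unless_eager(p, consequent, alternative):
    # if not p then consequent, otherwise alternative -- but the caller
    # has already evaluated both arguments by the time we get here.
    return consequent if not p else alternative

def unless_by_name(p, consequent, alternative):
    # consequent and alternative arrive as procedures of no arguments
    # (promises); only the one we actually need is ever evaluated.
    return consequent() if not p else alternative()

# (unless (= 1 0) 2 (/ 1 0)):
value = unless_by_name(1 == 0, lambda: 2, lambda: 1 // 0)   # the division never runs
```

The eager call `unless_eager(1 == 0, 2, 1 // 0)` never even reaches the body: evaluating the operand 1 // 0 signals an error first, just as the professor says.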
Now, however, I'd like to be able to make my definition so that this kind of thing works. What I want to do is say something special about c and a.0:46:39
I want them to be delayed automatically. I don't want them to be evaluated at the time I call.0:46:51
So I'm going to make a declaration, and then I'm going to see how to implement such a declaration. But again, I want you to say to yourself, oh, this is an interesting kluge he's adding in here.0:47:02
The piles of kluges make a big complicated mess. And is this going to foul up something else that might occur? First of all, is it syntactically unambiguous?0:47:13
Well, it will be syntactically unambiguous with what we've seen so far. But what I'm going to do may, in fact, cause trouble. It may be that the thing I add will conflict with type0:47:25
declarations I might want to add in the future for giving some system, some compiler or something, the ability to optimize given the types are known.0:47:34
Or it might conflict with other types of declarations I might want to make about the formal parameters. So I'm not making a general mechanism here where I can add0:47:44
declarations. And I would like to be able to do that. But I don't want to talk about that right now. So here's what I'm going to do: I'm going to build a kluge.0:47:57
So we're going to define unless of a predicate--0:48:08
and I'm going to call these by name-- the consequent, and name the alternative.0:48:19
Huh, huh-- I got caught in the corner.0:48:31
If not p then the result is c, else--0:48:40
that's what I'd like. Where I can explicitly declare certain of the parameters to0:48:49
be delayed, to be computed later. Now, this is actually a very complicated modification to an interpreter rather than a simple one.0:49:00
The ones you saw before, dynamic binding or adding indefinite-argument procedures, were relatively simple.0:49:09
But this one changes a basic strategy. The problem here is that our interpreter, as written,0:49:18
evaluates a combination by evaluating the procedure, the operator producing the procedure, and evaluating the operands producing the arguments, and then doing0:49:31
apply of the procedure to the arguments. However, here, I don't want to evaluate the operands to0:49:40
produce the arguments until after I've examined the procedure to see what the procedure's declarations look like.0:49:49
So let's look at that. Here we have a changed evaluator. I'm starting with the simple lexical evaluator, not0:50:02
dynamic, but we're going to have to do something sort of similar in some ways. Because of the fact that, if I delay a procedure--0:50:13
I'm sorry-- delay an argument to a procedure, I'm going to have to attach an environment to it. Remember how Hal implemented delay.0:50:23
Hal implemented delay as being a procedure of no arguments which does some expression. That's what delay of the expression is.0:50:35
--of that expression. This turned into something like this.0:50:44
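That delay/force correspondence can be sketched in Python as an illustration (Python's lambda also captures its environment, which is exactly the point being made next):

```python
def force(promise):
    # Forcing a delayed expression is just calling that procedure
    # of no arguments.
    return promise()

log = []

def noisy_square(x):
    # Record each evaluation so we can see when it actually happens.
    log.append("evaluated")
    return x * x

# (delay (noisy-square 7)) turns into a procedure of no arguments
# wrapping the expression; nothing runs yet.
promise = lambda: noisy_square(7)
```

Nothing is appended to `log` until `force(promise)` is called and returns 49; forcing a second time evaluates again, since this plain delay does no memoization.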
Now, however, if I evaluate a lambda expression, I have to capture the environment. The reason why is because there are variables in there0:50:56
whose meaning I wish to derive from the context where this was written. So that's why a lambda does the job.0:51:06
It's the right thing. And the forcing of a delayed expression was the same0:51:17
thing as calling that with no arguments. It's just the opposite of this. Producing an environment of the call which is, in fact,0:51:28
the environment where this was defined with an extra frame in it that's empty. I don't care about that. Well, if we go back to this slide, since it's the case, if0:51:42
we look at this for a second, everything is the same as it was before except the case of applications or combinations.0:51:51
And combinations are going to do two things. One is, I have to evaluate the procedure-- get the procedure-- by evaluating the operator.0:52:00
That's what you see right here. I have to make sure that that's current, that it's not a delayed object, and evaluate that to the point where it's forced now.0:52:10
And then I have to somehow apply that to the operands. But I have to keep the environment, pass that0:52:20
environment along. So some of those operands I may have to delay. I may have to attach that environment to those operands.0:52:29
This is a rather complicated thing happening here. Looking at that in apply. Apply, well it has a primitive procedure0:52:39
thing just like before. But the compound one is a little more interesting. I have to evaluate the body, just as before, in an0:52:50
environment which is the result of binding some formal parameters to arguments in the environment.0:53:00
That's true. The environment is the one that comes from the procedure now. It's a lexical language, statically bound. However, one thing I have to do is strip off the0:53:11
declarations to get the names of the variables. That's what this guy does, vnames. And the other thing I have to do is process these declarations, deciding which of these operands--0:53:21
that's the operands now, as opposed to the arguments-- which of these operands to evaluate, and which of them are to be encapsulated in delays of some sort.0:53:37
The other thing you see here is that we got a primitive, a primitive like plus, had better get at the real operands. So here is a place where we're going to have to force them.0:53:47
And we're going to look at that: evlist is going to have to do a bunch of forces. So we have two different kinds of evlist now. We have evlist and gevlist. Gevlist is going to wrap delays around some things and force others, evaluate others.0:53:59
And this guy's going to do some forcing of things. Just looking at this a little bit, this is a game you must0:54:10
play for yourself, you know. It's not something that you're going to see all possible variations on an evaluator talking to me.0:54:19
What you have to do is do this for yourself. And after you feel this, you play this a bit, you get to see all the possible design decisions and what they might mean, and how they interact with each other.0:54:29
So what languages might have in them. And what are some of the consistent sets that make a legitimate language. Whereas what things are complicated kluges that are0:54:39
just piles of junk. So evlist of course, over here, just as I said, is a list of operands which are going to be undelayed after0:54:49
evaluation. So these are going to be forced, whatever that's going to mean. And gevlist, which is the next thing--0:55:01
Thank you. What we see here, well there's a couple of possibilities. Either it's a normal, ordinary thing, a symbol sitting there0:55:13
like the predicate in the unless, and that's what we have here. In which case, this is intended to be evaluated in applicative order.0:55:23
And it's, essentially, just what we had before. It's mapping eval down the list. In other words, I evaluate the first expression and continue gevlisting the0:55:35
CDR of the expression in the environment. However, it's possible that this is a name parameter. If it's a name parameter, I want to put a delay in which0:55:47
combines that expression, which I'm calling by name, with the environment that's available at this time and0:55:59
passing that as the parameter. And this is part of the mapping process that you see here.0:56:09
The only other interesting place in this interpreter is cond. People tend to write this thing, and then they leave this one out.0:56:18
There's a place where you have to force. Conditionals have to know whether or not the answer is true or false. It's like a primitive.0:56:28
When you do a conditional, you have to force. Now, I'm not going to look at any more of this in any detail. It isn't very exciting. And what's left is how you make delays.0:56:38
Well, delays are data structures which contain an expression, an environment, and a type on them. And it says they're a thunk. That comes from the ALGOL language, and it's claimed to0:56:50
be the sound of something being pushed on a stack. I don't know. I was not an ALGOLician or an ALGOLite or whatever, so I don't know. But that's what was claimed.0:57:00
And undelay is something which will recursively undelay thunks until the thunk becomes something which isn't a thunk. This is the way you implement a call-by-name-0:57:09
like thing in ALGOL. And that's about all there is. Are there any questions?0:57:26
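The thunk data structure and undelay just described can be sketched in Python (a hedged illustration; here an expression is represented as a function of an environment, standing in for eval):

```python
def make_thunk(expr, env):
    # A delayed object: a type tag, the expression, and its environment.
    return ('thunk', expr, env)

def is_thunk(obj):
    return isinstance(obj, tuple) and len(obj) == 3 and obj[0] == 'thunk'

def evaluate(expr, env):
    # Stand-in evaluator: expressions here are just functions of an environment.
    return expr(env)

def undelay(obj):
    # Recursively undelay until the result is no longer a thunk.
    while is_thunk(obj):
        _, expr, env = obj
        obj = evaluate(expr, env)
    return obj

t = make_thunk(lambda env: env['x'] + 1, {'x': 41})
nested = make_thunk(lambda env: t, {})   # a thunk whose value is another thunk
```

`undelay(t)` yields 42, and `undelay(nested)` keeps forcing until it reaches 42 as well; a non-thunk passes through unchanged.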
AUDIENCE: Gerry? PROFESSOR: Yes, Vesko? AUDIENCE: I noticed you avoided calling by name in the primitive procedures, I was wondering what0:57:38
cause you have on that? You never need that? PROFESSOR: Vesko is asking if it's ever reasonable to call a primitive procedure by name?0:57:47
The answer is, yes. There's one particular case where it's reasonable, actually two.0:57:56
Construction of a data structure, like cons, or making an array if you have arrays with any number of elements. It's unnecessary to evaluate those arguments.0:58:07
All you need is promises to evaluate those arguments if you look at them. If I cons together two things, then I could cons together the0:58:17
promises just as easily as I can cons together the things. And it's not even when I CAR CDR them that I have to look at them. That just gets out the promises and0:58:26
passes them to somebody. That's why the lambda calculus definition, the Alonzo Church definition of CAR, CDR, and cons makes sense. It's because no work is done in CAR, CDR, and cons, it's0:58:36
just shuffling data, it's just routing, if you will. However, the things that do have to look at data are things like plus.0:58:45
Because they have to look at the bits that the numbers are made out of, unless they're lambda calculus numbers, which are funny. They have to look at the bits to be able to crunch them0:58:54
together to do the add. So, in fact, data constructors, data selectors,0:59:03
and, in fact, things that side-effect data objects don't need to do any forcing in the laziest possible interpreters.0:59:16
On the other hand predicates on data structures have to. Is this a pair? Or is it a symbol? Well, you better find out.0:59:25
You got to look at it then. Any other questions?0:59:40
Oh, well, I suppose it's time for a break. Thank you. [MUSIC PLAYING]1:00:02
Lecture 8A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING BY J.S. BACH]0:00:17
PROFESSOR: The last time we began having a look at how languages are constructed. Remember the main point: that an evaluator for LISP, say,0:00:26
has two main elements. There is EVAL, and EVAL's job is to take in an expression0:00:36
and an environment and turn that into a procedure and some arguments and pass that off to APPLY.0:00:49
And APPLY takes the procedure and the arguments, turns that back into, in the general case, another expression to be evaluated in another environment and passes that0:00:58
off to EVAL, which passes it to APPLY, and there's this whole big circle where things go around and around and around until you get either to some very primitive data or to a primitive procedure.0:01:07
See, what this cycle has to do with is unwinding the means of combination and the means of abstraction in the language. So for instance, you have a procedure in LISP-- a0:01:17
procedure is a general way of saying, I want to be able to evaluate this expression for any value of the arguments, and that's sort of what's going on here.0:01:27
That's what APPLY does. It says the general thing coming in with the arguments reduces to the expression that's the body, and then if that's a compound expression or another procedure application, the thing will go around and around the circle.0:01:40
Anyway, that's sort of the basic structure of gee, pretty much any interpreter. The other thing that you saw is once you have the interpreter in your hands, you have all this power to start0:01:49
playing with the language. So you can make it dynamically scoped, or you can put in normal order evaluation, or you can add new forms to the language, whatever you like. Or more generally, there's this notion of metalinguistic0:02:00
abstraction, which says that part of your perspective as an engineer, as a software engineer, but as an engineer0:02:09
in general is that you can gain control of complexity by inventing new languages sometimes.0:02:18
See, one way to think about computer programming is that it only incidentally has to do with getting a computer to do something. Primarily what a computer program has to do with, it's a0:02:29
way of expressing ideas, of communicating ideas. And sometimes when you want to communicate new kinds of ideas, you'd like to invent new modes of expressing that.0:02:39
Well, today we're going to apply this framework to build a new language. See, once we have the basic idea of the interpreter, you0:02:48
can pretty much go build any language that you like. So for example, we can go off and build Pascal. And gee, we would worry about syntax and parsing and various0:02:58
kinds of compiler optimizations, and there are people who make honest livings doing that, but at the level of abstraction that we're talking, a Pascal interpreter0:03:09
would not look very different at all from what you saw Gerry do last time. Instead of that, we'll spend today building a really0:03:18
different language, a language that encourages you to think about programming not in terms of procedures, but in a really different way.0:03:29
And the lecture today is going to be at two levels simultaneously. On the one hand, I'm going to show you what this language looks like, and on the other hand, I'll show you how it's0:03:40
implemented. And we'll build an implementation in LISP and see how that works. And you should be drawing lessons on two levels. The first is to realize just how different a0:03:52
language can be. So if you think that the jump from Fortran to LISP is a big deal, you haven't seen anything yet.0:04:01
And secondly, you'll see that even with such a very different language, which will turn out to not have procedures at all and not talk about functions at all, there0:04:12
will still be this basic cycle of eval and apply that unwinds the means of combination and the means of abstraction. And then thirdly, as kind of a minor but elegant technical0:04:24
point, you'll see a nice use of streams to avoid backtracking. OK, well, I said that this language is very different.0:04:35
To explain that, let's go back to the very first idea that we talked about in this course, and that was the idea of the0:04:44
distinction between the declarative knowledge of mathematics-- the definition of a square root as a mathematical truth--0:04:55
and the idea that computer science talks about how-to knowledge-- contrast that definition of square root with a program to compute a square root.0:05:05
That's where we started off. Well, wouldn't it be great if you could somehow bridge this gap and make a programming language which sort of did0:05:16
things, but you talked about it in terms of truth, in declarative terms? So that would be a programming language in which you specify facts.0:05:27
You tell it what is. You say what is true. And then when you want an answer, somehow the language has built into it automatically general kinds of0:05:38
how to knowledge so it can just take your facts and it can evolve these methods on its on using the facts you gave it and maybe some general rules of logic.0:05:49
So for instance, I might go up to this program and start telling it some things. So I might tell it that the son of Adam is Abel.0:06:08
And I might tell it that the son of Adam is Cain.0:06:17
And I might tell it that the son of Cain is Enoch.0:06:27
And I might tell it that the son of Enoch is Irad, and all0:06:37
through the rest of our chapter whatever of Genesis, which ends up ending in Adah, by the way, and this shows the genealogy of Adah from Cain.0:06:48
Anyway, once you tell it these facts, you might ask it things. You might go up to your language and say, who's the0:06:58
son of Adam? And you can very easily imagine having a little general purpose search program which would be able to go through and in response to that say, oh yeah, there are0:07:08
two answers: the son of Adam is Abel and the son of Adam is Cain. Or you might say, based on the very same facts, who is Cain0:07:19
the son of? And then you can imagine generating another slightly different search program which would be able to go through0:07:29
and check for who Cain is the son of, and come up with Adam. Or you might say, what's the relationship0:07:40
between Cain and Enoch? And again, a minor variant on that search program. You could figure out that it said son of.0:07:52
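Those three search programs really are minor variants on one pattern matcher. A hedged Python sketch of the idea (not the query language the lecture goes on to build; names and representations here are mine):

```python
# Facts are triples; pattern elements beginning with '?' are variables.
FACTS = [
    ('son', 'Adam', 'Abel'),
    ('son', 'Adam', 'Cain'),
    ('son', 'Cain', 'Enoch'),
    ('son', 'Enoch', 'Irad'),
]

def match(pattern, fact, bindings):
    # Try to line up a pattern with a fact, extending the bindings;
    # return None on any mismatch.
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith('?'):
            if p in b and b[p] != f:
                return None
            b[p] = f
        elif p != f:
            return None
    return b

def query(pattern):
    # The same facts answer all three shapes of question.
    results = []
    for fact in FACTS:
        b = match(pattern, fact, {})
        if b is not None:
            results.append(b)
    return results
```

So `query(('son', 'Adam', '?who'))` finds Abel and Cain, `query(('son', '?who', 'Cain'))` finds Adam, and `query(('?rel', 'Cain', 'Enoch'))` finds the relationship son, all from the same facts.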
But even here in this very simple example, what you see is that a single fact, see, a single fact like the son of Adam is Cain can be used to answer0:08:04
different kinds of questions. You can say, who's the son of, or you can say who's the son of Adam, or you can say what's the relation between Adam and Cain? Those are different questions being run by different0:08:17
traditional procedures all based on the same fact. And that's going to be the essence of the power of this programming style, that one piece of declarative knowledge0:08:30
can be used as the basis for a lot of different kinds of how-to knowledge, as opposed to the kinds of procedures we're writing where you sort of tell it what input you're0:08:39
giving it and what answer you want. So for instance, our square root program can perfectly well answer the question, what's the square root of 144?0:08:48
But in principle, the mathematical definition of square root tells you other things. Like it could say, what is 17 the square root of?0:08:57
And that would have to be answered by a different program. So the mathematical definition, or in general, the facts that you give it are somehow unbiased as to what0:09:09
the question is. Whereas the programs we tend to write specifically because they are how-to knowledge tend to be looking for a specific answer. So that's going to be one characteristic of what we're0:09:19
talking about. We can go on. We can imagine that we've given our language some sort of facts. Now let's give it some rules of inference.0:09:30
We can say, for instance, if the-- make up some syntax here-- if the son of x is y--0:09:41
I'll put question marks to indicate variables here-- if the son of x is y and the son of y is z, then the0:10:01
grandson of x is z. So I can imagine telling my machine that rule and then0:10:15
being able to say, for instance, who's the grandson of Adam? Or who is Irad the grandson of?0:10:24
Or deduce all grandson relationships you possibly can from this information. We can imagine somehow the language knowing how to do0:10:34
that automatically. Let me give you maybe a little bit more concrete example.0:10:49
Here's a procedure that merges two sorted lists. So x and y are two, say, lists of numbers, lists of distinct0:11:01
numbers, if you like, that are in increasing order. And what merge does is take two such lists and combine them into a list where everything's in increasing0:11:10
order, and this is a pretty easy program that you ought to be able to write. It says, if x is empty, the answer is y. If y is empty, the answer is x.0:11:21
Otherwise, you compare the first two elements. So you pick out the first thing in x and the first thing in y, and then depending on which of those first elements0:11:31
is less, you stick the lower one onto the result of recursively merging, either chopping the first one off x0:11:40
or chopping the first one off y. That's a standard kind of program. Let's look at the logic. Let's forget about the program and look at the logic on which0:11:51
that procedure is based. See, there's some logic which says, gee, if the first one is less, then we get the answer by sticking something onto the0:12:00
result of recursively merging the rest. So let's try and be explicit about what that logic is that's making the program work. So here's one piece.0:12:10
Here's the piece of the program which recursively chops down x if the first thing in x is smaller.0:12:19
And if you want to be very explicit about what the logic is there, what's really going on is a deduction, which says, if you know that some list, that we'll call cdr of x, and0:12:31
y merged to form z, and you know that a is less than the0:12:40
first thing in y, then you know that if you put a onto the cdr of x, then that result and y merge to form a onto z.0:12:55
And what that is, that's the underlying piece of logic-- I haven't written it as a program, I wrote it a sort of deduction that's underneath this particular clause that0:13:05
says we can use the recursion there. And then similarly, here's the other clause just to complete it.0:13:14
The other clause is based on this piece of logic, which is almost the same and I won't go through it, and then there are the end cases where we tested for null, and that's based on the idea that for any x, x and the empty list merge to form0:13:26
an x, or for any y, the empty list and y merge to form y. OK, so there's a piece of procedure and the logic on0:13:39
which it's based. And notice a big difference. The procedure looked like this: it0:13:51
said there was a box-- and all the things we've been doing have the characteristic we have boxes and things going in and things going out-- there was this box called merge, and in came an x and y,0:14:04
and out came an answer. That's the character of the procedure that we wrote.0:14:13
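Rendered as a Python sketch (slicing stands in for car and cdr; not the lecture's Lisp), that merge box looks like this:

```python
def merge(x, y):
    """Merge two sorted lists of distinct numbers into one sorted list."""
    if not x:
        return y
    if not y:
        return x
    a, b = x[0], y[0]
    if a < b:
        # a is smaller: put it first, merge the rest of x with y
        return [a] + merge(x[1:], y)
    else:
        # b is smaller: put it first, merge x with the rest of y
        return [b] + merge(x, y[1:])

print(merge([1, 3, 7], [2, 4, 8]))  # [1, 2, 3, 4, 7, 8]
```

Notice the directionality: x and y go in, the answer comes out, and there is no way to run this box backward.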
These rules don't look like that. These rules talk about a relation. There's some sort of relation that in those slides I called0:14:23
merge-to-form. So I said x and y merge to form z, and somehow this is a function.0:14:32
Right? The answer is a function of x and y, and here what I have is a relation between three things. And I'm not going to specify which is the input and which0:14:43
is the output. And the reason I want to say that is because in principle, we could use exactly those same logic rules to answer a lot of different questions.0:14:54
So we can say, for instance-- imagine giving our machine those rules of logic. Not the program, the underlying rules of logic. Then it ought to be able to say--0:15:04
we could ask it-- 1, 3, 7 and 2, 4, 8 merge to form what?0:15:20
And that's a question it ought to be able to answer. That's exactly the same question that our LISP procedure answered. But the exact same rules should also be able to answer0:15:33
a question like this: 1, 3, 7 and what merged to form 1, 2, 3, 4, 7, 8?0:15:45
The same rules of logic can answer this, although the procedure we wrote can't answer that question. Or we might be able to say what and what0:15:56
else merge to form--0:16:07
what and what else merge to form 1, 2, 3, 4, 7, 8? And the thing should be able to go through, if it really0:16:16
can apply that logic, and deduce all, whatever is, 2 to the sixth answers to that question.0:16:25
It could be 1 and the rest, or it could be 1, 2 and the rest. Or it could be 1 and 3 and 7 and the rest. There's a whole bunch of answers. And in principle, the logic should be0:16:36
enough to deduce that. So there are going to be two big differences in the kind of program we're going to look at and not only list, but0:16:48
essentially all the programming you've probably done so far in pretty much any language you can think of. The first is, we're not going to be computing functions.0:17:00
We're not going to be talking about things that take input and output. We're going to be talking about relations. And that means in principle, these relations don't have0:17:09
directionality. So the knowledge that you specify to answer this question, that same knowledge should also allow you to0:17:19
answer these other questions and conversely. And the second issue is that since we're talking about0:17:30
relations, these relations don't necessarily have one answer. So that third question down there doesn't have a particular answer, it has a whole bunch of answers.0:17:42
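One way to see the relational reading is to enumerate, in a Python sketch, every pair of lists that merges to form a given result. This brute-force enumeration is not how the logic language works (it deduces answers from rules), but it produces the same set of answers:

```python
def splits(z):
    """Yield every pair of sorted lists (x, y) that merge to form z."""
    if not z:
        yield [], []
        return
    head, rest = z[0], z[1:]
    for x, y in splits(rest):
        yield [head] + x, y  # the first element came from x
        yield x, [head] + y  # the first element came from y

answers = list(splits([1, 2, 3, 4, 7, 8]))
print(len(answers))  # 64 answers: "2 to the sixth"

# Running "backward": 1, 3, 7 and what merge to form 1, 2, 3, 4, 7, 8?
print([y for x, y in answers if x == [1, 3, 7]])  # [[2, 4, 8]]
```

The same relation answers the forward question, the backward question, and the "what and what else" question, which no single directional procedure can do.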
Well, that's where we're going. This style of programming, by the way, is called logic programming, for kind of obvious reasons.0:17:56
And people who do logic programming say that-- they have this little phrase-- they say the point of logic programming is that you use logic to express what is true,0:18:10
you use logic to check whether something is true, and you use logic to find out what is true.0:18:19
The best known logic programming language, as you probably know, is called Prolog. The language that we're going to implement this morning is0:18:31
something we call the query language, and it essentially has the essence of Prolog. It can do about the same stuff, although it's a lot slower because we're going to implement it in LISP rather0:18:42
than building a particular compiler. We're going to interpret it on top of the LISP interpreter. But other than that, it can do about the same stuff as Prolog. It has about the same power and about the same0:18:52
limitations. All right, let's break for question. STUDENT: Yes, could you please repeat what the three things0:19:04
you use logic programming to find? In other words, to find what is true, learn what is true-- what is the? PROFESSOR: Right. Sort of a logic programmer's little catechism.0:19:15
You use logic to express what is true, like these rules. You use logic to check whether something is true, and that's0:19:26
the kind of question I didn't answer here. I might say-- another question I could put down here is to say, is it true that 1, 3, 7 and 2, 4, 8 merge to form 1, 2, 6, 10? And0:19:41
that same logic should be enough to say no. So I use logic to check what is true, and then you also use logic to find out what's true.0:20:04
All right. Let's break. [MUSIC PLAYING BY J.S. BACH]0:20:22
[MUSIC ENDS]0:20:47
[MUSIC PLAYING BY J.S. BACH]0:21:02
PROFESSOR: OK, let's go ahead and take a look at this query language and operation. The first thing you might notice, when I put up that0:21:12
little biblical database, is that it's nice to be able to ask this language questions in relation to some collection of facts.0:21:21
So let's start off and make a little collection of facts. This is a tiny fragment of personnel records for a Boston0:21:31
high tech company, and here's a piece of the personnel records of Ben Bitdiddle. And Ben Bitdiddle is the computer wizard in this0:21:41
company; he's the underpaid computer wizard in this company. His supervisor is Oliver Warbucks, and here's his address.0:21:52
So the format is we're giving this information: job, salary, supervisor, address. And there are some other conventions. Computer here means that Ben works in the computer0:22:01
division, and his position in the computer division is wizard. Here's somebody else. Alyssa, Alyssa P. Hacker is a computer programmer, and she0:22:13
works for Ben, and she lives in Cambridge. And there's another programmer who works for Ben who's Lem E. Tweakit.0:22:22
And there's a programmer trainee, who is Louis Reasoner, and he works for Alyssa. And the big wheel of the company doesn't work for0:22:34
anybody, right? That's Oliver Warbucks. Anyway, what we're going to do is ask questions about that0:22:43
little world. And that'll be a sample world that we're going to do logic in. Let me just write up here, for probably the last time, what I0:22:55
said is the very most important thing you should get out of this course, and that is, when somebody tells you about a language, you say, fine-- what are the primitives, what are the means of combination,0:23:15
how do you put the primitives together, and then how do you abstract them, how do you abstract the compound pieces0:23:24
so you can use them as pieces to make something more complicated? And we've said this a whole bunch of times already, but it's worth saying again.0:23:36
Let's start. The primitives. Well, there's really only one primitive, and the primitive in this language is called a query. A primitive query.0:23:46
Let's look at some primitive queries. Job x. Who is a computer programmer?0:23:55
Or find every fact in the database that matches: the job of0:24:04
x is computer programmer. And you see a little syntax here. Things without question marks are meant to be literal, question mark x means that's a variable, and this thing will0:24:13
match, for example, the fact that Alyssa P. Hacker is a computer programmer, or x is Alyssa P. Hacker.0:24:26
Or more generally, I could have something with two variables in it. I could say, the job of x is computer something, and0:24:39
that'll match computer wizard. So there's something here: type will match wizard, or type will match programmer, or x might match0:24:49
various certain things. So there are, in our little example, only three facts in that database that match that query.0:24:59
Let's see, just to show you some syntax, the same query, this query doesn't match the job of x, doesn't match Louis0:25:11
Reasoner, and the reason for that is when I write something here, what I mean is that this is going to be a list of two symbols, of which the first is the word computer, and the0:25:22
second can be anything. And Louis's job description here has three symbols, so it doesn't match. And just to show you a little bit of syntax, the more0:25:35
general thing I might want to type is a thing with a dot here, and this is just standard dot notation for saying, this is a list, of which the first element is the0:25:46
word computer, and the rest is something that I'll call type. So this one would match.0:25:56
Louis's job is computer programmer trainee, and type here would be the cdr of this list. It would be the list programmer trainee.0:26:06
And that kind of dot processing is done automatically by the LISP reader.0:26:15
Well, let's actually try this. The idea is I'm going to type in queries in this language, and answers will come out. Let's look at this.0:26:25
I can go up and say, who works in the computer division? Job of x is computer dot y.0:26:39
Doesn't matter what I call the dummy variables. It says the answers to that, and it's found four answers.0:26:48
Or I can go off and say, tell me about everybody's supervisor. So I'll put in the query, the primitive query, the supervisor of x is y.0:27:02
There are all the supervisor relationships I know. Or I could go type in, who lives in Cambridge? So I can say, the address of x is Cambridge dot anything.0:27:25
And only one person lives in Cambridge. OK, so those are primitive queries. And you see what happens to basic interaction with the0:27:34
system is you type in a query, and it types out all possible answers. Or another way to say that: it finds out all the possible0:27:43
values of those variables x and y or t or whatever I've called them, and it types out all ways of taking that query and instantiating it--0:27:53
remember that from the rule system lecture-- instantiates the query with all possible values for those variables and then types out all of them. And there are a lot of ways you can0:28:02
arrange a logic language. Prolog, for instance, does something slightly different. Rather than typing back your query, Prolog would type out, x equals this and y equals that, or x equals this and y0:28:12
equals that. And that's a very surface level thing, you can decide what you like. OK. All right.0:28:21
So the primitives in this language? Only one, right? Primitive query.0:28:31
OK. Means of combination. Let's look at some compound queries in this language. Here's one.0:28:41
This one says, tell me all the people who work in the computer division. Tell me all the people who work in the computer division0:28:52
together with their supervisors. The way I write that is the query is and. And the job of x is computer something or other.0:29:04
And job of x is computer dot y. And the supervisor of x is z. Tell me all the people in the computer division--0:29:13
that's this-- together with their supervisors. And notice in this query I have three variables-- x, y, and z.0:29:23
And this x is supposed to be the same as that x. So x works in the computer division, and the supervisor of x is z.0:29:34
Let's try another one. So one means of combination is and. Who are all the people who make more than $30,000?0:29:45
And the salary of some person p is some amount a.0:29:54
And when I go and look at a, a is greater than $30,000. And LISP value here is a little piece of interface that0:30:06
interfaces the query language to the underlying LISP. And what the LISP value allows you to do is call any LISP predicate inside a query.0:30:17
So here I'm using the LISP predicate greater than, so I say LISP value. This I say and. So all the people whose salary is greater than $30,000.0:30:28
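In a Python sketch, LISP value amounts to filtering the stream of answer dictionaries with an arbitrary host-language predicate; the salary figures below are illustrative:

```python
# Illustrative salaries for three of the personnel records.
salaries = {"Bitdiddle Ben": 60000, "Hacker Alyssa P": 40000,
            "Reasoner Louis": 30000}

# The dictionaries the salary query would produce, one per match:
# each binds ?p to a person and ?a to an amount.
dicts = [{"?p": p, "?a": a} for p, a in salaries.items()]

# (lisp-value > ?a 30000): keep only dictionaries where the
# host-language predicate holds.
well_paid = [d for d in dicts if d["?a"] > 30000]
print([d["?p"] for d in well_paid])  # Ben and Alyssa, but not Louis
```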
Or here's a more complicated one. Tell me all the people who work in the computer division who do not have a supervisor who works in0:30:38
the computer division. and x works in the computer division. The job of x is computer dot y.0:30:47
And it's not the case that both x has a supervisor z and the job of z is computer something or other.0:30:59
All right, so again, this x has got to be that x, and this z is going to be that z.0:31:09
And then you see another means a combination, not. All right, well, let's look at that.0:31:20
It works the same way. I can go up to the machine and say and the job of x is0:31:33
computer dot y. And the supervisor of x is z.0:31:46
And I typed that in like a query. And what it types back, what you see are the queries I0:31:55
typed in instantiated by all possible answers. And then you see there are a lot of answers. All right. So the means of combination in this language--0:32:05
and this is why it's called a logic language-- are logical operations. Means of combination are things like AND and NOT and0:32:16
there's one I didn't show you, which is OR. And then I showed you LISP value, which is not logic, of course, but is a little special hack to interface that0:32:26
to LISP so you can get more power. Those are the means of combination. OK, the means of abstraction. What we'd like to do--0:32:38
let's go back for a second and look at that last slide. We might like to take that very complicated thing, the idea that someone works in a division but does not have a0:32:48
supervisor in the division. And as before, name that. Well, if someone works in a division and does not have a0:32:58
supervisor who works in that division, that means that person is a big shot. So let's make a rule that somebody x is a big shot in0:33:08
some department if x works in the department and it's not the case that x has a supervisor who works in the0:33:19
department. So this is our means of abstraction. This is a rule. And a rule has three parts.0:33:30
The thing that says it's a rule. And then there's the conclusion of the rule. And then there's the body of the rule.0:33:40
And you can read this as a piece of logic which says, if you know that the body of the rule is true, then you can conclude that the conclusion is true.0:33:49
Or in order to deduce that x is a big shot in some department, it's enough to verify that. So that's what rules look like.0:34:03
Let's go back and look at that merge example that I did before the break. Let's look at how that would look in terms of rules. I'm going to take the logic I put up and just change it into0:34:14
a bunch of rules in this format. We have a rule. Remember, there was this thing merge-to-form. There is a rule that says, the empty list and y0:34:28
merge to form y. This is the rule conclusion. And notice this particular rule has no body. And in this language, a rule with no body is something that0:34:40
is always true. You can always assume that's true. And there was another piece of logic that said anything and the empty list merge to form that anything.0:34:49
That's this. A rule y and the empty list merge to form y. Those corresponded to the two end cases in our merge0:34:58
procedure, but now we're talking about logic, not about procedures. Then we had another rule, which said if you know how0:35:07
shorter things merge, you can put them together. So this says, if you have a list x and y and z, and if you want to deduce that a dot x-- this means constant a onto x,0:35:19
or a list whose first thing is a and whose rest is x-- so if you want to deduce that a dot x and b dot y merge to form b dot z--0:35:30
that would say you merge these two lists a x and b y and you're going to get something that starts with b-- you can deduce that if you know that it's the case both0:35:41
that a dot x and y merge to form z and a is larger than b. So when I merge them, b will come first in the list. That's0:35:52
a little translation of the logic rule that I wrote in pseudo-English before. And then just for completeness, here's the other case.0:36:03
a dot x and b dot y merge to form a dot z if x and b dot y merged to form z and b is larger than a.0:36:12
So that's a little program that I've typed in in this language, and now let's look at it run.0:36:21
So I typed in the merge rules before, and I could use this like a procedure. I could say merge to form 1 and 3 and 2 and 7.0:36:39
So here I'm using it like the LISP procedure. Now it's going to think about that for a while and apply these rules.0:36:50
So it found an answer. Now it's going to see if there are any other answers but it doesn't know a priori there's only one answer. So it's sitting here checking all possibilities, and it0:37:00
says, no more. Done. So there I've used those rules like a procedure. Or remember the whole point is that I can ask different kinds of questions.0:37:10
I could say merge to form, let's see, how about 2 and a.0:37:24
Some list of two elements which I know starts with 2, and the other thing I don't know, and x and some other0:37:34
list merge to form a 1, 2, 3 and 4. So now it's going to think about that.0:37:44
It's got to find-- so it found one possibility. It said a could be 3, and x could be the list 1, 4.0:37:53
And now, again, it's got to check because it doesn't a priori know that there aren't any other possibilities going on.0:38:03
Or like I said, I could say something like merge to form, like, what and what else merge to form 1, 2, 3, 4, 5?0:38:24
Now it's going to think about that. And there are a lot of answers that it might get.0:38:35
And what you see is here you're really paying the price of slowness. And kind of for three reasons. One is that this language is doubly interpreted.0:38:47
Whereas in a real implementation, you would go compile this down to primitive operations. The other reason is that this particular algorithm for0:38:56
merges is doubly recursive. So it's going to take a very long time. And eventually, this is going to go through and find--0:39:06
find what? Two to the fifth possible answers. And you see they come out in some fairly arbitrary order, depending on which order it's going to be0:39:17
trying these rules. In fact, what we're going to do when they edit the videotape is speed all this up. Don't you like taking out these waits?0:39:26
And don't you wish you could do that in your demos? Anyway, it's still grinding there.0:39:39
Anyway, there are 32 possibilities-- we won't wait for it to print out all of them. OK, so the means of abstraction in this0:39:49
language are rules. So we take some bunch of things that are put together with logic and we name them.0:40:00
And you can think of that as naming a particular pattern of logic. Or you can think of that as saying, if you want to deduce some conclusion, you can apply those rules of logic.0:40:10
And those are three elements of this language. Let's break now, and then we'll talk about how it's actually implemented.0:40:22
STUDENT: Does using LISP value primitive or whatever interfere with your means to go both directions on a query?0:40:31
PROFESSOR: OK, that's a-- the question is, does using LISP value interfere with the ability to go both directions on the query?0:40:40
We haven't really talked about the implementation yet, but the answer is, yes, it can. In general, as we'll see at the end--0:40:50
although I really won't go into details-- it's fairly complicated, especially when you use either not or LISP value--0:40:59
or actually, if you use anything besides only and, it becomes very complicated to say when these things will work.0:41:08
They won't work quite in all situations. I'll talk about that at the end of the second half today. But the answer to your question is, yes, by dragging0:41:17
in a lot more power from LISP value, you lose some of the principal power of logic programming. That's a trade-off that you have to make.0:41:28
OK, let's take a break.0:00:00
Lecture 8B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:18
PROFESSOR: All right, well, we've seen how the query language works. Now, let's talk about how it's implemented. You already pretty much can guess what's going on there.0:00:29
At the bottom of it, there's a pattern matcher. And we looked at a pattern matcher when we did the rule-based control language.0:00:38
Just to remind you, here are some sample patterns. This is a pattern that will match any list of three things of which the first is a and the last is c and the middle0:00:48
one can be anything. So in this little pattern-matching syntax, there's only one distinction you make. There's either literal things or variables, and variables0:00:57
begin with question mark. So this matches any list of three things of which the first is a and the last is c.0:01:06
This one matches any list of three things of which the first is the symbol job. The second can be anything. And the third is a list of two things of which the first is0:01:16
the symbol computer and the second can be anything. And this one, this next one matches any list of three0:01:25
things, and the only difference is, here, the third list, the first is the symbol computer, and then there's some rest of the list. So this means two elements and this0:01:36
means arbitrary number. And our language implementation isn't even going to have to worry about implementing this dot because that's automatically done by Lisp's reader.0:01:48
Remember matchers also have some consistency in them. This match is a list of three things of which the first is a. And the second and third can be anything, but they have to be the same thing.0:01:57
They're both called x. And this matches a list of four things of which the first is the same as the fourth and the second is the same as the third. And this last one matches any list that begins with a.0:02:09
The first thing is a, and the rest can be anything. So that's just a review of pattern matcher syntax that you've already seen.0:02:18
And remember, that's implemented by some procedure called match. And match takes a pattern and some data and a dictionary.0:02:43
And match asks the question is there any way to match this pattern against this data object subject to the bindings0:02:55
that are already in this dictionary? So, for instance, if we're going to match the pattern x, y, y, x against the data a, b, b, a subject to a dictionary,0:03:18
that says x equals a. Then the matcher would say, yes, that's consistent. These match, and it's consistent with what's in the0:03:28
dictionary to say that x equals a. And the result of the match is the extended dictionary that says x equals a and y equals b.0:03:39
So a matcher takes in pattern data dictionary, puts out an extended dictionary if it matches, or if it doesn't match, says that it fails. So, for example, if I use the same pattern here, if I say0:03:51
this x, y, y, x match a, b, b, a with the dictionary y equals0:04:02
a, then the matcher would put out fail.0:04:12
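A minimal Python rendition of what that matcher does, assuming variables are strings beginning with a question mark, the failure value is the string "failed", and dotted-tail patterns are omitted:

```python
FAIL = "failed"

def match(pat, dat, d):
    """Match pattern against data, extending dictionary d consistently."""
    if d == FAIL:
        return FAIL
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in d:                        # already bound: must agree
            return d if d[pat] == dat else FAIL
        return {**d, pat: dat}              # extend the dictionary
    if isinstance(pat, list):
        if not isinstance(dat, list) or len(pat) != len(dat):
            return FAIL
        for p, x in zip(pat, dat):
            d = match(p, x, d)
        return d
    return d if pat == dat else FAIL        # literal constants must be equal

# The two blackboard examples:
print(match(["?x", "?y", "?y", "?x"], ["a", "b", "b", "a"], {"?x": "a"}))
# -> {'?x': 'a', '?y': 'b'}, the extended dictionary
print(match(["?x", "?y", "?y", "?x"], ["a", "b", "b", "a"], {"?y": "a"}))
# -> failed, since ?y would have to be both a and b
```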
Well, you've already seen the code for a pattern matcher so I'm not going to go over it, but it's the same thing we've been doing before.0:04:21
You saw that in the system on rule-based control. It's essentially the same matcher. In fact, I think the syntax is a little bit simpler because we're not worrying about arbitrary constants and0:04:30
expressions and things. There's just variables and constants. OK, well, given that, what's a primitive query?0:04:42
Primitive query is going to be a rather complicated thing. It's going to be-- let's think about the query job of x is d dot y.0:05:06
That's a query we might type in. That's going to be implemented in the system. We'll think of it as this little box.0:05:15
Here's the primitive query. What this little box is going to do is take in two streams0:05:32
and put out a stream. So the shape of a primitive query is that it's a thing where two streams come in and one stream goes out.0:05:41
What these streams are going to be is down here is the database.0:05:51
So we imagine all the things in the database sort of sitting there in a stream and this thing sucks on them.0:06:00
So what are some things that might be in the database? Oh, job of Alyssa is something and some0:06:22
other job is something. So imagine all of the facts in the database sitting there in the stream.0:06:32
That's what comes in here. What comes in here is a stream of dictionaries. So one particular dictionary might say y equals programmer.0:06:55
Now, what the query does when it gets in a dictionary from this stream, it finds all possible ways of matching the0:07:06
query against whatever is coming in from the database. It looks at the query as a pattern, matches it against0:07:15
any fact from the database or all possible ways of finding and matching the database with respect to this dictionary0:07:24
that's coming in. So for each fact in the database, it calls the matcher using the pattern, fact, and dictionary.0:07:35
And every time it gets a good match, it puts out the extended dictionary. So, for example, if this one comes in and it finds a match,0:07:44
out will come a dictionary that in this case will have y equals programmer and x equals something.0:07:56
y is programmer, x is something, and d is whatever it found. And that's all. And, of course, it's going to try this for every fact in the0:08:07
database. So it might find lots of them. It might find another one that says y equals programmer and x equals, and d equals.0:08:20
So for one frame coming in, it might put out-- for one dictionary coming in, it might put out a lot of dictionaries, or it might put out none.0:08:30
It might have something that wouldn't match like x equals FOO.0:08:39
This one might not match anything in which case nothing will go into this stream corresponding to this frame. Or what you might do is put in an empty frame, and an empty0:08:53
frame says try matching all ways-- find all possible ways of matching the query against0:09:02
something in the database subject to no previous restrictions. And if you think about what that means, that's just the computation that's done when you type in a query right off.0:09:13
It tries to find all matches. So a primitive query sets up this mechanism. And what the language does, when you type in the query at0:09:23
the top level, it takes this mechanism, feeds in one single empty dictionary, and then for each thing that comes out0:09:33
takes the original query and instantiates the result with all the different dictionaries, producing a new stream of instantiated patterns here.0:09:44
And that's what gets printed on the terminal. That's the basic mechanism going on there.0:09:53
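That mechanism can be sketched in Python as a generator: the primitive query takes a stream of dictionaries and the stream of database facts, and yields one extended dictionary per successful match. The miniature database and matcher below are illustrative stand-ins, not the lecture's code:

```python
FAIL = None

def match(pat, dat, d):
    """Match pattern against data, extending dictionary d; FAIL if impossible."""
    if d is FAIL:
        return FAIL
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in d:
            return d if d[pat] == dat else FAIL
        return {**d, pat: dat}
    if isinstance(pat, list):
        if not isinstance(dat, list) or len(pat) != len(dat):
            return FAIL
        for p, x in zip(pat, dat):
            d = match(p, x, d)
        return d
    return d if pat == dat else FAIL

# A toy slice of the personnel database (names from the lecture).
database = [
    ["job", ["Bitdiddle", "Ben"], ["computer", "wizard"]],
    ["job", ["Hacker", "Alyssa", "P"], ["computer", "programmer"]],
    ["job", ["Tweakit", "Lem", "E"], ["computer", "programmer"]],
    ["salary", ["Bitdiddle", "Ben"], 60000],
]

def query(pattern, dict_stream, facts=database):
    """The primitive-query box: dictionaries in, extended dictionaries out."""
    for d in dict_stream:
        for fact in facts:
            ext = match(pattern, fact, d)
            if ext is not FAIL:
                yield ext

# Typing a query at top level = feeding in one single empty dictionary.
for d in query(["job", "?x", ["computer", "programmer"]], [{}]):
    print(d["?x"])  # prints the two programmers' names
```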
Well, why is that so complicated? You probably can think of a lot simpler ways to arrange this match for a primitive query rather than having all0:10:03
of these streams floating around. And the answer is-- you probably guess already. The answer is this thing extends elegantly to implement0:10:15
the means of combination. So, for instance, suppose I don't only want to do this. I don't just want to ask for everybody's job description.0:10:27
Suppose I want to say AND the job of x is d dot y and the0:10:39
supervisor of x is z.0:10:48
Now, supervisor of x is z is going to be another primitive query that has the same shape to take in a stream of data0:10:57
objects, a stream of initial dictionaries, which are the restrictions to try and use when you match, and it's going to put out a stream of dictionaries.0:11:08
So that's what this primitive query looks like. And how do I implement the AND? Well, it's simple. I just hook them together. I take the output of this one, and I put that to the0:11:17
input of that one. And I take the dictionary here and I fan it out.0:11:26
And then you see how that's going to work, because what's going to happen is a frame will now come in here, which has a binding for x, y, and d.0:11:37
And then when this one gets it, it'll say, oh, gee, subject to these restrictions, which now already have values in the dictionary for y and x and d, it looks in the0:11:52
database and says, gee, can I find any supervisor facts? And if it finds any, out will come dictionaries which have bindings for y and x and d and z now.0:12:12
And then notice that because the frames coming in here have these restrictions, that's the thing that assures that when you do the AND, this x will mean the same thing as that x.0:12:26
Because by the time something comes floating in here, x has a value that you have to match against consistently. And then you remember from the code from the matcher, there0:12:36
was something in the way the matcher did dictionaries that arranges for consistent matches. So there's AND. The important point to notice is the general shape.0:12:48
Look at what happened: the AND of two queries, say, P and Q. Here's P and Q. The AND of two queries, well,0:13:00
it looks like this. Each query takes in a stream from the database, a stream of inputs, and puts out a stream of outputs.0:13:10
And the important point to notice is that if I draw a box around this thing and say this is AND of P and Q, then that0:13:26
box has exactly the same overall shape. It's something that takes in a stream from the database. Here it's going to get fanned out inside, but from the0:13:37
outside you don't see that. It takes an input stream and puts out an output stream. So this is AND. And then similarly, OR would look like this.0:13:46
OR would-- although I didn't show you examples of OR. OR would say can I find all ways of matching P or Q. So I0:13:55
have P and Q. Each will have their shape.0:14:04
And the way OR is implemented is I'll take my database stream. I'll fan it out.0:14:13
I'll put one into P and one into Q. I'll take my initial query stream coming in and fan it out.0:14:26
So I'll look at all the answers I might get from P and all the answers I might get from Q, and I'll put them through some sort of thing that appends them or merges0:14:35
the result into one stream, and that's what will come out. And this whole thing from the outside is OR.0:14:52
And again, you see it has the same overall shape when looked at from the outside.0:15:01
What's NOT? NOT works kind of the same way. If I have some query P, I take the primitive query for P.0:15:14
Here, I'm going to implement NOT P. And NOT's just going to act as a filter. I'll take in the database and my original stream of0:15:27
dictionaries coming in, and what NOT P will do is it will filter these guys.0:15:39
And the way it will filter it, it will say when I get in a dictionary here, I'll find all the matches, and if I find any, I'll throw it away. And if I don't find any matches to something coming in0:15:49
here, I'll just pass that through, so NOT is a pure filter. So AND is-- think of these sort of like electrical0:15:59
resistors or something. AND is series combination and OR is parallel combination. And then NOT is not going to extend any dictionaries at all. It's just going to filter it.0:16:08
It's going to throw away the ones for which it finds a way to match. And LISP-VALUE is sort of the same way. The filter's a little more complicated: it applies a predicate.0:16:19
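The series/parallel/filter picture can be sketched with Python generators standing in for the lecture's streams. Everything here, the names, the frame representation, the tiny database, is an illustrative assumption, not the actual Scheme code: a query is modeled as a function from a stream of frames (dictionaries) to a stream of frames.

```python
# Frames are plain dicts mapping "?var" strings to values.
DB = [("job", "ben", "wizard"),
      ("job", "alyssa", "programmer")]

def match(pattern, fact, frame):
    """Match a flat pattern against a fact, extending the frame.
    Strings starting with '?' are variables. Returns the extended
    frame, or None on failure."""
    if len(pattern) != len(fact):
        return None
    frame = dict(frame)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if p in frame and frame[p] != f:
                return None
            frame[p] = f
        elif p != f:
            return None
    return frame

def primitive(pattern):
    """A primitive query: extend each incoming frame in every way
    the pattern matches the database."""
    def run(frames):
        for frame in frames:
            for fact in DB:
                ext = match(pattern, fact, frame)
                if ext is not None:
                    yield ext
    return run

def q_and(p, q):
    """Series combination: p's output stream is q's input stream."""
    return lambda frames: q(p(frames))

def q_or(p, q):
    """Parallel combination: fan the input out, merge the outputs."""
    def run(frames):
        frames = list(frames)
        yield from p(frames)
        yield from q(frames)
    return run

def q_not(p):
    """Pure filter: a frame survives only if p cannot extend it."""
    def run(frames):
        for frame in frames:
            if not list(p([frame])):
                yield frame
    return run

# (and (job ?x ?j) (not (job ?x wizard))), started with one empty frame:
query = q_and(primitive(("job", "?x", "?j")),
              q_not(primitive(("job", "?x", "wizard"))))
results = list(query([{}]))
```

Note that the boxed-up `query` has exactly the same shape as a primitive query, a function from a frame stream to a frame stream, which is the closure property discussed next.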
The major point to notice here, and it's a major point we've looked at before, is this idea of closure.0:16:28
The things that we build as a means of combination have the same overall structure as the primitive things that we're combining.0:16:39
So the AND of two things when looked at from the outside has the same shape. And what that means is that this box here could be an AND0:16:48
or an OR or a NOT or something because it has the same shape to interface to the larger things. It's the same thing that allowed us to get complexity0:16:57
in the Escher picture language or allows you to immediately build up these complicated structures just out of pairs. It's closure.0:17:06
And that's the thing that allowed me to do what by now you took for granted when I said, gee, there's a query which is AND of job and salary, and I said, oh,0:17:15
there's another one, which is AND of job, a NOT of something. The fact that I can do that is a direct consequence of this closure principle.0:17:25
OK, let's break and then we'll go on. AUDIENCE: Where does the dictionary come from? PROFESSOR: The dictionary comes initially from0:17:35
what you type in. So when you start this up, the first thing it does is set up this whole structure. It puts in one empty dictionary.0:17:45
And if all you have is one primitive query, then what will come out is a bunch of dictionaries with things filled in. The general situation that I have here is when this is in0:17:55
the middle of some nest of combined things. Let's look at the picture over here. This supervisor query gets in some dictionary.0:18:06
Where did this one come from? This dictionary came from the fact that I'm looking at the output of this primitive query.0:18:16
So maybe to be very specific, if I literally typed in just this query at the top level, this AND, what would actually happen is it would build this structure and start up this0:18:26
whole thing with one empty dictionary. And now this one would process, and a whole bunch of dictionaries would come out with x, y's and d's in them.0:18:38
Run it through this one. So now that's the input to this one. This one would now put out some other stuff. And if this itself were buried in some larger thing, like an0:18:50
OR of something, then that would go feed into the next one. So you initially get only one empty dictionary when you0:19:00
start it, but as you're in the middle of processing these compound things, that's where these cascades of dictionaries start getting generated. AUDIENCE: Dictionaries only come about as a result of0:19:11
using the queries? Or do they become-- do they stay someplace in space like the database does?0:19:23
Are these temporary items? PROFESSOR: They're created temporarily in the matcher. Really, they're someplace in storage. Initially, someone creates a thing called the empty0:19:32
dictionary that gets initially fed to this match procedure, and then the match procedure builds some dictionaries, and they get passed on and on. AUDIENCE: OK, so they'll go away after the match?0:19:43
PROFESSOR: They'll go away when no one needs them again, yeah. AUDIENCE: It appears that the AND performs some redundant0:19:54
searches of the database. If the first clause matched, let's say, the third element and not on the first two elements, the second clause is going to look at those first two elements again, discarding0:20:04
them because they don't match. The match is already in the dictionary. Would it make sense to carry the data element from the database along with the dictionary?0:20:17
PROFESSOR: Well, in general, there are other ways to arrange this search, and there's some analysis that you can do. I think there's a problem in the book, which talks about a different way that you can cascade AND to eliminate0:20:27
various kinds of redundancies. This one is meant to be-- was mainly meant to be very simple so you can see how they fit together. But you're quite right. There are redundancies here that you can get rid of.0:20:38
That's another reason why this language is somewhat slow. There are a lot smarter things you can do. We're just trying to show you a very simple, in principle, implementation.0:20:51
AUDIENCE: Did you model this language on Prolog, or did it just come out looking like Prolog?0:21:04
PROFESSOR: Well, Jerry insulted a whole bunch of people yesterday, so I might as well say that the MIT attitude towards Prolog is something that people did in about 1971 and decided that it wasn't really the right thing0:21:15
and stopped. So we modeled this on the sort of natural way that this thing was done in about 1971, except at that point, we didn't do it0:21:26
with streams. After we were using it for about six months, we discovered that it had all these problems, some of which0:21:35
I'll talk about later. And we said, gee, Prolog must have fixed those, and then we found out that it didn't. So this does about the same thing as Prolog. AUDIENCE: Does Prolog use streams?0:21:44
PROFESSOR: No. In how it behaves, it behaves a lot like Prolog. Prolog uses a backtracking strategy.0:21:53
But the other thing that's really good about Prolog that makes it a usable thing is that there's a really very, very well-engineered compiler technology that makes it run0:22:04
fast. So although you saw the merge spitting out these answers very, very slowly, a real Prolog will run very,0:22:13
very fast. Because even though it's sort of doing this, the real work that went into Prolog is a very, very excellent compiler effort.0:22:24
Let's take a break.0:23:16
We've looked at the primitive queries and the ways that streams are used to implement the means of combination: AND and OR and NOT.0:23:26
Now, let's go on to the means of abstraction. Remember, the means of abstraction in this language are rules.0:23:35
So z is a boss in division d if there's some x who has a job in division d and z is the supervisor of x.0:23:48
That's what it means for someone to be a boss. And in effect, if you think about what we're doing with relation to this, there's the query we wrote-- the job of x0:23:58
is in d and the supervisor of x is z-- what we in effect want to do is take this whole mess and draw a box around it and say this whole thing inside the0:24:24
box is boss of z in division d.0:24:33
That's in effect what we want to do. So, for instance, if we've done that, and we want to0:24:45
check whether or not it's true that Ben Bitdiddle is a boss in the computer division, so if I want to say boss of Ben0:25:00
Bitdiddle in the computer division, imagine typing that in as query to the system, in effect what we want to do is0:25:10
set up a dictionary here, which has z to Ben Bitdiddle0:25:28
and d to computer.0:25:37
Where did that dictionary come from? Let's look at the slide for one second. That dictionary came from matching the query that said boss of Ben Bitdiddle and computer onto the conclusion0:25:47
of the rule: boss of z and d. So we match the query to the conclusion of the rule. That gives us a dictionary, and that's the thing that we0:26:00
would now like to put into this whole big thing and process and see if anything comes out the other side. If anything comes out, it'll be true.0:26:11
That's the basic idea. So in general, the way we implement a rule is we match the conclusion of the rule against something we might0:26:21
want to check it's true. That match gives us a dictionary, and with respect to that dictionary, we process the body of the rule.0:26:36
Well, that's really all there is, except for two technical points. The first technical point is that I might have said0:26:46
something else. I might have said who's the boss in the computer division? So I might say boss of who in computer division.0:27:00
And if I did that, what I would really like to do in effect is start up this dictionary with a match that0:27:09
sort of says, well, d is computer and z is whatever who is.0:27:21
And our matcher won't quite do that. That's not quite matching a pattern against data. It's matching two patterns and saying are they consistent or0:27:31
not or what ways make them consistent. In other words, what we need is not quite a pattern matcher, but something a little bit more general called a unifier.0:27:44
And a unifier is a slight generalization of a pattern matcher. What a unifier does is take two patterns and say what's0:27:55
the most general thing you can substitute for the variables in those two patterns to make them satisfy the pattern0:28:04
simultaneously? Let me give you an example. If I have the pattern two-element list, which is x0:28:13
and x, so I have a two-element list where both elements are the same and otherwise I don't care what they are, and I unify that against the pattern that says there's a0:28:23
two-element list, and the first one is a and y and c and the second one is a and b and z, then what the0:28:33
unifier should tell me is, oh yeah, in that dictionary, x has to be (a b c), and y has to be b and z has to be c.0:28:43
Those are the restrictions I'd have to put on the values of x, y, and z to make these two unify, or in other words, to make this match x and make this match x.0:28:55
The unifier should be able to deduce that. But the unifier may-- there are more complicated things. I might have said something a little bit more complicated. I might have said there's a list with two elements, and0:29:07
they're both the same, and they should unify against something of this form. And the unifier should be able to deduce from that.0:29:16
Like that y would have to be b. y would have to be b. Because these two are the same, so y's got to be b. And v here would have to be a.0:29:28
And z and w can be anything, but they have to be the same thing. And x would have to be b, followed by a, followed by0:29:40
whatever w is or whatever z is, which is the same. So you see, the unifier somehow has to deduce things to unify these patterns.0:29:50
So you might think there's some kind of magic deduction going on, but there's not. A unifier is basically a very simple modification of a0:29:59
pattern matcher. And if you look in the book, you'll see something like three or four lines of code added to the pattern matcher you just saw to handle the symmetric case.0:30:08
Remember, the pattern matcher has a place where it says is this variable matching a constant. And if so, it checks in the dictionary. There's only one other clause in the unifier, which says is0:30:18
this variable matching a variable, in which case you go look in the dictionary and see if that's consistent with what's in the dictionary.0:30:27
So all the, quote, deduction that's in this language, if you sort of look at it, sort of sits in the rule applications, which, if you look at that, sits in the0:30:37
unifier, which, if you look at that under a microscope, sits essentially in the pattern matcher. There's no magic at all going on in there.0:30:47
And the, quote, deduction that you see is just the fact that there's this recursion, which is unwinding the matches bit by bit.0:30:56
So it looks like this thing is being very clever, but in fact, it's not being very clever at all. There are cases where a unifier might have to be clever. Let me show you one more.0:31:11
Suppose I want to unify a list of two elements, x and x, with a thing that says it's y followed by (a . y).0:31:24
Now, if you think of what that would have to mean, it would have to mean that x had better be the same as y, but also x had better be the same as a list whose first element is a0:31:35
and whose rest is y. And if you think about what that would have to mean, it would have to mean that y is the infinite list of a's.0:31:47
In some sense, in order to do that unification, I have to solve the fixed-point equation cons of a to y is equal to y.0:32:04
And in general, I wrote a very simple one. Really doing unification might have to solve an arbitrary fixed-point equation: f of y equals y.0:32:15
And basically, you can't do that and make the thing finite all the time. So how does the logic language handle that?0:32:25
The answer is it doesn't. It just punts. And there's a little check in the unifier, which says, oh, is this one of the hard cases which when I go to match0:32:35
things would involve solving a fixed-point equation? And in this case, I will throw up my hands. And if that check were not in there, what would happen?0:32:47
In most cases is that the unifier would just go into an infinite loop. And other logic programming languages work like that.0:32:56
So there's really no magic. The easy case is done in a matcher. The hard case is not done at all. And that's about the state of this technology.0:33:12
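What the lecture describes can be sketched as a small unifier in Python: a pattern matcher plus the symmetric variable-against-variable case, plus the occurs check that punts on fixed-point equations. The representation, strings starting with "?" as variables and lists as compound patterns, is an assumption of this sketch, not the book's code:

```python
def is_var(x):
    return isinstance(x, str) and x.startswith("?")

def walk(x, frame):
    """Follow variable bindings to their current value."""
    while is_var(x) and x in frame:
        x = frame[x]
    return x

def occurs(var, x, frame):
    """Occurs check: does var appear inside x? If so, binding var
    to x would mean solving a fixed-point equation, so we give up."""
    x = walk(x, frame)
    if x == var:
        return True
    if isinstance(x, (list, tuple)):
        return any(occurs(var, part, frame) for part in x)
    return False

def unify(a, b, frame):
    """Return an extended frame making a and b the same, or None."""
    if frame is None:
        return None
    a, b = walk(a, frame), walk(b, frame)
    if a == b:
        return frame
    if is_var(a):
        return None if occurs(a, b, frame) else {**frame, a: b}
    if is_var(b):
        return unify(b, a, frame)
    if (isinstance(a, (list, tuple)) and isinstance(b, (list, tuple))
            and len(a) == len(b)):
        for x, y in zip(a, b):
            frame = unify(x, y, frame)
        return frame
    return None

# The lecture's example: unify (?x ?x) with ((a ?y c) (a b ?z)).
f = unify(["?x", "?x"], [["a", "?y", "c"], ["a", "b", "?z"]], {})
# Walking the bindings gives x = (a b c), y = b, z = c.

# The hard case: unify (?x ?x) with (?y (a . ?y)), here written as
# nested lists. The occurs check detects the fixed-point equation
# and the unifier punts, returning None.
hard = unify(["?x", "?x"], ["?y", ["a", "?y"]], {})
```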
Let me just say again formally how rules work now that I talked about unifiers. So the official definition is that to apply a rule, we--0:33:25
well, let's start using some words we've used before. Let's talk about sticking dictionaries into these big boxes of query things as evaluating these large queries0:33:40
relative to an environment or a frame. So when you think of that dictionary, what's the dictionary after all? It's a bunch of meanings for symbols. That's what we've been calling frames or environments.0:33:51
What does it mean to do some processing relevant to an environment? That's what we've been calling evaluation. So we can say the way that you apply a rule is to evaluate0:34:03
the rule body relative to an environment that's formed by unifying the rule conclusion with the given query.0:34:13
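The official definition just stated can be caricatured in a few lines of Python. The names here, `apply_rule`, `unify_flat`, and the tuple encoding of the boss rule, are hypothetical, bodies are simplified to an AND of flat clauses checked against facts, and variable renaming, which the lecture takes up later, is omitted:

```python
FACTS = [("job", "alyssa", "computer"),
         ("supervisor", "alyssa", "ben")]

# (rule (boss ?z ?d) (and (job ?x ?d) (supervisor ?x ?z)))
RULE = (("boss", "?z", "?d"),
        [("job", "?x", "?d"), ("supervisor", "?x", "?z")])

def unify_flat(a, b, frame):
    """Unify two flat tuples of strings; '?'-strings are variables.
    Variables may bind to constants or to other variables."""
    if len(a) != len(b):
        return None
    frame = dict(frame)
    for x, y in zip(a, b):
        x = frame.get(x, x)            # one step of dereferencing
        y = frame.get(y, y)
        if x == y:
            continue
        if x.startswith("?"):
            frame[x] = y
        elif y.startswith("?"):
            frame[y] = x
        else:
            return None
    return frame

def apply_rule(rule, query, frame):
    """Unify the query with the rule's conclusion; if that succeeds,
    evaluate the rule body in the resulting environment."""
    conclusion, body = rule
    env = unify_flat(query, conclusion, frame)
    if env is None:
        return []
    frames = [env]
    for clause in body:                # AND the body clauses in series
        frames = [ext for f in frames for fact in FACTS
                  for ext in [unify_flat(clause, fact, f)]
                  if ext is not None]
    return frames

# Checking whether Ben is a boss in the computer division:
hits = apply_rule(RULE, ("boss", "ben", "computer"), {})
# nonempty output means "true"
```

Asking (boss ?who computer) instead produces a frame in which ?who is linked to ?z and ?z is bound to ben, which is exactly the variable-to-variable linkage discussed in the Q&A below.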
And the thing I want you to notice is the complete formal similarity to the metacircular evaluator or the substitution model. To apply a procedure, we evaluate the procedure body0:34:27
relative to an environment that's formed by binding the procedure parameters to the arguments. There's a complete formal similarity here between the0:34:36
rules, rule application, and procedure application even though these things are very, very different. And again, you have the EVAL APPLY loop.0:34:47
EVAL and APPLY. So in general, I might be processing some combined0:34:57
expression that will turn into a rule application, which will generate some dictionaries or frames or environments-- whatever you want to call them-- from match, which will then be the input to some big compound thing like this.0:35:08
This has pieces of it and may have other rule applications. And you have essentially the same cycle even though there's nothing here at all that looks like procedures.0:35:19
It really has to do with the fact you've built a language whose means of combination and abstraction unwind in certain ways.0:35:28
And then in general, what happens at the very top level, you might have rules in your database also, so things in0:35:37
this database might be rules. There are ways to check that things are true. So it might come in here and have to do a rule check.0:35:46
And then there's some control structure which says, well, you look at some rules, and you look at some data elements, and you look at some rules and data elements, and these fan out and out and out. So it becomes essentially impossible to say what order0:35:56
it's looking at these things in, whether it's breadth first or depth first or anything. And it's even more impossible because the actual order is somehow buried in the delays of the streams. So what's very0:36:08
hard to tell from this is the order in which it's scanned. But what's true, because you're looking at the stream view, is that all of them eventually get looked at.0:36:24
Let me just mention one tiny technical problem.0:36:37
Suppose I tried saying boss of y in computer, then a funny thing would happen. As I stuck a dictionary with y in here, I might get--0:36:53
this y is not the same as that y, which was the other piece of somebody's job description. So if I really only did literally what I said, we'd0:37:04
get some variable conflict problems. So I lied to you a little bit. Notice that problem is exactly a problem we've run into before.0:37:14
It is precisely the need for local variables in a language. When I have the sum of squares, that x had better not be that x.0:37:24
That's exactly the same as this y had better not be that y. And we know how to solve that.0:37:33
That was this whole environment model, and we built chains of frames and all sorts of things like that. There's a much more brutal way to solve it. In the query language, we didn't even do that. We did something completely brutal.0:37:43
We said every time you apply a rule, rename consistently all the variables in the rule to some new unique names that won't conflict with anything.0:37:55
That's conceptually simpler, but really brutal and not particularly efficient. But notice, we could have gotten rid of all of our environment structures if we defined for procedures in Lisp0:38:08
the same thing. If every time we applied a procedure and did the substitution model we renamed all the variables in the procedure, then we never would have had to worry about local variables because they would never arise.0:38:19
OK, well, that would be inefficient, and it's inefficient here in the query language, too, but we did it to keep it simple. Let's break for questions.0:38:30
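The brutal renaming strategy can be sketched directly. The helper below is illustrative, assuming the same "?"-string representation for variables: every application of a rule consistently stamps its variables with a fresh serial number.

```python
import itertools

_counter = itertools.count()

def rename_variables(pattern):
    """Consistently replace every '?'-variable in a rule with a
    fresh, unique name, so two applications of the same rule can
    never have their variables conflict."""
    n = next(_counter)
    def stamp(x):
        if isinstance(x, str) and x.startswith("?"):
            return f"{x}-{n}"
        if isinstance(x, (list, tuple)):
            return [stamp(part) for part in x]
        return x
    return stamp(pattern)

# Two applications of the same rule get disjoint variable names:
a = rename_variables(["?y", ["supervisor", "?x", "?y"]])
b = rename_variables(["?y", ["supervisor", "?x", "?y"]])
```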
AUDIENCE: When you started this section, you emphasized how powerful our APPLY EVAL model was that we could use it0:38:40
for any language. And then you say we're going to have this language which is so different. It turns out that this language, as you just pointed out, is very much the same. I'm wondering if you're arguing that all languages end0:38:49
up coming down to this you can apply a rule or apply a procedure or some kind of apply? PROFESSOR: I would say that pretty much any language where0:38:59
you really are building up these means of combination and giving them simpler names and you're saying anything of the sort, like here's a general kind of expression, like how0:39:10
to square something, almost anything that you would call a procedure. If that's got to have parts, you have to unwind those parts. You have to have some kind of organization which says when I0:39:20
look at the abstract variables or tags or whatever you want to call them that might stand for particular things, you have to keep track of that, and that's going to be0:39:29
something like an environment. And then if you say this part can have parts which I have to unwind, you've got to have something like this cycle.0:39:39
And lots and lots of languages have that character when they sort of get put together in this way. This language again really is different because there's nothing like procedures on the outside.0:39:50
When you go below the surface and you see the implementation, of course, it starts looking the same. But from the outside, it's a very different world view. You're not computing functions of inputs.0:40:03
AUDIENCE: You mentioned earlier that when you build all of these rules in pattern matcher and with the delayed action of streams, you really have no way to know in what0:40:13
order things are evaluated. PROFESSOR: Right. AUDIENCE: And that would indicate then that you should only express declarative knowledge that's true for all time, with no time sequence built into it.0:40:23
Otherwise, these things get all-- PROFESSOR: Yes. Yes. The question is this really is set up for doing declarative0:40:32
knowledge, and as I presented it-- and I'll show you some of the ugly warts under this after the break. As I presented it, it's just doing logic.0:40:43
And in principle, if it were logic, it wouldn't matter what order it's getting done. And it's quite true when you start doing things where you0:40:52
have side effects like adding things to the database and taking things out, and we'll see some others, you use that kind of control.0:41:01
So, for example, contrasting with Prolog. Say Prolog has various features where you really exploit the order of evaluation. And people write Prolog programs that way.0:41:11
That turns out to be very complicated in Prolog, although if you're an expert Prolog programmer, you can do it. However, here I don't think you can do it at all.0:41:20
It's very complicated because you really are giving up control over any prearranged order of trying things. AUDIENCE: Now, that would indicate then that you have a0:41:29
functional mapping. And when you started out this lecture, you said that we express the declarative knowledge which is a relation, and we don't talk about the inputs and the outputs.0:41:41
PROFESSOR: Well, there's a pun on functional, right? There's function in the sense of no side effects and not depending on what order is going on. And then there's functional in the sense of mathematical0:41:50
function, which means input and output. And it's just that pun that you're making, I think. AUDIENCE: I'm a little unclear on what you're doing with these two statements, the two boss statements.0:42:01
Is the first one building up the database and the second one a query or-- PROFESSOR: OK, I'm sorry.0:42:12
What I meant here, if I type something like this in as a query-- I should have given an example way at the very beginning. If I type in job, Ben Bitdiddle, computer wizard,0:42:25
what the processing will do is if it finds a match, it'll find a match to that exact thing, and it'll type out a job, Ben Bitdiddle, computer wizard.0:42:34
If it doesn't find a match, it won't find anything. So what I should have said is the way you use the query language to check whether something is true, remember,0:42:43
that's one of the things you want to do in logic programming, is you type in your query and either that comes out or it doesn't. So what I was trying to illustrate here, I wanted to0:42:52
start with a very simple example before talking about unifiers. So what I should have said, if I just wanted to check whether this is true, I could type that in and see if anything0:43:02
came out. AUDIENCE: And then the second one-- PROFESSOR: The second one would be a real query. AUDIENCE: A real query, yeah. PROFESSOR: What would come out, see, it would go in here0:43:12
say with FOO, and in would go a frame that says z is bound to who and d is bound to computer. And this will pass through, and then by the time it got0:43:21
out of here, who would pick up a binding. AUDIENCE: On the unifying thing there, I still am not0:43:31
sure what happens with who and z. If the unifying-- the rule here says--0:43:42
OK, so you say that you can't make question mark z equal to question mark who. PROFESSOR: Right. That's what the matcher can't do. But what this will mean to a unifier is that there's an0:43:52
environment with three variables. d here is computer. z is whatever who is.0:44:01
So if later on in the matcher routine it said, for example, who has to be 3, then when I looked up in the dictionary,0:44:14
it will say, oh, z is 3 because it's the same as who. And that's in some sense the only thing you need to do to extend the matcher to a unifier. AUDIENCE: OK, because it looked like when you were0:44:23
telling how to unify it, it looked like you would put the things together in such a way that you'd actually solve and have a value for both of them. And what it looks like now is that you actually pass a0:44:32
dictionary with two variables and the variables are linked. PROFESSOR: Right. It only looks like you're solving for both of them because you're sort of looking at the whole solution at once. If you sort of watch the thing getting built up recursively,0:44:42
it's merely this. AUDIENCE: OK, so you do pass off that dictionary with two variables? PROFESSOR: That's right. AUDIENCE: And link? PROFESSOR: Right. It just looks like an ordinary dictionary.0:44:54
AUDIENCE: When you're talking about the unifier, is it that there are some cases or some patterns that you are not able to unify?0:45:04
PROFESSOR: Right. AUDIENCE: Can you just by building the rules or writing the forms know in advance if you are going to be able to0:45:15
get the unification or not? Can you add some properties either to the rules themselves or to the formula that you're writing so that you avoid the0:45:26
problem of not finding unification? PROFESSOR: I mean, you can agree, I think, to write in a fairly restricted way where you won't run into it.0:45:35
See, because what you're getting-- see, the place where you get into problems is when you-- well, again, you're trying to match things like that against0:45:45
things where these have structure, where a, y, b, y something.0:45:58
So this is the kind of place where you're going to get into trouble. AUDIENCE: So you can do that syntactically? PROFESSOR: So you can kind of watch your rules in the kinds0:46:09
of things that you're writing. AUDIENCE: So that's the problem that the builder of the database has to be concerned with? PROFESSOR: That's a problem.0:46:19
It's a problem either-- not quite the builder of the database, the person who is expressing the rules, or the builder of the database. What the unifier actually does is you can check at the next0:46:29
level down when you actually get to the unifier and you'll see in the code where it looks up in the dictionary. If it sort of says what does y have to be? Oh, does y have to be something that contains a y as0:46:40
its subexpression? At that point, the unifier can say, oh my God, I'm trying to solve a fixed-point equation. I'll give up here.0:46:49
AUDIENCE: You make the distinction between the rules in the database. Are the rules added to the database? PROFESSOR: Yes. Yes, I should have said that.0:46:58
One way to think about rules is that they're just other things in the database. So if you want to check the things that have to be checked in the database, they're kind of virtual facts that are in0:47:08
the database. AUDIENCE: But in that explanation, you made the differentiation between database and the rules itself.0:47:18
PROFESSOR: Yeah, I probably should not have done that. The only reason to do that is in terms of the implementation. When you look at the implementation, there's a part which says check either primitive assertions in the0:47:28
database or check rules. And then the real reason why you can't tell what order things are going to come out in and is that the rules0:47:38
database and the data database sort of get merged in a kind of delayed evaluation way. And so that's what makes the order very complicated.0:47:55
OK, let's break.0:48:33
We've just seen how the logic language works and how rules work. Now, let's turn to a more profound question. What do these things mean?0:48:43
That brings us to the subtlest, most devious part of this whole query language business, and that is that it's not quite what it seems to be.0:48:53
AND and OR and NOT and the logical implication of rules are not really the AND and OR and NOT and logical0:49:05
implication of logic. Let me give you an example of that. Certainly, if we have two things in logic, it ought to be the case that AND of P and Q is the same as AND of Q and0:49:22
P and that OR of P and Q is the same as OR of Q and P. But let's look here. Here's an example.0:49:32
Let's talk about somebody outranking somebody else in our little database organization. We'll say s is outranked by b if either the supervisor of0:49:47
s is b, or there's some middle manager m: the supervisor of s is m, and m is outranked by b.0:49:59
So there's one way to define rule outranked by. Or we can write exactly the same thing, except at the bottom here, we reversed the order of these two clauses.0:50:11
And certainly if this were logic, those ought to mean the same thing. However, in our particular implementation, if you say0:50:20
something like who's outranked by Ben Bitdiddle, what you'll find is that this rule will work perfectly well and generate answers, whereas this rule will go0:50:31
into an infinite loop. And the reason for that is that this will come in and say, oh, who's outranked by Ben Bitdiddle?0:50:41
Find an s which is outranked by b, where b is Ben Bitdiddle, which is going to have as a subproblem:0:50:50
Oh gee, find an m such as m is outranked by Ben Bitdiddle with no restrictions on m. So this will say in order to solve this problem, I solve0:51:01
exactly the same problem. And then after I've solved that, I'll check for a supervisory relationship. Whereas this one won't get into that, because before it0:51:10
tries to find this outranked by, it'll already have had a restriction on m here. So these two things which ought to mean the same, in0:51:21
fact, one goes into an infinite loop. One does not. That's a very extreme case of a general thing that you'll0:51:30
find in logic programming that if you start changing the order of the things in the ANDs or ORs, you'll find0:51:39
tremendous differences in efficiency. And we just saw an infinitely big difference in efficiency and an infinite loop.0:51:49
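The two clause orders can be caricatured loosely in Python, under the assumption of a tiny supervisor table. In the good order, the recursive call is only ever made with the middle manager already bound; in the reversed order, the first subgoal is the same unrestricted problem we are already solving. The depth guard stands in for the infinite loop a real depth-first evaluator would enter; all names here are illustrative.

```python
SUPERVISOR = {"louis": "alyssa", "alyssa": "ben", "lem": "ben"}

def outranked_by(s, b):
    """Good clause order: check (supervisor ?s ?b) first, so the
    recursive call happens with the middle manager m already bound."""
    m = SUPERVISOR.get(s)
    return m == b or (m is not None and outranked_by(m, b))

def outranked_by_reversed(s, b, depth=0):
    """Reversed clause order: the first subgoal is (outranked-by
    ?m ?b) with ?m completely unrestricted, which is exactly the
    problem we are already trying to solve, so the recursion never
    bottoms out. The depth guard stands in for the infinite loop."""
    if depth > 50:
        raise RecursionError("re-solving the same subproblem forever")
    candidates = [m for m in SUPERVISOR
                  if outranked_by_reversed(m, b, depth + 1)]
    return any(SUPERVISOR.get(s) == m for m in candidates)
```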
And there are similar things having to do with the order in which you enter rules. The order in which it happens to look at rules in the database may vastly change the efficiency with which it gets0:51:59
out answers or, in fact, send it into an infinite loop for some orderings. And this whole thing has to do with the fact that you're0:52:08
checking these rules in some order. And some rules may lead to really long paths of implication. Others might not. And you don't know a priori which ones are good and which0:52:18
ones are bad. And there's a whole bunch of research having to do with that, mostly having to do with thinking about making parallel implementations of logic programming languages. And in some sense, what you'd like to do is check all rules0:52:29
in parallel and whichever ones get answers, you bubble them up. And if some go down infinite deductive chains, well, you just-- you know, memory is cheap and processors are cheap, and you0:52:38
just let them buzz for as long as you want. There's a deeper problem, though, in comparing this0:52:47
logic language to real logic. The example I just showed you, it went into an infinite loop maybe, but at least it didn't give the wrong answer.0:52:58
There's an actual deeper problem when we start comparing, seriously comparing this logic language with real0:53:07
classical logic. So let's sort of review real classical logic. All humans are mortal.0:53:22
That's pretty classical logic. Then maybe we'll continue in the very best classical tradition. We'll say all--0:53:31
let's make it really classical. All Greeks are human, which has the syllogism that0:53:41
Socrates is a Greek. And then what do you write here? I think three dots, classical logic.0:53:51
Therefore, then the syllogism, Socrates is mortal.0:54:01
So there's some real honest classical logic. Let's compare that with our classical logic database.0:54:12
So here's a classical logic database. Socrates is a Greek. Plato is a Greek. Zeus is a Greek, and Zeus is a god.0:54:24
And all humans are mortal. To show that something is mortal, it's enough to show that it's human.0:54:34
All humans are fallible. And all Greeks are humans is not quite right. This says that all Greeks who are not gods are human.0:54:45
So to show something's human, it's enough to show it's a Greek and not a god. And the address of any Greek god is Mount Olympus.0:54:54
So there's a little classical logic database. And indeed, that would work fairly well. If we type that in and say is Socrates mortal or is Socrates0:55:05
fallible? It'll say yes. Is Plato mortal and fallible? It'll say yes. If we say is Zeus mortal? It won't find anything.0:55:14
And it'll work perfectly well. However, suppose we want to extend this. Let's define what it means for someone to be a perfect being.0:55:25
Let's say rule: a perfect being.0:55:34
And I think this is right. If you're up on your medieval scholastic philosophy, I believe that perfect beings are ones who were neither mortal nor fallible.0:55:44
AND NOT mortal x, NOT fallible x.0:55:59
So we'll define this system to teach it what a perfect being is. And now what we're going to do is ask for the address of0:56:09
all the perfect beings. AND the address of x is y and x is perfect.0:56:23
And so what we're generating here is the world's most exclusive mailing list. To get the address of all the perfect0:56:32
things, we might have typed this in. Or we might type in this. We'll say AND perfect of x and the address of x is y.0:56:52
Well, suppose we type all that in and we try this query. This query is going to give us an answer. This query will say, yeah, Mount Olympus.0:57:04
This query, in fact, is going to give us nothing. It will say no addresses of perfect beings. Now, why is that? Why is there a difference?0:57:14
This is not an infinite loop question. This is a different answer question. The reason is that if you remember the implementation of NOT, NOT acted as a filter.0:57:25
NOT said I'm going to take some possible dictionaries, some possible frames, some possible answers, and filter out the ones that happened to satisfy some condition, and0:57:35
that's how I implement NOT. If you think about what's going on here, I'll build this query box where the output of an address piece gets fed into0:57:46
a perfect piece. What will happen is the address piece will set up some things of everyone whose address I know.0:57:55
Those will get filtered by the NOTs inside perfect here. So it will throw out the ones which happened to be either mortal or fallible.0:58:04
In the other order, what happens is I set this up, starting with an empty frame. The perfect in here doesn't find anything for the NOTs to filter, so nothing comes out here at all.0:58:18
And there's sort of nothing there that gets fed into the address thing. So here, I don't get an answer. And again, the reason for that is NOT isn't generating anything.0:58:27
NOT's only throwing out things. And if I never started up with anything, there's nothing for it to throw out. So out of this thing, I get the wrong answer.0:58:37
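The order-dependence described here can be sketched in a few lines of Python. The actual query system in the course is written in Scheme; this toy model, with made-up names like `q_address` and `q_perfect`, only illustrates why conjunct order changes the answer when NOT is a pure filter.

```python
# A toy model of the lecture's query system: each "query" maps a stream
# of frames (dicts of variable bindings) to a stream of extended frames.
# NOT is a filter: it can only discard frames, never create them.
greeks  = {"Socrates", "Plato", "Zeus"}
gods    = {"Zeus"}
address = {"Zeus": "Mount Olympus"}

def human(x):    return x in greeks and x not in gods
def mortal(x):   return human(x)   # the only rule for mortal
def fallible(x): return human(x)   # the only rule for fallible

def q_address(frames):
    # generator: extend each frame with every known (x, y) address pair
    for f in frames:
        for who, place in address.items():
            yield {**f, "x": who, "y": place}

def q_perfect(frames):
    # pure filter: keep frames whose x is neither mortal nor fallible
    for f in frames:
        if "x" in f and not mortal(f["x"]) and not fallible(f["x"]):
            yield f

# (address x y) AND (perfect x): address generates, the NOTs filter.
print(list(q_perfect(q_address([{}]))))  # Zeus / Mount Olympus survives

# (perfect x) AND (address x y): perfect sees only the empty frame,
# its NOTs have nothing to filter, so nothing ever comes out.
print(list(q_address(q_perfect([{}]))))  # []
```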
How can you fix that? Well, there are ways to fix that. So you might say, well, that's sort of stupid. Why are you just doing all your NOT stuff at the beginning? The right way to implement NOT is to realize that when you0:58:48
have conditions like NOT, you should generate all your answers first, and then pass each of these dictionaries along until, at the very end, you do the filtering.0:58:58
And there are implementations of logic languages that work like that that solve this particular problem. However, there's a more profound problem, which is0:59:10
which one of these is the right answer? Is it Mount Olympus or is it nothing? So you might say it's Mount Olympus, because after all,0:59:19
Zeus is in that database, and Zeus was neither mortal nor fallible.0:59:29
So you might say Zeus ought to satisfy NOT mortal Zeus or NOT0:59:43
fallible Zeus. But let's actually look at that database. Let's look at it. There's no way-- how does it know that Zeus is not fallible?0:59:54
There's nothing in there about that. What's in there is that humans are fallible. How does it know that Zeus is not mortal?1:00:04
There's nothing in there about that. It just said I don't have any rule, which-- the only way I can deduce something's mortal is if it's1:00:13
human, and that's all it really knows about mortal. And in fact, if you remember your classical mythology, you know that the Greek gods were not mortal but fallible.1:00:25
So the answer is not in the rules there. See, why does it deduce that?1:00:34
See, Socrates would certainly not have made this error of logic. What NOT means in this language is not NOT.1:00:43
It's not the NOT of logic. What NOT means in this language is "not deducible from things in the database," as opposed to "not true."1:00:55
That's a very big difference. Subtle, but big. So, in fact, this is perfectly happy to say NOT of anything that it doesn't know about.1:01:04
So if you ask it is it not true that Zeus likes chocolate ice cream? It will say sure, it's not true. Or anything else it doesn't know about. NOT means not deducible from the things you've told me.1:01:18
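A minimal sketch of that reading of NOT, in Python. The `deducible` helper here is hypothetical; a real system does unification against rules, not set membership.

```python
# Negation as failure: NOT p succeeds exactly when p cannot be deduced
# from the database -- which is not the same as p being false.
facts = {("human", "Socrates")}

def deducible(p):
    # our entire "inference engine": is the fact literally in the database?
    return p in facts

def NOT(p):
    return not deducible(p)  # "not deducible," rather than "not true"

print(NOT(("likes", "Zeus", "chocolate-ice-cream")))  # True: it simply doesn't know
print(NOT(("human", "Socrates")))                     # False: this one is deducible
```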
In a world where you're identifying not deducible with, in fact, not true, this is called the closed world assumption.1:01:36
The closed world assumption. Anything that I cannot deduce from what I know is not true, right?1:01:46
If I don't know anything about x, then x isn't true. That's very dangerous. From a logical point of view, first of all, it doesn't really make sense. Because if I don't know anything about x, I'm willing1:01:58
to say not x. But am I willing to say not not x? Well, sure, I don't know anything about that either maybe. So not not x is not necessarily the same as x and1:02:09
so on and so on and so on, so there's some sort of funny bias in there. So that's sort of funny. The second thing, if you start building up real reasoning1:02:22
programs based on this, think how dangerous that is. You're saying I know I'm in a position to deduce everything1:02:33
true that's relevant to this problem. I'm reasoning, and built into my reasoning mechanism is the assumption that anything that I don't know can't possibly be1:02:45
relevant to this problem, right? There are a lot of big organizations that work like that, right?1:02:54
Most corporate marketing divisions work like that. You know the consequences of that. So it's very dangerous to start really typing in these1:03:04
big logical implication systems and going on what they say, because they have this really limiting assumption built in. So you have to be very, very careful about that.1:03:14
And that's a deep problem. That's not a problem about we can make a little bit cleverer implementation and do the filters and organize the infinite loops to make them go away.1:03:23
It's a different kind of problem. It's a different semantics. So I think to wrap this up, it's fair to say that logic programming I think is a terrifically exciting idea,1:03:34
the idea that you can bridge this gap from the imperative to the declarative, that you can start talking about relations and really get tremendous power by going1:03:46
above the abstraction of what's my input and what's my output. But as a link to logic, the problem is it's a goal that I1:03:55
think has yet to be realized. And probably one of the very most interesting research questions going on now in languages is how do you1:04:06
somehow make a real logic language? And secondly, how do you bridge the gap from this world of logic and relations to the worlds of more traditional1:04:16
languages and somehow combine the power of both. OK, let's break. AUDIENCE: Couldn't you solve that last problem by having1:04:25
the extra rules that imply it? The problem here is you have the definition of something, but you don't have the definition of its opposite. If you include in the database something that says something1:04:35
implies mortal x, something else implies not mortal x, haven't you basically solved the problem? PROFESSOR: But the issue is do you put a finite1:04:45
number of those in? AUDIENCE: If things are specified always in pairs--1:04:54
PROFESSOR: But the problem is then what do you do about deduction? You can't specify NOTs.1:05:03
But the problem is, in a big system, it turns out that might not be a finite number of things.1:05:12
There are also sort of two issues. Partly it might not be finite. Partly it might be that's not what you want.1:05:21
So a good example would be suppose I want to do connectivity. I want a reason about connectivity. And I'm going to tell you there's four things: a and b1:05:32
and c and d. And I'll tell you a is connected to b and c's connected to d.1:05:43
And now I'll tell you is a connected to d? That's the question. That's an example where I would like something like the closed world assumption.1:05:54
That's a tiny toy, but a lot of times, I want to be able to say something like anything that I haven't told you, assume is not true.1:06:04
So it's not as simple as you only want to put in explicit NOTs all over the place. It's that sometimes it really isn't clear what you even want.1:06:14
That having to specify both everything and not everything is too precise, and then you get down into problems there. But there are a lot of approaches that explicitly put1:06:24
in NOTs and reason based on that. So it's a very good idea. It's just that then it starts becoming a little cumbersome in the very large problems where you'd like to use it.1:06:43
AUDIENCE: I'm not sure how directly related to the argument this is, but one of your points was that one of the dangers of the closed world assumption is you never really know all the things that are there.1:06:53
You never really know all the parts to it. Isn't that a major problem with any programming? I always write programs where I assume that I've got all the cases, and so I check for them all or whatever, and somewhere1:07:04
down the road, I find out that I didn't check for one of them. PROFESSOR: Well, sure, it's true. But the problem here is it's that assumption which is the1:07:14
thing that you're making if you believe you're identifying this with logic. So you're quite right. It's a situation you're never in. The problem is if you're starting to believe that what1:07:24
this is doing is logic and you look at the rules you write down and say what can I deduce from them, you have to be very careful to remember that NOT means something else.1:07:33
And it means something else based on an assumption which is probably not true. AUDIENCE: Do I understand you correctly that you cannot fix this problem without killing off all possibilities of1:07:44
inference through altering NOT? PROFESSOR: No, that's not quite right. There are other-- there are ways to do logic with real NOTs.1:07:56
There are actually ways to do that. But they're very inefficient as far as anybody knows. And they're much more--1:08:05
the, quote, inference in here is built into this unifier and this pattern matching unification algorithm. There are ways to automate real logical reasoning.1:08:16
But it's not based on that, and logic programming languages don't tend to do that because it's very inefficient as far as anybody knows.1:08:29
All right, thank you.0:00:00
Lecture 9A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING - "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:17
PROFESSOR: Well, up 'til now, I suppose, we've been learning about a lot of techniques for organizing big programs,0:00:26
symbolic manipulation a bit, some of the technology that you use for establishing languages, one in terms of0:00:36
another, which is used for organizing very large programs. In fact, the nicest programs I know look more like a pile of languages than like a decomposition of a problem0:00:47
into parts. Well, I suppose at this point, there are still, however, a few mysteries about how this sort of stuff works.0:00:56
And so what we'd like to do now is diverge from the plan of telling you how to organize big programs, and rather tell0:01:06
you something about the mechanisms by which these things can be made to work. The main reason for this is demystification, if you will,0:01:18
that we have a lot of mysteries left, like exactly how it is the case that a program is controlled, how a0:01:27
computer knows what the next thing to do is, or something like that. And what I'd like to do now is make that clear to you, that0:01:36
even if you've never played with a physical computer before, the mechanism is really very simple, and that you can understand it completely with no trouble.0:01:47
So I'd like to start by imagining that we-- well, the way we're going to do this, by the way, is we're going to take some very simple Lisp programs, very simple0:01:57
Lisp programs, and transform them into hardware. I'm not going to worry about some intermediate step of going through some existing computer machine language and0:02:07
then showing you how that computer works, because that's not as illuminating. So what I'm really going to show you is how a piece of0:02:16
machinery can be built to do a job that you have written down as a program. That program is, in fact, a description of a machine.0:02:25
We're going to start with a very simple program, proceed to show you some simple mechanisms, proceed to a few more complicated programs, and then later show you a not very0:02:36
complicated program, how the evaluator transforms into a piece of hardware. And of course at that point, you have made the universal transition and can execute any program imaginable with a0:02:47
piece of well-defined hardware. Well, let's start up now, give you a real concrete feeling for this sort of thing. Let's start with a very simple program.0:02:59
Here's Euclid's algorithm. It's actually a little bit more modern than Euclid's algorithm. Euclid's algorithm for computing the greatest common0:03:09
divisor of two numbers was invented 350 BC, I think. It's the oldest known algorithm.0:03:19
But here we're going to talk about GCD of A and B, the Greatest Common Divisor or two numbers, A and B. And the algorithm is extremely simple.0:03:29
If B is 0, then the result is going to be A. Otherwise, the0:03:38
result is the GCD of B and the remainder when A is divided by0:03:52
B. So what we have here is a very simple iterative process.0:04:02
This is a simple recursive procedure, a recursively defined procedure, which yields an iterative process. And the way it works is that at every step, it determines0:04:13
whether B is zero. And if B is 0, we've got the answer in A. Otherwise, we make another step where A is the old B, and B is the0:04:25
remainder of the old A divided by the old B. Very simple. Now this, I've already told you some of the mechanism by just saying it that way.0:04:34
I set it in time. I said there are certain steps, and that, in fact, one of the things you can see here is that one of the reasons why this is iterative is nothing is needed of the last step to0:04:46
get the answer. All of the information that's needed to run this algorithm is in A and B. It has two well-defined state variables.0:05:00
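In a modern language, the iterative character shows up directly: all the state lives in the two variables a and b, just like the two registers of the machine about to be built. A Python rendering of the algorithm (the lecture itself uses Lisp):

```python
def gcd(a, b):
    """Euclid's algorithm as an iterative process."""
    while b != 0:        # the "is b = 0?" test in the controller
        a, b = b, a % b  # t <- remainder(a, b); a <- b; b <- t
    return a

print(gcd(30, 42))  # 6
```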
So I'm going to define a machine for you that can compute GCDs. Now let's see. Every computer that's ever been made that's a0:05:10
single-process computer, as opposed to a multiprocessor of some sort, is made according to the same plan. The plan is the computer has two parts, a part called the0:05:21
datapaths, and a part called the controller. The datapaths correspond to a calculator that you might have. It contains certain registers that remember0:05:31
things, and you've all used calculators. It has some buttons on it and some lights. And so by pushing the various buttons, you can cause operations to happen inside there among the registers, and0:05:42
some of the results to be displayed. That's completely mechanical. You could imagine that box has no intelligence in it. Now it might be very impressive that it can produce0:05:52
the sine of a number, but that at least is apparently possibly mechanical. At least, I could open that up in the same way I'm about to open GCD.0:06:02
So this may have a whole computer inside of it, but that's not interesting. Addition is certainly simple. That can be done without any further mechanism. Now also, if we were to look at the other half, the0:06:15
controller, that's a part that's dumb, too. It pushes the buttons. It pushes them according to the sequence, which is written down on a piece of paper, and observes the lights.0:06:26
And every so often, it comes to a place in a sequence that says, if light A is on, do this sequence. Otherwise, do that sequence. And thereby, there's no complexity there either.0:06:37
Well, let's just draw that and see what we feel about that. So for computing GCDs, what I want you to think about is0:06:48
that there are these registers. A register is a place where I store a number, in this case. And this one's called a. And then there's another one for storing b.0:07:03
Now we have to see what things we can do with these registers, and they're not entirely obvious what you can do with them. Well, we have to see what things we need to do with them. We're looking at the problem we're trying to solve.0:07:14
One of the important things for designing a computer, which I think most designers don't do, is you study the problem you want to solve and then use what you learn from0:07:23
studying the problem you want to solve to put in the mechanisms needed to solve it in the computer you're building, no more no less.0:07:32
Now it may be that the problem you're trying to solve is everybody's problem, in which case you have to build in a universal interpreter of some language. But you shouldn't put any more in than required to build the0:07:42
universal interpreter of some language. We'll worry about that in a second. OK, going back to here, let's see. What do we have to be able to do?0:07:51
Well, somehow, we have to be able to get B into A. We have to be able to get the old value of B into the value of A. So we have to have some path by which stuff can flow,0:08:03
whatever this information is, from b to a. I'm going to draw that with by an arrow saying that it is possible to move the contents of b into a, replacing the0:08:13
value of a. And there's a little button here which you push which allows that to happen. That's what the little x is here.0:08:23
Now it's also the case that I have to be able to compute the remainder of a and b. Now that may be a complicated mess. On the other hand, I'm going to make it a small box. If we have to, we may open up that box and look inside and0:08:34
see what it is. So here, I'm going to have a little box, which I'm going to draw this way, which we'll call the remainder.0:08:46
And it's going to take in a. That's going to take in b. And it's going to put out something, the remainder of a0:08:59
divided by b. Another thing we have to see here is that we have to be able to test whether b is equal to 0.0:09:08
Well, that means somebody's got to be looking at-- a thing that's looking at the value of b. I have a light bulb here which lights up if b equals 0.0:09:21
That's its job. And finally, I suppose, because of the fact that we want the new value of a to be the old value of b, and0:09:30
simultaneously the new value of b to be something I've done with a, and if I plan to make my machine such that everything happens one at a time, one motion at a time,0:09:41
and I can't put two numbers in a register, then I have to have another place to put one while I'm interchanging. OK?0:09:50
I can't interchange the two things in my hands, unless I either put two in one hand and then pull it back the other way, or unless I put one down, pick it up, and put the other one, like that, unless I'm a juggler, which I'm not, as you0:10:02
can see, in which case I have a possibility of timing errors. In fact, much of the type of computer design people do0:10:11
involves timing errors, or potential timing errors, which I don't much like. So for that reason, I have to have a place to put the second0:10:22
one of them down. So I have a place called t, which is a register just for temporary, t, with a button on it. And then I'll take the result of that, since I have to take0:10:32
that and put into b, over here, we'll take the result of that and go like this, and a button here.0:10:42
So that's the datapaths of a GCD machine. Now what's the controller? Controller's a very simple thing, too.0:10:52
The machine has a state. The way I like to visualize that is that I've got a maze. And the maze has a bunch of places0:11:01
connected by directed arrows. And what I have is a marble, which represents the state of the controller.0:11:10
The marble rolls around in the maze. Of course, this analogy breaks down for energy reasons. I sometimes have to pump the marble up to the top, because0:11:19
it's going to otherwise be a perpetual motion machine. But not worrying about that, this is not a physical analogy. This marble rolls around. And every time it rolls around certain bumpers, like in a0:11:30
pinball machine, it pushes one of these buttons. And every so often, it comes to a place, which is a division, where it has to make a choice.0:11:40
And there's a flap, which is controlled by this. So that's a really mechanical way of thinking about it. Of course, controllers these days, are not built that way0:11:50
in real computers. They're built with a little bit of ROM and a state register. But there was a time, like the DEC PDP-6, where that's how0:11:59
you built the controller of a machine. There was a bit that ran around the delay line, and it triggered things as it went by.0:12:08
And it would come back to the beginning and get fed round again. And of course, there were all sorts of great bugs you could have like two bits going around, two marbles.0:12:17
And then the machine has lost its marbles. That happens, too. Oh, well. So anyway, for this machine, what I have to do is the following. I'm going to start my maze here.0:12:30
And the first thing I've got to do, in a notation which many of you are familiar with, is b equal to zero, a test.0:12:41
And there's a possibility, either yes, in which case I'm done. Otherwise, if no, then I'm going have to0:12:53
roll over some bumpers. I'm going to do it in the following order. I want to do this interchange game.0:13:04
Now first, since I need both a and b, but then the first-- and this is not necessary-- I want to collect this. This is the thing that's going to go into b.0:13:13
So I'm going to say, take this, which depends upon both a and b, and put the remainder into here. So I'm going to push this button first. Then, I'm going0:13:22
to transfer b to a, push that button, and then I transfer the temporary into b, push that button.0:13:32
So a very sequential machine, it's very inefficient. But that's fine right now. We're going to name the buttons, t gets remainder.0:13:46
a gets b. And b gets t.0:13:55
And then I'm going to go around here and it's to go back to start. And if you look, what are we seeing here? We're seeing the various--0:14:05
what I really have is some sort of mechanical connection, where t gets r controls this thing.0:14:16
And I have here that a gets b controls this fellow over here, and this fellow over here.0:14:28
Boy, that's absolutely pessimal, the inverse of optimal. Every line crosses every other line the way I drew it.0:14:38
I suppose this goes here, b gets t. Now I'd like to run this machine.0:14:48
But before I run the machine, I want to write down a description of this controller, just so you can see that these things, of course, as usual, can be written down in some nice language, so that we don't have to always draw these diagrams. One of the problems0:14:59
with diagrams is that they take up a lot of space. And for a machine this small, it takes two blackboards. For a machine that's the evaluator machine, I have trouble putting it into this room, even though0:15:08
it isn't very big. So I'm going to make a little language for this that's just a description of that, saying define a0:15:17
machine we'll call GCD. Of course, once we have something like this, we have a simulator for it.0:15:27
And the reason why we want to build a language in this form, is because all of a sudden we can manipulate these expressions that I'm writing down. And then of course I can write things that can algebraically manipulate these things, simulate them, all that sort0:15:38
of things that I might want to do, perhaps transform them as a layout, who knows. Once I have a nice representation of registers,0:15:48
it has certain registers, which we can call A, B, and T. And there's a controller.0:16:02
Actually, a better language, which would be more explicit, would be one which named every button also and said what it did. Like, this button causes the contents of T to go to the0:16:13
contents of B. Well I don't want to do that, because it's actually harder to read to do that, and it takes up more space. So I'm going to have that in the instructions written in the controller.0:16:23
It's going to be implicit what the operations are. They can be deduced by reading these and collecting together all the different things that can be done.0:16:33
Well, let's just look at what these things are. There's a little loop that we go around which says branch,0:16:42
this is the representation of the little flap that decides which way you go here, if 0 fetch of B, the contents of B,0:16:58
and if the contents of B is 0, then go to a place called done. Now, one thing you're seeing here, this looks very much like a traditional computer language.0:17:08
And what you're seeing here is things like labels that represent places in a sequence written down as a sequence.0:17:17
The reason why they're needed is because over here, I've written something with loops. But if I'm writing English text, or something like that,0:17:26
it's hard to refer to a place. I don't have arrows. Arrows are represented by giving names to the places where the arrows terminate, and then referring to them by0:17:35
those names. Now this is just an encoding. There's nothing magical about things like that. Next thing we're going to do is we're going to say, how do0:17:45
we do T gets R? Oh, that's easy enough, assign. We assign to T the remainder.0:17:56
Assign is the name of the button. That's the button-pusher. Assign to T the remainder, and here's the representation of0:18:05
the operation, when we divide the fetch of A by the fetch of0:18:17
B. And we're also going to assign to A the fetch of B, assign to0:18:35
B the result of getting the contents of T. And now I have0:18:50
to refer to the beginning here. I see, why don't I call that loop like I have here?0:19:05
So that's that reference to that arrow. And when we're done, we're done. We go to here, which is the end of the thing.0:19:14
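That written controller really can be executed as it stands. Here is a minimal Python sketch of such a machine: the controller is a list of labels and instructions, the registers are a dictionary, and branch and goto move the "marble." The representation below is my own illustration, not the course's register-machine language.

```python
# A minimal register-machine simulator: strings are labels; tuples are
# instructions that push the buttons on the datapaths (the registers).
def run(controller, regs):
    labels = {ins: i for i, ins in enumerate(controller) if isinstance(ins, str)}
    pc = 0
    while pc < len(controller):
        ins = controller[pc]
        if isinstance(ins, str):           # a label: just fall through
            pc += 1
        elif ins[0] == "branch":           # the flap: jump if test is true
            _, test, target = ins
            pc = labels[target] if test(regs) else pc + 1
        elif ins[0] == "assign":           # push a register's button
            _, reg, expr = ins
            regs[reg] = expr(regs)
            pc += 1
        elif ins[0] == "goto":             # unconditional jump
            pc = labels[ins[1]]
    return regs

gcd_controller = [
    "loop",
    ("branch", lambda r: r["b"] == 0, "done"),
    ("assign", "t", lambda r: r["a"] % r["b"]),  # t gets remainder(a, b)
    ("assign", "a", lambda r: r["b"]),           # a gets b
    ("assign", "b", lambda r: r["t"]),           # b gets t
    ("goto", "loop"),
    "done",
]

print(run(gcd_controller, {"a": 30, "b": 42, "t": 0})["a"])  # 6
```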
So here's just a written representation of this fragment of machinery that we've drawn here. Now the next thing I'd like to do is run this.0:19:25
I want us to feel it running. If you've never done this before, you've got to do it once. So let's take a particular problem. Suppose we want to compute the GCD of a equals 300:19:38
and b equals 42. I have no idea what that is right now. But a is 30 and b is 42.0:19:50
So that's how I start this thing up. Well, what's the first thing I do? I say is B equal to 0, no. Then assign to T the remainder of the fetch of A and the0:20:01
fetch of B. Well the remainder of 30 when divided by 42 is itself 30.0:20:11
Push that button. Now the marble has rolled to here. A gets B. That pushes this button.0:20:21
So 42 moves into here. B gets T. Push that button. The 30 goes here.0:20:32
Let me just interchange them. Now let's see, go back to the beginning. B 0, no. T gets the remainder.0:20:43
I suppose the remainder when dividing 42 by 30 is 12. I push that one. Next thing I do is allow the 30 to go to here, push this0:20:54
one, allow the 12 to go to here. Go around this thing. Is that done? No. How about--0:21:05
so now I have to find out the remainder of 30 divided by 12. And I believe that's 6. So 6 goes here on this button push.0:21:15
Then the next thing I push is this one, which the 12 goes into here. Then I push this button.0:21:25
The 6 gets into here. Is 6 equal to 0? No. OK.0:21:34
So then at that point, the next thing to do is divide it. Ooh, this has got a remainder of 0. Looks like we're almost done. Move the 6 over here next.0:21:47
0 over here. Is b 0? Yes. B is 0, therefore the answer is in A. The answer is 6.0:21:56
And indeed that's right, because if we look at the original problem, what we have is 30 is 2 times 3 times 5,0:22:07
and 42 is 2 times 3 times 7. So the greatest common divisor is 2 times 3, which is 6.0:22:18
Now normally, we write one other little line here, just to make it a little bit clearer, which is that we leave in a connection saying that this light is the guy0:22:29
that that flap looks at. Of course, any real machine has a lot more complicated0:22:38
things in it than what I've just shown you. Let's look for a second at the first still store.0:22:47
Wow. Well you see, for example, one thing we might want to do is worry about the operations that are of IO form.0:22:56
And we may have to collect something from the outside. So a state machine that we might have, the controller may0:23:06
have to, for example, get a value from something and put it into register a, to load it up. I also have to load up register b with another value.0:23:17
And then later, when I'm done, I might want to print the answer out. And of course, that might be either simple or complicated.0:23:26
I'm writing, assuming print is very simple, and read is very simple. But in fact, in the real world, those are very complicated operations, usually much, much larger and more complicated than the thing you're doing as your0:23:37
problem you're trying to solve. On the other hand, I can remember a time, using an IBM 7090 computer, where0:23:49
things like read and write of a single object, a single number, is a primitive operation of the IO0:23:58
controller. OK? And so we have that kind of thing in there. And in such a machine, well, what are we really doing?0:24:08
We're just saying that there's a source over here called "read," which is an operation which always has a value. We have to think about this as always having a value which0:24:17
can be gated into either register a or b. And print is some sort of thing which when you gate it appropriately, when you push the button on it, will cause a0:24:27
print of the value that's currently in register a. Nothing very exciting. So that's one sort of thing you might want to have. But0:24:36
there are also other things that are a little bit worrisome. Like I've used here some complicated mechanisms. What you see here is remainder. What is that? That may not be so obvious how to compute.0:24:46
It may be something which when you open it up, you get a whole machine. OK? In fact, that's true. For example, if I write down the program for remainder, the0:24:59
simplest program for it is by repeated subtraction. Because of course, division can be done by repeated subtraction of numbers, of integers.0:25:09
So the remainder of N divided by D is nothing more than if N0:25:30
is less than D, then the result is N. Otherwise, it's the remainder of N minus D,0:25:48
when divided by D. Gee, this looks just like the GCD program. Of course, it's not a very nice way to do remainders.0:25:59
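As a sketch, here is that repeated-subtraction remainder written iteratively in Python; like GCD, the recursive definition yields an iterative process:

```python
def remainder(n, d):
    # "If n is less than d, the result is n. Otherwise, it's the
    # remainder of (n - d) with respect to d."  Assumes d > 0.
    while n >= d:
        n = n - d
    return n

print(remainder(42, 30))  # 12
print(remainder(30, 42))  # 30
```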
You'd really want to use something like binary notation and shift and things like that in a practical computer. But the point of that is that if I open this thing up, I0:26:09
might find inside of it a computer. Oh, we know how to do that. We just made one. And it could be another thing just like this. On the other hand, we might want to make a more efficient0:26:20
or better-structured machine, or maybe make use of some of the registers more than once, or some horrible mess like that that hardware designers like to do, and for very good reasons.0:26:29
So for example, here's a machine that you see, which you're not supposed to be able to read. It's a little bit complicated. But what it is is the integration of the remainder0:26:41
into the GCD machine. And it takes, in fact, no more registers. There are three registers in the datapaths. OK? But now there's a subtractor.0:26:51
There are two things that are tested. Is b equal to 0, or is t less than b? And then the controller, which you see over here, is not much0:27:00
more complicated. But it has two loops in it, one of which is the main one for doing the GCD, and one of which is the subtraction loop0:27:10
for doing the remainder sub-operation. And there are ways, of course, of, if you think about it, taking the remainder program.0:27:19
If I take remainder, as you see over there, as a lambda expression, substitute it in for remainder over here in the GCD program, then do some simplification by substituting0:27:30
a and b for remainder in there, then I can unwind this loop. And I can get this piece of machinery by basically, a0:27:41
little bit of algebraic simplification on the lambda expressions. So I suppose you've seen your first very0:27:50
simple machines now. Are there any questions?0:28:02
Good. This looks easy, doesn't it? Thank you. I suppose, take a break.0:28:11
[MUSIC PLAYING - "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:28:47
PROFESSOR: Well, let's see. Now you know how to make an iterative procedure, or a procedure that yields an iterative process, turn into a machine.0:28:57
I suppose the next thing we want to do is worry about things that reveal recursive processes. So let's play with a simple factorial procedure.0:29:10
We define factorial of N to be if N is 1, the result is 1,0:29:24
using 1 right now to decrease the amount of work I have to do to simulate it, else it's N times factorial of N minus 1.0:29:42
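In Python rather than the lecture's Lisp, the procedure under discussion looks like this; the comment marks the deferred multiply that distinguishes it from the iterative GCD:

```python
def factorial(n):
    # Base case is n = 1, as in the lecture, to keep the simulation short.
    if n == 1:
        return 1
    # After the recursive call returns, there is still work to do:
    # multiply by n. This pending operation is why the outer machine
    # must survive the inner one, and why a stack will be needed.
    return n * factorial(n - 1)
```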
And what's different with this program, as you know, is that after I've computed factorial of N minus 1 here, I have to0:29:51
do something to the result. I have to multiply it by N. So the only way I can visualize what this machine is0:30:00
doing, because of the fact-- think of it this way, that I have a machine out here which somehow needs a factorial machine in order to compute its answer.0:30:09
But this machine, the outer machine, has to exist before and after the factorial machine, which is inside. Whereas in the iterative case, the outer machine doesn't need0:30:20
to exist after the inner machine is running, because you never need to go back to the outer machine to do anything. So here we have a problem where we have a machine which0:30:31
has the same machine inside of it, an infinitely large machine.0:30:40
And it's got other things inside of it, like a multiplier, which takes some inputs, and there's a minus 1 box, and things like that.0:30:50
You can imagine that's what it looks like. But the important thing is that here I have something that happens before and after, in the outer machine, the0:31:00
execution of the inner machine. So this machine has to have a life. It has to exist on both sides, in time, of this machine.0:31:13
So somehow, I have to have a place to store the things that this thing needs to run. Infinite objects don't exist in the real world.0:31:24
What we have to do is arrange an illusion that we have an infinite object, we have an infinite amount of hardware somewhere. Now of course, illusion's all that really matters.0:31:36
If we can arrange that every time you look at some infinite object, the part of it that you look at is there, then it's as infinite as you need it to be.0:31:47
And of course, one of the things we might want to do, just look at this thing over here, is the organization that we've had so far involves having a part of the machine,0:32:01
which is the controller, which sits right over here, which is perfectly finite and very simple. We have some datapaths, which consist of0:32:11
registers and operators. And what I propose to do here is decompose the machine into two parts, such that there is a part which is fundamentally finite, and some part where a certain amount of infinite0:32:22
stuff can be kept. On the other hand, this is very simple and really isn't infinite, it's just very large. But it's so simple that it can be cheaply reproduced in0:32:31
such large amounts (we call it memory) that we can make a structure called a stack out of it, which will allow us to,0:32:40
in fact, simulate the existence of an infinite machine which is made out of a recursive nest of many machines. And the way it's going to work is that we're going to store0:32:51
in this place called the stack the information required after the inner machine runs to resume the operation of the0:33:00
outer machine. So it will remember the important things about the life of the outer machine that will be needed for this0:33:09
computation. Since, of course, these machines are nested in a recursive manner, then in fact the stack will only be0:33:20
accessed in a manner which is the last thing that goes in is the first thing that comes out.0:33:29
So we'll only need to access some little part of this stack memory. OK, well, let's do it. I'm going to build you a datapath now, and I'm going to0:33:38
write the controller. And then we're going to execute this to see how you do it. So the factorial machine isn't so bad.0:33:47
It's going to have a register called the value, where the answer is going to be stored, and a register called N,0:33:59
which is where the number I'm taking factorial of will be stored. And it will be necessary in some instances to connect VAL0:34:09
to N. In fact, one nice case of this is if I just said over here, N, because that would be right for N equal 1.0:34:19
And I could just move the answer over there if that's important. I'm not worried about that right now. And there are things I have to be able to do.0:34:29
Like I have to be able to, as we see here, multiply N by something in VAL, because VAL is the result of computing factorial.0:34:38
And I have to put the result back into VAL. So here we can see that the result of computing a factorial is N times the result0:34:48
of computing a factorial. VAL will be the representation of the answer of the inner factorial. And so I'm going to have to have a multiplier here, which0:35:02
is going to sample the value of N and the value of VAL and put the result back into VAL like that.0:35:17
I'm also going to have to be able to see if N is 1. So I need a light bulb.0:35:28
And I suppose the other thing I'm going to need to have is a way of decrementing N. So I'm going to have a decrementer,0:35:38
which takes N and is going to put back the result into N. That's pretty much what I need in my machine.0:35:49
Now, there's a little bit else I need. It's a little bit more complicated, because I'm also going to need a way to store, to save away, the things that0:35:58
are going to be needed for resuming the computation of a factorial after I've done a sub-factorial. What's that?0:36:07
One thing I need is N. So I'm going to build here a thing called a stack. The stack is a bunch of stuff that I'm going to write in0:36:24
sequentially. I don't know how long it is. The longer it is, the better my illusion of infinity. And I'm going to have to have a way of getting stuff out of0:36:36
N and into the stack and vice versa. So I'm going to need a connection like this, which is two-way, whereby I can save the value of N and then0:36:52
restore it some other time through that connection. This is the stack. I also need a way of remembering where I was in the0:37:02
computation of factorial in the outer program. Now in the case of this machine, it0:37:11
isn't very much a problem. Factorial always returns, has to go back to the place where we multiply by N, except for the last time, when it has to0:37:21
return to whatever needs the factorial or go to done or stop. However, in general, I'm going to have to remember where I have been, because I might have computed factorial from0:37:30
somewhere else. I have to go back to that place and continue there. So I'm going to have to have some way of taking the place where the marble is in the finite state controller, the0:37:41
state of the controller, and storing that in the stack as well. And I'm going to have to have ways of restoring that back to the state of the-- the marble.0:37:51
So I have to have something that moves the marble to the right place. Well, we're going to have a place which is the marble now. And it's called the continue register, called continue,0:38:09
which is the place to put the marble next time I go to continue. That's what that's for. And so there's got to be some path from that into the controller.0:38:22
I also have to have some way of saving that on the stack. And I have to have some way of setting that up to have0:38:32
various constants, a certain fixed number of constants. And that's very easy to arrange. So let's have some constants here. We'll call this one after-fact.0:38:47
And that's a constant which we'll get into the continue register, and also another one called fact-done.0:39:05
So this is the machine I want to build. That's its datapaths, at least. And it mixes a little with the controller here, because of the fact that I have to remember where I was and restore0:39:15
myself to that place. But let's write the program now which represents the controller. I'm not going to write the define machine thing and the register list, because that's not very interesting.0:39:24
I'm just going to write down the sequence of instructions that constitute the controller. So we have assign, to set up, continue to done.0:39:44
We have a loop which says branch if equal 1 fetch N, if0:40:01
N is 1, then go to the base step of the induction, the simple case. Otherwise, I have to remember the things that are necessary0:40:10
to perform a sub-factorial. I'm going to go over here, and I have to perform a sub-factorial. So I have to remember what's needed after I will0:40:21
be done with that. See, I'm about to do something terrible. I'm about to change the value of N. But this guy has to know the old value of N. But in order to make the0:40:32
sub-factorial work, I have to change the value of N. So I have to remember the old value. And I also have to remember where I've been. So I save up continue.0:40:47
And this is an instruction that says, put something in the stack. Save the contents of the continuation register, which0:40:56
in this case is done, because later I'm going to change that, too, because I need to go back to after-fact, as well. We'll see that.0:41:05
We save N, because I'm going to need that for later. Assign to N the decrement of fetch N. Assign continue,0:41:31
we're going to look at this now, to after, we'll call it. That's a good name for this, a little bit easier and shorter, and fits in here.0:41:52
Now look what I'm doing here. I'm saying, if N is 1, I'm done. I'm going to have to just get the answer.0:42:02
Otherwise, I'm going to save the continuation, save N, make N one less than N, remember I'm going to come back to someplace else, and go back and start doing another factorial.0:42:13
However, I've got a different machine in me now. Continue is something else, and0:42:22
N is N minus 1. Now after I'm done with that, I can go there. I will restore the old value of N, which is the opposite of0:42:34
this save over here. I will restore the continuation.0:42:49
I will then go to here. I will assign to the VAL register the product0:43:03
of N and fetch VAL.0:43:13
VAL fetch product assign. And then I will be done. I will have my answer to the sub-factorial in VAL.0:43:26
At that point, I'm going to return by going to the place where the continuation is pointing. That says, go to fetch continue.0:43:45
And then I have finally a base step, which is the immediate answer. Assign to VAL fetch N, and go to fetch continue.0:44:12
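One way to check that this controller actually computes factorials is to simulate it. Here is a Python sketch with explicit N, VAL, and continue registers, a list playing the role of the stack, and the label names from the board; the while-loop dispatch is my own rendering, not the lecture's notation:

```python
def fact_machine(n_init):
    # Registers
    n = n_init
    val = None
    continue_ = 'done'       # assign continue done
    stack = []               # the stack: the illusion of infinity
    pc = 'loop'              # the marble in the finite-state controller
    while True:
        if pc == 'loop':
            if n == 1:                       # branch if n = 1
                pc = 'base'
            else:
                stack.append(continue_)      # save continue
                stack.append(n)              # save n
                n = n - 1                    # assign n (dec (fetch n))
                continue_ = 'after'          # assign continue after
                pc = 'loop'
        elif pc == 'after':
            n = stack.pop()                  # restore n
            continue_ = stack.pop()          # restore continue
            val = n * val                    # assign val (* (fetch n) (fetch val))
            pc = continue_                   # go to fetch continue
        elif pc == 'base':
            val = n                          # assign val (fetch n)
            pc = continue_                   # go to fetch continue
        elif pc == 'done':
            assert stack == []               # stack back in its initial state
            return val
```

Tracing `fact_machine(3)` by hand reproduces the blackboard execution: done and 3 go on the stack, then after and 2, then the saves unwind as the multiplies happen, leaving 6 in VAL and an empty stack.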
And then I'm done. Now let's see how this executes on a very simple case, because then we'll see the use of this stack to do0:44:25
the job we need. This is statically what it's doing, but we have to look at this dynamically. So let's see. First thing we do is continue gets done.0:44:36
The way that happened is I pushed this. Let's call that done the way I have it.0:44:46
I push that button. Done goes into there. Now I also have to set this thing up to have an initial value. Let's consider a factorial of three, a simple case.0:45:00
And we're going to start out with our stack growing over here. Stacks have their own little internal state saying where they are, where the next place I'm going to write is.0:45:12
So now we say, is N 1? The answer is no. So now I'm going to save continue, bang. Now that done goes in here.0:45:22
And this moves to here, the next place I'm going to write. Save N 3. OK? Assign to N the decrement of N. That means0:45:34
I've pushed this button. This becomes 2. Assign to continue aft. So I've pushed that button.0:45:43
Aft goes in here. OK, now go to loop, bang, so up to here.0:45:54
Is N 1? No. So I have to save continue. What's continue? Continue is aft. Push this button. So this moves to here.0:46:08
I have to save N. N is over here. I got to 2. Push that button. So a 2 gets written there. And then this thing moves down here.0:46:20
OK, save N. Assign N to the decrement of N. This becomes a 1.0:46:29
Assign continue to aft. A-F-T gets written there again. Go to loop. Is N equal to 1? Oh, yes, the answer is 1.0:46:41
OK, go to base step. Assign to VAL fetch of N. Bang, 1 gets put in there.0:46:51
Go to fetch continue. So we look in continue. Basically, I'm pushing a button over here that goes to the controller. The continue becomes aft, and all of a sudden, the program's running here.0:47:02
I now have to restore the outer version of factorial. So we go here. We say, restore N. So restore N means take the contents0:47:12
that's here. Push this button, and it goes into here, 2, and the pointer moves up.0:47:22
Restore continue, pretty easy. Go push this button. And then aft gets written in here again.0:47:31
That means this thing moves up. I've gotten rid of something else on my stack.0:47:42
Right, then I go to here, which says, assign to VAL the product of N and VAL. So I push this button over here, bang. 2 times 1 gives me a 2, which gets written there.0:47:55
Go to fetch continue. Continue is aft. I go to aft. Aft says restore N. Do your restore N, means I take the0:48:06
value over here, which is 3, push this up to here, and move it into here, N. Now it's pushing that button.0:48:17
The next thing I do is restore continue. Continue is now going to become done. So this moves up here when I push this button.0:48:27
Done may or may not be there anymore, I'm not interested, but it certainly is here. Next thing I do is assign to VAL the product of the fetch0:48:39
of N and the fetch of VAL. That's pushing this button over here, bang. 2 times 3 is 6. So I get a 6 over here.0:48:52
And go to fetch continue, whoops, I go to done, and I'm done. And my answer is 6, as you can see in the VAL register. And in fact, the stack is in the state it0:49:02
originally was in. Now there's a bit of discipline in using these things like stacks that we have to be careful of.0:49:13
And we'll see that in the next segment. But first I want to ask if there are any questions for this.0:49:28
Are there any questions? Yes, Ron. AUDIENCE: What happens when you roll off the end of the stack with-- PROFESSOR: What do you mean, roll off of? AUDIENCE: Well, the largest number-- a larger starting point of N requires more memory, correct?0:49:38
PROFESSOR: Oh, yes. Well, I need to have a long enough stack. You say, what if I violate my illusion? AUDIENCE: Yes. PROFESSOR: Well, then the magic doesn't work.0:49:48
The truth of the matter is that every machine is finite. And for a procedure like this, there's a limit to the number of sub-factorials I could have.0:49:59
Remember when we were doing the y-operator a while ago, we pointed out that there was a sequence of exponentiation procedures, each of which was a little better than the previous one.0:50:08
Well, we're now seeing how we implement that mathematical idea. The limiting process is only as good as how far you take the limit.0:50:17
If you think about it, what am I using here? I'm using about two pieces of memory for every recursion of0:50:26
this process. If we try to compute factorial of 10,000, that's not a lot of memory. On the other hand, it's an awfully big number.0:50:36
So the question is, is that a viable thing in this case? But it really turns out not to be a terrible limit, because memory is el cheapo, and people are pretty expensive.0:50:48
OK, thank you, let's take a break. [MUSIC PLAYING - "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:51:55
PROFESSOR: Well, let's see. What I've shown you now is how to do a simple iterative process and a simple recursive process.0:52:05
I just want to summarize the design of simple machines for specific applications by showing you a little bit more complicated design, that of a thing that does doubly0:52:15
recursive Fibonacci, because it will indicate to us, and we'll understand, a bit about the conventions required for making stacks operate correctly.0:52:26
So let's see. I'm just going to write down, first of all, the program I'm going to translate. I need a Fibonacci procedure, it's very simple, which says,0:52:41
if N is less than 2, the result is N, otherwise it's0:52:50
the sum of Fib of N minus 1 and Fib of N minus 2.0:53:07
That's the plan I have here. And we're just going to write down the controller for such a machine. We're going to assume that there are registers, N, which0:53:16
holds the number we're taking Fibonacci of, VAL, which is where the answer is going to get put, and continue, which is the thing that's linked to the controller, like before.0:53:26
But I'm not going to draw another physical datapath, because it's pretty much the same as the last one you've seen. And of course, one of the most amazing things about0:53:37
computation is that after a while, you build up a few more features and a few more features, and all of a sudden, you've got everything you need. So it's remarkable that it just gets there so fast. I0:53:48
don't need much more to make a universal computer. But in any case, let's look at the controller for the Fibonacci thing. First thing I want to do is start the thing up by assign0:54:01
to continue a place called done, called Fib-done here.0:54:13
So that means that somewhere over here, I'm going to have a label, Fib-done, which is the place where I go when I want the machine to stop.0:54:24
That's what that is. And I'm going to make up a loop. It's a place I'm going to go to in order to start up0:54:33
computing a Fib. Whatever is in N at this point, Fibonacci will be computed of, and we will return to the place specified by continue.0:54:46
So what you're going to see here at this place, what I want here is the contract that says, I'm going to write this with a comment syntax, the contract is N contains arg,0:55:00
the argument. Continue is the recipient.0:55:12
And that's where it is. At this point, if I ever go to this place, I'm expecting this to be true, the argument for computing the Fibonacci.0:55:24
Now the next thing I want to do is to branch. And if N is less than 2--0:55:34
by the way, I'm using what looks like Lisp syntax. This is not Lisp. This does not run. What I'm writing here does not run as a simple Lisp program.0:55:46
This is a representation of another language. The reason I'm using the syntax of parentheses and so on is because I tend to use a Lisp system to write an0:55:56
interpreter for this which allows me to simulate the machine I'm trying to build. I don't want to confuse this to think that0:56:05
this is Lisp code. It's just I'm using a lot of the pieces of Lisp. I'm embedding a language in Lisp, using Lisp as pieces to make my process of making my simulator easy.0:56:16
So I'm inheriting from Lisp all of its properties. If fetch of N is less than 2, I want to go to a place called immediate-answer.0:56:25
It's the base step. Now, that's somewhere over here, just above done.0:56:37
And we'll see it later. Now, in the general case, which is the part I'm going to write down now, let's just do it. Well, first of all, I'm going to have to0:56:46
call Fibonacci twice. In each case-- well, in one case at least, I'm going to have to know what to do to come back and do the next one.0:56:56
I have to remember, have I done the first Fib, or have I done the second one? Do I have to come back to the place where I do the second0:57:06
Fib, or do I have to come back to the place where I do the add? In the first case, for the first Fibonacci, I'm going to need the value of N for computing the second one.0:57:20
So I have to store some of these things up. So first I'm going to save continue. That's who needs the answer.0:57:31
And the reason I'm doing that is because I'm about to assign continue to the place which is the place I0:57:42
want to go to after. Let's call it Fib-N-minus-1, big long name,0:57:52
classic Lisp name. Because I'm going to compute the first Fib of N minus 1, and then after that, I want to come back and0:58:02
do something else. That's the place I want to go to after I've done the first Fibonacci calculation.0:58:11
And I want to do a save of N, because I'm going to need it later, after that. Now I'm going to, at this point, get ready to do the0:58:21
Fibonacci of N minus 1. So assign to N the difference of the fetch of N and 1.0:58:38
Now I'm ready to go back to doing the Fib loop.0:58:47
Have I satisfied my contract? And the answer is yes. N contains N minus 1, which is what I need.0:58:57
Continue contains a place I want to go to when I'm done with calculating N minus 1. So I've satisfied the contract. And therefore, I can write down here a label,0:59:11
after-Fib-N-minus-1.0:59:20
Now what am I going to do here? Here's a place where I now have to get ready to do Fib of N minus 2.0:59:29
But in order to do a Fib of N minus 2, look, I don't know. I've clobbered my N over here. And presumably my N is counted down all the way to 1 or 0 or something at this point.0:59:39
So I don't know what the value of N in the N register is. I want the value of N that was on the stack that I saved over here so that could restore it over here.0:59:49
I saved up the value of N, which is this value of N at this point, so that I could restore it after computing Fib of N minus 1, so that I could count that down to N minus 20:59:59
and then compute Fib of N minus 2. So let's restore that.1:00:08
Restore of N. Now I'm about to do something which is superstitious, and we will remove it shortly.1:00:18
I am about to finish the sequence of doing the subroutine call, if you will. I'm going to say, well, I also saved up the continuation,1:00:28
since I'm going to restore it now. But actually, I don't have to, because I'm not going to need it. We'll fix that in a second. So we'll do a restore of continue, which is what I1:00:46
would in general need to do. And we're just going to see what you would call in the compiler world a peephole optimization, which says, whoops, you didn't have to do that.1:00:55
OK, so the next thing I see here is that I have to get ready now to do Fibonacci of N minus 2. But I don't have to save N anymore.1:01:05
The reason why I don't have to save N anymore is because I don't need N after I've done Fib of N minus 2, because the next thing I do is add. So I'm just going to set up my N that way.1:01:16
Assign to N the difference of fetch N and 2.1:01:31
Now I have to finish the setup for calling Fibonacci of N minus 2. Well, I have to save up continue and assign continue,1:01:48
continue, to the place which is after-Fib-N-minus-2, that place1:02:03
over here somewhere. However, I've got to be very careful. The old value, the value of Fib of N minus 1, I'm going to1:02:12
need later. The value of Fibonacci of N minus 1, I'm going to need. And I can't clobber it, because I'm going to have to1:02:21
add it to the value of Fib of N minus 2. That's in the value register, so I'm going to save it. So I have to save this right now, save up VAL.1:02:33
And now I can go off to my subroutine, go to Fib loop.1:02:44
Now before I go any further and finish this program, I just want to look at this segment so far and see, oh yes, there's a sequence of instructions here, if you1:02:55
will, that I can do something about. Here I have a restore of continue, a save of continue,1:03:06
and then an assign of continue, with no other references to continue in between. The restore followed by the save1:03:15
leaves the stack unchanged. The only difference is that I set the continue register to a value, which is the value that was on the stack.1:03:24
Since I now clobber that value, as in it was never referenced, these instructions are unnecessary. So we will remove these.1:03:38
But I couldn't have seen that unless I had written them down. Was that really true? Well, I don't know.1:03:48
OK, so we've now gone off to compute Fibonacci of N minus 2. So after that, what are we going to do?1:04:05
Well, I suppose the first thing we have to do-- we've got two things. We've got a thing in the value register which is now valuable. We also have a thing on the stack that can be restored into the value register.1:04:14
And what I have to be careful with now is I want to shuffle this right so I can do the add. Now there are various conventions I might use, but I'm going to be very picky and say, I'm only going to restore1:04:24
into a register I've saved from. If that's the case, I have to do a shuffle here. It's the same problem with how many hands I have. So I'm going to assign to N, because I'm not going to need N1:04:37
anymore, N is useless, the current value of VAL, which was the value of Fib of N minus 2.1:04:52
And I'm going to restore the value register now.1:05:01
This restore matches this save. And if you're very careful and examine very carefully what goes on, restores and saves are always matched.1:05:13
Now there's an outstanding save, of course, that we have to get rid of soon. And so I restored the value register. Now I restore the continue one, which matches this one,1:05:34
dot, dot, dot, dot, dot, dot, dot, down to here, restoring that continuation. That continuation is a continuation of Fib of N,1:05:46
which is the problem I was trying to solve, a major problem I'm trying to solve. So that's the guy I have to go back to who wants Fib of N. I saved them all the way up here when I realized N was1:05:55
not less than 2. And so I had to do a complicated operation. Now I've got everything I need to do it. So I'm going to restore that, assign to VAL the sum of fetch1:06:17
VAL and fetch of N, and go to continue.1:06:38
So now I've returned from computing Fibonacci of N, the general case.1:06:47
Now what's left is we have to fix up a few details, like there's the base case of this induction, immediate answer,1:07:03
which is nothing more than assign to VAL fetch of N,1:07:13
because N was less than 2, and therefore, the answer is N in our original program, and go to fetch continue,1:07:31
and finally Fib-done.1:07:43
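The Fibonacci controller just assembled can likewise be simulated. This Python sketch includes the peephole optimization discussed above (the restore-continue followed by save-continue removed) and the shuffle through N at the second return point; register and label names follow the board, while the dispatch loop is my own rendering:

```python
def fib_machine(n_init):
    # Registers
    n = n_init
    val = None
    continue_ = 'fib-done'   # assign continue fib-done
    stack = []               # the large, simple memory
    pc = 'fib-loop'          # the marble in the controller
    while True:
        if pc == 'fib-loop':
            # Contract: n contains the argument, continue the recipient.
            if n < 2:
                pc = 'immediate-answer'
            else:
                stack.append(continue_)       # save continue
                continue_ = 'afterfib-n-1'    # assign continue
                stack.append(n)               # save n
                n = n - 1                     # assign n (- (fetch n) 1)
                pc = 'fib-loop'
        elif pc == 'afterfib-n-1':
            n = stack.pop()                   # restore n
            # restore-continue / save-continue pair removed by the
            # peephole optimization; caller's continue stays on the stack.
            n = n - 2                         # assign n (- (fetch n) 2)
            continue_ = 'afterfib-n-2'
            stack.append(val)                 # save val: holds Fib(n-1)
            pc = 'fib-loop'
        elif pc == 'afterfib-n-2':
            n = val                           # shuffle: only restore into
            val = stack.pop()                 # a register you saved from
            continue_ = stack.pop()           # restore continue
            val = val + n                     # Fib(n-1) + Fib(n-2)
            pc = continue_                    # go to fetch continue
        elif pc == 'immediate-answer':
            val = n                           # assign val (fetch n)
            pc = continue_
        elif pc == 'fib-done':
            return val
```

Note how every `stack.pop()` matches an earlier `stack.append()` of the same register, the discipline the lecture is about to insist on.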
So that's a fairly complicated program. And the reason I wanted you to see that is because I want you to see the particular flavors of stack discipline that I was obeying.1:07:52
First of all, I don't want to save anything that I'm not going to need later. I was being very careful.1:08:01
And it's very important. And there are all sorts of other disciplines people make with frames and things like that of some sort, where you1:08:10
save all sorts of junk you're not going to need later and restore it because, in some sense, it's easier to do that. That's going to lead to various disasters, which we'll1:08:19
see a little later. It's crucial to say exactly what you're going to need later. It's an important idea.1:08:29
And the responsibility is that whoever saves something is the guy who restores it, because he needs it. And with such a discipline, you can see which things are1:08:40
unnecessary, which operations are unimportant. Now, one other thing I want to tell you about that's very1:08:49
simple is that, of course, the picture you see is not the whole picture. Supposing I had systems that had things like other1:08:58
operations, CAR, CDR, cons, building a vector and referencing the nth element of it, or things like that.1:09:10
Well, at this level of detail, whatever it is, we can conceptualize those as primitive operations in the datapath. In other words, we could say that some machine that, for1:09:21
example, has the append machine, which has to do cons of the CAR of x with the append of the CDR of x and y, well, gee, that's exactly the same as1:09:31
the factorial structure. Well, it's got about the same structure. And what do we have? We have some sort of things in it which may be registers, x1:09:41
and y, and then x has to somehow move to y sometimes, x has to get the value of y. And then we may have to be able to do something which is a cons.1:09:51
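The append machine mentioned above, cons of the car of x onto the append of the cdr of x and y, has the same deferred-operation shape as factorial. A Python sketch, using Python lists to stand in for cons pairs (an assumption, since the lecture leaves the memory machine abstract):

```python
def append(x, y):
    # (if (null? x) y ...)
    if not x:
        return y
    # (cons (car x) (append (cdr x) y)): the cons is deferred
    # until the recursive call returns, just like factorial's multiply.
    return [x[0]] + append(x[1:], y)
```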
I don't remember if I need it like this in this system, but cons is sort of like subtract or add or something.1:10:01
It combines two things, producing a thing which is the cons, which we may then think goes into there. And then maybe a thing called the CAR, which will produce--1:10:14
I can get the CAR or something. And maybe I can get the CDR of something, and so on. But we shouldn't be too afraid of saying things this way, because the worst that could happen is if we open up cons,1:10:27
what we're going to find is some machine. And cons may in fact overlap with CAR and CDR, and it always does, in the same way that plus and minus overlap,1:10:38
and really the same business. Cons, CAR, and CDR are going to overlap, and we're going to find a little controller, a little datapath, which may1:10:48
have some registers in it, some stuff like that. And maybe inside it, there may also be an infinite part, a part that's semi-infinite or something, which is a lot of1:10:59
very uniform stuff, which we'll call memory. And I wouldn't be so horrified if that were the way it works.1:11:09
In fact, it does, and we'll talk about that later. So are there any questions?1:11:24
Gee, what an unquestioning audience. Suppose I tell you a horrible pile of lies.1:11:39
OK. Well, thank you. Let's take our break. [MUSIC PLAYING - "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:00
Lecture 9B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:15
PROFESSOR: Well, I hope you appreciate that we have inducted you into some real magic, the magic of building0:00:26
languages, really building new languages. What have we looked at? We've looked at an Escher picture language: this0:00:39
language invented by Peter Henderson. We looked at digital logic language.0:00:53
Let's see. We've looked at the query language. And the thing you should realize is, even though these were toy examples, they really are the kernels of really0:01:06
useful things. So, for instance, the Escher picture language was taken by Henry Wu, who's a student at MIT, and developed into a real0:01:17
language for laying out PC boards based just on extending those structures. And the digital logic language, Jerry mentioned when he showed it to you, was really extended to be used as0:01:28
the basis for a simulator that was used to design a real computer. And the query language, of course, is kind of the germ of Prolog.0:01:37
So we built all of these languages, they're all based on LISP. A lot of people ask what particular problems is LISP0:01:48
good for solving? The answer is LISP is not good for solving any particular problems. What LISP is good for is constructing within it the right language to solve the problems you want to0:01:58
solve, and that's how you should think about it. So all of these languages were based on LISP. Now, what's LISP based on?0:02:07
Where's that come from? Well, we looked at that too. We looked at the meta-circular evaluator and said well, LISP0:02:23
is based on LISP. And when we start looking at that, we've got to do some real magic, right? So what does that mean? Y operators, and fixed points, and the idea that what0:02:37
this means is that LISP is somehow the fixed-point equation for this funny set of things which are defined in terms of themselves.0:02:47
Now, it's real magic. Well, today, for a final piece of magic, we're going to make all the magic go away.0:03:06
We already know how to do that. The idea is, we're going to take the register machine architecture and show how to implement LISP in terms of that.0:03:15
And, remember, the idea of the register machine is that there's a fixed and finite part of the machine.0:03:24
There's a finite-state controller, which does some particular thing with a particular amount of hardware. There are particular data paths: the operations the machine does.0:03:33
And then, in order to implement recursion and sustain the illusion of infinity, there's some large amount of memory, which is the stack.0:03:42
So, if we implement LISP in terms of a register machine, then everything ought to become, at this point, completely concrete. All the magic should go away.0:03:51
And, by the end of this talk, I want you to get the feeling that, as opposed to this very mysterious meta-circular evaluator, a LISP evaluator really is something0:04:01
that's concrete enough that you can hold in the palm of your hand. You should be able to imagine holding a LISP interpreter there. All right, how are we going to do this?0:04:10
We already have all the ingredients. See, what you learned last time from Jerry is how to take any particular couple of LISP procedures and hand-translate0:04:23
them into something that runs on a register machine. So, to implement all of LISP on a register machine, all we have to do is take the particular procedures that are0:04:34
the meta-circular evaluator and hand-translate them for a register machine. And that does all of LISP, right? So, in principle, we already know how to do this.0:04:45
And, indeed, it's going to be no different, in kind, from translating, say, recursive factorial or recursive Fibonacci.0:04:54
It's just bigger and there's more of it. So it'd just be more details, but nothing really conceptually new. All right, also, when we've done that, and the thing is0:05:03
completely explicit, and we see how to implement LISP in terms of the actual sequential register operations, that's going to be our final most explicit model of0:05:13
LISP in this course. And, remember, that's a progression through this course. We started out with substitution, which is sort of like algebra. And then we went to the environment model, which0:05:22
talked about the actual frames and how they got linked together. And then we made that more concrete in the meta-circular evaluator.0:05:31
There are things the meta-circular evaluator doesn't tell us. You should realize that. For instance, it left unanswered the question of how0:05:40
a procedure, like recursive factorial here, somehow takes space that grows. On the other hand, a procedure which also looks syntactically0:05:51
recursive, called fact-iter, somehow doesn't take space. We justify that it doesn't need to take space by showing0:06:01
the substitution model. But we didn't really say how it happens that the machine manages to do that, that that has to do with the details of how arguments are passed to procedures.0:06:12
And that's the thing we didn't see in the meta-circular evaluator precisely because the way arguments got passed to procedures in this LISP depended on the way arguments0:06:21
got passed to procedures in this LISP. But, now, that's going to become extremely explicit.0:06:30
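To make the contrast concrete, here is a hypothetical hand translation in the spirit of last lecture, rendered as a Python sketch (the labels `fact-loop` and `after-fact` are my own names): recursive factorial needs the stack and the continue register, while fact-iter just overwrites its registers.

```python
# Hypothetical hand translation (mine): recursive factorial as a
# register machine with an explicit stack.

def fact_machine(n):
    stack, val, cont, pc = [], None, "done", "fact-loop"
    while pc != "done":
        if pc == "fact-loop":
            if n == 0:
                val, pc = 1, cont
            else:
                stack.append(cont)   # save where to go afterward
                stack.append(n)      # save n for the multiply
                n, cont, pc = n - 1, "after-fact", "fact-loop"
        elif pc == "after-fact":
            n = stack.pop()          # restore n
            cont = stack.pop()       # restore continue
            val, pc = n * val, cont
    return val

# fact-iter translated the same way: no stack at all. The registers
# are simply overwritten each time around the loop, so space is constant.
def fact_iter_machine(n):
    product, counter = 1, 1
    while counter <= n:              # the 'iter' loop
        product, counter = counter * product, counter + 1
    return product

print(fact_machine(5), fact_iter_machine(5))  # 120 120
```

The stack in the first machine grows with n; the second machine never pushes anything, which is exactly the distinction the substitution model justified but didn't explain.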
OK. Well, before going on to the evaluator, let me just give you a sense of what a whole LISP system looks like so you can see the parts we're going to talk about and the parts0:06:39
we're not going to talk about. Let's see, over here is a happy LISP user, and the LISP0:06:49
user is talking to something called the reader.0:07:00
The reader's job in life is to take characters from the user0:07:14
and turn them into data structures in something called a list structure memory.0:07:29
All right, so the reader is going to take symbols, parentheses, and A's and B's, and ones and threes that you type in, and turn these into actual list structure: pairs,0:07:39
and pointers, and things. And so, by the time evaluator is going, there are no characters in the world. And, of course, in more modern list systems, there's sort of0:07:49
a big morass here that might sit between the user and the reader: Windows systems, and top levels, and mice, and all kinds of things. But conceptually, characters are coming in.0:07:59
All right, the reader transforms these into pointers to stuff in this memory, and that's what the0:08:09
evaluator sees, OK? The evaluator has a bunch of helpers.0:08:19
It has all possible primitive operators you might want. So there's a completely separate box, a floating point0:08:29
unit, or all sorts of things, which do the primitive operators. And, if you want more special primitives, you build more0:08:38
primitive operators, but they're separate from the evaluator. The evaluator finally gets an answer and communicates that to the printer.0:08:50
And now, the printer's job in life is to take this list structure coming from the evaluator, and turn it back into characters, and communicate them to the user0:09:03
through whatever interface there is. OK. Well, today, what we're going to talk about is this evaluator.0:09:12
The primitive operators have nothing particular to do with LISP, they're however you like to implement primitive operations. The reader and printer are actually complicated, but0:09:22
we're not going to talk about them. They sort of have to do with details of how you might build up list structure from characters. So that is a long story, but we're not going0:09:31
to talk about it. The list structure memory, we'll talk about next time. So, pretty much, except for the details of reading and printing, the only mystery that's going to be left after0:09:41
you see the evaluator is how you build list structure on conventional memories. But we'll worry about that next time too.0:09:50
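Toy stand-ins (mine, in Python) for the reader's and printer's jobs, just to fix the picture: characters in, list structure out, and back again. A real reader is much hairier, as noted above.

```python
# Toy reader: parses "(+ 1 (* 2 3))"-style text into nested lists,
# our stand-in for list structure memory.

def read_expr(chars):
    tokens = chars.replace("(", " ( ").replace(")", " ) ").split()
    def parse(tokens):
        token = tokens.pop(0)
        if token == "(":
            lst = []
            while tokens[0] != ")":
                lst.append(parse(tokens))
            tokens.pop(0)            # discard the ")"
            return lst
        return int(token) if token.lstrip("-").isdigit() else token
    return parse(tokens)

# Toy printer: turns list structure back into characters.
def print_expr(e):
    if isinstance(e, list):
        return "(" + " ".join(print_expr(x) for x in e) + ")"
    return str(e)

print(print_expr(read_expr("(+ 1 (* 2 3))")))  # (+ 1 (* 2 3))
```

By the time the evaluator runs, only the nested-list structure exists; no characters anywhere, just as the lecture says.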
OK. Well, let's start talking about the evaluator. The one that we're going to show you, of course, is not, I0:09:59
think, anything special. It's just a particular register machine that runs LISP. And it has seven registers, and here0:10:08
are the seven registers. There's a register, called EXP, and its job is to hold the expression to be evaluated.0:10:18
And by that, I mean it's going to hold a pointer to someplace in list structure memory that holds the expression to be evaluated. There's a register, called ENV, which holds the0:10:29
environment in which this expression is to be evaluated. And, again, I made a pointer. The environment is some data structure.0:10:38
There's a register, called FUN, which will hold the procedure to be applied when you go to apply a procedure. A register, called ARGL, which holds the list0:10:48
of evaluated arguments. What you can start seeing here is the basic structure of the evaluator. Remember how evaluators work. There's a piece that takes expressions and environments,0:10:57
and there's a piece that takes functions, or procedures and arguments. And going back and forth around here is the eval/apply loop.0:11:07
So those are the basic pieces of the eval and apply. Then there's some other things, there's continue. You just saw before how the continue register is used to implement recursion and stack discipline.0:11:19
There's a register that's going to hold the result of some evaluation. And then, besides that, there's one temporary register, called UNEV, which typically, in the evaluator,0:11:29
is going to be used to hold temporary pieces of the expression you're working on, which you haven't gotten around to evaluate yet, right? So there's my machine: a seven-register machine.0:11:40
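Summarizing the seven registers as a sketch (a Python stand-in of mine, with each register's job from the lecture as a comment):

```python
# Sketch: the seven registers as named slots. Everything the
# evaluator does is reads and writes of these, plus the stack.

class Registers:
    def __init__(self):
        self.exp = None    # pointer to the expression to be evaluated
        self.env = None    # environment for that evaluation
        self.fun = None    # procedure about to be applied
        self.argl = None   # list of evaluated arguments
        self.cont = None   # where the machine should go next
        self.val = None    # result of an evaluation
        self.unev = None   # pieces of exp not yet evaluated

regs = Registers()
regs.exp, regs.cont = 1, "done"   # set up to evaluate the expression 1
```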
And, of course, you might want to make a machine with a lot more registers to get better performance, but this is just a tiny, minimal one. Well, how about the data paths?0:11:49
This machine has a lot of special operations for LISP. So, here are some typical data paths.0:12:00
A typical one might be, oh, assign to the VAL register the contents of the EXP register. In terms of those diagrams you saw, that's a little button on0:12:10
some arrow. Here's a more complicated one. It says branch, if the thing in the expression register is a conditional to some label here, called the0:12:21
ev-conditional. And you can imagine this implemented in a lot of different ways. You might imagine this conditional test as a special purpose sub-routine, and conditional might be0:12:32
represented as some data abstraction that you don't care about at this level of detail. So that might be done as a sub-routine. This might be a machine with hardware-types, and0:12:41
conditional might be testing some bits for a particular code. There are all sorts of ways that's beneath the level of abstraction we're looking at.0:12:50
Another kind of operation, and there are a lot of different operations: assign to EXP the first clause of what's in EXP. This might be part of processing a conditional.0:12:59
And, again, first clause is some selector whose details we don't care about. And you can, again, imagine that as a sub-routine which'll do some list operations, or you can imagine that as0:13:09
something that's built directly into hardware. The reason I keep saying you can imagine it built directly into hardware is even though there are a lot of operations,0:13:18
there are still a fixed number of them. I forget how many, maybe 150. So, it's plausible to think of building these directly into hardware. Here's a more complicated one.0:13:28
You can see this has to do with looking up the values of variables. It says assign to the VAL register the result of looking up the variable value of some particular expression, which,0:13:39
in this case, is supposed to be a variable in some environment. And this'll be some operation that searches through the environment structure, however it is represented, and goes0:13:49
and looks up that variable. And, again, that's below the level of detail that we're thinking about. This has to do with the details of the data structures0:13:58
for representing environments. But, anyway, there is this fixed and finite number of operations in the register machine.0:14:08
Well, what's its overall structure? Those are some typical operations. Remember what we have to do, we have to take the0:14:17
meta-circular evaluator-- and here's a piece of the meta-circular evaluator. This is the one using abstract syntax that's in the book.0:14:28
It's a little bit different from the one that Jerry shows you. And the main thing to remember about the evaluator is that0:14:37
it's doing some sort of case analysis on the kinds of expressions: so if it's either self-evaluated, or quoted, or0:14:46
whatever else. And then, in the general case where the expression it's looking at is an application, there's some tricky recursions going on.0:14:55
First of all, eval has to call itself both to evaluate the operator and to evaluate all the operands.0:15:05
So there's this sort of red recursion of eval walking down the tree; that's really the easy recursion. That's just eval walking down this tree of expressions.0:15:14
Then, in the evaluator, there's a hard recursion. There's the red to green. Eval calls apply. That's the case where evaluating a procedure0:15:26
application reduces to applying the procedure to the list of arguments. And then, apply comes over here. Apply takes a procedure and arguments and, in the general0:15:39
case where there's a compound procedure, apply goes around and green calls red. Apply comes around and calls eval again.0:15:48
It evals the body of the procedure in the result of extending the environment, with the parameters of the procedure bound to the arguments.0:15:59
Except in the primitive case, where it just calls something else primitive-apply, which is not really the business of the evaluator. So this sort of red to green, to red to green, that's the0:16:11
eval/apply loop, and that's the thing that we're going to want to see in the evaluator. All right. Well, it won't surprise you at all that the two big pieces of0:16:22
this evaluator correspond to eval and apply. There's a piece called eval-dispatch, and a piece called apply-dispatch.0:16:32
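As a hedged rendering (mine, in Python rather than the book's Scheme), the eval/apply loop just described looks like this, handling only self-evaluating expressions, variables, and applications:

```python
# A minimal sketch of eval and apply calling each other. Compound
# procedures are represented as (params, body, definition-env).

def m_eval(exp, env):
    if isinstance(exp, (int, float)):   # self-evaluating
        return exp
    if isinstance(exp, str):            # variable: look it up
        return env[exp]
    # application: eval the operator and every operand, then apply
    return m_apply(m_eval(exp[0], env),
                   [m_eval(operand, env) for operand in exp[1:]])

def m_apply(proc, args):
    if callable(proc):                  # primitive: just do it
        return proc(*args)
    params, body, defn_env = proc       # compound procedure
    # apply calls back to eval: evaluate the body in the definition
    # environment extended with parameters bound to arguments
    return m_eval(body, {**defn_env, **dict(zip(params, args))})

E0 = {"+": lambda a, b: a + b, "x": 3, "y": 4}
print(m_eval(["+", "x", "y"], E0))            # 7
E0["f"] = (["a", "b"], ["+", "a", "b"], E0)   # (define (f a b) (+ a b))
print(m_eval(["f", "x", "y"], E0))            # 7
```

The red-to-green loop is the mutual recursion between `m_eval` and `m_apply`; the register machine below is this same loop made explicit.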
And, before we get into the details of the code, the way to understand this is to think, again, in terms of these pieces of the evaluator having contracts with the rest of the world.0:16:41
What do they do from the outside before getting into the grungy details? Well, the contract for eval-dispatch--0:16:50
remember, it corresponds to eval. It's got to evaluate an expression in an environment. So, in particular, what this one is going to do, eval-dispatch will assume that, when you call it,0:16:59
the expression you want to evaluate is in the EXP register. The environment in which you want the evaluation to take place is in the ENV register.0:17:09
And continue tells you the place where the machine should go next when the evaluation is done. Eval-dispatch's contract is that it'll actually perform0:17:20
that evaluation, and, at the end of which, it'll end up at the place specified by continue. The result of the evaluation will be in the VAL register.0:17:29
And it just warns you, it makes no promises about what happens to the registers. All other registers might be destroyed. So, there's one piece, OK?0:17:41
The other piece, apply-dispatch, corresponds to apply. It's got to apply a procedure to some arguments, so it assumes that this register, ARGL, contains0:17:52
a list of the evaluated arguments. FUN contains the procedure. Those correspond to the arguments to the apply procedure in the meta-circular evaluator.0:18:03
And apply, in this particular evaluator, we're going to use a discipline which says the place the machine should go to next when apply is done is, at the moment apply-dispatch is0:18:14
called, at the top of the stack. That's just the discipline for the way this particular machine's organized. And now, apply's contract is, given all that:0:18:23
It'll perform the application. The result of that application will end up in VAL. The stack will be popped. And, again, the contents of all the other registers may be0:18:33
destroyed, all right? So that's the basic organization of this machine. Let's break for a little bit and see if there are any questions, and then we'll do a real example.0:19:47
Well, let's take the register machine now, and actually step through, and really, in real detail, so you see completely0:19:57
concrete how some expressions are evaluated, all right? So, let's start with a very simple expression.0:20:09
Let's evaluate the expression 1.0:20:18
And we need an environment, so let's imagine that somewhere there's an environment, we'll call it E,0.0:20:30
And just, since we'll use these later, we obviously don't really need anything to evaluate 1. But, just for reference later, let's assume that E,0 has in0:20:40
it an X that's bound to 3 and a Y that's bound to 4, OK?0:20:49
And now what we're going to do is we're going to evaluate 1 in this environment, and so the ENV register has a pointer0:20:59
to this environment, E,0, all right? So let's watch that thing go. What I'm going to do is step through the code.0:21:08
And, let's see, I'll be the controller. And now what I need, since this gets rather complicated, is a very little execution unit. So here's the execution unit, OK?0:21:22
OK. OK. All right, now we're going to start. We're going to start the machine at0:21:31
eval-dispatch, right? That's the beginning of this. Eval-dispatch is going to look at the expression and dispatch, just like eval, where we look at the very first thing.0:21:42
We branch on whether or not this expression is self-evaluating. Self-evaluating is some abstraction we put into the machine--0:21:52
it's going to be true for numbers-- to a place called ev-self-eval, right? So me, being the controller, looks at ev-self-eval, so we'll go over to there.0:22:02
Ev-self-eval says fine, assign to VAL whatever is in the expression register, OK?0:22:15
And I have a bug because what I didn't do when I initialized this machine is also say what's supposed to happen when it's done, so I should have started out the machine with0:22:27
done being in the continue register, OK? So we assign to VAL. And now go to fetch of continue, and [? the value changed. ?]0:22:38
OK. OK, let's try something harder. Let's reset the machine here, and we'll put in the0:22:47
expression register, X, OK?0:22:56
Start again at eval-dispatch. Check, is it self-evaluating? No. Is it a variable? Yes.0:23:05
We go off to ev-variable. It says assign to VAL, look up the variable value in the0:23:14
expression register, OK? Go to fetch of continue.0:23:23
PROFESSOR: Done. PROFESSOR: OK. All right. Well, that's the basic idea. That's a simple operation of the machine.0:23:32
Now, let's actually do something a little bit more interesting. Let's look at the expression the sum of x and y.0:23:49
OK. And now we'll see how you start unrolling these expression trees, OK? Well, start again at eval-dispatch, all right?0:24:04
Self-evaluating? No. Variable? No. All the other special forms which I didn't write down, like quote, and lambda, and set, and whatever, it's none of those.0:24:13
It turns out to be an application, so we go off to ev-application, OK? Ev-application, remember what it's going to do overall.0:24:25
It is going to evaluate the operator. It's going to evaluate the arguments, and then it's going to go apply them.0:24:35
So, before we start, since we're being very literal, we'd better remember that, somewhere in this environment, it's linked to another environment in which plus is0:24:46
bound to the primitive procedure plus before we get an unknown variable in our machine.0:24:55
OK, so we're at ev-application. OK, assign to UNEV the operands of what's in the0:25:05
expression register, OK? Those are the operands. UNEV's a temporary register where we're going to save them. PROFESSOR: I'm assigning. PROFESSOR: Assign to EXP the operator.0:25:18
Now, notice we've destroyed that expression in EXP, but the piece that we need is now in UNEV. OK. Now, we're going to get set up to recursively0:25:27
evaluate the operator. Save the continue register on the stack. Save the environment.0:25:40
Save UNEV. OK, assign to continue a0:25:53
label called eval-args. Now, what have we done? We've set up for a recursive call.0:26:04
We're about to go to eval-dispatch. We've set up for a recursive call to eval-dispatch. What did we do? We took the things we're going to need later, those operands0:26:15
that were in UNEV; the environment in which we're going to eventually have to, maybe, evaluate those operands; the place we eventually want to go to, which, in this case, was done; we've saved them on the stack.0:26:27
The reason we saved them on the stack is because eval-dispatch makes no promises about what registers it may destroy. So all that stuff is saved on the stack. Now, we've set up eval-dispatch's contract.0:26:37
There's a new expression, which is the operator plus; a new environment, although, in this case, it's the same one; and a new place to go to when you're done, which is eval-args.0:26:47
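The discipline just walked through can be snapshotted like this (a sketch of mine with register names as Python variables, not real machine code):

```python
# Save what you will need later, set up eval-dispatch's contract,
# and assume every other register gets clobbered.

stack = []
unev = ["x", "y"]            # the operands
env = {"x": 3, "y": 4}       # E0
cont = "done"                # where we were originally headed

stack.append(cont)           # save continue
stack.append(env)            # save env
stack.append(unev)           # save unev
cont = "eval-args"           # new place to go when eval-dispatch ends

unev = env = None            # eval-dispatch may destroy any register

unev = stack.pop()           # back at eval-args: restore unev
env = stack.pop()            # restore env; continue stays saved
print(unev, stack)           # ['x', 'y'] ['done']
```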
So that's set up. Now, we're going to go off to eval-dispatch. Here we are back at eval-dispatch. It's not self-evaluating. Oh, it's a variable, so we'd better go off to0:26:57
ev-variable, right? Ev-variable is assigned to VAL. Look up the variable value of the expression, OK?0:27:08
So VAL is the primitive procedure plus, OK? And go to fetch of continue. PROFESSOR: Eval-args. PROFESSOR: Right, which is now eval-args, not done.0:27:19
So we come back here at eval-args, and what do we do? We're going to restore the stuff that we saved, so we restore UNEV. And notice, there, it wasn't necessary,0:27:31
although, in general, it would be. It might be some arbitrary evaluation that happened. We restore ENV. OK, we assign to FUN fetch of VAL.0:27:58
OK, now, we're going to go off and start evaluating some arguments. Well, first thing we'd better do is save FUN because some0:28:08
arbitrary stuff might happen in that evaluation. We initialize the argument list. Assign to argl an empty0:28:18
argument list, and go to eval-arg-loop, OK? At eval-arg-loop, the idea of this is we're going to0:28:29
evaluate the pieces of the expressions that are in UNEV, one by one, and move them from unevaluated in UNEV to evaluated in the arg list, OK?0:28:38
So we save argl. We assign to EXP the first operand of the stuff in UNEV.0:28:53
Now, we check and see if that was the last operand. In this case, it is not, all right? So we save the environment.0:29:09
We save UNEV because those are all things we might need later. We're going to need the environment to do some more evaluations. We're going to need UNEV to look at what the rest of those0:29:18
arguments were. We're going to assign continue a place called accumulate-args, or accumulate-arg.0:29:30
OK, now, we've set up for another call to eval-dispatch, OK? All right, now, let me short-circuit this so we don't0:29:39
go through the details of eval-dispatch. Eval-dispatch's contract says I'm going to end up, the world will end up, with the value of evaluating this expression in0:29:48
this environment in the VAL register, and I'll end up there. So we short-circuit all of this, and a 3 ends up in VAL.0:29:58
And, when we return from eval-dispatch, we're going to return to accumulate-arg. PROFESSOR: Accumulate-arg. PROFESSOR: With 3 in the VAL register, OK?0:30:08
So that short-circuited that evaluation. Now, what do we do? We're going to go back and look at the rest of the arguments, so we restore UNEV. We restore0:30:18
ENV. We restore argl.0:30:28
One thing. PROFESSOR: Oops! Parity error. [LAUGHTER] PROFESSOR: Restore argl.0:30:41
PROFESSOR: OK. OK, we assign to argl consing on fetch of the value register0:30:51
to what's in argl. OK, we assign to UNEV the rest of the operands in fetch of0:31:04
UNEV, and we go back to eval-arg-loop. PROFESSOR: Eval-arg-loop. PROFESSOR: OK.0:31:15
Now, we're about to do the next argument, so the first thing we do is save argl.0:31:25
OK, we assign to EXP the first operand of fetch of UNEV. OK,0:31:35
we test and see if that's the last operand. In this case, it is, so we're going to go to a special place that says evaluate the last argument because, notice, after evaluating the argument, we don't need the0:31:45
environment any more. That's going to be the difference. So here, at eval-last-arg, we assign continue to accumulate-last-arg. Now, we're set up again for0:32:06
eval-dispatch. We've got a place to go to when we're done. We've got an expression. We've got an environment. OK, so we'll short-circuit the call to eval-dispatch. And what'll happen is there's a y there, it's 4 in that0:32:18
environment, so VAL will end up with 4 in it. And, then, we're going to end up at accumulate-last-arg, OK? So, at accumulate-last-arg, we restore argl.0:32:41
We assign to argl cons of fetch of the new value onto it, so we cons a 4 onto that. We restore what was saved in the function register.0:32:53
And notice, in this case, it had not been destroyed, but, in general, it will be. And now, we're ready to go off to apply-dispatch, all right?0:33:02
So we've just gone through the eval. We evaluated the argument, the operator, and the arguments, and now, we're about to apply them. So we come off to apply-dispatch here, OK?0:33:17
We come off to apply-dispatch, and we're going to check whether it's a primitive or a compound procedure. PROFESSOR: Yes. PROFESSOR: All right. So, in this case, it's a primitive procedure, and we go0:33:27
off to primitive-apply. So we go off to primitive-apply, and it says assign to VAL the result of applying primitive procedure0:33:38
of the function to the argument list. PROFESSOR: I don't know how to add. I'm just an execution unit. PROFESSOR: Well, I don't know how to add either. I'm just the evaluator, so we need a primitive operator.0:33:48
Let's see, so the primitive operator, what's the sum of 3 and 4? AUDIENCE: 7. PROFESSOR: OK, 7. PROFESSOR: Thank you.0:33:58
PROFESSOR: Now, we restore continue, and we go to fetch0:34:12
of continue. PROFESSOR: Done. PROFESSOR: OK. Well, that was in as much detail as you will ever see. We'll never do it in as much detail again.0:34:21
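The whole walkthrough can be compressed into a hedged Python sketch of the explicit-control evaluator (mine): registers as variables, labels as strings, one dispatch loop, handling only numbers, variables, and primitive applications.

```python
# Compressed sketch of the trace above: eval-dispatch, eval-args,
# eval-arg-loop, accumulate-arg, accumulate-last-arg, apply-dispatch.

def machine(expression, environment):
    exp, env = expression, environment
    fun = argl = unev = val = None
    stack, cont, pc = [], "done", "eval-dispatch"
    while pc != "done":
        if pc == "eval-dispatch":
            if isinstance(exp, int):          # ev-self-eval
                val, pc = exp, cont
            elif isinstance(exp, str):        # ev-variable
                val, pc = env[exp], cont
            else:                             # ev-application
                unev, exp = exp[1:], exp[0]   # operands / operator
                stack += [cont, env, unev]    # save for later
                cont, pc = "eval-args", "eval-dispatch"
        elif pc == "eval-args":
            unev = stack.pop()                # restore unev
            env = stack.pop()                 # restore env
            fun = val                         # the operator's value
            stack.append(fun)                 # save fun
            argl, pc = [], "eval-arg-loop"
        elif pc == "eval-arg-loop":
            stack.append(argl)                # save argl
            exp = unev[0]                     # first operand
            if len(unev) == 1:                # eval-last-arg:
                cont = "accumulate-last-arg"  # env not needed again
            else:
                stack += [env, unev]
                cont = "accumulate-arg"
            pc = "eval-dispatch"
        elif pc == "accumulate-arg":
            unev = stack.pop()
            env = stack.pop()
            argl = [val] + stack.pop()        # cons val onto argl
            unev, pc = unev[1:], "eval-arg-loop"
        elif pc == "accumulate-last-arg":
            argl = [val] + stack.pop()
            fun = stack.pop()                 # restore fun
            pc = "apply-dispatch"
        elif pc == "apply-dispatch":          # primitive-apply only
            val = fun(*reversed(argl))        # argl holds args reversed
            cont = stack.pop()                # restore continue
            pc = cont
    return val

E0 = {"+": lambda a, b: a + b, "x": 3, "y": 4}
print(machine(["+", "x", "y"], E0))  # 7
```

When the loop exits, the stack is empty: the machine really is back in its initial state with the answer in `val`, which is the point made next.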
One very important thing to notice is that we just executed a recursive procedure, right? This whole thing, we used a stack and the0:34:31
evaluator was recursive. A lot of people think the reason that you need a stack and recursion in an evaluator is because you might be evaluating recursive procedures like0:34:40
factorial or Fibonacci. It's not true. So you notice we did recursion here, and all we evaluated was plus X, Y, all right? The reason that you need recursion in the evaluator is0:34:51
because the evaluation process, itself, is recursive, all right? It's not because the procedure that you might be evaluating in LISP is a recursive procedure. So that's an important thing that people get0:35:01
confused about a lot. The other thing to notice is that, when we're done here, we're really done. Not only are we at done, but there's no accumulated stuff0:35:12
on the stack, right? The machine is back to its initial state, all right? So that's part of what it means to be done. Another way to say that is the evaluation process has reduced0:35:26
the expression, plus X, Y, to the value here, 7. And by reduced, I mean a very particular thing.0:35:36
It means that there's nothing left on the stack. The machine is now in the same state, except there's something in the value register. It's not part of a sub-problem of anything. There's nothing to go back to.0:35:46
OK. Let's break. Question? AUDIENCE: The need here for the stack is because the data may be recursive.0:35:55
You may have embedded expressions, for instance. PROFESSOR: Yes, because you might have embedded expressions. But, again, don't confuse that with what people sometimes0:36:06
mean by the data may be recursive, which is to say you have these list-structured, recursive data list operations. That has nothing to do with it. It's simply that the expressions contain0:36:15
sub-expressions. Yeah? AUDIENCE: Why is it that the order of the arguments in the arg list got reversed? PROFESSOR: Ah! Yes, I should've mentioned that.0:36:27
Here, the reason the order is reversed-- it's a question of what you mean by reversed.0:36:36
I believe it was Newton. In the very early part of optics, people realized that, when you look through the lens of your eye, the image was0:36:46
up-side down. And there was a lot of argument about why that didn't mean you saw things up-side down. So it's sort of the same issue. Reversed from what? So we just need some convention.0:36:57
The reason that they're coming out 4, 3 is because we're taking UNEV and consing the result onto argl. So you have to realize you've made that convention.0:37:06
The place that you have to realize that-- well, there's actually two places. One is in apply-primitive-operator, which has to realize that the arguments to primitives go in,0:37:16
in the opposite order from the way you're writing them down. And the other one is, we'll see later when you actually go to bind a function's parameters, you should realize the arguments are going to come in from the opposite0:37:26
order of the variables to which you're binding them. So, if you just keep track of that, there's no problem. Also, this is completely arbitrary because, if we'd done, say, an iteration through a vector assigning0:37:36
them, they might come out in the other order, OK? So it's just a convention of the way this particular evaluator works.0:37:45
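The convention in question is easy to see in a two-line sketch (mine): consing each evaluated argument onto the front of argl puts the last-evaluated one first.

```python
# Why argl comes out reversed: cons puts each value on the front.
argl = []                  # start with an empty argument list
for value in [3, 4]:       # evaluate x, then y
    argl = [value] + argl  # cons each result onto argl
print(argl)                # [4, 3] -- last argument first
```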
All right, let's take a break.0:38:41
We just saw evaluating an expression and, of course, that was very simple one. But, in essence, it would be no different if it was some0:38:51
big nested expression, so there would just be deeper recursion on the stack. But what I want to do now is show you the last piece. I want to walk you around this eval and apply loop, right?0:39:01
That's the thing we haven't seen, really. We haven't seen any compound procedures where applying a procedure reduces to evaluating the body of the0:39:11
procedure, so let's just suppose we had this. Suppose we were looking at the procedure define F of A and B0:39:29
to be the sum of A and B. So, say we typed in that procedure previously, and now we're going to evaluate F of X and0:39:41
Y, again, in this environment, E,0, where X is bound to 3 and Y is bound to 4.0:39:50
When the define is executed, remember, there's a lambda here, and lambdas create procedures. And, basically, what will happen is, in E,0, we'll end0:40:01
up with a binding for F, which will say F is a procedure, and its args are A and B, and its body is plus a,b.0:40:18
So that's what the environment would have looked like had we made that definition. Then, when we go to evaluate F of X and Y, we'll go through0:40:29
exactly the same process that we did before. It's even the same expression. The only difference is that F, instead of having primitive plus in it, will have this thing.0:40:41
And so we'll go through exactly the same process, except this time, when we end up at apply-dispatch, the function register, instead of having primitive plus, will0:40:50
have a thing that will represent it saying procedure, where the args are A and B, and the body is plus A, B.0:41:08
And, again, what I mean by its ENV is there's a pointer to it, so don't worry that I'm writing a lot of stuff there. There's a pointer to this procedure data structure.0:41:17
OK, so, we're in exactly the same situation. We get to apply-dispatch, so, here, we come to apply-dispatch.0:41:26
Last time, we branched off to a primitive procedure. Here, it says oh, we now have a compound procedure, so we're going to go off to compound-apply.0:41:38
Now, what's compound-apply? Well, remember what the meta-circular evaluator did? Compound-apply said we're going to evaluate the body of0:41:50
the procedure in some new environment. Where does that new environment come from? We take the environment that was packaged with the0:42:00
procedure, we bind the parameters of the procedure to the arguments that we're passing in, and use that as a0:42:10
new frame to extend the procedure environment. And that's the environment in which we evaluate the procedure body, right?0:42:21
That's going around the apply/eval loop. That's apply coming back to call eval, all right?0:42:30
OK. So, now, that's all we have to do in compound-apply. What are we going to do? We're going to manufacture a new environment.0:42:43
And we're going to manufacture a new environment, let's see, that we'll call E,1.0:42:53
E,1 is going to be some environment where the parameters of the procedure, where A is bound to 3 and B is0:43:02
bound to 4, and it's linked to E,0 because that's where f is defined. And, in this environment, we're going to evaluate the0:43:11
body of the procedure. So let's look at that, all right? All right, here we are at compound-apply, which says0:43:20
assign to the expression register the body of the procedure that's in the function register. So I assign to the expression register the0:43:31
procedure body, OK?0:43:42
That's going to be evaluated in an environment which is formed by making some bindings using information determined0:43:53
by the procedure-- that's what's in FUN-- and the argument list. And let's not worry about exactly what that does, but you can see the information's there. So make bindings will say oh, the procedure, itself, had an0:44:06
environment attached to it. I didn't write that quite here. I should've said in environment because every procedure gets built with an environment. So, from that environment, it knows what the procedure's0:44:17
definition environment is. It knows what the arguments are. It looks at argl, and then you see a reversal convention here. It just has to know that argl is reversed, and it builds0:44:27
this frame, E1. All right, so, let's assume that that's what make bindings returns, so it assigns to ENV this thing, E1.0:44:41
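The frame-building step in compound-apply can be sketched as a toy model. This is an illustrative Python stand-in for the lecture's register machine, not the actual evaluator; the names make_frame and lookup are made up:

```python
# Toy model of compound-apply's environment extension: a procedure
# carries its definition environment, and applying it builds a new
# frame binding parameters to arguments, linked to that environment.

def make_frame(params, args, enclosing):
    """Build E1: bind each parameter to its argument, link to E0."""
    return {"bindings": dict(zip(params, args)), "enclosing": enclosing}

def lookup(name, env):
    """Walk the chain of frames, innermost first."""
    while env is not None:
        if name in env["bindings"]:
            return env["bindings"][name]
        env = env["enclosing"]
    raise NameError(name)

# E0 is where f was defined; applying (f 3 4) with parameters (a b)
# manufactures E1 hanging off E0.
E0 = {"bindings": {"f": "<procedure f>"}, "enclosing": None}
E1 = make_frame(["a", "b"], [3, 4], E0)
```

Looking up a or b finds the new bindings in E1; looking up f falls through to E0, exactly because E1 is linked to the procedure's definition environment.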
All right, the next thing it says is restore continue. Remember what continue was here? It got put up in the last segment.0:44:52
Continue got stored. That was the original done, which said what are you going to do after you're done with this particular application? It was one of the very first things that happened when we0:45:01
evaluated the application. And now, finally, we're going to restore continue. Remember apply-dispatch's contract. It assumes that where it should go to next was on the0:45:11
stack, and there it was on the stack. Continue now holds done, and we're going to go back to eval-dispatch. We're set up again.0:45:20
We have an expression, an environment, and a place to go to. We're not going to go through that because it's sort of the same expression.0:45:35
OK, but the thing, again, to notice is, at this point, we have reduced the original expression, (f x y), right?0:45:44
We've reduced evaluating (f x y) in environment E0 to evaluating (+ a b) in E1. And notice, nothing's on the stack, right?0:45:55
It's a reduction. At this point, the machine does not contain, as part of its state, the fact that it's in the middle of evaluating some procedure called f, that's gone, right?0:46:08
There's no accumulated state, OK? Again, that's a very important idea. That's the meaning of, when we used to write in the0:46:17
substitution model, this expression reduces to that expression. And you don't have to remember anything. And here, you see the meaning of reduction. At this point, there is nothing on the stack.0:46:31
See, that has very important consequences. Let's go back and look at iterative factorial, all right?0:46:40
Remember, this was some sort of loop and doing iter. And we kept saying that's an iterative procedure, right?0:46:52
And what we wrote, remember, are things like, we said,0:47:04
(fact-iter 5). We wrote things like, this reduces to (iter 1 1 5),0:47:19
which reduces to (iter 1 2 5), and so on, and so on, and so on. And we kept saying well, look, you don't have to build up any0:47:29
storage to do that. And we waved our hands, and said in principle, there's no storage needed. Now, you see no storage needed. Each of these is a real reduction, right?0:47:49
As you walk through these expressions, what you'll see are these expressions on the stack in some particular environment, and then these expressions in the EXP0:48:00
register in some particular environment. And, at each point, there'll be no accumulated stuff on the stack because each one's a real reduction, OK?0:48:09
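In ordinary code, that reduction sequence is just a loop: the complete state at every step is the pair of values product and counter, and nothing is ever saved on a stack. Here is a Python rendering of the idea (an illustration, not the lecture's Scheme):

```python
def fact_iter(n):
    # Each pass through the loop is one "reduction":
    # (iter product counter n) -> (iter (* counter product) (+ counter 1) n).
    # The entire machine state is these two variables; no stack grows.
    product, counter = 1, 1
    while counter <= n:
        product, counter = counter * product, counter + 1
    return product
```

Because each step replaces the state rather than remembering a pending return, the space used is constant no matter how large n gets.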
All right, so, for example, just to go through it in a little bit more care, if I start out with an expression that says something like, oh, say, (fact-iter 5) in some0:48:33
environment that will, at some point, create an environment0:48:46
in which n is bound to 5. Let's call that--0:48:55
And, at some point, the machine will reduce this whole thing to a thing that says that's really (iter 1 1 n),0:49:08
evaluated in this environment, E1, with nothing on the stack.0:49:17
See, at this moment, the machine is not remembering that evaluating this expression, iter-- which is the loop-- is part of this thing0:49:27
called iterative factorial. It's not remembering that. It's just reducing the expression to that, right? If we look again at the body of iterative factorial, this0:49:38
expression has reduced to that expression. Oh, I shouldn't have the n there. It's a slightly different convention from the slide to0:49:48
the program, OK? And, then, what's the body of iter? Well, iter's going to be an if, and I won't go through the0:49:59
details of if. It'll evaluate the predicate. In this case, it'll be false. And this iter will now reduce to the expression (iter0:50:14
(* counter product) (+ counter 1)) in some other environment, by this time,0:50:30
E2, where E2 will be set up having bindings for product and counter, right?0:50:43
And it'll reduce to that, right? It won't be remembering that it's part of something that it has to return to. And when iter calls iter again, it'll reduce to another thing that looks like this in some environment, E3, which0:50:55
has new bindings for product and counter. So, if you're wondering, see, if you've always been queasy0:51:08
about how it is we've been saying those procedures, that look syntactically recursive, are, in fact, iterative, run in constant space, well, I don't know if this makes you0:51:19
less queasy, but at least it shows you what's happening. There really isn't any buildup there. Now, you might ask well, is there buildup in principle in0:51:28
these environment frames? And the answer is yeah, you have to make these new environment frames, but you don't have to hang onto them when you're done. They can be garbage collected, or the space can be reused0:51:39
automatically. But you see the control structure of the evaluator is really using this idea that you actually have a reduction, so these procedures really are iterative procedures.0:51:50
All right, let's stop for questions.0:52:02
All right, let's break.0:52:48
Let me contrast the iterative procedure just so you'll see where space does build up with a recursive procedure, so you can see the difference.0:52:58
Let's look at the evaluation of recursive factorial, all right? So, here's fact-recursive, or standard factorial definition.0:53:07
We said this one is still a recursive procedure, but this is actually a recursive process. And then, just to link it back to the way we started, we said0:53:17
oh, you can see that it's going to be recursive process by the substitution model because, if I say recursive factorial of 5, that turns into 5 times--0:53:36
what is it, fact-rec, or recursive fact-- 5 times recursive factorial of 4, which turns into 5 times 40:53:54
times fact-rec of 3, which turns into 5 times 4 times 30:54:08
times, and so on, right? The idea is there was this chain of stuff building up,0:54:18
which justified, in the substitution model, the fact that it's recursive. And now, let's actually see that chain of stuff build up and where it is in the machine, OK?0:54:27
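That chain of deferred multiplications can itself be simulated by making the stack explicit: each call pushes its pending "multiply by n" on the way down, and the multiplications happen only on the way back out. This is an illustrative Python model; the evaluator's real stack also holds continue, the operator, and argl, which this sketch omits:

```python
def fact_rec_explicit(n):
    # Going "down": each pending (* n ...) is pushed, just as the
    # evaluator saves the operator and the other argument on its
    # stack before recurring.
    stack = []
    while n > 0:
        stack.append(n)
        n -= 1
    # Base case reached with n deferred multiplications stacked up.
    result = 1
    while stack:
        result *= stack.pop()   # accumulate the last argument
    return result
```

The stack here grows to length n before any multiplication happens, which is exactly the linear buildup the substitution model shows as 5 * (4 * (3 * ...)).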
All right, well, let's imagine we're going to start out again. We'll tell it to evaluate recursive factorial of 5 in0:54:41
some environment, again, E0 where recursive factorial is defined, OK? Well, now we know what's eventually going to happen.0:54:52
This is going to come along, it'll evaluate those things, figure out it's a procedure, build somewhere over here an environment, E1, which has n bound to 5, which hangs off of0:55:05
E0, which would be, presumably, the definition environment of recursive factorial, OK?0:55:14
And, in this environment, it's going to go off and evaluate the body. So, again, the evaluation here will reduce to evaluating the0:55:27
body in E1. That's going to look at an if, and I won't go through the details of if. It'll look at the predicate. It'll decide it eventually has to evaluate the alternative.0:55:37
So this whole thing, again, will reduce to the alternative of recursive factorial, the alternative clause, which says0:55:47
that this whole thing reduces to times n of recursive0:55:56
factorial of n minus 1 in the environment E1, OK?0:56:08
So the original expression, now, is going to reduce to evaluating that expression, all right? Now we have an application. We did an application before.0:56:18
Remember what happens in an application? The first thing you do is you go off and you save the value of the continue register on the stack. So the stack here is going to have done in it.0:56:29
And then you're going to set up to evaluate the sub-parts, OK? So here we go off to evaluate the sub-parts.0:56:39
First thing we're going to do is evaluate the operator. What happens when we evaluate an operator? Well, we arrange things so that the operator ends up in0:56:49
the expression register. The environments in the ENV register continue someplace where we're going to go evaluate the arguments. And, on the stack, we've saved the original continue, which0:56:59
is where we wanted to be when we're all done. And then the things we needed when we're going to get done evaluating the operator, the things we'll need to evaluate the arguments, namely, the environment and those0:57:11
arguments, those unevaluated arguments, so there they are sitting on the stack. And we're about to go off to evaluate the operator.0:57:23
Well, when we return from this particular call-- so we're about to call eval-dispatch here-- when we return from this call, the value of that operator,0:57:32
which, in this case, is going to be the primitive multiplier procedure, will end up in the FUN register, all right?0:57:43
We're going to evaluate some arguments. They will evaluate in here. That'll give us 5, in this case. We're going to put that in the argl register, and then we'll0:57:53
go off to evaluate the second operand. So, at the point where we go off to evaluate the second operand-- and I'll skip details like computing, and0:58:02
minus 1, and all of that-- but, when we go off to evaluate the second operand, that will eventually reduce to another call to fact-recursive.0:58:12
And, what we've got on the stack here is the operator from that combination that we're going to use it in and the other argument, OK?0:58:23
So, now, we're set up for another call to recursive factorial. And, when we're done with this one, we're going to go to0:58:32
accumulate the last arg. And remember what that'll do? That'll say oh, whatever the result of this has to get combined with that, and we're going to multiply them.0:58:41
But, notice now, we're at another recursive factorial. We're about to call eval-dispatch again, except we haven't really reduced it because there's stuff0:58:51
on the stack now. The stuff on the stack says oh, when you get back, you'd better multiply it by the 5 you had hanging there. So, when we go off to make another call, we0:59:07
evaluate the n minus 1. That gives us another environment in which the new n's going to be down to 4. And we're about to call eval-dispatch again, right?0:59:18
We get another call. That 4 is going to end up in the same situation. We'll end up with another call to fact-recursive n.0:59:30
And sitting on the stack will be the stuff from the original one and, now, the subsidiary one we're doing. And both of them are waiting for the same thing. They're going to go to accumulate a last argument.0:59:40
And then, of course, when we go to the fourth call, the same thing happens, right? And this goes on, and on, and on. And what you see here on the stack, exactly what's sitting0:59:51
here on the stack, the thing that says times and 5. And what you're going to do with that is accumulate that into a last argument.1:00:00
That's exactly this, right? This is exactly where that stuff is hanging. Effectively, the operator you're going to apply, the1:00:12
other argument that it's got to be multiplied by when you get back and the parentheses, which says yeah, what you wanted to do was accumulate them. So, you see, the substitution model is not such a lie.1:00:22
That really is, in some sense, what's sitting right on the stack. OK. All right, so that, in some sense, should explain for you,1:00:33
or at least convince you, that, somehow, this evaluator is managing to take these procedures and execute some of them iteratively and some of them recursively, even though,1:00:46
syntactically, they look like recursive procedures. How's it managing to do that? Well, the basic reason is that the evaluator is set up to save only what it needs later.1:01:01
So, for example, at the point where you've reduced evaluating an expression and an environment to applying a procedure to some arguments, it doesn't need that original1:01:11
environment anymore because any environment stuff will be packaged inside the procedures where the application's going to happen.1:01:20
All right, similarly, when you're going along evaluating an argument list, when you've finished evaluating the list, when you're finished evaluating the last argument, you don't need that argument list any more, right?1:01:31
And you don't need the environment where those arguments would be evaluated, OK? So the basic reason that this interpreter is being so smart1:01:40
is that it's not being smart at all, it's being stupid. It's just saying I'm only going to save what I really need. Well, let me show you here.1:01:54
Here's the actual thing that's making it tail recursive. Remember, it's the restore of continue. It's saying when I go off to evaluate the procedure body, I1:02:09
should tell eval to come back to the place where that original evaluation was supposed to come back to. So, in some sense, you want to say what's the actual line that makes it tail recursive?1:02:18
It's that one. If I wanted to build a non-tail recursive evaluator, for some strange reason, all I would need to do is, instead1:02:27
of restoring continue at this point, I'd set up a label down here called, "Where to come back after you've finished applying the procedure." Instead, I'd1:02:38
set continue to that. I'd go to eval-dispatch, and then eval-dispatch would come back here. At that point, I would restore continue and go to the original one.1:02:47
So here, the only consequence of that would be to make it non-tail recursive. It would give you exactly the same answers, except, if you did that iterative factorial and all those iterative1:02:57
procedures, it would execute recursively. Well, I lied to you a little bit, but just a little bit, because I showed you a slightly over-simplified1:03:07
evaluator where it assumes that each procedure body has only one expression. Remember, in general, a procedure has a sequence of expressions in it.1:03:17
So there's nothing really conceptually new. Let me just show you the actual evaluator that handles sequences of expressions.1:03:28
This is compound-apply now, and the only difference from the old one is that, instead of going off to eval directly, it takes the whole body of the procedure, which, in this1:03:38
case, is a sequence of expressions, and goes off to eval-sequence. And eval-sequence is a little loop that, basically, does1:03:48
these evaluations one at a time. So it does an evaluation. Says oh, when I come back, I'd better come back here to do the next one.1:03:58
And, when I'm all done, when I want to get the last expression, I just restore my continue and go off to eval-dispatch. And, again, if you wanted for some reason to break tail1:04:08
recursion in this evaluator, all you need to do is not handle the last expression, especially. Just say, after you've done the last expression, come back1:04:17
to some other place after which you restore continue. And, for some reason, a lot of LISP evaluators tended to work that way.1:04:26
And the only consequence of that is that iterative procedures built up stack. And it's not clear why that happened.1:04:35
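The difference the placement of restore continue makes can be modeled with a counter: in the tail-recursive arrangement, nothing is pushed for the final call in each body, while the broken arrangement pushes a fresh return label on every call. This is a toy Python model of the two conventions; the label name "after-apply" is made up:

```python
def run_iter(n, tail_recursive=True):
    # Simulate the evaluator's stack while fact-iter loops n times.
    # A tail-recursive evaluator restores continue before jumping to
    # eval-dispatch, so the loop pushes nothing; the broken one pushes
    # a new return label per call and unwinds them only at the end.
    stack, max_depth = [], 0
    product, counter = 1, 1
    while counter <= n:
        if not tail_recursive:
            stack.append("after-apply")   # extra return point saved
        product, counter = counter * product, counter + 1
        max_depth = max(max_depth, len(stack))
    while stack:   # the broken evaluator pops these all at the very end
        stack.pop()
    return product, max_depth
```

Both variants produce the same answer; only the maximum stack depth differs, which is exactly the sense in which misplacing the restore gives the same results while making iterative procedures build up stack.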
All right. Well, let me just sort of summarize, since this is a lot of details in a big program. But the main point is that it's no different,1:04:44
conceptually, from translating any other program. And the main idea is that we have this universal evaluator program, the meta-circular evaluator. If we translate that into LISP, then1:04:53
we have all of LISP. And that's all we did, OK? The second point is that the magic's gone away. There should be no more magic in this whole system, right?1:05:04
In principle, it should all be very clear except, maybe, for how list structured memory works, and we'll see that later. But that's not very hard.1:05:15
The third point is that all this tail recursion came from the discipline of eval being very careful to save only what it needs next time.1:05:25
It's not some arbitrary thing where we're saying well, whenever we call a sub-routine, we'll save all the registers in the world and come back, right? See, sometimes it pays to really worry about efficiency.1:05:37
And, when you're down in the guts of your evaluator machine, it really pays to think about things like that because it makes big consequences. Well, I hope what this has done is really made the1:05:49
evaluator seem concrete, right? I hope you really believe that somebody could hold a LISP evaluator in the palm of their hand.1:05:59
Maybe to help you believe that, here's a LISP evaluator that I'm holding in the palm of my hand, right? And this is a chip which is actually quite a bit more1:06:11
complicated than the evaluator I showed you. Maybe, here's a better picture of it.1:06:22
You can see the same overall structure here. This is a register array. These are the data paths. Here's a finite state controller. And again, finite state, that's all there is.1:06:32
And somewhere there's external memory that'll worry about things. And this particular one is very complicated because it's trying to run LISP fast. And it has some very, very fast1:06:41
parallel operations in there like, if you want to index into an array, simultaneously check that the index is an1:06:50
integer, check that it doesn't exceed the array bounds, and go off and do the memory access, and do all those things simultaneously. And then, later, if they're all OK, actually get the value there.1:07:00
So there are a lot of complicated operations in these data paths for making LISP run in parallel. It's a completely non-RISC philosophy of evaluating LISP.1:07:10
And then, this microcode is pretty complicated. Let's see, there's what? There's about 389 instructions of 220-bit microcode sitting1:07:23
here because these are very complicated data paths. And the whole thing has about 89,000 transistors, OK?1:07:33
OK. Well, I hope that that takes away a lot of the mystery. Maybe somebody wants to look at this.1:07:42
Yeah. OK. Let's stop.1:07:55
Questions? AUDIENCE: OK, now, it sounds like what you're saying is that, with the restore continue put in the proper place, that procedures that would invoke a recursive1:08:08
process now invoke an iterative process just by the way that the evaluator is written? PROFESSOR: I think the way I'd prefer to put it is that, with1:08:17
restore continue put in the wrong place, you can cause any syntactically-looking recursive procedure, in fact, to build up stack as it runs.1:08:28
But there's no reason for that, so you might want to play around with it. You can just switch around two or three instructions in the1:08:38
way compound-apply comes back, and you'll get something which isn't tail recursive. But the thing I wanted to emphasize is there's no magic.1:08:47
It's not as if there's some very clever pre-processing program that's looking at this procedure, factorial iter, and say oh, gee, I really notice that I don't have to push1:08:59
stack in order to do this. Some people think that that's what's going on. It's something much, much more dumb than that, it's this one place you're putting the restore instruction.1:09:08
It's just automatic. AUDIENCE: OK. AUDIENCE: But that's not affecting the time complexity is it?1:09:17
PROFESSOR: No. AUDIENCE: It's just that it's handling it recursively instead of iteratively. But, in terms of the order of time it takes to finish the1:09:26
operation, it's the same one way or the other, right? PROFESSOR: Yes. Tail recursion is not going to change the time complexity of anything because, in some sense, it's the same algorithm that's going on.1:09:36
What it's doing is really making this thing run as an iteration, right? Not going to run out of memory counting up to a giant number simply because the stack would get pushed.1:09:47
See, the thing you really have to believe is that, when we write-- see, we've been writing all these things called iterations, infinite loops, define loop to be called loop.1:10:01
That is as much an iteration as if we wrote do forever loop, right? The difference is just syntactic sugar. These things are real, honest to god, iterations, right?1:10:14
They don't change the time complexity, but they turn them into real iterations. All right, thank you.0:00:00
Lecture 10A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:20
PROFESSOR: Last time, we took a look at an explicit control evaluator for Lisp, and that bridged the gap between all these high-level languages like Lisp and the query0:00:30
language and all of that stuff, bridged the gap between that and a conventional register machine. And in fact, you can think of the explicit control evaluator0:00:40
either as, say, the code for a Lisp interpreter if you wanted to implement it in the assembly language of some conventional register transfer machine, or, if you like, you0:00:50
can think of it as the microcode of some machine that's going to be specially designed to run Lisp. In either case, what we're doing is we're taking a machine that speaks some low-level language, and we're0:01:01
raising the machine to a high-level language like Lisp by writing an interpreter. So for instance, here, conceptually, is a special0:01:21
purpose machine for computing factorials. It takes in five and puts out 120. And what this special purpose machine is is actually a Lisp0:01:32
interpreter that's configured itself to run factorials, because you fed into it a description of the factorial machine.0:01:42
So that's what an interpreter is. It configures itself to emulate a machine whose description you read in. Now, inside the Lisp interpreter, what's that?0:01:52
Well, that might be your general register language interpreter that configures itself to behave like a Lisp interpreter, because you put in a whole bunch of0:02:01
instructions in register language. This is the explicit control evaluator. And then it also has some sort of library, a library of primitive operators and Lisp operations and all sorts of0:02:11
things like that. That's the general strategy of interpretation. And the point is, what we're doing is we're writing an interpreter to raise the machine to the level of the0:02:24
programs that we want to write. Well, there's another strategy, a different one, which is compilation. Compilation's a little bit different. Here we might have produced a special purpose0:02:37
machine for computing factorials, starting with some sort of machine that speaks register language, except0:02:46
we're going to do a different strategy. We take our factorial program. We use that as the source code into a compiler. What the compiler will do is translate that factorial0:02:57
program into some register machine language. And this will now be not the explicit control evaluator for Lisp, this will be some register language for computing factorials.0:03:06
So this is the translation of that. That will go into some sort of loader which will combine this code with code selected from the library to do things like0:03:17
primitive multiplication. And then we'll produce a load module which configures the register language machine to be a special purpose factorial machine.0:03:28
So that's a different strategy. In interpretation, we're raising the machine to the level of our language, like Lisp. In compilation, we're taking our program and lowering it to0:03:38
the language that's spoken by the machine. Well, how do these two strategies compare? The compiler can produce code that will execute more0:03:48
efficiently. The essential reason for that is that if you think about the register operations that are running, the interpreter has0:04:02
to produce register operations which, in principle, are going to be general enough to execute any Lisp procedure. Whereas the compiler only has to worry about producing a0:04:12
special bunch of register operations for doing the particular Lisp procedure that you've compiled. Or another way to say that is that the interpreter is a0:04:23
general purpose simulator: when you read in a Lisp procedure, it can simulate the program described by that procedure. So the interpreter is worrying about making a general purpose0:04:33
simulator, whereas the compiler, in effect, is configuring the thing to be the machine that the interpreter would have been simulating. So the compiler can be faster.0:04:52
On the other hand, the interpreter is a nicer environment for debugging. And the reason for that is that we've got the source code0:05:02
actually there. We're interpreting it. That's what we're working with. And we also have the library around. See, the library sitting there is part of the interpreter.0:05:11
The compiler only pulls out from the library what it needs to run the program. So if you're in the middle of debugging, and you might like to write a little extra program to examine some run0:05:21
time data structure or to produce some computation that you didn't think of when you wrote the program, the interpreter can do that perfectly well, whereas the compiler can't. So there are sort of dual advantages.0:05:31
The compiler will produce code that executes faster. The interpreter is a better environment for debugging. And most Lisp systems end up having both, end up being0:05:43
configured so you have an interpreter that you use when you're developing your code. Then you can speed it up by compiling. And very often, you can arrange that compiled code and interpreted code can call each other.0:05:54
We'll see how to do that. That's not hard. In fact, the way we'll--0:06:04
in the compiler we're going to make, the way we'll arrange for compiled code and interpreted code to call each other is that we'll have the compiler use exactly the same register conventions as the interpreter.0:06:18
Well, the idea of a compiler is very much like the idea of an interpreter or evaluator. It's the same thing.0:06:27
See, the evaluator walks over the code and performs some register operations. That's what we did yesterday.0:06:37
Well, the compiler essentially would like to walk over the code and produce the register operations that the evaluator would have done were it evaluating the thing.0:06:48
And that gives us a model for how to implement a zeroth-order compiler, a very bad compiler but0:06:57
essentially a compiler. A model for doing that is you just take the evaluator, you run it over the code, but instead of executing the actual operations, you just save them away.0:07:07
And that's your compiled code. So let me give you an example of that. Suppose we're going to compile--suppose we want to compile the expression f of x.0:07:25
So let's assume that we've got f of x in the x register and something in the environment register. And now imagine starting up the evaluator.0:07:34
Well, it looks at the expression and it sees that it's an application. And it branches to a place in the evaluator code we saw0:07:43
called ev-application. And then it begins. It stores away the operands in unev, and then it's going to put the operator in exp, and it's going to go0:07:53
recursively evaluate it. That's the process that we walk through. And if you start looking at the code, you start seeing some register operations. You see assign to unev the operands, assign to exp the0:08:03
operator, save the environment, generate that, and so on. Well, if we look on the overhead here, we can see0:08:16
those operations starting to be produced. Here's sort of the first real operation that the evaluator would have done. It pulls the operands out of the exp register and assigns0:08:27
it to unev. And then it assigns something to the expression register, and it saves continue, and it saves env. And all I'm doing here is writing down the register0:08:38
assignments that the evaluator would have done in executing that code. And we can zoom out a little bit. Altogether, there are about 19 operations there.0:08:49
And this will be the piece of code up until the point where the evaluator branches off to apply-dispatch. And in fact, in this compiler, we're not going to worry about0:09:00
apply-dispatch at all. We're going to have both interpreted code and compiled code always apply procedures by going to apply-dispatch.0:09:10
That will easily allow interpreted code and compiled code to call each other. Well, in principle, that's all we need to do.0:09:21
You just run the evaluator. So the compiler's a lot like the evaluator. You run it, except it stashes away these operations instead of actually executing them. Well, that's not quite true.0:09:32
There's only one little lie in that. What you have to worry about is if you have a predicate. If you have some kind of test you want to do, obviously, at0:09:44
the point when you're compiling it, you don't know which branch of these--of a conditional like this you're going to do. So you can't say which one the evaluator would have done.0:09:55
So all you do there is very simple. You compile both branches. So you compile a structure that looks like this. That'll compile into something that says, the code0:10:08
for P. And it puts its results in, say, the val register.0:10:18
So you walk the interpreter over the predicate and make sure that the result would go into the val register. And then you compile an instruction that says, branch0:10:30
if val is true, to a place we'll call label one.0:10:44
Then we will put the code for B-- walk the interpreter over B. And then0:10:54
put in an instruction that says, go to the next thing, whatever was supposed to happen after this0:11:03
thing was done. You put in that instruction. And here you put label one. And here you put the code for A. And you0:11:19
put go to next thing.0:11:31
So that's how you treat a conditional. You generate a little block like that. And other than that, this zeroth-order compiler is the0:11:40
same as the evaluator. It's just stashing away the instructions instead of executing them. That seems pretty simple, but we've gained something by that.0:11:50
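A zeroth-order compiler along those lines can be sketched directly: walk the expression, record the instructions instead of executing them, and for a conditional emit both branches behind fresh labels. This is a toy Python sketch with made-up instruction syntax, not the lecture's actual register language:

```python
label_count = 0

def fresh_label():
    """Generate label-1, label-2, ... for branch targets."""
    global label_count
    label_count += 1
    return f"label-{label_count}"

def compile_expr(expr):
    """Record the register operations; the result lands in val."""
    if isinstance(expr, int):                       # a constant
        return [f"(assign val (const {expr}))"]
    if isinstance(expr, str):                       # a variable
        return [f"(assign val (lookup-variable-value '{expr} env))"]
    if expr[0] == "if":                             # (if p A B)
        _, p, consequent, alternative = expr
        true_branch, done = fresh_label(), fresh_label()
        return (compile_expr(p)                     # code for P -> val
                + [f"(branch (test val) {true_branch})"]
                + compile_expr(alternative)         # code for B
                + [f"(goto {done})", f"{true_branch}:"]
                + compile_expr(consequent)          # code for A
                + [f"{done}:"])
    raise ValueError(f"unknown expression: {expr!r}")
```

Compiling ("if", "p", 1, 2) yields exactly the block drawn on the board: code for P, a branch on val, code for B followed by a goto, then label one, code for A, and the exit label.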
See, already that's going to be more efficient than the evaluator. Because, if you watch the evaluator run, it's not only generating the register operations we wrote down, it's0:12:01
also doing things to decide which ones to generate. So the very first thing it does, say, here for instance, is go do some tests and decide that this is an application,0:12:13
and then branch off to the place that, that handles applications. In other words, what the evaluator's doing is simultaneously analyzing the code to see what to do, and0:12:23
running these operations. And when you-- if you run the evaluator a million times, that analysis phase happens a million times, whereas in the compiler, it's happened once, and then you just have the register0:12:33
operations themselves. Ok, that's a, a zeroth-order compiler, but it is a0:12:42
wretched, wretched compiler. It's really dumb. Let's go back and look at this overhead.0:12:52
So look at some of the operations this thing is doing. We're supposedly looking at the operations and0:13:01
interpreting f of x. Now, look here what it's doing. For example, here it assigns to exp the0:13:10
operator in fetch of exp. But see, there's no reason to do that, because this is-- the compiler knows that the operator, fetch of exp, is f0:13:21
right here. So there's no reason why this instruction should say that. It should say, we'll assign to exp, f. Or in fact, you don't need exp at all.0:13:32
There's no reason it should have exp at all. What, what did exp get used for? Well, if we come down here, we're going to assign to val,0:13:43
look up the stuff in exp in the environment. So what we really should do is get rid of the exp register altogether, and just change this instruction to say,0:13:53
assign to val, look up the variable value of the symbol f in the environment. Similarly, back up here, we don't need unev at all,0:14:04
because we know what the operands of fetch of exp are for this piece of code. It's the list x.0:14:13
So in some sense, you don't want unev and exp at all. See, what they really are in some sense, those aren't0:14:22
registers of the actual machine that's supposed to run. Those are registers that have to do with arranging the thing that can simulate that machine. So they're always going to hold expressions which, from0:14:34
the compiler's point of view, are just constants, so can be put right into the code. So you can forget about all the operations worrying about exp and unev and just use those constants.0:14:44
Similarly, again, if we go back and look here, there are things like assign to continue eval-args.0:14:53
Now, that has nothing to do with anything. That was just the evaluator keeping track of where it should go next, to evaluate the arguments in some0:15:05
application. But of course, that's irrelevant to the compiler, because the analysis phase will have already done that.0:15:15
So this is completely irrelevant. So a lot of these assignments to continue have nothing to do with where the running machine is supposed to0:15:24
continue in keeping track of its state. They have to do with where the evaluator analysis should continue, and those are completely irrelevant. So we can get rid of them.0:15:44
OK, well, if we simply do that, make those kinds of optimizations, get rid of worrying about exp and unev, and get rid of these irrelevant register0:15:55
assignments to continue, then we can take this literal code, these sort of 19 instructions that the evaluator0:16:05
would have done, and then replace them. Let's look at the slide. We get rid of about half of them.0:16:18
And again, this is just sort of filtering what the evaluator would have done by getting rid of the irrelevant stuff. And you see, for instance, here the--where the evaluator0:16:29
said, assign val, look up variable value, fetch of exp, here we have put in the constant f. Here we've put in the constant x.0:16:39
So there's a little better compiler. It's still pretty dumb. It's still doing a lot of dumb things.0:16:50
Again, if we go look at the slide again, look at the very beginning here, we see a save the environment, assign0:17:00
something to the val register, and restore the environment. Where'd that come from? That came from the evaluator back here saying, oh, I'm in the middle of evaluating an application.0:17:11
So I'm going to recursively call eval dispatch. So I'd better save the thing I'm going to need later, which is the environment. This was the result of recursively0:17:21
calling eval dispatch. It was evaluating the symbol f in that case. Then it came back from eval dispatch, restored the environment.0:17:31
But in fact, the actual thing it ended up doing in the evaluation is not going to hurt the environment at all. So there's no reason to be saving the environment and0:17:40
restoring the environment here. Similarly, here I'm saving the argument list. That's a piece0:17:53
of the argument evaluation loop, saving the argument list, and here you restore it. But the actual thing that you ended up doing didn't trash the argument list. So there was no reason to save it.0:18:08
So another way to say that is that the evaluator has to be maximally pessimistic, because,0:18:19
from its point of view, it's just going off to evaluate something. So it better save what it's going to need later. But once you've done the analysis, the compiler is in a0:18:28
position to say, well, what actually did I need to save? It doesn't need to be as careful as the evaluator, because it knows what it0:18:38
actually needs. Well, in any case, if we do that and eliminate all those redundant saves and restores, then we can0:18:48
get it down to this. And you see there are actually only three instructions that we actually need, down from the initial 11 or so, or the initial 20 or so in the original one.0:19:00
And that's just saying, of those register operations, which ones did we actually need?0:19:09
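To make that concrete, here is a tiny sketch, in Python rather than the lecture's register-machine notation, of the three operations the compiled code for f of x boils down to. The tuple instruction format is invented for the example.

```python
# A toy illustration of compiling the call (f x) down to only the
# register operations that are actually needed: the expressions f and x
# are folded in as constants, so exp and unev never appear.
def compile_call(operator, operand):
    """Compile a one-argument call whose operator and operand are both
    variables, producing three made-up instruction tuples."""
    return [
        ("assign", "fun", ("lookup-variable-value", operator, "env")),
        ("assign", "argl", ("list", ("lookup-variable-value", operand, "env"))),
        ("goto", "apply-dispatch"),
    ]

instructions = compile_call("f", "x")
# Only three instructions survive: look up f, build the argument list
# from x, and jump off to apply the procedure.
print(len(instructions))
```
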
Let me just sort of summarize that in another way, just to show you in a little better picture. Here's a picture of starting--0:19:18
This is looking at all the saves and restores. So here's the expression, f of x, and then this traces through, on the bottom here, the various places in the0:19:30
evaluator that were passed when the evaluation happened. And then here, here you see arrows.0:19:40
Arrow down means register saved. So the first thing that happened is the environment got saved. And over here, the environment got restored.0:19:52
So there are all the pairs of stack operations. Now, if you go ahead and say, well, let's remember that unev, for instance, is a completely0:20:02
useless register. And if we use the constant structure of the code, well, we don't need to save unev. We don't need0:20:11
unev at all. And then, depending on how we set up the discipline of calling things like apply, we may or may not0:20:22
need to save continue. That's the first step I did. And then we can look and see what's actually needed.0:20:32
See, we didn't really need to save env across evaluating f, because it wouldn't trash it. So if we take advantage of that, and see the evaluation0:20:46
of f here doesn't really need to worry about hurting env. And similarly, the evaluation of x here, when the0:20:57
evaluator did that it said, oh, I'd better preserve the function register around that, because I might need it later. And I better preserve the argument list.0:21:07
Whereas the compiler is now in a position to know, well, we didn't really need to do those saves and restores. So in fact, all of the stack operations done by the evaluator turned out to be unnecessary or overly0:21:18
pessimistic. And the compiler is in a position to know that.0:21:27
Well that's the basic idea. We take the evaluator, we eliminate the things that you don't need, that in some sense have nothing to do with the compiler at all, just the evaluator, and then you see0:21:38
which stack operations are unnecessary. That's the basic structure of the compiler that's described in the book. Let me just show you that that example is a0:21:48
little bit too simple. To see how you actually save a lot, let's look at a little bit more complicated expression.0:21:58
F of G of X and 1. And I'm not going to go through all the code. There's a fair pile of it.0:22:09
I think there are something like 16 pairs of register saves and restores as the evaluator walks through that. Here's a diagram of them.0:22:20
Let's see. You see what's going on. You start out by--the evaluator says, oh, I'm about to do an application. I'll preserve the environment. I'll restore it here.0:22:30
Then I'm about to do the first operand. Here it recursively goes to the evaluator. The evaluator says, oh, this is an application, I'll save0:22:41
the environment, do the operator of that combination, restore it here. This save--this restore matches that save. And so on.0:22:51
There's unev here, which turns out to be completely unnecessary, continue is getting bumped around here. The function register is getting saved across0:23:01
the operands. All sorts of things are going on. But if you say, well, which of those really were the business of the compiler as opposed to the evaluator, you get rid of0:23:12
a whole bunch. And then on top of that, if you say things like, the evaluation of F doesn't hurt the environment register, or0:23:24
simply looking up the symbol X, you don't have to protect the function register against that.0:23:34
So you come down to just a couple of pairs here. And still, you can do a little better. Look what's going on here with the environment register.0:23:44
The environment register comes along and says, oh, here's a combination.0:23:54
This compiler, by the way, doesn't know anything about G. So here it says, I'd better save the environment register, because evaluating G might be some0:24:05
arbitrary piece of code that would trash it, and I'm going to need it later, after this argument, for doing the second argument.0:24:15
So that's why this one didn't go away, because the compiler made no assumptions about what G would do. On the other hand, if you look at what the second argument0:24:26
is, that's just looking up one. That doesn't need this environment register. So there's no reason to save it. So in fact, you can get rid of that one, too.0:24:35
And from this whole pile of register operations, if you simply do a little bit of reasoning like that, you get down to, I think, just two pairs of saves and restores.0:24:45
And those, in fact, could go away further if you knew something about G.0:24:56
So again, the general idea is that the reason the compiler can be better is that the interpreter doesn't know what it's about to encounter. It has to be maximally pessimistic in saving things0:25:05
to protect itself. The compiler only has to deal with what actually had to be saved. And there are two reasons that something might0:25:15
not have to be saved. One is that what you're protecting it against, in fact, didn't trash the register, like it was just a variable look-up.0:25:24
And the other one is, that the thing that you were saving it for might turn out not to actually need it. So those are the two basic pieces of knowledge that the0:25:34
compiler can take advantage of in making the code more efficient.0:25:44
Let's break for questions. AUDIENCE: You kept saying that the uneval register, unev0:25:54
register didn't need to be used at all. Does that mean that you could just map to a six-register machine? Or is it that, in this particular example, it didn't need to be used? PROFESSOR: For the compiler, you could generate code for0:26:05
the six-register machine-- five, right? Because the exp goes away also. Yeah, you can get rid of both exp and unev,0:26:14
because, see, those are data structures of the evaluator. Those are all things that would be constants from the point of view of the compiler. The only thing is this particular compiler is set up0:26:24
so that interpreted code and compiled code can coexist. So the way to think about it is, maybe you build a chip0:26:34
which is the evaluator, and what the compiler might do is generate code for that chip. It just wouldn't use two of the registers.0:26:51
All right, let's take a break. [MUSIC PLAYING]0:27:28
We just looked at what the compiler is supposed to do. Now let's very briefly look at how this gets accomplished.0:27:38
And I'm going to give no details. There's a giant pile of code in the book that gives all the details. But what I want to do is just show you the essential idea here.0:27:49
Worry about the details some other time. Let's imagine that we're compiling an expression that looks like there's some operator, and there are two arguments.0:28:03
Now, what's the code that the compiler should generate? Well, first of all, it should recursively go off and compile0:28:12
the operator. So it says, I'll compile the operator.0:28:21
And where I'm going to need that is to be in the function register, eventually. So I'll compile some instructions that will compile0:28:30
the operator and end up with the result in the function register.0:28:45
The next thing it's going to do, another piece is to say, well, I have to compile the first argument.0:28:55
So it calls itself recursively. And let's say the result will go into val.0:29:09
And then what it's going to need to do is start setting up the argument list. So it'll say, assign to argl cons of0:29:25
fetch-- so it generates this literal instruction-- fetch of val onto empty list.0:29:35
However, when it gets here, it's going to need the environment. It's going to need whatever environment was here in order0:29:45
to do this evaluation of the first argument. So it has to protect the environment register0:29:54
against whatever might happen in the compilation of this operator. So it puts a note here and says, oh, this piece should be0:30:04
done preserving the environment register.0:30:17
Similarly, here, after it gets done compiling the first operand, it's going to say, I better compile-- I'm going to need to know the environment0:30:26
for the second operand. So it puts a little note here, saying, yeah, this is also done preserving env. Now it goes on and says, well, the0:30:41
next chunk of code is the one that's going to compile the second argument.0:30:50
And let's say it'll compile it targeted to val, as they say.0:31:03
And then it'll generate the literal instruction, building up the argument list. So it'll say, assign to argl cons of0:31:20
the new value it just got onto the old argument list.0:31:34
However, in order to have the old argument list, it better have arranged that the argument list didn't get trashed by whatever happened in here.0:31:43
So it puts a little note here and says, oh, this has to be done preserving argl.0:31:54
Now it's got the argument list set up. And it's all ready to go to apply dispatch.0:32:06
It generates this literal instruction. Because now it's got the arguments in argl and the0:32:19
operator in fun, but wait, it's only got the operator in fun if it had ensured that this block of code didn't trash what was in the function register.0:32:29
So it puts a little note here and says, oh, yes, all this stuff here had better be done preserving0:32:39
the function register. So basically, what the compiler does is append a whole bunch0:32:51
of code sequences. See, what it's got in it is little primitive pieces of things, like how to look up a symbol, how to do a0:33:01
conditional. Those are all little pieces of things. And then it appends them together in this sort of discipline. So the basic means of combining things is to append0:33:11
two code sequences.0:33:21
That's what's going on here. And it's a little bit tricky. The idea is that it appends two code sequences, taking0:33:32
care to preserve a register. So the actual append operation looks like this. What it wants to do is say, if--0:33:41
here's what it means to append two code sequences. So if sequence one needs register--0:33:53
I should change this. Append sequence one to sequence two, preserving some register.0:34:08
Let me say, and. So it's clear that sequence one comes first. So if sequence two needs the register and sequence one0:34:26
modifies the register, then the instructions that the0:34:35
compiler spits out are, save the register. Here's the code.0:34:44
You generate this code. Save the register, and then you put out the recursively compiled stuff for sequence one.0:34:53
And then you restore the register. And then you put out the recursively compiled stuff for0:35:04
sequence two. That's in the case where you need to do it. Sequence two actually needs the register, and sequence one actually clobbers it.0:35:15
So that's sort of if. Otherwise, all you spit out is sequence one followed by0:35:25
sequence two. So that's the basic operation for sticking together these bits of code fragments, these bits of0:35:34
instructions into a sequence. And you see, from this point of view, the difference between the interpreter and the compiler, in some sense,0:35:46
is that where the compiler has these preserving notes, and says, maybe I'll actually generate the saves and restores and maybe I won't, the interpreter being0:35:56
maximally pessimistic always has a save and restore here. That's the essential difference. Well, in order to do this, of course, the compiler needs0:36:07
some theory of what registers code sequences need and modify. So the tiny little fragments that you put in, like the0:36:17
basic primitive code fragments, say, what are the operations that you do when you look up a variable?0:36:27
What are the sequence of things that you do when you compile a constant or apply a function? Those have little notations in there about what they need and what they modify.0:36:38
So the bottom-level data structures-- Well, I'll say this. A code sequence to the compiler looks like this.0:36:48
It has the actual sequence of instructions. And then, along with it, there's the set0:37:00
of registers modified.0:37:10
And then there's the set of registers needed.0:37:19
So that's the information the compiler has that it draws on in order to be able to do this operation.0:37:29
And where do those come from? Well, those come from, you might expect, for the very primitive ones, we're going to put them in by hand. And then, when we combine two sequences, we'll figure out0:37:39
what these things should be. So for example, a very primitive one, let's see.0:37:48
How about doing a register assignment. So a primitive sequence might say, oh, here's a code fragment. Its code instruction is assign to R1, fetch of R2.0:38:03
So this is an example. That might be an example of a sequence of instructions. And along with that, it'll say, oh, what I need to0:38:13
remember is that that modifies R1, and then it needs R2.0:38:24
So when you're first building this compiler, you put in little fragments of stuff like that. And now, when it combines two sequences, if I'm going to0:38:37
combine, let's say, sequence one, that modifies a bunch of registers M1, and needs a bunch of registers N1.0:38:54
And I'm going to combine that with sequence two. That modifies a bunch of registers M2, and needs a0:39:07
bunch of registers N2. Then, well, we can reason it out. The new code fragment, sequence one0:39:20
followed by sequence two, well, what's it going to modify? The things that it will modify are the things that are0:39:29
modified either by sequence one or sequence two. So the union of these two sets are what0:39:38
the new thing modifies. And then you say, well, what is this--what registers is it going to need?0:39:47
It's going to need the things that are, first of all, needed by sequence one. So what it needs is what sequence one needs. And then, well, not quite all of the ones that are needed by0:39:58
sequence two. What it needs are the ones that are needed by sequence two that have not been set up by sequence one.0:40:08
So it's sort of the union of the things that sequence two needs minus the ones that sequence one modifies.0:40:19
Because it worries about setting them up. So there's the basic structure of the compiler. The way you do register optimizations is you have some0:40:30
strategies for what needs to be preserved. That depends on a data structure. Well, it depends on the operation of what it means to put things together.0:40:39
Preserving something, that depends on knowing what registers are needed and modified by these code fragments.0:40:48
That depends on having little data structures, which say, a code sequence is the actual instructions, what they modify and what they need.0:40:57
That comes from, at the primitive level, building it in. At the primitive level, it's going to be completely obvious what something needs and modifies. Plus, this particular way that says, when I build up bigger0:41:08
ones, here's how I generate the new set of registers modified and the new set of registers needed. And that's the whole-- well, I shouldn't say that's the whole thing.0:41:17
That's the whole thing except for about 30 pages of details in the book. But it is a perfectly usable rudimentary compiler.0:41:28
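The pieces just summarized, appending with preserving plus the modifies-and-needs bookkeeping, might be sketched like this. This is Python, not the book's Scheme, and the triple representation of a code sequence is made up for the sketch.

```python
# A sequence is an invented triple: (instructions, registers modified,
# registers needed).  Appending wraps sequence one in a save/restore of
# reg only when sequence two needs reg and sequence one modifies it.
def append_preserving(reg, seq1, seq2):
    instrs1, mods1, needs1 = seq1
    instrs2, mods2, needs2 = seq2
    if reg in needs2 and reg in mods1:
        # Sequence two needs the register and sequence one clobbers it.
        instrs1 = [("save", reg)] + instrs1 + [("restore", reg)]
        mods1 = mods1 - {reg}    # the restore undoes the clobbering
        needs1 = needs1 | {reg}  # the save reads the register
    # Combined sets: modifies = m1 U m2, needs = n1 U (n2 - m1).
    return (instrs1 + instrs2,
            mods1 | mods2,
            needs1 | (needs2 - mods1))

# Code that trashes env, followed by code that needs env:
# the save/restore pair appears around the first sequence.
trash_env = ([("assign", "env", "<something>")], {"env"}, set())
needs_env = ([("assign", "val", ("lookup", "x", "env"))], {"val"}, {"env"})
instrs, mods, needs = append_preserving("env", trash_env, needs_env)
```

If instead you preserve a register that the second sequence never needs, the two instruction lists are simply concatenated with no stack traffic, which is exactly the saving over the always-pessimistic evaluator.
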
Let me kind of show you what it does. Suppose we start out with recursive factorial. And these slides are going to be much too small to read.0:41:38
I just want to flash through the code and show you about how much it is. That starts out with--here's a first block of it, where it compiles a procedure entry and does a bunch of assignments.0:41:48
And this thing is basically up through the part where it sets up to do the predicate and test whether the predicate's true. The second part is what results from0:41:59
the recursive call to fact of n minus one. And this last part is coming back from that and then taking0:42:08
care of the constant case. So that's about how much code it would produce for factorial. We could make this compiler much, much better, of course.0:42:18
The main way we could make it better is to allow the compiler to make any assumptions at all about what happens when you call a procedure. So this compiler, for instance, doesn't even know,0:42:30
say, that multiplication is something that could be coded inline. Instead, it sets up this whole mechanism. It goes to apply-dispatch.0:42:41
That's a tremendous waste, because what you do every time you go to apply-dispatch is you have to cons up this argument list, because it's a very general thing you're going to. In any real compiler, of course, you're going to have0:42:51
registers for holding arguments. And you're going to do the preserving and saving of those registers with a strategy similar to the0:43:00
one used here. So that's probably the very main way that this particular compiler in the book could be fixed. There are other things like looking up variable values and0:43:12
making more efficient primitive operations and all sorts of things. Essentially, a good Lisp compiler can absorb an arbitrary amount of effort. And probably one of the reasons that Lisp is slow0:43:23
compared to languages like FORTRAN is that, if you look over history at the amount of effort that's gone into building Lisp compilers, it's nowhere near the amount of0:43:32
effort that's gone into FORTRAN compilers. And maybe that's something that will change over the next couple of years. OK, let's break.0:43:43
Questions? AUDIENCE: One of the very first classes-- I don't know if it was during class or after class- you0:43:52
showed me that, say, addition has a primitive that we don't see, and-percent add or something like that. Is that because, if you're doing inline code, you'd want0:44:03
to just do it for two operators, operands? But if you had more operands, you'd want to do something special?0:44:12
PROFESSOR: Yeah, you're looking in the actual scheme implementation. There's a plus, and a plus is some operator. And then if you go look inside the code for plus, you see something called--0:44:21
I forget-- and-percent plus or something like that. And what's going on there is that particular kind of optimization. Because, see, general plus takes an0:44:30
arbitrary number of arguments. So the most general plus says, oh, if I have an argument list, I'd better cons it up in some list and then figure out0:44:42
how many there were or something like that. That's terribly inefficient, especially since most of the time you're probably adding two numbers. You don't want to really have to cons this argument list. So0:44:52
what you'd like to do is build the code for plus with a bunch of entries. So most of what it's doing is the same. However, there might be a special entry that you'd go to0:45:02
if you knew there were only two arguments. And those you'll put in registers. They won't be in an argument list and you won't have to [UNINTELLIGIBLE]. That's how a lot of these things work.0:45:12
OK, let's take a break. [MUSIC PLAYING]0:00:00
Lecture 10B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:18
PROFESSOR: Well, there's one bit of mystery left, which I'd like to get rid of right now. And that's that we've been blithely doing things like0:00:28
cons assuming there's always another one. That we've been doing these things like car-ing and0:00:37
cdr-ing and assuming that we had some idea how this can be done. Now indeed we said that that's equivalent to having procedures. But that doesn't really solve the problem, because the0:00:48
procedures need all sorts of complicated mechanisms like environment structures and things like that to work. And those were ultimately made out of conses in the model that we had, so that really doesn't solve the problem.0:00:59
Now the problem here is the glue the data structures are made out of. What kind of possible thing could it be? We've been showing you things like a machine, a computer0:01:11
that has a controller, and some registers, and maybe a stack. And we haven't said anything about, for example, larger memory.0:01:20
And I think that's what we have to worry about right now. But just to make it perfectly clear that this is an inessential, purely implementational thing, I'd0:01:31
like to show you, for example, how you can do it all with the numbers. That's an easy one. Famous fellow by the name of Godel, a logician at the end0:01:45
of the 1930s, invented a very clever way of encoding the complicated expressions as numbers.0:01:54
For example-- I'm not saying exactly what Godel's scheme is, because he didn't use words like cons. He had other kinds of ways of combining to make expressions.0:02:03
But he said, I'm going to assign a number to every algebraic expression. And the way I'm going to manufacture these numbers is by combining the numbers of the parts.0:02:12
So for example, for what we were doing in our world, we could say that if objects are represented by numbers, then0:02:34
cons of x and y could be represented by 2 to the x times 3 to the y.0:02:46
Because then we could extract the parts. We could say, for example, that then car of, say, x is0:02:57
the number of factors of 2 in x.0:03:06
And of course cdr is the same thing. It's the number of factors of 3 in x.0:03:16
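As a quick sketch of the idea (in Python, and not anything Gödel actually wrote; he numbered logical formulas, not conses):

```python
# Gödel-style pairing: represent cons of x and y by 2^x * 3^y.
# Car and cdr then count factors of 2 and 3 respectively.
def cons(x, y):
    return 2 ** x * 3 ** y

def count_factors(n, p):
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

def car(z):
    return count_factors(z, 2)   # number of factors of 2

def cdr(z):
    return count_factors(z, 3)   # number of factors of 3

print(car(cons(4, 7)))  # 4
print(cdr(cons(4, 7)))  # 7
```

And since a cons is itself a number, pairs nest: car(cons(cons(1, 2), 0)) gives back the number representing cons(1, 2). The catch is exactly what the lecture says next: the numbers explode in size.
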
Now this is a perfectly reasonable scheme, except for the fact that the numbers rapidly get to be much larger in number of digits than the number of0:03:25
protons in the universe. So there's no easy way to use this scheme other than the theoretical one. On the other hand, there are other ways of representing0:03:37
these things. We have been thinking in terms of little boxes. We've been thinking about our cons structures as looking0:03:47
sort of like this. They're little pigeon holes with things in them. And of course we arrange them in little trees.0:03:57
I wish that the semiconductor manufacturers would supply me with something appropriate for this, but actually what they do supply me with is a linear memory.0:04:09
Memory is sort of a big pile of pigeonholes, pigeonholes like this. Each of which can hold a certain sized object, a fixed0:04:21
size object. So, for example, a complicated list with 25 elements won't fit in one of these. However, each of these is indexed by an address.0:04:33
So the address might be zero here, one here, two here, three here, and so on. That we write these down as numbers is unimportant. What matters is that they're distinct as a way to get to0:04:42
the next one. And inside of each of these, we can stuff something into these pigeonholes. That's what memory is like, for those of you who haven't0:04:52
built a computer. Now the problem is how are we going to impose on this type of structure, this nice tree structure.0:05:03
Well it's not very hard, and there have been numerous schemes involved in this. The most important one is to say, well assuming that the semiconductor manufacturer allows me to arrange my memory0:05:13
so that one of these pigeonholes is big enough to hold the address of another one. Now it actually has to be a little bit bigger because I0:05:23
have to also install or store some information as to a tag which describes the kind of thing that's there. And we'll see that in a second.0:05:32
And of course if the semiconductor manufacturer doesn't arrange it so I can do that, then of course I can, with some cleverness, arrange combinations of these to fit together in that way.0:05:43
So we're going to have to imagine imposing this complicated tree structure on our nice linear memory. If we look at the first still store, we see a classic scheme0:05:57
for doing that. It's a standard way of representing Lisp structures in a linear memory. What we do is we divide this memory into two parts.0:06:12
An array called the cars, and an array called the cdrs. Now whether those happen to be sequential addresses or whatever, it's not important.0:06:22
That's somebody's implementation details. But there are two arrays here. Linear arrays indexed by sequential indices like this.0:06:34
What is stored in each of these pigeonholes is a typed object. And what we have here are types which begin with letters0:06:44
like p, standing for a pair. Or n, standing for a number. Or e, standing for an empty list. The end of the list. And0:06:57
so if we wish to represent an object like this, the list whose first element is the list 1, 2 and which has 3 and 4 as its second and third elements.0:07:06
A list containing a list as its first part and then two numbers as a second and third parts. Then of course we draw it sort of like this these days, in0:07:15
box-and-pointer notation. And you see, these are the three cells that have as their car pointer the object which is either 1, 2 or 3 or 4.0:07:28
And then of course the 1, 2, the car of this entire structure, is itself a substructure which contains a sublist like that. What I'm about to do is put down places which are--0:07:39
I'm going to assign indices. Like this 1, over here, represents the index of this cell.0:07:49
But that pointer that we see here is a reference to the pair of pigeonholes in the cars and the cdrs that are labeled by 1 in my linear memory down here.0:08:02
So if I wish to impose this structure on my linear memory, what I do is I say, oh yes, why don't we drop this into cell 1?0:08:12
I pick one. There's 1. And that says that its car, I'm going to assign it to be a pair. It's a pair, which is in index 5.0:08:22
And the cdr, which is this one over here, is a pair which I'm going to stick into place 2. p2. And take a look at p2.0:08:32
Oh yes, well p2 is a thing whose car is the number 3, so as you see, an n3. And whose cdr, over here, is a pair, which lives in place 4.0:08:46
So that's what this p4 is. p4 is a number whose value is 4 in its car and whose cdr is0:08:56
an empty list right there. And that ends it. So this is the traditional way of representing this kind of0:09:05
binary tree in a linear memory. Now the next question, of course, that we might want to0:09:15
worry about is just a little bit of implementation. That means that when I write procedures of the form assign a, [UNINTELLIGIBLE] procedures--0:09:24
lines of register machine code of the form assign a, the car of [UNINTELLIGIBLE] b, what I really mean is addressing these elements.0:09:38
And so we're going to think of that as an abbreviation for it. Now of course in order to write that down I'm going to introduce some sort of a structure called a vector.0:09:52
And we're going to have something which will reference a vector, just so we can write it down. Which takes the name of the vector, or the--0:10:02
I don't think that name is the right word. Which takes the vector and the index, and I have to have a0:10:12
way of setting one of those with something called a vector set, I don't really care. But let's look, for example, at then that kind of implementation of car and cdr.0:10:26
So for example if I happen to have a register b, which contains the type index of a pair, and therefore it is the0:10:37
pointer to a pair, then I could take the car of that and if I-- write this down-- I might put that in register a. What that really is is a representation of the assign0:10:49
to a, the value of vector reffing-- or array indexing, if you will-- or something, the cars object--0:10:58
whatever that is-- with the index, b. And similarly for cdr. And we can do the same thing for assignment to data structures, if we need to do that sort of0:11:10
thing at all. It's not too hard to build that. Well now the next question is how are we going to do allocation. And every so often I say I want a cons.0:11:21
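The vector abbreviation for car and cdr can be sketched concretely. This is a toy Python model, not the lecture's register machine: the names the_cars and the_cdrs, the eight-cell memory size, and the use of 'nil' for the empty list are all assumptions made for illustration.

```python
# A toy model of list structure stored in two parallel vectors: a
# pair "pointer" is just an index p, its halves live in the_cars[p]
# and the_cdrs[p], and car/cdr are simply vector references.

the_cars = [None] * 8   # assumed memory size of 8 cells
the_cdrs = [None] * 8

def car(p):
    return the_cars[p]  # car is a vector reference at index p

def cdr(p):
    return the_cdrs[p]

# Build the list (1 2): the pair at index 1 holds the number 1 and
# points (by index) at the pair at index 2, which holds the number 2
# and the empty list, written here as 'nil'.
the_cars[2], the_cdrs[2] = 2, 'nil'
the_cars[1], the_cdrs[1] = 1, 2
```

With this layout, walking the list is just repeated indexing into the two vectors.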
Now conses don't grow on trees. Or maybe they should. But I have to have some way of getting the next one. I have to have some idea of what memory is unused that I0:11:33
might want to allocate from. And there are many schemes for doing this. And the particular thing I'm showing you right now is not essential.0:11:42
However it's convenient and has been done many times. One scheme is called the free-list allocation scheme. What that means is that all of the free memory that there is in the world is linked together in a linked list,0:11:54
just like all the other stuff. And whenever you need a free cell to make a new cons, you grab the first one, make the free list be the cdr of it,0:12:04
and then allocate that. And so what that looks like is something like this. Here we have the free list starting in 6.0:12:18
And what that is is a pointer off to, say, 8. So what it says is, this one is free and the0:12:27
next one is an 8. This one is free and the next one is in 3, the next one that's free. That one's free and the next one is in 0.0:12:37
That one's free and the next one's in 15. Something like that. We can imagine having such a structure.0:12:46
Given that we have something like that, then it's possible to just get one when you need it. And so a program for doing cons, this is what0:12:57
cons might turn into. To assign to a register A the result of cons-ing B onto C-- the value contained in B and the value0:13:08
contained in C-- what we have to do is get the current head of the free list, make the free list be its cdr. Then we have to change the car of the0:13:19
thing we're making up, the thing in A, to be the thing in B. And we have to change the cdr of the thing that's in A0:13:30
to be C. And then what we have in A is the right new frob, whatever it is. The object that we want.0:13:40
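The free-list cons just described can be sketched as follows. This is a hypothetical Python model, not actual register-machine code, and it leaves out the type-bit bookkeeping, just as the blackboard version does.

```python
# A sketch of free-list allocation: free cells are chained together
# through the_cdrs, and cons takes the head of the chain.

SIZE = 8
the_cars = [None] * SIZE
the_cdrs = [None] * SIZE

# Initially every cell is free: 0 -> 1 -> ... -> 7 -> end.
free = 0
for i in range(SIZE - 1):
    the_cdrs[i] = i + 1
the_cdrs[SIZE - 1] = None

def cons(b, c):
    """Grab the head of the free list, make the free list be its
    cdr, then fill in the new cell's car and cdr."""
    global free
    if free is None:
        raise MemoryError("out of pairs -- time to garbage collect")
    a = free
    free = the_cdrs[a]
    the_cars[a] = b
    the_cdrs[a] = c
    return a

# Build (1 2): the inner cons lands in cell 0, the outer in cell 1,
# and the free list now begins at cell 2.
p = cons(1, cons(2, 'nil'))
```

Each cons shortens the free list by exactly one cell, which is why a finite memory eventually runs out.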
Now there's a little bit of a cheat here that I haven't told you about, which is somewhere around here I haven't set the type of the thing that I'm cons-ing up to be a0:13:51
pair, and I ought to. So there should be some sort of bits here being set, and I just haven't written that down. We could have arranged it, of course, for the free list to0:14:01
be made out of pairs. And so then there's no problem with that. But that's, again, an inessential detail in the way0:14:10
some particular programmer or architect or whatever might manufacture his machine or Lisp system. So for example, just looking at this, to allocate given0:14:23
that I had already the structure that you saw before, supposing I wanted to allocate a new cell, which is going to be representation of list one, one, two, where already one0:14:38
two was the car of the list we were playing with before. Well that's not so hard. I stored that in one, so p1 is the0:14:47
representation of this. This is p5. That's going to be the cdr of this. Now we're going to pull something off the free list, but remember the free list started at six.0:14:57
The new free list after this allocation is eight, a free list beginning at eight. And of course in six now we have a number one, which is0:15:06
what we wanted, with its cdr being the pair starting in location five. And that's no big deal.0:15:16
So the only problem really remaining here is, well, I don't have an infinitely large memory.0:15:25
If I do this for a little while, say, for example, supposing it takes me a microsecond to do a cons, and I have a million cons memory then I'm only going to run out0:15:34
in a second, and that's pretty bad. So what we do to prevent that disaster, that ecological disaster, we'll talk about right after questions.0:15:44
Are there any questions? Yes. AUDIENCE: In the environment diagrams that we were drawing0:15:54
we would use the body of procedures, and you would eventually wind up with things that were no longer useful in that structure.0:16:04
How is that represented? PROFESSOR: There's two problems here. One you were asking is that material becomes useless.0:16:13
We'll talk about that in a second. That has to do with how to prevent ecological disasters. If I make a lot of garbage I have to somehow be able to clean up after myself. And we'll talk about that in a second.0:16:23
The other question you're asking is how you represent the environments, I think. AUDIENCE: Yes. PROFESSOR: OK. And the environment structures can be represented in arbitrary ways. There are lots of them. I mean, here I'm just telling you about list cells.0:16:33
Of course every real system has vectors of arbitrary length as well as the vectors of length two, which represent list cells. And the environment structures that one uses in a0:16:45
professionally written Lisp system tend to be vectors which contain a number of elements approximately equal to the number of arguments-- a little bit more because you0:16:56
need certain glue. So remember, the environment is made out of frames. The frames are constructed by applying a procedure. In doing so, an allocation is made of a place which is the0:17:08
number of arguments long, plus some glue, that gets linked into a chain. It's just like Algol at that level.0:17:19
There any other questions? OK. Thank you, and let's take a short break. [MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:18:12
PROFESSOR: Well, as I just said, computer memories supplied by the semiconductor manufacturers are finite. And that's quite a pity.0:18:21
It might not always be that way. Just for a quick calculation, you can see that it's possible that if memory prices keep going at the rate they're going that if you0:18:32
still took a microsecond to do a cons, then-- first of all, everybody should know that there's about pi times ten to the seventh seconds in a year. And so that would be ten to the seventh times ten to the0:18:42
sixth is ten to the thirteenth. So there's maybe ten to the fourteenth conses in the life of a machine. If there was ten to the fourteenth words of memory on your machine, you'd never run out.0:18:54
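The back-of-the-envelope arithmetic here checks out directly:

```python
# One cons per microsecond is 10**6 conses per second, and there are
# about pi * 10**7 seconds in a year, so on the order of 10**13
# conses per year -- maybe 10**14 in the life of a machine.
conses_per_second = 10**6
seconds_per_year = 3.14159 * 10**7
conses_per_year = conses_per_second * seconds_per_year
```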
And that's not completely unreasonable. Ten to the fourteenth is not a very large number.0:19:03
I don't think it is. But then again I like to play with astronomy. It's at least ten to the eighteenth centimeters between us and the nearest star.0:19:12
But the thing I'm about to worry about is, at least in the current economic state of affairs, ten to the fourteenth0:19:22
pieces of memory is expensive. And so I suppose what we have to do is make do with much smaller memories. Now in general we want to have an illusion of infinity.0:19:35
All we need to do is arrange it so that whenever you look, the thing is there. That's really an important idea.0:19:49
A person or a computer lives only a finite amount of time and can only take a finite number of looks at something. And so you really only need a finite amount of stuff.0:19:58
But you have to arrange it so no matter how much there is, how much you really claim there is, there's always enough stuff so that when you take a look, it's there. And so you only need a finite amount.0:20:08
But let's see. One problem is, as was brought up, that there are possible ways that there is lots of stuff that we make that we0:20:18
don't need. And we could recycle the material out of which it's made. An example is the fact that we're building environment0:20:27
structures, and we do so every time we call a procedure. We build an environment frame. That environment frame doesn't necessarily have a very long lifetime.0:20:36
Its lifetime, meaning its usefulness, may exist only over the invocation of the procedure. Or if the procedure exports another procedure by returning0:20:45
it as a value and that procedure is defined inside of it, well then the lifetime of the frame of the outer procedure still is only the lifetime of the procedure0:20:57
which was exported. And so ultimately, a lot of that is garbage. There are other ways of producing garbage as well. Users produce garbage.0:21:07
An example of user garbage is something like this. If we write a program to, for example, append two lists together, well one way to do it is to reverse the first0:21:19
list onto the empty list and reverse that onto the second list. Now that's not a terribly bad way of doing it.0:21:28
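The double-reverse append being described can be sketched like so. This is a Python rendering with tuples standing in for cons cells; the helper names are mine, not the lecture's.

```python
# Append two lists by reversing the first onto the empty list, then
# reversing that onto the second list. The intermediate reversal
# becomes unreachable garbage the moment append returns.

NIL = None

def cons(a, d):
    return (a, d)

def reverse_onto(lst, tail):
    # Cons each element of lst onto tail, so lst comes out reversed.
    while lst is not NIL:
        tail = cons(lst[0], tail)
        lst = lst[1]
    return tail

def append(a, b):
    intermediate = reverse_onto(a, NIL)   # reversed copy of a: garbage-to-be
    return reverse_onto(intermediate, b)  # intermediate never accessible again

lst = append(cons(1, cons(2, NIL)), cons(3, NIL))
```

The cells of the intermediate reversed copy are exactly the "user garbage" the professor is talking about.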
However, the intermediate result, which is the reversal of the first list as done by this program, is never going0:21:37
to be accessed ever again after it's copied back on to the second. It's an intermediate result. It's going to be hard to ever see how anybody would ever be0:21:47
able to access it. In fact, it will go away. Now if we make a lot of garbage like that, and we should be allowed to, then there's got to be some way to0:21:56
reclaim that garbage. Well, what I'd like to tell you about now is a very clever technique whereby a Lisp system can prove a small0:22:09
theorem every so often of the form: the following piece of junk will never be accessed again. It can have no effect on the future of the computation.0:22:21
It's actually based on a very simple idea. We've designed our computers to look sort of like this. There's some data path, which contains the registers.0:22:35
There are things like x, and env, and val, and so on. And there's one here called stack, some sort which points0:22:47
off to a structure somewhere, which is the stack. And we'll worry about that in a second. There's some finite controller, finite state machine controller.0:22:56
And there's some control signals that go this way and predicate results that come this way, not the interesting part. There's some sort of structured memory, which I0:23:07
just told you how to make, which may contain a stack. I didn't tell you how to make things of arbitrary shape, only pairs. But in fact with what I've told you, you can simulate a stack0:23:16
by a big list. I don't plan to do that, it's not a nice way to do it. But we could have something like that. We have all sorts of little data structures in here that0:23:25
are hooked together in funny ways. They connect to other things. And so on. And ultimately things up there are pointers to these.0:23:37
The things that are in the registers are pointers off to the data structures that live in this Lisp structure memory. Now the truth of the matter is that the entire consciousness0:23:52
of this machine is in these registers. There is no possible way that the machine, if done correctly, if built correctly, can access anything in this0:24:02
Lisp structure memory unless the thing in that Lisp structure memory is connected by a sequence of data structures to the registers.0:24:15
If it's accessible by legitimate data structure selectors from the pointers that are stored in these registers. Things like array references, perhaps.0:24:24
Or cons cell references, cars and cdrs. But I can't just talk about a random place in this memory, because I can't get to it. There are no arbitrary names I'm allowed to0:24:34
conjure up, at least as I'm evaluating expressions. If that's the case then there's a very simple theorem0:24:44
to be proved. Which is, if I start with all the pointers that are in all these registers and recursively chase out, marking0:24:53
all the places I can get to by selectors, then eventually I mark everything that can be gotten to. Anything which is not so marked is0:25:02
garbage and can be recycled. Very simple. Cannot affect the future of the computation.0:25:11
So let me show you that in a particular example. Now that means I'm going to have to append to my description of the list structure a mark.0:25:23
And so here, for example, is a Lisp structured memory. And in this Lisp structured memory is a Lisp structure beginning in a place I'm going to call--0:25:35
this is the root. Now it doesn't really have to have a root. It could be a bunch of them, like all the registers. But I could cleverly arrange it so all the registers, all0:25:45
the things that are in old registers are also at the right moment put into this root structure, and then we've got one pointer to it. I don't really care.0:25:54
So the idea is we're going to cons up stuff until our free list is empty. We've run out of things. Now we're going to do this process of proving the theorem0:26:04
that a certain percentage of the memory has got crap in it. And then we're going to recycle that to grow new trees, a standard use of such garbage.0:26:17
So in any case, what do we have here? Well we have some data structure which starts out over here in one.0:26:27
And in fact it has a car in five, and its cdr is in two. And all the marks start out at zero.0:26:36
Well let's start marking, just to play this game. OK. So for example, since I can access one from the root I0:26:47
will mark that. Let me mark it. Bang. That's marked. Now since I have a five here I can go to five and see, well0:27:00
I'll mark that. Bang. That's useful stuff. But five has a number in its car-- I'm not interested in marking numbers-- but its cdr is seven. So I can mark that.0:27:10
Bang. Seven has the empty list as the only thing it references, and it's got a number in its car. Not interesting.0:27:19
Well now let's go back here. I forgot about something. Two. See in other words, if I'm looking at cell one, cell one contains a two right over here.0:27:30
A reference to two. That means I should go mark two. Bang. Two contains a reference to four. It's got a number in its car, I'm not interested in that, so0:27:41
I'm going to go mark that. Four refers to seven through its car, and is empty in its cdr, but I've already marked that one so I don't have to mark it again.0:27:51
This is all the accessible structure from that place. Simple recursive mark algorithm. Now there are some unhappinesses about that0:28:01
algorithm, and we can worry about that in a second. But basically you'll see that all the things that have not been marked are places that are free, and I could recycle.0:28:14
So the next stage after that is going to be to scan through all of my memory, looking for things that are not marked. Every time I come across a marked thing I unmark it, and0:28:23
every time I come across an unmarked thing I'm going to link it together in my free list. Classic, very simple algorithm.0:28:32
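The mark phase and sweep phase just described can be sketched together. This is a simplified Python model of the two-vector memory, with my own pointer-tagging convention ('p', index); it is not the lecture's register-machine code.

```python
# A mark-sweep sketch: a pointer is a tagged tuple ('p', index), and
# everything else (numbers, 'nil') is an immediate datum that never
# needs marking.

SIZE = 8
the_cars = [None] * SIZE
the_cdrs = [None] * SIZE
marks = [0] * SIZE

def mark(x):
    # Recursively chase out from x, marking every reachable cell.
    # Note the recursion itself consumes auxiliary stack space.
    if not (isinstance(x, tuple) and x[0] == 'p'):
        return                # immediates: nothing to mark
    i = x[1]
    if marks[i]:
        return                # already marked; stop (handles sharing)
    marks[i] = 1
    mark(the_cars[i])
    mark(the_cdrs[i])

def sweep():
    # Scan all of memory: unmark the live cells, chain the dead
    # ones together into a fresh free list.
    free = None
    for i in range(SIZE):
        if marks[i]:
            marks[i] = 0
        else:
            the_cdrs[i] = free
            free = i
    return free

# Roughly the lecture's picture: the root points at cell 1, whose car
# is cell 5 and cdr is cell 2; cells 5 and 4 both reference cell 7.
the_cars[1], the_cdrs[1] = ('p', 5), ('p', 2)
the_cars[5], the_cdrs[5] = 10, ('p', 7)
the_cars[7], the_cdrs[7] = 20, 'nil'
the_cars[2], the_cdrs[2] = ('p', 4), 'nil'
the_cars[4], the_cdrs[4] = ('p', 7), 'nil'

mark(('p', 1))
free_list = sweep()   # cells 0, 3, 6 were unreachable
```

After the sweep, the marks are all cleared again and the unreachable cells are back on the free list.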
So let's see. Is that very simple? Yes it is. I'm not going to go through the code in any detail, but I just want to show you about how long it is. Let's look at the mark phase.0:28:42
Here's the first part of the mark phase. We pick up the root. We're going to use that as a recursive procedure call.0:28:52
We're going to sweep from there, after when we're done with marking. And then we're going to do a little couple of instructions that do this checking out on the marks and changing the0:29:01
marks and things like that, according to the algorithm I've just shown you. It comes out here. You have to mark the cars of things and you also have to be able to mark the cdrs of things.0:29:10
That's the entire mark phase. I'll just tell you a little story about this. The old DEC PDP-6 computer, this was the way that the0:29:22
mark-sweep garbage collector was written. The program was so small that, with the data that it needed,0:29:31
with the registers that it needed to manipulate the memory, it fit into the fast registers of the machine, which were 16. The whole program. And you could execute0:29:40
instructions in the fast registers. So it's an extremely small program, and it could run very fast. Now unfortunately, of course, this program, because of the fact0:29:53
that it's recursive in the way that you do something first and then you do something after that, you have to work on the cars and then the cdrs, it requires auxiliary memory.0:30:03
So this requires a stack for marking. Lisp systems that are built this way have a limit to the0:30:12
depth of recursion you can have in data structures in either the car or the cdr, and that doesn't work very nicely. On the other hand, you never notice it if it's big enough.0:30:23
And that's certainly been the case for Maclisp, for example, which ran Macsyma, where you could deal with expressions thousands of elements long.0:30:33
These are algebraic expressions with thousands of terms. And there's no problem with that. As such, the garbage collector does work.0:30:42
On the other hand, there's a very clever modification to this algorithm, which I will not describe, by Peter Deutsch and Schorr and Waite-- Herb Schorr from IBM and Waite, who I don't know.0:30:55
That algorithm allows you to do this without auxiliary memory, by remembering as you walk the data structures where you came from by reversing the pointers0:31:04
as you go down and crawling up the reverse pointers as you go up. It's a rather tricky algorithm. The first time you write it-- or in fact, the first three times you write it it has a terrible bug in it.0:31:14
And it's also rather slow, because it's complicated. It takes about six times as many memory references to do the sorts of things that we're talking about.0:31:24
Well now once I've done this marking phase, and I get into a position where things look like this, let's look-- yes. Here we have the mark done, just as I did it.0:31:35
Now we have to perform the sweep phase. And I described to you what this sweep is like. I'm going to walk down from one end of memory or the other, I don't care where, scanning every cell that's in0:31:45
the memory. And as I scan these cells, I'm going to link them together, if they are free, into the free list. And if they're not free, I'm going to unmark them so the marks become zero.0:31:57
And in fact what I get-- well the program is not very complicated. It looks sort of like this-- it's a little longer. Here's the first piece of it. This one's coming down from the top of memory.0:32:06
I don't want you to try to understand this at this point. It's rather simple. It's a very simple algorithm, but there's pieces of it that just sort of look like this.0:32:15
They're all sort of obvious. And after we've done the sweep, we get an answer that looks like that.0:32:25
Now there are some disadvantages with mark-sweep algorithms of this sort. Serious ones. One important disadvantage is that your memories get larger0:32:34
and larger. As you say, address spaces get larger and larger, you're willing to represent more and more stuff, then it gets very0:32:43
costly to scan all of memory. What you'd really like to do is only scan useful stuff. It would even be better if you realized that some stuff was0:32:56
known to be good and useful, and you don't have to look at it more than once or twice. Or very rarely. Whereas other stuff that you're not so sure about, you0:33:05
can look at in more detail every time you want to do this, want to garbage collect. Well there are algorithms that are organized in this way.0:33:15
Let me tell you about a famous old algorithm which allows you to look only at the part of memory which is known to be useful. And which happens to be the fastest known garbage0:33:24
collector algorithm. This is the Minsky-Fenichel-Yochelson garbage collector algorithm. It was invented by Minsky in 1961 or '60 or something, for0:33:36
the RLE PDP-1 Lisp, which had 4,096 words of list memory,0:33:45
and a drum. And the whole idea was to garbage collect this terrible memory. What Minsky realized was the easiest way to do this is to0:33:56
scan the memory, in the sense of walking the good structure, copying it out onto the drum, compacted.0:34:06
And then when you're done copying it all out, you swap that back into your memory. Now whether or not you use a drum, or another piece of memory, or something like that isn't important.0:34:17
In fact, I don't think people use drums anymore for anything. But this algorithm basically depends upon having about twice as much address space as you're actually using.0:34:30
And so what you have is some, initially, some mixture of useful data and garbage. So this is called fromspace.0:34:45
And this is a mixture of crud. Some of it's important and some of it isn't. Now there's another place which is hopefully big enough,0:34:55
which we call tospace, which is where we're copying to. And what happens is-- I'm not going to go through this in detail.0:35:04
It's in our book quite explicitly. There's a root pointer where you start from. And the idea is that you start with the root.0:35:14
You copy the first thing you see, the first thing that the root points at, to the beginning of tospace. The first thing is a pair or something0:35:24
like that, a data structure. You then also leave behind a broken heart saying, I moved this object from here to here, giving the place0:35:36
where it moved to. This is called a broken heart because a friend of mine who implemented one of these in 1966 was a very romantic character and called it a broken heart.0:35:49
But in any case, the next thing you do is now you have a new free pointer which is here, and you start scanning. You scan this data structure you just copied.0:36:00
And every time you encounter a pointer in it, you treat it as if it was the root pointer here. Oh, I'm sorry. The other thing you do is you now move the root pointer to there.0:36:09
So now you scan this, and everything you see you treat as it were the root pointer. So if you see something, well it points up into there somewhere.0:36:18
Is it pointing at a thing which you've not copied yet? Is there a broken heart there? If there's a broken heart there and it's something you have copied, you just replace this pointer with the0:36:27
thing the broken heart points at. If this thing has not been copied, you copy it to the next place over here. Move your free pointer over here, and then leave a broken0:36:39
heart behind and scan. And eventually when the scan pointer hits the free pointer, everything in memory has been copied.0:36:50
And then there's a whole bunch of empty space up here, which you could either make into a free list, if that's what you want to do. But generally you don't in this kind of system. In this system you sequentially allocate your memory.0:37:00
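The copying collector just described can be sketched as follows. This is a hedged Python model: a fromspace cell is a two-element list [car, cdr], a pointer is a tagged tuple ('p', index), and the broken heart is a plain string marker. These are conventions of this sketch, not of any real system.

```python
# A stop-and-copy sketch: walk the live structure from the root,
# copying each pair into tospace and leaving a "broken heart"
# forwarding address behind in fromspace.

BROKEN_HEART = 'broken-heart'

def collect(fromspace, root):
    tospace = []

    def copy(x):
        if not (isinstance(x, tuple) and x[0] == 'p'):
            return x                  # numbers and 'nil' copy by value
        cell = fromspace[x[1]]
        if cell[0] is BROKEN_HEART:
            return cell[1]            # already moved: follow the heart
        tospace.append(cell[:])       # copy the pair to the free pointer
        new = ('p', len(tospace) - 1)
        cell[0] = BROKEN_HEART        # leave a forwarding address behind
        cell[1] = new
        return new

    root = copy(root)
    scan = 0
    while scan < len(tospace):        # until the scan pointer hits free
        tospace[scan][0] = copy(tospace[scan][0])
        tospace[scan][1] = copy(tospace[scan][1])
        scan += 1
    return tospace, root

# Fromspace holds the list (1 2) in cells 3 and 1; cells 0 and 2 are
# garbage. Only the live cells get copied, compacted, into tospace.
fromspace = [[9, 'nil'], [2, 'nil'], [8, ('p', 0)], [1, ('p', 1)]]
tospace, new_root = collect(fromspace, ('p', 3))
```

Notice that the work done is proportional to the live data only; the garbage in cells 0 and 2 is never even looked at.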
That is a very, very nice algorithm, and sort of the one we use in the Scheme system that you've been using. And it's expected--0:37:09
I believe no one has found a faster algorithm than that. There are very simple modifications to this algorithm invented by Henry Baker which allow one to run0:37:19
this algorithm in real time, meaning you don't have to stop to garbage collect. But you could interleave the consing that the machine does when it's running with steps of the garbage collection process, so that the garbage collector's distributed, and0:37:31
the machine doesn't have to stop and garbage collect. Of course in the case of machines with virtual memory where a lot of it is in inaccessible places, this0:37:41
becomes a very expensive process. And there have been numerous attempts to make this much better. There is a nice paper, for those of you who are0:37:52
interested, by Moon and other people which describes a modification to the incremental Minsky-Fenichel-Yochelson algorithm, a modification of the Baker algorithm, which is more efficient for virtual0:38:05
memory systems. Well I think now the mystery to this is sort of gone. And I'd like to see if there are any questions.0:38:19
Yes. AUDIENCE: I saw one of you run the garbage collector on the systems upstairs, and it seemed to me to run extremely fast. Did the whole thing take--0:38:30
does it sweep through all of memory? PROFESSOR: No. It swept through exactly what was needed to copy the useful structure. It's a copying collector.0:38:40
And it is very fast. On the whole, I suppose to copy-- in a Bobcat-- to copy, I think, a three megabyte thing or something is0:38:52
less than a second, real time. Really, these are very small programs. One thing you should realize is that garbage collectors have to be small.0:39:05
Not because they have to be fast, but because no one can debug a complicated garbage collector. A garbage collector, if it doesn't work, will trash your0:39:15
memory in such a way that you cannot figure out what the hell happened. You need an audit trail. Because it rearranges everything, and how do you know what happened there? So this is the only kind of program where it really,0:39:27
seriously matters that you stare at it long enough so you believe that it works. And sort of prove it to yourself. So there's no way to debug it.0:39:36
And that takes it being small enough so you can hold it in your head. Garbage collectors are special in this way.0:39:45
So every reasonable garbage collector has gotten small, and generally small programs are fast. Yes. AUDIENCE: Can you repeat the name of this technique once again?0:39:54
PROFESSOR: That's the Minsky-Fenichel-Yochelson garbage collector. AUDIENCE: You got that? PROFESSOR: Minsky invented it in '61 for the RLE PDP-1. A version of it was developed and elaborated to be used in0:40:07
Multics Maclisp by Fenichel and Yochelson somewhere around 1968 or '69.0:40:19
OK. Let's take a break. [MUSIC: "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:41:17
PROFESSOR: Well we've come to the end of this subject, and we've already shown you a universal machine which is down to the evaluator.0:41:26
It's down to the level of detail you could imagine you could make one. This is a particular implementation of Lisp, built on one of those scheme chips that was talked about0:41:37
yesterday, sitting over here. This is mostly interface to somebody's memory with a little bit of timing and other such stuff. But this fellow actually ran Lisp at a fairly reasonable0:41:48
rate, interpretively. It ran Lisp as fast as a DEC PDP-10 back in 1979. And so it's gotten pretty close to hardware.0:41:59
Pretty concrete. We've also dazzled you a bit with the things you can compute. But is it the case that there are things we can't compute?0:42:11
And so I'd like to end this by showing you some things that you'd like to be able to compute but you can't. The answer is yes, there are things you can't compute.0:42:22
For example, something you'd really like is-- if you're writing programs, you'd like a program that would check that the thing you're0:42:32
going to do will work. Wouldn't that be nice? You'd like something that would catch infinite loops, for example, in programs that were written by users.0:42:43
But in general you can't write a program that will read any program and determine whether or not it's an infinite loop. Let me show you that. It's a little bit of minor mathematics.0:42:58
Let's imagine that we just had a mathematical function before we start. And there is one, called s, which takes a procedure and0:43:12
its argument, a. And what s does is it determines whether or not it's0:43:24
safe to run p on a. And what I mean by that is this: it's true if p applied0:43:34
to a will converge to a value without an error.0:43:52
And it's false if p of a loops forever or makes an error.0:44:15
Now that's surely a function. For every procedure and for every argument you could give it, there is an answer, either true or false,0:44:25
whether it converges without making an error. And you could make a giant table of them. But the question is, can you write a procedure that computes0:44:34
the values of this function? Well let's assume that we can. Suppose that we have a procedure called "safe" that0:44:58
computes the value of s.0:45:12
Now I'm going to show you by several methods that you can't do this. The easiest one, or the first one, let's define a procedure0:45:22
called diag1. Given that we have safe, we can define diag1 to be the0:45:38
procedure of one argument, p, which has the following properties. If it's safe to apply p to itself, then I wish to have an0:45:54
infinite loop. Otherwise I'm going to return 3.0:46:03
Remember, 42 was the answer to the big question. And of course we know what an infinite loop is.0:46:12
We define infinite-loop to be a procedure of no arguments, which is that nice lambda calculus loop: lambda of x, x of x, applied to lambda of x, x of x.0:46:24
So there's nothing left to the imagination here. Well let's see what the story is. Suppose we worry about the0:46:38
procedure called diag1 applied to diag1. Well what could it possibly be?0:46:49
Well I don't know. We're going to substitute diag1 for p in the body here. Well is it safe to compute diag1 of diag1?0:47:00
I don't know. There are two possibilities. If it's safe to compute diag1 of diag1 that means it shouldn't loop. That means I go to here, but then I0:47:09
produce an infinite loop. So it can't be safe. But if it's not safe to compute diag1 of diag1 then the answer to this is 3. But that's diag1 of diag1, so it had to be safe.0:47:20
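The contradiction can be watched in miniature. Since no correct safe procedure can exist, the sketch below (my own Python rendering, not the lecture's Scheme) plugs in a hypothetical candidate that always answers false and shows that diag1 immediately refutes it.

```python
# A hypothetical candidate for safe that answers False for
# everything, claiming every (p a) loops or errors:

def candidate_safe(p, a):
    return False

def infinite_loop():
    while True:
        pass

def diag1(p):
    # If it's safe to apply p to itself, loop forever; otherwise 3.
    if candidate_safe(p, p):
        infinite_loop()
    return 3

# candidate_safe called (diag1 diag1) unsafe, yet it converges to 3,
# so the candidate is wrong. A candidate answering True would be
# wrong the other way: diag1 of diag1 would then loop forever.
result = diag1(diag1)
```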
So therefore by contradiction you cannot produce safe. For those of you who were boggled by that one I'm going0:47:30
to say it again, in a different way. Listen to one more alternative. Let's define diag2.0:47:39
These are named diag because of Cantor's diagonal argument. These are instances of a famous argument which was0:47:48
originally used by Cantor in the late part of the last century to prove that the real numbers were not countable, that there are too many real numbers to0:47:58
be counted by integers. That there are more points on a line, for example, than there are counting numbers. It may or may not be obvious, and I don't want to0:48:07
get into that now. But diag2 is again a procedure of one argument p. It's almost the same as the previous one, which is, if0:48:19
it's safe to compute p on p, then I'm going to produce--0:48:29
then I want to compute something other than p of p.0:48:38
Otherwise I'm going to put out false. Where other-than says: whatever p of p is, I'm going to put out something else.0:48:48
I can give you an example of a definition of other-than which I think works. Let's see. Yes. Let other-than be a procedure of one argument x0:49:06
which says, if x is eq to, say, quote a, then the answer is quote b.0:49:15
Otherwise it's quote a. That always produces something which is not what its argument is.0:49:25
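Transcribed into Python, other-than looks like this; the names and the choice of the symbols a and b follow the blackboard definition.

```python
# other-than always returns something different from its argument.

def other_than(x):
    if x == 'a':
        return 'b'
    return 'a'
```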
That's all it is. That's all I wanted. Well now let's consider this one, diag2 of diag2.0:49:38
Well look. This only does something dangerous, like calling p of p, if it's safe to do so.0:49:47
So if safe is defined at all, if you can define such a procedure, safe, then this procedure is always defined and therefore safe on any inputs.0:50:01
So diag2 of diag2 must reduce to other than diag2 of diag2.0:50:15
And that doesn't make sense, so we have a contradiction, and therefore we can't define safe. I just wanted to do that twice, slightly differently, so you0:50:27
wouldn't feel that the first one was a trick. They may be both tricks, but they're at least slightly different.0:50:37
So I suppose that pretty much wraps it up. I've just proved what we call the halting theorem, and I suppose with that we're going to halt.0:50:46
I hope you have a good time. Are there any questions? Yes. AUDIENCE: What is the value of s of diag1?0:50:56
PROFESSOR: Of what? AUDIENCE: S of diag1. If you said s is a function and we can [INTERPOSING VOICES] PROFESSOR: Oh, I don't know. I don't know. It's a function, but I don't know how to compute it.0:51:06
I can't do it. I'm just a machine, too. Right? There's no machine that in principle-- it might be that in that particular case you just0:51:16
asked, with some thinking I could figure it out. But in general I can't compute the value of s any better than any other machine can. There is such a function, it's just that no machine can be0:51:27
built to compute it. Now there's a way of saying that that should not be surprising. Going through this--0:51:36
I mean, I don't have time to do this here, but the number of functions is very large. If there's a certain number of answers possible and a certain0:51:48
number of inputs possible, then the number of answers raised to the number of inputs is the number of possible functions. Of one variable, that is.0:51:58
Now that's always bigger than the thing you're raising to, the exponent. The number of functions is larger than the number of0:52:12
programs that one can write, by an infinity counting argument. And it's much larger. So there must be a lot of functions that can't be0:52:22
computed by programs. AUDIENCE: A few moments ago you were talking about specifications and automatic generation of solutions. Do you see any steps between specifications and solutions?0:52:37
PROFESSOR: Steps between. You mean, you're asking how you go about constructing devices, given that you have specifications for the device? Sure. AUDIENCE: There's a lot of software engineering that goes0:52:48
from specifications through many layers of design and then implementation. PROFESSOR: Yes? AUDIENCE: I was curious if you think that's realistic. PROFESSOR: Well I think that some of it's realistic and0:52:57
some of it isn't. I mean, surely, if I want to build an electrical filter, I have a rather interesting possibility.0:53:07
Supposing I want to build a thing that matches some power output to the radio transmitter, to some antenna.0:53:19
And I'm really out of this power-- its output tube out here. And the problem is that they have different impedances. I want to match the impedances. I also want to make a filter in there which is going to get0:53:29
rid of some harmonic radiation. Well one old-fashioned technique for doing this is called image impedances, or something like that.0:53:38
And what you do is you say you have a basic module called an L-section. Looks like this.0:53:47
If I happen to connect this to some resistance, r, and if I make this impedance x, xl, and if it happens to be q times r, then this produces a low pass filter with a q squared plus0:53:59
one impedance match. Just what I need. Because now I can take two of these, hook them together like this.0:54:11
OK, and I take another one and I'll hook them together like that. And I have two L-sections hooked together.0:54:20
And this will step the impedance down to one that I know, and this will step it up to one I know. Each of these is a low pass filter getting rid of some harmonics. It's a good filter; it's called a pi-section filter.0:54:30
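The "q squared plus one impedance match" quoted here can be checked with a line of arithmetic. This is a sketch assuming the standard L-section result the professor is invoking: a series resistance r with reactance xl = q times r presents a transformed resistance of r times (q squared plus one). The function name and the numbers are illustrative:

```python
def l_section_step_up(r, q):
    # An L-section with xl = q * r steps resistance r up by a factor
    # of q**2 + 1 (the "q squared plus one" impedance match).
    return r * (q ** 2 + 1)

# e.g. stepping a 50-ohm load up to 500 ohms takes q = 3:
print(l_section_step_up(50, 3))  # 500
```

Chaining two such sections back to back, as in the lecture, steps down to a known intermediate impedance and back up, with each section also acting as a low pass filter.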
Great. Except for the fact that in doing what I just did, I've made a terrible inefficiency in this system. I've made two coils where I should have made one.0:54:41
And the problem with most software engineering art is that there's no mechanism, other than peephole optimization in compilers, for getting rid of the0:54:50
redundant parts that are constructed when doing top down design. It's even worse, there are lots of very important structures that you can't construct at all this way.0:55:01
So I think that the standard top down design is a rather shallow business. Doesn't really capture what people want to do in design. I'll give you another electrical example.0:55:10
Electrical examples are so much clearer than computational examples, because computation examples require a certain degree of complexity to explain them. But one of my favorite examples in the electrical0:55:19
world is how would I ever come up with the output stage of this inter-stage connection in an IF amplifier. It's a little transistor here, and let's see.0:55:32
Well I'm going to have a tank, and I'm going to hook this up to, say, I'm going to link-couple that to the input0:55:43
of the next stage. Here's a perfectly plausible plan-- well except for the fact that since I put that going up I should make that going that way.0:55:53
Here's a perfectly plausible plan for a-- no I shouldn't. I'm dumb. Excuse me. Doesn't matter. The point is [UNINTELLIGIBLE] plan for a couple [UNINTELLIGIBLE]0:56:02
stages together. Now the problem is: what is this, hierarchically? It's not one thing. Hierarchically it doesn't make any sense at all.0:56:11
It's the inductance of a tuned circuit, it's the primary of a transformer, and it's also the DC path by which bias0:56:22
conditions get to the collector of that transistor. And there's no simple top-down design that's going to produce a structure like that with so many overlapping uses for a0:56:33
particular thing. Playing Scrabble, where you have to do triple word scores, or whatever, is not so easy with a top-down design strategy.0:56:44
Yet most of real engineering is based on getting the most oomph for effort. And that's what you're seeing here.0:56:54
Yeah? AUDIENCE: Is this the last question? [LAUGHTER]0:57:18
PROFESSOR: Apparently so. Thank you. [APPLAUSE]0:57:39
[MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]