Lecture 1A | MIT 6.001 Structure and Interpretation, 1986

[MUSIC PLAYING]
PROFESSOR: I'd like to welcome you to this course on computer science. Actually, that's a terrible way to start. Computer science is a terrible name for this business. First of all, it's not a science. It might be engineering or it might be art, but we'll actually see that computer so-called science actually has a lot in common with magic, and we'll see that in this course. So it's not a science. It's also not really very much about computers.

And it's not about computers in the same sense that physics is not really about particle accelerators, and biology is not really about microscopes and petri dishes. And it's not about computers in the same sense that geometry is not really about using surveying instruments.

In fact, there's a lot of commonality between computer science and geometry. Geometry, first of all, is another subject with a lousy name. The name comes from Gaia, meaning the Earth, and metron, meaning to measure. Geometry originally meant measuring the Earth or surveying. And the reason for that was that, thousands of years ago, the Egyptian priesthood developed the rudiments of geometry in order to figure out how to restore the boundaries of fields that were destroyed in the annual flooding of the Nile. And to the Egyptians who did that, geometry really was the use of surveying instruments.

Now, the reason that we think computer science is about computers is pretty much the same reason that the Egyptians thought geometry was about surveying instruments. And that is, when some field is just getting started and you don't really understand it very well, it's very easy to confuse the essence of what you're doing with the tools that you use. And indeed, on some absolute scale of things, we probably know less about the essence of computer science than the ancient Egyptians really knew about geometry. Well, what do I mean by the essence of computer science? What do I mean by the essence of geometry?
See, it's certainly true that these Egyptians went off and used surveying instruments, but when we look back on them after a couple of thousand years, we say, gee, what they were doing, the important stuff they were doing, was to begin to formalize notions about space and time, to start a way of talking about mathematical truths formally. That led to the axiomatic method. That led to sort of all of modern mathematics, figuring out a way to talk precisely about so-called declarative knowledge, what is true.

Well, similarly, I think in the future people will look back and say, yes, those primitives in the 20th century were fiddling around with these gadgets called computers, but really what they were doing is starting to learn how to formalize intuitions about process, how to do things, starting to develop a way to talk precisely about how-to knowledge, as opposed to geometry that talks about what is true.
Let me give you an example of that. Let's take a look. Here is a piece of mathematics that says what a square root is. The square root of X is the number Y, such that Y squared is equal to X and Y is greater than 0. Now, that's a fine piece of mathematics, but just telling you what a square root is doesn't really say anything about how you might go out and find one.

So let's contrast that with a piece of imperative knowledge, how you might go out and find a square root. This, in fact, also comes from Egypt, not ancient, ancient Egypt. This is an algorithm due to Heron of Alexandria, called how to find a square root by successive averaging. And what it says is that, in order to find a square root, you make a guess, you improve that guess-- and the way you improve the guess is to average the guess and X over the guess, and we'll talk a little bit later about why that's a reasonable thing-- and you keep improving the guess until it's good enough. That's a method. That's how to do something as opposed to declarative knowledge that says what you're looking for. That's a process.
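In the Scheme dialect of Lisp used in this course, Heron's successive-averaging method can be sketched like this. The procedure names and the 0.001 tolerance are illustrative assumptions, not part of the method itself:

```scheme
; Heron's method: guess, then improve the guess by averaging it
; with x/guess, until the guess is good enough.
(define (average a b) (/ (+ a b) 2))

(define (improve guess x)          ; average the guess with x over the guess
  (average guess (/ x guess)))

(define (good-enough? guess x)     ; "good enough": the square is close to x
  (< (abs (- (* guess guess) x)) 0.001))

(define (try guess x)
  (if (good-enough? guess x)
      guess
      (try (improve guess x) x)))

(define (sqrt x) (try 1.0 x))      ; start from an initial guess of 1
```

So `(sqrt 36)` repeatedly averages its way toward 6, which is exactly the guess-improve-repeat process just described.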
Well, what's a process in general? It's kind of hard to say. You can think of it as like a magical spirit that sort of lives in the computer and does something. And the thing that directs a process is a pattern of rules called a procedure. So procedures are the spells, if you like, that control these magical spirits that are the processes.

I guess you know everyone needs a magical language, and sorcerers, real sorcerers, use ancient Akkadian or Sumerian or Babylonian or whatever. We're going to conjure our spirits in a magical language called Lisp, which is a language designed for talking about, for casting the spells that are procedures to direct the processes.

Now, it's very easy to learn Lisp. In fact, in a few minutes, I'm going to teach you, essentially, all of Lisp. I'm going to teach you, essentially, all of the rules. And you shouldn't find that particularly surprising. That's sort of like saying it's very easy to learn the rules of chess. And indeed, in a few minutes, you can tell somebody the rules of chess. But of course, that's very different from saying you understand the implications of those rules and how to use those rules to become a masterful chess player. Well, Lisp is the same way. We're going to state the rules in a few minutes, and it'll be very easy to see. But what's really hard is going to be the implications of those rules, how you exploit those rules to be a master programmer. And the implications of those rules are going to take us the, well, the whole rest of the subject and, of course, way beyond.

OK, so in computer science, we're in the business of formalizing this sort of how-to imperative knowledge, how to do stuff. And the real issues of computer science are, of course, not telling people how to do square roots. Because if that was all it was, it wouldn't be any big deal. The real problems come when we try to build very, very large systems, computer programs that are thousands of pages long, so long that nobody can really hold them in their heads all at once.
And the only reason that that's possible is because there are techniques for controlling the complexity of these large systems. And these techniques for controlling complexity are what this course is really about. And in some sense, that's really what computer science is about.

Now, that may seem like a very strange thing to say. Because after all, a lot of people besides computer scientists deal with controlling complexity. A large airliner is an extremely complex system, and the aeronautical engineers who design that are dealing with immense complexity. But there's a difference between that kind of complexity and what we deal with in computer science. And that is that computer science, in some sense, isn't real.

You see, when an engineer is designing a physical system, that's made out of real parts. The engineers who worry about that have to address problems of tolerance and approximation and noise in the system. So for example, as an electrical engineer, I can go off and easily build a one-stage amplifier or a two-stage amplifier, and I can imagine cascading a lot of them to build a million-stage amplifier. But it's ridiculous to build such a thing, because long before the millionth stage, the thermal noise in those components way at the beginning is going to get amplified and make the whole thing meaningless.

Computer science deals with idealized components. We know as much as we want about these little program and data pieces that we're fitting together. We don't have to worry about tolerance. And that means that, in building a large program, there's not all that much difference between what I can build and what I can imagine, because the parts are these abstract entities that I know as much as I want about. I know about them as precisely as I'd like. So as opposed to other kinds of engineering, where the constraints on what you can build are the constraints of physical systems, the constraints of physics and noise and approximation, the constraints imposed in building large software systems are the limitations of our own minds. So in that sense, computer science is like an abstract form of engineering. It's the kind of engineering where you ignore the constraints that are imposed by reality.
Well, what are some of these techniques? They're not special to computer science. The first technique, which is used in all of engineering, is a kind of abstraction called black-box abstraction. Take something and build a box about it. Let's see, for example, if we looked at that square root method, I might want to take that and build a box. That sort of says, to find the square root of X. And that might be a whole complicated set of rules. And that might end up being a kind of thing where I can put in, say, 36 and say, what's the square root of 36? And out comes six.

And the important thing is that I'd like to design that so that if George comes along and would like to compute, say, the square root of A plus the square root of B, he can take this thing and use it as a module without having to look inside and build something that looks like this, like an A and a B and a square root box and another square root box and then something that adds, that would put out the answer.
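In Lisp notation, George's program is just a combination that treats the square-root box as a module; here `a` and `b` are assumed to already be defined:

```scheme
; George's point of view: use the sqrt box without looking inside it.
(+ (sqrt a) (sqrt b))
```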
And you can see, just from the fact that I want to do that, that from George's point of view, the internals of what's in here should not be important. So for instance, it shouldn't matter that, when I wrote this, I said I want to find the square root of X. I could have said the square root of Y, or the square root of A, or anything at all. That's the fundamental notion of putting something in a box, using black-box abstraction to suppress detail. And the reason for that is you want to go off and build bigger boxes.

Now, there's another reason for doing black-box abstraction other than that you want to suppress detail for building bigger boxes. Sometimes you want to say that your way of doing something, your how-to method, is an instance of a more general thing, and you'd like your language to be able to express that generality. Let me show you another example, sticking with square roots. Let's go back and take another look at that slide with the square root algorithm on it. Remember what that says. That says, in order to do something, I make a guess, and I improve that guess, and I sort of keep improving that guess. So there's the general strategy of, I'm looking for something, and the way I find it is that I keep improving it. Now, that's a particular case of another kind of strategy for finding a fixed point of something.

So you have a fixed point of a function. A fixed point of a function F is a value Y, such that F of Y equals Y. And the way I might find one is to start with a guess. And then, if I want something that doesn't change when I keep applying F, I'll keep applying F over and over until the result doesn't change very much. So there's a general strategy. And then, for example, to compute the square root of X, I can try and find a fixed point of the function which takes Y to the average of Y and X/Y. And the idea there is that if I really had Y equal to the square root of X, then Y and X/Y would be the same value. They'd both be the square root of X, because X over the square root of X is the square root of X. And so if Y were equal to the square root of X, then the average wouldn't change. So the square root of X is a fixed point of that particular function.
Now, what I'd like to have, I'd like to express the general strategy for finding fixed points. So what I might imagine doing is to be able to use my language to define a box that says "fixed point," just like I could make a box that says "square root." And I'd like to be able to express this in my language. So I'd like to express not only the imperative how-to knowledge of a particular thing like square root, but I'd like to be able to express the imperative knowledge of how to do a general thing like how to find a fixed point.

And in fact, let's go back and look at that slide again. See, not only is this a piece of imperative knowledge, how to find a fixed point, but over here on the bottom, there's another piece of imperative knowledge which says, one way to compute square roots is to apply this general fixed point method. So I'd like to also be able to express that imperative knowledge. What would that look like? That would say, this fixed point box is such that if I input to it the function that takes Y to the average of Y and X/Y, then what should come out of that fixed point box is a method for finding square roots.
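A hedged sketch of what that fixed-point box and its square-root use might look like in Scheme; the internal names and the `close-enough?` tolerance are illustrative assumptions:

```scheme
; General strategy: to find a fixed point of f, keep applying f
; to the guess until the result stops changing very much.
(define (fixed-point f start)
  (define (close-enough? a b) (< (abs (- a b)) 0.0001))
  (define (iterate old new)
    (if (close-enough? old new)
        new
        (iterate new (f new))))
  (iterate start (f start)))

; Square root as a fixed point of the function y -> average of y and x/y.
(define (sqrt x)
  (fixed-point (lambda (y) (/ (+ y (/ x y)) 2))
               1.0))
```

Note that the input to the fixed-point box is itself a function, written with `lambda`: exactly the idea of feeding a function into a box and getting a method back out.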
So in these boxes we're building, we're not only building boxes that you input numbers to and that output numbers, we're going to be building boxes that, in effect, compute methods, like a method for finding square roots. And what they take as their inputs are functions, like Y goes to the average of Y and X/Y. The reason we want to do that-- and this, as we'll see, will end up being a procedure whose value is another procedure-- the reason we want to do that is because procedures are going to be our ways of talking about imperative knowledge. And the way to make that very powerful is to be able to talk about other kinds of knowledge. So here is a procedure that, in effect, talks about another procedure, a general strategy that itself talks about general strategies.
Well, our first topic in this course-- there'll be three major topics-- will be black-box abstraction. Let's look at that in a little bit more detail. What we're going to do is we will start out talking about how Lisp is built up out of primitive objects. What does the language supply us with? And we'll see that there are primitive procedures and primitive data. Then we're going to see, how do you take those primitives and combine them to make more complicated things-- means of combination? And what we'll see is that there are ways of putting things together, putting primitive procedures together to make more complicated procedures. And we'll see how to put primitive data together to make compound data.

Then we'll say, well, having made those compound things, how do you abstract them? How do you put those black boxes around them so you can use them as components in more complex things? And we'll see that's done by defining procedures and by a technique for dealing with compound data called data abstraction. And then, what's maybe the most important thing, is going from just the rules to how does an expert work? How do you express common patterns of doing things, like saying, well, there's a general method of fixed point, and square root is a particular case of that? And we're going to use-- I've already hinted at it-- something called higher-order procedures, namely procedures whose inputs and outputs are themselves procedures. And then we'll also see something very interesting. We'll see, as we go further and further on and become more abstract, the line between what we consider to be data and what we consider to be procedures is going to blur at an incredible rate.
Well, that's our first subject, black-box abstraction. Let's look at the second topic. I can introduce it like this. See, suppose I want to express the idea-- remember, we're talking about ideas-- suppose I want to express the idea that I can take something and multiply it by the sum of two other things. So for example, I might say, if I had one and three and multiply that by two, I get eight. But I'm talking about the general idea of what's called linear combination, that you can add two things and multiply them by something else. It's very easy when I think about it for numbers, but suppose I also want to use that same idea to think about, I could add two vectors, a1 and a2, and then scale them by some factor x and get another vector. Or I might say, I want to think about a1 and a2 as being polynomials, and I might want to add those two polynomials and then multiply them by two to get a more complicated one. Or a1 and a2 might be electrical signals, and I might want to think about summing those two electrical signals and then putting the whole thing through an amplifier, multiplying it by some factor of two or something. The idea is I want to think about the general notion of that.

Now, if our language is going to be a good language for expressing those kinds of general ideas, if I really, really can do that, I'd like to be able to say I'm going to multiply by x the sum of a1 and a2, and I'd like that to express the general idea of all the different kinds of things that a1 and a2 could be. Now, if you think about that, there's a problem, because after all, the actual primitive operations that go on in the machine are obviously going to be different if I'm adding two numbers than if I'm adding two polynomials, or if I'm adding the representations of two electrical signals or waveforms. Somewhere, there has to be the knowledge of the kinds of various things that you can add and the ways of adding them. Now, to construct such a system, the question is, where do I put that knowledge? How do I think about the different kinds of choices I have? And if tomorrow George comes up with a new kind of object that might be added and multiplied, how do I add George's new object to the system without screwing up everything that was already there?
Well, that's going to be the second big topic, the way of controlling that kind of complexity. And the way you do that is by establishing conventional interfaces, agreed-upon ways of plugging things together. Just like in electrical engineering, people have standard impedances for connectors, and then you know if you build something with one of those standard impedances, you can plug it together with something else.

So that's going to be our second large topic, conventional interfaces. What we're going to see is, first, we're going to talk about the problem of generic operations, which is the one I alluded to, things like "plus" that have to work with all different kinds of data. So we talk about generic operations. Then we're going to talk about really large-scale structures. How do you put together very large programs that model the kinds of complex systems in the real world that you'd like to model? And what we're going to see is that there are two very important metaphors for putting together such systems. One is called object-oriented programming, where you sort of think of your system as a kind of society full of little things that interact by sending information between them. And then the second one is operations on aggregates, called streams, where you think of a large system put together kind of like a signal processing engineer puts together a large electrical system. That's going to be our second topic.
Now, the third thing we're going to come to, the third basic technique for controlling complexity, is making new languages. Because sometimes, when you're sort of overwhelmed by the complexity of a design, the way that you control that complexity is to pick a new design language. And the purpose of the new design language will be to highlight different aspects of the system. It will suppress some kinds of details and emphasize other kinds of details.

This is going to be the most magical part of the course. We're going to start out by actually looking at the technology for building new computer languages. The first thing we're going to do is actually build in Lisp. We're going to express in Lisp the process of interpreting Lisp itself. And that's going to be a very sort of self-circular thing. There's a little mystical symbol that has to do with that. The process of interpreting Lisp is sort of a giant wheel of two processes, apply and eval, which sort of constantly reduce expressions to each other. Then we're going to see all sorts of other magical things. Here's another magical symbol. This is sort of the Y operator, which is, in some sense, the expression of infinity inside our procedural language. We'll take a look at that.

In any case, this section of the course is called Metalinguistic Abstraction, abstracting by talking about how you construct new languages. As I said, we're going to start out by looking at the process of interpretation. We're going to look at this apply-eval loop, and build Lisp. Then, just to show you that this is very general, we're going to use exactly the same technology to build a very different kind of language, a so-called logic programming language, where you don't really talk about procedures at all that have inputs and outputs. What you do is talk about relations between things. And then finally, we're going to talk about how you implement these things very concretely on the very simplest kind of machines. We'll see something like this. This is a picture of a chip, which is the Lisp interpreter that we will be talking about then in hardware.

Well, there's an outline of the course, three big topics: black-box abstraction, conventional interfaces, metalinguistic abstraction. Now, let's take a break now and then we'll get started.

[MUSIC PLAYING]
Let's actually start in learning Lisp now. Actually, we'll start out by learning something much more important, maybe the very most important thing in this course, which is not Lisp, in particular, of course, but rather a general framework for thinking about languages that I already alluded to. When somebody tells you they're going to show you a language, what you should say is, what I'd like you to tell me is, what are the primitive elements? What does the language come with? Then, what are the ways you put those together? What are the means of combination? What are the things that allow you to take these primitive elements and build bigger things out of them? What are the ways of putting things together? And then, what are the means of abstraction? How do we take those complicated things and draw those boxes around them? How do we name them so that we can now use them as if they were primitive elements in making still more complex things? And so on, and so on, and so on.

So when someone says to you, gee, I have a great new computer language, you don't say, how many characters does it take to invert a matrix? It's irrelevant. What you say is, if the language did not come with matrices built in or with something else built in, how could I then build that thing? What are the means of combination which would allow me to do that? And then, what are the means of abstraction which allow me then to use those as elements in making more complicated things yet?

Well, we're going to see that Lisp has some primitive data and some primitive procedures.
In fact, let's really start. And here's a piece of primitive data in Lisp, the number three. Actually, if I'm being very pedantic, that's not the number three. That's some symbol that represents Plato's concept of the number three. And here's another. Here's some more primitive data in Lisp, 17.4. Or actually, some representation of 17.4. And here's another one, five. Here's another primitive object that's built into Lisp, addition. Actually, to use the same kind of pedantic language-- this is a name for the primitive method of adding things. Just like this is a name for Plato's number three, this is a name for Plato's concept of how you add things. So those are some primitive elements.

I can put them together. I can say, gee, what's the sum of three and 17.4 and five? And the way I do that is to say, let's apply the sum operator to these three numbers. And I should get, what? 25.4. So I should be able to ask Lisp what the value of this is, and it will return 25.4.
Let's introduce some names. This thing that I typed is called a combination. And a combination consists, in general, of applying an operator-- so this is an operator-- to some operands. These are the operands. And of course, I can make more complex things. The reason I can get complexity out of this is because the operands themselves, in general, can be combinations. So for instance, I could say, what is the sum of three and the product of five and six and eight and two? And I should get-- let's see-- 30, 40, 43. So Lisp should tell me that that's 43.
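Typed into a Lisp interpreter, that combination looks like this:

```scheme
(+ 3 (* 5 6) 8 2)    ; the second operand, (* 5 6), is itself a combination
; Lisp returns 43
```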
Forming combinations is the basic means of combination that we'll be looking at. And then, well, you see some syntax here. Lisp uses what's called prefix notation, which means that the operator is written to the left of the operands. It's just a convention. And notice, it's fully parenthesized. And the parentheses make it completely unambiguous. So by looking at this, I can see that there's the operator, and there are one, two, three, four operands. And I can see that the second operand here is itself some combination that has one operator and two operands.

Parentheses in Lisp are a little bit, or are very, unlike parentheses in conventional mathematics. In mathematics, we sort of use them to mean grouping, and it sort of doesn't hurt if sometimes you leave out parentheses if people understand that that's a group. And in general, it doesn't hurt if you put in extra parentheses, because that maybe makes the grouping more distinct. Lisp is not like that. In Lisp, you cannot leave out parentheses, and you cannot put in extra parentheses, because putting in parentheses always means, exactly and precisely, this is a combination which has meaning, applying operators to operands. And if I left those parentheses out, it would mean something else.

In fact, the way to think about this is that really what I'm doing when I write something like this is writing a tree. So this combination is a tree that has a plus and then a three and then a something else and an eight and a two. And then this something else here is itself a little subtree that has a star and a five and a six. And the way to think of that is, really, what's going on is we're writing these trees, and parentheses are just a way to write this two-dimensional structure as a linear character string. Because at least when Lisp first started and people had teletypes or punch cards or whatever, this was more convenient. Maybe if Lisp started today, the syntax of Lisp would look like that.
Well, let's look at what that actually looks like on the computer. Here I have a Lisp interaction set up. There's an editor. And on the top, I'm going to type some values and ask Lisp what they are. So for instance, I can say to Lisp, what's the value of that symbol? That's three. And I ask Lisp to evaluate it. And there you see Lisp has returned on the bottom, and said, oh yeah, that's three. Or I can say, what's the sum of three and four and eight? What's that combination? And ask Lisp to evaluate it. That's 15.

Or I can type in something more complicated. I can say, what's the sum of the product of three and the sum of seven and 19.5? And you'll notice here that Lisp has something built in that helps me keep track of all these parentheses. Watch as I type the next close parenthesis, which is going to close the combination starting with the star. The opening one will flash. Here, I'll rub those out and do it again. Type close, and you see that closes the plus. Close again, that closes the star. Now I'm back to the sum, and maybe I'm going to add that all to four. That closes the plus. Now I have a complete combination, and I can ask Lisp for the value of that. That kind of paren balancing is something that's built into a lot of Lisp systems to help you keep track, because it is kind of hard just by hand doing all these parentheses.
There's another kind of convention for keeping track of parentheses. Let me write another complicated combination. Let's take the sum of the product of three and five and add that to something. And now what I'm going to do is I'm going to indent so that the operands are written vertically: the sum of that and the product of 47 and-- let's say the product of 47 with the difference of 20 and 6.8. That means subtract 6.8 from 20. And then you see the parentheses close. Close the minus. Close the star. And now let's get another operator. You see the Lisp editor here is indenting to the right position automatically to help me keep track. I'll do that again. I'll close that last parenthesis again. You see it balances the plus. Now I can say, what's the value of that?

So those two things, indenting to the right level, which is called pretty printing, and flashing parentheses, are two things that a lot of Lisp systems have built in to help you keep track. And you should learn how to use them.
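The combination just typed, written with the operands of each operator aligned vertically, looks like this when pretty-printed:

```scheme
(+ (* 3 5)
   (* 47
      (- 20 6.8)))
; Lisp returns 635.4
```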
Well, those are the primitives. There's a means of combination. Now let's go up to the means of abstraction. I'd like to be able to take the idea that I do some combination like this, and abstract it and give it a simple name, so I can use that as an element. And I do that in Lisp with "define." So I can say, for example, define A to be the product of five and five. And now I could say, for example, to Lisp, what is the product of A and A? And this should be 25, and this should be 625. And then, the crucial thing: I can now use A-- here I've used it in a combination-- but I could use that in other more complicated things that I name in turn. So I could say, define B to be the sum of, we'll say, A and the product of five and A. And then close the plus.

Let's take a look at that on the computer and see how that looks. So I'll just type what I wrote on the board. I can say, define A to be the product of five and five. And I'll tell that to Lisp. And notice what Lisp responded with there was an A at the bottom. In general, when you type in a definition in Lisp, it responds with the symbol being defined. Now I can say to Lisp, what is the product of A and A? And it says that's 625. I can define B to be the sum of A and the product of five and A. Close a paren closes the star. Close the plus. Close the "define." Lisp says, OK, B, there at the bottom. And now I can say to Lisp, what's the value of B? And I can say something more complicated, like what's the sum of A and the quotient of B and five? That slash is divide, another primitive operator. I've divided B by five, added it to A. Lisp says, OK, that's 55.
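The interaction just shown, as it would be typed:

```scheme
(define a (* 5 5))         ; Lisp responds with the symbol: a
(* a a)                    ; Lisp returns 625
(define b (+ a (* 5 a)))   ; Lisp responds: b
b                          ; Lisp returns 150
(+ a (/ b 5))              ; 25 + 30; Lisp returns 55
```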
So there's what it looks like. There's the basic means of defining something. It's the simplest kind of naming, but it's not really0:41:47
very powerful. See, what I'd really like to name-- remember, we're talking about general methods-- I'd like to name, oh, the general idea that, for0:41:56
example, I could multiply five by five, or six by six, or0:42:10
1,001 by 1,001, or 1,001.7 by 1,001.7. I'd like to be able to name the general idea of0:42:22
multiplying something by itself. Well, you know what that is. That's called squaring.0:42:31
And the way I can do that in Lisp is I can say, define to0:42:43
square something x, multiply x by itself.0:42:57
And then having done that, I could say to Lisp, for example, what's the square of 10?0:43:06
And Lisp will say 100. So now let's actually look at that a little more closely. Right, there's the definition of square.0:43:17
To square something, multiply it by itself. You see this x here.0:43:26
That x is kind of a pronoun, which is the something that I'm going to square. And what I do with it is I multiply x, I0:43:35
multiply it by itself.0:43:44
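In Scheme, the definition just described looks like this:

```scheme
;; To square something, multiply it by itself.
;; x is the "pronoun": the formal parameter naming the thing being squared.
(define (square x) (* x x))

(square 10)  ; 100
```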
OK. So there's the notation for defining a procedure. Actually, this is a little bit confusing, because this is sort of how I might use square.0:43:53
And I say square of x or square of 10, but it's not making it very clear that I'm actually naming something.0:44:02
So let me write this definition in another way that makes it a little bit more clear that I'm naming something. I'll say, "define" square to be lambda of x times x x.0:44:36
Here, I'm naming something square, just like over here, I'm naming something A. The thing that I'm naming square-- here, the thing I named A was the value of this combination.0:44:49
Here, the thing that I'm naming square is this thing that begins with lambda, and lambda is Lisp's way of saying make a procedure.0:45:00
Let's look at that more closely on the slide. The way I read that definition is to say, I define square to be make a procedure--0:45:12
that's what the lambda is-- make a procedure with an argument named x. And what it does is return the results of0:45:22
multiplying x by itself. Now, in general, we're going to be using this top form of0:45:32
defining, just because it's a little bit more convenient. But don't lose sight of the fact that it's really this. In fact, as far as the Lisp interpreter's concerned,0:45:41
there's no difference between typing this to it and typing this to it. And there's a word for that, sort of syntactic sugar.0:45:54
What syntactic sugar means, it's having somewhat more convenient surface forms for typing something. So this is just really syntactic sugar for this0:46:04
underlying Greek thing with the lambda. And the reason you should remember that is don't forget that, when I write something like this, I'm really naming something.0:46:14
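Here are the two ways of writing the definition side by side; as far as the Lisp interpreter is concerned, they are the same thing:

```scheme
;; the syntactic sugar:
(define (square x) (* x x))

;; what it really means: the name "square" bound to a made procedure
(define square
  (lambda (x) (* x x)))
```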
I'm naming something square, and the something that I'm naming square is a procedure that's getting constructed. Well, let's look at that on the computer, too.0:46:24
So I'll come and I'll say, define square of x to be times x x.0:46:49
Now I'll tell Lisp that. It says "square." See, I've named something "square." Now, having done that, I can ask Lisp for, what's0:47:00
the square of 1,001? Or in general, I could say, what's the square of the sum0:47:14
of five and seven? The square of 12's 144.0:47:25
Or I can use square itself as an element in some combination. I can say, what's the sum of the square of three and the0:47:36
square of four? Nine and 16 is 25. Or I can use square as an element in some much more0:47:49
complicated thing. I can say, what's the square of, the square of, the square of 1,001?0:48:07
And there's the square of the square of the square of 1,001. Or I can say to Lisp, what is square itself? What's the value of that?0:48:17
And Lisp returns some conventional way of telling me that that's a procedure. It says, "compound procedure square." Remember, the value of square is this procedure, and the thing with the stars0:48:30
and the brackets are just Lisp's conventional way of describing that. Let's look at two more examples of defining.0:48:45
Here are two more procedures. I can define the average of x and y to be the sum of x and y divided by two.0:48:54
Or having had average and square, I can use those to talk about the mean square of0:49:03
something, which is the average of the square of x and the square of y. So for example, having done that, I could say, what's the0:49:13
mean square of two and three?0:49:24
And I should get the average of four and nine, which is 6.5. The key thing here is that, having defined square, I can0:49:37
use it as if it were primitive. So if we look here on the slide, if I look at mean square, the person defining mean square doesn't have to0:49:50
know, at this point, whether square was something built into the language or whether it was a procedure that was defined.0:49:59
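The two procedures from the slide, written out in Scheme (with square as defined earlier):

```scheme
(define (square x) (* x x))

;; the average of x and y is the sum of x and y divided by two
(define (average x y)
  (/ (+ x y) 2))

;; mean-square uses square as if it were primitive
(define (mean-square x y)
  (average (square x) (square y)))

(mean-square 2 3)  ; the average of 4 and 9: 13/2, that is, 6.5
```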
And that's a key thing in Lisp, that you do not make arbitrary distinctions between things that happen to be0:50:08
primitive in the language and things that happen to be compound. A person using them shouldn't even have to know. So the things you construct get used with all the power0:50:17
and flexibility as if they were primitives. In fact, you can drive that home by looking on the computer one more time. We talked about plus.0:50:26
And in fact, if I come here on the computer screen and say, what is the value of plus? Notice what Lisp types out.0:50:36
On the bottom there, it typed out, "compound procedure plus." Because, in this system, it turns out that the addition operator is itself a compound procedure.0:50:45
And if I didn't just type that in, you'd never know that, and it wouldn't make any difference anyway. We don't care. It's below the level of the abstraction that we're dealing with.0:50:54
So the key thing is you cannot tell, should not be able to tell, in general, the difference between things that are built in and things that are compound.0:51:03
Why is that? Because the things that are compound have an abstraction wrapper wrapped around them. We've seen almost all the elements of Lisp now.0:51:12
There's only one more we have to look at, and that is how to make a case analysis. Let me show you what I mean. We might want to think about the mathematical definition of0:51:22
the absolute value function. I might say the absolute value of x is the function which has the property that it's the negative of x0:51:35
for x less than zero, it's zero for x equal to zero, and it's x for x greater than zero.0:51:49
And Lisp has a way of making case analyses. Let me define for you absolute value. Say define the absolute value of x is conditional.0:52:03
This means case analysis, COND. If x is less than zero, the answer is negate x.0:52:22
What I've written here is a clause. This whole thing is a conditional clause,0:52:33
and it has two parts. This part here is a predicate or a condition.0:52:44
That's a condition. And the condition is expressed by something called a predicate, and a predicate in Lisp is some sort of thing that returns either true or false.0:52:53
And you see Lisp has a primitive procedure, less-than, that tests whether something is true or false. And the other part of a clause is an action or a thing to do,0:53:06
in the case where that's true. And here, what I'm doing is negating x. The negation operator, the minus sign in Lisp is a little bit funny.0:53:17
If there are two arguments, it subtracts the second one from the first, and we saw that. And if there's one argument, it negates it. So this corresponds to that.0:53:27
And then there's another COND clause. It says, in the case where x is equal to zero, the answer is zero.0:53:37
And in the case where x is greater than zero, the answer is x. Close that clause.0:53:46
Close the COND. Close the definition. And there's the definition of absolute value. And you see it's the case analysis that looks very much like the case analysis you use in mathematics.0:53:58
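The COND definition of absolute value, as written on the board:

```scheme
(define (abs x)
  (cond ((< x 0) (- x))   ; clause: predicate (< x 0), action (- x)
        ((= x 0) 0)       ; clause: if x is zero, the answer is zero
        ((> x 0) x)))     ; clause: if x is positive, the answer is x
```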
There's a somewhat different way of writing a restricted case analysis. Often, you have a case analysis where you only have one case, where you test something, and then depending0:54:08
on whether it's true or false, you do something. And here's another definition of absolute value which looks almost the same, which says, if x is less than zero, the0:54:21
result is negate x. Otherwise, the answer is x. And we'll be using "if" a lot. But again, the thing to remember is that this form of0:54:30
absolute value that you're looking at here, and then this one over here that I wrote on the board, are essentially the same.0:54:39
And "if" and COND are-- well, whichever way you like it. You can think of COND as syntactic sugar for "if," or you can think of "if" as syntactic sugar for COND, and it doesn't make any difference.0:54:48
The person implementing a Lisp system will pick one and implement the other in terms of that. And it doesn't matter which one you pick.0:55:02
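The restricted, one-case form of absolute value using "if":

```scheme
(define (abs x)
  (if (< x 0)  ; predicate
      (- x)    ; consequent: negate x
      x))      ; alternative: x itself
```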
Why don't we break now, and then take some questions. How come sometimes when I write define, I put an open0:55:11
paren here and say, define open paren something or other, and sometimes when I write this, I don't put an open paren?0:55:22
The answer is, this particular form of "define," where you say define some expression, is this very special thing for defining procedures.0:55:33
But again, what it really means is I'm defining this symbol, square, to be that. So the way you should think about it is what "define" does0:55:44
is you write "define," and the second thing you write is the symbol here-- no open paren-- the symbol you're defining and what you're defining it to be.0:55:54
That's like here and like here. That's sort of the basic way you use "define." And then, there's this special syntactic trick which allows you to0:56:05
define procedures that look like this. So the difference is, it's whether or not you're defining a procedure. [MUSIC PLAYING]0:56:38
Well, believe it or not, you actually now know enough Lisp to write essentially any numerical procedure that you'd write in a language like FORTRAN or Basic or whatever,0:56:49
or, essentially, any other language. And you're probably saying, that's not believable, because you know that these languages have things like "for statements," and "do until while" or something.0:57:00
But we don't really need any of that. In fact, we're not going to use any of that in this course. Let me show you.0:57:10
Again, looking back at square root, let's go back to this square root algorithm of Heron of Alexandria. Remember what that said.0:57:20
It said, to find an approximation to the square root of X, you make a guess, you improve that guess by averaging the guess and X over the guess.0:57:32
You keep improving that until the guess is good enough. I already alluded to the idea. The idea is that, if the initial guess that you took0:57:44
was actually equal to the square root of X, then G here would be equal to X/G. So if you hit the square root, averaging them0:57:54
wouldn't change it. If the G that you picked was larger than the square root of X, then X/G will be smaller than the square root of X, so0:58:03
that when you average G and X/G, you get something in between. So if you pick a G that's too small, your answer will be too large.0:58:13
If you pick a G that's too large-- if your G is larger than the square root of X-- then X/G will be smaller than the square root of X. So averaging always gives you something in between.0:58:24
And then, it's not quite trivial, but it's possible to show that, in fact, if G misses the square root of X by a little bit, the average of G and X/G will actually keep0:58:34
getting closer to the square root of X. So if you keep doing this enough, you'll eventually get as close as you want. And then there's another fact, that you can always start out0:58:44
this process by using 1 as an initial guess. And it'll always converge to the square root of X. So that's this method of successive averaging due to0:58:55
Heron of Alexandria. Let's write it in Lisp. Well, the central idea is, what does it mean to try a0:59:05
guess for the square root of X? Let's write that. So we'll say, define to try a guess for the square root of0:59:24
X, what do we do? We'll say, if the guess is good enough to be a guess for0:59:44
the square root of X, then, as an answer, we'll take the guess. Otherwise, we will try the improved guess.0:59:58
We'll improve that guess for the square root of X, and we'll try that as a guess for the square root of X. Close1:00:09
the "try." Close the "if." Close the "define." So that's how we try a guess. And then, the next part of the process said, in order to1:00:18
compute square roots, we'll say, define to compute the1:00:28
square root of X, we will try one as a guess for the square root of X. Well, we have to define a couple more things.1:00:40
We have to say, how is a guess good enough? And how do we improve a guess? So let's look at that. The algorithm to improve a guess for the square root of1:00:53
X, we average-- that was the algorithm-- we average the guess with the quotient of dividing X by the guess.1:01:03
That's how we improve a guess. And to tell whether a guess is good enough, well, we have to decide something. This is supposed to be a guess for the square root of X, so one possible thing you can do is say, when you take that1:01:14
guess and square it, do you get something very close to X? So one way to say that is to say, I square the guess, subtract X from that, and see if the absolute value of that1:01:26
whole thing is less than some small number, which depends on my purposes.1:01:35
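Putting the pieces together, here is the whole square-root program as just described. The predicate name good-enough? and the tolerance .001 are one reasonable choice; the lecture leaves the "small number" up to your purposes:

```scheme
(define (square x) (* x x))
(define (average x y) (/ (+ x y) 2))

;; to try a guess for the square root of x: if it's good enough, take it;
;; otherwise try the improved guess
(define (try guess x)
  (if (good-enough? guess x)
      guess
      (try (improve guess x) x)))

;; to compute the square root of x, try 1 as a guess
(define (sqrt x) (try 1 x))

;; improve a guess by averaging it with x divided by the guess
(define (improve guess x)
  (average guess (/ x guess)))

;; a guess is good enough if its square is very close to x
(define (good-enough? guess x)
  (< (abs (- (square guess) x)) .001))
```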
So there's a complete procedure for how to compute the square root of X. Let's look at the structure of that a little bit.1:01:47
I have the whole thing. I have the notion of how to compute a square root. That's some kind of module.1:01:56
That's some kind of black box. It's defined in terms of how to try a guess for the square1:02:07
root of X. "Try" is defined in terms of, well, telling whether something is good enough and telling1:02:16
how to improve something. So good enough. "Try" is defined in terms of "good enough" and "improve."1:02:30
And let's see what else I fill in. Well, I'll go down this tree. "Good enough" was defined in terms of absolute value, and square.1:02:40
And improve was defined in terms of something called averaging and then some other primitive operator. Square root's defined in terms of "try." "Try" is defined in1:02:49
terms of "good enough" and "improve," but also "try" itself. So "try" is also defined in terms of how to try itself.1:03:02
Well, that may give you some problems. Your high school geometry teacher probably told you that it's naughty to try and define things in terms of themselves, because it doesn't1:03:13
make sense. But that's false. Sometimes it makes perfect sense to define things in terms of themselves. And this is the case.1:03:22
And we can look at that. We could write down what this means, and say, suppose I asked Lisp what the square root of two is.1:03:32
What's the square root of two mean? Well, that means I try one as a guess for the1:03:42
square root of two. Now I look. I say, gee, is one a good enough guess for the square root of two?1:03:51
And that depends on the test that "good enough" does. And in this case, "good enough" will say, no, one is not a good enough guess for the square root of two. So that will reduce to saying, I have to try an improved--1:04:10
improve one as a guess for the square root of two, and try that as a guess for the square root of two.1:04:19
Improving one as a guess for the square root of two means I average one and two divided by one. So this is going to be average.1:04:29
This piece here will be the average of one and the quotient of two by one.1:04:40
That's this piece here. And this is 1.5.1:04:49
So this square root of two reduces to trying one for the square root of two, which reduces to trying 1.5 as a1:05:03
guess for the square root of two. So that makes sense. Let's look at the rest of the process. If I try 1.5, that reduces.1:05:14
1.5 turns out to be not good enough as a guess for the square root of two. So that reduces to trying the average of 1.5 and two divided1:05:23
by 1.5 as a guess for the square root of two. That average turns out to be 1.333. So this whole thing reduces to trying 1.333 as a guess for1:05:34
the square root of two. And then so on. That reduces to another call to "good enough," 1.4 something or other. And then it keeps going until the process finally stops with1:05:45
something that "good enough" thinks is good enough, which, in this case, is 1.4142 something or other. So the process makes perfect sense.1:05:59
This, by the way, is called a recursive definition.1:06:14
And the ability to make recursive definitions is a source of incredible power. And as you can already see I've hinted at, it's the thing1:06:24
that effectively allows you to do these infinite computations that go on until something is true, without having any other constructs other than the ability to call a procedure.1:06:35
Well, let's see, there's one more thing. Let me show you a variant of this definition of square root here on the slide.1:06:46
Here's sort of the same thing. What I've done here is packaged the definitions of "improve" and "good enough" and "try" inside "square1:06:55
root." So, in effect, what I've done is I've built a square root box. So I've built a box that's the square root procedure that1:07:07
someone can use. They might put in 36 and get out six. And then, packaged inside this box are the definitions of "try" and "good enough" and "improve."1:07:26
So they're hidden inside this box. And the reason for doing that is that, if someone's using this square root, if George is using this square root, George probably doesn't care very much that, when I implemented1:07:39
square root, I had things inside there called "try" and "good enough" and "improve." And in fact, Harry might have1:07:48
a cube root procedure that has "try" and "good enough" and "improve." And in order to not get the whole system confused, it'd be good for Harry to package his internal procedures inside his cube root procedure.1:07:58
Well, this is called block structure, this particular way of packaging internals inside of a definition.1:08:09
And let's go back and look at the slide again. The way to read this kind of procedure is to say, to define "square root," well, inside that definition, I'll have the1:08:23
definition of an "improve" and the definition of "good enough" and the definition of "try." And then, subject to those definitions, the way I do square root is to try one.1:08:36
And notice here, I don't have to say one as a guess for the square root of X, because since it's all inside the square root, it sort of has this X known.1:08:54
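The block-structured version from the slide, with the helpers packaged inside the square-root box; since x is known everywhere inside the body, the internal procedures don't need it as an argument:

```scheme
(define (square x) (* x x))
(define (average x y) (/ (+ x y) 2))

(define (sqrt x)
  ;; "improve," "good-enough?," and "try" are hidden inside sqrt,
  ;; so they can't get confused with anyone else's internal procedures
  (define (improve guess)
    (average guess (/ x guess)))
  (define (good-enough? guess)
    (< (abs (- (square guess) x)) .001))
  (define (try guess)
    (if (good-enough? guess)
        guess
        (try (improve guess))))
  ;; subject to those definitions, the way to do square root is to try 1
  (try 1))
```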
Let me summarize. We started out with the idea that what we're going to be doing is expressing imperative knowledge.1:09:04
And in fact, here's a slide that summarizes the way we looked at Lisp. We started out by looking at some primitive elements in1:09:13
addition and multiplication, some predicates for testing whether something is less-than or something's equal. And in fact, we saw really sneakily in the system we're1:09:22
actually using, these aren't actually primitives, but it doesn't matter. What matters is we're going to use them as if they're primitives. We're not going to look inside. We also have some primitive data and some numbers.1:09:34
We saw some means of composition, means of combination, the basic one being composing functions and building combinations with operators and operands.1:09:44
And there were some other things, like COND and "if" and "define." But the main thing about "define," in particular,1:09:53
was that it was the means of abstraction. It was the way that we name things. You can also see from this slide not only where we've been, but holes we have to fill in. At some point, we'll have to talk about how you combine1:10:03
primitive data to get compound data, and how you abstract data so you can use large globs of data as if they were primitive.1:10:13
So that's where we're going. But before we do that, for the next couple of lectures we're going to be talking about, first of all, how it is that1:10:25
you make a link between these procedures we write and the processes that happen in the machine. And then, how it is that you start using the power of Lisp1:10:36
to talk not only about these individual little computations, but about general conventional methods of doing things.1:10:45
OK, are there any questions? AUDIENCE: Yes. If we defined A using parentheses instead of as we did, what would be the difference? PROFESSOR: If I wrote this, if I wrote that, what I would be1:10:58
doing is defining a procedure named A. In this case, a procedure of no arguments, which, when I ran it, would1:11:07
give me back five times five. AUDIENCE: Right. I mean, you come up with the same thing, except for you really got a different-- PROFESSOR: Right. And the difference would be, in the old one--1:11:16
Let me be a little bit clearer here. Let's call this A, like here. And pretend here, just for contrast, I wrote, define D to1:11:35
be the product of five and five. And the difference between those, let's think about interactions with the Lisp interpreter.1:11:45
I could type in A and Lisp would return 25. I could type in D, if I just typed in D, Lisp would return1:12:01
compound procedure D, because that's what it is. It's a procedure. I could run D. I could say, what's the value of running D?1:12:12
Here is a combination with no operands. I see there are no operands. I didn't put any after D. And it would say, oh, that's 25.1:12:22
Or I could say, just for completeness, if I typed in, what's the value of running A? I get an error.1:12:31
The error would be the same one as over there. The error would say, sorry, 25, which is the value1:12:40
of A, is not an operator that I can apply to something.0:00:00
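The contrast from the question, sketched out in Scheme (using a and d for the two definitions on the board):

```scheme
(define a (* 5 5))    ; names the value 25

(define (d) (* 5 5))  ; names a procedure of no arguments

a     ; 25
d     ; the compound procedure itself
(d)   ; running d, a combination with no operands: 25
;; (a) would be an error: 25 is not an operator that can be applied
```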
Lecture 1B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING BY J.S. BACH]0:00:14
PROFESSOR: Hi. You've seen that the job of a programmer is to design processes that accomplish particular goals, such as0:00:24
finding the square roots of numbers or other sorts of things you might want to do. We haven't introduced anything else yet. Of course, the way in which a programmer does this is by0:00:34
constructing spells, which are constructed out of procedures and expressions. And these spells somehow direct a process to0:00:46
accomplish the goal that was intended by the programmer. In order for the programmer to do this effectively, he has to understand the relationship between the particular things that he writes, these particular spells, and the0:00:56
behavior of the process that he's attempting to control. So what we're doing this lecture is attempting to establish that connection in as clear a way as possible.0:01:07
What we will particularly do is understand how particular patterns of procedures and expressions cause particular patterns of execution, particular0:01:17
behaviors from the processes. Let's get down to that. I'm going to start with a very simple program.0:01:28
This is a program to compute the sum of the squares of two numbers. And we'll define the sum of the squares of x and y to be0:01:45
the sum of the square of x-- I'm going to write it that way-- and the square of y where the square of x is the0:02:08
product of x and x. Now, supposing I were to say something to this, like, to0:02:17
the system after having defined these things, of the form, the sum of the squares of three and four, I am hoping0:02:26
that I will get out a 25. Because the square of three is nine, and the square of four is 16, and 25 is the sum of those. But how does that happen?0:02:36
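The definitions just given, and the expression being evaluated:

```scheme
(define (square x) (* x x))

(define (sum-of-squares x y)
  (+ (square x) (square y)))

(sum-of-squares 3 4)  ; 25
```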
If we're going to understand processes and how we control them, then we have to have a mapping from the mechanisms of this procedure into the way in which these processes behave.0:02:49
What we're going to have is a formal, or semi-formal, mechanical model whereby you understand how a machine could, in fact, in principle, do this. Whether or not the actual machine really does what I'm0:03:00
about to tell you is completely irrelevant at this moment. In fact, this is an engineering model in the same way that, for an electrical resistor, we write down a model v equals0:03:09
i r. It's approximately true. It's not really true. If I put enough current through the resistor, it goes boom. So the voltage is not always proportional to the current,0:03:20
but for some purposes the model is appropriate. In particular, the model we're going to describe right now, which I call the substitution model, is the simplest model0:03:29
that we have for understanding how procedures work and how processes work. How procedures yield processes. And that substitution model will be accurate for most of0:03:39
the things we'll be dealing with in the next few days. But eventually, it will become impossible to sustain the illusion that that's the way the machine works, and we'll go to other more specific and particular models that will0:03:50
show more detail. OK, well, the first thing, of course, is we say, what are the things we have here?0:03:59
We have some cryptic symbols. And these cryptic symbols are made out of pieces. There are kinds of expressions. So let's write down here the kinds of expressions there are.0:04:17
And we have-- and so far I see things like numbers. I see things like symbols like that.0:04:32
We have seen things before like lambda expressions, but they're not here. I'm going to leave them out. Lambda expressions, we'll worry about them later.0:04:44
Things like definitions. Things like conditionals.0:04:58
And finally, things like combinations.0:05:07
These kinds of expressions are-- I'll worry about later-- these are special forms. There are particular rules for each of these.0:05:17
I'm going to tell you, however, the rules for doing a general case. How does one evaluate a combination? Because, in fact, over here, all I really have are combinations and some symbols and numbers.0:05:29
And the simple things like a number, well, it will evaluate to itself. In the model I will have for you, the symbols will disappear. They won't be there at the time when you need them, when0:05:40
you need to get at them. So the only thing I really have to explain to you is, how do we evaluate combinations? OK, let's see.0:05:50
So first I want to get the first slide. Here is the rule for evaluating an application.0:06:01
What we have is a rule that says, to evaluate a combination, there are three parts to the rule. The combination has several parts.0:06:12
It has an operator and it has operands. The operator evaluates to a procedure. If we evaluate the operator, we will get a procedure.0:06:22
And you saw, for example, how I'll type at the machine and out came compound procedure something or other. And the operands produce arguments.0:06:31
Once we've evaluated the operator to get a procedure, and evaluated the operands to get arguments, we apply the procedure to these arguments by copying the0:06:43
body of the procedure, which is the expression that the procedure is defined in terms of. What is it supposed to do? Substituting the argument supplied for the formal0:06:53
parameters of the procedure, the formal parameters being the names defined by the declaration of the procedure. Then we evaluate the resulting new body, the body resulting0:07:02
from copying the old body with the substitutions made. It's a very simple rule, and we're going to do it very formally for a little while.0:07:12
Because for the next few lectures, what I want you to do is to say, if I don't understand something, be very mechanical and do this.0:07:23
So let's see. Let's consider a particular evaluation, the one we were talking about before. The sum of the squares of three and four.0:07:35
What does that mean? It says, take-- well, I could find out what's on the square-- it's some procedure, and I'm not going to worry about the representation, and I'm not going to write it on the0:07:44
blackboard for you. And I have that three represents some number, but I can't really tell you the number itself. The number itself is some abstract thing.0:07:54
There's a numeral which represents it, which I'll call three, and I'll use that in my substitution. And four is also a number. I'm going to substitute three for x and four for y in the0:08:06
body of this procedure that you see over here. Here's the body of the procedure. It corresponds to this combination, which is an addition.0:08:17
So what that reduces to, as a reduction step, we call it, is the sum of the square of three and the square of four.0:08:30
Now, what's the next step I have to do here? I say, well, I have to evaluate this. According to my rule, which you just saw on that overhead0:08:40
or slide, what we had was that we have to evaluate the operands-- and here are the operands, here's one and here's the next operand--0:08:49
and we have to evaluate the operator to get the procedure. The order doesn't matter. And then we're going to apply the procedure, which is plus, and magically somehow that's going to produce the answer.0:08:59
I'm not going to open up plus and look inside of it. However, in order to evaluate the operands, let's pick some arbitrary order and do them. I'm going to go from right to left.0:09:08
Well, in order to evaluate this operand, I have to evaluate the parts of it by the same rule. And the parts are I have to find out what square is-- it's some procedure, which has a formal parameter x.0:09:19
And also, I have an operand which is four, which I have to substitute for x in the body of square.0:09:28
So the next step is basically to say that this is the sum of the square of three and the product of four and four.0:09:40
Of course, I could open up asterisk if I liked-- the multiplication operation-- but I'm not going to do that. I'm going to consider that primitive.0:09:50
And, of course, at any level of detail, if you look inside this machine, you're going to find that there's multiple levels below that that you don't know about. But one of the things we have to learn how to0:09:59
do is ignore details. The key to understanding complicated things is to know what not to look at and what not to compute and what not to think.0:10:09
So we're going to stop this one here and say, oh, yes, this is the product of two things. We're going to do it now. So this is nothing more than the sum of the square0:10:19
of three and 16. And now I have another thing I have to evaluate, but that square of three, well, it's the same thing.0:10:29
That's the sum of the product of three and three and 16, which is the sum of nine and 16, which is 25.0:10:44
So now you see the basic method of doing substitutions. And I warn you that this is not a perfect description of0:10:54
what the computer does. But it's a good enough description for the problems that we're going to have in the next few lectures that you0:11:03
should think about this religiously. And this is how the machine works for now. Later we'll get more detailed.0:11:12
Now, of course, I made a specific choice of the order of evaluation here. There are other possibilities. If we go back to the telestrator here and look at0:11:21
the substitution rule, we see that I evaluated the operator to get the procedures, and I evaluated the operands to get the arguments first, before I do the application.0:11:31
It's entirely possible, and there are alternate rules called normal order evaluation whereby you can do the substitution of the expressions which are the0:11:41
operands for the formal parameters inside the body first. And you'll get also the same answer. But right now, for concreteness, and because this0:11:50
is the way our machine really does it, I'm going to give you this rule, which has a particular order. But that order is to some extent arbitrary, too.0:12:01
In the long run, there are some reasons why you might pick one order or another, and we'll get to that later in the subject.0:12:12
OK, well, now the only other thing I have to tell you about, just to understand what's going on, is the rule for conditionals. Conditionals are very simple, and I'd like to examine this.0:12:27
A conditional is something that is if-- there's also cond, of course-- but I'm going to give names to the parts of the expression. There's a predicate, which is a thing that is0:12:39
either true or false. And there's a consequent, which is the thing you do if the predicate is true.0:12:48
And there's an alternative, which is the thing you do if the predicate is false. It's important, by the way, to get names0:13:00
for the parts of things, the parts of expressions. One of the things that every sorcerer will tell you is if you have the name of a spirit, you have power over it.0:13:10
So you have to learn these names so that we can discuss these things. So here we have a predicate, a consequent, and an alternative. And, using such words, we see that for an if expression,0:13:21
you first evaluate the predicate expression; if that yields true, then you go on to evaluate the consequent. Otherwise, you evaluate the alternative expression.0:13:34
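As a sketch of this evaluation rule in code (in Python rather than the lecture's Lisp, and with a tuple encoding of expressions that is my own illustrative device, not something from the lecture), here is a toy evaluator that evaluates the predicate first and then exactly one of the two branches:

```python
# A toy evaluator for conditionals, illustrating the rule above:
# evaluate the predicate first, then evaluate exactly one of the
# two branches -- never both.

def eval_expr(expr):
    # Numbers evaluate to themselves.
    if isinstance(expr, (int, float)):
        return expr
    tag = expr[0]
    if tag == 'if':
        _, predicate, consequent, alternative = expr
        if eval_expr(predicate):          # evaluate the predicate...
            return eval_expr(consequent)  # ...then only the consequent,
        return eval_expr(alternative)     # ...or only the alternative.
    if tag == '=':
        return eval_expr(expr[1]) == eval_expr(expr[2])
    raise ValueError(f"unknown expression: {expr!r}")

# (if (= 3 0) 4 99) evaluates to 99, since the predicate is false.
print(eval_expr(('if', ('=', 3, 0), 4, 99)))
```

Note that the consequent and the alternative are never both evaluated; that is what distinguishes a conditional from an ordinary procedure application.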
So I'd like to illustrate that now in the context of a particular little program.0:13:43
Going to write down a program which we're going to see many times. This is the sum of x and y done by what's called Peano0:13:58
arithmetic, in which all we're doing is incrementing and decrementing. And we're going to see this for a little bit. It's a very important program. If x equals zero, then the result is y.0:14:12
Otherwise, this is the sum of the decrement of x and the increment of y.0:14:23
We're going to look at this a lot more in the future. Let's look at the overhead. So here we have this procedure, and we're going to look at how we do the substitutions, the sequence of0:14:33
substitutions. Well, I'm going to try and add together three and four. Well, using the first rule that I showed you, we substitute three for x and four for y in the body of0:14:45
this procedure. The body of the procedure is the thing that begins with if and finishes over here. So what we get is, of course, if three is zero, then the0:14:54
result is four. Otherwise, it's the sum of the decrement of three and the increment of four. But I'm not going to worry about these yet because three0:15:04
is not zero. So the answer is not four. Therefore, this if reduces to an evaluation of the expression, the sum of the decrement of three and the0:15:14
increment of four. Continuing with my evaluation, the increment I presume to be primitive, and so I get a five there.0:15:23
OK, and then the decrement is also primitive, and I get a two. And so I change the problem into a simpler problem. Instead of adding three to four, I'm adding two to five.0:15:33
The reason why this is a simpler problem is because I'm counting down on x, and eventually, then, x will be zero.0:15:43
So, so much for the substitution rule. In general, I'm not going to write down intermediate steps when using substitutions having to do with ifs, because0:15:52
they just expand things to become complicated. What we will be doing is saying, oh, yes, the sum of three and four0:16:01
reduces to the sum of two and five, which, in fact, reduces to the sum of one and six, which reduces to the sum of zero and seven over here, which reduces to a seven.0:16:14
That's what we're going to be seeing. Are there any questions for the first segment yet? Yes? STUDENT: You're using one plus and minus one plus.0:16:24
Are those primitive operations? PROFESSOR: Yes. One of the things you're going to be seeing in this subject is I'm going to, without thinking about it, introduce0:16:33
more and more primitive operations. There's presumably some large library of primitive operations somewhere. But it doesn't matter that they're primitive-- there may be some manual that lists them all.0:16:43
If I tell you what they do, you say, oh, yes, I know what they do. So one of them is the decrement-- minus one plus-- and the other operation is the increment, which is one plus.0:16:53
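The whole first segment can be sketched in Python (the lecture's code is in Lisp; dec and inc here stand in for the minus-one-plus and one-plus primitives just mentioned):

```python
def dec(x):  # the "minus one plus" primitive: decrement
    return x - 1

def inc(y):  # the "one plus" primitive: increment
    return y + 1

def add(x, y):
    # Peano addition: if x is zero, the result is y; otherwise it is
    # the sum of the decrement of x and the increment of y.
    if x == 0:
        return y
    return add(dec(x), inc(y))

# The reduction chain from the lecture:
#   (+ 3 4) -> (+ 2 5) -> (+ 1 6) -> (+ 0 7) -> 7
print(add(3, 4))  # prints 7
```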
Thank you. That's the end of the first segment. [MUSIC PLAYING BY J.S. BACH]0:17:19
PROFESSOR: Now that we have a reasonably mechanical way of understanding how a program made out of procedures and0:17:28
expressions evolves a process, I'd like to develop some intuition about how particular programs evolve particular processes, what the shapes of programs have to be in order0:17:39
to get particular shaped processes. This is a question about, really, pre-visualizing. That's a word from photography.0:17:49
I used to be interested in photography a lot, and one of the things you discover when you start trying to learn about photography is that you say, gee, I'd like to be a creative photographer.0:17:58
Now, I know the rules, I push buttons, and I adjust the aperture and things like that. But the key to being a creative person, partly, is to be able to do analysis at some level.0:18:09
To say, how do I know what it is that I'm going to get on the film before I push the button. Can I imagine in my mind the resulting image very precisely0:18:23
and clearly as a consequence of the particular framing, of the aperture I choose, of the focus, and things like that?0:18:32
That's part of the art of doing this sort of thing. And learning a lot of that involves things like test strips. You take very simple images that have varying degrees of0:18:44
density in them, for example, and examine what those look like on a piece of paper when you print them out. You find out what is the range of contrasts that you can0:18:54
actually see. And what, in a real scene, would correspond to the various levels and zones that you have of density in an image.0:19:05
Well, today I want to look at some very particular test strips, and I suppose one of them I see here is up on the telestrator, so we should switch to that.0:19:14
There's a very important, very important pair of programs for understanding what's going on in the evolution of a process0:19:24
by the execution of a program. What we have here are two procedures that are almost identical. Almost no difference between them at all.0:19:35
It's a few characters that distinguish them. These are two ways of adding numbers together. The first one, which you see here, says the sum0:19:48
of two numbers-- just what we did before-- is, if the first one is zero, the second one. Otherwise, it's the sum of the decrement of the first and the increment of the second.0:19:57
And you may think of that as having two piles. And the way I'm adding these numbers together to make a0:20:06
third pile is by moving marbles from one to the other. Nothing more than that. And eventually, when I run out of one, then the other is the sum.0:20:15
However, the second procedure here doesn't do it that way. It says if the first number is zero, then the answer is the second.0:20:24
Otherwise, it's the increment of the sum of the decrement of the first number and the second. So what this says is add together the decrement of the0:20:35
first number and the second-- a simpler problem, no doubt-- and then change that result to increment it. And so this means that if you think about this in terms of0:20:45
piles, it means I'm holding in my hand the things to be added later. And then I'm going to add them in. As I slowly decrease one pile to zero, I've got what's left0:20:57
here, and then I'm going to add them back. Two different ways of adding. The nice thing about these two programs is that they're almost identical.0:21:06
The only thing is where I put the increment. A couple of characters moved around. Now I want to understand the kind of behavior we're going0:21:15
to get from each of these programs. Just to get them firmly in your mind-- I usually don't want to be this careful-- but just to get them firmly in your mind, I'm going to write0:21:24
the programs again on the blackboard, and then I'm going to evolve a process. And you're going to see what happens. We're going to look at the shape of the process as a consequence of the program.0:21:34
So the program we started with is this: the sum of x and y0:21:44
says if x is zero, then the result is y. Otherwise, it's the sum of the decrement of x and the0:21:56
increment of y. Now, supposing we wish to do this addition of three and0:22:05
four, the sum of three and four, well, what is that? It says that I have to substitute the arguments for0:22:14
the formal parameters in the body. I'm doing that in my mind. And I say, oh, yes, three is substituted for x, but three is not zero, so I'm going to go directly to this part and0:22:28
write down the simplified consequent here. Because I'm really interested in the behavior of addition. Well, what is that? That therefore turns into the sum of two and five.0:22:38
In other words, I've reduced this problem to this problem. Then I reduce this problem to the sum of one and six, and0:22:47
then, going around again once, I get the sum of zero and seven. And that's one where x equals zero so the result is y, and0:22:56
so I write down here a seven. So this is the behavior of the process evolved by trying to add together three and four with this program.0:23:07
For the other program, which is over here, I will define0:23:20
the sum of x and y. And what is it? If x is zero, then the result is y-- almost the same--0:23:32
otherwise the increment of the sum of the decrement of x and y.0:23:47
No. I don't have my balancer in front of me.0:23:56
OK, well, let's do it now. The sum of three and four. Well, this is actually a little more interesting. Of course, three is not zero as before, so that results in0:24:07
the increment of the sum of the decrement of x, which is two and four, which is the increment of0:24:19
the sum of one and-- whoops: the increment of the increment. What I have to do now is compute what this means.0:24:30
I have to evaluate this. Or what that is, the result of substituting two and four for x and y here. But that is the increment of the sum of one0:24:40
and four, which is-- well, now I have to expand this. Ah, but that's the increment of the increment of the0:24:52
increment of the sum of zero and four. Ah, but now I'm beginning to find things I can do.0:25:03
The increment of the increment of the increment of-- well, the sum of zero and four is four.0:25:12
The increment of four is five. So this is the increment of the increment of five, which is the increment of six, which is seven.0:25:26
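Both procedures can be sketched in Python for comparison (the lecture writes them in Lisp; note also that Python does not do the tail-call optimization a real Lisp system would, so the sketch is only about the shape of the process, not about actual machine resources):

```python
def add_iterative(x, y):
    # Shape: (+ 3 4) -> (+ 2 5) -> (+ 1 6) -> (+ 0 7) -> 7.
    # All the state is in x and y; nothing is deferred.
    if x == 0:
        return y
    return add_iterative(x - 1, y + 1)

def add_recursive(x, y):
    # Shape: (+ 3 4) -> (inc (+ 2 4)) -> (inc (inc (+ 1 4))) -> ...
    # Each step defers an increment to be done on the way back out.
    if x == 0:
        return y
    return 1 + add_recursive(x - 1, y)

print(add_iterative(3, 4), add_recursive(3, 4))  # both give 7
```

The only textual difference is where the increment happens, yet the first evolves an iteration and the second a linear recursion.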
Two different ways of computing sums. Now, let's see. These processes have very different shapes. I want you to feel these shapes.0:25:36
It's the feeling for the shapes that matters. What's some things we can see about this? Well, somehow this is sort of straight.0:25:45
It goes this way-- straight. This right edge doesn't vary particularly in size.0:25:54
Whereas this one, I see that this thing gets bigger and then it gets smaller. So I don't know what that means yet,0:26:03
but what are we seeing? We're seeing here that somehow these increments are expanding out and then contracting back.0:26:13
I'm building up a bunch of them to do later. I can't do them now. There's things to be deferred. Well, let's see.0:26:23
I can imagine an abstract machine. There's some physical machine, perhaps, that could be built to do it, which, in fact, executes these programs exactly as I tell you, substituting character strings in like this.0:26:34
Such a machine, the number of such steps is an approximation of the amount of time it takes. So this way is time.0:26:45
And the width of the thing is how much I have to remember in order to continue the process. And this much is space. And what we see here is a process that takes a time0:26:58
which is proportional to the argument x. Because if I made x larger by one, then I'd have an extra line.0:27:08
So this is a process which is space-- sorry-- time. The time of this process is what we say order of x.0:27:20
That means it is proportional to x by some constant of proportionality, and I'm not particularly interested in what the constant is. The other thing we see here is that the amount of space this0:27:31
takes up is constant, it's proportional to one. So the space complexity of this is order of one.0:27:42
We have a name for such a process. Such a process is called an iteration.0:27:51
And what matters here is not that some particular machine I designed here and talked to you about and called a substitution machine or whatever--0:28:00
substitution model-- managed to do this in constant space. What really matters is this tells us a bound. Any machine could do this in constant space.0:28:09
This algorithm represented by this procedure is executable in constant space. Now, of course, the model is ignoring some things, standard0:28:18
sorts of things. Like numbers that are bigger take up more space and so on. But that's a level of abstraction at which I'm cutting off. How do you represent numbers? I'm considering every number to be the same size.0:28:28
And the amount of space numbers take up grows slowly with their size. Now, this algorithm is different in its complexity.0:28:38
As we can see here, this algorithm has a time complexity which is also proportional to the input0:28:48
argument x. That's because if I were to add one to three, if I made a larger problem, which is larger by one here, then I'd add a line at the top and I'd add a line at the bottom.0:29:00
And the fact that it's a constant amount, like this is twice as many lines as that, is not interesting at the level of detail I'm talking about right now. So this is a time complexity order of the input argument x.0:29:13
And space complexity, well, this is more interesting. I happen to have some overhead, which you see over here, which is constant approximately.0:29:23
Constant overhead. But then I have something which increases and decreases and is proportional to the input argument x. The input argument x is three. That's why there are three deferred increments sitting0:29:34
around here. See? So the space complexity here is also order x. And this kind of process, named for the kind of process,0:29:44
this is a recursion. A linear recursion, I will call it, because of the fact0:29:56
that it's proportional to the input argument in both time and space. This could have been a linear iteration.0:30:13
So then what's the essence of this matter? This matter isn't so obvious. Maybe there are other models by which we can describe the differences between iterative and recursive processes.0:30:23
Because this is hard now. Remember, what we're seeing there are both recursive definitions,0:30:32
definitions that refer to the thing being defined in the definition. But they lead to different shape processes. There's nothing special about the fact that the definition0:30:42
is recursive that leads to a recursive process. OK. Let's think of another model. I'm going to talk to you about bureaucracy.0:30:52
Bureaucracy is sort of interesting. Here we see on a slide an iteration. An iteration is sort of a fun kind of process.0:31:04
Imagine that there's a fellow called GJS-- that stands for me-- and he's got a problem: he wants to add together three and four.0:31:13
This fella here wants to add together three and four. Well, the way he's going to do it-- he's lazy-- is he's going to find somebody else to help him do it. The way he does it is0:31:22
he finds someone else to help him and says, well, give me the answer to three and four and return the result to me. He makes a little piece of paper and says, here, here's a0:31:32
piece of paper-- you go ahead and solve this problem and give the result back to me. And this guy, of course, is lazy, too. He doesn't want to see this piece of paper again.0:31:41
He says, oh, yes, produce a new problem, which is the sum of two and five, and return the result back to GJS.0:31:50
I don't want to see it again. This guy does not want to see this piece of paper. And then this fellow makes a new problem, which is the0:32:01
addition of the sum of one and six, and he gives it to this fella and says, produce that answer and return it to GJS. And that produces a problem, which is to add together zero0:32:11
and seven, and give the result to GJS. This fella finally just says, oh, yeah, the answer is seven, and sends it back to GJS. That's what an iteration is.0:32:20
By contrast, a recursion is a slightly different kind of process. This one involves more bureaucracy. It keeps more people busy.0:32:30
It keeps more people employed. Perhaps it's better for that reason. But here it is: I want the answer to the problem three and four. So I make a piece of paper that says, give the result0:32:40
back to me. Give it to this fella. This fellow says, oh, yes, I will remember that I have to add one later, and I want to get the answer to the problem two0:32:51
plus four, give that one to Harry, and have the results sent back to me-- I'm Joe. When the answer comes back from Harry, which is a six, I0:33:01
will then do the increment and give that seven back to GJS. So there are more pieces of paper outstanding in the0:33:10
recursive process than the iteration. There's another way to think about what an iteration is and0:33:19
the difference between an iteration and a recursion. You see, the question is, how much stuff is under the table? If I were to stop--0:33:28
supposing I were to kill this computer right now, OK? And at this point I lose the state of affairs, well, I0:33:37
could continue the computation from this point but everything I need to continue the computation is in the variables that were defined in the procedure that the0:33:48
programmer wrote for me. An iteration is a system that has all of its state in explicit variables. Whereas the recursion is not quite the same.0:34:01
If I were to lose this pile of junk over here, and all I was left with was the sum of one and four, that's not enough information to continue the process of computing out the0:34:10
seven from the original problem of adding together three and four. Besides the information that's in the variables of the formal0:34:20
parameters of the program, there is also information under the table belonging to the computer, which is what things have been deferred for later.0:34:30
And, of course, there's a physical analogy to this, which is in differential equations, for example, when we talk about something like drawing a circle.0:34:42
If you try to draw a circle, you make that out of a differential equation which gives the change in my state as a function of0:34:51
my current state. So if my current state corresponds to particular values of y and x, then I can compute from them a derivative0:35:00
which says how the state must change. And, in fact, you can see this was a circle because if I0:35:09
happen to be, say, at this place over here, at one, zero, for example, on this graph, then it means that the0:35:20
derivative of y is x, which we see over here. That's one, so I'm going up. And the derivative of x is minus y, which0:35:29
means I'm going backwards. I'm actually doing nothing at this point, then I start going backwards as y increases. So that's how you make a circle.0:35:40
And the interesting thing to see is a little program that will draw a circle by this method. Actually, this won't draw a circle because it's a forward Euler integrator and will eventually0:35:49
spiral out and all that. But it'll draw a circle for a while before it starts spiraling. However, what we see here is two state variables, x and y.0:35:58
And there's an iteration that says, in order to circle, given an x and y, what I want is to circle with the next values of x and y being the old value of x decremented by y0:36:08
times dt, where dt is the time step, and the old value of y being incremented by x times dt, giving me the new values0:36:17
of x and y. So now you have a feeling for at least two different kinds of processes that can be evolved by0:36:28
almost the same program. And with a little bit of perturbation analysis like this, how you change a program a little bit and see how the0:36:37
process changes, that's how we get some intuition. Pretty soon we're going to use that intuition to build big, hairy, complicated systems.0:36:46
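The circle-drawing iteration described above can be sketched in Python (the lecture's version is in Lisp; the state variables x and y, the time step dt, and the forward Euler update are as described, while collecting the points in a list is my own addition for illustration):

```python
def circle(x, y, dt, steps):
    # Iterate the system dx/dt = -y, dy/dt = x by forward Euler.
    # The tuple assignment evaluates the right-hand side first, so
    # both updates use the OLD values of x and y, as in the lecture.
    points = []
    for _ in range(steps):
        x, y = x - y * dt, y + x * dt
        points.append((x, y))
    return points

# Starting at (1, 0): dy/dt = 1 (going up), dx/dt = 0, then the point
# curls backwards -- approximately a circle, slowly spiraling outward
# because of the forward Euler integration error.
pts = circle(1.0, 0.0, 0.01, 700)
```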
Thank you. [MUSIC PLAYING BY J.S. BACH]0:37:06
PROFESSOR: Well, you've just seen a simple perturbational analysis of some programs. I took a program that was very similar to another program and looked at them both and saw how they evolved processes.0:37:18
I want to show you some variety by showing you some other processes and shapes they may have. Again, we're going to take very simple things, programs that you wouldn't want to ever write.0:37:29
They would be probably the worst way of computing some of the things we're going to compute. But I'm just going to show you these things for the purpose of feeling out how a program represents itself as the rule0:37:42
for the evolution of a process. So let's consider a fun thing, the Fibonacci numbers. You probably know about the Fibonacci numbers.0:37:53
Somebody, I can't remember who, was interested in the growth of piles of rabbits. And for some reason or other, the piles of rabbits tend to0:38:03
grow exponentially, as we know. And we have a nice model for this process, is that we start with two numbers, zero and one.0:38:13
And then every number after this is the sum of the two previous. So we have here a one. Then the sum of these two is two.0:38:22
The sum of those two is three. The sum of those two is five. The sum of those two is eight. The sum of those two is 13.0:38:31
This is 21. 34. 55. Et cetera.0:38:40
If we start numbering these numbers, say this is the zeroth one, the first one, the second one, the third one, the fourth one, et cetera. This is the 10th one, the 10th Fibonacci number.0:38:51
These numbers grow very fast. Just like rabbits. Why rabbits grow this way I'm not going to hazard a guess. Now, I'm going to try to write for you the very simplest0:39:02
program that computes Fibonacci numbers. What I want is a program that, given an n, will produce for0:39:13
me Fibonacci of n. OK? I'll write it right here.0:39:28
I want the Fibonacci of n, which means the-- this is the n, and this is Fibonacci of n. And here's the story.0:39:38
If n is less than two, then the result is n. Because that's what these are.0:39:47
That's how you start it up. Otherwise, the result is the sum of Fib of n minus one and0:39:58
the Fibonacci number, n minus two.0:40:10
So this is a very simple, direct specification of the description of Fibonacci numbers that I gave you when I introduced those numbers. It represents the recurrence relation in the simplest0:40:21
possible way. Now, how do we use such a thing? Let's draw this process. Let's figure out what this does. Let's consider something very simple by computing0:40:31
Fibonacci of four. To compute Fibonacci of four, what do I do? Well, it says I have--0:40:41
it's not less than two. Therefore it's the sum of two things. Well, in order to compute that I have to compute, then, Fibonacci of three and Fibonacci of two.0:40:57
In order to compute Fibonacci of three, I have to compute Fibonacci of two and Fibonacci of one.0:41:08
In order to compute Fibonacci of two, I have to compute Fibonacci of one and Fibonacci of zero. In order to compute Fibonacci of one, well,0:41:18
the answer is one. That's from the base case of this recursion. And in order to compute Fibonacci of zero, well, that0:41:28
answer is zero, from the same base case. And here is a one. And Fibonacci of two is really the sum of Fibonacci of one0:41:38
and Fib of zero; in order to compute that, I get a one, and here I've got a zero.0:41:47
I've built a tree. Now, we can observe some things about this tree. We can see why this is an extremely bad way to compute0:41:56
Fibonacci numbers. Because in order to compute Fibonacci of four, I had to compute Fibonacci of two's sub-tree twice.0:42:07
In fact, in order to add one more, supposing I want to do Fibonacci of five, what I really have to do then is compute Fibonacci of four plus Fibonacci of three.0:42:18
But Fibonacci of three's sub-tree has already been built. This is a prescription for a process that's0:42:27
exponential in time. To add one, I have to multiply by something because I take a proportion of the existing thing and add it to itself to0:42:38
add one more step. So this is a thing whose time complexity is order of--0:42:48
actually, it turns out to be Fibonacci-- of n. There's a thing that grows exactly as the Fibonacci numbers.0:43:01
It's a horrible thing. You wouldn't want to do it. The reason why the time has to grow that way is because we're presuming in the model-- the substitution model that I gave you, which I'm not doing formally here, I sort of now spit it out in a simple way--0:43:14
but presuming that everything is done sequentially. That every one of these nodes in this tree has to be examined.0:43:24
And so since the number of nodes in this tree grows exponentially, because I add a proportion of the existing nodes to the nodes I already have to add one, then I know0:43:35
I've got an exponential explosion here. Now, let's see if we can think of how much space this takes up.0:43:44
Well, it's not so bad. It depends on how much we have to remember in order to continue this thing running. Well, that's not so hard. It says, gee, in order to know where I am in this tree, I0:43:54
have to have a path back to the root. In other words, in order to-- let's consider the path I would have to execute this. I'd say, oh, yes, I'm going to go down here.0:44:03
I don't care which direction I go. I have to do this. I have to then do this. I have to traverse this tree in a sort of funny way.0:44:12
I'm going to walk this nice little path. I come back to here. Well, I've got to remember where I'm going to be next. I've got to keep that in mind. So I have to know what I've done.0:44:21
I have to know what's left. In order to compute Fibonacci of four, at some point I'm going to have to be down here. And I have to remember that I have to go back and then go0:44:32
back to here to do an addition. And then go back to here to do an addition to something I haven't touched yet. The amount of space that takes up is the path, the longest path.0:44:42
How long it is. And that grows as n. So the space-- because that's the length of the deepest0:44:53
line through the tree-- the space is order of n. It's a pretty bad process.0:45:09
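The Fibonacci procedure can be sketched in Python (the lecture writes it in Lisp; count_calls is my own helper, not from the lecture, counting the nodes of the call tree to exhibit the exponential time, while the depth of the tree -- the longest path -- is what gives the order-n space):

```python
def fib(n):
    # The simplest -- and worst -- way: a direct transcription of
    # the recurrence, recomputing whole sub-trees.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def count_calls(n):
    # Number of nodes in the tree of calls made by fib(n).
    # This count itself grows like the Fibonacci numbers, i.e.
    # exponentially in n, while the depth of the tree is only n.
    if n < 2:
        return 1
    return 1 + count_calls(n - 1) + count_calls(n - 2)

print(fib(10))          # 55, the 10th Fibonacci number
print(count_calls(10))  # far more calls than the answer is worth
```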
Now, one thing I want to see from this is a feeling of what's going on here. Why are there-- how is this program related to this process?0:45:20
Well, what are we seeing here? There really are only two sorts of things this program does. This program consists of two rules, if you will.0:45:29
One rule that says Fibonacci of n is this sum that you see over here, which is a node that's shaped like this.0:45:42
It says that I break up something into two parts. Under some condition over here that n is greater than two,0:45:52
then the node breaks up into two parts. Less than two. No. Greater than two. Yes.0:46:01
The other possibility is that I have a reduction that looks like this. And that's this case.0:46:10
If it's less than two, the answer is n itself. So what we're seeing here is that the process that got built locally at every place is an instance of this rule.0:46:22
Here's one instance of the rule. Here is another instance of the rule. And the reason why people think of programming as being hard, of course, is because you're writing down a general0:46:32
rule, which is going to be used for lots of instances, that a particular instance-- it's going to control each particular instance for you.0:46:43
You've got to write down something that's a general in terms of variables, and you have to think of all the things that could possibly fit in those variables, and all those have to lead to the process you want to work.0:46:53
Locally, you have to break up your process into things that can be represented in terms of these very specific local rules.0:47:03
Well, let's see. Fibonaccis are, of course, not much fun. Yes, they are. You get something called the golden ratio, and we may even0:47:12
see a lot of that some time. Well, let's talk about another thing. There's a famous game called the Towers of Hanoi, because I want to teach you how to think about these recursively.0:47:24
The problem is this one: I have a bunch of disks, I have a bunch of spikes, and it's rumored that somewhere in the0:47:34
Orient there is a 64-high tower, and the job of various monks or something is to move these disks among the spikes in some complicated pattern so that eventually0:47:43
I move all of the disks from one spike to the other. And if it's 64 high, and it's going to take two to the 64th0:47:54
moves, then it's a long time. They claim that the universe ends when this is done.0:48:03
Well, let's see. The way in which you would construct a recursive process is by wishful thinking. You have to believe.0:48:14
So, the idea. Supposing I want to move this pile from here to here, from spike one to spike two, well, that's not so hard.0:48:25
See, supposing somehow, by some magic-- because I've got a simpler problem-- I move a three-high pile to here-- I can only move one disk at a time, so never mind how I did it. But supposing I could do that, well, then I could just pick0:48:37
up this disk and move it here. And now I have a simple problem. I have to move a three-high tower to here, which is no problem.0:48:46
So by two moves of a three high tower plus one move of a single object, I can move the tower from here to here.0:48:55
Now, whether or not-- this is not obvious in any deep way that this works. And why?0:49:04
Now, why is it the case that I can presume, maybe, that I can move the three-high tower? Well, the answer is because I'm always counting down, and0:49:14
eventually I get down to zero-high tower, and a zero-high tower requires no moves. So let's write the algorithm for that.0:49:24
Very easy. I'm going to label these towers with numbers, but it doesn't matter what they're labelled with. And the problem is to move an n-high tower from a spike0:49:35
called From to a spike called To with a particular spike called Spare. That's what we're going to do.0:49:50
Using the algorithm I informally described to you, move an n-high tower from From to To with a Spare.0:50:06
Well, I've got two cases, and this is a case analysis, just like it is in all the other things we've done.0:50:20
If n is zero, then-- I'm going to put out some answers-- Done, we'll say. I don't know what that means.0:50:29
Because we'll never use that answer for anything. We're going to do these moves. Else. I'm going to do a move.0:50:40
Move a tower of height one less than n-- the decrement of n. Now, I'm going to move it to the Spare tower.0:50:51
The whole idea now is to move this from here to here, to the Spare tower-- so from From to Spare--0:51:03
using To as a spare tower. Later, somewhere later, I'm going to move that same n-high0:51:14
tower, after I've done this. Going to move that same n minus one-high tower from the Spare tower to the To tower using the0:51:24
From tower as my spare. So the Spare tower to the To tower using0:51:40
the From as the spare. All I have to do now is when I've gotten it in this0:51:51
condition, between these two moves of a whole tower-- I've got it into that condition-- now I just have to move one disk.0:52:03
So I'm going to say that something prints a move-- I don't care how it works-- from From to To.0:52:17
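The Towers of Hanoi procedure can be sketched in Python (the lecture's version is in Lisp and prints each move; collecting the moves in a list instead is my own variation for illustration):

```python
def move(n, source, dest, spare, moves=None):
    # Move an n-high tower from `source` to `dest`, using `spare`.
    if moves is None:
        moves = []
    if n == 0:
        return moves  # "Done": a zero-high tower requires no moves.
    move(n - 1, source, spare, dest, moves)  # set the n-1 tower aside
    moves.append((source, dest))             # move the bottom disk
    move(n - 1, spare, dest, source, moves)  # put the n-1 tower back on top
    return moves

# The lecture's example: a four-high tower from spike 1 to spike 2,
# using spike 3 as the spare -- 2^4 - 1 = 15 moves in all.
for src, dst in move(4, 1, 2, 3):
    print(f"{src} -> {dst}")
```

Like the Fibonacci procedure, this evolves an exponential tree process: moving an n-high tower takes two to the n minus one single-disk moves.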
Now, you see the reason why I'm bringing this up at this moment is this is an almost identical program to this one in some sense.0:52:26
It's not computing the same mathematical quantity, it's not exactly the same tree, but it's going to produce a tree. The general way of making these moves is going to lead0:52:38
to an exponential tree. Well, let's do this four-high. I have my little crib sheet here otherwise I get confused.0:52:54
Well, what I'm going to put in is the question of moving a tower of height four from spike one to spike two using spike three0:53:10
as a spare. That's all I'm really going to do. You know, let's just do it. I'm not going to worry about writing out the trace of this. You can do that yourself because it's very simple.0:53:21
I'm going to move a disk from spike one to spike three. And how do I get to move that disk from one to three? How do I know that? Well, I suppose I have to look at the trace a little bit.0:53:32
What am I doing here? Well, n is not zero. So I'm going to look down here. This is going to require doing two moves.0:53:41
I'm only going to look at the first one. It's going to require moving-- why do I have move tower? It makes it harder for me to move.0:53:52
I'm going to move a three-high tower from the from place, which is four, to the spare, which is two,0:54:04
using three as my-- no, using from--0:54:15
STUDENT: [INAUDIBLE PHRASE]. PROFESSOR: Yes. I'm sorry. From two-- from one to three using two as my spare.0:54:26
That's right. And then there's another move over here afterwards. So now I say, oh, yes, that requires me moving a two-high0:54:37
tower from one to two using three as a spare. And so on, the same way-- that's going to require me moving a one-high tower from one to three0:54:52
using two as a spare. Well, and then there's lots of other things to be done.0:55:03
So I move my one-high tower from one to three using two as a spare, which I didn't do anything with. Well, this thing just proceeds very simply.0:55:15
I move this from one to two. And I move this disk from three to two. And I don't really want to do it, but I move from one to three.0:55:24
Then I move two to one. Then I move two to three. Then one to three.0:55:36
One to two. Three to two. Three to one. This all got worked out beforehand, of course.0:55:46
Two to one. Three to two. One to three. STUDENT: [INAUDIBLE PHRASE]. PROFESSOR: Oh, one to three.0:55:55
Excuse me. Thank you. One to two. And then three to two. Whew.0:56:04
Now what I'd like you to think about, you just saw a recursive algorithm for doing this, and it takes exponential time, of course. Now, there's no algorithm that doesn't take exponential time-- it has to.0:56:14
Since I'm doing one operation at a time-- I can only move one disk at a time-- there's no algorithm that's not going to take exponential time. But can you write an iterative algorithm rather than a0:56:24
recursive algorithm for doing this? One of the sort of little things I like to think about.0:56:33
Can you write one that, in fact, doesn't break this problem into two sub-problems the way I described, but rather proceeds a step at a time using a more local rule?0:56:48
That might be fun. Thank you so much for the third segment. Are there questions?0:56:57
STUDENT: [INAUDIBLE] a way to reduce a tree recursion problem, how do you save the intermediate work you have done0:57:06
in computing the Fibonacci number? PROFESSOR: Oh, well, in fact, one of the ways to do it is what you just said. You said, I save the intermediate work.0:57:16
OK? Well, let me tell you-- this, again, we'll see later-- but suppose it's the case that anytime I compute anything, any one of these Fibonacci numbers, I remember it in a table0:57:28
so that it takes only a quick lookup to get the answer. Then if I ever see it again, instead of expanding the exponential tree, I look it up.0:57:37
I've just transformed my problem into a problem that's much simpler. Now, of course, there are other ways to do this, as well. That one's called memoization, and you'll see it sometime0:57:47
later in this term. But there's also a very simple linear-time, and, in fact, iterative method for computing Fibonaccis, and0:57:57
that's another thing you should sit down and work out. That's important. It's important to see how to do this. I want you to practice.0:00:00
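The linear-time, iterative Fibonacci the professor suggests working out might look like this sketch: carry the last two values forward on every step instead of re-expanding the tree.

```scheme
;; An iterative, linear-time Fibonacci: A and B hold successive
;; Fibonacci numbers, and COUNT counts down to zero.
(define (fib n)
  (define (iter a b count)
    (if (= count 0)
        a
        (iter b (+ a b) (- count 1))))
  (iter 0 1 n))
```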
Lecture 2A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:25
PROFESSOR: Well, yesterday was easy. You learned all of the rules of programming and lived. Almost all of them.0:00:34
And so at this point, you're now certified programmers-- it says. However, I suppose what we did is we, ahh, sort of lulled you0:00:48
a little bit into an easy state. Here, you still believe it's possible that this might be programming in BASIC or Pascal with just a funny syntax.0:00:59
Today, that illusion ends-- you can no longer support that belief. What we're going to do today is going to completely smash that.0:01:08
So let's start out by writing a few programs on the blackboard that have a lot in common with each other. What we're going to do is try to capture them in abstractions that0:01:19
are not easy to make in most languages. Let's start with some very simple ones that you can make in most languages.0:01:28
Supposing I want to write the mathematical expression which adds up a bunch of integers. So if I wanted to write down and say the sum from i0:01:38
equal a to b on i. Now, you know that that's an easy thing to compute in a closed form for it, and I'm not interested in that. But I'm going to write a program that0:01:47
adds up those integers. Well, that's rather easy to do to say I want to define the0:01:57
sum of the integers from a to b to be--0:02:08
well, it's the following two possibilities. If a is greater than b, well, then there's nothing to be0:02:17
done and the answer is zero. This is how you're going to have to think recursively. You're going to say if I have an easy case that I know the answer to, just write it down.0:02:26
Otherwise, I'm going to try to reduce this problem to a simpler problem. And maybe in this case, I'm going to make a subproblem of the simpler problem and then do something to the result.0:02:35
So the easiest way to do this is say that I'm going to add the index, which in this case is a, to the result of adding0:02:46
up the integers from a plus 1 to b.0:03:02
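Written out, the blackboard program reads as a sketch like this:

```scheme
;; The first blackboard program: add up the integers from A to B.
(define (sum-int a b)
  (if (> a b)
      0                             ; easy case: empty range, answer is zero
      (+ a (sum-int (+ a 1) b))))  ; A plus the sum of the integers from A+1 to B
```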
Now, at this point, you should have no trouble looking at such a definition. Indeed, coming up with such a thing might be a little hard in synthesis, but being able to read it at this point0:03:12
should be easy. And what it says to you is, well, here is the subproblem I'm going to solve. I'm going to try to add up the integers, one fewer integer0:03:24
than I added up for the whole problem. I'm adding up one fewer, and that subproblem, once I've solved it, I'm going to add a to that, and that will0:03:35
be the answer to this problem. And the simplest case, I don't have to do any work. Now, I'm also going to write down another simple one just0:03:44
like this, which is the mathematical expression, the sum of the square from i equal a to b.0:03:55
And again, it's a very simple program.0:04:11
And indeed, it starts the same way. If a is greater than b, then the answer is zero.0:04:21
And, of course, we're beginning to see that there's something wrong with me writing this down again. It's the same program. It's the sum of the square of a and the sum of the squares from0:04:42
the increment of a to b. Now, if you look at these things, these programs are0:04:54
almost identical. There's not much to distinguish them. They have the same first clause of the conditional and0:05:03
the same predicate and the same consequence, and the alternatives are very similar, too. They only differ by the fact that where here I have a,0:05:15
here, I have the square of a. The only other difference, though this one's sort of inessential, is that the name of this procedure is sum int, whereas0:05:25
the name of that procedure is sum square. So the things that vary between these two are very small. Now, wherever you see yourself writing the same thing down0:05:36
more than once, there's something wrong, and you shouldn't be doing it. And the reason is not because it's a waste of time to write something down more than once.0:05:45
It's because there's some idea here, a very simple idea, which has to do with the sigma notation--0:05:54
this much-- not depending upon what it is I'm adding up. And I would like to be able to--0:06:03
always, whenever trying to make complicated systems and understand them, it's crucial to divide the things up into as many pieces as I can, each of which I understand separately.0:06:13
I would like to understand the way of adding things up independently of what it is I'm adding up so I can do that having debugged it once and understood it once and having0:06:24
been able to share that among many different uses of it. Here, we have another example. This is Leibniz's formula for finding pi over 8.0:06:40
It's a funny, ugly mess. What is it? It's something like 1 over 1 times 3 plus 1 over 5 times 70:06:50
plus 1 over 9 times 11 plus-- and for some reason, things like this tend to have0:06:59
interesting values like pi over 8. But what do we see here? It's the same program or almost the same program. It's a sum.0:07:09
So we're seeing the sigma notation, although over here, we're dealing with incrementing by 4, so it's a slightly different problem, which means that over here, I0:07:20
have to change a by 4, as you see right over here. It's not by 1. The other thing, of course, is that the thing that's0:07:31
represented by square in the previous sum of squares, or a when adding up the integers. Well, here, I have a different thing I'm adding up, a different term, which is 1 over a times a plus 2.0:07:44
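Transcribed, the other two blackboard programs make the shared pattern plain; only the term and the way of stepping the index differ. A sketch:

```scheme
(define (square x) (* x x))

;; Sum of the squares of the integers from A to B.
(define (sum-sq a b)
  (if (> a b)
      0
      (+ (square a) (sum-sq (+ a 1) b))))

;; Leibniz's series for pi/8: 1/(1*3) + 1/(5*7) + 1/(9*11) + ...
;; The term is 1/(a*(a+2)), and the index steps by 4.
(define (pi-sum a b)
  (if (> a b)
      0
      (+ (/ 1 (* a (+ a 2))) (pi-sum (+ a 4) b))))
```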
But the rest of this program is identical. Well, any time we have a bunch of things like this that are identical, we're going to have to come up with some sort of0:07:53
abstraction to cover them. If you think about this, what you've learned so far is the rules of some language, some primitive, some means of0:08:03
combination, almost all of them, the means of abstraction, almost all of them. But what you haven't learned is common patterns of usage.0:08:13
Now, most of the time, you learn idioms when learning a language, which are common patterns that mean things that are useful to know in a flash. And if you build up a great number of them, if you're a0:08:22
FORTRAN programmer, of course, everybody knows how to-- what do you do, for example, to get an integer which is the biggest integer in something.0:08:31
It's a classic thing. Every FORTRAN programmer knows how to do that. And if you don't know that, you're in real hot water because it takes a long time to think it out. However, one of the things you can do in this language that0:08:41
we're showing you is not only do you know something like that, but you give the knowledge of that a name. And so that's what we're going to be going after right now.0:08:53
OK, well, let's see what these things have in common. Right over here we have what appears to be a general0:09:02
pattern, a general pattern which covers all of the cases we've seen so far. There is a sum procedure, which is being defined.0:09:15
It has two arguments, which are a lower bound and an upper bound. The lower bound is tested to be greater than the upper bound, and if it is greater, then the result is zero.0:09:27
Otherwise, we're going to do something to the lower bound, which is the index of the summation, and add that result to the result of following the procedure0:09:40
recursively on our lower bound incremented by some next operation with the same upper bound as I had before.0:09:53
So this is a general pattern, and what I'd like to do is be able to name this general pattern a bit.0:10:03
Well, that's sort of easy, because one of the things I'm going to do right now is-- there's nothing very special about numbers. Numbers are just one kind of data.0:10:14
It seems to me perfectly reasonable to give all sorts of names to all kinds of data, for example, procedures.0:10:23
And now, many languages allow you to have procedural arguments, and right now, we're going to talk about procedural arguments. They're very easy to deal with. And shortly, we'll do some remarkable things that are not0:10:33
like procedural arguments. So here, we'll define our sigma notation.0:10:43
This is called sum and it takes a term, an A, a next0:10:55
term, and B as arguments. So it takes four arguments, and there was nothing particularly special about me writing this in lowercase.0:11:06
I hope that it doesn't confuse you, so I'll write it in uppercase right now. The machine doesn't care. But these two arguments are different.0:11:17
These are not numbers. These are going to be procedures for computing something given a number. Term will be a procedure which, when given an index,0:11:26
will produce the value of the term for that index. Next will be given an index, which will produce the next index. This will be for counting.0:11:36
And it's very simple. It's exactly what you see. If A is greater than B, then the result is 0.0:11:52
Otherwise, it's the sum of term applied to A and the sum0:12:04
of term, the next index, next, and B.0:12:14
Let me write it this way.0:12:29
Now, I'd like you to see something, first of all. I was writing here, and I ran out of space. What I did is I start indenting according to the0:12:38
Pretty-printing rule, which says that I align all of the arguments of the procedure so I can see which ones go together.0:12:47
And this is just something I do automatically, and I want you to learn how to do that, too, so your programs can be read and understood. However, what do we have here?0:12:57
We have four arguments: the procedure, the lower index-- lower bound index-- the way to get the next index, and the upper bound.0:13:09
What's passed along on the recursive call is indeed the same procedure because I'm going to need it again, the0:13:18
next index, which is using the next procedure to compute it, the procedure for computing next, which I also have to have separately, and that's different. The procedure for computing next is different from the0:13:27
next index, which is the result of using next on the last index. And I also have to pass along the upper bound.0:13:37
So this captures both of these and the other nice program that we are playing with.0:13:47
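The general pattern, transcribed as a sketch: the varying parts are passed in as procedures. TERM computes each addend from an index, and NEXT computes the following index.

```scheme
;; The sigma abstraction: TERM and NEXT are procedural arguments.
(define (sum term a next b)
  (if (> a b)
      0
      (+ (term a)                        ; do something to the lower bound
         (sum term (next a) next b))))   ; recur on the incremented index
```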
So using this, we can write down the original programs as instances of sum very simply.0:14:08
Sum int of A and B. Well, I'm going to need an identity procedure0:14:17
here because, ahh, the sum of the integers requires me to, in0:14:29
this case compute a term for every integer, but the term procedure doesn't want to do anything to that integer. So the identity procedure on A is A or X or whatever, and I0:14:41
want to say the sum, using identity as the term procedure0:14:52
and A as the initial index and the incrementer being the way to get the next index and B being the high0:15:05
bound, the upper bound. This procedure does exactly the same as the sum of the integers over here, computes the same answer.0:15:17
Now, one thing you should see, of course, is that there's nothing very special over here about what I used as the formal parameter. I could have, for example, written this0:15:27
X. It doesn't matter. I just wanted you to see that this name does not conflict with this one at all. It's an internal name.0:15:37
For the second procedure here, the sum of the squares, it's even a little bit easier.0:15:53
And what do we have to do? Nothing more than add up the squares, this is the procedure0:16:02
that each index will be given, will be given each-- yes. Each index will have this done to it to get the term. That's the thing that maps against term over here.0:16:13
Then I have A as the lower bound, the incrementer as the next term method, and B as the upper bound.0:16:26
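In full, the two definitions might read as follows; the SUM procedure from the blackboard is repeated here so the sketch is self-contained, and the name 1+ for the incrementer follows the lecture's usage.

```scheme
;; The general pattern, as on the blackboard.
(define (sum term a next b)
  (if (> a b)
      0
      (+ (term a) (sum term (next a) next b))))

(define (identity x) x)
(define (1+ n) (+ n 1))       ; the incrementer
(define (square x) (* x x))

;; Sum of the integers: identity is the term procedure.
(define (sum-int a b)
  (sum identity a 1+ b))

;; Sum of the squares: square is the term procedure.
(define (sum-sq a b)
  (sum square a 1+ b))
```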
And finally, just for the thing that we did about pi sums, pi sums are sort of-- well, it's even easier to think about them this way0:16:35
because I don't have to think. What I'm doing is separating the thing I'm adding up from the method of doing the addition. And so we have here, for example, pi sum of A and B0:16:57
as the sum of things. I'm going to write the term procedure here explicitly without giving it a name. This is done anonymously.0:17:07
I don't necessarily have to give a name to something if I just want to use it once. And, of course, I can write sort of an expression that0:17:18
produces a procedure. I'm going to write the Greek lambda letter here instead of L-A-M-B-D-A in general to avoid taking up a lot of space on blackboards.0:17:27
But unfortunately, we don't have lambda keys on our keyboards. Maybe we can convince our friends in the computer industry that this is important. Lambda of i is the quotient of 1 and the product of i and the0:17:43
sum of i and 2, starting at a with the way of incrementing being0:17:58
that procedure of an index i, which adds i to 4, and b being0:18:08
the upper bound. So you can see that this notation, the invention of the0:18:17
procedure that takes a procedural argument, allows us to compress a lot of these procedures into one thing.0:18:26
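The pi sum with its anonymous procedures might read like this sketch (SUM is repeated so it stands on its own):

```scheme
(define (sum term a next b)
  (if (> a b)
      0
      (+ (term a) (sum term (next a) next b))))

;; Both the term and the stepper written anonymously with lambda.
(define (pi-sum a b)
  (sum (lambda (i) (/ 1 (* i (+ i 2))))  ; term: 1/(i*(i+2))
       a
       (lambda (i) (+ i 4))              ; next: step the index by 4
       b))
```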
This procedure, sums, covers a whole bunch of ideas. Now, just why is this important? I tried to say before that it helps us divide a problem into0:18:37
two pieces, and indeed, it does, for example, if someone came up with a different way of implementing this, which,0:18:46
of course, one might. Here, for example, is an iterative implementation of sum.0:18:55
An iterative implementation for some reason might be better than the recursive implementation. But the important thing is that it's different.0:19:06
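Such an iterative implementation might be sketched like this; it has the same interface, so callers like the pi sum need not change at all:

```scheme
;; An iterative SUM: accumulate the answer in ANS while stepping
;; the index J from A up past B. Same interface, different process.
(define (sum term a next b)
  (define (iter j ans)
    (if (> j b)
        ans
        (iter (next j) (+ (term j) ans))))
  (iter a 0))
```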
Now, supposing I had written my program this way that you see on the blackboard on the left. That's correct, the left.0:19:17
Well, then if I want to change the method of addition, then I'd have to change each of these. Whereas if I write them like this that you see here, then0:19:30
the method by which I did the addition is encapsulated in the procedure sum. That decomposition allows me to independently change one part of the program and improve it perhaps without changing0:19:43
the other part that was written for some of the other cases. Thank you. Are there any questions?0:19:52
Yes, sir. AUDIENCE: Would you go over next A and next again on-- PROFESSOR: Yes. It's the same problem. I'm sure you're going to-- you're going to have to work on this. This is hard the first time you've ever seen0:20:01
something like this. What I have here is a-- procedures can be named by variables.0:20:10
Procedures are not special. Actually, sum square is a variable, which has gotten a value, which is a procedure. This is define sum square to be0:20:20
lambda of A and B something. So the procedure can be named. Therefore, they can be passed from one to another, one procedure to another, as arguments.0:20:31
Well, what we're doing here is we're passing the procedure term as an argument to sum, just passing it around in the next recursive call.0:20:41
Here, we're passing the procedure next as an argument also. However, here we're using the procedure next.0:20:50
That's what the parentheses mean. We're applying next to A to get the next value of A. If you look at what next is mapped against, remember that0:20:59
the way you think about this is that you substitute the arguments for the formal parameters in the body. If you're ever confused, think of the thing that way.0:21:10
Well, over here, with sum of the integers, I substitute identity for term and 1 plus, the0:21:21
incrementer for next in the body. Well, the identity procedure on A is what I get here.0:21:30
Identity is being passed along, and here, I have increment 1 plus being applied to A and 1 plus is being0:21:41
passed along. Does that clarify the situation? AUDIENCE: We could also define explicitly those two functions, then pass them.0:21:51
PROFESSOR: Sure. What we can do is we could have given names to them, just like I did here. In fact, I gave you various ways so you could see it, a variety. Here, I define the thing which I passed the name of.0:22:05
I referenced it by its name. But the thing is, in fact, that procedure of one argument X, which is X. And the identity procedure is just0:22:14
lambda of X X. And that's what you're seeing here. Here, I happened to just write its canonical name there for0:22:26
you to see. Is it OK if we take our five-minute break?0:23:15
As I said, computers are to make people happy, not people to make computers happy. And for the most part, the reason why we introduce all this abstraction stuff is to make it so that programs can0:23:26
be more easily written and more easily read. Let's try to understand what's the most complicated program we've seen so far using a little bit of0:23:36
this abstraction stuff. If you look at the slide, this is Heron of Alexandria's method of computing square roots that we saw yesterday.0:23:51
And let's see. Well, in any case, this program is a little0:24:00
complicated. And at the current state of your thinking, you just can't look at that and say, oh, this obviously means something very clear.0:24:10
It's not obvious from looking at the program what it's computing. There's some loop here inside try, and a loop does something0:24:21
about trying the improvement of y. There's something called improve, which does some0:24:30
averaging and quotienting and things like that. But what's the real idea? Can we make it clear what the idea is? Well, I think we can.0:24:41
I think we can use abstraction that we have learned about so far to clarify what's going on. Now, what we have mathematically is a procedure0:24:54
for improving a guess for square roots. And if y is a guess for a square root, then what we want to get we'll call a function f.0:25:04
This is the means of improvement. I want to get y plus x/y over 2, so the average of y and x0:25:17
divided by y as the improved value for the square root of x such that-- one thing you can notice about this function f0:25:27
is that f of the square root of x is in fact the0:25:36
square root of x. In other words, if I take the square root of x and substitute it for y here, I see the square root of x plus x divided by the square root of x, all over 2.0:25:47
That's 2 times the square root of x divided by 2, which is the square root of x. So, in fact, what we're really looking for is a fixed point, a fixed point of the function f.0:26:17
A fixed point is a place which has the property that if you put it into the function, you get the same value out.0:26:27
Now, I suppose if I were giving some nice, boring lecture, and you happened to have in front of you an HP-35 desk calculator like I used to have when I0:26:36
went to boring lectures. And if you think it was really boring, you put it into radians mode, and you hit cosine, and you hit cosine, and you hit cosine.0:26:45
And eventually, you end up with 0.734 or something like that. 0.743, I don't remember what exactly, and it gets closer and closer to that.0:26:54
Some functions have the property that you can find their fixed point by iterating the function, and that's0:27:03
essentially what's happening in the square root program by Heron's method. So let's see if we can write that down, that idea.0:27:14
Now, I'm not going to say how I compute fixed points yet. There might be more than one way. But the first thing to do is I'm going to say what I just said.0:27:24
I'm going to say it specifically, the square root. The square root of x is the fixed point of that procedure0:27:48
which takes an argument y and averages of x0:27:59
divided by y with y. And we're going to start up with the initial guess for the0:28:08
fixed point of 1. It doesn't matter much where it starts-- that's a theorem having to do with square roots.0:28:18
So what you're seeing here is I'm just trying to write out by wishful thinking. I don't know how I'm going to make fixed point happen. We'll worry about that later. But if somehow I had a way of finding the fixed point of the0:28:29
function computed by this procedure, then I would have-- that would be the square root that I'm looking for.0:28:39
OK, well, now let's see how we're going to write-- how we're going to come up with fixed points. Well, it's very simple, actually. I'm going to write an abbreviated version here just so we understand it.0:29:00
I'm going to find the fixed point of a function f-- actually, the fixed point of the function computed by the procedure whose name will be f in this procedure.0:29:09
How's that? A long sentence-- starting with a particular starting value.0:29:19
Well, I'm going to have a little loop inside here, which is going to push the button on the calculator repeatedly, hoping that it will eventually converge.0:29:28
And we will say here internal loops are written by defining internal procedures.0:29:39
Well, one thing I'm going to have to do is I'm going to have to say whether I'm done. And the way I'm going to decide when I'm done is when the old value and the new value are close enough so I can't distinguish them anymore.0:29:50
That's the standard thing you do on the calculator unless you look at more precision, and eventually, you run out of precision. So the old value and new value, and I'm going to say0:30:06
here: if I can't distinguish them-- if they're close enough, and we'll have to worry about what that is soon--0:30:20
if the old value and the new value are close enough to each other, let's pick the new value as the answer. Otherwise, I'm going to iterate around again with the0:30:33
next value of old being the current value of new and the next value of new being the result of calling f on new.0:30:54
And so this is my iteration loop that pushes the button on the calculator. I basically think of it as having two registers on the calculator: old and new. And in each step, new becomes old, and new gets F of new.0:31:09
So this is the thing where I'm getting the next value. And now, I'm going to start this thing up0:31:20
by giving two values. I wrote down on the blackboard to be slow0:31:30
so you can see this. This is the first time you've seen something quite this complicated, I think. However, we might want to see the whole thing over here in0:31:44
this transparency or slide or whatever. What we have is all of the details that are required to0:31:57
make this thing work. I have a way of getting a tolerance for a close enough procedure, which we see here. The close enough procedure, it tests whether u and v are0:32:06
close enough by seeing if the absolute value of the difference in u and v is less than the given tolerance, OK? And here is the iteration loop that I just wrote on the blackboard and the initialization for it, which0:32:17
is right there. It's very simple.0:32:34
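Putting the pieces together, the whole thing on the slide reads roughly like this sketch; the tolerance value and the spelling close-enuf? are assumptions.

```scheme
(define tolerance 0.00001)  ; assumed tolerance

;; Can we still distinguish U and V at this precision?
(define (close-enuf? u v)
  (< (abs (- u v)) tolerance))

(define (average a b) (/ (+ a b) 2))

;; Push the button on the calculator repeatedly: two registers,
;; OLD and NEW; each step, NEW becomes OLD and NEW gets (F NEW).
(define (fixed-point f start)
  (define (iter old new)
    (if (close-enuf? old new)
        new
        (iter new (f new))))
  (iter start (f start)))

;; Square root as the fixed point of y -> average(x/y, y).
(define (sqrt x)
  (fixed-point (lambda (y) (average (/ x y) y))
               1.0))
```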
But let's see. I haven't told you enough. It's actually easier than this. There is more structure to this problem than I've already told you.0:32:43
Like why should this work? Why should it converge? There's a hairy theorem in mathematics tied up in what I've written here.0:32:52
Why is it that I should assume that by iterating averaging the quotient of x and y and y that I should get the right answer? It isn't so obvious.0:33:03
Surely there are other things, other procedures, which compute functions whose fixed points would also be the square root.0:33:12
For example, the obvious one will be a new function g, which maps y to x/y.0:33:27
That's even simpler. The fixed point of g is surely the square root also, and it's a simpler procedure.0:33:37
Why am I not using it? Well, I suppose you know. Supposing x is 2 and I start out with 1, and if I divide 1 into 2, I get 2.0:33:47
And then if I divide 2 into 2, I get 1. If I divide 1 into 2, I get 2, and 2 into 2, I get 1, and I never get any closer to the square root. It just oscillates.0:33:59
So what we have is a signal processing system, an electrical circuit which is oscillating, and I want to damp out these oscillations.0:34:10
Well, I can do that. See, what I'm really doing here when I'm taking my average, the average is averaging the last two values of something which oscillates, getting something in between.0:34:21
That's the classic way of damping out oscillations in a signal processing system. So why don't we write down the strategy that I just said in a0:34:31
more clear way? Well, that's easy enough. I'm going to define the square root of x to be a fixed point0:34:53
of the procedure resulting from average damping. So I have a procedure resulting from average damp of0:35:10
the procedure, that procedure of y, which divides x by y0:35:24
starting out at 1. Ah, but average damp is a special procedure that's going0:35:33
to take a procedure as its argument and return a procedure as its value. It's a generalization that says given a procedure, it's0:35:42
the thing which produces a procedure which averages the values before and after running the procedure.0:35:51
You can use it for anything if you want to damp out oscillations. So let's write that down. It's very easy.0:36:00
And stylistically here, I'm going to use lambda notation because it's much easier to think, when you're dealing with procedures that manipulate procedures, to understand that the procedures are the objects I'm dealing with, so I'm going0:36:11
to use lambda notation here. Not always. I don't always use it, but very specifically here to expand on that idea, to elucidate it.0:36:28
Well, average damp is a procedure, which takes a procedure as its argument, which we will call f.0:36:37
And what does it produce? It produces as its value-- the body of this procedure is a thing which produces a procedure-- the constructor of the procedure is right here-- of0:36:47
one argument x, which averages f of x with x.0:37:10
This is a very special thing. I think for the first time you're seeing a procedure which produces a procedure as its value.0:37:21
This procedure takes the procedure f and does something to it to produce a new procedure of one argument x, which averages f--0:37:31
this f-- applied to x and x itself. Using the context here, I apply average damping to the0:37:40
procedure, which just divides x by y. It's a division. And I'm finding the fixed point of that, and that's a clearer0:37:51
way of writing down what I wrote down over here, wherever it was, because it tells why I am writing it down.0:38:07
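Average damp itself is tiny. A sketch in the lambda notation used on the blackboard: it takes a procedure F and produces the procedure of one argument X that averages F of X with X.

```scheme
(define (average a b) (/ (+ a b) 2))

;; A procedure that takes a procedure as its argument
;; and produces a procedure as its value.
(define average-damp
  (lambda (f)
    (lambda (x) (average (f x) x))))
```

So, for instance, ((average-damp (lambda (y) (/ 2 y))) 1) averages 2 with 1 rather than oscillating between them.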
I suppose this to some extent really clarifies what Heron of Alexandria was up to. I suppose I'll stop now. Are there any questions?0:38:18
AUDIENCE: So when you define average damp, don't you need to have a variable on f? PROFESSOR: Ah, the question was, and here we're having--0:38:28
again, you've got to learn about the syntax. The question was when defining average damp, don't you have to have a variable defined with f?0:38:38
What you are asking about is the formal parameter of f? AUDIENCE: Yeah. PROFESSOR: OK. The formal parameter of f is here. The formal parameter of f--0:38:47
AUDIENCE: The formal parameter of average damp. PROFESSOR: F is being used to apply it to an argument, right? It's indeed true that f must have a formal parameter.0:38:57
Let's find out what f's formal parameter is. AUDIENCE: The formal parameter of average damp. PROFESSOR: Oh, f is the formal parameter of average damp. I'm sorry. You're just confusing a syntactic thing.0:39:07
I could have written this the other way. Actually, I didn't understand your question. Of course, I could have written it this other way.0:39:19
Those are identical notations. This is a different way of writing this.0:39:31
You're going to have to get used to lambda notation because I'm going to use it. What it says here, I'm defining the name average damp0:39:40
to name the procedure of one argument f. That's the formal parameter of the procedure average damp.0:39:49
What define does is it says give this name a value. Here is the value for it.0:40:01
That there happens to be a funny syntax to make that easier in some cases is purely convenience.0:40:10
But the reason why I wrote it this way here is to emphasize that I'm dealing with a procedure that takes a procedure as its argument and produces a procedure as its value.0:40:23
AUDIENCE: I don't understand why you use lambda twice. Can you just use one lambda and take two arguments f and x? PROFESSOR: No. AUDIENCE: You can't? PROFESSOR: No, that would be a different thing.0:40:32
If I were to write the procedure lambda of f and x, the average of f of x and x, that would not be something which would be allowed to take a procedure as an argument and0:40:42
produce a procedure as its value. That would be a thing that takes a procedure and a number as its arguments and produces a new number. But what I'm producing here is a procedure to fit in the0:40:53
procedure slot over here, which is going to be used over here. So the number has to come from here. This is the thing that's going to eventually end up in the x.0:41:04
And if you're confused, you should do some substitution and see for yourself. Yes? AUDIENCE: Will you please show the definition for average0:41:15
damp without using lambda notation in both cases. PROFESSOR: I can't make a very simple one like that. Let me do it for you, though. I can get rid of this lambda easily.0:41:26
I don't want to be-- actually, I'm lying to you. I don't want to do what you want because I think it's more0:41:37
confusing than you think. I'm not going to write what you want.0:41:55
So we'll have to make up a name: define FOO of x to be the average of F of x and x, and return as a value FOO.0:42:17
This is equivalent, but I've had to make an arbitrary name up. This is equivalent to this without any lambdas.0:42:26
Lambda is very convenient for naming anonymous procedures. It's the anonymous name of something. Now, if you really want to know a cute way of doing this,0:42:39
we'll talk about it later. We're going to have to define the anonymous procedure. Any other questions?0:42:49
And so we go for our break again.0:43:31
So now we've seen how to use what are called higher-order procedures. That's procedures that take procedural arguments and produce procedural values to help us clarify and abstract0:43:43
some otherwise complicated processes. I suppose what I'd like to do now is have a bit of fun with that and sort of a little practice as well.0:43:54
So let's play with this square root thing even more. Let's elaborate it and understand what's going on and make use of this kind of programming style.0:44:04
One thing that you might know is that there is a general method called Newton's method the purpose of which is to find the roots--0:44:15
that's the zeroes-- of functions. So, for example, to find a y such that f of y equals 0, we0:44:38
start with some guess. This is Newton's method.0:44:51
And the guess we start with we'll call y0, and then we will iterate the following expression.0:45:01
y n plus 1-- this is a difference equation-- is yn minus f of yn over the derivative with respect to y0:45:17
of f evaluated at y equal yn; that is, y(n+1) = y(n) - f(y(n)) / f'(y(n)). Very strange notation.0:45:26
I must say ugh. The derivative of f with respect to y is a function.0:45:35
I'm having a little bit of unhappiness with that, but that's all right. It turns out in the programming language world, the notation is much clearer. Now, what is this?0:45:45
People call it Newton's method. It's a method for finding the roots of the function f.0:45:54
And it, of course, sometimes converges, and when it does, it does so very fast. And sometimes, it doesn't converge, and, oh well, we have to do something else.0:46:03
But let's talk about square root by Newton's method. Well, that's rather interesting. Let's do exactly the same thing we did last time: a bit of wishful thinking.0:46:13
We will apply Newton's method, assuming we knew how to do it. You don't know how to do it yet. Well, let's go.0:46:25
What do I have here? The square root of x. It's Newton's method applied to a procedure which will0:46:37
represent that function of y, which computes that function of y. Well, that procedure is that procedure of y, which is the0:46:48
difference between x and the square of y.0:47:00
Indeed, if I had a value of y for which this was zero, then y would be the square root of x.0:47:13
See that? OK, I'm going to start this out searching at 1. Again, completely arbitrary; it's a property of square roots that0:47:23
I can do that. Now, how am I going to compute Newton's method? Well, this is the method.0:47:32
I have it right here. In fact, what I'm doing is looking for a fixed point of some procedure.0:47:41
This procedure involves some complicated expressions in terms of other complicated things. Well, I'm trying to find the fixed point of this. I want to find the values of y, which if I put y in here, I0:47:54
get the same value out here up to some degree of accuracy. Well, I already have a fixed point process around to do that.0:48:05
And so, let's just define Newton's method over here.0:48:19
It takes a procedure which computes a function, and a guess, an initial guess. Now, I'm going to have to do something here.0:48:28
I'm going to need the derivative of the function. I'm going to need a procedure which computes the derivative of the function computed by the given procedure f.0:48:42
I'm trying to be very careful about what I'm saying. I don't want to mix up the word procedure and function. Function is a mathematical word. It says I'm mapping from values to other values, a set0:48:52
of ordered pairs. But sometimes, I'll accidentally mix those up. Procedures compute functions.0:49:07
So I'm going to define the derivative of f to be by wishful thinking again. I don't know how I'm going to do it. Let's worry about that later--0:49:18
of F. So if F is a procedure, which happens to be this one over here for a square root, then DF will be the derivative0:49:31
of it, which is also the derivative of the function computed by that procedure. DF will be a procedure that computes the derivative of the function computed by the procedure F. And then given0:49:42
that, I will just go looking for a fixed point.0:49:51
What is the fixed point I'm looking for? It's the one for that procedure of one argument x, which I compute by subtracting from x--0:50:00
that's the old y, the yn here-- the quotient of f of x and df of x, starting out with the0:50:21
original guess. That's all very simple.0:50:32
Now, I have one part left that I haven't written, and I want you to see the process by which I write these things, because this is really true. I start out with some mathematical idea, perhaps.0:50:43
By wishful thinking, I assume that by some magic I can do something that I have a name for. I'm not going to worry about how I do it yet.0:50:54
Then I go walking down here and say, well, by some magic, I'm somehow going to figure how to do that, but I'm going to write my program anyway.0:51:04
Wishful thinking, essential to good engineering, and certainly essential to good computer science. So anyway, how many of you wished that your0:51:15
computer ran faster? Well, the derivative isn't so bad either. Sort of like average damping.0:51:28
The derivative is a procedure that takes a procedure that computes a function as its argument, and it produces a0:51:38
procedure that computes a function, which needs one argument x. Well, you all know this definition. It's f of x plus delta x minus f of x over delta x, right?0:51:49
For some small delta x. So that's the quotient of the difference of f of the sum of0:51:59
x and dx, minus f of x, divided by dx.0:52:18
I think the thing was lining up correctly when I balanced the parentheses. Now, I want you to look at this.0:52:27
Just look. I suppose I haven't told you what dx is. Somewhere in the world I'm going to have to write down0:52:44
something like that. I'm not interested. This is a procedure which takes a procedure and produces an approximation, a procedure that computes an approximation0:52:55
of the derivative of the function computed by the procedure given by the standard methods that you all know and love.0:53:04
Now, it may not be the case that doing this operation is such a good way of approximating a derivative. Numerical analysts here should jump on me and0:53:14
say don't do that. Computing derivatives produces noisy answers, which is true. However, this again is for the sake of understanding.0:53:24
Look what we've got. We started out with what is apparently a mathematically complex thing, and in a few blackboards full, we managed to decompose the0:53:35
problem of computing square roots by the way you were taught in your college calculus class-- Newton's method-- so that it can be understood.0:53:45
It's clear. Let's look at the structure of what it is we've got. Let's look at this slide.0:53:54
This is a diagram of the machine described by the0:54:03
program on the blackboard. There's a machine described here. And what have I got? Over here is the Newton's method function f that we have0:54:17
on the left-most blackboard. It's the thing that takes an argument called y and puts out the difference between x and the square of y, where x is0:54:32
some sort of free variable that comes in from the outside by some magic. So the square root routine picks up an x, and builds this0:54:43
procedure, which I have the x rolled up in it by substitution. Now, this procedure in the cloud is fed in as the f into0:54:58
the Newton's method which is here, this box. The f is fanned out.0:55:08
Part of it goes into something else, and the other part of it goes through a derivative process into something else to produce a procedure, which computes the function which is0:55:20
the iteration function of Newton's method when we use the fixed point method. So this procedure, which contains it by substitution--0:55:33
remember, Newton's method over here, Newton's method builds this procedure, and Newton's method has in it defined f and0:55:43
df, so those are captured over here: f and df. Starting with this procedure, I can now feed this to the fixed point process with an initial guess coming in from0:55:55
the outside from square root to produce the square root of x. So what we've built is a very powerful engine, which allows0:56:07
us to make nice things like this. Now, I want to end this with basically an idea of Chris0:56:19
Strachey, one of the grandfathers of computer science. He's a logician who lived in the-- I suppose about 10 years ago or 15 years ago, he died.0:56:30
I don't remember exactly when. He's one of the inventors of something called denotational semantics. He was a great advocate of making procedures or functions0:56:40
first-class citizens in a programming language. So here's the rights and privileges of first-class citizens in a programming language.0:56:50
It allows you to make any abstraction you like if you have functions as first-class citizens. The first-class citizens must be able0:56:59
to be named by variables. And you're seeing me doing that all the time. Here's a nice variable which names a procedure which computes something.0:57:13
They have to be passed as arguments to procedures. We've certainly seen that. We have to be able to return them as values from procedures.0:57:23
And I suppose we've seen that. We haven't yet seen anything about data structures. We will soon, but it's also the case that in order to have a first-class citizen in a programming language, the0:57:33
object has to be allowed to be part of a data structure. We're going to see that soon. So I just want to close with this and say having things0:57:43
like procedures as first-class data structures, first-class data, allows one to make powerful abstractions, which encode general methods like Newton's method0:57:53
in a very clear way. Are there any questions? Yes. AUDIENCE: Could you put derivative instead of df directly in the fixed point?0:58:02
PROFESSOR: Oh, sure. Yes, I could have put deriv of f right here, no question.0:58:11
Any time you see something defined, you can put the thing it's defined to be there, because you get the same result.0:58:21
In fact, what that would look like, it's interesting. AUDIENCE: Lambda. PROFESSOR: Huh? AUDIENCE: You could put the lambda expression in there. PROFESSOR: I could also put derivative of f here. It would look interesting because of the open paren,0:58:32
open paren, deriv of f, closed paren on an x. Now, that would have the bad property of computing the derivative many times, because every time I would run this0:58:43
procedure, I would compute the derivative again. However, the two open parens here both would be meaningful.0:58:52
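The two open parens the professor mentions, ((deriv f) x) in Scheme, correspond to deriv(f)(x) here. A small Python sketch of why that is syntactically sensible; deriv is the same finite-difference sketch as before, and cube is just a made-up example function.

```python
dx = 1e-6  # arbitrary small delta x

def deriv(f):
    # Returns a procedure approximating the derivative of f.
    return lambda x: (f(x + dx) - f(x)) / dx

def cube(x):
    return x * x * x

# The operator position of a combination may itself be a combination:
# deriv(cube) evaluates first, producing a procedure, which is then
# applied to 2.0. The slope of x^3 at 2 is 12.
slope = deriv(cube)(2.0)
```

Note the bad property mentioned in the lecture: written this way inside a loop, the derivative procedure would be rebuilt on every call, which is why it was bound to df once instead.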
I want you to understand syntactically that that's a sensible thing. Because if I was to rewrite this program-- and I should do it right here just so you see because that's a good question--0:59:11
I can define Newton's method of F and guess to be fixed point of that procedure of one0:59:25
argument x, which subtracts from x the quotient of F0:59:34
applied to x and the deriv of F applied to x.0:59:53
This is guess. This is a perfectly legitimate program,1:00:02
because what I have here-- remember the evaluation rule. The evaluation rule is evaluate all of the parts of the combination: the operator and the operands.1:00:12
This is the operator of this combination. Evaluating this operator will, of course, produce the1:00:21
derivative of F. AUDIENCE: To get it one step further, you could put the1:00:30
lambda expression there, too. PROFESSOR: Oh, of course. Any time I take something which is defined, I can put the thing it's defined to be in the place where the thing1:00:40
defined is. I can't remember which is definiens and which is definiendum. When I'm trying to figure out how to do a lecture about this1:00:50
in a freshman class, I use such words and tell everybody it's fun to tell their friends.1:00:59
OK, I think that's it.0:00:00
Lecture 2B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:21
PROFESSOR: Well, so far in this course we've been talking about procedures, and then just to remind you of this framework that we introduced for talking about languages,0:00:31
we talked about the primitive things that are built into the system. We mentioned some means of combination by which you take the primitive things0:00:40
and you make more complicated things. And then we talked about the means of abstraction, how you can take those complicated things and name them so you can use them as simple building blocks.0:00:49
And then last time you saw we went even beyond that. We saw that by using higher order procedures, you can actually express general methods for computing things.0:00:58
Like the method of doing something by fixed points, or Newton's method, and so the incredible expressive power you can get just by combining these means of abstraction.0:01:08
And the crucial idea in all of this is the one that we build a layered system. So for instance, if we're writing the square root0:01:17
procedure, somewhere the square root procedure uses a procedure called good-enough,0:01:31
and between those there is some sort of abstraction boundary. It's almost as if we go out and in writing square root,0:01:41
we go and make a contract with George, and tell George that his job is to write good-enough, and so long as good-enough works,0:01:50
we don't care what it does. We don't care exactly how it's implemented. There are levels of detail here that are George's concern and not ours.0:02:00
So for instance, George might use an absolute value procedure that's written by Harry, and we don't much care about that or even know that, maybe, Harry exists.0:02:13
So the crucial idea is that when we're building things, we divorce the task of building things from the task of implementing the parts.0:02:27
And in a large system, of course, we have abstraction barriers like this at lots, and lots, and lots of levels. And that's the idea that we've been using so far over and over0:02:36
in implementing procedures. Well, now what we're going to do is look at the same issues for data. We're going to see that the system has primitive data.0:02:46
In fact, we've already seen that. We've talked about numbers as primitive data. And then we're going to see their means of combination for data. There's glue that allows you to put primitive data together0:02:55
to make more complicated, kind of compound data. And then we're going to see a methodology for abstraction0:03:04
that's a very good thing to use when you start building up data in terms of simpler data. And again, the key idea is that you're going to build the system in layers0:03:13
and set up abstraction barriers that isolate the details at the lower layers from the thing that's going on at the upper layers. The details at the lower layers, the ideas, they won't matter.0:03:25
They're going to be George's concern because he signed this contract with us for how the stuff that he implements behaves, and how he implements the thing is his problem.0:03:36
All right, well let's look at an example. And the example I'm going to talk about is a system that does arithmetic on rational numbers. And what I have in mind is that we should have something0:03:46
in the computer that allows us to ask it, like, what's the sum of 1/2 and 1/4, and somehow the system0:03:56
should say, yeah, that's 3/4. Or we should be able to say what's 3/4 times 2/3,0:04:11
and the system should be able to say, yeah, that's 1/2. Right? And you know what I have in mind. And you also know how to do this from, I don't know,0:04:20
fifth grade or sixth grade. There are these formulas that say if I have some fraction which is a numerator over a denominator, and I want to add that to some other fraction which0:04:31
is another numerator over another denominator, then the answer is the numerator of the first times the denominator of the second, plus the numerator0:04:43
of the second times the denominator of the first. That's the numerator of the answer, and the denominator is the product0:04:52
of the two denominators. Right? So there's something from fifth or sixth grade fraction arithmetic. And then similarly, if I want to multiply two things, n1 over d1 multiplied by n2 over d20:05:05
is the product of the numerators over the product of the denominators.0:05:14
So it's no problem at all, but it's absolutely no problem to think about what computation you want to make in adding and multiplying these fractions.0:05:23
But as soon as we go to implement it, we run up across something. We don't have what a rational number is.0:05:33
So we said that the system gives us individual numbers, so we can have 5 and 3, but somehow we0:05:42
don't have a way of saying there's a thing that has both a 3 and a 4 in it, or both a 2 and a 3. It's almost as if we'd like to imagine that somehow there0:05:54
are these clouds, and a cloud somehow has both a numerator and a denominator in it, and that's what we'd like to work in terms of.0:06:06
Well, how are we going to solve that problem? We're going to solve that problem by using this incredibly powerful design strategy that you've already seen us use over and over.0:06:16
And that's the strategy of wishful thinking.0:06:25
Just like before when we didn't have a procedure, we said, well, let's imagine that that procedure already exists. We'll say, well, let's imagine that we have these clouds.0:06:36
Now more precisely what I mean is let's imagine that we have three procedures, one called make-RAT.0:06:47
make-RAT is going to take as arguments two numbers, so I'll call them numerator and denominator,0:06:57
and it'll return for us a cloud-- one of these clouds. I don't really know what a cloud is.0:07:07
It's whatever make-RAT returns, that's its business. And then we're going to say, suppose we've got one of these clouds, we have a procedure called numer, which takes in a cloud that has an n and a d in it,0:07:20
whatever a cloud is, and I don't know what it is, and returns for us the numerator part. And then we'll assume we have a procedure denom,0:07:31
which again takes in a cloud, whatever a cloud is, and returns for us the denominator. This is just like before, when, if we're0:07:40
building a square root, we assume that we have good enough. Right? And what we'll say is, we'll go find George, and we'll say to George, well, it's your business0:07:49
to make us these procedures. And how you choose to implement these clouds, that's your problem. We don't want to know.0:07:58
Well, having pushed this task off onto George, then it's pretty easy to do the other part. Once we've got the clouds, it's pretty easy0:08:07
to write the thing that does say addition of rational numbers. You can just say define, well, let's say +RAT.0:08:21
Define +RAT, which will take in two rational numbers, x and y. x and y are each these clouds.0:08:31
And what does it do? Well, it's going to return for us a rational number.0:08:40
What rational number is it? Well, we've got the formulas there. The numerator of it is the sum of the product of the numerator0:08:52
of x and the denominator of y.0:09:02
It's one thing in the sum. And the other thing in the numerator is the product of the numerator of y and the denominator of x.0:09:19
The star, close the plus. Right, that's the first argument to make-RAT, which is the numerator of the thing I'm constructing. And then the rest of the thing goes0:09:28
into make-RAT is the denominator of the answer, which is the product of the denominator of x0:09:37
and the denominator of y. Like that.0:09:46
OK? So there is the analog of doing rational number addition. And it's no problem at all, assuming that we have these clouds.0:09:59
And of course, we can do multiplication in the same way. Define how to get the product of two rational numbers,0:10:11
call it *RAT. Takes in two of these clouds, x and y, it returns0:10:20
a rational number, make-RAT, whose numerator is the product of the numerators-- numerator of x0:10:32
times the numerator of y. And the denominator of the thing it's going to return0:10:41
is the product of the denominators.0:10:57
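The two operations on the blackboard can be sketched like this. This is Python rather than the lecture's Scheme; add_rat and mul_rat are my translations of +RAT and *RAT, and the tuple inside make_rat is only a stand-in for the cloud, since how the cloud is really built is George's business.

```python
# Wished-for constructor and selectors. The representation here is a
# placeholder; the arithmetic below uses only these three procedures.
def make_rat(n, d):
    return (n, d)

def numer(x):
    return x[0]

def denom(x):
    return x[1]

def add_rat(x, y):
    # n1*d2 + n2*d1 over d1*d2, straight from the fifth-grade formula.
    return make_rat(numer(x) * denom(y) + numer(y) * denom(x),
                    denom(x) * denom(y))

def mul_rat(x, y):
    # Product of the numerators over the product of the denominators.
    return make_rat(numer(x) * numer(y), denom(x) * denom(y))
```

With this naive constructor, adding 1/2 and 1/4 yields numerator 6 and denominator 8, exactly the unreduced answer the lecture runs into a bit further on.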
Well, except that I haven't told you what these clouds are, that's all there is to it. See, what did I do? I assumed by wishful thinking that I0:11:08
had a new kind of data object. And in particular, I assumed I had ways of creating these data objects. Make-RAT creates one of these things.0:11:18
This is called a constructor. All right, I have a thing that constructs such data objects.0:11:29
And then I assume I have things that, having made these things, I have ways of getting the parts out. Those are called selectors.0:11:42
And so formally, what I said is I assumed I had procedures that are constructors and selectors for these data objects, and then I went off and used them.0:11:52
That's no different in kind from saying I assume I have a procedure good-enough, and I go use it to implement square root. OK, well before we go on, let's ask0:12:05
the question of why do we want to do this in the first place? See, why do we want a procedure like +RAT that takes in two0:12:16
rational numbers and returns a rational number? See, another way to think about this is, well, here's this formula.0:12:25
And I've also got to implement something that adds rational numbers. One other way to think about it is, well, there's this thing, and I type in four numbers, an n1, and a d1,0:12:34
and an n2, and a d2. And it sets some registers in the machine to this numerator and this denominator. So I might say, well, why don't I0:12:43
just add rational numbers by typing in four numbers, numerators and denominators, and get out two numbers, which is a numerator and a denominator. Why are we worrying about building things0:12:54
like this anyway? Well, the answer is, suppose you want to think about expressing something like this,0:13:06
suppose I'd like to express the idea of taking two rational numbers, x plus y, say,0:13:15
and multiplying that by the sum of two other rational numbers. Well, the way I do it, having things like +RAT and *RAT,0:13:28
is I'd say, oh yeah, what that is is just the product. That's *RAT of the sum of x and y and the sum of s and t.0:13:51
So except for syntax, I get an expression that looks like the way I want to think about it mathematically. I want to say there are two numbers.0:14:02
There's a thing which is the sum of them, and there's a thing which is the sum of these two. That's this and this. And then I multiply them.0:14:12
So I get an expression that matches this expression. If I did the other thing, if I said, well, the way I want to think about this is I type into my machine four numbers, which are the numerators and the denominators of x and y,0:14:24
and then four more numbers, which are the numerators and denominators of s and t. And then what I'd be sitting with is, well, what would I do? I'd add these, and somehow I'd have0:14:33
to have two temporary variables, which are the numerators and denominators of this sum, and I'd go off and store them someplace.0:14:42
And then I'd go over here, I'd type in four more numbers, I'd get two more temporary variables, which are the numerators and denominators of s and t. And then finally, I put those together by multiplying them.0:14:54
You see, what's starting to happen, there are all these temporary variables, which are sort of the guts of the internals of these rational numbers that start hanging0:15:04
out all over the system. And of course, if I had more and more complicated expressions, there'd be more and more guts hanging out that confuse my programming.0:15:13
And those of you who sort of programmed things like that, where you're just adding numbers in assembly language, you sort of see you have to suddenly be concerned with these temporary variables.0:15:23
But more importantly than confusing my programming, they're going to confuse my mind. Because the whole name of this game0:15:33
is that we'd like the programming language to express the concepts that we have in our heads, like rational numbers are things that you can add and then take0:15:43
that result and multiply them. Let's break for questions.0:15:59
Yeah? AUDIENCE: I don't quite see the need- when we had make-RAT with the numerator and denominator, we had to have the numerator and denominator to pass as parameters to create the cloud,0:16:08
and then we extracted to get back what we had to have originally. PROFESSOR: That's right. So the question is, I sort of have the numerator and the denominator,0:16:17
why am I worrying about having the cloud given that I have to get the pieces out? That's sort of what I tried to say at the end, but let me try and say it again, because that's really0:16:27
the crucial question. The point is, I want to carry this numerator and denominator around together all the time.0:16:36
And it's almost as if I want to know, yeah, there's a numerator and denominator in there, but also, I would like to say, fine, but from another point0:16:47
of view, that's x. And I carry x around, and I name it as x, and I hold it. And I can say things like, the sum of x and y, rather than just have-- see, it's not so bad when I only0:16:58
think about x, but if I have a system with 10 rational numbers, suddenly I have 20 numerators and denominators, which are not necessarily-- if I don't link them, then it's just 20 arbitrary numbers that are not0:17:09
linked in any particular way. It's a lot like saying, well, I have these instructions that are the body of the procedures, why do I want to package them and say it's the procedure? It's exactly the same idea.0:17:31
No? OK. Let's break, let's just stretch and get somebody-- [INAUDIBLE] [MUSIC PLAYING]0:18:27
OK, well, we've been working on this rational number arithmetic system, and then what we did, the important thing about what we did, is we thought about the problem0:18:37
by breaking it into two pieces. We said, assume there is this contract with George, and George has figured out the way to how to construct these clouds,0:18:47
provided us procedures make-RAT, which was a constructor, and selectors, which are numerator and denominator. And then in terms of that, we went off0:18:56
and implemented addition and multiplication of rational numbers. Well, now let's go look at George's problem. How can we go and package together0:19:05
a numerator and a denominator and actually make one of these clouds? See, what we need is a kind of glue, a glue for data objects0:19:15
that allows us to put things together. And Lisp provides such a glue, and that glue is called list structure.0:19:30
List structure is a way of gluing things together, and more precisely, Lisp provides a way of constructing things called pairs.0:19:44
There's a primitive operator in Lisp called cons. We can take a look at it.0:19:54
There's a thing called cons. Cons is an operator which takes in two arguments called0:20:03
x and y, and it returns for us a thing called a pair. All right, so a thing called a pair that has a first part0:20:17
and a second part. So cons takes two objects. There's a thing called a pair.0:20:26
The first part of the cons is x, and the second part of the cons is y. And that's what it builds. And then we also assume we have ways of getting things out.0:20:36
If you're given a pair, there's a thing called car, and car of a pair, p, gives you out the first part of the pair, p.0:20:46
And there's a thing called cdr, and cdr of the pair, p, gives you the second part of the pair, p. OK, so that's how we construct things.0:20:56
There's also a conventional way of drawing pictures of these things. Just like we write down that as the conventional way of writing0:21:10
Plato's idea of two, the way we could draw a diagram to represent cons of two and three is like this.0:21:21
We draw a little box. And so here's the box we're talking about, and this box has two arrows coming out of it.0:21:30
And say the first part of this pair is 2, and the second part of this pair is 3. And this notation has a name, it's0:21:40
called box and pointer notation.0:21:55
By the way, let me say right now that a lot of people get confused that there's some significance to the geometric way I drew these pointers, the directions. Like some people think it'd be different0:22:05
if I took this pointer and turned it up here, and put the 3 out here. That has no significance. All right? It's merely you have a bunch of arrows, these pointers, and the boxes.0:22:15
The only issue is how they're connected, not the geometric arrangement of whether I write the pointer across, or up, or down. Now it's completely un-obvious, probably,0:22:26
why that's called list structure. We're not actually going to talk about that today. We'll see that next time.0:22:37
So those are pairs, there's cons that constructs them. And what I'm going to know about cons, and car, and cdr, is precisely that if I have any x and y, all right,0:22:51
if I have any things x and y, and I use cons to construct a pair, then the car of that pair0:23:01
is going to be x, the thing I put in, and the cdr of that pair is going to be y. That's the behavior of these operators, cons, car, and cdr.0:23:12
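The contract stated here can itself be satisfied with nothing but procedures, which is one way to see what the axiom demands. This Python sketch is not how Lisp actually builds pairs; it is just one representation that obeys car(cons(x, y)) = x and cdr(cons(x, y)) = y.

```python
def cons(x, y):
    # The pair is represented by a procedure that remembers x and y
    # and hands back one or the other on request.
    def pair(which):
        return x if which == 0 else y
    return pair

def car(p):
    # First part of the pair p.
    return p(0)

def cdr(p):
    # Second part of the pair p.
    return p(1)
```

For example, car(cons(2, 3)) is 2 and cdr(cons(2, 3)) is 3, which is everything the lecture says we are entitled to know about cons, car, and cdr.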
Given them, it's pretty clear how George can go off and construct his rational numbers. After all, all he has to do-- remember George's problem was to implement make-RAT, numerator,0:23:22
and denom. So all George has to do is say define make-RAT of some n and a d-- so all I have to do is cons them.0:23:40
That's cons of n and d. And then if I want to get the numerator out, I would say define the numerator, numer,0:23:57
of some rational number, x. If the rational number's implemented as a pair, then all I have to do is get out the car of x.0:24:06
And then similarly, define the denom is going to be the cdr,0:24:19
the other thing I put into the pair. Well, now we're in business.0:24:28
That's a complete implementation of rational numbers. Let's use it. Suppose I want to say, so I want to think about how to add 1/2 plus 1/4 and watch the system work.0:24:43
Well, the way I'd use that is I'd say, well, maybe define a. I have to make a 1/2.0:24:53
Well, that's a rational number with numerator 1 and denominator 2, so a will be make-RAT of 1 and 2.0:25:05
And then I'll construct the 1/4. I'll say define b to be make-RAT of 1 and 4.0:25:23
And if I'd like to look at the answer-- well, assuming I don't have a special thing that prints rational numbers, or I could make one-- I could say, for instance, define the answer to be +RAT of a and b, and now I can say,0:25:46
what's the answer? What are the numerators and denominators of the answer? So if I'm adding 1/2 and 1/4, I'll say, what is the numerator of the answer?0:26:04
And the system is going to type out, well, 6. Bad news.0:26:13
And if I say what's the denominator of the answer,0:26:22
the system's going to type out 8. So instead of what I would really like, which is for it to say that 1/2 and 1/4 is 3/4,0:26:35
this foolish machine is going to say, no, it's 6/8. Well, that's sort of bad news. Where's the bug?0:26:47
Why does it do that, after all? Well, it's the way that we just had +RAT. +RAT just took the-- it said you add the numerator times0:26:56
the denominator, you add that to the numerator times the denominator, and put that over the product of the two denominators, and that's why you get 6/8.0:27:05
So what was wrong with our implementation of +RAT? What's wrong with that rational number arithmetic stuff that we did before the break?0:27:15
Well, the answer is one way to look at it is absolutely nothing's wrong. That's a perfectly good implementation. It follows the sixth grade, fifth grade mathematics0:27:25
for adding fractions. One thing we can say is, well, that's George's problem. Like, boy, wasn't George dumb to say0:27:36
that he can make a rational number simply by sticking together the numerator and the denominator? Wouldn't it be better for George,0:27:45
when he made a rational number, to reduce the stuff to lowest terms? And what I mean is, wouldn't it be better for George,0:27:55
instead of using this version of make-RAT, to use this one on the slide? Or instead of just saying cons together n and d, what you do0:28:09
is compute the greatest common divisor of n and d, and gcd is the procedure which, well, for all we care is a primitive, which computes the greatest common divisor of two numbers.0:28:20
So the way I can construct a rational number is get the greatest common divisor of the two numbers, and I'm going to call that g, and then0:28:30
instead of consing together n and d, I'll divide them through. I'll cons together the quotient of n by the gcd and the quotient of d by the gcd.0:28:40
And that will reduce the rational number to lowest terms. So when I do this addition, when +RAT calls make-RAT--0:28:54
and for the definition of +RAT it had a make-RAT in there-- just by the fact that it's constructing that, the thing will get reduced to lowest terms automatically.0:29:09
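The reducing constructor from the slide might look like this in Python; gcd is taken from the standard library rather than treated as a primitive:

```python
from math import gcd

# make_rat now divides both parts through by their greatest
# common divisor, so every constructed rational is in lowest terms.
def make_rat(n, d):
    g = gcd(n, d)
    return (n // g, d // g)

def numer(x):
    return x[0]

def denom(x):
    return x[1]

def add_rat(x, y):
    return make_rat(numer(x) * denom(y) + numer(y) * denom(x),
                    denom(x) * denom(y))

ans = add_rat(make_rat(1, 2), make_rat(1, 4))
print(numer(ans), denom(ans))  # 3 4, not 6 8
```

Note that add_rat did not change at all; only the constructor did, and the reduction happens automatically whenever +RAT constructs its result.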
OK, that is a complete system for rational number arithmetic. Let's look at what we've done.0:29:19
All right, we said we want to build rational number arithmetic, and we had a thing called +RAT. We implemented that.0:29:29
And I showed you multiplying rational numbers, and although I didn't put them up there, presumably we'd like to have something that subtracts rational numbers, and I don't know,0:29:39
all sorts of things. Things that test equality in division, and maybe things that print rational numbers in some particular way. And we implemented those in terms of pairs.0:29:52
These pairs, cons, car, and cdr that are built into Lisp. But the important thing is that between these and these,0:30:05
we set up an abstraction barrier. We set up a layer of abstraction.0:30:17
And what was that layer of abstraction? That layer of abstraction was precisely the constructor and the selectors. This layer was make-RAT, and numer, and denom.0:30:38
This methodology, another way to say what it's doing, is that we are separating the way something is used,0:30:53
separating the use of data objects, from the representation of data objects.0:31:07
So up here, we have the way that rational numbers are used, do arithmetic on them. Down here, we have the way that they're represented, and they're separated by this boundary.0:31:17
The boundary is the constructors and selectors. And this methodology has a name. This is called data abstraction.0:31:35
Data abstraction is sort of the programming methodology of setting up data objects by postulating constructors and selectors to isolate use from representation.0:31:47
Well, so why? I mean, after all, we didn't have to do it this way. It's perfectly possible to do rational number addition without having any compound data objects, and here on the slide0:31:58
is one example. We certainly could have defined +RAT, which takes in things x and y, and we'll say, well what are these rational numbers really?0:32:10
So really, they're just pairs, and the numerator's the car and the denominator's the cdr. So what we'll do is we'll take the car of x times the cdr of y, multiply them.0:32:23
Take the car of y times the cdr of x, multiply them. Add them. Take the cdr of x and the cdr of y, multiply them, and then cons them together.0:32:35
Well, that sort of does the same thing. But this ignores the problem of reducing things to lowest terms, but let's not worry about that for a minute.0:32:47
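The slide's abstraction-free version might be sketched like this in Python, with rationals as bare tuples and the cars and cdrs written out as subscripts:

```python
# +RAT with no data abstraction: the representation (a bare pair)
# leaks directly into the arithmetic itself.
def add_rat(x, y):
    return (x[0] * y[1] + y[0] * x[1],   # cars and cdrs, cross-multiplied
            x[1] * y[1])                  # product of the denominators

print(add_rat((1, 2), (1, 4)))  # (6, 8) -- and nothing reduces it
```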
But so what? Why don't we do it that way? Right? After all, there are sort of fewer procedures to define, and it's a lot more straightforward.0:32:57
It saves all this self-righteous BS about talking about data abstraction. We just sort of do it. I mean, who knows, maybe it's even marginally more efficient depending on whatever compiler we're using for this.0:33:07
What's the point of isolating the use from the representation? Well, it goes back to this notion of naming.0:33:17
Remember, one of the most important principles in programming is the same as one of the most important principles in sorcery, all right? That's if you have the name of the spirit,0:33:27
you get control over it. And if you go back and look at the slide, you see what's in there is we have this thing +RAT,0:33:36
but nowhere in the system, if I have a +RAT and a -RAT and a *RAT, and things that look like that, nowhere in the system do I have a thing that I can point0:33:46
at which is a rational number. I don't have, in a system like that,0:33:57
the idea of rational number as a conceptual entity. Well, what's the advantage of that? What's the advantage of isolating the idea of rational numbers as a conceptual entity,0:34:08
and really naming it with make-RAT, numerator, and denominator. Well, one advantage is you might want to have0:34:18
alternative representations. See, before I showed you that one way George can solve this things-not-reduced-to-lowest-terms problem, is when you build a rational number,0:34:29
you divide out by the greatest common divisor. Another way to do that is shown over here. I can have an alternative representation0:34:38
for rational numbers where when you make a rational number, you just cons them. However, when you go to select out the numerator, at that point you compute the gcd of the stuff0:34:50
that's sitting in that pair, and divide out by the gcd. And similarly, when I get the denominator,0:35:01
at that point when I go to get the denominator, I'll divide out by the gcd. So the difference would be in the old representation, when ans was constructed here, say what's 6 and 8,0:35:13
in the first way, the 6 and 8 would have got reduced when they got stuck into that pair, numerator would select out 3. And in the way I just showed you, well, ans would get 6 and 8 put in,0:35:25
and then at the point where I said numerator, some computation would get done to put out 3 instead of 6. So those are two different ways I might do it.0:35:34
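The deferred-reduction alternative can be sketched the same way: cons at construction time, and divide out the gcd only when a selector runs:

```python
from math import gcd

def make_rat(n, d):
    return (n, d)                    # no work at construction

def numer(x):
    return x[0] // gcd(x[0], x[1])   # reduce on access

def denom(x):
    return x[1] // gcd(x[0], x[1])

ans = make_rat(6, 8)        # stored as 6 and 8...
print(numer(ans))           # ...but the selector puts out 3
print(denom(ans))           # 4
```

Code above the abstraction layer cannot tell these two representations apart; only the timing of the gcd work differs.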
Which one's better? Well, it depends, right? If I'm making a system where I am mostly constructing rational numbers and hardly ever looking at them, then it's probably better0:35:44
not to do that gcd computation when I construct them. If I'm doing a system where I look at things a lot more than I construct them, then it's probably better0:35:53
to do the work when I construct them. So there's a choice there. But the real issue is that you might not be able to decide at the moment you're worrying0:36:05
about these rational numbers. See, in general, as systems designers, you're faced with the necessity to make decisions0:36:15
about how you're going to do things, and in general, the way you'd like to retain flexibility is to never make up your mind about anything until you're forced to do it.0:36:26
The problem is, there's a very, very narrow line between deferring decisions and outright procrastination.0:36:38
So you'd like to make progress, but also at the same time, never be bound by the consequences of your decisions.0:36:48
Data abstraction's one way of doing this. What we did is we used wishful thinking. See, we gave a name to the decision.0:36:57
We said, make-RAT, numerator, and denominator will stand for however it's going to be done, and however it's going to be done is George's problem. But really, what that was doing is giving a name0:37:06
to the decision of how we're going to do it, and then continuing as if we made the decision. And then eventually, when we really wanted it to work,0:37:17
coming back and facing what we really had to do. And in fact, we'll see a couple times from now that you may never have to choose any particular representation, ever, ever.0:37:27
Anyway, that's a very powerful design technique. It's the key to the reason people use data abstraction. And we're going to see that idea again and again.0:37:37
Let's stop for questions. AUDIENCE: What does this decision making through abstraction layers do to the axiom of do all your design0:37:47
before any of your code? PROFESSOR: Well, that's someone's axiom, and I bet that's the axiom of someone who hasn't implemented very large computer systems very much.0:38:01
I said that computer science is a lot like magic, and it's sort of good that it's like magic. There's a bad part of computer science that's a lot like religion. And in general, I think people who0:38:12
really believe that you design everything before you implement it basically are people who haven't designed very many things.0:38:21
The real power is that you can pretend that you've made the decision and then later on figure out which one is right, which decision you ought to have made.0:38:30
And when you can do that, you have the best of both worlds. AUDIENCE: Can you explain the difference between let and define?0:38:40
PROFESSOR: Oh, OK. Let is a way to establish local names.0:38:55
Let me give you sort of the half answer. And I'll say, later on we can talk about the whole very complicated thing. But the big difference for now is that, see,0:39:05
when you're typing at Lisp, you're typing in this environment where you're making definitions. And when you say define a to be 5, if I say define a to be 5,0:39:20
then from then on the thing will remember that a is 5. Let is a way to set up a local context where0:39:29
there's a definition. So if I type something like, saying let a-- no, I shouldn't say a-- if I said let z0:39:43
be 10, and within that context, tell me what the sum of z and z0:39:53
is. So if I typed in this expression to Lisp, and then this would put out 20.0:40:02
However, then if I said what's z, the computer would say that's an unbound variable. So let is a way of setting up a context where0:40:13
you can make definitions. But those definitions are local to this context. And of course, if I'd said a in here, I'd still get 20.0:40:27
But this a would not interfere at all with this one. So if I type this, and then type this, and then say what's a?0:40:36
a will still be 5. So there's some other subtle differences between let and define, but that's the most important one.0:41:20
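A rough Python analog of that distinction, purely illustrative: define is like a top-level assignment that persists, while let is like a binding local to a single expression, imitated here with a function parameter:

```python
a = 5                            # like (define a 5): persists from now on

# like (let ((z 10)) (+ z z)): z exists only inside the expression
result = (lambda z: z + z)(10)
print(result)                    # 20

# z is unbound out here, just as Scheme reports after the let:
try:
    z
except NameError:
    print("z is unbound")
```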
All right, well, we've looked at implementing this little system for doing arithmetic on rational numbers as an example of this methodology of data abstraction.0:41:31
And that's a way of controlling complexity in large systems. But, see, like procedure definition, and like all the ways we're going0:41:40
to talk about for controlling complexity, the real power of these things show up not when you sort of do these things in themselves, like it's not such a great thing0:41:49
that we've done rational number arithmetic, it's that you can use these as building blocks for making more complicated things.0:42:00
So it's no wonderful idea that you can just put two numbers together to form a pair. If that's all you ever wanted to do, there are tons of ways that you can do that. The real issue is can you do that in such a way0:42:11
so that the things that you build become building blocks for doing something even more complex? So whenever someone shows you a method for controlling complexity, you should say, yeah, that's great,0:42:20
but what can I build with it? So for example, let me just run through another thing that's0:42:30
a lot like the rational number one. Suppose we would like to represent points in the plane. You sort of say, well, there's a point, and we're going to call that point p.0:42:40
And that point might have coordinates, like this might be the point 1 comma 2.0:42:50
The x-coordinate might be 1, and its y-coordinate might be 2. And we'll make a little system for manipulating points in the plane.0:43:00
And again, we can do that-- here's a little example of that. It can represent vectors, the same as points in the plane,0:43:10
and we'll say, yep, there's a constructor called make-vector, make-vector's going to take two coordinates,0:43:21
and here we can implement them if we like as pairs, but the important thing is that there's a constructor. And then given some vector, p, we can find its x-coordinate,0:43:31
or we can get its y-coordinate. So there's a constructor and selectors for points in the plane. Well, given points in the plane, we0:43:40
might want to use them to build something. So for instance, we might want to talk about, we might have a point, p, and a point, q, and p might be the point 1, 2, and q might be the point 2, 3.0:43:54
And we might want to talk about the line segment that starts at p and ends at q. And that might be the segment s.0:44:05
So we might want to build points or vectors in terms of numbers, and segments in terms of vectors.0:44:16
So we can represent line segments in exactly the same way. All right, so the line segment from p to q, we'll say there's a constructor, make-segment.0:44:27
And make up names for the selectors, the starting point of the segment and the ending point of the segment. And again, we can implement a segment using cons as a pair of points, and car and cdr get out the two points0:44:38
that we put together to get the segment. Well, now having done that, we can0:44:48
have some operations on them. Like we could say, what's the midpoint of a line segment?0:44:57
So here's the midpoint of a line segment, that's going to be the points whose coordinates are the averages of the coordinates of the endpoints.0:45:07
OK, there's the midpoint. So to get the midpoint of a line segment, s, we'll just say grab the starting point of the segment,0:45:17
grab the ending point of the segment, and now make a vector-- make a point whose coordinates are the average of the x-coordinate of the first point0:45:27
and the x-coordinate of the second point, and whose y-coordinate is the average of the y-coordinates. So there's an implementation of midpoint.0:45:37
And then similarly, we can build something like the length of the segment. The length of the segment is a thing0:45:46
whose-- use Pythagoras's rule, the length of the segment is the square root of dx squared plus dy squared.0:45:57
We'll say to get the length of a line segment, we'll let dx be the difference of the x-coordinate of one0:46:06
endpoint and the x-coordinate of the other endpoint, and we'll let dy be the difference of the y-coordinates.0:46:16
And then we'll take the square root of the sum of the squares of dx and dy, that's what this says. All right, so there's an implementation of length.0:46:26
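The whole layered system can be sketched in Python. The names make_vector, xcor, ycor, make_segment, seg_start, and seg_end stand in for the constructors and selectors on the board; midpoint and length are written only in terms of the selectors:

```python
from math import sqrt

# The vector layer: a constructor and two selectors over a pair.
def make_vector(x, y): return (x, y)
def xcor(p): return p[0]
def ycor(p): return p[1]

# The segment layer, built on top of the vector layer.
def make_segment(p, q): return (p, q)
def seg_start(s): return s[0]
def seg_end(s): return s[1]

def average(a, b):
    return (a + b) / 2

# Midpoint: the point whose coordinates are the averages
# of the coordinates of the endpoints.
def midpoint(s):
    a, b = seg_start(s), seg_end(s)
    return make_vector(average(xcor(a), xcor(b)),
                       average(ycor(a), ycor(b)))

# Length: Pythagoras's rule, sqrt(dx^2 + dy^2).
def length(s):
    dx = xcor(seg_end(s)) - xcor(seg_start(s))
    dy = ycor(seg_end(s)) - ycor(seg_start(s))
    return sqrt(dx * dx + dy * dy)

s = make_segment(make_vector(1, 2), make_vector(4, 6))
print(midpoint(s))   # (2.5, 4.0)
print(length(s))     # 5.0
```

Neither midpoint nor length mentions a subscript or a tuple; each layer talks only to the layer directly beneath it.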
And again, what we built is a layered system.0:46:35
We built a system which has, well, say up here there's segments.0:46:47
And then there's an abstraction barrier. The abstraction barrier separates the implementation0:46:56
of segments from the implementation of vectors and points, and what that abstraction barrier is are the constructors and selectors. It's make-segment, and segment-start, and segment-end.0:47:18
And then there are vectors. And vectors in turn are built on top of pairs and numbers. So I'll say pairs and numbers.0:47:29
And that has its own abstraction barrier, which is make-vector, and x-coordinate, and y-coordinate.0:47:46
So we have, again, a layered system. You're starting to see that there are layers here. I ought to mention, there is a very important thing0:47:57
that I kind of took for granted. And it's sort of so natural, but on the other hand it's a very important thing.0:48:07
Notice that in order to represent this segment s, I said this segment is a pair of points.0:48:16
And a point is a pair of numbers. And if I were going to draw the box and pointers structure for that, I would say, oh, the segment0:48:25
is, given those particular representations that I showed you, I'd say this segment s is a pair,0:48:34
and the first thing in the pair is a vector, and the vector is a pair of numbers.0:48:45
And that's this, that's p. And the other thing in the segment is q, which is itself a pair of numbers.0:49:00
So I almost took it for granted when I said that cons allows you to put things together. But it's very easy to not appreciate0:49:12
that, because notice, some of the things I can put together can themselves be pairs. And let me introduce a word that I'll talk about more next time,0:49:24
it's one of my favorite words, called closure. And by closure I mean that the means of combination0:49:34
in your system are such that when you put things together using them, like we make a pair, you can then put those together0:49:43
with the same means of combination. So I can have not only a pair of numbers, but I can have a pair of pairs. So for instance, making arrays in a language like Fortran0:49:57
is not a closed means of combination, because I can make an array of numbers, but I can't make an array of arrays. And one of the things that you should ask, one of your tests0:50:09
of quality for a means of combination that someone shows you, is gee, are the things you make closed under that means of combination?0:50:18
So pairs would not be nearly so interesting if all I could do was make a pair of numbers. I couldn't build very much structure at all. OK, well, we'll come back to that.0:50:28
I just wanted to mention it now. You'll hear a lot about closure later on. You can also see the potential for losing control0:50:38
of complexity as you have a layered system if you don't use data abstraction. Let's go back and look at this slide for length.0:50:48
Length works and is a simple thing because I can say, when I want to get this value, I can say, oh, that is the x-coordinate of the first endpoint of the segment.0:51:02
And each of these things, each of these selectors, x-coordinate and endpoint, stand for a decision choice whose details I don't have to look at.0:51:12
So I could perfectly well, again, just like rational numbers I did before, I could say, oh well, gee, a segment really is a pair of pairs.0:51:21
And the x-coordinate of the first endpoint of the segment really is the-- well, what is it? It's the car of the car of the segment.0:51:33
So I could perfectly well go and redefine length. I could say, define the length of some segment s.0:51:48
And I could start off writing something like, well, we'll let dx be-- well, what's it have to be? It's got to be the difference of the two coordinates,0:51:58
so that's the difference of the car of the car of s, subtracted0:52:08
from the car of the other half of it, the car of the cdr of s.0:52:21
All right, and then dy would be-- well, let's see, I'd get the y-coordinate, so it'd be the difference of the cdr of the car of s,0:52:33
and the cdr of the cdr of s, sort of go on.0:52:44
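Written against the raw pairs, a Python sketch of that fragile length looks like this; every subscript is a buried representation decision:

```python
from math import sqrt

# length with no abstraction layer: a segment is a pair of pairs,
# and the code hard-wires car-of-car, car-of-cdr, and so on.
def length(s):
    dx = s[1][0] - s[0][0]   # x of end minus x of start
    dy = s[1][1] - s[0][1]   # y of end minus y of start
    return sqrt(dx * dx + dy * dy)

print(length(((1, 2), (4, 6))))  # 5.0
# If George swaps the coordinate order or the endpoint order,
# every one of these subscripts has to be found and flipped.
```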
You can see that's much harder to read than the program I had before. But worse than that, suppose you'd gone and implemented length?0:52:56
And then the next day, George comes to you and says, I'm sorry, I changed my mind. I want to write points with the x-coordinate first. So you come back you stare at this code0:53:06
and say, oh gee, what was that? That was the car, so I have to change this to cdr, and this is cdr, and this now has to be car.0:53:20
And this has to be car. And you sort of do that, and then the next day George comes back and says, sorry, the guys designing the display0:53:31
would like lines to be painted in the opposite direction, so I have to write the endpoint first in the order. And then you come back and you stare at this code, and say,0:53:40
gee, what was it talking about? Oh yeah, well I've got to change this one to cdr, and this one becomes car, this one becomes car,0:53:49
and this becomes cdr. And you go up and do that, and then the next day, George comes back and says, I'm sorry, what I really meant is that the segments always have to be painted from left to right on the screen.0:53:59
And then you sort of, it's clear, you just go and punch George in the mouth at that point. But you see, as soon as we have a 10-layer system,0:54:09
you see how that complexity immediately builds up to the point where even something like this gets out of control. So again, the way we've gotten out of that0:54:19
is we've named that spirit. We built a system where there is a thing, which is the representation choice for how you're0:54:29
going to talk about vectors. And choices about that representation are localized right there. They don't have their guts spilling over into things like how you compute the length0:54:38
and how you compute the midpoint. And that's the real power of this system. OK, we're explicit about them, so0:54:48
that we have control over them. All right, questions? AUDIENCE: What happens in the case where you don't want to be treating objects in terms of pairs? For instance, in three-dimensional space,0:55:00
you'd have three coordinates. Or even in the case where you have n-dimensional space, what happens? PROFESSOR: Right, OK. Well, this is a preview of what I'll say tomorrow. But the point is, once you have two things,0:55:14
you have as many things as you want. All right? Because if I want to make three things, I could start making things like a pair whose first thing is0:55:25
1, and whose second thing is another pair that, say, has 2 and 3 in it.0:55:34
And so on, a hundred things. I can nest them out of pairs. I made a pretty arbitrary decision about how to do it, and you can immediately see there are lots of ways to do that. What we'll start talking about next time0:55:44
are conventions for how to do things like that. But notice that what this really depends on is I can make pairs of pairs. If all I could do was make pairs of numbers, I'd be stuck.0:56:07
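Nesting pairs to hold three things might look like this, one arbitrary choice among many, as the lecture says:

```python
# Getting three things out of pairs by nesting: a pair whose
# second element is itself a pair.
def cons(x, y): return (x, y)
def car(p): return p[0]
def cdr(p): return p[1]

triple = cons(1, cons(2, 3))
print(car(triple))        # 1
print(car(cdr(triple)))   # 2
print(cdr(cdr(triple)))   # 3
```

This works only because pairs are closed: the thing cons puts together can itself be a pair.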
OK. Let's break. [MUSIC PLAYING]0:56:55
All right, well, we've just gone off and done a couple of simple examples of data abstraction. Now I want to do something more complicated.0:57:05
We're going to talk about what it means. And this will be harder, because it's always much harder in computer programming to talk about what something means than to go off and do it.0:57:16
But let's go back to almost the very beginning. Let's go back to the point where I said,0:57:25
we just assumed that there were procedures, make-RAT, and numer, and denom.0:57:38
Let's go back to where we had this, at the very beginning, constructors and selectors, and went off and defined the rational number arithmetic.0:57:47
And remember, I said at that point we were sort of done, except for George. Well, what is it that we'd actually done at that point? What was it that was done?0:57:59
Well, what I want to say is, what was done after we'd implemented the operations and terms of these, was that we had defined a rational number0:58:08
representation in terms of abstract data.0:58:17
What do I mean by abstract data? Well, the idea is that at that point, when we had our +RAT and our *RAT,0:58:28
that any implementation of make-RAT, and numerator, and denominator that George supplied us with,0:58:38
could be the basis for a rational number representation. Like, it wasn't our concern where you divided through to get the greatest common divisor, or any of that.0:58:48
So the idea is that what we built is a rational arithmetic system that would sit on top of any representation.0:58:57
What do I mean by any representation? I mean, certainly it can't be the case that all I mean is George can reach in a bag and pull out three arbitrary procedures and say, well, fine,0:59:09
now that's the implementation. That can't be what I mean. What I've got to mean is that there's0:59:18
some way of saying whether three procedures are going to be suitable as a basis for rational number representation. If we think about it, what suitable0:59:29
might mean is if I have to assume something like this, I have to say that if x is the result of say,0:59:39
doing make-RAT of n and d, then the numerator of x divided0:59:59
by the denominator of x is equal to n over d.1:00:09
See, what that is is that's George's contract. What we mean by writing a contract for rational numbers, if you think about it, this is the right thing.1:00:18
And the two ones we showed do the right thing. See, if I'm taking out greatest common divisors, it doesn't matter whether I take them out or not,1:00:27
or the place where I take them, because the idea is I'm going to divide through. But see, this is George's contract. So what we really say to George is your business is to go off and find us1:00:39
three procedures, make-RAT, and numerator, and denominator, that fulfill this contract for any choice of n and d. And that's what we mean by we can use that1:00:50
as the basis for a rational number representation. And other than that, it fulfills this contract. We don't care how he does it.1:00:59
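George's contract can even be checked mechanically. This sketch uses Python's Fraction purely as a referee for the equality of n/d, and never looks below the abstraction layer:

```python
from fractions import Fraction
from math import gcd

# One of George's candidate implementations (the reducing one).
def make_rat(n, d):
    g = gcd(n, d)
    return (n // g, d // g)

def numer(x): return x[0]
def denom(x): return x[1]

# The contract: if x = make_rat(n, d), then
# numer(x) / denom(x) must equal n / d.
for n, d in [(1, 2), (6, 8), (10, 4)]:
    x = make_rat(n, d)
    assert Fraction(numer(x), denom(x)) == Fraction(n, d)

print("contract holds")
```

Both representations shown in the lecture pass this check, which is exactly why either is a valid basis for the rational number system.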
It's not our business. It's below the layer of abstraction. In fact, if we want to say, what is a rational number really?1:01:13
See, what's it really, without having to talk about going below the layer of abstraction, what we're forced into saying is a rational number really is sort of this axiom, is three procedures,1:01:27
make-RAT, numerator, and denominator, that satisfy this axiom. In some sense, abstractly, that's what a rational number is really.1:01:41
That's sort of easy words to listen to, because what you have in your head, of course, is well, for all this thing about saying that's what a rational number is really,1:01:50
you actually just saw that we built rational numbers. See, what we really did is we built rational numbers1:02:03
on top of pairs. So for all I'm saying abstractly, we can say a rational number really is just this axiom.1:02:15
You can listen to that comfortably, because you're saying, well, yeah, but really it's actually pairs, and I'm just annoying you by trying to be abstract.1:02:24
Well, let me, as an antidote for that, let me do something that I think is really going to terrify you. I mean, it's really going to bring1:02:33
you face to face with the sort of existential reality of this abstraction that we're talking about. And what I'm going to talk about is, what are pairs really?1:02:45
See, what did I tell you about pairs? I tricked you, right? I said that Lisp has this primitive called cons that builds pairs. But what did I really tell you about?1:02:56
If you go back and said, let's look on this slide, all I really told you about pairs is that there happens to be this property, these properties1:03:05
of cons, car, and cdr. And all I really said about pairs is that there's a thing called cons, and a thing called car, and a thing called cdr.1:03:14
And it is the case that if I build cons of x, y and take car of it, I get x. And if I build cons of x, y and get cdr of it, I get y.1:03:25
And even though I lulled you into thinking that there's something in Lisp that does that, so you pretended you knew1:03:34
what it was, in fact, I didn't tell you any more about pairs than this tells you about rational numbers. It's just some axiom for pairs.1:03:44
Well, to drive that home, let me really scare you, and show you what we might build pairs in terms of.1:03:56
And what you're going to see is that we can build rational numbers, and line segments, and vectors, and all of this stuff in terms of pairs, and we're going to see below here that pairs can1:04:06
be built out of nothing at all. Pure abstraction. So let me show you on this slide an implementation1:04:17
of cons, car, and cdr. And we'll look at it again in a second, but notice that their procedure definitions of cons, car,1:04:26
and cdr, you don't see any data in there, what you see is a lambda. So cons here is going to return--1:04:38
is a procedure that returns a procedure, just like average-damp. Cons of a and b returns a procedure of an argument1:04:49
called pick, and it says, if pick is equal to 1, I'm going to return a, and if pick is equal to 2,1:04:58
I'm going to return b, and that's what cons is going to be. Car of a thing x, car of a pair x,1:05:10
is going to be x applied to 1. And notice that makes sense. You might not understand why or how I'm doing such a thing, but at least it makes sense, because the thing constructed1:05:19
by cons is a procedure, and car applies that to 1. And similarly, cdr applies that thing to 2.1:05:29
OK, now I claimed that this is a representation of cons, car, and cdr, and notice there's no data in it. All right, it's built out of air. It's just procedures.1:05:39
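The slide's implementation carries over to Python almost word for word: cons returns a procedure of one argument, and car and cdr simply apply it:

```python
# Pairs built out of "air": no data objects, only procedures.
def cons(a, b):
    def pair(pick):
        if pick == 1:
            return a
        if pick == 2:
            return b
    return pair

def car(x):
    return x(1)   # apply the pair-procedure to 1

def cdr(x):
    return x(2)   # apply it to 2

print(car(cons(37, 49)))  # 37
print(cdr(cons(37, 49)))  # 49
```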
There's no data objects at all in that representation. Well, what could that possibly mean?1:05:49
Well, if you really believe this stuff, then you have to believe that in order to show that that's a representation for cons, car,1:05:59
and cdr, all I have to do is show that it satisfies the axiom. See, all I should have to convince you of is, for example, that gee, that car of cons of 37 and 491:06:22
is 37 for arbitrary values of 37 and 49. And cdr the same way.1:06:32
See, if I really can demonstrate to you that that weird procedure definition, in terms of air, has the property that it satisfies this,1:06:41
then you just have to grant me that that is a possible implementation of cons, car, and cdr, on which I can build everything else. Well, let's look at that.1:06:50
And this will be practice in the substitution model.1:06:59
How could we check this? We sort of know how to do that. It's just the same substitution model. Let's look. We start out, and we say, what's car of cons of 37 and 49?1:07:11
What do we do? Cons is some procedure. Its value is cons was a procedure of a and b. The thing returned by cons is its procedure body1:07:23
with 37 and 49 substituted for the parameters. It'll be 37 substituted for a and 49 substituted for b.1:07:32
So this expression has the same meaning as this expression. It's car of, and the body of cons was this thing that started with lambda.1:07:43
And it says, so if pick is equal to 1, where pick is this other argument, if pick is equal to 1, it's 37, that's where a was, and if pick is equal to 2, it's 49.1:07:55
So that's the first step. I'm just going through mechanical substitution. And remember, at this point in the course, if you're confused about what things mean, go mechanically through the substitution model.1:08:05
Well, what is this reduced to? Car said, take your argument, which in this case is this,1:08:15
and apply it to 1. That was the definition of car. So if I look at car, if I do that, the answer is, well, it's that argument, this was the argument to car,1:08:25
applied to 1. Well, what does that mean? I take 1, and I substitute it in the body here for this value of pick, which1:08:36
is the name of the argument, what do I get? Well, I get the thing that says if 1 equals 1 it's 37, and if 1 equals 2 it's 49, so the answer's 37.1:08:46
And similarly, if I'd taken cdr, that would apply it to 2, and I'd get 49. So you see, what I've demonstrated is that that completely weird implementation of cons, car,1:08:57
and cdr, satisfies the axioms. So it's a perfectly valid way of building, in fact, all of the data objects we're going to see in Lisp. So they all, if you like, can be built1:09:07
on sort of existential nothing. And as far as you know, that's how it works. You couldn't tell. If all you're ever going to do with pairs1:09:17
is construct them with cons and look at them with car and cdr, you couldn't possibly tell how this thing works. Now, it might give you a sort of warm feeling inside if I say,1:09:26
well, yeah, in fact, for various reasons there happen to be primitives called cons, car, and cdr, and if it's too scary, if this kind of stuff is too scary, you don't have to look inside of it.1:09:36
So that might make you feel better, but the point is, it really could work this way, and it wouldn't make any difference to the system at all.1:09:46
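The claim that pairs can be built from nothing but procedures works in any language with first-class functions. Here is a rough translation of the blackboard definitions into Python; the translation itself, and the error check for a bad argument, are mine, not part of the lecture:

```python
def cons(a, b):
    # The "pair" is just a procedure that remembers a and b
    # and hands one of them back when asked.
    def pick(i):
        if i == 1:
            return a
        if i == 2:
            return b
        raise ValueError("pick must be 1 or 2")
    return pick

def car(p):
    return p(1)  # ask the pair-procedure for its first part

def cdr(p):
    return p(2)  # ask it for its second part

# The axioms checked by the substitution-model argument:
assert car(cons(37, 49)) == 37
assert cdr(cons(37, 49)) == 49
```

Each call to cons returns a distinct procedure object carrying its own a and b, which is why two different pairs never get confused.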
So in some sense, we don't need data at all to build these data abstractions. We can do everything in terms of procedures. OK, well, why did I terrify you in this way?1:09:57
First, I really want to reinforce this idea of abstraction, that you really can do these things abstractly.1:10:06
Secondly, I want to introduce an idea we're going to see more and more of in this course, which is we're going to blur the line between what's data1:10:17
and what's a procedure. See, in this funny implementation it turned out that cons of something happened to be represented in terms of a procedure,1:10:27
even though we think of it as data. While here that's sort of a mathematical trick, but one of the things we'll see is1:10:36
that a lot of the very important programming techniques that we're going to get to sort of depend very crucially on blurring this traditional line between what you consider a procedure1:10:47
and what you consider data. We're going to see more and more of that, especially next time. OK, questions? AUDIENCE: If you asked the system1:10:56
to print a, what would happen? PROFESSOR: The question is, what would happen if I asked the system to print a.1:11:05
Given this representation, you already know the answer. The answer is compound procedure a, just like last time.1:11:21
It'd say compound procedure. It might say a little bit more. It might say compound procedure lambda or something or other, depending on details of how I named it.1:11:31
But it's a procedure. And the only reason for that is I haven't told the system anything special about how to print such things.1:11:40
Now, it's in fact true that with the actual implementation of cons that happens to be built into the system, it would print something else. It would print, say, this is a pair.1:11:53
AUDIENCE: When you define cons, and then you pass it into values, how does it know where to look for the cons, because you can use cons1:12:05
over and over again? How does it know where to look to know which a and b it's supposed to pull back out? I don't know if I'm expressing that quite right.1:12:17
Where is it stored? PROFESSOR: OK, the question is, I sort of have a cons with a 37 and a 49, and I might make another cons1:12:27
with a 1 and a 2, and I might have one called a, and I might have one called b. And the question is, how does it know? And why don't they get confused? And that's a very good question.1:12:40
See, you have to really believe that the procedures are objects. It's sort of like saying-- let's try another simpler example.1:12:49
Suppose I ask for the square root of 5,1:12:58
and then I ask for the square root of 20. You're probably not the least bit1:13:07
bothered that I can take square root and apply it to 5, and then I can take square root and apply it to 20. And there's sort of no issue, gee,1:13:16
doesn't it get confused about whether it's working on 5 or 20? There's no issue about that because you're thinking of a procedure which goes off and does something.1:13:26
Now, in some sense you're asking me the same question. But it's really bothering you, and it's bothering you for a really good reason. Because when I write that, you're saying gee, this is,1:13:36
I know, sort of a procedure. But it's not a procedure that's just running. It's just sort of a procedure sitting there. And how can it be that sometimes this procedure has 37 and 49,1:13:46
and there might be another one which has 5 and 6 in there, and why don't they get confused? So there's something very, very important that's bothering you.1:13:58
And it's really crucial to what's going on. We're suddenly saying that procedures are not just the act of doing something.1:14:08
Procedures are conceptual entities, objects, and if I built cons of 37 and 49, that's a particular procedure that sits there.1:14:18
And it's different from cons of 3 and 4. That's another procedure that sits there. AUDIENCE: Both of them exist independently. PROFESSOR: And exists independently. AUDIENCE: And they both can be referenced by car and cdr.1:14:28
PROFESSOR: And they both would be referenced by car and cdr. Just like I could increment this, and I could increment that.1:14:38
They're objects. And that's sort of where we're going. See, the fact that you're asking the question shows that you're really starting to think about the implications of what's going on.1:14:47
It's the difference between saying a procedure is just the act of doing something. And a procedure is a real object that has existence.1:14:56
AUDIENCE: So when the procedure gets built, the actual values are now substituted for a and b-- PROFESSOR: That's right. AUDIENCE: And then that procedure exists as lambda, and pick is what's actually passed in.1:15:07
PROFESSOR: Yes, when cons gets called, and the result of cons is a new procedure that's constructed, that new procedure has an argument that's called pick.1:15:17
AUDIENCE: But it no longer has an a and b. The a and b are the actual values that are passed through. PROFESSOR: And it has-- right, according to the substitution model, what it now has is not those arbitrary names a and b,1:15:26
it somehow has that 37 and 49 in there. But you're right, that's a hard thing to think about it, and it's different from the way you've1:15:35
been thinking about procedures. AUDIENCE: And if I have again cons of 37 and 49, it's a different object? PROFESSOR: And if you make another cons of 37 and 49,1:15:51
you're into a wonderful philosophical problem, which is going to be what the lecture about halfway through this course is about.1:16:00
Which is, if I cons 37 and 49, and I do it again, is that the same thing, or is it a different thing? And how could you tell? And when could it possibly matter?1:16:10
And that's sort of like saying, is that the same thing as this?1:16:21
Or is this the same thing as that? It's the same kind of question. And that's a very, very deep question. And I can't answer in less than an hour.1:16:30
But we will.0:00:00
Lecture 3A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:21
PROFESSOR: Well, last time we talked about compound data, and there were two main points to that business. First of all, there was a methodology of data0:00:31
abstraction, and the point of that was that you could isolate the way that data objects are used from the way0:00:40
that they're represented: this idea that there's this guy, George, and you go out and make a contract with him; and it's his business to represent the data objects; and at the moment you are using them, you don't think0:00:49
about George's problem. And then secondly, there was this particular way that Lisp has of gluing together things to form objects called pairs,0:01:00
and that's done with cons, car and cdr. And the way that cons, car and cdr are implemented is basically irrelevant. That's sort of George's problem of how0:01:09
to build those things. It could be done as primitives. It could be done using procedures in some weird way, but we're not going to worry about that. And as an example, we looked at rational number arithmetic.0:01:20
We looked at vectors, and here's just a review of vectors. Here's an operation that takes the sum of two vectors, so we want to add this vector, v1, and this vector, v2, and0:01:32
we get the sum. And the sum is the vector whose coordinates are the sum of the coordinates of the pieces you're adding.0:01:41
So I can say, to define vector addition, right: to add two vectors I make a vector, whose x coordinate is the sum of the0:01:50
two x coordinates, and whose y coordinate is the sum of the two y coordinates. And then similarly, we could have an operation that scales0:02:03
vectors, so here's a procedure scale that multiplies a vector, v, by some number, s.0:02:13
So here's v, v goes from there to there and I scale v, and I get a vector in the same direction that's longer. And again, to scale a vector, I multiply the successive0:02:23
coordinates. So I make a vector, whose x coordinate is the scale factor times the x coordinate and whose y coordinate is the scale factor times the y coordinate.0:02:34
So those are two operations that are implemented using the representation of vectors. And the representation of vectors, for instance, is something that we can build in terms of pairs.0:02:45
So George has gone out and implemented for us make-vector and x coordinate and y coordinate, and this could be done, for instance, using cons, car and cdr; and notice0:03:04
here, I wrote this in a slightly different way. The procedures we've seen before, I've said something like say, make-vector of x and y: cons of x and y.0:03:16
And here I just wrote make-vector cons. And that means something slightly different. Previously we'd say, define make-vector to be a procedure that takes two arguments, x and y, and does0:03:26
cons of x and y. And here I am saying define make-vector to be the thing that cons is, and that's almost the same as the other0:03:38
way we've been writing things. And I just want you to get used to the idea that procedures can be objects, and that you can name them.0:03:48
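The two vector operations, and George's pair-based representation underneath them, can be sketched like this in Python. Tuples stand in for cons pairs; the names make_vector, xcor, and ycor follow the lecture, and the rest of the translation is mine:

```python
# George's side of the contract: a vector is a pair.
def make_vector(x, y):
    return (x, y)

def xcor(v):
    return v[0]

def ycor(v):
    return v[1]

# The user's side: operations written only in terms of the
# constructor and selectors, never the representation.
def add_vect(v1, v2):
    return make_vector(xcor(v1) + xcor(v2), ycor(v1) + ycor(v2))

def scale_vect(s, v):
    return make_vector(s * xcor(v), s * ycor(v))
```

If George later changes the representation, only make_vector, xcor, and ycor need to change; add_vect and scale_vect don't.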
OK, well there's vector representation, and again, if that was all there was to it, this would all be pretty boring. And the point is, remember, that you can use cons to glue0:04:00
together not just numbers to form pairs, but to glue together arbitrary things. So for instance, if we'd like to represent a line segment,0:04:11
say the line segment that goes from a certain vector: say, the segment from the vector 2,3 to the point represented0:04:27
by the vector 5,1. If we want to represent that line segment, then we can build that as a pair of pairs.0:04:41
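That pair-of-pairs construction is one more application of the same glue. A sketch in Python, again with tuples playing the role of pairs; the constructor and selector names are mine, modeled on the lecture's constructor/selector pattern:

```python
def make_vector(x, y):
    return (x, y)

# A segment is a pair whose car and cdr are themselves pairs.
def make_segment(start, end):
    return (start, end)

def seg_start(s):
    return s[0]

def seg_end(s):
    return s[1]

# The segment from the vector (2, 3) to the vector (5, 1):
s = make_segment(make_vector(2, 3), make_vector(5, 1))
```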
So again, we can represent line segments. We can make a constructor that makes a segment using cons, selects out the start of a segment, selects out the end0:04:50
point of the segment; and then if we actually look at that, if we peel away the abstraction layers, and say0:05:00
what's that really is a pair of pairs, we'd say well that's a pair. Here's the segment.0:05:10
It's car, right, it's car pointer is a pair, and it's cdr is also a pair, and then what the car is-- here's the0:05:21
car, that itself is a pair of 2 and 3. And similarly the cdr is a pair of 5 and 1. And let me remind you again, that a lot of people have some0:05:30
idea that if I'd taken this arrow and somehow written it to point down, that would mean something else. That's irrelevant. It's only how these are connected and not whether this0:05:40
arrow happens to go vertically or horizontally. And again just to remind you, there was0:05:49
this notion of closure. See, closure was the thing that allowed us to start0:06:02
building up complexity, that didn't trap us in pairs. Particularly what I mean is the things that we make,0:06:12
having combined things using cons to get a pair, those things themselves can be combined using cons to make0:06:21
more complicated things. Or as a mathematician might say, the set of data objects in Lisp is closed under the operation of forming pairs.0:06:34
That's the thing that allows us to build complexity. And that seems obvious, but remember, a lot of the things in the computer languages that people use are not closed. So for example, forming arrays in BASIC and Fortran is not a0:06:47
closed operation, because you can make an array of numbers or character strings or something, but you can't make an array of arrays. And when you look at means of combination, you should be0:06:59
asking yourself whether things are closed under that means of combination. Well in any case, because we can form pairs of pairs, we0:07:09
can start using pairs to glue things together in all sorts of different ways. So for instance if I'd like to glue together the four things, 1, 2, 3 and 4, there are a lot of ways I can do it.0:07:20
I could, for example, like we did with that line segment, I could make a pair that had a 1 and a 2 and0:07:32
a 3 and a 4, right? Or if I liked, I could do something like this. I could make a pair, whose first thing is a pair, whose0:07:46
car is 1, and its cdr is itself a pair that has the 2 and the 3, and then I could put the 4 up here.0:07:56
So you see, there are a lot of different ways that I can start using pairs to glue things together, and so it'll be a good idea to establish some kind of conventions,0:08:07
right, that allow us to deal with this thing in some conventional way, so we're not constantly making an ad hoc choice.0:08:16
And Lisp has a particular convention for representing a sequence of things as, essentially, a chain of pairs,0:08:26
and that's called a list. And what a list is is essentially just a convention0:08:39
for representing a sequence. I would represent the sequence 1, 2, 3 and 4 by a sequence of pairs.0:08:48
I'd put 1 here and then the cdr of this would point to another pair whose car was the next thing in the sequence,0:09:01
and the cdr would point to another pair whose car was the next thing in the sequence-- so there's 3-- and then another one. So for each item in the sequence, I'll get a pair.0:09:15
And now there are no more, so I put a special marker that means there's nothing more in the list. OK, so that's a0:09:28
conventional way to glue things together if you want to represent a sequence, right. And what it is is a bunch of pairs, the successive cars of0:09:42
each pair are the items that you want to glue together, and the cdr pointer points to the next pair. Now if I actually wanted to construct that, what I would0:09:52
type into Lisp is this: I'd actually construct that as saying, well this thing is the cons of 1 onto the cons of 20:10:07
onto the cons of 3 onto the cons of 4 onto, well, this thing nil. And what nil is is a name for the end-of-list marker.0:10:21
It's a special name, which means this is the end of the list. OK, so that's how I would actually construct that.0:10:37
Of course, it's a terrible drag to constantly have to write something like the cons of 1 onto the cons of 2 onto the cons of 3, whenever you want to make this thing. So Lisp has an operation that's called list, and list0:10:54
is just an abbreviation for this nest of conses. So I could say, I could construct that by saying that is the list of 1, 2, 3 and 4.0:11:08
And all this is is another way, a piece of syntactic sugar, a more convenient way for writing that chain of conses-- cons of cons of cons of cons onto nil.0:11:18
So for example, I could build this thing and say, I'll define 1-TO-4 to be the list of 1, 2, 3 and 4.0:11:48
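In a language without Lisp's built-in pairs, the same chain-of-pairs convention is easy to mimic. In this Python sketch, None stands in for nil and make_list plays the role of the list abbreviation; both choices are mine, not part of the lecture:

```python
nil = None  # stand-in for the end-of-list marker

def cons(a, b):
    return (a, b)

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def make_list(*items):
    # list of 1, 2, 3, 4 is just an abbreviation for
    # cons of 1 onto cons of 2 onto cons of 3 onto cons of 4 onto nil.
    result = nil
    for item in reversed(items):
        result = cons(item, result)
    return result

one_to_four = make_list(1, 2, 3, 4)
assert one_to_four == cons(1, cons(2, cons(3, cons(4, nil))))
# car of the cdr of the cdr picks out the third element:
assert car(cdr(cdr(one_to_four))) == 3
```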
OK, well notice some of the consequences of using this convention. First of all if I have this list, this 1, 2, 3 and 4, the0:11:57
car of the whole thing is the first element in the list, right. How do I get 2? Well, 2 would be the car of the cdr of this thing 1-TO-4,0:12:21
it would be 2, right. I take this thing, I take the cdr of it, which is this much,0:12:30
and the car of that is 2, and then similarly, the car of the cdr of the cdr of 1-TO-4, cdr, cdr, car--0:12:48
would give me 3, and so on. Let's take a look at that on the computer screen for a second.0:12:57
I could come up to Lisp, and I could type define 1-TO-4 to be0:13:07
the list of 1, 2, 3 and 4, right. And I'll tell that to Lisp, and it says, fine, that's the0:13:19
definition of 1-TO-4. And I could say, for instance, what's the car of the cdr of0:13:28
the cdr of 1-TO-4, close paren, close paren.0:13:38
Right, so the car of the cdr of the cdr would be 3. Right, or I could say, what's 1-TO-4 itself.0:13:51
And you see what Lisp typed out is 1, 2, 3, 4, enclosed in parentheses, and this notation, typing the elements of the list enclosed in parentheses is Lisp's0:14:02
conventional way for printing back this chain of pairs that represents a sequence. So for example, if I said, what's the cdr of 1-TO-4,0:14:19
that's going to be the rest of the list. That's the thing pointed to by the first pair, which is, again, a sequence that starts off with 2.0:14:28
Or for example, I go off and say, what's the cdr of the cdr of 1-TO-4; then that's 3,4.0:14:44
Or if I say, what's the cdr of the cdr of the cdr of the cdr0:14:58
of 1-TO-4, and I'm down there looking at the end-of-list0:15:07
pointer itself, and Lisp prints that as just open paren, close paren. You can think of that as a list with nothing in there. All right, see at the end what I did there was I looked at0:15:16
the cdr of the cdr of the cdr of the cdr of 1-TO-4, and I'm just left with the end-of-list pointer itself. And that gets printed as open close.0:15:34
All right, well that's a conventional way you can see for working down a list by taking successive cdrs of things.0:15:43
It's called cdr-ing down a list. And of course it's pretty much of a drag to type all those cdrs by hand. You don't do that. You write procedures that do that.0:15:53
And in fact one very, very common thing to do in Lisp is to write procedures that, sort of, take a list of things and0:16:02
do something to every element in the list, and return you a list of the results. So what I mean for example, is I might write a procedure called Scale-List, and with Scale-List I might say I want0:16:18
to scale by 10 the entire list 1-TO-4, and that would return0:16:27
for me the list 10, 20, 30, 40.0:16:36
[UNINTELLIGIBLE PHRASE] Right, it returns a list, and well you can see that there's0:16:46
going to be some kind of recursive strategy for doing it. How would I actually write that procedure? The idea would be, well if you'd like to build up a list0:16:56
where you've multiplied every element by 10, what you'd say is well you imagine that you'd taken the rest of the list--0:17:06
right, the thing represented by the cdr of the list, and suppose I'd already built a list where each of these was multiplied by 10--0:17:16
that would be Scale-List of the cdr of the list. And then all I have to do is multiply the car of the list by 10, and0:17:25
then cons that onto the rest, and I'll get a list. Right and then similarly, to have scaled the cdr of the list, I'll scale the cdr of that and cons onto that 20:17:35
multiplied by 10. And finally when I get all the way down to the end, and I only have this end-of-list pointer. All right, this thing whose name is nil-- well I just return an end-of-list pointer.0:17:45
So there's a recursive strategy for doing that. Here's the actual procedure that does that. Right, this is an example of the general strategy of cdr-ing down a list and so-called cons-ing0:17:56
up the result, right. So to scale a list l by some scale factor s, what do I do?0:18:06
Well there's a test, and Lisp has the predicate called null. Null means is this thing the end-of-list pointer, or another way to think of that is are there any elements in0:18:16
this list, right. But in any case if I'm looking at the end-of-list pointer, then I just return the end-of-list pointer. I just return nil, otherwise I cons together the result of0:18:32
doing what I'm going to do to the first element in the list, namely taking the car of l and multiplying it by s, and I cons that onto recursively scaling the rest of the list.0:18:50
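That recursive strategy, cdr-ing down the list and cons-ing up the result, looks like this in Python. Tuples as pairs and None as the end-of-list marker are conventions of this sketch, not of the lecture:

```python
nil = None

def cons(a, b):
    return (a, b)

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def null(p):
    return p is nil  # are we looking at the end-of-list pointer?

def scale_list(s, l):
    # At the end of the list, return the end-of-list pointer;
    # otherwise scale the car and cons it onto the scaled cdr.
    if null(l):
        return nil
    return cons(s * car(l), scale_list(s, cdr(l)))
```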
OK, so again, the general idea is that you recursively do something to the rest of the list, to the cdr of the list, and then you cons that onto actually doing something to0:18:59
the first element of the list. When you get down to the end here, you return the end-of-list pointer, and that's a general pattern for doing something to a list. Well of0:19:16
course you should know by now that the very fact that there's a general pattern there means I shouldn't be writing this procedure at all. What I should do is write a procedure that's the general0:19:25
pattern itself that says, do something to everything in the list and define this thing in terms of that. Right, make some higher order procedure, and here's the higher order procedure that does that.0:19:34
It's called MAP, and what MAP does is it takes a list, takes a list l, and it takes a procedure p, and it returns0:19:45
the list of the elements gotten by applying p to each successive element in the list. All right, so p of e1, p of e2, up to p of en.0:19:56
Right, so I think of taking this list and transforming it by applying p to each element. And you see all this procedure is is exactly the general0:20:06
strategy I said. Instead of multiply by 10, it's do the procedure. If the list is empty, return nil. Otherwise, apply p to the first element of the list.0:20:17
Right, apply p to car of l, and cons that onto the result of applying p to everything in the cdr of the list, so that's0:20:26
a general procedure called MAP. And I could define Scale-List in terms of MAP.0:20:39
Let me show you that first. But I could say another way to define Scale-List is just to MAP along the list with the procedure which takes an item0:20:53
and multiplies it by s. Right, so this is really the way I should think about scaling the list: build that actual recursion into the0:21:04
general strategy, not into every particular procedure I write. And of course, one of the values of doing this is that you start to see commonality. Right, again you're capturing general patterns of usage.0:21:16
For instance, if I said MAP, the square procedure, down this list 1-TO-4, then I'd end up with 1, 4, 9 and 16.0:21:32
Right, or if I said MAP down this list, lambda of x plus0:21:42
x 10, if I MAP that down 1-TO-4, then I'd get the list0:21:51
where everything had 10 added to it: right, so I'd get 11, 12, 13, 14.0:22:00
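Capturing the general pattern as MAP, and then getting Scale-List, squaring, and adding 10 for free, might look like this. Same sketch conventions as the earlier Python fragments (tuples as pairs, None as nil); lisp_map is named that way only to avoid shadowing Python's built-in map:

```python
nil = None

def cons(a, b):
    return (a, b)

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def lisp_map(p, l):
    # Apply p to every element, consing up a list of the results.
    if l is nil:
        return nil
    return cons(p(car(l)), lisp_map(p, cdr(l)))

def scale_list(s, l):
    return lisp_map(lambda x: s * x, l)

one_to_four = cons(1, cons(2, cons(3, cons(4, nil))))
assert scale_list(10, one_to_four) == cons(10, cons(20, cons(30, cons(40, nil))))
assert lisp_map(lambda x: x * x, one_to_four) == cons(1, cons(4, cons(9, cons(16, nil))))
assert lisp_map(lambda x: x + 10, one_to_four) == cons(11, cons(12, cons(13, cons(14, nil))))
```

Once scaling is written as a MAP, you stop thinking about the control structure at all; the recursion lives in one place.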
And you can see that's going to be a very, very common idea: doing something to every element in the list. One thing you might think about is writing MAP in an0:22:11
iterative style. The one I wrote happens to evolve a recursive process, but we could just as easily have made one that evolves an iterative process. But see the interesting thing about it is that once you0:22:21
start thinking in terms of MAP-- see, once you say scale is just MAP, you stop thinking about whether it's iterative or recursive, and you just say, well there's this aggregate, there's this list,0:22:32
and what I do is transform every item in the list, and I stop thinking about the particular control structure in order. That's a very, very important idea, and it, I guess it0:22:45
really comes out of APL. It's, sort of, the really important idea in APL that you stop thinking about control structures, and you start thinking about operations on aggregates, and then about0:22:55
halfway through this course, we'll see when we talk about something called stream processing, how that view of the world really comes into its glory. This is just, sort of, a cute idea.0:23:05
But we'll see much more applications of that later on. Well let me mention that there's something that's very similar to MAP that's also a useful idea, and that's--0:23:17
see, MAP says I take a list, I apply something to each item, and I return a list of the successive values.0:23:26
There's another thing I might do, which is very, very similar, which is take a list and some action you want to do and then do it to each item in the list in sequence.0:23:36
Don't make a list of the values, just do this particular action, and that's something that's very much like MAP.0:23:45
It's called for-each, and for-each takes a procedure and a list, and what it's going to do is do something to every item in the list. So basically what it does: it says if the0:23:56
list is not empty, right, if the list is not null, then what I do is, I apply my procedure to the first item in0:24:05
the list, and then I do this thing to the rest of the list. I apply for-each to the cdr of the list.0:24:15
All right, so I do it to the first of the list, do it to the rest of the list, and of course, when I call it recursively, that's going to do it to the rest of the rest of the list and so on.0:24:24
And finally, when I get done, I have to just do something to say I'm done, so we'll return the message "done." So that's very, very similar to MAP. It's mostly different in what it returns.0:24:35
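For-each in the same Python sketch. The action here appends to a plain Python list just so we can see that it ran; returning the "done" message follows the lecture:

```python
nil = None

def cons(a, b):
    return (a, b)

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def for_each(proc, l):
    # Do proc to the first item, then do the whole thing to the
    # rest of the list; no list of values is consed up.
    if l is nil:
        return "done"
    proc(car(l))
    return for_each(proc, cdr(l))

seen = []
result = for_each(seen.append, cons(1, cons(2, cons(3, nil))))
assert seen == [1, 2, 3]
assert result == "done"
```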
And so for example, if I had some procedure that printed things on the screen, if I wanted to print everything in the list, I could say for-each, print this list. Or0:24:47
if I had a list of figures, and I wanted to draw them on the display, I could say for-each, display on the screen this figure.0:24:57
Let's take questions. AUDIENCE: Does it create a new copy with something done to it, unless you explicitly tell it to do that?0:25:06
Is that correct? PROFESSOR: Right. Yeah, that's right. For-each does not create a list. It just sort of does something.0:25:15
So if you have a bunch of things you want to do and you're not worried about values like printing something, or drawing something on the screen, or ringing the bell on the terminal, or something,0:25:24
you can say for-each, you know, do this for-each of those things in the list, whereas MAP actually builds you this new collection of values that you might want to use. It's just a subtle difference between them.0:25:34
AUDIENCE: Could you write MAP using for-each, so that you did some sort of cons or something to build the list back up? PROFESSOR: Well, sort of. I mean, I probably could.0:25:44
I can't think of how to do it right offhand, but yeah, I could arrange something. AUDIENCE: The vital difference between MAP and for-each is one is recursive and the other is not in the sense you0:25:57
defined early yesterday, I believe. PROFESSOR: Yeah, about MAP and for-each and recursion. Yeah, that's a good point.0:26:09
For the MAP procedure I wrote, that happens to be a recursive process. And the reason for that is that when you've done this thing to the rest of the list, you're waiting for that value0:26:19
so that you can stick it on to the beginning of the list, whereas for-each doesn't really have any values to wait for. So that turns out to be an iterative process. That's not fundamental. I could have defined MAP so that it evolves an0:26:30
iterative process. I just didn't happen to. AUDIENCE: If you were to call for-each with a list that had embedded lists, I imagine it would work, right?0:26:43
It would give you the internal elements of each of those internal lists? PROFESSOR: OK, the question is if I [UNINTELLIGIBLE] for-each or MAP, for that matter, with a list that had0:26:54
lists in it-- although we haven't really looked at that yet-- would that work. The answer is yes in the sense I mean work and no in the0:27:04
sense that you mean work, because all that-- see if I give you a list, where hanging off here is, you0:27:16
know, is something that's not a number, maybe another list or you know, another cons or something, for-each just says do something to each item in this list. It goes down0:27:25
successively looking at the cdrs. AUDIENCE: OK. PROFESSOR: And as far as it's concerned, the first item in this list is whatever is hanging off here. AUDIENCE: Mhm. PROFESSOR: That might or might not be the right thing. AUDIENCE: So it wouldn't go down into the--0:27:35
PROFESSOR: Absolutely not. I could certainly write something else. What you're looking for is a common pattern of usage called tree recursion, where you take a list, and you actually go all the way down to what's0:27:46
called the leaves of the tree. And you could write such a thing, but that's not for-each and it's not MAP. Remember, these things are really being very simple minded.0:27:55
OK, no more questions? All right, let's break. [MUSIC PLAYING]0:28:42
PROFESSOR: What I'd like to do now is spend the rest of this time talking about one example, and this example, I think, pretty much summarizes everything that we've done up0:28:53
until now: all right, and that's list structure and issues of abstraction, and representation and capturing0:29:02
commonality with higher order procedures, and also is going to introduce something we haven't really talked about a lot yet-- what I said is the major third theme in this0:29:13
course: meta-linguistic abstraction, which is the idea that one of the ways of tackling complexity in engineering design is to build a suitable powerful language.0:29:27
You might recall what I said was pretty much the very most important thing that we're going to tell you in this course is that when you think about a language, you think0:29:39
about it in terms of what are the primitives; what are the means of combination--0:29:49
right, what are the things that allow you to build bigger things; and then what are the means of abstraction.0:30:01
How do you take those bigger things that you've built and put black boxes around them and use them as elements in making something even more complicated?0:30:12
Now the particular language I'm going to talk about is an example that was made up by a friend of ours0:30:21
called Peter Henderson. Peter Henderson is at the University of Stirling in Scotland.0:30:32
And what this language is about is making figures that sort of look like this.0:30:42
This is a woodcut by Escher called "Square Limit." You, sort of, see it has this complicated, kind of,0:30:52
recursive figure, where there's this fish pattern in the middle and things sort of0:31:02
bleed out smaller and smaller in self similar ways. Anyway, Peter Henderson's language was for describing0:31:11
figures that look like that and designing new ones that look like that and drawing them on a display screen.0:31:20
There's another theme that we'll see illustrated by this example, and that's the issue of what Gerry and I have0:31:31
already mentioned a lot: that there's no real difference, in some sense, between procedures and data. And anyway I hope by the end of this morning0:31:41
you will be completely confused about what the difference between procedures and data is, if you're not confused about that already.0:31:51
Well in any case, let's start describing Peter's language. I should start by telling you what the primitives are. This language is very simple because there's only one primitive.0:32:03
There's only one primitive, called a picture, and a picture is not quite what you think it is.0:32:12
Here's an example. This is a picture of George. The idea is that a picture in this language is going to be0:32:23
something that draws a figure scaled to fit a rectangle that you specify.0:32:33
So here you see in [? Saint ?] [? Lawrence's ?] outline of a rectangle, that's not really part of the picture, but the picture--0:32:43
you'll give it a rectangle, and it will draw this figure scaled to fit the rectangle. So for example, there's George, and here, this is also George.0:32:52
It's the same picture, right, just scaled to fit a different rectangle. Here's George as a fat kid.0:33:02
That's the same George. It's all the same figure. All of these three things are the same picture in this language. I'm just giving it different rectangles to scale itself in.0:33:16
OK, those are the primitives. That is the primitive. Now let's start talking about the means of combination and the operations.0:33:25
There is, for example, an operation called Rotate. And what Rotate does is, if I have a picture, say a picture0:33:35
that draws an "A" in some rectangle that I give it, the Rotate of that-- say the Rotate by 90 degrees would, if I give it a0:33:47
rectangle, draw the same image, but again, scaled to fit that rectangle.0:33:56
So that's Rotate by 90 degrees. There's another operation called Flip that can flip something, either horizontally or vertically. All right, so those are, sort of, operations, or you can0:34:06
think of those as means of combination of one element. I can put things together. There's a means of combination called Beside, and what Beside0:34:17
does: it'll take two pictures, let's say A and B--0:34:29
and by picture I mean something that's going to draw an image in a specified rectangle-- and what Beside will do--0:34:38
I have to say Beside of A and B-- Beside of two pictures-- and some number, s. And s will be a number between zero and one.0:34:50
And Beside will draw a picture that looks like this. It will take the rectangle you give it and scale its base by s. Say s is 0.5.0:35:00
And then over here it will draw-- it'll put the first picture, and over here it'll put the0:35:12
second picture. Or for instance if I gave it a different value of s, if I said Beside with a 0.25, it would do the same thing,0:35:27
except the A would be much skinnier. So it would draw something like that.0:35:38
So there's a means of combination Beside, and similarly there's an Above, which does the same thing except it puts them vertically instead of horizontally.0:35:47
Well let's look at that. All right, there's George and his kid brother, which is,0:35:58
right, constructed by taking George and putting him Beside0:36:10
the Above-- taking the empty picture, and there's a thing called the empty picture, which does the obvious thing-- putting the empty picture above a copy of George, and0:36:19
then putting that whole thing Beside George.0:36:28
Here's something called P which is, again, George Beside0:36:38
Flipping George, I think, horizontally in this case, and then Rotating the whole result 180 degrees and putting them Beside one another with the basic rectangle divided at0:36:50
0.5, right, and I can call that P. And then I can take P,0:36:59
and put it above the Flipped copy of itself, and I can call that Q.0:37:09
Notice how rapidly we've built up complexity, just in, you know, 15 seconds, you've gotten from George to that0:37:18
thing Q. Why is that? How were we able to do that so fast? The answer is the closure property.0:37:28
See, it's the fact that when I take a picture and put it Beside another picture, that's then, again, a picture that I can go and Rotate and Flip or put Above something else.0:37:39
Right, and when I take that element P, which is the Beside or the Flip or the Rotate of something, that's, again, a picture. Right, the world of pictures is closed under those means of0:37:49
combination. So whenever I have something, I can turn right around and use that as an element in something else. So maybe better than lists and segments, that just gives you0:37:59
an image for how fast you can build up complexity, because the operations are closed. OK, well before we go on with building more things, let's0:38:12
talk about how this language is actually implemented. The basic element that sits under the table here is a0:38:23
thing called a rectangle, and what a rectangle is going to be: it's a thing that's specified by an origin that's0:38:36
going to be some vector that says where the rectangle starts. And then there's going to be some other vector that I'm going to call the horizontal part of the rectangle, and0:38:49
another vector called the vertical part of the rectangle.0:39:00
And those three pieces are the elements: where the lower vertex is, how you get to the next vertex over here, and how you get to the vertex over there.0:39:09
The three vectors specify a rectangle. Now to actually build rectangles, what I'll assume0:39:18
is that we have a constructor called "make rectangle," or "make-rect," and selectors for horiz and vert and origin that0:39:37
get out the pieces of that rectangle. And well, you know a lot of ways you can do this now. You can do it by using pairs in some way or other, standard0:39:47
lists or not. But in any case, the implementation of these things, that's George's problem. It's just a data representation problem. So let's assume we have these rectangles to work with.0:39:58
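To make the rectangle abstraction concrete, here is a minimal sketch in Python (a modern translation, not the course's Scheme). The names make-rect, origin, horiz, and vert follow the lecture; representing everything as tuples is one arbitrary choice of representation -- that part is, as the lecture says, George's problem.

```python
# One possible data representation: a vector is an (x, y) tuple, and a
# rectangle is a triple of vectors (origin, horizontal part, vertical part).

def make_vect(x, y):
    return (x, y)

def make_rect(origin_v, horiz_v, vert_v):
    return (origin_v, horiz_v, vert_v)

# Selectors that get the pieces back out of a rectangle.
def origin(rect):
    return rect[0]

def horiz(rect):
    return rect[1]

def vert(rect):
    return rect[2]
```

Any other representation with the same constructor/selector contract would do just as well; nothing above this level cares.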
OK. Now the idea of this, remember what's got to happen. Somehow we have to worry about taking the figure and scaling0:40:10
it to fit some rectangle that you give it, that's the basic thing you have to arrange, that these pictures can do.0:40:22
How do we think about that? Well, one way to think about that is that any time I give you a rectangle, that defines, in some sense, a0:40:40
transformation from the standard square into that rectangle. Let me say what I mean. By the standard square, I'll mean something, which is a0:40:49
square whose coordinates are 0,0, and 1,0, and 0,1 and 1,1.0:41:01
And there's some sort of the obvious scaling transformation, which maps this to that and this to that,0:41:10
and sort of, stretches everything uniformly. So we take a line segment like this and end up mapping it to0:41:22
a line segment like that, so some point xy goes to some0:41:31
other point up there. And although it's not important, with a little vector algebra, you could write that formula. The thing that xy goes to, the point that xy goes to is0:41:43
gotten by taking the origin of the rectangle and then adding that as a vector to-- well, take x, the x coordinate, which is something0:41:54
between zero and one, multiply that by the horizontal vector of the rectangle; and take the y coordinate, which is also0:42:09
something between zero and one and multiply that by the vertical vector of the rectangle. That's just a little linear algebra.0:42:19
Anyway, that's the formula, which is the obvious transformation that takes things in the unit square into the interior of that rectangle.0:42:31
OK well, let's actually look at that as a procedure. So what we want is the thing which tells us that particular transformation that a rectangle defines.0:42:44
So here's the procedure. I'll call it coordinate-map. Coordinate-map is the thing that takes as its argument a rectangle and returns for you a procedure on points.0:43:00
Right, so for each rectangle you get a way of transforming a point xy into that rectangle. And how do you get it? Well I just-- writing in Lisp what I wrote there on the blackboard--0:43:10
I add to the origin of the rectangle the result of adding--0:43:20
I take the horizontal part of the rectangle; I scale that by the x coordinate of the point.0:43:29
I take the vertical vector of the rectangle. I scale that by the y coordinate of the point, and then add all those three things up.0:43:40
That's the procedure. That is the procedure that I'm going to apply to a point. And this whole thing is generated for each rectangle.0:43:53
So any rectangle defines a coordinate map, which is a procedure on points. OK.0:44:06
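As a sketch, that coordinate map can be written in Python (a translation of the Scheme idea, with vectors as tuples and a rectangle as an (origin, horiz, vert) triple; the helper names here are assumptions, not the course's code):

```python
def add_vect(a, b):
    return (a[0] + b[0], a[1] + b[1])

def scale_vect(s, v):
    return (s * v[0], s * v[1])

def coord_map(rect):
    """Given a rectangle (origin, horiz, vert), return the procedure on
    points that maps (x, y) in the unit square to
    origin + x*horiz + y*vert inside that rectangle."""
    origin_v, horiz_v, vert_v = rect
    def the_map(point):
        x, y = point
        return add_vect(origin_v,
                        add_vect(scale_vect(x, horiz_v),
                                 scale_vect(y, vert_v)))
    return the_map
```

For instance, coord_map(((1, 1), (2, 0), (0, 2))) sends (0, 0) to the origin (1, 1) and (1, 1) to the opposite corner (3, 3).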
All right, so for example, George here, my original George, might have been something that I specified by segments in the unit square, and then for each rectangle I0:44:20
give this thing, I'm going to draw those segments inside that rectangle. How actually do I do that?0:44:30
Well I take each segment in my original reference George that was specified, and to each of the end points of those0:44:40
segments, I apply the coordinate map of the particular rectangle I want to draw it in. So for example, this lower rectangle, this George-as-a-fat-kid rectangle, has its coordinate map.0:44:51
And if I want to draw this image, what I do is for each segment here, say for this segment, I transform that0:45:01
point by the coordinate map, transform that point by the coordinate map. That will give me this point and that point, and draw the segment between them.0:45:10
Right, that's the idea. Right, and if I give it a different rectangle like this one, that's a different coordinate map, so I get a different image of those line segments.0:45:19
Well how do we actually get a picture to start with? I can build a picture to start with out of a list of line segments initially. Here's a procedure that builds what I'll call a primitive0:45:31
picture, meaning one I, sort of, got that didn't come out of Beside or Rotate or something. It starts with a list of line segments, and now0:45:43
it does what I said. What's a picture have to be? First of all it's a procedure that's defined on rectangles.0:45:52
What does it do? It says for each-- this is going to be a list of line segments-- for each s, which is a segment in this0:46:02
list of segments, well it draws a line. What line does it draw? It gets the start point of that segment, transforms that0:46:16
by the coordinate map of the rectangle. That's the first new point it wants. Then it takes the endpoint of the segment, transforms that by the coordinate map of the rectangle, and then draws a0:46:27
line between. Let's assume drawline is some primitive that's built into the system that actually draws a line on the display. All right, so it transforms the endpoints by the coordinate map of the rectangle, draws a line0:46:37
between them, does that for each s in this list of segments.0:46:46
And now remember again, a picture is a procedure that takes a rectangle as argument. So when you hand it a rectangle, this is what it does: draws those lines.0:46:57
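A Python sketch of make-picture, using the same tuple representation as before; since there is no real drawline primitive here, this version returns the transformed segments, which a display routine could then draw:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def coord_map(rect):
    o, h, v = rect
    return lambda p: add_vect(o, add_vect(scale_vect(p[0], h),
                                          scale_vect(p[1], v)))

def make_picture(segments):
    """A picture is a procedure on rectangles: hand it a rectangle, and
    for each segment it transforms both endpoints by the rectangle's
    coordinate map. Here it collects the lines rather than drawing."""
    def the_picture(rect):
        m = coord_map(rect)
        return [(m(start), m(end)) for (start, end) in segments]
    return the_picture
```

So g = make_picture([((0, 0), (1, 1))]) is a picture: calling g(rect) yields that diagonal scaled into rect.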
All right, so there's-- how would I actually use this thing? Let's make it a little bit more concrete. Right, I would say for instance, define R to be0:47:21
make-rectangle of some stuff, and I'd have to specify some vectors here using make-vector.0:47:30
And then I could say, define say, G to be make-picture, and0:47:45
then some stuff. And what I'd have to specify here is a list of line segments, right, using make-segment.0:47:55
Make-segment might be made out of vectors, and vectors might be made out of points. And then if I actually wanted to see the image of G inside a rectangle, well a picture is a procedure that takes a0:48:10
rectangle as argument. So if I then called G with an input of R, that would cause whatever image G is worrying about to be drawn inside the0:48:22
rectangle R. Right, so that's how you'd use that. [MUSIC PLAYING]0:49:08
PROFESSOR: Well why is it that I say this example is nice? You probably don't think it's nice. You probably think it's more weird than nice. Right, representing these pictures as procedures, which0:49:18
do complicated things with rectangles. So why is it nice? The reason it's nice is that once you've implemented the0:49:29
primitives in this way, the means of combination just fall out by implementing procedures. Let me show you what I mean. Suppose we want to implement Beside.0:49:41
So I'd like to-- suppose I've got a picture. Let's call it P1. P1 is going to be-- and now remember what a picture really is.0:49:50
It's a thing that if you can hand it some rectangle, it will cause an image to be drawn in whatever rectangle0:50:00
you hand it. And suppose P2 is some other picture, and you hand that a rectangle.0:50:09
And whatever rectangle you hand it, it draws some picture. And now if I'd like to implement Beside of P1 and P20:50:25
with a scale factor A, well what does that have to be? That's got to be a picture. It's got to be a thing that you hand it a rectangle, and it draws something in that rectangle.0:50:34
So if I hand Beside this rectangle-- let's hand it a rectangle. Well what's it going to do? It's going to take this rectangle and split it into0:50:45
two at a ratio of A and one minus A. And it will say, oh sure, now I've got two rectangles.0:51:02
And now it goes off to P1 and says P1, well draw yourself in this rectangle, and goes off to P2, and says, P2, fine, draw yourself in this rectangle.0:51:13
The only computation it has to do is figure out what these rectangles are. Remember a rectangle is specified by an origin and a horizontal vector and a vertical vector, so it's got0:51:24
to figure out what these things are. So for this first rectangle, the origin turns out to be the origin of the original rectangle, and the vertical0:51:34
vector is the same as the vertical vector of the original rectangle. The horizontal vector is the horizontal vector of the0:51:43
original rectangle scaled by A. And that's the first rectangle. The second rectangle, the origin is the original origin0:51:55
plus that horizontal vector scaled by A. The horizontal vector of the second rectangle is the rest of the horizontal0:52:05
vector of the first one, which is 1 minus A times the original H, and the vertical vector is still v. But0:52:15
basically it goes and constructs these two rectangles, and the important point is having constructed the rectangles, it says OK, p1, you draw yourself in there, and p2, you draw yourself in there, and that's0:52:25
all Beside has to do. All right, let's look at that piece of code.0:52:34
Beside of a picture and another picture with some0:52:45
scaling ratio is first of all, since it's a picture, a procedure that's going to take a rectangle as argument.0:52:55
What's it going to do? It says, p1 draw yourself in some rectangle and p2 draw yourself in some other rectangle. And now what are those rectangles?0:53:04
Well here's the computation. It makes a rectangle, and this is the algebra I just did on the board: the origin, something; the horizontal vector, something; and the vertical vector, something.0:53:13
And for p2, the rectangle it wants has some other origin and horizontal vector and vertical vector. But the important point is that all it's saying is, p1,0:53:23
go do your thing in one rectangle, and p2, go do your thing in another rectangle. That's all the Beside has to do. OK, similarly Rotate--0:53:37
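That Beside computation can be sketched in Python (tuple vectors and (origin, horiz, vert) rectangles as before; in this sketch a picture returns the segments it would draw, so Beside can simply concatenate the two results):

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def beside(p1, p2, a):
    """Split the rectangle at ratio a and tell each picture to draw
    itself in its own piece: that's all Beside has to do."""
    def the_picture(rect):
        o, h, v = rect
        left  = (o, scale_vect(a, h), v)            # origin o, horiz a*h, vert v
        right = (add_vect(o, scale_vect(a, h)),     # origin o + a*h
                 scale_vect(1 - a, h), v)           # horiz (1-a)*h, vert v
        return p1(left) + p2(right)
    return the_picture
```

Notice that beside never asks what p1 and p2 are; it only hands each one a rectangle.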
see if I have this picture A, and I want to look at say rotating A by 90 degrees, what that should mean is, well take0:53:51
this rectangle, which is origin and horizontal vector and vertical vector, and now pretend that it's really the0:54:01
rectangle that looks like this, which has an origin and a horizontal vector up here, and a vertical vector there, and now draw yourself with respect to that rectangle.0:54:13
Let me show you that as a procedure. All right, so Rotate-90 of a picture is, again, a procedure on a rectangle, which says, OK picture, draw0:54:24
yourself in some rectangle; and then this algebra is the transformation on the rectangle. It's the one which makes it look like the rectangle is0:54:33
sideways: the origin is someplace else, the horizontal vector is someplace else, and the vertical vector is someplace else. OK?0:54:43
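Rotate-90 in the same Python sketch: it hands the picture a tilted rectangle whose origin is the far corner of the original, so drawing with respect to it comes out rotated. The particular vector algebra here is my reading of the description above, so treat it as an assumption to check:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def sub_vect(a, b): return (a[0] - b[0], a[1] - b[1])

def rotate90(p):
    """Pretend the rectangle is the rotated one: the new origin is
    origin + horiz, the new horiz is the old vert, and the new vert
    is the old horiz negated."""
    def the_picture(rect):
        o, h, v = rect
        return p((add_vect(o, h), v, sub_vect((0, 0), h)))
    return the_picture
```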
OK. OK, again notice, the crucial thing that's going on here is you're using the representation of pictures as0:54:57
procedures to automatically get the closure property, because what happens is, Beside just has this thing p1. Beside doesn't care if that's a primitive picture or it's0:55:08
line segments or if p1 is, itself, the result of doing Aboves or Besides or Rotates. All Beside has to know about, say, p1 is that if you hand p10:55:17
a rectangle, it will cause something to be drawn. And above that level, Beside just doesn't-- it's none of its business how p1 accomplishes that drawing.0:55:27
All right, so you're using the procedural representation to ensure this closure. OK. So implementing pictures as procedures makes these means0:55:40
of combination, you know, both pretty simple and also, I think, elegant. But that's not the real punchline.0:55:49
The real punchline comes when you look at the means of abstraction in this language. Because what have we done? We've implemented the means of combination themselves as0:56:02
procedures. And what that means is that when we go to abstract in this language, everything that Lisp supplies us for manipulating0:56:14
procedures is automatically available to do things in this picture language. The technical term I want to say is not only is this0:56:25
language implemented in Lisp, obviously it is, but the language is nicely embedded in Lisp. What I mean is, by0:56:39
embedding the language in this way, all the power of Lisp is automatically available as an extension to whatever you want to do.0:56:49
And what do I mean by that? Example: say, suppose I want to make a thing that takes four pictures A, B, C and D, and makes a configuration that0:57:06
looks like this. Well you might call that, you know, four pictures or something, four-pict configuration.0:57:17
How do I do that? Well I can obviously do that. I just write a procedure that takes B above D and A above C0:57:26
and puts those things beside each other. So I automatically have Lisp's ability to do procedure composition. And I didn't have to make that specifically in the picture language.0:57:35
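A sketch of that four-picture combinator in Python, on top of Beside and an analogous Above with the same conventions as before (here above puts its first argument in the lower portion; the exact convention is an assumption). The point is just procedure composition:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def beside(p1, p2, a):
    def pic(rect):
        o, h, v = rect
        return (p1((o, scale_vect(a, h), v)) +
                p2((add_vect(o, scale_vect(a, h)), scale_vect(1 - a, h), v)))
    return pic

def above(p1, p2, a):
    # Same trick as beside, splitting the rectangle vertically instead.
    def pic(rect):
        o, h, v = rect
        return (p1((o, h, scale_vect(a, v))) +
                p2((add_vect(o, scale_vect(a, v)), h, scale_vect(1 - a, v))))
    return pic

def four_pict(a, b, c, d):
    """B above D beside A above C, each pair splitting its half evenly."""
    return beside(above(b, d, 0.5), above(a, c, 0.5), 0.5)
```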
It's automatic from the fact that the means of combination are themselves procedures. Or suppose I wanted to do something a little bit more complicated.0:57:44
I wanted to put in a parameter so that for each of these, I could independently specify a rotation by 90 degrees. That's just putting a parameter in the procedure.0:57:53
It's automatically there. Right, it automatically comes from the embedding. Or even more, suppose I wanted to, you know, use recursion.0:58:04
Let's look at a recursive means of combination on pictures. I could say define-- let's see if you can figure out what this one is-- suppose0:58:14
I say define what it means to right-push a picture, right-push a picture and some integer N and some scale0:58:28
factor A. I'll define this to say if N equals 0, then the0:58:40
answer is the picture. Otherwise I'm going to put--0:58:49
oops, name change: P. Otherwise, I'm going to take P0:58:59
and put it beside the results of recursively right-pushing P0:59:09
with N minus 1 and A and use a scale factor of A. OK, so if0:59:25
N is 0, it's P. Otherwise I put P, with a scale factor of A-- I'm sorry I didn't align this right-- recursively beside the result of right-pushing P, N minus 10:59:37
times with a scale factor of A. There's a recursive means of combination. What's that look like? Well, here's what it looks like.0:59:46
There's George right-pushed against himself twice with a scale factor of 0.75.0:59:59
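That recursive right-push, sketched in Python on top of the same Beside as above:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def beside(p1, p2, a):
    def pic(rect):
        o, h, v = rect
        return (p1((o, scale_vect(a, h), v)) +
                p2((add_vect(o, scale_vect(a, h)), scale_vect(1 - a, h), v)))
    return pic

def right_push(p, n, a):
    """If n is 0 it's just p; otherwise p beside the result of
    right-pushing p n-1 times, with scale factor a."""
    if n == 0:
        return p
    return beside(p, right_push(p, n - 1, a), a)
```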
OK. Where'd that come from? How did I get all this fancy recursion? And the answer is just automatic, absolutely automatic. Since these are procedures, the embedding says, well sure,1:00:08
I can define recursive procedures. I didn't have to arrange that. And of course, we can do more complicated things of the same sort. I could make something that does an up-push.1:00:18
Right, that sort of goes like this, by recursively putting something above. Or I could make something that, sort of, was this scheme. I might start out with a picture and then, sort of,1:00:33
recursively both push it aside and above, and that might put something there. And then up here I put the same recursive thing, and I1:00:42
might end up with something like this. Right, so there's a procedure that's a little bit more complicated than right-push but not much.1:00:53
I just do an Above and a Beside, rather than just a Beside. Now if I take that and apply that with the idea of putting1:01:05
four pictures together, which I can surely do; and I go and I apply that to Q, which we defined before, right, what I1:01:16
end up with is this thing, which is, sort of, the square limit of Q, done twice.1:01:27
Right, and then we can compare that with Escher's "Square Limit." And you see, it's sort of the same idea. Escher's is, of course, much, much prettier.1:01:37
If we go back and look at George, right, if we go look at George here-- see, I started with a fairly arbitrary design, this picture1:01:47
of George and did things with it. Right, whereas if we go look at the Escher picture, right, the Escher picture is not an arbitrary design.1:01:56
It's this very, very clever thing, so that when you take this fish body and Rotate it and shrink it down, it bleeds into the next one really nicely.1:02:07
And of course with George, I didn't really do anything like that. So if we look at George, right, there's a little bit of1:02:16
match up, but not very nice, and it's pretty arbitrary. One very nice project, by the way, would be to write a procedure that could take some basic figure like this George1:02:27
thing and start moving the ends of the lines around, so you got a really nice one when you went and did that "Square Limit" process. That'd be a really nice thing to think about.1:02:38
Well so, we can combine things. We can write recursive procedures. We can do all kinds of things, and that's all automatic. Right, the important point, the difference between merely1:02:47
implementing something in a language and embedding something in the language, is that you don't lose the original power of the language, and that's what Lisp is great at. See, Lisp is a lousy language for doing any1:02:56
particular problem. What it's good for is figuring out the right language that you want and embedding that in Lisp. That's the real power of this approach to design.1:03:05
Of course, we can go further. See, you saw the other thing that we can do in Lisp is capture general methods of doing things as higher order1:03:16
procedures. And you probably just from me drawing it got the idea that right-push and the analogous thing where you push something1:03:25
up and up and up and up and this corner push thing are all generalizations of a common kind of idea.1:03:34
So just to illustrate and give you practice in looking at a fairly convoluted use of higher order procedures, let me show you the general idea of pushing some means of1:03:45
combination to recursively repeat it. So here's a good one to puzzle out. We'll define what it means to push using a means of1:03:59
combination. Comb is going to be something like the Beside or the Above. Well, what's that going to be? That's going to be a procedure; remember what1:04:10
Beside actually was, right. It took two pictures and a scale factor. Using that, I produced something that took a level1:04:21
number and a picture and a scale factor, that I called right-push. So this is going to be something that takes a picture, a level number and a scale factor, and1:04:32
it's going to say-- I'm going to do some repeated operation. I'm going to repeatedly apply the procedure which takes a1:04:46
picture and applies the means of combination to the picture and the original picture and the one I took in here and the1:04:58
scale factor, and I do the thing which repeats this procedure N times, and I apply that whole thing to my1:05:15
original picture. Repeated here, in case you haven't seen it, is another higher order procedure that takes a procedure and a number1:05:29
and returns for you another procedure that applies this procedure N times. And I think some of you have already written repeated as an1:05:38
exercise, but if you haven't, it's a very good exercise in thinking about higher order procedures. But in any case, the result of this repeated is what I apply to picture.1:05:49
And having done that, that's going to capture-- that is the thing, the way I got from the idea of Beside to the idea of right-push. So having done that, I could say1:06:00
define right-push to be push of Beside.1:06:17
Or if I say, define up-push to be push of Above, I'd get the analogous thing, or define corner-push to be push of some appropriate thing that did both the Beside and the Above, or I could push anything.1:06:28
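Here's a sketch of push and repeated in Python, again on top of the Beside convention used above; the lambda captures the picture, the combinator, and the scale factor just as in the blackboard version:

```python
def add_vect(a, b): return (a[0] + b[0], a[1] + b[1])
def scale_vect(s, v): return (s * v[0], s * v[1])

def beside(p1, p2, a):
    def pic(rect):
        o, h, v = rect
        return (p1((o, scale_vect(a, h), v)) +
                p2((add_vect(o, scale_vect(a, h)), scale_vect(1 - a, h), v)))
    return pic

def repeated(f, n):
    """Higher-order procedure: returns the procedure that applies f
    n times (zero times is the identity)."""
    def g(x):
        for _ in range(n):
            x = f(x)
        return x
    return g

def push(comb):
    """Generalize right-push: given a means of combination like beside,
    return the thing that takes a picture, a level count, and a scale."""
    def pusher(pic, n, a):
        return repeated(lambda p: comb(pic, p, a), n)(pic)
    return pusher

right_push = push(beside)
```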
Anyway this is, if you're having trouble with lambdas, this is an excellent exercise in figuring out what this means. OK, well there's a lot to learn from this example.1:06:42
The main point I've been dwelling on is the notion of nicely embedding a language inside another language. Right, so that all the power of the surrounding language, like Lisp,1:06:54
is still accessible to you and appears as a natural extension of the language that you built. That's one thing that this example shows very well.1:07:06
OK. Another thing is, if you go back and think about that, what's procedures and what's data. You know, by the time we get up to here, my God,1:07:15
what's going on. I mean, this is some procedure, and it takes a picture as an argument, and what's a picture? Well, a picture itself, as you remember, was a procedure, and that took a rectangle. And a rectangle is some abstraction.1:07:26
And I hope that by now you're completely lost as to the question of what in the system is procedure and what's data. You see, there isn't any difference.1:07:35
There really isn't. And you might think of a picture sometimes as a procedure and sometimes as data, but that's just, sort of, you know, making you feel comfortable.1:07:44
It's really both in some sense or neither in some sense. OK, there's a more general point about the structure of1:07:56
the system as creating a language, viewing the engineering design process as one of creating language or1:08:08
rather one of creating a sort of sequence of layers of language. You see, there's this methodology, or maybe I should1:08:18
say mythology, that's, sort of, charitably called software, quote, engineering. All right, and what does it say, it's says well, you go1:08:27
and you figure out your task, and you figure out exactly what you want to do. And once you figure out exactly what you want to do, you find out that it breaks out into three sub-tasks, and you go and you start working on-- and you work on this1:08:36
sub-task, and you figure out exactly what that is. And you find out that that breaks down into three sub-tasks, and you specify them completely, and you go and you work on those two, and you work on this sub-one, and1:08:45
you specify that exactly. And then finally when you're done, you come back way up here, and you work on your second sub-task, and specify that out and work it out. And then you end up with--1:08:55
you end up at the end with this beautiful edifice. Right, you end up with a marvelous tree, where you've broken your task into sub-tasks and broken each of1:09:05
these into sub-tasks and broken those into sub-tasks, right. And each of these nodes is exactly and precisely defined1:09:15
to do the wonderful, beautiful task to make it fit into the whole edifice, right. That's this mythology. See only a computer scientist could possibly believe that you build a complex system like that, right.1:09:28
Contrast that with this Henderson example. It didn't work like that. What happened was that there was a sequence1:09:37
of layers of language. What happened? There was a layer of a thing that allowed us to build1:09:47
primitive pictures. There's primitive pictures and that was a language.1:09:56
I didn't say much about it. We talked about how to construct George, but that was a language where you talked about vectors and line segments and points and where they sat in the unit square.1:10:06
And then on top of that, right, on top of that-- so this is the language of primitive pictures.1:10:17
Right, talking about line segments in particular pictures in the unit square. On top of that was a whole language. There was a language of geometric combinators, a1:10:33
language of geometric positions, which talks about things like Above and Beside and right-push and Rotate.1:10:48
And those things, sort of, happened with reference to the things that are talked about in this language.1:10:58
And then if we like, we saw that above that there was sort of a language of schemes of combination.1:11:21
For example, push, which talked about repeatedly doing something over with a scale factor. And the things that were being discussed in that language1:11:31
were, sort of, the things that happened down here. So what you have is, at each level, the objects that are1:11:41
being talked about are the things that were erected at the previous level. What's the difference between this thing and this thing?1:11:53
The answer is that over here in the tree, each node, and in fact, each decomposition down here, is being designed to do1:12:03
a specific task, whereas in the other scheme, what you have is a full range of linguistic1:12:13
power at each level. See what's happening there, at any level, it's not being set up to do a particular task.1:12:23
It's being set up to talk about a whole range of things. The consequence of that for design is that something that's designed in that method is likely to be more robust,1:12:36
where by robust, I mean that if you go and make some change in your description, it's more likely to be captured by a1:12:46
corresponding change, in the way that the language is implemented at the next level up, right, because you've made1:12:55
these levels full. So you're not talking about a particular thing like Beside. You've given yourself a whole vocabulary to express things of that sort, so if you go and change your specifications a1:13:06
little bit, it's more likely that your methodology will be able to adapt to capture that change, whereas a design like this is not going to be robust, because if I go and1:13:15
change something that's in here, that might affect the entire way that I decomposed everything down, further down the tree. Right, so very big difference in outlook in decomposition,1:13:26
levels of language rather than, sort of, a strict hierarchy. Not only that, but when you have levels of language, you've given yourself different vocabularies for talking about1:13:37
the design at different levels. So if we go back and look at George one last time, if I wanted to change this picture George, see suddenly I have a1:13:46
whole range of different ways of describing the change. Like for example, I may want to go to the basic primitive design and move the endpoint of some vector.1:13:57
That's a change that I would discuss at the lowest level. I would say the endpoint is somewhere else. Or I might come up and say, well the next thing I wanted to do, this little replicated element, I might want to do by1:14:10
something else. I might want to put a scale factor in that Beside. That's a change that I would discuss at the next level of design, the level of combinators.1:14:19
Or I might want to say, I might want to change the basic way that I took this pattern and made some recursive decomposition, maybe not bleeding out toward the1:14:29
corners or something else. That would be a change that I would discuss at the highest level. And because I've structured the system to be this way, I have all these vocabularies for talking about change in1:14:39
different ways and a lot of flexibility to decide which one's appropriate. OK, well that's sort of a big point about the difference in1:14:48
software methodology that comes out of Lisp, and it all comes, again, out of the notion that really, the design process is not so much implementing programs as1:14:58
implementing languages. And that's really the power of Lisp. OK, thank you. Let's take a break.0:00:00
Lecture 3B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:19
PROFESSOR: Well, Hal just told us how you build robust systems. The key idea was-- I'm sure that many of you haven't really assimilated that0:00:30
yet-- but the key idea is that in order to make a system that's robust, it has to be insensitive to small changes; that is, a small change in the problem should lead to only a0:00:39
small change in the solution. There ought to be a continuity. The space of solutions ought to be continuous in this space of problems. The way he was explaining how to do that was, instead of0:00:50
solving a particular problem, at every level of decomposition of the problem into subproblems, you solve the class of problems which are a neighborhood of the particular problem that you're trying to solve.0:01:01
The way you do that is by producing a language at that level of detail in which the solutions to that class of problems is representable in that language.0:01:11
Therefore when you make small changes to the problem you're trying to solve, you generally have to make only small local changes to the solution you've constructed, because at the0:01:20
level of detail you're working, there's a language where you can express the various solutions to alternate problems of the same type.0:01:30
Well that's the beginning of a very important idea, perhaps the most important idea that makes computer science more powerful than most of the other kinds of engineering0:01:40
disciplines we know about. What we've seen so far is sort of how to use embedding of languages.0:01:49
And, of course, the power of embedding languages partly comes from procedures like this one that I showed you yesterday. What you see here is the derivative program that we0:02:01
described yesterday. It's a procedure that takes a procedure as an argument and returns a procedure as a value. And using such things is very nice.0:02:12
You can make things like push combinators and all that sort of wonderful thing that you saw last time. However, now I'm going to really muddy the waters.0:02:21
See this confuses the issue of what's the procedure and what is data, but not very badly. What we really want to do is confuse it very badly.0:02:31
And the best way to do that is to get involved with the manipulation of the algebraic expressions that the procedures themselves are expressed in. So at this point, I want to talk about instead of things0:02:43
like on this slide, the derivative procedure being a thing that manipulates a procedure-- this is a numerical method you see here. And what you're seeing is a representation of the0:02:56
numerical approximation to the derivative. That's what's here. In fact what I'd like to talk about is instead things that look like this.0:03:06
And what we have here are rules from a calculus book. These are rules for finding the derivatives of the0:03:15
expressions that one might write in some algebraic language. It says things like a derivative of a constant is 0.0:03:24
The derivative of the variable with respect to which you are taking the derivative is 1. The derivative of a constant times the function is the constant times the derivative of the function,0:03:34
and things like that. These are exact expressions. These are not numerical approximations.0:03:43
Can we make programs? And, in fact, it's very easy to make programs that manipulate these expressions.0:03:56
Well let's see. Let's look at these rules in some detail. You all have seen these rules in your elementary calculus class at one time or another.0:04:06
And you know from calculus that it's easy to produce derivatives of arbitrary expressions. You also know from your elementary calculus that it's hard to produce integrals.0:04:17
Yet integrals and derivatives are opposites of each other. They're inverse operations. And they have the same rules. What is special about these rules that makes it possible0:04:29
for one to produce derivatives easily, while integrals are so hard to produce? Let's think about that very simply. Look at these rules.0:04:39
Every one of these rules, when used in the direction for taking derivatives, which is in the direction of this arrow, the left side is matched against your0:04:48
expression, and the right side is the thing which is the derivative of that expression. The arrow is going that way.0:04:58
In each of these rules, the expressions on the right-hand side of the rule that are contained within derivatives are subexpressions, are proper subexpressions, of the0:05:08
expression on the left-hand side. So here we see that the derivative of the sum, which is the expression on the left-hand side, is the sum of the0:05:17
derivatives of the pieces. So the rules, moving to the right, are reduction rules. The problem becomes easier.0:05:28
I turn a big complicated problem into lots of smaller problems and then combine the results, a perfect place for recursion to work. If I'm going in the other direction like this, if I'm0:05:42
trying to produce integrals, well there are several problems you see here. First of all, if I try to integrate an expression like a sum, more than one rule matches. Here's one that matches.0:05:52
Here's one that matches. I don't know which one to take. And they may be different. I may get to explore different things. Also, the expressions become larger in that direction.0:06:04
And when the expressions become larger, then there's no guarantee that any particular path I choose will terminate, because we will only terminate by accidental cancellation.0:06:14
So that's why integrals are complicated searches and hard to do. Right now I don't want to do anything as hard as that. Let's work on derivatives for a while.0:06:24
Well, these rules are ones you know, for the most part, hopefully. So let's see if we can write a program which is these rules. And that should be very easy.0:06:34
Just write the program. See, because what I showed you is that each is a reduction rule, it's something appropriate for a recursion.0:06:43
And, of course, what we have for each of these rules is we have a case in some case analysis. So I'm just going to write this program down.0:06:53
Now, of course, I'm going to be saying something you have to believe. Right? What you have to believe is I can represent these algebraic expressions, that I can grab their parts, that I can put0:07:03
them together. We've invented list structures so that you can do that. But you don't want to worry about that now. Right now I'm going to write the program that encapsulates these rules independent of the representation of the0:07:14
algebraic expressions. You have a derivative of an expression with0:07:27
respect to a variable. This is a different thing than the derivative of a function. That's what we saw last time, that numerical approximation.0:07:39
You can't open up a function; it's just the answers. The derivative of an expression is about the way it's written. And therefore it's a syntactic phenomenon.0:07:48
And so a lot of what we're going to be doing today is worrying about syntax, syntax of expressions and things like that. Well, there's a case analysis.0:07:57
Anytime we do anything complicated, like a recursion, we presumably need a case analysis. It's the essential way to begin. And that's usually a conditional0:08:06
of some large kind. Well, what are the possibilities? The first rule that you saw is: is this thing a constant?0:08:16
And what I'm asking is, is the expression a constant with respect to the variable given? If so, the result is 0, because the derivative0:08:28
represents the rate of change of something. If, however, the expression that I'm taking the derivative0:08:38
of is the variable I'm varying with respect to-- the expression is the same as var-- then the rate of change of the0:08:52
expression with respect to the variable is 1. It's the same one. Well, now there are a couple of other possibilities.0:09:01
It could, for example, be a sum. Well, I don't know how I'm going to express sums yet. Actually I do. But I haven't told you yet.0:09:10
But is it a sum? I'm imagining that there's some way of telling. I'm doing a dispatch on the type of the expression here,0:09:20
absolutely essential in building languages. Languages are made out of different expressions. And soon we're going to see that in our more powerful methods of building languages on languages.0:09:32
Is an expression a sum? If it's a sum, well, we know the rule for derivative of the sum is the sum of the derivatives of the parts.0:09:42
One of them is called the addend and the other is the augend. But I don't have enough space on the blackboard for such long names. So I'll call them A1 and A2. I want to make a sum.0:09:53
Do you remember which is which-- the addend or the augend? Or was it the dividend and the divisor or something like that? Make a sum of the derivative of the A1, I'll call it,0:10:08
the addend of the expression, with respect to the variable, and the derivative of the A2 of the expression--0:10:23
those being the two arguments of the addition-- with respect to the variable.0:10:32
And another rule that we know is the product rule, which applies if the expression is a product.0:10:43
By the way, it's a good idea when you're defining things, when you're defining predicates, to give them a name that ends in a question mark. This question mark doesn't mean anything.0:10:53
It's for us as an agreement. It's a conventional interface between humans so you can read my programs more easily. So I want you to, when you write programs, if you define0:11:02
a predicate procedure, that's something that returns true or false, it should have a name which ends in a question mark. Lisp doesn't care. I care.0:11:11
I want to make a sum. Because the derivative of a product is the sum of the first times the derivative of the second plus the second times the derivative of the first. Make a sum of two0:11:26
things, a product of, well, I'm going to say the M1 of the0:11:37
expression, and the derivative of the M2 of the expression0:11:47
with respect to the variable, and the product of the0:12:01
derivative of M1, the multiplier of the expression,0:12:10
with respect to the variable. It's the product of that and the multiplicand, M2, of the expression.0:12:21
Make that product. Make the sum. Close that case. And, of course, I could add as many cases as I like here for a complete set of rules you might find in a calculus book.0:12:34
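Collected from the blackboard, the dictation above comes out as one short recursive procedure. This is only a sketch of what the professor writes; the representation procedures it names (constant?, same-var?, sum?, product?, make-sum, make-product, a1, a2, m1, m2) are the ones he defines later in the lecture:

```scheme
;; The derivative rules as a case analysis, one clause per rule.
;; The representation procedures are defined later in the lecture.
(define (deriv exp var)
  (cond ((constant? exp var) 0)
        ((same-var? exp var) 1)
        ((sum? exp)
         (make-sum (deriv (a1 exp) var)
                   (deriv (a2 exp) var)))
        ((product? exp)
         (make-sum
          (make-product (m1 exp)
                        (deriv (m2 exp) var))
          (make-product (deriv (m1 exp) var)
                        (m2 exp))))))
```

Each clause is exactly one reduction rule from the calculus book, which is why the recursion terminates: every recursive call is on a proper subexpression.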
So this is what it takes to encapsulate those rules. And you see, you have to realize there's a lot of0:12:43
wishful thinking here. I haven't told you anything about how I'm going to make these representations. Now, once I've decided that this is my set of rules, I0:12:52
think it's time to play with the representation. Let's attack that. Well, first of all, I'm going to play a pun.0:13:01
It's an important pun. It's a key to a sort of powerful idea. If I want to represent sums, and products, and differences,0:13:12
and quotients, and things like that, why not use the same language as I'm writing my program in? I write my program in algebraic expressions that0:13:23
look like the sum of the product of a and the product of x and x, and things like that,0:13:34
and the product of b and x, and c-- whatever-- make that a sum of the products. Right now I don't want to have procedures with unknown0:13:43
numbers of arguments, so I write these combinations two arguments at a time. This is list structure.0:13:54
And the reason why this is nice, is because any one of these objects has a property. I know where the car is. The car is the operator.0:14:04
And the operands are the successive cars of the cdrs of the list that this is. It makes it very convenient.0:14:14
I don't have to parse it. It's been done for me. I'm using the embedding in Lisp to advantage. So, for example, let's start using list structure to write0:14:29
down the representation that I'm implicitly assuming here. Well I have to define various things that are implied in this representation.0:14:38
Like, I have to define how you test for a constant and how you test for the same variable. Let's do those first. That's easy enough. Now I'm going to be introducing lots of primitives0:14:47
here, because these are the primitives that come with list structure. OK, you define a constant.0:15:02
And what I mean by a constant-- an expression that's constant with respect to a variable-- is that the expression is something simple.0:15:11
I can't break it up into pieces, and yet it isn't that variable. That does not mean that there may not be other expressions that0:15:22
are more complicated that are constants. It's just that I'm going to look at the primitive constants in this way. So what this is, is it says that it's the and--0:15:34
I can combine predicate expressions which return true or false with and-- of: the expression is atomic, meaning0:15:45
it cannot be broken into parts-- it doesn't have a car and a cdr, it's not a list; atom is a special test built into the system--0:15:54
and it's not identically equal to that variable.0:16:06
I'm representing my variable by things that are symbols which cannot be broken into pieces, things like x, and y,0:16:16
things like this. Whereas, of course, something like this can be broken up into pieces. And the same variable of an expression with respect to a0:16:40
variable is, in fact, an atomic expression. I want to have an atomic0:16:50
expression which is identically equal to that variable.0:17:08
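Written out, the two predicates just dictated look roughly like this-- a sketch, using the built-in atom and identity tests mentioned above:

```scheme
;; An expression is a constant with respect to var if it is
;; atomic (has no car and cdr) and is not the variable itself.
(define (constant? exp var)
  (and (atom? exp)
       (not (eq? exp var))))

;; ...and it is "the same variable" if it is atomic and
;; is identically that variable.
(define (same-var? exp var)
  (and (atom? exp)
       (eq? exp var)))
```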
I don't want to look inside this stuff anymore. These are primitive maybe. But it doesn't matter.0:17:18
I'm using things that are given to me with the language. I'm not terribly interested in them. Now how do we deal with sums? Ah, something very interesting will happen.0:17:29
A sum is something which is not atomic and begins with the plus symbol. That's what it means. So here, I will define.0:17:45
An expression is a sum if it's not atomic and its head,0:18:08
its beginning, the car of the expression, is the symbol plus.0:18:19
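As code, the test just described is roughly:

```scheme
;; A sum is a non-atomic expression whose car is the symbol +.
;; The quote on + is the subject of the discussion that follows.
(define (sum? exp)
  (and (not (atom? exp))
       (eq? (car exp) '+)))
```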
Now you're about to see something you haven't seen before, this quotation. Why do I have that quotation there?0:18:29
Say your name, AUDIENCE: Susanna. PROFESSOR: Louder. AUDIENCE: Susanna PROFESSOR: Say your name. AUDIENCE: Your name. PROFESSOR: Louder. AUDIENCE: Your name. PROFESSOR: OK.0:18:39
What I'm showing you here is that the words of English are ambiguous. I was saying, say your name.0:18:52
I was also possibly saying say, your name. But that cannot be distinguished in speech.0:19:04
However, we do have a notation in writing, which is quotation for distinguishing these two possible meanings.0:19:14
In particular, over here in Lisp, we have a notation for distinguishing these meanings. If I were to just write a plus here, a plus symbol, I would0:19:24
be asking, is the first element of the expression, is the operator position of the expression, the addition operator?0:19:34
I don't know. I would have to have written the addition operator there, which I can't write. However, this way I'm asking, is this the symbolic object0:19:45
plus, which normally stands for the addition operator? That's what I want. That's the question I want to ask. Now before I go any further, I want to point out the0:19:55
quotation is a very complex concept, and adding it to a language causes a great deal of trouble. Consider the next slide.0:20:06
Here's a deduction which we should all agree with. We have, Alyssa is smart and Alyssa is George's mother.0:20:17
This is an equality, is. From those two, we can deduce that George's mother is smart.0:20:27
Because we can always substitute equals for equals in expressions. Or can we?0:20:36
Here's a case where we have "Chicago" has seven letters. The quotation means that I'm discussing the word Chicago,0:20:45
not what the word represents. Here I have that Chicago is the biggest city in Illinois.0:20:54
As a consequence of this, I would like to deduce that the biggest city in Illinois has seven letters. But that's manifestly false.0:21:05
So it doesn't work. OK, so once we have things like that, our language gets much more complicated.0:21:14
Because it's no longer true that things we tend to like to do with languages, like substituting equals for equals and getting right answers, are going to work without being very careful.0:21:24
We can't substitute into what's called referentially opaque contexts, of which a quotation is the prototypical type of referentially opaque context.0:21:33
If you know what that means, you can consult a philosopher. Presumably there is one in the room. In any case, let's continue now, now that we at least have0:21:42
an operational understanding of a 2000-year-old issue that has to do with name, and mention, and all sorts of things like that.0:21:52
I have to define what I mean, how to make a sum of two things, an a1 and a2.0:22:02
And I'm going to do this very simply. It's a list of the symbol plus, and a1, and a2.0:22:13
And I can determine the first element. Define a1 to be cadr. I've just0:22:34
introduced another primitive. This is the car of the cdr of something. You might want to know why car and cdr are names of these0:22:43
primitives, and why they've survived, even though they're much better ideas like left and right. We could have called them things like that. Well, first of all, the names come from the fact that in the0:22:54
great past, when Lisp was invented-- I suppose in '58 or something-- it was on a 704 or something like that, a machine that had an address register and a0:23:04
decrement register. And these were the contents of the address register and the decrement register. So it's an historical accident. Now why have these names survived? It's because Lisp programmers like to talk to each other0:23:14
over the phone. And if you want to have a long sequence of cars and cdrs you might say, cadaddr, which can be understood. But left of right of right of left is not so clear if you0:23:26
get good at it. So that's why we have these words. All of them up to four deep are defined typically in a Lisp system.0:23:38
Define A2 to be caddr. And, of course, you can see that if I look at one of these expressions, like the sum of 3 and 5, what that is is a0:23:54
list containing the symbol plus, and a number 3,0:24:06
and a number 5. Then the car is the symbol plus.0:24:16
The car of the cdr. Well I take the cdr and then I take the car. And that's how I get to the 3. That's the first argument. And the car of the cdr of the cdr gets me to this one, the 5.0:24:28
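Pulling the sum representation together as dictated:

```scheme
;; Constructor: a sum is a list of the symbol + and the two addends.
(define (make-sum a1 a2)
  (list '+ a1 a2))

;; Selectors: in (+ 3 5) the car is the symbol +,
;; the cadr is 3, and the caddr is 5.
(define a1 cadr)
(define a2 caddr)
```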
And similarly, of course, I can define what's going on with products. Let's do that very quickly.0:24:48
Is the expression a product? Yes, if and only if it's not atomic and0:25:01
its car is eq to quote, the asterisk symbol, which is the operator0:25:13
for multiplication. Make-product of an M1 and an M2 is to be the list of quote, the0:25:35
asterisk operation, and M1, and M2. And I define M1 to be cadr and M2 to be caddr. You get to be0:26:00
a good Lisp programmer because you start talking that way. I cdr down lists and cons them up and so on. Now, now that we have essentially a complete program0:26:09
for finding derivatives, you can add more rules if you like. What kind of behavior do we get out of it? I'll have to clear that x. Well, supposing I define foo here to be the sum of the0:26:28
product of a and x squared, plus bx, plus c. That's the same thing we see here as the algebraic expression written in the more conventional notation over there.0:26:37
Well, the derivative of foo with respect to x, which we can see over here, is this horrible, horrendous mess.0:26:46
I would like it to be 2ax plus b. But it's not. It's equivalent to it. What is it?0:26:56
I have here, what do I have? I have the derivative of the product of x and x. Over here is, of course, the sum of x times0:27:09
1 and 1 times x. Now, well, it's the first times the derivative of the second plus the second times the derivative of the first. It's right. That's 2x of course.0:27:20
a times 2x is 2ax; plus 0 times x squared, which doesn't count; plus b over here; plus a bunch of 0's.0:27:29
Well, the answer is right. But people would take off points for it on an exam, sadly enough. Let's worry about that in the next segment. Are there any questions?0:27:42
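The session on the screen corresponds to something like the following-- a sketch, with the printed result indicated only schematically, since its exact shape depends on the order the program builds it in:

```scheme
;; ax^2 + bx + c, written in our nested two-argument representation.
(define foo '(+ (* a (* x x))
                (+ (* b x) c)))

(deriv foo 'x)
;; => a large unsimplified expression, equivalent to 2ax + b,
;;    full of subexpressions like (* x 1), (* 0 x), and sums with 0.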
Yes? AUDIENCE: If you had left the quote when you put the plus, then would that be referring to the procedure plus and0:27:51
could you do a comparison between that procedure and some other procedure if you wanted to? PROFESSOR: Yes. Good question. If I had left this quotation off at this point, if I had0:28:05
left that quotation off at that point, then I would be referring here to the procedure which is the thing that plus is defined to be.0:28:15
And indeed, I could compare some procedures with each other for identity.0:28:25
Now what that means is not clear right now. I don't like to think about it. Because I don't know exactly what it would need to compare procedures. There are reasons why that may make no sense at all.0:28:35
However, the symbols, we understand. And so that's why I put that quote in. I want to talk about the symbol that's apparent on the page.0:28:46
Any other questions? OK. Thank you. Let's take a break. [MUSIC PLAYING]0:29:30
PROFESSOR: Well, let's see. We've just developed a fairly plausible program for computing the derivatives of algebraic expressions. It's an incomplete program, if you would0:29:40
like to add more rules. And perhaps you might extend it to deal with uses of addition with any number of arguments and multiplication with any number of arguments.0:29:49
And that's all rather easy. However, there was a little fly in that ointment. We go back to this slide.0:30:02
We see that the expressions that we get are rather bad. This is a rather bad expression.0:30:11
How do we get such an expression? Why do we have that expression? Let's look at this expression in some detail. Let's find out where all the pieces come from.0:30:21
As we see here, we have a sum-- just what I showed you at the end of the last time-- of X times 1 plus 1 times X. That is the0:30:30
derivative of this product. The product of a times that, where a does not depend upon x, and therefore is constant with respect to x, is this0:30:40
sum, which goes from here all the way through here and through here. Because it is the first thing times the derivative of the second plus the derivative of the first times the second as0:30:54
the program we wrote on the blackboard indicated we should do. And, of course, the product of bx over here manifests itself0:31:06
as B times 1 plus 0 times X because we see that B does not0:31:15
depend upon X. And so the derivative of B is this 0, and the derivative of X with respect to itself is the 1. And, of course, the derivatives of the sums over here turn0:31:26
into these two sums of the derivatives of the parts. So what we're seeing here is exactly the thing I was trying to tell you about with Fibonacci numbers a while ago,0:31:37
that the form of the process is expanded from the local rules that you see in the procedure, that the procedure0:31:48
represents a set of local rules for the expansion of this process. And here, the process left behind some stuff, which is0:31:59
the answer. And it was constructed by the walk it takes of the tree structure, which is the expression.0:32:08
So every part in the answer we see here derives from some part of the problem. Now, we can look at, for example, the derivative of0:32:17
foo, which is ax square plus bx plus c, with respect to other things. Here, for example, we can see the derivative of foo with respect to a.0:32:27
And it's very similar. It's, in fact, the identical algebraic expression, except for the fact that these 0's and 1's are in different places. Because the only degree of freedom we have in this tree0:32:38
walk is what's constant with respect to the variable we're taking the derivative with respect to, and what's the same variable.0:32:48
In other words, if we go back to this blackboard and we look, we have no choice what to do when we take the derivative of the sum or a product.0:32:58
The only interesting place here is, is the expression the variable, or is the expression a constant with respect to0:33:07
that variable, for the very smallest expressions. In which case we get various 1's and 0's. If we go back to this slide, we can see, for example,0:33:17
this 1 over here in the derivative of foo with respect to a, which gets us an x square, because that 1 gets the product of x and x multiplied into the answer. That 1 is a 0 over0:33:32
here, where we're taking the derivative of foo with respect to c. But the shapes of these expressions are the same. See, all those shapes.0:33:42
They're the same. Well is there anything wrong with our rules?0:33:53
No. They're the right rules. We've been through this one before. One of the things you're going to begin to discover is that0:34:02
there aren't too many good ideas. When we were looking at rational numbers yesterday,0:34:12
the problem was that we got 6/8 rather than 3/4. The answer was unsimplified. The problem, of course, is very similar.0:34:21
There are things I'd like to be identical by simplification that don't become identical. And yet the rules for doing addition and multiplication of0:34:30
rational numbers were correct. So the way we might solve this problem is to do the thing we did last time, which always works. If something worked last time, it ought to work again.0:34:40
It's to change the representation. Perhaps in the representation we could put in a simplification step that produces a simplified representation.0:34:50
This may not always work, of course. I'm not trying to say that it always works. But it's one of the pieces of artillery we have in our war0:34:59
against complexity. You see, because we solved our problem very carefully. What we've done is we've divided the world into several parts. There are derivative rules and general rules for algebra0:35:12
of some sort at this level of detail. And I have an abstraction barrier.0:35:21
And I have the representation of the algebraic expressions,0:35:32
list structure. And in this barrier, I have the interface procedures.0:35:43
I have constant, and things like same-var.0:35:54
I have things like sum, make-sum. I have A1, A2.0:36:06
I have products and things like that, all the other things I might need for various kinds of algebraic expressions. Making this barrier allows me to arbitrarily change the0:36:18
representation without changing the rules that are written in terms of that representation. So if I can make the problem go away by changing0:36:28
representation, the decomposition of the problem into these two parts has helped me a great deal. So let's take a very simple case of this.0:36:38
What was one of the problems? Let's go back to this transparency again. And we see here, oh yes, there's horrible things like0:36:48
here is the sum of an expression and 0. Well, that's no reason to think of it as anything other than the expression itself.0:36:57
Why should the summation operation have made up this addition? It can be smarter than that. Or here, for example, is a multiplication of0:37:09
something by 1. It's another thing like that. Or here is a product of something with 0, which is certainly 0. So we won't have to make this construction.0:37:21
So why don't we just do that? We need to change the way the representation works, right here.0:37:37
Define make-sum to be-- well, now it's not something so simple. I'm not going to make a list containing the symbol plus and0:37:48
things unless I need to. Well, what are the possibilities?0:37:57
I have some sort of cases here. If I have numbers-- if A1 is a number--0:38:09
and here's another primitive I've just introduced; it's possible to tell whether something's a number-- and if A2 is a number, meaning they're not symbolic0:38:23
expressions, then why not do the addition now? The result is just a plus of A1 and A2.0:38:32
I'm not asking if these represent numbers. Of course all of these symbols represent numbers. I'm talking about whether the one I've got is the number 3 right now.0:38:43
And, for example, supposing A1 is a number, and it's equal to0:38:59
0, well then the answer is just A2. There is no reason to make anything up.0:39:10
And if A2 is a number, and equal to 0, then0:39:27
the result is A1. And only if I can't figure out something better to do with this situation, well, I can start a list. Otherwise I want0:39:41
the representation to be the list containing the quoted symbol plus, and A1, and A2.0:39:58
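Collected, the smarter constructor just dictated comes out roughly like this (a sketch; number? is the primitive just introduced):

```scheme
;; make-sum does the addition now if both arguments are numbers,
;; drops an addend that is the number 0, and only otherwise
;; constructs the list (+ a1 a2).
(define (make-sum a1 a2)
  (cond ((and (number? a1) (number? a2))
         (+ a1 a2))
        ((and (number? a1) (= a1 0))
         a2)
        ((and (number? a2) (= a2 0))
         a1)
        (else
         (list '+ a1 a2))))
```

Note that nothing in the deriv procedure itself changes; only the constructor behind the abstraction barrier does.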
And, of course, a very similar thing can be done for products. And I think I'll avoid boring you with them. I was going to write it on the blackboard.0:40:07
I don't think it's necessary. You know what to do. It's very simple. But now, let's just see the kind of results we get out of0:40:17
changing our program in this way. Well, here's the derivatives after having just changed the constructors for expressions.0:40:28
The same foo, aX square plus bX plus c, and what I get is nothing more than the derivative of that is 2aX plus0:40:40
B. Well, it's not completely simplified. I would like to collect common terms in sums. Well, that's more work. And, of course, programs to do this sort of thing are huge0:40:51
and complicated. Algebraic simplification, it's a very complicated mess. There's a very famous program you may have heard of called Maxima developed at MIT in the past, which is 5,000 pages of0:41:02
Lisp code, mostly the algebraic simplification operations. There we see the derivative of foo.0:41:12
In fact, it's at the point where I wouldn't take off more than 1 point for it in an elementary calculus class. And the derivative of foo with respect to a, well it's gone down to X times X, which isn't so bad.0:41:24
And the derivative of foo with respect to b is just X itself. And the derivative of foo with respect to c comes out 1. So I'm pretty pleased with this.0:41:34
What you've seen is, of course, a little bit contrived, carefully organized example to show you how we can manipulate algebraic expressions, how we do that0:41:43
abstractly in terms of abstract syntax rather than concrete syntax and how we can use the abstraction to control0:41:53
what goes on in building these expressions. But the real story isn't just such a simple thing as that. The real story is, in fact, that I'm manipulating these0:42:03
expressions. And the expressions are the same expressions-- going back to the slide-- as the ones that are Lisp expressions.0:42:12
There's a pun here. I've chosen my representation to be the same as the representation in my language of similar things.0:42:22
By doing so, I've invoked a necessity. I created the necessity to have things like quotation because of the fact that my language is capable of writing0:42:35
expressions that talk about expressions of the language. I need to have something that says, this is an expression I'm talking about rather than this expression is talking0:42:45
about something, and I want to talk about that. So quotation stops and says, I'm talking about this0:42:54
expression itself. Now, given that power, if I can manipulate expressions of0:43:03
the language, I can begin to build even much more powerful layers upon layers of languages. Because I can write languages that not only are embedded in0:43:12
Lisp or whatever language you start with, but languages that are completely different, that are just, if we say, interpreted in Lisp or something like that.0:43:23
We'll get to understand those words more in the future. But right now I just want to leave you with the fact that we've crossed a line which gives us tremendous power.0:43:36
At this point we've bought a sledgehammer. We have to be careful of what flies when we apply it. Thank you. [MUSIC PLAYING]0:00:00
Lecture 4A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:24
PROFESSOR: Well, yesterday we learned a bit about symbolic manipulation, and we wrote a rather stylized program to0:00:35
implement a pile of calculus rules from a calculus book. Here on the transparencies, we see a bunch of calculus rules0:00:47
from such a book. And, of course, what we did is sort of translate these rules into the language of the computer.0:00:56
But, of course, that's a sort of funny strategy. Why should we have to translate these rules into the language of the computer? And what do I really mean by that?0:01:07
The program we wrote yesterday was very stylized. It was a conditional, a dispatch on the type of the expression as observed by the rules.0:01:19
What we see here are rules that say if the object the derivative is being taken of-- if that expression is a constant-- then do one thing.0:01:29
If it's a variable, do another thing. If it's a product of a constant times a variable, do something, and so on. There's sort of a dispatch there on a type.0:01:41
Well, since it has such a stylized behavior and structure, is there some other way of writing this program that's more clear?0:01:50
Well, what's a rule, first of all? What are these rules? Let's think about that. Rules have parts. If you look at these rules in detail, what you see, for0:02:04
example, is the rule has a left-hand side and a right-hand side. Each of these rules has a left-hand side and the0:02:13
right-hand side. The left-hand side is somehow compared with the expression you're trying to take the derivative of. The right-hand side is the replacement for that0:02:24
expression. So all rules on this page are something like this.0:02:35
I have patterns, and somehow, I have to produce, given a0:02:45
pattern, a skeleton. This is a rule.0:02:55
A pattern is something that matches, and a skeleton is something you substitute into in order to get a new expression.0:03:06
So what that means is that the pattern is matched against the expression, which is the source expression.0:03:23
And the result of the application of the rule is to produce a new expression, which I'll call a target, by0:03:38
instantiation of a skeleton. That's called instantiation.0:03:50
So that is the process by which these rules are described. What I'd like to do today is build a language and a means0:04:02
of interpreting that language, a means of executing that language, where that language allows us to directly express these rules. And what we're going to do is instead of bringing the rules0:04:14
to the level of the computer by writing a program that is those rules in the computer's language-- at the moment, in a Lisp-- we're going to bring the computer to the level of us by0:04:25
writing a way by which the computer can understand rules of this sort. This is slightly emphasizing the idea that we had last time0:04:35
that we're trying to make a solution to a class of problems rather than a particular one. The problem is if I want to write rules for a different0:04:45
piece of mathematics, say, to do simple algebraic simplification or something like that, or manipulation of0:04:54
trigonometric functions, I would have to write a different program using yesterday's method. Whereas I would like to encapsulate all of the things0:05:03
that are common to both of those programs, meaning the idea of matching, instantiation, the control structure, which turns out to be very complicated for such a0:05:12
thing, I'd like to encapsulate that separately from the rules themselves. So let's look at, first of all, a representation.0:05:22
I'd like to use the overhead here. I'd like-- there it is. I'd like to look at a representation of the rules of calculus for derivatives in a sort of simple language that0:05:36
I'm writing right here. Now, I'm going to avoid worrying about syntax. We can easily pretty this up, and I'm not interested in making--0:05:48
this is indeed ugly. This doesn't look like the beautiful text set dx by dt or something that I'd like to write, but that's not essential.0:05:58
That's sort of an accidental phenomenon. Here, we're just worrying about the fact that the structure of the rules is that there is a left-hand side0:06:07
here, which represents the thing I want to match against the derivative expression. This, I'm going to say, is the representation for the derivative of a constant, which we will call c with0:06:18
respect to the variable we will call v. And what we will get on the right-hand side is 0. So this represents a rule.0:06:29
The next rule will be the derivative of a variable, which we will call v with respect to the same variable v, and we get a 1.0:06:38
However, if we have the derivative of a variable called u with respect to a different variable v, we will get 0.0:06:47
I just want you to look at these rules a little bit and see how they fit together. For example, over here, we're going to have the derivative0:06:56
of the sum of an expression called x1 and an expression called x2. These things that begin with question marks are called pattern variables in the language that we're inventing,0:07:08
and you see we're just making it up, so pattern variables for matching. And so in this-- here we have the derivative of the sum of the expression0:07:19
which we will call x1. And the expression we will call x2 with respect to the variable we call v will be-- here is the right-hand side: the sum of the derivative of that expression x1 with0:07:29
respect to v-- the right-hand side is the skeleton-- and the derivative of x2 with respect to v. Colons here will0:07:38
stand for substitution objects. They're--we'll call them skeleton evaluations.0:07:48
So let me put up here on the blackboard for a second some syntax so we'll know what's going on for this rule language. First of all, we're going to have to worry about the0:07:58
pattern matching. We're going to have things like a symbol like foo matches0:08:11
exactly itself.0:08:23
The expression f of a and b will be used to match any list0:08:35
whose first element is f, whose second element is a, and0:08:51
whose third element is b. Also, another thing we might have in a pattern is that--0:09:03
a question mark with some variable like x. And what that means, it says matches anything, which we0:09:17
will call x. Question mark c x will match only constants.0:09:30
So this is something which matches a constant colon x.0:09:44
And question mark v x will match a variable,0:09:55
which we call x. This is sort of the language we're making up now.0:10:04
If I match two things against each other, then they are compared element by element. But elements in the pattern may contain these syntactic0:10:13
variables, pattern variables, which will be used to match arbitrary objects.0:10:22
And we'll get that object as the value in the name x here, for example.0:10:31
Now, when we make skeletons for instantiation, we have things like this.0:10:42
foo, a symbol, instantiates to itself.0:10:55
Something which is a list like f of a and b instantiates to--0:11:06
well, it instantiates to a 3-list, a list of three elements, okay, which are the results of instantiating each0:11:27
of f, a, and b.0:11:36
And colon x-- well, we instantiate to the value of x as in the0:11:53
matched pattern.0:12:02
So going back to the overhead here, we see all of those kinds of objects. We see here a pattern variable0:12:14
which matches a constant, a pattern variable which matches a variable, a pattern variable which will match anything. And if we have two instances of the same name, like this is0:12:25
the derivative of an expression which is a variable, whose name we will call v, with respect to some arbitrary0:12:34
expression which we will also call v-- since this v appears twice, we're going to want that to mean they have to be the same. The only consistent match is one where those are the same.0:12:45
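The rules just walked through can be written down as data. Here is a sketch in Python list structure rather than the lecture's Lisp -- the tags `["?", name]`, `["?c", name]`, `["?v", name]`, and `[":", name]` are stand-ins of my own for the question-mark and colon forms on the overhead:

```python
# A sketch (not the lecture's Scheme) of the first few derivative rules
# as data: each rule is a pattern paired with a skeleton.
deriv_rules = [
    # (dd (?c c) (? v)) -> 0 : derivative of a constant is 0
    [["dd", ["?c", "c"], ["?", "v"]], 0],
    # (dd (?v v) (? v)) -> 1 : derivative of a variable w.r.t. itself is 1
    [["dd", ["?v", "v"], ["?", "v"]], 1],
    # (dd (+ (? x1) (? x2)) (? v)) -> (+ (dd (: x1) (: v)) (dd (: x2) (: v)))
    [["dd", ["+", ["?", "x1"], ["?", "x2"]], ["?", "v"]],
     ["+", ["dd", [":", "x1"], [":", "v"]],
           ["dd", [":", "x2"], [":", "v"]]]],
]
```

The left-hand side of each pair is the pattern to match; the right-hand side is the skeleton to instantiate.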
So here, we're making up a language. And in fact, that's a very nice thing to be doing. It's so much fun to make up a language. And you do this all the time.0:12:54
And the really most powerful design things you ever do are sort of making up a language to solve problems like this. Now, here we go back here and look at some of these rules.0:13:05
Well, there's a whole set of them. I mean, there's one for addition and one for multiplication, just like we had before. The derivative of the product of x1 and x2 with respect to v0:13:16
is the sum of the product of x1 and the derivative x2 with respect to v and the product of the derivative of x1 and x2.0:13:27
And here we have exponentiation. And, of course, we run off the end down here. We get as many as we like. But the whole thing over here, I'm giving this list of0:13:36
rules the name "derivative rules." What would we do with such a thing once we have it?0:13:45
Well, one of the nicest ideas, first of all, is I'm going to write for you, and we're going to play with it all day. What I'm going to write for you is a program called0:13:56
simplifier, the general-purpose simplifier. And we're going to say something like define dsimp to0:14:09
be a simplifier of the derivative rules.0:14:23
And what simplifier is going to do is, given a set of rules, it will produce for me a procedure which will simplify expressions containing the things that are0:14:33
referred to by these rules. So here will be a procedure constructed for your purposes0:14:42
to simplify things with derivatives in them such that, after that, if we're typing at some Lisp system, and we get a prompt, and we say dsimp, for example, of the derivative of0:14:58
the sum of x and y with respect to x-- note the quote here because I'm talking about the0:15:08
expression which is the derivative-- then I will get back as a result plus 1 0.0:15:19
Because the derivative of x plus y is the derivative of x plus derivative y. The derivative of x with respect to x is 1. The derivative of y with respect to x is 0.0:15:29
That's what we're going to get-- not 1-- because I haven't put any simplification at that level-- algebraic simplification-- in yet. Of course, once we have such a thing, then we0:15:39
can look at other rules. So, for example, we can, if we go to the slide, OK?0:15:49
Here, for example, are other rules that we might have, algebraic manipulation rules, ones that would be used for simplifying algebraic expressions.0:15:58
For example, just looking at some of these, the left-hand side says any operator applied to a constant e1 and a0:16:08
constant e2 is the result of evaluating that operator on the constants e1 and e2. Or an operator applied to any expression e1 and a0:16:20
constant e2, is going to move the constant forward. So that'll turn into the operator with e2 followed by e1. Why I did that, I don't know.0:16:30
It wouldn't work if I had division, for example. So there's a bug in the rules, if you like. So the sum of 0 and e is e.0:16:42
The product of 1 and any expression e is e. The product of 0 and any expression e is 0. Just looking at some more of these rules, we could have0:16:51
arbitrarily complicated ones. We could have things like the product of a constant e1 and the product of a constant e2 with e3 is the result of multiplying the0:17:04
constants e1 and e2 together and putting e3 there.0:17:13
So it says combine the constants: if I had a product of e1 and the product of e2 and e3, and e1 and e2 are both constants, multiply them together.0:17:23
And you can make up the rules as you like. There are lots of them here. There are things as complicated, for example, as-- oh, I suppose down here some distributive law, you see.0:17:33
The product of any object c and the sum of d and e gives the result as the same as the sum of the product of c and d0:17:42
and the product of c and e. Now, what exactly these rules are doesn't very much interest me. We're going to be writing the language that will allow us to0:17:51
interpret these rules so that we can, in fact, make up whatever rules we like, another whole language of programming.0:18:03
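A few of those algebraic rules, sketched as data in the same made-up notation (Python list structure standing in for the Lisp; the tag conventions are my own):

```python
# A sketch of some of the algebraic simplification rules from the slide,
# each one a pattern/skeleton pair.
algebra_rules = [
    [["+", 0, ["?", "e"]], [":", "e"]],    # (+ 0 e) -> e
    [["*", 1, ["?", "e"]], [":", "e"]],    # (* 1 e) -> e
    [["*", 0, ["?", "e"]], 0],             # (* 0 e) -> 0
    # the distributive law: (* c (+ d e)) -> (+ (* c d) (* c e))
    [["*", ["?", "c"], ["+", ["?", "d"], ["?", "e"]]],
     ["+", ["*", [":", "c"], [":", "d"]],
           ["*", [":", "c"], [":", "e"]]]],
]
```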
Well, let's see. I haven't told you how we're going to do this. And, of course, for a while, we're going to work on that. But there's a real question of what I'm going to do0:18:13
at all at a large scale? How do these rules work? How is the simplifier program going to manipulate these rules with your expression to produce a reasonable answer?0:18:26
Well, first, I'd like to think about these rules as being some sort of deck of them. So here I have a whole bunch of rules, right?0:18:42
Each rule-- here's a rule-- has a pattern and a skeleton. I'm trying to make up a control structure for this.0:18:53
Now, what I have is a matcher, and I have something which is0:19:02
an instantiater. And I'm going to pass from the matcher to the instantiater0:19:13
some set of meaning for the pattern variables, a dictionary, I'll call it. A dictionary, which will say x was matched against the0:19:26
following subexpression and y was matched against another following subexpression. And from the instantiater, I will be making expressions,0:19:35
and they will go into the matcher. They will be expressions.0:19:44
And the patterns of the rules will be fed into the matcher, and the skeletons from the same rule will be fed into the0:19:53
instantiater. Now, this is a little complicated because when you have something like an algebraic expression, the rules are intended to allow you to0:20:02
substitute equal for equal. These are equal transformation rules. So all subexpressions of the expression should be looked at.0:20:11
You give it an expression, this thing, and the rules should be cycled around. First of all, for every subexpression of the expression you feed in, all of the rules must be0:20:21
tried and looked at. And if any rule matches, then this process occurs. The dictionary is to have some values in it.0:20:30
The instantiater makes a new expression, which basically replaces that part of your original expression that was matched.0:20:40
And then, of course, we're going to recheck that-- go around these rules again, seeing if that could be simplified further.0:20:49
And then we're going to do that for every subexpression until the thing no longer changes. You can think of this as sort of an organic process. You've got some sort of stew, right?0:21:00
You've got bacteria or something, or enzymes in some gooey mess. And these enzymes change things.0:21:10
They attach to your expression, change it, and then they go away. And they have to match. The key-in-lock phenomenon. They match, they change it, they go away.0:21:19
You can imagine it as a parallel process of some sort. So you stick an expression into this mess, and after a while, you take it out, and it's been simplified.0:21:29
And it just keeps changing until it no longer can be changed. But these enzymes can attach to any part of the expression.0:21:39
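The control structure just described -- try the rules at every subexpression, replace, and go around again until nothing changes -- can be sketched by itself, independent of any particular matcher. This is a Python illustration rather than the lecture's code, and `plus_zero_rule`, a single hand-coded rewrite, stands in for the whole deck of rules:

```python
# A sketch of the control structure only: apply a rewriter to every
# subexpression, and repeat until the expression stops changing.
# `rewrite_one` stands in for "try all the rules at this node" -- here it
# is just the single hand-coded rule (+ 0 e) -> e, for illustration.

def simplify(exp, rewrite_one):
    def walk(e):
        if isinstance(e, list):
            e = [walk(sub) for sub in e]   # simplify every subexpression first
        return rewrite_one(e)              # then try the rules at this node
    while True:
        new = walk(exp)
        if new == exp:                     # fixed point: nothing changed
            return exp
        exp = new

def plus_zero_rule(e):
    # (+ 0 e) -> e, one of the algebraic rules from the slide, hand-coded
    if isinstance(e, list) and len(e) == 3 and e[0] == "+" and e[1] == 0:
        return e[2]
    return e
```

Feeding in `["+", 0, ["+", 0, "x"]]` grinds down to `"x"`: the inner sum is rewritten first, then the outer one, and a final pass confirms nothing more changes.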
OK, at this point, I'd like to stop and ask for questions. Yes. AUDIENCE: This implies that the matching program and the0:21:48
instantiation program are separate programs; is that right? Or is that-- they are. PROFESSOR: They're separate little pieces. They fit together in a larger structure.0:21:57
AUDIENCE: So I'm going through and matching and passing the information about what I matched to an instantiater, which makes the changes. And then I pass that back to the matcher?0:22:06
PROFESSOR: It won't make a change. It will make a new expression, which has substituted the values of the pattern variables that were matched on the left-hand side for the variables that are0:22:17
mentioned, the skeleton variables or evaluation variables or whatever I called them, on the right-hand side. AUDIENCE: And then that's passed back into the matcher?0:22:27
PROFESSOR: Then this is going to go around again. This is going to go through this mess until it no longer changes. AUDIENCE: And it seems that there would be a danger of getting into a recursive loop.0:22:37
PROFESSOR: Yes. Yes, if you do not write your rules nicely, you are-- indeed, in any programming language you invent, if it's sufficiently powerful to do anything, you can write0:22:46
programs that will go into infinite loops. And indeed, writing a program for doing algebraic manipulation for long will produce infinite loops.0:23:00
Go ahead. AUDIENCE: Some language designers feel that this feature is so important that it should become part of the basic language, for example, scheme in this case.0:23:12
What are your thoughts on-- PROFESSOR: Which language feature? AUDIENCE: The pairs matching. It's all application of such rules should be--0:23:21
PROFESSOR: Oh, you mean like Prolog? AUDIENCE: Like Prolog, but it becomes a more general-- PROFESSOR: It's possible. OK, I think my feeling about that is that I would like to0:23:33
teach you how to do it so you don't depend upon some language designer. AUDIENCE: OK. PROFESSOR: You make it yourself. You can roll your own.0:23:44
Thank you.0:24:14
Well, let's see. Now we have to tell you how it works. It conveniently breaks up into various pieces.0:24:24
I'd like to look now at the matcher. The matcher has the following basic structure. It's a box that takes as its input an expression and a0:24:44
pattern, and a dictionary.0:25:01
A dictionary, remember, is a mapping of pattern variables to the values that were found by matching, and it puts out another dictionary, which is the result of augmenting this0:25:20
dictionary by what was found in matching this expression against this pattern. So that's the matcher.0:25:33
Now, this is a rather complicated program, and we can look at it on the overhead over here and see, ha, ha,0:25:42
it's very complicated. I just want you to look at the shape of it. It's too complicated to look at except in pieces.0:25:51
However, it's a fairly large, complicated program with a lot of sort of indented structure.0:26:00
At the largest scale-- you don't try to read those characters, but at the largest scale, you see that there is a case analysis, which is all0:26:09
these cases lined up. What we're now going to do is look at this in a bit more detail, attempting to understand how it works.0:26:19
Let's go now to the first slide, showing some of the structure of the matcher at a large scale.0:26:28
And we see that the matcher, the matcher takes as its input a pattern, an expression, and a dictionary.0:26:38
And there is a case analysis here, which is made out of several cases, some of which have been left out over here, and the general case, which I'd like you to see.0:26:50
Let's consider this general case. It's a very important pattern. The problem is that we have to examine two trees0:27:00
simultaneously. One of the trees is the tree of the expression, and the other is the tree of the pattern. We have to compare them with each other so that the0:27:12
subexpressions of the expression are matched against subexpressions of the pattern. Looking at that in a bit more detail, suppose I had a0:27:21
pattern, which was the sum of the product of a thing which we will call x and a thing which we will call y,0:27:38
and the sum of that, and the same thing we call y. So we're looking for a sum of a product whose0:27:49
second argument is the same as the second argument of the sum. That's a thing you might be looking for.0:27:59
Well, that, as a pattern, looks like this. There is a tree, which consists of a sum, and a0:28:09
product with a pattern variable question mark x and question mark y, the other pattern variable, and question0:28:21
mark y, just looking at the same, just writing down the list structure in a different way. Now, suppose we were matching that against an expression0:28:31
which matches it, the sum of, say, the product of 3 and x and, say, x.0:28:42
That's another tree. It's the sum of the product of 3 and x and of x.0:28:59
So what I want to do is traverse these two trees simultaneously. And what I'd like to do is walk them like this.0:29:08
I'm going to say are these the same? This is a complicated object. Let's look at the left branches.0:29:17
Well, that could be the car. How does that look? Oh yes, the plus looks just fine. But the next thing here is a complicated thing. Let's look at that. Oh yes, that's pretty fine, too.0:29:26
They're both asterisks. Now, whoops! My pattern variable, it matches against the 3. Remember, x equals 3 now.0:29:36
That's in my dictionary, and the dictionary's going to follow along with me: x equals three. Ah yes, x equals 3 and y equals x, different x.0:29:46
The pattern x is the expression x, the pattern y. Oh yes, the pattern variable y, I've already0:29:56
got a value for it. It's x. Is this an x? Oh yeah, sure it is. That's fine. Yep, done. I now have a dictionary, which I've accumulated0:30:07
by making this walk. Well, now let's look at this general case here and see how that works. Here we have it.0:30:17
I take in a pattern, an expression, and a dictionary. And now I'm going to do a complicated thing here, which0:30:26
is the general case. The expression is made out of two parts: a left and a right half, in general.0:30:35
Anything that's complicated is made out of two pieces in a Lisp system. Well, now what do we have here? I'm going to match the car's of the two expressions against0:30:45
each other with respect to the dictionary I already have, producing a dictionary as its value, which I will then use0:30:55
for matching the cdr's against each other. So that's how the dictionary travels, threads the entire structure. And then the result of that is the dictionary for the match0:31:06
of the car and the cdr, and that's what's going to be returned as a value. Now, at any point, a match might fail.0:31:16
It may be the case, for example, if we go back and look at an expression that doesn't quite match, like supposing this was a 4.0:31:29
Well, now these two don't match any more, because the y that had to be x here-- this0:31:38
y has to be 4. But x and 4 are not the same object syntactically. So this wouldn't match, and that would be rejected0:31:47
sometimes, so matches may fail. Now, of course, because this matcher takes the dictionary from the previous match as input, it must be able to0:31:57
propagate the failures. And so that's what the first clause of this conditional does. It's also true that if it turned out that the pattern0:32:07
was not atomic-- see, if the pattern was atomic, I'd go into this stuff, which we haven't looked at yet. But if the pattern is not atomic and the0:32:16
expression is atomic-- it's not made out of pieces-- then that must be a failure, and so we go over here. If the pattern is not atomic and the pattern is not a0:32:26
pattern variable-- I have to remind myself of that-- then we go over here. So that way, failures may occur.0:32:35
OK, so now let's look at the insides of this thing. Well, the first place to look is what happens if I have an atomic pattern? That's very simple. A pattern that's not made out of any pieces: foo.0:32:46
That's a nice atomic pattern. Well, here's what we see. If the pattern is atomic, then if the expression is atomic,0:32:56
then if they are the same thing, then the dictionary I get is the same one as I had before. Nothing's changed. It's just that I matched plus against plus, asterisk against0:33:09
asterisk, x against x. That's all fine. However, if the pattern is not the one which is the expression, if I have two separate atomic objects, then0:33:19
it's like matching plus against asterisk, in which case I fail. Or if it turns out that the pattern is atomic but the0:33:29
expression is complicated, it's not atomic, then I get a failure. That's very simple.0:33:38
Now, what about the various kinds of pattern variables? We had three kinds. I give them the names.0:33:47
They're arbitrary constants, arbitrary variables, and arbitrary expressions. A question mark x is an arbitrary expression.0:34:01
A question mark cx is an arbitrary constant, and a question mark vx is an arbitrary variable. Well, what do we do here?0:34:10
Looking at this, we see that if I have an arbitrary constant, if the pattern is an arbitrary constant, then it had better be the case that the expression0:34:19
is a constant. If the expression is not a constant, then that match fails. If it is a constant, however, then I wish to extend the dictionary. I wish to extend the dictionary with that pattern0:34:32
being remembered to be that expression using the old dictionary as a starting point.0:34:41
Similarly, for arbitrary variables, I have to check first whether the expression is a variable. If so, it's worth extending the dictionary so that the0:34:50
pattern is remembered to be matched against that expression, given the original dictionary, and this makes a new dictionary. Now, it has to check.0:35:00
There's a sort of failure inside extend-dictionary, which is that if one of these pattern variables already has a value0:35:09
and I'm trying to match the thing against something else which is not equivalent to the one that I've already matched it against once, then a failure will come flying out of here, too.0:35:20
And I will see that some time. And finally, an arbitrary expression does not have to check anything syntactic about the expression that's being0:35:29
matched, so all it does is it's an extension of the dictionary. So you've just seen a complete, very simple matcher.0:35:39
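The whole matcher, as just described, can be sketched in Python (the lecture's version is Scheme; representing constants as numbers and variables as strings is an assumption of this sketch, as is walking same-length lists with zip instead of car/cdr recursion):

```python
# A Python sketch of the matcher just described. Patterns are nested
# lists; ("?", x) matches anything, ("?c", x) only constants (numbers
# here), ("?v", x) only variables (strings here). The dictionary threads
# through the walk, and "failed" propagates through every call.

FAILED = "failed"

def pattern_var(pat):
    return isinstance(pat, list) and len(pat) == 2 and pat[0] in ("?", "?c", "?v")

def extend_dict(pat, exp, dic):
    name = pat[1]
    if name in dic:                      # already bound: must be consistent
        return dic if dic[name] == exp else FAILED
    new = dict(dic)
    new[name] = exp
    return new

def match(pat, exp, dic):
    if dic == FAILED:                    # propagate failure
        return FAILED
    if not isinstance(pat, list):        # atomic pattern: must be identical
        return dic if pat == exp else FAILED
    if pattern_var(pat):
        if pat[0] == "?c" and not isinstance(exp, (int, float)):
            return FAILED                # arbitrary constant: needs a constant
        if pat[0] == "?v" and not isinstance(exp, str):
            return FAILED                # arbitrary variable: needs a variable
        return extend_dict(pat, exp, dic)
    if not isinstance(exp, list) or len(pat) != len(exp):
        return FAILED                    # compound pattern, atomic expression
    # general case: thread the dictionary through the pieces in order
    for p, e in zip(pat, exp):
        dic = match(p, e, dic)
    return dic
```

Matching the board's pattern against (+ (* 3 x) x) yields the dictionary binding x to 3 and y to the symbol x; against (+ (* 3 x) 4) it fails, because y cannot be both x and 4.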
Now, one of the things that's rather remarkable about this is people pay an awful lot of money these days for someone to make a, quote, AI expert system that has nothing more0:35:49
in it than a matcher and maybe an instantiater like this. But it's very easy to do, and now, of course, you can start up a little start-up company and make a couple of megabucks0:35:59
in the next week taking some people for a ride. 20 years ago, this was remarkable, this kind of program.0:36:09
But now, this is sort of easy. You can teach it to freshmen. Well, now there's an instantiater as well.0:36:19
The problem is they're all going off and making more money than I do. But that's always been true of universities. The purpose of the instantiater is to make0:36:33
expressions given a dictionary and a skeleton.0:36:44
And that's not very hard at all. We'll see that very simply in the next, the next slide here.0:36:53
To instantiate a skeleton, given a particular dictionary-- oh, this is easy. We're going to do a recursive tree walk over the skeleton.0:37:04
And for everything which is a skeleton variable-- I don't know, call it a skeleton evaluation. That's the name and the abstract syntax that I give it in this program: a skeleton evaluation, a thing beginning0:37:13
with a colon in the rules. For anything of that case, I'm going to look up the answer in the dictionary, and we'll worry about that in a second.0:37:24
Let's look at this as a whole. Here, I have-- I'm going to instantiate a skeleton, given a dictionary. Well, I'm going to define some internal loop right there, and0:37:38
it's going to do something very simple. Either the skeleton is simple and atomic, in which case it's nothing more than giving the skeleton back as an answer, or in the general case, it's0:37:51
complicated, in which case I'm going to make up the expression which is the result of instantiating-- calling this loop recursively--0:38:01
instantiating the car of the skeleton and the cdr. So here is a recursive tree walk. However, if it turns out to be a skeleton evaluation, a colon0:38:12
expression in the skeleton, then what I'm going to do is find the expression that's in the colon--0:38:21
the CADR in this case. It's a piece of abstract syntax here, so I can change my representation of rules. I'm going to evaluate that relative to this dictionary,0:38:31
whatever evaluation means. We'll find out a lot about that sometime. And the result of that is my answer. So I start up this loop-- here's my initialization--0:38:42
by calling it with the whole skeleton, and this will just do a recursive decomposition into pieces. Now, one more little bit of detail is what0:38:55
happens inside evaluate? I can't tell you that in great detail. I'll tell you a little bit of it. Later, we're going to see--look into this in much more detail.0:39:04
To evaluate some form, some expression with respect to a dictionary, if the expression is an atomic object, well, I'm0:39:15
going to go look it up. Nothing very exciting there. Otherwise, I'm going to do something complicated here, which is I'm going to apply a procedure which is the result0:39:26
of looking up the operator part in something that we're going to find out about someday. I want you to realize you're seeing magic now. This magic will become clear very soon, but not today.0:39:40
Then I'm looking up all the pieces, all the arguments to that in the dictionary. So I don't want you to look at this in detail.0:39:51
I want you to say that there's more going on here, and we're going to see more about this. But it's-- the magic is going to stop.0:40:02
This part has to do with Lisp, and it's the end of that. OK, so now we know about matching and instantiation.0:40:15
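The instantiater just described is only a few lines. Here is a Python sketch, with the `[":", name]` tag as my stand-in for the colon skeleton evaluations, and a plain dictionary lookup standing in for the fuller evaluate that the lecture defers:

```python
# A sketch of the instantiater: a recursive tree walk over the skeleton.
# (":", name) is a skeleton evaluation -- handled here as a simple
# dictionary lookup, in place of the lecture's more general `evaluate`.

def instantiate(skeleton, dic):
    def loop(s):
        if isinstance(s, list):
            if len(s) == 2 and s[0] == ":":   # skeleton evaluation
                return dic[s[1]]              # look up the matched value
            return [loop(sub) for sub in s]   # general case: rebuild the list
        return s                              # atomic skeleton: itself
    return loop(skeleton)
```

Instantiating the right-hand side of the sum rule with x1 bound to x, x2 to y, and v to x rebuilds the tree with those values dropped in where the colons were.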
Are there any questions for this segment?0:40:27
AUDIENCE: I have a question. PROFESSOR: Yes. AUDIENCE: Is it possible to bring up a previous slide? It's about this define match pattern.0:40:36
PROFESSOR: Yes. You'd like to see the overall slide define match pattern. Can somebody put up the-- no, the overhead. That's the biggest scale one.0:40:45
What part would you like to see? AUDIENCE: Well, the top would be fine. Any of the parts where you're passing failed.0:40:54
PROFESSOR: Yes. AUDIENCE: The idea is to pass failed back to the dictionary; is that right? PROFESSOR: The dictionary is the answer to a match, right?0:41:05
And it is either some mapping or there's no match. It doesn't match.0:41:14
AUDIENCE: Right. PROFESSOR: So what you're seeing over here is, in fact, due to the fact that a match may have another match pass it the dictionary, as you see in the general case down here.0:41:24
Here's the general case where a match passes another match to the dictionary. When I match the cdr's, I match them in the dictionary that is resulting from matching the car's.0:41:36
OK, that's what I have here. So because of that, if the match of the car's fails, then it may be necessary that the match of the cdr's propagates that failure, and that's what the first line is.0:41:48
AUDIENCE: OK, well, I'm still unclear what matches-- what comes out of one instance of the match? PROFESSOR: One of two possibilities. Either the symbol failed, which means there is no match.0:41:59
AUDIENCE: Right. PROFESSOR: Or some mapping, which is an abstract thing right now-- you shouldn't know about the structure of it-- which relates the pattern variables to their values as0:42:13
picked up in the match. AUDIENCE: OK, so it is-- PROFESSOR: That's constructed by extend dictionary. AUDIENCE: So the recursive nature brings about the fact0:42:22
that if ever a failed gets passed out of any calling of match, then the first condition will pick it up-- PROFESSOR: And just propagate it along without any further0:42:32
ado, right. AUDIENCE: Oh, right. OK. PROFESSOR: That's just the fastest way to get that failure out of there.0:42:43
Yes. AUDIENCE: If I don't fail, that means that I've matched a pattern, and I run the procedure extend dict and then pass in the pattern in the expression.0:42:55
But the substitution will not be made at that point; is that right? I'm just-- PROFESSOR: No, no. There's no substitution being done there because there's no skeleton to be substituted into. AUDIENCE: Right. So what-- PROFESSOR: All you've got there is we're making up the0:43:04
dictionary for later substitution. AUDIENCE: And what would the dictionary look like? Is it ordered pairs?0:43:13
PROFESSOR: That's not told to you. We're being abstract. AUDIENCE: OK. PROFESSOR: Why do you want to know? What it is, it's a function. It's a function. AUDIENCE: Well, the reason I want to know is--0:43:22
PROFESSOR: A function abstractly is a set of ordered pairs. It could be implemented as a set of list pairs. It could be implemented as some fancy table mechanism.0:43:32
It could be implemented as a function. And somehow, I'm building up a function. But I'm not telling you. That's up to George, who's going to build that later.0:43:49
I know you really badly want to write concrete things. I'm not going to let you do that. AUDIENCE: Well, let me at least ask, what is the important information there that's being passed to extend dict?0:43:59
I want to pass the pattern I found-- PROFESSOR: Yes. The pattern that's matched against the expression. You want to have the pattern, which happens to be in those cases pattern variables, right?0:44:09
All of those three cases for extend dict are pattern variables. AUDIENCE: Right. PROFESSOR: So you have a pattern variable that is to be given a value in a dictionary.0:44:18
AUDIENCE: Mm-hmm. PROFESSOR: The value is the expression that it matched against. The dictionary is the set of things I've already0:44:27
figured out that I have memorized or learned. And I am going to make a new dictionary, which is extended from the original one by having that pattern variable0:44:36
have a value with the new dictionary. AUDIENCE: I guess what I don't understand is why can't the substitution be made right as soon as you find-- PROFESSOR: How do I know what I'm going to substitute? I don't know anything about this skeleton.0:44:47
This pattern, this matcher is an independent unit. AUDIENCE: Oh, I see. OK. PROFESSOR: Right? AUDIENCE: Yeah. PROFESSOR: I take the matcher. I apply the matcher. If it matches, then it was worth doing instantiation.0:44:57
AUDIENCE: OK, good. Yeah. PROFESSOR: OK? AUDIENCE: Can you just do that answer again using that example on the board? You know, what you just passed back to the matcher.0:45:06
PROFESSOR: Oh yes. OK, yes. You're looking at this example. At this point when I'm traversing this structure, I get to here: x.0:45:16
I have some dictionary, presumably an empty dictionary at this point if this is the whole expression. So I have an empty dictionary, and I've matched x against 3.0:45:26
So now, after this point, the dictionary contains x is 3, OK? Now, I continue walking along here.0:45:35
I see y. Now, this is a particular x, a pattern x. I see y, a pattern y. The dictionary says, oh yes, the pattern y is the symbol x0:45:48
because I've got a match there. So the dictionary now contains at this point two entries. The pattern x is 3, and the pattern y is the expression x.0:46:02
Now, I get that, I can walk along further. I say, oh, pattern y also wants to be 4. But that isn't possible, producing a failure.0:46:14
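The walkthrough above can be sketched in code. The lecture's program is in Scheme; the following is a Python translation, and the encoding of pattern variables as `('?', name)` tuples, plus the names `match`, `extend_dict`, and the `FAILED` symbol, are assumptions of this sketch rather than the lecture's exact code.

```python
FAILED = 'failed'

def match(pat, expr, dic):
    """Match pat against expr, threading a dictionary of bindings."""
    if dic == FAILED:
        return FAILED                        # propagate failure without further ado
    if isinstance(pat, tuple) and pat[0] == '?':
        return extend_dict(pat, expr, dic)   # pattern variable: try to bind it
    if isinstance(pat, list):
        if not (isinstance(expr, list) and len(expr) == len(pat)):
            return FAILED
        for p, e in zip(pat, expr):          # match the car, then the cdr
            dic = match(p, e, dic)
        return dic
    return dic if pat == expr else FAILED    # atoms must be equal

def extend_dict(pat, datum, dic):
    """Bind the pattern variable in pat to datum, failing on a conflict."""
    name = pat[1]
    if name in dic:
        # already bound: the new value had better equal the stored one
        return dic if dic[name] == datum else FAILED
    new = dict(dic)                          # a new dictionary, extended
    new[name] = datum
    return new
```

Matching the pattern made of variables x, y, y against the expression 3, x, 4 binds the pattern x to 3 and the pattern y to the symbol x, and then fails when y also wants to be 4, just as in the walkthrough.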
Thank you. Let's take a break.0:47:02
OK, you're seeing your first very big and hairy program. Now, of course, one of the goals of this subject is to get you to be able to read something like this and not be0:47:12
afraid of it. This one's only about four pages of code. By the end of the subject, I hope a 50-page program will not look particularly frightening.0:47:22
But I don't expect-- and I don't want you to think that I expect you to be getting it as it's coming out. You're supposed to feel the flavor of this, OK?0:47:31
And then you're supposed to think about it because it is a big program. There's a lot of stuff inside this program.0:47:40
Now, I've told you about the language we're implementing, the pattern match substitution language. I showed you some rules. And I've told you about matching and instantiation,0:47:51
which are the two halves of how a rule works. Now we have to understand the control structure by which the rules are applied to the expressions so as to do0:48:03
algebraic simplification. Now, that's also a big complicated mess.0:48:12
The problem is that there is a variety of interlocking, interwoven loops, if you will, involved in this. For one thing, I have to apply--0:48:22
I have to examine every subexpression of my expression that I'm trying to simplify. That we know how to do. It's a car cdr recursion of some sort, or something like0:48:34
that, and some sort of tree walk. And that's going to be happening. Now, for every such place, every node that I get to in0:48:43
doing my traversal of the expression I'm trying to simplify, I want to apply all of the rules.0:48:53
Every rule is going to look at every node. I'm going to rotate the rules around. Now, either a rule will or will not match.0:49:07
If the rule does not match, then it's not very interesting. If the rule does match, then I'm going to replace that node0:49:16
in the expression by an alternate expression. I'm actually going to make a new expression, which contains that new value, the result of0:49:26
substituting into the skeleton, instantiating the skeleton for that rule at this level. But no one knows whether that thing that I instantiated0:49:35
there is in simplified form. So we're going to have to simplify that, somehow to call the simplifier on the thing that I just constructed.0:49:45
And then when that's done, then I sort of can build that into the expression I want as my answer. Now, there is a basic idea here, which I will call a0:49:55
garbage-in, garbage-out simplifier. It's a kind of recursive simplifier. And what happens is the way you simplify something is that0:50:06
simple objects like variables are simple. Compound objects, well, I don't know. What I'm going to do is I'm going to build up from simple0:50:16
objects, trying to make simple things by assuming that the pieces they're made out of are simple. That's what's happening here.0:50:27
Well, now, if we look at the first slide-- no, overhead, overhead. If we look at the overhead, we see a very complicated program like we saw before for the matcher, so complicated that0:50:38
you can't read it like that. I just want you to get the feel of the shape of it, and the shape of it is that this program has various0:50:48
subprograms in it. One of them--this part is the part for traversing the0:50:57
expression, and this part is the part for trying rules. Now, of course, we can look at that in some more detail.0:51:06
Let's look at--let's look at the first transparency, right? The simplifier is made out of several parts.0:51:17
Now, remember at the very beginning, the simplifier is the thing which takes a set of rules and produces a program which will simplify an expression relative to them.0:51:29
So here we have our simplifier. It takes a rule set. And in the context where that rule set is defined, there are0:51:39
various other definitions that are done here. And then the result of this simplifier procedure is, in fact, one of the procedures that was defined.0:51:50
Simplify-exp. What I'm returning as the value of calling the simplifier on a set of rules is a procedure, the simplify-exp0:52:01
procedure, which is defined in that context, which is a simplification procedure appropriate for using that set of rules.0:52:14
That's what I have there. Now, the first two of these procedures, this one and this one, are together going to be the recursive traversal of an0:52:25
expression. This one is the general simplification for any expression, and this is the thing which simplifies a list of parts of an expression.0:52:35
Nothing more. For each of those, we're going to do something complicated, which involves trying the rules. Now, we should look at the various parts.0:52:45
Well let's look first at the recursive traversal of an expression. And this is done in a sort of simple way.0:52:54
This is a little nest of recursive procedures. And what we have here are two procedures-- one for simplifying an expression, and one for0:53:06
simplifying parts of an expression. And the way this works is very simple. If the expression I'm trying to simplify is a compound0:53:16
expression, I'm going to simplify all the parts of it. And that's calling--that procedure, simplify parts, is going to make up a new expression with all the parts0:53:25
simplified, which I'm then going to try the rules on over here. If it turns out that the expression is not compound, if it's simple, like just a symbol or something like pi,0:53:37
then in any case, I'm going to try the rules on it because it might be that I want in my set of rules to expand pi to 3.14159265358979, dot, dot, dot.0:53:48
But I may not. But there is no reason not to do it. Now, if I want to simplify the parts, well, that's easy too.0:53:59
Either the expression is an empty one, there's no more parts, in which case I have the empty expression. Otherwise, I'm going to make a new expression by cons, which0:54:11
is the result of simplifying the first part of the expression, the car, and simplifying the rest of the expression, which is the cdr.0:54:21
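The two mutually recursive procedures just described might look like this in a Python rendering (the lecture's code is Scheme; `try_rules` is taken as given here, and the name `make_simplifier` is this sketch's, not the lecture's):

```python
def make_simplifier(try_rules):
    """Return a simplifier built around a given try_rules procedure."""
    def simplify_exp(exp):
        if isinstance(exp, list):            # compound: simplify all the parts
            return try_rules(simplify_parts(exp))
        return try_rules(exp)                # simple: try the rules anyway

    def simplify_parts(exp):
        if not exp:                          # empty expression: no more parts
            return []
        # cons of the simplified car onto the simplified cdr
        return [simplify_exp(exp[0])] + simplify_parts(exp[1:])

    return simplify_exp
```

The map idiom the professor writes on the blackboard next collapses `simplify_parts` into the single line `[simplify_exp(p) for p in exp]`.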
Now, the reason why I'm showing you this sort of stuff this way is because I want you to get the feeling for the various patterns that are very important when writing programs. And this could be written a different way.0:54:33
There's another way to write simplify-exp so that there would be only one of them. There would only be one little procedure here. Let me just write that on the blackboard to give you a feeling for that.0:54:49
This in another idiom, if you will.0:54:58
To simplify an expression called x, what am I going to do? I'm going to try the rules on the following situation.0:55:11
If-- on the following expression-- compound, just like we had before.0:55:21
If the expression is compound, well, what am I going to do? I'm going to simplify all the parts. But I already have a cdr recursion, a common pattern of0:55:30
usage, which has been captured as a higher-order procedure. It's called map. So I'll just write that here. Map simplify the expression, all the parts of the0:55:47
expression. This says apply the simplification operation, which is this one, to every part of the expression, and then that conses those up into a list, one for every element of0:56:02
the list which the expression is assumed to be made out of, and otherwise, I have the expression. So I don't need the helper procedure, simplify parts,0:56:12
because that's really this. So sometimes, you just write it this way. It doesn't matter very much. Well, now let's take a look at--0:56:24
let's just look at how you try rules. If you look at this slide, we see this is a complicated mess also.0:56:33
I'm trying rules on an expression. It turns out the expression I'm trying it on is some subexpression now of the expression I started with. Because the thing I just arranged allowed us to try0:56:43
every subexpression. So now here we're taking in a subexpression of the expression we started with. That's what this is.0:56:52
And what we're going to define here is a procedure called scan, which is going to try every rule. And we're going to start it up on the whole set of rules.0:57:01
This is going to go cdr-ing down the rules, if you will, looking for a rule to apply. And when it finds one, it'll do the job.0:57:14
Well, let's take a look at how try rules works. It's very simple: it scans the rules. Scan is the way of scanning. Well, is it so simple?0:57:23
It's a big program, of course. We take a bunch of rules, which is a sublist of the list of rules. We've tried some of them already, and they've not been0:57:33
appropriate, so we get to some here. We get to move to the next one. If there are no more rules, well then, there's nothing I can do with this expression, and it's simplified.0:57:42
However, if it turns out that there are still rules to be done, then let's match the pattern of the first rule0:57:52
against the expression using the empty dictionary to start with and use that as the dictionary. If that happens to be a failure, try0:58:02
the rest of the rules. That's all it says here. It says discard that rule.0:58:11
Otherwise, well, I'm going to get the skeleton of the first rule, instantiate that relative to the dictionary, and simplify the result, and that's the expression I want.0:58:24
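The scan just described might be sketched like this, again as a Python translation; the matcher, instantiator, and simplifier are passed in as parameters here, rather than closed over in the surrounding definitions as in the lecture's Scheme.

```python
FAILED = 'failed'

def try_rules(exp, rules, match, instantiate, simplify):
    """Scan down the rule list looking for a rule whose pattern matches exp."""
    def scan(remaining):
        if not remaining:
            return exp                      # no more rules: exp is simplified
        pattern, skeleton = remaining[0]
        dic = match(pattern, exp, {})       # start with the empty dictionary
        if dic == FAILED:
            return scan(remaining[1:])      # discard that rule, try the rest
        # instantiate the skeleton relative to the dictionary and simplify
        return simplify(instantiate(skeleton, dic))
    return scan(rules)
```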
So although that was a complicated program, every complicated program is made out of a lot of simple pieces. Now, the pattern of recursions here is very complicated.0:58:34
And one of the most important things is not to think about that. If you try to think about the actual pattern by which this does something, you're going to get very confused.0:58:45
I would. This is not something you can do just with practice. These patterns are hard. But you don't have to think about it.0:58:55
The key to this-- it's very good programming and very good design-- is to know what not to think about. The fact is, going back to this slide, I don't have to0:59:07
think about it because I have specifications in my mind for what simplify-exp does. I don't have to know how it does it.0:59:16
And it may, in fact, call scan somehow through try rules, which it does. And somehow, I've got another recursion going on here. But since I know that simplify-exp is assumed by wishful0:59:28
thinking to produce the simplified result, then I don't have to think about it anymore. I've used it. I've used it in a reasonable way. I will get a reasonable answer.0:59:39
And you have to learn how to program that way-- with abandon. Well, there's very little left of this thing.0:59:50
All there is left is a few details associated with what a dictionary is. And those of you who've been itching to know what a dictionary is, well, I will flip it up and not tell you1:00:01
anything about it. Dictionaries are easy. It's represented in terms of something else called an a-list, an association list, which is a particular pattern of usage for making1:00:14
tables in lists. They're easy. They're made out of pairs, as was asked a bit ago. And there are special procedures for dealing with1:00:23
such things called assq, and you can find them in manuals. I'm not terribly excited about it. The only interesting thing here in extend dictionary is I have to extend the dictionary with a pattern, a datum, and a1:00:36
dictionary. This pattern is, in fact, at this point a pattern variable. And what do I want to do? I want to pull out the name of that pattern variable, the1:00:48
pattern variable name, and I'm going to look up in the dictionary and see if it already has a value. If not, I'm going to add a new one in.1:00:57
If it does have one, if it has a value, then it had better be equal to the one that was already stored away. And if that's the case, the dictionary is what I expected it to be.1:01:06
Otherwise, I fail. So that's easy, too. If you open up any program, you're going to find inside of1:01:15
it lots of little pieces, all of which are easy. So at this point, I suppose, I've just told you some million-dollar valuable information.1:01:27
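The extend-dictionary logic just described, with the dictionary represented as an a-list of pairs and an assq-style lookup, might look like this in a Python rendering (the names mirror the lecture's Scheme; the pair encoding is this sketch's assumption):

```python
FAILED = 'failed'

def assq(key, alist):
    """Linear lookup in an association list of (name, value) pairs."""
    for pair in alist:
        if pair[0] == key:
            return pair
    return None

def extend_dictionary(pat, datum, dictionary):
    """Give the pattern variable in pat the value datum, or fail."""
    name = pat[1]                            # e.g. pat is ('?', 'x')
    entry = assq(name, dictionary)
    if entry is None:
        return [(name, datum)] + dictionary  # add a new entry in
    if entry[1] == datum:
        return dictionary                    # equal to what was stored away
    return FAILED                            # conflicting value: fail
```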
And I suppose at this point we're pretty much done with this program. I'd like to ask about questions. AUDIENCE: Yes, can you give me the words that describe the specification for simplify-exp?1:01:38
PROFESSOR: Sure. Simplify-exp takes an expression and produces a simplified expression. That's it, OK?1:01:48
How it does it is very easy. In compound expressions, all the pieces are simplified, and then the rules are tried on the result. And for simple expressions, you just try all the rules.1:01:59
AUDIENCE: So an expression is simplified by virtue of the rules? PROFESSOR: That's, of course, true. AUDIENCE: Right. PROFESSOR: And the way this works is that simplify-exp, as you see here, what it does is it breaks the1:02:10
expression down into the smallest pieces, simplifies building up from the bottom using the rules to be the simplifier, to do the manipulations, and constructs1:02:21
a new expression as the result. Eventually, one of the things you see is that the rules themselves, through try rules, call simplify-exp1:02:30
on the results when it changes something, the results of a match. I'm sorry, the results of instantiation of a skeleton1:02:39
for a rule that has matched. So the spec of simplify-exp is that any expression you put into it comes out simplified according to those rules.1:02:49
Thank you. Let's take a break.0:00:00
Lecture 4B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:20
PROFESSOR: So far in this course we've been talking a lot about data abstraction. And remember the idea is that we build systems that have these horizontal barriers in them, these abstraction0:00:31
barriers that separate use, the way you might use some data object, from the way you might represent it.0:00:48
Or another way to think of that is up here you have the boss who's going to be using some sort of data object.0:00:57
And down here is George who's implemented it. Now this notion of separating use from representation so you can think about these two problems separately is a very,0:01:10
very powerful programming methodology, data abstraction. On the other hand, it's not really sufficient for really0:01:21
complex systems. And the problem with this is George. Or actually, the problem is that there0:01:32
are a lot of Georges. Let's be concrete. Let's suppose there is George, and there's also Martha.0:01:41
OK, now George and Martha are both working on this system, both designing representations, and absolutely are incompatible.0:01:51
They wouldn't cooperate on a representation under any circumstances. And the problem is you would like to have some system where0:02:00
both George and Martha are designing representations, and yet, if you're above this abstraction barrier you don't0:02:09
want to have to worry about that, whether something is done by George or by Martha. And you don't want George and Martha to interfere with each other. Somehow in designing a system, you not only want these0:02:20
horizontal barriers, but you also want some kind of vertical barrier to keep George and Martha separate.0:02:32
Let me be a little bit more concrete. Imagine that you're thinking about personnel records for a0:02:42
large company with a lot of loosely linked divisions that don't cooperate very well either. And imagine even that this company is formed by merging a0:02:57
whole bunch of companies that already have their personnel record system set up. And imagine that once these divisions are all linked in0:03:06
some kind of very sophisticated satellite network, and all these databases are put together. And what you'd like to do is, from any place in the company,0:03:17
to be able to say things like, oh, what's the name in a personnel record?0:03:26
Or, what's the job description in a personnel record? And not have to worry about the fact that each division obviously is going to have completely separate0:03:36
conventions for how you might implement these records. From this point you don't want to know about that. Well how could you possibly do that?0:03:48
One way, of course, is to send down an edict from somewhere that everybody has to change their format to some fixed compatible thing.0:03:58
That's what people often try, and of course it never works. Another thing that you might want to do is somehow arrange0:04:07
it so you can have these vertical barriers. So that when you ask for the name of a personnel record, somehow, whatever format it happens to be, name will0:04:17
figure out how to do the right thing. We want name to be, so-called, a generic operator.0:04:26
Generic operator means what it sort of precisely does depends on the kind of data that it's looking at. More than that, you'd like to design the system so that the0:04:37
next time a new division comes into the company they don't have to make any big changes in what they're already doing to link into this system, and the rest of the company0:04:50
doesn't have to make any big changes to admit their stuff to the system. So that's the problem you should be thinking about. Like it's sort of just your work.0:05:00
You want to be able to include new things by making minimal changes. OK, well that's the problem that we'll be talking about today.0:05:09
And you should have this sort of distributed personnel record system in your mind. But actually the one I'll be talking about is a problem that's a little bit more self-contained than that.0:05:18
But it'll bring up the issues, I think, more clearly. That's the problem of doing a system that does arithmetic on complex numbers.0:05:27
So let's take a look here. Just as a little review, there are things called complex numbers. Complex number you can think of as a point in0:05:36
the plane, z. And you can represent a point either by its real-part and0:05:46
its imaginary-part. So if this is z and its real-part is this much, and its imaginary-part is that much, and you write z equals x plus iy.0:05:59
Or another way to represent a complex number is by saying, what's the distance from the origin, and what's the angle?0:06:10
So that represents a complex number as its radius times e to the i times the angle, z equals re^(iA).0:06:19
The original one's called rectangular form, the rectangular representation in terms of real- and imaginary-part; the other's called the polar representation.0:06:28
Magnitude and angle-- and if you know the real- and imaginary-part, you can figure out the magnitude and angle. If you know x and y, you can get r by this formula.0:06:37
Square root of sum of the squares, and you can get the angle as an arctangent. Or conversely, if you knew r and A you could figure out x and y. x is r times the cosine of A, and y is r times the sine of0:06:49
A. All right, so there's these two. They're complex numbers. You can think of them either in polar form or rectangular form. What we would like to do is make a system that does0:06:59
arithmetic on complex numbers. In other words, what we'd like-- just like the rational number example-- is to have some operations plus c, which is going to take0:07:11
two complex numbers and add them, subtract them, and multiply them, and divide them.0:07:20
OK, well there's little bit of mathematics behind it. What are the actual formulas for manipulating such things?0:07:29
And it's sort of not important where they come from, but just as an implementer let's see-- if you want to add two complex numbers it's pretty easy to0:07:40
get its real-part and its imaginary-part. The real-part of the sum of two complex numbers, the real-part of the z1 plus z2 is the real-part of z1 plus the0:07:53
real-part of z2. And the imaginary-part of z1 plus z2 is the imaginary part0:08:02
of z1 plus the imaginary part of z2. So it's pretty easy to add complex numbers. You just add the corresponding parts and make a new complex0:08:12
number with those parts. If you want to multiply them, it's kind of nice to do it in polar form. Because if you have two complex numbers, the magnitude0:08:21
of their product is here, the product of the magnitudes. And the angle of the product is the sum of the angles.0:08:35
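The formulas on the board translate directly into code. Here is a small Python check of the rectangular/polar conversions and of the multiply-the-magnitudes, add-the-angles product rule; the function names are this sketch's, not the lecture's.

```python
import math

def rect_to_polar(x, y):
    r = math.sqrt(x * x + y * y)   # square root of the sum of the squares
    A = math.atan2(y, x)           # the arctangent of y and x
    return r, A

def polar_to_rect(r, A):
    return r * math.cos(A), r * math.sin(A)   # x = r cos A, y = r sin A

def mul_polar(z1, z2):
    """Product rule: multiply the magnitudes, add the angles."""
    (r1, a1), (r2, a2) = z1, z2
    return r1 * r2, a1 + a2
```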
So that's sort of mathematics that allows you to do arithmetic on complex numbers. Let's actually think about the implementation. Well we do it just like rational numbers.0:08:49
We come down, we assume we have some constructors and selectors. What would we like? Well let's assume that we make a data object cloud, which is0:08:58
a complex number that has some stuff in it, and that we can get out from a complex number the real-part, or the imaginary-part, or the magnitude, or the angle.0:09:12
We want some ways of making complex numbers-- not only selectors, but constructors. So we'll assume we have a thing called make-rectangular. What make-rectangular is going to do is take a real-part and0:09:24
an imaginary-part and construct a complex number with those parts. Similarly, we can have make-polar which will take a0:09:35
magnitude and an angle, and construct a complex number which has that magnitude and angle.0:09:44
So here's a system. We'll have two constructors and four selectors. And now, just like before, in terms of that abstract data0:09:55
we'll go ahead and implement our complex number operations. And here you can see translated into Lisp code just the arithmetic formulas I put down before.0:10:08
If I want to add two complex numbers I will make a complex number out of its real- and imaginary-parts. The real part of the complex number I'm going to make is0:10:19
the sum of the real-parts. The imaginary part of the complex number I'm going to make is the sum of the imaginary-parts.0:10:30
I put those together, make a complex number. That's how I implement complex number addition. Subtraction is essentially the same.0:10:39
All I do is subtract the parts rather than add them. To multiply two complex numbers, I use the other formula.0:10:49
I'll make a complex number out of a magnitude and angle. The magnitude is going to be the product of the magnitudes0:10:58
of the two complex numbers I'm multiplying. And the angle is going to be the sum of the angles of the two complex numbers I'm multiplying.0:11:09
So there's multiplication. And then division, division is almost the same. Here I divide the magnitudes and subtract the angles.0:11:28
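The four operations just described can be sketched using only the abstract selectors and constructors the lecture assumes. This is a Python translation of the Scheme; the rectangular-pair definitions at the top are just one possible George-style backing for the sketch, not part of the abstract interface.

```python
import math

# One possible backing representation (a rectangular pair);
# the arithmetic below uses only the abstract selectors and constructors.
def make_rectangular(x, y): return (x, y)
def make_polar(r, a): return (r * math.cos(a), r * math.sin(a))
def real_part(z): return z[0]
def imag_part(z): return z[1]
def magnitude(z): return math.sqrt(z[0] ** 2 + z[1] ** 2)
def angle(z): return math.atan2(z[1], z[0])

def add_c(z1, z2):
    # add the corresponding real- and imaginary-parts
    return make_rectangular(real_part(z1) + real_part(z2),
                            imag_part(z1) + imag_part(z2))

def sub_c(z1, z2):
    # subtract the parts rather than add them
    return make_rectangular(real_part(z1) - real_part(z2),
                            imag_part(z1) - imag_part(z2))

def mul_c(z1, z2):
    # multiply the magnitudes, add the angles
    return make_polar(magnitude(z1) * magnitude(z2),
                      angle(z1) + angle(z2))

def div_c(z1, z2):
    # divide the magnitudes, subtract the angles
    return make_polar(magnitude(z1) / magnitude(z2),
                      angle(z1) - angle(z2))
```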
Now I've implemented the operations. And what do we do? We call on George. We've done the use, let's worry about the0:11:38
representation. We'll call on George and say to George, go ahead and build us a complex number representation. Well that's fine.0:11:47
George can say, we'll implement a complex number simply as a pair that has the real-part and the0:11:56
imaginary-part. So if I want to make a complex number with a certain real-part and an imaginary-part, I'll just use cons to form a pair, and that will-- that's George's0:12:06
representation of a complex number. So if I want to get out the real-part of something, I just extract the car, the first part. If I want to get the imaginary-part, I extract the0:12:16
cdr. How do I deal with the magnitude and angle? Well if I want to extract the magnitude of one of these0:12:25
things, I get the square root of the sum of the square of the car plus the square of the cdr. If I want to get the0:12:34
angle, I compute the arctangent of the cdr and the car. This is a Lisp procedure for computing arctangent.0:12:44
And if somebody hands me a magnitude and an angle and says, make me a complex number, well I compute the real-part and the imaginary-part, r cosine0:12:54
of A and r sine of A, and stick them together into a pair. OK so we're done. In fact, what I just did, conceptually, is absolutely no0:13:07
different from the rational number representation that we looked at last time. It's the same sort of idea. You implement the operators, you pick a representation.0:13:18
Nothing different. Now let's worry about Martha. See, Martha has a different idea. She doesn't want to represent a complex number as a pair of0:13:29
a real-part and an imaginary-part. What she would like to do is represent a complex number as a pair of a magnitude and an angle.0:13:39
So if instead of calling up George we ask Martha to design our representation, we get something like this. We get make-polar. Sure, if I give you a magnitude and an angle we're0:13:50
just going to form a pair that has magnitude and angle. If you want to extract the magnitude, that's easy. You just pull out the car of the pair.0:13:59
If you want to extract the angle, sure, that's easy. You just pull out the cdr. If you want to look for real-parts and imaginary-parts, well then you have to do some work.0:14:08
If you want the real-part, you have to get r cosine a. In other words, r, the car of the pair, times the cosine of0:14:19
the cdr of the pair. So this is r times the cosine of a, and that's the real-part.0:14:28
If you want to get the imaginary-part, it's r times the sine of a. And if I hand you a real-part and an imaginary-part and say,0:14:37
make me a complex number with that real-part and imaginary-part, well I figure out what the magnitude and angle should be. The magnitude's the square root of the sum of the squares0:14:48
and the angle's the arctangent. I put those together to make a pair. So there's Martha's idea. Well which is better?0:14:59
Well if you're doing a lot of additions, probably George's is better, because you're doing a lot of real-parts and imaginary-parts. If mostly you're going to be doing multiplications and divisions, then maybe Martha's idea is better.0:15:11
Or maybe, and this is the real point, you can't decide. Or maybe you just have to let them both hang around, for0:15:21
personality reasons. Maybe you just really can't ever decide what you would like. And again, what we would really like is a system that0:15:31
looks like this. That somehow there's George over here, who has built rectangular complex numbers.0:15:41
And Martha, who has polar complex numbers. And somehow we have operations that can add, and subtract,0:15:54
and multiply, and divide, and it shouldn't matter that there are two incompatible representations of complex numbers floating around this system.0:16:04
In other words, not only like an abstraction barrier here that has things in it like a real-part, and an0:16:15
imaginary-part, and magnitude, and angle. So not only is there an abstraction barrier that hides0:16:26
the actual representation from us, but also there's some kind of vertical barrier here that allows both of these representations to exist without0:16:36
interfering with each other. The idea is that the things in here-- real-part, imaginary-part, magnitude, and angle-- will be generic operators.0:16:47
If you ask for the real-part, it will worry about what representation it's looking at. OK, well how can we do that?0:16:56
There's actually a really obvious idea, if you're used to thinking about compound data.0:17:06
See, suppose you could just tell by looking at a complex number whether it was constructed by George or Martha.0:17:15
In other words, so it's not that what's floating around here are ordinary, just complex numbers, right? They're fancy, designer complex numbers.0:17:24
So you look at a complex number: it's not just a complex number, it's got a label on it that says, this one is by Martha. Or, this is a complex number by George.0:17:34
Right? They're signed. See, and then whenever we looked at a complex number we could just read the label, and then we'd know how you expect0:17:45
to operate on that. In other words, what we want is not just ordinary data objects. We want to introduce the notion of what's called typed data.0:17:59
Typed data means, again, there's some sort of cloud. And what it's got in it is an ordinary data object like0:18:08
we've been thinking about. We call that the contents, sort of the actual data.0:18:19
But also a thing called a type, but it's signed by either George or Martha. So we're going to go from regular data to type data.0:18:31
How do we build that? Well that's easy. We know how to build clouds. We build them out of pairs. So here's a little representation that supports0:18:41
typed data. There's a thing, attach-type, that takes a type and attaches it to a piece of contents, and we just use cons.0:18:51
And if we have a piece of typed data, we can look at the type, which is the car. We can look at the contents, which is the cdr. Now along0:19:00
with that, the way we use our typed data is to test, when we're given a piece of data, what type it is. So we have some type predicates.0:19:10
For example, to see whether a complex number is one of George's, whether it's rectangular, we just check to see if the type of that is the symbol rectangular, right?0:19:23
The symbol rectangular. And to check whether a complex number is one of Martha's, we check to see whether the type is the symbol polar.0:19:36
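Typed data as just described can be sketched in Python (`type` is a Python builtin, so the selector is called `type_of` here; otherwise the names follow the lecture's attach-type, contents, and the two type predicates):

```python
def attach_type(type_tag, contents):
    """Sign a piece of contents with a type -- just cons them into a pair."""
    return (type_tag, contents)

def type_of(datum):
    return datum[0]           # the type is the car

def contents(datum):
    return datum[1]           # the contents is the cdr

def is_rectangular(z):
    return type_of(z) == 'rectangular'   # one of George's

def is_polar(z):
    return type_of(z) == 'polar'         # one of Martha's
```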
So that's a way to test what kind of number we're looking at. Now let's think about how we can use that to build the system. So let's suppose that George and Martha were off working0:19:46
separately, and each of them had designed their complex number representation packages. What do they have to do to become part of the system, to0:19:58
exist compatibly? Well it's really pretty easy. Remember, George had this package. Here's George's original package, or half of it.0:20:08
And underlined in red are the changes he has to make. So before, when George made a complex number out of an x and y, he just put them together to make a pair.0:20:20
And the only difference is that now he signs them. He attaches the type, which is the symbol rectangular to that pair.0:20:30
Everything else George does is the same, except that-- see, George and Martha both have procedures named real-part and imaginary-part. So to allow them both to exist in the same Lisp environment,0:20:44
George had changed the names of his procedures. So we'll say, this is George's real-part procedure. It's the real-part rectangular procedure, the imaginary-part rectangular procedure.0:20:55
And then here's the rest of George's package. He'd had magnitude and angle, and he just renames them magnitude-rectangular and angle-rectangular.0:21:05
And Martha has to do basically the same thing. Martha previously, when she made a complex number out of a0:21:15
magnitude and angle, she just cons them. Now she attaches the type polar, and she changes the0:21:25
name so her real-part procedure won't conflict in name with George's. It's real-part-polar, imaginary-part-polar,0:21:34
magnitude-polar, and angle-polar.0:21:45
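Both renamed packages might look like this in Python (a hypothetical sketch mirroring the lecture's Scheme names; each package works on the untyped pair, and only the constructor signs it):

```python
import math

def attach_type(tag, contents):
    return (tag, contents)

# George's rectangular package: pairs (x, y), signed 'rectangular'.
def make_rectangular(x, y):
    return attach_type('rectangular', (x, y))

def real_part_rectangular(z): return z[0]
def imag_part_rectangular(z): return z[1]
def magnitude_rectangular(z): return math.hypot(z[0], z[1])
def angle_rectangular(z):     return math.atan2(z[1], z[0])

# Martha's polar package: pairs (magnitude, angle), signed 'polar'.
def make_polar(mag, ang):
    return attach_type('polar', (mag, ang))

def real_part_polar(z): return z[0] * math.cos(z[1])
def imag_part_polar(z): return z[0] * math.sin(z[1])
def magnitude_polar(z): return z[0]
def angle_polar(z):     return z[1]

print(make_rectangular(1, 2))  # ('rectangular', (1, 2))
print(make_polar(1, 2))        # ('polar', (1, 2))
```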
Now we have the system. Right there's George and Martha. And now we've got to get some kind of manager to look at these types.0:21:55
How are these things actually going to work now that George and Martha have supplied us with typed data? Well what we have are a bunch of generic selectors.0:22:05
Generic selectors for complex numbers real-part, imaginary-part, magnitude, and angle.0:22:14
Let's look at them more closely. What does a real-part do? If I ask for the real part of a complex number,0:22:24
well I look at it. I look at its type. I say, is it rectangular? If so, I apply George's real part procedure to the contents0:22:36
of that complex number. This is a number that has a type on it. I strip off the type using contents and0:22:46
apply George's procedure. Or is this a polar complex number? If I want the real part, I apply Martha's real part0:22:56
procedure to the contents of that number. So that's how real part works. And then similarly there's imaginary-part, which is almost the same.0:23:06
It looks at the number and if it's rectangular, uses George's imaginary-part procedure. If it's polar, uses Martha's. And then there's a magnitude and an angle.0:23:19
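The manager's dispatch for real-part might be sketched like this in Python (hypothetical rendering; imaginary-part, magnitude, and angle would be analogous):

```python
import math

def attach_type(tag, c): return (tag, c)
def type_of(d): return d[0]
def contents(d): return d[1]

# George's and Martha's procedures, working on untyped pairs.
def real_part_rectangular(z): return z[0]
def real_part_polar(z):       return z[0] * math.cos(z[1])

# The manager: check the type, strip it off, dispatch to the right person.
def real_part(z):
    if type_of(z) == 'rectangular':
        return real_part_rectangular(contents(z))
    elif type_of(z) == 'polar':
        return real_part_polar(contents(z))
    else:
        raise TypeError('unknown type of complex number')

print(real_part(attach_type('rectangular', (1, 2))))  # 1
```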
So there's a system. Has three parts. There's sort of George, and Martha, and the manager. And that's how you get generic operators implemented.0:23:28
Let's look at just a simple example, just to pin down exactly how this is going to work. Suppose you're going0:23:40
to be looking at the complex number whose real-part is one, and whose imaginary-part is two. So that would be one plus 2i.0:23:50
What would happen is up here, up here above where the operations have to happen, that number would be represented as a pair of 1 and 2 together with a type.0:24:10
The pair would be the contents. And the whole data object would be that thing with the symbol rectangular attached onto it. And that's the way that complex number would exist in0:24:20
the system. When you went to take the real-part, the manager would look at this and say, oh it's one of George's.0:24:30
He'll strip off the type and hand down to George the pair 1, 2. And that's the kind of data that George developed his0:24:41
system to use. So it gets stripped down. Later on, if you ask George to construct a complex number,0:24:51
George would construct some complex number as a pair, and before he passes it back up through the manager would attach the type rectangular.0:25:03
So you see what happens. There's no confusion in this system. It doesn't matter in the least that the pair 1, 2 means0:25:13
something completely different in Martha's world. In Martha's world this pair means the complex number whose magnitude is 1 and whose angle is 2. And there's no confusion, because by the time any pair0:25:23
like this gets handed back through the manager to the main system it's going to have the type polar attached. Whereas this one would have the type rectangular attached.0:25:36
OK, let's take a break. [MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:26:20
We just looked at a strategy for implementing generic operators. That strategy has a name: it's called dispatch on type.0:26:34
And the idea is that you break your system into a bunch of pieces. There's George and Martha, who are making representations,0:26:43
and then there's the manager. Looks at the types on the data and then dispatches them to the right person. Well what criticisms can we make of that as a system0:26:55
organization? Well first of all there was this little, annoying problem that George and Martha had to change the names of their procedures.0:27:04
George originally had a real-part procedure, and he had to go name it real-part-rectangular so it wouldn't interfere with Martha's real-part procedure, which is now named real-part-polar, so it wouldn't interfere with the0:27:14
manager's real-part procedure, which is now named real-part. That's kind of an annoying problem. But I'm not going to talk about that one now. We'll see later on when we think about the structure of0:27:24
Lisp names and environments that there really are ways to package all those so-called name spaces separately so they don't interfere with each other. Not going to think about that problem now.0:27:35
The problem that I actually want to focus on is what happens when you bring somebody new into the system.0:27:44
What has to happen? Well George and Martha don't care. George is sitting there in his rectangular world, has his procedures and his types.0:27:54
Martha sits in her polar world. She doesn't care. But let's look at the manager. What's the manager have to do?0:28:03
The manager comes through and has these operations. There was a test for rectangular and a test for polar. If Harry comes in with some new kind of complex number,0:28:17
and Harry has a new type, Harry type complex number, the manager has to go in and change all those procedures. So the inflexibility in the system, the place where work0:28:28
has to happen to accommodate change, is in the manager. That's pretty annoying. It's even more annoying when you realize the manager's not0:28:40
doing anything. The manager is just being a paper pusher. Let's look again at these programs. What are they doing?0:28:51
What does real-part do? Real-part says, oh, is it the kind of complex number that George can handle? If so, send it off to George. Is it the kind of complex number that Martha can handle?0:29:01
If so, send it off to Martha. So it's really annoying that the bottleneck in this system, the thing that's preventing flexibility and change, is0:29:13
completely in the bureaucracy. It's not in anybody who's doing any of the work. Not an uncommon situation, unfortunately.0:29:23
See, what's really going on-- abstractly in the system, there's a table. So what's really happening is somewhere there's a table.0:29:32
There're types. There's polar and rectangular.0:29:41
And Harry's may be over here. And there are operators. There's an operator like real-part.0:29:55
Or imaginary-part. Or a magnitude and angle.0:30:05
And sitting in this table are the right procedures.0:30:19
So sitting here for the type polar and real-part is Martha's procedure real-part-polar.0:30:30
And over here in the table is George's procedure real-part-rectangular. And over here would be, say, Martha's procedure0:30:40
magnitude-polar, and George's procedure magnitude-rectangular, right, and so on.0:30:49
The rest of this table's filled in. And that's really what's going on. So in some sense, all the manager is doing is acting as0:31:03
this table. Well how do we fix our system?0:31:12
How do you fix bureaucracies a lot of the time? What you do is you get rid of the manager. We just take the manager and replace him by a computer. We're going to automate him out of existence.0:31:23
Namely, instead of having the manager who basically consults this table, we'll have our system use the table directly. What do I mean by that?0:31:32
Let's assume, again using data abstraction, that we have some kind of data structure that's a table. And we have ways of sticking things in and ways of getting0:31:43
things out. And to be explicit, let me assume that there's an operation called "put." And put is going to take, in this0:31:52
case, two things I'll call "keys," key1 and key2, and a value.0:32:06
And that stores the value in the table under key1 and key2. And then we'll assume there's a thing called "get," such0:32:15
that if later on I say, get me what's in the table stored under key1 and key2, it'll retrieve whatever value was0:32:25
stored there. And let's not worry about how tables are implemented. That's yet another data abstraction, George's problem. And maybe we'll see later--0:32:34
talk about how you might actually build tables in Lisp. Well given this organization, what did George and Martha0:32:44
have to do? Well when they build their system, they each have the responsibility to set up their appropriate column in the table.0:32:55
So what George does, for example, when he defines his procedures, all he has to do is go off and put into the0:33:04
table under the type rectangular. And the name of the operation is real-part, his procedure0:33:14
real-part-rectangular. So notice what's going into this table. The two keys here are symbols, rectangular and real-part. That's what the quote means.0:33:24
And what's going into the table is the actual procedure that he wrote, real-part-rectangular. And then he puts imaginary-part into the table, filed0:33:35
under the keys rectangular and imaginary-part, and magnitude under the keys rectangular and magnitude, angle0:33:44
under rectangular and angle. So that's what George has to do to be part of this system.0:33:54
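A table with put and get, plus George's side of filling it in, might be sketched like this (a Python dict keyed on the pair of keys stands in for the lecture's unspecified table abstraction):

```python
# The table: two keys index a stored value.
_table = {}

def put(key1, key2, value):
    _table[(key1, key2)] = value

def get(key1, key2):
    return _table.get((key1, key2))   # None plays the role of "nothing stored"

# George's procedures on untyped (x, y) pairs.
def real_part_rectangular(z): return z[0]
def imag_part_rectangular(z): return z[1]

# George sets up his column: filed under the type and the operation name.
put('rectangular', 'real-part', real_part_rectangular)
put('rectangular', 'imaginary-part', imag_part_rectangular)

print(get('rectangular', 'real-part')((1, 2)))  # 1
```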
Martha similarly sets up the column in the table under polar. Under polar and real-part is the procedure real-part-polar.0:34:04
And imaginary-part, and magnitude, and angle. So this is what Martha has to do to be part of the system. Everyone who makes a representation has the0:34:13
responsibility for setting up a column in the table. And what does Harry do when Harry comes in with his brilliant idea for implementing complex numbers? Well he makes whatever procedure he wants and builds0:34:25
a new column in this table. OK, well what happened to the manager? The manager has been automated out of existence and is0:34:34
replaced by a procedure called operate. And this is the key procedure in the whole system. Let's say define operate.0:34:51
Operate is going to take an operation that you want to do, the name of an operation, and an object that you would like0:35:01
to apply that operation to. So for example, the real-part of some particular complex number, what does it do? Well the first thing it does, it looks in the table.0:35:12
Goes into the table and tries to find a procedure that's stored in the table.0:35:23
So it gets from the table, using as keys the type of the object and the operator. It looks in the table and sees0:35:40
what's stored under the type of the object and the operator, sees if anything's stored. Let's assume that get is implemented so that if nothing is stored there, it'll return the empty list.0:35:52
So it says, if there's actually something stored there, if the procedure here is not nil, then it'll take the0:36:04
procedure that it found in the table and apply it to the contents of the object.0:36:18
And otherwise if there was nothing stored there, it'll-- well we can decide. In this case let's have it put out an error message saying, undefined operator.0:36:28
No operator for this type. Or some appropriate error message.0:36:39
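Operate itself might be sketched like this in Python (a hypothetical rendering of the lecture's Scheme; None plays the role of the empty list that get returns when nothing is stored):

```python
_table = {}
def put(k1, k2, v): _table[(k1, k2)] = v
def get(k1, k2):    return _table.get((k1, k2))

def attach_type(tag, c): return (tag, c)
def type_of(d): return d[0]
def contents(d): return d[1]

put('rectangular', 'real-part', lambda z: z[0])   # George's table entry

def operate(op, obj):
    proc = get(type_of(obj), op)          # look in the table
    if proc is not None:
        return proc(contents(obj))        # strip the type and apply
    raise LookupError('undefined operator %s for type %s'
                      % (op, type_of(obj)))

# The generic selector is then defined in terms of operate.
def real_part(obj):
    return operate('real-part', obj)

print(real_part(attach_type('rectangular', (1, 2))))  # 1
```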
OK? And that replaces the manager. How do we really use it? Well what we say is we'll go off and define our generic0:36:48
selectors using operate. We'll say that the real-part of an object is found by0:36:57
operating on the object with the name of the operation being real-part.0:37:08
And then similarly, imaginary-part is operate using the name imaginary-part and magnitude and angle. That's our implementation.0:37:17
That plus the table plus the operate procedure. And the table effectively replaces what the manager used to do. Let's just go through that slowly to show you0:37:27
what's going on. Suppose I have one of Martha's complex numbers. It's got magnitude 1 and angle 2.0:37:39
And it's one of Martha's. So it's labeled here, polar. Let's call that z.0:37:48
Suppose that's z. And suppose with this implementation someone comes up and asks for the real-part of z.0:38:04
Well real-part now is defined in terms of operate. So that's equivalent to saying operate with the name of the0:38:18
operator being real-part, the symbol real-part on z.0:38:27
And now operate comes. It's going to look in the table, and it's going to try and find something stored under--0:38:38
the operation it's going to apply is found by looking in the table under the type of the object. And the type of z is polar.0:38:48
So it's going to look and say, can I get, using polar and the operation name, which was real-part?0:39:05
It's going to look in there and apply that to the contents of z.0:39:14
And that? If everything was set up correctly, this thing is the procedure that Martha put there. This is real-part-polar.0:39:30
And this is z without its type. The thing that Martha originally designed those procedures to work on, which is 1, 2.0:39:43
And so operate sort of does uniformly what the manager used to do sort of all over the system. It finds the right thing, looks in the table, strips off0:39:52
the type, and passes it down to the person who handles it. This is another, and, as you can see, for most0:40:04
purposes more flexible, way of implementing generic operators. And it's called data-directed programming.0:40:20
And the idea of that is in some sense the data objects themselves, those little complex numbers that are floating around the system, are carrying with them the0:40:30
information about how you should operate on them. Let's break for questions.0:40:41
Yes. AUDIENCE: What do you have stored in that data object? You have the data itself, you have its type, and you have the operations for that type? Or where are the operations that you found?0:40:53
PROFESSOR: OK, let me-- yeah, that's a good question. Because it raises other possibilities of how you might do it. And of course there are a lot of possibilities.0:41:04
In this particular implementation, what's sitting in this data object, for example, is the data itself-- which in this case is a pair of 1 and 2--0:41:14
and also a symbol. This is the symbol, the word P-O-L-A-R, and that's what's sitting in this data object.0:41:24
Where are the operations themselves? The operations are sitting in the table. So in this table, the rows and columns of the table are0:41:35
labeled by symbols. So when I store something in this table, the key might be the symbol polar and the symbol magnitude.0:41:48
And I think by writing it this way I've been very confusing. Because what's really sitting here isn't the name-- when I wrote magnitude-polar, what I mean is the procedure0:41:58
magnitude-polar. And probably what I really should have written-- except it's too small for me to write in this little space-- is something like lambda of z, the thing that0:42:11
Martha wrote to implement it. And then you can see from that, there's another way that I alluded to of solving this name conflict problem, which0:42:20
is that George and Martha never have to name their procedures at all. They can just stick the anonymous things generated by lambda directly into the table. There's also another thing that your question raises, is0:42:32
the possibility that maybe what I would like somehow is to store in this data object not the symbol P-O-L-A-R but maybe actually all the operations themselves.0:42:43
And that's another way to organize the system, called message passing. So there are a lot of ways you can do it.0:42:54
AUDIENCE: Therefore if Martha and George had used the same procedure names, it would be OK because it wouldn't look [UNINTELLIGIBLE]. PROFESSOR: That's right.0:43:03
That's right. See, they wouldn't even have to name their procedures at all. What George could have written instead of saying put in the0:43:12
table under rectangular and real-part, the procedure real-part-rectangular, George could have written put under rectangular and real-part, lambda of z, such and such,0:43:23
and such and such. And the system would work completely the same. AUDIENCE: My question is, Martha could have put key1 key2 real-part, and George could have put key1 key20:43:37
real-part, and as long as they defined them differently they wouldn't have had any conflicts, right? PROFESSOR: Yes, that would all be OK except for the fact that if you imagine George and Martha typing at the same0:43:47
console with the same meanings for all their names, it would get confused by real-part. But there are ways to arrange that, too. And in principle you're absolutely right. If their names didn't conflict--0:43:56
it's the objects that go in the table, not the names.0:44:08
OK, let's take a break. [MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:45:12
All right, well we just looked at data-directed programming as a way of implementing a system that does arithmetic on0:45:21
complex numbers. So I had these operations in it called plus C and minus C,0:45:32
and multiply, and divide, and maybe some others. And that sat on top of-- and this is the key point-- sat on0:45:46
top of two different representations. A rectangular package here, and a polar package.0:45:58
And maybe some more. And we saw that the whole idea is that maybe some more are now very easy to add. But that doesn't really show the power of this methodology.0:46:08
Shows you what's going on. The power of the methodology only becomes apparent when you start embedding this in some more complex system.0:46:17
What I'm going to do now is embed this in some more complex system. Let's assume that what we really have is a general kind of arithmetic system. So called generic arithmetic system.0:46:27
And at the top level here, somebody can say add two things, or subtract two things, or multiply two0:46:38
things, or divide two things. And underneath that there's an abstraction barrier.0:46:47
And underneath this barrier, is, say, a complex arithmetic package. And you can say, add two complex numbers. Or you might also have-- remember we did a rational0:46:57
number package-- you might have that sitting there. And there might be a rational thing. And the rational number package, well, has the things0:47:07
we implemented. Plus rat, and times rat, and so on. Or you might have ordinary Lisp numbers.0:47:17
You might say add three and four. So we might have ordinary numbers, in which case we have0:47:29
the Lisp supplied plus, and minus, and times, and slash. OK, so we might imagine this complex number system sitting0:47:39
in a more complicated generic operator structure at the next level up. Well how can we make that?0:47:49
We already have the idea, we're just going to do it again. We've implemented a rational number package. Let's look at how it has to be changed.0:48:01
In fact, at this level it doesn't have to be changed at all. This is exactly the code that we wrote last time. To add two rational numbers, remember0:48:10
there was this formula. You make a rational number whose numerator-- the numerator of the first times the denominator of the second, plus the denominator of the first times the0:48:20
numerator of the second. And whose denominator is the product of the denominators. And minus rat, and star rat, and slash rat.0:48:30
And this is exactly the rational number package that we made before. We're ignoring the GCD problem, but let's not worry about that.0:48:40
As implementers of this rational number package, how do we install it in the generic arithmetic system? Well that's easy. There's only one thing we have to do differently.0:48:51
Whereas previously we said that to make a rational number you built a pair of the numerator and denominator,0:49:00
here we'll not only build the pair, but we'll sign it. We'll attach the type rational. That's the only thing we have to do differently: make it a typed data object.0:49:12
And now we'll stick our operations in the table. We'll put under the symbol rational and the operation add our procedure, plus rat.0:49:21
And, again, note this is a symbol. Right? Quote, unquote, but the actual thing we're putting in the table is the procedure.0:49:30
And for how to subtract, well you subtract rationals with minus rat. And multiply, and divide.0:49:41
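Installing the rational package might be sketched like this in Python (hypothetical names mirroring the lecture; GCD reduction is ignored, as in the lecture):

```python
_table = {}
def put(key1, key2, value): _table[(key1, key2)] = value
def get(key1, key2):        return _table.get((key1, key2))

def attach_type(tag, contents): return (tag, contents)

# The one change: make-rat signs the pair with the type rational.
def make_rat(n, d):
    return attach_type('rational', (n, d))

def numer(r): return r[0]   # selectors see the untyped contents
def denom(r): return r[1]

# Exactly the formula from before (GCD reduction ignored).
def plus_rat(x, y):
    return make_rat(numer(x) * denom(y) + denom(x) * numer(y),
                    denom(x) * denom(y))

# Install the 'rational' column of the table.
put('rational', 'add', plus_rat)

print(plus_rat((1, 2), (1, 3)))  # 1/2 + 1/3 = ('rational', (5, 6))
```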
And that is exactly and precisely what we have to do to fit inside this generic arithmetic system. Well how does the whole thing work?0:49:51
See, what we want to do is have some generic operators.0:50:00
Have add and sub and [UNINTELLIGIBLE] be generic operators. So we're going to define add and say, to add x and y, that0:50:18
will be operate-- we're going to call it operate-2.0:50:27
This is our operator procedure, but set up for two arguments using add on x and y.0:50:37
And so this is the analog to operate. Let's look at the code for that. It's almost like operate.0:50:46
To operate with some operator on an argument 1 and an argument 2, well the first thing we're going to do is0:50:56
check and see if the two arguments have the same type. So we'll say, is the type of the first argument the same as0:51:06
the type of the second argument? And if they're not, we'll go off and complain, and say,0:51:15
that's an error. We don't know how to do that. If they do have the same type, we'll do exactly what we did before. We'll go look in the table, filed under the type of the argument--0:51:26
arg 1 and arg 2 have the same type, so it doesn't matter which. So we'll look in the table, find the procedure. If there is a procedure there, then we'll apply it to the0:51:38
contents of the argument 1 and the contents of arg 2. And otherwise we'll say, error. Undefined operator. And so there's operate-2.0:51:51
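Operate-2 might look like this in Python (a hypothetical rendering; a rational entry is installed just so the generic add has something to dispatch to):

```python
_table = {}
def put(k1, k2, v): _table[(k1, k2)] = v
def get(k1, k2):    return _table.get((k1, k2))

def attach_type(tag, c): return (tag, c)
def type_of(d): return d[0]
def contents(d): return d[1]

def operate_2(op, arg1, arg2):
    if type_of(arg1) != type_of(arg2):
        raise TypeError('arguments are not of the same type')
    proc = get(type_of(arg1), op)     # filed under the (shared) type
    if proc is None:
        raise LookupError('undefined operator')
    return proc(contents(arg1), contents(arg2))

def add(x, y):
    return operate_2('add', x, y)

# Minimal rational entry so add has something to dispatch to.
def plus_rat(x, y):
    return attach_type('rational',
                       (x[0] * y[1] + x[1] * y[0], x[1] * y[1]))
put('rational', 'add', plus_rat)

half  = attach_type('rational', (1, 2))
third = attach_type('rational', (1, 3))
print(add(half, third))  # ('rational', (5, 6))
```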
And that's all we have to do. We just built the complex number package before. How do we embed that complex number package in0:52:00
this generic system? Almost the same. We make a procedure called make-complex that takes0:52:11
whatever George and Martha hand to us and add the type-complex. And then we say, to add complex numbers, plus complex,0:52:25
we use our internal procedure, plus c, and attach a type, make that a complex number.0:52:37
So our original package had names plus c and minus c that we're using to communicate with George and Martha. And then to communicate with the outside world, we have a0:52:47
thing called plus-complex and minus-complex. And so on.0:52:56
And the only difference is that these return values that are typed. So they can be looked at up here. And these are internal operations.0:53:09
Let's go look at that slide again. There's one more thing we do. After defining plus-complex, we put under the type complex0:53:19
and the symbol add, that procedure plus complex. And then similarly for subtracting complex numbers, and multiplying them, and dividing them.0:53:31
OK, how do we install ordinary numbers? Exactly the same way. Come off and say, well we'll make a thing called0:53:40
make-number. Make-number takes a number and attaches a type, which is the symbol number.0:53:50
We build a procedure called plus-number, which is simply, add the two things using the ordinary addition, because in0:53:59
this case we're talking about ordinary numbers, and attach a type to it and make that a number. And then we put into the table under the symbol number and0:54:08
the operation add, this procedure plus-number, and then the same thing for subtracting, and multiplying, and dividing.0:54:22
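Ordinary numbers install the same way, sketched here in Python under the same hypothetical table and operate-2 as before:

```python
_table = {}
def put(k1, k2, v): _table[(k1, k2)] = v
def get(k1, k2):    return _table.get((k1, k2))
def attach_type(tag, c): return (tag, c)
def type_of(d): return d[0]
def contents(d): return d[1]

def make_number(n):
    return attach_type('number', n)    # attach the type 'number'

def plus_number(x, y):
    return make_number(x + y)          # ordinary addition, then re-sign

put('number', 'add', plus_number)

def operate_2(op, a1, a2):
    if type_of(a1) != type_of(a2):
        raise TypeError('arguments are not of the same type')
    proc = get(type_of(a1), op)
    if proc is None:
        raise LookupError('undefined operator')
    return proc(contents(a1), contents(a2))

def add(x, y):
    return operate_2('add', x, y)

print(add(make_number(3), make_number(4)))  # ('number', 7)
```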
Let's look at an example, just to make it clear. Suppose, for instance, I'm going0:54:32
to perform an operation. So I sit up here and I'm going to perform the operation which looks like multiplying two complex numbers. So I would multiply, say, 3 plus 4i and 2 plus 6i.0:54:49
And that's something that I might want to hand to mul. I'll write mul as my generic operator here. How's that going to work?0:54:58
Well 3 plus 4i, say, sits in the system at this level as something that looks like this. Let's say it was one of George's.0:55:08
So it would have a 3 and a 4.0:55:18
And attached to that would be George's type, which would say rectangular, it came from George.0:55:29
And attached to that-- and this itself would be the data viewed from the next level up-- so that itself would be a typed data object which would0:55:41
say complex. So that's what this object would look like up here at the very highest level, where the really super-generic0:55:52
operations are looking at it. Now what happens is, mul eventually is going to come along and say, oh, what's its type? Its type is complex.0:56:04
Go through to operate-2 and say, oh, what I want to do is apply what's in the table, which is going to be the procedure star complex, on this thing with the type0:56:17
stripped off. So it's going to strip off the type, take that much, and send that down into the complex world.0:56:26
The complex world looks at its operations and says, oh, I have to apply star c. Star c might say, oh, at some point I want to look at the magnitude of this object that it's got.0:56:39
And they'll say, oh, it's rectangular, it's one of George's. So it'll then strip off the next version of type, and hand that down to George to take the magnitude of.0:56:52
So you see what's going on is that there are these chains of types. And the length of the chain is sort of the number of levels0:57:01
that you're going to be going up in this table. And what a type tells you, every time you have a vertical barrier in this table, where there's some ambiguity about0:57:12
where you should go down to the next level, the type is telling you where to go. And then everybody at the bottom, as they construct data and filter it up, they stick their type back on.0:57:25
So that's the general structure of the system. OK.0:57:34
Now that we've got this, let's go and make this thing even more complex. Let's talk about adding to the system not only these kinds of0:57:46
numbers, but it's also meaningful to start talking about adding polynomials. Might do arithmetic on polynomials. Like we could have x to the fifteenth plus 2x to the0:57:57
seventh plus 5. That might be some polynomial.0:58:06
And if we have two such gadgets we can add them or multiply them. Let's not worry about dividing them. Just add them, multiply them, then we'll subtract them.0:58:15
What do we have to do? Well let's think about how we might represent a polynomial. It's going to be some typed data object.0:58:24
So let's say a polynomial to this system might look like a thing that starts with the type polynomial. And then maybe it says the next thing is what0:58:33
variable it's in. So it might say, I'm a polynomial in the variable x. And then it'll have some information about what the terms are.0:58:42
And there're just tons of ways to do this, but one way is to say we're going to have a thing called a term-list. And0:58:51
a term-list-- well, in our case we'll use something that looks like this. We'll make it a bunch of pairs which have an order and a coefficient. So this polynomial would be represented by this term-list.0:59:09
And what that means is that this polynomial starts off with a term of order 15 and coefficient 1.0:59:23
And the next thing in it is a term of order 7 and coefficient 2, and a term of order 0, which is the constant, with coefficient 5. And there are lots and lots of ways, and lots and lots of0:59:35
trade-offs when you really think about making algebraic manipulation packages about exactly how you should represent these things. But this is a fairly standard one.0:59:44
It's useful in a lot of contexts. OK, well how do we implement our polynomial arithmetic?0:59:54
Let's start out. First we'll have a way to make polynomials.1:00:05
We're going to make a polynomial out of a variable like x and a term-list. And all that does is package them together some way.1:00:14
We'll put the variable together with the term-list using cons, and then attach to that the type polynomial.1:00:26
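Make-polynomial might be sketched like this in Python (hypothetical rendering; a tuple stands in for the cons of variable and term-list):

```python
def attach_type(tag, contents):
    return (tag, contents)

def make_polynomial(var, terms):
    # cons the variable onto the term-list, then sign it 'polynomial'
    return attach_type('polynomial', (var, terms))

def variable(p):  return p[0]   # selectors on the untyped contents
def term_list(p): return p[1]

# x^15 + 2x^7 + 5 as a term-list of (order, coefficient) pairs
p = make_polynomial('x', [(15, 1), (7, 2), (0, 5)])
print(p)  # ('polynomial', ('x', [(15, 1), (7, 2), (0, 5)]))
```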
OK, how do we add two polynomials? To add two polynomials, p1 and p2, just for simplicity let's say we will only add1:00:36
things in the same variable. So if they have the same variable, and same-variable here is going to be some selector we write, whose details we don't care about.1:00:45
If the two polynomials have the same variable, then we'll do something. If they don't have the same variable, we'll give an error, polynomials not in the same variable.1:00:55
And if they do have the same variable, what we'll do is we'll make a polynomial whose variable is whatever that variable is, and whose term-list is something we'll1:01:05
call plus-terms. Plus-terms will add the two term-lists. So we'll add the two term-lists. That'll give us a term-list, and we'll say the answer is a1:01:16
polynomial in the variable with that term-list. That's plus-poly. And then we're going to put in our table, under the type1:01:26
polynomial and the operation add, the procedure plus-poly. And of course we really haven't done much. What we've really done is pushed all the work onto this thing, plus-terms, which is supposed to add term-lists.1:01:38
Let's look at that. Here's an overview of how we might add two term-lists.1:01:48
So L1 and L2 are going to be two term-lists. And a term-list is a bunch of pairs of order and coefficient. And it's a big case analysis.1:01:59
And the first thing we'll check is whether there are any terms left. We're going to recursively work down these term-lists, so eventually we'll get to a place where1:02:09
either L1 or L2 might be empty. And if either one is empty, our answer will be the other one. So if L1 is empty we'll return L2, and if L2 is empty1:02:20
we'll return L1. Otherwise there are sort of three interesting cases. What we're going to do is grab the first term in each of1:02:30
those lists, called t1 and t2. And we're going to look at three cases, depending on1:02:43
whether the order of t1 is greater than the order of t2, or less than the order of t2, or the same.1:02:53
Those are the three cases we're going to look at. Let's look at this case. If the order of t1 is greater than the order of t2, then1:03:03
what that means is that our answer is going to start with this term of the order of t1. Because it won't combine with any lower order terms. So what1:03:14
we do is add the lower order terms. We recursively add together all the terms in the rest of the term-list in L1 and L2.1:03:26
That's going to be the lower order terms of the answer. And then we're going to adjoin to that the highest order term. And I'm using here a whole bunch of procedures I haven't1:03:35
defined, like adjoin-term, and rest-terms, and selectors that get the order. But you can imagine what those are.1:03:44
So if the first term-list has a higher order than the second, we recursively add all the lower terms and then stick on that last term.1:03:55
The other case, the same way. If the first term has a smaller order, well then we1:04:05
add the first term-list and the rest of the terms in the second one, and adjoin on this highest order term.1:04:14
So so far nothing much has happened, we've just sort of pushed this thing off into adding lower order terms. The last case, where you actually get to coefficients that you have to add, will be the case where1:04:24
the orders are equal. What we do is, well, again recursively add the lower order terms. But now we have to really combine something.1:04:33
What we do is we make a term whose order is the order of the term we're looking at. By now t1 and t2 have the same order.1:04:44
That's its order. And its coefficient is gotten by adding the coefficient of t1 and the coefficient of t2.1:04:56
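The whole case analysis might be sketched like this in Python, with term-lists as lists of (order, coefficient) pairs, highest order first; here plain + stands in for the generic add, which is the one interesting hook:

```python
def first_term(L): return L[0]
def rest_terms(L): return L[1:]
def adjoin_term(t, L): return [t] + L
def order(t): return t[0]
def coeff(t): return t[1]

def add(a, b):
    # Stand-in for the generic add; plain + limits us to ordinary
    # numbers here, but the real generic add takes any coefficients.
    return a + b

def plus_terms(L1, L2):
    if not L1: return L2                  # either list empty:
    if not L2: return L1                  # the answer is the other one
    t1, t2 = first_term(L1), first_term(L2)
    if order(t1) > order(t2):             # t1 can't combine: keep it
        return adjoin_term(t1, plus_terms(rest_terms(L1), L2))
    elif order(t1) < order(t2):           # symmetric case
        return adjoin_term(t2, plus_terms(L1, rest_terms(L2)))
    else:                                 # equal orders: add coefficients
        return adjoin_term((order(t1), add(coeff(t1), coeff(t2))),
                           plus_terms(rest_terms(L1), rest_terms(L2)))

# (x^15 + 2x^7 + 5) + (3x^7 + 4x)
print(plus_terms([(15, 1), (7, 2), (0, 5)], [(7, 3), (1, 4)]))
# [(15, 1), (7, 5), (1, 4), (0, 5)]
```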
This is a big recursive working down of terms, but really there's only one interesting symbol in this procedure, only one interesting idea.1:05:05
The interesting idea is this add. And the reason that's interesting is because1:05:15
something completely wonderful just happened. We reduced adding polynomials, not to sort of plus, but to1:05:25
the generic add. In other words, by implementing it that way, not only do we have our system where we can have rational1:05:37
numbers, or complex numbers, or ordinary numbers, we've just added on polynomials.1:05:48
But the coefficients of the polynomials can be anything that the system can add. So these could be polynomials whose coefficients are1:05:57
rational numbers or complex numbers, which in turn could be either rectangular, or polar, or ordinary numbers.1:06:19
So what I mean precisely is our system right now automatically can handle things like adding together1:06:30
polynomials like this one: 2/3 x squared plus 5/17 x plus 11/4.1:06:40
Or automatically handle polynomials that look like 3 plus 2i times x to the fifth plus 4 plus 7i, or something.1:06:54
You can automatically handle those things. Why is that? That's merely because, or profoundly because we reduced1:07:03
adding polynomials to adding their coefficients. And adding coefficients was done by the generic add operator, which said, I don't care what your types are as1:07:12
long as I know how to add you. So automatically for free we get the ability to handle that. What's even better than that, because remember one of the1:07:24
things we did is we put into the table that the way you add polynomials is using plus-poly.1:07:34
That means that polynomials themselves are things that can be added. So for instance let me write one here.1:07:45
Here's a polynomial. So this gadget here I'm writing up, this is a1:07:55
polynomial in y whose coefficients are polynomials in x.1:08:08
So you see, simply by saying, polynomials are themselves things that can be added, we can go off and say, well not only can we deal with rationals, or complex, or1:08:19
ordinary numbers, but we can deal with polynomials whose coefficients are rationals, or complex, or ordinary numbers, or polynomials whose coefficients are rationals, or1:08:31
complex, rectangular, polar, or ordinary numbers, or polynomials whose coefficients are rationals, complex, or1:08:42
ordinary numbers. And so on, and so on, and so on. So this is sort of an infinite or maybe a recursive tower of types that we've built up.1:08:53
And it's all exactly from that one little symbol. A-D-D. Writing "add" instead of "plus" in the polynomial thing.1:09:02
Slightly different way to think about it is that polynomials are a constructor for types. Namely you give it a type, like integer, and it returns1:09:12
for you polynomials in x whose coefficients are integers. And the important thing about that is that the operations on polynomials reduce to the operations on the1:09:22
coefficients. And there are a lot of things like that. So for example, let's go back to rational numbers. We thought about rational numbers as an integer over an1:09:32
integer, but there's the general notion of a rational object. Like we might think about 3x plus 7 over x squared plus 1.1:09:43
That's a general rational object whose numerator and denominator are polynomials. And to add two of them we use the same formula, numerator1:09:52
times denominator plus denominator times numerator over product of denominators. How could we install that in our system? Well here's our original rational1:10:01
number arithmetic package. And all we have to do in order to make the entire system continue working with general rational objects, is replace1:10:12
these particular pluses and stars by the generic operator. So if we simply change that procedure to this one, here we've changed plus and star to add and mul, those are1:10:23
absolutely the only change, then suddenly our entire system can start talking about objects that look like this.1:10:34
So for example, here is a rational object whose numerator is a polynomial in x whose coefficients are1:10:44
rational numbers. Or here is a rational object whose numerator is a polynomial1:10:53
in x whose coefficients are rational objects constructed out of complex numbers.1:11:03
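The lecture's package is in Scheme; here is a hedged Python sketch of the one-line idea: inside the rational package, replace plain plus and star with the generic add and mul, and rationals whose parts are themselves tagged objects come along for free. The tagging scheme and names are my own, not the book's.

```python
# A sketch of the generic rational package. Replacing the internal
# plus/star with generic add/mul is the only change; then numerators
# and denominators need not be ordinary numbers.

def make_rat(n, d):
    return ('rational', n, d)

def add(x, y):
    if isinstance(x, tuple) and x[0] == 'rational':
        return add_rat(x, y)
    return x + y              # ordinary numbers

def mul(x, y):
    if isinstance(x, tuple) and x[0] == 'rational':
        return mul_rat(x, y)
    return x * y

def add_rat(x, y):
    # n1*d2 + n2*d1 over d1*d2, but via the generic add and mul
    # (no GCD reduction, just as in the lecture)
    _, nx, dx = x
    _, ny, dy = y
    return make_rat(add(mul(nx, dy), mul(ny, dx)), mul(dx, dy))

def mul_rat(x, y):
    _, nx, dx = x
    _, ny, dy = y
    return make_rat(mul(nx, ny), mul(dx, dy))
```

With this, a rational object whose numerator and denominator are themselves rational objects adds with exactly the same code, unchanged.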
And then there are a lot of other things like that. See, whenever you have a thing where the operations reduce to operations on the pieces, another example would be two by two matrices.1:11:12
I have the idea, there might be a matrix here of general things that I don't care about. But if I add two of them, the answer over here is gotten by1:11:25
adding this one and that one, however they like to add. So I can implement that the same way. And if I do that, then again suddenly my system can start handling things like this.1:11:35
So here's a matrix whose elements happen to be-- we'll say this element here is a rational object whose numerator and denominators are polynomials.1:11:47
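As a small sketch of the matrix case (the representation is my assumption, not the lecture's): a two by two matrix as a nested list, added element-wise through a generic add, so the elements may themselves be matrices or anything else the system can add.

```python
# A sketch: 2x2 matrices as nested lists, added element-wise with a
# generic add, letting each element add "however it likes to add".

def add(x, y):
    if isinstance(x, list):   # a matrix: dispatch to matrix addition
        return add_matrix(x, y)
    return x + y              # ordinary numbers

def add_matrix(m1, m2):
    # add corresponding elements, however they like to be added
    return [[add(a, b) for a, b in zip(row1, row2)]
            for row1, row2 in zip(m1, m2)]
```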
And all that comes for free. What's really going on here? What's really going on is getting rid of this manager1:11:58
who's sitting there poking his nose into everybody's business. We built a system that has decentralized control.1:12:14
So when you come in, no one's poking around saying, gee, are you in the official list of people who can be added? Rather you say, well, go off and add yourself however your1:12:24
parts like to be added. And the result of that is you can get this very, very, very complex hierarchy where a lot of things just get done and1:12:33
routed to the right place automatically. Let's stop for questions. AUDIENCE: You say you get this for free.1:12:43
One thing that strikes me is that now you've lost kind of the cleanness of the break between what's on top and what's underneath. In other words, now you're defining some of the1:12:52
lower-level procedures in terms of things above their own line. Isn't that dangerous? Or, if nothing else, a little less structured?1:13:05
PROFESSOR: No, I-- the question is whether that's less structured. Depends on what you mean by structure. All this is doing is recursion. See, it's saying that the way you add these1:13:15
guys is to use that. And that's not less structured, it's just a recursive structure. So I don't think it's particularly any less clean.1:13:24
AUDIENCE: Now when you want to change the multiplier or the add operator, suddenly you've got tremendous consequences underneath that you're not even sure the extent of.1:13:34
PROFESSOR: That's right, but it depends what you mean. See, this goes both ways. What would be a good example?1:13:44
I ignored greatest common divisor, for instance. I ignored that problem just to keep the example simple. But if I suddenly decided that plus rat here should do a GCD1:13:59
computation and install that, then that immediately becomes available to all of these, to that guy, and that guy, and1:14:08
that guy, and all the way down. So it depends what you mean by the coherence of your system. It's certainly true that you might want to have a special1:14:17
different one that didn't filter down through the coefficients, but the nice thing about this particular example is that mostly you do. AUDIENCE: Isn't that the problem, I think, that you're1:14:27
getting too tied in with the fact that the structuring, the recursiveness of that structuring there is actually1:14:36
in execution as opposed to just definition of the actual types themselves? PROFESSOR: I think I understand the question.1:14:46
The point is that these types evolve and get more and more complex as the thing's actually running. Is that what-- AUDIENCE: Yes. As it's running. PROFESSOR: --what you're saying? Yes, the point is-- AUDIENCE: As opposed to the basic definitions. PROFESSOR: Right. The type structure is sort of recursive.1:14:57
It's not that you can make this finite list of the actual things they might look like before the system runs. It's something that evolves.1:15:06
So if you want to specify that system, you have to do in some other way than by this finite list. You have to do it by a recursive structure. AUDIENCE: Because the basic structure of the types is1:15:16
pretty clean and simple. PROFESSOR: Right. Yes? AUDIENCE: I have a question. I understand once you have your data structure set up,1:15:25
how it pulls off complex and passes that down, and then pulls off rect, passes that down. But if you're just a user and you don't know anything about rect or polar or whatever, how do you initially set up that1:15:35
data structure so that everything goes to the right spot? If I just have the equation over there on the left and I just want to add, multiply complex numbers-- PROFESSOR: Well that's the wonderful thing. If you're just a user you say "mul."1:15:47
AUDIENCE: And it figures out that I mean complex numbers? Or how do I tell it that I want-- PROFESSOR: Well you're going to have in your hands complex numbers. See what you would have at some level, as a real user, is1:15:56
a constructor for complex numbers. AUDIENCE: So then I have to make complex numbers? PROFESSOR: So you have to make them. What you would probably have as a user is some little thing in the reader loop, which would give you some plausible1:16:07
way to type in a complex number, in whatever format you like. Or it might be that you're never typing them in. Someone's just handing you a complex number.1:16:16
AUDIENCE: OK, so if I had a complex number that had a polynomial in it, I'd have to make my polynomial and then make my complex number. PROFESSOR: Right if you wanted it constructed from scratch. At some point you construct them from scratch.1:16:25
But what you don't have to know of that is when you have the object you can just say "mul." And it'll multiply. Yeah? AUDIENCE: I think the question that was being posed here is,1:16:36
say if I want to change my representation of complexes, or some operation on complexes, how much real code will I have to1:16:46
deal with, or change, to change one specific operation? PROFESSOR: [UNINTELLIGIBLE] what you have to change. And the point is that you only have to change what you're changing.1:16:56
See if Martha decides that she would rather-- let's see something silly-- like change the order in the pair. Like angle and magnitude in the other order, she just1:17:09
makes that change locally. And the whole thing will propagate through the system in the right way. Or if suddenly you said, gee, I have another representation1:17:18
for rationals. And I'm going to stick it here, by filing those operations in the table. Then suddenly all of these polynomials whose coefficients1:17:27
are coefficients of coefficients, or whatever, also can automatically have available that representation. That's the power of this particular one. AUDIENCE: I'm not sure if I can even pose an intelligent1:17:37
sounding question. But somehow this whole thing went really nicely to this beautiful finish where all the things seemed to fall into place.1:17:47
Sort of seemed a little contrived. That's all for the sake, I'm sure, of teaching. I doubt that the guys who first did this-- and I could be wrong--1:17:56
figured it all out so that when they just all put it all together, you could all of the sudden, blam, do any kind of arithmetic on any kind of object. It seems like maybe they had to play with it for a while1:18:07
and had to bash it and rework it. And it seems like that's the kind of problem we're really faced with when we start trying to design a really complex1:18:16
system, is having lots of different kinds of parts and not even knowing what kinds of operations we're going to want to do on those parts. How to organize the operations in this nice way so that no1:18:27
matter what you do, when you start putting them together everything starts falling out for free. PROFESSOR: OK, well that's certainly a very intelligent question.1:18:37
One part is that this is a very good methodology that people have discovered, a lot of it coming from symbolic algebra. Because there are a lot of complications.1:18:47
It allows you to implement these things before you decide what you want all the operations to be, and all of that. So in some sense it's an answer that people have discovered by wading through this stuff.1:18:58
In another sense, it is a very contrived example. AUDIENCE: It seems like to be able to do this you do have to wade through it for a certain amount of time before you can1:19:08
become good at it. PROFESSOR: Let me show you how terribly contrived this is. So you can write all these wonderful things. But the system that I wrote here, and if we had another1:19:17
half an hour to give this lecture I would have given this part of it, which says, notice that it breaks down if I tell it to do something as foolish as add 3 plus 7/2.1:19:30
Because what will happen is you'll get to operate-2, and operate-2 will say, oh this is type number, and that's type rational. I don't know how to add them.1:19:41
So you'd like the system at least to be able to say something like, gee, before you do that change that to 3/1.1:19:50
Turn it into a rational number, hand that to the rational package. That's the thing I didn't talk about in this lecture. It's a little bit in the book, which talks about the problem1:20:00
of what's called coercion. Where you wanted-- see, having so carefully set up all of these types as distinct objects, a lot of times you want to also put in1:20:11
knowledge about how to view an ordinary number as a kind of rational. Or view an ordinary number as a kind of complex.1:20:21
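The lecture doesn't show the coercion code, so this is only a hedged sketch under my own naming: a table keyed by (from-type, to-type) that knows, for instance, how to view the ordinary number 3 as the rational 3/1 before handing the pair to the rational package.

```python
# A sketch of coercion (names and representation are mine, not the
# book's): on a type mismatch, consult a coercion table before
# dispatching to the package for the common type.

coercions = {
    ('number', 'rational'): lambda n: ('rational', n, 1),   # 3 -> 3/1
}

def type_tag(x):
    return x[0] if isinstance(x, tuple) else 'number'

def add_rat(x, y):
    _, nx, dx = x
    _, ny, dy = y
    return ('rational', nx * dy + ny * dx, dx * dy)

def generic_add(x, y):
    tx, ty = type_tag(x), type_tag(y)
    if tx != ty:                       # mismatched types: try to coerce
        if (tx, ty) in coercions:
            x, tx = coercions[(tx, ty)](x), ty
        elif (ty, tx) in coercions:
            y, ty = coercions[(ty, tx)](y), tx
    if tx == 'rational':
        return add_rat(x, y)
    return x + y
```

Where this knowledge lives — in the rational package, in the complex package, or in the generic operator itself, as sketched here — is exactly the design question the lecture raises.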
That's where the complexity in the system really starts happening, where you talk about, see, where do I put that knowledge? Should rational know that ordinary numbers might be1:20:30
pieces of [UNINTELLIGIBLE] of them? Or there are terrible, terrible examples, like I might want to add a complex number to a rational number.1:20:50
Bad example. 5/7. Then somebody's got to know that I have to convert these to another type, which is complex numbers whose parts1:20:59
might be rationals. And who worries about that? Does complex worry about that? Does rational worry about that? Does plus worry about that? That's where the real complexity comes in.1:21:08
And that's where it's not very well sorted out. And a lot of, in fact, all of this message-passing stuff was motivated by problems like this.1:21:18
And when you really push it, people are-- somehow the algebraic manipulation problem seems to be so complex that the people who are always at the edge of it are exactly in1:21:27
the state you said. They're wading through this thing, mucking around, seeing what they use, trying to distill stuff. AUDIENCE: I just want to come back to this issue of1:21:36
complexity once more. It certainly seems to be true that you have a great deal of flexibility in altering the lower level kinds of things.1:21:49
But it is true that you are, in a sense, freezing higher level operations. Or at least if you change them you don't know where all of1:21:58
the changes are going to show up, or how they are. PROFESSOR: OK, that's an extremely good question. What I have to do is, if I decide there's a new general1:22:10
operation called equality test, then all of these people have to decide whether or not they would like to have an1:22:19
equality test by looking in the table. There're ways to decentralize it even more. That's what I sort of hinted at last time, where I said you1:22:31
could not only have this type as a symbol, but you actually might store in each object the operations that it knows about.1:22:40
So you might have things like greatest common divisor, which is a thing here which is defined only for integers, and not in general for rational numbers.1:22:51
So it might be a very, very fragmented system. And then depending on where you want your flexibility, there's a whole spectrum of places that you can build that in. But you're pointing at the place where this starts being1:23:02
weak, that there has to be some agreement on top here about these general operations. Or at least people have to think about them. Or you might decide, you might have a table that's very sparse, that only has a few things in it.1:23:14
But there are lot of ways to play that game. OK, thank you.1:23:23
[MUSIC: "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:00
Lecture 5A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:16
PROFESSOR: Well, so far we've invented enough programming to do some very complicated things. And you surely learned a lot about0:00:28
programming at this point. You've learned almost all the most important tricks that usually don't get taught to people until they have had a lot of experience. For example, data directed programming is a major trick,0:00:40
and yesterday you also saw an interpreted language. We did this all in a computer language, at this point, where0:00:50
there was no assignment statement. And presumably, for those of you who've seen your Basic or Pascal or whatever, that's usually considered the most0:01:00
important thing. Well today, we're going to do some thing horrible. We're going to add an assignment statement. And since we can do all these wonderful things without it,0:01:09
why should we add it? An important thing to understand is that today we're going to, first of all, have a rule, which is going to always be obeyed, which is the only reason we ever add a feature0:01:19
to our language is because there is a good reason. And the good reason is going to boil down to this: you now get the ability to break a problem into pieces0:01:30
that are a different set of pieces than you could have broken it into without it. It gives you another means of decomposition. However, let's just start.0:01:39
Let me quickly begin by reviewing the kind of language that we have now.0:01:48
We've been writing what's called functional programs. And functional programs are a kind of encoding of mathematical truths.0:01:58
For example, when we look at the factorial procedure that you see on the slide here, it's basically two clauses.0:02:07
If n is one, the result is one, otherwise n times factorial n minus one. That's factorial of n. Well, that is factorial of n. And written down in some other obscure notation that you0:02:17
might have learned in calculus classes, mathematical logic, what you see there is: if n equals one, the result,0:02:28
n factorial, is one; otherwise, for n greater than one, n factorial is n times n-minus-one factorial. True statements, that's the kind of language we've been using.0:02:37
And whenever we have true statements of that sort, there is a kind of, a way of understanding how they work0:02:47
which is that such processes can be evolved by substitution. And so we see on the second slide here, that the way we0:02:56
understand the execution implied by those statements, arranged in that order, is that you do successive0:03:05
substitutions of arguments for formal parameters in the body of a procedure. This is basically a sequence of equalities.0:03:14
Factorial four is four times factorial three. That is four times three times factorial of two and so on. We're always preserving truth.0:03:26
Even though we're talking about true statements, there might be more than one organization of these true statements to describe the computation of a particular function, the computation of the value of0:03:37
a particular function. So, for example, looking at the next one here. Here is a way of looking at the sum of n and m.0:03:49
And we did this one by a recursive process. It's the increment of the sum of the decrement of n and m.0:04:00
And, of course, there is some piece of mathematical logic here that describes that. It's the increment of the sum of the decrement of n and m,0:04:11
just like that. So there's nothing particularly magic about that. And, of course, we can also look at an iterative process for the same, a program that evolves an iterative process,0:04:22
for the same function. These are two things that compute the same answer. And we have equivalent mathematical truths that are0:04:34
arranged there. And just the way you arrange those truths determines the particular process. The way you choose and arrange them determines the process that's evolved.0:04:44
So we have the flexibility of talking about both the function to be computed, and the method by which it's computed. So it's not clear we need more.0:04:53
However, today I'm going to do this awful thing. I'm going to introduce this assignment operation. Now, what is this?0:05:02
Well, first of all, there is going to be another kind of statement, if you will, in a programming language, called set!0:05:13
Things that do things like assignment, I'm going to put exclamation points after. We'll talk about what that means in a second. The exclamation point, again like question mark, is an0:05:23
arbitrary thing we attach to the symbol which is the name, has no significance to the system. The only significance is to me and you to alert you that this is an assignment of some sort.0:05:35
But we're going to set a variable to a value. And what that's going to mean is that there is a time at0:05:47
which something happens. Here's a time. If I have time going this way, it's a time axis. Time progresses by walking down the page.0:05:58
Then an assignment is the first thing we have that produces the difference between a before and an after. All the other programs that we've written, that have no0:06:09
assignments in them, the order in which they were evaluated didn't matter. But assignment is special, it produces a moment in time. So there is a moment before the set occurs and after, such0:06:27
that after this moment in time, the variable has the0:06:39
value, value.0:06:49
Independent of what value it had before, set! changes the value of the variable. Until this moment, we had nothing that changed.0:07:03
So, for example, one of the things we can think of is that the procedures we write for something like factorial are in fact pretty much identical to the function factorial.0:07:13
Factorial of four, if I write fact4, independent of what context it's in, and independent of how many times I write it, I always get the same answer.0:07:23
It's always 24. It's a unique map from the argument to the answer. And all the programs we've written so far are like that.0:07:33
However, once I have assignment, that isn't true. So, for example, if I were to define count to be one.0:07:50
And then I'm going to define also a procedure, a simple procedure called demo, which takes argument x and does the0:08:02
following operations. It first sets x to x plus one. My gosh, this looks just like FORTRAN, right--0:08:13
in a funny syntax. And then add to x count, Oh, I just made a mistake.0:08:24
I want to say, set! count to one plus count. It's this thing defined here.0:08:34
And then plus x count. Then I can try this procedure. Let's run it.0:08:43
So, suppose I get a prompt and I say, demo three.0:08:52
Well, what happens here? The first thing that happens is count is currently one. Currently, there is a time. We're talking about time. x gets three.0:09:02
At this moment, I say, oh yes, count is incremented, so count is two. two plus three is five. So the answer I get out is five.0:09:14
Then I say, demo of say, three again.0:09:23
What do I get? Well, now count is two, it's not one anymore, because I have incremented it. But now I go through this process, three goes into x,0:09:35
count becomes one plus count, so that's three now. The sum of those two is six, so the answer is six. And what we see is the same expression leads to two0:09:45
different answers, depending upon time. So demo is not a function, does not compute a0:09:55
mathematical function. In fact, you could also see why now, of course, this is the first place where the substitution model0:10:05
isn't going to work. This kills the substitution model dead. You know, with quotation there were some little problems that0:10:14
a philosopher might notice with the substitutions, because you have to worry about what deductions you can make when you substitute into quotes, if you're allowed to0:10:23
do that at all. But here the substitution model is dead, can't do anything at all. Because, supposing I wanted to use a substitution model to0:10:34
consider substituting for count? Well, my gosh, if I substitute for here and here, they're different ones.0:10:44
It's not the same count any more. I get the wrong answer. The substitution model is a static phenomenon that describes things that are true and not things that change.0:10:55
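The board's demo, rendered in Python for concreteness (in Scheme it is roughly `(define count 1)` and `(define (demo x) (set! count (1+ count)) (+ x count))`). The same expression, demo of three, gives different answers at different times, which is exactly why demo is not a mathematical function.

```python
# The lecture's demo: the same call gives different answers
# depending on when it happens, because of assignment.

count = 1

def demo(x):
    global count
    count = 1 + count     # set! count (1+ count)
    return x + count

first = demo(3)           # count becomes 2, so 3 + 2 = 5
second = demo(3)          # count becomes 3, so 3 + 3 = 6
```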
Here, we have truths that change. OK, Well, before I give you any understanding of this,0:11:06
this is very bad. Now, we've lost our model of computation. Pretty soon, I'm going to have to build you a new model of computation.0:11:15
But let's play with this, just now, in an informal sense. Of course, what you already see is that when I have something like assignment, the model that we're going to need0:11:24
is different from the model that we had before in that the variables, those symbols like count, or x are no longer going to refer to the values they have, but rather to some0:11:35
sort of place where the value is stored. We're going to have to think that way for a while. And it's going to be a very bad thing and cause a lot of trouble.0:11:44
And so, as I said, the very fact that we're inventing this bad thing, means that there had better be a good reason for it, otherwise, just a waste of time and a lot of effort.0:11:53
Let's just look at some of it just to play. Supposing we write down the functional version, functional meaning in the old style, of factorial by0:12:02
an iterative process. Factorial of n, we're going to iterate on m and i, which says0:12:26
if i is greater than n, then the result is m, otherwise,0:12:40
the result of iterating the product of i and m. So m is going to be the product that I'm accumulating.0:12:51
m is the product. And the count I'm going to increase by one.0:13:04
Plus, ITER, ELSE, COND, define. I'm going to start this up.0:13:17
And these days, you should have no trouble reading something like this. What I have here is a product there being accumulated and a counter.0:13:26
I start them up both at one. I'm going to buzz the counter up, i goes to i plus one every time around. But that's only our putting a time on the process, each of0:13:38
these is just a set of truths, true rules. And m is going to get a new value, i times m,0:13:47
each time around, and eventually i is going to be bigger than n, in which case, the answer's going to be m. Now, I'm speaking to you using time in this. That's just because I know how the computer works.0:13:58
But I didn't have to. This could be a purely mathematical description at this point, because substitution will work for this. But let's set right down a similar sort of program, using0:14:08
the same algorithm, but with assignments. So this is called the functional version.0:14:23
I want to write down an imperative version.0:14:34
Factorial of n. I'm going to create my two variables. Let i initialize itself to one, and m be initialized to0:14:48
one, similar. We'll create a loop which has a COND, and if i0:15:05
is greater than n, we're done. And the result is m, the product I'm accumulating. Otherwise, I'm going to write down three things to do.0:15:19
I'm going to set! m to the product of i and m, set! i to the sum of i and0:15:34
one, and go around the loop again. Looks very familiar to you FORTRAN programmers.0:15:44
ELSE, COND, define, funny syntax though. Start the loop up, and that's the program.0:15:59
Now, this program, how do we think about it? Well, let's just say what we're seeing here. There are two local variables, i and m, that have been initialized to one.0:16:10
Every time around the loop, I test to see if i is greater than n, which is the input argument, and if so, the result is the product being accumulated in m.0:16:19
However, if it's not the end of the loop, if I'm not done, then what I'm going to do is change the product to be the result of multiplying i times the current product.0:16:29
Which is sort of what we were doing here. Except here I wasn't changing. I was making another copy, because the substitution model0:16:38
says, you copy the body of the procedure with the arguments substituted for the formal parameters. Here I'm not worried about copying, here I've changed the0:16:49
value of m. I also then change the value of i to i plus one, and go buzzing around.0:16:58
Seems like essentially the same program, but there are some ways of making errors here that didn't exist until today. For example, if I were to do the horrible thing of not0:17:10
being careful in writing my program and interchange those two assignments, the program wouldn't compute the same function.0:17:20
I get a timing error because there's a dependency that m depends upon having the last value of i. If I change i first, then I've got the wrong value of i when0:17:32
I multiply by m. It's a bug that wasn't available until this moment, until we introduced something that had time in it.0:17:43
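The imperative version, rendered in Python alongside the timing bug just described: i and m are now places whose contents change, and interchanging the two assignments in the loop body silently computes the wrong function.

```python
# The imperative version: i and m are mutable places.

def fact_imperative(n):
    i, m = 1, 1           # (let ((i 1) (m 1)) ...)
    while i <= n:
        m = i * m         # (set! m (* i m))  -- must come first
        i = i + 1         # (set! i (+ i 1))
    return m

def fact_swapped(n):
    # the new kind of bug: same statements, wrong order
    i, m = 1, 1
    while i <= n:
        i = i + 1         # i is updated first...
        m = i * m         # ...so m multiplies by the wrong i
    return m
```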
So, as I said, first we need a new model of computation, and second, there had better be a damn good reason for doing this kind of ugly thing.0:17:52
Are there any questions? Speak loudly, David. AUDIENCE: I'm confused about, we've introduced set now, but0:18:04
we had let before and define before. I'm confused about the difference between the three. Wouldn't define work in the same situation as set if you0:18:14
introduced it a bit? PROFESSOR: No, define is intended for setting something once the first time, for making it. You've never seen me write on a blackboard two defines in a0:18:26
row whose intention was to change the old value of some variable to a new one. AUDIENCE: Is that by convention or-- PROFESSOR: No, it's intention.0:18:38
The answer is that, for example, internal to a procedure, two defines in a row are illegal, two defines0:18:47
in a row of the same variable. x can't be defined twice. Whether or not a system catches that error is a different question, but I legislate to you that define0:18:58
happens once on anything. Now, indeed, in interactive debugging, we intend that you interacting with your computer will redefine things, and so0:19:08
there's a special exception made for interactive debugging. But define is intended to mean to set up something which will0:19:18
be forever that value after that point. It's as if all the defines were done at the beginning. In fact, the only legal place to put a define in Scheme,0:19:29
internal to a procedure, is just at the beginning of a lambda expression, the beginning of the body of a procedure.0:19:41
Now, let of course does nothing like either of that. I mean, if you look at what's happening with a let, this0:19:50
happens again exactly once. It sets up a context where i and m are values one and one. That context exists throughout this scope, this0:20:01
region of the program. However, you don't think of that let as setting i again.0:20:11
It doesn't change it. i never changes because of the let. i gets created because of let. In fact, the let is a very simple idea.0:20:22
Let does nothing more than this; I'll write it down a little bit more neatly. Let's0:20:37
write: let var one have the value of expression e1, and var two have the value of expression e2, in an0:20:48
expression e3, is the same thing as a procedure of var0:21:00
one and var two, the formal parameters, and e3 being the body, where var one is bound to the value of e1, and var0:21:15
two gets the value of e2. So this is, in fact, a perfectly understandable thing from a substitution point of view.0:21:24
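The board's equivalence, that (let ((v1 e1) (v2 e2)) e3) is the same thing as ((lambda (v1 v2) e3) e1 e2), has a direct analogue in Python: a let is just an immediately applied lambda. This is only an illustration of the rewriting, not Scheme's actual implementation.

```python
# "let x = 3 and y = 4 in x*x + y*y", written as an
# immediately applied lambda, exactly as on the board:
result = (lambda x, y: x * x + y * y)(3, 4)
```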
This is really the same expression written in two different ways. In fact, the way the actual system works is this gets0:21:34
translated into this before anything happens. AUDIENCE: OK, I'm still unclear then as to what makes the difference between a let and a define. They could-- PROFESSOR: A define is syntactic sugar, whereby,0:21:45
essentially a bunch of variables get created by lets and then set up once.0:21:57
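The equivalence just described can be sketched directly in Scheme; the names with-let and with-lambda below are illustrative, not from the lecture:

```scheme
; A let expression creates a context where a and b have values...
(define with-let
  (let ((a 1) (b 2))
    (+ a b)))

; ...and is the same thing as applying a procedure whose formal
; parameters are the let variables to the initializing expressions.
(define with-lambda
  ((lambda (a b) (+ a b)) 1 2))

; Both evaluate to 3; the let form is translated into the
; lambda application before anything happens.
```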
OK, time for the first break, I think. Thank you. [MUSIC PLAYING]0:23:04
Well let's see. I now have to rebuild the model of computation, so you understand how some such mechanical mechanism could0:23:13
work that can do what we've just talked about. I just recently destroyed your substitution model.0:23:22
Unfortunately, this model is significantly more complicated than the substitution model. It's called the environment model. And I'm going to have to introduce some terminology,0:23:32
which is very good terminology for you to know anyway. It's about names. And we're going to give names to the kinds of names things have and the way those names are used.0:23:42
So this is a meta-description, if you will. Anyway, there is a pile of an unfortunate terminology here, but we're going to need this to understand what's called0:23:52
the environment model. We're about to do a little bit of boring, dog-work here. Let's look at the first transparency.0:24:02
And we see a description of a word called bound. And we're going to say that a variable, v, is bound in an0:24:11
expression, e, if the meaning of e is unchanged by the uniform replacement of a variable w, not occurring in0:24:22
e, for every occurrence of v in e. Now that's a long sentence, so, I think, I'm going to have to say a little bit about that before we even fool0:24:31
around at all here. We're talking about bound variables here.0:24:44
And you've seen lots of them. You may not know that you've seen lots of them. Well, I suppose in your logic you saw logical variables like, for every x there exists a y such that p is true of x0:24:58
and y from your calculus class. This variable, x, and this variable, y, are bound, because the meaning of this expression does not depend0:25:10
upon the particular letters I used to describe x and y. If I were to substitute w for x, and then said, for every w there0:25:21
exists a y such that p is true of w and y, it would be the same sentence. That's what it means.0:25:30
Or another case of this that you've seen is the integral, say, from 0 to one of dx over one plus x squared.0:25:46
Well that's something you see all the time. And this x is a bound variable. If I change that to a t, the expression is0:25:55
still the same thing. That's pi over four, the arctan of one, or something like that.0:26:04
Yes, that's the arctan of one. So bound variables are actually fairly common, for those of you who have played a bit with mathematics.0:26:13
Well, let's go into the programming world. Instead of the quantifier being something like, for0:26:22
every, or there exists, or integral, a quantifier is a symbol that binds a variable. And we are going to use the quantifier lambda as being the essential thing that binds variables.0:26:33
And so we have some nice examples here like that procedure of one argument y which does0:26:43
the following thing. It calls the procedure of one argument x, which multiplies x by y, and applies that to three.0:26:58
That procedure has the property there of two bound variables in it, x and y. This quantifier, lambda here, binds this y, and this0:27:08
quantifier, lambda, binds that x. Because, if I were to take an arbitrary symbol that does not occur in this expression, like w, and replace all y's with w's0:27:20
in this expression, the expression is still the same, the same procedure. And this is an important idea. The reason why we have things like that is a kind of0:27:30
modularity. If two people are writing programs, and they work together, it shouldn't matter what names they use internal to their own little machines that they're building.0:27:42
And so, what I'm really telling you there, is that, for example, this is equivalent to that procedure of one argument y which uses that procedure of one argument0:27:54
z which multiplies z by y. Because nobody cares what I used in here.0:28:06
It's a nice example. On the other hand, I have some variables that are not bound.0:28:15
For example, that procedure of one argument x which multiplies x by y.0:28:27
In this case, y is not bound. Supposing y had the value three, and z had the value0:28:36
four, then this procedure would be the thing that multiplies its argument by three. If I were to replace every instance of y with z, I would0:28:47
have a different procedure which multiplies every argument that's given by four. And, in fact, we have a name for such a variable.0:28:57
Here, we say that a variable, v, is free in the expression, e, if the meaning of the expression, e, is changed by0:29:06
the uniform replacement of a variable, w, not occurring in e, for every occurrence of v in e. So that's why this variable over here,0:29:20
y, is a free variable.0:29:29
And so there are free variables in this expression. Another example of that is that procedure of one argument0:29:38
y, which is just what we had before, which uses that procedure of one argument x that multiplies x by y--0:29:51
use that on three. This procedure has a free variable0:30:00
in it, which is asterisk. See, because, if that has the normal meaning of multiplication, then if I were to replace uniformly all0:30:11
asterisks with pluses, then the meaning of this expression would change. That's what you mean by a free variable.0:30:22
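The bound/free distinction can be seen in a minimal Scheme sketch (the names f, g, and h here are illustrative):

```scheme
; y is bound: renaming it uniformly to w leaves the meaning unchanged.
(define f (lambda (y) ((lambda (x) (* x y)) 3)))
(define g (lambda (w) ((lambda (x) (* x w)) 3)))
; (f 4) and (g 4) are both 12 -- the same procedure, different letters.

; y is free here: the meaning depends on what y is in the
; surrounding environment, so renaming it would change the procedure.
(define y 3)
(define h (lambda (x) (* x y)))
; (h 5) is 15; if the surrounding y were 4 instead, (h 5) would be 20.
```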
So, so far you've learned some logician words which describe the way names are used. Now, we have to do a little bit more playing around here,0:30:32
a little bit more. I want to tell you about the regions over which variables are defined.0:30:42
You see, we've been very informal about this up till now, and, of course, many of you have probably understood very clearly or most of you, that the x that's being0:30:51
declared here is defined only in here. This x is defined only in here, and this y is defined0:31:03
only in here. We have a name for such an idea. It's called a scope. And let me give you another piece of terminology.0:31:14
It's a long story. If x is a bound variable in e, then there is a lambda expression where it is bound. So the only way you can get a bound variable ultimately is0:31:23
by a lambda expression. Then you may worry: isn't define an exception to this? And it turns out, we could always arrange things so you don't need any defines.0:31:33
And we'll see that in a while. It's a very magical thing. So define really can go away. Really, the only thing that makes names is lambda.0:31:42
That's its job. And what's so amazing about a lot of things is you can compute with only lambda. But, in any case, a lambda expression has a place where0:31:53
it declares a variable. We call it the formal parameter list or the bound variable list. We say that the lambda expression binds--0:32:03
so it's a verb-- binds the variables declared in its bound variable list. In addition, those parts of the expression where the variable is defined, which was declared by some declaration,0:32:15
is called the scope of that variable. So these are scopes. This is the scope of y.0:32:27
And this is the scope of x-- that sort of thing.0:32:41
OK, well, now we have enough terminology to begin to understand how to make a new model for computation, because0:32:52
the key thing going on here is that we destroyed the substitution model, and we now have to have a model that represents the names as referring to places.0:33:03
Because if we are going to change something, then we need a place where it's stored. You see, if a name only refers to a value, and if I tried to0:33:14
change the name's meaning, well, that's not clear. There's nothing that is the place that that0:33:23
name referred to. What am I really saying? There is nothing shared among all of the instances of that name. And what we really mean, by a name, is that we0:33:32
fan something out. We've given something a name, and you have it, and you have it, because I've given you a reference to it, and I've given you a reference to it.0:33:41
And we'll see a lot about that. So let me tell you about environments. I need the overhead projection machine, thank you.0:33:52
And so here is a bunch of environment structures.0:34:01
An environment is a way of doing substitutions virtually. It represents a place where something is stored: the substitutions that you haven't done yet.0:34:14
It's a place where everything accumulates, where the names of the variables are associated with the values they have, such that when you say, what does this name mean,0:34:26
you look it up in an environment. So an environment is a function, or a table, or something like that. But it's a structured sort of table.0:34:35
It's made out of things called frames. Frames are pieces of environment, and they are0:34:45
chained together, in some nice ways, by what's called parent links or something like that. So here, we have an environment structure0:34:57
consisting of three environments, basically, a, b, and c. d is also an environment, but it's the same one, they share.0:35:11
And that's the essence of assignment. If I change a variable, the value of a variable that lives here, like that one, it should be visible from all places0:35:21
that you're looking at it from. Take this one, x. If I change the x to four, it's visible from other places.0:35:30
But I'm not going to worry about that right now. We're going to talk a lot about that in a little while. What do we have here? Well, these are called frames. Here is a frame, here's a frame, and here's a frame.0:35:43
a is an environment which consists of the table which is frame two, followed by the table labeled frame one.0:35:52
And, in this environment, in say this environment, frame two, x and y are bound.0:36:04
They have values. Sorry, in frame one-- In frame two, z is bound, and x is bound, and y is bound,0:36:15
but the value of x that we see, looking from this point of view, is this x. It's seven, rather than this one, which is three.0:36:24
We say that this x shadows this x. From environment three--0:36:33
from frame three, from environment b, which refers to frame three, we have variables n and y bound and also x.0:36:44
This y shadows this one. So the value, looking from this point of view, of y is two.0:36:53
The value, looking from this point of view, of m is one. And the value, looking from this point of view, of x is three.0:37:02
So there we have a very simple environment structure made out of frames. These correspond to the applications of procedures. And we'll see that in a second.0:37:14
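The shadowing just described can be mimicked with nested scopes; this small sketch (the name lookup is illustrative) shows an inner x hiding an outer one without changing it:

```scheme
; Like frame one: x = 3 in the enclosing environment.
(define x 3)

; Like frame two: this let makes a new frame where x = 7,
; which shadows the outer x when we look up x from inside.
(define (lookup)
  (let ((x 7))
    x))

; (lookup) sees the inner x, 7; the outer x is still 3.
; The inner binding shadows -- it does not change -- the outer one.
```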
So now I have to make you some other nice little structure that we build. Next slide, we see an object, which I'm going to draw0:37:25
procedures. This is a procedure. A procedure is made out of two parts. It's sort of like a cons.0:37:37
However, it has two parts. The first part refers to some code, something that can be0:37:46
executed, a set of instructions, if you will. You can think of it that way. And the second part is the environment. The procedure is the whole thing.0:37:57
And we're going to have to use this to capture the values of the free variables that occur in the procedure.0:38:06
If a variable occurs in the procedure it's either bound in that procedure or free. If it's bound, then the value will somehow be easy to find.0:38:16
It will be in some easy environment to get at. If it's free, we're going to have to have something that goes with the procedure that says where we'll go look for its value.0:38:27
And the reasons why are not obvious yet, but will be soon. So here's a procedure object. It's a composite object consisting of a piece of code0:38:40
and an environment structure. Now I will tell you the new rules, the complete new rules, for evaluation.0:38:50
The first rule is-- there's only two of them. These correspond to the substitution model rules. And the first one has to do with how do you apply a0:39:00
procedure to its arguments? And a procedural object is applied to a set of arguments by constructing a new frame.0:39:11
That frame will contain the mapping of the formal parameters to the actual parameters, the arguments that were supplied in the call.0:39:21
As you know, when we make up a call to a procedure like lambda x times x y, and we call that with the argument three, then we're going to need some0:39:31
mapping of x to three. It's the same thing as later substituting, if you will, the three for the x in the old model.0:39:41
So I'm going to build a frame which contains x equals three as the information in that frame. Now, the body of the procedure will then have to be evaluated0:39:52
which is this. It will be evaluated in an environment which is0:40:04
constructed by adjoining the new frame that we just made to the environment which was part of the procedure that we applied.0:40:13
So I'm going to make a little example of that here. Supposing I have some environment.0:40:25
Here's a frame which represents it. And some procedure-- which I'm going to draw with circles here because it's easier than little triangles-- Sorry, those are rhombuses, rhomboidal little pieces of0:40:38
fruit jelly or something. So here's a procedure which takes this environment. And the procedure has a piece of code, which is a lambda0:40:48
expression, which binds x and y and then executes an expression, e.0:40:58
And this is the procedure. We'll call it p. I wish to apply that procedure to three and four. So I want to do p of three and four.0:41:09
What I'm going to do, of course, is make a new frame. I build a frame which contains x equals three,0:41:18
and y equals four. I'm going to connect that frame to this frame over here.0:41:27
And then this environment, which I will call b, is the environment in which I will evaluate the body, e.0:41:39
Now, e may contain references to x and y and other things. x and y will have values right here.0:41:50
Other things will have their values here. How do we get this frame? That we do by the construction of procedures which is the0:42:00
other rule. And I think that's the next slide. Rule two, when a lambda expression is evaluated,0:42:10
relative to a particular environment-- See, the way I get a procedure is by evaluating the lambda expression. Here's a lambda expression.0:42:20
By evaluating it, I get a procedure which I can apply to three. Now this lambda expression is evaluated in an environment where y is defined.0:42:31
And I want the body of this, which contains a free occurrence of y. y is free in here; it's bound over the whole thing, but it's0:42:41
free over here. I want that y to be this one. I evaluate this body of this procedure in the environment0:42:53
where y was created. That's this kind of thing, because that was done by application. Now, if I ever want to look up the value of y, I have to know0:43:03
where it is. Therefore, when this procedure was created, the creation of the procedure which is the result of evaluating that lambda expression had better capture a pointer to, or remember, the0:43:14
frame in which y was bound. So that's what this rule is telling us. So, for example, if I happen to be evaluating a lambda0:43:28
expression, lambda expression in e, lambda of say, x and y,0:43:37
let's call it g in e, evaluating that. Well, all that means is I now construct a procedure object.0:43:47
e is some environment. e is something which has a pointer to it. I construct a procedure object that points up to that0:43:56
environment, where the code of that is a lambda expression or whatever that translates into.0:44:06
And this is the procedure. So this produces for me-- this object here, this environment0:44:17
pointer, captures the place where this lambda expression was evaluated, where the definition was used to make the0:44:26
procedure. So it picks up the environment from the place where that0:44:35
procedure was defined, stores it in the procedure itself, and then when the procedure is used, the environment where it was defined is extended with the new frame.0:44:48
So this gives us a locus for putting where a variable has a value. And, for example, if there are lots of guys pointing in at that environment, then they share that place.0:45:01
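Rule two can be seen in a small example; make-scaler is an illustrative name, not from the lecture:

```scheme
; Evaluating the inner lambda expression captures the environment
; where it was evaluated: the frame in which y was bound by the
; outer application.
(define make-scaler
  (lambda (y)
    (lambda (x) (* x y))))   ; y is free here; its value is found
                             ; in the captured frame

(define double (make-scaler 2))   ; captures a frame where y = 2
(define triple (make-scaler 3))   ; a different frame where y = 3
; (double 5) is 10 and (triple 5) is 15: the same lambda text,
; evaluated in two environments, producing two different procedures.
```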
And we'll see more of that shortly. Well, now you have a new model for understanding the execution of programs. I suppose I'll take questions0:45:12
now, and then we'll go on and use that for something. AUDIENCE: Is it right to say then, the environment is that0:45:21
linked chain of frames-- PROFESSOR: That's right. AUDIENCE: starting with-- working all the way back? PROFESSOR: Yes, the environment is a sequence of frames linked together.0:45:32
And the way I like to think about it, it's the pointer to the first one, because once you've got that you've got them all.0:45:44
Anybody else? AUDIENCE: Is it possible to evaluate a procedure or to define a procedure in two different environments such that it will behave differently, and have pointers to both-- PROFESSOR: Oh, yes.0:45:53
The same procedure is not going to have two different environments. The same code, the same lambda expression can be evaluated in two environments producing two different procedures.0:46:06
Each procedure-- AUDIENCE: Their definition has the same name. Their operation-- PROFESSOR: The definition is written the same, with the same characters. I can evaluate that set of characters, whatever, that0:46:16
list structure of that define, that is, the textual representation. I can evaluate that in two different environments, producing two different procedures.0:46:25
Each of those procedures has its own local sets of variables, and we'll see that right now.0:46:36
Anybody else? OK, thank you. Let's take a break.0:46:48
[MUSIC PLAYING]0:47:22
Well, now I've done this terrible thing to you. I've introduced a very complicated thing, assignment,0:47:34
which destroys most of the interesting mathematical properties of our programs. Why should I have done this?0:47:43
What possible good could this do? Clearly not a nice thing, so I better have a good excuse.0:47:52
Well, let's do a little bit of playing, first of all, with some very interesting programs that have assignment. Understand something special about them that makes them0:48:02
somewhat valuable. Start with a very simple program which I'm going to call make-counter. I'm going to define make-counter to be a procedure0:48:26
of one argument n which returns as its value a procedure of no arguments-- a procedure that produces a procedure--0:48:36
which sets n to the increment of n and returns0:48:48
that value of n. Now we're going to investigate the behavior of this.0:48:57
It's a sort of interesting thing. In order to investigate the behavior, I have to make an environment model, because we can't understand this any other way.0:49:08
So let's just do that. We start out with some sort of-- let's say there is a global environment that the machine is born with. Global we'll call it.0:49:19
And it's going to have in it a bunch of initial things. We all know what it's got. It's got things in it like say, plus, and times, and0:49:32
quotient, and difference, and CAR, and et cetera, lots of things.0:49:42
I don't know what they are, some various squiggles that are the things the machine is born with.0:49:51
And by doing the definition here, what I plan to do-- Well, what am I doing? I'm doing this relative to the global environment. So here's my environment pointer.0:50:03
In order to do that I have to evaluate this lambda expression. That means I make a procedure object. So I'm going to make a procedure object here.0:50:17
And the procedure object has, as the place it's defined, the global environment. The procedure object contains some code that represents a0:50:29
procedure of one argument n which returns a procedure of no arguments which does something.0:50:38
And the define is a way of changing this environment, so that I now add to it make-counter-- there's a special rule0:50:53
for the special thing define. But what that is, is it gives me that pointer to that procedure.0:51:03
So now the global environment contains make-counter as well. Now, we're going to do some operations. I'm going to use this to make some counters.0:51:14
We'll see what a counter is. So let's define c1 to be a counter beginning at 0.0:51:35
Well, we know how to do this now, according to the model. I have to evaluate the expression make-counter in the global environment, make-counter of 0.0:51:47
Well, I look up make-counter and see that it's a procedure. I'm going to have to apply that procedure.0:51:56
The way I apply the procedure is by constructing a frame. So I construct a frame which has a value for n in it which0:52:12
is 0, and the parent environment is the one which is the environment of definition of make-counter.0:52:23
So I've made an environment by applying make-counter to 0. Now, I have to evaluate the body of make-counter, which is0:52:34
this lambda expression, in that environment. Well evaluating this body, this body is a lambda0:52:43
expression. Evaluate a lambda expression means make a procedure object. So I'm going to make a procedure object.0:52:56
And that procedure object has the environment it was defined in being that, where n was defined to be 0.0:53:07
And it has some code, which is the procedure of no arguments which does something, that sets something, and returns n.0:53:17
And this thing is going to be the object, which in the global environment, will have the name c1.0:53:26
So we construct a name here, c1, and say that equals that.0:53:35
Now, let's also make another counter, c2, to be make-counter0:53:50
say, starting with 10. Then I do essentially the same thing. I apply the make-counter procedure, which I got from0:53:59
here, to make another frame with n being 10. That frame has the global environment as its parent.0:54:10
I then construct a procedure which has that as its frame of definition.0:54:20
The code of it is the procedure of no arguments which does something. And it does a set, and so on. And n comes out.0:54:31
And c2 is this. Well, you're already beginning to see something fairly interesting.0:54:40
There are two n's here. They are not one n. Each time I called make-counter, I made another0:54:49
instance of n. These are distinct and separate from each other. Now, let's do some execution, use those counters.0:55:00
I'm going to use those counters. Well, what happens if I say, c1 at this point?0:55:15
Well, I go over here, and I say, oh yes, c1 is a procedure. I'm going to call this procedure on no arguments, but it has no parameters.0:55:25
That's right. What's its body? Well, I have to look over here, because I didn't write it down. It said, set n to one plus n and return n, increment n.0:55:39
Well, the n it sees is this one. So I increment that n. That becomes one, and I return the value one.0:55:53
Supposing I then called c2. Well, what do I do? I say c2 is this procedure which does the same thing, but0:56:03
here's the n. It becomes 11. And so I have an 11 which is the value.0:56:15
I then can say, let's try c1 again. c1 is this, that's two, so the answer is two.0:56:29
And c2 gives me a 12 by the same method, by walking down here looking at that and saying, here's the n, I'm0:56:38
incrementing. So what I have are computational objects. There are two counters, each with its own0:56:49
independent local state. Let's talk about this a little. This is a strange thing.0:57:01
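The counter procedure walked through above can be written out in Scheme (the increment is written here as (+ n 1)):

```scheme
; Each call to make-counter evaluates the inner lambda in a fresh
; frame where n is bound, so each counter gets its own private n.
(define make-counter
  (lambda (n)
    (lambda ()            ; a procedure of no arguments...
      (set! n (+ n 1))    ; ...which sets n to the increment of n
      n)))                ; and returns that value of n

(define c1 (make-counter 0))
(define c2 (make-counter 10))
; (c1) => 1, (c2) => 11, (c1) => 2, (c2) => 12:
; two counters, each with its own independent local state.
```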
What's an object? It's not at all obvious what an object is. We like to think about objects, because it's0:57:11
economical to think that way. It's an intellectual economy. I am an object.0:57:21
You are an object. We are not the same object. I can divide the world into two parts, me and you, and0:57:32
there's other things as well, such that most of the things I might want to discuss about my workings do not involve you,0:57:41
and most of the things I want to discuss about your workings don't involve me. I have a blood pressure, a temperature, a respiration0:57:50
rate, a certain amount of sugar in my blood, and numerous, thousands, of state variables-- millions actually,0:57:59
or I don't know how many-- huge numbers of state variables in the physical sense which represent the state of me as a particle, and0:58:09
you have gazillions of them as well. And most of mine are uncoupled to most of yours. So we can compute the properties of me without0:58:21
worrying too much about the properties of you. If we had to worry about both of us together, then the number of states that we have to consider is the product of the number of states you have and the number of states I have. But this way it's almost a sum.0:58:32
Now, indeed there are forces that couple us. I'm talking to you and your state changes. I'm looking at you and my state changes.0:58:41
Some of my state variables, a very few of them, therefore, are coupled to yours. If you were to suddenly yell very loud, my blood pressure would go up.0:58:54
However, it may not always be appropriate to think about the world as being made out of independent states and independent particles. Lots of the bugs that occur in things like quantum mechanics,0:59:05
or the bugs in our minds that occur when we think about things like quantum mechanics, are due to the fact that we are trying to think about things being broken up into independent pieces, when in fact there's more coupling0:59:15
than we see on the surface, or that we want to believe in, because we want to compute efficiently and effectively. We've been trained to think that way.0:59:29
Well, let's see. How would we know if we had objects at all? How can we tell if we have objects? Consider some possible optical illusions.0:59:41
This could be done. These pieces of chalk are not particularly identical, but supposing you couldn't tell the difference between them by looking at them.0:59:52
Well, there's a possibility that this is all a game I'm playing with mirrors. It's really the same piece of chalk, but you're seeing two of them.1:00:01
How would you know if you're seeing one or two? Well, there's only one way I know. You grab one of them and change it and see if the other1:00:10
one changed. And it didn't, so there's two of them.1:00:19
And, on the other hand, there are some other screwy properties of things like that. Like, how do we know if something changed? We have to look at it before and after the change.1:00:28
The change is an assignment, it's a moment in time. But that means we have to know it was the same one that we're looking at. So some very strange, and unusual, and obscure, and--1:00:39
I don't understand the problems associated with assignment, and change, and objects. These could get very, very bad.1:00:51
For example, here I am, I am a particular person, a particular object. Now, I can take out my knife, and cut my fingernail.1:01:02
A piece of my fingernail has fallen off onto the table. I believe I am the same person I was a second ago, but I'm1:01:11
not physically the same in the slightest. I have changed. Why am I the same? What is the identity of me?1:01:21
I don't know. Except for the fact that I have some sort of identity. And so, I think by introducing assignment and objects, we1:01:34
have opened ourselves up to all the horrible questions of philosophy that have been plaguing philosophers for some thousands of years about this sort of thing.1:01:43
It's why mathematics is a lot cleaner. Let's look at the best things I know to say about actions and identity.1:01:52
We say that an action, a, had an effect on an object, x, or equivalently, that x was changed by a, if some property, p, which was true of x before a, became1:02:02
false of x after a. Notice, it still means I have to have the x before and after. Or, the other way of saying this is, we say that two1:02:13
objects, x and y, are the same if any action which has an effect on x has the same effect on y. However, objects are very useful, as I said, for1:02:22
intellectual economy. One of the things that's incredibly useful about them is that the world, we like to think, is made out of1:02:32
independent objects with independent local state. We like to think that way, although it isn't completely true. When we want to make very complicated programs that deal1:02:42
with such a world, if we want those programs to be understandable by us and also to be changeable, so that if we change the world we change the program only a little bit,1:02:51
then we want there to be connections, isomorphism, between the objects in the world and the objects in our mental model. The modularity of the world can give us the modularity in1:03:00
our programming. So we invent things called object-oriented programming and things like that to provide us with that power.1:03:09
But it's even easier. Let's play a little game. I want to play a little game, show you an even easier example of where modularity can be enhanced by using an1:03:19
assignment statement, judiciously. One thing I want to enforce and impress on you is: don't use assignment statements the way you use them in FORTRAN or1:03:28
BASIC or Pascal, to do the things you don't have to do with them. It's not the right way to think for most things.1:03:37
Sometimes it's essential, or maybe it's essential. We'll see more about that too. OK, let me show you a fun game here.1:03:47
There was a mathematician by the name of Cesaro-- or Cesaro, Cesaro I suppose it is-- who figured out a clever way of computing pi.1:03:58
It turns out that if I take two random numbers, two integers at random, and compute the greatest common divisor, their1:04:11
greatest common divisor is either one or it's not one. If it's one, then they have no common divisors. If their greatest common divisor is one--1:04:21
the probability that two random numbers, two numbers chosen at random, have greatest common divisor one is related to pi. In fact--1:04:31
yes, it's very strange-- of course there are other ways of computing pi, like dropping pins on flags, and things like that, and sort of the same kind of thing.1:04:40
So the probability that the GCD of number one and number two, two random numbers chosen, is one, is 6 over pi squared.1:04:55
I'm not going to try to prove that. It's actually not too hard and sort of fun. How would we estimate such probability? Well, the way we do that, the way we estimate probabilities,1:05:07
is by doing lots of experiments, and then computing the ratios of the ones that come out one way to the total number of experiments we do.1:05:16
It's called Monte Carlo, and it's useful in other contexts for doing things like integrals where you have lots and lots of variables-- the space which is limiting the dimensions you are doing your integral in.1:05:26
But, going back to here, let's look at this slide. We can use Cesaro's method for estimating pi with n trials by taking the1:05:40
square root of six over the result of a Monte Carlo experiment with n trials, using Cesaro's experiment,1:05:51
where Cesaro's experiment is the test of whether the GCD of two random numbers-- And you can see that I've already got some assignments1:06:01
in here, just by what I wrote. The fact that this word rand, in parentheses-- therefore, a procedure call-- yields a different value than this one,1:06:11
at least that's what I'm assuming by writing this this way, indicates that this is not a function, that there's internal state in it which is changing.1:06:25
If the GCD of those two random numbers is equal to one, that's the experiment. So here I have an experimental method for estimating the1:06:34
value of pi. Where, I can easily divide this problem into two parts. One is the specific Monte Carlo experiment of Cesaro,1:06:43
which you just saw, and the other is the general technique of doing Monte Carlo experiments. And that's what this is. If I want to do Monte Carlo experiments with n trials, a1:06:55
certain number of trials, and a particular experiment, the way I do that is I make a little iterative procedure which has as variables the number of trials remaining and the1:07:05
number of trials that have passed, that have come out true. And if the number remaining is 0, then the answer is the number passed divided by the whole number of trials, which is1:07:16
the estimate of the probability. And if it's not, if I have more trials to do, then let's do one. We do an experiment. We call the procedure which is experiment on no arguments.1:07:27
We do the experiment and then, if that turned out to be true, we go around the loop decrementing the number of experiments we have to do by one and incrementing the1:07:36
number that were passed. And if the experiment was false, we just go around the loop decrementing the number of experiments remaining and keeping the number passed the same.1:07:48
We start this up iterating over the total number of trials with 0 experiments passed. A very elegant little program.1:07:57
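The procedures just described might be sketched like this -- a reconstruction of what's on the slide, assuming `rand` is the stateful random-number generator discussed in a moment:

```scheme
;; Estimate pi from Cesaro's theorem: Prob(gcd(n1,n2) = 1) = 6/pi^2.
(define (estimate-pi n)
  (sqrt (/ 6 (monte-carlo n cesaro))))

;; Cesaro's experiment: are two random numbers relatively prime?
;; RAND is assumed to be a generator with hidden internal state.
(define (cesaro)
  (= (gcd (rand) (rand)) 1))

;; The general Monte Carlo technique: run EXPERIMENT TRIALS times
;; and return the fraction of trials that came out true.
(define (monte-carlo trials experiment)
  (define (iter remaining passed)
    (cond ((= remaining 0) (/ passed trials))
          ((experiment) (iter (- remaining 1) (+ passed 1)))
          (else (iter (- remaining 1) passed))))
  (iter trials 0))
```

Note how the specific experiment (cesaro) and the general technique (monte-carlo) are cleanly separated, exactly as described.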
And I don't have to just do this with Cesaro's experiment, it could be lots of Monte Carlo experiments I might do. Of course, this depends upon the existence of some sort of random number generator.1:08:07
And random number generators generally look something like this. There is a random number generator--1:08:17
is in fact a procedure which is going to do something just like the counter. It's going to update an x to the result of applying some1:08:30
function to x, where this function is some screwy kind of function that you might find out in Knuth's books on the details of programming.1:08:41
He does these wonderful books that are full of the details of programming, because I can't remember how to make a random number generator, but I can look it up there, and I1:08:50
can find out. And then, eventually, I return the value of x which is the state variable internal to the random number generator. That state variable is initialized1:09:00
somehow, and has a value. And this procedure is defined in the context where that variable is bound.1:09:10
So this is a hidden piece of local state that you see here. And this procedure is defined in that context.1:09:21
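Such a generator might be sketched like this. Here `random-init` and `rand-update` are assumptions: the seed and the Knuth-style update function you would look up in his books.

```scheme
;; RANDOM-INIT and RAND-UPDATE are assumed to be supplied elsewhere.
(define rand
  (let ((x random-init))        ; hidden local state, bound in this context
    (lambda ()
      (set! x (rand-update x))  ; update the internal state variable...
      x)))                      ; ...and return its new value
```

The variable x is visible only inside the lambda; each call to (rand) advances the state and returns a fresh number.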
Now, that's a very simple thing to do. And it's very nice. Supposing I didn't want to use assignments. Supposing I wanted to write this program without1:09:30
assignments. What problems would I have? Well, let's see. I'd like to use the overhead machine here, thank you.1:09:44
First of all, let's look at the whole thing. It's a big story, unfortunately, which tells you there is something wrong. It's at least that big, and it's monolithic.1:09:57
You don't have to understand or look at the text there right now to see that it's monolithic. It isn't a thing which is Cesaro's experiment. It's not pulled out from the Monte Carlo process.1:10:10
It's not separated. Let's look why. Remember, the constraint here is that every procedure return1:10:19
the same value for the same arguments. Every procedure represents a function. That's a different kind of constraint.1:10:28
Because when I have assignments, I can change some internal state variable. So let's see how that causes things to go wrong. Well, start at the beginning.1:10:38
The estimate of pi looks sort of the same. What I'm doing is I take the square root of six over the1:10:47
random GCD test applied to n, whereas that's what this is. But here, we are beginning to see something funny. The random GCD test of a certain number of trials is1:10:58
just like we had before, an iteration on the number of trials remaining, the number of trials that have been passed, and another variable x.1:11:10
What's that x? That x is the state of the random number generator. And it is now going to be used here.1:11:21
The same random update function that I have over here is the one I would have used in a random number generator if I were building it the other way, the one I get out of Knuth's books.1:11:31
x is going to get transformed into x1--I need two random numbers. And x1 is going to get transformed into x2--now I have two random numbers. I then have to do exactly what I did before.1:11:42
I take the GCD of x1 x2. If that's one, then I go around the loop with x2 being the next value of x.1:11:54
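The assignment-free version just described might be sketched like this: the generator's state x is now an explicit argument threaded through the loop.

```scheme
;; Functional version: no SET!, so the random-generator state X
;; must be passed around explicitly.  RANDOM-INIT and RAND-UPDATE
;; are the same assumed seed and update function as before.
(define (estimate-pi n)
  (sqrt (/ 6 (random-gcd-test n random-init))))

(define (random-gcd-test trials initial-x)
  (define (iter remaining passed x)
    (let* ((x1 (rand-update x))     ; first random number
           (x2 (rand-update x1)))   ; second random number
      (cond ((= remaining 0) (/ passed trials))
            ((= (gcd x1 x2) 1)
             (iter (- remaining 1) (+ passed 1) x2))
            (else (iter (- remaining 1) passed x2)))))
  (iter trials 0 initial-x))
```

Notice that x has leaked out of the generator into the Monte Carlo loop, and the loop now knows that each experiment needs exactly two random numbers.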
You see what's happened here is that the state of the random number generator is no longer confined to the insides of the random number generator. It has leaked out.1:12:03
It has leaked out into my procedure that does the Monte Carlo experiment. But what's worse than that, is it's also, because it was1:12:13
contained inside my experiment itself, Cesaro, it leaked out of that too. Because Cesaro, called twice, has to have a different value each time, if I'm going to have a legitimate experimental1:12:24
test. So Cesaro can't be a function either, unless I pass it the seed of the random number generator that is going1:12:34
to go wandering around. So unfortunately, the seed of the random number generator has leaked out of the random number generator into Cesaro, and from there into the Monte Carlo experiment.1:12:45
And, unfortunately, my Monte Carlo experiment here is no longer general. The Monte Carlo experiment here knows how many random numbers I need to do the experiment.1:12:58
That's sort of horrible. I lost the ability to decompose the problem into pieces, because I wasn't willing to accept the little loop of information,1:13:10
the feedback process, that happens inside the random number generator, which before was made by having an assignment to a state variable that was confined to the1:13:20
random number generator. So the fact that the random number generator is an object with an internal state variable--affected by1:13:29
nothing outside, but it'll give you something, it will apply its force to you--that is what we're missing now.1:13:38
OK, well I think we've seen enough reason for doing this, and it all sort of looks very wonderful. Wouldn't it be nice if assignment was a good thing--1:13:51
and maybe it's worth it. But I'm not sure. As Gilbert and Sullivan said, things are seldom what they seem, skim milk masquerades as cream.1:14:01
Are there any questions?1:14:17
Are there any philosophers here? Anybody want to argue about objects? You're just floored, right?1:14:29
And you haven't done your homework yet. You haven't come up with a good question. Oh, well.1:14:40
Sure, thank you. Let's take the long break now.0:00:00
Lecture 5B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:21
PROFESSOR: Well, now that we've given you some power to make independent local state and to model objects, I thought we'd do a bit of programming of a very0:00:31
complicated kind, just to illustrate what you can do with this sort of thing.0:00:40
I suppose, as I said, we were motivated by physical systems and the ways we like to think about physical systems, which is that there are these things that the world is made out of.0:00:52
And each of these things has particular independent local state, and therefore it is a thing. That's what makes it a thing.0:01:01
And then we're going to say that in the model in the world--we have a world and a model in our minds and in the computer of that world.0:01:10
And what I want to make is a correspondence between the objects in the world and the objects in the computer, the relationships between the objects in the world and the relationships between those same obj...--the model objects0:01:21
in the computer, and the functions that relate things in the world to the functions that relate things in the computer.0:01:30
This buys us modularity. If we really believe the world is like that, that it's made out of these little pieces, and of course we could arrange0:01:40
our world to be like that, we could only model those things that are like that, then we can inherit the modularity in the world into our programming.0:01:50
That's why we would invent some of this object-oriented programming. Well, let's take the best kind of objects I know. They're completely--they're completely wonderful:0:02:03
electrical systems. Electrical systems really are the physicist's best, best objects.0:02:14
You see over here I have some piece of machinery. Right here's a piece of machinery. And it's got an electrical wire connecting one part of0:02:24
the machinery with another part of the machinery. And one of the wonderful properties of the electrical world is that I can say this is an object, and this is an0:02:34
object, and they're-- the connection between them is clear. In principle, there is no connection that I didn't describe with these wires.0:02:44
Let's say if I have light bulbs, a light bulb and a power supply that's plugged into the outlet. Then the connection is perfectly clear.0:02:53
There's no other connections that we know of. If I were to tie a knot in the wire that connects the light bulb to the power supply, the light remains lit up.0:03:04
It doesn't care. The way the physics is arranged is such that the connection can be made abstract, at least for low0:03:13
frequencies and things like that. So in fact, we have captured all of the connections there really are.0:03:22
Well, you can go one step further and talk about the most abstract types of electrical systems we have, digital circuits. And here there are certain kinds of objects.0:03:34
For example, in digital circuits we have things like inverters. We have things like and-gates.0:03:43
We have things like or-gates. We connect them together by sort-of wires which represent0:03:53
abstract signals. We don't really care as physical variables whether these are voltages or currents or some combination or anything like that, or water, water pressure.0:04:05
These abstract variables represent certain signals. And we build systems by wiring these things together with wires.0:04:14
So today what I'm going to show you, right now, we're going to build up an invented language in Lisp, embedded in the same sense that Henderson's picture language0:04:24
was embedded, which is not the same sense as the language of pattern match and substitution was done yesterday. The pattern match/substitution language was interpreted by a0:04:35
Lisp program. But the embedding of Henderson's program is that we just build up more and more procedures that encapsulate the structure we want.0:04:45
So for example here, I'm going to have some various primitive kinds of objects, as you see, that one and that one. I'm going to use wires to combine them.0:04:55
The way I represent attaching-- I can make wires. So let's say A is a wire. And B is a wire. And C is a wire. And D is a wire.0:05:04
And E is a wire. And S is a wire. Well, an or-gate that has two inputs, the inputs being A and B, and the output being D, you notate like this.0:05:17
An and-gate, which has inputs A and B and output C, we notate like that. By making such a sequence of declarations, like this, I can0:05:29
wire together an arbitrary circuit. So I've just told you a set of primitives and means of combination for building digital circuits. Now I need0:05:40
more in a real language--abstraction. And so for example, here I have--here I have a half adder.0:05:52
It's something you all know if you've done any digital design. It's used for adding numbers together on A and B and putting out a sum and a carry.0:06:03
And in fact, the wiring diagram is exactly what I told you. A half adder with things that come out of the box-- you see the box, the boundary, the abstraction is always a box.0:06:14
And there are things that come out of it, A, B, S, and C. Those are the declared variables--declared variables0:06:24
of a lambda expression, which is the one that defines half adder. And internal to that, I make up some more wires, D and E,0:06:36
which I'm going to use for the interconnect-- here E is this one and D is this wire, the interconnect that doesn't come through the walls of the box--0:06:45
and wire things together as you just saw. And the nice thing about this that I've just shown you is this language is hierarchical in the right way. If a language isn't hierarchical in the right way,0:06:55
if it turns out that a compound object doesn't look like a primitive, there's something wrong with the language-- at least the way I feel about that.0:07:06
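The wiring just described, and the half-adder box built from it, might be sketched like this (a reconstruction of what's on the slides):

```scheme
;; Declaring wires and wiring up the gates directly:
(define a (make-wire))
(define b (make-wire))
(define c (make-wire))
(define d (make-wire))
(define e (make-wire))
(define s (make-wire))

(or-gate a b d)
(and-gate a b c)
(inverter c e)
(and-gate d e s)

;; The same wiring, boxed up as an abstraction.  A, B, S, and C
;; come through the walls of the box; D and E are the internal
;; interconnect made up inside.
(define (half-adder a b s c)
  (let ((d (make-wire)) (e (make-wire)))
    (or-gate a b d)
    (and-gate a b c)
    (inverter c e)
    (and-gate d e s)
    'ok))
```

The compound half-adder is used exactly like a primitive gate, which is what makes the language hierarchical in the right way.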
So here we have--here, instead of starting with mathematical functions, or things that compute mathematical functions, which is what we've been doing up until now, instead of starting with things that look like0:07:15
mathematical functions, or compute such things, we are starting with things that are electrical objects and we build up more electrical objects. And the glue we're using is basically the0:07:26
Lisp structure: lambdas. Lambda is the ultimate glue, if you will. And of course, half adder itself can be used in a more0:07:39
complicated abstraction called a full adder, which in fact involves two half adders, as you see here, hooked together with some extra wires, that you see here, S, C1, and C2,0:07:50
and an or-gate, to manufacture a full adder, which takes an input number, another input number, a carry in, and0:08:01
produces output, a sum and a carry out. And out of full adders, you can make real adder chains and big adders.0:08:12
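The full adder just described, sketched in the same style:

```scheme
;; A full adder built from two half adders and an or-gate,
;; with internal wires S, C1, and C2.
(define (full-adder a b c-in sum c-out)
  (let ((s (make-wire)) (c1 (make-wire)) (c2 (make-wire)))
    (half-adder b c-in s c1)
    (half-adder a s sum c2)
    (or-gate c1 c2 c-out)
    'ok))
```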
So we have here a language so far that has primitives, means of combination, and means of abstraction--a real language.0:08:22
Now, how are we going to implement this? Well, let's do it easily. Let's look at the primitives. The only problem is we have to implement the primitives.0:08:31
Nothing else has to be implemented, because we're picking up the means of combination and abstraction from Lisp, inheriting them in the embedding.0:08:43
OK, so let's look at a particular primitive. An inverter is a nice one. Now, inverter has two wires coming in, an in and an out.0:08:57
And somehow, it's going to have to know what to do when a signal comes in. So somehow it's going to have to tell its input wire--0:09:07
and now we're going to talk about objects and we're going to see this in a little more detail soon-- but it's going to have to tell its input wire that when you0:09:16
change, tell me. So this object, the object which is the inverter has to tell the object which is the input wire,0:09:25
hi, my name is George. And my job is to do something when you change. So when you get a change, tell me about it.0:09:34
Because I've got to do something with that. Well, that's done down here by adding an action on the input wire called invert-in, where invert-in is defined over here0:09:47
to be a procedure of no arguments, which gets the logical not of the signal on the input wire.0:09:56
And after some delay, which is the inverter delay, all these electrical objects have delays, we'll do the following thing-- set the signal on the output wire to the new value.0:10:10
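That inverter, as a sketch of the code on the slide:

```scheme
(define (inverter input output)
  (define (invert-in)              ; called whenever INPUT changes
    (let ((new-value (logical-not (get-signal input))))
      (after-delay inverter-delay  ; all electrical objects have delays
                   (lambda ()
                     (set-signal! output new-value)))))
  (add-action! input invert-in)    ; "when you change, tell me"
  'ok)

;; Logical not, for the particular representation 1 / 0:
(define (logical-not s)
  (cond ((= s 0) 1)
        ((= s 1) 0)
        (else (error "Invalid signal" s))))
```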
A very simple program. Now, you have to imagine that the output wire has to be sensitive and know that when its signal changes, it may0:10:19
have to tell other guys, hey, wake up. My value has changed. So when you hook together inverter with an and-gate or0:10:29
something like that, there has to be a lot of communication going on in order to make sure that the signal propagates right. And down here is nothing very exciting.0:10:38
This is just the definition of logical not for some particular representations of the logical values-- 1, 0 in this case. And we can look at things more complicated like and-gates.0:10:49
And-gates take two inputs, A1 and A2, we'll call them, and produce an output. But the structure of the and-gate is identical to the0:10:59
one we just saw. There's one called an and-action procedure that's defined, which is the thing that gets called when an input0:11:08
is changed. And what it does, of course, is nothing more than compute the logical and of the signals on the inputs. And after some delay, called the and-gate delay, calls this0:11:20
procedure, which sets a signal on the output to a new value. Now, how I implement these things is all wishful thinking. As you see here, I have an assignment operation.0:11:32
It's not set. It's a derived assignment operation in the same way we had functions that were derived from CAR and CDR. So0:11:41
I, by convention, label that with an exclamation point. And over here, you see there's an action, which is to inform0:11:50
the wire, called A1 locally in this and-gate, to call the and-action procedure when it gets changed, and the wire A20:12:00
to call the and-action procedure when it gets changed. All very simple.0:12:09
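The and-gate, whose structure is identical to the inverter's (the helper `logical-and` is my assumption for "compute the logical and of the signals"):

```scheme
(define (and-gate a1 a2 output)
  (define (and-action-procedure)   ; called when either input changes
    (let ((new-value
           (logical-and (get-signal a1) (get-signal a2))))
      (after-delay and-gate-delay
                   (lambda ()
                     (set-signal! output new-value)))))
  (add-action! a1 and-action-procedure)  ; inform wire A1
  (add-action! a2 and-action-procedure)  ; inform wire A2
  'ok)

;; Assumed helper: logical and on the 1 / 0 representation.
(define (logical-and s1 s2)
  (if (and (= s1 1) (= s2 1)) 1 0))
```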
Well, let's talk a little bit about this communication that must occur between these various parts.0:12:18
Suppose, for example, I have a very simple circuit which contains an and with wires A and B. And that connects0:12:34
through a wire called C to an inverter which has a wire output called D. What are the comput...--here's0:12:46
the physical world. It's an abstraction of the physical world. Now, I can build this out of little pieces that you get at Radio Shack for a few cents. And there are boxes that act like this, which have little0:12:57
numbers on them like LS04 or something. Now supposing I were to try to say what's the0:13:06
computational model. What is the thing that corresponds to that, that part of reality in the mind of us and in the computer?0:13:15
Well, I have to assign for every object in the world an object in the computer, and for every relationship in the world between them a relationship in the computer.0:13:25
That's my goal. So let's do that. Well, I have some sort of thing called the signal, A.0:13:35
This is A. It's a signal. It's a cloudy thing like that. And I have another one down here which I'm going to call B. It's another signal.0:13:49
Now this signal--these two signals are somehow going to have to hook together into a box, let's call it this, which is the and-gate, action procedure.0:14:00
That's the and-gate's action procedure. And it's going to produce--well, it's going to0:14:09
interact with a signal object, which we call C--a wire0:14:18
object, excuse me, we call C. And then the-- this is going to put out again, or connect to, another action procedure which is one associated with the inverter0:14:28
in the world, not. And I'm going to have another--another wire, which0:14:39
we'll call D. So here's my layout of stuff. Now we have to say what's inside them and what they have to know to compute.0:14:51
Well, every--every one of these wires has to know what the value of the signal that's on that wire is. So there's going to be some variable inside here, we'll call it signal.0:15:02
And he owns a value. So there must be some environment associated with this. And for each one of these, there must be an environment that binds signal.0:15:15
And there must be a signal here, therefore. And presumably, signal is a value that's either 1 or 0.0:15:28
Now, we also have to have some list of people to inform if the signal here changes. We're going to have to inform this.0:15:39
So I've got that list. We'll call it the Action Procedures, AP. And it's presumably a list. But the first thing on the list, in this case, is this guy.0:15:50
And the action procedures of this one happens to have some list of stuff. There might be other people who are sharing A, who are looking at it.0:15:59
So there might be other guys on this list, like somebody over there that we don't know about. It's the other guy attached to A. And the action procedure here also has to point to that, the0:16:11
list of action procedures. And of course, that means this one, its action procedures has to point up to here. This is the things-- the people it has to inform.0:16:21
And this guy has some too. But I don't know what they are because I didn't draw it in my diagram. It's the things connected to D.0:16:30
Now, it's also the case that when the and-action procedure is awakened--when one of the people you've0:16:41
told, wake me up if your signal changes, does so--you have to go look and ask them what their signal is, so you can do the and, and produce a0:16:51
signal for this one. So there has to be, for example, information here saying A1, my A1 is this guy, and my A2 is this guy.0:17:08
And not only that, when I do my and, I'm going to have to tell this guy something. So I need an output--0:17:19
being this guy. And similarly, this guy's going to have a thing called0:17:29
the input that he interrogates to find out what the value of the signal on the input is, when the signal wakes up and0:17:39
says, I've changed, and sends a message this way saying, I've changed. This guy says, OK, what's your value now? When he gets that value, then he's going to have to say, OK,0:17:50
output changes this guy, changes this guy.0:18:00
And so on. And so I have to have at least that much connected-ness. Now, let's go back and look, for example, at the and-gate.0:18:10
Here we are back on this slide. And we can see some of these parts. For any particular and-gate, there is an A1, there is an A2, and the output.0:18:21
And those produce a frame, an environment that was created at the time and-gate was0:18:30
called--a frame where A1, A2, and output have as their values, are bound to, the wires which0:18:41
were passed in. In that environment, I constructed a procedure--0:18:50
this one right there. And-action procedure was constructed in that environment. That was the result of evaluating a lambda0:19:00
expression. So it hangs onto the frame where these were defined. Local--part of its local state is that.0:19:11
The and-action procedure, therefore, has access to A1, A2, and output as we see here. A1, A2, and output.0:19:22
Now, we haven't looked inside of a wire yet. That's all that remains. Let's look at a wire.0:19:33
Like the overhead, very good. Well, the wire, again, is a, is a somewhat complicated mess.0:19:43
Ooh, wrong one. It's a big complicated mess, like that. But let's look at it in detail and see what's going on.0:19:54
Well, the wire is one of these. And it has to have two things that are part of it, that it's state.0:20:05
One of them is the signal we see here. In other words, when we call make-wire to make a wire, then the first thing we do is we create some variables which0:20:15
are the signal and the action procedures for this wire. And in that context, we define various functions--or0:20:26
procedures, excuse me, procedures. One of them is called set-my-signal to a new value. And what that does is takes a new value in.0:20:37
If that's equal to my current value of my signal, I'm done. Otherwise, I set the signal to the new value and call each of the action procedures that I've been, that I've0:20:47
been--what's the right word?-- introduced to. I get introduced when the and-gate was applied to me.0:21:04
I add an action procedure at the bottom. Also, I have to define a way of accepting an action procedure-- which is what you see here-- which increments my action procedures using set! to the0:21:18
result of CONSing up a new procedure, which is passed to me, onto my action-procedures list. And for technical reasons, I have to call that procedure once.0:21:27
So I'm not going to tell you anything about that, that has to do with event-driven simulations and getting them started, which takes a little bit of thinking.0:21:36
And finally, I'm going to define a thing called the dispatcher, which is a way of passing a message to a wire,0:21:45
which is going to be used to extract from it various information, like what is the current signal value? What is the method of setting your signal?0:21:57
I want to get that out of it. How do I--how do I add another action procedure? And I'm going to return that dispatch, that0:22:08
procedure as a value. So the wire that I've constructed is a message-accepting object which accepts a message like, what's your method of adding action procedures?0:22:19
In fact, it'll give me a procedure, which is the add-action procedure, which I can then apply to an action procedure to install another action procedure in the wire.0:22:31
So that's a permission. So it's given me permission to change your action procedures. And in fact, you can see that over here.0:22:41
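Putting the pieces of the wire together, a sketch along the lines of the slide:

```scheme
(define (make-wire)
  (let ((signal 0)                   ; the wire's current value
        (action-procedures '()))     ; whom to inform on a change
    (define (set-my-signal! new-value)
      (if (= signal new-value)
          'done
          (begin (set! signal new-value)
                 (call-each action-procedures))))
    (define (accept-action-procedure! proc)
      (set! action-procedures (cons proc action-procedures))
      (proc))                        ; call it once, to get things started
    (define (dispatch m)             ; message-accepting object
      (cond ((eq? m 'get-signal) signal)
            ((eq? m 'set-signal!) set-my-signal!)
            ((eq? m 'add-action!) accept-action-procedure!)
            (else (error "Unknown operation -- WIRE" m))))
    dispatch))
```

The value returned is the dispatch procedure itself, so a "wire" is just a procedure you send messages to.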
Next slide. Ah. This is nothing very interesting. Calling each of the action procedures is just CDRing0:22:52
down a list. And I'm not going to even talk about that anymore. We're too advanced for that. However, if I want to get a signal from a wire, I ask the wire--0:23:02
which is, what is the wire? The wire is the dispatch returned by creating the wire. It's a procedure. I call that dispatch on the message get-signal.0:23:12
And what I should expect to get is a method of getting a signal. Or actually, I get the signal. If I want to set a signal, I want to change a signal, then0:23:25
what I'm going to do is take a wire as an argument and a new value for the signal, I'm going to ask the wire for permission to set its signal and use that permission, which0:23:35
is a procedure, on the new value. And if we go back to the overhead here, thank you, if0:23:44
we go back to the overhead here, we see that the method-- if I ask for the method of setting the signal, that's over here, it's set-my-signal, a procedure that's defined0:23:54
inside the wire, which if we look over here is the thing that says set my internal value called the signal, my internal variable, which is the signal, to the new value,0:24:08
which is passed to me as an argument, and then call each of the action procedures waking them up. Very simple.0:24:19
Going back to that slide, we also have the one last thing-- which I suppose now you can easily work out for yourself-- is the way you add an action.0:24:30
You take a wire--a wire and an action procedure. And I ask the wire for permission to add an action.0:24:40
Getting that permission, I use that permission to give it an action procedure. So that's a real object. There's a few more details about this.0:24:52
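Those permissions, written as procedures (with the uninteresting call-each included for completeness):

```scheme
;; Ask the wire for a method (a permission), then use it.
(define (get-signal wire)
  (wire 'get-signal))

(define (set-signal! wire new-value)
  ((wire 'set-signal!) new-value))

(define (add-action! wire action-procedure)
  ((wire 'add-action!) action-procedure))

;; Calling each action procedure is just CDRing down a list.
(define (call-each procedures)
  (if (null? procedures)
      'done
      (begin ((car procedures))
             (call-each (cdr procedures)))))
```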
For example, how am I going to control this thing? How do I do these delays?0:25:01
Let's look at that for a second. The next one here. Let's see. We know when we looked at the and-gate or the not-gate that0:25:15
when a signal changed on the input, there was a delay. And then it was going to call the procedure, which was going to change the output.0:25:26
Well, how are we going to do this? We're going to make up some mechanism, a fairly complicated mechanism at that, which we're going to have to be very careful about. But after a delay, we're going to do an action.0:25:37
A delay is a number, and an action is a procedure. What that's going to mean is there's going to be a special structure called an agenda, which is a thing that0:25:47
organizes time and actions. And we're going to see that in a while. I don't want to get into that right now. But the agenda has a moment at which--at which something happens.0:25:59
We're setting up for later at some moment, which is the sum of the time, which is the delay time plus the current time, which the agenda thinks is now.0:26:08
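That setup step, as a sketch (assuming a global agenda object, here called the-agenda):

```scheme
;; Schedule ACTION to happen DELAY time units after the agenda's
;; current notion of now.
(define (after-delay delay action)
  (add-to-agenda! (+ delay (current-time the-agenda))
                  action
                  the-agenda))
```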
We're going to set up to do this action, and add that to the agenda. And the way this machine will now run is very simple.0:26:18
We have a thing called propagate, which is the way things run. If the agenda is empty, we're done--if there's nothing more to be done.0:26:27
Otherwise, we're going to take the first item off the agenda, and that's a procedure of no arguments. So we're going to see extra parentheses here.0:26:36
We call that on no arguments. That takes the action. Then we remove that first item from the agenda, and we go0:26:45
around the propagation loop. So that's the overall structure of this thing. Now, there's a few other things we can look at.0:26:57
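That propagation loop, in code:

```scheme
(define (propagate)
  (if (empty-agenda? the-agenda)
      'done                                  ; nothing more to be done
      (begin
        ((first-agenda-item the-agenda))     ; extra parens: run the action
        (remove-first-agenda-item! the-agenda)
        (propagate))))
```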
And then we're going to look into the agenda a little while from now. Now the overhead again. Well, in order to set this thing going, I just want to show you some behavior out of this simulator.0:27:07
By the way, you may think this simulator is very simple, and probably too simple to be useful. The fact of the matter is that this simulator has been used to manufacture a fairly large computer.0:27:18
So this is a real live example. Actually, not exactly this simulator, because I'll tell you the difference. The difference is that there were many more different kinds0:27:28
of primitives. There's not just the word inverter or and-gate. There were things like edge-triggered, flip-flops,0:27:37
and latches, transparent latches, and adders, and things like that. And the difficulty with that is that there's pages and0:27:48
pages of the definitions of all these primitives with numbers like LS04. And then there's many more parameters for them. It's not just one delay.0:27:58
There's things like set up times and hold times and all that. But with the exception of that part of the complexity, the structure of the simulator that we use for building a0:28:07
real computer, that works is exactly what you're seeing here. Well in any case, what we have here is a few simple things.0:28:19
Like, there's inverter delays being set up and making a new agenda. And then we can make some inputs. There's input-1, input-2, a sum and a0:28:28
carry, which are wires. I'm going to put a special kind of object called a probe onto, onto some of the wires, onto sum and onto carry.0:28:37
A probe is an object that has the property that when you change a wire it's attached to, it types out a message.0:28:46
It's an easy thing to do. And then once we have that, of course, the way you put the probe on, the first thing it does, it says, the current value of the sum at time 0 is 0 because I just noticed it.0:28:59
And the value of the carry at time 0, this is the time, is 0. And then we go off and we build some structure.0:29:09
Like, we can build a structure here that says you have a half-adder on input-1, input-2, sum, and carry.0:29:18
And we're going to set the signal on input-1 to 1. We do some propagation. At time 8, which you could see going through this thing if you wanted to, the new value of sum became 1.0:29:29
And the thing says I'm done. That wasn't very interesting. But we can send it some more signals. Like, we set-signal on input-2 to be one. And at that time if we propagate, then it carried at0:29:39
11, the carry becomes 1, and at 16, the sum's new value becomes 0. And you might want to work out that, if you like, about the0:29:48
digital circuitry. It's true, and it works. And it's not very interesting. But that's the kind of behavior we get out of this thing.0:30:01
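A session like the one on the overhead might look roughly like this. The particular delay values and the probe interface are my assumptions about what the slide shows:

```scheme
;; Set up the simulation world.
(define the-agenda (make-agenda))
(define inverter-delay 2)
(define and-gate-delay 3)
(define or-gate-delay 5)

;; Make some inputs: input-1, input-2, a sum, and a carry.
(define input-1 (make-wire))
(define input-2 (make-wire))
(define sum (make-wire))
(define carry (make-wire))

;; Attach probes; each reports immediately that its wire is 0 at time 0.
(probe 'sum sum)
(probe 'carry carry)

;; Build some structure and drive it.
(half-adder input-1 input-2 sum carry)
(set-signal! input-1 1)
(propagate)    ; the probe reports sum becoming 1 at time 8
```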
So what I've shown you right now is a large-scale picture, how you, at a big scale, implement an0:30:10
event-driven simulation of some sort. And how you might organize it to have nice hierarchical structure allowing you to build abstract boxes that you0:30:20
can instantiate. But I haven't told you any of the details about how this agenda and things like that work. That we'll do next. And that's going to involve change and mutation of data0:30:32
and things like that. Are there any questions now, before I go on?0:30:47
Thank you. Let's take a break.0:31:28
Well, we've been making a simulation. And the simulation is an event-driven simulation where0:31:39
the objects in the world are the objects in the computer. And the changes of state that are happening in the world in time are organized to be time in the computer, so that if0:31:53
something happens after something else in the world, then the corresponding events happen in the same order in the computer.0:32:04
That's where we have assignments, when we make that alignment. Right now I want to show you a way of organizing time, which is an agenda or priority queue, it's sometimes called.0:32:16
We'll do some--we'll do a little bit of just understanding what are the things we need to be able to do to make agendas.0:32:28
And so we're going to have--and so right now over here, I'm going to write down a bunch of primitive operations for manipulating agendas. I'm not going to show you the code for them because they're0:32:38
all very simple, and you've got listings of all that anyway. So what do we have? We have things like make-agenda which produces a0:32:52
new agenda. We can ask--we get the current-time of an agenda,0:33:10
which gives me a number, a time. We can get--we can ask whether an agenda is empty,0:33:20
empty-agenda.0:33:30
And that produces either a true or a false.0:33:42
We can add an object to an agenda.0:33:52
Actually, what we add to an agenda is an operation--an action to be done. And that takes a time, the action itself, and the agenda0:34:03
I want to add it to. That inserts it in the appropriate place in the agenda. I can get the first item off an agenda, the first thing I0:34:14
have to do, which is going to give me an action.0:34:26
And I can remove the first item from an agenda. That's what I have to be able to do with agendas. That is a big complicated mess.0:34:42
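Those six primitives form the whole agenda interface. As a rough sketch of how they might fit together, in Python rather than the course's Scheme, with the representation and all names being my own choices and not the course code:

```python
import heapq
from collections import deque

class Agenda:
    """A sketch of the agenda: pending actions grouped by time."""

    def __init__(self):                 # make-agenda
        self.current_time = 0           # current-time of the agenda
        self._segments = {}             # time -> queue of actions at that time
        self._times = []                # heap of pending segment times

    def is_empty(self):                 # the empty test: true or false
        return not self._times

    def add(self, time, action):        # add an action, at a time, to the agenda
        if time not in self._segments:
            self._segments[time] = deque()
            heapq.heappush(self._times, time)
        self._segments[time].append(action)

    def first_item(self):               # first thing to do: gives an action
        t = self._times[0]
        self.current_time = t           # the clock advances to that segment
        return self._segments[t][0]

    def remove_first_item(self):        # remove the first item from the agenda
        t = self._times[0]
        queue = self._segments[t]
        queue.popleft()
        if not queue:                   # segment exhausted, drop it
            del self._segments[t]
            heapq.heappop(self._times)
```

A propagate loop would then just take the first item, run it, remove it, and repeat until the agenda is empty.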
Well, let's see how we can organize this thing as a data structure a bit.0:34:52
Well, an agenda is going to be some kind of list. And it's going to be a list that I'm going to have to be able to modify.0:35:01
So we have to talk about modifying lists, because I'm going to add things to it, and delete things from it, and things like that.0:35:11
It's organized by time. It's probably good to keep it in sorted order. But sometimes there are lots of things that happen at the0:35:22
same time--approximately the same time. What I have to do is group things by the time at which they're supposed to happen. So I'm going to make an agenda as a list of segments.0:35:32
And so I'm going to draw you a data structure for an agenda, a perfectly reasonable one. Here's an agenda.0:35:41
It's a thing that begins with a name. I'm going to do it right now out of list structure.0:35:52
It's got a header. There's a reason for the header. We're going to see the reason soon. And it will have a segment.0:36:03
It will be a list of segments. Supposing this agenda has two segments, they're the0:36:13
successive CARs of this list. Each segment is going to have a time--0:36:24
say for example, 10-- that says that the things that happen in this segment are at time 10.0:36:33
And what I'm going to have in here is another data structure which I'm not going to describe, which is a queue of things to do at time 10.0:36:42
It's a queue. And we'll talk about that in a second. But abstractly, the queue is just a list of things to do at a particular time. And I can add things to a queue.0:36:53
This is a queue. There's a time, there's a segment.0:37:02
Now, I may have another segment in this agenda. Supposing this is stuff that happens at time 30.0:37:13
It has, of course, another queue of things that are queued up to be done at time 30.0:37:23
Well, there are various things I have to be able to do to an agenda. Supposing I want to add to an agenda another thing to be done at time 10.0:37:33
Well, that's not very hard. I'm going to walk down here, looking for the segment of time 10. It is possible that there is no segment of time 10.0:37:42
We'll cover that case in a second. But if I find a segment of time 10, then if I want to add another thing to be done at time 10, I just0:37:51
increase that queue-- "just increase" isn't such an obvious idea. But I increase the things to be done at that time.0:38:01
Now, supposing I want to add something to be done at time 20. There is no segment for time 20. I'm going to have to create a new segment.0:38:11
I want my time 20 segment to exist between time 10 and time 30. Well, that takes a little work.0:38:20
I'm going to have to do a CONS. I'm going to have to make a new element of the agenda list--list of segments.0:38:33
I'm going to have to change. Here's change. I'm going to have to change the CDR of the CDR of the0:38:42
agenda to point at a new CONS of the new segment and the CDR of the CDR of the CDR of the agenda, the CDDDR.0:38:56
And this is going to have a new segment now of time 20 with its own queue, which now has one element in it.0:39:10
If I wanted to add something at the end, I'm going to have to replace the CDR of this list with something.0:39:20
We're going to have to change that piece of data structure. So I'm going to need new primitives for doing this. But I'm just showing you why I need them.0:39:29
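The CDR-rewiring just described--walking down the segments and splicing a new time-20 segment in between 10 and 30, with the header cell giving a place to mutate even at the front--can be sketched like this, with Python two-element lists standing in for mutable pairs (a hypothetical transcription, not the course code):

```python
def cons(a, d): return [a, d]     # a mutable two-part cell
def car(p): return p[0]
def cdr(p): return p[1]
def set_cdr(p, v): p[1] = v       # like set-cdr!

def make_segment(time):
    return cons(time, [])         # a (time . queue) pair; queue starts empty

def insert_segment(agenda, time):
    """Splice a new segment for a new time into the sorted segment list.
    The agenda starts with a header cell, so inserting at the very front
    is the same CDR change as inserting anywhere else."""
    prev = agenda                 # the header gives us a place to change
    while cdr(prev) is not None and car(car(cdr(prev))) < time:
        prev = cdr(prev)          # walk down, looking for where time fits
    segment = make_segment(time)
    # the CONS plus the CDR change: splice the segment in
    set_cdr(prev, cons(segment, cdr(prev)))
    return segment
```

With segments at times 10 and 30 already in place, inserting at 20 lands between them, and inserting at 5 lands at the front, via the header.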
And finally, if I wanted to add a thing to be done at time 5, I'm going to have to change this one, because I'm going to0:39:41
have to add it in over here, which is why I planned ahead and had a header cell. If I'm going to change things, I have to have0:39:50
a place to make the change. If I remove things from the agenda, that's not so hard.0:40:02
Removing them from the beginning is pretty easy, which is the only case I have. I can go looking for the first segment.0:40:11
I see if it has a non-empty queue. If it has a non-empty queue, well, I'm going to delete one element from the queue, like that.0:40:20
If the queue ever becomes empty, then I have to delete the whole segment. And then this changes to point to here. So it's quite a complicated data structure manipulation0:40:30
going on, the details of which are not really very exciting. Now, let's talk about queues. They're similar.0:40:41
Because each of these agendas has a queue in it. What's a queue? A queue is going to have the following primitive0:40:51
operations. To make a queue, this gives me a new queue.0:41:07
I'm going to have to be able to insert into a queue a new item.0:41:24
I'm going to have to be able to delete from a queue the first item in the queue.0:41:39
And I want to be able to get the first thing in the queue0:41:51
from some queue. I also have to be able to test whether a queue is empty.0:42:07
And when you invent things like this, I want you to be very careful to use the kinds of conventions I use for naming things. Notice that I'm careful to say these change something and0:42:18
that tests it. And presumably, I did the same thing over here. OK, and there should be an empty test over here.0:42:29
OK, well, how would I make a queue? A queue wants to be something I can add to at the end of, and pick up the thing at the beginning of. I should be able to delete from the beginning0:42:39
and add to the end. Well, I'm going to show you a very simple structure for that. We can make this out of CONSes as well. Here's a queue.0:42:49
It has a queue header, which contains two parts-- a front pointer and a rear pointer.0:43:02
And here I have a queue with two items in it. The first item, I don't know, it's perhaps a 1.0:43:12
And the second item, I don't know, let's give it a 2.0:43:21
The reason why I want two pointers in here, a front pointer and a rear pointer, is so I can add to the end without having to chase down from the beginning.0:43:31
So for example, if I wanted to add one more item to this queue, if I want to add on another item to be worried0:43:40
about later, all I have to do is make a CONS, which contains that item, say a 3. That's for inserting 3 into the queue.0:43:51
Then I have to change this pointer here to here.0:44:00
And I have to change this one to point to the new rear.0:44:09
If I wish to take the first element of the queue, the first item, I just go chasing down the front pointer until I find the first one and pick it up.0:44:18
If I wish to delete the first item from the queue, delete-queue, all I do is move the front pointer along this way.0:44:27
The new front of the queue is now this. So queues are very simple too. So what you see now is that I need a certain number of new0:44:39
primitive operations. And I'm going to give them some names. And then we're going to look into how they work, and how they're used. We can set the CAR of some pair, or a thing produced by0:44:56
CONSing, to a new value. And set the CDR of a pair to a new value.0:45:12
I needed setting the CAR over here to delete the first element of the queue. This is the CAR, and I had to set it.0:45:23
I had to be able to set the CDR to be able to move the rear pointer, or to be able to increment the queue here. All of the operations I did were made out of those that I0:45:33
just showed you on the last blackboard. Good. Let's pause now, and take a little break then.0:46:38
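Before moving on: the queue with a front pointer and a rear pointer, built out of nothing but pairs and those two mutators, might be sketched as follows (Python lists standing in for pairs; the names are my own, not necessarily SICP's):

```python
def cons(a, d): return [a, d]     # a mutable two-part cell
def car(p): return p[0]
def cdr(p): return p[1]
def set_car(p, v): p[0] = v       # like set-car!
def set_cdr(p, v): p[1] = v       # like set-cdr!

def make_queue():
    return cons(None, None)       # header: (front-pointer . rear-pointer)

def empty_queue(q):
    return car(q) is None         # empty when the front pointer is empty

def insert_queue(q, item):
    """Add at the rear without chasing down from the beginning."""
    cell = cons(item, None)
    if empty_queue(q):
        set_car(q, cell)          # front pointer -> the new cell
        set_cdr(q, cell)          # rear pointer -> the new cell
    else:
        set_cdr(cdr(q), cell)     # old rear's CDR -> the new cell
        set_cdr(q, cell)          # rear pointer -> the new rear

def front_queue(q):
    return car(car(q))            # the item the front pointer points at

def delete_queue(q):
    set_car(q, cdr(car(q)))       # just move the front pointer along
```

Insertion at the rear never walks the list, and deletion just moves the front pointer, exactly as described.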
When we originally introduced pairs made out of CONS, made by CONS, we only said a few axioms about them, which were0:46:48
of the form-- what were they-- for all X and Y, the CAR of the CONS of X and Y is X and0:47:06
the CDR of the CONS of X and Y is Y. Now, these say nothing0:47:15
about whether a CONS has an identity like a person. In fact, all they say is something sort of abstract,0:47:25
that a CONS is the parts it's made out of. And of course, if two things are made out of the same parts, they're the same, at least from the point of view of0:47:34
these axioms. But by introducing assignment-- in fact, mutable data is a kind of assignment, we have a0:47:43
set CAR and a set CDR-- by introducing those, these axioms no longer tell the whole story. And they're still true if written exactly like this.0:47:53
But they don't tell the whole story. Because if I'm going to set a particular CAR in a particular CONS, the questions are, well, is that setting all CARs and0:48:05
all CONSes of the same two things or not? If we use CONSes to make up things like rational numbers, things like 3 over 4, supposing I had two0:48:19
three-fourths. Are they the same one-- or are they different? Well, in the case of numbers, it doesn't matter. Because there's no meaning to changing the0:48:29
denominator of a number. What you could do is make a number which has a different denominator. But the concept of changing a number so that it has a0:48:38
different denominator is very weird, and sort of not supported by what you think of as mathematics. However, when these CONSes represent things in the physical world, then changing something like the CAR is like0:48:50
removing a piece of the fingernail. And so CONSes have an identity. Let me show you what I mean about identity, first of all.0:49:01
Let's do a little example here. Supposing I define A to be the CONS of 1 and 2.0:49:18
Well, what that means, first of all, is that somewhere in some environment I've made a symbol A to have a value which0:49:27
is a pair consisting of a pointer to a 1 and a pointer to a 2, just like that.0:49:38
Now, supposing I also say define B to be the CONS--0:49:53
it doesn't matter, but I like it better, it's prettier-- of A and A.0:50:03
Well, first of all, I'm using the name A twice. At this moment, I'm going to think of CONSes as having identity. This is the same one.0:50:13
And so what that means is I make another pair, which I'm going to call B. And it contains two pointers to A. At0:50:29
this point, I have three names for this object. A is its name. The CAR of B is its name. And the CDR of B is its name.0:50:39
It has several aliases, they're called. Now, supposing I do something like set-car!-- set the CAR of0:51:01
the CAR of B to 3.0:51:12
What that means is I find the CAR of B, that's this. I set the CAR of that to be 3, changing this.0:51:24
I've changed A. If I were to ask what's the CAR of A now?0:51:35
I would get out 3, even though here we see that A was the CONS of 1 and 2.0:51:45
I caused A to change by changing B. There is sharing here. That's sometimes what we want.0:51:54
Surely in the queues and things like that, that's exactly what we organized our data structures to facilitate-- sharing.0:52:04
But inadvertent sharing, unanticipated interactions between objects, is the source of most of the bugs that occur in complicated programs. So by introducing this possibility0:52:17
of things having identity and sharing and having multiple names for the same thing, we get a lot of power. But we're going to pay for it with lots of0:52:27
complexity and bugs. So, for example, just to drive that home, look at the CADR of B, which apparently has nothing to do0:52:43
with even the CAR of B. The CADR of B, what's that? Take the CDR of B and now take the CAR of that.0:52:53
Oh, that's 3 also. So I can have non-local interactions by sharing. And I have to be very careful of that.0:53:06
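That whole A-and-B episode can be replayed with the same pair-as-mutable-cell sketch (Python lists standing in for pairs; a translation of the blackboard example, not the course code):

```python
def cons(a, d): return [a, d]   # a mutable two-part cell
def car(p): return p[0]
def cdr(p): return p[1]
def set_car(p, v): p[0] = v     # like set-car!

a = cons(1, 2)
b = cons(a, a)                  # the CAR of b and the CDR of b are both a

set_car(car(b), 3)              # change through one of the aliases...

assert car(a) == 3              # ...and A has changed
assert car(cdr(b)) == 3         # the CADR of b is 3 as well, by sharing
```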
Well, so far, of course, it seems I've introduced several different assignment operators-- set, set CAR, set CDR. Well, maybe I should just get rid of0:53:19
set CAR and set CDR. Maybe they're not worthwhile. Well, the answer is that once you let the camel's nose into the tent, the rest of him follows.0:53:30
All I have to have is set, and I can make all of the bad things happen. Let's play with that a little bit.0:53:40
A couple of days ago, when we introduced compound data, you saw Hal show you a definition of CONS in terms0:53:49
of a message acceptor. I'm going to show you even a more horrible thing, a definition of CONS in terms of nothing but air, hot air.0:54:04
What is the definition of CONS, of the old functional kind, in terms of nothing but lambda expressions,0:54:13
procedures? Because I'm going to then modify this definition to get assignment to be only one kind of assignment, to get rid of0:54:25
the set CAR and set CDR in terms of set. So what if I define CONS of X and Y to be a procedure of one0:54:41
argument called a message M, which calls that message on X and Y?0:54:51
This idea was invented by Alonzo Church, who was the greatest programmer of the 20th century, although he never saw a computer. It was done in the 1930s. He was a logician, I suppose at Princeton at the time.0:55:08
Define CAR of X to be the result of applying X to that procedure of two arguments, A and D, which selects A. I will0:55:24
define CDR of X to be that procedure, to be the result of0:55:36
applying X to that procedure of A and D, which selects D.0:55:46
Now, you may not recognize this as CAR, CDR, and CONS. But I'm going to demonstrate to you that it satisfies the original axioms, just once.0:55:55
And then we're going to do some playing of games. Consider the problem CAR of CONS of, say, 35 and 47.0:56:09
Well, what is that? It is the result of taking car of the result of substituting 35 and 47 for X and Y in the body of this.0:56:19
Well, that's easy enough. That's CAR of the result of substituting into lambda of M, M of 35 and 47.0:56:35
Well, what this is, is the result of substituting this object for X in the body of that. So that's just lambda of M--0:56:48
that's substituted, because this object is being substituted for X, which is the beginning of a list, lambda of M--0:56:57
M of 35 and 47, applied to that procedure of A and D,0:57:07
which gives me A. Well, that's the result of substituting this for M here. So that's the same thing as lambda of A, D, A,0:57:22
applied to 35 and 47. Oh, well, that's 35. That's substituting 35 for A and 47 for D in A. So I0:57:36
don't need any data at all, not even numbers. This is Alonzo Church's hack.0:57:52
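Church's functional CONS translates almost word for word into any language with lambda. A minimal Python sketch of the same hack:

```python
def cons(x, y):
    # a "pair" is a procedure of one argument m, which calls m on x and y
    return lambda m: m(x, y)

def car(p):
    return p(lambda a, d: a)    # apply the pair to a selector of the first

def cdr(p):
    return p(lambda a, d: d)    # ...or a selector of the second

# the axioms hold: CAR of CONS of 35 and 47 is 35
print(car(cons(35, 47)))        # -> 35
print(cdr(cons(35, 47)))        # -> 47
```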
Well, now we're going to do something nasty to him. Being a logician, he wouldn't like this. But as programmers, let's look at the overhead.0:58:03
And here we go. I'm going to change the definition of CONS. It's almost the same as Alonzo Church's, but not quite.0:58:14
What do we have here? The CONS of two arguments, X and Y, is going to be that procedure of one argument M, which applies M to X and Y as0:58:25
before, but also to two permissions, the permission to set X to N and the permission to set Y to N, given that I0:58:35
have an N. So besides the things that I had here in Church's0:58:44
definition, what I have is that the thing that CONS returns will apply its argument to not just the0:58:55
values of the X and Y that the CONS is made of, but also permissions to set X and Y to new values.0:59:06
Now, of course, just as before, CAR is exactly the same. The CAR of X is nothing more than applying X, as in Church's definition, to a procedure, in this case, of0:59:18
four arguments, which selects out the first one. And just as we did before, that will be the value of X0:59:28
that was contained in the procedure which is the result of evaluating this lambda expression in the environment where X and Y are defined over here.0:59:41
That's the value of CONS. Now, however, the exciting part. CDR, of course, is the same. The exciting part, set CAR and set CDR. Well, they're nothing0:59:54
very complicated anymore. Set CAR of a CONS X to a new value Y is nothing more than applying that CONS, which is the1:00:06
procedure of one argument which applies its argument to four things, to a procedure of four arguments--1:00:15
the value of X, the value of Y, the permission to set X, the permission to set Y-- and using that permission to set X to the new value.1:00:31
And similarly, set-cdr is the same thing. So what you've just seen is that I didn't introduce any new primitives at all.1:00:40
Whether or not I want to implement it this way is a matter of engineering. And the answer is of course I don't implement it this way for reasons that have to do with engineering.1:00:51
However in principle, logically, once I introduced one assignment operator, I've assigned--I've introduced them all.1:01:05
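The modified CONS, with its two permissions, also translates directly; in a Python sketch, the permissions are closures that reassign the enclosed X and Y (via nonlocal). Names are my own, not the course code:

```python
def cons(x, y):
    # permissions: closures that may reassign the enclosed x and y
    def set_x(n):
        nonlocal x
        x = n
    def set_y(n):
        nonlocal y
        y = n
    # the pair applies its message to the values *and* the two permissions
    return lambda m: m(x, y, set_x, set_y)

def car(p):
    return p(lambda a, d, sa, sd: a)    # select the first value, as before

def cdr(p):
    return p(lambda a, d, sa, sd: d)

def set_car(p, n):
    p(lambda a, d, sa, sd: sa(n))       # use the permission to set x

def set_cdr(p, n):
    p(lambda a, d, sa, sd: sd(n))       # use the permission to set y
```

Once one assignment operator (here, plain reassignment of a closed-over variable) is in the language, the pair mutators come for free.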
Are there any questions? Yes, David. AUDIENCE: I can follow you up until you get--I can follow1:01:14
all of that. But when we bring in the permissions, defining CONS in terms of the lambda N, I don't follow where N gets passed.1:01:24
PROFESSOR: Oh, I'm sorry. I'll show you. Let's follow it. Of course, we could do it on the blackboard. It's not so hard. But it's also easy here. Supposing I wish to set-cdr of X to Y. See that right there.1:01:38
set-cdr of X to Y. X is presumably a CONS, a thing resulting from evaluating CONS. Therefore X comes from a place over here, that that X is of1:01:54
the result of evaluating this lambda expression. Right? That when I evaluated that lambda expression, I evaluated1:02:04
it in an environment where the arguments to CONS were defined. That means that as free variables in this lambda1:02:14
expression, there is the--there are in the frame, which is the parent frame of this lambda expression, the1:02:23
procedure resulting from this lambda expression, X and Y have places. And it's possible to set them. I set them to an N, which is the argument of the1:02:35
permission. The permission is a procedure which is passed to M, which is the argument that the CONS object gets passed.1:02:47
Now, let's go back here to the set-cdr. The CONS object, which is the first argument of set-cdr,1:02:56
gets passed an argument. That's a procedure of four things, indeed, because that's the same thing as this M over here, which is applied1:03:05
to four objects. The object over here, SD, is, in fact, this permission.1:03:15
When I use SD, I apply it to Y, right there. So that comes from this.1:03:25
AUDIENCE: So what do you-- PROFESSOR: So to finish that, the N that was here is the Y which is here.1:03:34
How's that? AUDIENCE: Right, OK. Now, when you do a set-cdr, X is the value the CDR is going to become. PROFESSOR: The X over here.1:03:44
I'm sorry, that's not true. The X is--set-cdr has two arguments-- The CONS I'm changing and the value I'm changing it to.1:03:56
So you have them backwards, that's all. Are there any other questions?1:04:07
Well, thank you. It's time for lunch.0:00:00
Lecture 6A | MIT 6.001 Structure and Interpretation, 1986
PROFESSOR: Well, last time Gerry really let the cat out of the bag. He introduced the idea of assignment. Assignment and state.0:00:37
And as we started to see, the implications of introducing assignment and state into the language are absolutely frightening. First of all, the substitution model of0:00:47
evaluation breaks down. And we have to use this much more complicated environment model and this very mechanistic thing with diagrams, even to say what statements in the programming0:00:56
language mean. And that's not a mere technical point. See, it's not that we had this particular substitution model and, well, it doesn't quite work, so we have to do0:01:05
something else. It's that nothing like the substitution model can work. Because suddenly, a variable is not just something that0:01:15
stands for a value. A variable now has to somehow specify a place that holds a value. And the value that's in that place can change.0:01:30
Or for instance, an expression like f of x might have a side0:01:39
effect in it. So if we say f of x and it has some value, and then later we say f of x again, we might get a different value0:01:48
depending on the order. So suddenly, we have to think not only about values but about time.0:01:57
And then things like pairs are no longer just their CARs and their CDRs. A pair now is not quite its CAR and its CDR. It's rather0:02:06
its identity. So a pair has identity. It's an object.0:02:21
And two pairs that have the same CAR and CDR might be the same or different, because suddenly we have to worry about sharing.0:02:34
So all of these things enter as soon as we introduce assignment. See, this is a really far cry from where we started with0:02:43
substitution. It's a technically harder way of looking at things because we have to think more mechanistically about our0:02:52
programming language. We can't just think about it as mathematics. It's philosophically harder, because suddenly there are all these funny issues about what does it mean that something0:03:02
changes or that two things are the same. And also, it's programming harder, because as Gerry showed last time, there are all these bugs having to do with bad sequencing and aliasing that just don't exist0:03:14
in a language where we don't worry about objects. Well, how'd we get into this mess?0:03:23
Remember what we did, the reason we got into this is because we were looking to build modular systems. We0:03:35
wanted to build systems that fall apart into chunks that seem natural. So for instance, we want to take a random number generator0:03:46
and package up the state of that random number generator inside of it so that we can separate the idea of picking random numbers from the general Monte Carlo strategy0:03:56
of estimating something and separate that from the particular way that you work with random numbers in that formula developed by Cesaro for pi.0:04:06
And similarly, when we go off and construct some models of things, if we go off and model a system that we see in the0:04:15
real world, we'd like our program to break into natural pieces, pieces that mirror the parts of the system that we see in the real world.0:04:24
So for example, if we look at a digital circuit, we say, gee, there's a circuit and it has a piece and0:04:33
it has another piece. And these different pieces sort of have identity.0:04:43
They have state. And the state sits on these wires. And we think of this piece as an object that's different from that as an object.0:04:52
And when we watch the system change, we think about a signal coming in here and changing a state that might be here and going here and interacting with a state that might be stored there, and so on and so on.0:05:06
So what we'd like is we'd like to build in the computer systems that fall into pieces that mirror our view of0:05:17
reality, of the way that the actual systems we're modeling seem to fall into pieces. Well, maybe the reason that building systems like this0:05:28
seems to introduce such technical complications has nothing to do with computers. See, maybe the real reason that we pay such a price to0:05:37
write programs that mirror our view of reality is that we have the wrong view of reality. See, maybe time is just an illusion, and0:05:47
nothing ever changes. See, for example, if I take this chalk, and we say, gee, this is an object and it has a state. At each moment it has a position and a velocity.0:05:59
And if we do something, that state can change. But if you studied any relativity, for instance, you know that you don't think of the path of that chalk as0:06:09
something that goes on instant by instant. It's more insightful to think of that whole chalk's existence as a path in space-time that's all splayed out.0:06:18
There aren't individual positions and velocities. There's just its unchanging existence in space-time. Similarly, if we look at this electrical system, if we0:06:28
imagine this electrical system is implementing some sort of signal processing system, the signal processing engineer who put that thing together doesn't think of it as, well,0:06:39
at each instance there's a voltage coming in. And that translates into something. And that affects the state over here, which changes the state over here. Nobody putting together a signal processing system0:06:49
thinks about it like that. Instead, you say there's this signal that's splayed out over time.0:06:58
And if this is acting as a filter, this whole thing transforms this whole thing for some sort of other output.0:07:09
You don't think of it as what's happening instant by instant as the state of these things. And somehow you think of this box as a whole thing, not as little pieces sending messages of state to each other at0:07:20
particular instants. Well, today we're going to look at another way to0:07:30
decompose systems that's more like the signal processing engineer's view of the world than it is like thinking about objects that communicate sending messages.0:07:41
That's called stream processing.0:07:54
And we're going to start by showing how we can make our programs more uniform and see a lot more commonality if we0:08:08
throw out of these programs what you might say is an inordinate concern with worrying about time.0:08:17
Let me start by comparing two procedures. The first one does this. We imagine that there's a tree.0:08:30
Say there's a tree of integers. It's a binary tree.0:08:39
So it looks like this. And there's integers in each of the nodes. And what we would like to compute is for each odd number0:08:51
sitting here, we'd like to find the square and then sum up all those squares. Well, that should be a familiar kind of thing. There's a recursive strategy for doing it.0:09:02
We look at each leaf, and either it's going to contribute the square of the number if it's odd or 0 if it's even. And then recursively, we can say at each tree, the sum of0:09:13
all of them is the sum coming from the right branch and the left branch, and recursively down through the nodes. And that's a familiar way of thinking about programming. Let's actually look at that on the slide.0:09:23
We say to sum the odd squares in a tree, well, there's a test. Either it's a leaf node, and we're going to check to see if it's an integer, and then either it's odd, in which case0:09:34
we take the square, or else it's 0. And then the sum of the whole thing is the sum coming from the left branch and the right branch.0:09:46
OK, well, let me contrast that with a second problem. Suppose I give you an integer n, and then some function to0:09:55
compute of each of the first n integers, 1 through n. And then I want to collect together in a list all those function values that satisfy some property.0:10:05
That's a general kind of thing. Let's say to be specific, let's imagine that for each integer, k, we're going to compute the k-th Fibonacci number.0:10:14
And then we'll see which of those are odd and assemble those into a list. So here's a procedure that does that.0:10:23
Find the odd Fibonacci numbers among the first n. And here is a standard loop the way we've been writing it. This is a recursion. It's a loop on k, and says if k is bigger than n, it's the0:10:33
empty list. Otherwise we compute the k-th Fibonacci number, call that f. If it's odd, we CONS it on to the list starting0:10:45
with the next one. And otherwise, we just take the next one. And this is the standard way we've been writing iterative loops. And we start off calling that loop with 1.0:10:57
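For concreteness, here is a rough Python transcription of the two procedures--binary trees as pairs of subtrees, leaves as integers, and an iterative fib; the details are my own, not the lecture's Scheme:

```python
def fib(n):
    """The n-th Fibonacci number, counting fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def sum_odd_squares(tree):
    """Leaf test, odd test, square; the recursion does the enumerating."""
    if isinstance(tree, int):                     # leaf node
        return tree * tree if tree % 2 == 1 else 0
    left, right = tree                            # a binary tree node
    return sum_odd_squares(left) + sum_odd_squares(right)

def odd_fibs(n):
    """Loop on k from 1 to n, keeping the odd Fibonacci numbers."""
    result = []
    for k in range(1, n + 1):
        f = fib(k)
        if f % 2 == 1:
            result.append(f)
    return result
```

The two look structurally very different, even though, as the signal-flow view below argues, they are doing much the same thing.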
OK, so there are two procedures. Those procedures look very different. They have very different structures. Yet from a certain point of view, those procedures are0:11:07
really doing very much the same thing. So if I was talking like a signal processing engineer, what I might say is that the first procedure enumerates the0:11:25
leaves of a tree. And then we can think of a signal coming out of that, which is all the leaves.0:11:35
We'll filter them to see which ones are odd, put them through some kind of filter.0:11:45
We'll then put them through a kind of transducer. And for each one of those things, we'll take the square.0:11:54
And then we'll accumulate all of those. We'll accumulate them by sticking them together with addition starting from 0.0:12:07
That's the first program. The second program, I can describe in a very, very similar way. I'll say, we'll enumerate the numbers on this interval, for0:12:17
the interval 1 through n. We'll, for each one, compute the Fibonacci number, put them0:12:28
through a transducer. We'll then take the result of that, and we'll filter it for oddness. And then we'll take those and put them into an accumulator.0:12:39
This time we'll build up a list, so we'll accumulate with CONS starting from the empty list. So this way of looking at the program makes the two seem0:12:50
very, very similar. The problem is that that commonality is completely obscured when we look at the procedures we wrote. Let's go back and look at sum-odd-squares again, and say0:13:02
things like, where's the enumerator? Where's the enumerator in this program? Well, it's not in one place.0:13:11
It's a little bit in this leaf-node test, which is going to stop. It's a little bit in the recursive structure of the thing itself.0:13:23
Where's the accumulator? The accumulator isn't in one place either. It's partly in this 0 and partly in this plus.0:13:32
It's not there as a thing that we can look at. Similarly, if we look at odd Fibs, that's also, in some sense, an enumerator and an accumulator, but0:13:42
it looks very different. Because partly, the enumerator is here in this greater than sign in the test. And partly it's in this whole recursive0:13:52
structure in the loop, and the way that we call it. And then similarly, that's also mixed up in there with the accumulator, which is partly over there and partly0:14:01
over there. So these very, very natural pieces, these very natural boxes here don't appear in our programs. Because they're kind0:14:13
of mixed up. The programs don't chop things up in the right way. Going back to this fundamental principle of computer science0:14:22
that in order to control something, you need the name of it, we don't really have control over thinking about things this way because we don't have our hands in them explicitly.0:14:31
We don't have a good language for talking about them. Well, let's invent an appropriate language in which0:14:42
we can build these pieces. The key to the language is these guys, is what is these things I called signals? What are these things that are flying on the0:14:52
arrows between the boxes? Well, those things are going to be data structures called0:15:02
streams. That's going to be the key to inventing this language. What's a stream? Well, a stream is, like anything else, a data abstraction.0:15:12
So I should tell you what its selectors and constructors are. For a stream, we're going to have one constructor that's called CONS-stream.0:15:25
CONS-stream is going to put two things together to form a thing called a stream. And then to extract things from the stream, we're going0:15:34
to have a selector called the head of the stream. So if I have a stream, I can take its head or I can take its tail.0:15:44
And remember, I have to tell you George's contract here to tell you what the axioms are that relate these.0:15:53
And it's going to be for any x and y, if I form the0:16:04
CONS-stream and take the head, the head of CONS-stream of x and y is going to be x and the tail of CONS-stream of x and y0:16:26
is going to be y. So those are the constructor, two selectors for streams, and an axiom. There's something fishy here.0:16:36
So you might notice that these are exactly the axioms for CONS, CAR, and CDR. If instead of writing CONS-stream I wrote0:16:46
CONS and I said head was the CAR and tail was the CDR, those are exactly the axioms for pairs. And in fact, there's another thing here.0:16:55
We're going to have a thing called the-empty-stream, which is like the-empty-list.0:17:08
So why am I introducing this terminology? Why don't I just keep talking about pairs and lists? Well, we'll see. For now, if you like, why don't you just pretend that0:17:18
streams really are just a terminology for lists. And we'll see in a little while why we want to keep this extra abstraction layer and not just call them lists.0:17:32
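Taking the lecturer's suggestion to pretend, for now, that streams are just pairs, the contract can be sketched in a few lines. This is Python rather than the lecture's Scheme (a translation made here for illustration; the names mirror CONS-stream, head, and tail):

```python
# Streams as a data abstraction: one constructor, two selectors, and axioms.
# For now, as the lecture suggests, a stream is just an ordinary pair,
# so these behave exactly like CONS, CAR, and CDR.

the_empty_stream = ()          # plays the role of the-empty-stream

def cons_stream(x, y):
    """Constructor: glue x and y together into a stream."""
    return (x, y)

def head(s):
    """Selector: the first part of the pair (like CAR)."""
    return s[0]

def tail(s):
    """Selector: the rest (like CDR)."""
    return s[1]

# The axioms: for any x and y,
#   head(cons_stream(x, y)) is x
#   tail(cons_stream(x, y)) is y
```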
OK, now that we have streams, we can start constructing the pieces of the language to operate on streams. And there are a whole bunch of very useful things that we could0:17:41
start making. For instance, we'll make our map box to take a stream, s,0:17:54
and a procedure, and to generate a new stream which has as its elements the procedure applied to all the0:18:03
successive elements of s. In fact, we've seen this before. This is the procedure map that we did with lists. And you see it's exactly map, except we're testing for0:18:14
empty-stream. Oh, I forgot to mention that. Empty-stream is like the null test. So if it's empty, we generate the empty stream. Otherwise, we form a new stream whose first element is0:18:24
the procedure applied to the head of the stream, and whose rest is gotten by mapping along with the procedure down the tail of the stream.0:18:33
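That map over streams can be sketched as follows, again in Python standing in for the lecture's Scheme, with streams as plain pairs per the pretend-they're-lists view from earlier:

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def map_stream(proc, s):
    """A new stream whose elements are proc applied to each element of s."""
    if is_empty_stream(s):
        return the_empty_stream
    # new head: proc of the head; new rest: mapping proc down the tail
    return cons_stream(proc(head(s)),
                       map_stream(proc, tail(s)))
```

For instance, mapping squaring down the stream 1, 2, 3 gives the stream 1, 4, 9.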
So that looks exactly like the map procedure we looked at before. Here's another useful thing. Filter, this is our filter box. We're going to have a predicate and a stream.0:18:43
We're going to make a new stream that consists of all the elements of the original one that satisfy the predicate. That's case analysis. When there's nothing in the stream, we0:18:53
return the empty stream. We test the predicate on the head of the stream. And if it's true, we add the head of the stream onto the0:19:03
result of filtering the tail of the stream. And otherwise, if that predicate was false, we just filter the tail of the stream.0:19:13
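The filter just described, as the same kind of Python sketch (streams as plain pairs for now):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def filter_stream(pred, s):
    """The stream of elements of s that satisfy pred, by case analysis."""
    if is_empty_stream(s):
        return the_empty_stream
    if pred(head(s)):
        # keep the head, add it onto the filtered tail
        return cons_stream(head(s), filter_stream(pred, tail(s)))
    # otherwise just filter the tail
    return filter_stream(pred, tail(s))
```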
Right, so there's filter. Let me run through a couple more rather quickly. They're all in the book and you can look at them. Let me just flash through.0:19:22
Here's accumulate. Accumulate takes a way of combining things, an initial value, and a stream, and sticks them all together.0:19:31
If the stream's empty, it's just the initial value. Otherwise, we combine the head of the stream with the result of accumulating the tail of the stream starting from the initial value.0:19:40
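Accumulate, in the same Python sketch (streams as plain pairs for now):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def accumulate(combiner, init, s):
    """Combine all the elements of s, starting from init."""
    if is_empty_stream(s):
        return init
    # combine the head with the result of accumulating the tail
    return combiner(head(s),
                    accumulate(combiner, init, tail(s)))
```

So to add up everything in a stream, accumulate with addition starting from 0.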
So that's what I'd use to add up everything in the stream. I'd accumulate with plus. How would I enumerate the leaves of a tree? Well, if the tree is just a leaf itself, I make something0:19:54
which only has that node in it. Otherwise, I append together the stuff of enumerating the left branch and the right branch.0:20:04
And then append here is like the ordinary append on lists.0:20:13
You can look at that. That's analogous to the ordinary procedure for appending two lists. How would I enumerate an interval? This will take two integers, low and high, and generate a0:20:24
stream of the integers going from low to high. And we can make a whole bunch of pieces. So that's a little language of talking about streams. Once we0:20:34
have streams, we can build things for manipulating them. Again, we're making a language. And now we can start expressing things in this language.0:20:43
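The enumerators and append can be sketched the same way. Here a tree is represented as a nested Python list, an assumption made for illustration, not the lecture's representation:

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def append_streams(s1, s2):
    """Analogous to the ordinary procedure for appending two lists."""
    if is_empty_stream(s1):
        return s2
    return cons_stream(head(s1), append_streams(tail(s1), s2))

def enumerate_tree(tree):
    """The stream of leaves of a tree (here, a nested Python list)."""
    if not isinstance(tree, list):
        # a leaf: a stream with only that node in it
        return cons_stream(tree, the_empty_stream)
    # otherwise append together the enumerations of the branches
    result = the_empty_stream
    for branch in reversed(tree):
        result = append_streams(enumerate_tree(branch), result)
    return result

def enumerate_interval(low, high):
    """The stream of the integers going from low to high."""
    if low > high:
        return the_empty_stream
    return cons_stream(low, enumerate_interval(low + 1, high))
```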
Here's our original procedure for summing the odd squares in a tree. And you'll notice it looks exactly now like the block0:20:52
diagram, like the signal processing block diagram. So to sum the odd squares in a tree, we enumerate the leaves of the tree.0:21:01
We filter that for oddness. We map that for squareness. And we accumulate the result of that using addition,0:21:12
starting from 0. So we can see the pieces that we wanted. Similarly, the Fibonacci one, how do we get the odd Fibs?0:21:22
Well, we enumerate the interval from 1 to n, we map along that, computing the Fibonacci of each one. We filter the result of those for oddness.0:21:34
And we accumulate all of that stuff using CONS starting from the empty-list.0:21:43
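Putting the pieces together, the two programs just described can be sketched end to end (Python, streams as plain pairs; fib here is an ordinary iterative Fibonacci supplied for the example):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def map_stream(proc, s):
    if is_empty_stream(s): return the_empty_stream
    return cons_stream(proc(head(s)), map_stream(proc, tail(s)))

def filter_stream(pred, s):
    if is_empty_stream(s): return the_empty_stream
    if pred(head(s)):
        return cons_stream(head(s), filter_stream(pred, tail(s)))
    return filter_stream(pred, tail(s))

def accumulate(combiner, init, s):
    if is_empty_stream(s): return init
    return combiner(head(s), accumulate(combiner, init, tail(s)))

def append_streams(s1, s2):
    if is_empty_stream(s1): return s2
    return cons_stream(head(s1), append_streams(tail(s1), s2))

def enumerate_tree(tree):
    if not isinstance(tree, list):
        return cons_stream(tree, the_empty_stream)
    result = the_empty_stream
    for branch in reversed(tree):
        result = append_streams(enumerate_tree(branch), result)
    return result

def enumerate_interval(low, high):
    if low > high: return the_empty_stream
    return cons_stream(low, enumerate_interval(low + 1, high))

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def sum_odd_squares(tree):
    # enumerate the leaves, filter for oddness, map squaring, accumulate with +
    return accumulate(lambda a, b: a + b, 0,
                      map_stream(lambda x: x * x,
                                 filter_stream(lambda x: x % 2 == 1,
                                               enumerate_tree(tree))))

def odd_fibs(n):
    # enumerate 1..n, map fib, filter for oddness, accumulate with cons
    return accumulate(cons_stream, the_empty_stream,
                      filter_stream(lambda x: x % 2 == 1,
                                    map_stream(fib, enumerate_interval(1, n))))
```

Each procedure now reads exactly like its signal-processing block diagram: an enumerator feeding a filter feeding a map feeding an accumulator.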
OK, what's the advantage of this? Well, for one thing, we now have pieces that we can start mixing and matching. So for instance, if I wanted to change this, if I wanted to0:21:58
compute the squares of the integers and then filter them, all I need to do is pick up a standard piece like this in that square and put it in. Or if we wanted to do this whole Fibonacci computation on0:22:10
the leaves of a tree rather than a sequence, all I need to do is replace this enumerator with that one. See, the advantage of this stream processing is that0:22:20
we're establishing-- this is one of the big themes of the course-- we're establishing conventional interfaces that0:22:35
allow us to glue things together. Things like map and filter are a standard set of components that we can start using for pasting together programs in all sorts of ways.0:22:45
It allows us to see the commonality of programs. I just ought to mention, I've only showed you two procedures. But let me emphasize that this way of putting things together0:22:57
with maps, filters, and accumulators is very, very general. It's the generate and test paradigm for programs. And as0:23:08
an example of that, Richard Waters, who was at MIT when he was a graduate student, as part of his thesis research went and analyzed a large chunk of the IBM scientific0:23:17
subroutine library, and discovered that about 60% of the programs in it could be expressed exactly,0:23:26
using no more than what we've put here-- map, filter, and accumulate. All right, let's take a break.0:23:36
Questions? AUDIENCE: It seems like the essence of this whole thing is just that you have a very uniform, simple data structure0:23:45
to work with, the stream. PROFESSOR: Right. The essence is that you, again, it's this sense of conventional interfaces. So you can start putting a lot of things together.0:23:55
And the stream is as you say, the uniform data structure that supports that. This is very much like APL, by the way. APL is very much the same idea, except in APL, instead0:24:06
of this stream, you have arrays and vectors. And a lot of the power of APL comes from exactly the same place as the power of this.0:24:19
OK, thank you. Let's take a break.0:24:57
All right. We've been looking at ways of organizing computations using streams. What I want to do now is just show you two somewhat0:25:07
more complicated examples of that. Let's start by thinking about the following kind of utility procedure that will come in useful.0:25:16
Suppose I've got a stream. And the elements of this stream are themselves streams. So the first thing might be 1, 2, 3.0:25:32
So I've got a stream. And each element of the stream is itself a stream. And what I'd like to do is build a stream that collects0:25:45
together all of the elements, pulls all of the elements out of these sub-streams and strings them all together in one thing. So just to show you the use of this language, how easy it is,0:25:56
call that flatten. And I can define flatten of this stream of streams. Well,0:26:13
what is that? That's just an accumulation. I want to accumulate using append, by0:26:25
successively appending. So I accumulate using append-streams, starting with0:26:36
the-empty-stream down that stream of streams.0:26:54
OK, so there's an example of how you can start using these higher order things to do some interesting operations. In fact, there's another useful thing0:27:04
that I want to do. I want to define a procedure called flat-map, flat map of0:27:18
some function and a stream. And what this is going to do is this: s will be a stream of elements, and f is going to be a function that for each element in the0:27:28
stream produces another stream. And what I want to do is take all of the elements and all of those streams and combine them together. So that's just going to be the flatten of map f down s.0:27:51
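Flatten and flat-map, as the same kind of Python sketch (streams as plain pairs for now):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def map_stream(proc, s):
    if is_empty_stream(s): return the_empty_stream
    return cons_stream(proc(head(s)), map_stream(proc, tail(s)))

def accumulate(combiner, init, s):
    if is_empty_stream(s): return init
    return combiner(head(s), accumulate(combiner, init, tail(s)))

def append_streams(s1, s2):
    if is_empty_stream(s1): return s2
    return cons_stream(head(s1), append_streams(tail(s1), s2))

def enumerate_interval(low, high):
    if low > high: return the_empty_stream
    return cons_stream(low, enumerate_interval(low + 1, high))

def flatten(stream_of_streams):
    """String the sub-streams together: just an accumulation with append."""
    return accumulate(append_streams, the_empty_stream, stream_of_streams)

def flat_map(f, s):
    """f maps each element of s to a stream; combine all those streams."""
    return flatten(map_stream(f, s))
```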
Each time I apply f to an element of s, I get a stream. If I map it all the way down, I get a stream of streams, and I'll flatten that. Well, I want to use that to show you a new way to do a0:28:04
familiar kind of problem. The problem's going to be like a lot of problems you've seen, although maybe not this particular one.0:28:14
I'm going to give you an integer, n. And the problem is going to be find all pairs of integers i0:28:31
and j, with j less than i, and i running up to n, such0:28:42
that i plus j is prime.0:28:55
So for example, if n equals 6, let's make a little table here, i and j and i plus j.0:29:09
So for, say, i equals 2 and j equals 1, I'd get 3. And for i equals 3, I could have j equals 2, and that0:29:18
would be 5. And 4 and 1 would be 5 and so on, up until i goes to 6.0:29:28
And what I'd like to return is to produce a stream of all the triples like this, let's say i, j, and i plus j.0:29:37
So for each n, I want to generate this stream. OK, well, that's easy. Let's build it up.0:29:47
We start like this. We're going to say for each i, we're going to generate a stream.0:29:56
For each i in the interval 1 through n, we're going to generate a stream. What's that stream going to be? We're going to start by generating all the pairs. So for each i, we're going to generate, for each j in the0:30:11
interval 1 to i minus 1, we'll generate the pair, or the list with two elements i and j.0:30:23
So we map along the interval, generating the pairs. And for each i, that generates a stream of pairs.0:30:33
And we flatmap it. Now we have all the pairs i and j, such that j is less than i. So that builds that. Now we've got to test them.0:30:42
Well, we take that thing we just built, the flatmap, and we filter it to see whether the i-- see, we had an i and a j.0:30:51
i was the first thing in the list, j was the second thing in the list. So we have a predicate which says in that list of two elements is the sum of the0:31:00
CAR and the CADR prime. And we filter that collection of pairs we just built. So those are the pairs we want.0:31:09
Now we go ahead and we take the result of that filter and we map along it, generating the list i and j and i plus j.0:31:19
And that's our procedure prime-sum-pairs. And then just to flash it up, here's the whole procedure. A map, a filter, a flatmap.0:31:34
There's the whole thing, even though this isn't particularly readable. It's just expanding that flatmap. So there's an example which illustrates the general point0:31:45
that nested loops in this procedure start looking like compositions of flatmaps of flatmaps of flatmaps of maps and things.0:31:54
So not only can we enumerate individual things, but by using flatmaps, we can do what would correspond to nested loops in most other languages.0:32:03
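A sketch of the whole prime-sum-pairs pipeline in the same Python rendering (is_prime is a naive trial-division test supplied for illustration):

```python
the_empty_stream = ()

def cons_stream(x, y): return (x, y)
def head(s): return s[0]
def tail(s): return s[1]
def is_empty_stream(s): return s == the_empty_stream

def map_stream(proc, s):
    if is_empty_stream(s): return the_empty_stream
    return cons_stream(proc(head(s)), map_stream(proc, tail(s)))

def filter_stream(pred, s):
    if is_empty_stream(s): return the_empty_stream
    if pred(head(s)):
        return cons_stream(head(s), filter_stream(pred, tail(s)))
    return filter_stream(pred, tail(s))

def accumulate(combiner, init, s):
    if is_empty_stream(s): return init
    return combiner(head(s), accumulate(combiner, init, tail(s)))

def append_streams(s1, s2):
    if is_empty_stream(s1): return s2
    return cons_stream(head(s1), append_streams(tail(s1), s2))

def enumerate_interval(low, high):
    if low > high: return the_empty_stream
    return cons_stream(low, enumerate_interval(low + 1, high))

def flatten(sos): return accumulate(append_streams, the_empty_stream, sos)
def flat_map(f, s): return flatten(map_stream(f, s))

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_sum_pairs(n):
    # the nested loop becomes a flatmap of a map: all pairs [i, j] with j < i
    pairs = flat_map(
        lambda i: map_stream(lambda j: [i, j], enumerate_interval(1, i - 1)),
        enumerate_interval(1, n))
    # keep the pairs whose sum is prime
    prime_pairs = filter_stream(lambda p: is_prime(p[0] + p[1]), pairs)
    # generate the triples i, j, i + j
    return map_stream(lambda p: [p[0], p[1], p[0] + p[1]], prime_pairs)
```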
Of course, it's pretty awful to keep writing these flatmaps of flatmaps of flatmaps. Prime-sum-pairs you saw looked fairly complicated, even0:32:13
though the individual pieces were easy. So what you can do, if you like, is introduce some syntactic sugar that's called collect. And collect is just an abbreviation for that nest of0:32:23
flatmaps and filters arranged in that particular way. Here's prime-sum-pairs again, written using collect. It says to find all those pairs, I'm going to collect0:32:32
together a result, which is the list i, j, and i plus j, that's going to be generated as i runs through the interval0:32:44
from 1 to n and as j runs through the interval from 1 to i minus 1, such that i plus j is prime.0:32:58
So I'm not going to say what collect does in general. You can look at that by looking at it in the book. But pretty much, you can see that the pieces of this are the pieces of that original procedure I wrote.0:33:08
And this collect is just some syntactic sugar for automatically generating that nest of flatmaps and filters. OK, well, let me do one more example that shows you the0:33:21
same kind of thing. Here's a very famous problem that's used to illustrate a lot of so-called backtracking computer algorithms. This is the eight queens problem.0:33:30
This is a chess board. And the eight queens problem says, find a way to put down eight queens on a chess board so that no two are attacking each other. And here's a particular solution to the0:33:39
eight queens problem. So I have to make sure to put down queens so that no two are in the same row or the same column or sit0:33:48
along the same diagonal. Now, there's sort of a standard way of doing that.0:33:59
Well, first we need to do is below the surface, at George's level. We have to find some way to represent a board, and represent positions.0:34:08
And we'll not worry about that. But let's assume that there's a predicate called safe. And what safe is going to do is going to say given that I0:34:19
have a bunch of queens down on the chess board, is it OK to put a queen in this particular spot? So safe is going to take a row and a column.0:34:32
That's going to be a place where I'm going to try and put down the next queen, and the rest of positions.0:34:45
And what safe will say is given that I already have queens down in these positions, is it safe to put another queen down in that row and that column?0:34:58
And let's not worry about that. That's George's problem. And it's not hard to write. You just have to check whether this thing contains any things on that row or that column or in that diagonal.0:35:10
Now, how would you organize the program given that? And there's sort of a traditional way to organize it called backtracking.0:35:20
And it says, well, let's think about all the ways of putting the first queen down in the first column.0:35:31
There are eight ways. Well, let's say try the first column. Try column 1, row 1. These branches are going to represent the possibilities at0:35:41
each level. So I'll try and put a queen down in the first column. And now given that it's in the first column, I'll try and put the next queen down in the first column.0:35:53
I'll try and put the first queen, the one in the first column, down in the first row. I'm sorry. And then given that, we'll put the next queen down in the first row. And that's no good.0:36:02
So I'll back up to here. And I'll say, oh, can I put the first queen down in the second row? Well, that's no good. Oh, can I put it down in the third row? Well, that's good.0:36:12
Well, now can I put the next queen down in the first column? Well, I can't visualize this chess board anymore, but I think that's right. And I try the next one. And at each place, I go as far down this tree as I can.0:36:24
And I back up. If I get down to here and find no possibilities below there, I back all the way up to here, and now start again generating this sub-tree.0:36:33
And I sort of walk around. And finally, if I ever manage to get all the way down, I've found a solution. So that's a typical sort of paradigm that's used a lot in0:36:45
AI programming. It's called backtracking search.0:36:57
And it's really unnecessary. You saw me get confused when I was visualizing this thing.0:37:06
And you see the complication. This is a complicated thing to say. Why is it complicated? It's because somehow this program is inordinately0:37:16
concerned with time. It's too much-- I try this one, and I try this one, and I go back to the last possibility. And that's a complicated thing. If I stop worrying about time so much, then there's a much0:37:28
simpler way to describe this. It says, let's imagine that I have in my hands the tree down0:37:40
to k minus 1 levels. See, suppose I had in my hands all possible ways to put down0:37:50
queens in the first k minus 1 columns. Suppose I just had that. Let's not worry about how we get it. Well, then, how do I extend that?0:37:59
How do I find all possible ways to put down queens in the next column? It's really easy. For each of these positions I have, I think about putting0:38:12
down a queen in each row to make the next thing. And then for each one I put down, I filter those by the ones that are safe.0:38:22
So instead of thinking about this tree as generated step by step, suppose I had it all there. And to extend it from level k minus 1 to level k, I just0:38:32
need to extend each thing in all possible ways and only keep the ones that are safe. And that will give me the tree to level k. And that's a recursive strategy for solving the eight0:38:41
queens problem. All right, well, let's look at it.0:38:50
To solve the eight queens problem on a board of some specified size, we write a sub-procedure called0:39:00
fill-columns. Fill-columns is going to put down queens up through column k. And here's the pattern of the recursion. I'm going to call fill-columns with the size eventually.0:39:12
So fill-columns says how to put down queens safely in the first k columns of this chess board with a size number of rows in it. If k is equal to 0, well, then I don't have to0:39:22
put anything down. So my solution is just an empty chess board. Otherwise, I'm going to do some stuff. And I'm going to use collect. And here's the collect.0:39:34
I find all ways to put down queens in the first k minus 1 columns. And this was just what I set for.0:39:43
Imagine I have this tree down to k minus 1 levels. And then I find all ways of trying a row, that's just each0:39:53
of the possible rows. They're size rows, so that's enumerate interval. And now what I do is I collect together the new row I'm going0:40:03
to try in column k with the rest of the queens. I adjoin a position. This is George's problem. Adjoin-position is like safe.0:40:13
It's a thing that takes a row and a column and the rest of the positions and makes a new position collection. So I adjoin a position of a new row and a new column to0:40:26
the rest of the queens, where the rest of the queens runs through all possible ways of solving the problem in k minus 1 columns. And the new row runs through all possible rows such that it0:40:39
was safe to put one there. And that's the whole program. There's the whole procedure.0:40:49
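The recursive strategy can be sketched as follows. This is Python, with list comprehensions standing in for collect, and with one representation choice made here, since safe and adjoin-position are left to George in the lecture: a partial solution is a list of row numbers, one per column.

```python
def queens(board_size):
    """All solutions to the n-queens problem, in the lecture's recursive style.

    A solution is represented (an illustrative choice, not the lecture's) as a
    list of row numbers: element i is the row of the queen in column i + 1.
    """

    def safe(new_row, new_col, rest_of_queens):
        # George's problem: no shared row and no shared diagonal.
        # (No shared column is guaranteed by construction: one row per column.)
        for col, row in enumerate(rest_of_queens, start=1):
            if row == new_row or abs(row - new_row) == abs(col - new_col):
                return False
        return True

    def fill_columns(k):
        if k == 0:
            return [[]]          # one way to fill zero columns: the empty board
        # "collect": for each way of solving k - 1 columns, and each row to
        # try in column k, keep the extensions that are safe
        return [rest_of_queens + [new_row]
                for rest_of_queens in fill_columns(k - 1)
                for new_row in range(1, board_size + 1)
                if safe(new_row, k, rest_of_queens)]

    return fill_columns(board_size)
```

When it is done, the result holds every solution at once, not just the first one found.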
Not only that, that doesn't just solve the eight queens problem, it gives you all solutions to the eight queens problem. When you're done, you have a stream.0:40:58
And the elements of that stream are all possible ways of solving that problem. Why is that simpler? Well, we threw away the whole idea that this is some process0:41:10
that happens in time with state. And we just said it's a whole collection of stuff. And that's why it's simpler. We've changed our view.0:41:20
Remember, that's where we started today. We've changed our view of what it is we're trying to model. We stop modeling things that evolve in time and have steps0:41:30
and have state. And instead, we're trying to model this global thing like the whole flight of the chalk, rather than its state at each instant.0:41:40
Any questions? AUDIENCE: It looks to me like backtracking would be searching for the first solution it can find, whereas0:41:49
this recursive search would be looking for all solutions. And it seems that if you have a large enough area to search,0:41:58
that the second is going to become impossible. PROFESSOR: OK, the answer to that question is the whole0:42:07
rest of this lecture. It's exactly the right question. And without trying to anticipate the lecture too much, you should start being suspicious at this point, and0:42:19
exactly those kinds of suspicions. It's wonderful, but isn't it so terribly inefficient? That's where we're going.0:42:28
So I won't answer now, but I'll answer later. OK, let's take a break.0:43:29
Well, by now you should be starting to get suspicious. See, I've shown you this simple, elegant way of putting0:43:41
programs together, very unlike these other traditional programs that sum the odd squares or compute the odd0:43:50
Fibonacci numbers. Very unlike these programs that mix up the enumerator and the filter and the accumulator.0:44:00
And by mixing it up, we don't have all of these wonderful conceptual advantages of these streams pieces, these wonderful mix and match components for putting0:44:09
together lots and lots of programs. On the other hand, most of the programs you've seen look like these ugly ones.0:44:18
Why's that? Can it possibly be that computer scientists are so obtuse that they don't notice that if you merely did this0:44:28
thing, then you can get this great programming elegance? There's got to be a catch. And it's actually pretty easy to see what the catch is.0:44:39
Let's think about the following problem. Suppose I tell you to find the second prime between 10,000 and 1 million, or if your computer's larger, say between0:44:51
10,000 and 100 billion, or something. And you say, oh, that's easy. I can do that with a stream. All I do is I enumerate the interval0:45:01
from 10,000 to 1 million. So I get all those integers from 10,000 to 1 million. I filter them for prime-ness, so test all of them and see if0:45:10
they're prime. And I take the second element. That's the head of the tail. Well, that's clearly pretty ridiculous.0:45:21
We wouldn't even have room in the machine to store the integers in the first place, much less to test them. And then I only want the second one. See, the power of this traditional programming style0:45:36
is exactly its weakness, that we're mixing up the enumerating and the testing and the accumulating.0:45:45
So we don't do it all. So the very thing that makes it conceptually ugly is the very thing that makes it efficient.0:45:55
It's this mixing up. So it seems that all I've done this morning so far is just confuse you. I showed you this wonderful way that programming might work, except that it doesn't.0:46:05
Well, here's where the wonderful thing happens. It turns out in this game that we really can have our cake and eat it too.0:46:14
And what I mean by that is that we really can write stream programs exactly like the ones I wrote and arrange0:46:24
things so that when the machine actually runs, it's as efficient as running this traditional programming style that mixes up the generation and the test.0:46:36
Well, that sounds pretty magic. The key to this is that streams are not lists.0:46:48
We'll see this carefully in a second, but for now, let's take a look at that slide again. The image you should have here of this signal processing system is that what's going to happen is there's this box0:47:00
that has the integers sitting in it. And there's this filter that's connected to it and it's tugging on them.0:47:10
And then there's someone who's tugging on this stuff saying what comes out of the filter. And the image you should have is that someone says, well,0:47:19
what's the first prime, and tugs on this filter. And the filter tugs on the integers.0:47:28
And you look only at that much, and then say, oh, I really wanted the second one. What's the second prime? And no computation gets done except when you tug on0:47:37
these things. Let me try that again. This is a little device. This is a little stream machine invented by Eric0:47:46
Grimson who's been teaching this course at MIT. And the image is here's a stream of stuff, like a whole bunch of the integers. And here's some processing elements.0:47:58
And if, say, it's filter of filter of map, or something. And if I really tried to implement that with streams as0:48:08
lists, what I'd say is, well, I've got this list of things, and now I do the first filter. So do all this processing. And I take this and I process and I process and I process0:48:18
and I process. And now I've got this new stream. Now I take that result in my hand someplace. And I put that through the second one. And I process the whole thing.0:48:28
And there's this new stream. And then I take the result and I put it all the way through this one the same way. That's what would happen to these stream programs if0:48:41
streams were just lists. But in fact, streams aren't lists, they're streams. And the image you should have is something a little bit more like this.0:48:50
I've got these gadgets connected up by this data that's flowing out of them.0:48:59
And here's my original source of the streams. It might be starting to generate the integers. And now, what happens if I want a result? I tug on the end here.0:49:10
And this element says, gee, I need some more data. So this one comes here and tugs on that one. And it says, gee, I need some more data. And this one tugs on this thing, which might be a0:49:19
filter, and says, gee, I need some more data. And only as much of this thing at the end here gets generated as I tugged. And only as much of this stuff goes through the processing0:49:28
units as I'm pulling on the end I need. That's the image you should have of the difference between implementing what we're actually going to do and if streams were lists.0:49:40
Well, how do we make this thing? I hope you have the image. The trick is how to make it. We want to arrange for a stream to be a data structure0:49:52
that computes itself incrementally, an on-demand data structure. And the basic idea is, again, one of the very basic ideas0:50:02
that we're seeing throughout the whole course. And that is that there's not a firm distinction between programs and data. So what a stream is going to be is simultaneously this data0:50:12
structure that you think of, like the stream of the leaves of this tree. But at the same time, it's going to be a very clever procedure that has the method of computing in it.0:50:23
Well, let me try this. It's going to turn out that we don't need any more mechanism. We already have everything we need simply from the fact that we know how to handle procedures0:50:32
as first-class objects. Well, let's go back to the key. The key is, remember, we had these operations. CONS-stream and head and tail.0:50:48
When I started, I said you can think about this as CONS and think about this as CAR and think about that as CDR, but it's not. Now, let's look at what they really are.0:50:57
Well, CONS-stream of x and y is going to be an abbreviation0:51:09
for the following thing.0:51:19
CONS, ordinary CONS, forms a pair of x and a thing called delay of y.0:51:31
And before I explain that, let me go and write the rest. The head of a stream is going to be just the CAR.0:51:42
And the tail of a stream is going to be a thing called force applied to the CDR of the stream.0:51:56
Now let me explain this. Delay is going to be a special magic thing. What delay does is take an expression and produce a0:52:06
promise to compute that expression when you ask for it. It doesn't do any computation here. It just gives you a rain check. It produces a promise.0:52:17
And CONS-stream says I'm going to put together in a pair x and a promise to compute y.0:52:28
Now, if I want the head, that's just the CAR that I put in the pair. And the key is that the tail is going to be-- force calls in that promise.0:52:39
Tail says, well, take that promise and now call in that promise. And then we compute that thing. That's how this is going to work.0:52:48
That's what CONS-stream, head, and tail really are. Now, let's see how this works. And we'll go through this fairly carefully.0:52:58
We're going to see how this works in this example of computing the second prime between 10,000 and a million.0:53:08
OK, so we start off and we have this expression. The second prime-- the head of the tail of the result of0:53:20
filtering for primality the integers between 10,000 and 1 million. Now, what is that? What that is, that interval between 10,000 and 1 million,0:53:35
well, if you trace through enumerate-interval, it builds a CONS-stream. And the CONS-stream is the CONS of 10,000 to a promise to0:53:45
compute the integers between 10,001 and 1 million.0:53:54
So that's what this expression is. Here I'm using the substitution model. And we can use the substitution model because we don't have side effects and state.0:54:04
So I have CONS of 10,000 to a promise to compute the rest of the integers. So only one integer, so far, got enumerated.0:54:14
Well, I'm going to filter that thing for primality. Again, you go back and look at the filter code. What the filter will first do is test the head.0:54:25
So in this case, the filter will test 10,000 and say, oh, 10,000's not prime. Therefore, what I have to do recursively0:54:36
is filter the tail. And what's the tail of it, well, that's the tail of this pair with a promise in it.0:54:46
Tail now comes in and says, well, I'm going to force that. I'm going to force that promise, which means now I'm going to compute the integers between 10,001 and 1 million.0:55:00
OK, so this filter now is looking at that. That enumerate itself, well, now we're back in the original0:55:10
enumerate situation. The enumerate is the CONS of the first thing, 10,001, onto a promise to compute the rest.0:55:19
So now the primality filter is going to go look at 10,001. It's going to decide if it likes that or not. It turns out 10,001 isn't prime. So it'll force it again and again and again.0:55:32
And finally, the first prime it hits is 10,007. And at that point, it'll stop. And that will be the first prime, and then eventually,0:55:42
it'll need the second prime. So at that point, it will go again. So you see what happens is that no more gets generated0:55:51
than you actually need. That enumerator is not going to generate any more integers0:56:00
than the filter asks it for as it's pulling in things to check for primality. And the filter is not going to generate any more stuff than you ask it for, which is the head of the tail.0:56:11
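To make the tugging concrete, here is a sketch of the second-prime computation with a lazy stream, where a promise is modeled as a zero-argument function (Python; the instrumentation counter is an addition, there to show how few integers actually get enumerated):

```python
# Lazy streams: the tail is a promise (a zero-argument function),
# forced only when someone tugs on it.

the_empty_stream = None

def cons_stream(x, delayed_tail):
    return (x, delayed_tail)

def head(s):
    return s[0]

def tail(s):
    return s[1]()              # force: call in the promise

count = {"enumerated": 0}      # instrumentation: how many integers get built

def enumerate_interval(low, high):
    if low > high:
        return the_empty_stream
    count["enumerated"] += 1
    return cons_stream(low, lambda: enumerate_interval(low + 1, high))

def filter_stream(pred, s):
    if s is the_empty_stream:
        return the_empty_stream
    if pred(head(s)):
        return cons_stream(head(s), lambda: filter_stream(pred, tail(s)))
    return filter_stream(pred, tail(s))

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# head of the tail of the filtered enumeration: the second prime
primes = filter_stream(is_prime, enumerate_interval(10000, 1000000))
second_prime = head(tail(primes))
```

Out of the million-long interval, only the integers 10,000 through 10,009 ever get enumerated: exactly as many as the filter tugged for.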
You see, what's happened is we've put that mixing of generation and test into what actually happens in the0:56:20
computer, even though that's not apparently what's happening from looking at our programs. OK, well, that seemed easy.0:56:30
All of this mechanism got put into this magic delay. So you're saying, gee, that must be where the magic is. But see there's no magic there either.0:56:39
You know what delay is. Delay on some expression is just an abbreviation for--0:56:53
well, what's a promise to compute an expression? Lambda of nil, procedure of no arguments, which is that expression.0:57:03
That's what a procedure is. It says I'm going to compute an expression. What's force? How do I take up a promise? Well, force of some procedure, a promise, is just run it.0:57:18
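So the whole mechanism fits in a few lines. A Python sketch, writing by hand the zero-argument procedure that Scheme's delay syntax would wrap for you, including an infinite stream that would be impossible if streams were lists:

```python
def force(promise):
    """Take up a promise: just run the procedure of no arguments."""
    return promise()

# delay has no run-time content of its own: delaying an expression just means
# wrapping it as lambda: expression, which we write by hand below.
# (In Scheme, delay is special syntax that does this wrapping for you.)

def cons_stream(x, delayed_y):
    return (x, delayed_y)      # an ordinary pair of x and a promise

def head(s):
    return s[0]                # just the CAR

def tail(s):
    return force(s[1])         # force the promise that's in the CDR

def integers_from(n):
    # an infinite stream of integers; fine, because the tail is only a promise
    return cons_stream(n, lambda: integers_from(n + 1))
```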
Done. So there's no magic there at all. Well, what have we done? We said the old style, traditional style of0:57:29
programming is more efficient. And the stream thing is more perspicuous. And we managed to make the stream procedures run like the0:57:40
other procedures by using delay. And the thing that delay did for us was to de-couple the apparent order of events in our programs from the actual0:57:52
order of events that happened in the machine. That's really what delay is doing. That's exactly the whole point. We've given up the idea that our procedures, as they run,0:58:04
or as we look at them, mirror some clear notion of time. And by giving that up, we give delay the freedom to arrange the order of events in the computation the way it likes.0:58:16
That's the whole idea. We de-couple the apparent order of events in our programs from the actual order of events in the computer. OK, well there's one more detail.0:58:25
It's just a technical detail, but it's actually an important one. As you run through these recursive programs unwinding, you'll see a lot of things that look like tail of the0:58:35
tail of the tail. That's the kind of thing that would happen as I go CDRing down a stream all the way. And if each time I'm doing that, each time to compute a0:58:47
tail, I evaluate a procedure which then has to go re-compute its tail, and re-compute its tail and recompute its tail each time, you can see that's very0:58:56
inefficient compared to just having a list where the elements are all there, and I don't have to re-compute each tail every time I get the next tail.0:59:05
So there's one little hack to slightly change what delay is,0:59:15
and make it a thing which is-- I'll write it this way. The actual implementation, delay is an abbreviation for0:59:27
this thing, memo-proc of a procedure. Memo-proc is a special thing that transforms a procedure. What it does is it takes a procedure of no arguments and0:59:39
it transforms it into a procedure that'll only have to do its computation once. And what I mean by that is, you give it a procedure.0:59:48
The result of memo-proc will be a new procedure, which the first time you call it, will run the original procedure, remember what result it got, and then from ever on after,1:00:00
when you call it, it just won't have to do the computation. It will have cached that result someplace. And here's an implementation of memo-proc.1:00:11
Once you have the idea, it's easy to implement. Memo-proc is this little thing that has two little flags in there. It says, have I already been run?1:00:20
And initially it says, no, I haven't already been run. And what was the result I got the last time I was run?1:00:29
So memo-proc takes a procedure called proc, and it returns a new procedure of no arguments. Proc is supposed to be a procedure of no arguments.1:00:38
And it says, oh, if I'm not already run, then I'm going to do a sequence of things. I'm going to compute proc, I'm going to save that.1:00:48
I'm going to stash that in the variable result. I'm going to make a note to myself that I've already been run, and then I'll return the result. So that's if you compute it if it's not already run.1:00:59
If you call it and it's already been run, it just returns the result. So that's a little clever hack called memoization.1:01:08
And in this case, it short circuits having to re-compute the tail of the tail of the tail of the tail of the tail. So there isn't even that kind of inefficiency.1:01:17
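Here is a sketch of memo-proc in Python (names translated from the blackboard Scheme; the demo counter is added for illustration):

```python
def memo_proc(proc):
    """Transform a procedure of no arguments into one that does its
    computation at most once, caching the result (the lecture's memo-proc)."""
    already_run = False   # "have I already been run?"
    result = None         # "what result did I get when I was run?"

    def memoized():
        nonlocal already_run, result
        if not already_run:
            result = proc()       # compute it and stash the result
            already_run = True    # make a note that I've been run
        return result

    return memoized

# Demo: the expensive computation happens only on the first call.
calls = [0]

def expensive():
    calls[0] += 1
    return 42

promise = memo_proc(expensive)
print(promise(), promise(), calls[0])   # 42 42 1
```

Calling the memoized promise a second time returns the cached result without re-running the computation, which is exactly what short-circuits the repeated tail-of-the-tail work.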
And in fact, the streams will run with pretty much the same efficiency as the other programs. And remember, again, the whole idea of this is that we've1:01:28
used the fact that there's no really good dividing line between procedures and data. We've written data structures that, in fact, are sort of like procedures.1:01:38
And what that's allowed us to do is take an example of a common control structure, in this case iteration.0:01:49
And we've built a data structure which, since itself is a procedure, kind of has this iteration control structure in it. And that's really what streams are.1:01:58
OK, questions? AUDIENCE: Your description of tail-tail-tail, if I understand it correctly, force is actually execution of a1:02:10
procedure, if it's done without this memo-proc thing. And you implied that memo-proc gets around that problem. Doesn't it only get around it if tail-tail-tail is always1:02:20
executing exactly the same-- PROFESSOR: Oh, that's-- sure. AUDIENCE: I guess I missed that point. PROFESSOR: Oh, sure. I mean the point is--1:02:31
yeah. I mean I have to do a computation to get the answer. But the point is, once I've found the tail of the stream, to get the tail of the tail, I shouldn't have had to re-compute the first tail.1:02:42
See, and if I didn't use memo-proc, that re-computation would have been done. AUDIENCE: I understand now. AUDIENCE: In one of your examples, you mentioned that1:02:52
we were able to use the substitution model because there are no side effects. What if we had a single processing unit--1:03:01
if we had a side effect, if we had a state? Could we still practically build the stream model? PROFESSOR: Maybe. That's a hard question.1:03:10
I'm going to talk a little bit later about the places where substitution and side effects don't really mix very well. But in general, I think the answer is unless you're very1:03:21
careful, any amount of side effect is going to mess up everything.1:03:35
AUDIENCE: Sorry, I didn't quite understand the memo-proc operation. When do you execute the lambda? In other words, when memo-proc is executed, just this lambda1:03:46
expression is being generated. But it's not clear to me when it's executed. PROFESSOR: Right. What memo-proc does-- remember, the thing that's going into memo-proc, the thing proc, is a procedure of1:03:57
no arguments. And someday, you're going to call it. Memo-proc translates that procedure into another procedure of no arguments, which someday you're going to call.1:04:06
That's that lambda. So here, where I initially built as my tail of the1:04:17
stream, say, this procedure of no arguments, which someday I'll call. Instead, I'm going to have the tail of the stream be1:04:27
memo-proc of it, which someday I'll call. So that lambda of nil, that gets called when you call the memo-proc, when you call the result of that memo-proc,1:04:40
which would be ordinarily when you would have called the original thing that you set it. AUDIENCE: OK, the reason I ask is I had a feeling that when1:04:49
you call memo-proc, you just return this lambda. PROFESSOR: That's right. When you call memo-proc, you return the lambda.1:04:58
You never evaluate the expression at all, until the first time that you would have evaluated it.1:05:07
AUDIENCE: Do I understand it right that you actually have to build the list up, but the elements of the list don't get evaluated? The expressions don't get evaluated? But at each stage, you actually are building a list.1:05:18
PROFESSOR: That's-- I really should have said this. That's a really good point. No, it's not quite right. Because what happens is this. Let me draw this as pairs. Suppose I'm going to make a big stream, like enumerate1:05:29
interval, 1 through 1 billion. What that is, is a pair with a 1 and a promise.1:05:46
That's exactly what it is. Nothing got built up. When I go and force this, and say, what happens?1:05:56
Well, this thing is now also recursively a CONS. So that this promise now is the next thing, which is a 21:06:07
and a promise to do more. And so on and so on and so on. So nothing gets built up until you walk down the stream.1:06:18
Because what's sitting here is not the list, but a promise to generate the list. And by promise, technically I mean procedure.1:06:28
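That answer can be sketched in Python (a hypothetical rendering, with a stream as a pair of a head and a thunk for the rest):

```python
# Sketch of the point above: a "billion-element" stream is really just
# a pair of its first element and a promise (a procedure) for the rest.

def enumerate_interval(low, high):
    if low > high:
        return None
    return (low, lambda: enumerate_interval(low + 1, high))

s = enumerate_interval(1, 10**9)   # returns immediately
print(s[0])            # 1
print(callable(s[1]))  # True -- the rest is only a promise

rest = s[1]()          # forcing the promise builds exactly one more pair
print(rest[0])         # 2
```

Nothing gets built up until you walk down the stream; each force constructs one more pair.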
So it doesn't get built up. Yeah, I should have said that before this point. OK. Thank you. Let's take a break.0:00:00
Lecture 6B | MIT 6.001 Structure and Interpretation, 1986
PROFESSOR: OK, well, we've been looking at streams, this signal processing way of putting systems together. And remember, the key idea is that we decouple the apparent0:00:35
order of events in our programs from the actual order of events in the computer. And that means that we can start dealing with very long streams and only having to generate0:00:46
the elements on demand. That sort of on-demand computation is built into the stream's data structure. So if we have a very long stream, we only0:00:55
compute what we need. The things only get computed when we actually ask for them. Well, what are examples of actually asking for them?0:01:04
For instance, we might ask for the n-th element of a stream.0:01:16
Here's a procedure that computes the n-th element of a stream. An integer n, the n-th element of some stream s, and we just recursively walk down the stream.0:01:25
And when n is 0, we compute the head. Otherwise, it's the (n minus 1)-st element of the tail of the stream.0:01:34
That's just like n-th for lists, but the difference is those elements aren't going to get computed until we walk down, taking successive n-ths. So that's one way that the stream0:01:43
elements might get forced. And another way, here's a little procedure that prints a stream. We say print a stream, so to print a stream s.0:01:54
Well, what do we do? We print the head of the stream, and that will cause the head to be computed. And then we recursively print stream the tail of the stream.0:02:04
And when we're done, maybe we return some message like done. OK, and then if you make a stream, you could say here's the stream, this very long stream.0:02:14
And then you say print the stream, and the elements of the stream will get computed successively as that print calls them. They won't get all computed initially.0:02:24
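The two procedures might look like this in Python (a sketch; the pair-plus-thunk representation and the helper names are assumptions, standing in for the lecture's Scheme):

```python
def cons_stream(h, tail_thunk):
    # In Scheme cons-stream is a special form that delays its second
    # argument; here we pass the thunk explicitly.
    return (h, tail_thunk)

def head(s): return s[0]
def tail(s): return s[1]()   # forcing the promise computes the tail

def nth_stream(n, s):
    """The n-th element of s, counting from 0 -- just like n-th on lists,
    except the elements are computed only as we walk down."""
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

def print_stream(s):
    """Print each element; each head is computed only when print asks."""
    while s is not None:
        print(head(s))
        s = tail(s)
    return "done"

# A tiny finite stream: 1, 2
s = cons_stream(1, lambda: cons_stream(2, lambda: None))
print(nth_stream(1, s))   # 2
```

Each call to tail forces one promise, so printing a stream computes its elements successively, never all at once.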
So in this way, we can deal with some very long streams. Well, how long can a stream be?0:02:33
Well, it can be infinitely long. Let's look at an example here on the computer. I could walk up to this computer, and I could say--0:02:43
how about we'll define the stream of integers starting0:02:52
with some number N, the stream of positive integers starting with some number n. And that's cons-stream of n onto the0:03:12
integers from one more.0:03:24
So there are the integers. Then I could say let's get all the integers.0:03:34
define the stream of integers to be the integers0:03:43
starting with 1. And now if I say something like what's the0:03:54
20th integer. So it's 21 because we start counting at 0.0:04:07
Or I can do more complicated things. Let me define a little predicate here. How about define no-seven.0:04:19
It's going to test an integer, and it's going to say that when0:04:28
I take the remainder of x by 7, I don't get 0.0:04:41
And then I could say define the integers with no sevens to0:04:50
be, take all the integers and filter them to have no sevens.0:05:11
So now I've got the stream of all the integers that are not divisible by seven. So if I say what's the 100th integer in the list not0:05:25
divisible by seven, I get 117. Or if I'd like to say well, gee, what are all of them?0:05:35
So I could say print stream all these integers with no seven, it goes off printing.0:05:45
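The blackboard demo can be sketched in Python (streams as (head, thunk) pairs; all helper names are illustrative, and nth counts from 0 as in the lecture):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

def integers_from(n):
    # The integers starting with n: n consed onto the integers from n+1.
    return cons_stream(n, lambda: integers_from(n + 1))

integers = integers_from(1)

def no_seven(x):
    # True when the remainder of x by 7 is not 0.
    return x % 7 != 0

def filter_stream(pred, s):
    while not pred(head(s)):   # skip elements that fail the test
        s = tail(s)
    return cons_stream(head(s), lambda: filter_stream(pred, tail(s)))

no_sevens = filter_stream(no_seven, integers)

print(nth_stream(20, integers))    # 21, since we start counting at 0
print(nth_stream(100, no_sevens))  # 117, as in the lecture
```

Only the elements actually walked over get computed; asking for the 100th no-seven integer forces just enough of the infinite stream to find it.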
You may have to wait a very long time to see them all. Well, you can start asking, gee, is it really true that0:05:56
this data structure with the integers is really all the integers? And let me draw a picture of that program I just wrote.0:06:08
Here's the definition of the integers again that I just typed in. Right, it's a cons of the first integer onto the integers starting with the next one. Now, we can make a0:06:18
picture of that and see what it looks like. Conceptually, what I have is a box that's the integer starting with n.0:06:27
It takes in some number n, and it's going to return a stream of-- this infinite stream of all integers starting with n.0:06:37
And what do I do? Well, this is an integers from box. What's it got in it? Well, it takes in this n, and it increments it.0:06:58
And then it puts the result into recursively another integer's from box. It takes the result of that and the original n and puts0:07:10
those together with a cons and forms a stream. So that's a picture of that program I wrote. Let's see. These kind of diagrams we first saw drawn by Peter0:07:21
Henderson, the same guy who did the Escher language. We call them Henderson diagrams. And the convention here is that you put these things together. And the solid lines are things coming out are streams, and0:07:33
dotted lines are initial values going in. So this one has the shape of-- it takes in some integer, some initial value, and outputs a stream.0:07:46
Again, you can ask. Is that data structure integers really all the integers? Or is it is something that's cleverly arranged so that0:07:55
whenever you look for an integer you find it there? That's sort of a philosophical question, right? If something is there whenever you look, is it really there or not?0:08:04
It's sort of the same sense in which the money in your savings account is in the bank. Well, let me do another example.0:08:19
Gee, we started the course with an algorithm from Alexandria, which was Heron of Alexandria's algorithm for computing the square root.0:08:28
Let's take a look at another Alexandrian algorithm. This one is Eratosthenes' method for computing all of0:08:37
the primes. It is called the Sieve of Eratosthenes. And what you do is you start out, and you list all the0:08:51
integers, say, starting with 2. And then you take the first integer, and you say, oh, that's prime. And then you go look at the rest, and you cross out all the things divisible by 2.0:09:01
So I cross out this and this and this. This takes a long time because I have to do it for all of the integers.0:09:11
So I go through the entire list of integers, crossing the ones divisible by 2.0:09:22
And now when I finish with all of the integers, I go back and look and say what am I left with? Well, the first thing that starts there is 3. So 3 is a prime. And now I go back through what I'm left with, and I cross out0:09:33
all the things divisible by 3. So let's see, 9 and 15 and 21 and 27 and 33 and so on.0:09:44
I won't finish. Then I see what I'm left with. And the next one I have is 5. Now I go through the rest, and I find the ones0:09:53
that are divisible by 5. I cross out from the remainder all the ones that are divisible by 5. And I do that, and then I go through and find 7. Go through all the rest, cross out things divisible by 7, and I0:10:04
keep doing that forever. And when I'm done, what I'm left with is a list of all the primes. So that's the Sieve of Eratosthenes.0:10:15
Let's look at it as a computer program. It's a procedure called sieve.0:10:27
Now, I just write what I did. I'll say to sieve some stream s.0:10:38
I'm going to build a stream whose first element is the head of this. Remember, I always found the first thing I was left with, and the rest of it is the result of taking the tail of0:10:48
this, filtering it to throw away all the things that are divisible by the head of this, and now sieving the result.0:10:59
That's just what I did. And now to get the infinite stream of primes, we just sieve all the integers starting from 2.0:11:14
Let's try that. We can actually do it. I typed in the definition of sieve before, I hope, so I can0:11:23
say something like define the primes to be the result of0:11:35
sieving the integers starting with 2.0:11:46
So now I've got this list of primes. That's all of the primes, right? So, if for example, what's the 20th prime in that list?0:12:01
73. See, and that little pause-- it was only at the point when I started asking for the 20th prime that it started computing.0:12:10
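The Sieve of Eratosthenes as a stream program can be sketched in Python (streams as (head, thunk) pairs; the names are illustrative, and nth counts from 0):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def integers_from(n):
    return cons_stream(n, lambda: integers_from(n + 1))

def filter_stream(pred, s):
    while not pred(head(s)):   # skip elements that fail the test
        s = tail(s)
    return cons_stream(head(s), lambda: filter_stream(pred, tail(s)))

def sieve(s):
    # The first thing I'm left with is prime; sieve the rest after
    # throwing away everything divisible by it.
    p = head(s)
    return cons_stream(
        p, lambda: sieve(filter_stream(lambda x: x % p != 0, tail(s))))

primes = sieve(integers_from(2))

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

print(nth_stream(20, primes))   # 73, as on the screen
```

Each prime pulled out installs one more filter inside the recursive sieve, which is exactly the infinitely nested sieve-box picture drawn next.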
Or I can say here let's look at all of the primes.0:12:22
And there it goes computing all of the primes. Of course, it will take a while again if I want to look at all of them, so let's stop it.0:12:32
Let me draw you a picture of that. Well, I've got a picture of that. What's that program really look like? Again, some practice with these diagrams, I have a sieve box.0:12:42
How does sieve work? It takes in a stream. It splits off the head from the tail. And the first thing that's going to come out of the sieve0:12:53
is the head of the original stream. Then it also takes the head and uses that.0:13:02
It takes the stream. It filters the tail and uses the head to filter for nondivisibility. It takes the result of nondivisibility and puts it0:13:11
through another sieve box and puts the result together. So you can think of this sieve as a filter, but notice that it's an infinitely recursive filter. Because inside the sieve box is another sieve box, and0:13:23
inside that is another sieve box and another sieve box. So you see we start getting some very powerful things. We're starting to mix this signal processing view of the0:13:32
world with things like recursion that come from computation. And there are all sorts of interesting things you can do that are like this. All right, any questions?0:13:48
OK, let's take a break.0:14:28
Well, we've been looking at a couple of examples of stream programming. All the stream procedures that we've looked at so far have0:14:39
the same kind of character. We've been writing these recursive procedures that kind of generate these stream elements one at a time and put them together in cons-streams. So we've been thinking a lot0:14:50
about generators. There's another way to think about stream processing, and that's to focus not on programs that sort of process these elements as you walk down the stream, but on things0:15:00
that kind of process the streams all at once. To show you what I mean, let me start by defining two0:15:09
procedures that will come in handy. The first one's called add streams. Add streams takes two streams: s1 and s2.0:15:22
It's going to produce a stream whose elements are the corresponding sums. We just sort of add them element-wise.0:15:32
If either stream is empty, we just return the other one. Otherwise, we're going to make a new stream whose head is the0:15:42
sum of the two heads and whose tail is the result of recursively adding the tails. So that will produce the element-wise sum of two0:15:52
streams. And then another useful thing to have around is scale stream. Scale stream takes some constant number and a stream s0:16:04
and is going to produce the stream of elements of s multiplied by this constant. And that's easy, that's just a map of the function of an0:16:14
element that multiplies it by the constant, and we map that down the stream. So given those two, let me show you what I mean by0:16:23
programs that operate on streams all at once. Let's look at this. Suppose I write this. I say define--0:16:36
I'll call it ones-- to be cons-stream of 1 onto ones.0:16:54
What's that? That's going to be an infinite stream of ones because the first thing is 1.0:17:03
And the tail of it is a thing whose first thing is 1 and whose tail is a thing whose first thing is 1 and so on and so on and so on. So that's an infinite stream of ones.0:17:15
And now using that, let me give you another definition of the integers. We can define the integers to be--0:17:28
well, the first integer we'll take to be 1, this cons-stream of 1 onto the element-wise sum onto add streams of the0:17:42
integers to ones.0:17:54
The integers are a thing whose first element is 1, and the rest of them you get by taking those integers and0:18:04
incrementing each one by one. So the second element of the integers is the first element of the integers incremented by one.0:18:13
And the rest of that is the next one, and the third element of that is the same as the first element of the tail of the integers incremented by one, which is the same as the0:18:25
first element of the original integers incremented by one and incremented by one again and so on.0:18:35
That looks pretty suspicious. See, notice that it works because of delay. See, this looks like-- let's take a look at ones. This looks like it couldn't even be processed because it's0:18:46
suddenly saying in order to know what ones is, I say it's cons-stream of something onto ones. The reason that works is because of that very sneaky hidden delay in there.0:18:55
Because what this really is, remember, cons-stream is just an abbreviation. This really is cons of 1 onto delay of ones.0:19:12
So how does that work? You say I'm going to define ones. First I see what ones is supposed to be defined as. Well, ones is supposed to be defined as a cons whose first0:19:27
part is 1 and whose second part is, well, it's a promise to compute something that I don't worry about yet. So it doesn't bother me that at the point I do this definition, ones isn't defined.0:19:37
Having run the definition now, ones is defined. So that when I go and look at the tail of it, it's defined. It's very sneaky.0:19:46
And an integer is the same way. I can refer to integers here because hidden way down-- because of this cons-stream. It's the cons-stream of 1 onto something that I0:19:56
don't worry about yet. So I don't look at it, and I don't notice that integers isn't defined at the point where I try and run the definition.0:20:06
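The same sneaky trick works in Python, because a lambda's body isn't evaluated until it is called; a sketch (streams as (head, thunk) pairs, helper names assumed for illustration):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def add_streams(s1, s2):
    # Element-wise sum of two (here infinite) streams.
    return cons_stream(head(s1) + head(s2),
                       lambda: add_streams(tail(s1), tail(s2)))

# The hidden delay: the lambda doesn't look up the name `ones` until
# it is called, by which time the definition has finished running.
ones = cons_stream(1, lambda: ones)

# The integers: the first element is 1, and the rest is the integers
# themselves with each element incremented by one.
integers = cons_stream(1, lambda: add_streams(ones, integers))

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

print([nth_stream(i, integers) for i in range(6)])   # [1, 2, 3, 4, 5, 6]
```

The self-reference is legal only because the tail is a promise; evaluate the tail eagerly and both definitions would blow up.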
OK, let me draw a picture of that integers thing because it still maybe seems a little bit shaky. What do I do?0:20:15
I've got the stream of ones, and that sort of comes in and goes into an adder that's going to be0:20:25
this add streams thing. And that goes in-- that's going to put out the integers.0:20:40
And the other thing that goes into the adder here is the integer, so there's a little feedback loop. And all I need to start it off is someplace I've got a stick0:20:51
that initial 1. In a real signal processing thing, this might be a delay0:21:00
element that was initialized to 1. But there's a picture of that integers program. And in fact, that looks a lot like--0:21:09
if you've seen real signal block diagram things, that looks a lot like accumulators, finite state accumulators. And in fact, we can modify this a little bit to change0:21:21
this into something that integrates a stream or a finite state accumulator, however you like to think about it. So instead of the ones coming in and getting out the0:21:30
integers, what we'll do is say there's a stream s coming in, and we're going to get out the integral of this, successive0:21:43
values of that, and it looks almost the same. The only thing we're going to do is when s comes in here, before we just add it in we're going to multiply it0:21:53
by some number dt. And now what we have here, this is exactly the same thing. We have a box, which is an integrator.0:22:09
And it takes in a stream s, and instead of 1 here, we can put the initial value for the integral.0:22:19
And that one looks very much like a signal processing block diagram program. In fact, here's the procedure that looks exactly like that.0:22:31
Find the integral of a stream. So an integral's going to take a stream and produce a new stream, and it takes in an initial value and some time constant.0:22:42
And what do we do? Well, we internally define this thing int, and we make this internal name so we can feed it back, loop it around itself. And int is defined to be something that starts out at0:22:52
the initial value, and the rest of it is gotten by adding together.0:23:01
We take our input stream, scale it by dt, and add that to int. And now we'll return from all that the value of integral is this thing int.0:23:10
And we use this internal definition syntax so we could write a little internal definition that refers to itself.0:23:21
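The integrator can be sketched in Python the same way, with the internal name referring to itself inside a delayed tail (all names illustrative, translated from the blackboard Scheme):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def add_streams(s1, s2):
    return cons_stream(head(s1) + head(s2),
                       lambda: add_streams(tail(s1), tail(s2)))

def map_stream(f, s):
    return cons_stream(f(head(s)), lambda: map_stream(f, tail(s)))

def scale_stream(c, s):
    # Multiply every element of s by the constant c -- just a map.
    return map_stream(lambda x: c * x, s)

def integral(s, initial_value, dt):
    # integ starts at the initial value; the rest is (dt * s) added
    # back into integ itself -- the feedback loop in the diagram.
    integ = cons_stream(initial_value,
                        lambda: add_streams(scale_stream(dt, s), integ))
    return integ

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

ones = cons_stream(1, lambda: ones)
accum = integral(ones, 0, 1)    # with dt = 1 this is a running sum
print([nth_stream(i, accum) for i in range(5)])   # [0, 1, 2, 3, 4]
```

Feeding in the stream of ones with dt = 1 makes the integrator behave as a finite-state accumulator, matching the block diagram.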
Well, there are all sorts of things we can do. Let's try this one. How about the Fibonacci numbers? You can say define fibs.0:23:36
Well, what are the Fibonacci numbers? They're something that starts out with 0, and0:23:48
the next one is 1. And the rest of the Fibonacci numbers are gotten by adding0:24:06
the Fibonacci numbers to their own tail.0:24:17
There's a definition of the Fibonacci numbers. How does that work? Well, we start off, and someone says compute for us the Fibonacci numbers, and we're going to tell you it0:24:30
starts out with 0 and 1. And everything after the 0 and 1 is gotten by summing two0:24:40
streams. One is the fibs themselves, and the other one is the tail of the fibs. So if I know that these start out with 0 and 1, I know that0:24:52
the fibs now start out with 0 and 1, and the tail of the fibs start out with 1. So as soon as I know that, I know that the next one here is 0 plus 1 is 1, and that tells me that the next one here is 10:25:04
and the next one here is 1. And as soon as I know that, I know that the next one is 2. So the next one here is 2 and the next one here is 2. And this is 3.0:25:14
This one goes to 3, and this is 5. So it's a perfectly sensible definition. It's a one-line definition. And again, I could walk over to the computer and type that0:25:25
in, exactly that, and then say print stream the Fibonacci numbers, and they all come flying out. See, this is a lot like learning0:25:34
about recursion again. Instead of thinking about recursive procedures, we have recursively defined data objects.0:25:45
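The one-line definition translates directly (a Python sketch with streams as (head, thunk) pairs; without the memoization from the last lecture it recomputes tails, which is fine for small prefixes):

```python
def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def add_streams(s1, s2):
    return cons_stream(head(s1) + head(s2),
                       lambda: add_streams(tail(s1), tail(s2)))

# The one-line definition: fibs starts 0, 1, and everything after
# comes from adding fibs to its own tail.
fibs = cons_stream(
    0, lambda: cons_stream(1, lambda: add_streams(fibs, tail(fibs))))

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

print([nth_stream(i, fibs) for i in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]
```

This is the recursively defined data object the lecture describes: the stream is defined in terms of a shifted copy of itself, and the hidden delays make it sensible.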
But that shouldn't surprise you at all, because by now, you should be coming to really believe that there's no difference really between procedures and data. In fact, in some sense, the underlying streams are0:25:55
procedures sitting there, although we don't think of them that way. So the fact that we have recursive procedures, well, then it should be natural that we have recursive data, too.0:26:07
OK, well, this is all pretty neat. Unfortunately, there are problems that streams aren't going to solve. Let me show you one of them.0:26:17
See, in the same way, let's imagine that we're building an analog computer to solve some differential equation like,0:26:26
say, we want to solve the equation y prime-- dy/dt-- is y squared, and I'm going to give you some initial value.0:26:36
I'll tell you y of 0 equals 1. Let's say dt is equal to something.0:26:46
Now, in the old days, people built analog computers to solve these kinds of things. And the way you do that is really simple. You get yourself an integrator, like that one, an0:27:01
integrator box. And we put in the initial value y of 0 is 1. And now if we feed something in and get something out,0:27:10
we'll say, gee, what we're getting out is the answer. And what we're going to feed in is the derivative, and the derivative is supposed to be the square of the answer.0:27:21
So if we take these values and map using square, and if I0:27:31
feed this around, that's how I build a block diagram for an analog computer that solves this differential equation.0:27:42
Now, what we'd like to do is write a stream program that looks exactly like that. And what do I mean exactly like that? Well, I'd say define y to be the integral of dy starting at0:28:08
1 with 0.001 as a time step. And I'd like to say that says this. And then I'd like to say, well, dy is gotten by mapping0:28:19
the square along y. So define dy to be map square along y.0:28:33
So there's a stream description of this analog computer, and unfortunately, it doesn't work. And you can see why it doesn't work because when I come in0:28:43
and say define y to be the integral of dy, it says, oh, the integral of dy-- huh, dy? Oh, that's undefined.0:28:53
So I can't write this definition before I've written this one. On the other hand, if I try and write this one first, it says, oh, I define y to be the map of square along y?0:29:03
Oh, that's not defined yet. So I can't write this one first, and I can't write that one first. So I can't quite play this game.0:29:17
Well, is there a way out? See, we can do that with ones. See, over here, we did this thing ones, and we were able0:29:27
to define ones in terms of ones because of this delay that was built inside because cons-stream had a delay. Now, why's it sensible?0:29:36
Why's it sensible for cons-stream to be built with this delay? The reason is that cons-stream can do a useful thing without looking at its tail.0:29:45
See, if I say this is cons-stream of 1 onto something without knowing anything about something, I know that the stream starts off with 1.0:29:54
That's why it was sensible to build something like cons-stream. So we put a delay in there, and that allows us to have this sort of self-referential definition.0:30:06
Well, integral is a little bit the same way. See, notice for an integral, I can-- let's go back and look at integral for a second.0:30:17
See, notice integral, it makes sense to say what's the first thing in the integral without knowing the stream that you're0:30:27
integrating. Because the first thing in the integral is always going to be the initial value that you're handed. So integral could be a procedure like cons-stream.0:30:37
You could define it, and then even before it knows what it's supposed to be integrating, it knows enough to say what its initial value is.0:30:46
So we can make a smarter integral, which is aha, you're going to give me a stream to integrate and an initial value, but I really don't have to look at that stream that I'm supposed to integrate until you ask me to work down0:30:56
the stream. In other words, integral can be like cons-stream, and you can expect that there's going to be a delay around its integrand. And we can write that.0:31:05
Here's a procedure that does that. Another version of integral, and this is almost like the previous one, except the stream it's going to get is expected to be a delayed object.0:31:17
And how does this integral work? Well, the little thing it's going to define inside of itself says on the cons-stream, the initial value is the initial value, but only inside of that cons-stream,0:31:29
and remember, there's going to be a hidden delay inside here. Only inside of that cons-stream will I start0:31:38
looking at what the actual delayed object is. So my answer is the first thing's the initial value. If anybody now asks me for my tail, at that point, I'm going0:31:50
to force that delayed object-- and I'll call that s-- and I do the add streams. So this is an integral which is sort of like cons-stream.0:31:59
It's not going to actually try and see what you handed it as the thing to integrate until you look past the first element.0:32:10
And if we do that and we can make this work, all we have to do here is say define y to be the integral of0:32:24
delay of dy. So y is going to be the integral of delay of dy0:32:33
starting at 1, and now this will work. Because I type in the definition of y, and that says, oh, I'm supposed to use the integral of something I don't care about right now because it's a delay.0:32:44
And these things, now you define dy. Now, y is defined. So when I define dy, it can see that definition for y. Everything is now started up. Both streams have their first element.0:32:54
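The whole game can be sketched in Python, where passing an explicit thunk plays the role of delay (streams as (head, thunk) pairs; helper names are assumptions, and dt = 0.001 as in the lecture):

```python
# Solving y' = y^2, y(0) = 1, with an integral that, like cons-stream,
# expects a *delayed* integrand and doesn't force it until asked.

def cons_stream(h, t): return (h, t)
def head(s): return s[0]
def tail(s): return s[1]()

def add_streams(s1, s2):
    return cons_stream(head(s1) + head(s2),
                       lambda: add_streams(tail(s1), tail(s2)))

def map_stream(f, s):
    return cons_stream(f(head(s)), lambda: map_stream(f, tail(s)))

def scale_stream(c, s):
    return map_stream(lambda x: c * x, s)

def integral(delayed_s, initial_value, dt):
    # The first element is always the initial value; the integrand is
    # forced only when somebody asks for the tail.
    integ = cons_stream(
        initial_value,
        lambda: add_streams(scale_stream(dt, delayed_s()), integ))
    return integ

dt = 0.001
y = integral(lambda: dy, 1, dt)   # "delay of dy": dy isn't defined yet!
dy = map_stream(lambda v: v * v, y)

def nth_stream(n, s):
    return head(s) if n == 0 else nth_stream(n - 1, tail(s))

print(nth_stream(0, y))   # 1
print(nth_stream(1, y))   # 1 + dt * 1^2 = 1.001
```

When y is defined, the thunk `lambda: dy` is not called, so it doesn't matter that dy isn't bound yet; by the time anyone walks past the first element, both definitions have run.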
And then when I start mapping down, looking at successive elements, both y and dy are defined. So there's a little game you can play that goes a little bit beyond just using the delay that's hidden inside0:33:06
streams. Questions? OK, let's take a break.0:34:07
Well, just before the break, I'm not sure if you noticed it, but something nasty started to happen. We've been going along with the streams and divorcing time0:34:21
in the programs from time in the computers, and all that divorcing got hidden inside the streams. And then at the very end, we saw that sometimes in order to really0:34:30
take advantage of this method, you have to pull out other delays. You have to write some explicit delays that are not hidden inside that cons-stream.0:34:39
And I did a very simple example with differential equations, but if you have some very complicated system with all kinds of self-loops, it becomes very, very difficult to see where you need those delays.0:34:49
And if you leave them out by mistake, it becomes very, very difficult to see why the thing maybe isn't working. So that's kind of a mess, that by getting this power and0:35:00
allowing us to use delay, we end up with some very complicated programming sometimes, because it can't all be hidden inside the streams. Well, is there a way out of that?0:35:11
Yeah, there is a way out of that. We could change the language so that all procedures acted like cons-stream, so that every procedure automatically0:35:22
has an implicit delay around its arguments. And what would that mean? That would mean when you call a procedure, the arguments wouldn't get evaluated.0:35:32
Instead, they'd only be evaluated when you need them, so they might be passed off to some other procedure, which wouldn't evaluate them either. So all these procedures would be passing promises around.0:35:42
And then finally maybe when you finally got down to having to look at the value of something that was handed to a primitive operator would you actually start calling in all those promises.0:35:52
If we did that, since everything would have a uniform delay, then you wouldn't have to write any explicit delays, because it would be automatically built into the way the language works.0:36:02
Or another way to say that: technically, what I'm describing is what's called-- if we did that, our language would be a so-called0:36:12
normal-order evaluation language, versus what we've0:36:22
actually been working with, which is called0:36:31
applicative-order evaluation. And remember the substitution model for applicative order. It says when you go and evaluate a combination, you0:36:40
find the values of all the pieces. You evaluate the arguments and then you substitute them in the body of the procedure. Normal order says no, don't do that.0:36:49
What you do is effectively substitute in the body of the procedure, but instead of evaluating the arguments, you just put a promise to compute them there.0:36:58
Or another way to say that is you take the expressions for the arguments, if you like, and substitute them in the body of the procedure and go on, and never really simplify anything until you get down to a primitive operator.0:37:09
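That substitution of promises for arguments can be sketched in Python rather than the lecture's Scheme (the names `delay`, `cons`, `car`, and `cdr` here are illustrative, modeling promises as memoized zero-argument functions):

```python
# A minimal sketch of normal-order promises, assuming a promise is a
# zero-argument function (thunk) that is called in at most once.
def delay(thunk):
    cache = []                      # memoize: never recompute a forced promise
    def force():
        if not cache:
            cache.append(thunk())
        return cache[0]
    return force

# A cons whose parts are both delayed behaves like cons-stream in every slot.
def cons(a_thunk, b_thunk):
    return (delay(a_thunk), delay(b_thunk))

def car(pair):
    return pair[0]()                # forcing happens only at the primitive access

def cdr(pair):
    return pair[1]()

# The second argument would blow up under applicative order, but here it is
# never evaluated, because nothing ever looks at it.
p = cons(lambda: 1 + 1, lambda: 1 // 0)
assert car(p) == 2
```

Only the primitive accessors `car` and `cdr` ever call promises in; everything upstream just passes them along.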
So that would be a normal-order language. Well, why don't we do that? Because if we did, we'd get all the advantages of delayed evaluation with none of the mess.0:37:18
In fact, if we did that and cons was just a delayed procedure, that would make cons the same as cons-stream. We wouldn't need streams at all because lists would0:37:27
automatically be streams. That's how lists would behave, and data structures would behave that way. Everything would behave that way, right? You'd never really do any computation until you actually0:37:38
needed the answer. You wouldn't have to worry about all these explicit annoying delays. Well, why don't we do that?0:37:47
First of all, I should say people do do that. There are some very beautiful languages. One of the very nicest is a language called Miranda, which0:37:56
was developed by David Turner at the University of Kent. And that's how this language works. It's a normal-order language, and its data structures, which0:38:06
look like lists, are actually streams. And you write ordinary procedures in Miranda, and they do these prime things and eight queens things, just without anything special. It's all built in there.0:38:17
But there's a price. Remember how we got here. We're decoupling time in the programs0:38:26
from time in the machines. And if we put delay, that sort of decouples it everywhere, not just in streams. Remember what we're trying to do. We're trying to think about programming as a way to0:38:36
specify processes. And if we give up too much time, our language becomes more elegant, but it becomes a little bit less expressive.0:38:47
There are certain distinctions that we can't draw. One of them, for instance, is iteration. Remember this old procedure, iterative factorial, that we0:38:58
looked at quite a long time ago. Iterative factorial had a thing, and it said there was an internal procedure, and there was a state which was a product and a counter, and we iterate that0:39:09
going around the loop. And we said that was an iterative procedure because it didn't build up state. And the reason it didn't build up state is because this iter0:39:19
that's called is just passing these things around to itself. Or in the substitution model, you could see in the substitution model that Jerry did, that in an iterative0:39:29
procedure, that state doesn't have to grow. And in fact, we said it doesn't, so this is an iteration. But now think about this exact same text if we had a normal-order language.0:39:41
What would happen is this would no longer be an iterative process. And if you really think about the details of the substitution model, which I'm not going to do here, this0:39:51
expression would grow. Why would it grow? It's because when iter calls itself, it calls itself with this product. If it's a normal-order language, that multiplication0:40:00
is not going to get done. That's going to say I'm going to call myself with a promise to compute this product. And now iter goes around again.0:40:09
And I'm going to call myself with a promise to compute this product where now one of the factors is a promise.0:40:18
And I call myself again. And if you write out the substitution model for that iterative process, you'll see exactly the same growth in state, all those promises that are getting remembered that0:40:29
have to get called in at the very end. So one of the disadvantages is that you can't really express iteration. Maybe that's a little theoretical reason why not,0:40:39
but in fact, people who are trying to write real operating systems in these languages are running into exactly these types of problems. Like it's perfectly possible to0:40:51
implement a text editor in languages like these. But after you work a while, you suddenly have 3 megabytes of stuff, which is--0:41:01
I guess the people who are looking at these call it the dragging tail problem-- promises that sort of haven't been called in because you couldn't quite express an iteration.0:41:10
And one of the research questions in these kinds of languages is figuring out the right compiler technology to get rid of the so-called dragging tails.0:41:20
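A Python sketch of that growth (illustrative names, thunks standing in for normal-order promises): the loop's text looks iterative, but each pass wraps the pending product in yet another promise, so the chain of unforced promises grows with the counter.

```python
def iter_fact_lazy(n):
    # product arrives as a thunk, the way a normal-order language would
    # pass it; the multiplication is never done on the way in.
    def loop(product, counter):
        if counter > n:
            return product()     # only now are all the promises called in
        # each pass wraps another promise around the old one --
        # this growing chain is the "dragging tail"
        return loop(lambda: counter * product(), counter + 1)
    return loop(lambda: 1, 1)

assert iter_fact_lazy(10) == 3628800
```

Forcing the final promise unwinds `n` nested thunks, which is exactly the state growth the substitution model would show.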
It's not simple. But there's another kind of more striking issue about why you just don't go ahead and make your0:41:30
language normal order. And the reason is that normal-order evaluation and side effects just don't mix.0:41:42
They just don't go together very well. Somehow, you can't-- it's sort of you can't simultaneously go around0:41:51
trying to model objects with local state and change and at the same time do these normal-order tricks of de-coupling time.0:42:00
Let me just show you a really simple example, very, very simple. Suppose we had a normal-order language. And I'm going to start out in this language.0:42:09
This is now normal order. I'm going to define x to be 0. It's just some variable I'll initialize. And now I'm going to define this little funny function,0:42:18
which is an identity function. And what it does, it keeps track of the last time you called it using x.0:42:31
So the identity of n just returns n, but it sets x to be n. And now I'll define a little increment function, which is a0:42:40
very little, simple scenario. Now, imagine I'm interacting with this in the normal-order language, and I type the following. I say define y to be increment the identity function of 3, so0:42:52
y is going to be 4. Now, I say what's x? Well, x should have been the value that was remembered last0:43:02
when I called the identity function. So you'd expect to say, well, x is 3 at this point, but it's not. Because when I defined y here, what I really defined y to be0:43:13
increment of a promise to do this thing. So I didn't look at y, so that identity function didn't get run. So if I type in this definition and look at x, I'm0:43:24
going to get 0. Now, if I go look at y and say what's y, it'll say y is 4; looking0:43:33
at y, that very act of looking at y, caused the identity function to be run. And now x will get remembered as 3. So here x will be 0.0:43:42
Here, x will be 3. That's a tiny, little, simple scenario, but you can see what kind of a mess that's going to make for debugging interactive0:43:52
programs when you have normal-order evaluation. It's very confusing. But it's very confusing for a very deep reason, which is0:44:03
that the whole idea of putting in delays is that you throw away time. That's why we can have these infinite processes. Since we've thrown away time, we don't have to wait for them0:44:13
to run, right? We decouple the order of events in the computer from what we write in our programs. But when we talk about state0:44:23
and set and change, that's exactly what we do want control of. So it's almost as if there's this fundamental contradiction0:44:32
in what you want. And that brings us back to these sort of philosophical mutterings about what is it that you're trying to model and how do you look at the world.0:44:42
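The blackboard scenario above can be sketched in Python (hypothetical thunk-passing to mimic normal order; the names are illustrative):

```python
x = 0

def identity(n_thunk):
    # In normal order the argument arrives as a promise; the side
    # effect on x happens only when the promise is called in.
    def promise():
        global x
        value = n_thunk()
        x = value                # remember the last argument seen
        return value
    return promise

def increment(m_thunk):
    return lambda: m_thunk() + 1

y = increment(identity(lambda: 3))   # define y: nothing has run yet
assert x == 0                        # look at x first: identity never ran
assert y() == 4                      # looking at y forces the promises...
assert x == 3                        # ...and only now has x been set
```

Whether x is 0 or 3 depends on whether you happened to look at y first, which is exactly the debugging confusion being described.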
Or sometimes this is called the debate over functional programming.0:44:53
A so-called purely functional language is one that just doesn't have any side effects. Since you have no side effects, there's no assignment0:45:02
operator, so there are no terrible consequences of it. You can use a substitution-like thing. Programs really are like mathematics and not like0:45:11
models in the real world, not like objects in the real world. There are a lot of wonderful things about functional languages. Since there's no time, you never have any synchronization problems. And if you want to put something into a parallel0:45:23
algorithm, you can run the pieces of that parallel processing any way you want. There's just never any synchronization to worry about, and it's a very congenial environment for doing this.0:45:33
The price is you give up assignment. So an advocate of a functional language would say, gee, that's just a tiny price to pay.0:45:44
You probably shouldn't use assignment most of the time anyway. And if you just give up assignment, you can be in this much, much nicer world than this place with objects.0:45:54
Well, what's the rejoinder to that? Remember how we got into this mess. We started trying to model things that had local state.0:46:04
So remember Jerry's random number generator. There was this random number generator that had some little state in it to compute the next random number and the next random number and the next random number.0:46:14
And we wanted to hide that state away from the Cesaro compute-pi process, and that's why we needed set. We wanted to package that state modularly.0:46:24
Well, a functional programming person would say, well, you're just all wet. I mean, you can write a perfectly good modular program. It's just you're thinking about modularity wrong.0:46:33
You're hung up in this next random number and the next random number and the next random number. Why don't you just say let's write a program. Let's write an enumerator which just generates an0:46:42
infinite stream of random numbers. We can sort of have that stream all at once, and that's0:46:52
going to be our source of random numbers. And then if you like, you can put that through some sort of processor, which is-- I don't know-- a Cesaro test, and that can do what it wants.0:47:06
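That enumerator-and-processor picture might be sketched with Python generators (illustrative names; the Cesaro test from the earlier lecture uses the fact that the probability two random integers share no common factor is 6/π²):

```python
import math
import random
from itertools import islice

def random_pairs(seed=1):
    # the enumerator: an infinite stream of pairs of random integers
    rng = random.Random(seed)
    while True:
        yield rng.randrange(1, 10**6), rng.randrange(1, 10**6)

def cesaro_pi(pairs):
    # the processor: turn the stream of pairs into a stream of
    # successive approximations to pi, since P(gcd(a, b) = 1) = 6/pi^2
    hits, trials = 0, 0
    for a, b in pairs:
        trials += 1
        if math.gcd(a, b) == 1:
            hits += 1
        if hits:
            yield math.sqrt(6 * trials / hits)

# Looking further down the stream tugs more random numbers out of the
# enumerator and gives a better approximation.
approximations = list(islice(cesaro_pi(random_pairs()), 20000))
assert abs(approximations[-1] - math.pi) < 0.1
```

Nothing here has local state visible from outside: the "next random number" bookkeeping lives entirely inside the enumerator stream.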
And what would come out of there would be a stream of0:47:16
successive approximations to pi.0:47:28
So as we looked further down this stream, we'd tug on this Cesaro thing, and it would pull out more and more random numbers. And the further and further we look down the stream, the0:47:37
better an approximation we'd get to pi. And it would do exactly the same as the other computation, except we're thinking about the modularity differently. We're saying imagine we had all those infinite streams of0:47:46
random numbers all at once. You can see the details of this procedure in the book. Similarly, there are other things that we tend to get0:47:56
locked into on this one and that one and the next one and the next one, which don't have to be that way. Like you might think about like a banking system, which0:48:07
is a very simple idea. Imagine we have a program that sort of represents a bank account.0:48:18
The bank account might have in it-- if we looked at this in a sort of message-passing view of the world, we'd say a bank account is an object that has some0:48:29
local state in there, which is the balance, say. And a user using this system comes and sends a transaction request. So the user sends a transaction request, like0:48:41
deposit some money, and the bank account maybe-- let's say the bank account always responds with what the current balance is. The user says let's deposit some money, and the bank0:48:50
account sends back a message which is the balance. And the user says deposit some more, and the bank account sends back a message.0:48:59
And just like the random number generator, you'd say, gee, we would like to use set. We'd like to have balance be a piece of local state inside this bank account because we want to separate the state of0:49:08
the user from the state of the bank account. Well, that's the message-processing view. There's a stream view of that thing, which does the0:49:20
same thing without any set or side effects. And the idea is again we don't think about anything having0:49:29
local state. We think about the bank account as something that's going to process a stream of transaction requests.0:49:38
So think about this bank account not as something that goes message by message, but something that takes in a stream of transaction requests like maybe successive deposit amounts.0:49:49
1, 2, 2, 4, those might be successive amounts to deposit. And then coming out of it is the successive0:49:58
balances 1, 3, 5, 9. So we think of the bank account not as something that has state, but something that acts sort of on the infinite0:50:09
stream of requests. But remember, we've thrown away time. So what we can do is if the user's here, we can have this infinite stream of requests being generated one at a time0:50:21
coming from the user and this transaction stream coming back on a printer being printed one at a time.0:50:30
And if we drew a little line here, right there to the user, the user couldn't tell that this system doesn't have state.0:50:39
It looks just like the other one, but there's no state in there. And by the way, just to show you, here's an actual0:50:48
implementation of this-- we'll call it make-deposit-account because you can only deposit. It takes an initial balance and then a stream of deposits0:50:57
you might make. And what is it? Well, it's just cons-stream of the balance onto make a new account stream whose initial balance is the old balance0:51:08
plus the first thing in the deposit stream, and make-deposit-account works on the rest, which is the tail of the deposit stream.0:51:18
So there's sort of a very typical message-passing, object-oriented thing that's done without side effects at all.0:51:28
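That procedure might be transliterated into Python as a recursive generator (a sketch of the cons-stream version: the initial balance comes out first, then each successive balance, with no assignment anywhere):

```python
from itertools import islice

def make_deposit_account(balance, deposit_stream):
    # cons-stream of the balance onto a new account stream whose
    # initial balance is the old balance plus the first deposit
    yield balance
    deposits = iter(deposit_stream)
    try:
        first = next(deposits)
    except StopIteration:
        return                       # no more transaction requests
    yield from make_deposit_account(balance + first, deposits)

# deposits 1, 2, 2, 4 produce successive balances 1, 3, 5, 9
balances = list(islice(make_deposit_account(0, [1, 2, 2, 4]), 5))
assert balances == [0, 1, 3, 5, 9]
```

No variable is ever mutated: each "state change" is just a fresh recursive call carrying the new balance.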
There are very many things you can do this way. Well, can you do everything without assignment? Can everybody go over to purely functional languages?0:51:40
Well, we don't know, but there seem to be places where purely functional programming breaks down. Where it starts hurting is when you have things like0:51:50
this, but you also mix it up with the other things that we had to worry about, which are objects and sharing and two independent agents being the same. So here's a typical one: suppose you want to extend0:52:00
this bank account. So here's a bank account.0:52:12
Bank accounts take in a stream of transaction requests and put out streams of, say, balances or responses to that. But suppose you want to model the fact that this is a joint0:52:21
bank account between two independent people. So suppose there are two people, say, Bill and Dave,0:52:31
who have a joint bank account. How would you model this? Well, Bill puts out a stream of transaction requests, and0:52:40
Dave puts out a stream of transaction requests, and somehow, they have to merge into this bank account. So what you might do is write a little stream processing thing called merge, which sort of takes these, merges them0:52:58
together, produces a single stream for the bank account. Now they're both talking to the same bank account. That's all great, but how do you write merge? What's this procedure merge?0:53:09
You want to do something that's reasonable. Your first guess might be to say, well, we'll take alternate requests from Bill and Dave. But what happens if0:53:20
suddenly in the middle of this thing, Dave goes away on vacation for two years? Then Bill's sort of stuck. So what you want to do is-- well, it's hard to describe.0:53:29
What you want to do is what people call fair merge.0:53:38
The idea of fair merge is it sort of should do them alternately, but if there's nothing waiting here, it should take one twice. Notice I can't even say that without talking about time.0:53:51
So one of the other active research areas in functional languages is inventing little things like fair merge and0:54:00
maybe some others, which will take the places where I used to need side effects and objects and sort of hide them away in some very well-defined modules of the system so that0:54:11
all the problems of assignment don't sort of leak out all over the system but are captured in some fairly well-understood things.0:54:20
More generally, I think what you're seeing is that we're running across what I think is a very basic problem in computer science, which is how to define languages that0:54:29
somehow can talk about delayed evaluation, but also be able to reflect this view that there are objects in the world.0:54:38
How do we somehow get both? And I think that's a very hard problem. And it may be that it's a very hard problem that has almost nothing to do with computer science, that it really is a0:54:49
problem having to do with two very incompatible ways of looking at the world. OK, questions?0:55:17
AUDIENCE: You mentioned earlier that once you introduce assignment, the general rule for using the substitution model is you can't. Unless you're very careful, you can't.0:55:27
PROFESSOR: Right. AUDIENCE: Is there a set of techniques or a set of guidelines for localizing the effects of assignment so that0:55:37
the very careful becomes defined? PROFESSOR: I don't know. Let me think. Well, certainly, there was an assignment inside memo-proc,0:55:50
but that was sort of hidden away. It ended up not making any difference. Part of the reason for that is once this thing triggered that it had run and gotten an answer, that answer will never change.0:56:00
So that was sort of a one-time assignment. So one very general thing you can do is if you only do what's called a one-time assignment and never change anything, then you can do better.0:56:11
One of the problems in this merge thing, people have-- let me see if this is right. I think it's true that with fair merge, with just fair0:56:22
merge, you can begin effectively simulating assignment in the rest of the language. It seems like anything you do to go outside--0:56:33
I'm not quite sure that's true for fair merge, but it's true of a little bit more general things that people have been doing. So it might be that any little bit you put in, suddenly if0:56:42
they allow you to build arbitrary stuff, it's almost as bad as having assignment altogether. But that's an area that people are thinking about now.0:56:51
AUDIENCE: I guess I don't see the problem here with merge if I call Bill, if Bill is a procedure, then Bill is going0:57:00
to increment the bank account or build the list that's going to put in the next element. If I call Dave twice in a row, that will do that. I'm not sure where fair merge has to be involved.0:57:09
PROFESSOR: The problem is imagine these really as people. See, here I have the user who's interacting with this bank account. Put in a request, get an answer. Put in a request, get an answer. AUDIENCE: Right.0:57:18
PROFESSOR: But if the only way I can process request is to alternate them from two people-- AUDIENCE: Well, why would you alternate them? PROFESSOR: Why don't I? AUDIENCE: Yes. Why do you? PROFESSOR: Think of them as real people, right?0:57:27
This guy might go away for a year. And you're sitting here at the bank account window, and you can't put in two requests because it's waiting for this guy. AUDIENCE: Why does it have to be waiting for one?0:57:37
PROFESSOR: Because it's trying to compute a function. I have to define a function. Another way to say that is the answer to what comes out of this merge box is not a function of what goes in.0:57:51
Because, see, what would the function be? Suppose he puts in 1, 1, 1, 1, and he puts in 2, 2, 2, 2.0:58:03
What's the answer supposed to be? It's not good enough to say it's 1, 2, 1, 2, 1, 2. AUDIENCE: I understand. But when Bill puts in 1, 1 goes in. When Dave puts in 2 twice, 2 goes in twice.0:58:13
When Bill puts in-- PROFESSOR: Right. AUDIENCE: Why can't it be hooked to the time of the input-- the actual procedural-- PROFESSOR: Because I don't have time.0:58:23
See, all I can say is I'm going to define a function. I don't have time.0:58:32
There's no concept if it's going to alternate, except if nobody's there, it's going to wait a while for him. It's just going to say I have the stream of requests, the0:58:41
timeless infinite streams of all the requests that Dave would have made, right? And the timeless infinite stream of all the requests Bill would have made, and I want to operate on them.0:58:51
See, that's how this bank account is working. And the problem is that these poor people who are sitting at the bank account windows have the0:59:02
misfortune to exist in time. They don't see their infinite stream of all the requests they would have ever made. They're waiting now, and they want an answer.0:59:14
So if you're sitting there-- if this is the screen operation on some time-sharing system and it's working functionally, you want an answer then when you type the character.0:59:25
You don't want it to have to wait for everybody in the whole system to have typed one character before it can get around to service you. So that's the problem. I mean, the fact that people live in time, apparently.0:59:36
If they didn't, it wouldn't be a problem.0:59:49
AUDIENCE: I'm afraid I miss the point of having no time in this banking transaction. Isn't time very important? For instance, the sequence of events.1:00:00
If Dave takes out $100, then the timing sequence should be important. How do you treat transactions as streams?1:00:11
PROFESSOR: Well, that's the thing I'm saying. This is an example where you can't. You can't. The point is what comes out of here is simply not a function1:00:21
of the stream going in here and the stream going in here. It's a function of the stream going in here and the stream going in here and some kind of information about time, which is precisely what a normal-order language won't1:00:31
let you say. AUDIENCE: In order to bring this back into a more functional perspective, could we just explicitly time stamp1:00:40
all the inputs from Bill and Dave and define fair merge to just be the sort on those time stamps?1:00:49
PROFESSOR: Yeah, you can do that. You can do that sort of thing. Another thing you could say is imagine that really what this function is, is that it does a read every microsecond, and1:00:59
then if there's none there, that's considered an empty one. That's about equivalent to what you said. And yes, you can do that, but that's a kludge. So it's not only implementation1:01:09
we're worried about. We're worried about expressive power in the language, and what we're running across is a real mismatch between what we can say easily and what we'd like to say.1:01:18
AUDIENCE: It sounds like where we're getting hung up with that is the fact it expects one input from both Bill and Dave at the same time. PROFESSOR: It's not quite one, but it's anything you define.1:01:28
So you can say Dave can go twice as often, but if anything you predefine, it's not the right thing. You can't decide at some particular function of their1:01:39
input requests. Worse yet, I mean, worse yet, there are things that even merge can't do. One thing you might want to do that's even more general is1:01:49
suddenly you add somebody else to this bank account system. You go and you add John to this bank account system. And now there's yet another stream that's going to come1:01:58
into the picture at some time which we haven't prespecified. So that's something even fair merge can't do, and there are things called-- I forget--1:02:07
natagers or something. That's a generalization of fair merge to allow that. There's a whole sort of research discipline saying how far can you push this functional perspective by1:02:16
adding more and more mechanism? And how far does that go before the whole thing breaks down and you might as well have been using set anyway.1:02:25
AUDIENCE: You need to set him up on automatic deposit. [LAUGHTER]1:02:39
PROFESSOR: OK, thank you.0:00:00
Lecture 7A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:15
PROFESSOR: Well today we're going to learn about something quite amazing. We're going to understand what we mean by a program a little bit more profoundly than we have up till now.0:00:26
Up till now, we've been thinking of programs as describing machines. So for example, looking at this still store, we see here0:00:38
is a program for factorial. And what it is, is a character string description, if you will, of the wiring diagram of a0:00:49
potentially infinite machine. And we can look at that a little bit and just see the idea. That this is a sort of compact notation which says, if n is0:00:58
0, the result is one. Well here comes n coming into this machine, and if it's 0, then I control this switch in such a way that the switch allows the output to be one.0:01:09
Otherwise, it's n times factorial of n minus one. Well, I'm computing factorial of n minus one and multiplying that by n, and, in the case that it's not 0, this switch0:01:19
makes the output come from there. Of course, this is a machine with a potentially infinite number of parts, because factorial occurs within factorial, so we don't know how deep it has to be.0:01:31
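The program being narrated is the usual recursive factorial; as a character string rather than a wiring diagram (written here in Python for illustration):

```python
def factorial(n):
    # if n is 0, the result is 1; otherwise it's n times factorial of n - 1
    return 1 if n == 0 else n * factorial(n - 1)

assert factorial(6) == 720
```

The recursion is what makes the "machine" potentially infinite: factorial occurs within factorial, so the diagram's depth depends on the input.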
But that's basically what our notation for programs really means to us at this point. It's a character string description, if you will, of a0:01:41
wiring diagram that could also be drawn some other way. And, in fact, many people have proposed to me programming languages that look graphical like this. I'm not sure I believe there are many advantages.0:01:51
The major disadvantage, of course, is that it takes up more space on a page, and, therefore, it's harder to pack into a listing or to edit very well.0:02:01
But in any case, there's something very remarkable that can happen in the computation world, which is that you can have something called a universal machine.0:02:10
If we look at the second slide, what we see is a special machine called eval.0:02:21
There is a machine called eval, and I'm going to show it to you today. It's very simple. What is remarkable is that it will fit on the blackboard.0:02:33
However, eval is a machine which takes as input a description of another machine. It could take the wiring diagram of a0:02:42
factorial machine as input. Having done so, it becomes a simulator for the factorial0:02:52
machine such that, if you put a six in, out comes a 720. That's a very remarkable sort of machine.0:03:02
And the most amazing part of it is that it fits on a blackboard. By contrast, one could imagine in the analog electronics world a very different machine, a machine which also0:03:17
was, in some sense, universal, where you gave a circuit diagram as one of the inputs, for example, of this little low-pass filter, one-pole low-pass filter.0:03:28
And you can imagine that you could, for example, scan this out-- the scan lines are the signal that's describing what this0:03:37
machine is to simulate-- then the analog of that which is made out of electrical circuits, should configure itself into a filter that has the frequency response specified0:03:47
by the circuit diagram. That's a very hard machine to make, and, surely, there's no chance that I could put it on a blackboard. So we're going to see an amazing thing today.0:03:58
We're going to see, on the blackboard, the universal machine. And we'll see that among other things, it's extremely simple. Now, we're getting very close to the real spirit in the0:04:10
computer at this point. So I have to show a certain amount of reverence and respect, so I'm going to wear a suit jacket for the only time that you'll ever see me wear a suit jacket here.0:04:20
And I think I'm also going to put on an appropriate hat for the occasion. Now, this is a lecture which, I have to warn you--0:04:34
let's see, normally, people under 40 and who don't have several children are advised to be careful. If they're really worried, they should leave. Because0:04:44
there's a certain amount of mysticism that will appear here which may be disturbing and cause trouble in your minds. Well in any case, let's see, I wish to write for you the0:04:57
evaluator for Lisp. Now the evaluator isn't very complicated. It's very much like all the programs we've seen already.0:05:08
That's the amazing part of it. It's going to be-- and I'm going to write it right here-- it's a program called eval.0:05:22
And it's a procedure of two arguments in expression of an environment.0:05:31
And like every interesting procedure, it's a case analysis.0:05:40
But before I start on this, I want to tell you some things. The program we're going to write on the blackboard is ugly, dirty, disgusting, not the way I would write this as0:05:52
a professional. It is written with concrete syntax, meaning you've really got to use lots of CARs and CDRs, which is exactly what I told you not to do.0:06:02
That's on purpose in this case, because I want it to be small, compact, fit on the blackboard so you can get the0:06:11
whole thing. So I don't want to use long names like I normally use. I want to use CAR-CDR because it's short. Now, that's a trade-off.0:06:20
I don't want you writing programs like this. This is purely for an effect. Now, you're going to have to work a little harder to read it, but I'm going to try to make it clear0:06:29
as I'm writing it. I'm also-- this is a pretty much complete interpreter, but there's going to be room for putting in more things-- I'm going to leave out definition and assignment,0:06:39
just because they are not essential, for a mathematical reason I'll show you later and also they take up more space.0:06:51
But, in any case, what do we have to do? We have to do a dispatch which breaks the types of expressions up into particular classes.0:07:02
So that's what we're going to have here. Well, what expressions are there? Let's look at the kinds of expressions. We can have things like the numeral three. What do I want that to do?0:07:12
I can make choices, but I think right now, I want it to be a three. That's what I want. So that's easy enough. That means that if the thing is a number, the0:07:27
expression, then I want the expression itself as the answer. Now the next possibility is things that we0:07:37
represent as symbols. Examples of symbols are things like x, n, eval, number, x.0:07:47
What do I mean them to be? Those are things that stand for other things. Those are the variables of our language. And so I want to be able to say, for example, that x, for0:07:58
example, transforms to it's value which might be three. Or I might ask something like car.0:08:07
I want to have as its value-- be something like some procedure, which I don't know0:08:17
what is inside there, perhaps a machine language code or something like that. So, well, that's easy enough. I'm going to push that off on someone else.0:08:27
If something is a symbol, if the expression is a symbol, then I want the answer to be the result of looking up the0:08:38
expression in the environment. Now the environment is a dictionary which maps the0:08:52
symbol names to their values. And that's all it is. How it's done? Well, we'll see that later. It's very easy.0:09:01
It's easy to make data structures that are tables of various sorts. But it's only a table, and this is the access routine for some table.0:09:10
Well, the next thing, another kind of expression-- you have things that describe constants that are not numbers, like 'foo.0:09:20
Well, for my convenience, I want to syntactically transform that into a list structure which is, quote foo.0:09:35
A quoted object, whatever it is, is going to be actually an abbreviation, which is not part of the evaluator but happens somewhere else, an abbreviation for an expression0:09:46
that looks like this. This way, I can test for the type of the expression as being a quotation by examining the car of the expression.0:09:58
So I'm not going to worry about that in the evaluator. It's happening somewhere earlier in the reader or something. If the CAR of the expression is quote, then what0:10:18
I want, I want quote foo to itself evaluate to foo. It's a constant.0:10:27
This is just a way of saying that this evaluates to itself. What is that? That's the second of the list. It's the second element of the0:10:37
list. The second element of the list is its CADR. So I'm just going to write here, CADR.0:10:51
What else do we have here? We have lambda expressions, for example, lambda of x plus x y.0:11:04
Well, I'm going to have to have some representation for the procedure which is the value of a lambda expression. The procedure here is not the expression lambda x.0:11:13
That's the description of it, the textual description. However, what I'm going to expect to see here is something which contains an environment as one of its parts if I'm implementing a lexical language.0:11:27
And so what I'd like to see is some type flags. I'm going to have to be able to distinguish procedures later, procedures which were produced by lambdas, from ones0:11:37
that may be primitive. And so I'm going to have some flag, which I'll just arbitrarily call closure, just for historical reasons.0:11:47
Now, as to what parts of this are important: I'm going to need to know the bound variable list and the body. Well, that's the CDR of this, so it's going to be x and plus0:12:00
x y and some environment. Now this is not something that users should ever see, this is0:12:13
purely a representation, internally, for a procedure object. It contains a bound variable list, a body, and an0:12:22
environment, and some type tag saying, I am a procedure. I'm going to make one now. So if the CAR of the expression is quote lambda,0:12:43
then what I'm going to put here is-- I'm going to make a list of closure, the CDR of the0:12:58
procedure description, which is everything except the lambda,0:13:07
and the current environment. This implements the rule for environments in the environment model. It has to do with construction of procedures from lambda0:13:17
expressions. The environment that was around at the time the evaluator encountered the lambda expression is the environment where the resulting procedure interprets0:13:30
its free variables. So that's part of that. And so we have to capture that environment as part of the procedure object.0:13:39
And we'll see how that gets used later. There are also conditional expressions of things like COND of say, p one, e one, p two, e two.0:13:54
Where this is a predicate, a predicate is a thing that is either true or false, and the expression to be evaluated if the predicate is true.0:14:03
A set of clauses, if you will, that's the name for such a thing. So I'm going to put that somewhere else. We're going to worry about that in another piece of code.0:14:12
So EQ-- if the CAR of the expression is COND, then I'm going to do0:14:24
nothing more than EVCOND of the CDR of the expression.0:14:34
That's all the clauses in the environment that I'm given. Well, there's one more case, an arbitrary thing like the sum0:14:46
of x and three, where this is an operator applied to operands, and there's nothing special about it.0:14:56
It's not one of the special cases, the special forms. These are the special forms.0:15:09
And if I were writing here a professional program, again, I would somehow make this data directed. So there wouldn't be a sequence of conditionals here, there'd be a dispatch on some bits if I were trying to do0:15:20
this in a more professional way. So that, in fact, I can add to the thing without changing my program much. Also, for example, it would run fast, but I'm not worried0:15:29
about that. Here we're trying to look at this in its entirety. So it's else. Well, what do we do?0:15:38
In this case, I have to somehow do an addition. Well, I could find out what the plus is. I have to find out what the x and the three are.0:15:50
And then I have to apply the result of finding what the plus is to the result of finding out what the x and the three are. We'll have a name for that.0:15:59
So I'm going to apply the result of evaluating the CAR0:16:11
of the expression-- the car of the expression is the operator-- in the environment given.0:16:20
So evaluating the operator gets me the procedure. Now I have to evaluate all the operands to get the arguments. I'll call that EVLIST, the CDR of the operands, of the0:16:34
expression, with respect to the environment. EVLIST will come up later--0:16:43
EVLIST, apply, COND pair, COND, lambda, define. So that what you are seeing here now is pretty much all0:16:53
there is in the evaluator itself. It's the case dispatch on the type of the expression with the default being a general application or a combination.0:17:17
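The dispatch just described can be collected into one piece of Scheme. This is a reconstruction of the blackboard code, not a verbatim transcription; the helpers LOOKUP, EVCOND, EVLIST, and APPLY are written out later in the lecture, and eval and apply deliberately shadow the built-in ones, as on the board.

```scheme
;; Reconstructed blackboard EVAL: a case dispatch on the type of the
;; expression, with the default being a general application.
(define (eval exp env)
  (cond ((number? exp) exp)                      ; numbers evaluate to themselves
        ((symbol? exp) (lookup exp env))         ; variables: look up in the environment
        ((eq? (car exp) 'quote) (cadr exp))      ; (quote foo) evaluates to foo
        ((eq? (car exp) 'lambda)                 ; make a procedure object,
         (list 'closure (cdr exp) env))          ;   capturing the current environment
        ((eq? (car exp) 'cond)
         (evcond (cdr exp) env))                 ; conditionals handled elsewhere
        (else
         (apply (eval (car exp) env)             ; operator -> procedure
                (evlist (cdr exp) env)))))       ; operands -> arguments
```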
Now there are lots of things we haven't defined yet. Let's just look at them and see what they are. We're going to have to do this later, evcond. We have to write apply.0:17:27
We're going to have to write EVLIST. We're going to write LOOKUP. I think that's everything, isn't it? Everything else is something which is simple, or primitive, or something like that.0:17:38
And, of course, we could add many more special forms here, but that would be a bad idea in general in a language. You make a language very complicated by putting a lot of things in there.0:17:47
The number of reserved words that should exist in a language should be no more than a person could remember on his fingers and toes. And I get very upset with languages which have hundreds0:17:56
of reserved words. But that's where the reserved words go. Well, now let's get to the next part of0:18:06
this, the kernel, apply. What else is this doing? Well, apply's job is to take a procedure and apply it to its0:18:17
arguments after both have been evaluated to come up with a procedure and the arguments, rather than the operator symbols and the operand symbols, whatever they are-- symbolic expressions.0:18:33
So we will define apply to be a procedure of two arguments, a procedure and arguments.0:18:47
And what does it do? It does nothing very complicated. It's got two cases. Either the procedure is primitive--0:19:02
And I don't know exactly how that is done. It's possible there's some type information just like we made closure for, here, being the description of the type of0:19:14
a compound thing-- probably so. But it is not essential how that works, and, in fact, it turns out, as you probably know or have deduced, that you0:19:24
don't need any primitives anyway. You can compute anything without them, because of some of the lambda things that I've been playing with.0:19:33
But it's nice to have them. So here we're going to do some magic which I'm not going to explain. Go to machine language, apply primop.0:19:42
Here's how it adds. Execute an add instruction. However, the interesting part of a language is the glue by0:19:52
which the primitives are glued together. So let's look at that. Well, the other possibility is that this is a compound made0:20:01
up by executing a lambda expression, this is a compound procedure. Well, we'll check its type.0:20:10
If it is closure, if it's one of those, then I have to do an0:20:23
eval of the body. The way I do this, the way I deal with this at all, is the way I evaluate the application of a procedure to its arguments, is by evaluating the body of the procedure in0:20:34
the environment resulting from extending the environment of the procedure with the bindings of the formal parameters of the procedure to the arguments that0:20:43
were passed to it. That was a long sentence. Well that's easy enough.0:20:52
Now there's going to be a lot of CAR-CDRing. I have to get the body of the procedure. Where's the body of the procedure in here?0:21:02
Well, here's the CAR, and here's the CDR, the whole rest of this. So here's the CADR. And so I see, what I have here is that the body is the second element of the second0:21:11
element of the procedure. So it's the CADR of the CADR or the CADADR. It's the C-A-D-A-D-R, CADADR of the procedure.0:21:30
To evaluate the body in the result of binding-- that's making up more environment-- well, I need the formal0:21:39
parameters of the procedure. What is that? That's the CAR of the CADR. It's horrible, isn't it?0:21:52
--of the procedure. Bind that to the arguments that were passed in the environment, which is passed also as part of the procedure.0:22:04
Well, that's the CAR of the CDR of the CDR of this, the CADDR, of the procedure.0:22:20
Bind, eval, pair, COND, lambda, define-- Now, of course, if I were being really a neat character,0:22:29
and I was being very careful, I would actually put an extra case here for checking for certain errors like, did you try to apply one to an argument?0:22:39
You get an undefined procedure type. So I may as well do that anyway. --else, some sort of error, like that.0:22:57
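Collected, the APPLY being described looks like this. Again this is a reconstruction; primitive? and apply-primop stand for the unexplained "magic" that drops to machine language.

```scheme
;; Reconstructed blackboard APPLY.
(define (apply proc args)
  (cond ((primitive? proc)
         (apply-primop proc args))               ; magic: e.g. execute an add instruction
        ((eq? (car proc) 'closure)
         ;; proc looks like (closure (bound-vars body) env)
         (eval (cadadr proc)                     ; the body
               (bind (caadr proc)                ; the formal parameters
                     args
                     (caddr proc))))             ; the captured environment
        (else (error "Undefined procedure type" proc))))
```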
Now, of course, again, in some sort of more real system, written for professional reasons, this would be written0:23:06
with a case analysis done by some sort of dispatch. Over here, I would probably have other cases like, is this compiled code?0:23:16
It's very important. I might have distinguished the kind of code that's produced by a directly evaluating a lambda in interpretation from code that was produced by somebody's compiler or0:23:25
something like that. And we'll talk about that later. Or is this a piece of Fortran program I have to go off and execute? It's a perfectly possible thing, at this point, to do that. In fact, in this concrete syntax evaluator I'm writing0:23:36
here, there's an assumption built in that this is Lisp, because I'm using CARs and CDRs. CAR means the operator, and CDR means the operand.0:23:46
In the text, there is an abstract syntax evaluator for which these could be-- these are given abstract names like operator, and operand, and all these other things are like that.0:23:56
And, in that case, you could reprogram it to be ALGOL with no problem. Well, here we have added another couple of things that0:24:07
we haven't defined. I don't think I'll worry about these at all, however, this one will be interesting later.0:24:17
Let's just proceed through this and get it done. There's only two more blackboards so it can't be very long.0:24:27
It's carefully tailored to exactly fit. Well, what do we have left? We have to define EVLIST, which is over here. And EVLIST is nothing more than a map down a bunch of0:24:40
operands producing arguments. But I'm going to write it out. And one of the reasons I'm going to write this out is for a mystical reason, which is I want to make this evaluator so0:24:51
simple that it can understand itself. I'm going to really worry about that a little bit.0:25:00
So let's write it out completely. See, I don't want to worry about whether or not the thing can pass functional arguments. The evaluator is not going to use them. The evaluator is not going to produce functional values.0:25:10
So even if there were a different, alternative language that were very close to this, this evaluates a complex language like Scheme which does allow procedural0:25:19
arguments, procedural values, and procedural data. But even if I were evaluating ALGOL, which doesn't allow0:25:28
procedural values, I could use this evaluator. And this evaluator is not making any assumptions about that. And, in fact, if this evaluator were to be restricted to not being able to do that, it wouldn't matter, because it0:25:37
doesn't use any of those clever things. So that's why I'm arranging this to be super simple. This is sort of the kernel of all possible language evaluators.0:25:47
How about that? Evlist-- well, what is it? It's the procedure of two arguments, l and an0:25:56
environment, where l is a list such that if the list of0:26:06
arguments is the empty list, then the result is the empty list. Otherwise, I want to cons up the result of0:26:21
evaluating the CAR of the list of operands in the0:26:31
environment. So I want the first operand evaluated, and I'm going to make a list of the results by CONSing that onto the result0:26:40
of this EVLISTing as a CDR recursion, the CDR of the list relative to the same environment.0:26:53
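Written out, EVLIST is just the map the professor describes, done as an explicit CDR recursion (reconstructed from the board):

```scheme
;; Reconstructed EVLIST: evaluate each operand, CONSing up the arguments.
(define (evlist l env)
  (cond ((eq? l '()) '())
        (else
         (cons (eval (car l) env)
               (evlist (cdr l) env)))))
```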
Evlist, cons, else, COND, lambda, define-- And I have one more that I want to put on the blackboard.0:27:03
It's the essence of this whole thing. And there's some sort of next layer down.0:27:14
Conditionals-- conditionals are the only thing left that are sort of substantial. Then below that, we have to worry about things like lookup and bind, and we'll look at that in a second.0:27:25
But of the substantial stuff at this level of detail, next important thing is how you deal with conditionals. Well, how do we have a conditional thing?0:27:37
It's a procedure of a set of clauses and an environment.0:27:47
And what does it do? It says, if I've no more clauses, well, I have to give0:28:03
this a value. It could be that it was an error. Supposing it runs off the end of a conditional-- it's pretty arbitrary. It's up to me as programmer to choose what I want to happen.0:28:13
It's convenient for me, right now, to write down that this has a value which is the empty list; it doesn't matter. For error checking, some people might prefer something else.0:28:23
But the interesting things are the following ones. If I've got an else clause-- You see, if I have a list of clauses, then each clause is a0:28:34
list. And so the predicate part is the CAAR of the clauses.0:28:43
It's the CAR, which is the first part of the first clause in the list of clauses. If it's an else, then it means I want my result of the0:28:55
conditional to be the result of evaluating the matching expression. So I eval the CADR. So this is the first clause, the second0:29:10
element of it, CADAR-- CADAR of a CAR-- of the clauses, with respect to the environment.0:29:26
Now the next possibility is more interesting. If the first predicate in the predicate list is not an else-- if it's not the0:29:38
word else-- let's write down what happens if it's a false thing. If the result of evaluating the first0:29:49
predicate, the clauses-- respect the environment, if that evaluation yields false,0:30:01
then it means I want to look at the next clause. So I want to discard the first one. So we just go around the loop: evcond of the CDR of the clauses0:30:15
relative to that environment. And otherwise, I had a true clause, in which case, what I0:30:27
want is to evaluate the CADAR of the clauses relative to0:30:40
that environment. Boy, it's almost done.0:30:51
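Putting the clause-walking together (a reconstruction; false? is assumed to test for the false value):

```scheme
;; Reconstructed EVCOND: walk the clauses of a conditional.
(define (evcond clauses env)
  (cond ((eq? clauses '()) '())                  ; ran off the end: arbitrary choice
        ((eq? (caar clauses) 'else)              ; else clause: take its expression
         (eval (cadar clauses) env))
        ((false? (eval (caar clauses) env))      ; false predicate: discard the clause
         (evcond (cdr clauses) env))
        (else                                    ; true predicate: take its expression
         (eval (cadar clauses) env))))
```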
It's quite close to done. I think we're going to finish this part off. So just buzzing through this evaluator, but so far you're seeing almost everything.0:31:01
Let's look at the next transparency here. Here is bind.0:31:11
Bind is for making more table. And what we are going to do here is make a-- we're going to make a new frame for an environment structure.0:31:22
The environment structure is going to be represented as a list of frames. So given an existing environment structure, I'm going to make a new environment structure by0:31:32
consing a new frame onto the existing environment structure, where the new frame consists of the result of pairing up the variables, which are the bound variables0:31:41
of the procedure I'm applying, to the values which are the arguments that were passed to that procedure. This is just making a list, adding a new element to our0:31:53
list of frames, which is an environment structure, to make a new environment. Where pair-up is very simple. Pair-up is nothing more than if I have a list of variables0:32:04
and a list of values, well, if I run out of variables and if I run out of values, everything's OK. Otherwise, I've given too many arguments. If I've not run out of variables, but I've run out of0:32:15
values, then I have too few arguments. And in the general case, where I don't have any errors, and I'm not done, then I really am just adding a new pair of the0:32:26
first variable with the first argument, the first value, onto a list resulting from pairing-up the rest of the0:32:37
variables with the rest of the values. Lookup is of course equally simple.0:32:46
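The BIND and PAIR-UP just described, written out (a reconstruction of the transparency):

```scheme
;; BIND makes a new environment by consing a new frame onto an old
;; environment structure, which is a list of frames.
(define (bind vars vals env)
  (cons (pair-up vars vals) env))

;; PAIR-UP pairs each bound variable with the corresponding argument.
(define (pair-up vars vals)
  (cond ((eq? vars '())
         (cond ((eq? vals '()) '())
               (else (error "Too many arguments"))))
        ((eq? vals '()) (error "Too few arguments"))
        (else
         (cons (cons (car vars) (car vals))
               (pair-up (cdr vars) (cdr vals))))))
```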
If I have to look up a symbol in an environment, well, if the environment is empty, then I've got an unbound variable. Otherwise, what I'm going to do is use a special pair list0:32:59
lookup procedure, which we'll have very shortly, of the symbol in the first frame of the environment. Since I know the environment is not empty, it must have a first frame.0:33:09
So I lookup the symbol in the first frame. That becomes the value cell here. And then, if the value cell is empty, if there is no such0:33:19
value cell, then I have to continue and look at the rest of the frames. It means there was nothing found there. So a property of ASSQ is that it returns emptiness if it0:33:29
doesn't find something. But if it did find something, then I'm going to use the CDR of the value cell here, which is the thing that was the pair0:33:38
consisting of the variable and the value. So the CDR of it is the value part. Finally, ASSQ is something you've probably seen already.0:33:47
ASSQ takes a symbol and a list of pairs, and if the list is empty, it's empty. If the symbol is the first thing in the list--0:33:57
That's an error. That should be CAAR, C-A-A-R. Everybody note that.0:34:07
Right there, OK? And in any case, if the symbol is the CAAR of the A list,0:34:17
then I want the first, the first pair, in the A list. So, in other words, if this is the key matching the right entry,0:34:26
otherwise, I want to look up that symbol in the rest. Sorry for producing a bug, bugs appear.0:34:35
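LOOKUP and ASSQ, with the on-the-board bug already repaired (CAAR, not CAR); again a reconstruction:

```scheme
;; LOOKUP searches the frames of an environment in order.
(define (lookup sym env)
  (cond ((eq? env '()) (error "Unbound variable" sym))
        (else
         ((lambda (vcell)
            (cond ((eq? vcell '())
                   (lookup sym (cdr env)))       ; nothing here: try the next frame
                  (else (cdr vcell))))           ; the CDR of the pair is the value
          (assq sym (car env))))))

;; ASSQ searches one frame, a list of (variable . value) pairs,
;; returning emptiness if it finds nothing.
(define (assq sym alist)
  (cond ((eq? alist '()) '())
        ((eq? sym (caar alist)) (car alist))
        (else (assq sym (cdr alist)))))
```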
Well, in any case, you're pretty much seeing the whole thing now. It's a very beautiful thing, even though it's written in an0:34:45
ugly style, being the kernel of every language. I suggest that we just-- let's look at it for a while.0:34:56
[MUSIC PLAYING]0:35:49
Are there any questions?0:36:01
Alright, I suppose it's time to take a small break then. [MUSIC PLAYING]0:36:56
OK, now we're just going to do a little bit of practice understanding what it is we've just shown you. What we're going to do is go through, in detail, an0:37:05
evaluation by informally substituting through the interpreter. And since we have no assignments or definitions in0:37:14
this interpreter, we have no possible side effects, and so we can do substitution with impunity and not worry0:37:23
about results. So the particular problem I'd like to look at is an interesting one. It's the evaluation of quote, open, open, open, lambda of x,0:37:41
lambda of y plus x y, lambda, lambda, applied to three,0:37:55
applied to four, in some global environment which I'll call e0.0:38:04
So what we have here is a procedure of one argument x, which produces as its value a procedure of one argument y, which adds x to y.0:38:14
We are applying the procedure of one argument x to three. So x should become three. And the result of that should be procedure of one argument0:38:23
y, which will then apply to 4. And there is a very simple case, they will then add those results.0:38:34
And now in order to do that, I want to make a very simple environment model. And at this point, you should already have in your mind the environments that this produces.0:38:44
But we're going to start out with a global environment, which I'll call e0, which is that.0:38:56
And it's going to have in it things, definitions for plus, and times, and--0:39:07
using Greek letters, isn't that interesting, for the objects-- and minus, and quotient, and CAR, and CDR, and CONS, and0:39:27
EQ, and everything else you might imagine in a global environment. It's got something there for each of those things, something the machine is born with, that's e0.0:39:39
Now what does it mean to do this evaluation? Well, we go through the set of special forms. First of all, this is not a number.0:39:48
This is not a symbol. Gee, it's not a quoted expression. This is a quoted expression, but that's not what I'm0:40:00
interested in. The question is whether or not the thing which is quoted is a quoted expression. I'm evaluating an expression. This just says it's this particular expression.0:40:11
This is not a quoted expression. It's not a thing that begins with lambda. It's not a thing that begins with COND.0:40:22
Therefore, it's an application of an operator to operands. It's a combination. The combination thus has this as the operator and this as0:40:35
the operands. Well, that means that what I'm going to do is transform this into apply of eval, of quote, open, open lambda of0:40:54
x, lambda of y-- I'm evaluating the operator-- plus x y, in the environment, also e0, with the operands0:41:13
that I'm going to apply this to, the arguments being the result of EVLIST, the list containing four, in e0.0:41:29
I'm using this funny notation here for e0 because this should be that environment. I haven't a name for it, because I have no environment0:41:38
to name it in. So this is just a representation of what would be a quoted expression, if you will.0:41:47
The data structure, which is the environment, goes there. Well, that's what we're seeing here. Well in order to do this, I have to do this, and0:41:57
I have to do that. Well this one's easy, so why don't we do that one first. This turns into apply of eval-- just0:42:07
copying something now. Most of the substitution rule is copying.0:42:18
So I'm going to not say the words when I copy, because it's faster. And then the EVLIST is going to turn into a cons, of eval,0:42:34
of four, in e0-- because it was not an empty list-- onto the result of EVLISTing, on the empty list, in e0.0:42:52
And I'm going to start leaving out steps soon, because it's going to get boring. But this is basically the same thing as apply, of eval--0:43:07
I'm going to keep doing this-- the lambda of x, the lambda of y, plus xy, 3, close, e0.0:43:20
I'm a pretty good machine. Well, eval of four-- that meets the question, is it a number? So that's cons, cons of 4.0:43:35
And EVLIST of the empty list is the empty list, so that's this. And that's very simple to understand, because that means0:43:46
the list containing four itself. So this is nothing more than apply of eval, quote, open,0:43:56
open, lambda of x, lambda of y, plus x y, three applied to,0:44:06
e0, applied to the list four-- bang. So that's that step.0:44:18
Now let's look at the next, more interesting thing. What do I do to evaluate that? Evaluating this means I have to evaluate--0:44:27
Well, it's not. It's nothing but an application. It's not one of the special things. It's the application of this operator, which we see here--0:44:37
here's the operator-- applied to this operands, that combination.0:44:46
But we know how to do that, because that's the last case of the conditional. So substituting in for this evaluation, it's apply of eval0:44:56
of the operator in the EVLIST of the operands. Well, it's apply, of apply, of eval, of quote, open, lambda0:45:12
of x, lambda of y, plus x y, lambda, lambda,0:45:23
in environment e0. I'm going to short-circuit the evaluation of the operands,0:45:32
because they're the same as they were before. I got a list containing three, apply that, and apply that to four.0:45:42
Well let's see. Eval of a lambda expression produces a procedure object.0:45:52
So this is apply, of apply, of the procedure object closure,0:46:04
which contains the body of the procedure, x, which is lambda-- which binds x [UNINTELLIGIBLE] the internals of the body, it returns the procedure of one0:46:17
argument y, which adds x to y. Environment e0 is now captured in it, because this was0:46:27
evaluated with respect to e0. e0 is part now of the closure object. Apply that to open, three, close, apply, to open, 4,0:46:40
close, apply. So going from this step to this step meant that I made up0:46:50
a procedure object which captured in it e0 as part of the procedure object. Now, we're going to pass those to apply. We have to apply this procedure0:47:00
to that set of arguments. Well, but that procedure is not primitive. It's, in fact, a thing which has got the tag closure, and,0:47:10
therefore, what we have to do is do a bind. We have to bind. A new environment is made at this point, which has as its0:47:21
parent environment the one over here, e0, that environment.0:47:30
And we'll call this one, e1. Now what's bound in there? x is bound to three. So I have x equal three.0:47:41
That's what's in there. And we'll call that e1. So what this transforms into is an eval of the body of0:47:51
this, which is this, the body of that procedure, in the environment that you just saw.0:48:00
So that's an apply, of eval, quote, open, lambda of y, plus0:48:11
x y-- the body-- in e1.0:48:20
And apply the result of that to four, open, close, 4-- list of arguments. Well, that's sensible enough because evaluating a lambda, I0:48:31
know what to do. That means I apply, the procedure which is closure,0:48:43
binds one argument y, adds x to y, with e1 captured in it.0:48:55
And you should really see this. I somehow manufactured a closure. I should've put this here. There was one over here too.0:49:06
Well, there's one here now. I've captured e1, and this is the procedure of one argument y, whatever this is.0:49:17
That's what that is there, that closure. I'm going to apply that to four.0:49:30
Well, that's easy enough. That means I have to make a new environment by copying0:49:39
this pointer, which was the pointer of the procedure, which binds y equal 4 with that environment.0:49:49
And here's my new environment, which I'll call e2. And, of course, this application then is evaluate0:49:58
the body in e2. So this is eval, the body, which is plus x y, in the0:50:10
environment e2. But this is an application, so this is the apply, of eval,0:50:22
plus in e2, an EVLIST, quote, open, x y, in e2.0:50:44
Well, but let's see. That is apply, the object which is a result of that and plus.0:50:54
So here we are in e2. Plus is not here, it's not here-- oh yes, but it's here as some primitive operator. So it's the primitive operator for addition.0:51:08
Apply that to the result of evaluating x and y in e2. But we can see that x is three and y is four.0:51:18
So that's a three and four, here. And that magically produces for me a seven.0:51:30
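The whole trace just performed is the evaluation of a single expression. In ordinary Scheme notation, with e0 the global environment assumed to supply +, it reads:

```scheme
;; x is bound to 3 in e1; the inner lambda captures e1; y is bound to 4
;; in e2; (+ x y) is finally evaluated in e2.
(eval '(((lambda (x) (lambda (y) (+ x y))) 3) 4) e0)   ; => 7
```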
I wanted to go through this so you would see, essentially, one important ingredient, which is what's being passed around, and who owns what, and what his job is.0:51:40
So what do we have here? We have eval, and we have apply, the two main players.0:51:49
And there is a big loop that goes around like this, which is: eval produces a procedure and0:52:00
arguments for apply. Now some things eval could do by itself.0:52:09
Those are little self things here. They're not interesting. Also eval evaluates all of the arguments, one after another. That's not very interesting. Apply can apply some procedures like plus, not very0:52:21
interesting. However, if apply can't apply a procedure like plus, it produces an expression and environment for eval.0:52:35
The procedure and arguments wrap up essentially the state of a computation, and so, certainly, do the expression and environment. And so what we're actually passing around is not the0:52:45
complete state, because it doesn't say who wants the answers. But what we're going to see-- it's always got something like an expression and environment, or a procedure and arguments, as0:52:56
the main loop that we're going around. There are minor little sub-loops like eval through EVLIST, or eval through evcond, or apply through a0:53:11
primitive apply. But they're not the essential things. So that's what I wanted you to see.0:53:21
Are there any questions? Yes. AUDIENCE: I'm trying to understand how x got down to0:53:32
three instead of four. At the early part of the-- PROFESSOR: Here.0:53:41
You want to know how x got down to three? AUDIENCE: Because x is the outer procedure, and x and y are the inner procedure.0:53:51
PROFESSOR: Fine. Well, I was very careful and mechanical. First of all, I should write those procedures again for you, pretty printed.0:54:00
First order of business, because you're probably not reading them well. So I have here that procedure of-- was it x over there--0:54:11
which is-- value of that procedure of y, which adds x to y, lambda,0:54:20
lambda, applied that to three, takes the result of that, and applied that to four. Is that not what I wrote? Now, you should immediately see that here is an0:54:34
application-- let me get a white piece of chalk-- here is an application, a combination.0:54:44
That combination has this as the operator and this as the operand. The three is going in for the x here.0:54:54
The result of this is a procedure of one argument y, which gets applied to four. So you just weren't reading the expression right.0:55:04
The way you see that over here is that here I have the actual procedure object, x.0:55:13
It's getting applied to three, the list containing three. What I'm left over with is something which gets applied to four.0:55:24
Are there any other questions? Time for our next small break then. Thank you.0:55:33
[MUSIC PLAYING]0:56:08
Let's see, at this point, you should be getting the feeling, what's this nonsense this Sussman character is feeding me?0:56:20
There's an awful lot of strange nonsense here. After all, he purported to explain to me Lisp, and he wrote me a Lisp program on the blackboard.0:56:30
The Lisp program was intended to be the interpreter for Lisp, but you need a Lisp interpreter in order to understand that program. How could that program have told me anything there is to0:56:41
be known about Lisp? How is that not completely vacuous? It's a very strange thing.0:56:50
Does it tell me anything at all? Well, you see, the whole thing is sort of like these Escher's0:56:59
hands that we see on this slide. Yes, eval and apply each sort of draw each other and0:57:11
construct the real thing, which can sit out and draw itself. Escher was a very brilliant man, he just didn't know the names of these spirits.0:57:23
Well, what I'm going to do now is try to convince you that this all means something, and, as an aside,0:57:33
I'm going to show you why you don't need definitions. Just turns out that that sort of falls out, why definitions are not essential in a mathematical sense for doing0:57:42
all the things we need to do for computing. Well, let's see here. Consider the following small program, what does it mean?0:57:54
This is a program for computing exponentials.0:58:07
The exponential of x to the nth power is if--0:58:16
n is zero, then the result is one. Otherwise, I want the product of x and the result of0:58:29
exponentiating x to the n minus one power.0:58:42
I think I got it right. Now this is a recursive definition. It's a definition of the exponentiation procedure in0:58:53
terms of itself. And, as it has been mentioned before, your high school geometry teacher probably gave you a hard time0:59:03
about things like that. Was that justified? Why does this self referential definition make any sense?0:59:13
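The self-referential definition in question, written as an ordinary Scheme procedure:

```scheme
;; Exponentiation defined in terms of itself.
(define (expt x n)
  (cond ((= n 0) 1)
        (else (* x (expt x (- n 1))))))
```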
Well, first of all, I'm going to convince you that your high school geometry teacher was telling you nonsense. Consider the following set of definitions here.0:59:24
x plus y equals three, and x minus y equal one.0:59:33
Well, gee, this tells you x in terms of y, and this one tells you y in terms of x, presumably. And yet this happens to have a unique solution in x and y.0:59:55
However, I could also write two x plus two y is six.1:00:06
These two equations have an infinite number of solutions.1:00:15
And I could write you, for example, x minus y equal 2, and these two equations have no solutions.1:00:29
Well, I have here three sets of simultaneous linear equations, this set, this set, and this set.1:00:39
But they have different numbers of solutions. The number of solutions is not in the form of the equations. All three sets have the same form.1:00:48
The number of solutions is in the content. I can't tell by looking at the form of a definition whether it makes sense, only by its detailed content.1:00:59
What are the coefficients, for example, in the case of linear equations? So I shouldn't expect to be able to tell looking at something like this, from some simple things like, oh yes,1:01:11
EXPT is the solution of this recursion equation. Expt is the procedure which if substituted in here,1:01:22
gives me EXPT back. I can't tell, looking at this form, whether or not there's a single, unique solution for EXPT, an infinite number of1:01:33
solutions, or no solutions. It's got to do with how it counts and things like that, the details. And it's harder in programming than in linear algebra.1:01:42
There aren't too many theorems about it in programming. Well, I want to rewrite these equations a little bit, these over here.1:01:53
Because what we're investigating is equations like this. But I want to play a little with equations like this that we understand, just so we get some insight into1:02:02
this kind of question. We could rewrite our equations here, say these two, the ones that are interesting, as x equals three minus y, and y1:02:17
equals x minus one. What do we call this transformation? This is a linear transformation, t.1:02:29
Then what we're getting here is an equation x y equals t of x y.1:02:42
What am I looking for? I'm looking for a fixed point of t. The solution is a fixed point of t.1:03:01
So the methods we should have for looking for solutions to equations, if I can do it by fixed points, might be applicable.1:03:10
If I have a means of finding a solution to an equation by fixed points-- just, might not work-- but it might be applicable to investigating solutions of1:03:21
equations like this. But what I want you to feel is that this is an equation.1:03:30
It's an expression with several instances of various names which puts a constraint on the name, saying what that1:03:39
name could have as its value, rather than some sort of mechanical process of substitution right now. This is an equation which I'm going to try to solve.1:03:51
Well, let's play around and solve it. First of all, I want to write down the function which corresponds to t.1:04:00
First I want to write down the function which corresponds to t whose fixed point is the answer to this question.1:04:11
Well, let's consider the following procedure f. I claim it computes that function. f is that procedure of one argument g, which is that1:04:26
procedure of two arguments x and n, which has the property that if n is zero, then the result1:04:42
is one, otherwise, the result is the product of x and g,1:04:56
applied to x and n minus one. g, times, else, cond, lambda, lambda--1:05:11
Here f is a procedure, which if I had a solution to that equation, if I had a good exponentiation procedure, and1:05:23
I applied f to that procedure, then the result would be a good exponentiation procedure.1:05:37
Because, what does it do? Well, supposing g were a good exponentiation procedure, well then this would produce, as its value, a1:05:48
procedure of two arguments x and n, such that if n were 0, the result would be one, which is certainly true of exponentiation. Otherwise, it will be the result of multiplying x by the1:05:57
exponentiation procedure given to me with x and n minus one as arguments. So if this computed the correct exponentiation for n minus one, then this would be the correct exponentiation for1:06:10
exponent n, so this would have been the right exponentiation procedure. So what I really want to say here is E-X-P-T is a fixed1:06:26
point of f.1:06:37
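To make the fixed-point claim concrete, here is f in Python, with Python's built-in exponentiation standing in for a hypothetical "good" procedure; applying f to it gives back a procedure that agrees with it:

```python
def f(g):
    # the functional on the board: given any approximation g to
    # exponentiation, produce a (one-step-better) approximation
    def improved(x, n):
        return 1 if n == 0 else x * g(x, n - 1)
    return improved

good = lambda x, n: x ** n   # a correct exponentiation procedure
# f(good) agrees with good: good behaves like a fixed point of f
f(good)(2, 10)               # 1024, same as good(2, 10)
```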
Now our problem is there might be more than one fixed point. There might be no fixed points. I have to go hunting for the fixed points.1:06:48
Got to solve this equation. Well there are various ways to hunt for fixed points. Of course, the one we played with at the beginning of this1:06:58
term worked for cosine. Go into radians mode on your calculator and push cosine,1:07:09
and just keep doing it, and you get to some number which is about 0.73 or 0.74. I can't remember which.1:07:22
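That calculator experiment, replayed in Python:

```python
import math

x = 1.0                 # any starting point in radians will do
for _ in range(100):
    x = math.cos(x)     # keep pushing the cosine button
# x settles near 0.739, the fixed point where cos(x) == x
```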
By iterating a function, whose fixed point I'm searching for, it is sometimes the case that that function will converge in1:07:32
producing the fixed point. I think we luck out in this case, so let's look for it. Let's look at this slide.1:07:48
Consider the following sequence of procedures. e0 over here is the procedure which does nothing at all.1:08:02
It's the procedure which produces an error for any arguments you give it. It's basically useless.1:08:14
Well, however, I can make an approximation. Let's consider it the worst possible approximation to exponentiation, because it does nothing.1:08:26
Well, supposing I substituted e0 for g by calling f, as you see over here on e0.1:08:37
So you see over here, have e0 there. Then gee, what's e1? e1 is a procedure which exponentiates things to the 0th1:08:47
power, with no trouble. It gets the right answer, anything to the zero is one, and it makes an error on anything else.1:08:57
Well, now what if I take e1 and I substitute it for g by1:09:06
calling f on e1? Oh gosh, I have here a procedure of two arguments.1:09:15
Now remember e1 was appropriate for taking exponentiations of 0, for raising to the 0 exponent.1:09:24
So here, if n is 0, the result is one, so this guy is good for that too. However, I can use something for raising to the 0th power to multiply it by x to raise something to the first power.1:09:35
So e2 is good for both power 0 and one. And e3 is constructed from e2 in the same way.1:09:47
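Running the chain of approximations out in Python (a self-contained sketch: f is the improver from the board, e0 the know-nothing procedure; the names follow the lecture):

```python
def f(g):
    def improved(x, n):
        return 1 if n == 0 else x * g(x, n - 1)
    return improved

def e0(x, n):
    raise RuntimeError("e0 is good for nothing")

e1 = f(e0)   # correct for n = 0, errors otherwise
e2 = f(e1)   # correct for n = 0 and 1
e3 = f(e2)   # correct for n = 0, 1, and 2
e3(2, 2)     # 4 -- but e3(2, 3) would still blow up inside e0
```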
And e3, of course, by the same argument is good for powers 0, one, and two. And so I will assert for you, without proof, because the1:10:00
proof is horribly difficult. And that's the sort of thing that people called denotational semanticists do. This great idea was invented by Scott and Strachey.1:10:14
They're very famous mathematician types who invented the interpretation for these programs that we have that I'm talking to you about right now.1:10:24
And they proved, by topology, that there is such a fixed point in the cases that we want. But the assertion is E-X-P-T is the limit as n goes1:10:41
to infinity of e-n. And we've constructed this in the following way.1:10:50
Well, it's f of, f of, f of, f of, f of-- f applied to anything at all.1:11:01
It didn't matter what that was, because, in fact, this always produces an error. Applied to this--1:11:12
That's by infinite nesting of f's. So now my problem is to make some infinite things.1:11:22
We need some infinite things. How am I going to nest up an f an infinite number of times? I'd better construct this.1:11:32
Well, I don't know. How would I make an infinite loop at all? Let's take a very simple infinite loop, the simplest infinite loop imaginable.1:11:43
If I were to take that procedure of one argument x which applies x to x and apply that to the procedure of one1:11:57
argument x which applies x to x, then this is an infinite loop.1:12:07
The reason why this is an infinite loop is as follows. The way I understand this is I substitute the argument for the formal parameter in the body.1:12:18
But if I do that, I take for each of these x's, I substitute one of these, making a copy of the original expression I just started with, the1:12:28
simplest infinite loop. Now I want to tell you about a particular operator which is1:12:40
constructed by a perturbation from this infinite loop. I'll call it y.1:12:52
This is called Curry's paradoxical combinator Y, after a fellow by the name of Curry, who was a logician of the 1930s also.1:13:04
And if I have a procedure of one argument f, what's it going to have in it? It's going to have a kind of infinite loop in it, which is1:13:13
that procedure of one argument x which applies f to x of x, applied to that procedure of one argument x, which applies1:13:25
f to x of x. Now what does this do?1:13:34
Suppose we apply y to F. Well, that's easy enough. That's this capital F over here.1:13:46
Well, the easiest thing to say there is, I substitute F for f here.1:13:55
So that's going to give me, basically-- because then I'm going to substitute this for x in here.1:14:08
Let me actually do it in steps, so you can see it completely. I'm going to be very careful. This is open, open, lambda of x, capital F, x, x, applied1:14:27
to itself, F of x of x.1:14:37
Substituting this for this in here, this is F applied to-- what is it--1:14:47
substituting this in here, open, open, lambda of x, F, of x and x, applied to lambda of x, F of x of x, F, lambda,1:15:08
pair, F. Oh, but what is this? This thing over here that I just computed, is1:15:17
this thing over here. But I just wrapped another F around it. So by applying y to F, I make an infinite series of F's.1:15:27
If I just let this run forever, I'll just keep making more and more F's outside. I ran an infinite loop which is useless, but it doesn't matter that the inside is useless.1:15:40
So y of F is F applied to y of F. So y is a magical thing1:15:53
which, when applied to some function, produces the object which is the fixed point of that function, if it exists,1:16:03
and if this all works. Because, indeed, if I take y of F and put it into F, I get y of F out.1:16:16
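One caution for anyone trying this in an applicative-order language like Scheme or Python: the Y written on the board loops forever before F is ever consulted, because the self-application is evaluated eagerly. The standard remedy is the eta-expanded variant, often called Z, which delays the self-application behind a lambda. A Python sketch:

```python
def Z(f):
    # applicative-order fixed-point combinator:
    # (lambda (x) (f (lambda args (apply (x x) args)))) applied to itself
    return (lambda x: f(lambda *args: x(x)(*args)))(
            lambda x: f(lambda *args: x(x)(*args)))

# exponentiation with no define and no self-reference by name:
expt = Z(lambda g: lambda x, n: 1 if n == 0 else x * g(x, n - 1))
expt(2, 5)   # 32
```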
Now I want you to think this in terms of the eval-apply interpreter for a bit. I wrote down a whole bunch of recursion equations out there.1:16:28
They're simultaneous in the same way these are simultaneous equations. Exponentiation was not a simultaneous equation. It was only one variable I was looking for a meaning for.1:16:38
But what Lisp is is the fixed point of the process which says, if I knew what Lisp was and substituted it in for eval, and apply, and so on, on the right hand sides of all1:16:47
those recursion equations, then if it was a real good Lisp, a real one, then the left-hand side would also be Lisp.1:16:58
So I made sense of that definition. Now whether or not there's an answer isn't so obvious. I can't attack that.1:17:07
Now these arguments that I'm giving you now are quite dangerous. Let's look over here. These are limit arguments. We're talking about limits, and it's really calculus, or1:17:17
topology, or something like that, a kind of analysis. Now here's an argument that you all believe. And I want to make sure you realize that I could be1:17:27
bullshitting you. What is this? u is the sum of one, 1/2, 1/4, 1/8, and so on, the sum of a1:17:40
geometric series. And, of course, I could play a game here. u minus one is 1/2, plus 1/4, plus 1/8, and so on.1:17:53
What I could do here-- oops. There is a parentheses error here. But I can put here two times u minus one is one plus 1/2,1:18:02
plus 1/4, plus 1/8. Can I fix that?1:18:14
Yes, well. But that gives me back two times u minus one is u,1:18:27
therefore, we conclude that u is two. And this actually is true. There's no problem like that. But supposing I did something different.1:18:38
Supposing I start up with something which manifestly has no sum. v is one, plus two, plus four, plus 8, plus dot, dot, dot.1:18:47
Well, v minus one is surely two, plus four, plus eight, plus dot, dot, dot. v minus one over two, gee, that looks like v again.1:18:57
From that I should be able to conclude that-- that's also wrong, apparently. v equals minus one.1:19:12
That should be a minus one. And that's certainly a false conclusion.1:19:22
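The difference between the two manipulations is convergence, and partial sums make it visible. A quick numerical check in Python:

```python
# partial sums of u = 1 + 1/2 + 1/4 + ... approach 2: the limit argument is safe
u, term = 0.0, 1.0
for _ in range(60):
    u += term
    term /= 2
# u is now 2.0 to within floating-point precision

# partial sums of v = 1 + 2 + 4 + ... just blow up: there is no limit,
# so manipulating "v" as if it named a number proves nonsense like v == -1
v, term = 0, 1
for _ in range(60):
    v += term
    term *= 2
# v is now 2**60 - 1, and still growing
```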
So when you play with limits, arguments that work in one case may not work in some other case. You have to be very careful.1:19:32
The arguments have to be well formed. And I don't know, in general, what the story is about arguments like this.1:19:43
We can read a pile of topology and find out. But, surely, at least you understand now why there might be some meaning to the things we've been writing on the1:19:52
blackboard. And you understand what that might mean. So, I suppose, it's almost about time for you to merit1:20:02
being made a member of the grand recursive order of lambda calculus hackers. This is the badge. Because you now understand, for example, what it says at1:20:14
the very top, Y F equals F of Y F. Thank you. Are there any questions?1:20:24
Yes, Lev. AUDIENCE: With this, it seems that then there's no need to define, as you imply, to just remember a value, to apply it later.1:20:34
Defines were kind of a side-effect it seemed in the language. [INTERPOSING] are order dependent. Does this eliminate the side-effect from the [INTERPOSING]1:20:43
PROFESSOR: The answer is, this is not the way these things were implemented. Define, indeed is implemented as an operation that actually1:20:53
modifies an environment structure, changes the frame that the define is executed in.1:21:03
And there are many reasons for that, but a lot of this has to do with making an interactive system. What this is saying is that if you've made a system, and you1:21:14
know you're not going to do any debugging or anything like that, and you know everything there is all at once, and you want to say, what is the meaning of a final set of equations?1:21:24
This gives you a meaning for it. But in order to make an interactive system, where you can change the meaning of one thing without changing everything else, incrementally, you can't do1:21:33
that by implementing it this way. Yes. AUDIENCE: Another question on your danger slide.1:21:44
It seemed that the two examples that you gave had to do with convergence and non-convergence? And that may or may not have something to do with function1:21:53
theory in a way which would lead you to think of it in terms of linear systems, or non-linear systems. How does this convergence relate to being able to see a priori1:22:03
what properties of that might be violated? PROFESSOR: I don't know. The answer is, I don't know under what circumstances. I don't know how to translate that into less than an1:22:13
hour of talk more. What are the conditions under which, for which we know that these things converge?1:22:22
And v-- all that was telling you is that arguments that are based on convergence are flaky if you don't know the convergence beforehand.1:22:32
You can make wrong arguments. You can make deductions, as if you know the answer, and not be stopped somewhere by some obvious contradiction. AUDIENCE: So can we say then that if F is a convergent1:22:43
mathematical expression, then the recursion property can be-- PROFESSOR: Well, I think there's a technical kind of F,1:22:52
there is a technical description of those F's that have the property that when you iteratively apply them like this, you converge.1:23:03
Things that are monotonic, and continuous, and I forgot what else. There is a whole bunch of little conditions like that1:23:12
which have this property. Now the real problem is deducing, from looking at the F, its definition here, whether or not it has those properties, and that's very hard.1:23:22
The properties are easy. You can write them down. You can look in a book by Joe Stoy. It's a great book-- Stoy.1:23:31
It's called The Scott-Strachey Method of Denotational Semantics, and it's by Joe Stoy, MIT Press.1:23:47
And he works out all this in great detail, enough to horrify you. But it really is readable.1:24:09
OK, well, thank you. Time for the bigger break, I suppose.0:00:00
Lecture 7B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:16
PROFESSOR: Well, let's see. What we did so far was a lot of fun, was it useful for anything?0:00:26
I suppose the answer is going to be yes. These metacircular interpreters are a valuable thing to play with.0:00:38
Well, there have been times I've spent 50% of my time, over a year, trying various design alternatives by experimenting with them with metacircular interpreters--0:00:49
metacircular interpreters like the sort you just saw. They're called metacircular because they are defined in terms of themselves in such a way that the language they interpret0:00:58
contains itself. Such interpreters are a convenient medium for exploring language issues. If you want to try adding a new feature, it's sort of a0:01:11
snap, it's easy, you just do it and see what happens. You play with that language for a while, you say, gee, I didn't like that, and you throw it away.0:01:21
Or you might want to see what the difference is if you made a slight change in the binding strategy, or some0:01:30
more complicated things that might occur. In fact, these metacircular interpreters are an excellent medium for people exchanging ideas about language design,0:01:44
because they're pretty easy to understand, and they're short, and compact, and simple. If I have some idea that I want somebody to criticize0:01:54
like say, Dan Friedman at Indiana, I'd write a little metacircular interpreter and send him some network mail0:02:04
with this interpreter in it. He could whip it up on his machine and play with it and say, that's no good. And then send it back to me and say, well, why don't you0:02:13
try this one, it's a little better. So I want to show you some of that technology. See, because, really, it's the essential, simple technology0:02:24
for getting started in designing your own languages for particular purposes. Let's start by adding a very simple feature to a Lisp.0:02:40
Now, one thing I want to tell you about is features, before I start.0:02:49
There are many languages that have made a mess of themselves by adding huge numbers of features. Computer scientists have a joke about bugs that get transformed0:03:00
into features all the time. But I like to think of it this way: many systems suffer from0:03:10
what's called creeping featurism. Which is that George has a pet feature he'd like in the system, so he adds it.0:03:20
And then Harry says, gee, this system is no longer what exactly I like, so I'm going to add my favorite feature. And then Jim adds his favorite feature.0:03:30
And, after a while, the thing has a manual 500 pages long that no one can understand. And sometimes it's the same person who writes all of these0:03:40
features and produces this terribly complicated thing. In some cases, like editors, it's sort of reasonable to have lots of features, because there are a lot of things you0:03:51
want to be able to do and many of them arbitrary. But in computer languages, I think it's a disaster to have0:04:00
too much stuff in them. The other alternative you get into is something called feeping creaturism, which is where you have a box which has0:04:12
a display, a fancy display, and a mouse, and there is all sorts of complexity associated with all this fancy IO.0:04:21
And your computer language becomes a dismal, little, tiny thing that barely works because of all the swapping, and disk twitching, and so on, caused by your Windows system.0:04:30
And every time you go near the computer, the mouse process wakes up and says, gee do you have something for me to do, and then it goes back to sleep. And if you accidentally push mouse with you elbow, a big0:04:40
puff of smoke comes out of your computer and things like that. So there are two ways to disastrously destroy a system by adding features. But let's try right now to add a little, simple feature.0:04:52
This actually is a good one, and in fact, real Lisps have it. As you've seen, there are procedures like plus and times0:05:03
that take any number of arguments. So we can write things like the sum of the product of a and x and x, and the product of b and x and c.0:05:17
As you can see here, addition takes three arguments or two arguments, multiplication takes two arguments or three arguments-- taking any number of arguments, all of which are to0:05:27
be treated in the same way. This is a valuable thing, indefinite numbers of arguments. Yet the particular Lisp system that I showed you is one where0:05:40
the number of arguments is fixed, because I had to match the arguments against the formal parameters in the binder, where there's a pairup.0:05:50
Well, I'd like to be able to define new procedures like this that can have any number of arguments. Well there's several parts to this problem.0:06:01
The first part is coming up with the syntactic specification, some way of notating the additional0:06:10
arguments, of which you don't know how many there are. And then there's the other thing, which is once we've notated it, how are we going to interpret that notation so0:06:21
as to do the right thing, whatever the right thing is? So let's consider an example of a sort of thing we might want to be able to do.0:06:33
So an example might be, that I might want to be able to define a procedure which is a procedure of one required argument x and a bunch of arguments, I don't know how0:06:45
many there are, called y. So x is required, and there are many y's, many arguments--0:07:04
y will be the list of them.0:07:14
Now, with such a thing, we might be able to say something like, map-- I'm going to do something to every one-- of that procedure of one argument u, which multiplies x0:07:30
by u, and we'll apply that to y. I've used a dot here to indicate that the thing after0:07:41
this is a list of all the rest of the arguments. I'm making a syntactic specification.0:07:53
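Python happens to have this exact feature built in as *args, so the board example can be sketched directly (names x and y as on the board; the function name is mine):

```python
def scale_each(x, *y):
    # x is required; y picks up the list of all the rest of the arguments,
    # like the dotted tail in (lambda (x . y) (map (lambda (u) (* x u)) y))
    return [x * u for u in y]

scale_each(10, 1, 2, 3)   # [10, 20, 30]
```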
Now, what this depends upon, the reason why this is sort of a reasonable thing to do, is because this happens to be a syntax that's used in the Lisp reader for0:08:04
representing conses. We've never introduced that before. You may have seen when playing with the system that if you0:08:13
cons two things together, you get the first, space, dot, the second, space-- the first, space, dot, space, the second with parentheses0:08:23
around the whole thing. So that, for example, this x dot y corresponds to a pair,0:08:36
which has got an x in it and a y in it. The other notations that you've seen so far are things0:08:45
like a procedure of arguments x and y and z which do things,0:08:55
and that looks like-- Just looking at the bound variable list, it looks like0:09:04
this, x, y, z, and the empty thing.0:09:18
If I have a list of arguments I wish to match this against-- supposing I have a list of arguments one, two, three, I want to match these against. So I might have here a list of0:09:36
three things, one, two, three.0:09:48
And I want to match x, y, z against one, two, three. Well, it's clear that the one matches the x, because I can just sort of follow the structure, and the two matches0:10:00
the y, and the three matches the z. But now, supposing I were to compare this x dot y--0:10:09
this is x dot y-- supposing I compare that with a list of three arguments, one, two, three.0:10:18
Let's look at that again.0:10:28
One, two, three-- Well, I can walk along here and say, oh yes, x matches the one, the y matches the list, which is two and three.0:10:43
So the notation I'm choosing here is one that's very natural for Lisp system.0:10:52
But I'm going to choose this as a notation for representing a bunch of arguments. Now, there's an alternative possibility. If I don't want to take one special out, or two special0:11:03
ones out or something like that, if I don't want to do that, if I want to talk about just the list of all the arguments like in addition, well then the argument list0:11:16
I'm going to choose to be that procedure of all the arguments x which does something with x.0:11:25
And which, for example, if I take the procedure, which takes all the arguments x and returns the list of them,0:11:35
that's list. That's the procedure list.0:11:45
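The second case, where the whole bound-variable list is one bare symbol, is Python's def f(*x); a sketch of (lambda x x), the procedure list (the Python name is mine):

```python
def my_list(*x):
    # (lambda x x): the single name x is matched against the whole
    # list of arguments, and we just return that list
    return list(x)

my_list(1, 2, 3)   # [1, 2, 3]
```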
How does this work? Well, indeed what I had as the bound variable list in this case, whatever it is, is being matched against a list of arguments.0:11:55
This symbol now is all of the arguments. And so this is the choice I'm making for a particular syntactic specification, for the description of procedures0:12:08
which take indefinite numbers of arguments. There are two cases of it, this one and this one.0:12:18
When you make syntactic specifications, it's important that it's unambiguous, that neither of these can be confused with a representation we already have, this one.0:12:33
I can always tell whether I have a fixed number of explicitly named arguments made by these formal parameters, or a fixed number of named formal parameters0:12:45
followed by a thing which picks up all the rest of them, or a list of all the arguments which will be matched against0:12:54
this particular formal parameter called x, because these are syntactically distinguishable. Many languages make terrible errors in that form where0:13:05
whole segments of interpretation are cut off, because there are syntactic ambiguities in the language.0:13:14
There are the traditional problems with ALGOL-like languages having to do with the nesting of ifs in the predicate part.0:13:25
In any case, now, so I've told you about the syntax, now, what are we going to do about the semantics of this?0:13:35
How do we interpret it? Well this is just super easy. I'm going to modify the metacircular interpreter to do it. And that's a one liner.0:13:46
There it is. I'm changing the way you pair things up.0:13:56
Here's the procedure that pairs the variables, the0:14:06
formal parameters, with the arguments that were passed from the last description of the metacircular interpreter.0:14:18
And here's some things that are the same as they were before. In other words, if the list of variables is empty, then if the list of values is empty, then I have an empty list.0:14:31
Otherwise, I have too many arguments, that is, if I have empty variables but not empty values.0:14:41
If I have empty values, but the variables are not empty, I have too few arguments.0:14:50
If the variables are a symbol-- the interesting case-- then what I should do is say, oh yes, this is the special0:15:04
case that I have a symbolic tail. I have here a thing just like we looked over here.0:15:14
This is a tail which is a symbol, y. It's not a nil. It's not the empty list. Here's a symbolic tail that is0:15:24
just the very beginning of the tail. There is nothing else. In that case, I wish to match that variable with all the0:15:36
values and add that to the pairing that I'm making. Otherwise, I go through the normal arrangement of making0:15:47
up the whole pairing. I suppose that's very simple. And that's all there is to it.0:15:57
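The extended pairing logic can be sketched in Python over Lisp-style pairs (tuples for conses, None for the empty list). The case analysis mirrors the slide; the helper names and data representation are mine:

```python
def pair_up(vars_, vals):
    # vars_/vals are Lisp-style lists: (car, cdr) tuples, None for nil.
    # A bare string in variable position is a symbolic tail: it gets
    # bound to the entire remaining list of values.
    if isinstance(vars_, str):
        return ((vars_, vals), None)
    if vars_ is None:
        if vals is None:
            return None
        raise TypeError("too many arguments")
    if vals is None:
        raise TypeError("too few arguments")
    return ((vars_[0], vals[0]), pair_up(vars_[1], vals[1]))

# (x . y) matched against (1 2 3): x gets 1, y gets the list (2 3)
pair_up(('x', 'y'), (1, (2, (3, None))))
```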
And now I'll answer some questions. The first one-- Are there any questions?0:16:06
Yes? AUDIENCE: Could you explain that third form? PROFESSOR: This one? Well, maybe we should look at the thing as a0:16:15
piece of list structure. This is a procedure which contains a lambda.0:16:25
I'm just looking at the list structure which represents this. Here's x. These are our symbols.0:16:37
And then the body is nothing but x. If I were looking for the bound variable list part of0:16:48
this procedure, I would go looking at the CADR, and I'd find a symbol. So, naturally, this pairup thing I just showed you is going to be matching a symbolic object0:17:01
against a list of arguments that were passed. And it will bind that symbol to the list of arguments.0:17:13
In this case, if I'm looking for it, the match will be against this in the bound variable list position.0:17:24
Now, what this does is it gets the list of arguments and returns it, that's list. That's what the procedure is.0:17:34
Oh well, thank you. Let's take a break. [MUSIC PLAYING]0:18:20
PROFESSOR: Well let's see. Now, I'm going to tell you about a rather more substantial variation, one that's a famous variation that0:18:32
many early Lisps had. It's called dynamic binding of variables.0:18:41
And we'll investigate a little bit about that right now. I'm going to first introduce this by showing you the sort of thing that would make someone want this idea.0:18:53
I'm not going to tell what it is yet, I'm going to show you why you might want it. Suppose, for example, we looked at the sum procedure0:19:02
again for summing up a bunch of things. To be that procedure, of a term, lower bound, method of0:19:15
computing the next index, and upper bound, such that, if a0:19:25
is greater than b then the result is 0, otherwise, it's0:19:34
the sum of the term procedure applied to a, and the result of adding up terms, with the new a being0:19:51
next of a, the next procedure passed along, and the upper0:20:06
bound being passed along. Blink, blink, blink--0:20:18
Now, when I use this sum procedure, I can use it, for example, like this. We can define the sum of the powers to be, for example, sum0:20:38
of a bunch of powers x to the n, to be that procedure of a, b, and n-- lower bound, the upper bound, and n--0:20:48
which is sum, of lambda of x, the procedure of one argument x, which exponentiates x to the n, with the a, the0:21:05
incrementer, and b, being passed along. So we're adding up x to n, given an x.0:21:16
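The sum procedure and sum-of-powers, transcribed into Python (same shape as the board; note that here the lambda closes over sum_powers' own n, so nothing is free yet):

```python
def sum_(term, a, next_, b):
    # add up term(x) for x = a, next_(a), ... up through b
    if a > b:
        return 0
    return term(a) + sum_(term, next_(a), next_, b)

def sum_powers(a, b, n):
    # the lambda is written inside sum_powers, so its n is n's binding here
    return sum_(lambda x: x ** n, a, lambda x: x + 1, b)

sum_powers(1, 3, 2)   # 1 + 4 + 9 = 14
```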
x takes on values from a to b, incrementing by one. I can also write the--0:21:27
That's right. Product, excuse me. The product of a bunch of powers.0:21:38
It's a strange name. I'm going to leave it there. Weird-- I write up what I have. I'm sure that's right.0:21:50
And if I want the product of a bunch of powers-- That was 12 brain cells, that double-take.0:22:03
I can for example use the procedure which is like sum, which is for making products, but it's similar to that, that you've seen before. There's a procedure of three arguments again.0:22:16
Which is the product of terms that are constructed, or factors in this case, constructed from0:22:26
exponentiating x to the n, where I start with a, I0:22:35
increment, and I go to b. Now, there's some sort of thing here that should disturb0:22:48
you immediately. These look the same. Why am I writing this code so many times? Here I am, in the same boat I've been in before.0:23:01
Wouldn't it be nice to make an abstraction here? What's an example of a good abstraction to make? Well, I see some codes that's identical. Here's one, and here's another.0:23:14
And so maybe I should be able to pull that out. I should be able to say, oh yes, the sum of the powers could be written in terms of something called0:23:23
the nth power procedure. Imagine somebody wanted to write a slightly different procedure that looks like this.0:23:37
The sum powers to be a procedure of a, b, and n, as0:23:49
the result of summing up the nth power. We're going to give a name to that idea, for starting at a,0:23:59
going by one, and ending at b. And similarly, I might want to write the product powers this0:24:12
way, abstracting out this idea. I might want this.0:24:22
Product powers, to be a procedure of a, b, and n,0:24:35
which is the product of the nth power operation on a with0:24:47
the incrementation and b being my arguments for the0:24:56
analogous-thing product. And I'd like to be able to define, I'd like to be able to define nth power-- I'll put it over here.0:25:11
I'll put it at the top.0:25:25
--to be, in fact, my procedure of one argument x which is the result of exponentiating x to the n.0:25:35
But I have a problem. My environment model, that is my means of interpretation for0:25:44
the language that we've defined so far, does not give me a meaning for this n. Because, as you know, this n is free in this procedure.0:26:06
The environment model tells us that the meaning of a free variable is determined in the environment in which this procedure is defined.0:26:16
In a way I have written it, assuming these things are defined on the blackboard as is, this is defined in the global environment, where there is no end.0:26:25
Therefore, n is an unbound variable. But it's perfectly clear, to most of us, that we would like it to be this n and this n.0:26:38
On the other hand, it would be nice. Certainly we've got to be careful here of keeping this to be this, and this one over here, wherever it0:26:51
is to be this one. Well, the desire to make this work has led to0:27:01
a very famous bug. I'll tell you about the famous bug. Look at this slide.0:27:10
This is an idea called dynamic binding. Where, instead of the free variable being interpreted in the environment of definition of a procedure, the free0:27:22
variable is interpreted as having its value in the environment of the caller of the procedure.0:27:31
So what you have is a system where you search up the chain of callers of a particular procedure, and, of course, in0:27:41
this case, since nth power is called from inside product whatever it is-- I had to write our own sum which is the analogous procedure--0:27:50
and product is presumably called from product-powers, as you see over here, then, since product-powers binds the variable n, nth-power's n would be derived0:28:03
through that chain. Similarly, this n, the n in nth-power in this case, would0:28:12
come through nth power here being called from inside sum. You can see it being called from inside sum here. It's called term here.0:28:22
But sum was called from inside of sum-powers, which bound n. Therefore, there would be an n available for that n to get0:28:35
its value from. What we have below this white line, plus over here, is what's called a dynamic binding view of the world.0:28:46
If that works, that's a dynamic binding view. Now, let's take a look, for example, at just what it takes0:28:55
to implement that. That's real easy. In fact, the very first Lisps that had any interpretations of the free variables at all, had dynamic binding0:29:04
interpretations for the free variables. APL has dynamic binding interpretation for the free variables, not lexical or static binding.0:29:15
So, of course, the change is in eval. And it's really in two places. First of all, one thing we see is that things become a0:29:27
little simpler. If I don't have to have the environment be the environment of definition for the procedure, the procedure need not capture0:29:38
the environment at the time it's defined. And so if we look here at this slide, we see that the clause0:29:47
for a lambda expression, which is the way a procedure is defined, does not make up a thing which has a type closure0:29:57
and an attached environment structure. It's just the expression itself. And we'll decompose that some other way somewhere else.0:30:06
The other thing we see is the applicator must be able to get the environment of the caller. The caller of a procedure is right here.0:30:19
If the expression we're evaluating is an application or a combination, then we're going to call a procedure which is the value of the operator.0:30:29
The environment of the caller is the environment we have right here, available now. So all I have to do is pass that environment to the0:30:38
applicator, to apply. And if we look at that here, the only change we have to make is that fellow takes that environment and uses that0:30:49
environment for the purpose of extending that environment when binding the formal parameters of the procedure to0:31:00
the arguments that were passed, not an environment that was captured in the procedure. The reason why the first Lisps were implemented this way is0:31:09
that it's sort of the obvious, accidental implementation. And, of course, as usual, people got used to it and liked it. And there were some people who said, this is0:31:18
the way to do it. Unfortunately that causes some serious problems. The most important, serious problem in using dynamic binding is0:31:31
there's a modularity crisis involved in it. If two people are working together on some big system, then an important thing to want is that the names used by0:31:41
each one don't interfere with the names of the other. It's important that when I invent some segment of code0:31:51
that no one can make my code stop working by using, internal to his code, the names that I use internal to my code. However, dynamic binding violates that particular0:32:03
modularity constraint in a clear way. Consider, for example, what happens over here.0:32:12
Suppose it was the case that I decided to change the word next. Supposing somebody is writing sum, and somebody else is0:32:25
going to use sum. The writer of sum has a choice of what names he may use. Let's say, I'm that writer.0:32:36
Well, by gosh, just happens I didn't want to call this next. I called it n. So all places where you see next, I called it n.0:32:48
Whoops. I changed nothing about the specifications of this program, but this program stops working. Not only that, unfortunately, this one does too.0:32:59
Why do these programs stop working? Well, it's sort of clear. Instead of chasing out the value of the n that occurs in0:33:09
nth power over here or over here, through the environment of definition, where this one is always linked to this one,0:33:19
if it was through the environment of definition, because here is the definition. This lambda expression was executed in the environment where that n was defined.0:33:30
If instead of doing that, I have to chase through the call chain, then look what horrible thing happens. Well, this was called from inside sum as term, term a.0:33:44
I'm looking for a value of n. Instead of getting this one, I get that one. So by changing the insides of this program, this program0:33:53
stops working. So I no longer have a quantifier, as I described before.0:34:02
The lambda symbol is supposed to be a quantifier. A thing which has the property that the names that are bound by it are unimportant, that I can uniformly substitute any0:34:14
names for these throughout this thing, so long as they don't occur in here, the new names, and the meaning of this expression should remain unchanged.0:34:24
I've just changed the meaning of the expression by changing one of the names. So lambda is no longer a well-defined idea. It's a very serious problem.0:34:34
So for that reason, I and my buddies have given up this particular kind of abstraction, which I would0:34:43
like to have, in favor of a modularity principle. But this is the kind of experiment you can do if you0:34:52
want to play with these interpreters. You can try them out this way, that way, and the other way. You see what makes a nicer language.0:35:02
So that's a very important thing to be able to do. Now, I would like to give you a feeling for what I think the right thing to do is here. How are you going to get this kind of power in a0:35:14
lexical system? And the answer is, of course, what I really want is something that makes up for me an exponentiator for a particular n.0:35:23
Given an n, it will make me an exponentiator. Oh, but that's easy too. In other words, I can write my program this way.0:35:35
I'm going to define a thing called PGEN, which is a procedure of n which produces for me an exponentiator.0:35:50
--x to the n. Given that I have that, then I can capture the abstraction I0:36:00
wanted even better, because now it's encapsulated in a way where it can't be destroyed by a change of names. I can define sum-powers to be a procedure, again, of a, b, and0:36:20
n which is the sum of the term function generated by using this generator, PGEN, n, with a, incrementer, and b.0:36:42
And I can define product-powers to be a procedure of0:36:57
a, b, and n which is the product PGEN, n, with a,0:37:09
increment, and b. Now, of course, this is a very simple example where this object that I'm trying to abstract over is small. But it could be 100 lines of code.0:37:20
And so, the purpose of this is, of course, to make it simple. I'd give a name to it, it's just that here it's a parameterized name. It's a name that depends upon, explicitly, the lexically0:37:31
apparent value of n. So you can think of this as a long name.0:37:40
And here, I've solved my problem by naming the term-generation procedures with an n in them.0:37:55
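The blackboard's PGEN solution carries over directly to any lexically scoped language with first-class procedures. A hedged Python rendering (the lecture's code is Scheme; `sum_terms` and `product_terms` stand in for the higher-order sum and product procedures assumed on the board):

```python
def pgen(n):
    # Given n, manufacture an exponentiator; n is captured lexically,
    # so no renaming inside sum or product can disturb it.
    return lambda x: x ** n

def sum_terms(term, a, nxt, b):
    # Sum of term(x) for x = a, nxt(a), ... up through b.
    total = 0
    while a <= b:
        total += term(a)
        a = nxt(a)
    return total

def product_terms(term, a, nxt, b):
    total = 1
    while a <= b:
        total *= term(a)
        a = nxt(a)
    return total

def inc(x):
    return x + 1

def sum_powers(a, b, n):
    return sum_terms(pgen(n), a, inc, b)

def product_powers(a, b, n):
    return product_terms(pgen(n), a, inc, b)
```

Here `pgen(2)` acts as the "parameterized name" for a squaring procedure, so `sum_powers(1, 3, 2)` is 1 + 4 + 9 = 14.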
Are there any questions? Oh, yes, David. AUDIENCE: Is the only solution to the problem you raise to0:38:04
create another procedure? In other words, can this only work in languages that are capable of defining objects as procedures? PROFESSOR: Oh, I see.0:38:16
My solution to making this abstraction, when I didn't want to include the procedure inside the body, depends upon my ability to return a procedure or export one.0:38:28
And that's right. If I don't have that, then I just don't have this ability to make an abstraction in a way where I don't have0:38:39
possibilities of symbol conflicts that were unanticipated. That's right. I consider being able to return the procedural value0:38:52
and, therefore, to sort of have first class procedures, in general, as being essential to doing very good modular0:39:01
programming. Now, indeed there are many other ways to skin this cat. What you can do is, for each of the bad things that0:39:10
you have to worry about, make a special feature that covers that thing. You can make a package system. You can make a module system as in Ada, et cetera.0:39:22
And all of those work, or they cover little regions of it. The thing is that returning procedures as values covers all of those problems. And so it's the simplest mechanism that0:39:35
gives you the best modularity, gives you all of the known modularity mechanisms.0:39:45
Well, I suppose it's time for the next break, thank you. [MUSIC PLAYING]0:40:41
PROFESSOR: Well, yesterday when you learned about streams, Hal worried to you about the order of evaluation0:40:52
and delayed arguments to procedures. The way we played with streams yesterday, it was the responsibility of the caller and the callee to both agree0:41:07
that an argument was delayed, and the callee must force the argument if it needs the answer. So there had to be a lot of handshaking between the0:41:18
designer of a procedure and the user of it over delayedness. That turns out, of course, to be a fairly bad thing. It0:41:29
works all right with streams. But as a general thing, what you want is for an idea to have a locus-- a decision, a design decision in general, to have a place where it's made,0:41:40
explicitly, and notated in a clear way. And so it's not a very good idea to have to have an agreement, between the person who writes a procedure and the0:41:52
person who calls it, about such details as, maybe, the order of evaluation of the arguments. Although, that's not so bad. I mean, we have other such agreements, like0:42:02
the input's a number. But it would be nice if only one of these guys could take responsibility, completely.0:42:11
Now this is not a new idea. ALGOL 60 had two different ways of calling a procedure.0:42:22
The arguments could be passed by name or by value. And what that meant was that a name argument was delayed.0:42:31
That when you passed an argument by name, that its value would only be obtained if you accessed that argument.0:42:42
So what I'd like to do now is show you, first of all, a little bit about, again, we're going to make a modification to a language. In this case, we're going to add a feature.0:42:53
We're going to add the feature of, by name parameters, if you will, or delayed parameters. Because, in fact, the default in our Lisp system is by the0:43:05
value of a pointer. A pointer is copied, but the data structure it points at is not. But what I'd like to, in fact, show you is how you add name0:43:17
arguments as well. Now again, why would we need such a thing? Well, supposing we wanted to invent certain kinds of what0:43:26
otherwise would be special forms, reserved words? But I'd rather not take up reserved words. I want procedures that can do things like if.0:43:36
If is special, or cond, or whatever it is. It's the same thing. It's special in that it determines whether or not to evaluate the consequent or the alternative based on the value0:43:48
of the predicate part of an expression. So taking the value of one thing determines whether or not to do something else.0:43:57
Whereas all the procedures like plus, the ones that we can define right now, evaluate all of their arguments before application.0:44:08
So, for example, supposing I wish to be able to define something like the reverse of if in terms of if.0:44:19
Call it unless. We have a predicate, a consequent, and an alternative.0:44:28
Now what I would like to sort of be able to do is say-- oh, I'll do it in terms of cond. Cond, if not the predicate, then take the consequent,0:44:41
otherwise, take the alternative.0:44:51
Now, what I'd like this to mean is, supposing I do something like this. I'd like this: unless one equals 0, then the answer0:45:05
is two; otherwise, the quotient of one and 0.0:45:15
What I'd like that to mean is the result of substituting equal one, 0, and two, and the quotient of one, 0 for p, c, and a.0:45:25
I'd like that to mean, and this is funny, I'd like it to transform into or mean cond not equal one, 0, then the0:45:40
result is two, otherwise I want it to be the quotient one and 0.0:45:54
Now, you know that if I were to type this into Lisp, I'd get a two. There's no problem with that. However, if I were to type this into Lisp, because all0:46:05
the arguments are evaluated before I start, then I'm going to get an error out of this. So that if the substitutions work at all, of course, I0:46:16
would get the right answer. But here's a case where the substitutions don't work. I don't get the wrong answer. I get no answer. I get an error.0:46:28
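The same failure shows up in any applicative-order language, and delaying the consequent and alternative by hand already cures it. A hedged sketch in Python rather than the lecture's Scheme (the names `unless_eager` and `unless_by_name` are mine, for illustration):

```python
def unless_eager(p, consequent, alternative):
    # if not p then consequent, otherwise alternative -- but the caller
    # has already evaluated both arguments by the time we get here.
    return consequent if not p else alternative

def unless_by_name(p, consequent, alternative):
    # consequent and alternative arrive as procedures of no arguments
    # (promises); only the one we actually need is ever evaluated.
    return consequent() if not p else alternative()

# (unless (= 1 0) 2 (/ 1 0)):
value = unless_by_name(1 == 0, lambda: 2, lambda: 1 // 0)   # the division never runs
```

The eager call `unless_eager(1 == 0, 2, 1 // 0)` never even reaches the body: evaluating the operand 1 // 0 signals an error first, just as the professor says.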
Now, however, I'd like to be able to make my definition so that this kind of thing works. What I want to do is say something special about c and a.0:46:39
I want them to be delayed automatically. I don't want them to be evaluated at the time I call.0:46:51
So I'm going to make a declaration, and then I'm going to see how to implement such a declaration. But again, I want you to say to yourself, oh, this is an interesting kluge he's adding in here.0:47:02
The piles of kluges make a big complicated mess. And is this going to foul up something else that might occur? First of all, is it syntactically unambiguous?0:47:13
Well, it will be syntactically unambiguous with what we've seen so far. But what I'm going to do may, in fact, cause trouble. It may be that the thing I add will conflict with type0:47:25
declarations I might want to add in the future for giving some system, some compiler or something, the ability to optimize given the types are known.0:47:34
Or it might conflict with other types of declarations I might want to make about the formal parameters. So I'm not making a general mechanism here where I can add0:47:44
declarations. And I would like to be able to do that. But I don't want to talk about that right now. So here's what I'm going to do: I'm going to build a kluge.0:47:57
So we're going to define unless of a predicate--0:48:08
and I'm going to call these by name-- the consequent, and name the alternative.0:48:19
Huh, huh-- I got caught in the corner.0:48:31
If not p then the result is c, else--0:48:40
that's what I'd like. Where I can explicitly declare certain of the parameters to0:48:49
be delayed, to be computed later. Now, this is actually a very complicated modification to an interpreter rather than a simple one.0:49:00
The ones you saw before, dynamic binding or adding indefinite-argument procedures, were relatively simple.0:49:09
But this one changes a basic strategy. The problem here is that our interpreter, as written,0:49:18
evaluates a combination by evaluating the procedure, the operator producing the procedure, and evaluating the operands producing the arguments, and then doing0:49:31
apply of the procedure to the arguments. However, here, I don't want to evaluate the operands to0:49:40
produce the arguments until after I've examined the procedure to see what the procedure's declarations look like.0:49:49
So let's look at that. Here we have a changed evaluator. I'm starting with the simple lexical evaluator, not0:50:02
dynamic, but we're going to have to do something sort of similar in some ways. Because of the fact that, if I delay a procedure--0:50:13
I'm sorry-- delay an argument to a procedure, I'm going to have to attach an environment to it. Remember how Hal implemented delay.0:50:23
Hal implemented delay as being a procedure of no arguments which does some expression. That's what delay of the expression is.0:50:35
--of that expression. This turned into something like this.0:50:44
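That delay/force correspondence can be sketched in Python as an illustration (Python's lambda also captures its environment, which is exactly the point being made next):

```python
def force(promise):
    # Forcing a delayed expression is just calling that procedure
    # of no arguments.
    return promise()

log = []

def noisy_square(x):
    # Record each evaluation so we can see when it actually happens.
    log.append("evaluated")
    return x * x

# (delay (noisy-square 7)) turns into a procedure of no arguments
# wrapping the expression; nothing runs yet.
promise = lambda: noisy_square(7)
```

Nothing is appended to `log` until `force(promise)` is called and returns 49; forcing a second time evaluates again, since this plain delay does no memoization.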
Now, however, if I evaluate a lambda expression, I have to capture the environment. The reason why is because there are variables in there0:50:56
whose meaning I wish to derive from the context where this was written. So that's why a lambda does the job.0:51:06
It's the right thing. And the forcing of a delayed expression was the same0:51:17
thing as calling that with no arguments. It's just the opposite of this. Producing an environment of the call which is, in fact,0:51:28
the environment where this was defined with an extra frame in it that's empty. I don't care about that. Well, if we go back to this slide, since it's the case, if0:51:42
we look at this for a second, everything is the same as it was before except the case of applications or combinations.0:51:51
And combinations are going to do two things. One is, I have to evaluate the procedure-- get the procedure-- by evaluating the operator.0:52:00
That's what you see right here. I have to make sure that that's current, that it's not a delayed object, and evaluate that to the point where it's forced now.0:52:10
And then I have to somehow apply that to the operands. But I have to keep the environment, pass that0:52:20
environment along. So some of those operands I may have to delay. I may have to attach that environment to those operands.0:52:29
This is a rather complicated thing happening here. Looking at that in apply. Apply, well it has a primitive procedure0:52:39
thing just like before. But the compound one is a little more interesting. I have to evaluate the body, just as before, in an0:52:50
environment which is the result of binding some formal parameters to arguments in the environment.0:53:00
That's true. The environment is the one that comes from the procedure now. It's a lexical language, statically bound. However, one thing I have to do is strip off the0:53:11
declarations to get the names of the variables. That's what this guy does, vnames. And the other thing I have to do is process these declarations, deciding which of these operands--0:53:21
that's the operands now, as opposed to the arguments-- which of these operands to evaluate, and which of them are to be encapsulated in delays of some sort.0:53:37
The other thing you see here is that we got a primitive, a primitive like plus, had better get at the real operands. So here is a place where we're going to have to force them.0:53:47
And we're going to look at that: evlist is going to have to do a bunch of forces. So we have two different kinds of evlist now. We have evlist and gevlist. Gevlist is going to wrap delays around some things and force others, evaluate others.0:53:59
And this guy's going to do some forcing of things. Just looking at this a little bit, this is a game you must0:54:10
play for yourself, you know. It's not something that you're going to see all possible variations on an evaluator talking to me.0:54:19
What you have to do is do this for yourself. And after you feel this, you play this a bit, you get to see all the possible design decisions and what they might mean, and how they interact with each other.0:54:29
So what languages might have in them. And what are some of the consistent sets that make a legitimate language. Whereas what things are complicated kluges that are0:54:39
just piles of junk. So evlist of course, over here, just as I said, is a list of operands which are going to be undelayed after0:54:49
evaluation. So these are going to be forced, whatever that's going to mean. And gevlist, which is the next thing--0:55:01
Thank you. What we see here, well there's a couple of possibilities. Either it's a normal, ordinary thing, a symbol sitting there0:55:13
like the predicate in the unless, and that's what we have here. In which case, this is intended to be evaluated in applicative order.0:55:23
And it's, essentially, just what we had before. It's mapping eval down the list. In other words, I evaluate the first expression and continue gevlisting the0:55:35
CDR of the expression in the environment. However, it's possible that this is a name parameter. If it's a name parameter, I want to put a delay in which0:55:47
combines that expression, which I'm calling by name, with the environment that's available at this time and0:55:59
passing that as the parameter. And this is part of the mapping process that you see here.0:56:09
The only other interesting place in this interpreter is cond. People tend to write this thing, and then they leave this one out.0:56:18
There's a place where you have to force. Conditionals have to know whether or not the answer is true or false. It's like a primitive.0:56:28
When you do a conditional, you have to force. Now, I'm not going to look at any more of this in any detail. It isn't very exciting. And what's left is how you make delays.0:56:38
Well, delays are data structures which contain an expression, an environment, and a type on them. And it says they're a thunk. That comes from the ALGOL language, and it's claimed to0:56:50
be the sound of something being pushed on a stack. I don't know. I was not an ALGOLician or an ALGOLite or whatever, so I don't know. But that's what was claimed.0:57:00
And undelay is something which will recursively undelay thunks until the thunk becomes something which isn't a thunk. This is the way you implement a call-by-name-0:57:09
like thing in ALGOL. And that's about all there is. Are there any questions?0:57:26
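The thunk data structure and undelay just described can be sketched in Python (a hedged illustration; here an expression is represented as a function of an environment, standing in for eval):

```python
def make_thunk(expr, env):
    # A delayed object: a type tag, the expression, and its environment.
    return ('thunk', expr, env)

def is_thunk(obj):
    return isinstance(obj, tuple) and len(obj) == 3 and obj[0] == 'thunk'

def evaluate(expr, env):
    # Stand-in evaluator: expressions here are just functions of an environment.
    return expr(env)

def undelay(obj):
    # Recursively undelay until the result is no longer a thunk.
    while is_thunk(obj):
        _, expr, env = obj
        obj = evaluate(expr, env)
    return obj

t = make_thunk(lambda env: env['x'] + 1, {'x': 41})
nested = make_thunk(lambda env: t, {})   # a thunk whose value is another thunk
```

`undelay(t)` yields 42, and `undelay(nested)` keeps forcing until it reaches 42 as well; a non-thunk passes through unchanged.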
AUDIENCE: Gerry? PROFESSOR: Yes, Vesko? AUDIENCE: I noticed you avoided calling by name in the primitive procedures, I was wondering what0:57:38
cause you have on that? You never need that? PROFESSOR: Vesko is asking if it's ever reasonable to call a primitive procedure by name?0:57:47
The answer is, yes. There's one particular case where it's reasonable, actually two.0:57:56
Construction of a data structure, like cons, or making an array if you have arrays with any number of elements. It's unnecessary to evaluate those arguments.0:58:07
All you need is promises to evaluate those arguments if you look at them. If I cons together two things, then I could cons together the0:58:17
promises just as easily as I can cons together the things. And it's not even when I CAR CDR them that I have to look at them. That just gets out the promises and0:58:26
passes them to somebody. That's why the lambda calculus definition, the Alonzo Church definition of CAR, CDR, and cons makes sense. It's because no work is done in CAR, CDR, and cons, it's0:58:36
just shuffling data, it's just routing, if you will. However, the things that do have to look at data are things like plus.0:58:45
Because they have to look at the bits that the numbers are made out of, unless they're lambda calculus numbers, which are funny. They have to look at the bits to be able to crunch them0:58:54
together to do the add. So, in fact, data constructors, data selectors,0:59:03
and, in fact, things that side-effect data objects don't need to do any forcing in the laziest possible interpreters.0:59:16
On the other hand predicates on data structures have to. Is this a pair? Or is it a symbol? Well, you better find out.0:59:25
You got to look at it then. Any other questions?0:59:40
Oh, well, I suppose it's time for a break. Thank you. [MUSIC PLAYING]1:00:02
Lecture 8A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING BY J.S. BACH]0:00:17
PROFESSOR: The last time we began having a look at how languages are constructed. Remember the main point: that an evaluator for LISP, say,0:00:26
has two main elements. There is EVAL, and EVAL's job is to take in an expression0:00:36
and an environment and turn that into a procedure and some arguments and pass that off to APPLY.0:00:49
And APPLY takes the procedure and the arguments, turns that back into, in the general case, another expression to be evaluated in another environment and passes that0:00:58
off to EVAL, which passes it to APPLY, and there's this whole big circle where things go around and around and around until you get either to some very primitive data or to a primitive procedure.0:01:07
See, what this cycle has to do with is unwinding the means of combination and the means of abstraction in the language. So for instance, you have a procedure in LISP-- a0:01:17
procedure is a general way of saying, I want to be able to evaluate this expression for any value of the arguments, and that's sort of what's going on here.0:01:27
That's what APPLY does. It says the general thing coming in with the arguments reduces to the expression that's the body, and then if that's a compound expression or another procedure application, the thing will go around and around the circle.0:01:40
Anyway, that's sort of the basic structure of gee, pretty much any interpreter. The other thing that you saw is once you have the interpreter in your hands, you have all this power to start0:01:49
playing with the language. So you can make it dynamically scoped, or you can put in normal order evaluation, or you can add new forms to the language, whatever you like. Or more generally, there's this notion of metalinguistic0:02:00
abstraction, which says that part of your perspective as an engineer, as a software engineer, but as an engineer0:02:09
in general is that you can gain control of complexity by inventing new languages sometimes.0:02:18
See, one way to think about computer programming is that it only incidentally has to do with getting a computer to do something. Primarily what a computer program has to do with, it's a0:02:29
way of expressing ideas, of communicating ideas. And sometimes when you want to communicate new kinds of ideas, you'd like to invent new modes of expressing that.0:02:39
Well, today we're going to apply this framework to build a new language. See, once we have the basic idea of the interpreter, you0:02:48
can pretty much go build any language that you like. So for example, we can go off and build Pascal. And gee, we would worry about syntax and parsing and various0:02:58
kinds of compiler optimizations, and there are people who make honest livings doing that, but at the level of abstraction that we're talking, a Pascal interpreter0:03:09
would not look very different at all from what you saw Gerry do last time. Instead of that, we'll spend today building a really0:03:18
different language, a language that encourages you to think about programming not in terms of procedures, but in a really different way.0:03:29
And the lecture today is going to be at two levels simultaneously. On the one hand, I'm going to show you what this language looks like, and on the other hand, I'll show you how it's0:03:40
implemented. And we'll build an implementation in LISP and see how that works. And you should be drawing lessons on two levels. The first is to realize just how different a0:03:52
language can be. So if you think that the jump from Fortran to LISP is a big deal, you haven't seen anything yet.0:04:01
And secondly, you'll see that even with such a very different language, which will turn out to not have procedures at all and not talk about functions at all, there0:04:12
will still be this basic cycle of eval and apply that unwinds the means of combination and the means of abstraction. And then thirdly, as kind of a minor but elegant technical0:04:24
point, you'll see a nice use of streams to avoid backtracking. OK, well, I said that this language is very different.0:04:35
To explain that, let's go back to the very first idea that we talked about in this course, and that was the idea of the0:04:44
distinction between the declarative knowledge of mathematics-- the definition of a square root as a mathematical truth--0:04:55
and the idea that computer science talks about how-to knowledge-- contrast that definition of square root with a program to compute a square root.0:05:05
That's where we started off. Well, wouldn't it be great if you could somehow bridge this gap and make a programming language which sort of did0:05:16
things, but you talked about it in terms of truth, in declarative terms? So that would be a programming language in which you specify facts.0:05:27
You tell it what is. You say what is true. And then when you want an answer, somehow the language has built into it automatically general kinds of0:05:38
how to knowledge so it can just take your facts and it can evolve these methods on its on using the facts you gave it and maybe some general rules of logic.0:05:49
So for instance, I might go up to this program and start telling it some things. So I might tell it that the son of Adam is Abel.0:06:08
And I might tell it that the son of Adam is Cain.0:06:17
And I might tell it that the son of Cain is Enoch.0:06:27
And I might tell it that the son of Enoch is Irad, and all0:06:37
through the rest of our chapter whatever of Genesis, which ends up ending in Adah, by the way, and this shows the genealogy of Adah from Cain.0:06:48
Anyway, once you tell it these facts, you might ask it things. You might go up to your language and say, who's the0:06:58
son of Adam? And you can very easily imagine having a little general purpose search program which would be able to go through and in response to that say, oh yeah, there are0:07:08
two answers: the son of Adam is Abel and the son of Adam is Cain. Or you might say, based on the very same facts, who is Cain0:07:19
the son of? And then you can imagine generating another slightly different search program which would be able to go through0:07:29
and check for who Cain is the son of, and come up with Adam. Or you might say, what's the relationship0:07:40
between Cain and Enoch? And again, a minor variant on that search program. You could figure out that it said son of.0:07:52
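Those three search programs really are minor variants on one pattern matcher. A hedged Python sketch of the idea (not the query language the lecture goes on to build; names and representations here are mine):

```python
# Facts are triples; pattern elements beginning with '?' are variables.
FACTS = [
    ('son', 'Adam', 'Abel'),
    ('son', 'Adam', 'Cain'),
    ('son', 'Cain', 'Enoch'),
    ('son', 'Enoch', 'Irad'),
]

def match(pattern, fact, bindings):
    # Try to line up a pattern with a fact, extending the bindings;
    # return None on any mismatch.
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith('?'):
            if p in b and b[p] != f:
                return None
            b[p] = f
        elif p != f:
            return None
    return b

def query(pattern):
    # The same facts answer all three shapes of question.
    results = []
    for fact in FACTS:
        b = match(pattern, fact, {})
        if b is not None:
            results.append(b)
    return results
```

So `query(('son', 'Adam', '?who'))` finds Abel and Cain, `query(('son', '?who', 'Cain'))` finds Adam, and `query(('?rel', 'Cain', 'Enoch'))` finds the relationship son, all from the same facts.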
But even here in this very simple example, what you see is that a single fact, see, a single fact like the son of Adam is Cain can be used to answer0:08:04
different kinds of questions. You can say, who's the son of, or you can say who's the son of Adam, or you can say what's the relation between Adam and Cain? Those are different questions being run by different0:08:17
traditional procedures all based on the same fact. And that's going to be the essence of the power of this programming style, that one piece of declarative knowledge0:08:30
can be used as the basis for a lot of different kinds of how-to knowledge, as opposed to the kinds of procedures we're writing where you sort of tell it what input you're0:08:39
giving it and what answer you want. So for instance, our square root program can perfectly well answer the question, what's the square root of 144?0:08:48
But in principle, the mathematical definition of square root tells you other things. Like it could say, what is 17 the square root of?0:08:57
And that would have to be answered by a different program. So the mathematical definition, or in general, the facts that you give it are somehow unbiased as to what0:09:09
the question is. Whereas the programs we tend to write specifically because they are how-to knowledge tend to be looking for a specific answer. So that's going to be one characteristic of what we're0:09:19
talking about. We can go on. We can imagine that we've given our language some sort of facts. Now let's give it some rules of inference.0:09:30
We can say, for instance, if the-- make up some syntax here-- if the son of x is y--0:09:41
I'll put question marks to indicate variables here-- if the son of x is y and the son of y is z, then the0:10:01
grandson of x is z. So I can imagine telling my machine that rule and then0:10:15
being able to say, for instance, who's the grandson of Adam? Or who is Irad the grandson of?0:10:24
Or deduce all grandson relationships you possibly can from this information. We can imagine somehow the language knowing how to do0:10:34
that automatically. Let me give you maybe a little bit more concrete example.0:10:49
Here's a procedure that merges two sorted lists. So x and y are two, say, lists of numbers, lists of distinct0:11:01
numbers, if you like, that are in increasing order. And what merge does is take two such lists and combine them into a list where everything's in increasing0:11:10
order, and this is a pretty easy program that you ought to be able to write. It says, if x is empty, the answer is y. If y is empty, the answer is x.0:11:21
Otherwise, you compare the first two elements. So you pick out the first thing in x and the first thing in y, and then depending on which of those first elements0:11:31
is less, you stick the lower one onto the result of recursively merging, either chopping the first one off x0:11:40
or chopping the first one off y. That's a standard kind of program. Let's look at the logic. Let's forget about the program and look at the logic on which0:11:51
that procedure is based. See, there's some logic which says, gee, if the first one is less, then we get the answer by sticking something onto the0:12:00
result of recursively merging the rest. So let's try and be explicit about what that logic is that's making the program work. So here's one piece.0:12:10
Here's the piece of the program which recursively chops down x if the first thing in x is smaller.0:12:19
And if you want to be very explicit about what the logic is there, what's really going on is a deduction, which says, if you know that some list, that we'll call cdr of x, and0:12:31
y merged to form z, and you know that a is less than the0:12:40
first thing in y, then you know that if you put a onto the cdr of x, then that result and y merge to form a onto z.0:12:55
And what that is, that's the underlying piece of logic-- I haven't written it as a program, I wrote it a sort of deduction that's underneath this particular clause that0:13:05
says we can use the recursion there. And then similarly, here's the other clause just to complete it.0:13:14
The other clause is based on this piece of logic, which is almost the same and I won't go through it, and then there are the end cases where we tested for null, and that's based on the idea that for any x, x and the empty list merge to form0:13:26
an x, or for any y, the empty list and y merge to form y. OK, so there's a piece of procedure and the logic on0:13:39
which it's based. And notice a big difference. The procedure looked like this: it0:13:51
said there was a box-- and all the things we've been doing have the characteristic we have boxes and things going in and things going out-- there was this box called merge, and in came an x and y,0:14:04
and out came an answer. That's the character of the procedure that we wrote.0:14:13
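Rendered as a Python sketch (slicing stands in for car and cdr; not the lecture's Lisp), that merge box looks like this:

```python
def merge(x, y):
    """Merge two sorted lists of distinct numbers into one sorted list."""
    if not x:
        return y
    if not y:
        return x
    a, b = x[0], y[0]
    if a < b:
        # a is smaller: put it first, merge the rest of x with y
        return [a] + merge(x[1:], y)
    else:
        # b is smaller: put it first, merge x with the rest of y
        return [b] + merge(x, y[1:])

print(merge([1, 3, 7], [2, 4, 8]))  # [1, 2, 3, 4, 7, 8]
```

Notice the directionality: x and y go in, the answer comes out, and there is no way to run this box backward.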
These rules don't look like that. These rules talk about a relation. There's some sort of relation that in those slides I called0:14:23
merge-to-form. So I said x and y merge to form z, and somehow this is a function.0:14:32
Right? The answer is a function of x and y, and here what I have is a relation between three things. And I'm not going to specify which is the input and which0:14:43
is the output. And the reason I want to say that is because in principle, we could use exactly those same logic rules to answer a lot of different questions.0:14:54
So we can say, for instance-- imagine giving our machine those rules of logic. Not the program, the underlying rules of logic. Then it ought to be able to say--0:15:04
we could ask it-- 1, 3, 7 and 2, 4, 8 merge to form what?0:15:20
And that's a question it ought to be able to answer. That's exactly the same question that our LISP procedure answered. But the exact same rules should also be able to answer0:15:33
a question like this: 1, 3, 7 and what merged to form 1, 2, 3, 4, 7, 8?0:15:45
The same rules of logic can answer this, although the procedure we wrote can't answer that question. Or we might be able to say what and what0:15:56
else merge to form--0:16:07
what and what else merge to form 1, 2, 3, 4, 7, 8? And the thing should be able to go through, if it really0:16:16
can apply that logic, and deduce all, whatever is, 2 to the sixth answers to that question.0:16:25
It could be 1 and the rest, or it could be 1, 2 and the rest. Or it could be 1 and 3 and 7 and the rest. There's a whole bunch of answers. And in principle, the logic should be0:16:36
enough to deduce that. So there are going to be two big differences in the kind of program we're going to look at and not only list, but0:16:48
essentially all the programming you've probably done so far in pretty much any language you can think of. The first is, we're not going to be computing functions.0:17:00
We're not going to be talking about things that take input and output. We're going to be talking about relations. And that means in principle, these relations don't have0:17:09
directionality. So the knowledge that you specify to answer this question, that same knowledge should also allow you to0:17:19
answer these other questions and conversely. And the second issue is that since we're talking about0:17:30
relations, these relations don't necessarily have one answer. So that third question down there doesn't have a particular answer, it has a whole bunch of answers.0:17:42
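One way to see the relational reading is to enumerate, in a Python sketch, every pair of lists that merges to form a given result. This brute-force enumeration is not how the logic language works (it deduces answers from rules), but it produces the same set of answers:

```python
def splits(z):
    """Yield every pair of sorted lists (x, y) that merge to form z."""
    if not z:
        yield [], []
        return
    head, rest = z[0], z[1:]
    for x, y in splits(rest):
        yield [head] + x, y  # the first element came from x
        yield x, [head] + y  # the first element came from y

answers = list(splits([1, 2, 3, 4, 7, 8]))
print(len(answers))  # 64 answers: "2 to the sixth"

# Running "backward": 1, 3, 7 and what merge to form 1, 2, 3, 4, 7, 8?
print([y for x, y in answers if x == [1, 3, 7]])  # [[2, 4, 8]]
```

The same relation answers the forward question, the backward question, and the "what and what else" question, which no single directional procedure can do.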
Well, that's where we're going. This style of programming, by the way, is called logic programming, for kind of obvious reasons.0:17:56
And people who do logic programming say that-- they have this little phrase-- they say the point of logic programming is that you use logic to express what is true,0:18:10
you use logic to check whether something is true, and you use logic to find out what is true.0:18:19
The best known logic programming language, as you probably know, is called Prolog. The language that we're going to implement this morning is0:18:31
something we call the query language, and it essentially has the essence of Prolog. It can do about the same stuff, although it's a lot slower because we're going to implement it in LISP rather0:18:42
than building a particular compiler. We're going to interpret it on top of the LISP interpreter. But other than that, it can do about the same stuff as Prolog. It has about the same power and about the same0:18:52
limitations. All right, let's break for question. STUDENT: Yes, could you please repeat what the three things0:19:04
you use logic programming to find? In other words, to find what is true, learn what is true-- what is the? PROFESSOR: Right. Sort of a logic programmer's little catechism.0:19:15
You use logic to express what is true, like these rules. You use logic to check whether something is true, and that's0:19:26
the kind of question I didn't answer here. I might say-- another question I could put down here is to say, is it true that 1, 3, 7 and 2, 4, 8 merge to form 1, 2, 6, 10? And0:19:41
that same logic should be enough to say no. So I use logic to check what is true, and then you also use logic to find out what's true.0:20:04
All right. Let's break. [MUSIC PLAYING BY J.S. BACH]0:20:22
[MUSIC ENDS]0:20:47
[MUSIC PLAYING BY J.S. BACH]0:21:02
PROFESSOR: OK, let's go ahead and take a look at this query language and operation. The first thing you might notice, when I put up that0:21:12
little biblical database, is that it's nice to be able to ask this language questions in relation to some collection of facts.0:21:21
So let's start off and make a little collection of facts. This is a tiny fragment of personnel records for a Boston0:21:31
high tech company, and here's a piece of the personnel records of Ben Bitdiddle. And Ben Bitdiddle is the computer wizard in this0:21:41
company; he's the underpaid computer wizard in this company. His supervisor is Oliver Warbucks, and here's his address.0:21:52
So the format is we're giving this information: job, salary, supervisor, address. And there are some other conventions. Computer here means that Ben works in the computer0:22:01
division, and his position in the computer division is wizard. Here's somebody else. Alyssa, Alyssa P. Hacker is a computer programmer, and she0:22:13
works for Ben, and she lives in Cambridge. And there's another programmer who works for Ben who's Lem E. Tweakit.0:22:22
And there's a programmer trainee, who is Louis Reasoner, and he works for Alyssa. And the big wheel of the company doesn't work for0:22:34
anybody, right? That's Oliver Warbucks. Anyway, what we're going to do is ask questions about that0:22:43
little world. And that'll be a sample world that we're going to do logic in. Let me just write up here, for probably the last time, what I0:22:55
said is the very most important thing you should get out of this course, and that is, when somebody tells you about a language, you say, fine-- what are the primitives, what are the means of combination,0:23:15
how do you put the primitives together, and then how do you abstract them, how do you abstract the compound pieces0:23:24
so you can use them as pieces to make something more complicated? And we've said this a whole bunch of times already, but it's worth saying again.0:23:36
Let's start. The primitives. Well, there's really only one primitive, and the primitive in this language is called a query. A primitive query.0:23:46
Let's look at some primitive queries. Job x. Who is a computer programmer?0:23:55
Or find every fact in the database that matches: the job of0:24:04
x is computer programmer. And you see a little syntax here. Things without question marks are meant to be literal, question mark x means that's a variable, and this thing will0:24:13
match, for example, the fact that Alyssa P. Hacker is a computer programmer, or x is Alyssa P. Hacker.0:24:26
Or more generally, I could have something with two variables in it. I could say, the job of x is computer something, and0:24:39
that'll match computer wizard. So there's something here: type will match wizard, or type will match programmer, or x might match0:24:49
various certain things. So there are, in our little example, only three facts in that database that match that query.0:24:59
Let's see, just to show you some syntax, the same query, this query doesn't match the job of x, doesn't match Louis0:25:11
Reasoner, and the reason for that is when I write something here, what I mean is that this is going to be a list of two symbols, of which the first is the word computer, and the0:25:22
second can be anything. And Louis's job description here has three symbols, so it doesn't match. And just to show you a little bit of syntax, the more0:25:35
general thing I might want to type is a thing with a dot here, and this is just standard dot notation for saying, this is a list, of which the first element is the0:25:46
word computer, and the rest is something that I'll call type. So this one would match.0:25:56
Louis's job is computer programmer trainee, and type here would be the cdr of this list. It would be the list programmer trainee.0:26:06
And that kind of dot processing is done automatically by the LISP reader.0:26:15
Well, let's actually try this. The idea is I'm going to type in queries in this language, and answers will come out. Let's look at this.0:26:25
I can go up and say, who works in the computer division? Job of x is computer dot y.0:26:39
Doesn't matter what I call the dummy variables. It says the answers to that, and it's found four answers.0:26:48
Or I can go off and say, tell me about everybody's supervisor. So I'll put in the query, the primitive query, the supervisor of x is y.0:27:02
There are all the supervisor relationships I know. Or I could go type in, who lives in Cambridge? So I can say, the address of x is Cambridge dot anything.0:27:25
And only one person lives in Cambridge. OK, so those are primitive queries. And you see what happens to basic interaction with the0:27:34
system is you type in a query, and it types out all possible answers. Or another way to say that: it finds out all the possible0:27:43
values of those variables x and y or t or whatever I've called them, and it types out all ways of taking that query and instantiating it--0:27:53
remember that from the rule system lecture-- instantiates the query with all possible values for those variables and then types out all of them. And there are a lot of ways you can0:28:02
arrange a logic language. Prolog, for instance, does something slightly different. Rather than typing back your query, Prolog would type out, x equals this and y equals that, or x equals this and y0:28:12
equals that. And that's a very surface level thing, you can decide what you like. OK. All right.0:28:21
So the primitives in this language? Only one, right? Primitive query.0:28:31
OK. Means of combination. Let's look at some compound queries in this language. Here's one.0:28:41
This one says, tell me all the people who work in the computer division. Tell me all the people who work in the computer division0:28:52
together with their supervisors. The way I write that is the query is and. And the job of x is computer something or other.0:29:04
And job of x is computer dot y. And the supervisor of x is z. Tell me all the people in the computer division--0:29:13
that's this-- together with their supervisors. And notice in this query I have three variables-- x, y, and z.0:29:23
And this x is supposed to be the same as that x. So x works in the computer division, and the supervisor of x is z.0:29:34
Let's try another one. So one means of combination is and. Who are all the people who make more than $30,000?0:29:45
And the salary of some person p is some amount a.0:29:54
And when I go and look at a, a is greater than $30,000. And LISP value here is a little piece of interface that0:30:06
interfaces the query language to the underlying LISP. And what the LISP value allows you to do is call any LISP predicate inside a query.0:30:17
So here I'm using the LISP predicate greater than, so I say LISP value. This I say and. So all the people whose salary is greater than $30,000.0:30:28
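In a Python sketch, LISP value amounts to filtering the stream of answer dictionaries with an arbitrary host-language predicate; the salary figures below are illustrative:

```python
# Illustrative salaries for three of the personnel records.
salaries = {"Bitdiddle Ben": 60000, "Hacker Alyssa P": 40000,
            "Reasoner Louis": 30000}

# The dictionaries the salary query would produce, one per match:
# each binds ?p to a person and ?a to an amount.
dicts = [{"?p": p, "?a": a} for p, a in salaries.items()]

# (lisp-value > ?a 30000): keep only dictionaries where the
# host-language predicate holds.
well_paid = [d for d in dicts if d["?a"] > 30000]
print([d["?p"] for d in well_paid])  # Ben and Alyssa, but not Louis
```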
Or here's a more complicated one. Tell me all the people who work in the computer division who do not have a supervisor who works in0:30:38
the computer division. and x works in the computer division. The job of x is computer dot y.0:30:47
And it's not the case that both x has a supervisor z and the job of z is computer something or other.0:30:59
All right, so again, this x has got to be that x, and this z is going to be that z.0:31:09
And then you see another means a combination, not. All right, well, let's look at that.0:31:20
It works the same way. I can go up to the machine and say and the job of x is0:31:33
computer dot y. And the supervisor of x is z.0:31:46
And I typed that in like a query. And what it types back, what you see are the queries I0:31:55
typed in instantiated by all possible answers. And then you see there are a lot of answers. All right. So the means of combination in this language--0:32:05
and this is why it's called a logic language-- are logical operations. Means of combination are things like AND and NOT and0:32:16
there's one I didn't show you, which is OR. And then I showed you LISP value, which is not logic, of course, but is a little special hack to interface that0:32:26
to LISP so you can get more power. Those are the means of combination. OK, the means of abstraction. What we'd like to do--0:32:38
let's go back for a second and look at that last slide. We might like to take that very complicated thing, the idea that someone works in a division but does not have a0:32:48
supervisor in the division. And as before, name that. Well, if someone works in a division and does not have a0:32:58
supervisor who works in that division, that means that person is a big shot. So let's make a rule that somebody x is a big shot in0:33:08
some department if x works in the department and it's not the case that x has a supervisor who works in the0:33:19
department. So this is our means of abstraction. This is a rule. And a rule has three parts.0:33:30
The thing that says it's a rule. And then there's the conclusion of the rule. And then there's the body of the rule.0:33:40
And you can read this as a piece of logic which says, if you know that the body of the rule is true, then you can conclude that the conclusion is true.0:33:49
Or in order to deduce that x is a big shot in some department, it's enough to verify that. So that's what rules look like.0:34:03
Let's go back and look at that merge example that I did before the break. Let's look at how that would look in terms of rules. I'm going to take the logic I put up and just change it into0:34:14
a bunch of rules in this format. We have a rule. Remember, there was this thing merge-to-form. There is a rule that says, the empty list and y0:34:28
merge to form y. This is the rule conclusion. And notice this particular rule has no body. And in this language, a rule with no body is something that0:34:40
is always true. You can always assume that's true. And there was another piece of logic that said anything and the empty list merge to form that anything.0:34:49
That's this. A rule y and the empty list merge to form y. Those corresponded to the two end cases in our merge0:34:58
procedure, but now we're talking about logic, not about procedures. Then we had another rule, which said if you know how0:35:07
shorter things merge, you can put them together. So this says, if you have a list x and y and z, and if you want to deduce that a dot x-- this means constant a onto x,0:35:19
or a list whose first thing is a and whose rest is x-- so if you want to deduce that a dot x and b dot y merge to form b dot z--0:35:30
that would say you merge these two lists a x and b y and you're going to get something that starts with b-- you can deduce that if you know that it's the case both0:35:41
that a dot x and y merge to form z and a is larger than b. So when I merge them, b will come first in the list. That's0:35:52
a little translation of the logic rule that I wrote in pseudo-English before. And then just for completeness, here's the other case.0:36:03
a dot x and b dot y merge to form a dot z if x and b dot y merged to form z and b is larger than a.0:36:12
So that's a little program that I've typed in in this language, and now let's look at it run.0:36:21
So I typed in the merge rules before, and I could use this like a procedure. I could say merge to form 1 and 3 and 2 and 7.0:36:39
So here I'm using it like the LISP procedure. Now it's going to think about that for a while and apply these rules.0:36:50
So it found an answer. Now it's going to see if there are any other answers but it doesn't know a priori there's only one answer. So it's sitting here checking all possibilities, and it0:37:00
says, no more. Done. So there I've used those rules like a procedure. Or remember the whole point is that I can ask different kinds of questions.0:37:10
I could say merge to form, let's see, how about 2 and a.0:37:24
Some list of two elements which I know starts with 2, and the other thing I don't know, and x and some other0:37:34
list merge to form a 1, 2, 3 and 4. So now it's going to think about that.0:37:44
It's got to find-- so it found one possibility. It said a could be 3, and x could be the list 1, 4.0:37:53
And now, again, it's got to check because it doesn't a priori know that there aren't any other possibilities going on.0:38:03
Or like I said, I could say something like merge to form, like, what and what else merge to form 1, 2, 3, 4, 5?0:38:24
Now it's going to think about that. And there are a lot of answers that it might get.0:38:35
And what you see is here you're really paying the price of slowness. And kind of for three reasons. One is that this language is doubly interpreted.0:38:47
Whereas in a real implementation, you would go compile this down to primitive operations. The other reason is that this particular algorithm for0:38:56
merges is doubly recursive. So it's going to take a very long time. And eventually, this is going to go through and find--0:39:06
find what? Two to the fifth possible answers. And you see they come out in some fairly arbitrary order, depending on which order it's going to be0:39:17
trying these rules. In fact, what we're going to do when they edit the videotape is speed all this up. Don't you like taking out these waits?0:39:26
And don't you wish you could do that in your demos? Anyway, it's still grinding there.0:39:39
Anyway, there are 32 possibilities-- we won't wait for it to print out all of them. OK, so the means of abstraction in this0:39:49
language are rules. So we take some bunch of things that are put together with logic and we name them.0:40:00
And you can think of that as naming a particular pattern of logic. Or you can think of that as saying, if you want to deduce some conclusion, you can apply those rules of logic.0:40:10
And those are three elements of this language. Let's break now, and then we'll talk about how it's actually implemented.0:40:22
STUDENT: Does using LISP value primitive or whatever interfere with your means to go both directions on a query?0:40:31
PROFESSOR: OK, that's a-- the question is, does using LISP value interfere with the ability to go both directions on the query?0:40:40
We haven't really talked about the implementation yet, but the answer is, yes, it can. In general, as we'll see at the end--0:40:50
although I really won't go into details-- it's fairly complicated, especially when you use either not or LISP value--0:40:59
or actually, if you use anything besides only and, it becomes very complicated to say when these things will work.0:41:08
They won't work quite in all situations. I'll talk about that at the end of the second half today. But the answer to your question is, yes, by dragging0:41:17
in a lot more power from LISP value, you lose some of the principal power of logic programming. That's a trade-off that you have to make.0:41:28
OK, let's take a break.0:00:00
Lecture 8B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:18
PROFESSOR: All right, well, we've seen how the query language works. Now, let's talk about how it's implemented. You already pretty much can guess what's going on there.0:00:29
At the bottom of it, there's a pattern matcher. And we looked at a pattern matcher when we did the rule-based control language.0:00:38
Just to remind you, here are some sample patterns. This is a pattern that will match any list of three things of which the first is a and the last is c and the middle0:00:48
one can be anything. So in this little pattern-matching syntax, there's only one distinction you make. There's either literal things or variables, and variables0:00:57
begin with question mark. So this matches any list of three things of which the first is a and the last is c.0:01:06
This one matches any list of three things of which the first is the symbol job. The second can be anything. And the third is a list of two things of which the first is0:01:16
the symbol computer and the second can be anything. And this one, this next one matches any list of three0:01:25
things, and the only difference is, here, the third list, the first is the symbol computer, and then there's some rest of the list. So this means two elements and this0:01:36
means arbitrary number. And our language implementation isn't even going to have to worry about implementing this dot because that's automatically done by Lisp's reader.0:01:48
Remember matchers also have some consistency in them. This match is a list of three things of which the first is a. And the second and third can be anything, but they have to be the same thing.0:01:57
They're both called x. And this matches a list of four things of which the first is the same as the fourth and the second is the same as the third. And this last one matches any list that begins with a.0:02:09
The first thing is a, and the rest can be anything. So that's just a review of pattern matcher syntax that you've already seen.0:02:18
And remember, that's implemented by some procedure called match. And match takes a pattern and some data and a dictionary.0:02:43
And match asks the question is there any way to match this pattern against this data object subject to the bindings0:02:55
that are already in this dictionary? So, for instance, if we're going to match the pattern x, y, y, x against the data a, b, b, a subject to a dictionary,0:03:18
that says x equals a. Then the matcher would say, yes, that's consistent. These match, and it's consistent with what's in the0:03:28
dictionary to say that x equals a. And the result of the match is the extended dictionary that says x equals a and y equals b.0:03:39
So a matcher takes in pattern data dictionary, puts out an extended dictionary if it matches, or if it doesn't match, says that it fails. So, for example, if I use the same pattern here, if I say0:03:51
this x, y, y, x match a, b, b, a with the dictionary y equals0:04:02
a, then the matcher would put out fail.0:04:12
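A minimal Python rendition of what that matcher does, assuming variables are strings beginning with a question mark, the failure value is the string "failed", and dotted-tail patterns are omitted:

```python
FAIL = "failed"

def match(pat, dat, d):
    """Match pattern against data, extending dictionary d consistently."""
    if d == FAIL:
        return FAIL
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in d:                        # already bound: must agree
            return d if d[pat] == dat else FAIL
        return {**d, pat: dat}              # extend the dictionary
    if isinstance(pat, list):
        if not isinstance(dat, list) or len(pat) != len(dat):
            return FAIL
        for p, x in zip(pat, dat):
            d = match(p, x, d)
        return d
    return d if pat == dat else FAIL        # literal constants must be equal

# The two blackboard examples:
print(match(["?x", "?y", "?y", "?x"], ["a", "b", "b", "a"], {"?x": "a"}))
# -> {'?x': 'a', '?y': 'b'}, the extended dictionary
print(match(["?x", "?y", "?y", "?x"], ["a", "b", "b", "a"], {"?y": "a"}))
# -> failed, since ?y would have to be both a and b
```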
Well, you've already seen the code for a pattern matcher so I'm not going to go over it, but it's the same thing we've been doing before.0:04:21
You saw that in the system on rule-based control. It's essentially the same matcher. In fact, I think the syntax is a little bit simpler because we're not worrying about arbitrary constants and0:04:30
expressions and things. There's just variables and constants. OK, well, given that, what's a primitive query?0:04:42
Primitive query is going to be a rather complicated thing. It's going to be-- let's think about the query job of x is d dot y.0:05:06
That's a query we might type in. That's going to be implemented in the system. We'll think of it as this little box.0:05:15
Here's the primitive query. What this little box is going to do is take in two streams0:05:32
and put out a stream. So the shape of a primitive query is that it's a thing where two streams come in and one stream goes out.0:05:41
What these streams are going to be is down here is the database.0:05:51
So we imagine all the things in the database sort of sitting there in a stream and this thing sucks on them.0:06:00
So what are some things that might be in the database? Oh, job of Alyssa is something and some0:06:22
other job is something. So imagine all of the facts in the database sitting there in the stream.0:06:32
That's what comes in here. What comes in here is a stream of dictionaries. So one particular dictionary might say y equals programmer.0:06:55
Now, what the query does when it gets in a dictionary from this stream, it finds all possible ways of matching the0:07:06
query against whatever is coming in from the database. It looks at the query as a pattern, matches it against0:07:15
any fact from the database or all possible ways of finding and matching the database with respect to this dictionary0:07:24
that's coming in. So for each fact in the database, it calls the matcher using the pattern, fact, and dictionary.0:07:35
And every time it gets a good match, it puts out the extended dictionary. So, for example, if this one comes in and it finds a match,0:07:44
out will come a dictionary that in this case will have y equals programmer and x equals something.0:07:56
y is programmer, x is something, and d is whatever it found. And that's all. And, of course, it's going to try this for every fact in the0:08:07
database. So it might find lots of them. It might find another one that says y equals programmer and x equals, and d equals.0:08:20
So for one frame coming in, it might put out-- for one dictionary coming in, it might put out a lot of dictionaries, or it might put out none.0:08:30
It might have something that wouldn't match like x equals FOO.0:08:39
This one might not match anything in which case nothing will go into this stream corresponding to this frame. Or what you might do is put in an empty frame, and an empty0:08:53
frame says try matching all ways-- find all possible ways of matching the query against0:09:02
something in the database subject to no previous restrictions. And if you think about what that means, that's just the computation that's done when you type in a query right off.0:09:13
It tries to find all matches. So a primitive query sets up this mechanism. And what the language does, when you type in the query at0:09:23
the top level, it takes this mechanism, feeds in one single empty dictionary, and then for each thing that comes out0:09:33
takes the original query and instantiates the result with all the different dictionaries, producing a new stream of instantiated patterns here.0:09:44
And that's what gets printed on the terminal. That's the basic mechanism going on there.0:09:53
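That mechanism can be sketched in Python as a generator: the primitive query takes a stream of dictionaries and the stream of database facts, and yields one extended dictionary per successful match. The miniature database and matcher below are illustrative stand-ins, not the lecture's code:

```python
FAIL = None

def match(pat, dat, d):
    """Match pattern against data, extending dictionary d; FAIL if impossible."""
    if d is FAIL:
        return FAIL
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in d:
            return d if d[pat] == dat else FAIL
        return {**d, pat: dat}
    if isinstance(pat, list):
        if not isinstance(dat, list) or len(pat) != len(dat):
            return FAIL
        for p, x in zip(pat, dat):
            d = match(p, x, d)
        return d
    return d if pat == dat else FAIL

# A toy slice of the personnel database (names from the lecture).
database = [
    ["job", ["Bitdiddle", "Ben"], ["computer", "wizard"]],
    ["job", ["Hacker", "Alyssa", "P"], ["computer", "programmer"]],
    ["job", ["Tweakit", "Lem", "E"], ["computer", "programmer"]],
    ["salary", ["Bitdiddle", "Ben"], 60000],
]

def query(pattern, dict_stream, facts=database):
    """The primitive-query box: dictionaries in, extended dictionaries out."""
    for d in dict_stream:
        for fact in facts:
            ext = match(pattern, fact, d)
            if ext is not FAIL:
                yield ext

# Typing a query at top level = feeding in one single empty dictionary.
for d in query(["job", "?x", ["computer", "programmer"]], [{}]):
    print(d["?x"])  # prints the two programmers' names
```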
Well, why is that so complicated? You probably can think of a lot simpler ways to arrange this match for a primitive query rather than having all0:10:03
of these streams floating around. And the answer is-- you probably guess already. The answer is this thing extends elegantly to implement0:10:15
the means of combination. So, for instance, suppose I don't only want to do this. I don't just want to ask for everybody's job description.0:10:27
Suppose I want to say AND the job of x is d dot y and the0:10:39
supervisor of x is z.0:10:48
Now, supervisor of x is z is going to be another primitive query that has the same shape to take in a stream of data0:10:57
objects, a stream of initial dictionaries, which are the restrictions to try and use when you match, and it's going to put out a stream of dictionaries.0:11:08
So that's what this primitive query looks like. And how do I implement the AND? Well, it's simple. I just hook them together. I take the output of this one, and I put that to the0:11:17
input of that one. And I take the dictionary here and I fan it out.0:11:26
And then you see how that's going to work, because what's going to happen is a frame will now come in here, which has a binding for x, y, and d.0:11:37
And then when this one gets it, it'll say, oh, gee, subject to these restrictions, which now already have values in the dictionary for y and x and d, it looks in the0:11:52
database and says, gee, can I find any supervisor facts? And if it finds any, out will come dictionaries which have bindings for y and x and d and z now.0:12:12
And then notice that because the frames coming in here have these restrictions, that's the thing that assures that when you do the AND, this x will mean the same thing as that x.0:12:26
Because by the time something comes floating in here, x has a value that you have to match against consistently. And then you remember from the code from the matcher, there0:12:36
was something in the way the matcher did dictionaries that arranges for consistent matches. So there's AND. The important point to notice is the general shape.0:12:48
Look at what happened: the AND of two queries, say, P and Q. Here's P and Q. The AND of two queries, well,0:13:00
it looks like this. Each query takes in a stream from the database, a stream of inputs, and puts out a stream of outputs.0:13:10
And the important point to notice is that if I draw a box around this thing and say this is AND of P and Q, then that0:13:26
box has exactly the same overall shape. It's something that takes in a stream from the database. Here it's going to get fanned out inside, but from the0:13:37
outside you don't see that. It takes an input stream and puts out an output stream. So this is AND. And then similarly, OR would look like this.0:13:46
OR would-- although I didn't show you examples of OR. OR would say can I find all ways of matching P or Q. So I0:13:55
have P and Q. Each will have their shape.0:14:04
And the way OR is implemented is I'll take my database stream. I'll fan it out.0:14:13
I'll put one into P and one into Q. I'll take my initial query stream coming in and fan it out.0:14:26
So I'll look at all the answers I might get from P and all the answers I might get from Q, and I'll put them through some sort of thing that appends them or merges0:14:35
the result into one stream, and that's what will come out. And this whole thing from the outside is OR.0:14:52
And again, you see it has the same overall shape when looked at from the outside.0:15:01
What's NOT? NOT works kind of the same way. If I have some query P, I take the primitive query for P.0:15:14
Here, I'm going to implement NOT P. And NOT's just going to act as a filter. I'll take in the database and my original stream of0:15:27
dictionaries coming in, and what NOT P will do is it will filter these guys.0:15:39
And the way it will filter it, it will say when I get in a dictionary here, I'll find all the matches, and if I find any, I'll throw it away. And if I don't find any matches to something coming in0:15:49
here, I'll just pass that through, so NOT is a pure filter. So AND is-- think of these sort of like electrical0:15:59
resistors or something. AND is series combination and OR is parallel combination. And then NOT is not going to extend any dictionaries at all. It's just going to filter it.0:16:08
It's going to throw away the ones for which it finds a way to match. And LISP-VALUE is sort of the same way. The filter's a little more complicated: it applies a predicate.0:16:19
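The series/parallel/filter picture can be sketched with Python generators standing in for the lecture's streams. Everything here, the names, the frame representation, the tiny database, is an illustrative assumption, not the actual Scheme code: a query is modeled as a function from a stream of frames (dictionaries) to a stream of frames.

```python
# Frames are plain dicts mapping "?var" strings to values.
DB = [("job", "ben", "wizard"),
      ("job", "alyssa", "programmer")]

def match(pattern, fact, frame):
    """Match a flat pattern against a fact, extending the frame.
    Strings starting with '?' are variables. Returns the extended
    frame, or None on failure."""
    if len(pattern) != len(fact):
        return None
    frame = dict(frame)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if p in frame and frame[p] != f:
                return None
            frame[p] = f
        elif p != f:
            return None
    return frame

def primitive(pattern):
    """A primitive query: extend each incoming frame in every way
    the pattern matches the database."""
    def run(frames):
        for frame in frames:
            for fact in DB:
                ext = match(pattern, fact, frame)
                if ext is not None:
                    yield ext
    return run

def q_and(p, q):
    """Series combination: p's output stream is q's input stream."""
    return lambda frames: q(p(frames))

def q_or(p, q):
    """Parallel combination: fan the input out, merge the outputs."""
    def run(frames):
        frames = list(frames)
        yield from p(frames)
        yield from q(frames)
    return run

def q_not(p):
    """Pure filter: a frame survives only if p cannot extend it."""
    def run(frames):
        for frame in frames:
            if not list(p([frame])):
                yield frame
    return run

# (and (job ?x ?j) (not (job ?x wizard))), started with one empty frame:
query = q_and(primitive(("job", "?x", "?j")),
              q_not(primitive(("job", "?x", "wizard"))))
results = list(query([{}]))
```

Note that the boxed-up `query` has exactly the same shape as a primitive query, a function from a frame stream to a frame stream, which is the closure property discussed next.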
The major point to notice here, and it's a major point we've looked at before, is this idea of closure.0:16:28
The things that we build as a means of combination have the same overall structure as the primitive things that we're combining.0:16:39
So the AND of two things when looked at from the outside has the same shape. And what that means is that this box here could be an AND0:16:48
or an OR or a NOT or something because it has the same shape to interface to the larger things. It's the same thing that allowed us to get complexity0:16:57
in the Escher picture language or allows you to immediately build up these complicated structures just out of pairs. It's closure.0:17:06
And that's the thing that allowed me to do what by now you took for granted when I said, gee, there's a query which is AND of job and salary, and I said, oh,0:17:15
there's another one, which is AND of job, a NOT of something. The fact that I can do that is a direct consequence of this closure principle.0:17:25
OK, let's break and then we'll go on. AUDIENCE: Where does the dictionary come from? PROFESSOR: The dictionary comes initially from0:17:35
what you type in. So when you start this up, the first thing it does is set up this whole structure. It puts in one empty dictionary.0:17:45
And if all you have is one primitive query, then what will come out is a bunch of dictionaries with things filled in. The general situation that I have here is when this is in0:17:55
the middle of some nest of combined things. Let's look at the picture over here. This supervisor query gets in some dictionary.0:18:06
Where did this one come from? This dictionary came from the fact that I'm looking at the output of this primitive query.0:18:16
So maybe to be very specific, if I literally typed in just this query at the top level, this AND, what would actually happen is it would build this structure and start up this0:18:26
whole thing with one empty dictionary. And now this one would process, and a whole bunch of dictionaries would come out with x, y's and d's in them.0:18:38
Run it through this one. So now that's the input to this one. This one would now put out some other stuff. And if this itself were buried in some larger thing, like an0:18:50
OR of something, then that would go feed into the next one. So you initially get only one empty dictionary when you0:19:00
start it, but as you're in the middle of processing these compound things, that's where these cascades of dictionaries start getting generated. AUDIENCE: Dictionaries only come about as a result of0:19:11
using the queries? Or do they become-- do they stay someplace in space like the database does?0:19:23
Are these temporary items? PROFESSOR: They're created temporarily in the matcher. Really, they're someplace in storage. Initially, someone creates a thing called the empty0:19:32
dictionary that gets initially fed to this match procedure, and then the match procedure builds some dictionaries, and they get passed on and on. AUDIENCE: OK, so they'll go away after the match?0:19:43
PROFESSOR: They'll go away when no one needs them again, yeah. AUDIENCE: It appears that the AND performs some redundant0:19:54
searches of the database. If the first clause matched, let's say, the third element and not on the first two elements, the second clause is going to look at those first two elements again, discarding0:20:04
them because they don't match. The match is already in the dictionary. Would it make sense to carry the data element from the database along with the dictionary?0:20:17
PROFESSOR: Well, in general, there are other ways to arrange this search, and there's some analysis that you can do. I think there's a problem in the book, which talks about a different way that you can cascade AND to eliminate0:20:27
various kinds of redundancies. This one is meant to be-- was mainly meant to be very simple so you can see how they fit together. But you're quite right. There are redundancies here that you can get rid of.0:20:38
That's another reason why this language is somewhat slow. There are a lot smarter things you can do. We're just trying to show you a very simple, in principle, implementation.0:20:51
AUDIENCE: Did you model this language on Prolog, or did it just come out looking like Prolog?0:21:04
PROFESSOR: Well, Jerry insulted a whole bunch of people yesterday, so I might as well say that the MIT attitude towards Prolog is something that people did in about 1971 and decided that it wasn't really the right thing0:21:15
and stopped. So we modeled this on the sort of natural way that this thing was done in about 1971, except at that point, we didn't do it0:21:26
with streams. After we were using it for about six months, we discovered that it had all these problems, some of which0:21:35
I'll talk about later. And we said, gee, Prolog must have fixed those, and then we found out that it didn't. So this does about the same thing as Prolog. AUDIENCE: Does Prolog use streams?0:21:44
PROFESSOR: No. In how it behaves, it behaves a lot like Prolog. Prolog uses a backtracking strategy.0:21:53
But the other thing that's really good about Prolog that makes it a usable thing is that there's a really very, very well-engineered compiler technology that makes it run0:22:04
fast. So although you saw the merge spitting out these answers very, very slowly, a real Prolog will run very,0:22:13
very fast. Because even though it's sort of doing this, the real work that went into Prolog is a very, very excellent compiler effort.0:22:24
Let's take a break.0:23:16
We've looked at the primitive queries and the ways that streams are used to implement the means of combination: AND and OR and NOT.0:23:26
Now, let's go on to the means of abstraction. Remember, the means of abstraction in this language are rules.0:23:35
So z is a boss in division d if there's some x who has a job in division d and z is the supervisor of x.0:23:48
That's what it means for someone to be a boss. And in effect, if you think about what we're doing with relation to this, there's the query we wrote-- the job of x0:23:58
is in d and the supervisor of x is z-- what we in effect want to do is take this whole mess and draw a box around it and say this whole thing inside the0:24:24
box is boss of z in division d.0:24:33
That's in effect what we want to do. So, for instance, if we've done that, and we want to0:24:45
check whether or not it's true that Ben Bitdiddle is a boss in the computer division, so if I want to say boss of Ben0:25:00
Bitdiddle in the computer division, imagine typing that in as query to the system, in effect what we want to do is0:25:10
set up a dictionary here, which has z to Ben Bitdiddle0:25:28
and d to computer.0:25:37
Where did that dictionary come from? Let's look at the slide for one second. That dictionary came from matching the query that said boss of Ben Bitdiddle and computer onto the conclusion0:25:47
of the rule: boss of z and d. So we match the query to the conclusion of the rule. That gives us a dictionary, and that's the thing that we0:26:00
would now like to put into this whole big thing and process and see if anything comes out the other side. If anything comes out, it'll be true.0:26:11
That's the basic idea. So in general, the way we implement a rule is we match the conclusion of the rule against something we might0:26:21
want to check it's true. That match gives us a dictionary, and with respect to that dictionary, we process the body of the rule.0:26:36
Well, that's really all there is, except for two technical points. The first technical point is that I might have said0:26:46
something else. I might have said who's the boss in the computer division? So I might say boss of who in computer division.0:27:00
And if I did that, what I would really like to do in effect is start up this dictionary with a match that0:27:09
sort of says, well, d is computer and z is whatever who is.0:27:21
And our matcher won't quite do that. That's not quite matching a pattern against data. It's matching two patterns and saying are they consistent or0:27:31
not or what ways make them consistent. In other words, what we need is not quite a pattern matcher, but something a little bit more general called a unifier.0:27:44
And a unifier is a slight generalization of a pattern matcher. What a unifier does is take two patterns and say what's0:27:55
the most general thing you can substitute for the variables in those two patterns to make them satisfy the pattern0:28:04
simultaneously? Let me give you an example. If I have the pattern two-element list, which is x0:28:13
and x, so I have a two-element list where both elements are the same and otherwise I don't care what they are, and I unify that against the pattern that says there's a0:28:23
two-element list, and the first one is a and y and c and the second one is a and b and z, then what the0:28:33
unifier should tell me is, oh yeah, in that dictionary, x has to be (a b c), and y has to be b and z has to be c.0:28:43
Those are the restrictions I'd have to put on the values of x, y, and z to make these two unify, or in other words, to make this match x and make this match x.0:28:55
The unifier should be able to deduce that. But the unifier may-- there are more complicated things. I might have said something a little bit more complicated. I might have said there's a list with two elements, and0:29:07
they're both the same, and they should unify against something of this form. And the unifier should be able to deduce from that.0:29:16
Like that y would have to be b. y would have to be b. Because these two are the same, so y's got to be b. And v here would have to be a.0:29:28
And z and w can be anything, but they have to be the same thing. And x would have to be b, followed by a, followed by0:29:40
whatever w is or whatever z is, which is the same. So you see, the unifier somehow has to deduce things to unify these patterns.0:29:50
So you might think there's some kind of magic deduction going on, but there's not. A unifier is basically a very simple modification of a0:29:59
pattern matcher. And if you look in the book, you'll see something like three or four lines of code added to the pattern matcher you just saw to handle the symmetric case.0:30:08
Remember, the pattern matcher has a place where it says is this variable matching a constant. And if so, it checks in the dictionary. There's only one other clause in the unifier, which says is0:30:18
this variable matching a variable, in which case you go look in the dictionary and see if that's consistent with what's in the dictionary.0:30:27
So all the, quote, deduction that's in this language, if you sort of look at it, sort of sits in the rule applications, which, if you look at that, sits in the0:30:37
unifier, which, if you look at that under a microscope, sits essentially in the pattern matcher. There's no magic at all going on in there.0:30:47
And the, quote, deduction that you see is just the fact that there's this recursion, which is unwinding the matches bit by bit.0:30:56
So it looks like this thing is being very clever, but in fact, it's not being very clever at all. There are cases where a unifier might have to be clever. Let me show you one more.0:31:11
Suppose I want to unify a list of two elements, x and x, with a thing that says it's y followed by (a . y).0:31:24
Now, if you think of what that would have to mean, it would have to mean that x had better be the same as y, but also x had better be the same as a list whose first element is a0:31:35
and whose rest is y. And if you think about what that would have to mean, it would have to mean that y is the infinite list of a's.0:31:47
In some sense, in order to do that unification, I have to solve the fixed-point equation cons of a to y is equal to y.0:32:04
And in general, I wrote a very simple one. Really doing unification might have to solve an arbitrary fixed-point equation: f of y equals y.0:32:15
And basically, you can't do that and make the thing finite all the time. So how does the logic language handle that?0:32:25
The answer is it doesn't. It just punts. And there's a little check in the unifier, which says, oh, is this one of the hard cases which when I go to match0:32:35
things would involve solving a fixed-point equation? And in this case, I will throw up my hands. And if that check were not in there, what would happen?0:32:47
In most cases is that the unifier would just go into an infinite loop. And other logic programming languages work like that.0:32:56
So there's really no magic. The easy case is done in a matcher. The hard case is not done at all. And that's about the state of this technology.0:33:12
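What the lecture describes can be sketched as a small unifier in Python: a pattern matcher plus the symmetric variable-against-variable case, plus the occurs check that punts on fixed-point equations. The representation, strings starting with "?" as variables and lists as compound patterns, is an assumption of this sketch, not the book's code:

```python
def is_var(x):
    return isinstance(x, str) and x.startswith("?")

def walk(x, frame):
    """Follow variable bindings to their current value."""
    while is_var(x) and x in frame:
        x = frame[x]
    return x

def occurs(var, x, frame):
    """Occurs check: does var appear inside x? If so, binding var
    to x would mean solving a fixed-point equation, so we give up."""
    x = walk(x, frame)
    if x == var:
        return True
    if isinstance(x, (list, tuple)):
        return any(occurs(var, part, frame) for part in x)
    return False

def unify(a, b, frame):
    """Return an extended frame making a and b the same, or None."""
    if frame is None:
        return None
    a, b = walk(a, frame), walk(b, frame)
    if a == b:
        return frame
    if is_var(a):
        return None if occurs(a, b, frame) else {**frame, a: b}
    if is_var(b):
        return unify(b, a, frame)
    if (isinstance(a, (list, tuple)) and isinstance(b, (list, tuple))
            and len(a) == len(b)):
        for x, y in zip(a, b):
            frame = unify(x, y, frame)
        return frame
    return None

# The lecture's example: unify (?x ?x) with ((a ?y c) (a b ?z)).
f = unify(["?x", "?x"], [["a", "?y", "c"], ["a", "b", "?z"]], {})
# Walking the bindings gives x = (a b c), y = b, z = c.

# The hard case: unify (?x ?x) with (?y (a . ?y)), here written as
# nested lists. The occurs check detects the fixed-point equation
# and the unifier punts, returning None.
hard = unify(["?x", "?x"], ["?y", ["a", "?y"]], {})
```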
Let me just say again formally how rules work now that I talked about unifiers. So the official definition is that to apply a rule, we--0:33:25
well, let's start using some words we've used before. Let's talk about sticking dictionaries into these big boxes of query things as evaluating these large queries0:33:40
relative to an environment or a frame. So when you think of that dictionary, what's the dictionary after all? It's a bunch of meanings for symbols. That's what we've been calling frames or environments.0:33:51
What does it mean to do some processing relevant to an environment? That's what we've been calling evaluation. So we can say the way that you apply a rule is to evaluate0:34:03
the rule body relative to an environment that's formed by unifying the rule conclusion with the given query.0:34:13
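The official definition just stated can be caricatured in a few lines of Python. The names here, `apply_rule`, `unify_flat`, and the tuple encoding of the boss rule, are hypothetical, bodies are simplified to an AND of flat clauses checked against facts, and variable renaming, which the lecture takes up later, is omitted:

```python
FACTS = [("job", "alyssa", "computer"),
         ("supervisor", "alyssa", "ben")]

# (rule (boss ?z ?d) (and (job ?x ?d) (supervisor ?x ?z)))
RULE = (("boss", "?z", "?d"),
        [("job", "?x", "?d"), ("supervisor", "?x", "?z")])

def unify_flat(a, b, frame):
    """Unify two flat tuples of strings; '?'-strings are variables.
    Variables may bind to constants or to other variables."""
    if len(a) != len(b):
        return None
    frame = dict(frame)
    for x, y in zip(a, b):
        x = frame.get(x, x)            # one step of dereferencing
        y = frame.get(y, y)
        if x == y:
            continue
        if x.startswith("?"):
            frame[x] = y
        elif y.startswith("?"):
            frame[y] = x
        else:
            return None
    return frame

def apply_rule(rule, query, frame):
    """Unify the query with the rule's conclusion; if that succeeds,
    evaluate the rule body in the resulting environment."""
    conclusion, body = rule
    env = unify_flat(query, conclusion, frame)
    if env is None:
        return []
    frames = [env]
    for clause in body:                # AND the body clauses in series
        frames = [ext for f in frames for fact in FACTS
                  for ext in [unify_flat(clause, fact, f)]
                  if ext is not None]
    return frames

# Checking whether Ben is a boss in the computer division:
hits = apply_rule(RULE, ("boss", "ben", "computer"), {})
# nonempty output means "true"
```

Asking (boss ?who computer) instead produces a frame in which ?who is linked to ?z and ?z is bound to ben, which is exactly the variable-to-variable linkage discussed in the Q&A below.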
And the thing I want you to notice is the complete formal similarity to the metacircular evaluator or the substitution model. To apply a procedure, we evaluate the procedure body0:34:27
relative to an environment that's formed by binding the procedure parameters to the arguments. There's a complete formal similarity here between the0:34:36
rules, rule application, and procedure application even though these things are very, very different. And again, you have the EVAL APPLY loop.0:34:47
EVAL and APPLY. So in general, I might be processing some combined0:34:57
expression that will turn into a rule application, which will generate some dictionaries or frames or environments-- whatever you want to call them-- from match, which will then be the input to some big compound thing like this.0:35:08
This has pieces of it and may have other rule applications. And you have essentially the same cycle even though there's nothing here at all that looks like procedures.0:35:19
It really has to do with the fact you've built a language whose means of combination and abstraction unwind in certain ways.0:35:28
And then in general, what happens at the very top level, you might have rules in your database also, so things in0:35:37
this database might be rules. There are ways to check that things are true. So it might come in here and have to do a rule check.0:35:46
And then there's some control structure which says, well, you look at some rules, and you look at some data elements, and you look at some rules and data elements, and these fan out and out and out. So it becomes essentially impossible to say what order0:35:56
it's looking at these things in, whether it's breadth first or depth first or anything. And it's even more impossible because the actual order is somehow buried in the delays of the streams. So what's very0:36:08
hard to tell from this is the order in which it's scanned. But what's true, because you're looking at the stream view, is that all of them eventually get looked at.0:36:24
Let me just mention one tiny technical problem.0:36:37
Suppose I tried saying boss of y in computer, then a funny thing would happen. As I stuck a dictionary with y in here, I might get--0:36:53
this y is not the same as that y, which was the other piece of somebody's job description. So if I really only did literally what I said, we'd0:37:04
get some variable conflict problems. So I lied to you a little bit. Notice that problem is exactly a problem we've run into before.0:37:14
It is precisely the need for local variables in a language. When I have the sum of squares, that x had better not be that x.0:37:24
That's exactly the same as this y had better not be that y. And we know how to solve that.0:37:33
That was this whole environment model, and we built chains of frames and all sorts of things like that. There's a much more brutal way to solve it. In the query language, we didn't even do that. We did something completely brutal.0:37:43
We said every time you apply a rule, rename consistently all the variables in the rule to some new unique names that won't conflict with anything.0:37:55
That's conceptually simpler, but really brutal and not particularly efficient. But notice, we could have gotten rid of all of our environment structures if we defined for procedures in Lisp0:38:08
the same thing. If every time we applied a procedure and did the substitution model we renamed all the variables in the procedure, then we never would have had to worry about local variables because they would never arise.0:38:19
OK, well, that would be inefficient, and it's inefficient here in the query language, too, but we did it to keep it simple. Let's break for questions.0:38:30
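The brutal renaming strategy can be sketched directly. The helper below is illustrative, assuming the same "?"-string representation for variables: every application of a rule consistently stamps its variables with a fresh serial number.

```python
import itertools

_counter = itertools.count()

def rename_variables(pattern):
    """Consistently replace every '?'-variable in a rule with a
    fresh, unique name, so two applications of the same rule can
    never have their variables conflict."""
    n = next(_counter)
    def stamp(x):
        if isinstance(x, str) and x.startswith("?"):
            return f"{x}-{n}"
        if isinstance(x, (list, tuple)):
            return [stamp(part) for part in x]
        return x
    return stamp(pattern)

# Two applications of the same rule get disjoint variable names:
a = rename_variables(["?y", ["supervisor", "?x", "?y"]])
b = rename_variables(["?y", ["supervisor", "?x", "?y"]])
```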
AUDIENCE: When you started this section, you emphasized how powerful our APPLY EVAL model was that we could use it0:38:40
for any language. And then you say we're going to have this language which is so different. It turns out that this language, as you just pointed out, is very much the same. I'm wondering if you're arguing that all languages end0:38:49
up coming down to this you can apply a rule or apply a procedure or some kind of apply? PROFESSOR: I would say that pretty much any language where0:38:59
you really are building up these means of combination and giving them simpler names and you're saying anything of the sort, like here's a general kind of expression, like how0:39:10
to square something, almost anything that you would call a procedure. If that's got to have parts, you have to unwind those parts. You have to have some kind of organization which says when I0:39:20
look at the abstract variables or tags or whatever you want to call them that might stand for particular things, you have to keep track of that, and that's going to be0:39:29
something like an environment. And then if you say this part can have parts which I have to unwind, you've got to have something like this cycle.0:39:39
And lots and lots of languages have that character when they sort of get put together in this way. This language again really is different because there's nothing like procedures on the outside.0:39:50
When you go below the surface and you see the implementation, of course, it starts looking the same. But from the outside, it's a very different world view. You're not computing functions of inputs.0:40:03
AUDIENCE: You mentioned earlier that when you build all of these rules in pattern matcher and with the delayed action of streams, you really have no way to know in what0:40:13
order things are evaluated. PROFESSOR: Right. AUDIENCE: And that would indicate then that you should only express declarative knowledge that's true for all time, with no time sequence built into it.0:40:23
Otherwise, these things get all-- PROFESSOR: Yes. Yes. The question is this really is set up for doing declarative0:40:32
knowledge, and as I presented it-- and I'll show you some of the ugly warts under this after the break. As I presented it, it's just doing logic.0:40:43
And in principle, if it were logic, it wouldn't matter what order it's getting done. And it's quite true when you start doing things where you0:40:52
have side effects like adding things to the database and taking things out, and we'll see some others, you use that kind of control.0:41:01
So, for example, contrasting with Prolog. Say Prolog has various features where you really exploit the order of evaluation. And people write Prolog programs that way.0:41:11
That turns out to be very complicated in Prolog, although if you're an expert Prolog programmer, you can do it. However, here I don't think you can do it at all.0:41:20
It's very complicated because you really are giving up control over any prearranged order of trying things. AUDIENCE: Now, that would indicate then that you have a0:41:29
functional mapping. And when you started out this lecture, you said that we express the declarative knowledge which is a relation, and we don't talk about the inputs and the outputs.0:41:41
PROFESSOR: Well, there's a pun on functional, right? There's function in the sense of no side effects and not depending on what order is going on. And then there's functional in the sense of mathematical0:41:50
function, which means input and output. And it's just that pun that you're making, I think. AUDIENCE: I'm a little unclear on what you're doing with these two statements, the two boss statements.0:42:01
Is the first one building up the database and the second one a query or-- PROFESSOR: OK, I'm sorry.0:42:12
What I meant here, if I type something like this in as a query-- I should have given an example way at the very beginning. If I type in job, Ben Bitdiddle, computer wizard,0:42:25
what the processing will do is if it finds a match, it'll find a match to that exact thing, and it'll type out a job, Ben Bitdiddle, computer wizard.0:42:34
If it doesn't find a match, it won't find anything. So what I should have said is the way you use the query language to check whether something is true, remember,0:42:43
that's one of the things you want to do in logic programming, is you type in your query and either that comes out or it doesn't. So what I was trying to illustrate here, I wanted to0:42:52
start with a very simple example before talking about unifiers. So what I should have said, if I just wanted to check whether this is true, I could type that in and see if anything0:43:02
came out. AUDIENCE: And then the second one-- PROFESSOR: The second one would be a real query. AUDIENCE: A real query, yeah. PROFESSOR: What would come out, see, it would go in here0:43:12
say with FOO, and in would go a frame that says z is bound to who and d is bound to computer. And this will pass through, and then by the time it got0:43:21
out of here, who would pick up a binding. AUDIENCE: On the unifying thing there, I still am not0:43:31
sure what happens with who and z. If the unifying-- the rule here says--0:43:42
OK, so you say that you can't make question mark z equal to question mark who. PROFESSOR: Right. That's what the matcher can't do. But what this will mean to a unifier is that there's an0:43:52
environment with three variables. d here is computer. z is whatever who is.0:44:01
So if later on in the matcher routine it said, for example, who has to be 3, then when I looked up in the dictionary,0:44:14
it will say, oh, z is 3 because it's the same as who. And that's in some sense the only thing you need to do to extend the matcher to a unifier. AUDIENCE: OK, because it looked like when you were0:44:23
telling how to unify it, it looked like you would put the things together in such a way that you'd actually solve and have a value for both of them. And what it looks like now is that you actually pass a0:44:32
dictionary with two variables and the variables are linked. PROFESSOR: Right. It only looks like you're solving for both of them because you're sort of looking at the whole solution at once. If you sort of watch the thing getting built up recursively,0:44:42
it's merely this. AUDIENCE: OK, so you do pass off that dictionary with two variables? PROFESSOR: That's right. AUDIENCE: And link? PROFESSOR: Right. It just looks like an ordinary dictionary.0:44:54
AUDIENCE: When you're talking about the unifier, is it that there are some cases or some patterns that you are not able to unify?0:45:04
PROFESSOR: Right. AUDIENCE: Can you just by building the rules or writing the forms know in advance if you are going to be able to0:45:15
get the unification or not? Can you add some properties either to the rules themselves or to the formula that you're writing so that you avoid the0:45:26
problem of not finding unification? PROFESSOR: I mean, you can agree, I think, to write in a fairly restricted way where you won't run into it.0:45:35
See, because what you're getting-- see, the place where you get into problems is when you-- well, again, you're trying to match things like that against0:45:45
things where these have structure, where a, y, b, y something.0:45:58
So this is the kind of place where you're going to get into trouble. AUDIENCE: So you can do that syntactically? PROFESSOR: So you can kind of watch your rules in the kinds0:46:09
of things that you're writing. AUDIENCE: So that's the problem that the builder of the database has to be concerned with? PROFESSOR: That's a problem.0:46:19
It's a problem either-- not quite the builder of the database, the person who is expressing the rules, or the builder of the database. What the unifier actually does is you can check at the next0:46:29
level down when you actually get to the unifier and you'll see in the code where it looks up in the dictionary. If it sort of says what does y have to be? Oh, does y have to be something that contains a y as0:46:40
its subexpression? At that point, the unifier can say, oh my God, I'm trying to solve a fixed-point equation. I'll give up here.0:46:49
AUDIENCE: You make the distinction between the rules in the database. Are the rules added to the database? PROFESSOR: Yes. Yes, I should have said that.0:46:58
One way to think about rules is that they're just other things in the database. So if you want to check the things that have to be checked in the database, they're kind of virtual facts that are in0:47:08
the database. AUDIENCE: But in that explanation, you made the differentiation between database and the rules itself.0:47:18
PROFESSOR: Yeah, I probably should not have done that. The only reason to do that is in terms of the implementation. When you look at the implementation, there's a part which says check either primitive assertions in the0:47:28
database or check rules. And then the real reason why you can't tell what order things are going to come out in and is that the rules0:47:38
database and the data database sort of get merged in a kind of delayed evaluation way. And so that's what makes the order very complicated.0:47:55
OK, let's break.0:48:33
We've just seen how the logic language works and how rules work. Now, let's turn to a more profound question. What do these things mean?0:48:43
That brings us to the subtlest, most devious part of this whole query language business, and that is that it's not quite what it seems to be.0:48:53
AND and OR and NOT and the logical implication of rules are not really the AND and OR and NOT and logical0:49:05
implication of logic. Let me give you an example of that. Certainly, if we have two things in logic, it ought to be the case that AND of P and Q is the same as AND of Q and0:49:22
P and that OR of P and Q is the same as OR of Q and P. But let's look here. Here's an example.0:49:32
Let's talk about somebody outranking somebody else in our little database organization. We'll say s is outranked by b if either the supervisor of0:49:47
s is b, or there's some middle manager m: the supervisor of s is m, and m is outranked by b.0:49:59
So there's one way to define rule outranked by. Or we can write exactly the same thing, except at the bottom here, we reversed the order of these two clauses.0:50:11
And certainly if this were logic, those ought to mean the same thing. However, in our particular implementation, if you say0:50:20
something like who's outranked by Ben Bitdiddle, what you'll find is that this rule will work perfectly well and generate answers, whereas this rule will go0:50:31
into an infinite loop. And the reason for that is that this will come in and say, oh, who's outranked by Ben Bitdiddle?0:50:41
Find an s which is outranked by b, where b is Ben Bitdiddle, which is going to have as a subproblem:0:50:50
Oh gee, find an m such as m is outranked by Ben Bitdiddle with no restrictions on m. So this will say in order to solve this problem, I solve0:51:01
exactly the same problem. And then after I've solved that, I'll check for a supervisory relationship. Whereas this one won't get into that, because before it0:51:10
tries to find this outranked by, it'll already have had a restriction on m here. So these two things which ought to mean the same, in0:51:21
fact, one goes into an infinite loop. One does not. That's a very extreme case of a general thing that you'll0:51:30
find in logic programming that if you start changing the order of the things in the ANDs or ORs, you'll find0:51:39
tremendous differences in efficiency. And we just saw an infinitely big difference in efficiency and an infinite loop.0:51:49
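The two clause orders can be caricatured loosely in Python, under the assumption of a tiny supervisor table. In the good order, the recursive call is only ever made with the middle manager already bound; in the reversed order, the first subgoal is the same unrestricted problem we are already solving. The depth guard stands in for the infinite loop a real depth-first evaluator would enter; all names here are illustrative.

```python
SUPERVISOR = {"louis": "alyssa", "alyssa": "ben", "lem": "ben"}

def outranked_by(s, b):
    """Good clause order: check (supervisor ?s ?b) first, so the
    recursive call happens with the middle manager m already bound."""
    m = SUPERVISOR.get(s)
    return m == b or (m is not None and outranked_by(m, b))

def outranked_by_reversed(s, b, depth=0):
    """Reversed clause order: the first subgoal is (outranked-by
    ?m ?b) with ?m completely unrestricted, which is exactly the
    problem we are already trying to solve, so the recursion never
    bottoms out. The depth guard stands in for the infinite loop."""
    if depth > 50:
        raise RecursionError("re-solving the same subproblem forever")
    candidates = [m for m in SUPERVISOR
                  if outranked_by_reversed(m, b, depth + 1)]
    return any(SUPERVISOR.get(s) == m for m in candidates)
```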
And there are similar things having to do with the order in which you enter rules. The order in which it happens to look at rules in the database may vastly change the efficiency with which it gets0:51:59
out answers or, in fact, send it into an infinite loop for some orderings. And this whole thing has to do with the fact that you're0:52:08
checking these rules in some order. And some rules may lead to really long paths of implication. Others might not. And you don't know a priori which ones are good and which0:52:18
ones are bad. And there's a whole bunch of research having to do with that, mostly having to do with thinking about making parallel implementations of logic programming languages. And in some sense, what you'd like to do is check all rules0:52:29
in parallel and whichever ones get answers, you bubble them up. And if some go down infinite deductive chains, well, you just-- you know, memory is cheap and processors are cheap, and you0:52:38
just let them buzz for as long as you want. There's a deeper problem, though, in comparing this0:52:47
logic language to real logic. The example I just showed you, it went into an infinite loop maybe, but at least it didn't give the wrong answer.0:52:58
There's an actual deeper problem when we start comparing, seriously comparing this logic language with real0:53:07
classical logic. So let's sort of review real classical logic. All humans are mortal.0:53:22
That's pretty classical logic. Then maybe we'll continue in the very best classical tradition. We'll say all--0:53:31
let's make it really classical. All Greeks are human, which has the syllogism that0:53:41
Socrates is a Greek. And then what do you write here? I think three dots, classical logic.0:53:51
Therefore, then the syllogism, Socrates is mortal.0:54:01
So there's some real honest classical logic. Let's compare that with our classical logic database.0:54:12
So here's a classical logic database. Socrates is a Greek. Plato is a Greek. Zeus is a Greek, and Zeus is a god.0:54:24
And all humans are mortal. To show that something is mortal, it's enough to show that it's human.0:54:34
All humans are fallible. And all Greeks are humans is not quite right. This says that all Greeks who are not gods are human.0:54:45
So to show something's human, it's enough to show it's a Greek and not a god. And the address of any Greek god is Mount Olympus.0:54:54
So there's a little classical logic database. And indeed, that would work fairly well. If we type that in and say is Socrates mortal or is Socrates0:55:05
fallible? It'll say yes. Is Plato mortal and fallible? It'll say yes. If we say is Zeus mortal? It won't find anything.0:55:14
And it'll work perfectly well. However, suppose we want to extend this. Let's define what it means for someone to be a perfect being.0:55:25
Let's say rule: a perfect being.0:55:34
And I think this is right. If you're up on your medieval scholastic philosophy, I believe that perfect beings are ones who were neither mortal nor fallible.0:55:44
AND NOT mortal x, NOT fallible x.0:55:59
So we'll define this system to teach it what a perfect being is. And now what we're going to do is ask for the address of0:56:09
all the perfect beings. AND the address of x is y and x is perfect.0:56:23
And so what we're generating here is the world's most exclusive mailing list. To get the address of all the perfect0:56:32
things, we might have typed this in. Or we might type in this. We'll say AND perfect of x and the address of x is y.0:56:52
Well, suppose we type all that in and we try this query. This query is going to give us an answer. This query will say, yeah, Mount Olympus.0:57:04
This query, in fact, is going to give us nothing. It will say no addresses of perfect beings. Now, why is that? Why is there a difference?0:57:14
This is not an infinite loop question. This is a different answer question. The reason is that if you remember the implementation of NOT, NOT acted as a filter.0:57:25
NOT said I'm going to take some possible dictionaries, some possible frames, some possible answers, and filter out the ones that happened to satisfy some condition, and0:57:35
that's how I implement NOT. If you think about what's going on here, I'll build this query box where the output of an address piece gets fed into0:57:46
a perfect piece. What will happen is the address piece will set up some things of everyone whose address I know.0:57:55
Those will get filtered by the NOTs inside perfect here. So it will throw out the ones which happened to be either mortal or fallible.0:58:04
In the other order, what happens is I set this up, starting with an empty frame. The perfect in here doesn't find anything for the NOTs to filter, so nothing comes out here at all.0:58:18
And there's sort of nothing there that gets fed into the address thing. So here, I don't get an answer. And again, the reason for that is NOT isn't generating anything.0:58:27
NOT's only throwing out things. And if I never started up with anything, there's nothing for it to throw out. So out of this thing, I get the wrong answer.0:58:37
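The order-dependence described here can be sketched in a few lines of Python. The actual query system in the course is written in Scheme; this toy model, with made-up names like `q_address` and `q_perfect`, only illustrates why conjunct order changes the answer when NOT is a pure filter.

```python
# A toy model of the lecture's query system: each "query" maps a stream
# of frames (dicts of variable bindings) to a stream of extended frames.
# NOT is a filter: it can only discard frames, never create them.
greeks  = {"Socrates", "Plato", "Zeus"}
gods    = {"Zeus"}
address = {"Zeus": "Mount Olympus"}

def human(x):    return x in greeks and x not in gods
def mortal(x):   return human(x)   # the only rule for mortal
def fallible(x): return human(x)   # the only rule for fallible

def q_address(frames):
    # generator: extend each frame with every known (x, y) address pair
    for f in frames:
        for who, place in address.items():
            yield {**f, "x": who, "y": place}

def q_perfect(frames):
    # pure filter: keep frames whose x is neither mortal nor fallible
    for f in frames:
        if "x" in f and not mortal(f["x"]) and not fallible(f["x"]):
            yield f

# (address x y) AND (perfect x): address generates, the NOTs filter.
print(list(q_perfect(q_address([{}]))))  # Zeus / Mount Olympus survives

# (perfect x) AND (address x y): perfect sees only the empty frame,
# its NOTs have nothing to filter, so nothing ever comes out.
print(list(q_address(q_perfect([{}]))))  # []
```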
How can you fix that? Well, there are ways to fix that. So you might say, well, that's sort of stupid. Why are you just doing all your NOT stuff at the beginning? The right way to implement NOT is to realize that when you0:58:48
have conditions like NOT, you should generate all your answers first, and then pass each of these dictionaries along until, at the very end, you do the filtering.0:58:58
And there are implementations of logic languages that work like that that solve this particular problem. However, there's a more profound problem, which is0:59:10
which one of these is the right answer? Is it Mount Olympus or is it nothing? So you might say it's Mount Olympus, because after all,0:59:19
Zeus is in that database, and Zeus was neither mortal nor fallible.0:59:29
So you might say Zeus ought to satisfy NOT mortal Zeus or NOT0:59:43
fallible Zeus. But let's actually look at that database. Let's look at it. There's no way-- how does it know that Zeus is not fallible?0:59:54
There's nothing in there about that. What's in there is that humans are fallible. How does it know that Zeus is not mortal?1:00:04
There's nothing in there about that. It just said I don't have any rule, which-- the only way I can deduce something's mortal is if it's1:00:13
human, and that's all it really knows about mortal. And in fact, if you remember your classical mythology, you know that the Greek gods were not mortal but fallible.1:00:25
So the answer is not in the rules there. See, why does it deduce that?1:00:34
See, Socrates would certainly not have made this error of logic. What NOT means in this language is not NOT.1:00:43
It's not the NOT of logic. What NOT means in this language is "not deducible from things in the database," as opposed to "not true."1:00:55
That's a very big difference. Subtle, but big. So, in fact, this is perfectly happy to say NOT of anything that it doesn't know about.1:01:04
So if you ask it is it not true that Zeus likes chocolate ice cream? It will say sure, it's not true. Or anything else it doesn't know about. NOT means not deducible from the things you've told me.1:01:18
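A minimal sketch of that reading of NOT, in Python. The `deducible` helper here is hypothetical; a real system does unification against rules, not set membership.

```python
# Negation as failure: NOT p succeeds exactly when p cannot be deduced
# from the database -- which is not the same as p being false.
facts = {("human", "Socrates")}

def deducible(p):
    # our entire "inference engine": is the fact literally in the database?
    return p in facts

def NOT(p):
    return not deducible(p)  # "not deducible," rather than "not true"

print(NOT(("likes", "Zeus", "chocolate-ice-cream")))  # True: it simply doesn't know
print(NOT(("human", "Socrates")))                     # False: this one is deducible
```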
In a world where you're identifying not deducible with, in fact, not true, this is called the closed world assumption.1:01:36
The closed world assumption. Anything that I cannot deduce from what I know is not true, right?1:01:46
If I don't know anything about x, then x isn't true. That's very dangerous. From a logical point of view, first of all, it doesn't really make sense. Because if I don't know anything about x, I'm willing1:01:58
to say not x. But am I willing to say not not x? Well, sure, I don't know anything about that either maybe. So not not x is not necessarily the same as x and1:02:09
so on and so on and so on, so there's some sort of funny bias in there. So that's sort of funny. The second thing, if you start building up real reasoning1:02:22
programs based on this, think how dangerous that is. You're saying I know I'm in a position to deduce everything1:02:33
true that's relevant to this problem. I'm reasoning, and built into my reasoning mechanism is the assumption that anything that I don't know can't possibly be1:02:45
relevant to this problem, right? There are a lot of big organizations that work like that, right?1:02:54
Most corporate marketing divisions work like that. You know the consequences of that. So it's very dangerous to start really typing in these1:03:04
big logical implication systems and going on what they say, because they have this really limiting assumption built in. So you have to be very, very careful about that.1:03:14
And that's a deep problem. That's not a problem about we can make a little bit cleverer implementation and do the filters and organize the infinite loops to make them go away.1:03:23
It's a different kind of problem. It's a different semantics. So I think to wrap this up, it's fair to say that logic programming I think is a terrifically exciting idea,1:03:34
the idea that you can bridge this gap from the imperative to the declarative, that you can start talking about relations and really get tremendous power by going1:03:46
above the abstraction of what's my input and what's my output. But as a link to logic, the problem is it's a goal that I1:03:55
think has yet to be realized. And probably one of the very most interesting research questions going on now in languages is how do you1:04:06
somehow make a real logic language? And secondly, how do you bridge the gap from this world of logic and relations to the worlds of more traditional1:04:16
languages and somehow combine the power of both. OK, let's break. AUDIENCE: Couldn't you solve that last problem by having1:04:25
the extra rules that imply it? The problem here is you have the definition of something, but you don't have the definition of its opposite. If you include in the database something that says something1:04:35
implies mortal x, something else implies not mortal x, haven't you basically solved the problem? PROFESSOR: But the issue is do you put a finite1:04:45
number of those in? AUDIENCE: If things are specified always in pairs--1:04:54
PROFESSOR: But the problem is then what do you do about deduction? You can't specify NOTs.1:05:03
But the problem is, in a big system, it turns out that might not be a finite number of things.1:05:12
There are also sort of two issues. Partly it might not be finite. Partly it might be that's not what you want.1:05:21
So a good example would be suppose I want to do connectivity. I want a reason about connectivity. And I'm going to tell you there's four things: a and b1:05:32
and c and d. And I'll tell you a is connected to b and c's connected to d.1:05:43
And now I'll tell you is a connected to d? That's the question. That's an example where I would like something like the closed world assumption.1:05:54
That's a tiny toy, but a lot of times, I want to be able to say something like anything that I haven't told you, assume is not true.1:06:04
So it's not as simple as you only want to put in explicit NOTs all over the place. It's that sometimes it really isn't clear what you even want.1:06:14
That having to specify both everything and not everything is too precise, and then you get down into problems there. But there are a lot of approaches that explicitly put1:06:24
in NOTs and reason based on that. So it's a very good idea. It's just that then it starts becoming a little cumbersome in the very large problems where you'd like to use it.1:06:43
AUDIENCE: I'm not sure how directly related to the argument this is, but one of your points was that one of the dangers of the closed world assumption is you never really know all the things that are there.1:06:53
You never really know all the parts to it. Isn't that a major problem with any programming? I always write programs where I assume that I've got all the cases, and so I check for them all or whatever, and somewhere1:07:04
down the road, I find out that I didn't check for one of them. PROFESSOR: Well, sure, it's true. But the problem here is it's that assumption which is the1:07:14
thing that you're making if you believe you're identifying this with logic. So you're quite right. It's a situation you're never in. The problem is if you're starting to believe that what1:07:24
this is doing is logic and you look at the rules you write down and say what can I deduce from them, you have to be very careful to remember that NOT means something else.1:07:33
And it means something else based on an assumption which is probably not true. AUDIENCE: Do I understand you correctly that you cannot fix this problem without killing off all possibilities of1:07:44
inference through altering NOT? PROFESSOR: No, that's not quite right. There are other-- there are ways to do logic with real NOTs.1:07:56
There are actually ways to do that. But they're very inefficient as far as anybody knows. And they're much more--1:08:05
the, quote, inference in here is built into this unifier and this pattern matching unification algorithm. There are ways to automate real logical reasoning.1:08:16
But it's not based on that, and logic programming languages don't tend to do that because it's very inefficient as far as anybody knows.1:08:29
All right, thank you.0:00:00
Lecture 9A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING - "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:17
PROFESSOR: Well, up 'til now, I suppose, we've been learning about a lot of techniques for organizing big programs,0:00:26
symbolic manipulation a bit, some of the technology that you use for establishing languages, one in terms of0:00:36
another, which is used for organizing very large programs. In fact, the nicest programs I know look more like a pile of languages than like a decomposition of a problem0:00:47
into parts. Well, I suppose at this point, there are still, however, a few mysteries about how this sort of stuff works.0:00:56
And so what we'd like to do now is diverge from the plan of telling you how to organize big programs, and rather tell0:01:06
you something about the mechanisms by which these things can be made to work. The main reason for this is demystification, if you will,0:01:18
that we have a lot of mysteries left, like exactly how it is the case that a program is controlled, how a0:01:27
computer knows what the next thing to do is, or something like that. And what I'd like to do now is make that clear to you, that0:01:36
even if you've never played with a physical computer before, the mechanism is really very simple, and that you can understand it completely with no trouble.0:01:47
So I'd like to start by imagining that we-- well, the way we're going to do this, by the way, is we're going to take some very simple Lisp programs, very simple0:01:57
Lisp programs, and transform them into hardware. I'm not going to worry about some intermediate step of going through some existing computer machine language and0:02:07
then showing you how that computer works, because that's not as illuminating. So what I'm really going to show you is how a piece of0:02:16
machinery can be built to do a job that you have written down as a program. That program is, in fact, a description of a machine.0:02:25
We're going to start with a very simple program, proceed to show you some simple mechanisms, proceed to a few more complicated programs, and then later show you a not very0:02:36
complicated program, how the evaluator transforms into a piece of hardware. And of course at that point, you have made the universal transition and can execute any program imaginable with a0:02:47
piece of well-defined hardware. Well, let's start up now, give you a real concrete feeling for this sort of thing. Let's start with a very simple program.0:02:59
Here's Euclid's algorithm. It's actually a little bit more modern than Euclid's algorithm. Euclid's algorithm for computing the greatest common0:03:09
divisor of two numbers was invented 350 BC, I think. It's the oldest known algorithm.0:03:19
But here we're going to talk about GCD of A and B, the Greatest Common Divisor or two numbers, A and B. And the algorithm is extremely simple.0:03:29
If B is 0, then the result is going to be A. Otherwise, the0:03:38
result is the GCD of B and the remainder when A is divided by0:03:52
B. So what we have here is a very simple iterative process.0:04:02
This is a simple recursive procedure, a recursively defined procedure, which yields an iterative process. And the way it works is that at every step, it determines0:04:13
whether B is zero. And if B is 0, we've got the answer in A. Otherwise, we make another step where A is the old B, and B is the0:04:25
remainder of the old A divided by the old B. Very simple. Now this, I've already told you some of the mechanism by just saying it that way.0:04:34
I set it in time. I said there are certain steps, and that, in fact, one of the things you can see here is that one of the reasons why this is iterative is nothing is needed of the last step to0:04:46
get the answer. All of the information that's needed to run this algorithm is in A and B. It has two well-defined state variables.0:05:00
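In a modern language, the iterative character shows up directly: all the state lives in the two variables a and b, just like the two registers of the machine about to be built. A Python rendering of the algorithm (the lecture itself uses Lisp):

```python
def gcd(a, b):
    """Euclid's algorithm as an iterative process."""
    while b != 0:        # the "is b = 0?" test in the controller
        a, b = b, a % b  # t <- remainder(a, b); a <- b; b <- t
    return a

print(gcd(30, 42))  # 6
```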
So I'm going to define a machine for you that can compute GCDs. Now let's see. Every computer that's ever been made that's a0:05:10
single-process computer, as opposed to a multiprocessor of some sort, is made according to the same plan. The plan is the computer has two parts, a part called the0:05:21
datapaths, and a part called the controller. The datapaths correspond to a calculator that you might have. It contains certain registers that remember0:05:31
things, and you've all used calculators. It has some buttons on it and some lights. And so by pushing the various buttons, you can cause operations to happen inside there among the registers, and0:05:42
some of the results to be displayed. That's completely mechanical. You could imagine that box has no intelligence in it. Now it might be very impressive that it can produce0:05:52
the sine of a number, but that at least is apparently possibly mechanical. At least, I could open that up in the same way I'm about to open GCD.0:06:02
So this may have a whole computer inside of it, but that's not interesting. Addition is certainly simple. That can be done without any further mechanism. Now also, if we were to look at the other half, the0:06:15
controller, that's a part that's dumb, too. It pushes the buttons. It pushes them according to the sequence, which is written down on a piece of paper, and observes the lights.0:06:26
And every so often, it comes to a place in a sequence that says, if light A is on, do this sequence. Otherwise, do that sequence. And thereby, there's no complexity there either.0:06:37
Well, let's just draw that and see what we feel about that. So for computing GCDs, what I want you to think about is0:06:48
that there are these registers. A register is a place where I store a number, in this case. And this one's called a. And then there's another one for storing b.0:07:03
Now we have to see what things we can do with these registers, and they're not entirely obvious what you can do with them. Well, we have to see what things we need to do with them. We're looking at the problem we're trying to solve.0:07:14
One of the important things for designing a computer, which I think most designers don't do, is you study the problem you want to solve and then use what you learn from0:07:23
studying the problem you want to solve to put in the mechanisms needed to solve it in the computer you're building, no more no less.0:07:32
Now it may be that the problem you're trying to solve is everybody's problem, in which case you have to build in a universal interpreter of some language. But you shouldn't put any more in than required to build the0:07:42
universal interpreter of some language. We'll worry about that in a second. OK, going back to here, let's see. What do we have to be able to do?0:07:51
Well, somehow, we have to be able to get B into A. We have to be able to get the old value of B into the value of A. So we have to have some path by which stuff can flow,0:08:03
whatever this information is, from b to a. I'm going to draw that with by an arrow saying that it is possible to move the contents of b into a, replacing the0:08:13
value of a. And there's a little button here which you push which allows that to happen. That's what the little x is here.0:08:23
Now it's also the case that I have to be able to compute the remainder of a and b. Now that may be a complicated mess. On the other hand, I'm going to make it a small box. If we have to, we may open up that box and look inside and0:08:34
see what it is. So here, I'm going to have a little box, which I'm going to draw this way, which we'll call the remainder.0:08:46
And it's going to take in a. That's going to take in b. And it's going to put out something, the remainder of a0:08:59
divided by b. Another thing we have to see here is that we have to be able to test whether b is equal to 0.0:09:08
Well, that means somebody's got to be looking at-- a thing that's looking at the value of b. I have a light bulb here which lights up if b equals 0.0:09:21
That's its job. And finally, I suppose, because of the fact that we want the new value of a to be the old value of b, and0:09:30
simultaneously the new value of b to be something I've done with a, and if I plan to make my machine such that everything happens one at a time, one motion at a time,0:09:41
and I can't put two numbers in a register, then I have to have another place to put one while I'm interchanging. OK?0:09:50
I can't interchange the two things in my hands, unless I either put two in one hand and then pull it back the other way, or unless I put one down, pick it up, and put the other one, like that, unless I'm a juggler, which I'm not, as you0:10:02
can see, in which case I have a possibility of timing errors. In fact, much of the type of computer design people do0:10:11
involves timing errors, or potential timing errors, which I don't much like. So for that reason, I have to have a place to put the second0:10:22
one of them down. So I have a place called t, which is a register just for temporary, t, with a button on it. And then I'll take the result of that, since I have to take0:10:32
that and put into b, over here, we'll take the result of that and go like this, and a button here.0:10:42
So that's the datapaths of a GCD machine. Now what's the controller? Controller's a very simple thing, too.0:10:52
The machine has a state. The way I like to visualize that is that I've got a maze. And the maze has a bunch of places0:11:01
connected by directed arrows. And what I have is a marble, which represents the state of the controller.0:11:10
The marble rolls around in the maze. Of course, this analogy breaks down for energy reasons. I sometimes have to pump the marble up to the top, because0:11:19
it's going to otherwise be a perpetual motion machine. But not worrying about that, this is not a physical analogy. This marble rolls around. And every time it rolls around certain bumpers, like in a0:11:30
pinball machine, it pushes one of these buttons. And every so often, it comes to a place, which is a division, where it has to make a choice.0:11:40
And there's a flap, which is controlled by this. So that's a really mechanical way of thinking about it. Of course, controllers these days, are not built that way0:11:50
in real computers. They're built with a little bit of ROM and a state register. But there was a time, like the DEC PDP-6, where that's how0:11:59
you built the controller of a machine. There was a bit that ran around the delay line, and it triggered things as it went by.0:12:08
And it would come back to the beginning and get fed round again. And of course, there were all sorts of great bugs you could have like two bits going around, two marbles.0:12:17
And then the machine has lost its marbles. That happens, too. Oh, well. So anyway, for this machine, what I have to do is the following. I'm going to start my maze here.0:12:30
And the first thing I've got to do, in a notation which many of you are familiar with, is b equal to zero, a test.0:12:41
And there's a possibility, either yes, in which case I'm done. Otherwise, if no, then I'm going have to0:12:53
roll over some bumpers. I'm going to do it in the following order. I want to do this interchange game.0:13:04
Now first, since I need both a and b, but then the first-- and this is not necessary-- I want to collect this. This is the thing that's going to go into b.0:13:13
So I'm going to say, take this, which depends upon both a and b, and put the remainder into here. So I'm going to push this button first. Then, I'm going0:13:22
to transfer b to a, push that button, and then I transfer the temporary into b, push that button.0:13:32
So a very sequential machine, it's very inefficient. But that's fine right now. We're going to name the buttons, t gets remainder.0:13:46
a gets b. And b gets t.0:13:55
And then I'm going to go around here and it's to go back to start. And if you look, what are we seeing here? We're seeing the various--0:14:05
what I really have is some sort of mechanical connection, where t gets r controls this thing.0:14:16
And I have here that a gets b controls this fellow over here, and this fellow over here.0:14:28
Boy, that's absolutely pessimal, the inverse of optimal. Every line crosses every other line the way I drew it.0:14:38
I suppose this goes here, b gets t. Now I'd like to run this machine.0:14:48
But before I run the machine, I want to write down a description of this controller, just so you can see that these things, of course, as usual, can be written down in some nice language, so that we don't have to always draw these diagrams. One of the problems0:14:59
with diagrams is that they take up a lot of space. And for a machine this small, it takes two blackboards. For a machine that's the evaluator machine, I have trouble putting it into this room, even though0:15:08
it isn't very big. So I'm going to make a little language for this that's just a description of that, saying define a0:15:17
machine we'll call GCD. Of course, once we have something like this, we have a simulator for it.0:15:27
And the reason why we want to build a language in this form, is because all of a sudden we can manipulate these expressions that I'm writing down. And then of course I can write things that can algebraically manipulate these things, simulate them, all that sort0:15:38
of things that I might want to do, perhaps transform them as a layout, who knows. Once I have a nice representation of registers,0:15:48
it has certain registers, which we can call A, B, and T. And there's a controller.0:16:02
Actually, a better language, which would be more explicit, would be one which named every button also and said what it did. Like, this button causes the contents of T to go to the0:16:13
contents of B. Well I don't want to do that, because it's actually harder to read to do that, and it takes up more space. So I'm going to have that in the instructions written in the controller.0:16:23
It's going to be implicit what the operations are. They can be deduced by reading these and collecting together all the different things that can be done.0:16:33
Well, let's just look at what these things are. There's a little loop that we go around which says branch,0:16:42
this is the representation of the little flap that decides which way you go here, if 0 fetch of B, the contents of B,0:16:58
and if the contents of B is 0, then go to a place called done. Now, one thing you're seeing here, this looks very much like a traditional computer language.0:17:08
And what you're seeing here is things like labels that represent places in a sequence written down as a sequence.0:17:17
The reason why they're needed is because over here, I've written something with loops. But if I'm writing English text, or something like that,0:17:26
it's hard to refer to a place. I don't have arrows. Arrows are represented by giving names to the places where the arrows terminate, and then referring to them by0:17:35
those names. Now this is just an encoding. There's nothing magical about things like that. Next thing we're going to do is we're going to say, how do0:17:45
we do T gets R? Oh, that's easy enough, assign. We assign to T the remainder.0:17:56
Assign is the name of the button. That's the button-pusher. Assign to T the remainder, and here's the representation of0:18:05
the operation, when we divide the fetch of A by the fetch of0:18:17
B. And we're also going to assign to A the fetch of B, assign to0:18:35
B the result of getting the contents of T. And now I have0:18:50
to refer to the beginning here. I see, why don't I call that loop like I have here?0:19:05
So that's that reference to that arrow. And when we're done, we're done. We go to here, which is the end of the thing.0:19:14
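That written controller really can be executed as it stands. Here is a minimal Python sketch of such a machine: the controller is a list of labels and instructions, the registers are a dictionary, and branch and goto move the "marble." The representation below is my own illustration, not the course's register-machine language.

```python
# A minimal register-machine simulator: strings are labels; tuples are
# instructions that push the buttons on the datapaths (the registers).
def run(controller, regs):
    labels = {ins: i for i, ins in enumerate(controller) if isinstance(ins, str)}
    pc = 0
    while pc < len(controller):
        ins = controller[pc]
        if isinstance(ins, str):           # a label: just fall through
            pc += 1
        elif ins[0] == "branch":           # the flap: jump if test is true
            _, test, target = ins
            pc = labels[target] if test(regs) else pc + 1
        elif ins[0] == "assign":           # push a register's button
            _, reg, expr = ins
            regs[reg] = expr(regs)
            pc += 1
        elif ins[0] == "goto":             # unconditional jump
            pc = labels[ins[1]]
    return regs

gcd_controller = [
    "loop",
    ("branch", lambda r: r["b"] == 0, "done"),
    ("assign", "t", lambda r: r["a"] % r["b"]),  # t gets remainder(a, b)
    ("assign", "a", lambda r: r["b"]),           # a gets b
    ("assign", "b", lambda r: r["t"]),           # b gets t
    ("goto", "loop"),
    "done",
]

print(run(gcd_controller, {"a": 30, "b": 42, "t": 0})["a"])  # 6
```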
So here's just a written representation of this fragment of machinery that we've drawn here. Now the next thing I'd like to do is run this.0:19:25
I want us to feel it running. If you've never done this before, you've got to do it once. So let's take a particular problem. Suppose we want to compute the GCD of a equals 300:19:38
and b equals 42. I have no idea what that is right now. But a is 30 and b is 42.0:19:50
So that's how I start this thing up. Well, what's the first thing I do? I say is B equal to 0, no. Then assign to T the remainder of the fetch of A and the0:20:01
fetch of B. Well the remainder of 30 when divided by 42 is itself 30.0:20:11
Push that button. Now the marble has rolled to here. A gets B. That pushes this button.0:20:21
So 42 moves into here. B gets T. Push that button. The 30 goes here.0:20:32
Let me just interchange them. Now let's see, go back to the beginning. B 0, no. T gets the remainder.0:20:43
I suppose the remainder when dividing 42 by 30 is 12. I push that one. Next thing I do is allow the 30 to go to here, push this0:20:54
one, allow the 12 to go to here. Go around this thing. Is that done? No. How about--0:21:05
so now I have to find out the remainder of 30 divided by 12. And I believe that's 6. So 6 goes here on this button push.0:21:15
Then the next thing I push is this one, which the 12 goes into here. Then I push this button.0:21:25
The 6 gets into here. Is 6 equal to 0? No. OK.0:21:34
So then at that point, the next thing to do is divide it. Ooh, this has got a remainder of 0. Looks like we're almost done. Move the 6 over here next.0:21:47
0 over here. Is b 0? Yes. B is 0, therefore the answer is in A. The answer is 6.0:21:56
And indeed that's right, because if we look at the original problem, what we have is 30 is 2 times 3 times 5,0:22:07
and 42 is 2 times 3 times 7. So the greatest common divisor is 2 times 3, which is 6.0:22:18
Now normally, we write one other little line here, just to make it a little bit clearer, which is that we leave in a connection saying that this light is the guy0:22:29
that that flap looks at. Of course, any real machine has a lot more complicated0:22:38
things in it than what I've just shown you. Let's look for a second at the first still store.0:22:47
Wow. Well you see, for example, one thing we might want to do is worry about the operations that are of IO form.0:22:56
And we may have to collect something from the outside. So a state machine that we might have, the controller may0:23:06
have to, for example, get a value from something and put it into register a, to load it up. I also have to load up register b with another value.0:23:17
And then later, when I'm done, I might want to print the answer out. And of course, that might be either simple or complicated.0:23:26
I'm writing, assuming print is very simple, and read is very simple. But in fact, in the real world, those are very complicated operations, usually much, much larger and more complicated than the thing you're doing as your0:23:37
problem you're trying to solve. On the other hand, I can remember a time, using an IBM 7090 computer, where0:23:49
things like read and write of a single object, a single number, is a primitive operation of the IO0:23:58
controller. OK? And so we have that kind of thing in there. And in such a machine, well, what are we really doing?0:24:08
We're just saying that there's a source over here called "read," which is an operation which always has a value. We have to think about this as always having a value which0:24:17
can be gated into either register a or b. And print is some sort of thing which when you gate it appropriately, when you push the button on it, will cause a0:24:27
print of the value that's currently in register a. Nothing very exciting. So that's one sort of thing you might want to have. But0:24:36
there are also other things that are a little bit worrisome. Like I've used here some complicated mechanisms. What you see here is remainder. What is that? That may not be so obvious how to compute.0:24:46
It may be something which when you open it up, you get a whole machine. OK? In fact, that's true. For example, if I write down the program for remainder, the0:24:59
simplest program for it is by repeated subtraction. Because of course, division can be done by repeated subtraction of numbers, of integers.0:25:09
So the remainder of N divided by D is nothing more than if N0:25:30
is less than D, then the result is N. Otherwise, it's the remainder of N minus D,0:25:48
when divided by D. Gee, this looks just like the GCD program. Of course, it's not a very nice way to do remainders.0:25:59
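As a sketch, here is that repeated-subtraction remainder written iteratively in Python; like GCD, the recursive definition yields an iterative process:

```python
def remainder(n, d):
    # "If n is less than d, the result is n. Otherwise, it's the
    # remainder of (n - d) with respect to d."  Assumes d > 0.
    while n >= d:
        n = n - d
    return n

print(remainder(42, 30))  # 12
print(remainder(30, 42))  # 30
```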
You'd really want to use something like binary notation and shift and things like that in a practical computer. But the point of that is that if I open this thing up, I0:26:09
might find inside of it a computer. Oh, we know how to do that. We just made one. And it could be another thing just like this. On the other hand, we might want to make a more efficient0:26:20
or better-structured machine, or maybe make use of some of the registers more than once, or some horrible mess like that that hardware designers like to do, and for very good reasons.0:26:29
So for example, here's a machine that you see, which you're not supposed to be able to read. It's a little bit complicated. But what it is is the integration of the remainder0:26:41
into the GCD machine. And it takes, in fact, no more registers. There are three registers in the datapaths. OK? But now there's a subtractor.0:26:51
There are two things that are tested. Is b equal to 0, or is t less than b? And then the controller, which you see over here, is not much0:27:00
more complicated. But it has two loops in it, one of which is the main one for doing the GCD, and one of which is the subtraction loop0:27:10
for doing the remainder sub-operation. And there are ways, of course, of, if you think about it, taking the remainder program.0:27:19
If I take remainder, as you see over there, as a lambda expression, substitute it in for remainder over here in the GCD program, then do some simplification by substituting0:27:30
a and b for remainder in there, then I can unwind this loop. And I can get this piece of machinery by basically, a0:27:41
little bit of algebraic simplification on the lambda expressions. So I suppose you've seen your first very0:27:50
simple machines now. Are there any questions?0:28:02
Good. This looks easy, doesn't it? Thank you. I suppose, take a break.0:28:11
[MUSIC PLAYING - "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:28:47
PROFESSOR: Well, let's see. Now you know how to make an iterative procedure, or a procedure that yields an iterative process, turn into a machine.0:28:57
I suppose the next thing we want to do is worry about things that reveal recursive processes. So let's play with a simple factorial procedure.0:29:10
We define factorial of N to be if N is 1, the result is 1,0:29:24
using 1 right now to decrease the amount of work I have to do to simulate it, else it's N times factorial of N minus 1.0:29:42
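In Python rather than the lecture's Lisp, the procedure under discussion looks like this; the comment marks the deferred multiply that distinguishes it from the iterative GCD:

```python
def factorial(n):
    # Base case is n = 1, as in the lecture, to keep the simulation short.
    if n == 1:
        return 1
    # After the recursive call returns, there is still work to do:
    # multiply by n. This pending operation is why the outer machine
    # must survive the inner one, and why a stack will be needed.
    return n * factorial(n - 1)
```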
And what's different with this program, as you know, is that after I've computed factorial of N minus 1 here, I have to0:29:51
do something to the result. I have to multiply it by N. So the only way I can visualize what this machine is0:30:00
doing, because of the fact-- think of it this way, that I have a machine out here which somehow needs a factorial machine in order to compute its answer.0:30:09
But this machine, the outer machine, has to exist before and after the factorial machine, which is inside. Whereas in the iterative case, the outer machine doesn't need0:30:20
to exist after the inner machine is running, because you never need to go back to the outer machine to do anything. So here we have a problem where we have a machine which0:30:31
has the same machine inside of it, an infinitely large machine.0:30:40
And it's got other things inside of it, like a multiplier, which takes some inputs, and there's a minus 1 box, and things like that.0:30:50
You can imagine that's what it looks like. But the important thing is that here I have something that happens before and after, in the outer machine, the0:31:00
execution of the inner machine. So this machine has to have a life. It has to exist on both sides, in time, of this machine.0:31:13
So somehow, I have to have a place to store the things that this thing needs to run. Infinite objects don't exist in the real world.0:31:24
What we have to do is arrange an illusion that we have an infinite object, we have an infinite amount of hardware somewhere. Now of course, illusion's all that really matters.0:31:36
If we can arrange that every time you look at some infinite object, the part of it that you look at is there, then it's as infinite as you need it to be.0:31:47
And of course, one of the things we might want to do, just look at this thing over here, is the organization that we've had so far involves having a part of the machine,0:32:01
which is the controller, which sits right over here, which is perfectly finite and very simple. We have some datapaths, which consist of0:32:11
registers and operators. And what I propose to do here is decompose the machine into two parts, such that there is a part which is fundamentally finite, and some part where a certain amount of infinite0:32:22
stuff can be kept. On the other hand, this is very simple and really isn't infinite, it's just very large. But it's so simple that it can be cheaply reproduced in0:32:31
such large amounts (we call it memory) that we can make a structure called a stack out of it, which will allow us to,0:32:40
in fact, simulate the existence of an infinite machine which is made out of a recursive nest of many machines. And the way it's going to work is that we're going to store0:32:51
in this place called the stack the information required after the inner machine runs to resume the operation of the0:33:00
outer machine. So it will remember the important things about the life of the outer machine that will be needed for this0:33:09
computation. Since, of course, these machines are nested in a recursive manner, then in fact the stack will only be0:33:20
accessed in a manner which is the last thing that goes in is the first thing that comes out.0:33:29
So we'll only need to access some little part of this stack memory. OK, well, let's do it. I'm going to build you a datapath now, and I'm going to0:33:38
write the controller. And then we're going to execute this to see how you do it. So the factorial machine isn't so bad.0:33:47
It's going to have a register called the value, where the answer is going to be stored, and a register called N,0:33:59
which is where the number I'm taking factorial of will be stored. And it will be necessary in some instances to connect VAL0:34:09
to N. In fact, one nice case of this is if I just said over here, N, because that would be right for N equal 1.0:34:19
And I could just move the answer over there if that's important. I'm not worried about that right now. And there are things I have to be able to do.0:34:29
Like I have to be able to, as we see here, multiply N by something in VAL, because VAL is the result of computing factorial.0:34:38
And I have to put the result back into VAL. So here we can see that the result of computing a factorial is N times the result0:34:48
of computing a factorial. VAL will be the representation of the answer of the inner factorial. And so I'm going to have to have a multiplier here, which0:35:02
is going to sample the value of N and the value of VAL and put the result back into VAL like that.0:35:17
I'm also going to have to be able to see if N is 1. So I need a light bulb.0:35:28
And I suppose the other thing I'm going to need to have is a way of decrementing N. So I'm going to have a decrementer,0:35:38
which takes N and is going to put back the result into N. That's pretty much what I need in my machine.0:35:49
Now, there's a little bit else I need. It's a little bit more complicated, because I'm also going to need a way to store, to save away, the things that0:35:58
are going to be needed for resuming the computation of a factorial after I've done a sub-factorial. What's that?0:36:07
One thing I need is N. So I'm going to build here a thing called a stack. The stack is a bunch of stuff that I'm going to write in0:36:24
sequentially. I don't know how long it is. The longer it is, the better my illusion of infinity. And I'm going to have to have a way of getting stuff out of0:36:36
N and into the stack and vice versa. So I'm going to need a connection like this, which is two-way, whereby I can save the value of N and then0:36:52
restore it some other time through that connection. This is the stack. I also need a way of remembering where I was in the0:37:02
computation of factorial in the outer program. Now in the case of this machine, it0:37:11
isn't very much a problem. Factorial always returns, has to go back to the place where we multiply by N, except for the last time, when it has to0:37:21
return to whatever needs the factorial or go to done or stop. However, in general, I'm going to have to remember where I have been, because I might have computed factorial from0:37:30
somewhere else. I have to go back to that place and continue there. So I'm going to have to have some way of taking the place where the marble is in the finite state controller, the0:37:41
state of the controller, and storing that in the stack as well. And I'm going to have to have ways of restoring that back to the state of the-- the marble.0:37:51
So I have to have something that moves the marble to the right place. Well, we're going to have a place which is the marble now. And it's called the continue register, called continue,0:38:09
which is the place to put the marble next time I go to continue. That's what that's for. And so there's got to be some path from that into the controller.0:38:22
I also have to have some way of saving that on the stack. And I have to have some way of setting that up to have0:38:32
various constants, a certain fixed number of constants. And that's very easy to arrange. So let's have some constants here. We'll call this one after-fact.0:38:47
And that's a constant which we'll get into the continue register, and also another one called fact-done.0:39:05
So this is the machine I want to build. That's its datapaths, at least. And it mixes a little with the controller here, because of the fact that I have to remember where I was and restore0:39:15
myself to that place. But let's write the program now which represents the controller. I'm not going to write the define machine thing and the register list, because that's not very interesting.0:39:24
I'm just going to write down the sequence of instructions that constitute the controller. So we have assign, to set up, continue to done.0:39:44
We have a loop which says branch if equal 1 fetch N, if0:40:01
N is 1, then go to the base step of the induction, the simple case. Otherwise, I have to remember the things that are necessary0:40:10
to perform a sub-factorial. I'm going to go over here, and I have to perform a sub-factorial. So I have to remember what's needed after I will0:40:21
be done with that. See, I'm about to do something terrible. I'm about to change the value of N. But this guy has to know the old value of N. But in order to make the0:40:32
sub-factorial work, I have to change the value of N. So I have to remember the old value. And I also have to remember where I've been. So I save up continue.0:40:47
And this is an instruction that says, put something in the stack. Save the contents of the continuation register, which0:40:56
in this case is done, because later I'm going to change that, too, because I need to go back to after-fact, as well. We'll see that.0:41:05
We save N, because I'm going to need that for later. Assign to N the decrement of fetch N. Assign continue,0:41:31
we're going to look at this now, to after, we'll call it. That's a good name for this, a little bit easier and shorter, and fits in here.0:41:52
Now look what I'm doing here. I'm saying, if N is 1, I'm done. I'm going to have to just get the answer.0:42:02
Otherwise, I'm going to save the continuation, save N, make N one less than N, remember I'm going to come back to someplace else, and go back and start doing another factorial.0:42:13
However, I've got a different machine in me now. Continue is something else, and0:42:22
N is N minus 1. Now after I'm done with that, I can go there. I will restore the old value of N, which is the opposite of0:42:34
this save over here. I will restore the continuation.0:42:49
I will then go to here. I will assign to the VAL register the product0:43:03
of N and fetch VAL.0:43:13
VAL fetch product assign. And then I will be done. I will have my answer to the sub-factorial in VAL.0:43:26
At that point, I'm going to return by going to the place where the continuation is pointing. That says, go to fetch continue.0:43:45
And then I have finally a base step, which is the immediate answer. Assign to VAL fetch N, and go to fetch continue.0:44:12
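One way to check that this controller actually computes factorials is to simulate it. Here is a Python sketch with explicit N, VAL, and continue registers, a list playing the role of the stack, and the label names from the board; the while-loop dispatch is my own rendering, not the lecture's notation:

```python
def fact_machine(n_init):
    # Registers
    n = n_init
    val = None
    continue_ = 'done'       # assign continue done
    stack = []               # the stack: the illusion of infinity
    pc = 'loop'              # the marble in the finite-state controller
    while True:
        if pc == 'loop':
            if n == 1:                       # branch if n = 1
                pc = 'base'
            else:
                stack.append(continue_)      # save continue
                stack.append(n)              # save n
                n = n - 1                    # assign n (dec (fetch n))
                continue_ = 'after'          # assign continue after
                pc = 'loop'
        elif pc == 'after':
            n = stack.pop()                  # restore n
            continue_ = stack.pop()          # restore continue
            val = n * val                    # assign val (* (fetch n) (fetch val))
            pc = continue_                   # go to fetch continue
        elif pc == 'base':
            val = n                          # assign val (fetch n)
            pc = continue_                   # go to fetch continue
        elif pc == 'done':
            assert stack == []               # stack back in its initial state
            return val
```

Tracing `fact_machine(3)` by hand reproduces the blackboard execution: done and 3 go on the stack, then after and 2, then the saves unwind as the multiplies happen, leaving 6 in VAL and an empty stack.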
And then I'm done. Now let's see how this executes on a very simple case, because then we'll see the use of this stack to do0:44:25
the job we need. This is statically what it's doing, but we have to look at this dynamically. So let's see. First thing we do is continue gets done.0:44:36
The way that happened is I pushed this. Let's call that done the way I have it.0:44:46
I push that button. Done goes into there. Now I also have to set this thing up to have an initial value. Let's consider a factorial of three, a simple case.0:45:00
And we're going to start out with our stack growing over here. Stacks have their own little internal state saying where they are, where the next place I'm going to write is.0:45:12
So now we say, is N 1? The answer is no. So now I'm going to save continue, bang. Now that done goes in here.0:45:22
And this moves to here, the next place I'm going to write. Save N 3. OK? Assign to N the decrement of N. That means0:45:34
I've pushed this button. This becomes 2. Assign to continue aft. So I've pushed that button.0:45:43
Aft goes in here. OK, now go to loop, bang, so up to here.0:45:54
Is N 1? No. So I have to save continue. What's continue? Continue is aft. Push this button. So this moves to here.0:46:08
I have to save N. N is over here. I got to 2. Push that button. So a 2 gets written there. And then this thing moves down here.0:46:20
OK, save N. Assign N to the decrement of N. This becomes a 1.0:46:29
Assign continue to aft. A-F-T gets written there again. Go to loop. Is N equal to 1? Oh, yes, the answer is 1.0:46:41
OK, go to base step. Assign to VAL fetch of N. Bang, 1 gets put in there.0:46:51
Go to fetch continue. So we look in continue. Basically, I'm pushing a button over here that goes to the controller. The continue becomes aft, and all of a sudden, the program's running here.0:47:02
I now have to restore the outer version of factorial. So we go here. We say, restore N. So restore N means take the contents0:47:12
that's here. Push this button, and it goes into here, 2, and the pointer moves up.0:47:22
Restore continue, pretty easy. Go push this button. And then aft gets written in here again.0:47:31
That means this thing moves up. I've gotten rid of something else on my stack.0:47:42
Right, then I go to here, which says, assign to VAL the product of N and VAL. So I push this button over here, bang. 2 times 1 gives me a 2, which gets written there.0:47:55
Go to fetch continue. Continue is aft. I go to aft. Aft says restore N. Do your restore N, means I take the0:48:06
value over here, which is 3, push this up to here, and move it into here, N. Now it's pushing that button.0:48:17
The next thing I do is restore continue. Continue is now going to become done. So this moves up here when I push this button.0:48:27
Done may or may not be there anymore, I'm not interested, but it certainly is here. Next thing I do is assign to VAL the product of the fetch0:48:39
of N and the fetch of VAL. That's pushing this button over here, bang. 2 times 3 is 6. So I get a 6 over here.0:48:52
And go to fetch continue, whoops, I go to done, and I'm done. And my answer is 6, as you can see in the VAL register. And in fact, the stack is in the state it0:49:02
originally was in. Now there's a bit of discipline in using these things like stacks that we have to be careful of.0:49:13
And we'll see that in the next segment. But first I want to ask if there are any questions for this.0:49:28
Are there any questions? Yes, Ron. AUDIENCE: What happens when you roll off the end of the stack with-- PROFESSOR: What do you mean, roll off of? AUDIENCE: Well, the largest number-- a larger starting point of N requires more memory, correct?0:49:38
PROFESSOR: Oh, yes. Well, I need to have a long enough stack. You say, what if I violate my illusion? AUDIENCE: Yes. PROFESSOR: Well, then the magic doesn't work.0:49:48
The truth of the matter is that every machine is finite. And for a procedure like this, there's a limit to the number of sub-factorials I could have.0:49:59
Remember when we were doing the y-operator a while ago, we pointed out that there was a sequence of exponentiation procedures, each of which was a little better than the previous one.0:50:08
Well, we're now seeing how we implement that mathematical idea. The limiting process is only as good as how far you take the limit.0:50:17
If you think about it, what am I using here? I'm using about two pieces of memory for every recursion of0:50:26
this process. If we try to compute factorial of 10,000, that's not a lot of memory. On the other hand, it's an awfully big number.0:50:36
So the question is, is that a viable thing in this case? But it really turns out not to be a terrible limit, because memory is el cheapo, and people are pretty expensive.0:50:48
OK, thank you, let's take a break. [MUSIC PLAYING - "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:51:55
PROFESSOR: Well, let's see. What I've shown you now is how to do a simple iterative process and a simple recursive process.0:52:05
I just want to summarize the design of simple machines for specific applications by showing you a little bit more complicated design, that of a thing that does doubly0:52:15
recursive Fibonacci, because it will indicate to us, and we'll understand, a bit about the conventions required for making stacks operate correctly.0:52:26
So let's see. I'm just going to write down, first of all, the program I'm going to translate. I need a Fibonacci procedure, it's very simple, which says,0:52:41
if N is less than 2, the result is N, otherwise it's0:52:50
the sum of Fib of N minus 1 and Fib of N minus 2.0:53:07
That's the plan I have here. And we're just going to write down the controller for such a machine. We're going to assume that there are registers, N, which0:53:16
holds the number we're taking Fibonacci of, VAL, which is where the answer is going to get put, and continue, which is the thing that's linked to the controller, like before.0:53:26
But I'm not going to draw another physical datapath, because it's pretty much the same as the last one you've seen. And of course, one of the most amazing things about0:53:37
computation is that after a while, you build up a few more features and a few more features, and all of a sudden, you've got everything you need. So it's remarkable that it just gets there so fast. I0:53:48
don't need much more to make a universal computer. But in any case, let's look at the controller for the Fibonacci thing. First thing I want to do is start the thing up by assign0:54:01
to continue a place called done, called Fib-done here.0:54:13
So that means that somewhere over here, I'm going to have a label, Fib-done, which is the place where I go when I want the machine to stop.0:54:24
That's what that is. And I'm going to make up a loop. It's a place I'm going to go to in order to start up0:54:33
computing a Fib. Whatever is in N at this point, Fibonacci will be computed of, and we will return to the place specified by continue.0:54:46
So what you're going to see here at this place, what I want here is the contract that says, I'm going to write this with a comment syntax, the contract is N contains arg,0:55:00
the argument. Continue is the recipient.0:55:12
And that's where it is. At this point, if I ever go to this place, I'm expecting this to be true, the argument for computing the Fibonacci.0:55:24
Now the next thing I want to do is to branch. And if N is less than 2--0:55:34
by the way, I'm using what looks like Lisp syntax. This is not Lisp. This does not run. What I'm writing here does not run as a simple Lisp program.0:55:46
This is a representation of another language. The reason I'm using the syntax of parentheses and so on is because I tend to use a Lisp system to write an0:55:56
interpreter for this which allows me to simulate the machine I'm trying to build. I don't want to confuse this to think that0:56:05
this is Lisp code. It's just I'm using a lot of the pieces of Lisp. I'm embedding a language in Lisp, using Lisp as pieces to make my process of making my simulator easy.0:56:16
So I'm inheriting from Lisp all of its properties. If fetch of N is less than 2, I want to go to a place called immediate-answer.0:56:25
It's the base step. Now, that's somewhere over here, just above done.0:56:37
And we'll see it later. Now, in the general case, which is the part I'm going to write down now, let's just do it. Well, first of all, I'm going to have to0:56:46
call Fibonacci twice. In each case-- well, in one case at least, I'm going to have to know what to do to come back and do the next one.0:56:56
I have to remember, have I done the first Fib, or have I done the second one? Do I have to come back to the place where I do the second0:57:06
Fib, or do I have to come back to the place where I do the add? In the first case, for the first Fibonacci, I'm going to need the value of N for computing the second one.0:57:20
So I have to store some of these things up. So first I'm going to save continue. That's who needs the answer.0:57:31
And the reason I'm doing that is because I'm about to assign continue to the place which is the place I0:57:42
want to go to after. Let's call it Fib-N-minus-1, big long name,0:57:52
classic Lisp name. Because I'm going to compute the first Fib of N minus 1, and then after that, I want to come back and0:58:02
do something else. That's the place I want to go to after I've done the first Fibonacci calculation.0:58:11
And I want to do a save of N, because I'm going to need it later, after that. Now I'm going to, at this point, get ready to do the0:58:21
Fibonacci of N minus 1. So assign to N the difference of the fetch of N and 1.0:58:38
Now I'm ready to go back to doing the Fib loop.0:58:47
Have I satisfied my contract? And the answer is yes. N contains N minus 1, which is what I need.0:58:57
Continue contains a place I want to go to when I'm done with calculating N minus 1. So I've satisfied the contract. And therefore, I can write down here a label,0:59:11
after-Fib-N-minus-1.0:59:20
Now what am I going to do here? Here's a place where I now have to get ready to do Fib of N minus 2.0:59:29
But in order to do a Fib of N minus 2, look, I don't know. I've clobbered my N over here. And presumably my N is counted down all the way to 1 or 0 or something at this point.0:59:39
So I don't know what the value of N in the N register is. I want the value of N that was on the stack that I saved over here so that could restore it over here.0:59:49
I saved up the value of N, which is this value of N at this point, so that I could restore it after computing Fib of N minus 1, so that I could count that down to N minus 20:59:59
and then compute Fib of N minus 2. So let's restore that.1:00:08
Restore of N. Now I'm about to do something which is superstitious, and we will remove it shortly.1:00:18
I am about to finish the sequence of doing the subroutine call, if you will. I'm going to say, well, I also saved up the continuation,1:00:28
since I'm going to restore it now. But actually, I don't have to, because I'm not going to need it. We'll fix that in a second. So we'll do a restore of continue, which is what I1:00:46
would in general need to do. And we're just going to see what you would call in the compiler world a peephole optimization, which says, whoops, you didn't have to do that.1:00:55
OK, so the next thing I see here is that I have to get ready now to do Fibonacci of N minus 2. But I don't have to save N anymore.1:01:05
The reason why I don't have to save N anymore is because I don't need N after I've done Fib of N minus 2, because the next thing I do is add. So I'm just going to set up my N that way.1:01:16
Assign to N the difference of fetch N and 2.1:01:31
Now I have to finish the setup for calling Fibonacci of N minus 2. Well, I have to save up continue and assign continue,1:01:48
continue, to the place which is after-Fib-N-minus-2, that place1:02:03
over here somewhere. However, I've got to be very careful. The old value, the value of Fib of N minus 1, I'm going to1:02:12
need later. The value of Fibonacci of N minus 1, I'm going to need. And I can't clobber it, because I'm going to have to1:02:21
add it to the value of Fib of N minus 2. That's in the value register, so I'm going to save it. So I have to save this right now, save up VAL.1:02:33
And now I can go off to my subroutine, go to Fib loop.1:02:44
Now before I go any further and finish this program, I just want to look at this segment so far and see, oh yes, there's a sequence of instructions here, if you1:02:55
will, that I can do something about. Here I have a restore of continue, a save of continue,1:03:06
and then an assign of continue, with no other references to continue in between. The restore followed by the save1:03:15
leaves the stack unchanged. The only difference is that I set the continue register to a value, which is the value that was on the stack.1:03:24
Since I now clobber that value, as in it was never referenced, these instructions are unnecessary. So we will remove these.1:03:38
But I couldn't have seen that unless I had written them down. Was that really true? Well, I don't know.1:03:48
OK, so we've now gone off to compute Fibonacci of N minus 2. So after that, what are we going to do?1:04:05
Well, I suppose the first thing we have to do-- we've got two things. We've got a thing in the value register which is now valuable. We also have a thing on the stack that can be restored into the value register.1:04:14
And what I have to be careful with now is I want to shuffle this right so I can do the add. Now there are various conventions I might use, but I'm going to be very picky and say, I'm only going to restore1:04:24
into a register I've saved from. If that's the case, I have to do a shuffle here. It's the same problem with how many hands I have. So I'm going to assign to N, because I'm not going to need N1:04:37
anymore, N is useless, the current value of VAL, which was the value of Fib of N minus 2.1:04:52
And I'm going to restore the value register now.1:05:01
This restore matches this save. And if you're very careful and examine very carefully what goes on, restores and saves are always matched.1:05:13
Now there's an outstanding save, of course, that we have to get rid of soon. And so I restored the value register. Now I restore the continue one, which matches this one,1:05:34
dot, dot, dot, dot, dot, dot, dot, down to here, restoring that continuation. That continuation is a continuation of Fib of N,1:05:46
which is the problem I was trying to solve, a major problem I'm trying to solve. So that's the guy I have to go back to who wants Fib of N. I saved them all the way up here when I realized N was1:05:55
not less than 2. And so I had to do a complicated operation. Now I've got everything I need to do it. So I'm going to restore that, assign to VAL the sum of fetch1:06:17
VAL and fetch of N, and go to continue.1:06:38
So now I've returned from computing Fibonacci of N, the general case.1:06:47
Now what's left is we have to fix up a few details, like there's the base case of this induction, immediate answer,1:07:03
which is nothing more than assign to VAL fetch of N,1:07:13
because N was less than 2, and therefore, the answer is N in our original program, and go to fetch continue,1:07:31
and finally Fib-done.1:07:43
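The Fibonacci controller just assembled can likewise be simulated. This Python sketch includes the peephole optimization discussed above (the restore-continue followed by save-continue removed) and the shuffle through N at the second return point; register and label names follow the board, while the dispatch loop is my own rendering:

```python
def fib_machine(n_init):
    # Registers
    n = n_init
    val = None
    continue_ = 'fib-done'   # assign continue fib-done
    stack = []               # the large, simple memory
    pc = 'fib-loop'          # the marble in the controller
    while True:
        if pc == 'fib-loop':
            # Contract: n contains the argument, continue the recipient.
            if n < 2:
                pc = 'immediate-answer'
            else:
                stack.append(continue_)       # save continue
                continue_ = 'afterfib-n-1'    # assign continue
                stack.append(n)               # save n
                n = n - 1                     # assign n (- (fetch n) 1)
                pc = 'fib-loop'
        elif pc == 'afterfib-n-1':
            n = stack.pop()                   # restore n
            # restore-continue / save-continue pair removed by the
            # peephole optimization; caller's continue stays on the stack.
            n = n - 2                         # assign n (- (fetch n) 2)
            continue_ = 'afterfib-n-2'
            stack.append(val)                 # save val: holds Fib(n-1)
            pc = 'fib-loop'
        elif pc == 'afterfib-n-2':
            n = val                           # shuffle: only restore into
            val = stack.pop()                 # a register you saved from
            continue_ = stack.pop()           # restore continue
            val = val + n                     # Fib(n-1) + Fib(n-2)
            pc = continue_                    # go to fetch continue
        elif pc == 'immediate-answer':
            val = n                           # assign val (fetch n)
            pc = continue_
        elif pc == 'fib-done':
            return val
```

Note how every `stack.pop()` matches an earlier `stack.append()` of the same register, the discipline the lecture is about to insist on.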
So that's a fairly complicated program. And the reason I wanted you to see that is because I want you to see the particular flavors of stack discipline that I was obeying.1:07:52
First of all, I don't want to save anything that I'm not going to need later. I was being very careful.1:08:01
And it's very important. And there are all sorts of other disciplines people make with frames and things like that of some sort, where you1:08:10
save all sorts of junk you're not going to need later and restore it because, in some sense, it's easier to do that. That's going to lead to various disasters, which we'll1:08:19
see a little later. It's crucial to say exactly what you're going to need later. It's an important idea.1:08:29
And the responsibility is that whoever saves something is the guy who restores it, because he needs it. And with such a discipline, you can see which things are1:08:40
unnecessary, which operations are unimportant. Now, one other thing I want to tell you about that's very1:08:49
simple is that, of course, the picture you see is not the whole picture. Supposing I had systems that had things like other1:08:58
operations, CAR, CDR, cons, building a vector and referencing the nth element of it, or things like that.1:09:10
Well, at this level of detail, whatever it is, we can conceptualize those as primitive operations in the datapath. In other words, we could say that some machine that, for1:09:21
example, has the append machine, which has to do cons of the CAR of x with the append of the CDR of x and y, well, gee, that's exactly the same as1:09:31
the factorial structure. Well, it's got about the same structure. And what do we have? We have some sort of things in it which may be registers, x1:09:41
and y, and then x has to somehow move to y sometimes, x has to get the value of y. And then we may have to be able to do something which is a cons.1:09:51
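The append machine mentioned above, cons of the car of x onto the append of the cdr of x and y, has the same deferred-operation shape as factorial. A Python sketch, using Python lists to stand in for cons pairs (an assumption, since the lecture leaves the memory machine abstract):

```python
def append(x, y):
    # (if (null? x) y ...)
    if not x:
        return y
    # (cons (car x) (append (cdr x) y)): the cons is deferred
    # until the recursive call returns, just like factorial's multiply.
    return [x[0]] + append(x[1:], y)
```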
I don't remember if I need it like this in this system, but cons is sort of like subtract or add or something.1:10:01
It combines two things, producing a thing which is the cons, which we may then think goes into there. And then maybe a thing called the CAR, which will produce--1:10:14
I can get the CAR or something. And maybe I can get the CDR of something, and so on. But we shouldn't be too afraid of saying things this way, because the worst that could happen is if we open up cons,1:10:27
what we're going to find is some machine. And cons may in fact overlap with CAR and CDR, and it always does, in the same way that plus and minus overlap,1:10:38
and really the same business. Cons, CAR, and CDR are going to overlap, and we're going to find a little controller, a little datapath, which may1:10:48
have some registers in it, some stuff like that. And maybe inside it, there may also be an infinite part, a part that's semi-infinite or something, which is a lot of1:10:59
very uniform stuff, which we'll call memory. And I wouldn't be so horrified if that were the way it works.1:11:09
In fact, it does, and we'll talk about that later. So are there any questions?1:11:24
Gee, what an unquestioning audience. Suppose I tell you a horrible pile of lies.1:11:39
OK. Well, thank you. Let's take our break. [MUSIC PLAYING - "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:00
Lecture 9B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
0:00:15
PROFESSOR: Well, I hope you appreciate that we have inducted you into some real magic, the magic of building0:00:26
languages, really building new languages. What have we looked at? We've looked at an Escher picture language: this0:00:39
language invented by Peter Henderson. We looked at digital logic language.0:00:53
Let's see. We've looked at the query language. And the thing you should realize is, even though these were toy examples, they really are the kernels of really0:01:06
useful things. So, for instance, the Escher picture language was taken by Henry Wu, who's a student at MIT, and developed into a real0:01:17
language for laying out PC boards based just on extending those structures. And the digital logic language, Jerry mentioned when he showed it to you, was really extended to be used as0:01:28
the basis for a simulator that was used to design a real computer. And the query language, of course, is kind of the germ of Prolog.0:01:37
So we built all of these languages, they're all based on LISP. A lot of people ask what particular problems is LISP0:01:48
good for solving? The answer is LISP is not good for solving any particular problems. What LISP is good for is constructing within it the right language to solve the problems you want to0:01:58
solve, and that's how you should think about it. So all of these languages were based on LISP. Now, what's LISP based on?0:02:07
Where's that come from? Well, we looked at that too. We looked at the meta-circular evaluator and said well, LISP0:02:23
is based on LISP. And when we start looking at that, we've got to do some real magic, right? So what does that mean? Y operators, and fixed points, and the idea that what0:02:37
this means is that LISP is somehow the fixed-point equation for this funny set of things which are defined in terms of themselves.0:02:47
Now, it's real magic. Well, today, for a final piece of magic, we're going to make all the magic go away.0:03:06
We already know how to do that. The idea is, we're going to take the register machine architecture and show how to implement LISP in terms of that.0:03:15
And, remember, the idea of the register machine is that there's a fixed and finite part of the machine.0:03:24
There's a finite-state controller, which does some particular thing with a particular amount of hardware. There are particular data paths: the operations the machine does.0:03:33
And then, in order to implement recursion and sustain the illusion of infinity, there's some large amount of memory, which is the stack.0:03:42
So, if we implement LISP in terms of a register machine, then everything ought to become, at this point, completely concrete. All the magic should go away.0:03:51
And, by the end of this talk, I want you to get the feeling that, as opposed to this very mysterious meta-circular evaluator, a LISP evaluator really is something0:04:01
that's concrete enough that you can hold in the palm of your hand. You should be able to imagine holding a LISP interpreter there. All right, how are we going to do this?0:04:10
We already have all the ingredients. See, what you learned last time from Jerry is how to take any particular couple of LISP procedures and hand-translate0:04:23
them into something that runs on a register machine. So, to implement all of LISP on a register machine, all we have to do is take the particular procedures that are0:04:34
the meta-circular evaluator and hand-translate them for a register machine. And that does all of LISP, right? So, in principle, we already know how to do this.0:04:45
And, indeed, it's going to be no different, in kind, from translating, say, recursive factorial or recursive Fibonacci.0:04:54
It's just bigger and there's more of it. So it'd just be more details, but nothing really conceptually new. All right, also, when we've done that, and the thing is0:05:03
completely explicit, and we see how to implement LISP in terms of the actual sequential register operations, that's going to be our final most explicit model of0:05:13
LISP in this course. And, remember, that's a progression through this course. We started out with substitution, which is sort of like algebra. And then we went to the environment model, which0:05:22
talked about the actual frames and how they got linked together. And then we made that more concrete in the meta-circular evaluator.0:05:31
There are things the meta-circular evaluator doesn't tell us. You should realize that. For instance, it left unanswered the question of how0:05:40
a procedure, like recursive factorial here, somehow takes space that grows. On the other hand, a procedure which also looks syntactically0:05:51
recursive, called fact-iter, somehow doesn't take space. We justify that it doesn't need to take space by showing0:06:01
the substitution model. But we didn't really say how it happens that the machine manages to do that, that that has to do with the details of how arguments are passed to procedures.0:06:12
And that's the thing we didn't see in the meta-circular evaluator precisely because the way arguments got passed to procedures in this LISP depended on the way arguments0:06:21
got passed to procedures in this LISP. But, now, that's going to become extremely explicit.0:06:30
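To make the contrast concrete, here is a hypothetical hand translation in the spirit of last lecture, rendered as a Python sketch (the labels `fact-loop` and `after-fact` are my own names): recursive factorial needs the stack and the continue register, while fact-iter just overwrites its registers.

```python
# Hypothetical hand translation (mine): recursive factorial as a
# register machine with an explicit stack.

def fact_machine(n):
    stack, val, cont, pc = [], None, "done", "fact-loop"
    while pc != "done":
        if pc == "fact-loop":
            if n == 0:
                val, pc = 1, cont
            else:
                stack.append(cont)   # save where to go afterward
                stack.append(n)      # save n for the multiply
                n, cont, pc = n - 1, "after-fact", "fact-loop"
        elif pc == "after-fact":
            n = stack.pop()          # restore n
            cont = stack.pop()       # restore continue
            val, pc = n * val, cont
    return val

# fact-iter translated the same way: no stack at all. The registers
# are simply overwritten each time around the loop, so space is constant.
def fact_iter_machine(n):
    product, counter = 1, 1
    while counter <= n:              # the 'iter' loop
        product, counter = counter * product, counter + 1
    return product

print(fact_machine(5), fact_iter_machine(5))  # 120 120
```

The stack in the first machine grows with n; the second machine never pushes anything, which is exactly the distinction the substitution model justified but didn't explain.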
OK. Well, before going on to the evaluator, let me just give you a sense of what a whole LISP system looks like so you can see the parts we're going to talk about and the parts0:06:39
we're not going to talk about. Let's see, over here is a happy LISP user, and the LISP0:06:49
user is talking to something called the reader.0:07:00
The reader's job in life is to take characters from the user0:07:14
and turn them into data structures in something called a list structure memory.0:07:29
All right, so the reader is going to take symbols, parentheses, and A's and B's, and ones and threes that you type in, and turn these into actual list structure: pairs,0:07:39
and pointers, and things. And so, by the time evaluator is going, there are no characters in the world. And, of course, in more modern list systems, there's sort of0:07:49
a big morass here that might sit between the user and the reader: Windows systems, and top levels, and mice, and all kinds of things. But conceptually, characters are coming in.0:07:59
All right, the reader transforms these into pointers to stuff in this memory, and that's what the0:08:09
evaluator sees, OK? The evaluator has a bunch of helpers.0:08:19
It has all possible primitive operators you might want. So there's a completely separate box, a floating point0:08:29
unit, or all sorts of things, which do the primitive operators. And, if you want more special primitives, you build more0:08:38
primitive operators, but they're separate from the evaluator. The evaluator finally gets an answer and communicates that to the printer.0:08:50
And now, the printer's job in life is to take this list structure coming from the evaluator, and turn it back into characters, and communicate them to the user0:09:03
through whatever interface there is. OK. Well, today, what we're going to talk about is this evaluator.0:09:12
The primitive operators have nothing particular to do with LISP, they're however you like to implement primitive operations. The reader and printer are actually complicated, but0:09:22
we're not going to talk about them. They sort of have to do with details of how you might build up list structure from characters. So that is a long story, but we're not going0:09:31
to talk about it. The list structure memory, we'll talk about next time. So, pretty much, except for the details of reading and printing, the only mystery that's going to be left after0:09:41
you see the evaluator is how you build list structure on conventional memories. But we'll worry about that next time too.0:09:50
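Toy stand-ins (mine, in Python) for the reader's and printer's jobs, just to fix the picture: characters in, list structure out, and back again. A real reader is much hairier, as noted above.

```python
# Toy reader: parses "(+ 1 (* 2 3))"-style text into nested lists,
# our stand-in for list structure memory.

def read_expr(chars):
    tokens = chars.replace("(", " ( ").replace(")", " ) ").split()
    def parse(tokens):
        token = tokens.pop(0)
        if token == "(":
            lst = []
            while tokens[0] != ")":
                lst.append(parse(tokens))
            tokens.pop(0)            # discard the ")"
            return lst
        return int(token) if token.lstrip("-").isdigit() else token
    return parse(tokens)

# Toy printer: turns list structure back into characters.
def print_expr(e):
    if isinstance(e, list):
        return "(" + " ".join(print_expr(x) for x in e) + ")"
    return str(e)

print(print_expr(read_expr("(+ 1 (* 2 3))")))  # (+ 1 (* 2 3))
```

By the time the evaluator runs, only the nested-list structure exists; no characters anywhere, just as the lecture says.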
OK. Well, let's start talking about the evaluator. The one that we're going to show you, of course, is not, I0:09:59
think, anything special. It's just a particular register machine that runs LISP. And it has seven registers, and here0:10:08
are the seven registers. There's a register, called EXP, and its job is to hold the expression to be evaluated.0:10:18
And by that, I mean it's going to hold a pointer to someplace in list structure memory that holds the expression to be evaluated. There's a register, called ENV, which holds the0:10:29
environment in which this expression is to be evaluated. And, again, I made a pointer. The environment is some data structure.0:10:38
There's a register, called FUN, which will hold the procedure to be applied when you go to apply a procedure. A register, called ARGL, which holds the list0:10:48
of evaluated arguments. What you can start seeing here is the basic structure of the evaluator. Remember how evaluators work. There's a piece that takes expressions and environments,0:10:57
and there's a piece that takes functions, or procedures and arguments. And going back and forth around here is the eval/apply loop.0:11:07
So those are the basic pieces of the eval and apply. Then there's some other things, there's continue. You just saw before how the continue register is used to implement recursion and stack discipline.0:11:19
There's a register that's going to hold the result of some evaluation. And then, besides that, there's one temporary register, called UNEV, which typically, in the evaluator,0:11:29
is going to be used to hold temporary pieces of the expression you're working on, which you haven't gotten around to evaluate yet, right? So there's my machine: a seven-register machine.0:11:40
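Summarizing the seven registers as a sketch (a Python stand-in of mine, with each register's job from the lecture as a comment):

```python
# Sketch: the seven registers as named slots. Everything the
# evaluator does is reads and writes of these, plus the stack.

class Registers:
    def __init__(self):
        self.exp = None    # pointer to the expression to be evaluated
        self.env = None    # environment for that evaluation
        self.fun = None    # procedure about to be applied
        self.argl = None   # list of evaluated arguments
        self.cont = None   # where the machine should go next
        self.val = None    # result of an evaluation
        self.unev = None   # pieces of exp not yet evaluated

regs = Registers()
regs.exp, regs.cont = 1, "done"   # set up to evaluate the expression 1
```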
And, of course, you might want to make a machine with a lot more registers to get better performance, but this is just a tiny, minimal one. Well, how about the data paths?0:11:49
This machine has a lot of special operations for LISP. So, here are some typical data paths.0:12:00
A typical one might be, oh, assign to the VAL register the contents of the EXP register. In terms of those diagrams you saw, that's a little button on0:12:10
some arrow. Here's a more complicated one. It says branch, if the thing in the expression register is a conditional to some label here, called the0:12:21
ev-conditional. And you can imagine this implemented in a lot of different ways. You might imagine this conditional test as a special purpose sub-routine, and conditional might be0:12:32
represented as some data abstraction that you don't care about at this level of detail. So that might be done as a sub-routine. This might be a machine with hardware-types, and0:12:41
conditional might be testing some bits for a particular code. There are all sorts of ways that's beneath the level of abstraction we're looking at.0:12:50
Another kind of operation, and there are a lot of different operations: assign to EXP the first clause of what's in EXP. This might be part of processing a conditional.0:12:59
And, again, first clause is some selector whose details we don't care about. And you can, again, imagine that as a sub-routine which'll do some list operations, or you can imagine that as0:13:09
something that's built directly into hardware. The reason I keep saying you can imagine it built directly into hardware is even though there are a lot of operations,0:13:18
there are still a fixed number of them. I forget how many, maybe 150. So, it's plausible to think of building these directly into hardware. Here's a more complicated one.0:13:28
You can see this has to do with looking up the values of variables. It says assign to the VAL register the result of looking up the variable value of some particular expression, which,0:13:39
in this case, is supposed to be a variable in some environment. And this'll be some operation that searches through the environment structure, however it is represented, and goes0:13:49
and looks up that variable. And, again, that's below the level of detail that we're thinking about. This has to do with the details of the data structures0:13:58
for representing environments. But, anyway, there is this fixed and finite number of operations in the register machine.0:14:08
Well, what's its overall structure? Those are some typical operations. Remember what we have to do, we have to take the0:14:17
meta-circular evaluator-- and here's a piece of the meta-circular evaluator. This is the one using abstract syntax that's in the book.0:14:28
It's a little bit different from the one that Jerry shows you. And the main thing to remember about the evaluator is that0:14:37
it's doing some sort of case analysis on the kinds of expressions: so if it's either self-evaluated, or quoted, or0:14:46
whatever else. And then, in the general case where the expression it's looking at is an application, there's some tricky recursions going on.0:14:55
First of all, eval has to call itself both to evaluate the operator and to evaluate all the operands.0:15:05
So there's this sort of red recursion of eval walking down the tree; that's really the easy recursion. That's just eval walking down this tree of expressions.0:15:14
Then, in the evaluator, there's a hard recursion. There's the red to green. Eval calls apply. That's the case where evaluating a procedure0:15:26
application reduces to applying the procedure to the list of arguments. And then, apply comes over here. Apply takes a procedure and arguments and, in the general0:15:39
case where there's a compound procedure, apply goes around and green calls red. Apply comes around and calls eval again.0:15:48
It evals the body of the procedure in the result of extending the environment, with the parameters of the procedure bound to the arguments.0:15:59
Except in the primitive case, where it just calls something else primitive-apply, which is not really the business of the evaluator. So this sort of red to green, to red to green, that's the0:16:11
eval/apply loop, and that's the thing that we're going to want to see in the evaluator. All right. Well, it won't surprise you at all that the two big pieces of0:16:22
this evaluator correspond to eval and apply. There's a piece called eval-dispatch, and a piece called apply-dispatch.0:16:32
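As a hedged rendering (mine, in Python rather than the book's Scheme), the eval/apply loop just described looks like this, handling only self-evaluating expressions, variables, and applications:

```python
# A minimal sketch of eval and apply calling each other. Compound
# procedures are represented as (params, body, definition-env).

def m_eval(exp, env):
    if isinstance(exp, (int, float)):   # self-evaluating
        return exp
    if isinstance(exp, str):            # variable: look it up
        return env[exp]
    # application: eval the operator and every operand, then apply
    return m_apply(m_eval(exp[0], env),
                   [m_eval(operand, env) for operand in exp[1:]])

def m_apply(proc, args):
    if callable(proc):                  # primitive: just do it
        return proc(*args)
    params, body, defn_env = proc       # compound procedure
    # apply calls back to eval: evaluate the body in the definition
    # environment extended with parameters bound to arguments
    return m_eval(body, {**defn_env, **dict(zip(params, args))})

E0 = {"+": lambda a, b: a + b, "x": 3, "y": 4}
print(m_eval(["+", "x", "y"], E0))            # 7
E0["f"] = (["a", "b"], ["+", "a", "b"], E0)   # (define (f a b) (+ a b))
print(m_eval(["f", "x", "y"], E0))            # 7
```

The red-to-green loop is the mutual recursion between `m_eval` and `m_apply`; the register machine below is this same loop made explicit.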
And, before we get into the details of the code, the way to understand this is to think, again, in terms of these pieces of the evaluator having contracts with the rest of the world.0:16:41
What do they do from the outside before getting into the grungy details? Well, the contract for eval-dispatch--0:16:50
remember, it corresponds to eval. It's got to evaluate an expression in an environment. So, in particular, what this one is going to do, eval-dispatch will assume that, when you call it,0:16:59
the expression you want to evaluate is in the EXP register. The environment in which you want the evaluation to take place is in the ENV register.0:17:09
And continue tells you the place where the machine should go next when the evaluation is done. Eval-dispatch's contract is that it'll actually perform0:17:20
that evaluation, and, at the end of which, it'll end up at the place specified by continue. The result of the evaluation will be in the VAL register.0:17:29
And it just warns you, it makes no promises about what happens to the registers. All other registers might be destroyed. So, there's one piece, OK?0:17:41
The other piece, apply-dispatch, corresponds to apply. It's got to apply a procedure to some arguments, so it assumes that this register, ARGL, contains0:17:52
a list of the evaluated arguments. FUN contains the procedure. Those correspond to the arguments to the apply procedure in the meta-circular evaluator.0:18:03
And apply, in this particular evaluator, we're going to use a discipline which says the place the machine should go to next when apply is done is, at the moment apply-dispatch is0:18:14
called, at the top of the stack. That's just the discipline for the way this particular machine's organized. And now, apply's contract is, given all that:0:18:23
It'll perform the application. The result of that application will end up in VAL. The stack will be popped. And, again, the contents of all the other registers may be0:18:33
destroyed, all right? So that's the basic organization of this machine. Let's break for a little bit and see if there are any questions, and then we'll do a real example.0:19:47
Well, let's take the register machine now, and actually step through, and really, in real detail, so you see completely0:19:57
concrete how some expressions are evaluated, all right? So, let's start with a very simple expression.0:20:09
Let's evaluate the expression 1.0:20:18
And we need an environment, so let's imagine that somewhere there's an environment, we'll call it E,0.0:20:30
And just, since we'll use these later, we obviously don't really need anything to evaluate 1. But, just for reference later, let's assume that E,0 has in0:20:40
it an X that's bound to 3 and a Y that's bound to 4, OK?0:20:49
And now what we're going to do is we're going to evaluate 1 in this environment, and so the ENV register has a pointer0:20:59
to this environment, E,0, all right? So let's watch that thing go. What I'm going to do is step through the code.0:21:08
And, let's see, I'll be the controller. And now what I need, since this gets rather complicated, is a very little execution unit. So here's the execution unit, OK?0:21:22
OK. OK. All right, now we're going to start. We're going to start the machine at0:21:31
eval-dispatch, right? That's the beginning of this. Eval-dispatch is going to look at the expression and dispatch, just like eval, where we look at the very first thing.0:21:42
We branch on whether or not this expression is self-evaluating. Self-evaluating is some abstraction we put into the machine--0:21:52
it's going to be true for numbers-- to a place called ev-self-eval, right? So me, being the controller, looks at ev-self-eval, so we'll go over to there.0:22:02
Ev-self-eval says fine, assign to VAL whatever is in the expression register, OK?0:22:15
And I have a bug because what I didn't do when I initialized this machine is also say what's supposed to happen when it's done, so I should have started out the machine with0:22:27
done being in the continue register, OK? So we assign to VAL. And now go to fetch of continue, and [? the value changed. ?]0:22:38
OK. OK, let's try something harder. Let's reset the machine here, and we'll put in the0:22:47
expression register, X, OK?0:22:56
Start again at eval-dispatch. Check, is it self-evaluating? No. Is it a variable? Yes.0:23:05
We go off to ev-variable. It says assign to VAL, look up the variable value in the0:23:14
expression register, OK? Go to fetch of continue.0:23:23
PROFESSOR: Done. PROFESSOR: OK. All right. Well, that's the basic idea. That's a simple operation of the machine.0:23:32
Now, let's actually do something a little bit more interesting. Let's look at the expression the sum of x and y.0:23:49
OK. And now we'll see how you start unrolling these expression trees, OK? Well, start again at eval-dispatch, all right?0:24:04
Self-evaluating? No. Variable? No. All the other special forms which I didn't write down, like quote, and lambda, and set, and whatever, it's none of those.0:24:13
It turns out to be an application, so we go off to ev-application, OK? Ev-application, remember what it's going to do overall.0:24:25
It is going to evaluate the operator. It's going to evaluate the arguments, and then it's going to go apply them.0:24:35
So, before we start, since we're being very literal, we'd better remember that, somewhere in this environment, it's linked to another environment in which plus is0:24:46
bound to the primitive procedure plus before we get an unknown variable in our machine.0:24:55
OK, so we're at ev-application. OK, assign to UNEV the operands of what's in the0:25:05
expression register, OK? Those are the operands. UNEV's a temporary register where we're going to save them. PROFESSOR: I'm assigning. PROFESSOR: Assign to EXP the operator.0:25:18
Now, notice we've destroyed that expression in EXP, but the piece that we need is now in UNEV. OK. Now, we're going to get set up to recursively0:25:27
evaluate the operator. Save the continue register on the stack. Save the environment.0:25:40
Save UNEV. OK, assign to continue a0:25:53
label called eval-args. Now, what have we done? We've set up for a recursive call.0:26:04
We're about to go to eval-dispatch. We've set up for a recursive call to eval-dispatch. What did we do? We took the things we're going to need later, those operands0:26:15
that were in UNEV; the environment in which we're going to eventually have to, maybe, evaluate those operands; the place we eventually want to go to, which, in this case, was done; we've saved them on the stack.0:26:27
The reason we saved them on the stack is because eval-dispatch makes no promises about what registers it may destroy. So all that stuff is saved on the stack. Now, we've set up eval-dispatch's contract.0:26:37
There's a new expression, which is the operator plus; a new environment, although, in this case, it's the same one; and a new place to go to when you're done, which is eval-args.0:26:47
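The discipline just walked through can be snapshotted like this (a sketch of mine with register names as Python variables, not real machine code):

```python
# Save what you will need later, set up eval-dispatch's contract,
# and assume every other register gets clobbered.

stack = []
unev = ["x", "y"]            # the operands
env = {"x": 3, "y": 4}       # E0
cont = "done"                # where we were originally headed

stack.append(cont)           # save continue
stack.append(env)            # save env
stack.append(unev)           # save unev
cont = "eval-args"           # new place to go when eval-dispatch ends

unev = env = None            # eval-dispatch may destroy any register

unev = stack.pop()           # back at eval-args: restore unev
env = stack.pop()            # restore env; continue stays saved
print(unev, stack)           # ['x', 'y'] ['done']
```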
So that's set up. Now, we're going to go off to eval-dispatch. Here we are back at eval-dispatch. It's not self-evaluating. Oh, it's a variable, so we'd better go off to0:26:57
ev-variable, right? Ev-variable is assigned to VAL. Look up the variable value of the expression, OK?0:27:08
So VAL is the primitive procedure plus, OK? And go to fetch of continue. PROFESSOR: Eval-args. PROFESSOR: Right, which is now eval-args, not done.0:27:19
So we come back here at eval-args, and what do we do? We're going to restore the stuff that we saved, so we restore UNEV. And notice, there, it wasn't necessary,0:27:31
although, in general, it would be. It might be some arbitrary evaluation that happened. We restore ENV. OK, we assign to FUN fetch of VAL.0:27:58
OK, now, we're going to go off and start evaluating some arguments. Well, first thing we'd better do is save FUN because some0:28:08
arbitrary stuff might happen in that evaluation. We initialize the argument list. Assign to argl an empty0:28:18
argument list, and go to eval-arg-loop, OK? At eval-arg-loop, the idea of this is we're going to0:28:29
evaluate the pieces of the expressions that are in UNEV, one by one, and move them from unevaluated in UNEV to evaluated in the arg list, OK?0:28:38
So we save argl. We assign to EXP the first operand of the stuff in UNEV.0:28:53
Now, we check and see if that was the last operand. In this case, it is not, all right? So we save the environment.0:29:09
We save UNEV because those are all things we might need later. We're going to need the environment to do some more evaluations. We're going to need UNEV to look at what the rest of those0:29:18
arguments were. We're going to assign continue a place called accumulate-args, or accumulate-arg.0:29:30
OK, now, we've set up for another call to eval-dispatch, OK? All right, now, let me short-circuit this so we don't0:29:39
go through the details of eval-dispatch. Eval-dispatch's contract says I'm going to end up, the world will end up, with the value of evaluating this expression in0:29:48
this environment in the VAL register, and I'll end up there. So we short-circuit all of this, and a 3 ends up in VAL.0:29:58
And, when we return from eval-dispatch, we're going to return to accumulate-arg. PROFESSOR: Accumulate-arg. PROFESSOR: With 3 in the VAL register, OK?0:30:08
So that short-circuited that evaluation. Now, what do we do? We're going to go back and look at the rest of the arguments, so we restore UNEV. We restore0:30:18
ENV. We restore argl.0:30:28
One thing. PROFESSOR: Oops! Parity error. [LAUGHTER] PROFESSOR: Restore argl.0:30:41
PROFESSOR: OK. OK, we assign to argl consing on fetch of the value register0:30:51
to what's in argl. OK, we assign to UNEV the rest of the operands in fetch of0:31:04
UNEV, and we go back to eval-arg-loop. PROFESSOR: Eval-arg-loop. PROFESSOR: OK.0:31:15
Now, we're about to do the next argument, so the first thing we do is save argl.0:31:25
OK, we assign to EXP the first operand of fetch of UNEV. OK,0:31:35
we test and see if that's the last operand. In this case, it is, so we're going to go to a special place that says evaluate the last argument because, notice, after evaluating the argument, we don't need the0:31:45
environment any more. That's going to be the difference. So here, at eval-last-arg, we assign continue to accumulate-last-arg. Now, we're set up again for0:32:06
eval-dispatch. We've got a place to go to when we're done. We've got an expression. We've got an environment. OK, so we'll short-circuit the call to eval-dispatch. And what'll happen is there's a y there, it's 4 in that0:32:18
environment, so VAL will end up with 4 in it. And, then, we're going to end up at accumulate-last-arg, OK? So, at accumulate-last-arg, we restore argl.0:32:41
We assign to argl cons of fetch of the new value onto it, so we cons a 4 onto that. We restore what was saved in the function register.0:32:53
And notice, in this case, it had not been destroyed, but, in general, it will be. And now, we're ready to go off to apply-dispatch, all right?0:33:02
So we've just gone through the eval. We evaluated the argument, the operator, and the arguments, and now, we're about to apply them. So we come off to apply-dispatch here, OK?0:33:17
We come off to apply-dispatch, and we're going to check whether it's a primitive or a compound procedure. PROFESSOR: Yes. PROFESSOR: All right. So, in this case, it's a primitive procedure, and we go0:33:27
off to primitive-apply. So we go off to primitive-apply, and it says assign to VAL the result of applying primitive procedure0:33:38
of the function to the argument list. PROFESSOR: I don't know how to add. I'm just an execution unit. PROFESSOR: Well, I don't know how to add either. I'm just the evaluator, so we need a primitive operator.0:33:48
Let's see, so the primitive operator, what's the sum of 3 and 4? AUDIENCE: 7. PROFESSOR: OK, 7. PROFESSOR: Thank you.0:33:58
PROFESSOR: Now, we restore continue, and we go to fetch0:34:12
of continue. PROFESSOR: Done. PROFESSOR: OK. Well, that was in as much detail as you will ever see. We'll never do it in as much detail again.0:34:21
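The whole walkthrough can be compressed into a hedged Python sketch of the explicit-control evaluator (mine): registers as variables, labels as strings, one dispatch loop, handling only numbers, variables, and primitive applications.

```python
# Compressed sketch of the trace above: eval-dispatch, eval-args,
# eval-arg-loop, accumulate-arg, accumulate-last-arg, apply-dispatch.

def machine(expression, environment):
    exp, env = expression, environment
    fun = argl = unev = val = None
    stack, cont, pc = [], "done", "eval-dispatch"
    while pc != "done":
        if pc == "eval-dispatch":
            if isinstance(exp, int):          # ev-self-eval
                val, pc = exp, cont
            elif isinstance(exp, str):        # ev-variable
                val, pc = env[exp], cont
            else:                             # ev-application
                unev, exp = exp[1:], exp[0]   # operands / operator
                stack += [cont, env, unev]    # save for later
                cont, pc = "eval-args", "eval-dispatch"
        elif pc == "eval-args":
            unev = stack.pop()                # restore unev
            env = stack.pop()                 # restore env
            fun = val                         # the operator's value
            stack.append(fun)                 # save fun
            argl, pc = [], "eval-arg-loop"
        elif pc == "eval-arg-loop":
            stack.append(argl)                # save argl
            exp = unev[0]                     # first operand
            if len(unev) == 1:                # eval-last-arg:
                cont = "accumulate-last-arg"  # env not needed again
            else:
                stack += [env, unev]
                cont = "accumulate-arg"
            pc = "eval-dispatch"
        elif pc == "accumulate-arg":
            unev = stack.pop()
            env = stack.pop()
            argl = [val] + stack.pop()        # cons val onto argl
            unev, pc = unev[1:], "eval-arg-loop"
        elif pc == "accumulate-last-arg":
            argl = [val] + stack.pop()
            fun = stack.pop()                 # restore fun
            pc = "apply-dispatch"
        elif pc == "apply-dispatch":          # primitive-apply only
            val = fun(*reversed(argl))        # argl holds args reversed
            cont = stack.pop()                # restore continue
            pc = cont
    return val

E0 = {"+": lambda a, b: a + b, "x": 3, "y": 4}
print(machine(["+", "x", "y"], E0))  # 7
```

When the loop exits, the stack is empty: the machine really is back in its initial state with the answer in `val`, which is the point made next.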
One very important thing to notice is that we just executed a recursive procedure, right? This whole thing, we used a stack and the0:34:31
evaluator was recursive. A lot of people think the reason that you need a stack and recursion in an evaluator is because you might be evaluating recursive procedures like0:34:40
factorial or Fibonacci. It's not true. So you notice we did recursion here, and all we evaluated was plus X, Y, all right? The reason that you need recursion in the evaluator is0:34:51
because the evaluation process, itself, is recursive, all right? It's not because the procedure that you might be evaluating in LISP is a recursive procedure. So that's an important thing that people get0:35:01
confused about a lot. The other thing to notice is that, when we're done here, we're really done. Not only are we at done, but there's no accumulated stuff0:35:12
on the stack, right? The machine is back to its initial state, all right? So that's part of what it means to be done. Another way to say that is the evaluation process has reduced0:35:26
the expression, plus X, Y, to the value here, 7. And by reduced, I mean a very particular thing.0:35:36
It means that there's nothing left on the stack. The machine is now in the same state, except there's something in the value register. It's not part of a sub-problem of anything. There's nothing to go back to.0:35:46
OK. Let's break. Question? AUDIENCE: The need here for the stack is because the data may be recursive.0:35:55
You may have embedded expressions, for instance. PROFESSOR: Yes, because you might have embedded expressions. But, again, don't confuse that with what people sometimes0:36:06
mean by the data may be recursive, which is to say you have these list-structured, recursive data list operations. That has nothing to do with it. It's simply that the expressions contain0:36:15
sub-expressions. Yeah? AUDIENCE: Why is it that the order of the arguments in the arg list got reversed? PROFESSOR: Ah! Yes, I should've mentioned that.0:36:27
Here, the reason the order is reversed-- it's a question of what you mean by reversed.0:36:36
I believe it was Newton. In the very early part of optics, people realized that, when you look through the lens of your eye, the image was0:36:46
up-side down. And there was a lot of argument about why that didn't mean you saw things up-side down. So it's sort of the same issue. Reversed from what? So we just need some convention.0:36:57
The reason that they're coming out 4, 3 is because we're taking UNEV and consing the result onto argl. So you have to realize you've made that convention.0:37:06
The place that you have to realize that-- well, there's actually two places. One is in apply-primitive-operator, which has to realize that the arguments to primitives go in,0:37:16
in the opposite order from the way you're writing them down. And the other one is, we'll see later when you actually go to bind a function's parameters, you should realize the arguments are going to come in from the opposite0:37:26
order of the variables to which you're binding them. So, if you just keep track of that, there's no problem. Also, this is completely arbitrary because, if we'd done, say, an iteration through a vector assigning0:37:36
them, they might come out in the other order, OK? So it's just a convention of the way this particular evaluator works.0:37:45
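The convention in question is easy to see in a two-line sketch (mine): consing each evaluated argument onto the front of argl puts the last-evaluated one first.

```python
# Why argl comes out reversed: cons puts each value on the front.
argl = []                  # start with an empty argument list
for value in [3, 4]:       # evaluate x, then y
    argl = [value] + argl  # cons each result onto argl
print(argl)                # [4, 3] -- last argument first
```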
All right, let's take a break.0:38:41
We just saw evaluating an expression and, of course, that was very simple one. But, in essence, it would be no different if it was some0:38:51
big nested expression, so there would just be deeper recursion on the stack. But what I want to do now is show you the last piece. I want to walk you around this eval and apply loop, right?0:39:01
That's the thing we haven't seen, really. We haven't seen any compound procedures where applying a procedure reduces to evaluating the body of the0:39:11
procedure, so let's just suppose we had this. Suppose we were looking at the procedure define F of A and B0:39:29
to be the sum of A and B. So, say we typed in that procedure previously, and now we're going to evaluate F of X and0:39:41
Y, again, in this environment, E,0, where X is bound to 3 and Y is bound to 4.0:39:50
When the define is executed, remember, there's a lambda here, and lambdas create procedures. And, basically, what will happen is, in E,0, we'll end0:40:01
up with a binding for F, which will say F is a procedure, and its args are A and B, and its body is plus a,b.0:40:18
So that's what the environment would have looked like had we made that definition. Then, when we go to evaluate F of X and Y, we'll go through0:40:29
exactly the same process that we did before. It's even the same expression. The only difference is that F, instead of having primitive plus in it, will have this thing.0:40:41
And so we'll go through exactly the same process, except this time, when we end up at apply-dispatch, the function register, instead of having primitive plus, will0:40:50
have a thing that will represent it saying procedure, where the args are A and B, and the body is plus A, B.0:41:08
And, again, what I mean by its ENV is there's a pointer to it, so don't worry that I'm writing a lot of stuff there. There's a pointer to this procedure data structure.0:41:17
OK, so, we're in exactly the same situation. We get to apply-dispatch, so, here, we come to apply-dispatch.0:41:26
Last time, we branched off to a primitive procedure. Here, it says oh, we now have a compound procedure, so we're going to go off to compound-apply.0:41:38
Now, what's compound-apply? Well, remember what the meta-circular evaluator did? Compound-apply said we're going to evaluate the body of0:41:50
the procedure in some new environment. Where does that new environment come from? We take the environment that was packaged with the0:42:00
procedure, we bind the parameters of the procedure to the arguments that we're passing in, and use that as a0:42:10
new frame to extend the procedure environment. And that's the environment in which we evaluate the procedure body, right?0:42:21
That's going around the apply/eval loop. That's apply coming back to call eval, all right?0:42:30
OK. So, now, that's all we have to do in compound-apply. What are we going to do? We're going to manufacture a new environment.0:42:43
And we're going to manufacture a new environment, let's see, that we'll call E,1.0:42:53
E,1 is going to be some environment where the parameters of the procedure, where A is bound to 3 and B is0:43:02
bound to 4, and it's linked to E,0 because that's where f is defined. And, in this environment, we're going to evaluate the0:43:11
body of the procedure. So let's look at that, all right? All right, here we are at compound-apply, which says0:43:20
assign to the expression register the body of the procedure that's in the function register. So I assign to the expression register the0:43:31
procedure body, OK?0:43:42
That's going to be evaluated in an environment which is formed by making some bindings using information determined0:43:53
by the procedure-- that's what's in FUN-- and the argument list. And let's not worry about exactly what that does, but you can see the information's there. So make bindings will say oh, the procedure, itself, had an0:44:06
environment attached to it. I didn't write that quite here. I should've said in environment because every procedure gets built with an environment. So, from that environment, it knows what the procedure's0:44:17
definition environment is. It knows what the arguments are. It looks at argl, and then you see a reversal convention here. It just has to know that argl is reversed, and it builds0:44:27
this frame, E1. All right, so, let's assume that that's what make bindings returns, so it assigns to ENV this thing, E1.0:44:41
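The frame-building step in compound-apply can be sketched as a toy model. This is an illustrative Python stand-in for the lecture's register machine, not the actual evaluator; the names make_frame and lookup are made up:

```python
# Toy model of compound-apply's environment extension: a procedure
# carries its definition environment, and applying it builds a new
# frame binding parameters to arguments, linked to that environment.

def make_frame(params, args, enclosing):
    """Build E1: bind each parameter to its argument, link to E0."""
    return {"bindings": dict(zip(params, args)), "enclosing": enclosing}

def lookup(name, env):
    """Walk the chain of frames, innermost first."""
    while env is not None:
        if name in env["bindings"]:
            return env["bindings"][name]
        env = env["enclosing"]
    raise NameError(name)

# E0 is where f was defined; applying (f 3 4) with parameters (a b)
# manufactures E1 hanging off E0.
E0 = {"bindings": {"f": "<procedure f>"}, "enclosing": None}
E1 = make_frame(["a", "b"], [3, 4], E0)
```

Looking up a or b finds the new bindings in E1; looking up f falls through to E0, exactly because E1 is linked to the procedure's definition environment.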
All right, the next thing it says is restore continue. Remember what continue was here? It got put up in the last segment.0:44:52
Continue got stored. That was the original done, which said what are you going to do after you're done with this particular application? It was one of the very first things that happened when we0:45:01
evaluated the application. And now, finally, we're going to restore continue. Remember apply-dispatch's contract. It assumes that where it should go to next was on the0:45:11
stack, and there it was on the stack. Continue now holds done, and we're going to go back to eval-dispatch. We're set up again.0:45:20
We have an expression, an environment, and a place to go to. We're not going to go through that because it's sort of the same expression.0:45:35
OK, but the thing, again, to notice is, at this point, we have reduced the original expression, (f x y), right?0:45:44
We've reduced evaluating (f x y) in environment E0 to evaluating (+ a b) in E1. And notice, nothing's on the stack, right?0:45:55
It's a reduction. At this point, the machine does not contain, as part of its state, the fact that it's in the middle of evaluating some procedure called f, that's gone, right?0:46:08
There's no accumulated state, OK? Again, that's a very important idea. That's the meaning of, when we used to write in the0:46:17
substitution model, this expression reduces to that expression. And you don't have to remember anything. And here, you see the meaning of reduction. At this point, there is nothing on the stack.0:46:31
See, that has very important consequences. Let's go back and look at iterative factorial, all right?0:46:40
Remember, this was some sort of loop and doing iter. And we kept saying that's an iterative procedure, right?0:46:52
And what we wrote, remember, are things like, we said,0:47:04
(fact-iter 5). We wrote things like, this reduces to (iter 1 1 5),0:47:19
which reduces to (iter 1 2 5), and so on, and so on, and so on. And we kept saying well, look, you don't have to build up any0:47:29
storage to do that. And we waved our hands, and said in principle, there's no storage needed. Now, you see no storage needed. Each of these is a real reduction, right?0:47:49
As you walk through these expressions, what you'll see are these expressions on the stack in some particular environment, and then these expressions in the EXP0:48:00
register in some particular environment. And, at each point, there'll be no accumulated stuff on the stack because each one's a real reduction, OK?0:48:09
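In ordinary code, that reduction sequence is just a loop: the complete state at every step is the pair of values product and counter, and nothing is ever saved on a stack. Here is a Python rendering of the idea (an illustration, not the lecture's Scheme):

```python
def fact_iter(n):
    # Each pass through the loop is one "reduction":
    # (iter product counter n) -> (iter (* counter product) (+ counter 1) n).
    # The entire machine state is these two variables; no stack grows.
    product, counter = 1, 1
    while counter <= n:
        product, counter = counter * product, counter + 1
    return product
```

Because each step replaces the state rather than remembering a pending return, the space used is constant no matter how large n gets.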
All right, so, for example, just to go through it in a little bit more care, if I start out with an expression that says something like, oh, say, (fact-iter 5) in some0:48:33
environment that will, at some point, create an environment0:48:46
in which n is bound to 5. Let's call that--0:48:55
And, at some point, the machine will reduce this whole thing to a thing that says that's really (iter 1 1 n),0:49:08
evaluated in this environment, E1, with nothing on the stack.0:49:17
See, at this moment, the machine is not remembering that evaluating this expression, iter-- which is the loop-- is part of this thing0:49:27
called iterative factorial. It's not remembering that. It's just reducing the expression to that, right? If we look again at the body of iterative factorial, this0:49:38
expression has reduced to that expression. Oh, I shouldn't have the n there. It's a slightly different convention from the slide to0:49:48
the program, OK? And, then, what's the body of iter? Well, iter's going to be an if, and I won't go through the0:49:59
details of if. It'll evaluate the predicate. In this case, it'll be false. And this iter will now reduce to the expression (iter0:50:14
(* counter product) (+ counter 1)) in some other environment, by this time,0:50:30
E2, where E2 will be set up having bindings for product and counter, right?0:50:43
And it'll reduce to that, right? It won't be remembering that it's part of something that it has to return to. And when iter calls iter again, it'll reduce to another thing that looks like this in some environment, E3, which0:50:55
has new bindings for product and counter. So, if you're wondering, see, if you've always been queasy0:51:08
about how it is we've been saying those procedures, that look syntactically recursive, are, in fact, iterative, run in constant space, well, I don't know if this makes you0:51:19
less queasy, but at least it shows you what's happening. There really isn't any buildup there. Now, you might ask well, is there buildup in principle in0:51:28
these environment frames? And the answer is yeah, you have to make these new environment frames, but you don't have to hang onto them when you're done. They can be garbage collected, or the space can be reused0:51:39
automatically. But you see the control structure of the evaluator is really using this idea that you actually have a reduction, so these procedures really are iterative procedures.0:51:50
All right, let's stop for questions.0:52:02
All right, let's break.0:52:48
Let me contrast the iterative procedure just so you'll see where space does build up with a recursive procedure, so you can see the difference.0:52:58
Let's look at the evaluation of recursive factorial, all right? So, here's fact-recursive, or standard factorial definition.0:53:07
We said this one is still a recursive procedure, but this is actually a recursive process. And then, just to link it back to the way we started, we said0:53:17
oh, you can see that it's going to be recursive process by the substitution model because, if I say recursive factorial of 5, that turns into 5 times--0:53:36
what is it, fact-rec, or recursive fact-- 5 times recursive factorial of 4, which turns into 5 times 40:53:54
times fact-rec of 3, which turns into 5 times 4 times 30:54:08
times, and so on, right? The idea is there was this chain of stuff building up,0:54:18
which justified, in the substitution model, the fact that it's recursive. And now, let's actually see that chain of stuff build up and where it is in the machine, OK?0:54:27
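That chain of deferred multiplications can itself be simulated by making the stack explicit: each call pushes its pending "multiply by n" on the way down, and the multiplications happen only on the way back out. This is an illustrative Python model; the evaluator's real stack also holds continue, the operator, and argl, which this sketch omits:

```python
def fact_rec_explicit(n):
    # Going "down": each pending (* n ...) is pushed, just as the
    # evaluator saves the operator and the other argument on its
    # stack before recurring.
    stack = []
    while n > 0:
        stack.append(n)
        n -= 1
    # Base case reached with n deferred multiplications stacked up.
    result = 1
    while stack:
        result *= stack.pop()   # accumulate the last argument
    return result
```

The stack here grows to length n before any multiplication happens, which is exactly the linear buildup the substitution model shows as 5 * (4 * (3 * ...)).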
All right, well, let's imagine we're going to start out again. We'll tell it to evaluate recursive factorial of 5 in0:54:41
some environment, again, E0 where recursive factorial is defined, OK? Well, now we know what's eventually going to happen.0:54:52
This is going to come along, it'll evaluate those things, figure out it's a procedure, build somewhere over here an environment, E1, which has n bound to 5, which hangs off of0:55:05
E0, which would be, presumably, the definition environment of recursive factorial, OK?0:55:14
And, in this environment, it's going to go off and evaluate the body. So, again, the evaluation here will reduce to evaluating the0:55:27
body in E1. That's going to look at an if, and I won't go through the details of if. It'll look at the predicate. It'll decide it eventually has to evaluate the alternative.0:55:37
So this whole thing, again, will reduce to the alternative of recursive factorial, the alternative clause, which says0:55:47
that this whole thing reduces to times n of recursive0:55:56
factorial of n minus 1 in the environment E1, OK?0:56:08
So the original expression, now, is going to reduce to evaluating that expression, all right? Now we have an application. We did an application before.0:56:18
Remember what happens in an application? The first thing you do is you go off and you save the value of the continue register on the stack. So the stack here is going to have done in it.0:56:29
And then you're going to set up to evaluate the sub-parts, OK? So here we go off to evaluate the sub-parts.0:56:39
First thing we're going to do is evaluate the operator. What happens when we evaluate an operator? Well, we arrange things so that the operator ends up in0:56:49
the expression register. The environments in the ENV register continue someplace where we're going to go evaluate the arguments. And, on the stack, we've saved the original continue, which0:56:59
is where we wanted to be when we're all done. And then the things we needed when we're going to get done evaluating the operator, the things we'll need to evaluate the arguments, namely, the environment and those0:57:11
arguments, those unevaluated arguments, so there they are sitting on the stack. And we're about to go off to evaluate the operator.0:57:23
Well, when we return from this particular call-- so we're about to call eval-dispatch here-- when we return from this call, the value of that operator,0:57:32
which, in this case, is going to be the primitive multiplier procedure, will end up in the FUN register, all right?0:57:43
We're going to evaluate some arguments. They will evaluate in here. That'll give us 5, in this case. We're going to put that in the argl register, and then we'll0:57:53
go off to evaluate the second operand. So, at the point where we go off to evaluate the second operand-- and I'll skip details like computing, and0:58:02
minus 1, and all of that-- but, when we go off to evaluate the second operand, that will eventually reduce to another call to fact-recursive.0:58:12
And, what we've got on the stack here is the operator from that combination that we're going to use it in and the other argument, OK?0:58:23
So, now, we're set up for another call to recursive factorial. And, when we're done with this one, we're going to go to0:58:32
accumulate the last arg. And remember what that'll do? That'll say oh, whatever the result of this has to get combined with that, and we're going to multiply them.0:58:41
But, notice now, we're at another recursive factorial. We're about to call eval-dispatch again, except we haven't really reduced it because there's stuff0:58:51
on the stack now. The stuff on the stack says oh, when you get back, you'd better multiply it by the 5 you had hanging there. So, when we go off to make another call, we0:59:07
evaluate the n minus 1. That gives us another environment in which the new n's going to be down to 4. And we're about to call eval-dispatch again, right?0:59:18
We get another call. That 4 is going to end up in the same situation. We'll end up with another call to fact-recursive n.0:59:30
And sitting on the stack will be the stuff from the original one and, now, the subsidiary one we're doing. And both of them are waiting for the same thing. They're going to go to accumulate a last argument.0:59:40
And then, of course, when we go to the fourth call, the same thing happens, right? And this goes on, and on, and on. And what you see here on the stack, exactly what's sitting0:59:51
here on the stack, the thing that says times and 5. And what you're going to do with that is accumulate that into a last argument.1:00:00
That's exactly this, right? This is exactly where that stuff is hanging. Effectively, the operator you're going to apply, the1:00:12
other argument that it's got to be multiplied by when you get back and the parentheses, which says yeah, what you wanted to do was accumulate them. So, you see, the substitution model is not such a lie.1:00:22
That really is, in some sense, what's sitting right on the stack. OK. All right, so that, in some sense, should explain for you,1:00:33
or at least convince you, that, somehow, this evaluator is managing to take these procedures and execute some of them iteratively and some of them recursively, even though,1:00:46
syntactically, they look like recursive procedures. How's it managing to do that? Well, the basic reason is that the evaluator is set up to save only what it needs later.1:01:01
So, for example, at the point where you've reduced evaluating an expression and an environment to applying a procedure to some arguments, it doesn't need that original1:01:11
environment anymore because any environment stuff will be packaged inside the procedures where the application's going to happen.1:01:20
All right, similarly, when you're going along evaluating an argument list, when you've finished evaluating the list, when you're finished evaluating the last argument, you don't need that argument list any more, right?1:01:31
And you don't need the environment where those arguments would be evaluated, OK? So the basic reason that this interpreter is being so smart1:01:40
is that it's not being smart at all, it's being stupid. It's just saying I'm only going to save what I really need. Well, let me show you here.1:01:54
Here's the actual thing that's making it tail recursive. Remember, it's the restore of continue. It's saying when I go off to evaluate the procedure body, I1:02:09
should tell eval to come back to the place where that original evaluation was supposed to come back to. So, in some sense, you want to say what's the actual line that makes it tail recursive?1:02:18
It's that one. If I wanted to build a non-tail recursive evaluator, for some strange reason, all I would need to do is, instead1:02:27
of restoring continue at this point, I'd set up a label down here called, "Where to come back after you've finished applying the procedure." Instead, I'd1:02:38
set continue to that. I'd go to eval-dispatch, and then eval-dispatch would come back here. At that point, I would restore continue and go to the original one.1:02:47
So here, the only consequence of that would be to make it non-tail recursive. It would give you exactly the same answers, except, if you did that iterative factorial and all those iterative1:02:57
procedures, it would execute recursively. Well, I lied to you a little bit, but just a little bit, because I showed you a slightly over-simplified1:03:07
evaluator where it assumes that each procedure body has only one expression. Remember, in general, a procedure has a sequence of expressions in it.1:03:17
So there's nothing really conceptually new. Let me just show you the actual evaluator that handles sequences of expressions.1:03:28
This is compound-apply now, and the only difference from the old one is that, instead of going off to eval directly, it takes the whole body of the procedure, which, in this1:03:38
case, is a sequence of expressions, and goes off to eval-sequence. And eval-sequence is a little loop that, basically, does1:03:48
these evaluations one at a time. So it does an evaluation. Says oh, when I come back, I'd better come back here to do the next one.1:03:58
And, when I'm all done, when I want to get the last expression, I just restore my continue and go off to eval-dispatch. And, again, if you wanted for some reason to break tail1:04:08
recursion in this evaluator, all you need to do is not handle the last expression, especially. Just say, after you've done the last expression, come back1:04:17
to some other place after which you restore continue. And, for some reason, a lot of LISP evaluators tended to work that way.1:04:26
And the only consequence of that is that iterative procedures built up stack. And it's not clear why that happened.1:04:35
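The difference the placement of restore continue makes can be modeled with a counter: in the tail-recursive arrangement, nothing is pushed for the final call in each body, while the broken arrangement pushes a fresh return label on every call. This is a toy Python model of the two conventions; the label name "after-apply" is made up:

```python
def run_iter(n, tail_recursive=True):
    # Simulate the evaluator's stack while fact-iter loops n times.
    # A tail-recursive evaluator restores continue before jumping to
    # eval-dispatch, so the loop pushes nothing; the broken one pushes
    # a new return label per call and unwinds them only at the end.
    stack, max_depth = [], 0
    product, counter = 1, 1
    while counter <= n:
        if not tail_recursive:
            stack.append("after-apply")   # extra return point saved
        product, counter = counter * product, counter + 1
        max_depth = max(max_depth, len(stack))
    while stack:   # the broken evaluator pops these all at the very end
        stack.pop()
    return product, max_depth
```

Both variants produce the same answer; only the maximum stack depth differs, which is exactly the sense in which misplacing the restore gives the same results while making iterative procedures build up stack.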
All right. Well, let me just sort of summarize, since this is a lot of details in a big program. But the main point is that it's no different,1:04:44
conceptually, from translating any other program. And the main idea is that we have this universal evaluator program, the meta-circular evaluator. If we translate that into LISP, then1:04:53
we have all of LISP. And that's all we did, OK? The second point is that the magic's gone away. There should be no more magic in this whole system, right?1:05:04
In principle, it should all be very clear except, maybe, for how list structured memory works, and we'll see that later. But that's not very hard.1:05:15
The third point is that all this tail recursion came from the discipline of eval being very careful to save only what it needs next time.1:05:25
It's not some arbitrary thing where we're saying well, whenever we call a sub-routine, we'll save all the registers in the world and come back, right? See, sometimes it pays to really worry about efficiency.1:05:37
And, when you're down in the guts of your evaluator machine, it really pays to think about things like that because it makes big consequences. Well, I hope what this has done is really made the1:05:49
evaluator seem concrete, right? I hope you really believe that somebody could hold a LISP evaluator in the palm of their hand.1:05:59
Maybe to help you believe that, here's a LISP evaluator that I'm holding in the palm of my hand, right? And this is a chip which is actually quite a bit more1:06:11
complicated than the evaluator I showed you. Maybe, here's a better picture of it.1:06:22
You can see the same overall structure here. This is a register array. These are the data paths. Here's a finite state controller. And again, finite state, that's all there is.1:06:32
And somewhere there's external memory that'll worry about things. And this particular one is very complicated because it's trying to run LISP fast. And it has some very, very fast1:06:41
parallel operations in there like, if you want to index into an array, simultaneously check that the index is an1:06:50
integer, check that it doesn't exceed the array bounds, and go off and do the memory access, and do all those things simultaneously. And then, later, if they're all OK, actually get the value there.1:07:00
So there are a lot of complicated operations in these data paths for making LISP run in parallel. It's a completely non-RISC philosophy of evaluating LISP.1:07:10
And then, this microcode is pretty complicated. Let's see, there's what? There's about 389 instructions of 220-bit microcode sitting1:07:23
here because these are very complicated data paths. And the whole thing has about 89,000 transistors, OK?1:07:33
OK. Well, I hope that that takes away a lot of the mystery. Maybe somebody wants to look at this.1:07:42
Yeah. OK. Let's stop.1:07:55
Questions? AUDIENCE: OK, now, it sounds like what you're saying is that, with the restore continue put in the proper place, that procedures that would invoke a recursive1:08:08
process now invoke an iterative process just by the way that the evaluator is written? PROFESSOR: I think the way I'd prefer to put it is that, with1:08:17
restore continue put in the wrong place, you can cause any syntactically-looking recursive procedure, in fact, to build up stack as it runs.1:08:28
But there's no reason for that, so you might want to play around with it. You can just switch around two or three instructions in the1:08:38
way compound-apply comes back, and you'll get something which isn't tail recursive. But the thing I wanted to emphasize is there's no magic.1:08:47
It's not as if there's some very clever pre-processing program that's looking at this procedure, factorial iter, and say oh, gee, I really notice that I don't have to push1:08:59
stack in order to do this. Some people think that that's what's going on. It's something much, much more dumb than that, it's this one place you're putting the restore instruction.1:09:08
It's just automatic. AUDIENCE: OK. AUDIENCE: But that's not affecting the time complexity is it?1:09:17
PROFESSOR: No. AUDIENCE: It's just that it's handling it recursively instead of iteratively. But, in terms of the order of time it takes to finish the1:09:26
operation, it's the same one way or the other, right? PROFESSOR: Yes. Tail recursion is not going to change the time complexity of anything because, in some sense, it's the same algorithm that's going on.1:09:36
What it's doing is really making this thing run as an iteration, right? Not going to run out of memory counting up to a giant number simply because the stack would get pushed.1:09:47
See, the thing you really have to believe is that, when we write-- see, we've been writing all these things called iterations, infinite loops, define loop to be called loop.1:10:01
That is as much an iteration as if we wrote do forever loop, right? The difference is just syntactic sugar. These things are real, honest to god, iterations, right?1:10:14
They don't change the time complexity, but they turn them into real iterations. All right, thank you.0:00:00
Lecture 10A | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC PLAYING]0:00:20
PROFESSOR: Last time, we took a look at an explicit control evaluator for Lisp, and that bridged the gap between all these high-level languages like Lisp and the query0:00:30
language and all of that stuff, bridged the gap between that and a conventional register machine. And in fact, you can think of the explicit control evaluator0:00:40
either as, say, the code for a Lisp interpreter if you wanted to implement it in the assembly language of some conventional register transfer machine, or, if you like, you0:00:50
can think of it as the microcode of some machine that's going to be specially designed to run Lisp. In either case, what we're doing is we're taking a machine that speaks some low-level language, and we're0:01:01
raising the machine to a high-level language like Lisp by writing an interpreter. So for instance, here, conceptually, is a special0:01:21
purpose machine for computing factorials. It takes in five and puts out 120. And what this special purpose machine is is actually a Lisp0:01:32
interpreter that's configured itself to run factorials, because you fed into it a description of the factorial machine.0:01:42
So that's what an interpreter is. It configures itself to emulate a machine whose description you read in. Now, inside the Lisp interpreter, what's that?0:01:52
Well, that might be your general register language interpreter that configures itself to behave like a Lisp interpreter, because you put in a whole bunch of0:02:01
instructions in register language. This is the explicit control evaluator. And then it also has some sort of library, a library of primitive operators and Lisp operations and all sorts of0:02:11
things like that. That's the general strategy of interpretation. And the point is, what we're doing is we're writing an interpreter to raise the machine to the level of the0:02:24
programs that we want to write. Well, there's another strategy, a different one, which is compilation. Compilation's a little bit different. Here we might have produced a special purpose0:02:37
machine for computing factorials, starting with some sort of machine that speaks register language, except0:02:46
we're going to do a different strategy. We take our factorial program. We use that as the source code into a compiler. What the compiler will do is translate that factorial0:02:57
program into some register machine language. And this will now be not the explicit control evaluator for Lisp, this will be some register language for computing factorials.0:03:06
So this is the translation of that. That will go into some sort of loader which will combine this code with code selected from the library to do things like0:03:17
primitive multiplication. And then we'll produce a load module which configures the register language machine to be a special purpose factorial machine.0:03:28
So that's a different strategy. In interpretation, we're raising the machine to the level of our language, like Lisp. In compilation, we're taking our program and lowering it to0:03:38
the language that's spoken by the machine. Well, how do these two strategies compare? The compiler can produce code that will execute more0:03:48
efficiently. The essential reason for that is that if you think about the register operations that are running, the interpreter has0:04:02
to produce register operations which, in principle, are going to be general enough to execute any Lisp procedure. Whereas the compiler only has to worry about producing a0:04:12
special bunch of register operations for doing the particular Lisp procedure that you've compiled. Or another way to say that is that the interpreter is a0:04:23
general purpose simulator: when you read in a Lisp procedure, it can simulate the program described by that procedure. So the interpreter is worrying about making a general purpose0:04:33
simulator, whereas the compiler, in effect, is configuring the thing to be the machine that the interpreter would have been simulating. So the compiler can be faster.0:04:52
On the other hand, the interpreter is a nicer environment for debugging. And the reason for that is that we've got the source code0:05:02
actually there. We're interpreting it. That's what we're working with. And we also have the library around. See, the library sitting there is part of the interpreter.0:05:11
The compiler only pulls out from the library what it needs to run the program. So if you're in the middle of debugging, and you might like to write a little extra program to examine some run0:05:21
time data structure or to produce some computation that you didn't think of when you wrote the program, the interpreter can do that perfectly well, whereas the compiler can't. So there are sort of dual advantages.0:05:31
The compiler will produce code that executes faster. The interpreter is a better environment for debugging. And most Lisp systems end up having both, end up being0:05:43
configured so you have an interpreter that you use when you're developing your code. Then you can speed it up by compiling. And very often, you can arrange that compiled code and interpreted code can call each other.0:05:54
We'll see how to do that. That's not hard. In fact, the way we'll--0:06:04
in the compiler we're going to make, the way we'll arrange for compiled code and interpreted code to call each other is that we'll have the compiler use exactly the same register conventions as the interpreter.0:06:18
Well, the idea of a compiler is very much like the idea of an interpreter or evaluator. It's the same thing.0:06:27
See, the evaluator walks over the code and performs some register operations. That's what we did yesterday.0:06:37
Well, the compiler essentially would like to walk over the code and produce the register operations that the evaluator would have done were it evaluating the thing.0:06:48
And that gives us a model for how to implement a zeroth-order compiler, a very bad compiler but0:06:57
essentially a compiler. A model for doing that is you just take the evaluator, you run it over the code, but instead of executing the actual operations, you just save them away.0:07:07
And that's your compiled code. So let me give you an example of that. Suppose we're going to compile--suppose we want to compile the expression f of x.0:07:25
So let's assume that we've got f of x in the x register and something in the environment register. And now imagine starting up the evaluator.0:07:34
Well, it looks at the expression and it sees that it's an application. And it branches to a place in the evaluator code we saw0:07:43
called ev-application. And then it begins. It stores away the operands in unev, and then it's going to put the operator in exp, and it's going to go0:07:53
recursively evaluate it. That's the process that we walk through. And if you start looking at the code, you start seeing some register operations. You see assign to unev the operands, assign to exp the0:08:03
operator, save the environment, generate that, and so on. Well, if we look on the overhead here, we can see0:08:16
those operations starting to be produced. Here's sort of the first real operation that the evaluator would have done. It pulls the operands out of the exp register and assigns0:08:27
it to unev. And then it assigns something to the expression register, and it saves continue, and it saves env. And all I'm doing here is writing down the register0:08:38
assignments that the evaluator would have done in executing that code. And we can zoom out a little bit. Altogether, there are about 19 operations there.0:08:49
And this will be the piece of code up until the point where the evaluator branches off to apply-dispatch. And in fact, in this compiler, we're not going to worry about0:09:00
apply-dispatch at all. We're going to have both interpreted code and compiled code always apply procedures by going to apply-dispatch.0:09:10
That will easily allow interpreted code and compiled code to call each other. Well, in principle, that's all we need to do.0:09:21
You just run the evaluator. So the compiler's a lot like the evaluator. You run it, except it stashes away these operations instead of actually executing them. Well, that's not quite true.0:09:32
There's only one little lie in that. What you have to worry about is if you have a predicate. If you have some kind of test you want to do, obviously, at0:09:44
the point when you're compiling it, you don't know which branch of these--of a conditional like this you're going to do. So you can't say which one the evaluator would have done.0:09:55
So all you do there is very simple. You compile both branches. So you compile a structure that looks like this. That'll compile into something that says, the code0:10:08
for P. And it puts its results in, say, the val register.0:10:18
So you walk the interpreter over the predicate and make sure that the result would go into the val register. And then you compile an instruction that says, branch0:10:30
if val is true, to a place we'll call label one.0:10:44
Then we will put the code for B-- walk the interpreter over B. And then0:10:54
put in an instruction that says, go to the next thing, whatever was supposed to happen after this0:11:03
thing was done. You put in that instruction. And here you put label one. And here you put the code for A. And you0:11:19
put go to next thing.0:11:31
So that's how you treat a conditional. You generate a little block like that. And other than that, this zeroth-order compiler is the0:11:40
same as the evaluator. It's just stashing away the instructions instead of executing them. That seems pretty simple, but we've gained something by that.0:11:50
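A zeroth-order compiler along those lines can be sketched directly: walk the expression, record the instructions instead of executing them, and for a conditional emit both branches behind fresh labels. This is a toy Python sketch with made-up instruction syntax, not the lecture's actual register language:

```python
label_count = 0

def fresh_label():
    """Generate label-1, label-2, ... for branch targets."""
    global label_count
    label_count += 1
    return f"label-{label_count}"

def compile_expr(expr):
    """Record the register operations; the result lands in val."""
    if isinstance(expr, int):                       # a constant
        return [f"(assign val (const {expr}))"]
    if isinstance(expr, str):                       # a variable
        return [f"(assign val (lookup-variable-value '{expr} env))"]
    if expr[0] == "if":                             # (if p A B)
        _, p, consequent, alternative = expr
        true_branch, done = fresh_label(), fresh_label()
        return (compile_expr(p)                     # code for P -> val
                + [f"(branch (test val) {true_branch})"]
                + compile_expr(alternative)         # code for B
                + [f"(goto {done})", f"{true_branch}:"]
                + compile_expr(consequent)          # code for A
                + [f"{done}:"])
    raise ValueError(f"unknown expression: {expr!r}")
```

Compiling ("if", "p", 1, 2) yields exactly the block drawn on the board: code for P, a branch on val, code for B followed by a goto, then label one, code for A, and the exit label.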
See, already that's going to be more efficient than the evaluator. Because, if you watch the evaluator run, it's not only generating the register operations we wrote down, it's0:12:01
also doing things to decide which ones to generate. So the very first thing it does, say, here for instance, is go do some tests and decide that this is an application,0:12:13
and then branch off to the place that, that handles applications. In other words, what the evaluator's doing is simultaneously analyzing the code to see what to do, and0:12:23
running these operations. And when you-- if you run the evaluator a million times, that analysis phase happens a million times, whereas in the compiler, it's happened once, and then you just have the register0:12:33
operations themselves. Ok, that's a, a zeroth-order compiler, but it is a0:12:42
wretched, wretched compiler. It's really dumb. Let's go back and look at this overhead.0:12:52
So look at some of the operations this thing is doing. We're supposedly looking at the operations and0:13:01
interpreting f of x. Now, look here what it's doing. For example, here it assigns to exp the0:13:10
operator in fetch of exp. But see, there's no reason to do that, because this is-- the compiler knows that the operator, fetch of exp, is f0:13:21
right here. So there's no reason why this instruction should say that. It should say, we'll assign to exp, f. Or in fact, you don't need exp at all.0:13:32
There's no reason it should have exp at all. What, what did exp get used for? Well, if we come down here, we're going to assign to val,0:13:43
look up the stuff in exp in the environment. So what we really should do is get rid of the exp register altogether, and just change this instruction to say,0:13:53
assign to val, look up the variable value of the symbol f in the environment. Similarly, back up here, we don't need unev at all,0:14:04
because we know what the operands of fetch of exp are for this piece of code. It's the list x.0:14:13
So in some sense, you don't want unev and exp at all. See, what they really are in some sense, those aren't0:14:22
registers of the actual machine that's supposed to run. Those are registers that have to do with arranging the thing that can simulate that machine. So they're always going to hold expressions which, from0:14:34
the compiler's point of view, are just constants, so can be put right into the code. So you can forget about all the operations worrying about exp and unev and just use those constants.0:14:44
Similarly, again, if we go back and look here, there are things like assign to continue eval-args.0:14:53
Now, that has nothing to do with anything. That was just the evaluator keeping track of where it should go next, to evaluate the arguments in some0:15:05
application. But of course, that's irrelevant to the compiler, because the analysis phase will have already done that.0:15:15
So this is completely irrelevant. So a lot of these assignments to continue have nothing to do with where the running machine is supposed to0:15:24
continue in keeping track of its state. They have to do with where the evaluator analysis should continue, and those are completely irrelevant. So we can get rid of them.0:15:44
OK, well, if we simply do that, make those kinds of optimizations, get rid of worrying about exp and unev, and get rid of these irrelevant register0:15:55
assignments to continue, then we can take this literal code, these sort of 19 instructions that the evaluator0:16:05
would have done, and then replace them. Let's look at the slide. We get rid of about half of them.0:16:18
And again, this is just sort of filtering what the evaluator would have done by getting rid of the irrelevant stuff. And you see, for instance, here the--where the evaluator0:16:29
said, assign val, look up variable value, fetch of exp, here we have put in the constant f. Here we've put in the constant x.0:16:39
So there's a little better compiler. It's still pretty dumb. It's still doing a lot of dumb things.0:16:50
Again, if we go look at the slide again, look at the very beginning here, we see a save the environment, assign0:17:00
something to the val register, and restore the environment. Where'd that come from? That came from the evaluator back here saying, oh, I'm in the middle of evaluating an application.0:17:11
So I'm going to recursively call eval dispatch. So I'd better save the thing I'm going to need later, which is the environment. This was the result of recursively0:17:21
calling eval dispatch. It was evaluating the symbol f in that case. Then it came back from eval dispatch, restored the environment.0:17:31
But in fact, the actual thing it ended up doing in the evaluation is not going to hurt the environment at all. So there's no reason to be saving the environment and0:17:40
restoring the environment here. Similarly, here I'm saving the argument list. That's a piece0:17:53
of the argument evaluation loop, saving the argument list, and here you restore it. But the actual thing that you ended up doing didn't trash the argument list. So there was no reason to save it.0:18:08
So another way to say that is that the evaluator has to be maximally pessimistic, because,0:18:19
from its point of view, it's just going off to evaluate something. So it better save what it's going to need later. But once you've done the analysis, the compiler is in a0:18:28
position to say, well, what actually did I need to save? It doesn't need to be as careful as the evaluator, because it knows what it0:18:38
actually needs. Well, in any case, if we do that and eliminate all those redundant saves and restores, then we can0:18:48
get it down to this. And you see there are actually only three instructions that we actually need, down from the initial 11 or so, or the initial 20 or so in the original one.0:19:00
And that's just saying, of those register operations, which ones did we actually need?0:19:09
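To make that concrete, here is a tiny sketch, in Python rather than the lecture's register-machine notation, of the three operations the compiled code for f of x boils down to. The tuple instruction format is invented for the example.

```python
# A toy illustration of compiling the call (f x) down to only the
# register operations that are actually needed: the expressions f and x
# are folded in as constants, so exp and unev never appear.
def compile_call(operator, operand):
    """Compile a one-argument call whose operator and operand are both
    variables, producing three made-up instruction tuples."""
    return [
        ("assign", "fun", ("lookup-variable-value", operator, "env")),
        ("assign", "argl", ("list", ("lookup-variable-value", operand, "env"))),
        ("goto", "apply-dispatch"),
    ]

instructions = compile_call("f", "x")
# Only three instructions survive: look up f, build the argument list
# from x, and jump off to apply the procedure.
print(len(instructions))
```
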
Let me just sort of summarize that in another way, just to show you in a little better picture. Here's a picture of starting--0:19:18
This is looking at all the saves and restores. So here's the expression, f of x, and then this traces through, on the bottom here, the various places in the0:19:30
evaluator that were passed when the evaluation happened. And then here, here you see arrows.0:19:40
Arrow down means register saved. So the first thing that happened is the environment got saved. And over here, the environment got restored.0:19:52
So there are all the pairs of stack operations. Now, if you go ahead and say, well, let's remember that unev, for instance, is a completely0:20:02
useless register. And if we use the constant structure of the code, well, we don't need to save unev. We don't need0:20:11
unev at all. And then, depending on how we set up the discipline of calling things like apply, we may or may not0:20:22
need to save continue. That's the first step I did. And then we can look and see what's actually needed.0:20:32
See, we didn't really need to save env across evaluating f, because it wouldn't trash it. So if we take advantage of that, and see the evaluation0:20:46
of f here doesn't really need to worry about hurting env. And similarly, the evaluation of x here, when the0:20:57
evaluator did that it said, oh, I'd better preserve the function register around that, because I might need it later. And I better preserve the argument list.0:21:07
Whereas the compiler is now in a position to know, well, we didn't really need to do those saves and restores. So in fact, all of the stack operations done by the evaluator turned out to be unnecessary or overly0:21:18
pessimistic. And the compiler is in a position to know that.0:21:27
Well that's the basic idea. We take the evaluator, we eliminate the things that you don't need, that in some sense have nothing to do with the compiler at all, just the evaluator, and then you see0:21:38
which stack operations are unnecessary. That's the basic structure of the compiler that's described in the book. Let me just show you that that example is a0:21:48
little bit too simple. To see how you actually save a lot, let's look at a little bit more complicated expression.0:21:58
F of G of X and 1. And I'm not going to go through all the code. There's a fair pile of it.0:22:09
I think there are something like 16 pairs of register saves and restores as the evaluator walks through that. Here's a diagram of them.0:22:20
Let's see. You see what's going on. You start out by--the evaluator says, oh, I'm about to do an application. I'll preserve the environment. I'll restore it here.0:22:30
Then I'm about to do the first operand. Here it recursively goes to the evaluator. The evaluator says, oh, this is an application, I'll save0:22:41
the environment, do the operator of that combination, restore it here. This save--this restore matches that save. And so on.0:22:51
There's unev here, which turns out to be completely unnecessary, continue is getting bumped around here. The function register is getting saved across0:23:01
the operands. All sorts of things are going on. But if you say, well, which of those really were the business of the compiler as opposed to the evaluator, you get rid of0:23:12
a whole bunch. And then on top of that, if you say things like, the evaluation of F doesn't hurt the environment register, or0:23:24
simply looking up the symbol X, you don't have to protect the function register against that.0:23:34
So you come down to just a couple of pairs here. And still, you can do a little better. Look what's going on here with the environment register.0:23:44
The environment register comes along and says, oh, here's a combination.0:23:54
This compiler, by the way, doesn't know anything about G. So here it says, I'd better save the environment register, because evaluating G might be some0:24:05
arbitrary piece of code that would trash it, and I'm going to need it later, after this argument, for doing the second argument.0:24:15
So that's why this one didn't go away, because the compiler made no assumptions about what G would do. On the other hand, if you look at what the second argument0:24:26
is, that's just looking up one. That doesn't need this environment register. So there's no reason to save it. So in fact, you can get rid of that one, too.0:24:35
And from this whole pile of register operations, if you simply do a little bit of reasoning like that, you get down to, I think, just two pairs of saves and restores.0:24:45
And those, in fact, could go away further if you knew something about G.0:24:56
So again, the general idea is that the reason the compiler can be better is that the interpreter doesn't know what it's about to encounter. It has to be maximally pessimistic in saving things0:25:05
to protect itself. The compiler only has to deal with what actually had to be saved. And there are two reasons that something might0:25:15
not have to be saved. One is that what you're protecting it against, in fact, didn't trash the register, like it was just a variable look-up.0:25:24
And the other one is, that the thing that you were saving it for might turn out not to actually need it. So those are the two basic pieces of knowledge that the0:25:34
compiler can take advantage of in making the code more efficient.0:25:44
Let's break for questions. AUDIENCE: You kept saying that the uneval register, unev0:25:54
register didn't need to be used at all. Does that mean that you could just map to a six-register machine? Or is it that, in this particular example, it didn't need to be used? PROFESSOR: For the compiler, you could generate code for0:26:05
the six-register machine-- five, right? Because the exp goes away also. Yeah, you can get rid of both exp and unev,0:26:14
because, see, those are data structures of the evaluator. Those are all things that would be constants from the point of view of the compiler. The only thing is this particular compiler is set up0:26:24
so that interpreted code and compiled code can coexist. So the way to think about it is, maybe you build a chip0:26:34
which is the evaluator, and what the compiler might do is generate code for that chip. It just wouldn't use two of the registers.0:26:51
All right, let's take a break. [MUSIC PLAYING]0:27:28
We just looked at what the compiler is supposed to do. Now let's very briefly look at how this gets accomplished.0:27:38
And I'm going to give no details. There's a giant pile of code in the book that gives all the details. But what I want to do is just show you the essential idea here.0:27:49
Worry about the details some other time. Let's imagine that we're compiling an expression that looks like there's some operator, and there are two arguments.0:28:03
Now, what's the code that the compiler should generate? Well, first of all, it should recursively go off and compile0:28:12
the operator. So it says, I'll compile the operator.0:28:21
And where I'm going to need that is to be in the function register, eventually. So I'll compile some instructions that will compile0:28:30
the operator and end up with the result in the function register.0:28:45
The next thing it's going to do, another piece is to say, well, I have to compile the first argument.0:28:55
So it calls itself recursively. And let's say the result will go into val.0:29:09
And then what it's going to need to do is start setting up the argument list. So it'll say, assign to argl cons of0:29:25
fetch-- so it generates this literal instruction-- fetch of val onto empty list.0:29:35
However, when it gets here, it's going to need the environment. It's going to need whatever environment was here in order0:29:45
to do this evaluation of the first argument. So it has to protect the environment register0:29:54
against whatever might happen in the compilation of this operator. So it puts a note here and says, oh, this piece should be0:30:04
done preserving the environment register.0:30:17
Similarly, here, after it gets done compiling the first operand, it's going to say, I better compile-- I'm going to need to know the environment0:30:26
for the second operand. So it puts a little note here, saying, yeah, this is also done preserving env. Now it goes on and says, well, the0:30:41
next chunk of code is the one that's going to compile the second argument.0:30:50
And let's say it'll compile it targeted to val, as they say.0:31:03
And then it'll generate the literal instruction, building up the argument list. So it'll say, assign to argl cons of0:31:20
the new value it just got onto the old argument list.0:31:34
However, in order to have the old argument list, it better have arranged that the argument list didn't get trashed by whatever happened in here.0:31:43
So it puts a little note here and says, oh, this has to be done preserving argl.0:31:54
Now it's got the argument list set up. And it's all ready to go to apply dispatch.0:32:06
It generates this literal instruction. Because now it's got the arguments in argl and the0:32:19
operator in fun, but wait, it's only got the operator in fun if it had ensured that this block of code didn't trash what was in the function register.0:32:29
So it puts a little note here and says, oh, yes, all this stuff here had better be done preserving0:32:39
the function register. So basically, what the compiler does is append a whole bunch0:32:51
of code sequences. See, what it's got in it is little primitive pieces of things, like how to look up a symbol, how to do a0:33:01
conditional. Those are all little pieces of things. And then it appends them together in this sort of discipline. So the basic means of combining things is to append0:33:11
two code sequences.0:33:21
That's what's going on here. And it's a little bit tricky. The idea is that it appends two code sequences, taking0:33:32
care to preserve a register. So the actual append operation looks like this. What it wants to do is say, if--0:33:41
here's what it means to append two code sequences. So if sequence one needs register--0:33:53
I should change this. Append sequence one to sequence two, preserving some register.0:34:08
Let me say, and. So it's clear that sequence one comes first. So if sequence two needs the register and sequence one0:34:26
modifies the register, then the instructions that the0:34:35
compiler spits out are, save the register. Here's the code.0:34:44
You generate this code. Save the register, and then you put out the recursively compiled stuff for sequence one.0:34:53
And then you restore the register. And then you put out the recursively compiled stuff for0:35:04
sequence two. That's in the case where you need to do it. Sequence two actually needs the register, and sequence one actually clobbers it.0:35:15
So that's sort of if. Otherwise, all you spit out is sequence one followed by0:35:25
sequence two. So that's the basic operation for sticking together these bits of code fragments, these bits of0:35:34
instructions into a sequence. And you see, from this point of view, the difference between the interpreter and the compiler, in some sense,0:35:46
is that where the compiler has these preserving notes, and says, maybe I'll actually generate the saves and restores and maybe I won't, the interpreter being0:35:56
maximally pessimistic always has a save and restore here. That's the essential difference. Well, in order to do this, of course, the compiler needs0:36:07
some theory of what registers code sequences need and modify. So the tiny little fragments that you put in, like the0:36:17
basic primitive code fragments, say, what are the operations that you do when you look up a variable?0:36:27
What are the sequence of things that you do when you compile a constant or apply a function? Those have little notations in there about what they need and what they modify.0:36:38
So the bottom-level data structures-- Well, I'll say this. A code sequence to the compiler looks like this.0:36:48
It has the actual sequence of instructions. And then, along with it, there's the set0:37:00
of registers modified.0:37:10
And then there's the set of registers needed.0:37:19
So that's the information the compiler has that it draws on in order to be able to do this operation.0:37:29
And where do those come from? Well, those come from, you might expect, for the very primitive ones, we're going to put them in by hand. And then, when we combine two sequences, we'll figure out0:37:39
what these things should be. So for example, a very primitive one, let's see.0:37:48
How about doing a register assignment. So a primitive sequence might say, oh, here's a code fragment. Its code instruction is assign to R1, fetch of R2.0:38:03
So this is an example. That might be an example of a sequence of instructions. And along with that, it'll say, oh, what I need to0:38:13
remember is that that modifies R1, and then it needs R2.0:38:24
So when you're first building this compiler, you put in little fragments of stuff like that. And now, when it combines two sequences, if I'm going to0:38:37
combine, let's say, sequence one, that modifies a bunch of registers M1, and needs a bunch of registers N1.0:38:54
And I'm going to combine that with sequence two. That modifies a bunch of registers M2, and needs a0:39:07
bunch of registers N2. Then, well, we can reason it out. The new code fragment, sequence one0:39:20
followed by sequence two, well, what's it going to modify? The things that it will modify are the things that are0:39:29
modified either by sequence one or sequence two. So the union of these two sets are what0:39:38
the new thing modifies. And then you say, well, what is this--what registers is it going to need?0:39:47
It's going to need the things that are, first of all, needed by sequence one. So what it needs is what sequence one needs. And then, well, not quite all of the ones that are needed by0:39:58
sequence two. What it needs are the ones that are needed by sequence two that have not been set up by sequence one.0:40:08
So it's sort of the union of the things that sequence two needs minus the ones that sequence one modifies.0:40:19
Because it worries about setting them up. So there's the basic structure of the compiler. The way you do register optimizations is you have some0:40:30
strategies for what needs to be preserved. That depends on a data structure. Well, it depends on the operation of what it means to put things together.0:40:39
Preserving something, that depends on knowing what registers are needed and modified by these code fragments.0:40:48
That depends on having little data structures, which say, a code sequence is the actual instructions, what they modify and what they need.0:40:57
That comes from, at the primitive level, building it in. At the primitive level, it's going to be completely obvious what something needs and modifies. Plus, this particular way that says, when I build up bigger0:41:08
ones, here's how I generate the new set of registers modified and the new set of registers needed. And that's the whole-- well, I shouldn't say that's the whole thing.0:41:17
That's the whole thing except for about 30 pages of details in the book. But it is a perfectly usable rudimentary compiler.0:41:28
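The pieces just summarized, appending with preserving plus the modifies-and-needs bookkeeping, might be sketched like this. This is Python, not the book's Scheme, and the triple representation of a code sequence is made up for the sketch.

```python
# A sequence is an invented triple: (instructions, registers modified,
# registers needed).  Appending wraps sequence one in a save/restore of
# reg only when sequence two needs reg and sequence one modifies it.
def append_preserving(reg, seq1, seq2):
    instrs1, mods1, needs1 = seq1
    instrs2, mods2, needs2 = seq2
    if reg in needs2 and reg in mods1:
        # Sequence two needs the register and sequence one clobbers it.
        instrs1 = [("save", reg)] + instrs1 + [("restore", reg)]
        mods1 = mods1 - {reg}    # the restore undoes the clobbering
        needs1 = needs1 | {reg}  # the save reads the register
    # Combined sets: modifies = m1 U m2, needs = n1 U (n2 - m1).
    return (instrs1 + instrs2,
            mods1 | mods2,
            needs1 | (needs2 - mods1))

# Code that trashes env, followed by code that needs env:
# the save/restore pair appears around the first sequence.
trash_env = ([("assign", "env", "<something>")], {"env"}, set())
needs_env = ([("assign", "val", ("lookup", "x", "env"))], {"val"}, {"env"})
instrs, mods, needs = append_preserving("env", trash_env, needs_env)
```

If instead you preserve a register that the second sequence never needs, the two instruction lists are simply concatenated with no stack traffic, which is exactly the saving over the always-pessimistic evaluator.
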
Let me kind of show you what it does. Suppose we start out with recursive factorial. And these slides are going to be much too small to read.0:41:38
I just want to flash through the code and show you about how much it is. That starts out with--here's a first block of it, where it compiles a procedure entry and does a bunch of assignments.0:41:48
And this thing is basically up through the part where it sets up to do the predicate and test whether the predicate's true. The second part is what results from0:41:59
the recursive call to fact of n minus one. And this last part is coming back from that and then taking0:42:08
care of the constant case. So that's about how much code it would produce for factorial. We could make this compiler much, much better, of course.0:42:18
The main way we could make it better is to allow the compiler to make any assumptions at all about what happens when you call a procedure. So this compiler, for instance, doesn't even know,0:42:30
say, that multiplication is something that could be coded inline. Instead, it sets up this whole mechanism. It goes to apply-dispatch.0:42:41
That's a tremendous waste, because what you do every time you go to apply-dispatch is you have to cons up this argument list, because it's a very general thing you're going to. In any real compiler, of course, you're going to have0:42:51
registers for holding arguments. And you're going to do the preserving and saving of those registers with a strategy similar to the0:43:00
one used here. So that's probably the very main way that this particular compiler in the book could be fixed. There are other things like looking up variable values and0:43:12
making more efficient primitive operations and all sorts of things. Essentially, a good Lisp compiler can absorb an arbitrary amount of effort. And probably one of the reasons that Lisp is slow0:43:23
compared to languages like FORTRAN is that, if you look over history at the amount of effort that's gone into building Lisp compilers, it's nowhere near the amount of0:43:32
effort that's gone into FORTRAN compilers. And maybe that's something that will change over the next couple of years. OK, let's break.0:43:43
Questions? AUDIENCE: One of the very first classes-- I don't know if it was during class or after class- you0:43:52
showed me that, say, addition has a primitive that we don't see, and-percent add or something like that. Is that because, if you're doing inline code, you'd want0:44:03
to just do it for two operators, operands? But if you had more operands, you'd want to do something special?0:44:12
PROFESSOR: Yeah, you're looking in the actual scheme implementation. There's a plus, and a plus is some operator. And then if you go look inside the code for plus, you see something called--0:44:21
I forget-- and-percent plus or something like that. And what's going on there is that particular kind of optimization. Because, see, general plus takes an0:44:30
arbitrary number of arguments. So the most general plus says, oh, if I have an argument list, I'd better cons it up in some list and then figure out0:44:42
how many there were or something like that. That's terribly inefficient, especially since most of the time you're probably adding two numbers. You don't want to really have to cons this argument list. So0:44:52
what you'd like to do is build the code for plus with a bunch of entries. So most of what it's doing is the same. However, there might be a special entry that you'd go to0:45:02
if you knew there were only two arguments. And those you'll put in registers. They won't be in an argument list and you won't have to [UNINTELLIGIBLE]. That's how a lot of these things work.0:45:12
OK, let's take a break. [MUSIC PLAYING]0:00:00
Lecture 10B | MIT 6.001 Structure and Interpretation, 1986
0:00:00
[MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:00:18
PROFESSOR: Well, there's one bit of mystery left, which I'd like to get rid of right now. And that's that we've been blithely doing things like0:00:28
cons assuming there's always another one. That we've been doing these things like car-ing and0:00:37
cdr-ing and assuming that we had some idea how this can be done. Now indeed we said that that's equivalent to having procedures. But that doesn't really solve the problem, because the0:00:48
procedures need all sorts of complicated mechanisms like environment structures and things like that to work. And those were ultimately made out of conses in the model that we had, so that really doesn't solve the problem.0:00:59
Now the problem here is the glue the data structures are made out of. What kind of possible thing could it be? We've been showing you things like a machine, a computer0:01:11
that has a controller, and some registers, and maybe a stack. And we haven't said anything about, for example, larger memory.0:01:20
And I think that's what we have to worry about right now. But just to make it perfectly clear that this is an inessential, purely implementational thing, I'd0:01:31
like to show you, for example, how you can do it all with the numbers. That's an easy one. Famous fellow by the name of Godel, a logician at the end0:01:45
of the 1930s, invented a very clever way of encoding the complicated expressions as numbers.0:01:54
For example-- I'm not saying exactly what Godel's scheme is, because he didn't use words like cons. He had other kinds of ways of combining to make expressions.0:02:03
But he said, I'm going to assign a number to every algebraic expression. And the way I'm going to manufacture these numbers is by combining the numbers of the parts.0:02:12
So for example, for what we were doing in our world, we could say that if objects are represented by numbers, then0:02:34
cons of x and y could be represented by 2 to the x times 3 to the y.0:02:46
Because then we could extract the parts. We could say, for example, that then car of, say, x is0:02:57
the number of factors of 2 in x.0:03:06
And of course cdr is the same thing. It's the number of factors of 3 in x.0:03:16
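As a quick sketch of the idea (in Python, and not anything Gödel actually wrote; he numbered logical formulas, not conses):

```python
# Gödel-style pairing: represent cons of x and y by 2^x * 3^y.
# Car and cdr then count factors of 2 and 3 respectively.
def cons(x, y):
    return 2 ** x * 3 ** y

def count_factors(n, p):
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

def car(z):
    return count_factors(z, 2)   # number of factors of 2

def cdr(z):
    return count_factors(z, 3)   # number of factors of 3

print(car(cons(4, 7)))  # 4
print(cdr(cons(4, 7)))  # 7
```

And since a cons is itself a number, pairs nest: car(cons(cons(1, 2), 0)) gives back the number representing cons(1, 2). The catch is exactly what the lecture says next: the numbers explode in size.
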
Now this is a perfectly reasonable scheme, except for the fact that the numbers rapidly get to be much larger in number of digits than the number of0:03:25
protons in the universe. So there's no easy way to use this scheme other than the theoretical one. On the other hand, there are other ways of representing0:03:37
these things. We have been thinking in terms of little boxes. We've been thinking about our cons structures as looking0:03:47
sort of like this. They're little pigeon holes with things in them. And of course we arrange them in little trees.0:03:57
I wish that the semiconductor manufacturers would supply me with something appropriate for this, but actually what they do supply me with is a linear memory.0:04:09
Memory is sort of a big pile of pigeonholes, pigeonholes like this. Each of which can hold a certain sized object, a fixed0:04:21
size object. So, for example, a complicated list with 25 elements won't fit in one of these. However, each of these is indexed by an address.0:04:33
So the address might be zero here, one here, two here, three here, and so on. That we write these down as numbers is unimportant. What matters is that they're distinct as a way to get to0:04:42
the next one. And inside of each of these, we can stuff something into these pigeonholes. That's what memory is like, for those of you who haven't0:04:52
built a computer. Now the problem is how are we going to impose on this type of structure, this nice tree structure.0:05:03
Well it's not very hard, and there have been numerous schemes involved in this. The most important one is to say, well assuming that the semiconductor manufacturer allows me to arrange my memory0:05:13
so that one of these pigeonholes is big enough to hold the address of another one. Now it actually has to be a little bit bigger because I0:05:23
have to also install or store some information as to a tag which describes the kind of thing that's there. And we'll see that in a second.0:05:32
And of course if the semiconductor manufacturer doesn't arrange it so I can do that, then of course I can, with some cleverness, arrange combinations of these to fit together in that way.0:05:43
So we're going to have to imagine imposing this complicated tree structure on our nice linear memory. If we look at the first still store, we see a classic scheme0:05:57
for doing that. It's a standard way of representing Lisp structures in a linear memory. What we do is we divide this memory into two parts.0:06:12
An array called the cars, and an array called the cdrs. Now whether those happen to be sequential addresses or whatever, it's not important.0:06:22
That's somebody's implementation details. But there are two arrays here. Linear arrays indexed by sequential indices like this.0:06:34
What is stored in each of these pigeonholes is a typed object. And what we have here are types which begin with letters0:06:44
like p, standing for a pair. Or n, standing for a number. Or e, standing for an empty list. The end of the list. And0:06:57
so if we wish to represent an object like this, the list whose first element is the list 1, 2 and which has 3 and 4 as its second and third elements.0:07:06
A list containing a list as its first part and then two numbers as a second and third parts. Then of course we draw it sort of like this these days, in0:07:15
box-and-pointer notation. And you see, these are the three cells that have as their car pointer the object which is either 1, 2 or 3 or 4.0:07:28
And then of course the 1, 2, the car of this entire structure, is itself a substructure which contains a sublist like that. What I'm about to do is put down places which are--0:07:39
I'm going to assign indices. Like this 1, over here, represents the index of this cell.0:07:49
But that pointer that we see here is a reference to the pair of pigeonholes in the cars and the cdrs that are labeled by 1 in my linear memory down here.0:08:02
So if I wish to impose this structure on my linear memory, what I do is I say, oh yes, why don't we drop this into cell 1?0:08:12
I pick one. There's 1. And that says that its car, I'm going to assign it to be a pair. It's a pair, which is in index 5.0:08:22
And the cdr, which is this one over here, is a pair which I'm going to stick into place 2. p2. And take a look at p2.0:08:32
Oh yes, well p2 is a thing whose car is the number 3, so as you see, an n3. And whose cdr, over here, is a pair, which lives in place 4.0:08:46
So that's what this p4 is. p4 is a number whose value is 4 in its car and whose cdr is0:08:56
an empty list right there. And that ends it. So this is the traditional way of representing this kind of0:09:05
binary tree in a linear memory. Now the next question, of course, that we might want to0:09:15
worry about is just a little bit of implementation. That means that when I write procedures of the form assign a, [UNINTELLIGIBLE] procedures--0:09:24
lines of register machine code of the form assign a, the car of [UNINTELLIGIBLE] b, what I really mean is addressing these elements.0:09:38
And so we're going to think of that as an abbreviation for it. Now of course in order to write that down I'm going to introduce some sort of a structure called a vector.0:09:52
And we're going to have something which will reference a vector, just so we can write it down. Which takes the name of the vector, or the--0:10:02
I don't think that name is the right word. Which takes the vector and the index, and I have to have a0:10:12
way of setting one of those with something called a vector set, I don't really care. But let's look, for example, at then that kind of implementation of car and cdr.0:10:26
So for example if I happen to have a register b, which contains the type index of a pair, and therefore it is the0:10:37
pointer to a pair, then I could take the car of that and if I-- write this down-- I might put that in register a. What that really is is a representation of the assign0:10:49
to a, the value of vector reffing-- or array indexing, if you will-- or something, the cars object--0:10:58
whatever that is-- with the index, b. And similarly for cdr. And we can do the same thing for assignment to data structures, if we need to do that sort of0:11:10
thing at all. It's not too hard to build that. Well now the next question is how are we going to do allocation. And every so often I say I want a cons.0:11:21
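The vector abbreviation for car and cdr can be sketched concretely. This is a toy Python model, not the lecture's register machine: the names the_cars and the_cdrs, the eight-cell memory size, and the use of 'nil' for the empty list are all assumptions made for illustration.

```python
# A toy model of list structure stored in two parallel vectors: a
# pair "pointer" is just an index p, its halves live in the_cars[p]
# and the_cdrs[p], and car/cdr are simply vector references.

the_cars = [None] * 8   # assumed memory size of 8 cells
the_cdrs = [None] * 8

def car(p):
    return the_cars[p]  # car is a vector reference at index p

def cdr(p):
    return the_cdrs[p]

# Build the list (1 2): the pair at index 1 holds the number 1 and
# points (by index) at the pair at index 2, which holds the number 2
# and the empty list, written here as 'nil'.
the_cars[2], the_cdrs[2] = 2, 'nil'
the_cars[1], the_cdrs[1] = 1, 2
```

With this layout, walking the list is just repeated indexing into the two vectors.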
Now conses don't grow on trees. Or maybe they should. But I have to have some way of getting the next one. I have to have some idea of what memory is unused that I0:11:33
might want to allocate from. And there are many schemes for doing this. And the particular thing I'm showing you right now is not essential.0:11:42
However it's convenient and has been done many times. One scheme is called the free-list allocation scheme. What that means is that all of the free memory that there is in the world is linked together in a linked list,0:11:54
just like all the other stuff. And whenever you need a free cell to make a new cons, you grab the first one, make the free list be the cdr of it,0:12:04
and then allocate that. And so what that looks like is something like this. Here we have the free list starting in 6.0:12:18
And what that is is a pointer off to, say, 8. So what it says is, this one is free and the0:12:27
next one is an 8. This one is free and the next one is in 3, the next one that's free. That one's free and the next one is in 0.0:12:37
That one's free and the next one's in 15. Something like that. We can imagine having such a structure.0:12:46
Given that we have something like that, then it's possible to just get one when you need it. And so a program for doing cons, this is what0:12:57
cons might turn into. To assign to a register A the result of cons-ing B onto C-- the value contained in B and the value0:13:08
contained in C-- what we have to do is get the current head of the free list, make the free list be its cdr. Then we have to change the car of the0:13:19
thing we're making up, the thing in A, to be the thing in B. And we have to change the cdr of the thing that's in A0:13:30
to be C. And then what we have in A is the right new frob, whatever it is. The object that we want.0:13:40
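The free-list cons just described can be sketched as follows. This is a hypothetical Python model, not actual register-machine code, and it leaves out the type-bit bookkeeping, just as the blackboard version does.

```python
# A sketch of free-list allocation: free cells are chained together
# through the_cdrs, and cons takes the head of the chain.

SIZE = 8
the_cars = [None] * SIZE
the_cdrs = [None] * SIZE

# Initially every cell is free: 0 -> 1 -> ... -> 7 -> end.
free = 0
for i in range(SIZE - 1):
    the_cdrs[i] = i + 1
the_cdrs[SIZE - 1] = None

def cons(b, c):
    """Grab the head of the free list, make the free list be its
    cdr, then fill in the new cell's car and cdr."""
    global free
    if free is None:
        raise MemoryError("out of pairs -- time to garbage collect")
    a = free
    free = the_cdrs[a]
    the_cars[a] = b
    the_cdrs[a] = c
    return a

# Build (1 2): the inner cons lands in cell 0, the outer in cell 1,
# and the free list now begins at cell 2.
p = cons(1, cons(2, 'nil'))
```

Each cons shortens the free list by exactly one cell, which is why a finite memory eventually runs out.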
Now there's a little bit of a cheat here that I haven't told you about, which is somewhere around here I haven't set the type of the thing that I'm cons-ing up to be a0:13:51
pair, and I ought to. So there should be some sort of bits here being set, and I just haven't written that down. We could have arranged it, of course, for the free list to0:14:01
be made out of pairs. And so then there's no problem with that. But that's, again, an inessential detail in the way0:14:10
some particular programmer or architect or whatever might manufacture his machine or Lisp system. So for example, just looking at this, to allocate given0:14:23
that I had already the structure that you saw before, supposing I wanted to allocate a new cell, which is going to be representation of list one, one, two, where already one0:14:38
two was the car of the list we were playing with before. Well that's not so hard. I stored that in one, so p1 is the0:14:47
representation of this. This is p5. That's going to be the cdr of this. Now we're going to pull something off the free list, but remember the free list started at six.0:14:57
The new free list after this allocation is eight, a free list beginning at eight. And of course in six now we have a number one, which is0:15:06
what we wanted, with its cdr being the pair starting in location five. And that's no big deal.0:15:16
So the only problem really remaining here is, well, I don't have an infinitely large memory.0:15:25
If I do this for a little while, say, for example, supposing it takes me a microsecond to do a cons, and I have a million cons memory then I'm only going to run out0:15:34
in a second, and that's pretty bad. So what we do to prevent that disaster, that ecological disaster, we'll talk about right after questions.0:15:44
Are there any questions? Yes. AUDIENCE: In the environment diagrams that we were drawing0:15:54
we would use the body of procedures, and you would eventually wind up with things that were no longer useful in that structure.0:16:04
How is that represented? PROFESSOR: There's two problems here. One you were asking is that material becomes useless.0:16:13
We'll talk about that in a second. That has to do with how to prevent ecological disasters. If I make a lot of garbage I have to somehow be able to clean up after myself. And we'll talk about that in a second.0:16:23
The other question you're asking is how you represent the environments, I think. AUDIENCE: Yes. PROFESSOR: OK. And the environment structures can be represented in arbitrary ways. There are lots of them. I mean, here I'm just telling you about list cells.0:16:33
Of course every real system has vectors of arbitrary length as well as the vectors of length two, which represent list cells. And the environment structures that one uses in a0:16:45
professionally written Lisp system tend to be vectors which contain a number of elements approximately equal to the number of arguments-- a little bit more because you0:16:56
need certain glue. So remember, the environment is made out of frames. The frames are constructed by applying a procedure. In doing so, an allocation is made of a place which is the0:17:08
number of arguments long, plus some glue, that gets linked into a chain. It's just like Algol at that level.0:17:19
There any other questions? OK. Thank you, and let's take a short break. [MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:18:12
PROFESSOR: Well, as I just said, computer memories supplied by the semiconductor manufacturers are finite. And that's quite a pity.0:18:21
It might not always be that way. Just for a quick calculation, you can see that it's possible that if memory prices keep going at the rate they're going that if you0:18:32
still took a microsecond to do a cons, then-- first of all, everybody should know that there's about pi times ten to the seventh seconds in a year. And so that would be ten to the seventh times ten to the0:18:42
sixth is ten to the thirteenth. So there's maybe ten to the fourteenth conses in the life of a machine. If there was ten to the fourteenth words of memory on your machine, you'd never run out.0:18:54
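The back-of-the-envelope arithmetic here checks out directly:

```python
# One cons per microsecond is 10**6 conses per second, and there are
# about pi * 10**7 seconds in a year, so on the order of 10**13
# conses per year -- maybe 10**14 in the life of a machine.
conses_per_second = 10**6
seconds_per_year = 3.14159 * 10**7
conses_per_year = conses_per_second * seconds_per_year
```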
And that's not completely unreasonable. Ten to the fourteenth is not a very large number.0:19:03
I don't think it is. But then again I like to play with astronomy. It's at least ten to the eighteenth centimeters between us and the nearest star.0:19:12
But the thing I'm about to worry about is, at least in the current economic state of affairs, ten to the fourteenth0:19:22
pieces of memory is expensive. And so I suppose what we have to do is make do with much smaller memories. Now in general we want to have an illusion of infinity.0:19:35
All we need to do is arrange it so that whenever you look, the thing is there. That's really an important idea.0:19:49
A person or a computer lives only a finite amount of time and can only take a finite number of looks at something. And so you really only need a finite amount of stuff.0:19:58
But you have to arrange it so no matter how much there is, how much you really claim there is, there's always enough stuff so that when you take a look, it's there. And so you only need a finite amount.0:20:08
But let's see. One problem is, as was brought up, that there are possible ways that there is lots of stuff that we make that we0:20:18
don't need. And we could recycle the material out of which it's made. An example is the fact that we're building environment0:20:27
structures, and we do so every time we call a procedure. We build an environment frame. That environment frame doesn't necessarily have a very long lifetime.0:20:36
Its lifetime, meaning its usefulness, may exist only over the invocation of the procedure. Or if the procedure exports another procedure by returning0:20:45
it as a value and that procedure is defined inside of it, well then the lifetime of the frame of the outer procedure still is only the lifetime of the procedure0:20:57
which was exported. And so ultimately, a lot of that is garbage. There are other ways of producing garbage as well. Users produce garbage.0:21:07
An example of user garbage is something like this. If we write a program to, for example, append two lists together, well one way to do it is to reverse the first0:21:19
list onto the empty list and reverse that onto the second list. Now that's not a terribly bad way of doing it.0:21:28
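The double-reverse append being described can be sketched like so. This is a Python rendering with tuples standing in for cons cells; the helper names are mine, not the lecture's.

```python
# Append two lists by reversing the first onto the empty list, then
# reversing that onto the second list. The intermediate reversal
# becomes unreachable garbage the moment append returns.

NIL = None

def cons(a, d):
    return (a, d)

def reverse_onto(lst, tail):
    # Cons each element of lst onto tail, so lst comes out reversed.
    while lst is not NIL:
        tail = cons(lst[0], tail)
        lst = lst[1]
    return tail

def append(a, b):
    intermediate = reverse_onto(a, NIL)   # reversed copy of a: garbage-to-be
    return reverse_onto(intermediate, b)  # intermediate never accessible again

lst = append(cons(1, cons(2, NIL)), cons(3, NIL))
```

The cells of the intermediate reversed copy are exactly the "user garbage" the professor is talking about.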
However, the intermediate result, which is the reversal of the first list as done by this program, is never going0:21:37
to be accessed ever again after it's copied back on to the second. It's an intermediate result. It's going to be hard to ever see how anybody would ever be0:21:47
able to access it. In fact, it will go away. Now if we make a lot of garbage like that, and we should be allowed to, then there's got to be some way to0:21:56
reclaim that garbage. Well, what I'd like to tell you about now is a very clever technique whereby a Lisp system can prove a small0:22:09
theorem every so often of the form: the following piece of junk will never be accessed again. It can have no effect on the future of the computation.0:22:21
It's actually based on a very simple idea. We've designed our computers to look sort of like this. There's some data path, which contains the registers.0:22:35
There are things like x, and env, and val, and so on. And there's one here called stack, some sort which points0:22:47
off to a structure somewhere, which is the stack. And we'll worry about that in a second. There's some finite controller, finite state machine controller.0:22:56
And there's some control signals that go this way and predicate results that come this way, not the interesting part. There's some sort of structured memory, which I0:23:07
just told you how to make, which may contain a stack. I didn't tell you how to make things of arbitrary shape, only pairs. But in fact with what I've told you, you can simulate a stack0:23:16
by a big list. I don't plan to do that, it's not a nice way to do it. But we could have something like that. We have all sorts of little data structures in here that0:23:25
are hooked together in funny ways. They connect to other things. And so on. And ultimately things up there are pointers to these.0:23:37
The things that are in the registers are pointers off to the data structures that live in this Lisp structure memory. Now the truth of the matter is that the entire consciousness0:23:52
of this machine is in these registers. There is no possible way that the machine, if done correctly, if built correctly, can access anything in this0:24:02
Lisp structure memory unless the thing in that Lisp structure memory is connected by a sequence of data structures to the registers.0:24:15
If it's accessible by legitimate data structure selectors from the pointers that are stored in these registers. Things like array references, perhaps.0:24:24
Or cons cell references, cars and cdrs. But I can't just talk about a random place in this memory, because I can't get to it. There are no arbitrary names I'm allowed to0:24:34
conjure up, at least as I'm evaluating expressions. If that's the case then there's a very simple theorem0:24:44
to be proved. Which is, if I start with all the pointers that are in all these registers and recursively chase out, marking0:24:53
all the places I can get to by selectors, then eventually I mark everything that can be gotten to. Anything which is not so marked is0:25:02
garbage and can be recycled. Very simple. Cannot affect the future of the computation.0:25:11
So let me show you that in a particular example. Now that means I'm going to have to append to my description of the list structure a mark.0:25:23
And so here, for example, is a Lisp structured memory. And in this Lisp structured memory is a Lisp structure beginning in a place I'm going to call--0:25:35
this is the root. Now it doesn't really have to have a root. It could be a bunch of them, like all the registers. But I could cleverly arrange it so all the registers, all0:25:45
the things that are in old registers are also at the right moment put into this root structure, and then we've got one pointer to it. I don't really care.0:25:54
So the idea is we're going to cons up stuff until our free list is empty. We've run out of things. Now we're going to do this process of proving the theorem0:26:04
that a certain percentage of the memory has got crap in it. And then we're going to recycle that to grow new trees, a standard use of such garbage.0:26:17
So in any case, what do we have here? Well we have some data structure which starts out over here in one.0:26:27
And in fact it has a car in five, and its cdr is in two. And all the marks start out at zero.0:26:36
Well let's start marking, just to play this game. OK. So for example, since I can access one from the root I0:26:47
will mark that. Let me mark it. Bang. That's marked. Now since I have a five here I can go to five and see, well0:27:00
I'll mark that. Bang. That's useful stuff. But five has a number in its car-- I'm not interested in marking numbers-- but its cdr is seven. So I can mark that.0:27:10
Bang. Seven has the empty list as the only thing it references, and it's got a number in its car. Not interesting.0:27:19
Well now let's go back here. I forgot about something. Two. See in other words, if I'm looking at cell one, cell one contains a two right over here.0:27:30
A reference to two. That means I should go mark two. Bang. Two contains a reference to four. It's got a number in its car, I'm not interested in that, so0:27:41
I'm going to go mark that. Four refers to seven through its car, and is empty in its cdr, but I've already marked that one so I don't have to mark it again.0:27:51
This is all the accessible structure from that place. Simple recursive mark algorithm. Now there are some unhappinesses about that0:28:01
algorithm, and we can worry about that in a second. But basically you'll see that all the things that have not been marked are places that are free, and I could recycle.0:28:14
So the next stage after that is going to be to scan through all of my memory, looking for things that are not marked. Every time I come across a marked thing I unmark it, and0:28:23
every time I come across an unmarked thing I'm going to link it together in my free list. Classic, very simple algorithm.0:28:32
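The mark phase and sweep phase just described can be sketched together. This is a simplified Python model of the two-vector memory, with my own pointer-tagging convention ('p', index); it is not the lecture's register-machine code.

```python
# A mark-sweep sketch: a pointer is a tagged tuple ('p', index), and
# everything else (numbers, 'nil') is an immediate datum that never
# needs marking.

SIZE = 8
the_cars = [None] * SIZE
the_cdrs = [None] * SIZE
marks = [0] * SIZE

def mark(x):
    # Recursively chase out from x, marking every reachable cell.
    # Note the recursion itself consumes auxiliary stack space.
    if not (isinstance(x, tuple) and x[0] == 'p'):
        return                # immediates: nothing to mark
    i = x[1]
    if marks[i]:
        return                # already marked; stop (handles sharing)
    marks[i] = 1
    mark(the_cars[i])
    mark(the_cdrs[i])

def sweep():
    # Scan all of memory: unmark the live cells, chain the dead
    # ones together into a fresh free list.
    free = None
    for i in range(SIZE):
        if marks[i]:
            marks[i] = 0
        else:
            the_cdrs[i] = free
            free = i
    return free

# Roughly the lecture's picture: the root points at cell 1, whose car
# is cell 5 and cdr is cell 2; cells 5 and 4 both reference cell 7.
the_cars[1], the_cdrs[1] = ('p', 5), ('p', 2)
the_cars[5], the_cdrs[5] = 10, ('p', 7)
the_cars[7], the_cdrs[7] = 20, 'nil'
the_cars[2], the_cdrs[2] = ('p', 4), 'nil'
the_cars[4], the_cdrs[4] = ('p', 7), 'nil'

mark(('p', 1))
free_list = sweep()   # cells 0, 3, 6 were unreachable
```

After the sweep, the marks are all cleared again and the unreachable cells are back on the free list.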
So let's see. Is that very simple? Yes it is. I'm not going to go through the code in any detail, but I just want to show you about how long it is. Let's look at the mark phase.0:28:42
Here's the first part of the mark phase. We pick up the root. We're going to use that as a recursive procedure call.0:28:52
We're going to sweep from there, after when we're done with marking. And then we're going to do a little couple of instructions that do this checking out on the marks and changing the0:29:01
marks and things like that, according to the algorithm I've just shown you. It comes out here. You have to mark the cars of things and you also have to be able to mark the cdrs of things.0:29:10
That's the entire mark phase. I'll just tell you a little story about this. The old DEC PDP-6 computer, this was the way that the0:29:22
mark-sweep garbage collector was written. The program was so small that, with the data that it needed,0:29:31
with the registers that it needed to manipulate the memory, it fit into the fast registers of the machine, which were 16. The whole program. And you could execute0:29:40
instructions in the fast registers. So it's an extremely small program, and it could run very fast. Now unfortunately, of course, this program, because of the fact0:29:53
that it's recursive in the way that you do something first and then you do something after that, you have to work on the cars and then the cdrs, it requires auxiliary memory.0:30:03
So this requires a stack for marking. Lisp systems that are built this way have a limit to the0:30:12
depth of recursion you can have in data structures in either the car or the cdr, and that doesn't work very nicely. On the other hand, you never notice it if it's big enough.0:30:23
And that's certainly been the case for Maclisp, for example, which ran Macsyma, where you could deal with expressions thousands of elements long.0:30:33
These are algebraic expressions with thousands of terms. And there's no problem with that. As such, the garbage collector does work.0:30:42
On the other hand, there's a very clever modification to this algorithm, which I will not describe, by Peter Deutsch and Schorr and Waite-- Herb Schorr from IBM and Waite, who I don't know.0:30:55
That algorithm allows you to do this without auxiliary memory, by remembering as you walk the data structures where you came from by reversing the pointers0:31:04
as you go down and crawling up the reverse pointers as you go up. It's a rather tricky algorithm. The first time you write it-- or in fact, the first three times you write it it has a terrible bug in it.0:31:14
And it's also rather slow, because it's complicated. It takes about six times as many memory references to do the sorts of things that we're talking about.0:31:24
Well now once I've done this marking phase, and I get into a position where things look like this, let's look-- yes. Here we have the mark done, just as I did it.0:31:35
Now we have to perform the sweep phase. And I described to you what this sweep is like. I'm going to walk down from one end of memory or the other, I don't care where, scanning every cell that's in0:31:45
the memory. And as I scan these cells, I'm going to link them together, if they are free, into the free list. And if they're not free, I'm going to unmark them so the marks become zero.0:31:57
And in fact what I get-- well the program is not very complicated. It looks sort of like this-- it's a little longer. Here's the first piece of it. This one's coming down from the top of memory.0:32:06
I don't want you to try to understand this at this point. It's rather simple. It's a very simple algorithm, but there's pieces of it that just sort of look like this.0:32:15
They're all sort of obvious. And after we've done the sweep, we get an answer that looks like that.0:32:25
Now there are some disadvantages with mark-sweep algorithms of this sort. Serious ones. One important disadvantage is that your memories get larger0:32:34
and larger. As you say, address spaces get larger and larger, you're willing to represent more and more stuff, then it gets very0:32:43
costly to scan all of memory. What you'd really like to do is only scan useful stuff. It would even be better if you realized that some stuff was0:32:56
known to be good and useful, and you don't have to look at it more than once or twice. Or very rarely. Whereas other stuff that you're not so sure about, you0:33:05
can look at in more detail every time you want to do this, want to garbage collect. Well there are algorithms that are organized in this way.0:33:15
Let me tell you about a famous old algorithm which allows you to look only at the part of memory which is known to be useful. And which happens to be the fastest known garbage0:33:24
collector algorithm. This is the Minsky-Fenichel-Yochelson garbage collector algorithm. It was invented by Minsky in 1961 or '60 or something, for0:33:36
the RLE PDP-1 Lisp, which had 4,096 words of list memory,0:33:45
and a drum. And the whole idea was to garbage collect this terrible memory. What Minsky realized was the easiest way to do this is to0:33:56
scan the memory, in the sense of walking the good structure, copying it out onto the drum, compacted.0:34:06
And then when you're done copying it all out, you swap that back into your memory. Now whether or not you use a drum, or another piece of memory, or something like that isn't important.0:34:17
In fact, I don't think people use drums anymore for anything. But this algorithm basically depends upon having about twice as much address space as you're actually using.0:34:30
And so what you have is some, initially, some mixture of useful data and garbage. So this is called fromspace.0:34:45
And this is a mixture of crud. Some of it's important and some of it isn't. Now there's another place which is hopefully big enough,0:34:55
which we call tospace, which is where we're copying to. And what happens is-- I'm not going to go through this in detail.0:35:04
It's in our book quite explicitly. There's a root pointer where you start from. And the idea is that you start with the root.0:35:14
You copy the first thing you see, the first thing that the root points at, to the beginning of tospace. The first thing is a pair or something0:35:24
like that, a data structure. You then also leave behind a broken heart saying, I moved this object from here to here, giving the place0:35:36
where it moved to. This is called a broken heart because a friend of mine who implemented one of these in 1966 was a very romantic character and called it a broken heart.0:35:49
But in any case, the next thing you do is now you have a new free pointer which is here, and you start scanning. You scan this data structure you just copied.0:36:00
And every time you encounter a pointer in it, you treat it as if it was the root pointer here. Oh, I'm sorry. The other thing you do is you now move the root pointer to there.0:36:09
So now you scan this, and everything you see you treat as it were the root pointer. So if you see something, well it points up into there somewhere.0:36:18
Is it pointing at a thing which you've not copied yet? Is there a broken heart there? If there's a broken heart there and it's something you have copied, you just replace this pointer with the0:36:27
thing the broken heart points at. If this thing has not been copied, you copy it to the next place over here. Move your free pointer over here, and then leave a broken0:36:39
heart behind and scan. And eventually when the scan pointer hits the free pointer, everything in memory has been copied.0:36:50
And then there's a whole bunch of empty space up here, which you could either make into a free list, if that's what you want to do. But generally you don't in this kind of system. In this system you sequentially allocate your memory.0:37:00
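The copying collector just described can be sketched as follows. This is a hedged Python model: a fromspace cell is a two-element list [car, cdr], a pointer is a tagged tuple ('p', index), and the broken heart is a plain string marker. These are conventions of this sketch, not of any real system.

```python
# A stop-and-copy sketch: walk the live structure from the root,
# copying each pair into tospace and leaving a "broken heart"
# forwarding address behind in fromspace.

BROKEN_HEART = 'broken-heart'

def collect(fromspace, root):
    tospace = []

    def copy(x):
        if not (isinstance(x, tuple) and x[0] == 'p'):
            return x                  # numbers and 'nil' copy by value
        cell = fromspace[x[1]]
        if cell[0] is BROKEN_HEART:
            return cell[1]            # already moved: follow the heart
        tospace.append(cell[:])       # copy the pair to the free pointer
        new = ('p', len(tospace) - 1)
        cell[0] = BROKEN_HEART        # leave a forwarding address behind
        cell[1] = new
        return new

    root = copy(root)
    scan = 0
    while scan < len(tospace):        # until the scan pointer hits free
        tospace[scan][0] = copy(tospace[scan][0])
        tospace[scan][1] = copy(tospace[scan][1])
        scan += 1
    return tospace, root

# Fromspace holds the list (1 2) in cells 3 and 1; cells 0 and 2 are
# garbage. Only the live cells get copied, compacted, into tospace.
fromspace = [[9, 'nil'], [2, 'nil'], [8, ('p', 0)], [1, ('p', 1)]]
tospace, new_root = collect(fromspace, ('p', 3))
```

Notice that the work done is proportional to the live data only; the garbage in cells 0 and 2 is never even looked at.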
That is a very, very nice algorithm, and sort of the one we use in the Scheme system that you've been using. And it's expected--0:37:09
I believe no one has found a faster algorithm than that. There are very simple modifications to this algorithm invented by Henry Baker which allow one to run0:37:19
this algorithm in real time, meaning you don't have to stop to garbage collect. But you could interleave the consing that the machine does when it's running with steps of the garbage collection process, so that the garbage collector's distributed, and0:37:31
the machine doesn't have to stop and garbage collect. Of course in the case of machines with virtual memory where a lot of it is in inaccessible places, this0:37:41
becomes a very expensive process. And there have been numerous attempts to make this much better. There is a nice paper, for those of you who are0:37:52
interested, by Moon and other people which describes a modification to the incremental Minsky-Fenichel-Yochelson algorithm, a modification of the Baker algorithm, which is more efficient for virtual0:38:05
memory systems. Well I think now the mystery to this is sort of gone. And I'd like to see if there are any questions.0:38:19
Yes. AUDIENCE: I saw one of you run the garbage collector on the systems upstairs, and it seemed to me to run extremely fast. Did the whole thing take--0:38:30
does it sweep through all of memory? PROFESSOR: No. It swept through exactly what was needed to copy the useful structure. It's a copying collector.0:38:40
And it is very fast. On the whole, I suppose to copy-- in a Bobcat-- to copy, I think, a three megabyte thing or something is0:38:52
less than a second, real time. Really, these are very small programs. One thing you should realize is that garbage collectors have to be small.0:39:05
Not because they have to be fast, but because no one can debug a complicated garbage collector. A garbage collector, if it doesn't work, will trash your0:39:15
memory in such a way that you cannot figure out what the hell happened. You need an audit trail. Because it rearranges everything, and how do you know what happened there? So this is the only kind of program where it really,0:39:27
seriously matters that you stare at it long enough so you believe that it works. And sort of prove it to yourself. So there's no way to debug it.0:39:36
And that takes it being small enough so you can hold it in your head. Garbage collectors are special in this way.0:39:45
So every reasonable garbage collector has gotten small, and generally small programs are fast. Yes. AUDIENCE: Can you repeat the name of this technique once again?0:39:54
PROFESSOR: That's the Minsky-Fenichel-Yochelson garbage collector. AUDIENCE: You got that? PROFESSOR: Minsky invented it in '61 for the RLE PDP-1. A version of it was developed and elaborated to be used in0:40:07
Multics Maclisp by Fenichel and Yochelson somewhere around 1968 or '69.0:40:19
OK. Let's take a break. [MUSIC: "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]0:41:17
PROFESSOR: Well we've come to the end of this subject, and we've already shown you a universal machine which is down to the evaluator.0:41:26
It's down to the level of detail you could imagine you could make one. This is a particular implementation of Lisp, built on one of those scheme chips that was talked about0:41:37
yesterday, sitting over here. This is mostly interface to somebody's memory with a little bit of timing and other such stuff. But this fellow actually ran Lisp at a fairly reasonable0:41:48
rate, interpretively. It ran Lisp as fast as a DEC PDP-10 back in 1979. And so it's gotten pretty close to hardware.0:41:59
Pretty concrete. We've also dazzled you a bit with the things you can compute. But is it the case that there are things we can't compute?0:42:11
And so I'd like to end this by showing you some things that you'd like to be able to compute but you can't. The answer is yes, there are things you can't compute.0:42:22
For example, something you'd really like is-- if you're writing programs, you'd like a program that would check that the thing you're0:42:32
going to do will work. Wouldn't that be nice? You'd like something that would catch infinite loops, for example, in programs that were written by users.0:42:43
But in general you can't write a program that will read any program and determine whether or not it's an infinite loop. Let me show you that. It's a little bit of minor mathematics.0:42:58
Let's imagine that we just had a mathematical function before we start. And there is one, called s, which takes a procedure and0:43:12
its argument, a. And what s does is it determines whether or not it's0:43:24
safe to run p on a. And what I mean by that is this: it's true if p applied0:43:34
to a will converge to a value without an error.0:43:52
And it's false if p of a loops forever or makes an error.0:44:15
Now that's surely a function. For every procedure and for every argument you could give it, there is an answer, either true or false,0:44:25
whether it converges without making an error. And you could make a giant table of them. But the question is, can you write a procedure that computes0:44:34
the values of this function? Well let's assume that we can. Suppose that we have a procedure called "safe" that0:44:58
computes the value of s.0:45:12
Now I'm going to show you by several methods that you can't do this. The easiest one, or the first one, let's define a procedure0:45:22
called diag1. Given that we have safe, we can define diag1 to be the0:45:38
procedure of one argument, p, which has the following properties. If it's safe to apply p to itself, then I wish to have an0:45:54
infinite loop. Otherwise I'm going to return 3.0:46:03
Remember, 42 was the answer to the big question. And of course we know what an infinite loop is.0:46:12
We define infinite-loop to be a procedure of no arguments, which is that nice lambda calculus loop: lambda of x, x of x, applied to lambda of x, x of x.0:46:24
So there's nothing left to the imagination here. Well let's see what the story is. Suppose we worry about the0:46:38
procedure called diag1 applied to diag1. Well what could it possibly be?0:46:49
Well I don't know. We're going to substitute diag1 for p in the body here. Well is it safe to compute diag1 of diag1?0:47:00
I don't know. There are two possibilities. If it's safe to compute diag1 of diag1 that means it shouldn't loop. That means I go to here, but then I0:47:09
produce an infinite loop. So it can't be safe. But if it's not safe to compute diag1 of diag1 then the answer to this is 3. But that's diag1 of diag1, so it had to be safe.0:47:20
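The contradiction can be watched in miniature. Since no correct safe procedure can exist, the sketch below (my own Python rendering, not the lecture's Scheme) plugs in a hypothetical candidate that always answers false and shows that diag1 immediately refutes it.

```python
# A hypothetical candidate for safe that answers False for
# everything, claiming every (p a) loops or errors:

def candidate_safe(p, a):
    return False

def infinite_loop():
    while True:
        pass

def diag1(p):
    # If it's safe to apply p to itself, loop forever; otherwise 3.
    if candidate_safe(p, p):
        infinite_loop()
    return 3

# candidate_safe called (diag1 diag1) unsafe, yet it converges to 3,
# so the candidate is wrong. A candidate answering True would be
# wrong the other way: diag1 of diag1 would then loop forever.
result = diag1(diag1)
```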
So therefore by contradiction you cannot produce safe. For those of you who were boggled by that one I'm going0:47:30
to say it again, in a different way. Listen to one more alternative. Let's define diag2.0:47:39
These are named diag because of Cantor's diagonal argument. These are instances of a famous argument which was0:47:48
originally used by Cantor in the late part of the last century to prove that the real numbers were not countable, that there are too many real numbers to0:47:58
be counted by integers. That there are more points on a line, for example, than there are counting numbers. It may or may not be obvious, and I don't want to0:48:07
get into that now. But diag2 is again a procedure of one argument p. It's almost the same as the previous one, which is, if0:48:19
it's safe to compute p on p, then I'm going to produce--0:48:29
then I want to compute something other than p of p.0:48:38
Otherwise I'm going to put out false. Where other-than says: whatever p of p is, I'm going to put out something else.0:48:48
I can give you an example of a definition of other-than which I think works. Let's see. Yes. Let other-than be a procedure of one argument x0:49:06
which says, if x is eq to, say, quote a, then the answer is quote b.0:49:15
Otherwise it's quote a. That always produces something which is not what its argument is.0:49:25
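Transcribed into Python, other-than looks like this; the names and the choice of the symbols a and b follow the blackboard definition.

```python
# other-than always returns something different from its argument.

def other_than(x):
    if x == 'a':
        return 'b'
    return 'a'
```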
That's all it is. That's all I wanted. Well now let's consider this one, diag2 of diag2.0:49:38
Well look. This only does something dangerous, like calling p of p, if it's safe to do so.0:49:47
So if safe is defined at all, if you can define such a procedure, safe, then this procedure is always defined and therefore safe on any inputs.0:50:01
So diag2 of diag2 must reduce to other than diag2 of diag2.0:50:15
And that doesn't make sense, so we have a contradiction, and therefore we can't define safe. I just wanted to do that twice, slightly differently, so you0:50:27
wouldn't feel that the first one was a trick. They may be both tricks, but they're at least slightly different.0:50:37
So I suppose that pretty much wraps it up. I've just proved what we call the halting theorem, and I suppose with that we're going to halt.0:50:46
I hope you have a good time. Are there any questions? Yes. AUDIENCE: What is the value of s of diag1?0:50:56
PROFESSOR: Of what? AUDIENCE: S of diag1. If you said s is a function and we can [INTERPOSING VOICES] PROFESSOR: Oh, I don't know. I don't know. It's a function, but I don't know how to compute it.0:51:06
I can't do it. I'm just a machine, too. Right? There's no machine that in principle-- it might be that in that particular case you just0:51:16
asked, with some thinking I could figure it out. But in general I can't compute the value of s any better than any other machine can. There is such a function, it's just that no machine can be0:51:27
built to compute it. Now there's a way of saying that that should not be surprising. Going through this--0:51:36
I mean, I don't have time to do this here, but the number of functions is very large. If there's a certain number of answers possible and a certain0:51:48
number of inputs possible, then the number of answers raised to the number of inputs is the number of possible functions. Of one variable, that is.0:51:58
Now that's always bigger than the thing you're raising to, the exponent. The number of functions is larger than the number of0:52:12
programs that one can write, by an infinity counting argument. And it's much larger. So there must be a lot of functions that can't be0:52:22
computed by programs. AUDIENCE: A few moments ago you were talking about specifications and automatic generation of solutions. Do you see any steps between specifications and solutions?0:52:37
PROFESSOR: Steps between. You mean, you're asking how you go about constructing devices, given that you have specifications for the device? Sure. AUDIENCE: There's a lot of software engineering that goes0:52:48
from specifications through many layers of design and then implementation. PROFESSOR: Yes? AUDIENCE: I was curious if you think that's realistic. PROFESSOR: Well I think that some of it's realistic and0:52:57
some of it isn't. I mean, surely, if I want to build an electrical filter, I have a rather interesting possibility.0:53:07
Supposing I want to build a thing that matches some power output to the radio transmitter, to some antenna.0:53:19
And I'm really out of this power-- its output tube out here. And the problem is that they have different impedances. I want to match the impedances. I also want to make a filter in there which is going to get0:53:29
rid of some harmonic radiation. Well one old-fashioned technique for doing this is called image impedances, or something like that.0:53:38
And what you do is you say you have a basic module called an L-section. Looks like this.0:53:47
If I happen to connect this to some resistance, r, and if I make this impedance x, xl, and if it happens to be q times r, then this produces a low pass filter with a q squared plus0:53:59
one impedance match. Just what I need. Because now I can take two of these, hook them together like this.0:54:11
OK, and I take another one and I'll hook them together like that. And I have two L-sections hooked together.0:54:20
And this will step the impedance down to one that I know, and this will step it up to one I know. Each of these is a low pass filter getting rid of some harmonics. It's a good filter; it's called a pi-section filter.0:54:30
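The "q squared plus one impedance match" quoted here can be checked with a line of arithmetic. This is a sketch assuming the standard L-section result the professor is invoking: a series resistance r with reactance xl = q times r presents a transformed resistance of r times (q squared plus one). The function name and the numbers are illustrative:

```python
def l_section_step_up(r, q):
    # An L-section with xl = q * r steps resistance r up by a factor
    # of q**2 + 1 (the "q squared plus one" impedance match).
    return r * (q ** 2 + 1)

# e.g. stepping a 50-ohm load up to 500 ohms takes q = 3:
print(l_section_step_up(50, 3))  # 500
```

Chaining two such sections back to back, as in the lecture, steps down to a known intermediate impedance and back up, with each section also acting as a low pass filter.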
Great. Except for the fact that in doing what I just did, I've made a terrible inefficiency in this system. I've made two coils where I should have made one.0:54:41
And the problem with most software engineering art is that there's no mechanism, other than peephole optimization in compilers, for getting rid of the0:54:50
redundant parts that are constructed when doing top down design. It's even worse, there are lots of very important structures that you can't construct at all this way.0:55:01
So I think that the standard top down design is a rather shallow business. Doesn't really capture what people want to do in design. I'll give you another electrical example.0:55:10
Electrical examples are so much clearer than computational examples, because computation examples require a certain degree of complexity to explain them. But one of my favorite examples in the electrical0:55:19
world is how would I ever come up with the output stage of this inter-stage connection in an IF amplifier. It's a little transistor here, and let's see.0:55:32
Well I'm going to have a tank, and I'm going to hook this up to, say, I'm going to link-couple that to the input0:55:43
of the next stage. Here's a perfectly plausible plan-- well except for the fact that since I put that going up I should make that going that way.0:55:53
Here's a perfectly plausible plan for a-- no I shouldn't. I'm dumb. Excuse me. Doesn't matter. The point is [UNINTELLIGIBLE] plan for a couple [UNINTELLIGIBLE]0:56:02
stages together. Now the problem is: what is this, hierarchically? It's not one thing. Hierarchically it doesn't make any sense at all.0:56:11
It's the inductance of a tuned circuit, it's the primary of a transformer, and it's also the DC path by which bias0:56:22
conditions get to the collector of that transistor. And there's no simple top-down design that's going to produce a structure like that with so many overlapping uses for a0:56:33
particular thing. Playing Scrabble, where you have to do triple word scores, or whatever, is not so easy with a top-down design strategy.0:56:44
Yet most of real engineering is based on getting the most oomph for effort. And that's what you're seeing here.0:56:54
Yeah? AUDIENCE: Is this the last question? [LAUGHTER]0:57:18
PROFESSOR: Apparently so. Thank you. [APPLAUSE]0:57:39
[MUSIC-- "JESU, JOY OF MAN'S DESIRING" BY JOHANN SEBASTIAN BACH]