0:00:00
Clojure Made Simple
0:00:00
I want to thank everybody for coming. The title of this talk is "Clojure, Made Simple"; on the brochure they left out the comma,
0:00:09
so it's not "Clojure made simple", in other words a tutorial in Clojure or an easy explanation of Clojure. It's not actually a comprehensive explanation of Clojure
0:00:18
at all, but a look at a slice of what Clojure is about, a way of thinking about why you might want to use it. So, I'm the
0:00:27
person who made Clojure. I currently work on a database called Datomic, which is kind of a functional database. It's written in Clojure and runs on JVM
0:00:37
architecture. I'm a co-founder of Cognitect, which builds Datomic and sponsors the development and stewardship
0:00:46
of Clojure. But the main point I wanted to make about myself to this audience, because in this talk I might seem somewhat skeptical of Java and
0:00:56
object-oriented programming, is that I've done an absolute ton of that. That's what I did for two decades before I said, "If I
0:01:05
still want to be a programmer, I don't want to do it this way anymore." So I know exactly how apps are built using Java and C++ and C#, because that's what
0:01:15
I used to do. That doesn't mean what I think about them is correct, but that's my experience. But I'm wondering about
0:01:24
you. How many people program in Java here? How many are happy about that? How many are actively unhappy about that, looking for alternatives? Okay, great. How many
0:01:35
people have tried Clojure at all? Great. How many people have never heard of Clojure and are in the wrong room? Okay. How many people have tried Clojure and are trying to get
0:01:45
to use it at work, but not yet? A few. Maybe this talk will give you some ways of talking about Clojure's value
0:01:54
proposition that could help you. How many of you actively use Clojure and somehow are accidentally at JavaOne? Okay.
0:02:03
And the rest couldn't get into Brian Goetz's talk? I shouldn't even mention that Brian Goetz has a talk right now, because we could have people filing out. Or maybe you're just tired of Brian Goetz talking about immutability, in which case
0:02:13
you're definitely in the wrong room. No, I like him. Brian's a good friend and his talks are great, so I appreciate your
0:02:22
being at this one. So, very few people had never heard of Clojure, so I'm not going to spend a lot of time on... oh, I had one more question: how many people have seen my talk "Simple
0:02:32
Made Easy"? And people who have not? Okay, a few. So I may spend a minute describing what I mean when I say simple. Clojure is a
0:02:42
programming language. It runs on the JVM and on JavaScript; a substantial subset of Clojure runs on JavaScript, so it's a programming language with
0:02:51
which you can target both. But originally it only targeted the JVM and the CLR. There's still a port to the CLR that's maintained, but it does not see
0:03:00
wide use. It was first released in 2007. It's had, you know, surprising adoption, especially from my perspective, since
0:03:09
then, given its characteristics, because it's a Lisp, it's functional, it's data-oriented, and it has a lot of things that make it seem not like the kind of
0:03:19
language that would succeed. And this talk will really be about the data orientation of Clojure. So: "A lot of the
0:03:29
best programmers and the most productive programmers I know are writing everything in blank and swearing by it, and then just producing ridiculously sophisticated things in a very short
0:03:39
time. And that programmer productivity matters." So Adrian Cockcroft was an architect at Netflix and is now at Battery Ventures. How many people think Java goes
0:03:49
in the blank? Okay, so we know this. There's something about Java that makes
0:03:59
it not suitable for this blank; maybe we can tease that apart. Yes, of course he was saying Clojure, and this talk is
0:04:08
about maybe why. Like, how could this be true? What is it that makes Clojure different and possibly a better fit for
0:04:18
that blank? So the first thing I want to talk about is that I think we have this tendency in programming to think about ourselves just a ton: our
0:04:29
languages, our tools and our techniques, and me, me, me, me, me, us, what we're doing, whatever. We lose track of the fact that we're all working for somebody else, or ourselves, but for a business or
0:04:39
an organization that's trying to accomplish something in the world, and the software is completely secondary to that task, right? It should be measured
0:04:48
always in terms of the cost-benefit ratio, the return on investment, right? How quickly can we get a product to market, and is what we're doing profitable? Right,
0:04:57
if we're not doing that, we're not really being good participants in our businesses or organizations. So what do
0:05:06
the stakeholders want? They really want two things: they want something good and they want it soon. So, something good. We
0:05:21
think we know what's something good. We know how to make things good, right? We have these techniques, and things are good when the techniques are successful with them,
0:05:30
right? So when our types check and our tests pass, we have something good. But of course we all know that with our best efforts in those things, which are, I'm
0:05:40
not saying they're bad activities, by the way, but no matter what we do there, we end up with programs that don't work. We all know we have programs that type check and whose tests pass, and they don't work. And
0:05:50
when they don't work, it's from the perspective of the stakeholder; in other words, they don't do what the program was supposed to do. And what it's supposed to do was something that was conveyed between people or through documentation or
0:05:59
papers or things that are not in programming languages. They have to meet operational requirements and they have to be flexible. Okay, now there are some times
0:06:09
where people just want something soon and they don't want something good. There are actually better languages than Clojure for that, right? Just give me something fast that I'm absolutely, definitely
0:06:19
going to throw away, that will not grow, will not expand, will not take me further. So these first two things are
0:06:28
means, right? They're good, but they're only good insofar as they help ensure the latter three things. So we break it down. What is it supposed to do? Again,
0:06:37
it's a perspective thing. If the stakeholder thinks it does what it's supposed to do, they're fine. Of course, they're going to have expressed concerns about software, what it's supposed to do:
0:06:46
it should do this when I push this button. And they're going to have unexpressed, presumed things, like it should be secure, it shouldn't, you know, cause the computers to go on fire because
0:06:55
it's so slow, it shouldn't require three new data centers, it should keep running and not stop for an hour every day. Those
0:07:04
are sort of the unstated presumptions of something being good. But it ends up that if you build large, elaborate, stateful programs, it's extremely difficult to
0:07:14
ascertain whether or not they are going to do what they're supposed to do. And in fact, if you build any one of those things, if you just build a very large program or a very elaborate program or a
0:07:24
very stateful program, it will be just as hard to figure out if it's going to do what it's supposed to do. So one of the things Clojure is oriented at is making
0:07:34
it easier to understand whether or not your program is going to do what it's supposed to do, mostly by making it substantially smaller and also by making it more functional. In terms of
0:07:44
operational requirements, there's a boatload of things, a boatload of unstated requirements of software. You know, can I deploy it in the normal way, with all my stuff, with the people who
0:07:53
know how to run my machines and everything else? And that's one of the targets of Clojure. Clojure was meant to be hosted. It's just a jar; it runs in
0:08:02
the environment. It's easy to sneak in, right? Let's just add this one more jar and then we're running. But it's not a small thing,
0:08:11
right? If you wanted to adopt Common Lisp or Haskell, you would be asking your ops team and your deployment team
0:08:21
to start manipulating something completely alien, whose characteristics they don't understand. In terms of security, everything that's available
0:08:30
from the JVM for security is available via Clojure. And there are performance and other concerns. A very important
0:08:39
thing, though, is that now we can also reach the browser. So, how many people write applications where some part of the overall system touches a browser? Yeah. So right now you use two
0:08:50
different things; you almost definitely use two different things. And I think it's a strength of Clojure that we're delivering the same value proposition in both places, both on
0:08:59
the server and in the client, even if they're separate devs. The value proposition is necessary, almost more so, in the browser, which is one of the most complex places ever. In terms of
0:09:11
performance, you know, a lot of times you might look at a dynamic language and say, how could it be, whatever. But, you know, Clojure's right down there with the fast languages, on, admittedly, the
0:09:23
Benchmarks Game, which is a benchmark and a game. But it says we can reach that. On JavaScript we have a very interesting
0:09:32
result here. So Om is a ClojureScript library (ClojureScript is Clojure on JavaScript), and Om is a library that actually wraps React, which is the new
0:09:43
hotness. One of the really interesting things here: Om wraps React and then spanks it in performance. How's that
0:09:52
possible? It ends up that Om's use of persistent data structures, which we'll talk about in a minute, makes it faster
0:10:01
than React, because the big part of React is doing change detection, and change detection for immutable things is identity comparison, so it's super fast.
0:10:11
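To make the "change detection is identity comparison" point concrete, here is a minimal Clojure sketch. It is not how Om or React are actually implemented; `app-state`, its keys, and the `changed?` helper are hypothetical names chosen for illustration. It only shows that, with immutable values, an untouched part of the state is the very same object before and after an update, so a reference check is enough to rule out a change.

```clojure
;; Hypothetical application state: plain immutable maps and vectors.
(def app-state {:user {:name "Ann"} :items [1 2 3]})

;; "Changing" the user name produces a new map; app-state itself is untouched.
(def next-state (assoc-in app-state [:user :name] "Bob"))

(defn changed?
  "Cheap change check: compares references instead of walking the data."
  [old-state new-state k]
  (not (identical? (get old-state k) (get new-state k))))

(changed? app-state next-state :items) ; the :items vector is shared by both versions
(changed? app-state next-state :user)  ; the :user map was rebuilt, so it differs
```

The check is conservative: a differing reference means something may have changed, while an identical reference means nothing under that key changed at all.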
And in fact they are redoing their whole thing to use persistent data structures in JavaScript now. The React dev came up to me at Strange Loop and
0:10:22
shook my hand and said, "You're saving us a ton of money, because we're switching to that strategy." So the other part of the value
0:10:32
proposition, I said, was flexibility, right? People that have a stake in software know that they're building a system now, but tomorrow things are going
0:10:42
to change; requirements are going to change; we have to do something different. Change is inevitable. So can we change the program? Can we make it more flexible? And it ends up there's a
0:10:51
lot we can learn from bigger system design in the small, to make the subcomponents of systems more flexible. And of course this is old; this is the oldest
0:11:02
thing, right? Loose coupling, right? But we talk and talk and talk about it, and we continue to use techniques that thwart it. Like, every single day you pull out a technique, you do something that
0:11:11
makes this harder. So what makes it easier? So my talk was done and
0:11:21
somebody said, "You know, I have this quote from a Walmart Labs guy; you might want to put it in your talk." And I was like, wow, this is great, because he's telling my story, right? "With
0:11:30
Clojure we get to market faster with better quality. We avoid unintended interactions in Java apps, where code in one area impacts the application in another. Clojure shrinks our code base to
0:11:40
about one-fifth the size it would be had we written in Java." Right, and these are the points of my talk, all right? This
0:11:49
is what I want. This is a stakeholder saying, this is what we get by choosing Clojure: faster time to market, better quality, right? We avoid coupling
0:11:59
problems that make it difficult for us to change, and we have a smaller code base. So how does Clojure do this? There
0:12:08
are many, many different characteristics to Clojure, but I only want to talk about sort of two today, mostly one. One, and this is one I think that sets Clojure apart and makes it somewhat
0:12:19
different, is that it has data orientation through and through, and the other is simplicity. So when I say simplicity, I mean the opposite of
0:12:28
complexity. I do not mean ease, and I don't mean Clojure is an easy language, where, you know, you only type and everything magically happens, or some sort of easiness metric.
0:12:39
Complex things are intertwined, and simple things are not, right? They're more independent, they're separate. Even if they have as many things going on, this
0:12:48
is simpler than this, right? Four things like this is complex, and four things like this is simple. So that's what I mean when I say simple: unentangled. So the
0:13:01
thing is that, you know, all these languages can do everything, right? We can do the same stuff, and people are like, why? You know, I know I can do the same stuff in C#, in Java and Scala
0:13:11
and Clojure and, you know, any general-purpose language; you can accomplish the same things at the end of the day, right? So what differentiates languages is what they make practical
0:13:21
and what they make idiomatic. And in Clojure we focused on making something idiomatic that I think is not and should be more so. That's this. Before
0:13:34
we had all these highfalutin, you know, opinions of ourselves as programmers and computer scientists and stuff like that, programming used to be called data processing. How many people actually do
0:13:44
data processing in their programs? You can raise your hands. We all do, right? This is what most programs do. You take some information in, somebody types some
0:13:53
stuff, somebody sends you a message, you put it somewhere, later you try to find it, you put it on the screen, you send it to somebody else. That's what most
0:14:04
programs do most of the time. Sure, there's a computational aspect to programs, there's quality-of-implementation issues to this, but there's nothing wrong with saying programs process data, because data is
0:14:14
information. Information systems, I mean, this should be what we're doing, right? We're the stewards of the world's information, and information is just data.
0:14:23
It's not a complex thing, it's not an elaborate thing; it's a simple thing, until we programmers start touching it. So we have data processing;
0:14:33
most programs do this. There's very few programs that don't, right? And data is a fundamentally simple thing. Data is just
0:14:43
raw, immutable information. So that's the first point: data is immutable. If you make a data structure, you can start messing with that, but actual data is immutable.
0:14:55
So if you have a representation for it that's also immutable, you're capturing its essence better than if you start fiddling around. And that's what happens:
0:15:04
languages fiddle around, right? They elaborate on data: they add types, they add methods, they make data active, they make data mutable, they make data
0:15:13
movable, they turn it into, you know, an agent or some active thing. And at that point they're ruining it; at least
0:15:23
they're moving it away from what it is. And in object orientation I think this is rampant, because our fundamental construct, the object, conflates two
0:15:34
things. It doesn't give us two separate ways to talk about process, for which objects are an okay approach, and information, for which they're a terrible,
0:15:43
terrible, terrible approach, right? But every time we have a problem we pull out an object, we make a new object, we make a new class, we make a new instantiation of something, right? And this makes our
0:15:53
programs more about themselves again and less about the information. Programs are increasingly about code and decreasingly about data, and I think
0:16:03
that's a mistake. So Clojure embraces data. That's like the simplest idea in Clojure: there's nothing wrong with data, data has these great
0:16:12
properties, let's use it. In fact, let's make it a really important, first-class, in-your-face kind of thing. So Clojure embraces it first and foremost by having
0:16:22
strong data literals, which I'll show you in a minute. And they're plain, right? Plain just means flat, level, unadorned, no extra
0:16:33
stuff. Code (and this is an old Lisp thing) in Clojure is represented as data. That's important for a lot of reasons. It
0:16:42
enables macros and a lot of sophisticated program transformation things, but it also means that you don't have different stuff, right? The majority
0:16:51
of functions in Clojure just take data (and by data I mean immutable, unadorned stuff) and they return that same thing. A giant library of functions, hundreds and
0:17:00
hundreds and hundreds of data manipulation functions, ready to go. They take data, they return data. So if you have anything that's data, you can use all those functions on it.
0:17:09
That tends to press you towards making everything data, because then you have this giant library, which you learn once and you can apply to every problem you
0:17:18
have. And in particular, every time we encounter information in Clojure systems... because there's different parts of systems, right? There's a part of your system that manipulates information, there's a
0:17:27
part of your system that's sort of plumbing, right, or the machinery of your program. Like, you know, a socket or communications endpoint is more like a machine than it is like information,
0:17:37
right? So there's going to be active parts of your program. But whenever you're dealing with the part of your program that's just about representing facts and information, we will always,
0:17:47
in Clojure, choose plain data to do that. So there's a small set of data literals in Clojure. They're relatively
0:17:56
obvious, right? You can write integers in the normal way, and doubles the normal way, and big decimals with an M at the end. There are ratios, and these are proper ratios that don't lose precision.
0:18:05
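The numeric literals just described can be tried directly at a REPL. This is a small sketch; the var names are arbitrary, and the comments describe the behavior rather than quoting printed output.

```clojure
(def an-integer    42)     ; written the normal way
(def a-double      3.14)   ; written the normal way
(def a-big-decimal 1.10M)  ; trailing M makes a java.math.BigDecimal
(def a-ratio       1/3)    ; a proper ratio, kept exact

(class a-big-decimal)      ; java.math.BigDecimal
(class a-ratio)            ; clojure.lang.Ratio

;; Ratio arithmetic stays exact: three thirds add up to exactly one.
(def thirds-sum (+ a-ratio 1/3 1/3))

;; Doubles are ordinary floating point, so this sum is not exactly 0.3.
(def doubles-sum (+ 0.1 0.2))
```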
Strings are in double quotes, and strings are java.lang.String. When you write that, you get a java.lang.String, and so it's just a literal for a string. There are
0:18:14
literals for characters. Because we use data structures for programming, right, there's a couple of extra things we need. If you look at Java code, is it
0:18:24
written all with quotes around every word? No. So Java needs something besides strings in order to be a successful programming language. It needs
0:18:34
symbols and it needs identifiers. So if you're going to properly represent your code as data, those things have to be
0:18:43
first-class data structures or atomic data types that are different from strings. So there are two in Clojure: one are symbols and
0:18:52
the other are keywords. And their use in Clojure, which I'm not going to, you know, dive into, is that symbols are generally used to reference something
0:19:01
else, so like variables and things like that; they name something. And keywords name themselves. They're like enums, if you want. They're very useful as keys in
0:19:12
maps, for instance; that's why they're called that. There are true and false boolean literals, and there's nil, which is nothing; it's the same as a Java null. And those booleans
0:19:22
are Java Booleans, those characters are Java Characters, et cetera, et cetera. And there's also literals for regex. And
0:19:31
then we have data structures, some fundamental data structures. We have the singly linked list, right? It's in parens. Those are all lists: a list of
0:19:40
numbers, a list of symbols, a list that has a symbol and then some numbers. That's okay; they can be heterogeneous. And they grow at the front, and they have linked-list kind of performance
0:19:49
characteristics, which means it's fast to put something at the front and it's slow to find the 57,000th thing; it's linear time to find stuff in the middle. Then we have
0:19:58
vectors. They're in square brackets, right? Again, they can be heterogeneous. This happens to be one with numbers and one with symbols, but you can intermix them. They
0:20:07
grow at the end in constant time, but they also offer fast access anywhere in the middle, so they're different from linked lists. And then there are
0:20:16
maps. They're key value, key value, key value. The commas are optional. The keys need not be keywords. So this first one uses keywords mapping to integers and
0:20:25
the second one uses integers mapping to strings, but you can be heterogeneous in both the key and the value. And then we have sets, which are just curlies
0:20:35
preceded by a hash, and you can have sets, again heterogeneous, of anything. And all of this stuff nests. One of the important things about this is that these maps,
0:20:45
they scale and are efficient from the very small all the way to the very large. So you can use them as sort of pseudo-objects in the small and have, you know, four or five fields, if you will, or
0:20:56
entries, or you can have a giant map that has, you know, millions, tens of millions, hundreds of millions of things in it. The same data structure is used throughout that range. In Clojure we don't
0:21:07
distinguish those two uses at all. So all of the data structures I showed you are immutable. There's no way to change them.
0:21:16
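The slide itself is not in the transcript, so the following is a stand-in sketch of the four collection literals and of what "immutable" means in practice. All names and values are invented for the example. "Adding" to a collection with `conj` or `assoc` returns a new value, and the original is still there, unchanged.

```clojure
(def a-list   '(1 2 3))             ; quoted, so it is read as data rather than a call
(def a-vector [1 :two "three"])     ; heterogeneous is fine
(def a-map    {:a 1, :b 2})         ; commas are optional whitespace
(def a-set    #{:red :green})
(def nested   {:name "Ann" :tags #{:admin} :scores [90 85]})

;; Each of these builds a new collection; none modifies its argument.
(def longer-list   (conj a-list 0))      ; lists grow at the front
(def longer-vector (conj a-vector 4))    ; vectors grow at the end
(def bigger-map    (assoc a-map :c 3))

;; Keywords name themselves and also work as lookup functions on maps.
(def a-value (:a a-map))
(def first-score (get-in nested [:scores 0]))
```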
Just like there's no way to change 42, there's no way to change a vector. What you can do is make a new vector that's slightly different from the vector you
0:21:25
started with, and you have both vectors. And there's a technique called persistent data structures, which makes this making of a new, slightly
0:21:35
different version efficient. The two versions will have a substantial amount of structural sharing going on under the hood, and that's possible because they're
0:21:44
immutable, right? They can share structure because no one can change them. And that's what makes it practical to use immutable data structures all the way
0:21:53
through the range, from tiny things through very, very large things. It's not copy-on-write, right? These new modifications that you make still comply
0:22:03
with the big-O expectations you have for the data structure in hand. And the key to practical functional programming is having this. So the idea
0:22:13
behind Clojure, or one of the ideas behind Clojure, is just this, right? I had done so much object-oriented programming, and, like, it's just so much
0:22:22
busy work and so much extra stuff. And when I finally, later in my career, learned Lisp, I saw people building very, very interesting systems out of much,
0:22:32
much simpler stuff. And I tried it, and guess what? You can build exactly the same systems out of much, much simpler stuff. And I said... well, I can't repeat it,
0:22:43
but I was very unhappy; basically something to the effect of, I have been wasting my time in my career doing what I've been doing. I need to do something, I
0:22:52
need to change what I'm doing, because I'm wasting my time, I'm wasting my life doing it this way. Because you can build programs, the same programs, better programs that do the same things, with
0:23:03
substantially simpler stuff. In fact, you can build them out of the data structures I just showed you, plus pure functions that take those things and return those things. Most of your program
0:23:13
you can build that way. In little tiny bits of your program you'll have state, you'll have communication, you'll have the other aspects. I like runtime polymorphism; Clojure has it.
0:23:22
I'll talk about it in a second. But you can build programs out of this, and you can build big programs out of it. You can build databases out of it, right? I've done it; I've built a database out of it. When I made Clojure, I was targeting
0:23:32
being able to do everything that I used to do in Java and C++. So I'd built broadcast automation systems, scheduling systems, yield management systems,
0:23:43
election projection systems and exit poll systems, in, you know, C++ and Java and C#. I believe that I could have used Clojure to build
0:23:52
anything I ever built, right, with maybe a tiny little bit of lower-level code in small places. But that's the target for Clojure: I wanted to replace what I was doing. I
0:24:03
think the programs are substantially smaller, they're simpler, and they're much more robust than the object-oriented programming I was doing
0:24:13
to accomplish the same jobs. So the idea behind Clojure is, let's make that idiomatic; let's make that the first-class way we do things. So the syntax of Clojure: there's no more; you just saw it.
0:24:24
It is those data structures. That set of data structures and those fundamental things are a format called edn, which stands for extensible data
0:24:34
notation. But it's just a grown-up version of S-expressions, which have been used in the Lisp community for years. Basically, you build programs out of data
0:24:44
structures, and I'll show you some of that in a second. So the data structures are the code. The syntax is not based around characters; it's based around data structures. You know, the definition of
0:24:53
what a function is, is the list whose head is a symbol called, you know, fn, and whose next thing in the list is a vector of arguments, which are themselves
0:25:02
symbols, et cetera, et cetera. But the syntax is described in terms of data structures. It's not like there's no syntax, but that's where the interpretation
0:25:12
happens. And everything that would be special in an ordinary programming language (declarations, control structures, function calls, operators, et cetera, et cetera), they're
0:25:22
all just represented as lists with the verb at the front. That's it; that's the Lisp way. Also, everything is an expression, and that's typical of
0:25:32
functional programming languages. That's about all I'm going to say about Clojure the language, but I'm going to show you a lot more about the use of these aspects
0:25:42
of it. I have a little bit more later. So edn, this extensible data notation: I showed you some built-in things. There's also a way to extend it with new tags that are namespaced, to allow you
0:25:53
to describe something new in terms of anything that's already known. So you can't make up arbitrary stuff; you can't say, we'll build a new parser to parse characters and get this thing. But you
0:26:02
can say, I have a new interpretation of a vector of two numbers and we're going to call that a point, and so on and so forth. And you can cascade these extensions and
0:26:11
build richer things. But one of the important things about edn is it's meant to be useful for both data and code. So you saw in a previous slide, I said commas are optional. That's kind of
0:26:22
critical. Imagine if, in between everything you said in Java, you had to put a comma. Who would like that? That would be awful.
0:26:32
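Both points, optional commas and namespaced tags, can be sketched with the standard `clojure.edn` reader. The tag `geo/point` and the `read-point` function are made up for this example; `edn/read-string` and its `:readers` option are part of `clojure.edn`. The tag gives a new interpretation to something already known (a vector of two numbers) rather than introducing new character-level syntax.

```clojure
(require '[clojure.edn :as edn])

;; Commas are treated as whitespace, so both texts read as the same map.
(def with-commas    (edn/read-string "{:x 1, :y 2}"))
(def without-commas (edn/read-string "{:x 1 :y 2}"))

;; Hypothetical handler for a made-up tag: it receives the already-read
;; vector of two numbers and returns whatever "point" should mean here.
(defn read-point [[x y]]
  {:type :point :x x :y y})

(def a-point
  (edn/read-string {:readers {'geo/point read-point}}
                   "#geo/point [3 4]"))
```

Without a matching entry in `:readers` (or a `:default` handler), the reader rejects the unknown tag, so the text stays plain data unless the reading program opts in.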
Right, so you can't have stuff like that. So, I mean, how many people have ever used any system that tried to encode programming in, like, JSON or XML? Yeah. How
0:26:44
fun is that? Yeah. So this is a little bit of, like, AWS CloudFormation, which has some functions in its syntax, which is normally
0:26:54
declarative. So this is a nested function call. But you can't do this, right? You can't program like this. So if you're going to program using data structures, you
0:27:03
have to have a data structure format that's amenable to that. So this is what Clojure looks like. I don't really expect you to be able to read it, but I just told you before, that's a list, right:
0:27:13
you know, a paren, defn, right? It defines a function. The name of the function is
0:27:22
words, it takes an argument called text, and there's the implementation, and so on and so forth. So this is a port of Peter Norvig's Python code. It's as short as the Python program, right? And
0:27:33
that's not like a contest, right? It's more about this: everything that's in here is about the problem. Like,
0:27:44
all the words you're reading and everything, it's all about the problem. There's no extra stuff. There's no static, import, blah, type this, type that, yada yada,
0:27:54
extra control stuff. It's all about the problem, 100% about the problem, and that's what you want, right? So it's short, it's free of ceremony, but most
0:28:04
important is it's about the problem, which means it's a lot easier to look at now and later and see what you're trying to do, what you're trying to accomplish.
0:28:13
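The slide's code is not reproduced in the transcript. As a sketch of "that's a list: a paren, defn, the name words, an argument called text", the following builds a plausible one-line `words` function as quoted data and takes it apart with ordinary list functions. The function body is an assumption made for illustration, not a copy of the slide.

```clojure
;; Quoting stops evaluation, so `form` holds the code as a plain list.
;; The body (lower-case the text, pull out runs of letters) is assumed.
(def form
  '(defn words [text]
     (re-seq #"[a-z]+" (.toLowerCase text))))

(list? form)    ; the whole definition is a list
(first form)    ; its head is the symbol defn
(second form)   ; then the symbol words
(nth form 2)    ; then a vector of argument symbols, [text]

;; Handing the same list to eval compiles it; defn returns the new var,
;; and a var can be called like the function it holds.
(def words-var (eval form))
(def result (words-var "The quick brown fox"))
```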
So once we have this format, edn, and we use it for code, and we can obviously use it for data, you use it for everything, right? So let's say you want to have a
0:28:23
DSL to represent HTML, right? Because we know HTML on its own is kind of gross to generate and manipulate, because it's XML plus some randomness. So this is one
0:28:35
of many DSLs for representing HTML in Clojure. But it's edn; this is the same stuff that the program was made out of. The same reader reads this. If I
0:28:44
call read, which is available à la carte in Clojure, I get a vector that has a keyword, then a vector that has a keyword, a vector, blah. I get data. You
0:28:53
know exactly what I'm going to get, the data structures I'm going to get. I don't have a special thing, I don't have a DOM, blah blah blah. You know, I get data when I read this, and that's great.
0:29:02
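Here is a minimal sketch of the idea, assuming a vector-and-keyword layout of the form `[tag attrs? & children]` (the style used by libraries such as Hiccup). The sample markup and the `render` function are hypothetical and deliberately naive: there is no escaping and no handling of void elements. The point is only that the reader hands back ordinary vectors, keywords, maps and strings, which ordinary functions can then walk.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

;; The "HTML" arrives as edn text and is read into plain data.
(def page
  (edn/read-string "[:div {:class \"note\"} [:p \"Hello\"] [:p \"World\"]]"))

(defn render
  "Naive illustration: turns [tag attrs? & children] vectors into an HTML
  string. Strings and other leaves are emitted as-is, without escaping."
  [node]
  (if (vector? node)
    (let [[tag & more] node
          attrs    (when (map? (first more)) (first more))
          children (if attrs (rest more) more)
          attr-str (str/join (for [[k v] attrs]
                               (str " " (name k) "=\"" v "\"")))]
      (str "<" (name tag) attr-str ">"
           (str/join (map render children))
           "</" (name tag) ">"))
    (str node)))

(def html (render page))
```

Because `page` is just a vector, the same core functions used on any other data (`first`, `filter`, `conj`, and so on) apply to it before it is ever rendered.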
It means it's easy for me to process this, it's easy for me to write a program that produces this, so I can produce HTML without an extra special different thing. Of course we would use this for our
0:29:11
configuration files, right? Why not? We can read it, we can generate it, we can process it, we can manipulate it. Everything we know, all those hundreds of functions we know how to use, we can use
0:29:21
to do this. Now add up the stuff in Java. How much have you got? You have syntax, for which you use, I don't know, some parser thingy, right, from javac
0:29:31
or something, to manipulate. You have annotations, right? You have JSON here, you have XML there, right? Maybe for writing your own DSL you use ANTLR or something. It
0:29:40
keeps adding up, more and more different things. And if you read stuff with an API, what do you get? Some API author's idea of the DOM for this kind of
0:29:50
thing, right? What do you get when you read XML? Well, it depends on how you read it, but you could get this, you know, machine, right, that calls you back every time it gets a new element. Wow.
0:30:02
This is Hadoop programming. Netflix has quite profitably used Clojure to build a very succinct DSL for
0:30:12
doing Hadoop and big data processing. It looks exactly like Clojure; it is Clojure, sort of embedded in Clojure. And the
0:30:21
thing is, you can run this locally and then push a button and it will distribute it over your Hadoop cluster and run it there, same thing. And you can
0:30:30
run your own Clojure functions there too, and it'll ship them and everything else. So we just do this; we do this everywhere, everywhere we want. There's a type annotation system for Clojure that
0:30:39
uses data. There's a schema language for Clojure that uses data. There are many kinds of logic DSLs that all use data as
0:30:48
their representation. And this allows you to do something that's very interesting, which is to write an embedded DSL. Now, how many people have
0:30:57
written a DSL and had more and more pressure to make it Turing complete and, you know, general-purpose? All right, you start with the DSL and it did X and Y, and then people are like, could you
0:31:07
have conditionals? Could you have case? Could you have, blah? You know, they always want more stuff. So one of the cool things about doing DSLs in a
0:31:16
language like Clojure is that you can sort of, you know, co-opt all of Clojure inside your DSL. It's like, oh, you want to do arithmetic? Well, sure. You know, all you
0:31:25
have to do is expose something that you're going to flow through to Clojure and let Clojure eval it. So it's very powerful. We keep doing this. The other big point of
0:31:35
programs as data is that it allows you to write program-generating programs, and that takes a few forms. In the small, there's a capability in 0:31:44
Clojure and other Lisps called macros. They're nothing like C macros from your past, maybe, depending on how old you are. They are functions of data 0:31:54
structures to data structures. Basically, the compiler for Clojure says: if you declare something as a macro, then if I see that in the program I will call your code, I will 0:32:04
give you the form, as data, that I encountered, and you give me back a different form. You can do any transformation you want, which means you can build your own syntax, you can build your 0:32:14
own constructs, you can extend the language however you want. You do not need to wait for me or for anybody else or for the Java JSR, whatever. You just go. 0:32:25
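A minimal sketch of the macro idea just described, assuming nothing beyond core Clojure. The construct name unless is made up for illustration and is not from the talk; defmacro and macroexpand-1 are standard.

```clojure
;; A macro is a function from data to data: the compiler hands it the
;; unevaluated forms it encountered and compiles whatever form comes back.
;; `unless` is a hypothetical construct, used here only as an example.
(defmacro unless [test & body]
  ;; Build the replacement form as an ordinary list.
  (list 'if test nil (cons 'do body)))

;; The expansion is plain data that can be inspected like any other list.
(macroexpand-1 '(unless ready? (println "waiting") :not-ready))
;; expands to: (if ready? nil (do (println "waiting") :not-ready))

;; Used like any built-in construct.
(unless false :ran)   ; :ran
(unless true :ran)    ; nil
```

The point of the sketch is only that the macro body manipulates lists and symbols with ordinary functions; no separate template language is involved.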
So we're doing all this stuff with raw data structures, and how do we contrast that with objects? Right, this is 0:32:35
the ranty part of the talk, just warning in advance. So objects are like marionettes. Everybody has their burrito 0:32:45
analogy, but I couldn't, because they're not like burritos. Objects are like marionettes. OK, they have all these methods on them, right? And anybody who has access to the object, it's like 0:32:55
they have that control stick thing. Wikipedia says that the person who has that is called the master minder. I 0:33:04
couldn't find anybody else who agreed with them, but that's a cool name. Like, you can remote control the object, which you can, right? If you have a reference to the object and you know its type, you can call any of its methods whenever you 0:33:13
feel like it, on whatever thread you want, right? Just have at it. You can do anything you want because you can call those things. So whoever writes the object, their class, they have to defend 0:33:22
against that, because what's going to happen? Well, in the real program you start passing around references to that object. Well, now you have more than one master minder. Now you have as many 0:33:33
master minders as you've shared references to the thing, and anybody can call something at any time. And what do you end up doing? You end up saying, well, maybe they're friends and they're standing next to each other, and they're 0:33:42
like, OK, let's make the fancy horse, you know, dance, and I'll do the front legs and you do the back legs. And that happens in puppetry, right? But in programming, you 0:33:52
know, sometimes they just go off, and, like, somebody's trying to make the front of the horse go this way and somebody's trying to make the back of the horse go that way, and it doesn't work. And so you have to have all kinds of protection, and 0:34:02
you actually can't effectively do this. Java and languages like it don't give you an effective way to do this, so you're suffering from this constantly. As 0:34:11
soon as you have a reference leak, you're suffering. Now, I know you say, oh, I use value objects. At some level, you know, you don't. Everywhere you could? But you don't, 0:34:21
and it's not idiomatic, and it's still hard. And you still don't even know: like, if somebody gives you a reference, right, to an interface, can you know whether it 0:34:32
will start dancing on you? Is there anything in Java that will tell you it's not going to move around? Anything, any construct, anything in the type system, 0:34:41
anything? No, nothing. So every day you have this unhappy face. And then, right, 0:34:51
the final problem, right? How many people write a program that's just the program? Like, it sits by itself and it reads, you know, standard in and spits to standard out. Right, compiler writers, 0:35:01
right, they do that, right? And they write these great languages that are really good at that. But that's not the real world, right? How many people write programs that talk to other programs routinely? Yeah, we 0:35:11
do that all the time. How many people put objects on the wire? Wow, that is terrifying. All right, but just so you know, a 0:35:22
long time ago we decided that was a bad idea. So I don't know if you got the memo or whatever, but yeah, objects don't travel 0:35:31
on wires. So you can fake it, you can pretend, you can make all these elaborate things, but most of these things have failed in practice, right? You 0:35:41
can get away with it in small circumstances, but it's not how things work, right? We don't actually give somebody a reference to something and allow them to sort of remote-control us, right? 0:35:52
So, you know, this is why I was saying before: if your API takes an object, especially by a reference to an interface, right, so you don't have the concrete class's definition to sort of make you feel better about it not 0:36:02
changing, do you know if it's going to mutate? Right, can your type system help you with this? It's such an important thing for the robustness of your program to control this. The answer 0:36:11
is no, you're getting no help at all. So your default idioms leave you completely on your own to deal with these problems, and really, it 0:36:20
encourages you to create these problems just by accident, right? The other thing is, let's say RMI is not allowed at your company. How many 0:36:32
people are not allowed to use RMI inter-company? Everybody's allowed to use RMI? Wow. Gotcha. Yeah, that's actually not as bad. 0:36:46
But let's say you're allowed, right, and somebody says, I want you to service-ize your thing. How many people would choose RMI over something 0:36:56
else, like HTTP or anything else? All right, nobody. Nobody would, right? And then what do you have to do? 0:37:05
Right, you have this interface. You said, I took an object, I expected all these methods on the object, and now I have to talk over a wire. What do you have to do? 0:37:15
Map. Mapping, right? Object-relational mapping, object-blah map, object-something map, every time you want to get to and from the outside world, especially stuff 0:37:26
you're not writing. So RMI is out of the question, right? You're not going to get, like, some website to accept your RMI calls or make RMI calls 0:37:35
to you, right? Anytime you have to go outside of your box, out of your world view, your object world view, you have to map, right? So, you know, Java 0:37:46
programmers and object-oriented programmers have been, like, you know, kicking SQL, saying, oh, it requires object-relational mapping, and that's, like, a problem with SQL. No, it's a problem with objects, right? Objects are not the way the 0:37:57
world works. Nothing in the world works that way. People do not hand their strings out to other people to, like, start yanking on them, and, like, that's how we're going to build things. Is that how we're going to have, like, a soccer team? 0:38:06
Right, it's like, I'm really going to have a reference to somebody else, and, like, you know, you call pass on me, and, like, we build this big spaghetti nightmare? That's not how the world works. It's 0:38:16
completely not the way the world works. It's not how physics works. So we say objects are a way to model the real world: it's not, at all. It's a complete programming fabrication. 0:38:25
It's not very realistic. It's not a good fit for almost anything that's in the outside world. So you can build your own world where all this stuff makes sense, 0:38:34
but it's inherently, I would call it, idiosyncratic. But in particular, it's not the way that systems work, right? Systems 0:38:43
in the large, right? So what are systems? Well, the word system means "to cause to stand", and I love that idea. I mean, I 0:38:52
always think of, like, these legs sort of self-assembling to try to make something that stands up. It's independent parts that you connect 0:39:01
together, substantially independent parts, right? Because you don't need to cause something to stand that's, like, one thing that already has three legs. You're not causing it to 0:39:11
stand up. It's the independence of the things that matters, and in general we try to build systems in a way that makes them independent, right? Do we want to care if 0:39:21
another server is using the same programming language that we are, or the same runtime, and the same version of the Java runtime, or the same type system? Do we want to build a system like that, 0:39:30
where we care? No, we don't. Why? Because it's going to be brittle, right? It's going to be hard to make changes. We'd have to agree with the other person, right: 0:39:39
tomorrow we're going to have this thing and everything's going to be different, you know, three, two, one. You know, we don't do that. Does the internet work that 0:39:48
way? No, right? We don't do specific stuff, we do general stuff, and we try to be as independent as possible. 0:39:57
Most things that happen between systems use one of two techniques, right? They use RPC with plain data out and back, or they use queues, where you send 0:40:07
data and somebody shows up later and gets the data. You just send, and you just flow data around. This is the way that systems are built: big systems, big successful systems like the internet, and 0:40:17
most systems, right? And those systems are flexible. You measure their flexibility in terms of how much independence they support. Can you independently develop 0:40:26
these parts of the system? Can somebody upgrade their part of the system and not mess up the other person? Right, because, like, Twitter is not going to tell your web browser when they change 0:40:36
their homepage, and, like, Safari, you know, does something special. That's not how it works, right? Everybody has independent development, independent time frames. Then 0:40:45
the other thing that's critical to this is that if somebody else on the other end is going to change, you have to be tolerant of things being different. You can't say, well, here's our contract, you know, the nine hundred things have to 0:40:54
be exactly this way, and then I'll work and then you'll work, and if we're going to change anything, we have to, you know, have lunch and have a meeting and 0:41:03
decide this stuff again. You have to be tolerant and accepting of more things, so we don't have to change in lockstep. And so these systems are inherently dynamic and they're inherently extensible, right? That's what 0:41:14
I just said: systems are made with dynamic types, extensible types, right? They can accept data that you weren't expecting to see, and it won't make them fall over, and hopefully 0:41:23
they'll do a good job of propagating it, right? They're all made this way. So this is the other fundamental idea of Clojure: 0:41:32
we should build the insides of our systems like we build the outsides of our systems. All those value propositions that accrue to systems, we want them. How 0:41:41
many people want to have a meeting every time they change a class or a subsystem? How many people have meetings every time they change a class? Yeah, you have to, right? Because stuff is going to break. So 0:41:52
we should communicate using immutable data inside our systems, for the same reason we do outside. It makes our systems more robust, it makes them easier to change, it makes the 0:42:02
independent parts separate, it makes it easier to move them around, right? We get loose coupling, we get subsystem independence, we get flexibility. And 0:42:11
what's the mapping? Well, there's no real mapping, right? This doesn't need mapping. RPC becomes PC, right? We can 0:42:20
call functions. We were calling functions before there was R, right? Then they added R. We had PC before we had RPC. We can go back to PC. We can do that, right? 0:42:30
We can pass data to functions and get data back. We used to be able to do it, then we had all this elaborate stuff, and now we forgot how to do it, right? And we can 0:42:39
implement queues and flow inside our programs, using queues or channels or things like that. So that's the other key idea. Now, there's going to 0:42:50
be process and state. Clojure is a practical language. Of course you're going to have process and state, or there's no reason to run your program: it 0:43:00
just makes the computer hot and you go home, right? You're going to have state and effects. But this is another area where we're left just totally with nothing in object-oriented languages 0:43:11
like Java. You have nothing here. Like, there are very fancy functional languages that have purity, and they will force you, through the type system, to identify and isolate all the 0:43:22
parts of your program that could do I/O or have any kind of effect. They'll either do it via purity or via effect systems, right? And then the 0:43:31
alternative to that is nothing, which is what most people have: absolutely nothing. And then you could also have reified constructs that at least make a state 0:43:42
change explicit, and that's where Clojure sits. Because in Java and C++ and C# you have nothing. You just have nothing. You have some 0:43:51
really raw constructs like mutexes and, you know, a pat on the back and good luck, buddy, and read Brian's book. So 0:44:02
Clojure doesn't have any purity to it, but it has explicit constructs for state. You can imagine them being variables that have semantics to them. So it's not just like anybody can 0:44:11
come in and whack on this variable at any time. Instead you say, I'm going to give you a function, and you somehow apply that function to that variable to move it from one state to another. But in 0:44:22
doing so you can ensure it's free of conflict and free of races, and it's never going to become half of a thing. And these variables always refer to values, so you're always able to observe 0:44:32
them, or dereference them, and get out a value. There's nothing else. There's only these reference cells that point to values, and values. There's no mutable object, half of 0:44:43
which could be whatever, or, you know, like a date class where you could set the month, or things like that, right? A date is a value. You can have a reference to a date. You can make that reference point 0:44:52
to another date. You can't change a date. So you have things you can change, which are references, and you have values, which are not. You can't change dates any more than you can change 42. You can change these 0:45:03
references, but they're atomic: they just point to one thing. So between those you can get a whole bunch of different variants, right? CAS implements a 0:45:12
successorship model. You say: only make this the new thing if it was the thing I'd, you know, if my presumption is still valid, right? That's the tiniest version. And Clojure has a construct that wraps 0:45:21
that, so you don't have to write the loop or anything else. You say, here's my function, apply it to the inside of that, use CAS, do the loop for me, and I know I'll get 0:45:31
a clear successorship there, with no races and no conflicts. There's also an STM in Clojure that allows for bigger transactional kinds of modifications to 0:45:41
occur. But the point is, this construct is doing the job, and the construct is calling out: here is where the mutation is in the system, here is where the state is. And the thing is, you have a way to 0:45:52
get out of it, right? If I give you something and you don't know if it's going to change, how can you save its value? Like, if I give you a reference by 0:46:01
an interface to some composite type, and you don't know if it could mutate, how can you save its value? What's the safe way to do that? You don't know that clone is 0:46:12
going to work at all. Sorry, does not work. What else? What's that, gets? OK, get this, get that. You've got nothing. You have absolutely 0:46:21
nothing. So this is, like, a critical thing for making a system that works, and you have absolutely nothing to do this with. You have to build up your own convention 0:46:30
around this. So, like, I think you should pick, right? You should either have explicit constructs or go all the way to Haskell, because everything in between is catastrophe. 0:46:42
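A small sketch of the explicit state construct described above, using Clojure's atom. The names counter, record-hit! and snapshot are made up for illustration; atom, compare-and-set!, swap! and deref (@) are core Clojure. It assumes Clojure 1.7 or later for update and run!.

```clojure
;; An atom is a reference cell that always points at an immutable value.
(def counter (atom {:hits 0}))

;; Bare compare-and-set: install the successor only if the cell still
;; holds the exact value we looked at.
(def cas-results
  (let [seen @counter]
    [(compare-and-set! counter seen (update seen :hits inc))    ; true
     (compare-and-set! counter seen (update seen :hits inc))])) ; false, the cell has moved on

;; swap! wraps the read / compute / compare-and-set retry loop for us.
(defn record-hit! []
  (swap! counter update :hits inc))

;; A dereferenced value is a stable snapshot that is safe to keep.
(def snapshot @counter)

;; 100 threads each apply the function once; no update is lost.
(let [threads (doall (repeatedly 100 #(Thread. ^Runnable record-hit!)))]
  (run! #(.start ^Thread %) threads)
  (run! #(.join ^Thread %) threads))

(:hits @counter)   ; 101: one from the manual CAS, 100 from the threads
(:hits snapshot)   ; 1: the saved value did not change underneath us
```

Saving the value is just a dereference, which is the "way to get out of it" the talk contrasts with cloning a possibly mutable object.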
In Clojure we have something called core.async, which is a channel model. It's a little bit richer than queues, because with those you have to set up the threads and do all the 0:46:51
micro-level work, right? They have semantics that are based around something called communicating sequential processes. But the basic idea is that you're going to try to encourage, 0:47:00
especially when you're trying to convey values through a system: instead of saying, I'll put the acorn behind this tree and you come by later and find it 0:47:10
behind the tree, you say, I'm going to put the acorn on the conveyor belt and you can take it off the conveyor belt. And there's a big difference between those two things. 0:47:19
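A rough sketch of the acorn contrast. core.async is a separate library, so this stand-in uses a plain JDK LinkedBlockingQueue through interop to show the place-versus-flow difference; it is not core.async itself, and the names tree, belt and taken are made up for illustration.

```clojure
(import 'java.util.concurrent.LinkedBlockingQueue)

;; A place: the acorn behind the tree. Anyone can come back and look again.
(def tree (atom :acorn))
[@tree @tree]             ; the acorn is still there on the second look

;; A flow: the acorn on the conveyor belt. Taking it off consumes it.
(def belt (LinkedBlockingQueue.))
(.put belt :acorn)
(def taken (.take belt))  ; blocks until something arrives, then removes it
(.poll belt)              ; nil: nothing is left to go back and re-examine
```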
If you put something on a conveyor belt and then go back to it, what do you expect? Nothing. It's a conveyor belt, 0:47:28
it's moving. That's it, that flowed. So you can't write the kind of logic you can write with variables, going back and re-examining a place to update 0:47:37
it in place and, you know, try to read it again. It flows. And so flow is a much more robust way to build a system. Data flow is a much more robust way than variables. So we want to emphasize flow 0:47:48
over places. So program size matters, right? Smaller is better, right? This is one of the few areas 0:47:57
where we have, like, research, right? People have done research and said smaller programs have fewer bugs. It's just that simple. It doesn't matter what programming language they're in: smaller 0:48:06
programs have fewer bugs, right? Bigger programs have more bugs and longer time to market, they're harder to maintain, and they're more brittle, right? But what I 0:48:16
think is interesting is that there are two flavors of small. A lot of languages focus on concision, right, which is size in the small: like, how small is your if 0:48:26
statement, you know, how small is a function call, how tiny are your constructs, how much overhead, how much syntactic stuff is there. And there are a lot of languages that focus on that: Ruby 0:48:36
and Python, a lot of languages are actually very good at concision. But the bigger impact on a program overall, right, is 0:48:47
not moving from, you know, 42 characters to 20 characters. That only gets you 2x, right? The biggest thing is moving from more specificity, which bloats your 0:48:57
program, to more generality, which shrinks it. That's the big payoff. That's the area where you're going to get a payoff much higher than 2x. So one 0:49:08
of the other things I think we suffer from in object orientation is death by specificity, right? All the time: we have a new thing, we have a new idea, a new piece 0:49:17
of data, boom, we have a new class. Get this, get that, get whatever. I don't care if they're value types, whatever, right? It's just a glorified map, except you can't 0:49:26
even use it as a map in Java, right? There's no generic way to manipulate something that says get this, get that, get that. So new type, new language, it's got its own little vocabulary, right? So you're 0:49:38
going to have more code, you're going to have much less reuse, you're going to have more coupling, right? Because essentially what's happening is every object has its own little language: my class, my 0:49:47
interface, my own language. This is my biggest pet peeve. I want to get away from this. We saw it, right, get this, get that. It's like, there's 0:49:56
no purpose to this. This is just life-sucking. So let's look at life-sucking here. This is just a tiny part. I actually skipped some. This is just the ServletRequest, and I 0:50:07
have a little bit of HttpServletRequest, which really doubles the size of this thing. But my question to you is: how many maps do you see here, where you give a name 0:50:19
and you get a value? I got one... I actually can't do the auctioneer thing. How many have you got? Yeah, all right. So first of all, this 0:50:39
game is hard, right? I got three inside, and the overall 0:50:48
thing is a map too, so I got four, right, just by, you know, picking it apart. What's really interesting is, look at these map interfaces: they're all ad hoc. Guess what 0:50:58
else: they're all different. One has setting. One, you can actually get the map. One, you can get a list. Some you can get out with types. There are four different 0:51:07
maps in this one class. This is crazy, right? In Clojure we just use maps, right? This stuff came over the wire in HTTP as 0:51:18
text. How did we turn it into this? What happened? Why? You know, this is crazy. Now, I told you the 0:51:28
curly braces are maps. Who can see how many maps there are? You know, you see all the maps, because they're still maps, right? If you're going to write code that 0:51:37
manipulates that other stuff, every single line of code you write is going to be special. It has to, you know, use whatever javax.servlet blah, right? And 0:51:48
you're writing code explicitly to this thing. If there were another way to do HTTP (not that there is; servlets are probably the only way to do HTTP), but if there was another way to do HTTP, 0:51:58
would you be able to reuse that code? No. It's all hardwired to this person's or persons' idea of, like, what 0:52:08
an HTTP request is, right? So, OK, there is a tiny little benefit, right: dot works in your IDE. Whoo! Oh my goodness, because I could never remember that, so something better happen 0:52:20
when I press dot, because I'm doomed otherwise. Of course, I could look at the HTTP spec and, you know, like, we could agree on these names. And I just don't get it. 0:52:29
You know, you can tell a kid not to put a spoon in a blender and turn it on, and, like, they will remember that for their entire lives. They will never make that 0:52:38
mistake. But grown-up, adult programmers, like, we need protection, right, from this stuff. But the protection we 0:52:47
get is really minimal. What's the cost? It's huge, right? It's an inconsistent interface, it's incredibly idiosyncratic, and the interface is huge. So 0:52:57
if you wanted to, like, have a second implementation, you know, you've got work. There's a ton more code to consume it. You can't use any of the libraries you already have, right? With the Clojure 0:53:06
version, all the map code, like I said, those hundreds of functions, they work on this. You can create this with them, you can read this with them, you can merge two of these with them, and, like, you have 0:53:16
no new code to manipulate this. No new code. All the functions you already know manipulate HTTP requests as soon as you represent them as data, which they were, by the way, before we mapped them, right? The 0:53:29
testing, right? It's easier to test data. Can you make a program that makes this? Yeah. Can you make a program that checks one of these? Yeah. OK. And then, certainly, the 0:53:43
other problem is your typical Java program has two to three orders of magnitude more of that, more of this, 0:53:52
right? A hundred classes to a couple hundred. How many people have programs with more than a thousand classes? Yeah, that's a party. 0:54:02
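To make the request-as-data point concrete, here is a hedged sketch. The map below is a made-up HTTP request; its keys are illustrative only and are not the contract of any particular library or of the slide shown in the talk. Every function used is a general-purpose core Clojure map function.

```clojure
;; A hypothetical HTTP request as plain data: one map containing more maps.
(def request
  {:method  :get
   :uri     "/talks"
   :headers {"accept" "text/html", "host" "example.com"}
   :params  {"year" "2014"}})

;; Generic map functions read it, build new versions of it, and combine it.
(get-in request [:headers "host"])        ; "example.com"
(assoc-in request [:params "page"] "2")   ; new request with an extra param
(merge request {:scheme :https})          ; new request with a :scheme key
(select-keys request [:method :uri])      ; {:method :get, :uri "/talks"}

;; Building one for a test is just writing a literal; the original is untouched.
(= request (dissoc (merge request {:scheme :https}) :scheme))   ; true
```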
All right, so Clojure programs are smaller in both ways: they're more concise, and they support generic programming, because we just program with 0:54:11
these data abstractions. We represent information as plain data. So, I mean, this is always the biggest reservation: I love my types, I like my dot, I like my IDE, you 0:54:20
know, I can't deal without them. And it's true, right? If you have the types of Java (now, fancier types, 0:54:30
fancier type systems, can do more), but if you have the types of Java, you can catch, you know, typos and passing ints when you're supposed to pass strings. But it's really likely that your tests 0:54:40
or your REPL interaction are going to catch that stuff. That is not a quality metric. That is not sufficient for quality. It's part of quality, right, no typos, but it's not sufficient for 0:54:49
quality, right? The quality is in all this other stuff, where, as we said, we have no way to deal with state management, we're encouraged to write highly coupled programs, we're inflexible. 0:54:58
We're not meeting the customer's, the stakeholder's, quality metric at all. In fact, we're pointed against it. Time and time again, we're pointed at the wrong 0:55:08
thing. And because our code is so huge, we can't even really understand what it does anymore. So the biggest source of bugs in programs is misconceptions, right? Everybody gets it 0:55:17
wrong: I talked to the stakeholder, they told me this, I didn't think of one of the situations when I wrote the program. Those are the real bugs in programs. Everything else, they're 0:55:27
superficial bugs. Those are the real bugs; they're harder to see. So I think these default idioms are a big one. I won't spend much time on this, but it was interesting to me, because I always like 0:55:36
to look up words. What does "economic" mean? It actually means relating to household management, right? So the idea of home economics is kind of redundant; it's what 0:55:46
the word means. And I would say that, you know, sort of, our programming house is just, like, a hoarder's delight. There's too much stuff in it, 0:55:55
everything is too big, we need too many people to do basic things. There's a lot more to Clojure. This was not a tutorial on Clojure, but the important thing is that most of it is in libraries. Clojure 0:56:06
grows via libraries. The core is really pretty vigorously protected against growth. It's not like a, you know, an 0:56:15
experiment in language design. So the one other part of Clojure I'd like to talk about is polymorphism, just because it's another example of simple. I 0:56:25
haven't talked a lot about simple, but one of the cool things about Clojure is that polymorphism is independent. In other words, it doesn't require inheritance. Clojure has 0:56:35
something called protocols. They're a set of functions that work together. They're polymorphic on the first argument, so that's like the same kind of single dispatch you have in Java. 0:56:44
So you can imagine them as interfaces, but they don't require inheritance. And the beautiful thing about not requiring inheritance is you can have a protocol and you can extend it to something 0:56:53
that's finished, you know, something that Sun wrote a long time ago and is never going to change, and certainly is never going to implement your interface, right? You can also take stuff from two vendors. 0:57:03
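A minimal sketch of extending a protocol to finished classes. The protocol name Describable and its function describe are made up for illustration; defprotocol and extend-protocol are core Clojure, and String and java.util.Date stand in for "something that Sun wrote a long time ago".

```clojure
;; A protocol is a named set of functions, polymorphic on the first argument.
;; `Describable` and `describe` are hypothetical names for this example.
(defprotocol Describable
  (describe [x]))

;; Extend it to existing JDK classes without touching or subclassing them.
(extend-protocol Describable
  String
  (describe [s] (str "a string of " (count s) " characters"))
  java.util.Date
  (describe [d] (str "a date at " (.getTime d) " ms since the epoch"))
  nil
  (describe [_] "nothing"))

(describe "hello")                ; "a string of 5 characters"
(describe (java.util.Date. 0))    ; "a date at 0 ms since the epoch"
(describe nil)                    ; "nothing"
```

Neither String nor Date was modified or wrapped; the dispatch lives with the protocol, which is what lets code from two vendors be connected after the fact.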
Right, because what we usually have is the framework problem. There's a privileged framework in Java, the one that comes with it, and people implement those interfaces. But if you have a piece 0:57:13
of software from vendor A that has an interface, and stuff from vendor B, like objects from vendor B that you want to use, how do you get vendor B to implement vendor A's interfaces? It just 0:57:23
doesn't happen. This was even worse in the C++ world, but in Java we still have this problem: there's a privileged framework whose interfaces people implement, and otherwise interfaces are parochial or small. So we 0:57:36
have polymorphism in Clojure that is à la carte, and that reduces coupling, because you don't need to derive. All right, I know, maybe Clojure seems more 0:57:45
interesting now. So it's not just about technology, right? It's not even about these programming things. It's also about ecosystem things like that. And the first thing that's great about using Clojure 0:57:54
is that you get to keep a connection to the ecosystem you already know: not only the runtime and the deployment environment, but those libraries, right? The interop in both cases is 0:58:04
extremely good. All those Clojure data structures I showed you, they all implement the appropriate java.util.Map, whatever. They all implement all the Java 0:58:14
interfaces. You can just take one of those things you wrote in square brackets, that was pretty easy, and pass it to something that expects a java.util.List that implements RandomAccess, right? 0:58:23
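A quick sketch of that interop claim, using only JDK classes. The var name v is made up; the checks rely on Clojure vectors implementing java.util.List and java.util.RandomAccess and Clojure maps implementing java.util.Map.

```clojure
(def v [3 1 2])

;; Clojure collections implement the read side of the java.util interfaces.
(instance? java.util.List v)           ; true
(instance? java.util.RandomAccess v)   ; true
(instance? java.util.Map {:a 1})       ; true

;; So a vector can be handed directly to JDK code expecting a Collection or List.
(java.util.Collections/max v)                          ; 3
(.indexOf ^java.util.List v 1)                         ; 1
(vec (java.util.ArrayList. ^java.util.Collection v))   ; [3 1 2]
```

The collections are immutable, so JDK methods that try to modify them in place (Collections/sort, for example) are not part of what this sketch demonstrates.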
That works, ready to go. Clojure is also very stable. You know, I value that in Java. I think it's an important part of 0:58:32
why Java grew, and Clojure takes the same approach. It's not, like, kids on GitHub hacking away, adding every new idea, and it's not a think-tank experiment. 0:58:41
It's made for production use, and it's very stable. All the programs from a long time ago still run. There are books, if you want to get started. There are a lot 0:58:50
of books now for Clojure. You know, I spoke here last five years ago, and people had never heard of Clojure, and now there's plenty. There are tools. Oh look, there's an IDE that 0:59:01
looks like Eclipse, with code highlighting for Clojure and, like, structural nav. And oh, that's IntelliJ, same thing: breakpoints, look, and you type 0:59:12
and it starts popping stuff up. This is good, we're good. And you have a REPL down there, which is even better once you get used to that. There's a ton of tools 0:59:22
in various areas. There are lots of libraries: that says 12,000 repos on GitHub. There are lots of users. The 0:59:32
mailing list has almost 10,000 users on it, and they're all happy, nice people, I promise. No, they are. I think that matters. If you've ever seen the old Lisp 0:59:41
community, they weren't all happy, nice people, but Clojure users are happy and nice people. That's where Clojure's at in RedMonk's rankings: a language that's, like, way up 0:59:51
there, this functional Lisp. What is happening in the world? Look at it. There it is on the tech radar: adopt. It's actually gone off the tech radar, as in, 1:00:00
of course you should be using Clojure already. And, right, people are using Clojure already. A lot of people are already using Clojure. Banks use Clojure, plenty of startups use 1:00:10
Clojure, plenty of analytics houses use Clojure. So people are being successful with it. So I think that's the short 1:00:21
message for today. The idea behind Clojure is to get you to better and more flexible programs sooner, and the way it approaches that is by 1:00:31
being data-oriented and simple. And I really appreciate your time. Thanks.
0:00:00
Simple Made Easy 2012 - Rich Hickey
0:00:00
Thank you very much. Thanks for inviting me to give this talk. I know everybody's 0:00:09
thinking the same thing, which is, uh, what's with that hair? And I'm trying to get into the Foo Fighters, but so far 0:00:20
they haven't called me back. So we're going to talk about simplicity today, and I think it's super critical. In fact, it's 0:00:32
often said, everybody agrees, it's the most important thing, but it never gets its own track for some reason. So I'll 0:00:41
try to make up for that. And Dijkstra said it's a prerequisite for reliability. But if you replace that with easy, "ease is 0:00:52
a prerequisite for reliability", I don't think anyone would agree. So these words mean different things, and the point of 0:01:01
this talk is to point out our easiness culture and try to give people who are trying to pursue simplicity a vocabulary 0:01:11
for talking about it in their own organizations and pursuing it in their own software. So it's always fun to look at words, and if we look at simple, the 0:01:20
root of the word is simplex, which means one fold or braid or entwining. And of course it doesn't make much sense to 0:01:29
have only one braid, so something with only one braid is unencumbered with anything else. But complex means to twist 0:01:38
the braids together, or to braid together, and obviously that's the source of complexity. If we look at easy, the 0:01:47
root of this is a little bit vaguer. It comes from some French word, and then it's speculative whether or not that comes from this Latin word, but I like 0:01:56
this derivation, because I think it really points to something critical, right? In this derivation it says that easy means nearby or at hand, you 0:02:06
know, to lie near you. And I think that's very critical. We care so much about that now. It's like, what can I do in 10 seconds? And 0:02:15
it's hurting us. So simple means one fold. What does that mean? What does it mean to fold something in software? Well, you 0:02:25
might think about it as one role, one task that something has to do, one idea that something is about, one dimension on 0:02:35
which something focuses, like security would be an idea or a dimension. What's critical to understand about simplicity is that it's not about just 0:02:47
one instance, you know, counting something, or one operation, you know, having an interface with one method. It's about not 0:02:56
having the interleaving, OK? It's not about the cardinality. The other thing that's interesting about simplicity is that it's objective. People 0:03:06
are like, oh, that's easy, that's hard, and it gets wishy-washy, but simplicity is actually pretty straightforward: things are either twisted together or they aren't. So that's a great property. 0:03:18
If we look at easy again, we said it means to be near. And how do we translate that to software and development? Well, 0:03:27
one thing is, it's like, it's already installed. That's an idea of something being near: it's in our tool set, it's what we always use, it's our IDE, we can get it easily by using some 0:03:37
command-line tool. The second notion of something being nearby is that it's something that we already understand: it's familiar to us. 0:03:47
The third notion, which I think is also interesting but somewhat challenging to talk about, is to be near our 0:03:57
capabilities. This is a sensitive subject, right? Because we're in the problem-solving, brainy, smart-people business, and so to acknowledge that 0:04:07
something is not near our capabilities is an ego hit, and to say that of someone else is very inappropriate. But 0:04:18
the fact of it, as we'll see in this talk, is that, from a capability-versus-problem-size standpoint, we're all in the same boat. No one's 0:04:29
really significantly smarter than anyone else. But it's important to understand that we have limits to our capabilities. And finally, as I said, 0:04:41
simple is objective and easy is particularly not and it's not just subjective its relative once you talk about nearness right you know it's you0:04:52
know to say something is near implies near to what so there's always this relativity to easiness you're somewhere the other thing is over somewhere else0:05:02
how far apart are you is there is a measure of how easy it is or easy it will be for you and and I think that we0:05:11
we have real challenges here we repeatedly choose the familiar and to the extent we do that we're never going to learn anything new and we're really never going to have significantly0:05:20
different results. So how do we pull this stuff apart? Well, I think one way to look at it, and the most important one, you know, we often talk about tool0:05:30
fixation or methodology fixation, is to distinguish the construct that we're using in our program, or the tool or the0:05:39
method or the language, from the artifact. We're all fixated on the construct: what is it like, how many characters do I have to type, can my IDE automatically redo it,0:05:49
the convenience aspects. From a business standpoint, actually, the ability to replace programmers is a driving force for repeating the same old stuff all the0:06:00
time; it actually shouldn't be a programmer objective at all. Versus the artifact, and by artifact I mean the0:06:09
program, the thing we're actually writing, the thing that has to run; that has characteristics too in terms of complexity. And as Dijkstra was0:06:19
trying to point out, if we want to make software that's reliable, that works well, it needs to be simple. And furthermore, if we want to be able to change our0:06:29
software or maintain it, either to correct problems or address new requirements, it's got to be simple. So we have to stop assessing the0:06:39
tools we use and the constructs we use and the languages we use from our own standpoint of, ooh, this is fun for me, oh, I can do this with two fingers. It's0:06:49
not about that. It's about what we deliver and how that is in practice over its lifetime. So I talked a little bit0:06:58
about cognitive limits, and we have to recognize that they exist. We can only make things reliable that we can0:07:07
understand, and we can only juggle so many balls. And somebody got the0:07:16
misapprehension that I was implying that software is like juggling. Software's not like juggling; it has the same sort of cardinality to the limit.0:07:26
When you're trying to consider a problem or a bug that involves X, if X is complex, that means it's somehow intertwined with Y. It means that when you're trying to0:07:36
understand what's going on over here, you've got to load into your head what's going on over there, because they're both interacting with each other. So intertwined things have to be considered0:07:45
at the same time. That means we have a big problem with complexity: every time you combine two things, you twist two more things. You might say, oh, I need to think0:07:55
about this because we have a problem here, and pull it up and get this rat's nest of stuff that's been attached to it, and everybody's been there. The other0:08:05
critical thing about things being simple is how easy they are to change. We need to solve problems with them, so we need to be able to think about them to do that. The other thing is we need to0:08:14
be able to enhance them, and again understanding is critical here. If we can't understand our software, we can't move it forward. We can't say, okay, it0:08:23
does X and Y; in order to make it do Z, the implications are gonna be this or that. How do we make these decisions if it's too hard for us to understand? I think that's all very0:08:34
difficult. I don't think having a great test suite is the way to be able to change your software without fear. I don't think that at all. It's your0:08:44
ability to understand your software that allows you to change it without fear, because if you didn't think about it, your tests aren't thinking about it. And I'm not talking0:08:56
about reasoning in sort of the proof sense; I'm just talking about casual reasoning, to be able to sit down in a room with your teammates and talk about why something is going to work or not. So0:09:09
in addition to our typing-fixated culture and IDE-fixated culture, I also think we have sort of a test and safety0:09:18
oriented culture where, and there's nothing wrong with testing and safety, people are believing that it0:09:27
does more than it does. So what we like to ask is, what's true of every bug that you found in the field? It passed the type checker, it's0:09:36
type safe, to the extent you're using a language that does that checking, and it passed all the tests. So when you pull it back from the field, you're not0:09:45
gonna be like, well, let's run those tests again, because maybe they didn't work the first time and we can find our problem. They're not going to ever say anything that they didn't say0:09:55
before you shipped. So you need to be able to think about your program. You're done, it comes back to you, it isn't working; the stuff you used to make it, you already0:10:04
used it. You've got nothing else but your brains and your program to try to figure out your problem. So again, our ability to reason about things kicks in.0:10:15
Let's talk a little bit about development speed, because here too I think we're making a short-sighted decision. We0:10:24
definitely are emphasizing early speed, and there are certainly projects for which you don't care: you have to get done in two weeks, no one's going to use it in three weeks, it does not matter. But ignoring0:10:36
complexity is definitely going to slow you down. What happens is, you start off, you have your first meeting, there's no software, you've got your0:10:46
methodology, you've got your plan, you're using some techniques, you love your tool set, but it's just dreamland0:10:57
there; nothing's happening. And you feel really agile, because the first week obviously you're going to make something, and the second week it's going to be more and better than what you had. But as0:11:06
time passes, what's going to happen is your program is gonna be, you know, a new team member, and it's going to grow and grow and grow. It's going to turn into an elephant, and if it's not simple0:11:17
it's gonna come into your stand-ups and just trample everyone, because your artifact actually dominates what you can do. Agility is not about0:11:27
process; it's about doing right, not doing over. And at a certain point in time your program, your artifact, is going0:11:37
to dominate what you can do, so you want to keep it simple. One of the problems we have, I mean there's nothing0:11:46
wrong with things being easy; in fact, the point of this talk is to take simple things and make them at hand so you can use them. The problem is a lot0:11:55
of things that are easy are complex or complicating. They're easy, they're easily described. For instance, assignment would0:12:06
be an example of this. Everybody understands assignment: oh look, it's a new x equals five, we're done, what could be easier? Everybody knows that. But the implications of x equals five in the0:12:16
middle of a block of code are profound and really difficult to manage. So you have to sort these two things out. They're readily available, these0:12:26
kinds of tools, and they're easy to use. So I think we have to revisit our tool set to assess each construct that we use0:12:38
in terms of what it produces and how complex that result is, because every bit of complexity that's introduced by a programming construct or0:12:51
an approach to things is incidental complexity. Incidental is a Latin word; it means that it's your fault. We don't have anyone0:13:04
else to blame. So this guy is using a loom; he's only going to get one kind of result out of using a loom. If you're using a loom, I don't expect to get0:13:13
something other than a knitted thing out of it. So now we have the two kinds of things you could make: you have Lego castles and you0:13:22
have a knitted castle. The castle on the top there is actually knitted out of yarn. Which one's going to be0:13:31
easier to understand? Which one's going to be easier to change or debug? It's definitely the Lego one, right?0:13:40
Trying to change the knitted castle, you know, it's just really, really difficult. You get a lot more flexibility also. So there's this whole0:13:49
maintainability aspect, I think everybody understands that, but the other thing you get out of not having things all intertwined is flexibility as0:14:00
you want to move forward. You obviously have maintenance tasks, you know, go back and move the parapet from one thing to another, or add a moat, but you also have0:14:10
architectural flexibility that you lose when you've walked away from simplicity: things like being able to change how the policy of some part of0:14:19
your application works or, as we'll see later in this talk, whether or not you can move subsystems, which is a good example of the difference0:14:30
between those two things. And again I'll point out the fact that testing and type systems are completely orthogonal to this. You can0:14:41
test a knitted castle, but it's not going to make changing it any easier or go any faster. So how do0:14:52
we make things easy? Because we maybe want to choose some simpler things, but we want them to be as easy as the things we consider to be easy today. The first thing is just, you0:15:03
know, get it on your box: install it, download it, or get it approved. The second thing is to learn it: learn about0:15:12
it, take some courses, or go to a seminar, or read a book. These first two aspects of making something easy are up to you, and you can easily do0:15:23
this. You just choose to do it and you spend your time on it, and you've made something easier for yourself. Totally up to you. But there's that third part of easiness.0:15:32
We said at hand, familiar, then near our abilities. This third part I think is pretty tricky, because0:15:42
how quickly can you change your mental capability? Not very. And as I said before, it's not as if it's a race between smart0:15:54
people; we're all smart. The problem is that the problem is this big and our brains are that big, so you're not going to move0:16:03
them. So again, easiness we said was relative; it's the distance between your problem and yourself. So if0:16:13
you can't become significantly smarter, by orders of magnitude, in a short period of time, what are you going to do to change this distance? This can't move, so you0:16:22
have to move this over. So the thing that you have to do is simplify things in order to get them to become easier. You can learn about them,0:16:32
that's up to you, if you want to understand them better. You can't get smarter, so you've got to make them simpler and move them towards you. And you know,0:16:44
again, the juggling analogy I think is sound; in fact the numbers are pretty close. How many ideas can you simultaneously keep in your head? That's like seven plus or0:16:53
minus two. And the best jugglers in the world, I don't know what they can do, but it's probably not more than a dozen balls. So the cardinality of those two operations is0:17:03
similar. So let's look at a simple example; I have a couple of more involved ones later. It's0:17:12
a frequent complaint of people who are asked to learn or look at Lisp that parentheses are hard. And it's true, they0:17:23
are hard from the easy/hard perspective. They're not nearby if you haven't been using them; they look like they're far away. You don't have them0:17:32
installed, you're not familiar with them unless you took a course in college. So from the easiness, at-hand perspectives, they are far away.0:17:43
But then there's the last one: cognitively, are they simple or complex? And it ends up that they're not simple. And the reason why I use this0:17:52
example has nothing to do with Lisp; it's a great example of that cardinality aspect of simplicity, that simplicity is not about having just one0:18:01
thing. In Common Lisp and Scheme they did everything with lists and they did everything with parens, so they only had one thing. What could be simpler than having0:18:12
one thing? Well, it ends up that having one thing is not simple, because these languages that had only parentheses and only lists had to use them to represent everything, so0:18:23
they're very overloaded. They use them for calls, they use them for grouping, they use them as data structures. It ends up that overloading that one thing made it0:18:32
more complex, and the thing that's simpler than one thing is more than one thing. So for instance, in Clojure one of the things that we did was use vectors0:18:43
for grouping instead of parens, and now it's much clearer: parens almost always mean a call, and vectors mean grouping. So by having more than one0:18:53
thing it ends up being simpler. And this is a general principle that can apply to your own stuff: if you find yourself doing a lot of overloading,0:19:02
you're introducing complexity, because you're saying, oh look, there's only one thing. Yeah, there's one thing that somebody types in, and there are two things that somebody looking at what0:19:11
they typed in has to juggle: what is this, is it this case or that case? Every time they look at an overloaded thing they see two possibilities, or more, depending on how0:19:21
much overloading you're doing. So again, this is sort of a general principle: it's not about cardinality, it's about how much intertwining there0:19:30
is, and overloading is essentially intertwining. So let's talk about0:19:39
programmers. This is an old quote talking about Lisp programmers, who knew the value of all the constructs they were0:19:48
using but not the performance implications or memory costs of them. But I think we can move this phrase to the modern era and apply it to0:19:57
all programmers: programmers know the benefits of everything but the trade-offs of nothing. All I see is,0:20:06
oh look at this, it does this, oh look, it has that, oh look, it's this fast. Never do you see: but in order to be that fast it does this, or in order0:20:16
to be that simple it has this other thing. It's all benefits and no trade-offs, and that really is hurting us. So I'd like to go and look through some0:20:27
of the things that we use and see where the complexity is in these. We have state0:20:37
and we have objects, and we have an alternative, which is values. We have methods built into objects, and instead0:20:46
we could be using functions and namespaces. We use variables, but we could use constructs that have stronger state transition semantics. We0:20:58
use inheritance and switch statements and matching, but we do have choices in some languages to get polymorphism that doesn't pollute your0:21:11
types the way inheritance does, more à la carte. We have syntax often when we should have data. We have actors, and a simpler0:21:24
alternative would be queues. This one is interesting: we have0:21:34
conditionals, and we could have rules, and I think doing more rule-oriented programming is definitely in0:21:43
our futures if we want to make more reliable programs. And finally we have inconsistency, or eventual consistency, which is a fancy two0:21:53
words for inconsistency, versus consistency. Now I don't mean to say that everything in this simplicity column is inherently simple; some of it is just simpler than0:22:04
the thing in the other column. But you've got to look at your tool kits, because you have choices. So I want to0:22:17
bring back a dead word called complect. It means to intertwine or braid things, because, you know, we saw simplicity means0:22:28
one braid and complexity means to entwine things, but I don't think these are good words for software; you know, like, you braided that on me.0:22:38
So we're going to bring back this archaic word. It's a verb, complect, and when you unnecessarily combine two things together or intertwine them,0:22:48
you're going to say you complected that. And when you're sitting in a meeting and somebody ruined your software by doing that, you could say, whoa, you complected my thing with this other thing. You don't0:22:57
want to do this. This is where complexity comes from, from complecting, and we complect; we do it, we just choose to do it, and when we do0:23:06
it we have to be more cognizant of when we're doing it. So we have this pretty picture here where we see four strands0:23:17
draped straight down, and if I ask you where the third strand ends up, looking at the first picture you know that in a second, in a flash. If I say where does the third strand end0:23:27
up in the last picture, what do you have to do? Who's got it? What are you doing? You're, like, following, going behind,0:23:38
looking through the twists. This is what we do. Everybody should recognize this; this should be touching the same parts of your brain that you use when you're programming. This is what I0:23:47
do all day long: I'm trying to figure out what's gonna happen, and it's more difficult the more complected it gets. And what's the other problem?0:23:58
If this has already happened, what do you have to do if you want to move forward? You have to fix it. This is one of the things we often have to fix.0:24:07
It's not like we made a mistake, oh, we said plus instead of minus; that's a mistake, we have to fix our mistakes. This was not a mistake. This was like, we0:24:18
went out and did this; we took out our knitting needles and went whoo. And then we need to fix our software, and the first thing we have to do is undo this, disentangle. The whole notion of a0:24:29
knot in software is, I think, well understood. So we know how to fix this. We have another great word,0:24:38
which is compose, which means to place together, and composing simple components is the way to build robust0:24:47
software. That's it. Yeah, this is the talk, right? This is like, all0:24:56
right, yeah, everybody knows that. Yes, everybody knows all of this, but we sit down to write software and we don't0:25:05
think about it anymore. So let's look more carefully at this component-based approach. We know these things: modularity is the key to making things simpler, all we0:25:16
need to do is segregate our modules and we'll end up with simpler software. But we know it's not that straightforward, because the blue piece could have0:25:26
some knowledge about how the yellow piece works, and the yellow piece could have some knowledge about how the blue piece works, and they're both sitting there in their own modules, and you're like, what? Look, this is separate. But they're not; unless you are making sure0:25:40
they're both only ever thinking about some abstraction that they share, you could end up with things in separate modules that are not simple;0:25:49
they're actually very complected. So physical partitioning or stratification does not imply simplicity. After0:26:00
you've made something simple, that's a realization of it, but this doesn't go the other way; the implication only moves in one direction. If it's simple, it'll be easy to make it modular;0:26:09
if it's modularized, it doesn't mean that it's inherently simple. So you have to be careful of your current organization. State, however, is unambiguously never0:26:19
simple. It essentially is complex; there's no way to uncomplect it,0:26:28
because it combines value and time together. Programming languages make this extremely easy to do, but the0:26:38
fact is that once you've used state, and particularly pervasive use of state, it's going to cause everything that touches it to become interleaved,0:26:48
everything that touches it, either directly or indirectly. If you have a stateful thing, it doesn't matter if the things that are interacting with it do so through an encapsulated interface0:26:58
or through modules or anything else like that. They talk to you at one point and they give you X and you say Y, and at some other point they give you X and you say Z.0:27:07
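As a rough illustration of "they give you X and you say Y, and at some other point they give you X and you say Z", here is a minimal Java sketch. The class names `MutableRates` and `StateDemo` and the price data are hypothetical, invented for this example; nothing here comes from the talk itself. The stateful object answers the same question differently over time, while the value-based version depends only on its arguments.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stateful object: its answer depends on when you ask.
final class MutableRates {
    private final Map<String, Integer> centsPerUnit = new HashMap<>();

    void set(String item, int cents) {
        centsPerUnit.put(item, cents);
    }

    // Same argument, potentially a different result on every call.
    int quote(String item) {
        return centsPerUnit.getOrDefault(item, 0);
    }
}

final class StateDemo {
    // Value-based alternative: the answer depends only on the arguments.
    static int quote(Map<String, Integer> rates, String item) {
        return rates.getOrDefault(item, 0);
    }

    public static void main(String[] args) {
        MutableRates rates = new MutableRates();
        rates.set("x", 5);
        int first = rates.quote("x");   // asks about "x"
        rates.set("x", 9);              // some other part of the program changes the state
        int second = rates.quote("x");  // same question, different answer

        Map<String, Integer> snapshot = Map.of("x", 5); // immutable value
        int third = quote(snapshot, "x");

        System.out.println(first + " " + second + " " + third); // prints "5 9 5"
    }
}
```

A caller holding `snapshot` can reason about it without knowing who else is running; a caller holding `rates` cannot, even in a single-threaded program.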
Wow. Now what's their decision-making process? They can't treat you as a very stable, constant thing; you're now the source of change.0:27:18
There will be sources of change; you can't build systems that don't have any change. But this is not something to slather all over your application. It's something to say, I am really scared of0:27:28
this, I'm going to make sure when I use it I'm going to be very careful and explicit and put warning signs around it and say, look, there's some state here,0:27:37
here are the rules about it, this is what we're gonna do. This problem with state has nothing to do with concurrency, nothing,0:27:46
zero, to do with concurrency. State makes your program more complicated even if you don't have any threads. So some new0:27:56
languages have constructs that make it more explicit, so you can say this is a val, or this is a var, or this is a ref, and it's important to note that0:28:08
none of these things actually make state any simpler; I said it's essentially complex. But they do help, you know, put0:28:18
signs on it: look, here's the state. And when they have a way to say this is never gonna change, like final, that's a great thing. You say final, and people don't0:28:27
have to worry about that anymore: ooh, that's final, that can't be my problem, that can't be changing. That's really great. A couple of languages have0:28:36
reference types that actually put more constraints over the assignment to variables, such that they can, with a0:28:45
combination of an API and the pattern of use, make it so that the only thing you ever put in a variable is a value that can't change, and the only thing you can ever get out of a variable is a value.0:28:55
As soon as you do that, it's much simpler to manipulate state, because there's the state, you get something out of it, and now you're out of0:29:04
the state game. You're not constantly referring to something that can change; you now have the value and you can use that. And the critical thing I think for a lot of programs is, if you're0:29:15
using more than one variable to represent a thing, you're doing it wrong; you have got a problem, because all of a sudden you have a big coordination0:29:25
problem. If you have a single variable that holds a composite immutable value, you can easily atomically transition from one value to the next. If you have four variables to0:29:35
make up a thing, how do you coherently transition from one state to another? Now you get into locking and all those other kinds of problems. So I would0:29:46
look for constructs that help you with that. Okay, so let's dig in a little bit more to these constructs.0:29:55
How are they complex? I just said they were, so how is that the case? We already said of state that it0:30:04
complects everything that touches it, because I'm counting on you and you say something different every time; then somebody's counting on me, and how could I not say something different every time, because I'm counting on you?0:30:14
Something's counting on them, and it just complects your whole system. Objects, especially in the very traditional sense0:30:23
as delivered by languages like Java, boy, it's just a pile; I mean, you can just go on and on and on. It's a class, it's got state, it's got an identity, it's got values, it's got operations; it's like a0:30:33
kitchen sink of stuff poured together, and every time you take one out to solve a problem you're starting with three or four strands already wrapped together.0:30:43
You've already got that much just when it's blank, you know, class X, open, close. Does it have a final0:30:53
constructor? How do you make it? How do you copy it? What are its concurrency semantics? What are the set of methods that it can do? What if it doesn't do what I0:31:02
wanted it to do, how can I make it do something else? Its derivation stuff is all baked in. I mean, it just goes on and on and on. These are terrible constructs from a complexity standpoint. Methods are just a0:31:12
subset of that, but it's the same kind of thing: to the extent you have a method on a stateful object, you've complected those two things. Syntax is an interesting one.0:31:23
We love our mini-languages and whatnot, but syntax is by definition a complecting of two things, because syntax means0:31:32
to derive meaning from the arrangement of something, so the meaning of something and the arrangement of something are combined, and we'll see in an example0:31:42
later how subtle that is, but it comes up all the time. Inheritance complects types: you're saying this type is complected with that0:31:51
type, and that's what you're saying when you say inherits or extends. Switching and matching complect a set of entry points; everybody says you0:32:01
should prefer polymorphism to that, because that is complected: it makes a point in your program that's baked in as a set of decisions, and the more you do that, the more places0:32:11
you have to change when you change that decision. Variables we talked about already, complecting value and time. Loops0:32:20
complect what you want to do with how you're doing it: do this, now do this, now do this, now do that. You're not walking away, you're not up at a higher level saying, I just really want to map0:32:30
this function across this collection. That's a way to say what you want to do without saying how you want to do it, but when you look at a loop those two things are together. Constantly in0:32:40
Java, for instance, we're writing loops, and every time somebody walks up to a loop they have to figure out: is this loop summarizing what it's going over, is it producing another collection0:32:50
that's a modification of what it started with, is it changing the thing? You have to reread it, because there's no way to say up at the top, without a comment, this is a mapping, this0:32:59
is a filtering, this is a reduction. You want to start using functions like map and filter and reduce so that you don't have to parse your loops, because they've0:33:08
complected these two things. Actors complect what has to be done with who's going to do it. Yeah, again.0:33:18
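The point above about loops versus map, filter, and reduce can be sketched in Java. This assumes Java 8 or later and its `java.util.stream` library; the class and method names (`LoopVersusMap`, `squaresOfEvens`, and so on) are made up for the illustration. Both versions compute the same result, but only the second names what it is doing (filter, then map) without the reader having to parse a loop body.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class LoopVersusMap {
    // Loop version: the reader has to scan the body to discover that this
    // is a filter followed by a map that builds a new collection.
    static List<Integer> squaresOfEvensLoop(List<Integer> xs) {
        List<Integer> out = new ArrayList<>();
        for (int x : xs) {
            if (x % 2 == 0) {
                out.add(x * x);
            }
        }
        return out;
    }

    // Stream version: the operations are named, so the "what" is stated
    // separately from the "how".
    static List<Integer> squaresOfEvens(List<Integer> xs) {
        return xs.stream()
                 .filter(x -> x % 2 == 0)
                 .map(x -> x * x)
                 .collect(Collectors.toList());
    }

    // A reduction: summarizes the collection into a single value.
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        System.out.println(squaresOfEvensLoop(xs)); // prints "[4, 16]"
        System.out.println(squaresOfEvens(xs));     // prints "[4, 16]"
        System.out.println(sum(xs));                // prints "10"
    }
}
```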
And conditionals are also interesting. So we say if this, blah blah blah, and we have these kinds of conditionals and we put a0:33:27
lot of important business logic into that, and one of the problems with conditionals is that they're situated. If I say, if this is a good customer, do whatever, that0:33:36
expression is in the middle of part of a method in the middle of my program, and everything about what makes that true has to do with0:33:46
where it's situated. It's great if we can somehow lift that stuff up, because what we've done here is we've complected why we're0:33:56
doing this or that with where it sits in our program. So what about some alternatives, and in what ways are they0:34:05
simpler? Values. I mean, if I could say anything, it's that you should be programming with values as much as you0:34:14
possibly can, because they're dead simple. They're essentially simple; like we said some things were essentially complex, they're essentially simple. So you0:34:23
can get this from using final pervasively or using persistent collections, which give you collections that are actually immutable. Functions0:34:34
are simple. When you try to test a method, what do you have to do? You have to set up the whole context, the state of the thing and everything in the whole world; the method is0:34:44
related to the whole world. When you test a function, and by function I mean, you know, a stateless method that has no side effects, it's the same every time; you can test it completely in isolation.0:34:54
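To make the testing contrast concrete, here is a small Java sketch. The `Cart` and `Pricing` classes, the prices, and the discount rule are all hypothetical, chosen only for illustration. Checking the method requires building up the object's state first; checking the function requires only its arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stateful class: total() depends on every call made before it.
final class Cart {
    private final List<Integer> pricesInCents = new ArrayList<>();
    private int discountPercent = 0;

    void add(int cents) {
        pricesInCents.add(cents);
    }

    void setDiscountPercent(int percent) {
        discountPercent = percent;
    }

    int total() {
        return Pricing.total(pricesInCents, discountPercent);
    }
}

final class Pricing {
    // Stateless, no side effects: the same arguments always give the same result.
    static int total(List<Integer> pricesInCents, int discountPercent) {
        int sum = pricesInCents.stream().mapToInt(Integer::intValue).sum();
        return sum * (100 - discountPercent) / 100;
    }

    public static void main(String[] args) {
        // Testing the method: first reconstruct the context it depends on.
        Cart cart = new Cart();
        cart.add(1000);
        cart.add(500);
        cart.setDiscountPercent(10);
        System.out.println(cart.total()); // prints "1350"

        // Testing the function: pass values in, check the value that comes out.
        System.out.println(total(List.of(1000, 500), 10)); // prints "1350"
    }
}
```

In this toy case the setup is only three calls, but in a real object graph the context a method depends on tends to grow with the program, while the function's inputs stay visible in its signature.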
it's a much simpler thing namespaces are simpler and if your length language has them that's where you get them another big big one and we'll see this later is0:35:04
data data is is essential we were supposed to be writing programs that manipulate it and the first thing we do is we ruin it instead of just using it0:35:14
directly so use your data structures in your languages PI morphism a-la-carte is great this is harder to get if you don't have these kinds of constructs like0:35:24
protocols or type classes things like that we talked about manage references you can get set functions right that kind of like map filled reduce that you can get libraries that0:35:33
let you say map instead of making you write a loop you should use those cues again you can get from libraries declarative data manipulation again is a0:35:43
little bit harder but certainly things like sequel and data log are great and if you have the ability to express some of your program using these things you0:35:53
should do it because they they separate what you want to accomplish from how it gets done and that's big rules again you0:36:02
can get from libraries and consistency you get from transactions so you have to be careful when you walk away from consistency because it again is going to0:36:12
affect the rest of your program now there are some kinds of complexity that we can't do anything about I would call environmental complexity a lot of these have to do with the fact that our0:36:21
programs are parts of our programs share stuff with either other programs or other parts of our programs right they're sharing resources memory CPU and0:36:32
they're all contended for and that yields all kinds of problems right you get garbage collection problems or memory out of memory problems the0:36:43
problem here is that we don't really have good answers for these right if you try to segment your shared stuff then you have waste right I said you get half I get half you know if you're not using0:36:52
your half we're wasting our computer but then so far most of the solutions to the environmental complexity problems don't0:37:02
compose well right so like I have a thread pool that's great ok this like 14 other thread pools in this application now so how's that gonna work right in0:37:12
other words individual good decisions don't combine to be a good set of decisions this is a tough area so I would say that this kind of complexity0:37:21
is inherent and you know that's Latin for that's not your fault you do need to think about these things but there's no getting around them this0:37:30
stuff you're going to encounter so we had abstraction bashed pretty well0:37:40
yesterday and I think that there are sort of dangers in over abstracting but I think abstracting for simplicity is never0:37:50
wrong right if your abstractions are driving you down right driving down the sets of things you have to deal with you're gonna be better off and0:38:01
maybe one of the simplest things is to say okay we know there's danger in over abstracting you get ten abstractions your team's gonna have ten abstractions0:38:11
only ten pick the ten that you want it's going to drive you to a lot of things I showed you on the previous slides right it's gonna drive you towards well I want abstractions that let me map and0:38:21
filter I want abstractions around your data if I'm only gonna have ten I can't have this elaborate I can't make IProcessFactoryCollectionOfResourceManager0:38:31
or whatever abstraction right because if that's one of my ten I'm doomed but having ten abstractions will make your life substantially simpler so I0:38:42
think one of the things about abstraction is to recognize the difference between abstracting in order to simplify and abstracting in order to hide right if abstracting is like you know putting0:38:52
paint on it or you know just like putting it in a box and closing it because it reeks too bad that's not good that's not actually you're not accomplishing anything there it's not about hiding things I used to0:39:06
have a t-shirt a student made for me that said I don't know I don't want to know because I said it so often in my C++ class this is the approach you want to have this is really what you want to0:39:16
be doing you want to say so somebody says oh I have this thing you just need to blah blah blah you're like I do not want to know that you have to make it possible for me to use your thing and not know that and0:39:26
when you do that you'll be forced to decent abstractions as opposed to sort of fabricated abstractions solving problems that nobody has all right so0:39:35
I'd like to do a couple of examples to make this more concrete and I love this quote I found it as I find everything when I'm doing talks by you know like0:39:44
just I'm doing a talk about X and I'm just looking stuff up on the internet so it's not like I read Brancusi but it's a great thing because it says that there0:39:54
are essentials to things and when you find them and learn to recognize them things will become simpler so I want to talk about two0:40:04
things in the last two examples here the first is order and the second is information so we're gonna talk about0:40:13
lists and order a list we all know what that is it's a sequence of things right one thing comes before the next thing but any time you encounter a list you0:40:22
have this question does the order of things in this list matter right because you can have different kinds of lists like you might have a list like this it's all the same stuff there's one0:40:32
thing and two things and there might be three things or five things or seven things you get a sense of this as being sort of a uniform collection of stuff and order doesn't matter right it's just0:40:41
this is the stuff I put it in a list just because you might want to access it that way versus a more tuple like use of a0:40:51
list that says well the first thing's going to be the depth and the second thing is going to be the width and the third thing is going to be the height and immediately you're starting to say wow I'm not sure I would hope you start0:41:02
to say I'm not sure I love that way of talking about this because like I'm gonna forget the order of these things right and it's easy when you0:41:11
have undifferentiated sets of things to just use sets instead of lists right if you use a set it advertises to people the order doesn't matter this is a collection of things and I'm telling you0:41:21
the order doesn't matter and that they're unique so that's a nice thing but we're still faced with sometimes we don't need sets we0:41:30
need lists and sometimes we have named parts what should we do well why do we even care about this why make a point out of it right it is complected right0:41:40
it complects one thing with the next what's the next thing you're going to encounter and if you moved one or inserted something what would happen it also infects every use right if I'm using0:41:50
your width depth height list I already made a mistake because I think it was depth width height or something I forgot already so the usage points get infected0:42:00
it inhibits change right if you have name and email and you say well I'd like to make phone the second thing what's gonna happen to everybody0:42:10
who was using your stuff they're gonna break now at this point everybody has IDEs now so like my IDE will fix it oh man it may0:42:20
fix it for you but if this is your contract over a service interface it's not gonna fix it for them so everything all right but this is obviously dumb right0:42:31
we're not going to do this we're not gonna put you know particular things in particular places in the order of the list right we never do that we would use a map or hash or associative thing right0:42:42
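The ordering problem in data can be sketched in a few lines of Python. This is only an illustration under assumed names (`box_as_list`, `box_as_map`, `volume_from_list`, `volume_from_map` are hypothetical, not from the talk): the same box is stored once by position and once by name, and then a new field is added.

```python
# Hypothetical box record stored two ways.
# In the list, meaning depends on position: depth, width, height.
box_as_list = [2.0, 3.0, 4.0]
# In the map, each value carries its own name.
box_as_map = {"depth": 2.0, "width": 3.0, "height": 4.0}


def volume_from_list(box):
    # Every consumer has to remember the order of the parts.
    depth, width, height = box
    return depth * width * height


def volume_from_map(box):
    # Consumers ask for parts by name, so position is irrelevant.
    return box["depth"] * box["width"] * box["height"]


# Adding a new field at the front of the map does not disturb map consumers.
box_map_with_label = {"label": "crate", **box_as_map}
map_result = volume_from_map(box_map_with_label)

# Inserting the same field at the front of the list breaks list consumers.
box_list_with_label = ["crate", *box_as_list]
try:
    volume_from_list(box_list_with_label)
    list_consumer_broke = False
except ValueError:
    list_consumer_broke = True
```

The list version fails here only because the unpacking count changed. If two numeric fields had simply been swapped, it would have returned a wrong answer with no error at all, which is the more worrying case.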
so we never do this right we do this all the time it's baked into a ton of things that we do right we do0:42:52
this every time we use positional arguments right I can have a function that took width depth height right everyone is familiar with that you might never use it as an interface to a service but you do it every0:43:02
day when you write functions right now I'm not saying of all these things that I'm criticizing as being complex that they're bad and you should never use them right what I'm saying is0:43:12
they have these complexities and you should understand them before you apply them and you should know I am making a trade-off right I may make this trade-off for instance Clojure made0:43:21
this trade-off it has positional arguments a lot of languages have positional arguments and few have named arguments but you know you have to do that with eyes wide open you're making a0:43:32
choice that's going to introduce some complexity syntax the same thing we said order matters it's an example of this sort of fundamental nature right the0:43:41
ordering problem we could call it product types if you use languages that have them are exactly like that width depth height you know the type of that thing could be float float float you're0:43:51
lost you got nothing right you could rearrange those things or get them wrong they're telling you nothing about the semantics of what you're doing imperative programs are all about the0:44:00
order problem right can I take X equals 5 and move it up three lines or down three lines no it means something different when I move it0:44:09
around Prolog suffers from this problem and it's not as if I feel like you know people are using Prolog but you know if you're using some declarative language0:44:18
or something somebody's given you a DSL or something this is a question you should ask about it does the order in which I say things matter because if0:44:27
it does that's a negative call chains are an example of this right so object X calls object Y calls object Z that's an example of width0:44:38
depth height you've baked into your program this chain of things right and if you want to say oh I want to add a step you can't just add it right you0:44:47
have to change the guy before and potentially the guy after but at least the guy before he needs to know X shouldn't be calling Y it should actually be calling W and W0:44:56
needs to know to call Y and then Y calls Z right XML is another great example right what does the M stand for in XML0:45:06
markup what are they marking up documents what do documents have they have order order matters tremendously to text0:45:15
right I can't just take the words in a sentence or sentences in a paragraph or paragraphs in a document and switch the order and end up with the same meaning documents really have this thing but0:45:25
we're using this for non documents all the time how many people are actually using XML for text as markup anymore nobody but tons of people are using it0:45:34
for data where it stinks right because even if by convention you've agreed that you're going to use XML in a data like way maybe even a0:45:44
very rigid way you're gonna say we're gonna make the elements in each guy as if they were entries in a map now you can agree you can by convention0:45:53
say we're going to take a simpler use of XML but your XML tool you're gonna pull out does it know that you're doing that no it doesn't know that you're doing0:46:03
that because it has to support that order problem it's got this nasty stupid DOM or sequential event triggered interface which you now have to use on0:46:13
every piece of data you choose to represent in XML it's an example of the ordering problem so we have choices0:46:23
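One of those choices, named arguments or a map of arguments in place of positional ones, can be sketched in Python. This is a minimal sketch and `make_box` is a hypothetical function used only for illustration.

```python
def make_box(*, width, depth, height):
    # The bare * makes every parameter keyword-only, so callers must name
    # each value and cannot rely on argument position.
    return {"width": width, "depth": depth, "height": height}


# The same box, with the arguments given in two different orders.
first = make_box(width=3, depth=2, height=4)
second = make_box(height=4, width=3, depth=2)

# Arguments can also travel as a map and be unpacked at the call site.
spec = {"depth": 2, "height": 4, "width": 3}
third = make_box(**spec)
```

All three calls produce equal results because nothing depends on where a value appears, only on what it is called. The trade-off is more typing at each call site.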
right instead of using positional arguments we could use named arguments if our programming language supports it or we could pass arguments in a map which gives us again that nice named0:46:33
labeling and independence over where things appear instead of using syntax for things we can use data right we don't tend to use syntax right when0:46:42
you think about going out of the box right going out of the box and making certain things service-oriented that's really helped us because we've seen all the goofiness we do inside our programs0:46:51
that's no longer tolerable when we want to talk service wise right I can't say you can talk to my service here's my programming language for talking to me here's my language who does that who says0:47:02
I'm gonna make a service I'm gonna give consumers of the service a language they can talk to me in no one but inside our programs we're happy to do you know all kinds of syntax stuff product types you0:47:13
can replace with records imperative programs you can replace with declarative programs in the keynote yesterday morning it was said you know it can be difficult for people to know how to map their0:47:22
problem to MapReduce right why is that because right now if you use like the Java interface to MapReduce you know how things are done is like in your face0:47:31
you want to be talking about what you want to do there are other interfaces to MapReduce that use Datalog where you have no idea how MapReduce works you0:47:40
say I want to accomplish this go and it figures out how to do MapReduce moving to declarative programming will really improve your programs and take away these problems because you shouldn't0:47:49
need to care about how MapReduce works you can replace call chains with queues right X was calling Y Y was calling Z if X were putting those results on a0:47:59
queue and Y was wired up to consume from that queue then what would the person who wants to have X send stuff to W need to0:48:08
do just rearrange the queues not the callers right rearrange the consumption of the queues you've now gotten this policy0:48:17
independence right because you have less complected you have that degree of independence I talked about before architecturally things that are simpler can be rearranged in a more0:48:27
straightforward way and XML we can replace and sort of have already with a more explicit representation I mean JSON is not the best thing in the world but it's got at least you know0:48:38
this list type is a list and this map type is a map it's clear that it's a map all the parsers can give you back a map they don't have to say well you might0:48:47
have interleaved some text in there so I can give you a DOM instead of a map and that's a beautiful thing and it's sort of winning I think not because people are being explicit about that but because that value0:48:56
proposition is there and they're saying they're realizing maybe just incidentally that's making their lives simpler so I can't say enough about0:49:06
using maps you want maps if your programming language doesn't have first-class support for maps I mean idiomatic support you can write map0:49:15
literals you have syntax support for accessors you have the ability to do symbolic keys and a library for doing generic manipulation you either0:49:26
should push for that to be added or get out of there you're wasting huge amounts of time inordinate amounts of time taking stuff in and out of objects that0:49:36
are adding nothing so that's the order problem and generally it's solved by using associative like structures we can0:49:46
talk now about the information problem this is actually not a problem we don't have a problem with information it's simple it0:49:55
starts off simple right what is our problem we're not satisfied with that or our programming language can't handle it it's just you know it's amazing how bad0:50:05
some of our languages are at manipulating data right what do they force us to do oh I can't actually have a map with the name and0:50:14
address or width depth and height I can't do that it's cumbersome for me to do I have no syntax for interacting with it I've got to make a class as soon as I made a class I say class box or whatever0:50:25
that was now what have I got nothing it's blank I have to define a whole language for interacting with boxes now if I look0:50:35
at box and I look at like person record I don't see two different things I see two instances of the same thing these are two associative types that have different fields in them but no we add0:50:45
these classes right and there's big cost to this you can't write things that generically manipulate classes right and although you know reflection exists for0:50:54
a reason it's extremely cumbersome right it ties the logic to the representation you're using right now right that class is a very specific0:51:04
thing it's tied to your programming language for instance I think we should start representing data as data in our programs so let's look a little bit0:51:14
about what we think we're doing when we're wrapping information we think we're doing encapsulation this is not what encapsulation is encapsulation means hiding0:51:23
implementation details right this is how I work and I don't want you to know how I work right information doesn't work information isn't something that has innards0:51:32
there's no innards right there's no implementation of information you added it why did you do that you should really0:51:44
have an answer to that question why did you do that the answer might be because I'm using Java and I had no choice that's not a great answer but it0:51:56
might be the truth okay information will have representation there's no getting around this no amount of classes you add is gonna change the fact that eventually there's0:52:05
going to be a face to that and that face is going to be what somebody else consumes right so you know you don't solve anything by wrapping information up in stuff you're still gonna have I'm0:52:16
going to see information I'm going to know about XY and Z but you may add a lot of other stuff so what else do we bring in when we take this approach we're going to make classes for our0:52:25
information all right so what does this look like it looks like this and we're being good so we're using interfaces so you're not going to marry my concrete person class I have an IPersonInfo0:52:36
interface I have get name and other hideousness right then there's going to be a consumer right somebody that knows0:52:46
how to do something we're going to pass these things around so 9 implementation detail I'm using it and so the consumer has you know some some activity it does on your behalf and it takes this person0:52:56
info I was at a talk where someone was advocating it was a good talk building systems out of systems you know building0:53:05
systems out of sets of systems out of services and taking that architectural approach from the get-go like when you design a new system0:53:15
you should compose it out of services and really architect that from the from the beginning and I had a question about that which was you know if if we had0:53:24
designed the first you know it was in contrast to a monolithic system that eventually you grow and say well now we got to break this apart and that's really hard would there be a way to0:53:34
start with a single process and using the right architectural principles end up with something that when we decided to service-ize parts of it that0:53:43
wouldn't be a big deal and and I got back a great answer which was that's a great idea but the programming languages0:53:52
don't let you do that and I realized that was the perfect example of this problem of wrapping information because0:54:01
because he was right a sort of litmus test for whether or not you've got things simple is can you move it right can you move0:54:14
your subsystem I don't care like how you move it right can you move it out of process can you change it to a different language can you move it into a different thread the fact that you know you might have to move it is a great way0:54:24
to think about things like what if this wasn't exactly what I'm thinking about now but somebody had to move it over there I'm not talking about inventing abstractions to do this or predicting0:54:33
the future right but your ability to move something really indicates whether or not what you built0:54:42
was simple not being able to move it is the knitted castle effect right if I say well now I'm going to take this thing and it was in process and it was just0:54:51
your you know subsystem and and I want to make it a service that we call if that's a big job you didn't have it simple in the first place so what would0:55:00
a subsystem have to have in order to be easily moved it should have well-defined boundaries right it should have an abstracted operational interface right0:55:09
there should be an interface that encapsulates the purpose and how they work that way we can move it somewhere else they could work differently we can move it to another language it can work differently move it to another thread and0:55:18
have different threading policy and there should be some general way of doing error handling and without getting into that that's a deep problem but certainly0:55:28
there are simple ways to get this wrong checked exceptions would be an example of that they're like a poster child of complected right because what's going to happen if you move this out of proc your0:55:39
implementation is now going to have a different kind of exception right now you could have i/o exceptions that you didn't have before and that's going to poison the whole call chain the fact that that happens due to checked0:55:48
exceptions indicates checked exceptions are a disaster they're inherently complected it's a terrible terrible idea so this is the critical0:56:00
thing though where did we actually go wrong before it's that subsystems should have data as the interface for what they take and return right we didn't do this right we0:56:12
passed IPersonInfos they are not data and we're not going to solve this by just putting in some serialization why0:56:22
not because the consumer of this thing didn't just get data here they got code who knows what else was on0:56:31
IPersonInfo stuff they could call if I want to now put this over in a service am I gonna hand you something you can call are we gonna start making like circular HTTP calls to get stuff done0:56:42
no in fact you're saying wow that's stupid right it's stupid with two processes but it's okay in one it's not you should be taking and returning data0:56:52
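The point about taking and returning data can be sketched in Python. This is a minimal sketch under assumptions: `greeting` is a hypothetical consumer, the person fields are made up, and a JSON round trip stands in for moving the consumer out of process.

```python
import json


def greeting(person):
    # Hypothetical consumer: takes plain data in and returns plain data out.
    # It never calls back into the object that produced the person.
    return {"to": person["email"], "body": f"Hello {person['name']}"}


person = {"name": "Ada", "email": "ada@example.com"}

# In-process use: just call the function.
in_process = greeting(person)

# Simulated out-of-process use: the same data crosses a text boundary.
wire_request = json.dumps(person)
wire_response = json.dumps(greeting(json.loads(wire_request)))
out_of_process = json.loads(wire_response)
```

Because the consumer only ever sees data, the in-process and simulated out-of-process results are the same. An object with methods handed across the same boundary would need callbacks to the caller, which is the circular-call situation described above.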
you wouldn't have this problem even inside so in the end I think we have a0:57:01
lot of tools some of the tools inherently generate complexity some are inherently simpler but nothing manufactures0:57:13
simplicity right there's nothing you can do there's no technique there's no methodology you can adopt or cult that you can join that will force0:57:22
your things to be simple this is a decision-making process you have to make over and over and over again you have to become vigilant about it and it's often a sensibility thing like I talked0:57:31
about this order problem that's the kind of thing you want to develop sensibilities around right you need to be able to smell that coming in saying whoo I see the order0:57:41
problem here maybe I should do something about that you want your sensibilities to be based around entanglement not ease-of-use you0:57:50
can make these things easy to use in fact the vast majority of those simpler things are as easy or easier to use than the complicated things they may just be less familiar the other thing is to make0:58:02
sure you don't overestimate your reliability tools right they are not simplicity tools you can test you can0:58:12
type check complex things the same way you can test and type check simple things it's orthogonal it doesn't mean those things are bad but they're not0:58:22
helping you in this area you need to do this by choice so to make simplicity easy right to bring it nearer yourself0:58:32
you need to choose simpler constructs you need to especially avoid complexity generating constructs ones that just start you off behind the0:58:42
8-ball you need to focus on the fact that it's what you're making not how much of a party you're having making it you want0:58:52
to create abstractions that are focused on simplicity right abstracting is not about fabricating universes it's about boiling things down you often want to0:59:02
try to simplify the problem space before you start that would be a whole other talk about analysis but simplifying during analysis is critical here again if you start with something that's all0:59:12
tied together and say you implement this knot if somebody asks you to implement a knot you shouldn't say okay you should say that's a knot we're gonna untie it before0:59:21
we start and also to remember that simplicity often means making more things not fewer it's not about counting necessarily although generally simpler0:59:31
things are smaller and have smaller interfaces and stuff like that so I'll leave you with this quote and you should0:59:40
use this whenever somebody tries to sell you something sophisticated and that's it you [Applause]0:00:00
Simple Made Easy 2011 - Rich Hickey
0:00:00
who's ready for some more category theory you're all in the wrong0:00:10
room so this talk I hope seems deceptively0:00:21
obvious one of the things that's great about this conference is this is a this is a pretty cutting edge crowd a lot of you are adopting new technologies a lot0:00:30
of you are doing functional programming and you may be nodding saying yeah yeah yeah through parts of this and if some of it's familiar that's0:00:39
great on the other hand I think that I would hope that you would come away from this talk with some tools you could use to help conduct a similar kind of0:00:49
discussion to this talk with other people that you're trying to convince to do the right thing so I'll start with an0:01:00
appeal to authority simplicity is a prerequisite for reliability I certainly agree with this I don't agree with everything Dijkstra said and I0:01:10
think he might have been very wrong about proof in particular but I think he's right about this we need to build simple systems if we want to build good0:01:19
systems and I don't think we focus enough on that I love word origins they're tremendous fun one of the reasons0:01:29
why they're fun is because words eventually come to mean whatever we all accept them to mean you know whatever is commonly understood to be the meaning is what it means and it's0:01:39
often interesting to say well I wish I could I wish we'd go back to what it really means and use that and I think there's a couple of words that I'm gonna use in this talk that I would love for0:01:48
you to come away knowing the origins of and try to use more precisely especially when talking about software so the first0:01:57
word is simple and the roots of this word are sim and plex and that means one fold or one braid or twist and that0:02:10
characteristic about being about one literally fold or twist of course one twist what's one twist look like no twists0:02:19
right and the opposite of this word is complex which means braided together or folded together0:02:30
being able to think about our software in terms of whether or not it's folded together is sort of the central point of this talk the other word we frequently0:02:39
use interchangeably with simple is the word easy and the derivation there is to a French word and the last step of this derivation is actually speculative but I0:02:49
bought it because it serves this talk really well and that is from the Latin word that is the root of adjacent and0:02:58
which means to lie near and to be nearby and the opposite is hard of course the root of hard has nothing to do with lying near it doesn't mean lie far away0:03:08
it actually means like strong or tortuously so so if we want to try to0:03:17
apply simple to the kinds of work that we do we're going to start with this concept of having one braid and look at0:03:27
it in a few different dimensions I thought it was interesting in Eric's talk to talk about dimensions because that's definitely a big part of doing design work and so if we want to look for simple things we want to look for things0:03:36
that have sort of one of something they do they have one role they fulfill one task or a job they're about accomplishing sort of one0:03:45
objective they might be about one concept like security and and sort of0:03:56
overlapping with that is they may they may be about a particular dimension of the problem that you're trying to solve the critical thing there though is that when you're looking for something that's simple you want to see it have focus in0:04:06
these areas you don't want to see it combining things on the other hand we can't get too fixated about one in0:04:16
particular simple doesn't mean that there's only one of them right it also doesn't mean an interface that only has one operation so it's important to0:04:27
distinguish cardinality right counting things from actual interleaving what matters for0:04:37
simplicity is that there's there's no interleaving not that there's only one thing and that's very important okay0:04:48
the other critical thing about simple as we've just described it right is whether something is interleaved or not that's0:04:57
sort of an objective thing you can probably go and look and see I don't see any connections I don't see anywhere where this twists with something else so0:05:06
simple is actually an objective notion that's also very important in deciding the difference between simple and easy0:05:16
so let's look at easy I think this notion of nearness is really really cool in particular obviously there's many ways in which something can be near0:05:26
right there's sort of the physical notion of being near right is something you know like right there and I think that's where the root of the word came0:05:35
from you know this is easy to obtain because it's nearby it's not in the next town I don't have to take a horse or whatever to go get to it we don't have the same notion of physicality0:05:44
necessarily in our software but we do sort of have you know our own hard drive or our own tool set or it's sort of the ability to make things physically nearby0:05:54
getting them through things like installers and stuff like that the second notion of nearness is something being near to our understanding right or in our current0:06:05
skill set and I don't mean in this case near to our understanding meaning a capability I mean literally near something that we already know so the0:06:16
word the word in this case is is about being familiar I think that collectively we are infatuated with these two notions0:06:26
of easy we are just so self involved in these two aspects it's hurting us tremendously right all we care about is0:06:35
you know can I get this instantly and start running it in five seconds it could be this giant hairball that you got but all you care is can you get it0:06:44
in addition we're fixated on oh I can't read that I can't read German does that mean German is unreadable no I0:06:55
don't know German so you know this this sort of approach is is definitely not helpful in particular if you want0:07:04
everything to be familiar you will never learn anything new because it can't be significantly different from what you already know and not drift away from the familiarity there's a third aspect of0:07:15
being easy that I don't think we think enough about that's going to become critical to this discussion which is being near to our0:07:24
capabilities and we don't like to talk about this because it makes us uncomfortable because what kind of capabilities are we talking about if we're talking about easy in the case of0:07:33
violin playing or piano playing or mountain climbing or something like that well you know I don't personally feel bad if I don't play the violin well0:07:43
because I don't play the violin at all but the work that we're in is conceptual work so when we start talking about something being outside of our0:07:52
capability you know it really starts trampling on our egos in a big way and so you know due to a combination0:08:01
of hubris and insecurity we never really talk about whether or not something is outside of our capabilities it ends up that it's not so embarrassing after all0:08:12
because we don't have tremendously divergent abilities in that area the last thing I want to say about easy and the critical thing to distinguish it0:08:22
from simple is that easy is relative right playing the violin and reading German are really hard for me they're easy for other people certain other0:08:32
people so unlike simple where we can go and look for interleavings look for braiding easy is always going to be you know easy for whom or hard for whom it's a0:08:43
relative term the fact that we throw these things around sort of casually saying oh I'd like to use that technology because it's simple and when I'm saying simple I mean easy and when I am saying easy I mean because I already0:08:53
know something that looks very much like that is how this whole thing degrades and we can never have an objective discussion about the qualities that matter to us in0:09:04
our software so what's one critical area where we have to distinguish these two things and0:09:14
look at them from a perspective of them being easy and being simple it has to do with constructs and artifacts right we program with constructs we have0:09:24
programming languages we use particular libraries and those things in and of themselves when we look at them like when we look at the code we write have0:09:33
certain characteristics in and of themselves but we're in a business of artifacts right we don't ship source code and the user doesn't look at our0:09:43
source code and say oh that's so pleasant right no they run our software and they run it for a long period of time and over time we keep glomming more0:09:54
stuff on our software all that stuff the running of it the performance of it the ability to change it all is an attribute0:10:03
of the artifact not the original construct but again here we still focus so much on our experience of the use of0:10:13
the constructs ooh look I only had to type 16 characters wow that's great no semicolons or things like that this whole notion of sort of programmer0:10:23
convenience again we are infatuated with it not to our benefit on the flip side it gets even worse our employers0:10:33
are also infatuated with them right those first two meanings of easy what do they mean right if I can get another programmer in here right and they look0:10:44
at your source code and they think it's familiar right and they already know the toolkit right so it's near at hand0:10:53
they've always had the same tool in their toolkit they can read it I can replace you it's a breeze especially if I ignore the third notion of easy right0:11:02
which is whether or not anybody can understand your code right because they don't actually care about that they just care that someone can go sit in your seat and start typing so again as sort of business0:11:14
owners there's again the same kind of focus on those first two aspects of easy because it makes programmers replaceable so we're going to contrast0:11:23
this with the impacts of long term use right what does it mean to use this long term and what what's there what's there0:11:33
is all the meat right does the software do what it is supposed to do is it of high quality can we rely on it doing what it's0:11:42
supposed to do can we fix problems when they arise and if we're given a new requirement can we change it these things have nothing to do with the0:11:53
construct as we typed it in or very little to do with it and have a lot to do with the attributes of the artifact we have to start assessing our0:12:03
constructs based around the artifacts not around the look and feel of the experience of typing it in or the cultural aspects of that so let's talk a0:12:15
little bit about limits oh look it doesn't move this is just supposed to sort of lull you into this state where everything I say seems true because I0:12:27
can't use monads to do that this stuff0:12:36
is pretty simple logic right how can we possibly make things that are reliable that we don't understand it's very very difficult I think Professor Sussman0:12:47
made a great point saying there's going to be this trade-off right as we make things more flexible and extensible and dynamic in some possible futures for0:12:56
some kinds of systems we are going to make a trade-off in our ability to to understand their behavior and make sure that they're correct but for the things0:13:07
that we want to understand and make sure are correct we're going to be limited we're going to be limited to our understanding and our understanding is very limited right there's a whole0:13:16
notion of you know how many balls can you keep in the air at a time or how many things can you keep in mind it's a limited number and it's a very small number all right so we can0:13:25
only consider a few things and when things are intertwined together we lose the ability to take them in isolation so if every time I think I pull out a new part of the software I need to0:13:35
comprehend and it's attached to another thing I have to pull that other thing into my mind because I can't think about the one without the other that's the nature of them being intertwined so0:13:45
every intertwining is adding this burden and the burden is kind of combinatorial as to the number of things that we can consider so fundamentally this0:13:56
complexity and by complexity I mean just braiding together of things is going to limit our ability to understand our systems so so how do we change our0:14:09
software apparently I heard in a talk today that agile and extreme programming have shown that refactoring and tests allow us to0:14:18
make change with zero impact I never knew that I still do not know0:14:27
that that's that's not actually a knowable thing it's that's phooey right0:14:36
if you're going to change software you're going to need to analyze what it does and make decisions about what it ought to do yeah I mean at least you're going to have to go and say what is the impact of this potential change right0:14:47
and what parts of the software do I need to go to to effect a change and you know I don't care if you're using XP or0:14:57
agile or anything else you're not going to get around the fact that if you can't reason about your program you can't make these decisions but I do want to make0:15:07
clear here because a lot of people as soon as they hear the word reason about they're like oh my god are you saying that you have to be able to prove programs I am NOT I don't believe in that I don't think that's an objective0:15:16
I'm just talking about informal reasoning the same kind of reasoning we use every day to decide what we're gonna do we do not take out category theory and say you know we actually can reason0:15:28
without it thank goodness so what about what about the other side right there's two things you do with the future of your software one is you add new capabilities the other thing is you fix0:15:38
the ones you didn't get you know so you know done so well and I like to ask this question what's true of every bug found in the field it got written yes what's a0:15:52
more interesting fact about it it passed the type checker what else did it do it passed all the tests okay so now what do0:16:05
you do right I think we're in this world I'd like to call guardrail programming right it's really sad we're like I can0:16:17
make change because I have tests wait who does that who drives their car around banging against the guardrail saying well I'm glad I've got these guardrails because0:16:27
I'd never make it to the show on time right and do the guardrails0:16:37
help you get to where you want to go like the guardrails like guide you places no there's guardrails everywhere they don't point your car in any particular direction so again we're0:16:50
going to need to be able to think about our program it's going to be critical all of our guardrails will have failed us we're gonna have this problem we're gonna need to be able to reason about our program say well you0:16:59
know what I think because maybe if it's not too complex I'll be able to say I know through ordinary logic it couldn't0:17:08
be in this part of the program it must be in that part and let me go look there first things like that now of course everybody's gonna start moaning but I have all this speed I'm0:17:19
agile and fast you know this easy stuff is making my life good because I have a lot of speed so what kind of runner can run as fast0:17:30
as they possibly can from the very start of a race right only somebody who runs really short races ok but of course we0:17:43
are programmers and we're smarter than runners apparently because we know how to fix that problem right we just fire the starting pistol every hundred yards0:17:53
and call it a new sprint all right I don't know why they haven't0:18:04
figured that out but right it's my contention based on experience that if you ignore complexity you will slow down0:18:14
you will invariably slow down over the long haul of course if you are doing something that's really short-term you don't need any of this you could write it you know in ones and zeroes and this0:18:26
is my really scientific graph you notice how none of the axes are there's no numbers on it because I just I completely made it up it's a it's an0:18:36
experiential graph and what it shows is if you focus on ease and ignore simplicity so I'm not saying you can't0:18:45
try to do both that would be great if you focus on ease you will be able to go as fast as possible from the beginning of the race but no matter what technology you use or sprints or firing0:18:54
pistols or whatever the complexity will eventually kill you it will kill you in a way that will make every sprint accomplish less most sprints will be about completely redoing things that you've0:19:04
already done and the net effect is you're not moving forward in any significant way now if you start by focusing on simplicity why can't you go0:19:13
as fast as possible right at the beginning because some some tools that are simple are actually as easy to use as some tools that are not why can't you0:19:23
go as fast then you have to think you have to actually apply some simplicity work to the problem before you start and that's gonna give you this ramp up so0:19:36
one of the problems I think we have is this conundrum that some things that are easy actually are complex so let's look there are a bunch of constructs that0:19:48
have complex artifacts that are very succinctly described right some of the things that are really dangerous to use are like so simple to describe they're incredibly familiar right if you're0:19:58
coming from object orientation you're familiar with a lot of complex things they're very much available right and they're easy to use in fact by all0:20:08
measures conventional measures you look at them and say this is easy right but we don't care about that right again0:20:17
the user's not looking at our software and they don't actually care very much about how good a time we had when we were writing it right what they care about is what the program does0:20:28
and if it works well it will be related to whether or not the output of those constructs was simple in other words what complexity did they yield when0:20:38
there is complexity there we're going to call that incidental complexity right it wasn't part of what the user asked us to do we chose a tool it had some inherent0:20:48
complexity in it it's incidental to the problem I didn't put the definition in here but uh incidental is Latin for your fault and it is and I think you really0:21:03
have to ask yourself you know are you programming with a loom you know you're having a great time you're throwing that shuttle back and forth and what's coming out the other side is this knotted you0:21:15
know mess I mean it may look pretty but you have this problem right what's the problem the problem is the knitted castle problem right do you want a0:21:29
knitted castle so what benefits do we do we get from simplicity we get ease of understanding right that's sort of definitional I contend we get ease of0:21:39
change and easier debugging other benefits that come out of it that are sort of on the secondary level are increased flexibility when we talk more0:21:49
about modularity and breaking things apart we'll see where that falls like the ability to change policies or move things around right as0:22:00
we make things simpler we get more independence of decisions because they're not interleaved so I can make a location decision it's orthogonal from0:22:10
like a performance decision and I really do want to you know ask the question agilist or whatever is having a test suite0:22:20
and refactoring tools going to make changing the knitted Castle faster than changing the Lego castle no way0:22:30
completely unrelated okay so how do we make things easy presumably you know the objective here is not to just bemoan the0:22:41
software crisis right so what can we do to make things easy so we'll look at those parts those aspects of being easy again there's a location aspect making something at hand putting it in our0:22:50
toolkit that's relatively simple right we just install it right maybe it's a little bit harder because we have to get somebody to say it's okay to use it then0:23:00
there's the aspect of how do I make it familiar right I mean I've never seen this before that's a learning exercise I've got to go get a book go take a tutorial have somebody explain it to me0:23:10
maybe try it out right both these things we're driving we install we learn it's totally in our hands then we have this other part though0:23:20
right which is the mental capability part and this is the part that's always hard to talk about the mental capability part right because the fact is we0:23:32
can learn more things we actually can't get much smarter we're not going to move we're not going to move our brain closer to the complexity we have to make things0:23:43
near by simplifying them but the truth here is not that there are these super super you know bright people who can do these amazing things and0:23:52
everybody else is stuck because the juggling analogy is pretty close right the average juggler can do three balls the most amazing juggler in the world0:24:02
can do like nine balls or twelve or something like that they can't do twenty or a hundred right we're all very0:24:11
limited you know compared to the complexity we can create we're all you know statistically at the same point in our ability to understand it which is0:24:20
not very good so we're going to have to bring things towards us and because we can only juggle so many balls you have to make a decision how many of those0:24:29
balls do you want to be incidental complexity and how many do you want to be problem complexity alright how many extra balls do you want to have somebody throwing you that you have to try to incorporate in here0:24:39
Oh use this tool you're like whoa you know more more stuff who wants to do that all right so let's look at a let's0:24:49
look at a fact so I've been on the other side of this complaint and I like it we0:24:59
can look at it really quickly only because it it's not this analysis has nothing to do with the usage this complexity analysis is just about the0:25:09
programmer experience right so parens are hard right they're not at hand for most people who haven't otherwise used it and what does that mean it means that like they don't have an editor that0:25:21
knows how to do you know paren matching or move stuff around structurally or they have one and they've never loaded the mode that makes that happen totally given right it's not at hand0:25:31
nor is it familiar I mean everybody's seen parentheses but they haven't seen them on that side of the method I mean0:25:42
that is just crazy but you know I think this is this is0:25:52
your responsibility right to fix these two things as a user as a potential user you got to do this but we could dig deeper let's look at the third thing0:26:01
did you actually give me something that was simple is a language built all out of parens simple in the case I'm saying right is it free of interleaving and0:26:12
braiding and the answer is no right Common Lisp and Scheme are not simple in this sense in their use of parens0:26:21
because the use of parentheses in those languages is overloaded right parens wrap calls they wrap grouping they wrap0:26:31
data structures right and that overloading is a form of complexity by the definition you know I gave you right and so if you actually bother to get0:26:42
your editor set up and learn that the parentheses go on the other side of the verb this was still a valid complaint now of course everybody was0:26:51
saying it's hard it's complex and they were using these words really weakly right but it was hard for a couple of reasons you could solve and it was not0:27:00
simple for a reason that was the fault of the language designer which was that there was overloading there right and we can fix that right we can just add another data structure it doesn't make0:27:09
Lisp not Lisp to have more data structures right it's still a language defined in terms of its own data structures but having more data structures in play means that we can get0:27:19
rid of this overloading in this case which then makes it your fault again right because now the simplicity is back in the construct and it's just a0:27:29
familiarity thing which you can solve for yourself okay this is an old dig at0:27:38
Lisp programmers I'm not totally sure what he was talking about I believe it was a performance related0:27:47
thing that Lispers just consed up all this memory and they did all this evaluation and Lisp programs at that time were complete pigs0:27:57
relative to the hardware so that you know the value of all these constructs right this dynamism you know dynamic nature these things are all great they are valuable right but there0:28:07
was this performance cost I'd like to lift this whole phrase and apply it to all of us right now right as programmers we are looking at all kinds0:28:16
of things and I just see it you know we read hacker news or whatever it's like oh look this thing has this benefit oh great I'm gonna do that oh but this has this benefit oh that's cool oh that's awesome0:28:27
you know that's shorter you never see in these discussions was there a trade-off is there any downside you know was there anything bad that comes along with this0:28:37
never nothing it's just like we look all for benefits right so as programmers now I think we're looking all for benefits and we're not looking carefully enough at the the byproducts so what's in your0:28:50
tool kit I have a you know I have these two columns one says complexity and one0:28:59
says simplicity the simplicity column just being simpler it doesn't mean that the things over there are purely simple0:29:08
now I didn't label these things bad and good I'm leaving your minds to just do that [Laughter]0:29:20
so what things are complex and what are the simple replacements I'm going to dig into the details on these so I won't actually explain why they're complex we're going to say state and objects are0:29:29
complex and values are simple and and can replace them in many cases I'm gonna say methods are complex and functions0:29:38
are simple and namespaces are simple and the reason why methods are there because often the space of methods in the class0:29:47
or whatever is also a mini very poor namespace vars are complex and variables0:29:56
are complex managed references are also complex but they're simpler inheritance switch statements pattern-matching are0:30:06
all complex and polymorphism a la carte is simple okay now remember the meaning of simple meaning of simple means unentangled0:30:18
right not twisted together with something else doesn't mean I already know what it means right simple does not mean I already know what it means okay0:30:27
syntax is complex data is simple imperative loops fold even which seems0:30:36
kind of higher-level still has some implications that tie two things together whereas set functions are simpler actors are complex and queues0:30:45
are simpler ORM is complex and declarative data manipulation is simpler okay even Eric said that in his talk he0:30:55
said it really fast near the end oh yeah and eventual consistency is really hard for programmers0:31:07
conditionals are complex in interesting ways and rules can be simpler and inconsistency is very complex it's0:31:17
almost definitionally complex right because consistent means to stand together so inconsistent means to stand apart and that means taking a set of things that are standing apart and0:31:26
trying to think about them all at the same time it's inherently complex to do that and anybody who's tried to use a system that's eventually consistent knows that okay0:31:36
so there's this really cool word called complect I found it I love it it means to0:31:45
interleave or entwine or braid okay I want to start talking about what we do to our software that makes it bad I don't want to say braid or entwine0:31:54
because it doesn't really have the good bad connotation that complect has complect is obviously bad right it happens to be an archaic word but you know there's no0:32:04
rules that say you can't start using them again so I'm going to use it for the rest of the talk so what do you know about complect it's bad don't do it right0:32:15
this is where complexity comes from complecting it's very simple0:32:24
right and in particular it's something you want to avoid in the first place right look at this diagram look at the first one look at the last one right0:32:33
it's the same stuff in both those diagrams the exact it's the same strips what happened they got complected and now it's hard to0:32:45
understand the bottom diagram from the top one but it's the same stuff you're doing this all the time you can make a program a hundred different ways some of0:32:54
them it's just hanging there it's all straight you look at it you say oh I see it's four lines this program right and you can type in four lines in another language or with a different construct0:33:03
and you end up with this knot so you gotta take care of that so complect actually means to braid together and0:33:12
compose means to place together and we know that right everybody keeps telling us what we want to do is make composable systems we just want to place things0:33:21
together which is great and I think there's no disagreement right composing simple components simple in that same respect is the way we write robust0:33:32
software so it's simple right all we need to do is everybody knows this I'm0:33:41
up here just telling you stuff you know we can make simple systems by making them modular right we're done I'm like halfway through my talk I don't even know if I'm gonna finish it's so simple0:33:51
right this is it this is the key no it's obviously not the key right who has seen components that have this0:34:00
kind of characteristic I'll raise my hand twice because not enough people are raising their hands it's ridiculous right what happens you can write modular0:34:09
software with all kinds of interconnections between them right they may not call each other but they're completely complected right and we know0:34:20
how to solve this it has nothing to do with the fact that there are two things it has to do with what those two things are allowed to think about if you want to really anthropomorphize and what0:34:30
do we want to make things allowed to think about and only these things some abstractions I don't know if that's coming out that well that's a dashed0:34:40
white version of the top of the Lego right that's all we want to limit things to because now the blue guy doesn't really know anything about the yellow guy and the yellow guy doesn't really0:34:49
know anything about the blue guy and they've both become simple so it's very it's very important that you don't associate simplicity with partitioning and stratification right they don't0:34:59
imply it right they are enabled by it if you make simple components you can horizontally separate them and you can vertically stratify them right but you can also do0:35:10
that with complex things and you're going to get no benefits and so I would encourage you to be particularly careful not to be fooled by code organization0:35:20
right there's tons of libraries that look oh look there's different classes there's separate classes they you know they call each other in sort of these nice ways right then you get out in the0:35:30
field and you're like oh my god this thing presumes that that thing never returns the number 17 what is that okay0:35:40
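A minimal sketch of the point about limiting what each component is allowed to think about, written in Python. The names `PriceSource`, `InMemoryPrices`, and `order_total` are hypothetical and do not come from the talk; the only claim is that `order_total` knows about an abstraction and nothing about the concrete class behind it.

```python
from typing import Dict, List, Optional, Protocol


class PriceSource(Protocol):
    """Hypothetical abstraction: the only thing order_total may think about."""

    def price_of(self, item: str) -> Optional[int]: ...


class InMemoryPrices:
    """One possible implementation; order_total never names this class."""

    def __init__(self, prices: Dict[str, int]) -> None:
        self._prices = dict(prices)  # private copy, callers cannot reach in

    def price_of(self, item: str) -> Optional[int]:
        return self._prices.get(item)


def order_total(source: PriceSource, items: List[str]) -> int:
    """Sums known prices; unknown items are skipped explicitly, not assumed away."""
    total = 0
    for item in items:
        price = source.price_of(item)
        if price is not None:
            total += price
    return total


prices = InMemoryPrices({"tea": 3, "cake": 5})
print(order_total(prices, ["tea", "cake", "unknown"]))
```

Separate classes alone would not buy this; the benefit comes from the fact that the only contract between the two parts is the one written down in the abstraction.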
I'm not going to get up here and tell you state is awesome I like state I'm not a functional whatever guy whatever I'm gonna say instead I did this and it sucked0:35:51
right I did years and years of C++ you know he-man stateful programming it's really not fun it's not good it's0:36:04
never simple having state in your program is never simple right because it has a fundamental complecting that goes on in its artifacts right it complects value and0:36:14
time you don't have the ability to get a value independent of of time and sometimes not an ability to get a value in any proper sense at all but again0:36:25
it's a great example this is easy it's totally familiar it's at hand it's in all the programming languages this is so easy this complexity is so easy and0:36:37
you can't get rid of it everybody says well I have modularity that assignment statement is inside a method right well if every time you call that method with the same arguments you can0:36:47
get a different result guess what happened that complexity just leaked right out of there it doesn't matter that you can't see the variable right if the thing that's wrapping it is stateful0:36:56
or the thing that's wrapping that is still stateful in other words by stateful I mean every time you ask it the same question you get a different answer you have this complexity and it's like poison it's like dropping you know some0:37:07
dark liquid into a vase it's just gonna end up all over the place the only time you can really you know get rid of it is when you put it inside0:37:16
something that's able to present a functional interface on the outside a true functional interface same input same output you can't mitigate it0:37:25
through the ordinary code organization things and note in particular I didn't talk about concurrency here this is not about concurrency this has nothing to do0:37:35
with concurrency it's about your ability to understand your program your program was out there it's single threaded it didn't work all the tests passed it made it through the type checker figure out0:37:45
what happened right if it's full of variables what are you gonna need to try to do recreate the state that was happening at the client when it went bad0:37:55
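A small sketch of the leak described here, in Python. `TicketCounter` and `next_ticket` are made-up names for illustration. The first version hides an assignment inside a method, yet callers still see different answers to the same question; the second presents a functional interface where the same input always gives the same output.

```python
from typing import Tuple


class TicketCounter:
    """Stateful: the assignment is hidden in a method, but the effect still leaks."""

    def __init__(self) -> None:
        self._count = 0

    def next_ticket(self, prefix: str) -> str:
        self._count += 1  # hidden mutation: same argument, different result
        return f"{prefix}-{self._count}"


def next_ticket(prefix: str, count: int) -> Tuple[str, int]:
    """Functional interface: the new count is returned instead of stored."""
    new_count = count + 1
    return f"{prefix}-{new_count}", new_count


counter = TicketCounter()
first = counter.next_ticket("A")
second = counter.next_ticket("A")
print(first == second)  # the same question gave two different answers

print(next_ticket("A", 0) == next_ticket("A", 0))  # same input, same output
```

To reproduce a failure of the stateful version you have to recreate the counter's history; for the functional version the arguments are the whole story.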
is that gonna be easy no but we fix this right your language your new shiny0:38:05
language has something called var or maybe has refs or references none of these constructs make state simple0:38:14
that's the first primary thing I don't want to say that even of Clojure's constructs they do not make state simple in the case I'm talking about in the nature of simple I'm talking about but they're not the same right they all do0:38:25
warn you when you have state and that's great most people who are using a language where mutability is not the default and you have to go out of your way to get it find that the programs they end0:38:34
up writing have dramatically like orders of magnitude less state than they would otherwise because they never needed all the other state in the first place so0:38:43
that's really great but I will call out Clojure and Haskell's references as being particularly superior in dealing with this because they compose values and0:38:54
time they are actually little constructs that do two things they have some abstraction over time and the ability to extract the value that's really important because that's0:39:06
your path back to simplicity if I have a way to get out of this thing and get a value out I can continue with my program if I have to pass that variable to somebody else or a reference to0:39:15
something that's going to find the variable every time through the varying thing I'm poisoning the rest of my system so you know look at the var in your language and ask if it does the same0:39:25
thing alright let's see why things are complex state we already talked about it complects everything it touches0:39:34
objects complect state identity and value they mix these three things up in a way that you cannot extricate the parts0:39:43
methods complect function and state ordinarily right in addition in some languages they complect namespaces right0:39:52
derive from two things in Java that have the same name method and right it doesn't work syntax interestingly complects meaning and0:40:03
order often in a very unidirectional way professor Sussman made the great point about data versus syntax and it you know0:40:12
it's super true I don't care how much you really love the syntax of your favorite language it's inferior to data in every way0:40:21
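One possible reading of "syntax complects meaning and order", sketched in Python. This is an illustration chosen for this transcript, not an example from the talk: a positional record ties what a field means to where it sits, while keyed data carries its meaning with it.

```python
# Positional: the meaning of each slot exists only by convention.
point_positional = (3, 4, "red")  # assumed order: x, y, color


def describe_positional(p):
    x, y, color = p  # breaks if a field is added or the order changes
    return f"{color} point at ({x}, {y})"


# Keyed data: meaning travels with the value, order is irrelevant.
point_data = {"color": "red", "y": 4, "x": 3}


def describe_data(p):
    return f"{p['color']} point at ({p['x']}, {p['y']})"


print(describe_positional(point_positional))
print(describe_data(point_data))
```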
inheritance complects types right so these two types are complected that's what it means inheritance complecting is like it's0:40:30
definitional right switching and matching right they complect multiple pairs of who's going to do something0:40:40
and what happens right and they do it all in one place in a closed way that's very bad vars and variables again0:40:50
complect value and time often in an inextricable way you can't obtain a value we saw a picture during a keynote yesterday of this amazing memory right0:41:01
where you could dereference an address and get an object I want to get one of those computers right have you ever used one of those computers I can't get0:41:11
one I called Apple and they were like no the only thing you can ever get out of a memory address is a word a scalar right the thing that was all derided right recovering a0:41:21
composite object from an address it's not something computers do none of the ones that we have so variables have the same problem you cannot recover a0:41:30
composite mutable thing with one dereference loops and fold loops are pretty obviously0:41:39
complecting what you're doing and how to do it fold is a little bit more subtle right because it seems like this nice somebody else is taking care of it but it does have this implication about0:41:48
the order of things this left-to-right bit actors complect what's going to be done and who's going to do it Oh0:42:00
now professor Sussman said all these talks have acronyms and I couldn't actually modify my slides in time so object0:42:09
relational mapping has oh my god complecting going on you can't even begin to talk about how bad it is right and you know if you're0:42:20
going to do like duals right what's the dual of value is it co-value what's a co-value0:42:29
it's an inconsistent thing who wants that and conditionals I think are interesting right this is sort0:42:39
of a more cutting edge area we have a bunch of sort of rules about what a program is supposed to do strewn all throughout the program can we fix that0:42:49
because that's complected with the structure of the program and the organization of the program all right so if you take away two things from the0:42:59
talk one would be the difference between that word simple and easy the other I would hope would be the fact that we can create precisely the same0:43:10
programs we're creating right now with these tools of complexity with dramatically drastically simpler tools right I did C++ for a long time I did0:43:22
Java I did C sharp I know how to make big systems in those languages and I completely believe you do not need all that complexity you0:43:31
can write as sophisticated a system with dramatically simpler tools which means you're gonna be focusing on the system what it's supposed to do instead of all0:43:40
the gook that falls out of the constructs you're using so I'd love to say the first step in getting a simpler life is to just choose simpler stuff0:43:49
right so if you want values usually you can get it most languages have something like values final or val you know0:43:58
lets you like declare something as being immutable you do want to find some persistent collections because the harder thing in a lot of languages is getting aggregates that are values right0:44:09
you got to find a good library for that or use a language where that's the default functions most languages have0:44:18
them thank goodness if you don't know what they are they're like stateless methods namespaces this is something you really need the language to do for you0:44:27
and unfortunately it's not done very well in a lot of places data please we're programmers we supposedly write data processing programs there are all these0:44:37
programs that don't have any data in them they have all these constructs we put around it and globbed on top of data data is actually really simple there's not a tremendous number of0:44:47
variations in the essential nature of data right there are maps there are sets there are linear sequential things there's not a lot of other conceptual categories of data we create hundreds of0:44:58
thousands of variations that have nothing to do with the essence of the stuff and make it hard to write programs that manipulate the essence of the stuff we should just manipulate the essence of the stuff it's not hard it's simpler0:45:09
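A rough sketch of manipulating "the essence of the stuff" directly, in Python. The sample records and the helpers `where` and `pluck` are hypothetical, invented for this illustration. The data is only maps, sets, and a sequence, so two small generic functions work on it without any knowledge of what a person is.

```python
# Plain data: a sequence of maps, with a set inside each map.
people = [
    {"name": "Ada", "langs": {"lisp", "java"}},
    {"name": "Bo", "langs": {"java"}},
    {"name": "Cy", "langs": {"lisp", "haskell"}},
]


def where(rows, key, predicate):
    """Generic filter over any sequence of maps."""
    return [row for row in rows if predicate(row[key])]


def pluck(rows, key):
    """Generic projection of one key out of any sequence of maps."""
    return [row[key] for row in rows]


lispers = pluck(where(people, "langs", lambda langs: "lisp" in langs), "name")
all_langs = set().union(*pluck(people, "langs"))

print(lispers)
print(sorted(all_langs))
```

The same two helpers would work unchanged on orders, log entries, or anything else shaped as a sequence of maps, which a bespoke class per concept would not allow.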
Also, same thing for communications. Right? Are we all not glad we don't use the Unix method of communicating on the web? Right? Any arbitrary command string can be the argument list for your program, and any arbitrary set of characters can come out the other end. Let's all write parsers. No, I mean, it's a problem. It's a source of complexity. Right? So we can get rid of that: just use data.
The biggest thing, I think, the most desirable thing, the most esoteric (this is tough to get, but boy, when you have it your life is completely, totally different) is polymorphism à la carte. Right? Clojure protocols and Haskell type classes and constructs like that give you the ability to independently say: I have data structures, I have definitions of sets of functions, and I can connect them together, and those are three independent operations. In other words, the genericity is not tied to anything in particular. It's available à la carte. I don't know of a lot of library solutions for languages that don't have it.
I already talked about managed references and how to get them. Maybe you can use Clojure's from different Java languages. Set functions you can get from libraries. Queues you can get from libraries. Right? You don't need a special communication language. You can get declarative data manipulation by using SQL, or learning SQL, finally, or something like LINQ or something like Datalog. I think these last couple of things are harder. Right? We don't have a lot of ways to do this well integrated with our languages, I think, currently. LINQ is an effort to do that.
Rules, right: declarative rule systems. Instead of, you know, embedding a bunch of conditionals in our raw language at every point of decision, it's nice to sort of gather that stuff and put it over someplace else. And you can get rule systems in libraries, or you can use languages like Prolog. If you want consistency, you need to use transactions and you need to use values. OK? There are reasons why you might have to get off of this list, but boy, there's no reason why you shouldn't start with it. OK.
0:47:23
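The "polymorphism à la carte" item in the list above names three independent operations: defining data structures, defining a set of functions, and connecting the two. A loose Python analogue (an assumption for illustration; it is not how Clojure protocols or Haskell type classes are implemented) is `functools.singledispatch` from the standard library. The `Circle`, `Rect`, and `area` names below are hypothetical.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


# 1. Data structures, defined without reference to any operation.
@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class Rect:
    width: float
    height: float


# 2. A function specification, defined without reference to any data type.
@singledispatch
def area(shape):
    raise NotImplementedError(f"area is not defined for {type(shape).__name__}")


# 3. The connections, made separately and after the fact.
@area.register
def _(shape: Circle):
    return math.pi * shape.radius ** 2


@area.register
def _(shape: Rect):
    return shape.width * shape.height


# The same function can later be extended to a type we did not write.
@area.register
def _(shape: tuple):
    width, height = shape
    return width * height
```

Calling `area(Rect(2, 3))` or `area((4, 5))` dispatches on the type of the argument. Neither the data definitions nor the `area` specification had to change when the `tuple` case was added, which is the independence the passage describes.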
There's a source of complexity that's really difficult to deal with and not your fault. I call it environmental complexity. Right? Our programs end up running on machines next to other programs, next to other parts of themselves, and they contend; they contend for stuff. Right? Memory, CPU cycles, and things like that. Everybody's contending for it. This is an inherent complexity (inherent is Latin for "not your fault") in the implementation space. And no, this is not part of the problem, but it is part of the implementation. Right? You can't go back to the customer and say, "The thing you wanted is not good because I have GC problems." But GC problems and stuff like that, they come into play.
There are not a lot of great solutions. Right? You can do segmentation. You can say this is your memory, this is your memory, this is your memory, this is your CPU and your CPU. But there's tremendous waste in that. Right? Because you preallocate; you don't use everything; you don't have sort of a dynamic nature. But the problem I think we're facing, and it's not one for which I have a solution at the moment, is that the policies around this stuff don't compose. Right? If everybody says, "I'll just size my thread pool to be the number of," you know, "cores," how many times can you do that in one program? Not a lot, and have it still work out. So unfortunately, a lot of things like that, splitting that stuff up and making it an individual decision, is not actually making things simpler. It's making things complex, because that's a decision that needs to be made by someone who has better information. And I don't think we have a lot of good sources for organizing those decisions in single places in our systems.
This is a hugely long quote. Basically it says programming is not about typing like this; it's about thinking. So the next phase here, and I've got to move a little bit quicker, is: how do we design simple things of our own? All right. So the first part of making things simple is just to choose constructs that have simple artifacts. But we have to write our own constructs sometimes, so how do we abstract for simplicity? All right. Abstract: again, here's an actual definition, not a made-up one. It means to draw something away, and in particular it means to draw away from the physical nature of something. I do want to distinguish this from the way people sometimes use this term really grossly to just mean hiding stuff. That is not what abstraction is, and that's not going to help you in this space.
There are two, you know, I can't totally explain how this is done; it's really the job of designing. But one approach you can take is just to do who, what, when, where, why, and how. If you just go through those things and sort of look at everything you're deciding to do and say, "What is the who aspect of this? What is the what aspect of it?", this can help you take stuff apart.
0:50:13
The other thing is to maintain this approach that says, "I don't know; I don't want to know." I once said that so often during a C++ course I was teaching that one of the students made me a shirt. It was a Booch diagram, since we didn't have, whatever it is now, the unified one, and every line just said that. That's what you want to do. You really just don't want to know.
All right. So what is "what"? What is the operations; you know, what is what we want to accomplish. Right? We're going to form abstractions by taking functions, and more particularly sets of functions, and giving them names. And you're going to use whatever your language lets you use. So if you only have interfaces, you'll use that. If you have protocols or type classes, you'll use those. All those things are in the category of the things you use to make sets of functions that are going to be abstractions, and they're really sets of specifications of functions. The point I'd like to get across today is just that they should be really small, much smaller than what we typically see. Java interfaces are huge, and the reason why they are huge is because Java doesn't have union types, so it's inconvenient to say this function takes, you know, something that does this and that and that. You have to make a this-and-that-and-that interface. So we see these giant interfaces, and the thing with those giant interfaces is that it's a lot harder to break up those programs.
So you're going to represent them with your polymorphism constructs. They're specifications. Right? They're not actually the implementations. They should only use values and other abstractions in their definitions. So you can define interfaces, or whatever, type classes, that only take interfaces and type classes or values and return them. And the biggest problem you have when you're doing this part of design is if you complect this with how. Right? You can complect it with how by jamming them together and saying, "Here's just a concrete function," instead of having an interface, or, "Here's a concrete class," instead of having an interface. You can also complect it with how more subtly by having some implication of the semantics of the function dictate how it is done.
0:52:19
Fold is an example of that. Strictly separating what from how is the key to making how somebody else's problem. Right? If you've done this really well, you can pawn off the work of how on somebody else. You can say, "Database engine, you figure out how to do this thing," or, "Logic engine, you figure out how to search for this." I don't need to know.
Who is about, like, data or entities. These are the things that our abstractions are going to be connected to eventually, depending on how your technology works. You want to build components up from subcomponents in a sort of direct injection style. Right? You don't want to, like, hardwire what the subcomponents are. You want to, as much as possible, take them as arguments, because that's going to give you more programmatic flexibility in how you build things. You should have probably many more subcomponents than you have.
0:53:07
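One way to picture "small specifications" and "take subcomponents as arguments" is the following Python sketch. Everything in it is hypothetical (the `Store` and `Notifier` protocols, the `Signup` component, the in-memory stand-ins); it is only meant to show a component that is handed its subcomponents rather than hardwiring them, with each specification kept to a single operation.

```python
from typing import Protocol


class Store(Protocol):
    """A deliberately tiny specification: one operation."""

    def save(self, record: dict) -> None: ...


class Notifier(Protocol):
    """Another tiny specification, independent of Store."""

    def notify(self, message: str) -> None: ...


class InMemoryStore:
    """One possible 'how' for Store; used here only as a stand-in."""

    def __init__(self):
        self.records = []

    def save(self, record):
        self.records.append(record)


class RecordingNotifier:
    """One possible 'how' for Notifier; it just remembers messages."""

    def __init__(self):
        self.messages = []

    def notify(self, message):
        self.messages.append(message)


class Signup:
    """A component built from subcomponents passed in as arguments."""

    def __init__(self, store: Store, notifier: Notifier):
        self._store = store
        self._notifier = notifier

    def register(self, email: str) -> None:
        self._store.save({"email": email})
        self._notifier.notify(f"welcome {email}")


store = InMemoryStore()
notifier = RecordingNotifier()
Signup(store, notifier).register("ann@example.com")
```

`Signup` never names a concrete store or notifier, so either can be swapped (a database, a queue, a test double) without touching it. The stand-ins here are mutable only to keep the sketch short.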
So you want really much smaller interfaces than you have, and you want to have more subcomponents than you probably are typically having, because usually you have none, and then maybe you have one when you decide, "Oh, I need to farm out policy." If you go in saying, "This is a job, and I've done who, what, when, where, why, and I found five components," don't feel bad. That's great. You're winning massively by doing that. You know, split out policy and stuff like that. And the thing that you have to be aware of when you're building, you know, the definition of a thing from subcomponents is any of those kind of, you know, red and yellow thinking about blue, blue thinking about yellow kind of hidden detail dependencies. So you want to avoid that.
How things happen: this is the actual implementation code, the work of doing the job. You strictly want to connect these things together using those polymorphism constructs. That's the most powerful thing. Yeah, you can use a switch statement. You could use pattern matching. But it's glomming all this stuff together. If you use one of these systems, you have an open polymorphism policy, and that is really powerful, especially if it's runtime open. But even if it's not, it's better than nothing. And again, beware of abstractions that dictate how in some subtle way, because when you do that you're really nailing down the person down the line who has to do the implementation. You're tying their hands. So the more declarative things are, the better things work. And the thing that, I mean, how is sort of the bottom. Right? Don't mix this up with anything else. All these implementations should be islands as much as possible.
When and where: this is pretty simple. I think you just have to strenuously avoid complecting this with anything. I see it accidentally coming in mostly when people design systems with directly connected objects. Right? So if you know your program is architected such that, you know, this thing deals with the input and then this thing has to do the next part of the job, well, if thing A calls thing B, you just complected it. Right? And now you have a when and where thing, because now A has to know where B is in order to call B, and when that happens is whenever A does it. Right? Stick a queue in there. Queues are the way to just get rid of this problem. If you're not using queues extensively, you should be. You should start right away, like right after this talk.
And then there's the why part. This is sort of the policy and rules. I think this is hard for us. We typically put this stuff all over our application, and if you ever have to talk to a customer about what the application does, it's really difficult to sit, you know, with them in source code and look at it. Now, if you have one of these pretend testing systems that lets you write English strings so the customer can look at that, that's just silly. Right? You should have code that does the work that somebody can look at, which means to try to, you know, put this stuff someplace outside. Try to find a declarative system or a rule system that lets you do this work.
Finally, in this area, information: it is simple. Right? The only thing you can possibly do with information is ruin it. Right? Don't do it. Right? Don't do this stuff. I mean, we've got objects. Objects were made to, like, encapsulate I/O devices. So there's a screen, but I can't, like, touch the screen, so I have an object. Right? There's a mouse. I can't touch the mouse, so there's an object. Right? That's all they're good for. They were never supposed to be applied to information, and we apply them to information. It's just wrong. It's wrong. But I can now say it's wrong for a reason. Right?
0:56:41
It's wrong because it's complex. In particular, it ruins your ability to build generic data manipulation things. If you leave data alone, you can build things once that manipulate data, and you can reuse them all over the place, and you know they're right once and you're done. The other thing about it, which also applies to ORM, is that it will tie your logic to representational things, which again is tying, complecting, intertwining. So represent data as data. Please start using maps and sets directly. Don't feel like, "I have to write a class now because I have a new piece of information." It's just silly.
So the final aspect. Right? So we choose simple tools. We write simple stuff. And then sometimes we have to simplify other people's stuff. In particular, we may have to simplify the problem space or some code that somebody else wrote. This is a whole separate talk I'm not going to get into right now, but the job is essentially one of disentangling. Right?
0:57:42
We know what's complex: it's entangled. So what do we need to do? Right? We need to somehow disentangle it. Right? You're going to get this. You're going to need to first sort of figure out where it's going. You're going to have to follow stuff around and eventually label everything. Right? This is the start. This is roughly what the process is like. But again, it's a whole separate talk to try to talk about simplification.
All right. I'm going to wrap up with a couple of slides. The bottom line is simplicity is a choice. It's your fault if you don't have a simple system. And I think we have a culture of complexity. To the extent we all continue to use these tools that have complex outputs, we're just in a rut. We're just self-reinforcing, and we have to get out of that rut. But again, like I said, if you're already saying, "I know this. I believe you. I already use something better. I've already used that whole right column," then hopefully this talk will give you the basis for talking with somebody else who doesn't believe you. Right? Talk about simplicity versus complexity.
But it is a choice. Right? It requires constant vigilance. We already saw that guardrails don't yield simplicity. They don't really help us here. Right? It requires sensibilities and care. Your sensibilities about simplicity being equal to ease of use are wrong. They're just simply wrong. Right? We saw the definitions of simple and easy. They're completely different things. Right? So easy is not simple. You have to start developing sensibilities around entanglement. That's what you have to have: entanglement radar. Right? You want to look at some software and say, "Ugh!" Not that "I don't like the names you used" or "the shape of the code" or "there was a semicolon." I mean, that's also important too. But you want to start seeing complecting. You want to start seeing interconnections between things that could be independent. That's where you're going to get the most power. All the reliability tools you have, right, since they're not about simplicity, they're all secondary. Right? They just do not touch the core of this problem. Right? They're safety nets, but they're nothing more than that.
So how do we make simplicity easy? Right? We're going to choose constructs with simpler artifacts and avoid constructs that have complex artifacts. It's the artifacts; it's not the authoring. As soon as you get in an argument with somebody about, "Oh, we should be using whatever," get that sorted out, you know, because however they feel about the shape of the code they type in is independent from this, and this is the thing you have to live with. We're going to try to create abstractions that have simplicity as a basis. Right? We're going to spend a little time up front simplifying things before we get started, and recognize that when you simplify things, you often end up with more things.
1:00:41
Right? Simplicity is not about counting. Right? I'd rather have more things hanging nice, straight down, not twisted together, than just a couple of things tied in a knot. And the beautiful thing about making them separate is you'll have a lot more ability to change it, which is where I think the benefits lie. So I think this is a big deal, and I hope everybody's able to bring it into practice or use this as a tool for convincing somebody else to do that. So I'll leave you with this. This is what you say when somebody tries to sell you a sophisticated type system. Thank you.
1:01:22
[Applause]
0:00:00
Are We There Yet - Rich Hickey
0:00:00
So I'm going to talk about time today: how we treat time in object-oriented languages generally, and maybe how we fail to. So I'm trying to provoke you today to just reconsider some fundamental things. I just think we get so entrenched with what we do every day that we fail to step back and look at what exactly we are doing. So are we being well served by object orientation as commonly embodied? Right? The concept is pretty broad, and there are multiple possible embodiments, but the ones that we have have a lot of consistent attributes. Do we all agree this is the best way to write software? Do we think this will continue to be the best way?
Certainly today this is a really entrenched model. It doesn't matter which language you're using, and we're a bunch of, everybody has different languages: Groovy and whatnot, Scala and Java, and people use C#. And they love the differences between these languages, and I want you to focus on the similarities between these languages, which is that they're all single-dispatch, stateful object-oriented languages, and they have a lot of the same kinds of things in them: classes, some notion of classes; inheritance; fields are an interesting concept; methods are more interesting, and we'll talk about them later. They're all garbage collected, and they have a heritage that goes back to languages like Smalltalk. They're not significantly different in some dimensions. Right? They're superficially different. They might have mixins; they might have interfaces. Even static and dynamic typing, I think, is not nearly as important as some of the underpinnings that they share. You know, everybody's so excited because now there are languages without semicolons and other great choices that we have, but they have more to do with the sensibilities of the programmer than they have to do with significant differences in the
0:02:09
programming model. OK. So they're all different cars; they're all on the same road. Is this the end? Are we done? Are we going to keep making languages that are just very, very slight incremental differences to the things that we know? Certainly one thing is undeniable: people like object orientation. On the other hand, I think we've gotten increasingly conservative, which makes sense, of course. You get adopted by large companies, they have big investments, people know how to do it. It's not something you're going to move away from any too readily. And certainly I want to emphasize the purpose of this talk is not to beat up on OO, but to have everybody just take a step back, just imagine you don't love it, if you do, and think about whether or not it's perfect.
When we look at languages and try to think of what should be, if I could write another language, or if I could fix this language, or if I could add a feature to the next version of the language, what would we do? Why do we add things? What drives us to make changes? Well, what drives us to change cars, you know, to say, "I'm going to stop using this language; I'll adopt this other language"? And what things should drive us to that? I don't think a lot of people say, "Oh, I'm tired of semicolons. I just, like, can't do it anymore," or curly braces or something, "I'm going to switch to something easier." I think static and dynamic may cause people to switch, but I think there are examples already in our history that show us what causes us to switch. So the things I'm going to talk about today are a small subset of the kinds of things I think you should think about when you look back at the language you're using and try to decide whether or not you want to do something differently.
I want to talk about complexity today. I want to talk about time, mostly about time, and then about models we can use to better implement time, and some of the principles that underlie object orientation. It's a modeling concept. Right? It's based around the idea that we can sort of do things in our programs that are similar to what we see in the world, and that helps us understand our programs. So the hero of the talk today is Alfred North Whitehead. Right? He's the famous guy who, with Russell, wrote Principia Mathematica. Subsequent to that he also became a philosopher, and he wrote some great things, and I'm just going to put them up here because they're great.
So the first thing is: distrust simplicity. I don't want to talk, actually, about the complexity of the problems we're trying to solve. We all know we're given increasingly more complex problems to solve: bigger problems, more data, more flexibility. Expectations of people for software will only ever increase. The complexity I want to talk about today is the incidental complexity, the complexity that arises from the way our tools work, from the ideas our tools embody, from the ways our tools don't work, from the ways our approaches don't work. These things all become problems that we have to solve, and you have a certain number of hours in the day to solve problems. Either you're solving the problems of the application domain, or the problems you've set in front of yourself by choosing a particular language or tool or development strategy. So that's incidental complexity. It's coming along for the ride. It's not part of the problem you're trying to solve.
And it's worse, I think. I mean, everybody knows when something's complicated. You look at it and say, "Ah, you know, complex," and everybody says, "OK, well, I see that that is scary. I know that's a danger zone. I know I'm going to be careful with that." The worst kind of incidental complexity is the kind that's disguised as simplicity. "Look how easy this is. There are no semicolons." I don't want to beat up on no semicolons. It's just an easy way to say: look at some superficial aspect of the language I'm using. This seems easy. This seems familiar.
0:06:13
But is there incidental complexity hiding underneath it? So this is an example. Again, not to beat up on C++, but I spent more than a decade doing this, so it's not that hard. I mean, if you get into template metaprogramming, it can get hard, but the basics are pretty simple. Right? You can write a function that returns a pointer. What's wrong with that? It's pretty simple. You know, there's new and delete and these pointers, and you can pass them around, you can dereference them. There are, like, five things you need to know. You can learn them in an afternoon.
So is it really simple? For instance, the same syntax is used to refer to things on the heap and things that are not on the heap: these pointers. But it gets worse. Right? The real problem with that function signature is: what do you do with the thing that you get when you call it? Is it yours? Is it now your responsibility? You know, do you have to delete it later? Is it even something that can be deleted? Can you hand it to somebody else? Is that allowed? Could you save it? So the problem there was that there was no standard automatic memory management. Right? There's no garbage collection, and this was, and still is for people using this language, a big source of incidental complexity.
0:07:36
Right? Because managing memory is on you. You don't see that. There's not a sign at the top of your source code: "Don't forget, managing memory is on your head." Right? This is incidental complexity. You have to just know that; it's not in the source code. And it's a big problem. I think the lack of garbage collection really impeded C++ in one of its design objectives, which was to be a library language. All the original design stuff, and any time you heard someone talk about it, it was like, "C++ is going to be a library language." But it only ever ended up being a parochial library language. Every shop had a library, but there were not, and still are not, a lot of libraries that go between places because of this problem. And we know that Java, having garbage collection, has a huge library infrastructure. So I think people that moved from C++ to Java did so in no small part due to the fact that they were no longer willing to bear this implicit complexity: "I don't want to do manual memory management. It's not part of the problem I'm trying to solve at all. It's just another problem on my plate every day when I go to work, and I don't want to do it."
So let's look at Java. It's easier. There's no asterisk. This is, like, even better. So what's the problem with this? It's simpler. It's definitely simpler. Right? Now we only have references to managed memory, and we have automatic memory management. We have garbage collection. This is much, much better. It's much easier. Except, again, we have this hidden complexity. Right? Is this a mutable thing or not? You know, when will I see a consistent value? If I look at this right now and I walk through its fields, will the sum of the things I've seen represent a consistent value? All right. This isn't just a concurrency problem. Right? There is a concurrency problem, and it's a big one, but even before we had threads and all that, this was a big source of incidental complexity in programs, because we don't know when we have a stable value. Right? Can I store this date off and look at it later and know I'm going to see what I saw when I was handed it? You don't know.
In addition, if you hand a date or some mutable thing (I mean, I know the mutable things have all been deprecated; I know they're fixing Date; I'm not trying to beat up on Date), but if you hand a mutable thing to somebody, and they may hand it to other people, and then you need to change it, who is going to be affected by that? You have no idea. So this looks really easy. Now, this is true, listen, this is not just Java. This is every single language I listed. Anything that allows for mutable objects has this problem, and there's no way to fix it. So what's the problem here? I'm going to say the problem here is we don't have any standard time management. OK?
0:10:47
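The "can I store this date off and look at it later" question can be made concrete with a small sketch. This is Python rather than the Java being discussed, and both classes are hypothetical stand-ins: `MutableDate` plays the role of a date object that can change in place, and `DateValue` is an immutable value built with the standard `dataclasses` module.

```python
from dataclasses import dataclass, replace


class MutableDate:
    """A stand-in for a date object that can be changed in place."""

    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day


@dataclass(frozen=True)
class DateValue:
    """A date as a value: its fields cannot be reassigned."""

    year: int
    month: int
    day: int


# Mutable case: everyone holding the reference sees later changes.
due = MutableDate(2009, 10, 1)
remembered = due        # "store this date off and look at it later"
due.month = 12          # somebody changes it in place
# remembered.month is now 12, although nothing touched `remembered` directly.

# Value case: a "change" produces a new value; the old one stays stable.
due_value = DateValue(2009, 10, 1)
remembered_value = due_value
later_value = replace(due_value, month=12)
# remembered_value.month is still 10; later_value.month is 12.
```

In the mutable case nothing in the code that holds `remembered` reveals that its contents can shift underneath it, which is the hidden complexity the passage describes.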
That may be a really confusing thing. Hopefully it won't be as we go along. So this is kind of a little bit of reiteration of the points I was making before. I think that because we're so familiar with this, we're absolutely, completely blind to it. Right? And when we choose languages, or when people choose different languages, a lot of times they make the decisions on very superficial differences, like the syntax, or perhaps sort of "this makes me feel good" expressivity differences, which I admit completely are real and valid, but they're somewhat emotional. In the meantime, our systems are getting very, very hard to build, maintain, and make correct, and in no small part that's due to this incidental complexity. We can't understand big programs. Right? We have these giant test suites, and we run them every time we change any little thing, because we don't know whether changing something over here is going to break something over there, and we can't know. And I think, you know, for me, and I think for many people, we're going to find concurrency is just the straw that breaks the camel's back in this area.
So we're programmers. You know, we don't use assembly language anymore. We have languages. Right? Each time we build a new language or we use a new language, we're expecting some benefits in this area. Right? We want to hide chunks of stuff, name them, encapsulate them, get them out of our way so we do not have to think about them and we can build something on top of that. I mean, somebody who's building houses out of bricks does not need to worry about the insides of bricks. Right? They have certain properties; they have certain expectations. And I think it's one of the selling points of object orientation that this is a way to make these kinds of units that we can combine to make programs that are easier to understand, because we understand the pieces, and we put the pieces together and get something we can understand.
It ends up that they're really not the best unit for that. The best units for that are functions, and in particular pure functions. Right? If you want something you do not have to worry about, you should love the pure function. The pure function takes immutable values. It does something with them. The stuff it does has no effect on the world and no connection to the rest of the outside world. Then it returns another immutable thing. So the entire scope of its activity is local.
0:13:28
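A minimal Python sketch of that description, with hypothetical names (`add_item`, `add_item_impure`, `audit_log`): the pure version takes an immutable tuple and returns a new one, while the impure version changes its argument in place and also writes to state outside itself.

```python
def add_item(cart, item):
    """Pure: reads only its arguments and returns a new immutable tuple."""
    return cart + (item,)


audit_log = []


def add_item_impure(cart, item):
    """Impure: mutates its argument and touches state outside itself."""
    cart.append(item)
    audit_log.append(item)
    return cart


# Pure version: the input is untouched, and the same inputs always give
# the same result, so it can be used without thinking about when it runs.
cart = ("apple",)
bigger = add_item(cart, "pear")

# Impure version: the caller's list changes underneath it, and a second
# call with the "same" arguments gives a different answer.
shared = ["apple"]
add_item_impure(shared, "pear")
add_item_impure(shared, "pear")
```

After this runs, `cart` is still `("apple",)`, whereas `shared` has grown twice and `audit_log` has recorded both calls: effects that are invisible in the impure function's signature.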
it has no notion of time that's going to become important later but it's definitely easy to understand it's easy to change right there's some signature0:13:38
that's the only thing about it anybody else knows we change the signature wait and we change the insides nobody cares pure functions are and should be the0:13:48
bricks that we use because they are the things we can use without worrying about them most readily there are definitely huge benefits from doing this I think0:13:59
they're there you could easily do it in object-oriented languages what people don't in contrast objects and methods do not0:14:08
have this property they do not have the I don't need to think about them property they definitely don't and we're gonna see why in a minute on the other0:14:18
hand as great as functions are as building blocks our programs in general are not functions okay there are programs that are functions right there0:14:27
are compilers and theorem provers you know take the stuff convert it whatever but a lot of programs run indefinitely long and people have an expectation of being able to see their behavior to have0:14:37
inputs as the program runs right and get something different every time I want I don't want Google to return the same result every time I type the same word0:14:46
into it Google wouldn't work for me that if Google was a function it would be no good okay Google is a process it's connected to the rest of the world0:14:55
it's scouring pages and integrating them and forming algorithms which hopefully also should change as a whole it feels much more like a participant in the0:15:04
world than a function anymore it's not an idealized calculation so we can say that you know that the entire program0:15:14
has this behavior we can observe over time although you'll see I don't like the word behavior so so most programs and let most programs that most people0:15:24
work on in industry our processes so so we see maybe we haven't seen the value0:15:35
of functions I certainly don't think we have but we also have seen the limitations object-oriented was a way to say well you know I have functions are great and they're great for calculations and all the stuff but0:15:44
then I see the real world and there are objects in there you know there are windowing systems and there are things and you know object orientation was a way to say alright well how do we take our mental model for the processes we0:15:54
see in the world and embody them in some kind of programming model right and so the essence of the object-oriented programming model is not encapsulation blah blah blah it's really that that0:16:04
behavior that flow like thing you know we have these entities we see doing things in the world we should have entities that do things in our programs so the first thing we should0:16:16
realize is that any programming model that tries to model the real world is essentially going to be a simplistic thing okay but again there's that beware simplicity you know is0:16:27
this thing too simple to do the job correctly right one of the problems with object orientation is that you know0:16:37
that we talk about behavior and state and things like that that really really loosely these terms are almost completely meaningless and in addition0:16:47
even though objects you know putatively are about process there's no notion no concrete notion of time in objects no0:16:56
more so than there is in functions but at least functions aren't pretending to play with time functions say there's no time right there's my inputs my outputs I'm not pretending to deal with0:17:07
time objects are pretending to deal with time and yet our object systems don't have any reified notion of time there's nothing you can talk about explicitly because most of them were born in a day0:17:18
when you know your program ruled the computer you know you had a single monotonic execution flow and it just did what it wanted do this do that0:17:27
you know there was a single universal process controlling everything now that that's no longer true you know we try to use locks to restore that vision of the0:17:38
world but that vision of the world was never correct and you can tell in one key way because we still even with0:17:49
all the locks and everything else we still don't really have a concrete representation we can use for perception you know can I look at something and see it be stable or memory can I remember0:18:00
that right these objects are all live they're time bombs right we have gotten this wrong the object-oriented model0:18:10
has gotten time wrong and we've done so in a couple of ways the first is we've made objects that can change in place0:18:19
and we've made objects that we could see change in place right as I said we left0:18:28
out any concrete notion of time and there's no proper notion of values okay you can fabricate values right you can0:18:37
make a class that has all immutable components and that would constitute a value but there's no proper notion of value in a lot of these languages the0:18:48
biggest problem we have is we've conflated two things we've said the idea that I attached to this thing that that0:18:57
lasts over time is the thing that lasts over time and that's not actually true in addition I'll as I said before our0:19:08
ability to perceive is fragile so I have the hero of the day Whitehead up here who subsequent to do in all the principia mathematica stuff as I said0:19:17
became a philosopher and he tried to concern himself with how does the world actually work informed by the current knowledge which this was back in the 20s0:19:27
of quantum mechanics and relativity and one of the things that he came up with was the fact that time must be atomic0:19:40
and move in chunks and in fact time isn't actually a real thing you can touch but it's something that you derive from seeing these epochal transitions so0:19:55
I'm going to explain that more but this is a great quote right no man can ever cross the same river twice right just what's a river we love this idea of0:20:05
objects like there's this thing that changes right there's no river right there's this water there at one point in time and another point in time there's0:20:14
other water there right River this river is all in here so how do we how did we0:20:25
make this mistake what's the real nature of this mistake right it looked like we could change memory in place right we were doing it it's peek and poke and it0:20:35
looked like we could see that yeah we could read but there was nothing about what we were putting in memory that had any correlation to time right it was0:20:45
live again and now we're finding what look at these new computer architectures where is the variable well there's one version of it over here0:20:54
from one point in time right and another one over here and that's on its way to a place that this over there might see at some point it's live now we see the0:21:05
problem right there are no there are no changing values there's values at points in time and all you're ever going to get is the value for a point in time and0:21:17
values don't change all right so the biggest key insight of Whitehead was there's no such thing as a mutable0:21:27
object we've invented them we need to uninvent them okay in Whitehead's model which I am grossly0:21:36
oversimplifying okay I don't even understand it the book is completely daunting but it's full of really cool insights and and what he's built is a0:21:45
model that says there's there's this immutable thing then there's a process in the universe all right that's going to create the next immutable thing and0:21:55
entities that we see as continuous are a superimposition we place on a bunch of values that are causally related we see0:22:04
things happen over time we say oh that's Fred or oh that's that's the river outside the back of my house that's the cloud right we know you can0:22:13
look at a cloud for enough time and also it's like well now there's three clouds or the cloud disappeared right there is no cloud changing right you superimpose the0:22:23
notion of cloud on a series of related cloud values so here are the rules again0:22:33
I am NOT restating Whitehead I'm making this up now okay actual entities are immutable right when you have a new0:22:43
thing it's a function in that pure functional sense that I just talked about of the past so the future is a function of the past and processes and0:22:52
the notion of process is what creates the future from the past identities are mental constructs okay we call it a0:23:01
cloud we call it a river we called him Fred it's an extremely useful psychological artifact that's why we have object-oriented languages this is0:23:10
useful to us it helps us understand things but we have to make sure we understand that objects are not things that change over time all right we0:23:20
superimpose objects on a set of values we saw over time that's an object so just because we can we like to think of0:23:29
it this way because it's important to us to understand the causality you know lion lion lion lion lion you know I better go right that doesn't mean there is a lion0:23:40
that's changing there isn't and then time then is strictly again a derivative0:23:49
of this series of events okay so Whitehead's great quote which is extremely confusing but I think it's0:23:58
something that you could try to try to get right now and remember as I keep going is that there's a becoming of continuity right there's this process in0:24:07
the universe that's creating successive values right and that allows us to say oh continuity great it's not the other0:24:19
way around so now we're completely out of Whitehead terms he has a whole bunch of his own0:24:28
terms but this is these are the terms I want to use to talk about the rest of this this problem the first is the notion of a value we need a very proper notion of a value right we we tend to0:24:39
have a decent notion of a value when we say 42 we have a much weaker notion of a value when we talk about dates so the0:24:48
key characteristic of a value is that it's immutable okay it could be a magnitude it could be something like that or any composite of those things that's also immutable is a value these0:24:59
are extremely important to us right then we have identity identity again is the psychological construct we're going to see a succession of values whose0:25:09
causation is related right one was caused from the previous was caused from the previous and we're gonna say Fred Fred again is a label right the0:25:21
important thing is this identity which is just a construct we use to collect the time series state right it's not0:25:34
something you can change the state is a snapshot this entity has this value at this point in time that's state so the0:25:44
concept of mutable state it makes no sense mutable objects they make no sense finally we have time time is a0:25:54
completely relative thing all time can ever tell you is this thing happened before or after that other thing or at the same point okay it's not a measurable thing it doesn't have0:26:03
dimension this all sounds kind of highfalutin why do we care about this0:26:12
we care about it because we're trying to make programs that make decisions but we have logic in our programs you can't have logic on top of rivers that can0:26:22
change okay you can only have logic on top of values right so we need stable values and we need to collect them from0:26:32
other parts of our program we need to see stable values we need to be able to remember them so I'm using the word perceive I understand completely perception is an0:26:41
incredibly intricate and unresolved mental phenomenon but I like it better than just observe because then I can observe the entire room but perception0:26:51
really is kind of that division in two entities it's a little bit a little bit finer on the other hand I do think we0:27:00
need identity I mean I think that the appeal of object orientation is is is valid right we care about this because it's the way we're thinking about the0:27:10
world all the time if I have to change completely the way I'm thinking about the world in order to write a program my life is going to be hard if I can somehow carry over from the way I think0:27:19
about the world something to the way I write my program it will be easier okay but we can't screw up time and and and0:27:28
and state the way we have and have it still be easier because it's now wrong so it looks like oh I understand the0:27:37
objects I understand but it's not right so I saw this great talk at JavaOne where the people who wrote Head0:27:48
First Java which is a fantastic book talked about the guy talked about I forget his name I'm sorry you should put a slide of a lion in your talk0:27:57
because it'll get everybody like scared and then they'll be more receptive so this is my lion okay so the let's let's0:28:09
try to like pull that theoretical mumbo-jumbo down to something we can use to write programs the first thing we need to understand is we don't make decisions about the world by by directly0:28:19
by direct cognition we don't take our brains and rub them on the table we don't rub them on you know Fred there's a disconnect between our0:28:32
logical system and the actual world okay it's not live right this whole liveness we have from I can see memory that's0:28:41
that's not how it works the other thing we don't get to do in the real world right we're going to model the real world we don't get to do this wait0:28:50
okay okay we don't get to stop the world especially not to observe it okay but what are we doing our programs all the0:28:59
time stop wait stop wait hold on you know everybody's trying to stop the world so they can control it completely as we get more concurrent we're going to0:29:08
need to learn to live in a world that's going to proceed in spite of our intention or desire or best wishes that it would not because it would be a lot easier for us if it wouldn't it's gonna0:29:19
we're not going to achieve the degrees of parallelism and the concurrency we want and so we can accept this and embrace it so we need to look more0:29:29
carefully at well how does perception actually work we don't rub our brains on it we don't stop the world it is incredibly parallel right there's umpteen people in this stadium they0:29:39
can all watch the game they don't say whoa whoa whoa let me look at you they don't say hang on let me take a picture they don't need to right they can take a picture and the game can keep going right so the first thing is0:29:50
perception is uncoordinated okay it's massively parallel it is not message passing there's no communication between0:30:02
the people who want to see the game and the game so we can again look again we're trying to model reality so we can look0:30:11
at reality a little bit how do we how do we do it how does the wetware do it well it ends up that the first thing you have0:30:20
to realize is we're always considering the past we're never perceiving the present right there's the propagation of light it hits my sensory system it's an0:30:30
incredibly slow system that carries that to my brain by the time I'm making the decision about anything I am using the past I am always calculating with the past0:30:39
because we're not able to impede time right we can't stop the world right so the world has absolutely continued I have some you know it seems instantaneous I0:30:48
you know I see the person in the front row here but you know they could leave depending on how much time and how much distance because light is pretty fast again it's like tricky like you0:30:59
know electrons and it makes us think that we're looking at memory right now but it's really not it's always the past we're always perceiving the past the other thing to pick up from looking at0:31:09
our sensory systems is the fact that they're incredibly oriented around discrete events okay we have neurons0:31:18
that carry chemical signals which could be continuous and we could have built brains that were continuous right that somehow took the world and and and0:31:28
consider it like this this moving thing and the moving thing comes into our brain and it's all moving around okay guess what we didn't do that evolution0:31:38
did not do that why because that's a mess right you cannot do logic if everything you're trying to consider is moving around so what do our neurons do0:31:49
they build stuff up and then they go fire they discretize the input what's the next thing that we do we say whoa0:31:59
ten things happen at the same time right so we discretize things and then we love simultaneity we have simultaneity detectors that's0:32:08
what our brains are at a lower level okay so coarsely we like snapshots okay0:32:17
snapshots are good they help us think they're like values another thing we've done in object orientation is said here are0:32:26
methods and methods are a way to read things and perceive things and the way to make things happen well making things happen and perceiving things are completely different they're completely0:32:35
different they shouldn't be in the same construct they're two different things right because action has this other property right no two things can you0:32:44
know affect the same thing at the same time we have to sort of take turns that succession of values that we're going to use to understand the world right0:32:57
is atomic it's an atomic succession and while we've grouped it into threads that it helps make make it easy to understand it's not actually that way we certainly0:33:08
understand the fact that you know there can be only a certain amount of stuff in one place at one time and when you're trying to act on that0:33:19
stuff you're gonna have to be there so action has to be sequential an action and perception are two different things so now I'm gonna put this up I'll put it0:33:31
up again later this is a model this is not a picture of some software this is a model for how to think about time the0:33:42
first thing we need is we need a value that's a point in time we said a point in time is is a value right it can't be changed so we'll use values to represent0:33:51
points in time we will still probably organize our programs by identities as long as you remember the slide from before that the identity is a derived0:34:01
notion it isn't a thing that's doing stuff right it's a derived concept we get from this process we can still use0:34:10
identities to organize things because that's going to be useful to us object orientation has shown us that's useful for us to understand processes but how0:34:21
does how do we get through these epochal atomic successive events we use functions right we take a function of0:34:30
the past we produce the future so the F's on the top are pure functions right they take the state of the universe or let's say the state of an0:34:39
identity at one point in time and produce the next one what's inside them is indivisible imperceptible0:34:48
it's all atomic the functions are atomic and that's the process of the0:34:57
world right we say behavior in object-oriented systems there really is a behavior that says you know I'm driving right I'm doing this right but0:35:06
when you get hit by lightning who's behaving there's no behavior but there are processes in the world that affect things0:35:15
so those are those functions right we're going to call you know any one of those relative to an entity all right an identity its state right again it's0:35:26
just a label of a value of an identity at a point we'll call the state and the identity itself again is a derived thing the succession of states0:35:35
is Fred or the river the important thing also here is that people can be0:35:44
looking at this right there can be observers light can bounce off the river it can bounce off a Fred Fred doesn't need to do that Fred doesn't need to0:35:53
drive that we can look at that so we can observe things and it's very important that observers are not in the timeline0:36:02
and then the blue stuff in there it's not actually reified anywhere but that's time again it's another derived thing so0:36:12
the box around all the states that identity is derived the notion of time it's only because at one point we looked at this another point we looked at that0:36:21
that we know that there is time things don't come with labels this was you know September 22nd yeah all right so how do0:36:31
we do this if we wanted you know take things apart like this and then put them back together what how do we gonna do it well we're going to need two things we0:36:40
looked on a diagram before we saw functions pure functions I think we know how to do that I think we're all agreed we have the technology to write pure0:36:50
functions so that leaves with two other things on a diagram one was values the other was somehow we managed that0:36:59
succession right some sort of time constructs so we need a way to efficiently create values right save them maybe we'll use them as percepts0:37:10
later and we need something that's going to coordinate the the succession of values right so we'll call them time0:37:20
coordination constructs so we need those it ends up that we can and maybe some theoretician will prove we have to0:37:31
consume memory in order to model time we certainly can I don't know that we need to but I'm figuring out a way to deal0:37:40
with that so what do we do we say we pass the old value to a pure function we produce a new value non-destructively0:37:50
right that's going to consume some memory and we know that but that's gonna let us make this correct those values0:38:01
have other have other value because they can serve as our perceptions right what this whole system you know the whole visual system is about making these0:38:10
snapshots all right well in a program I mean admittedly the snapshot in my mind is not the audience here they're two different things but in a program they're not really two different things0:38:20
right if you had a thing if you had a value in the program and another part of the program wanted to perceive it they'd love a copy of it that would be great that's so good that's a good enough0:38:29
record for them so we could use these values as our percepts right we can also use them as our memories right if we have a portion of a program it needs0:38:38
to remember something this value would also serve that purpose so we have a good system for doing values we can do that and then the beautiful thing is if we're consuming memory to model time GC0:38:47
will erase the past and the memories that nobody cares about anymore so the construct I think we need to do0:38:57
values is persistent data structures I've talked about them before I mean if anybody doesn't know really quickly we're not talking about being able to put stuff on disk here a persistent data0:39:06
structure is immutable right when you make a new version of it when you try to change it you get a new thing both the old and the new thing are available0:39:15
after you've made it and both have the same performance characteristics and then make that characteristics of the data structure and the production of the new version also has the same0:39:24
performance characteristics so that's quickie persistent data structure so what good are they in particular they're immutable okay so they're great for the purposes that we0:39:34
need memories and perceptions snapshots essentially they're stable another beautiful just practical aspect of them0:39:43
is they never need synchronization which is back that's just like the baseball game right that's good just 19,0000:39:53
memories there or 19,000 perceivers no synchronization the other nice thing about persistent data structures is in0:40:03
their implementation generally the next version of the value shares a lot of structure with the prior version so that makes them more efficient the other0:40:13
thing that's important is when we make the new value we don't disrupt anybody who's looking at the old value we don't need to say wait stop hang on even if we're not going to0:40:22
destroy it we don't need to do anything and that goes back to the synchronization and and if you have not ever used a functional language or ever0:40:31
use persistent data structures in a non functional language just take my word for it this is so much better if you write a program that uses data structures like this you will just be0:40:40
able to sleep at night you're gonna be happier you know your life is gonna be better because there's a huge quantity of things you will no longer have to0:40:50
worry about all right so this persistent0:40:59
data structure sano involved there you know it's just oh this is really old it's the stuff is so old it's almost embarrassing to put it up here right0:41:09
trees and all the persistent data structures essentially under the hood are trees because trees have these properties that allow you to share structure right and and do updates but0:41:19
in particular I think you know in a practical sense you can implement the kinds of things you're used to having like vectors and hash maps and things like that using0:41:30
trees with a couple of properties at least this has been my experience one is that they have very high branching factors and so therefore they're very shallow and that gives you good0:41:39
performance you can implement vectors you can implement hash maps and I think the the world of things you can do here is still open but the bottom line is0:41:49
they're all trees and they're trees for this reason trees support structural sharing so the tree rooted in past there is immutable it's never going0:41:59
to be changed when we need to make a new version say add a new node we're going to use something called path copying right we're going to copy the path from the root to the node0:42:09
we need to change right so make copies of those over here on the right well this new copy will have the new node we want you know the leaf node we want and0:42:19
we get a new root but that new tree rooted at next shares everything with the old tree except those three red nodes so that's good0:42:34
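The path copying described here can be sketched in Java. This is only an illustration under stated assumptions: it uses a plain binary search tree rather than the shallow, high-branching-factor trees the talk recommends, and the names `PathCopyDemo`, `Node`, `insert` and `contains` are hypothetical, not taken from any library.

```java
// Sketch only: a persistent (immutable) binary search tree updated by path copying.
final class PathCopyDemo {

    // An immutable node: every field is final and never reassigned.
    static final class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the root of a NEW tree that contains key.
    // Only the nodes on the path from the root to the insertion point are
    // copied; every subtree off that path is shared with the old tree.
    static Node insert(Node root, int key) {
        if (root == null) {
            return new Node(key, null, null);
        }
        if (key < root.key) {
            return new Node(root.key, insert(root.left, key), root.right);
        }
        if (key > root.key) {
            return new Node(root.key, root.left, insert(root.right, key));
        }
        return root; // key already present, nothing to copy
    }

    static boolean contains(Node root, int key) {
        Node current = root;
        while (current != null) {
            if (key == current.key) {
                return true;
            }
            current = key < current.key ? current.left : current.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node past = insert(insert(insert(null, 50), 30), 70);
        Node next = insert(past, 60);

        System.out.println(contains(past, 60));     // false: the old version is untouched
        System.out.println(contains(next, 60));     // true: the new version has the key
        System.out.println(past.left == next.left); // true: the untouched subtree is shared
    }
}
```

Both `past` and `next` remain usable after the update, and a reader holding `past` never needs a lock because nothing it can reach is ever modified.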
moving forward as we try to make programs that we can parallelize we have to stop writing loops right I think everybody understands it's a whole0:42:43
separate talk that you know we're going to get our future performance gains from parallelization right which means we're going to have to write more declarative programs and those parallel those0:42:55
declarative programs are going to need to be able to take data structures and do parallel transformations on them and produce new data structures and we want to stick with this model we want them to be persistent right so how do these0:43:06
persistent data structures serve that purpose very well it ends up because they're already divide and conquer I mean half half the work is already done0:43:16
they're sitting there divided okay they're pre partitioned in addition if you if you do it right you also have the ability to construct them in a0:43:25
compositional way without any collisions so you can avoid synchronization in the building of the of the new versions so they're they're they're pretty well set0:43:34
up for doing parallel algorithms I think persistent data structures should be the default data structure I wish there was a language where0:43:43
persistent data structures were the default data structure okay so I mean I'm not gonna lie to you0:43:52
right we everybody's like performance models everything else they're slower they are slower especially for serial use and especially for writing for0:44:03
reading you will be very much surprised at how good the performance can be some of the good performance I see I completely do not understand but it's0:44:13
there reading is actually pretty solid writing though is a problem you have that path copy and everything else but you know I am NOT a fundamentalist I'm a pragmatist0:44:23
so if there's this path right and if no one can ever see what happens there right in other words if it's gonna take0:44:32
something immutable and it's going to produce something immutable and those are two discrete instances of time and everything else about this is atomic then nobody cares what happens inside F0:44:41
okay you probably do care if it's a big involved thing you probably still want to do it with pure functions for sanity preservation reasons but for this time0:44:51
modeling reason you can do whatever you want okay which means that when you're bringing you know when you're birthing the next version of a persistent data0:45:00
structure you can do the same old good stuff you know how to do right you can allocate an array and you can bash on it because no one has yet seen that array0:45:09
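The point about bashing on an array that no one has yet seen can be sketched as follows. This is a minimal illustration, not the talk's implementation: `LocalMutationDemo` and `doubled` are hypothetical names, and the input list is assumed to be treated as an immutable value by its owner.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch only: a function that is pure from the outside but mutates privately inside.
final class LocalMutationDemo {

    // Takes a value (a list the caller treats as immutable) and produces a new
    // immutable value. The scratch array is mutated freely, which is safe
    // because no other code can see it until the function returns.
    static List<Integer> doubled(List<Integer> past) {
        Integer[] scratch = new Integer[past.size()];
        for (int i = 0; i < scratch.length; i++) {
            scratch[i] = past.get(i) * 2; // in-place writes to private memory
        }
        // The array reference never escapes, so the read-only view is effectively immutable.
        return Collections.unmodifiableList(Arrays.asList(scratch));
    }

    public static void main(String[] args) {
        List<Integer> past = Arrays.asList(1, 2, 3);
        List<Integer> next = doubled(past);
        System.out.println(past); // [1, 2, 3]
        System.out.println(next); // [2, 4, 6]
    }
}
```

Observers only ever see the value before the call or the value after it, which is all the time model requires of the function.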
you can use fork/join but it works great and these things will eventually bridge0:45:18
the gap already on you know my quad core a parallel version of map on a persistent vector is as fast as the loop0:45:28
that bangs on an ArrayList same speed so more cores we start winning because we0:45:37
have all these other great benefits no synchronization required for this persistent data structure share it all you want rest easy comes with all those benefits that ArrayList doesn't the other thing0:45:50
that's possible is you can make what I call transient versions of these persistent data structures and that's something I've been working on recently which have nearly the same speed of the0:46:00
good old data structures you're using and in particular support constant time creation from a persistent data structure and constant time restoration0:46:09
as a persistent data structure and can be made safe they're like 90% as fast as a mutable thing so obviously this is0:46:20
something you should care about on the other hand I wouldn't deny the power of this model because you're afraid of this0:46:29
okay so that's about values now let's look at the time model again so remember what we're talking about when we talk about time okay so we just said we now0:46:39
know what the V's are they're going to be these persistent data structures right so how do we make sure that there's you know only one blue arrow train for any particular0:46:49
identity how do we coordinate time again remember identity is as a as a side effect right we see that later0:46:58
same thing with time we see that later but it's convenient to us when we're trying to model in a program to pretend we're driving it forward so what is the0:47:09
time construct do its main job is to make sure that you have atomic succession of values okay that's its0:47:18
main purpose that we go from one value to another incorruptibly and that there's no in-between right that's what epochal means right0:47:30
the other thing that time construct has got to do is it's got to provide some way for us to see the identity to see the thing that it's managing has to provide visibility and again has to do0:47:40
that atomically right because what really happens is the baseball game right and there's photons and then there's a point in time and the photons are the same place the baseball game is0:47:49
and then they go their own separate ways again right so there was a moment there where those things could connect to each other but what was represented was that0:48:00
snapshot again a value at a point in time so they need to provide that we want to have multiple timelines again this whole I am the program I control0:48:09
the universe you know I'm stopping everything I'm the only thing that isn't working anymore we need to have lots of threads of control which means we want multiple0:48:18
timelines the nice thing about this whole thing is that there's no inherent semantics to this other than you know0:48:27
complying with these couple points there's a variety of different semantics you can apply you can use CAS which is essentially saying there's one timeline per identity right and it's0:48:37
uncoordinated it's impossible to coordinate two things that are using CAS timelines but CAS timelines are still useful they have semantics you can understand there are agents or actor systems which are0:48:48
also one-to-one there's one timeline per entity so they can't be coordinated but they're asynchronous so that they're not connected to the timeline of the person0:49:00
enacting the event there are things like STM which allow you to coordinate timelines and maybe even0:49:11
there can be new constructs based around locks right because you can look at locks as saying well you know that's the way to enforce timelines it definitely is a way to enforce timelines if you have a0:49:20
way to automate that and package it up into one of these time constructs that's great and you should the difference between them and STM would likely be that they have fixed regions as opposed0:49:30
to STM which has arbitrary regions but if you said well you know all these timelines are really timeline X and X is to be represented by a lock if you have0:49:39
some sort of time construct that ensures lock acquisition order you can play this game so let's look at CAS as a time construct because it's the0:49:48
easiest possible thing so you have some CAS-like thingy like AtomicReference right that's going to store your timeline right there's no history in it0:49:57
which means essentially that each successive value will replace the other but what we care about is that there is a timeline in this case the reference represents one identity the thing that's0:50:06
in it that's always going to be an immutable value right CAS ensures atomic state succession right if two processes decide I'm0:50:16
going to move value two forward I am the process that's going to do that only one of them can succeed right so this red line that will be prevented by CAS right0:50:26
and you can wrap up the logic associated with doing that correctly with CAS right which is that spinning thing and just package it in the0:50:36
construct so it could look something like this right swap some CAS-based reference using this function which will be the function0:50:46
of the past maybe plus some extra information the args and and what happens in any time construct is this latter point here you're gonna call the function on the0:50:57
current state also pass the args if you want and that will become right that's what the construct does it allows that to become the next value0:51:06
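The swap operation described here can be sketched on top of `java.util.concurrent.atomic.AtomicReference`. This is a minimal sketch and not the speaker's actual code: `CasSwapDemo`, `swap` and `countConcurrently` are hypothetical names, and the function passed in is assumed to be pure, since it may be called more than once when the spin retries.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiFunction;

// Sketch only: a CAS-based time construct with one timeline per identity.
final class CasSwapDemo {

    // Applies f to the current value (plus an extra argument) and tries to
    // install the result as the next value. If another thread moved the
    // timeline forward in the meantime, compareAndSet fails and we spin:
    // re-read the current value and recompute from it.
    static <T, A> T swap(AtomicReference<T> ref, BiFunction<T, A, T> f, A arg) {
        while (true) {
            T current = ref.get();          // atomic point-in-time read
            T next = f.apply(current, arg); // pure function of the past
            if (ref.compareAndSet(current, next)) {
                return next;                // we created the next state
            }
        }
    }

    // Demonstration helper: several threads advance the same identity.
    static int countConcurrently(int threads, int perThread) {
        AtomicReference<Integer> counter = new AtomicReference<>(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    swap(counter, (old, delta) -> old + delta, 1);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000)); // 4000: no update is lost
    }
}
```

Java's own `AtomicReference.updateAndGet` packages a similar retry loop for single-argument functions; the hand-written loop is shown only to make the spin visible.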
time is derived from that identity is derived from that but that's what's really happening so it looks like that and again under the hood we can automate0:51:15
the spin the other thing you know atomic references allow is the ability to atomically look at what's inside of it and as long as what's inside of it is a0:51:24
value we have good point in time perception I don't want to spend too much time on agents they're a lot like CAS except that there's no longer a0:51:34
coordination in CAS when somebody's calling this function when someone's saying swap there are actually two timelines right there's a timeline0:51:44
of the identity they're trying to manipulate and there's the caller they have their own time line right those two timelines meet at swap with an actor or0:51:55
an agent system they don't meet right you initiate some energy force and it flows out towards that thing and you0:52:04
walk away and eventually that energy force hits that thing and whatever the result is is the result and the thing changes right so there's now an0:52:13
asynchrony between the caller's timeline and the timeline of the identity but otherwise it's still doing all the same work it's a one-to-one relationship between the timeline and identity right0:52:22
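A rough sketch of the agent idea in Java, assuming a single-threaded executor as the queue. The names `Agent`, `send`, `deref` and `shutdownAndAwait` are hypothetical and chosen for illustration; the last one exists only so the demo can wait for queued actions. The caller hands over a function and walks away, and one worker thread applies the queued functions in order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Hypothetical agent: one identity, one queue, one worker thread.
// All names here are illustrative assumptions.
final class Agent<T> {
    // volatile so that readers on other threads see the latest value
    private volatile T state;
    // a single worker thread draining a queue of actions
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    Agent(T initial) {
        state = initial;
    }

    // Fire and forget: the caller's timeline does not wait for the change.
    void send(UnaryOperator<T> f) {
        queue.execute(() -> state = f.apply(state));
    }

    // Perception: read the current value without joining the queue.
    T deref() {
        return state;
    }

    // Demo helper: stop accepting actions and wait for the queue to drain.
    boolean shutdownAndAwait(long millis) {
        queue.shutdown();
        try {
            return queue.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

final class AgentDemo {
    static int run() {
        Agent<Integer> agent = new Agent<>(0);
        for (int i = 0; i < 100; i++) {
            agent.send(x -> x + 1); // returns immediately
        }
        if (!agent.shutdownAndAwait(5000)) {
            throw new IllegalStateException("queue did not drain in time");
        }
        return agent.deref();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 100
    }
}
```

Unlike the CAS construct, `send` never retries the function, because only the single worker thread ever writes the state.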
atomic state succession falls out of two things the succession falls out of the fact that everything's being put in a queue and the atomic falls out of the fact that there's only one reader and0:52:33
they can also provide point in time value perception the reason why I called these things agents and not actors actors typically do not in fact they0:52:42
definitely do not but in an in process model I think perception should always be supported alright so what happens when0:52:51
you need to coordinate two things or more than two things these things these CASes and other things are not going to work right because you can't coordinate them so you need something else0:53:01
one possible other thing it's probably not the only one is software transactional memory right or any kind of transactional thing which0:53:11
allows you to coordinate the activities of multiple arbitrary regions so multiple timelines we're going to say okay this this action um I'm invoking is0:53:23
going to affect three things which means somehow their timelines have to have to meet they have transactional0:53:32
capabilities which are not really interesting for this but the most important thing is we're not walking away from the epochal time model this is still the epochal time model for any0:53:41
value that's going to participate in an STM transaction it's still the same thing you're going to have some function on the past produce the future a pure0:53:52
function and values in and out so what does this look like now there's multiple identities right potentially or places0:54:01
or whatever you know whatever whatever contract is meaningful to your program you still have that and any particular transaction is going to you know take an0:54:10
arbitrary set of these and atomically do that function transformation so it's a way of collecting a bunch of little micro processes and making them one0:54:20
process internally each one works exactly the same way as before I just didn't put all those arrows because it would be unworkable and the set of0:54:30
transactions themselves feels like a like a timeline in particular you know if blue and yellow don't overlap they0:54:39
technically happen at the same time really they happen at times there's no time right because there's no succession between those two things there is no0:54:48
time you'd have to superimpose it because we said time only is a derived concept from one thing happening after another so if they're unrelated it0:54:57
really gets messy about time that's physicists will so I left perception out of this because0:55:07
this is too messy so what's the perception story for STM can we look at it can we look at the whole0:55:16
stadium at one time can we glance and see multiple entities and it ends up that you can build systems that do that in particular an STM that uses0:55:27
multi-version concurrency control can do it and I'll explain that more later but I just want to show it to you first right essentially what happens is there0:55:36
can be perceivers what's really important about this diagram is they're still not in the timeline there's never been somebody perceiving who got up into0:55:46
this box right perception does not interfere with process you cannot munge those two things together so relative to0:55:58
these atomic events that affect more than one thing any perception is either going to occur completely after one or0:56:07
completely before it if it's transactional itself all right that's what STM's provide you can still do a0:56:16
non transactional scan you can you can pan you can look at this part of the stadium and then go over here and look at that okay or you could you know you can look at a car in the road and you0:56:26
can look up at the clouds and you see the red car here and look over the cloud you look over here and you see the red car right but you know when you're doing that what that may be the same red car0:56:35
now you realize when you're panning like that you're not seeing a point in time but you have the choice yes0:56:50
now we'll have to save that now they0:57:04
don't need to agree they don't I mean but you you can you could have multiple STM's right this STM sort of constitutes a little universe okay so we have0:57:17
transactional viewing which is like glimpsing we have non-transactional viewing which is like scanning so one0:57:26
way to do this is using multi-version concurrency control right this is the same all this all the stuff is old alright this is the same old stuff from0:57:35
databases but so multi-version concurrency control means that you're keeping some history in order to satisfy readers that's the that's the database0:57:44
thing but but there's a way to think about it in this model as well right that very critically one attribute the key attribute of multi-version0:57:53
concurrency control is that readers don't impede writers that perception doesn't impede process that's huge I think you cannot do without that0:58:03
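The property that perception doesn't impede process can be illustrated with a deliberately simplified Java sketch. This is not an STM and not real multi-version concurrency control: it assumes a single immutable map of all identities behind one atomic reference, so every transaction shares one timeline, whereas an MVCC-based STM would keep history per reference and let unrelated transactions proceed independently. The names `SnapshotWorld`, `snapshot`, `transact` and `transfer` are hypothetical. What it does show is that a reader holding an old snapshot keeps a consistent view while a writer moves on.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical sketch: many identities stored in one immutable map value.
final class SnapshotWorld {
    private final AtomicReference<Map<String, Integer>> root;

    SnapshotWorld(Map<String, Integer> initial) {
        root = new AtomicReference<>(freeze(initial));
    }

    private static Map<String, Integer> freeze(Map<String, Integer> m) {
        return Collections.unmodifiableMap(new HashMap<>(m));
    }

    // Glimpse: one immutable value covering every identity at one point in
    // time. Taking it never blocks or delays a writer.
    Map<String, Integer> snapshot() {
        return root.get();
    }

    // Atomically replace the whole world with f(world), retrying on conflict,
    // so f should be pure.
    void transact(UnaryOperator<Map<String, Integer>> f) {
        while (true) {
            Map<String, Integer> before = root.get();
            Map<String, Integer> after = freeze(f.apply(before));
            if (root.compareAndSet(before, after)) {
                return;
            }
        }
    }

    // Pure function from one world value to the next: move amount between two ids.
    static Map<String, Integer> transfer(Map<String, Integer> m,
                                         String from, String to, int amount) {
        Map<String, Integer> next = new HashMap<>(m);
        next.merge(from, -amount, Integer::sum);
        next.merge(to, amount, Integer::sum);
        return next;
    }
}

final class SnapshotDemo {
    static boolean run() {
        Map<String, Integer> init = new HashMap<>();
        init.put("a", 100);
        init.put("b", 0);
        SnapshotWorld world = new SnapshotWorld(init);

        Map<String, Integer> before = world.snapshot(); // reader's glimpse
        world.transact(m -> SnapshotWorld.transfer(m, "a", "b", 30));
        Map<String, Integer> after = world.snapshot();

        // The old glimpse is untouched; the new one reflects the transaction.
        return before.get("a") == 100 && before.get("b") == 0
            && after.get("a") == 70 && after.get("b") == 30;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints true
    }
}
```

Reading `"a"` and `"b"` through two separate `snapshot()` calls would be the scanning case from the talk: each read is valid, but nothing says they belong to the same point in time.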
everything about everything I've shown you before if you stick perceivers in the middle of those timelines your life is gonna get way more complicated stop the baseball game so we we don't want to0:58:15
do that and no and so the you know in this modeling you know let's pretend we're we're we're doing the real world in our programs you could say that0:58:24
multi-version concurrency control models light propagation or sensory delay right that whole chain that if if somehow the light bouncing off the baseball game is0:58:33
capturing its value you know good enough for us that transmission delay means that you know that value has got to be somewhere while0:58:43
it's being transmitted while process keeps going game keeps going so we do this by keeping some history quite interestingly and fortuitously0:58:53
persistent data structures make keeping that history cheap there's something profound about that that I don't understand the other cool thing0:59:05
about multi-version concurrency control is it allows readers to have their own notion of a timeline I saw this then and then later I saw that that is really0:59:17
important to some decision-making in fact when our brain reconstructs behavior that's exactly what it does we just looked at the sensory system it's0:59:26
discretizing and snapshotting everything okay well but we definitely have percepts for you know the lion is running towards me we're gonna go running we have to re-derive0:59:36
that running and we do that by a mental process that somehow allows us to compare a snapshot from before to a0:59:47
snapshot we know is later right and see the deltas of that and say running lion in addition again we know the0:59:57
difference between the visual scan we know when we've looked at something and we and we've carelessly looked at something else over here we are not allowed to correlate those things and say they happen at the same time so you1:00:09
know this is really not a talk about STM but I do think that the one takeaway I'd really like you to have is that STM's are different from each other there's no one STM if you want to beat up on STM1:00:20
you know pick one pick its attributes and you know find out what's wrong with it because there there are some that I think really get time wrong still get1:00:31
time wrong right so if you don't have multi-version concurrency control you either are gonna be limited to scans non temporally related I looked at this1:00:41
thing I looked at that thing I have no idea I have no ability to look at two things at once or you're gonna have some right you're gonna be back to wait wait stop the process so I1:00:53
can get my perception in the middle of it right without multi-version concurrency control that's where you are the other thing about STM's I think is1:01:02
super super critical is that granularity matters okay if you're using an STM that either that forces you or you're incorrectly using1:01:12
an STM and you find yourself requiring a transaction in order to see a consistent value you have got time wrong again okay so STM's that require1:01:24
transaction to read you know four fields of an object consistently are not doing time right they're not really solving1:01:33
this problem okay so in conclusion sometimes I mean all the time I think if1:01:43
you're suffering from excessive complexity you got to think about you know changing something and sometimes people actually do change right we moved from languages that weren't garbage collected1:01:52
substantially to ones that are because that reduces our incidental complexity there's no other reason okay but in the1:02:02
current state of the art object-oriented languages this conflation of behavior and time and identity and state is just making our lives much much harder and1:02:12
it's going to get worse okay we need to become explicit about time in our programs we really need to pay attention1:02:21
to the functional programming people who are saying you know look at all these great properties of pure functions they are there they definitely you know1:02:30
satisfy Whitehead's you know we move forward by taking away the need to understand the insides of things okay I believe1:02:40
that this epochal time model is worth trying I think it's a general model it supports multiple implementation ideas and it1:02:50
will work in the local process I'm not here talking about distributed computing at all the other thing I can tell you is that the current infrastructures that we have are1:02:59
sufficient for experimenting with this for doing the implementation so you know so what's still unresolved here well coordinating this1:03:09
internal time with the external world it's going to become an important thing and it's a hard problem and you know tying STM transactions to IO you know1:03:20
transactional IO would be a very interesting and I think possible thing again overall though you want to move away from transactionality1:03:30
transactionality is you know control control of control you want to become as happy as you can with a lack of control that will give you more concurrency1:03:39
there always could be more parallelism and the more performance and you know there definitely could be more work done on parallelism inside these data1:03:48
structures there are more data structures to be done I'm sure in better versions there are definitely going to be other time constructs I think doing1:03:58
moving locking under this model is extremely interesting because locking has some particular efficiencies that we'd want to leverage and there's a way1:04:07
to understand it in terms of this and I'll leave as an open question for everybody else you know is there a way to reconcile this with object orientation it could we separate1:04:17
perception of an object from its identity enough so that we'd still get the benefits of objects but we don't get a mess later1:04:37
it's not for questions or I know where right away okay any questions yes well I1:04:59
don't know what those talks were but you know from what I've seen functional programming you know in many respects just tries to get time out of the way1:05:08
get time out of the way well that's not what this is this is about time as an important part of programs you have to contend with I'm not advocating purely1:05:17
functional programming here at all I'm saying you know there are programs there are programs that are done on one of those boxes right one transition from value to another right that kind of1:05:26
program is a calculator right most programs have to deal with that that progression of time and that's a hard problem so I'm not trying to walk away1:05:35
from it I'm trying to walk towards it1:06:10
yeah well I mean read Whitehead he really did that he really did exactly what you're saying I'm just trying to to program without going crazy but I but I1:06:20
am inspired by am inspired definitely by by that by those notions that I think they're really important for this for fixing the model we have yes yes yes I1:06:46
mean I think I think there's lots of cool things I think that's interesting I think the whole wave you know what's happening in wave right now those operational transforms are a really interesting way to to think about this I1:06:56
mean there's a lot more to this for instance the the composability of transformations and things like that that are very interesting but yes it is1:07:29
I mean the communications part gets tricky I think I like the fact that this is sort of communication free but maybe1:07:38
time constructs different form of communication I know there's different they may be well it is sort of in the1:07:50
flow of time so that's that is definitely in the coordination aspect that's communicating to people you next now you know you yes1:08:07
I'm having a good time right now I don't1:08:17
the garbage collection pressure of this is going to be you know significant so just keep making everything you have faster that's for ten times that's1:08:37
pretend time did you know I don't want to characterize functional reactive programming I will say this I worked in broadcast automation systems for a long time I read that book I saw1:08:49
absolutely no correlation between that and what I actually had to do in the real world at all right turning no but fabricating time and turning it into your own arguments of1:08:59
functions that's that now you're punting again you're pretending right that's not time that's again punting and soon as1:09:10
you connect it to the outside world you'll see that that's the case1:09:23
move back you can store two tables or aggregation as long as all your data is1:09:34
immutable I think you're on the right track you know I think the key takeaway here is there's no such thing as a mutable object you know if you if you can really believe that you can build1:09:43
better systems which is not which again is not to disagree I mean obviously I like functional programming right but I1:09:52
don't see them talking about a lot of the problems that I think real people have Thanks1:10:01
[Applause]0:00:00
Design, Composition, and Performance Short - Rich Hickey
0:00:00
thanks for having me it's nice to see everybody here bright-eyed after a night0:00:09
of heavy thinking this is the obligatory legal disclaimer okay so we're gonna0:00:22
talk about design and of course as you all know by now I don't actually write talks I just look up stuff in the dictionary and so I looked it up in the0:00:31
dictionary I saw this great definition prepare the plans for a work to be executed especially a plan to plan the form and the structure of that thing so0:00:41
this is there's a lot of really interesting stuff in this definition the fact that there's a plan the fact that upon having the plan something's going to be done and the plan is going to be0:00:51
executed and that's about form and structure that's really good there are other definitions they are one of which is this there's nothing wrong with this definition it's just this is not what0:01:00
I'm talking about today we're not talking about how things look at no point today when I say design am i using this meaning of the word design and the0:01:09
root goes back to designare which means to mark out and it's also a very interesting thing it has to do sort of with writing but also with sort0:01:18
of demarcating right to designate or to sort of set something aside but you know most simply we can just say it's about0:01:28
making plans and and also writing them down unfortunately I think that you know with all our effort to become agile0:01:37
we're letting design become less of what we do and so there are a lot of buts when people say oh design you0:01:46
know it's like you know we already write down code do we need to write down design you know do we still need to do designs because obviously it makes sense0:01:55
if you're going to build a house right you can't just say let's just go build the house you have to write something down so that somebody can go build the house because the realization of the plan has a different form than the plan0:02:04
right the plan is something written down but the realization is made of wood and nails and things like that but code's already written down so we have something to read you know do we still0:02:14
need designs or can we just generate them right can we just write our program and then generate some documentation from the implementation and the answer is no0:02:23
that's that's not a plan it would be something written down but it's not it's not a plan the other argument you get against design is oh my god you know I've lived0:02:33
in the 80s and design stunk you know there's people who thought they could just do everything they were do it's all top-down monolithic designs these giant plans you know etched in foam books that0:02:45
never came true and it's true that those were plans but they're those are not good plans and that doesn't mean planning is bad it means that that that style of planning is it's not not good0:02:54
so what do we mean when we say what do I mean when we when I say good design I think one of the most interesting things about design is that you know people think it's it's generating this0:03:05
intricate plan but designing is not that designing is fundamentally about taking things apart it's about taking things apart in such a way that they can be put0:03:16
back together if that makes sense so separating things into things that can be composed that's what design is if0:03:27
you just if you just make this intricate thing you really haven't designed in a way that's going to support the things designs need to support like change0:03:36
everything every component of a design should be kind of about one or very few things sort of the nature of it breaking things down until they are nearly atomic0:03:47
and only then do you take those things that you've broken apart and compose them to solve the problem you set out to0:03:56
solve but the first job is to take things apart in such a way that you can do that and and a good design process is iterative this isn't a grand plan you do0:04:06
once and then you go and finish obviously there are some kinds of design that you do have to sort of etch in stone because you're going to you know build a fabrication plant and have few0:04:15
options later but in software we know the materials we're working with are so malleable we can get some cycles back to iterate so when I say taking0:04:26
things apart what kinds of things can we take apart we can take apart all things in this list and I'm gonna break them down one by one so what does it0:04:35
mean to take apart requirements this is actually quite important obviously there's a job another job also sort of gone wanting these days called analyst0:04:44
there used to be analysts and designers and you know this whole water Waterfall model but that but there are requirements and usually we get these0:04:53
now directly from customers and they often take the form of I want this I need that I want I want I need I want I want and the first job we have to do is0:05:02
decompose those wants and needs into problems because obviously underneath all those wants and needs are some problems the the customer wants to solve0:05:11
and understanding those problems is the key to designing something that solves the problems because oftentimes the design that solves the problems is not what the customer said they wanted or0:05:20
needed right we separate requirements into knowns and unknowns from you know we know how to do some part of this job and we don't know how to do this other0:05:29
part that's quite important there are problems you know now again we're dealing with problems we've already sorted things so there are problems that are on the domain side might possibly0:05:39
some domain experts going to need to help us solve them and other problems that are on the solution side how will it scale where we run it how much will it cost to operate how much energy will0:05:48
use things like that the other thing that we have to take apart are the difference between causes and symptoms so sometimes a customer will say I have0:05:57
this problem right the problem is you know my screen is black and that's not actually the problem you know that's not the problem the problem is that's a0:06:07
symptom of a problem the problem somewhere underneath it so another thing you have to do when you're when you take things apart is take apart causes and root causes from symptoms because you0:06:17
want to get to causes because the thing your design needs to address are the causes of problems not the symptoms because you could just throw up you know and add a JPEG over that black screen0:06:28
and you're like it's fixed the other requirements are the unstated requirements which are always present there are problems that nobody wants to have in the future0:06:38
like I want I don't want the system to be something I can't maintain I don't want it to run out of memory I don't want it to run really slowly they often don't even say that these are problems0:06:47
because they're just problems you know their future problems they don't want to have but they end up being part of the requirements set other things we can0:06:56
take apart we can take apart time and order and flow the use of queues the use of idempotency commutation and transactions are all ways in which we can0:07:07
separate apart when things happen and often these are the paths to separating apart who does these things and these terms are becoming more and more0:07:18
important you'll see them more and more in systems level design right commutativity is going to become a huge thing for us to be thinking about and then there are0:07:27
times when we really need to know that a bunch of things are going to happen together and transactions help keep things separate by keeping things that are supposed to be together together we0:07:37
take apart places and participants where things are going to happen and what components or processes are going to do them right this classic quote you know0:07:46
just add some indirection and there is a lot of that a lot of design is just putting in appropriate levels of indirection but there this kind of indirection happens at all all levels0:07:55
for instance part of a design incorporates how it's going to be built you know is it possible for more than one person or more than one0:08:05
team or people working in more than one language to build this system all together or you know does everybody have to work in the same space at the same time with the same tool then your0:08:14
process for building it isn't going to scale because you haven't taken that apart it's another kind of thing you take apart this next one I think is really super critical and I think we0:08:25
don't understand it or think about it enough in our software which is the difference between the information our system is going to manipulate and the0:08:35
mechanisms by which we're going to manipulate it right so to just talk about it simply here we'll say the set of logged in users of my system is an idea and it would be0:08:46
information that my system's gonna need to manage but if I have a set class or some sort of set construct in my programming language that's a mechanism0:08:55
by which I might achieve representing that information but unfortunately because we only have you know our programming language and our programming0:09:04
language constructs to represent both these things we often conflate things that might be appropriate as mechanisms as being appropriate for information and0:09:13
they are desperately not so so we have a lot especially in an object-oriented language you have a lot of very mechanical kinds of classes and when you0:09:22
use them to represent information like for instance any kind of mutable information object is an absolutely atrocious idea it's really really bad0:09:32
and it comes out of the fact that we're not separating these two things when we're doing our designs finally after we've gotten something we think might be an answer we need to take that apart so0:09:43
I have I have maybe one or more possible designs that I think address the problems that that I'm trying to take on at this point I have to look at each of0:09:52
those solutions and take them apart from a bunch of perspectives right what benefits do they provide and this I think we have no problem with everybody looks at the library the library's like it0:10:01
does this it does that you know it's a floor wax it's a dessert topping you know it's all benefits it's all benefits it's very rare that you see somebody say and here's the trade-offs of using this0:10:10
here's what's not good about it here's where I decided to do X and I'm not going to be able to do Y and being honest about0:10:20
that to ourselves about our own designs is really important it also helps us communicate to stakeholders right you are going to get this and you're not going to get that okay there's nothing0:10:30
wrong with that we have to see what the costs are and the other thing we have to do is determine problem fit sometimes you can take on this big solution you know part of which solves your problem0:10:39
but you've taken on this big thing you know and only part of it really addresses your problem do you want that whole big thing is it really a fit is there a smaller0:10:48
solution or answer that's a that's a closer fit because both may solve the problem you can have a set of choices here so I think design is really important I0:10:58
mean that's what I do and and I know a lot of other people do it and I and it but I think it's important to enumerate why it's it's good to spend time on this0:11:08
a design helps you understand a system right without a design you're sort of flailing around wondering why is this this way a design helps us coordinate0:11:17
that becomes obvious as you get into you know having teams you can't build two things and never have talked to each other if you don't have an agreed-upon plan written down0:11:26
somewhere it's unlikely that your two things are going to plug in together and work you know it's like finding a napkin what we wrote it on design helps0:11:37
extension right to the extent that you've broken things into separate parts with an eye towards connecting them back together it means that your resulting design is going to have0:11:46
connection points and when you want your your system to do something new you'll it would be possible to do it because there's something there that's why0:11:55
design is not just about you know accreting up to an answer because when you do that you don't end up with any connecting points you don't end up with any building blocks and you can't really0:12:04
extend that thing similarly the flipside of that is to the extent you've broken your problem down into reusable parts and compose them those parts may be0:12:13
separable from your design and useful in another context right and that's how we get reuse reuse comes from design it doesn't come from language constructs or0:12:22
anything like that of course everyone does design driven testing right because that's the right kind of testing if you have designs and0:12:33
they specify things well and you have some automated way to go from that specification to a test that's that's good testing everything else is backwards and the last thing I think0:12:44
that one of the main reasons people say oh I don't I can't do design is I don't have time I don't have time to do design but but I will make the argument that design is the key to more efficiency0:12:55
because it's a lot easier to iterate a design than it is to iterate a solution or an implementation and even after0:13:04
you've got an implementation right because I said it's iterative so you're gonna have some form of iteration after you've done this kind0:13:13
of design and you're iterating and you need to do more design that design often takes the form of ah you know what this part of the system I actually didn't break down enough I0:13:22
need to break it down more I need to take this thing and split it in half and that's where I find most often when I have more more to do in an iterative0:13:31
design process it's almost always because I look at something and say that's still doing more than one thing I just need to cut it in half and it ends up that that kind of further0:13:41
decomposition is really easy to integrate in an ongoing process in in further development so it is possible to do in you think about design is taking0:13:52
things apart you can always take something more apart it's great okay so this talk is going to be about design composition and performance and I know0:14:02
you know when you think about composition and performance you think about Bartok and Coltrane I know I do so this is the great thing0:14:13
about the dictionary is that you know there's the first definition and then there's like more definition and so there are more definitions obviously and more notions of the word composition0:14:23
performance and this is near and dear to me because as the intro said this is my background how many people don't know who Bartok and Coltrane are0:14:33
it's okay if you don't so Bartok was a Hungarian composer who sort of bridged the Romantic era and the modern era and a phenomenal0:14:43
composer who had a great interest in the folk music of Hungary and and really advanced I think still to this day0:14:52
modern composition in very interesting ways and John Coltrane and and by the way was also a pianist and performer0:15:01
right he in and teacher of piano John Coltrane is you know possibly one of the best improvisers humanity has ever0:15:11
produced a fantastic saxophone player played fundamental you know mostly tenor and here soprano sax he was also a0:15:21
composer and and wrote very interesting tunes and advanced jazz harmony in those in those tunes but he's going to play the role of a performer today so we have Bartok the0:15:31
composer and Coltrane the performer so when we talk about composition and in particular music composition you know what are we talking about now we're into the arts right and it's quite0:15:41
interesting right because the arts usually are not about solving real world problems at all but if you look at any of the art forms where there's somebody0:15:51
like a composer it could be a choreographer or anything like that the first thing these people do they have they have blank slate they have like blank staff of music0:16:00
blank page empty stage the first thing they do is they make problems for themselves they set up a set of constraints under which they're going to0:16:09
form an artistic work it's quite interesting that they create their own problems to solve from there and this0:16:20
happens again for all these different forms and from there they now have a design problem they're like oh great I just made myself a design problem now I can go and design now I can go try to solve this problem solve0:16:31
those constraints I've set up for myself and make a plan which will be realized by performers so when we look at music0:16:42
composition it's quite interesting that there's a variety of specificity and scale to the designs that are created often you'll see fully0:16:51
orchestrated pieces so classical music is typically this way where every note that's going to be played by every instrument is specified0:17:00
in a score and what's particularly interesting I think about music composition here is that this specificity is much more common the0:17:09
larger the scale so they're taking on bigger problems and the bigger the problem they take on the more they specify it's kind of unusual I think0:17:18
the other kinds of compositions you see are like songs and we'll take the lyrics out of the picture and just say melody and changes and here you're giving the0:17:28
suggestion of what the piece is going to be but the specifics are left out this kind of design just says here's the melody here's the harmony and everything else is gonna be left to0:17:37
the performers they have a lot more latitude and with that a lot more responsibility I think one of the things that's interesting is that software straddles these two worlds we0:17:48
tend when we work on the smallest part of the system to have the most specificity in our designs and when we're working at the largest scale in our systems we have the least specificity but I think that one of the0:17:58
things that programmers are afraid of are these large scale designs that are going to completely tell them what to do they're like oh you know don't0:18:08
repress me man I don't want to see this and certainly that was well-founded right in the 80s people thought you know people would you know make pictures and push buttons and get COBOL to come out0:18:18
and that would be what software was going to become so programmers were legitimately skeptical if not afraid of that so here we have two examples of0:18:28
what I'm talking about the piece on the left is a segment of Bartok's Concerto for Orchestra and you can't see the details but you know there's all the woodwind parts and then the0:18:37
percussion and the string parts all completely specified the phrasing you know the tempo everything is0:18:48
completely notated although not totally completely because you see this has been written on and it's been written on by the one person who has some latitude in this kind of an approach which is the0:18:58
conductor and of course the conductor struggles with some things that might not have been said like they have to translate the Italian tempo markings into you know actual beats per minute0:19:08
and there's lots of contention about how to do that on the right we have a completely different kind of composition this is My Favorite Things it's Richard Rodgers you know from The Sound0:19:17
of Music that's a very pretty song but you can see here you know just roughly that very little is specified there's a melody and then some chord changes and0:19:27
John Coltrane quite famously did a rendition of this piece which is absolutely gorgeous and if you've never heard it you need to hear it before you0:19:36
die yes it's great really really great but you see the difference in how much is specified so we've talked a little bit about constraints already in design0:19:46
and again the same kind of thing happens here in music composition most compositions are about something and this is true also most dances are about something and0:19:55
most plays are about something and most screenplays are about something or a few things so these problems that the composers and artists set up0:20:08
for themselves are normally pretty simple they try to focus on one or a few ideas and in music that would be melodic or timbral or rhythmic kinds of0:20:18
ideas and they're gonna take this fundamental idea and work it out resolve it try to see what it's about try to explore what it means0:20:28
and when you get to larger scales like when Bartok does these larger compositions you end up with this set of constraints at each level right larger0:20:39
works just have more structural components but they're very stratified so he has all kinds of techniques for dealing with harmony in the small and0:20:48
form in the large on the flip side now we have the performer's space improvisation and the root there you know takes us to not foreseen or not0:20:58
provided so it hasn't all been written down in advance right it's not completely specified there's this melody and these changes and they provide0:21:08
constraints for a performer who has to provide the variations on the fly the thing I think that's really important here is people think that some people0:21:18
think that improvisation is you know some genius just spontaneously emoting it is not that it is not the way it works the best improvisers practice0:21:29
the most and John Coltrane is a great example of somebody who practiced an amazing amount of time and studied quite extensively and what you end up hearing0:21:40
when you hear an improvisation is an application of a lot of knowledge and a tremendous amount of vocabulary it's0:21:49
almost like dynamic composition that's happening in these improvisations and you can tell like0:21:58
now there are a lot of releases that include outtakes so you go and you listen to this thing and you listen to the first track and it's like I've been listening to this for years it was amazing he came up with this and0:22:08
then you listen to the outtakes and there's like five takes with very very similar solos I mean Coltrane was working out this composition he was going to perform0:22:17
dynamically but the resources behind it were things that he had prepared he didn't just like make it up as he went along0:22:26
so it's this delicate balance of being able to be dynamic but having compositional sensibilities to apply on-the-fly that's what developers need so there's0:22:35
another great term in this space called harmony and you know the dictionary definition says accord or congruity and that makes sense0:22:45
you know the degree to which things fit together specifically in music we get this notion of simultaneity so melody is0:22:54
about sequentiality right and harmony is about simultaneity when things are sounding at the same time whether it's an instrument that can play a chord or an ensemble where all the instruments0:23:03
playing at the same time yields a combination of tones there's another notion of harmony which is sort of the rulebook right there's a mathematics to0:23:12
harmony which is sort of something you could study is it an art or a science I don't know you can have that argument later but there is this sort of system there are systems about how0:23:23
things fit together that you could study and I will argue here that harmonic sensibility is a critical design skill this is really0:23:32
what you need to acquire if you want to make systems that are going to work so I like Bartok and Coltrane as examples because both of them were masters of0:23:41
harmony and by this I don't mean they were masters of the rulebook they were masters of the notion of harmony in fact both of them were students of0:23:51
harmoniousness if you look at their careers and what they did you know Bartok you know came from a0:24:00
Romantic tradition and was aware of a lot of intellectual exercises going on in composition to try to modernize it break free of the old rules of diatonic0:24:09
harmony and go to new rules you know just we're going to get rid of those constraints and just have different constraints and he sort of never really went there with serialism he0:24:19
stuck with essentially tonal systems but he went beyond the rules he tried to figure out what exactly worked0:24:28
and didn't work and he explored for instance this Hungarian folk music which was tonal but didn't follow the classical rulebook because it0:24:37
was folk it wasn't you know academic and similarly you see Coltrane doing the exact same thing right on Giant Steps there's this famous reharmonization0:24:48
system that he developed that really broke the rules of the day but was fundamentally about retaining what0:24:57
worked about harmony and finding new ways to figure out things that work together so they both essentially developed new systems while0:25:07
retaining a focus on what fits together and I think that artistically they're just tremendous artists because there0:25:16
was a lot of intellectual effort I mean if you look at the insides of some of what Coltrane is doing it just seems like the most emotional thing but it has a tremendous amount of intellectual0:25:25
stuff going on there it's the same thing with Bartok you can listen to some piece it will make you cry and then you go and you look at the score and it's full of like Fibonacci ratios it's like oh my0:25:35
god that's really cool all right so what does this have to do with anything that we do right Clojure programming0:25:44
languages tools or anything like that right is Clojure like a song is it like a symphony is it like these things are0:25:55
languages like these things no what are they like we're gonna play the like game they're like instruments so let's look0:26:07
at instruments this is one I'm particularly fond of I happen to have one that's just like this instruments are particularly interesting right0:26:17
instruments start with something called an excitation right again we see the same notion most instruments are about one thing right whether you're going to0:26:27
pluck a string cause a reed to vibrate whack on a string whack on you know a drum skin there's this fundamental excitation0:26:38
and the rest of the instrument is completely about it it's about shaping it and conveying it then any particular instrument's going to0:26:47
provide some sort of human control interface right these are all interfaces they're really good interfaces you should study them they're very0:26:56
interesting whether it's keys on the piano right or on the saxophone frets on the guitar and pedals on a piano all instruments have these things most of0:27:05
them at least fundamentally address pitch and also volume but they can do other things as well so this is where the performer can0:27:16
exercise control and then the rest of the instrument is oriented towards projecting this excitation so there's an0:27:25
excitation that you get to shape and the rest of the instrument is about directing that energy at a particular outcome usually it's about directing the0:27:34
sound at the audience but when you try to extract the ideas out of this there's a fundamental idea and the rest of the instrument's about directing it at0:27:44
a good outcome there's another interesting aspect of instruments which is that while there is this initial excitation once you get an instrument in0:27:53
play once you get this body that has some air inside of it it has its own modes of vibration so it will tend to vibrate at certain frequencies and0:28:04
that's known as the resonance of an instrument and this is sort of the harmony of the physics of instruments right a good instrument designer right is going to try to make an instrument0:28:14
whose fundamental resonances are compatible with the excitation so again the harmony sort of comes into0:28:23
play when we look at instruments we see that they're incredibly limited right the piano can't even play any in-between notes it's like it only plays exactly0:28:33
the notes to which the keys correspond and a saxophone can't play more than one note at a time except you know with some techniques and even then it can't play0:28:42
arbitrary pairs of notes at a time right most instruments are minimal yet in some ways sufficient right so an instrument may have a limited range but if it's an0:28:51
instrument designed for western music within that range it probably has all the notes right but there may be other things that are limits like certain transitions between notes or registers0:29:02
might be awkward or impossible but there are instruments that don't comply like a blues harmonica doesn't have all the notes it has only one key at a time and0:29:13
it's a little bit like our DSLs right the thing is that players overcome these things how many people here play piano right so how do you overcome the fact0:29:22
that there are no in-between notes what's it what do you do yeah like trills right trills and grace notes and things like that are all techniques piano players and piano composers use to0:29:32
deal with the fact that there's no in between notes Coltrane quite famously because he had such dexterity and speed on the saxophone had this0:29:42
technique that was coined the sheets of sound where he could play a scale so fast that he could give you the0:29:51
effect as if there was harmony even though he can only really play one note at a time so they can fix this stuff by performance you know because otherwise0:30:02
what are we gonna do we're gonna submit a patch for pianos let's fix those nasty one at a time you know no in-between notes problems so why don't we do this why haven't all these0:30:11
things been fixed why hasn't the saxophone been fixed you know to play more than one note and the reason is that no one wants to play a choose-a-phone no one wants to be a0:30:21
choose-a-phone player all right some people do want to be choose-a-phone players let me say this then no one wants to0:30:30
compose for choose-a-phone right let's imagine you had an orchestra where everyone was sitting in front of one of0:30:39
these this is a modular synthesizer if you don't know right it's an electronic instrument where you patch together modules that are either just you know0:30:48
tone generators or filters or things like that and eventually you can control it with the keyboard or some other sort of source but but what happens if you try0:30:57
to compose where the base units are things like this well you have this problem right your fundamental target is complex right each sub component you're0:31:06
trying to reuse is really complex you also have this nested design problem right as a composer if you know composers study the instruments they0:31:15
study flute and they learn what the range is and what transitions are good and what transitions are bad what's hard and what's easy and they've learned that for the entire orchestra and they can0:31:24
look at a flute or think about flute and know I know what flute is about and what it can do and therefore I can use that knowledge to build something that uses flute and violin and whatever and make0:31:33
something that works together but if each piece is complex you end up with this nested design problem you can't look at one of these things and know what it's going to sound like when0:31:42
you press the key and it may do something completely different tomorrow and like if you were wondering what would happen if you pulled out one of the wires you know that's a good0:31:51
question it's totally fair so it's hard to it's hard to build things out of things like this other interesting things about instruments instruments are0:32:01
made for people who can play them isn't that outrageous isn't that scandalous they're made for0:32:11
people who can actually play them and that's a problem right because beginners can't play they're not yet players they0:32:20
don't know how to do it again I think you know there should be outraged on the internet we should submit patches we should fix like the cello right should cellos auto-tune0:32:32
right or maybe they should have red and green lights right it's green when you're in tune it's red when you're not in tune or maybe they shouldn't make any0:32:43
sound at all till you get it right wait is that is that how it works is that what we want right no that's not how it0:32:52
works look at these kids they're being subjected to cellos there's nothing helping them here0:33:01
although apparently their shoes have been taken away until they get it right but otherwise those you know they're smaller but those are real0:33:10
cellos they're hard to play they're awkward they sound terrible they're gonna be out of tune they're just it's gonna be tough for a while for these kids but if they had any of0:33:19
those kinds of aids they would never actually learn to play cello they'd never learn to hear themselves and to tune themselves and to listen and0:33:28
playing a cello is about being able to hear more than anything else and that's true of most instruments so we need players here's where I would rant0:33:39
I'm not gonna rant but just as a simple well I'm gonna rant a little bit you know as a simple example look here's a guitar player a harp player double bass0:33:49
player all holding up their blisters imagine if you downloaded a library off the internet and it gave you blisters right the horror right and yet0:34:00
every musician right has overcome a barrier to entry similar to this right the thing we have to remember is that0:34:09
humans are incredibly capable right in particular we are really really good learners we've just evolved to learn that's what we do the other thing we're0:34:19
really good at is teaching right and the thing is that neither of these things are effort free they take time they take effort we should not sell humanity short0:34:30
right by trying to solve you know the problem of beginners in our stuff we need to make things for people to use and we need to teach people and trust0:34:40
people to be able to learn how to do that because fundamentally we are novices and the other thing is we're only absolute beginners for a very short0:34:49
period of time on the flip side we're beginners forever right we never totally get it we're gonna be learning on an ongoing basis this is just the human condition0:34:59
this is not something to be fixed it's not something to submit a patch for right this is how it works right so effort is not a bad thing right these0:35:09
are two guys who are experts and yet you know they're still trying right this is not this is not autocomplete right so0:35:20
just as we shouldn't target beginners in our designs nor should we try to eliminate all effort it's an anti-pattern it's not going to yield a0:35:29
good instrument okay it's okay for there to be effort another interesting thing about instruments is that they're0:35:38
usually for one user right this is a tool made for two people to use at the same time it's extremely rare for tools or instruments to be for more0:35:48
than one person to play at the same time and yet we have these complicated tmux whatever things so that people can pair now I0:35:57
don't really want to diss pairing because while I personally don't understand it I can see analogies for instance rather than two musicians at the same instrument two musicians you0:36:06
know playing in the same room and that obviously has good effects but you have to wonder you know if this is just a way you know to keep somebody from0:36:15
typing all the time right if we haven't otherwise built in time for design is pairing one way that we're trying to buy it it's a fair question right so what0:36:25
should the ratio be between planning and performance right what's the ratio of time spent practicing and studying versus performing and recording0:36:34
for musicians how many people here are musicians or play an instrument it's usually pretty high amongst people in the software world do you spend more0:36:44
time practicing or on stage definitely practicing right if you're hired to perform in an orchestra I mean they don't even book you for even half of your time to actually sit in the0:36:54
orchestra they know you're gonna have to spend time shedding I don't know why as software developers we think we can just show up how many people here dedicate 50 percent or0:37:04
more of their time to not programming yeah not too many but that's not everybody else practices I mean0:37:14
obviously we went to school and studied wherever we studied but yeah musicians are not like well I went to school so now I don't need to practice anymore I'm just gonna like go and show up at the orchestra and they're gonna0:37:23
let me play because I showed up this is a great quote you know you have to prepare to be creative and you have to keep preparing to be0:37:32
creative or else it's not gonna happen you can't just show up and play but I know you know there's gonna be complaints right we're in a whole0:37:42
different space right Coltrane couldn't build a website in a day you know I don't know why this has become so important to us it's really like a stupid0:37:52
thing to be important especially to an entire industry so I'm not really gonna spend a lot of time on that but it is a fair question to say you know in what ways is this different right in0:38:01
software we seem to have I mean like all these ones and zeros and like so many ways to put them together right it's not as simple as making an instrument it's not wood and metal we0:38:10
just have this sort of infinite nature to our resources you know how does that impact you know the difference in how design works and well0:38:20
I just have to show this so obviously as soon as we had the ability to make sound by using electricity right so having the0:38:29
electricity drive a speaker and have the speaker vibrate and have sound come through the room I mean the initial applications were we recorded sound with microphones somehow captured it or transferred it live right and then the0:38:39
wire hit a speaker and then it came out on the other side but as soon as that capability was there people started imagining well could we just take out that first part and directly generate0:38:48
some electricity that would sound good when it comes out the speaker and from that was born electronic music and music synthesis and the theremin is0:38:57
one of the earliest synthesizers and it's from I think the 20s and it's quite interesting it has extremely simplistic sound generation capabilities there's a0:39:07
little oscillator in there and something that changes the waveform very slightly and the way you play a theremin is that there are those two things those two pieces of metal are antennae0:39:16
and the vertical one controls pitch and the closer you get to the antenna the higher the pitch and the further away you get the lower the0:39:25
pitch and the loop antenna controls the volume and the closer you get to the antenna the lower the volume and the further away the higher the volume and you play theremin by moving your hand0:39:36
through the air you do not touch it so it's an extremely simple instrument but over the years more and more sophisticated electronic musical0:39:45
instruments came about those modular synthesizers I showed before are built out of modules and again now we're starting to see the same kinds of things we see in software now modularity right0:39:55
there's modules each module is about something right so there's an oscillator there and a filter and they have interfaces they connect together and0:40:05
it's quite stunning this is pre-digital so the way these things communicate is through control voltage completely analog just dynamic voltage0:40:16
variations are what connect them together and you patch voltage from connector to connector and you build systems but of course we also start to see the levels right if you0:40:26
look at the back of one of these modules there's another piece of design there right there's the circuit and these are analog circuits that determine0:40:36
what the module does the other thing that's interesting it's a little bit hard to see from this diagram is that each of these knobs has a corresponding jack in other words there's a human0:40:46
interface and a machine interface to the same things and the machine interfaces were there all the time in fact they were first and then the0:40:56
human interfaces come we have to remember that as a design thing right what's wrong with SQL what's wrong with SQL is there's no machine0:41:05
interface to SQL they only designed a human interface to SQL and we've suffered ever since right because we have to send these strings around so you0:41:14
want to always have a machine interface you can always build a human interface on top of a machine interface but the other way is often disgusting all right so now we see the things that0:41:25
we start to see in programming right there's this whole design stack this guy Yves Usson he's amazing I think he's like a biochemist or something and in his spare time he's a0:41:35
C++ programmer and in his other spare time he actually makes these modules so he designs and builds analog modules and then of course he builds synthesizers0:41:44
and then he builds sounds with those synthesizers by patching things together and then hopefully sometimes he gets to play that little keyboard there and actually make music but there's this0:41:54
entire stack of design the design of the modules the assembly of the modules into a rack those are your choices then the patching which is designing the sound that the module's going to make and0:42:04
finally maybe some music later so it's interesting to see when that stuff gets captured so this guy happened to help this company make this synthesizer which0:42:14
incorporates several of those kinds of modules inside but doesn't have any wires on the surface so a bunch of decisions have been made about how the modules will connect together the0:42:23
oscillator will feed the filter and the envelope generator will control this those things were all already decided this same guy made those decisions and they were captured in this device that's0:42:33
now a lot closer to the programmer in fact you don't see the machine interface at all there's only the human interface and it's more like an instrument that you could just turn0:42:42
on and play of course you still have to do some sound design on the top the important thing to know is that as you start looking at things with these design stacks you really have to pay0:42:53
attention to where you are in the stack right when this guy is soldering together one of these modules he's not making music that day right he's not and0:43:05
this is the problem we have right our problem is you know this is what somebody feels like right this is what somebody feels like when you say you should use Emacs they feel like they0:43:16
want to make music and you handed them a soldering iron right that's what you just did and why does this happen to us it happens to us because we use the same stuff all the way down for0:43:27
us it's code at every layer it's always software right in a thing like this you do something different when you perform you do something different when you0:43:36
do sound design you do something different when you assemble modules and something different still when you you know wire things together behind the scenes but for0:43:45
software developers we all have soldering irons we all can like jump to the absolute bottom and mess around it doesn't actually mean that we're skilled enough to do that0:43:54
but it's there and it uses the same stuff and we think well you know I got some solder and some spare time I should you know design a filter and I think0:44:05
it leads to a tremendous amount of distraction and also tremendous amount of expansion you know we change things and enhance things because we can because we're all luthiers you know I'm0:44:15
glad I mean it would be cool to know how to build a guitar but you know if I knew how to do that I'd play less guitar because I'd be fiddling around with wood and you need to decide what0:44:26
you want to do so there's a sense in which having a lot of choices which we seem to always seek oh I gotta have choices you know don't repress me man it's the0:44:35
opposite of enabling us to accomplish things and in fact you'll see time and time again people documenting the fact that constraint is actually a driver0:44:44
right when you have fewer choices you make them and you get on with it when you have a ton of choices you sit around and like wonder about things all day so constraint is a driver of creativity0:44:53
that's not a new thing I mean people have been saying this forever but this is what they mean this is how this is where it comes from so if I was going to advocate anything it would be you know0:45:03
as a community and an industry we spend an unbelievable amount of time catering to ourselves it's just0:45:12
ridiculous it really is because we can talking about it you know enhancing things adding stuff and you know you0:45:22
know where this goes right it just creates these monsters right and you know the chances are good that every one0:45:31
of these modules is a good idea let's just say it is every one of these things is a good idea but if you take every one and you just keep accreting them and adding them together you end up0:45:41
with something that you can't play I mean I don't care if you could configure this thing with Spring right nobody wants to play this and in fact no0:45:51
one can play this this one actually plays itself it's in a museum somewhere and it spontaneously makes0:46:00
music no one sits in it and tries to play it it just plays itself and I mean I'm sure it makes interesting sounds but I don't think it can play My0:46:10
Favorite Things because it has no memory it just makes up stuff so I think the thing we need to keep in mind is there are people who can make0:46:19
music by waving their hands through the air right they don't need Emacs or anything else they don't need a million options or deep class0:46:29
hierarchies or anything else don't start playing this right now the whole room will be like theremin whatever but it's really cool to listen to this so getting0:46:41
back to the topic what is design I would say on one level design is about imagining things right that doesn't mean like imagining that you have every0:46:50
possible option you want to embrace the constraints because you're gonna use them they're gonna help you the trick is to not let them get you down right the0:46:59
key thing in design is saying I have this set of problems I have this set of constraints and I know I can solve that right if you combine the optimism with0:47:10
the constraints you're gonna get designs and so you want to imagine a lot you want to take these two things and run with them and try not to like get the first answer but try to get a lot of0:47:19
answers and then have them to choose from however the flip side of design is design is about making decisions right after you've got this spectrum of0:47:30
things you think are interesting answers you know maybe you thought of a hundred times as many things as you need to you need to admit very0:47:39
little you want to mostly say no design is about making decisions right in fact the value of a design is in conveying0:47:48
those decisions to the next person right when that guy worked on that synthesizer with the company and made those decisions it's to help out the next person the musician that doesn't want to0:47:58
solder right they need to be able to trust the decisions the prior person made so they can get on with their level right if you leave every option open and you0:48:09
make everything configurable you're not designing you're failing to design you're failing to make decisions and convey them to the next person you're not actually helping them right you're giving them a chooser phone right on the0:48:20
performing side we already saw a little bit of this performing is preparing it's practicing right which we do right we do a lot of0:48:30
coding so we get a lot of practice in it but it's also studying and I think the biggest thing in trying to connect you know performing to to developing or0:48:40
coding is is this is this nature right Coltrane was great because he had fantastic design sensibilities that he could he could apply right it wasn't all0:48:52
about his fingers I mean he had incredible dexterity he did great he had incredible technique and incredible tone that he got from playing a lot but what0:49:02
he played was a result of what he studied and the analysis that he did and the thinking that he had done about how music works he if he just had the finger0:49:13
speed and everything else he wouldn't have been the great player that he was so to wrap things up take things apart0:49:22
take things apart not with a hammer smashing them but with an eye towards how you're going to put them back together that's what design is about you want to design like Bartok and0:49:33
by that I don't mean you know specifying every note for every player that I'm not advocating that I don't I don't believe that's the way software design is or ever should work but to think about the0:49:44
way things fit together at every level of the thing you're working on when you're working on large things as he did you have to bring the design sensibilities to the small things and to0:49:54
the medium-sized things and to the large things and often apply different techniques and different ways of thinking for each level you know ways that are appropriate you want to code0:50:04
like Coltrane right bring that harmonic sensibility bring that design sensibility to the coding that you do0:50:14
and to the code that you write I think most fundamentally we want to seek out and build languages and libraries that are like instruments in0:50:24
all the ways that I just talked about in particular in the simplicity aspect right all those things I showed you were0:50:33
different ways to say instruments are simple in that deep way that's what we want and finally to pursue harmony to0:50:43
actually think about how things fit together it's real easy to code write out a problem and then get an answer but if you haven't thought about how things fit together that answer is not going to be0:50:53
easy to maintain or easy to change or easy to reuse so you have to think about harmony as you go and then go do it I0:51:04
hope you enjoy the rest of the conference Thanks [Applause]0:00:00
Design, Composition, and Performance - Rich Hickey
0:00:00
hi thanks for coming I'm very excited about this conference it's always great0:00:09
and I'd like to thank the organizers for inviting me today to talk about design composition and performance so we start0:00:18
with a legal disclaimer prepared by lawyers we're gonna have some fun with analogies today and the cool thing about analogies0:00:27
is they're as much fun when they're wrong as they are when they're right so design is something we talk about a lot in software0:00:37
development but I think is something that's somewhat beleaguered these days not because people don't do it but0:00:46
because I think people are in a hurry and they're trying to get things done and I often get developers asking I'd like to be able to work at the next level and talk about you know the design0:00:56
of things before I just code up solutions so what does it mean to design something and as you all know all I do to prepare these talks is go to dictionary.com0:01:05
and look stuff up so I looked up design and one of the definitions is this which is really great definitions are always great it says to prepare the plans for a0:01:14
work to be executed especially to plan the form and structure of that work and I think that's super important the notion of executed it means that somebody's0:01:24
gonna do this we also have a different notion of executing things in software which is that this is going to run which is also interesting there's another definition which is to decide what the0:01:34
look of something is and there's nothing wrong with that kind of design I just want to say this talk is not about that at all never when I say design do I have this meaning in mind and of course we go0:01:46
to the roots and see that the root is in to mark stuff out and and and that matters so it's really sort of two0:01:55
things here we want to make a plan and we want to write something down and of course here everybody's like oh you know we did this right and we did this0:02:04
in the 80s and we had all the stuff and and it was terrible we had these phonebook sized specifications and nothing got done and it was all awful so we already write down code is that0:02:14
enough do we still need designs if we this and and the answer is yeah because0:02:23
that's not a plan that's just what you did can we generate Docs for implementation same thing right that's not representing a plan to do something0:02:32
it's like I already did it and so somebody asked me for a documentation or a design and so I pulled a lever and this came out and of course there's this0:02:42
complaint right I don't want to do this these things are big they're top down they're waterfall model etc etc we did0:02:51
that already and it's true that happened and they were plans but they weren't good plans and so what I'll talk about today is a little bit about what we want to have0:03:00
in a design and when we encounter one in the world what do we see so I think that a simple idea behind design is to look at it in terms of taking things apart0:03:10
this is the opposite of the notion we typically have typically people think about design and they say design is you know making this big involved plan that's gonna solve every issue that the system0:03:20
is supposed to address but I think I think that's not what you do when you're trying to get a design that's gonna survive and live over time I think instead what you want to do is break0:03:29
things apart in such a way that they can be put back together and that's fundamentally what design is about taking things apart so you can put them back together because obviously taking0:03:38
things apart and walking away is not really gonna help the other thing you find in good designs is that they're always about one or very few things0:03:48
designs that survive designs that really foster reuse are about a single thing generally and then you put them together0:03:57
so the first thing you do is to take everything apart then you compose them and then you're solving your problem but the first thing is the taking apart and there's nothing about this that's in conflict with iterative methods for0:04:07
developing software right this can be an iterative process and and all that happens then is that you get feedback during development to your design so0:04:19
what kinds of things would we take apart there's a whole bunch of things that we could take apart and in fact you're constantly finding more things you can take apart the requirements for a system0:04:28
you can take apart the order in which things happen who's gonna talk to whom information parts of your system from mechanism0:04:37
parts and and you can actually take apart different solutions to assess their merits so let's just look at each of these in turn taking apart requirements this is something we do not0:04:46
do often enough somebody says I want a system it does X I need Y you know it's got to do Z we get these feature lists and the first thing that we should0:04:55
do when we're handed that is to break them break them apart and try to find in the set of requirements in the set of features or desired things or needs the0:05:05
actual problems and it's only by doing that that you can start to move forward and say ok I've broken your need into problems that you have and then we can try to make things that solve those0:05:14
problems the other thing you're going to do initially with requirements is divide them up the simple way to take them apart is to say these are things I know how to do and these are things I don't know how to do so your knowns from your0:05:24
unknowns you're going to take apart requirements that are domain side the system must do this to satisfy this business thing from solution side things like we need to run on AWS or something0:05:33
like that often you get requirements especially for systems that already exist that are about it's not working0:05:42
and everybody's hard that you know how do you fix it's not working and the first thing you have to do when you're trying to fix it's not you know what's not working is to separate out right0:05:51
what's the cause of this problem from what's the symptom of this problem all right somebody says my screen is black you're not ready to say okay I know how to fix black screens and0:06:01
start typing because it's not a generic solution to the black screen problem yet and then there are a whole bunch of requirements that are unstated and and0:06:10
these need to always be enumerated if if not taking it apart but they need to be in mind at all times unstated requirements are the things that that0:06:19
everybody wants the system to avoid like I'd like a system that doesn't keep crashing or use up all the memory or cost too much to run or use too much energy or require a lot of manual effort0:06:29
or the users will hate and so the unstated requirements are often a set of things that your software's supposed to not do not cause attributes it's not0:06:39
supposed to have so we want those on the table other things just completely different dimension things we can take apart when we do design which is time right you can0:06:48
take apart the order of things how things are going to flow from one to the other you can break systems apart so there's less direct calling you can use queues to do that you can support0:06:57
redundant activity with idempotent approaches commutation is a very important concept that's going to be0:07:06
more and more prevalent as we try to build systems that are highly distributed which says I can make a system order independent by supporting0:07:15
operations that are all commutative then I don't care how things come in so it's it's a technique for breaking apart I used to have this order dependency and0:07:24
now I don't so now I have two separate things I can talk about independently and transactions are the opposite when you say I do need to know these things are going to happen together we can take apart0:07:35
place and participants and there's a certain sense in which design is always about this but you know there's this old adage right you just add indirection0:07:45
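The point above about idempotent and commutative operations can be made concrete with a small sketch. It is an illustration only, not something shown in the talk: the message format and the names `apply_login`, `replay`, and `append_log` are hypothetical, and set union simply stands in for any operation that is both commutative and idempotent.

```python
import itertools


def apply_login(state: frozenset, user: str) -> frozenset:
    """Return a new state with `user` added; the input state is not modified."""
    return state | {user}


def replay(messages) -> frozenset:
    """Fold a sequence of hypothetical login messages into a final state."""
    state = frozenset()
    for user in messages:
        state = apply_login(state, user)
    return state


def append_log(messages) -> list:
    """Contrast case: appending to a list is order dependent and not idempotent."""
    log = []
    for user in messages:
        log.append(user)
    return log


messages = ["ann", "bob", "cy"]

# Commutative: every arrival order produces the same final state.
final_states = {replay(order) for order in itertools.permutations(messages)}
assert len(final_states) == 1

# Idempotent: a redelivered (duplicate) message changes nothing.
assert replay(messages + ["bob", "bob"]) == replay(messages)

# The list-based version cares about both order and duplicates.
assert append_log(["ann", "bob"]) != append_log(["bob", "ann"])
assert append_log(["ann", "ann"]) != append_log(["ann"])
```

Because every ordering and any number of redeliveries converge on the same state, the sender and receiver of such messages no longer have to agree on ordering, which is the sense in which the order dependency has been taken apart.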
but here we're talking about possibly the whole process of building something right having a design is the thing that lets two teams work independently or people work in two independent languages0:07:55
taking apart things is what facilitates the participants the authors as well as the participants for instance the system's by breaking things apart you're0:08:06
able to say we'll run this on this machine here or run this in this tier or we'll put that on the web this one is kind of interesting because I don't0:08:15
see it talked about often enough which is to separate information versus mechanism so there's always information that our system manipulates for instance your system may have the notion of the0:08:26
set of users who are logged in that's an idea that enumerated set is a is a piece of information and then you have like0:08:35
you use the set class a collection class from your you know your favorite framework library to put the logged in users and one of the problems I think we0:08:45
have in software development is we use the same stuff for both of these things but these are two very very different things one is sort of this0:08:54
device into which you stick stuff and you can get it back later it's kind of a little bit of a place and the other is a piece of information which you should really not treat that way at all so pulling these0:09:03
things apart talking about your system and clearly differentiating the stuff that's information from the stuff that's sort of the mechanics of your0:09:12
program is quite critical and then finally once we think we have an answer we have a potential solution we haven't implemented it yet but we're0:09:21
looking at it or maybe we have implemented some of it you want to take those apart to see not just the benefits right those are pretty evident usually but also the trade-offs what part of0:09:31
this is not going to work how much is it going to cost to run and does it eventually fit the problem because a lot of times what can happen is you can adopt a solution that is larger than0:09:42
your problem and then what do you have you have two problems right yeah you have your problem and now you have this thing that was too big too big for it so0:09:53
it's not just about getting answers it's about breaking things apart in a coherent way so I'm a big fan of design I think that we need to do a lot more of it and we need to talk about it more and0:10:02
we need to spend more time on it but but I think it's pretty easy to rationalize why why we need it the first is so that we can understand the system right a design is hopefully smaller than the0:10:13
code that that implements it and so it's easier to get our head around what it's about the other thing as I was talking about is design is fundamental to coordination0:10:23
right if you don't have some plan you can't just send two people off to write you write half the system you write the other half of the system that's the end of the conversation what's gonna happen well0:10:33
they're gonna wonder which half of the system they're supposed to write right because there's no plan so there's no way to have coordination and to have multiple groups working on something without a design design also facilitates0:10:45
extension and extensibility people are always like oh I want to make something extensible but the easiest way to make something extensible is this breaking it apart thing because when you've broken0:10:54
it apart you end up with two separate things you end up with pieces that are meant to connect to other pieces which means there will be connecting points on0:11:04
those pieces and therefore when you want to do something new you can make a new extension and it can leverage that connecting point because it had to be in place because the things were separate the flipside of that is0:11:14
this reuse aspect which is when you've broken stuff up into separate pieces that have nice interconnecting points you can pull them out of one context and0:11:23
put them in another context and that's how you get reuse these are not like magical things and they're not attributes of APIs necessarily they0:11:32
mostly fall out of this decomposition finally testing is greatly facilitated by design ideal testing takes some0:11:41
design constraints some specification and turns it into tests as opposed to sort of embodying design inside tests that's inside-out but again that's0:11:51
something we have to work more at systems like QuickCheck are interesting because you're basically starting with propositions about your system which reflect a design and saying you write0:12:00
the tests computer and finally I think the thing that's often most readily pulled out as an argument against design is I don't have time this is gonna slow0:12:10
us down and in fact I think it's the opposite in particular you know there are all these adages about when is it easiest and least expensive to fix a bug right not out in0:12:21
the field right if you've already shipped it it's the most expensive so people like oh we should fix it in you know QA or we should fix it in our code and do you know test-driven design it0:12:31
but the thing is you can keep moving back it's easiest to fix your problems in OmniGraffle right you just say ooh that is not gonna work and you0:12:41
like move some boxes around and it's fixed it's much cheaper than fixing software but even after you've shipped I think that there's a lot more efficiency in systems that have been designed0:12:51
because because you're going to be able to go back to something and usually the answer to your problem in the field is0:13:00
I've just insufficiently broken something down and so the solution I'm going to need is just breaking it down more and that's less expensive than I created this giant ball of everything0:13:10
and I need to untangle it so I do think in the end it's more efficient so that's design the talk is about design0:13:19
composition and performance and so I got to take composition and performance together and one of the beautiful things about dictionaries is0:13:28
there's more than one meaning for each word and so there's more than one meaning for composition which of course we think about composing systems out of pieces like I was just describing and we0:13:37
think about performance and systems usually as you know how fast did they run but when I think about composition I often think about Bartok and when I0:13:47
think about performance I often think about Coltrane and so these are two musicians now Bartok is a Hungarian composer but he was also a performer0:13:56
he was a pianist and taught piano and Coltrane is a famous saxophonist and great performer but was also a composer0:14:06
so I'm not trying to pigeonhole these guys but we're gonna use Bartok to stand in for the composer and Coltrane to stand in for the performer and talk0:14:16
about two different notions of composition and performance and maybe how they might inform software so composition music composition and other kinds of art creation is about0:14:31
addressing constraints it's about addressing problems but not real world problems right art doesn't solve real world problems in general and if it does it's more than art it's something that's0:14:40
something else so it's quite interesting that the first thing that composers tend to do when they have a blank page they can do whatever they want is make up a0:14:51
bunch of problems for themselves they actually create a bunch of self-imposed constraints and that's true of all the other art forms right in general0:15:01
you're gonna see this and composition is design for performance so we saw that definition of design on the first slide and it said to be executed and that's0:15:11
what composition is you're writing something you're anticipating someone's going to perform it or do it later it's the same thing screenwriters presume someone's going to act it out choreographers0:15:20
presume someone's going to dance it so it's design you're solving constraint problems by creating your own constraints and you're designing something to be executed so it's very0:15:29
much a design problem and it's an organizational challenge right you're trying to address these constraints that you've set up for yourself and that's what0:15:38
composition is it's quite interesting that when you look at music composition you end up immediately seeing a tremendous variety in in the specificity of compositions and the scale of them0:15:48
and it's telling that software sort of straddles these two things the first is you see fully orchestrated0:15:58
music right it's fully arranged all the notes are written out for every part and this is typical at larger scale so bigger compositions orchestral compositions and operas and things like0:16:08
that tend to have full orchestration and then smaller compositions you might have only a melody written out and the chord0:16:17
changes for it for say a song they'll call it a song but we're gonna not talk about words today and and in in those compositions you have a lot more latitude for performers right because0:16:28
you're not saying you must play this note at this register on this instrument at this time this loud you just said you know this is the melody and have at it so there's more responsibility for0:16:37
performers so there's this whole spectrum I think that when people push back against design they're afraid of this first one right because again back0:16:46
in the 80s we had that stuff where people had plans that you would draw pictures and push buttons and it would write programs maybe people still have those plans but they're conducting them0:16:57
in secret but I think programmers are like you know don't repress me man I don't want to do this this this big thing but I think that again you're0:17:06
going to have the spectrum you're gonna need a lot more writing down especially if you're going to share amongst people and then in the small when you're talking about your own individual effort maybe you don't fully annotate the same0:17:17
way so we can see two pictures of this here I don't expect anybody to be able to read that but on the left is the Concerto for Orchestra it's a Bartok piece and you know this has the parts all0:17:28
written out for the strings and the percussion and the winds and it's all specified although it really isn't though I don't know if you can tell0:17:37
there's red markings and some other things on here which were notes taken by the conductor because the conductor says oh I don't know what to do here0:17:47
you didn't tell me exactly what to do so I have to decide what the tempo is or how to balance these two sections against each other but in general that's pretty fully0:17:57
specified on the right we have My Favorite Things as it would appear in like a jazz Real Book and yeah this is the tune from Rodgers and Hammerstein's The Sound of0:18:11
Music and this is all you would get if you're a jazz musician right here are the changes and here's the melody and you move from there so I talked a little bit about constraints and0:18:22
again it's quite interesting to see how this lines up so most compositions are about one or a few things same kind of thing you're setting out you're0:18:32
saying what is this piece gonna be you rarely say I'll use these notes for a while and then those notes for a while then those notes for a while and then you know call it done never you never do0:18:42
that you see composers come up with little motifs and things that they're going to reuse or transform and sort of riff on as time goes by so there's all0:18:51
these ideas that you're going to set up as as boxes in which you're going to work and you have variations of those things that you'll do resolution and0:19:00
then the scale of the composition really just determines how many of these things you have and maybe how many different levels there are right so a big Bartok0:19:09
composition is going to have very very fine-grained constraints about melodic motifs in a very particular part of the the piece and then at higher levels of0:19:18
structure deal with you know big form kinds of decisions but again they're self-imposed constraints when we move to0:19:27
the performer side of the coin the improvisation side like Coltrane I think it's quite interesting again0:19:37
it's an interesting word it means not foreseen or not provided and not provided means you didn't have you know the answer upfront before you went and did it you're like you weren't handed a0:19:46
complete plan before you went and so in the case of a jazz performer you're gonna have melody and changes right and then you're going to go and provide variation0:19:55
make something up but I think that people have a tremendous lack of understanding of what goes behind improvisation for instance a lot of0:20:05
people think Coltrane is just this genius who was spontaneously emoting they think that improvisation in music is just making stuff up off the top of your head you know it's just amazing it's0:20:15
like hacking right just I am so awesome I'm so bright I'm just gonna like make this up but it's been quite interesting to see as we've gone back through the0:20:26
archives and have these new releases of old recordings where they put the alternate takes in there because you'll see Coltrane I mean he had the solo it0:20:36
sounds incredibly spontaneous but then you listen to the other six versions and you realize that everything that went into the solo that you thought was this amazing one-off he had worked out and he0:20:47
was trying them in different orders different juxtapositions different cadences and levels and maybe the order of it you know was spontaneous but0:20:57
there was a tremendous amount of preparation associated with that so there's a sense in which improvisation is dynamic composition of prepared0:21:07
materials of planned material and that to be a great improviser means to make those smaller plans or have those kinds0:21:18
of prepared abilities or approaches or sensibilities that you can apply when the time comes in a live situation and0:21:28
you have to have a lot of knowledge to do this and a lot of vocabulary to do it it's just not something that you you make up and Coltrane was a genius at0:21:37
this preparing you know he he practiced more than anyone in order to seem as if he was making it up most fluently so0:21:48
another thing that sort of crosses the lines in composition and performance and in music is this notion of harmony and again we get this nice word for it which0:21:57
is accord or congruity right how do things line up and again this is a lining up notion in the simultaneity0:22:07
associated with harmony right so we have melody is sequential and harmony is parallel right so music did all this before we had computers and0:22:17
so this is how do things work together at the same time if I play these three notes at the same time what will happen or for Coltrane if I play this note while these chord changes or0:22:26
this set of notes while these chord changes are happening what will that be like Bartok had to imagine you know when the strings are doing this and the winds are doing that what will0:22:35
what will it sound like all together alright there's also sort of a mathematics of harmony right which is the science behind it or the the way you study the the rules if you will of0:22:46
harmony and I'm gonna contend that harmonic sensibility is a super critical design skill this is the thing that you want to nurture in yourself and and it0:22:56
may be a little bit hard to see how the mapping works from from music to software but it's fundamentally what a good designer has they know if they make0:23:05
this choice in this context that's going to go together and those two things going to work well together and and they know that because of their experience0:23:14
and the study that they've done of working systems so I think both Bartok and Coltrane are interesting even though they're in completely different genres0:23:26
of music in that they were both masters of harmony if nothing else you can say the two are similar because they totally mastered harmony and in fact0:23:37
what was interesting about both of them was that they were students of harmoniousness if you will the thing I think that they were most interested in was what makes things work0:23:47
together well Bartok studied obviously the classical tradition but his music was not compliant with those rules and it's0:23:57
because he brought a whole bunch of influences in from studies he had done of folk music of Hungary and and what he studied in that music was the sonority0:24:08
that was possible in these tunes that didn't follow the classical rules but they still worked and so he pulled out what worked about that and wrote pieces0:24:17
that are hardly recognized as being completely tonal but they are tonal and they're as satisfyingly consonant as tonal music is which is quite astounding0:24:28
similarly Coltrane invented whole new ways of doing reharmonization over chord changes that had that same0:24:37
sensibility about about harmony so I think that what was cool about both these guys is that they both sort of developed new systems that preserved what was essential about things being0:24:48
harmonic or being consonant and then the other thing that's quite interesting is that on both halves whether you listen to a Coltrane improvisation or0:24:57
the most beautiful engaging piece of Bartok what's behind this is a tremendous amount of intellectual effort and activity I mean you can listen to0:25:06
this Bartok piece and be stunned by it just blown away by the emotional content then you go study the score and there's like all these Fibonacci numbers and ratios in it like oh my god this was the0:25:16
constraints he set for himself before he wrote this thing that seemed or was so emotionally powerful so there's a lot to appreciate in both of them but0:25:26
what does this have to do with anything that we do in particular what does it have to do with languages and libraries which is really what I want to talk about today languages and libraries is a0:25:38
language like Clojure or any other language it doesn't matter this isn't really about Clojure is it like a song are languages like songs are they like small compositions or are they like big0:25:47
compositions I don't think so I I think that I think that languages and tools to me if you're gonna map this analogy are0:25:57
more like instruments so let's talk about instruments it has to be one of my favorites I have that one again0:26:06
instruments are sort of their own design problem right instruments start with something called excitation right and0:26:15
there's a sense in which most instruments are about one thing right you pluck a string you cause vibration on a reed by blowing on it you strike0:26:25
strings with the hammers of a piano or you hit drums and things like that and what's quite interesting is that0:26:34
very few instruments are about more than one kind of excitation most instruments are about one kind of excitation it's quite rare to see the other then this is combined with0:26:44
some sort of control or interface or gold technology on instruments to say and then there's an interface right so this excitation then this is interface for people to go and shape the0:26:54
excitation and finally there's an aspect of an instrument which is sort of its fundamental goal in the world which is to take that excitation and direct it at a problem and the problem for most0:27:04
instruments is how is somebody going to hear this right how do we get the sound across the room so somebody can pick it up and so instruments are about0:27:13
directing the force or energy of the excitation out to the audience they're directed at an outcome so it's a little piece of design work associated with an instrument instruments also have0:27:22
this other interesting aspect which is resonance right when you design an instrument especially something like a violin or a guitar or anything that has0:27:31
a vibrating body to it the body itself is going to interact with the excitation so the excitation the string is going to vibrate or whatever and the body is going to go and say ooh that's I like0:27:40
that I'm gonna amplify that and it will amplify some things more than other things so there's a design problem and there's a harmony problem to the physics0:27:49
of an instrument, to say: well, you know, if I build an instrument whose body resonates at a frequency that's not in a harmonic relationship to the0:27:59
strings themselves, it's gonna sound awful. And it's actually a physics problem to get that harmony right in the wood. But instruments have a lot of other0:28:10
characteristics, and one of them that's quite striking is that instruments are limited. They're very limited. A piano can't play any in-between notes;0:28:20
it can only play specific notes, twelfth root of 2 all the way across, or maybe you stretch it a little bit, but there are no in-betweens. A saxophone can only play0:28:29
one note at a time this is awful I mean and these things have been around for hundreds of years I mean they didn't0:28:38
have github but I mean somebody should issue a pull request and like fix this right but there's a sense in which0:28:47
they're minimal yet sufficient for instance most instruments don't have any missing notes like for whatever range they cover they have all the notes0:28:56
that's right well at least we're talking about western instruments in the western scales but they'll tend to have all the notes but not all of them well right0:29:06
blues harmonica doesn't have all the notes right there's a kind of musical DSL right it's like you don't need all the notes you're just a business person0:29:15
I give you just the blue notes that's all you get and so you know it's just something to fix right there are all kinds of limits not just in the notes0:29:24
they can play, but the registers they can play, and things like that. Why haven't these all been fixed? Why can't every instrument do everything? And there's a sense in which players can overcome this.0:29:33
How many people here play piano? Right, so what do you do to deal with the fact that a piano can't play the in-between notes? What do you have? You have grace0:29:43
notes and trills and mordents and stuff, right, that give you all that sort of feel-around-the-note thing. Right, John Coltrane famously0:29:52
became so adept at the saxophone and he had such physical prowess and and muscle memory and combined it with this0:30:02
gargantuan knowledge of harmony, that he could play these scales so fast that he could imply not only chords but entire tonalities, superimpose entire tonalities0:30:12
over chord changes by just playing sheets of sound that's what they called it over over over music so it's not necessarily the case that the0:30:22
shortcomings of these things need to be fixed in the instrument right there may be there may need to be room for the performer to to do it and there's0:30:31
another good reason why we don't fix everything, which is that no one wants to play a choose-a-phone, right? No one wants to play an instrument that, like, does everything: you push here and it makes a piano sound, and then it makes a drum0:30:40
sound, and then this happens and that happens. So, some people do want to play choose-a-phones;0:30:49
this is Keith Emerson sitting in front of a Moog modular synthesizer back in the day, and that was just, wow,0:30:58
you can make it do anything if you plugged in the wires the right way. So I'll take a step back and say maybe some people do want to play choose-a-phones, but no one, I bet,0:31:07
wants to compose for a choose-a-phone ensemble, right? So just imagine that you're sitting in front of an orchestra, and everybody in the orchestra had one of0:31:16
these in front of them right and they put the wires in whatever and you're the conductor you went like this and like when you say go what is going to happen0:31:25
right? You have no idea. You have no idea of what even could possibly happen. If you're sitting in front of an orchestra, there's a certain category of things that you think might possibly happen, but0:31:35
you can kind of get your head around what that might be. And so the problem here is that where you try to build a bigger system out of0:31:44
something with as much, let's say, parameterization as these synthesizers, you're trying to target something that's complex and build something bigger still. That's a recipe0:31:55
for disaster and there's a sense in which this is just the wrong way to go about things because you've got this design problem that's actually multi-level and it's nested right what0:32:06
happens when you say go well it's the sum of what happens for each person what happens for each person well it depends on where they put the wires and what happens you know what determines what happens when you put the wires well each0:32:15
module has a different thing that it does; it may be a filter, or maybe a sound generator, or something like that. So there's a set of levels at which there must be design:0:32:24
I must design the modules, I must design the sound, the patch that hooks them together, and then maybe I would try to take on a piece with all of this. But unless there was a way to talk about, you0:32:34
know one of those arrangements and get your head around what it implied you could never build up higher so another0:32:43
stunning thing about instruments which is just again it's astounding that the world has continued is that instruments0:32:53
are made for people who can play them who can already play them I don't know I hasn't everybody heard of like explain it to me like I'm five or0:33:02
whatever we're not supposed to do this anymore let's just make everything for beginners but instrument makers don't do that they don't make anything for beginners they make everything for experienced players instruments are made0:33:12
for people who can play them 100% of the time but we have this problem right beginners aren't players yet this is0:33:22
this is this is going to cause the world to stop right if you can't have a website with like three buttons on it and and everything that possibly could happen can happen we're done so we0:33:32
should fix this, right? We're technologists, we know how to do this. So, it's a cello, right? Should we make cellos that auto-tune, like, no matter where you put your finger, it's just0:33:41
gonna play something good play good note like you're good well just fix that should we have cellos with like red and green lights like if you're playing the0:33:50
wrong note you know it's red and you slide around and that's green you're like great I'm good I'm playing the right song right or maybe we should have cellos that don't make any sound at all0:34:00
like until you get it right there's nothing and then then you get it so I0:34:10
mean do we need to fix this here we go we have a bunch of children young children being subjected to cellos they0:34:19
then there's nothing different about these cellos; these are regular cellos. They're all sitting there and they're out of tune, and it hurts their hands, and it's just awful. I think somebody took away0:34:30
their shoes until they get it right so this is terrible but but it's what happens because what would happen if0:34:39
they had any of those other things I just talked about who could ever learn to play cello no one no one would ever learn to play cello there's this great article in the0:34:49
current issue of The Atlantic about sort of the trade-offs, let's say not the perils but the trade-offs, involved in automation. It's got a great line in it,0:34:58
which is that learning requires inefficiency, and it's quite important. And when I read it and was thinking about this talk, I felt like, wow,0:35:08
that does go together. So we need players. I would rant here, but I won't. But look at this: a guitar player with blisters, a0:35:18
harpist with blisters, a bass player with blisters, right? There's this barrier to overcome for every musician. Imagine, right, if you downloaded something from0:35:27
GitHub and it gave you blisters, right? The horrors. And yet, how many people here play an instrument, or have at one point0:35:37
in their lives yeah a lot of programmers do and for how many people did you just pick it up and it was awesome how many people wished like something could have0:35:46
made it more straightforward to get started with them like just made it easy and how many people would have believed after that that they could play later no not at all this is it it's actually0:35:56
quite important, the level of engagement that's required is quite important. So we shouldn't sell humanity short. Humans are incredible; in particular, they're0:36:06
incredible learners right one of the things that's really cool is you give a five-year-old or I don't know eight maybe a cello and some decent instruction and they will learn how to0:36:15
play cello if they spend enough time doing it in fact humans will pretty much learn how to do anything that they spend enough time doing we're incredibly good at it and we're also really good0:36:25
teachers in general so I don't think we need to go to our tools and our instruments and make them oriented towards the first five seconds of people's experience because that's not0:36:35
going to serve them well. It's especially not going to serve anyone well who wants to achieve any kind of virtuosic ability with the tools, right? No one would become a0:36:44
virtuoso on the cello if they had red and green lights when they started. So neither of these two things is effort-free, but we shouldn't be in a game to0:36:54
try to eliminate effort, because we are novices, right? There's a sense in which we're only going to briefly be novices. You're only a complete beginner at something for an0:37:03
incredibly short period of time, and then you're over it. It's like, we should not optimize for that. But on the flip side, we're always learners. No matter how much time you spend on violin,0:37:13
who sits there and says, "I'm done, I've completed learning violin, I finished it"? That's awesome. I personally don't play violin at all, but I don't think there would be a player on earth, no matter0:37:22
how great they are, who would say, "Yeah, I finished violin and I moved on to something else." We're constantly learning; it's just a human condition to do this. So0:37:32
things take effort right just like we shouldn't target beginners we shouldn't try to eliminate all effort I mean look at these two guys these two guys are experts right is this the face0:37:43
you make when your IDE autocompletes right does it look like that Oh job I0:37:52
doubt you doesn't happen right so your deal life has just been automated away and and I think that's sort of what's interesting is that yeah it sort of0:38:02
looks hard and in fact it's probably not hard for either of these two guys but what you're seeing here is a sense of0:38:11
the engagement in what they're doing, right? How engaged do you feel in what you're doing when you're programming with an IDE that's, like, doing everything for you? You're just, like, so isolated0:38:22
from from what's happening so effort matters another interesting observation that's not really that important to this talk is that instruments and tools are0:38:32
usually made for one user at a time. Like, this whole notion of, like, two guys on one keyboard programming, that doesn't0:38:41
happen with instruments, right? Now, you make ensembles of instruments, so I'm doing this and you're doing that, and we're doing them together in the room and it sounds great; we do that. But, like, two people0:38:50
pulling on one tool that almost never happens right so I wonder if you know this pairing thing is just a way to keep0:38:59
us from typing all the time to buy one person some time to think right a little bit whoever's not pulling that's got a easy0:39:08
easy ride of course this is pretty fast switch back and forth so it begs the question right what ratio of time should we have between planning and performance0:39:18
right which is which in programming which one which one is is typing code and it's yellow0:39:28
but is that like the way other things work? No. How about for an orchestral musician: how much time do they spend practicing versus at the concert? Way more time, way0:39:40
more time. And do they go and say, "I practiced at college, so I'm done practicing"? No. So why do we think that we0:39:50
can do this, right? So we went to college or whatever, we learned whatever, and then we're just gonna go and, like, do it every day, we just do it from here on; once school's done, we're done. And I think that we do need to0:40:02
assess how much time we spend so how many people spend 10% of the time designing 25% 50% so I'm going up it so0:40:12
I can know more heads are gonna go up no one's spent 10 percent it's uh it's quite it's quite sad but this is sense in which it's there's a sense in which I0:40:21
mean it is sad it is it's actually sad it's not sad as a joke sad it's it's actually sad but the sense there's a0:40:30
sense in which it was like all right well this is all so it's so different right Coltrane couldn't build a website in a day I could you know I could do that actually I personally couldn't but I0:40:40
know other people who can and that's where another rant would go about how important is that why do we put so much priority on like how fast can a beginner0:40:49
do something and how can you like regurgitate a template in a day as none of these are things that we need to do on an ongoing basis to solve problems for the world but it's a fair point that0:41:01
you know, software is not like instruments. It's not made out of wood or metal, right? We have these ones and zeros; there are so many combinations, so many ones and so many zeros, it just seems so open, right? So0:41:13
how is this connected? It ends up that there is this connection between instruments and things that are more technological, and there are0:41:22
technological instruments or electronic instruments this one oh I show a picture because I also have one of those it's called the theremin and you know no0:41:31
sooner did we have the ability to turn electrical signals into vibrating loudspeakers, by having recorded stuff in order to send the signals through, than somebody said, "I wish we didn't have to record stuff. I wish I0:41:41
could just make up an electrical signal and send it to the speaker. Let's cut out that performing/recording part, let's just go right for the sound." And so electronic music was born,0:41:51
and this is one of the first electronic instruments. You play this thing, it's two antennae; the vertical one controls the pitch: the0:42:01
closer you get, the higher the pitch, the further away, the lower the pitch. And the horizontal one controls the volume: the closer you are, the lower the volume, and the further away you are,0:42:10
the higher the volume, so you can silence it by touching it. And you do that, and that's all you've got; you don't actually touch them at all. And the knobs just change the timbre a little bit, but0:42:20
really not very much it's not really about that this is an incredibly difficult instrument to play it was one of the first ones then things grew up and now we're starting to see things0:42:30
more like like we know we understand so these are some of the first electronic instruments that were made or these are the pieces of that instrument you saw before each module does a particular0:42:40
thing it might generate sound or generate certain wave shapes or it might be a filter that trims off high frequencies from those shapes or it might generate a low frequency0:42:49
oscillation you can use to multiply something else and one of the really cool things about this is not only do you see the first thing you know first or not the first ones but not only do0:42:58
you see examples of physical modularity but you also see examples of control so there's these little holes these jacks on the front of these things and they0:43:08
actually take or take in or output control voltage it's just voltage you send a voltage in and then depending on the module the voltage might control the0:43:17
pitch or the frequency of oscillation or something or the frequency at which the filter kicks in or the wave shape or various things like that and then the0:43:26
the knobs are are redundant things that sort of give a human interface to what0:43:35
you could have done through control voltages by plugging something into the jack and that there's something incredibly interesting about this and then you see behind these as there's a0:43:44
circuit right so we have the layers of of effort but this is a really good lesson here about human versus machine interface these things had a machine interface0:43:53
first it was all control voltage and then they put knobs on it so you could patch the control voltages around and build customized things imagine if0:44:02
someone had built something (SQL) without any machine interfaces (Unix), but0:44:11
primarily with human interfaces, just the knobs. Like, if I gave you a bunch of these modules and they just had the knobs, and I was like, "Put it together," you're gonna be like, "By doing0:44:21
what? Like, putting little remote controllers on the knobs?" Sort of like generating SQL text strings or something. Why would I want to do that? Or parsing0:44:30
random output from Unix programs, or specifying command-line arguments? It's awful, right? But these hardware guys are smarter than we are, so they built a human interface on top0:44:42
of the machine interface. So we're seeing this stacking that occurs, right? This guy, whose name I'm gonna mess up, Yves Usson, is somebody who's0:44:54
awesome. He can work at all the layers of the stack. I think he's a biochemist or something, but in his spare time, when he's not a biochemist, he's a C++ programmer, and in his other spare time he actually0:45:05
designs these modules you see behind him a rack of these modules but he designs the modules so he can do the electronics work associated with building a module0:45:14
and building like an analog filter for instance or an analog generator and then he obviously can compose them he helps0:45:23
people make kits and then you can build them into racks and then you patch them together so then you're at another level of design where you're patching things together and setting the knob positions0:45:32
and designing a sound, or patch, they call them, but designing a sound out of the modules. And then maybe sometimes there's a little keyboard there, so sometimes he gets to play the keyboard and make music0:45:43
with this. But there are all these layers associated with what he does and can do. He happened to work with this company,0:45:52
Arturia, to produce an analog synth, which is rare these days. It used to be all synths were analog, like the pictures I was showing you, but now they're kind of0:46:01
rare; everything's become digital. They came out with this analog synth, and he helped them design it, and he really did design it. And what's interesting about what he did there was this thing doesn't have wires0:46:10
coming out all over the top of it the decisions about it has the same kind of modules inside it but the decisions about how they go together he made he said or he helped them make0:46:19
he said we should make this go to that and this is how the fool should work and these were what the parameters should be there are still knobs on the top but a lot of the other stuff has been incorporated in a design that allows0:46:28
people to only work at the next level up they do not need to care about what's inside this box and it's quite an important thing because for him right he0:46:39
has different days he has days when he's patching stuff together maybe his days when he's playing this thing and that's all fine but days when he's soldering he0:46:48
is not making music right and and this is what happens to us right this is what happens to us when we say you should use emacs right it's like somebody wanted to make music and you gave him a soldering0:46:57
iron it's like here you go have at it start at the bottom and why does that happen to us and the reason is because for us it's the same stuff all the way0:47:08
down. In that space it's very different, right? Designing an analog filter is a pretty tricky thing just from a mathematics perspective, and then0:47:17
there's all the, you know, componentry associated with the electronics aspect of it, then there's actually being able to solder and put it together on a circuit board. And then somebody with0:47:28
a completely different skill set can go and say, "I can patch these things together and turn these knobs and listen, and understand what the architecture of these things is, and make a sound." Someone else could walk up to that0:47:37
whole patch and say, "I can make a composition with this sound." But for us, we have the same stuff at all the levels: it's code. The top level's code, the0:47:47
middle level's code, the bottom level's code. We can do it all, right? We have the same mechanism at every layer. So essentially we all do have soldering irons; it's like, any time you want to, you0:47:56
can start soldering. You're supposed to be up here doing this, and you could just start soldering. And just because we have the soldering iron,0:48:05
it doesn't mean we're capable of doing things at all layers, but we just do, because we can; we've got the iron in hand. And I think it leads to a lot of distraction and expansion of scope of0:48:15
things it's like I was working on this and I realized if I rewrote the driver I could be you know ten percent faster and now I'm doing something I shouldn't be doing so there's a sense in which having0:48:26
so much control over so many parts of the stack gives us this paralysis. There's so much that we can do at every point, right? So what are we going to do? And I think that we need to, and then of course0:48:36
the problem space has some constraints but we need to bring constraints of our own into play we have to do this for ourselves the same way composers do it for themselves or choreographers or0:48:45
directors do it for themselves they bring constraints in to help them move forward right this is not a new idea it's a very old idea but it's one we0:48:54
have to keep remembering constraint drives creativity when you don't have a lot of choices you're forced to pick an answer and move on you've got choices you could just Mull around about the0:49:03
choices all the time. So making your own constraints is a way to help you do that. So I think we need to quit fidgeting, right, and glomming stuff on0:49:12
and fiddling around with things and tweaking. I mean, oh my god, as an industry we spend an inordinate amount of time focused on ourselves: build tools, you0:49:21
know automating this and that and like just crazy crazy crazy stuff and talking about it and everything else and we should just be focusing on what we're0:49:30
doing because what ends up happening is when you keep fiddling with stuff and when you have no limitations to scope and no constraints right what happens0:49:40
right this thing happens and every one of those parts may be a good idea right it's probably all good ideas but if you take every good idea you end up with0:49:50
that, right? And, like, I don't care if you can configure this thing with Spring, right? It's just not playable, right? No one0:50:00
wants to play this. And in fact, this particular one, no one does play; it plays itself. The actual patching of it is0:50:09
the composition, and it's got, like, stochastic elements that cause it to generate novelty, and it just plays itself in a museum. And0:50:20
I mean maybe we want programs like that but maybe we don't so you know in in so we should push back I think you know0:50:30
especially in open source projects right this is constant pressure like take my good idea take my good idea take my good idea and they're all good ideas but you know whatever and we need to remember0:50:39
right there are people who make music by waving their hands through the air that's it they don't need Emacs or anything else0:50:49
they can just do this and I'm telling you if you've ever tried to play the theremin and have it sound like anything other than a siren or a spaceship it's0:50:59
brutally difficult to do and and so I mean I don't know if you can see in her face but she's not making a face like the other guys but she is engaged I think the reason why she's not making a0:51:08
face is because the pitch changes if you make a face it's that sensitive but at some point go listen to that because it's it's it's beautiful so so what is0:51:20
design? If we take a step back and sort of merge all these things together, there's a sense in which designing is imagining, right? If it's not just regurgitating something that's0:51:30
already happened before, you're facing some set of problems, and you have to imagine potential solutions, right? And the first thing you need to do is rush at the constraints. You don't want to be0:51:39
like, "Don't constrain me, I'm trying to design." It's the opposite of that; you're like, "Gimme, gimme, gimme the constraints, I want to know about everything, and if you haven't given me0:51:48
enough constraints, I'm gonna make up some, because I want this thing to work." Of course, when you're facing all these constraints, it seems negative: I can't do this, I can't do that, it must0:51:57
do this in this size, and whatever. It's like, oh, you know. So there's a sense in which designing is fundamentally an optimistic activity, but you have to stay positive in spite of0:52:06
all these constraints coming your way. You have to stay positive. Remember, people do design who have no constraints, and they pick constraints in order to get outcomes. So that0:52:16
optimism can be born of the fact that this works, and this is a way to make systems that work. And you want to imagine a ton of things. However, actually0:52:27
designing is about making decisions, which means you try to think up a hundred times as many things as you actually use, way more things than you use. You don't want to think of one thing0:52:36
and be like, "Okay, let's go do that." You want to think of ten things and then say, "This one is the one we want to do." So you want to admit very little; you want to be able to say no, right? Because the value0:52:47
that you convey in your design is strictly about the decisions you've made, right? When he helped make that synthesizer, he made a set of decisions.0:52:58
Are they perfect? No. Are they everything I want? No. Do I wish I could patch a wire from here to there? Yeah, sometimes I do. But you know what, I really appreciate the fact that this thing just works and it sounds great,0:53:07
and I can do the next thing I don't have to fiddle around with the inside of it so if you leave all the options open you're not designing that's not design0:53:16
everything, everything configurable, that's not design; that's like, do your own thing. So performing is preparing; it's0:53:26
planning right you have to practice you have to study and in the end what you want to try to do is develop0:53:35
sensibilities that you can apply when you're trying to write code right if writing code is the performing part you have to have patterns techniques knowledge about what works and what0:53:45
doesn't to apply to what you're going to do you cannot just make it up as you go so design is taking things apart in0:53:55
order to be able to put them back together and that's really all it is every time I encounter something I can boil it back down to that every time I encounter something that I wish my0:54:04
design was better I need to do more of this it's over and over and over again it's always this I did not take it apart enough you want to design like Bartok0:54:13
that is to say you want to communicate very well you want to be able to work at multiple levels and you want to code like Coltrane you want to take0:54:22
preparedness and experience, real experience with doing things, not experience by doing the same thing over and over again, and bring them to bear in0:54:31
what feels like a more improvised thing, right? I'm encountering a new scenario in a programming project; I'm really not making it up, I'm really bringing my background into0:54:40
play to solve that problem. I think you want to find and choose languages and libraries that are like instruments, in all the ways that I talked about, in0:54:49
terms of being simple, directed at one thing, oriented around people that know how to use them, and expressing and0:55:01
backing some fundamental excitation or idea. Those are going to be the most satisfying. And in the end, pursue harmony in your own designs; try to think about0:55:11
the nature of harmoniousness in software, right, what makes things work together, and apply that. Thanks very much for listening, and I hope you enjoy0:55:20
the rest of the conference [Applause]0:00:00
Rich Hickey On Clojure
0:00:00
0:00:12
I think I'm here to talk about Clojure. It's a little bit different from the other languages you've seen, so I'm gonna try to show you a little bit more of it, but I don't have a lot of time, so I'm gonna go very quickly. It's about three0:00:23
years old. It is a Lisp, but if you think you know Lisp, or hate it, or whatever, if you could just set any preconceived notions aside: I've made a lot of changes,0:00:33
enhancements, and structural differences from Lisps of the past, and in particular Clojure is not a port of any existing Lisp or the standard Lisps. In0:00:43
particular, one key difference is that Clojure is functional. We've heard the word talked about a lot, mostly in terms of higher-order functions being you know has two functions and methods,0:00:55
but really I think the core of functional programming is an emphasis on immutability and on referential transparency, on calling functions that0:01:04
are free of side effects. I wanted a functional language on the JVM for a couple of reasons. One, I'm a commercial software developer, and the JVM0:01:13
and .NET are sort of the only two accepted platforms, but I want to program at a higher level. So Clojure is designed to support the kinds of programs that0:01:22
programmers write in Java, and my goal is to try to target the application space of Java. In particular, those commercial0:01:31
programs generally involve concurrency these days, so a prime objective of Clojure is to support concurrency explicitly in the language. So you'll see0:01:41
the combination of the way it approaches immutability in its data structures with these built-in primitives for concurrency; it's sort of got a complete package for functional programming on0:01:51
the JVM. Clojure was also designed for the JVM; it's not a port of another language, and it was designed to be hosted, to interoperate with the host, to expose Java, embrace Java, call Java0:02:02
libraries. It shares with Java a type system, the calling conventions, and everything. It is a pretty much barrier-free interface to Java.0:02:13
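The talk shows no code at this point. As a minimal sketch of what calling Java from Clojure can look like, the forms below use standard JDK classes; the particular classes and the `def` names are illustrative choices, not taken from the talk.

```clojure
;; Calling Java from Clojure: a small sketch of host interop.
;; The classes and names used here are illustrative assumptions.

;; Instance method call on a java.lang.String.
(def shouted (.toUpperCase "hello"))

;; Static method call on java.lang.Math.
(def biggest (Math/max 3 7))

;; Construct a Java object and call methods on it.
(def joined
  (let [sb (StringBuilder.)]
    (.append sb "clojure")
    (.append sb "-on-jvm")
    (.toString sb)))

;; A Clojure string literal is a java.lang.String instance.
(def string-class (class "hello"))
```

No wrapper layer is declared anywhere in the sketch: the string literal is passed straight to a Java method, which is one way to read "shares a type system" with the host.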
being a lisp it's dynamic and of course now lots of languages are dynamic and have learned many things from list so has the typical things it's interactive0:02:22
it has a ripple it sports that kind of devel you can change programs while they run particular has the characteristic of list that makes lists a lisp which is0:02:31
that code is data you represent your code as data structures on the compiler and evaluator evaluate data not text so it makes it possible to write programs0:02:40
the write programs which is the key to macros and syntax and tactic abstraction closure has a very small core you know0:02:49
half a dozen primitives plus some extra primitives for accessing Java which means there's a good chance I could get that part right in addition it makes a0:02:59
major change to Lisp I think in lifting off of the data structure which was the console that represented singly linked0:03:09
lists it lifts that abstraction off of that data structure and allows the libraries to be built on top of an abstraction and that's made the language0:03:18
much more powerful than Lisps have been in the past and in particular also allows the power of Lisp to be extended to Java in addition Clojure is not0:03:28
object-oriented I'd be happy to talk later more about that but it's a feature of Clojure Clojure has all the0:03:39
basic data types you would expect in particular it has good math that doesn't wrap so arbitrary precision integers0:03:48
doubles BigDecimals ratios those are all represented by their Java capital-letter types except for ratios which I had to add it has strings it has character0:03:57
literals it has symbols and keywords which are a little bit different they're used for identifiers and for keys but they're prime data types in Clojure so0:04:08
you compose your programs out of these data types there are boolean literals and nil unifies with Java null they are the same thing and they represent nothing in Clojure and it also has regex literals0:04:20
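As an editorial illustration of the atomic data types just listed, here is a minimal sketch in Clojure. The values and names are made up for the example, and the behaviour noted is that of current Clojure releases, which may differ in detail from the release described in the talk (for instance, arithmetic that promotes to arbitrary precision now uses the primed operators such as `*'`).

```clojure
;; Scalar literals (illustrative values, not taken from the talk's slides).
(def a-double   3.14)      ; java.lang.Double
(def a-decimal  1.5M)      ; java.math.BigDecimal
(def a-ratio    22/7)      ; clojure.lang.Ratio, exact rational arithmetic
(def a-string   "hello")   ; java.lang.String
(def a-char     \a)        ; java.lang.Character
(def a-symbol   'foo)      ; symbol, used for identifiers
(def a-keyword  :foo)      ; keyword, typically used as a key
(def a-regex    #"[a-z]+") ; compiled java.util.regex.Pattern

;; Math that does not wrap: in current Clojure the primed operators
;; promote to arbitrary precision instead of overflowing.
(def big (*' Long/MAX_VALUE 2))

;; nil is the same thing as Java null.
(def nothing nil)
```

Ratios stay exact until they reduce to a whole number, so multiplying `a-ratio` by 7 gives back an integer rather than a rounded double.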
so these are the atomic data types then it has a set of data structure literals okay so there's a way to write these0:04:29
data structures in your program and they mean what they say here okay parenthesized set of things they could be numbers they could be anything0:04:38
is a list and lists are singly-linked and they grow at the front whenever I say grow change add remove there's0:04:48
quotes because nothing actually changes but they grow at the front they have fast access time to the front linear access time down the line Clojure0:04:58
also pretty uniquely has immutable vectors which are in square brackets they support very fast indexed access0:05:08
and in addition they behave like ArrayList in that they grow at the end in constant time there are map literals they're in curly braces it's key value key0:05:19
value key value key could be anything value could be anything all of the data types are values so the key could be a vector if you want ditto the value0:05:28
commas are whitespace put them in if they make it more readable but they're not mandatory a set is in curly braces0:05:37
preceded by a hash and that will be a unique set of things everything nests that is the syntax of the language0:05:49
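The four data structure literals described above can be sketched as follows. This is an editorial example with invented values, not a reproduction of the slide.

```clojure
;; A list: parenthesized, quoted here so it is read as data, not a call.
(def a-list '(1 2 3))

;; A vector: square brackets, fast indexed access.
(def a-vector [1 2 3])

;; A map: curly braces, key value pairs. Commas are whitespace.
(def a-map {:a 1, :b 2})

;; A set: curly braces preceded by a hash.
(def a-set #{:red :green})

;; Everything nests, and any value can be a key, including a vector.
(def nested {[1 2] #{:x}, :items '(1 [2 3])})

;; "Growing" returns a new value; the originals are unchanged.
(def longer-list   (conj a-list 0))    ; lists grow at the front
(def longer-vector (conj a-vector 4))  ; vectors grow at the end
```

Note that `conj` adds at whichever end is cheap for the structure, which is why the same call puts the new element first in the list and last in the vector.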
there's nothing else those data structures are the code there is a text representation that's what I just showed you and there's something called0:05:58
a reader which can read that out of a file and turn it into the data structures at that point those data structures get handed to an evaluator which compiles them immediately to Java0:06:10
bytecode where they're executed that's if you ask for them to be evaluated you can also use the reader to read data it's a nice format nicer than JSON or XML but in0:06:19
particular the syntax of the language is not really based around text it's based around these data structures and how they're interpreted and everything else0:06:28
is very simple everything that would have been an operator or declaration or anything else is a parenthesized list where the operator is the first thing in0:06:39
addition there are no statements everything is an expression so it's a pretty simple language this is a spell corrector written by Peter Norvig in0:06:49
Python and I don't expect you to be able to read this the Python is not too bad but I just wanted to give you an idea of the syntactic weight of Clojure Python is an excellent0:06:59
target because I think it has one of the lowest syntactic weights of any language given that it uses whitespace in the way it does so the Clojure program is the same size it's the same number of0:07:09
lines as the Python program those are the two shortest versions submitted to Norvig he had many people do them in many languages these programs are0:07:18
roughly equivalent so it's pretty I'm not really gonna have time to explain all this thing but it has lots of cool0:07:27
features list comprehensions pervasive destructuring all kinds of good stuff some syntax related to Java you can0:07:37
access statics by namespace slash the name that'll work for methods or will work for data variables there's a syntax0:07:47
with dots that allow you to get at members generically a member could be a member of a static thing or a member of an instance and of course Clojure has macros and what you're seeing here0:07:56
already is a macro dot dot will take an arbitrary number of things and will put dots in between them as if you wrote that in Java so that's System dot getProperties dot get java.version we can0:08:09
call new we can also call new like this again though we have macros so instantly we can write code that's a lot shorter0:08:18
than Java code doto a new JFrame add a new label with this name pack it and show it0:08:27
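The interop forms just described can be sketched as below. This is an editorial example: the slide itself is not in the transcript, and a `java.util.ArrayList` stands in for the JFrame so the sketch runs without a display. All the `def` names are invented for illustration.

```clojure
;; Static members: Namespace/name works for methods and for fields.
(def abs-val (Math/abs -3))
(def max-int Integer/MAX_VALUE)

;; Instance member access with a leading dot.
(def shouted (.toUpperCase "hello"))

;; The .. macro chains member accesses, like
;; System.getProperties().get("java.version") in Java.
(def java-version (.. System (getProperties) (get "java.version")))

;; Two equivalent ways to call a constructor.
(def sb1 (new StringBuilder "abc"))
(def sb2 (StringBuilder. "abc"))

;; doto creates one object, calls several methods on it, and returns it.
;; An ArrayList stands in here for the JFrame mentioned in the talk.
(def names
  (doto (java.util.ArrayList.)
    (.add "fred")
    (.add "ethel")))
```

The values returned are plain Java objects: `names` is a real `java.util.ArrayList`, with no wrapper in between.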
you know the Java for that which involves a lot of this dot that thing this dot the other thing that type of thing and in fact this is the expansion so0:08:38
like most Lisps there's a really tiny core and the rest of the language is written in the language in terms of macros and functions the Java integration is complete0:08:49
and good and wrapper free okay you say import this thing you're using Java there's no wrapper classes nothing in between you and these calls I'll explain0:08:59
that a little bit more a little later okay this is sort of Clojure's reason for being the idea here is to allow people0:09:09
to do functional programming in a way that's roughly as easy as programming in Python or Ruby and it starts by making all of0:09:21
those data structures that you saw immutable so you write these literals and you program with these objects but they cannot change like really really can't change in the you know all0:09:31
their members are final kind of way you couple that with a set of library functions a huge set that manipulate0:09:40
these data structures that have no side effects so these functions take arguments they produce new values they don't change things and this really0:09:50
makes for better programs your programs are easier to understand easier to test your mock object is kind of a joke that0:09:59
object is mocking you for needing it but I think that even0:10:10
if you set concurrency aside functional programming yields better software but when you put concurrency into the mix I think0:10:19
functional programming is a huge huge win for many reasons and of course you read Brian's excellent book and you0:10:28
know how many times it says immutable many many times but it's hard Clojure makes that easy because it's the default0:10:37
it's really functional like let-bound locals are immutable I don't even call them variables because it would just be too disappointing there's a functional0:10:48
looping construct which sort of works around the fact that there's no tail calls and of course it does also have higher-order functions and and that kind0:10:57
of thing the library this is just a small taste of it for sequences is extremely powerful and it allows you to write programs that when you read them0:11:06
they explain what they're doing as opposed to you know imagine the equivalent loop you would have to write in Java and imagine reading that loop to try to determine oh he's splitting this0:11:16
up into pieces oh you know he's putting these in between those these functions tell you what they do but there's a couple of things that are interesting about sequences like I said it's an0:11:25
abstraction on the list concept of Lisp but because it's based around an interface in Java it means that it can be implemented on many data structures0:11:34
not just lists and in fact all of these functions and all of the sequence functions in the library are supported on all of the data structures of Clojure on Java strings Java arrays everything0:11:44
Java iterable all Java's collections all of this stuff works on all of that so it's very nice another aspect of raising0:11:54
that abstraction off of a concrete data structure means that we can implement sequences lazily that cycle function returns an infinite sequence of numbers0:12:03
okay take only uses the first nine but it returns an indefinite amount all of these functions and the vast0:12:13
majority of Clojure's sequence functions are lazy that is to say they do not produce a complete result it's another advantage of not working on a concrete0:12:22
data structure is that your intermediate computations do not need to return a complete intermediate data value you can you can concatenate a bunch of0:12:32
operations and walk through it and it's effectively like pulling on a pipeline and you're getting one thing at a time you're trading off some ephemeral0:12:41
garbage for never needing to have a giant data structure or a set of giant data structures in your program so it's all very pretty oops0:12:53
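The sequence behaviour described above, one library over many data structures plus laziness, can be sketched as follows. This is an editorial example; the particular calls and names are chosen for illustration and are not the ones on the slide.

```clojure
;; cycle is infinite; take realizes only what it needs.
(def first-nine (take 9 (cycle [1 2 3 4])))

;; The same sequence functions work on Java strings, Java arrays,
;; and Java collections, not just on Clojure's own data structures.
(def from-string (apply str (interpose \, "abc")))
(def from-array  (map inc (int-array [1 2 3])))
(def from-java   (partition 2 (java.util.ArrayList. [1 2 3 4])))

;; A lazy pipeline over an infinite sequence: each stage pulls one
;; item at a time, so no complete intermediate collection is built.
(def squares-of-evens
  (->> (iterate inc 1)
       (filter even?)
       (map #(* % %))
       (take 3)))
```

Because every stage is lazy, `(iterate inc 1)` never has to finish; only enough of it is consumed to produce the three results that `take` asks for.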
in addition there's an equivalent set of functionality for maps and sets you define well def just makes a top-level0:13:02
variable right we have a little literal map maps are functions of their keys sets are functions of their keys arrays are functions of their indexes because0:13:11
well they really are and so Clojure lets them be that so you can call a map passing it a key and get the value at that position in addition there's a0:13:21
little extra sugar keywords themselves are functions of associative things so we can look up something in a map by just doing0:13:30
or that we can access the keys of the map the vals of the map there's all kinds of functionality associated with maps associate or assoc is the way0:13:41
you would associate new keys and value pairs with an existing map and you get a new map M is not changed by that and0:13:53
there's some really nice functions right you need to walk through a bunch of maps and increment counters per key that's it merge-with plus a map and a0:14:05
set of other maps happens automatically want to build lists you merge-with conj move up a level working with sets there's a0:14:14
set of set logic Union intersection move up one more level sets of maps are relations and there's a relational0:14:24
algebra for that it's a neat one because it doesn't require rectangles so I said everything0:14:34
is immutable which should be scary because immediately people say oh my god the amount of copying must be incredible and it's not viable to do full copying0:14:44
so there's this notion of persistent data structures it's an overloaded word like all the overloaded words we have in this case we mean persistent data structures are data structures that0:14:53
are immutable where making a new version is cheap and both the new and the old version are available after the operation and both the new and the old version support all the operations of0:15:03
the data structure with the same time characteristics as are promised by the data structure so you can modify an element in a vector in0:15:13
constant time and both the new and the old version have access to the elements in constant time so they can't be full0:15:22
copies making a full copy takes a linear amount of time and that wouldn't match the performance guarantees so of course the trick is to0:15:31
use structural sharing but it ends up that vectors and maps are hard persistent data structures lists you see all the time binary trees red black0:15:41
trees easy but log n is usually the best you can do Clojure's persistent data structures are particularly neat they're sort of the secret0:15:51
sauce of Clojure I based them around Bagwell's bitmapped hash tries but I made them persistent his design was not persistent they're much faster than big0:16:02
O log n they're actually big O log base 32 of n which is a great example of the difference between you know two things0:16:11
being in the same shape and one being practical and the other not the constant factors matter and this is a little bit of how the representation looks a bitmapped hash trie0:16:21
divides the hash into five-bit chunks which are used for 32-way branching yeah Java there are0:16:36
interfaces there are Java interfaces for absolutely everything I'm showing you this this library is completely factored into a set of abstractions those abstractions for sequences lists Maps0:16:46
vectors associative reversible it's a nice set of abstractions and you can use it from Java ok we're doing fairly good0:17:00
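The map functions and the persistence guarantee described above can be sketched as follows. This is an editorial example with invented data. The `level-index` helper is hypothetical: it only illustrates the idea of taking five bits of a hash per level for 32-way branching, and is not Clojure's actual implementation.

```clojure
(require '[clojure.set :as set])

(def m {:a 1, :b 2, :c 3})

;; "Changing" a map or vector yields a new version; the old one is intact.
(def m2 (assoc m :d 4))
(def v  [0 1 2 3 4])
(def v2 (assoc v 2 :changed))

;; merge-with + sums the values found under the same key.
(def totals (merge-with + {:a 1, :b 2} {:a 10} {:b 20, :c 5}))

;; Set logic lives in clojure.set.
(def both (set/union #{1 2} #{2 3}))

;; Hypothetical helper: the 5-bit slice of hash code h used to pick
;; one of 32 children at a given trie level (level 0 = lowest bits).
(defn level-index [h level]
  (bit-and (bit-shift-right h (* 5 level)) 31))
```

Maps are functions of their keys and keywords are functions of maps, so `(m :b)` and `(:c m)` are both plain lookups. With a hash of 65 (binary 00010 00001), `level-index` selects child 1 at level 0 and child 2 at level 1.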
so that's how it works so it's efficient to use these persistent data structures which makes it practical and that was the key problem I had to solve for0:17:10
Clojure because I couldn't convince professional developers to use this if I didn't have hash tables really everybody needs them so now I'll talk a little bit0:17:19
about concurrency I'm not really talking about data parallelism like the kind of stuff they were talking about in the Fortress talk I have some support for0:17:28
that I call you know I have a wrapper around fork/join and do some of that stuff but here I'm really talking about task level parallelism the coarse grained parallelism most applications do0:17:38
when they set some threads off to listen to a socket and a couple more to talk to the database and a couple more to deal with the web and this is a big part of the architecture of Clojure so what do0:17:48
we do now right we have direct references to mutable objects this is a Java program and really any object-oriented language that isn't sort of special right so we0:17:57
talking Java and Python Ruby they're all the same in this way in that they unify identity and value so we have direct references to mutable things which means0:18:06
if we're going to have concurrent operations that might change them we have to use locking in Clojure we're going to add back those method handles right or those object handles in order0:18:17
to get an indirection between a mutable reference and an immutable object it's inspired by SML's ref and then Clojure0:18:28
adds concurrency semantics for the references there are three different reference types which now means that the language can help it can enforce these concurrency semantics and you end up0:18:38
with no locks on the user side so this is what happens now there's gonna be many points in your program that are pointing to a chunk of memory okay and0:18:48
that chunk of memory is the identity of the thing so once that's the case they have to take care of all the stuff by themselves right any kind of consistency0:18:58
for that chunk of memory they have to arbitrate and there's no language level support for that so in Clojure what we do0:19:07
is we have an indirection they're going to point to this box that's their identity that box in turn points to an immutable data structure any of the0:19:16
things I showed you could be a vector it could be a map it's often a map people will in Clojure use those maps instead of where you would use objects so we separate identity and value to obtain0:19:27
the value you have to dereference the values can never change so you can never get an inconsistent value if you've got the value you're safe and you can share0:19:37
it easily between threads sure you can this is really if you've ever read Henry0:19:47
Baker's paper on equality Clojure implements egal as he called it where equality flows through until it hits the modifiable thing right and then two modifiable things are only equal if0:19:57
they are the same modifiable thing so you could have a member of a structure that was the reference but I0:20:08
don't recommend it the one time you do that if you try to sort of control your concurrency maybe lighten up your contention by0:20:17
having more references yes I'm gonna this is still an abstract description0:20:27
I'm going to show you the three variants of these cells so we haven't seen how to use this yet this is the concept so the concept is we have a reference to a0:20:36
thing what does it mean to edit something then okay we're gonna have this immutable thing we just saw a bunch of cool functions I can take one immutable thing and make a new one with some changed value so that's what's0:20:45
going to happen a new value is a function of the old value they may share some structure it's all immutable it's all fine so thread-safe no problem never impedes the readers if0:20:55
you're reading this well if somebody's creating that again no problem and then we edit okay0:21:04
the trick is this swap to say this now points to that that's under the system control there are semantics for that in all cases no munging about with no with0:21:16
no control so it's always coordinated there's three semantics I may get to show you two the next dereference will see the new value any consumer of either0:21:25
value is unaffected by this edit so I've been talking about references generically that box there are three0:21:34
kinds they're called refs agents and vars refs allow coordinated change and it's a transactional system agents are0:21:45
asynchronous autonomous change you can think a little bit about actor models but the difference between Clojure's agents and actors is you do not require0:21:54
message passing to read values from agents vars are thread-local so in all0:22:04
cases you have a safe way to change things all right refs and transactions Clojure has an STM it's my own0:22:13
homemade STM I have big differences with the design of a lot of STM's because I think they're designed to allow you to keep doing what you've always done which is work with mutable objects and they're0:22:22
just going to turn every mutable field into a TVar or whatever I don't actually believe that's gonna work but good luck0:22:31
so it's an STM it has your usual expectations of transactions in fact it very much maps to what you think of when you think about a database0:22:40
transaction I have atomicity isolation I also have consistency you can hook up a consistency function to any of these refs and it will be run before the state will change if it doesn't pass the state doesn't0:22:50
change I don't have durability because this is an in-memory system right and then it's basically everything happens or nothing does right your usual expectations the big difference with an0:23:00
STM is that transactions are speculative they may get run more than once okay they're going to happen they may get to a point they may see somebody else won they'll try again so you have to avoid0:23:12
side effects I actually can't help you with that because you can call Java and I can't tell if that's going to be ok or not very easy to use all you do is to0:23:23
surround your code with dosync any number of changes to references either directly in that body or in any function that it calls will participate in the same transaction if those are in turn in0:23:32
their own transactions they join the outer transaction I use multi-version concurrency control which is also a unique part of Clojure's STM it means0:23:42
that readers see a consistent snapshot view of the world readers don't have to get restarted due to the writers ever and0:23:51
there's no read tracking so this is what it looks like to use it we'll define foo to be a reference to this map0:24:03
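The slide being walked through here is not part of the transcript. A minimal sketch of such a session, assuming a made-up map for `foo` and invented helper names, might look like this:

```clojure
;; foo is a ref: an identity that points at an immutable map.
;; The map contents are invented for this sketch.
(def foo (ref {:a "fred", :b "ethel", :c 42}))

;; Dereferencing gives the current value.
(def snapshot @foo)

;; Making a new value from the old one does not change foo.
(def bigger (assoc @foo :d 17))

;; commute takes a ref, a function, and arguments. Outside a
;; transaction it is rejected with an IllegalStateException.
(def outside
  (try
    (commute foo assoc :d 17)
    :changed
    (catch IllegalStateException _ :rejected)))

;; Wrapped in dosync, the same call succeeds.
(dosync (commute foo assoc :d 17))
```

After the `dosync`, `@foo` includes `:d 17`, while `snapshot`, taken earlier, is still the original three-key map: consumers of the old value are unaffected by the edit.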
ok we can dereference foo we see what's in there we can grab the value of foo and make a new value we haven't really changed foo we're just playing with its value foo is unchanged this function is0:24:15
one of the transactional functions there's alter commute and ref-set commute takes a reference a function0:24:24
and any arguments and will run that inside the transaction except I'm not in a transaction so not ok again the system0:24:33
is helping you you tried to change something outside of a transaction that's not going to happen wrap that same thing in a dosync and it works I only have five minutes left so0:24:44
I'm not gonna have too much time to talk about this agents do independent state it's sort of a message system where you say send the things you're going to send are going to be functions they'll make0:24:54
the change in a thread pool at some point in the future there's no way to coordinate more than one of these changes at the same time they're not the same as actors what's0:25:04
cool about it is it looks very similar foo is an agent with that value we can dereference it the same way we can send foo a function with arguments it0:25:14
will change at some point in the future if we immediately went to read it maybe we wouldn't see that change if we await this basically puts a message in its queue and waits for it to be0:25:23
consumed now we know this will have happened we'll see that if nobody else is also talking to foo the Java0:25:32
integration in Clojure is very good there's a lot of unification Clojure strings are Java strings the numbers are Java Numbers the collections all implement Collection0:25:41
my functions in Clojure are Runnable and Callable passing them to executors that all works perfectly well everything I've shown you all the core abstractions have0:25:50
interfaces so they're very easy to consume from Java it's also easy to extend Clojure from Java or Clojure as I0:26:00
said before the sequence library works on everything you can implement and extend Java interfaces and classes in Clojure and I have primitive support0:26:10
I'll talk about in a second from an implementation standpoint I compile dynamically to bytecode in memory I use ASM it's great I don't do any ahead of0:26:20
time compilation right now every function turns into a class one interesting thing I think about closure is that those function objects actually have a overloaded arity so you can pass0:26:32
a function object that has multiple methods with different arities so it doesn't do any arity munging together and it can be very fast you can pass a0:26:42
function that might be called with two arguments or three and they'll both be as fast as possible all my signatures take and return objects what's cool is my variadics are built on the0:26:51
sequences so you can write a function and pass an infinite set of things and as long as it doesn't try to use them all it will work calls all the function0:27:01
calls are straightforward method calls so there's no type system I don't have any thunks I don't pass any extra arguments in it's very easy to call from Java you0:27:12
can call Java there's only two things are gonna happen it'll have to be a reflective call or it'll be a direct call right there's no wrappers caches thunks none of that stuff0:27:21
I support type hints with the very slightest bit of type hinting and inference I can make all of your Java calls direct calls so there's no need for reflection0:27:31
and I personally don't need invokedynamic because of this it's amazing how few hints you need I do support locals that are primitives0:27:43
so I make them into primitives then I inline the math calls to static methods as James said before that's as far as you need to go I make a static method0:27:53
call HotSpot turns it into primitive math this is the code in Clojure this is the speed it's actually really pretty good but you say int int and boom that's0:28:05
exactly the same amount of time it takes Java it's exactly the same bytecode after HotSpot's finished so in your inner loops if you need to do fast math I support primitives and arrays of0:28:15
primitives it's exactly the same speed as Java my STM implementation is very interesting I'd be happy to talk to0:28:25
anybody about it who's interested in STM the two things I would ask for if we're asking for things I'd really love tail call0:28:34
optimizations it's a big deal it's a big criticism of Clojure being on the JVM the JVM has no tail calls I've worked around0:28:43
it considerably I think between loop recur and the lazy sequences for people that bother to try Clojure they find the pain of not having tail calls is not that great I still want it and as0:28:53
everyone has said so far I would love tagged fixnums or anything that makes working with boxed math fast because I don't want to have people make those0:29:03
declarations so to conclude I've been very happy I love the JVM I think it's been a great platform because I directly targeted it0:29:12
and because I targeted 1.5 I'm not sitting needing anything I work really well in 1.5 Clojure is really fast there's ten times as much to Clojure as0:29:22
I've shown you but it does fill a nice niche there was no dynamic functional language on the JVM now there is it's Clojure and it's only been out for a0:29:32
little bit less than a year and uptake has been rapid extremely rapid so as0:29:42
fast as I can go [Applause]0:00:00
Persistent Data Structures and Managed References - Rich Hickey
0:00:00
how many people program in a functional programming language0:00:09
okay so halfway preaching to the converted and not in a functional programming language a non-functional programming language still a lot of that I think0:00:21
this will be useful to both audiences in particular if you're not in a functional programming language in fact if you're not in Erlang which I think has a complete story for how they do state all0:00:32
the other functional programming languages have you know two aspects they have this is the functional part and then and then you know Haskell has this0:00:42
beautiful side where the type system keeps this part pure and then there's the other part which is kind of imperative do this do that and then they0:00:51
have a bunch of constructs to provide facilities for when you need state on that side similarly there are a lot of hybrid functional languages like0:01:00
Scala and F# where I think there are questions to be asked about okay here's the pure part what's the story about the other part so what I want to0:01:11
do today is to talk about functions and processes and to distinguish the two in fact the core concept in this talk is to try0:01:20
to parse out what we mean by identity state and values try to separate those concepts and see how programming with0:01:32
values while a really important part of the functional part of your program ends up being a critical part of the non functional part of your program the part that actually has to manage state and0:01:43
behave as if things are changing and there are two components to that one is how do you represent composite objects as values a lot of people who are new to0:01:52
functional programming wonder about the efficiency and representation issues there and I'll talk about that and finally I'll talk about one approach to0:02:02
dealing with state and change in a program the one that Clojure uses which is compatible with a little bit of philosophy I'm going to start with I'm0:02:12
not really going to talk about Clojure very much how many people were at my talk yesterday okay people who weren't know something about Clojure okay this is not really a0:02:24
Clojure-specific talk there'll be some code later and it shouldn't be too threatening just going to summarize quickly with this one slide what Clojure is about it's a dynamic programming0:02:34
language it's dynamically typed it's functional in particular it's functional in emphasizing immutability not just in you know supporting higher-order0:02:43
functions all the data types in Clojure are immutable it supports concurrency in that it's a0:02:52
two-part story one is you have to have good support for immutability and pure functions the other part is you have to have a story for when multiple things are happening at a time and you're going0:03:01
to have some perceptible change and Clojure does in fact I think it's an important part of a language that you know purports to be functional that it have a story about the non functional parts0:03:12
Clojure's not particularly object oriented it may be clear after listening to this talk why not because I think as currently implemented a lot of object technologies have big problems when they0:03:23
face concurrency and functional programming as I said from a conceptual standpoint nothing about this is really Clojure-specific so what do we mean by0:03:32
functions I think that there's a really easy way to say a function is something that you call and that's not what we're talking about here we're talking about a very precise notion of a function which0:03:41
is something that you call that takes values as arguments and produces a value as a return when it's given the same arguments it always produces the same0:03:50
value it doesn't depend on the outside world it doesn't affect the rest of the world so many methods in your classes are not functions by this definition but0:03:59
in particular two I want to highlight the fact that functions pure functions have no notion of time time is going to be a critical notion through this talk0:04:11
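The distinction drawn above can be made concrete with a small editorial sketch. The names `area`, `counter`, and `next-id` are invented for the example; an atom is used only as a convenient way to hold state.

```clojure
;; A function in the precise sense: same arguments, same result,
;; no dependence on or effect upon the outside world, no notion of time.
(defn area [w h]
  (* w h))

;; Not a function in that sense: the answer depends on when you ask,
;; because each call changes the counter that the next call reads.
(def counter (atom 0))

(defn next-id []
  (swap! counter inc))
```

Calling `(area 2 3)` twice always gives the same value, whereas two calls to `(next-id)` give different answers, which is the test for state used later in this talk.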
so what is functional programming there are lots of answers to this question and I think people that are into type systems will claim a stronger argument0:04:21
for what constitutes functional programming but I'm going to limit the definition here to you know programming that emphasizes programming with functions so you want to try to write as much of0:04:31
your program as you can with pure functions when you do that you get a ton of benefits they've been talked about in other talks it's not really the focus of this talk other than to say even without concurrency0:04:41
your program will be easier to understand easier to reason about easier to test more modular and so forth that all falls out of programming with0:04:51
functions to the greatest extent possible on the other hand when you step back and look at your entire program very few programs on the whole are0:05:00
functions you know that take a single input think about it and produce a single output maybe some compilers or theorem provers work that way0:05:09
but most real-world programs that I've worked on and I think most real-world programs in the real world don't work that way in particular well even if you0:05:20
claimed your program was completely functional if it's going to produce any output it's not because otherwise it would just warm up the machine but even if0:05:30
it's mostly functional there's still observable effects of a purely functional program running right it's running on a computer as soon as it's running on a computer it's not math anymore right it's a program running on0:05:41
a computer it's consuming memory it's consuming clock cycles it's observably doing something over time so all programs do things over time but most0:05:55
real programs as I say actually have observable behavior that is not just the fact that they're running on a computer but that they're doing things they're interacting with the outside world0:06:04
they're talking over sockets they're putting stuff on the screen they're putting things in the database in particular though we'll use one critical measure about how to define state which0:06:15
is if you ask the same question twice and you get different answers then there's state I don't care where you put it you put it in a process you put it in an agent an atom you know in a0:06:25
variable doesn't matter in a database if you ask the same question twice and get different answers at different times you have state so again the word time just came up again there so I think0:06:38
most programs are processes which means we need to talk about the part of your program that can't be purely functional the part that's going to have to produce a different answer at different0:06:47
times how do you do that and not make a complete mess out of what you've created with the shiny pure part in particular0:06:57
though I want to highlight the fact that this talk is strictly about the notion of state and time in a local context I'm talking about in the same process there0:07:07
are a completely different set of requirements and characteristics of distributed programs where you cannot do the same things that you can do in the0:07:16
same process so I'm talking only about same process concurrency and state so I0:07:25
want to be a little bit more precise about what I mean when I say identity state and value and these kinds of things in particular I want to talk about state and I'll talk about it twice0:07:34
one is just a generic statement state is a value of an identity at a time maybe none of that makes sense maybe it sounds0:07:44
like a variable from a traditional program right because I think if you ask somebody who's using a traditional programming language do you have state they're like yeah I have some variables0:07:53
and I change them and that is not a good sound definition of what constitutes state so are variables state do they0:08:04
do this job do they manage the value of an identity over time we could have a variable i we could set it to zero we could set it to 42 we can assign one0:08:14
variable to another right is j 42 that depends in a sequential program probably0:08:24
kind of sort of in a program that had threads what could go wrong well I didn't say what order these things happened in right or what threads they0:08:34
happened in for instance if you said j equals i in a separate thread what bad thing could have happened to you would it see 42 necessarily no definitely not0:08:45
because that memory may not have been flushed through to the other thread's cache okay i is not volatile necessarily what else could happen that's bad0:08:56
maybe i is a long maybe setting a long is not atomic in your programming language bad so variables are not going0:09:05
to be good enough to do the job of managing state all right they're predicated on a single thread of control they actually don't work at all otherwise they're horribly broken by0:09:15
concurrency the whole notion of this piece of memory does not work and our programs are built substantially on this whether it's a variable you know sitting on the stack0:09:24
or fields in your object same problems ok pieces of memory are insufficient abstractions so we have the problem the0:09:33
non-atomicity of long writes ok that's a problem in a lot of languages they're just not atomic you could get half of a number if you look at it from another thread right visibility and memory0:09:44
fences have to be accounted for once you have multiple threads of control on a true concurrent box if you have a you know an object and it has a bunch of0:09:55
these things collected together that constitute its state now you have the problem of composing you know a composite operation because making it into another valid state requires touching you know several of these0:10:05
variable things which now makes you impose locks or some sort of synchronization to say stay away from me so I can pretend there's only one thread0:10:14
of control because that's what my language thought when they wrote it or the language they copied you know thought when they wrote it all of these0:10:25
things are examples of the same problem we're having to work around the lack of a model for time ok because there's no point in having variables if you0:10:34
don't have time and just think about that for a second if there's no time notion why would you need a variable if you can't go back to it later and see0:10:43
something different and how is it a variable so if we want to be clearer about time which we're not going to be0:10:52
in a non physics lecture we're just going to say what are some things you think of when you think of time you think of things being before or after other things you think of0:11:01
something happening later you think of something happening at the same time two things happening at the same time you think of something happening right now which is sort of a self relative perspective0:11:12
on time but all these concepts are important in that they're inherently relative right when you think about time there's not a lot about time0:11:22
that's hours or you know this particular moment you know with a name on it most of our notions of time have to do with relative time the ordering0:11:31
between two discrete things what do we mean by values again here's a here's an area where there's just so much0:11:40
ambiguity and loose thinking that we can't write correct programs and so we sort of nail this down so the core characteristic of a value is that it's immutable right some values are obvious0:11:51
you know numbers we all are comfortable with that concept it's a value but I will contend that until you start thinking about composites of these0:12:00
things like numbers as values you're doomed in the future you may not be doomed right now but at some point it's going0:12:09
to be a problem for your programs so what went wrong okay we all think 42 is you know indivisible of course we saw if0:12:18
we stored it in a long it may not be depending on our language but the idea of 42 we consider to be an atomic concept but we have a big problem in the way we think about composite0:12:28
objects some of that falls out of our languages I think you know we have date libraries where you can set the month that's a crazy concept no there's0:12:39
this date and there's that date there's no setting the month of a date right there right away you have that problem if I set the month of a date it's0:12:48
another date if it's another date you have two dates you don't have one settable date and our class libraries have destroyed our brains in this area0:12:58
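The point about dates can be sketched in Java. This is only an illustration, assuming Java 8 or later, where `java.time.LocalDate` is an immutable date type; the class `DateValues` and the method `moveToMonth` are hypothetical names made up for this example. "Setting the month" does not alter the date you started with, it hands back a second date.

```java
import java.time.LocalDate;

// Hypothetical demo class; only java.time.LocalDate is a real library type.
final class DateValues {
    // "Setting the month" of a date value yields another, distinct date.
    static LocalDate moveToMonth(LocalDate date, int month) {
        return date.withMonth(month); // returns a new LocalDate; 'date' is untouched
    }

    public static void main(String[] args) {
        LocalDate january = LocalDate.of(2009, 1, 15);
        LocalDate march = moveToMonth(january, 3);
        System.out.println(january); // the original date is unchanged
        System.out.println(march);   // a second, separate date value
    }
}
```

After the call there are two dates in hand, which is the "you have two dates you don't have one settable date" idea.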
also the default behavior of our languages you create a new class in most languages and everything is variable and instantly you have this stateful mess that maybe you have to clean up with a0:13:07
lot of discipline on your part so I'm going to contend that dates sets Maps everything is a value and should be0:13:16
treated like a value and you should separate your concept of value from your concept of change so one more concept0:13:25
in the philosophy portion of the talk which is the concept of identity okay this is probably the most nebulous of these things but it's it's an important0:13:34
thing what happens in the real world what happens in the real world when we talk about you know today or mom or you know Joe Armstrong is that a single0:13:45
unchanging thing or one way to think about it is we have a logical entity that we associate with different values0:13:56
over time okay in other words at any particular moment everything is frozen okay in the next moment we look we see0:14:06
something different is that the same thing well if some force has acted on this thing to produce that next thing I consider it to be the same thing you know otherwise they're unrelated0:14:19
things two things can pass through the same space they're not the same thing because they're in the same space so a set of values over time where the values are causally related is something we0:14:28
need to name these are different values right they may be in different spaces I could walk over here I'm still Rich so0:14:37
what's happening what's happening is easy to understand if you have three notions right there's a state I'm standing right here there's a state I'm0:14:46
standing over here right they're both values you know if you could stop time for a second nothing about me would be changing and it's me because you know0:14:56
I'm using my legs to move myself over here and I'm still talking and you see a set of causal connections between me being here and being there so you say0:15:05
that's all Rich it's not two people you know doing this identities are not the same things as names I just want to make that clear okay I have a mother you0:15:15
now have that concept in your head this entity you know this identity but I call her mom and you would call her Mrs. Hickey I hope these identities can be0:15:25
composite we can talk about the New York Yankees or Americans right no problem those are sets but they're also identities they change over time but at0:15:36
any particular point in time they have a value these are the guys who run the Yankees right now any program that's a process needs to have some mechanism for0:15:46
identity okay this all goes together so I'll go back and talk about state we have some terms that hopefully mean0:15:55
something right now we can say a state is a value of an identity at a time hopefully that makes sense right the identity is the logical thing it's not necessarily a place it's not a piece of0:16:05
memory okay a value is something that never changes okay and time is something that's relative now it's easy to see I0:16:15
think why we can't use variables for state in particular that variable may not refer to something that's immutable already that's a problem if you make a0:16:24
variable refer to a variable okay you're building on sand okay the key concept is the variable or whatever we're going to do to manage time has got to0:16:35
refer to values sets of variables as we traditionally have them can never constitute a value because they're not0:16:44
atomically composite okay because we're saying a value is something that's immutable if you could change the parts independently then it's not immutable0:16:53
right because there's going to be a moment when one part is halfway there and another part is not there that's not a valid value now something's happening in the middle and more0:17:03
globally you can say about variables their problem is they have no state transition management okay that's the management of time a coordination model0:17:12
for time how do you go from I'm in this state now I'm in that state both states being immutable values so this is the0:17:22
summary of the philosophy portion a key concept I think is things don't change in place we think that they do0:17:31
but they don't the way you can see that this is the case is to incorporate time as a dimension okay once time is a dimension once you have x y z time guess0:17:43
what that's over here if something is happening here this is no different okay things do not change in place time proceeds0:17:53
functions of the past create the future okay but both things are values there are a couple of aspects I think to the design of the things I'm going to show0:18:03
you that I think are important when you try to model time in the local context these are things I don't want to give up these are things I know I can achieve by0:18:12
brute force in Java and I can't sell my language if I can't achieve them in Clojure one of which is co-located0:18:21
entities can see each other without cooperation okay there's a lot of messaging models that require cooperation you know if I want to see0:18:30
what you're about I have to ask you a question you have to be ready to be asked that question you have to be willing to answer my question but that's not really the way things are when0:18:40
you're co-located now I don't know what's happening in the next building but I can see all of you and I could certainly look at the back of your head without asking your permission the other thing I0:18:49
think is really important in a local context and it really should be written off as impossible in the distributed context is you can do things in a coordinated manner with co-located0:18:59
entities in the same process you can say let's all work together and do this alright now as soon as you're distributed you can't do that so the models I'm going to show you support0:19:09
visibility of co-located entities and coordination so let's take a little example a little race walker foul0:19:18
detector ok race walkers they have to walk they can't run they have to walk step step step heel toe and they can't have both feet off the ground at the same time it's a foul and then you get0:19:29
kicked out of the race so how do we do this well we go and get the left foot position we see it's off the ground we go we get the right foot position we see it's off the ground so they're0:19:38
running right yeah it sounds funny but I mean everybody writes programs that do exactly this all the time0:19:47
exactly this and you wonder well I mean why why didn't it work okay it can't work that way okay we can't have0:19:58
time and values all munched together where things are changing while we're trying to look at them that doesn't work we can't make decisions we don't make0:20:07
decisions as human beings this way snapshots and the ability to consider something as a value at a point in time are critical to perception and decision-making and they're as critical in0:20:17
programs as they are to us as human beings if you look at our sensory systems they're completely oriented on creating momentary glimpses of a world0:20:28
that we would otherwise just perceive to be completely fungible now how do we achieve this you know programmatically0:20:37
well one of the things I think we have to advocate if we want to write programs that can work on multiple cores and benefit from being on multiple cores is we can't stop the race we can't stop the0:20:48
runner we can't say well could you just hang out for a second I want to see if you're running also we can't expect the runner to cooperate and say could you just tell me if you're0:20:58
running but if we could consider the runner to be a value like this guy on the right here it's kind of nice that we can look at him as if it was a value0:21:07
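The foul detector can be sketched in Java as a check over one immutable snapshot of the walker, rather than two separate reads of feet that keep moving. This is a minimal sketch under assumptions: `Walker`, `FoulDetector` and their members are hypothetical names invented for this example, not anything shown in the talk.

```java
// Hypothetical types for illustration only.
final class Walker {
    final boolean leftOnGround;
    final boolean rightOnGround;

    Walker(boolean leftOnGround, boolean rightOnGround) {
        this.leftOnGround = leftOnGround;
        this.rightOnGround = rightOnGround;
    }

    // A step does not modify this snapshot; it produces a new one.
    Walker withLeft(boolean onGround) {
        return new Walker(onGround, rightOnGround);
    }

    Walker withRight(boolean onGround) {
        return new Walker(leftOnGround, onGround);
    }
}

final class FoulDetector {
    // Both feet are read from the same frozen value, like looking at a photograph.
    static boolean isFoul(Walker snapshot) {
        return !snapshot.leftOnGround && !snapshot.rightOnGround;
    }

    public static void main(String[] args) {
        Walker standing = new Walker(true, true);
        Walker midStride = standing.withLeft(false);   // left foot lifted
        Walker airborne = midStride.withRight(false);  // both feet lifted
        System.out.println(isFoul(midStride)); // one foot is still down
        System.out.println(isFoul(airborne));  // both feet are off the ground
    }
}
```

Because every field is final, nothing about a `Walker` can change while `isFoul` is looking at it; the race can go on producing new snapshots in the meantime.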
there's a point in time that was captured by this photograph right there's a single value I don't have to independently look at the left and right while time proceeds right I've0:21:16
got a value in hand it was captured at a point in time the race kept going but I can see that guy's got a foul on the right he's got both his feet off the ground that's easy that's the kind of0:21:26
easiness you want to have in the logic of your applications you want to be working with values you do not want to be working with things that are running away from you as you're trying to0:21:35
examine them so it's not a problem to do this work if we can get the runner's value in a similar way we don't want to0:21:44
stop people from conducting sales so we can give them bonuses or do sales reports we need to move to a world in which time can proceed and we can do our0:21:53
logic and we don't need to stop everybody so we can do our logic the two things have to be independent so how0:22:02
does it work well the first thing is we have to program with values we have to use values to represent not just numbers and even small things like dates but pretty0:22:11
much everything collections sets things that you would have modeled as classes should be values I'm not saying that there couldn't be an object-oriented0:22:20
system that worked this way I don't know of one that does but you should start looking at your entire object as if it was a value it should never be in pieces that you could twiddle independently if you0:22:29
want a new state of that object you make an entire new value so then what's the problem with time it becomes a much0:22:38
smaller problem all we need to do is get some language constructs or some way to manage the succession of values right0:22:47
an identity is going to take on a succession of values over time we just need a way to model that right because we have pure functions we know how to create you know new values from old0:22:56
values we only need to model the time coordination problem what's nice about this is when you separate the two things when you haven't unified0:23:05
values with pieces of memory you end up with multiple options for the time semantics you have a bunch of different0:23:14
ways to look at it there's message passing and there's transactional but because it's now a separate problem you can take different options you can even have multiple options in the same program so I'm going to say that it's a0:23:25
two-pronged approach one part is program with values the other part in the example that I'm going to be talking about today Clojure's example is a concept called managed0:23:35
references which you can think of as kind of like variables except they fix all the problems with variables in other words they're variables that have coordination semantics so they're pretty0:23:44
easy to understand they're just variables that aren't broken so there are two parts we're going to talk now about the values and0:23:54
one of the things that people cringe at initially if they haven't used functional programming languages before is that sounds expensive you know if I have to copy the whole runner every time he moves his foot0:24:04
you know there's no way I'm going to do this and in particular when you start talking about collections and things like that people get extremely paranoid because what they know are sort of the very bad collection class libraries0:24:15
they have which either have no capabilities in this space or some very primitive things you know sometimes there are copy-on-write collections so every time you write the entire thing gets copied but there is a0:24:24
technology which is not complicated and has a fancy name it's called the persistent data structure that has nothing to do with databases or disk0:24:33
but it's a way to efficiently represent a composite object as a value as an immutable thing and to make changes to0:24:42
that in an inexpensive way so change for one of these persistent data structures is really in scare quotes they don't change it's a function that0:24:51
takes an existing instance of the collection or composite and returns another one that has the change enacted but there's a very particular0:25:01
meaning to persistent data structures which is that in order to make these changes the data structure and the change operation have to meet the performance guarantees you expect from0:25:11
the collection so if it's a big-O log n collection or a collection that has you know constant time access or near constant time access those behavioral0:25:21
characteristics have to be met by this changing operation it means you can't conduct a change on something that you expect to have log n behavior and copy the entire thing because copying the0:25:31
entire thing is linear behavior right so that's the critical thing the other critical aspect of persistent data structures is sometimes you'll see libraries try to cheat okay and they'll0:25:42
make the very most recent version this good immutable value but on the way they ruin the old version okay that also is0:25:51
not persistent that's another key aspect of the word persistent in a persistent data structure when you make the new version the old version is perfectly fine it's immutable its intact it has0:26:01
the same performance guarantees it's not decaying as you produce new values every functional programming language you know tries to cheat this side and eventually says forget this we're going all0:26:11
immutable and we're going to pay whatever the performance costs are burkas the logical costs of having old versions decay or have some bizarre behavior either from a multi-threading0:26:21
perspective or performance perspective is too high so what I'm going to show you our legitimate persistent collections where the old values have the same performance characteristics and0:26:32
the particular example I'm going to show you is the one that's in closure it's derived from some work that Phil Bagwell did on these ideal he called them ideal0:26:42
hash trees and they're bitmapped hash trees that have really good performance his versions were not persistent and so what0:26:51
I did for Clojure was I made them persistent the secret to all persistent data structures is that they are trees there you go now you know there are lots0:27:05
of different recipes that I think people are very familiar with you know B trees and red black trees and maybe you know Erlang uses some generalized balanced0:27:14
trees I think which are interesting and there's those trees that use randomization techniques for balancing and other things these are different in particular they're different because0:27:24
they are tries with an i some people call them tries some trees the idea behind that is that you're not going to have a fixed path down to a0:27:35
leaf you're going to use only as much of a path as you need to produce a unique leaf position you usually see these things in like string search things or0:27:44
maybe I think they're also used in like Internet routing tables and stuff like that but here the model is very simple we want at least I wanted for Clojure something equivalent to a hash table I0:27:54
know I can't sell Clojure to Java programmers if it doesn't have something equivalent to a hash table they don't want to hear about a red-black tree they know that it's okay0:28:03
but it's not as good as the hash tables they're used to they need something faster than that and the way these work is that you hash0:28:12
the value you want to put into the collection you end up with a 32-bit hash you're going to use the first five bits of that hash to see if there's a unique0:28:21
position in the first layer of your tree so effectively what's happening is you have a 32 way branching going on in this tree in addition there's some fancy bit0:28:32
twiddling going on in each node so that those nodes are sparse they're not fully populated so you're not wasting the space of not fully populated nodes0:28:41
and it's using a combination of population count you know bit pop and some algorithms you copy out of what's that Hacker's0:28:52
Delight buy that or you can just use Clojure's in fact Clojure's vectors which are the same kind of technology have been ported to Factor0:29:02
and Scala already which is fine by me so if it's unique in the first five bits we're done we put it in the first level0:29:11
of the tree if it's not we're going to look at the next five bits and walk down one more level of the tree until we find some unique position and then we're done we're going to put the value there0:29:20
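The five-bits-per-level walk and the sparse-node bit twiddling can be sketched in Java. This is an illustrative sketch, not Clojure's actual source: the class and method names are hypothetical, and it assumes the low-order five bits are consumed first and that each node keeps a 32-bit bitmap marking which of its 32 logical slots are occupied.

```java
// Hypothetical helper for illustration; names and bit order are assumptions.
final class TrieIndex {
    static final int BITS = 5;                // 5 bits per level gives 32-way branching
    static final int MASK = (1 << BITS) - 1;  // 0x1f

    // Which of the 32 logical slots a hash selects at a given level (0 = root).
    static int slot(int hash, int level) {
        return (hash >>> (BITS * level)) & MASK;
    }

    // True if the node's bitmap marks this logical slot as occupied.
    static boolean occupied(int bitmap, int slot) {
        return (bitmap & (1 << slot)) != 0;
    }

    // Index of a slot inside a sparse node's compact child array: count the
    // occupied slots below it using the population count ("bit pop").
    static int compactIndex(int bitmap, int slot) {
        int bit = 1 << slot;
        return Integer.bitCount(bitmap & (bit - 1));
    }

    // How many leaf positions exist at a given number of 32-way levels.
    static long capacity(int levels) {
        long positions = 1;
        for (int i = 0; i < levels; i++) {
            positions *= 32;
        }
        return positions;
    }

    public static void main(String[] args) {
        int hash = (2 << 5) | 3;           // slot 3 at the root, slot 2 one level down
        System.out.println(slot(hash, 0));
        System.out.println(slot(hash, 1));
        System.out.println(capacity(3));   // 32 * 32 * 32
        System.out.println(capacity(4));   // just over a million
    }
}
```

A node with only three children stores three references plus the bitmap, which is how the intermediate levels avoid wasting space on empty slots.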
the key thing about this is how deep can this tree get this one's the root so0:29:30
down one two three four five six if you had whatever you know four billion things so it branches extremely fast0:29:39
and you know you can get a million items in depth 3 it's very very shallow so the combination of it being very shallow and using this bit twiddling to walk through0:29:50
the sparsely populated nodes in the intermediate levels makes it really fast so that's the representation now we only0:29:59
need to talk about how do we make a changed version efficiently and the key there as is true for all of these things is structural sharing all functional0:30:09
data structures are essentially recursively defined structurally recursively defined which means that you can make a new version that shares0:30:18
substantial structure with the version you just had and that's the key to making efficient copies you're not copying everything you're copying a very little bit and I'll show you in a0:30:27
picture in a second how little since everything is immutable sharing structure is no problem nothing is going to change about the structure0:30:36
that you're sharing which means it's safe for multi-threaded use it's safe for iteration you get none of this you know mutate it while iterating nonsense0:30:46
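Structural sharing can be sketched in Java with a much simpler tree than the 32-way hash trie described here. The example below is an unbalanced binary search tree, swapped in purely for brevity; `Node` and `PersistentTree` are hypothetical names. Inserting copies only the nodes on the path from the root to the new leaf, and every other subtree is shared between the old and new versions.

```java
// Hypothetical illustration: an unbalanced persistent binary search tree,
// not the hash trie used by Clojure.
final class Node {
    final int key;
    final Node left;
    final Node right;

    Node(int key, Node left, Node right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

final class PersistentTree {
    // Returns a new root; the tree reachable from 'root' is never modified.
    static Node insert(Node root, int key) {
        if (root == null) {
            return new Node(key, null, null);
        }
        if (key < root.key) {
            // Copy this node, rebuild the left side, share the right side as-is.
            return new Node(root.key, insert(root.left, key), root.right);
        }
        if (key > root.key) {
            // Copy this node, share the left side as-is, rebuild the right side.
            return new Node(root.key, root.left, insert(root.right, key));
        }
        return root; // key already present: the old version is the new version
    }

    static boolean contains(Node root, int key) {
        Node current = root;
        while (current != null) {
            if (key == current.key) {
                return true;
            }
            current = key < current.key ? current.left : current.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node v1 = insert(insert(insert(null, 5), 3), 8);
        Node v2 = insert(v1, 9);
        System.out.println(contains(v1, 9));     // old version does not see the new key
        System.out.println(contains(v2, 9));     // new version does
        System.out.println(v1.left == v2.left);  // untouched subtree is the same object
    }
}
```

The old root stays valid for any reader still holding it, and becomes garbage once nothing refers to it.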
so how do we share structure we use some technology called path copying again this is true for all tree data structures they all work exactly the same way which is if we ignore the0:30:57
right-hand path here that's the tree I showed you before it has 15 leaves ok we want to add one under that red outlined0:31:07
purple guy at the bottom I want to add a new node a 16th guy so what needs to happen is we need to make a copy of that node0:31:16
obviously because we're going to be giving him a new child a copy of his parent and finally the root this copy0:31:26
gets one additional child and the rest of the structure of the old version is shared so I said three levels could hold a million items right0:31:38
32 times 32 times 32 did I get that right that's 32,000 well 3 levels down right 3 levels down from 1 if you count the root it's0:31:47
4 levels but however populated this last level was making a new node here is only ever going to copy 4 items okay how's0:31:57
that old tree looking good still fine we didn't touch it and this is the path we need to copy to make the new one which0:32:06
looks like a new tree with this extra item if we're no longer referring to that root it will get garbage collected as will the things that it was referring to0:32:16
that are no longer referenced it's sort of it's kind of basic but I wanted to show it because a lot of people just are not aware that this is a possible thing this is the kind of data structure I0:32:25
think you should be using all the time unless you have some emergency reason and that's why Clojure works this way all the data structures work like this0:32:34
by default you have to you know go through extraordinary efforts to pick something else ok so that's a way to efficiently represent composite objects0:32:43
as values we've got one part of the problem solved now we need to talk about coordination methods the conventional way is not really a method the conventional0:32:52
way is it's your problem ok we saw in the Scala talk there was a var it didn't have volatile semantics but it happened0:33:01
to be the case that the actors library in Java conducts some synchronization thing which causes a happens-before happens-after memory fence effect in0:33:10
order to make sure the contents in that var was valid in another thread you know in your own program that's going to be your worry okay it's0:33:19
nice that the actors library takes care of vars in actors but vars in your program otherwise are your problem and typically if we're trying to0:33:28
do composite objects we have to use locks and everybody knows the problems with locks I think everybody knows the pain of locks ok experts0:33:40
can build programs that work with locks but most people don't have the time or energy to do that well and maintaining it is really really difficult it's extremely difficult0:33:50
so in Clojure what we're going to do is just add a level of indirection so instead of directly referring to memory with those variables we're going to use indirection and then we're going to add concurrency semantics to these0:34:00
references if you watched my talk yesterday I said that but I'll show you some more details today so this quickly is the picture of the current state of the world in a lot of object-oriented0:34:09
languages you have a lot of references to the same chunk of memory ok basically it's a free-for-all they don't know that they're going to see a consistent object that all the parts are related to each0:34:19
other and no one is twiddling with anything unless they can somehow stop the world but the core problem here now that we have lingo for it is this unifies identity and value right the0:34:29
only place to put this value for this identity is in the same piece of memory that's a problem we just looked at how to do new values it's great except what0:34:38
do we need to do we need to create some new memory right to represent that new value if we say all values of foo have to end up in the same chunk of memory space we can never do a good job so that0:34:52
has a lot of problems how do you solve this you just use indirection it's the solution to all computer science problems right one level of indirection and now we have options ok because this0:35:04
guy now could be immutable right we've separated the value which is now immutable and the identity which we're going to model with these little boxes0:35:15
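The separation of identity from value can be sketched in Java, using `java.util.concurrent.atomic.AtomicReference` as a stand-in for the little box. This is an analogy under assumptions, not Clojure's reference types: `Position` and `IdentityBox` are hypothetical names, and the box here only offers a plain atomic get and set.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable value: a position that never changes once created.
final class Position {
    final int x;
    final int y;

    Position(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // "Moving" produces another value; this one stays as it was.
    Position movedBy(int dx, int dy) {
        return new Position(x + dx, y + dy);
    }

    @Override
    public String toString() {
        return "(" + x + "," + y + ")";
    }
}

final class IdentityBox {
    static String demo() {
        // The box is the identity; what it points at is the current value.
        AtomicReference<Position> speaker = new AtomicReference<>(new Position(0, 0));

        Position before = speaker.get();    // a value in hand, safe to examine at leisure
        speaker.set(before.movedBy(3, 0));  // the identity now refers to a new value
        Position after = speaker.get();

        // 'before' was not disturbed by the identity moving on.
        return before + " -> " + after;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The only thing that mutates is the box; both positions are ordinary immutable objects that any thread can keep reading.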
values never change right if you want to see the current state of an identity you have to dereference that you have to say give me your state what you get out of that is a value that can't change you0:35:26
can spend all day looking at it just like you can spend all day looking at the photo to try to see if the runner was foul I want to emphasize if you0:35:36
think your object-oriented programming language's encapsulation techniques are a solution to this problem that is not true okay if you have a variable or a field inside your0:35:46
object and you write three methods that can change that field and people can call those methods from different threads you haven't encapsulated anything from a concurrency standpoint okay you've just0:35:57
spread the problem and hid it behind something so I'm going to call those boxes references we have too many overloaded terms I can't think of any0:36:07
new words it's a reference because it refers to something else so identities are references that refer to their values but the critical thing is0:36:17
in Clojure these are the only things that you can mutate unless you drop to Java and use Java stuff of course there's still classes and arrays and all that but if you want to follow the Clojure model you're gonna have these0:36:26
references they are the only things that can change and what they do is just manage time okay in other words you can atomically move from one value which is0:36:36
immutable to another value which is immutable and each of the reference types provides different semantics for time so what are the characteristics of0:36:46
the semantics one of them is can other people see these changes I'm making is it shared okay because there's one way to manage time0:36:55
which is the Star Trek alternate universe model where there's a bad Kirk in one universe time line and a good Kirk in another and they'll never meet0:37:05
of course the problem with that is that occasionally they do meet but one way is isolation so we'll see the last model is isolation but in general most of these0:37:16
models are around making changes that other parts of your program can see so sharing the second part is synchronicity and here we mean synchronicity in the0:37:26
sense of now what now means to the caller in other words from a self relative standpoint is the change I'm asking for gonna happen now or at some0:37:36
other time relative to me is it independent and we're gonna call those differences synchronous if it happens now relative to me it's synchronous if it doesn't happen now relative to me it's0:37:45
asynchronous it happens at some other point in the future can't say exactly when and the final characteristic of these references where you know again you get0:37:55
different choices and options is whether or not the change is coordinated okay I can be an independent runner and run all by myself and I'm completely fine but a0:38:05
lot of times you need to move something from one collection to another collection right you don't ever want to be in both you don't ever want to be in neither okay that requires coordination0:38:15
that's impossible to do with independent autonomous entities you need coordination and it ends up that in the local case you can do coordination0:38:24
distributed coordination like this is you know a fool's errand probably but people keep trying I don't think that0:38:33
there's ever going to be distributed coherent coordinated change but people are already recognizing the fact that you know if you're willing to delay consistency you can sort of have0:38:45
coordination but in the local model it's perfectly possible to get coordinated change otherwise change is autonomous okay I change by myself I don't care what0:38:56
you're doing and no two of us can do something together so now we have these three characteristics Clojure has four types of references0:39:05
that make different choices in these three areas refs are shared people can see the changes they're synchronous they change right now they change in a transaction which0:39:14
means that you can change more than one reference in the same transaction and those changes will be coordinated sort of the hardest problem is that coordinated change problem0:39:23
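A minimal sketch of the coordinated change described above: moving an item from one collection to another so that it is never in both and never in neither. The names `pending`, `done`, and `move!` are hypothetical and chosen only for illustration; `ref`, `dosync`, and `alter` are the standard Clojure functions for transactional references.

```clojure
;; Two refs, each holding an immutable set.
;; `pending`, `done`, and `move!` are illustrative names, not from the talk.
(def pending (ref #{:job-1 :job-2}))
(def done    (ref #{}))

(defn move!
  "Removes item from the set held by `from` and adds it to the set held by `to`
  inside one transaction, so both changes commit together or not at all."
  [from to item]
  (dosync
    (alter from disj item)
    (alter to conj item)))

(move! pending done :job-1)

@pending ; the set without :job-1
@done    ; the set containing :job-1
```

Readers that dereference `pending` or `done` while the transaction is running still see the old values; they only ever observe the state before the move or the state after it.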
agents are autonomous they'll feel a lot more like actors in an actor model they're shared people can see them they're asynchronous so you ask for a change0:39:33
it's going to happen at some point in the future but you're going to immediately return and they're autonomous there's no coordinating the activities of agents atoms are shared0:39:44
people can see the changes they're synchronous they happen right now so that's the difference between them and agents and they're also autonomous you can't change more than one atom in a0:39:53
single unit of work and finally Clojure has something called vars they isolate changes with the you know good Kirk bad Kirk alternate universe model0:40:03
where for any identity there's a unique value in every thread so you can't possibly see the changes in different threads I'm not going to talk too much about that that's kind of a0:40:12
special-purpose construct it's derived from Lisps okay so one of the things that's nice about the way these references work is they have a uniform0:40:22
state transition model all of them have different functions that change the state that say move from one state to another state and they use different names because they have different0:40:31
semantics I know people get all confused about is this happening asynchronously or synchronously or do I need a transaction but the model is always the same you're going to call one of the changing functions0:40:41
you're going to pass the reference the box and you're gonna say please use this function so you're gonna pass a function maybe with these arguments apply it to the current state of the box and use its0:40:53
return value as the new state okay so the function will be passed the current state under some constraints either atomically within a transaction some way0:41:02
it will be passed the current state it can calculate a new state again it's a pure function you're passing that new state becomes the new value of the reference in Clojure's0:41:15
references you can always see the current state of a reference by dereferencing it in other words that's the local visibility because it's completely free to do and it yields much more0:41:26
efficient programs to be able to do that if you have to ask for permission to see collections every time you're going to see them it doesn't work in the local context in0:41:36
addition one of the other shared attributes of these things is that there's no user locking you don't have any locking to do this work and none of these constructs can deadlock so what0:41:47
does it mean to edit something in this new world you're going to have a reference to a value right we can make a new value a la carte on the side we're going to call a function to create this new0:41:56
value which we intend to become the new state of foo all right the new value is the function of the old it can share structure we just saw that doing this doesn't impede anybody who's0:42:05
reading foo right they're completely free to keep reading they don't have to stop while we figure out the new version of foo in addition it's not impeded by people0:42:14
reading we don't have to wait for people to stop reading to start making a new version this is the kind of thing you're going to need for high-throughput concurrency and then0:42:23
going to a new state is just an atomic swapping of this box to look at the new value the new immutable value that's always coordinated there's always rules0:42:33
for how that happens I just showed you the multiple semantics anytime somebody dereferences this after okay more time words after this happens they'll see the0:42:42
new value consumers are unaffected okay if I was looking at the old value I don't get disturbed by this happening I'm just looking at an old value it's like I'm looking at a picture of the0:42:51
runner to see I mean I know the race is over that's okay we need to behave that way if you've been programming for as long as I have you know it's really0:43:00
hard to break from I own the world and I stop the world the world goes when I say go and I mean we have to just break from that that's that's the future we have to understand that we're going to be going0:43:10
to be working with data that is not necessarily the very latest data that's just the future for us okay so the hard0:43:20
references as I said are the transactional ones Clojure has a software transactional memory system I almost hate using this term because people like to criticize STM as if it0:43:29
was one thing there's a whole bunch of different STMs they have radically different characteristics Clojure's is radically different from the other ones but they all share some things which is0:43:40
basically a model that feels a lot like a database model you can only change them within a transaction all the changes you make to an entire set of references refs inside a transaction0:43:49
happen together or none of them happen okay that's atomicity you don't see the effects of any other transactions while you're running they don't see your effects so normal things the one unique0:43:59
thing about STM transactions is that they're speculative right you may not win somebody else may win and you will automatically retry up to a certain0:44:08
limit which means that your transactions cannot contain side-effects this is the way you do coordination you can't really0:44:18
do coordination without some technique like this you can't build a system on independent entities and and do this this kind of work so in practice what you do you just wrap your code with dosync0:44:27
which just means this is a transaction there are two functions alter and commute which work like I described they take a reference and a function and some args and say apply this to the0:44:38
reference in the transaction and make the return value the new state internally Clojure uses multi-version concurrency control which I also think is a very critical component to doing0:44:47
STM in a way that's going to work in the real world a lot of STM designs are you know you just write your app in the terrible way you were with your object-oriented language you know0:44:56
banging on fields and STM is magically going to make that better I don't believe in that at all Clojure's system is not designed for that kind of work if you make every part of0:45:06
your object a ref it isn't going to work and I'm not going to feel bad for you because I just explained how to do it you make your object a value and atomically switch that value and0:45:15
everything is better but you do have this issue of again like people would criticize STMs universally because most STMs do something called read tracking in0:45:25
order to you know make sure that nothing bad happened while your transaction was going on they track every read that you do in addition to all the writes that you do I also believe that that is not0:45:34
going to work so in Clojure there's no read tracking the way it accomplishes that is with a technique called multi-version concurrency control which is the way Oracle and PostgreSQL work as databases0:45:44
where essentially old values can be kept around in order to provide a snapshot of the world for transactions while other transactions that are writing can0:45:53
continue it ends up being extremely effective but it falls out of this necessity to be using references to0:46:02
values it's got to be cheap for me to keep an old value around for you right I just showed you how it is cheap if you're using persistent data structures all these things go together if you don't0:46:12
do all this stuff together you don't have an answer to this problem in my opinion but when you when you do this it's really nice so MVCC STM does not do0:46:21
read tracking so what does it look like in practice we define foo to be a ref that's a transactional box to that map0:46:30
okay we can dereference foo and we see what's in there unfortunately the order changes because they're hash maps so they don't guarantee any order of0:46:40
iteration we can go and manipulate the value inside foo we can say give me the map that's inside foo and and associate the0:46:49
a key with Lucy that returns a new value right nothing about that impacted the reference when I took the value out I made another value okay we can do all kinds of calculations completely outside0:47:00
of the transactional system it's still a functional programming language right get the value out and write functional programs so that didn't have any effect on foo okay we can go and we can use0:47:10
that commute function which actually says you'll take a reference commute a reference with the function assoc the key a and the value Lucy and0:47:19
that fails because there's the semantics to those refs which is that you can only do this for refs inside a transaction so you get an error if0:47:30
however you put that same work inside a transaction it succeeds and when the transaction is complete that is the value of foo I don't have a lot of time0:47:41
to talk about the implementation details but again don't think that STM is one thing if you've read one paper on STM you know nothing about STM if you've0:47:50
read all the papers about STM you know a little bit more than nothing about STM none of us know anything about STM this is still a research topic but I do0:47:59
know this this works and it works really well and it makes it easy to write programs that don't use locks I think all the programs that I've written in my career I could have used this anytime I0:48:10
needed coordinated change and it would have been fine people can bang on it and try to push the scalability issues and whatnot from a correctness standpoint this is a godsend however inside unlike0:48:23
some STMs Clojure's STM is not spinning optimistic it does use locks uses wait notify it does not churn processes will wait for other processes it's got0:48:32
deadlock detection it's got age-based barging there's an extreme minimum in fact I think what is actually the minimum amount of sharing in the transactional0:48:41
system which is one CAS which is for the timestamps you know people have demonstrated you can hammer on one CAS continuously with eighty threads0:48:50
and that's about the limit of scalability but when you actually have some work in your transactions it's no problem you know I've run stuff on an Azul box with 600 cores and that CAS is not going to0:49:00
be the problem as I said there's no read tracking it is important that this STM is designed for coarse-grained orientation it's not one of these snake0:49:10
oil STMs where you can do what you were doing you have to do this new thing you have to use references to immutable values then you can use my STM it's not0:49:19
going to make your old programs good and the readers don't then get impeded by writers and and and vice versa0:49:28
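The ref walkthrough described above can be sketched roughly as follows. The slide is not part of the transcript, so the contents of the map are an assumption; `ref`, `deref` (`@`), `assoc`, `commute`, and `dosync` are standard Clojure. The name `outside-result` is hypothetical and only there to capture what happens when `commute` is called with no transaction running.

```clojure
;; foo is a transactional box holding an immutable map.
;; The map contents are assumed; the talk's slide is not in the transcript.
(def foo (ref {:a "fred" :b "ethel" :c 42}))

@foo                      ; dereference to see the current value

(assoc @foo :a "lucy")    ; builds a new map; the ref itself is untouched

;; Calling commute outside a transaction throws IllegalStateException.
(def outside-result
  (try
    (commute foo assoc :a "lucy")
    (catch IllegalStateException _ :no-transaction)))

;; The same call succeeds inside dosync.
(dosync (commute foo assoc :a "lucy"))

(:a @foo)                 ; now reflects the committed change
```

Because a transaction can be retried, the function handed to `commute` or `alter` should be free of side effects, as the talk stresses.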
it also supports commute which I don't really have time to explain right now I do want to show you one other model because it's very different and it's nice in that it's very different yet very much the same we've sort of0:49:38
isolated change from values you can take a completely different approach to time okay so in an agent which is another kind of these reference cells each agent0:49:49
is completely independent they have their own state and it cannot be coordinated with any other state change is through actions which are essentially0:49:58
just ordinary functions that you're going to send to the agent with the function called send or send off that function is going to return immediately so you're gonna send this function and0:50:07
some data say you know at some point in the future apply this function to the current value of the agent with these arguments and make the return value of the function the new state of the0:50:17
agent that happens asynchronously on a thread from a thread pool only one action per agent happens at a time so0:50:26
agents essentially have sort of an input you know mailbox queue so they also do all their work serially so another promise of the semantics of of an agent0:50:35
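A small sketch of the agent semantics described above, assuming a map as the agent's state (the name `foo-agent` and the map contents are illustrative). `agent`, `send`, `await`, and `deref` are standard Clojure; `await` is used here only so the example has a deterministic point at which the action is known to have run.

```clojure
;; An agent holding an immutable map. Name and contents are illustrative.
(def foo-agent (agent {:a "fred" :b "ethel"}))

;; send returns immediately; the action runs later on a thread-pool thread.
(send foo-agent assoc :a "lucy")

@foo-agent          ; may or may not show the change yet

;; Block until the actions sent so far from this thread have been applied.
(await foo-agent)

(:a @foo-agent)     ; the action has now been applied

;; In a standalone script, call (shutdown-agents) before exit so the
;; agent thread pools do not keep the JVM alive.
```

Actions sent to one agent are applied one at a time in the order they were queued from a given sender, which is the serial, mailbox-like behaviour the talk describes.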
again as with the other reference types you can just dereference it and see what's in there if you send successive actions to agents inside the same0:50:44
action they are held until the action completes so they can see the new state the agents do coordinate with transactions which is kind of nice so0:50:53
one of the problems is you saw no side effects in transactions so you're wondering you know how do I let somebody know I completed this transaction successfully I need to send them a message or do something0:51:02
side-effecty it ends up that if you send an agent action during a transaction that's held until the transaction commits so if the transaction gets retried those0:51:11
just don't go out until the transaction actually succeeds so that coordination is a really nice feature these two things work together they're not quite0:51:20
actors the difference with an actor model is that's a distributed model you don't have direct access to the state in an actor model because you can't because you can't distribute that since I'm not0:51:30
doing distribution I can let you access the state directly which means it's a suitable place to put something that you actually may need to share a lot without0:51:39
necessarily serializing activity so what does this look like to use I say def foo to be an agent referring to a map I dereference it I see the contents of the0:51:49
map I send that reference the same function associate a with Lucy I look at it right away it may not be there yet right some0:52:00
amount of time will pass though I can't promise you what and then it will be different okay this is a different way of thinking about things but people who program in Erlang do0:52:09
amazing things thinking about things this way right things could be asynchronous you cannot keep programming your computer as if it was you know your old Apple and there was only you and your assembly0:52:19
language and you know you were king of the universe things happen at the same time now atoms a very similar story to0:52:29
agents right they're independent you can't coordinate change to atoms there's a different name for the state change function it's called swap again it takes an ordinary function of the old state to0:52:38
the new state and the change happens synchronously now so that's the difference between atoms and agents happens right now this is a model for a compare and swap ok compare0:52:50
and swap or compare and set is a is a primitive that is going to let you look at a piece of memory and say I want it to turn into this and it0:52:59
will turn into this only if it is still that so you look at it you see it's that you want to turn into this if it's still that inside0:53:08
atomically it'll say ok I'll make it this the problem with CAS like by itself is you usually want to read the value do something with it and then put0:53:17
it back and so you get this interval between when you looked at it and when you try to do the CAS and of course when you do that and somebody else has done something that CAS is gonna fail and then what do you do well0:53:28
typically a well-written CAS thing where CAS is a suitable data structure will have a little spin loop where you're gonna spin a value into a CAS well atoms0:53:37
do the spinning for you as a result the function may be called more than once again we're in this world where you should be programming with these side-effect-free functions because they need0:53:47
to be called more than once both in transactions and in atoms so you have to avoid side effects but the value you get out of this is that when you succeed you0:53:56
know the function you applied was applied to the value the function was passed and the result that got put in had no intervening activity occur on that atom that's a powerful construct0:54:06
you need to have and to look at these it looks like the other ones right define foo to be an atom that refers to that map right dereference it it is there we0:54:16
swap immediately we get the new value so this is the uniform state transition model right that's what refs look like start a transaction commute or alter your ref0:54:27
passing a function and some arguments the result of the function is your new value agents same thing except completely different time semantics it happens asynchronously in a thread pool0:54:36
sometime later you return immediately atoms happen right now but are independent from the others you need all these things to write a real multi-threaded program especially in the0:54:46
local model these are all things that I need to do in my career writing concurrent programs in the local part of the program and I don't think you can do without them so here they are but it's a0:54:56
uniform way to go so in summary immutable values are critical for functional programming but they're also critical for state right we0:55:07
cannot really manage time and state without immutable values if you're gonna let two things change time and value you can't do anything0:55:17
that's reliable persistent data structures let you represent composite objects efficiently immutably once0:55:27
you're able to accept this constraint of immutability on your values you have all these options I mean I'm working on a fifth reference type with different slightly different semantics it's easy0:55:37
to do because of separated time management from value management and finally I think this is pretty easy to use if you've seen some other models this is a lot like0:55:46
variables that work so thank you0:00:00
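As a sketch of the atom discussion in this talk: `swap!` applies a function to the current value synchronously, and under the hood it behaves like a spin loop around compare-and-set. The helper `spin-swap!` below is hypothetical and written only to illustrate that loop; it is not Clojure's actual implementation. `atom`, `swap!`, `compare-and-set!`, and `deref` are standard Clojure, and the map contents are assumed.

```clojure
;; An atom holding an immutable map. Name and contents are illustrative.
(def foo-atom (atom {:a "fred" :b "ethel"}))

;; swap! is synchronous and returns the new value.
(swap! foo-atom assoc :a "lucy")

;; Hypothetical helper: roughly the read, compute, compare-and-set loop
;; that swap! performs for you. f may run more than once under contention,
;; so it must be free of side effects.
(defn spin-swap!
  [a f & args]
  (loop []
    (let [old-val @a
          new-val (apply f old-val args)]
      (if (compare-and-set! a old-val new-val)
        new-val
        (recur)))))

(spin-swap! foo-atom assoc :b "ricky")

@foo-atom   ; both changes are visible
```

The call shape is the same one used for refs (`alter`, `commute`) and agents (`send`): a reference, a function, and optional arguments, which is the uniform state transition model the talk summarises.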
Keynote The Value of Values - Rich Hickey
0:00:00
thanks very much for having me yes so today's talk is going to be the value of values and I like to start by0:00:14
polling the room how many people are in IT or an IT related field this is great the key is to start with an easy0:00:24
question after the party so what does that stand for it stands for information technology and0:00:38
one of the themes of this talk is going to be keeping in mind what information means and what we're actually trying to accomplish and looking at the tools and0:00:49
technologies we're using and seeing if they are actually suitable for accomplishing what we're trying to do so I will start with that keyword because0:00:58
the technology part I think is straightforward and we'll look at information and of course we'll start with the definition everybody knows this0:01:08
is my shtick right now if you're gonna do a talk you just pick a word and then you go look it up in the dictionary and and you're rolling and it's a it's a it's a0:01:18
cheap trick but it's actually quite useful because there's a lot of the history of human thought sort of boiled down into into language so if we look at0:01:27
the word information it's based on the word inform and inform actually means to convey knowledge via facts to to shape0:01:37
your mind or to shape someone else's mind by communicating facts to them that's what it means to inform and the0:01:47
key word I think in here again which is going to be a theme of this talk is the word facts because we're gonna try to give more precise meaning to that and see if our information technology0:01:57
actually manipulates information because that's what information is information is those facts that are used to inform0:02:06
and not anything else not any of the artifacts we use to to represent it so start again with you know what is a0:02:16
fact no this is not not the dictionary definition so a fact is a place where information is stored and there's0:02:28
a there's a place for every piece of information and every fact has a set of operations like definitely get and maybe0:02:38
set and although maybe set doesn't have the right controls for the fact there might be other kinds of operations0:02:47
that would control that and then that's essential right that operations control how facts can change and and when we0:02:56
want to communicate about facts all we need to do is convey their locations right how many people are uncomfortable0:03:08
right now I am I can't even keep a straight face with this slide are you kidding me this is not right this is very very wrong if if my partner0:03:20
Stu Halloway's in the audience he probably almost had a heart attack while the slide was up this is not what a fact is and yet a lot about that0:03:31
description is similar to what our programs do so let's dig in to the word0:03:40
place and a place means place means a particular portion of space and space is another word that's very interesting is going to come up later the the key word0:03:51
here is particular and portion and the delimiting nature of this the same this other definition has that same characteristic right an area used for a0:04:02
particular purpose you know a specific area and we're really comfortable with the notion of place right because we have two critical places we're0:04:11
constantly manipulating with our programs one is memory and the other is you know the disk and they very much0:04:20
align with place right they are places you know there's only so much memory and there are particular addresses in memory and there's only so much disk space and there's sectors on the disk and these are all places they're subdivisions of the0:04:30
universe you know there's only so much in the universe that's on my hard drive there's only so much of the universe that's in in the memory of my computer0:04:39
so I mean I want to look at what we are calling information systems now because we're building information systems and0:04:48
in memory we're building them out of mutable objects but mutable objects are actually abstractions of places right0:04:57
they don't actually have meaning other than that they're little barricades we've set up in front of memory so that we don't have to directly0:05:07
manipulate memory addresses anymore so we have this abstraction that is an object that sort of helps us manipulate that place you know without too much0:05:17
craziness and a key characteristic here is that objects have methods right they have those operations we talked about before that facts really don't have0:05:26
objects critically have them they're operationally defined and we use them to provide a layer of abstraction0:05:35
over the places that our program uses and the same thing happens in storage right we again have you know tables and0:05:44
documents and records and these these higher-level notions that fundamentally are born of the desire to again abstract0:05:54
away the details of the fact that we're working with a place but the abstraction isn't really a first-class abstraction other than to hide place0:06:03
from our programs it actually isn't a sound abstraction above that and one of the ways you can tell that it's a place oriented abstraction is there's update the0:06:14
same notion of going to a particular part of the universe and manipulating it so these are what we're building information systems on right now and I think we may have some difficulty seeing0:06:27
how that's correct so I have a new label for this it's called PLOP and PLOP is place oriented programming that's what0:06:36
most of us have done for most of our careers and most of us continue to do and it's characterized by a very basic operation which is new information0:06:45
replaces the old information it's that simple if that's happening you're doing place oriented programming it doesn't0:06:54
matter if it's in memory doesn't matter if it's on disk if new information replaces the old you're doing place oriented programming and it doesn't matter if your implementation technology0:07:05
is not actually doing that directly so I don't care if you're using MVCC or an append-only database if the logical result is that0:07:17
new information replaces the old that's an in-place system even if you know for efficiency it appends on the disk if if the if in the end it can only give you0:07:26
the most recent piece of information that is a place oriented system it doesn't matter if it's actually going back to the same disk sector and there's a very good reason why we were0:07:38
doing place oriented programming decades ago right computers the first computers were really tiny they had incredibly limited memories very very small discs0:07:49
if they had any discs at all and so we had to do place oriented programming there was no way to get a computer to do anything useful unless we took the tiny0:07:59
amount of memory that we had and completely mapped out the role of every place in in in terms of you know we defined our program in terms of the role0:08:09
of every place of memory in helping our program accomplish what it was supposed to accomplish Guy Steele gave a talk in the most recent couple of years where he0:08:19
had this great anecdote about you know the worst program he had ever written and he held up a card showing what it was and then he described the computer on which it ran which had you know 4,0000:08:28
words of memory and how you know there was this map of that memory and this part here was this dispatch table and then there was some code here but sometimes you could cram data into it then there was this other jump table and0:08:39
then some data structures and every every program knew exactly where in memory those those portions were and directly use the addresses and that0:08:50
that's how you had to do it and then we got bigger memories and we said that we don't really want to program with addresses anymore and so we added some stuff but the basis for the way we0:08:59
computed was still that right we're still just trying to do that but not deal with the the hassles of knowing addresses directly you know0:09:08
we use some indirection there the the problem is that those constraints the constraints Guy Steele and the early pioneers of computing faced before him0:09:20
they're gone you know computers just in the time I've been using them are a million times more capacious in memory0:09:29
and disk than when I started which was after he started but we're still doing place oriented programming I think we0:09:38
definitely need to consider why that is so one reason that always gets brought up right away is that there's this efficiency to manipulating places and0:09:49
and that's definitely true and I'm not opposed to that I've bashed as many bits as as the next person and and I know how0:10:00
much fun that is and I know how fast that can be and and I think that there there still is a role for that and there will always be a role for that and one0:10:10
way to talk about when that's appropriate is to have a notion of this birthing process of a point in time when you're starting to create a new value0:10:20
and we'll talk more precisely about values in a minute and in setting up that value you need to manipulate memory for instance you need to manipulate places that's completely ok0:10:32
I would never advocate languages that didn't let you for instance manipulate the contents of an array because during this process you need to be able to do that in order to write efficient0:10:41
programs but but this birthing process is this is a window that ends and it ends whenever the thing that you've made is going to become visible to any other0:10:53
part of your program at that point it's it's become a fact it's become perceptible and then you have to stop0:11:04
doing place oriented programming because as we'll see it's not a fit for the models we're trying to build so this use of place to0:11:13
create values or the use of place to represent values under the hood is an implementation detail0:11:22
of course we have to use places our computers have memory and they have disks but what's important is that our program is not about places right it's0:11:34
information technology it's not technology technology right we've taken abstractions of the technology and raised them up to being what the program0:11:43
is about and that's an error there was a reason why we had to do it but we don't anymore so two words that I think are0:11:53
very very important are memory and records right we have to remember these words had meanings before we started to try to emulate them with tiny computers0:12:03
right we used these words for millennia prior to that and we've we've not only co-opted them but I think that we're0:12:12
starting to believe our own myths about what memory and records are that they are what our programs say they are as opposed to what they really are right0:12:22
and real memory is a is a cognitive a cognitive abstraction over how our brains work and some of the0:12:32
characteristics about it that are really interesting are the fact that it's associative okay if your friend gets a new phone number right it doesn't go into your brain and find the phone0:12:42
number neurons for your friend and and overwrite them with the new phone number right that's not how memory works right0:12:51
you get the new phone number it's it's it's some novelty and your brain accommodates it and and what was there before so it's associative there's some connection between my friend and the phone number and those0:13:01
numbers but it's also open it's not a place your friend's phone number is not a place in your brain and memory is0:13:11
about that activity acquiring your friends phone number when he changes it and then records right record-keeping0:13:20
existed before we had computers records are enduring right people didn't go back to their you know parchments and scrit you know scrub them out when there0:13:29
were new facts and then go back to their stone tablets and you know pave them over with concrete and then re-etch them right they just wrote new pieces of paper and carved new stones they're0:13:40
enduring so we keep them around and they're accreting right if you have new information in the old record-keeping systems you added it to what you had0:13:51
already you didn't go and erase it so these are critical notions we actually do pretend that our systems do this work0:14:01
but to the extent that we're using memory and records the way we've become accustomed to we're not actually so the point of this talk is that values0:14:13
have many advantages over this place oriented programming and I'm going to talk about values in many different ways in order to try to give you a better idea of the many kinds of meanings that0:14:25
can have for programming in particular though I want to I want you to focus not only on values in memory and functional programming whatever you0:14:35
think of when somebody says you should program with values but also our use of values in communicating across processes and our use of values inside storage0:14:45
systems because there are many architectural advantages to values that go beyond the parochial notion that a program might have the other point of0:14:55
this talk is you already know this stuff you were all made uncomfortable by that first slide your activities show that0:15:04
you know this stuff the only thing that sort of counteracts the fact that you know this stuff is the fact that you continue to choose some of you technologies that don't implement what0:15:15
you know to be true and there may be many reasons for that but the most important point of this talk is that place itself has no role in an0:15:26
information model it is only an implementation detail if you elevate place to be a first-class thing in your information model it is0:15:35
not an information model it's pretend it's not actually doing its job so let's dig into the word value this is a0:15:44
particularly tricky word and the title of the talk is a little bit tricky the value of values seems to imply two meanings of the word value the first one0:15:55
is relative worth right the value of values is you know how do you estimate the worth of something is the notion of value and then we have what's0:16:06
probably the clearest mapping to programming of the word value which is a particular magnitude right or numeric value or an amount and you know this is0:16:15
you know 42 this is the one we can really hang on to everybody understands 42 is a value but it ends up that these0:16:24
two definitions are not different when you take this third definition into account which is precise meaning or significance because what0:16:35
ends up happening is all notions of value are about being able to directly perceive something and compare it to0:16:44
something else and we'll see that that allows us to have a broad notion of value which will not only cover 42 but other things that we encounter so what0:16:55
else might we encounter strings are strings values how many people think0:17:04
strings are values how many think strings are not values okay it ends up that the answer to this question is a question right it depends on your0:17:14
programming language are strings immutable in your programming language if they are then strings are values if they're not then they're not and how0:17:25
many people work in a language where strings are immutable how many people have ever worked in a language where strings are immutable how many people worked in a language where strings are0:17:35
mutable and now they're not in the language they're working in now there have to be some people who programmed in C and then Java yeah okay of people who0:17:44
have programmed in languages that had mutable strings and then ones that didn't how many people want to go back to mutable strings wow0:17:53
do you work with other people it's really tricky right because you know we've sort of accepted at least0:18:02
in Java I mean how many people program in Java here okay so in Java we sort of accepted string as a value we've moved on from 42 we said oh no this composite thing that0:18:12
has a bunch of different parts a string could be a value and it ends up that it is right if it's immutable it now taps into that definition of value0:18:21
we saw before because by being immutable we can go and take a string value and another string value and say are they the same do they have the same magnitude0:18:31
are they talking about the same thing are they expressing the same specific meaning all those definitions of values apply to something that's not mutable so0:18:41
that relative worth thing kicks in and I don't think anybody who's programmed with both wants to go back I'm not actually0:18:50
sure I believe you but by and large this is something that we've accepted so if we want to expand the notion of value up and talk about programming values0:19:00
we're gonna have some characteristics we really care about the first and unconditional one is that they be immutable we're gonna see as things become mutable our ability to do any of0:19:10
the things we say we can do with information and values disappears on the other hand another important characteristic of values is that they0:19:19
don't need methods now I'm not saying the values can't have methods I'm not saying you can't have an object in your programming language that has the role of a value and meets the criteria for0:19:28
values and has methods I'm not saying that that's not allowed but the important thing about values is that they don't need to have methods they're not operationally defined if I can0:19:38
convey a value to you somehow and I've forgotten to give you any code you can use it because semantically the value is accessible and so that's the0:19:47
other critical thing it must be immutable and it must be semantically transparent there can't be any operational interface over a value that tries to encapsulate what it means or0:19:56
your ability to do equality on it okay you might have additional methods you might have you know toUpper on a string that's just sort of0:20:05
object-oriented you know goofiness but it's harmless in this case the important thing though is you can't have a value where you know only on Tuesdays by0:20:14
calling this method and that method can you see what it's about they have to be semantically transparent and it's okay again to have abstractions in particular0:20:23
when you start talking about composites and collections as values you'll often have an abstract definition of that but that abstract definition satisfies the0:20:33
other two critical properties it is immutable and it's semantically transparent the abstraction's not trying to get in the way of you seeing what it is and seeing all of what it is it may0:20:43
just be you know hiding the storage part so let's go through some properties of values and how they compare to places the number one0:20:52
property is values can be shared and they can be shared freely and the way you share them is just by aliasing them right because you know that they're0:21:01
immutable if you ever encounter a value you can just start using it and it's funny because people talk about functional programming and you know0:21:10
higher-order functions and all of this stuff and you know concurrency and other advantages but when someone actually goes from not using a functional0:21:19
programming language to using one one of the deepest pleasing benefits they have is this one is the fact that when you program with values you can share0:21:28
pervasively and you never need to think or worry for one fraction of a second right you can't mess anyone else up and they can't mess you up right all values0:21:39
are freely shareable if you've never done it before it will change the way you program forever it really makes a big difference one of the things that also happens especially when your values0:21:50
are implemented with persistent data structures is that incremental change is cheap so it's quite common to say somebody gives you this big thing and you're like I love that big thing except0:22:00
for the first thing I'd like to have that big thing except for the first thing and that ends up being completely straightforward to do and inexpensive to do so that's really great if we0:22:10
compare that to programming with places what happens defensive copy how many people have heard the term defensive copy why do we need to defend ourselves from ourselves this is really not a great0:22:20
phrase to be using you know everyday cloning another nasty notion and locks these are all things that are either0:22:29
part of or in the way of sharing when you're programming with places reproducible results are another fantastic benefit of values right0:22:39
because operations on values are stable you do them over and over again they never give you a different answer this is really a great benefit when you're doing testing obviously0:22:49
because if you want to say it still works hopefully you have you know code that is reproducible in the first place and you actually spend a fair amount of time0:22:58
with place oriented programming making that sentence true that the test actually when run twice should return the same result whereas if you program0:23:07
with values that's not even a question debugging is also critically different when you program with values especially when your architecture is based around values right so some0:23:17
customer has a problem in the field right and you have a value oriented program right you can say obtain the value from your database and the query0:23:26
you were running and email them to me just the value that was part of the input to the process and just the query those two things and I can reproduce it here over email versus what0:23:38
how many people have ever tried to set up a database and a running process that emulates a customer failure that is not a party right not fun not fun and that's0:23:51
the problem with places right you have this sort of global state that you have to reproduce in order to debug a field problem that's very very tough another0:24:01
advantage of values is that they're easy to fabricate right anything can create a value any programming language can make a value right you may have0:24:12
written it in this and then later you need to you know have somebody who uses different languages drive it to see if it's working right so if we're testing it's really fantastic that you can0:24:21
fabricate inputs to test programs using any technology you don't have to sort of get the class library that you know has the right classes and interfaces related to the hook that you used0:24:30
in your program it's like your program takes data you now can write another program that can produce data to test it and also for simulation purposes so0:24:40
when you start raising your testing up to the next level and you're trying to drive your program to different kinds of situations right all you need to do is algorithmic generation0:24:51
of data to get a variety of simulation points for your program if your program can only get into a particular state by0:25:01
a series of interactions through objects how are you gonna algorithmically drive that program to different kinds of test cases it's a huge problem it's just a0:25:11
mess whereas if you can just algorithmically generate data you're done and again it goes to this point about places with places you have to emulate an operational interface and that's a ton0:25:21
more work and you also when you want to drive it you have to drive it through the operational interface instead of with data imperativeness we love it0:25:32
right and values are in the way that's a feature0:25:41
that's not a negative aspect they just refuse to help you do this and I think that once you start using0:25:50
languages that make values the default you feel frustrated initially about this but in the end it's a0:25:59
tremendous benefit because imperative code is just more complex as used to it as you may be it's more complex right and the problem with places is they0:26:10
force you to do this it's the exact opposite right values thwart you and places force you to write imperatively and therefore in a more0:26:20
complex way starting to lift the game a little bit out of your local view which might be you know I'm in Java and I'm doing this I have these interfaces I0:26:30
have this class model and blah blah blah a great thing about values is language independence right if you ever want to pretend you're a polyglot shop you're0:26:39
going to immediately face a challenge right with all your interface driven object driven designs which is you can have them all over in your Java program0:26:48
but then you know your Python program or your JavaScript program it doesn't know how to talk about that stuff it has no means of doing it and immediately you're gonna face this0:26:58
pressure to move away from that and towards what towards values is where you're gonna end up they are the tool for polyglot programming they are the tool that gives0:27:07
you this independence in language because places are defined by language constructs you're stuck you're really stuck you don't have a definition0:27:16
independent of your language that you can use as a basis and sure you know you can build proxies you can automatically build you know SOAP interfaces to your0:27:26
objects and remote your objects and generate you know matching objects in different languages but that's just a ton of effort it's not really adding any0:27:35
value so this language independence actually falls out of a bigger property of values which is that they're generic0:27:44
right we can get representations in any language as we said but the other thing is that there are very few values you0:27:54
know in the general sense right once you start programming with values you don't end up with a lot of specificity right there's0:28:03
a logical notion of a list there's a logical notion of a map and a logical notion of a set right and strings and numbers or whatever but you can probably exhaust what you need to use in the0:28:12
value space with fewer than 20 of these things whereas how many people can build you know a system with 20 Java classes just 20 no large system right as the0:28:26
system gets larger how many more classes do you need more and more and more and more and more just they keep going on and on and on and that's because operational interfaces are specific0:28:36
right that generates a ton more code and in a sense it actually is a counter-argument to the promise of object-oriented programming one of the0:28:45
promises was reuse right that's the big lie of object-oriented programming every new thing you have to do you're writing a new class0:28:54
where is the reuse in that there's none right the other thing is you're sort of breaking away from the job you're trying to do right if you're trying to represent information you need to0:29:04
represent facts you need to have values in order to have the things be comparable right if I have a person class and you have a person class in our own namespaces and they have name0:29:14
address and email and name address and email what can we do with those two things nothing even though they're0:29:23
semantically identical they use the same names and they use the same names for the fields they're completely not interoperable even if they all had like public getters like they're sort of0:29:32
complying with the accessibility part the specificity that you added killed your reuse and again0:29:44
getting more in the large or looking towards programming in the large values make the best interface this is actually0:29:54
one of the biggest problems I think we have right now is that when we're working in the small right we say we're gonna have this new thing and we start with sort of a monolithic design but0:30:03
within that design that's not monolithic we say oh no we have a subsystem for this and a subsystem for that and a subsystem for this that's all great and then it's like oh you know what I wanna0:30:13
that's getting too big for this box I want to move this out of that box to this other box and when I do that I think this is a different programming language so to make that easier or it's0:30:23
gonna be shipped to another team that works in a different programming language and so we're gonna do this other thing right if you have a value-based interface you can do that move right if you've programmed with0:30:33
data-driven interfaces you can do that move you can port that code right or you can write new code that interoperates in a different language because it's0:30:42
data-driven right another critical thing you can do if you have a value oriented interface is you can enqueue it so even if you stay in process a lot of times0:30:51
one of the architectural needs you have is you know what I'm calling this and calling that and calling that and I need to buffer I need to do some more management of things or maybe I want to0:31:02
get some more concurrency in play right and therefore I'd like to enqueue those calls so I want to set0:31:11
up a queue so now I have this flow and maybe get some pipelining in my program if you've called a specific interface and called a specific interface and called a specific interface and then0:31:21
you want to pipeline that what can you do you're stuck right because you've got to go and build like proxies that look like your objects that then have a queue0:31:30
inside that then spit out on the other end another thing that looks like what it was talking to and then if heaven forbid it was bi-directional you're just totally toast but if you had a0:31:40
data-driven interface like this guy was calling that guy but he's just passing data if you want to stick a queue in the middle of that that's straightforward to do because you can put values on queues0:31:50
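A minimal Python sketch of the point just made, not code from the talk: when one component passes plain immutable values to another, a queue can be placed between them without changing either side. The names `producer` and `consumer` and the tuple layout are illustrative assumptions.

```python
from collections import deque


def producer(orders):
    """Describe each order as a plain immutable value (a tuple)."""
    return [("order", order_id, qty) for order_id, qty in orders]


def consumer(message):
    """Handle one value; it needs nothing from the producer but the data."""
    kind, order_id, qty = message
    return f"{kind} {order_id}: {qty} units"


messages = producer([(1, 5), (2, 3)])

# Direct call: values are handed straight to the consumer.
direct = [consumer(m) for m in messages]

# Queued call: the same values pass through a queue first.
queue = deque()
for m in messages:
    queue.append(m)

queued = []
while queue:
    queued.append(consumer(queue.popleft()))
```

Neither function had to change to accommodate the queue, because all that crosses the boundary is data.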
so in contrast if you're doing place oriented programming your stuff is application specific your stuff may be language specific and it may be coupled0:31:59
to your program flow architecturally you're dramatically limited and this is a big deal because you desperately need to be able to take your small programs0:32:08
and make them large programs and take your one machine programs and make them n-machine programs but if you can't start with an n-machine program you're not forced into this but the0:32:18
thing is we know this right because when we program in the large we don't pretend we have objects we don't create operational interfaces we don't chat you0:32:27
know we don't use CORBA anymore that's dead that lost for good reasons right when we actually start out building a more distributed system we0:32:37
program with data all the time we already know how to do this we use data on the wire we use you know RESTful interfaces everything is different in the large why are we still doing this0:32:46
arcane goofy memory abstraction oriented stuff in the small it doesn't match the large it's not gonna help us make our programs bigger and there's no benefits0:32:55
to it as soon as we look at our programs in the broader sense we don't do this we don't make the same choices in the large we're still making them in the small I0:33:04
think it's just because we're comfortable with our programming languages another key advantage of values is that they aggregate in particular values aggregate to values0:33:15
right so if I have five values I put those five values in a value list that resulting thing is a value in particular everything I've0:33:24
said about values accrues to that composite right that composite thing has all the advantages of the value that all the value parts of it have right it's0:33:35
transparent it's transmissible all those characteristics are great now contrast that with programming with places right if you have a bunch of objects mutable0:33:44
objects and you combine them into a bigger thing right what properties does it have that you can understand even if you really understood all the sub0:33:53
components what properties does the composite have none you have to start from zero again defining the operational interface of the aggregate0:34:02
right even if you had very carefully defined cloning and copying and locking policies for each part right as soon as0:34:13
you combine them together you're toast none of those things work you now no longer have a copying policy no longer have a cloning policy no longer have a locking policy on the aggregate so0:34:22
nothing composes with places that's a big negative so now I'd really like to start broadening the notion of what0:34:31
we're talking about when we're talking about values to outside one process right to talk about them in the large and in the small still mention the others and talk about a few what0:34:40
I'll call extended value propositions right using values as a mechanism to convey things and to perceive things using values as a mechanism for0:34:50
memory how values will reduce coordination how they provide location flexibility and finally how they're essential to making programs that0:35:02
support decision-making which is our job in IT so we have conveyance and conveyance means to send something to somebody else right this is sending0:35:12
right in the small with values it's really straightforward if I give you any reference to the value I'm done conveying it to you I've conveyed it's0:35:22
extremely cheap and again as we saw before it's worry-free right imagine though that you want to try to do conveyance with places so you have this mutable0:35:32
object and you put it on a queue and later somebody's going to consume that queue what actually have you communicated to that person nothing you0:35:41
don't know I mean you put it on the queue now but it's just a reference to a thing that could change whatever your intent was in conveying it it's not captured by that mutable thing on a queue0:35:51
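A small Python sketch of the contrast described here, not code from the talk: a mutable object placed on a queue conveys nothing stable, while an immutable snapshot conveys exactly what the sender intended. The `account` data and variable names are hypothetical.

```python
from collections import deque

queue = deque()

# Place: a mutable dict is put on the queue by reference.
account = {"owner": "alice", "balance": 100}
queue.append(account)
account["balance"] = 0  # the sender mutates it after "sending"
received_place = queue.popleft()

# Value: an immutable snapshot (a tuple of pairs) is put on the queue.
account = {"owner": "alice", "balance": 100}
snapshot = tuple(sorted(account.items()))
queue.append(snapshot)
account["balance"] = 0  # later changes cannot affect the snapshot
received_value = queue.popleft()
```

The receiver of `received_place` sees a balance of 0, which was never what the sender meant to convey; the receiver of `received_value` sees the balance of 100 that existed when it was sent.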
see so conveying places is an extremely difficult thing to do we waste a ton of time I mean everybody just thinking about places that you know you know I do0:36:00
these things but I spend a huge amount of work trying to do them right you have to try to clone it or something like that to turn it into a value essentially right now look at conveyance0:36:09
in the large again here I think you know we figured this out values rule on the wire we don't really do anything other than values on the wire now right HTTP0:36:18
really all distributed programming puts values on the wire we don't set up multiple objects with tiny little interfaces and chat across the wires we0:36:28
just don't do it right people imagined that right when they first tried you know objects were like all the rage and we're like oh distributed objects because that's all we can think about and so we'll think about it0:36:38
broadly but it was an utter complete total failure right and we're done with it again in the large we understand this so that's sort of the0:36:49
wire part and then in the databases we have the same problem right if I give you the primary key of a record in the database if I send that to0:36:58
you over a queue what have I actually communicated to you nothing right because what you're gonna see depends on when you look that thing up just like0:37:09
with objects before putting an object on a queue sending somebody the primary key of something if what's behind that stuff is places you actually haven't conveyed anything specific in other words you0:37:19
haven't conveyed information all right perception it's the flip side right I know there's something out in the universe and I want to see it I want to0:37:29
perceive it and and the word perception is an interesting word it really means it's sort of to take in in the entirety of something and it's very important0:37:38
right because you need to take things in the entirety to get that value proposition from before so in the small again it's really straightforward if0:37:48
you're programming with values right if I can reach your value however you passed it to me or it's in the collection that I can see if I can reach it I can see it I can perceive it my0:37:57
part of the program can capture that value because I know it's never gonna change so as soon as I can reach it I've acquired it places it's amazing how0:38:07
difficult this activity is how can you perceive a mutable object that has more than one getter what's the way you do it how do you do0:38:16
it we're all object oriented at one point in time who knows how to do this who could say right now how to do this no one can right yeah you can yeah either Stu is like he cannot I know he0:38:31
can from personal experience no you can't you can't do this because you need this other thing right you need the recipe for doing this and0:38:41
the recipe is something everybody has to make up over and over and over again right the copying recipe the locking recipe the cloning recipe we got to make0:38:50
this stuff up because the thing could change right and we have multiple independent operational interfaces to the parts we can't actually perceive the whole can't do it without help without0:39:00
these recipes and again we know those recipes don't aggregate same thing in the large on the wire right we do not go0:39:10
and chat with an operational interface to a thing and grab its pieces just imagine doing that imagine if HTTP you know in order to get a web page you had0:39:19
to say get the header get the cookies get this get that get the other thing get the title get the first segment get this div get that div blah blah blah forget about the0:39:29
communication overhead you couldn't actually know that the end of all that communication was the page that anybody ever wrote at one point in time that was something that somebody looked at and0:39:38
said yeah that is what I intended that is the value of the page right now because the operational interface is in the way of you perceiving the entirety0:39:47
of the thing and we don't do that right we don't do that on the web you ask for the page you get the whole page you get the entire value it's a little bit trickier with databases many databases will give0:39:57
you the ability to capture as a value some subsection of what they have in a coherent way but beyond that they either can't or they require a0:40:08
transaction to do that we'll talk about that in a second all right so what about memory right it's very important for our programs and for the users of our0:40:17
programs that our programs remember things at various points in time so what does it mean to remember something in the small again it's there's really no0:40:26
there's just nothing to it right remembering is aliasing if I if I can touch the value I can remember it I can keep a copy of that indefinitely right0:40:35
with places I'm really in trouble I'm back to that copy problem I need to copy it if I want to remember it because I know it's mutable lifetime is gonna take0:40:44
it to different values so I need to be able to copy it if I can in the large the same thing comes about right how many people remember the early days of0:40:53
the web first it was all static pages right that was great you go to the page you get the static page you know maybe people updated it whatever right then people had websites that were based0:41:02
around programs they're like oh cool I can generate pages this is awesome right whoo-hoo around that time period of the web when that was first possible ever0:41:12
like said oh I'm gonna I'm researching this thing I'm gonna bookmark all these things I encounter that are interesting and relevant to what I'm doing and then went back to those bookmarks a month0:41:21
later only to find that absolutely none of them pointed to the thing that you were looking at before and you actually had remembered nothing eventually we figured this out and we0:41:30
have these conventions about permalinks and things like that but again if you don't have something like that you don't actually have a memory system you only have places out there and it's the same0:41:40
problem happens with the database right if I were to remember something in a database how am I going to do that because people are saying I'm talking about databases and databases that lose track of things and people say well you0:41:49
know we don't you know we only add stuff to our database but you're doing it yourself then right you're doing it yourself you're saying I have this place oriented thing I'm not going to use it in that way I'm gonna maintain0:41:59
you know time myself and I'm gonna keep new values myself and you can do that right how many people have ever written a system that you know made a0:42:08
new record for every new piece of data and kept timestamps on those records right how many people wrote the now query for that0:42:17
yeah is that fun it's not fun how many of you tried to make that now query fast not fun very very difficult so you don't0:42:28
want to do it yourself reduced coordination is another critical benefit of values right in the small when you program with values there simply is no question about0:42:38
this it's not a question to answer it's a question that doesn't come up there's no such thing as contention for values right and the problem we saw about places you know0:42:47
exists here the lock policies don't aggregate we have to lock and we can't combine those policies in the large this is another big problem0:42:58
right for databases when you have a place oriented database if you want to read consistently you have to read in a transaction you have to go to the0:43:07
database server and hold up the world to some degree in order to see something coherent it's really a coordination problem it's an architectural problem right it's a throughput problem0:43:18
and it's a scaling problem okay this is a big architectural disadvantage of place oriented programming and I think it really highlights one of the big you0:43:27
know wrongness here in addition even if you think you know all right no I have to do read transactions this is one of the things that's most frequently gotten wrong right people just don't understand0:43:36
read committed or how read committed is combined or how independent reads in a batch file work I mean how many people think the programmers in the shop actually don't know how that stuff works0:43:46
yeah they don't they really don't ok so another key benefit of values is location flexibility right in the small0:43:56
again with values there's actually no need for more than one location because aliasing covers every case we've seen so far memory perception conveyance it's all covered0:44:05
by the fact that you only need one copy on the flip side with place oriented programming there's a very special-ness to that master copy right if I0:44:14
want another value I've got to manipulate that master copy and coordinate with everybody about doing that which means where that is starts to matter to me in the large again we really we do this and0:44:26
we really care about it right we've incorporated those things in the HTTP protocol and whatnot so that we can do caching and we can say this expires this value is stable and0:44:36
therefore you don't need to come to me every time to figure out what it is you can go to this cache over here right you can go to this content distribution0:44:45
network over there we saw already well one of the interesting things about content distribution networks is why don't we have CDNs for databases why0:44:56
do they make sense for webpages but not databases that's not making sense to me the other thing we saw is that data-based0:45:05
interfaces are movable inherently right I don't really care where you are we're gonna communicate data I don't care what language you're implemented in or if you move around or if I have to redirect to0:45:15
get to you or things like that so again I think we understand this in the large except in the data storage but definitely in the communication protocols so now the big point facts the0:45:27
things we said are the source of information are values by all the definitions I've given and have all the benefits we've said they're not places0:45:36
right but don't facts change right did my friend get a new email address didn't that change the fact of his email0:45:45
address no it did not there's now a new fact which is today your friend's email address is this it did not change the fact that yesterday0:45:55
your friend's email address was that they don't change and this goes down to the very core of what fact means a fact0:46:05
means something that happened something that existed that's what fact means it doesn't mean the slot where you keep your friend's email address right we all0:46:15
laughed at that slide earlier but it's really it's true this is what a fact is and the roots of the word fact actually0:46:24
go all the way back to Latin where it was a past participle it meant something done factum something that was done something that happened so this is0:46:34
really critical if we want to build information systems right because information is based around facts and facts doesn't mean just the most recent facts right knowledge is0:46:43
derived from facts right we compare facts to each other we combine them we make decisions about that but one of the critical things we do all the time right0:46:53
is compare facts from different time points right imagine if you only knew the present value of every fact that's0:47:04
relevant to you like you only ever knew you only knew the present value of everything what kind of decision-making power would you have compared to knowing something0:47:13
about time it would be dramatically reduced I don't know if anybody saw the film Memento or whatever where like if you had a limited window of time over which you knew what had happened or if you0:47:22
only knew the present you actually can't make decisions it's like built into our brains that you if your programs are serving humans it is built into our brains to Delta now0:47:32
with before that's how we make decisions so building systems that only keep the most recent values of things is not those are not information systems right0:47:41
so you can't update a fact there's no such thing as updating a fact a fact is not a place you can't do that any more than you can change the past0:47:51
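The idea that a new email address is a new fact rather than a change to the old one can be sketched in code. This is only an illustration under stated assumptions: `Fact`, `FactLog`, `assert_fact`, and `as_of` are hypothetical names invented here, not anything shown in the talk, and the integer `t` stands in for whatever notion of time a real system would record.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a fact cannot be modified once created
class Fact:
    entity: str
    attribute: str
    value: str
    t: int  # hypothetical logical time at which the fact was asserted


class FactLog:
    """Hypothetical append-only store: facts are added, never overwritten."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def assert_fact(self, entity: str, attribute: str, value: str, t: int) -> None:
        # A new fact takes new space; earlier facts stay where they are.
        self._facts.append(Fact(entity, attribute, value, t))

    def as_of(self, entity: str, attribute: str, t: int) -> Optional[str]:
        # Latest value asserted at or before time t, or None if nothing was known yet.
        known = [f for f in self._facts
                 if f.entity == entity and f.attribute == attribute and f.t <= t]
        return max(known, key=lambda f: f.t).value if known else None


log = FactLog()
log.assert_fact("friend", "email", "old@example.com", t=1)
log.assert_fact("friend", "email", "new@example.com", t=2)

yesterday = log.as_of("friend", "email", 1)
today = log.as_of("friend", "email", 2)
```

Asking as of time 1 still returns the old address after the new one has been recorded, which is the comparison across time points that an update-in-place slot cannot support.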
so now let's revisit information systems what should they be they should be about facts they should be completely about maintaining manipulating facts they0:48:01
should be about giving our users leverage over facts right helping them make decisions based upon the facts that the system is maintaining on their0:48:10
behalf right that means that our system should be value oriented they should not be place oriented right we happen we0:48:19
have to stop using process constructs for information right I'm not trying to bash objects universally right you can go home and do that but but but the very0:48:30
few places where they're appropriate are more process-oriented places and their use for their use for information is actually an idea bereft0:48:41
of merit there is not one good component of using mutable objects for information it's just wrong so we know this is wrong0:48:52
right because we're decision-makers we all do stuff all the time we know what it takes to support our own decision-making process right and it's0:49:01
information right we build systems we run shops we have stuff we have to accomplish we're like mini businesses a programming shop is like a mini business it has stuff it has to accomplish it has0:49:11
successes and goals and objectives we need to compare the present to the past we try to spot trends and rates and things especially in our in our own systems we often need to aggregate0:49:21
things right this is what decision-making is about both in the large and in the small and for businesses and for programmers and a lot of this decision-making requires a time0:49:30
component so let's look at programmer IT right because we also use computers to support our own decision-making process0:49:40
don't we what kind of what kind of systems do we give ourselves what kind of information systems do we give ourselves well one big one is source0:49:49
control anybody keep their source control in a directory right where you know when somebody edits a file they save it into the directory over the one that's there update in place how many0:49:59
people do update in place source control no we don't do that how many people store the stuff in source control with no dates on the edits no timestamps no0:50:11
we keep track of time how many people throw away their old source code no we don't do that why would we do0:50:20
that we couldn't make decisions about what was happening in our in our business in our programming business right we would be crippled by that what's another critical thing that we0:50:30
keep track of right we run our programs and we keep track of what our programs do in logs right because we need to look at those logs we need to make decisions0:50:39
about is our program working is it working well is it using memory well does it have good performance if there was a problem what went wrong all this decision-making we need to do as our0:50:48
little programming business right are our logs update in place anybody want to use a log system that only keeps track of the last latency or the last time0:50:59
somebody communicated with a particular endpoint anybody want to do that no no they are not update in place anybody have a log that has no timestamps in it0:51:10
no we keep timestamps of course we do how are we supposed to make decisions without timestamps and without keeping track of everything that happened because we want the facts when we need0:51:20
to make decisions right this is our IT right anybody want to have the ability for somebody to go back to an old version of a source file and change the old version in place or change logs in0:51:33
place edit logs any value in that no right our IT systems are not like this0:51:42
so let's talk about Big Data it's my contention that a certain portion and quite possibly a very large portion of0:51:51
Big Data the hot new topic the big thing is this it's businesses saying to programmers I like your database better than the one0:52:01
you gave me because your database has everything in it the one that you gave me it only remembers the last thing right I can't track trends I can't see0:52:10
latencies I can't see where everybody was on the site I can't make the same kinds of decisions that you can I want to mine your logs to get out business critical information because0:52:20
that's the only place that it exists because you're not you're not keeping your own database this way these logs have everything they have time on them they're huge rich sources of decision0:52:31
making power ok they're all filled with stuff we're not putting in a database for some reason and and I think it's0:52:40
actually quite embarrassing I think that IT right now is in a very reactive place here business has discovered the value that was in our logs our logs were0:52:49
just like for ourselves so we can see if our programs were working but they happen to keep track of where everybody clicked how long things took what you know what the flow was between events0:52:58
and everything that happened including stuff that happened that in the database ended up overwriting old stuff that's all in these logs but really mining logs0:53:09
right flat files we know better than that right we have technology that's better than flat files anybody really happy that their logs are in flat files0:53:19
in the end I mean obviously it's efficient to sort of append on them but in the end you struggle right after that to try to get leverage over that data because we0:53:30
know flat files are not great one of the advantages programmers gave to businesses was the invention of databases the inventions of indexes and trees and these other data structures0:53:39
that really let people leverage information we're not actually putting this critical information into a leverageable place right now and it's0:53:48
commingled with a bunch of crap that's not actually useful to the business like latencies and things that were really you know to communicate with us this is a mix in the logs of stuff0:53:59
about seeing if the system's working okay and actual activity against the business right if you could pull out this part and put it in a leverageable store your businesses will be a lot0:54:09
happier and that's where we're gonna end up you know big data is forcing us to do that but you should all look at the deep reasons why this is happening they have to do with the fact that we've built0:54:18
better information systems for ourselves than we've delivered to to our business customers so I think we're entering the0:54:27
space age and the space age is the age where we have access to space from our programs right we said place was a0:54:37
portion of space space is the unlimited expanse in which all things exist and and and all events occur and this0:54:47
definition of space I mean it goes all the way back to you know the roots for spatial in the Latin there it's always incorporated both place0:54:58
and time the notion of space has always encompassed both place and time they're connected together quite significantly are we in the space age already0:55:08
programming now with space I think we are right we had virtual memory which really took us a level away from the actual addresses right then we had GC0:55:18
which meant more was sort of transparently available whenever I needed more I could get it guess what if your program runs0:55:27
indefinitely long and calls new and new never fails your program is running in space not in a place it's running in space0:55:37
if S3 never fills up or if you can always go to the store and buy another hard drive and stick it in your array ad infinitum you're programming with0:55:48
space there's no place there there's no limit there's no delimiting that that's space what does this mean it means that0:56:01
we can take a different approach to the way we do things we're gonna say we are building information systems those systems should be maintaining facts and that new facts require new space and we0:56:12
have space right this is the end or this should call for an end to place oriented programming right if you can afford to0:56:22
do this why would you do anything else what's a really good reason for doing something else right and guess what you0:56:32
can afford to do this you already can afford it I mean you're already running programs that call new you know indefinitely and don't fail right and you have access to S3 or things like it0:56:41
there will be garbage there will be different characteristics to our use of space especially storage that are very analogous to what we saw when we had0:56:51
when we enabled space and memory right the whole notion of garbage collection is gonna happen in storage but you know if you have a grip on that you have no0:57:00
problems understanding a space orientation with storage so to summarize for some reason we continue to use place0:57:12
oriented programming both languages and databases right we even make new ones it's actually the saddest thing is the0:57:21
fact that we continue to make new languages and new databases right that's still emulate the decisions that were0:57:30
made when computers were tiny and we needed to program in a place as opposed to in space right this is not NoSQL versus old SQL there are0:57:39
older systems that maintain time correctly and most NoSQL and NewSQL things are still place oriented right it's not about old and new it's about place oriented or not right the0:57:50
rationale for doing this is gone I don't think anyone could deny the benefits that I just enumerated for values and in fact by your actions especially in your0:58:01
own IT services for yourself you're demonstrating you know this value proposition right we recognize this so0:58:10
we need to start making information systems that are really about facts right that are really about information I think the demand is clear this new0:58:21
this big data explosion is saying businesses are saying I demand to know everything that happened I demand not to lose track of the facts these things are0:58:30
important to me they're important to the decisions my business makes and they're soon gonna say why the hell does that only exist in a log why are you sticking it0:58:40
there you have indexes and databases and things like that you can use for this so I'll leave you with this and thanks very much0:58:50
[Applause]0:00:00
The Database as a Value - Rich Hickey
0:00:00
I think it's very exciting to have this talk in this track because one of the things I think is interesting about thinking0:00:11
about functional programming is thinking about the fact that most of our programs most of our systems live outside the bounds of any process0:00:24
we write in any particular programming language so if we think that you know we do functional programming in Haskell or in Scala or in Clojure or whatever we don't any more than we write0:00:34
all of our programs in Java or C sharp because most systems are composed of a0:00:43
bunch of programs that are interacting together and one of those programs is usually a program you didn't write which is a database that you're consuming and so it's interesting to try to step back0:00:54
and say you know when I'm outside the bounds of what any single language can do for me can I get some of the benefits of functional programming when you know the Haskell type system is not helping0:01:03
me out because this part of the system is not in Haskell it's not under its purview so I think it's important to0:01:13
think before you do anything about what what problem are you trying to solve in other words so we're going to talk about the database as a value which I'll describe in more detail as we go you0:01:25
know why bother even thinking about this why pursue functional programming at all we had an interesting talk at the beginning of this track about some of0:01:34
the benefits of using Haskell you know from a practitioner which is really great there was also this great paper called Out of the Tar Pit which was0:01:43
written by Moseley and Marks several years ago I had read it after I had started building Clojure I found it really inspirational because they were0:01:53
talking about all the same kinds of things I was trying to trying to work on and in particular they were sort of addressing the problems of pro0:02:02
complexity in programming and espousing an approach to solving those problems by adopting functional and declarative programming as well as adopting in their0:02:15
case in their example a relational model for data inside your program in particular because that model allowed you to do a more declarative style of0:02:25
transformations than we can do even in functional programming and so I found it really inspirational as one of the things that kept me on track for making Clojure a functional Lisp because it was0:02:36
kind of a crazy thing to do at the time but I was always bothered by one part of the paper which was the fact that they know they talked about using functional0:02:45
programming techniques they talked about having a relational model for data inside your process and using a declarative you know relational algebra to manipulate that model inside of your0:02:56
process but you know everybody seen the comic where this this guy's got like a million equations on the board and then there's this spot and he says and then a0:03:05
miracle occurred right and then there's the answer well this is part of the paper that's like that too where they're talking about how this relational model0:03:14
somehow gets updated like things happen and somehow it's different and and that0:03:24
was kind of like whatever that is is scary and it's gonna be over there but then you'll know it changed and then things will be better and when you have some novelty you can do something and it0:03:33
will go there and it will be and things will be different and so I think I feel that this paper really punted on what I would call process right everybody's0:03:44
talked about you know what functional programming you're fine but I know that there's some part of my program that doesn't fit right where things are changing or I'd like to say that they're0:03:54
changing and that's the process part and so I always wondered about whether or not it would be possible to close the loop and really build a model that modeled0:04:03
process that means that your your program is either going to encounter input from users or other systems or events in the world that it's going to process that's going to have to maintain0:04:13
some record of that novelty in addition it's going to need to see changes from the outside world that either0:04:22
incorporated in that or are integrated from external sides and and close the loop right so a program that's running on an ongoing0:04:31
basis has got these interactions with this thing right where novelty is accreted and that we usually call the database so how do we do that how do we0:04:41
do that better it's certainly I think quite easy to see the complexity we have from interacting with databases all of us know this I mean there's lots and lots0:04:51
of nuisances in the small but in the large we have a bunch of problems the first is that they're stateful right of course they're stateful their job is to0:05:01
maintain state and and state itself is inherently complex right because it's going to combine values and time and when you combine things that's how you get complexity the big problem though0:05:12
with databases is that there's no way out there's no way to get away from their statefulness there's no way to say okay I understand your job is to maintain state but can we0:05:23
stop doing that now can you give me something less variable to work with to think about and it's that it's the fact that the statefulness is inextricable0:05:32
that's really the problem of databases we see that right most simply by issuing the same query over and over again you issue the same query over and over0:05:41
against your database what do you get different answers right different stuff we just heard about functions before and how wonderful they were there's nothing0:05:50
like that in the database world you keep asking the same question and keep getting different results and I'm gonna call that a problem of a lack of a basis now0:05:59
we can't obtain a basis from a database and I'll explain that more in a minute other sources of complexity with databases are the fact that they're almost always remote right the database0:06:08
servers are these other processes they're over there they're across the wire you have to talk to them with a special language and again there's some inherent complexity to the fact that you have two0:06:18
processes right again because you're trying to accomplish one thing with two processes they're intertwined in some way that's by definition complex but it's more complex than it needs to0:06:29
be and we'll see that that's because of the the model that's used for databases is one that requires coordination to a higher degree than is really necessary0:06:38
for the problem databases are trying to solve and that means that the fact that it's over there is more complex than it needs to be another source of complexity with databases is0:06:47
the whole notion of update you know you have relational databases which you know at least have some mathematical foundation especially on the query side0:06:56
but on the update side again it's another a miracle occurred kind of situation you have this relational model and transactions happen and somehow the0:07:05
relational model has been changed and exactly what that means can be very nebulous and then of course you get to non-relational0:07:14
databases and it gets weaker and weaker the actual notion of what it means to update something gets incredibly weak but most of these systems share a characteristic which is that the notion0:07:24
of what's happening is based around a place orientation for state which is catastrophically bad it's kind of0:07:34
unfortunate that the order of talks this week because I have a keynote on Wednesday we're gonna talk quite a bit about places and values and a lot of that it would be great if you had heard0:07:44
before this talk but you shouldn't be lost without it so I said databases don't provide a basis so what do I mean I mean we hear0:07:53
talks about functional programming and they use phrases like referential transparency and things like that I don't think you need to go there to understand this problem right when you0:08:02
perform a calculation or when you try to make a decision you may need to think about or incorporate in the calculation more than one thing so you're going from0:08:12
thing to thing right or you may need to revisit a component more than once I need the average of these things and later I need the total of those things0:08:22
the same things so I want to talk about those things twice and the problem is that if anything that you've referenced0:08:32
during your decision-making process or during your calculation were to change during the calculation like if you would0:08:41
say let me get the details for this thing and then the master for this thing and somehow one or the other could change when you move from one to the other or when you try to revisit that0:08:51
list of stuff you took an average of before and then take a sum of later your calculations or your decisions are broken right0:09:00
because the basis for those decisions is the fact that the thing upon which they're operating the things upon which they're operating are stable during the course of the operation now we solve0:09:11
that in the small with locking and things like that when we don't use functional languages but if we use functional languages we don't even have this problem but the fact is that basis0:09:20
is broken by simultaneous change if while you're thinking if while you're making decisions things change you're in trouble and the problem you have now is we said we're going to cross0:09:31
process boundaries and we're almost certainly going to cross time boundaries as well you imagine trying to do a fairly involved processing or0:09:42
decision making job that required you to interact with the database more than once right what happens when you do that more than one round-trip to the database0:09:52
what will happen to your answers and your totals and your decisions they could be all corrupt right because each time you talk the basis changes so it's an0:10:01
example of that in other words the database is acting as this giant global variable for your program and then you know I mentioned before we have this update problem right what does it even0:10:12
mean I think if you ask people for any particular database what does update actually mean you will almost never ever ever get a completely0:10:21
correct answer yet we happily use these systems every single day for very critical things and none of us know what update means but one critical question0:10:30
I'd like you to ask of any database system you're using is does the new stuff replace the old stuff I don't care if it's a relational model I don't0:10:39
care if it's a document or whatever when you put new stuff in is the old stuff still there or is it gone right that's a fundamental kind of0:10:48
construct replacement and then what is the granularity new something replaces old something is it a new document replaces a document a row0:10:58
replaces a row a table replaces a table a set replaces another set some combination of those things in a shape dictated by0:11:07
the transaction boundary and and what is the relationship between that and things that are happening at the same time in other words what are the visibility properties of change right chances are0:11:18
really good you cannot answer these questions for the databases you're using today and I don't care how good a SQL practitioner you are it's very0:11:29
very difficult right how many people have seen problems related to programmers writing queries without the right you know read modified read0:11:38
committed settings right anybody who's ever used a relational database has seen this go wrong this is one of the problems that happens right so what happens to us because of these this0:11:49
complexity right if we say complexity is inherently bad well it doesn't really mean anything how does it manifest itself it definitely manifests itself in the fact that our programs are not correct0:11:59
we write programs they just have these bugs right read committed bugs people relying on transactionality of statements in a batch right it's a cause0:12:09
of a tremendous number of bugs people just don't even understand the semantics of multiple independent operations in in in a batch script we have scaling0:12:21
problems right again this sort of is more subtle but it comes down to the fact that this place orientation of databases requires a high coordination overhead right because the database is a0:12:32
place whenever you want to put something in it you have to talk to that place whenever you want to ask a question you have to talk to the same place well as we try to scale our system there becomes0:12:41
a lot of pressure on that place and we need to figure out how to reduce that pressure it seems like one of the answers right now is to have even less0:12:51
ability to reason about our programs you know eventual consistency and I don't think that's a great answer because there are better answers once you look carefully at the problem we have this0:13:00
fear of round trips right everybody says well the problem with round trips is the overhead of that you know the cost of multiple conversations it0:13:10
actually isn't right because there are lots of web pages say that are built with you know 500 conversations or 100 conversations per page you're not actually afraid of conversations you're0:13:20
afraid of the fact that if you talk to the same database twice in a row instead of trying to get everything in one conversation you're gonna get a mismatch and have a broken program0:13:29
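The mismatch between two conversations with the same database can be shown with a small sketch. Everything here is an assumption made for illustration: `PlaceDb` is a hypothetical in-memory stand-in for an update-in-place database, and `snapshot` simply copies the current contents into a read-only mapping, which is not how a real database value would be implemented.

```python
from types import MappingProxyType


class PlaceDb:
    """Hypothetical update-in-place store standing in for a remote database."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def update(self, key, value):
        self._rows[key] = value  # new stuff replaces or extends the place

    def total(self):
        return sum(self._rows.values())

    def count(self):
        return len(self._rows)

    def snapshot(self):
        # Read-only copy of the current contents: a stable basis for a calculation.
        return MappingProxyType(dict(self._rows))


# Two round trips against the live place, with someone else's write in between.
db = PlaceDb({"a": 10, "b": 20})
total_live = db.total()   # first conversation
db.update("c", 30)        # a concurrent writer changes the place
count_live = db.count()   # second conversation sees a different state
broken_average = total_live / count_live

# The same calculation against one value obtained once.
db2 = PlaceDb({"a": 10, "b": 20})
basis = db2.snapshot()
total = sum(basis.values())
db2.update("c", 30)       # the writer proceeds without coordinating with the reader
count = len(basis)
stable_average = total / count
```

The first average mixes a total from one state with a count from another, so it describes no state the store was ever in. The second is computed entirely from one value, so later writes cannot disturb it.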
so we're afraid of that we're afraid of overloading the server because it's a it's a unique resource and and this has all kinds of interesting problems right0:13:40
that it manifests itself for instance the the fact that we commonly couple trying to figure out the answer you know the set of entities for which something0:13:49
is true that we couple that with pulling out all these fields that we need to paint this web screen right we do that all the time right how many people issue a query to figure out the affected0:13:59
records then issue another query to pull the details for painting the screen not too often right sometimes you do but a lot of times you'll be like oh you can't0:14:08
do that because of this this synchrony problem but if you fail to do that you've really introduced this coupling in your system because the two parts of the system the one part that has the0:14:17
business rules that say these are the entities that need to be you know displayed and the other one that says you know we've got a screen it looks like this and we have that we show and the pictures and the names those are two0:14:27
independent things but if you feel afraid to manipulate those two parts of the system independently especially when they talk to the database you've got coupling right that's what0:14:36
the definition of coupling is so we have a bunch of choices in how we deal with this one is open right0:14:46
there's a coordination choice right how much coordination is going to be required right we're we're admitting a database as a stateful thing and we're admitting the process requires0:14:56
coordination right you know two people cannot park in the same parking spot just because you have a database that has eventual consistency they can't0:15:06
right there's actually a parking spot you have to go there and coordinate I'm gonna go in and you're not eventual consistency doesn't make that possible suddenly so process requires0:15:17
coordination but the critical thing is that perceptions shouldn't require coordination and it does right now in a database that's place oriented we end up0:15:26
with coordination just to see things because we have a problem seeing consistent things unless we coordinate so that's one of the areas where we have0:15:36
a real choice I think that there's no choice in trying to solve these problems except to embrace immutability and we'll see we'll see how that comes into play0:15:47
so I want to define a couple of terms if you've seen me talk about Clojure or Clojure's STM or process or other things you may have seen this slide already0:15:57
because this is the fact of it whether we apply it to programs in memory or to databases these things don't change right these notions these ideas don't0:16:07
change so we should try to use them over again so one idea is the idea of a value right we know that 42 is a value we don't expect 42 to change into 43 we0:16:18
think they are two different values you don't add one to 42 by changing it right it's an operation on two values that produces a third right so in the small I0:16:28
think we have a good good grasp of values but that notion of values should extend upwards to various sizes transparently in other words a composite0:16:38
of values that's also immutable is itself a value right so we can think of collections in memory as being values0:16:47
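A minimal sketch of values at two sizes, using Python's built-in immutable types. The variable names and the roster contents are made up for the example.

```python
# In the small: adding one to 42 yields a third value and leaves 42 alone.
x = 42
y = x + 1

# A composite of immutable values that is itself immutable is also a value.
v1 = (1, 2, 3)
v2 = v1 + (4,)  # "adding" an element builds a new tuple; v1 is untouched

roster_before = frozenset({"alice", "bob"})
roster_after = roster_before | {"carol"}  # a new frozenset, not an edit in place

# Values nest: a tuple of values is hashable and can be compared or used as a key.
nested = (v1, roster_before)
lookup = {nested: "season one"}
```

Because nothing here can change underneath a reader, `v1` and `roster_before` can be shared freely, and the same reasoning is what the talk goes on to apply to an entire database.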
that's less common but for functional programmers we do that all the time I think you can just keep going right why can't we think of the entire database as0:16:57
a value it seems tricky maybe but there's no good reason theoretically not to start thinking that way then we have the0:17:06
notion of an identity and this is where I think everybody gets confused right an identity is an idea in our head usually it's associated with a name but it is0:17:16
not the name that we connect to a series of causally related things over time so we have like the Yankees right the Yankees is an idea of a team that plays0:17:28
in New York in any particular year there's a manifestation of the Yankees that includes certain players and not others but over time it's a different team but they're still the Yankees just0:17:38
like you know there are rivers same kind of idea right says this this is this is continuity notion which is identity that's independent of what I would call the value of an0:17:49
identity which is its state at any particular point in time any identity has a particular state you know the Yankees right now have a particular0:17:58
roster don't ask me what it is because I don't know Yankees and roster is all I know about baseball the words I don't actually know anything else about baseball but there is a0:18:08
value of the Yankees at any particular point in time which is the roster and maybe later if you talked about the Yankees you'd have to be talking about a different thing those values are0:18:18
immutable themselves right the identity the Yankees is applied to different values over time and time is just a relative ordering right there's a notion0:18:29
of causality to time that this thing happened before that because it caused it other than that it's all relative but there are important words associated with time like before and after this0:18:41
leads to an idea for how to model state that I call the epochal time model it's just a made-up thing it's not0:18:50
an official phrase or anything but but the idea is actually pretty simple let me see if this thing works from over here ooh okay we have a proper notion of0:19:02
the identity which is the big box right we're gonna make a thing out of the identity that thing that constitutes the identity is going to be different from0:19:12
its value right so if you're used to object-oriented programming those two things are smashed together in a way that's inextricable and it's why it's a mess I'm not going to get into0:19:23
that today so we're going to we're going to reify identity there's going to be a thing that represents identity right and then over time there will be the0:19:32
different values that the identity takes on those values are the identity's state at any point in time to get from one value to another we're going to0:19:41
apply a function to one state and get the next state that's a pure function it's just a function of one piece of0:19:50
data to another piece of data so that's how process happens and time moves forward we take the past we0:20:00
apply a function to it and we get the present and we apply function to that we get the future and stuff just moves forward what's critical about this is that we have the0:20:09
separation between identity and state the other thing is that the transformations act on the on the values themselves you'll notice these aren't0:20:18
functions of the box of the identity they're functions of the values similarly there can be observers right these are the thinkers these are the0:20:27
calculations and the decision-making parts of our program they operate again also on the values on the states not on0:20:36
the box right when you adopt this model and this model has plenty of realizations you can implement this model with compare-and-swap you can0:20:47
implement it with STM you can implement it with actors right there's a bunch of manifestations of this model in memory in programs and even across programs0:20:58
when you get to actors but the important thing is the separation of the value of an identity at a point in time and the identity itself and the fact that0:21:07
observers and transformations act on values not on identities okay so this is what we want right Clojure0:21:18
implements this three different ways Scala has a way Erlang has a way Haskell has nice ways to do this with references this is what we're shooting for because0:21:27
this is the path to sanity when we do this in memory I'm not going to talk too much about this because it was mentioned in the prior talk and people have seen0:21:37
talks about functional languages that talk about it but just so you're aware as soon as people see a value a functional transformation and another value they're like oh my god that seems0:21:46
expensive especially you're talking about aggregates right you're talking about a whole collection has one thing added to it I get a whole other collection yeah was that a whole copy that sounds that sounds ridiculously0:21:56
expensive and if you're gonna tell me now you're gonna go from one value of a database to another value of a database you better have a fast way to do that because otherwise I'm not buying any of0:22:05
this and there is a way right in memory it's called persistent data structures and it's a very simple idea basically you represent everything as trees and you use structural0:22:17
sharing so that a modified version of an immutable thing is actually another tree that shares most of the leaves and it looks like this and if we take the tree0:22:27
on the left being the past you should just take my word for it if you if you're not familiar with it that you can use trees like this to represent any data structure you can represent vectors0:22:37
this way and and sets and maps and all the things you're familiar with all the composites you're familiar with can be implemented using trees when you do that0:22:46
it means that making a new version of a data structure means making a new tree but you can share a whole lot so here0:22:55
all we did was we added one one leaf or we modified one leaf on this tree and what had to happen was just the path from the root to that leaf had to be had0:23:07
to be copied but not the rest of the tree so the new tree shares a whole bunch of nodes critical aspects of this though are if you were looking at the0:23:17
past like if you had a reference to the root of this tree on the left and somebody built the new tree do you care do you even know it's happening0:23:27
no does it impact you at all no do we care what happens to the past0:23:41
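A minimal sketch of the path-copying idea just described, written in Python for illustration only. It assumes a plain binary search tree (the trees in the talk have much wider nodes), and the names `Node`, `insert`, and `keys` are hypothetical, not from any library. Adding one key builds new nodes only along the path from the root to the changed leaf; every other subtree is shared with the old tree, and whoever holds the old root sees nothing change.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """An immutable tree node; 'modifying' a tree means building new nodes."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    """Return a tree that contains key. The input tree is never modified."""
    if node is None:
        return Node(key)
    if key < node.key:
        # Copy this node, reuse the untouched right subtree as-is.
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        # Copy this node, reuse the untouched left subtree as-is.
        return node._replace(right=insert(node.right, key))
    return node  # key already present: share the whole subtree


def keys(node: Optional[Node]) -> list:
    """In-order traversal, i.e. the sorted contents of the tree."""
    if node is None:
        return []
    return keys(node.left) + [node.key] + keys(node.right)


# Build "the past", then derive "the present" from it.
past = None
for k in (50, 30, 70, 20, 40, 60, 80):
    past = insert(past, k)

present = insert(past, 65)

# The old root still describes exactly the old contents.
print(keys(past))
print(keys(present))
# Only the path 50 -> 70 -> 60 was copied; the subtree under 30 is shared.
print(present.left is past.left)
```

If nothing references `past` any more, the nodes that only it reaches become garbage; if something does, they stay alive, which is the behavior the talk relies on a few slides later.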
it depends right it depends on if someone's still looking at it in memory right if no one's looking at that left root it will eventually get garbage0:23:51
collected if someone is it'll be kept around until they don't care about it anymore this garbage collection notion and the notion of being able to access0:24:00
roots is important you want to remember this in a few slides but everybody get the general idea this is how you implement data structures that can be immutable but efficiently modified you0:24:11
know and modified always comes in quotes so we know where we want to go we want to go here and we know where we are which is here right this is what we0:24:24
actually have we have the database place it's like the Oracle ooh what interesting word right it's this place0:24:34
you go and you beg it to please take my data right that's what happens up here when you hand it things it says okay and what actually happened that's up to the thing0:24:45
right the place and then what happens when we want to ask questions what are we asking questions of the place the box we're issuing0:24:56
transformation requests to the box and we're asking questions of the box now encapsulation can be good but this is not good this is really bad because the0:25:06
semantics of these operations are destroyed by the fact that they're all unified in a place right so we ended up with some form of identity right we0:25:16
usually in our programs this is usually the connection for the database right you send transaction you know strings through the connection and stuff happens0:25:25
at the place or you send query strings to the connection and you get answers back from the place right and so transactions are they're not0:25:34
functions of anything they're just requests of a place and queries are not functions of anything they're just requests of a place so we're missing something really critical here but we0:25:44
should recognize this because I can replace the database place with something else right I can replace it with the object in your object-oriented programming language it's precisely the0:25:54
same problem it's just on a bigger scale with more than one process but it's the same problem it has the same0:26:03
characteristics which is it collapses identity and value into this box and it destroys time just basically is a time0:26:12
destroyer you know what happened no one knows it's gone it just eats it it's like a black hole so this is not what we0:26:21
want but this is what we have critically the difference between that so this is the same time model with the pieces labeled for databases right we're gonna0:26:30
have a connection to the database that is an identity the database right the thing we go back to over and over over time the database that we think is0:26:39
changing that's an identity the Yankees the database my company's database right inside we really want proper notions of the value of the0:26:49
database at any point in time we want transactions to be functions of the value and we want queries to be functions of the values so this piece0:27:00
here is really what's critical we need to introduce values into our databases without them we can't have a sane model of of change right because things you0:27:12
know what happens inside the box between queries you don't know what is it what does the transaction actually do you're not sure okay so that's what we're0:27:22
shooting for so we have a we have the first question we have to ask ourselves is what is going to constitute this value if we're going to try to say a0:27:31
database is not this magical blob that answers questions and accepts transaction requests what is the value of a database going to be as it is it0:27:40
going to be a linked list or a snapshot of a relational model that would be a candidate right you could0:27:49
definitely say that if we could do that that might be nice I'm gonna I'm going to claim that a really good model for0:27:59
database state is an accretion of facts and I will spell out accretion and fact more but the critical thing is that it's0:28:08
an accumulation in particular that the past does not change I don't know if you can see it that well but that's a picture of tree rings there it's the0:28:18
same kind of idea right the tree grows and there's just more stuff but the inside doesn't doesn't change right0:28:27
there's a couple of things that fall out of this idea and this is not about how to implement it yet and I will talk about that but the first concept is that new that process right change novelty0:28:40
requires new space we're not going to try to reuse space just like the tree doesn't right it gets new space as it0:28:50
grows and fundamentally this is a move away from places moving to accretion is the first step to moving away from places0:28:59
and moving towards values so what do I mean by accretion or how do we do accretion well we can if we go back now0:29:08
and remember that model for persistent data structures right we had a tree which had a root then we needed a new value so we found the parts of the0:29:18
tree that we wanted to keep and we allocated some new space right there we used just memory allocation to get the new nodes and we0:29:27
made a new root and then we just transition to using the new root that actually doesn't work for databases like0:29:36
we can't just take that model and transliterate it over for databases for several reasons the first is that we're crossing process and temporal0:29:46
boundaries in other words we often will need the ability to convey a root to somebody else right inside our program0:29:56
inside our single programming language I can have a reference to the old database you can have a reference to a new database you could pass me a reference to the database by passing me what0:30:05
whatever references are in our programming language so a Java reference a pointer an object something right you'd have an ability to do that now we're0:30:15
crossing process boundaries so we have to find another way to do that the other thing is that we have a tendency in databases to be if I were to convey for0:30:24
instance a root to you it's going to escape the process that is the database itself it's gonna be on a wire or somewhere you're gonna get it you're gonna need to go someplace where the0:30:33
database is being maintained or places with that reference and say give me this root well if it went away during garbage collection you'd be toast right0:30:42
in other words because time passed between when the root was created and maybe a new version of the database was made you couldn't go back with the0:30:51
ID with the information or the value of the root or this descriptor of a root and get it unless it was kept somewhere and if you tried to keep all the roots0:31:00
then you have to have a way to find all the roots you end up with this meta layer above all the roots and then how is that being maintained you end up with a circular problem of0:31:10
stacking persistent data structures other problems related to trying to mimic that model directly are the fact that you can't do global garbage0:31:19
collection because you don't know all the extant processes that that still care about parts of the roots so what we're going to do instead is model that0:31:28
tree a lot more which is to say that newer values of the database include the past inside themselves just like the0:31:39
tree includes its old rings right and that the past is just a sub range of the present that a database will have its0:31:50
past inside of it and therefore you can take any database and say I'd like to talk about some point in the past just like you can talk about you know a0:31:59
sub range of a vector even though things have been added later you can still talk about the past of a vector right it's just only up to here I'd0:32:08
like to talk about that that seems pretty straightforward when we think about vectors and it should be equally straightforward with databases as long as we're doing this accretion model there0:32:18
are lots of other good reasons to do accretion and I'm not talking about them on Wednesday I think that if you really want to have something you're gonna call a proper information model0:32:28
this is essential if you don't do this you're actually you're not you're not doing information modeling at all you're doing something else pretending okay0:32:39
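A small sketch of "the past is a sub range of the present", again in Python and purely illustrative. It assumes the database value is nothing more than an append-only tuple of facts plus a basis point `t`; the names `Db`, `accrete`, and `as_of`, and the sample facts, are hypothetical and are not the API of any real database.

```python
from typing import NamedTuple, Tuple


class Db(NamedTuple):
    """A database value: an accreted log of facts and how much of it we see."""
    log: Tuple[tuple, ...]  # facts in the order they were added, never rewritten
    t: int                  # this value sees the first t facts of the log

    def facts(self) -> Tuple[tuple, ...]:
        return self.log[:self.t]

    def as_of(self, t: int) -> "Db":
        """A view of the past: same log, earlier basis point, no copying."""
        return Db(self.log, min(t, self.t))


def accrete(db: Db, *new_facts: tuple) -> Db:
    """Return a new database value with new_facts added; db is untouched."""
    log = db.facts() + new_facts
    return Db(log, len(log))


db0 = Db((), 0)
db1 = accrete(db0, ("yankees", "player", "Ruth"))
db2 = accrete(db1, ("yankees", "player", "Gehrig"),
              ("yankees", "player", "DiMaggio"))

# The newest value contains its own past: asking db2 about t=1 gives
# exactly what db1 saw, without anyone having kept db1 around.
past = db2.as_of(1)
print(past.facts() == db1.facts())
```

The design point is that no separate registry of old roots is needed: any holder of the current value can name an earlier point and get a consistent view of it.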
so I said we were going to accrete facts what do we mean by a fact well it's quite critical that we boil down novelty into something minimal as0:32:49
I'll talk about in a next slide and so what we want to do is try to come up with a representation for information that eliminates or reduces to0:32:59
as great an extent as possible structure structure is something we can superimpose later but if we were to try to implement change using structure we'd0:33:08
end up with a lot of problems so we're looking for something atomic in Datomic we call it a datom but it's just a quad right it's entity attribute value0:33:19
and in our case we use transaction but the critical part of putting transaction there is that that's a path0:33:28
to time if you want to have a fact it must include time without time it's not a fact because nothing is eternally true0:33:38
and was was eternally true you know in the past and for all all the future facts are temporal they're temporally qualified so so that's going to be our0:33:48
primitive and what we're going to do is we're gonna represent process by reifying facts by making them into something we can touch right because0:33:59
what what actually is in a database right now that you have issued transactions to what went into it where is the list of things that went into0:34:08
your database where is the list of activities against your database the transaction log how many people can query their transaction log yeah how many0:34:19
people have an information model for their transaction log no you can't we heard a little bit about event sourcing which is one small step on the path0:34:29
to this but there's a lot of value to reifying the process to reifying what happened to making a concrete thing out0:34:38
of what happened right because what is the database otherwise it's just the result of the effects of a whole bunch of operations it's just the answer to0:34:48
and you don't even know what what what the question was or what what the path was to obtaining it that's completely gone so on the other hand if we really0:34:58
want to make a thing out of what we're storing a thing out of what happened we need to make sure it's primitive right if we said every time you change0:35:07
the document save the entire document is that what you did is the entire document what you did you know you change the typo and there's an entire new document0:35:17
is that what happened no that's what happened plus the entire document that surrounded what happened right that's not going to work right trying to store at a unit of0:35:27
granularity that's bigger than the novelty is gonna crush you right because I'm sure a few of you are like accumulate everything that ever happened my database is gonna be huge0:35:37
it's actually not going to be huge you do not have as much novelty as you think you have but if every piece of novelty is accompanied by the entire0:35:46
document that was near it or the entire you know record that surrounded it then you will drown right so what we want to0:35:55
do is we're going to say we're going to reify process by only maintaining the most primitive representation of what actually occurred we can describe anything that actually occurred in terms0:36:05
of assertions and retractions of those really tiny facts that we had before and in that way I think you can say that this representation of process is0:36:14
minimal okay we can talk about event sourcing later but that's not minimal right because you're storing the verb0:36:24
right and what actually is the result of that verb is in some code somewhere which is going to change when you make the code better later and therefore is0:36:33
gone so you can you can take any other transformation and you can boil it down in terms of this so that's the idea how0:36:45
do we do this the first thing is to go back again to this how do we represent state and I think it's also critical0:36:54
here to talk about what actually what it means to be a database because I think this is another notion that we're definitely compromised on substantially right databases are about leverage0:37:04
anybody call their file system a database yeah people treat their file systems like databases but is it really0:37:13
a database right but you'll call a key value store a database though which does less than your file system no problem that's a database because it's new right no databases are about0:37:25
leverage right we invented database we already had file systems when we invented databases right we already had file systems then there was databases so there must be something more right why0:37:35
do we have databases we have databases because they give us leverage in particular one of the key things databases do is they organize data such0:37:44
that we have leverage when we try to answer questions right like they have indexes they have query engines they're sorted0:37:53
probably they have a whole bunch of characteristics they may slice things on attribute boundaries or column boundaries so that we can quickly get at certain pieces right if we're not0:38:03
getting any leverage it's not a database it's just a datastore it's a place to put stuff it's like squirrels hiding acorns in your backyard you know that's0:38:13
not a database so if we're gonna say we're gonna accrete facts one representation of that that gives us leverage is to store the facts as sorted0:38:23
sets so that's what we're gonna try to do how do we store our state as sorted sets of facts it ends up for reasons I'm not going to go into here that it's been0:38:33
discovered repeatedly that trying to maintain sorted persistent sets live on disk is a bad idea right you just0:38:43
cannot do it efficiently and fast enough you consume way too much space we don't mind in memory because our reclamation time is is really really fast right when0:38:53
we move from the old to the new and nobody references the old you know that garbage collection is extremely efficient on disk we'll have written everything that happens so we need to we0:39:03
need to do it in a batch orientation and so BigTable and many other systems are examples of doing this what happens is you're going to accumulate novelty in0:39:13
memory which is very fast and you can keep it sorted there and provide leverage there then periodically you're going to take that memory and merge it0:39:22
with a sorted version on disk right you'll log it in the meantime it's not a durability question right as data comes in you'll log it but in order to0:39:31
get a leverageable representation you're going to take a batch orientation you're going to accumulate change in memory and periodically merge it to disk BigTable0:39:41
does this with a flat file structure they accumulate these big flat sorted things in memory and then they put them on disk and then they merge big flat0:39:50
things on disk into bigger flat sorted things on disk because we're taking an immutable approach to this we're of course going to use trees and we're0:40:01
going to take you know use persistent trees the same kind of shape you saw before but we're gonna do it on disk so this looks like this just sort of generically this applies to BigTable and applies to Datomic and other0:40:11
systems right transaction processing or the the input to novelty is going to both log novelty just so that it's not lost but it will put it immediately into0:40:21
a memory index which is sorted and periodically some merging process will take memory and merge it out onto0:40:30
durable storage so then you have a an index on durable storage a new index on durable storage and if at any point in0:40:39
time you want to leverage this you're gonna have to combine two things right the stuff on storage is as of the last time it was merged and the stuff in0:40:48
memory is since then all right so every time you merge you can drop everything you have in memory because at that point everything is in storage so that looks0:40:57
like this in Datomic there's a whole bunch of other things that are happening that I'm not going to talk about because I'm mostly talking about the idea of making databases into values0:41:06
and not the details of the architecture here but I'll just point out a couple of things the first thing that's interesting is this is three separate0:41:15
processes the whole notion of a monolithic database is something that can go away can go away once you start having values because we get some0:41:25
independence of location but essentially novelty will come in to something we call the transactor it will immediately be logged then it will be retransmitted0:41:36
out like event sourcing style you can get these event streams to anybody who cares and periodically an indexing job will transfer memory into storage and0:41:46
any query process that wants to get an answer that involves all the data will do a live merge between the live index0:41:57
and whatever's on storage and that merged data will be able to answer questions so it's the same kind of architecture as BigTable right it's just0:42:06
accumulation of novelty and periodic merging of that into a durable sorted version the net impact the net result is a0:42:16
leverageable sorted view of the data so let's talk a little bit about the memory index the memory index is a persistent0:42:28
sorted set so it's exactly like the trees I showed you before it's just another flavor of a persistent data structure the shape it has has particularly large internal nodes so0:42:37
it's not like a binary tree it's not like a red-black tree it has big internal nodes with you know thousands of things inside like a B-tree would have except it's a persistent data0:42:46
structure so it's immutable both the stuff on disk and the stuff in memory is immutable when I say we merge onto disk it means we're making a new set of index0:42:57
nodes on disk in the same manner right we'll share some nodes with the old index and we'll have some new nodes all on disk it means eventually we'll have some garbage on disk that we'll need to0:43:06
get rid of so it has large internal nodes you can plug in the comparators you want to at least sort the data two ways you0:43:15
want to sort it with an entity first orientation and this would support any entity or object like queries or document style queries right if you have0:43:25
a sort led by entity then attribute you can pull a document out of this by just grabbing an entity and saying everything that you find by following the tree down from this thing0:43:35
but you haven't actually stored a tree you haven't stored anything structural similarly you want to at least sort it by attribute first and that will let you0:43:45
do analytic style queries right which are columnar which are often driven by you know I just want to see total sales0:43:54
it's like I don't really care about whatever else was associated with those entities I just want their prices maybe or their sale prices and optionally you0:44:03
can get other sorts driven by attribute then value or the reverse index from entity to entity which is value first so when0:44:13
I talk about these indexes both in memory and on disk it's important to note that these are covering indexes a covering index is an index that includes all the information so they're0:44:22
really effectively sorts of the entire data set like in each index is the information each index has all four parts they're not indexes in the sense0:44:31
of being pointers into something else they're they're complete data sets sorted sort of different ways in storage0:44:40
it's a similar kind of thing we do have logs right so you're going to keep track of everything that happened it will be a queryable thing that's concrete we'll0:44:50
see the details of what constitutes a process but it's those assertions and retractions are logged and then the same thing it's these covering indexes what's0:45:00
sort of neat about breaking this apart like this is that you start to have the notion of there's a bunch of jobs the database does and one process doesn't0:45:09
need to do all those jobs you know databases traditionally handled transactions and did indexing and answered queries and managed storage there's0:45:18
really no reason for that that colocation is actually all a secondary effect of the fact that they had a place orientation in other words there was no0:45:28
place you could go to get an answer to a question except the keeper of the place unless you went through elaborate0:45:37
mechanisms to make multiple places you know replicas that actually did all the same work at the same time so what we require of a storage subsystem0:45:46
is that it just be able to store the data segments of the indexes and so you could use any key value store as a storage for a database that's0:45:56
implemented this way in addition there are a couple of trickier parts right so for instance we saw on the first picture0:46:05
of the epochal time model there was identity right and then values so those identities actually transition from value to value they're0:46:14
in fact the only mutable thing in a system that does this right so Clojure has refs you know Erlang has actors Haskell has refs also right those are0:46:25
the things that can change what gets put in them are immutable right but they can change so when you want to use a storage to do this job you're gonna be putting index segments in they're immutable but0:46:37
the storage itself has to support conditional put in order to make atomic transitions from one state to the next in order to do that identity part of the0:46:47
job okay now we can put this all together in the bigger picture so what actually is a database what does the0:46:57
database value look like when you start doing this Datomic happens to be implemented in Clojure but you can put in any construct here that you want0:47:06
right so it could be a Haskell ref this happens to be one of the Clojure reference types there's a reference type that is the identity in other words the database as of right now inside it this0:47:17
is a structure here which is immutable which points to other things and that's the value all right so this is immutable0:47:26
and everything over here is immutable this box that goes from one structure to the next to the next is the only mutable thing in the entire system you can build0:47:36
an entire database with one mutable ref one just I mean the point was made in0:47:46
the prior talk I mean you don't need much state guess what you need a fantastically small amount of state really of mutable state so we have the0:47:56
structure I'm not going to talk about all the pieces but here's the two segments right there's this memory index which is a persistent tree in memory there's actually several of them one for each of those different sorts right and0:48:06
then there's a pointer to a communications infrastructure that lets you interact with storage where the index segments that are stored durably0:48:16
are kept and they're all immutable so there's a bunch of roots there and there's a cache right and when you hit the cache and you get a cache failure you're actually going to get i/o so this0:48:27
is a really interesting thing right we have what feels to be an immutable value that accessing it does I/O hmm I mean it0:48:37
definitely feels like a value to the program I'll leave it as an exercise to the type systems to find the Monad that will let this work because I don't see0:48:48
it but in any case what happens is it feels like you had the entire database in memory but you don't what you have is a0:48:57
communications path that will let you get any part of the database that you want but because the database has leverage right it's organized it's it itself is a tree it means you can0:49:08
efficiently grab anything that you don't already have because the exponents are working on our behalf and so we can also0:49:18
do this caching which is also critical right once your database segments are immutable do you have to go to a single canonic place for them no you can copy0:49:28
them around anywhere you want that's a beautiful thing we already know it's a beautiful thing where do we do that all the time on the web that's right right0:49:38
we're way way way ahead of our databases here we need to we need to catch up so in storage this is what any particular0:49:47
index looks like right there's a set of roots to the various sorts then any particular index is just a three-level tree there's a root that points to a0:49:57
bunch of directories each of which has a bunch of segments that they point to and those segments just store datoms sorted so it's a simple tree on disk0:50:08
when you merge indexes some of these leaves will become invalid and there'll be new replacement leaves maybe there'll be replacement directory entries and there'll definitely be a replacement root so making a new index means0:50:19
keeping all this stuff on disk putting some new stuff on disk that points to a lot of the old stuff and then the the old stuff needs to get collected so you0:50:28
have to keep track of it those segments are what gets put in storage so individual facts don't get put in storage entire segments get put in storage so this system uses a key value0:50:39
store like a database uses a file system to store blocks of index structure not individual facts I talked0:50:50
a little bit about process right I said you could boil all process down to assertions and retractions but you know that's insufficient right I can't add $10 to your bank account by merely0:50:59
asserting your bank account is bigger right because that would be potentially a race condition if somebody else is trying to do the same thing at the same time so how do we do that it0:51:09
ends up that well one model for doing that is something called a transaction function which is a function of the database itself and some arguments that0:51:18
results in some transaction data where transaction data is either an assertion or assertions retractions and other transaction functions with this you can0:51:30
now build the model that can represent any process any process transformation you're either asserting something retracting something or you're doing some composite operation which is a function0:51:40
of the database coming in that returns new transaction data on the way out so a transaction actually looks like0:51:50
assert assert retract assert call this transaction function assert retract assert and what happens is that these transaction functions will get called0:51:59
until they no longer produce transaction functions in other words they bottom out on assertions and that looks like this so you may have a transaction with some assertions retractions and a call to the0:52:09
transaction function foo during the transaction foo will be called it'll be passed the database and whatever inputs it was given it may produce two new0:52:18
transaction function calls they'll in turn be called eventually all transaction functions will bottom out in assertions and retractions and this is what happened right this is the process0:52:29
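The expansion just described can be sketched as follows. This is a hedged Python illustration, not Datomic's API: the tuple shapes `("assert", ...)`, `("retract", ...)`, `("call", fn, ...)`, the dict standing in for the database value, and the functions `expand`, `deposit`, and `transfer` are all assumptions made for the example. A transaction function receives the database value plus its arguments and returns more transaction data, which may itself contain calls; expansion repeats until only assertions and retractions are left.

```python
def expand(db, tx_data):
    """Expand transaction-function calls until only primitive facts remain."""
    result = []
    pending = list(tx_data)
    while pending:
        item = pending.pop(0)
        if item[0] == "call":
            _, fn, *args = item
            # The function sees the database value and returns more
            # transaction data, which is expanded in place, in order.
            pending = list(fn(db, *args)) + pending
        else:
            result.append(item)  # an assertion or a retraction
    return result


def deposit(db, account, amount):
    """Hypothetical transaction function: bottoms out in retract + assert."""
    old = db[(account, "balance")]
    return [("retract", account, "balance", old),
            ("assert", account, "balance", old + amount)]


def transfer(db, src, dst, amount):
    """Hypothetical composite: produces two further transaction-function calls."""
    return [("call", deposit, src, -amount),
            ("call", deposit, dst, amount)]


# A plain dict stands in for the database value passed to the functions.
db = {("alice", "balance"): 100, ("bob", "balance"): 20}

tx = [("assert", "bob", "email", "bob@example.com"),
      ("call", transfer, "alice", "bob", 10)]

for fact in expand(db, tx):
    print(fact)
```

What ends up recorded is the expanded list of assertions and retractions, not the name `transfer`, so the record does not depend on what the code for `transfer` happens to be next week.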
if you want to keep track of the high level request somewhere you could do that but I think you should do that as independent data in other words this was produced because somebody sold something0:52:39
as opposed to storing somebody sold something as the only thing in your database because what that means today and what it means next week are0:52:48
different and and the other nice thing about this is you do not need to recreate a point in time by replaying all of the prior history you you have0:52:58
every point in time immediately accessible so we saw a couple of other components there on the diagram the transactor basically just applies the0:53:08
transactions right it does that expansion it puts it in the log and it will broadcast that change right because we have reified process we can send process as a message we can say this is0:53:18
what happened anybody who cares this is what happened to the database somebody just now did this and because this is something we can talk about as opposed to somebody just issued this transaction0:53:28
script I mean what do you imagine if I sent you that message what are you gonna do with it somebody just applied this script to the database I just sent it to you what can0:53:38
you do with that nothing right it's you can't do anything with it but if I said here are the new facts and retractions that happened as a result of0:53:47
somebody doing something you can say oh look some of this matters to me right now the transactor indexes in the background and indexing itself creates0:53:56
garbage as I said so you do need some notion of storage garbage collection but it can't be root tracing based right because we don't know what processes are0:54:05
using the past the good thing though is the past is in the new present so at some point you can just say look if you still want to consume the past you can0:54:14
do so but you have to base it off the current root because I'm gonna get rid of the old roots once a week say you0:54:23
also saw the peer in the prior diagram again I'm not really talking about the architecture here and more of the features but what's neat about it is that's an independent process that has0:54:33
its own query engine right do we need to co-locate query with anybody in particular no everybody has0:54:43
access to the information it's all immutable anybody who'd like to issue a query can issue a query if they have access to the storage so the critical thing is detaching storage so it's accessible0:54:52
from third parties and then you can have third parties that do their own queries and they can do this on their own their only requirement is that they have this live memory index and the0:55:02
ability to merge that and storage so the transaction will propagate to peers peers keep their own memory index they can access storage directly which means0:55:11
they have the recipe to merge those two things and answer questions they also do extensive caching so this model0:55:20
dramatically simplifies databases right we now have that epochal model that I showed you before we've reduced the amount of coordination in the past what0:55:29
did we have to coordinate we had to coordinate writes and reads right you couldn't issue a wild flying read on a database and have0:55:38
anything consistent happen which means there's coordination associated with that I don't care if you're using MVCC or whatever inside to reduce the overhead of it it's still there it's still0:55:47
coordination and you know research has shown recently the research behind VoltDB showed that that coordination is a huge cost huge0:55:59
cost you're much better off serializing activity but if you had to do that for reads you'd really be crippled now we have the ability to get stable bases0:56:08
right we can say I want to manipulate this database and as long as I have that value as long as I've dereferenced that ref and I have that value I can hang on to that all day0:56:18
I can issue a hundred queries over ten hours and nothing about my answers will change I have a stable basis the same query the same result all the benefits0:56:28
we heard about functional programming start to apply to the database itself in addition transactions are well defined now we know what they are they're a function of the database0:56:38
that causes it to accrete novelty in the form of facts you can say that in one sentence there's nothing about it that's confusing I don't know maybe it is0:56:48
confusing but it would not be confusing if you kept saying it over and over again it doesn't have a lot it doesn't have a lot of related0:56:58
oh but stuff right like read committed does and serializable very very difficult concepts those are other0:57:07
benefits we get that basis that we can have if we said dereference the connection and give me the value that's great what if I want that same value0:57:17
next week you can get it what if I want to tell you about that value oh I just issued this report and something seems screwy go check it out I can now communicate that to you I can0:57:27
say the database as of you know transaction 567 looked messed up to me can you please run your advanced analytics on it and see what's0:57:37
wrong and that's not going to go out it's not going to disappear during that conversation it's not going to disappear on you if you don't get around to doing that analysis job until next week0:57:46
right it's a communicable recoverable basis that's huge we also saw in the architecture diagram we can now move stuff around right when0:57:56
you start working with values you get a high degree of relocatability which as an architectural premise is a big advantage right when you talk about0:58:05
scaling queries you want to add more things that issue queries is that straightforward sure definitely you want to make storage parallelizable and use0:58:14
one of these cool new distributed acorn hiding key value stores you can do that right as storage because they work great0:58:24
for that they're actually they really do have tremendous value in doing what they do which is given a key give me the stuff they can do that scalably and they0:58:33
can do that elastically so that's great we can leverage that you can do time travel I want to see the database as of last week as of last month since last0:58:43
Tuesday those are all incredibly straightforward things to do like literally you can say in Datomic db dot as of last Tuesday however you say0:58:53
last Tuesday as a date in your language so that's a big deal in addition querying the database the same database against itself at a prior point in time0:59:02
how much of analytics how much of business decision-making process is that a huge amount but if you're using a database that's a place where0:59:11
everything that happened before has been overwritten you have to reimpose that right you have to manage time yourself you have to put all that stuff back in yourself and you saw that we can0:59:21
get these events this event style process triggers automatically we can build it in so I will contend that the database as a value is dramatically less0:59:30
complex than any other approach especially any approach that involves places it's more powerful it's more scalable because we can relocate things0:59:40
and start leveraging scalability techniques we have in those areas and as I'll talk about on Wednesday it's also the source of a much better information0:59:49
model and thanks for your time0:00:00
The Design of Datomic - Rich Hickey
0:00:00
thanks Alex so much for putting this together it's great to be here again and hear everybody's stories about using0:00:10
Clojure and also to tell my own so I know everybody's got the same question which is what's going on with the hair0:00:20
and so somebody told me if I grew my hair like this and got a phone booth something interesting would happen but I0:00:29
haven't found the phone booth but no seriously I'm trying to get into the Foo Fighters but they haven't called me back so today I'm going to talk about0:00:39
Datomic it's a database that we built in Clojure and I really want to talk about the ideas inside Datomic it's0:00:48
not really about showing code or trying to sell it or anything and I'm going to talk about it from three perspectives first what problems were we0:00:58
trying to solve second what are the solutions to those problems sort of in the abstract and third a little bit about0:01:07
how we implemented those and then a little summary so in terms of problems because all the problems turn0:01:16
into the same problem at some point right which is dealing with complexity and when I was first working on Clojure my first year or second year on Clojure0:01:27
I found this paper and it was very inspirational it's called Out of the Tar Pit and the authors you know had sort of0:01:38
pegged the set of problems that I had also pegged you know we're dying with this complexity and most of the complexity is caused by state and0:01:48
the way we manage state or fail to manage state and they also identified control as the source of complexity and by that they meant sort of imperative programming or even functional0:01:57
programming being too explicit about control and this paper was really inspiring and actually was the paper that caused me to stick to my guns and0:02:06
make Clojure immutable by default and all the data structures immutable I had all kinds of ideas in the air and I was right on the edge but when I read this paper I'm like well0:02:16
let me just do it and so I credit it as very much an inspiration but there's a lot of challenges in the paper it's not a paper with answers0:02:26
really it's a paper that poses a problem statement and they proposed functional programming and declarative programming0:02:35
and declarative programming as expressed by relational algebra as sort of a recipe for getting out of the tar pit0:02:45
writing programs that were easy to understand that were fundamentally simple that were free of the complexity of both state and control but there's this problem with the paper it gets to0:02:55
this point where they talk about you know there's the state which you're going to interact with relationally and then maybe your program gathers information or produces information then it goes back and like this state it0:03:06
magically appears and it magically can get changed and the paper really doesn't address it at all it doesn't even propose answers it's just like somehow0:03:15
this could get updated and for me this is the throbbing problem because this is really where we end up if we adopt a functional programming language or a functional style of0:03:26
programming we're still stuck with process right we have some constructs in Clojure for dealing with process but in the end usually process produces0:03:36
information that information has to go somewhere that somewhere is the database and then the program has to get back out of the database and start dealing with it and that interaction of process0:03:45
outside the system and back is really a problem it's you know everybody who's working in Clojure still knows how nasty it is to try to deal with the database so Datomic is about closing the loop and0:03:56
solving the process problem another problem we're trying to deal with is the lack of declarative programming and we've already seen this situation0:04:06
improved now right we have Cascalog we have core.logic that kind of stuff is really important because declarative programming and logic programming it's just another level up from functional0:04:16
programming it's a further abstraction away from how to do things and a concentration and focus on what you want to accomplish you know what the0:04:25
problem is and right now most people if they do declarative programming at all they only do it when they write SQL right it's something that's over there these servers know how to do this they0:04:35
have these declarative languages and they're very powerful they're much better at manipulating data than our programming languages including languages like Clojure they're another0:04:46
level higher you know you don't typically write by hand a parallel hash join but a database server when given some declarative commands does it for you all the time0:04:56
so there's a lot of value there but again one of the problems with all logic languages traditionally is they have this notion of an ambient basis when0:05:05
you're designing and issuing a query against what where is that stuff how did it get there and when you look at it again will it be different and how do you know when it's gonna be different0:05:14
and why and can you go back to something Prolog Datalog all these languages didn't really focus too much on the data side they focused on the query side and0:05:23
even the query languages we have again don't really focus on the data side so how can we give this kind of programming a sound basis is one of the problems we're0:05:32
trying to solve in addition to the declarative programming being over there we have a general problem with stuff being over there at all0:05:41
client-server programming has a ton of built-in limitations that often you don't really get to see the first is this basis problem right if I issue two queries in a row against a server what0:05:53
has happened in the interim you have no idea you have no way to recover the same basis for asking two questions which leads to all this fear right you're afraid of issuing multiple round trips0:06:03
so you try to pile on like everything you might want to know into this request the other thing is even to read anything you have to send it over there so you0:06:12
end up with very complex queries in order to in one shot get everything you might need and those queries burden these servers so you're afraid also of overloading them and you see it0:06:23
in little ways you don't even recognize like in SQL you'll often ask your query and part of the query is actually answering your question you know which records satisfy these constraints and0:06:33
another big gob of your query is about pulling out the fields you want for your reporting job because you want to get it all done in one roundtrip those0:06:42
things have nothing to do with each other you've just combined them all into a query because that's how client-server works so can we do that better the other thing we're looking at0:06:52
trying to solve is how do we embrace some of the advances that have been made recently in making scalable systems that are you know arbitrarily0:07:03
scalable and distributed and a question I had was what's possible how much of that you know the goodness of something0:07:12
like Bigtable or Dynamo can we leverage without giving up consistency because I think consistency is really important if0:07:21
you don't have consistency you've now taken on a huge boatload of additional complexity there are definitely cases in which you have to make that trade-off you say I need arbitrary write scaling0:07:32
I'm dealing with huge data I'm gonna have to make this trade-off and that's why these systems were built but these systems are full of cool research and0:07:41
capabilities that don't necessarily have to be applied to the problem that they were designed around so can we get some of the best of both worlds can we get consistency and if we do how much0:07:52
scalability can we get at the same time another advantage of these kinds of systems is their elasticity they as a rule tend to not be driven by0:08:02
pre-configuration but they're much more dynamic so that dynamic nature is something also we'd like to get I think everybody who pursues a database other0:08:11
than a SQL database partially is looking for some more flexibility in data representation you know everybody's tired of the rectangles and the rigidity there and0:08:22
people want to do sparse data they have irregular data they have hierarchical things and so document stores have become popular for doing that multivalued attributes are particularly0:08:31
nasty and everybody would like a nicer way to do that I think the biggest thing you want though in pursuit of flexibility is long-range flexibility0:08:40
all right day one flexibility is straightforward I use this document store I can stick anything in it okay that's great that seems flexible but flexibility isn't you know this isn't0:08:49
flexibility flexibility is moving right so now time passes you have to remain flexible and that's where you can trip up because0:08:58
if you've encoded any structure at all and that could be a set of tables or what you've chosen to put in a document you've now got this structure that your0:09:09
future changes need to contend with that structure is something that's impeding your flexibility to change your system and to move it forward so I call0:09:18
that provision of structural rigidity does it get into your program make your program harder to change and another0:09:27
thing we'd like to solve is the time problem in general I think databases don't actually deliver the words they claim to they claim to be storages0:09:37
of memory and to have records but before computers these things had meaning that was a lot stronger than the meaning we deliver typically right when we remember0:09:47
things we actually remember them we don't replace you know our old phone number with our new phone number in our head in the phone number spot you know0:09:56
memory is something that you keep around and records are something that you keep as well many many systems have to do this many systems don't have to do it0:10:05
we'd like to be able to do this because it really gives you a lot of power you can audit things you can potentially look at different points in time you can0:10:14
really keep track of what's happened I mean how many people have ever worked in a system where it wasn't really working that well and then more data got into the database and it started working and0:10:23
no one could ever figure out why and like no one ever bothered to go back because they actually couldn't I see hands yeah I mean this is0:10:34
what it's like so could this be better how can we leverage time and actually get it right and finally the last problem is how do we0:10:45
incorporate perception and reaction that was another big part of the tar pit paper building systems that were reactive that somehow knew when there was novelty so0:10:55
they could act upon it versus having to poll and this is something that's really difficult to do I mean some databases have stuff built in you can set up an0:11:05
event system or a queue system alongside to sort of get events but in general there's the sort of inversion problem you have where all you know about0:11:15
changes is that it's happening over there and if you really want to know the change happened you have to go look over there and you have to keep looking over there and so we'd like to do that without polling on the other hand0:11:26
ideally perception is is a consistent thing right when the light bounces off of you and barring some really nasty interference what you get is a pretty0:11:36
stable view of what happened and that stability that consistency of perception is another thing that's really important if you want to build a decision-making system on the other end so we'll just0:11:48
again sort of go through the same points and look at how to do it this is what we're trying to get to we're trying to make applications0:11:57
that are really empowered they're empowered to perceive what's going on in the world they're empowered to react to the things that are of interest to them0:12:06
they're empowered to remember anything that's of interest to them and to make their own decisions and the idea behind0:12:16
Datomic is can I design a system that does this and what do I need minimally under the hood to facilitate0:12:25
this because it's pretty easy to understand what you'd want in the peers but what do you need underneath that to make that go and that's glommed together right here as something I'm calling coordination0:12:34
services but I'll split that out later so the first thing we have to do is really get coherent about what we0:12:43
mean when we say state and this is the problem now we're trying to incorporate process and we know how to do state you know in a program right we have values the value is great okay0:12:53
but then we know time passes and there's another value when you want to put that in a database what are you going to do traditionally databases have replaced0:13:02
things in place I'm going to call that place oriented programming and there's a lot of negative aspects to that we need0:13:12
a different idea we want to try to preserve the tenets of functional programming but what does it mean to have a value that changes well one way to think about it is0:13:21
if it only changes by expansion or accretion like these can you see the tree rings back there I wasn't sure I0:13:30
also know that those are tree rings so tree rings right they just keep going out right the inside is still there right it's like a value that gets bigger0:13:39
but it never actually changes in place so this is an in-between world right there's stuff and then I change the stuff that's update in place and0:13:49
like what happened who knows it was all over itself and then there's stuff more stuff that's actually pretty easy to understand because if I knew this and0:13:58
then you said that's like okay this Plus that I have this I can get a grip on that as a value there's a more complex way to talk about this which is the0:14:08
value extends over time and you're just discovering it as time passes it's like was it Starcraft like you're moving and you see more of the map you0:14:17
just run your business more and you see more of your database it's a way of thinking about it although my wife warned me about the determinism problem of saying that and that upset it but the0:14:30
fundamental idea is that the past doesn't change that gives us something we can hook on to and build on top of it's not exactly a value it's not like0:14:39
42 42 doesn't get a little bit bigger but it's a new notion of a value where the core of it remains intact and all you can ever do is sort of grow0:14:49
it when you think about it this way you realize a few things right away which is that process and process is what I mean by either genuine novelty there's new0:14:58
information in the world or what we would consider updates right that word update is kind of bizarre right if you can have a new phone number you didn't actually go to your other phone number0:15:08
and change some of the numbers right it's not really update we think of it as update because we're associating with the same attribute of the same entity0:15:18
but it's actually novelty so process is about novelty when you take this idea of state you realize the first implication is new state means new0:15:31
space we have to get new space now I don't think anybody with a tree in their backyard is worried about the trunk size of the tree you know0:15:41
overrunning their house right there's actually not that much novelty in the world relative to whatever is already there I mean no one's growing their business at 10x per year or anything like that0:15:51
for an indefinite amount of time so we're gonna accept that process requires new space it's a given it's like with Clojure when I said0:16:02
I want persistent data structures that requires garbage collection you have to sort of make that decision say okay I'm gonna accept that as a given and then design a system from there the other0:16:13
thing that's fundamental is we have to move away from places places destroy this entire idea there's no way to have places and this idea of a value so let's0:16:24
talk about process now so process is about novelty there's either new information or information has changed right this is the way you say changed it requires four fingers it's the same0:16:37
thing change right the first thing you need to do with process to build this system is to reify it right you need to make a thing out of it what happens in0:16:47
an update in place system right you look at the world and it's like this and you look at it later it's like that what happened where is what happened it's0:16:59
gone it's just the effect of a bunch of independent things it never really had a life of its own and so what we want is we want to reify process we0:17:09
want to make it something that we could hold on to touch pass around there's tremendous value in making novelty concrete and so we're gonna do that the0:17:20
other answer to the process thing is that you need to find some representation for novelty when people hear about you know new new information requires new space they're like oh my0:17:30
god if I change your email you're gonna save the whole document again well that statement has a ton of presumption in it right it presumes that the only way you could possibly change somebody's email0:17:39
address is to store their entire document again who says that it's not written down anywhere it doesn't have to be that way so one of the things we want to make0:17:48
sure is when we grow our value and we represent the process which is what's new that that process is minimal that a0:18:00
representation of novelty is minimal it's as small as it can be and one way to do it and I'm not saying this is the only way but one way to represent novelty is to say we're only gonna0:18:12
represent data as facts and the only way we're gonna represent process is as the assertion or retraction of a fact and we'll talk a little bit more about facts0:18:21
later and anything else that might happen in the world we can boil down into that we can we can represent that way and that that becomes really small0:18:30
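A minimal sketch of that idea in Python, assuming facts are plain (entity, attribute, value) tuples and process is a list of assert and retract operations. The `apply_facts` helper and the sample data are hypothetical and are not Datomic's API.

```python
def apply_facts(state, novelty):
    """Fold assertions and retractions into a new set of current facts.
    The input state is never modified."""
    current = set(state)
    for op, entity, attribute, value in novelty:
        if op == "assert":
            current.add((entity, attribute, value))
        elif op == "retract":
            current.discard((entity, attribute, value))
        else:
            raise ValueError(f"unknown operation: {op}")
    return frozenset(current)


before = frozenset({
    ("user-1", "name", "Sally"),
    ("user-1", "email", "sally@old.example"),
})

# Changing an email is two small facts, not a re-saved document.
novelty = [
    ("retract", "user-1", "email", "sally@old.example"),
    ("assert", "user-1", "email", "sally@new.example"),
]

after = apply_facts(before, novelty)
```

The size of `novelty` depends only on what changed, and `before` is still available as a value after the change.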
I now assert your email address is this I don't care about your document there is no document and there's no more space required to say that than the amount of novelty for declarative programming I0:18:42
think this is actually the easiest part we've already seen you know this get added to Clojure and other languages languages like Datalog we're particularly going to choose Datalog0:18:51
but there's nothing that says you can't have other or more query languages the important thing though is we want this in the application we want to move this0:19:01
declarative power inside the application we also want it to be integrated with the rest of the application as opposed to being sort of a dedicated system that only works by itself we want to0:19:13
extend this programming so that it can apply to the data structures you have in memory and the rest of your program so you want to make queries work on anything and you want it to be extensible to user code so then this is0:19:23
the way it really becomes part of your application so that it can call your code and vice versa to solve the over0:19:34
there problem we have to move the data to neutral territory right one of the problems of client-server is it's based upon a model of hardware that's0:19:44
disappearing on us right there were these servers they were really expensive it was expensive to get a machine with big processors and big memory and big disks so it became a0:19:54
really important and rare thing so sending the stuff over to the server made sense now with SSDs and0:20:03
fast networks the privilege of data access that was possessed by that server is no longer unique to it it can be possessed by any machine and therefore0:20:14
moving the data out of the privileged control of a single machine becomes a critical underpinning to building a system where you have peers and you0:20:23
actually can spread around your computational load and your data access so we want no privileged access this0:20:32
idea begs the question of so what is storage I mean it's not actually a disk there's not a lot of disks that you can connect multiple processes to now there are SANs and things like that and there's systems built around that and0:20:42
that's a flavor of this idea moving the data to neutral territory but you can encapsulate the idea of a SAN and all of these concepts of data0:20:51
being over there and universally accessible in the idea of a storage service right there's some service that has the storage I don't know if it's one0:21:00
machine or ten machines or hundred machines I don't know where it is I don't want to know anything I just want to say give me this block of information it says here you go and those services0:21:10
exist and some of them are very capable so you want to take advantage of these services no dedicated machines no special accessors we can always0:21:20
superimpose that right if for some reason for working set coherence we want to have one machine that only does the A's and one that only does the B's so we0:21:30
get more scalability you can always do that over the data you don't have to make an A machine and a B machine with an A disk and a B disk right you can make an A working set and a B working set and a0:21:39
storage service that everybody can access you'll still get those benefits of locality so you want to separate and you know this is the same thing right0:21:48
it's the same thing we saw in Clojure most design is taking stuff apart and we're going to call the applications peers at the point in time that they0:21:57
have as good access to data as any server ever had they're peers in the system and the notion of a privileged server for this purpose goes away so how0:22:09
can we take advantage of these scalable storage systems and redundant storage systems the key is to actually separate reads0:22:18
and writes right as long as you're doing update in place there is a place you're putting stuff there if someone wants to0:22:27
see what has happened they have to go there if they want to see something consistent they not only have to go there they have to go there and stop everybody else right again0:22:37
this place orientation is a killer architecturally it just stops everything you can't do much better than that there are very complicated systems that try to pretend everybody has a writable0:22:47
database locally that's updated in place but they're very expensive and very very complex and not really delivering a lot0:22:56
of these benefits so we want to separate these two things when you separate them when you say writing happens over here reading happens over there you now can make independent decisions about the0:23:06
availability consistency scalability trade-off and that's what Datomic does it says you know what I'm happy with0:23:15
traditional scalability of consistent systems in other words I like transactional systems I like transactional servers I have yet to build a system that soaked the server0:23:24
I know plenty of other people who are in the same boat right the only way they soak servers is with queries almost never with writes so if you had a server0:23:33
that only handled writes you would be able to accommodate a lot of business and other problems so we segregate writes and we use transactions for0:23:42
writes and we do only that there and then what we do is that we have the output and the results of transactions be immutable by making it0:23:54
immutable we've solved the place problem right if I want to see something immutable do I have to go where it was first made can I go to a copy sure a0:24:04
copy's just as good as the original place so we write our stuff someplace where we can make many copies these storage services generally are highly0:24:13
redundant they keep three or more copies and then we can read that à la carte from anywhere at any time so now the0:24:22
consistency problem you don't need transactions to deliver consistency for reads you only need that because you were doing place oriented programming when you pull0:24:31
those two ideas apart you can make a different decision so when everybody says oh you have to do blah you say are you talking about place oriented programming because that's not the only kind of thing there is or it could be once you0:24:42
have immutable data you can move it around which means also once you've read something from somewhere could you remember it sure and are you gonna worry0:24:51
about that I wonder if this is still good no you don't have to worry about it as soon as somebody told you it's immutable you can cache it relentlessly as many0:25:02
places as you want as often as you want wherever it makes sense so in this way we get sort of both worlds with different trade-offs we have consistency0:25:12
which means yeah we have availability limited to availability for single server solutions which can have hot standbys and the various traditional solutions then we0:25:21
can have real cool modern distributed storage for our reads a little bit of both gives you an interesting system0:25:31
that's consistent and scalable on the read side and then on that query side for the flexibility we're just going to remove the structure right a good0:25:41
example of this is RDF right RDF said we can represent anything in the universe with subject predicate object and they're almost right that's almost right0:25:52
but you just build it up right what's a fact Sally not doing it right Sally likes0:26:03
still not really a fact Sally likes pizza sounds like a fact it's like ooh that's good that's a fact has she always liked pizza will she0:26:12
always like pizza now we're sort of seeing that's not actually a whole fact is it maybe she was allergic to dairy and then she figured out how to deal0:26:22
with that and now she likes pizza she started liking pizza at a certain point in time then maybe she becomes allergic to tomatoes and she stops liking pizza0:26:31
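A minimal Python sketch of the Sally example, under assumptions that are not from the talk: the `Fact` record, the integer time values and the `likes_as_of` helper are hypothetical names chosen for illustration. It extends a subject/predicate/object statement with a time and an asserted-or-retracted flag, which is what makes "does Sally like pizza" answerable for a given moment.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    subject: str
    predicate: str
    obj: str
    time: int      # hypothetical logical time; larger means later
    added: bool    # True = asserted, False = retracted

# Append-only history: nothing is ever updated in place.
history = [
    Fact("sally", "likes", "pizza", 10, True),   # she starts liking pizza
    Fact("sally", "likes", "pizza", 20, False),  # tomato allergy: she stops
]

def likes_as_of(facts, subject, obj, t):
    """Replay facts up to time t and report whether subject likes obj then."""
    state = False
    for f in sorted(facts, key=lambda f: f.time):
        if f.time > t:
            break
        if (f.subject, f.predicate, f.obj) == (subject, "likes", obj):
            state = f.added
    return state

print(likes_as_of(history, "sally", "pizza", 5))   # before she liked it
print(likes_as_of(history, "sally", "pizza", 15))  # while she liked it
print(likes_as_of(history, "sally", "pizza", 25))  # after she stopped
```

The same question gives different answers at different times, and the history itself never changes; new information is only ever added to it.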
facts have time so we have to become atomic but we have to make sure we're complete so we have our notion0:26:41
of this we just call it datom mostly because the plural of datum with a u is data and that's just a generic term that means nothing now so we can have datoms0:26:50
if we spell it this way it also makes for a cool trademarkable name which0:26:59
will consist of entities attributes values and some representation of time which I'll talk about on the next slide but the fundamental thing is if it doesn't have time it's not a fact it's0:27:10
as incomplete a fact as Sally you just you need there's something out there ok so how do we deal with time well one way would just be to put0:27:20
like a time of day timestamp on every fact but it ends up that as you'll see in the rest of the system because transactions are serialized0:27:29
they're as good a universal time line as anything else and then so we can instead put on the datoms instead of putting the time of day we can put the transaction0:27:38
they were part of and associate the time of day with the transaction as soon as you do that you're like if I could associate the time of day with the transaction I might want to associate you know who did it or what process did0:27:49
it or where did we get this information from with the transaction you say you know what there's not a good reason for transactions not to be first-class entities so in Datomic transactions are0:27:58
first-class and time is just an attribute of the transaction as could be other things and then you get to the sort of the critical point right we0:28:08
talked a couple times about basis you now have a notion of a basis for computations which is the database value0:28:17
at a point in time if we can figure out how to deliver that we really can deliver the time promise and it was an objective of mine in this design because0:28:27
I've made a lot of systems in the past that had to manipulate time they had to keep everything they had to keep the time of everything anybody ever write a system like that right you start adding timestamps everywhere how is that query for finding0:28:37
the latest value as of now how does that query perform how many people have tuned that query0:28:46
yeah that's hard that's really hard and then once you built a system that has this what do you end up with often right0:28:55
so much of your system needs now right so you build a whole bunch of system that's now right because now is like you can't say what now is you're just sort of I want to ask a query I want the freshest answer so now you0:29:05
have processes and methods in your program or functions and you have stored procedures right they're all like the now ones and then somebody says okay well0:29:14
we stored all those time stamps so that we could do historical work and and auditing I want to ask those same questions as of last week and you're0:29:24
like now I need another version of all of that parameterised by time and I have to go into all my joins right and flow around that time basis I've done0:29:36
that I think that's incredibly difficult to get right and does not perform well so the idea is instead of saying make time a parameter of everything if0:29:45
instead you could say look just give me the value of the database as of last week if there was some way to do that then the same queries and the same programs could just be handed last0:29:56
week's database and would work the same and that problem about that bug we had before we had a certain amount of data we could go back and look at it when was that happening it was happening at 110:30:05
o'clock give me the database at 11 o'clock let's see it oh look there's a problem here's the problem in our code it's fixed the next time that won't happen to us so0:30:14
we're gonna shoot for that as far as perception and reaction the idea is we've reified process right whatever we say we're however we're gonna represent0:30:23
change we're gonna be able to push it around and that's all we're gonna do it's gonna be data and we're gonna push it right once I pushed you some data you've got a query engine that can query0:30:34
arbitrary data if you want to react to specific things what do you do you just query the novelty and you can do that it0:30:43
ends up that the novelty alone isn't enough to answer all the kinds of reactions you want to have like you might want to see this stuff was new and what was the basis for it or this stuff was new and where did it take us so it0:30:54
would be nice if we could get the values of the database before and after the process event so that all sounds good how do we do0:31:06
this this is the overview of the architecture and this is the full production system you can actually0:31:16
because it's all defined in terms of protocols and whatnot you can fold this up and run it in a tiny in-memory thing but the logical model is this your application is a process it's going to0:31:27
incorporate a library which is going to empower it to be a peer that library is gonna have some communications bits obviously because it's got to talk to the service parts of the system it's0:31:38
gonna have a query engine I'll talk a little bit more about the live index and caching but that all goes in your application then there's this dedicated0:31:47
machine or it could be a set of machines acting in standby for each other called the transactor whose job is only to handle transactions and writes all right0:31:57
so we're gonna split reads and writes apart and we want a consistent world so we want transactions so essentially what a transactor is it's like a0:32:06
database server you took away their query well they can do internal queries but you took away their serving queries for others you took away their serving reads for others0:32:15
you took away their having to manage storage explicitly and what they were left with is handling transactions and some occasional background indexing and so0:32:25
that's all the transactor does it just coordinates transactions it accepts all the transaction requests serializes them and puts them in storage and then finally you have this storage service DynamoDB is0:32:37
the first one we support but it's an example of a storage service you can look at a lot of things you know there's the Riak guys out there I hope someday we can work on top of them they0:32:47
have a really good storage service right it's a service that runs as a cluster it's redundant it's distributed it has all those great properties of Dynamo the0:32:57
idea here is if you built it on a service model the interface to the service as we'll talk about is actually pretty small you can make independent choices again you're moving away from0:33:06
something monolithic you're gonna end up with more choice choice about location choice about scale choice about price but Dynamo is an example of a storage0:33:15
service and we push off all the actual sticking of bits on disks onto this service and the peer as you can see can directly0:33:24
read back from the service so the transactor will put stuff in storage the peers will read it directly from storage they don't read it back from the transactor the only interesting thing0:33:34
that the transactor also does is you'll see this line back from the transactor to the peers it also reflects the process it reflects the change back to any0:33:43
connected peers and we'll see how that works in a second so that's the overview of the architecture so the first problem we encounter in trying to represent this0:33:52
is how do we represent this immutable expanding value what's the data structure for an immutable expanding value it's tricky right the first thing0:34:04
is it has to be organized right I think a database you know why is the filesystem not a database or is it a database I know most people when they0:34:15
buy databases they're like oh now I can replace you know XFS with this database or vice versa usually the database adds some more value than give me whatever's stored at key0:34:26
or file name X it's organized it's organized in such a way that it can help you leverage the information you have one way to leverage the information you0:34:35
have is to support queries for instance so we want organized state and we want to be expanding so one representation of0:34:44
that is a sorted set of facts that meets this criteria right your email was this your email's now that because those have time we can just add more facts right you change0:34:55
emails you like new foods just add so we have the additive process because facts have timestamps they're going to be different from each other but they have to be sorted in order to0:35:05
support query what everybody has discovered in the past and BigTable is an example of a solution to this is that sorting live in storage is a bad idea0:35:18
it's too expensive has tremendous overhead for the actual write volume the coordination and everything else and it's like place oriented programming at its worst is0:35:28
actually maintaining a sorted index live on disk so BigTable the design is really simple it says you should have a bifurcated system you should accumulate0:35:37
novelty in memory and periodically merge it to disk and then you can keep a sorted version of things on disk and0:35:46
BigTable does this with big flat files and what they say is in order to see a sorted view of the world at any point in time you're gonna have to merge you can merge your memory view with your0:35:56
periodically created storage based view in order to get a coherent view of now so this merging is an efficient way to0:36:05
do this and BigTable is a great example and was an influential paper for this design so you want to accumulate novelty in memory you want to occasionally merge0:36:14
it into storage in a sorted manner the difference between what Datomic does and what BigTable does is Datomic uses trees unsurprisingly it uses them0:36:24
because what we want in memory is not some mutable thing we want an immutable thing we know the right way to represent that is with a tree similarly in order0:36:34
to get the caching and addressable characteristics you want from putting your data on storage that the peers can access you really0:36:43
don't want gigantic flat files right it's hard to say I've cached this portion of this flat file and it's hard to do multiple indexing jobs and have any portion of that caching you did0:36:53
still be valid because the files sort of slid around if instead you store a tree of nodes in storage then I could remember this node and if that node0:37:02
hasn't changed in the new indexing job I can still remember it so we use persistent trees in memory and durable0:37:12
persistent trees in storage and basically implement the same idea accumulate in memory periodically merge into storage so what goes in0:37:23
storage obviously novelty comes in that has to be recorded right away if you want to meet acid it has to be durable0:37:32
before the transaction returns so there's a log created of just the asserts and retracts it's again unlike traditional logs in0:37:41
that it's not one contiguous file again that whole idea of one contiguous file that's super oriented around spinning disks on one machine that's ending it's0:37:51
just over there's no reason to do that a better representation is a tree again for the same reasons caching locatability0:38:00
and things like that so we log directly into storage the other thing that's in storage are these index trees and they're covering in other0:38:10
words each datom is entity attribute value and transaction every index is a sorted view of all of that so it's not really an index like an index that points0:38:20
to something else a covering index has all the data you need inside it so it's basically multiple sorted sets of the data sorted in different ways as trees0:38:30
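As a rough Python illustration of "multiple sorted sets of the data sorted in different ways", the sketch below keeps the same four datoms in two orders. The names `eavt`, `aevt` and `scan`, the sample data, and the use of plain sorted lists in place of trees are all assumptions made for this example, not the system's actual layout.

```python
from bisect import bisect_left

# Each datom is (entity, attribute, value, tx); the sample data is made up.
datoms = [
    (1, "email", "fred@example.com", 100),
    (1, "name", "Fred", 100),
    (2, "name", "Sally", 101),
    (2, "likes", "pizza", 102),
]

# Two covering indexes: every datom appears whole in each one,
# only the sort order differs.
eavt = sorted(datoms)                                          # entity first
aevt = sorted((a, e, v, tx) for (e, a, v, tx) in datoms)       # attribute first

def scan(index, first):
    """Return the contiguous run of tuples whose leading component is `first`."""
    start = bisect_left(index, (first,))
    out = []
    for t in index[start:]:
        if t[0] != first:
            break
        out.append(t)
    return out

print(scan(eavt, 2))        # everything known about entity 2
print(scan(aevt, "name"))   # every datom for the attribute "name"
```

Because each index holds complete datoms, a range scan answers the question directly and never has to follow a pointer to some other copy of the data.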
because we want this to work with services you want to minimize the footprint of the API you use to talk to storage0:38:39
and so in in our case what we're fundamentally based around is a storage that essentially implements key value right we're not going to use the values to store you know pizza0:38:50
all right that's too tiny we're gonna use the values to store entire segments of our index trees so essentially we're using it as keys to blobs of index0:39:00
storage much the same way that a traditional database would store blocks of its b-tree on the file system as blocks we store blocks in the key value0:39:10
store but it ends up that fully implementing the state model requires something a little bit more than just get and set this key and value potentially inconsistently which is the0:39:20
way a lot of these key value stores work in order to emulate atoms which is part of the model we need consistent read optionally from these storage0:39:30
services we need to be able when required to ask for something and make sure we get a consistent value back and a lot of these systems can do that you can just you know by turning up the0:39:40
number of reads you issue on a request you can ensure that what you get back is consistent the trickier thing which is less readily available in key value0:39:49
stores right now is conditional put and this is used by the Datomic storage engine for the moral equivalent of pods which I'm so reluctant to say that word but0:39:59
effectively that's what Datomic has inside but in order to accumulate log you need something that allows you to build up the tail of what's going to become the new part of the tree if you0:40:08
look at the inside of Clojure's vectors that's how they work right they build up a tail and only when the tail is full do they pay the cost of merging it into the tree so you need something0:40:18
like that and in order to do that potentially safely when dealing with multiple transactors succeeding each other you need conditional put this is0:40:28
something that DynamoDB has it's something I'm looking forward to being present in more of the key value stores so that we can work on top of more of those so index stores look something0:40:38
like this you don't have to read the details but it's a tree right up at the top there's a root that has you know the datoms sorted by entity then attribute and value then time then the0:40:47
same datoms sorted by attribute then entity then value then time potentially a reverse index for any reference based attributes we also do0:40:56
Lucene which we keep in the same way and then there's a three level tree a root an intermediate set of directories and a0:41:05
set of data segments inside each data segment are sorted datoms and it's those segments and those tree segments that we put into storage entire blocks like that0:41:16
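The three-level layout can be sketched in Python as below. This is only an illustration under stated assumptions: a dict stands in for the key-value storage service, the key names (`root`, `dir-1`, `seg-a`, and so on), the two-field items, and the helpers `fetch`, `child_for` and `contains` are all hypothetical. It shows whole nodes stored as blobs under keys, with a lookup touching only the nodes on its path.

```python
from bisect import bisect_right

# A dict stands in for the storage service: key -> immutable blob (a tuple here).
storage = {
    "seg-a": (("email", 1), ("likes", 2)),
    "seg-b": (("name", 1), ("name", 2)),
    "seg-c": (("zip", 1), ("zip", 3)),
    # A directory or root lists (first item under child, child key), sorted.
    "dir-1": ((("email", 1), "seg-a"), (("name", 1), "seg-b")),
    "dir-2": ((("zip", 1), "seg-c"),),
    "root":  ((("email", 1), "dir-1"), (("zip", 1), "dir-2")),
}

fetched = []   # records which keys were actually read from storage
cache = {}     # blobs never change, so a cached copy never goes stale

def fetch(key):
    if key not in cache:
        fetched.append(key)
        cache[key] = storage[key]
    return cache[key]

def child_for(node_key, item):
    """Pick the child of node_key whose range would contain item."""
    entries = fetch(node_key)
    firsts = [first for first, _ in entries]
    i = max(bisect_right(firsts, item) - 1, 0)
    return entries[i][1]

def contains(item):
    """Walk root -> directory -> segment, fetching only nodes on the path."""
    segment = fetch(child_for(child_for("root", item), item))
    return item in segment

print(contains(("name", 2)), fetched)
```

Only three of the six stored blocks are read for this lookup, and since a block is never modified after it is written, whatever ends up in `cache` stays valid.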
that's the granularity of what gets stored a database value is essentially a pointer to a couple of things it's a pointer to the live tree in memory which0:41:26
represents everything that's happened since the index on disk was made so you have a persistent in-memory sorted set0:41:35
that you merge with the one off disk so you have a pointer to the memory one and a pointer to the disk one there's a lot of other subtleties in there with0:41:44
history and things like that but the fact is the disk-based one is lazy right obviously you're not going to pull the entire universe off storage into your0:41:53
process if you don't care about it so effectively you have a proxy for the storage engine sitting there saying if you ask me for something I don't have I'll go get it I'll pull it into cache0:42:03
and then manage your working set in cache and the cache is hierarchical I'll talk about that I think in a second well how do we do process itself well one of0:42:12
the things is asserts and retracts can't actually express everything right I can't increment with assert if it was 42 and I want to make it 43 I can't just I0:42:23
could do the calculation somewhere but I can't express the calculation by saying 43 so we have another notion of what0:42:32
constitutes part of the input to a transaction it's called a data function it's actually a function of the entire database inside the transaction plus any arguments and what it outputs is0:42:43
another transaction segment so a transaction can consist of assertions retractions and data function calls and data function calls in turn will be0:42:54
passed the database and can return assertions retractions and data functions and we expand and splice in the results until it's all asserts and retracts so it looks like this right0:43:05
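The expand-and-splice step can be sketched in Python roughly as follows. Everything named here is an assumption for illustration: the tuple shapes for operations, the `expand` helper, the `increment` and `birthday` data functions, and the dict used as a stand-in database value. The sketch also simplifies by handing every data function the same database value.

```python
def expand(tx, db, functions):
    """Expand data-function calls until only asserts and retracts remain."""
    out = []
    for op in tx:
        if op[0] == "call":
            _, name, *args = op
            # A data function receives the database value plus its arguments
            # and returns more transaction data, spliced in where the call was.
            out.extend(expand(functions[name](db, *args), db, functions))
        else:
            out.append(op)
    return out

def increment(db, entity, attr, amount):
    # Reads the current value, so the caller never has to say "43" itself.
    current = db[(entity, attr)]
    return [("retract", entity, attr, current),
            ("assert", entity, attr, current + amount)]

def birthday(db, entity):
    # A data function may itself return further data-function calls.
    return [("call", "increment", entity, "age", 1),
            ("assert", entity, "celebrated", True)]

db = {("sally", "age"): 42}
functions = {"increment": increment, "birthday": birthday}
tx = [("assert", "sally", "likes", "pizza"), ("call", "birthday", "sally")]

expanded = expand(tx, db, functions)
for op in expanded:
    print(op)
```

What comes out is a flat list of plain assertions and retractions, which is the only form that has to be recorded or sent to anyone else.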
you can have a transaction that was assert assert retract a call to the data function foo and retract assert and then foo could have expanded into0:43:14
calls to data functions bar and baz and bar and baz could then in turn expand eventually into assertions and retractions what does this look like macro0:43:25
expansion that's right this is the process version of macro expansion but it's really cool because you end up with these primitives you do end up with a primitive representation0:43:35
of process on which you can build something that allows you to do transformational updates we talked about0:43:45
the transactor right it has a couple of jobs it accepts transactions those transactions look like what I just showed you it does the job of expanding them it will apply them to the in-memory0:43:55
version of the database if nothing bad has happened and that's a successful transformation of the database at which point in time it will log it to storage0:44:04
and broadcast it back out to the peers saying this happened what gets broadcast are the final assertions and retractions so the peers don't have to do all the0:44:13
transformations again they just get the answers to them and periodically the same transactor will do indexing that's something we could move to another machine but right now the transactors0:44:23
do it an interesting thing about this system now again we've seen parallels I would hope constantly to the memory version is that indexing right is going to create at least a new root probably0:44:33
some new directory nodes and definitely a whole bunch of new data segments what about the old ones they're not getting updated right so where are they they're0:44:44
still sitting there they're now garbage so you now have garbage collection in storage it shouldn't be surprising you have analogous things happening in0:44:53
storage that you had happening in memory and there's no problem with that so there's garbage and there's storage based GC that cleans it up later when no one ever cares about0:45:03
that root anymore it can go away or on a time basis declarative programming we do by embedding Datalog a couple of things0:45:12
about doing Datalog well for this purpose especially for having a different basis it's typical of languages like Datalog and Prolog that the database is kind of ambient like0:45:21
you're just writing these things and it's as if the database the store is there but that's essentially like global which means you have no way to like talk about I want to issue this query on this data0:45:31
so we have to make sure that both the data sources for queries and the rule sets that are used in queries are arguments to queries they're0:45:40
not ambient and not global right when you ask a query of SQL do you get to say on what version of the data do you run no it's like whatever's happening right then your query runs right then in0:45:50
the middle of the server whatever is happening it's always now so if you want it to be other than now it has to be an argument so they're arguments we've extended Datalog so it can work with scalar values and collections in memory0:46:00
and things like that and you can extend it with your own code so now we don't have the over there problem now we have over here right we can directly access storage we have our own query0:46:09
engine we have this live index right when the transactor pushes process back out to us we update our own in memory index so just like the transactor0:46:19
is accumulating an in memory index and then eventually it merges it with the storage one a peer is accumulating the same in memory index and referencing the same0:46:29
one until the transactor says I've made a new stored one and you can drop what you've been accumulating in memory and move over to that as your stored one and we start from scratch0:46:38
here just build up a window of stuff merge it then drop it build up a new window of stuff merge it and then drop it that's how it works inside the peers0:46:48
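The build-up, merge, drop cycle can be sketched in Python like this. The `PeerIndex` class and its method names are hypothetical, sorted lists stand in for the persistent trees, and the sketch assumes a newly announced stored index already contains everything in the current window, which glosses over datoms that arrive while indexing is under way.

```python
import heapq

class PeerIndex:
    """Hypothetical peer-side view: a stored index plus in-memory novelty."""

    def __init__(self, stored):
        self.stored = tuple(stored)   # sorted, immutable snapshot from storage
        self.novelty = []             # sorted datoms seen since that snapshot

    def on_transaction(self, datoms):
        # New datoms were pushed to this peer: accumulate them in memory.
        self.novelty = sorted(self.novelty + list(datoms))

    def view(self):
        # The current database is the merge of stored index and novelty.
        return list(heapq.merge(self.stored, self.novelty))

    def on_new_index(self, stored):
        # A new stored index covers the accumulated window (assumption),
        # so adopt it and start a fresh, empty window.
        self.stored = tuple(stored)
        self.novelty = []

peer = PeerIndex([(1, "name", "Fred"), (2, "name", "Sally")])
peer.on_transaction([(1, "email", "fred@example.com")])
before = peer.view()

peer.on_new_index(before)   # the indexing job produced the merged result
after = peer.view()
print(before == after, peer.novelty)
```

Readers see the same sorted view before and after the switch; only where the data lives changes, from the in-memory window to the stored index.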
there's a two tier cache right you eventually end up with a ton of datoms and you can really overwhelm your Java heap by having like a gazillion objects as0:46:58
your cache so we instead have a two tier cache where stuff we've already pulled from storage we actually keep in the lowest tier of the cache as compressed0:47:07
index segments that look to the Java heap as if they were just byte arrays and only in a higher level of the cache0:47:16
do we turn them into you know thousands of objects so there's a level that's thousands of objects or millions of objects and a level which is compressed index not yet expanded guys0:47:26
it's faster to just expand them than to go across the network to recover them so you have these two tiers one's on heap and one is potentially off but at least0:47:35
is not high pressure so we're now starting to get the consistency and scale stuff we were looking for the writes go through the transactor it has a traditional model of scalability right0:47:45
you can scale as big as one machine can scale but that server is no longer burdened with any query load or read load it's not indexing live on the same0:47:57
cores that it's using to serve transactions and it has a traditional availability model you run hot standbys and they're easy to run because their0:48:06
reference data is not like a live connection from the other server it's the storage so it's very easy to transition from one to the other and you don't have these complicated shoot the other0:48:17
guy in the head relationships between streaming servers the immutability is all we need to support consistent reads so we get consistent read but we also0:48:26
get scalable consistent read because we can now use these highly scalable redundant storage services which is really a great combination that's sort of what Datomic is about is0:48:36
getting this combination of features if you don't need transactions it may not be the right thing but if you do it's an interesting combination and query scales with peers right your0:48:47
computational load is now in a peer if you want to run a long-running query and you have your own box okay that computation is not interfering with other computations so having a set of0:48:57
peers that just do analytics is not really in anybody else's way which is part of the objective in terms of the flexibility side what do we have for0:49:07
schema well we boil it down we said the whole we're going to store these little facts and in fact that's the end of the representational structure there's no other structural things0:49:16
there's datoms that's what we store so the only thing you do have to encode are your attributes and it's important that you do this right there are systems that say just stick in whatever you want just0:49:25
put any text we'll just take all your text and they redundantly store all the text and you know they take all your typos they don't understand the types of things it's not really that great as a database it's0:49:35
worth spending this much effort to say you know what the name attribute is a string and the time of day attribute is a date and the other thing is a time and so you have to tell us the name0:49:45
I didn't even put type on here the type and then cardinality cardinality is a really interesting problem right think about your typical database how much0:49:54
different is it to say fred has a new email and fred has a new friend it's different right0:50:03
why is it different I don't know I don't think it's a good idea that it should be different I think it should be the same so that's what we support you tell us0:50:12
what the cardinality is and if you tell us email's cardinality one when you say fred has a new email that's the email we're gonna return when you ask if you say fred has a new friend that friend will be included in the set0:50:23
of friends we return when you ask and that's how I think it should work think about a relational database what happens with those two things well email may go right in a record but friend definitely has to go in a different0:50:32
table right even in a document store same kind of problem right email is like directly in the attribute a friend is in what I don't know if it's JSON it's in a0:50:42
list right which means that making that change to add a new friend means changing that list that can become precarious and that's why things like0:50:51
Redis have smarter primitives for that that's ugly if you don't do that better another notion of an attribute is whether or not it's a component right0:51:00
your arm if it was a separate entity is a component of you your grandmother is not so both are0:51:09
entities possibly depending on what you're doing but one is definitely part of you if you were to go away your arm would go away too and the other is a relationship so we want to know that we deal with uniqueness we also have a way0:51:18
to talk about things by name which matters in fact if it gives you multi maps so for time we talked about this a0:51:28
bunch of times but the database ends up being a value you can issue queries across multiple databases or a database in the past and database from now but the critical thing is you can take a database and say given this database I'd0:51:37
like to see this database as of two weeks ago once you do that you have another database value you can place into a query and get answers as of two0:51:46
weeks ago similarly you can do since a point in time the other thing we can do is we can do as if right you can ask for this database with this transaction0:51:55
applied not going through the transactor it's just a value transformation what would this database look like if I issued this transaction on it lets you do what if stuff without interacting with0:52:05
the rest of the universe or interfering with it which is really a big deal and there's also an efficiency thing here in how we deal with time in that we move the past any retracted past into0:52:14
a separate index so it's not in your way finally for perception reaction this is actually straightforward now right we saw there's already a live feed of process from the transactor to the0:52:25
peers you can basically just tap into that in your program and say I'd like a queue of those process events and then you can take them what you're passed0:52:34
actually is the process event the database before it and the database afterwards and you can then issue any queries you want in order to you know trigger any activity you find0:52:43
interesting so to summarize I think the state time identity model should be familiar to0:52:52
this audience it really I didn't set out to make it the same I want to make sure that I just answered these questions it0:53:02
ended up being the same which i think is great but but obviously state time identity is not enough to solve the database problem you0:53:11
need this process and so that's really what you add you add a reification of process an ability to manipulate it an ability to get values of databases from a technical standpoint0:53:20
that dynamic merge thing that BigTable does it's essential you cannot store persistent data structures on disk live you can try it I mean you can0:53:31
technically do it it will always perform terribly and use up a ton of space so it's a good idea that we copied the other thing is just again how many times0:53:42
did I say immutability in this talk immutability rocks one of the things that's very interesting about it is you cannot represent change without it you0:53:51
can't correctly represent change without immutability it's a profound idea it's not my idea I think it's just some essential characteristic of the universe0:54:02
but it really needs to be recognized in our architectures and so if I had any recommendation to you at all it's just if you think about designing a system and you're not sure you can answer all these0:54:11
questions in the forward direction choose immutability you could almost back into a little bit more than 50% of this design just by having taken0:54:21
immutability as a constraint and saying oh my god now what am I gonna do I'm not allowed to change this I better do this it'll keep forcing you into good answers so if I had any sort of architectural0:54:30
guidance from this it's just do it choose immutability and see where it takes you so that's it [Applause]0:00:00
The Functional Database - Rich Hickey
0:00:00
thanks for coming this talk is about the functional database mic it's there0:00:11
we go it's green this talk is not specifically about Datomic but0:00:20
Datomic is the example I'm going to use you could substitute any functional database you like so the first question I have is how many people here do0:00:29
functional programming how many people do it in a functional language and how many people do it in a non functional language all right how many people are0:00:40
skeptical of functional programming it's okay all right so why do we do it do we0:00:50
do it to stop us from accidentally doing IO during computation how many people have written you know some math library and accidentally talked to the internet0:00:59
while they were doing it and they need monads to keep them from doing that nobody that's right that's not why we do functional programming I don't think one0:01:13
way to think about this, and think about functional programming and what our programs do, is to try to distinguish the parts of our programs that are0:01:22
computationally oriented from the parts of our programs that are machine oriented; they're like little machines. And I think that when you think about0:01:33
languages and you think about programming, you can think about languages being on a spectrum. There's assembly language, which is sort of like talking directly to the machine, and it feels very much like "make0:01:43
the machine do stuff." And then there's C, which is also still very squarely in the category of "make the machine do this, then make the machine do that, then access this part of the machine's memory0:01:53
and do whatever." And some of those programming languages are like that because the people using them are building things that are substantially like machines, like operating system0:02:02
kernels and things like that. But that's really hard to do, and it's extremely hard to do well. And so when we consider higher-level languages, we0:02:12
measure them, sort of, by the extent to which they keep us from having to think about things in terms of "make the computer do X." Because just because0:02:21
the computer is a machine, and it is one, it doesn't mean that our programs should consist of machines, where a machine is something stateful that goes from0:02:30
this to that: it's doing this, and then it's doing that, and then doing another thing. And there's a sense in which objects are like little machines, and I mean that in the most pejorative way0:02:40
possible. That's what they are. And it ends up that that's really difficult to think about,0:02:49
especially when you think about combining different machines, things that are moving around. It's very difficult to reason about. So the whole benefit package, I think the0:02:59
immediate benefit package, from functional programming is that it allows us to choose something other than machines as the model we use in our head for thinking about what's happening in our program. It gives us an alternative to0:03:10
building computations directly out of machines. And the recipe for functional programming is basically combining values and functions. I'm not going to dive off into a0:03:21
talk about values except to say that when I say value, I do mean something immutable. How many people were in Eric's talk earlier? Right. So immutable means not in the IO monad; it's just another0:03:34
way to say the same stuff. It's something that can't change. And everything that he said, and all the premises and all the math that he showed you, doesn't work if0:03:43
any of the arguments to any of the functions are mutable in the way that we consider it to be true. So it ends up that immutability is quite important. But when you combine immutable values and0:03:52
functions, which take a value or more than one value and produce another value, and always produce the same value as a result given the same arguments, you end up with something0:04:01
that's more like math, and that's very much easier to reason about than reasoning about machines that are moving around in different states. There's a sense in which mathematics0:04:10
doesn't have any notion of time or state. That doesn't mean that there aren't machines in our programs. At the bottom of even the coolest functional0:04:20
programming constructs are little machines. You can't implement laziness in a functional language without a little machine that does caching for you. But that doesn't mean that's the0:04:30
abstraction, or that's the level at which we want to be working most of the time, because we don't; it's far too complex. We all like doing that, right? Everybody who's implemented a really cool data0:04:40
structure or cache or queue or something like that has written a little machine. But we want to isolate that work, and the amount of time we spend0:04:49
doing that kind of work, to the corners of our applications where that's necessary, which is very, very small corners. So it's more like mathematics,0:04:58
and the bottom line is it facilitates reasoning. I'm not talking about any kind of formalism here, just thinking. Let's just think about our programs; the easier our programs are to think about, the better0:05:08
our chances are of getting them right. So if we're going to have something we're gonna call a functional database, I think it has to meet a couple of requirements. The first thing is it has0:05:17
to provide the database as a value. I'm gonna go even further than immutability in a minute and talk about persistent data structures, but it really needs to provide the database as a0:05:26
persistent data structure, something that's immutable, because only then can you get the second part of the recipe for functional programming, which is that you0:05:35
should be able to write functions that take a database or databases as arguments and maybe even return a database as a return value. If you can do0:05:46
these things, you have a functional database. In addition, if it's going to be called a database, it has to be able to do the other database stuff. It has to0:05:55
be durable, and I'm going to distinguish that from persistent. Persistence is a characteristic of data structures, immutable data structures in memory, and durability is, you know, it's0:06:04
been put somewhere so that if every machine was turned off and we turned them all back on, we could find it again. Consistency is another thing we seek in0:06:13
databases but don't always get. Leverage is an important property of a database. A database isn't just a bag into which you've dumped stuff, where every time you want to do something you have to go0:06:22
through everything in the bag. If you're doing that, then you're not getting any leverage; it's just a pile of bits. So databases have to give us some leverage. And the other traditional characteristic of a database, although0:06:32
it's not always present, something we desire most often from databases, is that they're shared. And this is where it gets tricky, because we have lots of constructs for dealing with immutability and state0:06:42
management in memory, in process, in a particular programming language. But the nasty thing about a database is that it's outside of all that, and there's more than one process, possibly in more than0:06:52
one language, and definitely with more than one idea of what's going on, sharing this thing. So far, databases have been treated like giant shared global variables, since0:07:02
all the values we seek in choosing functional programming are not present for databases normally. I'm gonna make the claim that a functional0:07:11
database value, the value that you get when you look at a functional database, should be an accretive thing. That means that it should be something0:07:20
that grows as more information enters your system, that accumulates everything that's happened. I've given other talks about why that is essential for an information0:07:30
model. I also think it's sort of an essential implementation detail of doing this right, because your alternative is something like snapshots, and then you have the whole question of how do I find0:07:39
a snapshot corresponding to a particular point in time, and can I ever do anything that crosses time? So I'm going to be showing you some things today that are benefits of being accretive, in0:07:49
addition to being benefits of being functional. I think they go together. In any case, there's no doubt that if you're going to have a value of a database, it has to be immutable, or you can't0:07:58
do any functional things with it. So far I've been saying the word database over and over again, and I think there are two notions of a database that we normally don't disentangle, which we now0:08:09
have to start doing. If you've seen my talks about objects, you know this is the same kind of thing: we need to separate out the identity of an object from the state it takes on at any point0:08:19
in time. When we do that, we can start writing functions of this state, and we don't get into trouble trying to manipulate a machine directly. So this is the notion of a database system:0:08:29
there's something that's going to facilitate multiple processes interacting with some shared data set and being able to get values out of it0:08:38
and grow it by putting more information into it. And that is actually going to be a machine; that's going to be some process that works a lot like a machine. There's a conveyor belt, stuff gets taken into it, it bakes some new cakes, it0:08:49
gives out new cakes to everybody else. And that's where the identity part of the system is going to live. There's some database that's called our customer database, and we know the idea0:08:59
of our customer database persists through time, but the values our customer database takes on change over time. We get new states of the database; usually there are more and more customers0:09:09
in it. It could be the case that when we lose a customer we erase them, but more and more we're finding that businesses value information models; they don't want0:09:18
to forget things that they knew, because there's a lot of value in doing time-based analysis of what happened. Does this person change their email address all the time, or move all the0:09:27
time? Well, if every time they give you a new email address or a new address you erase their old one, you'll have no idea. Does your supplier change their prices all the time? I don't know; I just take0:09:36
the price and I overwrite the old price. You can't learn. As people, we use time-based decision-making all the time, yet we build systems that are supposed0:09:45
to be information systems that forget what happened. So we want to accumulate stuff inside this thing, and there's going to be something that coordinates that. And then this machine is going to0:09:55
deliver to us, when we look to perceive the database, values, and these database values are the things we're going to use to do computation. Disentangling this is0:10:05
sort of the essential job of building a database that's going to be more functional. This talk is primarily going to be about this second part, but let's0:10:14
look at what this looks like. So this is the traditional thing, where I'm going to say the entirety of a database is just one thing, just that first thing I talked about: the database as a machine. This is0:10:23
the kind of database you're usually dealing with. There's something that gets novelty into it, maybe people are doing transactions against the database, and then if you want to do some sort of computation, you know, issue0:10:34
a query or something like that, you're going to make up a request, which is something you want to have run, and you're gonna have to pass it in to the database process, and the database0:10:44
process is going to execute that in this black-box context inside the middle of this moving machine and give you back some result. I think it's really critical0:10:54
to think about some of the issues associated with that. For instance, what's allowed in this function? What are you allowed to do? Well, you're allowed to do whatever the0:11:04
database system says you can do, in whatever language they support. Maybe it supports SQL, maybe it supports JavaScript, maybe it supports nothing, and0:11:13
what you can do is HTTP GETs or something like that. The other question is, can you get reproducible results? If I send the same request again later, will I get the same0:11:22
answer? Generally, in a system that has the database as a process and doesn't disentangle the state from the identity, the answer is no: you don't get the same0:11:31
answers over and over again. And the other tricky thing about the machine being the thing into which you send the computation is, how do you compose that? How do you make queries0:11:41
that manipulate more than one database at a time? Because you're sending the request into a particular set of data, how do you make0:11:50
that request deal with this set of data? You make all these linking things or some nasty thing, or you have to copy the stuff out into another place where you federate it, or make some external thing to do that job.0:12:01
So we'll contrast that with a functional database process, where obviously there's this same little thing: it's accreting novelty. Different processes0:12:11
have said, "Oh, I learned about a new customer; here's the new information." And then, instead of doing anything else, the only other thing we get out of this0:12:20
is the ability to say, "Let me see your state; let me see what the value of the database is." And so this machine will dispense values on demand: here you go,0:12:29
here's the value of the database. So the question here is, well, where's the computation? I showed you more stuff on the previous slide; where is it here? And the answer is, it should not be0:12:40
here. This is a machine; this is a nasty moving thing. You play with machines, what happens? You get hurt. You stick your hand in the machine; it's0:12:49
bad. Composing machines is just difficult. So we want to separate computation from process. There is process; there is some state here;0:12:58
we're gonna move on, we're gonna put stuff on disks or in some sort of storage. But it doesn't mean we have to commingle that with computation. So we're gonna have functional database0:13:07
computation that is completely orthogonal to functional database processing. Once we've got the values, we can write functions.0:13:16
We can write a function, and the argument of the function is not some ambiguous who-knows-what's-inside-the-database-while-it's-running; it's a clear thing. There's a database value, we're0:13:25
gonna pass it to a function, and we're gonna get a result out. The cool thing about having separated this is that there's no problem having a function that manipulates more than one database0:13:35
value; that's just a function of two databases. And there's no problem writing a function whose result is another database value. So we have a function of two databases that0:13:45
produces a database. This is not a problem, because computation has been split out, presuming we can get something that looks like a value of the database.0:13:54
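As a rough illustration of "a function of two databases that produces a database", here is a minimal sketch in Python rather than Clojure. It is not Datomic's API. It assumes a database value can be modeled as an immutable set of (entity, attribute, value) facts; the names `Db`, `emails`, and `combine`, and the example addresses, are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

# Assumption: a database value is an immutable set of (entity, attribute, value) facts.
Fact = Tuple[str, str, str]
Db = FrozenSet[Fact]

def emails(db: Db) -> Set[Tuple[str, str]]:
    """A query is just a function of a database value."""
    return {(e, v) for (e, a, v) in db if a == "email"}

def combine(db_a: Db, db_b: Db) -> Db:
    """A function of two database values that returns a third value."""
    return db_a | db_b

customers: Db = frozenset({("c1", "email", "fred@example.com")})
suppliers: Db = frozenset({("s1", "email", "acme@example.com")})

# Neither input is modified; `everyone` is a new value.
everyone = combine(customers, suppliers)
print(sorted(emails(everyone)))
```

Because the inputs are plain immutable values, no connection or server is involved in either `emails` or `combine`, and the same arguments always give the same result.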
I've given other talks about how to do that architecturally, so for the purposes of this discussion you're gonna have to presume that that's possible, because it is. So there are a bunch of0:14:06
value propositions of values that we'd like to get from having done this. What's the point? It's not just to be able to say, "Oh, I have a functional0:14:15
database, where's my t-shirt?" It's about getting some benefit, and we're looking for a lot of the benefits we get from using values in our0:14:25
programs. The first thing is that a database is just data, and data is language independent. We're not talking about objects with methods0:14:34
in a particular language with an interface or whatever; we're talking about information. Information doesn't have a programming language. What's the programming language of information? There's no such thing; it doesn't even0:14:43
make sense. It's just information. So it's just data, and that's language independent. The other thing about data is that it composes and aggregates to data:0:14:52
when you combine two pieces of data you get another piece of data; when you combine two machines you get trouble. So that's a value proposition0:15:02
we're looking to get. There are other value propositions you get from using persistent data structures when you use a functional programming language. A persistent data structure is a way to0:15:12
implement an immutable data structure, something big like a collection, such that you can give that collection to0:15:21
anybody and it will never change, but if somebody wants to make a modified view of that collection, they can, and it's inexpensive to do so; in particular, it's less expensive than copying the whole thing over. That's0:15:32
the benefit proposition of persistent data structures. That's what you get from functional programming languages; most of their data structures are persistent data structures. And so what we want is for the0:15:41
database to behave as if it was a persistent data structure. So in spite of the fact that, in the implementation I'm talking about, there is a persistent data0:15:51
structure in memory plus another persistent data structure in storage, and they get merged dynamically, I can treat that entire thing as a persistent data structure. And if I want to make a small0:16:01
incremental change that's local to me, I can do that and not impact anybody else, because it's a persistent data structure and a value, and my changes aren't really changes; they're local trees0:16:13
with shared structure with the rest of the stuff. So some of the benefits we get from using persistent data structures are freedom from worrying when we alias.0:16:22
If you pass around copies of machines to people, or pointers to machines to people, you have this immediate problem, which is coordination of activity with0:16:31
that thing, because it's moving around. People are issuing requests to it that cause it to move in different ways, those requests can collide, and it's almost impossible to obtain the state from the0:16:40
thing. So you worry a lot, when you don't have immutable data structures, about sharing them with anyone. It's like, you know, all the defensive copying, right?0:16:49
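For readers who have not met structural sharing, the following is a small sketch in Python of why a persistent structure removes the need for defensive copying. It uses the simplest persistent structure there is, an immutable linked list built from tuples; real persistent collections (and the database described here) use trees, so treat this only as an analogy. The names `PList`, `cons`, and `items` are hypothetical.

```python
from typing import Optional, Tuple

# A persistent singly linked list: None is empty, otherwise (head, tail).
# Tuples are immutable, so no holder of a list can change it for anyone else.
PList = Optional[Tuple[str, "PList"]]

def cons(head: str, tail: PList) -> PList:
    """Return a new list with `head` in front; `tail` is shared, not copied."""
    return (head, tail)

def items(lst: PList) -> list:
    """Walk the list into a plain Python list for display."""
    out = []
    while lst is not None:
        out.append(lst[0])
        lst = lst[1]
    return out

base = cons("b", cons("c", None))

# Two parties each make their own "modified view" of the same base value.
mine = cons("a", base)
yours = cons("z", base)

print(items(base), items(mine), items(yours))
# Both views point at the very same `base` object: shared structure, no copy.
print(mine[1] is base and yours[1] is base)
```

Handing `base` to anyone is safe without copying it first, since nothing they do can alter what the other holders see.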
What a horrible term that is. Why do we do that? So when we use persistent data structures, we're free of that worry. Aliasing is cheap and0:16:58
completely free of worry because it is immutable; you can't interfere with each other. And we also want to get these benefits of efficient incremental change. So the0:17:12
other thing we get from functional programming is, you know, we revisit that old Perlis quote, which is that it's better to have one data structure and a hundred functions that manipulate it than it is0:17:22
to have ten data structures with ten functions each. And that's super true. I think people are learning it in very tiny increments, but I believe that quite0:17:33
strongly. And so another benefit of having inverted this thing: instead of saying, "Here's my computation, it must comply with your rules, go, let me do this, please,"0:17:42
you turn it inside out. You say, "Give me your state." Then what can you do with that? You can do anything you want. Datomic happens to support Datalog queries, but you could write other query0:17:52
languages that manipulate the same data. You can directly access the indexes and write kind of ordinary functional mapping code0:18:03
across the data. You can get a view of the data, if it happens to be hierarchical, as if it was a set of entities, by manipulating data and by0:18:12
walking through data structures, not by having to create and fabricate objects or do other kinds of ORM silliness. There is in fact information there; it is0:18:21
hierarchical; there are connections between the things, and you should be able to just navigate all that as data without any kind of additional stuff. So this is another expectation we get from0:18:30
having separated values and functions. We have a value; we can write a whole bunch of functions. Write a new function every day if you want that can manipulate this value, because the value0:18:39
is not opaque. It doesn't incorporate code. It's basically exposing itself, and it does so because it's immutable. So what0:18:51
can you possibly do in your function? You can't hurt it. So have at it; write as many functions as you like. That's the kind of approach you want to have. It should be an open thing. You have this information; why0:19:01
should you be limited in how you consume it? Another really interesting property of persistent data structures and functional programming is that speculation is done completely0:19:11
differently than it is in an imperative context. If you want to have a what-if scenario and you have a whole bunch of data structures in an imperative, object-oriented0:19:20
program, and you wanted to say, what would it be like if I moved everyone from here to there, or if I gave everybody a discount and then they told their mother about it, then what would happen? You0:19:30
just have to copy everything and go into this alternative universe. If that data happens to be in a database, there's almost no way to do it; you can't put your speculative data0:19:39
in the database. Maybe you can try to abuse temp tables or something, but that's really not the same. But when we do functional programming, or try to write things that do speculative0:19:49
tree walking, we can keep incrementally enhancing our data structure, and if we get to a point in the tree where we say this is a dead0:19:58
end, what does it take to backtrack? Well, you just drop that value. You don't have to undo it; you don't have to refix your state or reset everything that you set along the path. You just drop it.0:20:09
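The backtracking point can be seen in a few lines of Python. This is only a toy sketch: a depth-first search that picks numbers summing to a target, where each branch extends an immutable tuple. The function name `search` and the problem itself are made up for illustration, and the sketch assumes the choices are positive integers.

```python
from typing import Optional, Tuple

def search(choices: Tuple[int, ...], target: int,
           path: Tuple[int, ...] = (), total: int = 0) -> Optional[Tuple[int, ...]]:
    """Depth-first search for a selection of `choices` that sums to `target`.

    Assumes positive integers. `path` is an immutable tuple, so a failed
    branch leaves nothing behind that needs to be undone.
    """
    if total == target:
        return path
    if total > target or not choices:
        return None  # dead end: the speculative value is simply dropped
    head, rest = choices[0], choices[1:]
    # Branch 1: speculatively take `head`, producing a new path value.
    found = search(rest, target, path + (head,), total + head)
    if found is not None:
        return found
    # Branch 2: skip `head`. `path` is exactly what it was before branch 1.
    return search(rest, target, path, total)

result = search((5, 3, 4, 2), 9)
print(result)
```

There is no "undo" step anywhere: when branch 1 fails, the caller still holds the original `path` and carries on with it.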
So speculation ends up being a really important thing in functional programs, and when you can do speculative work with the database, that completely changes your life. So I'm going to show0:20:19
you in a minute a capability of Datomic, which is something called with, which takes a database value and some transaction data that you would normally0:20:28
send in a transaction and gives you a new database value, but you never did it. It's just a function of a database value and transaction data, and it gives0:20:38
you a new database value, but you don't have to send it over the connection and have everybody see it. When you have that, it means you can do very, very interesting things. The simplest thing0:20:48
you can do is try before you buy. I want to do this transaction. Anybody ever seen somebody issue a transaction and then wish they hadn't? Yeah, everyone's done that.0:20:59
Well, imagine if you could have tried that transaction first and seen how it worked out, in fact seen how your reports looked and seen everything about your system as if that had0:21:09
happened, but you didn't have to do it in a way that anybody else could see. It just really changes things. And then we've had people using the system to do0:21:18
that kind of tree propagation work, where they had a very sophisticated change they wanted to make that's hierarchical: there's some parent, and you have to calculate something over the0:21:28
tree and calculate a new value of the parent, and then recurse into each child, and they need to see the decision that was made by the parent; they sort of need to see a new view of the world. And0:21:38
you can actually flow the database through and have each branch incrementally enhance it and then issue queries against that what-if database to make their decisions about what to do,0:21:48
and collect all the changes they intend to make back up when you return, and say, A, hey, that all worked, and B, here's0:21:57
a set of things I need to transact against the database to bring it to that new state. Or you can discover something about it didn't work and make a different decision.0:22:06
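A minimal Python sketch of the "try before you buy" idea follows. It is not Datomic's `with`; it only mimics the shape described above: a pure function from a database value plus transaction data to a new database value, with nothing sent over a connection. The names `with_tx` and `emails_unique`, the fact layout, and the example data are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumption: a database value is an immutable set of (entity, attribute, value) facts.
Fact = Tuple[str, str, str]
Db = FrozenSet[Fact]

def with_tx(db: Db, tx_data: Iterable[Fact]) -> Db:
    """Return the database value that would result from applying tx_data.

    Pure function: no connection is involved and nobody else sees the result.
    """
    return db | frozenset(tx_data)

def emails_unique(db: Db) -> bool:
    """An example check to run against the speculative value."""
    seen = [v for (_, a, v) in db if a == "email"]
    return len(seen) == len(set(seen))

current: Db = frozenset({("e1", "email", "fred@example.com")})
proposed = [("e2", "email", "fred@example.com")]  # would duplicate an email

what_if = with_tx(current, proposed)
ok_to_commit = emails_unique(what_if)
print(ok_to_commit)
```

If the check fails, the speculative value is simply discarded; `current` was never touched, so there is nothing to roll back.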
Finally, again, this is not strictly a property of functional databases, but if you believe me that they must be accretive, that is, they must accumulate0:22:15
new information and not ever get rid of information, then you get a whole other bunch of great things that you're used0:22:24
to having. Again, how many people use Git? How many people use a directory on their file server for source code and just0:22:34
overwrite stuff? I mean, who does that now? Yes, we don't do it anymore,0:22:43
because we see so much of a value proposition in keeping what had happened before. So if you have an accretive value of a database, that means you have all0:22:52
the history in any particular value of the database, which means you can go and pretend it was an earlier time, or you can see something that's happened in a particular time range, or you can0:23:02
actually do queries that cross time, like how many times has somebody moved, or how many times have they changed their email address, or what's the frequency of change of our supplier's pricing to us,0:23:12
and all those kinds of things that you need to actually think about across time. If you only had the current value of everything, you really0:23:21
couldn't be very good at making decisions. In all other cases we don't accept that, but with databases we do for some reason. So we want to be able to do this.0:23:30
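To illustrate what an accretive value makes possible, here is a small Python sketch. It assumes each fact carries the transaction number at which it was asserted and that facts are only ever added. The names `as_of`, `current_value`, and `times_changed`, the four-element fact layout, and the sample addresses are hypothetical and are not taken from Datomic.

```python
from typing import Optional, Tuple

# Assumption: each fact records (entity, attribute, value, tx), and facts are never removed.
Datom = Tuple[str, str, str, int]
Db = Tuple[Datom, ...]

def as_of(db: Db, t: int) -> Db:
    """Pretend it is an earlier time: keep only facts asserted at or before t."""
    return tuple(d for d in db if d[3] <= t)

def current_value(db: Db, entity: str, attr: str) -> Optional[str]:
    """The most recently asserted value for entity/attr in this database value."""
    matches = [d for d in db if d[0] == entity and d[1] == attr]
    return max(matches, key=lambda d: d[3])[2] if matches else None

def times_changed(db: Db, entity: str, attr: str) -> int:
    """A query that crosses time: how often was this attribute re-asserted?"""
    n = sum(1 for d in db if d[0] == entity and d[1] == attr)
    return max(n - 1, 0)

db: Db = (
    ("fred", "email", "fred@a.example", 1),
    ("fred", "email", "fred@b.example", 2),
    ("fred", "email", "fred@c.example", 3),
)

print(current_value(db, "fred", "email"))
print(current_value(as_of(db, 2), "fred", "email"))
print(times_changed(db, "fred", "email"))
```

An overwrite-in-place store could answer only the first of these three questions; the other two depend on the earlier facts still being there.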
Finally, everybody's always very interested in what the testing story is. Well, it's phenomenally good. How many of you like testing database code? The React0:23:44
guy likes testing database code. That's okay, I understand that, it's fine. So how many people like code that flows connections around everywhere?0:23:54
Really? Right, because what's a connection? A connection is sort of like a pointer to a machine. And then how do you know what happens? I call this code, I flow a connection through,0:24:03
and it passes through, and we all know what the solution to this is, right? The solution is ambient connection pools, so we don't have to flow it. Anybody who wants to, at any point in time in our program, can immediately say, "Give me a0:24:13
connection, or whatever, and let me go issue a query; I'll get whatever is happening." We know that's brutal, because we have no idea what's happening. If0:24:22
five different points in the code go and make an independent call over that connection as subcomponents of a bigger computation, we have no idea0:24:31
that they all got things that make sense together. There's no shared basis between that code; it's all independent. So it is like this big variable that we're passing around. So now we just0:24:41
completely switch from that: we grab the value of the database once and pass it through our program, and then we know absolutely everything underneath is using the same0:24:52
value, and it means that we can write tests that are reproducible. Another phenomenal characteristic of values is that they're easy to fabricate. So if you0:25:03
have the database as a value, it means that mocking a database isn't making a connection that says yes to everything or pretends to do stuff, or having0:25:13
to reset a database from blank into some particular state. It means that you can just fabricate values in test code and send them to code that would otherwise expect the database, and that code will0:25:23
work. And the test values are things you can generate, to make sure they exhaustively test ranges and things like that. So my intent with this0:25:32
talk was not to talk and talk and talk, which is what I usually do, about abstract things, but to give you a sense of what it's actually0:25:42
like to use something like this and touch it. So I'm going to, at tremendous risk to myself and especially to you, try0:25:52
to do something live. So that's not big enough, I'm presuming. Is it okay? It's better if I can show you a little bit0:26:01
more; that way we won't be scrolling around too badly. So this is a brand new REPL I just called up, and this is Clojure code. If you don't know Clojure,0:26:10
don't worry about it; it just means exactly what it says, which is quite convenient, and there's no extra stuff. That's all0:26:19
you have to know. So we're going to load up some support code just to be able to talk to the database, and I should have put a note in0:26:29
here, which is: if I don't change this, you should all yell at me right now. But here I am changing it, so there's no need for yelling. So we're going to create a name0:26:38
for a database. There's actually a transactor running on this machine, and we're going to create a name, which is a URI, for the database. So we make one, and we're0:26:49
not going to make one, because I don't know if I have this... I had to restart this. Hang on one second. There we go. You see, it's peril at every point. Every time I sleep my0:27:02
machine I lose my connections. So there we go. And so what's happening0:27:14
here, oops, let's make this big again, what's happening here, just so people know, is this is the way you program in Clojure,0:27:23
which is you leave your source code in one file and there's a command I can hit on the keyboard, it's probably the only thing I know how to do in Emacs, that says evaluate the thing my cursor is on.0:27:34
So that's what I'm doing. So if you see, by my cursor, there appears a true, that means we just created this database. Okay, and then we're going to get a connection to the0:27:43
database. So this is that critical thing, this connection: this is that machine, this is the machine side. When you're using databases traditionally, you're used to using the connection all the0:27:52
time. One of the things I want you to note in this part of the talk is how infrequently we do that. So we now have a connection to the database,0:28:03
and this means that anything we do via that connection is stuff that everybody else can see and share. This is not really a tutorial, but there's a minimal0:28:12
schema required for Datomic: you basically define what attributes you're going to be able to put on entities. So we're just going to define one for this. This says that0:28:22
there is something which is an attribute, which we're defining right here, whose name is email, its value type is string, you can have one of them, it0:28:31
should be unique across the database, and the last thing we're going to assert here is that it should become an attribute in the database. This value, this thing I'm0:28:41
pointing at here, is a representation of a Clojure map, so it's just key value, key value, key value. You can insert any Java map the same way, so0:28:50
you know, have at it in your own programming language. And then we're going to actually transact across the connection, just sending that data. That's what we do, we send data, and0:29:01
remember, that was that novelty. So there's some novelty; we're going to tell you about the email attribute. We'll do that right here, and that gives us back this thing. And def just says: call this0:29:10
name that value. So now we have schema ret, which is the return value from having transacted that schema data into the database, and that returns a map.0:29:21
And so we can look at the keys of that map, and we see some really interesting stuff right away. Transacting against the database returns the database before the transaction as a value, the database0:29:32
after the transaction as a value, any data created by the transaction, and a resolution of any temporary IDs that you created. That's pretty cool.0:29:44
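The shape of that return value can be sketched in Python. This is a toy model, not Datomic: the `Connection` class, its `transact` and `db` methods, and the dictionary keys `db_before`, `db_after`, `tx_data`, and `tempids` are hypothetical stand-ins for the four things listed above, and the ID allocation scheme is invented for the example.

```python
import itertools
from typing import Dict, FrozenSet, Iterable, Tuple

# Assumption: a database value is an immutable set of (entity_id, attribute, value) facts.
Fact = Tuple[int, str, str]
Db = FrozenSet[Fact]

class Connection:
    """The 'machine' side: the only mutable thing in this sketch."""

    def __init__(self) -> None:
        self._db: Db = frozenset()
        self._ids = itertools.count(1)  # made-up ID allocation

    def db(self) -> Db:
        """Hand out the current database value."""
        return self._db

    def transact(self, tx_data: Iterable[Tuple[str, str, str]]) -> dict:
        """Apply (tempid, attribute, value) triples and describe what happened."""
        before = self._db
        tempids: Dict[str, int] = {}
        facts = set()
        for temp, attr, value in tx_data:
            if temp not in tempids:
                tempids[temp] = next(self._ids)
            facts.add((tempids[temp], attr, value))
        after = before | frozenset(facts)
        self._db = after  # the machine moves on; `before` is still a valid value
        return {"db_before": before, "db_after": after,
                "tx_data": frozenset(facts), "tempids": tempids}

conn = Connection()
old_db = conn.db()
ret = conn.transact([("tmp-1", "email", "someone@example.com")])
print(sorted(ret.keys()))
print(ret["tempids"])
```

The caller ends up holding both the value from before the transaction and the value from after it, and neither depends on what the connection does next.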
Theoretically, the database after should be the value of the database having done that schema. We're going to call that new DB, just so that later we can talk about the database: this is the0:29:53
blank DB, new DB. And we can see: is the new database the same as the database before we transacted? We come down here and it says no. And now we're0:30:04
going to say, this function db goes to the transactor and says, get me the latest database value, and because there's nobody else using this0:30:13
database right now, that should be the same thing as the after value, and it is. How many people have ever used equals on two database instances? Okay, so things are0:30:25
different already. And now we're going to define a little query. We're just gonna keep reusing this query on different values of the database throughout the talk. So this says, find me some e and0:30:34
email where e has an email. Basically this is, get me all the emails: give me the entity and the email, all of them in the database. And we're going to0:30:43
go and issue this query, and that's what q is; we're gonna say issue this query. So we're sending this value; this is again just a data structure, it's a java.util.List. You can write it however you want in your0:30:52
own JVM language; it's just a list. So we send that list, and we're going to pass the database value to query. Notice there's no connection in0:31:02
this call. It's not a query to a connection or to a server or anything else; it's a function of the database and the query value. And this is the empty0:31:12
set down here we don't have any emails yet we created the attribute we haven't made any entities so let's make one we're going to transact again against the connection this is the same kind of0:31:21
thing we just put it inline we'll say there's some new entity with the ID we're just making up whose email is fred@email.com and we're going to transact that and just like before0:31:31
that's the same kind of thing it's a transaction it returns those same keys right the database before and after and whatnot so it'd be interesting to recover Fred's identity because we're0:31:41
gonna want to talk about Fred later so the temp IDs part of that return value is a map it maps all the temporary IDs to the actual IDs you got in the0:31:50
database because the IDs are auto-generated so this little piece of code just says go into that return value get the temp IDs get the first one of0:31:59
them and its value as Fred's ID no fluff there and if we look there we see that's Fred's ID we'll use that later and that0:32:09
function was so interesting we're gonna name it an ID because we can grab other IDs that way now if we issue a query because we just added Fred we got0:32:19
nothing why do we have nothing well I didn't do anything to new DB it's0:32:28
immutable I hope I didn't mess that up something else could be using that right I didn't grab the new value of the database right I just have the old value so the DB after0:32:39
that was returned by the transaction is the value there now what if other people are using this database if I immediately went back and said get me the database could I get exactly the0:32:49
database that was produced by the Fred transaction I don't know maybe maybe not but I don't have to worry about that right it was returned to me from the transaction so I'll just grab0:32:58
it so the DB after I issue the Fred transaction is the one that includes Fred I hope and let's just see if that's0:33:07
the case there he is we have Fred in that value of the database so now we're going to do a similar thing let's get this in view0:33:17
we're gonna add a different temporary ID and email as Ethel and we're going to put the return value of transact into Ethel0:33:26
DB has the same things DB before DB after IDs and whatever we're gonna name the value of the database that was created from that transaction the Ethel0:33:35
DB we didn't mess up Fred DB it still has Fred in it if we query the Ethel DB we see we have both and if we do actually0:33:44
go back over the connection again because we're not sharing this database with anyone we will see that the live database let me make this a little wider the live database also has both Fred and0:33:55
Ethel everybody okay so far all right still having fun so let's say Fred wants to change his name to Freddy and0:34:05
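Editor's sketch: the behavior described above (a transaction returns the database before and after as values, plus a temp ID resolution, and a query is a function of a database value) can be modeled with a few lines of Python. This is a toy model under stated assumptions, not Datomic's API: the `Db` and `Connection` classes, the `q_emails` helper, the `tmp-fred` temp ID, and the starting entity ID are all made up for illustration.

```python
from typing import NamedTuple


class Db(NamedTuple):
    """An immutable database value: a frozen set of (entity, attribute, value) facts."""
    facts: frozenset


class Connection:
    """Hypothetical stand-in for a connection; the only place where state changes."""

    def __init__(self):
        self._db = Db(frozenset())
        self._next_id = 418  # arbitrary starting point for generated entity IDs

    def db(self):
        """Return the latest database value."""
        return self._db

    def transact(self, tx_data):
        """Apply (tempid, attribute, value) triples and describe what happened."""
        before = self._db
        tempids = {}
        added = []
        for tempid, attr, value in tx_data:
            if tempid not in tempids:
                tempids[tempid] = self._next_id
                self._next_id += 1
            added.append((tempids[tempid], attr, value))
        after = Db(before.facts | frozenset(added))
        self._db = after
        return {"db_before": before, "db_after": after,
                "tx_data": added, "tempids": tempids}


def q_emails(db):
    """A query is a function of a database value; no connection is involved."""
    return {(e, v) for (e, a, v) in db.facts if a == "email"}


conn = Connection()
new_db = conn.db()
fred_ret = conn.transact([("tmp-fred", "email", "fred@email.com")])
fred_id = next(iter(fred_ret["tempids"].values()))
fred_db = fred_ret["db_after"]

print(q_emails(new_db))   # the old value is unchanged by the transaction
print(q_emails(fred_db))  # the value returned by the transaction includes Fred
```

Holding on to `fred_db` rather than asking the connection again is the point of the passage: the value returned by the transaction is exactly the database that transaction produced, regardless of what other writers do afterwards.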
he's he's done it he's told his mother and she's still upset about it but he's gotten his new email address and that's it moving forward but let's say this is0:34:16
the first time you ever use this database and maybe you're not really sure that you know how to do this right so you're gonna try it first I had this transaction I want to issue0:34:27
to make Fred's name Freddy and I'm gonna write that as data here I'm just gonna name that so I have a value called Freddy TX and it's just0:34:36
it really is just that anytime I type down here you should tell me don't do that because you're just gonna mess up and use code you didn't try before now there it is okay so this says0:34:45
we're gonna make a guy you know we're gonna make Freddy here then we're going to do something completely different right we're not going to call transact this arrow thing just says take the0:34:55
first thing and pass it to the next thing and pass it to the next thing it's a way so I can write code in order that you're probably more familiar with so we're going to take the value of the database from the connection and we're0:35:07
going to get a value of the database we saw this before we get a value from doing this then we want to say imagine that database or give me a new database as if I had done this Freddy transaction to it0:35:18
the one I just wrote up here and that returns the same stuff that transact does so I'm going to get the DB after and I'll call that Freddy DB to see what happened0:35:27
and now what's gonna happen when I look at Freddy DB will Fred become Freddy no actually I made a0:35:36
mistake what was my mistake yeah I actually just made a new guy I0:35:46
didn't talk about Fred I just said there's somebody else who has a Freddy email but thank goodness I didn't really do this right if I go look at the connection look at the database that0:35:55
everyone's seeing I didn't talk to the connection you know to do this job therefore I didn't change anything so let me see if I can fix that right what0:36:05
I want to do is say Fred's new email is this I'm just adding a new fact I'm not doing anything else and I'm gonna you know name that thing and I'm gonna try0:36:15
that again the same way take the fresh value of the database imagine it with the Fred transaction and see what happens and I will do that and then I will issue0:36:25
the query here and that's what I expected to see this same ID 418 that was Fred is now Freddy that's good Fred has changed his email so0:36:36
now we'll do this for real right we're gonna take that same transaction data and instead send it to the connection using transact like we did before and say now okay apply this change to0:36:48
the value of the database that other people might see this is just chaining take the connection pass that transaction to it dereference it and0:36:57
grab the database after so that's the new Freddy DB and we issue the query we see that but now if we look at the database that everyone sees everyone sees Freddy already good0:37:08
okay so now we're going to look at the accretive nature which is that every value of a database contains all prior values of the database so we're just0:37:17
going to get the latest value of the database we're going to call it latest DB we ask the connection to get the value and we now have the value we're done with the0:37:26
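Editor's sketch: the "imagine the database as if I had done this transaction" step can be illustrated as a pure function from a database value and transaction data to a new value, with the connection left alone until the change is applied for real. This is a toy model, not Datomic's API: `with_tx`, the `Connection` class, and the entity IDs 419 and 420 are hypothetical (418 is the ID mentioned in the talk), and the email attribute is assumed to hold one value per entity.

```python
def with_tx(db, tx_data):
    """Return a new database value as if tx_data had been applied; db itself is untouched."""
    by_key = {(e, a): v for (e, a, v) in db}
    for e, a, v in tx_data:
        by_key[(e, a)] = v  # one value per entity/attribute: a new assertion replaces the old one
    return frozenset((e, a, v) for (e, a), v in by_key.items())


class Connection:
    """Hypothetical connection: the only thing that changes what everyone sees."""

    def __init__(self, db):
        self._db = db

    def db(self):
        return self._db

    def transact(self, tx_data):
        self._db = with_tx(self._db, tx_data)
        return self._db


conn = Connection(frozenset({(418, "email", "fred@email.com"),
                             (419, "email", "ethel@email.com")}))

# First attempt: the data describes a brand-new entity instead of Fred.
mistake = with_tx(conn.db(), [(420, "email", "freddy@email.com")])

# Second attempt: state a new fact about entity 418 and try it speculatively.
fixed_tx = [(418, "email", "freddy@email.com")]
speculative = with_tx(conn.db(), fixed_tx)
still_fred = (418, "email", "fred@email.com") in conn.db()  # nothing was changed yet

# Only now is the same transaction data sent through the connection.
freddy_db = conn.transact(fixed_tx)
```

Because the speculative result and the real result are computed from the same data by the same function, trying a transaction first costs nothing and cannot disturb other readers.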
connection and we'll just issue our same query again to make sure what our basis is this is the current view of the world Freddy and Ethel but now we have this new function here called history and0:37:36
history takes a database and says instead of just showing me the current truth the most recent true value of every0:37:45
attribute that you know show me every value of the attributes that you ever knew and you would like to be able to do that0:37:55
sometimes yeah I did so there we go so this is very interesting so now we see unlike in all0:38:04
the other queries that 418 can come up more than once we have Freddy here and we have Fred there now this is kind of neat except I don't know if I0:38:13
could make any decisions on this basis why not I don't know when boy I wish I had a database that remembered when things0:38:23
happened and it ends up that you do right so what I've been showing you is a query that binds only three of what are0:38:32
actually five parts of what we call a datom right which is a fact the first is what entity the second is what attribute and then the value but the fourth thing is when in what0:38:43
transaction was this true so I'm just going to enhance the query and we'll call it T query which will pull back that additional thing so we'll get the ID the value and the transaction and0:38:55
when I issue this I get back the transaction times but what else is really interesting how many results did0:39:06
I get last time I used history three how many did I get now four hmm and what's0:39:17
interesting about those four is that two of them have 317 at the end here one says Freddy and one says Fred0:39:27
so I'm still not really able to distinguish these because these two things happened at the same time what's different about them do you think0:39:39
right one was I'm saying this is true and the other one is I'm saying this is no longer true right one was an assertion and one was a retraction and it ends up that the fifth part of a datom0:39:52
tells you that so if you are looking at history you're probably interested not only in the time but also whether or not things were asserted or retracted so the full Kahuna can grab the0:40:02
entity the attribute the value the time and whether it was an assertion or retraction so that's full query and if we issue full query we see that0:40:13
we were asserting Freddy at 317 but we were retracting Fred at that point in time that seems like something0:40:22
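Editor's sketch: the five-part fact described here (entity, attribute, value, transaction, and whether it was asserted or retracted) can be modeled as a small tuple type, which makes the result counts in the demo easy to reproduce. This is a toy model rather than Datomic's storage format: the `Datom` class, the `current` helper, Ethel's entity ID 419, and her transaction number 315 are assumptions; 418, 313, and 317 are the numbers quoted in the talk.

```python
from typing import NamedTuple


class Datom(NamedTuple):
    e: int       # entity
    a: str       # attribute
    v: str       # value
    tx: int      # transaction in which this was recorded
    added: bool  # True for an assertion, False for a retraction


LOG = [
    Datom(418, "email", "fred@email.com", 313, True),
    Datom(419, "email", "ethel@email.com", 315, True),
    Datom(418, "email", "fred@email.com", 317, False),   # Fred retracted...
    Datom(418, "email", "freddy@email.com", 317, True),  # ...and Freddy asserted, same transaction
]


def current(log):
    """The present truth: replay assertions and retractions in order."""
    facts = set()
    for d in log:
        if d.added:
            facts.add((d.e, d.a, d.v))
        else:
            facts.discard((d.e, d.a, d.v))
    return facts


history_ev = {(d.e, d.v) for d in LOG}           # binds entity and value only
history_evt = {(d.e, d.v, d.tx) for d in LOG}    # also binds the transaction
history_full = set(LOG)                          # all five parts

print(len(history_ev), len(history_evt), len(history_full))
```

The three-part history view collapses Fred's assertion and retraction into one row, which is why adding the transaction produces an extra result, and only the fifth part tells the two rows from transaction 317 apart.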
you could do reasonably cool analytics with I think so the other thing that's useful about having all of history inside something is if there was0:40:31
a way to say I found this cool thing back then or I just did something at this point in time I'd like to tell you about it maybe I'm gonna send you an email but I know the database is going to keep0:40:41
moving along the processing on the business is active and things are happening I wish there was a way for you to get back to the thing that I saw for me to get back to the thing I saw and that's about having a basis that you can0:40:52
recover and look at and send around so wouldn't that be nice if we could get that and we're going to go back now and remember we captured Fred DB right after0:41:01
we added Fred and it ends up that we can ask any database for its basis T which is the time you know there's a monotonic0:41:11
time that increases in a database that labels every point so we'll say what was the point in that database and we'll see that's 1001 and it ends up that T's have a relationship to TX's so0:41:22
we saw TX's earlier and you can translate from one to the other a TX is actually an entity because transactions are entities but the T is inside the transaction ID so we can0:41:32
turn one into the other so this is 313 and that's what we have up here right this assertion of Fred came in transaction 313 and 1001 and 313 are the same thing0:41:42
it's just we don't want to have to send those giant numbers around and have people try to understand them so now we have this basis and we called it Fred T so now we have two other really cool0:41:51
things we can do we can say this is the latest database but I'd love to see it as it was at time 1001 and I'm gonna go0:42:01
in and get that value and pass it to query now you notice I'm not changing query same query different value of the0:42:11
database and if I issue that query boom it's as if I never saw Ethel or Fred never changed his name the other thing I can do is I can say let me see the0:42:22
database only the stuff that's happened since that point after that and if I issue that query I'll see Freddy and Ethel so I can see0:42:31
the database before since or combinations of that give you windows of time and finally we can talk about seeing more than one database so this is0:42:42
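Editor's sketch: viewing the database as of a point in time, or only since that point, amounts to filtering the same accumulated facts by transaction and then running the unchanged query. This is a toy model, not Datomic's API: the `as_of`, `since`, and `emails` helpers are hypothetical, the log uses transaction numbers directly as the time basis, and Ethel's ID 419 and transaction 315 are assumptions.

```python
# (entity, attribute, value, transaction, added?) facts, oldest first.
LOG = [
    (418, "email", "fred@email.com", 313, True),
    (419, "email", "ethel@email.com", 315, True),
    (418, "email", "fred@email.com", 317, False),
    (418, "email", "freddy@email.com", 317, True),
]


def as_of(log, t):
    """Everything known up to and including point t."""
    return [d for d in log if d[3] <= t]


def since(log, t):
    """Only what was recorded after point t."""
    return [d for d in log if d[3] > t]


def emails(log):
    """The same query every time; only the database value passed in differs."""
    found = set()
    for e, a, v, tx, added in log:
        if a != "email":
            continue
        if added:
            found.add((e, v))
        else:
            found.discard((e, v))
    return found


fred_t = 313
print(emails(as_of(LOG, fred_t)))               # as if Ethel never arrived and Fred never changed
print(emails(since(LOG, fred_t)))               # only what happened afterwards
print(emails(as_of(since(LOG, fred_t), 315)))   # combining both gives a window of time
```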
a slightly more involved query and we add this in clause which says I'm going to give you data sources if you don't supply the in Clause query presumes you're gonna give it one0:42:51
argument which is the data source implied for all of it if you're going to have more than one data source you have to name them in in and you have to name them in these binding clauses so we're0:43:01
going to name two data sources d1 and d2 and we now expand so the actual binding form is database entity attribute value time assertion and this basically says0:43:12
show me everybody that was the same in both databases had the same email in both databases so what should we get if we issue this query against the Ethel0:43:22
database which has in it Fred and Ethel and the latest database which has in it Ethel and Freddy what should we get0:43:34
Ethel oh we should we should make this exist there we go we get Ethel so there0:43:43
we go we can imagine sending other databases and doing interesting things between databases but the principle I showed in that picture applies so we were talking about testing I wonder if0:43:54
what I would do if I wanted to test this should I create two databases and fill them up with stuff that's hard right even with this it's0:44:03
hard why should I have to do that it's like I have to set up machines to run I wish I could just supply data in the same shape that the database values are and have queries0:44:13
work against that so here's that same query here's two relations written as lists of lists and the query is0:44:22
what's the same in both databases same query we just saw it work against databases that database could have the0:44:31
entirety of you know giant thing on DynamoDB behind it we could test it with two lists obviously we know how to make lists okay we're gonna switch gears now0:44:43
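Editor's sketch: a query over two data sources that only needs "data in the same shape" can be tested with plain lists of lists, as the passage describes. The function below is a hand-written Python equivalent of the "same email in both sources" query, not Datalog and not Datomic's query engine; the name `same_in_both` and Ethel's entity ID 419 are assumptions.

```python
def same_in_both(d1, d2, attr):
    """Entities that have the same value for attr in both data sources."""
    left = {(e, v) for e, a, v in d1 if a == attr}
    right = {(e, v) for e, a, v in d2 if a == attr}
    return {e for e, v in left & right}


# Two relations written as lists of lists, standing in for two database values.
ethel_db = [
    [418, "email", "fred@email.com"],
    [419, "email", "ethel@email.com"],
]
latest_db = [
    [418, "email", "freddy@email.com"],
    [419, "email", "ethel@email.com"],
]

print(same_in_both(ethel_db, latest_db, "email"))  # only Ethel is unchanged between the two
```

Since the function depends only on the shape of its inputs, the same logic can be exercised in a unit test without standing up any database.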
and I just want to show you the paradigm shift so I'm gonna actually connect to a database that's already installed which is a large music database derived from the MusicBrainz0:44:53
data and get a connection to it and get a value of the database from that connection I'm going to call it DB second database another connection I0:45:03
grab the database value from that and there's no problem doing this and if I had any relationship between these two databases I could supply the music database and the email database but I don't okay so we're gonna go a little0:45:15
bit more quickly through this this is a query that says find me the artists whose name is the Foo Fighters and it's just going to go and grab the first return because there will be only one and0:45:24
that's the Foo Fighters ID so we do that and we look at FF ID and we see it's some number we don't really care but now0:45:33
there's another completely different way of looking at the same data value right the same value of the database which is more entity oriented0:45:42
that's not to say it's object-oriented and the objects aren't created it's just as if there was a big map and it could point to other things and each of the attributes points to other things it's implicit in the0:45:51
data right the data has hierarchical structure implicitly albums have artists and things like that so I'm going to make an entity which is sort of a lazy0:46:00
way a lazy map from the value of the database so we're done with the connection not using the connection from the value0:46:09
of the music database which is huge hundreds of millions of facts in it I'm gonna say give me the entity corresponding to the Foo0:46:18
Fighters and then it behaves like a map so I can ask for its keys and so the Foo Fighters is an artist so they have a start year and country and some0:46:28
other stuff so we can say what year did the Foo Fighters start now I'm not in a query language here I'm using Clojure's facilities for manipulating0:46:37
maps but if you want to use Java's facilities for manipulating maps it would work just the same way it's not different you call get and stuff like that so that's kind of cool0:46:48
but it ends up that artists aren't that interesting they have facts about themselves I know the database knows which albums are by the Foo Fighters but that actually is a property of the album0:46:58
so albums have artists instead of artists have albums and there's always that kind of choice which should it be and if I get that wrong will I be screwed and not be able to find my stuff it ends up no right first of all0:47:10
it is correct to say that albums have artists because there might be more than one artist on an album but since I've organized it that way what if I wanted to find what albums were by the Foo0:47:20
Fighters do I need to go back to query and things like that it ends up that no it ends up that I happen to know that there is this attribute called release0:47:29
artist which is what are the artists on a release you know who made this album and if you use this little underscore it means you can go backwards so this this0:47:39
basically says show me all the releases that have the Foo Fighters as an artist and this is not a query this is just you're using this like it's like a field0:47:49
lookup it's imagining the Foo Fighters had this attribute which was who points to you it's fixing objects right in objects pointers only go one way here it's0:47:58
relational pointers go both ways in fact they're not really pointers they're more like bridges and there we go these are all the releases so that's not that0:48:07
interesting because these are just the IDs but you'll notice these are like little maps in fact they're lazy entities just like the one we just saw one for every release that has the Foo Fighters so if I take that set0:48:18
and grab the first one and look at the keys I now can see these are the keys of a release right these are the attributes a release has and so finally0:48:28
I can go and just use regular functional programming to take the Foo Fighters grab the releases that have the Foo0:48:38
Fighters as an artist map this higher-order function that says grab these three attributes and put them next to each other in a tuple and then print them there's no query language0:48:49
here this is regular functional programming and we do that and we get that so with that I'll go back so I0:49:02
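Editor's sketch: the entity view (a lazy, map-like window onto a database value, with a leading underscore for walking a reference backwards) can be approximated in a few lines. This is a toy model, not Datomic's entity API: the `Entity` class, the attribute names such as `release/artists`, the small entity IDs, and the sample facts are assumptions chosen for illustration.

```python
# Sample facts as (entity, attribute, value); release/artists points at an artist entity.
FACTS = [
    (1, "artist/name", "Foo Fighters"),
    (1, "artist/startYear", 1994),
    (10, "release/name", "The Colour and the Shape"),
    (10, "release/year", 1997),
    (10, "release/artists", 1),
    (11, "release/name", "Wasting Light"),
    (11, "release/year", 2011),
    (11, "release/artists", 1),
]


class Entity:
    """Map-like view of one entity; attributes are looked up only when asked for."""

    def __init__(self, db, eid):
        self.db = db
        self.eid = eid

    def keys(self):
        return sorted({a for e, a, v in self.db if e == self.eid})

    def __getitem__(self, attr):
        namespace, name = attr.split("/")
        if name.startswith("_"):
            # Reverse navigation: which entities point at this one through the attribute?
            forward = f"{namespace}/{name[1:]}"
            return [Entity(self.db, e) for e, a, v in self.db
                    if a == forward and v == self.eid]
        values = [v for e, a, v in self.db if e == self.eid and a == attr]
        return values[0] if len(values) == 1 else values


foo_fighters = Entity(FACTS, 1)
releases = foo_fighters["release/_artists"]

# Ordinary functional programming over entities; no query language involved.
rows = sorted((r["release/name"], r["release/year"]) for r in releases)
for row in rows:
    print(row)
```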
think that when you finally have this having a functional approach to a database brings the same kind of benefits you get from using functional0:49:12
programming otherwise and they are many and they're extremely valuable you also get a lot of new stuff there's several things here that I think0:49:21
are extremely difficult if not impossible to do otherwise and the most important thing and the real reason to be here and to pursue this is that0:49:32
this makes programming and programming that involves databases much simpler in the core way and the core0:49:41
meaning of simple much less entangled than it has ever been and that's really the point so thanks very0:49:51
much0:00:00
Writing Datomic in Clojure - Rich Hickey
0:00:00
sumption how many people here know Clojure how many people don't know Clojure0:00:10
okay one of the great things about Clojure as it is in hiring it's also0:00:20
with audiences a great way to sort of filter out the best of the best I appreciate your coming so what I'm gonna0:00:29
do is talk about how we implemented Datomic in Clojure and I'll start by describing Datomic itself and the0:00:38
overall architecture and then dig into a few of the implementation details and sort of summarize what I'd like to do is go relatively quickly and have some0:00:48
question and answer but that may be too ambitious because I do have almost 40 slides which usually takes me two hours so why don't we get started so I guess0:01:00
the other question is how many people are familiar with Datomic at all a little bit okay so I'll go through what Datomic is about fundamentally Datomic0:01:10
is a database and the overarching goal is to move away from a0:01:19
monolithic notion of what constitutes a database to one where the facilities of a database are distributed in particular substantial portions of the power0:01:28
normally attributed to the server in the database are moved into your application servers themselves so you get more ability you know more power over0:01:37
programming with data inside your application logic because typically that's been over there and outside of your your scope of your program itself a0:01:47
coupled with the architectural changes is a data model that might seem familiar if you were at the keynote yesterday because it really strives to achieve0:02:00
the objectives that I discussed about modeling information itself and incorporating time and it's been enabled by the fact that computers have gotten0:02:09
faster and networks have gotten faster and many of the old architectural presumptions about data locality in particular about the advantages of0:02:18
a particular machine having the data on its disk are gone now due to the way networks work there are not0:02:27
advantages for that machine so why do this I think for these two reasons right we want to pursue this different architecture what happens when we deconstruct the database and you know0:02:38
I want a different data model so architecturally we can look at you know databases as moving through this spectrum we start with traditional client-server this was born of the fact0:02:48
that you know way back computers were really expensive they didn't have very much capacity and getting a single big machine was a big expense0:02:59
and so that machine became very special you could only afford one and you put all the good stuff there and the clients were pretty lightweight but this design0:03:10
fundamentally has a bunch of capabilities that we want to track as we move from architecture to architecture right a traditional database has query support it supports transactions it0:03:21
ensures consistency of the information that's placed into it and usually traditionally that server also was in charge of storage moving forward from0:03:33
this we can look at well okay maybe in an effort to get more throughput or some scalability we would cluster this server0:03:42
again we still have sort of a single logical entity we're using more than one machine to accomplish it but it still acts as a single entity handling0:03:51
the same jobs and sort of acting as one unit and this is very difficult to do and as you probably know usually expensive to do0:04:03
and of course there's still real big challenges there because the amount of coordination between clustered servers is very high so in an effort to get even more scale we've taken those servers and0:04:14
divided them up and said let's shard the data at the point you shard the data you really end up with independent databases you know maybe the sharding is done sort of0:04:24
globally but really your information is no longer connected this is three different databases and so you start to lose right0:04:34
you start sharding and then you can't query across shards you can't do transactions across shards you can't ensure consistency across shards and really that's why I think you should consider these independent but you still0:04:44
have maybe storage subsystems that are being serviced but sharded servers really have fewer capabilities and now we move to sort of0:04:55
the newer generation of things that say well you know what by the time you're sharding you're not really getting many of those advantages let's just have key value stores now we have true0:05:04
independence we're just gonna you know consistent hash your keys and then go find the Machine that they're gonna run on and now we have independence here and0:05:14
at this point I think you completely lose everything right you no longer have queries of any you know certainly not ad-hoc queries there's no transactionality left there's no0:05:24
consistency left everything is atomic and independent so the idea of multiple things being related to each other is gone you're really just in a storage system and I think you should basically0:05:33
start labeling these things distributed storage services because they're not at all like databases anymore databases really had a lot of leverage one of the0:05:42
words I used today databases give you leverage all right there's no leverage in a key value store leverage is gone this is like a glorified file system I've put this blob there I called it one0:05:53
two three four and if I ask for one two three four I get the blob back that's not a database no leverage but there are lots of interesting architectural0:06:04
advantages to this in particular for scale and for read scale so we want to tap into that so Datomic starts with that idea of a distributed storage0:06:13
service and says there's a lot of value to that proposition but let's try to re-obtain the leverage we had from0:06:23
traditional databases and the first thing we want to do is reinstall queries so obviously if we use the storage service we have storage and what we're going to do with queries is we're going0:06:32
to place them in the application servers so now every application server gets its own brain we still don't have transactions or consistency at this point but you'll see0:06:41
if we move from the key value store you know every client of a key-value store could write to the key value store and read from it we're gonna change that we're0:06:51
gonna say you can only read from the storage service and we're going to reinstall a privileged member here0:07:00
which we call the transactor whose job is strictly to coordinate transactions and consistency and by reintroducing0:07:09
this element we end up with a hybrid model with some interesting properties right we have distributed reads we have distributed storage we have redundancy in storage but we get back transactions0:07:21
and consistency so this is a hybrid model that has sort of a traditional model for writes and a new model for reads and0:07:32
this transactor only deals with writes it doesn't handle any read load at all it doesn't answer queries or anything like that so when an application has some novelty right new facts it's going0:07:42
to send it to the transactor the transactor is going to broadcast this back to the other peers we call them that are connected and also commit it to storage0:07:53
so it's transactional it's durable and won't return until it's in storage and at that point it gets reflected back but you also have this nice 1/2 of novelty0:08:03
being distributed by this by the system at this point we have everything back so0:08:12
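Editor's sketch: the write path described above (peers send novelty to a single transactor, which commits it to storage and broadcasts it to connected peers, while reads never touch the transactor) can be modeled as three small classes. This is a toy, single-process model under stated assumptions, not Datomic's implementation: the class and method names are hypothetical, and real storage, networking, and indexing are left out.

```python
class Storage:
    """Stand-in for a storage service; peers only read from it."""

    def __init__(self):
        self._log = []

    def write(self, datoms):
        self._log.extend(datoms)

    def read(self):
        return list(self._log)


class Peer:
    """An application process with its own query capability."""

    def __init__(self, storage):
        self.datoms = storage.read()  # catch up from storage when connecting

    def receive(self, novelty):
        self.datoms.extend(novelty)   # novelty broadcast by the transactor

    def emails(self):
        # Answered locally; neither the transactor nor other peers are involved.
        return {(e, v) for (e, a, v, t) in self.datoms if a == "email"}


class Transactor:
    """The single coordinator of writes; it answers no queries."""

    def __init__(self, storage):
        self.storage = storage
        self.peers = []
        self.t = 0

    def connect(self, peer):
        self.peers.append(peer)

    def transact(self, facts):
        self.t += 1
        datoms = [(e, a, v, self.t) for (e, a, v) in facts]
        self.storage.write(datoms)    # durable first
        for peer in self.peers:       # then reflected back to every connected peer
            peer.receive(datoms)
        return self.t


storage = Storage()
transactor = Transactor(storage)
peer_a, peer_b = Peer(storage), Peer(storage)
transactor.connect(peer_a)
transactor.connect(peer_b)

transactor.transact([(418, "email", "fred@email.com")])
late_peer = Peer(storage)  # a peer started afterwards reads what it missed from storage
print(peer_a.emails(), peer_b.emails(), late_peer.emails())
```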
one way to look at this is to sort of take the pieces of the traditional architecture and see where they end up in this model so if we look at a traditional database again we have this0:08:21
monolithic thing it does indexing it does transactions it does i/o to storage it handles query load your application itself is very you know it doesn't have0:08:31
much power at all it's completely relying on the server to do all the thinking but it often has you know like a caching layer or something like that because this unit gets0:08:41
overloaded so what happens is applications would go you know issue queries and then say wow that was expensive maybe somebody else will ask the same question I will manually go and0:08:51
put that answer in a cache and then I'll manually check the cache before I do this and it's all sort of on me to add this layer to try to avoid overloading0:09:00
a centralized resource so we can look at where all these components end up IO moves to a storage service and it's0:09:09
handled completely by that and that's isolated transactions move into the transactor which only does that indexing is currently a background0:09:18
process inside the transactor that could move out of there but that happens to be where it's done right now and then the application process itself and you0:09:27
can read this as an app server right that that t-ruler was your app server now has a query component inside it which is embedded as well as this0:09:38
propagated index so it's got a live index and the ability to read from storage and by reading from storage as needed and live merging with this index0:09:49
it can provide the application with very very fast local query capability that's isolated from the rest of the system and so we can go and dig into what0:10:01
attributes we're trying to achieve by breaking it apart like this so the first thing you want to do is you really want to accept the fact that as we move0:10:10
forward with more virtualization we have virtualized memory now virtualized storage and now we've sort of virtualized the architecture you know we have0:10:19
machines that we can start and then they go away we have storage we have all these ephemeral components so you want to design systems moving forward0:10:29
where every component is ephemeral everything can die you know failure is a continuous state so you want to design for that you want to design for unreliable disks and that's0:10:39
part of what's happening here you need redundancy as soon as you do that right you can't have a privileged machine with the privileged disk so you want redundancy in the storage0:10:48
and we want to be able to piggyback on top of existing reliable storage services so DynamoDB would be an example of an existing reliable0:10:58
storage service right there's no reason why you know every database needs to solve every problem associated with databases we're just going to get a million implementations of how to put0:11:07
blocks on the disk and how to read them back and there's really no advantage to doing that anymore and there's substantial advantage as you'll see in making this an0:11:16
independent architectural component in particular you can have more than one storage service so we support memory we support SQL we support Infinispan0:11:27
and there's a bunch of other interesting key value stores like Couchbase or Riak that make logical sense in this role as a storage service so combining these0:11:37
things makes a lot of sense let each component do what it does best the other thing we're looking for is scale right so I think one of the problems we've had in deconstructing architectures so far0:11:48
is that we've taken everything together and said there's this monolithic database let's make a bunch of monolithic databases and that's how we get distribution but now we're taking0:11:58
everything right storage query transactionality or the lack thereof and saying let's divide that up and have a bunch of little pieces that all look the same as you just saw on the prior diagram0:12:09
I don't think you actually want to do that I think you want to make independent decisions about transactionality storage and query when you do that you get independent scaling0:12:19
characteristics and independent failover characteristics for each part of the architecture so in the case of query because we've put query capability in0:12:29
each application server when you add more application servers you get more query power transparently and that's0:12:38
truly elastic when you need less query power you just stop some of those machines from running and it goes it goes down it's easy also for this kind0:12:47
of scaling to tap into demand driven scaling right so when you set up auto scaling in in Amazon for instance right0:12:56
you can make that triggered by the amount of demand the amount of load that's being seen and you're getting query scaling out of this contrast that with saying you know0:13:05
I'm going to pre configure a cluster or I'm going to manually go to a console and add a machine to a cluster so that I can have more you know a bigger cluster server that's a very explicit action0:13:14
it's not responding to demand dynamically so it's a big deal to tap into demand driven scaling capabilities and that's what's happening now with0:13:24
query when you embed it in the application peers so you get your own brain we're gonna look in0:13:33
more detail at what these pieces are in implementation detail but you have a query engine a communication engine you have a memory and caching engine all built in to your application servers by including0:13:43
the peer library and the effect of that on your application is the database feels like a local resource this is a very big deal because0:13:53
I don't think people really appreciate how much client-server has invaded the way they think about what applications can do and what's0:14:04
okay right because this conversational aspect and the fact that you think it's expensive and you think that you're tapping into a shared resource that's0:14:14
potentially getting overloaded really compromises what you think your application is allowed to do and when you move this stuff local now it's0:14:23
completely different right if you want to run if you have your own app server and it's doing analytics and it needs to run queries that run a long time is that okay now listen it's completely ok those0:14:33
queries don't tie up any shared resources no one's waiting for you it's your own machine that's a complete inversion of logic from when you have a server or even a0:14:42
set of servers that's a shared resource that you really count on so ad-hoc queries long-running queries that all becomes okay because you have your own brains from a logic side again support0:14:54
you know one of the mission statements was to bring the power of programming with data into the application that's what we're trying to do so we have a declarative logic engine0:15:04
built into the application peers it uses Datalog how many people know Datalog not too many so it's a0:15:14
query language it's kind of pattern oriented it's very simple in a true notion of simplicity but0:15:25
it has power equivalent to relational algebra so it's the full power of relational algebra but the point is now that that's in your application so this is a really great thing SQL is a0:15:35
really great thing it's set oriented it's declarative it's very powerful but it's always been over there you know in the server in a different language I have to send strings to it I0:15:44
get something back it's conversational there's a lot of a lot about that that doesn't make it useful to you in your application it would be nice if you could use it in your application and0:15:54
that's what we've done so you have Datalog locally in Datalog itself joins are implicit right it's a very evident kind of code as you read it and0:16:04
in addition to being able to access the database we really want applications to be more declarative themselves so the0:16:13
Datalog engine can be applied to your own data in memory or data from other sources so you can combine database sources and in-memory sources and0:16:24
collections together in queries you can query nan you can query collections you can query you know System.get what is it called get environment no properties0:16:35
System.getProperties right you can query that you can query the results of that instead of having to write this loop and I think that's really important right because those kinds of applications0:16:44
are clearer and easier to debug and make correct so perception is also important it's the thing I talked about yesterday0:16:54
perception in the real world is facilitated by passive communication you don't ask things to see them right0:17:03
light bounces off them and you just observe it because it comes to you it bounces off them and comes to you and you didn't do anything and they didn't do anything and mimicking that0:17:12
architecturally means push essentially right means some sort of broadcast and so you can get a queue of transactions inside of here it will include your own0:17:23
transactions as well as the transactions of others and then you can hook into that for any purpose you want so you can make reactive systems that don't have to poll the database again you're trying to0:17:32
sort of get away from this there is this central thing I need to go and keep asking questions of from a consistency standpoint because we've established a0:17:43
significant component in the transactor we get back ACID transactions with full ACID capabilities including the ability to be consistent across any0:17:54
piece of data inside the system at all so there's no sharding and there's no separation there there's no per document transactions or per document set or anything like that the other thing the0:18:06
other notion so that's the traditional notion of consistency that has to do with process right change I want a set of changes to be done all together or not at all0:18:15
and isolated right those properties but I think there's another notion of consistency which really matters to application logic which is can I present to my application logic a consistent set0:18:25
of data so that I can run a report right how many people are able to make reproducible reports against an in-place update database it's very difficult0:18:35
right because the database changes between when you made the report and now and then you have to do all this extra work and sometimes this is simply impossible what happens inside0:18:44
Datomic and what we're trying to achieve and is part of what's interesting about the implementation from a Clojure perspective is we're actually trying to present the entire database to the application as a value as a first-class0:18:54
value like I talked about like completely immutable you don't see any other changes every part about it is evident it's comparable and all those other kinds of characteristics0:19:04
and so the way we achieve that is to make sure that all the data that's kept in the storage service is immutable we0:19:15
want the database to be programmable obviously as Clojure users we care a lot about programmability of programs and it's a traditional sore spot for0:19:24
databases that are not particularly programmable so you'll see as we go through this that transactions and rules and queries they're all data they're data in they're data out they're defined in0:19:33
terms of data not in terms of operations and there's a bunch of you know common-sense things in terms of extensibility and predicates and also0:19:42
critically queries can invoke your own code so it's an extensible system in terms of the logic that can run unsurprisingly the model from a data0:19:53
models perspective is that it's a database of facts and when you try to follow through some of the things I talked about yesterday all the way down you realize that it's very difficult to0:20:04
make an efficient database of facts if the granularity of a fact is a row or a document right they're too big if you're gonna say I want to record change unless you have an0:20:15
independent Delta scheme you really you have to say you know I got a new address right let me save you again that doesn't make sense you need something that can0:20:24
represent just you know new email address and so that means boiling down to sort of an atomic notion of what0:20:34
constitutes a fact and you can easily build up the requirements for this right Sally is not a fact Sally likes is not a0:20:43
fact Sally likes pizza is also not yet a fact because what do we learn about facts yesterday yeah facts are things0:20:52
that happen so Sally likes pizza you know as of when when did that when did that happen that now is a fact and rather than store like a time of day0:21:04
component here what we do is we store the transaction that recorded the fact because then we can put the time of day on the transaction but we can also put0:21:13
other interesting and useful stuff on that transaction like who did it what was the source of the data you know has it been approved and other things like that and you know only incur the0:21:24
expense of one additional component to this thing so we call this thing a datom entity attribute value and transaction and the only schema that's associated0:21:34
with a Datomic database is the definition of attributes right there's no other structural constructs there's no records there's no types there's no0:21:44
classes there's no document schemas or anything like that but you do need to define attributes before you use them they have a name they have a type they0:21:54
have a cardinality they have whether or not it's a component of the thing that points to it and a few other things0:22:04
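The datom and attribute definitions just described can be sketched in a few lines. This is a minimal illustration in Python, not Datomic's API: the names `Datom`, `ATTRIBUTES` and `check`, the attribute names, and the sample timestamp are all assumptions made for the example. It shows a fact as an entity/attribute/value/transaction tuple, attributes that must be defined before use, and extra information such as time and source attached to the transaction rather than to each fact.

```python
from typing import Any, NamedTuple


class Datom(NamedTuple):
    e: int   # entity id
    a: str   # attribute name
    v: Any   # value
    tx: int  # id of the transaction that recorded the fact


# Hypothetical attribute definitions: the only schema in this sketch.
ATTRIBUTES = {
    "person/likes": {"type": str, "cardinality": "many"},
    "person/email": {"type": str, "cardinality": "one"},
    "tx/instant": {"type": str, "cardinality": "one"},
    "tx/source": {"type": str, "cardinality": "one"},
}


def check(datom):
    """Reject datoms whose attribute is undefined or whose value has the wrong type."""
    spec = ATTRIBUTES.get(datom.a)
    if spec is None:
        raise KeyError(f"attribute {datom.a!r} is not defined")
    if not isinstance(datom.v, spec["type"]):
        raise TypeError(f"{datom.a!r} expects {spec['type'].__name__}")
    return datom


SALLY, TX = 1, 1001  # illustrative ids

facts = [
    check(d)
    for d in [
        Datom(SALLY, "person/likes", "pizza", TX),
        # Time and provenance are facts about the transaction entity itself.
        Datom(TX, "tx/instant", "2012-01-01T00:00:00Z", TX),
        Datom(TX, "tx/source", "signup-form", TX),
    ]
]
```

Because the time of day lives on the transaction entity, every fact pays for only one extra component (the transaction id) no matter how much is recorded about the transaction.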
if at runtime you want a new attribute you can make a new attribute and attributes are data attributes are facts right they're stored in the database0:22:13
just like and right alongside everything else completely queryable and time coordinated everything they're just data they're just facts every fact has a0:22:29
transaction that it occurred in and transactions have time associated with them so there's a new fact which is you no0:22:38
longer like pizza and that fact happened on a different time and that's how we do it so there are assertions and retractions we don't go back to facts0:22:47
and change them right there's just a new fact I moved so that does mean my address is no longer this but just saying I have a new address is0:22:56
sufficient to imply that it's not the old address and cardinality can help make that automatic too all right if you have a cardinality one thing and you0:23:05
now have a new value of it it means the last value has been superseded okay like many of the newer databases we do want0:23:15
to try to address some of the pressures that have been faced by people trying to use rectangle oriented databases you know traditional relational databases force rigid schemas0:23:25
that you know can't have blanks and have difficulty dealing with multi valued entities and hierarchical data and sparse data and of course the beautiful0:23:36
thing about a model like this that's atomic is you can build any of those things that you want but you're not forced into them and news talk later is going to have great examples of all0:23:46
of that that talk is not the next talk it's the talk in the iconoclasts track0:23:56
so we want to move away from structural rigidity as you know most people do and of course we saw in order to be a fact-based database we have to have time built in we saw0:24:06
every datom has its own transaction right the transactions are totally ordered they're first class and because the database is a value and because0:24:15
everything has time on it you can actually say of the database itself give me this database as of last week or since two months ago and then issue0:24:24
queries against that view of things but focus of this talk is on implementation so let's look at the architecture a0:24:33
little bit more close-up this is just the components of Datomic now we're going to see the application process uses a peer library has some communications components has something0:24:43
that represents the index has a caching component and a query component the transactor has indexing and transactions and storage the things we're trying to0:24:52
accomplish with this design or rather the problems we're trying to solve right in the design the first thing we have to do is deal with state right how is the database a value and that is how can0:25:01
you think of something that is getting new facts as a value and one way to conceive of it is sort of like you know the rings of a tree right as you add new0:25:12
rings to the outside of a tree the inner parts don't change and so with that kind of a notion of value in other words it's immutable but it expands over time0:25:21
you're still not updating in place so you still actually have all the value benefits as long as you're able to talk about it as a particular point in time so that's sort of a philosophical hurdle0:25:31
you have to get around but at that point you're fine but again the key here is leverage right the point of a database is to organize information so that you can get leverage0:25:41
out of it in my mind leverage is query right and its organization is indexes so the way we do this is we have state represented as a sorted set of facts but0:25:53
one of the things that's really important is to learn the lessons of BigTable and what BigTable showed us was maintaining sort and maintaining an0:26:02
index live on disk is a bad idea it's incredibly inefficient you keep rewriting the head of something and what you want to do instead is accumulate0:26:11
change right and then merge it periodically so does everybody know how BigTable works yeah so BigTable accumulates a0:26:20
bunch of stuff it keeps a sorted set in memory and a sorted set on disk right and then every now and then it will merge from memory to disk and when you ask it a question it always answers0:26:29
the question by giving you a live merge of memory plus disk so it's the same strategy here right that's very very important I don't think you can do an0:26:38
efficient live indexed immutable thing you just use way too much storage and you're constantly you know garbage collecting so we do an occasional merge0:26:49
into storage now that doesn't mean that the data is not durable right we keep everything that comes in as of the transaction acknowledgement has been made durable but it's not necessarily0:26:59
been merged into the durable index and until it has been it's kept in memory and unsurprisingly for Clojure users0:27:09
right the whole thing behind this is persistent trees so now we have durable persistent trees so we can look at what the memory component of this looks like0:27:19
right it's a persistent sorted set it is not the one that is in Clojure which is a red-black tree this is a new0:27:28
persistent sorted set that uses big internal nodes so it works a lot more like so than the other Clojure data structures and it has pluggable0:27:37
comparators you know that you would expect and there's always two of these maintained so without any user intervention we're always maintaining all of the0:27:46
datoms in entity attribute value time sort and attribute entity value time sort and then there are ways to get or0:27:55
request attribute value sorts and the reverse indexing right we do reverse indexing automatically for any entity to0:28:04
entity references so if Sally likes Fred there's a fast way to get from Fred to the fact that Sally likes Fred so we can0:28:15
look a little bit at what we do in storage itself unsurprisingly we use storage like a tree right unlike BigTable which takes the segments that0:28:24
it produces in memory and and does a full merge sort into another big block on the disk we're not that disk centric we're0:28:34
anticipating using more key value store like storage engines so we want to keep the block orientation that's traditional in databases so we use storage0:28:43
as akin to the way a traditional database uses the file system we store blocks in storage so what we're going to0:28:53
do is we're going to have assertions and retractions that's all the fundamental data boils down to assertions and retractions stored in these sorted indexes what we require of a storage0:29:03
engine is most basically key value storage right we're going to associate a key not with some particular fact these values are actually big0:29:13
chunks they're like segments they're like blocks of a traditional database we put blocks of sorted Datomic data into storage but0:29:22
interestingly doing this full implementation requires the Clojure data model right the Clojure state model we have essentially a distributed version0:29:32
of the Clojure state model with the moral equivalents of atoms and refs and that puts the requirement on the0:29:41
storage system that they offer consistent reads so we can implement atoms and conditional put or CAS or something like it so we can support pods or you can I hate to say pods but yeah0:29:53
pods are like accumulator like refs so it's the Clojure state model now extended to storage and storage looks like this essentially there's a single0:30:05
root of the index okay there's additional storage associated with the log so as data comes in it's immediately put in in the log this is the thing that's periodically0:30:14
updated and again it's not updated in place everything we put in here is immutable but there's a root that points to a bunch of the different sorts0:30:24
we also do Lucene inside this and it's associated with a transaction time in other words this includes all transactions prior to this one and0:30:36
unsurprisingly it's a tree it's a very very high branching factor tree and it's very shallow so it's only three levels no matter what okay there's a root which0:30:46
is a bunch of pointers to directories which are a bunch of pointers to segments each segment is a bunch of sorted datoms sorted compressed datoms0:30:56
and it's these things that we actually store key value like into storage it's a whole big block like 64k of sorted datoms going at a time into storage0:31:05
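The three-level layout described here (a root pointing to directories, directories pointing to segments, each segment a sorted block stored under a key) can be sketched as follows. This is a toy Python model under stated assumptions, not Datomic's implementation: the key-value store is a plain dict, the key names are made up, segments are tiny tuples rather than 64k compressed blocks, and the index is assumed to be non-empty.

```python
import bisect


def build_index(sorted_datoms, store, seg_size=4, dir_size=2):
    """Write immutable segments and directories into `store`; return the root."""
    segments = []
    for i in range(0, len(sorted_datoms), seg_size):
        block = tuple(sorted_datoms[i:i + seg_size])
        key = f"seg-{len(store)}"          # hypothetical key scheme
        store[key] = block                 # one key, one whole block of datoms
        segments.append((block[0], key))   # (first datom in block, storage key)
    directories = []
    for i in range(0, len(segments), dir_size):
        entries = tuple(segments[i:i + dir_size])
        key = f"dir-{len(store)}"
        store[key] = entries
        directories.append((entries[0][0], key))
    return tuple(directories)              # the root: pointers to directories


def _pick(entries, datom):
    """Choose the child whose range could contain `datom`."""
    firsts = [first for first, _ in entries]
    i = bisect.bisect_right(firsts, datom) - 1
    return entries[max(i, 0)][1]


def contains(root, store, datom):
    """Root -> directory -> segment: always exactly three levels."""
    directory = store[_pick(root, datom)]
    segment = store[_pick(directory, datom)]
    i = bisect.bisect_left(segment, datom)
    return i < len(segment) and segment[i] == datom


store = {}
datoms = sorted((e, "likes", f"thing{e}", 100 + e) for e in range(10))
root = build_index(datoms, store)
```

Nothing written to `store` is ever modified afterwards. A new index is a new root that can share untouched segments with the old one, which is what makes the storage usable as a persistent tree.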
that's how that looks and then we can lift up a level to what a database value is so we said the application has a database value all right so we know what a value0:31:14
like a map is in Clojure because it's all in memory okay so how do we do a value like that to present to the application that's actually backed0:31:23
essentially partially by storage and has IO associated with getting that storage and that looks like this there's a root which is the DB0:31:35
value object which is stored in an atom this is all the Clojure state stuff that I need to implement a0:31:48
database in Clojure that's all there's not like a pile of stuff one atom is wrapped around the database it's got a0:31:58
pointer to that memory index so this memory index is that persistent sorted set a set of persistent sorted sets and it's got a pointer to a hierarchical0:32:08
cache that's backed with you know on cache miss with IO to the storage system and that's again in parallel so there's0:32:17
going to be EAVT up here EAVT down here AEVT AEVT and everything else the novelty is accumulating in memory and everything that happened before is0:32:28
accumulating here so periodically we'll flush the novelty and it will be here this will empty out and all of this is immutable this is immutable that's0:32:37
immutable this is immutable this is the only mutable thing right this atom we're going to swap it occasionally to new stuff so you have this mirrored set of0:32:46
things a memory version and a disk version and the disk is ultimately backed by storage I'll describe a little bit0:32:56
how you go out for that so that's sort of the storage side the other part of the system is the process side right how do we send change across and fundamentally0:33:06
process is done with assertions and retractions but assertions and retractions are insufficient to do transformation right if I want to say I want to add $10 to your account balance0:33:15
if I just assert you know if your balance was a thousand and I said a thousand and ten that's potentially a race condition with somebody else who's0:33:24
trying to adjust your balance so you really want to have functional transformation of values in the database and the way we do that is with transaction functions a transaction0:33:34
function is a function of the value of the database and some arguments and what it produces is other transaction data alright so if a0:33:45
transaction is assertions and retractions and maybe calls to transaction functions then a transaction function can return assertions retractions and maybe calls to other0:33:54
transaction functions right so that's what it looks like a set of these things and what happens is transaction functions get called and their results0:34:05
get spliced into the transaction so if I said assert assert retract call foo on something you know let's just read this as adjust account balance for Fred0:34:15
retract assert and then this turned into a two step process that was you know do this and then do that and those eventually expand eventually it's going to bottom out on all0:34:24
assertions and retractions and for Clojure users what does this smell like macro expansion right it's just0:34:33
like macro expansion but it's a beautiful thing for a data model because it really makes sense there's ground data which is assertions and retractions0:34:42
and functional transformations either expand to other transformations or eventually it all returns to ground and it's only this set of final0:34:51
assertions and retractions that ends up in the database and that gets broadcast out to the peers so the transactor accepts transactions0:35:00
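The macro-expansion-like behaviour of transaction functions described above can be sketched as a small recursive expander. This is a hedged illustration in Python rather than Datomic's actual mechanism: the operation tuples, the function names `adjust_balance` and `transfer`, and the dict standing in for the database value are all assumptions made for the example.

```python
def expand(tx_data, db, fns):
    """Expand calls to transaction functions until only ground data remains."""
    ground = []
    for op in tx_data:
        if op[0] in ("assert", "retract"):
            ground.append(op)                 # already ground data
        else:
            name, *args = op
            produced = fns[name](db, *args)   # a function of the db value and args
            ground.extend(expand(produced, db, fns))  # results are spliced in
    return ground


def adjust_balance(db, entity, delta):
    """Read the current balance from the db value, emit retract + assert."""
    old = db[(entity, "balance")]
    return [
        ("retract", entity, "balance", old),
        ("assert", entity, "balance", old + delta),
    ]


def transfer(db, src, dst, amount):
    """Expands to calls to another transaction function, not to ground data."""
    return [
        ("adjust-balance", src, -amount),
        ("adjust-balance", dst, amount),
    ]


fns = {"adjust-balance": adjust_balance, "transfer": transfer}
db = {("fred", "balance"): 1000, ("sally", "balance"): 50}

result = expand(
    [("assert", "sally", "likes", "pizza"), ("transfer", "fred", "sally", 10)],
    db,
    fns,
)
```

Because the expansion runs against one database value inside the serialized transaction, the read of the old balance and the write of the new one cannot interleave with another writer, which is the race the plain assertion could not avoid.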
it serializes transactions right there's research done at Yale that proved it's much faster to just serialize transactions and do it all in memory than it is to have a complicated0:35:10
scheme for trying to figure out who's overlapping with whom and how queries and transactions interact with each other it's actually the same research0:35:19
that is behind VoltDB which takes a similar approach but VoltDB still combines reads and writes where we only use this for writes so we0:35:28
accept transactions transactions need to be expanded right through the process I just described they have to be applied that may or may not succeed that's all done truly functionally inside the0:35:38
transactor right so if the transactor has any problems doing any of that job the prior value is untouched it's unaffected it was done as a functional transformation there's no0:35:47
undoing and no complicated rollback at the point it's got an acceptable transaction it needs to log it at that point the transaction happened right so0:35:57
it's been made durable and that will then broadcast the change to everybody who's connected and obviously to the person who issued the transaction so they know it's succeeded the other thing the0:36:07
transactor does is index in the background I've talked about that a little bit and indexing itself creates garbage I said there will be garbage on the slide yesterday so using storage0:36:17
immutably and using it to represent persistent data structures creates garbage but the beautiful thing about this garbage is it's not interleaved in0:36:26
the middle of another file right what happens when we make a new index is some nodes of the other tree are now no longer referenced from the new tree so0:36:35
we just say we don't care about them we delete them wholesale there's no merging there's no recreating a whole big master thing again and some of the big long rewrites you know associated with things0:36:44
like Cassandra for instance they go away because we're not taking the BigTable approach to merging inside the transactor implementation we use a bunch of0:36:54
interop stuff so we use HornetQ to do the communications for transaction communication the internal structure of the transactor is highly pipelined so0:37:04
again I talked about sort of programming with data and the ability to use queues to pipeline your architecture internally that's what happens inside the transactor implementation there's a lot of0:37:13
pipelining so what's interesting is you say oh could you make the transactor faster if you did transactions in parallel the0:37:22
answer is no but it doesn't mean you want to have your transactor use only one core of the transactor machine so how do you get back leverage from having the cores you pipeline your architecture0:37:33
you say ok there'll be a thread that's just expanding transactions a thread that's applying transactions a thread that's taking applied transactions and compressing0:37:42
them and then a thread that's doing the i/o and a thread that's doing the acknowledgment now you're using all the cores in your box efficiently where0:37:51
you're still doing what's effectively a serial process with just a lot of pipelining inside it and all that stuff's done asynchronously like that and then of course the other nice thing0:38:00
about being Clojure is you know all of these storage APIs we want to use all the storage engines we want to use have Java APIs so we just use them indexing0:38:11
itself makes extensive use of laziness so the indexing job has got to walk through the tree in memory and find the0:38:20
nodes in the tree that's in storage that need to be replaced because they have data merged into them and that process0:38:29
is done in parallel it uses pmap because it's dealing with a data set that's bigger than what fits in memory but we want to have parallelism also it does a lot of parallel i/o to the0:38:40
storage engines because we expect storage engines like DynamoDB to be highly parallelized so we don't wanna have a serial approach to interacting with the storage engine so again there's a ton0:38:50
of parallelism in the indexing job but it all uses ordinary Clojure stuff the one interesting point here is you know I told you the database was a single atom0:38:59
so if you think about the fact that the indexing job its role is to take what's in memory merge it into disk0:39:09
the result of that operation should be you don't need to have it in memory anymore but while the indexing job was going on more transactions came in so you're going to have this point in time where you're finished indexing you may0:39:20
have accumulated some more novelty but you need to now say of that novelty I only care about the stuff that happened since I started the indexing job rather0:39:29
than have a contention issue coming back from indexing we just put the indexing results into the input queue which is serially processed and so you're0:39:41
going to process transactions see the results of an indexing job do the what we call the acceptance of that which will nuke the fact that you don't care about this memory index0:39:51
and not have any contention so we can still do it with one atom so I think the critical thing for Clojure0:40:00
people is you know Clojure has these state constructs it's really great that they have really good semantics you know they're meant to be used sparingly and this shows you how0:40:11
tiny a bit of it you need for a very large system on the peer side right one of the things we've said is we're gonna have declarative programs0:40:21
we have an embedded Datalog it's a new language essentially that I had to write which was beautiful and fun to write in Clojure it takes data sources0:40:31
and rule sets as arguments so if you're used to say Cascalog it sort of has an ambient notion it actually uses0:40:40
you know being embedded in Clojure and name resolution in Clojure to sort of figure out what the data sources are this is callable from Java so everything is explicit you know if0:40:50
you have a data source you want to operate on you pass it if you have a rule set you want to use in the Datalog query you're going to pass it and it's been extended to work with your own collections and it calls your own code0:41:00
inside the implementation it's all data driven right so queries are data rule sets are data query results are data0:41:10
transactions are data and by that I mean java.util collections java.util.List java.util.Map now obviously Clojure collections are those things so using0:41:19
this from Clojure is a breeze but Java people can also program against this without using strings they can write programs that write queries and so forth0:41:28
because the interface is actually defined in terms of data structures that should be familiar the implementation if you know anything about Datalog there are sort of two fundamental ways to go about0:41:38
doing Datalog implementation one is query-subquery recursive and the other one is I'm blanking magic sets0:41:49
so query-subquery recursive is actually a great fit for Clojure and in particular because it's dynamic but the big advantage so how many people know about0:41:58
core.logic in Clojure the big advantage over something like core.logic or Prolog is the semantics of those things are result at a time or tuple at a time0:42:07
the semantics of Datalog are set at a time that means that underneath the hood inside queries that run here entire sets0:42:17
are being merged and joined with the things that you expect of a query engine like a hash join a true hash join right so I have M and N instead of doing M times0:42:27
N lookups I'm going to go and take N put it into a hash table on whatever the join characteristic is and now I have M plus N complexity that kind of thing which you never write yourself by0:42:37
hand in your applications you want from a database engine and it's possible in Datalog but not in the semantics of Prolog so it does that and therefore is0:42:47
really fast it leverages the indexes when one of the components is the database itself and when it encounters expressions so you can embed expressions0:42:57
in queries it uses the Clojure compiler it calls you know eval not for interpretation reasons but it evals a0:43:06
function definition once and then caches the result so it can make a true function call you know every iteration so it's not interpreting but it does use the Clojure compiler at runtime and we0:43:17
cache all the transformation stages so I talked about the difference between over here and over there with a server we have in the peers direct0:43:27
access to the storage server so we're actually going to embed in the peers the library for communicating with you know Dynamo or SQL or whatever your back-end is they have this query engine they have that database I described0:43:38
right the combination of the live index in memory and a storage backed index being live merged that happens in the transactor for transaction support0:43:48
it happens in the peers for query support the exact same thing the exact same components are present in the peer to represent the database the peers also0:43:57
have a caching scheme built in obviously the data set could be huge right it could be arbitrarily large but all any particular application server cares0:44:06
about is its working set but it doesn't have to store the whole database it doesn't synchronize the whole database does have to keep up with the whole database right but what it does need is0:44:15
it needs when it goes over to storage to remember this is part of my working set so we have two level caching the first level caches the raw segments0:44:25
out of storage which are actually you know a binary format that's been compressed and very efficient you know thousands and thousands of datoms per segment and we0:44:35
keep that in a on or off heap cache and then the higher tier is an actual object you know Java objects on the heap which is what you need when you finally want0:44:44
to evaluate them from an implementation standpoint again we said we're going to use HornetQ for the transactions we use that we use Google Guava collections0:44:53
and well Google Guava caches to do the caching stuff we found it to be okay but has some overheads we'd like to get rid of obviously we use the Java API for0:45:04
storage then the other thing we present to application programmers a different way of looking at the datoms is as entities and entities feel like maps so0:45:13
you can say of the database you know get me Fred and what ends up happening you should get back something that looks like a map you can ask for its keys you can you know do get and keyword lookup0:45:23
on it but it is effectively a multi map because when you look at Datomic because we have multivalued attributes you can say I like pizza and0:45:32
I like ice cream and I like whatever if the like attribute is multivalued the maps that result from this are also what are called multi maps one key can map to0:45:41
more than one value when it is a multi valued key the value you get back you can consider like a set of values so0:45:51
that's kind of an interesting thing it's very useful it makes it extremely easy to use from Clojure but also it's just map-like interactions to Java one of the neat0:46:01
things that we have though is the reverse attributes so we talked about there being this inverse index so we can point backwards and that you can't do with0:46:10
Clojure maps obviously because the map knows what its things are but because this is actually a set underneath it we can go backwards so0:46:21
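The entity-as-multimap idea and the reverse lookup described above can be sketched roughly as follows. This is a toy illustration in Python, not Datomic's API: the `FactStore` class, its method names, and the sample facts are all assumptions made up for the example. It only shows why a set of (entity, attribute, value) facts can be read in both directions, where a plain map only knows what it points to.

```python
from collections import defaultdict


class FactStore:
    """Toy set of (entity, attribute, value) facts. Hypothetical, not Datomic's API."""

    def __init__(self):
        self.facts = set()

    def add(self, entity, attribute, value):
        self.facts.add((entity, attribute, value))

    def entity(self, entity):
        """Map-like view of one entity: each attribute maps to the set of its values."""
        view = defaultdict(set)
        for e, a, v in self.facts:
            if e == entity:
                view[a].add(v)
        return dict(view)

    def reverse(self, attribute, value):
        """Go backwards: which entities have this value for this attribute?"""
        return {e for e, a, v in self.facts if a == attribute and v == value}


db = FactStore()
db.add("fred", "likes", "pizza")
db.add("fred", "likes", "ice cream")
db.add("ethel", "likes", "pizza")

fred = db.entity("fred")                  # multimap-style view of Fred
pizza_fans = db.reverse("likes", "pizza")  # reverse lookup over the same facts
```

A real system would use sorted indexes rather than scanning every fact, but the shape of the two lookups is the same.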
what are the consistency and scale characteristics of this well obviously the writes go through the transactor right so that has a very traditional model of scaling fortunately when you0:46:32
take a transactor and you remove all the concurrency stuff and all of the need to service reads and queries you can0:46:41
accomplish a huge amount of work with one box and that is the scope of the kinds of systems for which Datomic would be suitable if you need arbitrary0:46:50
read/write scaling it's not the right system but people that are choosing this are choosing it because they want ACID and they want transactions and they want0:46:59
queries and all the other things that they would otherwise have to give up if they did that and you you would make that highly available in a traditional manner with a standby machine which we0:47:09
support and then the immutability is really the key to these consistent reads right by using storage immutably you know you can cache relentlessly that whole notion um you0:47:18
could have a CDN for a database you could actually use a CDN for Datomic and it would completely totally work you could use HTTP caching for Datomic segments0:47:27
because they never get changed and because you have a basis for deciding whether or not um that is the latest those are the problems it sort of solves and then if you wanted to scale reads you0:47:39
can you know obviously have more peers you get more query and you can scale reads depending on the storage that you choose so storage like DynamoDB really it has a knob unfortunately when you0:47:50
turn it it costs more money but it's still really cool to have a knob and that's what you want and the query scales with the peers the testing story0:48:01
is is really interesting probably a whole independent talk which Stu should do at some point because he's in charge here but I'm just you know so you know0:48:11
test.generative was born inside Datomic it's what we use I found one of the most interesting things about the prior talk was talking about the value0:48:21
of tests in terms of information theory right so if you write a test you know it will always work how much information is being generated by that test exceeding0:48:30
none right that's a really really important point it doesn't take away from the value of that test as a regression barricade but generative0:48:41
testing is really good for figuring out if you got it right in the first place and because it is it is generative and if you missed the talk that was awesome0:48:52
no it was great really great so we do that and we do a lot of functional testing we do not do a lot of the unit testing where you say this should obviously do this I hope it does it0:49:02
forever and then at the higher level we do simulation based testing which is again really interesting but I want to summarize and have some time for questions so the last couple of slides0:49:13
the first thing I'd like to just talk about is the fact that being a simple system and using Clojure the way we have was a definite source of agility I don't0:49:24
think this these two things get connected enough but it was another critical thing that was in the last talk right he talked about margin right and0:49:33
you know if you increase your capacity you can deal with more variability right but then he said don't stop there in the very next slide the very next slide was an argument for0:49:43
simplicity in software it was a big involved slide but the point of it was your degree of architectural independence is going to dramatically0:49:52
improve your ability to deal with variability which is what we consider agility to be can you do something when things change without redoing them0:50:02
without rework if you can ever get his slide deck the slide after the leverage slide where he talked about capacity he said the other thing to mitigate0:50:11
variability in your process is isolation of components architectural isolation of components this is the simplicity0:50:20
argument I've been making the two things are connected I was so happy to see that I wished he had said the word simple somewhere so how do we get agility0:50:29
right one of the things is that the these subsystems are defined in terms of protocols and the protocols are really really small like seven entry points is0:50:39
the biggest protocol the protocol for storage is three functions three functions so we support a whole bunch of0:50:48
back-end things right we support memory we support SQL embedded we support Postgres and SQL Server and stuff like that we support Infinispan0:50:57
and DynamoDB and we can probably add others right we did not know about DynamoDB we were not on the beta we did not know anything about it it came out in January or whatever right two weeks0:51:08
after it came out we had taken out my own homemade version of Dynamo and swapped in DynamoDB and changed our business model in two weeks I mean it's a0:51:20
huge architectural change to the system we were running our own clusters and everything like that just vanished right very straightforward two weeks that took0:51:30
supporting something like Postgres or Infinispan was a one day job one day you got a new back end so I think there's a lot of power in that0:51:41
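The point about a three-function storage protocol making new back ends a one day job can be sketched like this. It is a minimal illustration in Python rather than Clojure, and the talk does not name the three functions, so `get`, `put`, and `delete` are assumptions, as are the `Storage`, `MemoryStorage`, `SqliteStorage`, and `round_trip` names. What it shows is that code written against the small protocol does not change when the back end is swapped.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Storage(ABC):
    """Hypothetical three-function storage protocol (function names are assumptions)."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class MemoryStorage(Storage):
    """Back end held in a plain dict, handy for tests."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class SqliteStorage(Storage):
    """Back end on an embedded SQL database."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")

    def get(self, key):
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return None if row is None else bytes(row[0])

    def put(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        self._conn.commit()

    def delete(self, key):
        self._conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        self._conn.commit()


def round_trip(storage: Storage, key: str, payload: bytes) -> Optional[bytes]:
    """Caller code sees only the protocol, never the concrete back end."""
    storage.put(key, payload)
    return storage.get(key)
```

Because callers depend only on the three entry points, adding a back end means writing one small class and nothing else, which is the property the talk credits for the quick swaps.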
in terms of leveraging Clojure all the traditional leverage points that you see you know in Lisps and now more specifically in Clojure right did we use print and read on Clojure data relentlessly0:51:51
because it's a cheap way to get serialization absolutely it's awesome it's brilliant it works you don't have to think about it do it0:52:00
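In Clojure the print and read pair is `pr-str` together with a reader such as `clojure.edn/read-string`. A rough Python analogue, offered only as an illustration of the same idea and not as what Datomic does, is `repr` paired with `ast.literal_eval`, which reads back literals without executing code. The sample data is made up.

```python
import ast

# Plain data: maps, lists, tuples, strings, numbers.
record = {
    "name": "fred",
    "likes": ["pizza", "ice cream"],
    "age": 42,
    "ratio": 0.5,
    "tags": ("a", "b"),
}

text = repr(record)                # print: data to text
restored = ast.literal_eval(text)  # read: text back to data, literals only
```

The serialization comes for free because the program's data is already made of printable, readable literals. That is the cheapness being pointed at.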
you know if you don't if you don't consider doing that already in your programs just do it is fantastic having an embedded language is another sort of characteristic of you know when0:52:10
applications get large enough right you're gonna need an embedded Lisp well if you start with an embedded Lisp you're ready to go runtime compilation was just there right when you have0:52:19
Clojure you have a runtime compiler if you want to have a language that compiles at runtime you just sit on my back it's there right up I'll carry0:52:32
extending standard interfaces of Java and protocols of Clojure another big leverage point you don't have to go all off on your own right when you start writing your own stuff0:52:42
you should always think about can I extend one of the standard interfaces or protocols because then I can just plug into all the other algorithmic stuff that's sitting around you should always try to do that always seek opportunities0:52:52
to do that and of course using things like defrecord automatically makes you play in a whole bunch of things but when you're doing something more specific you should always consider that oh I'm doing a vector like0:53:01
thing should you support nth should you support Indexed you should if you do you're going to get some benefits out of doing that obviously we used interop you know extensively and that paid off0:53:12
for us and you can see how we extended the state model so in summary I think Clojure was made for this kind of app it's not surprising but you know Clojure0:53:21
wasn't made with this app in mind but this category of application it needs to be very fast it's it's a large system but no part of0:53:32
it is large there is a ton of concurrency if that wasn't evident there's a ton of concurrency we never we never ever write so we never sweat about0:53:42
concurrency never I mean it's never one of our problems because we just it's everything is immutable and when we need some some coordination we use one of the0:53:52
constructs you know like atoms to make that straightforward we definitely leverage certain interop things as the negative thing I would say embedding Clojure as a library is still not great0:54:03
the startup time that we sort of amortize on the server we now have to pass on to our customers who are going to consume the peer library of course most0:54:12
of their consuming applications are themselves servers but it's still something I'd like to improve but the net result I think of implementing Datomic in Clojure and following the0:54:22
Clojure principles in the implementation of Datomic is the resulting application was simple and I think the same benefits are available to any application of a0:54:31
similar size written in this way and any questions we have time yes no not at the0:54:43
present time right now we need to do that and it's0:54:54
mostly just because I'm not I'm not sure I would consider that part baked yet as I add more backends I refine how that looks and so publishing it would be sort0:55:05
of pouring concrete on it it might be premature but certainly anybody who wants a back end supported should talk to us we're very interested in doing it and obviously we've already spoken about0:55:14
Couchbase I think it's logical yeah you mentioned that protocols are all small examples so the protocol above0:55:27
storage is the cluster protocol which is the more involved one when you talk about I talked earlier about that state model of Clojure state0:55:37
model having values and refs and pods that protocol is defined in terms of the storage protocol but it is a wider0:55:47
protocol that has seven entry points that is the biggest one that's the big that's the big kahuna the cluster protocol which has I mean actually it could be divided into two separate ones0:55:57
and then and each one would be three and four but that that's as big as that gets other critical entities sort of your your first four did you evolve your way0:56:11
so I did the cluster protocol first so I worked on a state model very very hard in fact the state model was built so that it would comply with HTTP semantics0:56:20
even though we weren't necessarily going to implement it that way but I tried to make that work and that's how I ended up with that seven entry point thing and then we implemented a Dynamo0:56:29
cluster and it was our only intended storage engine so it was an implementation of that protocol but even though we really had one there was a protocol for it and then the DynamoDB0:56:39
thing came up and I was like oh boy I'm glad I put this behind a protocol and we swapped DynamoDB in but at the point of time I was doing DynamoDB I realized I0:56:48
needed less of it than this first protocol did so it was a refactoring job to create that storage protocol it didn't change this one but it put0:56:57
another layer in and then that one is the one that's really trivial to make implementations of so that'd be one example of sort of the evolution in terms of changing stuff I0:57:08
spend a lot more time before I start so I don't because I don't really like hashing around on it as Stu knows he's always waiting for me to get off the hammock and put some code in it's a0:57:20
cluster right because when we we only had the distributed storage system I was writing about testing and I think I said him off because I said in a mocker step0:57:30
or something like that is you know his hair still in it and he came back the next day and this is a great example of Clojure protocols he had taken the cluster protocol and extended it to0:57:40
ConcurrentHashMap in Java which gave you a conforming implementation of the entire stack that ran in memory right which is how which is how I do a0:57:49
lot of the small localized testing right because you don't ever have to you know mock or stub for performance reasons with this thing because you can use the the whole stack thing that's just backed0:57:59
by in-memory collections and that was a trivial job to do with Clojure protocols that would have been potentially quite tricky with a different implementation right there are protocols around the0:58:09
protocols or interfaces around the datoms around the peer what appears to the peer to be a peer so we can swap peers out that actually don't have any of the same infrastructure behind them0:58:18
at all but satisfy the same same communications I think it's critically important you should always put a protocol or an interface between any two0:58:27
things in your system if nothing else and if you do data-driven programming you can also then put a queue in or put a wire in between two things and those two architectural guidelines0:58:37
will solve like 90% of your problems Yallah pistol something like that you0:58:48
know we just saw the the benchmarking thing yesterday with the Yahoo whatever and we intend to implement that so we can show some comparable things but we0:58:57
haven't done Java pet store or anything though there's a small small tampon website that uses the Seattle yeah but0:59:06
it's not it's not a point it's not a point of comparison with something else though yes we we don't yet now that so0:59:26
the time that's on the transaction is what were your two terms straight a technical time right so to the time0:59:36
right so the time on transactions is technical time but the thing is transactions are first-class so you can make assertions of transactions so if you want to assert an attribute of a0:59:46
transaction which is its business time you can do that but the granularity you have for that is the transaction level not the datom level otherwise datoms0:59:55
become enormous but so that's a business time to do well that what I'm saying is if you add the business time it depending on what you need to attach1:00:04
business time to if the transaction is is coordinated with that you make an attribute transaction it's very efficient you could have added a thousand things in the transaction you1:00:13
can get from those facts to the transaction and then to the business time very efficiently or any other fact about the transaction business time business user business process you know1:00:24
business approval put them on the transaction so it wouldn't be hot I'm sorry do you know some transaction does it has1:00:34
natural limitations that say if you do pipelining on 20 cores do you have numbers to show us that this is not the bottleneck it is the bottleneck it is the bottleneck no1:00:47
because nothing is infinite that's why it's not a problem right if you need arbitrary write scaling this is not the system for you if you're like 99% of the1:00:57
businesses that could not saturate one box with the amount of novelty in your system this was a good fit it's that simple but trying to make a universal1:01:06
system that can handle infinite it means dropping a whole bunch of value and the point of Datomic is I'm tired of seeing that value dropped I want that value I know many many many businesses that want1:01:16
to leverage that value and giving it up is a bad idea for those businesses it's a bad choice saying I want to possibly maybe one day support infinity is a bad1:01:26
choice for most businesses0:00:00
The Language of the System - Rich Hickey
0:00:00
Thanks this is the third Conj and the0:00:09
fifth year of Clojure being a public thing and I couldn't be happier to see everybody here and a lot of good old friends and new friends and so excited0:00:19
about the vibrancy in the community and obviously the creativity of everybody involved so congratulations on what0:00:28
you're accomplishing now what I've been accomplishing is is something I call TBD and I'm a little bit frustrated because0:00:38
my my thing leaked you know it's like one of those Apple Apple Keynote so TBD0:00:49
what does it mean To Better Do and that0:01:02
should have a little trademark a trademark so to better do is a is a new massively parallel concurrent AI driven0:01:15
to-do list application and and our trademark is putting the personal back in P Mac0:01:29
that's all I have there'll be a github repo tomorrow with nothing in it and that will probably although it will ever be no so today I'd like to talk about0:01:40
the language of the system which is a which is a title it may not convey anything in particular but hopefully it will make some sense by the end so one0:01:51
of the things I think happens to us all especially as enthusiasts of languages and some people use their language like it's just a tool or whatever and then you find something that0:02:00
you really like and you become enthusiastic about it and you look forward to enhancing it or making libraries for it or making things to interconnect with other things and you0:02:11
you sort of define your world synonymously with the world that's implied by your programming language and it's impossible to avoid this right0:02:21
because the semantics of a language they eventually you know pervade your brain we say things in these conferences that you know people from outside the Clojure community would be like how come you can say0:02:31
that nobody says aw yeah you know it's all that it's all data you know it's all the data oh yeah I I know it is0:02:42
I hear you so a programming language defines the world and I'm going to say language here and I really mostly mean sort of the language and the0:02:52
corresponding runtime because we have a lot of languages at the bottom the primitives are kind of the same there's control flow and things like that and the runtime sort of enhances that with a bunch of other things but but we0:03:03
get involved in this programming language as the world and then of course if it's a functional language like Clojure we get even more involved with wow this functional part this is the0:03:13
good world this is the world I really want to live in and everything else is sort of like the ick so I have the good world and we want to minimize the ick0:03:22
you know and we call it IO or something like that and by painting it as IO we almost sort of like would like to make0:03:31
it somebody else's problem and like Haskell is really good at this you know it's like there's a monad it's like stay out you know it stays over there we don't really force that but by by0:03:40
convention and discipline we try to do that but it's important to note that you know that's never been Clojure's approach to imagine that that part of your application was not important I mean the0:03:50
whole existence of the state model is there because you know actual programs need to do interactions with the world need to affect the world if you're not0:03:59
affecting the world I don't know why you're writing software so really really is important so if we look at what constitutes a language and again sort of0:04:08
language plus runtime we get all of these facilities and this is in no particular order but some of the things that that really matter when we start talking about the bigger picture as being either present or missing or the0:04:19
analogies either hold or don't are things like a memory model right so we have this presumption in Java maybe in Clojure you're isolated from this but as the author of Clojure0:04:29
as author of the primitives that guard state and memory transitions the existence of a memory model in Java is super critical it's a big big promise0:04:39
and you know the fact that it's present it's true for all libraries written in Clojure or not that are running in the same runtime that that's based upon you know0:04:48
a resource management structure like garbage collector that's shared is a gigantic suite of facilities that's common both to your your language other0:04:59
things written in the same language and things written in other languages calling conventions this may be who even knows what a calling convention is anymore C programs remember calling0:05:08
convention because you had all these choices right and maybe maybe maybe even in the absence of you know who's pushing what at the stack level we still have0:05:17
sort of conventions around deciding whether we pass values or references even in Java though that's sort of disappearing but that would be one one aspect of it resource management like I0:05:27
said mostly in in the memory space we know eventually the runtimes and the languages start not helping us anymore with resources outside of memory there's0:05:37
all kinds of coordination right we have monitors we have volatile and things like that to interact with the memory model that help us coordinate things and0:05:47
again that's sort of embodied in the primitives in Clojure right swap! and things like that are coordination primitives that rely on coordination primitives down underneath and of course0:05:58
probably the biggest things that we derive from languages as we touch them that are that are more fun I mean again this there are the primitives for control flow and whatnot are any of the0:06:07
tools for abstraction and or type stuff and of course some languages emphasize this more than others and Clojure probably does not emphasize it nearly0:06:16
as much as some others so that's what we talk about when we talk about programming language and typically language when we talk about system we're0:06:26
talking about something bigger bigger than a program in particular I'm talking about something bigger than a program so the definition of system is is the roots0:06:36
of it are in stands together and by that I think the interpretation I would take is that you know one leg of the stool is not particularly useful thing0:06:46
and a stool with two legs is dangerous but you know when you compose enough of the pieces you end up with something that performs something a useful a0:06:55
useful function and it's actually these systems that most of us deliver how many people how many people have a main product of their effort that is a single0:07:05
program that doesn't interact with any other programs how many people think mostly what they do is build systems or parts of systems right so we do that but0:07:17
the programming language is pretty much stopped before the system in other words a system is this composition of things whose language doesn't know anything0:07:26
about systems it doesn't say anything about systems this ensemble of programs of course there's lots of ways to build systems and I'm going to try to narrow the scope of that because in the old0:07:36
days you just any two programs could talk to each other any particular way and you know that's a system and it is still a system I think over time we've0:07:45
gotten more disciplined about how we build systems and now we tend to think of systems as compositions of programs that offer services to other programs and that's an analogy we can draw out of0:07:55
what we do inside programming languages right you can get libraries that give you services as you consume the library and then you know in the process space you have services that you can call and0:08:04
they have certain APIs and you call them and that's what happens but there there are many things about system that are very different in particular there's no global supervision anymore0:08:15
a lot of what we get inside the language is not there right there's no global resource manager there's nothing watching everything there's nothing that knows everything that's going on could0:08:24
be more than one process in the same box it could be more boxes there's no like person in charge of the internet making sure everything is okay and and the0:08:36
question is how do we connect these how do we connect these pieces and the premise of this talk is that there's a way to talk about the way we connect0:08:45
these pieces that draws analogies to the way we talk about how we connect pieces inside programming languages and it both informs the design of systems and I0:08:55
think goes the other way and systems should help inform the design of languages of the use of languages so when we say language what do we mean the0:09:04
root again is tongue it's obviously about communication right but everybody knows you know the old saw about programming is you know you think it's about talking to the Machine and in a0:09:15
certain sense it is but it's certainly also about talking to other programmers right so you write a program the other programmer could be you write later ten years later you look at your codes like wow who said who said that but I think0:09:27
it does split out a little bit right so I think in all cases all programming language and all the use of language around talked about is somehow about programs talking to programs as0:09:38
programmers talking to programmers but inside a programming language this is also the other aspect which is the programmer talking to the machine do machine make this happen do this stuff0:09:48
but that a very interesting different characteristic of the communication that occurs between programs in a system is that the language that's used there is0:09:58
a language for programs to talk to programs almost definitely it's extremely rare to see the interface on a0:10:07
service be one that's oriented towards people or at least oriented towards people and human interaction fundamentally it's fundamentally0:10:16
oriented towards a program talking to a program and that's going to become really important as we move forward so one way to think about these two these0:10:25
two things is as stacks stacks of specificity and hierarchy and and encapsulation so at the bottom of a programming language is a bunch of0:10:34
primitives language primitives for control flow for memory acquisition and things like that then on top of that we have core runtime facilities and core0:10:43
libraries and/or libraries from third parties and then finally we build our application libraries and our applications on top of that that's sort0:10:53
of all inside the program inside the program view if we look at systems I think it's a little bit harder to sort of tease out what are the what are the0:11:02
primitives of systems but certainly if you start with the communication side you end up with two very evident pieces to the language of systems right one is are the protocols right UDP0:11:15
TCP HTTP WebSockets all these things right sort of the negotiated transfer primitives that we have and the other0:11:25
are the formats what do we say over these protocols and I think that's pretty evident and straightforward although I will talk more about formats0:11:36
but not at all anymore about protocols then the analogy to the next level up though I think is an area where we're particularly weak in in having good0:11:46
language for it that's where the focus this talk is going to be and finally somehow at the top we ended up with either portions of applications or entire applications acting as services0:11:56
and or consuming each other as services and that's a system of course is it there's a joining here because those things that are the applications on the0:12:06
right we're written using the stack on the left but the stack on the Left doesn't have a lot to say usually doesn't have a lot to say about the0:12:15
stack on the right so the first thing we have to talk about is say what again we talked about protocols and formats but formats are huge right how many0:12:25
different ways we have to talk over these wires what are we sending XML JSON is probably the big winner right now protocol buffers and then of course it's0:12:36
quite common in this room would be edn and Clojure data but there's also Avro and Hessian and BERT how many know what all of these things are not too many how0:12:46
many of those people could make a matrix as to why one is better or different than another and yet you know0:12:55
this is actually pretty important right this is what we're going to be saying from one process to another it's a huge thing and it's full of decision points I0:13:07
think one of the things that's really cool about it is all of these things are representations of data what's not up here what key Java technology for things0:13:16
talking to other things is not here well that's not really what in this yeah with RMI right RMI yeah big winner0:13:26
how about DCOM CORBA anybody okay these are not even on this list right they all0:13:35
lost they all lost for really good reasons so we're not going to talk about that we've already reached a point where every single one of these choices is a0:13:44
data format it's so already we've got this great premise the way services are going to talk to each other is by conveying data not through some hyper0:13:53
linguistic or extended linguistic thing where there's all these extended verbs and there's a notion of a program object being on a different machine and things like that we're just going to talk with0:14:02
data so we have to talk we have to split out what about the data is good or bad what are the decision points one is extensibility right given this format if0:14:13
I have a new thing to say to you tomorrow is there a way for me to encode that if there's not it's not extensible which of these things on the list is not0:14:22
extensible JSON there you go that's not really that's really not good and at0:14:32
least a couple of problems we'll get to later and there's two notions of extensibility one is to new types the other's to new versions right so there's0:14:41
a sense in which for instance protocol buffers are really mostly about being extensible to new versions you can make things go to new types but an existing consumer can't be really aware of those0:14:50
but they can be tolerant of new versions self-describing which of these things is self describing XML kinda-sorta0:15:03
what else not protocol buffers Avro edn Hessian and BERT and Erlang0:15:14
transfer which is what BERT is a flavor of what does that mean to be self describing it means that if I have a0:15:23
decoder that understands the rules of the format I can read anything that you said and I don't need to know anything else out-of-band I don't have to get a0:15:32
description any other way that's not true of protocol buffers right so if I start streaming you protocol buffer stuff it's like gobbledygook if you've never seen the schema and where is the schema in the protocol buffer0:15:42
stream it's not in the stream it must be transmitted out-of-band so we get to this other part which is schemas0:15:51
in or out of band of the ones that are self describing one of them has schemas which is that well that's optional0:16:00
though but one has the one that's required for reading them now Avro protocol buffers or Avro Avro has a0:16:09
trailered schema thing so you have this question of the schemas in or out of band Avro has schemas protocol buffers has schemas Avro's are in band0:16:19
protocol buffers' are out of band but both of those have more requirements on the schema interpretation than something like edn or XML of course XML you can0:16:30
definitely read it you may not understand it you can read it without anything if you have schemas they're sort of optional why does why does it0:16:39
matter whether or not schemas are in or out of band it's on the slide if you have schemas what can't you have if you have out-of-band schemas0:16:49
what can't you have you can't have these things generic processors and intermediaries it's really interesting that Google came up with protocol buffers0:16:58
imagine if the internet was built with protocol buffers how good would Google search be it would be bad right because0:17:08
they're in the intermediary business they are taking advantage of the fact that any HTTP HTML processor can read any HTML right if everything was a negotiated contract it just simply0:17:18
wouldn't work so you really have to understand it's not to say the protocol buffers are bad I'm not saying that right but what I'm saying is that there's a spectrum of choice0:17:27
and-and-and-and trade-offs that's really important here it's as important as choosing a language when you pick your programming language but picking any programming language now leaves you with0:17:37
this decision when you move up to the system level of course a lot of times this is not your choice right you're consuming a service that somebody else has made a choice and I highlight sort0:17:46
of the next problem within this space which is that there's nobody in charge when you use a programming language the programming language kind of sorta says0:17:56
well we're all going to pass arguments like this and we're going to define our types like that and everything else and with no one in charge systems struggle against this set of independent0:18:05
decisions which may or may not compose and the formats problem is the first place this comes up so this schemas out-of-band is really tricky0:18:14
and that's one of the things where people are like oh JSON I can put dates in JSON right how do you put dates in JSON as strings and how do you know they're there out of band and you go back to0:18:29
the napkin right it's like if the key has the word date in it then the string is a date there we go and so0:18:40
there's another aspect of that which is that's not merely out-of-band right if you get a protocol buffer schema out-of-band like it's not a napkin right it's very straightforward0:18:50
people's use of JSON is extremely context dependent and a lot of times that context is not captured anywhere except on a napkin it's like0:19:00
okay well we've all agreed to send this and like you know this is coming and therefore you're going to go to the you know last edited field and you happen to know that last edited is a string that has a date in it so that context0:19:11
sensitivity is really bad so obviously in this room we don't have to talk about the value of values we like values and I think the only thing to do here is0:19:21
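The in-band versus out-of-band distinction can be sketched in a few lines. This is an illustration only, not anything shown in the talk: JSON stands in for a self-describing format, and a packed binary layout stands in for a format whose schema travels separately. The record, the `SCHEMA` string and both function names are made up for the example.

```python
import json
import struct

record = {"id": 7, "name": "widget", "price": 2.5}

# Self-describing: field names and basic types travel inside the message.
wire_json = json.dumps(record).encode("utf-8")

def generic_field_names(message):
    """A generic intermediary: reads any JSON object with no prior agreement."""
    return sorted(json.loads(message.decode("utf-8")))

# Schema out of band: the bytes carry values only, no names and no types.
# SCHEMA is a hypothetical agreement shared through some other channel:
# little-endian int32, 6-byte string, float64.
SCHEMA = "<i6sd"
wire_packed = struct.pack(
    SCHEMA, record["id"], record["name"].encode("utf-8"), record["price"]
)

def decode_packed(message, schema):
    """Only a consumer that already holds the schema can make sense of the bytes."""
    ident, name, price = struct.unpack(schema, message)
    return {"id": ident, "name": name.decode("utf-8"), "price": price}
```

A generic processor can call `generic_field_names(wire_json)` on a message it has never seen before, while `wire_packed` is just 18 opaque bytes to anyone who was not handed `SCHEMA` first.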
to sort of again think about the differences between programming languages and systems with values so we definitely have values in systems at least at one level on the0:19:30
wire right we just looked at all the popular formats for transmitting stuff they're all data formats they're all values right we're not really passing a reference to a guy that you're going to then call back on via an RMI interface to go0:19:41
get more stuff and have this big chattery communication with objects we just convey the data that we care about so that's fine those are ephemeral and0:19:50
they're usually nameless and in programming languages values are often nameless right we have the same notion we can pass values we get0:19:59
our value as a return from a function we just have it we start processing it I mean Java is not a particularly strong language for values because almost everything is a reference0:20:08
type but in languages that really have them as distinct things a lot of times values are completely anonymous you have an array of structs none of the structs have0:20:17
names if however you want to have a value in a system that is not ephemeral that means that either maybe it's large it's so large I don't0:20:27
want to put it on the wire and send it to a hundred people I want to put it somewhere and let the people know where it is or I want to have memory in a system I'm going to remember a value in0:20:37
both those cases you end up incurring a new thing which is that your values need to have names and that's a definite0:20:46
change versus your programming language it's one that really matters because until we start becoming more cognizant of when we're manipulating values and0:20:55
that this is the name that names a value we're going to keep making these icky messed up systems that don't distinguish references from values for instance how0:21:06
do you know when a link is a permalink you only know the link is a0:21:15
permalink because on the webpage where you got it it said this is a permalink and so when designing a system0:21:25
you need to be more considerate of this and call it out so that brings us back to names and again here we sort of have this difference right inside a program0:21:35
we have all these great scopes right I'm in a local scope I have a let nobody knows about this now I'm in a function I'm also sort of cool and this function is in a namespace that's also0:21:45
sort of great and then the namespace is on github and then what happens then we're all fighting for cool0:21:55
names on github or use all of the characters and all the stars and robots and you know names of food and so it's0:22:06
really critical once you lift up as a system right and nobody's in charge anymore what's true of most system names they're global I mean they're potentially0:22:18
global and you really need to think about that you really need to be considerate of the fact that as you start building systems as your names start escaping out of your processes0:22:27
that they are global names right and the really tedious things like Java's you know com dot whatever dot whatever that0:22:36
stuff matters right because what's com dot whatever where'd that come from somebody who's in charge right there's somebody in charge there in the0:22:45
absence of that it's a free-for-all and so those DNS names and whatnot become critical and using fully qualified namespace names that are truly0:22:56
global names is an important discipline for doing systems but it's also interesting to think about how different the names are0:23:05
what are most of your names in a program so say in a Clojure program most of your names 99% of your names are what they're0:23:15
one of two things right they're either locals or what the names of functions and we have a huge huge number of names0:23:25
dedicated to functions that's where most of our names go they're mostly verbs what happens in systems who likes to work with a0:23:36
system that has a ton of verbs that's really interesting right why is that there's all of these inversions as we get to0:23:46
systems aren't there right we have lots of names of verbs versus hardly any we have this global control versus we don't have global control and we are going to have a lot of names in systems that are going to be0:23:56
used for other things probably not verbs machines and things like that storage locations and then these values right are going to need names another critical thing so0:24:07
systems look like this every process has a number no obviously they don't this is me being lazy on Google images like what has circles and lines0:24:16
it's faster than me trying to learn how to do that in Keynote does anyone know how to make a line connect to a thing and stick like I moved the thing and the line is just sitting there0:24:25
dude can you make them connect no I can do the day I can do it there but then it's like two things and then there's the internet it has this picture so if0:24:35
you ignore the numbers the numbers are not important the numbers are not important but by law systems have this shape right it's fundamentally hierarchical it's not like everybody's0:24:44
calling everyone it's this big big nightmare right it's generally some things call other things call other things come back come back and there's some sharing across there maybe a couple lines across at a level and there may be0:24:54
one guy at the top you know from your perspective that's you who gets to consume all this stuff maybe they don't serve anybody else depends on how I'm situated but the critical thing here is0:25:04
that while each of these things in their bubble might make a ton of sense maybe they're written in Haskell and like it's proven that they're correct or something0:25:13
awesome right as soon as you start drawing lines between them what happens all sorts of new implications about what things mean have arisen have emerged0:25:24
from the connections of these things and it's different you know in some way from consuming libraries you might look at this and say well it's not different from libraries oh I have libraries is0:25:33
the same thing they wrote the library and they did whatever then I'm consuming it but what do the library and you share a lot of stuff all that runtime stuff you0:25:42
share all kinds of presumptions about memory coordination locking threads garbage collection the whole nine yards what do you share between these things0:25:52
some wires routers and things like that so the question is you know where do the semantics of a system what0:26:01
does this mean how can we define the pieces such that we can sort of get a grip on what this is so usually it's0:26:11
hierarchical but that's not enough to really understand it and this is where I think we really run into trouble this is where the problem is right what0:26:21
does that look like it looks like object-oriented programming right all these objects are connected and they send stuff to each0:26:31
other and whatever and it's possible right it's possible that this system built out of all these processes is exactly like objects0:26:42
at scale right every process is like an object and it's stateful and it sends things over to other guys and then they change and the whole thing is really0:26:52
exciting because service is an arbitrary notion what does it mean to be a service you know you send me stuff and I0:27:02
do stuff I mean one thing that's sort of telling is there aren't a lot of verbs which is kind of good but you know all the services are still nouns the fact that they don't have a lot of operations0:27:11
is helpful about saying well maybe they're not like objects but there's nothing stopping them from being objects so that is that is crossed that right0:27:20
yeah so so in what way is this not object orientation how do we keep it from being object orientation in the0:27:29
large because if we you know spent all this time doing functional programming in the small only to build object-oriented programming in the large then our system in the large is still0:27:40
going to have the negative attributes of object orientation so I think one way to think about this is to think about0:27:51
machines and product lines and things like that what we're trying to do here in the next few slides0:28:00
is to try to think about a way obviously we're saying change happens right we know that this is a dynamic system it's producing stuff it's affecting the world that's the point of it so we're not0:28:09
going to try to deny that but what's a way to organize it such that we don't end up with object mess and one way is to think about it like this production0:28:19
line thing so what does a machine do a machine applies forces to accomplish work now think about like a car factory what happens in a car factory well people go in0:28:31
there every day and they work real hard and they mutate the state of the car factory and then they go home right0:28:41
that's like objects that's like an object-oriented program right maybe you know some stuff like no it's not like that right there's like one end of the0:28:50
factory and something comes in there what raw materials parts you know things iron and tires and stuff right and then something comes out the other end0:29:01
what hopefully cars right and so this notion of a flow I think is the key to0:29:10
keeping a system sorted so there's a bunch of characteristics that you can combine that will even though technically a certain0:29:20
percentage of them are not functional accomplish something in a way that is not place oriented right if you've heard me talk negatively about place0:29:29
orientation right that you know we all went into the factory and had a good time and went home and like the factory is now better that's place orientation and this kind of flow orientation cures0:29:39
that so what are the things that we have in flow we have transformation right so one of the things we're going to be doing is transforming values so I'm going to take you know the lugs and whatever things go0:29:50
and a tire I'm going to screw them together and now I'll have a wheel instead of the parts of a wheel we're going to move things from one place to another we're going to route them maybe0:29:59
it needs to go here or there we're going to have decisions about that we may remember things right and again the word remember is a term that is0:30:10
not incompatible with functional programming in a way that update is and I think the critical thing to sort of0:30:20
making systems out of these parts is that you as much as possible keep them separate in other words when you make a transforming moving routing remembering0:30:31
thing it's really going to be hard to keep that from being something you can't take apart and reason about or combine with other things right so even0:30:42
though each of these steps I think has a sound use if you were to put them all together in one thing it would not be sound anymore so you want to keep transforming0:30:52
separate from moving and moving separate from routing routing separate from remembering it's like that and this is the difference between flow and places0:31:01
but move and route and remember are not strictly functional that's okay we know we need to affect the world so transformation this is the thing that's0:31:10
easiest right we know transformation is just functions right it's basically straightforward the only thing here is that generally there may be some input to the function which0:31:19
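One way to picture keeping transforming, moving, routing and remembering separate is a toy production line. This is a rough sketch under invented assumptions, not code from the talk: the function names, the belt names and the parts data are all hypothetical.

```python
from collections import deque

def transform(parts):
    """Pure function: a value in, a new value out."""
    return {"wheel": (parts["tire"], parts["lugs"])}

def route(item):
    """Decision only: pick a destination name, touch nothing."""
    return "assembly" if "wheel" in item else "rework"

def move(belt, item):
    """Conveyance only: put the item on a belt."""
    belt.append(item)

def remember(log, item):
    """Memory as accumulation: return a new log rather than updating in place."""
    return log + (item,)

def run_line(raw_parts):
    """Wire the four separate steps together for a batch of incoming parts."""
    belts = {"assembly": deque(), "rework": deque()}
    log = ()
    for parts in raw_parts:
        complete = "tire" in parts and "lugs" in parts
        item = transform(parts) if complete else parts
        move(belts[route(item)], item)
        log = remember(log, item)
    return belts, log
```

Each step can be read, tested and replaced on its own, and only `run_line` knows how they are composed; a single function that transformed, routed, moved and remembered in one body would lose that.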
is now not just sort of a local input from a call from a programming language but it's coming over a wire and there's output over the wire the thing that gets a little bit trickier sometimes with0:31:29
functions at the system level is that sometimes you need to convey information out of band you know off the wire you know I need to put it you know in a0:31:39
database so that you can see it later and I'm not going to actually put some huge thing over the wire to you in every message and in that case you now have0:31:48
this sort of stranger view where I need to run this function and what I have is not the value but what the name of the0:31:58
value and I'm going to try to distinguish the name of the value from a reference because they're actually different so sometimes you work to and0:32:07
from storage otherwise though it's still functions this is not hard now we get to moving things around I think it's one of the things0:32:16
in Clojure maybe I didn't make clear enough because I didn't need to wrap them is that the queues in java.util.concurrent are0:32:27
awesome if you're not using them as part of your system designs internally you're missing out and in the large queues also rule because they have this really great0:32:37
characteristic they're completely decoupling right messages what happens with a message A says something to B when A says something to0:32:46
B what does A need to know B right that's a problem if A puts something on a queue who gets it don't know0:32:56
so that decoupling is really good both in the identity of the consumer also in the availability if I put something in a queue and the supposed0:33:05
consumer is not running do I care not usually there may be backflow or some other kind of0:33:14
considerations but the availability of the consumer is also something that you don't care about right again a directly connected message A said something to B if B is not around that's now a problem0:33:24
for A if A puts something on the queue presumably if you can make the queue more available than B you get this independence both in the0:33:34
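The point that a producer putting something on a queue does not know who gets it can be sketched with Python's standard `queue.Queue`, used here only as an in-process stand-in for the java.util.concurrent queues mentioned in the talk. The function names and the `STOP` sentinel are invented for the illustration.

```python
import queue
import threading

STOP = object()  # sentinel that tells the consumer to finish

def producer(q, items):
    """Puts items on the queue; it never names or checks for a consumer."""
    for item in items:
        q.put(item)

def consumer(q, results):
    """Takes whatever is on the queue; it never learns who produced it."""
    while True:
        item = q.get()
        if item is STOP:
            break
        results.append(item.upper())

def run_demo():
    q = queue.Queue()
    results = []
    producer(q, ["a", "b", "c"])  # no consumer exists yet
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()                # a consumer shows up later
    q.put(STOP)
    worker.join()
    return results
```

Neither function holds a reference to the other; the queue is the only thing they share, which is what a direct call from A to B cannot offer.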
identity of the consumer and the availability of the consumer which is extremely strong the other great thing about conveyor belts and queues is that what do they do0:33:43
what's their job move stuff what's their other job there's no other job that's all they do right so it has that0:33:54
characteristic we had from before I mean when you get to pub/sub you really end up with routing and moving and they're both on this slide but that's really strong queues are0:34:04
extremely important queues are decidedly different from messages right for those reasons messages they need an available consumer and you need to know who you're0:34:13
talking to it's architecturally completely different all right now this memory this is the part that's really tricky right because you do not have a0:34:22
ton of great options for memory that are not place oriented there's a new thing that's kind of good for this but0:34:33
but you don't need to even use that the key point I want to make here is that the epochal time model the one that's behind Clojure it works in systems it0:34:42
works at the system level I'm going to show you the picture again later but the basic idea is what we have reference types right and we have values and the reference types only ever contain values0:34:54
they only ever just point to values and they have semantics about how they transition from one value to the other there's nothing about what I just said that is about Clojure that is about0:35:05
memory that is about locking there's a little bit that's probably about CAS but not CAS on the chip it's a very very0:35:16
general notion and Datomic implements that notion in the large but you can also implement it yourself right and you're going to need to combine a couple of things you need to combine naming values0:35:26
with some sort of reference and some sort of a la carte coordination so this is my old slide of the epochal time model0:35:35
Clojure implements this right we know atoms are this refs are this agents are this and we can do this ourselves what we're going to say we have a reference0:35:45
it takes on different states over time each of the states is a value you're able to obtain the value out of the reference as an independent thing and we0:35:54
just said before about values in systems that you're going to need to get a hold on are going to need to have what names they're going to need to have names0:36:03
that's what's different and then we can transition from values to values so we can see this in action in the way0:36:12
Datomic uses ZooKeeper and things like Riak or S3 so Riak and S3 don't have the semantics required to do the state0:36:21
succession right they don't have what you need to do that you need something along the lines of either CAS or versioned updates or something like0:36:30
that but ZooKeeper they have that they have versioned updates so you can combine them and you can implement something like refs in ZooKeeper that0:36:41
point to values that you store in something like Riak or S3 or some store that doesn't otherwise have the consistency or the ordered0:36:50
transitional semantics and you can pull tools out right now and do this for yourselves so the important thing to note is that0:36:59
the Clojure state model is available at the systems level you do it this way and the only thing you have to do is put names on your values what's a good0:37:09
name for a value UUIDs that's no0:37:20
that's good Stu's always my spoiler yeah UUID what's not a good name Fred I got this from wherever or anything else because what starts to happen when you have0:37:29
those kinds of names people start to care about them what should you care about about a value name nothing at all0:37:39
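A minimal in-memory sketch of that combination, a reference with versioned updates pointing at immutable values stored under UUID names. Everything here is an assumption made for illustration: `ValueStore` stands in for a plain key-value store, `Ref` stands in for a coordination service with versioned updates, and neither reflects how Datomic, ZooKeeper, Riak or S3 are actually called.

```python
import threading
import uuid

class ValueStore:
    """Stand-in for a plain key-value store: values are written once under fresh names."""
    def __init__(self):
        self._blobs = {}

    def put(self, value):
        name = str(uuid.uuid4())  # conflict-free name, no coordination needed
        self._blobs[name] = value
        return name

    def get(self, name):
        return self._blobs[name]

class Ref:
    """Stand-in for a coordination service offering versioned updates."""
    def __init__(self, value_name):
        self._lock = threading.Lock()
        self._version = 0
        self._value_name = value_name

    def read(self):
        with self._lock:
            return self._version, self._value_name

    def compare_and_set(self, expected_version, new_value_name):
        with self._lock:
            if self._version != expected_version:
                return False  # someone else moved the ref first
            self._version += 1
            self._value_name = new_value_name
            return True

def swap(ref, store, fn):
    """Move the ref from one value to the next, retrying if another writer won."""
    while True:
        version, name = ref.read()
        new_name = store.put(fn(store.get(name)))
        if ref.compare_and_set(version, new_name):
            return new_name
```

The store never overwrites anything, so old values stay readable under their names; the only thing that changes is which name the ref points to, and that change is the one place that needs coordination.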
also because a lot of places where you're going to be putting values you really want to be conflict free you don't want to have to coordinate value names you don't want to keep track of is this Fred0:37:48
27 or Fred 217 or you know whatever you just don't want to be there so UUIDs are a good thing to use to name values you don't care because that's not0:37:59
the identity right what's the identity the one over here right what you're going to have very few of so for instance in Datomic you can have like0:38:08
hundreds of millions of items in Datomic you know how many refs you're going to have in ZooKeeper for a database three you know it now right you0:38:18
build systems in Clojure how many refs do you end up having how many atoms a tiny tiny amount probably the best thing about Clojure is showing people how0:38:27
little of that you actually need it's the same thing here but the strong names right the globally qualified namespace names will be the identity names that's really important that they0:38:38
be like that the value names you want to be conflict-free tear off names that anyone can create without coordination and that's what a UUID is about all0:38:50
right of course my favorite topic errors and error messages and whatever so0:39:05
this is a really important paper at the bottom here and if you read this paper over and over again which I recommend you're going to see a couple of facts0:39:15
about systems right and it's another way in which systems are really different from programs right in a program are you really like0:39:24
afraid that some object you're going to call is not going to be there no the whole program tends to like be around or not like all together it's like it0:39:35
succeeds or fails all together we get all confused because we live in this bubble it's like well errors are like when I made a mistake that's not right that's just like programmer convenience0:39:45
thinking right in the real world failures are like there all the time right the things that you depend on are possibly not there all the time right a0:39:57
large system is in a state of partial failure almost continuously right the the math is against you for having like all of your 10,000 machines always work0:40:06
all the time so parts of your system right when you look at the whole thing will not be working it also means that those things that are not working will0:40:16
not be available right those failures are going to be uncorrelated they're going to be completely independent right you still are fine but somehow the thing0:40:25
you're talking to has become unresponsive or unreachable or whatever and it starts to give you a whole new way of thinking about dealing with0:40:35
failure right because the things you're talking to are unreliable you have to use timeouts you have to retry if you're going to retry well you have this open question I mean I might0:40:45
not have heard back from you but you might have heard my original request and done it so I need to know that my future requests are idempotent who is worried about that when you're working on stuff0:40:54
in memory inside your program you don't worry about these things at all but the thing is as soon as your program becomes part of a system these error modes0:41:03
are going to go right through your program you're not going to be able to deny them you're not going to be able to convert them into something else you can't fix them right they go right through you as0:41:12
soon as they go right through you you realize that distributed error modes are the only error modes everything else is0:41:23
just like programmer convenience error handling stuff but it's not really what the system's error modes are about so I0:41:32
definitely recommend that you read the paper because you can't think about it often enough and it really is difficult to internalize and you'll still write systems where you presume the best and0:41:41
then you're like ah the best thing is not going to happen sometimes so the other things about systems is that they're dynamic and they're dynamic in a0:41:51
whole bunch of different ways right they're dynamic in membership we just said some machines come and go sometimes they'll come and go on purpose right not because they failed because somebody started some more machines0:42:00
they'll come and go for capacity right as people try to scale they'll also come and go for capability like the system will be running and all of a sudden somebody wants to do something new and0:42:10
they'll start up new stuff and systems that can become dynamically capable of doing new things are really strong systems it's the kind of system that you want to pursue and so all new kinds of0:42:21
terminology is going to come to bear at the system level that you don't have inside right you can't scale one box but you can scale a system right it's not0:42:30
usually the same notions of discovery right the somewhat you know maybe if you're talking about injection and things like that but the true notion of0:42:39
discovery is a distributed thing elasticity is the same kind of thing so we know that systems are dynamic that has implications for the programming0:42:48
languages so there's a holistic approach to this right and there's a great example of the holistic process which is Erlang Erlang is a language of the0:43:01
system it takes the approach of saying I am only going to be building systems I know that upfront and I want these semantics inside the processes I don't want0:43:11
different sets of semantics I don't want my bubble semantics and my system semantics I don't want my bubble interfaces and my system interfaces just so you're not0:43:21
worried this is not where I say we should all switch to Erlang I just saw everybody's like oh my god did he change his mind already it's only0:43:33
been a couple of years no so there's nothing wrong with the holistic approach right and in Erlang the fundamental0:43:42
units of programs are services they call them processes but they're little services they have communications capabilities right but they follow all the things that we talked about before in particular it's not like RMI right0:43:52
those little services are not like objects they send what messages which are data right they're data it is though0:44:02
custom communication that they use and there's a very specific model baked into the language they basically said we are going to do actors we are0:44:11
going to do asynchronous send only receive asynchronously no synchronous communication RPC you have to build out of pieces and things like that so it's a very very specific model here which I0:44:22
think is extremely well-suited to making communications programs but what's the trade-off with the holistic approach is0:44:31
Erlang a great number crunching language no is it really expressive in certain kinds of domains no right it's definitely good at0:44:41
some things and less good at other things it doesn't have a rich type system it doesn't have a rich abstraction model or other things so the trade-off of a holistic approach is sort of you put all0:44:51
your eggs in one basket I think the fact of it is you're never going to be able to dictate to everybody to use Erlang or use any one thing you can't say we're0:45:01
all going to do our programming in this one language right that's the whole there's a king of the world thing inside you know Ericsson maybe they can do that because everybody's going to do Erlang0:45:11
but in the world on the whole I don't think you can sell holistic approaches so you can't convince everybody to use the same language even if it's better so0:45:20
that leaves us with the heterogeneous approach right we have to have some sort of cross language notion of how to talk about things how to express the0:45:29
semantics of systems and what the language of systems is that crosses languages and runtimes and platforms like that and as I said at the beginning right we know parts of that language are0:45:39
protocols and formats and I think the third part the thing that fills in this box are things I'll call simple services so a simple service is a0:45:52
service it's its own process right it does communication using data should have a very small surface area in terms of the API right if the API is mostly0:46:02
data it should have an extremely small number of verbs associated with it and it should do mostly one thing and you'll see that a lot of the facilities of0:46:12
programming languages and runtimes are now available as services right so we have queues right we have Java util0:46:21
concurrent queue and then how many message queues are out there tons tons all with different characteristics and you know you'll make different choices but there are plenty of message queues0:46:31
that are dedicated to that now unfortunately this says simple and you know if I knew how to use keynote that would be blinking and like on fire I saw0:46:40
a fire was good right that's super important and I think one of the challenges for this approach is invariably people would like0:46:50
their service to like do some more and making it do a little more often breaks the simple part so for instance queues usually have very very icky durability0:46:59
things like once they start to get into that space all of a sudden wow this is not simple anymore coordination things like ZooKeeper are extremely interesting0:47:08
if you've not used it or something like it it's very cool to think about all I have over here is just coordination and if you can constrain yourself to that0:47:17
of course again ZooKeeper is durable and you can try to treat it like a database and now you're trying to make it do more stuff and not use it simply because it does do more if you0:47:27
treat it simply it's a fantastic little utility just to do that part of the Clojure state model or the whatever0:47:37
epochal state model control flow right you have things like Amazon Simple Workflow right and Storm we just saw an example of Storm before look at Storm what is it0:47:46
it is what I'm talking about it's flow although again it sort of says this is the recipe that crosses all the pieces as opposed to saying we're0:47:55
going to compose queues plus arbitrary consumers of queues and other queues this sort of says I want to wrap around your whole thing and I want you to play this coordinated game so again0:48:04
it's less simple than it could be but as an architectural strategy it's an example of what I'm talking about0:48:13
it's flow oriented right we're used to memory services right memcache is a beautiful thing people like memcache bla bla bla most of the problems with memcache is people are0:48:22
using it to solve horrible problems with using place oriented databases that's a sucky problem that's not a suckiness of memcache right memcache is0:48:32
brilliantly simple it does exactly one thing oh of course they keep trying to make it do a little bit more but it does0:48:41
the one thing it does really well so that's shared memory Redis is another popular example right again hopefully they keep it simple and to the extent they do it's the kind of thing you can compose together and of course storage0:48:52
has exploded s3 is global shared memory it's an awesome thing except what shared0:49:01
memory is dangerous right but we know how to make shared memory say but closure has shared memory uses it in fact it's quite fundamental to closure0:49:10
that you have shared memory and shared memory is important right you just have to be careful in using it if you combine the reference to immutable objects you0:49:19
can use s3 just as safely you can use a key value store just as safely in exactly the same way the only trick there is the transitions of the refs need help from0:49:30
things like ZooKeeper but moving up the stack like DynamoDB has that semantics built into it a lot of the memory caches like Infinispan have it built-in so0:49:40
you can get it you can get both together like we have in memory in systems so you want I think we want more of these and want them to be smaller still and to do to do as little as possible so I think0:49:52
one of the problems we have here is we there is something that we really like inside our programming languages an important tool which is the interface or0:50:02
the protocol right it's the thing that abstracts away from us the details of what we're talking to where is the interface for s3 right in a different0:50:19
audience there'd be people like gripping the arms of the chairs like no we've solved this right we use WSDL and then I use a BPEL thing and I draw these pictures and like I have dive systems and we're0:50:32
just naive in here because we like to build things out of smaller parts and with this we should be up there now I mean there are things like that right but they don't get used I mean Amazon0:50:42
did not use WSDL maybe they tried did they try early on I can't remember whether any schemas were ever there there used to0:50:51
be right now it's just like go read the docs try it you know and when you get it right you'll get a good you will get a 4040:51:01
so you just don't see it you just don't see anymore and so what you've seen now instead is you know s3 is so dominant that when OpenStack wants to have the0:51:12
same kind of service they don't have any abstraction to tap into to say we also implement that abstraction what do they have to do they have to directly imitate0:51:22
the protocol of s3 this is not a great place to be same things happen with memcache right people like oh memcache is cool right and people are like well I0:51:31
have this other cool distributed redundant memory cache it's like well I use memcache but I mean this is more better but you know in this way no what0:51:41
do they have to do mimic memcache on the wire this is really a bad thing and I don't know what the answer is because I0:51:50
don't think WSDL and things like it are the answer either but it leaves us in a difficult place this is an area that we can repair inside the programming language right0:52:00
there's all kinds of variants of put stuff out of place like s3 some of them mimic s3 and some of them don't but something like jclouds right can go and0:52:10
isolate you from that right so it's superimposing abstraction now there's two ways to think about doing this right that superimposition of abstraction0:52:19
happens where is it a service is it who knows what jclouds is all right fair amount so jclouds is a library it's a0:52:29
Java and Clojure library that has an encapsulation both over sort of like the ec2 elements of cloud services and of the storage and we'll just think about the storage right now it has this thing called0:52:39
blob store and it abstracts away the details of connecting to s3 or connecting to you know OpenStack's stack0:52:48
or to whatever VMware sells or whatever another vendor has and so they've given you abstraction inside the language if we don't want to do this inside what do we0:52:57
end up with what's the system version of this a proxy and that you tend not to see why it's a hop right it adds a hop in and0:53:08
it's like that but it's still tricky we don't have interfaces and I think we're suffering so what can programs tell systems0:53:18
what can systems what can our systems learn from our programming one is we need more values values need to be first-class we need to name them we need0:53:27
to start using that epochal time model in our systems designs you can do it yourself today I just showed you three ways to do it you just have to choose to0:53:36
do it right you have to take this flow orientation right this is something you may or may not be using like people talk to me a lot in Clojure too like I love Clojure I have the functional part I0:53:45
think I'm getting a grip on it and every time I try to get to the state even if I use the state stuff from Clojure I still end up sort of struggling with a model for the whole thing the model is this0:53:55
flow model right just flow values around use queues inside your application it's not like this tribulus --is everything you need to do but you can do a lot by0:54:04
just emulating this inside and of course if that's your best practice inside it's nice to convey it out this is the way you're going to get more reusable things0:54:13
and things that are easier to compose I do think we're struggling with any kind of abstraction we know it's good but we don't know how to do it at the system level and I think the biggest thing we0:54:22
suffer from here is a well yeah how does somebody else provide a service like s3 and let you try to use it but the B side0:54:31
of it is what if you're trying to be a service and you're trying not to build the durability into yourself like you'd like to be playing this game well0:54:41
and saying I'm componentized right well in a programming language we totally know how to do this you say I'll work with anything that implements this interface or anything that implements this protocol we now have a way to say0:54:51
that and and the person who wants to compose you with something else has this recipe for doing it now what's the system's way to do that what's the system's way for saying I'm0:55:00
parameterized by my storage it's really difficult a URI is not enough right I mean you need to know what0:55:10
method to talk over so what ends up happening right now is your service needs to embed something like jclouds or an implementation of an abstract thing and you need to individually0:55:19
support what your users are going to need or provide an extensible mechanism but you're doing it inside yourself as opposed to sort of saying at the0:55:28
system level I have a way to say this is an interface that I use so that you can plug in the kind of storage you want with me so we're suffering there like the0:55:38
system's tell programs I don't think I don't you know there's great papers great old papers that say do not try to make a distributed system like your programming language and they're totally0:55:47
right especially at the time they wrote it which was when objects were hot and people were trying to do CORBA and things like that terrible terrible idea0:55:56
but we should also be able to pull so but some things are important like functional programming is important I think it's not done a lot in systems what can systems tell programs well the0:56:06
one thing is this machine-like thing right maybe it's easier to see when you have wires right it's quite obvious the only thing I can send over the wire is it is a value in XML so I've chosen to0:56:18
use that but now like well in this audience I'm going to say this but in Java people have a real question right they don't tend to send data structures around in their interfaces0:56:27
the way we do and they have this real choice I can send a data structure or an object that has like all these verbs and knows how to do stuff and changes and dances and I might as well send that0:56:36
it's only one argument it's a lot easier and I don't have to type and in fact IntelliJ will just type it for me but so I think in Clojure we're0:56:47
kind of spoiled right because we do this all the time but it is something that if you're trying to talk to somebody you're trying to talk somebody else is building a system about maybe they should bring0:56:56
this architecture inside their program you have to make the rationale from that systems level this makes sense and systems and you explain to me why it doesn't inside the program because I0:57:06
don't understand why it wouldn't the other thing is this programmatic program to program interfaces rule right where0:57:16
do we suffer when we don't do that when we only define a human interface or we define a human interface first where do we suffer every0:57:26
single time we do it every single time right anybody ever tried to write a program that manipulates any UNIX program yeah0:57:36
is it fun yeah yeah three parsers you have to figure out how the command lines work and all this other stuff I tried to manipulate git from a program it's like terrible I just0:57:48
did it it's not fun what else is an example of that SQL right in both these cases they wanted to support0:57:58
somehow some person's going to be sitting at the computer and they're going to want to like do stuff and they're going to go blue and go and it's got to work and there's nothing wrong with that you0:58:07
know that use case is important you want to make that happen but when the only interface you define is the one for that you end up with no programmatic interface so what do we have in SQL yeah0:58:16
well this is simple you know people will say where and blah and that's really great and what do we have for programs string building we got nothing we have0:58:26
nothing to work on so build your human interface on top of a programmatic interface because programmatic interfaces are all you've got in the0:58:35
systems level always typing into Amazon AWS services and was like oh I'm going to like use s3 you know they don't do that0:58:45
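A hedged sketch of "build your human interface on top of a programmatic interface", written in Python. The `select` and `human_query` functions and the tiny command syntax are invented for this illustration and are not any real database API. The point is only the layering: programs call `select` with plain data, and the human-facing parser is a thin wrapper that ends up calling the same function.

```python
def select(table, columns, where):
    """Programmatic interface: the query is described as plain data.

    Returns (sql_text, params) with '?' placeholders, so callers never
    splice values into a string themselves. Table and column names are
    interpolated directly and are assumed to come from trusted code.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    conditions = " AND ".join(f"{col} = ?" for col in where)
    if conditions:
        sql += f" WHERE {conditions}"
    return sql, list(where.values())


def human_query(line):
    """Human interface layered on top: parses 'table col1,col2 key=value ...'."""
    table, columns, *filters = line.split()
    where = dict(f.split("=", 1) for f in filters)
    return select(table, columns.split(","), where)
```

A program would call `select("users", ["id", "name"], {"city": "London"})` directly, while a person could type `users id,name city=London`. Both routes produce the same query text and parameter list, and only the person ever needs a parser.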
so you wanted you want to have the programmatic interface underneath the systems failure model is the only failure model you have to look at all of0:58:54
your error handling from that perspective as soon as you do you realize they're not going to be a lot of places for the I made a mistake flow it's got to be dominated by the the0:59:05
system is partially unavailable flow systems are dynamic and data-driven it might be a nice idea to use a language that was also dynamic and data-driven0:59:14
again in this room I don't need to say that so I think people are building some great libraries I'd love to see more0:59:25
people build some services some simple services I think this is a tremendous opportunity area for Clojure Clojure is really really well-suited to building0:59:35
these things and if you build these things it's going to give you the inroads into your organizations right oh can I build this new thing in Clojure I don't know well I0:59:44
built a service you want to use it oh well yeah what does it do it does this oh it's nice it's simple does this one thing right and we're seeing some of that like the Riemann thing0:59:53
right who even knows it's Clojure well it's this cool logging thing but it does one job it does it really well it's a service like thing there are tons of1:00:02
opportunities we just saw a bunch of things that were done and Storm is really great and things like that but there's lots more and when you build something like that you're going to end up with something that's much more reusable1:00:11
than a library now things will have to be libraries and libraries are great but I'd encourage you to build systems I'd encourage you when you do it to avoid custom formats of course again in this1:00:22
room I don't really need to say that there's a good format we tend to all like it and we will try it yet we'll try that first even though you don't1:00:31
necessarily have a means of expressing at the system level the abstraction of your service design it anyway right at the point you know there's always all1:00:40
this stuff about a premature abstraction whatever definitely a danger by the time you're writing a service there's nothing premature about abstraction the thing has got a surface1:00:49
area this big it's you're going to spend time on that there's no problem spending time on that it's never not worth it it's never going to be well it's overkill you wrap this thing with the1:00:58
thing you know down in the small in a program you can over abstract up here you can't I mean unless you start making a lot of new layers before your service you want to have some1:01:07
abstraction consider a second implementation over your interface like maybe you've decided for speed you're going to use Avro or something like that but if you also design an HTTP1:01:18
interface you'll sort out your abstraction just by that exercise it still doesn't give somebody the ability to say I'm going to make something like it with the same shape but it will make1:01:29
your service better and the other thing is to design your service to be composed and again I think this is a challenging area right don't keep adding stuff inside yourself you're going to make a1:01:39
little monolith you're going to become a stack yourself you don't want to become a stack you want to allow people to plug in right if you need to store stuff consider using something like jclouds1:01:50
now you don't need to store disks are terrible who wants to write and program disks you know it's a solved problem so1:02:00
as soon as you get to the oh I need to put something somewhere plug in something like jclouds you know or anything I mean you can roll your own whatever it has to make sense for your thing1:02:09
but make it so that somebody doesn't say oh I'm taking you on and I'm taking on the fact that you store stuff over here don't do that let them say this is how I want you to1:02:18
store let them make things composable let them say this is the kind of queue I want you to use this is the kind of storage I want you to use to the extent1:02:27
you can do that you'll build system components that can become parts of systems that are built of services that are simple and that's it0:00:00
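The "let them say this is the kind of storage I want you to use" advice can be sketched inside a program as follows, using Python. This is not the jclouds API; `BlobStore`, `InMemoryBlobStore`, and `ThumbnailService` are hypothetical names chosen for the example, and the "thumbnail" step is a stand-in that just truncates bytes. The service depends only on the small storage protocol and is handed an implementation by whoever composes it.

```python
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical storage abstraction that the service depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """One possible implementation; an S3-backed one could take its place."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = bytes(data)

    def get(self, key):
        return self._blobs[key]


class ThumbnailService:
    """Does one job and is told where to store its results."""

    def __init__(self, store: BlobStore):
        self._store = store

    def save(self, name: str, image: bytes) -> str:
        key = f"thumbs/{name}"
        self._store.put(key, image[:16])  # stand-in for real image resizing
        return key
```

Usage would look like `ThumbnailService(InMemoryBlobStore()).save("cat", data)`. The caller picks the store, so taking on the service does not mean taking on its storage choice. What the talk leaves open is how to express the same contract at the system level, and this sketch does not answer that.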
Clojure Exchange 2012 Rich Hickey The Language of the System 55095205
0:00:00
which is sort of out the door up the stairs and then that way it's got a nice outdoor area and a lot of space inside as well and really good beer and there was one more thing the feedback forms I0:00:10
already said the feedback forms feedback forms pub put it in the box over there the box is here at reception everybody see the box right then and you can win a0:00:23
book still so there are three books Joy of Clojure which is a pretty good book0:00:32
and the mailing list so there's a London Clojurians Google group as well groups.google.com slash groups slash london-clojurians or maybe just search for London Clojure or go to London0:00:42
Clojurians dot org you can find the link from there and I bet Rich really wants me to shut up now you don't care do you I mean really right then The Language of the0:00:58
System Rich Hickey thank you [Applause] so thanks I'm really happy to be here0:01:09
and as always just thrilled to see so many bright and interesting people using Clojure to do really cool stuff and nothing could make me happier to see0:01:19
where it's gone and where people are taking it so very very happy to be here so today I'm going to talk about why the0:01:28
language you use doesn't matter so you0:01:38
know well especially we're in a group of people that all use the same language so we must be enthusiastic about the language we use and maybe we even think you know the language we use is sort of the center of the universe and0:01:48
that's inevitable right because when you use the programming language you start thinking in a programming language and you write libraries in the language people who use the language and and and0:01:58
languages do a lot of stuff for you and and when I say language here I'm talking about the language itself and the runtime it provides so the runtime libraries and some core facilities and0:02:08
often we think especially in functional programming languages as if you know the logic that we're doing is sort of everything and IO is just sort of this pain in the ass that we0:02:17
have to do but but in practice we all know you know how many people write programs that stand on their own and read some input and spit out some output in one process and that's what your0:02:27
program is that's what your system is all right nobody absolutely nobody does this anymore except maybe compiler writers they do that0:02:40
and we like our language because it does a lot of stuff for us right we have memory management memory models I mean you don't see the memory model of Java too much in Clojure but it's there and0:02:49
it's really helping you or helping me help you get concurrency right languages do calling conventions and calling and0:02:58
arguments and how is that going to work and how is the stack gonna work they do resource management at least for memory right so we have garbage collection0:03:07
that's a tremendous facility there's coordination constructs I mean if we were using raw Java we'd have locks we're using Clojure we're getting things like the reference types right to0:03:17
do coordination so we can independently build stuff that works together and threads all languages have some sort of abstraction capabilities we just saw talks about multimethods0:03:26
and protocols and things like that and maybe you're using a language that has types that's a generic statement but you0:03:35
know now you could optionally have types for Clojure and that might be something that you really think is important when you move up or out to the system level0:03:44
you lose this right system means for a bunch of you know independent things to to stand together to form something something together and the biggest thing0:03:53
about a system is that you lose any kind of global supervision there's no global supervision there's no global rule set0:04:02
there's no global enforcement there's no guides there's no nothing and and and so the title this talk is about the language of the system being something that's emergent right when you0:04:14
have a system of independent pieces independent processes communicating the language they use they define themselves you give you give each other the0:04:23
language you used to have have a system and and the semantics that you share you have to share by agreement because0:04:32
there's no language like a programming language enforcing that so when we say language we you know we know we're talking about some sort of you know0:04:41
talking right some sort of tongue and it's certainly the case you know people talk about is the programming language for programming the computer or talking to other programmers and so let's just all agree whether we're0:04:50
talking about the language of a programming language or the language of the system that you're always talking to programmers at some level programmer to programmer by communicating something you've written it down somebody later is0:04:59
going to come and potentially be able to read what you wrote it's not like a you know a vanishing thing you type it in it disappears in smoke or something so even if that programmer is you you0:05:09
are communicating to yourself what your intent was but then there's sort of a bifurcation between programming languages which really also are about programmers telling the program's what0:05:20
to do and expecting to be able to communicate to a program from a human to languages of systems where we really have two programs talking to each other in the language one program offers for0:05:31
talking to it is really designed for another program to interoperate with and that that yields very very different results okay so let's look at the stacks0:05:45
of these two things for a programming language at the bottom there's some language primitives the runtime provides some basic facilities for a memory allocation and and things like that and0:05:55
core libraries may provide a rich suite of basic functionality building block functionality that you can combine and finally get up towards your domain you0:06:04
add your own you write your own bits if you look at the system's stack at the bottom you have protocols and formats right protocols are like TCP and things0:06:15
like that and formats I'm gonna talk more about in a second and then you know what are the building blocks what corresponds to the second slot I think0:06:25
that's an interesting question and finally up at the top you know what we're trying to get these days are systems that are composed of services right and exposing your applications and0:06:36
services how many people write applications that are service like in some degree even if like part of their application is serving the front end right yeah so everybody's sort of doing0:06:45
this but I think we need to pay more attention to what we're saying and what languages we're creating so I'm not going to talk about protocols at all I think0:06:56
we have a lot of good choices for protocols and every now and again we have new protocols but protocols are designed by like engineers and and and0:07:06
we get to make these decisions right because those decisions are very hard to get standardized and accepted and whatnot but every time you stand up the service you get to pick right you get to0:07:15
pick this stuff what is actually going to be transmitted around and so we're talking about a service that's on a network except some sort of messages may be in an RPC manner or a one-way message0:07:28
manner and we have all these choices XML JSON Protocol Buffers Avro edn Hessian Fressian BERT who knows what all of0:07:38
these things are do people know what all of them are all of them everyone oh you know because we talked so what0:07:52
ones don't you know Hessian Fressian so Hessian is a Java library it's sort of a bytecode encoding for transmitting0:08:01
serialization BERT BERT is a library for doing Erlang style binary transfer that also like works from Ruby I0:08:10
think the GitHub guys built it I don't have enough time to explain XML I'm sorry right0:08:21
Avro is self-describing with a schema at the front Protocol Buffers have out-of-band schemas right XML has a standard format JSON you know is0:08:32
whatever it is and Fressian is new it's like Hessian meets BERT meets Avro meets edn and it's the protocol and0:08:44
format that's used by Datomic and we've open sourced it and if you go to fressian.org you can see what it is but it's like edn binary and you'll see how much of0:08:55
edn as a standardization of the subset of Clojure that we use for communication was actually inspired by Fressian but Datomic uses Fressian on0:09:06
the wire and in storage so anyway all of these things are data formats that's good right we're not just sending raw bytes back from most services where it's0:09:16
like figure it out but do you have these questions you have to ask yourself when you're trying to evaluate one of these formats and its suitability for what you're doing one is0:09:25
is it extensible so which ones are extensible XML Protocol Buffers kind of sort of for running systems to some0:09:35
degree Avro edn Hessian kind of sort of Fressian yes for yes which ones are self describing0:09:48
XML JSON Avro edn Hessian kind of Fressian and BERT so why does that0:09:59
matter right what does it mean to be self describing being self describing means that anyone that understands the format can read anything that's in the format they may not know0:10:09
the semantics of everything about it but they can definitely read it and parse it so once you understand how XML works you can read any XML but if you get a protocol buffer message and you never0:10:18
were told what the schema was out-of-band somehow you can't do anything with it and this gets fine-grained and coarse-grained so Avro has a schema up0:10:27
front but things like Fressian and BERT and edn you know if you were being sent if somebody was dribbling out you know objects in these formats in the stream0:10:37
you could hop in the middle and kind of figure out what was going on at any time so they differ in their granularity but a big important point here is that0:10:46
if you want to build composable systems at some point you're gonna want to have generic intermediaries and out-of-band schemas thwart generic intermediation I0:10:57
mean imagine if you had imagine if every website had its own schema that you had to like read about out-of-band and then construct custom messages to talk to the website the web would not have happened0:11:06
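The difference between a self-describing format and one whose schema lives out of band can be made concrete with a small Python sketch. JSON stands in for the self-describing side, and a fixed binary layout built with the standard `struct` module stands in for the out-of-band side. This is not Protocol Buffers, Avro, or Fressian; the `SCHEMA` string and the two reader functions are assumptions invented for the illustration.

```python
import json
import struct

record = {"name": "Sally", "age": 42}

# Self-describing: field names and value types travel with the data.
as_json = json.dumps(record).encode("utf-8")

# Schema out of band: these six bytes mean nothing unless you were told SCHEMA.
SCHEMA = "<5sB"  # hypothetical layout: 5-byte name, then one unsigned byte for age
as_packed = struct.pack(SCHEMA, record["name"].encode("ascii"), record["age"])


def generic_reader(payload: bytes):
    """An intermediary that knows only the format, not this particular message."""
    return json.loads(payload.decode("utf-8"))


def schema_reader(payload: bytes, schema: str):
    """A reader that cannot work unless the schema was handed over separately."""
    name, age = struct.unpack(schema, payload)
    return {"name": name.decode("ascii"), "age": age}
```

A generic intermediary such as a logger, router, or cache can call `generic_reader` on any JSON message it has never seen before. For the packed bytes it is stuck until someone gives it `SCHEMA`, which is the obstacle to generic intermediation described above.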
right so this stuff matters values of course for this audience I don't have to linger too much on this they're really quite important and the thing about0:11:15
values is that this is what we do on the wire right we rarely send references on the wire it's not like people are doing CORBA or DCOM or0:11:26
any of that stuff anymore right all those things lost for good reasons but one of the interesting things about values when you go up to the when you0:11:37
go up to the system level is that on the wire and in programs so imagine in a Clojure program you have these values now0:11:46
how many people have a ton of named values in their programs very rare right very rare you create you construct a value through some functional construct and you flow it0:11:56
around you pass it from function to function but you rarely stick them somewhere and call them something but if you want to put a value somewhere in the system so it's accessible via someone else and it's not going to be part of a0:12:06
transfer it has to go somewhere and it has to have a name right and that's really interesting we'll talk about names in a second the other thing about0:12:17
values is that everything's network addressable right how many people when looking at a URI can tell if it's a permalink or not special x-ray glasses0:12:28
or something right no you can't I think this is a big problem right we need to be able to build systems that have the properties of functional programs because we know that makes0:12:38
things more modular it makes them independent makes them more robust easier to test and if we want values in systems it means we have to know when we0:12:47
have values we have to be very explicit about when we're giving somebody a value so it's an interesting problem names again it's the same kind of thing we use0:12:56
names in in in systems but they almost always have some global scope this is why it's very important once you start lifting up to the system level you have0:13:05
to get any kind of level where you're going to start sharing across different people you know independent development efforts you have to start thinking about global naming schemes that's why we have0:13:14
DNS and all that kind of stuff and UUIDs and things that are guaranteed not to clash you can't just arbitrarily say you know my name and you and hope nobody takes0:13:24
your name later so namespaces are really critical and again the kinds of things we name differ a lot in a program where where do most0:13:33
of your names get assigned to the vast majority of your names name what what are your choices data we just said0:13:42
hardly anybody does types methods functions what0:13:52
any of the names that are that are global and they name functions right they're basically the verbs of your system is very very lots of names for lots of verbs but in in systems we don't0:14:03
have as much of that we have we have names for services and we're going to have names for entities and usually pieces of data and values hopefully if we do more of that but we're not as verb0:14:14
oriented with names so the problem with systems is keeping track of the numbers because as we know all systems are based0:14:24
on little circles with numbers in them especially when you get your images from google image search numbers no no they0:14:36
don't have numbers but but a lot of a lot of service architectures end to end up having this shape right this is this is dependency tree somebody's going to call someone else is gonna call someone else is not usually a lot of like0:14:46
arbitrary graphs it's usually very very tree like and sort of somebody helps me down below there's also a lot of serial stuff which I'll talk about later but there's not a lot of cyclic cyclic0:14:58
graphs but the trick about a system is um who owns the semantics where are the semantics of a system where they live0:15:08
well they're sort of all over the place they're there they're in the little circles for sure but a lot of the semantics of the system a lot of the semantics of the communication used by the system are actually on the lines0:15:18
right are the specifications for how you communicate and that ends up that ends up being a problem because you're now in grave danger and this is where I think0:15:29
you know the functional pros especially for Clojure programmers you know you now have the opportunity to like get global variables back right every service can0:15:38
be a global variable right every database is a big global variable right we're right back we took all this care in the small and in process to be clean and we're getting all kinds of0:15:48
tangible benefits from doing that and we lift up to the service and we've got like objects again objects and global variables because service is an0:15:58
arbitrary notion it doesn't really require you to do anything there's no ruler anymore there's no opinionated language anymore it's your job to to avoid building a0:16:10
system that's just about triggering effects on services calling each other so how do we avoid doing this well one approach is to take a machine-like0:16:20
approach right so when we criticize object-oriented programs we know they're like churning around object graphs like updating them so they're all about themselves right and you look at0:16:29
the thing you like you can't get it in a state that you can test anything because it's all this interconnected thing and it's like a factory where you know people go in every day and people come0:16:39
out every day and they're like that was awesome yeah it's totally cool and like every day you watch this Factory and people come in and out just people in and out and they're all working and what0:16:49
must they be doing like munging around some stuff in the factory and who cares about that nobody right what does a good factory do what else comes in and out of a good factory raw materials come in one0:17:00
end and like cars and TVs come out the other side right in other words a machine does something right it's not0:17:09
like it's purely functional it's just a calculation or I think their stuff gets transformed stuff gets moved right there is process here it's not about ignoring process right as functional programmers0:17:19
in closure we know we make our programs do things and yet we still are mostly functional because we're very careful about process so how do we end up being as careful about process when we when we0:17:31
want to write a system and I think the way we do that is by having a value orientation and being very clear about the kinds of things you can do to values0:17:41
all right so one thing you can do to a value is transform it and we know that right because that very much corresponds to what we do in the language another thing you can do with the value that's much more like the machines and the0:17:51
factories is to move it literally move it around move it through a process move it from one participant in the process to another and that might involve routing or making some decisions0:18:02
the other thing we do with values is remember them and it's quite a distinct thing to remember a value than it is to put it in a shoebox right and to have a0:18:12
shoebox that is about what you're remembering and then every time you want to remember a new thing you know you take it out of the shoebox and put another thing in right and a problem we0:18:21
have is like most of our databases are like that their place oriented you know we have a place for Sally's address and when Sally has a different Sally moves0:18:31
you know we go to that shoebox and we put in a different address and that place orientation ruins this right because that's not memory right memory0:18:40
if Sally had an address and Sally moved you would actually remember both things so we need to start having storages that are memory oriented and that's what I've0:18:50
been working on in Datomic right things like that but as we'll see later that's something you can do for yourself even if you're not using Datomic or systems like0:18:59
it I would advise even if you're taking this approach which are all things about values right single operations about values is to keep them separate as soon0:19:08
as you start combining them you can start building things that again get sort of place oriented so the idea here is it's a flow orientation versus a place orientation I'm not going to talk0:19:18
too much about the transformation right we know a transformation is this is like this is these are functions right transformation is I take some value I produce a new value in the middle I calculated something and that's why the0:19:28
new value is different the only thing that that the only thing that differs when you're doing it at the system's level is in a function in a functional program that value it you know you can0:19:38
see it go right into the function you can get the return right back out and a lot of times in a system you have to communicate that value out of band right0:19:47
especially as it gets really large you may have in one process produced this really large result only a portion of which is interesting for the next0:19:56
consumer but you don't know what portion is you don't want to put that really large result on the wire so instead you're going to put that hopefully that value somewhere and communicate to the0:20:06
next process hey that thing's over here it's still like that guy's going to do another functional transformation but the input wasn't directly communicated it was indirectly communicated and0:20:16
that's why we need values in systems and names for them right moving things around right it's like a conveyor belt right unfortunately too often we build0:20:26
directly connected systems that is even though the systems are separate they're directly connected and there are some cases where direct connection makes sense right if you're0:20:36
calling out to a service that's going to do transformation for you and give you a result that RPC like interaction makes sense but there are other interactions0:20:45
with systems where you really are doing flow you're trying to just communicate something to another system and you know what you really don't want to know what that other system is and this is where0:20:54
queues rule you really want to use queues quite often and anywhere where you're moving stuff because queues decouple they0:21:03
decouple a bunch of things right when you put something on the queue do you know who's going to get it no right when you call Amazon do you ask for a0:21:13
particular person like when you go to Amazon do you like give me Joe I like get my CDs from Joe right and then what happens Joe's on vacation and you can't0:21:24
get your CD that doesn't work right it doesn't work in the other direction either right even if you found Joe and Joe had your CD for you it's not like Joe's gonna ask Sam the UPS guy to bring0:21:33
it to you that's not what happens right there's some generic endpoint that is Amazon order queue and you put something on it and you have no idea what's gonna0:21:43
happen and that's good for you because you get your CDs whether Joe is there or on vacation or not so you don't know about the identity of who's gonna do the0:21:53
work you don't know about their availability and everything about that becomes configurable and scalable and changeable by somebody else maybe five0:22:02
people work on your job maybe one person does maybe five processes consume it and things like that so queues are really important that's how you move and route values and0:22:11
I don't think people use queues enough neither in systems nor in process ok0:22:20
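The Amazon order-queue point above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not anything shown in the talk: Python's standard `queue.Queue` stands in for whatever queue a real system would use, and `run_order_queue`, the order strings, and the worker names are all hypothetical. The producer only knows the queue; it never names the worker that ends up doing the job, so the number of workers can change without touching the producer.

```python
import queue
import threading


def run_order_queue(orders, worker_names):
    """Put orders on a shared queue and let whichever worker is free take them."""
    order_queue = queue.Queue()
    handled = []                      # (worker, order) pairs, appended under a lock
    lock = threading.Lock()

    def worker(name):
        while True:
            order = order_queue.get()
            if order is None:         # sentinel: no more work for this worker
                order_queue.task_done()
                return
            with lock:
                handled.append((name, order))
            order_queue.task_done()

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for order in orders:              # the producer never mentions a worker
        order_queue.put(order)
    for _ in threads:                 # one sentinel per worker
        order_queue.put(None)
    order_queue.join()
    for t in threads:
        t.join()
    return handled
```

Calling `run_order_queue(["cd-1", "cd-2"], ["joe"])` and `run_order_queue(["cd-1", "cd-2"], ["joe", "sam"])` both deliver every order; which worker handles which order is deliberately unspecified.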
memory this again I'm not going to spend too much time on for this audience all I want to say is the epochal time model the model that Clojure uses for its0:22:30
reference types works at the system level right and the basic idea is you're going to distinguish a reference which is something that gets updated to point0:22:40
to new States over time new values over time distinguish those references from values right make sure you have yeah well the New York Times URI0:22:49
it points to the New York Times and you get the current paper over time you go there but permalinks were invented on the web for a good reason right who remembers the web before permalinks yeah0:23:00
what you do you're like hah the web is awesome has those great information on it right so I'm gonna do this research Oh bookmarks they're cool I found this I found that I saved all the bookmarks0:23:09
right you go back a month later what happened to you everything was about something else you didn't remember anything you had nothing you had gotten nothing out of that so you want0:23:19
identities and values so this is my old picture right we know what this is we want it we have an identity it's gonna take on a succession of states each0:23:28
state is a value we want people to be able to obtain the values directly right not always get a random black box view of the identity and we want to do that0:23:37
kind of transformation so in practice this is how you do this with system-y kind of things with pieces of systems so for instance these new key value0:23:46
stores are really powerful things like Riak and Cassandra and stuff like that but they have limitations in particular they don't have sufficient semantics to0:23:55
do the reference part right the reference part of the job is and it is it's mutable but it's not just mutable it's it's controlled mutation it's mutation that says nobody no two0:24:05
people are gonna try to update this thing and get a mess right it moves it moves atomically from one state to another so we can build that model out0:24:14
of two very separate subsystems we can take a key value store like Riak which is completely awesome at storing values immutable values because all the0:24:23
eventual consistency stuff it can't hurt you with immutable values so it's like you don't even care eventually consistent what's the difference between an eventually consistent immutable value0:24:33
and a consistent immutable value nothing there's no difference it's like a get-out-of-jail-free card so just put immutable stuff in here and0:24:42
then for mutation we have cool services like ZooKeeper right which which do just this atomic succession stuff and you0:24:51
need very few of these you can put big big items in there so this is these are the tools with which you can build the epochal time model in systems so the0:25:02
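The split described above, an immutable value store plus a small atomically updated reference, can be sketched in process. This is a hedged toy, not Riak or ZooKeeper: `ValueStore` and `Reference` are hypothetical names, a dict stands in for the key value store, and a lock-guarded compare-and-set stands in for the coordination service.

```python
import copy
import threading
import uuid


class ValueStore:
    """Write-once store: every put gets a fresh key, so a stored value never changes."""

    def __init__(self):
        self._data = {}

    def put(self, value):
        key = str(uuid.uuid4())                 # keys are never reused
        self._data[key] = copy.deepcopy(value)  # keep our own copy
        return key

    def get(self, key):
        return copy.deepcopy(self._data[key])   # callers cannot alter the stored value


class Reference:
    """The only mutable part: a cell that moves atomically from one value key to the next."""

    def __init__(self, key):
        self._key = key
        self._lock = threading.Lock()

    def current(self):
        with self._lock:
            return self._key

    def compare_and_set(self, expected, new):
        """Advance to `new` only if nobody else has moved the reference since `expected`."""
        with self._lock:
            if self._key != expected:
                return False
            self._key = new
            return True
```

With Sally's address as the example: store the old address, point a `Reference` at its key, store the new address under a new key, then `compare_and_set` the reference. The old key still returns the old address, so both facts are remembered rather than one overwriting the other.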
other thing that's really important about systems is their failure modes are completely different right in your program if foo is gonna call bar are you like worried about the bar function not0:25:11
being around no does the bar function ever say like 404 you know and say you know I'm not available service0:25:21
unavailable no they're not I mean you don't really have a lot of independent failure inside a program but once your program is part of a service you do in0:25:30
fact everything you do now has this and and once you have it it flows all the way through everything you're doing so it ends up that the the failure modes of0:25:39
systems are the failure modes as soon as your program is part of a system and all this little you know stuff we do about oh I had an exception because I did0:25:49
something wrong in my program that's like it's really not the important error handling of your program it will never be this is going to be it and it's going to dominate the architecture of your0:25:59
error handling scheme so I definitely recommend if you haven't read Joe Armstrong's thesis that you read that it's quite quite awesome it just will change the way you think about0:26:09
errors if you've never looked at it so systems are dynamic we like dynamic things we're comfortable with that we know all the ways systems can be dynamic0:26:18
you might have different participants in the system at different points in time more machines fewer machines different capacities you may have new capabilities coming online right so you're talking to0:26:28
the service and the service gets upgraded you know you didn't coordinate right if Google changes something you like they don't ask you right they just change it which is fine as long as0:26:38
you're capable of dealing with mostly some novelty coming back right like I read your API and I I knew these things might be coming and now maybe0:26:48
they're gonna send me something new right you got you need to be okay about that and I think the the techniques we've developed in Clojure for working with maps you know if you're a consumer0:26:57
of a map you really take two approaches dealing with it right I only care about these two keys I don't care if there's more keys fine or I really care about0:27:07
all the keys and I'm always gonna call keys right so I know I'm enumerating the entire set and if there's more I'll find those the next time so both those things make you have an elastic a0:27:16
discoverable system that is capable of encountering novelty and not getting corrupted by it and that happens right this is what's0:27:26
important we need to support independent development and evolution and we get it from this kind of an approach so there's two ways to accomplish you know building0:27:37
systems one is the holistic approach right which is the Erlang approach basically Erlang unlike most other languages is a language of0:27:46
systems like the fundamental unit of an Erlang program is a process it is it is one of these subsystems whether you're using these processes in the same you0:27:57
know OS process or across boxes it provides answers to all the things I've been talking about but it has discovery0:28:06
services and it has failure handling and and and everything else but you're getting a whole package here right you're getting an answer to every question has its own you know0:28:16
communications mechanism which very few other languages participate in and it has a very specific model for what a0:28:25
process is and how it communicates which is a send only receive only no RPC actor-like model which is not what a0:28:36
lot of our systems end up looking like in practice we don't send a request to a web server and wait for them to tell us later through another channel here's0:28:46
your web page would be very difficult for the web to work that way so we're all stuck here right unless we all0:28:55
switch to Erlang which I'm I'm not advocating there's nothing against Erlang but you know it's a language of system here so it's really funny I was0:29:04
I'm not gonna repeat it cuz I didn't say it and I thought it was outrageous but you'll have to watch the video from Tech Mesh where some funny things were said in0:29:16
jest about Erlang being about being fundamentally sort of about communication but we know we pick different languages because we need to do more than communicate right we need0:29:25
to calculate we want a certain amount of expressivity we may care a lot about numerics or types or something like that so in the heterogeneous approach we're going to have different languages0:29:35
we're going to have systems written in different languages and the language of the system is not going to be it's not gonna be solved for us because we all chose the same language but this is unlikely to ever happen and so we have0:29:47
we're going to combine protocols with choices about formats right and then I think this this missing piece right this0:29:56
this building block unit that we want to see more of in systems is what I'll call simple services right and these are examples of simple services so we're0:30:07
quite familiar with message queues right so we talked about the conveyor there's tons of examples of message queues as as services just like I want a conveyor belt boom you know here's ZeroMQ0:30:18
or ActiveMQ or whatever you want to use right then coordination I think is a new thing ZooKeeper I think is quite clever and is a neat design and there's0:30:28
a small service that really does one-and-a-half things which is really good we have services for control-flow0:30:37
AWS Simple Workflow is really cool and and Storm is another example you know coming from our own community we're0:30:46
quite used to certain uses of memory as a service right memcache and Redis and things like that but we have to be really careful because I said memory0:30:55
here but I'm really particular about memory memory's not update in place so how you use memcache dramatically0:31:04
differs right if you're treating it as memory what are you doing with with the key you ever reuse a key no you never reuse the key if it's0:31:14
memory if it's really memory you're not going to reuse the key which means you're going to have to fabricate names for memory that are really consumable0:31:23
which is why UUIDs are good names for values and we have storage as a service S3 started this whole whole world of storage as a service and now we0:31:34
have these great key value stores and things like this so so these kinds of simple services I think are particularly interesting in the building blocks of bigger things it's it's an interesting0:31:44
thing it was interesting that we had to talk about a Google+ because it's sort of like it got near this this topic which is what happened to interfaces and we're talking we're trying to make0:31:53
analogies between programming like oh I'm trying to make analogies between programming languages and systems right and then one of the big things we said it was cool that languages gave us was abstraction right so where is that0:32:04
in the service space how many have heard of WSDL or those other things how many of you are using WSDL oh my god0:32:13
that's more than at Tech Mesh that's amazing I'm still gonna make fun of it0:32:22
I'm still gonna say nobody uses it right I think these things lost I mean certainly they lost they're not popular but they tried to do something I0:32:34
think is important right it's quite valuable to us to have an abstraction that says well if you write to this abstraction and I satisfy this abstraction then our two things work0:32:44
together and we never talk to each other right we we're missing that on the web right what we're getting instead are de facto standards right so S3 it just it0:32:54
just kind of won it was just out front so early so dominant so many people did it that when people wanted to do S3 like things and and and get people who are0:33:03
already using S3 to consider their service it was an example of I want to substitute a different solution that corresponds to the same interface0:33:12
what was the interface it was literally the implementation that Amazon had they had to sort of copy the wire copy the protocol and and pretend to be S3 I0:33:22
think we're really suffering from not having something here but I think there's something about these types of efforts that was that were wrong that0:33:31
were unsatisfying they were insufficient in some way and I think part of it was that you desperately need a spec if you don't have a self-describing format right if there's0:33:42
no spec and there's no self-describing format you're just gonna get stuff on the wire and you won't know how to interpret it but as soon as we started using self-describing formats we're like I don't need that spec I can I can just0:33:54
directly you know it's JSON it's no problem I'll just deal with it I don't really know the semantics that's always going to be out-of-band but I think we're missing something here and I0:34:05
think what we substitute now is we substitute these programming language level abstraction things so jclouds would be a good example jclouds says0:34:14
well yeah there is this problem there's all these different stores that are like S3 and even the ones that are trying to be exactly the same as S3 aren't exactly the same so that's a problem for you if0:34:23
you want to say I have pluggable storage you know how are you going to do that and so they now have at a language level right a library level internal to0:34:33
process a way of abstracting away these services and the problem with that is that that's not a that's not a service anymore that can't be used by services0:34:42
to abstract other services so I think we need more more work here because otherwise we can't build services that that offer extension points because we0:34:52
have no way to describe what they are and you're never going to get third parties to like play your game like protocols are cool because you can say I had this and I had that and these people0:35:01
never talk to each other and they don't talk to me and I can build a bridge and I can access both but if we start doing that on the network we'd have what we'd have a0:35:10
proxy right we'd have a hop and nobody wants to do that so what can programs0:35:20
tell systems what from programs do we want to communicate and bring over to our systems I think we want to bring values over especially from functional0:35:29
programming we see the value of values and I think we need a lot more of them in our systems I think we're doing it on the wire really well0:35:38
right almost all services are value oriented on the wire I think we're missing several key things like memory and and named values immutable things0:35:48
you know saying it's part of my architecture that you can access this thing and you're not going to get ever get something different when you access it we need to make those promises this0:35:59
flow orientation I think this actually works both ways I think good programs have a lot of use of queues but a lot of people are not choosing queues I think in the Clojure community possibly we're0:36:09
not using queues enough because Clojure doesn't provide any abstractions over the queues but you know the queues in Java are really good I didn't abstract0:36:18
over them because I didn't have anything to add to them right you should be using java.util.concurrent queues they're they're great or you know whatever you know I don't0:36:28
know if Allah has them but you know things like that are really useful interfaces and abstraction I just talked about I think it's something programs should bring to the system space but I'm0:36:37
not exactly sure how we're gonna do it without them though it's going to be difficult to parameterize your service with the interface to another service you can say I do this and you I can use0:36:47
your storage but you giving me that right so what can systems you know tell tell programs what can we bring from systems to our programs well one thing0:36:58
is to recognize this machine-like aspect of our programs I think we talk a lot especially in functional programming about our functions right our0:37:08
functional part and more and more I'm liking describing the non functional part of your program as the machine-like part of your program if you think about0:37:18
it that way if you take a flow orientation to it even though you have the construction we have a lot of cool constructs in closure but you are going to need to flow stuff if you introduce0:37:27
queues into your vocabulary between queues and references you have values already you can do this entire machine thing in this clean way0:37:36
right flowing through no no updating in place that works a big deal here is programmable interfaces and0:37:46
again for this audience I think that's what Clojure has been great great at right we've seen this tremendous library growth because because Clojure's data-oriented all of our libraries are0:37:56
data oriented and I think we've seen so much more just inherent transparent incidental accidental no effort0:38:05
interoperability between data-driven subsystems right so we want to bring that to our to our services and I think most people how many people today when0:38:15
they build services with Clojure for consumption by either Clojure or their front end use edn or Clojure data only three people are you the rest are0:38:26
using JSON all right so here's the JSON quiz how many people know how to encode0:38:36
a date in JSON okay Tech Mesh same answer none how many people know more than one way to encode a date in JSON mm-hmm there you go0:38:46
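The quiz above can be made concrete. The sketch below, using only Python's standard `json` module, shows that JSON has no date type and that at least three incompatible conventions are in common use; the function names and the `"$date"` wrapper key are illustrative assumptions, not part of any standard. By contrast, the edn format defines a built-in `#inst` tagged element for instants, which is the kind of richer built-in the speaker goes on to describe.

```python
import json
from datetime import datetime, timezone


def json_has_native_dates(moment):
    """Return False if the standard encoder refuses a datetime (it does)."""
    try:
        json.dumps({"when": moment})
    except TypeError:
        return False
    return True


def encode_date_three_ways(moment):
    """Three conventions a consumer can only tell apart from out-of-band documentation."""
    iso = json.dumps({"when": moment.isoformat()})                     # ISO 8601 string
    millis = json.dumps({"when": int(moment.timestamp() * 1000)})      # epoch milliseconds
    tagged = json.dumps({"when": {"$date": moment.isoformat()}})       # ad hoc wrapper object
    return iso, millis, tagged
```

After decoding, the first form is just a string, the second just an integer, and the third just a nested object; nothing on the wire says any of them is a date.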
end of the quiz right JSON is really poor really poor and that's you know if you're if you're0:38:56
confused at all about why I did the edn thing or why we have a different name for it or whatever it's because edn is way better way way better at the things JSON is supposed to0:39:06
do has a much richer set of abstractions built in and it's extensible right and it happens to be what you get for free from Clojure so I definitely recommend0:39:16
it everyone I know who's using it for their process to process communication is rocking with it it's a lot more power and having a name for it like edn there0:39:26
are a lot of implementations now if you've gone to the site people are implementing it in Ruby and Haskell and and Scala and that's good that's good0:39:36
for us but we don't want to be the last people using it okay please alright so programmable inspiration is using data data-driven interfaces the fact that the0:39:47
system failure mode is the failure mode inside your process once your process has become part of a system right this latency and lack of availability stuff0:39:56
is something you have to take seriously and incorporate in your logic right systems are dynamic and data-driven you know is your language well yeah yeah yes0:40:05
and are your protocols yes so in summary there's no silver bullet here right the language of the system is0:40:15
is inherently gonna be emergent it's up to us to make good decisions you know some of those decisions I think have to do with building simple services and I0:40:25
think you know if I had one message to convey to the community it's that I think we were spending a lot of effort on libraries for each other and for ourselves and that's really great to the0:40:35
extent we can build services like like we have Riemann and Storm and things like that those are things that people from other0:40:44
languages can use like they wouldn't even necessarily know it's Clojure just like I think is awesome they may be like how do you know that's Clojure like wow that's great because Clojure can do that kind of0:40:53
stuff I might consider using Clojure for my next thing and plus it allows us to communicate to the world at large you know we're not just like helping ourselves0:41:02
right if we can start Clojure I think is extremely well-suited to building simple services it's fantastic at communication it's great at concurrency it's great at0:41:12
data and if you have an idea for a simple service go for it I think it's a tremendous potential growth area for programming in the large and definitely0:41:22
for the Clojure community right consider how your service represents an abstraction although there aren't good ways to represent that right now I don't think I have to tell you to provide0:41:31
values designing to be composed is really tricky right because I don't think you want to say well let's say you're building a service that's not otherwise storage driven right and then0:41:41
you want to make it durable somehow yeah the temptation will be to just like include durability and and then that's not really good because now you haven't0:41:50
designed your system to be composable so you have to think about how you're gonna let people you know plug in their storage because there's so many pluggable storages right now I really0:41:59
think it's almost a mistake for any service to incorporate storage other than to say tell me which one you want to use so think about that and you know0:42:09
again if you're using Clojure I think you're inherently data-driven and programmable but definitely consider building simple services and that's it0:42:27
[Applause] questions already we didn't even have to wait actually actually two questions first one isn't hypermedia of REST0:42:39
services some way of implementing semantics this is the first one let's take one of course well is the question is REST the way to do0:42:49
this no REST well the specific part of REST hypermedia so sending back links of what the next state could be yes so every time I give this talk which is like twice and this is gonna be the last0:42:59
time I give it so somebody afterwards says why didn't you say REST and and the reasons because REST is like big it's this big idea it's got lots of0:43:10
things and currently mostly it's being used that you can say that's not REST you know that's what you do with REST you say that's not REST it's like this0:43:20
great great question that no one can answer in the particular case of hypermedia I'm I'm skeptical about hypermedia0:43:29
for non-human driven interfaces because I think when you have hypermedia so everyone know what hypermedia is in REST right it's like you're gonna get some0:43:38
information right through through in some representation and there'll be links in that representation that sort of tell you what else you can do and therefore you can discover what else is possible0:43:48
through those links and you know just the whole thing comes up but there's no semantics associated with these links except in very very small vocabularies0:43:57
right now but on a webpage it's easy you can like put some text around it you can say click here to find something from last Tuesday but you know find something from last Tuesday has no standard0:44:06
registered meaning that programs can use the other problem I have with REST is that it's very one-sided even people that expend all the energy to author the0:44:17
REST endpoint don't end up with any consumers who consume it that way everybody's like show me the documentation man I'll like hard hard0:44:26
code anything you tell me and and they don't write this they don't want to write discovery so if you're trying to provide an interface for people who do not want0:44:36
to write a REST consumer you're wasting your time so that said obviously there's tons of things in REST that completely0:44:46
correspond to what I'm saying and and and overall there's complete unity between the two philosophies but the reason I don't use the term REST0:44:55
directly is because I think it's it's too monolithic a term at this point unfortunately all right and the second one might be more consideration when you0:45:05
say we should program with queues in mind and so put our data through like a pipeline where we have queues in between so I've worked with this kind of0:45:15
applications the only thing I can say is yes they scale very well but they're very difficult to troubleshoot when something goes wrong on the other side of the queue there's no0:45:26
consistent way of recognizing the error and take actions that so may be so we need to be more robust in that sense0:45:35
well you know a lot of it is it's not your problem right you look at Erlang in Erlang there's absolutely no way to tell if any message you ever send ever got0:45:44
there and that's their recipe for building door you know important systems so that means that if it was if it was0:45:53
important for you to if it's important for you to see whether or not the thing happened you need to monitor that some way but quite often the thing that needs0:46:02
to monitor it and that can actually make a decision about it is not the thing that puts the stuff on the queue it really isn't and the fact that we use call chains to do that is really weak because0:46:12
you know I called somebody and they can't do it well I don't even know what to do because they can't do it somebody else does and the fact that we flow up and unwind the stack is like we can't we0:46:21
can't really fix it so I understand that but I think the flip side is especially when you use queues in the large and systems is that you get new insight right if you had two0:46:31
programs and they're just like chattering away and I'm the sysadmin for your system I'm like what the hell is going on I had no clue if you give me a queue that I'm0:46:40
administering in the middle I can watch the mailboxes fill up I can see you're getting behind I can fire up new consumer machines I like I have some power I can watch the disk fill up so so0:46:53
I think queues are much more administrable than direct process to process communication and and in any case you're gonna have to instrument your system with feedback that allows0:47:02
you to monitor it yes hi when you talk about storing values in the system in0:47:11
order to provide durable resource or durable services or anything like that yeah do you have a vision for I'm hoping you do a vision for Datomic replacing0:47:23
the sort of late 90s dream of system object databases that were persistent that became these horrible monolithic enterprise systems like Tivoli that nobody uses but something smaller and0:47:33
more agile and more beautiful in that sense yes I mean I wouldn't want I wouldn't want to directly link it to those ideas because there were some parts of those ideas that I disavowed0:47:42
out still like the notion about I mean a lot of those were object oriented and were still objects were still places so I think information models are important that's a different talk but if0:47:53
you're in the information business then you're just recording what's happened right and that's an accumulation kind of process and that's I think we're desperately where we need to go but is0:48:02
it different is a different talk yes the big values you refer to using0:48:12
UUIDs I've looked a lot at how git works and I've wondered like it uses the SHA of the value why do you not0:48:21
use hashing for value names I think that's very cool too totally also very good right there are trade offs there between naming identities and variable0:48:32
naming values differently so sometimes you need to produce the name before you have the value which is a case for UUIDs sometimes you can't afford to0:48:41
calculate the SHA which is the case for UUIDs and no content based addressing for immutable especially immutable large values is awesome the0:48:52
great alternative great also great0:00:00
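The trade-off in that last answer, naming a value with a UUID versus naming it by a hash of its content, can be sketched as follows. This is an illustrative assumption-laden example: the function names are hypothetical, a plain dict stands in for the store, and SHA-1 is used only because it is the hash git has traditionally used for object names.

```python
import hashlib
import uuid


def uuid_name():
    """Name chosen up front: available before the value exists, no hashing cost."""
    return str(uuid.uuid4())


def content_name(value_bytes):
    """Name derived from the value: identical bytes always get the identical name."""
    return hashlib.sha1(value_bytes).hexdigest()


def store_by_content(store, value_bytes):
    """Store a value under its content name; storing the same bytes twice is a no-op."""
    name = content_name(value_bytes)
    store.setdefault(name, value_bytes)
    return name
```

A UUID fits when the name is needed before the value is finished or when hashing a very large value is too expensive; a content name fits when deduplication and verifiable identity matter. Either way a name is never reused for a different value.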
Deconstructing the Database - Rich Hickey
0:00:00
way we think about update and in fact Datomic does not consider update that way so if we look at update I think we0:00:10
have a fundamental question as to what what does it mean to update something you know if you update someone's email address you don't actually change one0:00:19
email address into another email address right there's a new piece of information which is that someone has changed their email address but most of the systems we0:00:28
work with allocate a place for the email address and updating means going to that place and erasing it and putting something different there it's a0:00:37
fundamental premise of this system and and the designs of systems like it that we stopped doing that now we stop doing place oriented programming and you move to a0:00:46
notion of programming that's about information accretion and there's always a question of granularity as well I think as we move to these new storages0:00:55
we have keys and values and so also and we've devolved from what might have been a more nuanced thing in a relational model to something where we get a blob0:01:05
at a key and then we have this problem of what's in the blob and how big is that and or you can lift it up write a column store may let you modify a row or0:01:14
a sequel database may be lets you modify a set of things and transactionally but that whole notion of what's the granularity of update is an open question for every database system that0:01:24
you look at and directly tied into that as a notion of visibility right if I make a change at a certain granularity do you see that at the same granularity0:01:35
or not can you see it while it's happening that's the isolation question when I'm done do you see its entirety or can you see pieces of it and that's a consistency question that's sort of orthogonal to0:01:46
the notion of consistency we heard about in the keynote this morning right everybody can see the same set of data that doesn't actually satisfy our0:01:55
application's notion of what it means for the data to be consistent and that's consistency by CAP but not consistency by business requirements and the two0:02:05
actually both matter so the visibility of consistent change is something we have to concern ourselves with when we get this wrong which we do quite often0:02:15
we have lots of problems right we have programs that are simply wrong they don't produce the right results they don't do the right things we have difficulty scaling I think the0:02:26
other thing I want to talk about today is sort of how can we reach we've had a lot of talks about you know just adopt eventual consistency and get all this great stuff and then the people who are left saying I'd like consistency are like0:02:36
well you have that old database thing and you know where are the choices in between the monolith and eventual consistency and I do believe several0:02:47
speakers today have talked about it's a spectrum and so this is about addressing some points in the middle of that spectrum we have problems of round trips0:02:56
we have problems of overloading the monolithic servers that we want to address I'm gonna talk about the other point the other thing that0:03:05
Datomic tries to pursue is this question of if I want consistency do I have to give up all of the new research that's taught us about these great properties0:03:16
of these stores like Dynamo and things like that what is possible can we combine transactional components with redundant0:03:26
distributed storages and get hybrid systems that have some of the best qualities of each can we get elasticity in query and storage0:03:35
we have proof you know examples of elasticity of storage that's sort of something that we take for granted now outside of monolithic databases but can0:03:46
we get the same elasticity for query and finally you know there are times when consistency matters and I think it's0:03:57
easy to say well the real world is inconsistent and things like that but there's a lot of coordination in the real world as well and again there's the spectrum thing right you can decide0:04:07
I actually don't want any inconsistent data to enter my system but I still want to remain highly available and maybe I'll cache requests until I can make that happen in which0:04:17
case you can use both systems you end up saying I'm in this potentially partitioned world where I accumulate requests for change and then move into a0:04:26
world where I can consistently apply those requests another thing I think is tied to that place notion is is the notion of0:04:35
information and time before we had computers we used the word memory and we use the word records to mean things that were highly enduring and that we never0:04:45
erased right we didn't go back to old records and erase them and write new things and we just wrote new records and we just carved new things in stone and we kept them around and that was a0:04:57
really good thing and when we started to have computers we didn't have much memory and storage was really expensive and we stopped doing that and we of course we had erasable media and it0:05:06
seemed like wow we can just do this different this different technique but it's a technique we never used in the real world prior to computers and which we should abandon now that we have0:05:15
plenty of memory and plenty of storage so you want to move to a model that's information accretion and there's lots of reasons to do this certainly just0:05:26
being able to audit things what happened and why you know if you look at a database that's update in place that's in a sort of weird state how did you get there you have no idea it's just the0:05:36
side effect of the Activity Stream that happened but you don't actually know why it is the way it is unless you've independently kept some sort of log of0:05:46
everything that happened and so I think we can start taking an approach to databases that says we will always keep track of everything that happened and of course analytics people are desperate to0:05:55
have everything that happened they don't like you erasing things anymore and the final sort of premise point here is that I think we want to be careful when we0:06:04
design systems that we give proper consideration for perception as its own thing too often we just have this notion of0:06:13
you know interaction is update and read and update and read and those things are they're tied together if you use a traditional relational database and you want to read something consistent you0:06:22
actually have to read inside a transaction that kind of coordination is unnatural my perception is not a coordinated activity right everybody's0:06:31
free in this room to look at whatever they want or look at whoever they want whomever they want you don't need to get permission or coordinate or anything else light just bounces around and we can all receive it at will and you0:06:42
know in building these systems where we're focused on place we can no longer see consistent things without coordinating about access to place0:06:51
that's something we have to abandon both for consistency and for scale it's a scaling disaster so I'm just0:07:02
going to briefly go through some of these terms I'm going to be using when I say value I mean something that's immutable and it's it's a notion you have to apply both to things like 42 and0:07:11
bigger things like strings and bigger things still like collections and the premise of Datomic is you can take that notion all the way and you can consider0:07:21
the entire database and the entirety of activity that's gone into that database as a value you can maintain and and not have to consider you know corrupting in0:07:31
your hands we want to separate the notion of identity from where we put values and so I'll talk a little bit0:07:41
more about identities but the idea behind an identity is we have notions that we carry through time like sports teams or rivers and things like that but0:07:50
the actual values those identities take on change over time but the things themselves don't change you know the river doesn't turn into another river you don't take the people who are on the0:07:59
team and push them around and so they become the next people on the team there's just another value of the team it's independent of the old one and and0:08:08
so the notion of the team is an identity we're going to associate with values over time we're going to call those values over time States so an identity0:08:17
has a value at a particular point in time that's its state it may take on different states at different points in time and time is just a relative thing0:08:26
that may reflect some causality so that gives us a time model that looks like this you may have seen this in another talk I've given about functional0:08:36
programming versus object orientation this is the same problem precisely the same problem in the storage space so this is what I was talking about0:08:46
graphically represented the identity is the dotted box it represents a succession of states is this work no that's too tiny0:08:55
it represents a succession of states each state is a value right we're going to move from state to state by transformation functions right some sort0:09:04
of process events and we want to be able to observe States and once we've observed the state we want to feel like we have a value in our hand and the fact that time is proceeding shouldn't affect0:09:14
our memory of what we saw and it doesn't in the real world right you know when things change in the world that they don't go into your brain and also simultaneously update your memories you0:09:24
have memories of things that they're always the past in fact we always perceive the past that's how things work so how do we implement values in memory0:09:33
we use persistent data structures they're always trees and they use structural sharing they look like this I'm not going to spend too much time on this but the idea being you can0:09:43
represent anything you can represent sets and maps and vectors all as trees and once you've represented something as a tree then you can leave that tree as0:09:53
an immutable thing and when you want to change the value right and particularly here we're talking about aggregate values so like an entire collection you want to add something to it you don't0:10:04
copy the entire thing you instead make a new tree which will at least involve a new root and a path to the part of the tree that you had to make anew and that0:10:14
new tree can share with the old tree a whole bunch of stuff and in this way incremental change is inexpensive but every particular value of the collection0:10:24
is immutable itself and you know this right from linked lists you can build linked lists where you have a linked list and it says you know BCD and you0:10:33
can link a to the rest of that and it doesn't impact the rest of the list at all now I have a list that's BCD and you have a list that's ABCD and we share the tail there's no problem with that as0:10:44
long as no one's ever going to change that in place and that's how persistent data structures work so the problem we have and this is not just about SQL0:10:54
databases this is a general problem with databases that work this way is that too often the database is a place it's this thing and maybe a lot of places and0:11:03
maybe a set of places right in a key value store maybe a set of places but still fundamentally a place where we get some sort of connection to the place we can send a request to the0:11:13
place in order to update it you know update parts of it and that could be transactional or not and then we issue queries and0:11:22
each time we issue a query we get this random result out of the contents of the place and the same query later will give us something different we never have anything more concrete than that that we0:11:32
can hang on to it's just this black box gives novelty potentially every time we interact with it and there's a lot of0:11:41
problems with this you'll recognize this is the same problem as objects they have the same problem they've conflated right identity and value they're both the same0:11:50
thing this place is both an identity and where we keep the values which means there's no way to get the value independent of the place the identity0:11:59
should take on new states but I can't I mean unless I have some copy semantics I can't really do that and copy semantics again introduce that coordination0:12:08
problem and everything else and heaven forbid you try to copy an entire database like that so what we want to do is we want to adopt this same model the same model that corrects the problems of0:12:18
object orientation for databases right and we just want to replace that database place with database values0:12:27
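
[Editor's sketch] The identity, state and value model described above can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not how Datomic or Clojure implement it: `Cons`, `Identity`, `deref` and `swap` are hypothetical names chosen for illustration. It shows the linked-list example from the talk (a list BCD, then A linked onto the front) and an identity that moves from one immutable value to the next while an observer keeps the value it already saw.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cons:
    """One immutable cell of a persistent singly linked list (hypothetical name)."""
    head: str
    tail: Optional["Cons"] = None


def to_list(cell):
    """Walk the cells and collect the heads into an ordinary Python list."""
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out


class Identity:
    """A reference that takes on a succession of immutable values over time."""

    def __init__(self, value):
        self._value = value

    def deref(self):
        # Observation: hand back the current state, which is itself a value.
        return self._value

    def swap(self, fn):
        # Process: compute a new value from the old one, then point at it.
        self._value = fn(self._value)
        return self._value


bcd = Cons("B", Cons("C", Cons("D")))
roster = Identity(bcd)

seen = roster.deref()                    # an observer takes a value
roster.swap(lambda old: Cons("A", old))  # the identity moves to a new state

print(to_list(seen))            # the observer's value is unchanged
print(to_list(roster.deref()))  # the new state shares its tail with the old one
```

The new list reuses the old cells as its tail, so the change costs one new cell, and nothing the observer holds is modified in place.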
right so fundamentally we don't want to interact with the connection obviously you have to go through the connection to get a value but once you've gotten a value you should be able to interact with the value not keep going back to0:12:37
the collection to the connection and in that way you can do stable operations you can communicate things to other processes that are stable you get all0:12:46
the benefits you get from values for databases that's what we're shooting for so this is a traditional database I don't know if you can read that text or0:12:55
not but you know what's in it right there's this server it's this monolithic thing it does transaction management it does indexing it does i/o it manages0:13:05
storage and it handles query requests and you're separate from it and you send it strings or something and you get back strings or something and you're very0:13:14
terrified of overloading this thing so you stick caching over it and the caching is all on you right what do you put in the cache you'd put the answers to0:13:24
questions hoping that maybe you'll ask the same question later and expiring that and determining the the policies for it is all your problem and the0:13:35
problem with this is that if we want to scale this we have to make that relatively complex thing bigger right and we know there's limits to how big0:13:45
you can make one thing and then it's going to burst or we're gonna have to make copies of that thing and this is already a really complicated thing so we don't actually want a lot of really0:13:54
complicated things it doesn't get simpler when we do that so this has gotten correctly dinged as something that's difficult but not impossible0:14:04
to scale but certainly even if you can successfully scale it it does not become simpler it's very complex so if we're0:14:13
gonna fix this we need to make a couple of choices the first choice is to be careful about when coordination is actually required right how much coordination do we need where do we do0:14:23
it right and for what purpose process which is the term I'll use to mean the acquisition of novelty of novel information is something that requires0:14:33
coordination in the end I don't care if it's transactional or eventual consistency that thing at the end that's gonna merge together your stuff that's a0:14:42
form of coordination right there has to be rules that govern what's allowed and what's not somebody has to be responsible for doing it we can't all do it right one person has to eventually0:14:51
say I'm taking these two things and turning it into this merged result so there's coordination associated with process but perception as I said before0:15:00
is something we want to avoid coordination for and we want to remove that I'm going to say that immutability is a fundamental premise of solving this0:15:09
problem if you don't use it you can't solve it period there's just no good way to do it and you'll see the benefits as we as we go through so the approach0:15:19
that's taken by Datomic which is trying to solve a bunch of problems right you may not want to solve all those problems and you may have a subset of them in which case some of the things0:15:29
that Datomic does may apply to your architectures independent of the others but because we're trying to do all these things specific things will appear in our0:15:39
design so we want to move to an information model we want to move away from a place-oriented model to information I'll talk about that in the next slide we want to split apart process change acquisition from0:15:51
perception queries reading things like that transactions perception queries they should not be co-located they0:16:00
shouldn't be intermingled we they need to be taken apart we want to use storage in a way that's fundamentally immutable there's a bunch of architectural0:16:09
advantages to that there's a bunch of informational advantages to that and we're going to also supplement storage with memory so that we can do this efficiently and that's sort of an0:16:19
implementation technique I'll talk about ok so what do I mean by information well the word inform means to convey knowledge via facts and information is0:16:29
just the facts right so we want to build a system that stores facts and fact means something very specific a fact is0:16:40
something that happened in the world right there are other things you might use storage for other than0:16:50
information right sometimes you need a place to keep stuff Datomic is not about keeping stuff it's perfectly fine to need to keep stuff and have systems0:16:59
that just keep stuff but if we're trying to build an information system we need to really actually be aware of what that stuff is so we're going to say this stuff is facts which means something0:17:08
that happened it's it's a fundamental part of the word the word is derived from a past participle it means something that already happened and one0:17:17
of the key things that falls out of that is facts cannot be changed they're recordings of what happened in the past you don't change facts you0:17:26
don't update facts or anything like that what do you do you accumulate new facts a fact must have occurred at a point in time so you have to have some path to0:17:35
time and it's important if you're going to build a system that's about the accretion of facts that your structural representation is minimized0:17:45
right you don't want to have this big composite thing and say I need to add a fact to it like in the middle here and store this whole thing to get that0:17:54
new piece of novelty in you need to actually boil down your data representation to be that primitive thing and we call that a datom it's just0:18:03
an entity an attribute a value and some path to time we use the transaction because it's also a path to other information about what happened0:18:12
including provenance or causality or operations or anything else like that so the fundamental difference here from a place-oriented database is that we're0:18:21
going to consider a database to be a value right but we know things change over time so how do you have something that's both a value immutable and have0:18:31
novelty over time and the analogy I make here is to tree rings which also never show up that well on the slides a tree grows and there's new rings0:18:40
and they get added right to the outside but if you had a view of the middle of the tree the fact that new rings are being added0:18:49
it doesn't impact you in other words any particular value of the database is unimpaired by the novelty that comes later right that view is still stable0:19:00
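
[Editor's sketch] The idea of a database as an accreting set of facts, where any earlier view stays stable, can be sketched as follows. This is a toy model under stated assumptions, not Datomic's API: `Datom`, `Db`, `transact`, `as_of`, `value_of` and `history` are hypothetical names, the example data is invented, and "the latest fact wins" is a simplification for a single-valued attribute. The tuples here are copied on each transaction rather than structurally shared.

```python
from typing import Any, NamedTuple


class Datom(NamedTuple):
    """One fact: entity, attribute, value, and the transaction that recorded it."""
    e: Any
    a: str
    v: Any
    tx: int


class Db(NamedTuple):
    """An immutable database value: every fact up to and including basis_tx."""
    datoms: tuple
    basis_tx: int

    def value_of(self, e, a):
        # Simplifying assumption: the most recently recorded fact is current.
        hits = [d for d in self.datoms if d.e == e and d.a == a]
        return max(hits, key=lambda d: d.tx).v if hits else None

    def history(self, e, a):
        # Every value the attribute has had, oldest first.
        return sorted((d.tx, d.v) for d in self.datoms if d.e == e and d.a == a)


def transact(db, facts):
    """Return a NEW database value with the facts added; db is left untouched."""
    tx = db.basis_tx + 1
    new = tuple(Datom(e, a, v, tx) for e, a, v in facts)
    return Db(db.datoms + new, tx)


def as_of(db, tx):
    """Return the database value as it stood at transaction tx."""
    return Db(tuple(d for d in db.datoms if d.tx <= tx), tx)


db0 = Db((), 0)
db1 = transact(db0, [("sally", "email", "sally@old.example")])
db2 = transact(db1, [("sally", "email", "sally@new.example")])

print(db1.value_of("sally", "email"))            # db1 still sees the old address
print(db2.value_of("sally", "email"))            # db2 sees the new one
print(as_of(db2, 1).value_of("sally", "email"))  # the past is recoverable from db2
print(db2.history("sally", "email"))             # a question that crosses time
```

Nothing is overwritten: changing the email address adds a fact, so anyone holding `db1` keeps a stable view, and `db2` can still answer what the address used to be.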
I can say that's a value and it meets all the criteria of a value it's immutable it doesn't change I can convey it potentially to somebody else and we'll see how that works a little bit later so0:19:11
a database is about accretion of facts and in that way we get something that both changes right it grows bigger and still feels immutable to the consumers0:19:21
because anyone who's looking at a particular intercept or past you know a point of time and before that has something that's perpetually stable to0:19:32
look at that means that process novelty new information requires new space right this is a physics problem right there's0:19:41
just no way to get new stuff and keep old stuff and not have new place you know new storage for it on the other hand we're doing this already all right0:19:51
how many people keep your source code in the file system in a directory you just like you changed you changed code and you just store it over the old code I always get one person raises their hand0:20:01
like crazy no we don't do that right we don't do that we but there was a time when we're like oh my god keep every version of every source file we have0:20:10
there's no way we can do that and we're gonna fill our floppy right it's not like that anymore right systems in the0:20:19
time I've been using computers are a million times more capacious than they were that's not an exaggeration that's the actual number a million times more0:20:28
capacious we don't need to be worrying about space and of course we're already doing this everybody's keeping everything everybody is logging everything we're keeping it around and in terms of the0:20:39
kinds of information you would keep in a database like this it's certainly no burden to acquire new space and we move away from places by doing this now0:20:49
I showed you a picture before about how we do persistent data structures in memory right we have this tree and we say I want a new version of the tree and I create a new root and I sew together0:20:59
a new path to the new data and I point at all the old data and because I did that in my program I sort of had this implicit handoff I had0:21:09
the old version of the tree and I made a new version of the tree and I had that in my hand and garbage collection you know got rid of the other one if no one was looking at it we can't actually do0:21:18
that on the disk because that would leave us with a new root for every change to the database and0:21:29
we would instead of having this ever-growing tree we'd have a whole bunch of independent snapshots and no snapshot would necessarily contain the0:21:38
past so instead what we want to do is say every value incorporates the past as well it's not a whole bunch of snapshots and this ends up being really important0:21:47
for the information model as well because you really want to issue queries across time right it's a terrible shame when we use databases that force you to overwrite the email address because when0:21:58
you have a problem it's like you never notified me about my shipment oh I don't know what's your email address well that's the email address I have no0:22:07
well if I could look at the database at the point in time I notified you I could look and say oh we used to have this other email address and that's where I sent it or this other physical0:22:16
address and that's where I sent it when we update in place we lose the ability to do that we also lose the ability to answer questions that cross time right what's happening we have a supplier and0:22:26
they they change their prices and if we're using a place oriented database we just update the prices in place then the business person comes to us and says this supplier really seems to be jerking0:22:36
us around or maybe they say you know there seems to be a seasonality to their pricing I wonder if we could game that and get better pricing by ordering at different points in time what's the history of the pricing I0:22:47
don't know every time they give us a new price we update the price in place I have no history of that if you keep everything you'll be able to go and say oh look at this every June they raised their prices let's order in May I'm not0:23:01
going to talk too much about process but the critical thing about a design like this is that when you have novelty you don't want to have effects you don't want to say something changed in the world0:23:11
let me just affect things you want to take that change and turn it into something concrete that you store right event sourcing for anybody who's used to it is a representation of that idea but the0:23:21
general notion is that you have to reify process you have to turn it into a thing that you can look at and touch whether that's a log or whether that is inside0:23:30
the data itself which is what happens in Datomic you do want to keep track of that and you want it to be minimal right you're gonna keep track of everything you can't say every time you0:23:40
change the single thing you have to store a new row or a new document that's that doesn't that doesn't work so we're going to break this apart this is the0:23:49
old database that did everything in one place and the first thing we do is we partition it into process and perception right processes the transactional part the coordination required to do that0:24:00
especially if we want to have a consistent system that means we only want to have some changes enter the system that are consistent with the business rules for the data and we have0:24:09
perception which is the query side so we have the problem of how do we represent state and it's a critical thing here I think that when we're talking about data0:24:19
stores and databases right we used to have no databases right and then we had file systems which let us put stuff somewhere with a name or path0:24:28
and get back to the stuff but we did not call them databases then eventually we had things we call databases and those databases did something that the file systems didn't right they gave you0:24:39
leverage over the information that they were storing they knew something about what was being stored right they imparted some sort of organization to what0:24:48
was being stored so that when you want to find something specific or get an answer to a particular question there was some leverage to apply beyond we'll just go look at every single byte in the0:24:58
thing and figure it out right and it's that leverage that makes something a database so when we talk about storing the state I think we want to talk about storing the state in a way that's0:25:08
organized such that we can get leverage and I would call I would characterize query as leverage so we're going to just say the database is a sorted set of0:25:17
facts and in fact it's multiple sorted sets of facts right and we know from systems like BigTable that that's not0:25:26
something you can efficiently do live into storage right if every time you had a new piece of information you needed to modify your entire index to put it in0:25:35
the middle of it and do that in a mutable way you would churn through storage just relentlessly it's not practical so we have systems like0:25:44
BigTable right what does BigTable do it treats storage completely immutably right but what it does is accumulate novelty in memory till it's got a block of you know a certain size 64 megs or0:25:54
something like that and then it blits that out to disk and starts accumulating more stuff in memory that has nothing to do with durability while it's doing that it could be logging everything as it0:26:04
comes in right so the durability for the purposes of can I restart can I make sure I've not dropped anything has nothing to do with this this is about indexing right0:26:13
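
[Editor's sketch] The accumulate-then-flush pattern described above can be sketched in Python. This is a toy illustration under stated assumptions, not BigTable's or Datomic's implementation: `SortedStore`, `flush_at`, `add` and `scan` are hypothetical names, "storage" is just a list of immutable tuples, and flushed segments are never merged into one another here. It separates the append-only log (durability) from the sorted in-memory novelty and the immutable sorted segments (indexing), and answers reads by merging the two without modifying either.

```python
import bisect
import heapq


class SortedStore:
    """Toy store: a durability log, sorted in-memory novelty, immutable segments."""

    def __init__(self, flush_at=4):
        self.log = []        # append-only record of everything, in arrival order
        self.live = []       # novelty kept sorted in memory
        self.segments = []   # immutable sorted blocks standing in for storage
        self.flush_at = flush_at

    def add(self, fact):
        self.log.append(fact)           # durability: nothing is dropped
        bisect.insort(self.live, fact)  # indexing: keep novelty sorted
        if len(self.live) >= self.flush_at:
            # Write one whole sorted block; existing blocks are never modified.
            self.segments.append(tuple(self.live))
            self.live = []

    def scan(self):
        # Read side: merge the stored blocks with what is still in memory.
        return list(heapq.merge(*self.segments, self.live))


store = SortedStore(flush_at=3)
for n in [5, 1, 4, 2, 3]:
    store.add((n,))

print(store.segments)  # one flushed block, sorted
print(store.live)      # novelty not yet flushed
print(store.scan())    # a sorted view across both
```

Because the flushed blocks are immutable and the merge only reads them, a reader needs no coordination with the writer beyond access to the blocks and to the current in-memory novelty.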
you accumulate novelty in memory you could also log that then periodically you put that into storage and a process integrates the novelty in a batch way0:26:23
into the index in the case of BigTable that's a merge join that's done in the file system and in the case of Datomic0:26:32
that's a merge of these trees also done in storage so it works the same way you accumulate novelty in memory occasionally you put it in storage and you use persistent tree merge to do it so0:26:43
indexing is just this merging it just says I've accumulated a certain amount of novelty in memory I can now amortize the cost of integrating that in the tree0:26:52
right so instead of making a new root for every new piece of information I now say I've accumulated a whole bunch of information and create a new root and amortize the cost of making the inner0:27:01
leaves to accumulate the stuff so it's the same thing it's just like BigTable we merge them together so this is what transaction processing and indexing0:27:11
looks like transactions take novelty as it comes in it immediately logs that again that's the durability side it's not really the organizational side and0:27:20
keeps novelty in memory where it's organized on-the-fly and sorted in memory so it can answer questions from memory and that periodically this merge0:27:32
job will take that live index from memory and integrate it into a new tree in storage that tree sharing structure just like the other picture right and0:27:43
then we can look at the perception side now the perception side means I want to ask you and ask a question and get an answer I want to ask query like0:27:53
questions and I want to get answers at the speed that indexes can help me get well in order to do that I need access to storage right because the last stable0:28:03
index in storage is sitting there plus I need access to the Delta right what's happened since then from memory and it's the same thing BigTable does right if0:28:12
you ask BigTable a question it does a live merge-join between the filesystem and what's in memory and Datomic works the same way it's a live merge-join between the live index and0:28:21
storage the only difference is it's a tree instead of flat files the key thing here though is what coordination is0:28:30
required to do this none the exact right amount zero right there's no talking to0:28:41
the server there's no need for it for a transaction right the stuff in storage is immutable the stuff in memory is immutable uses the same technique one0:28:51
you know tree here and a tree there the join is stable there's no coordination required at all as long as you have read access to storage and access to the live index0:29:01
you're good so the components of this system roughly are there's a transactor which coordinates requests for change0:29:10
there are peers and these are actually your application servers because now you're no longer tied to this big database server we're going to empower application servers with query capabilities and direct access to0:29:20
storage and some sort of storage service so that whole thing looks like this if we start in the bottom right we see a0:29:31
storage service it's quite an interesting aspect of Datomic that it's not in the business of disks at all right there are lots of good storage services that0:29:42
already exist right if you take a systems approach to the problem of how do you make a new kind of database you say I should be reusing those I0:29:51
shouldn't be re implementing DynamoDB and I shouldn't tie the way I use storage to the details of how I do transactions or queries I should use0:30:00
that a la carte and so that's what happens Datomic can run in memory it can run against a SQL database it can run against DynamoDB it can run0:30:11
against Infinispan and similar in-memory grids it can run on top of Riak as we'll talk about later right and in each case we can make decisions about storage0:30:21
that are orthogonal to the other decisions we make that's exactly the point that was made in the in the keynote this morning right a systems approach will give you choices about how0:30:31
you integrate these systems and when you use the different parts if we move to the left we see the transactor it does transaction coordination so anybody who has some novelty they want to add to0:30:41
the system will supply it to the transactor the transactor has no storage at all it is strictly a coordination thing it will take the novelty integrate it0:30:51
into the live view and put it into storage when it puts it into storage directly that's a logging appending kind of process it's the indexing that's going0:31:01
to build the sorted view that gives us the leverage in the case right now the transactor also does that periodic indexing but0:31:10
that could be moved to a different machine finally if we look at the top we end up with the application server process which now is empowered with both0:31:20
read access to storage and its own query engine so we've relocated query from a monolithic place where we go to answer0:31:29
questions to everybody gets their own brain and once you go to everybody gets their own brain you now have a system that's scalable right if you have bigger0:31:39
load and you have to answer more questions you can just add more servers here which you're already adding because you're adding those servers as the load increases anyway and that's elastic0:31:49
right as you don't care anymore you just have them go away the other thing that's interesting about this is the way caching works right we0:31:58
saw a picture of cache in the traditional database model before and what did we put in cache we put the answers to questions in cache hoping0:32:07
maybe we'll ask the same question again later and we're doing it because we're trying to keep the burden off of this single monolithic server or cluster0:32:16
of servers now what gets cached and this happens automatically under the hood if you just configure please use memcached is that the sources of answers get0:32:25
cached in other words the actual pages of indexes from the storage get put into memcached which means that all the queries have access to the resources0:32:36
they need to answer questions from memory directly what actually gets put in storage is not the individual facts right when we looked at those trees0:32:46
before what actually is getting put into storage are chunks of index right segments just like the blocks that a traditional database puts in a file0:32:55
system in a b-tree in the file system Datomic puts chunks of index into storage and all it needs from storage0:33:04
is just key-value style access and we'll talk about the consistency model in a second so I talked about the transactor right it does accepting0:33:13
transactions the other thing it does that may not be obvious is it also rebroadcasts to the peers the novelty so that they can maintain their own live index in memory and therefore they can when0:33:23
they do their queries they do their own merge joining okay so the transactor does that it does the background indexing indexing creates garbage so we0:33:33
end up with the notion of garbage in storage just like we do in memory right we just acquire new memory with new right that's a great thing but it0:33:43
creates garbage and it shouldn't be surprising when you move to an immutable process where new information requires0:33:52
new storage that you end up with an analogous thing on disk you end up with garbage on disk and garbage collection for disk that's fine I talked about this0:34:04
pretty much already peers have direct access to storage right and these storage systems right you can imagine something like DynamoDB or Riak these are highly scalable redundant0:34:13
distributed highly available systems they're quite capable of serving an equally scalable set of readers at very0:34:23
high speed so they have direct access they have the query engine they do the merging and there's extensive caching what's great about the fact that0:34:33
everything we're putting in storage is immutable is we can cache it relentlessly and cache it anywhere you want it's never going to change so you0:34:43
can see from this that we've now sort of teased apart some of the CAP stuff right we end up actually with one consistency and availability model for0:34:53
writing and a different one for reads and queries how many people saw Mike Nygard's talk this morning that was a great talk right so Datomic0:35:04
actually is like loopholes two eight and ten all put together so we have a traditional availability model for0:35:16
writes right the transactor can have a backup it's high availability only by standby and if you get partitioned you lose availability but it's consistent0:35:26
this system is oriented towards consistency so that's where the trade-off is made there's nothing wrong with making a different trade-off the trick is0:35:35
understanding trade-offs need to be made and making trade-offs when you need to make them and getting you know getting the solution that you need for your business but on the read side0:35:46
it's completely different because of two things one is everything we put into storage is immutable right which means0:35:55
that we actually end up getting consistent reads because it was written consistently right we never have an issue about seeing half of something0:36:05
we're always going to see the entirety of something or we won't see it at all I'll talk about that a little bit more in a minute and then query scales with peers and it scales in an elastic way0:36:15
not in a pre-configured I'm going to have 17 peers and I said that on my configuration file but in a real I just set up you know AWS auto scaling and I0:36:25
get more or fewer as as load goes up and down so talk a little bit about the memory index it's this persistent sorted set it has big internal nodes just like0:36:35
the one on disk not quite as big as the one on disk and there's a couple of sorts we sort by entity and we sort by attribute that means you get the effect of something more like a document store0:36:46
when you go via the entity the orientation and you get something much more like a column store when you go by the attribute orientation that keeps all the values of email next to each other0:36:56
so you don't have to pull whole records in there's no notion of record right there are these datoms and you can store them different ways storage itself is0:37:05
the same kind of thing right it's this tree right we store the log as a tree we store the indexes as a tree and they're fully covering indexes so I used the0:37:14
word index but it's not actually a pointer to something else it's a covering index all of the data is in each index it's just sorted in different ways and then from the storage0:37:24
service we have very basic requirements we have to be able to put things in and get them back by key and that's why I call it storage right because at this point that storage is not looking like a0:37:34
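The two sort orders described here can be pictured with a small Python sketch. This is only an illustration of the idea of covering indexes; the tuple layout, the sample data, and the helper names `entity` and `column` are assumptions made for the example, not Datomic's actual structures or API.

```python
from bisect import bisect_left

# Hypothetical datoms as (entity, attribute, value) tuples.
datoms = [
    (2, "name", "Bob"),
    (1, "email", "ann@example.com"),
    (2, "email", "bob@example.com"),
    (1, "name", "Ann"),
]

# Two covering indexes: the same data, fully present in each, sorted two ways.
eav = sorted(datoms)                                     # entity orientation
aev = sorted(datoms, key=lambda d: (d[1], d[0], d[2]))   # attribute orientation

def entity(e):
    """Document-style access: all datoms for one entity sit next to each other in eav."""
    out = []
    for d in eav[bisect_left(eav, (e,)):]:
        if d[0] != e:
            break
        out.append(d)
    return out

def column(a):
    """Column-style access: all values of one attribute sit next to each other in aev.
    A linear scan keeps the sketch short."""
    return [d[2] for d in aev if d[1] == a]

print(entity(1))
print(column("email"))
```

Because each index holds every datom, neither lookup has to follow a pointer to a separate record.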
database it's looking like storage and there's nothing wrong with that right it's very important that this component is there it has the qualities that it0:37:43
has so we put values in under keys and the keys are just like UUIDs that label the immutable blocks of index0:37:53
that go in there and there's a couple of cases where we need consistency in order to support the consistency model above there's no magic trick that I can get consistency out of inconsistency and0:38:04
I'll talk about that more in a context so I just want to show you this picture again this is happening on disk but but not immediately each change right this0:38:13
notion of trees will move from one to another so in memory this is what it looks like right we have some immutable thing it's inside a box that box is0:38:24
called an atom in Clojure but it doesn't really matter you can consider it a pointer right things that are mutable are always pointers and the things they point to are always immutable that's the0:38:34
recipe for that epochal time model I showed you before and actually it was said in the keynote this morning right0:38:43
you talked about the way you use immutability and pointer swap that's how Datomic works exactly that so in memory we have this pointer to something that's0:38:52
immutable which itself points to the memory index which is another one of these trees that's immutable and it does the same thing and then it points to the tree in storage and I'll show you that0:39:03
here so in storage it looks like this there is a cell in storage an entry which is the identity right how do you0:39:13
find a database in storage if it's always immutable and there's always new values how do you go from I'd like to open the you know customer database to one of those trees well there has to be0:39:24
at least one mutable thing which is the customer database is there a pointer there's one of those in fact there's probably four of them per database0:39:33
that's it that's all the mutability you need to make a database that point to an entire tree which is itself a pointer to0:39:42
a set of trees one for each sort we can also store Lucene data the same way and then that is a bunch of blocks just like b-tree blocks that form this tree0:39:53
with wide branching factor and those all get stored as blobs in the storage alright so all the segments of this tree are stored as blobs in storage so I wanted to sort0:40:04
of tie this into the talks today and actually talk about one particular implementation of storage under Datomic that's interesting because we0:40:14
desperately wanted to support Riak it's very popular it's a very high-quality product we have a lot of customers who are interested in using it but Riak is0:40:23
only eventually consistent at the moment so it's not actually a store that can satisfy all the requirements of Datomic so in order to make Datomic0:40:32
run on Riak we had to build another storage service and have two other services I think this is the best thing ever right because I don't want to write any of this stuff but I love the fact0:40:42
that things like Riak and in this case we chose ZooKeeper exist do one thing do one thing really well have really great semantics and are things you can0:40:52
use as building blocks this whole notion of well I'm using Riak and therefore my world is Riak you don't need to do that right Riak is a tool you can use0:41:02
it for its own benefits and use it as a piece of a bigger composite thing and that's what this does so Riak has these properties it's redundant it's highly0:41:12
available it's elastic it's distributed it's durable but it's eventually consistent we need to supplement that because we need these pointers we need0:41:22
to store these pointers somewhere and they need to be written in a consistent and actually CAS way and so ZooKeeper does that ZooKeeper is both0:41:31
redundant and durable and consistent it doesn't scale like Riak does it's not actually for that kind of storage it's for very small amounts of storage but0:41:40
that's what we have I have this beautiful thing we have all this immutable data and possibly tons of it with the tons of readers we have a tiny tiny tiny little bit of mutable data it0:41:49
has to be manipulated consistently and is very infrequently read that's exactly what ZooKeeper is for it does exactly that job so it looks like this we keep0:42:01
the values in Riak everything we put in Riak is immutable it's those big chunks of index and we put the identities in ZooKeeper a couple of pointers per0:42:12
database that point to where roots are in Riak and we use ZooKeeper's CAS semantics to make sure that they're updated in a consistent way0:42:22
and viewed in a consistent way so everything you put into Riak is immutable these are some details how many people know about Riak or Dynamo okay0:42:31
so it's just a little bit of information about that if this makes no sense to you it's fine right but we can presume you know N replicas of three right we write with a quorum0:42:43
write of two right but the really really interesting thing is the read side we read R equals one yeah who starts0:42:53
worrying R equals one you wrote you know N is 3 and W is 2 and R is 1 that's not what I read in the Dynamo paper for consistency but everything changes0:43:04
Mike Nygard mentioned this earlier everything changes if all you ever write is something immutable because then there's only two possible things it's there or0:43:16
it's not there's no other possibilities there's no it was updated there's this vector clock there's this causality of how it got to be this way there's none of that0:43:25
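The split between a large store of immutable values and a tiny, consistently updated pointer can be sketched in Python as follows. This is a toy model under stated assumptions: `BlobStore` and `RootCell` are hypothetical in-memory stand-ins for the two roles, not the Riak or ZooKeeper client APIs, and using a content hash as the key is a choice made for the example.

```python
import hashlib
import threading

class BlobStore:
    """Stand-in for the scalable value store: write-once blocks stored under keys."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # assumption: content hash used as the key
        self._blobs.setdefault(key, data)       # an existing block is never overwritten
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

class RootCell:
    """Stand-in for the small consistent store: one mutable pointer with compare-and-swap."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        return self._value

    def cas(self, expected, new) -> bool:
        with self._lock:
            if self._value != expected:
                return False    # someone else moved the root first
            self._value = new
            return True

blobs = BlobStore()
root = RootCell()

def publish(index_bytes: bytes) -> bool:
    """Write the new immutable block first, then swing the root pointer to it."""
    new_key = blobs.put(index_bytes)
    return root.cas(root.read(), new_key)

publish(b"index v1")
publish(b"index v2")
print(blobs.get(root.read()))
```

Writing the block before swapping the pointer means a reader that follows the root should always find the block it names.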
right it's there or it's not if it's there or it's not you can read with R equals 1 as long as you have one additional semantics which is if you0:43:35
don't find it try another guy because as soon as you find any value of it you have found the value of it unambiguously0:43:45
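The read rule described here (ask one replica, and on a miss try another) can be sketched in Python. The `Replica` class and both helper functions are hypothetical names for this illustration; this is not the Riak client API, only a model of why a single hit is enough when values are immutable.

```python
class Replica:
    """Hypothetical replica holding write-once blocks."""
    def __init__(self):
        self.blocks = {}

def write(replicas, key, value):
    """Store an immutable block on the given replicas (others may not have it yet)."""
    for r in replicas:
        r.blocks[key] = value

def read_immutable(replicas, key):
    """Ask one replica at a time. The first hit is the value, since a block is never updated.
    Returns (value, number_of_replicas_asked)."""
    for asked, r in enumerate(replicas, start=1):
        if key in r.blocks:
            return r.blocks[key], asked
    raise KeyError(key)  # every replica was asked and none had it

replicas = [Replica(), Replica(), Replica()]   # N = 3
write(replicas[1:], "segment-abc", b"chunk")   # W = 2, the first replica is lagging

value, asked = read_immutable(replicas, "segment-abc")
print(value, asked)
```

With mutable values a single hit could be stale, which is why this shortcut depends on never overwriting a key.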
that's super efficient and really really clean so we do R equals 1 and Riak has notfound_ok as a flag if you set that0:43:56
to false it means if it's not found on the first read it will try another and only if it exhausts them will it come back and say I'm sorry it's not there that's coupled0:44:07
with another thing which is do we ever look in Riak for something that might be there or might not do we do any speculative lookup no0:44:16
never right we found the root in ZooKeeper said the root is ABCDE you know 1 2 3 4 5 when we go to look for that in Riak0:44:25
we expect it to be there we're not randomly picking a number out of the ether and saying do you have this do you know this you're never doing that0:44:34
you're always saying somebody told me you had this give me it when we get that value we get this block of pointers to other values guess what they should all be there so the combination of0:44:45
immutability and these semantics completely change the way you can use something like Riak and drastically up the consistency that you get because if0:44:55
you have rough availability to the data and your reads are all satisfied you know you are always getting something that is consistent from the application0:45:05
view from what Mike Nygard called the predicative notion of consistency that this set of data matches the business requirements notion of what constitutes0:45:15
a consistent data set you're never seeing half of one tree and half of another tree you started from a root you found all the things that were under it that's a consistent view of the world0:45:25
the full stack for Datomic looks like this I'm not gonna have a lot of time to talk about this except to show you that there is also a restful interface which is more client-server0:45:34
oriented but this is so just some of the cool things that you can get once you have the database as a value right in the first case we're using Java one of these first class peers we say0:45:44
connection get the database and we can say dot as of some point in time we can ask for one of the inner rings and then we get a value of the database that is0:45:53
that set of inner rings right that set of data from that point and prior and we can issue queries through it over and over always with the same basis every0:46:02
query we issue to that value of the database has the same basis and that even works when you go over a client-server protocol like the restful0:46:12
client you actually can have permalinks for databases I want to go back to this outer remember this database I want to tell somebody I think this database was messed up and0:46:23
I can send them a link and three weeks later they can go look at that link and say oh yeah that looks bad let's fix our code and run it against the same0:46:32
database and see if it's better right so it's communicable you can recover it and there's all kinds of things you get from treating the0:46:41
database as a value you can say as of a point in time in the past you can window it right you can take a value of the database and say I wonder what this database would look like if I made these transactions0:46:51
you can do that completely locally right you don't have to talk to the trans actor or the server you don't mess up anybody else say I have the value of the database I'm thinking about putting this0:47:00
data in I'm gonna actually say give me that database with this data and then I can issue some queries and say does the query still work or does0:47:10
this meet my you know requirements okay good now I'll really put it in via the transactor to the version of the database everyone can see the other0:47:19
thing that's key is that everything about database flips around database is an argument to query it's not the ambient container for a query change you're gonna have queries that involve0:47:28
more than one data source including things in memory so I think there's a ton of simplicity benefits you get from this I don't have time to really dig into them but transactions are0:47:39
well-defined we only have coordination for novelty not for perceptions you can put your storage anywhere you want and you have a lot of freedom about how that works and0:47:48
how it scales you can cache anywhere and process is reified which means both you can look at it and see what happened and you can transmit it around and therefore0:47:58
build reactive systems so I think the net approach you get out of this is that things are a lot less complex you get a0:48:07
lot more power you can see the scalability of query and reads you can take advantage of this great technology like dynamo databases and things like0:48:16
that and I think you get an information model that's that's a lot more sound so hopefully this has given you some ideas for your own architectures and thanks0:48:26
very much0:00:00
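The database-as-a-value ideas from this talk (a view as of a point in time, a local "what if" database, and the database as an argument to query) can be sketched in Python. Everything here is an assumption made for illustration: the `Db` class, its `as_of` and `with_` methods, and the `q` function are toy names, not Datomic's API.

```python
class Db:
    """An immutable database value: a tuple of (entity, attribute, value, tx) facts."""
    def __init__(self, datoms=()):
        self.datoms = tuple(datoms)

    def as_of(self, tx):
        """A new value containing only what was known at transaction tx and before."""
        return Db(d for d in self.datoms if d[3] <= tx)

    def with_(self, new_datoms):
        """A new local value including speculative facts. The original is untouched."""
        return Db(self.datoms + tuple(new_datoms))

def q(attribute, *dbs):
    """The database is an argument to the query, so one query can span several sources."""
    return sorted(d[2] for db in dbs for d in db.datoms if d[1] == attribute)

db = Db([
    (1, "email", "ann@example.com", 100),
    (2, "email", "bob@example.com", 101),
])

past = db.as_of(100)                                   # same basis for every query
what_if = db.with_([(3, "email", "cy@example.com", 102)])
in_memory = Db([(9, "email", "local@example.com", 0)])

print(q("email", past))
print(q("email", what_if))
print(q("email", past, in_memory))
```

None of these calls need coordination, because each one only derives a new value from an existing one.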
Reducers A Library and Model for Collection Proc - Rich Hickey
0:00:00
thank you all for coming I'm gonna talk about reducers today as you know this is0:00:10
a substitution because somebody couldn't make it to talk about the LMAX Disruptor so they got me in because they0:00:19
want to make sure they still had a talk that would filter out only the most hardcore people I don't know if you'd get a t-shirt when you leave or0:00:28
something but that's where you are just quickly before we get started how many people here know Clojure how0:00:37
many people do not know Clojure of the people who don't know Clojure do you use a language like Scala or Haskell that has higher-order functions who here0:00:46
does not use a language or have access to a language with higher-order functions nobody alright self-selecting crowd it's perfect while0:00:56
this library is fundamentally a Clojure library the ideas that underlie it would cross any language that has higher-order functions and collections so the motivation for0:01:07
this work is simple it's performance there are two things we're looking to get by moving to a different framework0:01:18
for manipulating collections one is just to have some efficiency improvements over lazy collection0:01:27
processing which is what Clojure does by default but the bigger motivation is to try to move to a set of collection operations that can be parallelized and0:01:37
eventually leverage fork/join because as we know computer clock speeds are stuck and I think everybody knows the old joke about you know how do you make your0:01:46
program faster you wait you wait 18 months and the new computers make it faster but that joke is now not funny0:01:56
anymore right because that's not happening anymore and so the trick is for Clojure and all0:02:05
the programming languages how do we get that back I mean they are making denser chips and the chips have more cores on them but that's not necessarily helping oh it's not helping a sequential program0:02:14
at all right a sequential program with the same clock speed on a new computer does not get any faster and we've been sitting in sort of a queasy interim0:02:24
period here where you know our businesses are buying new computers and our software is not getting faster but eventually they're gonna be like hey wait I do remember 10 years ago when every0:02:34
time I gave you a new computer all the software was twice as fast that's not happening anymore and what are you going to do about it and and it really matters0:02:43
all programming languages that want to be viable in the future have to have an answer for this so what we're trying to do here is provide a model for0:02:54
collection processing that's as similar as possible to what we have already because we like that and people are familiar with it and to leverage parallelism there are0:03:06
two big inspirations for this library one is the Haskell iteratee and enumerator work this is some great0:03:15
stuff the papers are linked to from that first link describing the work but some of it's pretty dense admittedly but it was the first thing I0:03:25
remember that took the reducing operation and really reified it into a thing that you could talk about and say these are the properties of that thing0:03:34
the iteratee the other thing that was really inspiring was Guy Steele's talk from 2009 if you haven't seen it you should definitely watch it0:03:43
he's so mild-mannered but essentially it was a condemnation of what we're doing okay basically saying all of the sequential processing we're doing it0:03:52
doesn't matter what language you're in whether you're in Java and using iterator or you're in Clojure or Haskell and you're using map and fold left or fold right these things are all0:04:03
inherently serial and therefore not amenable to parallelism and we have to break free of that but what do we do do we need new collections do we need0:04:13
new algorithms do we need new libraries so just a little bit of history of how we got here again this sort of has a functional programming bent but it's you0:04:23
know that's the history of programming too right Lisp was an early influence and it was built around one data structure called the list and it had a0:04:32
mathematical basis for processing that was based heavily on inductively defined data structures and recursion and Clojure is derived from that as are to some degree0:04:42
Haskell and Scala and in Clojure's case we have seqs and laziness and all of these things0:04:51
lists and recursion are inherently sequential they're about take one thing do something to it then you know connect that to the results of processing the0:05:00
rest of the stuff in a row it doesn't actually matter if you're doing this imperatively by bashing on an accumulator or functionally by0:05:09
generating successive results that those both are still sequential operations it's not like functional programming has made this non sequential you know0:05:20
miraculously but the future is more cores as I said before the speed has to come from parallelism and and do we need0:05:30
something do we need something else from the model so we should look a little bit at the model we started with bashing on accumulators right we had assembly0:05:39
language you had a little register there and you'd go to and it was like a party right just keep going around around bashing on that thing until it had the0:05:49
answer that you wanted then you were done which is a sequential operation and languages like Lisp and and the successors and in functional programming0:05:58
languages lifted that up to higher-order functions operating on lists eventually languages like closure0:06:08
and and others lifted that again to say well you know what we don't really we don't really need your data structure to be a list if you could meet us if you0:06:19
meet the algorithm at some interface or abstraction that the algorithm was happy with we could process you and so that abstraction for Clojure is the0:06:30
sequence but for other languages it might be the iterator or the stream or something like that but the idea was if a collection can turn itself into something that can be sequentially0:06:39
accessed then you could take an algorithm that was built around sequential access and connect the two together so now you have higher-order functions operating on collections and0:06:48
those functions have names like map right map takes a function and applies it to each and every element of a collection and gives you a result for0:06:58
every element as a new collection usually as a new sequence type or list where we want to go in evolving this model is to have even more independence0:07:08
from the collection representation like to get as far away from collection representation as we can but the real key for getting to parallelism is to get0:07:18
out of the order business and that order dependency is really where we struggle right now in the in the way our our0:07:27
current operations are defined so this is a this is that map function I talked about right map being a function that takes a function and some collection and0:07:38
it will you know this is this is a classic definition in Lisp but it basically means take the result of applying that function to the first item0:07:48
in the collection and attach that to the result of mapping that function on to the rest of the collections this is the classic recursive definition of map and0:07:59
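The classic recursive definition just described can be written out as a Python sketch. The cons-cell helpers (`cons`, `first`, `rest`, `EMPTY`) are assumptions made so the example is self-contained, and this version is eager, whereas the Clojure function discussed in the talk is lazy.

```python
# A singly linked list built from cons cells: (head, tail), with None as the empty list.
EMPTY = None

def cons(head, tail):
    return (head, tail)

def first(lst):
    return lst[0]

def rest(lst):
    return lst[1]

def map_list(f, coll):
    """Apply f to the first item and attach that to the result of mapping f over the rest."""
    if coll is EMPTY:
        return EMPTY
    return cons(f(first(coll)), map_list(f, rest(coll)))

numbers = cons(1, cons(2, cons(3, EMPTY)))
print(map_list(lambda x: x + 1, numbers))
```

Even in this short form the definition consumes items in order through `first` and `rest` and builds a concrete linked list with `cons`.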
while it's a very small function basically a one-liner it does way too much right and and it0:08:08
possibly promises too much first of all it works recursively it relies on order in several respects right so that called0:08:17
to first and the call to rest is utilizing an abstraction that the collection provides for accessing its elements in order the whole notion of0:08:26
first is an order sensitive thing in Clojure's case this function is also lazy and promises to be lazy so0:08:36
that also has order dependencies right so if it's lazy that means I have an expectation that if I haven't consumed so much of the result only so much of0:08:46
the work has been done again there's order in that sentence you know so much of the work from the start from some point the beginning this this definition0:08:58
is based upon a Clojure abstraction called a sequence but logically the classical definition of map is defined0:09:07
in terms of a list right same thing for Haskell right map is defined in terms of a list and a singly linked list so it consumes a list it's passed a collection0:09:16
in Clojure's case we don't have to have an actual list we can meet at the sequence abstraction or iterable you can say anything like that but it consumes a sequential thing it's defined by saying0:09:26
I will consume a sequential thing and in addition it builds a list right so map has a return value what does it return it's going to call this function it's it's gonna make an answer for the0:09:37
first the first element in the collection and it's going to you know attach that to the results of doing the rest it's actually building a singly linked list in this case and that's what0:09:48
cons does it builds little linked list cells as it goes so this says this does too much now there are other functions0:09:59
that also process collections and process sequences a very important one is reduce depending on your programming language this might be called fold left0:10:09
or foldl and this function takes a function and an initial value and a collection and the logical model is0:10:18
similar to map I mean the classic definition is also a loop over a sequence and what reduce does is it says0:10:27
take the initial value and the first item in the collection and apply the function to them so you get an answer so far then take that answer so far and the next item in the collection and0:10:37
apply the function to that and you get the next answer so far and keep doing that till you process the entire collection different from map where you get an answer for every element in the0:10:46
collection reduce is going to produce one answer for the entire collection so you can for instance say reduce with the function plus with the initial value0:10:57
zero and some collection that's a way to say sum using reduce because it's gonna add the first thing0:11:06
to zero and the next thing to that result and the next thing to that result and so forth and you've now added all the numbers using a higher-order function like reduce in Clojure we've0:11:17
moved away from that definition of reduce based upon a sequence definition notice it's no longer meet me at the sequence abstraction instead this0:11:28
uses a Clojure technique called protocols which is an open form of polymorphism if you're working in a language that has type classes or something morally equivalent to that0:11:38
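The "answer so far" loop described here can be written as a short Python sketch. `my_reduce` is a name chosen for this example; Python's own `functools.reduce` is shown alongside only to confirm the two agree.

```python
import functools
import operator

def my_reduce(f, init, coll):
    """Left fold: f takes the result so far and the next item, and returns the next result."""
    result = init
    for item in coll:
        result = f(result, item)
    return result

# Summing with plus and an initial value of zero.
total = my_reduce(operator.add, 0, [1, 2, 3, 4])
print(total)

# The standard library's left fold gives the same answer.
print(functools.reduce(operator.add, [1, 2, 3, 4], 0))
```

Unlike map, which yields one result per element, this produces a single result for the whole collection.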
it's the same idea there's a there's a there's a protocol that says this is the way you can tell me that you know how to reduce yourself and the cool thing about0:11:48
protocols is you don't have any derivation stuff you can say strings know how to reduce themselves by externally defining this protocol for0:11:57
strings without touching the string class so defining reduce this way is really powerful because now we've gotten some independence of the collection structure right we're not0:12:07
actually meeting at an abstraction we're not meeting at the sequence abstraction reduce itself you know I gave you the example of sum but reduce can build anything you can reduce where each step0:12:18
you know adds something to a collection and the result of reducing is actually another collection that's done all the time so you can really build anything with reduce although another0:12:27
critical aspect of reduce is that it's not lazy so we're stepping away from laziness but it's not going to get us all the way there because we said you0:12:38
know reduce starts with that init and it says take the init and the first thing so those phrases are now still full0:12:47
of order dependence so we need to get away from that as well but at least this definition lets the collection drive it's gotten to some collection independence so I want to talk a little0:12:58
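The idea of telling reduce, from the outside, how an existing type reduces itself can be approximated in Python with `functools.singledispatch`. This is an analogy rather than Clojure's protocol mechanism, and `coll_reduce` here is just a name picked for the sketch.

```python
from functools import singledispatch

@singledispatch
def coll_reduce(coll, f, init):
    """Open extension point: each collection type says how it reduces itself."""
    raise TypeError(f"don't know how to reduce {type(coll).__name__}")

@coll_reduce.register(list)
def _reduce_list(coll, f, init):
    result = init
    for item in coll:
        result = f(result, item)
    return result

# Strings learn to reduce themselves without the str class being touched.
@coll_reduce.register(str)
def _reduce_str(coll, f, init):
    result = init
    for ch in coll:
        result = f(result, ch)
    return result

print(coll_reduce([1, 2, 3], lambda acc, x: acc + x, 0))
print(coll_reduce("abc", lambda acc, ch: acc + 1, 0))
```

The caller only supplies a reducing function and an initial value; how the items are visited is left to the collection's own registration.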
bit about the reducing function for people who may not be familiar with it because this entire mechanism is based around manipulating these reducing functions so we said you could reduce0:13:09
with plus to get a sum so plus is a binary operator it just takes two things and returns an answer and a reducing function is essentially that right it's0:13:18
a binary function that takes two things and returns an answer but there's a certain interpretation or semantics to the arguments to the reducing function0:13:27
that is that the first argument is the result so far and the second argument is the next thing to incorporate into the result and it ends up that that that0:13:39
actually matters quite a bit when you want to try to define things in terms of manipulating reducing functions because plus I mean there's no real difference between two arguments two plus right you0:13:48
know you can swap them around so it doesn't have the semantics but we're reduces use of the function plus has this semantics it considers the first0:13:58
argument to be the result so far and the second argument to be the new input right and I described what reduced us before everybody clear I'm reducing what0:14:08
a reducing function is because it's gonna get a lot thicker fast okay so what we're trying to do now is to is to0:14:17
come up with a new definition of things like map and filter and these in these classic functions that are completely collection ignorant and order0:14:26
independent and so the idea is to build on reduce we already saw that reduce as currently defined it's kind of like a universal collection manipulator and0:14:35
using something like a protocol or a type class means that you can pawn off any collection knowledge on somebody0:14:45
else so now your algorithm is going to be independent of that but we want to avoid the order problems of reduce because0:14:54
map and filter really don't care about order right map is just supposed to produce a result for everything in the collection filter is just supposed0:15:03
to produce you know some new logical collection that might be missing some stuff they don't actually care about the order they just accidentally had the0:15:12
order incorporated but there's this so so we can get that out of out of using reduce by just ignoring the fact that0:15:21
reduce as classically defined has an order you know definition and order dependence in other words if you define map in terms of reduce and you ignore the fact that reduce promises to do it0:15:31
in a certain order well you haven't you haven't been poisoned by that order dependence so we're gonna start by building on reduce and eventually eliminate the order0:15:40
dependence but we have to start by ignoring it we have to make sure we don't write any code that cares about it operating on the first thing first and the second thing second we're still left0:15:51
with the fundamental question though which is if we're gonna redefine map in terms of reduce what should we return right because we had to ordering problems and to collection dependence0:16:02
problems with map one was on the input side we depended on taking a list or a sequence or something that could meet us there the other was on the return side we had to build something concrete and0:16:12
and it ends up that the answer to this question is to define map in such terms that it doesn't actually build anything concrete and that takes us to the0:16:24
fundamental idea the fundamental idea is that we can define map and filter instead of as functions of collections0:16:33
to collections that produce concrete collections as something that instead takes a collection and changes what0:16:43
reduce means for that collection in other words you have some collection maybe later you're gonna reduce it if0:16:52
I map some function on that collection I could implement map by just changing what reduce will mean later for that collection and and every time I get to0:17:05
this point in the talk I get the same look from the audience they're like what just happened what are you talking about it was like so good now it's all bad and0:17:16
so the very first time I gave this talk right now I made up a thing and it seemed to have worked it's worked three times so I'm gonna do it again I'm gonna talk about the guy who makes pies so there's this0:17:27
guy who makes pies he's the pie maker guy and he has an assistant the pie maker assistant guy and the assistant has a bag of apples0:17:38
right and the pie maker says to the assistant I'm gonna make pie out of those apples but when I do you have to0:17:47
make sure that you take the stickers off the apples because they all have stickers now you know they're organic and to make them so that you know they're organic they put this nice0:17:56
inorganic sticker and glue on them right organic so take the stickers off right and and only keep the good apples don't0:18:07
keep the bad ones right so those two things the first one take the sticker off every apple what is that that's map right and throw out the bad apples and0:18:20
only keep the good ones what's that that's filter right so the pie maker has told the assistant to map and filter the bag of apples now the0:18:30
assistant like any good programmer slash pie making assistant is lazy right he's like all right I could right now take0:18:40
every apple out and take the stickers off and put them in another bag and then I could take that bag of apples and I could go through that bag and take the rotten ones and throw them out put into0:18:49
another bag and then I have a bag and then when it was time to make pies I could just hand those apples to the pie maker or I could go play minesweeper for a0:19:02
little bit and when it comes time to make the pies when the pie maker asked me for an apple I'll take one out and just like take the sticker off and look0:19:12
at it like okay here you go I'll do it then I'll just wait in other words that that instruction to map remove stickers and filter for good apples0:19:24
is a recipe right that can be applied only when we reduce the bag of apples into pie we don't actually have to make0:19:33
different bags of apples we don't have to move the apples around all right that's the idea behind this library is that we can implement map and filter by0:19:42
just changing what it means to make pie all right so what does that look like the library actually doesn't do this0:19:51
code exactly but I've written this code explicitly because macros make this stuff go away and you can't see it when you look at the library code but the the0:20:00
real key here for me in designing the library was to try to find what is the fundamental definition of what is mapping what does mapping mean what does it0:20:09
mean to map something and so if you just if you tried to say if I was going to have a function called mapping that implemented this idea what would it look0:20:18
like so it would be a function we want to map some function all right that function might be something like take the stickers off the apples right we0:20:27
want to define this function by returning a new function right that says when you give me a reducing function0:20:39
like make pie right I'm gonna give you back a reducing function right which is a function of a result and a new input0:20:49
that will call the function you gave me make pie but before I do and this is the pie so far I'm gonna take the sticker0:20:58
off that's the essence of mapping right it's given some function take the sticker off and we're gonna say now that all it does is it changes the0:21:09
meaning of a reducing function given some reducing function now I'm gonna return a different one that's gonna actually use the reducing function I was given after I've mapped that function on0:21:19
it on the input this is the essence of mapping what's really cool about this where's the collection0:21:28
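A minimal sketch of the mapping function just described, written out in Clojure. The name `mapping` and the parameter names are illustrative; as the speaker says, the library itself does not contain this code verbatim because macros hide it.

```clojure
;; mapping takes the function to map (e.g. "take the sticker off") and
;; returns a transformer of reducing functions. No collection appears anywhere.
(defn mapping [f]
  (fn [f1]                       ; f1 is the reducing function, e.g. make-pie
    (fn [result input]           ; result is "the pie so far"
      (f1 result (f input)))))   ; apply f to the input first, then call f1

;; Direct use: transform + with (mapping inc), then reduce with the result.
(reduce ((mapping inc) +) 0 [1 2 3 4]) ;=> 14
```

Note that the transformed function is handed to an ordinary `reduce`; the mapping itself never sees the collection.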
what collection it's gone right so that's good but mapping is probably the easiest case there are two other categorical cases that matter right one0:21:39
is where you have some stuff and you end up with possibly less stuff that's what filter does so can we do the same thing for filter what is the essence of filtering right filtering takes a0:21:50
predicate right is the apple good or is it rotten right and same thing we're gonna say this is going to return a new function that when passed the reducing0:21:59
function like make pie will return a modified reducing function so this just transforms it takes the make pie and it says make pie out of non rotten0:22:10
apples right it's just transforming your reducing function it's gonna take it and return another one and it's going to do the predicate it says if the apple is0:22:19
good include it in the pie otherwise what don't what's really critical here though0:22:31
and you know we just had a talk about MapReduce does this return like an empty bag of apples or something like that a little ziplock right does it does it put0:22:41
every apple in a ziplock and then sometimes give you an empty ziplock no it does not this is very important there's no junk0:22:51
in the middle of this filtering means don't use it not make emptiness okay so that's the filtering case and what about0:23:00
mapcatting or flat mapping it depends on what language you use and what you'd call this so what is mapcat or flat map right it's like map it takes a0:23:09
function applies it to every element in the collection but the presumption is that that function returns for each element a collection itself and what you0:23:18
want is all the contents of those collections without the collections around them anymore that's what flat map or mapcat does so how do we do that that's the other tricky case right that's the expansive0:23:28
case so we've looked at the one-to-one case right we've looked at the possibly reducing case or I should say eliminating the filtering case and now we have the expansive case so it's0:23:38
the same thing we're gonna say if somebody gave me a good example of what this was oh chopping the apples up into pieces that's it so that's what we0:23:48
want now we're gonna say we want to mapcat slice up the apple and it's gonna produce more than one slice per apple which is going to take some reducing0:23:58
function like make pie it's gonna return a new reducing function that given the pie so far is going to slice up the0:24:08
apple and then do what put every slice into the pie and so we can just use0:24:18
reduce to do that job in other words what does it mean to have an expansive transformation function it means operate on the result more than once so0:24:28
filtering might not operate on the result and expansion might operate on the result more than once but note this doesn't actually return what from each0:24:37
guy a little baggy of slices of apples right no it just puts the slices right in so it's very important this is this0:24:47
is much different than what you see from some other libraries that are sort of mapping oriented that must produce collections for every step so I think one of things that's really critical0:24:57
about this is to look at what's in common between these things right so what what's all this stuff so what what0:25:07
we call that the legal term boilerplate right we're working in a language with macros what's going to happen they're0:25:18
gonna become victims of macros that's just gonna go away when that goes away when macros make that something that you don't have to write what exactly do you have to write what you0:25:28
have to write to define what map is just that little purple part right is there possibly a smaller0:25:37
definition of what mapping is than that I don't think so how about filtering just the purple part0:25:47
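The other two cases described above, the eliminating one and the expansive one, can be sketched the same way. The names `filtering` and `mapcatting` are illustrative, and the code follows the spoken description rather than the library source.

```clojure
;; Filtering: if the predicate fails, return the result untouched.
;; Nothing empty is produced for a rejected input; it is simply not used.
(defn filtering [pred]
  (fn [f1]
    (fn [result input]
      (if (pred input)
        (f1 result input)
        result))))

;; Mapcatting: f returns a collection per input; reduce each element of it
;; straight into the result, so no intermediate "baggie" is handed on.
(defn mapcatting [f]
  (fn [f1]
    (fn [result input]
      (reduce f1 result (f input)))))

(reduce ((filtering even?) +) 0 [1 2 3 4])       ;=> 6
(reduce ((mapcatting range) conj) [] [1 2 3])    ;=> [0 0 1 0 1 2]
```

In each definition, the only part specific to the operation is the body of the innermost function; everything around it is the repeated boilerplate the speaker refers to.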
you just write the purple part right when you get your fancy ooh parallelism libraries from your language designers ask them what do I0:25:56
need to write to write my own things my own extensions my own parallel operations can I just write the purple part please because that's all I want to0:26:05
write same thing here this is the essence of flat mapping by defining it this way as these reducing transformers so that that's basically0:26:16
good I mean we like the fact that we can handle these three cases that's starting to smell like an answer but the use right now is still awkward right because0:26:25
we said that mapping is a function of the reducing function so for instance if we wanted to sum this collection but0:26:34
first increment everything in it normally we would say oh map increment over the collection and then reduce with plus right the sum that's what sum is and that's what0:26:43
this does but it's kind of backwards right because what we have to do is say mapping inc transforming plus first and then apply that then reduce with that0:26:53
that's not actually what we want to do that's awkward right we're used to in Clojure and most other languages mapping increment across0:27:03
the collection and then reducing the result of that so we need to turn mapping into a function not of the reducing function but of the collection but it's still going to work this way we0:27:13
just need to move it over so what we want and what we're used to is for map to be a function that takes a collection returns a collection but we said we don't actually want map we the guy is0:27:23
lazy he's playing minecraft or minesweeper I said minesweeper minecraft he's a modern pie0:27:32
making assistant he plays minecraft and he does not want to make an actual bag of unstickered non rotten apples he's0:27:42
busy so we need to revisit our notion of what constitutes a collection what's true of every collection0:27:53
that's a hard question right what is actually true of every collection do they all have stuff in them not necessarily can they all give you a count mm-hmm are they all0:28:05
iterable or something-able or do they all have you know almost nothing is0:28:14
common with all collections it's very very tiny now of course Java says this huge pile of stuff is kind of for collections but that's just Java being Java but logically there are very few0:28:25
things a collection might have stuff in it and in particular it might have more than one thing in it that's really all that's true of every collection it might0:28:34
have more than one thing in it and so what we want to do now is we want to say we want to define map like we used to taking a collection returning a0:28:43
collection we're gonna have to get clever about our definition of collection and what I'm going to say is if you have stuff in you and I have something like protocols or type classes0:28:52
which say if you have stuff in you and I ask you to apply this function to reduce yourself with this function you can you're a collection that's good enough for me0:29:01
right because I can build a pile of functionality on top of reducibility so we're going to change the definition of collection we're not gonna say a collection is iterable we're not gonna0:29:10
say a collection is countable or add any other fancy stuff we're just gonna say the minimal definition of collection is that you're reducible right which means0:29:20
that you support the reduce protocol I can call reduce on you that's it so now we could say could we make a definition given one of these0:29:30
transformers could we make something that is reducible in other words we need to make a collection now I don't want to actually make a bag of apples I want to0:29:39
make a recipe for something that would reduce into a nice pie I don't want to make a concrete bag so the answer is yes0:29:48
right because this protocol is open Clojure has this thing called reify that lets you create an instance of a protocol you can consider it to be like0:29:59
the moral equivalent of making an anonymous inner class right that's it that implements an interface right you can make an instance of this protocol just from scratch with some code and that's what0:30:09
we're gonna do we're gonna say a reducer takes some collection right and some transformation function so collection0:30:20
here is bag of apples transformation function is something like mapping like the ones we defined before and we say that can return a collection if you're0:30:29
gonna tell me only a collection is only reducible then I can return a collection given those two things by saying the definition of reducing this collection given some reducing function like make0:30:40
pie and an empty pie tin is to ask the collection itself the bag of apples to reduce itself with the transformed make0:30:52
pie this is now make pie after taking the stickers off the apples and here's the empty pie tin in other words this is a recipe for a new collection that0:31:02
itself is reducible in other words I can ask this thing to reduce itself with make pie and it's as if I had a bag of apples that had all the stickers taken off I don't0:31:11
really I'm just gonna on-the-fly modify the reducing function to first take the stickers off with this mapping we defined before this thing now returns0:31:23
something that behaves like a collection insofar as it's reducible it means that we can write this now we reduce with plus starting with zero so this is sum0:31:32
first making a reducer around the collection using mapping with increment0:31:42
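A sketch of the reducer idea as described: reify the reduce protocol so that reducing the new "collection" means asking the original collection to reduce itself with the transformed function. This assumes the protocol in question is `clojure.core.protocols/CollReduce`; `mapping` is the illustrative helper from the talk, repeated here so the snippet stands alone.

```clojure
(require '[clojure.core.protocols :as p])

(defn mapping [f]
  (fn [f1]
    (fn [result input]
      (f1 result (f input)))))

;; reducer: given a collection and a reducing-function transformer xf,
;; return something whose only capability is that it can be reduced.
(defn reducer [coll xf]
  (reify p/CollReduce
    (coll-reduce [_ f1]
      ;; no init supplied: use the reducing function's no-arg value
      (p/coll-reduce coll (xf f1) (f1)))
    (coll-reduce [_ f1 init]
      (p/coll-reduce coll (xf f1) init))))

;; Sum after incrementing everything; no intermediate collection is built.
(reduce + 0 (reducer [1 2 3 4] (mapping inc))) ;=> 14
```

Nothing in `reducer` mentions mapping or filtering; it only wires a transformer to a source, which is why it needs to be defined once.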
this means increment everything in the collection and then make a new collection that's a logical collection out of that and reduce it's not quite what we want we're gonna fix that in a second but the key thing is this this0:31:53
reducer makes a logical collection that itself is reducible because it implements this protocol by just transforming the reducing function the way we showed we know we can do0:32:03
mapping and filtering what about this is going to change when we move from mapping to filtering does this even know it's doing mapping does it know this is0:32:13
mapping or filtering or mapcatting or any other collection no it doesn't you can define reducer once so we0:32:23
have some boilerplate on the prior slide we have the definition of reducer once and once only this is a universal collection maker it can make a collection out of any collection and any0:32:33
reducing transformer and now we've moved at least we've moved the transforming to be some function of the collection as0:32:42
opposed to a function of plus so we're close we're very close in fact it's just one more step to have something that0:32:51
feels like map used to feel so I just called these rmap so they wouldn't clash with Clojure's map but map in the new model in the reducing model takes a function and a collection just like0:33:01
the old map used to and it returns a new collection by making a reducer on that collection using the transformer on the0:33:12
reducing function the transformer that's based around the mapping function right that's what map is filter same thing mapcat same thing look at these things how0:33:22
similar are they totally what's going to happen to all that the victim of a macro again it's going to go away so we have0:33:32
all the boilerplate on both sides of this disappears there's one function called reducer that makes a collection given another collection and a0:33:42
transforming function we have a universal recipe for making reducible collections from other reducible collections based around a set of0:33:51
transforming functions that don't know anything about collections at all not at all and now our invocation looks the way0:34:00
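Putting the pieces together, a sketch of collection-facing functions built from `reducer` and the three transformers. The names `rmap`, `rfilter` and `rmapcat` are assumed here to avoid clashing with `clojure.core`; the helper definitions are repeated so the block is self-contained.

```clojure
(require '[clojure.core.protocols :as p])

(defn mapping [f]
  (fn [f1] (fn [result input] (f1 result (f input)))))
(defn filtering [pred]
  (fn [f1] (fn [result input] (if (pred input) (f1 result input) result))))
(defn mapcatting [f]
  (fn [f1] (fn [result input] (reduce f1 result (f input)))))

(defn reducer [coll xf]
  (reify p/CollReduce
    (coll-reduce [_ f1] (p/coll-reduce coll (xf f1) (f1)))
    (coll-reduce [_ f1 init] (p/coll-reduce coll (xf f1) init))))

;; Each of these takes a function and a collection and returns a reducible
;; "recipe" rather than a concrete collection.
(defn rmap    [f coll]    (reducer coll (mapping f)))
(defn rfilter [pred coll] (reducer coll (filtering pred)))
(defn rmapcat [f coll]    (reducer coll (mapcatting f)))

(reduce + 0 (rmap inc [1 2 3 4]))                    ;=> 14
(reduce + 0 (rfilter even? [1 2 3 4]))               ;=> 6
(reduce + 0 (rmapcat range [1 2 3 4 5]))             ;=> 20
(reduce + 0 (rmap inc (rfilter even? [1 2 3 4])))    ;=> 8
```

The last call shows that these nest like ordinary function application, since each result is itself reducible.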
we're used to at least if you're a Clojure programmer this is what you're used to right you say map inc across this and then reduce that result with plus you know it's just function application and0:34:11
you just do this and you can compose these things same thing filter looks the same thing call filter to filter out the even numbers and then sum those or you know get me ranges of all these0:34:22
different lengths and sum the ranges flattened so now we're pretty good we now have a recipe for all these functions that's completely0:34:31
different than it used to be but it the invocation looks exactly the same as it used to be and we've gotten rid of all the stuff except reduce we know0:34:40
inside is still sequential so this is actually faster than Clojure's lazy code because this is not lazy anymore it does not do any allocation per step0:34:49
right you saw the way those transformers work they weren't building boxes they weren't building lists around things at all there's no allocation per step there's no stuff in the middle there's0:34:59
nothing so it is faster there but I keep forgetting to change the slide so it's supposed to say where's the pie now so where's the pie right I started this talk0:35:09
by saying the whole point of this is to get to parallelism but reduce itself is not parallel so we now have map which is cool but not parallel but what's really0:35:22
neat about it is it's not actually doing the job right map no longer does the work right when does mapping happen only later when you0:35:35
reduce it right when do you know when do the stickers get taken off not right now playing minecraft right later0:35:44
right so no concrete work is being done so even though reduce is sequential and when we call reduce later it's sequential we didn't actually put anything in mapping or filtering that0:35:55
cared so maybe we can use them again in a different context and that's how we get to the the real prize which is something called fold and of course fold0:36:07
is an old word and but typically in programming languages it means something concrete either fold left or fold right or they'll call them0:36:16
foldl or foldr and reduce is foldl Clojure's reduce is foldl Lisp's reduce is foldl fold is an abstraction of folds0:36:27
generically that specifically says I might work in parallel right so it's a it's a it implements its semantics by a0:36:38
threat I hereby threaten you that I might operate in parallel right when I do that I say that what does it mean if your0:36:48
reducing function might operate in parallel better be associative right0:36:59
that's gonna need to be associative so what fold is is like a potentially parallel reduction you might not always get parallelism there might be reasons not to parallelize you're too small0:37:08
parallelization overhead would dominate I'm not gonna bother to go that way or you're trying to fold the collection for which there's no it's not amenable to0:37:17
parallelism like a linked list a singly linked list is not amenable to parallelism that's okay just we won't do it in parallel so what fold is is it's0:37:26
it's like reduce right you're gonna take a function and apply it to a collection it's gonna reduce the collection pairwise but it's not going to start at the very beginning it could potentially0:37:35
happen in parallel and the way fold works is it uses a reduce combine strategy it very specifically does not use a Map Reduce strategy and I'll show0:37:46
you why in a second under the hood the implementation of fold leverages the Java fork/join framework so it's going to do it's going to partition the work0:37:56
and use the work-stealing capabilities of fork/join how many people know about fork/join okay I'll show you a little bit of fork/join in a second so so fold0:38:06
it's very similar to reduce right it's a function that takes a reducing function and a collection but fold also can optionally take a combining function it0:38:16
can it actually has another optional argument that happens earlier but we're gonna talk about this three argument flavor for now which takes a combining function a reducing function and a collection right and fold just like0:38:28
reduce is also implemented in terms of a protocol so it's open system basically you say I know how to fold myself if you give me these two functions I can fold0:38:38
myself so fold just says ask the collection to fold itself using those two functions this is a partitioning size which is optional and has a default0:38:47
that's why you're not seeing it anywhere here so just like reduce it asks the collection to do the work and it uses the protocol to do that the logical0:38:56
operation of fold is to segment the collection into pieces right then to run multiple reduces in parallel on those pieces0:39:05
using the reducing function then using the combining function to pairwise combine the results so you get a single0:39:14
result that's how fold works and that looks like this so you can imagine the collection is a big continuity here0:39:23
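The logical operation of fold (segment, reduce each segment, combine the partial results) can be sketched sequentially as follows. This is only an illustration of the shape of the computation: `fold-sketch` is a hypothetical name, it runs on one thread, and unlike the real implementation it materializes the segments with `partition-all`.

```clojure
;; n        - segment size
;; combinef - associative; called with no args it yields the seed (identity)
;; reducef  - reducing function used inside each segment
(defn fold-sketch [n combinef reducef coll]
  (->> (partition-all n coll)                         ; 1. segment
       (map (fn [segment]
              (reduce reducef (combinef) segment)))   ; 2. reduce each segment
       (reduce combinef (combinef))))                 ; 3. combine the results

(fold-sketch 4 + + (range 10)) ;=> 45
(fold-sketch 4 + + [])         ;=> 0
```

Because each segment starts from `(combinef)` rather than from one shared initial value, the segment reductions do not depend on each other, which is what leaves room to run them in parallel.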
chances are in reality it will be a tree of some sort it doesn't really matter but in a functional programming language it will usually be a tree but of course this works on arrays and things like0:39:32
that logically it gets segmented you don't actually produce these segments but you're just going to divide it up so you know subvector or subrange you0:39:41
know it's the same idea you know the big thing is still there but I've got a window to a little piece of it that I'm gonna operate on then we're going to do independent reduces now we know reduce takes an0:39:51
initial value right and it operates on that value and the first item then the result of that and the next item because0:40:00
we're going to be firing off a bunch of reduces in parallel we need a bunch of initializers have to remember that because that seems tricky where do they0:40:10
come from then we're gonna use the reducing function on those so that's a straight reduce it's exactly the straight reduce an important characteristic of this0:40:19
design is the fact that while in a theoretical model for fork/join what happens is you take the work you have to do you divide it in half you divide the0:40:29
halves in half and you keep dividing in half till you're down to one and then you do the work on one and then combine it with the other in practice you never0:40:39
use fork/join that way because the overhead of processing individual items at the bottom would dominate so you always stop short of that so the0:40:48
practical use of fork/join always segments and stops at some segment size which is greater than one usually much greater than one but a lot of the0:40:59
designs around fork/join pretend as if it's going to go down to one right I don't see the point of that right if you're gonna go down to N and the best0:41:08
way to process n things is with reduce right why pretend you're going to down to one and proccess the bottom things with Matt because you're not and when you do0:41:18
that you end up with all the goofiness right with empty baggies and baggies with slices in them and like all those extra collections and stuff like that we know we're going to reduce at the bottom0:41:27
we know we're going to have a collection at the bottom and the best way to process a collection is to reduce it so this library is defined in terms of reduce at the bottom not map then0:41:39
we'll get answers for that this reducing function can be arbitrary right so if it was plus these might all be numbers right if it was concatenate to an empty0:41:48
collection these might all be collections doesn't matter this is very general right reducing and then combining and then the combining function is another binary function0:41:57
right that takes two of whatever those are and returns something and then so forth and so forth and finally you get a result so this is really good right0:42:09
first of all it breaks free from the ordered single pass right there's no initial value the big difference between fold and reduce is there's no initial value for the whole job right reduce0:42:19
took that initial value right fold doesn't fold doesn't take init up here that's what's missing doesn't have an initial value so we've broken away from0:42:31
the ordered single pass right that was the last piece if we had an initial single value we'd still have to go in order now we don't we we have the threat of parallelism we can do it in separate0:42:42
pieces but we do have this question right where do the seeds come from right these reduces have to start with something what do they start with who0:42:54
cares about what they start with who's gonna ultimately need to consume0:43:03
that the reducing function has to be able to process it but maybe there's nothing to reduce right which case it would flow0:43:14
through who really cares a lot these guys they care a lot right this value here must constitute a sort of magic0:43:26
kind of value for these combining functions because imagine you had a segment here where after we filtered out all the rotten apples there were no apples left right this value here has0:43:37
to flow through this function and do what represent nothing right it has to disappear to that function so0:43:47
there's a so what happens in the library is we need to find we need to have an initial value for each of those things we're going to obtain that initial value0:43:56
by calling the combining function with no argument so when you call the combining function with no argument it return something that thing has a name it's called an identity right so if you0:44:07
think about it the identity for plus is zero right you can add zero to anything and the result is that thing so it's like a no-op value for the operator right0:44:18
and what you need for a combining function is a binary function of two things for which there's some identity right so the identity for addition is 00:44:29
the identity for multiplication is what one identities for like collection things are usually like empty collections and stuff like that but it's very important actually that this not be0:44:39
a value right you'll you'll hear this talked about as being a value and the mathematical definition of this thing is called a monoid which every time I say0:44:51
it people are like ok you know just go tell me you know all monoid means is exactly what I'm saying it's a binary0:45:00
function it has an identity value that disappears when it's used to combine that's what it means they have fancy words for all these things so it's great if you want to be0:45:10
fancy but that's all it means so that's the property we expect of these functions the reducing function and combining function must be associative binary0:45:20
operators and the combining function must also have an identity value that you can obtain by calling it with no arguments but by calling it instead of saying I'm gonna give you the identity0:45:30
value you can actually have identity values that for instance are mutable if you want to do a very high-performance folding job you might want to have ArrayLists in there well if0:45:40
you gave every step here the same ArrayList what's that going to do it's gonna make a mess right they can't all have the same ArrayList they have to call the0:45:49
same constructor they don't want to have the same value so we get we get the values that way so now we have an idea we talked we0:45:58
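A short illustration of seeds obtained by calling the combining function with no arguments, using `clojure.core.reducers` (`r/fold`, `r/filter`, `r/monoid`). The vector `v`, the names `into-vec`, `cat-lists` and `add!` are assumptions made for the example.

```clojure
(require '[clojure.core.reducers :as r])

;; Clojure's arithmetic functions already return their identity with no args.
(+) ;=> 0
(*) ;=> 1

(def v (vec (range 10000)))

;; + serves as both reducing and combining function here.
(r/fold + v)

;; r/monoid builds a combining function from a binary op and a constructor;
;; the constructor is called to produce a fresh seed wherever one is needed.
(def into-vec (r/monoid into vector))
(r/fold into-vec conj (r/filter even? v))

;; A mutable seed: each segment gets its own ArrayList from the constructor,
;; so the parallel reductions never share one list.
(def cat-lists
  (r/monoid (fn [^java.util.ArrayList a ^java.util.ArrayList b]
              (.addAll a b)
              a)
            #(java.util.ArrayList.)))
(defn add! [^java.util.ArrayList acc x] (.add acc x) acc)
(r/fold cat-lists add! (r/filter even? v))
```

Passing a constructor rather than a single identity value is what makes the mutable case safe: a shared `ArrayList` instance would be mutated by every segment at once.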
talked about a reducer right a reducer is just something that takes a collection and this transforming function and makes it into a something that itself is reducible we have the0:46:08
same notion here a folder is something that takes a collection and some transforming function and makes something that itself is foldable that's0:46:18
all it ends up being the case that as long as the transforming function doesn't care about order right map doesn't care about order filter doesn't0:46:28
care about order take actually cares about order as long as the transformer itself doesn't care about order then any reducer is actually also0:46:38
a folder anything that's reducible is also foldable mapping a mapping of something is foldable right because map doesn't0:46:47
care about order so we can define something that looks a lot like reducer right called folder it takes a collection and a transforming function0:46:56
remember those are like that mapping stuff it's going to reify both reducible and foldable we can just think of them0:47:05
that way both of these protocols and it's going to it's going to implement fold the same way it's gonna say if you0:47:14
ask me to fold with this combining function in that reducing function I'm just gonna ask the collection to fold itself with the same combining function0:47:23
and with the modified reducing function right so if it ends up that the pie maker and the pie maker assistant are0:47:32
like master jugglers right and the pie maker can actually catch more than one Apple at a time and the pie making assistant can actually0:47:41
deliver more than one apple at a time they can have apples multiple apples flying through the air at the same time and make pie at twice the speed of0:47:50
anybody else because they have all this juggling capability because it has nothing to do with take the stickers off he takes the stickers off and he throws them you know at the same time he has to0:47:59
be pretty agile to take the stickers off one-handed but he's you know he's got a lot of spare time all right so this0:48:10
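A sketch of the folder idea: reify both the reduce protocol and the fold protocol, and in each case delegate to the underlying collection with the transformed reducing function. This assumes the protocols are `clojure.core.protocols/CollReduce` and `clojure.core.reducers/CollFold`; `mapping` is the illustrative helper from earlier in the talk.

```clojure
(require '[clojure.core.protocols :as p]
         '[clojure.core.reducers :as r])

(defn mapping [f]
  (fn [f1] (fn [result input] (f1 result (f input)))))

(defn folder [coll xf]
  (reify
    p/CollReduce
    (coll-reduce [_ f1]
      (p/coll-reduce coll (xf f1) (f1)))
    (coll-reduce [_ f1 init]
      (p/coll-reduce coll (xf f1) init))

    r/CollFold
    (coll-fold [_ n combinef reducef]
      ;; same combining function, transformed reducing function
      (r/coll-fold coll n combinef (xf reducef)))))

(def v (vec (range 10000)))

;; The same recipe works for both operations.
(reduce + 0 (folder v (mapping inc)))
(r/fold + (folder v (mapping inc)))
```

`mapping` is untouched between the reducer and folder versions, which is the independence of transformation from operation that the talk is building towards.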
looks exactly like reducer before it's just a generic thing that says I can make a folder out of a transforming function and a collection by reifying this protocol and indeed you'll find0:48:21
that map and filter in the library are not defined in terms of calls to reducer as I showed you before but in terms of folder which makes them both folders and0:48:31
reducers they can be used for both of these jobs right so the cool thing here is we didn't touch mapping0:48:40
at all right or map we're all done with that by making that completely collection independent and0:48:49
simultaneously operation independent are you reducing are you folding map does not care mapping does not care neither does map because all map does is make a0:48:58
recipe for something you can do something with later and it's going to ask the thing to do the hard work all it needs to do is transform the function in the first place you've made0:49:07
these completely orthogonal that's huge right collection representation order independence operation independence and0:49:16
transformation all are independent completely independent we do not have parallel map right we don't have0:49:25
parallel collection don't have those don't need them what you need is just to take stuff apart far enough so you can put it back together however you want0:49:35
and that's how this works has some very neat properties right it's composable so we can say instead of saying filter the even numbers on the collection and then0:49:44
map on that collection and then reduce it you can just say make this recipe for me this recipe is you know filter out the rotten apples and and take the stickers0:49:55
off and we can compose those two things you notice that these two calls are missing the collection right they're curried they don't have the collection0:50:04
so what you end up with is a recipe for taking the stickers off and filtering out the rotten apples or filtering out only the good ones right that doesn't0:50:13
yet have the job or the bag of apples it just has the recipe you can write this recipe down you can give it to your0:50:22
friends they could make pie or they could make they can make apple pie they can make pear pie if there's such a thing or they could make I don't know0:50:32
what else has apples in it crumble right so the recipe is first class this is first class right you can you can0:50:41
compose these recipes and later apply them to the collection sources that's what you want that's why we use functional programming languages right so I'm just going to beat up a little0:50:50
bit here on reduce/combine versus map/reduce in particular one of the things that struck me about Guy's talk was it has0:50:59
examples of filter parallel filter and parallel filter is returning all these empty collections it's collection-izing things and if you just scan it you're0:51:09
gonna see that and you're gonna see concat of that everywhere I think it's not a small thing that this library does not do that I think that's what makes it categorically different from a lot of0:51:19
what you might see and it makes it very fast it means you don't need sufficiently smart compilers or things to like make stuff go away or stream0:51:28
fusion or you know other really amazing magic it's really quite simple but the thing is that's it that could just be an0:51:37
implementation detail the biggest thing is you only write the purple part right what else lets you only write the purple part from before because that's really0:51:46
the key win when you want to write one of these things you write the essence of the job you are completely isolated from everything else right so we don't have0:51:55
any collection-ification earlier about the fact that you don't want an identity value you want an identity value creating function that0:52:04
will let you create identity values that are mutable which is going to be important if you want to have very high-performance insides right because we don't care right inside this job if0:52:13
we you know bang on an ArrayList and end up with an immutable thing at the end that's a win right we don't need to be all religious inside there about0:52:22
functional programming we need to produce a result in a functional manner which means we took inputs we didn't affect the outside world we gave you an output and we're going to give you the0:52:31
same answer every time you can use arrays inside the middle of that that's okay but if you want to use arrays inside the middle of that you can't pass0:52:40
the same array to every step of the reducing job you're going to need to manufacture arrays for each step of the reducing job and the other thing is just the granularity difference is there what0:52:52
happens when you when you when you use Map Reduce style is you end up with jobs in the intermediate steps of the computation if I say fold a collection0:53:01
that I've mapped and reduced I mean that I've mapped and filtered did they produce extra work for the parallelism job no they're gone they're gone before0:53:14
the parallelism job even starts all they did was transform the reducing function at the leaf right if you have a map-based system where the bottom job is0:53:23
creating empty collections that's going to create work for what function the combining function right the combining functions gonna have to undo all that0:53:32
stuff concatenating empty collections un-collection-ifying stuff flatten all and every time you concatenate those things together that's more work and then the0:53:41
compiler guys are gonna be like oh my god we got to make this disappear you don't have to do that if you choose reduce at the bottom okay so in summary0:53:51
the the idea is to revisit the ideas of map and filter as reducers which are logical collections that are themselves0:54:00
reducible they're defined in terms of making a transformer of a reducing function to another reducing function0:54:09
they're now completely independent of the collections right that mapping doesn't know about collections it has nothing about collections it's it's it's just the essence of the work and a step0:54:19
with no notion of where that step occurs so they have no collection dependence and no order dependence they do not know what part they don't know they're part0:54:28
of any particular kind of algorithm it's happening sequentially or in parallel they don't know right if fold itself0:54:37
is parallel all these operations are inherently parallel because that that's orthogonal now parallelism is not a characteristic of map parallelism is0:54:46
also not a characteristic of the collection right if you can if the collection didn't think of this and you can define the protocol fold for it you've now added parallelism to that0:54:55
collection so this works on the existing Clojure collections just as they stand right and it works with existing operations so you don't have parallel0:55:05
collections you don't have parallel operations I'm not really sure I really believe in a lot of those things you'll build parallel algorithms around this but these fundamental things are0:55:14
independent of order and that's where the power comes from so you want fold collections and reducers all independent and then you can compose them simply and0:55:24
that's the idea I just want to show you how this works so we're gonna just load up this library and we'll make a big0:55:34
collection of numbers and so this first call the first of the three calls here is ordinary Clojure code it0:55:46
calls the lazy sequence functions so it's going to filter out the even numbers from the collection then it's gonna map increment over those and finally produce a sum so we'll call that of0:55:56
course this guy is the first guy to run so he's gonna get penalized with some compiling time and we'll run him a bunch of times so we get good speed there we go0:56:06
so that's that's roughly where he's at this next guy second one right we're doing the same job over the same collection did not touch the collection nothing behind my0:56:16
back right but we're now going to switch to the reducer versions of filter and map we're using the same reduce function right0:56:26
this is Legos take your pieces put them together however you want so we're filtering with the reducer and mapping with the reducer which means those jobs are not actually happening they're just building a recipe and then we're going0:56:36
to reduce with ordinary reduce so let that warm up and there you go so that's faster that's just eliminating the allocation per step overhead of laziness0:56:45
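The first two calls in the demo can be approximated as follows. This is a sketch under assumptions: the collection size and the names `nums`, `lazy-sum`, and `reducer-sum` are chosen here for illustration, and no timing is shown because the slide's exact code and numbers are not in the transcript.

```clojure
(require '[clojure.core.reducers :as r])

;; A vector of numbers; the size here is arbitrary.
(def nums (vec (range 1000)))

;; Ordinary lazy-sequence version: filter and map each build an
;; intermediate lazy seq before reduce consumes the result.
(def lazy-sum
  (reduce + (map inc (filter even? nums))))

;; Reducer version: r/filter and r/map only build a recipe; the same
;; ordinary reduce then does all the work in a single pass.
(def reducer-sum
  (reduce + (r/map inc (r/filter even? nums))))
```

Both expressions produce the same sum; only the way the intermediate steps are represented changes.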
and finally we're going to now just have switched in this last one so from reduce to fold again fold is logically0:56:54
the same job in fact when fold can't parallelize the operation it just falls back to reduce because it's the same logical operation oh I should have said if you0:57:03
don't supply a combining function the reducing function is the combining function because you notice this call to fold doesn't have a separate combining function it just has plus means the plus0:57:13
is being used for the combining function and the reducing function which makes sense right we can add all at the leaves and then add the results of the leaves so we just use and that's that's very0:57:23
convenient right so fold can be used exactly as a replacement for reduce if you don't have a hybrid job or you're trying to do something fancy so we just swap in fold here and boom there we go0:57:36
that's pretty easy so that's it thanks [Applause]0:00:00
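The last step of the demo, swapping reduce for fold, together with the curried recipe described earlier, might look like this. It is a sketch, assuming a vector as the source collection; the names `nums`, `recipe`, `reduced`, and `folded` are illustrative.

```clojure
(require '[clojure.core.reducers :as r])

(def nums (vec (range 1000000)))

;; Calling r/filter and r/map without a collection returns curried
;; functions that are still waiting for one, so they compose with comp.
;; Applied to a collection, this keeps the even numbers, then increments them.
(def recipe (comp (r/map inc) (r/filter even?)))

;; Sequential: ordinary reduce over the recipe applied to the collection.
(def reduced (reduce + (recipe nums)))

;; Potentially parallel: fold with a single function uses + as both the
;; reducing function and the combining function.
(def folded (r/fold + (recipe nums)))
```

Because addition is associative and `(+)` returns 0 as its identity, using `+` for both roles is safe here; a job whose leaf step and combine step differ would pass them to fold separately.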
Clojure core async - Rich Hickey
0:00:00
I'm gonna talk about Clojure core.async I'll try to I really want to talk about0:00:09
the motivation behind it and how you should think about it but I will talk enough about the details so you get a sense of what the API is like but it's not fundamentally about how to use it or0:00:19
the code so what problems are we trying to solve the first problem we're trying to solve is the fact that function chains make poor machines and if you've0:00:30
heard me speak before about how objects are were are like little machines and they're not very good for doing logic0:00:39
functions are like little use you know units of logic that are bad for making machines and it ends up that because we're moving to this world where people0:00:49
are trying to be more reactive we have a ton of callback APIs on our hands and we tend to connect to them with chains of function calls and we're in a0:00:59
situation where we actually need a program that's more like a machine we're trying to use tools that are for functional programming to do that and it's a bad fit so how do we fix this0:01:11
so the idea behind behind this and it's something that I've said often is that good programs should be made out of processes and queues you know at a0:01:21
higher level you want to move to queues and you want to move to queues in particular as soon as you're involved in any kind of conveyance within your application because at that point it's0:01:31
no longer nested logic you're trying to move something from one place to another maybe from some input through processes in your system that perform calculations0:01:40
or transformations and eventually out somewhere else that's a one-way street so we want conveyance to become first-class and we want to organize our programs0:01:50
this way and we could always do this right if we're on the JVM it already has queues often people have said you know why doesn't Clojure wrap queues and it's because you know there's already queues0:01:59
and they're perfectly fine and as if they're not perfectly fine and we're gonna make them a little bit better with this potentially but they're there and they work but they do have some some problems one is that the only way you0:02:10
can coordinate with the queue with the java.util.concurrent queue in other words to wait for data to be present is to take an actual thread a real0:02:19
thread and and block on the queue and that has a cost we're gonna talk about in a second also if we try to look at the0:02:28
whole scope of platforms that Clojure addresses which also include JavaScript through ClojureScript there are no real threads there and there are no queues0:02:38
there of any real sort so using queues directly looking for queues from the host is not necessarily something we can always do even when we can do it there are0:02:49
overheads associated with queues and people make a lot of this and in particular in JVM there's a definite overhead per thread I'm sorry not with0:02:59
queues associated with threads which is the stack size you know every thread has a pretty big stack if you tried to have hundreds or tens or hundreds of thousands of threads you'd be consuming a0:03:09
huge amount of memory there's also wakeup time and other you know overheads associated with threads people often talk about that as if it was a scalability thing scalability is not0:03:18
about what happens inside one machine it's about being able to add machines or add resources to make to make things scale but it is an efficiency problem0:03:27
right are we making the best use of the machine by using a thread per connection oftentimes we're not so we'd like an alternative to that so the the API du0:03:42
jour is events and callbacks right this is all back well the kick from a long time ago from UI development is now like the way to do everything0:03:51
we have listenable futures and promises and callback handlers and async APIs for every kind of RPC and and invariably0:04:00
they expose themselves as these you know listenable future or some sort of a callback and what we end up doing is we put some logic into each of these0:04:09
callbacks so on a button click or on a message coming over the pipe or on or whatever do this stuff and so the stuff is a little piece of logic that we hook up there and there are lots of these0:04:19
sources of events and we end up with lots of pieces of logic and we end up with this giant web of these direct call relationships a lot0:04:29
like the kinds of webs we create in object-oriented programs and similarly they're very difficult to reason about or to control flow and everybody understands this phrase0:04:39
callback hell we're going to really look more carefully at what that means and what it what it's about but fundamentally I think it's about having to break your logic up into little pieces so that those pieces of logic can0:04:49
live inside handlers when those pieces are part of a design or way of thinking about a state machine or a way to approach the problem that was all of a0:04:58
piece and the fact that you divided it up has nothing to do with the way you want to think about the problem and everything to do with the artifact of this mechanism for addressing it and it0:05:09
makes everything difficult it's difficult to see inside these things to see which callbacks call which handlers to monitor them you know on what thread are the callbacks going to be run et0:05:19
cetera etc etc and there have been you know various approaches to try to mitigate some of this with observables0:05:28
and Rx and things like that but they only handle a very narrow set of cases mostly you know filtering or making a stream like kind of approach to0:05:37
composable transformations on a single event chain but if if you really are trying to make a state machine that has multiple sources and sinks of events you0:05:48
can't just get it out of something like you know filter and map composition primitives so what does that look like the problem looks like this essentially0:05:57
I mean I don't care if in and out are queues or external API calls or sockets or button clicks in a browser or whatever0:06:07
it's all sort of generically stuff that comes from somewhere else that's input it's outside of your program and somehow it's going to show up and say whew pay attention to me0:06:16
right you have some logic associated with that which you're going to put in a handler for that and eventually that0:06:25
logic does have to produce something and and move it along so at the top part here what can I work my mouse over this no no you don't see that okay maybe0:06:36
I could point you won't see that either okay up up above the in to the logic and out those are0:06:45
all function calls right the the input thing is on something you know call this function and inside that function you eventually call you know print or send0:06:55
it out or update the DOM or something like that and this this box is not a whole box because the way you thought about this job may have been a state0:07:04
machine maybe you're trying to implement you know some algorithm you're trying to implement some sort of a state machine and and in fact it involves multiple0:07:13
inputs and multiple outputs but because you have to break it up into these pieces you have to put these small fragmented things and the problem is if you really did have a state machine0:07:22
model you had some logic up here that may or may not need to do something in particular depending on something that0:07:31
happened in the logic down below and how are you going to coordinate those two things because they're both getting separate event streams and they're both running on each event and the way you do0:07:40
that invariably is having to introduce some shared state and we all know the kind of party that leads to and so objects don't fix this right I've just0:07:51
just put this in a blue oval that's about that's all that they do there's nothing nothing fundamental about what0:08:00
we just talked about is changed by putting it in an object okay the objects are like marionettes right where you know anybody can pull any string at any time yeah and you're not going to get an0:08:11
episode of the Thunderbirds out of that so so there are interesting things out there you know this happens all the time well what's the problem the problem is0:08:20
we had to take what would have been a single thread of control we had to break it up into pieces that we could run it on callback handlers and you know some people have done work in this area already C sharp and F sharp have async0:08:33
primitives not necessarily oriented around callbacks but they do something they are solving the problem of the what I would call inversion of control right0:08:42
which is that you want to call some asynchronous operation you want to continue to use your thread or give your thread back to a pool and0:08:51
when whatever you were waiting for comes back somehow resume control and so the way c-sharp style async works is that it0:09:00
takes code that looks linear but that calls asynchronous RPCs for instance and will turn the code in place into a0:09:10
state machine and actually register that state machine on the callback handler so your code looks ordinary and linear and this inversion of control is happening under the hood so it's an interesting0:09:21
idea to to copy this I don't know who saw Philipp's Scala async talk earlier good so so it's not a bad idea to to0:09:33
think about copying this and Scala did that and I saw Philipp's talk in the spring I guess and I was like that's not a bad idea we could just copy that0:09:43
in Clojure and it would look like something like this and this is not what we ended up doing but it shows you the idea right so I put up top you have traditional blocking code right you0:09:52
there's some future the future is blocking when you deref it you're waiting you're wasting your time and you're wasting resources in the second0:10:03
place you use something like callback so now you have a a listenable future and you'd say on completion in the future you know do this job and it looks kind0:10:13
of straightforward here but if you were in the middle of a loop or something else or you had multiple things you wanted to wait on this would get really nasty trying to fabricate what the0:10:22
callback handler would be and have it capture the context of a linear looking program segment it'd be very difficult so what c-sharp style async does is it0:10:34
says we'll just put that code in some sort of a special construct we'll call async and then instead of registering a handler with the callback or actually0:10:43
blocking on it you'll make a call to await which semantically is blocking it's like the first it's like the first code here but some magic and in the case0:10:54
of c-sharp is some compiler magic is going to go and look at this block of code and say okay I see that you're trying to do an asynchronous thing0:11:03
we analyzed this block of code let me turn it into a state machine that actually can be called back let me register that state machine as the0:11:13
callback handler for that future and relinquish the thread so the thread is no longer running and what happens is the code that you want to have be the0:11:23
continuation if you will of of that future is stored away in an in a blob of a state machine that will eventually be0:11:33
called upon to run on a thread from a thread pool and that allows you to share and it's a it's really a great approach to this it makes for linear0:11:42
code and allows you to use a machine efficiently because you can share this thread pool you can have lots of these pending you know future actions but0:11:51
notice in particular this is really RPC and that's sort of the problem you know I walked away from the talk saying that would be a cool thing to copy and then like after a weekend I was like you know0:12:03
because the promises and futures it's not actually the first problem I talked about right the first problem I talked about was true asynchrony all right somebody clicks on a button or something comes over a socket your code0:12:14
didn't initiate that the same way the code we saw on the previous slide did that's code that says go and do this and then when it's done you know come back and I want to relinquish the thread in0:12:24
the meantime these things are actually happening asynchronous so promises and futures are sort of lightweight constructs you know they're one-night stands right they're just hand offs or0:12:34
call and return scenarios they can't really model you know enduring connections and so they're not actually helpful for this so it's sugar I mean0:12:43
it's it's good sugar but I felt like we should put it on a better cake so what do we do again I think that people who0:12:54
are writing production server programs already use queues and this is the way most large server things are decomposed they aren't a giant you know web of0:13:04
direct calls there's queues invariably inside inside everything and they have a lot of great properties right they decouple producers and consumers right who's0:13:13
putting the things I want into this who cares who's on the other end nobody cares right they don't know about each other it's somewhere down there someone's gonna take this thing right they're first class you know there's that0:13:23
queue there's that conveyor belt right there you can go you can kick it touch it you can talk about it name it bump into it you know it's it's out it's outside they're enduring right the0:13:35
person who's on this end putting boxes on it takes a break somebody else comes up and like person on the other end is sick today and somebody else comes out with this conveyor belt is still there0:13:44
it's there in spite of people coming and going potentially because it's first-class and external and a thing it's it's an easier0:13:53
place to hook on some monitoring and supports multiple readers and writers so of course there's nothing new to invent it's just a matter of finding stuff so the thing to be found in this0:14:05
case is CSP which goes back to Tony Hoare's work in the in the 70s sort of like the parallel track as actors and0:14:14
there's CSP you know so pick pick your fight so we're gonna we're gonna take this side with with core.async and bet0:14:23
on CSP as many others have and it has some very interesting characteristics maybe not always in the not all in the first paper but over time CSP has come to represent first-class channels as0:14:34
being the only way that independent processes interact with one another there's no shared state except for these channels the semantics are blocking by0:14:43
default which means that you can also use them for coordination which is quite interesting so it's not just a primitive for conveyance it can be a primitive that allows you to say you know stay0:14:53
here until somebody's ready for me or vice versa they can also be buffered and then therefore allow for some asynchrony0:15:02
between the producers and consumers and you know there's a long set of history behind this occam and implementations0:15:11
for Java and of course the most recent rendition that sort of makes this first class and a central point of the language is the Go language so we owe0:15:20
everything about what you're going to see to all of this prior work but there are lots of cool attributes to this right the multi reader multi writer is important you can pass end points0:15:30
around those first class things the other cool thing about CSP and channels is that they support choice by a0:15:40
mechanism called either select or Alt depending on what language or what paper you're reading and the purpose there is that you can wait on one or more of a0:15:49
set of i/o operations on channels those includes both writes reads and timeouts in addition through time and in the0:15:59
academic literature and work there is formalisms and proofs and things like that that can be used to help you reason formally about systems built in this0:16:09
style there's nothing in that area yet done for core.async so there are already people that have done this there's a very nice implementation of0:16:18
CSP for Java called Java CSP people have tried stuff in Scala but both of these implementations are tied to real threads they basically use real threads as0:16:27
processes one to one and we want to do something else for Clojure in particular we want to try to create an implementation of this this approach0:16:37
that works both for Clojure and ClojureScript and that supports both the problem of I want to use real threads and real blocking which still has0:16:46
utility if you do not have an arbitrary number of connections you need to support if you have a small server with a finite number of processes it can be0:16:55
more efficient and have higher throughput to use genuine threads and real blocking than to force everything to go through the mechanism of a thread pool but if you are trying to target0:17:06
arbitrary connection counts or work on a platform that doesn't have threads at all which would be the JavaScript engine you need something else and so the idea0:17:16
here if there's any new idea in core.async is just to implement this technology on platforms that weren't designed for it like if you look at Go0:17:25
it has a runtime engine that's oriented around this and to get the more efficient use of threads and thread0:17:34
pools by leveraging the kind of inversion of control technology that was used in c-sharp async right because that's what you need to take code that should logically be0:17:44
blocking and instead turn it into data and eventually consume thread from a thread pool so that's the that's the idea0:17:53
so if we can do this this is really great right because we can we can deal with traditional threaded apps with low thread counts we can approach you know0:18:03
designs for high connection count servers that are efficient you can deal with evented server APIs like those provided by Node and Vert.x and you know0:18:12
we can finally try to make some sanity in browser development not targeted currently by core.async is anything to do with extending this over the network0:18:23
all right so core.async itself what is it it's a library it's it's about fundamentally coordinating between processes using0:18:32
channels but the process is you can think of as independent threads of activity they may be real threads or they may be these inversion of control threads where you write code that looks0:18:41
like it's doing x y z in a row and blocking in the middle and that process is turned into data whenever the0:18:51
blocking occurs and the real thread is relinquished so we'll call those inversion of control threads and we'll know threads like this but logically they are threads and the semantics are0:19:00
the same we want the semantics of threads and the semantics of blocking in all cases because it allows us to write linear code and then we'll connect these0:19:09
things with these queue-like channels and target both platforms so the first thing you need to be able to do is create a thread or create a process and there's0:19:18
two ways to do it there's this thread call which is a lot like the future call except this you know could happen to0:19:27
different thread pools and it's clear about what you're doing when you say thread you get an actual real thread and when you use the api's within this0:19:36
thread you get real blocking so it's not quite as you can't create as many of these arbitrarily as you could with the next ones but when you want real threads0:19:46
and you know what you're doing you can get high throughput with something like this and then we have go and go is the same kind of thing it creates a logic thread it runs the block inside of it in0:19:57
this mode whereby if you issue any blocking calls this mechanism will come into play create an inversion of control state machine out of your code and and0:20:08
register on the handler for the asynchronous callback so we're going to talk about that as a thread where blocking we're gonna call parking which0:20:19
essentially relinquishes the thread and takes the code and parks it somewhere for its resumption later all right so channels themselves what0:20:28
are they they're like queues right a queue you put something on one end you take stuff off the other end that's it that's what they're about put stuff in one end take stuff out the other end you can have multiple writers you can have0:20:38
multiple readers fundamentally channels are blocking like with that if you don't touch them and or anything else they're blocking that means that if somebody comes in to read and no one's written0:20:48
anything it waits if somebody comes in to write and no one's waiting to read the writer waits so they block on both sides they can be unbuffered we also0:20:59
support some fixed buffering so I'll talk about that in a second so the API is pretty straightforward you create a channel with chan chan with no arguments creates an unbuffered channel which is0:21:09
essentially synchronous rendezvous you can say chan with n which gives you a buffer of that size or chan with a buffer and there's a couple of API calls that allow you to create buffers and we can0:21:20
put with >! that's what we read as put right-angle-bang and that's the parking variant all right0:21:30
that's the one that will Park the thread it must be used within a go block the other variant is the double bang and you can read those as blocking put blocking and all those flavors and this will go0:21:42
throughout the API if there's a double bang it's a blocking call you read it as blah blocking so put blocking and none of those are available on ClojureScript0:21:51
on JavaScript because there are no threads to block similarly there's take and take blocking and close what's really useful about this if you're using it on the JVM is that you can mix and0:22:00
match these so you can have a bunch of go threads and you can have actual real threads so you can make blocking calls and parking calls on same channels on different ends of the0:22:09
same channels that all fully interoperates which can be extremely useful so the buffers themselves again normally we're unbuffered or by default we're0:22:20
unbuffered and that acts as a rendezvous like I said readers wait for writers and vice versa if you supply a fixed buffer using n or some value then writes will0:22:31
succeed right away until the buffer is full and then the writes will block you know block like this and then there are two flavors of buffers that will never0:22:40
block a writer because they've incorporated a policy about what to do when they're full and one is a sliding buffer which effectively walks across0:22:50
and drops the oldest stuff and a dropping buffer which is if you come to it with new stuff and it's full already it just drops whatever you send and they0:22:59
will never block the provider what we do not provide and will not provide are unbounded buffers this is just a recipe for a broken program looks0:23:09
like I don't feel like seeing this bug till later so I'll make an unbounded buffer I'm not going to help you do that we've already probably rejected the0:23:18
patch request a couple of times and we'll keep doing it I mean just hopefully people will get tired of submitting it just don't do it I mean you what's beautiful about this0:23:28
is that you can establish a policy eventually we'll have a recipe for providing other policies with more sophistication but you know make a0:23:37
decision you need to make a decision here you can't just have stuff piling up all around in memory and not be thinking about it so it's it's a value proposition of of CSP and of Cori sink0:23:50
that you're thinking about these things and making choices so the other cool thing extremely cool thing about CSP and0:23:59
about channels is the fact that they offer a choice so it's quite frequent that you want to make you want to put a0:24:08
state machine in particular in a state where it's waiting for one or more possible activities or operations to succeed0:24:18
and that can be a really difficult thing to do that's something for instance that none of the Java queue primitives support at all Windows has some nice primitives for0:24:28
weight on multiple things but Java does not obviously sockets do have select them whatever so alt and the two flavors0:24:38
of alt alts are our functions that take a set of operations and operation is one of two things if it's just a channel it0:24:47
means try to read something from this channel if it's a vector with a channel and a value it means try to put this on the channel and alts will return0:24:58
whenever one of those things becomes ready to complete and by default it will be random there's also a priority option that will let you say try them in this0:25:08
order what's really important about the choice is that one in only one operation will complete so you're gonna ask for you know let me know in any one of these0:25:17
things that's happened but only one of them will happen it will be as if all the other pending requests you made were cancelled and you just got the one thing and you get a return value which0:25:26
indicates in the case of take it's just the value and the in the case of take where you had multiple channels it's going to tell you which channel actually0:25:35
succeeded and what the value was in the cases of put it was gonna be like okay that worked that channel put worked and it is a macro that combines this0:25:44
function with with cond you know with that with the logical branching so basically you say alt bang which is not a function now this is a macro and you0:25:54
supply one or more sets of operations you want to attempt right the whole set of operations is going to be tried sort of in parallel and the first thing0:26:04
that's available to complete will be the result of the condition so I think I have arrows here let's see so we have the operations themselves the first two0:26:14
are reads try to read from c or t or try to read from x or try to put this val to out or if none of them are ready0:26:23
right now just return 42 then we have expressions right which would be the result if c or t returns a value then we want to grab0:26:34
that value and call it val and grab the channel that actually succeeded because it could be c or t and call that ch and then we'll call foo with those two things so we have a binding and then0:26:44
some expressions so this is you know a macro that is an expression it hasn't got all the you know gook of Go with statements and it will return one of the0:26:55
values so you're going to try multiple operations and you can have a default if none of them are ready otherwise you will block waiting for one of them to succeed you can get the value that you0:27:05
know you desire as a result of that particular operation succeeding so it's like alts plus cond so how do you do0:27:15
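A sketch of the choice construct as described here, assuming core.async is on the classpath. The slide itself is not in the transcript, so this is a reconstruction from the spoken description: the channel names `c`, `t`, `x` and `out` follow the talk, while the function name `choose` and the result keywords are made up for illustration. It uses `alt!!` and `alts!!`, the blocking variants, so that it can run outside a go block.

```clojure
(require '[clojure.core.async :as a])

;; choose is a hypothetical wrapper; c, t, x and out are channels supplied by the caller.
(defn choose [c t x out]
  (a/alt!!
    ;; two reads in one clause: bind the value and the channel that delivered it
    [c t]        ([v ch] [:took v (if (= ch c) :c :t)])
    ;; a single read: bind only the value
    x            ([v] [:took-x v])
    ;; a put is written as a nested vector of channel and value
    [[out :val]] :wrote
    ;; if nothing can complete right now, return this instead of waiting
    :default     42))

;; The function form returns a [value port] pair; with :default the port is :default.
(defn try-read [c t]
  (a/alts!! [c t] :default :nothing-ready))
```

Without the `:default` clause the call waits until exactly one of the operations completes, and the others behave as if they were never requested.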
timeouts because a lot of times what happens is you're setting off all these blocking things but you don't want to wait forever what's really cool about Go is that they decided to make timeouts0:27:24
channels themselves and that's a really good idea so well worth copying so you just create a timeout by calling timeout with the number of milliseconds you get a channel that closes after that number0:27:34
of milliseconds so you can put that into an alts and what will happen is either you're gonna get something from the thing that's not a timeout or the timeout will close and that will cause0:27:44
your alts to have a completed operation which is the closing of the timeout channel but the cool thing about it is that timeout now is a real thing as opposed to how many people like putting0:27:54
timeouts in API calls yeah that stinks right especially if you're trying to coordinate multiple things or you're just not sure how long to wait or you're0:28:03
trying to do stuff in a loop where you have to keep trying things but you're trying to have an overall counter go down so you have to keep recalculating that timeout it's smaller smaller smaller smaller now you can just create0:28:13
a timeout once and say I'm gonna try all this stuff in a loop for three seconds so you create a three-second timeout you keep passing that same thing into every API call the same one because it's0:28:23
either gonna complete or not and you just reuse it so you can share them which is very powerful so obviously if you know Go this is very similar it's0:28:32
like Go but the things that are different are all of the operations everything I've shown you they're all expressions it's for a functional programming language there are no statements you don't need to have0:28:41
state to start interoperating with channels because channels are a state mechanism already it's a library it's not a language feature it uses macros to0:28:50
do what it does and the macros are quite interesting and we'll have to have other talks about those but the go macro is the thing that inverts control and sets0:28:59
up the state machines for you the other thing that's interesting is that alts! is a function you can map it it's variadic and supports a runtime-variable number of operations you can't say that0:29:08
with any construct in Go because they're all statements and we support priority so how do we integrate this into our0:29:17
world right because we're not going to get everybody to change their APIs to use channels right they already use callbacks are they all going to redo them the answer is no right the trick0:29:27
is as soon as you're in a handler and something comes in the handler just put it in a channel and that's it you should0:29:36
be done don't put anything more there so we actually have because the operators like put the right angle yeah the0:29:47
right angle bang put >! can only be used inside go and you're not in a go block in this event handler right you're just like on mouse click that wasn't in a go0:29:56
block what do you do we have two calls put! and take! which can be used from outside of go blocks that are0:30:05
asynchronous and they'll just enqueue the request and return right away it's sort of an entry point from the edges of your system into the channel based system so you're talking about a0:30:15
process that wasn't really a CSP process that has got some input you use put! to introduce it into this system and there's a similar take! which0:30:24
reverts control this is a way to get out for instance in JavaScript and get a callback back so speaking of JavaScript what do we do there yeah this is the0:30:35
browser it's just all callbacks it's 100% callbacks and again this is totally fine everything I'm showing you so everything about the go blocks not the0:30:45
thread ones it all works in the browser it all works in ClojureScript if you read David Nolen's posts about this he's doing you know fantastic things that are exactly what0:30:55
this was designed to do right you just revert control immediately in your handler right don't put any logic in your handlers just take0:31:05
the event turn it into data put it on a channel and be done with it and that will let you put your world right side right side up so it's just a way to0:31:15
restore the separation of concerns right instead of fragmenting your logic and building all these little pieces everywhere you keep your logic together you use these bridges from channels to0:31:25
get stuff through so the model essentially looks like this right in contrast to the first one you're going to just take inputs and put them on channels you're gonna have your0:31:35
logic all together it can consume from multiple channels it can put on multiple channels you can coordinate with other logic and other instances of the logic because it's many to many0:31:45
but it's all together and then it puts things on channels and eventually those reach the outside world and so0:31:56
the super critical thing is this part of the talk okay because we have two solutions here that on the tin right0:32:07
they're the same these let you efficiently do asynchronous programming it's like greatest feature awesome dude let's go do it0:32:16
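The bridge described above, where a handler does nothing except turn the event into data and put it on a channel, can be sketched as follows. This assumes core.async is on the classpath. The names `events`, `on-click` and `seen` are hypothetical, and `on-click` stands in for whatever callback a real API would invoke.

```clojure
(require '[clojure.core.async :as a])

;; A buffered channel at the edge of the system (the size 10 is an arbitrary choice).
(def events (a/chan 10))

;; Hypothetical callback handler: no logic here, just data onto the channel.
;; put! works outside go blocks and returns immediately.
(defn on-click [event]
  (a/put! events {:type :click :data event}))

;; take! is the mirror image: it leaves the channel world by calling a
;; plain callback once a value is available.
(def seen (promise))
(a/take! events (fn [v] (deliver seen v)))

;; Simulate the outside world firing the callback.
(on-click {:x 1 :y 2})
```

In a real program the consumer would more likely be a go block reading `events` in a loop, which keeps the logic in one place instead of spread across handlers.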
but they're really really different I think their characteristics are completely different so it's a battle0:32:25
direct versus indirect right so how many differences can you see between these things except for some colors it looks like good you know it's looking more of0:32:34
a hassle right I have five things down here I had four things up there always a bad measure so where's the logic in the0:32:43
direct system it's split up into tiny pieces and spread out through callback handlers in an indirect system it's coherent right it's all together it's linear right what happens on your0:32:53
callback it directly calls your logic which directly calls the output it's all this big chain of fire fire fire unless you manually introduce some more asynchrony right it's synchronous right0:33:04
channels can be synchronous or not right we can set them up with no buffering and actually cause workflow and back pressure right or we can add buffering and get some0:33:13
more asynchrony we still have choices in the logic itself right what's the arity of the callback to the0:33:22
callee that's built in it's like you gave them a piece of code to call they call that code it's one to one button click calls the thing on socket calls the thing or whatever0:33:31
right this is n to n here right with asynchrony how many people can feed this channel as many as you want how many people can consume it as many as you want so you can share the work you can have0:33:41
multiple sources you can do routing you can create these relationships dynamically you can pass the channel to talk to dynamically this other stuff has got code I made code and the code has0:33:51
got like what to call up above right so that's all implicit where am I going from here it's like inside this piece of0:34:00
code where am I going from here I don't know I mean there's going to be some code that wires the stuff together which I'm going to be able to see and think about and cooperate with and possibly0:34:10
manage dynamically something you can't possibly do in the other scenario where is the shared state well it's sort of an internal implementation0:34:19
detail of you know whatever you had to do to try to make your state machine cross multiple callback handlers which is usually just a real incredible mess right where is it here it's external0:34:29
right it's not like there's no state right there is state there of course there's state it's a big machine right it moves stuff from there out the other side right but in the top we're making0:34:40
function calls but how do we know we're using function calls wrong up top right we have functions calling functions calling functions there's nothing wrong with that right0:34:49
that's a call chain we do that all the time in functional programming but what else do we also do we pay attention to the return values right function calling0:34:58
function calling function we're getting the return using it getting the return using it getting the return using it it's fine that's function composition to have a stack of functions serve an0:35:07
algorithmic purpose that's not what we're doing up top right we're shoveling stuff across functions we're ignoring the return values and trying to spew stuff out into the real world at the0:35:16
other end so having the state be external is totally fine it's now clear that there is state and I think this is very very0:35:26
interesting right these states are different right the shared state that you have between two pieces of logic in order for them to communicate with each other is going to be place state it's0:35:36
going to be I put something in there you know I squirreled away some acorns so that either when I come back or this other handler comes in they know that I'm in the middle of doing this and they should not do that or you know we got0:35:47
six of these and we need four more before we can proceed there there's going to be the shared state which is a set of places that you're using to to0:35:56
update things but I think what's really cool about first-class channels is that they expose a subset of state and it's a0:36:05
very interesting and far safer one which I'll call flow state right which is just about all you can do with this kind of0:36:14
state is put stuff in it and then you really don't know anything more about it or you can take stuff out of it and you really don't know how it got there and the analogy I would make is let's0:36:24
say you work at a factory right and you get into the factory and you have your jacket right you've got to do something with your jacket would you rather hang it on a coat hook or put it on a conveyor belt that's0:36:36
moving or dump it down a laundry chute or put it in the back of the UPS truck that's about to leave well it's quite0:36:46
there's a difference right those are all places to put your jacket right but what is it about the coat hook right you expect to go back there at the end of the day and see your jacket still there0:36:56
of course if there's only a few coat hooks and there's a lot of jackets you have this contention problem right because it's a place but you would never put your jacket on any of those other0:37:05
kinds of places because you know they're flow they're gonna go away it's gonna go on that conveyor belt it's gonna do you know it's gonna0:37:14
go somewhere else it's gonna go down that chute and you can't recover it the UPS trucks gonna drive away and that's good because you can't possibly have0:37:23
your life depend on going back and seeing the same thing there again so it's not really state the way shared state and place-oriented state is flow0:37:32
state is much simpler it's much easier to reason about and much safer because you do not have this expectation of coming back all0:37:42
right and then in any case if you're running any of these things in an environment in which not only is there you know cooperative concurrency but there's actual real simultaneous0:37:53
concurrency then any kind of state whether it's a channel or the shared state there it's gonna have to have some coordination right so we don't all step on each other but in the first case0:38:03
whose problem is it your problem right that coordination getting that thread safe is your problem in0:38:13
the bottom case it's my problem right it's the core.async library's problem and once we get it correct I think we've already gotten it0:38:22
correct but it's correct for everybody right its correctness is shared by everyone so you know which do you want also the logic right when do0:38:35
you when do you decide to handle an event that comes through a callback handler you don't decide right you're like you're just passive you you get0:38:44
called whenever you get called right when do you handle it when you have channels and alt whenever you choose right if you don't want to go read that0:38:53
channel right now don't do it right if you think it's more important to read this other channel or to write to these channels or to wait for 10 minutes or to go ask somebody you're not a0:39:05
passive recipient of control being driven from an external event unless you've chosen to do that right so I0:39:14
think the problem I labeled first is the one I want to call out now right this first thing is using code as a queue right call to call to call to call when we0:39:26
should always be using data again it's one of these things we never make this mistake when we go over wires0:39:35
we don't the reason is because we can't wires don't let us make function calls right we always have to turn it into data put it in a buffer and queue it0:39:44
shovel it over or dequeue it do whatever maybe we'll map it again to a function call and we'll recreate RPC but you see wires don't do RPC they're not code they don't have entry points0:39:54
you can't invoke them so we never do this then but it's a good example of another thing that when we bring it inside we make this mistake and when0:40:04
we're shoveling stuff around inside our programs it's not different than when we're shoveling around between boxes it's not different shouldn't be shouldn't be a different way to think about it so there's a sense in which0:40:15
the direct callback approach is intimacy right it's making pieces aware of the other pieces right the callback event knows who to0:40:26
call the caller knows where it goes next and it's all code and and there's a sense in which using channels and this indirection is ignorance and we all know0:40:36
that ignorance is bliss and it ends up that in systems it's also it's also bliss and at least in systems intimacy0:40:49
is pain this is where the pain comes from this is why this is why it's painful so here's a little bit of code just so I don't just show ideas on the0:40:59
screen and get complained about so code this is a great example from Rob Pike's examples for Go and it's just nice because it shows everything I'm not0:41:09
showing the supporting code the basic idea here is that you've got a job you're trying to do this search you need to come up with a web an image and a video result for it you have a couple of0:41:18
possible sources of these answers they may reply at different speeds you want to bound the amount of time that you're going to take to do this entire job to0:41:27
80 milliseconds you want to try to get everybody to do the job and take the first results you get and get out so the function fastest takes a query0:41:36
and you know two sources right two URI endpoints that it can call it's going to create a channel it's going to0:41:46
set off go processes to go and attempt to do that RPC with each of those web servers and get an answer and and each0:41:55
of those callbacks are going to go and take the answer they get and put it on that channel and then fastest is going to return the channel so essentially you can read this0:42:04
as set off two processes racing to put an answer on a channel and return the channel and you do that similarly for the images and similarly0:42:13
for the video so we end up with three channel results right and we're gonna set off asynchronous processes that are gonna go and say read from that channel then put it on this shared Channel so0:42:24
and it's gonna do one right so that read and then put back that's the RPC kind of thing and you can build right that C#-style async RPC stuff out0:42:34
of channel read and channel write like this and it's a common idiom right so it's gonna read each one and put it on there and then we have a loop and the cool thing is that this loop it doesn't0:42:44
know where this stuff comes from it's just been told there are going to be three answers on this channel don't spend more than 80 milliseconds trying to read them and that's what it does it0:42:54
goes through and it makes an alt call and it says try to read from the channel or time out and take the value and just add it to the vector so the vector is0:43:03
going to return up to possibly three results or nils if no results are available but it's going to be done in 80 milliseconds no matter what0:43:13
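The search job described here can be sketched along the following lines, assuming core.async is on the classpath. The slide is not reproduced in the transcript and the talk says the supporting code is not shown, so `fake-search` and the replica names (`web1`, `image1`, and so on) are stand-ins invented for illustration, and the top-level function is called `search` here as a placeholder. Note how a single 80 ms timeout channel is created once and reused on every pass through the loop.

```clojure
(require '[clojure.core.async :as a])

;; Stand-in for a real backend: answers after a random delay of up to 100 ms.
(defn fake-search [kind]
  (fn [c query]
    (a/go
      (a/<! (a/timeout (rand-int 100)))
      (a/>! c [kind query]))))

(def web1   (fake-search :web1))
(def web2   (fake-search :web2))
(def image1 (fake-search :image1))
(def image2 (fake-search :image2))
(def video1 (fake-search :video1))
(def video2 (fake-search :video2))

;; Set off all replicas racing to put an answer on one channel; return the channel.
(defn fastest [query & replicas]
  (let [c (a/chan)]
    (doseq [replica replicas]
      (replica c query))
    c))

;; Collect three answers, or nils, within one shared 80 ms budget.
(defn search [query]
  (let [c (a/chan)
        t (a/timeout 80)]                       ; created once, shared by every read below
    (a/go (a/>! c (a/<! (fastest query web1 web2))))
    (a/go (a/>! c (a/<! (fastest query image1 image2))))
    (a/go (a/>! c (a/<! (fastest query video1 video2))))
    (a/go
      (loop [i 0 ret []]
        (if (= i 3)
          ret
          ;; read from c, or get nil once the timeout channel has closed
          (recur (inc i) (conj ret (a/alt! [c t] ([v] v)))))))))
```

Calling `(a/<!! (search "clojure"))` yields a vector of three entries. Which entries are results and which are nil varies from run to run because the delays are random.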
that's the spec for the job you want to do and that would be really hard to write without stuff like this so what do you0:43:22
get from using this kind of technology you get a separation of concerns you get it back because you've lost it you get coherent and linear logic right and you move away from mutation you know if you0:43:32
have a state machine inside your process you can implement that using recursion instead of having to create internal state with traditional measures you get0:43:41
coordination possible you can get back pressure with this which is quite useful and difficult to get otherwise you can dynamically reconfigure these networks0:43:51
and you can use your thread pools and your thread resources efficiently so I'd just like to thank the guys who helped me work on it and particularly Timothy0:44:00
Baldridge did the go macro inversion of control stuff it's really cool and when he talks about it at some conference in the future make sure you catch it so0:44:09
this is where you can get the code and try it out it's on GitHub the docs are there there's now a Maven artifact for it and there's a blog post that describes it more but that's it0:44:20
thanks [Applause]0:00:00
Core async Communicating Sequential Processes using Channels, in Clojure - Rich Hickey
0:00:00
I'd like to thank everybody for coming I'm going to talk about Clojure core.async which is an implementation of0:00:11
CSP style channels for Clojure and along the way actually just talk more about channels and their use for solving these0:00:22
kinds of problems and contrast them with some other solutions so the first thing is to enumerate what the problems are that we're trying to address whenever we0:00:31
do anything and in this case I think there are two which may not make sense from looking at the slide as it stands but hopefully will as we go forward and0:00:41
the first is that function call chains make for poor machines and recently and during the talk earlier I tried to0:00:50
contrast the part of your program that's a machine from the part of your program that's information and certainly there's a there's a part of most programs that0:01:01
need to convey stuff around which is certainly a mechanical like activity and we end up with these callback APIs0:01:10
often to do this stuff and that is a chain of calls that ends up doing the job of conveying things around quite0:01:19
poorly but the other problem we have is a lot of real-world APIs expose the end point to something like I/O via callback0:01:28
APIs so it's something we encounter and have to address so the premise here is that there comes a time in0:01:38
all good programs when we need to keep things separate and we need to isolate things beyond the kinds of isolation we can do with calling and in fact we want to isolate ourselves from call sequences0:01:49
so you're going to start using queues inside your architectures so that the producer of some information and the consumer of some information know nothing about each other like completely0:01:59
nothing about each other and all the encapsulating ways that we have that involved chains of function calls fail to do that at some point and you can0:02:09
tell they fail to do it because you won't be able to move things to other places or you'll have some build dependency that's where the stuff will appear so what we0:02:20
want to do is raise up conveyance the part of your application that says I have something over here it's going to produce some information and that should be the end of what it knows it's going0:02:29
to put it on the end of a conveyor belt and someone else is going to come and pick that stuff up and do the next thing with it and never the twain shall meet0:02:38
we want that to be a first-class thing and so I think what happens in most architectures whether it's in process or across processes is that you start0:02:47
introducing queues because queues are a representation of this stuff so we're gonna have processes we're gonna have queues or channels now if you're using0:02:57
Java you know we already have queues right there's java.util.concurrent queues but there are a couple of problems with using them in practice one of which is that they coordinate via thread control0:03:09
in other words they block actual threads which means that you have to park a real thread on the end of a queue in order to utilize it another problem we have is if0:03:20
we want to try to apply some solution both on the JVM and in the browser and places where JavaScript runs we don't even have threads there so we0:03:29
need something other than this now of course if threads were free mmm that problem on the JVM wouldn't necessarily be a problem but they're not free they0:03:39
certainly have stack size associated with them which can become a problem when you have a lot of threads which is an efficiency problem for an individual server and there also are costs involved0:03:51
in waking up threads and getting them to execute work on your behalf so that's why we can't afford to use threads often so queues are good0:04:01
java.util.concurrent queues don't always apply and there are specialized queues that do a much better job for very particular kinds of scenarios which Martin0:04:12
will tell you all about so if we look at the situation we're in we're facing events and callbacks right and what do we see when we address the events in callbacks we often see something like0:04:21
listenable futures or promises and what happens is you you know chain these things together and you end up with a a web of direct connected relationships and it's very0:04:32
difficult to reason about control and flow in a program where all of your logic has been separated into little witness just click do this and when this message comes in do that and when this0:04:42
does do that and the big picture about what your program is about is broken up into splinters and situated in all these callback handlers so the logic is0:04:52
fragmented and the vernacular term for that is callback hell right anybody who's worked with a system that has a lot of callbacks knows this is a big problem0:05:01
there are also opposite compositional problems associated with callback handlers and something like Rx and observables do help with that they do0:05:10
help you for instance build transformation pipelines that are connected to callbacks but they don't do everything so there's a lot of problems with this kind of chain of callback0:05:20
handlers and no matter what kind of wrapping you put around it the visibility of which handlers are in play monitoring that control what's0:05:29
going to run on what thread right so I'm calling you back is it on my thread is it on your thread you know how many people have ever done something with callbacks and there was an admonition I'm gonna call you back but don't do too0:05:39
much work don't do too much work in the handling okay that's a sign of a problem so one of the problems with this is that0:05:48
we're starting to when you do this when you build a set of call chains connected together you're using that call chain as if it were a machine handing off one0:05:57
function calls another function calls another function calls another function right and they're passing it along and what invariably happens is because your logic is fragmented right it's in two0:06:06
pieces one is associated with one callback handler one is associated with another and especially if there's conveyance that is to say when you're told about something happening you have0:06:16
to tell somebody else about something right so you're shoveling something through maybe with the transformation in the middle as soon as you fragmented your logic if you have any state at all0:06:26
like am I interested in this message can I accept it right now who should I send it to what's the list of people who care right if there's any state associated with the0:06:35
decision-making process in your split apart logic you have to put it somewhere and use shared state to do it right this handler goes and says look in the shared thingy and make a decision local0:06:45
for this handler or another handler says look in the shared thing and put something there and then go back so you're buying and you're forced into shared state when you do this and now of0:06:57
course people say oh well you know we have objects to encapsulate this but object you know it doesn't do anything it doesn't actually change anything about this right it just puts a blue oval around it that's all the object0:07:07
does nothing about the scenario is is any different right if you actually have multiple threads of control running through an object that object is really not in control it's not really you know0:07:17
keeping track of everything all that shared state stuff is still on you you know objects are like sort of marionettes where anybody can pull the strings at any time right so that0:07:27
doesn't usually work out that well so there are a bunch of techniques that have been used to sort of re-invert control if you will right so the control you would like to have is if there's0:07:37
something interesting happening do this do that see if there's something else interesting happening bla bla bla around and around possibly with different sources of input we'd love to just go0:07:46
back to writing a blocking program because it looks nice and it's easy to understand and it co locates all the logic the problem is we can't effectively do that if we have callback0:07:55
APIs and or this thread thing so C# and F# both had some enhancements made at the language level that attempt to re-invert the control0:08:05
right all they do is take code that looks linear and rewrite it to be callback code but you don't see that and so what you see looks very0:08:14
straightforward you say go do this thing in an async block that takes you know an arbitrary amount of time then do this then do that and what ends up happening is your thread doesn't get blocked there0:08:24
it looks like it does but it doesn't it gets relinquished and if the thing that you were interested in completes then you continue so I saw a talk from0:08:37
the Scala guys who had copied C# async for Scala and I said that looks cool we have macros we should do that it's probably a weekend project and it0:08:46
would have looked something like this right so you have your original kind of code where you say ordinary future that's like a Java future that blocks go do something useful0:08:56
try to deref the future at that point your thread is tied up right and then eventually the future completes then you keep going and the middle code is what0:09:07
you would do if you had callback handlers you'd say right on complete do something with the result and that do something with the result is the fragmented code that you know if I had to share logic with other handlers and0:09:17
or state it would become messy finally you come back with something that mimicked C# async something where you say in this async block you know treat all0:09:27
calls to blocking or asynchronous things as if they were blocking but actually invert control and you say do something useful and then you say await this0:09:37
future and what happens is that the calling code gets turned into a state machine and parked on a callback handler0:09:46
which will resume that state machine in a thread pool thread whenever the interesting thing happens and effectively your thread is back in the pool to do something else productive0:09:55
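The same kind of rewrite is what the go macro in core.async performs. The sketch below is not the async macro being described at this point in the talk. It is a small illustration using core.async itself, assuming the library is on the classpath, and `slow-answer` is a hypothetical stand-in for any asynchronous call.

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical asynchronous operation: delivers 41 on a channel after about 20 ms.
(defn slow-answer []
  (let [c (a/chan 1)]
    (a/go
      (a/<! (a/timeout 20))
      (a/>! c 41))
    c))

;; The body reads like blocking code, but <! parks the go block:
;; the body is turned into a state machine and no thread is held while waiting.
(def result
  (a/go
    (let [x (a/<! (slow-answer))]
      (inc x))))

;; Outside a go block a real blocking take is needed to observe the value.
(def answer (a/<!! result))
```

A go block returns a channel that receives the value of its body, which is why `result` can be read like any other channel.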
so that's a that's a very nice thing and and it has a lot of utility but it's it's kind of just a subset of the0:10:04
problem that you want to address right it's sort of just sugar because all that it really addresses is RPC style communication right promises and futures0:10:13
are sort of a single-shot relationship between two parts of a system in an architecture right go do this use your answer or here's a promise and when it gets fulfilled the one0:10:23
thing that I'll ever send to you is in that promise it's just a hand off so it's hard to use to model enduring relationships and it's hard to use for external events because0:10:35
you know again a future is like I give you this for an answer but an external event is like a stream it's continuing to pass you stuff0:10:44
so what we want is we want this sugar sugar was good we just want to put it on a better cake so again as I said earlier0:10:54
in the talk the answer here the thing that you'd like to have the programming model that you'd like to have is queues queues fully decouple producers and consumers right if somebody puts0:11:03
something onto this conveyor belt who's gonna pick it up they have no idea if you're picking stuff off a conveyor belt who put it on there you have no idea right I0:11:14
don't know I don't want to know I used to say that so often in a course I taught that one of the students made a shirt for me right from an architectural standpoint that's a good thing because it means you have0:11:25
independent decision-making so they're also first class they're enduring right you can use them to model enduring relationships here's a channel or a queue0:11:35
put stuff on it you know all day long all week long you can make them so that they're independently monitorable and you can make them so that you can have0:11:44
multiple readers or multiple writers so what's beautiful about a queue is it does this job and it doesn't do any other job right it's not like an actor where the0:11:53
logic of handling is connected to a queue and you get this one thing that's both a mailbox and a handler because two things in one is not as0:12:02
good as one thing one thing is better than two things so we like this and in fact the logic and and way of thinking0:12:11
this way of thinking about programs is actually old Tony Hoare wrote the Communicating Sequential Processes paper which is not exactly like what it's become but it's the basis for this way0:12:21
of thinking the idea is simply you have multiple processes and I'm not talking about operating system processes here I'm just talking about some piece of logic that's going to run independently0:12:31
of another piece of logic whether that's truly asynchronously or you're just using you know time slicing cooperative stuff like the JavaScript engine does0:12:40
doesn't actually matter because this is a pattern for organizing your program not necessarily it doesn't necessarily dictate a way of realizing it channels are first-class at0:12:51
least this is what CSP has become over the years channels are first-class so you can pass them around you can pass them as an argument you can hand somebody a channel and they can say okay I'll hang on to0:13:00
this and I'll put something on it or read something from it later by default the semantics are blocking and in particular for CSP style channels0:13:09
the baseline semantics is it's a completely unbuffered channel that is to say it's a synchronization point it's a handoff point one thread is0:13:19
gonna come in with something that they're writing and will not go back until another thread comes and consumes it or vice versa and when I say thread I mean thread like0:13:29
this so you can actually use them for coordination with that semantics you can build coordination primitives on top of it and a lot of the CSP literature is based around doing that but as soon as0:13:39
you introduce buffering then you get real asynchrony all right so somebody can put something on it gets put in a buffer they go and proceed somebody else can take it off and there's a long history of this0:13:50
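The difference between an unbuffered rendezvous and a buffered channel can be sketched with core.async's blocking operations. This is a minimal JVM-only sketch, assuming `org.clojure/core.async` is on the classpath; the names `rendezvous`, `writer` and `buffered` are hypothetical, and the 100 ms timeout is just a way to observe that the writer is still waiting.

```clojure
(require '[clojure.core.async :refer [chan thread >!! <!! timeout alts!!]])

;; Unbuffered channel: a put completes only when a taker arrives.
(def rendezvous (chan))
(def writer (thread (>!! rendezvous :hello) :writer-done))

;; Nobody has taken yet, so the writer is still blocked and the timeout wins.
(def before-take (first (alts!! [writer (timeout 100)])))

(def received (<!! rendezvous))   ; the handoff happens here
(def after-take (<!! writer))     ; now the writer thread has finished

;; Buffered channel: the put lands in the buffer and the writer proceeds.
(def buffered (chan 1))
(def put-ok (>!! buffered :hello)) ; completes with no reader present
(def later (<!! buffered))
```

`before-take` is `nil` because the writer could not complete without a reader, while the put on the buffered channel returns `true` straight away.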
occam was one of the first languages to sort of make this a first-class part of how it worked there's Java CSP which is a library approach to doing this and then0:13:59
of course Go is the most recent language that sort of took this as the first-class this is how this kind of programming should work and I agree with them and their choice I think it is a0:14:08
good way to do this kind of kind of thing so there's a lot of nice things that also have come to grow around this notion of channels the first is that0:14:17
multiple readers and writers can be supported so that you don't have any binding you can add more readers to support work distribution you can have0:14:27
multiple writers so you can have separate authorship writers and readers can come and go like no one sort of bound up to the queue you can pass the0:14:36
endpoints around that's part of what I mean by first-class and then the other critical feature which is quite nice is there's always a construct called select or alt which allows you to wait on0:14:46
one or more I/O operations so you can select or alt alternate on more than one channel like waiting for something to arrive on more than one0:14:56
channel or waiting for a write to complete and a read to complete or timeout operations and this is huge right obviously in socket0:15:07
programming we do this kind of stuff all the time on the JVM the queues and the thread stuff doesn't have anything like this on .NET and actually on Windows they have0:15:16
long had a wait for multiple who remembers what that's called they have a multi-wait so a multi-wait is a0:15:26
very nice thing as an organizational construct because you can again put a single piece of logic that says if any one of these things happens I'm going to proceed and deal with that and then go back and there's also a0:15:39
set of formalisms and algebras around doing analysis of programs constructed this way so you can prove that they are free of deadlocks and0:15:48
things like that there's none of that support built into core.async at the moment so there are already implementations on the JVM Java CSP0:15:58
would be one and Communicating Scala Objects was another but both of these are tied to actual threads so they don't overcome some of the thread limitations from before they allow0:16:07
this model of programming the shape of programming but they they would have difficulty using your machine efficiently so the challenge the idea0:16:18
behind this library is to try to create a CSP style channels library for both Clojure and ClojureScript as something that works in both places where Clojure runs where you can use the0:16:29
same calls on both platforms where you could with similar calls on the JVM get actual blocking because sometimes real threads and actual blocking are the0:16:40
most efficient thing that you can do or you can get this macro generated inversion of control so it's like what the C# compiler was doing where we0:16:50
have a set of macros in Clojure that will take your code and invert control so take code that looks like it's saying read any one of these things and wait until it happens and turn that into make0:17:00
me a state machine and associate it with callback handlers on all these things and relinquish the thread and if any of those things happen one and only one of0:17:10
those things will be seen to have happened by that logic and the logic will be re-established on a thread pool thread and will continue to run so0:17:20
it's beautiful you write code it looks like it's blocking and you get code that's actually doing all the callback work for you so this is a big deal if0:17:33
you can do it right because you can still write traditional threaded apps this way you can get higher connection counts on your JVM servers if you switch to the inversion of control system you0:17:43
can even work on evented servers and the big kahuna for the ClojureScript guys and people in that space is to fix the callback hell0:17:52
problem in the browser there are other ideas for using these kinds of channels on a network it's difficult actually to0:18:01
convey all the semantics of channels over a network because of the failure modes and core.async does not currently contain any network channels so we're strictly talking about in-process0:18:11
communication where the processes are just pieces of logic that have independent lifetimes so one of the cool things about core.async in Clojure is that it's just a library it didn't0:18:21
require any modifications to the language right you can do this with just macros they take your code they rewrite your code that's what macros do so this is a job0:18:30
for macros we didn't need to touch Clojure to do this what you get are independent threads of activity we'll call them threads but their alignment with threads is weak and you0:18:40
get channels that behave like queues and it supports both Clojure on the JVM and ClojureScript so it looks like this you say thread with the body and that0:18:49
allocates a real thread and all the blocking calls in that are real blocking calls or you say go body and you get this inversion of control thread that0:18:59
uses a state machine and parking and thread pools to do the job you have channels again they're queue-like they're multi-reader multi-writer they're0:19:09
fundamentally blocking they're unbuffered by default or you can have fixed buffers there's no indefinitely sized or0:19:20
arbitrary buffers in core.async we're not going to provide that because it's a recipe for a buggy program so you may have to tune your program and analyze it0:19:29
and see what's going on but the net result of that is that you can write real programs that have genuine back pressure which is a great thing as an0:19:39
architectural construct when you don't have it you're always struggling in its absence the API is pretty straightforward to create a channel you call chan or you can say chan0:19:49
10 which again gives you a fixed size buffer or you can create some buffers and what's nice about buffers well I'll talk about that in a second or you can create an explicit buffer and pass that to a channel then there are0:20:00
two fundamental constructs put and take there'll be a parking version and a blocking version the parking version is one bang the blocking0:20:09
version is two bangs I'm not really going to talk too much about the blocking version because that's not supported on JavaScript so the portable code that you can write uses go0:20:18
and the single bang versions of put and take so you put a value on a channel and you take a value off a channel you can0:20:27
close a channel if you are writing a JVM program you can mix modes so a single channel can be consumed with both truly blocking code and this go code and it0:20:40
can also be produced with either flavor of code so you can mix the modes which is very nice again because at the edge of all these things you usually have to revert to the code that didn't know0:20:50
about channels so how do you get there buffers by default there are none right so it's unbuffered by default0:20:59
which is just strictly a rendezvous a fixed buffer will block when it's full but the the other cool thing is that you can really incorporate policy into buffers because you could hand a buffer0:21:09
to a channel we have a couple of flavors of buffer that implement policies that would be common right for instance the sliding window buffer says if the buffer0:21:18
is nominally full when I put something new on it get rid of the oldest thing that's on the front of it which is quite commonly exactly0:21:28
what you want to do of course the flip side of that is a dropping buffer when it's full every new thing that comes in you drop on the floor so these are the policies you take in a program where you're not0:21:38
going to say I'll just pretend this unbounded buffer is a good idea and see what happens in production where you're forced to make decisions well there you go you make the decision and you0:21:48
incorporate it in the policy that's in your buffer because we think unbounded buffers are bad then we have this choice construct we chose alt for that0:21:59
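The three buffer policies can be compared side by side in a small sketch. It assumes `org.clojure/core.async` on the classpath and a JVM REPL; `fill-and-drain` is a hypothetical helper written for this illustration, not part of the library.

```clojure
(require '[clojure.core.async
           :refer [chan sliding-buffer dropping-buffer >!! <!! close!]])

(defn fill-and-drain
  "Puts 1..5 on ch, closes it, and returns everything that can still be taken."
  [ch]
  (doseq [n (range 1 6)] (>!! ch n))
  (close! ch)
  (loop [acc []]
    (if-some [v (<!! ch)]
      (recur (conj acc v))
      acc)))

(def fixed    (chan 10))                    ; fixed buffer, blocks writers only when full
(def sliding  (chan (sliding-buffer 3)))    ; when full, the oldest value is discarded
(def dropping (chan (dropping-buffer 3)))   ; when full, the newest value is discarded

(def fixed-result    (fill-and-drain fixed))
(def sliding-result  (fill-and-drain sliding))
(def dropping-result (fill-and-drain dropping))
```

The fixed buffer keeps all five values, the sliding buffer keeps `[3 4 5]`, and the dropping buffer keeps `[1 2 3]`. A fixed buffer smaller than five would have blocked the writer here, which is the back pressure described above.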
so alt allows you to wait for multiple operations so you can block on multiple puts and takes the fundamental construct underneath alt is a function0:22:08
called alts which takes a set of operations represented as data and will wait on any one of those the critical thing here is0:22:18
that when alts returns one and only one of the things that you were waiting for has happened so you've taken one0:22:27
thing off of one channel that you were trying to read from or you've succeeded in putting something but you haven't read anything so you know exactly one thing happened and this is atomic across all0:22:36
participants if more than one thing is ready you'll get a random choice made or you can set priority which would mean if0:22:46
more than one thing is ready the thing with the highest priority is the thing that happens but one thing happens and then alt so alts is a function that0:22:55
implements the work and alt is just a macro on top of it that allows you to write code that works like this so I'm not going to get too much into the code but this says try to read0:23:05
from c or t call the result val and the channel that actually succeeded ch and then do something with that in function foo and this says wait for a read on x and0:23:16
pass it to a function call it v and do the work with v when you pass a pair you're saying I want to output a0:23:26
value on a particular Channel and so whatever whatever operation happened the thing on the right is the result of the expression so these are the operations0:23:36
this is the binding part that's what happens if that alternative is chosen so like go we use channels to represent0:23:46
timeouts that ends up being very powerful and quite elegant you create one by saying timeout and a certain number of milliseconds and what it does is just return a channel that0:23:55
closes after that amount of time but what's cool about that is it turns a timeout which is usually an argument to every API call you make into a0:24:04
first-class thing that you can for instance reuse across a whole set of calls in other words do this for five minutes you can say make one timeout five minutes from now and put it in the0:24:13
alt of every operation you do and after five minutes have gone by that thing will complete you didn't make a gazillion calls all of which had five0:24:22
minutes well now it's five minutes less three seconds I mean who has done that with timeout code it's just not fun so this is quite clean and you can include the timeout in just an0:24:32
ordinary alt you try to take from it and it will return when it closes and that allows you to share timeouts between operations which is also powerful and encapsulate the actual timeout value and0:24:42
the way it's expressed so if you're familiar with Go you'll see that this has a lot of similarities to Go and of course the other things that have been built with CSP over the years some of0:24:53
the differences are that all of the operations are expressions right this is Clojure it's a functional language we don't do statements so everything is an expression it's a library it's not a0:25:04
language feature so it didn't require the language to be built around it there are trade-offs with that I mean hopefully Go is going to be able to do what they do quite efficiently because they're oriented0:25:13
around doing it and in a library you're gonna make some trade-offs alts as I said before I showed you the macro but it's built on top of an actual function0:25:22
that's quite powerful that allows you to write code that arbitrarily at runtime waits on an arbitrary number of things like you read a configuration file it says go try to read these seven things0:25:32
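Waiting on a number of channels that is only known at runtime could look roughly like the sketch below. It assumes `org.clojure/core.async` on the classpath and the JVM blocking variant `alts!!`; `source-count` and `sources` are hypothetical stand-ins for whatever a configuration file would supply.

```clojure
(require '[clojure.core.async :refer [chan >!! alts!! timeout]])

;; Hypothetical: the number of sources comes from configuration at runtime.
(def source-count 7)
(def sources (vec (repeatedly source-count #(chan 1))))

(>!! (nth sources 4) :from-source-4)

;; alts!! takes an ordinary vector of operations, here seven reads plus a timeout.
(def result (alts!! (conj sources (timeout 1000))))
(def first-value (first result))
(def winner-index (.indexOf sources (second result)))

;; With :priority true the operations are tried in order instead of at random.
(>!! (nth sources 0) :from-source-0)
(>!! (nth sources 6) :from-source-6)
(def prioritized (first (alts!! sources :priority true)))
```

Only source 4 had a value for the first call, so `first-value` is `:from-source-4` and `winner-index` is `4`. In the second call two sources are ready and priority picks the earlier one, so `prioritized` is `:from-source-0`.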
if you have a language that's built this into statements there's no way to make a statement that has an arbitrary number of branches in it so it's nice to have it be a first-class function and we0:25:42
support priority so at the edges of your program you're gonna be facing callbacks anyway so is this just a waste of time it's like this is great Rich but like I0:25:52
have this pile of things that all pass me futures and listenable futures and promises or I'm in JavaScript land and everything is a callback0:26:01
you know is this a lost cause and the answer is no it's really easy to bridge to that code because in your handlers all you need to do is take the0:26:10
thing that they gave you and immediately put it on a channel just stick it on a channel at that point you've inverted control you said okay callback we're done now it's in the channel system and0:26:21
everything else is going to be flipped around right-side up if you will so you you just put the values you encounter right into a channel and those put and0:26:32
take you'll see this uses the words they need not be in go blocks all right so that's your entry point to channels from code that's not otherwise in the0:26:41
code that's inverted because this code isn't inverting it's just supplying a value to a channel similarly in JavaScript especially you're going to need to get out of0:26:51
channel land right because there aren't real threads and you're eventually going to need somebody to say okay well do this you know affect this widget or something and so you're going to need to revert or0:27:01
re-invert control on the edges of a JavaScript program and you can use take in a similar way so take can be executed in code that's not had this inversion of0:27:11
control outside of a go block in particular so the combination of these things means that you can you can deal with the browser alright the browser is a place that's0:27:20
all callbacks all the time that's all they have it's built it's oriented around this and it ends up being the case that you know friends don't let0:27:30
friends put logic in handlers right this is where the hell comes in this is how you get hell so if you do what I just said you can avoid this hell0:27:39
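The bridging pattern described here, a handler that only puts, logic in a go block, and a take with a callback on the way out, might be sketched as follows. This is a JVM sketch assuming `org.clojure/core.async` on the classpath; `fetch-with-callback` is a made-up stand-in for any callback-style API, and the `promise` at the end exists only so the example can observe the result.

```clojure
(require '[clojure.core.async :refer [chan put! take! go <!]])

;; Hypothetical callback-style API standing in for a listener or event source.
(defn fetch-with-callback [x on-done]
  (future (on-done (* 2 x))))

(def results (chan))

;; Entry point: the handler holds no logic, it only puts the value on a channel.
(fetch-with-callback 21 (fn [v] (put! results v)))

;; The logic lives in one place and reads as if it were blocking.
(def processed
  (go (let [v (<! results)]
        (inc v))))

;; Exit point: take! runs a callback outside of any go block.
(def delivered (promise))
(take! processed (fn [v] (deliver delivered v)))

(def final-value @delivered)
```

`final-value` ends up as `43`. In ClojureScript the same `put!`, `go`, `<!` and `take!` calls apply, but `future` and dereferencing a promise are JVM conveniences used here only to make the sketch self-checking.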
because you don't have any logic in your handlers and your logic becomes all back in the same place so when you use ClojureScript and core.async you get the separation of logic between events and0:27:49
view and it's a very big deal I mean I don't know if anybody's read David Nolen's posts and whatnot but you0:27:58
completely change the kind of code you can write in the browser you can take things that were nasty complete messes even written by expert JavaScript0:28:07
programmers and turn them into things that are you know one fifth the size where the event handling code is here and the updating code is there and the logic is there and it couldn't possibly0:28:17
be cleaner so it fundamentally changes what you do and we were just having a conversation before I came up here and I think the question is you0:28:26
know if you had both would you ever choose callbacks and the answer is absolutely not all right there's all kinds of ways to fix callbacks and make them slightly better you would never pick that if you0:28:35
had a choice so the reason why you don't have a choice is because not every language was either oriented towards this or has the ability to morph itself0:28:44
to work this way even sometimes but when you do you wouldn't do this so once you have channels what does your model look like well the first thing is0:28:53
that logic gets put back together you have your logic all in one place no matter how many different kinds of input sources or places you might want to redirect stuff right because this is about conveyance no matter where0:29:04
you're getting stuff from or sending it to your logic can all be in one place right because you can alternate all of your reads of all your sources together and you can alternate your writes or you0:29:13
can alternate the whole set of things so that you know for instance that you're never doing more than one thing at a time maybe you have a very complex state machine that would be incredibly difficult to coordinate in nineteen callback handlers0:29:23
but you put them all on the same alt you know absolutely you're not doing more than one thing at a time and it's super clean to write so your code looks like0:29:32
this so I would like to try to contrast the two things here because I think you know you'll see talks about Rx and0:29:42
whatever and this like talk about duals and it's all like ooh duals are the same right they look the same duals are not0:29:52
the same dual means opposite has the same shape and the opposite meaning right the same transformations work on both but the the semantics are the0:30:02
opposite so what happens when we try to contrast direct calling right which is chains of function calls in the callback model with an indirect system that puts0:30:13
channels in the middle and you'll see everything is opposite right your logic in the first case is split up into separate handlers your logic is together0:30:24
when you when you use channels right your calls are synchronous unless you put in some extra stuff right I'm gonna0:30:33
call you you're gonna call them they're gonna call and so on boom that's all gonna happen you don't have any real ability to spread that out unless0:30:42
you superimpose something extra whereas with channels it's inherently async right you can choose a policy that synchronizes or you can choose a policy that doesn't but function calls call0:30:52
functions call functions you can't just magically snip that in the middle you have a one to one relationship between the providers and the callers you can0:31:02
make broadcasters but it's still I'm calling whoever's gonna get called so like I'm in charge of doing that with channels you can easily get multiple0:31:12
producers and multiple consumers right you have this implicit relationship between a callback handler and the thing it ends up calling right you can put all0:31:23
the programming indirection you want and encapsulate it in an object and whatever but the bottom line is that is going to call you and here you have an explicit separation of concerns which0:31:35
also means that you can do explicit orchestration right I have somebody who's interested in consuming something I have somebody who's producing something I have channels they're all independent and I can make a third party in charge0:31:46
of doing all that work whereas with callbacks it's very difficult to do because you have to get inside the installation of things the shared state0:31:55
as I talked about before is an internal thing and whatever shared state there is because there's always some state associated with the channel or a queue0:32:04
right what who can get at the head right now and that kind of thing is external in any case it's reified outside right that shared state you got to come up with your own strategy for making sure0:32:14
you know your different handlers don't trounce on each other the other thing that's interesting is that I think the state that you get with callback0:32:23
handlers is inherently place state right so one handler is gonna say there's a new user let me put them here and another handler is gonna say let me go look there and see what was put there by0:32:33
those other handlers so there's this inherently place-oriented notion to that the analogy I would make is you go to0:32:42
work at your factory right and you have your jacket right place is like I put my coat on this coat hook and what's your expectation you can go back later0:32:53
and find your coat on that coat hook unless somebody else said well we're out of coat hooks I'm gonna take your coat off and put mine on it you get these collisions whereas with channels I get something0:33:03
that's a subset of state right yeah things are changing it's obvious right this is a moving conveyor belt there's some state here but it's flow state right if you came into your factory and you0:33:13
took your coat off and you put it on the end of a conveyor belt what would your expectation be you're never going to see that coat again right you don't build0:33:24
programs with flow state that expect to go and revisit state and therefore they're a lot less complex right there's still state there's still things in0:33:34
motion here they're still machines but flow machines are less complex than place machines so I think that's a0:33:43
big win the other thing is when you do callback handlers that shared state is your problem right making the channels do the right thing is a library problem right it's just the channel author's problem0:33:52
to make the flow state work it's not your problem the logic in a callback handler is passive right when do you get called back whenever you get called back0:34:03
you're not in charge right you're passive when is your logic run whenever maybe I had a conversation with the guy who's gonna be calling me but maybe not when0:34:14
does your code run in a program that consumes channels whenever you want because you don't have to read those channels you could be doing something0:34:23
else you could say when I'm in this state I don't look at those channels therefore I don't hear from them right how many people have built you know large architectures of callbacks and then been like well I wish I could turn off these0:34:33
three when this is happening that's hard right it's very hard so you get that you have the choice right in your logic the0:34:42
other thing is that this implicit communication is code driven right and the explicit communication is data driven the thing that's flowing over these channels is is data which means0:34:51
it's straightforward to go and for instance put it on a wire or do something to get a real true separation of concerns we saw this in the design of Pedestal0:35:01
which was a piece of logic for the browser just a library for Clojure that in its original incarnation basically0:35:10
takes inputs and transforms a data model that can detect deltas so you can efficiently determine when this change0:35:19
came in these three parts of this data model changed and therefore these parts of the UI should change and because that system was architected with queues on both ends of that thing0:35:29
they were able to say you know what it would be nice if we could run all this transformation logic in a web worker and they just took that code and they put it in a web worker they took these two0:35:38
channels and they marshaled right when you have webs of calls you can't do that kind of work because your fundamental communication is not data it's calling0:35:48
and you can't just take call you know call chains and split them across web workers right you can't even call across web workers so as soon as you can get to0:35:57
data you should you should have a lot more flexibility in your system when you do that so I would say that there's a sense in which this callback thing is0:36:08
intimacy right everybody knows everybody you're really building this whole intimate system with a lot of connectedness and there's a sense in which a channel driven system0:36:17
is ignorance right I don't know I don't want to know I put stuff there and I'm done I take stuff from there I don't care where it came from right and we all know that ignorance is0:36:27
bliss and in this case intimacy is pain not necessarily generally but certainly in this case I think it is this is just0:36:38
another taste of what it looks like this is an example from Go's examples of how you would for instance set off a bunch of queries that try to reach multiple0:36:47
possible sources for each of a web query an image query and a video query and returns whichever one of those came0:36:56
back first with an answer for each of those types but bounds the entire thing you know with an 80 millisecond timeout and that's what it looks like here it's0:37:05
just like the Go code it's just as expressive except this is all expressions and not statements and this is a really powerful and simple way0:37:15
to think about your programs if you're writing concurrent programs because the semantics are very straightforward and they're semantics you can get your0:37:24
head around and make decisions based around it's not like this nebulous set of conventions that you're0:37:33
forced into with other solutions so what do you get when you do this you get a separation of concerns for realz separation of concerns you end0:37:42
up with logic that's quite coherent and linear it's co-located right you end up with logic that if it has state it might be able to just use recursion to0:37:51
maintain that state and not need any mutation constructs or any kind of coordination constructs versus the shared state which would require place0:38:00
oriented state you can get coordination out of it if you want you can run unbuffered channels and use them as synchronization points and rendezvous you can0:38:09
get back pressure because you're gonna put in a fixed buffer which means you can get to a point get the back pressure and then cascade that so you can build very large systems that have reliable0:38:20
and easy to reason about back pressure characteristics you can make them dynamically configurable again because the channels are first-class you can0:38:29
assemble a network that makes sense given the topology you're encountering at runtime and they're efficient so I'd0:38:39
like to just thank the people that helped work on it especially Timothy Baldridge who did all the icky part of the macro that inverts the control which is0:38:48
quite gross and if you want to try it it's here so the code is here and whatnot there's a bunch of other things in there now there are nice constructs for doing merging and mixing and pub/sub0:39:00
and kind of higher-level things I'm certainly I don't anticipate people I would hope people would not need to work at the bottom in most cases and I'd also encourage you to make sure that you0:39:09
reserve this code for true conveyance scenarios and not just to write goofy parallelism stuff because it's not actually well-suited for that at all but0:39:21
there are a lot of higher-level constructs and we hope to have more of them including Pedestal based around this kind of work and so that's all I have to say and I can take some0:39:30
questions probably so the question is how would you extend this to distributed systems with real0:39:39
queues and the answer is like I said earlier in the talk I think that's still somewhat of an open question you can't necessarily get all the semantics that I0:39:48
just described in a distributed queue because some of the failure modes are different on the other hand what most of the people who have tried doing it have0:39:57
done is just subset the semantics so you still have these two semantics and they still work the same way and I think that's a reasonable approach to take so0:40:06
for instance you might have constraints around whether or not buffers could be effectively blocking you might always0:40:15
have to install a policy for instance like like the sliding window or the dropping buffer sometimes some of the solutions like the Java CSP solution has0:40:27
some networking constructs that require for instance the consuming end of a of a channel to to host it so in that case it0:40:39
wouldn't be as first-class right it wouldn't be a channel like a queue system that's sort of independent of any process that runs you would have the endpoint connected I don't love that because I think it starts to smell like0:40:48
actors at that point and you lose that sort of first class of the channel is what it is people come and participate but it's something we're actively0:40:57
looking at right now I do know that I don't think all the semantics can be conveyed I mean I think that so the question is have we contrasted between0:41:06
CSP and pi calculus which is more recent work and more involved and the answer is definitely not yet I mean I'm not sure that pi calculus has moved to0:41:16
the point where I would consider it sort of closer to something I would use in actual programs yet as opposed to more of a theoretical underpinning there's0:41:25
plenty of great ideas there but again you you know you have this challenge right are you gonna write a new language that works that way or what can you0:41:34
bring to an existing language so this is particularly interesting because it's a library I'm go probably had that question more readily available for them you know you're writing a new language0:41:43
why didn't you use PI calculus so my excuse is of course it's a library but I do think that there are interesting things there and and they we should look0:41:54
at them so the question is does using channels across data centers increase the complexity from a versioning perspective between producers0:42:04
and consumers and I would say probably not it probably does the opposite because it's easier to agree on a data representation and migrate the0:42:14
code than it is to agree on data and code or code and calling signatures and data I mean it's always going to be data end to end and so what this does is0:42:23
take it just down to data there the contract is a data contract so I think it's more tolerant of versioning independence on both ends because it's0:42:33
more independent so the question was how does this timeout policy work and so a0:42:42
timeout a call to timeout creates a channel like any other that you will attempt to read from and after the timeout has elapsed that channel0:42:51
will close which will cause your read to complete a read on a closed channel returns immediately so down at the very bottom is the code that actually tries to read it says alternate try to read0:43:02
what's happening is all of these jobs are sent off asynchronously and told to put their results on the same channel c so this code down here tries to read any of those results and the timeout0:43:13
channel so this will return when any of those things produces a result on c at the bottom there oh you can't0:43:22
see my cursor so I'm wiggling it over to c at the bottom I'm sorry that alt call at the very bottom says try to read either of these things c or t and0:43:32
it will return when either something is available on channel c or t closes because that's the only thing that's going to happen on a timeout channel so what's0:43:41
cool about that is that's in the middle of a loop that loop just keeps going and going and the single timeout is governing the operation of the entire loop as opposed to having to come up0:43:51
with a timeout per invocation of read for instance so I think this stuff is extremely cool everywhere systems that0:44:02
have done this kind of work have touched this in order to find an alternative the code has become dramatically simpler really dramatically simpler the0:44:13
word dramatic should be reserved for this kind of thing it's dramatic so I definitely believe in it there's all kinds of things that you can do to try to improve performance and things like0:44:22
that but as an architectural construct I think it's quite appealing so I think with that well we have one more question okay the question is0:44:33
has any work been done to get them working across processes a little bit like the other question and yeah people are working on it I'm mostly concerned that they don't do something that has the same surface0:44:42
and different semantics so mostly I've just told people no no no no no no because that I think would be a catastrophe you don't want something0:44:51
that looks the same and behaves differently so like I said before I think that there are there will be limitations to the semantics you can convey over a wire I'm definitely0:45:00
interested in having that in lieu of that though there's no problem saying I'm going to continue to use my favorite queueing system and on its endpoints which0:45:10
have got callbacks I'll do exactly what I advocated before then you're combining semantics you're saying you're going to convey something with a you know a third party queue across the wire and then0:45:21
you're going to turn that into channels for the application code so you don't necessarily have the channel behavior for instance you might not get back pressure across the wire that way but0:45:32
both of these guys will feel as if they're reading a channel that's got a policy on it or writing to one so so you can combine the two right you can use this with all0:45:41
the I/O stuff you have already you can use this with any queues distributed queues you have already and just turn their API endpoints into reads or writes of channels and then use channels from0:45:51
there on actually making a distributed channel that has the CSP semantics may be a research problem but I0:46:01
don't think it's completely possible given you know TCP and other realities you want to address if you really want to be something used in the real world as opposed to a theory all right0:46:13
well thanks enjoy your lunch [Applause]0:00:00
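The single-timeout pattern described in the timeout answer above can be sketched roughly as follows. This is a minimal illustration and not the code from the slide: it assumes the core.async library is on the classpath, and the names collect-results, c, n and ms are hypothetical. One timeout channel t is created once, outside the loop, and every pass through the loop reads from either c or t, so the one timeout governs the whole loop rather than each read.

```clojure
(ns example.timeout-sketch
  (:require [clojure.core.async :refer [chan go go-loop >! <!! alts! timeout]]))

(defn collect-results
  "Reads from channel c until n results have arrived or one shared
   timeout of ms milliseconds has elapsed. Returns a channel that
   yields a vector of whatever results arrived in time.
   Hypothetical helper, for illustration only."
  [c n ms]
  (let [t (timeout ms)]                   ; one timeout channel for the whole loop
    (go-loop [results []]
      (if (= (count results) n)
        results
        (let [[v port] (alts! [c t])]     ; read whichever is ready first
          (if (or (= port t) (nil? v))    ; t closed (time is up) or c closed
            results
            (recur (conj results v))))))))

;; Usage: three jobs sent off asynchronously, all putting on the same channel.
(def c (chan))

(doseq [i (range 3)]
  (go (>! c (* i i))))

(def collected (<!! (collect-results c 3 1000)))
;; collected holds 0, 1 and 4 in whatever order the jobs completed
```

Because a read on a closed channel returns immediately, once t closes every later pass through the loop would also see it right away, which is what lets a single timeout bound the total time.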
Spec-ulation Keynote - Rich Hickey
0:00:00
hi every human yeah once again it's fantastic to come to the0:00:10
Conj and see everybody old friends and all the new faces and everybody being so happy that's really great I think you0:00:23
know it's just so important to revisit the fact that the community is a positive one that's you know full of optimistic people or crazy people who0:00:34
are like willing to try this new stuff and and do things in a way that that's different and to help out other people who are similarly optimistic or crazy so0:00:49
this talk is called speculation and that's just a way of covering the fact that it's a rant0:01:02
so I think we have a few talks about spec and there was a spec workshop and so this is not a talk about spec it's0:01:11
not a tutorial about spec it's not about the tech of spec sort of in any way but it is very much a talk about what spec is about because I'm not sure when you0:01:24
look at spec and especially when you hear a talk and you see you can do this and you know it's a floor wax it's a dessert topping that it's evident0:01:33
necessarily particularly that some of the design decisions are pointed at these two things which seem to not0:01:43
say very much at all one is that spec is about being able to give something to someone so that they can use it and the0:01:53
important thing about that that that word use is that it's sort of like a positive thing here's something you can use as opposed to like here's some rules0:02:03
that you have to follow so we want to be able to give people things that they can use and you know Stu talked about getting you know a piece of code that had no documentation or insufficient0:02:14
documentation and wondering you know what what should these Maps be what are the keys and things like that so it's hard to use that without more of a0:02:23
description so it's about that but the other side of it and again this is not about you the user doing something wrong it's about me the provider saying I'm0:02:33
going to make a commitment this is the way this thing works and in particular the commitment means and I'm not going0:02:42
to take that away later so I want to emphasize that today and that spec is designed around that and it's sort of part of a bigger problem which is change0:02:54
it's interesting to look at spec and say spec is about doing this this way or doing that that way or providing these things but spec is really about being0:03:04
able to change later a lot of what spec is is oriented towards changing things later but it's an important question because I had a conversation just two0:03:14
a where the word change was used a ton of times we use the word change to sort of cover a lot of things that happen in software development and one question we0:03:24
need to answer is is is the thing and if it's a thing is it a thing that we want in our software development lives so of0:03:36
course you know I think this happened yet today on surprise but here we go the mandatory definition of a word I was0:03:45
very surprised by this like everybody says analyzed before I do my talking go to Wikipedia to make sure I don't say something you know obvious or obviously0:03:54
wrong so I go to the dictionary and so the definition for change is sort of circular in that one of these two words has the word change in it as well but0:04:04
the origins of the word were actually exchange it was about barter right you can turn a cow into wheat how many people play Eurogames oh well you know you can0:04:17
turn cows into wheat and wheat into wood and wood into stone apparently these things lead to great success in the Middle Ages so that's not transmutation0:04:29
right that's not like stuff changing in place it's like exchanging stuff and when you think about it that way you can say well what does it mean to just change something without somebody's permission0:04:40
or cooperation or participation right one one way to say is that you just took something from them but at least it's0:04:50
not something that's nice but I think that in practice we have things changed on us and we experienced this last line0:05:01
by how many people have ever chased down dependency problems how many people enjoyed that0:05:11
okay so what are we going to do right it's not like software should be immutable that's not that's not the0:05:20
thing but how do we move it forward right so I'd like to find some different words than change in this particular case how do we make it better and different0:05:29
tomorrow in a way that you know our consumers can tolerate or at least can we consider that when we when we make it0:05:39
better so we all know how we do change right we use maven or something that drives maven and we have artifacts which0:05:51
are libraries and the library and our application says I want to use these libraries a B and C and these are the versions I want and then library a says0:06:02
oh but I need library X to work and this is the version I want and I need library Y to work and this is a version I want library B says I also need library Y and0:06:12
I want a different version and library C says I want library Z so we have a little conflict here 2.1 and 2.4 i don't0:06:22
know if you can read that you can imagine what it says something that's in conflict and Maven has some rules that automatically make this work0:06:32
usually it will pick the later thing 2.4 it is and this tree right our immediate dependencies and the0:06:42
transitive dependencies are the things we need to have our program run right you all know better than to answer yes0:06:53
no rhetorical questions with it yes no no right so the first thing is that artifacts don't use anything right the0:07:06
library doesn't use a library at that level right because artifacts are not doing anything right they're just0:07:20
so they don't use the other artifacts they have these lists of them for various reasons we'll talk about the other thing is there's nothing in the code at least in Clojure and I think in0:07:29
most languages that use this infrastructure there's nothing in the code about these artifacts these are the two things to know so what does your0:07:40
application actually need and so we'll look at this problem again and we've expanded it a little bit so we look inside each of those artifacts what do0:07:52
we see if they're Clojure artifacts we see namespaces right I give you this jar it's got a bunch of namespaces in it were it Java code there would be0:08:03
packages in there but the same kind of thing namespaces packages so there's a bunch of them and in fact our app is decomposed similarly right our app0:08:13
starts with a couple of namespaces that we wrote in our application space now those namespaces do say requires right0:08:22
so my app Ralph namespace requires the A Ricky namespace and my app Ralph name0:08:31
space requires the C Fred namespace and app Trixie requires B Lucy and0:08:40
that's in code that's nice so we can see it in our program at least and then we go down and we say A Ricky needs Y0:08:49
Barney somebody else had some names oh this morning Paula did these are different TV shows apparently so we have0:08:59
Barney and Wilma and for anybody who has good eyesight so this is the truth right this is the actual namespace requiring0:09:09
namespace or importing package right and now this is in code and these are the actual connections between the things right0:09:21
well for anybody with good eyesight what can you tell about this right now already about our app needing X Y and Z0:09:32
we don't need Z right only Ethel uses Z and the app doesn't0:09:42
use Ethel so it's actually the same thing right namespaces are not code0:09:52
they don't do anything right their requires well obviously namespaces could be effectful also you could require something for a side effect but if we set that aside the namespace0:10:03
declaration that says require is not actually trying to accomplish anything itself so they don't really use that we like the fact that we can see this in0:10:12
the code but we have this other niggling I would hope niggling problem with this which is that how do we know which namespaces are in which artifacts are in0:10:23
which jars yeah I don't know somebody tells us right we meet0:10:32
someone on the street who's like did you try this jar man it's got Fred in it I'm loving it no really there's not a0:10:42
place right where we keep this so that's a problem so what's the truth the truth is you need better and better eyesight0:10:51
to solve this problem so we open it up a little bit more and we look inside Ralph and we see that Ralph actually has0:11:00
functions in it there's a function foo inside our app in our Ralph's namespace and we also see the beauty of namespaces here because I just got tired of making0:11:09
up new names and I didn't want to get into TV shows so like every namespace has a foo and a bar function but they're conflict free because namespaces are awesome right so Ralph foo calls Ricky0:11:19
foo and Ricky foo calls Barney foo and Ricky bar calls Fred bar right and so on0:11:28
and so forth these are the actual calls that are made these are the actual dependencies right code that needs to0:11:39
run needs other code in order to work that's the truth so we can also see that right because those calls are evident0:11:49
but one thing that's not evident there's no purple lines but in the in the legend there's purple which is that we depend0:11:58
for instance we see that Ralph foo depends on Ricky foo the function to exist but there are other details about0:12:07
that call right what does Ralph foo pass to Ricky foo well maybe that changes over time I don't really know what does Ricky foo0:12:17
return to Ralph foo and all callers well maybe that changes over time that stuff is invisible right because0:12:27
maybe you start consuming it tomorrow or and it's very subtle what you use of the return value especially when we start returning maps and then for people with0:12:36
really excellent eyesight what else do we discover now from this we don't need X right Ralph0:12:46
foo calls a Ricky foo it never calls Ricky bar and Ricky bar was the only thing that needed X so this is already like not great0:12:58
right our dependency tree is not really reflecting our actual needs another thing that's going on here in the bottom and I'm not going0:13:08
to talk too much about it except to say that there are also internal calls right so inside library Y Barney bar calls0:13:17
Betty foo that needs to match but nobody can see that in the direct tree0:13:27
necessarily but we do want to make sure that those things match and that's one of the advantages of pulling in an entire library is that you know the0:13:37
stuff will match even if you're getting way more stuff than you need whatever stuff you need you know should match so0:13:48
this is not great but you know supposedly this is not a problem one of the reasons why is because we have semantic versioning and in the semantic0:13:57
versioning spec which has been versioned by the way and I had to walk through like a ton of diffs to like see what had0:14:09
changed over time mostly I guess because I don't know how to use git but there wasn't like a summary of what's different between the two things but0:14:19
it's been versioned and of course when you start versioning you're versioning but supposedly right we have these rules we0:14:30
have major versions and if a major version doesn't transition we have this implication that it should still work is this what happens in practice has anybody ever bumped a dep in order to0:14:40
make a leaf library visible to an application yeah everybody at some point0:14:49
has done this yes this is the answer yes we all do this it's okay yes no this is what happened this isn't what isn't what happens we're bumping versions all the0:14:59
time right something we use is better somehow in a way that our code does not care that doesn't change our code at all we get our new DEP changes0:15:08
our name change the name of the thing that clocks to us and so on and so on and so forth this is a lie all right this cascading version bumping0:15:17
happens all the time we're just trying to communicate through this pom tree through this thing and I will call this a level violation we're going to talk0:15:26
about levels now so what is actually happening well there's a few different there's a stratification of problems0:15:35
here right if we start at the bottom right this call truth we know that functions call other functions by name what is happening at the next and and0:15:45
that's you know it's clear in fact if you just treated the namespace declarations as aliasing and forget about code loading if they were just0:15:54
aliasing they tell you enough they would tell an analysis tool enough to know when you said foo over here you were talking about Ricky's foo and therefore0:16:04
you need to know about Ricky's foo the actual require is just creating an execution context in which0:16:13
that call will work that the code for Ricky's foo will be available and that require will do that it will also make a whole bunch of other code you don't0:16:22
call available but we know it will cover your need so we put that in our code so it creates a context if we go up another0:16:31
level to the artifacts the same thing is happening those poms are saying I need these other libraries they create a0:16:40
context in which those requires are going to succeed that this this thing requires this other library this other namespace that will be there because somebody on the street told us that if I0:16:51
use this jar Fred will be there and therefore it will work but the problem that's really broken is that this last level the person on the street told us0:17:03
to do this is pure magic right there's nothing in code about this thing and that's going to come back later so we now understand the levels there's0:17:13
functions calling functions namespace requires and artifacts so why do we do this0:17:22
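The three levels just summarized can be modelled as plain data to show how the artifact level overstates what the call level needs. This is a hedged sketch with made-up data loosely following the names on the slides: the call graph is partial, the namespace x.wilma standing in for "whatever lives in library X" is an assumption, and reachable and needed-artifacts are hypothetical helpers, not part of any tool.

```clojure
(ns example.levels)

;; Level 1: which function calls which (illustrative, partial data).
(def calls
  '{app.ralph/foo #{a.ricky/foo}
    a.ricky/foo   #{y.barney/foo}
    a.ricky/bar   #{x.wilma/foo}    ; assumption: the only call that needs X
    y.barney/bar  #{y.betty/foo}})  ; an internal call inside library Y

;; Level 3: which artifact ships which namespace. Nothing in the code
;; records this, so here it is written down by hand.
(def ns->artifact
  '{app.ralph :app, a.ricky :A, y.barney :Y, y.betty :Y, x.wilma :X})

(defn reachable
  "All functions transitively called from root, including root itself."
  [calls root]
  (loop [seen #{} todo [root]]
    (if-let [f (peek todo)]
      (if (seen f)
        (recur seen (pop todo))
        (recur (conj seen f) (into (pop todo) (calls f))))
      seen)))

(defn needed-artifacts
  "Artifacts that hold at least one function reachable from root.
   The namespace is part of each qualified name, so level 2 falls
   out of the names themselves."
  [calls ns->artifact root]
  (into #{}
        (map (comp ns->artifact symbol namespace))
        (reachable calls root)))

(def needed (needed-artifacts calls ns->artifact 'app.ralph/foo))
;; needed is #{:app :A :Y}; X is declared by A but never reached
```

The point of the sketch is only that the artifact list is a coarser statement than the calls: A's dependency list would still name X even though nothing reachable from the app touches it.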
what happen you know like what is this doing for us I mean I don't think it's you know just inherently terrible but what is what is happening can we disentangle what we're trying to say so0:17:31
why do we put things in our deps or pom or a project file at all and one is that we need the code when we're0:17:40
working right we're writing our app we're not writing this library it used to be before we had deps that we would like download jars and we would make0:17:49
classpaths ourselves and say use this classpath it actually wasn't worse than this to be honest with you because0:17:59
there's something about a list that you made and you know what's in it and you know what's not in it and you know it says that's somewhat well it's certainly0:18:08
more tangible but somewhat more reliable than this next point which is that what's also nice is we conveniently say we needed ABC and the fact that XY and Z were0:18:18
needed was just solved for us Maven will nav through the transitive dependencies and pull everything else in so there's an ease factor to point to here right0:18:28
the other thing we do with these deps or the pom is that we turn around or our build turns around and propagates0:18:39
them into our artifacts so that maven can do this with our stuff right can continue to nav down and let somebody who uses us and in particular when I'm0:18:50
talking about us today I'm mostly talking about when we write libraries right when you have the consuming app it's somewhat different and I'll talk about that later but in particular when0:18:59
we're writing libraries so we're writing A B or C right we use X we need that in our pom so someone who uses us gets X in addition to us so the whole Y Z thing0:19:09
and stuff works so that's why we do it I think one of the things people imagine happens from putting in this in this project file is we give some integrity0:19:20
promise we make some integrity promise about we've tested our library against this thing I do not care because the chances of me0:19:30
running against the thing you test it against are slim in the end we'll talk about that later so I think that's a non benefit you imagine it but it's not true0:19:40
but again the problem is that this is coarse-grained these things don't tell us what's actually happening and they just create a context so what I0:19:51
would like to talk about is how we talk about change because you know I mentioned things change and then we're going to change versions or get new new0:20:01
versions but I want to disentangle this and I think that you can boil down all change into this kind of language right0:20:12
which is if I'm making a library I may make requirements of the users of my stuff right so what do I require if I'm0:20:22
writing a function what I require are the arguments you have to pass me arguments if I'm a namespace what do I require just names right a namespace is0:20:37
sort of like a lookup you give me this name I give you this var or the function the thing in it go out a level again what do artifacts require if I give you0:20:49
a jar what are you going to do you're similarly going to look for stuff in there right with either a name or a path you're going to find the actual class0:20:59
file or clj file given some name right so there's a sense in which namespaces0:21:08
and artifacts are just functions of names to stuff okay a namespace is a function of a name to a var or function an artifact is a function of a name to a0:21:18
namespace or package right then you can flip it around and you can say what does the library provide a function provides0:21:28
its return right if you gave me what I required I will provide to you this result and of course I would like to broaden0:21:38
this discussion to include services and procedures and things like that so if your thing is effectful right one of the things you provide is that effect right if you call this thing with these0:21:49
arguments the thing will be in the database or I will send an email for you or some other thing what does a namespace provide it's just the lookup0:21:58
you give it a name it gives you the var or function what does an artifact provide you give it the name it's going to provide you with the class files the packages this kind of stuff so that's0:22:10
how we exchange things so what I will say is that you can now look at the kinds categorically the kinds of changes0:22:21
you would make in these ways right the first is this idea of growing your software your software is going to do0:22:30
more the first thing is just accretion right what happens when you accrete you say I'm going to provide you more you were giving me seven before and I gave0:22:41
you back 42 and now I'm going to give you back 42 and some right more stuff so this is not a lightweight use of this0:22:51
word provide I mean very specifically we need to say the words provide and require so we're going to provide more that's straight accretion the other0:23:01
thing is relaxation right it used to be you give me two wheat and a donkey and I'll give you some steel and now I don't0:23:12
need the donkey just going to be wheat and I'll give you steel so I require less right that's a relaxation on my0:23:22
part right and there's a nice sort of zen-like thing of saying you know the less you need the the more you're0:23:31
growing right isn't it well whatever I'm not that like0:23:40
touchy-feely and the other is fixation and this is another cool thing I looked up fixation it actually means to fix things it doesn't mean to be paranoid so0:23:52
then the final thing is just fixing stuff which doesn't impact what you provide or require it just means you're now doing it correctly or maybe faster0:24:02
or maybe with fewer requirements you know fewer dependencies or something something else but whatever it doesn't0:24:11
impact what you provide or require ok because again this blanket concept of change it also is used casually in0:24:22
conversation to talk about these things which is breaking your software how do you break your software you require more0:24:31
oh that two wheat and a donkey it's not enough I want gold too and a ruby and0:24:40
then you'll have steel right and you know it's sort of evident I mean we use bigger sentences to mean I broke you and0:24:50
you know it's incompatible or something like that but we should be using these small things I just require more if you require more then somebody who's giving you less now is not going to get what0:25:01
they want it's not going to work for them it's broken alright the flip side is you're providing less well I was giving you steel and now I'm going to give you tin0:25:11
and good luck with your building so you're providing less of returning less than what you promised previously the0:25:21
other sort of categoric you know why are you doing this is just just changing like you know we were calling that you0:25:30
know trade and now we'd like trade to mean like something completely different so we're just going to use it for something else and so if you were calling trade you know it's like the0:25:41
classic thing if you have well homophones or something straight out like draw right drawing pictures we used to draw pictures for you and now we draw guns for you just the semantics are shot it's a0:25:53
complete do-over so the thing here is that change is not a thing we shouldn't really be saying I changed it because0:26:02
you're telling me nothing when you say that you tell me nothing because I just described two this is great I like this I'm happy to give you less I0:26:13
like the new gold you're giving me awesome awesome sauce here I'm really angry at you right this is not good so0:26:23
calling it change is just not useful we need to talk about one of two things either it grew or it broke right there's0:26:32
growth and there's breakage so there's so one of the things that spec is designed to do is to help us understand0:26:42
and maybe even programmatically detect when we've accidentally broken something when we just intended to grow it and0:26:52
I'll make an argument for growing it in a minute but that's an important part so that's why spec uses set logic for0:27:01
maps and uses regexes for sequential syntaxes because there's already logic for determining growth like0:27:14
compatibility of those two things right there's already math for that stuff so it's not just going to be this I promise0:27:24
you kind of used-car-dealer thing we can you know run a program and maybe determine this we don't have those programs yet for spec but spec is0:27:34
designed to support them being written so that helps us in the small as long as we don't do something like try to0:27:44
version specs version 2.0 of the spec says you need to give me gold now don't do that0:27:54
so what about change in the large well the key thing here I would say is that we need to start recognizing when things are collections because there's only two0:28:03
rules for collections right if something is just a collection including an indexed collection right you give me the name I give you a thing but that's still a0:28:14
collection it's just an index-keyed collection right there's only two operations there's adding stuff to the collection or removing stuff from the collection0:28:24
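The require-less versus require-more distinction can be tried out with map specs, reusing the wheat, donkey and gold trade from earlier in the talk. This is a sketch under assumptions: it needs Clojure 1.9 or later for clojure.spec.alpha, the spec names are invented for illustration, and requires-more? is a hypothetical helper that compares plain sets of required keys (the talk notes that real compatibility-checking programs for spec did not exist yet).

```clojure
(ns example.growth
  (:require [clojure.spec.alpha :as s]
            [clojure.set :as set]))

(s/def ::wheat pos-int?)
(s/def ::donkey pos-int?)
(s/def ::gold pos-int?)

;; Original requirement: two wheat and a donkey.
(s/def ::trade-args (s/keys :req [::wheat ::donkey]))

;; Relaxation: the donkey is no longer required (requiring less).
(s/def ::trade-args-relaxed (s/keys :req [::wheat] :opt [::donkey]))

;; Breakage: gold is now required as well (requiring more).
(s/def ::trade-args-stricter (s/keys :req [::wheat ::donkey ::gold]))

;; What an existing caller already passes.
(def old-call {::wheat 2 ::donkey 1})

(def still-ok? (s/valid? ::trade-args-relaxed old-call))
(def now-broken? (not (s/valid? ::trade-args-stricter old-call)))

(defn requires-more?
  "True when new-req demands a key that old-req did not.
   Plain set logic over required keys, for illustration only."
  [old-req new-req]
  (not (set/subset? new-req old-req)))

(def relaxed-requires-more?
  (requires-more? #{::wheat ::donkey} #{::wheat}))

(def stricter-requires-more?
  (requires-more? #{::wheat ::donkey} #{::wheat ::donkey ::gold}))
```

The existing caller keeps working against the relaxed spec and fails against the stricter one, and the subset check reaches the same verdict without running any caller at all, which is the kind of programmatic detection that set logic makes possible.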
adding stuff is growth period it's just easy it's just accretion and removing stuff is always breakage always removing0:28:35
is breakage from a collection but the important thing is when you look at software you need to see these collections because the other problem we0:28:44
have all the time is we we keep conflating changes at different levels and the versioning system encourages that right a namespace is just a0:28:55
collection of vars right artifacts are just collections of namespaces or packages we need to see that spec uses0:29:05
sets for maps it doesn't let you say what the keys mean for this reason it's the same thing maps are collections of0:29:17
keys they're not the stuff inside the keys right if I put on a hat it doesn't0:29:27
change my family I would still be a member of my family my family contained these people before it contains the same people later I didn't version my family when I put on0:29:36
a hat but we do this all the time all the time we don't see this so you have to recognize collections really all0:29:45
the interesting stuff happens at the leaves and everything else is a collection with these two rules adding stuff fine taking stuff away is breaking alright now we get really ranty0:29:58
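The two collection rules can be written down directly as set arithmetic over names. The sketch below is illustrative only: classify is a hypothetical helper, and the name sets stand in for, say, the public vars of a namespace or the namespaces in a jar before and after a release.

```clojure
(ns example.collections
  (:require [clojure.set :as set]))

(defn classify
  "Compares two collections of names. Anything removed is breakage,
   otherwise anything added is growth."
  [old-names new-names]
  (let [removed (set/difference old-names new-names)
        added   (set/difference new-names old-names)]
    {:added   added
     :removed removed
     :verdict (cond
                (seq removed) :breakage
                (seq added)   :growth
                :else         :unchanged)}))

;; Adding a name is accretion.
(def grew (classify '#{foo bar} '#{foo bar baz}))

;; Removing a name is breakage, even if something else was added.
(def broke (classify '#{foo bar} '#{foo baz}))
```

In a live REPL the same function could be fed `(set (keys (ns-publics 'some.ns)))` for two loaded versions of a namespace, where some.ns is a placeholder. Note that this looks only at the collection level: whether foo itself now requires more or provides less is a separate question at the leaf.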
semantic versioning I looked this up in the dictionary and it wasn't there right because like what if0:30:10
we had dictionary versioning right there's sort of a fundamental problem with this idea of semantic versioning which is like things mean what they mean until0:30:20
they don't mean what they mean this is helping me how I don't really see it so let's dig into the semantics promised by0:30:29
semantic versioning if you change the patch part you don't care as a consumer if you change the minor version you also0:30:39
don't care you just don't care these things mean that they have this great semantics four is more than three that's it0:30:51
you know I'm glad there's like a you know a manifesto about this0:31:01
but what about the major component what does it mean it means you're screwed right that's the semantics of the major component it's terrible it's an absolute0:31:12
catastrophe right it does because it doesn't tell you in what way right what it really says is you might be screwed0:31:24
right so if somebody says you're screwed you're like oh that's terrible if somebody says you might be screwed you're like it's worse it's0:31:36
clearly worse right and why is that it's because this level thing didn't occur to the people who did this right they0:31:47
smashed all the levels together any change anywhere of anything that might be any of the things we just like carefully0:31:56
pulled apart and said you are requiring more you're providing less this this is big ugly thing where anything could have0:32:05
happened and we're just telling you watch out watch out and I don't think that's useful right I think you know Stu0:32:15
said it before trying to steal my thunder you might as well just change the name going to 2.0 is not helping anybody yeah just change the name I mean0:32:31
what does it mean it's just completely not meaningful to do this to somebody it just isn't it's just like we're all now playing a different game0:32:40
and it's called the same name you know have a seat you don't know how it's played you thought you did I I predict you're going to lose0:32:53
and I think the thing is that you're like well is that bad and then I'm not changing it that's a new thing yeah that's exactly right it's a new thing so0:33:08
this raises the question of like which name do you change because we just saw this smashing together up into the version of the artifact is probably not0:33:17
good right so if I'm going to say to you change the name I need to be able to answer this question what name should I change should I call the whole thing if0:33:28
I'm if I'm requiring more in one of my functions should I change the artifact name to you know new game and now it's0:33:39
the same thing right you're going to go look at the levels for providing requiring so are you requiring more arguments or more from the arguments or0:33:48
providing less in your return we recognize these things as breakage before essentially we're going to be able to say this spec is incompatible0:33:58
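A minimal sketch of the "requiring more or providing less" test described here, written in Clojure. It models a function's contract as two sets of keys, which is a simplifying assumption for illustration and not how spec itself works. The names `compatible-change?`, `foo-v1`, `foo-grown`, and `foo-broken` are hypothetical.

```clojure
(ns compat.sketch
  (:require [clojure.set :as set]))

(defn compatible-change?
  "True when `new` requires no more and provides no less than `old`.
  Each contract is a map with :requires and :provides sets (an assumed model)."
  [old new]
  (and (set/subset? (:requires new) (:requires old))
       (set/superset? (:provides new) (:provides old))))

;; Hypothetical contracts for one function over time.
(def foo-v1     {:requires #{:wheat}       :provides #{:bread}})
(def foo-grown  {:requires #{:wheat}       :provides #{:bread :crumbs}})
(def foo-broken {:requires #{:wheat :corn} :provides #{:bread}})

;; Providing more is growth, so the name can stay the same.
(compatible-change? foo-v1 foo-grown)

;; Requiring more is breakage, so under this rule it would need a new name.
(compatible-change? foo-v1 foo-broken)
```

Under this model the first call returns true and the second returns false, which is the point at which the talk asks for a new function name.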
the spec for your revised function is incompatible and if it is I want to see a new function right and in Clojure you have two ways to do that you can stay in0:34:08
the same namespace and you can have foo-2 right or maybe you made a systemic kind of change it's like we've been passing around this thing and now I0:34:17
realize that everywhere in our API we should be passing around two things well just make API-2 you know namespace you0:34:27
can keep all the inner functions the same which is fine I mean I know like thinking of a good name is hard right but namespaces mean you can glom some0:34:37
different thing on the front and have good name you know new place different you know good name you don't have to you know go on and on but in practice I wouldn't either be afraid of foo-20:34:47
because it's not it just doesn't happen that often you know it just it really just doesn't and and one of the things0:34:57
that's really great about this this is to remember that the namespace is part of the name it there's really nothing called foo-2 in Clojure except like a local variable everything else has a big hairy0:35:09
name that includes the namespace beforehand we're always dealing in Clojure with these nice hopefully globally unique names and spec leans on0:35:19
that and you can lean on that to make these kinds of changes the other thing we have are the aliases which help again right because I could take some0:35:28
code that used you know game-1 namespace and called game-1/foo and now wants to call game-2/foo and it can just say g/foo everywhere in the code and0:35:38
just change the ns declaration to say you know require game-2 as g now if I just did that and walked away from the code it would break but I'm when I'm0:35:50
doing that I'm saying well I'm moving to the new thing I should read the new specs read the docs I know some of the names have been reused but I'm in charge0:35:59
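The alias move described here can be sketched in Clojure as follows. The namespaces `game1`, `game2`, `app.old`, and `app.new`, and the bodies of `foo`, are made up for illustration; defining several namespaces in one place is only a convenience for the sketch, and real libraries would live in separate files.

```clojure
;; The old library keeps its namespace and its function untouched.
(ns game1)
(defn foo [wheat]
  {:bread wheat})

;; The new library reuses the short name `foo` under a new namespace,
;; so the full names game1/foo and game2/foo do not collide.
(ns game2)
(defn foo [wheat corn]
  {:bread wheat :cornbread corn})

;; A consumer that is happy with the old game keeps working.
(ns app.old
  (:require [game1 :as g]))
(defn play []
  (g/foo 1))

;; A consumer that chose to move changes the ns declaration,
;; then updates call sites to what the new function requires.
(ns app.new
  (:require [game2 :as g]))
(defn play []
  (g/foo 1 2))
```

Both consumers can be loaded in the same program, which is the coexistence the talk is after: moving to `game2` is the consumer's decision, made when the consumer is ready.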
right when do I need to do that when I feel like playing the new game if I want to continue to play the old game which0:36:08
maybe I'm decent at I'm going to do that I got a lot of week to get rid of all right so what about if you want to get0:36:17
rid of a function I mean just I hate this function I hate it I hate it I hate that people call it I just want it out of my life there's no functions like this0:36:29
in Clojure so the way to do that what have you changed right if you want to get rid of0:36:38
a function you didn't change the function really because it's like gone what did you change you change the collection the collection no longer has the the thing that was in it so where0:36:49
are functions they're in namespaces so the namespace that collection level has changed so you need to pick a new namespace and again a major refactoring might be a way to do this we've0:36:59
deprecated a whole bunch of functions blah blah blah we're going to move to library 2 and really the biggest change there is not that any of the functions are different it's that half of them are missing right we just don't want to have0:37:10
them anymore so we have a new namespace game-2 namespace and we just took out a bunch of stuff and that's really what's different about it so this is the way to do that alright moving up another level0:37:21
what about at the artifact level so what if I want to get rid of these this namespace I hate this namespace people are still using this namespace I gave0:37:30
them a better namespace you know three years ago they should be using that I'm tired of these people I'm going to take this away from them I just really want to do this what should i do well then0:37:40
you know again because of the levels thing we're getting rid of something in the collection the collection is the artifact and you would think by applying this logic that you should just change0:37:50
the artifact ID and you could you definitely could the first counter argument I would or objection I would expect here is that's what the major version is for you know that's what it0:38:00
isn't right unless they're going to have semantic versioning 3.0 which completely changes what this means for everybody who uses it and breaks all uses of it0:38:10
and all presumptions about it forever for everyone which they're not going to do because in fact they don't believe in semantic versioning you couldn't you0:38:19
couldn't version semantic versioning into supporting this change and that shows that it's broken right semantic0:38:28
versioning can't support this change you can't have semantic versioning 3.0 do this without breaking everybody in an unfixable way so unfortunately0:38:38
they already decided what this means which is and this is a quote from the spec any backwards incompatible changes across all the levels we can't suddenly make it mean only additions or removals only0:38:48
removals of namespaces or packages would cause this to move so it's got too broad a semantics so that's not it the0:39:00
problem we have here is that magic I talked about earlier right if I just say this is game 2 library right inside it0:39:12
it's got you know turn one namespace but game one library also has turn one0:39:21
namespace and where's the mapping from artifacts to namespaces I don't know the guy in the street he's not there today0:39:30
right it's nowhere there's not a place for this so we can have these clashes how many people have ever had a clash where two jars they included had the0:39:40
same package in them woohoo how many people enjoyed that right so this can happen in like there's really nothing there the thing that it0:39:49
solves this for us when we made this change at the namespace level was the fact that that implicitly gave us a new scope it really actually renamed everything in that thing if I still had0:40:00
foo and bar I have game 2 foo and bar they're not in conflict with game 1 foo and bar I'm sort of ok if I do this up at this level I'm not ok because it's0:40:11
actually not an implicit change some of the ways to deal with this would be actually renaming your namespaces to match this change because usually your namespaces have some relationship to the0:40:21
artifact name it's some the library name is like in both I'm actually not sure if that's the right answer to this but I would like to fix this ok so this just0:40:34
seems like a lot of work right it doesn't this make you reluctant to remove things this is a rhetorical question the right answer is yes it does0:40:45
it makes me reluctant to move things remove things and it should this like yeah why should you get so uptight about somebody calling the function that you don't like anymore so what0:40:57
like really what is more important okay so here's the the root of the rant breaking changes are broken that it's0:41:09
just a terrible idea don't do it don't do it don't try to figure out the right way to do it don't get together on the internet and say oh we've all agreed you know major version0:41:19
makes this possible woohoo it's a bad thing you don't want to do it don't figure out the best way to do it this0:41:28
method of renaming turns breakage into accretion right we still accomplish the same thing right we got rid of that pesky function because we have a new name space that doesn't include it right0:41:38
we you know we we clarified these arguments so we we really need new stuff to do this new job well we wrote a new function to do that and it sits0:41:47
alongside the other one this is gigantic right because this coexistence means people can just freely proceed0:41:56
otherwise they have to be paranoid all the time because how many people have ever encountered a breaking change that didn't move the major version and how0:42:06
much fun was that it doesn't matter the version it doesn't matter what matters is that you did this it doesn't matter how you covered it or didn't cover it or0:42:16
what you said or how you excused it or whatever it's just not good right so we like this they can coexist we want to turn breakage into accretion so is Maven0:42:26
broken right this is what we do with Maven not really right we're doing this to ourselves right Maven is0:42:36
actually quite interesting first of all Maven doesn't let you change artifacts in it doesn't let you do this right and Maven never breaks0:42:47
and maven is not versioned is there maven version you know maven central version 1,600,000 and 17 is0:42:58
there right there isn't how could this work how could how could this be all these people changing all the stuff all the time and maven never breaks it never0:43:09
breaks because it actually it doesn't version that's for losers I'm not doing versions Maven Central is a big name you can rely on go to maven0:43:18
central and you can find everything you ever found in there forever and ever and ever that's the idea of maven central right you don't say I'm going to use maven0:43:28
central you know 5060 to I mean the number would be astronomical right and like oh no I use maven this and then we'll have maven versioning versioning maven central versioning we don't do0:43:40
that and yet it works right it's where it's like crazy we've all presumed what this name maven central means we all share it and we also all share an0:43:50
understanding and actually sort of a peaceful feeling that it will continue to mean what it always meant forever and0:43:59
ever and ever how could this work how could it work it's very straightforward it's an accreting collection of immutable things as functional0:44:08
programmers we should be like duh of course this works this is what we do in the small every day and at the very top0:44:19
end of the ecosystem this is how it works also so at the bottom it works like this and at the top it works like this so I'll just advise you right now0:44:31
not to look up rotten sandwich on the Internet because it's it's it's quite unpleasant0:44:43
but you can imagine this beautiful sandwich right at the bottom we have our functional programming and we know what we're doing we we have all this assignment0:44:53
conversation say with some we're talking about you know talking to Java people about using Clojure and I think I do think it remains one of the biggest challenges you have in trying to0:45:02
evangelize Clojure is that you eventually end up in a point where you're trying to say to somebody you know Clojure solves the problem that you0:45:11
don't know you have and that problem is like this intense anxiety and pressure you feel dealing with mutability on an ongoing basis and until you've0:45:21
experienced that you know lifting and Clojure is not the only language that can do this for you but until you feel that lifting you don't really know what0:45:30
you were suffering from before it's like if someone's standing on your foot every day you'd be like you know you wouldn't even know and then they get off you're free like whoa that's pretty good0:45:41
walking is a lot easier now so so we've experienced this at the bottom and we actually do experience that same thing like I said when you use Maven0:45:50
Central you also feel the same way about it it's like I'm not really afraid I'm going to go look in there and jar XYZ one two three four is going to be0:45:59
different tomorrow or missing not really worried about things those things because they have these rules that play the game I've been describing so far0:46:08
which is names should be enduring in their semantics and you should be accreting immutable stuff but in the middle we're messing this up big time0:46:18
right the way we do artifacts the way we do namespaces the way we just trash function signatures is a complete mess0:46:28
so this is not a surprise at this point in the talk is semantic versioning broken totally yes broken bad idea we0:46:37
should abandon it as soon as possible right because it is fundamentally in the in the biggest semantic it has right not0:46:46
the small ones which are like four is bigger than three right in the biggest semantic it has the semantics about major version change it's a recipe for how to break software0:46:56
that's what it is that's what semantic versioning is like here's how you break software here's how you screw up your users here's how you make like life difficult for people0:47:06
here's how you undermine software development and you know and but it's a standard and it has like you know it has0:47:15
a web page and everything I'm not actually you know advocating for something particular instead except to say that it doesn't matter a whole lot I0:47:25
mean as long as you have something that still has the properties the four is bigger than three you know some sort of sequentiality to it you have a lot of options here one of the problems with0:47:34
versions even if you get rid of the major version this minor version thing is that it's it's completely self0:47:43
relative right 1.3 is bigger than 1.2 but I have these six libraries this is 1.2 this is 3.7 this is 4.1 you know one0:47:55
of these three is 11 years old and the other one was released yesterday can you tell which one no these numbers they don't they don't relate to each other0:48:05
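A small Clojure sketch of why version numbers are "self-relative". The library names, versions, and release years below are invented purely for illustration, and `version-vec` is a hypothetical helper that only handles purely numeric versions with the same number of segments.

```clojure
(ns versions.sketch
  (:require [clojure.string :as str]))

;; Invented data: three unrelated libraries, not real artifacts.
(def libs
  [{:lib "a" :version "1.2" :released 2005}
   {:lib "b" :version "3.7" :released 2016}
   {:lib "c" :version "4.1" :released 2011}])

(defn version-vec
  "Parse \"1.2\" into [1 2]. Assumes numeric segments only."
  [v]
  (mapv #(Long/parseLong %) (str/split v #"\.")))

;; Within one artifact the numbers do order releases.
(def newer-within-lib?
  (pos? (compare (version-vec "1.3") (version-vec "1.2"))))

;; Across artifacts, ordering by version says nothing about time:
;; the two orderings below disagree for this data.
(def by-version (mapv :lib (sort-by (comp version-vec :version) libs)))
(def by-release (mapv :lib (sort-by :released libs)))
```

Here the highest version number belongs to a library that is not the most recent release, so comparing numbers across artifacts carries no chronological information.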
now that's not saying you could take this chronological versioning and do something deterministic with it because you don't know what people saw right but you could use Lamport-clock-like logic to0:48:16
know what they could not have seen which is not nothing but you don't have that otherwise so if your artifact name is a0:48:28
stable thing as stable a thing as Maven Central you have a lot more flexibility about this you could do something like this it would convey more information0:48:38
than 4.3 does and have some sort of possibilities for relativism0:48:47
what about git so this is another thing I mean obviously these approaches predate git everything we're doing about jars and0:48:58
Maven and stuff is really pre-git but git has these great properties it's definitely co-aligned with a lot of what I'm saying right it's immutable it's truth of code it really is about this0:49:09
source code as opposed to stuff you just made up about it later right 4.3 what I mean what does it even mean to say this is 4.3 oh that's good I mean it doesn't0:49:23
mean anything it really does not mean anything so the truth is always the code right now it's quite widely adopted it0:49:33
has a nice property being able to do content-based addressing like I said it's pretty much ignored by the systems it's not their fault right it's just that they existed before it does it does0:49:43
have some challenges I mean I think this should participate right I started this talk by saying the truth is actually the code dependencies and git you know is0:49:53
where the code is being managed but the way git talks about stuff is via SHAs and people don't like SHAs they like0:50:04
the characteristics of it in terms of being you know a universal unforgeable key but it doesn't convey anything about0:50:14
order unless you have the rest of the repo it doesn't imply anything about causality I mean four is greater than 3 at least says that it came after and0:50:25
those readability issues but I you know I think that there's a way to integrate this stuff and I think it would be driven from the bottom back up to make a0:50:36
solution so now this is not like me preaching to you or like I think we all could do better0:50:45
with this I mean Clojure doesn't have a perfect track record in this area but the most important thing is that you know we're not going to be able to you0:50:54
know tech ourselves out of this right what did I say about Maven it's actually not broken right what's broken is that what we're putting into it is broken for that to be different we need0:51:05
to not put broken stuff in there and that's a social thing that's about considering other people one of the0:51:19
things I think that makes this challenging is is open source right because when we work in a local team or0:51:31
whatever in your team it might be a distributed team but when you work in your team you have a small set of people and you have stand-ups and you're0:51:41
working on private stuff that doesn't get published and no one's consuming except yourselves you have everybody in on the call and we say you know what I think we did this wrong we really do0:51:52
need to we need wheat and corn to do this job all right well we got to change all of our calls to pass corn everybody good on that yeah Sally when can you0:52:03
have yours done Tuesday I'll have mine done by Friday all right by next Monday we'll all be passing wheat and corn everybody okay yeah yeah yeah yeah great0:52:12
have a good day stand up is over now we move to the internet and we have slack and it feels0:52:23
like that right we're hanging out our friends are there a bunch of people that work on this library are there we're like ah you know what this library it's0:52:32
just not good we're passing wheat and we need wheat and corn what do you think oh yeah I think so too everybody agree on slack0:52:41
that day agrees we should be passing corn all right good I'm gonna go do it you know I'm just gonna do it you know boom git commit GitHub artifact0:52:55
it's in Clojars you know I talked to everybody on slack right it feels it does feel like this like it's the same0:53:06
because it's what we want I mean so we would want open-source to be sort of like the team is now you know everybody but there's two things maybe everybody0:53:17
who actually is an author of that library was in slack right but it's different my on stand up everyone who was an author was on stand up and0:53:27
everyone who was impacted was on stand up on slack maybe everyone who was an author was on slack everyone who was0:53:36
impacted who knows who they are who knows who all their users of their libraries are unless it's nobody then0:53:47
you don't know right so the user base is open and it's unknown you have to you have to be caring about these people0:53:56
that you don't know I know in this political climate it just seems like something wild to say but you actually do you have to care about these people0:54:07
that you don't know and in software we need to do the same thing and so open source development is not the same slack is not stand up so how do we code for0:54:21
growth right Alex Miller has talked about spec a bunch and so has Stu and the number one question they get about spec0:54:32
is why don't you let me say disallow any other keys and maps I'm angry about this I can't check for correctness without0:54:41
this thing right it is the number one beef and we saw this beautiful talk by Paula about logic this morning guess0:54:51
what most logic systems don't have in fact I don't know of any logic systems that do have it they don't have something that says and nothing else will ever be true0:55:05
and the reason why they don't have it is because then you like you could almost do no good logic with that system and you could never ever know or calculate0:55:16
anything you didn't know on the very first day right so open specs and open data formats which we like right we use0:55:25
maps we use them all the time in general we should be writing code that's like doesn't care if there's keys in the map that we don't we don't care about but it's like a critical thing0:55:35
about spec spec is about what you could do but it's not about what you can't do because tomorrow maybe I could turn wheat0:55:44
into cows you know I don't know I want to retain the flexibility to be able to do that especially if I can figure out how to do that might be a cool thing so so you can't0:55:55
let you know your checking problem du jour dominate your specs that's not what specs are for they're about what people0:56:04
can do you could make something with spec that could do that extra thing don't put it in your specs that's not your public thing you want to add another layer of spec that like0:56:13
shuts down stuff or run an additional check to help people you know detect errors or something like that that's fine but don't put it in your primary public spec your primary public specs0:56:23
should be oriented towards growth because otherwise you're going to have nowhere to go because what happens if I let you prohibit things I promise you this is0:56:32
what's going to happen and believe me every engagement we've had where people said I really want to say you can't like two days later right their world broke0:56:41
because they had nowhere to go right if you say you can't do X it means you can never do X and if you're going to try to0:56:51
like make it okay to do X later then you need a new name but now we did the opposite of what I was saying before right when I said before is if you're0:57:01
going to break somebody use a new name now we're saying if you want to grow use a new name that's awful right because0:57:11
that's going to cause your thing to change the key that was in your map your spec to change the spec of the thing that included you to change the0:57:20
spec of the thing that included that to change spec is designed so that that doesn't happen that as long as you make growing changes you do not need to cascade up spec is not like semantic0:57:31
versioning that way but if you do this you will turn that completely upside down you will have this problem so this is why you can't I don't have a shorter0:57:43
way to do that but that's what that's why it's this way okay the other thing you have to do if you want to code for growth is you always have to presume people might hand you stuff that you0:57:52
don't know about that's just got to be okay although it's a coding discipline to deal with that a lot of people have like a just take everything that's in the map and put it on the screen you0:58:02
know maybe you should select-keys right because if you just throw everything on the screen and they just give you their you know social security number because they're there they're0:58:11
already anticipating you know API to know which grows in that way that that's not good so you have to either ignore it or have a policy for it or something like that but0:58:20
you should be okay with it they should not be disallowing this stuff you can make checkers that run occasionally to do whatever but as a as a specification this has to be okay all0:58:34
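A short Clojure sketch of the two habits described here: an open spec that accepts keys it does not know about, and consumer code that picks out only the keys it understands. It uses `clojure.spec.alpha` (`s/def`, `s/keys`, `s/valid?`) and `select-keys` from `clojure.core`; the namespace `farm.sketch`, the keys, and the `display-fields` function are hypothetical.

```clojure
(ns farm.sketch
  (:require [clojure.spec.alpha :as s]))

;; The public spec says what is required, not what is forbidden.
(s/def ::wheat pos-int?)
(s/def ::order (s/keys :req [::wheat]))

;; A caller hands us more than we asked for.
(def incoming
  {::wheat 3
   ::corn 2
   :customer/note "not meant for display"})

;; s/keys is open, so the extra keys do not make the map invalid.
(def order-ok? (s/valid? ::order incoming))

(defn display-fields
  "Keep only the keys this code knows how to show; ignore the rest."
  [order]
  (select-keys order [::wheat]))
```

With this shape, a later version of the caller can start sending `::corn` without any change to the spec's name, and the display code neither breaks on the unknown keys nor leaks them.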
right so what about iterative development right this all sounds like I got to get it right the first time right and that's not the case right0:58:43
you're going to have a place where you're just trying to figure it out you get off the hammock I hope you went on the hammock a little bit you came in you0:58:52
start typing you know you push something and you look at it and people kick her in this like uh you know I tried it it's not that great that's fine you just need0:59:01
to be clear that you're there you're in that mode and people should expect to have to move along if they want to use your alpha they want to be on your0:59:10
standup they want to be in that circle but I think what we need is something more fine grain than artifact releases0:59:19
to be a tool for publishing actually calling an entire API an alpha is somewhat of a problem because it's like then you need this big moment to0:59:30
get out of that so I think that's an area where we could we can do something more specific but that's not to say that0:59:39
you know just leave your thing Oh dot oo or dot you know 967 you know at a certain point you're going0:59:48
to have users and what whether you change it to 1.0 or not they're going to be depending on your stuff but I do0:59:58
think we need to be clearer about like where your promises lie what did you actually promise and you know yes you discovered the fact that if you give me a ruby i'll give you you know a magical1:00:10
sword but like I never told you I would do that why don't I go this far yeah1:00:20
okay okay so so now we've talked through we start with code we get to artifacts1:00:30
there's this magical jump there but there's this other problem which I talked about when you were building your library right which is I don't care what you said in your POM for your library that1:00:41
does not mean that you're going to get what you said the very first slide right one library wanted you know y 2.1 and another library wanted y 2.4 clearly1:00:52
both can't get what they want and your app needs to use both of them so there's no truth in this transitive dependency tree it's all suggestive you know I1:01:04
would like this I would like that you know it's like a Christmas list Santa saying you know all right maybe but you know not everyone's going to get the train set so the truth is the runtime1:01:18
classpath you know if you set aside you know tricky class loader stuff somebody has to make that class path they will maybe take as input I mean probably will take as input the dependency tree they find from maven but1:01:28
then they're going to have to resolve things maybe human being is going to get involved and say you know I know these two things don't work but it's quite1:01:37
possible that your library is going to run against a set of components it has never ever run against right so you can't say well I built this thing and it works with 2.1 I don't care I need to1:01:47
run it with 2.4 because it's running in a context right that's the thing about contexts is that you're not guaranteed your context you get put in a different1:01:56
context that's what context means so this has an impact on testing right we think we tested oh you know whatever you know reproducible development1:02:07
reproducible builds now a lot of times the things that you depend on it doesn't impact the bytes of your build at all you're just getting some testing you1:02:18
know with this library today but it's an independent thing so you can't test against an open set of consumers and you1:02:27
can't test against changes to your downstream dependencies all the time which means that the actual testing you do of your artifact at release time is limited it should be about you know does1:02:37
my thing do what it says you know do my own tests succeed but it's not really communicating a lot about the dependencies because they're going to change but I do think that we need a1:02:47
higher level way to talk about artifact sets that's independent of this tree right admittedly as an application I1:02:56
don't want to have to write an explicit file with every jar that is the flattening of the transitive tree but so1:03:05
I mean how many people have ever had to exclude or put in an explicit version of a library yeah and was that fun no but it should be1:03:15
something that's more practical we should be able to have tools that start with the code and say you know what you don't even need libraries X and Z at all I'm just not going to include them and1:03:25
your life is simpler right we should have things that say we're rolling this stuff up now if we if we were doing what I said about names being enduring that1:03:34
tool would have a lot more leeway and what it could do it could just say I'm going to use the latest of everything and it could know latest without you1:03:43
telling it as a side effect of updating your deps and updating your version all right a hush comes over the room I mean this1:03:56
is just I had a template for the talk and said insert joke here so what about web1:04:06
services it's the same thing right so people are like oh you know jars and you'll man your Lego you're old we do1:04:16
everything with Web Services now I don't care about jars I don't have jar versioning I do web services I just talked to services right it's the same1:04:26
thing it's the same thing it's the same problems it's the same mistakes everything is the same how many people have versioned web services you have major versions woohoo1:04:41
and it's no better right versioning is still not an answer and it's still the same mistake how many people version their web service when they change the1:04:51
arguments to a function you do it I mean it's okay you do it right so it's what it's what industry practices people are doing this right but that's that's a1:05:01
level that's a level violation right if you have an operation in your service and you modify what it does it's a you know it's putting on a hat your web service is not a different service your1:05:11
service provides a set of operations a web service is a collection of operations the end it's a collection that's the end of that level the two1:05:20
things you can do to a web service you can add operations and get rid of operations right then you can mess around with operations and you can look1:05:29
at them just like we looked at functions what do they require what do they provide is there a way to grow web service operations yes especially if you1:05:40
take these approaches about openness and open specifications and open data formats right is there a way to provide more back from a web service and grow that way totally yes as long as you have1:05:50
expressed to your consumers I'm going to give you at least this but I may give you more right then we can grow together right similarly you can break them in1:06:00
the same ways requiring more providing less right and when you think you want to do that well think twice because what1:06:11
happens if you instead of saying I'm going to break foo you make foo - right well if you were going to break foo what would you have to do right what happens1:06:21
today you break foo you say we have version two of our API yet to tell everybody in their mother version 2 of the API is coming Tuesday1:06:30
switch here talk to this new endpoint blah blah blah change your world right there's no getting around that now happens if you just put through - next -1:06:40
foo you can still tell people they could say I'm in Bermuda this week but next week I will try foo - that sounds1:06:49
awesome but right now my web service is going to keep working because it calls foo and you didn't take it away from me while I was on vacation right this this1:07:00
thing accretion solves the problem exactly the same way right so what we need to do is bring functional programming to the library ecosystem that's it we need to take this thing we1:07:10
need to make it a good sandwich that you know the top and the bottom and the middle are all good right right now we do update in place we excuse it with1:07:20
this versioning thing which is just not good right dependency hell is not a different thing than mutability hell it's the same thing it is mutability hell1:07:30
it's just at the scale right it makes programming fragile but the worst thing is this it makes libraries less useful how many people are reluctant to take on1:07:39
dependencies yeah I am right and it's not just because like they bulk up my1:07:48
thing it's because I'm afraid I'm afraid of other people but I don't want to be afraid of other1:07:57
people and I don't think we should all be afraid of other people and this is the thing that's really sad about this is that you made your thing and you open sourced it you got a slack and you were1:08:07
feeling all really good about things but people don't trust you and it's not necessarily because you did anything wrong it's just because they've seen this happen right and like I said before1:08:17
it's sort of a social thing now how many people saying no right this all sounds fine but like it's easy for me to make a1:08:26
little piece of data immutable, it's easy for me to know that 42 can't change to 43, but at scale, you know, I just have new requirements all the time,1:08:35
you can't possibly make a big thing that doesn't change? This is just not true. Look at these things. Ever try to1:08:46
call something from Unix that was there in 1970-something? It's still there, it still works the same way. Ever try1:08:55
to run an old Java program? Still works. People are still using the same old HTML, still working. And I think Clojure1:09:05
core also has had this approach. I don't think we've done it perfectly, but it's a value prop. So when I keep saying no, and when stuff just stays1:09:15
there that you think, oh, get rid of this, I hate this function, this is why it's still there: because I don't want to do that to people. And I think that1:09:26
whatever makes things successful is somewhat unknown, but I really believe that compatibility is a1:09:35
prerequisite to being successful. You cannot ignore this and have something that's going to endure, that people are going to value. And if you want people to value the stuff that you write,1:09:45
you need to consider this. So what would happen if we never broke anything? Names would be durable, enduringly meaningful. Maven Central:1:09:54
I know what it does for me and it will always do that. This compatibility checking I was talking about would be possible, and we would also be able to just move to the latest and let1:10:03
a testing thing that's independent of any of the authors go and figure out the right thing: figure out who needed what, when, what the times were, and here's a set1:10:13
that works, because it's not going to have to be afraid of these breakages. We could look into fine-grained dependencies, which is something I1:10:22
think is particularly interesting, and I'll talk about it in another slide. We could use the latest with impunity. We do that with Maven Central, right? We're not like, oh man, I need to get at Maven1:10:31
Central from three weeks ago. We don't ever say that. And the other thing that's super critical to software development is that we could compose with impunity.1:10:40
Right now, when we take two things and then a third thing needs one of the other things, we don't really know that we can put them together. We can't;1:10:49
we're missing composition, which is something we value as functional programmers. So I think there's a bunch of open challenges here.1:10:58
One is that some changes we can't see. I talked earlier about arguments and returns; those are harder to see1:11:08
than, you know, the presence of a function or the dependency on a function. Collections are straightforward. But when we start using spec more, it will help here, because we'll be able to see in a1:11:19
growing change to a spec that something changed. The calls don't necessarily look different, or not in a way that's machine detectable, but the1:11:28
spec will have changed in a compatible way and kept the same name, so we could see that happening. Spec compatibility is a little bit tricky,1:11:38
because compatibility is somewhat different depending on whether you're providing something or requiring it, because you can make a spec bigger or smaller, and we1:11:48
know in one case it's breaking and in the other case it's not. So this directionality is something that I want to build into spec: being able to1:11:58
determine the difference, that this spec in this context is the providing context versus a requiring context. I mentioned repo to artifact to namespaces. It would1:12:07
be great to fix that. It would be great to have global registries of: given this namespace, here's the repo, here's the artifact. And then we can work from the1:12:16
bottom up, as opposed to: I talked to Fred and he said use this jar, and then, you know, I found that, you know, and Cantor was in it,1:12:26
that's cool. Yeah, it should work the other way. And another big problem is just that we have tooling that says do this, and we have culture1:12:35
that says do this: we solved this problem, we have major versions for this. All right, so Clojure can help. We have spec; I think that will lead us to flexible1:12:47
dep awareness, as opposed to this fragile, brittle thing which I was talking about too much. We saw it was already broken even at the first example. Maybe1:12:56
we can do something explicit about code to artifacts, which I was just saying. I also mentioned before that, you know, public is not really that great, because a consumer of your alpha needs1:13:07
access to something. So you really need to say something else when you say: I'm publishing this, I'm making a commitment now. And similarly you might want to say it's deprecated; that doesn't mean I'm going to take it away, it just means: hey, look1:13:18
over here, this foo-2, it's better, it's twice as fast and, you know, it makes cows. The other thing that's going to come out1:13:27
of this is testing based on fine-grained deps. I don't want to steal Alex's thunder, but this is something that we're already working on, because it's necessary for generative testing. Right now1:13:38
you press save in your editor and all your tests run, because your tests are pretty useless: you wrote them yourself and they don't test anything. But generative tests are useful,1:13:49
but they take a long time. But the thing is, why should we ever test this function more than once? If we didn't change this function, should we1:13:58
test it again and again and again and again? Should I test it, and you test it, and somebody else test it? This is pointless. We should have SHAs for code and say: I tested this1:14:07
SHA, it's done, and I tested the SHA in this context with this other function. We know what the fine-grained deps are. This is something that you could have, and then, yeah, you could proceed, but only1:14:17
the stuff that actually is affected by what you did would get this admittedly more expensive generative testing. All right, I know everyone see1:14:26
and I certainly do, so what I'm going to say is: you should value exchange over change. Writing libraries for other people1:14:36
to use is about exchanging. If you need to change it, you need to be considerate, because the primary thing is exchange, not change. And there are1:14:45
two really good ways to do this. One is to grow your software, just grow it. The other is to turn what would have1:14:54
been breakage into accretion. In other words, if you're going to have a variant, give birth to a variant, don't muck with the thing. Think of1:15:06
the children, think of the consumers. And that's not to say the consumers are kids; I think "think of the children" is less about children than it is about the future. Think1:15:18
about the future of your software. Do you ever want to be able to change it and fix it and make it better and have people rely on it and like you? Then you1:15:29
need to do this. You need to move forward without sort of trashing stuff behind you. And like I said, I think we all could do better with this, and I'm certainly hopeful that we'll start with some of1:15:40
the contribs and apply some of these new things. But I would like Clojure to lead in this area. I think what I've described is not unique to Clojure; it's sort of1:15:50
the industry standard, and it's not great. So why don't we be the first community to make it great? That's it. [Applause]1:16:08
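A minimal sketch of the accretion idea from the talk, written in Python purely for illustration (the talk itself shows no code for this). The names `foo` and `foo_2` stand in for the talk's "foo" and "foo-2"; the `deprecated` helper and the function bodies are assumptions made up for the example. The point it tries to show: the new variant is added next to the old one, and deprecating the old one only points at the new name without taking anything away.

```python
import functools
import warnings


def deprecated(replacement):
    """Mark a function as deprecated without removing or changing it."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Point callers at the newer name; the old behaviour is untouched.
            warnings.warn(
                f"{fn.__name__} is deprecated, consider {replacement}",
                DeprecationWarning,
                stacklevel=2,
            )
            return fn(*args, **kwargs)
        return wrapper
    return decorate


@deprecated("foo_2")
def foo(x):
    """The original function: existing callers keep getting this behaviour."""
    return x * 2


def foo_2(x, factor=2):
    """A new variant added next to foo (accretion) instead of editing foo."""
    return x * factor
```

A caller written before `foo_2` existed keeps calling `foo(21)` and keeps getting the same answer; a new caller can move to `foo_2` whenever it is ready. Nobody is forced to change on the library author's schedule.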
you [Applause]0:00:00
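The "SHAs for code" idea from the testing part of the talk can be sketched roughly as follows, again in Python and only as an illustration. Everything here is an assumption for the sake of the example: source text is passed in as plain strings, `deps` is a hand-written map of fine-grained dependencies assumed to be acyclic, and `tested` is a hypothetical set of hashes that have already been through the expensive tests. It is not how any actual Clojure tooling works.

```python
import hashlib


def sha(text):
    """SHA-256 hex digest of a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def context_sha(name, sources, deps):
    """Hash of a function's source combined with the hashes of its dependencies.

    Assumes the dependency graph has no cycles.
    """
    parts = [sources[name]]
    for dep in sorted(deps.get(name, ())):
        parts.append(context_sha(dep, sources, deps))
    return sha("\n".join(parts))


def functions_to_test(sources, deps, tested):
    """Names whose context hash is not in the set of already-tested hashes."""
    return sorted(
        name for name in sources
        if context_sha(name, sources, deps) not in tested
    )


# Three hypothetical functions; b depends on a, c depends on nothing.
sources = {
    "a": "def a(): return 1",
    "b": "def b(): return a() + 1",
    "c": "def c(): return 3",
}
deps = {"b": ["a"]}

# Pretend every current version has already been tested once.
tested = {context_sha(name, sources, deps) for name in sources}

# Change only a: a and its dependent b need retesting, c does not.
changed = dict(sources, a="def a(): return 2")
print(functions_to_test(changed, deps, tested))  # prints ['a', 'b']
```

Because a function's hash includes the hashes of what it depends on, a change propagates only to the code actually affected, which is the property the talk wants for deciding what gets the more expensive generative testing.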
Effective Programs - 10 Years of Clojure - Rich Hickey
0:00:00
I feel like a broken record every time I0:00:09
start these talks by thanking everybody so I want to start this talk in a different way by saying my son is getting married today0:00:22
in another state so right after I give this talk I'm gonna hop on a plane go do that I'll be back tomorrow morning so I0:00:32
haven't disappeared I'm looking forward to the follow-up talks and everything else but I will be missing in action briefly so now to be redundant thanks0:00:42
everybody for coming. Ten years ago Clojure was released, and there's no possible way I could have imagined this.0:00:57
You know, I told my wife Steph, I said if a hundred people use this, that would be ridiculously outrageous. And that's not0:01:08
what happened, and what did happen is interesting. I don't think it's fully understood. But I wanted today to0:01:21
look back a little bit at the motivations behind Clojure. It's not like when you come out with a programming language you can tell that whole story, I think one because it's not good0:01:32
marketing, and two because if you really want to be honest, you probably don't know. It takes time to understand what0:01:41
happened and why and what you really were thinking. And I won't pretend that I had a grand plan that incorporated0:01:50
everything that ended up becoming Clojure. It certainly involved a lot of interaction with people in the community. But there is this: Clojure is opinionated.0:02:03
We hear this, and I think it's interesting to think about two aspects of that. One is: in which ways is it, and what does it mean for a language to be0:02:12
opinionated? I think in Clojure's case people come to it and they're like, wow, this is forcing me everywhere I turn to do0:02:22
something a certain way. And I think that the nice way to say that is there are only a few strongly supported idioms and a lot of support for them. So0:02:33
if you use the stuff that comes with it, there's a whole story that supports your efforts, and if you want to fight against that, we don't do too much.0:02:43
Alex was asking me which glasses are the right ones, and neither is the answer. But design is about making choices, and0:02:52
there's a bunch of choices in Clojure. In particular there's a big choice about what to leave out, and part of this talk will be talking about what was left out. The other side of being opinionated is,0:03:03
you know, how do you get opinionated? I mean, it's not like I'm, of course I'm opinionated, and that comes from0:03:12
experience. When I started doing Clojure in 2005, I had already been programming0:03:21
for 18 years. So I'd had it, I was done, I was tired of it. But I had done some really interesting things with, you know, the0:03:31
languages that professional programmers used at the time. So primarily I was working on scheduling systems in C++. These are scheduling systems for0:03:40
broadcasters. So radio stations use scheduling systems to determine what music they play, and it's quite sophisticated the way that works. You know, you think about, well, over the0:03:50
course of the day you don't want to repeat the same song, but you actually have to think about the people who, you know, listen to the radio for one hour in the morning and this other hour in the afternoon, and you create sort of an0:04:00
alternate time dimension for every drive time hour and things like that. So it's multi-dimensional scheduling, and we used0:04:09
evolutionary program optimization to do schedule optimization. Broadcast automation is about playing audio, and at0:04:19
the time we were doing this, playing audio on computers was a hard thing. It required dedicated cards to do the0:04:29
DSP work. I did work on audio fingerprinting, so we made systems that sat in closets and listened to the radio and wrote down what they heard,0:04:42
and this was used both to track stations' playlists and then eventually to track advertising, which is where the money was for that, which involved0:04:51
figuring out how to effectively fingerprint audio and scrub audio, sort of compare novelty to the past. I worked0:05:00
on yield management systems. Anyone know what yield management is? Probably not. So what do hotels, airlines, and radio stations have in common? Their inventory0:05:13
disappears as time passes. You have, like, oh, I have a free room, I've got a slot in my schedule, I've got a seat on this airplane, and then time0:05:23
passes and nobody bought it and now you don't. So yield management is the science and practice of trying to figure out how0:05:33
to optimize the value of your inventory as it disappears out from under you, and that's about looking at the past and past sales, and it's not simplistic. So0:05:43
for instance it's not an objective to sell all of your inventory. The objective is to maximize the amount of revenue you get from it, which means0:05:52
not selling all of it in most cases. That was not written in C++. That was around the time I discovered Common Lisp, which0:06:01
was about eight years into that fifteen years, and there was no way the consumer of this would use Common Lisp, so I wrote0:06:11
a Common Lisp program that wrote all the yield management algorithms out again as SQL stored procedures and gave them0:06:20
this database, which was a program. Eventually I got back to scheduling and again wrote a new kind of scheduling0:06:29
system in Common Lisp, which again they did not want to run in production, and then I rewrote it in C++. Now at this point I was an expert C++ user and0:06:40
really loved C++, for some value of love that involves no satisfaction. But as0:06:52
we'll see later, I love the puzzle of C++. So I had to rewrite it in C++, and it took, you know, four times as long to rewrite it as it took to write it in the0:07:01
first place, it yielded five times as much code, and it was no faster. And that's when I knew I was doing it wrong. I went on to help my friend Eric write the0:07:13
new version of the national exit poll system for the US, which also involves an election projection system. We did that in, you know, a sort of self-imposed functional style of C#,0:07:25
and then, you know, around 2005 I started doing Clojure and this machine listening project at the same time, and I had given0:07:34
myself a two-year sabbatical to work on these things, not knowing which one would go where, and leaving myself free to do whatever I thought was right. So I had0:07:43
zero commercial objectives, zero acceptance metrics. I was trying to please myself for two years; I just sort of bought myself a break. But along the way,0:07:54
during that period of time, you know, I realized I would only have time to finish one, and I knew how to finish Clojure, whereas machine listening is a research topic. I didn't0:08:03
know if I was two years away or five years away. So Clojure was written in Java, and eventually, you know, the libraries were written in Clojure, and the0:08:12
machine listening work involved building an artificial cochlea, and I did that in a combination of Common Lisp and Mathematica and C++, and in recent years,0:08:22
as I've dusted it off, I've been able to do it in Clojure, and that's sort of the most exciting thing. You know, I needed these three languages before0:08:31
to do this, and now I only need Clojure to do it. And then I did Datomic, which was also Clojure. Almost all of these0:08:41
projects involved a database, all different kinds of databases, from, you know, ISAM databases, a lot of SQL, many0:08:50
attempts at, but not many integrations of, RDF. Databases are an essential part of solving these kinds of problems. It's just what we do. How many people use0:09:01
a database in what they do every day? How many people don't? Okay. So this last thing is not an acronym for a database,0:09:10
it's there to remind me to tell this anecdote. So I used to go to the Lightweight Languages workshop. It was a one-day workshop held at MIT where0:09:22
people working on small languages, you know, either proprietary or just domain-specific, you know, DARPA or whatever, would talk about their little languages and what they were0:09:31
doing with little languages. It was very cool and very exciting, you've got a bunch of language geeks in the same room, and there was pizza afterwards. So I remember I would just go by myself or with my0:09:41
friend, and I was not part of the community that did that, they just let me in. But afterwards they had pizza, so I sat down at pizza with two people I didn't0:09:50
know, and I still don't know their names, and it's good that I don't, because I'm gonna now disparage them. They were both0:09:59
computer language researchers, and they were talking, also disparagingly, about their associate who'd somehow0:10:08
fallen in with databases and lost the true way, and one of them sort of sneeringly turned to the other and said, when was the0:10:17
last time you used a database? And he was like, I don't know that I've ever used a database. And I sort of choked on my pizza, because theoretically they are0:10:28
designing programming languages, and yet they're programming and they never use databases. I didn't know how that worked. It was part of the inspiration to do0:10:38
Clojure, because, I mean, if people who don't do database systems can write programming languages, anybody can. So, you know, there are0:10:49
different kinds of programs, and one of the things I tried to capture on the slide is to talk about what those kinds of programs were that I was working on, and the words I came up with were0:10:58
situated programs. In other words, you can distinguish these kinds of programs that sit in the world, that are sort of entangled with the world, and they have a0:11:08
bunch of characteristics. One is they execute for an extended period of time. It's not just like calculate this result and spit it over there, it's not like a Lambda function in AWS.0:11:18
These things run on an ongoing basis and they're sort of wired up to the world, and most of these systems run continuously, 24/7. It's quite terrifying0:11:29
to me that now these things, which are 30 years old, are almost definitely still running 24/7 somewhere, if they haven't been replaced. So this first notion of extended periods0:11:40
of time means continuously, as opposed to just for a burst. They almost always deal with information. What were the0:11:49
kinds of things I talked about? Scheduling: in scheduling you look at what you've done in the past, you look at your research data, what your audience tells you they like or are interested in or what they're0:11:58
burnt out on and you combine that knowledge to make a schedule yield management looks at the past sales and sales related to particular periods of0:12:07
time and facts about that and produces pricing information the election system looks at prior vote records0:12:16
how did people vote before? That is a big indicator of how they're going to vote again. Of course the algorithms behind that are much more sophisticated, but in a simplified way you can say all0:12:27
of these systems consumed information and it was vital to them and some of them produced information they track the record of what they did and that's this0:12:36
next point, which is that most of these systems have some sort of time-extensive memory. That database isn't like an input to the system that's, you know, fixed;0:12:45
it's something that gets added to as the system runs so these systems are remembering what they did and they're0:12:55
doing it both for their own consumption and for consumption by other programs quite often and they deal with real world irregularity this is the other0:13:04
thing I think that's super critical in this situated programming world: the real world is never as elegant as you think. And I talked about that0:13:15
scheduling problem of, you know, there's linear time, somebody who listens all day, and there's somebody who listens just while they're driving in the morning and in the afternoon, and eight hours apart there's0:13:24
one set of people, and then an hour later there's another set of people in another setting, and you have to think about all that time. You come up with this elegant notion of multi-dimensional time and you're0:13:34
like, oh, I'm totally good. Except on Tuesday. Why? Well, in the US, on certain kinds of genres of radio, there's0:13:45
a thing called Two for Tuesday. So you've built this scheduling system, and the main purpose of the system is to never play the same song twice in a row,0:13:54
or even pretty near when you played it last, and not even play the same artist near when you played the artist, or else somebody's gonna say: all you do is play Elton John, I hate this0:14:03
station. But on Tuesday it's a gimmick: Two for Tuesday means every spot where we play a song, we're gonna play two0:14:12
songs by that artist, violating every precious, elegant rule you put in the system. And I've never had a real-world0:14:22
system that didn't have these kinds of irregularities and where they weren't important other aspects of situated programs they rarely are sort of their0:14:32
own little universe where they get to decide how things are and they don't need to interact with anyone else or agree with anyone else almost all these systems interacted with other systems0:14:42
almost all of these systems interacted with people somebody would sit there and say start playing the song right now or skip this song and we're like well I scheduled that song and I balanced0:14:52
everything around you playing it and now you know DJ just said don't do that the election projection system has tons of screens for users to look at0:15:01
things and cross tabulate things and make decisions, and feeding all the things you see on TV so people can explain things to other people. So people0:15:10
and talking to people is an important part of these programs they remain in use for long periods of time these are not throwaway programs like I said I0:15:20
don't know that much of the software I ever wrote has stopped being run by somebody people are still using it and they're also situated in a world that0:15:30
changes. So again, your best laid plans are there the day you first write it, but then the rules change. Maybe there's Three for Thursday, I0:15:39
don't know, but when that happens, go change everything to deal with it. Another aspect of being situated, and it's one I think I've been thinking0:15:49
about a lot more recently is being situated in the in the software environment and community you know your program is rarely written from scratch0:15:59
with all code that you wrote just for the purpose of the program invariably you're gonna pull in some libraries and when you do you've situated yourself in that library ecosystem and that's0:16:09
another thing so when I talk about situated programs and you look at the programs I talked about having written in my career one of them really sticks0:16:18
out, right? What's that? Clojure. Compilers, they're not like this. They don't have a fraction of these problems. They take0:16:28
some input right off the disk, and they get to define the whole world. When you write a language, what do you do? The first thing you do when you write a language: you get rid of any Two for0:16:37
Tuesdays, right? You can just disallow it. You try to make the most regular thing, and then your programming is just: well, now I have to0:16:47
enforce the rules that I made up for myself it's like wow what could be easier than that and it really is a lot simpler they don't generally use a0:16:58
database, although I think they probably should. They rarely talk over wires. And so compilers and theorem provers and things like that are not like these programs. So the title of this talk is0:17:10
Effective Programs. And what does effective mean? It means producing the intended result. And I really want this word to become important, because I'm really tired of the word correctness, where0:17:20
correctness means, I don't know, made the type checker happy. None of my consumers of these programs that I did professionally care about0:17:31
that right they care the program works for their definition of works on the other hand I don't want this to be taken as this is a recipe for hacking right0:17:42
just like do anything that kind of works so we have to talk about what works means what does it mean to actually accomplish the job of being being effective and and that's where I want to0:17:54
sort of reclaim the name programming, or at least make sure we have a broad definition that incorporates languages like Clojure and the approaches that it takes, because I think these problems0:18:04
matter so what is programming about I'm going to say for me programming is about making computers effective in the world and I mean effective in the same way we0:18:13
were talking about people being effective in the world either the programs themselves are effective or they're helping people be effective right now how are we effective0:18:22
well sometimes we're effective because we calculate really well like maybe when we're trying to compute trajectories for missiles or something like that but0:18:33
mostly not. In most of the areas of human endeavor, we're effective because we have learned from our experience and we can turn that0:18:44
experience into predictive power, whether that's knowing not to step in a giant hole or off a cliff or walk towards the roaring lion, or how to market to people,0:18:56
or what's the right approach to doing this surgery, or what's the right diagnosis for this problem. People are effective because they learn, and they0:19:06
learn from experience and they leverage that. And so I'm gonna say being effective is mostly not about computation, but it's about generating predictive0:19:16
power from information and you've heard me talk about information right it's about facts it's about things that happen right experience especially when we start pulling this into the0:19:25
programming world experience equals information equals facts about things that actually happen that's what that's the raw material of success in the world0:19:35
it is for people it should be for programs that either support people or replace people so they can do more interesting things so I'll also say that0:19:46
for me what is programming not about it's not about itself programming is not about proving theories about types being0:19:55
consistent with your initial propositions it's not that's an interesting endeavor of its own but it's not it's not what I've been0:20:05
talking about it's not the things I've done in my career it's not what programming is for me and it's not why I love programming I like to accomplish things in the world0:20:15
Bertrand Russell has a nice snarky comment about that. He's actually not being snarky; he wants to elevate mathematics and say it's quite important that mathematics be only about itself. If0:20:25
you start crossing the line and standing on stage and saying, you know, type safety equals, you know,0:20:34
heart machine safety, you're doing mathematics wrong according to Bertrand Russell. And it's not just algorithms and computation. They're important, but0:20:44
they're a subset of what we do. So don't get me wrong, I like logic. I've written those scheduling systems, I've written those yield management0:20:53
algorithms, I've written the Datalog engine. I like logic, I like writing that part of the system. I usually get to work on that part of the system; that's really cool.0:21:02
But even, you know, a theorem prover or compiler eventually needs to read something from the disk or spit something back out, print something, so0:21:13
there's some shim of something other than the logic. But in this world of situated programs and the kinds of programming that I have done and I think that Clojure programmers do,0:21:25
that's a small part of the program. Programs are dominated by information processing, unless they have UIs, in which case there's this giant circle0:21:35
around this where this looks like a dot but I'm not gonna go there actually0:21:44
because I don't do that part. But the information processing actually dominates programs, both in the effort, and the irregularity is often there, right?0:21:55
It's this information part that takes all the irregularity out of the way, so my Datalog engine is gonna have an easy day, because everything is now perfect, because I see a perfect thing, because somebody fixed it before0:22:05
it got to me. And I don't want to make light of this; I think this is super critical. Your best, coolest, you know, search algorithm: if they0:22:15
couldn't get it to appear on a web page and do something accessible when you typed something and pressed enter, no one would care. This is where0:22:24
the value proposition of algorithms gets delivered. It's super important. But in my experience, while this is the ratio it0:22:34
probably needs to be to solve the problem, this is the ratio it often is, and was in my experience in my work. Actually this is also sort of bigger; the0:22:44
square would be more of a dot. The information part of our programs is much larger than it needs to be, because the programming languages we had then and0:22:53
still have mostly are terrible at this, and we end up having to write a whole ton of code to do this job, because it's just not something the designers of0:23:03
those languages took on. And of course we're not done, right? We don't write programs from scratch, so we have to start dealing with libraries. When we do that, now we've started to cross out of0:23:13
we-get-to-define-everything land. Now we have relationships, and we have to define how we're gonna talk to0:23:22
libraries and how they may talk to us, but mostly we talk to them. So now there are lines, right? There's some protocol of how you talk to this library. And we're still not done, because we said these situated programs,0:23:32
they involve databases. Now, while the information processing and the logic and the libraries may have all shared a programming language, or at least,0:23:42
you know, on the JVM, something like the JVM, a runtime, now we're out of it. Now we have a database that's clearly over there. It's written in a different language, it's not co-located0:23:53
in memory, so there's a wire. It has its own view of the world, and there's some protocol for talking to it, and invariably, whatever that protocol is, we0:24:03
want to fix it. And why is that? Well, it's something I'm going to talk about later called parochialism. You know, we've0:24:13
adopted a view of the world our programming language put upon us, and it's a misfit for the way the database is thinking about things. And rather than0:24:23
say, I wonder if we're wrong on our end, we're like, oh no, we've got to fix that, that relational algebra, it can't0:24:33
possibly be a good idea. Okay, but we're still not done. I said these programs don't sit by themselves; they talk to other programs.0:24:42
So now we have three or more of these things, and now they may not be written in the same programming language. They all have their view of the0:24:51
world, they all have their idea of how the logic should work, they all have their idea of how they want to talk to libraries or use libraries, and there are more wires and more protocols. And here0:25:01
we don't get the database vendor at least giving us some wire protocol to start with that we'll fix with ORM. We have to make up our own protocols, and so0:25:11
we do that, and what do we end up with? JSON, right? It's not good, but at least now we have something. So when I program,0:25:22
this is one program. This is a program to me. This is gonna solve a problem, and no subset of this is going to solve0:25:32
the problem. This is the first point where you start solving the problem. But you're not done with problems, because it's not a0:25:42
one-shot, one time, one moment, one great idea, push the button, ship it, move on kind of world. Every single aspect0:25:51
of this mutates over time. The rules change, the requirements change, the networks change, the computing power0:26:01
changes, the libraries that you're consuming change. Hopefully the protocols don't change, but sometimes they do. So we have to deal with this over time, and for0:26:11
me effective programming is about doing this over time, well. So, you know, I'm not trying to say there's a right or wrong0:26:21
way and like Clojure is right and everything else is wrong right but it should be apparent and maybe it isn't because I think we all aspire to write0:26:30
programming languages that are general purpose you could probably write you know a theorem prover in Clojure actually I'm sure you could but you certainly would0:26:39
get a different language if your target were compilers and theorem provers or your target were device drivers or phone switches Clojure's target is0:26:50
information driven situated programs right there's not a catchy phrase for that but I mean that's what I was doing all my friends were doing that how many0:27:00
people in this room are doing that yeah so when you look at programming languages you really should look at what0:27:10
are they for right there's no like inherent goodness it's like suitability constraints so0:27:19
before I started Clojure I drew this diagram which I did not that would have been an amazing feat of0:27:29
prescience but as I tried to pick apart you know what was Clojure about because I think there's no reason to write a new0:27:38
programming language unless you're going to try to take on some problems you should look at what the problems are I mean why was I unhappy as a programmer after eighteen years and said if I can't switch to something like Common Lisp I0:27:49
am gonna switch careers why am I saying that it's I'm saying it because I'm frustrated with a bunch of limitations in what I was using and you can call0:28:00
them problems and I'm going to call them the problems of programming and I've ordered them here I hope you can but can you read it yeah okay I've0:28:09
ordered them here in terms of severity and severity you know manifests itself in a couple of ways most importantly cost right what's the0:28:18
cost of getting this wrong right and at the very top you have the domain complexity about which you can do nothing this is just the world it's as0:28:29
complex as it is but the very next level is where we start programming right we look at the world and say I've got an idea about how this is and how it's0:28:38
supposed to be and how you know my program can be effective about addressing it and the problem is if you don't have a good idea about how the0:28:48
world is or you can't map that well to a solution everything downstream from that is gonna fail there's no surviving this0:28:58
misconception problem and the cost of dealing with misconceptions is incredibly high so then this is 10x a full order of magnitude reduction in0:29:08
severity before we get to the set of problems I think are more in the domain of what programming languages can help with right and because you can read0:29:19
these they're all gonna come up in a second as I go through each one on its own slide so I'm not gonna read them all out right now but importantly I think0:29:28
there's another break where we get to the trivialisms of problems in programming like typos and just being inconsistent like you thought you were gonna0:29:39
have a list of strings and you put a number in there that happens you know people make those kinds of mistakes they are pretty inexpensive so what were the0:29:50
problems that Clojure took on the green ones and again I'll go through all the green ones in a moment but I would say amongst the ones in the middle I0:29:59
don't think that Clojure tried to do something different about resource utilization than Java did it sort of adopted that runtime and its cost model0:30:08
and I don't think that I mean I wanted Clojure to be a good library language but I didn't think about the library ecosystem problems as part of Clojure and you know my talk0:30:18
last year about libraries implies that I still think this is a big problem for programs it's one of the ones that's0:30:27
left right after you do Clojure and Datomic you know what's left to fix and the libraries the libraries are there0:30:38
but not the inconsistency and typos not so much I mean we know you can do that in Clojure it's actually pretty good at letting you make typos so fundamentally0:30:49
what is Clojure about can we make programs out of simpler stuff I mean that's the problem after eighteen years of using like C++ and Java you're0:30:58
exhausted how many people have been programming for eighteen years or okay how many for more than twenty years more than twenty-five okay fewer0:31:09
than five all right so that is really interesting to me and maybe an indictment of Clojure as a beginner's language or maybe that Clojure is the0:31:20
language for cranky tired old programmers and and you know what I0:31:31
I would not be embarrassed if it was I'm that's fine by me because because you know I did make it for myself which I think is an important thing to do trying0:31:41
to solve other people's you know problems and think you understand what they are you know it's tricky so when I discovered Common Lisp having used C++ I0:31:52
said I'm pretty sure the answer to this first question is yeah absolutely and can we do that with a lower cognitive0:32:02
load I also think yes absolutely and then the question is can I make a Lisp we can use instead of Java or C sharp because you just heard my story and I used Common Lisp a couple of times and0:32:12
every time it got kicked out of production or just ruled out of production really not kicked out it didn't get a chance so I knew I had to0:32:21
target a runtime that people would accept so there are these meta problems right you can try to take on some programming problems but there are always problems in getting a language0:32:30
accepted I did not think Clojure would get accepted really honestly but I knew if I wanted my friend who thought I was crazy even doing it like person number0:32:41
one other than myself to try it I'd have to have a credible answer to the acceptability problems and the power problems because otherwise it's just not0:32:51
practical it's like that's cool Rich but like we have work to do if we can't use this professionally really it's it's just a hobby0:33:00
so we have acceptability I think that goes to performance and for me I thought it was also the deployment platform there's a power challenge that you have0:33:09
to deal with and that's about leverage and I'll talk about that later and also compatibility again that's part of the acceptability but you know Clojure's ability to say it's just a Java0:33:20
library kind of was big I mean how many people snuck Clojure into their their organizations to start with right okay0:33:30
success and then there are other things I consider to be absolute non problems and the first of these is the parentheses0:33:40
right how many people and it's okay to admit right everybody has a story how many people thought the parentheses were going to be a problem and now think that0:33:49
was crazy thinking yeah which is fine I think everybody goes through that everybody looks at Lisp and is like this is cool but I'm I'm gonna fix this part before I get0:33:59
going before I start before I understand the value proposition of it at all I'm gonna fix this and that's just something about programmers I'm not sure0:34:08
exactly what but I don't believe this is a problem and in fact when we get to the middle of this talk you'll see I think this0:34:17
is the opposite of a problem this is the core value proposition of Clojure and I think things like par make it go away whatever that is is a bad it's a0:34:27
terrible idea and it's not good for beginners to do that you know to try to solve a problem that's that's a feature the other thing I considered not a0:34:36
problem is it being dynamic right I worked in C++ you know we had a thing that we said in C++ is that if it compiles it will probably work right0:34:45
like they say of Haskell and it was equally true then as it is now but we really did believe it we totally did and0:34:58
it's it doesn't help it really does not help for the big problems the top the big wide ones okay0:35:07
so problem number one on that list was place oriented programming absolutely this is the problem almost all the programs I wrote lots of the things on that list were multi-threaded programs0:35:18
you know they're crazy hard in C++ just impossible to get right when you adopt the normal mutability approach mutable0:35:28
objects so this is the number one self inflicted programming problem it seemed you know clear to me that just that the answer was to make functional0:35:38
programming and immutable data the default idiom so the challenge I had was were there data structures that would be fast0:35:47
enough to say we could swap this for that and the objective I had the goal I had was to get within 2x for reads and 4x for writes and I did a lot of work on0:35:58
this this was actually the main research work behind Clojure was about these persistent data structures and eventually I found you know I looked at Okasaki's stuff and you know the fully0:36:08
functional approach and none of that gets here and then I found Bagwell's structures which were not persistent but I realized could be made so and they0:36:19
just have tremendously great characteristics combining the persistence with the way they're laid out the way memory works they made it0:36:28
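A minimal sketch of what "persistent" means in practice for the collections described here, assuming only clojure.core. The names v1, v2, m1 and m2 are made up for illustration, and the sketch shows behaviour only, not the read/write performance targets mentioned in the talk.

```clojure
;; Hypothetical names; only clojure.core functions are used.
(def v1 [1 2 3])
(def v2 (conj v1 4))              ; "adding" returns a new vector

(def m1 {:name "Ada" :lang "Clojure"})
(def m2 (assoc m1 :lang "Lisp"))  ; "updating" returns a new map

;; The originals are still exactly what they were.
(println v1 v2)
(println (:lang m1) (:lang m2))
```

Because the old values never change, any code (or thread) still holding v1 or m1 keeps seeing a consistent value.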
they made this bar and I was able to get my friend to try my programming language and we you know we have this large0:36:37
library of pure functions to support this and you know immutable local bindings basically if you fall into Clojure your first hurdle is not the parentheses right it's this this0:36:47
functional paradigm everything is gone there's no mutable variables there's no state there's no mutable collections0:36:56
and everything else but there's a lot of support right there's a big library you just have to you know sort of learn the idioms so I think this was straightforward the critical thing0:37:05
that's different about Clojure is by the time I was doing Clojure the people who invented this stuff had adopted a lot0:37:14
more right I think most of the adherents in the functional programming community consider functional programming to be about typed functional programming statically typed functional programming0:37:24
is functional programming and I don't think so I think that this is a you know this was clearly in the 80/20 rule and I think that split here is more like 99/1 0:37:34
the value props are all on this side and and I think closure users get a sense of that they get a feel for that this is0:37:43
the thing that makes you sleep at night ok problem number two and this is the most subtle problem and this is the thing that annoys me the most about0:37:53
statically typed languages is they are terrible at information so let's look at what information is inherently information is sparse it's what you know it's what happened in the0:38:03
world does the world fill out forms and fill everything out for you all the things you'd like to know no it doesn't0:38:13
it doesn't and not ever is probably more correct the other thing is what can you know what are you allowed to know so0:38:27
there's not good answers to that whatever you want right it's open right what what else is there what is there to know well I mean what time is it right because0:38:37
every second that goes by there's more stuff to know more things happen more facts more things happen in the universe so information accretes it just keeps0:38:46
accumulating what else do we know about information we don't really have a good way of grappling with it except by using0:38:55
names when we deal with information as people names are super important right if I just say 47 now there's no0:39:06
communication going on yeah we have to connect it and then the other big thing and this is this is the thing I struggle with so often right I have a system I made a class or a type about some piece0:39:17
of data then over here I know a little bit more data than that do I make another thing that's like that if I have derivation do I derive to make0:39:27
that other thing what if I'm now in another context and I know part of one thing and part of another thing what's the type of part of this and part of that and then you know this is explosion0:39:38
because these languages are doing this wrong they don't have composable information constructs so what is the problem with programming in a way that's0:39:48
incompatible with information it's that we elevate the containership of information to become the semantic driver okay we0:40:01
say this is a person and a person has a name and a person has an email and person has a social security number and there's no semantics for those three things except in the0:40:11
context of the person class or type whatever it is and and often depending on the programming language the names0:40:21
are either not there right if you got these product types where it's like person is string x string x int x string x string x int x float x float product0:40:32
type like a complete callous disregard for people names human thinking it's crazy or your programming language maybe0:40:42
has names but they compile away right they're not first class you can't use them as arguments you can't use them as lookup vectors right you can't use them0:40:53
as functions themselves right there's no compositional algebra in in programming languages for information so we're taking these constructs I think0:41:03
were there for other purposes we have to use them because it's all we were given and it's what's idiomatic right take out a class take out you know a0:41:12
type and do this thing but the most important thing is that the aggregates determine the semantics which is dead wrong right if you fill out a form nothing about the information you put on0:41:22
that form is semantically dominated by the form you happen to fill out it's a collecting device it's not a semantic0:41:31
device but it becomes so and what this what happens is you get these giant sets of concretions around information you0:41:41
know people that write you know Java libraries you look at the Java framework it's cool it's relatively small and everything's about sort of mechanical things Java is good at mechanical things0:41:51
well mechanisms but then you hand the same language to the poor application programmers who are trying to do this information situated program problem and0:42:01
that's all they've got and they take out a class for like everything they need every piece every small set of information they have right how many0:42:10
people have ever seen a Java library with over 1500 classes yeah everybody and this is my experience0:42:19
my experience it doesn't matter what language you're using if you have these types you're gonna have and you're dealing with information you're gonna have a proliferation of non composable0:42:29
types that each are a little parochialism around some tiny piece of data that doesn't compose and and I'm really not happy with this you know in0:42:42
programming literature the word abstraction is used in two ways one way is just like naming something is abstracting0:42:51
I disagree with that abstracting really should be drawing from a set of exemplars some essential thing right not0:43:00
just naming something and what I think is actually happening here is we're getting not data abstractions you're getting data concretions right relational algebra that's a data0:43:10
abstraction datalog is a data abstraction RDF is a data abstraction your person class your product class those are not data abstractions they're0:43:22
concretions so you know we know in practice Clojure says just use maps what this meant actually was Clojure didn't give0:43:31
you anything else okay there were there was nothing else to use there were no classes there wasn't a thing to say deftype there weren't types there0:43:40
wasn't algebraic data types or anything like that there were these maps and there was a huge library of functions to0:43:49
support them there was syntactic support for it so working with these associative data structures was tangible well supported functional high performance0:43:58
activity and they're generic what do we do in Clojure if we have just some of the information here and just some of the information there and we need both those things over there we say what's0:44:09
the problem there's no problem I take some information some information and I merge them I hand it along if I need a subset of that I take a subset of that I call keys and you know select-keys and I0:44:20
get a subset I can combine anything that I like there's an algebra associated with associative data the names are first class right keywords and symbols0:44:31
are functions they're functions of associative containers they know how to look themselves up and they're reified so you can tangibly flow them around your program and say pick out these0:44:41
three things without writing a program that knows how to write Java or Haskell pattern matching to find those three things that they're they're independent0:44:51
of the programming language right they're just arguments they're just pieces of data but they have this they have this capability and the other thing which I0:45:01
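The map operations described in this passage can be sketched roughly as follows, assuming only clojure.core. The maps and key names (contact, shipping, :street and so on) are invented for illustration.

```clojure
;; Two partial pieces of information, known in different places (made-up data).
(def contact  {:name "Ada" :email "ada@example.com"})
(def shipping {:street "1 Main St" :city "Springfield"})

;; Need both? Merge them.
(def order-info (merge contact shipping))

;; Need a subset? Select it. The list of keys is itself plain data.
(def wanted [:name :street :city])
(def label (select-keys order-info wanted))

;; Keywords are functions of associative containers: they look themselves up.
(def email (:email order-info))
(def all-names (map :name [contact {:name "Grace"}]))
```

Nothing here needed a new class or type for "contact plus shipping" or for "the three fields on a label"; the same generic functions cover every combination.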
think is a potential of Clojure it's realized to varying degrees but the the raw materials for doing this are there is that we can associate the0:45:10
semantics with the attributes and not with the aggregates right because we have fully-qualified symbols and keywords and obviously spec is all about0:45:21
that all right brittleness and coupling this is another thing it's just my personal experience that static type systems yield much more heavily coupled systems and0:45:30
that a big part of that time aspect of the final diagram of what problem we're trying to solve is dominated by coupling when you're trying to do maintenance my0:45:40
flowing type information is a major source of coupling in programs having you know a pattern matching of a structural representation in a hundred0:45:50
places in your program is coupling right like this stuff I'm seizing up when I see that the sensibilities you get after0:45:59
twenty years of programming you hate coupling it's like the worst thing and you smell it coming and you want no part of it and this is a big problem the0:46:11
other thing I think is more subtle but I put it here because it lets you see this is positional semantics don't scale what's an example of positional0:46:20
semantics argument lists right most languages have them and Clojure has them too right who wants to call a function with 17 arguments nope0:46:33
there's one in every room nobody does we all know it breaks down where does it break down five six seven0:46:42
at some point we are no longer happy but if that's all you have right if you only have product types they are going to break down every time you hit that limit all right how many people like going to0:46:53
the doctor's office and filling out the forms right don't you hate it you get this big lined sheet of paper that's blank then you get the set of rules that0:47:04
says put your social security number on line 42 and your name on line 17 that's how it works right that's how the world0:47:13
works that's how we talk to other people no it doesn't scale it's not what we do we always put the labels right next to the stuff and the labels matter but with0:47:23
positional semantics we're saying no they don't you know just remember the third thing means this and the seventh thing means that and types don't help you right they don't really distinguish0:47:32
this float x float x float x float x float at a certain point that's not telling you anything so it doesn't scale but it occurs in other places so we0:47:41
have argument lists we have product types where else parameterization right0:47:50
who's who's seen a generic type with more than seven type arguments in C++ or Java yeah well you tend not to0:48:00
see it in Java because people give up on parameterization right and what did they switch to Spring0:48:11
no I mean that's not a joke that's just the fact right they switch they switch to a more dynamic system for injection0:48:20
right because parameterisation doesn't scale and one of the reasons why it doesn't scale is there are no labels on these parameters they may get names by0:48:29
convention but they're not properly named when you want to reuse the type of parameters you get to give them names yeah just like in pattern-matching0:48:39
that's terrible that's a terrible idea and it does not scale so anywhere parameters anywhere positionality is the0:48:49
the only thing you've got you're eventually gonna run out of steam you're gonna run out of the ability to talk to people or they're gonna run out of the ability to understand what you're doing0:48:58
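To make the contrast between positional and labelled arguments concrete, here is a small sketch assuming only clojure.core. Both functions and their field names are hypothetical.

```clojure
;; Positional: the caller has to remember which slot means what.
(defn label-positional [name street city zip country]
  (str name ", " street ", " city " " zip ", " country))

;; Labelled: one map argument, destructured by key.
(defn label-named [{:keys [name street city zip country]}]
  (str name ", " street ", " city " " zip ", " country))

;; Swapping two positional strings still runs, it just produces a wrong label.
(def wrong (label-positional "Ada" "Springfield" "1 Main St" "12345" "US"))

;; With labels, order is irrelevant and the call site documents itself.
(def right (label-named {:city "Springfield" :name "Ada" :zip "12345"
                         :street "1 Main St" :country "US"}))
```

Since every positional slot here is a string, a type checker would not distinguish the swapped call either, which is the point made above about float x float x float.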
so so I think types are an anti-pattern for for program maintenance and for extensibility and because they0:49:10
introduce this coupling and it makes programs harder to maintain and even harder to understand in the first place so Clojure is dynamically typed you do not have this burden of proof you don't0:49:19
have to prove that you know because I made something here and somebody cares about it over there every person in the middle didn't you know mess with it you0:49:29
know mostly they don't mess with it I don't know what we're protecting against but we can prove now that you know they're still strings over there the constructs are open right we much0:49:39
prefer runtime polymorphism either by multimethods or protocols to switch statements pattern matching and things like that the maps are open they're need0:49:49
to know what do we do in Clojure if we don't know something we just leave it out we don't know it like this so maybe this maybe that I mean if you actually0:49:58
parameterised the information system it would be maybe everything right maybe everything no longer is meaningful it0:50:07
just isn't and and and then nothing is of type maybe something right if your social security number is a string it's a string you either know it or you don't0:50:16
jamming those two things together it makes no sense it's not the type of the thing it may be part of your front door protocol that0:50:25
you may need it or not it's not the type of the thing right so we the maps are open we deal with them on a need-to-know basis and you get into the habit of0:50:36
propagating the rest maybe you handed me more stuff should I care no the UPS truck comes and my TV0:50:45
is on the truck do I care what else is on the truck no I don't I don't want to know but it's okay that there's other stuff so the0:50:57
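The "need to know" handling of open maps described above might look like this in code. It is a sketch assuming only clojure.core; add-greeting and the sample keys are invented for illustration.

```clojure
;; This function only needs :name. Whatever else is in the map rides along.
(defn add-greeting [person]
  (assoc person :greeting (str "Hello, " (:name person))))

(def p {:name "Ada" :employer "Analytical Engines Ltd" :favorite-color "green"})
(def greeted (add-greeting p))

;; Information we do not have is simply absent from the map;
;; there is no "maybe social security number" slot to fill in.
(def has-ssn? (contains? p :ssn))
(def ssn-or-default (get p :ssn :unknown))
```

The extra keys (:employer, :favorite-color) pass through untouched, like the other parcels on the truck.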
other part was you know language model complexity you know C++ is a very complex language and so is Haskell and so is Java and so is you know most of0:51:06
them Clojure is very small it's not quite Scheme small but it's small compared to the others and it's just you0:51:16
know the basic lambda calculus kind of thing with you know an immutable you know functional core there are functions there are values you can call functions0:51:27
on values and get other values that's it there's no hierarchy there's no parameterization there's no you know existential types and the execution model is another0:51:39
tricky thing right we're getting to the point even in Java where it gets harder and harder to reason about the performance of our programs right because of resources and that's0:51:51
unfortunate you know at least one of the nice things about C was you know you knew if your program krest it was your problem and you just figure it out but you knew what it was gonna take up and0:52:00
RAM and you could calculate things and it was quite tractable and that matters to program programmers right programming is not mathematics and mathematics you0:52:10
can swap any isomorphism for any other in programming you get fired for doing that right it's different right performance matters is part of0:52:20
programming it's a big deal so making this something at least I could say it's like Java and blame them was fine but it0:52:30
also meant that all the tooling helped us right all the you know all the Java tooling works for Clojure I mean how many people use you know YourKit and profilers like that on Clojure0:52:40
that's pretty awesome to be able to do that all right now we're into the really nitty-gritty of things I didn't like and therefore I0:52:49
left out this type thing it's it goes everywhere and the name I came up with for it is parochialism right this idea that I0:52:58
have this language and you know it's got this cool idea about how you should think about things you should think about things using algebraic data types or you should think about things using inheritance it yields this intense0:53:10
parochialism right you start to have representations of things manifestations of representations of information that0:53:20
they only make sense in the context of this language's rules for things and they don't combine with anybody else's ideas0:53:29
right you smash against the database you smash against the wire you smash against this other programming language because you've got this idiosyncratic local view0:53:39
of how to think about things RDF did this right and they did it because they had this objective right they're trying to accomplish something we want to be0:53:48
able to merge data from different sources we don't want the schemas to dominate the semantics how many people have ever gotten the same piece of mail0:53:58
from the same company and been like what is wrong with your databases dudes right yeah what is wrong what's wrong is one company bought another company right now0:54:09
they're the same company they now have these two databases in one database your name is in the person thing and in0:54:18
another database your name is in the person table and another database your name is in the mailing list table right who knows that mailing list table name0:54:29
and person name are actually the same piece of information nobody they have to have meetings I mean this is a big dollar this is a big ticket problem it's0:54:39
not it's not a small it's not a laughing matter right these big companies have giant jobs trying to merge these systems0:54:48
because because table parochialism it's the same as classes and algebraic data types it's the same problem it's not a different problem it's all like I0:54:58
had this view of the world and on the day I decided how the world is I decided that names were parts of person and you decide that names are parts of mailing0:55:07
lists and now we need to fix this and you know how a lot of those companies fix it they introduce the third database usually an RDF database as a Federation point so they now can figure out these0:55:18
two things are the same and eventually they will stop sending you two pieces of mail the same piece of mail twice right so there's this subject-predicate object0:55:27
and obviously you can see the influence of this on Datomic right but it goes further right I would say that the more elaborate your type system is the more parochial your types are right the less0:55:38
general they are the less transportable they are the less understandable by other systems they are the less reusable they are the less flexible they are the0:55:47
less amenable to putting over wires that they are the less subject to generic manipulation that they are right almost every other language that deals with0:55:57
types encourages this tyranny of the container I talked about before we have a choice in Clojure I think people go either way right there's two things one0:56:07
is the container dominates the other is just sort of the notion of context dominating the meaning like because I called it this in this context it means that but we have the recipe in0:56:18
Clojure for doing better than that which is use namespace qualified keys with namespace qualified keys we can now merge data and and know what0:56:28
things mean regardless of the context in which they're used and and anything about this thwarts the composition I talked about before and in particular because we're pointed at this program0:56:38
manipulating program ideas you'll see later it makes this harder so Clojure has names they're first class this is you know stuff that was in Lisp it just0:56:49
dominates more because they became the accessors for the associative datatypes and they were they are functions in and of themselves keywords being functions is0:56:59
sort of the big deal they don't disappear they're not compiled away into offsets we can pass them around we can write them down a user who doesn't know0:57:08
Clojure can actually type one into a text file and save it and do something meaningful with our program without learning Clojure we have this namespace0:57:18
qualification if you follow the conventions which unfortunately a lot of Clojure libraries are not yet doing of this reverse domain name system which is0:57:27
the same as Java's all Clojure names are conflict free not only with other Clojure names but with Java names that's a fantastically good idea and0:57:37
it's similar to the idea in RDF of using URIs for for names and aliases help make this less burdensome and we've done some more recently to do more0:57:47
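A rough sketch of why namespace-qualified keys let data from different sources be merged safely, assuming only clojure.core. The com.example.* namespaces and the records are hypothetical, chosen to echo the person-table and mailing-list example above.

```clojure
;; Unqualified keys: meaning depends on which container you are holding.
(def person       {:name "Ada"})
(def mailing-list {:name "newsletter"})
(def clobbered (merge person mailing-list))   ; one :name overwrites the other

;; Qualified keys (reverse-domain style, made up here): meaning travels with the key.
(def person-q {:com.example.person/name "Ada"})
(def list-q   {:com.example.mailing-list/name "newsletter"})
(def merged (merge person-q list-q))          ; both facts survive

;; The qualifier is ordinary data you can inspect.
(def ns-part  (namespace :com.example.person/name))
(def key-part (name :com.example.person/name))
```

With qualified keys, two systems that mean the same thing can agree on one key, and two that mean different things cannot collide by accident.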
with that then there's this distribution problem here's where I start saying taking a language specific view of program design is a terrible mistake0:57:56
because you're in that little box you're ignoring this big picture as soon as you step back now you have this problem you have to talk over wires how many people0:58:06
use one you know remote object technology well I'm really sorry because it's brutal right it's very0:58:15
brutal it's incredibly brittle and fragile and complex and error-prone and and specific how many people use that0:58:24
kind of technology to talk to people not under their own employ no it doesn't work that's not how the internet works right distributed objects failed0:58:34
right the Internet is about sending plain data over wires and almost everything that ever dealt with wires only succeeded when it moved to this and this is very successful why should we0:58:46
program in a way that's all super parochial if we need to eventually represent some subset of our program maybe a0:58:56
subset we didn't know in advance over wires if we program this way all the time we program the inside of our programs as let's pass around data0:59:07
structures and then somebody says whoo I wish I could put half of your program across the wire or replicate it over six machines what do we say in Clojure that's great I'll start shipping some edn0:59:16
across the socket and we're done as opposed to I got to do everything over so there were plenty of0:59:25
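As an illustration of shipping plain data over a wire, here is a minimal round trip through text, assuming clojure.core's pr-str and clojure.edn/read-string. The order map is made up, and the actual socket I/O is left out; the string stands in for whatever would be sent.

```clojure
(require '[clojure.edn :as edn])

;; Ordinary in-program data (hypothetical order record).
(def order {:order/id 42
            :order/items [{:sku "A-1" :qty 2}]
            :order/tags #{:gift}})

;; Sender side: print the data structure as text.
(def wire-text (pr-str order))

;; Receiver side: read the text back into data, without evaluating any code.
(def received (edn/read-string wire-text))

(println (= order received))
```

The receiving program needs no shared classes or stubs, only a reader for the data format.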
inspirations and examples for me of this runtime tangibility it's one of the things that I really got excited about when I learned common lists coming from C++ small talking common lists are0:59:37
languages that were obviously written by people who were trying to write programs for people these are not language0:59:46
theoreticians you can tell they were writing they were writing GUIs they were writing databases they were writing logic programs and languages also but there's a system0:59:57
sensibility that goes through Smalltalk and Common Lisp that's undeniable and when you first discover them especially if you discover them late as I did it's1:00:10
it's stunning to see and I think it's a tradition that's largely been lost in academia I just don't see the same1:00:21
people making system and languages you know together it sort of split apart and that's that's really a shame because there's so much still1:00:30
left to pilfer from these these languages they were highly tangible right they had reified environments all the names you could see you could go back and find the code the namespaces1:00:40
were tangible you could load code at runtime I mean one thing after another after another right and the whole Perlis you know quip about you know any sufficiently large C or C++ program you1:00:51
know has a poorly implemented Common Lisp it's so true again Spring right you eventually as you get a larger system1:01:00
that you want to maintain over time and deal with all those complexities of you know I showed before you want dynamism you have to have it it's not like an1:01:11
optional thing it's it's necessary but what was particularly interesting for me in implementing Clojure was how much runtime tangibility and situated1:01:21
sensibilities were in the JVM design the JVM is actually a very dynamic thing as much as Java looks like say C# or1:01:30
C++ the JVM you know it was written with an idea of well we're gonna embed these programs on set-top boxes and network them and need to send code1:01:40
around so you could update their capabilities that's like it's situated everywhere you turn and the runtime has got a ton of excellent support for that1:01:49
which makes it a great platform for languages like Clojure and thank goodness you know that the work that the people did on Self and it didn't die1:01:59
that it actually got carried through here not everything did but it's quite important and it will be a sad day when1:02:09
you know somebody says well let's just replace that JVM with you know some static compilation technology and I'll tell you targeting the JVM and and the CLR it's plain the CLR is static1:02:22
thinking and the JVM is dynamic thinking so there are situated sensibilities in all these the last problem on my initial slide was concurrency and I think mostly1:02:31
concurrency gets solved by being functional by default the other thing you need is you need some way to some language for1:02:41
dealing with state transitions and that's the epochal time model I'm not gonna get into this again here but I've given talks about this before so Clojure1:02:50
has this and and it was a combination of those things that let me say I think I have a reasonable answer for my friend if he says how can I write a real program with this I could say here's how1:03:00
you can write a real program including a multi-threaded program and not go crazy so there's lots of stuff I wanted to take from Lisp and you know I think I talked about a lot of these it's dynamic1:03:10
it's small it had first class names it's very tangible there's this code is data and read print and I'll talk a little bit more about that but there's the REPL and I think that still people are1:03:19
like the REPL is cool because I get to try things and that's true but the REPL is much cooler than that it's cooler than that because it's an acronym it's cooler than that because read is1:03:30
its own thing and what Clojure did by by adding a richer set of data structures is it made read print into a superpower1:03:40
it wasn't just a convenience it isn't just a way to interact with people it isn't just a way to make it easy to stream programs around or program1:03:49
fragments around it's now like here's your free wire protocol for real stuff how many people ever sent edn over a wire yeah how many people like the fact1:04:01
that like they don't need to think that's a possibility they can just do it and you know if they want to switch to something else you can but it's it's a huge deal eval obviously we know it lets1:04:14
us go from data to code and that's the source of macros but I think again it's much bigger than than the application to macros and finally this1:04:23
print which is just the other direction but Lisp had a bunch of things that needed to be fixed in my opinion it was built on concretions you know a lot of1:04:33
the a lot of the design of more abstractions and CLOS and stuff like that you know came after the underpinnings the underpinnings didn't take advantage of them so if you want if1:04:43
you want polymorphism at the bottom you have to retrofit it if you want immutability at the core you know you just need you need something different you know from the ground up and that's1:04:52
why Clojure was worth doing as opposed to trying to do Clojure as a library for Common Lisp the lists were functional kind of mostly by convention but the1:05:04
other data structures were not you had to switch gears to go from you know assoc with lists to you know a proper hash table and lists are crappy data1:05:13
structures sorry they just are they're very weak and there's no reason to use them as a fundamental primitive for programming also packages and interning were very complex there the1:05:25
other part about Clojure that is important is leverage now I'm running out of time I'm not gonna talk about that or that so the edn data model is not like1:05:36
a small part of Clojure it's sort of the heart of Clojure right it's the answer to many of these problems it's tangible it works over wires it's not incompatible with the rest of the world1:05:45
other languages have maps associative data structures and vectors and strings and numbers and so it seems like a happy you know lingua franca and1:05:56
why shouldn't we use the lingua franca in a program why should we have you know a different a different language it's actually not that much better and you have to keep translating all right1:06:05
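A minimal sketch of the "plain data is the wire protocol" point, written in Python rather than Clojure and with JSON standing in for edn. The `order` data and the `total_quantity` function are invented for illustration and do not come from the talk.

```python
import json

# Hypothetical order, modeled only with maps, vectors, strings and numbers.
order = {
    "id": 42,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
}


def total_quantity(o):
    """Operates on plain data, so it does not care where the data came from."""
    return sum(item["qty"] for item in o["items"])


# "Shipping it across the socket" is print on one side and read on the other.
wire_bytes = json.dumps(order).encode("utf-8")
received = json.loads(wire_bytes.decode("utf-8"))

print(received == order)         # the data survives the round trip unchanged
print(total_quantity(received))  # the same function works on the far side
```

Because the program already passes generic data structures around internally, nothing had to be redesigned to cross the wire; only the print and read steps were added.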
here's the final thing Simon Peyton Jones in an excellent series of talks listed1:06:14
these advantages of types because this is a big thing that's left out of Clojure there's no types right they guarantee the absence of certain kinds of errors which is true and he would say1:06:23
he does say this is the least benefit of static typing they serve as a partial machine-checked specification and partial is the operative word here1:06:33
it's very partial they're a design language right they help you think you have a framework in which you can think about your problems they support interactive development like1:06:43
intellisense but the biggest merit he says is in software maintenance and I really disagree with just a lot of this it's1:06:52
not been my experience the biggest errors are not caught by type systems you need extensive testing to do real-world effectiveness checking1:07:01
names dominate semantics a to a list of a to list of a it means nothing it tells you nothing if you take away the1:07:11
word reverse you don't know anything you really don't and to elevate this to say this is an important thing and we have all these properties it's not true it1:07:20
just isn't true there are thousands of functions that take a list of a and return a list of a what does that mean it means nothing and checking it I mean1:07:30
if you only had a list of a's where are you gonna get something else to return I mean obviously you're gonna return a list of a's unless you're you know getting stuff from somewhere else and if you're functional you're not how many people1:07:41
like UML how many people have ever used a UML diagram tool right it's not fun right it's like no you can't connect1:07:51
that to that oh no you have to use that kind of arrow no you can't do this no you can't it's terrible OmniGraffle is much better you draw whatever you want what are you thinking about draw that what's important write that down1:08:01
that's how it should work right yes intellisense is much helped by static types and performance optimization which1:08:10
he didn't list but I think this is one of the biggest benefits we love that in C++ and maintenance I think it's not true I think that they've created1:08:19
problems that they now use types to solve oh i pattern match this thing 500 places and i want to add another thing in the middle well thank goodness i have1:08:29
types to find those 500 places but the fact was that thing I added nobody should have cared about except the new code that consumed it and if I did that a different way I wouldn't have changed1:08:39
anything except the producer and the consumer not everybody else who couldn't possibly know about it right that's new1:08:49
so I mean for young programmers I mean if everybody's tired and old then this doesn't matter anymore but when I was young1:08:58
when I was young I really you know when you're young you've got lots of free space I used to say an empty head that's not right you have a lot of free space available and you can fill it with1:09:09
whatever you like and these these type systems they're quite fun right because from a from a you know endorphin standpoint solving puzzles and solving1:09:19
problems is the same it's like gives you the same rush the puzzle solving is really cool but that's not what it should be about I think that I think1:09:29
that this kind of verification and whatnot it's incredibly important but it should be a la carte right depending on what you need to do depending on the amount of money you have to spend1:09:38
depending on what you want to express you should be able to pull different kinds of verification technology off the shelf and apply it it should not be1:09:47
built-in right there's a diversity of needs there's a diversity of approaches to doing it and diversity of costs in addition I think to the extent these1:09:56
tools can be pointed at the system level problem and not some language parochialism you get more bang for your buck how many people have used spec to spec a wire protocol yeah there's gonna be a1:10:06
lot more of that going on and I won't talk much more about spec but the next version will increase programmability so finally information versus logic the1:10:19
bottom line is where are we going in programming right the fact is we actually don't know how to drive a car we can't explain how to drive a car we1:10:29
can't explain how to play Go we can't and then therefore we can't apply traditional logic to encoding that and make a program that successfully does it1:10:39
we just can't do it we're approaching problems in programming now that we don't know how to do we don't know how to explain how to do like we know how to drive a car but we don't know how to1:10:48
explain how to drive a car and so we're moving to these information trained brains right deep learning and machine learning statistical models and things1:10:57
like that use information to drive a model that's full of imprecision and and and speculation but that is still1:11:07
effective because of the amount of data that was used to train it at making decent decisions even though it also couldn't explain necessarily how it works these programs1:11:17
though are going to need arms and legs and eyes right when you train a big deep learning network does it get its own1:11:26
data does it does it do its own ETL no right it doesn't do any of that when it's made a decision about what to do how is it gonna do it well when we get1:11:38
to Skynet it won't be our problem anymore but for right now it is and I think it's quite critical to be working in a programming language that is itself1:11:48
programmable that's amenable to manipulation by other programs right it'll be fun to use Clojure to write you know to do brain building but it'll also1:11:59
be useful to be able to use Clojure for information manipulation and preparation as well as to use Clojure programs and program components as the targets of1:12:10
action of these decision making things in the end real-world safety is gonna come from experience it's not going to come from proof anybody who gets on1:12:20
stage and makes some statement about type systems yielding safe systems where safe means real world that is not that is not true so this is what's really1:12:32
interesting deep learning and technologies like that are pointed above the line above that that top 10x they're pointed at the misconception problem1:12:41
they say you know what you're right we don't know how to play Go we do not know how to drive a car let's make a system that could figure out how and learn it1:12:52
because otherwise we're just going to get it wrong so I'm going to emphasize that we write programmable programs and that Clojure is well suited to that we1:13:02
have a generic way to represent information we have a generic way to compose arguments without adopting the1:13:11
type system right it's hard enough to drive a car if you have to understand monads to your you know it's just not going to work1:13:21
a reified system is subject to dynamic discovery and I think spec combined with the rest of Clojure being reified is a1:13:30
great way to make systems that other systems can learn about and therefore learn to use and of course we have the same ability to enhance our programs over time so I would encourage you all1:13:41
to embrace the fact that Clojure is different and don't be cowed by the the proof people right it's not a1:13:53
programming it's not a solved problem okay logic should be your tool it shouldn't be your master you shouldn't be underneath the logic system you should be applying a logic system when1:14:02
it works out for you I'm encouraging you to design at the system level right it's not all about your programming language we we all get infatuated with our programming1:14:12
languages but you know I'm actually pretty skeptical about programming languages being the key to programming I don't think they are they're a small part of programming they're not you know1:14:23
the driver of programming and embrace these new opportunities there's gonna be a bunch of talks during the conference about deep learning and take advantage1:14:32
of them make programmable programs and solve problems not puzzles so thank you [Applause]1:14:50
0:00:00
Maybe Not - Rich Hickey
0:00:00
thank you hi everyone can you hear me all right once again it's wonderful to see everybody here a lot of friends how many people have been here every time how0:00:11
many people is this the first time nice all right how many newlyweds yes oh yeah that's0:00:27
awesome all right before I get started you know0:00:36
there's been a lot of controversy about what we're working on how much we're working on it and roadmaps and I hope it's0:00:48
evident to everyone this week what we've been working on it's our fashion sense and and I'm happy0:01:01
to announce tonight with the caveat that you know things can change and also that you know this is completely up to me0:01:13
but the roadmap the five-year roadmap for Clojure is going to be stripes and0:01:22
because I know you've seen enough of my purple shirt but and and if we get enough time to work on it and again no guarantees we've done some experiments0:01:33
of work already but we may work on scarves okay actually we have been0:01:43
working on a ton of stuff and it is mostly getting spoken about at the conj but the thing we didn't quite get out in time for the conj was 1.10 but it represents0:01:52
a ton of work and in particular it represents a ton of work by someone who is not going to speak well you just heard speak but I would really like to hear a super recognition for the work of0:02:06
Alex Miller0:02:21
all right maybe not so yeah it's tricky working at the bottom of everybody0:02:32
else's stuff being a language designer and working on languages and it is something that anyone who does it takes very seriously and it's super stressful0:02:43
because you just don't want to make mistakes but you know they happen and so this talk is somewhat about n dollar0:02:52
mistakes so we're gonna start with this quote from Tony Hoare who said that null references were his billion dollar mistake they led to all kinds of0:03:04
exploits in languages like C and things like that down the line and of course they still exist and we still have nulls0:03:13
although we have Java's memory system which makes them not necessarily exploit vectors but certainly still things we're not happy to see at runtime null pointer0:03:22
errors or whatnot but and there were many reasons why you might have put null references in a language back when he0:03:32
did that had nothing to do with design intention or user intention you know things like it was easy to implement or it's efficient to implement or he didn't have another idea and in this0:03:44
talk what I want to talk about is the fact that we still use things like this we still have the desire to say that something is optional and whether we use0:03:54
nulls or some other thing in our programming languages this is still an idea this idea of maybe not needing0:04:03
information in a certain context so when do we do it and why well the first is that we might optionally require0:04:16
something like you could give me this or not if you give it to me maybe I'll have an extended set of features I provide but I don't I don't need it so this is an argument to my function I might not0:04:27
need and of course if you've got no variadic args and you have fixed number of slots and some have to be optional you're gonna have to put0:04:36
your optional thing as one of the types of the args or you know if you're using something like spec you're gonna have to say that there now we do that less often in Clojure because we have a couple of0:04:45
other ways to accommodate optionality for instance we have variadics so you can just not pass me those extra args you'll put them on the end and I'll have different overloads of arity and0:04:58
that's how you can get them or you can say the optional args are keyword args and that's another way to do that that doesn't have you having a nominal thing0:05:08
which is a nilable nullable optional maybe kind of thing in there but that's a place certainly argument lists are you0:05:17
know kind of product types they have places in them the first argument the second argument or what not when you have places you have to put things in places right another place where we use0:05:26
optionality is in returns I'm gonna go try to find that thing for you and if I find it now I will return it and if I can't find it0:05:35
I'll return no didn't find it some other kind of things so you might or might not I might or might not provide something to you that's in the return value spot0:05:45
and pretty much there we are we do the null thing and we have nil punning and everything else cuz we're still having the nil party in Lisp and then the core0:05:58
of this talk is going to be about the third context which I think is particularly interesting and very challenging which is how do you manage partial information in aggregates so I'm0:06:10
gonna give you a collection of stuff or a bunch of things that have a name associated with the bunch and then names within the bunch and maybe in certain0:06:19
contexts I want to or need to see them coming towards me or I will or will not give you them as a provision we do not0:06:31
in Clojure tend to do this using nils right we don't put a key in our map and put nil in as the value there0:06:41
and I'm gonna talk a lot about or not a lot I'm going to talk about the differences there so this is the context right how do we represent optionality0:06:50
in programs so of course you know nulls were bad so other people fixed them for us and we are you know Philistines for not0:06:59
yet using this and there's a couple I mean there are many floating around this is not like there's one answer there are many answers so probably they aren't all0:07:10
the best but in Haskell we have a type called or there is a type called Maybe it's a parameterized type right Maybe of some0:07:19
type a and it has two constructors you can have Just an a or you can have Nothing which is our nil and then Scala0:07:30
uses a lot of things so to make that same kind of thing if somewhat so I0:07:40
think we'll stick with the Haskell versions moving forward the and you know you will hear this set of this is the way you know this is a way to do this0:07:49
this fixes the problem what's great about it is it forces you to check right and of course that is the most important thing in programming that somebody is0:07:58
watching you and making sure you're checking for nils no matter what the cost right and the problem is no one can0:08:07
articulate the costs no one ever mentions costs it's all benefit right but it is not ok so when do you see the0:08:17
cost of Maybe you see them in program maintenance all right so yesterday I had a function it took an X and returned a Y people wrote code0:08:28
to that function right today I'm like you know what I was asking too much of you I actually can get by without that X0:08:38
I'm now making it optional right this is an easing of requirements an easing of requirements should be a0:08:47
compatible change I think so we make this change we say foo now takes a Maybe X this is the way you represent optionality and returns a Y and0:08:58
the compiler inside foo will make sure that the code in foo doesn't accidentally fail to consider Nothing right phew that's all win except what0:09:10
this breaks existing callers right this is a breaking change it should be a compatible change but it's a breaking change let's talk about providing a0:09:21
stronger return type okay so yesterday I wasn't sure if I could do the job in all cases I wasn't sure I could provide a0:09:30
meaningful return value so I took an X and I returned a Maybe Y but today I figured out how to give you an answer in0:09:39
all cases and so because when I was giving you that Maybe Y you had to deal with it I want future callers to have more0:09:48
certainty about what they're getting so I want to make a compatible change of strengthening my promise okay so relaxing a requirement it should0:09:58
be a compatible change strengthening a promise should be a compatible change so I do this I change this I'm definitely gonna give you Y0:10:07
guess what happened I broke all of my callers again I broke my callers right because now they they have code that0:10:17
deals with Maybe and they're not getting a Maybe anymore so what is happening here right what's happening is that0:10:26
Maybe and Either in spite of their names and the play on language in English are not actually the type system's or no matter0:10:35
how many blog posts from people that just learned Scala and Haskell that you read this is not or right this is evidence of a type system that0:10:46
does not have or for types does not have union types and you're trying to fix it in the user space right and guess what you can't fix it in the user space0:10:56
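A rough Python simulation of the breakage described above, offered only as a sketch. `Just`, `Nothing`, `foo_v1`, `foo_v2`, `foo_v3` and `old_caller` are hypothetical names. Python checks nothing statically, so the break shows up here as a runtime `TypeError`, where a Haskell compiler would reject the old call site at compile time.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Just:
    value: object


@dataclass(frozen=True)
class Nothing:
    pass


Maybe = Union[Just, Nothing]  # a user-space wrapper, not a true "or"


def foo_v1(x: str) -> int:
    """Yesterday: the argument is required."""
    return len(x)


def foo_v2(x: Maybe) -> int:
    """Today, wrapper style: callers must now wrap the value in Just."""
    if isinstance(x, Just):
        return len(x.value)
    if isinstance(x, Nothing):
        return 0
    raise TypeError(f"expected Just or Nothing, got {type(x).__name__}")


def foo_v3(x: Optional[str] = None) -> int:
    """Today, union style: str-or-None, so a plain str is still accepted."""
    return 0 if x is None else len(x)


def old_caller(foo):
    """Code written against foo_v1; it passes a bare string."""
    return foo("hello")


print(old_caller(foo_v1))          # works, as it always did
try:
    old_caller(foo_v2)             # relaxing the requirement broke the caller
except TypeError as err:
    print("old caller broke:", err)
print(old_caller(foo_v3))          # the union version stays compatible
```

The wrapper version forces every existing call site to change even though the requirement only got weaker; the union version accepts everything the old function accepted.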
right Either in particular wow it is just not a beautiful thing it does not mean0:11:06
or right it's got a left and a right it should have been called left right thingy you know because then you have a better sense of the true semantics there0:11:16
are no semantics right except what you superimpose on top of it and using English words to try to like give you some impression is not good especially0:11:26
in this case where you're so failing to come close to or right it has none of the mathematical properties it's not0:11:35
associative it's not commutative it's not symmetric right actually better than left right thingy would be sinister dexter right because at least0:11:44
you'd have some sense of how it treats left it's quite poorly so you know I have a reputation for bashing type0:11:53
systems and I am not I'm bashing Maybe and Either okay but you know other type systems have other answers to the same0:12:03
questions right here's Kotlin Kotlin has nullable and non-null types right so if you say String it's assignable from0:12:13
String that's pretty good but if you try to assign null to it it says compilation error so they've strengthened the reference types in Kotlin they've said0:12:23
you know what null is not an okay value of all reference types even though Java JVM allows you to have a null as the value of String we're not going to allow it in0:12:32
the surface language of Kotlin even though it compiles to bytecode but you can have String question mark and question mark is the way you you know add nullability to a type and it0:12:43
creates a proper union all the strings and null as a type right because types are sets so it's all the strings that set0:12:53
and one more thing and then it's assignable from both right you can assign it from a String you can assign it from null if you made the same changes I just described in Kotlin0:13:03
you would not break the callers subject to how Kotlin links and I don't know how Kotlin links Dotty the successor to Scala that the team is working on0:13:16
has union types in their plan and it says of union types union types are the dual of intersection types values of0:13:25
type A I'm gonna say or because I think it matches values of type A or B are all values of type A and all values of type0:13:34
B that's it it's set union right or is commutative A or B is the same type as B or A I think this0:13:45
is awesome I have never used types somewhere where I haven't desperately wanted this so it can be different do not get lectured to by people about Maybe and0:13:54
Either they are not the best answers in type system world so let's get to the harder problems right first of all well0:14:03
actually let's talk about Clojure's versions of those things obviously we're dynamically typed so we don't get into the are you doing the right thing game until we add spec right0:14:14
but once we add spec we're exactly in the same place we're trying to enforce in testing right the same kinds of0:14:23
things are you making sure you're dealing with what you're what you expect are people passing you what you expect are you returning what they expect you know you're providing and requiring and0:14:33
so we have spec nilable which is an analogy to the Kotlin nullable and we have spec or which is again just straight or of course our types are just0:14:43
sort of predicative sets right you have a predicate things that satisfy that predicate you know constitute a set and or is unions of those sets and it0:14:54
has all the same properties you want for or that's why we're allowed to call it or so let's talk about the hard problem0:15:04
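A small sketch of what "all the same properties you want for or" can look like, using Python's `typing.Union` as a stand-in for a real union type and a predicate combinator loosely in the spirit of spec's nilable and or. The helper names `either`, `is_str` and `is_nil` are assumptions made for this example and are not part of clojure.spec.

```python
from typing import Optional, Union

# Type-level view: a true union ignores order and nesting.
commutative = Union[int, str] == Union[str, int]
associative = Union[int, Union[str, bytes]] == Union[Union[int, str], bytes]
nullable_is_union = Optional[str] == Union[str, None]


# Predicate view: a predicate describes a set, and "or" is the union of sets.
def either(*preds):
    """Return a predicate that holds when any of the given predicates holds."""
    return lambda v: any(p(v) for p in preds)


def is_str(v):
    return isinstance(v, str)


def is_nil(v):
    return v is None


nilable_str = either(is_str, is_nil)
nilable_str_flipped = either(is_nil, is_str)

samples = ["abc", None, 42]
print(commutative, associative, nullable_is_union)
print([nilable_str(v) for v in samples])
print([nilable_str_flipped(v) for v in samples])  # same answers in either order
```

Unlike a left/right wrapper, neither form cares which alternative is written first, which is the property the talk says an "or" should have.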
the hard problem is this partial information problem right so here we're talking about providing or supplying aggregates so in Clojure we would be0:15:14
talking about sending around maps okay in object-oriented languages you'd be talking about sending around objects0:15:23
instances of classes you might have a language that has record types it could be that or it could be you know Haskell style types of course0:15:37
you know of course we have our definition of aggregate and the thing that's cool more secrets of giving talks is that it seems like I know all this0:15:47
Latin stuff but what happens is I look it up and I see this great definition I'm like oh my goodness I mean we we've known it all along0:15:56
like our languages embed essential concepts and so when I looked up aggregate I discovered that gregare0:16:05
which is the same root as gregarious it means flock or herd and flocks or herds mean animals that travel together this0:16:16
is a beautiful notion it's exactly the right notion I need for this talk right which is that we're trying to talk about information flow in programs and we're0:16:26
trying to say we're creating these sort of ad hoc willy-nilly you know aggregations for the purpose of a particular communication we're0:16:37
gathering a set of fields right sets of information things we know and we're passing them around that's going to travel together so the notion of0:16:47
aggregates I think is super important and the notion of an aggregate being a herd is really beautiful so we want to stick to that even no matter how you0:16:56
make aggregations in your programs you're doing the same thing you're trying to name your herds right your flocks so now we get to sort of a0:17:07
fundamental difference right in how you model this right it's sets versus slots in Clojure we use0:17:17
maps right that's fundamentally sets of keys and the things to which they're associated in languages with0:17:28
records and whatnot you're dealing with slots of course you can already tell which one is better0:17:39
so let's talk about maps I think I find it really interesting because you know people look at maps and our use of maps and they're like this is just you being lazy bla bla bla bla bla bla but you0:17:51
know what the thing is Russ Olsen just gave a talk and he was trying to talk about functional programming to people who you know were just trying it he talked about mathematical functions and the fact0:18:01
that they're essentially mappings right but they're essentially abstract and in programming we only get mappings via code that is not true actually we0:18:13
have an even more primitive way to get from a mapping of one set to another and it is the literal map it is saying if0:18:23
you give me this I will give you that if you give me this other thing I'll give you this other thing if you give me this third thing I'll give you this third thing I'm saying specifically0:18:35
declaratively with no executable code no functions being run nothing a definition of a function a mathematical function0:18:45
right a mapping between a set and another set it's a concrete thing it is the best function in programming because0:18:54
it's the easiest one to understand right it should be a function it should be something that you can call right if0:19:03
it's a function and we can call the keys right we do this all day long right maps0:19:12
are the most fundamental functions in programming they should not be denigrated they should be exalted right this is the first place to start this is0:19:21
the simplest thing that you can do there's no code associated with it there's no categorical statements that need to be made about it it's not like0:19:30
something of this mapping to something of that and the binding between to carry this is like no it's a literal it's this set and that's it right an enumerated0:19:39
set is the simplest possible thing a categoric set or predicative set is a bigger you know notion so we can directly write these and we can directly0:19:50
and everyone knows who works in Clojure the feeling of this this is a big deal all right0:19:59
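A short sketch of the "literal map is a function" idea, in Python for illustration. In Clojure a map can be called directly; in Python the closest equivalent is a dict's bound `get` method, which is an assumption of this analogy rather than anything shown in the talk. The country data and the name `capital_by_code` are made up.

```python
# A mapping written down directly: no executable logic, just pairs.
capital_of = {"France": "Paris", "Japan": "Tokyo", "Kenya": "Nairobi"}


# The same mapping expressed as code, for comparison.
def capital_by_code(country):
    if country == "France":
        return "Paris"
    if country == "Japan":
        return "Tokyo"
    if country == "Kenya":
        return "Nairobi"
    return None


# The literal map can be passed anywhere a one-argument function is expected.
lookup = capital_of.get
queries = ["Japan", "Kenya", "Atlantis"]

print(list(map(lookup, queries)))           # missing keys come back as None
print(list(map(capital_by_code, queries)))  # the coded version agrees
```

The literal form is the easier of the two to read, check and send over a wire, because the whole function is its enumerated set of pairs.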
records fields product types the stuff you did before you did maps right I will contend even if they're immutable0:20:09
this is still place oriented programming there's a place for the name there's a place for the address there's a place for the other thing right this is not a0:20:21
function anymore and in general because even when the fields are named and sometimes they're not if you just have raw product types you've got no names but even when the0:20:31
fields are named so for instance a Java class you've got names for your your fields they're still not first class you can't you can't say given this object0:20:41
and this name give me the thing but you obviously can use Java reflection and make six function calls to get the same effect but it's not an0:20:50
invocable entity so they're not functions you don't get to use your information as a functional mapping and a straight0:21:00
product type just completely complects the meaning of things with their position in a list now we know Haskell has a record syntax and I'm going to show that so I'm0:21:10
not trying to say they don't have a way to put names on these things but the fact is you have to know the second string is different from the first0:21:19
string because I don't know what signing the types so this is place oriented programming and and and it matters right0:21:31
because what is the challenge of having a place there always has to be something in the place right now you know this0:21:44
visit there's this big difference between having places and therefore spaces and not but and of course this is another thing we have to you know be0:21:55
defensive about you know at least these records classes whatever they enumerate what's possible we're passing maps around it's the Wild0:22:05
West right it could be anything how do you know what it is all I'm gonna do is debug this thing forever maps are too open there's no guidance0:22:15
there's no delimiting thing there's nothing that enumerates the possible herd but of course that's true until you add spec right the idea is that spec is0:22:25
an orthogonal way to add that kind of communication expression validation testing around statements you would make0:22:35
about your aggregates and then we have similar kind of stuff right of course it is not the same at all right what we have is RDF style independent reusable0:22:47
attributes right especially when they're namespaced keywords and we connect them to their range specifications and0:22:58
they're there all by themselves until we go in a second step and we aggregate0:23:07
them when we say let's take a set of those and name that and that's our little herd or flock we're gonna group0:23:16
some of them together and I would like to call those aggregation schemas right they sort of imply a shape and we'll0:23:26
talk a little bit more about that shape not just being always a list in a second so that's how we we can say this is information about cars that travels0:23:36
together right car is the spec car names the spec which is a keys spec which means that it sort of describes the keys0:23:45
that can be present in a map and that's make model and year okay so this gives us the same kind of ability to say there's a name for the kind of herd and0:23:54
there are names for the things that could be part of the information that travels together so we're sort of drawing a circle around a particular kind of shape or we're drawing a shape0:24:03
around the particular set of information so now we are at the core question what do we do when some of the stuff0:24:15
can be missing in a particular usage context right well we sorta talked about this before if we're dealing with maps in Clojure what do we do if I don't0:24:26
have the street address for some users what am I gonna do I'm just gonna leave the key out just leave it0:24:35
out of the set and there's a tremendous benefit from that because the thing is that in addition to being functions the maps being functions maps are also self0:24:45
descriptive right you can call keys on a map unlike a function right if you want to know what mapping does a function make between x and y the categoric0:24:54
descriptions of it takes your string and returns a string and it actually doesn't really help you understand if I gave you this string what string would I get categoric descriptions don't really tell0:25:04
you what's happening in the function but maps as functions you can do that you can say exactly what things can you take and keys tells you exactly what things0:25:14
can you return vals tells you so this enumerability is super important which is why you don't want junk empty keys in your maps you want to0:25:24
leave it out that way the map can tell you I do not know the last thing or the address I don't know that the maps know0:25:34
what they know that sheep is missing today well it was sick it stayed in the barn not out in the field I don't have to worry0:25:43
about it I'm not like where is you know Fred Sheepy you know just not present it doesn't help me now I'm anxious right should I have Fred Sheepy0:25:53
what about slots now you have a problem we have those boxes we saw the sheep in the boxes right if you have places you0:26:04
have to have something in the place right these languages pride themselves on like not having uninitialized memory right cuz in the old C days we could0:26:13
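A minimal sketch of "just leave the key out", using hypothetical users and attribute names that are not from the talk. It shows how `keys` and `contains?` let a map report what it knows, and how a placeholder entry muddies that.

```clojure
;; Hypothetical users; the attribute names and values are illustrative only.
(def user-with-street    {:user/id 1 :user/first-name "Ada" :address/street "12 Elm St"})
(def user-without-street {:user/id 2 :user/first-name "Bo"})

;; keys enumerates exactly what the map knows.
(def known (set (keys user-without-street)))

;; contains? separates "not known" from any value, with no placeholder needed.
(def knows-street? (contains? user-without-street :address/street))

;; The alternative argued against here: a junk entry that claims the key is
;; present while saying nothing about it.
(def user-with-junk (assoc user-without-street :address/street nil))

(println known knows-street? (keys user-with-junk))
```

With the junk entry, `keys` reports a street that nobody actually knows, so the map can no longer be trusted to describe itself.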
just like you could just do nothing you know and have at it when you try to touch it you know it will definitely blow up spectacularly0:26:24
but in the era of like no uninitialized memory you have to have something to put there which means now you have to what are you going to put you have you know you have0:26:33
a couple of choices you have billion dollar mistakes right we say null or maybe sheep right I you know so that's0:26:42
the thing when you say maybe sheep you know that that's not really a thing so how do we know it's not a thing how do0:26:52
we get to it's not a thing well I do think that the RDF people who are information representation experts0:27:01
who've been working on that problem for a long time really have good ideas and I think their ideas about properties being0:27:10
independent and about making declarations about properties about their ranges that are independent of how you might ever put them together with other properties to form any kind of0:27:20
aggregate is a completely sound one right and when you do that you realize that you would never say maybe anything0:27:31
because when you're talking about something in isolation destined to be combined in myriad ways and many different aggregates to be part of many different herds who knows that it's0:27:42
may be that you might not need it or or will need it definitely will need it you can't decide then because you know this0:27:52
is a building block and that's how you know maybe is not a good idea maybe types and I don't care what they are they're not really a great idea0:28:02
especially maybe types now in slots because the thing is there's no such thing as a maybe thing right if names are strings names are always0:28:12
strings you either know the name or you don't know the name that's an orthogonal idea from what is a name a name is a string knowing a name is a0:28:21
different idea if type systems make you jam those two things together they're wrong because they are separate ideas we'd like to keep them separate right0:28:30
we're trying to use our programs to model the world and communicate with each other and when we communicate with each other you never say I got six maybe sheep in my truck never ever0:28:42
right nothing is inherently a maybe string so we don't want to do this and you know this is actually you know sort of usage guidance right we don't want to0:28:52
say that a thing is a nillable whatever because we don't know where it's going to be used we'd like that to be something that happens later and as part0:29:01
of the talk is to talk about this so let's talk about how we do this then right so want to contrast these two things so we have these ideas in both spaces if you were doing it in Haskell0:29:12
and this now shows the record syntax so because they do have names possible it's just an alley you know it's just sugar over that product type I showed0:29:21
you before but we have the same idea a make is a string we have a car has a make a model and a year and we're saying0:29:30
maybe it has a model and maybe it has a year and in spec we can say the same kind of thing we say keys and we say we require the make and that the model and0:29:39
year are optional does this word in your head it's like it's like the tell-tale0:29:49
heart when when when when when when we don't know the model and the year when0:29:58
don't we know the model and the year we don't give you the model and the year when don't we give you the model and the year we don't require the model when don't we require the model who knows0:30:08
when when does this say when this doesn't say when this says forever and ever and ever cars maybe have years and models no you know there's a lot of0:30:18
times when they do have years and models and there's other times when I don't care about the years or models or I only care about the make and the model but not the year right show me everything0:30:28
you know about Mustang Ford Mustangs so this is a mistake it's a mistake to put optionality in0:30:39
aggregate definitions right there's no usage context at least when you look at function arguments and returns you're in0:30:48
a usage context you're saying of function foo it requires these arguments and it provides this return value there's like a baked in context in the0:30:58
fact that you're talking about foo's arguments and foo's return the context is when calling foo this is required and that's not making an aggregate0:31:07
definition that you're going to use all over the place it may be an argument sometimes it may be return sometimes it may be arguments to five different functions that do different jobs it's0:31:18
the wrong place for optionality and I made the same mistake right I just showed you this right this is not better0:31:28
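For reference, this is roughly what that looks like in today's `clojure.spec.alpha`. The talk says a make is a string and that make is required while model and year are optional; the predicates chosen for model and year below are assumptions.

```clojure
(require '[clojure.spec.alpha :as s])

;; Attribute specs. string? for ::make follows the talk; the other two
;; predicates are assumptions made for this sketch.
(s/def ::make string?)
(s/def ::model string?)
(s/def ::year int?)

;; The aggregate definition bakes in one answer to "what is required?"
;; for every context that will ever use ::car.
(s/def ::car (s/keys :req [::make] :opt [::model ::year]))

(def make-only-valid?  (s/valid? ::car {::make "Ford"}))
(def model-only-valid? (s/valid? ::car {::model "Mustang"}))

(println make-only-valid? model-only-valid?)
```

A function that genuinely needs the model, or one that does not care about the make, cannot say so with `::car`; it would need a second aggregate that restates the same attributes.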
this is the same problem this is not Clojure being better than Haskell or Kotlin or Scala when you put maybe in the definition of a structural record0:31:38
this is the same problem this is there's no context here and optionality is context dependent so you know I know people have been0:31:48
wondering well yeah we're gonna be finished or whatever it's gonna be finished you know when I figured this out right it's the last year last year I0:31:58
had a pang and I I saw this I have seen some people using it I'd done more thinking about it and I realized that this was not right and I spent the last0:32:08
year in addition to other stuff like you know picking out this scarf thinking about optionality and how it should work0:32:19
and in particular I you know I saw a lot of people struggling trying to use spec and you know when I talk about some of the areas in which things could be0:32:29
better I think you'll all recognize how maybe they've been hard because a lot of times you just don't you know I gave it to you and it looked good and it is good0:32:39
I'm not saying that's bad but it could be better especially right in this area so what do we want so it's easy to say0:32:48
that things are wrong what would be right well what we want is to maximize schema reuse we want to maximize the0:32:57
reusability of the idea of a herd of information that represents a car right that's the kinds of stuff we might think0:33:07
are interesting about cars include these things and we're going to give that a name that helps us communicate it can help us validate things that can help us check for errors0:33:17
even before we get to optional or required-ness there's a lot we can do with that notion right the other thing that we want is we want to make sure that we0:33:29
don't have a proliferation of different schemas just because the contexts are different my car for passing to foo my car0:33:38
for passing to bar my car that gets returned by baz we don't want that what happens when you do that well first of all besides having a proliferation of0:33:47
types which how many people have worked in typed languages and had a proliferation of types yeah I mean it's like what happens right the problem with0:33:59
that is that's not really helping you those names don't help you and they drive down the reusability of your consuming code I write some code that you know deals with my cars0:34:09
well my code deals with my cars and your code deals with your cars we don't have any code that deals with cars because we all had to make separate cars because we all had to make a car that0:34:19
had a different smashing of optionality for use in a different context so we don't want that so we want to maximize the reuse of the idea of car or other0:34:28
schemas other shapes the shape is sort of a generic idea it is not yet instantiated right a schema is a form for a model it's not the thing right0:34:40
it's sort of like the outline of a thing a form you know that's what schema means we want to support a whole bunch of situations and these are the situations0:34:50
I think people have encountered and see if you recognize them in trying to apply spec for instance there are many kinds0:35:00
of APIs especially wire protocols and things like that that have symmetric request/response specifications from a0:35:09
schema standpoint give me a partially filled in form I will give you back a more filled in form right that's quite a common thing right0:35:20
but with spec if you had to say well the thing I require is you must give me the ID and database something context and what I provide will0:35:29
definitely include the names and phone numbers but maybe not these other things they became they were forced to become two different specs right one was0:35:38
suspect for the what you need what's required and the other was suspect for what was provided and everybody wanted to reuse the specs across those things0:35:47
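A sketch of that situation with `clojure.spec.alpha` as it exists today. The attribute names, predicates, and the `lookup-user` function are hypothetical; the talk only mentions an ID going in and names and phone numbers coming back.

```clojure
(require '[clojure.spec.alpha :as s])

;; Shared attributes (names and predicates are assumptions for this sketch).
(s/def :user/id int?)
(s/def :user/name string?)
(s/def :user/phone string?)

;; Because s/keys fixes requiredness inside the aggregate, "what I require"
;; and "what I provide" end up as two aggregates over the same attributes.
(s/def ::lookup-request  (s/keys :req [:user/id]))
(s/def ::lookup-response (s/keys :req [:user/id :user/name :user/phone]))

;; A stand-in for the service: it takes the partially filled-in form and
;; returns a more filled-in one. The values are made up.
(defn lookup-user [request]
  (merge request {:user/name "Ada" :user/phone "555-0100"}))

(def request {:user/id 1})
(def response (lookup-user request))

(println (s/valid? ::lookup-request request)
         (s/valid? ::lookup-response response))
```

Nothing in the two definitions records that they describe the same form at two stages, which is the reuse being asked for.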
and they wrote really goofy predicates inside to try to reuse some stuff because what the other problem with not0:35:56
being able to reuse it's a recipe for error if you have to define car and I have to define car well maybe you'll0:36:05
call it make and model and maybe I'll call it brand and model and now we've got no connection where we absolutely0:36:16
should have had a connection because we've had to restate the same ideas so that's a context another context which is quite common is a pipeline of0:36:26
information building right so you think about like ring you know request chains and things like that where each handler can sort of adorn the request with more0:36:35
information or to fill out default information things like that right we have a bunch of handlers that work that way well what would the spec be for each stage again it's sort of like an0:36:46
explosion what we want is the overall name for the idea of this herd coming through but you know the herd may start small and then you know I walk0:36:55
by with my sheep I add it to the herd and you add your sheep to the herd and we're acquiring information right acquiring information should not be hard and we were doing it already in Clojure0:37:05
and Clojure programs are actually really good at both of these things but spec wasn't as good as Clojure was in allowing you to talk orthogonally about0:37:15
the information set the information schema and the actual requirements and provisions of for instance stages in a0:37:24
in a pipeline how many people have felt tension applying spec in these kinds of contexts yeah so eventually you know if you get to become you're doing more0:37:33
you'll feel this more it's even harder than this right because the thing is schemas nest right you can0:37:47
have a schema that is an aggregate and one or more of the things in the aggregate are themselves aggregates and this is where you truly realize that0:37:56
putting optionality in aggregates is impossibly wrong right because essentially a schema means shape if I give you a schema that0:38:07
says a b c d and c and d are themselves x y z and foo bar baz what is the shape it's not a four thing0:38:18
vector is it the shape described by that schema which has pointers to other aggregates describes a tree the shape of the thing0:38:29
is actually a tree and the thing you get passed will be a tree and the thing you return will be a tree it's deep and it0:38:40
means the optionality specs should be deep right because you can't talk you can't talk about a tree only by putting0:38:49
you know annotations on the root there's no place for it right if I said C has0:38:58
XYZ and you need X where are you going to put that in a definition of of the top you can't so we want it to be we0:39:08
want it to be deep we want this to be deep so you know like all design things this is just what was wrong two things were combined that shouldn't have0:39:17
been combined and how do you fix it you take them apart the rest this is all this is all it is the whole thing is0:39:26
this you got a dictionary and the idea of like taking things apart and you're done0:39:35
so we had the talking about forms right is schema right just the overall shape and then talking about subsets of the0:39:44
schema subsets of the shape in context is selection what things are we gonna pick as being required or as being0:39:54
provided right and we do selections in contexts and that gives us this orthogonality and two things we can combine so let's look at how we would do0:40:03
this all right so we have the schema this is shapes only this is you know pseudo future code and the idea here is0:40:15
that this doesn't apply or optional at all it's not what it's talking about it's only talking about in this herd we can have sheep and we don't have helicopters you know that's the idea0:40:27
we're just talking about that so we can have an idea of an address that has a street a city a state and a zip I'm not advocating any of these things as canonic whatever and I know zip codes0:40:37
are hard and blah blah blah so we say you know street describes its range city describes its range etc etc state has you know an arbitrary0:40:46
predicate state codes you know there are thing zip code could be its own function right and then address is a schema that says you could have streets or cities or0:40:58
states or zips and addresses that's all it says and that is a useful thing to be able to say and to name and then we have0:41:07
a user you know we have a user in our system people make systems with users all the time and so a a user has an ID a0:41:16
first name and last name no can have an ID a first name and last name and an address all right so we're going to0:41:26
define new attributes for ID first name and last name and then we're going to say user could have ID first name last name or address which was the the other0:41:35
aggregate so this describes a little tree now we have some imaginary usage context right so maybe we're building a0:41:44
system we have users earnest and our system can let you get movie times and it lets you buy popcorn so get0:41:56
you movie times in order to give you the movie times I need I want to see your user ID and your zip code that's all I need I'm gonna use that I'm gonna go0:42:05
find this stuff so I want a user to be passed but all I need to know about it are these two things now the user ID is up high in the root definition of user but0:42:16
the zip code is an attribute of the address of a user inside a nested aggregate further down the tree what0:42:25
about placing orders placing an order you know I want to see your first and last name and I'm going to ship it to you so I need your whole address these are both functions of users that have0:42:35
different requirements in different contexts these are the kinds of things we want to model the important thing is that there's no way there's no0:42:44
optionality spec at the top level that can represent saying these things just you can't say it which means nobody can0:42:53
say it we just had a graphic you all talk guess who can't say it yeah but you'll0:43:03
be the first to feel of saying this would be awesome so what how will this work well and again this is this is like not syntax yet but imagine you could say0:43:15
that that this spec for user will be this selection it'll say from from the herd user from the shape the schema user0:43:25
I'm interested and I must have the ID and the address and of the address I need the zip code right and then to0:43:36
place an order we're saying again I'm interested in user information here this is what I'm expecting to see and I need the first and last name and the address0:43:45
and from the address I need the whole thing Street city state and zip right this select notion is a is a deep requirements thing0:43:55
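The idea of a deep selection can be sketched in plain Clojure. This is not spec's API: the selection format, the `satisfies-selection?` function, and the key names are all invented here to illustrate requirements that reach down a tree.

```clojure
;; Hypothetical selection format, invented for this sketch:
;;   keyword entries are keys that must be present at this level;
;;   map entries give a nested selection that applies to a key's value
;;   only when that key is present.
(defn satisfies-selection?
  [m selection]
  (and (map? m)
       (every?
         (fn [entry]
           (if (map? entry)
             (every? (fn [[k sub]]
                       (or (not (contains? m k))
                           (satisfies-selection? (get m k) sub)))
                     entry)
             (contains? m entry)))
         selection)))

;; The two usage contexts from the talk, over the same user shape.
(def movie-times-needs
  [:user/id :user/addr {:user/addr [:address/zip]}])

(def place-order-needs
  [:user/first :user/last :user/addr
   {:user/addr [:address/street :address/city :address/state :address/zip]}])

;; "Address is optional here, but if you send one, send all of it."
(def whole-address-if-present
  [{:user/addr [:address/street :address/city :address/state :address/zip]}])

;; Made-up user data.
(def user {:user/id 42 :user/addr {:address/zip "27701"}})

(println (satisfies-selection? user movie-times-needs)
         (satisfies-selection? user place-order-needs))
```

Listing `:user/addr` as a keyword and giving it a nested vector are separate statements, which is the distinction between requiring an attribute and stating the requirements of that attribute.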
if you've ever used the Datomic pull it smell like pizza right it's this it's a similar pizza it's a you know you need a0:44:05
language for talking about trees and in recursion and things like that so this separates requiring the attribute from the requirements of an attribute you saw0:44:15
address and then zip of address that seems like good I don't have to say that it's just like four more characters or you know like this is so hard but it is0:44:26
if they're different things because there are definitely contexts in which you say addresses are optional but if you give me an address you have to give me a whole address okay those are two0:44:36
different ideas I need an address or I don't need addresses but when you give me addresses I need this part those0:44:45
are two separate things so they're said separately in this in this model does that make sense okay and this allows you to spec into0:44:57
members of the collections right because that's what you need what you're actually accepting as an argument is a tree implied by the schema it's the0:45:06
whole tree the context specifications you need to supply have to be about the whole tree because0:45:16
otherwise how are you going to compose this if you need like different fractions of addresses in different contexts I mean think about the explosion right the combinatorial explosion of root things with different0:45:27
kinds of nested things so that the roots can have the right stuff like you just can't do this job on the aggregates themselves you have to be able to talk about the trees just it just took a long0:45:39
time to figure out the other thing that this will be able to do I'm not showing on the slide is to spec into members of collections so sometimes you'll say the0:45:49
spec for something will be I have I have friends and then friends is a collection of person and persons have0:45:59
whatever and so you want to be able to say of every friend you're telling me about for each friend I need this information so to be able to0:46:08
spec not only into nested schemas but also nested collections of things so you will be able to talk down into nested schemas as0:46:18
well as each member of a nested collection this is the kind of power you need to apply spec really everywhere0:46:27
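Reaching into each member of a nested collection can be illustrated the same way. The data, key names, and helper functions below are hypothetical and are plain Clojure, not spec.

```clojure
;; Hypothetical data and key names, invented for illustration.
(def person
  {:person/name "Ada"
   :person/friends [{:person/name "Bo" :person/phone "555-0101"}
                    {:person/name "Cy"}]})

;; True when the map m has every key in required.
(defn has-keys? [required m]
  (every? #(contains? m %) required))

;; "Of every friend you're telling me about, I need this information."
(defn every-friend-has? [required p]
  (every? #(has-keys? required %) (:person/friends p)))

(def names-ok?  (every-friend-has? [:person/name] person))
(def phones-ok? (every-friend-has? [:person/name :person/phone] person))

(println names-ok? phones-ok?)
```

The requirement is stated once and applied to each member, rather than being attached to the definition of a person.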
because obviously function data in and out is one thing and it gets pretty it gets pretty complex but you know0:46:36
people are you how many people use spec for like wire stuff and APIs and things like that I mean it's definitely intended to be used there that's part of the value proposition of it those kinds0:46:46
of things definitely need this kind of stuff so I'm I'm really happy about being able to go there what about this0:46:55
is this saying anything um that you're not forcing me to do anything like where where's the fun in that you know what's what are the points of0:47:05
types if you're not forcing something right well there there are two good reasons I mean so anyway this is going to be okay you're just saying what I expect to see0:47:16
is user information right and the thing also to remember about these selects this is just minimal requirements you0:47:27
can always have more stuff you can have way more stuff coming in there may be more stuff coming back and there may be stuff not in user right spec is an open0:47:39
system having more is okay I am NOT going to help you write closed brittle breaking systems I'm not going to do it0:47:49
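The openness is visible in `clojure.spec.alpha` today: `s/keys` states minimum requirements and accepts additional keys. The spec name and the helicopter key below are made up for the sketch.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :user/id int?)

;; A minimum requirement, not a boundary: a user map must carry an id.
(s/def ::user-info (s/keys :req [:user/id]))

;; Extra information rides along without failing validation.
(def with-extras-valid?
  (s/valid? ::user-info {:user/id 1 :helicopter/tail-number "N123"}))

;; Missing the required key still fails.
(def without-id-valid?
  (s/valid? ::user-info {:helicopter/tail-number "N123"}))

(println with-extras-valid? without-id-valid?)
```

A stage that only understands users can therefore pass the helicopter data along untouched to whatever stage does understand it.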
no matter how much you complain on Twitter it's not going to happen right it's just not gonna happen so this0:47:58
is like minimal requirements minimal provision it's not a boundary around things so saying this just says I have0:48:07
an expectation of seeing the user data stuff from the user flock I want to see sheep I don't want to see helicopters or I can't do anything with helicopters I'm0:48:17
expecting sheep you could send me a helicopter maybe my job is to pass it along to the the next thing which is gonna airlift the sheep to somebody else that's not what I do but they do it you0:48:27
know I I think that that's an important part of making flexible systems that you could flow information through things that just don't even know it's happening that's important right that's how0:48:38
transportation networks are built you can't not have that you can't have trucks that only hold certain kinds of things that run on roads that only hold0:48:47
trucks or certain kinds of things that's not the the world doesn't work that way so why would you say something like this well it helps you communicate right the user gets a sense of like what am I0:48:56
supposed to pass right or what will I get and it will help us with test generation right your function does expect to see user data I will generate test to give you user data and in this case you know various0:49:07
random subsets of anything that a user could you know is implied by user deeply down the tree should already do the trees and all that that work and that's0:49:17
another part of what why this makes sense right when we generate user stuff we don't generate roots only we generate trees right we go down into the nested0:49:27
specs and if they're collections we generate down into those this this lack of symmetry between selection and generation was you know that was it was0:49:38
a warning sign so is that like super tiny it doesn't0:49:47
really matter it's it's exactly what I was talking about before so you don't need to read the text in the in the boxes this is still the same users and addresses and whatnot but what I'm0:49:58
trying to show here is the split right we start with RDF style attribute you know and they map to RDF properties0:50:08
definitions that describe their own ranges they're just floating around waiting to be gathered up in herds and herded around in your programs and you0:50:19
can gather them up and of course that creates other attributes which point to the the gatherings right the the aggregates but we're going to call those0:50:28
schemas they still do not have any requirement provision subsetting and then finally we have selections which0:50:38
you'll tend to use only at the edges of usage contexts right it's unlikely although it will not be impossible for0:50:49
you to make you know named specs that point to selections but I will probably prohibit nested selections because then0:51:00
you're just back to the thing I just fixed yeah let you make that and do it to yourself but it would it would fall out of this being fully general that you0:51:10
could so we'll probably close that door and so yeah so now you have these separate ideas which are the way you think about things anyway and now you0:51:19
get to say this that you know say it the way you think about it and this is going to make systems a lot more reusable and0:51:29
extensible right that's part of the idea of spec is that you can make systems that you can change that you can enhance over time that is that is the game0:51:39
saying today you could do X or Y it's not enough every program changes every program grows you need the ability to0:51:48
talk about type like things in ways that are compatible with program evolution that's the idea behind spec so so this is coming this you know of0:51:59
all the things we were working on this one was least far along by by cons but this is the next thing coming in spec it0:52:08
will eventually replace keys but you can you know these are obviously two different names so there may be migration world where you know all three names exist we also have been working on0:52:19
better programmatic manipulation of specs if anybody's looking at alpha2 there's a pretty cool system I think for defining macros on top of multimethods0:52:30
which now gives us the sort of intermediate step that's program accessible that doesn't involve generating the shape of a macro form and0:52:39
eval it because I know a lot of people want to write programs that write specs that that has room to grow more but I mean the underpinnings are in that0:52:48
system also it's a cool system to make extensible macro libraries so have a look and other things I've been thinking0:52:57
about have been refining the function specs so I'm of course very wary right now about any other type system e Guk0:53:06
getting into spec and the next thing I was going to work on a year and a half ago in spec was you know trying to0:53:16
refine the idea of the return specs I know people are struggling to say it takes you know a collection of X and0:53:26
returns a collection of X this kind of thing you would say with parameterised types right right the amazing type you know type signature0:53:38
for reverse you know it takes a list of a and returns a list of a and the problem is when a is predicative that's0:53:48
harder to say but this is a bigger problem it's pointless to say that that's not something you want to say that reverse takes a list of a and0:53:58
returns a list of a it doesn't communicate anything about what reverse does if if I asked you what reverse did and you told me that I would not be happy right0:54:08
if you need to implement reverse and I told you that you would not be happy right because it doesn't communicate anything what do you want to say about0:54:17
reverse at least you want to say it reverses the list it was given so if that list was all of strings what could0:54:27
you possibly derive using like the most basic logic about the return of what you said was the stuff that was in the collection that came in well you'd know0:54:37
if that was all strings that it would return all strings the categoric declaration of that is almost information free you almost always want0:54:47
your return specifications to be dependent on your arguments in other words the fn specs the fn specs are the real deal because you can derive the0:55:00
trivialities from that but it also means that you don't need something like parameterization to say I take a collection that's you know I don't care0:55:09
what it is but it will satisfy some set of predicates if I could say I returned that same stuff or a subset of the stuff that you gave me you would know those same predicates still applied right you could0:55:20
use logic to do that you wouldn't need some icky category language to talk about return types because it doesn't it0:55:29
doesn't really say what's happening at all right the fact that you returned the same stuff or a subset of the stuff says way more and of course then you0:55:38
could do more with spec you could start talking about like what reverse actually does what are the properties of the reversed thing compared to the incoming thing you know what did reverse do which is what fn specs allow you to say0:55:48
so I am I'm starting to smell rat in fn specs but I want to make it concise to0:56:00
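In current `clojure.spec.alpha`, a return spec that depends on the arguments is written with the `:fn` key of `s/fdef`. The sketch below states only the partial property discussed here, that the result holds the same stuff as the input; `my-reverse` and `same-elements?` are hypothetical names.

```clojure
(require '[clojure.spec.alpha :as s])

;; A stand-in implementation: conj onto a list prepends, so this reverses.
(defn my-reverse [coll]
  (into () coll))

;; The :fn predicate receives the conformed :args and :ret together, so it
;; can relate the output to the input.
(defn same-elements? [{:keys [args ret]}]
  (= (frequencies (:coll args)) (frequencies ret)))

(s/fdef my-reverse
  :args (s/cat :coll (s/coll-of any?))
  :ret  (s/coll-of any?)
  :fn   same-elements?)

;; Calling the predicate by hand with the shape spec would pass to it.
(def property-holds?
  (same-elements? {:args {:coll [1 2 3]} :ret (my-reverse [1 2 3])}))

(println (my-reverse [1 2 3]) property-holds?)
```

From "same elements" a reader can derive that a collection of strings comes back as a collection of strings, without any parameterized type. Saying that the order is actually reversed would need a stronger predicate.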
sort of do something without having to fully define your fun spec because sometimes that's a challenging thing to0:56:09
do right but the fact is if you could just say returns the same stuff from the collection it was given you'd just be saying more than type systems let you0:56:18
say and if you can't say everything about the nature of your algorithm and all the transformations it's okay you're still adding value you're still adding rigor to your system and you're still0:56:28
helping people understand what it does maybe it's a combination of a partial specification of the result and documentation that helps them totally0:56:37
put it together which is another thing I would just sort of say generally about spec is there's often a desire to like0:56:46
completely nail everything down that's not necessary in a lot of cases there's a spectrum of what you can communicate what's straightforward to communicate0:56:55
and what isn't and all along that spectrum pretty much after the very first step you're saying more than type systems ever let people say and you're0:57:05
letting things be tested in an automatic way more than you were ever getting so don't go crazy if you can't completely spec the entire nature of your inner0:57:15
algorithm because sometimes it's challenging other things about making return types talk about the inputs is that a lot of people in spec are0:57:25
struggling with talking about functions that rely on external state reifying external state as an additional input0:57:34
which is what it is is another thing that I've been thinking about so that's really future-thinking kind of stuff but0:57:43
the important thing is I have been working on spec new things are coming they're going to make spec better we are extremely sensitive to breaking programs0:57:54
that use spec and making the transition of Clojure's use of the current spec to the next spec straightforward so like we're thinking about those things0:58:03
and we're working on it and that's it you [Applause]0:58:20
you