Picture Me Coding

Complexity Part 2: Out of the Tar Pit

February 07, 2024 Erik Aker and Mike Mull Season 2 Episode 4
Picture Me Coding
Complexity Part 2: Out of the Tar Pit
Show Notes Transcript

This week Mike and Erik conclude their discussion on complexity with a review of the 2006 fan-favorite paper "Out of the Tar Pit."

[MUSIC PLAYING] Hello, welcome to "Picture Recoding" with Eric Aker and Mike Moll.

Hi, Mike.

How's your life?

It's doing OK.

It's a very nice day in San Diego, even by San Diego standards.

It's beautiful outside.

We last week talked about complexity.

And this week, we are going to take up part two of that conversation, where we focus pretty much entirely on the paper out of the tar pit.

But as my soundtrack for this week, I've been listening extensively to this album called "Fish Bowl" by Kate Davis.

Have you heard this one, Mike?

I have.

I've listened to it a number of times after you passed it over to me.

It's kind of a weird one.

It feels like "Indie Rock," but a little odd.

These songs that keep surprising me.

They start with these familiar chunks, and then they go up these odd staircases around weird corners or something.

I really like this album a lot.

I feel like, oh, this is familiar to me.

Chunks of familiarity.

And then wait, wait a minute.

OK, it's a little weird.

Yeah, I like it too.

I think your description is good.

Sort of slightly off kilter "Indie Rock."

Yeah, off kilter.

It's great.

I'm going to keep listening to it.

I really am enjoying it.

What have you been listening to?

There's been so much interesting stuff recently that I think I'm going to do too this week.

One is a hardcore album from 2023 by a Ukrainian band called "Death Pill."

The album is self-titled, also "Death Pill."

It's only about 23 minutes long.

Very hard.

It's an all-female band.

They seem to be pretty pissed off, which I guess is not too surprising when your country is under attack.

I've heard people describe it as "riot girl," but I think that intercells it a little bit.

It's pretty metal.

But anyway, I like it a lot.

It's very listenable, but also pretty angry.

The other one I've been listening to is an album called "LA Shit" by a group called Gracie Horse.

I think it's just one person, but I think I find it on Bandcamp first.

It's kind of country rock, some really appealing, interesting lyrics.

There's this really good song on the album called "What Am I Missing?"

There's this line in there about at the Motel 6 eating dive bar wings in a hazmat suit, which I just love that image so much.

That's pretty good image, yeah.

Anyway, it's not 100% great.

There's a couple of skippers here, but I recommend it.

It's got some really good tunes on it and some really fun lyrics.

So "Death Pill" and "Gracie Horse," that's what my homework is for the weekend.

That's right, yeah.

So, Mike, the paper out of the tar pit, the first time I ever heard about this paper, you told me about it.

I don't remember what our initial conversation was, but I probably read it 2014, 2015, somewhere in there.

The paper came out in 2006.

When I read the paper, I remember feeling like, "Yes, right on.

These guys are right.

Complexity, it's a huge deal in what we're doing.

Our work is a fight against complexity, and they've identified the sources, and they even have suggestions for how to solve it."

It was an influential paper for me.

I just wanted to recap what we talked about last week.

In the first part about complexity, we gave a few different definitions of complexity.

We said, "Complexity may be a human limitation.

It might be what Brooks is talking about, accidental and essential complexity.

Their essential nature of software is some complexity."

But then there's some accidental stuff too.

It might be what Simon talks about, where complexity is often organized in a hierarchical and redundant way, where there are self-repeating patterns inside.

At the beginning, and we didn't really carry this through, I gave the example of the Linux kernel.

That's a complex piece of software.

It's got 8 million lines of code.

We can even say from these definitions, it's large.

A human can't hold the whole thing in his or her mind.

That makes it complex.

Does it have accidental and essential properties, according to what Brooks would say, like required properties and things that are maybe not required?

One thing I thought I could ask you about is the Linux scheduler, because you looked at that code and you were interested in the scheduler.

The scheduler itself is a piece of the kernel and pretty complex.

Finally, also, we could say the Linux kernel is organized, it's hierarchical and redundant.

I thought instead of talking about all of Linux or maybe even trying to understand Linux and how to even operate Linux, maybe we just talk about something like the scheduler.

From your memory, is the Linux kernel scheduler, is there a lot of code?

There's a lot of code, although I suspect it's a relatively small percentage of the kernel overall.

You have to read through some parts that are not specifically scheduling to get the whole gist of it.

Maybe those are the accidental complexity parts.

Partially, yeah.

I think there's also another maybe accidental part is that there are multiple implementations of the kernel that you can turn on with various compiler switches.

When you're reading it, you have to make sure that you're actually reading the relevant part.

There's a lot of stuff in there that is not directly relevant to the functionality of the scheduler.

Can you give one example?

Well, there's things in there.

For instance, there are these, I forget exactly what the macro is, but there's a macro that they use on certain conditionals when they know that that conditional is going to be invoked relatively infrequently.

If you have an if block and you know that if block only gets, that section of code only gets executed in fairly rare circumstances, there's some compiler macros that you can use to tell the compiler, "Hey, this chunk doesn't really get executed very frequently."

I think it's sort of a hint to the predictive branching in the process.

That's a performance optimization.

That's accidental complexity.

Yeah, I would say so.

What's essential complexity in there?

What do you absolutely have to do to successfully achieve this piece of software?

Well, I think the basic idea of scheduling is in there.

You have to look at all the things that are in a potentially runnable state.

You have to find the things which need CPU.

You have to cycle through those things in such a way that giving things a reasonable shot at the available CPUs.

All that functionality that's sort of fundamental to scheduling is in that code.

It's kind of hard to argue what's essential and what's not essential since there have been so many implementations of the scheduler now that it's hard to say what is sort of the essential element of a scheduler.

I think we all understand what scheduling is.

There's got to be stuff in there that takes processes that can be run and puts them on processors.

Mm-hmm.

The interesting thing about your answer is that you brought up two of the three sources of complexity that I talk about in Out of the Tar Pit.

The first one you said, and it's their prime source of complexity, we'll talk about this a little more in a second, is state.

There's a lot of state to manage.

You talked about control flow.

Do I want to schedule things that, I only want to schedule things that are runnable.

There's some sort of branching there.

I need to do this and then do that or wait for a thing and then do a thing.

The third is code volume, but we had previously mentioned that.

It is a lot of code.

All right.

Let's talk about Out of the Tar Pit specifically if you're up for it.

In 2006, published by Bent Mosley and Peter Marks, they were inspired by Fred Brooks, which we talked about last week.

Fred Brooks is no silver bullet.

Dykstra and Tony Hoare who have also talked about this topic in various ways.

The papers were cited 41 times.

When I looked at the citations, I thought the most notable one here is Rich Hickey when he published his paper, The History of Closure.

Closure actually, so this paper Out of the Tar Pit published in 2006.

Closure started in 2005 and ultimately released in 2007.

It's kind of concurrent with this paper.

Looking at the citations though for this paper, I have to conclude it looks like a paper that's had a lot stronger influence on industry and people like us than on academics and academia.

It doesn't look like a paper that had a huge influential impact in academic research or programming language research or anything.

I would agree.

I've seen this paper referenced numerous times on Reddit and blog posts and typically things that are being written by engineers.

I have to.

As you say, I don't think it's had a very significant effect on the academic world in terms of programming language development or software architecture.

I agree.

It seems like it's referenced by people like us.

I want to read a section of the abstract here because it does condense what's going on in the paper.

Complexity is a single major difficulty in the successful development of large scale software systems.

Following Brooks, we distinguish accidental from essential complexity, but disagree with his premise that most complexity remaining in contemporary systems is essential.

We identify common causes of complexity and discuss general approaches which can be taken to eliminate them where they are accidental in nature.

To make things more concrete, we then give an outline for a potential complexity minimizing approach based on functional programming and CODS relational model of data.

There's really two halves of this paper.

One is talking about complexity, where it comes from, what are the sources, why does it behoove us to address it?

The second is what they propose to do about it.

Complexity they argue is the number one thing we want to address.

This comes from early on in the paper.

They say it's the root cause of the vast majority of problems with software today.

Unreliability, late delivery, lack of security, often even poor performance in large scale systems can all be seen as deriving ultimately from unmanageable complexity.

The primary status of complexity is the major cause of these other problems comes simply from the fact that being able to understand a system is a prerequisite for avoiding all of them.

And of course, it is this which complexity destroys.

So complexity destroys our ability to understand a system and therefore it hinders our ability to have reliable software, efficient delivery of software and secure software.

I would agree with all of those things.

Hard not to agree with that, right?

But here's the first question I have for you.

They're talking about understanding.

So already this sounds slightly different from what Brooks is talking about.

And they do say in the abstract they disagree with Brooks, but Brooks is saying there's essential complexity in software.

And what Mark supposedly you're saying is software is hard to understand, but that's our fault most of the time.

They're grounding it as like a human limitation.

Yeah, I think they sort of differ on what they mean by essential.

It's fuzzy in both cases, I guess, both Erin and those silver bullet paper.

But I think that they make a comment in here is probably later on about how if the subject matter expert who's defining the requirements doesn't know about something in your program, it's probably not essential.

And I am not entirely sure I agree with that.

I mean, so for instance, you know, if you have a very large data set and to make it efficient to access that data set, you implement some sort of index or a be-tree or something like that.

Your subject matter expert clearly doesn't know about that thing, but maybe the program can't operate without it.

And so is that thing essential or not?

I think in their view, it would not be.

Yeah, and they actually have this even pithy statement, which is nothing that the user does not know about can be considered essential.

So if you have a user of software, only what they directly know about in their usage of the software is essential.

Anything else is incidental to the implementation of what you're trying to achieve for that user.

Right.

And I'm not sure that's entirely what Brooks was getting at.

I think that's a much more restrictive definition of essential than what Brooks was using.

Yes.

And they are saying we disagree with Fred Brooks, but I think you're exactly right.

Brooks is asserting that there is, he's like, when I read Brooks's paper, it feels to me like reading these philosophers from the 18th and 19th century.

There's an essence of complexity, like an amorphous stuff.

And if we could better realize the form, the platonic form of that stuff in the world, we could isolate it and avoid it.

Or not avoid essential, but we could just sort of corral it into a box.

And these guys are saying, nah, most complexities accidental.

Now let me continue from the beginning of this paper because they're saying what they're interested in really is large systems.

The type of complexity we are discussing in this paper is that which makes large systems hard to understand.

It causes us to expend huge resources in creating, maintaining such systems.

Okay, so we want it to be reliable.

Complexity gets in the way of that.

They're interested in large systems.

So how do we make large systems reliable and deliverable and secure?

And ultimately, early on in the paper, if you're wondering, well, okay, what are the sources of all this accidental complexity?

They drop in your lap, but I think is the equivalent of a hot take.

Do people still say hot take or is that a decade out of date?

As far as I know, people still say it, but I'm probably not the best resource for what's current.

So this is a flaming take.

It's your programming language that is the source of your accidental complexity and that is immediately clear why this is such a popular paper to cite for programmers.

So here's from the paper.

Another side, John Backus, who in his touring award lecture in 1977, he talked about the complexities and weaknesses of traditional languages.

A noted quote, "There is a desperate need for a powerful methodology to help us think about programs.

Conventional languages create unnecessary confusion in the way we think about programs."

This immediately sounds like vikinsign to me.

"Pilosophy is the bewitchment of our intelligence by means of language."

So if we have the right programming language, sorry, if we had the right programming language, we could immediately reduce all kinds of accidental complexity.

And I read this 10 years ago when I was doing a lot more Haskell and I was thinking, "Yes, these guys have it.

That's right."

And I read it now and I'm like, "Wait a minute."

So the problem with large systems is we're using the wrong programming languages?

This seems like they kind of claim that people on Hacker News would foam at the mouth about.

Yeah.

I'm sure this had a very, at some point in time, has had a very significant thread going on Hacker News.

But yeah, I mean, again, I think it's a point that maybe is hard to argue with, but also a point that is hard to prove.

You have to kind of go point by point to explain why you think that is and how it would be replaced, which I guess they attempt to do, but I'm not sure they do it successfully.

Let's go back to the scheduler.

This is the example I want to use.

It's written in C.

You described a macro which has some accidental complexity because it's just a hint to optimize the running of the code.

What if that Linux kernel scheduler were written in a different language?

What if it were written in Python?

It's a good question.

I think it would probably have some benefits, but also some liabilities.

You could probably sketch out the general algorithm a little bit more clearly in a language like Python or even maybe Rust.

Rust was my backup example.

It's a pretty different example, but keep going.

I think the problem there is the problem that you have in general with the Linux kernel, which is that there's so many very small scale optimizations that you make to make things really work the way they need to work.

You're dealing with a piece of software here that really has to deal with, in some cases, specific types of hardware architectures.

Doing it in a language that has access to that type of thing is essential without trying to overload that word.

I think this is the form of our critique of this paper.

When I read this and I really thought, "I should be using Haskell at work.

I should get more of my peers to be using Haskell," I really bought the argument of this paper.

It's not like I reject it whole-cloth now.

I guess I find myself a little more skeptical that we can cleanly divine accidental and essential and that we could put so much of the weight of accidental on the shoulders of the programming language we're using.

I remember reading this paper the first time and being inspired by the first parts of it.

As you said before, you start to read these statements that they make about complexity being the key problem and state being one of the issues that causes that complexity.

At the time, when you're reading that part of it, you're thinking, "Yes, I'm glad they're bringing up these points.

I'm really looking forward to see what their solution to these problems will be.

Okay, but let's not give away the ending.

The next section after the intro, they're talking about how we understand systems.

Again, I think it's worth pointing out, when we talked about complexity last week, understanding was a piece of the conversation.

It wasn't entirely a conversation about how humans understand complexity because we want to do a law that maybe complexity exists outside of human minds.

That's a tantalizing thought.

It says objective truth about ideas and information and trying to represent those things in code.

Out of the tarp, it really is about understanding through and through.

They immediately start talking about how we understand systems.

They say we have really two primary ways of understanding systems.

They say we use testing and we use informal reasoning.

All of these have limitations.

Testing notably not going to be a complete answer.

The famous line, it can show the presence of bugs but never their absence.

You can't test everything.

There's too many possible inputs to test everything.

They say, right out, informal reasoning is the preferred way to understand systems.

In order to get to that, you have to have simplicity.

Again, this is very human cognitive level argument.

The problem may not even be complexity itself but that it's too hard for us to understand systems.

I think there's a way in which they may be conflating this human cognitive view of complexity with the accidental essential view.

That's really what their disagreement with Brooks is.

Does that make sense what I'm saying?

You follow?

I do think they definitely emphasize the issue of human understanding to a greater extent than what Brooks does.

I think you're right.

For instance, they make a claim in here that they don't think that code size is really as big of an issue as Brooks makes it out to be.

More or less explicitly say that they don't think it's a nonlinear problem as Brooks sort of implied.

I think they have a quote from Dijkstra to sort of back that up.

I'm not sure I agree with that.

I'm not sure that's a known thing.

I think it is a bit of a contrast to what Brooks was saying about our ability to reason about systems as human beings.

I want to talk about that.

You mentioned code volume there.

We talked about state before.

This comes up in the next section of paper.

The next section is about what are the sources of complexity in these systems?

Again, they're saying large systems.

They're interested in large systems.

What are the sources of complexity in large systems?

One is state.

We have to keep track of the state of the world, the state of things in our program.

We've got variables.

We've got data.

We have to keep track of these and modifications to these.

The second is control flow.

First, this instruction happens in our program.

Then that instruction happens if state is in some particular, you know, there's a branching.

If something has happened before, then do this thing or do that thing.

I'm going to loop over this stuff and do some other things.

This is all control flow.

The last, as a source of complexity, they say is code volume.

They do say it's not a linear relationship.

If we could build the right abstractions, we can reason about general chunks, which, okay, that's part of that.

If we can exploit features of organized complexity, we can make our code hierarchical.

We can have patterns.

It allows us to not think about the whole thing at once.

We can abstract away pieces of it.

Okay, cool.

State control and code volume.

But I did find myself wondering, if we're talking large systems, sure, it may not be a linear relationship.

Do large systems require large code volume?

May not be linear.

But if we're interested in complexity, complexity is a feature of large systems.

It almost seems like there's got to be some relationship, right?

I think so, and I think it also probably is harder today to define the boundaries of a system than maybe it was at the time this paper was written.

Why do you say that?

Well, if I think of things I'm working on now, I can see subsystems within them that sort of focus on a particular area of functionality, but they also tend to operate within a larger context or larger workflow that's facilitated by, let's say, published, subscribed techniques or by event-based architectures where I don't really know too much about the pieces that are downstream from me, but you could argue that they are still part of an overall system.

This is a microservices phenomenon.

To a certain extent, I think it is, yeah.

Let's talk about state.

They say state's complex.

This is why sometimes you get told to reboot the machine when things are broken.

This is, in my own opinion, why web front-end is so friggin' hard because it's all state management and everything's async, so things get into weird state.

I think this is the most important part of web front-end, state, managing state.

They talk about the exponential rate at which a number of possible states can grow that's really hard to reason about, it's hard to hold in our brains, and they talk about the impact of state on informal reasoning and testing.

We believe that the major contributor to complexity in many systems is the handling of state and the burden this adds when trying to analyze and reason about the system.

I find this a compelling point.

I really agree with this.

State is hard.

I do as well.

I have a harder time articulating I think why exactly or how exactly I would go about addressing that, but it seems right and it seems hard to counter with anything useful.

Agreed.

So state, control, and code volume, but code volume is kind of more of a footnote compared to these other ones.

There are ways we can manage code volume if we organize the stuff really well.

Control, that's really a feature of the programming language, and that's where we get to their argument that your programming language could be the reason why you're dealing with so much complexity.

Control flow.

They make this argument ultimately if you didn't have to deal with control flow, you could just not think about it.

Maybe your programming language allows you to not worry about when expressions are getting evaluated.

That's a very Haskell way to do it.

With Haskell, you don't control the evaluation order.

It's a lazy language, and that's what the creators of the language wanted.

Indeed, the writers out of the tarpette are pretty much saying you should use a language like Haskell.

That's a hot take.

It was probably maybe less hot in this time than it was written, but I'm not sure.

They do give a couple of other sources of complexity.

They say if you have complexity, well, that breeds more complexity because when it's hard to understand a system, it's hard to simplify it, so it'll just get more complex.

They say simplicity, and this is a bit of a throwaway point in the paper, but I think it's really worth going into as a whole exploration, but they say it's hard to achieve simplicity.

They say also that power corrupts, that we should in fact have a lot more guardrails.

This is the argument that you see a lot in Rust versus C or C++.

It's hard to keep yourself from making use after free mistakes.

You should have guardrails that prevent you from that.

They're saying power corrupts.

You have ultimate power.

It makes your work a lot more complex.

That's hard to argue with also, although I do typically when people use something in, like a Haskell or Rust that is marked as unsafe, it's not because they are cowboys.

It's because they needed to be able to do that thing.

It's not clear to me that adding a guardrail to prevent unsafe things is necessarily a cure-all.

Yeah.

I mean, they give garbage collection as one of the examples of what you should have, but again, that's not going to be a cure-all.

That's not going to work for everybody.

What you end up saying to people who can't write with garbage collection, sorry.

Deal with it.

That's accidental, but it's accidental you are forced to contend with potentially.

Yeah.

I suppose garbage collection is a tricky case.

It's probably true that if you don't have to worry about memory management yourself, that makes things simpler, but my experience has been that you then have to worry about how the garbage collector works and how it's influencing your environment.

Some of these abstractions are leaky is what you're saying.

Well, I guess I'm saying that I don't necessarily think that they are, to use Brooks's term, a silver bullet.

Garbage collection is not something that makes everything easier.

It makes some things easier and some things harder, and maybe the net effect is simpler for the majority of programmers and the majority of applications, but I feel like there's always trade-offs with these things.

Yeah.

When you say that, that sounds to me like there is some essence of complexity out there beyond us.

It doesn't matter what programming language you're using.

It's going to be there.

That sounds fundamental, a shared thing, a reality of software looming.

It seems like it to me.

I think as we've grown a little bit more mature, I think we're more in line to be skeptical of the paper.

That's what I'm hearing from us, a little more skepticism.

They do talk about previous approaches to managing complexity before they get to what their recommendations are.

That's really part one, part two of this paper recommendations, part one, characterizing the problem.

Part two, I think I'm inclined to not dismiss out of hand, but just say, "You know what?

I don't think there's a lot for us to work with there."

We'll make that argument in a second.

In previous approaches to managing complexity, they're focused again on the programming language.

Let's just take at phase value programming languages involved in what makes software more complex.

Okay, I'm willing to accept that.

They say, "Well, there's three approaches we're going to characterize.

Object order in programming, functional programming, and logic programming like prologue."

They really end up arguing for functional programming.

They do say a thing that I still to this day find a lot of sympathy with.

Object order in programming, like the cardinal sin of object oriented programming is we're going to bind state and control.

I'll tell you the dirtiest Java functions methods I ever look at are those which take no arguments and return no values.

I hate those methods.

I hate it when people write that stuff.

Just look at it like, "What the hell does this thing do?"

Yes, just something to stir around the insides.

Exactly.

If it doesn't modify any state, then it's a no-up.

There's no point calling that method.

If it modifies state, I have no clue what it's doing.

Maybe it's got some weird long method name.

I think they're absolutely right.

With object oriented programming, you're marrying state and control frequently.

As a paradigm, you are making things harder on yourself when you do that.

I think that's absolutely right.

I think it's probably right.

It's maybe a case also that bolsters their power corrupts argument.

I think there are ways to manage that complexity that happens with object oriented programming if you follow certain guidelines.

I think people inevitably don't do that.

What you can do, people will do.

Yes, that's their power corrupts.

They make the argument about functional programming that you get in a pure functional programming language.

Not all.

Pure as in, like, Haskell is a pure functional programming language.

A pure functional programming language has a referential transparency.

I think it's worth commenting on the history of Haskell a little bit here.

They first started talking about it as a replacement for Miranda in 1989.

They didn't care about purity.

They wanted a lazy language.

They wanted a research language.

They could do their academic programming language research not to dismiss their interests.

That's what they wanted.

They wanted a lazy language.

But with a lazy language, you can't control evaluation order.

You can't know when a particular side effect is going to run.

A side effect like, "Hey, go and ask the user for some input.

Print to the screen."

There's a little bit of a problem.

In order to have a lazy language, they had to make it pure.

But then purity over time became thought of as a goal, not necessarily by the creators of Haskell, but people who are users of it.

Purity became thought of as a goal in and of itself.

Was not an original goal of Haskell.

It was really an aspect of trying to make a lazy language.

In this paper, they talk about referential transparency, which will give you definition of a second.

They say it greatly improves informal reasoning and control gets abstracted out of the organization of the code.

Referential transparency is there's nothing aside from the inputs to a function that will determine the outputs of the function.

The output of the function is completely determined by the body of the function and the inputs to the function.

It sounds like a non-statement, really.

But there's no random other thing the function can do that is not accounted for in its inputs and outputs.

And so you could always sort of immediately substitute the evaluation of the function on a set of inputs with the result for the invocation, sorry, with the set of inputs.

They write, "In a functional program, you can always tell exactly what will control the outcome of a procedure simply by looking at the argument supplied and where it is invoked.

In a stateful program, this property is completely destroyed.

You can never tell what will control the outcome.

Potentially have to look at every single piece of code in the entire system to determine this information."

I remember when I was doing Mohr-Haskill, there was a moment where I could look at a program I had written and immediately substitute the invocation like I would think about some function arguments and just substitute that result for what I was looking at in some other function.

Referential transparency powerful.

Did you ever get that feeling when you did Haskell?

Do you feel like, "Wow, this is simpler?"

In some cases, yes.

There are programs where this approach just makes so much more sense and feels so much more elegant.

But there are other cases where I was never really able to quite make these ideas work the way I wanted them to.

Can you characterize why that might be or what you were hitting?

Well, suppose that you've got a program where you're tracking a number of things over a series of time steps.

At each time step, those things interact with each other.

It might be a physics program that's tracking the trajectories of particles or it might be an artificial life simulation or you've got a bunch of artificial creatures that are interacting and morphing at each time step.

I could never get past the idea of, "As I get larger and larger numbers of these things, how do I deal with that functionally?"

I'm sure people do it, but I always struggled with it and it kind of defeated the simplicity for me.

Maybe that's a specific to Haskell, but they're saying you can think less about control and control may be a tool that we use to think about something like what happens in time.

First this line of code runs, then this line of code runs.

That's important because my code maybe models what's happening in order.

They also talk about state and Haskell, of course.

By default you have no mutable data.

You have to buy new variables.

The idea is that functional programming can help you with the complexity brought about by the mutation of state.

I don't want to take that one up.

That one's a tricky one, I think.

Before we get to their recommendations, they do try to redefine, again, accidental and essential complexity.

This is about almost the middle of the paper.

This is a long paper.

It's about 26 pages, about page 2122.

They say essential complexity is inherent in the essence of the problem, but they put in parentheses as seen by users.

Accidental complexity is everything else.

It's complexity with which the development team would not have to deal with in the ideal world.

For example, complexity arising from performance issues and from suboptimal language and infrastructure.

They write, "We hence see essential complexity as the complexity with which the team will have to be concerned even in the ideal world."

Note that the "have to" part of this observation is critical.

If there is any possible way that the team could produce a system that the user will consider correct without having to be concerned with a given type of complexity, then that complexity is not essential.

If there's any way to produce a system that the user considers correct and they don't know about that thing, then it's not essential.

Let's apply this to the Linux kernel scheduler again.

As a user of a Linux kernel scheduler, maybe I've checked it too far.

Maybe that's a terrible example.

What am I concerned with directly as a user of a Linux kernel scheduler?

I think the tricky part there is identifying who the user is as an end user of Linux.

You just want to be able to multitask.

You want to be able to run your editor and your video processor and your Bitcoin miner all at the same time without having to wait on one or the other.

That's the essential end result of a scheduler.

I don't know that that is what you would mean by user in this particular case because I don't think most users, very few people using Linux are going to have an idea of how they would want the scheduler to work other than, I guess, there is an idea that you want your programs to have access to the CPUs when they need CPU.

I think performance, which they kind of hand wave in either way.

Maybe it's essential sometimes.

Scheduler sounds like a place where performance is essential because if it's going too slowly, you would think this is not working.

This is broken.

As an end user, whether I'm a person who writes a program to be executed on Linux or a person, like you said, who's just running some programs, maybe I'm running my browser and a Slack and a Bitcoin miner, if these things grind to a halt or they're making so little progress, it looks to me like my system is broken.

That's essential.

Yeah, I think performance issues, it's very hard for me to dismiss them in the way that they appear to do so in this paper.

I'm not sure, for instance, how they would deal with, how they would consider, you know, suppose you have something that's pulling tasks from a distributed task queue and you start to get so many tasks that you have to scale up the number of replicas.

Clearly the end user doesn't care about that.

The user may not understand that idea, but I don't see how you don't do that.

I mean, there may be other ways to do it.

There may be alternatives, but at some point, your system needs to deal with that volume of tasks.

I want to go back to their disagreement with Brooks because as I'm reading this, I kind of lean more in the Brooksian direction.

Brooks and others, such as Butch agree, they write that the complexity of software is an essential property, not an accidental one.

This would suggest that the majority, at least, of the complexity we find in contemporary large systems is of the essential type.

We disagree, they write.

Complexity itself is not an inherent or essential property of software.

It's perfectly possible to write software, which is simple and yet is still software.

And further, much complexity that we do see in existing software is not essential, in parentheses, to the problem.

When it comes to accidental and essential complexity, we firmly believe that the former exists and that the goal of software engineering must be both to eliminate as much of it as possible and to assist with the latter.

Here's a hot take of my own.

Brooks wrote No Silver Bull in 1986. 20 years later, 2006, this paper comes out.

They're saying, "Eh, most complexity is accidental."

Here we are 20 years after this paper.

And we're looking at Brooks's paper like, "Wait a minute, all these problems he's talking about, we really haven't solved them."

We still suck at this stuff as an industry, as a field, as a group of people, as software developers.

That almost suggests to me, now 40 years, obviously, not a long timeline.

But 40 years of sucking in the face of complexity suggests to me that maybe there's something, some hard kernel in there.

Maybe there is something essentially hard about what we're trying to do when we take human requests and encode them in programming languages and run them on computers.

Maybe there really is something essentially difficult there.

It's maybe also an argument in favor of the idea that system size does cause complexity in contrast to what they say in this paper.

I think Brooks makes the argument that it's not only that the code gets more complicated as the system gets bigger, but the communication between people working on it gets more complicated.

So there are organizational issues of developing software and keeping things consistent and making the architecture be consistent and not having people work around and develop their own way to do things.

Because it seems to me like the one thing that's really different about 40 years ago is that the systems that we are working on are sort of innately just larger.

Even if you are doing something relatively simple, you're starting with such a large underpinning of frameworks and you've got things on networks that you didn't have in 1986.

So it just seems like even relatively basic CRUD applications are considerably more complicated than they were 40 years ago or even 20 years ago.

And there's layers of complexity.

We took the things that in 1986 and 1996 and 2006 were complex and we just now use them as foundations to build ever more complex things on top of web servers, messaging systems, queuing systems.

We build ever more complicated things on top of these complicated things that people have built.

It's not like complexity is going down.

If anything, it's a hockey stick going up. 15 generations of JavaScript frameworks.

So they talk about how in an ideal world you can translate informal requirements to formal requirements, there's no relevant ambiguity as the phrase from them.

That one's a little tricky.

But inspired by the artificial intelligence conversation, it does seem like a significant aspect of the complexity in what we do is translating a request into a working program.

There's a blobby human mind thing and then a physical mechanical thing that we're turning that into.

If that's part of there, can I take informal requirements, turn them into formal requirements?

There's a little bit of handwaviness in here.

They start talking about state and data.

What state is essential?

Well, what data is essential?

And they pretty much say something I find relatively easy to agree with.

Anything the user gives you, you got to keep that probably.

Right?

Anything you can recompute, non-essential.

You can just recompute it.

Accidental.

All right.

That's okay.

They're saying avoid mutable state and avoid any data which has accidental complexity, which is anything derivable.

All right.

So then the caching comes in here.

Well, is it essential when I go to use my bank's website that it responds quickly enough?

And perhaps caching is a requirement for them to achieve that.

Is that essential?

They're kind of saying no.

I'm willing to concede that caching is not essential, but I am not willing to concede that we can therefore eliminate it from all systems.

I don't think they necessarily argue in favor of that.

I think they just say that you have to sort of shove that into the accidental complexity box.

They ultimately say here's our recommendation with complexity.

Avoid it and separate it out completely by that.

They write this on page 31.

I'm totally with you.

We can identify aspects of our code base which are involved in accidental complexity and say, well, maybe we can improve this someday.

But for right now, we're going to put it over here.

We're going to draw a boundary around it.

And we're going to say kind of as a mea culpa, sorry, couldn't make this any better for now.

I think that would help us in understanding our systems.

But then ultimately, they get to the second part of the paper, which I really, in hindsight, I don't think has been very influential.

As I'm reading it, I'm reading it like, okay, this I guess I could work.

I've never seen an implementation of this and is this really going to help us?

So they spent a lot of time characterizing the problem.

And I find things to agree with in their characterization, but also things to critique.

And then their suggestion, I'm like, eh, really?

That's what we're going to do.

So they're like, we are inspired by functional programming.

We are inspired by the relational model, like sequence calculus, like SQL for data.

So we're going to do this combination of functional relational programming.

It'll be like we will treat modifications of data in this special way in a declarative way.

And then everything else we'll use functional programming for.

So it's a little engineering-ish kind of take.

Here's things that work out there that help me.

I'll just smash them together and make a new thing.

And that'll be great, right?

I remember thinking, even at the time I was reading this when I first encountered it, which was, I don't know, 10 years ago, 15 years ago, that I was really excited about the first part of the paper.

And then I got to their proposed solution and I think my immediate reaction was, I'm not going to do that.

Yeah, I think that's what everybody thinks.

I think that's the take.

It's like we find things to sympathize with in the first part of the paper.

State is problematic.

Control, it's hard.

It's hard for us to reason about stuff.

Large systems, hard to reason about.

I get it.

Your programming language is the problem.

That's right.

I have a favorite programming language, like every other programmer.

And when I read the paper, it's telling me, use your favorite programming language.

And everybody else should, too.

It's kind of preaching to the choir, right?

Then they get to this part and they're like, well, we should do FRP, functional relational programming, where all the state takes the form of relations and the essential logic is expressed using relational algebra, extended with pure user-defined functions and my take is just like yours.

Yeah, I'm only aware of a couple of people even attempting this.

I think we were discussing this earlier.

There's maybe some languages that people developed around this idea.

I've heard of people talking about mini-canron in relationship to this paradigm.

But as you said, it does not seem to have been influential at all.

I found one example in Rich Hickey's history of closure, which he published in 2020.

He talks about how when he worked on the atomic database in 2012, he writes, "In 2010, I started work on a database released in 2012 as 'detomic' databases are giant, mutable, global variable at the heart of any system."

And it is interesting throughout the paper.

He's like earlier in the paper about closure.

He says he meets with these programming language researchers and they're scoffing at the idea that you would need a database.

They've never written programs that have used databases.

And he's horrified.

He's like, "This is 2005."

He's like, "Or three.

I've never written a program that did not have a database as a core part of it."

What do you guys are out of your minds?

He's never really written stuff that people use.

So he's saying no amount of programming in functional languages is going to make a system with an update in place database, much easier to reason about.

My idea was to build a functional database that could present to the developer the entire database at any point in time as an immutable value.

It would never update in place, nor have any notion of place.

The data model would be akin to RDF entity attribute value with the addition of time.

This was the final piece needed to get us out of the tar pit, he writes, in the history of closure.

So this is 'detomic'.

I've never used 'detomic'.

Apparently they have a company hosting it for people and it's been successful.

I haven't heard of anybody else inspired by the second part of this paper and saying, "Yes, let's get out of the tar pit.

Let's build functional relational programming languages."

I think it's extremely hard to introduce a new paradigm to the programming world.

Even functional programming, which had its moment in the 20 teens, has not really taken off in as large a way as we might have expected it to.

I think people, the industry as a whole, still prefers languages that allow them to get things started and completed quickly, even if the end result is more complexity.

Maybe we're sacrificing complexity for speed.

So here I am, 10 years later, more skeptical of the claim that the programming languages involved in accidental complexity to the degree which they claim it is.

First read this paper 10 years ago, 2014, and I thought, "Wow, if I wrote Haskell at work, I could eliminate classes of bugs."

But I was never that much of a true believer.

In the Haskell community, I would see people railing about how virtually all of the bugs could be eliminated if you could correctly encode programs in the type system for Haskell.

You can have every bit of intentionality stated in your types for a program.

I remain skeptical of that.

I was skeptical of it at the time, and I remain skeptical of it.

There were moments where at work, I would hit a bug in a programming language like Python.

I would in my head be like, "Would I have hit this bug in Haskell?"

There was either a yes checkbox in my mind or a no.

I feel like the noes, they happened, but it wasn't like a 90% of the time.

It was more like a 50/50 kind of, "Would I have hit this in?"

Now the programming language is rust.

Would I have hit this in rust?

It's no maybe 50% of the time.

No, that's a pretty good hit rate.

I should probably use a language if I can reduce 50% of those bugs.

I should, but it's not 90% or 100%, which I feel like they're arguing in this paper.

Most of the bugs I hit in a language like Python that I wouldn't hit in rust or Haskell is I'm updating my dependencies.

I don't know what the new version of the library is going to do.

That's not, I've got this massive code base and someone in a module, 100 tree branches away from the module I'm working on wants to invoke my function incorrectly.

That's not normally the type of problem I'm hitting.

I've become a bigger and bigger believer in static types over the years, but I find that most of the issues that I run into with bad type specifications happen when I'm either, so like if you're using Python, there's no compilation, so it's easier to write something that's wrong and you don't discover it until you either run your tests or deploy it.

That's bad, but it's not that frequent and it's not pervasive.

I think that as you were saying, the issues that I see more frequently are I'm doing the thing as designed, it's just not the thing that people want it to do.

That's what I feel like I'm correcting more of the time.

Those bugs I think are more pernicious.

Sure, like I wrote a runtime error and it's catastrophic potentially for the application.

Sure, we have testing and things, we can eliminate those.

It might crop up.

Those other bugs, the program doesn't do what it was supposed to do because I misinterpreted it or I wrote it wrong.

Those were not going to be solved by static types or functional programming, are they?

I'll give you another example.

You mentioned time.

I don't think we can represent time cleanly in types.

Yeah, I don't have a good response to that.

I would have to think about what things like TLA+ can do for us.

Yeah, I think you're right.

I asked Lindsey Cooper about this.

I said, "When I tried to build raft in Haskell as opposed to my raft in Python, I found that I was encoding all the easy stuff in types in Haskell."

My original idea was, "Oh, if I write this in Haskell, I can abstract a lot of this logic and then not think about it and I will be able to think about less.

I'll be able to verify using informal reasoning fewer components aspects of my program."

But after I had gone far enough in my experiment, I realized, "Geez, all the easy stuff in raft, like is it a leader or a follower or a candidate?

I've encoded those things in Haskell."

These are the parts of my Python program I struggled with the least.

The stuff I struggled with the most is all of this control flow stuff about what's happening in time.

I need to mark an entry as completed.

There's different sort of statuses for what that means.

I've told my followers about it.

A quorum of followers has responded and said they accepted that entry.

I need to index this as a finished thing.

That stuff happens in time.

I couldn't figure out how to encode that in types.

Lindsay Cooper said to me, "Well, you're saying that it's easier for you to encode safety properties of your system in types than liveness properties are extremely difficult to encode in types."

This is a known thing for us, that there are essential complexities.

She didn't use that term, but that's how I think of it now.

To close this out, I wanted to bring up some maybe some counterpoints that I think I've been referring to this paper a few times about Rich Hickey's history of closure.

We talked a little bit about static typing.

Hickey says in history of closure, he says, "The benefit of static typing is minimal in the face of, quote, the overwhelming complexity inherent in imperative, stateful programming."

He's saying it's imperative, stateful programming.

He's agreeing without a tarpid.

A lot of the complexity is from that.

He says with Lisp, it was a revelation.

"I had the flexibility to use exactly as much language as was needed for the problem," which I thought was a subtle, there's an idea working in there that subtle.

When Rich Hickey felt like he discovered the right language, he felt like he could write a lot less code and still solve his problems.

This is actually the most beautiful part I thought of Haskell.

When I really liked using Haskell, I was using it.

I thought, "Wow, I can write extremely interesting solutions to my problems with a very small amount of code."

But sometimes the tragedy of that was I would be using these operators and people who would look at it.

It was totally inscrutable to them.

It's a tiny amount of code based on pretty powerful abstractions like function composition and Haskell, just a dot.

Run this after this.

Curry the function arguments.

It's brilliant, very dense idea if you're unfamiliar with it.

"Did you ever feel like you got to a language where you could use a programming language exactly as much language as you needed for the problem as opposed to a language where you had to use a lot more code to achieve the same?

Have you ever experienced that feeling?"

Not exactly.

I think there are cases where I've encountered languages that have a lot more than is necessarily required to solve the sorts of problems that I want to require.

I end up doing them in a particular way.

Then a couple of years later, when I become more conversant with that language, I realized that was the totally wrong way to do it.

Oh, initially I was going to ask you, "Do you want to name and name those languages?"

But you're not saying that's a fundamental property of the language.

You're saying that was a fundamental property of you being inexperienced in the language.

Yeah, and I think maybe this is an aspect of complexity that gets overlooked here a little bit, which is that especially when systems get big enough, the informal reasoning about the system that they talk about here stops being the responsibility of a single individual.

You need to be able to talk about what's happening and how it's happening with other people who might be working on other parts of the system.

I think that's where it gets difficult to introduce new programming languages and new programming paradigms because ultimately you have to communicate to other people what's going on.

Those other people may be less willing to abandon their hard-foot knowledge and pick up a new paradigm.

That's a fascinating point that you made that's not present in this paper at all.

Part of the complexity of programs is our ability to communicate it, to share it with others who are also contributing to that program.

That reminds me of Fred Brooks' line that you said, one of your favorite lines, "The reality of software is not inherently embedded in space.

How do I communicate what this program is doing?

I need different tools to do that."

It's almost as if that in itself is a measure of complexity.

The communicability of it.

Yeah, the ability to describe it and get the idea into other human beings' brains.

I have one final quote from History of Closure.

The reason why I think History of Closure is a useful comparison point for Out of the Tar Pit here is Hickey obviously read out of the Tar Pit.

Obviously it was partially inspired by it and obviously ascribes to some of the ideas in it.

You can see that in some of these quotes.

He writes, "It was a design objective of closure that there not be much cognitive load from the language itself."

I really like this sentence.

Want it to be that the language itself does not produce much cognitive load.

It's like a pane of glass that you look through.

I get the sense that this is what they want in Out of the Tar Pit.

We could dismiss their suggestion, the recommendation of what to do, but I think that's what they want.

They want the language to be transparent so you can immediately identify accidental and essential complexity.

That seems reasonable, but I'm not entirely sure that means the same thing to every person.

What do you mean by that?

I remember reading many years ago one of these Paul Graham essays where he was talking about why Python had become so popular.

I made the point that people just liked Python because they could hack stuff out with it.

I think that maybe some people think of when they think of the language not getting in their way is you learn a few basics and you can pretty much assemble things.

You don't have to memorize a lot of symbols and keywords and different structures for similar things.

I don't know.

I agree with the sentiment, but I'm not sure that it means the same thing to everyone.

I want to call this out as the programming language as a tool metaphor.

What if I said to you, and apologies, this is totally a Heidegger example.

It is the design objective of this hammer that there would not be much cognitive load from using it to hammer nails.

A language is a tool and increasing familiarity with the tool gives us to a point where we don't have to actually spend a lot of mental effort to use the tool.

You give Python an example.

It's true for a program.

It's true.

I can write gobs of Python without even thinking much about it.

I just type and the stuff just comes out.

Maybe this is a neutered analogy.

There is some creativity in writing Python.

Is there creativity in using a hammer?

Where do I put the nails?

What pieces of wood am I nailing together?

Where am I going to nail them together?

Where on the wood?

Maybe it's a toolbox.

I'm going to build a cabinet.

There's creativity in that.

But maybe a language is just a tool and it becomes transparent based on our familiarity with it.

In other words, somebody extremely adept at writing C, they're not going to see the language as presenting complexity and or making their code less simple.

What do you think?

I think that's a good take.

I also think the hammering sample is apt in a way that maybe you didn't mean it to be.

So suppose that you want to pound a stake into the ground and you've never seen a hammer.

Probably the thing with the least cognitive load is the nearest rock.

I may have conjured this example from too many camping trips or something like that.

So a hammer in a sense is a rock on the end of a stick.

It's a better tool that you've discovered gives you more leverage for pounding things into the ground.

But it's not the most cognitively obvious thing for pounding that stake into the ground necessarily.

Yeah.

I think people like languages that make their jobs easier, whether or not that is transparent as you say.

I don't know.

Or is it ever going to be objective?

Probably not.

It's probably subjective and it probably has a lot to do with familiarity.

In the Haskell world, people always said the reason you find Haskell so hard to learn for people who complain that it's hard to learn.

And I, for one, agree that it's hard to learn.

And I never said this, but I saw people say this.

You find it hard to learn because you first learned an imperative language.

If you learned this one first, it would make loads of sense.

Unfortunately, the sample size of those people is pretty small and I have no idea if that's even true.

Maybe thinking in a lazy way, and I might mean lazy in terms of lazy evaluation, nothing else.

Maybe thinking in a Haskell way where I don't determine what runs in what order.

I just determine what functions are calling other functions.

Maybe that's fundamentally hard for humans.

Maybe we, or it's easier for us to think in terms of time.

I don't know.

I think my overall takeaway for out of the tar pit right now, 2024, 10 years after I first read it, is they are right in talking about sources of complexity.

State, control flow, trying to reason about my programs.

These are hard things.

As a code gets larger, yes, it gets harder.

Complexity does get in the way.

I would like to eliminate complexity.

I'd like to know what's accidental and draw boundaries around it and reduce it.

I'd like to know what's essential and draw boundaries around that.

I'd like to be very careful about mutable data.

I find myself hugely skeptical of their claim that programming language paradigms are going to make much of a dent in accidental complexity.

Ultimately, I find myself more skeptical than I did before in that their disagreement with Fred Brooks is right.

I think maybe these things are more essential than they will allow.

Because you're thinking about this paper evolved over time.

A little bit.

I think even at the time I wrote it, I was not super enthusiastic about their solution to the problem.

I think probably when I first read it, I was more enthusiastic about their description of the problem.

I guess the ultimate test is that it's 2023 and we're all pretty still much in the tar pit.

Yeah.

I think that's exactly right.

Well, we didn't answer and I think it's worth asking eventually is your question, will the computers who replace us, the artificial intelligent machines who come replace us, will they care about any of this stuff?

And if they don't care about it, maybe that means that complexity is not essential, that it's all just a limitation of the human mind.

I think we'll save that question for the future.

You want to weigh in before we go?

Well, I would just say that that probably comports well with my current beliefs, but need a lot more details to justify myself.

And we're out of time.

This has been picture me coding with Eric Aker and Mike Ball.

Thanks for hanging out with us.

Mike, as always, it's been a pleasure.

We'll see you next week.

See you next time.

Bye-bye.

Bye.

Bye.

Bye. (piano music) [BLANK_AUDIO]