Picture Me Coding
Picture Me Coding is a music podcast about software. Each week your hosts Erik Aker and Mike Mull take on topics in the software world and they are sometimes joined by guests from other fields who arrive with their own burning questions about technology.
Email us at: podcast@picturemecoding.com
Patreon: https://patreon.com/PictureMeCoding
You can also pick up a Picture Me Coding shirt, mug, or stickers at our Threadless shop: https://picturemecoding.threadless.com/designs
Logo and artwork by Jon Whitmire - https://www.whitmirejon.com/
The Nine Fallacies of Distributed Computing
One day Erik foolishly decided not to worry about the nine fallacies of distributed computing. Surprisingly, Mike seemed to indicate that was fine to do! These guys are pretty irresponsible! Listen along and see for yourself if they're making a terrible mistake.
Links
- Deutsch’s Fallacies 10 years later
- 2021 Software Engineering Radio podcast episode with L. Peter Deutsch
- Google SRE Book Chapter 2 "Embracing Risk"
[MUSIC] >> Hello, welcome to Picture Me Coding with Erik Aker and Mike Mull.
Mike, welcome to your show.
>> Oh, thanks.
Is it my show now?
>> It's your show.
>> So we're going to call it Mike's Picture Me Coding with Erik from now on.
>> Mike Mull Picture Me Coding with Erik Aker as co-host, who introduces the two hosts every week.
>> Cool.
>> What have you been listening to this week, Mike?
>> So I have to preface this a little bit.
So one of my favorite bands is Guided by Voices, but I'm always reluctant to recommend them to people.
>> Oh, yes.
You look like a total weirdo when you do that. Here's 40 songs, half of which you may not be interested in.
>> Yeah.
They featured prominently in that show, The IT Crowd.
I think they have a nerd reputation anyway.
>> I didn't know that.
Actually, here's a Guided by Voices question for you.
Let's say I'm a new Guided by Voices person, listener, and I'm like, "Hey, Mike, you like Guided by Voices, one of your favorite bands."
I want to know which album to look up to listen to them.
They probably don't have very many, right?
>> This is exactly the point I was going to try to make here, is that the reason why it's hard to recommend them is because, if you just discover a new band, you're probably going to do something like, "I'll go listen to their most recent album," or "I'll sample across their discography."
So Guided by Voices, they now have 40, 41 albums.
>> They're like the Stephen King of Indie Rock.
>> Yeah.
That's a very good analogy also in the sense that, "Are you talking about horror Stephen King, or are you talking about fantasy Stephen King, or are you talking about Shawshank Redemption Stephen King?"
Anyway, so they made a lot of albums.
The favorite ones I like are albums from the late '90s and early 2000s.
They're hard to recommend because you have to point them to very specific things and you have to warn people that this is a classic indie album, but it also has 40 songs and some of them you're probably going to hate.
The point being, there's a band, a UK band called the Bevis Frond, which is similar in certain ways.
In fact, I think they probably predate Guided by Voices.
They're also similar in that it's really just one guy.
Guided by Voices was a band at the beginning, but for the last 20 years it's been mostly Robert Pollard.
This is a band that's hit and miss for me.
The early music by the Bevis Frond was psychedelic.
It's good, but it's got this 1960s psychedelic vibe to it.
Anyway, they came out with a new album called Focus on Nature.
I believe I read it's their 26th album.
Wow.
That is a very Guided by Voices type of thing to hear.
Yeah.
Anyway, it's really good album.
It's 76 minutes long or something like that.
It's like a double album or something.
So maybe not every song on here is going to hit for you.
They're not all bangers.
It's called Focus on Nature, right?
Focus on Nature, that's right.
Yeah, but I like a lot of it.
It's basically conventional rock and roll with a little bit of psychedelia thrown in, catchy lyrics and earworms and good hooks and stuff.
So I've been listening to it all week and I really like it.
I should pivot to that.
I'm listening to the Anderson .Paak record that just came out with Knxwledge, Knowledge spelled with an X.
They have this group that they call NxWorries.
Actually, I don't know if it just came out.
I hadn't heard of it.
I think it came out this year, but it might have been last year.
And I keep wanting another Malibu from Anderson .Paak.
Don't we all, yes.
Yeah.
It's OK.
There's some good stuff on there.
There's some fun, easy to listen to stuff.
I think I'm going to pivot to what you got, though.
Not a great review for me, I guess.
I kind of like this thing.
I wanted to talk to you this week about the "Nine Fallacies of Distributed Computing."
This is a known list that came out of Sun Microsystems in the '90s.
This came up for me because over the last couple weeks, someone asked me to give a talk about distributed systems at work.
And I ended up not giving the talk.
And my first thought was, well, what are the assumptions people have about distributed systems that are often incorrect?
And I remembered this list.
I was like, oh, yeah, there's that list of fallacies that I've encountered over the years.
And so I looked it up.
And I had this kind of funny reaction.
As I was looking it up, I was like, I got to talk to Mike about this because I've never had this response before.
But as I read these, I feel like a lot of these just don't quite apply to me, which is probably-- you're going to probably tell me, you're crazy.
You are falling into the fallacy trap.
Have you heard of the nine fallacies of distributed computing before I reminded you of them this week?
I have, yes.
They're pretty well known, right?
They've been passed around for many years, I thought.
Yeah, maybe not as well known as the SOLID principles and stuff, but definitely one of those things that shows up frequently on newsgroups and discussion boards and stuff.
Well, I'm curious if for you they land differently than they did for me.
But let me read them first.
Let me read them first.
So these are nine fallacies.
I'm going to read them.
The first one is, the network is reliable.
The second one, latency is zero.
The third one, bandwidth is infinite.
The fourth one, the network is secure.
This is all networking stuff.
So we started with fallacies of distributed computing, and there's a lot of networking in here already.
OK, so now it's five.
Topology doesn't change.
Number six, there is one administrator.
Number seven, transport cost is zero.
Number eight, the network is homogeneous.
And number nine, the party you are communicating with is trustworthy.
Now, what I've heard is that number nine was added just recently in the last few years.
I don't remember that being there before.
Gut response when you hear these?
Initial, first-blush response.
So I think I have been around for so long that I look at a lot of these and I think, duh.
It's-- so I think these were mostly formulated in the 1990s, let's say.
Yeah, that's right.
That's according to the history of them.
I read and listened to some interviews with L. Peter Deutsch, who is credited with putting these together, assembling them as a list.
And it's mostly from Sun in the '90s.
Yeah, and so by the time-- by the time I think people were thinking about these, networks had matured to a certain extent.
And especially at Sun, you probably had quite a bit of access to internal networks.
So when I started out, people still called the internet ARPANET.
I worked at one of the national supercomputer centers and we had a literal satellite dish on our roof that communicated with facilities in Hawaii or something like that.
Some of these, like the network is reliable and latency is zero and topology doesn't change.
It's just like, well, yes, I know that.
Of course, I'm kind of surprised that people don't naturally assume these things.
But I think that it-- I think it is to a certain extent because I started out in this world where the network wasn't really what people think of as the network today.
I think it hits me a little different.
I read this list and I think this sounds very infrastructure focused.
Where are we at in the OSI model throughout this thing?
Are we really all at one layer of the stack?
And is that layer a layer that I like to pretend almost doesn't even exist?
Yeah, that's fair too.
I mean, some of these-- I mean, we talked about this offline.
But if you're talking about an application that needs to operate across the internet, the public internet, that's one thing.
If you're talking about something that is two machines connected by a switch, that's another thing.
So I don't know exactly if they're assuming this is internet scale software or if they're assuming this for internal networks and so forth.
Yeah.
So there was this article which is called 10 Years After the Fallacies.
And in that article, they interviewed Deutsch and they talked about what he had seen at the time in the 90s at Sun.
They say, here's a line from the article.
"What Deutsch saw with fresh eyes was that as engineers inside and outside Sun designed and built network infrastructure, they kept making the same mistakes, based largely on the same basic yet false assumptions about the nature of the network."
So as they're building network infrastructure-- so these fallacies are for network engineers.
Here's my question for you.
Does this apply to me as an application developer?
Do any of the nine?
We don't have to go specifically into them, but what's your gut response to that?
Yeah, I think at least potentially some of them do.
We've talked about it before, but there's a certain aspect of distributed computing where it's not like making function calls.
There was another paper that came out of Sun in the early 90s.
I think it was called something like "A Note on Distributed Computing," where they talk about how the idea of distributed objects is problematic because of some of these assumptions here.
You can't assume that if you're treating your distributed object call as just another function call, you can't always assume it's going to work.
You can't always assume that it's going to return to you and return data to you in a timely manner.
So I think you do have to think about some of these issues as just a developer of distributed applications.
You mentioned that Sun paper, and I remember reading that with you.
And I remember thinking, wow, people really thought you were going to have distributed objects in memory.
That's crazy.
Why did they ever think that that was possible?
It was just complete naivete from not being there on the ground when this stuff was happening.
Now, here's a funny aside.
I want to look at some of these in detail, but I had a funny-- I noticed this quote, and it made me realize there's something kind of funny about these.
L. Peter Deutsch, who is credited with assembling this list, had put together a presentation internally at Sun.
And as part of the presentation, he had this list of fallacies.
It didn't have the ninth.
He added the ninth later.
And the first four he didn't come up with.
He just sort of cobbled together the first four, which had been known things at Sun with these others that they had observed.
And he said in this article where he's interviewed, if you had told me when I was at Sun that 10 years later, this one page of maxims was going to be one of the things I was best known for, I would have been floored.
Can you imagine, Mike, if we said to you, hey, could you put together a PowerPoint on some arbitrary topic, and then for 30 years, people are coming up to you on the street and saying, Mike, that one slide in that PowerPoint changed my life, bro.
Like, you had so much influence on my career.
With that incredible PowerPoint slide.
Yeah, but the closest I've got is a talk I gave at a PyData conference one time that I've had people reference.
But yeah, it's crazy that you had become famous for something that was probably like a talk over lunch or something.
So these things have lasting power.
I think there's an aura and a feeling of concreteness when somebody gives you a listicle like this.
Here are nine things to remember.
And you go, wow, there's nine of them.
Yes, there's 10 commandments.
Like, there's a number.
And if I just keep those in mind, I'm going to be pretty good.
Yeah, there's something appealing.
I think there's two appealing things about it.
One is it puts a limit on it.
So it's like the nine fallacies.
So if you learn these, you're set.
You're good.
And the other thing is it says the nine fallacies.
So it's not like-- it's sort of a weighty word.
It doesn't say nine things that you might want to consider if you're doing application development or messing with networks.
It's giving this sort of stamp of approval to these ideas that these are things that are definitely a problem with distributed computing.
And you need to know these.
I would almost argue that's a false sense of confidence that we get from these types of things.
OK, so I want to talk first about-- I want to zero in on what I call the crusty network concerns.
This is like I've got cables.
I've got switches.
I've got routers.
There are people in the organization who know what that crap does.
And when it fails, their pagers go off.
These are fallacies five through eight.
These are what are most foreign sounding to my ears.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
And apparently the last one of those was added by James Gosling, the creator of Java.
Topology doesn't change.
One administrator.
Transport costs zero.
Network homogeneous.
What the hell are we talking about?
Administrators, topology, homogeneity.
Can you help me?
I'm not sure.
To be perfectly honest.
Let's just talk about topology.
That's a term from graphs, right?
So we've got nodes and edges.
Yeah, so this one confused me a little.
So I mean, obviously they're talking about the way things are connected together.
But very raw network infrastructure connected together, right?
We've got signals hopping from server to server in a particular way.
That's what we're talking about, right?
Yes, but again, this is-- I don't know if they mean this on different scales.
So one possibility on topology changing is that the network engineer at your company decides that some machine should now be on a network that is-- a network segment that is isolated from the network segment that your application is on, and suddenly you can't reach the server-- a server that you needed to-- a database or something.
And so that's topology at a very local level.
And I suppose you do need to consider that those things could change.
But I don't know; that seems like a very precise subset of "you might not be able to reach the machine."
Yeah, it's like a highly specific concern.
Let's say you've got a company.
You've got three buildings.
All those buildings have networks.
And one building is recently purchased.
So you've got your network engineer.
They go and set up the beautiful network at the new building with all the cool techniques they've learned over the last decade.
And the old buildings are on this crusty crappy network.
And you want to talk from new building where everything's great and fast to old building.
And you might run into these problems, right?
Topology is different.
Do I have to care about that as an application developer?
Yeah, that's the other possibility here that I wasn't totally sure about.
So one reading of this is somebody literally changes the way things are connected on your network.
The other reading of this is depending on where the client is for your server-based application.
They might be going through the local area network.
They might be going building to building.
They might be going through a mobile network.
And so in a particular use of your application, the network topology from the client to the server might be totally different.
Maybe that's what they mean.
I don't know.
But again, it seems like, sure, you presumably know that when you start building the application that there's going to be some scenarios where the client has very high-speed access to your application and some cases where it doesn't.
Because you presumably designed it to do that.
So there's a funny difference in between how we're approaching this stuff.
You approach it like, this is obvious.
I approach it like, that doesn't matter to me.
Yeah.
Well, maybe it's not obvious.
Maybe it's entirely possible.
I'm just misreading what they mean.
It's just my reading of it makes it seem like, yes, of course, I would have probably planned for that from the beginning.
Well, what about one administrator?
That, it seems like, the most remote thing for me to even care about.
Like, yeah, if I send a request across the internet, it's going to bounce through a bunch of machines.
And it'll bounce through networks, maybe, that are owned by people who are hostile.
It reminds me, when I studied German, that my professor would talk a lot about the Raubritter, the robber knights, in Germany in the Middle Ages.
And they would hang out on rivers and build castles and attack people going down the river and be like, you got to pay me if you want to pass by my castle.
So they were actually jerks.
They weren't these chivalrous noble people at this point in time.
We could call those guys like network administrators, right?
You going to go through my network?
You going to play by my rules, Mike?
This one, also, I have trouble interpreting.
Because again, it seems like there's at least a couple of scenarios.
One is that, like you say, if you're going across the internet, then there's your local network.
There's some ISP that's connecting you to the larger internet.
There's various networks that the packets go through on their way to the other end.
And then there's another local area network on the client end.
And there's probably administrators at various points across there.
So yes, there's absolutely-- there's more than one administrator.
But what am I going to do about that?
I think in 30 years, it's a thing that has rarely come up as like, oh, you got to avoid that network because that administrator is a total dick.
I suppose another reading of this is that in your local network, there are multiple administrators.
So if you've gone to a network guy at your company and said, hey, for my application, I need you to poke a hole in this firewall so I can get out on this port or whatever.
And you're relying on that.
And then some other guy updates the firewall rules without knowledge of that and breaks your application.
OK, all right.
That's fair.
I mean, we deploy stuff in the cloud a lot.
And so these concerns often get abstracted behind some cloud interface and maybe DevOps or infrastructure team.
And that's a fair point.
They might mess stuff up sometimes for you that you're relying on.
I can't think of a particular example.
But even like if you're doing cloud stuff, you might have to request a specific egress for an application.
Maybe somebody sets that up and somebody else shuts it down.
So I don't know.
So OK, boil these down to a pithy one-line statement that I should care about as an application developer.
What is the pithy one-line statement?
Hey, man, when you send requests over the internet, they're going over physical things.
Is that good enough?
Yeah, or I guess another way you could put it is your distributed application is distributed.
It's not doing inter-process communication on the same machine.
It's not doing a fetch from memory.
It's not doing something that you can rely on working a very high percentage of time.
I don't buy that that's in these ones about topology, administrators, transport cost (well, maybe transport cost), and homogeneity.
But actually, I think what you just said is hidden in the other ones.
But I want to talk about transport cost is zero because I listened to this interview with L. Peter Deutsch on Software Engineering Radio, another podcast.
And when they got to this one, transport cost is zero, he said, I don't think this is very relevant.
This is a quote from him.
He said, I'm trying to remember what was going through my mind when I added that to the list.
For users today, transport cost is usually zero.
And he said, actually, the thing you usually pay for as a user is bandwidth.
OK, so transport cost.
I don't know.
Transport cost is zero.
And maybe he's saying it's really a bandwidth cost.
There's a funny coincidence that popped in my head here.
Definitely not what they're talking about in the '90s at Sun.
But a gotcha, a thing that bites people with cloud all the time, is, hey, if you want to spin up a bunch of AWS stuff and load all your data up into AWS S3 and put it on virtual machines or whatever, you know what?
It's free to put stuff in their network.
But it's going to cost you if you want to send stuff out of their network.
You know what?
That's a little bit asymmetrical.
It's free to send stuff in, but it costs money to send stuff out.
Why would that be the case?
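For a rough sense of that asymmetry, here is a tiny back-of-the-envelope sketch. The per-gigabyte rate below is a made-up illustrative number, not current pricing for any particular provider:

```python
# Illustrative only: the egress rate here is hypothetical, not real AWS pricing.
INGRESS_PER_GB = 0.00   # putting data into the cloud is typically free
EGRESS_PER_GB = 0.09    # made-up example rate for data leaving the network

data_gb = 10_000  # say you want to move 10 TB back out

print(f"cost to upload:   ${data_gb * INGRESS_PER_GB:,.2f}")
print(f"cost to download: ${data_gb * EGRESS_PER_GB:,.2f}")  # about $900 at this made-up rate
```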
Yeah, it's not exactly the same thing.
But I remember reading recently that there are still certain cases where if you need to send somebody a very large volume of data, it's still easier to put it on physical media and snail mail it to them.
But yeah, I was not entirely sure if this meant actual dollar cost or if there was something more abstract about this one.
And I'm also not sure if I understand the difference between transport cost and bandwidth cost.
I don't.
Yeah, I'm glad you said that, because that's where I feel a little guilty.
I'm kind of conflating transport and bandwidth here.
So I come to you looking for absolution in my sin of, hey, Mike, I read these mystical fallacies from the '90s, and I want to throw away these four of them.
And you're kind of patting me on the back and saying, that's OK.
Yeah, I don't know if throwing them away is the way I put it, but it just seemed to me-- I don't know.
It just seemed to me like they all sort of boiled down to keep in mind that the network is not always going to work for you.
Keep in mind it's not just abstract ideas.
There are real things involved.
I want to go to the first three, because I think the first three are things that as an application developer, I'm more likely to be concerned with even today.
The first three are the network is reliable.
Now, I like that one, because we're not talking about administrators and topology and homogeneity.
It's a very simple statement.
The network is reliable.
The second one is latency is zero.
And the third one is bandwidth is infinite.
My grandfather used to work-- he used to write these programs in octal.
And these programs would do things like they'd talk to satellites for a few minutes in a day.
And he had a sign on the wall in his office that read, there is no substitute for bandwidth and salary.
He's told me this many, many times.
And it was like the idea was that people would walk in his office with these crazy requests.
And it was this down-to-earth mechanism for being like, look, your request is crazy.
You need to pay us a lot of money and get a lot of bandwidth for us to be able to achieve what you want.
This is in the '60s.
Is bandwidth a concern for you?
Well, since I have Spectrum internet, it's still a concern for me.
Oh, OK.
I guess I'm on gigabit internet.
[LAUGHTER] So these first three remind me, again, of the good old days. One of my good friends at the time worked on what they called back then the gigabit internet challenge.
And his particular project had to do with visualizing climate model data.
So he was working with a professor at UCLA who was doing this climate modeling stuff.
And their idea was that they would visualize in real time this evolution of this climate model across the internet to another-- it was actually the facility here in San Diego.
And so they were trying to build this gigabit network, which nowadays gigabit network sounds like no big deal.
But back then, it was pretty cutting edge.
And so again, when I see these things, I'm like, yes, these are exactly the things that the people on that project were fighting against at that time, dealing with the fact that this network was flaky, dealing with the fact that you were sending a lot of data over a fairly long distance.
And you were trying to send it in such a way that you could actually do a visualization in essentially real time of something.
So all three of these first things come into play very keenly.
And so I just don't think they ever-- I don't think there was ever a point in my career where I got to the point where I said, oh, those things are solved now.
I'm just getting data across the network.
It's just like fetching it from memory now.
I guess my immediate response when I read these is I thought TCP/IP was pretty reliable.
Do you hear somebody like me say that?
Go, oh, you sweet summer child.
Yeah, although to be fair, most of these projects were probably not using TCP/IP.
I guess most of my projects are.
That's why I immediately reached for that thing.
Like, I'm using this thing that's pretty reliable.
I mean, even Deutsch in this software engineering radio podcast, he says in the '90s, the network transmission, it was not as reliable as it is today.
This is a quote, "We now have enough experience with error detection and correction, so data corruption is just not a factor anymore."
So this was a thing.
And it's not really anymore.
I mean, you listen to him talk about these things, and he's like, yeah, they're relevant.
Yeah, he says what's relevant now are outages, disruptions, handling errors; corrupted data, corrupted packets, not as relevant.
Handling timeouts, handling failures to transmit, still relevant.
He says, dealing with data errors at the application level is not really necessary.
Dealing with outages is still really relevant.
Now, I buy that.
I send a lot of requests from services I write to other services.
And I pretty much always have to have a timeout.
But is that a fallacy?
Do people not know that, do you think?
Well, I think if people are doing, calling a service, calling an API across the internet, and they are not assuming that it could fail, or not assuming that it could timeout, that would be a fallacy.
I'm just not sure that there are that many people who have that belief.
Yeah, I mean, a lot of libraries bake in timeouts now.
In Python, for example, we use an async HTTP request library like HTTPX, and they've got timeouts baked in.
I suppose I have seen cases where people don't do retries.
But it's-- I don't know.
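For concreteness, here is a minimal sketch of that pattern using the HTTPX library mentioned above: an explicit timeout plus a small, bounded retry loop. The timeout and backoff values are arbitrary, chosen only for illustration:

```python
# A sketch, not production code: explicit timeout plus a bounded retry loop.
# The async variant, httpx.AsyncClient, works the same way.
import time
import httpx

def fetch_with_retries(url: str, attempts: int = 3, timeout_s: float = 5.0) -> httpx.Response:
    for attempt in range(attempts):
        try:
            with httpx.Client(timeout=httpx.Timeout(timeout_s)) as client:
                resp = client.get(url)
                resp.raise_for_status()
                return resp
        except httpx.TransportError:  # includes httpx.TimeoutException
            # The network is not reliable and latency is not zero:
            # back off a little and try again, up to the attempt limit.
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)
```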
I feel like this kind of goes back to that distributed objects thing a little bit, where there was a point where people thought, I can make these distributed objects, and they're just going to look like objects in the same process.
And I don't have to think too much about-- the semantics are identical, so I don't have to think too much about the underlying network.
And that would be a fallacious thing to do.
But I'm not sure that people have these beliefs now.
So this is a lesson that had to be learned at one point.
But Deutsch's argument here is, well, I need to be good at handling timeouts, expecting failures.
But I always come up against this kind of confrontation.
It's like, as an application developer, I need to be aware that my request might timeout.
Cool, I'll do that.
But then what?
I could try again sometimes.
Or it's in a failure mode.
Is this an idempotent request?
What if I'm paying my electric bill on the internet?
That request fails.
The people who run the electric company, they're total bastards.
I don't want to send that request to them twice.
It's like, OK, yeah, I get it.
I need to handle the timeout.
But my apps, like in this kind of failure mode, what's the best thing I can do in that situation?
I can be like, hey, I failed.
Sorry, user.
Hey, send an alert to these people.
Go figure it out.
Yeah, try again later.
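One common answer to the "I don't want to pay my electric bill twice" worry is an idempotency key: the client generates a unique ID per logical operation and reuses it on every retry, and the server deduplicates on it. A rough sketch of the client side, assuming a hypothetical payments endpoint that honors an Idempotency-Key header (the URL and server behavior here are made up for illustration; some real payment APIs use this convention):

```python
# Sketch only: assumes the server deduplicates requests by Idempotency-Key,
# so retrying the same logical payment cannot charge the user twice.
import uuid
import httpx

def pay_bill(amount_cents: int, attempts: int = 3) -> httpx.Response:
    idempotency_key = str(uuid.uuid4())  # one key per logical payment, reused across retries
    for attempt in range(attempts):
        try:
            resp = httpx.post(
                "https://example.com/api/payments",          # hypothetical endpoint
                json={"amount_cents": amount_cents},
                headers={"Idempotency-Key": idempotency_key},
                timeout=5.0,
            )
            resp.raise_for_status()
            return resp
        except httpx.TransportError:
            if attempt == attempts - 1:
                raise  # give up and surface the failure to the user
```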
There's this funny section in the SRE book from Google that I always remember.
It's in a chapter called Embracing Risk.
They're talking about you want to calculate what your risk tolerance is.
And so this is the very first sentence in that chapter.
You might expect Google to try to build 100% reliable services, ones that never fail.
It turns out that past a certain point, however, increasing reliability is worse for a service and its users rather than better.
Why?
They write extreme reliability comes at a cost.
And there's a funny thing that shows up immediately after this that I always remember.
They say, further, users don't typically notice the difference between high reliability and extreme reliability in a service because the user experience is dominated by less reliable components, like the cellular network or the device they are working with.
Put simply, a user on a 99% reliable smartphone cannot tell the difference between 99.99% and 99.999% service reliability.
I always love this because here's what they're saying.
Yeah, you could try to make your service more and more and more reliable.
But past a certain point, the user has no clue.
So don't even bother.
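The arithmetic behind that point is easy to check: once the user's own device or connection caps reliability at 99%, extra nines on the service side disappear into the noise. A tiny worked example:

```python
# End-to-end success rate is roughly the product of each component's availability.
def downtime_per_week_hours(availability: float) -> float:
    return (1 - availability) * 7 * 24

phone = 0.99  # the "99% reliable smartphone" from the quote
for service in (0.9999, 0.99999):
    end_to_end = phone * service
    print(f"service {service:.5f} -> end to end {end_to_end:.5f}")
# Prints roughly 0.98990 vs 0.98999: the user can't tell the difference.

print(f"{downtime_per_week_hours(0.99):.2f} hours/week at 99%")  # 1.68, the ISP joke later on
```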
Yeah, I suppose there should be a fallacy in here or maybe it's implied by some of the other ones that you shouldn't assume that all segments of the network operate the same way.
Well, that sounds like topology homogeneity.
That's a homogeneity thing.
That's what that is, right?
Yeah, I guess that's kind of implied by number eight is just because there's a highly reliable network between building A and building B doesn't necessarily mean that your computer is going to get to the gateway.
Let me give you another Google quote from the same chapter farther down where they're talking about, let's calculate how much it would cost to produce an increasingly reliable service.
This is a quote from the Google SRE book from the chapter Embracing Risk.
"One useful strategy may be to consider the background error rate of ISPs on the internet if failures are being measured from the end user perspective and it is possible to drive the error rate for the service below the background error rate.
Those errors will fall within the noise for a given user's internet connection.
While there are significant differences between ISPs and protocols, for example, TCP versus UDP, IPv4 versus IPv6, we've measured-- this is Google-- we've measured the typical background error rate for ISPs as falling between 0.01% and 1%.
I find that to be hilarious.
[LAUGHS] They did all this data collection to figure out what the typical error rate for internet service providers is.
And it's up to 1%.
So you know what?
Your service could just fail.
And most of the time, the user would be like, oh, AT&T, or my ISP sucks.
That's freaking funny.
Yeah, my ISP actually runs these commercials where they talk about how their network is 99% reliable.
And I always think that's almost two hours a week.
[LAUGHS] Exactly.
Isn't that hilarious?
So if you're Google, you could be like, yeah, let's make this service super reliable, but let's not go crazy because the freaking ISP is going to break.
And they're not going to get Gmail anyway.
They call it the background error rate.
Now, the fallacy that we started with at the very beginning-- the network is reliable.
And that's what I want to ask you.
Is the network reliable, Mike?
For some definition of reliable, yes. 99% of the time.
It'll be pretty good. 99% of the time.
Yeah, I mean, I have trouble with all of these.
I think this one, in particular, the question I come back to on this is, what do you mean by the network?
OK, all right.
That's a fair question.
I just would assume any network to call.
But what you're saying is that's too fuzzy.
What are my options?
Yeah, I mean, if you're talking about the network in your building or even the network, maybe between your buildings, if you work at a company big enough to have multiple buildings, I'd expect that to be fairly reliable.
But once you're going out through your ISP and it's going through various hops that you don't know and can't control, then I would expect it to be a lot less reliable.
If I've got something deployed in a particular-- Cloud region, thing like that?
Yeah, what's the word?
Not region, but-- Availability zone?
Yeah, availability zone type of thing.
So if I've got multiple machines in the same availability zone, I'm probably going to expect networking between those things to work pretty reliably.
That'd be pretty fast.
Yeah, well, not necessarily zero.
But I mentioned this to you earlier that I've read something recently where inter-machine network speeds can be almost as fast as memory access now.
So if these things are connected and there's nothing more than a switch between them, I expect that to be pretty fast and pretty reliable.
Again, I think the key to a lot of these first three issues-- I mean, bandwidth is never infinite.
I think we can agree on that.
But the parameters on these first three is what do you mean by network?
When I see latency near zero, that's the one that lands when I read this list.
There's one other, which I'll get to in a second.
That's the one that lands.
Because if I spin up a bunch of cloud resources, I've got some on the West Coast, some on the East Coast.
I am baking in some latency when I'm cross talking between those things.
Further, if I just deploy everything into a single region in the US because I've got a smaller budget, totally fine.
That's not a judgment on reliability or whatever.
This is a company restriction sometimes.
I'm going to put everything into one region in a cloud provider.
Maybe I pick West because that's where my offices are.
I got to know that if there are people in London accessing my website, and it's going to go all the way over to this cloud region on the West Coast, and I don't have a CDN involved, I've got to pay that latency.
They're going to wait for those requests to traverse half the world, or maybe it's coming from Delaware, and the ping time is already 20, 30 milliseconds or whatever.
That's baked in.
I think that's relevant, and it is easy to ignore that.
I guess I don't know how often people ignore it, but it's baked in there.
Yeah, Deutsch makes the comment in that podcast, too, that you can have a pretty responsive network that still have some latency issues just because of the amount of data that's being returned to you from a service call.
But it seems like trivially obvious that latency is not zero, but there seems to be a very wide distribution of outcomes on the latency, depending on, again, on what you mean by network.
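To put a rough number on that London-to-West-Coast case: the speed of light in fiber (roughly two-thirds of c) puts a hard floor under the round trip before you add any routing, queuing, or server time. The distance below is an approximate great-circle estimate, so treat this as a back-of-the-envelope sketch:

```python
# Back-of-the-envelope lower bound on round-trip time; real paths are longer
# and slower, so actual RTTs will be noticeably worse than this floor.
SPEED_OF_LIGHT_FIBER_KM_S = 200_000        # roughly 2/3 of c in glass
LONDON_TO_US_WEST_KM = 8_600               # approximate great-circle distance

one_way_s = LONDON_TO_US_WEST_KM / SPEED_OF_LIGHT_FIBER_KM_S
print(f"best-case RTT ~= {2 * one_way_s * 1000:.0f} ms")   # ~86 ms, ignoring everything else
```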
Okay.
I guess I also, this one lands for me because I think I have encountered people who don't think about latency, server to server, or network to network, whatever, data center to data center.
Okay, let's talk about security.
Number four is the network is secure, and then actually in that podcast episode where he's interviewed, he actually added another one.
I don't know if you're allowed to do this.
You come up with this list of great things in these fallacies in the 90s, and 30 years later on a podcast, you just extemporaneously lob a new one onto us.
It's like, "Wait, wait, wait.
We're just getting used to these others."
You put them on PowerPoint.
They became famous.
I'm still trying to deal with the burden, the meaning of these, and there's another one.
Anyway, security comes up, and number four, the network is secure.
And that's like raw, again, infrastructure.
Can we attack the network?
And number nine, the party you are communicating with is trustworthy.
And I got to say, number nine lands for me.
Number four, no, not so worried.
I don't worry much about people attacking the raw network.
I think I'm a little worried about it, but I don't think there are many attacks like that.
So when I see these described as fallacies, what I think that means is that people using the network or people writing applications on the network believe that the network is secure, and so they are ignoring that aspect of their application or whatever.
Okay, all right.
That sounds problematic.
Yeah, that sounds troubling, yeah.
And I don't think people do that anymore, or maybe they do because the networks are significantly more secure than they were in the '90s.
So they're lulled into a false sense of confidence.
I don't know what Deutsch says.
This is a quote.
"When networking was new, these small networks all had near zero latency, and all the networks had fairly well established trust boundaries."
Hmm.
Yeah, I mean, he's an old guy like me, even older than me, I think.
That's possible.
And so, you know, I think he goes back to the ARPANET days, too, where you had a handful of computers on a local area network, and the link between sites was probably from, you know, one high security facility to another high security facility.
So now maybe people, like, in our work, we kind of assume now that when we're talking to a service across the internet, it's probably TLS.
Yeah, HTTPS.
That's a huge one.
Before it was unencrypted traffic, bouncing around server to server, HTTP.
And it's been many years since that assumption that you would use HTTPS has just become completely normalized.
He does talk about, in that podcast, the fact that end-to-end encryption is still not as common as you would expect it to be, possibly because some of these businesses rely on the ability to read your data.
Oh, like with email.
But I'm not an application developer using email as my deployment platform.
Fair enough.
I guess I do send data via email from my application, so maybe that was a naive knee-jerk response.
I just had.
Like, we've even been assuming TLS connections between pods and Kubernetes, or at least between services and Kubernetes.
Well, that's mutual TLS from-- if you use a service mesh, we have used Linkerd for many years in our Kubernetes clusters.
And Linkerd enforces pod-to-pod traffic using MTLS, mutual TLS.
So it's not just a client ascertaining that the server is who they say they are.
It's the server also saying, yes, I know who the client is.
And we're going to encrypt the traffic between those two.
That's the thing we get, quote unquote, for free, for using that service mesh.
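Outside of a service mesh, the same mutual-TLS idea looks roughly like this with Python's standard ssl module: the server presents its own certificate and also requires and verifies a client certificate. The file paths are placeholders, and this is a sketch rather than a full server:

```python
# Sketch of server-side mutual TLS: encrypt the connection AND verify the client.
import ssl

def mtls_server_context(cert: str, key: str, client_ca: str) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert, keyfile=key)   # prove who the server is
    ctx.load_verify_locations(cafile=client_ca)       # CA we trust to sign client certs
    ctx.verify_mode = ssl.CERT_REQUIRED               # reject clients without a valid cert
    return ctx

# ctx = mtls_server_context("server.crt", "server.key", "clients-ca.crt")  # placeholder paths
```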
But maybe here's another one.
You talk Kubernetes.
I build some service.
You need some data from my pod.
Then, cool, I'll let you have it.
The assumption that often underlies that is trust: application developers don't often think, well, what happens if someone spawns a malicious service in this trusted network?
That question often doesn't come up.
Should people worry about that?
Yeah, I mean, I think we have used the service mesh and some access controls within the service mesh to deal with that particular issue.
But it is one of those things where you're like, OK, I can imagine a scenario where somebody launches a service in my Kubernetes cluster.
But they had to work pretty hard to get there in the first place.
So it's not like that's a trivial thing to do.
So you're saying respect.
They can do what they want.
They pass the test.
[LAUGHTER] Not necessarily that if they get past your guards, then you should allow them to do what they want.
But I think the idea that if you are doing that within a Kubernetes cluster, you've already bypassed some levels of security.
So it's not like this is a simple thing for an attacker to do, even if you don't have mutual TLS.
That sounds like when people say, well, if you got to root on Linux, then you could run this particular exploit.
And the response is almost always, wait, wait, wait a minute.
If you got to root, you can then do that.
Let's worry about the first thing.
Yeah.
What we're talking about here is what people often talk about is zero trust networking.
You can't just trust things in your network.
If I'm on a Kubernetes cluster, I can't just trust other pods, even pods in my same namespace or whatever.
I don't know.
I read this list of the nine.
The ones that land for me are the assumption that latency is near zero, the assumption that the party you are communicating with is trustworthy, and maybe I work with people who assume latency is near zero.
I don't think it's common to assume that the party we're communicating with is trustworthy.
What do you think about that?
You're saying you think that people assume that the party is not trustworthy?
Yeah.
It's common for us to write applications where we're like, OK, who are you?
And you're going to send me a header, and I need to make sure that that's secure.
And yeah, people might make mistakes, but it's really common.
It's OAuth2 or OIDC or some of these other things.
And these practices are so ingrained in what we're doing that I think it has coached us to assume that the requests we're getting may be untrustworthy.
Is that naive, what I'm saying?
No, I think that's right.
I was thinking about this in a different way, sort of in the Byzantine consensus mode.
I had a feeling you were going to go with Byzantine fault tolerance.
Yeah, I assume that's what this meant, was that you can't always assume that the thing you are communicating with is not trying to do something malicious.
But yeah, you're probably right that if you're dealing with some sort of commercial service and you've set up OAuth and authentication, and you have tokens in the headers and stuff like that, you're sort of making the assumption that there could be somebody doing-- trying to-- that they would try to do something bad.
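A small sketch of what checking those tokens in the headers can look like in practice, using the PyJWT library. The signing key, audience, and algorithm here are hypothetical stand-ins for whatever your OAuth2 or OIDC provider actually issues:

```python
# Sketch: never accept claims from a bearer token without verifying its signature,
# expiry, and audience against your identity provider's public key.
import jwt  # PyJWT

def verify_bearer_token(token: str, public_key_pem: str) -> dict:
    try:
        return jwt.decode(
            token,
            public_key_pem,
            algorithms=["RS256"],        # pin the algorithm; never accept "none"
            audience="my-api",           # hypothetical audience for this service
        )
    except jwt.InvalidTokenError:
        # Expired, wrong audience, bad signature: treat the caller as untrusted.
        raise PermissionError("untrusted or invalid token")
```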
Or we've done things to make sure that people can't call services.
Like if we set up webhooks, we've set up things to make sure that if people try to call that webhook, they can't get anything useful.
Or that we've-- you make assumptions that people might try to deduce your webhook or something like that.
So you mentioned Byzantine consensus.
That's a pretty heady concept there to introduce to this podcast.
But we're just talking about some random PowerPoint slide somebody wrote in the '90s.
When I read the title of this-- I'm kidding, by the way.
I respect that these fallacies are out there.
I respect the history of them and the lessons encoded in them.
But when I read the title, the fallacies of distributed computing, I think of distributed systems research.
And these don't read to me like the nine fallacies of distributed systems.
They're sort of implicated.
They're sort of involved.
But if I'm thinking about the assumptions that my peers would have about distributed systems, where they would be in error, I'm not talking about most of stuff on this list.
And then that leads me to the question, like, Mike, I want to charge you with coming up with the PowerPoint that people in 30 years are going to come up to you and be like, that slide changed my life.
It was about distributed systems.
What do you put on there, Mike?
What are your fallacies?
Well, I think maybe one fallacy that I would add is assuming that your distributed application is going over a network.
There's really nothing inherent in the idea of distributed computing that says it has to go over a network.
I mean, I think fundamentally the idea of distributed computing is you've got things on multiple computers, right?
So some of the ideas are just basically you've got multiple processes.
Oh, that's an interesting point.
So what you're saying is, hey, that broken program you made, you could actually make it run on a single computer.
And it would still be broken.
Right.
So if you look at some of the things that Lamport was talking about in the early days, the timing things and so forth.
The happens-before relation.
Even if you've got two processes on the same machine, you still have to deal with those.
And so I guess maybe I misstated it.
But the idea is that distributed systems issues still come up even when you aren't talking over a network.
Well, I started thinking about this because of the FLP result.
FLP result from 1985.
It's called FLP named after Fisher, Lynch, and Patterson.
These are luminaries in computer science.
They proved that in an asynchronous system, if even one process can fail, you can't guarantee reaching consensus.
The three properties we're trying to achieve for consensus are termination, validity, and agreement.
Sorry, I'm kind of running through this very, very fast.
So it's not a great example.
So I started thinking about the FLP result as a-- here's the fallacy, as I would state it dumbed down.
Hey, if you want a bunch of services to agree on a value, it's kind of hard.
And the CAP theorem is part of that too, right?
Yeah, I think the CAP theorem itself-- I mean, it's not a fallacy, but maybe the idea that the CAP theorem refutes could be a fallacy.
Yeah, similar.
I think FLP and the CAP theorem are related; the CAP theorem has been interpreted as an extension of FLP.
So it's hard to get services to agree on something.
Here's another fallacy.
The cloud providers are not your friends.
They will charge you to have data leaving their networks.
Their primary business model is vendor lock-in as a service.
That's a distributed computing fallacy.
Yeah, that's totally true, though.
I would keep "latency is near zero" and "the party you are communicating with is trustworthy."
What do we got?
Do we got enough for a slide to make us famous?
No.
No.
Well, Peter Deutsch, I think he did contribute something useful by putting these together.
I may sound like I'm trivializing his contributions.
I mean, the guy invented JIT compilation.
That's pretty rad.
So I definitely don't mean to trivialize his contributions.
But it is hard not to read these and go, man, are these that relevant?
And as an application developer, what do you want me to do about them?
I did not find anything where somebody had expounded on these topics in greater detail.
I mean, obviously, there's that podcast where Deutsch is talking about them.
But I don't know, did anybody ever write a book on these particular issues?
I don't know.
I don't know.
I read them and I think, man, I really want to know more about the potholes people step into when they do distributed systems.
But in order to answer that, you got to go and read the Landport saying, you can't just rely on the clock time.
You can't just start a monotonic clock.
You need linear-- like that type of stuff.
But those aren't pithy enough.
It's hard for me to say on my PowerPoint slide, wall clock time is not your friend.
I guess that is pretty pithy.
But people are immediately going to be like, what do you mean by that?
And we're like, oh, you got to read what Lamport said.
And somehow, special relativity is involved?
Yeah, and if I have to explain it, it's a lot less pithy.
So just go with it.
Just go with it, yeah.
There is a "falsehoods programmers believe about time" list.
That's pretty good, too.
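Since Lamport came up: the usual answer to "wall clock time is not your friend" is a logical clock, a counter that only has to respect the happens-before relation and never consults the system clock. A minimal sketch of the idea:

```python
# Minimal Lamport clock: ordering by causality (happens-before), not wall-clock time.
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: just advance the counter."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Stamp an outgoing message with the current logical time."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """On receive, jump past the sender's timestamp, then tick."""
        self.time = max(self.time, msg_time)
        return self.tick()
```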
Anyway, we're kind of drifting here.
That's our jam, man.
We just start off in one direction and then drift elsewhere.
You know what?
I want to thank you because I came to you looking for absolution with my gut response of, I really don't want to have to consider most of these.
And you said, yeah, yeah, they're all obvious anyway.
Don't worry about it, my friend.
And I can go on my way in peace.
This is very like a religious ceremony here for me.
Thanks so much.
Bless you, my son.
Should we just interject here on a totally different topic?
Should we mention to our vast listening audience that we are at basically one year of podcasting now?
Oh, yeah.
Wow, one year on air.
That is amazing.
It didn't occur to me.
That will be this year later.
Just to be clear, not continuously.
I mean, we've stopped and taken breaks.
But yeah.
I don't think we really take a break.
So we've been pretty consistent.
We've had vacations.
I don't think that's the same as stopped and taking breaks.
I was thinking of an old Steven Wright joke; people probably don't know who he is anymore.
But he was kind of like this one liner comedian back from the '80s and '90s.
And he has this joke about how he goes to a grocery store and it's closed.
But on the sign, it says, open 24 hours.
And so when it's open again, he goes back and it says, hey, it says open 24 hours.
And the guy who owns the store says, not in a row.
OK, Mike, how do we celebrate our one year anniversary?
Not one year in a row.
I don't know.
I'm kind of hoping that alcohol is somehow involved.
OK.
All right.
Well, we're going to hang out tomorrow.
We're going to see each other in person.
Usually there's a coffee break involved there.
We should come back with our greatest hits, maybe.
That's what they do on television shows when they run out of ideas.
They just sort of recycle.
Actually, you know what we could do?
We could go back and listen to our earlier episodes and be like, oh, man, that was so embarrassing, what I said.
That was so wrong.
Why would anybody listen to me?
That type of thing, that sounds like a lot of fun, doesn't it?
Yeah.
We could do like the 10 dumbest things I've said over the last year.
We could call it the greatest misses.
Oh, yeah.
I like that.
All right, my friend, thank you for podcasting with me.
I will see you again next week.
This has been Picture Me Coding with Erik Aker and Mike Mull.
Thanks so much, Mike.
Thanks.
See you next time.
Bye-bye.
[MUSIC PLAYING]