
Picture Me Coding
Picture Me Coding is a music podcast about software. Each week your hosts Erik Aker and Mike Mull take on topics in the software world and they are sometimes joined by guests from other fields who arrive with their own burning questions about technology.
Email us at: podcast@picturemecoding.com
Patreon: https://patreon.com/PictureMeCoding
You can also pick up a Picture Me Coding shirt, mug, or stickers at our Threadless shop: https://picturemecoding.threadless.com/designs
Logo and artwork by Jon Whitmire - https://www.whitmirejon.com/
Friends and Relations
We're talking about databases again. Or database management systems, we're not totally sure. In any case, they are relational databases (or database management systems).
The relational database has been the go-to system for storing structured data since the 1980s, and is still the most popular type of system to use for applications and business reporting. We discuss their history, what makes them relational, and our experiences with some of the better known commercial and open-source relational systems.
A Relational Model of Data for Large Shared Data Banks - E. F. Codd
Dr. Michael Stonebraker - A Short History of Database Systems
Erik
Hello, welcome to Picture Me Coding with Erik Aker and Mike Mull. Hi, Mike.
Mike
Hey there.
Erik
Mike, you have a larger volume of birdsong behind you today. It's pretty pleasant, summer birdsongs.
Mike
Interesting. I am in a different room than normal, and a window is probably open. I hope people like birdsong.
Erik
I bet people like birdsong in the back of their podcasts. I bet I would, although I don't know for sure if I've heard many podcasts where people are talking and there are birds singing behind them. I wanted to mention something right off the top here. Mike and I have been very busy the last couple of weeks. We both have jobs in addition to this show. This show is a hobby for us. And for both of us, the work environment has become quite exhaustingly busy lately. As a result, we're going to start doing this show every two weeks. We're going to release one every two weeks instead of every week. You may have already noticed we've been a little slower on the cadence. So I apologize for not announcing this earlier. But Mike, I have a question for you. If we do it every two weeks, is that bi-weekly or bi-monthly?
Mike
I believe it is bi-weekly and semi-monthly.
Erik
Okay, bi-weekly. So bi-weekly is our new intended publishing schedule for this show. So if you are looking forward to hearing us every week, unfortunately we are not going to be able to keep that up over, probably, at least the next six months. But we do appreciate you listeners. We appreciate you checking out the show and reviewing it and commenting on it and telling your friends about it. If you like the show, we have Patreon subscriptions available where you can support what we're doing for $4 a month, the cost of a cup of coffee. You can actually send some support to the show and help us pay for our hosting costs, which are not too expensive. And that's why we have really just a $4 a month support membership level. Thanks so much again. So Mike, I told you recently about a record. I don't think my interest in it approaches obsession level, but I love it and I can't stop listening to it. And I can't remember another record I've heard this year that I love this much. This is actually a record called Lafandar by the artist Heems. I'm pretty sure I made you listen to this. I made you listen to this, right?
Mike
You sent me a song and I ended up listening to it and I also like it.
Erik
I love this record. There's so much funny, clever stuff on it. There is a cultural feel that is not common, I think, in hip hop from this country. There are Bollywood references and references to culture in India and all kinds of stuff. There are some great lines on it. There's a song called Yellow Chakra, where he raps, I'm like a city during COVID, said my bar's infectious. I really like that line. That's pretty good. And then actually on that same song, Open Mike Eagle's on it, and he says, I'm track three from It Was Written. And I was like, whoa, this is giving me goosebumps hearing it. That's one of my favorite songs. And that's quite a flex, to claim that you are that song. Like, what a bold hip hop claim. I've never heard a flex that bold, I don't think. It's a good one. It's bitchin'. There's Porches, which I think is my favorite one. And it's very clear sounding. It reminds me of 90s hip hop, like just this crystal clear sounding message from out there in the world. And then finally, there's a song called Bukayo Saka. And he talks about the Mahabharata. And then he says this hilarious thing. He says, I roll around with two glizzies like I'm Slavoj Žižek. I'd never heard glizzies before, but apparently that's the word for hot dogs. Slavoj Žižek is a cultural critic. I think he's Slovenian, and he's notorious for saying weird, out-there stuff that's hard to follow. Anytime there's some cultural moment, he's just off to the sidelines, saying his weird culture critic stuff. So he kind of became a little bit of a meme on the internet. And apparently there's a video of him walking down a street in New York, just greedily eating two hot dogs, one in each hand. So this line is, I roll around with two glizzies like I'm Slavoj Žižek. There's so much that makes me laugh on this record, and the music is so good. So if you like hip hop, you've got to check it out. Heems, Lafandar.
Mike
Yeah, I enjoyed it too. Ironically or coincidentally, I'm not sure what the appropriate word is, but it's been a while since I've heard a rap album that I really liked. And by a weird coincidence, my pick this week is also a rap album. It's one that I somehow missed. It came out back in March. It's called From the Private Collection of Saba and No ID.
Erik
Oh, I liked Saba's, that first album by Saba I liked.
Mike
Yeah, I like a lot of his stuff, and No ID as well. He's had a couple of excellent albums, and they're both from Chicago and are sort of affiliated. And, not surprisingly from the title of this album, this is mostly Saba rapping, but it's interesting both musically and lyrically, just kind of like your choice this week. There are a lot of interesting lyrics on here, which unfortunately I can't quote because they are not podcast appropriate. But yeah, I really enjoy this. It's one of those that you can just listen to over and over again and pick up new stuff. And it's got quite a bit of variety on it and quite a few interesting collaborations and so forth. So yeah, give it a listen.
Erik
I feel like we're trading hip-hop records this week. I really like that. We're meeting up, we're in front of the record store and I'm shoving one in your hand and you're shoving one in my hand. I can't wait to go home and hear it. Oh, I am home.
Mike
I think you will like this one. I would be very surprised if you heard this and decided that it was not a worthy recommendation.
Erik
I'm eager to check it out. Last time we met, we discussed the career of Jim Gray, and I actually really enjoyed that discussion. I mean, I enjoy our discussions usually. That's why I enjoy doing the show so much with you. But I learned a lot about Jim Gray, and I was really impressed by his career. And I wanted to continue the discussion with you about the history of databases. Databases are ubiquitous in our work, and I feel like they're a tool whose history I don't think too much about. They're just a thing that I use as if they were always there. And that's another thing I like about hearing your perspective and your experience, because when I talk to you, I remember, oh yeah, they weren't always there. They weren't always this tool that everybody just used for all of their problems. So this week, we're going to continue our discussion about relational databases and the history of relational databases, moving on from the work of Jim Gray and talking more about where the relational databases we use on a day-to-day basis came from and what their history was. Does that sound good to you?
Mike
It does. I thought maybe we could also make an attempt to define what a database is. We made a sort of vague attempt at that in the last episode, but I looked up a few definitions from various vendors that I thought maybe we could mention.
Erik
Okay, well, I don't want to do that, but I would be happy to sit back and listen to you do it.
Mike
All righty. So like Erik said, we're going to talk about relational databases this week, but first of all, let's talk about what a database is in general, to sort of distinguish it from the notes that you keep in your editor and stuff like that.
Erik
Or the thing you're going to hiss at, my Excel spreadsheet, Mike?
Mike
Yeah, first of all, I suppose we should also mention that, you know, people typically use the word database to mean like the actual data. IMDB is a database of movie information. We sort of informally use the term to also mean the database management system. Hopefully it will be clear from context, but a lot of times when we say database, what we really mean is database management system.
Erik
Yeah, actually, I'm a little fuzzy on that. Why are you making that distinction? Like I have a table and a database. Where do I apply the word database? Where am I applying it wrong? I say database is a collection of tables, but you're saying that's a database management system?
Mike
Well, I just think that a lot of people, when they hear the term database, especially if they don't work in the field, they probably think of a collection of data related to a certain thing, health database or a protein database, or like I said, a movie database or something.
And the thing that we're going to talk about this week in particular is sort of formally called a relational database management system. Very often, people in our profession would just say, hey, what database are you using? And what they mean is the actual management system.
Erik
Well, I guess I've seen the acronym RDBMS a lot, but I never even think about the MS.
Mike
And, you know, the term is kind of fuzzy. So I just want to make clear for people who are not full-time professional programmers that we do use the term database informally to mean the database management system.
Erik
All right. All right. I don't know if I'm making any progress on understanding, but let's just keep going. We'll just charge blindly ahead. I don't understand. Let's move on.
Mike
Interestingly, a lot of these definitions use the more informal language. So anyway, from an IBM website, a database is a digital repository for storing, managing, and securing organized collections of data.
Erik
Okay. Storing, managing, and securing organized collections. That's a pretty basic definition that I can get behind that one. That makes sense to me.
Mike
Here's a much more complicated one from AWS. A database is an electronically stored systematic collection of data. It can contain any type of data, including words, numbers, images, videos, and files. You can use software called a database management system to store, retrieve, and edit data. In computer systems, the word database can also refer to any DBMS, to the database system or to an application associated with the database.
Erik
Hmm. Okay. Systematic is the word that jumps out to me there. Systematic collection of data. They apparently haven't seen the databases I've worked on.
Mike
Yeah, fair enough.
Erik
But to an application, you can call the database an application associated with the database. You would call it that. That sounds a little weird to me. No need to pick nits on that particular. That's just weird. I'm not sure. I like the IBM definition a little better so far. I'm storing, managing, securing junk.
Mike
Here's one from the Mongo website.
Erik
Are they a little biased, though, the Mongo site?
Mike
Probably, yeah.
Erik
Okay, all right, all right. I'll stop throwing rocks at it.
Mike
Mongo is not a relational database system, so maybe this is a little skewed. But formally, a database is an organized collection of structured or unstructured information stored electronically on a machine, locally, or in the cloud. Databases are managed using a database management system. The DBMS acts as an interface between the end user and the database. Databases use a query language for storing or retrieving data.
Erik
I like this better than the AWS one, but I'm not sure what I'm really getting out of the distinction between a database and a DBMS, like a database management system. I've got a bunch of databases on one server. I want to manage them. The MS, I just feel like it's kind of meaningless to me. Are we, I don't know. Am I being too naive? Do I need to care about the MS?
Mike
Here's what I'm thinking. You know, suppose that you're a hobbyist who is into, you know, art or something like that. And so as you're clicking around on the internet, you see images of paintings from the 19th century, which is your particular thing. And so you store them on your hard disk locally. So you have this collection of files in a folder. It would be reasonably legitimate to say you have a database of images of art.
Erik
Oh, okay. So we add the MS because the term database is kind of overloaded.
Mike
Right. There are certain things that you're missing with your file collection. In other words, there's no metadata that identifies, you know, who the artist is and what the painting is, necessarily. You might have to open up that file and look at it to determine who the artist is for that painting.
Erik
This might annoy people listening to this, but the MS just feels kind of pedantic to me.
Mike
It's a vague term, I think, maybe intentionally.
Erik
All right, I'll stop fighting it. I'll just accept it. We're talking about RDBMSs. Relational today, not even just DBMSs, right?
Mike
Relational DBMSs. And so what makes something relational is, I guess, the next topic of discussion.
Erik
Okay, so what's a relational database? Now, I think these are terms, of course, that a lot of our listeners are going to know. They're going to at least have examples come to mind for relational databases: Oracle, MySQL, PostgreSQL. We've talked about Postgres a lot on the show. We talked about Jim Gray's work, and we refer to databases a lot. When you first started working with databases, were they always relational databases or did you work with other weird things? Maybe I'm already prejudicing the answer by throwing in that adjective weird.
Mike
Yes and no. And this kind of goes back to the weird language. So the first database management systems that I encountered were relational. I did not have any experience with the things that came before them, which is why we're not talking much about them. And I did not, until much later, have experience with the things that came after relational systems. I did have exposure to things that were called databases, but did not necessarily have this type of management system.
Erik
Okay, I'm starting to get where the pedantry comes from. This is an old guy thing. This is a, hey kids, we used to call this a database too, okay?
Mike
So for instance, there was, and still is, a well-known thing called the PDB, or the protein database, which was a...
Erik
I thought that was a Python debugger.
Mike
It might also be that.
Erik
Yeah.
Mike
But back in the 80s when I started out, I was working with people who were interested in proteins. And there was this file format called PDB, and it was a format that would give you information on a particular protein, the amino acid sequence and that kind of thing. But it was not a relational database system. It was not a formal database management system. You had a tool where you could sort of search the database a little bit, but it was not a relational database.
Erik
Okay, so like object storage, like blob storage in Azure, cloud storage, file storage, Google, right? S3.
Mike
So what you would get out of it was basically a file.
Erik
So S3 is not a relational, well, obviously not relational. It's not a database management system. You could search by prefixes, but that's not a database management system.
Mike
Yeah, that's a good analogy. Most people, if they work in this field, probably are familiar with S3 and other types of object storage. And those are, you know, big systems that store data, but they are not database management systems.
Erik
Even the 10 nines, you would think that after enough nines, you could just call it whatever you want.
Mike
That's true.
Erik
Oh, okay. So your first relational database then? What was the first one that you dealt with that was relational?
Mike
So the first relational database that I encountered, which is kind of surprising, was Postgres.
Erik
Wow. About what time was that, do you think?
Mike
So this was early 90s. Postgres was still fairly new.
Erik
I thought it was created in the 90s. Is that not right? We're getting to this later, probably. When was it created?
Mike
So it officially started in 1986.
Erik
Oh, okay.
Mike
And we will talk about this a little bit more later. So I kind of vaguely mentioned in the last episode that I was involved briefly in this thing called Sequoia 2000. This was a project that was started in the 1990s with the objective of reaching certain milestones by the year 2000, because that was, way back then, still a thing in the future. One of the things that I and my boss were trying to do was demonstrate that you could store protein data in a relational database. And one of the participants in that project was also Berkeley, University of California, Berkeley, which is where Michael Stonebraker, one of the creators of Postgres, was working.
Erik
Yeah, and you were like on email threads with him, I remember you saying.
Mike
That's right. Yeah, we got into a flame war on an email thread. But so it was a relational database. It was taking ideas from its predecessor, which was called Ingres, which we will also talk about later. But he was also trying to add features to it that looked kind of object-oriented, because that was kind of in vogue at the time, this idea of object-oriented databases. So he called it a relational object database. And the idea was that you could have some sort of OO features, but in this underlying relational model. So it would have things like fields that could compute things from other fields or, you know, fields that kind of did implicit joins to other data and that kind of thing.
Erik
Okay, that's pretty interesting. So you could imagine what the equivalent of a dotted lookup of an attribute would be, and below that is a computation.
Mike
Yeah, exactly.
Erik
That's pretty cool.
Mike
Yeah, and it didn't really persist in Postgres, but it was a nice idea. And so we were trying to use Postgres because it was his research project, and we were doing this research project, and it seemed like it might be a good fit. And so, yeah, so in a sort of weird coincidence, I had access to Postgres long before it became a common thing.
Erik
That's rad. So when recruiters reach out and they're like, I need someone with 30 years of PostgreSQL experience, you can be like, I am one of a dozen people with 30 years of PostgreSQL experience.
Mike
That's right.
Erik
But you didn't use it for a long time, right? You just used it in that one project, and then you probably used other databases after that at work in the 90s?
Mike
Yeah, I used that for a while, and then when that project ended, I was exposed to a few different types of database systems, but probably by the late 90s, the main thing I was using was Oracle. Oh. Oracle was definitely, at that time, sort of the 800-pound gorilla of the relational database market.
Erik
Are you kind of ashamed now that you used Oracle, knowing how much Jim Gray hated Larry Ellison?
Mike
A little bit.
Erik
Weird reference right there.
Mike
Oracle was the database that I was using in the late 90s to early 2000s to work on various, what we called informatics projects back in the day.
Erik
I remember that word. You still see it in academic curricula.
Mike
Yeah, I think it's more common in Europe.
Erik
In bioinformatics, I guess. Oh, okay.
Mike
In Europe, that's kind of what they call computer science, as I understand it.
Erik
So I kind of got you off track a little bit. You want to talk about history of these, or do we still need to define our terms? We're talking about databases, database management systems. Do we need to talk about what's a relational database?
Mike
Yeah, I think we should talk a little bit about what we mean by that term.
Erik
Relational?
Mike
Yeah. I mean, a lot of people probably know, but we should probably make an attempt at formalizing it and defining it.
Erik
Okay. All right. Go ahead.
Mike
We mentioned last time when we were talking about Jim Gray that he worked for a guy named E. F. Codd in the System R group at IBM. And Codd wrote this extremely influential paper in 1970, which kind of laid out the ideas of relational databases and what would be called relational algebra. The paper was called, let me see if I have it, A Relational Model of Data for Large Shared Data Banks.
Erik
Yeah, I haven't read this, but I kind of want to read it because every time you start looking at the history of databases, everybody talks about this paper.
Mike
Yeah, I mean, it's a pretty famous paper and, you know, Codd is sort of almost synonymous now with the idea of relational algebra. But anyway, the idea was that data is stored as what he called relations, a relation in the mathematical sense. So if you've taken set theory, you've probably come across the term relation. And mostly you'll see binary relations like greater than or less than, things like that, where you have one set on the left side that's related to a set on the right side. In Codd's formulation, these relations are generally not binary, not exclusively binary. So they are n-ary relationships.
Erik
So it could be n-ary. So that means binary is like, I've got a thing on the left and a thing on the right. But in this case, we can have a lot of different so-called, maybe we'll think of them as arguments for functions in a way.
Mike
Right. And you can also think of them, as he does, as being like the cross product of a series of sets. And you're looking at choosing one thing from the first set and one thing from the second set and one thing from the third set and so forth.
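As a rough sketch of the "cross product of a series of sets" idea, the Python below builds every possible combination from three small sets and then picks out a relation from them. The sets and their values are made up for illustration. Strictly speaking, a relation is a subset of the cross product (only the combinations that are actually true), and each chosen tuple is what we would now call a row.

```python
from itertools import product

# Three small domains (hypothetical example data).
names = {"Ada", "Grace"}
cities = {"London", "New York"}
years = {1815, 1906}

# The cross product is every possible (name, city, year) combination.
all_combinations = set(product(names, cities, years))

# A relation is a chosen subset of those combinations: one element from
# each set per tuple. Each tuple corresponds to a row in a table.
born = {
    ("Ada", "London", 1815),
    ("Grace", "New York", 1906),
}

print(len(all_combinations))     # 2 * 2 * 2 = 8 possible tuples
print(born <= all_combinations)  # True: the relation is a subset
```

With three sets this is a ternary relation; adding more sets gives the n-ary case discussed above.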
Erik
Did SQL come from that paper or was that a related thing that he talked about at the time?
Mike
So the idea of the manipulations that SQL can do, and the idea of having a particular special language for doing it, kind of comes from Codd's paper.
Erik
Got it. Right. Okay.
Mike
The actual language would come from some other guys a couple years later. So anyway, relations in sort of informal terms are what we now think of as tables. So, you know, in relational databases, the sort of key elements are tables with rows and columns.
Erik
But they can be tied to each other in a way or linked to each other. Yeah. And that's a relation.
Mike
You can have, well.
Erik
And they can be tied to themselves. Am I muddying this already?
Mike
The relationship between tables is different than the relation.
Erik
Okay
Mike
The relation is entirely encompassed within a single table.
Erik
Data that's related in here.
Mike
So a row of a database is one element of an n-ary relationship, or sorry, an n-ary relation.
Erik
Okay.
Mike
I'm trying not to use the word relationship.
Erik
This makes sense. I think I lean too much on my fuzzy working knowledge of these tools. When you start applying more formal language to them, I kind of go, yeah, yeah, yeah, that makes sense. But then I go right back to my lazy workaday understanding of the relational database.
Mike
So some aspects of this are that you have a table with a bunch of rows and the ordering of the rows doesn't really matter. When you get down to the level of the actual code and the computer hardware and stuff, the ordering of the rows might have some effect on performance, but conceptually the order of the rows doesn't matter. Sometimes the order of the columns does matter though. In particular, every row has to have the same ordering of columns. Another important thing about this is that there's typically some subset of the columns that is unique across all rows. In other words, that group of fields is unique for that row, for every row in the table. And that's called a primary key. Codd went on to define sort of the operations that you could perform on tables. And this includes things that we think of now as being pretty common, like projection and joins and so forth.
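A minimal sketch of the primary key idea, using Python's built-in sqlite3 module with an in-memory database. The person table and its rows are hypothetical. Here the key is a pair of columns, and the database refuses a second row with the same pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        city       TEXT,
        PRIMARY KEY (first_name, last_name)  -- unique across all rows
    )
    """
)
conn.execute("INSERT INTO person VALUES ('Ada', 'Lovelace', 'London')")
conn.execute("INSERT INTO person VALUES ('Grace', 'Hopper', 'New York')")

# A second row with the same key columns violates the primary key.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO person VALUES ('Ada', 'Lovelace', 'Paris')")
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Row order is not part of the model, so ask for an order if you need one.
rows = conn.execute(
    "SELECT first_name, last_name FROM person ORDER BY last_name"
).fetchall()

print(duplicate_rejected)  # True
print(rows)                # [('Grace', 'Hopper'), ('Ada', 'Lovelace')]
```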
Erik
Yeah. When I first learned SQL, I had a class. I don't remember where it was. But I could have sworn in that class they said, this language SQL comes from sequent calculus. That might not even be right. And they started talking about selection, projection, and was it joins? Was it the third thing? There were three things, selection, projection, and...
Mike
Yeah, selection and projection are roughly the same thing. Joins is definitely one of the things in there. And as we know, there are multiple types of joins. Codd also talks about what he calls restriction, which is what we would think of as being filtering, or the predicates in the WHERE clause.
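To make those operations concrete, here is a small sketch using Python's built-in sqlite3 module. The artist and album tables and their contents are made up for illustration. In SQL terms, the column list after SELECT does the projection, the WHERE clause does the restriction (textbook relational algebra often calls that step selection), and JOIN combines rows from two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE album  (id INTEGER PRIMARY KEY, artist_id INTEGER, title TEXT);
    INSERT INTO artist VALUES (1, 'Artist A', 'Chicago'), (2, 'Artist B', 'New York');
    INSERT INTO album  VALUES (10, 1, 'First'), (11, 1, 'Second'), (12, 2, 'Third');
    """
)

# Projection: keep only some of the columns.
projection = conn.execute("SELECT name FROM artist ORDER BY id").fetchall()

# Restriction: keep only the rows that satisfy a predicate.
restriction = conn.execute(
    "SELECT id, name, city FROM artist WHERE city = 'Chicago'"
).fetchall()

# Join: pair up rows from two tables where a column matches.
joined = conn.execute(
    """
    SELECT artist.name, album.title
    FROM artist JOIN album ON album.artist_id = artist.id
    ORDER BY album.id
    """
).fetchall()

print(projection)   # [('Artist A',), ('Artist B',)]
print(restriction)  # [(1, 'Artist A', 'Chicago')]
print(joined)       # [('Artist A', 'First'), ('Artist A', 'Second'), ('Artist B', 'Third')]
```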
Erik
Oh, okay. So you could think of it like a giant Venn diagram, right? And you could just keep adding more circles, and then really what you're looking for are slices of the Venn. I mean, often the way people teach joins, when they're just kind of visually trying to get people who are new to writing SQL to understand what they're doing or how to query databases, is they'll use Venn diagrams: left join, right join, full outer join, inner join. That Venn diagram metaphor is really useful for this. But thinking in terms of sets is also useful. Set intersection, set difference, et cetera.
Mike
So some people call this relational algebra or Codd algebra, but it's really fundamentally sort of based on set theory. And you can sort of think of a lot of queries that you can form in SQL as being equivalent to defining a set. So another thing that we're sort of glossing over here is that SQL does have sort of multiple parts. So there's the querying part, but there's also typically some sort of data definition language. So you can define tables and define indexes on tables and that kind of thing.
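A short sketch of both points, again with Python's built-in sqlite3 module: a bit of data definition (a table and an index), then queries treated as sets. The listens table, the index name, the listener names, and the as_set helper are all hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe a table and an index before storing anything.
conn.executescript(
    """
    CREATE TABLE listens (listener TEXT NOT NULL, genre TEXT NOT NULL);
    CREATE INDEX listens_by_genre ON listens (genre);
    INSERT INTO listens VALUES
        ('ana', 'rap'), ('ben', 'rap'), ('ben', 'jazz'), ('cy', 'jazz');
    """
)

RAP = "SELECT listener FROM listens WHERE genre = 'rap'"
JAZZ = "SELECT listener FROM listens WHERE genre = 'jazz'"


def as_set(sql):
    """Run a query and return its single column as a Python set."""
    return {row[0] for row in conn.execute(sql)}


both = as_set(f"{RAP} INTERSECT {JAZZ}")   # set intersection
only_rap = as_set(f"{RAP} EXCEPT {JAZZ}")  # set difference
either = as_set(f"{RAP} UNION {JAZZ}")     # set union

# The same answers fall out of Python's own set operators.
rap, jazz = as_set(RAP), as_set(JAZZ)
print(both == rap & jazz, only_rap == rap - jazz, either == rap | jazz)
```

Each query defines a set of listeners, and the SQL set operators combine those sets the same way the overlapping circles of a Venn diagram would.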
Erik
So you may have a whole bunch of rows, which are just ordered tuples of data, for example, in the actual implementation. But we're naming those too. We're naming the index of the tuple as a particular column with a particular data type. And that data type is consistent.
Mike
Right. Yeah.
Erik
For all rows.
Mike
As you also implied earlier, you can have sort of complex data by having tables that are related to each other. And this is a complication in the language here.
Erik
Yeah, you were like, that's a relationship, not a relation, my friend. I was like, look, I'm still learning the world of relationships, okay? There's still mystery to me.
Mike
So one field, one or more fields in one table, can constitute what they call a foreign key, which is actually a key into another table. And one of the things that Codd also defined in this paper, which is now a fundamental part of relational databases, is what he called normal form.
Erik
That's in this paper?
Mike
As well, yeah.
Erik
Oh, I had no idea.
Mike
The idea here is that you want to structure your tables in such a way that you don't repeat too much data. You know, a lot of times you'll hear now people talking about how you want to achieve third normal form.
Erik
Third normal form. Yeah. Now, friends, if you're listening to us and you want to go to work and make your coworkers feel bad, this is one thing you can talk about. You could talk about normal form. If you've never heard of normal form, you look it up and then you go to work and you say, look at our databases, we're repeating ourselves a lot. We should be more normalized. What are we doing? This is shameful, repeating ourselves so much. And you could use this term, you could use this concept of normal form, to make people feel bad about how they're organizing their data. That's how it was used on me anyway, in the very early days of my career, as a cudgel. We're doing a bad job.
Mike
In most organizations, there's almost a guarantee that you will be violating third normal form somewhere in your database.
Erik
I think there was somebody at work one time, at least 10, 12 years ago, where we had one of those kind of database critics who was like, no, no, this is not normalized enough. We should be going for third normal form. And somebody who knew what they were talking about stood up and was like, that's not going to happen. It's virtually impossible. Leave us alone. Stop trying to make us feel bad.
Mike
I had a meeting once where we were talking about, you know, was our database in third normal form? And this one guy kind of sat back in his chair and he said, I think we should try to get to fourth normal form. And everybody just kind of went silent. And I still to this day don't know if that's a thing or not.
Erik
Oh, yeah. I thought it goes up to fifth. I think there are higher levels of normalization.
Mike
But anyway, the idea is that, you know, you don't want to repeat data in tables. So, for example, a common thing would be, well, let's say a person's address.
Erik
Yeah, I was going to go address example. Yep, that's what I was going to do. And then I was going to take shots at this example. Let's go for it, though. This is the one I remember.
Mike
Yeah, it's kind of a classic case. An address consists of all these multiple parts.
Erik
Let's just do U.S. and Canadian addresses. Usually a number, a street. A street number and a street.
Mike
The street and the zip code and maybe an apartment number.
Erik
Yeah, suite. A city and a state and a zip code or a postal code.
Mike
That's pretty standard. So you're going to have a number of...
Erik
But wait, Mike, you and I might live on the same street. What if your street address is in the database and my street address is in the database? Have we duplicated something because that's the same street? You're laughing.
Mike
Yeah, you're going to pull that out into a separate table so you don't duplicate street name.
Erik
It's not third normal form unless you put that street in its own table and reference it from our addresses, right? Sorry, I'm already taking huge swings at this design.
Mike
So this was probably taken considerably more seriously back in the days when they were defining this stuff. I mean, it is important. Don't get us wrong.
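The street example can be sketched with Python's built-in sqlite3 module. The table layout, column names, and sample rows below are assumptions made for illustration. The sketch only factors the street name out into its own table and points at it with a foreign key; it does not claim to reach third normal form, since city and postal code are still repeated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE street (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE           -- each street name stored once
    );
    CREATE TABLE address (
        id            INTEGER PRIMARY KEY,
        resident      TEXT NOT NULL,
        street_number TEXT NOT NULL,
        street_id     INTEGER NOT NULL REFERENCES street(id),  -- foreign key
        city          TEXT NOT NULL,
        postal_code   TEXT NOT NULL
    );
    INSERT INTO street VALUES (1, 'Example Street');
    INSERT INTO address VALUES
        (1, 'Erik', '12', 1, 'Sampleville', '00001'),
        (2, 'Mike', '48', 1, 'Sampleville', '00001');
    """
)

# Assembling a readable address now takes a join.
full = conn.execute(
    """
    SELECT a.resident, a.street_number || ' ' || s.name
    FROM address a JOIN street s ON s.id = a.street_id
    ORDER BY a.id
    """
).fetchall()

# The foreign key rejects an address that points at a street that doesn't exist.
fk_enforced = False
try:
    conn.execute(
        "INSERT INTO address VALUES (3, 'Nobody', '1', 99, 'Sampleville', '00001')"
    )
except sqlite3.IntegrityError:
    fk_enforced = True

print(full)         # [('Erik', '12 Example Street'), ('Mike', '48 Example Street')]
print(fk_enforced)  # True
```

The trade-off from the conversation is visible here: the street name is stored once, but reading an address back costs a join.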
Erik
Oh, yeah, I'm not making fun of it. It was just that I worked at a place where people were like, no, we need to put all the addresses in this table and the addresses need to be normalized. And we were like, why? And they're like, so we can figure out if people are close to each other. And I remember feeling at the time like, that sounds very complicated. Don't we have a simpler way of figuring out if people are close to each other? Like lat and long or something. I don't know. There's got to be an easier way to do that. You really want me to join all these tables to assemble a single address? Sorry, I'm fighting battles from 10 years ago. You can see the wounds are still fresh.
Mike
Back in 1970, when Codd wrote this paper, the storage medium was potentially like tape or something like that.
Erik
Tapes?
Mike
So the idea of making the database as minimal as possible was probably important.
Erik
All right. I should let that go. I'll let that go. It's okay. It's in the past. I forgive those people. I forgive myself. Let's heal and come together. Put all the data in one table, right? I don't need relations.
Mike
Yeah, and something that I got very good at over the years was denormalization.
Erik
You got good at duplicating stuff.
Mike
Sometimes it's useful to make things faster.
Erik
I feel like we're going off track here as we're fighting these old battles. You sent me an interview with Michael Stonebraker, and I enjoyed it quite a lot. The interview was from 2014 or something like that. He had won a Turing Award. And he talks about the history of databases. He says, in the 1970s, there was only one database market, basically business data processing. And the whole goal of data management was to make business databases work better. Relational databases were originally designed with that goal in mind. And that was the only market anyone really saw until 1990. Then I think over the next 15 years, it occurred to most everybody that they needed a database system. I didn't realize this. It's just like, yeah, we were just doing some dumb business stuff.
Mike
You know, in the days when they were defining this stuff, it was business applications, you know, accounts and inventories. And primarily in businesses that could afford the relatively expensive computers that you needed to run these things on. Now it's just kind of amazing how pervasive these things are. You know, at this point in time in 2025, relational databases are not the only choice anymore, but there's still major products like Oracle, MySQL, SQL Server from Microsoft, Postgres, and then all the cloud providers.
Erik
Specialized cloud offerings. AWS and Google and
Mike
Azure all provide managed versions of MySQL and managed versions of Postgres and Oracle sometimes. PostgreSQL. Things like that. I saw something that said that the market for relational database software as of 2024 was almost $22 billion.
Erik
Wow.
Mike
MySQL still has the biggest share of that.
Erik
Oh, wow. I'm actually surprised to hear that. Yeah. What's the market share percent that they gave?
Mike
As I said, MySQL has 41% of the market share, which I was surprised by, but I think maybe it's because it's kind of built into a lot of things.
Erik
Oh, okay. So 41, well, you know, it's sort of baked into WordPress and WordPress has a pretty dominant presence on the web still.
Mike
Yeah, I mean, stuff that I think of as being sort of old now, like Drupal, largely based on MySQL.
Erik
Yeah, that makes sense. Well, so do you have any idea? I mean, there were all these non-relational databases, NewSQL, NoSQL, people called it for a while. You talk about Mongo, their definition, Elasticsearch. Is that stuff eating into relational database market share, you think? Or is there still, I mean, I would imagine that most people building an app are going to start on relational databases. Do you think that's correct, that gut response?
Mike
I think it's still very common for people to sort of immediately reach for Postgres and MySQL and especially the cloud versions of those. I think, you know, we will talk about different styles of databases in future episodes, but I do think it's probably more common now, for certain types of web apps, that people are looking at things like Mongo and Firebase and DynamoDB.
Erik
Dynamo, CosmosDB on Azure. I was curious, if you go back to this Stonebraker interview, Stonebraker, by the way, I'd never seen a picture of him. When I think the name Michael Stonebraker, I think of a spy, a really fit guy who can run fast and jump out of planes. And that's not really the image I got when I saw him. Anyway, that's probably not a really fair comment. But I read this interview with him that you sent me, and I was like, I can't believe Mike argued with this guy. This guy's amazing.
Mike
Yeah. If you didn't hear in previous episodes, I lost an email argument with Michael Stonebraker back in the 1990s.
Erik
So he brings up something that also came up when we talked about Jim Gray, that when Codd wrote this paper in the 70s, talking about how you should view data as tables, right? Actually, let me just read to you a quote from the Stonebraker interview. Michael Stonebraker says, Ted Codd wrote his pioneering paper in 1970 that said you should view data management as tables, the simplest possible data structure, and then access them in a high-level language. Now that means SQL. These were revolutionary thoughts at the time and went counter to all the existing data management systems. Immediately, there was a huge debate between the relational folks who said Ted Codd's ideas looked great and the traditionalists who said you can't possibly build one of these to be efficient, and even if you could, no one could understand these newfangled languages. Now, I thought that was a really interesting quote. There's a few reasons, a few things that jumped out to me there. But Jim Gray also made the same comment that when Codd had these ideas, there were these critics who were like, no, no, no, no, that's never going to be efficient. It's not going to work. And I would be curious, I don't think we know, I mean, I don't know if you know, but I don't know what the argument is about why it wouldn't be efficient. I'd be curious what that argument was. Do you want to try to take a stab at that?
Mike
I think I would have to go back and look through the literature to see what people were addressing. But I think one thing that sort of comes to mind is that, because I did encounter this in some contexts back in the good old days, so the databases that came before relational databases, and we mentioned these a little bit in the Jim Gray episode, there were two styles, one that was called network databases that were largely the work of a guy named Charles Bachman. And then there was what they called hierarchical databases. I think that one issue that people immediately had with the relational model was we're storing things that are really complex. Sometimes we have very simple data, which is just a simple row of some item that we sell or something like that.
But inevitably, these things turn complex and there's relationships between things. And so the approach that the network databases had taken, and the hierarchical databases as well, was that when I have these multi-part things, I'm going to just literally make a connection between the larger parts and the smaller parts. And so when I want to look up all the parts that I need to make some larger thing, all that's immediately there. Whereas with the relational model, you have to have this underlying code that looks in one table, then finds the right keys, goes to another table, loads that up, and matches them together. And I think at the time, especially with the hardware that people were dealing with in the early 70s, that probably seemed like a really hard problem to solve efficiently.
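To make the "looks in one table, finds the right keys, goes to another table" step concrete, here is a minimal sketch using Python's built-in sqlite3 module. The products, parts, and product_parts tables and the parts_for helper are hypothetical, invented only for this illustration.

```python
import sqlite3

# Hypothetical bill-of-materials schema: the link between a product and its
# parts is just rows of matching keys, not a stored pointer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE parts    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product_parts (
        product_id INTEGER NOT NULL REFERENCES products(id),
        part_id    INTEGER NOT NULL REFERENCES parts(id),
        quantity   INTEGER NOT NULL
    );
    INSERT INTO products VALUES (1, 'bicycle');
    INSERT INTO parts VALUES (10, 'wheel'), (11, 'frame'), (12, 'saddle');
    INSERT INTO product_parts VALUES (1, 10, 2), (1, 11, 1), (1, 12, 1);
""")


def parts_for(product_name):
    """Return (part name, quantity) pairs for one product, sorted by part name."""
    rows = conn.execute(
        """
        SELECT parts.name, product_parts.quantity
        FROM products
        JOIN product_parts ON product_parts.product_id = products.id
        JOIN parts         ON parts.id = product_parts.part_id
        WHERE products.name = ?
        ORDER BY parts.name
        """,
        (product_name,),
    )
    return rows.fetchall()


bicycle_parts = parts_for("bicycle")
```

The query only says which keys have to match. How the engine walks the three tables is left to the database, which is the part the early critics doubted could be made fast.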
Erik
Oh, yeah. They're imagining sequential scans over everything, all data, all the time. Every time you run a query, you're going to do a sequential scan, maybe is what they're imagining.
Mike
I think that's part of it, yeah. Yeah, and just, you know, you couldn't easily go from the top of the hierarchy to the bottom of the hierarchy without doing, you know, joins and filters and so forth.
Erik
I guess it's interesting to me too, the database has become so ubiquitous. There was this critique of the relational model idea. Hey, this is not going to be efficient, but people charged forward anyway and said, these ideas are great. Let's try them. And they built the stuff and it ended up working and succeeding. I mean, by any measure, it's a success. I just kind of thought it's sort of easy to sit back. Now, that's not to take shots at those people either, but it's sort of easy to sit back and go, oh, that'll never work. It'll never be efficient. And we're sort of fortunate that people didn't go, oh, you're right. We'll give up on this idea.
Mike
I think also we are a little bit spoiled in 2025.
Erik
Oh, because stuff got faster and Skynet was supposed to happen in 2025, but it didn't actually?
Mike
If you think about how people actually use databases, it's relatively simple in most cases. Sometimes people may not even be aware that they're doing SQL. You know, a lot of systems now, a lot of programming language constructs now exist to do queries on databases that make things look like objects. You know, there's the object-relational mapping systems, ORMs, that make access to a database look like objects. And there's various other things in certain languages to sort of abstract over SQL. But even if you're doing SQL, you know, you may be doing a join across a table or two and selecting some fields from those tables. You don't have to think about what's happening inside that query, but there's just a huge amount of work that's gone into, you know, looking at that query, optimizing that query, doing various things once the query planner's figured out what to do. So there's various things that have made those operations much faster than they were originally. Also, computers have just gotten ridiculously fast.
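One small way to peek at what a query planner decides, sketched with Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement. The orders table, the index name, and the plan helper are assumptions made up for this example, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Without an index the planner has to read every row of the table.
plan_without_index = plan(QUERY, (42,))

# Same SQL text, but now the planner can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan(QUERY, (42,))

print(plan_without_index)
print(plan_with_index)
```

The SQL text never changes between the two runs. Only the planner's choice does, going from a full scan to an index search, which is the kind of work Mike is saying you no longer have to think about.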
Erik
Yeah. Yeah, I guess you could look at that problem and go, oh, they'll never be able to figure out how to magically optimize arbitrary queries that people are submitting. One other quote I had from that Stonebraker interview: he talked about the origin of PostgreSQL, and he said that people critiqued the idea, that it was going to be inefficient, but we built these things, and then we had like Ingres and we had System R. System R, that's the IBM one, right, that Jim Gray worked on. Yeah, and so from Ingres, the reason they came up with PostgreSQL, maybe I had heard this before, but I don't remember it ever being expressed as such an obvious reason or immediate goal, but Stonebraker said, to the first approximation, relational databases fell on their face when you tried to apply them in different areas. So it's not that they were inefficient. It's just that you couldn't easily use them in other areas. The problem wasn't the relational model, he says. It was more that the data types that Ingres and System R supported were floats, integers, character strings, money, and that's what the business people wanted. But if you wanted to build a geographic information system, you want points, lines, polygons, that sort of stuff. He gives that as an example of like, you could put arbitrary stuff in today's databases, you couldn't at the time. He says, so one of the basic ideas in Postgres was let the user have whatever basic data types he wants to manage, and don't predefine them by insisting they be the ones that apply to business data processing. I thought this was actually a pretty cool distillation of the reason for them creating Postgres. I didn't know this. Did you know this was the reason why they made it?
Mike
I did not know it was the motivating factor. I did know that stuff existed because of my early exposure to Postgres. The idea that you could have a field that was essentially like a structure in a programming language was very unusual at that time.
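As a loose analogy for "let the user have whatever basic data types he wants", here is a sketch with Python's built-in sqlite3 module, which lets a program register its own column type through adapters and converters. This is not how Postgres does it: Postgres defines types inside the database server, while this mapping lives in the client program. The Point class, the "point" type name, and the landmarks table are all invented for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A hypothetical geographic point, the kind of type a GIS would want."""
    x: float
    y: float


def adapt_point(point):
    # Python object -> text stored in the database.
    return f"{point.x};{point.y}"


def convert_point(raw):
    # Bytes read from the database -> Python object.
    x, y = raw.split(b";")
    return Point(float(x), float(y))


sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

# PARSE_DECLTYPES tells sqlite3 to pick a converter from the declared column type.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE landmarks (name TEXT, location point)")
conn.execute("INSERT INTO landmarks VALUES (?, ?)", ("pier", Point(32.7, -117.2)))

name, location = conn.execute("SELECT name, location FROM landmarks").fetchone()
```

The value comes back as a Point instead of a string, so application code works with the structured type. The database itself still only sees text here, which is exactly the limitation Postgres set out to remove.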
Erik
Sounds like a nightmare to support that too. Can you imagine the bug reports they must have gotten?
Mike
Yeah, I think it was. And, you know, one of the issues that I discovered with it, and that was part of the argument I've referenced a few times, is that there were cases when you wanted to define a data type, and the only way to make the database support it was to write your own code in C and link it into the database. So, you know, there were cases where not only was this idea of supporting new data types difficult for the creators of the database, it was also difficult for the users of the database.
Erik
So I jumped ahead to PostgreSQL. We talked about Jim Gray a little bit and IBM and System R, but there's a lot of pieces missing from this story, I think, yeah?
Mike
Right. So, you know, we talked about System R and Jim Gray, and then Ingres, which was Michael Stonebraker's first relational database product, which he developed at Berkeley, followed right on the heels of that. You know, they were apparently reading the papers. And in addition to that, there were a couple of guys who were reading the paper and heard about these ideas of having a language to query the database. And so a couple of guys, Chamberlin and Boyce, started to develop languages for querying relational databases. And they eventually hit upon what we now call SQL, or "sequel."
Erik
Oh, those are the SQL. I almost thought Codd was the originator of SQL. That's not right then. He just said we need a language.
Mike
He invented sort of the idea that they were working from. They went through a couple of iterations of languages to query databases and eventually settled on SQL. Ironically, none of these things got into commercial use very soon. So they were doing this work in the early 70s, but nobody was really commercializing it yet. And so it gave Larry Ellison the opportunity to be sort of the first to market with a relational database.
Erik
Oh, okay. So Oracle, first to market. IBM, it's surprising IBM wasn't. But Jim Gray said that he worked there for like 14 years and they shipped the first line of code that he worked on after he left or something like that.
Mike
I think they moved a little slowly, but Oracle came out in 1979 or 1980, somewhere around there. And it was really the first thing that you could buy for minicomputers and mainframes and stuff. And then not long after that, Stonebraker and colleagues started commercializing Ingres. One sort of weird anecdote about Ingres: I used to know people who pronounced it Angra, like the French painter. And because of that, I also knew people who, when Postgres first started becoming popular, would pronounce it Postgre. So IBM didn't get a database on the market until 1982, which was IBM DB2. So in the 80s, you know, these were kind of the players. There was Oracle, Ingres, and
Erik
DB2. Ingres, Oracle, DB2. Okay. Yeah. And would you say a lot of businesses bought these things? Or were they widely available in big business? Was it small business didn't use these? Big business only?
Mike
They were primarily for bigger businesses. And they were primarily, you know, there weren't really people using them that much for software applications. They were, you know, people would, companies would store their data in these things. And then they would run reports on them. And they would get out reports from the database on what they, you know, what their inventory was or what their sales were. That started to change in the 1980s and 1990s, but really at this point in time, people weren't thinking of them as being the underpinning of applications. Stonebraker started working on Postgres around 1986.
Erik
Oh, yeah, we talked about that, yeah.
Mike
And then in 1989, Microsoft finally came to market with a SQL.
Erik
SQL Server is 1989? Well, I would have thought earlier. Wow, okay.
Mike
The first commercial version of SQL Server was 89.
Erik
I guess that makes sense. Yeah, I guess that makes sense.
Mike
I don't know how much market they had in the early days, but I think they were trying to be
Erik
a business platform,
Mike
Microsoft was.
Erik
And MySQL? Do you know when MySQL came about?
Mike
Yeah, so in 1995, MySQL came out.
Erik
Go ahead, sorry.
Mike
I don't think a lot of people were using it at that point in time, but to my mind, this was kind of like a profound inflection point in the history of relational databases, because MySQL was effectively an open source product. It was something that people started to use to build into their web applications. It really made the idea of using a relational database as the data layer of your application into a much more common thing.
Erik
So it's viable, but then it starts to become almost like an obvious design decision.
Mike
Yeah, and MySQL was really sort of the predominant database of the...
Erik
Yeah, by 10 years later, 2005, 2010, somewhere in there, you got all these products, Twitter, GitHub, everything. You got WordPress. It seemed like everybody used MySQL. It just seemed like the thing. When I started using PostgreSQL, I remember it being kind of an almost unconventional and odd choice. People would be like, are you sure? Everybody uses MySQL.
Mike
Yeah, I think it was sort of considered to be a little bit of a risky choice, even if you're using MySQL to switch to Postgres, but something, a couple of things happened, I think in, I think it was 2008 when Sun bought MySQL. And I think that made people a little bit nervous that it wasn't going to, it wasn't going to continue to be an open source, freely available product.
Erik
But then I started using MySQL right in the lee of this and people were like, oh, this is going downhill. Better be ready. Find something else.
Mike
And a couple of years later, something even worse happened, which was that Oracle bought Sun. And so MySQL suddenly became an Oracle product.
Erik
Yeah, I remember people like almost panicking at that point, like we need something else.
Mike
But it's surprisingly still extremely popular in the web world. And I don't know, but I would guess it's probably one of the main choices for things like Amazon's RDS, probably also the managed services for the other cloud providers.
Erik
I'd be curious about that. One of the funny things that showed up in that Stonebraker interview, I know I keep going back to it, but he seemed to have this take almost that relational databases were almost like a victim of their own success. They're so common, so ubiquitous now that people apply them even in places where it doesn't really even make sense. You've already got this tool. We'll just shove the stuff in there, right? But then you sort of compound your problems. He had this hilarious example, which you drew my attention to. And then when I read it, I was like, yeah, that is really funny. His example is procurement systems. He's been CTO and advisor for a couple of different companies, and one of the companies they've advised is General Electric, GE. And he said, you know, companies have procurement systems, which you use to go out and buy things: paper, ink cartridges, computers, whatever. They have a procurement system. And he said, so it's logically like, if you were to think about this, you just picture in your head a company, you think they have a procurement system. How many of those would you imagine they have? You probably imagine they have one of them. Why would you have more than one? And he goes, yeah, so the obvious correct number of procurement systems for any company to have is one. And the interviewer goes, yeah, logically that makes sense. And he goes, well, GE has 75 of these things. It's like, what? And he uses that as an example for the way in which data gets so dividedly used. It may be roughly the same concepts or the same abstractions. The same company sells you ink cartridges, right? They sell Department A ink cartridges. They sell Department B ink cartridges. But Department A and B have different ways of using this information, different data they want to apply when they make the decision about how many cartridges to buy or whatever. So you get the same data represented in completely different ways and you end up with 75 procurement systems, which still sounds completely bonkers. So databases, a victim of their own success, maybe I'm taking the wrong message out from there, but he seemed to be saying we need to be able to break free from the paradigms. Yeah, we've gained a lot from relational databases, but because we're so reliant on them and we reach for them so commonly, you end up with situations where people have to build their own applications to represent the data in the special way they need to represent it. Did you get the same interpretation from that?
Mike
I think so. And I also see this in practice quite a bit. There's this phenomenon I see from time to time where it's like, well, we're using Postgres for this. Shouldn't we just use Postgres for this thing too? And what you end up with is things like Postgres databases that have one field,
Erik
it's a JSON blob.
Mike
Or you have people using Postgres for their online transaction processing, but then they use the same database for their data warehouse.
Erik
Data warehousing. They talk a lot about data science in this interview and the needs of people doing business analytics and data science and data warehousing. And yeah, I could see how, as someone who chooses technologies to build business solutions on top of, sometimes there's a little bit of a, oh, am I picking the wrong thing? Am I closing off the possibilities I may want to exploit in the future? It can be a little nerve-wracking when you're right at the beginning of a big project picking technologies because you're not sure what you don't know yet and you don't want to pick the wrong thing.
Mike
It's very difficult also to have sufficient expertise to understand why you might want to choose one database over another database.
Erik
That's what you were supposed to answer for me here. I was just going to say, Mike says this.
Mike
We've encountered this and I still struggle with it. You know, things like Postgres, MySQL, Oracle are often a great solution. They're very performant. They're very reliable. When you use the managed versions on the cloud providers, you get a lot with that. You know, they do backups for you. They do replication, even geo-replication for you. So they're safe, they're secure. There are sometimes good architectural, good technological reasons for using something else. You may need a database that is much lower latency, and the latency is an important thing, whereas maybe consistency is not. There are cases where you need a database that serves as your warehousing system. You need different specifications than what you need for your transaction processing.
Erik
Small organizations, you start making things complex by having a lot of different tools to reach for, potentially.
Mike
Yeah, and if you tell your bosses, we need three databases, you're going to get a lot of pushback unless you can find very good both technological and business reasons for why you need three databases.
Erik
Well, I'll tell you one thing that I never got any expertise on myself. Just really, really broad brush stuff. But you remember years ago where people would write these articles like, we were using PostgreSQL for this. I remember Pokemon Go was a big one. They were using PostgreSQL, but they were writing just a massive volume of writes because as people were walking around with the Pokemon Go app on their phone, they were like writing back every place they had been to. So just a huge amount of telemetry going back and getting written into a PostgreSQL table, most of it never being read. So that was one example. I can't remember, there was another really big famous example of some web company. And they were like, yes, we were trying to write large volumes of data in PostgreSQL and it didn't work. So we switched to MySQL, and InnoDB really makes these writes so much faster because of like index rebuilding. I don't even remember the details, but there was a lot of heated conversation about this maybe 10 years ago. And I myself never really learned all those things. I should probably know to pick one or the other.
Mike
Yeah, the Internet and also Internet of Things has kind of broken databases in a sense, because I think when these things were being developed, people did not think of situations where I'm going to need 10,000 writes per second. You know, it just wasn't really, it was hard to imagine something like, you know, I remember going to a talk by a guy from Pinterest and talking about how they had to process tens of thousands of clicks per second or something like that. And, you know, just, I don't think when people were thinking about accounts and inventories, they really thought that was ever going to be a likely thing. So
Erik
I enjoyed this conversation. I feel like I learned more about the history of SQL, the history of the relational algebra. I want to read this Codd paper. And I think the history, the timeline of relational databases is a little more solid in my head now. So there's Jim Gray, there's System R, there's IBM with DB2, there's Ingres, and then PostgreSQL, and then a cluster of SQL Server, MySQL. This is more of, I guess, 90s. I think it's pretty cool to think about this tool that I use on a day-to-day basis and sort of see more of the history of where it came from. I don't see it being supplanted anytime soon. What about you?
Mike
I think it's probably, I think relational databases are probably here for a very long time. I saw a couple of projections of how the market's going to grow. To be perfectly fair, in a lot of applications, it's still probably the most sensible way to store data.
Erik
All right. So this has been Picture Me Coding with Erik Aker and Mike Mull. Thanks, Mike. Thanks for the conversation.
Mike
You are welcome.
Erik
We will see you again in two weeks. We are going to be bi-weekly. Is that the word I want?
Mike
I think we decided that's the right word.
Erik
Bi-weekly, Picture Me Coding. Thanks so much for joining us. We will see you again soon.
Mike
See you next time.
Thank you.