Picture Me Coding
Picture Me Coding is a music podcast about software. Each week your hosts Erik Aker and Mike Mull take on topics in the software world and they are sometimes joined by guests from other fields who arrive with their own burning questions about technology.
Produced by Neiko Will.
Logo and artwork by Jon Whitmire - https://www.whitmirejon.com/
Reach out to us here: podcast@picturemecoding.com
Picture Me Coding
The Agony and the Ecstasy of Excel
This week Mike and Erik are joined by special guest Justin Runia an analyst and Excel wizard and returning guest Bob Farzin who told us a story in the high-frequency-trading episode about trading billions of dollars of swaps from an Excel spreadsheet.
We spent the whole hour talking about Excel: is it a no-code platform? Is it a programming language? Is it a database? Should we build the whole business on top of it?
If you're an immense fan of Excel like Mike is, then join us this hour as we tackle this crazy tool that contains the most widely used functional programming language.
Some Links
- https://phys.org/news/2024-08-business-spreadsheets-critical-errors.html
- Advancing Excel as a programming language with Andy Gordon and Simon Peyton Jones
[MUSIC PLAYING] Hello.
Welcome to Picture Me Coding with Erik Aker and Mike Mull.
Mike, today I have something extra special planned for you.
I know how much you love Excel, and I want to spend this entire episode talking about Excel.
And to help us out, I've invited two friends onto the show.
One is Bob Farzin.
He's been on in the past, and he's told us some interesting stories about Excel at work.
And two is my old, longtime friend, Justin Runia.
He's been working in Excel for a long time.
He's my favorite Excel nerd.
He works currently as an analyst.
Justin, welcome to the show.
Thanks for having me.
And Bob, welcome back.
Great.
Thanks.
Glad to be here.
You guys are ready to talk about Excel, Mike's favorite topic.
Mike, how do you feel about this?
Oh, I'm excited.
We should probably do two episodes.
It depends how this one goes.
We'll probably know about halfway through it if we're driving the train off of a cliff.
OK, so we want to talk about Excel.
The reason we want to talk about Excel is it seems to come up for Mike and I a lot in our work as software engineers.
And it usually comes up in terms of, hey, stop using that thing.
But I have respect for Excel as a platform.
I have respect for people who can do really sophisticated stuff in Excel.
But I don't want to myself.
So I thought, let's talk a little bit about the history of this thing, how it came to be this incredible software platform.
People are building really complex stuff in it.
Really what it means for us as a kind of software that the people who work in are maybe not considered software engineers typically.
So a little bit of background first about Excel.
And everybody jump in if you know more.
I just did very basic Wikipedia research here.
It was first introduced in 1985 only for Macintosh.
And Windows version came out two years later, 1987.
Apparently, the most popular version of Excel came out in 1992, which struck me because that was coinciding right with this abundance of desktop computers in companies.
So Windows becomes super prevalent.
And in 1993, they introduced VBA.
But I think there were some spreadsheet software applications before Excel.
Yeah, for sure.
So spreadsheet as a business planning document was invented more than 100 years ago.
And it fits a lot of needs.
It's very visible and very comprehensive.
And so as soon as computers were invented, people started thinking about how to get spreadsheets into them.
OK.
So one thing I was curious about, I looked this up.
I don't know if you guys know this, but the total number of rows you can have in an Excel spreadsheet is 1,048,576 rows.
But that went up.
It used to be way less.
Oh, OK.
So that's the current max.
So the total number of columns in a worksheet, 16,384 columns.
And those are rookie numbers.
You've got to bump that up.
If I can have a million rows, I want a million columns.
So we've got a million rows, 16,000 columns.
It looks like you could only have 1,026 page bricks, also kind of disappointing.
I really think we need a million for all of these.
And one thing I was surprised by, I really thought there should be a financial value limit in a spreadsheet.
So if I'm running my company on top of Excel, and I've got $4 billion of value in that spreadsheet, I would really think that they should limit that.
You've got to start a new spreadsheet if you get to that financial value max.
Otherwise, it's just too dangerous, right?
I have a question.
I'm curious to know if any of you have used desktop spreadsheet application other than Excel.
No.
No.
Excel is the exemplar.
As soon as it came out on Mac, I think it bumped out Lotus immediately.
And it's been-- yeah, the whole time.
Yeah, I figured I'm probably the only person here who actually predates Excel.
All right.
So let's talk about Excel a little bit as a software development platform.
So VBA, as I mentioned, 1993.
Apparently, there are some other languages you can use in it.
I had heard about Python, but I guess that may not really be rolled out yet.
So they have MLang, OfficeScript.
I hadn't heard of that.
It's a TypeScript language. 2019, they added support for that.
But then there's also the really rich formula languages, right?
And that's where things get pretty interesting.
Excel formulas, lambdas, array functions.
And I had never heard this term, but data analysis expressions.
Justin, you use a lot of the stuff in your work.
Do you want to tell us a little bit about this?
Do you have any?
Yeah.
Go ahead.
Yeah, so it used to be pretty basic.
You could hit standard deviation and some real basic statistical stuff, but it wasn't very sophisticated if your formula was going to return an array, for instance.
It couldn't handle that.
It would just error out.
And so for a long time, you had to jump through a bunch of hoops to accommodate that.
But in the past 10 years, they've gotten a lot better.
And there's a little bit of tension, because with returning an array, it's going to dump out into a bunch of cells besides that cell that you're doing the formula in.
And so they have to manage how that works in the sheet.
But yeah, the stat functions have gotten really good.
And they've gotten a lot better at handling a variety of use cases.
I read in Wikipedia that when they introduced lambda, Excel is now touring complete, just like Super Mario Brothers.
What do we think about that, Mike?
It doesn't make any sense to me that we're using anything else now.
I mean-- [LAUGHTER] I mean, you could probably do Super Mario Brothers in Excel.
So-- [LAUGHTER] It's turned complete.
You've got to be able to.
Bob, what do you think about Excel as a programming platform deploying code into Excel?
You've written pretty heavy spreadsheets in the past, haven't you?
Yeah, I've written some pretty serious spreadsheets.
I've inherited even more serious spreadsheets and seen some absurdities.
But I think that it's really appealing when you don't know anything about programming.
Like when you're not a coder, but you have ideas about systematically doing things, it feels like you can do it in Excel.
And you don't understand the need for all these guardrails.
So you just-- that you would have in a normal programming language all those guardrails around types and what I'm going to return and what types I'm going to return.
I mean, just like the way Justin said, you don't even think about the fact that, oh, when this returns an array, it could be of all these different sizes.
I'll just make this huge region on my Excel spreadsheet.
And I'll just have it dump into this huge region.
And I'll hope that it's about the right dimensions.
And I'll just read it.
So yeah, I think that it gets people from zero to something usually pretty fast.
And my experience, especially in financial firms, has been that it is the de facto tool.
Because almost none of those people-- I shouldn't say that.
I should say more than half of those people don't have any systematic programming knowledge or experience.
But they want to do systematic things.
They want to evaluate how much is this real estate worth.
And then, OK, well, what if I own this percentage of it?
OK, well, now, what if it changes by this much value?
What will happen to my ownership percentage?
And this is like a simple set of linked relationships.
It's really easy to just put it in there.
And it seems kind of deceptively easy to get some first answers.
And I think that that's where sometimes it can go off the rails pretty fast, because people think-- I've got a couple answers here.
This seems like it could work really well.
How about I do it for a million rows?
Let's just copy the formula for a million rows.
What could possibly go wrong?
And I think you can very quickly run into that.
Oh, have you seen that too, Justin?
Yeah, I mean, there's-- for at least a decade now, there's various stats that 80% to 90% of all spreadsheets have fundamental miskeys in them.
There's been some ones that have made the news.
Studies have been retracted.
Wealth inequality studies have been attracted, because someone miskeeds something.
And the low bar to entry generally is good.
But when you start relying on scientific studies and medical research, then you get into some pretty dangerous territory.
Yeah, Justin, you sent me a headline.
Study finds 94% of business spreadsheets have critical errors.
Yeah.
Excel really makes it easy for you to open something.
If it's a pretty common joke that Excel can't recognize dates, but that is a benefit a lot of time.
You might have some janky tab separated value sheet, and it's mislabeled as an Excel file.
And you try and open it in Excel.
Excel's going to do that for you.
And it's going to really make it easy to make stuff look like it's working when it's not necessarily working.
I guess I wonder, though, if we say 94% of business spreadsheets have critical errors.
Mike, is your gut that 94% of software deployed in businesses has critical errors?
I mean, is this just a software thing?
That's my question.
I think that's probably-- I don't know if anybody has published numbers on that, but it seems like that's probably a pretty safe assumption.
My hope is that you'd be like, well, not my code.
I mean, I think it depends a little bit on what you mean by an error.
If you have 100% test coverage that's passing, you probably still have errors.
But I don't know how you quantify that.
I guess, Justin and Bob, would you be up for-- do you have any examples that leap to mind if I say, what's the most complicated software programmatic thing you've seen in a spreadsheet?
Do you have any examples that leap to mind?
I mean, are we talking about macro level stuff, like VBA stuff?
Or-- Anything I've never written anything like this.
I've written some really over-conflicated janky formulas to figure out down to the hour how people are spending their-- in terms of planning, in terms of circular formulas, there's lots of ways to get in trouble.
And I'd say when you get into VBA, the door opens wide.
And you can-- there's event triggers in there.
So it's stuff that triggers when you highlight a cell or open a sheet.
And it just triggers a whole cascade of stuff that you did not plan for.
And you've exploded your workbook.
You've suddenly added 17 new worksheets because you clicked on the wrong cell.
Yeah, you can get into some real mayhem in that area.
Do you have any examples, Bob?
Yeah, the most complicated is probably-- was written by one guy who was working at a major bank, and he was trading some interest rate derivative products.
This was like early 2000s.
And he did not like the internal pricing system.
And the internal pricing system was written by a team of probably 70 people on an objective C on a next platform.
And he single-handedly rewrote the whole pricing in Excel using VBA and formulas.
And he set it up so that it could run with real-time quotes that streamed into Excel.
And so he replaced a whole derivative-- interest rate derivative pricing infrastructure with one giant Excel spreadsheet that he then figured out how to run across Citrix machines so that he could not have it on his local box and get the compute.
And then he split it into four sheets so that he had individual compute parts that then got consolidated on a master sheet on his local box.
So MapReduce-- he implemented MapReduce in the spreadsheet.
He implemented MapReduce on spreadsheet, along with all the derivative formulas, like basically from scratch.
Wow.
And he was pricing and trading multiple billions of dollars on that one sheet per day.
You tell me this, and my stomach kind of gets all knotted up when I hear it.
And like everything in Excel, there was no source control.
Like if he deleted those files, it was gone forever.
That gets to the thing I want to ask in a second.
But let me get Mike's take.
Do you have a take on a response to this, Mike?
Is this anywhere in the universe of what you've seen in the past?
That's pretty extreme.
And it does make my palms sweat a little bit.
Yeah, but the one that shocked me-- so my first exposure to Excel and sort of a professional realm was working with consumer product companies.
And this is sort of the beginning of a pattern that I saw for years afterwards.
But so these companies, they are literally some of the biggest companies in the world.
They're global companies.
They're billions of dollars in revenue, hundreds of thousands of employees.
And they're bread and butter.
The thing that differentiates them from other consumer product companies are essentially the formulas they use to make their products.
And these companies were storing these formulas in Excel spreadsheets.
So somebody would have the formula for some laundry detergent.
And then they would pass it over to a friend of theirs and who was working in R&D.
And that person would modify it and do some experiments.
And so at some point, you've got like 60 spreadsheets, all of which have variations of the formula for your flagship product.
So it's not quite the case.
They were doing some fairly sophisticated statistical things, design of experiments, and analysis of variation, that kind of thing.
But just the fact that the keys to the kingdom were spread out across a bunch of different laptops or desktop machines.
And these giant companies were just horrifying to me.
I think that gets at what I-- when I hear Justin and Bob tell their examples, I immediately wonder, how do you debug that?
Is it hard to evaluate if it's correct?
How do you debug that and make sure that it's correct?
Good question.
Yeah, you don't have very many mechanisms.
I mean, there is change tracking and various things that are baked in there now that has improved.
But at the same time, some of those things don't play well with other Excel features, like tables.
Like if you put a table in a document, it just totally breaks like sharing.
So yeah, it's better.
But it also is not very robust.
Like once you overwrite that, it's not going to really save anything.
So you have one chance to save your break in.
So change tracking, that's almost along the lines of what Bob's talking about, software development, lifecycle stuff.
Like there's no version control.
How do you roll back or fix something?
Bob, have you found out really-- I mean, the one you just described sounds horrifying to me, because if something's wrong in there, how in the world is anybody ever going to find it?
Yeah, that guy was just super diligent about-- basically, he was doing stuff where he was interacting with a market that was a real market.
And so if his sheet said something was mispriced, he would start by assuming his sheet was wrong.
Then he would backtrack into what might be wrong about his sheet.
And then after doing that 100 times, he would eventually say, OK, my sheet's right.
The market's wrong.
And then he would take action.
And if that kept losing money, then it was like, no, she's wrong, because I keep getting it wrong in the market.
So he had this pretty short cycle of interaction where he could find out that, oh, what I'm doing is actually wrong.
So something's wrong.
But how to figure out what that something was was a big mess.
I mean, I've seen people make this joke about the I-Python notebooks or the Jupyter notebook, like untitled, untitled copy one, untitled copy 237.
So Excel predates that.
I definitely remember the Excel copy with the date on it or the copy with whatever.
And there were times where I didn't know about source control long before I knew about source control.
And my solution was, I'm going to save this with today's date and just put it in this other spot.
And then I'm going to change the original.
Oh, it broke.
I wonder what's different.
And like Justin said, there was no way to track what I changed.
But at least I could roll back to what I had done before and deal with it.
The one other thing I want to bring up-- maybe this isn't the perfect spot for it, but I think all the people that I-- back in my finance days that I would interact with that were obsessed with Excel, they were obsessed with the fact that they could, quote, "see all the data."
They really like this idea that it's all right here.
It's all in-- even if it's a million rows, I can see every row.
I can see every column.
In theory, if I hit F2, I can see the range.
And it doesn't look like it's off by one to me.
I can see it.
And it's not hiding in this database that just tells me, oh, you returned 1,253 rows.
It's like, OK, are those the right rows?
They really felt this comfort that they can see it everywhere and that they can see how stuff is being composed.
And down the line, that always made it hard for them to get off of Excel.
It became a gateway drug for them.
Like, they just-- they couldn't quit because they knew, well, if I go to any other programming language, I won't be able to see my data.
And I don't know, maybe Justin or Mike, like, have you guys experienced that where you've talked to users and they're like, no, I have to be able to see my data?
Yeah.
I mean, that's like-- that's a feature is that Excel is more of a document editor than it is a coding platform.
And what you're seeing is a document.
Everything is visible.
And you can control F and find the row that you're looking for.
And certainly, you can do that with a data structure or something in code.
But the bar is way lower in terms of finding the specific case that you're looking for.
Where it gets kind of tricky is that people then start wanting to build interfaces in Excel.
And Microsoft had a whole other product for doing data and interfaces that's kind of like being wound down, essentially, at this point.
But yeah.
And it's everywhere.
You don't have to worry about a license.
You don't have to worry.
Does everybody have this?
Everybody has it.
If they're on office, they have it.
We've seen that, too, with financial people, especially.
Eric and I work with underwriters.
And even if you give them a really nice web application that presents data to them, they still want it in Excel.
Because they can filter.
They can sort.
They can select a column and get a sum or an average.
And you take those things away from them and they feel like it's a lesser experience.
The flip side of that for us has been that collection of spreadsheets then becomes a database, which is largely unsearchable and unindexable.
So there's always been this tension.
And I've seen this in three or four different companies now where the financial people just desperately want this Excel spreadsheet interface.
But they also want to use the data as sort of a database.
And it just doesn't scale.
It doesn't work well.
Yeah.
Google is kind of trying to push that scalability thing.
They're trying to have apps based on their sheets and trying to really lower the bar to getting a viable app.
It seems kind of more sketchy.
Like Microsoft is very aware of the security issues of people being able to copy off your data and people having access to your data.
They've built lots of security into their products.
So maybe this is less of an issue.
I listened to a podcast called Advancing Excel as a Programming Language with Andy Gordon.
Simon Payton-Jones.
This is from March 2021.
Simon Payton-Jones is the guy who originally wrote Haskell.
So Mike and I would know his name.
And Andy Gordon, I guess, had also been involved at a graduate level in functional programming.
And he's this person who's running this Excel department.
And in relation to what you're saying, Justin, they were talking about this.
Andy Gordon was saying, Excel's a great place to learn programming.
So the quote he gives is, I actually believe the spreadsheet environment is a great way to learn because it's so live.
You make a change, you can immediately see results.
And that's pretty much what you're saying, Justin.
Yeah.
Another tangent.
It reminds me of the cellular automata.
Like you just drop in this Boolean thing and you can see how it progresses automatically, programmatically.
And yeah, it's like a scratch pad, essentially.
I love it.
I want to go back to thinking about it, too, as a programming language.
Because the thing that concerns me, when you're all telling the stories you're telling about this spreadsheet you've seen, is it seems hard to verify the accuracy, the correctness.
That's the thing I'm mostly concerned about when I write programs.
Is this right?
I don't want to ship something.
That's not right.
And I have all this stuff baked in around it.
I've got tests.
I've got deployment.
I've got a sandbox environment.
And I've also got software development lifecycle.
I've got Git.
I can see all the changes in the history.
You don't have any of that here.
I read at Wikipedia that-- this is a quote from Wikipedia-- "Despite the use of 15-figure precision, Excel can display many more figures up to 30 upon request.
But the displayed figures are not those actually used in its computations."
So for example, the difference of two numbers may differ from the difference of their displayed values.
Although such departures are usually beyond the 15th decimal, exceptions do occur, especially for very large or very small numbers.
And finally, they say, "serious errors can occur if decisions are made based upon automated comparison of numbers, as equality of two numbers can be unpredictable."
That sounds kind of scary, right?
Yeah, I don't really work in areas where that matters so much.
I'm stuck to two or three decimal points at the most.
So that is in the back of my mind, but I'm not doing astrophysics or medicine.
So-- Yeah, in the podcast I mentioned, they said that the code base-- in C++ code base, it actually predates IEEE arithmetic.
So it doesn't really follow this.
And Bob, you had sent me the Excel eSports competition.
Yeah.
They have these guys who are battling.
Have you watched that?
Where did you encounter that?
I haven't really watched that.
I-- with a little bit of humility, I'll say there was a point where I was one of the Excel masters among the domain of people I knew.
And that came from the fact that I worked for a particular fellow who just short of slapping your hand if you reached for the mouse when you were using Excel.
So there was a lot of verbal-- being verbally berated, because he perceived that as being inefficient.
He was actually a Lotus 123 user.
So not only did you have to use keyboard, you had to use the Lotus keyboard shortcuts, which you could enable in a third level menu on Excel, and still can to this day.
And I do on my local box, because they're the ones I know.
And so when you-- you couldn't sort data by clicking on the Sort button.
You had to do the alt sequence of letters, which I can't even think of what they are.
But I know them.
It's like playing the piano.
I know the movements on my hand.
Not on that right.
This guy sounds like an e-max user.
Was it e-max or-- [LAUGHTER] Hi, OK.
Do we get on camera?
I don't think he ever knew e-max.
What's that, Justin?
Do we get on camera in front of a crowd?
That's like my largest anxiety trigger.
I'm always on calls where I'm like driving around a spreadsheet.
And I know all the shortcuts to get places.
And then immediately when I'm on a call, they just all fall apart.
And I'm like, I've somehow backwards sorted everything when I'm trying to do a certain row.
It's a fun time.
Yeah.
Bob's story reminded me of this guy I know who is a hardcore e-max person.
But also, for some reason, always use the VIP bindings in e-max.
Oh.
So-- Yeah, I love e-max.
I love not ever using a mouse.
It felt like every time I reach for the mouse, it's just inefficient and gross.
Right.
So you can get to a point where you're just like a super analyst, right?
People bring a question to you.
And you say, oh, give me the data.
And then they email you the data.
And like literally in less than 10 minutes, you send them back the answer with the email.
And then you're like, wow, I'm pretty good at this.
And so at one point, I was talking with some colleagues.
And I said, you know, I'm kind of pretty good at this.
Like, I wonder if there's a competition.
And the guy turns around.
He's like, oh, yeah, there's an Excel.
There's a World Sports Excel competition.
And I looked it up.
And that was the link I'd sent you, which is, you know, there's people who are really, really serious about using it.
I watched a few minutes of it.
They get a bunch of problems.
And the problems often are like, well, you can't use floating point IEEE math.
So you got to figure out how to hack your way around that.
And they're on a stage.
And there's six of these guys.
And I watched it.
And I was like, you know what?
They got to put this in the Olympics.
I don't want to watch black football.
I don't want to watch this.
But here's the thing you told me.
Bobby said Excel is a functional programming language, the most widely used functional programming language.
And if you listen to Simon Payton-Jones, someone who knows the originator of Haskell, he says Excel is the world's most widely used functional programming language.
That's kind of shocking to read.
I mean, I used to see it all the time, where people would-- again, in finance, they would set up the formula before they put values in.
Which at the time was really confusing to me.
But they would be like, here's an empty cell.
I'm going to put in the principal value of this loan.
And here's an empty cell.
And I'm going to put the lending rate.
And here I'm going to calculate all my payments.
And they would start writing all these payment formulas.
And they knew the payment formulas, because they knew that math from their MBA.
And so they start writing out the formula with all the exponents.
And they would write all these things, blah, blah, blah, blah, blah, blah.
Do it.
And then they put the values in.
And they're like, oh, I know a 5% loan of $1 million should be this number.
And then they'd look.
And they'd be like, oh, no, there's a formula error.
And then they would fix the formula error with those values.
And then they would say, OK, apply.
Now I've got my function.
It takes two floating point values.
Apply it to this huge grid of values.
Give me all the possible outcomes.
And when I finally started learning about functional programming, I was like, oh, this is the same thing.
They're just setting it up abstractly.
And then they're applying it to whatever proper inputs and getting outputs.
A lot of times-- That's by necessity because of how janked up the function editor is.
It's this tiny little box.
And so if you have a formula that is nested a couple times, it really pays off to get it all out in advance and then drop in your references.
Because once you drop in your cell ranges and everything, that formula comes real hard to read.
They've gotten a lot better about it now.
You can quasi format it and put line breaks and stuff in there.
But for a long time, that was the only way you were going to get a triple nested formula to work was by getting it all the logic out in advance and then dropping in your references.
So if somebody says to me, Excel is the world's most widely used functional programming language.
And we're talking like formula languages probably here.
And I look at the landscape of business, and I say, wow, this thing.
Everybody uses it.
It's a pretty common skill set.
But my level of Excel is real paint by numbers compared to everybody else on this call.
If I looked at the stuff you guys are talking about, I would be pretty confused.
I can do a sum.
Here's my question for you all.
If I have these skills and I'm working in Excel on a day-to-day basis, am I a programmer?
Are those people programmers who are building these fantastically complex Excel spreadsheets?
I thought I was a programmer.
But you don't now.
I self-identified at the time as a programmer.
In hindsight, I didn't understand some very basic concepts.
But at the time, I thought, yes, I am a programmer.
I'm making computers do the things for me I wish to by writing words into them.
This is a programmer.
What were the things you didn't understand at the time that make you feel differently now?
I didn't understand any of these.
I didn't understand how to abstract stuff functionally away.
I was really in the formula space more than the VBA space.
So I didn't really understand functions.
I didn't understand objects.
I didn't understand abstraction.
I didn't understand how to make stuff reproducible in all these different ways or building tests.
We had ways of, again, very rudimentary spreadsheets.
We would check things by doing them two different ways and then having two values that would be outputs.
And then we would check if they were equal.
Of course, we had this-- I didn't even know this 15-decent point precision error problem.
But we would check that they were approximately the same.
And then that would say, oh, we haven't screwed up our spreadsheet in some way.
But there was a lot of just basic, in some sense, basic understanding of how to think about programming that evaded me entirely.
I thought it was a programmer.
I would think that they have these adult coloring books.
If someone is spending time thinking about how they're applying the colored pencil, and they're thinking about the color scheme and what that means and how that's going to affect the coloring book, are they an artist?
And it's like, I mean, they're doing many of the things that an artist is going to think about.
But they're also-- they're doing an adult coloring book.
And there's a point where it doesn't matter, especially in terms of the coloring book, that's more for people's recreation, people aren't generally selling their colored endpages or anything like that.
But they're still doing a lot of the skills involved with that process.
I'll tell you something that motivates the question.
I don't want to be someone who is putting up barriers, because I am a self-taught programmer.
I feel like anybody could come and do what I do if they put the time in.
And I don't want to look at people who are doing what is effectively programming and say, nope, that doesn't count.
I don't want to be one of those people.
So I don't want to make that mistake in this case as well.
When I was working with the consumer product people, I was in an organization where most of the people were doing C++ and Java programming.
Then I encountered these people within the consumer product companies that were doing programming.
They were the programmers and software engineers of their companies.
And a lot of times their focus was VBA or Visual Basic more generally.
And a lot of times the Visual Basic ecosystem was actually running their businesses, even to the extent of taking data from spreadsheets and using that to drive machines through VBA systems that the machine manufacturers had provided them with.
So at the time, my thinking was, well, sure, that's programming, but it's probably pretty niche.
But when I started to look at the numbers, that type of programming is a vastly larger employment than the sort of people doing software development in software companies.
So I really do think of doing Excel, especially if you're combining that with VBA stuff, as being not just programming, but probably the more common programming in the programming software engineering world.
I've worked with someone for a few years now who, when I first started working with this person, she had built these pretty fantastically complicated Excel spreadsheets using VBA.
They would query an on-prem SQL server.
They would create tables and charts and all kinds of stuff.
And the thing was so slow to run.
It would take like 10 minutes to render.
You'd open the file, and then you'd go get coffee or something.
And people would use this for business analysis is really important in the business.
And I remember looking at it at a time and thinking, wow, this is really the wrong tool for what they want.
But this is a person who is fundamentally a programmer.
And she actually became a totally savvy and competent Java programmer.
And I never had any doubts at the time.
It's like, well, you're writing very complicated stuff.
Let's just get you a different tool.
And now that person works as a Java developer today.
So I didn't have any doubts about the skill or quality of the work in there.
It just seemed like this is the wrong tool for what you were trying to-- the problem you're trying to solve.
When you get into the VBA side of it, then that's fully programming.
But if we're just sticking to formulas, yeah, you're not really thinking about types or objects or anything like that.
And it would definitely limit you if you were going to try and be a programmer in any other area.
I do think that the VBA was always an interesting world because I was first introduced to it by the record macro button.
And that was like this kind of magical tool at the time.
Because you could say, oh, I do this on Excel every day.
I select this region.
I copy it here.
I do this thing.
I do this total.
I do this other thing.
I wish that those 10 steps were automated.
And at some point, someone came over to my desk and said, oh, you can record a macro.
And then I did that.
And then they said, watch this.
We hit F11 and the VBA editor suddenly appeared.
And it was like macro one.
And then it gave me a bunch of code.
And this whole light bulb goes off where you're like, wait.
All those actions are just these commands.
Wow, I was off by one cell.
I don't have to re-record my macro.
I can just change that value from X14 to Z14.
And now I'm in the right place.
And I think that that, again, had a lot of appeal and made it very, very accessible to people to start trying to do stuff in Excel and in VBA.
That's interesting.
That's pretty much how I started shell scripting.
I would type a command, type a command, type a command, and they go, wait a minute.
Then I would put them all in a shell script, and I would just run the shell script.
I have a question for Bob and Justin, maybe taking us a little off topic.
But so I've had this experience.
I'm nowhere near as sophisticated a user of Excel as the two of you are.
But I've had this experience where people learn certain things about Excel, and then they start feeling like they are more expert, or at least intermediate.
And the example I always give is people learn about VLOOKUP.
And they start to think, OK, now I'm competent at Excel now, because I know how to use VLOOKUP.
So anyway, my question is, what distinguishes somebody who's really an expert with Excel from the type of people who have to look up what the parameters to VLOOKUP are?
Is it pivot tables?
Yeah, I mean, yeah, with the new features like power query and stuff and knowing that you can make relational data.
But I would say the distinguishing factors actually knowing when it's a bad idea.
It's like this isn't going to work in Excel.
This is going to break.
Because yeah, people get that VLOOKUP juice, and they're like, oh, this is the solves.
I mean, because it solves a lot of wrote things that people end up doing in their business processes.
But it's also very volatile.
And there's better alternatives to that.
But they're locked in on VLOOKUP, and you'll have a very complicated multi-sheet thing that's all running on VLOOKUP that just runs like dog doodoo.
That's kind of the interview question.
Bob, do you have an answer for that one?
How do you know?
What puts you in the pretty good advanced and elite level?
I think it's all those keyboard commands.
Oh, OK.
So the-- Pretty much.
Like if you-- it's one of those things like so many-- I guess I'm going to connect this to music, which I don't really-- which might be dangerous, because that's not really my forte.
But when you sit next to somebody who knows how to play the piano, and then they're playing the piano, and you say, oh, what about this?
And then they're like, yeah, yeah, yeah.
And they just start doing it on the piano.
They're not like, wait, let me look at the music.
I think I have this sheet music.
It's somewhere in this binder.
Let me find it.
Oh, yes.
Let me think about-- OK.
Here's how you play that.
Instead, they're just like, yes, there's the relationship between all these notes.
There's a relationship between all-- I have this mental model of how all these things fit together.
And you want to hear this, and I produce it.
Like the people that are at that high level of Excel are the ones who basically say, wait, you want to do what?
Oh, OK, yeah.
I have this mental model of how everything in Excel fits.
And this is where my data is.
And now you want me to produce this thing, and I just produce it.
I don't look up the formula types, or how they fit together, or any of that.
I just start composing immediately on the fly.
As you're describing it to me, I'm putting it all together.
Because I see what you're saying.
You want to do a V, look up, an H, look up, and then you want to do a split, and then you want to do this, and then you want to do a total.
And then you want it to all adjust when I change the size of the file.
Oh, no problem.
OK, I'm going to do this.
I'm going to make this change.
And then it just shows up.
Like that's the next level operator.
I really like what Justin said.
Knowing when something is a bad idea.
I like this for two reasons.
One, it's incredibly reminiscent of software engineering interviews.
This all happens a lot, software engineering interviews.
They'll say, tell me about this tool you've used, or this programming language, or something.
And you describe it, maybe a framework.
And then the question coming is, when would you not want to use this?
It's almost always the question that get asked to see how far you've gone with the thing.
Because if you've taken a thing far enough, you've gotten to the point where you're like, oh, this is a bad idea.
And that's what I'm interested in.
What are the moments where you've seen where there's some logic, some business core functionality that's encoded in a spreadsheet?
Do you have examples where you encountered those and you knew either right away or somewhere along the line, this is an awful idea.
This should not have been built in Excel.
I want to hear those examples.
Excel can be used as a database.
It can be used as a low code, no code platform.
It could be used as a software development platform.
Surely there are examples where people have gone too far.
How Excel executes its formulas, it starts from the bottom right corner and works this way up.
And volatile formulas like VLOOKUP that are not really tuned, once you get a whole column of those, it will just-- you'll hit a brick wall.
And it's such a brick wall that the whole application locks up.
This is generally when I run into bad ideas.
It's not like, oh, it's unsafe or this is not usable.
It's like you've just made it so you can't even open the worksheet anymore.
You have to start over because-- or you have to go into safe mode.
I would say the other one is that people get clever with turning off calculations and turning off the display.
And it's hard to tell when those things have been turned off.
And then you'll end up turning off the calculations in the workbook and you hand it to someone else.
And it just stops updating and no one has any idea why.
Kind of like productivity, like performance hacks that just go sideways.
Bob, do you have any examples of-- No, no, no.
You have gone too far.
What have we brought?
I definitely have encountered a few, I think.
The one that probably has the most extreme perspective, I'll kind of set up this way.
If you've seen or read the book, the big short, and you've learned about trading mortgages and people interacting with mortgages, there was a group, quote, group.
It was two individuals.
They worked for a major bank that I worked at at the time.
And they were trying to trade mortgages.
And they thought, oh, we have this great idea.
We're going to do these analyses on mortgages.
We're going to do it in Excel, because that's the programming tool we have access to.
So they would pull this data from one other-- probably Excel spreadsheet.
They would basically tell them about these individual pools, and they would put it into the spreadsheet, and then they would run a bunch of Z-score calculations.
They're trying to find out whether any of these things are outliers in terms of their performance.
And then they said, we're not doing this at enough scale.
This is too small, so we're going to scale it up.
What we need is to get these FTP files that get published one the month into Excel.
So then they wrote a macro that basically did the FTP, pulled the file, dumped it into Excel, and then ran this thing on all of those files.
But each file was too big to fit in Excel.
So it did this complicated thing where it would open the file, pull as much as it could, mark a position, close the file, do the analysis on what was available, open the next file, add that in, do a new calculation.
So it would do this, like Justin's saying, it would become this sort of disaster of all these individual computations getting put together.
Of course, the only people that knew how this worked were these two people who thought this was a super proprietary thing.
They didn't want to hand it off to anyone else.
They didn't want to describe how it ran.
And it finally became a thing that ran over a multi-hour window.
So this thing would get published, and then they would hit a button.
Like Justin said, they would turn off calculations.
They would have a macro that would shut down calculations, would do all of this complicated load.
Then it would do all of these analytics.
Then it would calculate all of these values.
Then it would do turn on calculations again.
And then it would dump its final report into yet another Excel spreadsheet that would then have even more analysis points built on top of it.
So that in real time, when someone said, do you want to buy mortgage pool 1234, they could type in pool 1234.
And then it would do, of course, a VLOOKUP and try to find in its file, where's pool 1234, and what are my analytic numbers.
And as you would imagine, on top of all of this complication, whenever there was an error or one of the input files was raw or changed its format, the whole thing would just collapse.
And then you would have Excel in some weird state of semi-computation, where it would just be there partly computed, with formulas turned off, and hung on a particular machine.
And they had an entire dedicated box under their desk that was just for running this set of sheets.
It was only-- it was like a cloud server for them under their desk.
This is the most stressful episode we've ever recorded for me.
I'm not taking this information very well.
So there was another one that was almost as good, where they started by thinking about this world of, I want to pull these positions, and then I want to calculate this value at risk formula, which was like a metric for how much risk the firm was going to have on a particular desk or a particular trading group.
Again, this is a finance example.
And then they were like, well, we're just going to make this a prototype.
So they made this spreadsheet, and it was the prototype that did this calculation.
And then they realized-- I kind of predict what's going to happen.
They simply said, can we ship the prototype?
Is that what happened?
There you go.
They shipped the prototype.
The prototype eventually became the reference code that everyone used.
And then they made copies of the prototype sheet and spread that across multiple groups.
And then they said, we need this whole workflow to be automated.
So they created a VBA script that would automatically-- they would automatically launch the sheet.
It would run all the calculations out of the prototype sheet.
It would generate an emailed report.
It would email the report out of the sheet, and it would save itself and shut down, except when it didn't work.
And again, both of these were-- both of these-- they're both like database problems in the end, right?
They both got to database problems, but they kind of boiled the frog into databases, and then suddenly realized they had this database and realized that reworking it was just quote too hard.
Yeah, those are pretty bad ones.
Justin, what do you think about these examples?
Yeah, that sounds entirely horrifying.
I've had the privilege of not working at that scale.
I've just been trying to manage a group of engineers or run some correlations for myself that they could fund at genuine programmers.
So yeah, this is truly horrifying.
Mike, Bob comes to you with his hands together, and he says, hey, what have we wrought?
We built this thing.
Can you help us?
How do you help him?
What's your suggestion for him to solve whatever whose value-- $4 billion of value is being traded every day on these spreadsheets?
What's the fix?
That's a very good question.
I guess what I've seen happen several times now, in the course of my career, is that a company will come to an engineering organization, and they'll have a situation like this where they say, we built this thing with spreadsheets, and there's two situations that happens.
One is that they're doing calculations that are vastly too elaborate for the spreadsheet to handle.
I saw this in my days when I was doing molecular modeling stuff.
People would try to put very complicated math into spreadsheets and then try to apply it to a million rows.
And it was a sensible idea.
It just, once it got to a million rows, it really needed more horsepower.
And then the other one that I think is more common is that people start to use Excel for things that, as Bob said, are really database problems.
They're things that should have proper ETL tools and proper databases behind them and some understanding of how to query those things.
And then maybe you produce a spreadsheet at the end for somebody to do analytics on.
And I've seen that-- we've joked about this in the past.
But I've seen this so often now that we kind of joked about starting a company that would essentially be convert your Excel spreadsheets to databases as they say it is.
I think the point Bob makes is the reason why people don't do this is they want to see the data.
Somehow I know it's more correct if I can see it.
Is that what you were saying, Bob?
Yeah, that's the experience I had.
And the other experience, the other pushback I always got-- in fact, I was put on a project at one point for six months, converting one of these Excel disasters into an access database, which was very simplistic, and putting a GUI on the front end and controlling some of the inputs because they realized that they had been doing a bunch of errors in entry, data entry, and it was a bunch of problems.
So we tried to fix all of it.
And the problem was that no matter how far you got along, no matter how much of their specification you built, there was always more that they weren't telling you about that they wanted.
Kind of like Mike said, they wanted to be able to do another total of these two numbers or figure out the standard deviation of all of these-- this column of things, which they weren't telling you because they didn't do it very often.
But when you gave them this new UI that was database-driven on the back end, they were like, oh, no, no, I can't use this.
I need these other things.
And so then they wouldn't switch, wouldn't switch, wouldn't switch, wouldn't switch.
And now you are in that terrible situation as a programmer where you invested all this time into building this tool that is better and faster and more effective, and your user simply will not use it.
They just keep opening Excel.
And that's-- I don't know how to solve that puzzle.
That sounds very familiar, what you just described.
[LAUGHS] Yeah.
There's even-- so the tools that I've seen are things like-- you end up with things like Tableau or Power BI, where you can sort of do analytics, but the data is in a database.
And even that does not quite seem to do the trick.
It seems like people still want to get all of the raw numbers out into a spreadsheet.
And there's probably some utility to it, but it does also seem like sort of a psychological comfort.
Yeah.
I mean, visual inspection is a quality step.
And so when you're losing that, there's more quality that you have to build in.
And visual inspection is a very powerful quality step.
And so it's kind of an uncanny valley there.
For like, the quality has to be way better and has to have a lot of interface features to reinforce that.
Otherwise, people are not going to want to make that jump.
It sounds almost like this.
Excel is just like a victim of its own success.
The interface itself is so good.
We didn't even talk about plotting and graphing and things like that.
Also, these other benefits that you can get out of it.
The interface is so good.
And it gives people this immediate ability to see the data.
And maybe I put C in quotes a little bit, because you can't see it million rows.
But it gives this ability to immediately see the data.
And in seeing it, you feel some confidence or feel some way to visually validate it right away.
Whereas as a programmer, I have to build that for you.
And I become the medium through which you see your data, if you're not used to querying a database, for example.
So is Excel a cult?
We can cut that out.
[LAUGHTER] Is there anything that you guys ever think Boya Wish Excel would do this and doesn't do it?
40 million rows, obviously.
As a programmer, I can't do it.
One million.
They had this thing where they introduced tables.
And it was neat, because you could have a header row.
And then in your formulas, you could then reference just the name of the header row.
And you wouldn't have to worry about getting the right range.
Except then, when you would copy a formula across a range of cells, because they changed how it worked in the formula, you couldn't anchor anything.
And so you would copy this formula horizontally.
And it would automatically move your references.
It would just take-- instead of referencing the original column, it would take the column adjacent to it.
And the next column over, and the next column over.
And the workaround for that was really klugey.
You had to specify that as a one column as a range onto itself.
And yeah, tables have made a lot of good functionality.
You can sub-total and do all kinds of neat stuff.
But as I mentioned before, it really breaks sharing of documents.
You have a hard time sharing the document when it has a table in it.
So I would love to see them figure that out in a way that, yeah, it's working.
Microsoft, if you're listening, we've got a product request.
I would say that as someone that has experience in Excel, there's one thing that a user can do that will just light me up and drive me crazy.
And it is a hidden column or row.
And I feel like it is a functionality that should be eliminated-- should have been eliminated 25 years ago, and it needs to be eliminated now.
It serves no purpose except obfuscation and confusion for everyone else.
People will use Excel as a layout document.
So that and merge cells, it really breaks the whole purpose of having the cells in the first place.
Yeah, so if they could figure that out, that would be great.
You're saying merge cells are evil.
I do that all the time.
Merge cells are pretty bad, but not as bad as hidden cells.
It's like, Eric, if you were coding something in Python and then you had hide function, it just took those rows of the file and removed them.
And now when you said, but tell me where my function calc square is, it just couldn't find it.
It didn't exist.
It was no longer available in Git.
It didn't show up in your IDE.
I mean, it's the worst idea they have ever had.
I'm going to spend the afternoon trying to implement map reduce cells.
Just know that it has been done.
You're not the only-- It's a super entry.
You're not the only one.
It has been-- and it was done in like 2002, 2003.
There was a time, maybe 10 years ago, 15 years ago, where people were doing-- figuring out how to show movies and play Doom in Excel.
So all things are possible.
Yeah, there was a guy who insisted on using it for presentations.
He didn't use PowerPorn.
He used Excel.
Each slide-- yeah, this is what he did.
He insisted on using Excel for everything, if possible.
And that included all of his presentations.
And so each tab was a slide.
And so he would have like 75 tabs.
And the first one would be like introduction.
And then he would do the key command to like shift the tab.
And he would-- he refused to use PowerPoint, because obviously Excel is a better tool.
He could put in formulas.
He could show graphs.
He could plot things.
He could write text.
Wow, when all you've got is a hammer, that's what that sounds like.
It's tempting because for design purposes, you can lay out a grid.
And you have like snapped a grid.
And you can do philosophically good design within Excel, because of you can change the grid dimensions.
And you can do really good presentation stuff, but then still in Excel.
So getting it out of there is-- you can export a bunch of PNGs and do all kinds of stuff.
Justin, what you just said sounds better to me than CSS.
Indeed.
I can vertically center a thing in Excel, because I know it's a million rows.
I just go to 500,000 whatever.
And boom, it's in the middle.
Well, I want to send a huge thank you to Justin Rooney, Bob Farzine.
Thank you so much for coming on the show today and talking with us about Excel.
It was absolutely phenomenal having you here.
I enjoyed the conversation quite a lot.
Yes, fun.
Yes, thank you.
How's it been?
So this has been Picture Me coding with Eric Aker and Mike Mall, special guest today, Bob Farzine and Justin Rooney.
We will see you guys all in the future.
Thanks so much for hanging out with us.
Ciao.
Bye bye.
[MUSIC PLAYING]