Picture Me Coding

The Value of Software with Irina Telyukova

Erik Aker and Mike Mull Season 2 Episode 42

Mike asked me the other day: "what's all this software actually worth?" Turns out he was talking about raw dollars. Who the hell knows? I said. Is this a thing anyone can answer?

Mike then dug up an academic paper from Harvard Business School called "The Value of Open Source Software," in which the authors try to calculate the monetary value of open source software. We needed a bunch of help just to read and understand the paper, so we called our friend Irina Telyukova, and she came on the show to explain basic economics to us. Irina is a former economics professor and a consultant in the field, and she has also run analytics and modeling departments at software concerns. It was pretty rad that she came on to help us understand!

References
- The Value of Open Source Software
- packages.ecosyste.ms


[MUSIC] >> Hello, welcome to Picture Me Coding with Erik Aker and Mike Mull.

>> Hi, Mike.

How are you doing?

>> I'm doing all right.

>> Mike, this week we have a special guest, Dr. Irina Telyukova, who is a real-life economist.

Hi, Irina.

>> Hi.

Thank you for having me.

>> Thanks so much for coming on.

Can you tell us a little bit about your background?

>> Sure.

I am an economist.

I was in academia for a number of years, specializing in the subject of consumer finance mostly.

I have since then done some work in economic expert witness space working in intellectual property litigation in particular.

That's a little bit relevant to what we're talking about today, and then have been heading up analytics in a fintech company for the last few years.

>> Broad experience there as an economist.

You've been a professor of economics.

>> I have.

>> All right.

That's great.

We have a lot of dumb questions for you, I'm sure.

>> I doubt it.

>> Now, I guess we should admit we have worked together in the past.

Mike and I and you have all worked together at various times for a couple of different organizations.

>> Yes, for a total of 10 years or so at this point.

>> Now, Irina, we'd like to talk about music on this podcast, and you are a trained pianist as well.

Can you tell us a little bit about some of the music you like to listen to?

>> Sure.

Well, I am a classical music nerd.

That's true.

I am a trained pianist, although a very inactive one currently.

The music that I'm listening to mostly and sometimes play on the piano has to do with my two-year-old daughter mostly.

I have recently discovered much to my delight that these musical productions of various tales that I used to listen to on LP on my couch as a child are available on Spotify.

So that's what I'm listening to these days mostly.

The current favorite is Ali Baba and the Forty Thieves.

>> It's actually an excellent musical production.

If anybody's interested.

>> I want to mention too that like a lot of my friends, Irina and I originally bonded over shared musical taste, and we have a shared playlist on Spotify called Second Movements, which is all of our favorite sad, minor key, second movements.

>> So you've got like Cardi B and Charli XCX in there and stuff like that.

Is that right?

>> All of their symphonic works, yes.

>> Sort of the 18th century and 19th century versions of those things.

Yeah.

Lots of dead Russians and dead Germans.

>> Well, all right.

So Irina, you came on the podcast today to help us unpack a pretty hard question for us.

The question really is how do we put a price tag on software?

Now, Mike, you wanted to talk about this.

Can you tell us a little bit about where this question comes from for you?

>> Sure.

So we've talked a couple of times about open source and how it's an interesting phenomenon because it's so pervasive, but people generally don't get paid to produce it.

There's been a couple of high profile, I'm not sure exactly what you call them, but like license changes on popular products over the last few years.

The key one is probably Redis.

Redis changed their licensing terms and this fork called Valkey came out of that, which a lot of people are switching to, and Terraform did something similar.

I think Mongo did something similar.

>> Elasticsearch, yeah.

With the goal of making it so AWS can't just sell that stuff as a product easily.

>> Right.

They're trying to close a loophole: these companies like AWS, the cloud companies, are essentially commercializing their free work, and they were trying to set up a licensing model that would essentially require those companies to either reveal their source code or pay a licensing fee.

What happened in a couple of cases, Redis and Terraform in particular, people just forked them to start a new product so they didn't have to deal with the licensing issues.

So there's an interesting question there: if these things are so valuable, why don't people want to pay for them?

Then I came across a paper out of Harvard Business School that's fairly recent, from 2024. I think it's a working paper, called The Value of Open Source Software, where they try to analyze how much value open source software adds to the economy.

That was part of the inspiration also for getting Irina on the show, because I didn't totally understand what they're talking about, and figured it would be useful to have somebody who knew what they meant.

That's what we're trying to talk about today is how you judge the value of software and primarily through the lens of, how would you put a value on open source software that doesn't technically have a price.

Let's take a look at this paper that Mike referenced.

It's called The Value of Open Source Software.

It's a working paper, as he mentioned, dated January 1st, 2024.

I'll just read the abstract.

We first estimate the supply side value by calculating the cost to recreate the most widely used open source software once.

We then calculate the demand side value based on a replacement value for each firm that uses the software and would need to build it internally, if open source software did not exist.

We estimate the supply side value of widely used OSS is $4.15 billion, but that the demand side value is much larger at $8.8 trillion.

We find that firms would need to spend 3.5 times more on software than they currently do, if OSS did not exist.

Obviously, if there were no open source software, companies would have to spend quite a lot of money to replace that stuff.
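
To get a feel for the gap between the two headline figures in the abstract, a quick back-of-the-envelope calculation helps. This is only a sketch: the two dollar amounts are the ones read out above, while the function name and the idea of expressing the gap as a single ratio are illustrative assumptions, not anything the paper itself presents.

```python
# Figures as read from the paper's abstract on the show.
SUPPLY_SIDE_USD = 4.15e9   # cost to recreate the most widely used OSS once
DEMAND_SIDE_USD = 8.8e12   # cost if every firm using it rebuilt it internally


def demand_to_supply_ratio(supply_usd: float, demand_usd: float) -> float:
    """Return how many times larger the demand-side value is (hypothetical helper)."""
    return demand_usd / supply_usd


ratio = demand_to_supply_ratio(SUPPLY_SIDE_USD, DEMAND_SIDE_USD)
print(f"Demand-side value is roughly {ratio:,.0f} times the supply-side value")
```

The ratio comes out at a bit over two thousand, which is one loose way of seeing how much rewriting is avoided when the code only has to be written once.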

Their data comes from the Linux Foundation's census of open source software and from a site called BuiltWith, whose tagline is "Find out what websites are built with."

Irina, did you get a chance to read this paper, The Value of Open Source Software?

Yes, I did.

Can I ask you, if I'm completely naive to economics, what kinds of topics do you see from, say, like Econ 101 that I would need to understand to be able to comprehend what they're talking about in this paper?

Yeah, I think basically the key concept here is approach to value, the notion of value.

You've already read the two key terms here, the supply side value and the demand side value.

What we're talking about here is essentially what it takes to produce a good, and then what value it is to the consumer of the good.

In this case, the production of the good is the writing of the open source software.

The way they get at the consumer side or demand side value of the good is they think about the people who are using that software and what it would take to replicate it, i.e., write it again, which is an interesting interpretation.

I think it's a valid interpretation, but there's lots of detail there.

Does value have a special weight or context in the sense of economics?

It's not just how many dollars I'm paying for the thing? Does it mean something else?

Well, it means a lot of things depending on your context, basically.

So definitely there are these two sides of the coin: supply side, demand side.

Typically, if you think about how prices are set in sort of your standard economic theory, you're thinking about the demand side and supply side kind of equilibrating.

They agree on a value.

If you have a perfectly competitive market that doesn't have any friction in it, that price is what the consumer and producer agree on based on the costs of production and the value to the consumer.

Of course, most of the time there are frictions to the market and so the equilibration encounters some challenges.

So, I mean, that's one way to think about value and it's mentioned in this paper.

You take price times quantity and you kind of get the overall value.

There are also things like welfare to think about if you want to think about economic value, which means how does this phenomenon or good or service impact people and society across the entire economy.

So value has a lot of different meanings potentially.

It just depends on the context.

It's very interesting you talked about equilibrium and supply side and demand side because that's what they're drawing out here, that there's wildly askew valuations based on supply side and demand side.

It costs a lot less to replace all the open source software once.

That's in the billions.

But then if every company had to write that software themselves, that's the demand side, it'd be in the trillions.

What's going on?

Like why is that relationship or ratio so wildly askew here and what's interesting about that?

So that by itself doesn't necessarily violate the concept of equilibrium because you may have a lot more consumers than producers and that's how you end up with the total value being so different.

I guess one way to think about it is if you are producing a good, let's say you're one company producing a good, presumably the overall value of that good is much greater than whatever you've done to produce it, assuming it reaches some sort of a mass market.

What's interesting here is thinking about demand side value just as the concept of replacing the thing once.

And then basically you have this software that's being downloaded multiple times by the consumer or the receiving companies who then want to build on top of it.

You could have an equilibrating price for which you sell the software and then nobody has to reproduce it.

That would be not inconsistent with the concept of equilibrium.

So I don't think that it's unusual to have this kind of a result where you have two different sides being different.

I think the more key question here is how do you think about the two sides and whether you're taking into account all of the key characteristics of the good.

One thing that occurred to me when I was reading this paper is the alternative approach that you would take, for example, in the context of patent litigation.

Let's say that there is an open source piece of software that somebody patented and if it's open source, it doesn't apply.

But if you were charging for it, then you could potentially think about the license value of this.

In those kinds of contexts, you would often look at what comparable software would license for or sell for to try to understand its value.

So let me make sure I understand what they're doing in the paper because I think I do.

But as an example, what they're trying to say in the paper is that if every company out there that's using, say, Nginx as a web server did not have access to Nginx, then they're working on the basis that every company would have to rewrite it themselves.

So they're talking about the cost to each, the collective cost to all these companies to rewrite Nginx for themselves.

And what you're saying is there's an alternative approach where you could, instead of looking at the cost to them to rebuild it, you could look at what the licensing or per unit cost would be for that thing if some other company made it and they had to buy it from them.

Yeah, it seems to me that the alternative, at least, or relevant counterfactual to consider is if everybody didn't have access to it in this economy, somebody would write the software and sell it, right?

So everybody wouldn't have to rebuild it.

The question is then how do you price it and you're back to the foundational question.

What would you be willing to pay for this piece of software, presumably at least whatever you would have to pay to develop it yourself?

As a side comment, I just want to say how happy I am that we managed to work the word counterfactual into the podcast.

I've been waiting for that for a year now.

You're welcome.

So I found this document that talks about meanings of value, sort of like an economic history of Adam Smith and Ricardo and these famous economists and how they thought about value.

There is one quote that I kind of want clarification on.

They say, "We mean value in terms of what other things the good could be exchanged for, value in exchange, and not the inherent usefulness of the good in terms of meeting our needs or desires, value in use or use value."

So this paper, my sense is this paper is using more of the first sort of thing, the value in exchange, what you would pay for it or what you would exchange for the equivalent software.

It's not really saying anything about whether or not it does something useful for the person who is using it.

That's a really good point.

Yes, I would agree with that assessment.

The discussion doesn't take into account at all the value of the products that these receiving companies derive from this, right?

So whatever they develop, how it's useful to society, I mean you could potentially sort of layer on.

So yeah, I would agree with that.

So I wanted to know, there's two things I'm trying to understand as I'm reading this paper.

One is, why are they trying to come up with this valuation?

They're saying if we just sort of imagine OSS doesn't exist, we have to figure out what the value of it would be.

I feel like that's kind of an interesting question in the abstract, but is it really a useful or relevant question?

Why are we looking at the value of open source?

And the second thing is, how do they actually come up with these numbers?

The 4.15 billion, the how many, 8.8 trillion, how do they actually calculate that stuff?

Did you have a sense for how they figured those things out in this paper?

Yes, the first question is a fun one.

Why is this a valuable question to ask?

Well, it's a huge part of the economy.

So to any economist or I imagine any person in software, it's really interesting to think about this being some very substantial portion of what the technology sector is working off of, which is in itself very valuable.

And so in that sense, it seems like it's kind of a fundamental question for understanding what so much of the economy is built on.

This is a very high-level answer.

And I've heard you talk about open source before in terms of, is it going to stay?

Is it going to grow?

Is it going to eventually get extinguished?

With this phenomenon, where you have people continuing to maintain this code out of the goodness of their hearts, if everybody gets really tired at some point, you have to think about what it would cost to actually replace this.

They conclude, I mean, they have these big numbers on the value, but they also conclude that if we didn't have open source, the companies would have to spend a lot more time and money on software.

Is that something that could potentially affect policy?

Does it encourage the government to maybe support some of these open source projects that are understaffed, undermanned?

Maybe the Fed could set an open source contributor rate that'll go up and down.

Based on employment in the technology sector.

Yeah, I mean, I think definitely I imagine that open source has a lot of value even directly for government projects, although I don't know anything about that.

You would think, and I'm no one to speak about the history of technology, but we know that the government is generally heavily involved in things like the development of the internet and various other computer related technologies.

So I would imagine that there are potential policy implications there.

The second question, the big numbers.

I think this is an admirable attempt because what they're trying to do is go off of actual data of usage, both code writing and code usage in terms of downloads to estimate the value.

I like the paper a lot.

I think it does a really good job.

You could sort of quibble with a lot of the details, but an exercise like this is inherently difficult and I think what they're trying to achieve is largely to say, look, these numbers are actually much bigger than any of the previous literature has concluded.

So it's still an open research question, basically.

The way they do this is they look at these two data sources, one of which looks at code bases of companies that are about to be acquired, if I remember that correctly, and just trying to understand what software they're using.

And then they're also looking at this website data to see what the sites are built with, essentially trying to get at what packages are in use and then how often they're being used.

So in that sense, I think it's a really valid approach.

We raised a couple of issues in our discussions of this, one of which is I was really surprised that Python received such a small valuation in this project because my particular exposure to software over the last 10 years has been very Python centric and I would have thought that it's much higher comparative value than what was highlighted.

And so the possibility is that maybe these data sets are not fully representative and are capturing only some slice of things, so the value could potentially be even higher if you extrapolate from that.

We could zoom into that, but I did want to continue thinking about this value question, value overall, what this means as an economist to you.

Here's what they say in the paper.

Understanding the value of OSS is of critical importance, not only due to the role it plays in the economy but also due to it being one of the most successful and impactful modern examples of the centuries old economic concept of the commons, which run the risk of meeting the fate known as the tragedy of the commons.

If you read this paper and you say, "Wow, this economic value provided by open source software is much, much larger than I would have ever anticipated or guessed before."

And then you come to Mike and me, and Mike and I say, "Oh yeah, that's like 12 people who are just giving away their labor for free."

Does that worry you at all?

Do you have a sense of whether there is some other conclusion that you draw from that?

And my other thinking is I still find open source, the phenomenon of open source, to be weird, that so many people are giving so much economic value away for free.

Is there any analog, historical analog, where there's so much economic value being created by volunteers just because they want to?

Can you think of anything?

So I guess the first question is, are you worried?

The second question is, does this ring a bell with anything else that you can remember in the history of economics?

So hard questions.

Am I worried?

I mean, I think the sustainability is in question, right?

How long can this be sustained?

I guess the question is, are people remunerated for this work?

And if not, are they remunerated either monetarily or through some non-pecuniary value that they derive from it?

Like likes and subscribes and internet points.

Sorry, go ahead.

Or they really want to create a particular type of software in the world that doesn't exist and so they're willing to start it and then other people pick it up and that's value in itself to them.

And yeah, so how long is this going to continue?

In that sense, I guess that would be my primary concern about this.

In terms of tragedy of the commons, whenever there's a free public resource, there is the possibility of kind of the abuse of that resource.

In terms of looking for the historical parallel, I can't think of anything where it's specifically like volunteer based, what you're thinking about.

The parallel I thought of, which is a little bit different in flavor, is you have markets where you don't know the inherent value of something.

Can you give us an example?

Yeah, the housing market came to mind.

You're buying and selling these goods and no one actually knows what they're worth.

And the only way to arrive at that valuation is to look at what the comparable properties in your area have sold for, right?

But those are also inherently unknowable.

And so you have this entire market that's based on the purchases and sales of goods.

There's not an obvious market price for them.

So the market kind of clears and there is sort of an equilibration process, but it's interesting that inherently there's not a known value to the good.

Wow, that is a really interesting comparison.

Yeah, but it's a little different from what you asked for in terms of volunteers contributing.

Yeah, I just will take that as a win.

If you can't think of a comparable example, I will take that as a win for my theory that open source is weird.

It's a very weird phenomenon.

Is there a comparison to literal commons or people grazing federal lands or things like that, sort of stuff that gives you economic value that you don't have to pay for?

That's true, right?

So back to the commons itself.

Somebody's paying for it, right?

You don't pay for the use, but the government is paying for it.

Usually you end up with examples of a government provided good somehow.

Maybe your tax dollars contribute.

You're paying in some sense, but maybe not the full value of it.

I don't know.

Water comes to mind also.

I'm thinking of water, yeah, like aquifers and stuff.

How about we look at how they actually try to build formulas to construct their value of software?

I think if I'm reading right, they're using COCOMO II at the heart of it.

Am I right there?

Yeah, I think their approach to determining the actual cost to reconstruct the software is based on the Constructive Cost Model, or COCOMO II.

And I think they're using the approach of essentially counting lines of code.

So yeah, when I first saw COCOMO II, Mike, you introduced me to it a number of years ago.

It seemed like, oh, you count lines of code.

And then you sort of say, well, it would probably take about this much effort and/or time to produce those lines of code.

And then I saw this other thing and I was like, well, you have all these other factors you can add into your COCOMO II model.

But ultimately it seems like lines of code is the most important variable.

Is that right?

Yeah, you can use different proxies.

So COCOMO also talks about something called function points.

But I think that's very difficult to do across different languages.

So they seem to have taken the approach in the paper of trying to count source lines.
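
To make the lines-of-code idea concrete, here is a minimal sketch of a COCOMO II style effort estimate. It assumes the commonly cited nominal COCOMO II.2000 calibration (A = 2.94, B = 0.91, scale factors summing to 18.97); those constants, the function names, and the monthly wage are assumptions for illustration and are not taken from the conversation, so the paper's exact parameters may differ.

```python
# Commonly cited nominal COCOMO II.2000 calibration values. These are an
# assumption for this sketch, not figures quoted on the show.
A = 2.94                          # productivity constant
B = 0.91                          # base exponent
NOMINAL_SCALE_FACTOR_SUM = 18.97  # five scale factors, all rated "nominal"


def effort_person_months(kloc: float, effort_multiplier: float = 1.0) -> float:
    """Estimate effort in person-months from thousands of source lines."""
    exponent = B + 0.01 * NOMINAL_SCALE_FACTOR_SUM
    return A * effort_multiplier * kloc ** exponent


def replacement_cost_usd(kloc: float, monthly_wage_usd: float) -> float:
    """Convert estimated effort into a labor cost (hypothetical helper)."""
    return effort_person_months(kloc) * monthly_wage_usd


if __name__ == "__main__":
    for kloc in (10, 100, 1000):
        months = effort_person_months(kloc)
        # The 10,000 USD monthly wage is a made-up placeholder.
        cost = replacement_cost_usd(kloc, monthly_wage_usd=10_000)
        print(f"{kloc:>5} KLOC -> {months:,.0f} person-months, about ${cost:,.0f}")
```

Because the exponent is slightly above 1, doubling the line count slightly more than doubles the estimated effort, which is why the line count ends up dominating the estimate.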

Now Irina mentioned Python.

She said, why is Python so low?

And Irina, I believe that's probably because you've worked with data scientists, data scientists use and love Python commonly.

I'm going to read a little chunk from the results section of the paper.

The authors write:

We find that OSS packages created in Go have the highest value with $803 million in value that would have to be created from scratch if the OSS packages did not exist.

Go is closely followed by JavaScript and Java with $758 million and $658 million respectively.

The value of C and TypeScript is $406 million and $317 million respectively, while Python has the lowest value of the top languages with around $55 million.

So Go is at the high end, $803 million, Python at the low end, $55 million.
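
The spread between languages is easier to see as ratios. The sketch below uses only the per-language figures just read from the paper's results section; the variable names and the choice to compare everything against Python are illustrative assumptions.

```python
# Supply-side values by language, in millions of USD, as read on the show.
value_musd = {
    "Go": 803,
    "JavaScript": 758,
    "Java": 658,
    "C": 406,
    "TypeScript": 317,
    "Python": 55,
}

# Express each language's value as a multiple of Python's.
relative_to_python = {
    language: value / value_musd["Python"]
    for language, value in value_musd.items()
}

for language, multiple in sorted(relative_to_python.items(), key=lambda kv: -kv[1]):
    print(f"{language:<11} ${value_musd[language]:>4}M  {multiple:5.1f}x Python")
```

By these figures, Go's estimated value is more than fourteen times Python's.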

So Irina, your question, why is Python so low in value relative to Go?

That was a thing that surprised you.

Did you have any idea why they came to that conclusion in the paper?

I mean, from their methodology, I understand that it's not showing up very much in these code bases when they're doing their counting.

You bring up data science.

One thing that I think of, if that's true, if my view is super biased because I'm looking at data science work, it's possible that most of it is proprietary and wouldn't show up in these databases.

There's an alternative source for this, for usage data.

There's a site called Ecosystems, and there's actually a dot in it, so it's ecosyste.ms.

And they have analysis on packages for a whole lot of open source ecosystems and even platforms, so like Debian and things like that.

And I thought, so maybe their valuation of Python being low here is based on the salaries of those working within the language.

I wasn't sure if they used that.

They had developer salaries they looked at, but I wasn't sure if they correlated with language.

So that may be involved.

Go salaries, the median Go salary is higher than the median Python salary.

According to Ecosystems, there are actually fewer Python libraries published.

There's actually 594,000 packages published on the Python package index compared to 1.6 million packages for Go and 4.6 million packages for JavaScript.

And they're using COCOMO II, so maybe there's a lines of code thing.

And this is where Mike and I can say, well, it just takes less Python to write that stuff than it would.

Mike, you're laughing.

I admit I have a little bit of skepticism about their dataset too, because one of their datasets is this thing called BuiltWith, and apparently that organization is not actually analyzing the source.

They're apparently actually hitting these websites and trying to infer from what they get back what the website's built with.

Anyway, that's what I'm understanding.

And you tell me that and I immediately have doubts about that.

Go ahead though.

Yeah, I would have expected to see things like Django and Flask be very prominent in that dataset and they don't seem to be there.

So I have a little bit of skepticism about that dataset in particular.

So I don't know.

I don't have like a contrary dataset that shows different results, but I do wonder where things like Django are in this data.

I mean, Ecosystems has an API.

We could start writing our own scripts to do some data pulling here if we want.

Irina and Mike, how do you feel about talking about the value of software by looking at the lines of code?

Does that seem like an okay proxy or is it not a great one?

To an economist maybe, but I've been hanging out with you guys long enough to understand why there is skepticism there.

And it depends on how efficient your code is, I guess, for one thing.

Yeah, I have a lot of doubt about that as a good method for estimating cost, but to be fair to the authors of the paper, I don't think there's a better alternative.

So I don't think it's a case of they chose the wrong thing.

I think it's just a case of they chose the only thing.

Ecosystems talks about massive download numbers for these packaging ecosystems.

You look at NPMJS, there's 25 billion downloads of these 4.6 million packages.

You look at Go, and they don't have downloads stats for Go, but you look at Python, there's 42 billion downloads for 600,000 packages.

And you look at C#, there's 527 billion downloads for 650,000 packages.

These are just mind-boggling numbers.
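
One rough way to compare those ecosystems is average downloads per package. The sketch below uses the rounded download and package counts just quoted on the show; the dictionary layout and names are illustrative assumptions, and an average hides the long tail of rarely used packages.

```python
# (total downloads, published packages) as quoted on the show, rounded.
ecosystems = {
    "npm": (25_000_000_000, 4_600_000),
    "Python": (42_000_000_000, 600_000),
    "C#": (527_000_000_000, 650_000),
}

# Average downloads per package for each ecosystem.
downloads_per_package = {
    name: downloads / packages
    for name, (downloads, packages) in ecosystems.items()
}

for name, average in downloads_per_package.items():
    print(f"{name:<7} about {average:,.0f} downloads per package")
```

On these numbers the per-package averages differ by orders of magnitude, which suggests raw download counts measure tooling habits as much as value.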

What about downloads as a proxy for value?

Does that make sense?

Should we look at that?

So the paper uses this GHTorrent database, which, as I understand it, is trying to get at that flavor as well.

Basically all of the activity in GitHub: downloads, commits, etc.

I do think, I mean, this is how you get at the idea of how many people are actually using these packages as a consumer.

That's my understanding of what's going on in the paper.

And there's a long tail of packages that are not widely used.

The distributional results are really, really interesting.

So they bring up a statistic.

I don't have it in front of me.

You might, something like 5% of developers are responsible for upwards of 80% of... 95%, something like that, I think.

Yeah, we'll pull that out.

So that actually corresponds to a lot of what we heard about open source.

Actually, I found the quote, "Over 95% of the demand side value is generated by only 5% of programmers."

Isn't that wild?

"And those programmers contribute not only to a few widely used projects, but to substantially more projects than the programmers that are engaged at the lower end of the value distribution."

I guess I was a little confused.

It's like, why does that matter that they're contributing to many packages when we're trying to figure out value?

Is that supply side or demand side?

I'm pretty confused about this.

How many different packages I contribute to?

Why is that relevant?

That's a good question.

I think they're trying to get at some notion of productivity, I suppose.

If you're really thinking about lines of code as your basis, then presumably the more you contribute to, the more overall code you write.

But of course, that doesn't have to actually be the case.

It almost sounds from that sense, like, well, we're going to replace these 5% of people.

How much to replace those 5% of people?

Here's my question, Irina.

This may be a pretty hard question to ask you on the spot, but we invited you on our podcast so we could lob really hard questions at you.

Let's say you were going to design a study like this.

What are the variables, what are the features you would look at to try to understand what is the value of open source software?

Let me write that paper and I'll get back to you.

I think that this paper actually does get at the flavor of what you would want to do with the data that's available.

It seems reasonable to me to think about what open source code is out there and how often it's being used.

You kind of understand, I think, getting to the sense of the inherent value of something.

One way you could do that is by looking at how often a given package is being updated and how often it's being used, downloaded, and used in other projects by others.

In that sense, it seems like the flavor is the right one, at least thinking in the vein of this exercise.

What's missing for me here is, or maybe the limitation that we talked about, that maybe lines of code is not the best representation of that alone, because what you're missing is sort of quality of code and then that probably correlates to value to some extent.

But it's also, it's very difficult to measure true productivity of engineering.

I don't know that anybody has come up with a definitive way to do that.

How do you actually measure the notion of productivity by somebody who's writing code?

Other than, I guess, to look at the final usage.

Companies headed for acquisition is maybe the best sample in the sense that they are the most likely to realize the value.

In that sense, I think it's a very strong data source, but perhaps there is a large tail that's not being captured here of projects that are being worked on that maybe don't eventually head for acquisition, but still have inherent value in the future.

I have a question.

Go ahead.

I have a question that's, I think, economics related.

So over the last 20 years, there have been certain open source packages that have gotten very popular, like within the world that we've worked in, things like pandas and NumPy and the scikit-learn package, super popular and super useful and kind of the standard.

And so there's been a couple of ways that people have tried to make money from these things.

One is they've kind of built, started companies sort of around building products with them.

So like, for instance, Wes McKinney, the guy who started the pandas project, has done a couple of startup companies and he's built Arrow and Ibis and things like that and tried to make money by selling software.

And then recently I came across a company that's apparently having success at sort of charging companies to do support and feature addition on open source packages.

So I guess the question I'm trying to ask is, people have figured out ways to make money on open source kind of in the original Richard Stallman model where you don't actually sell the code, you do something else related to it.

And my economics related question is, why does that work, when people don't want to pay for just regular licensing?

It would seem like if you're paying a company to do support or if you're buying like a special version of an open source package to get a little bit more support, you're effectively supporting that development.

Why would you not want to just pay a licensing fee like Redis and Terraform have tried to do?

Is there an economic idea behind why people are willing to pay for something in one sort of way, but not in a direct way?

But I had a different way of thinking about your question, Mike.

Like the software itself is infinitely reproducible.

And I think of a thing that's infinitely reproducible as not having much value, but the support, the effort, the labor that you would have to educate or like that's a one time thing.

That's not infinitely reproducible.

Maybe I'm derailing what Mike's asking, but I wanted to present that to Irina.

Is there a way we could talk in an economic sense about things that are just infinitely copyable as having virtually no value versus something that is very unique, like the actual labor that it would take to roll this out?

Well, infinitely, I mean, you're basically talking about something like a zero marginal cost product, right?

What does that mean?

So it's a product that takes some amount of cost to produce a unit of, but then the additional unit costs zero, right?

That's the infinitely reproducible thing.

So if you're producing a physical good and you're wanting to sell it, then whatever quantity you need takes additional cost to produce basically, but in this case, the marginal cost of producing an extra unit is exactly zero because you just copy the code, right?
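
As a rough illustration of the zero marginal cost idea described here, the sketch below compares a physical good with software. The function names and every figure are hypothetical, chosen only to show how the cost per copy behaves once the first unit has been paid for.

```python
def total_cost(fixed_cost: float, marginal_cost: float, units: int) -> float:
    """Cost of producing `units` copies: one-time cost plus per-copy cost."""
    return fixed_cost + marginal_cost * units


def average_cost(fixed_cost: float, marginal_cost: float, units: int) -> float:
    """Cost per copy; it falls toward marginal_cost as units grows."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost(fixed_cost, marginal_cost, units) / units


# Hypothetical numbers: both products cost 1,000,000 to develop.
FIXED = 1_000_000.0
for units in (1_000, 100_000):
    physical = average_cost(FIXED, 5.0, units)  # each extra unit costs 5
    software = average_cost(FIXED, 0.0, units)  # each extra copy costs 0
    print(f"{units:>7} units: physical {physical:.2f}, software {software:.2f}")
```

With a marginal cost of zero, the total cost never changes no matter how many copies exist, so the cost per copy keeps shrinking toward zero as usage grows.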

And of course companies have become very profitable selling software precisely because of that.

So Microsoft is an example.

Or even in hardware, like just selling IP. Qualcomm does that here.

A lot of their business is just selling their IP.

Yeah, definitely related.

Mike's question is like, is there a resistance to maybe I'm...

Yeah.

And what's the economic principle behind it?

I don't know that I know the answer to that.

It's a really interesting phenomenon because in principle, so I think what's, I guess, what are the two alternatives that you have?

One is you pay a developer or a company to develop features for you.

The other is you develop it yourself, which I sort of see as equivalent because you still need a developer who is going to develop the code. Or you buy some packaged piece of software.

The issue there is that it's not as customizable potentially, right?

Depends on what it is.

So customization probably has value, but I don't know that there's any economic principle where you're willing to pay.

And in principle, if markets are working well, what you're paying should sort of equilibrate, it should become kind of equivalent, because you either pay for a piece of software that's already packaged or you pay for some equivalent value of somebody developing that software for you.

But again, there are a lot of frictions, so probably it won't equilibrate like that.

So probably if you think you're spending your money to get things that are specific to your needs, you're more willing to do that than to just pay a generic licensing fee and hope that the money goes toward things that will be useful to you.

Yeah.

I mean, some sort of either out-of-the-box usability or the ability to customize exactly how you want has potentially additional economic value.

And that kind of takes us to the conversation we were having: in the world that I come from, people pay very significant licensing fees for programs like Stata and MATLAB, and they're very expensive, even though all of that stuff you can reproduce in Python, for example, right?

Stata, for example, from personal experience, I know does such an amazing job of accounting for every statistical or econometric aberration that can happen in the process of estimating a model that you don't have to think about it.

It does it all for you.

Whereas if you reproduce the code, you actually have to become aware of a lot of edge cases and then solve for them yourself.

So it's worth it to me, potentially as a consumer or as an economist who is not a software engineer to just pay for every edge case being ironed out for you out of the box.

Mike, did I misinterpret your question? Was it, why don't companies want to pay for this stuff?

Is that what you were asking?

Partially.

I guess it's a question of why will people pay for some things but not other things?

In the open source world, it seems like companies are willing to pay for, sometimes companies even generally support open source projects or they hire an internal developer to work on that thing.

But it seems like there's much more willingness to do things that are specific to your organization than to just pay a licensing fee.

They'll pay huge amounts of money for things like Salesforce where they're not getting specific stuff for their companies.

So I don't know.

I guess I'm confused why people will pay for some things and not for other things.

I guess maybe it would just be too hard to reproduce Salesforce with open source packages and internal developers.

But then you spend all these resources customizing Salesforce to your needs so much so that then you have to ask yourself, should I have built this from scratch in the first place?

Right?

Yeah, we've had that argument a few times as well.

Mike, I wonder if you put your CTO hat on and you're reporting to the CEO or the board or something like that and they're saying to you, hey, Mike, we're spending all this money on these salaries for these developers and I can't actually tell what the value of what they're producing is.

Maybe it's not a software company where you can easily sell the software and you know the value of it based on how many units of it you've sold.

So that's an easier value.

Maybe this is not a software company.

They have some other product, but they need to write software to augment what their staff are doing or whatever.

Somehow, to actually run their business, they need to write software.

So the board comes to you, Mike, and they're like, what is all this junk that these people have produced actually worth?

How are you going to try to answer that question?

How have you tried to answer that question in the past?

I just say, look, dude, you wouldn't understand.

It's too technical for you.

It's worth megabucks.

Yeah, I've invested a lot of brain cells into these questions and have not really successfully answered them to my satisfaction.

I mean, what I've tried to do is you look at the levers that you're trying to move with the thing that you're building, and then once the thing is built, you try to see if the consequences of moving those levers are increasing your sales or increasing your visitors or whatever.

But even then, it's kind of hard to tie.

You can see this number go up as you had hoped that it would.

But I've never been able to successfully determine how you show people that that is the direct effect.

It's a direct return on your investment in building that software.

I'm sure people are successful at it, but the best I've been able to do is kind of look at metrics and show that they are going in the direction that you had presumed they would go by virtue of building the thing you built.

There is actually a very related problem in the field of analytics, too, which is showing that there is ROI to whatever analytics initiatives you're investing into, and the exact same problem exists.

There is a statistic out there that I read somewhat recently that something like 85% of analytics projects fail.

And what that means is you can build a beautiful model that's going to automate part of your process or something, or forecast or make predictions, but then you can't actually demonstrate or people fail to demonstrate any business return to that initiative.

And it's actually calling the entire-- it's a huge challenge to the field of analytics, which is pretty new anyway, but I think it's very closely related to exactly what Mike was just talking about as well.

And it's really important for businesses to be able to show what that value is in order to justify the payroll and the investment and so on.

I want to go back to your reference to intellectual property and also your comparison to the housing market.

Let's say I'm a company and I produce some software and then I see some other company and I think they've stolen my software, so I'm going to sue that company.

And I want to say, "Hey, my software is worth X," so that when I sue them, I can say, "You know what?

I should get $400 million because maybe that's what my software is worth on the market."

And so then I hire you, Irina, and I say, "Please help me definitively establish how much my software is worth on the market."

And you might come to me and say, "Okay, that's great.

I'm happy to do that.

Thank you."

But all the comparable software is given away for free because it's open source.

So what is the value of my software?

You might then go, "Well, it would cost X, Y dollars to reproduce that open source." But is that really a good comparison if the actual market value of the open source, like what people actually pay, what they exchange to get that open source, is really effectively nothing?

Does it devalue my work that so much of this other stuff is given away for free?

Yeah, if you don't have anything comparable that's being licensed or sold, I think it makes it more difficult to derive damages in terms of what's called a reasonable royalty or lost profits or something like that because if it's free out there, then your lost profits are zero.

I think you go back to the labor market value as the likeliest kind of estimate of what your software is worth.

And that's all, of course, contingent on you having been able to patent or license your technology in the first place, which is challenging too.

Okay, so that's a good point.

If there's already a whole bunch of open source stuff, I probably can't even patent it anyway because it's—the idea is already out there.

Mike, you talked about how there are companies that are offering like a support model.

So they'll have open source software, but then they support it.

What about SaaS?

I mean, you mentioned Salesforce, Irina and Mike, you were talking about Salesforce, but there's a lot of SaaS products out there.

That's like closed source software.

We can look at those SaaS products potentially to figure out what software is worth, right?

I don't know.

One question I had about this is, you know, Irina was talking about the Microsoft model.

The thing that helped Microsoft so much was that they had this model where they would charge a fee for every copy of their software.

And then they ended up in this world where, to everybody's surprise, there were millions and millions of computers that wanted to use their software.

Maybe not their surprise.

Maybe they knew that was going to happen.

But it's unclear to me that that same dynamic is in place with software as a service since you're not running it on your computer.

And I guess some of these places do charge like a per-seat license, but it doesn't seem like—like with something like Microsoft Word, chances are you just—you know, if you have 500 people in your company, you just buy 500 seats in Microsoft Office and put it on everybody's computer.

But it's not clear to me that you do that with SaaS software.

So again, I don't know how that affects the value question, but I guess one effect of it is that SaaS software is usually vastly more expensive per seat than desktop software.

Well, and there's a whole plethora of pricing models for SaaS software, right?

So you talked about seats, but if you look at auth providers like Auth0 or Okta, this is common nowadays.

Companies want to have an SSO solution.

They sign in once and they get into all the resources those company employees need to access.

And it's also popular on websites.

If you don't want to implement your own user registration, sign-on, forgot password, all that stuff, you can contract with Okta.

Used to be Auth0, but they were bought by Okta.

And their pricing models are how many active users do you have per month?

It's not quite seats.

It's like the people using the app, how many of them are there?

And that can fluctuate wildly.

It's not really that static.

That's really not a proxy for the software, like software size.

It might be a proxy for the resources, the constraint on the actual producer of that stuff because every user means CPU and memory and virtual machines and whatever.

But you look at the pricing model for these things and they are wildly expensive, way, way more than you would initially assume from thinking about it as just a feature add-on or a security bonus for your, well, maybe not bonus.

They're not great analogues is what I'm saying for the value of the software, I don't think.

Yeah, I don't have a good answer for the original question, but I think maybe we've gotten a little off track of the value question.

Well, we didn't really talk about the aspect of the paper where, you know, they provide these wide bands in valuations, right?

So the demand side is, or supply side is 1.9 to 6 billion and the demand side is between 2.5 and 13 trillion.

And what is that based on?

Salaries, right?

It's depending on who is developing your software.

And they're wildly divergent around the world.

Salaries in the U.S. notoriously are much higher than salaries in India.

Exactly.

So they're basically giving you the lowest bound of, you know, everyone working, I forget how they do it, India, I think, and everyone in the U.S. and then something in between.

So there's also that aspect of it because you did mention that there is sort of the value of your labor, the cost of labor is a key variable here.

We didn't really talk about that.

But it ends up in these wildly divergent estimates on top of that.

They pick the middle value for the abstract, I think, but it's just worth pointing out.
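
To make the effect of those salary bands concrete, here is a minimal sketch of a labor-based replacement cost estimate. It assumes a simple linear effort model, which is a simplification swapped in for illustration and not the paper's actual method, and the line count, productivity figure, and wages are all hypothetical.

```python
def replacement_cost(
    lines_of_code: int,
    lines_per_person_month: float,
    monthly_wage: float,
) -> float:
    """Estimate the labor cost to rewrite a codebase from scratch.

    Assumes effort scales linearly with lines of code (an illustrative
    simplification; real effort models are usually nonlinear).
    """
    if lines_per_person_month <= 0:
        raise ValueError("lines_per_person_month must be positive")
    person_months = lines_of_code / lines_per_person_month
    return person_months * monthly_wage


# Hypothetical inputs: the same code, priced at two different wage levels.
LINES = 1_000_000
PRODUCTIVITY = 500.0  # lines per person-month, made up for this sketch
WAGES = {"low-wage scenario": 2_000.0, "high-wage scenario": 10_000.0}

for label, wage in WAGES.items():
    cost = replacement_cost(LINES, PRODUCTIVITY, wage)
    print(f"{label}: {cost:,.0f}")
```

The same code gets a very different price tag depending only on whose wages you plug in, which is why the resulting bands come out so wide.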

I mean, I like this paper as a starting point.

Like I said, you can sort of argue with the details, you know, the constants in their equations or whatever.

But I think it's a really good exercise and a good baseline.

But I think the fundamental question of why is not necessarily answered here.

And I don't know that I have an answer for it either.

Yeah.

I think I just always want to ask this weird question that is probably my own personal bias.

When I read the paper, my own bias comes out.

It's like, why does all this economic activity exist?

It's cool that people are writing all this stuff that they're not getting paid for.

But why in the world are they doing that?

There's probably not much in economics that helps to answer that question, I guess.

Well, what motivates you when you work on open source projects?

Cool internet points.

Mike, how about you?

Yeah, it's basically to demonstrate to the rest of the world how incredibly intelligent I am.

Well, since I know you're both kidding, is that all you can say?

I think there's creative, you know, like intellectual puzzling type things that are involved.

It's kind of like friends of mine who started bands.

They didn't plan to become huge and they mostly didn't.

So why did they write music and put it out and sell CDs for the replacement cost of producing them?

People who put up these repositories on GitHub where they're like, "Oh, I just wanted to see if I could solve this problem."

Why are they doing that?

Yeah, there is a kind of maybe crossword puzzle-ish exercise to it.

It's like a weird kind of Sudoku.

Maybe passion is part of the answer, right?

I mean, the music example is actually a really good one.

If you really love doing this stuff, maybe that alone is so inherently valuable that it actually trumps being paid for.

I think there's a lot of cases too where people don't start off thinking, "Hey, I'm going to make something for the rest of the world."

I think a lot of times you start off saying, "I'm going to make this thing for myself."

And then you go, "Hey, other people might be interested in this."

You know, I remember several years ago when we were working together, we were working on this thing that we were trying to predict recessions or something like that.

And it used this technique called MIDAS, and all the code was in MATLAB.

And I was just interested in it, but I didn't want to read the MATLAB code.

So my approach to understanding it was I wrote my own version of this MIDAS stuff in Python.

And when I finished it, I put it on GitHub in a public repository, and it has like 30 stars now.

You know, it's not a big project, and it's certainly not on a par with most of the open source we're talking about in this paper.

But I think a lot of stuff does start off that way where you just build something because it's interesting to you or it's educational for you, and then you put it out there.

So it's not really an attempt to do something that might have value to other people.

It just turns out that way.

Somewhere along the line you go, "Whoops, this is now used by 60% of the Fortune 500 companies in this country.

Uh-oh.

They want me to fix things for them."

In that case, you may have missed out on the value a bit, right?

But it actually does bring up the point that maybe the parallel to all of this is like everybody trying to write for themselves or write their music, except now they have the technology to put it out there.

So people self-publish or they put things on SoundCloud or you put it in GitHub just because we all post our musings to the world.

Yeah.

Maybe it's similar.

Can you even put a value on that?

Infinite, right?

Well, Irina, I want to thank you so much for coming on.

I have not ever interacted with economists outside of my questions that I have for you.

The questions you've been able to answer for me.

So thank you so much for coming on and sharing insights with us.

Such as they are, you're very welcome.

Thank you for the honor.

So this has been Picture Me Coding with Mike Mull and Erik Aker.

Thanks for joining us today.

Thanks, Mike.

Thanks.

See you next time.

Bye-bye.

Bye.

Bye.

[MUSIC]
