Picture Me Coding

AI Code Generators: Are We Going to Be Out of Work Someday?!

March 27, 2024 Erik Aker and Mike Mull Season 2 Episode 29

Along with our friend Bob Farzin, we visit upon the sleeping body of AI three ghosts this week: one for the past, one for the present, and one for the future. We make a sincere attempt to haunt the crap out of LLMs like GPT, Claude, and Github Copilot and to give them a truly spooky, creeped out feeling as they look upon what they're doing to their industry.

What skills will we need in the future if AI writes all of our code for us? What's going to happen to our tools, our programming languages, our industry? Will we have to go to more meetings?

Erik is a lot more nervous about the future than the calmly reflective Mike and Bob, but we'll do our best this week to predict the future.

[MUSIC] 

>> Hello, welcome to Picture Me Coding with Erik Aker and Mike Mull.

Hi, Mike.  Nice to see you again.

>> Hey there.

>> Mike, today we have Bob Farzin back with us.  Bob was a guest on our most popular episode about high frequency trading. Hi, Bob.

>> Hey, Erik, how's it going?

>> Great.

Mike wanted to talk about AI code assistants today. And Bob, we were happy you were able to join us and that you were available.

You have some experience with, I guess, the precursors of what people are calling AI now. Do you want to mention that at all?

>> Sure, just as a quick background in 2012, late 2012, I started getting involved in looking at machine learning just for fun.

It's kind of an obvious extension of some of the data science work I was doing.

And I fell down that rabbit hole and I've been kind of looking and working on that for ten years in various capacities.

Looking at neural nets and building neural nets and playing with them to accomplish some tasks.

So I was hoping I might have some insight on this topic and be able to describe some of the detail where I can.

>> And I think you can be accused of actually reading the papers, right?

Mike sends me these papers and as I try to read them I get real lost. There's transducers and stuff. And you've actually read some of these papers, yeah?

>> Yeah, I've read some of these papers. 

I've looked at a couple before today.

>> All right, so you're Mike's co-host today and I'm this guy in the audience asking dumb questions.

>> All right.

>> Yeah, it's transformers, just to be clear.

It's not the kind where it's like a car that turns into a person.

>> Okay, not transducers, transformers, the T in GPT.  So Mike, tell us, where does your interest in this topic come from?

>> So I am allegedly a programmer.

And even though I am quite old, I try to keep abreast of things.

So about a year ago, I figured I probably needed to pay attention to some of these things.

And so I signed up for Copilot and I've been using it in my day-to-day work for close to a year now.

>> So GitHub Copilot.

>> Right.

>> Copilot is an editor-integrated code assistant.

>> It was originally based on a model called Codex, which I think was tailored for code generation.

It was based on a general language model to begin with, but was kind of fine tuned for code generation.

My understanding is that it probably works with GPT-4 at this point in time.

But in any case, I've been using it and I've been more impressed with it, I think, than I expected to be.

And it's become sort of an expected part of my toolkit now.

I was mentioning I was setting up a new system at work just yesterday and my normal IDE is PyCharm.

And I set up this new environment, but I hadn't installed co-pilot into PyCharm.

And it felt very weird not to have it all of a sudden.

So it definitely feels like part of my toolkit.

And I sort of became interested in what it's doing and the techniques.

And so I started to experiment with other things.

I experimented with a couple of online, I'm sorry, local models using the Hugging Face Transformers library and this tool called Ollama.
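For a concrete sense of what that local-model experiment looks like, here is a minimal sketch using the Hugging Face Transformers library; the model name is an illustrative assumption rather than one named in the episode, and generation like this is noticeably slow without a capable GPU.

```python
# Minimal sketch of local code generation with Hugging Face Transformers.
# The model name below is an illustrative assumption, not one named in the
# episode; on CPU the generate() call can take a long time, which is the
# inference latency discussed later in the conversation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigcode/starcoder2-3b"  # assumed small code model, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = 'def is_prime(n: int) -> bool:\n    """Return True if n is prime."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```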

>> Let's just stick with Copilot for a few minutes.

Your take on it seems to be, well, it doesn't suck as much as I thought that it might.

>> That may be a little more negative than I intend.

But I think when I heard people talk about it initially, the feedback I was getting was that people would say, this is really interesting, I'm surprised it can do this.

But ultimately, it's just not helping me that much.

And so when I signed up for it to experiment with it, I was expecting sort of the same thing.

And there are some things to get used to.

There's quite a few cases where it will suggest this huge block of code.

And I look at, I scan through it and I go, that's nothing at all like what I want.

And then the other thing that quite often happens is that it will suggest something which is about 70% of what you want but not ultimately correct.

And so you're left with this conundrum of, do I type in all the stuff that it generated or do I accept the suggestion and just alter the things that it got wrong?

And so it takes a little bit of getting used to.

But I have found that it definitely makes me more productive and sometimes just absolutely amazes me with the things that it anticipates.

>> Surprising behavior, surprising results.

>> Exactly, yeah.

>> So you talked a little bit about how it works, what it does.

Bob, have you seen Copilot at all?

Have you used any code completion?

You write a lot of code in your day-to-day work as well.

>> Yeah, I write a lot of code and I don't use it.

It just hasn't come into the space of things I use.

It's almost like something where he just expects it to be there to help him with small details.

I mean, when I played with language models probably five or six years ago, they were more primitive.

And I did try them a little bit with some code things and they would often generate, like Mike said, like either nonsense.

So at that point they were still generating a lot of nonsense.

This is like pre-GPT.

But when they did generate something, it would often be either, exactly like you described, incomplete, or way more verbose than I needed.

Four times the amount of code I was looking for.

So it's interesting that that hasn't been solved.

>> I started using Copilot just a few months ago.

I came to it pretty late.

I wasn't negative about it.

I don't know, I just thought, I don't really need help writing the code.

I just sort of tune out and just start writing it.

It comes out of the hands.

I don't need to think about it.

It's like coloring with my kids.

You pick the color you want, you fill in the lines and then.

Anyway, so I started using it and I noticed the same thing.

It did surprise me quite a lot.

Just yesterday, for example, I was working on an authorization system.

And I wanted to send out events when things happen in the authorization system.

New users created, they forgot their password.

Some other systems might be interested in this.

So this is in Rust and I started typing out this enum, and the enum is called GenericUserEventType.

And before I could even think about it, in front of my eyes, Copilot had dreamed up, I don't know how many lines, 15 lines of members for the enum.

And I was like, wow, okay, yeah, that seems viable.

That's surprising.

And so they're like, new user created, user updated, user deleted, user verified their email, stuff that you would expect.

And then I started looking at it a little more detail.

I was like, wait a minute, what are all these other ones?

There was user password forgotten reset verified.

User password forgotten changed verified.

I was like, what does this mean?

[LAUGH] I was confused trying to understand how it was helping me.

So I kept some of what it gave me and then deleted a bunch of the rest.

It seems similar to what you're talking about there, Mike.

It generates a lot of stuff and you kind of go, this is cool.

Wait a minute, what?

Does it always make sense?

It doesn't always make sense.

>> Hey Mike, in the paper about that system, which is a little older, it basically describes this idea of writing a doc string and having it complete.

Is that a use case that you ever engage in, where you say, here's a function signature and here's a doc string I'm looking for, now work on it? Or is it similar to Erik's experience where you start typing something and you get a completion suggestion?

>> I've done both.

I think I probably do the latter more frequently just because it tends to be more in the flow of programming.

I had an example similar to Erik's.

I wanted to write a really simple function that would take a start date and end date and turn that into a sequence of the months that were contained within that range, basically the beginning of the month and the end of the month for every month between the start date and end date.

I had a pretty good idea of how I was going to do that.

I mean, it's a relatively short function.

There were two interesting things about it. First, I wrote the function name and the start date and end date inputs with type annotations.

Then I wrote a doc string and Copilot wrote a perfectly valid function for what I was describing, but it did not do it in the way that I wanted it to be done.

So the output that it would have produced would have been slightly different than what I was actually looking for.

So one thing I did was I decided to use the relativedelta thing from dateutil rather than just using timedelta because it makes it a little bit easier to add a month to a date.

And I just added that as an import.

And when I regenerated the code again, Copilot used relativedelta, which I thought was interesting.

So it understood somehow that I had that available now.

It still didn't get the function quite like I wanted it to, but it was generating something from the doc string that was within the ballpark of what I had described in natural language.

So it wasn't perfect.

It wasn't something that could have happened without human supervision, but it was, to me, a fairly impressive example of working from a natural language description.
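As a hedged reconstruction of the kind of function being described here, assuming the requirements as stated (this is a sketch of the idea, not Mike's code or Copilot's actual output):

```python
# A sketch of the month-range function described above, using dateutil's
# relativedelta as mentioned. Reconstructed from the description in the
# conversation; not the code Copilot actually generated.
from datetime import date, timedelta
from dateutil.relativedelta import relativedelta

def month_ranges(start_date: date, end_date: date) -> list[tuple[date, date]]:
    """Return (first_day, last_day) for each month between start_date and end_date."""
    ranges = []
    current = start_date.replace(day=1)
    while current <= end_date:
        next_month = current + relativedelta(months=1)
        ranges.append((current, next_month - timedelta(days=1)))
        current = next_month
    return ranges

# Example: January through March 2024 yields three (first_day, last_day) pairs.
print(month_ranges(date(2024, 1, 15), date(2024, 3, 10)))
```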

I've experimented with other systems.

I think one of my concerns about Copilot and a lot of these systems in general is that people are ultimately going to be conservative about letting their intellectual property go off to other people's servers.

And so I experimented with some local models.

And generally speaking, not quite as good.

And also I was kind of surprised at how long inference sometimes takes when you're not running it on an extremely powerful GPU or GPU machine.

So I wanted to get to that part because this is what's hard for me to imagine.

These models, they talk about how big they are, how many parameters they have.

I'm still having trouble understanding what that means.

What are we talking about in terms of compute resources?

Huge amounts of disk space.

A lot of, is it distributed?

A lot of CPUs, a lot of GPUs.

I know that one of those foundational papers that you talked about, Mike, they were saying that transformers are great because you can do parts of the training in parallel.

You don't have to do it sequentially anymore.

I kind of grasp it, but not quite.

And it does sound like these things need a huge amount of compute to run.

So they're really not tools that are going to be owned by people like us, I don't think.

Does it sound right?

I'm a little unclear on the future of that.

And Bob might actually be able to address better why it takes them so long to infer.

But what's considered a small model now, like the 7 billion parameter Llama 2 model, even with a model like that, which is considered small, you've got 7 billion parameters.

So there's a little bit of math going on there.
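As a rough, hedged back-of-envelope (assuming 16-bit weights, which is common but not universal), 7 billion parameters already implies a lot of memory before any computation happens:

```python
# Back-of-envelope memory estimate for a 7B-parameter model, assuming
# 2-byte (fp16/bf16) weights. Quantized 4-bit or 8-bit variants are
# proportionally smaller; activations and KV caches add more on top.
params = 7_000_000_000
bytes_per_param = 2
print(f"~{params * bytes_per_param / 1e9:.0f} GB just to hold the weights")  # ~14 GB
```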

Yeah, if I jump in here, I would say that the difference between CPU and GPU inference is usually pretty material.

So I'd be really curious what these Llama models look like doing inference locally on GPU.

Because in the end, all these things boil down to matrix multiplies of various kinds, transformer or whatever it is.

And even when the on-load, off-load time or system time is slow, your ability to run it on GPU, especially, and exactly as Erik said, in parallel, makes a big difference.

So what ends up happening is you have all of these different inference components.

You want to run them all simultaneously.

You can run them.

You can decouple all of it and run it in parallel.

And now you can push all of that off to GPU computations.

And if you have enough GPU resources, you could even have one GPU do each one of these multiplications and then aggregate all of your outputs back and then say, okay, next token is def.

Next token is if.

Next token is whatever.

And then just come back and give you your next token.

So without knowing very much at all about what's going on on the back end of these bigger servers, I think that doing that, my experience running inference on GPU has been a significant speed up.

I mean, obviously training on GPU makes a big difference, but even inference on GPU makes a big difference in terms of what gets run.

And I would say, to answer or try to address some of Erik's question, it's the computation that I think is the biggest resource at inference time, probably more than memory.

There's an element of memory that's required, but usually it's a computation problem to figure out what that next token is going to be.

Because as he said, it's matrix multiplication below the surface.

Exactly.

It's big matrix multiplies, right?

So it's like a 10 by 10 matrix times a 10 by 10 matrix.

How much can you decompose the problem, map reduce it?

How much can you send it out to a whole bunch of machines, get answers from those machines and aggregate them back up?

Is that a core requirement to being able to do this at scale?

That's usually part of the design is that you want to be able to send stuff off to parallel processing in pretty trivial ways.

So transformers suit themselves very well to that.

There's other architectures that also suit themselves very well to that.

But each one of these components is semi-independent, so I can send them off and then I can aggregate their result probabilities in some way and do an inference.
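A toy illustration of the decomposition idea: a big matrix multiply can be split into independent blocks, computed separately (in principle, on different cores or devices), and reassembled. This is a sketch of the principle, not how any particular inference server actually shards its work.

```python
# Toy demonstration that a matrix multiply decomposes into independent
# block multiplies whose results can be aggregated afterward.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))

chunks = np.split(A, 4, axis=0)              # four independent 2x8 row blocks
partials = [chunk @ B for chunk in chunks]   # each block could run in parallel
C_parallel = np.vstack(partials)             # aggregate the partial results

assert np.allclose(C_parallel, A @ B)        # same answer as the monolithic multiply
```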

Does that mean that it's hopeless to be able to run these on a local computer and get really good results like what Mike's talking about?

No, I think that you could run it locally on GPU.

Usually when you get a GPU card from a supplier, it has many, many cores.

So unlike a CPU machine, where you usually get 8 or 12 or 16 or 64 cores on a chip, you can get 512 cores.

Now, their clock speeds are lower, but they're specialized around matrix multiply.

So that's kind of the magic of GPGPUs, the general-purpose GPUs used for training and inference in machine learning.

There's a couple of suppliers now, too, who are selling GPUs that are designed specifically for inference.

So you might not want to use them for training a model, but they're perfectly fine for doing the inference side.

You might not want to run it on the same laptop where you're coding, but you could certainly have a local server sitting in the rack at your office that was talking back and forth to your IDE or whatever.

So I think the thing that happens most of the time when we start talking about these code assistants, automating, code completion, suggesting code, is we do start leaping to conclusions about where does this go in the future?

And I think that's the hardest part to project out and imagine what's going to happen.

There are millions of software developers in the US, millions of software developers worldwide, and it's expensive to employ them to write code.

So you can imagine it's probably a large incentive to automate some of that code generation.

So if we could imagine the future a little bit, where do we see this stuff going?

Well, for me, I think it depends on how far into the future you look. The take of people like Jensen Huang and maybe Sam Altman is that human beings probably won't be writing code at all at some point.

It's going to be, you just simply describe to the computer what you want to know, and it generates the code for you.

I'm a little skeptical that that's going to happen soon.

There does seem to be a movement beyond simply the code generators that have been the main thing for the last few years to these more agent-based systems that sort of collaborate in doing multiple steps of the software development lifecycle.

My guess is that's probably the next evolution, but I don't feel like that is a completely hands-off thing.

I think it's something that still requires some human beings to monitor, but it may be a different skill set than what the typical programmer of today has.

I guess, Bob, I'm curious what your intuition is on this one.

Yeah, I draw a parallel back to the late '90s.

There was a brief period where people were really excited about chess computers and human chess collaboration.

There were a bunch of articles printed.

I remember reading them, and everyone was like, "The human is really important in the cycle.

We need humans.

It's really so valuable because the chess computer alone can't think creatively.

It's not particularly good.

It only solves specific problems."

Then you move like 12 years later, and AlphaGo Zero can outpace any human literally from scratch in two and a half days of compute calculation.

It just blows past the humans.

If I map this problem to that problem, I think humans really want to believe that they're critical to this process.

If there's a clear goal that we can define and a cycle of improvement that we can define, it's more likely that the computer gets way better than any human gets at doing this task and just eliminates people almost entirely.

That's a little bit pessimistic.

I think my take is that in a fairly short timeframe that's likely, better than a 50% chance of being the outcome.

I actually really like writing code.

I'm kind of sad to hear I won't get to do it in the future because the machine will be better at it than I am.

You still can.

Magnus Carlsen still plays chess against other human players, and they enjoy it.

It's fun, and we rank them.

Even though we know that any computer can beat every one of these players nearly 100% of the time, we still enjoy engaging in these activities.

Not to jump too far ahead, but I think that many things like architecture and specification of what we actually want will still matter, similar to the way we've seen prompt engineering in GPT become a thing, or even in some of these Stable Diffusion types of models, like these anthropic models and these other ones where they draw images.

If you say using this kind of lens and this kind of camera, you get a totally different image than if you just described the scene abstractly.

You're saying there's expertise involved in crafting a prompt that produces an answer that's viable.

Yeah, I think that that might be an area where to riff on what Mike is saying, we have to specify to the machine what we want.

That might not be a trivial task, specifying what you actually want with all the details required such that the machine can produce the code required to give you that result.

That might become the role of the programmer.

It usually takes about 20 hours of meetings over the course of a couple of weeks for a team of software developers and managers and other people to actually figure out what they want.

Maybe it isn't trivial to figure out what we want.

I'm pretty sympathetic to what Bob was describing.

I think my first reaction to that was that people had suggested that same idea and I thought, well, maybe programming is harder, because the trick with chess and Go was that you could apply reinforcement learning and it was very clear what it meant to win a chess game or to win a Go game.

Maybe that's not so true of programs.

You've brought up this idea to me a couple of times where maybe these code generators get to the point where they can generate code, test code, verify that it's producing right answers and then it just produces the answer and throws away the code.

It doesn't matter if a human can read it.

It doesn't matter if it's elegantly designed.

The computer just runs through options that give it the optimal computational complexity and so forth and produces the answer and then throws it away and no human ever has to read it.

I don't know.

I think I'm leaning in the direction that Bob is suggesting now.

It does make me wonder, all the stuff that we do as software engineers to manage the software lifecycle, it's very human-oriented, collaboration-oriented stuff.

If most of the work is not produced by humans in terms of producing software and making it look pretty in the way that we try to do, if humans aren't doing it, you could throw away a lot of those human practices, collaboration tooling and protocols that we've developed over the years.

Here's an example.

There's this Autodev paper that came out recently, mostly from researchers at Microsoft.

In Autodev, there's a funny moment.

As I was reading it, I thought, "Well, that's kind of funny."

They're saying, "We can get this tool using these agents to produce code and then automatically write unit tests for the code."

My response was, "If the tool is producing code, who cares if it writes unit tests?

I got to trust one or the other.

Do I trust the unit test for the code that I may be trusting?

Why would I even need unit tests if the machine is relying on decades of code?

And assembling it magically, who cares?"

You look like you wanted to say something there.

I wanted to go back one step and say with what Mike said about the game being well-defined.

There's always been the criticism that Go and chess are well-defined games of perfect information, but there are other problems that agents have solved.

The two other problems I've seen solved with reinforcement learning that really impressed me: one of them was expert-level or championship-level poker, heads-up Texas Hold'em poker, which was really impressive to me because that is not a complete information game.

You have to do a lot of tricky things and clever things to be able to play that.

It was nearly world-class playing.

This was 2017.

This was a while ago.

It's probably further improved.

The second one was that AlphaFold paper where they basically said, "This is a problem that no human has been able to solve and no computational method has been able to solve."

And yet on these novel protein foldings, we can say, "Oh, this is how it's going to fold."

That was a real mindbender for me.

That really pushed me into a new place.

That one is convincing me as well.

That was a field that I was working in 20, 25 years ago and was considered to be an essentially intractable problem.

Even if you knew how to do it, it was probably NP-hard and was not easy to compute.

So you'd never be able to do it for large proteins or for a bunch of proteins in a reasonable way and then voila.

Yeah, here it is.

This works.

You can if you just do it the right way.

But I think that what Erik's saying has validity, which is if the under-the-hood part doesn't matter, maybe nobody cares about any of these under-the-hood components anymore.

And again, kind of going back to that same group or inside that same group, they found some linear algebra algorithms that were faster than current linear algebra algorithms.

But they're basically intractable by people.

They're really complicated.

They've got all these extra steps, but they computationally meet the criteria of running faster.

And it's one of those things where, well, if it runs faster and it does what we want, do we care what's under the hood?

And like you said, do we care what the unit tests are?

I mean, I would think that probably not unit tests, but some level of integration tests.

It'd probably be the integration test to make sure that when you add to this code, you don't break everything you've done.

Sounds more useful.

Yeah.

If you look at the whole history of software programming languages, for example, right?

Grace Hopper writes COBOL because she wants to have a language that is closer to natural language to be able to express programs.

And Dijkstra is talking about structured programming.

It's really hard to read these programs and understand them.

We need to fight complexity and we should stop using go-tos.

You have Donald Knuth.

Code is meant to be read more than written.

The whole foundation of our field is predicated on these assumptions that humans will have to write and comprehend this stuff.

And if that's no longer the case, then it's sort of like all the programming languages we have until now, you don't even really need them, right?

You might as well just have these machines, right?

Machine code.

Who cares?

Yeah.

I mean, to ask a bit of an inverted question here, like, I wonder what exploration has been done around asking a GPT-type model, what does this code do?

Even at a machine code level?

I mean, in theory, there's nothing that prevents such a system from being able to read machine code and give you a short paragraph summarizing, oh, this machine code moves numbers around and computes the square root of a number.

Here's a bunch of machine code.

Here's a bunch of bits shuffling.

And that's what it does.

And it just tells you that.

Like, I wonder if there's a path that solves the problem using the same tools again.

To tell me, okay, you wrote all this code, can you summarize what this code is?

Yeah.

Well, and it's, it'd be completely black box at that point.

But then there's another thing where you said, I would probably want it to write an integration test so that when we add to this code in the future, and going back to what Mike said, why would you add to a code base if you could just spontaneously generate the stuff?

The only reason we keep code bases is because it takes a lot of effort to produce it.

If that effort is drastically reduced, I don't even need to keep the stuff around.

I don't need it anymore.

I generate it spontaneously when I need it.

And then I throw it away.

I don't care what language it's in, because it's all just black box minimal effort stuff.

And maybe I'm trivializing it minimal effort, meaning massive data centers running somewhere else and I'm paying people pennies to produce all the stuff.

There's so many expectations around software, the management, the curation of it, the production of it, linting, unit tests.

These are all human collaboration expectations.

And I start to wonder if they disappear, if humans are not primarily producing the stuff.

I've tried the experiment that Bob was suggesting of trying to explain this code.

My purpose was, say you've got code in Fortran 77 or you've got code in MATLAB and you want to put that into some sort of production mode.

So you want to convert it to Python or Rust or something like that.

My starting point was to take chunks of MATLAB code and say, can you tell me what this does or can you rewrite this in Python?

And what I found was, if you ask it to describe what it does, it will usually give you a sort of vague answer saying, well, it looks like it does this, but I need more context.

Can you explain how this is being called and what the ultimate objectives are?

And I also noticed that if you ask it to rewrite the code in a different language, it tends to look like the code that you would see coming out of old-fashioned code generators, like the C that you would get out of yacc back in the old days, that kind of thing.

It doesn't write elegant human readable code.

But again, the question is, do we care, if it's correct and it's performant?

Does it matter if humans think it's elegant?

I want to go back to the point you guys raised, which is the initial expectation is, if this is like a binary problem, then machines should be pretty good at getting to almost like a logistic regression, yes or no.

They're going to be pretty good at reasonably guessing.

And maybe it's a linear value or whatever.

So the example people often give is, well, I would expect an AI that could read radiology results to be a thing that can replace the radiologists.

And in my head, I think, yeah, of course, you would expect that because it's a logistic, like a binary answer, yes or no, the patient has this disease or not, or even linear.

Like their likelihood of having this disease is here.

So that seems like a problem that is automatable from one of these tools.

But then I reflect on the work I do and I think, well, yeah, I'm writing software, but I'm also doing system design.

That doesn't seem linear.

It doesn't seem binary.

I'm trying to imagine this creative universe which doesn't exist yet.

There's a person who's going to go over here and build a thing and they want me to interact with these other four tools that are outside of our company and I have to figure out how this fits together.

Am I being naive to think that this type of like, Bob, you brought up poker.

Am I overestimating my contributions?

I guess the creativity it requires to do system design, not just writing code.

I think my take on that is you're kind of bouncing back to what you've already said, which is that a lot of system design, in my experience, has been about how do I do this so that I can understand it and other people can understand it.

So this is broken down into logical components that have some semantic relationships where I can assume that these types of objects do these types of things in these types of ways and can live together.

And that might just be for the humans.

If that's just for the humans, it might not be a critical component.

The experience I've had is that if it's something where humans can explain very clearly, and preferably with mathematical relationships, why design A is better than design B, if that can be described, you can usually get an objective function around that.

You can turn that into a mathematical objective.

Okay, you only like relationships where architectures have so much interconnection or so many levels of abstraction or whatever it might be.

If that is, in fact, describable in a very specific way, you can probably write elements of code that optimize for that.

You may not get the output you expect, but that's usually optimizable.

That's been my experience in those spaces.

I've also encountered things where we've thought, oh yeah, that can be written down and we go to write it down and you can't write it down.

There's an element which I'm going to call taste or opinion where people have a certain taste and flavor for what they want that they can't really put their finger on.

But when people see it, they're like, yeah, that's the right way to do it.

And then you go to build it and you can't make a machine do that because the machine doesn't really understand.

I haven't been able to figure out how to turn that into some form of math.

It's ineffable.

It cannot be easily described.

I think the mental model that people are thinking of at this point in time is that, imagine you go back to a world in which everything is done by human beings.

So human beings have a pretty good ability to adapt to things and to try different alternatives, but they're slow.

So imagine a human run process, but now the agents doing that human run process are super intelligent and can communicate at light speed.

If you've got that, do you really care about whether your system is event driven or monolithic or maybe these communicating agents are the software architecture?

Mike, what I'm trying to do though is I like writing code, but I also like having a job and I like getting paid.

So there's a sieve here that's sieving.

I'm trying to see what falls through the holes in my sieve and what's kept in there are the skills that that computer can't steal from me.

What are those skills?

It's my taste, Bob says.

Your taste will save you.

We will hire you in the future because you have taste.

So the important thing is you have to own land because ...

But I really liked Bob's ... not reduction, but his way of describing it as a describability problem.

I don't know if that's a good way to phrase it.

If you could describe the thing very clearly and there are mathematical components, it's probably going to be something a software application can solve.

If it's difficult to describe and/or involves taste and there's a kind of selectivity, then you probably can't resolve it easily using these tools.

Is that fair, Bob?

Yeah, that's a good paraphrase of what I'm trying to say.

I think here's another ... there's an analogy if you look at the arts.

I've thought of this a few times.

We could have one of these AI tools ... OpenAI has this tool called Sora and they're using it to generate moving images, movies really, right?

You could have one of these tools generate a huge amount of films, huge quantities of films and just sort of dump them out on the world.

But it seems like it'd be a pretty difficult problem to have another tool that can determine not what's good, that's a taste thing, but even what's going to work on the market.

What's going to have huge viewership numbers, right?

Because it involves some sort of human selectivity, some sort of taste.

Yeah, that's where I'll jump in.

So there are architectures in reinforcement learning called actor-critic architectures, where you basically train a critic to tell you whether what the actor does is effective or not.

And this kind of falls into that, but there needs to be some training set, there needs to be some metric.

There needs to be something that says, "Oh, here's a bunch of film types that are no more than five minutes long that were successful in whatever metric you want: they produced value, they generated ad revenue, they brought joy," whatever that number is.

But you need that base.

If you don't have that base, you can't build an actor-critic interface.

You can mock something up.

You can say, "If it has this much action, if it has this many characters, I'm going to give it lots of points.

If it has this much crying, it stimulates this many tears, I'm going to give it negative points or positive points, whatever you want to do, you can construct something."

But if that's just a made-up metric and it doesn't tie to reality, then even that architecture isn't going to get you to selecting good films. It's an interesting problem to think about.

It sounds pretty depressing to imagine that future right there.

You guys are depressed by this.

It also depends a little bit on how you define the objective.

Ultimately, in all these systems, you are effectively trying to optimize something.

It's not clear to me that making good films is an optimization problem, or it's at least a multi-objective optimization problem.

If you're a studio, probably your main objective is to make money.

I watched Anatomy of a Fall last night, which is a two and a half hour movie about a trial, lots of talking.

My guess is it's probably not a movie that everybody would like.

It's probably not a movie that you could put some sort of sensible objective around and end up with it through an automated system.

But it was nominated for an Academy Award, so clearly a lot of people thought it was a good film and thought it was worth watching.

Maybe the AIs aren't going to produce, at least in the short term, aren't going to produce films that are great films.

They're just going to produce films that are lucrative.

I think this maps back to software engineering though, because it's hard to build programs to determine if programs are correct.

That was Bob's critic example.

Can you imagine doing that for software, for just generalized software?

We want to know that this description was adequately satisfied by this output program.

If you had that, it does seem like the problem of generating the code gets a lot easier, right?

Yeah, I think there's a couple of things there.

One is, how do you know if it's correct?

There are some applications where that's probably easy to evaluate.

If you're calculating the trajectory of a spaceship and you crash it into the moon, it was probably a bad computation.

But say that you're evaluating somebody for a life insurance policy.

What's the optimal outcome there?

You clearly want to sell them the life insurance policy.

You clearly want the premiums on average to exceed what you're paying out to people who die.

But you can't discount the possibility that you insure somebody who's perfectly healthy and appears to have a long life ahead of them and then they get hit by a bus.

It's not clear to me that you can ...

There's not a way there that you can say, "My answer in this particular case is the right answer."

It might be the wrong answer, even if it's a sensible thing to do.

Does that make sense?

I think so.

I think that's similar to my gut response.

It's hard to evaluate software objectively.

I mean, that's why we spend weeks arguing about it and meetings before we build any of the stuff.

We don't know how long it'll take us to build it.

We don't know exactly what they want us to build.

We're just trying to figure that out.

If we could have a really clear description from the people in the business, we could build what they want.

But they tend to be pretty bad at that part.

I guess, let me jump in with a different hypothetical here.

Even the code generation world, you cut the ...

I agree.

I've been in these same kinds of conversations.

I'm reflecting back on that, which is if I knew that it only took me an hour to write the whole system, couldn't I just try lots of different ones and say, "Okay, here's one.

Is this what you want?"

They'd say, "No, that's wrong.

I don't want that."

You'd say, "Okay, I'll be right back."

An hour later, you come back and go, "Here's a whole new system.

Is this what you want?"

And they'll say, "Well, that's closer, but that's not what you want."

Maybe that becomes the life cycle of the ...

Again, is the role of the programmer the prompt engineer that interfaces between this fast generation of entire systems and the group of people that's looking for that system?

You know how to go in and say, "Okay, this is 90% of the way there.

These things aren't right.

I can get under the hood and fix a couple of parts here and/or ask for those to get regenerated appropriately and then come back with a new system and I can just iterate so fast that it doesn't matter if I know how to architect it or design it.

It's about how do I get this into your hands and how do I respond to feedback?"

I'm imagining a possible future where that becomes a role.

There's a lot of possible futures we can imagine.

Collaborative processes start going away.

The way we manage and curate software over a long period of time, a code repository, that kind of goes away.

Mike, you had suggested some of these papers where they're talking about generating intermediate languages.

Do you want to mention that at all?

That's another possible future.

Yeah, so there's a couple of approaches I've seen here.

There's a paper out of Stanford for this system called Parsel, and I think their objective there was to take natural language descriptions of relatively complex hierarchical problems and then come out with sort of a formal language description that would make it easier to generate code.

The other thing that I've seen is systems where instead of generating Python or Rust or C or whatever, the system generates something like Dafny or some other formal system that's essentially a proof, and that may not be the code you deploy, but it's the code that you can use as sort of a template to then generate what you believe is proved-correct code.

I kind of like that idea.

I kind of hope that's what happens.

I think it's an interesting idea and it has the potential to solve this issue of, you know, you brought up the issue before of, if the robot's writing the tests for its own code, it just becomes sort of this robot version of "it works for me."

But if you have sort of a formally verifiable thing, you can at least say, okay, this we've shown that this code is correct or at least model check this code to know that it's probably going to work like the specification expects it to.

To go back a few steps to the question you had before, it seems to me like inevitably whatever happens here, the skill set of the human beings, if there still are human beings involved, it seems like the skill set of the human beings has to change considerably from what we do right now.

Well, this is a depressing way to start my Saturday.

You guys are trying to cut me out of the process of building software in the future.

And Mike, you're saying the skill set has to change, which sounds like there's a glimmer of hope that I can be involved in it still someday.

What are the skills I'm going to need in the future you think?

That I think is the important question.

It's not clear to me that things like being able to produce code or even to understand code are necessarily going to be the key skills in 20 years.

But it's not clear to me what they will be either.

I mean, maybe domain expertise is more important.

People like Jensen Huang have said, "Don't learn how to program, learn how to farm."

Maybe the ability to produce food and reduce pollution and solve global warming and that kind of thing are going to be more important skills than writing code.

I really like Bob's description of taste because I like to imagine that I have taste.

My opinions are valuable.

I've cultivated them over the last dozen years.

So there's a lot of blood, sweat, and tears that went into shaping these opinions.

They're valuable to me.

And I want to imagine they're valuable in producing code or software design or system design in the future.

I mean, to some degree, Mike's description and your description of using Copilot so far has had an element of error in it.

Is it error-free?

And what's my opinion of it?

What's what kind of taste does it have?

Maybe the role looks more like code review.

It just sends you a PR.

Here's the PR.

It works.

What do you think?

You say, "Yeah, that kind of sucks.

Yeah, this doesn't work right."

You give it a bunch of PR comments and it goes back and tries to fix them.

And so your constant job is specifying what you want and being a PR reviewer and assessing the taste of how this comes out and how it looks and if it makes sense and if it's compact and concise in its expression of the idea.

We were talking to the GitHub sales team.

We were talking about using Copilot and they said, "Yeah, we have a tool we're working on now which does automatic code review."

So it sounds very similar to the, "We'll have the machine produce the code.

We'll have the machine produce the unit test.

We'll have the machine write the PR.

We'll have the machine review the PR."

Done.

My take is that there probably will be humans involved in the process for the foreseeable future.

And I know that some experts have said that there may be more humans down the road when these systems get more sophisticated.

There may actually be more humans involved in producing software just because software will be even more ubiquitous than it is now.

But I think people at least need to be thinking about what a future looks like where the very substantial software engineering workforce is no longer needed.

I feel pretty confused now.

I guess I want to know the likelihood of these potential futures where all the programming languages I've learned to enjoy writing, they don't even exist anymore.

The machine produces things I can't even review.

That's kind of a bummer.

What's the likelihood of that happening?

It sounds like you guys say likely.

Yeah.

Can't we just back up 50 years and, like, say, you know, everything was written in Fortran, machine language, and COBOL.

You thought to yourself, "I'm really good at COBOL.

This language is sweet."

And when I don't know how to do it in COBOL, I write it in machine language anyway.

This is great.

I hope that never goes away.

You know, then C++, Java, Haskell all show up and people are like, "Whoa, this is awful.

Why can't I just move the bits around?

This is so terrible.

I wish that I could just move the bits.

I know we're going to move the bits on the register.

What's going on here?"

So like, I think it's kind of like that.

It's analogous, yeah.

All right.

Thanks.

Thank you for that.

So, what's your self-preservation strategy?

I think my personal strategy is going to be to follow this trend as far as I can and then retire.

I'm the late-stage programmer.

I got 20 years.

I got 20 years left.

I got to figure out what to do.

Bob, what do I do here?

I think that you learn how to use the tools.

You ride the wave.

There's another element to this that I have said to Mike many times, so he's going to see this coming a mile away.

Who the hell is going to deploy and manage all this software?

We're going to expect these machines are going to do that too.

There's a lot of taste involved in deploying the stuff.

Do you want Serverless?

Do you want Kubernetes?

It's got to be on a cloud.

Am I going to have these machines spontaneously manage data centers?

That stuff is well far off at this point.

Yeah.

I'm going to map all of this taste stuff to an experience I've had in some recent work I've been doing, which is that in the machine learning world, there's definitely the pull-the-crank approach, which is I'm going to put a bunch of data in.

This is going to go on a tangent for a moment.

I'm going to bring it back.

You can put a bunch of data in.

You can pull a crank.

You can get a number.

You can put a bunch more data in.

You can pull a crank.

You can get a number.

You can hope that you get to something that's relatively good or optimal and never really understand the problem or the math or what's going on or why you're even doing it.

You usually end up with a bunch of sticky weird code that may or may not work, that's moderately unstable.

However, if you look at that problem, there's some problems where you can look at them and say, "Wow, I could map this into this other space and I could make this a convex optimization and I can write down the math and I can tell you exactly where the minima or the maxima of that objective function is.

You don't have any of the stickiness.

You don't have any of that messiness.

It's very clean.

It's very fast.

It's very efficient and it works well."

Mapping this back to your deployment idea, it's like you can let the computer go figure out how to deploy all of this stuff and you can end up in a very messy place where things quote unquote work, but they're extremely fragile and you don't know what's going on.

You're spending on a lot of resources and a lot of compute.

But if the human is usually the one who can take a step back and say, "Hey, this is actually a problem like X.

We can actually make this very simple and we can deploy it very efficiently and we can make it work in this way."

And that might become a real sort of human role where whether it's in deployment or it's in programming or it's in design where you say, "The complicated way is just too complicated.

I need to cut through all of this and make it much simpler."
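A toy version of that contrast, with ordinary least squares standing in for a problem you can recognize as convex; this example is an illustrative assumption, not something from the episode.

```python
# "Pull the crank" versus recognizing the structure of the problem.
# Ordinary least squares is convex and has a closed-form solution, so the
# iterative cranking below is unnecessary once you see the structure.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.standard_normal(100)

# Pull the crank: thousands of small gradient steps, learning rate to tune.
w = np.zeros(3)
for _ in range(5000):
    w -= 0.01 * (2 / len(y)) * X.T @ (X @ w - y)

# Recognize the structure: the convex problem is solved in one line.
w_closed = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.round(w, 3), np.round(w_closed, 3))  # both close to [2.0, -1.0, 0.5]
```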

That's taste, you said.

I think that's about taste.

I think that's about looking at the code and being like, "This is 500 lines of code.

It should be five.

This is off by an order of magnitude.

It's wrong."

So there's a few qualities that you both have raised here.

Taste is a thing that Bob gave us.

It seems like a useful idea.

Taste is important as a human contributor to these automated processes.

Describability: is the problem describable?

If it is, then it's probably automatable or optimizable, maybe.

Is that a better word? Like, you could build an optimization function if you can describe it, if you can eventually get to a clear description.

And I think the other thing is what Mike's raising and you're raising too, Bob.

The progression of these tools is such that you can't ignore the direction things are going in.

Things are going to probably look pretty different in a decade from now.

I don't know.

I don't think anybody knows.

I think people are definitely marketing things.

Nvidia was talking about this thing they call NIMs.

I can't remember what it stands for, but it's sort of like this container that contains the latest language models and so forth.

And it sits on top of a GPU based system.

One of theirs, obviously.

In their view, these are just agents so that there's not really a deployment thing.

There's not really a deployment component because these live on your GPU or they live in the cloud on their GPU and they just talk to each other.

But it does seem like there's this thing in the middle that has to sort of coordinate a plan and coordinate how these things are communicating.

And it's not clear to me yet that that is a solvable problem with the current tools that people have.

It may be, but it still seems like there's some human ideas and some human objectives that have to go into that planning stage.

The big question for me is with a lot of this AI stuff, we're talking about it's going to replace programmers and it's going to replace other digital workers and other knowledge workers.

And I guess the problem that I don't see people thinking about or making any progress on is, suppose that does happen, suppose the robots replace all human work.

It seems like we've reached a point then where we have kind of maximized productivity and it seems like that should be good for human society.

It seems like we should be benefiting from that somehow and it's not clear to me yet that I know how that happens.

You laugh saying it should be good for society, but it sounds economically terrifying to me to get to that.

I mean, if you've got robots that can produce food and you've got robots that can produce entertainment and you've got all of your needs and wants met by these robots, it seems like that should be a good thing.

But who the hell is going to pay for those things?

If I don't have a job, I can't buy the robot that gets me the food.

I can't quite say what I'm getting at, and maybe this is something we just cut out, but it seems like we've put in play, or we are getting to the point where we have, this option of a post-scarcity society where the things that we need don't cost as much because they are easy to manufacture, they are easy to obtain.

It seems like that should be the future that we're shooting for and not the future where we're all on guaranteed income from the federal government because there are no jobs left.

I don't know.

I guess that's a different topic for a different day.

All right.

I'm going to close this off here.

So this has been Picture Me Coding with Erik Aker and Mike Mull, and our friend Bob Farzin joined us today.

Thanks for coming to hang out with us, Bob.

Thanks, Erik.

Thanks for having me.

Thanks to you, Mike.

We'll see you again next week.

See you next time.

Bye-bye.

[MUSIC]