The goal here is simple: Explain what a derivative

is. Thing is, though, there’s some subtlety

to this topic, and some potential for paradoxes if you’re not careful, so the secondary

goal is that you have some appreciation for what those paradoxes are and how to avoid

them. You see, it’s common for people to say that

the derivative measures “instantaneous rate of change”, but if you think about it, that

phrase is actually an oxymoron: Change is something that happens between separate points

in time, and when you blind yourself to all but a single instant, there is no more room

for change. You’ll see what I mean as we get into it,

and when you appreciate that a phrase like “instantaneous rate of change” is nonsensical,

it makes you appreciate how clever the fathers of calculus were in capturing the idea this

phrase is meant to evoke with a perfectly sensible piece of math: The derivative. As our central example, imagine a car that

starts at some point A, speeds up, then slows to a stop at some point B 100 meters away,

all over the course of 10 seconds. This is the setup I want you to keep in mind

while I lay out what exactly a derivative is. We could graph this motion, letting a vertical

axis represent the distance traveled, and a horizontal axis represent time. At each time t, represented with a point on

the horizontal axis, the height of the graph tells us how far the car has traveled after

that amount of time. It’s common to name a distance function

like this s(t). I’d use the letter d for distance, except

that it already has another full time job in calculus. Initially this curve is quite shallow, since

the car is slow at the start. During the first second, the distance traveled

by the car hardly changes at all. For the next few seconds, as the car speeds

up, the distance traveled in a given second gets larger, corresponding to a steeper slope

in the graph. And as it slows towards the end, the curve

shallows out again. If we were to plot the car’s velocity in

meters per second as a function of time, it might look like this bump. At time t=0, the velocity is 0. Up to the middle of the journey, the car builds

up to some maximum velocity, covering a relatively large distance in each second. Then it slows back down to a speed of 0 meters

per second. These two curves are highly related to each

other; if you change the specific distance vs. time function, you’ll have some different

velocity vs. time function. We want to understand the specifics of this

relationship. Exactly how does velocity depend on this distance

vs. time function? It’s worth taking a moment to think critically

about what velocity actually means here. Intuitively, we all know what velocity at

a given moment means: it’s whatever the car’s speedometer shows in that moment. And intuitively, it might make sense that

velocity should be higher at times when the distance function is steeper; when the car

traverses more distance per unit time. But the funny thing is, velocity at a single

moment makes no sense. If I show you a picture of a car, a snapshot

in an instant, and ask you how fast it’s going, you’d have no way of telling me. What you need are two points in time to compare,

perhaps comparing the distance traveled after 4 seconds to the distance traveled after 5

seconds. That way, you can take the change in distance

over the change in time. Right? That’s what velocity is, the distance traveled

over a given amount of time. So how is it that we’re looking at a function

for velocity that only takes in a single value for t, a single snapshot in time? It’s weird, isn’t it? We want to associate each individual point

in time with a velocity, but computing velocity requires comparing two points in time. If that feels strange and paradoxical, good! You’re grappling with the same conflict

that the fathers of calculus did, and if you want a deep understanding of rates of change,

not just for a moving car, but for all sorts of scenarios in science, you’ll need a resolution

to this apparent paradox. First let’s talk about the real world, then

we’ll go into a purely mathematical one. Think about what an actual car’s speedometer

might be doing. At some point, say 3 seconds into the journey,

the speedometer might measure how far the car goes in a very small amount of time, perhaps

the distance traveled between 3 seconds and 3.01 seconds. Then it would compute the speed in meters

per second as that tiny distance, in meters, divided by that tiny time, 0.01 seconds. That is, a physical car can sidestep the paradox

by not actually computing speed at a single point in time, and instead computing speed

during very small amounts of time. Let’s call that difference in time “dt”,

which you might think of as 0.01 seconds, and call the resulting difference in distance

traveled “ds”. So the velocity at that point in time is ds

over dt, the tiny change in distance over the tiny change in time. Graphically, imagine zooming in on the point

of the distance vs. time graph above t=3. That dt is a small step to the right, since

time is on the horizontal axis, and that ds is the resulting change in the height of the

graph, since the vertical axis represents distance traveled. So ds/dt is the rise-over-run slope between

two very close points on the graph. Of course, there’s nothing special about

the value t=3, we could apply this to any other point in time, so we consider this expression

ds/dt to be a function of t, something where I can give you some time t, and you can give

back to me the value of this ratio at that time; the velocity as a function of time. So for example, when I had the computer draw

this bump curve here representing the velocity function, the one you can think of as the

slope of this distance vs. time function at each point, here’s what I had the computer do:
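As a rough sketch of that procedure in Python (this is not the program actually used for the animation, and the particular distance function s below is an assumed stand-in that starts at 0 meters, reaches 100 meters after 10 seconds, and is at rest at both ends):

```python
import math

def s(t):
    """Assumed stand-in for the car's distance function (an illustration only):
    s(0) = 0 meters, s(10) = 100 meters, with zero slope at both ends."""
    return 50.0 * (1.0 - math.cos(math.pi * t / 10.0))

def velocity_curve(s, t_start=0.0, t_end=10.0, dt=0.01, samples=101):
    """Approximate v(t) at evenly spaced times using (s(t + dt) - s(t)) / dt."""
    step = (t_end - t_start) / (samples - 1)
    times = [t_start + i * step for i in range(samples)]
    # For each time t, take the change in distance over the small change in time dt.
    return [(t, (s(t + dt) - s(t)) / dt) for t in times]

curve = velocity_curve(s)
for t, v in curve[::10]:
    print(f"t = {t:4.1f} s   v approx {v:6.2f} m/s")
```

The function name velocity_curve and the number of sample times are choices made for this sketch. The steps it carries out are the ones spelled out next.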

First, I chose some small value for dt, like 0.01. Then, I had the computer look at many times

t between 0 and 10, and compute the distance function s at (t + dt), minus the value of

this function at t. That is, the difference in the distance traveled

between the given time t, and the time 0.01 seconds after that. Then divide that difference by the change

in time dt, and this gives the velocity in meters per second around each point in time. With this formula, you can give the computer

any curve representing the distance function s(t), and it can figure out the curve representing

the velocity v(t). So now would be a good time to pause, reflect,

make sure this idea of relating distance to velocity by looking at tiny changes in time

dt makes sense, because now we’re going to tackle the paradox of the derivative head-on. This idea of ds/dt, a tiny change in the value

of the function s divided by a tiny change in the input t, is almost what the derivative

is. Even though our car’s speedometer will look

at an actual change in time like 0.01 seconds to compute speed, and even though my program

here for finding a velocity function given a position function also uses a concrete value

of dt, in pure math, the derivative is not this ratio ds/dt for any specific choice of

dt. It is whatever value that ratio approaches

as the choice for dt approaches 0. Visually, asking what this ratio approaches

has a really nice meaning: For any specific choice of dt, this ratio ds/dt is the slope

of a line passing through two points on the graph, right? Well, as dt approaches 0, and those two points

approach each other, the slope of that line approaches the slope of a line tangent to

the graph at whatever point t we’re looking at. So the true, honest to goodness derivative,

is not the rise-over-run slope between two nearby points on the graph; it’s equal to

the slope of a line tangent to the graph at a single point. Notice what I’m not saying: I’m not saying

that the derivative is whatever happens when dt is infinitely small, nor am I saying that

you plug in 0 for dt. This dt is always a finitely small, nonzero

value, it’s just approaching 0 is all. So even though change in an instant makes

no sense, this idea of letting dt approach 0 is a really clever backdoor way to talk

reasonably about the rate of change at a single point in time. Isn’t that neat? It’s flirting with the paradox of change

in an instant without ever needing to touch it. And it comes with such a nice visual intuition

as the slope of a tangent line at a single point on this graph. Since change in an instant still makes no

sense, I think it’s healthiest for you to think of this slope not as some “instantaneous

rate of change”, but as the best constant approximation for rate of change around a

point. It’s worth saying a few words on notation

here. Throughout this video I’ve been using “dt”

to refer to a tiny change in t with some actual size, and “ds” to refer to the resulting

tiny change in s, which again has an actual size. This is because that’s how I want you to

think about them. But the convention in calculus is that whenever

you’re using the letter “d” like this, you’re announcing that the intention is

to eventually see what happens as dt approaches 0. For example, the honest-to-goodness derivative

of the function s(t) is written as ds/dt, even though the derivative is not a fraction,

per se, but whatever that fraction approaches for smaller and smaller nudges in t. A specific example should help here. You might think that asking about what this

ratio approaches for smaller and smaller values of dt would make it much more difficult to

compute, but strangely it actually makes things easier. Let’s say a given distance vs. time function

was exactly t^3. So after 1 second, the car has traveled 1^3 = 1

meters, after 2 seconds, it’s traveled 2^3 = 8 meters, and so on. What I’m about to do might seem somewhat

complicated, but once the dust settles it really is simpler, and it’s the kind of

thing you only ever have to do once in calculus. Let’s say you want the velocity, ds/dt,

at a specific time, like t=2. And for now, think of dt having an actual

size; we’ll let it go to 0 in just a bit. The tiny change in distance between 2 seconds

and 2+dt seconds is s(2+dt) - s(2), and we divide by dt. Since s(t) = t^3, that numerator is (2+dt)^3

- 2^3. Now this, we can work out algebraically. And again bear with me, there’s a reason

I’m showing you the details. Expanding the top gives 2^3 + 3*(2^2)*dt + 3*2*(dt)^2

+ (dt)^3 - 2^3. There are several terms here, and I want you

to remember that it looks like a mess, but it simplifies. Those 2^3 terms cancel out. Everything remaining has a dt, so we can divide

that out. So the ratio ds/dt has boiled down to 3*(2^2)

+ two different terms that each have a dt in them. So as dt approaches 0, representing the idea

of looking at smaller and smaller changes in time, we can ignore those! By eliminating the need to think of a specific

dt, we’ve eliminated much of the complication in this expression! So what we’re left with is a nice clean

3*(2^2). This means the slope of the line tangent to

the graph of t^3 at the point t=2 is exactly 3*(2^2), or 12. Of course, there was nothing special about

choosing t=2; more generally we’d say the derivative of t^3, as a function of t, is 3*t^2. That’s beautiful. This derivative is a crazy complicated idea:

We’ve got tiny changes in distance over tiny changes in time, but instead of looking

at any specific tiny change in time we start talking about what this thing approaches. I mean, it’s a lot to think about. Yet we’ve come out with such a simple expression:

3t^2. In practice, you would not go through all

that algebra each time. Knowing that the derivative of t^3 is 3t^2 is one

of those things all calculus students learn to do immediately without rederiving each

time. And in the next video, I’ll show ways to

think about this and many other derivative formulas in nice geometric ways. The point I want to make by showing you the

guts here is that when you consider the change in distance over a change in time for any specific

value of dt, you’d have a whole mess of algebra riding along. But by considering what this ratio approaches

as dt approaches 0, it lets you ignore much of that mess, and actually simplifies the

problem. Another reason I wanted to show you a concrete

derivative like this is that it gives a good example of the kind of paradox that comes

about when you believe in the illusion of an instantaneous rate of change. Think about this car traveling according to

this t^3 distance function, and consider its motion at the moment t=0. Now ask yourself whether or not the car is

moving at that time. On the one hand, we can compute its speed

at that point using the derivative of this function, 3t^2, which is 0 at time t=0. Visually, this means the tangent line to the

graph at that point is perfectly flat, so the car’s quote unquote “instantaneous

velocity” is 0, which suggests it’s not moving. But on the other hand, if it doesn’t start

moving at time 0, when does it start moving? Really, pause and ponder this for a moment,

is that car moving at t=0? Do you see the paradox? The issue is that the question makes no sense,

it references the idea of change in a moment, which doesn’t exist. And that’s just not what the derivative

measures. What it means for the derivative of the distance

function to be 0 at this point is that the best constant approximation for the car’s

velocity around that point is 0 meters per second. For example, between t=0 and t=0.1 seconds,

the car does move… it moves 0.001 meters. That’s very small, and importantly it’s

very small compared to the change in time, an average speed of only 0.01 meters per second. What it means for the derivative of this motion

to be 0 is that for smaller and smaller nudges in time, this ratio of change in distance

over change in time approaches 0, though in this case it never actually hits it. But that’s not to say the car is static. Approximating its movement with a constant

velocity of 0 is, after all, just an approximation. So if you ever hear someone refer to the derivative

as an “instantaneous rate of change”, a phrase which is intrinsically oxymoronic,

think of it as a conceptual shorthand for “the best constant approximation for the

rate of change”. In the following videos I’ll talk more about

the derivative; what does it look like in different contexts, how do you actually compute

it, what’s it useful for, things like that, focussing on visual intuition as always. As I mentioned last video, this channel is

largely supported by the community through Patreon, where you can get early access to

future series like this as I work on them. One other supporter of the series, who I’m

incredibly proud to feature here, is the Art of Problem Solving. Interestingly enough, I was first introduced

to the Art of Problem Solving by my high school calculus teacher. It was the kind of relationship where I’d

frequently stick around a bit after school to just chat with him about math. He was thoughtful and encouraging, and he

once gave me a book that really had an influence on me back then. It showed a beauty in math that you don’t

see in school. The name of that book? The Art of Problem Solving. Fast-forward to today, where the Art of Problem

Solving website offers many, many phenomenal resources for curious students looking to

get into math, most notably their full courses. This ranges from their newest inspiring offering

to get very young students engaged with genuine problem solving, called Beast Academy, up

to higher level offerings that cover the kind of topics that all math curious students should

engage with, like combinatorics, but which very few schools include in their curriculum. Put simply, they’re one of the best math

education companies I know, and I’m proud to have them support this series. You can see what they have to offer by following

the link on the screen, also copied in the video description.
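A numerical footnote to the t^3 example from earlier: the sketch below, in Python, evaluates the ratio ds/dt for a few concrete, shrinking values of dt, once at t=2 and once at t=0. The helper name ratio and the particular list of dt values are choices made for this illustration, not something taken from the video.

```python
def s(t):
    """Distance function from the worked example: s(t) = t^3."""
    return t ** 3

def ratio(s, t, dt):
    """The ratio ds/dt for one concrete, nonzero choice of dt."""
    return (s(t + dt) - s(t)) / dt

# Shrink dt and watch what the ratio approaches at t=2 and at t=0.
for dt in (0.1, 0.01, 0.001, 0.0001):
    at_two = ratio(s, 2.0, dt)
    at_zero = ratio(s, 0.0, dt)
    print(f"dt = {dt:<7}  at t=2: {at_two:.6f}   at t=0: {at_zero:.8f}")
```

Up to floating-point rounding, the ratio at t=2 works out to 12 + 6*dt + dt^2, so it closes in on 12 as dt shrinks. At t=0 it equals dt^2, which shrinks toward 0 but is never exactly 0 for a nonzero dt. With dt = 0.1 that is the 0.01 meters per second average speed mentioned above.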