# The paradox of the derivative | Essence of calculus, chapter 2

The goal here is simple: Explain what a derivative
is. Thing is, though, there’s some subtlety
to this topic, and some potential for paradoxes if you’re not careful, so the secondary
goal is that you have some appreciation for what those paradoxes are and how to avoid
them. You see, it’s common for people to say that
the derivative measures “instantaneous rate of change”, but if you think about it, that
phrase is actually an oxymoron: Change is something that happens between separate points
in time, and when you blind yourself to all but a single instant, there is no more room
for change. You’ll see what I mean as we get into it,
and when you appreciate that a phrase like “instantaneous rate of change” is nonsensical,
it makes you appreciate how clever the fathers of calculus were in capturing the idea this
phrase is meant to evoke with a perfectly sensible piece of math: The derivative. As our central example, imagine a car that
starts at some point A, speeds up, then slows to a stop at some point B 100 meters away,
all over the course of 10 seconds. This is the setup I want you to keep in mind
while I lay out what exactly a derivative is. We could graph this motion, letting a vertical
axis represent the distance traveled, and a horizontal axis represent time. At each time t, represented with a point on
the horizontal axis, the height of the graph tells us how far the car has traveled after
that amount of time. It’s common to name a distance function
like this s(t). I’d use the letter d for distance, except
that it already has another full-time job in calculus. Initially this curve is quite shallow, since
the car is slow at the start. During the first second, the distance traveled
by the car hardly changes at all. For the next few seconds, as the car speeds
up, the distance traveled in a given second gets larger, corresponding to a steeper slope
in the graph. And as it slows towards the end, the curve
shallows out again. If we were to plot the car’s velocity in
meters per second as a function of time, it might look like this bump. At time t=0, the velocity is 0. Up to the middle of the journey, the car builds
up to some maximum velocity, covering a relatively large distance in each second. Then it slows back down to a speed of 0 meters
per second. These two curves are highly related to each
other; if you change the specific distance vs. time function, you’ll have some different
velocity vs. time function. We want to understand the specifics of this
relationship. Exactly how does velocity depend on this distance
vs. time function? It’s worth taking a moment to think critically
about what velocity actually means here. Intuitively, we all know what velocity at
a given moment means: it’s whatever the car’s speedometer shows in that moment. And intuitively, it might make sense that
velocity should be higher at times when the distance function is steeper; when the car
traverses more distance per unit time. But the funny thing is, velocity at a single
moment makes no sense. If I show you a picture of a car, a snapshot
in an instant, and ask you how fast it’s going, you’d have no way of telling me. What you need are two points in time to compare,
perhaps comparing the distance traveled after 4 seconds to the distance traveled after 5
seconds. That way, you can take the change in distance
over the change in time. Right? That’s what velocity is, the distance traveled
over a given amount of time. So how is it that we’re looking at a function
for velocity that only takes in a single value for t, a single snapshot in time? It’s weird, isn’t it? We want to associate each individual point
in time with a velocity, but computing velocity requires comparing two points in time. If that feels strange and paradoxical, good! You’re grappling with the same conflict
that the fathers of calculus did, and if you want a deep understanding of rates of change,
not just for a moving car, but for all sorts of scenarios in science, you’ll need a resolution
to this apparent paradox. First let’s talk about the real world, then
we’ll go into a purely mathematical one. Think about what an actual car’s speedometer
might be doing. At some point, say 3 seconds into the journey,
the speedometer might measure how far the car goes in a very small amount of time, perhaps
the distance traveled between 3 seconds and 3.01 seconds. Then it would compute the speed in meters
per second as that tiny distance, in meters, divided by that tiny time, 0.01 seconds. That is, a physical car can sidestep the paradox
by not actually computing speed at a single point in time, and instead computing speed
during very small amounts of time. Let’s call that difference in time “dt”,
which you might think of as 0.01 seconds, and call the resulting difference in distance
traveled “ds”. So the velocity at that point in time is ds
over dt, the tiny change in distance over the tiny change in time. Graphically, imagine zooming in on the point
of the distance vs. time graph above t=3. That dt is a small step to the right, since
time is on the horizontal axis, and that ds is the resulting change in the height of the
graph, since the vertical axis represents distance traveled. So ds/dt is the rise-over-run slope between
two very close points on the graph. Of course, there’s nothing special about
the value t=3, we could apply this to any other point in time, so we consider this expression
ds/dt to be a function of t, something where I can give you some time t, and you can give
back to me the value of this ratio at that time; the velocity as a function of time. So for example, when I had the computer draw
this bump curve here representing the velocity function, the one you can think of as the
slope of this distance vs. time function at each point, here’s what I had the computer do:
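To make that recipe concrete, here is a minimal Python sketch of the procedure this passage walks through. The distance function `s` below is a made-up stand-in (the video never gives a formula for the car's motion); it was chosen only so that the car covers 100 meters in 10 seconds and is at rest at both ends. The helper name `velocity_curve` and its parameters are likewise illustrative assumptions.

```python
def s(t):
    """Hypothetical distance function (an assumption, not from the video):
    0 m at t=0, 100 m at t=10, with the car at rest at both ends."""
    return 3 * t**2 - 0.2 * t**3


def velocity_curve(distance, dt=0.01, t_start=0.0, t_end=10.0, steps=100):
    """Approximate velocity at evenly spaced times via (s(t+dt) - s(t)) / dt."""
    curve = []
    for i in range(steps + 1):
        t = t_start + (t_end - t_start) * i / steps
        ds = distance(t + dt) - distance(t)  # tiny change in distance
        curve.append((t, ds / dt))           # rise over run for this dt
    return curve


curve = velocity_curve(s)

# Print every tenth sample: one velocity estimate per second of the trip.
for t, v in curve[::10]:
    print(f"t = {t:4.1f} s   v ~ {v:6.2f} m/s")
```

With this stand-in curve, the printed velocities rise from roughly 0 to about 15 meters per second at the midpoint and fall back toward 0, which is the bump shape described earlier. Any other distance function could be passed in its place.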
First, I chose some small value for dt, like 0.01. Then, I had the computer look at many times
t between 0 and 10, and compute the distance function s at (t + dt), minus the value of
this function at t. That is, the difference in the distance traveled
between the given time t, and the time 0.01 seconds after that. Then divide that difference by the change
in time dt, and this gives the velocity in meters per second around each point in time. With this formula, you can give the computer
any curve representing the distance function s(t), and it can figure out the curve representing
the velocity v(t). So now would be a good time to pause, reflect,
make sure this idea of relating distance to velocity by looking at tiny changes in time
dt makes sense, because now we’re going to tackle the paradox of the derivative head-on. This idea of ds/dt, a tiny change in the value
of the function s divided by a tiny change in the input t, is almost what the derivative
is. Even though our car’s speedometer will look
at an actual change in time like 0.01 seconds to compute speed, and even though my program
here for finding a velocity function given a position function also uses a concrete value
of dt, in pure math, the derivative is not this ratio ds/dt for any specific choice of
dt. It is whatever value that ratio approaches
as the choice for dt approaches 0. Visually, asking what this ratio approaches
has a really nice meaning: For any specific choice of dt, this ratio ds/dt is the slope
of a line passing through two points on the graph, right? Well, as dt approaches 0, and those two points
approach each other, the slope of that line approaches the slope of a line tangent to
the graph at whatever point t we’re looking at. So the true, honest to goodness derivative,
is not the rise-over-run slope between two nearby points on the graph; it’s equal to
the slope of a line tangent to the graph at a single point. Notice what I’m not saying: I’m not saying
that the derivative is whatever happens when dt is infinitely small, nor am I saying that
you plug in 0 for dt. This dt is always a finitely small, nonzero
value, it’s just approaching 0 is all. So even though change in an instant makes
no sense, this idea of letting dt approach 0 is a really clever backdoor way to talk
reasonably about the rate of change at a single point in time. Isn’t that neat? It’s flirting with the paradox of change
in an instant without ever needing to touch it. And it comes with such a nice visual intuition
as the slope of a tangent line at a single point on this graph. Since change in an instant still makes no
sense, I think it’s healthiest for you to think of this slope not as some “instantaneous
rate of change”, but as the best constant approximation for rate of change around a
point. It’s worth saying a few words on notation
here. Throughout this video I’ve been using “dt”
to refer to a tiny change in t with some actual size, and “ds” to refer to the resulting
tiny change in s, which again has an actual size. This is because that’s how I want you to
think about them. But the convention in calculus is that whenever
you’re using the letter “d” like this, you’re announcing that the intention is
to eventually see what happens as dt approaches 0. For example, the honest-to-goodness derivative
of the function s(t) is written as ds/dt, even though the derivative is not a fraction,
per se, but whatever that fraction approaches for smaller and smaller nudges in t. A specific example should help here. You might think that asking about what this
ratio approaches for smaller and smaller values of dt would make it much more difficult to
compute, but strangely it actually makes things easier. Let’s say a given distance vs. time function
was exactly $t^3$. So after 1 second, the car has traveled $1^3 = 1$
meter, after 2 seconds, it’s traveled $2^3 = 8$ meters, and so on. What I’m about to do might seem somewhat
complicated, but once the dust settles it really is simpler, and it’s the kind of
thing you only ever have to do once in calculus. Let’s say you want the velocity, ds/dt,
at a specific time, like t=2. And for now, think of dt having an actual
size; we’ll let it go to 0 in just a bit. The tiny change in distance between 2 seconds
and 2+dt seconds is s(2+dt)-s(2), and we divide by dt. Since $s(t)=t^3$, that numerator is $(2+dt)^3 - 2^3$. Now this, we can work out algebraically. And again bear with me, there’s a reason
I’m showing you the details. Expanding the top gives $2^3 + 3 \cdot 2^2 \, dt + 3 \cdot 2 \, (dt)^2 + (dt)^3 - 2^3$. There are several terms here, and I want you
to remember that it looks like a mess, but it simplifies. Those $2^3$ terms cancel out. Everything remaining has a dt, so we can divide
that out. So the ratio ds/dt has boiled down to $3 \cdot 2^2 + 3 \cdot 2 \, dt + (dt)^2$, that is, $3 \cdot 2^2$
plus two different terms that each have a dt in them. So as dt approaches 0, representing the idea
of looking at smaller and smaller changes in time, we can ignore those! By eliminating the need to think of a specific
dt, we’ve eliminated much of the complication in this expression! So what we’re left with is a nice clean
$3 \cdot 2^2$. This means the slope of a line tangent to
the point at t=2 on the graph of $t^3$ is exactly $3 \cdot 2^2$, or 12. Of course, there was nothing special about
choosing t=2; more generally we’d say the derivative of $t^3$, as a function of t, is $3t^2$. That’s beautiful. This derivative is a crazy complicated idea:
We’ve got tiny changes in distance over tiny changes in time, but instead of looking
at any specific tiny change in time we start talking about what this thing approaches. I mean, it’s a lot to think about. Yet we’ve come out with such a simple expression:
$3t^2$. In practice, you would not go through all
that algebra each time. Knowing that the derivative of $t^3$ is $3t^2$ is one
of those things all calculus students learn to do immediately without rederiving each
time. And in the next video, I’ll show ways to
think about this and many other derivative formulas in nice geometric ways. The point I want to make by showing you the
guts here is that when you consider the change in distance over a change in time for any specific
value of dt, you’d have a whole mess of algebra riding along. But by considering what this ratio approaches
as dt approaches 0, it lets you ignore much of that mess, and actually simplifies the
problem. Another reason I wanted to show you a concrete
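derivative like this is that it gives a good example of the kind of paradox that comes up when you suppose the derivative measures an instantaneous rate of change. Think about a car traveling according to
this $t^3$ distance function, and consider its motion at the moment t=0. Now ask yourself whether or not the car is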
moving at that time. On the one hand, we can compute its speed
at that point using the derivative of this function, $3t^2$, which is 0 at time t=0. Visually, this means the tangent line to the
graph at that point is perfectly flat, so the car’s quote unquote “instantaneous
velocity” is 0, which suggests it’s not moving. But on the other hand, if it doesn’t start
moving at time 0, when does it start moving? Really, pause and ponder this for a moment,
is that car moving at t=0? Do you see the paradox? The issue is that the question makes no sense;
it references the idea of change in a moment, which doesn’t exist. And that’s just not what the derivative
measures. What it means for the derivative of the distance
function to be 0 at this point is that the best constant approximation for the car’s
velocity around that point is 0 meters per second. For example, between t=0 and t=0.1 seconds,
the car does move… it moves 0.001 meters. That’s very small, and importantly it’s
very small compared to the change in time, giving an average speed of only 0.01 meters per second. What it means for the derivative of this motion
to be 0 is that for smaller and smaller nudges in time, this ratio of change in distance
over change in time approaches 0, though in this case it never actually hits it. But that’s not to say the car is static. Approximating its movement with a constant
velocity of 0 is, after all, just an approximation. So if you ever hear someone refer to the derivative
as an “instantaneous rate of change”, a phrase which is intrinsically oxymoronic,
think of it as a conceptual shorthand for “the best constant approximation for the
rate of change”. In the following videos I’ll talk more about
the derivative; what does it look like in different contexts, how do you actually compute
it, what’s it useful for, things like that, focussing on visual intuition as always. As I mentioned last video, this channel is
largely supported by the community through Patreon, where you can get early access to
future series like this as I work on them. One other supporter of the series, who I’m
incredibly proud to feature here, is the Art of Problem Solving. Interestingly enough, I was first introduced
to the Art of Problem Solving by my high school calculus teacher. It was the kind of relationship where I’d
frequently stick around a bit after school to just chat with him about math. He was thoughtful and encouraging, and he
once gave me a book that really had an influence on me back then. It showed a beauty in math that you don’t
see in school. The name of that book? The Art of Problem Solving. Fast-forward to today, where the Art of Problem
Solving website offers many, many phenomenal resources for curious students looking to
get into math, most notably their full courses. This ranges from their newest inspiring offering
to get very young students engaged with genuine problem solving, called Beast Academy, up
to higher level offerings that cover the kind of topics that all math curious students should
engage with, like combinatorics, but which very few schools include in their curriculum. Put simply, they’re one of the best math
education companies I know, and I’m proud to have them support this series. You can see what they have to offer by following
the link on the screen, also copied in the video description.
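As a numerical companion to the t cubed example worked through earlier, the following Python sketch evaluates the ratio ds/dt for a few concrete, nonzero values of dt, both at t=2 and at t=0. Only the distance function s(t) = t^3 comes from the transcript; the helper name `difference_quotient` and the particular dt values are illustrative assumptions.

```python
def s(t):
    """Distance function from the worked example: s(t) = t^3."""
    return t**3


def difference_quotient(f, t, dt):
    """The ratio ds/dt for one concrete, nonzero dt."""
    return (f(t + dt) - f(t)) / dt


# Shrink dt and watch what the ratio approaches at t=2 and at t=0.
for dt in (0.1, 0.01, 0.001, 0.0001):
    at_two = difference_quotient(s, 2.0, dt)
    at_zero = difference_quotient(s, 0.0, dt)
    print(f"dt = {dt:<7} ratio at t=2: {at_two:.6f}   ratio at t=0: {at_zero:.8f}")
```

Algebraically the ratio at t=2 is 12 + 6·dt + (dt)^2, so the printed values creep toward 12 without a zero ever being plugged in for dt. At t=0 the ratio is (dt)^2, which is small but never exactly 0 for a nonzero dt, matching the point that the car is not static there.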