I want the symbols $dx$ and $dy$ to represent physical quantities, namely changes in two variables that are connected by the derivative. I believe the Fréchet derivative is what I want, though it may be something slightly different.
- Given a function $y=f(x)$ and a small error $\Delta x$, we have the quantity $\Delta y = f(x+\Delta x)-f(x)$. We seek a linear function $dy = m dx$ so that as $dx$ tends to zero, the difference between $\Delta y$ and $dy$ is not only small, but super small. What do we mean?
Start with examples.
Let $y=x^2$.
We replace $x$ with $x+dx$ and $y$ with $y+dy$ to obtain $y+dy=(x+dx)^2$.
Solving for $dy$ and using $y=x^2$ gives $$dy = x^2+2xdx+(dx)^2-y = 2xdx+(dx)^2.$$
As $dx$ tends to zero, this entire sum becomes smaller, but that is not the point.
The idea is that the product of two small things is so small that we will ignore it.
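To see how quickly the product of two small things shrinks, here is a minimal sketch for $y=x^2$. The sample point `x = 3.0` and the list of `dx` values are arbitrary choices made for illustration, and `split_change` is a hypothetical helper name.

```python
def split_change(x, dx):
    """Return the linear and quadratic parts of (x + dx)**2 - x**2."""
    linear = 2 * x * dx      # the part we keep when linearizing
    quadratic = dx ** 2      # the product of two small things
    return linear, quadratic

x = 3.0  # assumed sample point

rows = []
for k in range(1, 5):
    dx = 10.0 ** (-k)
    linear, quadratic = split_change(x, dx)
    rows.append((dx, linear, quadratic))
    print(f"dx={dx:.0e}  2x*dx={linear:.3e}  (dx)^2={quadratic:.3e}")
```

Each time `dx` shrinks by a factor of 10, the linear part shrinks by about 10 while the quadratic part shrinks by about 100, which is the sense in which the quadratic part is negligible.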
Let's separate the parts above so that really small things are on one side. This gives
$$dy -2xdx = (dx)^2.$$
I don't want this.
- When I write $dy$ and $dx$, I want these symbols to represent parts of the tangent object. This means $dy -2xdx = 0.$
- When I write $\Delta y$ and $\Delta x$, I want these to represent parts of the original expression. This means $\Delta y -2x\Delta x = (\Delta x)^2.$
- The goal is to rearrange $\Delta y =2x\Delta x + (\Delta x)^2$ so that the really small parts are on one side.
- In one dimension, we can divide by $\Delta x\neq 0$ to obtain $\frac{\Delta y}{\Delta x} -2x = \Delta x,$ and the derivative sits in the problem. I don't want to use division at this stage. I want to postpone it.
- I specifically want the students to remember the idea that the product of two extremely small things is so small that when we want to linearize something, we just ignore it. We are not claiming that $\Delta y = dy$; rather, we are saying that $dy$ comes from a linearization of the function, and so quadratic terms involving squares of small things will be removed.
- This requires
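The distinction between $\Delta y$ (a change in the original expression) and $dy$ (a change along the tangent object) can be sketched with exact arithmetic for $y=x^2$, without dividing by the increment. The function names `actual_change` and `tangent_change`, the sample point, and the step size are assumptions chosen for illustration.

```python
from fractions import Fraction

def actual_change(x, delta_x):
    """Delta y for y = x^2: the change in the original expression."""
    return (x + delta_x) ** 2 - x ** 2

def tangent_change(x, dx):
    """dy for y = x^2: the change along the tangent line, dy = 2x dx."""
    return 2 * x * dx

x = Fraction(3)            # assumed sample point
step = Fraction(1, 100)    # the same increment is fed to both functions

# The two changes differ by exactly the square of the increment.
remainder = actual_change(x, step) - tangent_change(x, step)
print(remainder, step ** 2)
```

The two functions never agree for a nonzero step; they differ by exactly the quadratic term that linearization discards.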
If we started talking about tolerances $|dy|<\varepsilon$ and $|dx|<\delta$, could that help us learn the language? I think I see my own failure here, as the notation $|dy|<\varepsilon$ is not what we want.
Traditionally, we write: given $\varepsilon>0$, there exists $\delta>0$ such that for all nonzero $dx$ with magnitude less than $\delta$, we have $\left|\frac{f(x+dx)-f(x)}{dx} - f'(x)\right|<\varepsilon$. Multiplying both sides by $|dx|$ gives $$|f(x+dx)-f(x) - f'(x)dx|<|dx|\varepsilon.$$ This is equivalent to $$|\Delta y - f'(x)dx|<|dx|\varepsilon$$ or $$|\Delta y - dy|<|dx|\varepsilon.$$ We want $\Delta y$ and $dy$ to not just be close, but really close. Given $\varepsilon >0$, find a small tolerance $\delta$ so that if $0<|dx|<\delta$, then $\Delta y$ and $dy$ are no more than $|dx|\varepsilon <\delta \varepsilon$ apart. This is the gap between the actual difference and the linearized difference.
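A quick numeric check of the inequality $|\Delta y - dy|<|dx|\varepsilon$ for the example $y=x^2$ might look like the following. The sample point, the value of $\varepsilon$, and the choice $\delta=\varepsilon$ are assumptions made for illustration; for this particular function the gap is exactly $(dx)^2$, so $\delta=\varepsilon$ happens to work. The helper name `gap` is hypothetical.

```python
def gap(f, fprime, x, dx):
    """Return |Delta y - dy| for the increment dx at the point x."""
    delta_y = f(x + dx) - f(x)   # actual change
    dy = fprime(x) * dx          # linearized change
    return abs(delta_y - dy)

def f(t):
    return t ** 2

def fprime(t):
    return 2 * t

x = 3.0        # assumed sample point
eps = 0.01     # assumed tolerance
delta = eps    # assumption: suffices for y = x^2, where the gap is (dx)^2

# Sample increments with 0 < |dx| < delta, on both sides of zero.
increments = [s * delta * k / 10 for k in range(1, 10) for s in (1, -1)]
checks = [gap(f, fprime, x, dx) < abs(dx) * eps for dx in increments]
print(all(checks))
```

Sampling finitely many increments only illustrates the bound; it does not prove it for every $dx$ in the interval.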
Can this be rewritten completely so that it focuses only on the statement that if $0<|dx|<\delta$, then $\Delta y$ and $dy$ are no more than $\delta \varepsilon$ apart (skipping the intermediate $|dx|$)? That sounds like a fun experiment in analysis.
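One way to start that experiment: the bound with $|dx|\varepsilon$ always implies the bound with $\delta\varepsilon$, since $|dx|<\delta$. The sketch below probes the converse with a hypothetical remainder `spike`, which plays the role of $\Delta y - dy$ at $x=0$ with $m=0$: it equals $h$ when $h=1/n!$ and is $0$ otherwise. This function is an invented test case, not something from the notes above, and the code truncates the spikes at `max_n` only so that it can run.

```python
from fractions import Fraction
from math import factorial

def spike(h, max_n=20):
    """Hypothetical remainder: equals h when h = 1/n! (1 <= n <= max_n), else 0.

    The mathematical example has a spike for every n; max_n is a
    computational cutoff.
    """
    for n in range(1, max_n + 1):
        if h == Fraction(1, factorial(n)):
            return h
    return Fraction(0)

eps = Fraction(1, 10)
n = 10                               # chosen so that 1/(n + 1) < eps
delta = Fraction(1, factorial(n))

# Below delta, the only nonzero values of spike sit at h = 1/m! with m > n.
spike_points = [Fraction(1, factorial(m)) for m in range(n + 1, 21)]

coarse_ok = all(spike(h) < delta * eps for h in spike_points)   # delta * eps bound
sharp_ok = all(spike(h) < h * eps for h in spike_points)        # |dx| * eps bound
print(coarse_ok, sharp_ok)
```

In this sketch the $\delta\varepsilon$ bound holds while the $|dx|\varepsilon$ bound fails at every spike, which suggests that dropping the intermediate $|dx|$ may give a strictly weaker condition, at least for badly behaved functions. That conclusion is worth checking by hand before relying on it.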