| Lecture 6: January 30, 2008 |
Parabolas are familiar to us as the form of the trajectory of an object projected upward and obliquely with respect to the source of gravity, for instance a bouncing ball.
Ellipses are often seen when a circle is projected onto an oblique plane, for instance by projecting the shadow of a round object onto a wall, viewing a circle from the side, or tilting a glass of water. The special properties of the foci of an ellipse are used by architects to create elliptically shaped rooms where the slightest noise made at one focus is reflected off the ceiling directly to the other focus (Whispering Gallery).
Hyperbolas are visible whenever a circle is projected onto a plane by a source of light with the plane at a sufficiently steep angle (otherwise an ellipse is produced), for instance at the edge of the patch of light that a lamp with a circular shade casts on a nearby wall.
One of the greatest discoveries of physics, of course, was the revolutionary insight due to Johannes Kepler (1571-1630), who understood that, contrary to the received wisdom of centuries, the trajectory of a planet passing through space, influenced by the gravitational attraction of a star, is a conic section. This allowed him to predict planetary positions with an accuracy unheard of for his times; he also published astrological almanacs predicting the weather and political events, and was found to be correct so often that he came under suspicion; around this period, his mother was arrested and spent 14 months in jail accused of witchcraft.
If the gravitational attraction of the star is sufficiently strong, the passing planet will be caught by it, and begin orbiting the star in an elliptical orbit with the star as one focus (like the earth around the sun). If the gravitational attraction of the star is not strong enough to capture the planet, it will nevertheless modify the incoming trajectory of the passing planet, bending what would have been a straight line in the absence of the star into a hyperbola.
One can measure the energy of the planetary situation by a number E, depending on the gravitational attraction of the fixed star, and the incoming velocity of the passing planet. This number E is defined so that if E is negative, the orbit is an ellipse and if E is positive, the orbit is a hyperbola.
What happens in the very special situation where the energy E happens to be exactly 0?
Interestingly, if E=0, the trajectory is a parabola. In fact, a parabola is exactly the point at which an ellipse "becomes" a hyperbola. One way of seeing this is to note that a parabola is an ellipse which has one focus at infinity. Another way is to visualize the plane cutting through the cone. The plane starts horizontal, so that the section is a circle, and one then tilts the plane, obtaining an elliptic cross section. The cross section continues to be elliptical as long as the angle of the plane is less than the angle of a generating line. Then, when the plane reaches an angle exactly equal to a generating line, the cross section is a parabola, and when the plane is tilted to an angle even greater than that of a generating line, the cross section is a hyperbola. In this way we also perceive that the parabola is the exact shape at which an ellipse "becomes" a hyperbola.
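The sign rule for E can be tried out numerically. The sketch below is only an illustration: the lecture does not give a formula for E, so the code assumes the standard specific orbital energy (kinetic energy minus gravitational energy, per unit mass), and the names `specific_orbital_energy`, `orbit_type` and the unit values are hypothetical. It also assumes the planet is not heading straight at the star.

```python
import math

def specific_orbital_energy(speed, distance, mu):
    """Energy per unit mass of the planet (assumed formula, not from the lecture).

    mu stands for the gravitational parameter G*M of the fixed star.
    """
    return 0.5 * speed ** 2 - mu / distance

def orbit_type(energy, tol=1e-12):
    """Classify the orbit by the sign of the energy E."""
    if abs(energy) <= tol:
        return "parabola"
    return "ellipse" if energy < 0 else "hyperbola"

# Illustrative units only, not real astronomical data.
mu = 1.0
distance = 1.0
escape_speed = math.sqrt(2 * mu / distance)  # the speed at which E is exactly 0

for factor in (0.5, 1.0, 1.5):
    energy = specific_orbital_energy(factor * escape_speed, distance, mu)
    print(factor, orbit_type(energy))
```

Speeds below the escape speed give an ellipse, speeds above it give a hyperbola, and the borderline speed gives the parabola.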
Let us visualize some planetary orbits around a fixed star using
a planetary motion program based on Kepler's laws.
Planetary motion.
CONIC SECTIONS AS LOCI ON THE PLANE
One often sees a completely different definition of the ellipse,
parabola, and hyperbola, given as loci of points on the plane
having certain properties with respect to a fixed point F,
the focus, and a fixed line D,
the directrix, not containing F.
Planar definition of conic sections:
An ellipse (parabola,
hyperbola) is the set of points in the plane the
ratio of whose distances from the fixed point F and the fixed line
D not containing F is a positive constant e,
called the eccentricity, which is less than 1
(equal to 1, greater than 1).
Using only the techniques of pure geometry available to the Greeks,
it turned out to be remarkably difficult to prove that the two definitions are
equivalent to each other. One indication of this is how difficult it
is to find such a proof online. See for example the way in which any
discussion of this topic -- the connection between the two definitions --
is avoided in the usually trustworthy source
Wikipedia, or in
History of Conic Sections.
If you visualize the plane cutting the cone, it really is
difficult to see in your mind's eye what the focus and the directrix
can be! In about 1820, a Belgian military engineer named
Germinal-Pierre Dandelin discovered a truly beautiful geometric explanation of the
connection between the two definitions, using spheres inscribed in the
cone. The proof is difficult, but the observation he made, which explains
how to identify the foci and directrices of a conic section on the plane
cutting through the cone, is simple to describe and visualize in a figure:
Dandelin spheres.
You may possibly be familiar with another definition of the ellipse, generalizing the definition of a circle as the set of points on the plane equidistant from a given fixed point: namely, an ellipse is the locus of points the sum of whose distances from two fixed points is a constant.
It is also far from obvious, geometrically, why this definition is
equivalent to either of the definitions of the ellipse given above. The
geometric proofs of these facts are deep and involved. To summarize
the main results on conic sections:
Theorem: All these different
definitions of conic sections are equivalent.
In the next lecture, we will apply algebraic notation to the
theorem on conic sections and show that with this new and modern tool,
the proofs become simple and easy. Then, in order not to minimise
the incredible advance that the use of algebraic notation represented
in the development of all branches of mathematics, we will spend the
following lectures on a closer inspection of the development
of algebra and algebraic notation, from the Renaissance through to
Descartes.
| Lecture 7: February 1, 2008 |
The equation of a right circular cone is
$$x^2 + y^2 = r^2 z^2$$
with $r > 0$. The equation of a non-vertical plane in 3-dimensional space is
$$z = mx + ny + k.$$
The assumption that the plane does not pass through the origin means that the point $(x,y,z) = (0,0,0)$ does not satisfy the plane equation, which simply means that we are assuming that $k \neq 0$. Having these two equations makes it easy to compute the equation of the intersection of the cone and the plane: any point $(x,y)$ on both the cone and the plane satisfies
$$x^2 + y^2 = r^2 (mx + ny + k)^2.$$
Setting $A = (rm)^2 - 1$, $B = 2r^2mn$, $C = (rn)^2 - 1$, $D = 2r^2km$, $E = 2r^2kn$, $F = -(rk)^2$, this becomes
$$Ax^2 + Bxy + Cy^2 + Dx + Ey = F.$$
This equation is the generic form of an intersection between a right circular cone and a non-vertical plane in 3-dimensional space. In other words, every type of conic section is described by an equation of this form.
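As a sanity check on the coefficients, one can pick a cone and a plane, compute a few points lying on both, and confirm that they satisfy the quadratic equation. This is a minimal sketch: the function names and the sample values of r, m, n, k are chosen for illustration, and the plane is taken with n = 0 so that y can be solved for directly.

```python
import math

def conic_coefficients(r, m, n, k):
    """Coefficients of A x^2 + B xy + C y^2 + D x + E y = F for the cone
    x^2 + y^2 = r^2 z^2 cut by the plane z = m x + n y + k."""
    A = (r * m) ** 2 - 1
    B = 2 * r ** 2 * m * n
    C = (r * n) ** 2 - 1
    D = 2 * r ** 2 * k * m
    E = 2 * r ** 2 * k * n
    F = -(r * k) ** 2
    return A, B, C, D, E, F

def residual(coeffs, x, y):
    """Left-hand side minus right-hand side; zero for points on the section."""
    A, B, C, D, E, F = coeffs
    return A * x * x + B * x * y + C * y * y + D * x + E * y - F

# Sample cone and plane (illustrative values, with n = 0).
r, m, n, k = 1.0, 0.5, 0.0, 1.0
coeffs = conic_coefficients(r, m, n, k)

residuals = []
for x in (0.0, 1.0, 2.0):
    z = m * x + k                          # height of the plane above (x, y)
    y = math.sqrt((r * z) ** 2 - x * x)    # forces x^2 + y^2 = r^2 z^2
    residuals.append(residual(coeffs, x, y))

print(coeffs)
print(residuals)
```

All residuals come out as zero up to rounding, as the derivation predicts.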
The key question here is how to determine conditions on the coefficients A,B,C,D,E,F which ensure that the curve described by this equation is a hyperbola, an ellipse, or a parabola. We will show that in fact, the answer is very simple: the three types of curves correspond to the sign of the discriminant of the equation:
$$\Delta = B^2 - 4AC.$$
What we have now are three different definitions of conic sections :
Geometric definition: A conic section is the cross section obtained when a plane cuts a right circular cone. It is an ellipse if the cutting plane is less steep than the side of the cone, a parabola if the slopes are equal, and a hyperbola if the plane is steeper.
Analytic definition: A curve in the plane is a conic section if it is the locus of points in the plane such that for a given point (the focus) and a given line (the directrix), the ratio of the distance from a point on the curve to the focus and the distance from that point to the directrix is a constant. The curve is an ellipse if the ratio is less than 1, a parabola if the ratio is equal to 1, and a hyperbola if the ratio is greater than 1.
Algebraic definition: A conic section is a non-degenerate curve in the plane defined by a quadratic equation of the form $AX^2 + BXY + CY^2 + DX + EY = F$. The discriminant is defined to be $\Delta = B^2 - 4AC$. The curve is an ellipse if Δ < 0, a parabola if Δ = 0 and a hyperbola if Δ > 0.
Big Conic Section Theorem. Consider the quadratic equation $AX^2 + BXY + CY^2 + DX + EY = F$. Assume that it does not describe a degenerate locus in the plane (the empty set, points or lines). Let $\Delta = B^2 - 4AC$. Then
* If Δ = 0, then the planar curve defined by the equation is a parabola according to both the geometric and the analytic definitions.
* If Δ < 0, then the planar curve defined by the equation is an ellipse according to both the geometric and the analytic definitions (and the two-focus definition).
* If Δ > 0, then the planar curve defined by the equation is a hyperbola according to both the geometric and the analytic definitions.
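The algebraic test in the theorem is easy to state as a small function. The sketch below assumes, as the theorem does, that the equation is non-degenerate; it does not check for that, and the names `discriminant` and `classify_conic` are illustrative.

```python
def discriminant(A, B, C):
    """Delta = B^2 - 4AC for the equation A X^2 + B XY + C Y^2 + D X + E Y = F."""
    return B * B - 4 * A * C

def classify_conic(A, B, C, tol=1e-12):
    """Type of a non-degenerate conic, read off from the sign of Delta."""
    delta = discriminant(A, B, C)
    if abs(delta) <= tol:
        return "parabola"
    return "ellipse" if delta < 0 else "hyperbola"

print(classify_conic(1, 0, 1))    # X^2 + Y^2 = F (a circle, hence an ellipse)
print(classify_conic(1, 0, 0))    # X^2 + D X + E Y = F
print(classify_conic(1, 0, -1))   # X^2 - Y^2 = F
print(classify_conic(0, 1, 0))    # XY = F
```

Only A, B and C matter for the type; D, E and F move and scale the curve but never change the sign of Δ.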
Important remark: Before embarking on the proof of this theorem, let's describe the plan
of the proof. We are going to first prove that the assumption on the
sign of Δ implies the geometric definition of a conic section.
Then, we will prove that the assumption on the sign of Δ implies
the analytic definition of a conic section. The important remark here
is that these two implications actually prove all possible
implications,
in other words they prove all of the equivalences between all of the
definitions. For instance, if we have a curve which is an ellipse
by the analytic definition but has an equation whose discriminant is
zero, then the theorem shows that the curve is a parabola by the
analytic definition, so it can't be an ellipse.
| Lecture 8: February 4, 2008 |
Proof of the conic section theorem (first of 3 lectures).
Part 1: The algebraic definition implies the geometric
definition.
The first thing we will do is connect the assumption about the sign
of Δ with a condition on the slope of the cutting
plane z = mx + ny + k. This connection uses a little bit of math
more advanced than what we usually use in this course, so there may be
a couple of notions you will see below for the first time, such as
vectors and dot-products. Not to worry.
A vector is nothing but a ray between two points in 3-dimensional space; what counts is the length and direction of the ray, not the starting point. It is considered to be the same vector no matter where it starts. For instance, the vector pointing from (0,0,0) to (1,1,1) is the same as the vector pointing from (2,2,2) to (3,3,3), or the one from (2,1,3) to (3,2,4), since all these vectors are identical as rays; they are simply being translated around 3-space without being changed. When we specify a vector as though it is a point, writing v=(a,b,c), we automatically mean the vector pointing from (0,0,0) to (a,b,c).
If u=(a,b,c) is a vector in 3-space, we write |u| for its length. We can define a multiplication on vectors, written as a dot and called the dot-product, by the formula
$$u \cdot v = |u|\,|v|\cos\theta,$$
where θ is the angle between u and v.
Notice at once that if u and v are perpendicular, then u.v=0. This is because if they are perpendicular, the angle between them is 90°, and so we compute u.v = |u| |v| cos 90° = 0. Similarly, if two vectors are parallel, then the angle between them is 0°, and cos 0°=1, so if u and v are parallel, we have u.v = |u| |v|.
Now we have two small theorems on vectors (both proved in class).
Vector Theorem 1. If u=(a,b,c) is
a vector in 3-space, then the length |u| of u is the square root
of $a^2+b^2+c^2$.
Proof. Use Pythagoras' theorem twice, once
to show that the length of the base vector (a,b) in the xy-plane is
$\sqrt{a^2+b^2}$, and again to show that the vector itself,
which is the hypotenuse of the triangle of that base and height c,
has length the square root of $c^2+(a^2+b^2)$.
Vector Theorem 2. If u=(a,b,c) and v=(d,e,f) are two vectors in 3-space, a formula for the dot product is
$$u \cdot v = ad + be + cf.$$
Proof.
Let i=(1,0,0), j=(0,1,0) and k=(0,0,1). These
vectors are mutually perpendicular to each other, so the angle between
each pair of them is 90°, and since cos 90°=0,
we find that the dot products i.j=i.k=j.k=0.
On the other hand, since i is parallel to itself, the angle
between i and itself is 0°, and since cos 0° = 1 and
the length of the vector i is 1, we have i.i = 1, and
similarly j.j=k.k=1. Now we can compute the dot product
of any pair of vectors u.v=(a,b,c).(d,e,f) by
writing u=ai+bj+ck and
v=di+ej+fk. The dot product is then computed by
multiplying out u.v = (ai+bj+ck).(di+ej+fk) = ad(i.i) + ae(i.j) + af(i.k) + bd(j.i) + be(j.j) + bf(j.k) + cd(k.i) + ce(k.j) + cf(k.k) = ad+be+cf.
If θ is the angle between u and v, then Vector Theorems 1 and 2 together give us the equality
$$ad + be + cf = \sqrt{a^2+b^2+c^2}\,\sqrt{d^2+e^2+f^2}\,\cos\theta.$$
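The equality above can be solved for θ, which gives a practical way to compute the angle between two vectors from their coordinates. A minimal sketch, with illustrative helper names `dot`, `length` and `angle_degrees`:

```python
import math

def dot(u, v):
    """Dot product by Vector Theorem 2: ad + be + cf."""
    return sum(a * b for a, b in zip(u, v))

def length(u):
    """Length by Vector Theorem 1: square root of a^2 + b^2 + c^2."""
    return math.sqrt(dot(u, u))

def angle_degrees(u, v):
    """Angle between two nonzero vectors, from u.v = |u| |v| cos(theta)."""
    cosine = dot(u, v) / (length(u) * length(v))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cosine))

print(angle_degrees((1, 0, 0), (0, 1, 0)))   # perpendicular vectors
print(angle_degrees((1, 1, 1), (2, 2, 2)))   # parallel vectors
print(angle_degrees((1, 0, 0), (-1, 0, 0)))  # opposite vectors
```

The sign of the dot product alone already tells whether the angle is below, equal to, or above 90°, which is all the proof of the conic section theorem needs.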
Now we are ready to turn to the proof of the conic section theorem. First, recall from higher up on this page (lecture 7) that the intersection of the cone $x^2+y^2 = r^2z^2$ with the plane $z = mx + ny + k$ is given by the equation $Ax^2+Bxy+Cy^2+Dx+Ey=F$ where $A=r^2m^2-1$, $B=2r^2mn$, $C=r^2n^2-1$, $D=2r^2km$, $E=2r^2kn$, $F=-(rk)^2$. This lets us translate the quantity $\Delta = B^2-4AC$ back into terms of the original coefficients r,m,n,k of the cone and the plane, and we find
$$\Delta = 4r^4m^2n^2 - 4(r^2m^2-1)(r^2n^2-1) = 4r^2(m^2+n^2) - 4.$$
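The simplification of Δ can be double-checked numerically by computing it both ways for a few cones and planes. This is only a spot check on sample values chosen for illustration, not a proof, and the function names are hypothetical.

```python
def delta_from_coefficients(r, m, n):
    """B^2 - 4AC with A, B, C taken from the cone and plane parameters."""
    A = (r * m) ** 2 - 1
    B = 2 * r ** 2 * m * n
    C = (r * n) ** 2 - 1
    return B * B - 4 * A * C

def delta_closed_form(r, m, n):
    """The simplified expression 4 r^2 (m^2 + n^2) - 4."""
    return 4 * r ** 2 * (m ** 2 + n ** 2) - 4

samples = [(1.0, 0.5, 0.0), (2.0, 0.3, 0.4), (0.5, 1.0, -2.0), (1.5, -0.7, 0.2)]
differences = [
    abs(delta_from_coefficients(r, m, n) - delta_closed_form(r, m, n))
    for r, m, n in samples
]
print(differences)
```

Note that k does not appear: shifting the plane up or down never changes the type of the section.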
| Lecture 9: February 6, 2008 |
Proof of the conic section theorem (second of 3 lectures).
Our task today is to show that the condition Δ<0 (Δ=0, Δ>0), which is the algebraic definition of an ellipse (parabola, hyperbola), implies that the angle of the cutting plane is less than (equal to, greater than) the angle of the generating line of the cone, namely implies the geometric definition of an ellipse (parabola, hyperbola).
We start by giving an equivalent formulation of the geometric definition that the slope of the plane is less than (equal to, greater than) the slope of a generating line: namely, we rephrase this by saying that the angle θ between a vector perpendicular to the plane and a vector along the generating line of the cone must be less than (equal to, greater than) 90°. The equivalence of these two geometric conditions is clear from the following diagram. Cone diagram
The above discussion means that the result we must prove can be phrased as follows:
Proposition. If Δ < 0 then θ < 90°. If Δ = 0 then θ = 90°. Finally, if Δ > 0 then θ > 90°.
This result can be summarized as: The algebraic definition of a conic section implies the geometric definition of the same conic section.
The remainder of this lecture is devoted to proving this proposition. (Then, in the following lecture, it will remain only to prove that the algebraic definition implies the analytic definition.) To prove it, we are going to need to explicitly compute a vector u along a generating line of the cone, and a vector v perpendicular to the plane.
Fact 1. The vector u given in 3-space by
$(rm, rn, \sqrt{m^2+n^2})$ lies along a generating line
of the cone.
To see this, note simply that the point
$(x,y,z) = (rm, rn, \sqrt{m^2+n^2})$
satisfies the cone equation $x^2+y^2 = (rz)^2$,
so it lies on the cone. Thus, if u denotes the vector
(ray) from the point (0,0,0) to this point, clearly u lies along
a generating line.
Fact 2. The vector v from (0,0,0) to the point
(-m,-n,1) in 3-space is perpendicular to the plane of equation
z = mx + ny + k (and pointing upwards, since it has a positive
z-coordinate).
To see this, note that the plane of equation
z = mx + ny is parallel to the plane z = mx+ny+k, since
adding a constant simply translates the whole plane vertically in 3-space.
So to show that v=(-m,-n,1) is perpendicular to z = mx+ny+k,
it is equivalent to show that it is perpendicular to the plane z = mx+ny,
which passes through the origin. For this, it is enough to show that
(-m,-n,1) is perpendicular to every single vector starting at (0,0,0)
and lying on the plane z=mx+ny. All points on the plane are given by
(x,y,mx+ny); each such point determines a vector on the plane from
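(0,0,0) to the point. By the dot-product formula, we find that
$$(-m,-n,1) \cdot (x,\, y,\, mx+ny) = -mx - ny + (mx + ny) = 0,$$
so v is perpendicular to every such vector.
We are now in possession of a vector v=(-m,-n,1) perpendicular to the plane, and a vector $u = (rm, rn, \sqrt{m^2+n^2})$ along a generating line of the cone. Applying the dot-product to these two vectors yields the useful formula
$$\sqrt{m^2+n^2} - r(m^2+n^2) = \sqrt{(r^2+1)(m^2+n^2)}\,\sqrt{m^2+n^2+1}\,\cos\theta.$$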
Recall that what we want to show is that
Δ<0 implies θ < 90°,
Δ=0 implies θ = 90°, and
Δ>0 implies
θ > 90°.
Let us consider each of the three cases separately.
The first case is when we assume that Δ < 0. We saw above that this is equivalent to $4r^2(m^2+n^2) - 4 < 0$, which itself is equivalent to the inequality $r < 1/\sqrt{m^2+n^2}$. This shows that the left-hand side of the dot-product formula above is strictly positive. Since the coefficient on the right-hand side is also positive, this must mean that cos θ is strictly positive, which means that θ < 90°. A sketch shows that this precisely means that the slope of the plane is less than the slope of the generating line, so we have proved that if Δ < 0, then the cutting plane is less steep than the generating line of the cone. Thus, the conic section is an ellipse according to the geometric definition.
The second case is when we assume that Δ = 0. We saw above that this is equivalent to $4r^2(m^2+n^2) - 4 = 0$, which itself is equivalent to $r = 1/\sqrt{m^2+n^2}$. Plugging this into the dot-product formula makes the left-hand term become equal to zero! Thus, the right-hand term must be zero, and since the coefficient is strictly positive, this means that cos θ=0, which in turn means that θ=90°. In other words, we have proved that when Δ=0, the two vectors are perpendicular, so the plane is parallel to the generating line. This is of course nothing but the geometric definition of the parabola.
The third case is when we assume that Δ > 0. As above, this now means that $r > 1/\sqrt{m^2+n^2}$. Then, the left-hand side of the dot-product formula above is strictly negative. Since the coefficient on the right-hand side is positive, this must mean that cos θ is strictly negative, which means that θ > 90°. And this exactly means that the slope of the plane is greater than the slope of the generating line, so we have proved that if Δ > 0, then the cutting plane is steeper than the generating line of the cone. Thus, the conic section is a hyperbola according to the geometric definition.
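The three cases can be watched side by side by computing θ and Δ for one cone and three planes of increasing steepness. The sketch below uses the vectors u and v from Facts 1 and 2; the function name `angle_and_delta` and the sample values are illustrative, and the code assumes the plane is not horizontal (m and n not both zero), since otherwise u is the zero vector.

```python
import math

def angle_and_delta(r, m, n):
    """Angle (degrees) between u and v, together with the discriminant."""
    s = math.sqrt(m * m + n * n)
    u = (r * m, r * n, s)        # along a generating line of the cone (Fact 1)
    v = (-m, -n, 1.0)            # perpendicular to the cutting plane (Fact 2)
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norms))
    theta = math.degrees(math.acos(cosine))
    delta = 4 * r * r * (m * m + n * n) - 4
    return theta, delta

# Same cone (r = 1), planes of slope 0.5, 1 and 2.
shallow = angle_and_delta(1.0, 0.5, 0.0)
critical = angle_and_delta(1.0, 1.0, 0.0)
steep = angle_and_delta(1.0, 2.0, 0.0)
print(shallow, critical, steep)
```

For r = 1 the generating line has slope 1, so the plane of slope 1 is the borderline case where θ is 90° and Δ is 0.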
| Lecture 10: February 8, 2008 |
Proof of the conic section theorem (third of 3 lectures).
Today we are continuing the proof of the main theorem on conic sections. Last time we proved that the algebraic definition of each type of conic section, given by the sign of the discriminant $B^2 - 4AC$ of the fundamental equation $AX^2 + BXY + CY^2 + DX + EY = F$, implies the geometric definition of each type of conic section, given by the slope of the plane cutting the cone. Today we are going to prove that the algebraic definition implies the analytic definition of each type of conic section, given by the size of the ratio of distances of a point from a focus and a directrix. This will complete the proof of the main theorem.
Part 2: The algebraic definition implies the analytic definition.
Case 1. Δ = 0
We begin by assuming that Δ = 0. We will show that the curve described by the fundamental equation satisfies the analytic definition of a parabola. Note to start with that we cannot have A = C = 0 in the fundamental equation, because then $\Delta = B^2 - 4AC = 0$ would imply that B = 0, and the equation would describe a line, contradicting our assumption that it describes a non-degenerate conic section. We have three cases: the first case is where C = 0 and A ≠ 0. Then again the assumption that $\Delta = B^2 - 4AC = 0$ means that B must equal 0, so the fundamental equation has the form $AX^2 + DX + EY = F$. Now, E cannot be equal to zero here, because if it were, this would be a quadratic equation in X alone, having at most two possible solutions for X, which would mean that the equation describes at most a pair of vertical lines, contradicting our assumption that our curve is a non-degenerate conic section. Thus E ≠ 0, so we can divide the whole equation by E, and then, up to renaming the coefficients and variables, we get an equation of the form (P) given by
$$y = ax^2 + bx + c. \qquad (P)$$
The second case is where A = 0 and C ≠ 0. It is exactly the same as the previous case with the roles of X and Y exchanged, again giving an equation of the form (P). Finally, consider the case A ≠ 0, C ≠ 0, Δ = 0. Multiplying both sides of the fundamental equation by -1 if necessary, we may assume that A > 0. Let $\alpha = \sqrt{A}$. We make the variable change $x = \alpha X + (B/2\alpha)Y$, $y = Y$ in the fundamental equation. It becomes
$$x^2 - \frac{\Delta}{4A}\,y^2 + \frac{D}{\alpha}\,x + \Big(E - \frac{BD}{2A}\Big)y = F.$$
Since the coefficient of $y^2$ is $-(1/4A)\Delta$, it is equal to zero, so renaming the coefficients, the equation simply has the form $x^2 + dx + ey = f$. As above, the coefficient e of y cannot be equal to zero, otherwise the equation describes two lines and not a conic section. So dividing out the whole equation by e, putting y alone on one side and renaming the other coefficients, we again find an equation of the form (P). What we have done so far is to show that if Δ = 0, the curve described by the fundamental equation has the same form as the curve described by an equation of the form (P).
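Now we can show that Δ = 0 implies that the equation (P) describes a parabola according to the analytic, focus/directrix definition. So we have to specify a focus and a directrix for the curve described by the equation $y = ax^2 + bx + c$. Set $\delta = b^2 - 4ac$ (do not confuse this quantity with Δ!). We claim that we can take:
$$F = (\alpha, \beta) = \Big(-\frac{b}{2a},\ \frac{1-\delta}{4a}\Big), \qquad D:\ y = -\frac{1+\delta}{4a}.$$
Now all we need to do is prove that the curve satisfies the analytic definition of a parabola, which says that any point (x,y) on the parabola must be equidistant from the focus and from the directrix. If (x,y) is a point on the curve described by $y = ax^2 + bx + c$, then
$$\text{distance from } (x,y) \text{ to } D = \Big|\, y + \frac{1+\delta}{4a} \Big|.$$
To compute the distance between the point (x,y) and the focus F=(α,β), we use the Pythagorean distance formula
$$\text{distance from } (x,y) \text{ to } F = \sqrt{(x-\alpha)^2 + (y-\beta)^2} = \sqrt{\Big(x + \frac{b}{2a}\Big)^2 + \Big(y - \frac{1-\delta}{4a}\Big)^2}.$$
Using the equality $y = ax^2 + bx + c = a(x+b/2a)^2 + c - b^2/4a = a(x+b/2a)^2 - (\delta/4a)$, we find that
$(x+b/2a)^2 = y/a + (\delta/4a^2)$. We can plug
this in above to obtain the distance from the focus
F as
$$\sqrt{\frac{y}{a} + \frac{\delta}{4a^2} + \Big(y - \frac{1-\delta}{4a}\Big)^2} = \sqrt{\Big(y + \frac{1+\delta}{4a}\Big)^2} = \Big|\, y + \frac{1+\delta}{4a} \Big|.$$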
Thus, a point on the curve described by the fundamental equation with Δ=0 is equidistant from the focus and the directrix, which shows that it is indeed a parabola according to the analytic definition.
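The equidistance property can be checked numerically for a sample parabola. The sketch assumes the focus $(-b/2a,\ (1-\delta)/4a)$ and the directrix $y = -(1+\delta)/4a$ with $\delta = b^2 - 4ac$, which are the standard focus and directrix of $y = ax^2 + bx + c$; the function names and the sample coefficients are illustrative.

```python
import math

def parabola_focus_directrix(a, b, c):
    """Focus (as a point) and directrix (as a height y) of y = a x^2 + b x + c."""
    small_delta = b * b - 4 * a * c
    focus = (-b / (2 * a), (1 - small_delta) / (4 * a))
    directrix_y = -(1 + small_delta) / (4 * a)
    return focus, directrix_y

def distance_gap(a, b, c, x):
    """Distance to the focus minus distance to the directrix at the point above x."""
    focus, directrix_y = parabola_focus_directrix(a, b, c)
    y = a * x * x + b * x + c
    to_focus = math.hypot(x - focus[0], y - focus[1])
    to_directrix = abs(y - directrix_y)
    return to_focus - to_directrix

# Sample parabola y = 2x^2 - 3x + 1 (illustrative coefficients).
gaps = [distance_gap(2.0, -3.0, 1.0, x) for x in (-2.0, 0.0, 0.75, 1.0, 3.5)]
print(parabola_focus_directrix(1.0, 0.0, 0.0))  # the familiar case y = x^2
print(gaps)
```

For $y = x^2$ this gives the focus (0, 1/4) and the directrix y = -1/4, and the gaps for the sample parabola vanish up to rounding.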
Case 2. Δ ≠ 0
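Set
$$p = \frac{BE - 2CD}{\Delta}, \qquad q = \frac{BD - 2AE}{\Delta}$$
and substitute X = x - p, Y = y - q into the fundamental equation $AX^2 + BXY + CY^2 + DX + EY = F$. These values of p and q are chosen on purpose so that the linear terms cancel out and the equation becomes:
$$Ax^2 + Bxy + Cy^2 = G, \qquad \text{where } G = F + \tfrac{1}{2}(Dp + Eq).$$
Now we make a second variable change, $x = (1/\alpha)X - (B/2A)Y$ and $y = Y$, where $\alpha = \sqrt{A}$. Then the equation in terms of the new X and Y becomes
$$4AX^2 - \Delta Y^2 = 4AG.$$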
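The first substitution can be tested numerically: after shifting by p and q the linear terms should disappear, leaving only a constant G on the right. The sketch below assumes the values $p = (BE - 2CD)/\Delta$, $q = (BD - 2AE)/\Delta$ and $G = F + (Dp + Eq)/2$, which are the ones forced by requiring the linear terms to cancel; the function names and the sample coefficients are illustrative.

```python
def center_shift(A, B, C, D, E, F):
    """Shift (p, q) that removes the linear terms, and the new constant G."""
    delta = B * B - 4 * A * C
    if delta == 0:
        raise ValueError("the shift only exists when the discriminant is nonzero")
    p = (B * E - 2 * C * D) / delta
    q = (B * D - 2 * A * E) / delta
    G = F + (D * p + E * q) / 2
    return p, q, G

# Sample equation 2X^2 + XY + 3Y^2 + 4X - 5Y = 6 (illustrative coefficients).
A, B, C, D, E, F = 2.0, 1.0, 3.0, 4.0, -5.0, 6.0
p, q, G = center_shift(A, B, C, D, E, F)

def original_residual(X, Y):
    return A * X * X + B * X * Y + C * Y * Y + D * X + E * Y - F

def shifted_residual(x, y):
    return A * x * x + B * x * y + C * y * y - G

points = [(0.0, 0.0), (1.0, -2.0), (-3.5, 0.25), (2.0, 7.0)]
gaps = [abs(original_residual(x - p, y - q) - shifted_residual(x, y)) for x, y in points]
print(p, q, G)
print(max(gaps))
```

The two residuals agree at every test point, so the shifted equation describes the same curve, merely translated.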
Case 2.1. Δ < 0
Then G cannot be negative, since the left-hand side is the sum of two nonnegative terms, so if G were negative we would have an equation with no solutions, which is impossible since we know that the plane cuts the cone in a non-empty set of points. Nor can G be zero, since the curve would then be a single point, which is degenerate. So G must be positive, and we can define $a = \sqrt{G}$. Also, since Δ < 0, we can define b as the square root of the positive number $-4AG/\Delta$. Then, writing x and y for the variables, the equation takes the final form (E) given by
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \qquad (E)$$
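To prove that the curve in the plane defined by this equation satisfies the analytic definition of an ellipse, we have to give a focus F and a directrix D and show that for every point p on the curve, the ratio (distance from p to F) divided by (distance from p to D) is a constant less than 1. In the special case where a=b, we don't need to do this, because the equation takes the form $x^2+y^2 = a^2$, so the distance from every point on the curve to the origin is equal to a; thus, the curve is a circle (which is an ellipse). Now consider the general case a ≠ b. We may assume that a > b (otherwise exchange names by symmetry). Then we take:
$$F = \big(\sqrt{a^2-b^2},\ 0\big), \qquad D:\ x = \frac{a^2}{\sqrt{a^2-b^2}}.$$
If (x,y) is a point on the curve described by $x^2/a^2 + y^2/b^2 = 1$, then
$$\text{distance from } (x,y) \text{ to } D = \frac{a^2}{\sqrt{a^2-b^2}} - x,$$
which is positive because $x \le a < a^2/\sqrt{a^2-b^2}$.
To compute the distance from the focus, we use the Pythagorean distance formula
$$\text{distance from } (x,y) \text{ to } F = \sqrt{\big(x - \sqrt{a^2-b^2}\big)^2 + y^2}.$$
Expanding and plugging in the identity $y^2 = b^2 - (b^2/a^2)x^2$, this lets us compute the distance to the focus F as
$$\sqrt{\Big(a - \frac{\sqrt{a^2-b^2}}{a}\,x\Big)^2} = a - \frac{\sqrt{a^2-b^2}}{a}\,x = \frac{\sqrt{a^2-b^2}}{a}\Big(\frac{a^2}{\sqrt{a^2-b^2}} - x\Big).$$
Dividing by the distance to the directrix gives the constant ratio $e = \sqrt{a^2-b^2}/a$, which is less than 1.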
As a final bonus yielded by this computation, we note that by symmetry, the second focus of the ellipse is the point $(-\sqrt{a^2-b^2},\ 0)$. It is a simple algebraic calculation to see that for any point (x,y) on the ellipse, the sum of the distances to the two foci is always equal to 2a.
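Both properties of the ellipse, the constant focus/directrix ratio and the constant sum of distances to the two foci, can be checked on sample points. The sketch assumes a > b, takes the focus at $(\sqrt{a^2-b^2}, 0)$ and the directrix $x = a^2/\sqrt{a^2-b^2}$, and generates points on the curve with the parametrization $(a\cos t,\ b\sin t)$, which is a convenience not used in the lecture.

```python
import math

def ellipse_checks(a, b, t):
    """Return (focus/directrix ratio, sum of focal distances) at one point."""
    c = math.sqrt(a * a - b * b)             # distance from center to each focus
    x, y = a * math.cos(t), b * math.sin(t)  # a point on x^2/a^2 + y^2/b^2 = 1
    to_focus = math.hypot(x - c, y)
    to_other_focus = math.hypot(x + c, y)
    to_directrix = a * a / c - x
    return to_focus / to_directrix, to_focus + to_other_focus

# Sample ellipse with a = 5, b = 3, so c = 4 (illustrative values).
results = [ellipse_checks(5.0, 3.0, t) for t in (0.3, 1.2, 2.5, 4.0)]
for ratio, total in results:
    print(ratio, total)
```

Every point gives the same ratio c/a = 0.8 and the same sum 2a = 10, up to rounding.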
Case 2.2. Δ > 0
Here, G cannot be zero, because the equation obtained above, which has the form $4Ax^2 - \Delta y^2 = 4AG$, would then factor into two linear equations and describe a pair of lines, which is degenerate. Unlike the previous case, G can have either sign. If G is negative, multiplying the equation by -1 and exchanging the names of x and y gives an equation of the same shape whose right-hand side is positive. So, up to renaming the coefficients and variables, we may assume that G is positive.
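Thus, we may define numbers $a = \sqrt{G}$ and $b = \sqrt{4AG/\Delta}$, and the equation takes the final form (H)
$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1. \qquad (H)$$
To prove that the curve in the plane defined by this equation satisfies the analytic definition of a hyperbola, we have to give a focus F and a directrix D and show that for every point p on the curve, the ratio (distance from p to F) divided by (distance from p to D) is a constant greater than 1. We take:
$$F = \big(\sqrt{a^2+b^2},\ 0\big), \qquad D:\ x = \frac{a^2}{\sqrt{a^2+b^2}}.$$
If (x,y) is a point on the curve described by $x^2/a^2 - y^2/b^2 = 1$, then
$$\text{distance from } (x,y) \text{ to } D = \Big|\, x - \frac{a^2}{\sqrt{a^2+b^2}} \Big|.$$
The computation of the distance from the focus is analogous to the one above for the ellipse, using the identity $y^2 = (b^2/a^2)x^2 - b^2$, and we find that
$$\text{distance from } (x,y) \text{ to } F = \sqrt{\big(x - \sqrt{a^2+b^2}\big)^2 + y^2} = \Big|\, \frac{\sqrt{a^2+b^2}}{a}\,x - a \Big| = \frac{\sqrt{a^2+b^2}}{a}\,\Big|\, x - \frac{a^2}{\sqrt{a^2+b^2}} \Big|.$$
Dividing this distance to the focus by the distance to the directrix gives a constant value, the eccentricity, equal to $e = (1/a)\sqrt{a^2+b^2}$. In order for our curve to be a hyperbola, we must have e > 1. Multiplying by a and squaring both sides, we see that this is equivalent to $a^2+b^2 > a^2$, which is of course true since $b^2$ is strictly positive.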
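The same kind of numerical check works for the hyperbola. The sketch takes the focus at $(\sqrt{a^2+b^2}, 0)$ and the directrix $x = a^2/\sqrt{a^2+b^2}$, and generates points on both branches with the parametrization $(\pm a\cosh t,\ b\sinh t)$, which is a convenience not used in the lecture; the function name and sample values are illustrative.

```python
import math

def hyperbola_ratio(a, b, t, branch=1):
    """Focus/directrix distance ratio at a point of x^2/a^2 - y^2/b^2 = 1.

    branch = 1 picks the right-hand branch, branch = -1 the left-hand one.
    """
    c = math.sqrt(a * a + b * b)
    x, y = branch * a * math.cosh(t), b * math.sinh(t)
    to_focus = math.hypot(x - c, y)
    to_directrix = abs(x - a * a / c)
    return to_focus / to_directrix

# Sample hyperbola with a = 3, b = 4, so c = 5 and e = 5/3 (illustrative values).
ratios = [hyperbola_ratio(3.0, 4.0, t, branch)
          for t in (-1.5, 0.0, 0.7, 2.0)
          for branch in (1, -1)]
print(ratios)
```

The ratio is the same on both branches, even though only one focus and one directrix are used, and it equals the eccentricity 5/3, which is greater than 1.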
We have now completed the proof that the algebraic definition of a conic section implies the analytic definition in all three cases. Thus, the proof of the main theorem on conic sections is complete.