The ultimate goal of Minkowski theory is to derive a fairly sharp bound on the class number of a number field, and on the way provide some tools to prove Dirichlet’s theorem on units. It can also be used for pure number theoretic proofs, such as that every prime of the form 4n + 1 can be written as the sum of two squares, or that every positive integer can be written as the sum of four squares.
To get the bound, we need to prove a theorem on matrices, called Minkowski’s linear forms theorem. A linear form in n variables is just a linear function in all the variables with no constant term. In the most basic version of the theorem, suppose that we have n real linear forms in n variables, L(i)(x1, x2, …, x(n)) = a(i1)x1 + … + a(in)x(n), and let A be the matrix whose (i, j)th entry is a(ij). If |A| != 0, and we have positive real constants t1, t2, …, t(n) such that t1*t2*…*t(n) >= ||A||, then there exist integers x1, x2, …, x(n), not all zero, such that |L(i)(x1, x2, …, x(n))| <= t(i) for all i (here |A| is the determinant of A, and ||A|| is just the absolute value of |A|).
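To make the statement concrete, here is a minimal brute-force sketch in Python. It only checks the conclusion of the theorem on one small example and is not part of the proof. The names det and find_small_point, the sample matrix, and the search box size are all assumptions made for illustration.

```python
from itertools import product
from math import prod

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def find_small_point(A, t, box):
    """Search integer vectors with entries in [-box, box] for a nonzero x
    with |L_i(x)| <= t_i for every i. Returns None if none is found in the box."""
    n = len(A)
    for x in product(range(-box, box + 1), repeat=n):
        if not any(x):
            continue  # skip the zero vector
        values = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        if all(abs(v) <= ti for v, ti in zip(values, t)):
            return x, values
    return None

# Hypothetical example: L1 = 2*x1 + x2, L2 = x1 + 3*x2, so |A| = 5.
A = [[2, 1], [1, 3]]
t = [1, 5]  # t1 * t2 = 5 >= ||A||
assert prod(t) >= abs(det(A))
result = find_small_point(A, t, box=5)
print(result)
```

The theorem guarantees that a point exists, but it says nothing about how far out the search has to go, so the box size here is a guess that happens to be large enough for this example.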
To see why, let’s argue by contradiction. Suppose that it’s not true. I’m going to show that we then get an infinite chain of nonzero points in Z^n whose values under one of the forms strictly decrease in absolute value while staying above the corresponding t(i). Let’s look at any nonzero point in Z^n, say w, and write s(i) for the absolute value of L(i)(w). By assumption, for some i we have s(i) > t(i). For simplicity, let’s suppose that s1 > t1.
Now, look at the region S in R^n consisting of all points (y1, …, y(n)) such that |y1| < s1 and |y(i)| < t(i) for i > 1. It’s a box whose volume is 2s1*2t2*2t3*…*2t(n) = 2^n(s1t2…t(n)) > 2^n(t1t2…t(n)) >= 2^n||A||, and it’s clearly convex and symmetric about the origin. The points of the form (L1(v), L2(v), …, L(n)(v)), where v is in Z^n, form a lattice in R^n whose matrix is A, so its fundamental region has volume ||A||. By Minkowski’s theorem, S then contains a nonzero point of the lattice, say the image of some nonzero w' in Z^n.
By assumption, we can’t have s'1 = |L1(w')| <= t1, because w' already satisfies |L(i)(w')| < t(i) for i > 1. So we have t1 < s'1 < s1. In particular, w' and w are different. In similar vein, replacing s1 by s'1 gives a smaller box inside S, and so we get w'' with t1 < s''1 < s'1 < s1, then w''', etc., all of which are distinct and all of whose images belong to the same lattice and lie in S. This means S has infinitely many points belonging to the lattice.
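The chain w, w', w'', … can be watched in action on a small example. The sketch below is only an illustration under assumptions: the function names forms and descend, the matrix, the starting point and the search box are all hypothetical, and the brute-force search stands in for the appeal to Minkowski’s theorem. Since the theorem is true, the chain stops after finitely many steps instead of running forever.

```python
from itertools import product

def forms(A, x):
    """Values (L_1(x), ..., L_n(x)) of the linear forms with matrix A."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def descend(A, t, w, box):
    """Start from a nonzero integer point w and repeatedly look for a nonzero
    integer point x with |L_1(x)| < s1 and |L_i(x)| < t_i for i > 1, where s1
    is the current value of |L_1|. Stop as soon as |L_1| <= t_1."""
    chain = [w]
    s1 = abs(forms(A, w)[0])
    while s1 > t[0]:
        found = None
        for x in product(range(-box, box + 1), repeat=len(A)):
            if not any(x):
                continue  # skip the zero vector
            y = forms(A, x)
            if abs(y[0]) < s1 and all(abs(v) < ti for v, ti in zip(y[1:], t[1:])):
                found = x
                break
        if found is None:
            raise ValueError("search box too small")
        chain.append(found)
        s1 = abs(forms(A, found)[0])  # strictly smaller than before
    return chain

# Hypothetical example: |A| = 5 and t1 * t2 = 5.
A = [[2, 1], [1, 3]]
t = [1, 5]
chain = descend(A, t, w=(3, 0), box=5)
for point in chain:
    print(point, forms(A, point))
```

Each step strictly lowers |L1|, which is exactly the inequality t1 < s'1 < s1 from the argument, except that here the descent eventually drops to t1 or below.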
But in fact, every convex region of finite, nonzero volume can only contain finitely many lattice points. Here I’ll assume that the region is any S, and the lattice is Z^n, to simplify matters, but by using a linear transformation, it works for every lattice.
Since the region has nonzero volume, we can find some small solid volume in it, say a ball of some positive radius e. We can assume that the center of the ball is the zero point, and then get the more general result by translation. If S contains a point v = (x1, …, x(n)), then by convexity it contains the entire cone between v and the ball of radius e. The volume of the cone is proportional to its height, regardless of how many dimensions there are, and the height is just the distance of v from 0, which is SQRT(x1^2 + … + x(n)^2). Since S has finite volume, the volume of the cone is bounded, so SQRT(x1^2 + … + x(n)^2) is bounded, say by r.
That means S is contained in a ball of radius r, which is contained in a cube of side 2r whose sides are parallel to (1, 0, 0, …, 0), (0, 1, 0, …, 0), …, (0, 0, …, 1). But that cube can contain at most (2r+1)^n lattice points. So S only contains finitely many lattice points and we are done.
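As a quick sanity check of the counting step, the following sketch counts the points of Z^n in a ball of radius r around the origin and compares the count with the cube bound (2r+1)^n. The function names and the sample values of r and n are assumptions chosen for illustration.

```python
from itertools import product
from math import floor

def lattice_points_in_ball(r, n):
    """Count points of Z^n at distance at most r from the origin."""
    k = floor(r)  # no coordinate of such a point can exceed r in absolute value
    return sum(
        1
        for p in product(range(-k, k + 1), repeat=n)
        if sum(c * c for c in p) <= r * r
    )

def cube_bound(r, n):
    """Upper bound for the number of lattice points in a cube of side 2r."""
    return (2 * r + 1) ** n

for n in (1, 2, 3):
    r = 2.5
    print(n, lattice_points_in_ball(r, n), cube_bound(r, n))
```

The bound is far from sharp, but the argument only needs it to be finite.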
A somewhat more general form of the linear forms theorem allows some of the linear forms to be complex. But then for every L(i), we must find some L(j) among the forms that is its complex conjugate (the complex conjugate of a+bi, where a and b are real, is a–bi; note that complex conjugation is a ring homomorphism of C, and that the real numbers are exactly the numbers equal to their own complex conjugates). Since in that case we’ll invariably have |L(i)| = |L(j)|, we also need to have t(i) = t(j).
The proof of this more general formulation is just a reduction to the basic case. If i1 and i2 are two distinct indices such that L(i1) and L(i2) are complex conjugates, then we use row operations to change L(i1) to L(i1)+L(i2) which will be real and then to (L(i1)+L(i2))/2 which is the real part of L(i2), and then L(i2) to L(i2) – (L(i1)+L(i2))/2 = (L(i2)-L(i1))/2 and to (L(i2)-L(i1))/2i which is again real.
If A has r real forms and s pairs of complex conjugate forms, then this trick turns A into a real matrix whose determinant’s absolute value is ||A||/2^s. We can also change each t(i) to t(i)/SQRT(2) for indices of proper complex forms, which reduces the product of the t(i)’s by a factor of 2^s as well, so the new product is still at least the absolute value of the new determinant. The basic theorem then gives us a nonzero point v in Z^n satisfying the new bounds.
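The factor of 2 per conjugate pair is easy to check numerically. The sketch below uses one real form and one pair of complex conjugate forms (so r = 1 and s = 1) and compares the two determinants. The particular coefficients and the helper names det and split_pair are assumptions made for illustration.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def split_pair(form):
    """Given a complex form L(i1), with L(i2) its conjugate, return the rows
    (L(i1) + L(i2))/2 and (L(i2) - L(i1))/2i, both of which are real."""
    real_part = [z.real for z in form]
    conj_imag = [-z.imag for z in form]  # imaginary part of the conjugate form
    return real_part, conj_imag

real_form = [3, 1, 0]
complex_form = [1 + 0j, 2j, 1 + 1j]
conjugate_form = [z.conjugate() for z in complex_form]

A = [real_form, complex_form, conjugate_form]
B = [real_form, *split_pair(complex_form)]

s = 1  # number of conjugate pairs
print(abs(det(A)), abs(det(B)), abs(det(A)) / 2 ** s)
```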
Now, that point will also satisfy the conditions of the complex forms. For indices of real forms, there’s nothing to prove. For proper complex forms, we have |(L(i1)(v)+L(i2)(v))/2| <= t/SQRT(2) and |(L(i2)(v)-L(i1)(v))/2i| <= t/SQRT(2), where t = t(i1) = t(i2). The absolute value of a+bi is SQRT(a^2 + b^2), so |L(i1)(v)| = |L(i2)(v)| = SQRT(|(L(i1)(v)+L(i2)(v))/2|^2 + |(L(i2)(v)-L(i1)(v))/2i|^2) <= SQRT(t^2/2 + t^2/2) = t as desired.
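Putting the reduction together, here is a rough end-to-end sketch: replace each complex form by its real and imaginary parts, shrink the corresponding bound by SQRT(2), search for a nonzero integer point, and then confirm that the original complex form satisfies the original bound. The function names, the sample forms, the bounds and the search box are all assumptions. The sketch uses the imaginary part of the form itself rather than of its conjugate, which only changes a sign and so does not affect the absolute values.

```python
from itertools import product
from math import sqrt

def evaluate(form, x):
    """Value of a linear form, given by its coefficient list, at the point x."""
    return sum(c * xj for c, xj in zip(form, x))

def solve_mixed(real_forms, complex_forms, t_real, t_complex, box):
    """Find a nonzero integer x with |L(x)| <= t for the real forms and
    |Re L(x)|, |Im L(x)| <= t/sqrt(2) for one form L of each conjugate pair."""
    n = len(real_forms) + 2 * len(complex_forms)
    rows, bounds = [], []
    for form, t in zip(real_forms, t_real):
        rows.append(form)
        bounds.append(t)
    for form, t in zip(complex_forms, t_complex):
        rows.append([z.real for z in form])
        bounds.append(t / sqrt(2))
        rows.append([z.imag for z in form])
        bounds.append(t / sqrt(2))
    for x in product(range(-box, box + 1), repeat=n):
        if not any(x):
            continue  # skip the zero vector
        if all(abs(evaluate(row, x)) <= b for row, b in zip(rows, bounds)):
            return x
    return None

# Hypothetical example with ||A|| = 14 and t_real * t_complex^2 = 16.
real_forms = [[3, 1, 0]]
complex_forms = [[1 + 0j, 2j, 1 + 1j]]  # one form from each conjugate pair
t_real, t_complex = [1], [4]

x = solve_mixed(real_forms, complex_forms, t_real, t_complex, box=3)
print(x, abs(evaluate(real_forms[0], x)), abs(evaluate(complex_forms[0], x)))
```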
The obvious application of this to number fields is about the linear forms defined by the matrix corresponding to an integral basis. But to turn it into something useful requires some more work.