Magic Squares and Algebraic Number Theory

March 9, 2007

Foxy asks whether it’s possible to have a 3*3 magic square all of whose entries are square integers. It’s fairly elementary to show that in a 3*3 magic square, the common sum is three times the middle entry. So let’s say the entries of the magic square go [a^2, b^2, c^2; d^2, e^2, f^2; g^2, q^2, h^2]; we want to have, for example, a^2 + h^2 = 2e^2.
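For concreteness, here’s a quick check of the common-sum fact on the classic Lo Shu square (a Python sketch; the code is mine, purely illustrative):

```python
# Sketch: on the Lo Shu square, every row, column, and diagonal sums
# to 15, which is three times the middle entry.
square = [[2, 7, 6],
          [9, 5, 1],
          [4, 3, 8]]

rows = [sum(r) for r in square]
cols = [sum(square[i][j] for i in range(3)) for j in range(3)]
diags = [sum(square[i][i] for i in range(3)),
         sum(square[i][2 - i] for i in range(3))]

assert rows == cols == [15, 15, 15] and diags == [15, 15]
assert 15 == 3 * square[1][1]   # common sum = 3 * middle entry
```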

This alone is very restrictive, since it requires (a^2, e^2, h^2) to be an arithmetic sequence, a sufficiently special situation that unless a = e = h, the next term in the sequence will never be a square (no, I can’t prove it; I’ve tried). So to find such magic squares whose entries are not all the same, it makes sense to start bashing through the Diophantine equation a^2 + h^2 = 2e^2.

First, the easy part: we can assume a, e, and h are coprime; otherwise, divide by their greatest common divisor. If e is even, then 2e^2 is divisible by 8, in which case it’s impossible for a and h to be odd since odd squares are equivalent to 1 mod 8, so that a and h are even and the three numbers are not coprime. So e is odd, from which it’s easy to show that so are a and h.

Now, when we have sums of two squares, the most natural environment to study divisibility is Z[i], the ring of Gaussian integers. We get (a + hi)(a – hi) = 2e^2. As a and h are odd, a (+/-) hi is not divisible by 2, so both a + hi and a – hi must be divisible by 1 + i. That condition isn’t a problem: (a + hi)/(1 + i) = (a + hi)(1 – i)/2 = (a + h)/2 + i(h – a)/2, and similarly (a – hi)/(1 – i) = (a + h)/2 + i(a – h)/2.

It now makes sense to write x = (a + h)/2, y = (a – h)/2. Conversely, a = x + y and h = x – y; as a and h are odd, this means that x and y have opposite parities. Since dividing once by 1 + i and once by 1 – i factors out 2 on the right-hand side, this turns the equation into (x + iy)(x – iy) = e^2, or x^2 + y^2 = e^2.
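A brute-force sanity check of this change of variables, as a Python sketch (the search bounds and variable names are my own):

```python
from math import gcd, isqrt

# Sketch: for small coprime solutions of a^2 + h^2 = 2e^2 (taking a <= h),
# verify that x = (a + h)/2, y = (a - h)/2 satisfies x^2 + y^2 = e^2.
found = []
for e in range(1, 60):
    for a in range(1, e + 1):              # a <= e <= h by symmetry
        h2 = 2 * e * e - a * a
        h = isqrt(h2)
        if h * h == h2 and gcd(gcd(a, h), e) == 1:
            x, y = (a + h) // 2, (a - h) // 2   # a, h odd, so both are integers
            assert x * x + y * y == e * e
            found.append((a, e, h))

assert (1, 5, 7) in found                  # 1 + 49 = 2 * 25
```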

Here is where things get nightmarishly complicated. Any common factor of x and y divides a, e, and h, so x, y, and e are coprime. Consider the primes of Z[i] that divide x + iy; none can be in Z, because that would imply a common factor of x and y. But every prime of Z[i] that has no associate in Z has norm either 2 or congruent to 1 mod 4; here norm 2, i.e. 1 + i, is impossible because 1 + i divides x + iy iff x and y have the same parity. So right off the bat, e is a product of primes equivalent to 1 mod 4.

Further, each distinct prime factor of e after the first gives exactly two choices of x and y. To see what I mean, first consider the case when e is prime. Then e = (r + is)(r – is), so e^2 = (r + is)(r – is)(r + is)(r – is). There are two ways of grouping the factors into x + iy and x – iy, but the one that sends the first two to x + iy factors e^2 as e*e, i.e. x = e and y = 0. Hence there’s essentially only one way, the one that sends (r + is)^2 to x + iy. If e is divisible by an additional prime and breaks as (r1 + is1)(r1 – is1)(r2 + is2)(r2 – is2), then once the decision to send (r1 + is1)^2 to x + iy has been made, it’s possible to send either (r2 + is2)^2 or (r2 – is2)^2 to x + iy. This easily generalizes to further prime factors, but not to repeated factors.

This means that e can’t be prime because then the eight elements of the magic square around it will all be either a^2 or h^2, making it impossible for the other magic square conditions to hold. Now, using capital letters to denote squares, we get that the magic square is [A, B, 3E – A – B; 4E – 2A – B, E, 2A + B – 2E; A + B – E, 2E – B, 2E – A].

The above considerations show that E must be divisible only by primes equivalent to 1 mod 4 because it’s sandwiched between A and 2E – A. But by the same token, 2E – A is sandwiched between 4E – 2A – B and B, so it must similarly be divisible only by primes equivalent to 1 mod 4; similar considerations apply to A, which is sandwiched between 2A + B – 2E and 2E – B.
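A quick Python sketch (my own check, purely mechanical) that the parametrized square above really is magic for arbitrary A, B, E:

```python
# Sketch: check that [A, B, 3E-A-B; 4E-2A-B, E, 2A+B-2E; A+B-E, 2E-B, 2E-A]
# has every row, column, and diagonal summing to 3E. The entries are linear
# in A, B, E, so spot-checking a few triples is convincing.
def magic(A, B, E):
    return [[A, B, 3*E - A - B],
            [4*E - 2*A - B, E, 2*A + B - 2*E],
            [A + B - E, 2*E - B, 2*E - A]]

for (A, B, E) in [(1, 2, 3), (10, 7, 5), (0, 0, 1)]:
    m = magic(A, B, E)
    lines = [m[0], m[1], m[2],
             [m[i][0] for i in range(3)],
             [m[i][1] for i in range(3)],
             [m[i][2] for i in range(3)],
             [m[i][i] for i in range(3)],
             [m[i][2 - i] for i in range(3)]]
    assert all(sum(line) == 3 * E for line in lines)
```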

Carnival of Mathematics #3 is Up

March 9, 2007

The third Carnival is up on Michi’s Blog, complete with algebra, topology, and a lot of math teaching. There’s less computer science than before, but it simply makes me feel less stupid for not knowing Haskell. There’s even a HUHO-themed post that explains why payday loans are a bad idea.

Carnival of Mathematics #4 will be posted on EvolutionBlog on March 23rd. Send submissions to Jason Rosenhouse at rosenhjd at jmu dot edu, or to me, or use the submission form (which surprisingly few people seem to do).

Galois Theory: Infinite Galois Theory

March 7, 2007

So far I’ve only discussed finite Galois theory, that is, the Galois theory of finite extensions. But there’s also an infinite theory; the extensions it studies are separable and normal, just like finite Galois extensions, but the fundamental theorem needs to be modified to apply. I could transfer it almost in its entirety by using the theory of topological groups, but I think it’s simpler to just make exceptions and note the equivalent topological terminology where necessary.

Infinite Galois theory is wedded to the theory of projective limits of groups, which is based on directed systems.

A set I is a directed system if it comes equipped with a partial order, which I’ll denote <–, such that given any two elements i and j of I, there exists a k in I such that i <– k and j <– k. For example, a total order induces a directed system. For a less trivial example, let I = N (without zero), with a <– b iff a divides b; given any i and j, a suitable k is their least common multiple. Note that if I has a maximal element, then that maximal element is unique.

Now, an inverse system of groups is a family of groups G(i) indexed by the set I, and homomorphisms from G(j) to G(i) whenever i <– j, denoted f(ij). The homomorphisms must be consistent with one another, i.e. given i <– j <– k, every g in G(k) must satisfy f_{ik}(g) = f_{ij}(f_{jk}(g)). Typically the homomorphisms are taken to be surjective, but strictly speaking this is not necessary.

For example, if I = N with the normal ordering, letting G(n) = Z/(p^n)Z and f(mn) be the obvious remainder map from Z/(p^n)Z to Z/(p^m)Z yields an inverse system. Alternatively, if I has the divisibility ordering defined above, then G(n) = Z/nZ with the remainder maps is similarly an inverse system.
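The consistency condition is easy to check concretely for the first example; a small Python sketch (function names mine):

```python
# Sketch of the inverse system G(n) = Z/p^n Z with remainder maps:
# f(mn) reduces a residue mod p^n to a residue mod p^m, for m <= n.
p = 5

def f(m, n, g):
    """The map G(n) -> G(m) for m <= n: take the remainder mod p^m."""
    assert m <= n
    return g % p**m

# Consistency: for i <= j <= k, f(ik) = f(ij) composed with f(jk).
for g in range(p**4):                        # elements of G(4)
    for i, j, k in [(1, 2, 4), (1, 3, 4), (2, 3, 4)]:
        assert f(i, k, g) == f(i, j, f(j, k, g))
```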

An inverse system has a limit, defined to be the group that projects on all groups in the system. More precisely, let P be the product of all groups G(i). Note that this is a product rather than a direct sum, the difference being that in a direct sum all but finitely many terms must be the identity element. The inverse or projective limit \underleftarrow{\lim}G_{i} is the subgroup of P consisting of all elements g in P such that for all i <– j in I, g_{i} = f_{ij}(g_{j}).

Finally living up to my blog’s name, I’ll also prove the abstract nonsense characterization of projective limits. The projective limit comes with obvious projection maps f_{i}: g \mapsto g_{i}; if each f(ij) is surjective, then so is each f(i). Furthermore, taken together with those projection maps, it’s universal among groups equipped with maps satisfying f_{ij}(f_{j}(g)) = f_{i}(g). In plain mathematical English, this means that if H is another group with maps h(i) such that f(ij)h(j) = h(i), then there exists a unique group homomorphism p from H to the projective limit such that h(i) = f(i)p.

This is likely the first time you encounter universal properties – if it is, consider yourself lucky to have never seen these monsters until now – so I’ll prove it in full instead of appealing to other universal properties. The idea is to explicitly define p in such a way that it will yield h(i) = f(i)p.

Every a in H induces an element h_{i}(a) for each i in I. Furthermore, the condition that f(ij)h(j) = h(i) implies that f_{ij}(h_{j}(a)) = h_{i}(a). Therefore, the element of P whose ith entry is h_{i}(a) is in fact in the projective limit, yielding a function p from H to \underleftarrow{\lim}G_{i}. This is a group homomorphism, roughly because the ith element of p(ab) is h_{i}(ab) = h_{i}(a)h_{i}(b). Further, that the ith element of p(a) is h_{i}(a) implies immediately that h(i) = f(i)p.

Conversely, this homomorphism is unique, because the condition that h(i) = f(i)p forces the ith element of p(a) to be h_{i}(a). This is important because it shows that if H and H′ both satisfy the universal property of the projective limit of the same inverse system of groups, then there exists a unique p from H to H′ with h(i) = h′(i)p and a unique p′ from H′ to H with h′(i) = h(i)p′, so that p and p′ are isomorphisms that are inverses of each other, and the universal property uniquely defines the projective limit.

Note that it’s possible to define similar inverse limits of every algebraic structure – rings, modules, algebras, and so on – as well as of topological spaces. Also note that if I has a maximal element then the projective limit is just that element, making the case of interest the one with no maximal element. Finally, duplicating an index – i.e. splitting i into i1 and i2, with i(j) <– k iff i <– k and k <– i(j) iff k <– i, and with i1 and i2 either comparable or incomparable – doesn’t change the projective limit.

As an example, the projective limit of G(n) = Z/(p^n)Z is \mathbb{Z}_{p}, the additive group of the p-adic integers; it’s certainly true that every sequence of residue classes modulo p^n compatible under taking remainders induces a p-adic integer and vice versa.
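A small sketch of that compatibility, taking the 5-adic integer –1 (Python; purely illustrative):

```python
# Sketch: the 5-adic integer -1 as a compatible sequence of residues.
# Its entry in Z/5^n Z is 5^n - 1, and the remainder maps match them up.
p = 5
seq = [p**n - 1 for n in range(1, 8)]      # g_1, g_2, ..., g_7

# Compatibility: reducing g_n mod p^m recovers g_m whenever m <= n.
for n in range(1, 8):
    for m in range(1, n + 1):
        assert seq[n - 1] % p**m == seq[m - 1]
```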

More interestingly, the projective limit of G(n) = Z/nZ under the divisibility ordering is the product of \mathbb{Z}_{p} over all primes p. This is because, by the universal property, \underleftarrow{\lim}(G_{i} \times H_{i}) = \underleftarrow{\lim}G_{i} \times \underleftarrow{\lim}H_{i}, and by the Chinese remainder theorem, Z/(p1^a1)(p2^a2)…(p(k)^a(k))Z = Z/(p1^a1)Z * Z/(p2^a2)Z * … * Z/(p(k)^a(k))Z.
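The Chinese remainder isomorphism underlying this can be spot-checked; a Python sketch (n = 60 chosen arbitrarily):

```python
# Sketch: Z/60Z decomposes as Z/4Z x Z/3Z x Z/5Z. The map
# a -> (a mod 4, a mod 3, a mod 5) is a bijection respecting addition.
n, factors = 60, [4, 3, 5]
images = {tuple(a % q for q in factors) for a in range(n)}
assert len(images) == n                    # the map is a bijection

for a in range(n):
    for b in range(n):
        lhs = tuple((a + b) % n % q for q in factors)
        rhs = tuple((a % q + b % q) % q for q in factors)
        assert lhs == rhs                  # addition is respected
```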

Back to Galois theory now. If L/K is an infinite Galois extension (i.e. separable and normal, as in the finite case), then the Galois groups of the intermediate Galois extensions form an inverse system, roughly because if F/E/K is a tower of Galois extensions, then Gal(F/K) projects onto Gal(E/K). If you deal with sequences better than with abstract inverse systems, then let L be the splitting field of {f1, f2, …} and look at the splitting field of f1, then that of f1f2, then that of f1f2f3, and so on.

A K-automorphism of L is defined by its action on each a in L. But L/K is assumed to be algebraic, so each a in L is contained in a finite extension of K, which has a Galois closure, say F. As F/K is normal, every K-automorphism of any extension of F will map F to itself; this was proved together with the part of the fundamental theorem concerning normal subgroups of the Galois group and normal extensions. Therefore, a K-automorphism of L is completely determined by the automorphisms it induces on each finite Galois extension F/K.

Finally, to see that Gal(L/K) is the inverse limit of Gal(F/K) over all intermediate F, note that given a tower of Galois extensions L/F/E/K, the map from Gal(L/K) to Gal(E/K) is fairly obviously the same as the map from Gal(L/K) to Gal(F/K) composed with the one from Gal(F/K) to Gal(E/K).

Here is where the standard Galois correspondence breaks down. To see why, let K = F(p) for any prime p, and let L be the union of F(p^(q^n)) over all n; then Gal(L/K) is \mathbb{Z}_{q}, the additive group of the q-adic integers. That group is uncountable and so has uncountably many subgroups, but L/K only has one proper intermediate extension for each positive integer n.

It’s necessary to view the projective limit as not just a group but also the inverse system it’s based on and the homomorphisms to each of the groups in the system. In this view, a subgroup must come from the groups in the system. More precisely, a sub-projective limit of G(i) arises as the projective limit of a collection of subgroups H(i) such that f(ij)(H(j)) = H(i). That will correspond to a field generated by the fixed fields of all H(i)’s.

Conversely, let F be an intermediate extension of L/K. Then F intersects every finite Galois extension in some finite intermediate extension, yielding a corresponding sub-projective limit. These two correspondences are inverses of each other by the fundamental theorem of (finite) Galois theory. Further, the sub-projective limit defined by H(i) is a normal subgroup iff each H(i) is normal in G(i), so the second part of the fundamental theorem holds as well.

Obviously, L/F is finite iff the sub-projective limit is a finite group, and F/K is finite iff the sub-projective limit has finite index in the projective limit. In the latter case, F is contained in some finite Galois extension, so it arises from a single H(i); then for all j with i <– j, H(j) is the preimage of H(i) in G(j), and for all k <– j, H(k) = f(kj)(H(j)). Conversely, a sub-projective limit that arises from a single H(i) this way has the property that every g in the inverse limit mapping into H(i) in G(i) maps into the preimage of H(i) in each G(j) with i <– j, and is therefore in the sub-projective limit. The sub-projective limit is then the inverse image of H(i) in the projective limit, so its index in the projective limit is [G(i):H(i)], which is finite.

Galois Theory: The Primitive Element Theorem

March 5, 2007

The fundamental theorem of Galois theory immediately implies several nifty results about Galois extensions, which with a little work extend to a greater class of extensions, typically separable extensions. Here I will show that every finite separable extension L/K admits a primitive element, that is, an element a of L such that L = K(a).

First, the normal closure of a separable extension is itself separable. To see why, recall that the normal closure of L/K is the splitting field of all the minimal polynomials of the elements of L. L/K is separable, so each a in L is separable over K; but separability is a property of the minimal polynomial of the element, so all other roots of m(a) are separable, and the normal closure is generated by separable elements.

In addition, the normal closure of a finite extension is finite, since if {a1, …, a(n)} is a basis for L over K, then it’s enough to take the splitting field of the finite set of polynomials m(a(i)).

Second, since the intermediate fields of a Galois extension correspond to subgroups of a finite group, there are only finitely many intermediate fields. This property then extends to extensions that are merely separable: take M to be the normal closure of L/K, which is finite and Galois; every intermediate field of L/K is also intermediate between M and K, so there are only finitely many of them.

And third, an element not contained in any proper intermediate field is primitive, because K(a) is an intermediate field strictly containing K for a not in K. If K is finite and L has p^n elements, then take a generator of the multiplicative group of L, which is cyclic of order p^n – 1; such a generator lies in no proper subfield. Otherwise, note that every proper intermediate field is a proper subspace of L as a K-vector space, and no finite-dimensional vector space over an infinite field can be covered by finitely many proper subspaces.

Proving that last assertion is an induction argument on the dimension of the vector space K^n. If n = 1, then a proper subspace is the trivial subspace consisting only of the origin, so the theorem obviously holds. If n = 2, then a non-trivial proper subspace is a line generated by scalar multiples of some point (a, b). No line can contain two points of the form (1, a) and (1, b) with a != b, because then it will necessarily contain (0, a – b), hence (0, 1), (0, a), (1, 0), and finally the entire space. But K is infinite, so there are infinitely many points of that form, proving the theorem for n = 2.

More generally, suppose the theorem is true for all dimensions less than n. It’s enough to find a subspace of dimension n – 1 in K^n that isn’t contained in any of the proper subspaces we’re covering the space by: the covering subspaces then intersect it in proper subspaces of it, which by the induction hypothesis can’t cover it. By a dimension argument, not being contained in any of the covering subspaces is equivalent to not being equal to any of them, so it suffices to show that there are infinitely many subspaces of dimension n – 1.

But if a and b are distinct, then the subspaces spanned by {(1, 0, 0, …, 0), (0, 1, 0, …, 0), (0, 0, 1, …, 0), …, (0, …, 1, 0, 0), (0, …, 0, 1, a)} and {(1, 0, 0, …, 0), (0, 1, 0, …, 0), (0, 0, 1, …, 0), …, (0, …, 1, 0, 0), (0, …, 0, 1, b)} are distinct, by a similar argument to the one that works for K^2. That’s enough to prove the theorem.
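To see why infiniteness of K is essential, here’s a tiny Python sketch showing that over F_2 the conclusion fails: the plane is covered by its three lines through the origin.

```python
from itertools import product

# Sketch: over F_2, the plane K^2 *is* covered by its three proper
# subspaces (the lines through the origin), so K infinite is needed.
lines = [{(0, 0), (1, 0)},     # span of (1, 0)
         {(0, 0), (0, 1)},     # span of (0, 1)
         {(0, 0), (1, 1)}]     # span of (1, 1)

plane = set(product([0, 1], repeat=2))
assert plane == lines[0] | lines[1] | lines[2]
```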

Galois Theory: Examples of Galois Groups

March 1, 2007

Armed with the fundamental theorem of Galois theory, the aspiring mathematician can now go and compute Galois groups of some easy extensions.

Example #1: K = Q, L = Q(SQRT(n)) for any squarefree integer n. The minimal polynomial of SQRT(n) is x^2 – n; L is obviously normal over K, so it’s Galois and the fundamental theorem applies. The degree of L over K is 2, so Gal(L/K) has order 2. There is only one group of size 2, Z/2Z. That group has only two subgroups, the improper subgroup consisting of the entire group and the trivial subgroup consisting of the identity element 0. The non-identity element of the Galois group is the automorphism sending a + bSQRT(n) to a – bSQRT(n).

Example #2: K = R, L = C. As with example #1, this is a quadratic extension with minimal polynomial x^2 + 1, so Gal(L/K) = Z/2Z.

Example #3: K = Q, L = Q(2^(1/3), w) where w is a primitive cube root of unity. This is simply the splitting field of x^3 – 2 over Q, since the three roots are 2^(1/3), 2^(1/3)w, and 2^(1/3)w^2. The degree of the extension must divide 6 = 3! since 3 is the degree of the polynomial it’s a splitting field of; it must also be divisible by 2 since [Q(w):Q] = 2 and by 3 since [Q(2^(1/3)):Q] = 3.

There are only two groups of size 6, Z/6Z and S(3), the group of permutations of three elements. If Gal(L/K) = Z/6Z, then every subgroup of Gal(L/K) is normal since it is abelian; therefore, every intermediate field is normal, contradicting the fact that Q(2^(1/3))/Q is not normal. So Gal(L/K) = S(3). The elements of S(3) can be written in disjoint cycle notation as {(1), (1 2), (1 3), (2 3), (1 2 3), (1 3 2)}; then the proper nontrivial subgroups are generated by (1 2), (1 3), (2 3), and (1 2 3).
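A brute-force check of the subgroup structure of S(3), as a Python sketch (the permutation representation and helper names are mine):

```python
from itertools import permutations

# Sketch: enumerate the cyclic subgroups of S(3) and check that the only
# proper nontrivial normal subgroup is the one generated by the 3-cycles.
S3 = list(permutations(range(3)))
identity = tuple(range(3))

def compose(g, h):                    # (g o h)(x) = g(h(x))
    return tuple(g[h[x]] for x in range(3))

def generated(g):                     # cyclic subgroup generated by g
    H, x = set(), identity
    while x not in H:
        H.add(x)
        x = compose(g, x)
    return H

def is_normal(H):
    inv = {g: next(h for h in S3 if compose(g, h) == identity) for g in S3}
    return all(compose(compose(g, h), inv[g]) in H for g in S3 for h in H)

subgroups = {frozenset(generated(g)) for g in S3}
proper = [H for H in subgroups if 1 < len(H) < 6]
assert sorted(len(H) for H in proper) == [2, 2, 2, 3]     # three transposition subgroups, one A(3)
assert [len(H) for H in proper if is_normal(H)] == [3]    # only A(3) is normal
```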

Assigning 1 to 2^(1/3), 2 to 2^(1/3)w, and 3 to 2^(1/3)w^2, we get that (1 2) fixes Q(2^(1/3)w^2), (1 3) fixes Q(2^(1/3)w), (2 3) fixes Q(2^(1/3)), and (1 2 3) and (1 3 2) fix Q(w). The only one of the four fixed fields that is normal is Q(w), corresponding to the fact that {(1), (1 2 3), (1 3 2)} is the only proper nontrivial normal subgroup of S(3).

Example #4: K = Q, L = Q(2^(1/4), i) where i is the imaginary unit. This is the splitting field of x^4 – 2, which has degree divisible by 4 and dividing 8. To see that it has degree 8, it’s enough to prove that x^4 – 2 is irreducible over Q(i). But it has no root in Q(i), by taking norms, so it has no linear factor. If it has two quadratic factors, then they can be written as x^2 + ax + b and x^2 – ax – 2/b; equating x-coefficients gives a(2/b + b) = 0, i.e. a(2 + b^2) = 0, so a = 0 since x^2 + 2 has no root in Q(i). But then equating x^2-coefficients yields b – 2/b = 0, i.e. b^2 = 2, which has no solution in Q(i), a contradiction.

There are five groups of size 8: Z/8Z, Z/4Z * Z/2Z, Z/2Z * Z/2Z * Z/2Z, Q(8), and D(4). D(n) is the group of symmetries of a regular polygon with n sides, where D(3) = S(3); note that notations vary, and some people write what I call D(n) as D(2n). Q(8) = \{\pm1, \pm i, \pm j, \pm k\} where i^{2} = j^{2} = k^{2} = ijk = -1; although it is not abelian, all of its subgroups are normal. But L/K has the non-normal intermediate field Q(2^(1/4)), so Gal(L/K) = D(4).

Example #5: L and K are finite fields (classified here), with K = F(p) and L = F(p^n). L is the splitting field for x^(p^n) – x over K, and K is a perfect field, so L/K is Galois. Hence, Gal(L/K) has size equal to [L:K] = n. In fact, it’s isomorphic to Z/nZ, generated by the Frobenius automorphism, which sends each a to a^p.

To see that the Frobenius automorphism f generates Gal(L/K), consider its fixed field. The fixed field contains K, obviously. In addition, a^p = a iff a is a root of the polynomial x^p – x; the p elements of K are then exactly the roots of the polynomial, so L^{f} = K. Now, consider the subgroup of Gal(L/K) consisting of powers of f. Its fixed field is clearly K, so it must be the improper subgroup Gal(L/K); hence, f indeed generates Gal(L/K).

The group Z/nZ has one subgroup per divisor of n. The subgroup whose size is the divisor d is generated by n/d, so its fixed field is the set of all roots of x^(p^(n/d)) – x in L. But that’s just F(p^(n/d)), confirming that L/K has one intermediate field for each divisor of [L:K]. Note that this also solves the case of the extension L/K where K is any finite field: if K = F(p^m), then Gal(L/K) is the largest subgroup of Gal(L/F(p)) fixing K, generated by the automorphism that sends a to a^(p^m).
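Here’s a computational sketch of the Frobenius automorphism on F_9, realized (my choice of model) as F_3[x]/(x^2 + 1):

```python
from itertools import product

# Sketch: F_9 as pairs (a, b) standing for a + bx with x^2 = -1 over F_3.
# The Frobenius map z -> z^3 is an automorphism of order 2 whose fixed
# field is exactly F_3.
p = 3

def add(u, v):
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)

def mul(u, v):                      # (a + bx)(c + dx) with x^2 = -1
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def frob(z):                        # z -> z^3
    return mul(z, mul(z, z))

F9 = list(product(range(p), repeat=2))
for u in F9:
    for v in F9:
        assert frob(add(u, v)) == add(frob(u), frob(v))   # additive
        assert frob(mul(u, v)) == mul(frob(u), frob(v))   # multiplicative

assert all(frob(frob(z)) == z for z in F9)                # order divides 2
fixed = [z for z in F9 if frob(z) == z]
assert fixed == [(a, 0) for a in range(p)]                # fixed field is F_3
```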

Visualizing Math

February 28, 2007

Lynet writes about how she tried explaining the Poincaré Conjecture to a literary theorist and a historian. The bone of contention wasn’t really the conjecture, which the media dumbed down just enough so that non-mathematicians could understand a statement vaguely resembling what Perelman actually proved. Rather, it was how mathematicians could visualize a four-dimensional world.

“Can mathematicians actually picture four dimensional space?”

“Roger Penrose says he did it – briefly – once,” I said, grinning [1].

“No, but are there people out there who can actually…”

“Not that I know of.”

My historian friend was relieved. My literary theorist friend was confused. “If you can’t picture it,” he asked, “how could you have any intuition about it? I mean, you could just say whatever you wanted about it and no-one would be able to refute you.”

I’ll leave it up to you to make an appropriate snarky comment about literary theorists. I’d reply to Lynet’s friend by noting that at least the simpler intuitions, namely, those relating to the vector space structure, are easily generalizable.

I can view points in the plane as pairs of coordinates (x, y), and work out things like angles between lines, lengths of lines, tangent lines to curves, functions on the plane, and so on. Regarding the point (x, y) as a vector from (0, 0) to (x, y), I can add vectors pointwise by (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2) and multiply them by scalars by k(x, y) = (kx, ky). I can look at linear transformations of the plane, or even affine transformations.

All of those are intuitively thought of using very concrete notions: length and angle are measurable concepts; tangent lines touch curves at only one point; vector addition consists of walking from (0, 0) to (x1, y1) and then along the same direction and for the same distance as from (0, 0) to (x2, y2); linear transformations are combinations of rotations, reflections, shears, stretches, and compressions, while affine transformations add translations.

But in higher mathematics, they’re considered abstractly, in order to generalize as much as possible. Length is defined using Pythagoras’s theorem, and angle is defined using inner products. Linear transformations are defined by the more easily generalized property that T(v1 + v2) = T(v1) + T(v2) and T(kv) = kT(v), which coincides with the more concrete definition in two dimensions.
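As a tiny illustration that the concrete picture satisfies the abstract conditions, here’s a Python sketch checking the linearity equations for a rotation (angle and vectors chosen arbitrarily):

```python
from math import cos, sin, pi, isclose

# Sketch: a rotation of the plane, defined concretely, satisfies the
# abstract conditions T(v1 + v2) = T(v1) + T(v2) and T(kv) = kT(v).
def rotate(v, theta=pi / 6):
    x, y = v
    return (x * cos(theta) - y * sin(theta),
            x * sin(theta) + y * cos(theta))

v1, v2, k = (1.0, 2.0), (-3.0, 0.5), 4.0
s = (v1[0] + v2[0], v1[1] + v2[1])
Ts, Tv1, Tv2 = rotate(s), rotate(v1), rotate(v2)
assert all(isclose(Ts[i], Tv1[i] + Tv2[i]) for i in range(2))

Tk = rotate((k * v1[0], k * v1[1]))
assert all(isclose(Tk[i], k * Tv1[i]) for i in range(2))
```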

Not coincidentally, for two or three years of university, students only ever see algebraic arguments, which don’t involve visualizing anything. Later some geometric aspects return, but even they are typically schematic; people who draw a line with a loop in it in algebraic geometry only look at very general aspects, like having a point with two tangent lines and not being decomposable into two lines.

The Carnival of Mathematics #2 is Up

February 23, 2007

Two weeks ago, I put up a carnival with posts from slightly more than 10 different people. Now Mark Chu-Carroll has posted his edition of the Carnival of Mathematics, with 27 different posters and a fitting theme.

Next edition will be posted on March 9th on Michi’s blog. Send your submissions to Michi, or to me, or use the submission form.


February 22, 2007

I’m not going to skewer the radical right’s attempt to relativize Wikipedia in full; better bloggers than me have already done so. But looking at Conservapedia’s mathematics entries is a good reminder that polemical hacks don’t usually produce any useful knowledge.

The combined knowledge of Wikipedia’s NPOV editors has produced a page about the prime number theorem that explains at length how the theorem relates to the Riemann zeta function and how the Riemann hypothesis implies a better estimate, and derives some explicit bounds. The first section, comprising only a small part of the article, says,

Let π(x) be the prime counting function that gives the number of primes less than or equal to x, for any real number x. For example, π(10) = 4 because there are four prime numbers (2, 3, 5 and 7) less than or equal to 10. The prime number theorem then states that the limit of the quotient of the two functions π(x) and x / ln(x) as x approaches infinity is 1. Using Landau notation this result can be written as

\pi(x)\sim\frac{x}{\ln x}.

This does not mean that the limit of the difference of the two functions as x approaches infinity is zero.

Based on the tables by Anton Felkel and Jurij Vega, the theorem was conjectured by Adrien-Marie Legendre in 1796 and proved independently by Hadamard and de la Vallée Poussin in 1896. Both proofs used methods from complex analysis, specifically the properties of the Riemann zeta function and where the function was non-zero.
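As an aside, the asymptotic in the quoted passage is easy to spot-check numerically; a sieve-based Python sketch (my own code):

```python
from math import log

# Sketch: compute pi(x) with a sieve and watch pi(x) / (x / ln x) drift toward 1.
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i in range(n + 1) if sieve[i]]

primes = primes_up_to(10**6)
for x in (10**3, 10**4, 10**5, 10**6):
    pi_x = sum(1 for p in primes if p <= x)
    print(x, pi_x, pi_x / (x / log(x)))
# The ratio creeps toward 1 slowly: about 1.16 at 10^3 and 1.08 at 10^6.
```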

Meanwhile, the editors of Conservapedia, constrained by the requirements of a radical ideology that displays every radical pathology in the book (for a really egregious example of symbolism, check out the Conservapedia policy on British vs. American spelling), have produced the following article:

The Prime Number Theorem is one of the most famous theorem in mathematics. It states that the number of primes not exceeding n is asymptotic to \frac{n}{\log n}, where log(n) is the logarithm of (n) to the base e.

The number of primes not exceeding n is commonly written as π(n), and an asymptotic relationship between a(n) and b(n) is commonly designated as a(n)~b(n). (This does not mean that a(n)-b(n) is small as n increases. It means the ratio of a(n) to b(n) approaches one as n increases.)

The Prime Number Theorem thus states that π(n)~n / log(n) .

In other words, the limit (as n approaches infinity) of the ratio of pi(n) to n/log(n) is one. Put a third way, n/log(n) is a good approximation for π(n).

Gauus [sic] conjectured the equivalent statement that π(x) was asymptotic to Li(x) defined as:

\mbox{Li}(x) = \int_2^x \frac{dt}{\ln t}.

In fact, for large x this turns out to be a better approximation than π(x).

Now, you might say I’m just picking and choosing, and other articles could be better. In fact, I’m picking and choosing here in Conservapedia’s favor; the prime number theorem is one of the few mathematical entries that even exist on Conservapedia. I could compare the articles on the Langlands program, or local rings, or global fields, or the Riemann hypothesis; on those subjects there is no Conservapedia article. Conservapedia doesn’t even have an article on mathematics.

You might also say that Conservapedia is a young project, so I shouldn’t be comparing it to a 6-year-old encyclopedia. Alright; the news on Conservapedia goes back a month, so just compare the math there to the math posts I’ve put up in the month of February. On 2/1, I put up a basic concepts post that could make it to an encyclopedia. That took me maybe an hour net to write; how come the Conservapedia editors can’t come up with something better than a few stubs in a month?

Mark CC’s takedown is a good read; Conservapedia complains that Wikipedia doesn’t use “elementary proofs.” But Mark makes a slight mistake about elementary proofs:

There is currently an entry on “Elementary Proof” on Wikipedia, but to be fair, it was created just two weeks ago, most likely in response to this claim by conservapedia.

But that’s trivial. The important thing here is that the concept of “elementary proof” is actually a relatively trivial one. It’s sometimes used in number theory, when they’re trying to pare down the number of assumptions required to prove a theorem. An elementary proof is a proof which makes use of the minimum assumptions that describe the basic properties of real numbers. And even in the case of number theory, I don’t think I’ve ever heard anyone seriously argue that an elementary proof is more rigorous than another proof of the same theorem. Elementary proofs might be easier to understand – but that’s not a universal statement: many proofs that make use of things like complex numbers are easier to understand than the elementary equivalent. And I have yet to hear of anything provable about real numbers using number theory with complex numbers which can be proven false using number theory without the complex – proofs about real numbers that use complex are valid, rigorous, and correct.

The concept of elementary proof is fairly relative. In number theory, it means no complex analysis, and Mark’s assessment is entirely valid. But in other subjects, it can mean something slightly different. When I took advanced group theory three semesters ago, my professor, an arithmetic geometer/number theorist, told me that to him, “elementary” in a group-theoretic context meant no cohomology. There are certainly deeper techniques than just complex analysis; suffice it to say that if someone discovers a proof of Fermat’s Last Theorem that utilizes complex analysis but only at the level of the prime number theorem, his proof will likely be considered more elementary than Wiles’, which uses modular forms, Iwasawa theory, and other state-of-the-art gadgets.

Galois Theory: The Fundamental Theorem

February 21, 2007

After the preliminaries of field automorphisms and the conditions of normality and separability, I can prove the fundamental theorem of Galois theory. The basic idea is to relate intermediate fields of L/K and subgroups of Aut(L/K). Every intermediate field F maps to Aut(L/F), regarded as a subgroup of Aut(L/K), and every subgroup H of Aut(L/K) maps to L^H.

From the previous post, those maps are inverses of each other when L/K is finite and K = L^Aut(L/K); in that case, Aut(L/L^H) = H and L^Aut(L/F) = F. The ideal case in Galois theory is when K is indeed equal to L^Aut(L/K). In particular, the fundamental theorem states that this case holds precisely when L/K is separable and normal.

First, two examples. Let K = Q and L = Q(2^(1/3)). The only root of the minimal polynomial of 2^(1/3), x^3 – 2, that is contained in L is 2^(1/3). Automorphisms fairly obviously preserve minimal polynomials, so any automorphism must be the identity on 2^(1/3). But that generates L over K, so the only automorphism is the identity. This implies that L^Aut(L/K) = L.

For the second example, let K = \mathbb{Z}_{p}(t), L = \mathbb{Z}_{p}(\sqrt[p]{t}). The minimal polynomial of t^(1/p) over K is x^p – t = (x – t^(1/p))^p, so t^(1/p) is again the only root lying in L. And again, as it generates L over K, every K-automorphism of L must be the identity, so that L^Aut(L/K) = L.

Before I prove the fundamental theorem, there are two small preliminaries. One, from [L:L^Aut(L/K)] = |Aut(L/K)| it follows that L^Aut(L/K) = K iff |Aut(L/K)| = [L:K]. And two, a not necessarily finite extension is normal and separable iff it is the splitting field of a family of separable polynomials. The latter result follows from the fact that if L/K is normal then it is a splitting field of a family of polynomials, and if it is separable then they all have to be separable; conversely, a splitting field of a family of separable polynomials is normal, and generated by separable elements.

For one direction, suppose that L^Aut(L/K) = K, and let a be any element of L. Let a1 = a, a2, a3, …, a(m) be the distinct images of a under the various elements of Aut(L/K); note that I’m assuming that the field extension is finite. Then under each element g of Aut(L/K), the set of a(i)’s gets mapped to itself, because g(a(i)) = gh(a) for some h in Aut(L/K), and g(a(i)) = g(a(j)) implies that a(i) = a(j) since g is an automorphism.

Consider the polynomial f(x) = (x – a1)(x – a2)…(x – a(m)). The polynomial is invariant under the action of Aut(L/K), since every g in it merely permutes the a(i)’s. Therefore the coefficients of the polynomial are in L^Aut(L/K) = K, and the polynomial is in K[x]. The minimal polynomial of a then divides f, so that it inherits the properties of splitting over L and having distinct roots. As this applies to any a in L, it makes L/K separable and normal.

Conversely, it’s enough to show that Aut(L/K) acts transitively on the roots of each minimal polynomial m_{a}(x), a \in L. In other words, it’s enough to show that if a and b have the same minimal polynomial over K, then there exists a K-automorphism g of L such that g(a) = b. This is because if a is any element of L outside K, then deg(m(a)) > 1. By normality, all roots of m(a) are in L, and by separability they’re distinct, so there exists a b != a such that g(a) = b for some g, so that a is not in L^Aut(L/K).

To see that the action of Aut(L/K) is indeed transitive, let a and b have the same minimal polynomial over K. Then K(a) = K[x]/(m(a)) = K[x]/(m(b)) = K(b), so there exists a K-isomorphism from K(a) to K(b) mapping a to b. In L, m(a) = m(b) splits, so that the isomorphism from K(a) to K(b) can be extended to an automorphism of L fixing K and sending a to b.

If L/K is separable and normal, it’s called a Galois extension, and Aut(L/K) is called the Galois group of L/K and denoted Gal(L/K). Both separability and normality are inherited from L/K to L/F, where F is an intermediate field, so we get a bijection between the intermediate fields F and the subgroups of Gal(L/K), sending each F to Gal(L/F).

If L/K is Galois, then F/K is separable; however, as the case of L = Q(2^(1/3), w), K = Q, F = Q(2^(1/3)) readily shows, F/K need not be normal. This motivates the second fundamental theorem of Galois theory, which states that F/K is normal iff Gal(L/F) is a normal subgroup of Gal(L/K), and that in that case, Gal(F/K) is the quotient group Gal(L/K)/Gal(L/F).

For one direction, suppose that F/K is normal. Every automorphism of L sends a to an element with minimal polynomial m(a), so the normality of F implies that if a is in F, then so is g(a) for every g in Gal(L/K). In that case, g can be regarded as a homomorphism from F to itself, i.e. an automorphism of F.

Hence there is a group homomorphism from Gal(L/K) to Gal(F/K) defined by restriction to F. The kernel of the homomorphism is \{g \in \textrm{Gal}(L/K): \forall a \in F, g(a) = a\} = \textrm{Gal}(L/F), making Gal(L/F) a normal subgroup since kernels are normal. The homomorphism is surjective by a counting argument, so Gal(F/K) = Gal(L/K)/Gal(L/F).

For the other, if E and F are two intermediate extensions such that Gal(L/E) = H and Gal(L/F) = J, then there exists g in Gal(L/K) with g(E) = F iff there exists g in Gal(L/K) with gHg^(-1) = J. To see why, suppose that g(E) = F; then for every h in H and a in F, ghg^{-1}(a) = g(h(g^{-1}(a))) = g(g^{-1}(a)) = a since g^{-1}(a) \in E, and by a counting argument J = gHg^(-1). Conversely, if gHg^(-1) = J, then for every a in E and h in H, ghg^{-1}(g(a)) = gh(a) = g(a) so J fixes g(E) and g(E) = F.

Hence, if Gal(L/F) is normal, then for every g in Gal(L/K), gGal(L/F)g^(-1) = Gal(L/F). Thus g(F) = F, or else Gal(L/F) would not be normal, because different intermediate fields have different Galois groups. But if F/K is not normal, there exists some a in F that shares a minimal polynomial with some b in L but not in F. Since Gal(L/K) acts transitively on the roots of each minimal polynomial, there exists some g in Gal(L/K) such that g(a) = b, so that g(F) != F, which is a contradiction.

To summarize, the fundamental theorem of Galois theory states that the following are equivalent for a finite extension L/K:

1. L/K is separable and normal, i.e. Galois;

2. L/K is the splitting field of a separable polynomial;

3. L^Aut(L/K) = K;

4. |Aut(L/K)| = [L:K].

For an intermediate field F of a Galois extension L/K, the following are equivalent:

1. F/K is normal;

2. F/K is Galois;

3. Gal(L/F) is normal in Gal(L/K), in which case the quotient group is Gal(F/K);

4. For every g in Gal(L/K), g(F) = F.

Among other things, if L/K is Galois, then every polynomial over L that is invariant under the action of Gal(L/K) has coefficients in K. In algebraic number theory, that’s the easiest way of proving that the norm and trace of an element do indeed belong to the ground field (i.e. Q, classically).

Next time, I’ll be more concrete and compute a few Galois groups, as well as point out a few general computational tricks.

Galois Theory: Field Automorphisms

February 20, 2007

An automorphism of any mathematical object is a bijective homomorphism from the object to itself. It’s easy to see that the automorphisms of any object form a group: the composition of homomorphisms is itself a homomorphism, function composition is associative, there exists an identity function, and the inverse of a bijective homomorphism is a homomorphism. Recall that a homomorphism of fields is one that satisfies f(a + b) = f(a) + f(b), f(ab) = f(a)f(b).

The automorphism group of a field L is denoted Aut(L). If L/K is a field extension, then the automorphism group of the extension, Aut(L/K), is the subgroup of Aut(L) consisting of all automorphisms of L that fix every element of K, i.e. \textrm{Aut}(L/K) = \{f \in \textrm{Aut}(L): \forall a \in K, f(a) = a\}.

First, distinct elements of Aut(L) are linearly independent over L. Denoting elements of L by a, b, c, etc., and automorphisms by f, g, etc., this means that if f_{1}, ..., f_{n} are distinct automorphisms and a_{1}f_{1} + a_{2}f_{2} + ... + a_{n}f_{n} = 0 as a function on L, with a_{i} \in L, then a_{1} = ... = a_{n} = 0.

To see why, suppose otherwise and let n be the minimal integer for which the equation holds for some nonzero a(i)’s. Clearly, this implies that all a(i)’s are nonzero. As the automorphisms are distinct, there is some b with f_{1}(b) \neq f_{n}(b). For every a in L, a_{1}f_{1}(ab) + a_{2}f_{2}(ab) + ... + a_{n}f_{n}(ab) = 0 and f_{n}(b)(a_{1}f_{1}(a) + a_{2}f_{2}(a) + ... + a_{n}f_{n}(a)) = 0; subtracting the two equations, we get a_{1}(f_{1}(b) - f_{n}(b))f_{1}(a) + ... + a_{n-1}(f_{n-1}(b) - f_{n}(b))f_{n-1}(a) = 0, whose first coefficient is nonzero, contradicting the minimality of n. Note that f_{1}(b) - f_{n}(b) is a nonzero constant in L rather than an automorphism.
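
Here's a small numerical illustration (a sketch of mine, not part of the argument): the two automorphisms of Q(sqrt(2)) — the identity and conjugation — are linearly independent, because evaluating a supposed dependence at 1 and sqrt(2) yields a homogeneous system with nonsingular matrix.

```python
import math

# The two automorphisms of Q(sqrt(2)): identity and conjugation
# a + b*sqrt(2) -> a - b*sqrt(2). Elements are pairs (a, b), evaluated
# here as real numbers.
def f1(x):  # identity
    a, b = x
    return a + b * math.sqrt(2)

def f2(x):  # conjugation
    a, b = x
    return a - b * math.sqrt(2)

# If c1*f1 + c2*f2 = 0 as a function, evaluating at 1 = (1, 0) and
# sqrt(2) = (0, 1) gives a 2x2 homogeneous system in (c1, c2). Its
# matrix has nonzero determinant, so c1 = c2 = 0 is the only solution.
det = f1((1, 0)) * f2((0, 1)) - f2((1, 0)) * f1((0, 1))
assert abs(det) > 1e-9
```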

Every subset of Aut(L) has a fixed field, consisting of all elements of L fixed by every element in the subset. That the fixed field is indeed a field follows trivially from the definition of a homomorphism. The fixed field of the subset S of Aut(L) is denoted L^{S}, but I’ll sometimes ASCIIfy it to L^S. The pair of fields – L and L^S – form an extension, so we can try relating the extension to the subset.

First, if S is finite, then [L:L^S] >= |S|. To see why, let [L:L^S] = n and suppose on the contrary that |S| > n. Let b(i) from i = 1 to n be a basis for L over L^S, and let f(j) from j = 1 to n + 1 be distinct elements of S. The system of equations f_{1}(b_{i})x_{1} + f_{2}(b_{i})x_{2} + ... + f_{n+1}(b_{i})x_{n+1} = 0 from i = 1 to n in the variables x(j) in L has n + 1 unknowns and n equations, so it has a solution other than x1 = x2 = … = x(n + 1) = 0.

For that solution, x_{1}f_{1}(k_{1}b_{1} + ... + k_{n}b_{n}) + ... + x_{n+1}f_{n+1}(k_{1}b_{1} + ... + k_{n}b_{n}) = 0 for all k(i) in L^S; since every element of L can be written as k_{1}b_{1} + ... + k_{n}b_{n}, this implies that every x(j) = 0 by the linear independence of automorphisms. That’s clearly a contradiction, so [L:L^S] >= |S|.

Second, if S is a subgroup rather than just a subset, then in fact [L:L^S] = |S|. To see why, suppose on the contrary that |S| < [L:L^S]. Let |S| = n; then there exist n + 1 elements of L that are linearly independent over L^S, say b(i). The system of equations f_{j}(b_{1})x_{1} + ... + f_{j}(b_{n+1})x_{n+1} = 0 has n equations and n + 1 unknowns, so it has a nonzero solution. Choose a solution with a minimal number of nonzero values of x(i), say m, and relabel the b(i)’s to make these the first m variables.

Now, S is a group, so it contains the identity, say f1. The linear independence of the b(i)’s and the resulting equality b_{1}x_{1} + ... + b_{m}x_{m} = 0 imply that not all x(i)’s are in L^S. So divide throughout to get x(m) = 1, which is in L^S; without loss of generality x1 is not in L^S, so there is some f2 with f2(x1) != x1 = f1(x1). For every j, we have f_{2}(f_{j}(b_{1})x_{1} + ... + f_{j}(b_{m})x_{m}) = 0; since S is a group, we can relabel the index gotten from composing f2 and f(j) as j, to get f_{j}(b_{1})f_{2}(x_{1}) + ... + f_{j}(b_{m})f_{2}(x_{m}) = 0. But x(m) is in L^S while x1 isn’t, so by subtraction we get a solution with at least 1 but at most m – 1 nonzero values of x(i), contradicting the minimality of m. Thus [L:L^S] = |S|.

The above results help us relate subgroups of a finite subgroup G of Aut(L) to intermediate fields of L/L^G. To see why, if H is a subgroup of G, then [L:L^H] = |H|, and by the tower law, [L^H:L^G] = [G:H]. Further, Aut(L/L^H) = H, because every element of H clearly fixes L^H, and if Aut(L/L^H) has any additional elements, then [L:L^S] = |S| implies that it has a smaller fixed field than L^H, a contradiction.

Conversely, if E is an intermediate field of L/L^G, then L^Aut(L/E) = E, because by construction L^Aut(L/E) contains E, and if it were any bigger, then Aut(L/E) would have fewer than [L:E] elements, another contradiction.

WordPress Installs LaTeX (and I Classify Finite Fields)

February 17, 2007

Hat-tip to Rod: WordPress has a new \LaTeX feature for mathematicians like me. I like Whig’s take on it the most; mine is going to be more prosaic.

Let K = \mathbb{F}_{p} = \mathbb{Z}/p\mathbb{Z}, and let L be an extension of K of degree n < \infty. Then L is unique up to isomorphism among all degree-n extensions. To see why, note that |L| = p^{n} \Rightarrow \forall a \in L^{*}: a^{p^{n}-1} = 1, so L is the splitting field for x^{p^{n}-1} - 1 over K. As splitting fields are unique up to isomorphism, the result follows.

Conversely, for every n, there exists a field L with [L:K] = n. To prove that, it’s enough to prove that the splitting field of x^{p^{n}-1} - 1 has exactly p^{n} elements. The polynomial has a nonzero constant term, so it doesn’t have zero as a root.

If the field has fewer than p^{n}-1 nonzero elements, then the polynomial must have repeated roots. But its derivative is (p^{n}-1)x^{p^{n}-2} = -x^{p^{n}-2}, whose sole irreducible factor, x, doesn’t divide x^{p^{n}-1} - 1; so the polynomial has no repeated roots, a contradiction.

To see that it doesn’t have more than p^{n} elements, note that the roots of the polynomial together with 0 form a field. This is because if a^{p^{n}} = a and b^{p^{n}} = b, then (ab)^{p^{n}} = ab clearly, and (a + b)^{p^{n}} = a + b by repeatedly applying the Frobenius automorphism.

That completes the classification of finite fields, which states that there is a unique finite field for each prime power order, and no finite field with an order that isn’t a prime power.
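
The classification can be checked concretely for small cases. Here's a Python sketch (my own construction, just for illustration) building GF(9) as F_3[x]/(x^2 + 1) and verifying that every element satisfies a^(p^n) = a and every nonzero element satisfies a^(p^n – 1) = 1.

```python
# Build GF(9) = F_3[x]/(x^2 + 1); x^2 + 1 is irreducible mod 3 because
# -1 is not a square mod 3. Elements are pairs (a, b) representing a + b*x.
p, n = 3, 2

def mul(u, v):
    a, b = u
    c, d = v
    # (a + b*x)(c + d*x) = ac + (ad + bc)x + bd*x^2, with x^2 = -1
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def power(u, k):
    result = (1, 0)
    for _ in range(k):
        result = mul(result, u)
    return result

elements = [(a, b) for a in range(p) for b in range(p)]

# Every element is a root of x^(p^n) - x, so GF(9) is exactly the set of
# roots of that polynomial; the nonzero ones satisfy a^(p^n - 1) = 1.
assert all(power(u, p ** n) == u for u in elements)
assert all(power(u, p ** n - 1) == (1, 0) for u in elements if u != (0, 0))
```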

On another note, be nice to me. If you ask politely, I might go back to my earlier math posts and edit them to incorporate \LaTeX.

Fibonacci-Type Sequences, Part 2

February 15, 2007

Recall from part 1 that a sequence is said to be of Fibonacci type if it’s given by the recursion relation a(n + t) = k(t)a(n + t – 1) + k(t – 1)a(n + t – 2) + … + k1a(n), with set initial conditions on a1, a2, …, and a(t). Recall also that every such sequence is given by a linear combination of sequences of the form a(n) = (n^k)(r^n), where r is a root of the associated polynomial x^t – k(t)x^(t – 1) – k(t – 1)x^(t – 2) – … – k1 = 0 and k is a number between 0 and one less than the multiplicity of r.
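
For the classic example, here's a quick Python check (the code is mine, purely illustrative): the Fibonacci numbers have associated polynomial x^2 – x – 1 with simple roots phi and psi, and the resulting closed form (Binet's formula) matches the recursion.

```python
import math

# Roots of the associated polynomial x^2 - x - 1
phi = (1 + math.sqrt(5)) / 2
psi = (1 - math.sqrt(5)) / 2

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The general theory predicts a(n) = c1*phi^n + c2*psi^n; fitting the
# initial conditions gives c1 = -c2 = 1/sqrt(5).
for n in range(1, 20):
    closed_form = (phi ** n - psi ** n) / math.sqrt(5)
    assert abs(closed_form - fib(n)) < 1e-6
```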

A Carnival of Mathematics submission, which notes how a convoluted sequence generates all integers not divisible by 2, 3, or 5, has just enough material relevant to Fibonacci-type sequences to prompt me to write a follow-up, showing a few additional properties of those sequences.

1. Trivially, a sequence is Fibonacci-type iff it can be written as a sum c1(n^k1)(r1^n) + … + c(m)(n^k(m))(r(m)^n). The only if direction is clear from the calculation in the previous post, while the if direction follows by creating a polynomial for which every r(i) is a root of multiplicity at least k(i) + 1, such as the least common multiple of (x – r(i))^(k(i) + 1) over all i.

2. Every periodic sequence is Fibonacci-type. This is because a periodic sequence is defined by the relation a(n + t) = a(n) for some t, which provides a suitable recursion relation.

3. The sum and difference of two Fibonacci-type sequences are themselves Fibonacci-type. This follows directly from #1, because there is no restriction on the possible values of r(i) and k(i).

4. The product of two Fibonacci-type sequences is Fibonacci-type. To see this, multiply elements of the form (n^k(i))(r(i)^n) pointwise. We have (n^k1)(r1^n)(n^k2)(r2^n) = (n^(k1 + k2))((r1r2)^n).
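
As a concrete check of #4 (a sketch of mine): multiplying the Fibonacci numbers (roots phi, psi of x^2 – x – 1) by 2^n (root 2) pointwise gives a sequence whose basis roots are 2*phi and 2*psi, i.e. with associated polynomial (x – 2*phi)(x – 2*psi) = x^2 – 2x – 4.

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Pointwise product of F(n) and 2^n
h = [fib(n) * 2 ** n for n in range(30)]

# Predicted recursion from x^2 - 2x - 4: h(n+2) = 2h(n+1) + 4h(n)
assert all(h[n + 2] == 2 * h[n + 1] + 4 * h[n] for n in range(28))
```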

5. If the associated polynomial of a(n) is f(x) and that of b(n) is g(x), then that of a(n) + b(n) divides the least common multiple of f and g. This follows from the fact that if r is a root of the new associated polynomial of multiplicity k + 1, then (n^k)(r^n) appears somewhere in a(n) + b(n), so it must appear in a(n) or b(n).

6. A Fibonacci-type sequence can be continued to zero and negative values of n by reversing the recursion relation to a(n) = (a(n + t) – k(t)a(n + t – 1) – … – k2a(n + 1))/k1.
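
For example (a short sketch of mine), running the Fibonacci recursion backwards via #6 gives F(0) = 0, F(-1) = 1, F(-2) = -1, and in general F(-n) = (-1)^(n+1) F(n).

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_ext(n):
    if n >= 0:
        return fib(n)
    # reversed recursion: a(n) = a(n + 2) - a(n + 1)
    return fib_ext(n + 2) - fib_ext(n + 1)

assert [fib_ext(n) for n in range(0, -6, -1)] == [0, 1, -1, 2, -3, 5]
```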

7. Specifying t distinct points of a Fibonacci-type sequence and its recursion relation is enough to determine it. Usually the points specified are a1, a2, …, and a(t), but any t points will suffice as long as the t linear equations they provide in the basis elements (n^k)(r^n) are independent, allowing recovery of all values of c(i). These points can of course correspond to negative values of n.

8. Given any Fibonacci-type sequence a(n), the shifted sequence b(n) = a(n + 1) is Fibonacci-type. This is obvious from the recursion relations, which are invariant under shifting. Changing n to n + 1 changes (n^k)(r^n) to ((n + 1)^k)(r^(n + 1)) = r((n + 1)^k)(r^n) = r(n^k + kn^(k – 1) + (k(k – 1)/2)n^(k – 2) + … + kn + 1)(r^n), which corresponds to the same polynomial as (n^k)(r^n), namely (x – r)^(k + 1). Using #6, this also applies to the shifted sequence b(n) = a(n – 1), except of course with a slightly modified change to the basis elements.

9. It makes sense to define the derivative of a(n), a'(n), as a(n) – a(n – 1); it shares some characteristics with the derivative of a function. If a(n) is Fibonacci-type, then so is a'(n) from #3 and #8. By observing the action of differentiation on each (n^k)(r^n), it follows that a'(n) obeys the same recursion relations as a(n).

10. The general root r of the associated polynomial of a(n) appears in the associated polynomial of a'(n) with the same multiplicity, unless r = 1. That the multiplicity in a'(n) is no higher than in a(n) follows from #9. For the other direction, if the multiplicity of r in a(n) is k + 1, and the coefficient of (n^k)(r^n) in a(n) is the nonzero number c1, then the coefficient of (n^k)(r^n) in a'(n) is c1(r – 1)/r, from #8. But if r = 1, then the multiplicity goes down by 1, since r – 1 = 0. Indeed, observe that n^k – (n – 1)^k = kn^(k – 1) – (k(k – 1)/2)n^(k – 2) + … + (-1)^(k + 1) is a polynomial of degree k – 1.
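
As a check of #9 and #10 (my own sketch): the derivative of the Fibonacci sequence is F(n) – F(n – 1) = F(n – 2), a shift of the same sequence, so it obeys the same recursion; since 1 is not a root of x^2 – x – 1, no multiplicity drops.

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# a'(n) = a(n) - a(n - 1) for the Fibonacci sequence
deriv = [fib(n) - fib(n - 1) for n in range(2, 30)]

# Same recursion as the original sequence...
assert all(deriv[k + 2] == deriv[k + 1] + deriv[k] for k in range(26))
# ...and in fact it's just the sequence shifted by two.
assert deriv == [fib(n - 2) for n in range(2, 30)]
```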

11. It makes sense to define the inverse differentiation operator, integration, by int(a)(n) = a1 + a2 + a3 + … + a(n). It’s not difficult to see that int(a)'(n) = a(n). If a(n) is Fibonacci-type then so is int(a)(n), with the same associated polynomial except that the multiplicity of 1 as a root goes up by 1. Proving it is complicated, so I’ll do it in stages:

11a. If a(n) = n^k, then int(a)(n) is a polynomial of degree k + 1. For that, let p(n) = c(k + 1)n^(k + 1) + c(k)n^k + … + c0. We want the n^k-coefficient of p(n – 1) to be 1 less than that of p(n), so that c(k + 1)(n^(k + 1) – (n – 1)^(k + 1)) = n^k + q(n) where deg(q) < k; for that, we expand (n – 1)^(k + 1) binomially to get c(k + 1) = 1/(k + 1). By a similar process, we equate the c(k) coefficients to get c(k) = 1/2, and so on until we get c0 = 0. That polynomial was constructed to have the recursion relation p(n) = p(n – 1) + n^k and the initial condition p(0) = 0, so it’s indeed int(a)(n).

11b. If a(n) = r^n and r != 1, then int(a)(n) is a multiple of r^n plus a constant. To see why, note that (r – 1)(r^n + r^(n – 1) + … + r) telescopes to r^(n + 1) – r, so that int(r^n) = (r^(n + 1) – r)/(r – 1) = (r/(r – 1))r^n – r/(r – 1). That turns int(a)(n) into a Fibonacci-type sequence whose associated polynomial is (x – r)(x – 1). To get rid of the extra root, we need to sum from negative infinity when |r| > 1, i.e. look at (r – 1)(r^n + r^(n – 1) + …) = r^(n + 1). When |r| < 1, we need to instead look at the negative of the sum from n + 1 to positive infinity, which will be (r – 1)(–r^(n + 1) – r^(n + 2) – …) = r^(n + 1). When |r| = 1, this sum does not converge and is therefore meaningless.

11c. If a(n) = (n^k)(r^n) and r != 1, then int(a)(n) is Fibonacci-type. For that, assume that this is true for all exponents smaller than k, and in particular for all polynomials of degree at most k – 1. Then write r·int(a)(n) = r^2 + (2^k)r^3 + … + (n^k)(r^(n + 1)) = r + (2^k)r^2 + (3^k)r^3 + … + ((n + 1)^k)r^(n + 1) – r – (2^k – 1)r^2 – (3^k – 2^k)r^3 – … – ((n + 1)^k – n^k)r^(n + 1). By the definition of int(a)(n), we get (r – 1)int(a)(n) = ((n + 1)^k)r^(n + 1) – r – (2^k – 1)r^2 – (3^k – 2^k)r^3 – … – ((n + 1)^k – n^k)r^(n + 1). The first term is Fibonacci-type by #8, and the sum of the rest is by assumption.

11d. The associated polynomial of int((n^k)(r^n)) is (x – 1)(x – r)^(k + 1). The first term in #11c has the associated polynomial (x – r)^(k + 1) from #8; the remainder has (x – 1)(x – r)^k by assumption. From #5, int((n^k)(r^n)) is Fibonacci-type for all k with associated polynomial dividing (x – 1)(x – r)^(k + 1). That division can’t be proper, because only the first term has the term (n^k)(r^n) while only the remainder has a constant term.
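
A numerical check of integration (my own sketch): for a(n) = n·2^n, the partial sums satisfy the recursion coming from (x – 1)(x – 2)^2 = x^3 – 5x^2 + 8x – 4 — one more factor of x – 2 than the original sequence carries, plus the new root 1.

```python
# a(n) = n * 2^n, with basis roots {2 (double)} after integration plus
# the root 1, so s(n+3) = 5s(n+2) - 8s(n+1) + 4s(n) should hold.
terms = [n * 2 ** n for n in range(1, 31)]
partial_sums = [sum(terms[:m]) for m in range(1, 31)]

assert all(
    partial_sums[k + 3]
    == 5 * partial_sums[k + 2] - 8 * partial_sums[k + 1] + 4 * partial_sums[k]
    for k in range(27)
)
```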

12. The trigonometric functions sin and cos are Fibonacci-type. Note that because pi is irrational, sin(n) is not periodic. However, the identity e^(ix) = cos(x) + i·sin(x), derived from considering the Maclaurin expansions e^x = 1 + x + x^2/2! + x^3/3! + …, sin(x) = x – x^3/3! + x^5/5! – …, cos(x) = 1 – x^2/2! + x^4/4! – …, allows us to express sin and cos in terms of exponentials. We get cos(x) = (e^(ix) + e^(-ix))/2, sin(x) = (e^(ix) – e^(-ix))/(2i). But ((e^i)^x) and ((e^(-i))^x) are Fibonacci-type; hence, so are sin and cos, by #3.
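
Concretely (a sketch of mine): the basis roots are e^i and e^(-i), so the associated polynomial is (x – e^i)(x – e^(-i)) = x^2 – 2cos(1)x + 1, giving the recursion a(n + 2) = 2cos(1)·a(n + 1) – a(n) for both sin(n) and cos(n).

```python
import math

# sin(n) and cos(n) both satisfy a(n+2) = 2cos(1)*a(n+1) - a(n),
# the recursion associated with x^2 - 2cos(1)x + 1.
for f in (math.sin, math.cos):
    seq = [f(n) for n in range(50)]
    assert all(
        abs(seq[n + 2] - (2 * math.cos(1) * seq[n + 1] - seq[n])) < 1e-9
        for n in range(48)
    )
```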

13. Separating a Fibonacci-type sequence into parts results in Fibonacci-type sequences. That is, if a(n) is Fibonacci-type, and b(n) = a(cn + d), then b(n) is Fibonacci-type. To see why, note that this turns (n^k)(r^n) into ((cn + d)^k)(r^(cn + d)) = (c^k)(r^d)((n + d/c)^k)((r^c)^n) where ((n + d/c)^k) is a polynomial in n of degree k.

14. Weaving several Fibonacci-type sequences into one results in a Fibonacci-type sequence. I’ll only prove it when “several” means “two”; the generalization is straightforward enough to be left as an exercise. When a(n) and b(n) are Fibonacci-type and c(n) = a(n/2) for even n and b((n + 1)/2) for odd n, we can use the fact that (-1)^n + 1^n = 2 when n is even and 0 when n is odd to alternately activate or deactivate each sequence. More precisely, given (n^k)(r^n), note that (((n/2)^k)(SQRT(r)^n) + ((n/2)^k)((-SQRT(r))^n))/2 = ((n/2)^k)(r^(n/2)) when n is even and 0 when n is odd.
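
A worked check of #14 (my own sketch): weave the Fibonacci numbers (polynomial x^2 – x – 1) with 2^n (polynomial x – 2). The woven sequence's roots are the square roots of the originals', with polynomial (x^4 – x^2 – 1)(x^2 – 2) = x^6 – 3x^4 + x^2 + 2, i.e. c(n + 6) = 3c(n + 4) – c(n + 2) – 2c(n).

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# c(n) = fib(n/2) for even n, 2^((n+1)/2) for odd n, for n = 1..40
c = []
for n in range(1, 41):
    c.append(fib(n // 2) if n % 2 == 0 else 2 ** ((n + 1) // 2))

# Predicted recursion from x^6 - 3x^4 + x^2 + 2
assert all(c[k + 6] == 3 * c[k + 4] - c[k + 2] - 2 * c[k] for k in range(34))
```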

15. If a(n) is periodic of period t, then all of its basis elements are of the form r^n with r^t = 1. This follows from the definition, since the associated polynomial of the sequence is x^t – 1. Now, if r^t = 1, then r = e^(2pi*ik/t) for some integer k, so that r^n = e^(2pi*ikn/t) = cos(2pi*kn/t) + i·sin(2pi*kn/t).

The carnival submission takes a sequence with periodic differences and constructs an explicit formula for it with trigonometric, constant, and linear terms. That is immediately a Fibonacci-type sequence from #1 and #11; #15 also shows that the associated polynomial has 1 as a double root, corresponding to the constant and linear terms, and every other 8th root of unity as a simple root, corresponding to each trigonometric term, where the argument is indeed a multiple of pi*n/4.

Galois Theory: Separability

February 13, 2007

In classical Galois theory, separability isn’t a real condition, since it holds for every extension of Q. But in the most general setting it can fail, so it’s useful to treat it, especially since inseparability does occur for function fields in positive characteristic.

Let K be a field, and f be an irreducible polynomial over K. There’s a splitting field for f, where it by definition splits into linear factors. If the linear factors are all distinct, the polynomial is called separable; otherwise, it is called inseparable. For example, x^2 + 1 splits into (x + i)(x – i), so it’s separable since i != –i. A reducible polynomial f is called separable iff all of its irreducible factors are separable.

If char(K) = 0, then every polynomial over K is separable. This is because if a is a repeated root of a polynomial, then x – a divides both the polynomial and its derivative. Further, if f(x) = a(n)x^n + … + a0 is irreducible, then f'(x) = na(n)x^(n-1) + … + a1 is a nonzero polynomial of smaller degree, so it is not divisible by f. As f is irreducible, f' must then be coprime to f, which means it can’t have any root in common with f.

The above fails to hold in positive characteristic because the factor n might be 0; for instance, if f = x^p – 1, where p = char(K), then f‘ = 0. However, there’s still a large class of fields of positive characteristic over which every polynomial is separable, including all finite ones.

To see why, first note that if char(K) = p, then the function on K defined by f(a) = a^p is a ring homomorphism: it’s clearly multiplicative, and it’s also additive because in the binomial expansion of (a + b)^p, all terms except a^p and b^p have coefficients divisible by p, making them equal to 0 in K. That function is called the Frobenius endomorphism on K; it’s injective because every nonzero homomorphism whose domain is a field is injective.
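
The divisibility of the middle binomial coefficients is easy to verify for a small prime (a quick Python sketch of mine):

```python
from math import comb

p = 7

# All middle binomial coefficients C(p, k), 0 < k < p, are divisible by p,
# which is why (a + b)^p = a^p + b^p in characteristic p.
assert all(comb(p, k) % p == 0 for k in range(1, p))

# Frobenius is therefore additive mod p; by Fermat's little theorem it is
# actually the identity map on F_p itself.
assert all(
    pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
    for a in range(p)
    for b in range(p)
)
assert all(pow(a, p, p) == a for a in range(p))
```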

An inseparable irreducible polynomial must be a polynomial in x^p, because if x^k has a nonzero coefficient a(k) and k is not divisible by p, then f' includes the nonzero term ka(k)x^(k-1). But if the polynomial is of the form (b(n)^p)x^(np) + … + (b1^p)x^p + b0^p, then it’s equal to (b(n)x^n + … + b0)^p, contradicting its irreducibility.

So if every a in K can be written as b^p, then every irreducible polynomial over K is separable. This is clearly equivalent to the condition that the Frobenius endomorphism is surjective, i.e. an automorphism. The converse is also true, because if a in K can’t be written as b^p, then x^p – a is an inseparable irreducible polynomial.

If K is finite, then the Frobenius endomorphism has to be surjective by a counting argument. If K is merely algebraic over a finite field F then it’s still surjective, because then K/F is algebraic, so for every a in K, F(a)/F is finite. The finiteness of F then turns F(a) into a finite field, on which the Frobenius endomorphism is surjective; that yields a suitable b with b^p = a. Therefore, the simplest field over which there exists an inseparable polynomial is F(p)(t), over which x^p – t is inseparable.

Now, let L/K be a field extension. An element a of L is said to be separable over K if its minimal polynomial is. That clearly holds iff in the normal closure of L, m(a) has precisely deg(m(a)) distinct roots. The extension is then said to be separable if every element in it is.

Clearly, if M/L/K is a tower of extensions and M/K is separable, then so is L/K. In fact so is M/L, since the minimal polynomial of each a in M over L is a factor of the minimal polynomial of a over K; therefore, if the minimal polynomial over K has distinct roots, then so does the minimal polynomial over L.

If L/K is a field extension, then a in L is inseparable iff m(a) only has nonzero coefficients in front of powers of x that are divisible by p; in that case, a^p has minimal polynomial m(a^p)(x) = a(n)x^n + … + a0, where a(i) is the coefficient of x^(ip) in m(a), so that [K(a):K(a^p)] = p. Conversely, if [K(a):K(a^p)] = p, then a^p has no pth root in K(a^p), so K(a)/K(a^p) is inseparable. But that implies K(a)/K is inseparable by taking the tower K(a)/K(a^p)/K. Therefore, a in L is separable iff K(a) = K(a^p).

The set of all separable elements in L forms an intermediate field. To see why, note that if K(a) = K(a^p) and K(b) = K(b^p), then K(a, b) = K(a^p, b^p), whence it’s readily seen that K(a + b) = K(a^p + b^p) and K(ab) = K((ab)^p). In addition, K(1/a) = K(a) = K(-a), and every element a of K is separable over K with minimal polynomial x – a.

Therefore, if L is normal over K, it is separable iff the polynomials it is a splitting field for are all separable over K.

If L/K is any field extension, then let M be the intermediate field consisting of all separable elements over K. If M = K, the extension L/K is called purely inseparable.

An equivalent condition for L/K to be purely inseparable is for every element of L outside K to have a minimal polynomial of the form x^(p^k) – b.

To see why, note that if every element of L has such a minimal polynomial, then L/K is trivially purely inseparable. Conversely, if a has a different minimal polynomial, then let p^k be the p-power portion of the greatest common divisor of all exponents of x with nonzero coefficients in m(a). Then m(a^(p^k)) has an exponent not divisible by p, so that a^(p^k) is separable. Therefore a^(p^k) is in K, so a satisfies x^(p^k) – b = 0 for some b in K.

The characterization of pure inseparability in terms of only m(a) makes it sensible to call a purely inseparable over K iff its minimal polynomial is of the form x^(p^k) – b, or, equivalently, if a^(p^k) is in K for some integer k. It’s fairly clear that this holds iff the only conjugate of a in the normal closure is a itself, since the minimal polynomial splits as (x – a)^(p^k). Elements of K are purely inseparable, by letting k = 0.

The set of purely inseparable elements of L/K forms an intermediate field. To see why, note that if a^(p^k) and b^(p^l) are in K, say with k <= l, then (a + b)^(p^l) = a^(p^l) + b^(p^l) is in K by repeating the Frobenius endomorphism l times, and similarly for ab. Therefore, the splitting field of a family of purely inseparable polynomials is itself purely inseparable.

Carnival of Mathematics: Inaugural Edition

February 9, 2007

I’ve gotten plenty of submissions that span the entire gamut of math-blogging: education, pure math, applied math, debunking bad math – it’s all there. Only the gender distribution could be made slightly more equal (and that’s an understatement). I’m linking to the posters in roughly increasing order of mathematical difficulty, but don’t let my opinions deter you from reading the posts closer to the bottom.

First, Denise of Let’s Play Math has a long collection of quotes exhorting people to study mathematics. George Washington, Martin Gardner, Henri Poincaré, Abraham Lincoln, Galileo Galilei, Bertrand Russell – they’re all there.

Barry Leiba of Staring at Empty Pages uses modular arithmetic to solve a digit puzzle. The puzzle is based on writing any positive integer – say 8293 – jumbling its digits, and subtracting the smaller number from the larger: 8293 – 3982 = 4311. Then delete a nonzero digit. Based on the resulting number, it’s possible to tell which digit you deleted.
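
The trick works mod 9, and a few lines of Python make the mechanism concrete (my own sketch of the standard casting-out-nines argument, not Barry's code): a number and any jumbling of its digits have the same digit sum, so their difference is divisible by 9, and a deleted nonzero digit can be recovered from the remaining digit sum.

```python
# n and its jumbling share a digit sum, so n - jumbled is divisible by 9.
# After deleting a nonzero digit d, the rest sums to a multiple of 9 minus d.
def recover_deleted_digit(remaining_digits):
    s = sum(remaining_digits) % 9
    return 9 - s if s != 0 else 9  # the deleted digit was nonzero

n, jumbled = 8293, 3982
diff = n - jumbled  # 4311
assert diff % 9 == 0

# Delete the digit 4 from 4311, leaving the digits 3, 1, 1:
assert recover_deleted_digit([3, 1, 1]) == 4
```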

P. Sternberg of Discreet Math links to various companies producing non-orientable material, like Möbius shoes and Klein bottle wool hats.

Heath Raftery writes about a probability paradox. Two dice are rolled, of which one is a 4. Given that, what is the probability that the sum of the two dice is a 7? Without further qualification – for example, “At least one die is a 4” – it’s obviously 1/6. And yet Heath’s professor maintained that the answer was 2/11.
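
The two answers come from two different ways of conditioning, and brute-force enumeration shows where each number comes from (a sketch of mine, not Heath's code):

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))

# Conditioning on "at least one die shows a 4": 11 outcomes, 2 summing to 7.
at_least_one_4 = [r for r in rolls if 4 in r]
assert len(at_least_one_4) == 11
assert sum(1 for r in at_least_one_4 if sum(r) == 7) == 2  # probability 2/11

# Conditioning on "a particular die shows a 4": 6 outcomes, 1 summing to 7.
first_is_4 = [r for r in rolls if r[0] == 4]
assert len(first_is_4) == 6
assert sum(1 for r in first_is_4 if sum(r) == 7) == 1  # probability 1/6
```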

Charles Daney of Science and Reason gives a brief history of algebra from the Middle Ages till the Renaissance. Whereas most accounts begin with a relatively modern mathematician, like Fermat or Euler, Charles begins far earlier, with Al-Khwarizmi, who invented the word “algebra” and after whom the word “algorithm” is named.

In the bad math department, Tyler DiPietro has a long rant fisking the arguments of Sal Cordova, an intelligent design proponent who makes dishonest arguments from information theory and computability theory in order to try refuting evolution.

JD2718 has a puzzle in plane geometry. A parallelogram is defined as a quadrilateral in which both pairs of opposite sides are parallel. But there are other equivalent definitions, and other conditions that look like they define parallelograms but in fact don’t.

Science Pundit Javier Pazos tells an anecdote about John von Neumann and the bee problem. The bee problem involves two trains initially spaced 40 km apart and approaching each other at 20 km/h, and a bee that flies at 40 km/h, starting from one train, flying toward the other, turning around when it meets it, flying toward the first train, etc. How far does the bee fly until the two trains meet?

Suresh Venkatasubramanian of GeomBlog has a long primer for the game theory of author ordering. When publishing a scholarly paper, each contributor wants to be as close as possible to first author in order to get more credit; Suresh explains which ordering strategies are stable and which appear the most utilitarian.

He also notes that the theory of algorithms is undergoing fundamental changes as computing power reaches saturation. In the past, it was sufficient for researchers to say that the run time of a process is proportional to some simple function, like x^2; but now, the constant that accompanies that function is becoming increasingly important.

John Kemeny introduces spigot algorithms, which can be used to systematically generate digits of some irrational numbers, including e and pi.

Jeffrey Shalit of Recursivity writes about the prime game. The initial observation is that the decimal representation of every prime p can be reduced to one of 26 primes by scrubbing off digits. That leads to a more general investigation of the concept of subsequences.

Eric Kidd programs infinity into Haskell, in order to solve such expressions as dividing infinity by 2 or adding 1 to infinity. The trick is to teach Haskell about the finite natural numbers first, and then write into it something strong enough to code for infinite numbers.

David Eppstein of 0xDE writes about subgreedy algorithms for Egyptian fractions. An Egyptian fraction is a representation of a fraction p/q as 1/n1 + 1/n2 + … + 1/n(k). A greedy algorithm is one that tries getting the best-looking result in one step without regard for efficiency in later steps; a subgreedy algorithm is one that is almost greedy, but tends to get far better results in the long run.

He also writes about a local version of the central limit theorem, which differs from the regular theorem in that it says repeated distributions look approximately normal in a small neighborhood of 0 rather than globally; and about coloring tilings in such a way that on the one hand the colors display a regular pattern, just like the tiling, but on the other no two adjacent tiles have the same color.

And finally, Mikael Johansson has multiple great posts about algebraic topology and related subjects; choosing three was a fairly hard decision. Of the three that made it into this edition, the easiest is the Borsuk-Ulam theorem, which roughly states that at any given time, there’s a pair of antipodal points on Earth with the same temperature.

Also, A for the Layman explains algebraic topology, beginning with simple definitions and ending with an overview of homological algebra. For the braver souls, there’s his post about carry bits and cohomology; group cohomology is an indispensable tool in algebra, and Mikael applies it to addition and multiplication modulo 10, or in other words carry digits.

The next edition will be posted in two weeks, on 2/23, on Good Math, Bad Math. Send submissions to Mark CC, or to me so that I’ll forward them to him.

On a final note, I should remind everyone I’m still looking for someone talented enough to make a decent logo or banner for the carnival.

Galois Theory: Normal Extensions

February 9, 2007

In Galois theory, some algebraic extensions are more useful than others. Typically, there are two important attributes an extension must have to satisfy the nice properties Galois theory can prove: normality, and separability. I’ll deal with normality first, and with separability in a later post.

A field extension L/K is called normal if whenever a polynomial f in K[x] is irreducible, it either has no root in L or splits into linear factors in L.

At this stage, it’s simplest to give an example of a non-normal extension: Q(2^(1/3))/Q, where the polynomial x^3 – 2 is irreducible over Q but only factors as (x – 2^(1/3))(x^2 + (2^(1/3))x + 4^(1/3)) in Q(2^(1/3)). The easiest way of seeing this is that the three roots of x^3 – 2 are 2^(1/3), (2^(1/3))w, and (2^(1/3))w^2, where w is a primitive cube root of unity; the first root is real, so the field Q(2^(1/3)) is contained in R, but the other two are non-real.
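This factorization can be checked with a computer algebra system; here is a sketch in Python, assuming SymPy’s factor with its extension keyword (which factors over an algebraic number field):

```python
from sympy import symbols, factor, expand, cbrt

x = symbols('x')
# Over Q, x^3 - 2 is irreducible; over Q(2^(1/3)) it acquires exactly
# one linear factor, so Q(2^(1/3))/Q is not normal.
f = factor(x**3 - 2, extension=cbrt(2))
print(f)
assert expand(f) == x**3 - 2
```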

Every quadratic extension, i.e. an extension of degree 2, is normal. The proof of that is based on the fact that if an irreducible polynomial over K of degree n has a root in L, then [L:K] >= n. As a consequence of that, if f is irreducible of degree at least 3 then it has no root in L; if it is irreducible of degree 2 and has a root in L, then clearly it splits into two linear factors.

An equivalent condition for normality is that for every a in L, the minimal polynomial m(a) splits into linear factors over L. This condition is obviously necessary, since m(a) is irreducible over K and has the root a in L; it’s also sufficient, since if f is irreducible and has a root a in L, then up to scaling it must be m(a).

From there, it immediately follows that every normal extension is a splitting field. If f(i) is a family of polynomials over K, then the splitting field of f(i) over K is the smallest extension of K containing all roots of the polynomials. If the family is finite, then multiplying all its members yields one polynomial, f, such that L is the smallest field containing all roots of f.

In fact, every polynomial f over K has a unique splitting field, of degree at most n! where n = deg(f). Existence is relatively easy: given an irreducible polynomial f over K, it’s always possible to take one root of f and adjoin it to K by letting L = K[x]/(f). Given a reducible polynomial, factor it into irreducibles and adjoin a root of one of the factors to K. After obtaining L, factor f into irreducibles over L and continue the process until all resulting polynomials are linear.

Each step reduces the total degree of all polynomials involved by 1 and multiplies the degree of the extension of K by at most n – k + 1, where k is the number of the step, so the total degree is at most n!.
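The degree bound can be watched in action on x^3 – 2, whose splitting field Q(2^(1/3), w) has degree 6 = 3·2 <= 3!. A sketch in Python, again assuming SymPy’s factor over algebraic extensions:

```python
from sympy import symbols, factor, expand, degree, cbrt, sqrt, I

x = symbols('x')
w = (-1 + sqrt(3)*I) / 2   # a primitive cube root of unity

# Adjoining both 2^(1/3) and w gives the splitting field of x^3 - 2,
# over which the polynomial breaks into three linear factors:
f = factor(x**3 - 2, extension=[cbrt(2), w])
linear = [g for g in f.as_ordered_factors() if degree(g, x) == 1]
print(len(linear))  # 3
assert expand(f) == x**3 - 2
```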

For uniqueness, first note that K[x]/(f) is clearly unique. The only step that involves a choice is choosing which irreducible factor of a polynomial to extract a root out of. To prove that splitting fields are unique, it’s then enough to show that adjoining a root of f first followed by a root of g induces the same extension as adjoining a root of g first followed by a root of f.

But to see that this is the case, let L be K[x]/(f), M be L[x]/(g) (or of an irreducible factor of g, if g is reducible over L), L’ be K[x]/(g), and M’ be L'[x]/(f). Over M and M’, both f and g have a root, so both M and M’ contain subfields isomorphic to L and L’. Further, f has a root in the extension M/L’, so M contains a copy of M’; similarly, M’ contains a copy of M. But that implies that [M:K] = [M’:K], so that M and M’ are isomorphic.

If L is a splitting field for f over K, then L/K is normal. To see why, let a be any element of L, and let M be the splitting field for m(a) over L; it’s enough to prove that M = L. If [M:L] > 1, that is if m(a) has a root a' not in L, then K(a) is isomorphic to K(a'), so that the splitting field for f over K(a), L, is isomorphic to the splitting field for f over K(a'), L'. But that’s a contradiction: on the one hand L and L' are isomorphic, but on the other L' contains L, the splitting field for f over K, as well as additional elements generated by a'. So 1 = [L:L] = [L':L] > 1, which is absurd.

If L/K is any field extension, then L has a smallest extension that is normal over K. That extension is defined as the splitting field of all minimal polynomials over K of elements of L. We call that the normal closure of L over K.

Finally, let M/L/K be a tower of extensions. If M/K is normal then so is M/L for fairly obvious reasons. But no other implication holds: it’s possible for M/K to be normal but for L/K not to be normal, e.g. if K = Q, L = Q(2^(1/3)), M = Q(2^(1/3), w) with w a primitive cube root of unity; and it’s possible for M/L and L/K to be normal but for M/K not to be, e.g. if K = Q, L = Q(2^(1/2)), M = Q(2^(1/4)).

Galois Theory: Field Extensions

February 7, 2007

I’m going to write the next few math posts on Galois theory, which I handwaved past in my algebraic number theory posts. At the center of Galois theory is the idea of field extensions, which I’ll consistently denote as L/K.

L naturally inherits a vector space structure over K. So it makes sense to speak of a dimension, which is written [L:K]. The case of interest in elementary Galois theory is when [L:K] is finite, in which case the extension is called finite. It’s possible to talk about infinite Galois theory too, though, so I’ll make as much as possible in this series independent of the finiteness of dimension.

If M/L/K is a tower of field extensions, then [M:K] = [M:L][L:K]. To see why, let n1 = [M:L] and n2 = [L:K]. If m1, m2, …, m(n1) form a basis for M/L, and l1, l2, …, l(n2) form a basis for L/K, then every m in M can be written as l'(1)m1 + … + l'(n1)m(n1) with the l'(i)’s in L, and expanding each l'(i) in the basis of L/K turns this into k(1,1)l1m1 + k(1,2)l2m1 + … + k(1, n2)l(n2)m1 + … + k(n1, n2)l(n2)m(n1), where the k(i, j)’s are in K, making the l(j)m(i)’s a spanning set of size n1n2. It’s trivial to show that any linear dependence on this spanning set induces a linear dependence on the m(i)’s or the l(j)’s, which contradicts the fact that they’re bases.

A field F is called an intermediate field of L/K if it’s a subfield of L that contains K. As a corollary to the above tower law, [F:K][L:F] = [L:K] so that [F:K] and [L:F] divide [L:K] if it’s finite.
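The tower law can be checked numerically in the tower Q(2^(1/4))/Q(SQRT(2))/Q. A sketch in Python, assuming SymPy’s minimal_polynomial accepts a domain argument for computing minimal polynomials over an algebraic number field:

```python
from sympy import sqrt, Rational, symbols, minimal_polynomial, degree, QQ

x = symbols('x')
a = 2**Rational(1, 4)

d_MK = degree(minimal_polynomial(a, x), x)        # [M:K], M = Q(2^(1/4))
d_LK = degree(minimal_polynomial(sqrt(2), x), x)  # [L:K], L = Q(sqrt(2))
# over Q(sqrt(2)), the minimal polynomial of 2^(1/4) is x^2 - sqrt(2):
d_ML = degree(minimal_polynomial(a, x, domain=QQ.algebraic_field(sqrt(2))), x)

print(d_MK, d_LK, d_ML)
assert d_MK == d_ML * d_LK   # 4 = 2 * 2
```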

If a is an element of L, then a is called algebraic over K if it is the root of some polynomial over K. Otherwise, it’s called transcendental. If L = C and K = Q, then i, SQRT(2), and 3 + 2^(1/17) are algebraic, while pi and e are transcendental. If every element of L is algebraic over K, we call L/K an algebraic extension.

If L/K is finite, then it must be algebraic. To see why, let [L:K] = n, and let a be any element of L. Then the set {1, a, a^2, …, a^n} has n + 1 elements, so it’s linearly dependent, and there exist numbers k0, k1, …, k(n) in K not all zero such that k(n)a^n + … + k1a + k0 = 0. That generates a polynomial over K that has a as a root, so a is algebraic over K.
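This argument is completely constructive, and the polynomial can be extracted by linear algebra. A sketch in Python with SymPy, taking a = 1 + SQRT(2) in L = Q(SQRT(2)), where [L:K] = 2:

```python
from sympy import sqrt, expand, Matrix

# a = 1 + sqrt(2) lies in Q(sqrt(2)), of degree 2 over Q, so the set
# {1, a, a^2} has 3 elements and must be linearly dependent over Q.
a = 1 + sqrt(2)
rows = []
for k in range(3):
    e = expand(a**k)
    c = e.coeff(sqrt(2))             # coordinate along sqrt(2)
    rows.append([e - c*sqrt(2), c])  # coordinates in the basis {1, sqrt(2)}

# A nonzero vector (k0, k1, k2) with k0 + k1*a + k2*a^2 = 0:
v = Matrix(rows).T.nullspace()[0]
print(list(v))  # [-1, -2, 1], i.e. a is a root of x^2 - 2x - 1
assert expand(sum(v[k] * a**k for k in range(3))) == 0
```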

If L/K is a field extension and a is an element of L, then we write K(a) for the smallest intermediate field containing a, and K[a] for the smallest subring of L containing K and a.

We can form the ring K[x] of polynomials in one variable, x, with coefficients in K. That ring has fraction field K(x), the field of all rational functions in one variable over K. Now, K[x] projects onto K[a] by sending x to a. The resulting ring homomorphism has nonzero kernel iff there exists a nonzero polynomial over K, f, such that f(a) = 0, which holds iff a is algebraic over K.

So if a is transcendental, then the homomorphism has a zero kernel, and K[a] is isomorphic to K[x] so that K(a) is isomorphic to K(x). If a is algebraic, then K[a] is isomorphic to some proper quotient of K[x]. The kernel of the map from K[x] to K[a] is then a nonzero ideal, so as I showed ages ago, it’s generated by a single polynomial, the lowest-degree polynomial that has a as a root.

Now, that polynomial is unique up to multiplication by a unit, i.e. scaling by some element of K, so it’s convenient to scale it to have leading coefficient 1. In that case it’s called the minimal polynomial of a over K and denoted by m(a)(x), or m(a), where the a is supposed to be a subscript but is written here in parentheses for ASCIIfication purposes.

Further, m(a) is an irreducible polynomial. Otherwise, write m(a) = fg with deg(f), deg(g) > 0, and note that since m(a)(a) = 0, either f(a) or g(a) is 0. But that contradicts the choice of m(a) as the polynomial of least degree that has a as a root. Since K[x] is a principal ideal domain, the irreducibility of m(a) makes the ideal (m(a)(x)) maximal; therefore, K[a] = K[x]/(m(a)(x)) is the quotient of a ring by a maximal ideal, so it’s a field, and K(a) = K[a].

Also, [K(a):K] = deg(m(a)) when a is algebraic. The proof is fairly straightforward: if deg(m(a)) = n, then {1, a, a^2, …, a^(n-1)} is linearly independent by the minimality of the degree of m(a). It also spans K(a) = K[a]: the relation m(a)(a) = 0 expresses a^n in terms of lower powers, then x*m(a)(x) expresses a^(n+1) in terms of {a, a^2, …, a^n}, and so on; since {1, a, a^2, …} spans K[a], so does {1, a, …, a^(n-1)}.
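SymPy can compute minimal polynomials directly; for instance, SQRT(2) + SQRT(3) generates Q(SQRT(2), SQRT(3)), an extension of degree 4 over Q:

```python
from sympy import sqrt, symbols, minimal_polynomial

x = symbols('x')
# The minimal polynomial of sqrt(2) + sqrt(3) over Q has degree 4,
# matching [Q(sqrt(2), sqrt(3)) : Q] = 4.
m = minimal_polynomial(sqrt(2) + sqrt(3), x)
print(m)  # x**4 - 10*x**2 + 1
```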

To recap, a is algebraic over K iff the following equivalent conditions hold:

1. a is a root of a polynomial with coefficients in K.
2. [K(a):K] is finite.
3. K(a) = K[a].
4. K[a] is a proper quotient of K[x].

The important properties are 1 and 2.

If E and F are two intermediate fields of L/K, then the smallest intermediate field containing both E and F is denoted EF. Then [EF:E] <= [F:K] and [EF:F] <= [E:K], so [EF:K] <= [E:K][F:K]. To show that, it’s enough to prove that [EF:E] <= [F:K].

But if b1, b2, …, b(n) are basis elements for F/K, then they span EF over E. Closure of their E-span under addition and subtraction is obvious; closure under multiplication follows from writing each b(i)b(j), which is in F, as a sum k1b1 + … + k(n)b(n); and closure under division follows because an integral domain that is finite-dimensional over a field is itself a field. If the b(i)’s also form a basis, then [EF:E] = [F:K], [EF:F] = [E:K], and [EF:K] = [E:K][F:K], in which case E and F are called linearly disjoint over K. If they don’t, then the inequalities are all strict.

A necessary but not sufficient condition for E and F to be linearly disjoint over K is that their intersection is K; to see that it’s not sufficient, let K = Q, L = Q(2^(1/4), i), E = Q(2^(1/4)), F = Q(i2^(1/4)), where EF = L, [L:K] = 8, [E:K] = [F:K] = 4. I’ll come back to a sufficient condition later on, after developing Galois theory more.
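The degrees in this counterexample can be verified with SymPy. Here I take 2^(1/4) + i as a primitive element of EF = Q(2^(1/4), i); that it generates the whole degree-8 field is an assumption SymPy confirms by computing its minimal polynomial:

```python
from sympy import I, Rational, symbols, minimal_polynomial, degree

x = symbols('x')
a = 2**Rational(1, 4)

assert degree(minimal_polynomial(a, x), x) == 4      # [E:K], E = Q(2^(1/4))
assert degree(minimal_polynomial(I*a, x), x) == 4    # [F:K], F = Q(i*2^(1/4))
# [EF:K] = 8 < 4*4 = 16, so E and F are not linearly disjoint over Q:
print(degree(minimal_polynomial(a + I, x), x))  # 8
```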

For now, note that if a and b are two algebraic elements of L/K, then [K(a)K(b):K] <= [K(a):K][K(b):K], which is finite, so every algebraic expression in a and b is algebraic over K. In particular, a + b, a – b, ab, and a/b (for nonzero b) are algebraic over K, so that the set of all algebraic elements of L/K forms an intermediate field, called the algebraic closure of K in L.

Even infinite Galois theory deals only with algebraic extensions, albeit possibly infinite ones, which exist (e.g. Q(SQRT(2), SQRT(3), SQRT(5), …), obtained by adjoining the square roots of all the primes).

Finally, every field has an invariant called its characteristic. If 1 + 1 + … + 1 is always nonzero in a field K, we say that the characteristic of K is 0; then K contains N, so it contains Z, so that it contains its fraction field, Q.

If 1 + 1 + … + 1 = 0 after n steps but not after any smaller number of steps, we say that char(K) = n. In fact n is prime: if n = ab with a, b < n, then (1 + 1 + … + 1) (a times) multiplied by (1 + 1 + … + 1) (b times) equals 0, yet neither factor is 0 by the minimality of n; that’s impossible in a field, which has no zero divisors since every nonzero element has a multiplicative inverse.

For each prime p, if char(K) = p then K contains F(p) = Z/pZ, obtained by adding successive copies of 1. In each case, the smallest field K must contain – Q in characteristic 0, F(p) in characteristic p – is called its prime subfield. Classical Galois theory always has K = Q; it’s also possible to build Galois theory from F(p), but finite fields’ finite extensions are remarkably easy to understand.

Carnival of Mathematics Update

February 2, 2007

First, in case you don’t regularly check the carnival website, the host of the third carnival, scheduled for 3/9, is Mikael Johansson. I’m still looking for someone to host the second carnival, which is on 2/23.

Second, a lot of the posts I’m getting are quite old; the majority so far are from 2006, at times from early 2006. This is perfectly fine, since I don’t think it makes much sense to impose a time limit in the first edition. So feel free to send your best math-blogging since the birth of the web. If you had a blog-like website in 1994, feel free to send me comments from the time about the news of the proof of Fermat’s last theorem.

Third, there is no hard limit about the topic, level of math in the post, or level of expertise of the poster. If you’re a teacher who has thoughts about how to improve math education, send them in. If you’re a first-year undergrad proving interesting things about high school geometry, send your posts.

And fourth, Brent told me that carnivals need a recognizable logo. My artistic skills were frozen at the age of 9, so I’m going to have to delegate this to whoever is willing to take on the task. If you’re a professional, I can pay you the going rate, and you can trust that whatever price you name, I won’t be able to contradict you by saying it’s above market average.

Modular Arithmetic

February 1, 2007

Jason Rosenhouse’s post about numbers mentions that he’s not touching modular arithmetic, so I suppose I should talk about that instead. Modular arithmetic is simply the arithmetic of remainders. In elementary or middle school, everyone learns that an odd number plus an odd number is an even number, an even plus an odd is odd, an odd times an even is even, and so on.

The arithmetic of odds and evens is simply arithmetic modulo 2, since we’re only concerned with whether an integer leaves a remainder of 0 or 1 when divided by 2. The three statements in the previous paragraph translate, in order, to 1 + 1 = 0 mod 2, 0 + 1 = 1 mod 2, and 0*1 = 0 mod 2. The mod operator is a shorthand for “modulo,” and pronounced either “modulo” or “mod”: “one plus one equals zero mod two.”

The same trick applies to any other integer. If a and b are any two integers, then the remainder a + b leaves when divided by 3 – or, in more technical language, the residue class of a + b mod 3 – only depends on the residue classes of a and b mod 3. To use random numbers as illustration, 2 + 8 = 10 is in the same residue class mod 3 as 5 + 2 = 7.
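The independence of the sum’s residue from the particular representatives is easy to spot-check in Python, where % is the remainder operator:

```python
# Swapping a summand for another one in the same residue class mod 3
# never changes the residue class of the sum:
assert 2 % 3 == 5 % 3 and 8 % 3 == 2 % 3
assert (2 + 8) % 3 == (5 + 2) % 3 == 1   # 10 and 7 are both 1 mod 3
print("ok")
```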

There’s one modular arithmetic everyone’s familiar with – arithmetic modulo 12, or sometimes 24. Four hours after ten o’clock is always two o’clock, regardless of whether it’s in the a.m. of 2006/2/1 or the p.m. of 2006/2/14. Likewise, 10 + 4 always equals 2 mod 12.

So we can add residue classes mod any integer n. We can also subtract, using the same principle. We can multiply, although multiplication sometimes acquires strange properties. For example, 2*3 = 0 mod 6, whereas in normal arithmetic, the product of two nonzero integers is always nonzero.

Division is somewhat trickier, though. Modular arithmetic is only defined for integers, not for rational numbers. It’s easy to tell that 17 = 7 mod 10, but what is 1/2 mod 10?

With normal integers, we can always form rational numbers. This is because for every nonzero integer a, there’s a rational number 1/a, with the property that a*(1/a) = 1. Then the fraction b/a = b*(1/a). So you may ask, is 1/a even defined in modular arithmetic?

It turns out it is, sometimes. For a concrete example, let’s work mod 10. We have 3*7 = 21 = 1 mod 10, so we can define 1/3 to be 7, and 1/7 to be 3. In more technical language, 7 is the inverse of 3 and vice versa. Then division by 3 is essentially the same as multiplication by 7.
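Python can do this computation natively: since version 3.8, the three-argument pow accepts a negative exponent and computes modular inverses, raising an error when none exists:

```python
# Modular inverses via three-argument pow (Python 3.8+):
print(pow(3, -1, 10))  # 7
print(pow(7, -1, 10))  # 3
assert (3 * 7) % 10 == 1

# 2 has no inverse mod 10, and pow says so:
try:
    pow(2, -1, 10)
except ValueError as e:
    print("no inverse:", e)
```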

However, there exists no n such that 2*n = 1 mod 10. To see why, let’s work mod 2 instead of mod 10. We can do that because 2*5 = 10, so we can just let 1, 3, 5, 7, and 9 mod 10 be 1 mod 2, and 0, 2, 4, 6, and 8 mod 10 be 0 mod 2. We need to have 0*n = 1 mod 2, then. But that’s impossible, because 0*0 = 0*1 = 0 mod 2.

This argument works for every residue that has a common factor greater than 1 with 10; the trick is to look at residue classes mod that common factor. Fortunately, all the other residues do have inverses. The numbers 2, 4, 5, 6, and 8, and of course 0, have the factors 2 or 5 in common with 10. For the others, we have 1*1 = 3*7 = 9*9 = 1 mod 10.

Not coincidentally, the best numbers to do modular arithmetic with are the prime numbers. If we work mod p, then since the only factors of p are itself and 1, the only common factor greater than 1 that n can have with p is p. But if n is divisible by p, then it must be 0 mod p. Therefore, all residue classes mod p except 0 have inverses.
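A quick enumeration in Python bears both claims out: mod the prime 7 every nonzero residue is invertible, while mod 10 exactly the residues coprime to 10 are:

```python
from math import gcd

p = 7
# Modulo a prime p, every nonzero residue class has an inverse:
inverses = {a: pow(a, -1, p) for a in range(1, p)}
print(inverses)  # {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}
assert all(a * b % p == 1 for a, b in inverses.items())

# Modulo 10, only the residues coprime to 10 are invertible:
assert [a for a in range(10) if gcd(a, 10) == 1] == [1, 3, 7, 9]
```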

Now, you may ask why anyone cares about modular arithmetic. The answer is, first, it’s important in computing. Much of computer science works mod 2, since gates only have two states – false, corresponding to 0, and true, corresponding to 1.

Second, in pure math, it’s very useful to use modular arithmetic since there’s a finite number of residue classes mod every n. My favorite example here is the equation a^2 – 7b^2 = 3. There are many solutions to the equation, like a = SQRT(3) and b = 0. But there are no solutions where both a and b are integers.

We can’t verify that with every possible pair of integers, since there are infinitely many integers. But we can verify that using modular arithmetic. If there’s a solution in integers, then there’s also a solution modulo every n. For example, let’s pick n = 7. The equation then becomes a^2 = 3 mod 7, since 7b^2 is obviously 0 mod 7.

Now that the equation is in modular arithmetic, we can just plug in every residue class mod 7 to see if it works. If a = 0 mod 7, then a^2 = 0 mod 7. If a = 1 mod 7, then a^2 = 1 mod 7. If a = 2, 3, 4, 5, 6 mod 7, then a^2 = 4, 2, 2, 4, 1 mod 7. In none of the cases is a^2 equal to 3 mod 7, so we can conclude that the equation has no solutions in integers.
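The case check in the last paragraph is a two-line computation in Python:

```python
# a^2 - 7b^2 = 3 reduces mod 7 to a^2 = 3 (mod 7); check all 7 residues:
squares_mod_7 = {pow(a, 2, 7) for a in range(7)}
print(sorted(squares_mod_7))  # [0, 1, 2, 4]
assert 3 not in squares_mod_7
```

Since 3 never appears among the squares mod 7, the original equation has no integer solutions.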

Carnival of Mathematics

January 31, 2007

Mark Friday, February the 9th on your calendar, folks. That’s when the Carnival of Mathematics goes up on Abstract Nonsense. If you have any submission, or want to host the carnival at a later date, email me, and preferably include the words “Carnival of Mathematics” somewhere in the subject line, in case I have to rescue your email from the spam filter.

Tentatively, the carnival is supposed to be fortnightly, but if I get too many hosting requests, or there are too many posts per two weeks, it’ll become weekly. Similarly, if there are too few posts or hosting requests, it can become monthly.

Obviously, pure math posts, like my algebraic number theory proofs, are welcome. But so are many other things, including but not limited to the following:

– Debunking bad math and its uses in bad science and bad politics, as in many of Mark Chu-Carroll’s posts (see e.g. illegal immigrants and crime on Good Math, Bad Math).

– Math and computer science.

– Math puzzles, with or without explanations of deeper underlying concepts (see e.g. the Monty Hall problem on EvolutionBlog).

– Math and statistics in social science.

– Math in popular culture: Fermat’s Enigma, Pi, Proof, Numb3rs.

– How to write math, e.g. Polymath’s elementary proof of Morley’s Theorem.

– How to teach math, and general posts about math and education.

Of course, other topics related to math – including theoretical computer science, statistics, and mathematical physics – are still welcome. The above list is just a set of suggestions.

Since this is a first edition, I’m only limiting each blogger to three posts. Hopefully in the future there will be enough participants to enable limiting each blogger to the standard one post.

Educational Links

January 29, 2007

Mark CC has a post explaining the basics of formal logic as well as the difference between syntax and semantics.

Logic, in the sense that we generally talk about it, isn’t really one thing. Logic is a name for the general family of formal proof systems with inference rules. There are many logics, and a statement that is a valid inference (is logical) in one system may not be valid in another. To give you a very simple example, most people are familiar with the fact that in logic, if you have a statement “A”, then either the statement “A or not A” must be true. In the most common simple logic, called propositional logic, that’s a tautology – that is, a statement which is always true by definition. But in another common and useful logic – intuitionistic logic – “A or not A” is not necessarily true. You cannot infer anything about whether it’s true or false without proving whether A is true or false.

In line with the theme of studies about racial or gender bias, here’s a study showing that legal immigrants to the US make more money when they have lighter skin or greater height, even after controlling for other possible variables (via Retrospectacle).

Whether and how quickly immigrants assimilate into the U.S. labor market is an issue of great policy importance and controversy. Using newly-available data from the New Immigrant Survey 2003, this paper shows that new lawful immigrants to the U.S. who have lighter skin color and are taller have higher earnings, controlling for extensive labor market and immigration status information, as well as for education, English language proficiency, outdoor work, occupation, ethnicity, race, and country of birth. Immigrants with the lightest skin color earn on average 8 to 15 percent more than comparable immigrants with the darkest skin tone. Each extra inch of height is associated with a 1 percent increase in wages.

Ruchira Paul of Accidental Blogger writes about Noor Inayat Khan, a Muslim Indian who volunteered to engage in espionage for the British in World War Two.

After a hurried (and rather incomplete) training in England, she was posted in Paris as the first woman radio operator for the SOE, entrusted with intercepting Nazi wireless transmissions. This gentle, shy and talented young woman became a thorn in the side of the German military – an unlikely, intrepid, wily spy, expertly eluding capture. Noor was later betrayed by one of her own colleagues. Captured, questioned and beaten by the Nazis, she was deported to Dachau for her non-cooperation, where after further beatings and torture, she was shot. At the time of her death Noor was thirty years old. According to her biography (and the testimony of her captors), she died without divulging any secrets and the last word she uttered was liberté.

Via Pharyngula: in honor of Charles Darwin’s upcoming 200th birthday, the Beagle Project is planning to rebuild the Beagle and sail the same route Darwin traveled.

Imagine: 2009, and a replica Beagle sailed around Cape Horn and through the Pacific by an international crew of young scientists sails into The Galapagos as part of a recreation of the Voyage of the Beagle. That, surely will be the TV picture of the Darwin 2009 celebrations. How can Darwin’s 200th anniversary pass without that happening? Donate, and help us give a new generation of young people the chance to see a replica Beagle built and launched, and the opportunity to head for horizons of their own.